Regular Expressions modifiers and delimiters

Kukulkan

Post by Kukulkan »

Hello,

the documentation gives me no information about the difference between PCRE syntax and PureBasic syntax.

To search case-insensitively, I normally use (PCRE style):

"/test/i"

In PureBasic, this does not work. It seems I have to use

"(?i)test"

Questions:
1) Is this correct, and are there really no delimiters in PureBasic?
2) Shouldn't this be in the documentation?

Kukulkan
IdeasVacuum

Re: Regular Expressions modifiers and delimiters

Post by IdeasVacuum »

PB 4.61 Help:
"All the regular expressions supported in PCRE will be supported in PureBasic"

I assume that means syntax too.
IdeasVacuum
Kukulkan

Re: Regular Expressions modifiers and delimiters

Post by Kukulkan »

Kukulkan wrote:
"/test/i"

In PureBasic, this does not work. It seems I have to use

"(?i)test"

The official syntax with delimiters does not work in PureBasic:

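Code:

; The slashes and the trailing "i" are treated as literal pattern characters here
If CreateRegularExpression(0, "/some/i")

  Dim Result$(0)
  
  ; Returns the number of matches and fills Result$() with the matched strings
  a = ExtractRegularExpression(0, "This is for SOME test.", Result$())
  
  MessageRequester("Info", "Nb strings found: "+Str(a))
  
  For k=0 To a-1
    MessageRequester("Info", Result$(k))
  Next

Else
  ; The pattern could not be compiled
  MessageRequester("Error", RegularExpressionError())
EndIf

It only finds a match once you change the regular expression to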

Code:

If CreateRegularExpression(0, "(?i)some")
Kukulkan
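
A minimal sketch of why the delimiter form finds nothing: the PCRE library itself has no delimiters (they come from Perl and PHP), so the slashes and the trailing "i" are matched as ordinary characters. CountMatches() is a hypothetical helper name used only for this illustration; the calls inside it are the same ones used in the code above.

```purebasic
; Hypothetical helper: returns the number of matches, or -1 if the pattern does not compile
Procedure.i CountMatches(Pattern$, Subject$)
  Protected Dim Result$(0)
  Protected Count = -1
  Protected Regex = CreateRegularExpression(#PB_Any, Pattern$)
  If Regex
    Count = ExtractRegularExpression(Regex, Subject$, Result$())
    FreeRegularExpression(Regex)
  EndIf
  ProcedureReturn Count
EndProcedure

; The delimiter form compiles, but it only matches the literal text "/some/i"
Debug CountMatches("/some/i", "This is for SOME test.")   ; 0
Debug CountMatches("/some/i", "a literal /some/i here")   ; 1

; The inline modifier is what actually switches on caseless matching
Debug CountMatches("(?i)some", "This is for SOME test.")  ; 1
```

So "/some/i" is not rejected as an error; it is simply a different pattern.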
Little John

Re: Regular Expressions modifiers and delimiters

Post by Little John »

Kukulkan wrote: The official syntax with delimiters does not work in Purebasic:

Code:

If CreateRegularExpression(0, "/some/i")

Are you sure that this is the official PCRE syntax?

http://www.pcre.org/pcre.txt wrote:
PCRE_CASELESS

If this bit is set, letters in the pattern match both upper and lower
case letters. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting.

As I understand this quote, (?i) is the official PCRE syntax for this.

Regards, Little John
Little John

Re: Regular Expressions modifiers and delimiters

Post by Little John »

Kukulkan wrote: It does not work until you change the RegExp to

Code:

If CreateRegularExpression(0, "(?i)some")

In the meantime I found out that even this does not work correctly with non-ASCII characters such as the German umlauts. The following code does not find a match (PB 4.61 on Windows XP x86, tested in ASCII mode and in Unicode mode):

Code:

If CreateRegularExpression(0, "(?i)someäöü")
   Dim Result$(0)
   a = ExtractRegularExpression(0, "This is for SOMEÄÖÜ test.", result$())
   MessageRequester("Info", "Nb strings found: " + Str(a))
   For k = 0 To a-1
      MessageRequester("Info", Result$(k))
   Next
   
Else
   MessageRequester("Error", RegularExpressionError())
EndIf
So if there can be non-ASCII characters in your strings, on Windows it's better to use LCase() or UCase() instead:

Code:

pattern$ = "someäöü"
search$ = "This is for SOMEÄÖÜ test."

If CreateRegularExpression(0, LCase(pattern$))
   Dim Result$(0)
   a = ExtractRegularExpression(0, LCase(search$), result$())
   MessageRequester("Info", "Nb strings found: " + Str(a))
   For k = 0 To a-1
      MessageRequester("Info", Result$(k))
   Next
   
Else
   MessageRequester("Error", RegularExpressionError())
EndIf
Using LCase() or UCase() on Linux does not help in this regard, because there they can't handle special characters either. :-(

Regards, Little John
Last edited by Little John on Tue Dec 25, 2012 10:33 am, edited 1 time in total.
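
Another possible workaround, sketched here under assumptions and not tested on every platform: spell out both cases of each non-ASCII letter as an alternation, so that (?i) is only relied on for the ASCII part. Unlike the LCase() approach, the matched strings keep their original case. This assumes the source file encoding matches the compiler mode (ASCII or Unicode).

```purebasic
; Both cases of each umlaut are listed explicitly; (?i) covers only "some"
Pattern$ = "(?i)some(?:ä|Ä)(?:ö|Ö)(?:ü|Ü)"

If CreateRegularExpression(0, Pattern$)
  Dim Result$(0)
  Count = ExtractRegularExpression(0, "This is for SOMEÄÖÜ test.", Result$())
  Debug "Nb strings found: " + Str(Count)
  For k = 0 To Count - 1
    Debug Result$(k)   ; matched text in its original case
  Next
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

Writing the pattern this way by hand gets tedious for long search terms, so it is more of a stopgap than a real fix.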
luis

Re: Regular Expressions modifiers and delimiters

Post by luis »

@Little John

I think PCRE for PB has been compiled with UTF-8 support but without Unicode property support:

PCRE Help wrote:
In UTF-8 mode, PCRE always understands the concept of case for characters whose values are
less than 128, so caseless matching is always possible. For characters
with higher values, the concept of case is supported if PCRE is compiled
with Unicode property support, but not otherwise. If you want to
use caseless matching for characters 128 and above, you must ensure
that PCRE is compiled with Unicode property support as well as with UTF-8 support.

This seems to be confirmed by this code:

Code:

If CreateRegularExpression(0, "\p") = 0
    Debug RegularExpressionError() 
EndIf
prints:

support for \P, \p, and \X has not been compiled

I read that to use \p, \P or \X in regular expressions, PCRE must be compiled with the SUPPORT_UTF8 and SUPPORT_UCP (Unicode properties) conditional defines.

So the message above seems to suggest that Unicode property support is disabled.

In the end, to fix all this and make caseless matching work for characters with a code point of 128 and above, PCRE would have to be built with ./configure --enable-unicode-properties before running make, or something like that.

So maybe you could make a request for that if you like.

Edit: In the meantime I did it -> http://www.purebasic.fr/english/viewtop ... =3&t=51463 ;)
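
For anyone who wants to check this at runtime, here is a small sketch of the same probe as a reusable test. It uses the complete property "\p{L}" (any letter), which should compile when Unicode property support is present, so the result does not depend on reading the error text. HasUnicodeProperties() is a hypothetical name chosen for this example.

```purebasic
; Hypothetical helper: #True if the built-in PCRE accepts Unicode property escapes
Procedure.i HasUnicodeProperties()
  Protected Regex = CreateRegularExpression(#PB_Any, "\p{L}")
  If Regex
    FreeRegularExpression(Regex)
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

If HasUnicodeProperties()
  Debug "Unicode property support appears to be available"
Else
  ; The last regex call failed, so the error text refers to the probe pattern
  Debug "No Unicode property support: " + RegularExpressionError()
EndIf
```

If the probe fails, caseless matching above code point 127 presumably cannot be expected to work either, as described in the PCRE help quoted above.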
Little John

Re: Regular Expressions modifiers and delimiters

Post by Little John »

luis wrote: So maybe you could make a request for that if you like.

Edit: In the meantime I did it -> http://www.purebasic.fr/english/viewtop ... =3&t=51463 ;)

Thank you, Luis! (I was too lazy. :-) )

BTW: The problem still exists in PB 5.10 Beta 1, when using the new #PB_RegularExpression_NoCase option.

Regards, Little John