Regular Expressions modifiers and delimiters
Posted: Fri Jun 15, 2012 10:27 am
by Kukulkan
Hello,
the documentation gives me no information about the difference between PCRE syntax and PureBasic syntax.
To search case-insensitively, I normally use (PCRE style):
"/test/i"
In PureBasic, this does not work. It seems I have to use
"(?i)test"
Questions:
1) Is this correct, and are there no delimiters in PureBasic?
2) Shouldn't this be in the documentation?
Kukulkan
Re: Regular Expressions modifiers and delimiters
Posted: Sat Jun 16, 2012 1:27 am
by IdeasVacuum
PB4.61 Help:
All the regular expressions supported in PCRE will be supported in PureBasic
I assume that means syntax too.
Re: Regular Expressions modifiers and delimiters
Posted: Sat Jun 16, 2012 12:48 pm
by Kukulkan
Kukulkan wrote:
"/test/i"
In PureBasic, this does not work. It seems I have to use
"(?i)test"
The official syntax with delimiters does not work in PureBasic:
Code:
If CreateRegularExpression(0, "/some/i")
  Dim Result$(0)
  a = ExtractRegularExpression(0, "This is for SOME test.", Result$())
  MessageRequester("Info", "Nb strings found: " + Str(a))
  For k = 0 To a - 1
    MessageRequester("Info", Result$(k))
  Next
Else
  MessageRequester("Error", RegularExpressionError())
EndIf
It does not work until you change the RegExp to
Code:
If CreateRegularExpression(0, "(?i)some")
Kukulkan
Re: Regular Expressions modifiers and delimiters
Posted: Sat Jun 16, 2012 2:17 pm
by Little John
Kukulkan wrote:
The official syntax with delimiters does not work in PureBasic:
Code:
If CreateRegularExpression(0, "/some/i")
Are you sure that this is the official PCRE syntax?
http://www.pcre.org/pcre.txt wrote:
PCRE_CASELESS
If this bit is set, letters in the pattern match both upper and lower case letters. It is equivalent to Perl's /i option, and it can be changed within a pattern by a (?i) option setting.
As I understand this quote, (?i) is the official PCRE syntax for this.
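A minimal sketch of the difference, assuming PureBasic hands the pattern string to PCRE unchanged (which the behaviour reported in this thread suggests): the slashes and the trailing i in "/some/i" are then ordinary literal characters, so the pattern compiles but should only match the text "/some/i" itself. The sample strings are made up for illustration.

```purebasic
; Pattern written with Perl-style delimiters: it compiles,
; but "/" and the trailing "i" are treated as literal characters.
If CreateRegularExpression(0, "/some/i")
  Debug MatchRegularExpression(0, "This is for SOME test.")   ; no match expected
  Debug MatchRegularExpression(0, "Literal /some/i in text")  ; match expected
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf

; Inline option (?i): caseless matching for the ASCII letters that follow it.
If CreateRegularExpression(1, "(?i)some")
  Debug MatchRegularExpression(1, "This is for SOME test.")   ; match expected
  FreeRegularExpression(1)
Else
  Debug RegularExpressionError()
EndIf
```

MatchRegularExpression() returns a non-zero value when the string matches, so the Debug output shows which pattern actually finds "SOME".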
Regards, Little John
Re: Regular Expressions modifiers and delimiters
Posted: Sun Jun 17, 2012 11:02 am
by Little John
Kukulkan wrote:
It does not work until you change the RegExp to
Code:
If CreateRegularExpression(0, "(?i)some")
In the meantime I found out that even this does not work correctly with non-ASCII characters such as the German umlauts. The following code does not find a match (PB 4.61 on Windows XP x86, tested in ASCII mode and in Unicode mode):
Code:
If CreateRegularExpression(0, "(?i)someäöü")
  Dim Result$(0)
  a = ExtractRegularExpression(0, "This is for SOMEÄÖÜ test.", Result$())
  MessageRequester("Info", "Nb strings found: " + Str(a))
  For k = 0 To a - 1
    MessageRequester("Info", Result$(k))
  Next
Else
  MessageRequester("Error", RegularExpressionError())
EndIf
So if there can be non-ASCII characters in your strings, on Windows it's better to use LCase() or UCase() instead:
Code:
pattern$ = "someäöü"
search$ = "This is for SOMEÄÖÜ test."
; Lower-case both the pattern and the subject before matching.
; Note: the extracted strings come back lower-cased as well.
If CreateRegularExpression(0, LCase(pattern$))
  Dim Result$(0)
  a = ExtractRegularExpression(0, LCase(search$), Result$())
  MessageRequester("Info", "Nb strings found: " + Str(a))
  For k = 0 To a - 1
    MessageRequester("Info", Result$(k))
  Next
Else
  MessageRequester("Error", RegularExpressionError())
EndIf
Using LCase() or UCase() on Linux does not help in this regard, because there they can't handle special characters either.
Regards, Little John
Re: Regular Expressions modifiers and delimiters
Posted: Sat Sep 01, 2012 7:30 pm
by luis
@Little John
I think PCRE for PB has been compiled with UTF-8 support but without Unicode property support:
PCRE Help wrote:
In UTF-8 mode, PCRE always understands the concept of case for characters whose values are less than 128, so caseless matching is always possible. For characters with higher values, the concept of case is supported if PCRE is compiled with Unicode property support, but not otherwise. If you want to use caseless matching for characters 128 and above, you must ensure that PCRE is compiled with Unicode property support as well as with UTF-8 support.
This seems to be confirmed by this code:
Code:
If CreateRegularExpression(0, "\p") = 0
  Debug RegularExpressionError()
EndIf
prints:
support for \P, \p, and \X has not been compiled
I read that to use \p, \P or \X in regular expressions, PCRE must be compiled with the SUPPORT_UTF8 and SUPPORT_UCP (Unicode properties) conditional defines.
So the message above seems to suggest that Unicode property support is disabled.
In the end, to fix all this and make caseless matching work for characters with a code point of 128 and above, PCRE would have to be built by typing ./configure --enable-unicode-properties before running make, or something like that.
So maybe you could make a request for that if you like.
Edit: In the meantime I did it ->
http://www.purebasic.fr/english/viewtop ... =3&t=51463 
Re: Regular Expressions modifiers and delimiters
Posted: Mon Dec 24, 2012 4:22 pm
by Little John
Thank you, Luis! (I was too lazy.)
BTW: The problem still exists in PB 5.10 Beta 1 when using the new #PB_RegularExpression_NoCase option.
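For reference, a sketch of how that flag is passed, assuming PB 5.10 or later, where CreateRegularExpression() accepts a flags argument. With plain ASCII letters it should behave like a leading (?i); going by the report above, a pattern containing umlauts is not expected to match caselessly on the affected builds.

```purebasic
; Caseless matching requested through the flag instead of an inline (?i).
If CreateRegularExpression(0, "some", #PB_RegularExpression_NoCase)
  Dim Result$(0)
  a = ExtractRegularExpression(0, "This is for SOME test.", Result$())
  Debug "Nb strings found: " + Str(a)
  For k = 0 To a - 1
    Debug Result$(k)
  Next
Else
  Debug RegularExpressionError()
EndIf
```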
Regards, Little John