Hello,
the documentation gives me no information about the difference between PCRE syntax and PureBasic syntax.
To search case-insensitively, I normally use (PCRE style):
"/test/i"
In PureBasic, this does not work. It seems I have to use
"(?i)test"
Questions:
1) Is this correct? And are there no delimiters in PureBasic?
2) Shouldn't this be in the documentation?
Kukulkan
Regular Expressions modifiers and delimiters
Re: Regular Expressions modifiers and delimiters
PB 4.61 Help wrote: All the regular expressions supported in PCRE will be supported in PureBasic
I assume that means syntax too.
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
Re: Regular Expressions modifiers and delimiters
Kukulkan wrote: "/test/i"
In PureBasic, this does not work. It seems I have to use
"(?i)test"
The official syntax with delimiters does not work in PureBasic:
Code: Select all
If CreateRegularExpression(0, "/some/i")
  Dim Result$(0)
  a = ExtractRegularExpression(0, "This is for SOME test.", Result$())
  MessageRequester("Info", "Nb strings found: " + Str(a))
  For k = 0 To a - 1
    MessageRequester("Info", Result$(k))
  Next
Else
  MessageRequester("Error", RegularExpressionError())
EndIf
It does not work until you change the RegExp to
Code: Select all
If CreateRegularExpression(0, "(?i)some")
Re: Regular Expressions modifiers and delimiters
Kukulkan wrote: The official syntax with delimiters does not work in PureBasic:
Code: Select all
If CreateRegularExpression(0, "/some/i")
Are you sure that this is the official PCRE syntax?
http://www.pcre.org/pcre.txt wrote: PCRE_CASELESS
If this bit is set, letters in the pattern match both upper and lower
case letters. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting.
As I understand this quote, (?i) is the official PCRE syntax for this.
Regards, Little John
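
For anyone porting patterns written as Perl-style literals, here is a minimal sketch of how "/test/i" could be rewritten into the inline form "(?i)test" before it is handed to CreateRegularExpression(). PerlToInline() is a hypothetical helper name, not a PureBasic function. It assumes "/" as the delimiter and that the trailing letters are options PCRE accepts inline (i, m, s, x); something like Perl's g has no inline equivalent and would make the pattern fail to compile.

```purebasic
; Hypothetical helper (not part of PureBasic): turns a Perl-style literal
; such as "/test/i" into a pattern with inline options, e.g. "(?i)test".
; Strings that do not start with "/" are returned unchanged.
Procedure.s PerlToInline(Literal$)
  Protected lastSlash, i, body$, flags$

  If Left(Literal$, 1) <> "/"
    ProcedureReturn Literal$
  EndIf

  ; Find the closing delimiter by scanning backwards from the end.
  For i = Len(Literal$) To 2 Step -1
    If Mid(Literal$, i, 1) = "/"
      lastSlash = i
      Break
    EndIf
  Next

  If lastSlash = 0
    ProcedureReturn Literal$   ; no closing delimiter, leave as is
  EndIf

  body$  = Mid(Literal$, 2, lastSlash - 2)   ; text between the delimiters
  flags$ = Mid(Literal$, lastSlash + 1)      ; option letters after the last "/"

  If flags$ <> ""
    ProcedureReturn "(?" + flags$ + ")" + body$
  EndIf
  ProcedureReturn body$
EndProcedure

If CreateRegularExpression(0, PerlToInline("/some/i"))
  If MatchRegularExpression(0, "This is for SOME test.")
    Debug "match"
  Else
    Debug "no match"
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

The helper only rearranges the string; it does not check whether the option letters are valid, so RegularExpressionError() is still the place to look if the result does not compile.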
Re: Regular Expressions modifiers and delimiters
Kukulkan wrote: It does not work until you change the RegExp to
Code: Select all
If CreateRegularExpression(0, "(?i)some")
In the meantime I found out that even this does not work correctly with special characters such as the German umlauts. The following code does not find a match (PB 4.61 on Windows XP x86, tested in ASCII mode and in Unicode mode):
Code: Select all
If CreateRegularExpression(0, "(?i)someäöü")
  Dim Result$(0)
  a = ExtractRegularExpression(0, "This is for SOMEÄÖÜ test.", Result$())
  MessageRequester("Info", "Nb strings found: " + Str(a))
  For k = 0 To a - 1
    MessageRequester("Info", Result$(k))
  Next
Else
  MessageRequester("Error", RegularExpressionError())
EndIf
Code: Select all
pattern$ = "someäöü"
search$ = "This is for SOMEÄÖÜ test."
If CreateRegularExpression(0, LCase(pattern$))
  Dim Result$(0)
  a = ExtractRegularExpression(0, LCase(search$), Result$())
  MessageRequester("Info", "Nb strings found: " + Str(a))
  For k = 0 To a - 1
    MessageRequester("Info", Result$(k))
  Next
Else
  MessageRequester("Error", RegularExpressionError())
EndIf

Regards, Little John
Last edited by Little John on Tue Dec 25, 2012 10:33 am, edited 1 time in total.
Re: Regular Expressions modifiers and delimiters
@Little John
I think PCRE for PB has been compiled with UTF-8 support but without Unicode property support:
PCRE Help wrote: In UTF-8 mode,
PCRE always understands the concept of case for characters whose values are
less than 128, so caseless matching is always possible. For characters
with higher values, the concept of case is supported if PCRE is compiled
with Unicode property support, but not otherwise. If you want to
use caseless matching for characters 128 and above, you must ensure
that PCRE is compiled with Unicode property support as well as with UTF-8 support.
This seems to be confirmed by this code:
Code: Select all
If CreateRegularExpression(0, "\p") = 0
  Debug RegularExpressionError()
EndIf
prints:
support for \P, \p, and \X has not been compiled
I read that to use \p, \P or \X in regular expressions, PCRE must be compiled with the SUPPORT_UTF8 and SUPPORT_UCP (Unicode properties) conditional defines.
So the message above seems to suggest that Unicode property support is disabled.
To fix all this and make caseless matching work for characters with a code point of 128 or above, you should type ./configure --enable-unicode-properties before running make, or something like that.
So maybe you could make a request for that if you like.
Edit: In the meantime I did it -> http://www.purebasic.fr/english/viewtop ... =3&t=51463

"Have you tried turning it off and on again ?"
A little PureBasic review
Re: Regular Expressions modifiers and delimiters
luis wrote: So maybe you could make a request for that if you like.
Edit: In the meantime I did it -> http://www.purebasic.fr/english/viewtop ... =3&t=51463
Thank you, Luis! (I was too lazy.)

BTW: The problem still exists in PB 5.10 Beta 1, when using the new #PB_RegularExpression_NoCase option.
Regards, Little John
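
For reference, a minimal sketch of the flag-based alternative to the inline (?i) prefix, assuming PB 5.10 or later, where #PB_RegularExpression_NoCase is available. The test string is deliberately plain ASCII because, as reported above, characters such as umlauts are still affected.

```purebasic
; Assumes PB 5.10 or later (#PB_RegularExpression_NoCase).
; ASCII-only pattern and subject: non-ASCII letters are reported
; in this thread as not matching caselessly.
If CreateRegularExpression(0, "some", #PB_RegularExpression_NoCase)
  Dim Result$(0)
  a = ExtractRegularExpression(0, "This is for SOME test.", Result$())
  Debug "Nb strings found: " + Str(a)
  For k = 0 To a - 1
    Debug Result$(k)
  Next
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```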