All times are UTC + 1 hour




 Post subject: Regular Expressions modifiers and delimiters
PostPosted: Fri Jun 15, 2012 10:27 am 
Hello,

The documentation gives me no information about the difference between PCRE syntax and PureBasic syntax.

To search case-insensitively, I normally use (PCRE style):

"/test/i"

In PureBasic, this does not work. It seems I have to use

"(?i)test"

Questions:
1) Is this correct? And are there no delimiters in PureBasic?
2) Shouldn't this be in the documentation?

Kukulkan

_________________
When somebody says "Expect the unexpected" slap them in the face and say" You didn’t expect that, did you?"


 Post subject: Re: Regular Expressions modifiers and delimiters
PostPosted: Sat Jun 16, 2012 1:27 am 
The PB 4.61 help says:
Quote:
All the regular expressions supported in PCRE will be supported in PureBasic
I assume that includes the syntax too.

_________________
IdeasVacuum
If it sounds simple, you have not grasped the complexity.


 Post subject: Re: Regular Expressions modifiers and delimiters
PostPosted: Sat Jun 16, 2012 12:48 pm 
Kukulkan wrote:
"/test/i"

In PureBasic, this does not work. It seems I have to use

"(?i)test"

The official syntax with delimiters does not work in PureBasic:
Code:
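; Pattern written with Perl-style delimiters and a trailing i modifier.
; The pattern compiles, but no match is found.
If CreateRegularExpression(0, "/some/i")

  Dim Result$(0)

  ; a receives the number of matches stored in Result$()
  a = ExtractRegularExpression(0, "This is for SOME test.", Result$())

  MessageRequester("Info", "Nb strings found: " + Str(a))

  For k = 0 To a - 1
    MessageRequester("Info", Result$(k))
  Next

Else
  ; Only reached if the pattern itself fails to compile
  MessageRequester("Error", RegularExpressionError())
EndIf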

It only works once you change the regular expression to
Code:
If CreateRegularExpression(0, "(?i)some")


Kukulkan

 Post subject: Re: Regular Expressions modifiers and delimiters
PostPosted: Sat Jun 16, 2012 2:17 pm 
Kukulkan wrote:
The official syntax with delimiters does not work in PureBasic:
Code:
If CreateRegularExpression(0, "/some/i")

Are you sure that this is the official PCRE syntax?

PCRE_CASELESS

If this bit is set, letters in the pattern match both upper and lower
case letters. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting.

As I understand this quote, (?i) is the official PCRE syntax for this.
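
The /.../i form is Perl notation: the delimiters and the trailing modifier belong to the host language, while the PCRE library only receives the bare pattern, with options given as flags or inline as (?i). For patterns that arrive in the delimited form, a conversion could look like the following minimal sketch. DelimitedToInline() is a hypothetical helper, not part of PureBasic; it assumes "/" as the delimiter and keeps only the modifiers i, m, s and x, which PCRE accepts as inline options.

```purebasic
; Hypothetical helper (not a PureBasic built-in): turns a Perl-style
; "/pattern/flags" string into the inline form "(?flags)pattern".
; Assumption: "/" is the delimiter; only i, m, s and x are kept.
Procedure.s DelimitedToInline(Delimited$)
  Protected lastSlash.i, i.i, pattern$, flags$, kept$, flag$

  ; Anything that does not start with "/" is returned unchanged.
  If Left(Delimited$, 1) <> "/"
    ProcedureReturn Delimited$
  EndIf

  ; Find the closing delimiter by scanning from the end.
  For i = Len(Delimited$) To 2 Step -1
    If Mid(Delimited$, i, 1) = "/"
      lastSlash = i
      Break
    EndIf
  Next

  If lastSlash = 0
    ProcedureReturn Delimited$   ; no closing delimiter found
  EndIf

  ; Text between the two delimiters (may be empty).
  If lastSlash > 2
    pattern$ = Mid(Delimited$, 2, lastSlash - 2)
  EndIf
  flags$ = Mid(Delimited$, lastSlash + 1)

  ; Keep only modifiers that have an inline equivalent.
  For i = 1 To Len(flags$)
    flag$ = Mid(flags$, i, 1)
    If FindString("imsx", flag$, 1)
      kept$ = kept$ + flag$
    EndIf
  Next

  If kept$ <> ""
    ProcedureReturn "(?" + kept$ + ")" + pattern$
  EndIf
  ProcedureReturn pattern$
EndProcedure

; Usage: "/some/i" becomes "(?i)some" before it is compiled.
If CreateRegularExpression(0, DelimitedToInline("/some/i"))
  Debug MatchRegularExpression(0, "This is for SOME test.")
Else
  Debug RegularExpressionError()
EndIf
```

Modifiers without an inline equivalent (for example g) are silently dropped in this sketch; a stricter version might report them instead.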

Regards, Little John

_________________
Math problems?
Call 1-800-[(10x)(13i)^2]-[sin(xy)/2.362x].


 Post subject: Re: Regular Expressions modifiers and delimiters
PostPosted: Sun Jun 17, 2012 11:02 am 
Kukulkan wrote:
It does not work until you change the RegExp to
Code:
If CreateRegularExpression(0, "(?i)some")

In the meantime I found out that even this does not work correctly with non-ASCII characters such as the German umlauts. The following code does not find a match (PB 4.61 on Windows XP x86, tested in ASCII mode and in Unicode mode):
Code:
If CreateRegularExpression(0, "(?i)someäöü")
   Dim Result$(0)
   a = ExtractRegularExpression(0, "This is for SOMEÄÖÜ test.", result$())
   MessageRequester("Info", "Nb strings found: " + Str(a))
   For k = 0 To a-1
      MessageRequester("Info", Result$(k))
   Next
   
Else
   MessageRequester("Error", RegularExpressionError())
EndIf

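So if your strings can contain non-ASCII characters, on Windows it is better to convert both the pattern and the searched string with LCase() or UCase() instead:
Code:
pattern$ = "someäöü"
search$ = "This is for SOMEÄÖÜ test."

; Lowercase both the pattern and the searched string before matching.
; Caveats: the extracted strings come back lowercased, and LCase() would
; also change case-sensitive escapes such as \D or \W in a pattern.
If CreateRegularExpression(0, LCase(pattern$))
   Dim Result$(0)
   a = ExtractRegularExpression(0, LCase(search$), Result$())
   MessageRequester("Info", "Nb strings found: " + Str(a))
   For k = 0 To a-1
      MessageRequester("Info", Result$(k))
   Next

Else
   MessageRequester("Error", RegularExpressionError())
EndIf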

On Linux, using LCase() or UCase() does not help in this regard, because they cannot handle these special characters either. :-(

Regards, Little John



Last edited by Little John on Tue Dec 25, 2012 10:33 am, edited 1 time in total.

 Post subject: Re: Regular Expressions modifiers and delimiters
PostPosted: Sat Sep 01, 2012 7:30 pm 
@Little John

I think PCRE for PB has been compiled with UTF-8 support but without Unicode property support:

PCRE Help wrote:
In UTF-8 mode, PCRE always understands the concept of case for characters whose values are less than 128, so caseless matching is always possible. For characters with higher values, the concept of case is supported if PCRE is compiled with Unicode property support, but not otherwise. If you want to use caseless matching for characters 128 and above, you must ensure that PCRE is compiled with Unicode property support as well as with UTF-8 support.


This seems to be confirmed by this code:

Code:
If CreateRegularExpression(0, "\p") = 0
    Debug RegularExpressionError()
EndIf


prints:

Quote:
support for \P, \p, and \X has not been compiled


I read that to use \p, \P or \X in regular expressions, PCRE must be compiled with the SUPPORT_UTF8 and SUPPORT_UCP (Unicode properties) conditional defines.

So the message above seems to suggest that Unicode property support is disabled.

In the end, to fix all this and make caseless matching work for characters with a code point of 128 and above, PCRE would have to be built by typing ./configure --enable-unicode-properties before running make, or something like that.

So maybe you could make a request for that if you like.

Edit: In the meantime I did it -> viewtopic.php?f=3&t=51463 ;)

 Post subject: Re: Regular Expressions modifiers and delimiters
PostPosted: Mon Dec 24, 2012 4:22 pm 
luis wrote:
So maybe you could make a request for that if you like.

Edit: In the meantime I did it -> viewtopic.php?f=3&t=51463 ;)

Thank you, Luis! ( I was too lazy. :-) )

BTW: The problem still exists in PB 5.10 Beta 1 when using the new #PB_RegularExpression_NoCase option.
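
For reference, the flag is passed as the optional third parameter of CreateRegularExpression(). The following is a minimal sketch, assuming PB 5.10 or later; the regular expression numbers 0 to 2 are arbitrary, and the result of the non-ASCII probe depends on how the bundled PCRE was built and on the encoding of the source file.

```purebasic
; Sketch: #PB_RegularExpression_NoCase next to the inline (?i) option.
If CreateRegularExpression(0, "some", #PB_RegularExpression_NoCase) And CreateRegularExpression(1, "(?i)some")
  ; For plain ASCII letters both forms are expected to report a match.
  Debug MatchRegularExpression(0, "This is for SOME test.")
  Debug MatchRegularExpression(1, "This is for SOME test.")
Else
  Debug RegularExpressionError()
EndIf

; Probe for caseless matching of a non-ASCII character.
If CreateRegularExpression(2, "ä", #PB_RegularExpression_NoCase)
  If MatchRegularExpression(2, "Ä")
    Debug "Caseless matching works for non-ASCII characters"
  Else
    Debug "Caseless matching is limited to ASCII in this build"
  EndIf
Else
  Debug RegularExpressionError()
EndIf
```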

Regards, Little John
