PCRE - regular expressions

Posted: Tue Mar 12, 2013 3:09 pm
by SFSxOI
Before I spend time planning to replace some older code with regular expressions...

Does anyone know whether PureBasic's PCRE implementation of regular expressions has PCRE's multi-line matching mode enabled, or whether it is disabled (the PCRE default)?

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 3:49 pm
by Little John
SFSxOI wrote: Does anyone know if the PCRE implementation of Regular Expressions in PureBasic has the multi-line matching mode of PCRE enabled or is it disabled (the default in PCRE)?
Just use the appropriate flags. See the reference manual: CreateRegularExpression().

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 4:01 pm
by SFSxOI
Thanks, Little John. So, what is the answer?

That it is not enabled and has to be turned on via the flag?

Or....

That it is enabled, but the flag is needed to use it?

Also, does anyone know if the \x hexadecimal digit character class is implemented? It does not seem to be working here. For example, this does not work:

Procedure.b IsRepeatPat(in_str.s) 
  rex_rpt = CreateRegularExpression(#PB_Any,"^\x*(?<repeat>\x+)\k<repeat>+\x*$")
  is_rng = MatchRegularExpression(rex_rpt, in_str)
  FreeRegularExpression(rex_rpt)
  ProcedureReturn is_rng
EndProcedure

Debug IsRepeatPat("AA")

But this does work, using the \w character class (which is not specific to hex):

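Procedure.b IsRepeatPat(in_str.s)
  ; True if the string is all word characters and some run of them is immediately repeated, e.g. "AA" or "123123"
  rex_rpt = CreateRegularExpression(#PB_Any, "^\w*(?<repeat>\w+)\k<repeat>+\w*$")
  If rex_rpt = 0
    Debug RegularExpressionError()   ; the pattern failed to compile
    ProcedureReturn #False
  EndIf
  is_rng = MatchRegularExpression(rex_rpt, in_str)
  FreeRegularExpression(rex_rpt)
  ProcedureReturn is_rng
EndProcedure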

Debug IsRepeatPat("AA")

(AA are definitely hex digits)


(The above is a simple regex for detecting repeating patterns in a string: it returns True if the string consists only of word characters and some run of them is immediately followed by one or more copies of itself, such as the "oo" in "oo4679824123x10" or "123123". To set the minimum length of the repeating unit, change '(?<repeat>\w+)' to '(?<repeat>\w{n,})', where n is that minimum; e.g. with n = 3, only units of 3 or more characters count. The number of repeats is set by the '+' after '\k<repeat>'; '\k<repeat>{2,}' would require three or more occurrences in a row.)
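A sketch contrasting the minimum unit length ({3,} on the named group) with the repeat count ({2,} on the back-reference), using only the PureBasic regex functions already shown in this thread. RepeatMatches and the pattern variable names are hypothetical:

```purebasic
; Hypothetical helper: nonzero if Pattern$ matches Text$.
Procedure.i RepeatMatches(Pattern$, Text$)
  Protected result = 0
  Protected re = CreateRegularExpression(#PB_Any, Pattern$)
  If re
    result = MatchRegularExpression(re, Text$)
    FreeRegularExpression(re)
  EndIf
  ProcedureReturn result
EndProcedure

; Any unit of 1+ word characters followed by at least one copy of itself.
any_unit$ = "^\w*(?<repeat>\w+)\k<repeat>+\w*$"
; The unit must be at least 3 characters long (still 2+ occurrences in a row).
long_unit$ = "^\w*(?<repeat>\w{3,})\k<repeat>+\w*$"
; The unit can be any length, but must occur at least 3 times in a row.
three_times$ = "^\w*(?<repeat>\w+)\k<repeat>{2,}\w*$"

Debug RepeatMatches(any_unit$, "oo4679824123x10")   ; "oo" is a repeat
Debug RepeatMatches(long_unit$, "oo4679824123x10")  ; no unit of 3+ characters repeats
Debug RepeatMatches(long_unit$, "123123")
Debug RepeatMatches(three_times$, "abab")           ; only 2 occurrences of "ab"
Debug RepeatMatches(three_times$, "ababab")
```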

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 4:15 pm
by Little John
SFSxOI wrote: Thanks LittleJohn, so, the answer is what?

That it is not enabled, and requires it to be turned on via the flag ?

Or....

That it is enabled but needs the flag to use it ?
Sorry, I'm not a native English speaker, and it seems that I don't understand what you are after.

If you, e.g., try the example that is given in the documentation for CreateRegularExpression() with and without the flag #PB_RegularExpression_NoCase, then you can compare both results.
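A minimal sketch of that comparison, with a made-up pattern and text rather than the manual's own example. MatchRegularExpression() returns nonzero on a match:

```purebasic
; Same pattern and text, compiled once without and once with #PB_RegularExpression_NoCase.
Text$ = "PureBasic"
exact_match = 0
nocase_match = 0

; Without the flag the match is case-sensitive.
re = CreateRegularExpression(#PB_Any, "^purebasic$")
If re
  exact_match = MatchRegularExpression(re, Text$)
  FreeRegularExpression(re)
EndIf

; With #PB_RegularExpression_NoCase, letter case is ignored.
re = CreateRegularExpression(#PB_Any, "^purebasic$", #PB_RegularExpression_NoCase)
If re
  nocase_match = MatchRegularExpression(re, Text$)
  FreeRegularExpression(re)
EndIf

Debug exact_match
Debug nocase_match
```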

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 4:23 pm
by SFSxOI
Little John, thanks for your replies. I tried the flags, but that doesn't answer the question: all the flags tell me is whether that attribute is being used, not whether the PureBasic implementation of PCRE has these things enabled or disabled.

This is for some work-related things having to do with detecting malware activity. The reason I need to know is this: if multi-line is enabled (the PCRE default is disabled) but simply not used until the flag is present, then I need to write extra code to detect whether it's being tampered with; however, if it's not enabled until the flag is used, then I don't need to write extra code. In other words, it's potentially exploitable in one state (enabled but just not used until the flag is present) but not in the other (the PCRE default of disabled).

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 6:01 pm
by Fred
Did you try the #PB_RegularExpression_MultiLine flag, as mentioned in the doc?
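For reference, a minimal sketch of what that flag changes: in multi-line mode, ^ (and $) also anchor at internal line breaks instead of only at the ends of the subject string. The text and pattern are made up, and the code assumes PureBasic's PCRE build uses PCRE's default LF newline convention (not confirmed in this thread). Under that assumption the first count should be 1 and the second 3:

```purebasic
; Three lines separated by LF (assumes PCRE's default LF newline convention).
Text$ = "alpha" + #LF$ + "beta" + #LF$ + "gamma"
Dim Found$(0)
count_default = 0
count_multi = 0

; Default mode: ^ anchors only at the very start of the subject string.
re = CreateRegularExpression(#PB_Any, "^\w+")
If re
  count_default = ExtractRegularExpression(re, Text$, Found$())
  FreeRegularExpression(re)
EndIf

; Multi-line mode: ^ also anchors right after each newline.
re = CreateRegularExpression(#PB_Any, "^\w+", #PB_RegularExpression_MultiLine)
If re
  count_multi = ExtractRegularExpression(re, Text$, Found$())
  FreeRegularExpression(re)
EndIf

Debug count_default
Debug count_multi
```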

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 6:34 pm
by Little John
@SFSxOI:
I think I understand now. Well, that's beyond my knowledge.
Fred wrote: Did you tried with the #PB_RegularExpression_MultiLine flag as mentioned in the doc ?
I've also written something like this, but that's not what he is after (so at least I'm not the only one who misunderstood the question). :-)
He wants to know some internal details about how PureBasic exactly utilizes the PCRE library:
SFSxOI wrote: This is for some things for work having to do with detecting malware activity; The reason I need to know is, if multi-line is enabled (the PCRE default is disabled) but simply not used until the flag is present then I need to write extra code to detect if its being tampered with, however, if its not enabled until the flag is used then I don't need to write extra code. In other words, its potentially exploitable in one state (enabled but just not used until the flag is present) but not in the other (the PCRE default of disabled).

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 6:37 pm
by SFSxOI
Fred, the flag has nothing to do with the question.

Using the flag does not tell me whether PCRE multi-line is enabled for PCRE itself. It could be enabled all the time but just not used in PB code until the flag is present, or it could be in its native default state of disabled and turned on when the flag is present. Those are two different things. The PCRE default for multi-line is disabled. The flag only tells me whether multi-line is to be used in PB code; it does not tell me whether the PureBasic implementation of PCRE has multi-line enabled as its native state, enabled all the time regardless of whether the flag is used in PB code. The flag does not tell me whether it has to change the native state of PCRE multi-line support from its default of disabled to enabled (turn it on or off); it only tells me whether the multi-line attribute itself is to be, or is being, used in PB code.

So, to rephrase this in the hope that it is clearer: what I need to know is whether native multi-line support is enabled all the time, with the attribute simply not used in PB code until the flag is present.

The flag is useless for telling me this because it does not tell me the native static or dynamic multi-line state of PCRE itself (whether the native PCRE default of disabled is in force or not). I can use the flag all day long and multi-line works, but that still does not tell me whether multi-line is natively enabled in the PureBasic PCRE implementation.

@LittleJohn - Thank You :)

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 6:41 pm
by Fred
PB PCRE is compiled with multiline ON, and the multiline is enabled when the flag is specified (it's all handled by PCRE).
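Since the flag is ultimately handled by PCRE, it may be worth noting for the tamper-detection concern that PCRE also accepts the inline option (?m) at the start of a pattern, which requests multi-line mode from inside the pattern itself, without the PureBasic flag. A minimal sketch, assuming PureBasic hands the pattern string to PCRE unchanged and that its build uses PCRE's default LF newline convention; under those assumptions the count should be 3, the same as with #PB_RegularExpression_MultiLine:

```purebasic
; Three lines separated by LF (assumes PCRE's default LF newline convention).
Text$ = "alpha" + #LF$ + "beta" + #LF$ + "gamma"
Dim Found$(0)
count_inline = 0

; No PureBasic flag is passed here; the (?m) at the start of the pattern
; asks PCRE to switch multi-line mode on for this pattern only.
re = CreateRegularExpression(#PB_Any, "(?m)^\w+")
If re
  count_inline = ExtractRegularExpression(re, Text$, Found$())
  FreeRegularExpression(re)
EndIf

Debug count_inline
```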

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 6:47 pm
by SFSxOI
That's what I needed to know, thanks Fred. :)

That answer saved almost a month of testing time, manpower, resources, and expenses amounting to approximately $45,000.00. That's $45,000.00 saved for my budget that I can spend on something else now.

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 6:49 pm
by Fred
What about sending half of it to me ?? :lol:

Re: PCRE - regular expressions

Posted: Tue Mar 12, 2013 6:51 pm
by SFSxOI
LoL :)

How about the \x hexadecimal digit character class? It does not seem to be working.

edit: Whoops, never mind the \x class, I figured it out. That was a leftover from some previous code that used another flavor of regular expression engine. The \w class works for this particular application because in PCRE \w (lowercase w) matches word characters (letters, digits and the underscore), which include all the hex digits, although it also accepts non-hex characters such as G or _. PCRE does have a \X (uppercase X), but it is for something different (it matches an extended Unicode grapheme cluster) and it is not a character class.
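For reference, a sketch of what \x actually does in PCRE, plus a hex-only variant of IsRepeatPat, assuming PureBasic passes patterns to PCRE unchanged. In PCRE, \x is an escape for a single character given by its hex code (\x41 is "A"); a bare \x with no hex digits after it stands for the character with code 0, which is why the original \x pattern never matched. PatternMatches and IsHexRepeatPat are hypothetical names, and [0-9A-Fa-f] could equally be written as the POSIX class [[:xdigit:]], which PCRE also accepts:

```purebasic
; Hypothetical helper: nonzero if Pattern$ matches Text$.
Procedure.i PatternMatches(Pattern$, Text$)
  Protected result = 0
  Protected re = CreateRegularExpression(#PB_Any, Pattern$)
  If re
    result = MatchRegularExpression(re, Text$)
    FreeRegularExpression(re)
  Else
    Debug "Pattern error: " + RegularExpressionError()
  EndIf
  ProcedureReturn result
EndProcedure

; Hex-only variant of IsRepeatPat: [0-9A-Fa-f] instead of \w.
Procedure.i IsHexRepeatPat(in_str.s)
  ProcedureReturn PatternMatches("^[0-9A-Fa-f]*(?<repeat>[0-9A-Fa-f]+)\k<repeat>+[0-9A-Fa-f]*$", in_str)
EndProcedure

Debug PatternMatches("^\x41\x41$", "AA")                         ; \x41 is "A", so this matches
Debug PatternMatches("^\x*(?<repeat>\x+)\k<repeat>+\x*$", "AA")  ; looks for code-0 characters, no match
Debug IsHexRepeatPat("AA")   ; repeated hex digit
Debug IsHexRepeatPat("GG")   ; \w would accept this, the hex class does not
Debug IsHexRepeatPat("ABC")  ; all hex, but nothing repeats
```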