
RegularExpression functions: Add start pos and length param

Posted: Sat Apr 13, 2019 10:16 am
by Sicro
It would be very good if the regular expression functions had a parameter for the start position.
Currently, I have to preprocess the string with Mid() before I can pass the string to the regular expression functions, which slows down the code considerably.

There should also be a length parameter so that the function does not have to calculate the length of the string every time it is called.
Even better would be this: the string length should be stored for string variables.

I hope that the functions receive the string by reference and not by value ...

Re: RegularExpression functions: Add start pos parameter

Posted: Sat Apr 13, 2019 3:52 pm
by GedB
Sicro wrote: It would be very good if the regular expression functions had a parameter for the start position.
Currently, I have to preprocess the string with Mid() before I can pass the string to the regular expression functions, which slows down the code considerably.

I hope that the functions receive the string by reference and not by value ...
Have you tried putting .{n} at the beginning of your regular expression? This will ignore the first n characters in your string.



Re: RegularExpression functions: Add start pos parameter

Posted: Sat Apr 13, 2019 6:56 pm
by Sicro
Thank you for your suggestion.

However, the first n characters are then not ignored; they are included in the match result.
This can be worked around with RegularExpressionNamedGroup():

Code: Select all

Define string$                    = "Hello Bob"
Define regEx$                     = ".*"
Define numberOfCharactersToIgnore = 6

If CreateRegularExpression(0, "(.{" + Str(numberOfCharactersToIgnore) + "})(?<root_group>" + regEx$ + ")")
  If ExamineRegularExpression(0, string$) And NextRegularExpressionMatch(0)
    Debug RegularExpressionNamedGroup(0, "root_group")
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
Maybe you meant it that way.
Unfortunately, my lexer doesn't run faster with this variant. Apparently, the regular expression functions copy the string passed as a parameter on every call instead of receiving it by reference. That slows it down enormously.

My current variant with Mid() is therefore still the best solution, because I can also limit the string length, so that the regular expression function has less to copy each time and the whole process runs faster.
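
For readers who want to see the Mid() variant spelled out, here is a minimal sketch of the idea, not the actual lexer code. The names text$, window$, startPos and maxLength and the sample pattern are assumptions made for illustration only.

```purebasic
; Sketch only: text$, window$, startPos, maxLength and the pattern are hypothetical.
Define text$     = "Hello Bob, hello Alice"
Define startPos  = 7   ; 1-based position where matching should begin
Define maxLength = 3   ; upper bound on the characters handed to the regex functions
Define window$

If CreateRegularExpression(0, "[A-Za-z]+")
  ; Mid() copies only the window of interest, so the regular expression
  ; functions receive a short string instead of the whole input.
  window$ = Mid(text$, startPos, maxLength)
  If ExamineRegularExpression(0, window$) And NextRegularExpressionMatch(0)
    Debug RegularExpressionMatchString(0)
    ; The match position is relative to window$, so shift it back
    ; to get the position inside text$.
    Debug startPos + RegularExpressionMatchPosition(0) - 1
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

The trade-off is one extra substring copy per call in exchange for a much shorter string passed to the regular expression functions.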

Re: RegularExpression functions: Add start pos parameter

Posted: Mon Apr 15, 2019 9:53 am
by GedB
Thanks for the detail. That was how I meant it and I was curious about the performance.



Re: RegularExpression functions: Add start pos parameter

Posted: Mon Apr 15, 2019 11:18 am
by #NULL
Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct. I guess Mid() copies the source string first and adds some calls to strlen() where PeekS() just reads and copies the substring.

Code: Select all

If #PB_Compiler_Debugger
  MessageRequester("", "disable the debugger!")
  End
EndIf
OpenConsole()

t1 = 0
t2 = 0

alternatingCases = 10
maxPerCase = 100000

s1.s = Space(10000)
s2.s = ""

;PokeS(@ s1 + 4999 * SizeOf(Character), "-0123456789-")
;alternatingCases = 1
;maxPerCase = 1;50000

For n=1 To alternatingCases
  
  ; ----------------------------------
  
  s2 = ""
  t = ElapsedMilliseconds()
  For i=0 To maxPerCase
    s2 = Mid(s1, 5000 + 1, 10)
    ;PrintN(s2)
  Next
  t = ElapsedMilliseconds()-t
  PrintN("case 1: " + t)
  t1 + t
  
  ; ----------------------------------
  
  s2 = ""
  t = ElapsedMilliseconds()
  For i=0 To maxPerCase
    s2 = PeekS(@ s1 + 5000 * SizeOf(Character), 10)
    ;PrintN(s2)
  Next
  t = ElapsedMilliseconds()-t
  PrintN("case 2: " + t)
  t2 + t
  
  ; ----------------------------------
  
Next

PrintN("")
PrintN("case 1 total: " + t1)
PrintN("case 2 total: " + t2)
Input()

Code: Select all

case 1 total: 3777
case 2 total: 49

Re: RegularExpression functions: Add start pos parameter

Posted: Tue Apr 16, 2019 6:34 am
by Little John
#NULL wrote: Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct.
Interesting. Thanks for the idea and the test!

Code: Select all

Macro FastMid (_string_, _start_, _length_=-1)
   PeekS(@ _string_ + (_start_-1)*SizeOf(Character), _length_)
EndMacro

Re: RegularExpression functions: Add start pos parameter

Posted: Thu Apr 18, 2019 1:38 pm
by RSBasic
Little John wrote:
#NULL wrote: Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct.
Interesting. Thanks for the idea and the test!

Code: Select all

Macro FastMid (_string_, _start_, _length_=-1)
   PeekS(@ _string_ + (_start_-1)*SizeOf(Character), _length_)
EndMacro
Nice tip!

Re: RegularExpression functions: Add start pos parameter

Posted: Fri Apr 19, 2019 11:13 pm
by Sicro
Very cool, but also shocking at the same time. With PeekS() my lexer is 80% faster than with Mid(). Mid() should definitely be optimized.

Thanks #NULL for the idea to use PeekS() instead of Mid(), and thanks to Little John for the macro version.
I improved the macro a bit by encapsulating the parameter _start_ in parentheses. This ensures that mathematical operations are always processed in the correct order, even if a mathematical formula is passed:

Code: Select all

Macro FastMid(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * SizeOf(Character), _length_)
EndMacro
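
A short usage sketch, assuming sample data of my own choosing (text$ and offset are hypothetical names). It passes an expression as the start position and compares the result with the built-in Mid():

```purebasic
; The macro from this post, repeated so the sketch is self-contained.
Macro FastMid(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * SizeOf(Character), _length_)
EndMacro

Define text$  = "Hello Bob"   ; hypothetical sample data
Define offset = 6             ; hypothetical: number of characters to skip

; The start position is passed as an expression, not a plain variable.
Debug FastMid(text$, offset + 1, 3)
; The same request through the built-in function, for comparison.
Debug Mid(text$, offset + 1, 3)
```

One caveat: PeekS() reads raw memory and does no bounds checking, so unlike Mid() the macro is only safe when the start position is known to lie within the string.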

Re: RegularExpression functions: Add start pos parameter

Posted: Sat Apr 20, 2019 4:14 am
by BarryG
Hi, as far as I know, SizeOf(Character) is evaluated at runtime. So replacing it with a constant should make the macro a tad faster again.

Re: RegularExpression functions: Add start pos parameter

Posted: Sat Apr 20, 2019 4:43 am
by Little John
BarryG wrote: Hi, as far as I know, SizeOf(Character) is evaluated at runtime.
That assumption is wrong.

Try e.g.

Code: Select all

#Length = Len("abc")
Len() actually is evaluated at runtime, and that's why this code does not work.

The value of a constant is assigned at compile time. So from the fact that your above code works, we can draw the conclusion that SizeOf() is evaluated at compile time, too. That's probably the reason why in the help it is mentioned in the section "Compiler Functions". :-)
Your code works, but there is no advantage in it.
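
The reasoning can be checked with a small sketch. The constant name #CharSize is made up for illustration and is not taken from any post in this thread:

```purebasic
; #CharSize is a hypothetical name used only for this illustration.
#CharSize = SizeOf(Character)   ; accepted: SizeOf() is resolved at compile time

; Per the explanation above, Len() runs at runtime, so this line
; is expected to be rejected by the compiler when uncommented:
;#Length = Len("abc")

Debug #CharSize
```

Because the compiler already folds SizeOf(Character) into a literal value, swapping it for a constant inside the macro should produce the same code.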