RegularExpression functions: Add start pos and length param
Posted: Sat Apr 13, 2019 10:16 am
by Sicro
It would be very good if the regular expression functions had a parameter for the start position.
Currently, I have to preprocess the string with Mid() before I can pass the string to the regular expression functions, which slows down the code considerably.
There should also be a length parameter so that the function does not have to calculate the length of the string every time it is called.
That would be better: String length should be stored for string variables
I hope that the functions receive the string by reference and not by value ...
Re: RegularExpression functions: Add start pos parameter
Posted: Sat Apr 13, 2019 3:52 pm
by GedB
Sicro wrote:It would be very good if the regular expression functions had a parameter for the start position.
Currently, I have to preprocess the string with Mid() before I can pass the string to the regular expression functions, which slows down the code considerably.
I hope that the functions receive the string by reference and not by value ...
Have you tried putting .{n} at the beginning of your regular expression? This will skip the first n characters in your string.
Re: RegularExpression functions: Add start pos parameter
Posted: Sat Apr 13, 2019 6:56 pm
by Sicro
Thank you for your suggestion.
However, the first n characters will then not be ignored, but will also be included in the result.
With RegularExpressionNamedGroup() it can be bypassed:
Code: Select all
Define string$ = "Hello Bob"
Define regEx$ = ".*"
Define numberOfCharactersToIgnore = 6
If CreateRegularExpression(0, "(.{" + Str(numberOfCharactersToIgnore) + "})(?<root_group>" + regEx$ + ")")
  If ExamineRegularExpression(0, string$) And NextRegularExpressionMatch(0)
    Debug RegularExpressionNamedGroup(0, "root_group")
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
Maybe you meant it that way.
Unfortunately my Lexer doesn't run faster with this variant. Apparently, the regular expression functions copy the string passed by parameter at each call instead of passing the string by reference. That slows it down enormously.
My current variant with Mid() is therefore still the best solution, because I can also limit the string length, so that the regular expression function has less to copy each time and the whole process runs faster.
Re: RegularExpression functions: Add start pos parameter
Posted: Mon Apr 15, 2019 9:53 am
by GedB
Thanks for the detail. That was how I meant it and I was curious about the performance.
Re: RegularExpression functions: Add start pos parameter
Posted: Mon Apr 15, 2019 11:18 am
by #NULL
Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct. I guess Mid() copies the source string first and adds some calls to strlen() where PeekS() just reads and copies the substring.
Code: Select all
If #PB_Compiler_Debugger
  MessageRequester("", "disable the debugger!")
  End
EndIf
OpenConsole()
t1 = 0
t2 = 0
alternatingCases = 10
maxPerCase = 100000
s1.s = Space(10000)
s2.s = ""
;PokeS(@ s1 + 4999 * SizeOf(Character), "-0123456789-")
;alternatingCases = 1
;maxPerCase = 1;50000
For n=1 To alternatingCases
  ; ----------------------------------
  s2 = ""
  t = ElapsedMilliseconds()
  For i=0 To maxPerCase
    s2 = Mid(s1, 5000 + 1, 10)
    ;PrintN(s2)
  Next
  t = ElapsedMilliseconds()-t
  PrintN("case 1: " + t)
  t1 + t
  ; ----------------------------------
  s2 = ""
  t = ElapsedMilliseconds()
  For i=0 To maxPerCase
    s2 = PeekS(@ s1 + 5000 * SizeOf(Character), 10)
    ;PrintN(s2)
  Next
  t = ElapsedMilliseconds()-t
  PrintN("case 2: " + t)
  t2 + t
  ; ----------------------------------
Next
PrintN("")
PrintN("case 1 total: " + t1)
PrintN("case 2 total: " + t2)
Input()
Code: Select all
case 1 total: 3777
case 2 total: 49
Re: RegularExpression functions: Add start pos parameter
Posted: Tue Apr 16, 2019 6:34 am
by Little John
#NULL wrote:Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct.
Interesting. Thanks for the idea and the test!
Code: Select all
Macro FastMid (_string_, _start_, _length_=-1)
  PeekS(@ _string_ + (_start_-1)*SizeOf(Character), _length_)
EndMacro
Re: RegularExpression functions: Add start pos parameter
Posted: Thu Apr 18, 2019 1:38 pm
by RSBasic
Little John wrote:#NULL wrote:Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct.
Interesting. Thanks for the idea and the test!
Code: Select all
Macro FastMid (_string_, _start_, _length_=-1)
  PeekS(@ _string_ + (_start_-1)*SizeOf(Character), _length_)
EndMacro
Nice tip

Re: RegularExpression functions: Add start pos parameter
Posted: Fri Apr 19, 2019 11:13 pm
by Sicro
Very cool, but also shocking at the same time. With PeekS() my Lexer is 80% faster than with Mid().
Mid() should definitely be optimized.
Thanks #NULL for the idea to use PeekS() instead of Mid(), and thanks to Little John for the macro version.
I improved the macro a bit by encapsulating the parameter _start_ in parentheses. This ensures that mathematical operations are always processed in the correct order, even if a mathematical formula is passed:
Code: Select all
Macro FastMid(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * SizeOf(Character), _length_)
EndMacro
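For anyone who wants to try this, here is a minimal sketch of how the macro might be combined with the regular expression functions in place of Mid(). The variable names text$ and offset and the pattern are only illustrative assumptions, not taken from the lexer mentioned above. Keep in mind that PeekS() reads raw memory and does no bounds checking, so the start position has to lie inside the string.

```purebasic
; Sketch only: FastMid() hands the rest of the string to the regex functions.
Macro FastMid(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * SizeOf(Character), _length_)
EndMacro

Define text$ = "Hello Bob"
Define offset = 6 ; hypothetical: number of characters already processed

If CreateRegularExpression(0, "^\w+")
  ; An expression such as "offset + 1" can be passed as the start position.
  If ExamineRegularExpression(0, FastMid(text$, offset + 1)) And NextRegularExpressionMatch(0)
    Debug RegularExpressionMatchString(0) ; shows "Bob" in the debug window
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

Passing a third argument limits the length in the same way as with Mid(), so less text has to be copied on each call.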
Re: RegularExpression functions: Add start pos parameter
Posted: Sat Apr 20, 2019 4:14 am
by BarryG
Hi, as far as I know, SizeOf(Character) is evaluated at runtime. So replacing it with a constant should make the macro a tad faster again.
Re: RegularExpression functions: Add start pos parameter
Posted: Sat Apr 20, 2019 4:43 am
by Little John
BarryG wrote:Hi, as far as I know, SizeOf(Character) is evaluated at runtime.
That assumption is wrong.
Try e.g.
Len() actually is evaluated at runtime, and that's why this code does not work.
The value of a constant is assigned at compile time. So from the fact that your above code works, we can draw the conclusion that SizeOf() is evaluated at compile time, too. That's probably the reason why in the help it is mentioned in the section "Compiler Functions".

Your code works, but there is no advantage in it.
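A minimal sketch of the kind of check described here, for readers who want to reproduce it. The constant names are hypothetical and this is not necessarily the code that was originally posted; it only assumes that a constant can be assigned nothing but a value known at compile time.

```purebasic
; Compiles: SizeOf() is resolved by the compiler, so it may define a constant.
#CharSize = SizeOf(Character)
Debug #CharSize

; Does not compile when uncommented: Len() is a runtime function,
; so its result cannot be assigned to a constant.
;#TextLength = Len("abc")
```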