RegularExpression functions: Add start pos and length param
It would be very good if the regular expression functions had a parameter for the start position.
Currently, I have to preprocess the string with Mid() before I can pass the string to the regular expression functions, which slows down the code considerably.
There should also be a length parameter so that the function does not have to calculate the length of the string on every call.
Even better: the string length should be stored with string variables.
I hope that the functions receive the string by reference and not by value ...
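For illustration, the Mid() preprocessing described above might look roughly like this. It is only a minimal sketch: the sample string, the pattern and the variable names (string$, startPos) are made up for the example and are not taken from the lexer.

Code: Select all
; Workaround sketch: cut the string with Mid() before matching.
; string$, startPos and the pattern are hypothetical example values.
Define string$ = "Hello Bob"
Define startPos = 7
If CreateRegularExpression(0, "\w+")
  ; Mid() creates a new string that begins at startPos
  If ExamineRegularExpression(0, Mid(string$, startPos)) And NextRegularExpressionMatch(0)
    Debug RegularExpressionMatchString(0) ; the first word at or after startPos
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf

A start position parameter would make the extra Mid() copy unnecessary.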
Last edited by Sicro on Sat Jul 18, 2020 1:29 pm, edited 2 times in total.
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
Re: RegularExpression functions: Add start pos parameter
Sicro wrote: It would be very good if the regular expression functions had a parameter for the start position.
Currently, I have to preprocess the string with Mid() before I can pass the string to the regular expression functions, which slows down the code considerably.
I hope that the functions receive the string by reference and not by value ...
Have you tried putting .{n} at the beginning of your regular expression? This will ignore the first n characters in your string.
Sent from my iPhone using Tapatalk
Re: RegularExpression functions: Add start pos parameter
Thank you for your suggestion.
However, the first n characters are then not ignored; they are included in the match result as well.
This can be worked around with RegularExpressionNamedGroup() (maybe that is what you meant):
Code: Select all
Define string$ = "Hello Bob"
Define regEx$ = ".*"
Define numberOfCharactersToIgnore = 6
If CreateRegularExpression(0, "(.{" + Str(numberOfCharactersToIgnore) + "})(?<root_group>" + regEx$ + ")")
  If ExamineRegularExpression(0, string$) And NextRegularExpressionMatch(0)
    Debug RegularExpressionNamedGroup(0, "root_group")
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
Unfortunately, my lexer doesn't run any faster with this variant. Apparently, the regular expression functions copy the string passed as a parameter on every call instead of receiving it by reference, which slows things down enormously.
My current variant with Mid() is therefore still the best solution: because I can also limit the string length, the regular expression function has less to copy on each call and the whole process runs faster.
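To make the idea of limiting the length concrete, here is a rough sketch of a tokenizer loop that passes only a bounded window to the regular expression functions. Everything in it is an assumption for illustration: the names source$, position and #MaxTokenLength, the window size of 64 and the toy pattern do not come from the real lexer.

Code: Select all
; Sketch only: hypothetical names and values, not the actual lexer.
#MaxTokenLength = 64 ; assumed upper bound for the length of one token
Define source$ = "foo = 42"
Define position = 1
; Toy pattern: a run of whitespace, a word, or any single character
If CreateRegularExpression(0, "^(\s+|\w+|.)")
  While position <= Len(source$)
    ; Copy only a small window starting at the current position
    If ExamineRegularExpression(0, Mid(source$, position, #MaxTokenLength)) And NextRegularExpressionMatch(0)
      Debug RegularExpressionMatchString(0)
      position + RegularExpressionMatchLength(0) ; advance past the token
    Else
      Break ; nothing matched, stop to avoid an endless loop
    EndIf
  Wend
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf

The window keeps the amount of copied data per call constant, regardless of how long the whole source string is.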
Re: RegularExpression functions: Add start pos parameter
Thanks for the detail. That was how I meant it and I was curious about the performance.
Re: RegularExpression functions: Add start pos parameter
Maybe try PeekS() instead of Mid(); it seems to be much faster, if my test is correct. I guess Mid() copies the source string first and adds some calls to strlen(), whereas PeekS() just reads and copies the substring.
Code: Select all
If #PB_Compiler_Debugger
  MessageRequester("", "disable the debugger!")
  End
EndIf
OpenConsole()
t1 = 0
t2 = 0
alternatingCases = 10
maxPerCase = 100000
s1.s = Space(10000)
s2.s = ""
;PokeS(@ s1 + 4999 * SizeOf(Character), "-0123456789-")
;alternatingCases = 1
;maxPerCase = 1;50000
For n=1 To alternatingCases
  ; ----------------------------------
  s2 = ""
  t = ElapsedMilliseconds()
  For i=0 To maxPerCase
    s2 = Mid(s1, 5000 + 1, 10)
    ;PrintN(s2)
  Next
  t = ElapsedMilliseconds()-t
  PrintN("case 1: " + t)
  t1 + t
  ; ----------------------------------
  s2 = ""
  t = ElapsedMilliseconds()
  For i=0 To maxPerCase
    s2 = PeekS(@ s1 + 5000 * SizeOf(Character), 10)
    ;PrintN(s2)
  Next
  t = ElapsedMilliseconds()-t
  PrintN("case 2: " + t)
  t2 + t
  ; ----------------------------------
Next
PrintN("")
PrintN("case 1 total: " + t1)
PrintN("case 2 total: " + t2)
Input()
Code: Select all
case 1 total: 3777
case 2 total: 49
Re: RegularExpression functions: Add start pos parameter
#NULL wrote: Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct.
Interesting. Thanks for the idea and the test!
Code: Select all
Macro FastMid (_string_, _start_, _length_=-1)
PeekS(@ _string_ + (_start_-1)*SizeOf(Character), _length_)
EndMacro
Re: RegularExpression functions: Add start pos parameter
Little John wrote:
#NULL wrote: Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct.
Interesting. Thanks for the idea and the test!
Code: Select all
Macro FastMid (_string_, _start_, _length_=-1)
  PeekS(@ _string_ + (_start_-1)*SizeOf(Character), _length_)
EndMacro
Nice tip
Re: RegularExpression functions: Add start pos parameter
Very cool, but also shocking at the same time. With PeekS() my Lexer is 80% faster than with Mid(). Mid() should definitely be optimized.
Thanks #NULL for the idea to use PeekS() instead of Mid(), and thanks to Little John for the macro version.
I improved the macro a bit by wrapping the parameter _start_ in parentheses. This ensures that operations are always evaluated in the correct order, even if an expression is passed instead of a single value:
Code: Select all
Macro FastMid(_string_, _start_, _length_=-1)
PeekS(@_string_ + ((_start_) - 1) * SizeOf(Character), _length_)
EndMacro
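A short usage sketch for the macro follows. The sample string and the variable names text$, offset and start are made up for the example. One caveat that should be safe to state: PeekS() simply reads from the computed address, so unlike Mid() the macro does no range check on the start position. I would also assume it must not be used on an empty string variable, since its address may be 0.

Code: Select all
Macro FastMid(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * SizeOf(Character), _length_)
EndMacro

; text$, offset and start are hypothetical example values
Define text$ = "Hello Bob"
Define offset = 2

Debug Mid(text$, offset + 5, 3)     ; built-in function, range-checked
Debug FastMid(text$, offset + 5, 3) ; same substring read via PeekS()

; The macro performs no range check, so guard the start position yourself
Define start = 20
If start >= 1 And start <= Len(text$)
  Debug FastMid(text$, start)
Else
  Debug "start is outside the string"
EndIf

Whether the extra check eats up part of the speed gain depends on how often it is really needed; in a lexer the position is usually known to be valid.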
Re: RegularExpression functions: Add start pos parameter
Hi, as far as I know, SizeOf(Character) is evaluated at runtime. So replacing it with a constant should make the macro a tad faster again.
Last edited by BarryG on Sat Apr 20, 2019 7:06 am, edited 1 time in total.
Re: RegularExpression functions: Add start pos parameter
BarryG wrote: Hi, as far as I know, SizeOf(Character) is evaluated at runtime.
That assumption is wrong.
Try e.g.
Code: Select all
#Length = Len("abc")
The value of a constant is assigned at compile time, so the line above should be rejected by the compiler, because Len() is only evaluated at runtime. From the fact that your code above works, we can conclude that SizeOf() is evaluated at compile time, too. That is probably why the help lists it in the section "Compiler Functions".
Your code works, but there is no advantage in it.
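As a small sketch of that point: the constant name #CharSize below is hypothetical and chosen only for this example (it is not necessarily what the earlier post used). The macro variant with the constant should behave the same as the one with SizeOf(Character), as both are resolved when compiling.

Code: Select all
; #CharSize is a hypothetical name, used only for illustration
#CharSize = SizeOf(Character) ; accepted as a constant value
;#Length = Len("abc")         ; left commented out: Len() is a runtime function

Macro FastMid(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * #CharSize, _length_)
EndMacro

Define text$ = "Hello Bob"
Debug #CharSize          ; size of one character in bytes
Debug FastMid(text$, 7)  ; substring starting at the 7th character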