All times are UTC + 1 hour




 Post subject: RegularExpression functions: Add start pos parameter
Posted: Sat Apr 13, 2019 10:16 am
Enthusiast
Joined: Wed Jun 25, 2014 5:25 pm
Posts: 314
Location: Germany
It would be very good if the regular expression functions had a parameter for the start position.
Currently, I have to preprocess the string with Mid() before I can pass the string to the regular expression functions, which slows down the code considerably.

I hope that the functions receive the string by reference and not by value ...

_________________
Why OpenSource should have a license
PureBasic-CodeArchiv-Rebirth: Git-Repository / Download -- Any help is welcome!
Manjaro Xfce x64 (Main system) :: WindowsXP/Xubuntu x86 (VirtualBox) :: PureBasic (Linux: x86/x64, Windows: x86) :: All are up to date


 Post subject: Re: RegularExpression functions: Add start pos parameter
Posted: Sat Apr 13, 2019 3:52 pm
Addict
Joined: Fri May 16, 2003 3:47 pm
Posts: 1302
Location: England
Sicro wrote:
It would be very good if the regular expression functions had a parameter for the start position.
Currently, I have to preprocess the string with Mid() before I can pass the string to the regular expression functions, which slows down the code considerably.

I hope that the functions receive the string by reference and not by value ...


Have you tried putting .{n} at the beginning of your regular expression? This will skip the first n characters in your string.


 Post subject: Re: RegularExpression functions: Add start pos parameter
Posted: Sat Apr 13, 2019 6:56 pm
Enthusiast
Joined: Wed Jun 25, 2014 5:25 pm
Posts: 314
Location: Germany
Thank you for your suggestion.

However, the first n characters are then not ignored; they are included in the match result.
This can be worked around with RegularExpressionNamedGroup():
Code:
Define string$                    = "Hello Bob"
Define regEx$                     = ".*"
Define numberOfCharactersToIgnore = 6

; The first group consumes the characters to skip; the named group holds the actual match
If CreateRegularExpression(0, "(.{" + Str(numberOfCharactersToIgnore) + "})(?<root_group>" + regEx$ + ")")
  If ExamineRegularExpression(0, string$) And NextRegularExpressionMatch(0)
    ; Read only the named group, so the skipped prefix is not part of the result
    Debug RegularExpressionNamedGroup(0, "root_group")
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
Maybe you meant it that way.
Unfortunately, my lexer doesn't run any faster with this variant. Apparently, the regular expression functions copy the string passed as a parameter on every call instead of receiving it by reference, which slows things down enormously.

My current variant with Mid() is therefore still the best solution, because it also lets me limit the string length, so the regular expression functions have less to copy on each call and the whole process runs faster.
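
The Mid() variant described above might look roughly like the following. This is only a sketch: the pattern "^\w+" and the names text$, startPos and maxLength are assumptions made up for illustration, not taken from the actual lexer.

```purebasic
; Sketch: match a token at a given position by cutting out a small window first.
; The pattern and all variable names are illustrative assumptions.
Define text$     = "Hello Bob"
Define startPos  = 7   ; 1-based position where the next token starts
Define maxLength = 3   ; upper bound for the token length

If CreateRegularExpression(0, "^\w+")
  ; Only the short window is handed to the regular expression functions
  If ExamineRegularExpression(0, Mid(text$, startPos, maxLength)) And NextRegularExpressionMatch(0)
    Debug RegularExpressionMatchString(0) ; the token found at the start of the window
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

Note that positions reported by RegularExpressionMatchPosition() would then be relative to the window, so startPos - 1 has to be added to get the position in the full string.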

 Post subject: Re: RegularExpression functions: Add start pos parameter
Posted: Mon Apr 15, 2019 9:53 am
Addict
Joined: Fri May 16, 2003 3:47 pm
Posts: 1302
Location: England
Thanks for the detail. That was how I meant it and I was curious about the performance.


 Post subject: Re: RegularExpression functions: Add start pos parameter
Posted: Mon Apr 15, 2019 11:18 am
Addict
Joined: Thu Aug 30, 2007 11:54 pm
Posts: 954
Location: right here
Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct. I guess Mid() copies the source string first and adds some calls to strlen(), whereas PeekS() just reads and copies the substring.
Code:
If #PB_Compiler_Debugger
  MessageRequester("", "disable the debugger!")
  End
EndIf
OpenConsole()

t1 = 0
t2 = 0

alternatingCases = 10
maxPerCase = 100000

s1.s = Space(10000)
s2.s = ""

;PokeS(@ s1 + 4999 * SizeOf(Character), "-0123456789-")
;alternatingCases = 1
;maxPerCase = 1;50000

For n=1 To alternatingCases
 
  ; ----------------------------------
 
  s2 = ""
  t = ElapsedMilliseconds()
  For i=0 To maxPerCase
    s2 = Mid(s1, 5000 + 1, 10)
    ;PrintN(s2)
  Next
  t = ElapsedMilliseconds()-t
  PrintN("case 1: " + t)
  t1 + t
 
  ; ----------------------------------
 
  s2 = ""
  t = ElapsedMilliseconds()
  For i=0 To maxPerCase
    s2 = PeekS(@ s1 + 5000 * SizeOf(Character), 10)
    ;PrintN(s2)
  Next
  t = ElapsedMilliseconds()-t
  PrintN("case 2: " + t)
  t2 + t
 
  ; ----------------------------------
 
Next

PrintN("")
PrintN("case 1 total: " + t1)
PrintN("case 2 total: " + t2)
Input()


Result (times in milliseconds):
Code:
case 1 total: 3777
case 2 total: 49


 Post subject: Re: RegularExpression functions: Add start pos parameter
Posted: Tue Apr 16, 2019 6:34 am
Addict
Joined: Thu Jun 07, 2007 3:25 pm
Posts: 3520
Location: Berlin, Germany
#NULL wrote:
Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct.

Interesting. Thanks for the idea and the test!

Code:
Macro FastMid (_string_, _start_, _length_=-1)
   PeekS(@ _string_ + (_start_-1)*SizeOf(Character), _length_)
EndMacro

_________________
Please excuse my flawed English. My native language is PureBasic.


 Post subject: Re: RegularExpression functions: Add start pos parameter
Posted: Thu Apr 18, 2019 1:38 pm
Moderator
Joined: Thu Dec 31, 2009 11:05 pm
Posts: 759
Location: Berlin and Ibiza
Little John wrote:
#NULL wrote:
Maybe try PeekS() instead of Mid(), seems to be much faster if my test is correct.

Interesting. Thanks for the idea and the test!

Code:
Macro FastMid (_string_, _start_, _length_=-1)
   PeekS(@ _string_ + (_start_-1)*SizeOf(Character), _length_)
EndMacro

Nice tip


 Post subject: Re: RegularExpression functions: Add start pos parameter
Posted: Fri Apr 19, 2019 11:13 pm
Enthusiast
Joined: Wed Jun 25, 2014 5:25 pm
Posts: 314
Location: Germany
Very cool, but also shocking at the same time. With PeekS() my Lexer is 80% faster than with Mid(). Mid() should definitely be optimized.

Thanks #NULL for the idea to use PeekS() instead of Mid(), and thanks to Little John for the macro version.
I improved the macro a bit by wrapping the parameter _start_ in parentheses. This ensures that the operations are always evaluated in the correct order, even if an expression is passed as the start position:
Code:
Macro FastMid(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * SizeOf(Character), _length_)
EndMacro
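
A short usage sketch of the macro with an expression as start position. The variable names text$, offset and checkPos are made up for illustration. One caveat worth keeping in mind: PeekS() reads raw memory without any bounds checking, so the start position has to lie inside the string.

```purebasic
Macro FastMid(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * SizeOf(Character), _length_)
EndMacro

Define text$  = "Hello Bob"
Define offset = 5

; The start position is passed as an expression; it expands to ((offset + 2) - 1)
Debug FastMid(text$, offset + 2)     ; rest of the string from position 7: "Bob"
Debug FastMid(text$, offset + 2, 2)  ; two characters from position 7: "Bo"

; Guard the position when it may lie outside the string (assumed example value)
Define checkPos = 20
If checkPos >= 1 And checkPos <= Len(text$)
  Debug FastMid(text$, checkPos)
Else
  Debug "position out of range"
EndIf
```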

 Post subject: Re: RegularExpression functions: Add start pos parameter
Posted: Sat Apr 20, 2019 4:14 am
New User
Joined: Thu Apr 18, 2019 8:17 am
Posts: 4
Hi, as far as I know, SizeOf(Character) is evaluated at runtime. So replacing it with a constant should make the macro a tad faster again.


Last edited by BarryG on Sat Apr 20, 2019 7:06 am, edited 1 time in total.

 Post subject: Re: RegularExpression functions: Add start pos parameter
Posted: Sat Apr 20, 2019 4:43 am
Addict
Joined: Thu Jun 07, 2007 3:25 pm
Posts: 3520
Location: Berlin, Germany
BarryG wrote:
Hi, as far as I know, SizeOf(Character) is evaluated at runtime.
That assumption is wrong.

Try e.g.
Code:
#Length = Len("abc")
Len() is actually evaluated at runtime, and that's why this code does not compile.

The value of a constant is assigned at compile time. So from the fact that your code above works, we can conclude that SizeOf() is evaluated at compile time, too. That's probably why the help lists it in the section "Compiler Functions". :-)
Your code works, but there is no advantage in it.
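
To illustrate the point, here is a small sketch that defines a constant directly from SizeOf(). The names #CharSize and FastMidConst are hypothetical and only serve the example; FastMid is the macro from earlier in the thread. Both macros should behave identically.

```purebasic
; SizeOf() is resolved by the compiler, so it is allowed in a constant definition
#CharSize = SizeOf(Character)

Macro FastMid(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * SizeOf(Character), _length_)
EndMacro

; Same macro with the (hypothetical) constant instead of SizeOf(Character)
Macro FastMidConst(_string_, _start_, _length_=-1)
  PeekS(@_string_ + ((_start_) - 1) * #CharSize, _length_)
EndMacro

Define text$ = "Hello Bob"
Debug #CharSize               ; size of one character in bytes
Debug FastMid(text$, 7)       ; "Bob"
Debug FastMidConst(text$, 7)  ; "Bob"
```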
