Page 1 of 1
Is findstring really slow?
Posted: Wed Apr 23, 2025 2:26 am
by Randy Walker
Obvious;y as seen at the bottom of this code snippit, 'countstring" is the simple way to get a word count.
But recently I saw someone complain that "findstring" was slow, so I wrote this simple routine to count occurrences of a word to see just how slow it really was. Seems quite fast to me, or maybe its just the clock speed of my CPU that gives it speed. I let you be the judge.
I used The constitution of the US as a text source for my word search.
I just right clicked this link and used the [Save link as] option to download the Titan Bug.txt file :
https://www.gutenberg.org/cache/epub/5/pg5.txt
Code: Select all
; text source: https://www.gutenberg.org/cache/epub/5/pg5.txt
Debug FileSize("D:\TEMP\TEMP\Titan Bug.txt")
search$ = "free"
ReadFile(0,"D:\TEMP\TEMP\Titan Bug.txt",#PB_File_SharedRead | #PB_Ascii)
Buffer.s = ReadString(o,#PB_File_IgnoreEOL)
CloseFile(0)
time = ElapsedMilliseconds()
L = Len(buffer)
Debug L
P = 1
While P < L
P = FindString(buffer,search$,P+1)
If P
C +1
Else
Break
EndIf
Wend
Debug C
Debug ElapsedMilliseconds() - time
time = ElapsedMilliseconds()
Debug CountString(buffer,search$)
Debug ElapsedMilliseconds() - time
I count zero milliseconds to be pretty fast.
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 4:02 am
by idle
No it's not slow it's just a wrapper to the c strstr function which it's fine for small strings it's a naive search which isn't optimized.
If you want faster search you can often use a boyer moore which is pretty much the gold standard for string search functions but the assumption is that you're searching for repeated occurrences of the search term and that your search term is multiple characters long. The search speed gets faster as the length of the search term increases as it doesn't' need to compare each character of the string and so it will often jump through the search string by the length of the search term when there's no partial match. but it has a slow start since it has to build the skip table with each change of the search term or specifically when the pos=0
there are also SSE simd parallel searches that will beat boyer moore but as a generic function a BMSearch is hard to beat.
This should really start to shine when your search term is more than 4 characters long and it reports the byte position in the memory so you can use peeks to extract the string
Code: Select all
Structure ara
a.a[0]
EndStructure
Procedure BMSearch(*pinput,inlen,*pat.ara,palen,pos=0)
;booyer moore search
Protected i,t,len,skip,*input, *pa.Ascii,*pb.Ascii
Structure ST
a.a[256]
EndStructure
Static skiptable.ST
inlen-pos
*input = *pinput+pos
len = inlen - palen
If pos = 0
For i = 0 To 255
Skiptable\a[i] = 255;
Next
t= palen-1
For i = 0 To t
skiptable\a[*pat\a[i]] = i
Next
EndIf
i=0
skip=0
While skip <= len
i = palen - 1;
*pa = (*input + skip + i)
*pb = *pat+i
While (*pb\a = *pa\a)
i-1
*pa - 1
*pb - 1
Wend
If i > 0
t = i - Skiptable\a[*pa\a]
If t > 1
skip + t
Else
skip + 1
EndIf
Else
ProcedureReturn skip + pos
EndIf
Wend
ProcedureReturn 0
EndProcedure
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 5:39 am
by AZJIO
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 6:15 am
by wilbert
Randy Walker wrote: Wed Apr 23, 2025 2:26 am
But recently I saw someone complain that "findstring" was slow
It depends on what you need it for.
It is especially slower as what you can achieve if you need to do multiple searches on a long string.
In the example code you posted you start FindString at a certain position but all characters prior to that position have to be scanned each time to see if there is no string termination.
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 8:23 am
by BarryG
Randy Walker wrote: Wed Apr 23, 2025 2:26 amrecently I saw someone complain that "findstring" was slow
Was that an old post? Because it
used to be slow in older versions of PureBasic, but this was fixed some time ago. It can find a match at the end of a huge 100 MB string in less than half a second here on my old PC. Hardly what I'd call "slow".
Code: Select all
Debug FindString(Space(100*1048756)+"z","z")
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 8:59 am
by NicTheQuick
As long as you test it with an enabled debugger your code will always be slowed down by it drastically.
Don't compare speeds with an enabled debugger!
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 9:43 am
by NicTheQuick
I rewrote the code to make it usable with disabled debugger. I tested it with the `pg5.txt` file.
Even with the debugger it only takes 0 ms. We need a way bigger text file to create a reasonable test case.
Besides of that, idle already posted a code with the
Boyer–Moore string-search algorithm if I see that right. It has a performance of O(m + n) when m and n are the lengths of the input text and search pattern. This algorithm especially makes sense if you want to search for the same pattern over and over.
Here's the code:
Code: Select all
; text source: https://www.gutenberg.org/cache/epub/5/pg5.txt
EnableExplicit
Define file.s, buffer.s, search.s
file.s = "/home/nicolas/tmp/purebasic/pg5.txt"
search = "free"
Debug "Filesize: " + StrU(FileSize(file))
If Not ReadFile(0, file,#PB_File_SharedRead | #PB_Ascii)
DebuggerError("Could not open file: " + file)
End 1
EndIf
buffer.s = ReadString(0, #PB_File_IgnoreEOL)
CloseFile(0)
Define time1.i, time2.i, L.i, P.i, C.i
time1 = ElapsedMilliseconds()
L = Len(buffer)
Debug "String legngth: " + StrU(L)
P = 1
While P < L
P = FindString(buffer, search, P + 1)
If P
C + 1
Else
Break
EndIf
Wend
Debug "Occurences: " + StrU(C)
time1 = ElapsedMilliseconds() - time1
time2 = ElapsedMilliseconds()
C = CountString(buffer, search)
time2 = ElapsedMilliseconds() - time2
MessageRequester("Times", "FindString: " + time1 + " ms" + #LF$ + "CountString: " + time2 + " ms")
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 10:29 am
by Axolotl
I haven't had any speed problems yet. If possible the use of optional StartPosition is a speed-booster.
Depending on what you have in mind, there are other algorithms that all have their strengths and weaknesses ...... but the professionals have already started to explain this.
BTW: I like this pages for learning stuff. (However, you need to be able to read and understand C/C++, Java, Python or similar.)
Maybe it helps a bit (if you want to move in that direction.)
Tutorial to learn Data Structures and Algorithms
Or as Overview and Introduction:
Searching Algorithms
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 11:21 am
by Little John
I have some books about data structures and algorithms. The website mentioned above is apparently also a good source of information on the subject. Many thanks for the link!
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 2:17 pm
by Quin
I don't think it's FindString() that's truly slow, as others have mentioned here it's not the fastest string algorithm out there but it's not too bad either. What is slow, at least as far as I recall last time I tested in I believe PB 6.10, StringField is horrifyingly slow.
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 2:24 pm
by Little John
Quin wrote: Wed Apr 23, 2025 2:17 pm
What is slow, at least as far as I recall last time I tested in I believe PB 6.10, StringField is horrifyingly slow.
The reason is, that StringField() uses
Shlemiel the painter’s algorithm, see
freak's corresponding message.
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 7:33 pm
by Randy Walker
BarryG wrote: Wed Apr 23, 2025 8:23 am
Randy Walker wrote: Wed Apr 23, 2025 2:26 amrecently I saw someone complain that "findstring" was slow
Was that an old post?
It was afairly recent pos. I don'rt know where exactly -- have to go back and review all the threads I've posted in.
... It can find a match at the end of a huge 100 MB string in less than half a second here on my old PC. Hardly what I'd call "slow".
Code: Select all
Debug FindString(Space(100*1048756)+"z","z")

Thank you. Exactly my point.
Re: Is findstring really slow?
Posted: Wed Apr 23, 2025 7:42 pm
by Randy Walker
wilbert wrote: Wed Apr 23, 2025 6:15 am
Randy Walker wrote: Wed Apr 23, 2025 2:26 am
But recently I saw someone complain that "findstring" was slow
It depends on what you need it for.
It is especially slower as what you can achieve if you need to do multiple searches on a long string.
In the example code you posted you start FindString at a certain position but all characters prior to that position have to be scanned each time to see if there is no string termination.
It's very true. My code was not as optimized as it could be. OH!!, Did I happen to mention somewhere? "I
*never* claimed to be a programmer."