Is findstring really slow?

For everything that's not in any way related to PureBasic. General chat etc...
Randy Walker
Addict
Addict
Posts: 1060
Joined: Sun Jul 25, 2004 4:21 pm
Location: USoA

Is findstring really slow?

Post by Randy Walker »

Obvious;y as seen at the bottom of this code snippit, 'countstring" is the simple way to get a word count.
But recently I saw someone complain that "findstring" was slow, so I wrote this simple routine to count occurrences of a word to see just how slow it really was. Seems quite fast to me, or maybe its just the clock speed of my CPU that gives it speed. I let you be the judge.
I used The constitution of the US as a text source for my word search.
I just right clicked this link and used the [Save link as] option to download the Titan Bug.txt file : https://www.gutenberg.org/cache/epub/5/pg5.txt

Code: Select all

; text source: https://www.gutenberg.org/cache/epub/5/pg5.txt
Debug FileSize("D:\TEMP\TEMP\Titan Bug.txt")
search$ = "free"
ReadFile(0,"D:\TEMP\TEMP\Titan Bug.txt",#PB_File_SharedRead | #PB_Ascii)
Buffer.s = ReadString(o,#PB_File_IgnoreEOL)
CloseFile(0)
time = ElapsedMilliseconds()
L = Len(buffer)
Debug L
P = 1
While P < L
  P = FindString(buffer,search$,P+1)
  If P
    C +1
  Else
    Break
  EndIf
Wend
Debug C
Debug ElapsedMilliseconds() - time

time = ElapsedMilliseconds()
Debug CountString(buffer,search$)
Debug ElapsedMilliseconds() - time
I count zero milliseconds to be pretty fast.
- - - - - - - - - - - - - - - -
Randy
I *never* claimed to be a programmer.
User avatar
idle
Always Here
Always Here
Posts: 5896
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Is findstring really slow?

Post by idle »

No it's not slow it's just a wrapper to the c strstr function which it's fine for small strings it's a naive search which isn't optimized.

If you want faster search you can often use a boyer moore which is pretty much the gold standard for string search functions but the assumption is that you're searching for repeated occurrences of the search term and that your search term is multiple characters long. The search speed gets faster as the length of the search term increases as it doesn't' need to compare each character of the string and so it will often jump through the search string by the length of the search term when there's no partial match. but it has a slow start since it has to build the skip table with each change of the search term or specifically when the pos=0

there are also SSE simd parallel searches that will beat boyer moore but as a generic function a BMSearch is hard to beat.

This should really start to shine when your search term is more than 4 characters long and it reports the byte position in the memory so you can use peeks to extract the string

Code: Select all

 Structure ara
  a.a[0] 
 EndStructure

Procedure BMSearch(*pinput,inlen,*pat.ara,palen,pos=0) 
  ;booyer moore search 
  Protected i,t,len,skip,*input, *pa.Ascii,*pb.Ascii   
  
  Structure ST
    a.a[256] 
  EndStructure  
  Static skiptable.ST 
  inlen-pos 
  *input = *pinput+pos 
  len = inlen - palen
  If pos = 0  
    For i = 0 To 255
      Skiptable\a[i] = 255;
    Next  
    t= palen-1
    For i = 0 To t
      skiptable\a[*pat\a[i]] = i
    Next     
  EndIf 
  i=0
  skip=0
  While skip <= len
    i = palen - 1;
    *pa = (*input + skip + i)
    *pb = *pat+i
    While (*pb\a = *pa\a) 
      i-1 
      *pa - 1 
      *pb - 1
    Wend   
    If i > 0 
      t = i - Skiptable\a[*pa\a]   
      If t > 1 
        skip + t 
      Else 
        skip + 1
      EndIf 
    Else   
      ProcedureReturn skip + pos 
    EndIf     
  Wend  
  ProcedureReturn 0
  
EndProcedure 

AZJIO
Addict
Addict
Posts: 2191
Joined: Sun May 14, 2017 1:48 am

Re: Is findstring really slow?

Post by AZJIO »

Randy Walker wrote: Wed Apr 23, 2025 2:26 am

Code: Select all

P+1)

Code: Select all

length = Len(search$)
P + length)
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Is findstring really slow?

Post by wilbert »

Randy Walker wrote: Wed Apr 23, 2025 2:26 am But recently I saw someone complain that "findstring" was slow
It depends on what you need it for.
It is especially slower as what you can achieve if you need to do multiple searches on a long string.
In the example code you posted you start FindString at a certain position but all characters prior to that position have to be scanned each time to see if there is no string termination.
Windows (x64)
Raspberry Pi OS (Arm64)
BarryG
Addict
Addict
Posts: 4173
Joined: Thu Apr 18, 2019 8:17 am

Re: Is findstring really slow?

Post by BarryG »

Randy Walker wrote: Wed Apr 23, 2025 2:26 amrecently I saw someone complain that "findstring" was slow
Was that an old post? Because it used to be slow in older versions of PureBasic, but this was fixed some time ago. It can find a match at the end of a huge 100 MB string in less than half a second here on my old PC. Hardly what I'd call "slow".

Code: Select all

Debug FindString(Space(100*1048756)+"z","z")
User avatar
NicTheQuick
Addict
Addict
Posts: 1519
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: Is findstring really slow?

Post by NicTheQuick »

As long as you test it with an enabled debugger your code will always be slowed down by it drastically.
Don't compare speeds with an enabled debugger!
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
User avatar
NicTheQuick
Addict
Addict
Posts: 1519
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: Is findstring really slow?

Post by NicTheQuick »

I rewrote the code to make it usable with disabled debugger. I tested it with the `pg5.txt` file.
Even with the debugger it only takes 0 ms. We need a way bigger text file to create a reasonable test case.

Besides of that, idle already posted a code with the Boyer–Moore string-search algorithm if I see that right. It has a performance of O(m + n) when m and n are the lengths of the input text and search pattern. This algorithm especially makes sense if you want to search for the same pattern over and over.

Here's the code:

Code: Select all

; text source: https://www.gutenberg.org/cache/epub/5/pg5.txt

EnableExplicit
Define file.s, buffer.s, search.s

file.s = "/home/nicolas/tmp/purebasic/pg5.txt"
search = "free"


Debug "Filesize: " + StrU(FileSize(file))

If Not ReadFile(0, file,#PB_File_SharedRead | #PB_Ascii)
	DebuggerError("Could not open file: " + file)
	End 1
EndIf
buffer.s = ReadString(0, #PB_File_IgnoreEOL)
CloseFile(0)

Define time1.i, time2.i, L.i, P.i, C.i

time1 = ElapsedMilliseconds()
L = Len(buffer)
Debug "String legngth: " + StrU(L)
P = 1
While P < L
	P = FindString(buffer, search, P + 1)
	If P
		C + 1
	Else
		Break
	EndIf
Wend
Debug "Occurences: " + StrU(C)
time1 = ElapsedMilliseconds() - time1


time2 = ElapsedMilliseconds()
C = CountString(buffer, search)
time2 = ElapsedMilliseconds() - time2

MessageRequester("Times", "FindString: " + time1 + " ms" + #LF$ + "CountString: " + time2 + " ms")
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
Axolotl
Addict
Addict
Posts: 837
Joined: Wed Dec 31, 2008 3:36 pm

Re: Is findstring really slow?

Post by Axolotl »

I haven't had any speed problems yet. If possible the use of optional StartPosition is a speed-booster.
Depending on what you have in mind, there are other algorithms that all have their strengths and weaknesses ...... but the professionals have already started to explain this.

BTW: I like this pages for learning stuff. (However, you need to be able to read and understand C/C++, Java, Python or similar.)
Maybe it helps a bit (if you want to move in that direction.)
Tutorial to learn Data Structures and Algorithms

Or as Overview and Introduction:
Searching Algorithms
Just because it worked doesn't mean it works.
PureBasic 6.04 (x86) and <latest stable version and current alpha/beta> (x64) on Windows 11 Home. Now started with Linux (VM: Ubuntu 22.04).
Little John
Addict
Addict
Posts: 4789
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Is findstring really slow?

Post by Little John »

I have some books about data structures and algorithms. The website mentioned above is apparently also a good source of information on the subject. Many thanks for the link!
Quin
Addict
Addict
Posts: 1133
Joined: Thu Mar 31, 2022 7:03 pm
Location: Colorado, United States
Contact:

Re: Is findstring really slow?

Post by Quin »

I don't think it's FindString() that's truly slow, as others have mentioned here it's not the fastest string algorithm out there but it's not too bad either. What is slow, at least as far as I recall last time I tested in I believe PB 6.10, StringField is horrifyingly slow.
Little John
Addict
Addict
Posts: 4789
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Is findstring really slow?

Post by Little John »

Quin wrote: Wed Apr 23, 2025 2:17 pm What is slow, at least as far as I recall last time I tested in I believe PB 6.10, StringField is horrifyingly slow.
The reason is, that StringField() uses Shlemiel the painter’s algorithm, see freak's corresponding message.
Randy Walker
Addict
Addict
Posts: 1060
Joined: Sun Jul 25, 2004 4:21 pm
Location: USoA

Re: Is findstring really slow?

Post by Randy Walker »

BarryG wrote: Wed Apr 23, 2025 8:23 am
Randy Walker wrote: Wed Apr 23, 2025 2:26 amrecently I saw someone complain that "findstring" was slow
Was that an old post?
It was afairly recent pos. I don'rt know where exactly -- have to go back and review all the threads I've posted in.
... It can find a match at the end of a huge 100 MB string in less than half a second here on my old PC. Hardly what I'd call "slow".

Code: Select all

Debug FindString(Space(100*1048756)+"z","z")
:lol: Thank you. Exactly my point.
- - - - - - - - - - - - - - - -
Randy
I *never* claimed to be a programmer.
Randy Walker
Addict
Addict
Posts: 1060
Joined: Sun Jul 25, 2004 4:21 pm
Location: USoA

Re: Is findstring really slow?

Post by Randy Walker »

wilbert wrote: Wed Apr 23, 2025 6:15 am
Randy Walker wrote: Wed Apr 23, 2025 2:26 am But recently I saw someone complain that "findstring" was slow
It depends on what you need it for.
It is especially slower as what you can achieve if you need to do multiple searches on a long string.
In the example code you posted you start FindString at a certain position but all characters prior to that position have to be scanned each time to see if there is no string termination.
It's very true. My code was not as optimized as it could be. OH!!, Did I happen to mention somewhere? "I *never* claimed to be a programmer."
- - - - - - - - - - - - - - - -
Randy
I *never* claimed to be a programmer.
Post Reply