Why FindString() is so terribly slow?

Just starting out? Need help? Post your questions and find answers here.
User avatar
STARGÅTE
Addict
Addict
Posts: 2237
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Re: Why FindString() is so terribly slow?

Post by STARGÅTE »

@uwekel:

Your Python code takes on my system 330ms (in comparison to 2787ms in PureBasic).
And your code need more time (663ms), if the string contains unicode characters (which is in PB automatically).

Code: Select all

from time import time

s = '√' * 1000000 + '∑'
t = time()
for i in range(1000):
    x = s.find('∑')
print(int((time()-t)*1000), 'ms')
input()

Code: Select all

OpenConsole()
s.s = LSet("", 1000000, "√") + "∑"
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "∑")
Next
PrintN(Str(ElapsedMilliseconds()-t))
Input()
So, PB is not terribly slow!

I think we can speed up the FindString in PB by using commands such PCMPEQB/PCMPEQW to compare multiply bytes/word in one instruction. Maybe that uses Python, because 2787ms (PB) / 8 Bytes = 348ms (Python)
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
#NULL
Addict
Addict
Posts: 1499
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Re: Why FindString() is so terribly slow?

Post by #NULL »

Maybe there is some optimization going on. every loop iteration produces the same x and x is not even used. Can you try to add x up to some sum and print that so the compiler can't elide anything?
Marc56us
Addict
Addict
Posts: 1600
Joined: Sat Feb 08, 2014 3:26 pm

Re: Why FindString() is so terribly slow?

Post by Marc56us »

Regular expression version: 2 ms (1000 time faster than FindString)

Code: Select all

Text$ = Space(1000000) + "!"
Debug "String Lenght: " + Len(Text$)
Start = ElapsedMilliseconds()

CreateRegularExpression(0, "!")
If MatchRegularExpression(0, Text$)
    Stop = ElapsedMilliseconds()
    ExamineRegularExpression(0, Text$)
    While NextRegularExpressionMatch(0)
        Debug "Found: « " + RegularExpressionMatchString(0) + 
              " » at position: " + RegularExpressionMatchPosition(0) +
              " in: " + Str(Stop - Start) + " ms"
    Wend
EndIf

FreeRegularExpression(0)

Code: Select all

String Lenght: 1000001
Found: « ! » at position: 1000001 in: 2 ms
(and with debug on)

So, yes FindString() is slow, but RegEx engine is here and do the job and do more things easyly for strings search.
User avatar
the.weavster
Addict
Addict
Posts: 1577
Joined: Thu Jul 03, 2003 6:53 pm
Location: England

Re: Why FindString() is so terribly slow?

Post by the.weavster »

Marc56us wrote:Regular expression version: 2 ms (1000 time faster than FindString)
..//..
So, yes FindString() is slow, but RegEx engine is here and do the job and do more things easyly for strings search.
I think you're supposed to perform the search 1000 times for the test :wink:
Marc56us
Addict
Addict
Posts: 1600
Joined: Sat Feb 08, 2014 3:26 pm

Re: Why FindString() is so terribly slow?

Post by Marc56us »

the.weavster wrote:I think you're supposed to perform the search 1000 times for the test :wink:
Oh :o yes :oops:
With 1000 time: 2049 ms so same as FindString()
:wink:
User avatar
NicTheQuick
Addict
Addict
Posts: 1523
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: Why FindString() is so terribly slow?

Post by NicTheQuick »

I tested it for fun too:

Code: Select all

Purebasic     1640 ms
Python2.7.15  1280 ms
Python3.6.7     76 ms
That is very impressive. Somehow Python 3.6 does a much better job at finding something.

Here is the Python code I used:

Code: Select all

# -*- coding: utf-8 -*-
from time import time

s = '√' * 1000000 + '∑'
t = time()
sum = 0
for i in range(1000):
	x = s.find('∑')
	sum += x
print(int((time()-t)*1000))
print(sum)
Edit:
You can even make it faster with Unicode in Python2.7:

Code: Select all

# -*- coding: utf-8 -*-
from time import time

s = u'√' * 1000000 + u'∑'
t = time()
sum = 0
for i in range(1000):
	x = s.find(u'∑')
	sum += x
print(int((time()-t)*1000))
print(sum)
Now the timings are:

Code: Select all

Python2.7.15   575 ms
Python3.6.7     70 ms
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
User avatar
RSBasic
Moderator
Moderator
Posts: 1228
Joined: Thu Dec 31, 2009 11:05 pm
Location: Gernsbach (Germany)
Contact:

Re: Why FindString() is so terribly slow?

Post by RSBasic »

ccode wrote:@RSBasic

x = FindString(LCase(s), LCase("!"))

is slower!!!
Hm, that's weird. I tested on an old laptop with 1 to 100:

Code: Select all

s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 100
  x = FindString(s, "!")
Next
MessageRequester("", Str(ElapsedMilliseconds() - t))
t = ElapsedMilliseconds()
For i = 1 To 100
  x = FindString(s, "!", 0, #PB_String_NoCase)
Next
MessageRequester("", Str(ElapsedMilliseconds() - t))
t = ElapsedMilliseconds()
For i = 1 To 100
  x = FindString(LCase(s), LCase("!"))
Next
MessageRequester("", Str(ElapsedMilliseconds() - t))
340
4289
975
LCase() is faster than #PB_String_NoCase! :?
Image
Image
Marc56us
Addict
Addict
Posts: 1600
Joined: Sat Feb 08, 2014 3:26 pm

Re: Why FindString() is so terribly slow?

Post by Marc56us »

RSBasic wrote:LCase() is faster than #PB_String_NoCase! :?
This is logical: #PB_String_NoCase must test the two forms of each letter while LCase (and UCase) automatically transform all letters (before) so they do only one test after :wink:

Code: Select all

  #PB_String_CaseSensitive: case sensitive search (a=a) (default).
  #PB_String_NoCase       : case insensitive search (A=a).
It therefore seems that the automatic transformation is faster than the double test
infratec
Always Here
Always Here
Posts: 7625
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Why FindString() is so terribly slow?

Post by infratec »

Ups, already cleared that 100x faster is not true :wink:
Last edited by infratec on Mon Nov 26, 2018 12:57 pm, edited 1 time in total.
User avatar
NicTheQuick
Addict
Addict
Posts: 1523
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: Why FindString() is so terribly slow?

Post by NicTheQuick »

infratec wrote:In my test it runs in python 1.5 times faster then PB.
But not 100x faster :wink:

Same PC, with Linux x86.
If you read my post, you will see that Python 3.6 runs more than 23 times faster than PB.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
infratec
Always Here
Always Here
Posts: 7625
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Why FindString() is so terribly slow?

Post by infratec »

Maybe Python 3 starts searching from the end :mrgreen:
#NULL
Addict
Addict
Posts: 1499
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Re: Why FindString() is so terribly slow?

Post by #NULL »

infratec wrote:Maybe Python 3 starts searching from the end :mrgreen:
If i add the sigma to the beginning of the string instead then it takes 0 milliseconds with Python 2.7.15rc1 8)
infratec
Always Here
Always Here
Posts: 7625
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Why FindString() is so terribly slow?

Post by infratec »

I was talking about 3 and not 2.

Since 3 is faster then it is possible if it start from beginning.

It is also possible that 3 starts 2 threads one starts from the beginning one from the end.
If the pointer matches in the middle the string was not found.
Else one of them finds it earlier and stop the searching.
User avatar
Derren
Enthusiast
Enthusiast
Posts: 316
Joined: Sat Jul 23, 2011 1:13 am
Location: Germany

Re: Why FindString() is so terribly slow?

Post by Derren »

I tried this code by Nic (viewtopic.php?p=529390#p529390)with an online python compiler: https://repl.it/repls/CommonMeagerVendor

Code: Select all

# -*- coding: utf-8 -*-
from time import time

s = '√' * 1000000 + '∑'
t = time()
sum = 0
for i in range(1000):
   x = s.find('∑')
   sum += x
print(int((time()-t)*1000))
print(sum)
I can't get under 300 ms, for some reason, BUT changing the string like this:

Code: Select all

s = '√' * 500000 + '∑' + '√' * 500000 
which I suppose (not fluent in python) puts the Sigma in the middle of the 1 mio character string, it significantly faster (about 50ms).

Also, shouldn't a "from end" search need only 0ms, like #NULL wrote, if that's the case when the Sigma is in the beginning.
User avatar
NicTheQuick
Addict
Addict
Posts: 1523
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: Why FindString() is so terribly slow?

Post by NicTheQuick »

infratec wrote:Since 3 is faster then it is possible if it start from beginning.

It is also possible that 3 starts 2 threads one starts from the beginning one from the end.
If the pointer matches in the middle the string was not found.
Else one of them finds it earlier and stop the searching.
That would be a bad idea. The creation of the threads alone would last a small amount of time. That is the reason why you should use thread pools when you want to distribute work. But Python 3 does nothing of that. Also traversing a memory block from the end back to the beginning is not a good choice regarding cache management and the result you want to know. You want to know the first occurrence of a string. Why should you start from the end then? That makes no sense.
But what I think Python 3 is doing is that it knows that s and '∑' were not changed since the last iteration of the for loop and the 'find' method has no side effects, so it can simply use the result of the previous call instead of searching for '∑' in s a second time.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
Post Reply