Page 2 of 8
Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 8:41 am
by STARGÅTE
@uwekel:
Your Python code takes on my system 330ms (in comparison to 2787ms in PureBasic).
And your code need more time (663ms), if the string contains unicode characters (which is in PB automatically).
Code: Select all
from time import time
s = '√' * 1000000 + '∑'
t = time()
for i in range(1000):
x = s.find('∑')
print(int((time()-t)*1000), 'ms')
input()
Code: Select all
OpenConsole()
s.s = LSet("", 1000000, "√") + "∑"
t = ElapsedMilliseconds()
For i = 1 To 1000
x = FindString(s, "∑")
Next
PrintN(Str(ElapsedMilliseconds()-t))
Input()
So, PB is
not terribly slow!
I think we can speed up the FindString in PB by using commands such
PCMPEQB/PCMPEQW to compare multiply bytes/word in one instruction. Maybe that uses Python, because 2787ms (PB) / 8 Bytes = 348ms (Python)
Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 8:44 am
by #NULL
Maybe there is some optimization going on. every loop iteration produces the same x and x is not even used. Can you try to add x up to some sum and print that so the compiler can't elide anything?
Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 8:54 am
by Marc56us
Regular expression version: 2 ms (1000 time faster than FindString)
Code: Select all
Text$ = Space(1000000) + "!"
Debug "String Lenght: " + Len(Text$)
Start = ElapsedMilliseconds()
CreateRegularExpression(0, "!")
If MatchRegularExpression(0, Text$)
Stop = ElapsedMilliseconds()
ExamineRegularExpression(0, Text$)
While NextRegularExpressionMatch(0)
Debug "Found: « " + RegularExpressionMatchString(0) +
" » at position: " + RegularExpressionMatchPosition(0) +
" in: " + Str(Stop - Start) + " ms"
Wend
EndIf
FreeRegularExpression(0)
Code: Select all
String Lenght: 1000001
Found: « ! » at position: 1000001 in: 2 ms
(and with debug on)
So, yes FindString() is slow, but RegEx engine is here and do the job and do more things easyly for strings search.
Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 9:36 am
by the.weavster
Marc56us wrote:Regular expression version: 2 ms (1000 time faster than FindString)
..//..
So, yes FindString() is slow, but RegEx engine is here and do the job and do more things easyly for strings search.
I think you're supposed to perform the search 1000 times for the test

Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 9:43 am
by Marc56us
the.weavster wrote:I think you're supposed to perform the search 1000 times for the test

Oh

yes
With 1000 time: 2049 ms so same as FindString()

Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 11:00 am
by NicTheQuick
I tested it for fun too:
Code: Select all
Purebasic 1640 ms
Python2.7.15 1280 ms
Python3.6.7 76 ms
That is very impressive. Somehow Python 3.6 does a much better job at finding something.
Here is the Python code I used:
Code: Select all
# -*- coding: utf-8 -*-
from time import time
s = '√' * 1000000 + '∑'
t = time()
sum = 0
for i in range(1000):
x = s.find('∑')
sum += x
print(int((time()-t)*1000))
print(sum)
Edit:
You can even make it faster with Unicode in Python2.7:
Code: Select all
# -*- coding: utf-8 -*-
from time import time
s = u'√' * 1000000 + u'∑'
t = time()
sum = 0
for i in range(1000):
x = s.find(u'∑')
sum += x
print(int((time()-t)*1000))
print(sum)
Now the timings are:
Code: Select all
Python2.7.15 575 ms
Python3.6.7 70 ms
Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 11:21 am
by RSBasic
ccode wrote:@RSBasic
x = FindString(LCase(s), LCase("!"))
is slower!!!
Hm, that's weird. I tested on an old laptop with 1 to 100:
Code: Select all
s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 100
x = FindString(s, "!")
Next
MessageRequester("", Str(ElapsedMilliseconds() - t))
t = ElapsedMilliseconds()
For i = 1 To 100
x = FindString(s, "!", 0, #PB_String_NoCase)
Next
MessageRequester("", Str(ElapsedMilliseconds() - t))
t = ElapsedMilliseconds()
For i = 1 To 100
x = FindString(LCase(s), LCase("!"))
Next
MessageRequester("", Str(ElapsedMilliseconds() - t))
340
4289
975
LCase() is faster than #PB_String_NoCase!

Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 11:59 am
by Marc56us
RSBasic wrote:LCase() is faster than #PB_String_NoCase!

This is logical:
#PB_String_NoCase must test the two forms of each letter while LCase (and UCase) automatically transform all letters (before) so they do only one test after
Code: Select all
#PB_String_CaseSensitive: case sensitive search (a=a) (default).
#PB_String_NoCase : case insensitive search (A=a).
It therefore seems that the automatic transformation is faster than the double test
Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 12:53 pm
by infratec
Ups, already cleared that 100x faster is not true

Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 12:56 pm
by NicTheQuick
infratec wrote:In my test it runs in python 1.5 times faster then PB.
But not 100x faster
Same PC, with Linux x86.
If you read my post, you will see that Python 3.6 runs more than 23 times faster than PB.
Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 1:02 pm
by infratec
Maybe Python 3 starts searching from the end

Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 1:11 pm
by #NULL
infratec wrote:Maybe Python 3 starts searching from the end

If i add the sigma to the beginning of the string instead then it takes 0 milliseconds with Python 2.7.15rc1

Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 1:22 pm
by infratec
I was talking about 3 and not 2.
Since 3 is faster then it is possible if it start from beginning.
It is also possible that 3 starts 2 threads one starts from the beginning one from the end.
If the pointer matches in the middle the string was not found.
Else one of them finds it earlier and stop the searching.
Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 2:02 pm
by Derren
I tried this code by Nic (
viewtopic.php?p=529390#p529390)with an online python compiler:
https://repl.it/repls/CommonMeagerVendor
Code: Select all
# -*- coding: utf-8 -*-
from time import time
s = '√' * 1000000 + '∑'
t = time()
sum = 0
for i in range(1000):
x = s.find('∑')
sum += x
print(int((time()-t)*1000))
print(sum)
I can't get under 300 ms, for some reason, BUT changing the string like this:
Code: Select all
s = '√' * 500000 + '∑' + '√' * 500000
which I suppose (not fluent in python) puts the Sigma in the middle of the 1 mio character string, it significantly faster (about 50ms).
Also, shouldn't a "from end" search need only 0ms, like #NULL wrote, if that's the case when the Sigma is in the beginning.
Re: Why FindString() is so terribly slow?
Posted: Mon Nov 26, 2018 2:07 pm
by NicTheQuick
infratec wrote:Since 3 is faster then it is possible if it start from beginning.
It is also possible that 3 starts 2 threads one starts from the beginning one from the end.
If the pointer matches in the middle the string was not found.
Else one of them finds it earlier and stop the searching.
That would be a bad idea. The creation of the threads alone would last a small amount of time. That is the reason why you should use thread pools when you want to distribute work. But Python 3 does nothing of that. Also traversing a memory block from the end back to the beginning is not a good choice regarding cache management and the result you want to know. You want to know the first occurrence of a string. Why should you start from the end then? That makes no sense.
But what I think Python 3 is doing is that it knows that s and '∑' were not changed since the last iteration of the for loop and the 'find' method has no side effects, so it can simply use the result of the previous call instead of searching for '∑' in s a second time.