Your Python code takes on my system 330ms (in comparison to 2787ms in PureBasic).
And your code need more time (663ms), if the string contains unicode characters (which is in PB automatically).
OpenConsole()
s.s = LSet("", 1000000, "√") + "∑"
t = ElapsedMilliseconds()
For i = 1 To 1000
x = FindString(s, "∑")
Next
PrintN(Str(ElapsedMilliseconds()-t))
Input()
So, PB is not terribly slow!
I think we can speed up the FindString in PB by using commands such PCMPEQB/PCMPEQW to compare multiply bytes/word in one instruction. Maybe that uses Python, because 2787ms (PB) / 8 Bytes = 348ms (Python)
Maybe there is some optimization going on. every loop iteration produces the same x and x is not even used. Can you try to add x up to some sum and print that so the compiler can't elide anything?
Marc56us wrote:Regular expression version: 2 ms (1000 time faster than FindString)
..//..
So, yes FindString() is slow, but RegEx engine is here and do the job and do more things easyly for strings search.
I think you're supposed to perform the search 1000 times for the test
# -*- coding: utf-8 -*-
from time import time
s = '√' * 1000000 + '∑'
t = time()
sum = 0
for i in range(1000):
x = s.find('∑')
sum += x
print(int((time()-t)*1000))
print(sum)
Edit:
You can even make it faster with Unicode in Python2.7:
# -*- coding: utf-8 -*-
from time import time
s = u'√' * 1000000 + u'∑'
t = time()
sum = 0
for i in range(1000):
x = s.find(u'∑')
sum += x
print(int((time()-t)*1000))
print(sum)
s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 100
x = FindString(s, "!")
Next
MessageRequester("", Str(ElapsedMilliseconds() - t))
t = ElapsedMilliseconds()
For i = 1 To 100
x = FindString(s, "!", 0, #PB_String_NoCase)
Next
MessageRequester("", Str(ElapsedMilliseconds() - t))
t = ElapsedMilliseconds()
For i = 1 To 100
x = FindString(LCase(s), LCase("!"))
Next
MessageRequester("", Str(ElapsedMilliseconds() - t))
RSBasic wrote:LCase() is faster than #PB_String_NoCase!
This is logical: #PB_String_NoCase must test the two forms of each letter while LCase (and UCase) automatically transform all letters (before) so they do only one test after
Since 3 is faster then it is possible if it start from beginning.
It is also possible that 3 starts 2 threads one starts from the beginning one from the end.
If the pointer matches in the middle the string was not found.
Else one of them finds it earlier and stop the searching.
# -*- coding: utf-8 -*-
from time import time
s = '√' * 1000000 + '∑'
t = time()
sum = 0
for i in range(1000):
x = s.find('∑')
sum += x
print(int((time()-t)*1000))
print(sum)
I can't get under 300 ms, for some reason, BUT changing the string like this:
infratec wrote:Since 3 is faster then it is possible if it start from beginning.
It is also possible that 3 starts 2 threads one starts from the beginning one from the end.
If the pointer matches in the middle the string was not found.
Else one of them finds it earlier and stop the searching.
That would be a bad idea. The creation of the threads alone would last a small amount of time. That is the reason why you should use thread pools when you want to distribute work. But Python 3 does nothing of that. Also traversing a memory block from the end back to the beginning is not a good choice regarding cache management and the result you want to know. You want to know the first occurrence of a string. Why should you start from the end then? That makes no sense.
But what I think Python 3 is doing is that it knows that s and '∑' were not changed since the last iteration of the for loop and the 'find' method has no side effects, so it can simply use the result of the previous call instead of searching for '∑' in s a second time.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.