Page 1 of 2
Extreme fast Len() replacement !!!
Posted: Wed Oct 05, 2005 8:44 pm
by va!n
Code updated For 5.20+
here is a very fast Len() replacement based on MMX... Credits for the original MMX version must go to "Ryan Mack"! And many thanks to "remi_meier" for the PureBasic version!

(just take a look at:
viewtopic.php?t=17084)
Code: Select all
EnableASM
DataSection
STRINGTBL:
Data.l 0, 0, 0, $FF
Data.l 0, $FFFF, 0, $FFFFFF
Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
EndDataSection
Procedure.l STRLen(pString.l)
MOV eax, pString
!pxor mm1, mm1
!MOV ecx, eax
!MOV edx, eax
!AND ecx, -8
!AND eax, 7
!movq mm0, [ecx]
!por mm0, [l_stringtbl+eax*8]
!@@:
!ADD ecx, 8
!pcmpeqb mm0, mm1
!packsswb mm0, mm0
!movd eax, mm0
!movq mm0, [ecx]
!TEST eax, eax
!JZ @b
!BSF eax, eax
!SHR eax, 2
!LEA ecx, [ecx+eax-8]
!SUB ecx, edx
!MOV eax, ecx
ProcedureReturn
EndProcedure
StartTime = GetTickCount_()
For i = 0 To 10000000
;
;----Test1---- 3718 ms ----
; beta$ = "test"
; Len(beta$)
;----Test2 ---- 2188 ms ----
; Len("test")
;----Test3 ---- 2094 ms ----
; beta$ = "test"
; STRLen(@beta$)
;----Test3 ---- 359 ms ----
STRLen(@"Test")
Next
MessageRequester("Result:",Str(GetTickCount_()-StartTime))
End
Have a nice day

Posted: Wed Oct 05, 2005 8:54 pm
by Trond
In delphi strings the length is stored in string so the len command is very fast.
Posted: Wed Oct 05, 2005 8:56 pm
by va!n
mhhhh... seems i was a bit to fast
declaring beta$ in every loop is bad... so i changed it... and now the results are "a little bit" different
Code: Select all
DataSection
STRINGTBL:
Data.l 0, 0, 0, $FF
Data.l 0, $FFFF, 0, $FFFFFF
Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
EndDataSection
Procedure.l STRLen(pString.l)
MOV eax, pString
!pxor mm1, mm1
!MOV ecx, eax
!MOV edx, eax
!AND ecx, -8
!AND eax, 7
!movq mm0, [ecx]
!por mm0, [l_stringtbl+eax*8]
!@@:
!ADD ecx, 8
!pcmpeqb mm0, mm1
!packsswb mm0, mm0
!movd eax, mm0
!movq mm0, [ecx]
!TEST eax, eax
!JZ @b
!BSF eax, eax
!SHR eax, 2
!LEA ecx, [ecx+eax-8]
!SUB ecx, edx
!MOV eax, ecx
ProcedureReturn
EndProcedure
StartTime = GetTickCount_()
beta$ = "test"
For i = 0 To 10000000
;
;----Test1---- 3718 ms ---- 110
; Len(beta$)
;----Test2 ---- 2188 ms ---- 109
; Len("test")
;----Test3 ---- 2094 ms ---- 359
; STRLen(@beta$)
;----Test3 ---- 359 ms ---- 359
STRLen(@"Test")
Next
MessageRequester("Result:",Str(GetTickCount_()-StartTime))
End
Posted: Wed Oct 05, 2005 10:25 pm
by traumatic
You didn't forget to disable the debugger by any chance, did you?
PB's Len() is much faster here for lengths < 12 chars.
EDIT: Ah, now I see you corrected the result.

Posted: Wed Oct 05, 2005 11:35 pm
by Deeem2031
It should be faster?!
Ok, maybe. Also whether here are other result but:
1. It's bad code because it reads possibly7 bytes after the null-byte.
2. Debug StrLen2(@"test") -> -4 ?!?
3. This use MMX so you must call "emms" at the end of the procedure or float-operations will fail after using this proc.
EDIT: Use the StrLen-Proc from my library if you want a fast and clean procedure:
http://www.deeem2031.de/PHPString
Posted: Thu Oct 06, 2005 4:14 pm
by va!n
@traumatic:
thats the point
@deeem2031:
i tried a new test and as traumatic say, its a lot faster for larger strings! I dont know if the code may be bad using 7 bytes... i only read that this should be a very fast and working way...!? anyway take a look at fasm forum, for more info

Thanks for the tip with ""emms" at end of procedure!
Code: Select all
DataSection
STRINGTBL:
Data.l 0, 0, 0, $FF
Data.l 0, $FFFF, 0, $FFFFFF
Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
EndDataSection
Procedure.l STRLen(pString.l)
MOV eax, pString
!pxor mm1, mm1
!MOV ecx, eax
!MOV edx, eax
!AND ecx, -8
!AND eax, 7
!movq mm0, [ecx]
!por mm0, [l_stringtbl+eax*8]
!@@:
!ADD ecx, 8
!pcmpeqb mm0, mm1
!packsswb mm0, mm0
!movd eax, mm0
!movq mm0, [ecx]
!TEST eax, eax
!JZ @b
!BSF eax, eax
!SHR eax, 2
!LEA ecx, [ecx+eax-8]
!SUB ecx, edx
!MOV eax, ecx
!emms
ProcedureReturn
EndProcedure
StartTime = GetTickCount_()
; beta$ = "Test"
beta$ = "We are checking the speed of StrLen() for strings larger than 12 chars!"
For i = 0 To 10000000
;
; Len(beta$)
; Len("Test")
; Len("We are checking the speed of StrLen() for strings larger than 12 chars!")
; STRLen(@beta$)
; STRLen("Test)
STRLen(@"We are checking the speed of StrLen() for strings larger than 12 chars!")
Next
MessageRequester("Result:",Str(GetTickCount_()-StartTime))
End
here are my results:
Results using == beta$ = "Test"
Len(beta$) = 110 ms
Len("test") = 110 ms
STRLen(@beta$) = 312 ms
STRLen("Test) = 312 ms
Results using == beta$ = "We are checking the speed of StrLen() for strings larger than 12 chars!"
Len(beta$) = 953 ms
Len("test") = 953 ms
STRLen(@beta$) = 406 ms
STRLen("Test) = 313 ms
So you may see, for large strings the MMX version is a lot faster! But for small strings the original Len() command may be faster

Posted: Fri Oct 07, 2005 2:01 am
by Rescator
I'm surprised that Len() varies, as far as I knew PB tracks string lengths internaly? So Len() should be same speed always.
Unless it simply lookes inside the 0-n length string for first 0 which I guess makes sense.
Which brings me to the next bit...
Could someone please modify the example codes psted here
to include a 2nd maxlen argument.
Just scanning until you hit 0 on unknown length strings is bad and ripe for security exploits.
A too long string could be anything up to 2GB, or worse..
I have no idea what would happen if the loop continues and smacks into the end of addressable memory. application crash most likely?
So please please please! *begs on his knees* add a maxlen argument to the function.
Do that and not only is it a worty alternative to the API lstrlen()
but a nice PB alternative for the API strsafe.lib StringCbLength()
Posted: Fri Oct 07, 2005 2:32 am
by Rescator
This is just a speed test example, it it NOT production safe as the routine has no maxlen limit, fixed/auto/user specified.
Also, interestingly enough PB's Len() is always faster here.
The source is based on the one above, but improved for more reliable test result. (minimum overhead for the loop, uniform loops, as static loops as possible to allow the CPU to kick in caching/optimization.)
My results where (3 seperate runs, 20% existing system load, AMD Athlon XP 1.85Ghz):
STRLen() 71chars: 4036ms
Len() 71chars: 2343ms
STRLen() 1chars: 3626ms
Len() 1chars: 941ms
STRLen() 71chars: 3995ms
Len() 71chars: 2303ms
STRLen() 1chars: 3725ms
Len() 1chars: 922ms
STRLen() 71chars: 3956ms
Len() 71chars: 2273ms
STRLen() 1chars: 3495ms
Len() 1chars: 882ms
Code: Select all
DataSection
STRINGTBL:
Data.l 0, 0, 0, $FF
Data.l 0, $FFFF, 0, $FFFFFF
Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
EndDataSection
Procedure.l STRLen(*pString)
MOV eax, *pString
!pxor mm1, mm1
!MOV ecx, eax
!MOV edx, eax
!AND ecx, -8
!AND eax, 7
!movq mm0, [ecx]
!por mm0, [l_stringtbl+eax*8]
!@@:
!ADD ecx, 8
!pcmpeqb mm0, mm1
!packsswb mm0, mm0
!movd eax, mm0
!movq mm0, [ecx]
!TEST eax, eax
!JZ @b
!BSF eax, eax
!SHR eax, 2
!LEA ecx, [ecx+eax-8]
!SUB ecx, edx
!MOV eax, ecx
!emms
ProcedureReturn
EndProcedure
beta1.s = "We are checking the speed of StrLen() for strings larger than 12 chars!"
beta2.s = "P"
StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=STRLen(@beta1)
Next
EndTime=GetTickCount_()
Debug "STRLen() "+Str(testlen)+"chars: "+Str(EndTime-StartTime)+"ms"
StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=Len(beta1)
Next
EndTime=GetTickCount_()
Debug "Len() "+Str(testlen)+"chars: "+Str(EndTime-StartTime)+"ms"
StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=STRLen(@beta2)
Next
EndTime=GetTickCount_()
Debug "STRLen() "+Str(testlen)+"chars: "+Str(EndTime-StartTime)+"ms"
StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=Len(beta2)
Next
EndTime=GetTickCount_()
Debug "Len() "+Str(testlen)+"chars: "+Str(EndTime-StartTime)+"ms"
End
Yes it may seem a odd way to do it,
but remember. the For Next itself as well as the StartTime and EndTime
is all part of the timing so those need to be as simple and minimalistic as possible too.
I always make speed tests in a similar way to this on stuff.
The 1 sec delay is to allow the cpu and system to breathe.
Posted: Fri Oct 07, 2005 2:39 am
by GeoTrail
3 seperate runs, AMD Athlon XP 2400+ 1.99Ghz:
First run:
STRLen() 71chars: 3234ms
Len() 71chars: 1938ms
STRLen() 1chars: 3172ms
Len() 1chars: 765ms
Second run:
STRLen() 71chars: 3235ms
Len() 71chars: 1906ms
STRLen() 1chars: 3063ms
Len() 1chars: 781ms
Third run:
STRLen() 71chars: 3218ms
Len() 71chars: 1922ms
STRLen() 1chars: 3125ms
Len() 1chars: 781ms
Posted: Fri Oct 07, 2005 3:10 am
by Rescator
Very consistent results, as expected!
Now, this code on the other hand must be
compiled to a exe.
When run it will do the 3 tests, and the results will be found in the file named results.txt
A message requester will pop up saying "Complete" when the test is finshed, and it's time to open results.txt with excitement
This time with compiled code we see a dramatic shift in the numbers,
and I'd say very inconsistent, not to mention faster.

Obviously with the compiled code the CPU is fully able to cache/optimize the performance. and the numbers look way more impressive for the 10 million loop now.
STRLen() 71chars: 531ms
Len() 71chars: 1573ms
STRLen() 1chars: 431ms
Len() 1chars: 100ms
STRLen() 71chars: 571ms
Len() 71chars: 1613ms
STRLen() 1chars: 401ms
Len() 1chars: 90ms
STRLen() 71chars: 571ms
Len() 71chars: 1553ms
STRLen() 1chars: 411ms
Len() 1chars: 120ms
Code: Select all
DataSection
STRINGTBL:
Data.l 0, 0, 0, $FF
Data.l 0, $FFFF, 0, $FFFFFF
Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
EndDataSection
Procedure.l STRLen(*pString)
MOV eax, *pString
!pxor mm1, mm1
!MOV ecx, eax
!MOV edx, eax
!AND ecx, -8
!AND eax, 7
!movq mm0, [ecx]
!por mm0, [l_stringtbl+eax*8]
!@@:
!ADD ecx, 8
!pcmpeqb mm0, mm1
!packsswb mm0, mm0
!movd eax, mm0
!movq mm0, [ecx]
!TEST eax, eax
!JZ @b
!BSF eax, eax
!SHR eax, 2
!LEA ecx, [ecx+eax-8]
!SUB ecx, edx
!MOV eax, ecx
!emms
ProcedureReturn
EndProcedure
file=CreateFile(#PB_Any,"results.txt")
beta1.s = "We are checking the speed of StrLen() for strings larger than 12 chars!"
beta2.s = "P"
c=0
Repeat
result1.l=0
result2.l=0
result3.l=0
result4.l=0
StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=STRLen(@beta1)
Next
EndTime=GetTickCount_()
result1=EndTime-StartTime
StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=Len(beta1)
Next
EndTime=GetTickCount_()
result2=EndTime-StartTime
StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=STRLen(@beta2)
Next
EndTime=GetTickCount_()
result3=EndTime-StartTime
StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=Len(beta2)
Next
EndTime=GetTickCount_()
result4=EndTime-StartTime
msg$="STRLen() "+Str(Len(beta1))+"chars: "+Str(result1)+"ms"+#CRLF$
msg$=msg$+"Len() "+Str(Len(beta1))+"chars: "+Str(result2)+"ms"+#CRLF$
msg$=msg$+"STRLen() "+Str(Len(beta2))+"chars: "+Str(result3)+"ms"+#CRLF$
msg$=msg$+"Len() "+Str(Len(beta2))+"chars: "+Str(result4)+"ms"+#CRLF$+#CRLF$
WriteString(msg$)
c=c+1
Delay(1000)
Until c=3
CloseFile(file)
MessageRequester("Len/STRLen Test","Complete")
End
Posted: Fri Oct 07, 2005 3:15 am
by GeoTrail
STRLen() 71chars: 515ms
Len() 71chars: 1313ms
STRLen() 1chars: 344ms
Len() 1chars: 93ms
STRLen() 71chars: 469ms
Len() 71chars: 1344ms
STRLen() 1chars: 344ms
Len() 1chars: 78ms
STRLen() 71chars: 484ms
Len() 71chars: 1313ms
STRLen() 1chars: 328ms
Len() 1chars: 94ms
Posted: Fri Oct 07, 2005 3:43 am
by Rescator
Ok, you speedfreaks are gonna get a kick out of this one!
Reduce the loops to 1 million. (otherwise this is gonna take ages as you will see)
Then change beta1 string to something REALLY long.
1 million loops used, 7999 char beta1 string. (I tried a 63999 string but the PB compiler seem to choke/bug out at source code lines that are above much longer than 8000 chars.
STRLen() 7999chars: 2744ms
Len() 7999chars: 15753ms
STRLen() 1chars: 60ms
Len() 1chars: 10ms
STRLen() 7999chars: 2904ms
Len() 7999chars: 15833ms
STRLen() 1chars: 40ms
Len() 1chars: 10ms
STRLen() 7999chars: 2744ms
Len() 7999chars: 15883ms
STRLen() 1chars: 41ms
Len() 1chars: 0ms
Very interesting. Len() is actualy very slow on long strings.
(and this was only 1 million loops rather than 10 mill as in above posts).
The 1 char reults are amusing though, I can only guess these are te result
of a CPU rebounce, or task scheduling/system optimization, or something.
(anyone got any ideas of the suddenly super fast 1char tests?)
Posted: Fri Oct 07, 2005 3:51 am
by GeoTrail
1 million loops and 7998 characters
STRLen() 7998chars: 2625ms
Len() 7998chars: 12766ms
STRLen() 1chars: 31ms
Len() 1chars: 0ms
STRLen() 7998chars: 2313ms
Len() 7998chars: 12828ms
STRLen() 1chars: 31ms
Len() 1chars: 0ms
STRLen() 7998chars: 2328ms
Len() 7998chars: 13141ms
STRLen() 1chars: 31ms
Len() 1chars: 16ms
Posted: Fri Oct 07, 2005 4:14 am
by Rescator
Awsome Geo. Well I guess that we proved that Len() is sloooow, and screaming in pain the larger the strings are.
While the MMX code seems to be very consistent,
actually since the MMX code is so consistent comparing 1 char and xxxx chars, my guess is that the main bottleneck is calling the procedure itself.
In other words, the MMX code has way more juice still before maxing out.
while with Len() it seems the function call itself is a drop in the sea compared to it's actual code speed.
Hmm! I wonder how a non-MMX ASM procedure would compare against the Len() and MMX ones?
Posted: Fri Oct 07, 2005 2:15 pm
by va!n
a little modified version... just change the lMaxChars value for different speed comparsion...
Code: Select all
DataSection
STRINGTBL:
Data.l 0, 0, 0, $FF
Data.l 0, $FFFF, 0, $FFFFFF
Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
EndDataSection
Procedure.l STRLen(*pString)
MOV eax, *pString
!pxor mm1, mm1
!MOV ecx, eax
!MOV edx, eax
!AND ecx, -8
!AND eax, 7
!movq mm0, [ecx]
!por mm0, [l_stringtbl+eax*8]
!@@:
!ADD ecx, 8
!pcmpeqb mm0, mm1
!packsswb mm0, mm0
!movd eax, mm0
!movq mm0, [ecx]
!TEST eax, eax
!JZ @b
!BSF eax, eax
!SHR eax, 2
!LEA ecx, [ecx+eax-8]
!SUB ecx, edx
!MOV eax, ecx
!emms
ProcedureReturn
EndProcedure
lMaxChars = 500
file=CreateFile(#PB_Any,"c:\results.txt")
RandomSeed(23)
For i = 1 To lMaxChars
newchar.s = Chr(32+Random(94))
beta1.s = beta1.s + newchar
Next
beta2.s = "P"
c=0
Repeat
result1.l=0
result2.l=0
result3.l=0
result4.l=0
StartTime.l=0
EndTime.l=0
testlen.l=0
; Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=STRLen(@beta1)
Next
EndTime=GetTickCount_()
result1=EndTime-StartTime
StartTime.l=0
EndTime.l=0
testlen.l=0
; Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=Len(beta1)
Next
EndTime=GetTickCount_()
result2=EndTime-StartTime
StartTime.l=0
EndTime.l=0
testlen.l=0
; Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=STRLen(@beta2)
Next
EndTime=GetTickCount_()
result3=EndTime-StartTime
StartTime.l=0
EndTime.l=0
testlen.l=0
; Delay(1000)
StartTime = GetTickCount_()
For i = 0 To 10000000
testlen=Len(beta2)
Next
EndTime=GetTickCount_()
result4=EndTime-StartTime
msg$="STRLen() "+Str(Len(beta1))+"chars: "+Str(result1)+"ms"+#CRLF$
msg$=msg$+"Len() "+Str(Len(beta1))+"chars: "+Str(result2)+"ms"+#CRLF$
msg$=msg$+"STRLen() "+Str(Len(beta2))+"chars: "+Str(result3)+"ms"+#CRLF$
msg$=msg$+"Len() "+Str(Len(beta2))+"chars: "+Str(result4)+"ms"+#CRLF$+#CRLF$
WriteString(msg$)
c=c+1
; Delay(1000)
Until c=3
CloseFile(file)
MessageRequester("Len/STRLen Test","Complete")
End