Page 1 of 2

Extreme fast Len() replacement !!!

Posted: Wed Oct 05, 2005 8:44 pm
by va!n
Code updated For 5.20+

here is a very fast Len() replacement based on MMX... Credits for the original MMX version must go to "Ryan Mack"! And many thanks to "remi_meier" for the PureBasic version! :wink: (just take a look at: viewtopic.php?t=17084)

Code: Select all

EnableASM
DataSection
  STRINGTBL:
  Data.l 0, 0, 0, $FF
  Data.l 0, $FFFF, 0, $FFFFFF
  Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
  Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
EndDataSection

Procedure.l STRLen(pString.l)
  MOV eax, pString
  
  !pxor     mm1, mm1
  !MOV      ecx, eax
  !MOV      edx, eax
  !AND      ecx, -8
  !AND      eax, 7
  !movq     mm0, [ecx]
  !por      mm0, [l_stringtbl+eax*8]
  !@@:
  !ADD      ecx, 8
  !pcmpeqb  mm0, mm1
  !packsswb mm0, mm0
  !movd     eax, mm0
  !movq     mm0, [ecx]
  !TEST     eax, eax
  !JZ       @b
  
  !BSF      eax, eax
  !SHR      eax, 2
  !LEA      ecx, [ecx+eax-8]
  !SUB      ecx, edx
  !MOV eax, ecx
  
  ProcedureReturn
EndProcedure



StartTime = GetTickCount_()

For i = 0 To 10000000
  ;
  ;----Test1---- 3718 ms ----
  
  ; beta$ = "test"
  ; Len(beta$)   
  
  ;----Test2 ---- 2188 ms ----
  
  ; Len("test")   
  
  ;----Test3 ---- 2094 ms ----
  
  ; beta$ = "test"
  ; STRLen(@beta$)
  
  ;----Test3 ----  359 ms ----
  
  STRLen(@"Test")
  
Next

MessageRequester("Result:",Str(GetTickCount_()-StartTime))
End
Have a nice day ;-)

Posted: Wed Oct 05, 2005 8:54 pm
by Trond
In delphi strings the length is stored in string so the len command is very fast.

Posted: Wed Oct 05, 2005 8:56 pm
by va!n
mhhhh... seems i was a bit to fast :lol:

declaring beta$ in every loop is bad... so i changed it... and now the results are "a little bit" different :wink:

Code: Select all

DataSection
STRINGTBL:
Data.l 0, 0, 0, $FF
Data.l 0, $FFFF, 0, $FFFFFF
Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
EndDataSection

Procedure.l STRLen(pString.l)
 MOV eax, pString
 
 !pxor     mm1, mm1 
 !MOV      ecx, eax 
 !MOV      edx, eax 
 !AND      ecx, -8 
 !AND      eax, 7 
 !movq     mm0, [ecx] 
 !por      mm0, [l_stringtbl+eax*8] 
 !@@: 
 !ADD      ecx, 8 
 !pcmpeqb  mm0, mm1 
 !packsswb mm0, mm0 
 !movd     eax, mm0 
 !movq     mm0, [ecx] 
 !TEST     eax, eax 
 !JZ       @b 
 
 !BSF      eax, eax 
 !SHR      eax, 2 
 !LEA      ecx, [ecx+eax-8] 
 !SUB      ecx, edx
 !MOV eax, ecx
 
 ProcedureReturn 
EndProcedure



StartTime = GetTickCount_()
beta$ = "test"


For i = 0 To 10000000
 ;
 ;----Test1---- 3718 ms ---- 110
 
; Len(beta$)   
 
 ;----Test2 ---- 2188 ms ---- 109
 
; Len("test")   
 
 ;----Test3 ---- 2094 ms ---- 359
 
; STRLen(@beta$)
 
 ;----Test3 ----  359 ms ---- 359
 
 STRLen(@"Test")
  
Next

MessageRequester("Result:",Str(GetTickCount_()-StartTime))
End

Posted: Wed Oct 05, 2005 10:25 pm
by traumatic
You didn't forget to disable the debugger by any chance, did you?

PB's Len() is much faster here for lengths < 12 chars.


EDIT: Ah, now I see you corrected the result. :)

Posted: Wed Oct 05, 2005 11:35 pm
by Deeem2031
It should be faster?!

Ok, maybe. Also whether here are other result but:

1. It's bad code because it reads possibly7 bytes after the null-byte.
2. Debug StrLen2(@"test") -> -4 ?!?
3. This use MMX so you must call "emms" at the end of the procedure or float-operations will fail after using this proc.

EDIT: Use the StrLen-Proc from my library if you want a fast and clean procedure: http://www.deeem2031.de/PHPString

Posted: Thu Oct 06, 2005 4:14 pm
by va!n
@traumatic:
thats the point ;)

@deeem2031:
i tried a new test and as traumatic say, its a lot faster for larger strings! I dont know if the code may be bad using 7 bytes... i only read that this should be a very fast and working way...!? anyway take a look at fasm forum, for more info ;) Thanks for the tip with ""emms" at end of procedure!

Code: Select all


 
 DataSection
 STRINGTBL:
 Data.l 0, 0, 0, $FF
 Data.l 0, $FFFF, 0, $FFFFFF
 Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
 Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
 EndDataSection
 
 Procedure.l STRLen(pString.l)
  MOV eax, pString
  
  !pxor     mm1, mm1
  !MOV      ecx, eax
  !MOV      edx, eax
  !AND      ecx, -8
  !AND      eax, 7
  !movq     mm0, [ecx]
  !por      mm0, [l_stringtbl+eax*8]
  !@@:
  !ADD      ecx, 8
  !pcmpeqb  mm0, mm1
  !packsswb mm0, mm0
  !movd     eax, mm0
  !movq     mm0, [ecx]
  !TEST     eax, eax
  !JZ       @b
  
  !BSF      eax, eax
  !SHR      eax, 2
  !LEA      ecx, [ecx+eax-8]
  !SUB      ecx, edx
  !MOV eax, ecx
  
  !emms
  ProcedureReturn
 EndProcedure
 
 
 
 StartTime = GetTickCount_()
 
; beta$ = "Test"
 beta$ = "We are checking the speed of StrLen() for strings larger than 12 chars!"
 
 
 For i = 0 To 10000000
  ;
;   Len(beta$)   
  
;   Len("Test")   
;   Len("We are checking the speed of StrLen() for strings larger than 12 chars!")   
  
 ;   STRLen(@beta$)
  
;  STRLen("Test)
  STRLen(@"We are checking the speed of StrLen() for strings larger than 12 chars!")
  
 Next
 
 MessageRequester("Result:",Str(GetTickCount_()-StartTime))
 End 
here are my results:

Results using == beta$ = "Test"

Len(beta$) = 110 ms
Len("test") = 110 ms
STRLen(@beta$) = 312 ms
STRLen("Test) = 312 ms

Results using == beta$ = "We are checking the speed of StrLen() for strings larger than 12 chars!"

Len(beta$) = 953 ms
Len("test") = 953 ms
STRLen(@beta$) = 406 ms
STRLen("Test) = 313 ms

So you may see, for large strings the MMX version is a lot faster! But for small strings the original Len() command may be faster ;)

Posted: Fri Oct 07, 2005 2:01 am
by Rescator
I'm surprised that Len() varies, as far as I knew PB tracks string lengths internaly? So Len() should be same speed always.
Unless it simply lookes inside the 0-n length string for first 0 which I guess makes sense.

Which brings me to the next bit...
Could someone please modify the example codes psted here
to include a 2nd maxlen argument.

Just scanning until you hit 0 on unknown length strings is bad and ripe for security exploits.
A too long string could be anything up to 2GB, or worse..
I have no idea what would happen if the loop continues and smacks into the end of addressable memory. application crash most likely?

So please please please! *begs on his knees* add a maxlen argument to the function.
Do that and not only is it a worty alternative to the API lstrlen()
but a nice PB alternative for the API strsafe.lib StringCbLength()

Posted: Fri Oct 07, 2005 2:32 am
by Rescator
This is just a speed test example, it it NOT production safe as the routine has no maxlen limit, fixed/auto/user specified.

Also, interestingly enough PB's Len() is always faster here.
The source is based on the one above, but improved for more reliable test result. (minimum overhead for the loop, uniform loops, as static loops as possible to allow the CPU to kick in caching/optimization.)

My results where (3 seperate runs, 20% existing system load, AMD Athlon XP 1.85Ghz):

STRLen() 71chars: 4036ms
Len() 71chars: 2343ms
STRLen() 1chars: 3626ms
Len() 1chars: 941ms

STRLen() 71chars: 3995ms
Len() 71chars: 2303ms
STRLen() 1chars: 3725ms
Len() 1chars: 922ms

STRLen() 71chars: 3956ms
Len() 71chars: 2273ms
STRLen() 1chars: 3495ms
Len() 1chars: 882ms

Code: Select all

 DataSection
 STRINGTBL:
 Data.l 0, 0, 0, $FF
 Data.l 0, $FFFF, 0, $FFFFFF
 Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
 Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
 EndDataSection
 
 Procedure.l STRLen(*pString)
  MOV eax, *pString
 
  !pxor     mm1, mm1
  !MOV      ecx, eax
  !MOV      edx, eax
  !AND      ecx, -8
  !AND      eax, 7
  !movq     mm0, [ecx]
  !por      mm0, [l_stringtbl+eax*8]
  !@@:
  !ADD      ecx, 8
  !pcmpeqb  mm0, mm1
  !packsswb mm0, mm0
  !movd     eax, mm0
  !movq     mm0, [ecx]
  !TEST     eax, eax
  !JZ       @b
 
  !BSF      eax, eax
  !SHR      eax, 2
  !LEA      ecx, [ecx+eax-8]
  !SUB      ecx, edx
  !MOV eax, ecx
 
  !emms
  ProcedureReturn
 EndProcedure
 

beta1.s = "We are checking the speed of StrLen() for strings larger than 12 chars!"
beta2.s = "P"


StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
 
 StartTime = GetTickCount_()
 For i = 0 To 10000000
  testlen=STRLen(@beta1)
 Next
 EndTime=GetTickCount_()
 Debug "STRLen() "+Str(testlen)+"chars: "+Str(EndTime-StartTime)+"ms"


StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)

 StartTime = GetTickCount_()
 For i = 0 To 10000000
  testlen=Len(beta1)   
 Next
 EndTime=GetTickCount_()
 Debug "Len() "+Str(testlen)+"chars: "+Str(EndTime-StartTime)+"ms"

StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
 
 StartTime = GetTickCount_()
 For i = 0 To 10000000
  testlen=STRLen(@beta2)
 Next
 EndTime=GetTickCount_()
 Debug "STRLen() "+Str(testlen)+"chars: "+Str(EndTime-StartTime)+"ms"


StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)

 StartTime = GetTickCount_()
 For i = 0 To 10000000
  testlen=Len(beta2)   
 Next
 EndTime=GetTickCount_()
 Debug "Len() "+Str(testlen)+"chars: "+Str(EndTime-StartTime)+"ms"


 End
 
Yes it may seem a odd way to do it,
but remember. the For Next itself as well as the StartTime and EndTime
is all part of the timing so those need to be as simple and minimalistic as possible too.
I always make speed tests in a similar way to this on stuff.
The 1 sec delay is to allow the cpu and system to breathe.

Posted: Fri Oct 07, 2005 2:39 am
by GeoTrail
3 seperate runs, AMD Athlon XP 2400+ 1.99Ghz:


First run:

STRLen() 71chars: 3234ms
Len() 71chars: 1938ms
STRLen() 1chars: 3172ms
Len() 1chars: 765ms


Second run:

STRLen() 71chars: 3235ms
Len() 71chars: 1906ms
STRLen() 1chars: 3063ms
Len() 1chars: 781ms


Third run:

STRLen() 71chars: 3218ms
Len() 71chars: 1922ms
STRLen() 1chars: 3125ms
Len() 1chars: 781ms

Posted: Fri Oct 07, 2005 3:10 am
by Rescator
Very consistent results, as expected!

Now, this code on the other hand must be compiled to a exe.
When run it will do the 3 tests, and the results will be found in the file named results.txt
A message requester will pop up saying "Complete" when the test is finshed, and it's time to open results.txt with excitement :P

This time with compiled code we see a dramatic shift in the numbers,
and I'd say very inconsistent, not to mention faster. ;)
Obviously with the compiled code the CPU is fully able to cache/optimize the performance. and the numbers look way more impressive for the 10 million loop now. :shock:

STRLen() 71chars: 531ms
Len() 71chars: 1573ms
STRLen() 1chars: 431ms
Len() 1chars: 100ms

STRLen() 71chars: 571ms
Len() 71chars: 1613ms
STRLen() 1chars: 401ms
Len() 1chars: 90ms

STRLen() 71chars: 571ms
Len() 71chars: 1553ms
STRLen() 1chars: 411ms
Len() 1chars: 120ms

Code: Select all

 DataSection
 STRINGTBL:
 Data.l 0, 0, 0, $FF
 Data.l 0, $FFFF, 0, $FFFFFF
 Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
 Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
 EndDataSection
 
 Procedure.l STRLen(*pString)
  MOV eax, *pString
 
  !pxor     mm1, mm1
  !MOV      ecx, eax
  !MOV      edx, eax
  !AND      ecx, -8
  !AND      eax, 7
  !movq     mm0, [ecx]
  !por      mm0, [l_stringtbl+eax*8]
  !@@:
  !ADD      ecx, 8
  !pcmpeqb  mm0, mm1
  !packsswb mm0, mm0
  !movd     eax, mm0
  !movq     mm0, [ecx]
  !TEST     eax, eax
  !JZ       @b
 
  !BSF      eax, eax
  !SHR      eax, 2
  !LEA      ecx, [ecx+eax-8]
  !SUB      ecx, edx
  !MOV eax, ecx
 
  !emms
  ProcedureReturn
 EndProcedure
 

file=CreateFile(#PB_Any,"results.txt")

beta1.s = "We are checking the speed of StrLen() for strings larger than 12 chars!"
beta2.s = "P"

c=0
Repeat

result1.l=0
result2.l=0
result3.l=0
result4.l=0

StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
 
 StartTime = GetTickCount_()
 For i = 0 To 10000000
  testlen=STRLen(@beta1)
 Next
 EndTime=GetTickCount_()
 result1=EndTime-StartTime


StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)

 StartTime = GetTickCount_()
 For i = 0 To 10000000
  testlen=Len(beta1)   
 Next
 EndTime=GetTickCount_()
 result2=EndTime-StartTime

StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)
 
 StartTime = GetTickCount_()
 For i = 0 To 10000000
  testlen=STRLen(@beta2)
 Next
 EndTime=GetTickCount_()
 result3=EndTime-StartTime


StartTime.l=0
EndTime.l=0
testlen.l=0
Delay(1000)

 StartTime = GetTickCount_()
 For i = 0 To 10000000
  testlen=Len(beta2)   
 Next
 EndTime=GetTickCount_()
 result4=EndTime-StartTime

msg$="STRLen() "+Str(Len(beta1))+"chars: "+Str(result1)+"ms"+#CRLF$
msg$=msg$+"Len() "+Str(Len(beta1))+"chars: "+Str(result2)+"ms"+#CRLF$
msg$=msg$+"STRLen() "+Str(Len(beta2))+"chars: "+Str(result3)+"ms"+#CRLF$
msg$=msg$+"Len() "+Str(Len(beta2))+"chars: "+Str(result4)+"ms"+#CRLF$+#CRLF$


 WriteString(msg$)
 c=c+1
 Delay(1000)
Until c=3
CloseFile(file)

MessageRequester("Len/STRLen Test","Complete")

 End
 

Posted: Fri Oct 07, 2005 3:15 am
by GeoTrail
STRLen() 71chars: 515ms
Len() 71chars: 1313ms
STRLen() 1chars: 344ms
Len() 1chars: 93ms

STRLen() 71chars: 469ms
Len() 71chars: 1344ms
STRLen() 1chars: 344ms
Len() 1chars: 78ms

STRLen() 71chars: 484ms
Len() 71chars: 1313ms
STRLen() 1chars: 328ms
Len() 1chars: 94ms

Posted: Fri Oct 07, 2005 3:43 am
by Rescator
Ok, you speedfreaks are gonna get a kick out of this one!

Reduce the loops to 1 million. (otherwise this is gonna take ages as you will see)

Then change beta1 string to something REALLY long.

1 million loops used, 7999 char beta1 string. (I tried a 63999 string but the PB compiler seem to choke/bug out at source code lines that are above much longer than 8000 chars.

STRLen() 7999chars: 2744ms
Len() 7999chars: 15753ms
STRLen() 1chars: 60ms
Len() 1chars: 10ms

STRLen() 7999chars: 2904ms
Len() 7999chars: 15833ms
STRLen() 1chars: 40ms
Len() 1chars: 10ms

STRLen() 7999chars: 2744ms
Len() 7999chars: 15883ms
STRLen() 1chars: 41ms
Len() 1chars: 0ms

Very interesting. Len() is actualy very slow on long strings.
(and this was only 1 million loops rather than 10 mill as in above posts).

The 1 char reults are amusing though, I can only guess these are te result
of a CPU rebounce, or task scheduling/system optimization, or something.
(anyone got any ideas of the suddenly super fast 1char tests?)

Posted: Fri Oct 07, 2005 3:51 am
by GeoTrail
1 million loops and 7998 characters

STRLen() 7998chars: 2625ms
Len() 7998chars: 12766ms
STRLen() 1chars: 31ms
Len() 1chars: 0ms

STRLen() 7998chars: 2313ms
Len() 7998chars: 12828ms
STRLen() 1chars: 31ms
Len() 1chars: 0ms

STRLen() 7998chars: 2328ms
Len() 7998chars: 13141ms
STRLen() 1chars: 31ms
Len() 1chars: 16ms

Posted: Fri Oct 07, 2005 4:14 am
by Rescator
Awsome Geo. Well I guess that we proved that Len() is sloooow, and screaming in pain the larger the strings are.

While the MMX code seems to be very consistent,
actually since the MMX code is so consistent comparing 1 char and xxxx chars, my guess is that the main bottleneck is calling the procedure itself.

In other words, the MMX code has way more juice still before maxing out.
while with Len() it seems the function call itself is a drop in the sea compared to it's actual code speed.

Hmm! I wonder how a non-MMX ASM procedure would compare against the Len() and MMX ones?

Posted: Fri Oct 07, 2005 2:15 pm
by va!n
a little modified version... just change the lMaxChars value for different speed comparsion...

Code: Select all

 DataSection
 STRINGTBL:
 Data.l 0, 0, 0, $FF
 Data.l 0, $FFFF, 0, $FFFFFF
 Data.l 0, $FFFFFFFF, $FF, $FFFFFFFF
 Data.l $FFFF, $FFFFFFFF, $FFFFFF, $FFFFFFFF
 EndDataSection
 
 Procedure.l STRLen(*pString)
  MOV eax, *pString
  
  !pxor     mm1, mm1
  !MOV      ecx, eax
  !MOV      edx, eax
  !AND      ecx, -8
  !AND      eax, 7
  !movq     mm0, [ecx]
  !por      mm0, [l_stringtbl+eax*8]
  !@@:
  !ADD      ecx, 8
  !pcmpeqb  mm0, mm1
  !packsswb mm0, mm0
  !movd     eax, mm0
  !movq     mm0, [ecx]
  !TEST     eax, eax
  !JZ       @b
  
  !BSF      eax, eax
  !SHR      eax, 2
  !LEA      ecx, [ecx+eax-8]
  !SUB      ecx, edx
  !MOV eax, ecx
  
  !emms
  ProcedureReturn
 EndProcedure
 
 
 lMaxChars = 500
 
 file=CreateFile(#PB_Any,"c:\results.txt")
 
 RandomSeed(23) 

 For i = 1 To lMaxChars
  newchar.s = Chr(32+Random(94))
  beta1.s = beta1.s + newchar
 Next
  

 beta2.s = "P"
 
 c=0
 Repeat
  
  result1.l=0
  result2.l=0
  result3.l=0
  result4.l=0
  
  StartTime.l=0
  EndTime.l=0
  testlen.l=0
;  Delay(1000)
  
  StartTime = GetTickCount_()
  For i = 0 To 10000000
   testlen=STRLen(@beta1)
  Next
  EndTime=GetTickCount_()
  result1=EndTime-StartTime
  
  
  StartTime.l=0
  EndTime.l=0
  testlen.l=0
;  Delay(1000)
  
  StartTime = GetTickCount_()
  For i = 0 To 10000000
   testlen=Len(beta1)   
  Next
  EndTime=GetTickCount_()
  result2=EndTime-StartTime
  
  StartTime.l=0
  EndTime.l=0
  testlen.l=0
;  Delay(1000)
  
  StartTime = GetTickCount_()
  For i = 0 To 10000000
   testlen=STRLen(@beta2)
  Next
  EndTime=GetTickCount_()
  result3=EndTime-StartTime
  
  
  StartTime.l=0
  EndTime.l=0
  testlen.l=0
 ; Delay(1000)
  
  StartTime = GetTickCount_()
  For i = 0 To 10000000
   testlen=Len(beta2)   
  Next
  EndTime=GetTickCount_()
  result4=EndTime-StartTime
  
  msg$="STRLen() "+Str(Len(beta1))+"chars: "+Str(result1)+"ms"+#CRLF$
  msg$=msg$+"Len() "+Str(Len(beta1))+"chars: "+Str(result2)+"ms"+#CRLF$
  msg$=msg$+"STRLen() "+Str(Len(beta2))+"chars: "+Str(result3)+"ms"+#CRLF$
  msg$=msg$+"Len() "+Str(Len(beta2))+"chars: "+Str(result4)+"ms"+#CRLF$+#CRLF$
  
  
  WriteString(msg$)
  c=c+1
;  Delay(1000)
 Until c=3
 CloseFile(file)
 
 MessageRequester("Len/STRLen Test","Complete")
 
 End