CompareMemory 4*as fast as PB´s ;)

Share your advanced PureBasic knowledge/code with the community.
Franky
Enthusiast
Enthusiast
Posts: 213
Joined: Sat Apr 26, 2003 2:58 pm

CompareMemory 4*as fast as PB´s ;)

Post by Franky »

Run it without debugger, else it takes very long ;)

Code: Select all

Procedure O_CompareMemory(wert1.l,wert2.l,l.l)
;Erstmal die Werte in Eax und Ebx setzen
MOV eax,wert2
MOV ebx,wert1

;dann die länge in ECX und EDX laden
MOV ecx,l

;EDX erhält den wert von EDX%4
MOV edx,l
Or edx,-4
XOR edx,-4

;ECX wird durch 4 geteilt
SAR ecx,2

;Um ne endlosschleife zu verhindern, gucken ob ecx=0, dann direkt zum EDX-Part springen
CMP ecx,0
JE l_o_cm_middle

;Wenn ecx>0, wird immer Longweise, also 4 Byte auf einmal verglichen
O_CM_Begin:
  MOV edi,DWord[eax]
  CMP edi,DWord[ebx]
  JNE l_o_cm_unlike
 
  ADD eax,4
  ADD ebx,4 
  DEC ecx
JNZ l_o_cm_begin


o_cm_middle:

;Direkt weg, wenn net Modulo 4
CMP edx,0
JE l_o_cm_is
MOV ecx,0
;Hier werden die übrigen bis zu 3 Byte einzeln verglichen
O_CM_Begin_B:
  MOV cl,byte[eax]
  CMP cl,byte[ebx]
  JNE l_o_cm_unlike
 
  INC eax
  INC ebx 
  DEC edx
JNZ l_o_cm_begin_b
O_cm_is:
  ProcedureReturn 1
 
O_CM_Unlike:
  ProcedureReturn 0 
       
EndProcedure 
And a little example

Code: Select all

;Test 1 228 Byte

text1.s="Das ist ein seehr langer text mit etwa 200 Zeichen, ok, noch nicht ganz, kann aber bald was werden, wenn ich nur lange mit dem Tippen von Buchstaben in dieses Fenster fortfahre und mir nicht einreden lasse, das mache keinen Sinn"
text2.s="Das ist ein seehr langer text mit etwa 200 Zeichen, ok, noch nicht ganz, kann aber bald was werden, wenn ich nur lange mit dem Tippen von Buchstaben in dieses Fenster fortfahre und mir nicht einreden lasse, das mache keinen Sinn"
x=GetTickCount_()
For a=1 To 100000
     l1=CompareMemory(text1,text2,228)
Next
x2=GetTickCount_()
For a=1 To 100000
     l2=o_CompareMemory(@text1,@text2,228)
Next


MessageRequester("Ergebnis:","1.) "+Str(x2-x)+" , "+Str(l1)+Chr(13)+"2.) "+Str(GetTickCount_()-x2)+" , "+Str(l2))     


;Test 2, 400 kb

adra=AllocateMemory(400003)
adrb=AllocateMemory(400003)

For a=0 To 400002
       PokeB(adra+a,Random(255))
Next

CopyMemory(adra,adrb,400003)

PokeB(adra+400002,19)
x=GetTickCount_()
For a=1 To 10000
     l1=CompareMemory(adra,adrb,400003)
Next
x2=GetTickCount_()
For a=1 To 10000
     l2=o_CompareMemory(adra,adrb,400003)
Next


MessageRequester("Ergebnis:","1.) "+Str(x2-x)+" , "+Str(l1)+Chr(13)+"2.) "+Str(GetTickCount_()-x2)+" , "+Str(l2))     
Sorry for the german comments, see my signature ;)
Give Up everything but trying!
va!n
Addict
Addict
Posts: 1104
Joined: Wed Apr 20, 2005 12:48 pm

Post by va!n »

Nice work Franky! Here are my results

i added and used RandomSeed to get at every run the same strings to compare, used RandomSeed(23):

Test1: 180, 1 : 0, 1
Test2: 31906, 0 : 3469, 0

Coool!
va!n aka Thorsten

Intel i7-980X Extreme Edition, 12 GB DDR3, Radeon 5870 2GB, Windows7 x64,
va!n
Addict
Addict
Posts: 1104
Joined: Wed Apr 20, 2005 12:48 pm

Post by va!n »

@franky:
just found this source (ASM)... maybe it may help you and possible faster!? just check it out...
http://www.codeproject.com/string/optimize_strings.asp
va!n aka Thorsten

Intel i7-980X Extreme Edition, 12 GB DDR3, Radeon 5870 2GB, Windows7 x64,
Hi-Toro
Enthusiast
Enthusiast
Posts: 270
Joined: Sat Apr 26, 2003 3:23 pm

PB4 fix?

Post by Hi-Toro »

Any asm coders know how to fix this for PB4? I get a syntax error on "XOR edx,-4" here, and a 4* speedup would be very useful indeed for my current project! Hopefully it's just that line, as it compiles without error if I comment that out...
James Boyd
http://www.hi-toro.com/
Death to the Pixies!
User avatar
Demivec
Addict
Addict
Posts: 4270
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

Hi-Toro wrote:Any asm coders know how to fix this for PB4? I get a syntax error on "XOR edx,-4" here, and a 4* speedup would be very useful indeed for my current project! Hopefully it's just that line, as it compiles without error if I comment that out...
I had a problem with the line before it too. Just put an "!" before each line that isn't recognized as inline ASM.
Pupil
Enthusiast
Enthusiast
Posts: 715
Joined: Fri Apr 25, 2003 3:56 pm

Post by Pupil »

Just put a '!' as first character on that line and you should be good to go.
Hi-Toro
Enthusiast
Enthusiast
Posts: 270
Joined: Sat Apr 26, 2003 3:23 pm

Post by Hi-Toro »

Hi Demivec/Pupil,

Many thanks for the quick response! Is this because 'XOr' is a PB keyword? If so, any idea why 'Or' doesn't cause the same problem?

It doesn't *really* matter to me, as the fixed !Xor works fine (saves half a second on my duplicate file checker with certain test folders!), but I'm just curious.

Anyway, thanks again for the fix! Much appreciated.
James Boyd
http://www.hi-toro.com/
Death to the Pixies!
Derek
Addict
Addict
Posts: 2354
Joined: Wed Apr 07, 2004 12:51 am
Location: England

Post by Derek »

Can't seem to get it to not work on mine, anyway, you should read the manual concerning inline assembler as the use of ! affects the way that variables are addressed in assembler commands.
Hi-Toro
Enthusiast
Enthusiast
Posts: 270
Joined: Sat Apr 26, 2003 3:23 pm

Post by Hi-Toro »

Thanks, Derek. I really don't have the mental capacity for asm coding, so the relevant manual entry doesn't really mean much to me! Not a problem, anyway, as the 'fix' is working fine for me. I think that file/folder parsing may be more of a target for optimisation now, rather than the memory check.
James Boyd
http://www.hi-toro.com/
Death to the Pixies!
User avatar
Demivec
Addict
Addict
Posts: 4270
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

Here's a translation of Franky's code and comments:

Code: Select all

Procedure O_CompareMemory(value1.l,value2.l,l.l)
  ; Copy the values into Eax and Ebx
  MOV Eax,value2
  MOV Ebx,value1
  
  ; then load the length into ECX and EDX
  MOV Ecx,l
  
  ; EDX holds value of EDX%4 (remainder 0-3)
  MOV Edx,l
  OR Edx,-4
  XOR Edx,-4
  
  ; ECX is divided by 4
  SAR Ecx,2
  
  ; In order to prevent an infinite loop, if ecx=0, then jump directly to the EDX part
  CMP Ecx,0
  JE l_o_cm_middle
  
  ; If ecx>0, 4 bytes at one time will be compared
  O_CM_Begin:
  MOV Edi,DWord[Eax]
  CMP Edi,DWord[Ebx]
  JNE l_o_cm_unlike
  
  ADD Eax,4
  ADD Ebx,4
  DEC Ecx
  JNZ l_o_cm_begin
  
  
  o_cm_middle:
  
  ; Already finished if length is an even modulo 4
  CMP Edx,0
  JE l_o_cm_is
  MOV Ecx,0
  ; Here the remaining bytes, up to 3, are compared separately
  O_CM_Begin_B:
  MOV cl,byte[Eax]
  CMP cl,byte[Ebx]
  JNE l_o_cm_unlike
  
  INC Eax
  INC Ebx
  DEC Edx
  JNZ l_o_cm_begin_b
  O_cm_is:
  ProcedureReturn 1
  
  O_CM_Unlike:
  ProcedureReturn 0
  
EndProcedure
breeze4me
Enthusiast
Enthusiast
Posts: 633
Joined: Thu Mar 09, 2006 9:24 am
Location: S. Kor

Post by breeze4me »

shorter code. :wink:

Code: Select all

Procedure bz_CompareMemory(*src, *dst, len)
  ;CLD
  PUSH esi
  PUSH edi
  MOV esi, [p.p_src + 8]
  MOV edi, [p.p_dst + 8]
  MOV eax, [p.v_len + 8]
  
  MOV ecx, eax
  AND eax, 3    ; len % 4
  SHR ecx, 2    ; len / 4
  
  XOR edx, edx  ;set ZF
  
  REPE CMPSD
  !JNE .not_eq
  
  MOV ecx, eax
  XOR edx, edx  ;set ZF
  REPE CMPSB
  !JNE .not_eq
  POP edi
  POP esi
  ProcedureReturn 1
  
  !.not_eq:
  POP edi
  POP esi
  ProcedureReturn 0
EndProcedure

example.
almost same speed or a bit slower :?:

Code: Select all

DisableDebugger

SetPriorityClass_(GetCurrentProcess_(), #REALTIME_PRIORITY_CLASS) 
SetThreadPriority_(GetCurrentThread_(), #THREAD_PRIORITY_TIME_CRITICAL) 

Procedure O_CompareMemory(value1.l,value2.l,l.l) 
  ; Copy the values into Eax and Ebx 
  MOV Eax,value2 
  MOV Ebx,value1 
  
  ; then load the length into ECX and EDX 
  MOV Ecx,l 
  
  ; EDX holds value of EDX%4 (remainder 0-3) 
  MOV Edx,l 
  OR Edx,-4 
  XOR Edx,-4 
  
  ; ECX is divided by 4 
  SAR Ecx,2 
  
  ; In order to prevent an infinite loop, if ecx=0, then jump directly to the EDX part 
  CMP Ecx,0 
  JE l_o_cm_middle 
  
  ; If ecx>0, 4 bytes at one time will be compared 
  O_CM_Begin: 
  MOV Edi,DWord[Eax] 
  CMP Edi,DWord[Ebx] 
  JNE l_o_cm_unlike 
  
  ADD Eax,4 
  ADD Ebx,4 
  DEC Ecx 
  JNZ l_o_cm_begin 
  
  
  o_cm_middle: 
  
  ; Already finished if length is an even modulo 4 
  CMP Edx,0 
  JE l_o_cm_is 
  MOV Ecx,0 
  ; Here the remaining bytes, up to 3, are compared separately 
  O_CM_Begin_B: 
  MOV cl,byte[Eax] 
  CMP cl,byte[Ebx] 
  JNE l_o_cm_unlike 
  
  INC Eax 
  INC Ebx 
  DEC Edx 
  JNZ l_o_cm_begin_b 
  O_cm_is: 
  ProcedureReturn 1 
  
  O_CM_Unlike: 
  ProcedureReturn 0 
  
EndProcedure 

Procedure bz_CompareMemory(*src, *dst, len)
  ;CLD
  PUSH esi
  PUSH edi
  MOV esi, [p.p_src + 8]
  MOV edi, [p.p_dst + 8]
  MOV eax, [p.v_len + 8]
  
  MOV ecx, eax
  AND eax, 3    ; len % 4
  SHR ecx, 2    ; len / 4
  
  XOR edx, edx  ;set ZF
  
  REPE CMPSD
  !JNE .not_eq
  
  MOV ecx, eax
  XOR edx, edx  ;set ZF
  REPE CMPSB
  !JNE .not_eq
  POP edi
  POP esi
  ProcedureReturn 1
  
  !.not_eq:
  POP edi
  POP esi
  ProcedureReturn 0
EndProcedure


;Test 1 228 Byte 

text1.s="Das ist ein seehr langer text mit etwa 200 Zeichen, ok, noch nicht ganz, kann aber bald was werden, wenn ich nur lange mit dem Tippen von Buchstaben in dieses Fenster fortfahre und mir nicht einreden lasse, das mache keinen Sinn" 
text2.s="Das ist ein seehr langer text mit etwa 200 Zeichen, ok, noch nicht ganz, kann aber bald was werden, wenn ich nur lange mit dem Tippen von Buchstaben in dieses Fenster fortfahre und mir nicht einreden lasse, das mache keinen Sinn" 

x=GetTickCount_() 
For a=1 To 10000000 
  l1=O_CompareMemory(@text1,@text2,228) 
Next 
x2=GetTickCount_() 
For a=1 To 10000000 
  l2=bz_CompareMemory(@text1,@text2,228) 
Next 
x3=GetTickCount_()
MessageRequester("Ergebnis:", "O.  "+Str(x2-x)+" , "+Str(l1)+Chr(13)+"bz. "+Str(x3-x2)+" , "+Str(l2))      

x=GetTickCount_() 
For a=1 To 10000000 
  l1=bz_CompareMemory(@text1,@text2,228) 
Next 
x2=GetTickCount_() 
For a=1 To 10000000 
  l2=O_CompareMemory(@text1,@text2,228) 
Next 
x3=GetTickCount_()
MessageRequester("Ergebnis:", "bz. "+Str(x2-x)+" , "+Str(l1)+Chr(13)+"O.  "+Str(x3-x2)+" , "+Str(l2))      




;Test 2, 400 kb 

adra=AllocateMemory(400003) 
adrb=AllocateMemory(400003) 
For a=0 To 400002 
  PokeB(adra+a,Random(255)) 
Next 
CopyMemory(adra,adrb,400003) 
PokeB(adra+400002,19) 


x=GetTickCount_() 
For a=1 To 10000 
  l1=O_CompareMemory(adra,adrb,400003) 
Next 
x2=GetTickCount_() 
For a=1 To 10000 
  l2=bz_CompareMemory(adra,adrb,400003) 
Next 
x3=GetTickCount_()
MessageRequester("Ergebnis:","O.   "+Str(x2-x)+" , "+Str(l1)+Chr(13)+"bz. "+Str(x3-x2)+" , "+Str(l2))      

x=GetTickCount_() 
For a=1 To 10000 
  l1=bz_CompareMemory(adra,adrb,400003) 
Next 
x2=GetTickCount_() 
For a=1 To 10000 
  l2=O_CompareMemory(adra,adrb,400003) 
Next 
x3=GetTickCount_()
MessageRequester("Ergebnis:", "bz. "+Str(x2-x)+" , "+Str(l1)+Chr(13)+"O.  "+Str(x3-x2)+" , "+Str(l2))
Edit: some PUSH/POP added.
Last edited by breeze4me on Mon May 02, 2011 2:43 pm, edited 2 times in total.
Hi-Toro
Enthusiast
Enthusiast
Posts: 270
Joined: Sat Apr 26, 2003 3:23 pm

Post by Hi-Toro »

Thanks guys! Your code seems to run at exactly the same speed, breeze4me, but I've switched to it just because it looks neater in my program... :)
James Boyd
http://www.hi-toro.com/
Death to the Pixies!
User avatar
Demivec
Addict
Addict
Posts: 4270
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

breeze4me wrote:shorter code. :wink:
Thanks for updating Franky's code. I don't see why this can't be made as a change to the way PureBasic does this, since it seems so straight forward.
User avatar
Joakim Christiansen
Addict
Addict
Posts: 2452
Joined: Wed Dec 22, 2004 4:12 pm
Location: Norway
Contact:

Post by Joakim Christiansen »

Demivec wrote:I don't see why this can't be made as a change to the way PureBasic does this, since it seems so straight forward.
Agreed, if anyone submits code that works faster than the inbuilt commands then the PureBasic team should use it.
I like logic, hence I dislike humans but love computers.
Post Reply