CompareMemory() optimizations

Posted: Thu Apr 27, 2023 1:12 am
by technopol
I imagine CompareMemory() stops at the first difference in the data, but does it?
Also, is it quadword-optimized (with padding so that any byte length works)?
Which is faster?

Code: Select all

success = CompareMemory(*passBuf,*netBuf,passLen)
OR

Code: Select all

;( passLen already a multiple of 8 )
success = #True
For i = 0 To passLen - 8 Step 8
  If PeekQ(*passBuf+i) <> PeekQ(*netBuf+i)
    success = #False : Break
  EndIf
Next i
// Code Tags added (Kiffi)

Re: CompareMemory() optimizations

Posted: Thu Apr 27, 2023 1:32 am
by jacdelad
Please use code tags and provide runnable code!

CompareMemory stops as soon as further comparison isn't necessary (=the first difference).

Code: Select all

#Cycles=1000
*Mem1=AllocateMemory(1048576)
*Mem2=AllocateMemory(1048576)
For i=0 To 1048575
  RandomByte=Random(255,0)
  PokeA(*Mem1+i,RandomByte)
  PokeA(*Mem2+i,RandomByte)
Next

Timer=ElapsedMilliseconds()
For counter=1 To #Cycles
  result=CompareMemory(*Mem1,*Mem2,1048576)
Next
Timer1=ElapsedMilliseconds()-Timer

Timer=ElapsedMilliseconds()
For counter=1 To #Cycles
  success=#True
  For i=0 To 1048575 Step 8
    If PeekQ(*Mem1+i)<>PeekQ(*Mem2+i)
      success=#False
      Break
    EndIf
  Next
Next
Timer2=ElapsedMilliseconds()-Timer

MessageRequester("Result","Result:"+#CRLF$+"CompareMemory: "+Str(Timer1)+"ms"+#CRLF$+"Own function: "+Str(Timer2)+"ms",#PB_MessageRequester_Info)
The results speak for themselves.
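
The early-exit behaviour can also be checked directly. Here is a minimal sketch, assuming that timing three cases (identical buffers, a difference in the first byte, a difference in the last byte) is enough to show it. TimeCompare() and the buffer names are hypothetical helpers made up for this illustration, not part of PureBasic or of the code above.

```purebasic
EnableExplicit

#Size   = 1048576
#Cycles = 1000

; Hypothetical helper: time #Cycles calls of CompareMemory() on two buffers.
Procedure.q TimeCompare(*x, *y, size)
  Protected n, result
  Protected start.q = ElapsedMilliseconds()
  For n = 1 To #Cycles
    result = CompareMemory(*x, *y, size)
  Next
  ProcedureReturn ElapsedMilliseconds() - start
EndProcedure

Define *ref, *same, *headDiff, *tailDiff
*ref      = AllocateMemory(#Size)
*same     = AllocateMemory(#Size)
*headDiff = AllocateMemory(#Size)
*tailDiff = AllocateMemory(#Size)

If *ref = 0 Or *same = 0 Or *headDiff = 0 Or *tailDiff = 0
  End ; allocation failed
EndIf

; Start from four identical copies of the same random data.
RandomData(*ref, #Size)
CopyMemory(*ref, *same, #Size)
CopyMemory(*ref, *headDiff, #Size)
CopyMemory(*ref, *tailDiff, #Size)

; Flip one byte at the very start and one at the very end.
PokeA(*headDiff, PeekA(*headDiff) ! $FF)
PokeA(*tailDiff + #Size - 1, PeekA(*tailDiff + #Size - 1) ! $FF)

Define msg.s
msg = "Identical buffers: " + Str(TimeCompare(*ref, *same, #Size)) + "ms" + #CRLF$
msg + "First byte differs: " + Str(TimeCompare(*ref, *headDiff, #Size)) + "ms" + #CRLF$
msg + "Last byte differs: " + Str(TimeCompare(*ref, *tailDiff, #Size)) + "ms"
MessageRequester("Early exit", msg, #PB_MessageRequester_Info)
```

If CompareMemory() really stops at the first difference, the "first byte differs" case should take almost no time, while the "last byte differs" case should take about as long as comparing identical buffers.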

Re: CompareMemory() optimizations

Posted: Thu Apr 27, 2023 1:59 am
by idle
jacdelad wrote: Thu Apr 27, 2023 1:32 am
Please use code tags and provide runnable code!

CompareMemory stops as soon as further comparison isn't necessary (=the first difference).
I didn't know it did that! :shock:

Re: CompareMemory() optimizations

Posted: Thu Apr 27, 2023 3:07 am
by technopol
Thanks a million times for the quick and very elaborate reply!

I'm sorry about the unrunnable code; it was just part of the question, not meant to be run. Even though I've been an avid PureBasic user since 1998, this is the first time I've read about code tags (I'm not a big user of the forum; I know I should be). Where can I find documentation on the forum's tags? Is it phpBB?
jacdelad wrote: Thu Apr 27, 2023 1:32 am
Please use code tags and provide runnable code!

CompareMemory stops as soon as further comparison isn't necessary (=the first difference).

Re: CompareMemory() optimizations

Posted: Thu Apr 27, 2023 3:31 am
by jacdelad
The editor itself offers them, right next to Bold/Italic/Underline etc.

An avid user since 1998 and only 15 posts? What have you been doing all this time in your cave?

Re: CompareMemory() optimizations

Posted: Thu Apr 27, 2023 6:30 am
by technopol
Because of industrial secrets and patent applications not yet filed, I couldn't share anything from my work, as I am always working on my inventions. Even the truncated code I just shared had to be extracted and simplified, with all the variable names changed. And since I started programming some 53 years ago on this monster, https://www.hpmuseum.org/hp9100.htm, then moved to Basic and Assembler 10 years later on an army of ZX80/81s and TS1000s (most of which I destroyed with my own expansion boards), I very rarely (or never) need advice, only more information. And because of my Asperger syndrome, I'm not much into giving advice, as I always work (and live) alone.

But I can name a few things I have programmed specifically with PureBasic (if Fred knew all I've done with his software, he would ask me for a few million):
On Amiga (most of them with added routines I wrote with HiSoft's Devpac Assembler)
- (1998) HoloEmulator™ holographic simulator with 3D shutter glasses, 60fps virtual reality screen update, 1.3 sec full 30 views 3D image loading, DCTV output
- (1999) VirtualScanLab™ automated 3D object/hologram scanner/cataloger, 30" robotic rotating plate with my custom stepper driver, VLab Motion Amiga board, linked to the HoloEmulator
- (1999) Comfy3D Compositing (unfinished) automated animation gel scanner and 3D animator/compositor for ALL old Disney animation for Disney Channel US
- (1999) an unnamed high-speed 3D and/or animated lenticular image interlacer (for lithographic printing directly on PETE lenticular plastics)
- (2000) DRIP™ (Dense Raster Image Processor) for FULL COLOR LOW NOISE ZERO MOIRE ultra-high definition lithographic printing with pure 2D 600x1200 pixels per inch resolution, or 3D 150x4800 ppi for encoding up to 63 images on 75.5 lpi lenticular plastic lenses; patent accepted under my name in the US and Europe (brought me more Disney business and new customers: Sony, Microsoft, Universal Studios, Lucasfilm, TV Guide, The Simpsons, Amex, US Army, Warner Bros., Paramount, Toyota, BASF, Marlboro, Audi, Kellogg's, PepsiCo, The Matrix, The Mummy, The Mummy Returns,...; won 1st prize at the PIA contest with it)
- (2002, on an Amiga emulator running on big PCs) DRIP™ Composite, like QuarkXPress but with 0.000104" or 9600 dpi image placement resolution (100x more)
- fully automated robotic 360° camera mount for the Canon EOS 5D Mark III, taking 36 images per Google Street View spot, with stitchless ultra-HDR software (image merge done in the linear domain, before the logarithmic curve); the program generates a stereo soundtrack that controls the stepper motor and the camera shutter through a radically simple battery-powered electronic circuit and a little MP3 player.
- ...
On Dec Alpha 21164 WindowsNT (Dual 21164 CPU with 128-bit memory path monsters!)
- ...
And on PCs!!... ...it's very late and I need to sleep; sorry, the rest of the story another day (and I'm not talking about things done without PureBasic, on other platforms and in fields other than computer science, like analog and digital electronics, electroacoustics, optics, music, synthesizers, sound studios, photography, NLE video editing, audio FX, agrotechnology, AI systems, light sequencers/controllers/drivers/installations, and my eyes are closed)

Re: CompareMemory() optimizations

Posted: Thu Apr 27, 2023 8:18 am
by IceSoft
Original:
CompareMemory: 97ms
Own function: 423ms
Slightly optimized (but still limited by a PB bottleneck):
CompareMemory: 98ms
Own function: 262ms

The big bottleneck is this part:
*Mem11.quads = *Mem1+i
*Mem22.quads = *Mem2+i

If *Mem11\quad<>*Mem22\quad
@fred
Maybe direct support for this kind of syntax would give a performance boost:
If *Mem11+i\quad<>*Mem22+i\quad
Here is the faster version, which still has the bottleneck:

Code: Select all

#Cycles=1000


Structure quads
  quad.q
EndStructure



*Mem1=AllocateMemory(1048576)
*Mem2=AllocateMemory(1048576)
For i=0 To 1048575
  RandomByte=Random(255,0)
  PokeA(*Mem1+i,RandomByte)
  PokeA(*Mem2+i,RandomByte)
Next

Timer=ElapsedMilliseconds()
For counter=1 To #Cycles
  result=CompareMemory(*Mem1,*Mem2,1048576)
Next
Debug result
Timer1=ElapsedMilliseconds()-Timer

Timer=ElapsedMilliseconds()
For counter=1 To #Cycles
  success=#True
  For i=0 To 1048575 Step 8
    *Mem11.quads = *Mem1+i
    *Mem22.quads = *Mem2+i

    If *Mem11\quad<>*Mem22\quad
      success=#False
      Break
    EndIf
  Next
Next
Timer2=ElapsedMilliseconds()-Timer

MessageRequester("Result","Result:"+#CRLF$+"CompareMemory: "+Str(Timer1)+"ms"+#CRLF$+"Own function: "+Str(Timer2)+"ms",#PB_MessageRequester_Info)

Re: CompareMemory() optimizations

Posted: Thu Apr 27, 2023 8:42 am
by Fred
I checked the current CompareMemory() code and we weren't using memcmp(), which is twice as fast as our custom code. I've changed it for the next beta.

Re: CompareMemory() optimizations

Posted: Thu Apr 27, 2023 11:38 am
by idle
If memcmp() is twice as fast as what you currently use, then you can probably get it 4 times faster using SSE.
The mcmp() function here runs in the same time as memcmp().
Compile this with the C backend with optimization enabled. The mcmp() function isn't complete: it would need a jump table to account for remainders.
Note that the prototype is only needed to stop the C optimizer from replacing the function call with its result.
CompareMemory 94 mcmp 45 *a=*b:1048576 *a<>*c: 0

Code: Select all


EnableExplicit 

Global *a,*b,*c,size,lp,a 

size = 1024*1024 
lp = 1000 

*a=AllocateMemory(size)  
*b=AllocateMemory(size) 
*c=AllocateMemory(size) 

RandomSeed(1)
RandomData(*a,size)    ;*a = *b 
RandomSeed(1)
RandomData(*b,size)
RandomData(*c,size)    ;*a <> *c     

ImportC "" 
  memcmp(*a,*b,size) 
EndImport   

Procedure mcmp(*a,*b,size) 
    Protected pt,*pa.quad,*pb.quad  
    *pa = *a 
    *pb = *b
    While (*pa\q ! *pb\q) = 0 
      *pa+8
      *pb+8 
      pt+8 
      If pt = size 
        ProcedureReturn size  
      EndIf   
    Wend 
EndProcedure   

Prototype pmcmp(*a,*b,size)
Global pmcmp.pmcmp = @mcmp() 

Global st,st1,et,et1 

st = ElapsedMilliseconds() 
For a = 0 To lp 
  CompareMemory(*a,*b,size)
  ;memcmp(*a,*b,size)
Next 
et = ElapsedMilliseconds() 

st1  = ElapsedMilliseconds() 
For a = 0 To lp 
  pmcmp(*a,*b,size) 
Next 
et1 = ElapsedMilliseconds() 

Global out.s = " CompareMemory " + Str(et-st) + " mcmp " + Str(et1-st1) + " *a=*b:" + Str(mcmp(*a,*b,size)) + " *a<>*c: " + Str(mcmp(*a,*c,size))

SetClipboardText(out)
MessageRequester("times",out) 
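
For sizes that are not a multiple of 8, the remainder mentioned above has to be handled somehow. The following is only a sketch: it swaps the jump table for a plain byte loop over the 0 to 7 leftover bytes, which is simpler but not necessarily as fast. CompareQuadTail() is a hypothetical name chosen for this illustration, and unlike mcmp() it returns #True or #False instead of the size.

```purebasic
EnableExplicit

; Sketch only: compare in 8-byte steps, then finish the 0..7 leftover bytes
; with a simple byte loop (a stand-in for the jump table, not the same thing).
; Returns #True if both memory ranges are equal, #False otherwise.
Procedure.i CompareQuadTail(*a, *b, size)
  Protected *pa.Quad, *pb.Quad
  Protected *ta.Ascii, *tb.Ascii
  Protected quads, tail
  
  *pa = *a
  *pb = *b
  quads = size >> 3   ; number of whole 8-byte blocks
  tail  = size & 7    ; leftover bytes after the last whole block
  
  While quads > 0
    If *pa\q <> *pb\q
      ProcedureReturn #False
    EndIf
    *pa + 8
    *pb + 8
    quads - 1
  Wend
  
  ; Continue byte by byte from where the quad loop stopped.
  *ta = *pa
  *tb = *pb
  While tail > 0
    If *ta\a <> *tb\a
      ProcedureReturn #False
    EndIf
    *ta + 1
    *tb + 1
    tail - 1
  Wend
  
  ProcedureReturn #True
EndProcedure

; Usage: 21 bytes = 2 whole quads + 5 leftover bytes.
Define *x, *y
*x = AllocateMemory(21)
*y = AllocateMemory(21)
If *x And *y
  RandomData(*x, 21)
  CopyMemory(*x, *y, 21)
  Debug CompareQuadTail(*x, *y, 21)      ; buffers are identical here
  PokeA(*y + 20, PeekA(*y + 20) ! $FF)   ; change only the last leftover byte
  Debug CompareQuadTail(*x, *y, 21)      ; buffers now differ in the tail
EndIf
```

The byte loop costs at most seven extra comparisons per call, so for large buffers its effect on the timings above should be negligible; whether a jump table would be measurably faster is untested here.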


Re: CompareMemory() optimizations

Posted: Thu Apr 27, 2023 6:55 pm
by technopol
Great news! Thank you, Fred.
Fred wrote: Thu Apr 27, 2023 8:42 am
I checked the current CompareMemory() code and we weren't using memcmp(), which is twice as fast as our custom code. I've changed it for the next beta.