PureBasic Forums - English

Posted: **Wed Aug 24, 2011 6:56 pm**

@wilbert: After quite a few tests, my results show that, across the full range of rotations 0-63, your latest beats everything so far by a minimum of 8%. That is a significant improvement.

Posted: **Wed Aug 24, 2011 7:07 pm**

Thanks for letting me know, Netmaestro.

The greatest difference is probably the swap of eax and edx: you did it with three instructions, while I simply switched the locations eax and edx were loaded from.
What surprised me about my code that uses push / pop ebx is the impact of where they are placed. I don't know much about optimizing yet, but keeping the push and pop close together, with no memory-accessing instruction in between, seemed faster than placing the push at the beginning and the pop at the end of the function.

A little off topic ... if you like such speed optimizations, another useful 'investigation' might be the fastest way to fill or copy a block of memory.

Posted: **Wed Aug 24, 2011 8:28 pm**

wilbert wrote: A little off topic ... if you like such speed optimizations, another useful 'investigation' might be the fastest way to fill or copy a block of memory.

Code: Select all

!push ecx
!shr ecx,2
!rep movsd
!pop ecx
!and ecx,3
!rep movsb

Size of memory block to copy goes into ecx. Source address goes into esi and destination address into edi.
Pretty basic code, but on Core i7 it's the fastest. I guess the CPU recognizes the algorithm and switches to a built-in fast memory copy. On older CPUs, using the SSE registers with prefetching is much faster; on Core i7 this simple code beats SSE.
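
For anyone reading along, here is a rough C model of what the six instructions above compute. It is only a sketch for following the logic, not a replacement for the asm: the function name `copy_dwords_then_bytes` is made up for illustration, and the loops stand in for `rep movsd` / `rep movsb` (which also assume the direction flag is clear, so both pointers move forward).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: portable model of the rep movsd / rep movsb split.
   dst plays the role of edi, src of esi, n of ecx. */
static void copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = n >> 2;   /* shr ecx,2 : number of whole 4-byte units */
    size_t tail   = n & 3;    /* and ecx,3 : 0..3 leftover bytes          */

    while (dwords--) {        /* rep movsd: copy 4 bytes per step */
        uint32_t w;
        memcpy(&w, s, 4);     /* memcpy avoids unaligned-access issues in C */
        memcpy(d, &w, 4);
        s += 4;
        d += 4;
    }
    while (tail--) {          /* rep movsb: copy the remaining bytes */
        *d++ = *s++;
    }
}
```

The point is only the split: the size shifted right by 2 gives the dword count, and the size masked with 3 gives the leftover bytes, so every byte is copied exactly once.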

Posted: **Wed Aug 24, 2011 8:45 pm**

I ran some tests and this was the fastest (no jumps!):

Code: Select all

Procedure.q Rotr64_(val.q, n)
  !mov eax,[esp + 4]
  !mov edx,[esp + 8]
  !mov ecx,[esp + 12]
  !test ecx,100000b     ;test is my favorite ;-) 
  !cmovnz eax,edx
  !cmovnz edx,[esp + 4]
  !push ebx
  !mov ebx, eax
  !shrd eax, edx, cl
  !shrd edx, ebx, cl
  !pop ebx
  ProcedureReturn
EndProcedure

A test with "xchg eax,edx" was not faster. I use an Intel i7-2600; this code may not be faster on an older CPU. You can test it!
Helle

Posted: **Wed Aug 24, 2011 9:28 pm**

It's very fast, but on my rather weak machine (1.8 GHz Intel E2160) it's losing to wilbert's latest by 5%. It's cool code; I'm still trying to figure out how it works.
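
One way to follow Helle's routine is to model it step by step in plain C. The sketch below is an assumption-laden reading of the asm, not Helle's own explanation: `shrd32` and `rotr64_model` are made-up names, and `shrd32` imitates the 32-bit `shrd` instruction, which only uses the low 5 bits of cl.

```c
#include <stdint.h>

/* Hypothetical model of "shrd dst, src, cl" for 32-bit operands.
   The CPU masks the count to 5 bits; a count of 0 leaves dst unchanged. */
static uint32_t shrd32(uint32_t dst, uint32_t src, unsigned count)
{
    count &= 31;
    if (count == 0)
        return dst;               /* also avoids an undefined 32-bit shift in C */
    return (dst >> count) | (src << (32 - count));
}

/* Hypothetical model of Rotr64_: rotate a 64-bit value right by n (0-63). */
static uint64_t rotr64_model(uint64_t val, unsigned n)
{
    uint32_t lo = (uint32_t)val;          /* mov eax,[esp + 4]  */
    uint32_t hi = (uint32_t)(val >> 32);  /* mov edx,[esp + 8]  */

    if (n & 32) {                         /* test ecx,100000b   */
        uint32_t t = lo;                  /* the two cmovnz swap the halves: */
        lo = hi;                          /* a rotation by 32 is just a swap */
        hi = t;
    }

    uint32_t saved = lo;                  /* mov ebx, eax       */
    lo = shrd32(lo, hi, n);               /* shrd eax, edx, cl  */
    hi = shrd32(hi, saved, n);            /* shrd edx, ebx, cl  */

    return ((uint64_t)hi << 32) | lo;     /* result in edx:eax  */
}
```

Read this way, the `test` / `cmovnz` pair handles the "rotate by 32" part without a jump, and the two `shrd` instructions handle the remaining 0-31 bits, each pulling in the bits shifted out of the other half. The copy in ebx is needed because the first `shrd` overwrites eax before the second one reads it.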

Optimizing rotations for quad
