Optimizing rotations for quad
- netmaestro
- PureBasic Bullfrog
- Posts: 8433
- Joined: Wed Jul 06, 2005 5:42 am
- Location: Fort Nelson, BC, Canada
Re: Optimizing rotations for quad
@wilbert: After quite a few tests, my results show that over the full range of rotations 0-63 your latest is beating everything so far by a minimum of 8%. That is a significant improvement.
BERESHEIT
Re: Optimizing rotations for quad
Thanks for letting me know, netmaestro.
Probably the biggest difference is the swap of eax and edx, which you did with three instructions, while I simply switched the locations eax and edx were loaded from.
What surprised me about my code that uses push / pop ebx is the impact of where they are placed. I don't know much about optimizing yet, but having the push and pop close together, without any memory-accessing instruction in between, seemed faster than placing the push at the beginning and the pop at the end of the function.
A little off topic ... if you like such speed optimizations, another useful 'investigation' might be the fastest way to fill or copy a block of memory.
Re: Optimizing rotations for quad
wilbert wrote: A little off topic ... if you like such speed optimizations, another useful 'investigation' might be the fastest way to fill or copy a block of memory.
Code:
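!push ecx       ; save the byte count
!shr ecx,2      ; ecx = number of whole dwords
!rep movsd      ; copy ecx dwords from [esi] to [edi]
!pop ecx        ; restore the byte count
!and ecx,3      ; ecx = remaining 0-3 bytes
!rep movsb      ; copy the tail bytes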
Pretty basic code, but on Core i7 it's the fastest. I guess the CPU recognizes the pattern and switches to a built-in fast memory copy routine. On older CPUs, using the SSE registers with prefetching is much faster; on Core i7 this simple code beats SSE.
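For readers less familiar with the string instructions, the control flow of the snippet above can be modelled in C roughly as follows. This is only a sketch: `copy_dwords_then_bytes` is a hypothetical name, and it assumes that the caller has loaded esi, edi and ecx with source, destination and byte count, that the direction flag is clear, and that the buffers do not overlap (none of this setup is shown in the post). A C compiler will not necessarily emit rep movsd for these loops, so it says nothing about speed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the rep movsd / rep movsb snippet.
   Assumes non-overlapping buffers and a forward copy. */
static void copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = n >> 2;   /* shr ecx,2 : number of whole 4-byte units */
    size_t tail   = n & 3;    /* and ecx,3 : 0..3 leftover bytes */

    while (dwords--) {        /* rep movsd */
        uint32_t tmp;
        memcpy(&tmp, s, 4);   /* memcpy sidesteps alignment and aliasing issues */
        memcpy(d, &tmp, 4);
        s += 4;
        d += 4;
    }
    while (tail--)            /* rep movsb */
        *d++ = *s++;
}
```

The point of the split is that the bulk of the block moves four bytes per step, and at most three single-byte steps are needed for the remainder.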
Re: Optimizing rotations for quad
I ran some tests and this was the fastest (no jumps!):
A test with "xchg eax,edx" was not faster. I use an Intel i7-2600; this code may not be faster on an older CPU. You can test it!
Helle
Code:
Procedure.q Rotr64_(val.q, n)
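!mov eax,[esp + 4]      ; eax = low dword of val
!mov edx,[esp + 8]      ; edx = high dword of val
!mov ecx,[esp + 12]     ; ecx = n
!test ecx,100000b ;test is my favorite ;-)
!cmovnz eax,edx         ; bit 5 set (n >= 32): eax = high dword
!cmovnz edx,[esp + 4]   ; ... and edx = low dword, i.e. rotate by 32
!push ebx
!mov ebx, eax           ; keep the old low half for the second shrd
!shrd eax, edx, cl      ; shrd uses cl mod 32 as the shift count
!shrd edx, ebx, cl
!pop ebx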
ProcedureReturn
EndProcedure
Helle
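As an aid to following the Rotr64_ procedure above, here is a rough C model of what it appears to do. It is a sketch under assumptions, not the author's code: `shrd32`, `rotr64_model` and `rotr64_ref` are hypothetical names, and the model assumes that [esp + 4] and [esp + 8] hold the low and high dword of val and that [esp + 12] holds n. The idea is that bit 5 of n selects a rotation by 32, done by swapping the two halves, and the two shrd instructions then handle the remaining 0-31 bits.

```c
#include <stdint.h>

/* Model of x86 SHRD r32, r32, cl: shift dst right by (count mod 32),
   filling the vacated high bits from the low bits of src. */
static uint32_t shrd32(uint32_t dst, uint32_t src, unsigned count)
{
    count &= 31;                       /* the CPU masks cl to 5 bits */
    if (count == 0)
        return dst;                    /* avoids an undefined 32-bit shift in C */
    return (dst >> count) | (src << (32 - count));
}

/* Hypothetical C model of the Rotr64_ procedure. */
static uint64_t rotr64_model(uint64_t val, unsigned n)
{
    uint32_t lo = (uint32_t)val;          /* mov eax,[esp + 4] */
    uint32_t hi = (uint32_t)(val >> 32);  /* mov edx,[esp + 8] */

    if (n & 32) {                         /* test ecx,100000b */
        uint32_t t = lo;                  /* the cmovnz pair swaps the halves, */
        lo = hi;                          /* which is a rotation by 32 bits    */
        hi = t;
    }
    uint32_t old_lo = lo;                 /* mov ebx,eax */
    lo = shrd32(lo, hi, n);               /* shrd eax,edx,cl */
    hi = shrd32(hi, old_lo, n);           /* shrd edx,ebx,cl */
    return ((uint64_t)hi << 32) | lo;
}

/* Plain reference rotation, for comparison with the model. */
static uint64_t rotr64_ref(uint64_t val, unsigned n)
{
    n &= 63;
    return n ? (val >> n) | (val << (64 - n)) : val;
}
```

Comparing `rotr64_model` with `rotr64_ref` for every count from 0 to 63 is a simple way to check the reasoning. The model only describes the logic and does not reflect the timing differences discussed in this thread.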
- netmaestro
Re: Optimizing rotations for quad
It's very fast, but on my rather weak machine (1.8 GHz Intel E2160) it's losing to wilbert's latest by 5%. It's cool code; I'm still trying to figure out how it works.
BERESHEIT