
All times are UTC + 1 hour




 Post subject: Re: Optimizing rotations for quad
PostPosted: Wed Aug 24, 2011 6:56 pm 
PureBasic Bullfrog

Joined: Wed Jul 06, 2005 5:42 am
Posts: 8048
Location: Fort Nelson, BC, Canada
@wilbert: After quite a few tests, my results show that over the full range of rotations (0-63) your latest version beats everything so far by at least 8%. That is a significant improvement 8)

_________________
Veni, vidi, vici.


 Post subject: Re: Optimizing rotations for quad
PostPosted: Wed Aug 24, 2011 7:07 pm 
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3606
Location: Netherlands
Thanks for letting me know Netmaestro :)
What probably makes the greatest difference is the swap of eax and edx: you did it with three instructions, while I simply switched the places eax and edx were loaded from.
What surprised me with my code that uses push / pop ebx is the impact of where they are placed. I don't know much about optimizing yet, but having the push and pop close together, with no memory-accessing instruction in between, seemed faster than placing the push at the beginning and the pop at the end of the function.

A little off topic ... if you like such speed optimizations, another useful 'investigation' might be the fastest way to fill or copy a block of memory.


 Post subject: Re: Optimizing rotations for quad
PostPosted: Wed Aug 24, 2011 8:28 pm 
Addict

Joined: Sat Aug 15, 2009 6:59 pm
Posts: 1252
wilbert wrote:
A little off topic ... if you like such speed optimizations, another useful 'investigation' might be the fastest way to fill or copy a block of memory.


Code:
!push ecx
!shr ecx,2
!rep movsd
!pop ecx
!and ecx,3
!rep movsb

The size of the memory block to copy goes into ecx, the source address into esi and the destination address into edi.
Pretty basic code, but on Core i7 it's the fastest. I guess the CPU recognizes the pattern and switches to a built-in fast memory copy routine. On older CPUs, using the SSE registers with prefetching is much faster; on Core i7 this simple code beats SSE.
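For readers who do not read x86 string instructions fluently, here is a rough C model of what the snippet above does: `size >> 2` four-byte units first (the `rep movsd` part), then the remaining `size & 3` bytes (the `rep movsb` part). The function name `copy_dwords_then_bytes` is made up for this sketch, and the sketch assumes a forward copy, which on x86 corresponds to the direction flag being clear. It only shows the splitting logic and says nothing about speed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of: shr ecx,2 / rep movsd / and ecx,3 / rep movsb.
   dst plays the role of edi, src of esi, size of ecx. */
void copy_dwords_then_bytes(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = size >> 2;   /* number of whole 4-byte units   */
    size_t tail   = size & 3;    /* 0..3 bytes left over at the end */

    /* "rep movsd": move 4 bytes at a time, advancing both pointers */
    while (dwords--) {
        memcpy(d, s, 4);
        d += 4;
        s += 4;
    }

    /* "rep movsb": move the remaining bytes one at a time */
    while (tail--) {
        *d++ = *s++;
    }
}
```

With a size of 11, for example, the first loop moves two dwords (8 bytes) and the second loop moves the last 3 bytes.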


 Post subject: Re: Optimizing rotations for quad
PostPosted: Wed Aug 24, 2011 8:45 pm 
Enthusiast

Joined: Wed Apr 12, 2006 7:59 pm
Posts: 171
Location: Germany
I ran some tests and this was the fastest (no jumps!):
Code:
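Procedure.q Rotr64_(val.q, n)
  !mov eax,[esp + 4]    ;eax = low dword of val
  !mov edx,[esp + 8]    ;edx = high dword of val
  !mov ecx,[esp + 12]   ;ecx = rotation count n
  !test ecx,100000b     ;bit 5 set: (n mod 64) >= 32; test is my favorite ;-)
  !cmovnz eax,edx       ;if so, swap the halves: eax = high dword
  !cmovnz edx,[esp + 4] ;... and edx = low dword (reloaded from the stack)
  !push ebx
  !mov ebx, eax         ;keep a copy of eax for the second shrd
  !shrd eax, edx, cl    ;shrd uses only the low 5 bits of cl
  !shrd edx, ebx, cl
  !pop ebx
  ProcedureReturn
EndProcedure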

A test with "xchg eax,edx" was not faster. I use an Intel i7-2600; maybe this code is not faster on an older CPU. You can test it :)
Helle
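As an aid to following the procedure above, here is a small C model of the same idea: swap the two 32-bit halves when bit 5 of the count is set, then do two double-precision right shifts by the low 5 bits of the count. The names `shrd32`, `rotr64_model` and `rotr64_ref` are made up for this sketch; it is only meant to show why the result is a 64-bit rotate right, not to say anything about timing.

```c
#include <stdint.h>

/* Models x86 "shrd dst, src, cl": shift dst right by (count & 31) bits
   and fill the vacated top bits from the low bits of src.
   A count of 0 leaves dst unchanged, as on the CPU. */
static uint32_t shrd32(uint32_t dst, uint32_t src, unsigned count)
{
    count &= 31;
    if (count == 0) {
        return dst;
    }
    return (dst >> count) | (src << (32 - count));
}

/* Hypothetical C model of the Rotr64_ procedure above. */
uint64_t rotr64_model(uint64_t val, unsigned n)
{
    uint32_t lo = (uint32_t)val;          /* eax */
    uint32_t hi = (uint32_t)(val >> 32);  /* edx */

    /* test ecx,100000b + two cmovnz: rotating by 32 is just a swap */
    if (n & 32) {
        uint32_t tmp = lo;
        lo = hi;
        hi = tmp;
    }

    uint32_t saved = lo;          /* mov ebx, eax       */
    lo = shrd32(lo, hi, n);       /* shrd eax, edx, cl  */
    hi = shrd32(hi, saved, n);    /* shrd edx, ebx, cl  */

    return ((uint64_t)hi << 32) | lo;
}

/* Plain 64-bit rotate right, used only as a reference for comparison. */
uint64_t rotr64_ref(uint64_t val, unsigned n)
{
    n &= 63;
    return n ? (val >> n) | (val << (64 - n)) : val;
}
```

For example, rotating 0x0123456789ABCDEF right by 4 gives 0xF0123456789ABCDE with both functions. The swap handles the "32 or more" part of the count, and the two shrd steps handle the remaining 0-31 bits, which is why no jump is needed.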


 Post subject: Re: Optimizing rotations for quad
PostPosted: Wed Aug 24, 2011 9:28 pm 
PureBasic Bullfrog

Joined: Wed Jul 06, 2005 5:42 am
Posts: 8048
Location: Fort Nelson, BC, Canada
It's very fast, but on my rather weak machine (1.8 GHz Intel E2160) it loses to wilbert's latest by 5%. It's cool code; I'm still trying to figure out how it works.


