Try this. It replaces the main loop in your code, but you will still need to add the part before it (getting into alignment) and the part after it (handling the leftover bytes). Because it works on 2048 bytes at a time, there may be as many as 2047 leftover bytes.
'on entry to this part xmm0 contains 16x the byte to ...
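The three-part structure described above (align, process whole blocks, mop up the leftovers) might be sketched in C roughly as follows. This is only an illustration, not the code from the post: `count_char`, `count_block` and the 16-byte alignment target are assumptions, and `count_block` is a plain scalar stand-in for the optimized main loop.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 2048u /* block size mentioned in the post */
#define ALIGN_BYTES 16u   /* assumed alignment target for SSE loads */

/* Hypothetical stand-in for the optimized main loop: counts `ch` in
   exactly BLOCK_BYTES bytes starting at p. */
static size_t count_block(const unsigned char *p, unsigned char ch)
{
    size_t n = 0;
    for (size_t i = 0; i < BLOCK_BYTES; i++)
        n += (p[i] == ch);
    return n;
}

size_t count_char(const void *buf, size_t len, unsigned char ch)
{
    const unsigned char *p = buf;
    size_t n = 0;

    /* Part 1: one byte at a time until p is aligned (or data runs out). */
    while (len > 0 && ((uintptr_t)p % ALIGN_BYTES) != 0) {
        n += (*p++ == ch);
        len--;
    }

    /* Part 2: whole 2048-byte blocks, handed to the main loop. */
    while (len >= BLOCK_BYTES) {
        n += count_block(p, ch);
        p += BLOCK_BYTES;
        len -= BLOCK_BYTES;
    }

    /* Part 3: leftover bytes, at most BLOCK_BYTES - 1 = 2047 of them. */
    while (len > 0) {
        n += (*p++ == ch);
        len--;
    }
    return n;
}
```

Calling `count_char(text, length, 'x')` returns the number of `'x'` bytes whatever the starting alignment of `text`, because the first and last loops absorb whatever the block loop cannot take.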
Search found 48 matches
- Sat Mar 27, 2010 4:33 pm
- Forum: Tricks 'n' Tips
- Topic: Count char in string, ultra fast SSE4.2
- Replies: 28
- Views: 9561
- Fri Mar 26, 2010 11:16 pm
- Forum: Tricks 'n' Tips
- Topic: Count char in string, ultra fast SSE4.2
- Replies: 28
- Views: 9561
Re: Count char in string, ultra fast SSE4.2
!pxor xmm6,xmm6 'zero xmm6
!psadbw xmm6,xmm7 'sum the 8 upper and 8 lower bytes
!movdqa xmm7,xmm6 'take a copy
!psrldq xmm6,8 'shift to line up
!paddw xmm7,xmm6 'add the 2 8byte sums to give a 16 byte sum
'the low word of xmm7 should now contain the sum required.
'but it needs correcting ...
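For readers more used to C, the same horizontal byte sum can be sketched with SSE2 intrinsics. This is a rough equivalent of the instruction sequence above, not the poster's code: `hsum_bytes` is a made-up name, and the correction mentioned at the end of the snippet is not shown here.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sum the 16 unsigned bytes held in v. */
static uint32_t hsum_bytes(__m128i v)
{
    __m128i zero = _mm_setzero_si128();    /* pxor: an all-zero register */
    __m128i sad  = _mm_sad_epu8(v, zero);  /* psadbw: one sum for the low 8 bytes,
                                              one for the high 8 bytes */
    __m128i high = _mm_srli_si128(sad, 8); /* psrldq: move the high sum down */
    __m128i sum  = _mm_add_epi16(sad, high); /* paddw: add the two 8-byte sums */

    /* The low word now holds the 16-byte sum (at most 16 * 255 = 4080). */
    return (uint32_t)_mm_cvtsi128_si32(sum) & 0xFFFFu;
}
```

Because `psadbw` against zero is just a sum of absolute values of unsigned bytes, it acts as a byte adder that cannot overflow its 16-bit result.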
- Fri Mar 26, 2010 2:52 pm
- Forum: Tricks 'n' Tips
- Topic: Count char in string, ultra fast SSE4.2
- Replies: 28
- Views: 9561
Re: Count char in string, ultra fast SSE4.2
I know little about the i7 and I don't have one to test but if the code is to be run on a range of
processors then i7 quirks can't be relied on. PREFETCH will benefit most CPUs and it shouldn't harm the i7 performance.
I did see the 100,000 but that gave unreasonably high figures. Maximum ...
- Thu Mar 25, 2010 3:14 am
- Forum: Tricks 'n' Tips
- Topic: Count char in string, ultra fast SSE4.2
- Replies: 28
- Views: 9561
Re: Count char in string, ultra fast SSE4.2
What am I missing? There was a contest a few years ago here:
http://www.powerbasic.com/support/forums/Forum12/HTML/001733.html
Some of the code submitted there scans a 4MB file and counts ALL characters, not just 1, and it runs in 200ms on a really old processor, faster than any of the code posted ...
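Counting every character at once usually means keeping one counter per byte value. A minimal C sketch of that idea is below; it is not any of the contest entries, and `count_all_bytes` is a name chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill counts[b] with the number of times byte value b occurs in buf. */
void count_all_bytes(const unsigned char *buf, size_t len, uint64_t counts[256])
{
    for (int i = 0; i < 256; i++)
        counts[i] = 0;

    /* One table increment per input byte, regardless of its value. */
    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;
}
```

After one pass over the buffer, `counts['e']` (for example) holds the number of `'e'` bytes, so a single scan answers the question for all 256 values.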
- Fri Oct 10, 2008 1:28 pm
- Forum: General Discussion
- Topic: PureBench32 progress laughable, consistancy question!
- Replies: 31
- Views: 5553
- Fri Oct 10, 2008 11:44 am
- Forum: General Discussion
- Topic: PureBench32 progress laughable, consistancy question!
- Replies: 31
- Views: 5553
Both code and data alignment are important to timings but I can't imagine a modern compiler not aligning 32-bit data such as longs and floats on a 32-bit (4 byte) boundary. Some compilers may misalign larger data such as doubles as they should be aligned on an 8 byte boundary for best performance ...
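One way to see what a particular compiler actually does is to check addresses and field offsets directly. The C sketch below is only an illustration: `is_aligned`, `struct sample` and its fields are hypothetical, and the result of the field check depends on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p lies on a multiple of `alignment` bytes. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical record mixing small and large fields. */
struct sample {
    char    tag;
    int32_t count;  /* wants a 4-byte boundary */
    char    flag;
    double  value;  /* wants an 8-byte boundary for best performance */
};

/* Nonzero if this compiler placed both fields on their natural
   boundaries within the struct; the answer is platform dependent. */
static int sample_fields_naturally_aligned(void)
{
    return offsetof(struct sample, count) % sizeof(int32_t) == 0
        && offsetof(struct sample, value) % sizeof(double) == 0;
}
```

`is_aligned(&x, 8)` can be applied to any variable of interest, and `sample_fields_naturally_aligned()` reports whether the compiler padded the struct so that the `double` lands on an 8-byte offset.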
- Thu Oct 09, 2008 10:49 pm
- Forum: General Discussion
- Topic: PureBench32 progress laughable, consistancy question!
- Replies: 31
- Views: 5553
Code alignment/misalignment can easily change timings for small loops by as much as 20%.
You can demonstrate alignment differences by inserting ASM NOP statements immediately ahead of your loop. Time the loop, insert one more NOP, time the loop, insert one more NOP .. until you have 16 or 32 NOPs and ...
- Wed Sep 24, 2008 1:26 pm
- Forum: Coding Questions
- Topic: Chess Moves in a byte
- Replies: 55
- Views: 15285
Please, do add an unused padding byte element at the end so your structure ends up being 32bits long instead of 24. You can benchmark if you want, but you'll always see it's faster this way.
This is not always the case.
Using 4 bytes/structure instead of 3 requires 33% more memory reads which can ...
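The memory side of that trade-off is easy to put numbers on. In the C sketch below the struct names and fields are invented for illustration (the thread's actual layout is not shown here); it only compares how many bytes a table of each kind occupies.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 3-byte record: three byte-sized fields, no padding needed. */
struct move3 { uint8_t from, to, flags; };

/* The same record with an unused padding byte to round it up to 32 bits. */
struct move4 { uint8_t from, to, flags, pad; };

/* Bytes that must be read to scan a table of `count` elements. */
static size_t table_bytes(size_t elem_size, size_t count)
{
    return elem_size * count;
}
```

For one million records, `table_bytes(sizeof(struct move3), 1000000)` is 3,000,000 bytes and `table_bytes(sizeof(struct move4), 1000000)` is 4,000,000 bytes, which is the 33% extra memory traffic mentioned above. Whether that outweighs the simpler indexing of a 4-byte element is something only a benchmark on the target machine can settle.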
- Wed Sep 24, 2008 1:21 pm
- Forum: Coding Questions
- Topic: Circle to Line collision
- Replies: 8
- Views: 2406
- Mon Sep 22, 2008 7:10 pm
- Forum: Coding Questions
- Topic: Circle to Line collision
- Replies: 8
- Views: 2406
- Sun Sep 21, 2008 3:15 pm
- Forum: Coding Questions
- Topic: Don't hassle the Huff'
- Replies: 6
- Views: 1634
- Sat Sep 20, 2008 5:22 pm
- Forum: Coding Questions
- Topic: maths functions
- Replies: 37
- Views: 9410
- Sat Sep 20, 2008 3:12 pm
- Forum: Coding Questions
- Topic: maths functions
- Replies: 37
- Views: 9410
- Fri Sep 19, 2008 9:56 pm
- Forum: Coding Questions
- Topic: maths functions
- Replies: 37
- Views: 9410
- Fri Sep 19, 2008 2:25 pm
- Forum: Coding Questions
- Topic: maths functions
- Replies: 37
- Views: 9410