Search found 48 matches

by Road Runner
Sat Mar 27, 2010 4:33 pm
Forum: Tricks 'n' Tips
Topic: Count char in string, ultra fast SSE4.2
Replies: 28
Views: 9561

Re: Count char in string, ultra fast SSE4.2

Try this. It replaces the main loop in your code but will still need you to add the part before (while getting into alignment) and after (for the leftover bytes). Because it's done 2048 bytes at a time the leftover bytes may be as many as 2047.

'on entry to this part xmm0 contains 16x the byte to ...
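
The head / main loop / leftover split described in this post can be sketched in Python. This is only an illustration of the structure, not the SSE code from the thread: `count_byte`, `align` and `start_addr` are hypothetical names, and `start_addr` merely stands in for the real buffer address.

```python
BLOCK = 2048  # block size taken from the post


def count_byte(data, target, align=16, start_addr=0):
    """Count occurrences of target using a head, whole blocks, and a tail."""
    # Head: bytes before the first aligned position.
    # start_addr is a hypothetical stand-in for the real buffer address.
    head_len = min((-start_addr) % align, len(data))
    total = data[:head_len].count(target)

    # Main loop: whole 2048-byte blocks only.
    pos = head_len
    while len(data) - pos >= BLOCK:
        total += data[pos:pos + BLOCK].count(target)
        pos += BLOCK

    # Tail: whatever is left, at most BLOCK - 1 = 2047 bytes.
    tail = data[pos:]
    assert len(tail) < BLOCK
    total += tail.count(target)
    return total
```

The tail can never reach 2048 bytes because the main loop would have consumed another block first, which is where the 2047 figure comes from.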
by Road Runner
Fri Mar 26, 2010 11:16 pm
Forum: Tricks 'n' Tips
Topic: Count char in string, ultra fast SSE4.2
Replies: 28
Views: 9561

Re: Count char in string, ultra fast SSE4.2


!pxor xmm6,xmm6 'zero xmm6
!psadbw xmm6,xmm7 'sum the 8 upper and 8 lower bytes
!movdqa xmm7,xmm6 'take a copy
!psrldq xmm6,8 'shift to line up
!paddw xmm7,xmm6 'add the 2 8byte sums to give a 16 byte sum

'the low word of xmm7 should now contain the sum required.
'but it needs correcting ...
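
As a rough model of what the five instructions above compute, here is a small Python sketch. It treats a register as a list of 16 unsigned bytes; the function names are made up for illustration, and the correction mentioned in the truncated part of the post is not modelled.

```python
def psadbw_vs_zero(lanes):
    """Model PSADBW against an all-zero register: two separate 8-byte sums."""
    assert len(lanes) == 16 and all(0 <= b <= 255 for b in lanes)
    low = sum(lanes[:8])   # ends up in the low 64-bit half
    high = sum(lanes[8:])  # ends up in the high 64-bit half
    return low, high


def horizontal_sum(lanes):
    """Model the copy, 8-byte shift and word add that follow PSADBW."""
    low, high = psadbw_vs_zero(lanes)
    # PSRLDQ by 8 moves the high sum down; PADDW adds it to the low sum.
    return (low + high) & 0xFFFF
```

The largest possible result is 16 x 255 = 4080, so the sum always fits in the low word.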
by Road Runner
Fri Mar 26, 2010 2:52 pm
Forum: Tricks 'n' Tips
Topic: Count char in string, ultra fast SSE4.2
Replies: 28
Views: 9561

Re: Count char in string, ultra fast SSE4.2

I know little about the i7 and I don't have one to test, but if the code is to be run on a range of processors then i7 quirks can't be relied on. PREFETCH will benefit most CPUs and it shouldn't harm the i7 performance.

I did see the 100,000 but that gave unreasonably high figures. Maximum ...
by Road Runner
Thu Mar 25, 2010 3:14 am
Forum: Tricks 'n' Tips
Topic: Count char in string, ultra fast SSE4.2
Replies: 28
Views: 9561

Re: Count char in string, ultra fast SSE4.2

What am I missing? There was a contest a few years ago here:
http://www.powerbasic.com/support/forums/Forum12/HTML/001733.html

Some of the code submitted there scans a 4MB file and counts ALL characters, not just 1, and it runs in 200ms on a really old processor, faster than any of the code posted ...
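
For comparison, counting every byte value in one pass needs only a 256-entry table. The sketch below is not the contest code, just a minimal Python illustration of the idea; `count_all_bytes` is a hypothetical name.

```python
def count_all_bytes(data):
    """Return a 256-entry list where entry n is the count of byte value n."""
    counts = [0] * 256
    for b in data:  # iterating over bytes yields integers 0..255
        counts[b] += 1
    return counts
```

Once the table is built, the count for any single character is a plain lookup, e.g. `count_all_bytes(data)[ord("a")]`.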
by Road Runner
Fri Oct 10, 2008 1:28 pm
Forum: General Discussion
Topic: PureBench32 progress laughable, consistancy question!
Replies: 31
Views: 5553

Paul,
you're going to have to look at the underlying ASM code to see what's going on.
Create the simplest demo you can that demonstrates the problem then compile it and look at the ASM to see how the compiled code for the integers differs in the 2 cases.
by Road Runner
Fri Oct 10, 2008 11:44 am
Forum: General Discussion
Topic: PureBench32 progress laughable, consistancy question!
Replies: 31
Views: 5553

Both code and data alignment are important to timings, but I can't imagine a modern compiler not aligning 32-bit data such as longs and floats on a 32-bit (4-byte) boundary. Some compilers may misalign larger data such as doubles, which should be aligned on an 8-byte boundary for best performance ...
by Road Runner
Thu Oct 09, 2008 10:49 pm
Forum: General Discussion
Topic: PureBench32 progress laughable, consistancy question!
Replies: 31
Views: 5553

Code alignment/misalignment can easily change timings for small loops by as much as 20%.
You can demonstrate alignment differences by inserting ASM NOP statements immediately ahead of your loop. Time the loop, insert one more NOP, time the loop, insert one more NOP .. until you have 16 or 32 NOPs and ...
by Road Runner
Wed Sep 24, 2008 1:26 pm
Forum: Coding Questions
Topic: Chess Moves in a byte
Replies: 55
Views: 15285

Please, do add an unused padding byte element at the end so your structure ends up being 32bits long instead of 24. You can benchmark if you want, but you'll always see it's faster this way.
This is not always the case.
Using 4 bytes/structure instead of 3 requires 33% more memory reads which can ...
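
The 33% figure is simple arithmetic on the two structure sizes. A minimal Python check, assuming a structure of three byte fields and a padded version with a fourth unused byte (the layouts are illustrative, not taken from the thread):

```python
import struct

PACKED_SIZE = struct.calcsize("<3B")  # three byte fields, no padding
PADDED_SIZE = struct.calcsize("<4B")  # same fields plus one unused byte


def extra_memory_percent(packed, padded):
    """Extra memory traffic of the padded layout, as a percentage."""
    return (padded - packed) / packed * 100.0
```

With these sizes `extra_memory_percent(PACKED_SIZE, PADDED_SIZE)` comes out at roughly 33%.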
by Road Runner
Wed Sep 24, 2008 1:21 pm
Forum: Coding Questions
Topic: Circle to Line collision
Replies: 8
Views: 2406

Well, for one collision and intersections are two different, separate things.
How does the calculation differ?
by Road Runner
Mon Sep 22, 2008 7:10 pm
Forum: Coding Questions
Topic: Circle to Line collision
Replies: 8
Views: 2406

Untested pseudo code follows
'define the line by the co-ordinates of any 2 points on that line, (x1,y1) and (x2,y2)


'define the circle by the co-ordinates of its centre xc,yc and its radius, r

'now calculate the length of the sides of the triangle formed by the 3 points, call them a,b,c
a=SQR ...
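
The rest of the pseudo code is cut off here. One possible way to get from the three side lengths to a collision test is Heron's formula: the triangle's area gives its height above the line, which is the distance from the centre to the line. The Python sketch below assumes that approach; which side is called a, b or c, and the function names, are assumptions. It treats the line as infinite, since the post defines it by any 2 points on it.

```python
import math


def side_lengths(x1, y1, x2, y2, xc, yc):
    """Lengths of the triangle formed by the two line points and the centre."""
    a = math.hypot(x2 - x1, y2 - y1)  # side lying along the line
    b = math.hypot(xc - x2, yc - y2)
    c = math.hypot(x1 - xc, y1 - yc)
    return a, b, c


def distance_to_line(x1, y1, x2, y2, xc, yc):
    """Perpendicular distance from (xc, yc) to the line, via Heron's formula."""
    a, b, c = side_lengths(x1, y1, x2, y2, xc, yc)
    if a == 0:
        raise ValueError("the two points defining the line must differ")
    s = (a + b + c) / 2.0
    # Clamp at zero: rounding can make the product slightly negative.
    area_sq = max(0.0, s * (s - a) * (s - b) * (s - c))
    return 2.0 * math.sqrt(area_sq) / a


def circle_touches_line(x1, y1, x2, y2, xc, yc, r):
    """True when the circle of radius r touches or crosses the line."""
    return distance_to_line(x1, y1, x2, y2, xc, yc) <= r
```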
by Road Runner
Sun Sep 21, 2008 3:15 pm
Forum: Coding Questions
Topic: Don't hassle the Huff'
Replies: 6
Views: 1634

The only other information you should need to decode your bit stream is the symbol table you just generated: S = 0, I = 11, P = 101, M = 100

You scan your coded data from the start and compare with each symbol in your symbol table. When you get a match, you have found the next symbol, you remove ...
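
A minimal Python sketch of that scan, using the symbol table from the post. The bit stream is held as a string of "0" and "1" characters for readability; the `encode` helper and the sample word are only there to produce test input.

```python
CODES = {"S": "0", "I": "11", "P": "101", "M": "100"}  # table from the post


def encode(text):
    """Concatenate the code for each symbol (helper for making test data)."""
    return "".join(CODES[ch] for ch in text)


def decode(bits):
    """Scan from the start, emitting a symbol each time the bits match a code."""
    lookup = {code: sym for sym, code in CODES.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in lookup:  # match found: this is the next symbol
            out.append(lookup[current])
            current = ""       # drop the matched bits and carry on
    if current:
        raise ValueError("trailing bits do not form a symbol")
    return "".join(out)
```

This works without separators because no code in the table is a prefix of another, so the first match is always the right one.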
by Road Runner
Sat Sep 20, 2008 5:22 pm
Forum: Coding Questions
Topic: maths functions
Replies: 37
Views: 9410

I'm surprised any standard, modern compiler "wins". Surely they all use the same FPU and it's that which sets the accuracy.
FPU arithmetic has been standardised for 20+ years.
by Road Runner
Sat Sep 20, 2008 3:12 pm
Forum: Coding Questions
Topic: maths functions
Replies: 37
Views: 9410

Sospel.
Can you post the instructions in POWERBASIC?

The following is the complete PB Console Compiler code (Currently $169)
DEFDBL a-z 'you want to test DOUBLES so set default variable type to that
FUNCTION PBMAIN () AS LONG

value = 1.0
FOR i=1 TO 10
FOR j=1 TO 1000
FOR k=1 TO 1000
value = TAN ...
by Road Runner
Fri Sep 19, 2008 9:56 pm
Forum: Coding Questions
Topic: maths functions
Replies: 37
Views: 9410

Psychophanta,
i'd like to demonstrate to blueznl and to all that these 2 users are wrong
Just because you don't understand what these 2 users are trying to explain doesn't mean they are wrong.
by Road Runner
Fri Sep 19, 2008 2:25 pm
Forum: Coding Questions
Topic: maths functions
Replies: 37
Views: 9410

Psychophanta,
The original post in this thread stated "error in % = 0.000010005" so we are dealing here with percentage errors.

In your example the exact answer is zero so if your "value" is anything other than zero then the % error is infinite (not undefined).

It may not be a useful result ...
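
The point about an exact answer of zero can be made concrete with a small Python sketch. `percent_error` is a hypothetical helper, and returning 0 when both the value and the exact answer are zero is an assumption of this sketch, not something stated in the post.

```python
import math


def percent_error(value, exact):
    """Percentage error of value relative to the exact answer."""
    if exact == 0:
        # Any non-zero value against an exact answer of zero is infinitely off.
        # Treating 0 vs 0 as "no error" is an assumption made here.
        return 0.0 if value == 0 else math.inf
    return abs(value - exact) / abs(exact) * 100.0
```

However small the computed value is, dividing its error by an exact answer of zero gives an infinite percentage.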