Floating point speed test!

Rescator
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Floating point speed test!

Post by Rescator »

PureBasic 4.61 beta 1

Code: Select all

EnableExplicit

;You only need to edit the two constants and the macro line when doing these types of tests.

#TestRuns=1000000000 ;Only change this if the test is way too slow. Note that the test uses Delay() calls that add 1.3 seconds in total.
#TestType="multiply, (value1*value2)" ;Change the type name when you change the test math.
Macro Test(value1,value2)
	(value1*value2) ;change the calculation inside the () to run the tests on something other than a multiply. 
EndMacro

DisableDebugger

Define.l time,timelast,t1,t2,t3,t4
Define.i i,l
Define.f s1,s2,s3
Define.d d1,d2,d3

l=#TestRuns
s1=#PI
s2=s1
d1=#PI
d2=d1

timeBeginPeriod_(1)

Delay(500)

timelast=timeGetTime_()
For i=1 To l
	s3=Test(s1,s2)	
Next
time=timeGetTime_()
t1=time-timelast

Delay(100)

timelast=timeGetTime_()
For i=1 To l
	d3=Test(d1,d2)	
Next
time=timeGetTime_()
t2=time-timelast

Delay(100)

timelast=timeGetTime_()
For i=1 To l
	s3=Test(d1,d2)
Next
time=timeGetTime_()
t3=time-timelast

Delay(100)

timelast=timeGetTime_()
For i=1 To l
	d3=Test(s1,s2)
Next
time=timeGetTime_()
t4=time-timelast

Delay(500)

timeEndPeriod_(1)

EnableDebugger

CompilerIf #PB_Compiler_Processor=#PB_Processor_x64
	Debug ";x64 test, "+Str(l)+" loops each, "+#TestType+"."
CompilerElse
	Debug ";x86 test, "+Str(l)+" loops each, "+#TestType+"."
CompilerEndIf
Debug ";"+Str(t1)+"ms float=float(x)float"
Debug ";"+Str(t2)+"ms double=double(x)double"
Debug ";"+Str(t3)+"ms float=double(x)double"
Debug ";"+Str(t4)+"ms double=float(x)float"
Debug ";Pure doubles have ~2.2x precision vs pure float/singles (53bit precision in doubles, 24bit in singles)"

;floating point multiply and store test with
;AMD Phenom II 1090T (6 x 3.2GHz cores)

;x64 test, 1 billion loops each.
;2282ms float=float*float (~10.7% higher performance vs pure double)
;2527ms double=double*double (~2.2x precision vs pure float, 53bit precision in doubles, 24bit in singles)
;2492ms float=double*double (~1.4% faster than pure double)
;2481ms double=float*float (~0.4% faster than float=double*double)

;x86 test, 1 billion loops each.
;2369ms float=float*float (~0.3% lower performance vs pure double)
;2362ms double=double*double (~2.2x precision vs pure float, 53bit precision in doubles, 24bit in singles)
;2296ms float=double*double (~2.9% faster than pure double)
;2312ms double=float*float (~0.7% slower than float=double*double)

;Thoughts:
;Very odd; one would assume that doubles would be faster on x64 than on x86,
;but even more surprising is that singles and doubles are about the same speed on x86.

;What does this all mean?:
;If these numbers are similar for others (AMD vs Intel would likely differ the most in results), this means that
;calculations should be done as either pure singles or pure doubles on x86, and likewise on x64.
;The reason is simple: avoiding conversions during the calculation theoretically reduces overhead.
;But as you can see from the numbers, theory and practice do not always line up as expected.
;So staying purely singles or purely doubles on x86 is mostly a matter of convenience, and it avoids unexpected precision loss;
;the same is true on x64, but on x64 there is a speed gain of about 10% when using purely singles vs purely doubles.

;Conclusion:
;If you want the best precision, use purely doubles on both x86 and x64.
;If precision is secondary or singles are good enough, then 32bit can give speed gains on x64 but not on x86;
;either way, you only need half as much space/memory to store singles as doubles.
;What this test does show is that you should not be afraid to use doubles, as here the worst case was only about a 10% difference.
;Mixing single and double has almost no benefit, other than showing that converting between single and double is not that costly.

;Why is...?:
;Why are single and double equally fast on x86?
;This could be due to PureBasic optimization, as the double is emulated there while on x64 it is not!
;Why are singles faster than doubles on x64?
;This could be due to AMD CPU optimization, as two singles can be transferred in the same space as one double.
;Or it could be a PureBasic optimization taking advantage of x64 register behaviour or other x64 features.
;I have not looked at the assembly output, so this is all speculation obviously!

;One thing is certain: there is no excuse not to use doubles whenever possible; the higher precision outweighs almost all downsides.

Code: Select all

;As a bonus, here are some other tests:
;Test runs were either 1 billion, or a 0 was added/removed to ensure each test took from 1 to 10 seconds to run.
;Pure doubles have ~2.2x precision vs pure float/singles (53bit precision in doubles, 24bit in singles)

;x86 test, 1000000000 loops each, division, (value1/value2).
;5267ms float=float(x)float
;5275ms double=double(x)double
;5303ms float=double(x)double
;5286ms double=float(x)float

;x64 test, 1000000000 loops each, division, (value1/value2).
;5360ms float=float(x)float
;5343ms double=double(x)double
;5362ms float=double(x)double
;5318ms double=float(x)float

;x86 test, 1000000000 loops each, subtract, (value1-value2).
;2046ms float=float(x)float
;2025ms double=double(x)double
;2093ms float=double(x)double
;2021ms double=float(x)float

;x64 test, 1000000000 loops each, subtract, (value1-value2).
;2390ms float=float(x)float
;2335ms double=double(x)double
;2345ms float=double(x)double
;2317ms double=float(x)float

;x86 test, 1000000000 loops each, addition, (value1+value2).
;2034ms float=float(x)float
;2027ms double=double(x)double
;2093ms float=double(x)double
;2040ms double=float(x)float

;x64 test, 1000000000 loops each, addition, (value1+value2).
;2430ms float=float(x)float
;2364ms double=double(x)double
;2379ms float=double(x)double
;2260ms double=float(x)float

;x86 test, 100000000 loops each, log10, Log10(value1), reduced loops by one 0 due to being too slow.
;4682ms float=float(x)float
;4541ms double=double(x)double
;4618ms float=double(x)double
;4543ms double=float(x)float

;x64 test, 100000000 loops each, log10, Log10(value1), reduced loops by one 0 due to being too slow.
;3886ms float=float(x)float
;3826ms double=double(x)double
;3877ms float=double(x)double
;3838ms double=float(x)float

;x86 test, 100000000 loops each, pow, Pow(value1,value2), reduced loops by one 0 due to being too slow.
;7626ms float=float(x)float
;8634ms double=double(x)double
;7361ms float=double(x)double
;8161ms double=float(x)float

;x64 test, 100000000 loops each, pow, Pow(value1,value2), reduced loops by one 0 due to being too slow.
;8549ms float=float(x)float
;8672ms double=double(x)double
;8624ms float=double(x)double
;8543ms double=float(x)float

;x86 test, 1000000000 loops each, abs, Abs(value1).
;2452ms float=float(x)float
;2278ms double=double(x)double
;2509ms float=double(x)double
;2532ms double=float(x)float

;x64 test, 1000000000 loops each, abs, Abs(value1).
;2544ms float=float(x)float
;2231ms double=double(x)double
;2519ms float=double(x)double
;2541ms double=float(x)float

;x86 test, 1000000000 loops each, sqr, Sqr(value1).
;7513ms float=float(x)float
;7482ms double=double(x)double
;7493ms float=double(x)double
;7549ms double=float(x)float

;x64 test, 1000000000 loops each, sqr, Sqr(value1).
;7474ms float=float(x)float
;7466ms double=double(x)double
;7431ms float=double(x)double
;7484ms double=float(x)float

;x86 test, 1000000000 loops each, less than, (value1<value2).
;3237ms float=float(x)float
;2937ms double=double(x)double
;2919ms float=double(x)double
;3013ms double=float(x)float

;x64 test, 1000000000 loops each, less than, (value1<value2).
;2498ms float=float(x)float
;2880ms double=double(x)double
;2875ms float=double(x)double
;2494ms double=float(x)float

;x64 test, 1000000000 loops each, equal to, (value1=value2).
;2494ms float=float(x)float
;2502ms double=double(x)double
;2240ms float=double(x)double
;2483ms double=float(x)float

;x86 test, 1000000000 loops each, equal to, (value1=value2).
;2482ms float=float(x)float
;2652ms double=double(x)double
;2494ms float=double(x)double
;2500ms double=float(x)float
wilbert
PureBasic Expert
Posts: 3944
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Floating point speed test!

Post by wilbert »

You are wrong on one very important assumption.
A double is NOT emulated on x86. The FPU always uses 80 bit internally.

Try this

Code: Select all

EnableExplicit

;You only need to edit the two constants and the macro line when doing these types of tests.

#TestRuns=500000000 ;Only change this if the test is way too slow. Note that the test uses Delay() calls that add 1.3 seconds in total.
#TestType="multiply, (value1*value2)" ;Change the type name when you change the test math.
Macro Test(value1,value2)
   (value1*value2) ;change the calculation inside the () to run the tests on something other than a multiply. 
EndMacro

DisableDebugger

Define.l time,timelast,t1,t2,t3,t4
Define.i i,l
Define.f s1,s2,s3
Define.d d1,d2,d3

l=#TestRuns
s1=#PI
s2=s1
d1=#PI
d2=d1

timeBeginPeriod_(1)

Delay(500)

timelast=timeGetTime_()
For i=1 To l
   s3=Test(s1,s2)   
   s3=Test(s1,s2)  
   s3=Test(s1,s2)  
   s3=Test(s1,s2)  
   s3=Test(s1,s2)  
Next
time=timeGetTime_()
t1=time-timelast

Delay(100)

timelast=timeGetTime_()
For i=1 To l
   d3=Test(d1,d2)   
   d3=Test(d1,d2)   
   d3=Test(d1,d2)   
   d3=Test(d1,d2)   
   d3=Test(d1,d2)   
Next
time=timeGetTime_()
t2=time-timelast

Delay(100)

timelast=timeGetTime_()
For i=1 To l
   s3=Test(d1,d2)
   s3=Test(d1,d2)
   s3=Test(d1,d2)
   s3=Test(d1,d2)
   s3=Test(d1,d2)
Next
time=timeGetTime_()
t3=time-timelast

Delay(100)

timelast=timeGetTime_()
For i=1 To l
   d3=Test(s1,s2)
   d3=Test(s1,s2)
   d3=Test(s1,s2)
   d3=Test(s1,s2)
   d3=Test(s1,s2)
Next
time=timeGetTime_()
t4=time-timelast

Delay(500)

timeEndPeriod_(1)

EnableDebugger

CompilerIf #PB_Compiler_Processor=#PB_Processor_x64
   Debug ";x64 test, "+Str(l)+" loops each, "+#TestType+"."
CompilerElse
   Debug ";x86 test, "+Str(l)+" loops each, "+#TestType+"."
CompilerEndIf
Debug ";"+Str(t1)+"ms float=float(x)float"
Debug ";"+Str(t2)+"ms double=double(x)double"
Debug ";"+Str(t3)+"ms float=double(x)double"
Debug ";"+Str(t4)+"ms double=float(x)float"
Debug ";Pure doubles have ~2.2x precision vs pure float/singles (53bit precision in doubles, 24bit in singles)"
It decreases the impact of the loop overhead by doing the multiplication five times per iteration.

When I add extra variables t5, t6, t7 it impacts performance on my computer even though they are not used.
Which routine becomes slower depends on the number of extra variables I add.
It seems the difference in speed has more to do with memory alignment and not so much with the calculation itself.
Rescator
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Re: Floating point speed test!

Post by Rescator »

wilbert wrote:You are wrong on one very important assumption.
A double is NOT emulated on x86. The FPU always uses 80 bit internally.
Nope, you just confirmed I'm right.
On x64, a 64bit float is used for doubles.
So if you say that 80bit float is used for doubles on x86, then the 64bit float is emulated.
Though I guess one can say they gimped x64 so it can't do 80bit floats, but... *shrug*

BTW! It's quite possible that Windows actually does code replacement with the x86 exe (so that although it might look like "x87" float is used, it could instead be SSE2), while the x64 exe gets x64 float, as that is exactly what doubles are on x64.

Or how else do you explain the double speed on x86 here? x87 (80bit) float is way slower than x64 float, and AFAIK does not exist at all (but is emulated) in newer CPUs.
I'm just curious about the speed difference here, and I checked the ASM source: the exact same instructions are used for doubles on x64 and x86, so that is not the reason for the speed difference.
wilbert wrote: Try this
****snip***
It decreases the impact of the loop by doing the multiplication 5 times for every loop.

When I add extra variables t5, t6, t7 it impacts performance on my computer even while they are not used.
What routine becomes slower depends on the amount of extra variables I add.
It seems the difference in speed has more to do with memory alignment and not so much the calculation itself.
Well yes it does; the more variables and the more code you add, the more CPU cache you use and the less optimization the CPU can do. PureBasic also does register optimizations, where it tries to put variables/values in registers where possible.
I made this loop as tight as I could without resorting to ASM, as I was evaluating the floating point (single and double) speed of those instructions on x86 and x64.

Once you start adding more code all bets are off, as you can then only evaluate that specific code rather than the instructions themselves.
Also, just repeating the same line multiple times is quite unnatural, and even if the compiler does not optimize that away, modern CPUs might do so if they detect it as redundant (virtual CPUs probably always do).

And what do you mean by adding extra variables? Inside the loop? If not, then that is odd. You also did not state what CPU you have. And if there are alignment issues, that is not something I can fix.

Besides, the point here was providing a nice testing framework using a nifty Macro, to let you edit a couple of lines to quickly test various functions or math.
And as floating point is much more interesting than integers (and I know for sure that quads in PureBasic on x86 use different code), I used floating point as a quick test,
and threw in the numbers for some other floating point instructions just so people could compare against their own systems and see whether there are any speed trends/differences to keep an eye out for.

So apologies if you found all this disappointing for some reason, but hey, if you know a better way to test instructions or math then please post away; the only way code can be improved is by peer review and code evolution.
wilbert
PureBasic Expert
Posts: 3944
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Floating point speed test!

Post by wilbert »

The FPU can handle three different types of float.
32 bit (single precision)
64 bit (double precision)
80 bit (extended precision http://en.wikipedia.org/wiki/Extended_precision )


Internally it always uses 80 bit. When 32 or 64 bit are used, it converts them to 80 bit when the value is loaded into an FPU register or stored from such a register.
PureBasic only supports 32 and 64 bit variables. The FPU is capable of loading and storing 80 bit as well, but PB has no support for extended precision variables.

If you compile using /commented and look at the generated ASM code, you will see it uses the exact same opcodes to handle the floats on x86 and on x64.
There is no such thing as an x64 float. The difference between x86 and x64 is that x64 handles quad variables natively, while those are simulated on x86.

Also when you look at the generated code, you will see the opcodes to handle the loop are different on x86 and x64.
Therefore speed differences between the two modes are most likely due to the code that handles the loop itself and not due to the FPU opcodes.

The other way of handling floats is using SSE2.
SSE2 can use single or double precision but has no support for extended precision.
The advantage of SSE2 is that you can do parallel calculations on up to 4 single precision values at once, improving speed, but SSE2 is the same on x86 and x64.
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Re: Floating point speed test!

Post by Trond »

You have a big problem: When the debugger is enabled in the menu the peephole optimizer of the compiler is turned off, even if you have DisableDebugger in the source. The debugger needs to be off globally for a speed test.
Rescator
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Re: Floating point speed test!

Post by Rescator »

wilbert wrote:The FPU can handle three different types of float.
32 bit (single precision)
64 bit (double precision)
80 bit (extended precision http://en.wikipedia.org/wiki/Extended_precision )
I know that. And you are referring to the x87 FPU, which does not exist any more.
Also, if you look around some more on Wikipedia you will see that 80bit float is deprecated in usermode on Windows now,
and 80bit float is not allowed at all in the Windows kernel.
wilbert wrote:Internally it always uses 80 bit. When 32 or 64 bit are used, it converts them to 80 bit when the value is loaded into a FPU register or stored from such a register.
Also, just because an x87 FPU supports 80bit float does not mean you get to use 80bit. By default an x87 FPU is set to 80bit,
but Windows (before they deprecated it etc.) set it to 64bit instead, effectively "truncating" 80bit floats to 64bit floats.
I don't feel like diving into the huge x64 manuals from Intel and AMD, but I wouldn't be surprised to find that 80bit precision is "not" guaranteed, only 32bit float or 64bit float.
wilbert wrote:PureBasic only supports 32 and 64 bit variables. The FPU is capable of loading and storing 80 bit also but PB has no support for extended precision variables.
Being capable of loading and storing 80bit floats does not mean it actually performs 80bit float math. Also, 80bit floating point is not really standardized at all the way 32bit and 64bit IEEE floats are.
128bit floats are much more interesting in that regard.
wilbert wrote:If you compile using /commented and look at the generated ASM code, you will see it uses the exact same opcodes to handle the floats on x86 and on x64.
Um. *points* >>
Rescator wrote:I'm just curious about the speed difference here. and I checked the ASM source and the exact same instructions are used for doubles on x64 and x86 so it's not the reason for the speed dif.
wilbert wrote:There is no such thing as a x64 float. The difference between x86 and x64 is that x64 handles quad variables natively and those are simulated when using x86.
Typo: x64 float=64bit float

wilbert wrote:Also when you look at the generated code, you will see the opcodes to handle the loop are different on x86 and x64.
Therefore speed differences between the two modes are most likely due to the code that handles the loop itself and not due to the FPU opcodes.
Really?

Code: Select all

; For i=1 To l
  MOV    qword [v_i],1
_For1:
  MOV    rax,qword [v_l]
  CMP    rax,qword [v_i]
  JL    _Next2
; s3=Test(s1,s2)	
  FLD    dword [v_s1]
  FMUL   dword [v_s2]
  FSTP   dword [v_s3]
; Next
_NextContinue2:
  INC    qword [v_i]
  JNO   _For1
_Next2:
vs

Code: Select all

; For i=1 To l
  MOV    dword [v_i],1
_For1:
  MOV    eax,dword [v_l]
  CMP    eax,dword [v_i]
  JL    _Next2
; s3=Test(s1,s2)	
  FLD    dword [v_s1]
  FMUL   dword [v_s2]
  FSTP   dword [v_s3]
; Next
_NextContinue2:
  INC    dword [v_i]
  JNO   _For1
_Next2:
Looks pretty damn similar if you ask me, other than the obvious parts like eax vs rax and dword vs qword for the integer.
I didn't do any empty loop test, but I'm pretty sure these tiny differences are so minuscule that any overhead/bias will hardly impact the results, if at all.
But I'll do such a test anyway later today for my own sake; feel free to do the same.
wilbert wrote:The other way of handling floats is using SSE2.
SSE2 can use single or double precision but has no support for extended precision.
The advantage of SSE2 is that you can do parallel calculations on up to 4 single precision values at once, improving speed, but SSE2 is the same on x86 and x64.
*points* >>
Rescator wrote:it might look like "x87" float is used it could instead be SSE2
and yeah I've seen and read the same articles you have.
And I've read enough about the x87 instruction set and the 80bit float to know that I'll stay the hell away from it. NTSC has been nicknamed Never Twice The Same Color;
80bit float could easily be nicknamed never twice the same number, as that is what many have discovered with JIT compiling: the conversion to/from 80bit float and 32bit or 64bit float is sometimes done differently, sometimes truncated(?) sometimes not, and two identical lines of code executed at different times end up with slightly different numbers, on.the.same.machine... O.o wow!
As I said I have not dived into the intel and amd CPU architecture books yet, but I would not be surprised if the reading about 80bit float is rather dismal there too.
Per Section 12.1.3 (Virtual Execution System: Supported data types: Handling of floating-point data types), an implementation is free to use an internal representation available on a machine, provided that there's at least 32 (single) or 64 (double) bits of precision.
Also, many compilers produce exes where on "32-bit is using the 80-bit FPU registers, 64-bit is using the 128-bit SSE registers. Neither is using SIMD instructions despite this being obviously parallelizable".
And there is the ironic fact that all x64 CPUs have SSE and SSE2, and those from 2005 and later also have SSE3. I'm hoping PureBasic x64 will take advantage of the x64 features that are guaranteed to be there, like SSE2, in the PureBasic x64 commands.

Trond wrote:You have a big problem: When the debugger is enabled in the menu the peephole optimizer of the compiler is turned off, even if you have DisableDebugger in the source. The debugger needs to be off globally for a speed test.
Ah crap, right, forgot about that quirk, thanks. But there's not much I can do about that really; DisableDebugger is the closest you can get in a source posting. (There's no way to copy'n'paste config lines along with the code and have that work in the IDE, is there?)
Demivec
Addict
Posts: 4283
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Floating point speed test!

Post by Demivec »

@Rescator: What is being reported with these two tests? I wouldn't expect them to be handled correctly, since PureBasic doesn't support expressions like these as results.

Code: Select all

less than, (value1<value2)

equal to, (value1=value2)
I directed output to a console and used a high resolution timer (downgraded to whole milliseconds) instead of timeBeginPeriod_(), timeGetTime_(), timeEndPeriod_(). I changed the ending loop value to a constant instead of a variable set to the same value. I also removed the parentheses around the calculation in the macro, since this macro will only be used in non-complex ways.

Here are my results for an AMD Athlon 64 processor 3500+ 2.21GHz, running x86:

Code: Select all

;x86 test, 1000000000 loops each, multiply, value1*value2.
;2540ms float=float(x)float
;2982ms double=double(x)double
;2632ms float=double(x)double
;2442ms double=float(x)float

;x86 test, 1000000000 loops each, divide, value1/value2.
;7989ms float=float(x)float
;7985ms double=double(x)double
;8019ms float=double(x)double
;7959ms double=float(x)float

;x86 test, 1000000000 loops each, addition, value1+value2.
;2643ms float=float(x)float
;3763ms double=double(x)double
;3290ms float=double(x)double
;3054ms double=float(x)float

;x86 test, 1000000000 loops each, subtract, value1-value2.
;2622ms float=float(x)float
;3742ms double=double(x)double
;3295ms float=double(x)double
;3054ms double=float(x)float

;x86 test, 1000000000 loops each, abs, Abs(value1).
;3507ms float=float(x)float
;3510ms double=double(x)double
;2593ms float=double(x)double
;3471ms double=float(x)float

;x86 test, 1000000000 loops each, less than, value1<value2.
;3029ms float=float(x)float
;3189ms double=double(x)double
;3115ms float=double(x)double
;3115ms double=float(x)float

;x86 test, 1000000000 loops each, equal to, value1=value2.
;3029ms float=float(x)float
;3120ms double=double(x)double
;3112ms float=double(x)double
;3119ms double=float(x)float

;x86 test, 100000000 loops each, log10, Log10(value1), reduced loops by one 0 due to being too slow.
;6916ms float=float(x)float
;7050ms double=double(x)double
;7071ms float=double(x)double
;6999ms double=float(x)float

;x86 test, 100000000 loops each, pow, Pow(value1,value2), reduced loops by one 0 due to being too slow.
;11362ms float=float(x)float
;12284ms double=double(x)double
;11716ms float=double(x)double
;11973ms double=float(x)float

;x86 test, 100000000 loops each, sqr, Sqr(value1), reduced loops by one 0 due to being too slow.
;1124ms float=float(x)float
;1126ms double=double(x)double
;1122ms float=double(x)double
;1140ms double=float(x)float


For convenience, here is your test code I used with the modifications...:

Code: Select all

EnableExplicit

;You only need to edit the two constants and the macro line when doing these types of tests.

#TestRuns=1000000000 ;Only change this if the test is way too slow. Note that the test uses Delay() calls that add 1.3 seconds in total.
#TestType="multiply, value1*value2" ;Change the type name when you change the test math.
Macro test(value1,value2)
  value1*value2 ;change the calculation to run the tests on something other than a multiply. 
EndMacro

Define.l time,timelast,t1,t2,t3,t4
Define.i i,l
Define.f s1,s2,s3
Define.d d1,d2,d3

l=#TestRuns
s1=#PI
s2=s1
d1=#PI
d2=d1

Procedure.i ticksHQ()
  Static maxfreq.q 
  Protected t.q 
  If maxfreq=0 
    QueryPerformanceFrequency_(@maxfreq) 
    maxfreq=maxfreq/1000
  EndIf 
  QueryPerformanceCounter_(@t) 
  ProcedureReturn t/maxfreq ;Result is in milliseconds
EndProcedure 

Delay(500)

timelast=ticksHQ()
For i=1 To #TestRuns
  s3=test(s1,s2)   
Next
time=ticksHQ()
t1=time-timelast

Delay(500)

timelast=ticksHQ()
For i=1 To #TestRuns
  d3=test(d1,d2)   
Next
time=ticksHQ()
t2=time-timelast

Delay(500)

timelast=ticksHQ()
For i=1 To #TestRuns
  s3=test(d1,d2)
Next
time=ticksHQ()
t3=time-timelast

Delay(500)

timelast=ticksHQ()
For i=1 To #TestRuns
  d3=test(s1,s2)
Next
time=ticksHQ()
t4=time-timelast

Delay(500)

If OpenConsole()
  CompilerIf #PB_Compiler_Processor=#PB_Processor_x64
    PrintN(";x64 test, "+Str(l)+" loops each, "+#TestType+".")
  CompilerElse
    PrintN(";x86 test, "+Str(l)+" loops each, "+#TestType+".")
  CompilerEndIf
  PrintN(";"+Str(t1)+"ms float=float(x)float")
  PrintN(";"+Str(t2)+"ms double=double(x)double")
  PrintN(";"+Str(t3)+"ms float=double(x)double")
  PrintN(";"+Str(t4)+"ms double=float(x)float")
  PrintN(";Pure doubles have ~2.2x precision vs pure float/singles (53bit precision in doubles, 24bit in singles)")
  
  Print(#CRLF$ + #CRLF$ + "Press ENTER to exit"):
  Input()
EndIf 
kenmo
Addict
Posts: 2083
Joined: Tue Dec 23, 2003 3:54 am

Re: Floating point speed test!

Post by kenmo »

Rescator wrote:Ah crap, right, forgot about that quirk, thanks. But there's not much I can do about that really; DisableDebugger is the closest you can get in a source posting. (There's no way to copy'n'paste config lines along with the code and have that work in the IDE, is there?)
I always liked this 'trick':

Code: Select all

CompilerIf (#PB_Compiler_Debugger)
  CompilerError "Please disable debugger for testing!"
CompilerEndIf

MessageRequester("Testing", "One Two Three")
Rings
Moderator
Posts: 1435
Joined: Sat Apr 26, 2003 1:11 am

Re: Floating point speed test!

Post by Rings »

moved, as it's not a tip or trick.
(correct me if I'm wrong)
SPAMINATOR NR.1
Thorium
Addict
Posts: 1314
Joined: Sat Aug 15, 2009 6:59 pm

Re: Floating point speed test!

Post by Thorium »

Rescator wrote: Also, just because an x87 FPU supports 80bit float does not mean you get to use 80bit. By default an x87 FPU is set to 80bit,
but Windows (before they deprecated it etc.) set it to 64bit instead, effectively "truncating" 80bit floats to 64bit floats.
I don't feel like diving into the huge x64 manuals from Intel and AMD, but I wouldn't be surprised to find that 80bit precision is "not" guaranteed, only 32bit float or 64bit float.
Discussion without actually reading the manuals is counterproductive.
The "Intel 64 and IA-32 Architectures Software Developer's Manual" clearly states in section 8 that x87 is still present (just integrated into the CPU, but it still runs parallel to the main core), and it still uses 80 bit for all calculations in all modes. 64bit floats are exactly the same on x86 and x64.