EnableExplicit
;You only need to edit the two constants and the macro line when doing this type of test.
#TestRuns=1000000000 ;Only change this if the test is way too slow. Note that the test uses Delay() calls that add 1.3 seconds in total.
#TestType="multiply, (value1*value2)" ;Change the type name when you change the test math.
Macro Test(value1,value2)
(value1*value2) ;change the calculation inside the () to run the tests on something other than a multiply.
EndMacro
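;As an illustration of the two edits described above, switching the benchmark to division might look like the lines below.
;This is only a sketch, kept commented out so it does not clash with the active constant and macro;
;the type name and expression are taken from the division results listed further down.
;#TestType="division, (value1/value2)"
;Macro Test(value1,value2)
;(value1/value2)
;EndMacro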
DisableDebugger
Define.l time,timelast,t1,t2,t3,t4
Define.i i,l
Define.f s1,s2,s3
Define.d d1,d2,d3
l=#TestRuns
s1=#PI
s2=s1
d1=#PI
d2=d1
timeBeginPeriod_(1)
Delay(500)
timelast=timeGetTime_()
For i=1 To l
s3=Test(s1,s2)
Next
time=timeGetTime_()
t1=time-timelast
Delay(100)
timelast=timeGetTime_()
For i=1 To l
d3=Test(d1,d2)
Next
time=timeGetTime_()
t2=time-timelast
Delay(100)
timelast=timeGetTime_()
For i=1 To l
s3=Test(d1,d2)
Next
time=timeGetTime_()
t3=time-timelast
Delay(100)
timelast=timeGetTime_()
For i=1 To l
d3=Test(s1,s2)
Next
time=timeGetTime_()
t4=time-timelast
Delay(500)
timeEndPeriod_(1)
EnableDebugger
CompilerIf #PB_Compiler_Processor=#PB_Processor_x64
Debug ";x64 test, "+Str(l)+" loops each, "+#TestType+"."
CompilerElse
Debug ";x86 test, "+Str(l)+" loops each, "+#TestType+"."
CompilerEndIf
Debug ";"+Str(t1)+"ms float=float(x)float"
Debug ";"+Str(t2)+"ms double=double(x)double"
Debug ";"+Str(t3)+"ms float=double(x)double"
Debug ";"+Str(t4)+"ms double=float(x)float"
Debug ";Pure doubles have ~2.2x precision vs pure float/singles (53bit precision in doubles, 24bit in singles)"
;floating point multiply and store test with
;AMD Phenom II 1090T (6 x 3.2GHz cores)
;x64 test, 1 billion loops each.
;2282ms float=float*float (~10.7% higher performance vs pure double)
;2527ms double=double*double (~2.2x precision vs pure float, 53bit precision in doubles, 24bit in singles)
;2492ms float=double*double (~1.4% faster than pure double)
;2481ms double=float*float (~0.4% faster than float=double*double)
;x86 test, 1 billion loops each.
;2369ms float=float*float (~0.3% lower performance vs pure double)
;2362ms double=double*double (~2.2x precision vs pure float, 53bit precision in doubles, 24bit in singles)
;2296ms float=double*double (~2.9% faster than pure double)
;2312ms double=float*float (~0.7% slower than float=double*double)
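;The percentages quoted above can be checked with a small helper. This is only a sketch:
;PercentFaster() is a hypothetical name, and it assumes the gap is measured relative to the faster (smaller) time,
;which is consistent with the rounded figures in the two lists above.
Procedure.d PercentFaster(slowMs.d, fastMs.d)
  ;How much longer the slow run took, expressed as a percentage of the fast run.
  ProcedureReturn (slowMs - fastMs) / fastMs * 100.0
EndProcedure
;x64 multiply: pure double (2527ms) vs pure float (2282ms).
Debug ";x64 float vs double: ~" + StrD(PercentFaster(2527, 2282), 1) + "%"
;x86 multiply: pure double (2362ms) vs float=double*double (2296ms).
Debug ";x86 mixed vs double: ~" + StrD(PercentFaster(2362, 2296), 1) + "%"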
;Thoughts:
;Very odd, one would assume that doubles would be faster on x64 than x86,
;but even more surprising is that singles and doubles are about the same speed on x86.
;What does this all mean?:
;If these numbers are similar for others (AMD vs Intel is probably where results would differ the most), this means that
;calculations should be done as either pure singles or pure doubles on x86, and likewise on x64.
;The reason is simple: avoiding conversions during the calculation should, in theory, reduce overhead.
;But as you can see from the numbers, theory and practice do not always line up as expected.
;So staying purely singles or purely doubles on x86 is a matter of pure convenience, and it avoids unexpected precision loss.
;The same is true on x64, but on x64 there is a speed gain of about 10% when using purely singles vs doubles.
;Conclusion:
;If you want best precision, use purely doubles on both x86 and x64.
;If precision is secondary or singles are good enough, then 32bit singles can give speed gains on x64 but not on x86;
;you do, however, only need half as much space/memory to store singles as doubles.
;What this test does show is that you should not be afraid to use doubles, as the worst case here was only about a 10% difference.
;Mixing singles and doubles has almost no benefit, other than to show that converting between single and double makes little difference to speed.
;Why is...?:
;Why are single and double equally fast on x86?
;This could be due to a PureBasic optimization, as the double may be emulated on x86 while on x64 it is not!
;Why is single faster than double on x64?
;This could be due to AMD CPU optimization, as two singles could be transferred in the same space as one double.
;Or it could be a PureBasic optimization taking advantage of x64 register behaviour or x64 features.
;I have not looked at the assembly output, so this is all speculation obviously!
;One thing is certain, there is no excuse not to use doubles whenever possible; the higher precision outweighs almost all downsides.
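;To make the precision and storage trade-off concrete, here is a minimal sketch.
;The variable names demoSingle and demoDouble are illustrative and not part of the benchmark above.
;It stores the same constant in a single and in a double, prints both with 15 decimals, and prints the size of each type.
Define.f demoSingle = #PI
Define.d demoDouble = #PI
Debug ";single: " + StrF(demoSingle, 15) + " (" + Str(SizeOf(Float)) + " bytes)"
Debug ";double: " + StrD(demoDouble, 15) + " (" + Str(SizeOf(Double)) + " bytes)"
;A 24bit mantissa holds roughly 7 significant decimal digits and a 53bit mantissa roughly 15 to 16,
;so the two printed values should start to disagree after about the seventh significant digit.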
;As a bonus here are other tests:
;The number of test runs was either 1 billion, or a 0 was added/removed to ensure each test took from 1 to 10 seconds to run.
;Pure doubles have ~2.2x precision vs pure float/singles (53bit precision in doubles, 24bit in singles)
;x86 test, 1000000000 loops each, division, (value1/value2).
;5267ms float=float(x)float
;5275ms double=double(x)double
;5303ms float=double(x)double
;5286ms double=float(x)float
;x64 test, 1000000000 loops each, division, (value1/value2).
;5360ms float=float(x)float
;5343ms double=double(x)double
;5362ms float=double(x)double
;5318ms double=float(x)float
;x86 test, 1000000000 loops each, subtract, (value1-value2).
;2046ms float=float(x)float
;2025ms double=double(x)double
;2093ms float=double(x)double
;2021ms double=float(x)float
;x64 test, 1000000000 loops each, subtract, (value1-value2).
;2390ms float=float(x)float
;2335ms double=double(x)double
;2345ms float=double(x)double
;2317ms double=float(x)float
;x86 test, 1000000000 loops each, addition, (value1+value2).
;2034ms float=float(x)float
;2027ms double=double(x)double
;2093ms float=double(x)double
;2040ms double=float(x)float
;x64 test, 1000000000 loops each, addition, (value1+value2).
;2430ms float=float(x)float
;2364ms double=double(x)double
;2379ms float=double(x)double
;2260ms double=float(x)float
;x86 test, 100000000 loops each, log10, Log10(value1), reduced loops by one 0 due to being too slow.
;4682ms float=float(x)float
;4541ms double=double(x)double
;4618ms float=double(x)double
;4543ms double=float(x)float
;x64 test, 100000000 loops each, log10, Log10(value1), reduced loops by one 0 due to being too slow.
;3886ms float=float(x)float
;3826ms double=double(x)double
;3877ms float=double(x)double
;3838ms double=float(x)float
;x86 test, 100000000 loops each, pow, Pow(value1,value2), reduced loops by one 0 due to being too slow.
;7626ms float=float(x)float
;8634ms double=double(x)double
;7361ms float=double(x)double
;8161ms double=float(x)float
;x64 test, 100000000 loops each, pow, Pow(value1,value2), reduced loops by one 0 due to being too slow.
;8549ms float=float(x)float
;8672ms double=double(x)double
;8624ms float=double(x)double
;8543ms double=float(x)float
;x86 test, 1000000000 loops each, abs, Abs(value1).
;2452ms float=float(x)float
;2278ms double=double(x)double
;2509ms float=double(x)double
;2532ms double=float(x)float
;x64 test, 1000000000 loops each, abs, Abs(value1).
;2544ms float=float(x)float
;2231ms double=double(x)double
;2519ms float=double(x)double
;2541ms double=float(x)float
;x86 test, 1000000000 loops each, sqr, Sqr(value1).
;7513ms float=float(x)float
;7482ms double=double(x)double
;7493ms float=double(x)double
;7549ms double=float(x)float
;x64 test, 1000000000 loops each, sqr, Sqr(value1).
;7474ms float=float(x)float
;7466ms double=double(x)double
;7431ms float=double(x)double
;7484ms double=float(x)float
;x86 test, 1000000000 loops each, less than, (value1<value2).
;3237ms float=float(x)float
;2937ms double=double(x)double
;2919ms float=double(x)double
;3013ms double=float(x)float
;x64 test, 1000000000 loops each, less than, (value1<value2).
;2498ms float=float(x)float
;2880ms double=double(x)double
;2875ms float=double(x)double
;2494ms double=float(x)float
;x64 test, 1000000000 loops each, equal to, (value1=value2).
;2494ms float=float(x)float
;2502ms double=double(x)double
;2240ms float=double(x)double
;2483ms double=float(x)float
;x86 test, 1000000000 loops each, equal to, (value1=value2).
;2482ms float=float(x)float
;2652ms double=double(x)double
;2494ms float=double(x)double
;2500ms double=float(x)float

