Posted: Sat Nov 15, 2008 4:32 pm
by superadnim
Yes, the debugger is off. However, I forgot to mention a key factor: my benchmark makes a small attempt to reproduce real-life conditions, in that the input data is randomly generated. The code is the same in both tests, and I'm not benchmarking the actual time it takes (well, I am) but the relation between test A and test B (the ratio).
That said, I don't think my test is flawed.
The test cases:
Code:
Macro TestCaseA
  Define.f temp = Random(10)+Random(10)*0.1
  Define.f temp_decimals
  decimals2(temp, temp_decimals)
EndMacro

Macro TestCaseB
  Define.f temp = Random(10)+Random(10)*0.1
  Define.f temp_decimals = decimals(temp)
EndMacro
Since the first line is the same in both cases, it's pretty much ruled out of the equation, even though it makes the benchmarks take much longer... (And for the sake of it, between tests the random seed is reset to the same one used in the first case - and no, I'm not counting this in the timing.)
Another thing: I'm running the process in the realtime priority class; the results differ by quite a lot if I don't do this.
The reason I asked where to put the fstp is simply the way processors predict branches and schedule instructions: I think that keeping all of the FPU instructions together, and in pairs, tends to run better. That might not be the case with this code, though.
I'll simplify the test scenario and post the code soon.
Posted: Sat Nov 15, 2008 4:53 pm
by dioxin
superadnim,
superadnim wrote: even though the benchmarks take much longer
That's the answer, "even though it takes much longer".
For the sake of argument, let's say the INT() method takes 500 clks, the ASM method takes 50 clks, and generating your random numbers takes 2,000 clks.
The totals would then be 2,500 vs 2,050 - a saving of only about 18% - yet the ASM is actually 10 times faster than the INT() it replaces.
My guess is that generating your random numbers takes most of the time and masks the savings from the ASM.
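For illustration, the same arithmetic in code form; the cycle counts are the hypothetical figures above, not measurements:
Code:
Define.f intMethod  = 500.0   ; assumed cost of the INT() method
Define.f asmMethod  = 50.0    ; assumed cost of the ASM method
Define.f randomCost = 2000.0  ; assumed cost of generating the random input

Debug asmMethod / intMethod                               ; 0.1  -> the ASM alone is 10 times faster
Debug (randomCost + asmMethod) / (randomCost + intMethod) ; 0.82 -> only ~18% faster overall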
Posted: Sat Nov 15, 2008 5:00 pm
by superadnim
I don't understand your argument; the way I see it, the random calls negate each other. A real benchmark should use realistic tests, not just "how fast does this code execute by itself?", because the latter is no good in reality - or is it?
Code:
;---
Macro decimals( _n_ ) : (_n_-Int(_n_)) : EndMacro

Macro decimals2( _n_, _result_ )
  !push $1f7f0000           ; FP control word needed to round to zero
  !fstcw [esp]              ; Save the current FP control word
  !fldcw [esp+2]            ; Set the FP to round to zero
  !fld dword[v_#_n_]        ; Load the FP with the number to work on
  !fld st0                  ; Replicate/push that number on the FP stack
  !frndint                  ; Round number to leave integer
  !fsubp st1,st0            ; Subtract from original to leave fraction part (and pop stack)
  !fldcw [esp]              ; Restore original FP control word
  !fstp dword[v_#_result_]  ; Store the fraction part
  !add esp,4                ; Clean up CPU stack
EndMacro
;---
Macro TestNumber
  Random(10)+Random(10)*0.1 ; 123.456 ;
EndMacro

Macro TestCaseA
  Define.f a_temp = TestNumber
  Define.f a_temp_decimals
  decimals2(a_temp, a_temp_decimals)
EndMacro

Macro TestCaseB
  Define.f b_temp = TestNumber
  Define.f b_temp_decimals = decimals(b_temp)
EndMacro
;---
#BENCHMARK_ITERATIONS = 100000000;0;0
SetPriorityClass_( GetCurrentProcess_(), #REALTIME_PRIORITY_CLASS)

Macro InitTestCase
  Delay(1000)
  RandomSeed(123456789)
EndMacro
;---
Define.i a_time_old, a_time_new
Define.i b_time_old, b_time_new

InitTestCase
a_time_old = ElapsedMilliseconds()
For i=1 To #BENCHMARK_ITERATIONS
  TestCaseA
Next
a_time_new = ElapsedMilliseconds()

InitTestCase
b_time_old = ElapsedMilliseconds()
For i=1 To #BENCHMARK_ITERATIONS
  TestCaseB
Next
b_time_new = ElapsedMilliseconds()
;---
Define.i a_result, b_result
Define.f a_percent, b_percent

a_result = ( a_time_new - a_time_old )
b_result = ( b_time_new - b_time_old )
a_percent = (( a_result / b_result ) * 100)
b_percent = (( b_result / a_result ) * 100)

Define.s result
result + #CRLF$
result + "A= " + Str(a_result) + "ms" + #LF$
result + "B= " + Str(b_result) + "ms" + #LF$
result + #CRLF$
result + StrF( a_percent, 2 ) +"%"+ " ( "+StrF( 100-a_percent, 2 )+"% )" + #LF$
result + StrF( b_percent, 2 ) +"%"+ " ( "+StrF( 100-b_percent, 2 )+"% )" + #LF$
result + #CRLF$

MessageRequester( "benchmark result", result )
No need for a higher-resolution timer (I tried; same results... it's a big scale). There should be a min or max routine to make more sense of the results, but I left that out.
If you benchmark the random calls alone in each test, you'll see they take the same time in both cases (within a margin of 0.1 due to the low resolution of the timer), hence they are "negating" each other in the test I just pasted.
So there is no need to initialize the random numbers in an array at all, but it would make for a better test - see the sketch below.
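Something like this, as a rough sketch (the names and size are made up):
Code:
; Pre-generate the inputs once, outside the timed region, so that
; Random() never runs inside the benchmark loop.
#INPUT_COUNT = 100000
Dim input.f(#INPUT_COUNT - 1)
RandomSeed(123456789)
For i = 0 To #INPUT_COUNT - 1
  input(i) = Random(10) + Random(10) * 0.1
Next
; The benchmark loops would then read input(i % #INPUT_COUNT)
; instead of calling Random() on every iteration.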
Posted: Sat Nov 15, 2008 5:05 pm
by Trond
superadnim wrote: so, there is no need to initialize the random numbers in an array at all but it would make for a better test.
Think about it once more, please.
But I'm afraid it's much worse than that. Both codes actually give the sign in addition to the decimals (the only thing asked for was the decimals), so they give the wrong result for negative values.
Better slow and correct than slow and wrong.
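For example (with a value I just picked):
Code:
Macro decimals( _n_ ) : (_n_-Int(_n_)) : EndMacro

Define.f x = -3.25
Debug decimals(x) ; prints -0.25 - the sign tags along with the decimals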

Posted: Sat Nov 15, 2008 5:11 pm
by superadnim
You can get rid of the top bit (the sign bit) and that'll do it - it's one extra instruction, as I recall.
Anyway, the results I got on this test: test A is 24.90% faster than test B using the randoms, and with the constant float (123.456) I get 39.80% for test A against test B.
Since in reality I won't be using a constant... I think the first test is the only valid one here.
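By "get rid of the top bit" I mean something like an fabs right after the load - a rough sketch, not tested:
Code:
Macro decimals2_abs( _n_, _result_ )
  !push $1f7f0000           ; FP control word needed to round to zero
  !fstcw [esp]              ; Save the current FP control word
  !fldcw [esp+2]            ; Set the FP to round to zero
  !fld dword[v_#_n_]        ; Load the number to work on
  !fabs                     ; Clear the sign bit - the one extra instruction
  !fld st0                  ; Replicate/push |n| on the FP stack
  !frndint                  ; Truncate to the integer part
  !fsubp st1,st0            ; |n| minus its integer part = positive fraction
  !fldcw [esp]              ; Restore original FP control word
  !fstp dword[v_#_result_]  ; Store the fraction
  !add esp,4                ; Clean up CPU stack
EndMacro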

Posted: Sat Nov 15, 2008 5:14 pm
by superadnim
Trond wrote: Think about it once more, please.
I did, and since benchmarking the Random() calls (without anything else) on tests A and B gave the same result in each case, it is safe to assume they are not biasing or skewing the results. It's just an extra few milliseconds I DON'T care about, since all I'm doing is taking a percentage of the results; whether the percentage comes from bigger or smaller times doesn't matter, since the extra added number is assumed constant in both cases.
Posted: Sat Nov 15, 2008 5:18 pm
by Trond
superadnim wrote: Trond wrote: Think about it once more, please.
I did, and since benchmarking the Random() calls (without anything else) on tests A and B gave the same result in each case, it is safe to assume they are not biasing or skewing the results. It's just an extra few milliseconds I DON'T care about, since all I'm doing is taking a percentage of the results; whether the percentage comes from bigger or smaller times doesn't matter, since the extra added number is assumed constant in both cases.
That's exactly why you care.
Code:
; Case 1
Define.f
NumA = 100
NumB = 200
C = NumA / NumB
Debug C ; 0.5

; Case with added constant value
Define.f
NumA = 100 + 800
NumB = 200 + 800
C = NumA / NumB
Debug C ; 0.9 - the same 100 difference, but a very different ratio
Posted: Sat Nov 15, 2008 5:29 pm
by superadnim
But that's not what I'm doing, look:
Code:
; Case 1
Define.f
NumA = 100
NumB = 200
result_a = (NumB-NumA)

; Case with added constant value
Define.f
NumA = 100 + 800
NumB = 200 + 800
result_b = (NumB-NumA)

Debug ((result_a / result_b) * 100) ; 100.0 - the constant cancels under subtraction
Debug ((result_b / result_a) * 100) ; 100.0
The only problem I see is that you could get rounding errors if you use floats and the constant being added is too big.
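For example (a sketch with made-up values; .f is a 32-bit float, so the mantissa holds only 24 bits):
Code:
Define.f big = 100000000.0
Define.f a = big + 100.0 ; stored as 100000096 - the nearest representable float
Define.f b = big + 200.0 ; stored as 100000200 - exactly representable
Debug a - b              ; -104 instead of the exact -100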
Posted: Sat Nov 15, 2008 5:38 pm
by Trond
Read your own code. It makes absolutely no sense.
Posted: Sat Nov 15, 2008 5:43 pm
by superadnim
Trond wrote: Read your own code. It makes absolutely no sense.
You're right, you aren't making any sense.
Please shed some light on the obvious?
Posted: Sat Nov 15, 2008 5:53 pm
by Trond
superadnim wrote: Please shed some light on the obvious?
Your variable a_result corresponds to my NumA with the added constant value. Think about it.
Code:
Macro decimals( _n_ ) : (_n_-Int(_n_)) : EndMacro

Macro decimals2( _n_, _result_ )
  !push $1f7f0000           ; FP control word needed to round to zero
  !fstcw [esp]              ; Save the current FP control word
  !fldcw [esp+2]            ; Set the FP to round to zero
  !fld dword[v_#_n_]        ; Load the FP with the number to work on
  !fld st0                  ; Replicate/push that number on the FP stack
  !frndint                  ; Round number to leave integer
  !fsubp st1,st0            ; Subtract from original to leave fraction part (and pop stack)
  !fldcw [esp]              ; Restore original FP control word
  !fstp dword[v_#_result_]  ; Store the fraction part
  !add esp,4                ; Clean up CPU stack
EndMacro
Macro decimals3(n, result)
  ! fld1                    ; Push 1.0 (the correction for when frndint rounds up)
  ! fld dword [v_#n]        ; Load the number to work on
  ! fabs                    ; Strip the sign so only the decimals remain
  ! fld st0                 ; Duplicate |n| on the FP stack
  ! frndint                 ; Round |n| with the current (round-to-nearest) mode
  ! fcomi st0, st1          ; Compare the rounded value against |n|
  ! fldz                    ; Push 0.0 as the default correction
  ! fcmovnbe st0, st3       ; If it rounded up (rounded > |n|), use 1.0 instead
  ! fsub st1, st0           ; rounded - correction = floor(|n|)
  ! fstp st0                ; Pop the correction
  ! fsubp st1, st0          ; |n| - floor(|n|) = fraction part (and pop)
  ! fstp dword[v_#result]   ; Store the fraction
  ! fstp st0                ; Pop the leftover 1.0
EndMacro
; SetPriorityClass_( GetCurrentProcess_(), #REALTIME_PRIORITY_CLASS)
#Tries = 10000000

temp.f  = 6184927.2348192
temp2.f = -8234.2343
temp3.f = 0.0
temp4.f = 0.432
result.f

timeA = GetTickCount_()
For U = 0 To #Tries
  decimals2(temp, result)
  decimals2(temp2, result)
  decimals2(temp3, result)
  decimals2(temp4, result)
Next
timeA = GetTickCount_() - timeA

timeB = GetTickCount_()
For U = 0 To #Tries
  result = decimals(temp)
  result = decimals(temp2)
  result = decimals(temp3)
  result = decimals(temp4)
Next
timeB = GetTickCount_() - timeB

timeC = GetTickCount_()
For U = 0 To #Tries
  decimals3(temp, result)
  decimals3(temp2, result)
  decimals3(temp3, result)
  decimals3(temp4, result)
Next
timeC = GetTickCount_() - timeC

r.s = "decimals2: " + Str(timeA) + #CRLF$
r.s + "decimals: " + Str(timeB) + #CRLF$
r.s + "decimals3: " + Str(timeC) + #CRLF$
MessageRequester("", r)
Posted: Sat Nov 15, 2008 7:09 pm
by dioxin
Trond,
your method would appear more impressive if you called decimals3 somewhere instead of calling decimals2 twice!
Posted: Sat Nov 15, 2008 7:29 pm
by Trond
Thanks!

It should blow some doors now.

Let's just hope no bugs surface...
Posted: Sun Nov 16, 2008 9:03 am
by Little John
superadnim wrote: I don't understand your argument; the way I see it, the random calls negate each other.
The way you see it is just mathematically wrong. The random calls add to the time, and afterwards, in order to get the percent value, you are doing a division. An added constant cancels under subtraction, but not under division: (a + c) / (b + c) is not the same as a / b.
This post by dioxin explains the situation pretty well.
Regards, Little John