Posted: Sat Nov 15, 2008 4:32 pm
by superadnim
Yes, the debugger is off. However, I forgot to mention a key factor: my benchmark makes a small attempt to reproduce real-life conditions, in that the input data is randomly generated. The code is the same in both tests, and I'm not benchmarking the actual time it takes (well, I am) but the relation between test A and test B (the ratio).
That said, I don't think my test is flawed.
The test cases:
Code:
Macro TestCaseA
  Define.f temp = Random(10)+Random(10)*0.1
  Define.f temp_decimals
  decimals2(temp, temp_decimals)
EndMacro

Macro TestCaseB
  Define.f temp = Random(10)+Random(10)*0.1
  Define.f temp_decimals = decimals(temp)
EndMacro
Since the first line is the same in both cases, it's pretty much ruled out of the equation, even though it makes the benchmarks take much longer... (And for the sake of it, between tests the random seed is reset to the same one used in the first case - and no, I'm not counting this in the timing.)
Another thing: I'm running the process in the realtime priority class; the results differ by quite a lot if I don't do this.
The reason I asked where to put the fstp is simply the way processors predict branches and schedule instructions: I think that keeping all of the FPU instructions together, and in pairs, tends to run better. That might not be the case with this code, though.
I'll simplify the test scenario and post the code soon.
Posted: Sat Nov 15, 2008 4:53 pm
by dioxin
superadnim,
superadnim wrote: even though the benchmarks take much longer
That's the answer, "even though it takes much longer".
For the sake of argument, let's say the INT() method takes 500 clks, the ASM method takes 50 clks, and generating your random numbers takes 2,000 clks.
The totals would then be 2,500 vs 2,050 - a saving of only about 18% - yet the ASM is actually 10 times faster than the INT() it replaces.
My guess is that generating your random numbers takes most of the time and masks the savings from the ASM.
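For illustration, the same arithmetic in code form; the cycle counts are the hypothetical figures above, not measurements:
Code:
Define.f intMethod  = 500.0   ; assumed cost of the INT() method
Define.f asmMethod  = 50.0    ; assumed cost of the ASM method
Define.f randomCost = 2000.0  ; assumed cost of generating the random input

Debug asmMethod / intMethod                               ; 0.1  -> the ASM alone is 10 times faster
Debug (randomCost + asmMethod) / (randomCost + intMethod) ; 0.82 -> only ~18% faster overall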
Posted: Sat Nov 15, 2008 5:00 pm
by superadnim
I don't understand your argument; the way I see it, the random calls negate each other. A real benchmark should use realistic tests, not just "how fast does this code execute by itself?", because the latter is no good in reality - or is it?
Code:
;---
Macro decimals( _n_ ) : (_n_-Int(_n_)) : EndMacro

Macro decimals2( _n_, _result_ )
  !push $1f7f0000           ; FP control word needed to round to zero
  !fstcw [esp]              ; Save the current FP control word
  !fldcw [esp+2]            ; Set the FP to round to zero
  !fld dword[v_#_n_]        ; Load the FP with the number to work on
  !fld st0                  ; Replicate/push that number on the FP stack
  !frndint                  ; Round number to leave integer
  !fsubp st1,st0            ; Subtract from original to leave fraction part (and pop stack)
  !fldcw [esp]              ; Restore original FP control word
  !fstp dword[v_#_result_]  ; Store the fraction part
  !add esp,4                ; Clean up CPU stack
EndMacro
;---
Macro TestNumber
  Random(10)+Random(10)*0.1 ; 123.456 ;
EndMacro

Macro TestCaseA
  Define.f a_temp = TestNumber
  Define.f a_temp_decimals
  decimals2(a_temp, a_temp_decimals)
EndMacro

Macro TestCaseB
  Define.f b_temp = TestNumber
  Define.f b_temp_decimals = decimals(b_temp)
EndMacro
;---
#BENCHMARK_ITERATIONS = 100000000;0;0
SetPriorityClass_( GetCurrentProcess_(), #REALTIME_PRIORITY_CLASS)

Macro InitTestCase
  Delay(1000)
  RandomSeed(123456789)
EndMacro
;---
Define.i a_time_old, a_time_new
Define.i b_time_old, b_time_new

InitTestCase
a_time_old = ElapsedMilliseconds()
For i=1 To #BENCHMARK_ITERATIONS
  TestCaseA
Next
a_time_new = ElapsedMilliseconds()

InitTestCase
b_time_old = ElapsedMilliseconds()
For i=1 To #BENCHMARK_ITERATIONS
  TestCaseB
Next
b_time_new = ElapsedMilliseconds()
;---
Define.i a_result, b_result
Define.f a_percent, b_percent

a_result = ( a_time_new - a_time_old )
b_result = ( b_time_new - b_time_old )
a_percent = (( a_result / b_result ) * 100)
b_percent = (( b_result / a_result ) * 100)

Define.s result
result + #CRLF$
result + "A= " + Str(a_result) + "ms" + #LF$
result + "B= " + Str(b_result) + "ms" + #LF$
result + #CRLF$
result + StrF( a_percent, 2 ) +"%"+ " ( "+StrF( 100-a_percent, 2 )+"% )" + #LF$
result + StrF( b_percent, 2 ) +"%"+ " ( "+StrF( 100-b_percent, 2 )+"% )" + #LF$
result + #CRLF$

MessageRequester( "benchmark result", result )
No need for a higher-resolution timer (I tried; same results... it's a big scale). There should be a min or max routine to make more sense of the results, but I left that out.
If you benchmark the random calls alone in each test, you'll see they take the same time in both cases (within a margin of 0.1 due to the low resolution of the timer), hence they are "negating" each other in the test I just pasted.
So there is no need to initialize the random numbers in an array at all, but it would make for a better test - see the sketch below.
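Something like this, as a rough sketch (the names and size are made up):
Code:
; Pre-generate the inputs once, outside the timed region, so that
; Random() never runs inside the benchmark loop.
#INPUT_COUNT = 100000
Dim input.f(#INPUT_COUNT - 1)
RandomSeed(123456789)
For i = 0 To #INPUT_COUNT - 1
  input(i) = Random(10) + Random(10) * 0.1
Next
; The benchmark loops would then read input(i % #INPUT_COUNT)
; instead of calling Random() on every iteration.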
Posted: Sat Nov 15, 2008 5:05 pm
by Trond
superadnim wrote: so, there is no need to initialize the random numbers in an array at all but it would make for a better test.
Think about it once more, please.
But I'm afraid it's much worse than that. Both codes actually give the sign in addition to the decimals (the only thing asked for was the decimals), so they give the wrong result for negative values.
Better slow and correct than slow and wrong.
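For example (with a value I just picked):
Code:
Macro decimals( _n_ ) : (_n_-Int(_n_)) : EndMacro

Define.f x = -3.25
Debug decimals(x) ; prints -0.25 - the sign tags along with the decimals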

Posted: Sat Nov 15, 2008 5:11 pm
by superadnim
You can get rid of the top bit (the sign bit) and that'll do it - it's one extra instruction, as I recall.
Anyway, the results I got on this test: test A is 24.90% faster than test B using the randoms, and with the constant float (123.456) I get 39.80% for test A against test B.
Since in reality I won't be using a constant... I think the first test is the only valid one here.
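By "get rid of the top bit" I mean something like an fabs right after the load - a rough sketch, not tested:
Code:
Macro decimals2_abs( _n_, _result_ )
  !push $1f7f0000           ; FP control word needed to round to zero
  !fstcw [esp]              ; Save the current FP control word
  !fldcw [esp+2]            ; Set the FP to round to zero
  !fld dword[v_#_n_]        ; Load the number to work on
  !fabs                     ; Clear the sign bit - the one extra instruction
  !fld st0                  ; Replicate/push |n| on the FP stack
  !frndint                  ; Truncate to the integer part
  !fsubp st1,st0            ; |n| minus its integer part = positive fraction
  !fldcw [esp]              ; Restore original FP control word
  !fstp dword[v_#_result_]  ; Store the fraction
  !add esp,4                ; Clean up CPU stack
EndMacro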

Posted: Sat Nov 15, 2008 5:14 pm
by superadnim
Trond wrote: Think about it once more, please.
I did, and since benchmarking the Random() calls (without anything else) on tests A and B gave the same result in each case, it is safe to assume they are not biasing or skewing the results. It's just an extra few milliseconds I DON'T care about, since all I'm doing is taking a percentage of the results; whether the percentage comes from bigger or smaller times doesn't matter, since the extra added number is assumed constant in both cases.
Posted: Sat Nov 15, 2008 5:18 pm
by Trond
superadnim wrote: Trond wrote: Think about it once more, please.
I did, and since benchmarking the Random() calls (without anything else) on tests A and B gave the same result in each case, it is safe to assume they are not biasing or skewing the results. It's just an extra few milliseconds I DON'T care about, since all I'm doing is taking a percentage of the results; whether the percentage comes from bigger or smaller times doesn't matter, since the extra added number is assumed constant in both cases.
That's exactly why you care.
Code:
; Case 1
Define.f
NumA = 100
NumB = 200
C = NumA / NumB
Debug C ; 0.5

; Case with added constant value
Define.f
NumA = 100 + 800
NumB = 200 + 800
C = NumA / NumB
Debug C ; 0.9 - the same 100 difference, but a very different ratio
Posted: Sat Nov 15, 2008 5:29 pm
by superadnim
But that's not what I'm doing, look:
Code:
; Case 1
Define.f
NumA = 100
NumB = 200
result_a = (NumB-NumA)

; Case with added constant value
Define.f
NumA = 100 + 800
NumB = 200 + 800
result_b = (NumB-NumA)

Debug ((result_a / result_b) * 100) ; 100.0 - the constant cancels under subtraction
Debug ((result_b / result_a) * 100) ; 100.0
The only problem I see is that you could get rounding errors if you use floats and the constant being added is too big.
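For example (a sketch with made-up values; .f is a 32-bit float, so the mantissa holds only 24 bits):
Code:
Define.f big = 100000000.0
Define.f a = big + 100.0 ; stored as 100000096 - the nearest representable float
Define.f b = big + 200.0 ; stored as 100000200 - exactly representable
Debug a - b              ; -104 instead of the exact -100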
Posted: Sat Nov 15, 2008 5:38 pm
by Trond
Read your own code. It makes absolutely no sense.
Posted: Sat Nov 15, 2008 5:43 pm
by superadnim
Trond wrote: Read your own code. It makes absolutely no sense.
You're right, you aren't making any sense.
Please shed some light on the obvious?
Posted: Sat Nov 15, 2008 5:53 pm
by Trond
superadnim wrote: Please shed some light on the obvious?
Your variable a_result corresponds to my NumA with the added constant value. Think about it.
Code:
Macro decimals( _n_ ) : (_n_-Int(_n_)) : EndMacro

Macro decimals2( _n_, _result_ )
  !push $1f7f0000           ; FP control word needed to round to zero
  !fstcw [esp]              ; Save the current FP control word
  !fldcw [esp+2]            ; Set the FP to round to zero
  !fld dword[v_#_n_]        ; Load the FP with the number to work on
  !fld st0                  ; Replicate/push that number on the FP stack
  !frndint                  ; Round number to leave integer
  !fsubp st1,st0            ; Subtract from original to leave fraction part (and pop stack)
  !fldcw [esp]              ; Restore original FP control word
  !fstp dword[v_#_result_]  ; Store the fraction part
  !add esp,4                ; Clean up CPU stack
EndMacro
Macro decimals3(n, result)
  ! fld1                    ; Push 1.0 (the correction for when frndint rounds up)
  ! fld dword [v_#n]        ; Load the number to work on
  ! fabs                    ; Strip the sign so only the decimals remain
  ! fld st0                 ; Duplicate |n| on the FP stack
  ! frndint                 ; Round |n| with the current (round-to-nearest) mode
  ! fcomi st0, st1          ; Compare the rounded value against |n|
  ! fldz                    ; Push 0.0 as the default correction
  ! fcmovnbe st0, st3       ; If it rounded up (rounded > |n|), use 1.0 instead
  ! fsub st1, st0           ; rounded - correction = floor(|n|)
  ! fstp st0                ; Pop the correction
  ! fsubp st1, st0          ; |n| - floor(|n|) = fraction part (and pop)
  ! fstp dword[v_#result]   ; Store the fraction
  ! fstp st0                ; Pop the leftover 1.0
EndMacro
; SetPriorityClass_( GetCurrentProcess_(), #REALTIME_PRIORITY_CLASS)
#Tries = 10000000

temp.f  = 6184927.2348192
temp2.f = -8234.2343
temp3.f = 0.0
temp4.f = 0.432
result.f

timeA = GetTickCount_()
For U = 0 To #Tries
  decimals2(temp, result)
  decimals2(temp2, result)
  decimals2(temp3, result)
  decimals2(temp4, result)
Next
timeA = GetTickCount_() - timeA

timeB = GetTickCount_()
For U = 0 To #Tries
  result = decimals(temp)
  result = decimals(temp2)
  result = decimals(temp3)
  result = decimals(temp4)
Next
timeB = GetTickCount_() - timeB

timeC = GetTickCount_()
For U = 0 To #Tries
  decimals3(temp, result)
  decimals3(temp2, result)
  decimals3(temp3, result)
  decimals3(temp4, result)
Next
timeC = GetTickCount_() - timeC

r.s = "decimals2: " + Str(timeA) + #CRLF$
r.s + "decimals: " + Str(timeB) + #CRLF$
r.s + "decimals3: " + Str(timeC) + #CRLF$
MessageRequester("", r)
Posted: Sat Nov 15, 2008 7:09 pm
by dioxin
Trond,
your method would appear more impressive if you called decimals3 somewhere instead of calling decimals2 twice!
Posted: Sat Nov 15, 2008 7:29 pm
by Trond
Thanks!

It should blow some doors now.

Let's just hope no bugs surface...
Posted: Sun Nov 16, 2008 9:03 am
by Little John
superadnim wrote: I don't understand your argument; the way I see it, the random calls negate each other.
The way you see it is just mathematically wrong. The random calls add to the time, and afterwards, in order to get the percent value, you are doing a division. An added constant cancels under subtraction, but not under division: (a + c) / (b + c) is not the same as a / b.
This post by dioxin explains the situation pretty well.
Regards, Little John