SqrtFloat ASM replacement !?

Helle
Enthusiast
Posts: 178
Joined: Wed Apr 12, 2006 7:59 pm
Location: Germany

Post by Helle »

For fun, with SSE:

Code: Select all

;- (near) single precision

Global T.f = 19.0
Global A.f

; movss moves exactly 32 bits; movq would read and write 8 bytes on a 4-byte .f variable
!movss xmm0,dword[v_T]
!sqrtss xmm0,xmm0 
!movss dword[v_A],xmm0
Debug A
or

Code: Select all

;- (near) double precision

Global T.d = 19.0
Global A.d

!movq xmm0,qword[v_T]
!sqrtsd xmm0,xmm0 
!movq qword[v_A],xmm0
Debug A
Test:

Code: Select all

Global T.f = 19.0

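Procedure.f Sqrt(N.f) 
  ; Approximate square root using integer operations on the float's bit pattern
  !mov eax, [p.v_N] 
  ; $3F800000 is the bit pattern of 1.0: subtracting it removes the exponent bias
  !sub eax, $3F800000 
  ; Halve the exponent (and the mantissa along with it)
  !shr eax, 1 
  ; Restore the bias
  !add eax, $3F800000 
  ; Load the result onto the FPU stack through a scratch slot below esp
  !mov [esp-4], eax 
  !fld dword [esp-4] 
  CompilerIf #PB_Compiler_Debugger 
    ProcedureReturn 
  CompilerElse 
    !ret 4 
  CompilerEndIf 
EndProcedure 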

#Tries = 50000000 

time = GetTickCount_() 
For I = 0 To #Tries 
 !movq xmm0,qword[v_T]
 !sqrtss xmm0,xmm0 
Next 
MessageRequester("", Str(GetTickCount_()-time)) 

z.f 
time = GetTickCount_() 
For I = 0 To #Tries 
  z = I 
  Sqrt(z) 
Next 
MessageRequester("", Str(GetTickCount_()-time)) 

z.f 
time = GetTickCount_() 
For I = 0 To #Tries 
  z = I 
  Sqr(z) 
Next 
MessageRequester("", Str(GetTickCount_()-time))
Regards
Helle
Derek
Addict
Posts: 2354
Joined: Wed Apr 07, 2004 12:51 am
Location: England

Post by Derek »

Well, that's certainly a lot faster. Isn't it safe to assume that most processors come with SSE now and use this code?
wilbert
PureBasic Expert
Posts: 3943
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

I don't want to spoil the fun, Helle, but the speed comparison isn't fair.

For a fair comparison, every method should assign the return value of the square root function to a variable. A square root function is useless if its return value is never used.

Besides that, either all methods should do the computation inline or all should call a procedure, since wrapping the computation in a procedure slows things down. Here the SSE routine is not wrapped in a procedure, and it is only half a routine because the result is never stored. Its timing therefore can't be compared with that of the Sqrt function defined at the top.
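
The two conditions above (every candidate stores its result, and every candidate goes through the same call path) can be sketched as a small timing harness. This is a minimal illustration in Python rather than PureBasic, and the names `time_candidate`, `sqrt_library` and `sqrt_power` are made up for this example; it shows the shape of a fair loop, not the timings discussed in this thread.

```python
import math
import time


def sqrt_library(x):
    """Candidate 1: the library square root."""
    return math.sqrt(x)


def sqrt_power(x):
    """Candidate 2: square root via exponentiation."""
    return x ** 0.5


def time_candidate(func, tries):
    """Run func on 0..tries and return (elapsed seconds, last result)."""
    result = 0.0
    start = time.perf_counter()
    for i in range(tries + 1):
        value = float(i)      # same int-to-float conversion for every candidate
        result = func(value)  # same call overhead, and the result is always stored
    return time.perf_counter() - start, result


for name, func in (("math.sqrt", sqrt_library), ("x ** 0.5", sqrt_power)):
    elapsed, last = time_candidate(func, 200_000)
    print(f"{name}: {elapsed * 1000:.1f} ms, last result {last:.6f}")
```

Because both candidates are plain functions called from the same loop, any difference in the printed times comes from the computation itself and not from one method skipping the call or the store.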
Shardik
Addict
Posts: 2060
Joined: Thu Apr 21, 2005 2:38 pm
Location: Germany

Post by Shardik »

Derek wrote: Isn't it safe to assume that most processors come with SSE now and use this code?
With Helle's fine code example from the German forum you can check the available extensions of the CPU on which your program is running:
http://www.purebasic.fr/german/viewtopic.php?t=10459

Post by Derek »

@Shardik, thanks, will take a look.

I tried a few tests with the compiler options, changing the CPU, but as the help file says, nothing is implemented yet.

Using the code you pointed out will allow coding of processor specific optimizations.

Post by Helle »

@wilbert: The result is stored in an XMM register.

Code: Select all

Procedure.f Sqrt(N.f) 
  !mov eax, [p.v_N] 
  !sub eax, $3F800000 
  !shr eax, 1 
  !add eax, $3F800000 
  !mov [esp-4], eax 
  !fld dword [esp-4] 
  CompilerIf #PB_Compiler_Debugger 
    ProcedureReturn 
  CompilerElse 
    !ret 4 
  CompilerEndIf 
EndProcedure 

#Tries = 50000000 

z.f
time = GetTickCount_() 
For I = 0 To #Tries 
 !cvtsi2ss xmm1,[v_I]
 !sqrtss xmm0,xmm1
Next 
; Note: movd with an XMM register is an SSE2 instruction; movss is the SSE form
!movd [v_z],xmm0
MessageRequester("SSE", Str(GetTickCount_()-time)+#CRLF$+StrF(z)) 

z.f 
time = GetTickCount_() 
For I = 0 To #Tries 
  z = I 
  Sqrt(z) 
Next 
z=Sqrt(z)
MessageRequester("Procedure", Str(GetTickCount_()-time)+#CRLF$+StrF(z)) 

z.f 
time = GetTickCount_() 
For I = 0 To #Tries 
  z = I 
  Sqr(z) 
Next 
z=Sqr(z)
MessageRequester("PB", Str(GetTickCount_()-time)+#CRLF$+StrF(z))
Have a look at the results :wink: !

Regards
Helle
Fred
Administrator
Posts: 18265
Joined: Fri May 17, 2002 4:39 pm
Location: France

Post by Fred »

This is not a fair comparison because you don't use a procedure for your test, so you have a lot less overhead. Put your SSE code in a procedure and get the results.
remi_meier
Enthusiast
Enthusiast
Joined: Sat Dec 20, 2003 6:19 pm
Location: Switzerland

Post by remi_meier »

No it's you who isn't fair :twisted: , you implemented Sqr() as an inline
function and so it would be a fair comparison to Helle's SSE code that
is also inlined.

But of course you're right if you want to compare to Sqrt() :P


btw: thanks :wink:
Athlon64 3700+, 1024MB Ram, Radeon X1600

Post by Fred »

remi_meier wrote:No it's you who isn't fair :twisted: , you implemented Sqr() as an inline
function and so it would be a fair comparison to Helle's SSE code that
is also inlined.
Ha, good point ;)
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

But it still isn't fair, because it uses I directly instead of assigning to z first!
And the other loops do the computation one more time.

And more importantly, it always gives -1.#IND00 as the result here.

Fair test:

Code: Select all

#Tries = 100000000

z.f
time = GetTickCount_()
For I = 0 To #Tries
  z = I
  !cvtsi2ss xmm1,[v_I]
  !sqrtss xmm0,xmm1
Next
!movd [v_z],xmm0
MessageRequester("SSE", Str(GetTickCount_()-time)+#CRLF$+StrF(z))

z.f
time = GetTickCount_()
For I = 0 To #Tries
  z = I
  !mov eax, [v_z]
  !sub eax, $3F800000
  !shr eax, 1
  !add eax, $3F800000
  ; Don't store the result
Next
!mov eax, [v_z]
!sub eax, $3F800000
!shr eax, 1
!add eax, $3F800000
!mov [v_z], eax
MessageRequester("Inline sqrt", Str(GetTickCount_()-time)+#CRLF$+StrF(z))

Post by Helle »

Hi,
cvtsi2ss xmm1,[v_I] is the z=I step! It converts an integer to a scalar single-precision float.
Results (on my PC):
SSE: 875 / 10000.000000
Inline sqrt: 655 / 10199.515625 (!)

Speed or precision, that is now the question :mrgreen: !

Regards
Helle
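
The gap between 10000.000000 and 10199.515625 comes from the integer trick itself, which only estimates the root by halving the float's exponent and mantissa. A minimal sketch in Python (not PureBasic) that replays the same three integer steps on the 32-bit pattern; the names `approx_sqrt` and `BIAS_ONE` are chosen for this illustration only, and the input is assumed to be a positive, normal float:

```python
import math
import struct

BIAS_ONE = 0x3F800000  # bit pattern of 1.0 as an IEEE 754 single


def approx_sqrt(x: float) -> float:
    """Estimate sqrt(x) for positive x using integer operations on the float32 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 -> raw 32-bit integer
    bits = (bits - BIAS_ONE) & 0xFFFFFFFF                # sub eax, $3F800000
    bits >>= 1                                           # shr eax, 1
    bits = (bits + BIAS_ONE) & 0xFFFFFFFF                # add eax, $3F800000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


x = 100000000.0  # value of I on the last loop pass when #Tries = 100000000
exact = math.sqrt(x)
approx = approx_sqrt(x)
print(f"exact  = {exact:.6f}")
print(f"approx = {approx:.6f}")
print(f"relative error = {(approx - exact) / exact:.2%}")
```

For an input of 100000000 this gives 10199.515625 from the integer steps against an exact root of 10000, the same pair of values reported above. Exact powers of four such as 1.0 and 4.0 come out exact, because only their exponent bits are set.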

Post by Trond »

On my PC:
---------------------------
SSE
---------------------------
1201

-1.#IND00
---------------------------
OK
---------------------------
---------------------------
Inline sqrt
---------------------------
731

10199.515625
---------------------------
OK
---------------------------

Post by Derek »

@Trond, what CPU do you have? Does it even have SSE instructions?

Post by Trond »

I have an AMD Athlon XP-M 2400+. How do I know if it has SSE?

Edit: Yes, it has SSE, but not SSE2.

Post by Derek »

That's strange then. I don't know what's causing that.

Edit: I was only looking at the sqr instruction and didn't bother looking at the rest of the code. Don't I feel like an idiot. :oops:
Last edited by Derek on Wed Dec 20, 2006 1:05 pm, edited 1 time in total.