
Posted: Thu Dec 14, 2006 3:23 pm
by Helle
For Fun with SSE:

;- (near) single precision

Global T.f = 19.0
Global A.f

!movss xmm0,dword[v_T]   ; movss moves exactly 4 bytes (T and A are .f)
!sqrtss xmm0,xmm0
!movss dword[v_A],xmm0   ; movq would write 8 bytes into the 4-byte A
Debug A
or

;- (near) double precision

Global T.d = 19.0
Global A.d

!movq xmm0,qword[v_T]
!sqrtsd xmm0,xmm0 
!movq qword[v_A],xmm0
Debug A
Test:

Global T.f = 19.0

Procedure.f Sqrt(N.f) 
  !mov eax, [p.v_N] 
  !sub eax, $3F800000 
  !shr eax, 1 
  !add eax, $3F800000 
  !mov [esp-4], eax 
  !fld dword [esp-4] 
  CompilerIf #PB_Compiler_Debugger 
    ProcedureReturn 
  CompilerElse 
    !ret 4 
  CompilerEndIf 
EndProcedure 

#Tries = 50000000 

time = GetTickCount_() 
For I = 0 To #Tries 
 !movq xmm0,qword[v_T]
 !sqrtss xmm0,xmm0 
Next 
MessageRequester("", Str(GetTickCount_()-time)) 

z.f 
time = GetTickCount_() 
For I = 0 To #Tries 
  z = I 
  Sqrt(z) 
Next 
MessageRequester("", Str(GetTickCount_()-time)) 

z.f 
time = GetTickCount_() 
For I = 0 To #Tries 
  z = I 
  Sqr(z) 
Next 
MessageRequester("", Str(GetTickCount_()-time))
Regards
Helle

Posted: Thu Dec 14, 2006 3:53 pm
by Derek
Well, that's certainly a lot faster. Isn't it safe to assume that most processors come with SSE now, and just use this code?

Posted: Thu Dec 14, 2006 4:33 pm
by wilbert
I don't want to spoil the fun, Helle, but the speed comparison isn't fair.

From my point of view, a fair comparison requires every method to assign the return value of the square-root function to a variable. A square-root function is useless if its return value is never used.

Besides that, either all methods should do the computation inline or all should call a procedure, since wrapping the computation in a procedure slows things down. Here the SSE routine is not wrapped in a procedure, and it is only half a routine because no return value is stored. Therefore its speed can't be compared with the Sqrt procedure from the beginning.

Posted: Thu Dec 14, 2006 4:39 pm
by Shardik
Derek wrote: Isn't it safe to assume that most processors come with SSE now and use this code
With Helle's fine code example from the German forum you can check the available extensions of the CPU on which your program is running:
http://www.purebasic.fr/german/viewtopic.php?t=10459

Posted: Thu Dec 14, 2006 11:31 pm
by Derek
@Shardik, thanks, will take a look.

I tried a few tests with the compiler options, changing the CPU setting, but as the help file says, nothing is implemented for it yet.

The code you pointed out will make it possible to write processor-specific optimizations.

Posted: Sat Dec 16, 2006 1:00 pm
by Helle
@wilbert: The result is stored in an XMM register.

Procedure.f Sqrt(N.f) 
  !mov eax, [p.v_N] 
  !sub eax, $3F800000 
  !shr eax, 1 
  !add eax, $3F800000 
  !mov [esp-4], eax 
  !fld dword [esp-4] 
  CompilerIf #PB_Compiler_Debugger 
    ProcedureReturn 
  CompilerElse 
    !ret 4 
  CompilerEndIf 
EndProcedure 

#Tries = 50000000 

z.f
time = GetTickCount_() 
For I = 0 To #Tries 
 !cvtsi2ss xmm1,[v_I]
 !sqrtss xmm0,xmm1
Next 
!movd [v_z],xmm0
MessageRequester("SSE", Str(GetTickCount_()-time)+#CRLF$+StrF(z)) 

z.f 
time = GetTickCount_() 
For I = 0 To #Tries 
  z = I 
  Sqrt(z) 
Next 
z=Sqrt(z)
MessageRequester("Procedure", Str(GetTickCount_()-time)+#CRLF$+StrF(z)) 

z.f 
time = GetTickCount_() 
For I = 0 To #Tries 
  z = I 
  Sqr(z) 
Next 
z=Sqr(z)
MessageRequester("PB", Str(GetTickCount_()-time)+#CRLF$+StrF(z))
Have a look at the results :wink: !

Regards
Helle

Posted: Sat Dec 16, 2006 1:10 pm
by Fred
This is not a fair comparison because you don't use a procedure for your test, so you have a lot less overhead. Put your SSE code in a procedure and compare the results.

Posted: Sun Dec 17, 2006 11:51 am
by remi_meier
No, it's you who isn't fair :twisted:. You implemented Sqr() as an inline function, so comparing it with Helle's SSE code, which is also inlined, is fair.

But of course you're right if you want to compare to Sqrt() :P


btw: thanks :wink:

Posted: Sun Dec 17, 2006 12:44 pm
by Fred
remi_meier wrote: No, it's you who isn't fair :twisted:. You implemented Sqr() as an inline function, so comparing it with Helle's SSE code, which is also inlined, is fair.
Ha, good point ;)

Posted: Tue Dec 19, 2006 3:40 pm
by Trond
But it still isn't fair, because it uses I directly instead of assigning to z first!
And the other loops do the computation one extra time after the loop.

And more importantly, it always gives -1.#IND00 as the result here.

Fair test:

#Tries = 100000000

z.f
time = GetTickCount_()
For I = 0 To #Tries
  z = I
  !cvtsi2ss xmm1,[v_I]
  !sqrtss xmm0,xmm1
Next
!movd [v_z],xmm0
MessageRequester("SSE", Str(GetTickCount_()-time)+#CRLF$+StrF(z))

z.f
time = GetTickCount_()
For I = 0 To #Tries
  z = I
  !mov eax, [v_z]
  !sub eax, $3F800000
  !shr eax, 1
  !add eax, $3F800000
  ; Don't store the result
Next
!mov eax, [v_z]
!sub eax, $3F800000
!shr eax, 1
!add eax, $3F800000
!mov [v_z], eax
MessageRequester("Inline sqrt", Str(GetTickCount_()-time)+#CRLF$+StrF(z))

Posted: Tue Dec 19, 2006 6:16 pm
by Helle
Hi,
cvtsi2ss xmm1,[v_I] is z=I! It converts a (scalar) integer to a single-precision float.
Results on my PC (time in ms / result):
SSE: 875 / 10000.000000
Inline sqrt: 655 / 10199.515625 (!)

Time and/or precision: that is now the question :mrgreen:!

Regards
Helle

Posted: Tue Dec 19, 2006 6:25 pm
by Trond
On my PC (time in ms / result):
SSE: 1201 / -1.#IND00
Inline sqrt: 731 / 10199.515625

Posted: Tue Dec 19, 2006 10:26 pm
by Derek
@Trond, what CPU do you have? Does it even have SSE instructions?

Posted: Tue Dec 19, 2006 10:47 pm
by Trond
I have an AMD Athlon XP-M 2400+. How do I know if it has SSE?

Edit: Yes, it has SSE, but not SSE2.

Posted: Wed Dec 20, 2006 11:21 am
by Derek
That's strange, then. I don't know what's causing that.

Edit: I was only looking at the sqrtss instruction and didn't bother looking at the rest of the code. Don't I feel like an idiot :oops: