SqrtFloat ASM replacement !?

Ralf
Enthusiast
Posts: 203
Joined: Fri May 30, 2003 1:29 pm
Location: Germany

Post by Ralf »

I am trying to write a fast and small Sqr() replacement. What do you think of the following code? Are the results accurate enough for you? Do I also have to push and pop the registers I use? If this routine is good enough, I will do some performance tests!

Code: Select all

; ---- SqrtFloat Replacement !? ----

t.f = 45     ; your integer or float value for sqr

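MOV eax,t            ; load the raw IEEE-754 bits of t

SUB eax,$3F800000    ; subtract the bit pattern of 1.0 (removes the exponent bias)
SAR eax,1            ; halve exponent and mantissa in one arithmetic shift
ADD eax,$3F800000    ; put the exponent bias back

MOV t,eax            ; store the bits back into t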

MessageRequester("ASM Testing", "Result: "+StrF(t), 0)

End 
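Why the three integer instructions come close to a square root: a single-precision float stores a biased exponent above a 23-bit mantissa, so its bit pattern behaves roughly like a scaled, shifted log2 of the value. Subtracting the pattern of 1.0 ($3F800000) removes the bias, the arithmetic shift halves that logarithm, and adding the constant restores the bias. The following is only a minimal Python sketch of the same steps, not PureBasic code; the helper names (float_to_bits, bits_to_float, approx_sqrt) are made up for illustration.

```python
import struct

ONE_BITS = 0x3F800000  # bit pattern of 1.0 as a 32-bit IEEE-754 float


def float_to_bits(x: float) -> int:
    """Return the 32-bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def approx_sqrt(x: float) -> float:
    """Hypothetical helper mirroring SUB / SAR / ADD; meant for x >= 0."""
    d = float_to_bits(x) - ONE_BITS     # SUB eax,$3F800000 (negative when x < 1)
    d >>= 1                             # SAR eax,1 (>> on a negative Python int is arithmetic)
    return bits_to_float(d + ONE_BITS)  # ADD eax,$3F800000


if __name__ == "__main__":
    print(hex(float_to_bits(45.0)))
    print(approx_sqrt(45.0), 45.0 ** 0.5)
```

For t = 45 the bit pattern is $42340000 and the routine returns 6.8125, while the exact root is about 6.708.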
Psychophanta
Always Here
Posts: 5153
Joined: Wed Jun 11, 2003 9:33 pm
Location: Anare

Post by Psychophanta »

Try this:

Code: Select all

; ---- SqrtFloat Replacement !? ---- 

t.f = 45     ; your integer or float value for sqr 
!fld dword[v_t]    ; push t onto the FPU stack
!fsqrt             ; st0 = square root of st0
!fstp dword[v_t]   ; pop the result back into t

MessageRequester("ASM Testing", "Result: "+StrF(t), 0) 

End 
But PB's Sqr() function is fast enough.

There is a well-known algorithm that computes a square root using the ALU only, but it is loop-based. If you want to know it, you can find it with Google.
http://www.zeitgeistmovie.com

while (world==business) world+=mafia;
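The post does not say which loop-based algorithm is meant. One common candidate is the binary digit-by-digit integer square root, which needs only shifts, additions and comparisons. The following Python sketch assumes that this is roughly the kind of routine referred to; isqrt_bitwise is a hypothetical name.

```python
def isqrt_bitwise(n: int) -> int:
    """Floor of the square root of a non-negative integer.

    Uses only shifts, adds, subtracts and compares (no multiply, no FPU).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 0
    bit = 1
    while (bit << 2) <= n:      # find the highest power of four not above n
        bit <<= 2
    while bit:                  # one loop pass per result bit
        if n >= result + bit:
            n -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    return result


if __name__ == "__main__":
    for value in (0, 1, 45, 1000000):
        print(value, isqrt_bitwise(value))
```

The loop runs once per result bit, so a 32-bit input takes up to 16 passes, which is the cost the thread is weighing against a single FPU instruction.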
traumatic
PureBasic Expert
Posts: 1661
Joined: Sun Apr 27, 2003 4:41 pm
Location: Germany

Post by traumatic »

Psychophanta wrote: But PB's Sqr() function is fast enough.
In time-critical situations, nothing is fast enough ;)

Ralf's suggestion was just a rough estimate of sqrt, which I bet will be faster
than a "real sqrt" while still giving a sufficient result in cases where speed is
more important than accuracy.
Good programmers don't comment their code. It was hard to write, should be hard to read.
Psychophanta
Always Here
Posts: 5153
Joined: Wed Jun 11, 2003 9:33 pm
Location: Anare

Post by Psychophanta »

Traumatic, you're wrong.
Current Intel FPUs are as fast as the ALU or faster, which means that you get more speed and more accuracy.

Simply compare the speed of PB's Sqr() function against an ASM function that uses the ALU only. But before that: if you want to bet, I am willing (I say Sqr() will win in both accuracy and speed) :)
traumatic
PureBasic Expert
Posts: 1661
Joined: Sun Apr 27, 2003 4:41 pm
Location: Germany

Post by traumatic »

Psychophanta wrote: Current Intel FPUs are as fast as the ALU or faster, which means that you get more speed and more accuracy.
I must admit I don't know anything about current processors' capabilities, so
you may be right. What about compatibility?
[...] if you want to bet, I am willing (I say Sqr() will win in both accuracy and speed) :)
:lol:
Psychophanta
Always Here
Posts: 5153
Joined: Wed Jun 11, 2003 9:33 pm
Location: Anare

Post by Psychophanta »

traumatic wrote: What about compatibility?
No problem. It works on every Intel-based PC with an FPU, that is, ever since the Intel 8087, 80287 and 80387 FPUs.
traumatic
PureBasic Expert
Posts: 1661
Joined: Sun Apr 27, 2003 4:41 pm
Location: Germany

Post by traumatic »

No, I rather meant: "Will it be faster on CPUs other than the latest Intel one?"
Psychophanta
Always Here
Posts: 5153
Joined: Wed Jun 11, 2003 9:33 pm
Location: Anare

Post by Psychophanta »

traumatic wrote: No, I rather meant: "Will it be faster on CPUs other than the latest Intel one?"
Obviously it depends on the processor and FPU speed, but using the FPU will always be faster and more accurate than using the ALU.
Ralf
Enthusiast
Posts: 203
Joined: Fri May 30, 2003 1:29 pm
Location: Germany

Post by Ralf »

Thanks to Psychophanta and traumatic for the discussion! So if I understand it correctly, my routine is more or less useless nowadays because Psychophanta's method is fast enough on today's CPUs?
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

The asm code is faster but less accurate.

Code: Select all

Procedure.f Sqrt(N.f)
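  !mov eax, [p.v_N]      ; raw IEEE-754 bits of N
  !sub eax, $3F800000    ; remove the exponent bias (bit pattern of 1.0)
  !sar eax, 1            ; arithmetic shift keeps the sign, so inputs below 1.0 work (SHR breaks them)
  !add eax, $3F800000    ; restore the bias
  !mov [esp-4], eax      ; scratch slot just below the stack pointer
  !fld dword [esp-4]     ; return the result in st0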
  CompilerIf #PB_Compiler_Debugger
    ProcedureReturn
  CompilerElse
    !ret 4
  CompilerEndIf
EndProcedure


#Tries = 50000000

z.f
time = GetTickCount_()
For I = 0 To #Tries
  z = I
  Sqrt(z)
Next
MessageRequester("", Str(GetTickCount_()-time))

z.f
time = GetTickCount_()
For I = 0 To #Tries
  z = I
  Sqr(z)
Next
MessageRequester("", Str(GetTickCount_()-time))
Derek
Addict
Posts: 2354
Joined: Wed Apr 07, 2004 12:51 am
Location: England

Post by Derek »

I believe Sqr() is faster on AMD CPUs than it is on Intel ones.

Using the FPU instructions, that is.
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Well, I have an AMD and the asm version (not FPU) is faster.
Derek
Addict
Posts: 2354
Joined: Wed Apr 07, 2004 12:51 am
Location: England

Post by Derek »

I started a thread about the speed of Core 2 CPUs, tested using Sqr(), and most people with AMDs got faster results.

http://www.purebasic.fr/english/viewtopic.php?t=24870

Of course, you have to take the speed of the CPU into account, but I think AMDs are generally faster at some functions (and, of course, slower at others).

This is just my opinion and would need testing to be sure.

@Trond, the FPU is more accurate though, like you said it would be.
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Yes, the accuracy of the asm function is dreadful.
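To put a number on "dreadful", the relative error can be measured against an exact square root. The sketch below is Python rather than PureBasic and only illustrates the measurement; approx_sqrt and max_relative_error are hypothetical names, and the sample grid is an arbitrary choice. It covers [1, 4) because the error pattern of the bit trick repeats every time the input grows by a factor of four.

```python
import math
import struct


def approx_sqrt(x: float) -> float:
    """Bit-trick estimate: subtract the bits of 1.0, halve, add them back."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = ((bits - 0x3F800000) >> 1) + 0x3F800000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def max_relative_error(samples) -> float:
    """Largest |estimate - exact| / exact over the given inputs."""
    return max(abs(approx_sqrt(x) - math.sqrt(x)) / math.sqrt(x) for x in samples)


# Inputs 1.0, 1 + 1/1024, ... up to just below 4.0; all exact in single precision.
samples = [1.0 + k / 1024.0 for k in range(3 * 1024)]
worst = max_relative_error(samples)

if __name__ == "__main__":
    print(f"worst relative error on [1, 4): {worst:.4%}")
```

On this grid the worst case is at x = 2, where the estimate is 1.5 instead of about 1.4142, an error of roughly 6 %.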
Derek
Addict
Posts: 2354
Joined: Wed Apr 07, 2004 12:51 am
Location: England

Post by Derek »

If you take the average of the quick result and the input number divided by the quick result, you improve the accuracy by quite a large margin.

Code: Select all

OpenConsole()

i=100000000; any less than 1000000 and it becomes slower than SQR()

r=ElapsedMilliseconds()
For l=1 To i
  g.f=Sqr(l)
Next
r=ElapsedMilliseconds()-r
PrintN("Actual SQR() = "+StrF(g)+" in "+Str(r))

r=ElapsedMilliseconds()
For l=1 To i
  g.f = l
  MOV eax,g
  SUB eax,$3F800000
  SAR eax,1
  ADD eax,$3F800000
  MOV g,eax
Next
r=ElapsedMilliseconds()-r
PrintN("Approx SQR() = "+StrF(g)+" in "+Str(r))

r=ElapsedMilliseconds()
For l=1 To i
  g.f = l
  MOV eax,g
  SUB eax,$3F800000
  SAR eax,1
  ADD eax,$3F800000
  MOV g,eax
  n.f=(g+l/g)/2
Next
r=ElapsedMilliseconds()-r
PrintN("Averaged SQR() = "+StrF(n)+" in "+Str(r))
Input()
Can someone convert this line to ASM?

Code: Select all

n.f=(g+l/g)/2
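The averaging line is one step of Heron's (Newton's) method for square roots: if g overestimates the root of x, then x/g underestimates it, and their mean lies much closer to the true value. Here is a minimal Python sketch of that step on top of the bit-trick estimate; it is not the requested ASM conversion, approx_sqrt and refined_sqrt are hypothetical names, and Python computes the average in double precision whereas n.f in PureBasic is a single-precision float.

```python
import math
import struct


def approx_sqrt(x: float) -> float:
    """Bit-trick estimate: subtract the bits of 1.0, halve, add them back."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = ((bits - 0x3F800000) >> 1) + 0x3F800000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def refined_sqrt(x: float) -> float:
    """One averaging step, the same formula as n.f=(g+l/g)/2."""
    g = approx_sqrt(x)
    return (g + x / g) / 2.0


if __name__ == "__main__":
    for x in (2.0, 45.0, 1000.0):
        print(x, approx_sqrt(x), refined_sqrt(x), math.sqrt(x))
```

The price of the extra accuracy is one floating-point division per call, which may eat much of the speed advantage; that would need measuring on the target CPU.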