FPU how to cleanup register?

Bare metal programming in PureBasic, for experienced users
User avatar
Mijikai
Addict
Addict
Posts: 1360
Joined: Sun Sep 11, 2016 2:17 pm

FPU how to cleanup register?

Post by Mijikai »

I want to calculate timing differences but when i use my code without finit it works only a few times in succesion?

I suspect that my code has an issue.
What can i do/ clean up to make it work without finit?

Code:

Code: Select all

proc Timing
     finit
     fld qword[rcx+0x20]
     fild qword[rcx+0x30]
     fild qword[rcx+0x38]
     fsub st0,st1
     fdiv st0,st2
     fstp qword[rcx+0x40];<- this pops st0
     ret
endp
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: FPU how to cleanup register?

Post by wilbert »

You load three values onto the fpu stack and only remove one. That won't work.
Use fsubp and fdivp instead of fsub and fdiv.
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
Mijikai
Addict
Addict
Posts: 1360
Joined: Sun Sep 11, 2016 2:17 pm

Re: FPU how to cleanup register?

Post by Mijikai »

Thanks wilbert :)
I changed it and it works but im still not sure if it is correct?

Code:

Code: Select all

proc Timing
     fild qword[rcx+0x38]
     fild qword[rcx+0x30]
     fsubp
     fld qword[rcx+0x20]
     fdivp
     fstp qword[rcx+0x40]
     ret
endp 
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: FPU how to cleanup register?

Post by wilbert »

Mijikai wrote:I changed it and it works but im still not sure if it is correct?
The code looks fine to me.
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
Mijikai
Addict
Addict
Posts: 1360
Joined: Sun Sep 11, 2016 2:17 pm

Re: FPU how to cleanup register?

Post by Mijikai »

wilbert wrote:
Mijikai wrote:I changed it and it works but im still not sure if it is correct?
The code looks fine to me.
Thank you :)
I also found no more issues while testing.
I hardly touch the fpu.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: FPU how to cleanup register?

Post by wilbert »

Mijikai wrote:I hardly touch the fpu.
In most cases you can use SSE2 instructions if you don't like to use the x87 ones.

By the way, if the value you are using for division is a constant, it is a lot faster to calculate the reciprocal once and multiply by that.
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
Mijikai
Addict
Addict
Posts: 1360
Joined: Sun Sep 11, 2016 2:17 pm

Re: FPU how to cleanup register?

Post by Mijikai »

wilbert wrote:
Mijikai wrote:I hardly touch the fpu.
In most cases you can use SSE2 instructions if you don't like to use the x87 ones.

By the way, if the value you are using for division is a constant, it is a lot faster to calculate the reciprocal once and multiply by that.
Sounds great but i have no idea what a reciprocal is :oops:
[rcx+0x20] in my case is static.

Is it just:

Code: Select all

Constant =  (1 / Constant)
Result = Input * Constant
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: FPU how to cleanup register?

Post by wilbert »

Mijikai wrote:Is it just:

Code: Select all

Constant =  (1 / Constant)
Result = Input * Constant
Yes, that is it :)
Multiply is a much faster cpu operation compared to division.
Windows (x64)
Raspberry Pi OS (Arm64)
Helle
Enthusiast
Enthusiast
Posts: 178
Joined: Wed Apr 12, 2006 7:59 pm
Location: Germany
Contact:

Re: FPU how to cleanup register?

Post by Helle »

With this you can check the FPU-Registers:

Code: Select all

Procedure FPU()
  Global.q SaveRam, SaveRamAli, Ali       
  Global.d FST0, FST1, FST2, FST3, FST4, FST5, FST6, FST7 
  Global Tag.b
  Protected Tag0$="   Free", Tag1$="   Free",  Tag2$="   Free",  Tag3$="   Free", Tag4$="   Free",  Tag5$="   Free",  Tag6$="   Free",  Tag7$="   Free"

  SaveRam = AllocateMemory(512+64)     ;512 Byte FXSAVE
  Ali = SaveRam % 64
  SaveRamAli = SaveRam + Ali

  !mov rsi,[v_SaveRamAli]
  !fxsave64 [rsi]

  !mov al,[rsi+4]            ;Tag
  !mov [v_Tag],al
  !test [v_Tag],1
 !jz @f
  Tag7$="   Busy"
 !@@:
  !test [v_Tag],2
 !jz @f
  Tag6$="   Busy"
 !@@:
  !test [v_Tag],4
 !jz @f
  Tag5$="   Busy"
 !@@:
  !test [v_Tag],8
 !jz @f
  Tag4$="   Busy"
 !@@:
  !test [v_Tag],10h
 !jz @f
  Tag3$="   Busy"
 !@@:
  !test [v_Tag],20h
 !jz @f
  Tag2$="   Busy"
 !@@:
  !test [v_Tag],40h
 !jz @f
  Tag1$="   Busy"
 !@@:
  !test [v_Tag],80h
 !jz @f
  Tag0$="   Busy"
 !@@:

  !mov rsi,[v_SaveRamAli]
  !fld tword[rsi+32]
  !fstp qword[v_FST0] 
  !fld tword[rsi+48]
  !fstp qword[v_FST1]
  !fld tword[rsi+64]
  !fstp qword[v_FST2] 
  !fld tword[rsi+80]
  !fstp qword[v_FST3]
  !fld tword[rsi+96]
  !fstp qword[v_FST4] 
  !fld tword[rsi+112]
  !fstp qword[v_FST5]
  !fld tword[rsi+128]
  !fstp qword[v_FST6] 
  !fld tword[rsi+144]
  !fstp qword[v_FST7]

  Reg$ = "ST0 = " + StrD(FST0, 20) + Tag0$ + #LFCR$
  Reg$ + "ST1 = " + StrD(FST1, 20) + Tag1$ + #LFCR$
  Reg$ + "ST2 = " + StrD(FST2, 20) + Tag2$ + #LFCR$
  Reg$ + "ST3 = " + StrD(FST3, 20) + Tag3$ + #LFCR$
  Reg$ + "ST4 = " + StrD(FST4, 20) + Tag4$ + #LFCR$
  Reg$ + "ST5 = " + StrD(FST5, 20) + Tag5$ + #LFCR$
  Reg$ + "ST6 = " + StrD(FST6, 20) + Tag6$ + #LFCR$
  Reg$ + "ST7 = " + StrD(FST7, 20) + Tag7$

  FreeMemory(SaveRam)

  MessageRequester("FPU-Register", Reg$)
EndProcedure

;Test
Data1.d = 20.0 
Data2.q = 30
Data3.q = 40
Result.d

;FPU()
;!finit
FPU()
!fild qword[v_Data3]
FPU()
!fild qword[v_Data2]
FPU()
!fsubp
FPU()
!fdiv qword[v_Data1]
FPU()
!fstp qword[v_Result]
FPU()
Debug Result
User avatar
Mijikai
Addict
Addict
Posts: 1360
Joined: Sun Sep 11, 2016 2:17 pm

Re: FPU how to cleanup register?

Post by Mijikai »

Thanks wilbert & Helle :)
That is really helpful Helle 8)
now i can easily see my mistakes XD

I created a small timer library in fasm to calculate FPS, frametime and deltatime.
I would post the source for everyone if i get some help in optimizing/ making it better.
Its not that big (~400 lines) and it might become a nice tutorial for anyone that is interested in asm :)
Helle
Enthusiast
Enthusiast
Posts: 178
Joined: Wed Apr 12, 2006 7:59 pm
Location: Germany
Contact:

Re: FPU how to cleanup register?

Post by Helle »

Corona: No pubs, no bars... Much time for program -> corrected version (physical register):

Code: Select all

Procedure FPU(FPU_ID)
  Protected.q SaveRam, SaveRamAli, C1, PhysReg, RFlags, CW, SW, Tags
  Protected.d FST0, FST1, FST2, FST3, FST4, FST5, FST6, FST7 
  Protected.s sN = "Clear", sY = "Set", sSW = "S E T !", sTag0 = "Free", sTag1 = "Free", sTag2 = "Free", sTag3 = "Free", sTag4 = "Free", sTag5 = "Free", sTag6 = "Free", sTag7 = "Free", sOcc = "Occupied"
  Protected.s sTag00, sTag01, sTag02, sTag03, sTag04, sTag05, sTag06, sTag07
  Protected.s sSF = "  Stack Fault: ", sSet = "Settings:", sStat = "Status:", sEF = "  Exception Flags:", sCCF = "  Condition Code Flags:", sC0 = "    C0: ", sC1 = "    C1: ", sC2 = "    C2: ", sC3 = "    C3: " 
  Protected.s sST0 = "ST0 = ", sST1 = "ST1 = ", sST2 = "ST2 = ", sST3 = "ST3 = ", sST4 = "ST4 = ", sST5 = "ST5 = ", sST6 = "ST6 = ", sST7 = "ST7 = ", sTS = "  Top of Stack: Physical Register R"
  Protected.s sEM = "  Exception Masks:", sIO = "    Invalid Operation: ", sDO = "    Denormal Operand: ", sZD = "    Zero Divide: ", sOF = "    Overflow: ", sUF = "    Underflow: ", sPR = "    Precision: " 
  Protected.s sPrec = "  Precision: ", sRound = "  Rounding: Round ", sR1 = "to nearest (even)", sR2 = "down (toward -Infinity)", sR3 = "up (toward +Infinity)", sR4 = "toward zero (Truncate)"                    
  Protected.s sPR1 = "  (Low Precision! Inexact Result!)", sWarn = "    W A R N I N G ! ! !", sPrecE = "Double Extended Precision 64 Bits", sPrecD = "Double Precision 53 Bits", sPrecS = "Single Precision 24 Bits"
  Protected.s sReg, sReport, sIOM, sIOE, sDOM, sDOE, sZDM, sZDE, sOFM, sOFE, sUFM, sUFE, sPRM, sPRE

  !pushfq                              ;save RFlag-Registers; FCOMI(P) etc! 
  !pop qword[p.v_RFlags]

  SaveRam = AllocateMemory(512 + 64)   ;512 Byte for FXSAVE
  SaveRamAli = SaveRam + (SaveRam % 64)

  !mov r10,[p.v_SaveRamAli]
  !fxsave64 [r10]

  !mov ax,[r10]
  !mov [p.v_CW],rax          ;Control Word

  !mov ax,[r10+2]
  !mov [p.v_SW],rax          ;Status Word

  !mov al,[r10+4]
  !mov [p.v_Tags],rax        ;Tags

  !fld tword[r10+32]         ;Values
  !fstp qword[p.v_FST0]
  !fld tword[r10+48]
  !fstp qword[p.v_FST1]
  !fld tword[r10+64]
  !fstp qword[p.v_FST2]
  !fld tword[r10+80]
  !fstp qword[p.v_FST3]
  !fld tword[r10+96]
  !fstp qword[p.v_FST4]
  !fld tword[r10+112]
  !fstp qword[p.v_FST5]
  !fld tword[r10+128]
  !fstp qword[p.v_FST6]
  !fld tword[r10+144]
  !fstp qword[p.v_FST7]

  ;Control Word
  !test word[p.v_CW],100h    ;FPUCW Precision, Bits 8 and 9
 !jnz .ext
  !test word[p.v_CW],200h
 !jnz .double
  sPrec + sPrecS             ;Single Precision 24 Bits
 !jmp .precend
 !.double:
  sPrec + sPrecD             ;Double Precision 53 Bits
 !jmp .precend
 !.ext:
  sPrec + sPrecE             ;Double Extended Precision 64 Bits
 !.precend:
  sPrec + #LFCR$

  !test word[p.v_CW],0c00h   ;FPUCW Rounding, Bits 10 and 11
 !jz .round1                 ;00
  !test word[p.v_CW],0800h
 !jz .round2                 ;01
  !test word[p.v_CW],0400h    
 !jz .round3                 ;10
  sRound + sR4               ;11 toward zero (Truncate)
 !jmp .roundend
 !.round3:
  sRound + sR3               ;up (toward +Infinity)
 !jmp .roundend
 !.round2:
  sRound + sR2               ;down (toward -Infinity)
 !jmp .roundend
 !.round1:
  sRound + sR1               ;to nearest (even)
 !.roundend:
  sRound + #LFCR$

  !test word[p.v_CW],1       ;FPUCW Invalid Operation
 !jz @f
  sIOM = sIO + sY
 !jmp .IOM
 !@@:
  sIOM = sIO + sN
 !.IOM:

  !test word[p.v_CW],2       ;FPUCW Denormal Operand
 !jz @f
  sDOM = sDO + sY
 !jmp .DO
 !@@:
  sDOM = sDO + sN
 !.DO:

  !test word[p.v_CW],4       ;FPUCW Zero Divide
 !jz @f
  sZDM = sZD + sY
 !jmp .ZD
 !@@:
  sZDM = sZD + sN
 !.ZD:

  !test word[p.v_CW],8       ;FPUCW Overflow
 !jz @f
  sOFM = sOF + sY
 !jmp .OF
 !@@:
  sOFM = sOF + sN
 !.OF:

  !test word[p.v_CW],10h     ;FPUCW Underflow
 !jz @f
  sUFM = sUF + sY
 !jmp .UF
 !@@:
  sUFM = sUF + sN
 !.UF:

  !test word[p.v_CW],20h     ;FPUCW Precision
 !jz @f
  sPRM = sPR + sY
 !jmp .PM
 !@@:
  sPRM = sPR + sN
 !.PM:

  ;Status Word
  !mov ax,[p.v_SW]           ;FPUSW PhysReg TOS
  !shr rax,0bh               ;shift in Bit 0-2
  !and rax,111b
  !mov [p.v_PhysReg],rax
  sTS + Str(PhysReg) + #LFCR$

  !test word[p.v_SW],1       ;FPUSW Invalid Operation
 !jz @f
  sIOE = sIO + sSW + sWarn
 !jmp .IOE
 !@@:
  sIOE = sIO + sN
 !.IOE:

  !test word[p.v_SW],2       ;FPUSW Denormal Operand
 !jz @f
  sDOE = sDO + sSW + sWarn
 !jmp .DOE
 !@@:
  sDOE = sDO + sN
 !.DOE:

  !test word[p.v_SW],4       ;FPUSW Zero Divide
 !jz @f
  sZDE = sZD + sSW + sWarn
 !jmp .ZDE
 !@@:
  sZDE = sZD + sN
 !.ZDE:

  !test word[p.v_SW],8       ;FPUSW Overflow
 !jz @f
  sOFE = sOF + sSW + sWarn
 !jmp .OFE
 !@@:
  sOFE = sOF + sN
 !.OFE:

  !test word[p.v_SW],10h     ;FPUSW Underflow
 !jz @f
  sUFE = sUF + sSW + sWarn
 !jmp .UFE
 !@@:
  sUFE = sUF + sN
 !.UFE:

  !test word[p.v_SW],100h    ;FPUSW C0
 !jz @f 
  sC0 + sSW
  !jmp .C0E
 !@@:
  sC0 + sN
 !.C0E:

  !test word[p.v_SW],200h    ;FPUSW C1
 !jz @f 
  C1 = 1                     ;for Stack Fault
  sC1 + sSW
  !jmp .C1E
 !@@:
  sC1 + sN
 !.C1E:

  !test word[p.v_SW],400h    ;FPUSW C2
 !jz @f 
  sC2 + sSW
  !jmp .C2E
 !@@:
  sC2 + sN
 !.C2E:

  !test word[p.v_SW],4000h   ;FPUSW C3
 !jz @f 
  sC3 + sSW
  !jmp .C3E
 !@@:
  sC3 + sN
 !.C3E:

  !test word[p.v_SW],20h     ;FPUSW Precision
 !jz @f
  sPRE = sPR + sSW + sPR1
 !jmp .PRE
 !@@:
  sPRE = sPR + sN
 !.PRE:

  !test word[p.v_SW],40h     ;FPUSW Stack Fault
 !jz @f
  sSF + sSW
  ;If Invalid Operation
  If C1
   sSF + sOF + sWarn
   Else 
   sSF + sUF + sWarn
  EndIf
  ;EndIf 
 !jmp .SF
 !@@:
  sSF + sN
 !.SF:

  ;Tags
  !test byte[p.v_Tags],1 
 !jz @f
  sTag0 = sOcc
 !@@:
  !test byte[p.v_Tags],2 
 !jz @f
  sTag1 = sOcc
 !@@:
  !test byte[p.v_Tags],4 
 !jz @f
  sTag2 = sOcc
 !@@:
  !test byte[p.v_Tags],8 
 !jz @f
  sTag3 = sOcc
 !@@:
  !test byte[p.v_Tags],10h 
 !jz @f
  sTag4 = sOcc
 !@@:
  !test byte[p.v_Tags],20h 
 !jz @f
  sTag5 = sOcc
 !@@:
  !test byte[p.v_Tags],40h 
 !jz @f
  sTag6 = sOcc
 !@@:
  !test byte[p.v_Tags],80h 
 !jz @f
  sTag7 = sOcc
 !@@:

  Select PhysReg        ;TOS, better and more correctly infos
    Case 0
      sST0 + "R0 = " : sTag00 = sTag0
      sST1 + "R1 = " : sTag01 = sTag1
      sST2 + "R2 = " : sTag02 = sTag2
      sST3 + "R3 = " : sTag03 = sTag3
      sST4 + "R4 = " : sTag04 = sTag4
      sST5 + "R5 = " : sTag05 = sTag5
      sST6 + "R6 = " : sTag06 = sTag6
      sST7 + "R7 = " : sTag07 = sTag7
    Case 1
      sST0 + "R1 = " : sTag00 = sTag1
      sST1 + "R2 = " : sTag01 = sTag2
      sST2 + "R3 = " : sTag02 = sTag3
      sST3 + "R4 = " : sTag03 = sTag4
      sST4 + "R5 = " : sTag04 = sTag5
      sST5 + "R6 = " : sTag05 = sTag6
      sST6 + "R7 = " : sTag06 = sTag7
      sST7 + "R0 = " : sTag07 = sTag0
    Case 2
      sST0 + "R2 = " : sTag00 = sTag2
      sST1 + "R3 = " : sTag01 = sTag3
      sST2 + "R4 = " : sTag02 = sTag4
      sST3 + "R5 = " : sTag03 = sTag5
      sST4 + "R6 = " : sTag04 = sTag6
      sST5 + "R7 = " : sTag05 = sTag7
      sST6 + "R0 = " : sTag06 = sTag0
      sST7 + "R1 = " : sTag07 = sTag1
    Case 3
      sST0 + "R3 = " : sTag00 = sTag3
      sST1 + "R4 = " : sTag01 = sTag4
      sST2 + "R5 = " : sTag02 = sTag5
      sST3 + "R6 = " : sTag03 = sTag6
      sST4 + "R7 = " : sTag04 = sTag7
      sST5 + "R0 = " : sTag05 = sTag0
      sST6 + "R1 = " : sTag06 = sTag1
      sST7 + "R2 = " : sTag07 = sTag2
    Case 4
      sST0 + "R4 = " : sTag00 = sTag4
      sST1 + "R5 = " : sTag01 = sTag5
      sST2 + "R6 = " : sTag02 = sTag6
      sST3 + "R7 = " : sTag03 = sTag7
      sST4 + "R0 = " : sTag04 = sTag0
      sST5 + "R1 = " : sTag05 = sTag1
      sST6 + "R2 = " : sTag06 = sTag2
      sST7 + "R3 = " : sTag07 = sTag3
    Case 5
      sST0 + "R5 = " : sTag00 = sTag5
      sST1 + "R6 = " : sTag01 = sTag6
      sST2 + "R7 = " : sTag02 = sTag7
      sST3 + "R0 = " : sTag03 = sTag0
      sST4 + "R1 = " : sTag04 = sTag1
      sST5 + "R2 = " : sTag05 = sTag2
      sST6 + "R3 = " : sTag06 = sTag3
      sST7 + "R4 = " : sTag07 = sTag4
    Case 6
      sST0 + "R6 = " : sTag00 = sTag6
      sST1 + "R7 = " : sTag01 = sTag7
      sST2 + "R0 = " : sTag02 = sTag0
      sST3 + "R1 = " : sTag03 = sTag1
      sST4 + "R2 = " : sTag04 = sTag2
      sST5 + "R3 = " : sTag05 = sTag3
      sST6 + "R4 = " : sTag06 = sTag4
      sST7 + "R5 = " : sTag07 = sTag5
    Case 7
      sST0 + "R7 = " : sTag00 = sTag7
      sST1 + "R0 = " : sTag01 = sTag0
      sST2 + "R1 = " : sTag02 = sTag1
      sST3 + "R2 = " : sTag03 = sTag2
      sST4 + "R3 = " : sTag04 = sTag3
      sST5 + "R4 = " : sTag05 = sTag4
      sST6 + "R5 = " : sTag06 = sTag5
      sST7 + "R6 = " : sTag07 = sTag6
  EndSelect

  sReg = sST0 + StrD(FST0, 18) + Space(4) + sTag00 + #LFCR$
  sReg + sST1 + StrD(FST1, 18) + Space(4) + sTag01 + #LFCR$
  sReg + sST2 + StrD(FST2, 18) + Space(4) + sTag02 + #LFCR$
  sReg + sST3 + StrD(FST3, 18) + Space(4) + sTag03 + #LFCR$
  sReg + sST4 + StrD(FST4, 18) + Space(4) + sTag04 + #LFCR$
  sReg + sST5 + StrD(FST5, 18) + Space(4) + sTag05 + #LFCR$
  sReg + sST6 + StrD(FST6, 18) + Space(4) + sTag06 + #LFCR$
  sReg + sST7 + StrD(FST7, 18) + Space(4) + sTag07

  sEM + #LFCR$ + sIOM + #LFCR$ + sDOM + #LFCR$ + sZDM + #LFCR$ + sOFM + #LFCR$ + sUFM + #LFCR$ + sPRM + #LFCR$    ;Exception Masks
  sCCF + #LFCR$ + sC0 + #LFCR$ + sC1 + #LFCR$ + sC2 + #LFCR$ + sC3 + #LFCR$                                       ;Condition Code Flags
  sEF + #LFCR$ + sIOE + #LFCR$ + sDOE + #LFCR$ + sZDE + #LFCR$ + sOFE + #LFCR$ + sUFE + #LFCR$ + sPRE             ;Exception Flags

  sReport = sSet + #LFCR$ + sPrec + sRound + sEM + #LFCR$ + sStat + #LFCR$ + sCCF + sEF + #LFCR$ + sSF + #LFCR$ + sTS + #LFCR$ + sReg
  MessageRequester("FPU (" + Str(FPU_ID) + ")", sReport)

  FreeMemory(SaveRam)

  !push qword[p.v_RFlags]
  !popfq

EndProcedure


;Test ArcusCosinus(Data1)
Result.d
Data1.d = 0.1
;FPU(0)
;  !fninit                       
FPU(1)
  !fld1
FPU(2)
  !lea rsi,[v_Data1]  
  !fld qword[rsi] 
FPU(3)
  !fst st2
FPU(4)
  !fmul st0,st0
FPU(5)
  !fsubp
FPU(6)
  !fsqrt
FPU(7)
  !fxch
FPU(8)
  !fpatan
FPU(9)
  !lea rsi,[v_Result] 
  !fstp qword[rsi]      ;Result
FPU(10)

Debug Result
End

Post Reply