Division is already a slow instruction, but the 64-bit version is a lot slower than the 32-bit version on at least some Intel CPUs.
If you compare PB's own division with a custom asm procedure, the div instruction PB generates wins on x86. On x64 (at least on my computer), the custom procedure is still faster, even with the extra time it takes to call a procedure.
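```purebasic
; Signed 32-bit division: returns the quotient and optionally stores the remainder in *remainder
Procedure.i Divide32(numerator.l, denominator.l, *remainder.Long = 0)
  !mov eax, [p.v_numerator]
  !mov ecx, [p.v_denominator]
  ; Sign-extend eax into edx:eax, then divide: eax = quotient, edx = remainder
  !cdq
  !idiv ecx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    ; Sign-extend the quotient into the 64-bit return register
    !movsxd rax, eax
    ; Store the remainder only if a pointer was passed
    !mov rcx, [p.p_remainder]
    !test rcx, rcx
    !jz .l0
    !mov [rcx], edx
  CompilerElse
    !mov ecx, [p.p_remainder]
    !test ecx, ecx
    !jz .l0
    !mov [ecx], edx
  CompilerEndIf
  !.l0:
  ProcedureReturn ; returns the value left in eax/rax
EndProcedure

; Unsigned 32-bit division: same interface, operands treated as unsigned
Procedure.i UnsignedDivide32(numerator.l, denominator.l, *remainder.Long = 0)
  !mov eax, [p.v_numerator]
  !mov ecx, [p.v_denominator]
  ; Zero the high half, then divide: eax = quotient, edx = remainder
  ; (on x64, writing eax also zero-extends into rax)
  !xor edx, edx
  !div ecx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rcx, [p.p_remainder]
    !test rcx, rcx
    !jz .l0
    !mov [rcx], edx
  CompilerElse
    !mov ecx, [p.p_remainder]
    !test ecx, ecx
    !jz .l0
    !mov [ecx], edx
  CompilerEndIf
  !.l0:
  ProcedureReturn
EndProcedure

; Benchmark: compile with the debugger disabled, since it adds overhead to every line
a.l = 5
b.l = 7

t1 = ElapsedMilliseconds()
For i = 0 To 100000000
  c.l = a/b               ; division code generated by PB
Next
t2 = ElapsedMilliseconds()
For i = 0 To 100000000
  c.l = Divide32(a,b)     ; custom asm procedure, including the call overhead
Next
t3 = ElapsedMilliseconds()

MessageRequester("Results", Str(t2-t1)+" ms (a/b) vs "+Str(t3-t2)+" ms (Divide32)")
```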
Edit: this does not seem to be the case on every CPU.
The procedures may still be useful when you need both the quotient and the remainder at once.
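If you only want the quotient-plus-remainder interface and not the asm speedup, a portable variant is possible. The sketch below uses a hypothetical `DivMod32` procedure (not part of the original code) built from plain PureBasic operators, so it needs no inline asm. It relies on whatever division code the compiler emits, so it does not get the speed benefit described above; like the asm versions, it does not guard against a zero denominator.

```purebasic
; Hypothetical portable helper: same interface as Divide32, no inline asm
Procedure.l DivMod32(numerator.l, denominator.l, *remainder.Long = 0)
  ; Integer division of two .l values truncates toward zero, like idiv
  Protected quotient.l = numerator / denominator
  If *remainder
    ; Derive the remainder from the quotient instead of dividing a second time with %
    *remainder\l = numerator - quotient * denominator
  EndIf
  ProcedureReturn quotient
EndProcedure

Define r.l
Debug DivMod32(17, 5, @r)  ; 3
Debug r                    ; 2
Debug DivMod32(-17, 5, @r) ; -3
Debug r                    ; -2 (the remainder takes the sign of the numerator)
```

Computing the remainder by multiply-and-subtract keeps it to a single division per call; whether that beats separate `/` and `%` on a given CPU would need the same kind of timing loop as the benchmark above.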