32 bit division on x64

Posted: Sun Aug 20, 2017 5:44 pm
by wilbert
It looks like on x64, even if you divide two variables typed .l, PB still uses the 64 bit opcode for the division.
Division is already a slow instruction, but the 64 bit version is a lot slower than the 32 bit version on at least some Intel CPUs.
If you compare the results with a custom asm procedure, on x86 the div instruction PB generates wins, but on x64 (at least on my computer) the custom procedure is still faster, even with the additional time it takes to call it.

Code: Select all

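Procedure.i Divide32(numerator.l, denominator.l, *remainder.Long = 0)
  ; Signed 32 bit divide: returns the quotient, stores the remainder in *remainder if given
  !mov eax, [p.v_numerator]
  !mov ecx, [p.v_denominator]
  ; Sign-extend eax into edx:eax, then divide (eax = quotient, edx = remainder)
  !cdq
  !idiv ecx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    ; Sign-extend the 32 bit quotient to the 64 bit return register
    !movsxd rax, eax
    !mov rcx, [p.p_remainder]
    ; Skip the store when no remainder pointer was passed
    !and rcx, rcx
    !jz .l0
    !mov [rcx], edx
  CompilerElse
    !mov ecx, [p.p_remainder]
    ; Skip the store when no remainder pointer was passed
    !and ecx, ecx
    !jz .l0
    !mov [ecx], edx
  CompilerEndIf
  !.l0:
  ; The quotient is already in eax / rax
  ProcedureReturn
EndProcedure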

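Procedure.i UnsignedDivide32(numerator.l, denominator.l, *remainder.Long = 0)
  ; Unsigned 32 bit divide: returns the quotient, stores the remainder in *remainder if given
  !mov eax, [p.v_numerator]
  !mov ecx, [p.v_denominator]
  ; Clear edx so that edx:eax holds the zero-extended numerator, then divide
  !xor edx, edx
  !div ecx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rcx, [p.p_remainder]
    ; Skip the store when no remainder pointer was passed
    !and rcx, rcx
    !jz .l0
    !mov [rcx], edx
  CompilerElse
    !mov ecx, [p.p_remainder]
    ; Skip the store when no remainder pointer was passed
    !and ecx, ecx
    !jz .l0
    !mov [ecx], edx
  CompilerEndIf
  !.l0:
  ; The quotient is already in eax / rax
  ProcedureReturn
EndProcedure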



a.l = 5
b.l = 7

t1 = ElapsedMilliseconds()
For i = 0 To 100000000
  c.l = a/b
Next  
t2 = ElapsedMilliseconds()
For i = 0 To 100000000
  c.l = Divide32(a,b)
Next  
t3 = ElapsedMilliseconds()

MessageRequester("Results", Str(t2-t1)+" vs "+Str(t3-t2))

Edit: this does not seem to be the case on every CPU.
The procedures might still be useful if you want both the quotient and remainder at once.
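A minimal usage sketch for getting both values in one call, assuming the Divide32() procedure from the listing above is already defined. The variable names quotient and remainder are only chosen for illustration.

```purebasic
; Assumes Divide32() from the listing above is already defined.
Define remainder.Long

quotient.l = Divide32(17, 5, @remainder)
Debug Str(quotient) + " remainder " + Str(remainder\l)   ; 3 remainder 2

; idiv truncates toward zero, so the remainder takes the sign of the numerator
quotient = Divide32(-17, 5, @remainder)
Debug Str(quotient) + " remainder " + Str(remainder\l)   ; -3 remainder -2
```

If the third argument is left out, the pointer defaults to 0 and the procedure skips the store, so only the quotient is returned.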

Re: 32 bit division on x64

Posted: Sun Aug 20, 2017 6:42 pm
by Lunasole
Interesting, but in my tests your procedure wins with the x86 compiler, while it is much slower with x64 (a "proletarian" AMD CPU).

Here the first cycle is the PB division, the second is the Divide32() procedure:

Code: Select all

x86:
Cycle 1: 4237ms
Cycle 2: 3967ms
------------
Second is faster: +6.81%


x64:
Cycle 1: 4477ms
Cycle 2: 7107ms
------------
First is faster: +58.74%

Re: 32 bit division on x64

Posted: Sun Aug 20, 2017 6:46 pm
by wilbert
That's really interesting.
I'm curious about other CPUs.
The one on my computer is a Core i5.
Maybe AMD has a faster 64 bit divide :?
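One way to check this directly might be to time a 64 bit variant of the procedure against Divide32(), so that both sides pay the same call overhead. The following is only a sketch: Divide64() is a hypothetical name that is not part of the code posted above, and it assumes Divide32() is already defined. Run it with the debugger disabled.

```purebasic
; x64 only: the 64 bit registers and cqo do not exist in 32 bit mode.
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64

  ; Hypothetical 64 bit counterpart of Divide32() (quotient only)
  Procedure.i Divide64(numerator.q, denominator.q)
    !mov rax, [p.v_numerator]
    !mov rcx, [p.v_denominator]
    ; Sign-extend rax into rdx:rax, then do the 64 bit divide
    !cqo
    !idiv rcx
    ProcedureReturn
  EndProcedure

  a.l = 5
  b.l = 7

  t1 = ElapsedMilliseconds()
  For i = 0 To 100000000
    c.l = Divide32(a, b)
  Next
  t2 = ElapsedMilliseconds()
  For i = 0 To 100000000
    c.l = Divide64(a, b)
  Next
  t3 = ElapsedMilliseconds()

  MessageRequester("32 vs 64 bit idiv", Str(t2-t1) + " vs " + Str(t3-t2))

CompilerEndIf
```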

Re: 32 bit division on x64

Posted: Sun Aug 20, 2017 7:11 pm
by User_Russian
x86:
Cycle 1: 848ms
Cycle 2: 1250ms

x64:
Cycle 1: 3568ms
Cycle 2: 1688ms

Re: 32 bit division on x64

Posted: Sun Aug 20, 2017 7:29 pm
by mk-soft
PB x64
Result 1116 vs 660

PB x86
Result 427 vs 559

Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz

P.S. Calling a procedure on x64 is more complex than on x86
X64

Code: Select all

; c.l = Divide32(a,b)
  MOV    rax,0
  PUSH   rax
  MOVSXD rax,dword [v_b]
  PUSH   rax
  MOVSXD rax,dword [v_a]
  PUSH   rax
  POP    rdi
  POP    rsi
  POP    rdx
  CALL  _Procedure0
  MOV    rax,rax
  PUSH   rax
  POP    rax
  MOV    dword [v_c],eax
; 
X86

Code: Select all

; c.l = Divide32(a,b)
  SUB    esp,4
  PUSH   dword 0
  PUSH   dword [v_b]
  PUSH   dword [v_a]
  CALL  _Procedure0
  ADD    esp,4
  MOV    dword [v_c],eax
;
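To get a rough idea of how much of the measured time is call overhead rather than the division itself, one could time a procedure with the same signature that does no division at all. This is only a sketch: CallOnly() is a hypothetical helper, and the loop simply mirrors the benchmark from the first post. Run it with the debugger disabled.

```purebasic
; Hypothetical helper: same signature as Divide32(), but no division inside
Procedure.i CallOnly(numerator.l, denominator.l, *remainder.Long = 0)
  ProcedureReturn numerator
EndProcedure

a.l = 5
b.l = 7

t1 = ElapsedMilliseconds()
For i = 0 To 100000000
  c.l = CallOnly(a, b)
Next
t2 = ElapsedMilliseconds()

MessageRequester("Call overhead", Str(t2-t1) + " ms")
```

Subtracting this time from the Divide32() timing should give an approximate cost for the division alone on each compiler.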

Re: 32 bit division on x64

Posted: Sun Aug 20, 2017 7:47 pm
by ts-soft
PB x64
Result 364 vs 477

PB x86
Result 354 vs 365

HexaCore AMD FX-6300, 4100 MHz (20.5 x 200)

Re: 32 bit division on x64

Posted: Sun Aug 20, 2017 8:02 pm
by said
PB x64
Results 1171 vs 584

PB x86
Results 354 vs 565

Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz

Re: 32 bit division on x64

Posted: Mon Aug 21, 2017 5:44 am
by netmaestro
Intel i7-6500U @ 2500 MHz:

x64
1012 native
533 asm

x86
354 native
561 asm

Re: 32 bit division on x64

Posted: Mon Aug 21, 2017 6:15 am
by wilbert
Thanks for testing, everyone.
It clearly is a difference between AMD and Intel.
For an AMD CPU it doesn't seem to matter very much whether a 32 or 64 bit division instruction is used, but on Intel the 64 bit division instruction is much slower.
Usually Intel CPUs are considered superior, but in this case it looks to me like AMD has done a better job with integer division.

Re: 32 bit division on x64

Posted: Mon Aug 21, 2017 7:16 am
by STARGÅTE
x32:
Results 1009 vs 1267

x64:
Results 1011 vs 1318

AMD Phenom II X4 955

Re: 32 bit division on x64

Posted: Mon Aug 21, 2017 2:17 pm
by blueb
x32:
Results 539 vs 494

x64:
Results 1209 vs 519

Intel XEON E-2670 (dual)

Re: 32 bit division on x64

Posted: Mon Aug 21, 2017 2:51 pm
by wilbert
Amazing how much difference there is between CPUs for the div / idiv instructions. :shock: