32 bit division on x64

Post by wilbert »

It looks like on x64, even if you divide two variables typed .l, PureBasic still uses the 64 bit opcode for the divide.
Division is already a slow instruction, but the 64 bit version is a lot slower than the 32 bit version on at least some Intel CPUs.
If you compare the results with a custom asm procedure, on x86 the div instruction PB generates wins, but on x64 (at least on my computer) the custom procedure is still the faster one, even with the additional time it takes to call it.

Code: Select all

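; Signed 32 bit division. Returns the quotient; if *remainder is not 0,
; the remainder is stored there.
; Note: idiv raises a CPU exception when denominator is 0 or when
; -2147483648 is divided by -1.
Procedure.i Divide32(numerator.l, denominator.l, *remainder.Long = 0)
  !mov eax, [p.v_numerator]
  !mov ecx, [p.v_denominator]
  ; sign-extend eax into edx:eax, then divide by ecx
  !cdq
  !idiv ecx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    ; sign-extend the 32 bit quotient into the 64 bit return register
    !movsxd rax, eax
    ; store the remainder only if a pointer was passed
    !mov rcx, [p.p_remainder]
    !and rcx, rcx
    !jz .l0
    !mov [rcx], edx
  CompilerElse
    ; store the remainder only if a pointer was passed
    !mov ecx, [p.p_remainder]
    !and ecx, ecx
    !jz .l0
    !mov [ecx], edx
  CompilerEndIf
  !.l0:
  ProcedureReturn
EndProcedure

; Unsigned 32 bit division. Returns the quotient; if *remainder is not 0,
; the remainder is stored there.
; Note: div raises a CPU exception when denominator is 0.
Procedure.i UnsignedDivide32(numerator.l, denominator.l, *remainder.Long = 0)
  !mov eax, [p.v_numerator]
  !mov ecx, [p.v_denominator]
  ; clear the upper half of the dividend edx:eax, then divide by ecx
  !xor edx, edx
  !div ecx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    ; store the remainder only if a pointer was passed
    !mov rcx, [p.p_remainder]
    !and rcx, rcx
    !jz .l0
    !mov [rcx], edx
  CompilerElse
    ; store the remainder only if a pointer was passed
    !mov ecx, [p.p_remainder]
    !and ecx, ecx
    !jz .l0
    !mov [ecx], edx
  CompilerEndIf
  !.l0:
  ProcedureReturn
EndProcedure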



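; Run with the debugger disabled, otherwise its overhead distorts the timings.
a.l = 5
b.l = 7

t1 = ElapsedMilliseconds()
; native PB division
For i = 0 To 100000000
  c.l = a/b
Next
t2 = ElapsedMilliseconds()
; custom asm procedure
For i = 0 To 100000000
  c.l = Divide32(a,b)
Next
t3 = ElapsedMilliseconds()

MessageRequester("Results", Str(t2-t1)+" vs "+Str(t3-t2))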

Edit: this does not seem to be the case on every CPU.
The procedures might still be useful if you want both the quotient and the remainder at once.
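
As a rough usage sketch, assuming Divide32() and UnsignedDivide32() from the listing above are defined earlier in the same source (the variable names q and r are only for illustration), both results can be fetched in a single call:

```purebasic
; Assumes Divide32() and UnsignedDivide32() from the listing above
; are defined earlier in the same source file.
Define q.i, r.l

; Signed: idiv truncates toward zero, the remainder takes the sign of the numerator
q = Divide32(-17, 5, @r)
Debug "Signed: " + Str(q) + " remainder " + Str(r)      ; Signed: -3 remainder -2

; Unsigned: -1 stored in a .l is the bit pattern $FFFFFFFF = 4294967295
q = UnsignedDivide32(-1, 2, @r)
Debug "Unsigned: " + Str(q) + " remainder " + Str(r)    ; Unsigned: 2147483647 remainder 1
```

If the third argument is left out, *remainder stays 0 and the procedure skips the store.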
Last edited by wilbert on Sun Aug 20, 2017 6:56 pm, edited 1 time in total.

Post by Lunasole »

Interesting, but in my tests your procedure wins with the x86 compiler, while it is much slower with x64 (a budget AMD CPU).

Here the first is the PB div, the second is the Divide32() procedure:

Code: Select all

x86:
Cycle 1: 4237ms
Cycle 2: 3967ms
------------
Second is faster: +6.81%


x64:
Cycle 1: 4477ms
Cycle 2: 7107ms
------------
First is faster: +58.74%

Post by wilbert »

That's really interesting.
I'm curious about other CPUs.
The one in my computer is a Core i5.
Maybe AMD has a faster 64 bit divide :?

Post by User_Russian »

x86:
Cycle 1: 848ms
Cycle 2: 1250ms

x64:
Cycle 1: 3568ms
Cycle 2: 1688ms

Post by mk-soft »

PB x64
Result 1116 vs 660

PB x86
Result 427 vs 559

Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz

P.S. Calling a procedure is more complex on x64 than on x86:
X64

Code: Select all

; c.l = Divide32(a,b)
  MOV    rax,0
  PUSH   rax
  MOVSXD rax,dword [v_b]
  PUSH   rax
  MOVSXD rax,dword [v_a]
  PUSH   rax
  POP    rdi
  POP    rsi
  POP    rdx
  CALL  _Procedure0
  MOV    rax,rax
  PUSH   rax
  POP    rax
  MOV    dword [v_c],eax
; 
X86

Code: Select all

; c.l = Divide32(a,b)
  SUB    esp,4
  PUSH   dword 0
  PUSH   dword [v_b]
  PUSH   dword [v_a]
  CALL  _Procedure0
  ADD    esp,4
  MOV    dword [v_c],eax
;
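
To take this call overhead out of the comparison, the two instruction forms could also be timed directly in inline asm. This is only a rough sketch, assuming the x64 ASM backend and a run without the debugger. The labels div32_loop and div64_loop and the constant operands 5 and 7 are made up for illustration, and divide timing can depend on the operand values:

```purebasic
; Sketch only: x64 ASM backend, debugger disabled.
; Uses only rax, rcx, rdx and r8, which PB treats as volatile on x64.
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64

  t1 = ElapsedMilliseconds()
  ; 32 bit form: edx:eax / ecx
  !mov r8d, 100000000
  !div32_loop:
  !mov eax, 5
  !mov ecx, 7
  !cdq
  !idiv ecx
  !dec r8d
  !jnz div32_loop

  t2 = ElapsedMilliseconds()
  ; 64 bit form: rdx:rax / rcx (writing eax / ecx zero-extends into rax / rcx)
  !mov r8d, 100000000
  !div64_loop:
  !mov eax, 5
  !mov ecx, 7
  !cqo
  !idiv rcx
  !dec r8d
  !jnz div64_loop

  t3 = ElapsedMilliseconds()

  MessageRequester("Results", "idiv r32: " + Str(t2-t1) + " ms vs idiv r64: " + Str(t3-t2) + " ms")

CompilerEndIf
```

Both loops have the same loop overhead, so any difference between the two numbers should come from the divide itself.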

Post by ts-soft »

PB x64
Result 364 vs 477

PB x86
Result 354 vs 365

HexaCore AMD FX-6300, 4100 MHz (20.5 x 200)

Post by said »

PB x64
Results 1171 vs 584

PB x86
Results 354 vs 565

Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz

Post by netmaestro »

Intel i7-6500U @ 2500 MHz:

x64
1012 native
533 asm

x86
354 native
561 asm

Post by wilbert »

Thanks for testing, everyone.
There clearly is a difference between AMD and Intel.
For an AMD CPU it doesn't seem to matter much whether a 32 or 64 bit division instruction is used, but on Intel the 64 bit division instruction is much slower.
Usually Intel CPUs are considered superior, but in this case it looks to me like AMD has done a better job with integer division.

Post by STARGÅTE »

x86:
Results 1009 vs 1267

x64:
Results 1011 vs 1318

AMD Phenom II X4 955

Post by blueb »

x86:
Results 539 vs 494

x64:
Results 1209 vs 519

Intel XEON E-2670 (dual)

Post by wilbert »

Amazing how much difference there is between CPUs for the div / idiv instructions. :shock: