It is currently Wed Sep 18, 2019 2:26 am

All times are UTC + 1 hour




Post new topic Reply to topic  [ 35 posts ]  Go to page Previous  1, 2, 3  Next
Author Message
 Post subject: Re: Population count
PostPosted: Fri May 15, 2015 10:34 am 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3434
Location: Netherlands
davido wrote:
I've chosen the PB version of the method described by idle; it runs at 210%.
I present it, just in case it can be improved upon

Code:
Procedure popcount9(i.l)
  i - (i >> 1) & $55555555
  i = (i & $33333333) + ((i >> 2) & $33333333)
  ProcedureReturn (((i + i >> 4) & $0f0f0f0f) * $01010101) >> 24 & $ff
EndProcedure

_________________
macOS 10.14 Mojave, PB 5.62 x64


Last edited by wilbert on Fri May 15, 2015 3:00 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Fri May 15, 2015 12:21 pm 
Offline
Addict
Addict

Joined: Fri Nov 09, 2012 11:04 pm
Posts: 1671
Location: Uttoxeter, UK
@wilbert,
Well squeezed! Thank you very much. :D

_________________
DE AA EB


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Fri May 15, 2015 12:36 pm 
Offline
Addict
Addict
User avatar

Joined: Wed Aug 31, 2005 11:09 pm
Posts: 3694
Location: Italy
@davido

Did you miss the link I posted above ?

Code:
int swar(uint32_t i) {
  i = i - ((i >> 1) & 0x55555555);
  i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
  return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}


It was already there. Don't tell me "but it's in C !" :)

_________________
[ My little PureBasic review ]


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Fri May 15, 2015 12:49 pm 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3434
Location: Netherlands
luis wrote:
It was already there. Don't tell me "but it's in C !" :)

I know, but it didn't work when I tried it exactly like that in PureBasic x64.
It turned out that the result of the multiplication instruction on x64 is a 64 bit value and that 64 bit value is shifted 24 bits to the right resulting in a wrong output.

A lot of the brackets aren't required due to operator priority
Code:
Procedure popcount9(i.l)
  i - i >> 1 & $55555555
  i = i & $33333333 + i >> 2 & $33333333
  ProcedureReturn ((i + i >> 4) & $0f0f0f0f * $01010101) >> 24 & $ff
EndProcedure
but it might make less sense this way.

_________________
macOS 10.14 Mojave, PB 5.62 x64


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Fri May 15, 2015 6:14 pm 
Offline
Enthusiast
Enthusiast
User avatar

Joined: Thu Mar 24, 2011 12:40 am
Posts: 534
Location: Iowa, USA
Here is one I came up with.
It yields about 285% against wilbert's assembly code.
Code:
Procedure PopCount_10(k.l) ; BasicallyPure
   Macro AaS : result + k & 1 : k >> 1 : EndMacro
   Protected result
   AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS
   AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS
   AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS : AaS
   AaS : AaS
   ProcedureReturn result
EndProcedure


This is all very interesting but I don't have any idea what it can be used for.
It's obviously very useful or there wouldn't be an assembly instruction for it.

edit:
Never mind, a little research on 'PopCnt' or 'PopCount' told me more than I ever wanted to know.
It's probably a safe bet that I'll never use it.

_________________
BasicallyPure
Until you know everything you know nothing, all you have is what you believe.


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Sat May 16, 2015 12:46 am 
Offline
Addict
Addict

Joined: Fri Nov 09, 2012 11:04 pm
Posts: 1671
Location: Uttoxeter, UK
@luis,
My apologies, I did get the code from the link you posted.
Thank you.

_________________
DE AA EB


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Sat May 16, 2015 5:01 am 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3434
Location: Netherlands
Seems indeed to scale up to 64 bits as well :)
Code:
Procedure PopCount64(x.q)
  !mov rax, [p.v_x]
  !mov rdx, rax
  !shr rdx, 1
  !and rdx, [popcount64_v55]
  !sub rax, rdx
  ;x = (x & $3333333333333333) + ((x >> 2) & $3333333333333333)
  !mov rdx, rax       ;x
  !and rax, [popcount64_v33]
  !shr rdx, 2
  !and rdx, [popcount64_v33]
  !add rax, rdx
  ;x = (x + (x >> 4)) & $0f0f0f0f0f0f0f0f0f0f
  !mov rdx, rax
  !shr rdx, 4
  !add rax, rdx
  !and rax, [popcount64_v0f]
  ;x * $0101010101010101 >> 56
  !imul rax, [popcount64_v01]
  !shr rax, 56
  ProcedureReturn
  !popcount64_v01: dq 0x0101010101010101
  !popcount64_v0f: dq 0x0f0f0f0f0f0f0f0f
  !popcount64_v33: dq 0x3333333333333333
  !popcount64_v55: dq 0x5555555555555555
EndProcedure

x = Random($FFFFFFFF)
Debug Bin(x)
Debug Popcount64(x)


Code:
Procedure PopCount64(x.q); SSE2 (both x86 and x64)
  !movd xmm0, [p.v_x]
  !movd xmm1, [p.v_x + 4]
  !movq xmm2, [popcount64_v55]
  !movq xmm3, [popcount64_v33]
  !movq xmm4, [popcount64_v0f]
  !punpckldq xmm0, xmm1
  !movdqa xmm1, xmm0
  !psrlq xmm1, 1
  !pand xmm1, xmm2
  !psubb xmm0, xmm1
  !movdqa xmm1, xmm0
  !pand xmm0, xmm3
  !psrlq xmm1, 2
  !pand xmm1, xmm3
  !paddb xmm0, xmm1
  !movdqa xmm1, xmm0
  !psrlq xmm1, 4
  !paddb xmm0, xmm1
  !pand xmm0, xmm4
  !pxor xmm1, xmm1
  !psadbw xmm0, xmm1
  !movd eax, xmm0
  ProcedureReturn
  !popcount64_v0f: dq 0x0f0f0f0f0f0f0f0f
  !popcount64_v33: dq 0x3333333333333333
  !popcount64_v55: dq 0x5555555555555555
EndProcedure


Using twice 32 bits
Code:
Procedure Popcount64(x.q)
  !mov eax, [p.v_x]
  !mov ecx, [p.v_x + 4]
  !mov edx, ecx
  !shr edx, 1
  !jz popcount64_l0
  !and edx, 0x55555555
  !sub ecx, edx
  !mov edx, ecx
  !and ecx, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add ecx, edx
  !mov edx, ecx
  !shr edx, 4
  !add ecx, edx
  !and ecx, 0x0f0f0f0f
  !popcount64_l0:
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax, edx
  !mov edx, eax
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax, edx
  !mov edx, eax
  !shr edx, 4
  !add eax, edx
  !and eax, 0x0f0f0f0f
  !add eax, ecx
  !imul eax, 0x01010101
  !shr eax, 24
  ProcedureReturn
EndProcedure


A basic version should look like this
Code:
Procedure PopCount64b1(i.q)
  i - i >> 1 & $5555555555555555
  i = i & $3333333333333333 + i >> 2 & $3333333333333333
  ProcedureReturn ((i + i >> 4) & $0f0f0f0f0f0f0f0f * $0101010101010101) >> 56
EndProcedure

But this fails on x64. This code however
Code:
Procedure PopCount64b2(i.q)
  Protected m.q = $0101010101010101
  i - i >> 1 & $5555555555555555
  i = i & $3333333333333333 + i >> 2 & $3333333333333333
  i = (i + i >> 4) & $0f0f0f0f0f0f0f0f
  ProcedureReturn (i * m) >> 56
EndProcedure
works fine on x64.
PB bug?

_________________
macOS 10.14 Mojave, PB 5.62 x64


Last edited by wilbert on Mon May 18, 2015 7:27 am, edited 6 times in total.

Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Sat May 16, 2015 7:24 am 
Offline
Addict
Addict
User avatar

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3392
Location: New Zealand
Wilbert do you know why x64 can't use immediates for the constants?
In my x64 version above I had to load them into a register
Code:
...
!mov r15, 0x5555555555555555
!and rdx, r15
...


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Sat May 16, 2015 7:38 am 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3434
Location: Netherlands
:oops: :oops: :oops: I completely overlooked your 64 bit version post Idle :oops: :oops: :oops:

I ran into the same problem with immediate values. After some searching I found the answer ...
64bit mode allows only signed 32bit immediates. The only instruction that can take a 64bit immediate is mov .

_________________
macOS 10.14 Mojave, PB 5.62 x64


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Sat May 16, 2015 8:54 am 
Offline
Addict
Addict
User avatar

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3392
Location: New Zealand
wilbert wrote:
:oops: :oops: :oops: I completely overlooked your 64 bit version post Idle :oops: :oops: :oops:

I ran into the same problem with immediate values. After some searching I found the answer ...
64bit mode allows only signed 32bit immediates. The only instruction that can take a 64bit immediate is mov .


That's good to know, I didn't expect it'd be an issue and couldn't understand why it only worked via mov.


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Sat May 16, 2015 9:03 am 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3434
Location: Netherlands
Here's a speed test for some PopCount64 versions (results are copied to the clipboard).

Speed test code
Code:
; PopCount 64 bit test

EnableExplicit

CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
 
  Procedure PopCount0(k.q)
    !mov rax, [p.v_k]
    !popcnt rax, rax
    ProcedureReturn
  EndProcedure
 
  Procedure Popcount1(x.q); Idle
                          ;x - (x >> 1) &  $5555555555555555           
    !mov rax, [p.v_x]
    !mov rdx, rax
    !shr rdx, 1
    !mov r15, 0x5555555555555555
    !and rdx, r15
    !sub rax,rdx
    ;x = (x & $3333333333333333) + ((x >> 2) & $3333333333333333)
    !mov rdx, rax       ;x
    !mov r15, 0x3333333333333333
    !and rax, r15 
    !shr rdx, 2
    !and rdx, r15
    !add rax,rdx
    ;x = (x + (x >> 4)) & $0f0f0f0f0f0f0f0f
    !mov rdx, rax
    !shr rdx, 4
    !add rax,rdx
    !mov r15, 0x0f0f0f0f0f0f0f0f
    !and rax, r15
    ;x * 0101010101010101 >> 56
    !mov r15, 0x0101010101010101
    !imul rax, r15
    !shr rax, 56
    ProcedureReturn 
  EndProcedure
 
  Procedure PopCount2(x.q)
    !mov rax, [p.v_x]
    !mov rdx, rax
    !shr rdx, 1
    !and rdx, [popcount2_v55]
    !sub rax, rdx
    ;x = (x & $3333333333333333) + ((x >> 2) & $3333333333333333)
    !mov rdx, rax       ;x
    !and rax, [popcount2_v33]
    !shr rdx, 2
    !and rdx, [popcount2_v33]
    !add rax, rdx
    ;x = (x + (x >> 4)) & $0f0f0f0f0f0f0f0f0f0f
    !mov rdx, rax
    !shr rdx, 4
    !add rax, rdx
    !and rax, [popcount2_v0f]
    ;x * $0101010101010101 >> 56
    !imul rax, [popcount2_v01]
    !shr rax, 56
    ProcedureReturn
    !popcount2_v01: dq 0x0101010101010101
    !popcount2_v0f: dq 0x0f0f0f0f0f0f0f0f
    !popcount2_v33: dq 0x3333333333333333
    !popcount2_v55: dq 0x5555555555555555
  EndProcedure
 
CompilerEndIf

Procedure PopCount3(x.q); SSE2 (both x86 and x64)
  !movd xmm0, [p.v_x]
  !movd xmm1, [p.v_x + 4]
  !movq xmm2, [popcount3_v55]
  !movq xmm3, [popcount3_v33]
  !movq xmm4, [popcount3_v0f]
  !punpckldq xmm0, xmm1
  !movdqa xmm1, xmm0
  !psrlq xmm1, 1
  !pand xmm1, xmm2
  !psubb xmm0, xmm1
  !movdqa xmm1, xmm0
  !pand xmm0, xmm3
  !psrlq xmm1, 2
  !pand xmm1, xmm3
  !paddb xmm0, xmm1
  !movdqa xmm1, xmm0
  !psrlq xmm1, 4
  !paddb xmm0, xmm1
  !pand xmm0, xmm4
  !pxor xmm1, xmm1
  !psadbw xmm0, xmm1
  !movd eax, xmm0
  ProcedureReturn
  !popcount3_v0f: dq 0x0f0f0f0f0f0f0f0f
  !popcount3_v33: dq 0x3333333333333333
  !popcount3_v55: dq 0x5555555555555555
EndProcedure

Procedure Popcount4(x.q)
  !mov eax, [p.v_x]
  !mov ecx, [p.v_x + 4]
  !mov edx, ecx
  !shr edx, 1
  !jz popcount64_l0
  !and edx, 0x55555555
  !sub ecx, edx
  !mov edx, ecx
  !and ecx, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add ecx, edx
  !mov edx, ecx
  !shr edx, 4
  !add ecx, edx
  !and ecx, 0x0f0f0f0f
  !popcount64_l0:
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax, edx
  !mov edx, eax
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax, edx
  !mov edx, eax
  !shr edx, 4
  !add eax, edx
  !and eax, 0x0f0f0f0f
  !add eax, ecx
  !imul eax, 0x01010101
  !shr eax, 24
  ProcedureReturn
EndProcedure

Procedure PopCount5(i.q)
  Protected m.q = $0101010101010101
  i - i >> 1 & $5555555555555555
  i = i & $3333333333333333 + i >> 2 & $3333333333333333
  i = (i + i >> 4) & $0f0f0f0f0f0f0f0f
  ProcedureReturn (i * m) >> 56
EndProcedure


; ---- Initialisation ----
Define.i i, x, x0, x1, x2, x3, x4, x5, y, rep=10000000
Define.i t0, t1, t2, t3, t4, t5
Define.s s

Dim Rnd.q(rep)
For i = 1 To rep
  Rnd(i) = Random(2147483647) << 32 | Random(2147483647)
Next


; ---- Small check whether all procedures Return the same results ----
For i = 1 To 100
  x5 = PopCount5(Rnd(i))
  x4 = PopCount4(Rnd(i))
  x3 = PopCount3(Rnd(i))
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    x2 = PopCount2(Rnd(i))
    x1 = PopCount1(Rnd(i))
    x0 = PopCount0(Rnd(i))
  CompilerElse
    x2 = x3
    x1 = x2
    x0 = x1
  CompilerEndIf
  If x0 <> x1 Or
     x0 <> x2 Or
     x0 <> x3 Or
     x0 <> x4 Or
     x0 <> x5
    MessageRequester("Error",
                     "Different results for PopCount(" + Rnd(i) + ")")
    End
  EndIf
Next   


; ---- Speed test ----
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
  t0 = ElapsedMilliseconds()
  For i = 1 To rep
    x = PopCount0(Rnd(i))
  Next   
  t0 = ElapsedMilliseconds() - t0
 
  t1 = ElapsedMilliseconds()
  For i = 1 To rep
    x = PopCount1(Rnd(i))
  Next   
  t1 = ElapsedMilliseconds() - t1
 
  t2 = ElapsedMilliseconds()
  For i = 1 To rep
    x = PopCount2(Rnd(i))
  Next   
  t2 = ElapsedMilliseconds() - t2
CompilerEndIf

t3 = ElapsedMilliseconds()
For i = 1 To rep
  x = PopCount3(Rnd(i))
Next   
t3 = ElapsedMilliseconds() - t3

t4 = ElapsedMilliseconds()
For i = 1 To rep
  x = PopCount4(Rnd(i))
Next   
t4 = ElapsedMilliseconds() - t4

t5 = ElapsedMilliseconds()
For i = 1 To rep
  x = PopCount5(Rnd(i))
Next   
t5 = ElapsedMilliseconds() - t5


s = "t0 = " + t0 + " ms   (" + Int(100*t0/t4) + "%)" + #LF$ +
    "t1 = " + t1 + " ms   (" + Int(100*t1/t4) + "%)" + #LF$ +
    "t2 = " + t2 + " ms   (" + Int(100*t2/t4) + "%)" + #LF$ +
    "t3 = " + t3 + " ms   (" + Int(100*t3/t4) + "%)" + #LF$ +
    "t4 = " + t4 + " ms   (" + Int(100*t4/t4) + "%)" + #LF$ +
    "t5 = " + t5 + " ms   (" + Int(100*t5/t4) + "%)"
SetClipboardText(s)
MessageRequester("PopCount speed test", s)



Results OSX , PB x86 wrote:
t0 = 0 ms (0%)
t1 = 0 ms (0%)
t2 = 0 ms (0%)
t3 = 56 ms (80%)
t4 = 70 ms (100%)
t5 = 415 ms (592%)


Results OSX , PBx64 wrote:
t0 = 33 ms (46%)
t1 = 54 ms (76%)
t2 = 50 ms (70%)
t3 = 52 ms (73%)
t4 = 71 ms (100%)
t5 = 71 ms (100%)

_________________
macOS 10.14 Mojave, PB 5.62 x64


Last edited by wilbert on Mon May 18, 2015 7:30 am, edited 4 times in total.

Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Sat May 16, 2015 12:49 pm 
Offline
Addict
Addict
User avatar

Joined: Thu Jun 07, 2007 3:25 pm
Posts: 3654
Location: Berlin, Germany
@BasicallyPure:
I'm using the Population Count for generating the Thue-Morse sequence.

_________________
Please excuse my flawed English. My native language is PureBasic.
Search
RSBasic's backups


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Sat May 16, 2015 7:15 pm 
Offline
Enthusiast
Enthusiast
User avatar

Joined: Thu Mar 24, 2011 12:40 am
Posts: 534
Location: Iowa, USA
Ahh I see, it's the old Thue-Morse sequence, I should have guessed. :lol:

Nicely documented by the way.
Thanks.

_________________
BasicallyPure
Until you know everything you know nothing, all you have is what you believe.


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Sun Jun 14, 2015 6:53 am 
Offline
Addict
Addict
User avatar

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3392
Location: New Zealand
seems I forgot to post this but it the x86 specific popcountmem functions need checking

Code:

Procedure _Popcount32(x.l)
  ;x - (x >> 1) &  $55555555           
  !mov eax, [p.v_x]
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  ;x = (x & $33333333) + ((x >> 2) & $33333333)
  !mov edx, eax       ;x
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  ;x = (x + (x >> 4)) & $0f0f0f0f0f
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  ;x * 0x01010101 >> 24
  !imul eax, 0x01010101
  !shr eax, 24
  ProcedureReturn 
EndProcedure

CompilerIf SizeOf(Integer) = 4

Procedure _popcountmem(*mem.long,len.l)
  Protected result,a,b
       
  !mov eax, [p.v_len]
  !and eax, 3
  !mov edx, [p.v_len]
  !sub edx,eax
  !mov [p.v_a],edx
  !mov [p.v_b],eax
  !xor ecx,ecx
  !lwhile:
  !cmp ecx, [p.v_a]
  !jge lend
  !mov eax, [p.p_mem]
  !mov eax, [eax + ecx]
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  !mov edx, eax       
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  !imul eax, 0x01010101
  !shr eax, 24
  !add [p.v_result],eax   
  !add ecx,4
  !jmp lwhile
  !lend:
 
  !mov eax, [p.p_mem]
  !mov eax, [eax + ecx]
   
  !mov edx, [p.v_b]
  !jmp dword [JT_remain + edx * 4]
  !JT_remain dd le,l1,l2,l3,l4
   
  !l1:
  !and eax, 0xff
  !jmp le
  !l2:
  !and eax, 0xffff
  !jmp le
  !l3:
  !and eax, 0xffffff
  !jmp le
  !l4:
  !le:
   
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  !mov edx, eax       
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  !imul eax, 0x01010101
  !shr eax, 24
  !add [p.v_result],eax   
   
  ProcedureReturn result   
EndProcedure   

Procedure _popcountmemSSE(*mem,len)
  Protected result,a,b
       
  !mov eax, [p.v_len]
  !and eax, 3
  !mov edx, [p.v_len]
  !sub edx,eax
  !mov [p.v_a],edx
  !mov [p.v_b],eax
  !xor ecx,ecx
  !lwhile1:
  !cmp ecx, [p.v_a]
  !jge lend1
 
  !mov eax, [p.p_mem]
  !mov eax, [eax + ecx]
  !popcnt eax,eax   
  !add [p.v_result],eax   
  !add ecx,4
   
  !jmp lwhile1
  !lend1:
 
  !mov eax, [p.p_mem]
  !mov eax, [eax + ecx]
   
  !mov edx, [p.v_b]
  !jmp dword [JT_remain1 + edx * 4]
  !JT_remain1 dd lle,ll1,ll2,ll3,ll4
   
  !ll1:
  !and eax, 0xff
  !jmp lle
  !ll2:
  !and eax, 0xffff
  !jmp lle
  !ll3:
  !and eax, 0xffffff
  !jmp lle
  !ll4:
  !lle:
 
  !mov edx,eax
  !popcnt eax,edx   
  !add [p.v_result],eax   
 
  ProcedureReturn result 
EndProcedure   


CompilerElse

Procedure _Popcount64(x.i)
  ;x - (x >> 1) &  $5555555555555555           
  !mov rax, [p.v_x]
  !mov rdx, rax
  !shr rdx, 1
  !mov r15, 0x5555555555555555
  !and rdx, r15
  !sub rax,rdx
  ;x = (x & $3333333333333333) + ((x >> 2) & $3333333333333333)
  !mov rdx, rax       ;x
  !mov r15, 0x3333333333333333
  !and rax, r15 
  !shr rdx, 2
  !and rdx, r15
  !add rax,rdx
  ;x = (x + (x >> 4)) & $0f0f0f0f0f0f0f0f
  !mov rdx, rax
  !shr rdx, 4
  !add rax,rdx
  !mov r15, 0x0f0f0f0f0f0f0f0f
  !and rax, r15
  ;x * 0101010101010101 >> 56
  !mov r15, 0x0101010101010101
  !imul rax, r15
  !shr rax, 56
  ProcedureReturn 
EndProcedure

Procedure _popcountmem(*mem,len)
  Protected result,a,b
       
  !mov rax, [p.v_len]
  !And rax, 3
  !mov rdx, [p.v_len]
  !sub rdx,rax
  !mov [p.v_a],rdx
  !mov [p.v_b],rax
  !XOr rcx,rcx
  !lwhile:
  !cmp rcx, [p.v_a]
  !jge lend
  !mov rax, [p.p_mem]
  !mov eax, [rax + rcx]
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  !mov edx, eax       
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  !imul eax, 0x01010101
  !shr eax, 24
  !add [p.v_result],eax   
  !add rcx,4
  !jmp lwhile
  !lend:
 
  !mov rax, [p.p_mem]
  !mov eax, [rax + rcx]
 
  !lea rdx, [JT_remain]
  !mov rcx, [p.v_b]
  !jmp qword [rdx + rcx * 8]
  !JT_remain dq le,l1,l2,l3,l4
   
  !l1:
  !and eax, 0xff
  !jmp le
  !l2:
  !and eax, 0xffff
  !jmp le
  !l3:
  !and eax, 0xffffff
  !jmp le
  !l4:
  !le:
   
  !mov edx, eax
  !shr edx, 1
  !and edx, 0x55555555
  !sub eax,edx
  !mov edx, eax       
  !and eax, 0x33333333
  !shr edx, 2
  !and edx, 0x33333333
  !add eax,edx
  !mov edx, eax
  !shr edx, 4
  !add eax,edx
  !and eax, 0x0f0f0f0f
  !imul eax, 0x01010101
  !shr eax, 24
  !add [p.v_result],eax   
   
  ProcedureReturn result   
EndProcedure   

Procedure _popcountmemSSE(*mem,len)
  Protected result,a,b
       
  !mov rax, [p.v_len]
  !and rax, 7
  !mov rdx, [p.v_len]
  !sub rdx,rax
  !mov [p.v_a],rdx
  !mov [p.v_b],rax
  !xor rcx,rcx
  !lwhile1:
  !cmp rcx, [p.v_a]
  !jge lend1
 
  !mov rax, [p.p_mem]
  !mov rax, [rax + rcx]
  !popcnt rax,rax   
  !add [p.v_result],rax   
  !add rcx,8
   
  !jmp lwhile1
  !lend1:
 
  !mov rax, [p.p_mem]
  !mov rax, [rax + rcx]
   
  !lea rdx, [JT_remain1]
  !mov rcx, [p.v_b]
  !jmp qword [rdx + rcx * 8]
  !JT_remain1 dq lle,ll1,ll2,ll3,ll4,ll5,ll6,ll7,ll8
   
  !ll1:
  !and rax, 0xff
  !jmp lle
  !ll2:
  !and rax, 0xffff
  !jmp lle
  !ll3:
  !and rax, 0xffffff
  !jmp lle
  !ll4:
  !mov rdx,0xffffffff
  !and rax, rdx
  !jmp lle
  !ll5:
  !mov rdx, 0xffffffffff
  !and rax, rdx
  !jmp lle
  !ll6:
  !mov rdx,0xffffffffffff
  !and rax, rdx
  !jmp lle
  !ll7:
  !mov rdx,0xffffffffffffff
  !and rax, rdx
  !jmp lle
  !ll8:
  !lle:
 
  !mov rdx,rax
  !popcnt rax,rdx   
  !add [p.v_result],rax   
 
  ProcedureReturn result 
EndProcedure   
 
CompilerEndIf

Procedure _PopCountSSE4(x.i)
  CompilerIf SizeOf(Integer) = 8
    !popcnt rax,[p.v_x]   
  CompilerElse
    !popcnt eax,[p.v_x]   
  CompilerEndIf   
  ProcedureReturn
EndProcedure   
 
Prototype PopCount(x.i)
Prototype PopCountMem(*mem,len)
Global PopCount.PopCount
Global PopCountMem.PopCountMem

Procedure InitpopCount(FallBackMode.i=64)
  Protected result
  !mov eax, 1
  !cpuid
  !shr ecx, 23
  !and ecx, 1
  !mov [p.v_result],ecx
  If result
    popcount = @_PopCountSSE4()
    popcountmem = @_popcountmemSSE()
  Else
    CompilerIf SizeOf(Integer) = 8
      If FallBackMode = 64
        popcount = @_popcount64()
        popcountmem = @_popcountmem()
      Else
        popcount = @_popcount32()
        popcountmem = @_popcountmem()
      EndIf   
    CompilerElse
      popcount = @_popcount32()
      popcountmem = @_popcountmem()
    CompilerEndIf
 EndIf   
EndProcedure


CompilerIf #PB_Compiler_IsMainFile
 
  InitpopCount() ; sets popcount to the best supported function
  ;on x64 if you want to set the mode to the 32 function range, call it with InitpopCount(#PB_Long)
 
  x.i = %10101010101010101010101010101010
  Debug PopCount(x)  ;16
 
  mx = Random(1024,128)
  *mem = AllocateMemory(mx)
  *pa.Ascii = *mem 
  For a = 0 To mx-1
   *pa\a = 3
   *pa+1
   r+2
  Next   
  Debug r     
  Debug PopCountMem(*mem,mx)   
  Debug _popcountmem(*mem,mx) 
 
CompilerEndIf


Top
 Profile  
Reply with quote  
 Post subject: Re: Population count
PostPosted: Thu Feb 04, 2016 4:10 pm 
Offline
Addict
Addict
User avatar

Joined: Thu Jun 04, 2015 7:10 am
Posts: 1673
great work everyone!:) Ive simply added a CPUID check for SSE4 POPCNT, and then just combined the 32&64bit asm routines posted here along with a Prototype function pointer so it's always pointing at the best function
Code:
Prototype.i protoGetBitCnt(number.i)

Procedure.i HasSSE4_POPCNT()
  EnableASM
  Protected.i reg_bx
  CompilerIf SizeOf(Integer) = 8
    !mov [p.v_reg_bx], rbx
  CompilerElse
    !mov [p.v_reg_bx], ebx
  CompilerEndIf 
  ! mov eax, 1
  ! cpuid   
  ! xor eax, eax 
  ! bt ecx, 23   ;popcnt
  ! setc al   
  CompilerIf SizeOf(Integer) = 8
    !mov rbx, [p.v_reg_bx]
  CompilerElse
    !mov ebx, [p.v_reg_bx]
  CompilerEndIf
  ProcedureReturn
  DisableASM
EndProcedure


Procedure GetBitCountSSE4(x.i)
  CompilerIf SizeOf(Integer) = 8
    !popcnt rax,[p.v_x]   
  CompilerElse
    !popcnt eax,[p.v_x]   
  CompilerEndIf   
  ProcedureReturn
EndProcedure 


Procedure GetBitCountAsm(x.i)
  CompilerIf SizeOf(Integer) = 8
    !mov rax, [p.v_x]
    !mov rdx, rax
    !shr rdx, 1
    !mov r15, 0x5555555555555555
    !and rdx, r15
    !sub rax,rdx
    !mov rdx, rax       
    !mov r15, 0x3333333333333333
    !and rax, r15
    !shr rdx, 2
    !and rdx, r15
    !add rax,rdx
    !mov rdx, rax
    !shr rdx, 4
    !add rax,rdx
    !mov r15, 0x0f0f0f0f0f0f0f0f
    !and rax, r15
    !mov r15, 0x0101010101010101
    !imul rax, r15
    !shr rax, 56
  CompilerElse
    !mov eax, [p.v_x]
    !mov edx, eax
    !shr edx, 1
    !and edx, 0x55555555
    !sub eax,edx
    !mov edx, eax       
    !and eax, 0x33333333
    !shr edx, 2
    !and edx, 0x33333333
    !add eax,edx
    !mov edx, eax
    !shr edx, 4
    !add eax,edx
    !and eax, 0x0f0f0f0f
    !imul eax, 0x01010101
    !shr eax, 24
  CompilerEndIf
  ProcedureReturn
EndProcedure

 
;Self-initialize
If HasSSE4_POPCNT() = 1
  Global GetBitCnt.protoGetBitCnt = @GetBitCountSSE4()
Else
  Global GetBitCnt.protoGetBitCnt = @GetBitCountAsm()
EndIf

;---
;from now you can simply call GetBitCnt(value), and it will automatically
;be pointing to either the SSE4 or asm procedure depending on your CPU

ps. i really wish the Thread Subject included the words COUNT and BITS as I nearly missed this thread! (and besides, 32 is more of a party than a population, so yes Intel shouldve named the instruction BITCNT if not PARTYCNT) :D

_________________
Thankyou to all the coders who generously helped & encouraged me in the nearly 2yrs when i was welcome here,
it was a tremendous privilege. I learned a lot. I wish you and your families all the best and success for the future.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 35 posts ]  Go to page Previous  1, 2, 3  Next

All times are UTC + 1 hour


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
cron

 


Powered by phpBB © 2008 phpBB Group
subSilver+ theme by Canver Software, sponsor Sanal Modifiye