Page 3 of 5
Re: Comparisons in mathematical simulations
Posted: Tue May 16, 2023 8:10 am
by wilbert
fvillanova wrote: Tue May 16, 2023 4:58 am
I appreciate any contribution to speeding up the CreateS() procedure.
Code: Select all
Procedure.s CreateS(*input)
Protected Dim Result.u(100)
FillMemory(@Result(), 200, '0', #PB_Unicode)
!mov r9, [p.a_Result]
!xor rax, rax
!xor edx, edx
!mov r8, [p.p_input]
!.loop:
!mov ecx, [r8]
!add r8, 6
!and ecx, 0xf000f
!imul ecx, 0xa0001
!shr ecx, 16
!neg rcx
!mov byte [r9 + rcx*2 + 198], '1'
!cmp [r8 - 2], dx
!jne .loop
ProcedureReturn PeekS(@result())
EndProcedure
s1.s="92 00 99 80 65 68 04 01"
s2.s="37 65 04 92 15 98 00 43 02"
Debug CreateS(@s1)
Debug CreateS(@s2)
Grep using a 128 bit structure ...
Code: Select all
Structure I128
low.q
high.q
EndStructure
Procedure CreateB_I128(*input, *result.I128)
!xor eax, eax
!mov r8, [p.p_input] ; pointer to input string
!mov r9, [p.p_result] ; pointer to i128 result
!xor r10, r10 ; result bits low [00-63]
!xor r11, r11 ; result bits high [64-99]
!.l0:
!mov ecx, [r8] ; get number
!add r8, 6
!and ecx, 0xf000f
!imul ecx, 0xa0001
!shr ecx, 16
!test ecx, 64 ; >= 64 ?
!jnz .l1
!bts r10, rcx ; set bit in low
!jmp .l2
!.l1:
!bts r11, rcx ; set bit in high
!.l2:
!cmp [r8 - 2], ax ; check for end of input string
!jne .l0 ; loop if not end of string
!mov [r9], r10 ; set result bits low [00-63]
!mov [r9 + 8], r11 ; set result bits high [64-99]
EndProcedure
Procedure.i PopCount_I128(*input.I128)
!mov r8, [p.p_input]
!popcnt rax, [r8] ; popcnt bits [00-63]
!popcnt rdx, [r8 + 8] ; popcnt bits [64-99]
!add rax, rdx ; add
ProcedureReturn
EndProcedure
Procedure.i Grep_I128(*input1.I128, *input2.I128)
!mov r8, [p.p_input1]
!mov r9, [p.p_input2]
!mov rax, [r8] ; get low bits of input1
!and rax, [r9] ; and with low bits of input2
!popcnt rax, rax ; popcnt
!mov rdx, [r8 + 8] ; get high bits of input1
!and rdx, [r9 + 8] ; and with high bits of input2
!popcnt rdx, rdx ; popcnt
!add rax, rdx ; add
ProcedureReturn
EndProcedure
s1.s="07 18 15 61 02 10 08 03 00 99 97"
s2.s="01 17 61 07 04 05 18 06 57 99"
s3.s="01 07"
CreateB_I128(@s1, @x.I128)
CreateB_I128(@s2, @y.I128)
CreateB_I128(@s3, @z.I128)
Debug "PopCount"
Debug PopCount_I128(@x)
Debug PopCount_I128(@y)
Debug PopCount_I128(@z)
Debug "Grep"
Debug Grep_I128(@x, @y)
Debug Grep_I128(@x, @z)
Debug Grep_I128(@y, @z)
Using popcnt is very useful if you convert the "01 99 50" strings once and use bits internally.
If you are always using those "01 99 50" strings as input for grep it probably is faster to do it differently.
Re: Comparisons in mathematical simulations
Posted: Tue May 16, 2023 7:39 pm
by fvillanova
wilbert wrote: Tue May 16, 2023 8:10 am
fvillanova wrote: Tue May 16, 2023 4:58 am
I appreciate any contribution to speeding up the CreateS() procedure.
Code: Select all
Procedure.s CreateS(*input)
Protected Dim Result.u(100)
FillMemory(@Result(), 200, '0', #PB_Unicode)
!mov r9, [p.a_Result]
!xor rax, rax
!xor edx, edx
!mov r8, [p.p_input]
!.loop:
!mov ecx, [r8]
!add r8, 6
!and ecx, 0xf000f
!imul ecx, 0xa0001
!shr ecx, 16
!neg rcx
!mov byte [r9 + rcx*2 + 198], '1'
!cmp [r8 - 2], dx
!jne .loop
ProcedureReturn PeekS(@result())
EndProcedure
s1.s="92 00 99 80 65 68 04 01"
s2.s="37 65 04 92 15 98 00 43 02"
Debug CreateS(@s1)
Debug CreateS(@s2)
Amazing, it was very fast to create String 99 to 0
Thank you again!
My program already stores these Strings in database records that are later processed with the PopCount procedure with unbeatable speed in comparisons.
I have routines that need to process trillions of iterations (loops).
CreateS test results:
My initial CreateS = 457 ms
Demivec's CreateS = 58 ms
Wilbert's CreateS = 7 ms
Then I'll see your idea of CreateB_I128 and PopCount_I128 but at the moment creating keys previously with CreateS and processing
with PopCount the speed is unbeatable:
Code: Select all
DisableDebugger
Procedure PopCount(inp.i)
!POPCNT rax,[p.v_inp]
ProcedureReturn
EndProcedure
s11.s="1000000100000000000100000000000100100000000000000000000000000000000000000000000000000000000000010011" ; "92 00 99 80 65 68 04 01"
s22.s="0100000100000000000000000000000000100000000000000000000010000010000000000000000000001000000000010101" ; "37 65 04 92 15 98 00 43 02"
part1.s=Left(s11,50)
part2.s=Right(s11,50)
part3.s=Left(s22,50)
part4.s=Right(s22,50)
x1.i=Val("%"+part1)
x2.i=Val("%"+part2)
y1.i=Val("%"+part3)
y2.i=Val("%"+part4)
t1 = ElapsedMilliseconds()
For i=1 To 40000000
n=PopCount(x1 & y1)+PopCount(x2 & y2)
Next
Result$+"Count "+Str(n)+" in "+StrF((ElapsedMilliseconds()-t1),0)+" ms ( For 400000000 four hundred million times ) "+Chr(13)
EnableDebugger
Debug Result$
I believe my idea of splitting the string into two parts is 8 times faster than your solution with procedure Grep_I128
n=PopCount(x1 & y1)+PopCount(x2 & y2) = 109 milliseconds
n=Grep_I128(@x,@y) = 890 milliseconds
Many thanks again for the new procedure CountS().
Re: Comparisons in mathematical simulations
Posted: Tue May 16, 2023 9:17 pm
by wilbert
fvillanova wrote: Tue May 16, 2023 7:39 pmCreateS test results:
My initial CreateS = 457 ms
Demivec's CreateS = 58 ms
Wilbert's CreateS = 7 ms
Good to hear that it works for you.
PopCount itself is fast.
It's the splitting the string in two and converting to a number with Left / Right and Val that takes up most of the time.
Shouldn't that also be in the loop when you time things or is this conversion a one time only operation ?
Re: Comparisons in mathematical simulations
Posted: Tue May 16, 2023 10:29 pm
by fvillanova
wilbert wrote: Tue May 16, 2023 9:17 pm
fvillanova wrote: Tue May 16, 2023 7:39 pmCreateS test results:
PopCount itself is fast.
It's the splitting the string in two and converting to a number with Left / Right and Val that takes up most of the time.
Shouldn't that also be in the loop when you time things or is this conversion a one time only operation ?
Yes, dividing by two Strings and converting left and right is already recorded in the database, it is not processed every time, just once.
The heaviest processing during simulations is done with the two conversion numbers that were previously calculated.
Thanks a lot for the help.
Re: Comparisons in mathematical simulations
Posted: Fri May 19, 2023 6:48 am
by wilbert
Just for fun (and speed) ...
2 alternative ways you could call popcnt with your two quad variables.
Code: Select all
DisableDebugger
; PopCount128
Procedure PopCount128_Addr()
ProcedureReturn ?Addr
!align 8
Addr:
CompilerIf #PB_Compiler_OS = #PB_OS_Windows
!popcnt rax, rcx ; x64 Windows
!popcnt r8 , rdx
CompilerElse
!popcnt rax, rdi ; x64 macOS and Linux
!popcnt r8 , rsi
CompilerEndIf
!add rax, r8
!ret
EndProcedure
PrototypeC PopCount128_Proto(q1.q, q2.q)
Global PopCount128.PopCount128_Proto = PopCount128_Addr()
; PopCount
Procedure PopCount(inp.i)
!popcnt rax, [p.v_inp]
ProcedureReturn
EndProcedure
; Compare
x1 = 5
x2 = 15
y1 = 255
y2 = 127
t1 = ElapsedMilliseconds()
For i=1 To 100000000
n=PopCount(x1 & y1)+PopCount(x2 & y2)
Next
t2 = ElapsedMilliseconds()
For i=1 To 100000000
n=PopCount128(x1 & y1, x2 & y2)
Next
t3 = ElapsedMilliseconds()
For i=1 To 100000000
; inline asm
!mov rax, [v_x1]
!mov rdx, [v_x2]
!and rax, [v_y1]
!and rdx, [v_y2]
!popcnt rax, rax
!popcnt rdx, rdx
!add rax, rdx
!mov [v_n], rax
Next
t4 = ElapsedMilliseconds()
EnableDebugger
Debug "PopCount (2x) : " + Str(t2-t1)
Debug "PopCount128 : " + Str(t3-t2)
Debug "inline popcnt : " + Str(t4-t3)
Re: Comparisons in mathematical simulations
Posted: Fri May 19, 2023 10:50 am
by Piero
I was always more interested in History of Math than in Math itself; so forgive me if I post again about this "problem": Kepler's conjecture (packing spheres!)
When I "studied" it, if I remember well, it was """nearly solved""" by "computations"...
I'm posting this just because I hope you will find it interesting...
In case forgive me
Re: Comparisons in mathematical simulations
Posted: Fri May 19, 2023 7:09 pm
by fvillanova
wilbert wrote: Fri May 19, 2023 6:48 am
Just for fun (and speed) ...
2 alternative ways you could call popcnt with your two quad variables.
Hi Wilbert,
Now this option "inline" will be the best choice.
I'll let you know later how it worked on my system.
Re: Comparisons in mathematical simulations
Posted: Fri May 19, 2023 9:08 pm
by Olli
Piero wrote: Fri May 19, 2023 10:50 am
I was always more interested in History of Math than in Math itself; so forgive me if I post again about this "problem": Kepler's conjecture (packing spheres!)
When I "studied" it, if I remember well, it was """nearly solved""" by "computations"...
I'm posting this just because I hope you will find it interesting...
In case forgive me
Hello Piero,
do not hesitate to create a whole subject, first in the
coding section, about the Kepler's conjecture (you put this expression in the title)
There will be any chances to get surprising answers. And after a time, you will be able to write a small project, or more in the
announcement section.
Re: Comparisons in mathematical simulations
Posted: Sat May 20, 2023 1:01 am
by fvillanova
wilbert wrote: Fri May 19, 2023 6:48 am
Just for fun (and speed) ...
2 alternative ways you could call popcnt with your two quad variables.
Amazing, the speed got bigger, I thought it would be impossible to increase!
In some places in the main program the input variables are simple, but in other places they are arrays.
Would it be faster if the "inline" routine took the values directly from the array without having to first pass it to a simple variable?
Code: Select all
i4q=matf2(i5) ; i5 = integer index
i8q=matf(i6) ; i6 = integer index
!mov rax, [v_i4q]
!and rax, [v_i8q]
!popcnt rax, rax
!mov [v_n], rax
Re: Comparisons in mathematical simulations
Posted: Sat May 20, 2023 1:21 am
by fvillanova
Piero wrote: Fri May 19, 2023 10:50 am
I was always more interested in History of Math than in Math itself; so forgive me if I post again about this "problem": Kepler's conjecture (packing spheres!)
When I "studied" it, if I remember well, it was """nearly solved""" by "computations"...
I'm posting this just because I hope you will find it interesting...
In case forgive me
Yes, mathematical problems like this can definitely be solved with quantum computing that is coming in a few years (I hope) that it will be within our reach!
Re: Comparisons in mathematical simulations
Posted: Sat May 20, 2023 6:21 am
by wilbert
fvillanova wrote: Sat May 20, 2023 1:01 am
Amazing, the speed got bigger, I thought it would be impossible to increase!
In some places in the main program the input variables are simple, but in other places they are arrays.
Would it be faster if the "inline" routine took the values directly from the array without having to first pass it to a simple variable?
Yes, that should be faster.
You can see for yourself
Test 1
Code: Select all
DisableDebugger
Dim matf.q(255)
Dim matf2.q(255)
i5 = 31
i6 = 151
matf(i6) = 65535
matf2(i5) = 247
n = 0
; Compare
t1 = ElapsedMilliseconds()
For i=1 To 100000000
i4q=matf2(i5) ; i5 = integer index
i8q=matf(i6) ; i6 = integer index
!mov rax, [v_i4q]
!and rax, [v_i8q]
!popcnt rax, rax
!mov [v_n], rax
Next
t2 = ElapsedMilliseconds()
For i=1 To 100000000
; inline asm
!mov r8, [v_i5]
!mov r9, [v_i6]
!shl r8, 3
!shl r9, 3
!add r8, [a_matf2]
!add r9, [a_matf]
!mov rax, [r8]
!and rax, [r9]
!popcnt rax, rax
!mov [v_n], rax
Next
t3 = ElapsedMilliseconds()
EnableDebugger
Debug "With i4q/i8q variables : " + Str(t2-t1)
Debug "Without i4q/i8q variables : " + Str(t3-t2)
Test 2
Code: Select all
DisableDebugger
Dim matf.q(8191)
Dim matf2.q(8191)
RandomData(@matf(0), 8192*8)
RandomData(@matf2(0), 8192*8)
n = 0
; Compare
t1 = ElapsedMilliseconds()
For i5 = 0 To 8191
For i6 = 0 To 8191
i4q=matf2(i5) ; i5 = integer index
i8q=matf(i6) ; i6 = integer index
!mov rax, [v_i4q]
!and rax, [v_i8q]
!popcnt rax, rax
!mov [v_n], rax
Next
Next
t2 = ElapsedMilliseconds()
For i5 = 0 To 8191
For i6 = 0 To 8191
; inline asm
!mov r8, [v_i5]
!mov r9, [v_i6]
!shl r8, 3
!shl r9, 3
!add r8, [a_matf2]
!add r9, [a_matf]
!mov rax, [r8]
!and rax, [r9]
!popcnt rax, rax
!mov [v_n], rax
Next
Next
t3 = ElapsedMilliseconds()
EnableDebugger
Debug "With i4q/i8q variables : " + Str(t2-t1)
Debug "Without i4q/i8q variables : " + Str(t3-t2)
Re: Comparisons in mathematical simulations
Posted: Sat May 20, 2023 8:24 am
by idle
Code: Select all
DisableDebugger
Procedure popcountmem(*mem,*mem1,len)
Protected result,a,b
!mov rax, [p.v_len]
!and rax, 7
!mov rdx, [p.v_len]
!sub rdx,rax
!mov [p.v_a],rdx
!mov [p.v_b],rax
!xor rcx,rcx
!lwhile1:
!cmp rcx, [p.v_a]
!jge lend1
!mov rax, [p.p_mem]
!mov rax, [rax + rcx]
!mov r9, [p.p_mem1]
!mov r9, [r9 + rcx]
!and rax,r9
!popcnt rax,rax
!add [p.v_result],rax
!add rcx,8
!jmp lwhile1
!lend1:
!mov rax, [p.p_mem]
!mov rax, [rax + rcx]
!mov r9, [p.p_mem1]
!mov r9, [r9 + rcx]
!and rax,r9
!lea rdx, [JT_remain1]
!mov rcx, [p.v_b]
!jmp qword [rdx + rcx * 8]
!JT_remain1 dq lle,ll1,ll2,ll3,ll4,ll5,ll6,ll7,ll8
!ll1:
!and rax, 0xff
!jmp lle
!ll2:
!and rax, 0xffff
!jmp lle
!ll3:
!and rax, 0xffffff
!jmp lle
!ll4:
!mov rdx,0xffffffff
!and rax, rdx
!jmp lle
!ll5:
!mov rdx, 0xffffffffff
!and rax, rdx
!jmp lle
!ll6:
!mov rdx,0xffffffffffff
!and rax, rdx
!jmp lle
!ll7:
!mov rdx,0xffffffffffffff
!and rax, rdx
!jmp lle
!ll8:
!lle:
!mov rdx,rax
!popcnt rax,rdx
!add [p.v_result],rax
ProcedureReturn result
EndProcedure
Dim matf.q(8191)
Dim matf2.q(8191)
RandomData(@matf(0), 8192*8)
RandomData(@matf2(0), 8192*8)
n = 0
n1 =0
n2=0
lp=8000
; Compare
t1 = ElapsedMilliseconds()
For a = 0 To lp
n=0
For i5 = 0 To 8191
i4q=matf2(i5) ; i5 = integer index
i8q=matf(i5) ; i6 = integer index
!mov rax, [v_i4q]
!and rax, [v_i8q]
!popcnt rax, rax
!add [v_n], rax
Next
Next
t2 = ElapsedMilliseconds()
For a = 0 To lp
n1=0
For i5 = 0 To 8191
; inline asm
!mov r8, [v_i5]
!mov r9, [v_i5]
!shl r8, 3
!shl r9, 3
!add r8, [a_matf2]
!add r9, [a_matf]
!mov rax, [r8]
!and rax, [r9]
!popcnt rax, rax
!add [v_n1], rax
Next
Next
t3 = ElapsedMilliseconds()
For a = 0 To lp
n2=0
n2 = popcountmem(@matf(0),@matf2(0),8192*8)
Next
t4 = ElapsedMilliseconds()
EnableDebugger
Debug "With i4q/i8q variables : " + Str(n) + " " + Str(t2-t1)
Debug "Without i4q/i8q variables : " + Str(n1) + " " + Str(t3-t2)
Debug "Raw : " + Str(n2) + " " + Str(t4-t3)
Re: Comparisons in mathematical simulations
Posted: Sun May 21, 2023 1:13 am
by fvillanova
wilbert wrote: Sat May 20, 2023 6:21 am
Yes, that should be faster.
You can see for yourself
Yes, it got even faster! I still can't believe it!
Several processings were done in +- 8 to 9 minutes and now they are being processed in 3.3 minutes!!!
I am not experienced in ASM but I can make small changes and simple routines.
See how the main part is in the processing:
Code: Select all
i4q=CreateB(@aux2)
FillMemory(@conte(), (mate(i0,14)+1)*8, 0, #PB_Integer)
i6=mate(i0,10): kk2=mate(i0,11)-i6+1
!mov r10,[a_conte]
!mov rcx,1
!MOV r13, qword [v_kk2]
!l_jump12:
!mov r9, [v_i6]
!shl r9, 3
!add r9, [a_matf]
!mov rax, [v_i4q]
!and rax, [r9]
!popcnt rax, rax
!add [r10 + rax * 8],rcx
!add [v_i6], 1
!dec r13
!jnz l_jump12
It's working great! and very... very... very fast!
note: The above code will not work because there are external variables and arrays that are not present here.
Re: Comparisons in mathematical simulations
Posted: Sun May 21, 2023 6:30 am
by wilbert
fvillanova wrote: Sun May 21, 2023 1:13 am
It's working great! and very... very... very fast!
note: The above code will not work because there are external variables and arrays that are not present here.
I think there's still some room for improvement on your code.
You don't have to update the i6 variable al the time. You can do it once after the loop has finished.
The code below should accomplish the same as your code.
Code: Select all
i4q=CreateB(@aux2)
FillMemory(@conte(), (mate(i0,14)+1)*8, 0, #PB_Integer)
i6=mate(i0,10): kk2=mate(i0,11)-i6+1
!mov r9, [v_i6] ; r9 = @matf() + i6 * 8
!shl r9, 3
!add r9, [a_matf]
!mov r10, [a_conte] ; r10 = @conte()
!mov rcx, [v_kk2] ; rcx = kk2
!l_jump12:
!mov rax, [v_i4q]
!and rax, [r9]
!popcnt rax, rax
!add qword [r10 + rax * 8], 1
!add r9, 8
!sub rcx, 1
!jnz l_jump12
i6+kk2
Re: Comparisons in mathematical simulations
Posted: Sun May 21, 2023 2:03 pm
by fvillanova
wilbert wrote: Sun May 21, 2023 6:30 am
I think there's still some room for improvement on your code.
You don't have to update the i6 variable al the time. You can do it once after the loop has finished.
The code below should accomplish the same as your code.
Unbelievable! you are the man!
Really, i6 was no longer necessary to be incremented, it was needed in the previous routines, and I didn't see it.
The other speedup you did, doubled the speed, wow!
This is the "Holy Grail" for my routine.
There are some details that I didn't show you, because it is used in 8 places in the system, including where it uses the four variables to be able to make PopCount with more than 64 (0 or 1)
Now, I put it in this segment of the program that consumes more processing because the mate(x,y) array have a large variation of values, later I will replace those in the other seven places.
Thanks again for your help.