
AVX-512 Instruction

Posted: Sat Feb 13, 2021 10:47 am
by oryaaaaa
Hello

PureBasic 5.73 LTS
flat assembler version 1.71.39

Is AVX-512 support on the planned roadmap?
flat assembler version 1.71.40
Added support for Intel AVX-512

There are 32 512-bit registers.
I hope to use AVX-512 for the ASIO 2048-sample memory copy process in Bug head.

Code:

VMOVDQA64 [R8], zmm0
Add R8, 64
VMOVDQA64 [R8], zmm1
Add R8, 64
VMOVDQA64 [R8], zmm2
Add R8, 64
...

Thanks.
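For readers who prefer C to FASM syntax, here is a minimal sketch of a copy that moves 64 bytes per step with AVX-512 intrinsics. It is not the Bug head code. `copy_block` and `copy_zmm` are hypothetical names, and GCC or Clang on x86 is assumed. The unaligned load/store intrinsics stand in for the aligned VMOVDQA64 so that the buffers need no 64-byte alignment.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Copy n bytes in 64-byte steps: one ZMM load and one ZMM store per step.
   The target attribute compiles only this function for AVX-512F, so no
   global -mavx512f flag is needed (GCC/Clang extension, an assumption). */
__attribute__((target("avx512f")))
static void copy_zmm(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;
    for (; i + 64 <= n; i += 64) {
        __m512i v = _mm512_loadu_si512(src + i);  /* unaligned 64-byte load  */
        _mm512_storeu_si512(dst + i, v);          /* unaligned 64-byte store */
    }
    memcpy(dst + i, src + i, n - i);              /* tail shorter than 64 bytes */
}
#endif

/* Hypothetical entry point: use AVX-512F when the CPU reports it,
   otherwise fall back to plain memcpy. */
void copy_block(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        copy_zmm(dst, src, n);
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

At 64 bytes per ZMM register, the 32 registers hold 2048 bytes in total. Whether the compiler unrolls the loop is left to the optimizer in this sketch.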

Re: AVX-512 Instruction

Posted: Sat Feb 13, 2021 12:18 pm
by sq4
oryaaaaa wrote: Sat Feb 13, 2021 10:47 am I hope to use AVX-512 for the ASIO 2048-sample memory copy process in Bug head.
ASIO is all about latency reduction. The smaller the buffers, the better.

But if you talk about the render/mixing stage with anticipative pre-rendered buffering, then YES, AVX-512 is the way to go nowadays.
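To put numbers on the buffer-size point, the latency added by one buffer is its length in frames divided by the sample rate. The small C sketch below shows that arithmetic. The function name and the 44.1 kHz rate in the usage note are assumptions for illustration, not figures from this thread.

```c
/* Latency contributed by one audio buffer, in milliseconds.
   frames and sample_rate_hz are illustrative parameters. */
double buffer_latency_ms(unsigned frames, unsigned sample_rate_hz)
{
    return 1000.0 * (double)frames / (double)sample_rate_hz;
}
```

At an assumed 44.1 kHz, `buffer_latency_ms(2048, 44100)` is about 46.4 ms, while a 64-frame buffer gives about 1.45 ms.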

Re: AVX-512 Instruction

Posted: Sat Feb 13, 2021 6:15 pm
by Olli
I do not understand how you manage to write Assembly directly, like this:

Code:

add rax,1

I was always annoyed by the confusion between native BASIC instructions and Assembly instructions. Also, I never understood why some Assembly instructions are not accepted directly, yet do run when prefixed with the '!' character:

Code:

! add rax, 1

Anyway, if any Assembly instructions are missing, it is possible (though laborious) to write their bytecode directly, unless recent OSs enforce some security restriction that I am not aware of.

Re: AVX-512 Instruction

Posted: Mon Feb 15, 2021 3:23 am
by oryaaaaa
oryaaaaa wrote: Sat Feb 13, 2021 10:47 am I hope to use AVX-512 for the ASIO 2048-sample memory copy process in Bug head.
My sound player is designed so that the longer the ASIO latency, the better the sound quality. It is currently used in $200,000 high-end audio systems and in professional studios that are strict about sound accuracy, and it has a solid track record.

ASIO transfer code:

Code:

...
!WasapiPorc_Process_SnowFall59:
!MOVNTQ [R8], mm5 ; Set ; 5
!MOVNTQ [R8], mm2 ; 0
!MOVNTQ [R8], mm1 ; 1
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm2 ; 0
!MOVNTQ [R8], mm1 ; 1
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm3 ; 1
!WasapiPorc_Process_SnowFall68:
!MOVNTQ [R8], mm5 ; Set ; 6
!NOP [Rip]
!NOP [R8]
!MOVQ mm5, [R8] ; [12.12 - 2.99]-Start
;dump ;   !MOVQ mm1, [R8] ; [12.27 - 3.14]
;dump ;   !MOVQ mm3, [R8] ; [12.27 - 3.14]
!MOVQ mm5, mm5 ; [12.34 - 3.21]
!MOVQ mm5, mm5 ; [12.34 - 3.21]
!MOVQ mm5, mm5 ; [12.34 - 3.21]
!MOVNTQ [R8], mm2 ; 0
!MOVNTQ [R8], mm1 ; 1
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm2 ; 0
!MOVNTQ [R8], mm1 ; 1
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set ; 6 [12.12 - 2.99]-End
!NOP [Rip]
!NOP [R8]
!AddFNOP_S_WasapiProc_2:
!INC Rdx
!INC Rdx
!INC Rdx
!INC Rdx ;4
!INC Rdx
!INC Rdx
!INC Rdx
!INC Rdx ;8
!INC R8
!INC R8
!INC R8
!INC R8 ;4
!INC R8
!INC R8
!INC R8
!INC R8 ;8
!DEC Rcx
!DEC Rcx
!DEC Rcx
!DEC Rcx ;4
!DEC Rcx
!DEC Rcx
!DEC Rcx
!DEC Rcx ;8
!FNOP ; for wide
!FNOP
!FNOP
!FNOP ;4
!NOP [Rip] ; [12.10 - 2.97]
!NOP [Rip] ; [12.10 - 2.97]
!JNZ WASAPI_Proc_LOOP_222
;     CopyMemory(*bufferDecode+WasapiPos, *buffer, length) ; or very light memory copy process
;     result = length : WasapiPos + result    
!WAIT ;1 [11.24 - 144]
!WAIT ; no fwait
!WAIT ; wait with FNOP
!WAIT ;4
!XCHG ch, cl
!XCHG cl, ch

The JNZ instruction is what degrades the sound quality most in this process. I was thinking of using AVX-512F instructions to avoid the loop. But AVX-512 support could easily become a heavy mental burden for compiler developers.

I access memory with MMX instructions. The problem with instructions that use general-purpose registers such as RAX and R8 is that, with a fully digital amplifier, the left/right volume balance collapses in the lowest 8 bits. Memory access through SSE (XMM) and AVX (YMM) register instructions seems to change the CPU clock during the transfer, and the sound quality gets worse. AVX-512F instructions might improve the sound quality. Since only the transfer process is affected, it would be enough to write just that part as a DLL in FASM.

How do I write an AVX-512 x64 DLL in FASM? Does anyone know anything about this?

Re: AVX-512 Instruction

Posted: Tue Mar 23, 2021 3:53 pm
by Thorium
It's fairly simple to update FASM to get the new instructions.
You can just replace FASM.exe in the compilers directory.

Re: AVX-512 Instruction

Posted: Thu Apr 15, 2021 4:59 am
by Keya
Another awesome thing about PB moving to a C compiler will be that optimised instructions like AVX-512 will be readily available via switches :)

Re: AVX-512 Instruction

Posted: Sat Apr 17, 2021 6:23 am
by Teddy Rogers
Keya wrote: Thu Apr 15, 2021 4:59 am PB moving to a C compiler
Is this a fact or a wish-list comment? If it is a fact, where can this information be found on PB's roadmap?

Ted.

Re: AVX-512 Instruction

Posted: Sat Apr 17, 2021 7:47 am
by #NULL
Teddy Rogers wrote: Sat Apr 17, 2021 6:23 am
Keya wrote: Thu Apr 15, 2021 4:59 am PB moving to a C compiler
Is this a fact or a wish-list comment? If it is a fact, where can this information be found on PB's roadmap?
discussion started here:
https://www.purebasic.fr/english/viewto ... 92#p567092
blog post:
https://www.purebasic.fr/blog/?p=480
fred's comment:
https://www.purebasic.fr/english/viewto ... 30#p567230

Re: AVX-512 Instruction

Posted: Sun May 02, 2021 2:24 am
by Teddy Rogers
Thank you for the info and links. Good to read that this change is coming! I see it as an advantage in the long term, as it may help to speed up PureBasic's development and give a quicker turnaround in introducing and supporting new hardware and OS features...

Ted.