AVX-512 Instruction

Bare metal programming in PureBasic, for experienced users
oryaaaaa
Enthusiast
Posts: 791
Joined: Mon Jan 12, 2004 11:40 pm
Location: Okazaki, JAPAN

AVX-512 Instruction

Post by oryaaaaa »

Hello,

PureBasic 5.73 LTS
flat assembler version 1.71.39

Is an update to the following version on the planned roadmap?
flat assembler version 1.71.40
Added support for Intel AVX-512

AVX-512 provides 32 registers of 512 bits each.
I hope to use it for the ASIO 2048-sample memory copy process in Bug head.

Code: Select all

VMOVDQA64 [R8], zmm0
Add R8, 64
VMOVDQA64 [R8], zmm1
Add R8, 64
VMOVDQA64 [R8], zmm2
Add R8, 64
...
Thanks.
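For readers who want to try the idea from a compiled language, the store sequence above can be approximated with compiler intrinsics. This is only a sketch in C, not the PureBasic/FASM code from the post: `copy_samples` and `copy_blocks_zmm` are hypothetical names, GCC or Clang on x86-64 is assumed, and unlike the snippet (which stores registers that are already loaded) it also loads each 64-byte block from a source buffer. `_mm512_store_si512` is an aligned 64-byte store of the same family as VMOVDQA64, so both pointers must be 64-byte aligned; otherwise the sketch falls back to `memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_ZMM_PATH 1

/* Copy whole 64-byte blocks with aligned ZMM loads and stores.
   Both pointers must be 64-byte aligned. */
__attribute__((target("avx512f")))
static void copy_blocks_zmm(void *dst, const void *src, size_t blocks)
{
    const __m512i *s = (const __m512i *)src;
    __m512i *d = (__m512i *)dst;
    for (size_t i = 0; i < blocks; ++i) {
        _mm512_store_si512(d + i, _mm512_load_si512(s + i));
    }
}
#endif

/* Hypothetical helper: copies `bytes` bytes from src to dst.
   Returns 1 if the AVX-512 path was used, 0 if it fell back to memcpy. */
int copy_samples(void *dst, const void *src, size_t bytes)
{
#ifdef HAVE_ZMM_PATH
    if (__builtin_cpu_supports("avx512f") &&
        bytes % 64 == 0 &&
        (uintptr_t)dst % 64 == 0 &&
        (uintptr_t)src % 64 == 0) {
        copy_blocks_zmm(dst, src, bytes / 64);
        return 1;
    }
#endif
    memcpy(dst, src, bytes);  /* portable fallback */
    return 0;
}
```

The runtime check matters because a binary containing AVX-512 instructions faults on CPUs that lack them.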
sq4
User
Posts: 98
Joined: Wed Feb 26, 2014 3:16 pm

Re: AVX-512 Instruction

Post by sq4 »

oryaaaaa wrote: I hope to use it for the ASIO 2048-sample memory copy process in Bug head.
ASIO is all about latency reduction. The smaller the buffers, the better.

But if you are talking about the render/mixing stage with anticipative pre-rendered buffering, then YES, AVX-512 is the way to go nowadays.
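To put rough numbers on the buffer-size point: the time covered by one buffer is the frame count divided by the sample rate. The thread does not state a sample rate or sample format, so the figures here are assumptions (48 kHz, stereo, 4 bytes per sample), and both function names are hypothetical. A minimal sketch in C:

```c
/* Time covered by one buffer, in milliseconds: frames / sample_rate * 1000. */
double buffer_latency_ms(unsigned frames, unsigned sample_rate_hz)
{
    return 1000.0 * (double)frames / (double)sample_rate_hz;
}

/* Number of 64-byte (ZMM-sized) blocks needed to hold one buffer,
   rounded up. Channel count and sample width are caller-supplied. */
unsigned long zmm_blocks_per_buffer(unsigned frames, unsigned channels,
                                    unsigned bytes_per_sample)
{
    unsigned long bytes = (unsigned long)frames * channels * bytes_per_sample;
    return (bytes + 63UL) / 64UL;
}
```

Under those assumptions, 2048 frames at 48 kHz cover about 42.7 ms (about 46.4 ms at 44.1 kHz), and a 2048-frame stereo buffer of 4-byte samples is 16384 bytes, or 256 blocks of 64 bytes.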
Olli
Addict
Posts: 1071
Joined: Wed May 27, 2020 12:26 pm

Re: AVX-512 Instruction

Post by Olli »

I do not understand how you can write this directly in assembly:

Code: Select all

add rax,1
I was always annoyed by the confusion between native BASIC instructions and assembly instructions. Also, I never understood why some assembly instructions could not be written directly, yet still worked when prefixed with the '!' character...

Code: Select all

! add rax, 1
Anyway, if some assembly instructions are missing, it is possible (though laborious) to write their machine code bytes directly (unless recent operating systems have some security mechanism against this, which I am not aware of).
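As an illustration of writing the bytes directly, the instruction `add rax, 1` from above encodes as the four bytes 48 83 C0 01. In FASM these would be emitted with a `db` directive. The sketch below shows the same idea in C with GCC/Clang inline assembly, where the equivalent directive is `.byte`. The function name `add_one_via_bytes` is hypothetical, and a plain C fallback is used on other compilers or architectures. Bytes emitted this way at build time are ordinary code in the executable, so no special permission from the operating system is involved.

```c
#include <stdint.h>

/* Returns value + 1. On x86-64 with GCC or Clang the addition is done by
   the hand-encoded bytes 48 83 C0 01, which decode as "add rax, 1". */
uint64_t add_one_via_bytes(uint64_t value)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        /* 48 = REX.W, 83 /0 = ADD r/m64, imm8, C0 = ModRM selecting RAX, 01 = imm8 */
        ".byte 0x48, 0x83, 0xC0, 0x01"
        : "+a"(value)   /* value is held in RAX on entry and on exit */
        :
        : "cc");        /* ADD modifies the flags register */
#else
    value += 1;         /* portable fallback, no raw bytes involved */
#endif
    return value;
}
```

The same approach works for instructions the assembler does not know yet, but every prefix and operand byte then has to be encoded by hand.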
oryaaaaa
Enthusiast
Posts: 791
Joined: Mon Jan 12, 2004 11:40 pm
Location: Okazaki, JAPAN

Re: AVX-512 Instruction

Post by oryaaaaa »

oryaaaaa wrote: I hope to use it for the ASIO 2048-sample memory copy process in Bug head.
My sound player is made so that the longer the ASIO latency, the better the sound quality. It is currently used in $200,000 high-end audio systems and in professional studios that are strict about sound accuracy, and it has a solid track record.

ASIO transfer code:

Code: Select all

...
!WasapiPorc_Process_SnowFall59:
!MOVNTQ [R8], mm5 ; Set ; 5
!MOVNTQ [R8], mm2 ; 0
!MOVNTQ [R8], mm1 ; 1
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm2 ; 0
!MOVNTQ [R8], mm1 ; 1
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm3 ; 1
!WasapiPorc_Process_SnowFall68:
!MOVNTQ [R8], mm5 ; Set ; 6
!NOP [Rip]
!NOP [R8]
!MOVQ mm5, [R8] ; [12.12 - 2.99]-Start
;dump ;   !MOVQ mm1, [R8] ; [12.27 - 3.14]
;dump ;   !MOVQ mm3, [R8] ; [12.27 - 3.14]
!MOVQ mm5, mm5 ; [12.34 - 3.21]
!MOVQ mm5, mm5 ; [12.34 - 3.21]
!MOVQ mm5, mm5 ; [12.34 - 3.21]
!MOVNTQ [R8], mm2 ; 0
!MOVNTQ [R8], mm1 ; 1
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm2 ; 0
!MOVNTQ [R8], mm1 ; 1
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set
!MOVNTQ [R8], mm3 ; 1
!MOVNTQ [R8], mm5 ; Set ; 6 [12.12 - 2.99]-End
!NOP [Rip]
!NOP [R8]
!AddFNOP_S_WasapiProc_2:
!INC Rdx
!INC Rdx
!INC Rdx
!INC Rdx ;4
!INC Rdx
!INC Rdx
!INC Rdx
!INC Rdx ;8
!INC R8
!INC R8
!INC R8
!INC R8 ;4
!INC R8
!INC R8
!INC R8
!INC R8 ;8
!DEC Rcx
!DEC Rcx
!DEC Rcx
!DEC Rcx ;4
!DEC Rcx
!DEC Rcx
!DEC Rcx
!DEC Rcx ;8
!FNOP ; for wide
!FNOP
!FNOP
!FNOP ;4
!NOP [Rip] ; [12.10 - 2.97]
!NOP [Rip] ; [12.10 - 2.97]
!JNZ WASAPI_Proc_LOOP_222
;     CopyMemory(*bufferDecode+WasapiPos, *buffer, length) ; or very light memory copy process
;     result = length : WasapiPos + result    
!WAIT ;1 [11.24 - 144]
!WAIT ; no fwait
!WAIT ; wait with FNOP
!WAIT ;4
!XCHG ch, cl
!XCHG cl, ch
The JNZ instruction is the one that causes the worst sound quality in this process. I was thinking of using AVX-512F instructions to avoid the loop. But AVX-512 support could easily become a heavy burden for compiler developers.

I am using memory access with MMX instructions. The problem with instructions that use the RAX and R8 registers is that the left/right volume balance collapses in the lowest 8 bits with a fully digital amplifier. Memory access with SSE (XMM register) and AVX (YMM register) instructions seems to change the CPU clock during the transfer, and the sound quality gets worse. AVX-512F instructions might improve the sound quality. Since this concerns only the transfer process, it would be enough to write just that part as a DLL in FASM.

How do I write an AVX-512 x64 DLL in FASM? Does anyone know anything about it?
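This is not FASM, but as a rough sketch of the shape such a transfer-only DLL could take, here is a C version with one exported function. Everything named here (`transfer_samples`, `stream_blocks`, `TRANSFER_API`) is hypothetical, and GCC or Clang on x86-64 is assumed for the AVX-512 path. `_mm512_stream_si512` is a non-temporal 64-byte store, the AVX-512 counterpart of the MOVNTQ stores in the code above, and it requires a 64-byte aligned destination.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(_WIN32)
#define TRANSFER_API __declspec(dllexport)  /* export from the DLL */
#else
#define TRANSFER_API
#endif

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_AVX512_PATH 1

/* Stream whole 64-byte blocks to dst with non-temporal stores.
   dst must be 64-byte aligned; src may be unaligned. */
__attribute__((target("avx512f")))
static void stream_blocks(void *dst, const void *src, size_t blocks)
{
    for (size_t i = 0; i < blocks; ++i) {
        __m512i v = _mm512_loadu_si512((const char *)src + 64 * i);
        _mm512_stream_si512((__m512i *)((char *)dst + 64 * i), v);
    }
    _mm_sfence();  /* make the streamed stores visible before returning */
}
#endif

/* Copies `bytes` bytes from src to dst. Returns 1 if at least one block
   went through the AVX-512 path, 0 if only memcpy was used. */
TRANSFER_API int transfer_samples(void *dst, const void *src, size_t bytes)
{
    size_t done = 0;
#ifdef HAVE_AVX512_PATH
    if (__builtin_cpu_supports("avx512f") && (uintptr_t)dst % 64 == 0) {
        size_t blocks = bytes / 64;
        stream_blocks(dst, src, blocks);
        done = blocks * 64;
    }
#endif
    /* Tail bytes, or the whole buffer when AVX-512 is unavailable. */
    memcpy((char *)dst + done, (const char *)src + done, bytes - done);
    return done != 0;
}
```

The loop here is still a loop; whether the compiler unrolls it depends on the optimisation settings, so this sketch does not by itself remove the conditional jump discussed above.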
Thorium
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: AVX-512 Instruction

Post by Thorium »

It's fairly simple to update FASM to get the new instructions.
You can just replace FASM.exe in the compilers directory.
Keya
Addict
Posts: 1891
Joined: Thu Jun 04, 2015 7:10 am

Re: AVX-512 Instruction

Post by Keya »

Another awesome thing about PB moving to a C compiler will be that optimised instructions like AVX-512 will be readily available via switches :)
Teddy Rogers
User
Posts: 92
Joined: Sun Feb 23, 2014 2:05 am
Location: Australia

Re: AVX-512 Instruction

Post by Teddy Rogers »

Keya wrote: Thu Apr 15, 2021 4:59 am PB moving to a C compiler
Is this a fact or a wish-list comment? If it is a fact, where can this information be found on PB's roadmap?

Ted.
#NULL
Addict
Posts: 1440
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Re: AVX-512 Instruction

Post by #NULL »

Teddy Rogers wrote: Sat Apr 17, 2021 6:23 am
Keya wrote: Thu Apr 15, 2021 4:59 am PB moving to a C compiler
Is this a fact or a wish-list comment? If it is a fact, where can this information be found on PB's roadmap?
discussion started here:
https://www.purebasic.fr/english/viewto ... 92#p567092
blog post:
https://www.purebasic.fr/blog/?p=480
fred's comment:
https://www.purebasic.fr/english/viewto ... 30#p567230
Teddy Rogers
User
Posts: 92
Joined: Sun Feb 23, 2014 2:05 am
Location: Australia

Re: AVX-512 Instruction

Post by Teddy Rogers »

Thank you for the info and links. Good to read that this change is coming! I see it being an advantage in the long term, as it may help to speed up development of PureBasic and allow a quicker turnaround in introducing and supporting new hardware and OS features...

Ted.