
Speed of ASM Instructions ...(comparison of different implementations)

Posted: Wed Dec 14, 2022 12:48 pm
by Axolotl
Hi folks,
I am doing some memory analysis and was wondering which implementation is best.

Code: Select all

; === Test Program -- Compare implementations === 
Structure TByteArray  ; Memory Access by *Memory\Byte[Index] 
  Byte.a[0] 
EndStructure 

Procedure Test_1(*M, Index) 
  Protected r 
  r = PeekA(*M + Index) 
  Debug "1:" + r + " | " + Index 
EndProcedure  

Procedure Test_2(*Mem.TByteArray, Index) 
  Protected r 
  r = *Mem\Byte[Index] 
  Debug "2:" + r + " | " + Index 
EndProcedure  

  *MemoryID = AllocateMemory(500)
  If *MemoryID
    Debug "Starting address of the 500 Byte memory area:"
    PokeS(*MemoryID, "ABC Store this string in the memory area", -1, #PB_Ascii) 
    Test_1(*MemoryID, 1) 
    Test_2(*MemoryID, 1) 
    FreeMemory(*MemoryID)  ; will also be done automatically at the end of program
  Else
    Debug "Couldn't allocate the requested memory!"
  EndIf
This is what the ASM output shows (I have extracted the part that is, in my opinion, the most important):

Code: Select all

  ; === Test 2 =========================================== Test 1 ===============================
  ; 
  ; Structure TByteArray  
  ;   Byte.a[0] 
  ; EndStructure 
  ; 

  ; Procedure Test_2(*Mem.TByteArray, Index)              ; Procedure Test_1(*M, Index) 
  _Procedure2:                                            _Procedure0:  
    MOV    qword [rsp+8],rcx                                MOV    qword [rsp+8],rcx  
    MOV    qword [rsp+16],rdx                               MOV    qword [rsp+16],rdx  
    PUSH   rbp                                              PUSH   r15  
    PS2=64                                                  PS0=64  
    XOR    rax,rax                                          XOR    rax,rax  
    PUSH   rax                                              PUSH   rax  
    SUB    rsp,40                                           SUB    rsp,40  
  ; Protected r                                           ; Protected r   
  ; r = *Mem\Byte[Index]                                  ; r = PeekA(*M + Index)   
    MOV    rbp,qword [rsp+PS2+0]                            MOV    r15,qword [rsp+PS0+0]  
    PUSH   rbp                                              ADD    r15,qword [rsp+PS0+8]  
    MOV    rax,qword [rsp+PS2+16]                           MOV    rax,r15  
    POP    rbp                                              PUSH   rax  
    ADD    rbp,rax                                          POP    rcx  
    MOVZX  rax,byte [rbp]                                   CALL   PB_PeekA  
    PUSH   rax                             
    POP    rax                             
    MOV    qword [rsp+40],rax                               MOV    qword [rsp+40],rax  
  ;                                                       ;   
  ; Debug "2:" + r + " | " + Index                        ; Debug "1:" + r + " | " + Index   
  ; EndProcedure                                          ; EndProcedure    

I would be interested in your opinions on this as ASM experts.
TIA.
Happy coding and stay healthy.

Re: Speed of ASM Instructions ...(comparison of different implementations)

Posted: Wed Dec 14, 2022 1:16 pm
by STARGÅTE
What is the question?
What is faster/better, Test_1 vs. Test_2?
Definitely Test_2 with the byte array: it has no additional CALL instruction for PB_PeekA (with the jumps that come with it), and for a fair comparison you would have to include the ASM output of PB_PeekA itself as well.
However, as you can see, the PB ASM output is not optimized; it contains pointless instruction pairs like:

Code: Select all

    PUSH   rax                             
    POP    rax 
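
If you want to check this on your own machine rather than read it off the listing, a rough timing loop is enough. The following is only a sketch: the constants `#Size` and `#Rounds` and all variable names are made up for illustration, and it assumes you compile with the debugger disabled (otherwise the timings say nothing about the generated code).

```purebasic
; Hypothetical micro-benchmark: PeekA() versus structured byte-array access.
; Assumption: compiled WITHOUT the debugger, otherwise the numbers are meaningless.
Structure TByteArray
  Byte.a[0]
EndStructure

#Size   = 1000000   ; buffer size in bytes (arbitrary choice)
#Rounds = 200       ; number of passes over the buffer (arbitrary choice)

Define i, n, sum1.q, sum2.q, t1.q, t2.q
*Mem.TByteArray = AllocateMemory(#Size)

If *Mem
  FillMemory(*Mem, #Size, 1)   ; every byte = 1, so both sums must come out equal

  t1 = ElapsedMilliseconds()
  For n = 1 To #Rounds
    For i = 0 To #Size - 1
      sum1 + PeekA(*Mem + i)   ; read through the PeekA() library function
    Next
  Next
  t1 = ElapsedMilliseconds() - t1

  t2 = ElapsedMilliseconds()
  For n = 1 To #Rounds
    For i = 0 To #Size - 1
      sum2 + *Mem\Byte[i]      ; read through the structure array
    Next
  Next
  t2 = ElapsedMilliseconds() - t2

  ; The sums are shown so the reads cannot be optimized away and can be cross-checked.
  MessageRequester("Result", "PeekA: " + Str(t1) + " ms (sum " + Str(sum1) + ")" + #LF$ + "Array: " + Str(t2) + " ms (sum " + Str(sum2) + ")")
  FreeMemory(*Mem)
EndIf
```

The absolute numbers will depend on the backend and optimizer settings, so treat them as a comparison on one machine, not as a general result.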

Re: Speed of ASM Instructions ...(comparison of different implementations)

Posted: Wed Dec 14, 2022 1:50 pm
by Axolotl
Yes, you have extracted my question correctly.
Sorry for writing so imprecisely.

I also suspected that the array could be faster.
Now I know.
Thanks for your quick answer.

BTW: normally I don't dive that deep into assembler code, but if you can... :oops:

Re: Speed of ASM Instructions ...(comparison of different implementations)

Posted: Wed Dec 14, 2022 11:59 pm
by idle
In general, if you want to improve performance, you just need to minimize branching and memory fetches.
Also, While loops are faster than For loops in PB, and if you're processing strings, pass the string in by reference.

This will compile to near-optimal code with the C backend:

Code: Select all

Structure TByteArray  ; Memory Access by *Memory\Byte[Index] 
  Byte.a[0] 
EndStructure 

*MemoryID.TByteArray = Ascii("ABC Store this string in the memory area") 
ct = 0 
char.a 

While *MemoryID\Byte[ct] 
  char = *MemoryID\Byte[ct] 
  Debug Chr(char) 
  ct+1 
Wend 
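
The loop above reads each byte twice at source level (once in the While condition, once in the assignment). If you want the "fewer memory fetches" idea spelled out explicitly, one possible variant loads each byte once and moves a pointer instead of indexing. This is only a sketch: the names `*Buffer`, `*Cursor` and `count` are made up for illustration, and whether it is measurably faster is an assumption, since an optimizing C backend may already merge the two reads.

```purebasic
; Sketch only: walk a zero-terminated ASCII buffer with a moving pointer,
; loading each byte exactly once per iteration.
Define Text.s = "ABC Store this string in the memory area"
Define char.a, count

*Buffer = Ascii(Text)          ; buffer allocated by Ascii(), must be freed with FreeMemory()
*Cursor.Ascii = *Buffer        ; Ascii is PureBasic's built-in structure with the field \a

If *Buffer
  char = *Cursor\a             ; first load
  While char
    count + 1                  ; "process" the byte (here: just count it)
    *Cursor + 1                ; advance the pointer by one byte
    char = *Cursor\a           ; single load for the next iteration
  Wend

  Debug count                  ; number of bytes before the terminating zero
  Debug Len(Text)              ; should match for a pure ASCII string
  FreeMemory(*Buffer)
EndIf
```

Keeping `*Buffer` separate from `*Cursor` matters here: the cursor no longer points to the start of the allocation after the loop, so it cannot be passed to FreeMemory().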


Re: Speed of ASM Instructions ...(comparison of different implementations)

Posted: Thu Dec 15, 2022 12:39 am
by mk-soft

Code: Select all


Structure ArrayOfAscII
  Char.a[0]
EndStructure

Structure ArrayOfUnicode
  Char.u[0]
EndStructure

Define Text.s
Define *pText.ArrayOfUnicode, *pAscII.ArrayOfAscII
Define index, char

Text = "ABC Store this string in the memory area"

*pText = @Text
*pAscII = Ascii(Text) 

Debug "*** Unicode ***"
index = 0
While *pText\Char[index] 
  char = *pText\Char[index] 
  Debug RSet(Hex(char, #PB_Unicode), 4, "0") + ": " + Chr(char) 
  index + 1
Wend 

Debug "*** AscII ***"
index = 0
While *pAscII\Char[index] 
  char = *pAscII\Char[index] 
  Debug RSet(Hex(char, #PB_Ascii), 2, "0") + ": " + Chr(char) 
  index + 1
Wend

FreeMemory(*pAscII)

Re: Speed of ASM Instructions ...(comparison of different implementations)

Posted: Thu Dec 15, 2022 12:56 am
by mk-soft

Code: Select all


Structure ArrayOfAscII
  Char.a[0]
EndStructure

Structure ArrayOfUnicode
  Char.u[0]
EndStructure

Define Text.s
Define *pText.ArrayOfUnicode, *pAscII.ArrayOfAscII
Define index, char

Text = "ABC Store this string in the memory area"

*pText = @Text
*pAscII = Ascii(Text) 

Debug "*** Unicode ***"
index = 0
While *pText\Char[index] 
  char = *pText\Char[index] 
  Debug RSet(Hex(char, #PB_Unicode), 4, "0") + ": " + Chr(char) 
  index + 1
Wend 

Debug "*** AscII ***"
index = 0
While *pAscII\Char[index] 
  char = *pAscII\Char[index] 
  Debug RSet(Hex(char, #PB_Ascii), 2, "0") + ": " + Chr(char) 
  index + 1
Wend

FreeMemory(*pAscII)

Debug "*** String Parameter ByRef / Always needs a variable 'strVal' of type String that holds the pointer to the string ***"

Procedure Upper(*string.string)
  *string\s = "+++ " + UCase(*string\s) + " +++" 
  ProcedureReturn Len(*string\s)
EndProcedure

Define len, strVal.String

strVal\s = "hello world"
Debug "Addr to string: " + @strVal\s
len = Upper(strVal)
Debug "Len = " + len + " / " + strVal\s
Debug "Addr to string: " + @strVal\s

Re: Speed of ASM Instructions ...(comparison of different implementations)

Posted: Thu Dec 15, 2022 11:24 am
by juergenkulow

Code: Select all

; movzx eax, byte [rdx+rax] 
Structure TByteArray  
  Byte.a[0] 
EndStructure 

*MemoryID.TByteArray=Ascii("ABC Store this string in the memory area")
! asm("nop"); 
Index=1
r1=PeekA(*MemoryID + Index)
r2=*MemoryID\Byte[Index]
! asm("nop"); 
Debug r1
Debug r2
CompilerIf #PB_Compiler_Backend<>#PB_Backend_C Or #PB_Compiler_Optimizer=0
  CompilerError "Please switch Compiler to optimized C Backend."
CompilerEndIf 
; 66
; 66

; 0000000140001086 | 48:C705 2F330000 0100000 | mov qword ptr ds:[1400043C0],1                                |
; 0000000140001091 | 48:8B0D 28330000         | mov rcx,qword ptr ds:[1400043C0]                              |
; 0000000140001098 | 48:030D 29330000         | add rcx,qword ptr ds:[1400043C8]                              |
; 000000014000109F | E8 9C040000              | call memtest2.140001540                                       |
; 00000001400010A4 | 48:8905 2D330000         | mov qword ptr ds:[1400043D8],rax                              |
; 00000001400010AB | 48:8B05 0E330000         | mov rax,qword ptr ds:[1400043C0]                              |
; 00000001400010B2 | 48:8B15 0F330000         | mov rdx,qword ptr ds:[1400043C8]                              | rdx:EntryPoint
; 00000001400010B9 | 0FB60402                 | movzx eax,byte ptr ds:[rdx+rax]                               |
; 00000001400010BD | 48:8905 0C330000         | mov qword ptr ds:[1400043D0],rax                              |

Code: Select all

; Test(*MemoryID, Index)
*MemoryID=Ascii("ABC Store this string in the memory area")
Index=1
Define r.q
EnableASM 
mov rdx,[v_Index]
mov rax,[p_MemoryID]
movzx eax, byte [rdx+rax]
mov [v_r],rax
DisableASM
Debug r

CompilerIf #PB_Compiler_Backend<>#PB_Backend_Asm 
  CompilerError "Please switch Compiler to ASM Backend."
CompilerEndIf