Speed of ASM Instructions ...(comparison of different implementations)

Post by Axolotl »

Hi folks,
I am doing some memory analysis and was wondering which implementation is best.

Code: Select all

; === Test Program -- Compare implementations === 
Structure TByteArray  ; Memory Access by *Memory\Byte[Index] 
  Byte.a[0] 
EndStructure 

Procedure Test_1(*M, Index) 
  Protected r 
  r = PeekA(*M + Index) 
  Debug "1:" + r + " | " + Index 
EndProcedure  

Procedure Test_2(*Mem.TByteArray, Index) 
  Protected r 
  r = *Mem\Byte[Index] 
  Debug "2:" + r + " | " + Index 
EndProcedure  

  *MemoryID = AllocateMemory(500)
  If *MemoryID
    Debug "Starting address of the 500 Byte memory area:"
    PokeS(*MemoryID, "ABC Store this string in the memory area", -1, #PB_Ascii) 
    Test_1(*MemoryID, 1) 
    Test_2(*MemoryID, 1) 
    FreeMemory(*MemoryID)  ; will also be done automatically at the end of program
  Else
    Debug "Couldn't allocate the requested memory!"
  EndIf
This is what the ASM output shows (I have extracted the part that is, in my opinion, the most important):

Code: Select all

  ; === Test 2 =========================================== Test 1 ===============================
  ; 
  ; Structure TByteArray  
  ;   Byte.a[0] 
  ; EndStructure 
  ; 

  ; Procedure Test_2(*Mem.TByteArray, Index)              ; Procedure Test_1(*M, Index) 
  _Procedure2:                                            _Procedure0:  
    MOV    qword [rsp+8],rcx                                MOV    qword [rsp+8],rcx  
    MOV    qword [rsp+16],rdx                               MOV    qword [rsp+16],rdx  
    PUSH   rbp                                              PUSH   r15  
    PS2=64                                                  PS0=64  
    XOR    rax,rax                                          XOR    rax,rax  
    PUSH   rax                                              PUSH   rax  
    SUB    rsp,40                                           SUB    rsp,40  
  ; Protected r                                           ; Protected r   
  ; r = *Mem\Byte[Index]                                  ; r = PeekA(*M + Index)   
    MOV    rbp,qword [rsp+PS2+0]                            MOV    r15,qword [rsp+PS0+0]  
    PUSH   rbp                                              ADD    r15,qword [rsp+PS0+8]  
    MOV    rax,qword [rsp+PS2+16]                           MOV    rax,r15  
    POP    rbp                                              PUSH   rax  
    ADD    rbp,rax                                          POP    rcx  
    MOVZX  rax,byte [rbp]                                   CALL   PB_PeekA  
    PUSH   rax                             
    POP    rax                             
    MOV    qword [rsp+40],rax                               MOV    qword [rsp+40],rax  
  ;                                                       ;   
  ; Debug "2:" + r + " | " + Index                        ; Debug "1:" + r + " | " + Index   
  ; EndProcedure                                          ; EndProcedure    

I would be interested in your opinion on this as ASM experts.
TIA.
Happy coding and stay healthy.

Post by STARGÅTE »

What is the question?
What is faster/better, Test_1 vs. Test_2?
Definitely Test_2 with the byte array, because it has no additional CALL to PB_PeekA (with the jumps that involves). For a fair comparison you would also have to include the ASM output of PB_PeekA itself.
However, as you can see, the PB ASM output is not optimized; it contains pointless instruction pairs like:

Code: Select all

    PUSH   rax                             
    POP    rax 
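
To check the difference on a given machine, the two access styles can be timed directly. The following is only a rough sketch: the buffer size, the pass count and all variable names are assumptions made for illustration, and the numbers will vary with backend, optimizer setting and CPU. It has to be compiled with the debugger disabled, otherwise the timings say little.

```purebasic
; Hypothetical micro-benchmark: PeekA() versus structure byte-array access.
; #BufferSize and #Passes are assumed values, adjust them as needed.
EnableExplicit

CompilerIf #PB_Compiler_Debugger
  CompilerError "Please disable the debugger before running this benchmark."
CompilerEndIf

Structure TByteArray
  Byte.a[0]
EndStructure

#BufferSize = 1000000
#Passes     = 200

Define *Mem.TByteArray = AllocateMemory(#BufferSize)
Define i, pass, sum1.q, sum2.q, t1.q, t2.q

If *Mem
  FillMemory(*Mem, #BufferSize, 1)          ; every byte = 1, so both sums must match

  ; Variant 1: library call per byte
  t1 = ElapsedMilliseconds()
  For pass = 1 To #Passes
    For i = 0 To #BufferSize - 1
      sum1 + PeekA(*Mem + i)
    Next
  Next
  t1 = ElapsedMilliseconds() - t1

  ; Variant 2: direct access through the structure array
  t2 = ElapsedMilliseconds()
  For pass = 1 To #Passes
    For i = 0 To #BufferSize - 1
      sum2 + *Mem\Byte[i]
    Next
  Next
  t2 = ElapsedMilliseconds() - t2

  ; The sums are shown so the loops cannot be optimized away
  MessageRequester("Result", "PeekA: " + Str(t1) + " ms (sum " + Str(sum1) + ")" + #LF$ + "Array: " + Str(t2) + " ms (sum " + Str(sum2) + ")")
  FreeMemory(*Mem)
EndIf
```

Both sums should be equal (buffer size times pass count); if they are not, the comparison is not measuring the same work.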

Post by Axolotl »

Yes, you have extracted my question correctly.
Sorry for writing so imprecisely.

I also suspected that the array could be faster.
Now I know.
Thanks for your quick answer.

BTW: normally I don't dive that deep into assembler code, but if you can... :oops:

Post by idle »

In general, if you want to improve performance, you just need to minimize branching and memory fetches.
Also, While loops are faster than For loops in PB, and if you're processing strings, pass the string in by reference.

This will compile to near-optimal code with the C backend:

Code: Select all

Structure TByteArray  ; Memory Access by *Memory\Byte[Index] 
  Byte.a[0] 
EndStructure 

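*MemoryID.TByteArray = Ascii("ABC Store this string in the memory area") ; Ascii() allocates a zero-terminated buffer
ct = 0 
char.a 

While *MemoryID\Byte[ct]        ; stop at the terminating zero byte
  char = *MemoryID\Byte[ct] 
  Debug Chr(char) 
  ct + 1 
Wend 

FreeMemory(*MemoryID)           ; release the buffer returned by Ascii()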
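
As an illustration of the "fewer memory fetches" advice, here is a variation that walks the buffer with a moving pointer and reads each byte only once per iteration. It is only a sketch: the names *Buffer and *p are made up for this example, and whether it is measurably faster than the indexed loop is an open question, since an optimizing C backend may well generate the same code for both.

```purebasic
; Sketch: walk a zero-terminated ASCII buffer with a moving pointer.
; *Buffer and *p are hypothetical names chosen for this example.
Define *Buffer = Ascii("ABC Store this string in the memory area")
Define *p.Ascii = *Buffer       ; Ascii is the built-in structure with the single field 'a'
Define char.a

If *Buffer
  char = *p\a                   ; first byte
  While char                    ; stop at the terminating zero byte
    Debug Chr(char)
    *p + 1                      ; advance by one byte
    char = *p\a                 ; one memory read per iteration
  Wend
  FreeMemory(*Buffer)           ; free the original address, not the moved pointer
EndIf
```

The original pointer is kept in *Buffer because FreeMemory() needs the address that Ascii() returned, not the advanced one.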

Post by mk-soft »

Code: Select all


Structure ArrayOfAscII
  Char.a[0]
EndStructure

Structure ArrayOfUnicode
  Char.u[0]
EndStructure

Define Text.s
Define *pText.ArrayOfUnicode, *pAscII.ArrayOfAscII
Define index, char

Text = "ABC Store this string in the memory area"

*pText = @Text
*pAscII = Ascii(Text) 

Debug "*** Unicode ***"
index = 0
While *pText\Char[index] 
  char = *pText\Char[index] 
  Debug RSet(Hex(char, #PB_Unicode), 4, "0") + ": " + Chr(char) 
  index + 1
Wend 

Debug "*** AscII ***"
index = 0
While *pAscII\Char[index] 
  char = *pAscII\Char[index] ; read from the ASCII buffer, not the Unicode string
  Debug RSet(Hex(char, #PB_Ascii), 2, "0") + ": " + Chr(char) 
  index + 1
Wend

FreeMemory(*pAscII)

Debug "*** String Parameter ByRef / Need always a variable 'strVal' of type string where store pointer to string ***"

Procedure Upper(*string.string)
  *string\s = "+++ " + UCase(*string\s) + " +++" 
  ProcedureReturn Len(*string\s)
EndProcedure

Define len, strVal.String

strVal\s = "hello world"
Debug "Addr to string: " + @strVal\s
len = Upper(strVal)
Debug "Len = " + len + " / " + strVal\s
Debug "Addr to string: " + @strVal\s

Post by juergenkulow »

Code: Select all

; movzx eax, byte [rdx+rax] 
Structure TByteArray  
  Byte.a[0] 
EndStructure 

*MemoryID.TByteArray=Ascii("ABC Store this string in the memory area")
! asm("nop"); 
Index=1
r1=PeekA(*MemoryID + Index)
r2=*MemoryID\Byte[Index]
! asm("nop"); 
Debug r1
Debug r2
CompilerIf #PB_Compiler_Backend<>#PB_Backend_C Or #PB_Compiler_Optimizer=0
  CompilerError "Please switch the compiler to the optimized C backend."
CompilerEndIf 
; 66
; 66

; 0000000140001086 | 48:C705 2F330000 0100000 | mov qword ptr ds:[1400043C0],1                                |
; 0000000140001091 | 48:8B0D 28330000         | mov rcx,qword ptr ds:[1400043C0]                              |
; 0000000140001098 | 48:030D 29330000         | add rcx,qword ptr ds:[1400043C8]                              |
; 000000014000109F | E8 9C040000              | call memtest2.140001540                                       |
; 00000001400010A4 | 48:8905 2D330000         | mov qword ptr ds:[1400043D8],rax                              |
; 00000001400010AB | 48:8B05 0E330000         | mov rax,qword ptr ds:[1400043C0]                              |
; 00000001400010B2 | 48:8B15 0F330000         | mov rdx,qword ptr ds:[1400043C8]                              | rdx:EntryPoint
; 00000001400010B9 | 0FB60402                 | movzx eax,byte ptr ds:[rdx+rax]                               |
; 00000001400010BD | 48:8905 0C330000         | mov qword ptr ds:[1400043D0],rax                              |

Code: Select all

; Test(*MemoryID, Index)
*MemoryID=Ascii("ABC Store this string in the memory area")
Index=1
Define r.q
EnableASM 
mov rdx,[v_Index]
mov rax,[p_MemoryID]
movzx eax, byte [rdx+rax]
mov [v_r],rax
DisableASM
Debug r

CompilerIf #PB_Compiler_Backend<>#PB_Backend_Asm 
  CompilerError "Please switch the compiler to the ASM backend."
CompilerEndIf 