Faster Mid() with asm

Share your advanced PureBasic knowledge/code with the community.
technicorn
Enthusiast
Enthusiast
Posts: 105
Joined: Wed Jan 18, 2006 7:40 pm
Location: Hamburg

Faster Mid() with asm

Post by technicorn »

Here's a replacement routine for Mid() that is faster than the build in function:

Code: Select all

! macro EndMacro {}

; Assembler macro to align code on a given boundary,
; aligning loop starts on 8 byte boundaries can speed up the code
; by about 1.5 to 2 times!

! macro calignjmp value
! {
!   local dest
!
!   if ((value - 1) - ((($ - $$) + value - 1) mod value)) > 3
!     makeDest equ 1
!     jmp dest
!   else
!     makeDest equ 0
!   end if
!
!   rept value
!   \{
!     if ($ - $$) mod value
!       nop
!     end if
!   \}
!   dest:
! } EndMacro

; Uncommend the next 3 lines, if you want to replace
; Mid with FMid in all places:
; Macro Mid(string, start, length)
;   FMid(@string,start,length)
; EndMacro

CompilerIf #PB_Compiler_Unicode
Procedure.s FMid(*srcPtr.Word, start.l, length.l)
  !MOV   edx,[p.p_srcPtr]
  !MOV   ecx,[p.v_length]
  !TEST  edx,edx
  !JZ    l_fmid_empty
  !TEST  ecx,ecx
  !JLE   l_fmid_empty
  !MOV   ecx,[p.v_start]
  !CMP   ecx,1
  !JLE   l_fmid_return
  !XOR   eax,eax
  !DEC   ecx
  !calignjmp 4
  fmid_scan:
  !CMP   [edx],ax
  !JE    l_fmid_scanend
  !ADD   edx,2
  !DEC   ecx
  !JNZ    l_fmid_scan
  fmid_scanend:
  !MOV   [p.p_srcPtr],edx
  fmid_return:
  ProcedureReturn PeekS(*srcPtr, length)
  fmid_empty:
  ProcedureReturn ""
EndProcedure
CompilerElse
Procedure.s FMid(*srcPtr.Byte, start.l, length.l)
  !MOV   edx,[p.p_srcPtr]
  !MOV   ecx,[p.v_length]
  !TEST  edx,edx
  !JZ    l_fmid_empty
  !TEST  ecx,ecx
  !JLE   l_fmid_empty
  !MOV   ecx,[p.v_start]
  !CMP   ecx,1
  !JLE   l_fmid_return
  !XOR   eax,eax
  !DEC   ecx
  !calignjmp 4
  fmid_scan:
  !CMP   [edx],al
  !JE    l_fmid_scanend
  !INC   edx
  !DEC   ecx
  !JNZ    l_fmid_scan
  fmid_scanend:
  !MOV   [p.p_srcPtr],edx
  fmid_return:
  ProcedureReturn PeekS(*srcPtr, length)
  fmid_empty:
  ProcedureReturn ""
EndProcedure
CompilerEndIf
And a program for speed testing:

Code: Select all

EnableExplicit
;IncludePath #PBIncludePath
XIncludeFile "FMid.pbi"

OpenConsole()

Global tStart.l, tStart2.l, tDiff.d
Global str1.s, str1Pos.l, str1Len.l = 16384, str1Mid.s
Global i.l, iEnd.l, strCh.l

strCh = 1
For i = 1 To str1Len
  str1 + Chr(strCh)
  strCh + 1
  If strCh > 255: strCh = 1: EndIf
Next i

iEnd = 10000000
str1Pos = 1

While str1Pos <= str1Len
  tStart2 = ElapsedMilliseconds(): Repeat: tStart = ElapsedMilliseconds(): Until tStart <> tStart2
  For i = 1 To iEnd
    str1Mid = Mid(str1, str1Pos, 1)
  Next i
  tDiff = (ElapsedMilliseconds() - tStart) / 1000.0
  If tDiff >= 1: iEnd / tDiff: EndIf
  PrintN(Right("      " + Str(str1Pos), 7) + ": " + StrD(tDiff, 3) + " sec., Ops per sec.: " + StrD(iEnd / tDiff, 3) + "    " + Str(Asc(str1Mid)))
  str1Pos + str1Pos
Wend
PrintN("")

iEnd = 1000000
str1Pos = 1
While str1Pos <= str1Len
  tStart2 = ElapsedMilliseconds(): Repeat: tStart = ElapsedMilliseconds(): Until tStart <> tStart2
  For i = 1 To iEnd
    str1Mid = FMid(@str1, str1Pos, 1)
  Next i
  tDiff = (ElapsedMilliseconds() - tStart) / 1000.0
  If str1Mid <> Mid(str1, str1Pos, 1): PrintN("Error"): EndIf
  If tDiff >= 1: iEnd / tDiff: EndIf
  PrintN(Right("      " + Str(str1Pos), 7) + ": " + StrD(tDiff, 3) + " sec., Ops per sec.: " + StrD(iEnd / tDiff, 3) + "    " + Str(Asc(str1Mid)))
  str1Pos + str1Pos
Wend
PrintN("")

PrintN(#CRLF$ + "Done")
While Inkey() = "": Delay(100): Wend
The PB Mid() is especial slow on extracting parts at the end of long strings,
but FMid() is even faster on extracting from the start of a string,
for longer string the speed gain is between 2-3 times for positions above
64 for ascii strings, and about 1.5 - 2 for unicode strings.
thamarok
Enthusiast
Enthusiast
Posts: 282
Joined: Wed Sep 06, 2006 1:37 pm

Post by thamarok »

Respect!

This is very useful for me. May I use parts of your sourcecode for AromaBasic? AromaBasic should work like PureBasic, first preprocess the source to ASM and then compile to EXE through FASM. And as this shows a faster Mid() in action, I thought it would be very useful for me since I have not had the time to implement my own.

Thank you!
Xombie
Addict
Addict
Posts: 898
Joined: Thu Jul 01, 2004 2:51 am
Location: Tacoma, WA
Contact:

Post by Xombie »

I'm really excited to try this one out but I'm getting an error...

Code: Select all

PureBasic.asm [108]:
   \{
error:  extra characters on line.
With or without InlineASM enabled. Help! ^_^
thamarok
Enthusiast
Enthusiast
Posts: 282
Joined: Wed Sep 06, 2006 1:37 pm

Post by thamarok »

Xombie wrote:I'm really excited to try this one out but I'm getting an error...

Code: Select all

PureBasic.asm [108]:
   \{
error:  extra characters on line.
With or without InlineASM enabled. Help! ^_^
I was so aborbed in the subject of this topic that I forgot to test the code :lol:
But yes, I seem to have the same error...
mskuma
Enthusiast
Enthusiast
Posts: 573
Joined: Sat Dec 03, 2005 1:31 am
Location: Australia

Post by mskuma »

Thanks alot, but I get the same error as Xombie (except mine's on 82 instead of 108).
technicorn
Enthusiast
Enthusiast
Posts: 105
Joined: Wed Jan 18, 2006 7:40 pm
Location: Hamburg

Post by technicorn »

That seems to be a problem with the FAsm.exe version,
I've installed V1.64.

Please donload the newer version from:
Flat assembler home

Before replacing the Fasm please make a backup of the original on,
although I use it all the time and never had any trouble with the newer
version.

And the source and any part of it is freeware, so have fun with it :D

Especial the calignjmp macro can be handy, unfortunately, it's of no use
before a basic statement like For, thats because of the way the assem code is generated by PB.
But you could use in with /COMMENTED, although reassembling didn't seem to work right now,
tried it and got some unresolved externals :?

I've tested the code on a AMD Athlon XP under Windows XP home.
technicorn
Enthusiast
Enthusiast
Posts: 105
Joined: Wed Jan 18, 2006 7:40 pm
Location: Hamburg

Post by technicorn »

Did some loop unrolling, no its about 4-5 times faster than PB Mid(),
and the unicode version is nearly as fast as the ascii one:

Code: Select all

! macro EndMacro {} 

; Assembler macro to align code on a given boundary, 
; aligning loop starts on 8 byte boundaries can speed up the code 
; by about 1.5 to 2 times! 

! macro calignjmp value 
! { 
!   local dest 
! 
!   if ((value - 1) - ((($ - $$) + value - 1) mod value)) > 3 
!     makeDest equ 1 
!     jmp dest 
!   else 
!     makeDest equ 0 
!   end if 
! 
!   rept value 
!   \{ 
!     if ($ - $$) mod value 
!       nop 
!     end if 
!   \} 
!   dest: 
! } EndMacro 

; Uncommend the next 3 lines, if you want to replace 
; Mid with FMid in all places: 
; Macro Mid(string, start, length) 
;   FMid2(@string,start,length) 
; EndMacro 

CompilerIf #PB_Compiler_Unicode
Procedure.s FMid2(*srcPtr.Byte, start.l, length.l) 
  !MOV    edx,[p.p_srcPtr] 
  !MOV    ecx,[p.v_length] 
  !TEST   edx,edx 
  !JZ     l_fmid2_empty 
  !TEST   ecx,ecx 
  !JLE    l_fmid2_empty 
  !MOV    ecx,[p.v_start] 
  !CMP    ecx,1 
  !JLE    l_fmid2_return 
  !XOR    eax,eax 
  !DEC    ecx 
  !calignjmp 4 
  fmid2_scan1:
  !SUB    ecx,8
  !JL     l_fmid2_scanend1
  !CMP    [edx],ax
  !JE     l_fmid2_scanend2
  !ADD    edx,2
  !CMP    [edx],ax 
  !JE     l_fmid2_scanend2
  !ADD    edx,2
  !CMP    [edx],ax 
  !JE     l_fmid2_scanend2
  !ADD    edx,2
  !CMP    [edx],ax 
  !JE     l_fmid2_scanend2
  !ADD    edx,2
  !CMP    [edx],ax 
  !JE     l_fmid2_scanend2
  !ADD    edx,2
  !CMP    [edx],ax 
  !JE     l_fmid2_scanend2
  !ADD    edx,2
  !CMP    [edx],ax 
  !JE     l_fmid2_scanend2
  !ADD    edx,2
  !CMP    [edx],ax 
  !JE     l_fmid2_scanend2
  !ADD    edx,2
  !JMP    l_fmid2_scan1

  fmid2_scanend1:
  !ADD    ecx,8
  !JZ     l_fmid2_scanend2
  fmid2_scan2:
  !CMP    [edx],ax
  !JE     l_fmid2_scanend2
  !ADD    edx,2
  !DEC    ecx
  !JNZ    l_fmid2_scan2
  fmid2_scanend2: 
  !MOV   [p.p_srcPtr],edx 
  fmid2_return: 
  ProcedureReturn PeekS(*srcPtr, length) 
  fmid2_empty: 
  ProcedureReturn "" 
EndProcedure 
CompilerElse
Procedure.s FMid2(*srcPtr.Byte, start.l, length.l) 
  !MOV    edx,[p.p_srcPtr] 
  !MOV    ecx,[p.v_length] 
  !TEST   edx,edx 
  !JZ     l_fmid2_empty 
  !TEST   ecx,ecx 
  !JLE    l_fmid2_empty 
  !MOV    ecx,[p.v_start] 
  !CMP    ecx,1 
  !JLE    l_fmid2_return 
  !XOR    eax,eax 
  !DEC    ecx 
  !calignjmp 4 
  fmid2_scan1:
  !SUB    ecx,8
  !JL     l_fmid2_scanend1
  !CMP    [edx],al 
  !JE     l_fmid2_scanend2
  !INC    edx 
  !CMP    [edx],al 
  !JE     l_fmid2_scanend2
  !INC    edx 
  !CMP    [edx],al 
  !JE     l_fmid2_scanend2
  !INC    edx 
  !CMP    [edx],al 
  !JE     l_fmid2_scanend2
  !INC    edx 
  !CMP    [edx],al 
  !JE     l_fmid2_scanend2
  !INC    edx 
  !CMP    [edx],al 
  !JE     l_fmid2_scanend2
  !INC    edx 
  !CMP    [edx],al 
  !JE     l_fmid2_scanend2
  !INC    edx 
  !CMP    [edx],al 
  !JE     l_fmid2_scanend2
  !INC    edx 
  !JMP    l_fmid2_scan1

  fmid2_scanend1:
  !ADD    ecx,8
  !JZ     l_fmid2_scanend2
  fmid2_scan2:
  !CMP    [edx],al 
  !JE     l_fmid2_scanend2
  !INC    edx 
  !DEC    ecx 
  !JNZ    l_fmid2_scan2
  fmid2_scanend2: 
  !MOV   [p.p_srcPtr],edx 
  fmid2_return: 
  ProcedureReturn PeekS(*srcPtr, length) 
  fmid2_empty: 
  ProcedureReturn "" 
EndProcedure 
CompilerEndIf
And a test program to extract random start and length strings, also
with negatice values, to see that it works and don't crashes:

Code: Select all

EnableExplicit
;IncludePath #PBIncludePath
XIncludeFile "FMid_DL.pbi"

OpenConsole()

Global tStart.l, tStart2.l, tDiff.d
Global str1.s, str1Pos.l, str1Len.l = 4096, str1Mid.s, str1FMid.s
Global i.l, iEnd.l, strCh.l
Global start.l, length.l

strCh = 1
For i = 1 To str1Len
  str1 + Chr(strCh)
  strCh + 1
  If strCh > 255: strCh = 1: EndIf
Next i

iEnd = 10000
For i = 1 To iEnd
  start = Random(10000)-5000
  length = Random(10000)-5000
  str1Mid = Mid(str1, start, length)
  str1FMid = FMid(@str1, start, length)
  If str1Mid <> str1FMid
    PrintN("Error FMid: " + Str(start) + ", " + Str(length))
    CallDebugger
  EndIf
Next i

iEnd = 10000
For i = 1 To iEnd
  start = Random(10000)-5000
  length = Random(10000)-5000
  str1Mid = Mid(str1, start, length)
  str1FMid = FMid2(@str1, start, length)
  If str1Mid <> str1FMid
    PrintN("Error FMid2: " + Str(start) + ", " + Str(length))
    CallDebugger
  EndIf
Next i

PrintN(#CRLF$ + "Done")
While Inkey() = "": Delay(100): Wend
Because the procedure is only included once, it's worth the memory I think.
Xombie
Addict
Addict
Posts: 898
Joined: Thu Jul 01, 2004 2:51 am
Location: Tacoma, WA
Contact:

Post by Xombie »

Great! I can't wait to test this out and use it :)

What do you use for reference/study to learn ASM? I'm very impressed and have always wanted to learn more ASM.
technicorn
Enthusiast
Enthusiast
Posts: 105
Joined: Wed Jan 18, 2006 7:40 pm
Location: Hamburg

Post by technicorn »

Hi Xombie,

learning assembler is quite easy today I think.
For example using PB with inline asm enabled, you can step through all instructions
and watch the register values and program flow.
Beside that have a look at http://www.agner.org/optimize/

And you will need the Intel and/or AMD documentation for the CPU,
and don't get intimidated by the number of pages of thies manuals,
togehter it's about 3000!
But you will mainly need the section about application programming.
Also the optimization docs by Intel and AMD are worth reading.


To all using FMid, please change the "calignjmp 4" -> "calignjmp 8",
uploaded a test version by mistake.
SoulReaper
Enthusiast
Enthusiast
Posts: 372
Joined: Sun Apr 03, 2005 2:14 am
Location: England

Post by SoulReaper »

Nice to see Assembler :)

Reminds me of the Z80/6809E/68000 Assembler days Respect :!: :roll: :wink:

Regards
Kevin :) :wink:
Amundo
Enthusiast
Enthusiast
Posts: 200
Joined: Thu Feb 16, 2006 1:41 am
Location: New Zealand

Post by Amundo »

Reminds me of the Z80/6809E/68000 Assembler days
Ahh, the 6809....there never was a better hybrid of 8 and 16-bit microprocessor technology....there'll never be a better instruction set...
Win10, PB6.x, okayish CPU, onboard video card, fuzzy monitor (or is that my eyesight?)
"When the facts change, I change my mind" - John Maynard Keynes
User avatar
Joakim Christiansen
Addict
Addict
Posts: 2452
Joined: Wed Dec 22, 2004 4:12 pm
Location: Norway
Contact:

Post by Joakim Christiansen »

Hey Fred, we wont allow this! (people making faster functions than the inbuilt) :P
I like logic, hence I dislike humans but love computers.
dracflamloc
Addict
Addict
Posts: 1648
Joined: Mon Sep 20, 2004 3:52 pm
Contact:

Post by dracflamloc »

On the contrary, its great to speed up processing of DracScript!

Now if someone can convert standard pb functions and parameters which internally use this fast method it'd make life and integration much better!
JCV
Enthusiast
Enthusiast
Posts: 580
Joined: Fri Jun 30, 2006 4:30 pm
Location: Philippines

Post by JCV »

nice! :D thx
technicorn
Enthusiast
Enthusiast
Posts: 105
Joined: Wed Jan 18, 2006 7:40 pm
Location: Hamburg

Post by technicorn »

@dracflamloc:

There are three lines in the sourcecode you can use to make PB use FMid as if it is a normal Mid in all places:

Code: Select all

; Uncommend the next 3 lines, if you want to replace 
; Mid with FMid in all places: 
; Macro Mid(string, start, length) 
;   FMid2(@string,start,length) 
; EndMacro 
so you don't have to change your code to pass string pointers to FMid.
Post Reply