Fast string building

Just starting out? Need help? Post your questions and find answers here.
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Fast string building

Post by MachineCode »

There used to be a tip in these forums on how to build a large string really fast, much faster than PureBasic's native way. But I can't find it now. Anyone know where it is?

The reason is because my app builds an array of filenames (around 60000) of them, and then I want to convert that array into a single string, with a #CRLF$ at the end of each filename. So, I'm using the following simple loop right now, but with 60000 files it's taking over 3 minutes! :shock: I'd love to know a faster way, or to find that old tip that built strings really fast.

Code: Select all

For n=1 To numfiles
  s$+file$(n)+#CRLF$
Next
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
Shield
Addict
Addict
Posts: 1021
Joined: Fri Jan 21, 2011 8:25 am
Location: 'stralia!
Contact:

Re: Fast string building

Post by Shield »

Have a search for "StringBuilder". :)
Image
Blog: Why Does It Suck? (http://whydoesitsuck.com/)
"You can disagree with me as much as you want, but during this talk, by definition, anybody who disagrees is stupid and ugly."
- Linus Torvalds
eesau
Enthusiast
Enthusiast
Posts: 589
Joined: Fri Apr 27, 2007 12:38 pm
Location: Finland

Re: Fast string building

Post by eesau »

Was the 3 minutes with debugger on?
User avatar
Shield
Addict
Addict
Posts: 1021
Joined: Fri Jan 21, 2011 8:25 am
Location: 'stralia!
Contact:

Re: Fast string building

Post by Shield »

Doesn't really matter since PB's built-in string functions will always be waaaay slower for tasks
that require concatenating a very large number of strings. The reason for this is that PB allocates memory
over and over again after every iteration to hold the new string. String Builder functions on the other hand
store each line of the string as an array item and copy them together in a big memory block capable of holding the entire string
after the building process is complete (of course there are also other ways and techniques). :)
Image
Blog: Why Does It Suck? (http://whydoesitsuck.com/)
"You can disagree with me as much as you want, but during this talk, by definition, anybody who disagrees is stupid and ugly."
- Linus Torvalds
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fast string building

Post by MachineCode »

I was searching for "fast string building" and so on. No wonder I never found it. And yes, the 3+ minutes was the final executable with NO debugger enabled. Anyway, I saw Fred's tip about using CopyMemoryString(), and it worked! What used to take over 3 minutes is now done in just 62 ms! Yes, MILLISECONDS! :shock: Below how I adapted Fred's tip.

[Edited to fix an illegal memory access, and to loop from "2 to n", and to use MessageRequesters for a compiled exe].

Code: Select all

n=10000
Dim a$(n)

DisableDebugger ; This is ignored when building a compiled exe.

For i=1 To n
  a$(i)="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
  s+Len(a$(i)) ; "s" holds the total size of the array.
Next

t=GetTickCount_()
For i=1 To n : s$+a$(i) : Next
MessageRequester("Slow",Str(GetTickCount_()-t)) ; 12085 ms on my PC. 12 seconds!

t=GetTickCount_()
m=AllocateMemory(s+1) : p=m : CopyMemoryString(a$(1),@p) : For i=2 To n : CopyMemoryString(a$(i)) : Next
MessageRequester("Fast",Str(GetTickCount_()-t)) ; 0 ms on my PC. 0 seconds!
FreeMemory(m)
Last edited by MachineCode on Thu Jun 16, 2011 7:30 am, edited 1 time in total.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
kinglestat
Enthusiast
Enthusiast
Posts: 746
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Re: Fast string building

Post by kinglestat »

I created a free library for fast strings...called cieve; concat of teh cieve lib is about 10 times faster than pb and on average all cieve strng routines perform 5 times faster in ascii and 8 times faster with unicode

cheers
I may not help with your coding
Just ask about mental issues!

http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
Little John
Addict
Addict
Posts: 4778
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Fast string building

Post by Little John »

MachineCode wrote:I saw Fred's tip about using CopyMemoryString(), and it worked! What used to take over 3 minutes is now done in just 62 ms! Yes, MILLISECONDS! :shock:
Could you please post the link to that tip? That would be interesting for me (and probably for some other people).

TIA, Little John
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fast string building

Post by MachineCode »

Fred's tip wasn't code, it was a verbal tip to use CopyMemoryString(). So I did, and came up with the above code.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
Little John
Addict
Addict
Posts: 4778
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Fast string building

Post by Little John »

I see. Thanks.
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fast string building

Post by MachineCode »

Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
luis
Addict
Addict
Posts: 3893
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Re: Fast string building

Post by luis »

MachineCode wrote: Below how I adapted Fred's tip. One question though: if n=1000 at the start, instead of n=10000, the FreeMemory(m) command crashed with an invalid memory access. Why?
Probably you need to add a +1 in when allocating the memory area.


Uhmm... if I understand the docs, this would be better performance-wise:

Code: Select all

CopyMemoryString(a$(1),@p)
For i=2 To n
  CopyMemoryString(a$(i))
Next
... one less param pushed on the stack after the first call :P

Obviously it has almost zero impact, but anyway...
"Have you tried turning it off and on again ?"
A little PureBasic review
User avatar
skywalk
Addict
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Fast string building

Post by skywalk »

Also machinecode, you cannot report times with Debug.
I put this in a Template to insert whenever I need it. :wink:

Code: Select all

CompilerIf #PB_Compiler_Debugger
  DisableDebugger
CompilerEndIf
#Tries = 10
Define.i i,t1,t2
t1 = ElapsedMilliseconds()
For i = 0 To #Tries
  ;Do Stuff Here...
Next i
t2 = ElapsedMilliseconds()-t1
MessageRequester("SpeedTest", "MyFunction = " + str(t2))
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Re: Fast string building

Post by Trond »

DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
User avatar
skywalk
Addict
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Fast string building

Post by skywalk »

Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you. :)
Physically turning off the Debugger on the GUI bought me another 15 ms.

I was thinking to concatenate strings in memory, but there is not much overhead doing it with PB arrays and then using CopyMemoryString() inside a Join() procedure. That leaves most of the memory management to PB. :wink:

Side note:
VB6 native string concatenation(&) is actually faster than PB. But, CopyMemoryString() blows it away and is easy to use.

Code: Select all

EnableExplicit

Procedure.s Join(Array sA.s(1), Delm$=#NULL$)
  ; REV:  110301, skywalk
  ; String Concatenate Speed Test w/DisableDebugger while Debugger ON
  ; Str + Str => 19234ms / 32000 Calls, StrSize = 160000
  ; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
  ; Join() => 47ms / 32000 Calls, StrSize = 160000  
  ; String Concatenate Speed Test w/DisableDebugger while Debugger OFF
  ; Str + Str => 18938ms / 32000 Calls, StrSize = 160000
  ; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
  ; Join() => 32ms / 32000 Calls, StrSize = 160000
  Protected.i i, k, *p, *buf, memlen
  Protected.s r$
  k = ArraySize(sA())
  For i = 0 To k
    memlen + MemoryStringLength(@sA(i)) + SizeOf(Character) ; account for #Null$
  Next i
  If Delm$
    memlen + k * MemoryStringLength(@Delm$) +  SizeOf(Character) ; Add room for Delimiters
  EndIf
  ;If memlen < 1024: memlen = 1024: EndIf
  *buf = AllocateMemory(memlen + 1)
  If *buf
    *p = *buf   ; Create tracking pointer for concatenating memory
    If memlen <= MemorySize(*p)  ; Verify enough memory created
      If Delm$
        CopyMemoryString(@sA(0), @*p)
        CopyMemoryString(@Delm$)
        For i = 1 To k
          CopyMemoryString(@sA(i))
          CopyMemoryString(@Delm$)
        Next i
      Else
        CopyMemoryString(@sA(0), @*p)
        For i = 1 To k
          CopyMemoryString(@sA(i))
        Next i
      EndIf
      r$ = PeekS(*buf)
      FreeMemory(*buf)
    EndIf
  EndIf
  ProcedureReturn r$
EndProcedure

CompilerIf #PB_Compiler_Debugger
  DisableDebugger ; Slightly faster if also disabled on GUI.
CompilerEndIf
#Tries = 32000
Define.i i,t1
Define.s S
Define.s STRPlusSTR$, Join$, CopyMemStr$
#StrToUse$ = "XYZ"

;- STRING+STRING
t1 = ElapsedMilliseconds()
For i = 0 To #Tries-1
  S + #StrToUse$ + #CRLF$
Next
t1 = ElapsedMilliseconds()-t1
STRPlusSTR$ = "Str + Str => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$

;- COPYMEMORYSTRING
t1 = ElapsedMilliseconds()
Define.i *p, *buf, memlen
S = #StrToUse$ + #CRLF$
*buf = AllocateMemory(#Tries*MemoryStringLength(@S)+SizeOf(Character))
*p = *Buf
memlen = MemorySize(*p)
If memlen 
  S = #StrToUse$ + #CRLF$
  CopyMemoryString(@S, @*p)
  For i = 1 To #Tries-1 
    S = #StrToUse$ + #CRLF$
    CopyMemoryString(@S)
  Next
  S = PeekS(*buf)
  FreeMemory(*buf)
EndIf
t1 = ElapsedMilliseconds()-t1
CopyMemStr$ = "CopyMemoryString() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$

;- JOIN
t1 = ElapsedMilliseconds()
; Create array of Strings
Dim sA.s(#Tries-1)
For i = 0 To #Tries-1
  sA(i) = #StrToUse$ + #CRLF$
Next i
S = Join(sA())
t1 = ElapsedMilliseconds()-t1
Join$ = "Join() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S))
S = #NULL$
MessageRequester("String Concatenate Speed Test",STRPlusSTR$+CopyMemStr$+Join$)
Edited: Fixed Join() IMA if appending with delimiters. Allocated Memory was short by a #Null$.
Last edited by skywalk on Thu Jun 16, 2011 9:03 pm, edited 1 time in total.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
User avatar
luis
Addict
Addict
Posts: 3893
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Re: Fast string building

Post by luis »

Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Optimizer ? The PB compiler has an optimizer ?

What I thought is that when the debugger is enabled the exe is created with the debug version of the libraries (with sanity checks and so on) and with the debugger overseeing the execution in some way for the breakpoints, purifier checks etc.

What is this "optimizer" thing ? Are you telling me the generated assembly code is different ? I don't think so...

At least that's the general idea of an optimizer as I know it ... (rearranging code output to obtain more compact code, or faster code, etc.).
"Have you tried turning it off and on again ?"
A little PureBasic review
Post Reply