Fast string building
Posted: Wed Jun 15, 2011 8:22 am
by MachineCode
There used to be a tip in these forums on how to build a large string really fast, much faster than PureBasic's native way. But I can't find it now. Anyone know where it is?
The reason is that my app builds an array of filenames (around 60000 of them), and I then want to convert that array into a single string, with a #CRLF$ at the end of each filename. I'm using the following simple loop right now, but with 60000 files it's taking over 3 minutes!

I'd love to know a faster way, or to find that old tip that built strings really fast.
Code: Select all
For n=1 To numfiles
s$+file$(n)+#CRLF$
Next
Re: Fast string building
Posted: Wed Jun 15, 2011 8:42 am
by Shield
Have a search for "StringBuilder".

Re: Fast string building
Posted: Wed Jun 15, 2011 9:39 am
by eesau
Was the 3 minutes with debugger on?
Re: Fast string building
Posted: Wed Jun 15, 2011 10:18 am
by Shield
Doesn't really matter since PB's built-in string functions will always be waaaay slower for tasks
that require concatenating a very large number of strings. The reason for this is that PB allocates memory
over and over again after every iteration to hold the new string. String Builder functions on the other hand
store each line of the string as an array item and copy them together in a big memory block capable of holding the entire string
after the building process is complete (of course there are also other ways and techniques).
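A minimal sketch of that idea, assuming the pieces already sit in an array: count the total length first, allocate one block, then copy every piece into it. The array name file$(), the constant #Count and the sample filenames are hypothetical placeholders, not taken from anyone's real program.
Code: Select all
EnableExplicit
#Count = 5 ; hypothetical number of entries
Dim file$(#Count)
Define.i i, chars
Define *buf, *p
Define result$
; Hypothetical sample data standing in for the real filename array.
For i = 1 To #Count
  file$(i) = "file" + Str(i) + ".txt"
Next
; Pass 1: count the characters needed, including a CRLF after every entry.
For i = 1 To #Count
  chars + Len(file$(i)) + Len(#CRLF$)
Next
; One allocation for the whole result. The +1 is for the terminating null,
; and SizeOf(Character) converts characters to bytes in ASCII and Unicode builds.
*buf = AllocateMemory((chars + 1) * SizeOf(Character))
If *buf
  *p = *buf ; tracking pointer, advanced by CopyMemoryString()
  ; Pass 2: append each piece directly after the previous one.
  For i = 1 To #Count
    CopyMemoryString(file$(i), @*p)
    CopyMemoryString(#CRLF$)
  Next
  result$ = PeekS(*buf) ; turn the finished buffer back into a PB string
  FreeMemory(*buf)
EndIf
The point of the two passes is that memory is allocated once, instead of being reallocated on every concatenation.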

Re: Fast string building
Posted: Wed Jun 15, 2011 10:39 am
by MachineCode
I was searching for "fast string building" and so on. No wonder I never found it. And yes, the 3+ minutes was the final executable with NO debugger enabled. Anyway, I saw Fred's tip about using CopyMemoryString(), and it worked! What used to take over 3 minutes is now done in just 62 ms! Yes, MILLISECONDS!

Below is how I adapted Fred's tip.
[Edited to fix an illegal memory access, and to loop from "2 to n", and to use MessageRequesters for a compiled exe].
Code: Select all
n=10000
Dim a$(n)
DisableDebugger ; This is ignored when building a compiled exe.
For i=1 To n
a$(i)="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
s+Len(a$(i)) ; "s" holds the total size of the array.
Next
t=GetTickCount_()
For i=1 To n : s$+a$(i) : Next
MessageRequester("Slow",Str(GetTickCount_()-t)) ; 12085 ms on my PC. 12 seconds!
t=GetTickCount_()
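m=AllocateMemory((s+1)*SizeOf(Character)) : p=m ; One allocation; SizeOf(Character) keeps the size right in Unicode builds.
CopyMemoryString(a$(1),@p) ; The first call sets the destination pointer.
For i=2 To n : CopyMemoryString(a$(i)) : Next ; Later calls append after the previous copy.
MessageRequester("Fast",Str(GetTickCount_()-t)) ; 0 ms on my PC. 0 seconds!
FreeMemory(m)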
Re: Fast string building
Posted: Wed Jun 15, 2011 10:59 am
by kinglestat
I created a free library for fast strings, called cieve. Concatenation with the cieve lib is about 10 times faster than PB's, and on average all cieve string routines perform 5 times faster in ASCII and 8 times faster in Unicode.
cheers
Re: Fast string building
Posted: Wed Jun 15, 2011 11:28 am
by Little John
MachineCode wrote: I saw Fred's tip about using CopyMemoryString(), and it worked! What used to take over 3 minutes is now done in just 62 ms! Yes, MILLISECONDS!
Could you please post the link to that tip? That would be interesting for me (and probably for some other people).
TIA, Little John
Re: Fast string building
Posted: Wed Jun 15, 2011 12:00 pm
by MachineCode
Fred's tip wasn't code, it was a verbal tip to use CopyMemoryString(). So I did, and came up with the above code.
Re: Fast string building
Posted: Wed Jun 15, 2011 2:58 pm
by Little John
I see. Thanks.
Re: Fast string building
Posted: Wed Jun 15, 2011 3:07 pm
by MachineCode
Re: Fast string building
Posted: Wed Jun 15, 2011 7:24 pm
by luis
MachineCode wrote:
Below how I adapted Fred's tip. One question though: if n=1000 at the start, instead of n=10000, the FreeMemory(m) command crashed with an invalid memory access. Why?
You probably need to add a +1 when allocating the memory area.
Uhmm... if I understand the docs, this would be better performance-wise:
Code: Select all
CopyMemoryString(a$(1),@p)
For i=2 To n
CopyMemoryString(a$(i))
Next
... one less param pushed on the stack after the first call
Obviously it has almost zero impact, but anyway...
Re: Fast string building
Posted: Wed Jun 15, 2011 7:41 pm
by skywalk
Also machinecode, you cannot report times with Debug.
I put this in a Template to insert whenever I need it.
Code: Select all
CompilerIf #PB_Compiler_Debugger
DisableDebugger
CompilerEndIf
#Tries = 10
Define.i i,t1,t2
t1 = ElapsedMilliseconds()
For i = 1 To #Tries ; runs exactly #Tries times
;Do Stuff Here...
Next i
t2 = ElapsedMilliseconds()-t1
MessageRequester("SpeedTest", "MyFunction = " + str(t2))
Re: Fast string building
Posted: Wed Jun 15, 2011 8:26 pm
by Trond
DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Re: Fast string building
Posted: Wed Jun 15, 2011 9:05 pm
by skywalk
Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you.
Physically turning off the Debugger on the GUI bought me another 15 ms.
I was thinking of concatenating the strings directly in memory, but there is not much overhead in keeping them in PB arrays and then using CopyMemoryString() inside a Join() procedure. That leaves most of the memory management to PB.
Side note:
VB6 native string concatenation(&) is actually faster than PB. But, CopyMemoryString() blows it away and is easy to use.
Code: Select all
EnableExplicit
Procedure.s Join(Array sA.s(1), Delm$=#NULL$)
; REV: 110301, skywalk
; String Concatenate Speed Test w/DisableDebugger while Debugger ON
; Str + Str => 19234ms / 32000 Calls, StrSize = 160000
; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
; Join() => 47ms / 32000 Calls, StrSize = 160000
; String Concatenate Speed Test w/DisableDebugger while Debugger OFF
; Str + Str => 18938ms / 32000 Calls, StrSize = 160000
; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
; Join() => 32ms / 32000 Calls, StrSize = 160000
Protected.i i, k, *p, *buf, memlen
Protected.s r$
k = ArraySize(sA())
For i = 0 To k
memlen + MemoryStringLength(@sA(i)) * SizeOf(Character) ; bytes, not characters
Next i
If Delm$
memlen + (k + 1) * MemoryStringLength(@Delm$) * SizeOf(Character) ; one delimiter after every element
EndIf
memlen + SizeOf(Character) ; room for the terminating #Null$
;If memlen < 1024: memlen = 1024: EndIf
*buf = AllocateMemory(memlen)
If *buf
*p = *buf ; Create tracking pointer for concatenating memory
If memlen <= MemorySize(*p) ; Verify enough memory created
If Delm$
CopyMemoryString(@sA(0), @*p)
CopyMemoryString(@Delm$)
For i = 1 To k
CopyMemoryString(@sA(i))
CopyMemoryString(@Delm$)
Next i
Else
CopyMemoryString(@sA(0), @*p)
For i = 1 To k
CopyMemoryString(@sA(i))
Next i
EndIf
r$ = PeekS(*buf)
FreeMemory(*buf)
EndIf
EndIf
ProcedureReturn r$
EndProcedure
CompilerIf #PB_Compiler_Debugger
DisableDebugger ; Slightly faster if also disabled on GUI.
CompilerEndIf
#Tries = 32000
Define.i i,t1
Define.s S
Define.s STRPlusSTR$, Join$, CopyMemStr$
#StrToUse$ = "XYZ"
;- STRING+STRING
t1 = ElapsedMilliseconds()
For i = 0 To #Tries-1
S + #StrToUse$ + #CRLF$
Next
t1 = ElapsedMilliseconds()-t1
STRPlusSTR$ = "Str + Str => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$
;- COPYMEMORYSTRING
t1 = ElapsedMilliseconds()
Define.i *p, *buf, memlen
S = #StrToUse$ + #CRLF$
*buf = AllocateMemory((#Tries*MemoryStringLength(@S)+1)*SizeOf(Character)) ; size in bytes, plus a terminating #Null$
*p = *Buf
memlen = MemorySize(*p)
If memlen
S = #StrToUse$ + #CRLF$
CopyMemoryString(@S, @*p)
For i = 1 To #Tries-1
S = #StrToUse$ + #CRLF$
CopyMemoryString(@S)
Next
S = PeekS(*buf)
FreeMemory(*buf)
EndIf
t1 = ElapsedMilliseconds()-t1
CopyMemStr$ = "CopyMemoryString() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$
;- JOIN
t1 = ElapsedMilliseconds()
; Create array of Strings
Dim sA.s(#Tries-1)
For i = 0 To #Tries-1
sA(i) = #StrToUse$ + #CRLF$
Next i
S = Join(sA())
t1 = ElapsedMilliseconds()-t1
Join$ = "Join() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S))
S = #NULL$
MessageRequester("String Concatenate Speed Test",STRPlusSTR$+CopyMemStr$+Join$)
Edited: Fixed an invalid memory access (IMA) in Join() when appending with delimiters. The allocated memory was short by one #Null$.
Re: Fast string building
Posted: Wed Jun 15, 2011 9:36 pm
by luis
Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Optimizer ? The PB compiler has an optimizer ?
What I thought is that when the debugger is enabled the exe is created with the debug version of the libraries (with sanity checks and so on) and with the debugger overseeing the execution in some way for the breakpoints, purifier checks etc.
What is this "optimizer" thing ? Are you telling me the generated assembly code is different ? I don't think so...
At least that's the general idea of an optimizer as I know it ... (rearranging code output to obtain more compact code, or faster code, etc.).