Re: Fast string building
Posted: Wed Jun 15, 2011 9:52 pm
by Trond
skywalk wrote:Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you.
Are you sure? Where did I use DisableDebugger for this purpose?
What is this "optimizer" thing? Are you telling me the generated assembly code is different? I don't think so...
It's a peephole optimizer. Which means, yes the generated assembly code is different.
http://www.purebasic.fr/english/viewtop ... 09#p242209
Re: Fast string building
Posted: Wed Jun 15, 2011 10:18 pm
by luis
Trond wrote:It's a peephole optimizer. Which means, yes the generated assembly code is different.
Oh! Thanks.
Um, I'm a little confused... purebasic.asm always contains the "release" version of the code, right? We aren't able to see the debug-only, non-peephole-optimized asm code... I think.
Re: Fast string building
Posted: Wed Jun 15, 2011 11:21 pm
by skywalk
Trond wrote:skywalk wrote:Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you.
Are you sure? Where did I use DisableDebugger for this purpose?
Sorry, can't remember, but thanks for clarifying the optimizer.

Re: Fast string building
Posted: Thu Jun 16, 2011 6:03 am
by MachineCode
@luis: I know what you mean about the loop starting at index 2, but the loop in my actual app builds the string from one of two different formats, so it would mean having the same block of building code twice: once for the first item, then again for the rest. Too much repetition for no speed gain at all. As for the crash when only using 1000 items in the array: yes, adding one extra byte to the allocated memory solves it, but I wonder why it doesn't crash when using 10000?
@skywalk: Why can't I report times with Debug? Works perfectly fine. The timing is done BETWEEN Debug calls, so it's not like Debug is skewing the results.
Re: Fast string building
Posted: Thu Jun 16, 2011 6:52 am
by citystate
Even with DisableDebugger, processing is slowed slightly while the debugger is enabled; with enough loops, this can add up to a lot.
Compare the times using MessageRequester() with the debugger on and off; you should see a difference.
Re: Fast string building
Posted: Thu Jun 16, 2011 7:23 am
by MachineCode
@citystate: Okay, I tried it with MessageRequesters and as a compiled executable (not run in the IDE), and the results were still the same: 12 seconds with normal building, and 0 seconds with the CopyMemoryString() method. I've just edited my code in my original post.
Re: Fast string building
Posted: Thu Jun 16, 2011 9:29 am
by Michael Vogel
The speed differences are amazing, but preparing the dish (moving strings to an array, etc.) and removing the rest (freeing memory) will also cost some time, I fear...
How much time could be saved by replacing code lines like...
Code:
a.s="this is a very short string"
b.s=#CRLF$
c.s="and one more short string"
... and now one million times:
s.s=a+b+c
by something like
Code:
s=StringConcat(a,b,c)
I fear that when working with standard strings, no (big) improvement will be possible, just as with the other string-handling functions (StringFind, StringFindRight, StringMid, StringLeft, StringRight, StringCut, StringReplace etc.).
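For comparison, a `StringConcat(a, b, c)` of the kind asked about here could be sketched as follows. It is hypothetical (not a built-in PureBasic command) and, as an assumption of this sketch, uses `CopyMemory()` with explicit byte counts from `StringByteLength()` rather than `CopyMemoryString()`, so the sizes stay correct in both ASCII and Unicode mode. For three short strings, one allocation plus three copies is unlikely to beat `a+b+c` by much; the large gains in this thread come from avoiding a string that is reallocated every time it grows.

```purebasic
; Hypothetical StringConcat(): joins three strings using a single allocation.
; Byte counts come from StringByteLength(), so it works in ASCII and Unicode mode.
Procedure.s StringConcat(a$, b$, c$)
  Protected la = StringByteLength(a$)
  Protected lb = StringByteLength(b$)
  Protected lc = StringByteLength(c$)
  Protected *buf, r$
  ; AllocateMemory() zero-fills, so the extra character acts as the terminating null
  *buf = AllocateMemory(la + lb + lc + SizeOf(Character))
  If *buf
    If la : CopyMemory(@a$, *buf, la) : EndIf
    If lb : CopyMemory(@b$, *buf + la, lb) : EndIf
    If lc : CopyMemory(@c$, *buf + la + lb, lc) : EndIf
    r$ = PeekS(*buf)
    FreeMemory(*buf)
  EndIf
  ProcedureReturn r$
EndProcedure

Define s.s = StringConcat("this is a very short string", #CRLF$, "and one more short string")
Debug s
```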
Re: Fast string building
Posted: Thu Jun 16, 2011 10:24 am
by MachineCode
@Michael: The tip isn't meant for short strings, but for joining extremely large strings, like in my code example. Also, what's a "standard" string? There's no such thing. A string is standard no matter what size it is.
Re: Fast string building
Posted: Thu Jun 16, 2011 11:38 am
by luis
MachineCode wrote: As for the crash when only using 1000 items in the array: yes, adding one extra byte to the allocated memory solves it, but I wonder why it doesn't crash when using 10000?
http://www.purebasic.fr/blog/?p=55
Anyway, if you enable the Purifier, it will catch the write past the end of the buffer even for n=10000.
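As noted above, the buffer was short by one terminating null: it has to hold every string's bytes plus one more character. A hypothetical helper that computes that size might look like the sketch below. It is not code from the thread; `JoinedBufferSize()` is an illustrative name, and it counts bytes with `StringByteLength()` (rather than characters) so the result also holds in Unicode mode.

```purebasic
; Hypothetical helper: bytes needed to copy every element of items() back to back
; plus ONE terminating null character at the very end.
Procedure.i JoinedBufferSize(Array items.s(1))
  Protected i, total
  For i = 0 To ArraySize(items())
    total = total + StringByteLength(items(i)) ; bytes, not characters
  Next
  ProcedureReturn total + SizeOf(Character) ; room for the final null
EndProcedure

Dim parts.s(2)
parts(0) = "abc" : parts(1) = "de" : parts(2) = "f"
Debug JoinedBufferSize(parts()) ; 7 in ASCII mode, 14 in Unicode mode
```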
Re: Fast string building
Posted: Thu Jun 16, 2011 12:40 pm
by MachineCode
An AWESOME post for finding the source of Invalid Memory Access errors! Thank you, Luis!
To the PureBasic team: the info in the above blog post should REALLY go into the manual somewhere.
Re: Fast string building
Posted: Thu Jun 16, 2011 2:25 pm
by skywalk
Michael Vogel wrote:The speed differences are amazing, but preparing the dish (moving strings to an array, etc.) and removing the rest (freeing memory) will also cost some time, I fear...
How much time could be saved by replacing code lines like...
Code:
a.s="this is a very short string"
b.s=#CRLF$
c.s="and one more short string"
... and now one million times:
s.s=a+b+c
by something like
Code:
s=StringConcat(a,b,c)

19 s vs 0.032 s: native string concatenation is roughly 600 times slower.

You can play with the example I posted. Obviously, it is more critical for large string buffers.
In S.s = a + b + c, S is the only buffer that is growing, so it's the only one that needs special handling.
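The idea that only the growing result needs special handling can be illustrated with a small growable buffer. This is a sketch under assumptions: `StrBuilder` and the `SB_*` procedures are hypothetical names, and the buffer doubles its capacity with `ReAllocateMemory()` whenever it runs out, so the total copying stays proportional to the final size instead of being repeated on every append (which appears to be what makes a growing `S + ...` string slow in the tests above).

```purebasic
; Hypothetical growable string buffer (sketch, not a PureBasic built-in).
Structure StrBuilder
  *buf        ; start of the memory block
  used.i      ; bytes in use, excluding the terminating null
  cap.i       ; bytes allocated
EndStructure

Procedure SB_Init(*sb.StrBuilder, initialBytes = 1024)
  *sb\buf = AllocateMemory(initialBytes)
  *sb\used = 0
  If *sb\buf : *sb\cap = initialBytes : Else : *sb\cap = 0 : EndIf
  ProcedureReturn *sb\buf
EndProcedure

Procedure SB_Append(*sb.StrBuilder, text$)
  Protected need = StringByteLength(text$)
  Protected newCap, *new
  If *sb\buf = 0 : ProcedureReturn #False : EndIf
  If *sb\used + need + SizeOf(Character) > *sb\cap
    ; double the capacity until the new text and its null fit
    newCap = *sb\cap
    While *sb\used + need + SizeOf(Character) > newCap
      newCap = newCap * 2
    Wend
    *new = ReAllocateMemory(*sb\buf, newCap)
    If *new = 0 : ProcedureReturn #False : EndIf
    *sb\buf = *new
    *sb\cap = newCap
  EndIf
  If need : CopyMemory(@text$, *sb\buf + *sb\used, need) : EndIf
  *sb\used = *sb\used + need
  PokeC(*sb\buf + *sb\used, 0) ; keep the contents null-terminated
  ProcedureReturn #True
EndProcedure

Procedure.s SB_ToString(*sb.StrBuilder)
  If *sb\buf : ProcedureReturn PeekS(*sb\buf) : EndIf
  ProcedureReturn ""
EndProcedure

Procedure SB_Free(*sb.StrBuilder)
  If *sb\buf : FreeMemory(*sb\buf) : EndIf
  *sb\buf = 0 : *sb\used = 0 : *sb\cap = 0
EndProcedure

; Usage: the same workload as the speed test in this thread
Define sb.StrBuilder, i
If SB_Init(@sb)
  For i = 1 To 32000
    SB_Append(@sb, "XYZ" + #CRLF$)
  Next
  Debug Len(SB_ToString(@sb)) ; 32000 * 5 characters = 160000
  SB_Free(@sb)
EndIf
```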
Michael Vogel wrote:I fear that when working with standard strings, no (big) improvement will be possible, just as with the other string-handling functions (StringFind, StringFindRight, StringMid, StringLeft, StringRight, StringCut, StringReplace etc.).
Each of the StringFns() can be, and has been, adapted to a memory-based approach for working on large buffers.
Re: Fast string building
Posted: Fri Jun 17, 2011 9:15 am
by Michael Vogel
skywalk wrote:You can play with the example I posted. Obviously, it is more critical for large string buffers.
I've played around with your fine Join() function; however, initializing an array and freeing the memory millions of times takes some time. I have to convert a text file (around 40,000,000 lines at the moment), and my PB program uses PB's internal strings to handle this data.
Searching, cropping and concatenating string parts takes around 30% of the program's execution time, so speeding up string functions would help. I did some quick tests with the join routine, but the time it saved was lost again to other overhead. I will check whether Stringvariable.s=Space(Size) is as fast as *StringPointer=AllocateMemory(Size); that could help...
Thanks,
Michael
Re: Fast string building
Posted: Fri Jun 17, 2011 3:38 pm
by skywalk
40 million lines of text
Notepad or Excel would choke and die on that.
I mostly use the Join() procedure to build csv files after combining arrays of data.
If/when they exceed manageable limits (>10 MB), I go binary or SQLite.
How does such a file come to be?
Maybe you can attack the problem at the source and redesign the source of your data?
Re: Fast string building
Posted: Fri Jun 17, 2011 5:42 pm
by Michael Vogel
skywalk wrote:40 million lines of text
Notepad or Excel would choke and die on that.

Absolutely!

I use HiEditor (http://www.winasm.net) for troubleshooting now; unfortunately, it can't handle regular expressions...
skywalk wrote:I mostly use the Join() procedure to build csv files after combining arrays of data.
If/when they exceed manageable limits(>10MB) I go binary or SQLite.
How does such a file come to be?
Maybe you can attack the problem at the source and redesign the source of your data?
The file is an export from an external program (Garmin Training Center), which creates extremely large XML files. The "smallest" is around 250 MB for now and will be converted/compressed (see this thread) to around 12 MB, which is more usable for me.
Re: Fast string building
Posted: Fri Jun 17, 2011 7:48 pm
by rudz
skywalk wrote:Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you.
Physically turning off the Debugger on the GUI bought me another 15 ms.
I was thinking of concatenating strings directly in memory, but there is not much overhead in doing it with PB arrays and then using CopyMemoryString() inside a Join() procedure. That leaves most of the memory management to PB.
Side note:
VB6 native string concatenation (&) is actually faster than PB's. But CopyMemoryString() blows it away and is easy to use.
Code:
EnableExplicit
Procedure.s Join(Array sA.s(1), Delm$=#NULL$)
  ; REV: 110301, skywalk
  ; String Concatenate Speed Test w/DisableDebugger while Debugger ON
  ; Str + Str => 19234ms / 32000 Calls, StrSize = 160000
  ; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
  ; Join() => 47ms / 32000 Calls, StrSize = 160000
  ; String Concatenate Speed Test w/DisableDebugger while Debugger OFF
  ; Str + Str => 18938ms / 32000 Calls, StrSize = 160000
  ; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
  ; Join() => 32ms / 32000 Calls, StrSize = 160000
  Protected.i i, k, *p, *buf, memlen
  Protected.s r$
  k = ArraySize(sA())
  ; Sizes are counted in bytes (StringByteLength), so the buffer is also big enough in Unicode mode
  For i = 0 To k
    memlen + StringByteLength(sA(i)) + SizeOf(Character) ; account for #Null$
  Next i
  If Delm$
    memlen + (k + 1) * StringByteLength(Delm$) + SizeOf(Character) ; a delimiter follows every element, including the last
  EndIf
  ;If memlen < 1024: memlen = 1024: EndIf
  *buf = AllocateMemory(memlen + 1)
  If *buf
    *p = *buf ; Create tracking pointer for concatenating memory
    If memlen <= MemorySize(*buf) ; Verify enough memory created
      If Delm$
        CopyMemoryString(@sA(0), @*p)
        CopyMemoryString(@Delm$)
        For i = 1 To k
          CopyMemoryString(@sA(i))
          CopyMemoryString(@Delm$)
        Next i
      Else
        CopyMemoryString(@sA(0), @*p)
        For i = 1 To k
          CopyMemoryString(@sA(i))
        Next i
      EndIf
      r$ = PeekS(*buf)
    EndIf
    FreeMemory(*buf) ; always release the buffer, even if the size check failed
  EndIf
  ProcedureReturn r$
EndProcedure

CompilerIf #PB_Compiler_Debugger
  DisableDebugger ; Slightly faster if also disabled on GUI.
CompilerEndIf
#Tries = 32000
Define.i i,t1
Define.s S
Define.s STRPlusSTR$, Join$, CopyMemStr$
#StrToUse$ = "XYZ"

;- STRING+STRING
t1 = ElapsedMilliseconds()
For i = 0 To #Tries-1
  S + #StrToUse$ + #CRLF$
Next
t1 = ElapsedMilliseconds()-t1
STRPlusSTR$ = "Str + Str => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$

;- COPYMEMORYSTRING
t1 = ElapsedMilliseconds()
Define.i *p, *buf
S = #StrToUse$ + #CRLF$
*buf = AllocateMemory(#Tries*StringByteLength(S)+SizeOf(Character)) ; bytes for all copies + terminating null
If *buf ; AllocateMemory() returns 0 on failure
  *p = *buf
  S = #StrToUse$ + #CRLF$
  CopyMemoryString(@S, @*p)
  For i = 1 To #Tries-1
    S = #StrToUse$ + #CRLF$
    CopyMemoryString(@S)
  Next
  S = PeekS(*buf)
  FreeMemory(*buf)
EndIf
t1 = ElapsedMilliseconds()-t1
CopyMemStr$ = "CopyMemoryString() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$

;- JOIN
t1 = ElapsedMilliseconds()
; Create array of Strings
Dim sA.s(#Tries-1)
For i = 0 To #Tries-1
  sA(i) = #StrToUse$ + #CRLF$
Next i
S = Join(sA())
t1 = ElapsedMilliseconds()-t1
Join$ = "Join() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S))
S = #NULL$
MessageRequester("String Concatenate Speed Test",STRPlusSTR$+CopyMemStr$+Join$)
Edited: Fixed Join() IMA if appending with delimiters. Allocated Memory was short by a #Null$.
I'm pleased to see the difference. At 100000 calls and StrSize = 500000, I get these numbers (ms):
pb460b3 x86:
Str+Str: ~51823
CopyMemString(): 0
Join(): 15
pb460b3 x64:
Str+Str: ~48579
CopyMemString(): 15
Join(): 0
Perhaps there is some logical explanation for this, which apparently escapes me completely.
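The fast methods above only ever read 0 or 15 ms, and the earlier readings of 15, 32 and 47 ms are all close to multiples of about 15.6 ms. That suggests `ElapsedMilliseconds()` is advancing in steps of roughly that size on these machines, so a single run of a fast method can't be measured reliably. One way around this, sketched below under the assumption that an average over repeated runs is good enough for the comparison, is to repeat the fast workload until the total time is far larger than that step and then divide. `BuildOnce()` and `#Repeats` are hypothetical names; the workload mirrors the 100000-call test above, but uses `CopyMemory()` rather than `CopyMemoryString()`.

```purebasic
; Sketch: average a fast workload over many repeats so the total time
; is much larger than the timer's step size.
#Tries = 100000
#Repeats = 50

Procedure.i BuildOnce(tries) ; hypothetical workload: join "XYZ"+#CRLF$ tries times
  Protected piece$ = "XYZ" + #CRLF$
  Protected bytes = StringByteLength(piece$)
  Protected *buf, i, length
  *buf = AllocateMemory(tries * bytes + SizeOf(Character)) ; zero-filled: last char is the null
  If *buf
    For i = 0 To tries - 1
      CopyMemory(@piece$, *buf + i * bytes, bytes)
    Next
    length = Len(PeekS(*buf))
    FreeMemory(*buf)
  EndIf
  ProcedureReturn length
EndProcedure

Define r, t, avg.d, result
t = ElapsedMilliseconds()
For r = 1 To #Repeats
  result = BuildOnce(#Tries)
Next
t = ElapsedMilliseconds() - t
avg = t
avg = avg / #Repeats ; done in floating point so sub-millisecond averages survive

OpenConsole()
PrintN("StrSize = " + Str(result) + ", average " + StrD(avg, 2) + " ms per run")
```

Reporting through the console (or a MessageRequester(), as suggested earlier in the thread) keeps Debug out of the measurement.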
