Page 2 of 2

Re: Fast string building

Posted: Wed Jun 15, 2011 9:52 pm
by Trond
skywalk wrote:
Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you. :)
Are you sure? Where did I use DisableDebugger for this purpose?
What is this "optimizer" thing ? Are you telling me the generated assembly code is different ? I don't think so...
It's a peephole optimizer. Which means, yes the generated assembly code is different.

http://www.purebasic.fr/english/viewtop ... 09#p242209

Re: Fast string building

Posted: Wed Jun 15, 2011 10:18 pm
by luis
Trond wrote:It's a peephole optimizer. Which means, yes the generated assembly code is different.
OH ! Thanks :o

Uhm, I'm a little confused... purebasic.asm always contains the "release" version of this code right ? We are not able to see the debug-only not-peephole-ized asm code ... I think.

Re: Fast string building

Posted: Wed Jun 15, 2011 11:21 pm
by skywalk
Trond wrote:
skywalk wrote:
Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you. :)
Are you sure? Where did I use DisableDebugger for this purpose?
Sorry, can't remember, but thanks for clarifying the optimizer. :wink:

Re: Fast string building

Posted: Thu Jun 16, 2011 6:03 am
by MachineCode
@luis: I know what you mean about the loop starting at index 2, but the loop in my actual app builds the string from 1 of 2 different formats, so it would mean have the same block of building code used twice: once for the first item, then again for the rest. Too much repetition for no speed gain at all. As for the crash when only using 1000 items in the array: yes, adding one extra byte to the allocated memory solves it, but I wonder why it doesn't crash when using 10000?

@skywalk: Why can't I report times with Debug? Works perfectly fine. The timing is done BETWEEN Debug calls, so it's not like Debug is skewing the results.

Re: Fast string building

Posted: Thu Jun 16, 2011 6:52 am
by citystate
even using DisableDebugger, the processing time will be slowed minutely with the debugger enabled - with enough loops, this can add up to a lot.

compare the time using MessageRequesters with the debugger on and off, you should see a difference

Re: Fast string building

Posted: Thu Jun 16, 2011 7:23 am
by MachineCode
@citystate: Okay, I tried it with MessageRequesters and as a compiled executable (not run in the IDE), and the results were still the same: 12 seconds with normal building, and 0 seconds with the CopyMemoryString() method. I've just edited my code in my original post.

Re: Fast string building

Posted: Thu Jun 16, 2011 9:29 am
by Michael Vogel
The speed differences are amazing, but preparing the dish (moving strings to an array etc.) and removing the rest (free memory) will cost also some time, I fear...

How much time could be saved for replacing code lines like...

Code: Select all

a.s="this is a very short string"
b.s=#CRLF$
c.s="and one more short string"
... and now one million times:
s.s=a+b+c
by something like

Code: Select all

s=StringConcat(a,b,c)
I fear, when working with standard strings, no (big) improvement will be possible, like with other string handling functions (StringFind, StringFindRight, StringMid, StringLeft, StringRight, StringCut, StringReplace etc.)

Re: Fast string building

Posted: Thu Jun 16, 2011 10:24 am
by MachineCode
@Michael: The tip isn't meant to be used for short strings, but for joining extremely large strings, like in my code example. Also, what's a "standard" string? There's no such thing. A string is standard no matter size it is.

Re: Fast string building

Posted: Thu Jun 16, 2011 11:38 am
by luis
MachineCode wrote: As for the crash when only using 1000 items in the array: yes, adding one extra byte to the allocated memory solves it, but I wonder why it doesn't crash when using 10000?
http://www.purebasic.fr/blog/?p=55

Anyway, if you enable the purifier, it will catch the problem of writing after the buffer even for the n=10000.

Re: Fast string building

Posted: Thu Jun 16, 2011 12:40 pm
by MachineCode
An AWESOME post for finding the source of Invalid Memory Access errors! Thank you, Luis! 8)

To the PureBasic team: the info in the above blog post should REALLY go into the manual somewhere.

Re: Fast string building

Posted: Thu Jun 16, 2011 2:25 pm
by skywalk
Michael Vogel wrote:The speed differences are amazing, but preparing the dish (moving strings to an array etc.) and removing the rest (free memory) will cost also some time, I fear...

How much time could be saved for replacing code lines like...
Code : 
a.s="this is a very short string"
b.s=#CRLF$
c.s="and one more short string"
... and now one million times:
s.s=a+b+c

by something like
Code : 
s=StringConcat(a,b,c)
:?: 19 sec vs 0.032 sec =~ 600 times slower for native string concatenation. :?:
You can play with the example I posted. Obviously, it is more critical for large string buffers.
S.S = a + b + c is the only buffer that is growing, so it's the only one that needs handling.
Michael Vogel wrote:I fear, when working with standard strings, no (big) improvement will be possible, like with other string handling functions (StringFind, StringFindRight, StringMid, StringLeft, StringRight, StringCut, StringReplace etc.)
Each of the StringFns() can and have been adapted to a memory approach for working on large buffers.

Re: Fast string building

Posted: Fri Jun 17, 2011 9:15 am
by Michael Vogel
skywalk wrote:You can play with the example I posted. Obviously, it is more critical for large string buffers.
I've played around with your fine join function, anyhow it takes some time for initializing an array and freeing the memory millions of time. I have to convert a text file (around 40.000.000 lines at the moment) and my PB program uses the internal strings of PB for handling this data.
Searching, cropping and concatenating string parts takes around 30% of the program execution time - so speeding up string functions would be fine. I tried to do some (quick) tests to include the join routine, but the time which has been saved by the routine got lost due other stuff. I will have a look if Stringvariable.s=Space(Size) is as fast as StringPointer*=AllocateMemory(Size), that could help...

Thanks,
Michael

Re: Fast string building

Posted: Fri Jun 17, 2011 3:38 pm
by skywalk
40 million lines of text :shock:
Notepad or Excel would choke and die on that. :)

I mostly use the Join() procedure to build csv files after combining arrays of data.
If/when they exceed manageable limits(>10MB) I go binary or SQLite.

How does such a file come to be?
Maybe you can attack the problem at the source and redesign the source of your data?

Re: Fast string building

Posted: Fri Jun 17, 2011 5:42 pm
by Michael Vogel
skywalk wrote:40 million lines of text :shock:
Notepad or Excel would choke and die on that. :)
Absolutely! :evil: I use HiEditor (http://www.winasm.net) for trouble shooting now, bad luck it can't handle regular expressions...
skywalk wrote:I mostly use the Join() procedure to build csv files after combining arrays of data.
If/when they exceed manageable limits(>10MB) I go binary or SQLite.

How does such a file come to be?
Maybe you can attack the problem at the source and redesign the source of your data?
The file is an export from an external program (Garmin Training Center) which creates extreme large XML type files. The "smallest" is around 250 MB for now and will be converted/compressed (see this thread) to around 12 MB which is more usable for me.

Re: Fast string building

Posted: Fri Jun 17, 2011 7:48 pm
by rudz
skywalk wrote:
Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you. :)
Physically turning off the Debugger on the GUI bought me another 15 ms.

I was thinking to concatenate strings in memory, but there is not much overhead doing it with PB arrays and then using CopyMemoryString() inside a Join() procedure. That leaves most of the memory management to PB. :wink:

Side note:
VB6 native string concatenation(&) is actually faster than PB. But, CopyMemoryString() blows it away and is easy to use.

Code: Select all

EnableExplicit

Procedure.s Join(Array sA.s(1), Delm$=#NULL$)
  ; REV:  110301, skywalk
  ; String Concatenate Speed Test w/DisableDebugger while Debugger ON
  ; Str + Str => 19234ms / 32000 Calls, StrSize = 160000
  ; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
  ; Join() => 47ms / 32000 Calls, StrSize = 160000  
  ; String Concatenate Speed Test w/DisableDebugger while Debugger OFF
  ; Str + Str => 18938ms / 32000 Calls, StrSize = 160000
  ; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
  ; Join() => 32ms / 32000 Calls, StrSize = 160000
  Protected.i i, k, *p, *buf, memlen
  Protected.s r$
  k = ArraySize(sA())
  For i = 0 To k
    memlen + MemoryStringLength(@sA(i)) + SizeOf(Character) ; account for #Null$
  Next i
  If Delm$
    memlen + k * MemoryStringLength(@Delm$) +  SizeOf(Character) ; Add room for Delimiters
  EndIf
  ;If memlen < 1024: memlen = 1024: EndIf
  *buf = AllocateMemory(memlen + 1)
  If *buf
    *p = *buf   ; Create tracking pointer for concatenating memory
    If memlen <= MemorySize(*p)  ; Verify enough memory created
      If Delm$
        CopyMemoryString(@sA(0), @*p)
        CopyMemoryString(@Delm$)
        For i = 1 To k
          CopyMemoryString(@sA(i))
          CopyMemoryString(@Delm$)
        Next i
      Else
        CopyMemoryString(@sA(0), @*p)
        For i = 1 To k
          CopyMemoryString(@sA(i))
        Next i
      EndIf
      r$ = PeekS(*buf)
      FreeMemory(*buf)
    EndIf
  EndIf
  ProcedureReturn r$
EndProcedure

CompilerIf #PB_Compiler_Debugger
  DisableDebugger ; Slightly faster if also disabled on GUI.
CompilerEndIf
#Tries = 32000
Define.i i,t1
Define.s S
Define.s STRPlusSTR$, Join$, CopyMemStr$
#StrToUse$ = "XYZ"

;- STRING+STRING
t1 = ElapsedMilliseconds()
For i = 0 To #Tries-1
  S + #StrToUse$ + #CRLF$
Next
t1 = ElapsedMilliseconds()-t1
STRPlusSTR$ = "Str + Str => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$

;- COPYMEMORYSTRING
t1 = ElapsedMilliseconds()
Define.i *p, *buf, memlen
S = #StrToUse$ + #CRLF$
*buf = AllocateMemory(#Tries*MemoryStringLength(@S)+SizeOf(Character))
*p = *Buf
memlen = MemorySize(*p)
If memlen 
  S = #StrToUse$ + #CRLF$
  CopyMemoryString(@S, @*p)
  For i = 1 To #Tries-1 
    S = #StrToUse$ + #CRLF$
    CopyMemoryString(@S)
  Next
  S = PeekS(*buf)
  FreeMemory(*buf)
EndIf
t1 = ElapsedMilliseconds()-t1
CopyMemStr$ = "CopyMemoryString() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$

;- JOIN
t1 = ElapsedMilliseconds()
; Create array of Strings
Dim sA.s(#Tries-1)
For i = 0 To #Tries-1
  sA(i) = #StrToUse$ + #CRLF$
Next i
S = Join(sA())
t1 = ElapsedMilliseconds()-t1
Join$ = "Join() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S))
S = #NULL$
MessageRequester("String Concatenate Speed Test",STRPlusSTR$+CopyMemStr$+Join$)
Edited: Fixed Join() IMA if appending with delimiters. Allocated Memory was short by a #Null$.
I'm pleased to see the difference, at 100000 calls and StrSize at 500000 i get these numbers (ms):

pb460b3 x86:
Str+Str : ~51823
CopyMemString() : 0
Join() : 15

pb460b3 x64:
Str+Str : ~48579
CopyMemString() : 15
Join() : 0

Perhaps there is some logical explanation to it, which apparently escapes me completely :D