Fast string building

Just starting out? Need help? Post your questions and find answers here.
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Re: Fast string building

Post by Trond »

skywalk wrote:
Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you. :)
Are you sure? Where did I use DisableDebugger for this purpose?
What is this "optimizer" thing ? Are you telling me the generated assembly code is different ? I don't think so...
It's a peephole optimizer. Which means, yes the generated assembly code is different.

http://www.purebasic.fr/english/viewtop ... 09#p242209
User avatar
luis
Addict
Addict
Posts: 3895
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Re: Fast string building

Post by luis »

Trond wrote:It's a peephole optimizer. Which means, yes the generated assembly code is different.
OH ! Thanks :o

Uhm, I'm a little confused... purebasic.asm always contains the "release" version of this code right ? We are not able to see the debug-only not-peephole-ized asm code ... I think.
"Have you tried turning it off and on again ?"
A little PureBasic review
User avatar
skywalk
Addict
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Fast string building

Post by skywalk »

Trond wrote:
skywalk wrote:
Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you. :)
Are you sure? Where did I use DisableDebugger for this purpose?
Sorry, can't remember, but thanks for clarifying the optimizer. :wink:
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fast string building

Post by MachineCode »

@luis: I know what you mean about the loop starting at index 2, but the loop in my actual app builds the string from 1 of 2 different formats, so it would mean have the same block of building code used twice: once for the first item, then again for the rest. Too much repetition for no speed gain at all. As for the crash when only using 1000 items in the array: yes, adding one extra byte to the allocated memory solves it, but I wonder why it doesn't crash when using 10000?

@skywalk: Why can't I report times with Debug? Works perfectly fine. The timing is done BETWEEN Debug calls, so it's not like Debug is skewing the results.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
citystate
Enthusiast
Enthusiast
Posts: 638
Joined: Sun Feb 12, 2006 10:06 pm

Re: Fast string building

Post by citystate »

even using DisableDebugger, the processing time will be slowed minutely with the debugger enabled - with enough loops, this can add up to a lot.

compare the time using MessageRequesters with the debugger on and off, you should see a difference
there is no sig, only zuul (and the following disclaimer)

WARNING: may be talking out of his hat
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fast string building

Post by MachineCode »

@citystate: Okay, I tried it with MessageRequesters and as a compiled executable (not run in the IDE), and the results were still the same: 12 seconds with normal building, and 0 seconds with the CopyMemoryString() method. I've just edited my code in my original post.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
Michael Vogel
Addict
Addict
Posts: 2797
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Re: Fast string building

Post by Michael Vogel »

The speed differences are amazing, but preparing the dish (moving strings to an array etc.) and removing the rest (free memory) will cost also some time, I fear...

How much time could be saved for replacing code lines like...

Code: Select all

a.s="this is a very short string"
b.s=#CRLF$
c.s="and one more short string"
... and now one million times:
s.s=a+b+c
by something like

Code: Select all

s=StringConcat(a,b,c)
I fear, when working with standard strings, no (big) improvement will be possible, like with other string handling functions (StringFind, StringFindRight, StringMid, StringLeft, StringRight, StringCut, StringReplace etc.)
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fast string building

Post by MachineCode »

@Michael: The tip isn't meant to be used for short strings, but for joining extremely large strings, like in my code example. Also, what's a "standard" string? There's no such thing. A string is standard no matter size it is.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
luis
Addict
Addict
Posts: 3895
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Re: Fast string building

Post by luis »

MachineCode wrote: As for the crash when only using 1000 items in the array: yes, adding one extra byte to the allocated memory solves it, but I wonder why it doesn't crash when using 10000?
http://www.purebasic.fr/blog/?p=55

Anyway, if you enable the purifier, it will catch the problem of writing after the buffer even for the n=10000.
Last edited by luis on Thu Jun 16, 2011 1:16 pm, edited 1 time in total.
"Have you tried turning it off and on again ?"
A little PureBasic review
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fast string building

Post by MachineCode »

An AWESOME post for finding the source of Invalid Memory Access errors! Thank you, Luis! 8)

To the PureBasic team: the info in the above blog post should REALLY go into the manual somewhere.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
skywalk
Addict
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Fast string building

Post by skywalk »

Michael Vogel wrote:The speed differences are amazing, but preparing the dish (moving strings to an array etc.) and removing the rest (free memory) will cost also some time, I fear...

How much time could be saved for replacing code lines like...
Code : 
a.s="this is a very short string"
b.s=#CRLF$
c.s="and one more short string"
... and now one million times:
s.s=a+b+c

by something like
Code : 
s=StringConcat(a,b,c)
:?: 19 sec vs 0.032 sec =~ 600 times slower for native string concatenation. :?:
You can play with the example I posted. Obviously, it is more critical for large string buffers.
S.S = a + b + c is the only buffer that is growing, so it's the only one that needs handling.
Michael Vogel wrote:I fear, when working with standard strings, no (big) improvement will be possible, like with other string handling functions (StringFind, StringFindRight, StringMid, StringLeft, StringRight, StringCut, StringReplace etc.)
Each of the StringFns() can and have been adapted to a memory approach for working on large buffers.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
User avatar
Michael Vogel
Addict
Addict
Posts: 2797
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Re: Fast string building

Post by Michael Vogel »

skywalk wrote:You can play with the example I posted. Obviously, it is more critical for large string buffers.
I've played around with your fine join function, anyhow it takes some time for initializing an array and freeing the memory millions of time. I have to convert a text file (around 40.000.000 lines at the moment) and my PB program uses the internal strings of PB for handling this data.
Searching, cropping and concatenating string parts takes around 30% of the program execution time - so speeding up string functions would be fine. I tried to do some (quick) tests to include the join routine, but the time which has been saved by the routine got lost due other stuff. I will have a look if Stringvariable.s=Space(Size) is as fast as StringPointer*=AllocateMemory(Size), that could help...

Thanks,
Michael
User avatar
skywalk
Addict
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Fast string building

Post by skywalk »

40 million lines of text :shock:
Notepad or Excel would choke and die on that. :)

I mostly use the Join() procedure to build csv files after combining arrays of data.
If/when they exceed manageable limits(>10MB) I go binary or SQLite.

How does such a file come to be?
Maybe you can attack the problem at the source and redesign the source of your data?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
User avatar
Michael Vogel
Addict
Addict
Posts: 2797
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Re: Fast string building

Post by Michael Vogel »

skywalk wrote:40 million lines of text :shock:
Notepad or Excel would choke and die on that. :)
Absolutely! :evil: I use HiEditor (http://www.winasm.net) for trouble shooting now, bad luck it can't handle regular expressions...
skywalk wrote:I mostly use the Join() procedure to build csv files after combining arrays of data.
If/when they exceed manageable limits(>10MB) I go binary or SQLite.

How does such a file come to be?
Maybe you can attack the problem at the source and redesign the source of your data?
The file is an export from an external program (Garmin Training Center) which creates extreme large XML type files. The "smallest" is around 250 MB for now and will be converted/compressed (see this thread) to around 12 MB which is more usable for me.
rudz
User
User
Posts: 35
Joined: Sun Mar 21, 2010 6:59 am
Location: Denmark
Contact:

Re: Fast string building

Post by rudz »

skywalk wrote:
Trond wrote:DisableDebugger isn't enough, because the optimizer is only enabled when the debugger is disabled totally (from the menu).
Ha, I actually got the DisableDebugger idea from you. :)
Physically turning off the Debugger on the GUI bought me another 15 ms.

I was thinking to concatenate strings in memory, but there is not much overhead doing it with PB arrays and then using CopyMemoryString() inside a Join() procedure. That leaves most of the memory management to PB. :wink:

Side note:
VB6 native string concatenation(&) is actually faster than PB. But, CopyMemoryString() blows it away and is easy to use.

Code: Select all

EnableExplicit

Procedure.s Join(Array sA.s(1), Delm$=#NULL$)
  ; REV:  110301, skywalk
  ; String Concatenate Speed Test w/DisableDebugger while Debugger ON
  ; Str + Str => 19234ms / 32000 Calls, StrSize = 160000
  ; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
  ; Join() => 47ms / 32000 Calls, StrSize = 160000  
  ; String Concatenate Speed Test w/DisableDebugger while Debugger OFF
  ; Str + Str => 18938ms / 32000 Calls, StrSize = 160000
  ; CopyMemoryString() => 15ms / 32000 Calls, StrSize = 160000
  ; Join() => 32ms / 32000 Calls, StrSize = 160000
  Protected.i i, k, *p, *buf, memlen
  Protected.s r$
  k = ArraySize(sA())
  For i = 0 To k
    memlen + MemoryStringLength(@sA(i)) + SizeOf(Character) ; account for #Null$
  Next i
  If Delm$
    memlen + k * MemoryStringLength(@Delm$) +  SizeOf(Character) ; Add room for Delimiters
  EndIf
  ;If memlen < 1024: memlen = 1024: EndIf
  *buf = AllocateMemory(memlen + 1)
  If *buf
    *p = *buf   ; Create tracking pointer for concatenating memory
    If memlen <= MemorySize(*p)  ; Verify enough memory created
      If Delm$
        CopyMemoryString(@sA(0), @*p)
        CopyMemoryString(@Delm$)
        For i = 1 To k
          CopyMemoryString(@sA(i))
          CopyMemoryString(@Delm$)
        Next i
      Else
        CopyMemoryString(@sA(0), @*p)
        For i = 1 To k
          CopyMemoryString(@sA(i))
        Next i
      EndIf
      r$ = PeekS(*buf)
      FreeMemory(*buf)
    EndIf
  EndIf
  ProcedureReturn r$
EndProcedure

CompilerIf #PB_Compiler_Debugger
  DisableDebugger ; Slightly faster if also disabled on GUI.
CompilerEndIf
#Tries = 32000
Define.i i,t1
Define.s S
Define.s STRPlusSTR$, Join$, CopyMemStr$
#StrToUse$ = "XYZ"

;- STRING+STRING
t1 = ElapsedMilliseconds()
For i = 0 To #Tries-1
  S + #StrToUse$ + #CRLF$
Next
t1 = ElapsedMilliseconds()-t1
STRPlusSTR$ = "Str + Str => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$

;- COPYMEMORYSTRING
t1 = ElapsedMilliseconds()
Define.i *p, *buf, memlen
S = #StrToUse$ + #CRLF$
*buf = AllocateMemory(#Tries*MemoryStringLength(@S)+SizeOf(Character))
*p = *Buf
memlen = MemorySize(*p)
If memlen 
  S = #StrToUse$ + #CRLF$
  CopyMemoryString(@S, @*p)
  For i = 1 To #Tries-1 
    S = #StrToUse$ + #CRLF$
    CopyMemoryString(@S)
  Next
  S = PeekS(*buf)
  FreeMemory(*buf)
EndIf
t1 = ElapsedMilliseconds()-t1
CopyMemStr$ = "CopyMemoryString() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S)) + #CRLF$ + #CRLF$
S = #NULL$

;- JOIN
t1 = ElapsedMilliseconds()
; Create array of Strings
Dim sA.s(#Tries-1)
For i = 0 To #Tries-1
  sA(i) = #StrToUse$ + #CRLF$
Next i
S = Join(sA())
t1 = ElapsedMilliseconds()-t1
Join$ = "Join() => " + Str(t1) + "ms / " + Str(#Tries) + " Calls, StrSize = " + Str(MemoryStringLength(@S))
S = #NULL$
MessageRequester("String Concatenate Speed Test",STRPlusSTR$+CopyMemStr$+Join$)
Edited: Fixed Join() IMA if appending with delimiters. Allocated Memory was short by a #Null$.
I'm pleased to see the difference, at 100000 calls and StrSize at 500000 i get these numbers (ms):

pb460b3 x86:
Str+Str : ~51823
CopyMemString() : 0
Join() : 15

pb460b3 x64:
Str+Str : ~48579
CopyMemString() : 15
Join() : 0

Perhaps there is some logical explanation to it, which apparently escapes me completely :D
AMD FX-8350 @ ~4.8GHz | 8GB Corsair DDR3-SDRAM @ 1800Mhz | 7even Ult & Manjaro 0.8.7.1 | PB 5.3
Web: rudz.dk
Post Reply