Fastest way to modify a large string, byte-by-byte?
-
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Fastest way to modify a large string, byte-by-byte?
Faster, but still unacceptably slow (about 20 seconds). Thanks for trying, but I really want to see a "Character" version to complete the other two "Character" versions that I've learned.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
PureBasic: Born in 1998 and still going strong to this very day!
Re: Fastest way to modify a large string, byte-by-byte?
Good try! It shows an empty string because you increment *o in your loop. Save *o and use the saved value (original) for PeekS().MachineCode wrote:One final plea for help (and once I get this sorted, I should be set -- because I've already learned how to change same-size strings and longer-to-shorter strings).
This last time, I need to learn how to quickly enlarge an existing string with new data. In the particular example below, I want to put line numbers (and a space) in front of each line in the existing string. I think my code is almost there, but I don't know how to get the new larger result.
Please help me, because once this is done, you'll have taught me to fish, instead of feeding me a fish.
Code: Select all
; Output in MessageRequester is to be: ; 8 eight ; 9 nine ; 10 ten ; With the above specific alignment. text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$ width=2 : lines=3 soc=SizeOf(Character) *i.Character=@text$ *o.Character=AllocateMemory(999) n=8 : s$=RSet(Str(n),width," ")+" " For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next While *i\c *o\c=*i\c : *o+soc If *i\c=#LF ; End of current line. n+1 : s$=RSet(Str(n),width," ")+" " For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next EndIf *i+soc Wend *o\c=0 MessageRequester("Result",text$) ; Shows original string. MessageRequester("Result",PeekS(*o)) ; Shows null.
sizeOf(Character) is a constant value, so you can change soc to #soc everywhere. Same for width in this case.
Could be replaced with one PokeS():
Code: Select all
For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+#soc : Next
Code: Select all
; Output in MessageRequester is to be:
; 8 eight
; 9 nine
; 10 ten
; With the above specific alignment.
text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
#width=2 : lines=3
#soc=SizeOf(Character)
*i.Character=@text$
CompilerIf #PB_Compiler_Version > 500
*o.Character=AllocateMemory(999,#PB_Memory_NoClear) ; faster memory alloc
CompilerElse
*o.Character=AllocateMemory(999) ; slower memory alloc
CompilerEndIf
*o_orig = *o ; save original memory pointer
n=8 : s$=RSet(Str(n),#width," ")+" "
PokeS(*o,s$) : *o+Len(s$)*#soc
While *i\c
*o\c=*i\c : *o+#soc
If *i\c=#LF ; End of current line.
n+1 : s$=RSet(Str(n),#width," ")+" "
PokeS(*o,s$) : *o+Len(s$)*#soc ; use PokeS
EndIf
*i+#soc
Wend
*o\c=0
MessageRequester("Result",text$) ; Shows original string.
MessageRequester("Result",PeekS(*o_orig)) ; Shows new string
FreeMemory(*o_orig) ; free saved memory pointer
Last edited by Danilo on Sun Mar 03, 2013 6:48 am, edited 1 time in total.
Re: Fastest way to modify a large string, byte-by-byte?
This is the quickest way I could think of using pointers, I am assuming you are ok with using pbs string concatenation.
You could I guess insert the characters into a memory block on the fly (stopping where you find the #crlf pair and inserting your line text) and then just peeks the memory afterwards, however there is no real way to know how many #crlf pairs there are, so you'd have to guess the memory size (or just allocate based on a formula assuming the file contains ALL #crlf pairs). Other than that this is the best I can do.
You could I guess insert the characters into a memory block on the fly (stopping where you find the #crlf pair and inserting your line text) and then just peeks the memory afterwards, however there is no real way to know how many #crlf pairs there are, so you'd have to guess the memory size (or just allocate based on a formula assuming the file contains ALL #crlf pairs). Other than that this is the best I can do.
Code: Select all
Global line,*p.Character=@text$,*f=*p
While *p\c
If *p\c=#CR
*p+SizeOf(Character)
If *p\c=#LF
line+1
*p+SizeOf(Character)
r$+RSet(Str(line),2,"0")+" "+PeekS(*f,*p-*f)
*f=*p
EndIf
Else
*p+SizeOf(Character)
EndIf
Wend
MessageRequester("Test",r$)
----------------------------------------------------------------------------
Commenting your own code is admitting you don't understand it.
----------------------------------------------------------------------------
Commenting your own code is admitting you don't understand it.
----------------------------------------------------------------------------
Re: Fastest way to modify a large string, byte-by-byte?
Are you saying you wish the parse the numbers in between the #crlf pairs and output the parsed value as a line number?
So "one"+#crlf$+"twenty"+#crlf$+"thirty"+#crlf$+"forty"+#crlf$+"fifty"
Would output
01 one
20 twenty
30 thirty
40 forty
50 fifty
In your message requester?
So "one"+#crlf$+"twenty"+#crlf$+"thirty"+#crlf$+"forty"+#crlf$+"fifty"
Would output
01 one
20 twenty
30 thirty
40 forty
50 fifty
In your message requester?
----------------------------------------------------------------------------
Commenting your own code is admitting you don't understand it.
----------------------------------------------------------------------------
Commenting your own code is admitting you don't understand it.
----------------------------------------------------------------------------
-
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Fastest way to modify a large string, byte-by-byte?
Danilo, that's the stuff! Does what I want in about 30 ms on my 725 KB text. THANK YOU! 
Also, I knew that about the "width" var; I only set it to 2 for my example. For my 725 KB text string it's actually set to 5 due to the number of lines being over 60000.
[Edit] Psych, I just wanted to insert line numbers in front of every line of text$. All examples (except Danilo's) took over 20 seconds to do it, which is unacceptable. His version does it in less than a quarter of a second.
Amazing. I am very grateful for all the lessons I learned in this thread. It's been a great topic to cover: the fast manipulation of a massive string.

Also, I knew that about the "width" var; I only set it to 2 for my example. For my 725 KB text string it's actually set to 5 due to the number of lines being over 60000.

[Edit] Psych, I just wanted to insert line numbers in front of every line of text$. All examples (except Danilo's) took over 20 seconds to do it, which is unacceptable. His version does it in less than a quarter of a second.

Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
PureBasic: Born in 1998 and still going strong to this very day!
Re: Fastest way to modify a large string, byte-by-byte?
It's worth pointing out that danillos example produced an extra line number with empty text when I tested it (unless this behaviour doesnt matter).
I guess PBs string concatenation was the slowing factor after all.
I guess PBs string concatenation was the slowing factor after all.
----------------------------------------------------------------------------
Commenting your own code is admitting you don't understand it.
----------------------------------------------------------------------------
Commenting your own code is admitting you don't understand it.
----------------------------------------------------------------------------
Re: Fastest way to modify a large string, byte-by-byte?
It is your code, you wrote it. The only (common) mistake was you didn't save the original pointer *o. I made the same mistake before, so I always remember it now.MachineCode wrote:Danilo, that's the stuff! Does what I want in about 30 ms on my 725 KB text. THANK YOU!
Also, I knew that about the "width" var; I only set it to 2 for my example. For my 725 KB text string it's actually set to 5 due to the number of lines being over 60000.

Just remembered you still use PB 5.00, so I changed the code little bit:
Code: Select all
CompilerIf #PB_Compiler_Version > 500
*o.Character=AllocateMemory(999,#PB_Memory_NoClear) ; faster memory alloc
CompilerElse
*o.Character=AllocateMemory(999) ; slower memory alloc
CompilerEndIf
It is correct and happens because text$ has an +#CRLF at end. Remove the last #CRLF and there is no extra line number added.psych wrote:It's worth pointing out that danillos example produced an extra line number with empty text when I tested it (unless this behaviour doesnt matter).
I guess PBs string concatenation was the slowing factor after all.
Re: Fastest way to modify a large string, byte-by-byte?
Different approach but I don't know if it is faster or slower
Code: Select all
Procedure.s InsertLineNumbers(text.s)
Protected *in.Character = @text
Protected out.s, *cont, *out.Character
Protected in_len, ln_cnt, lnum_w, l, ln = 1
While *in\c
If *in\c = #LF
ln_cnt + 1
EndIf
*in + SizeOf(Character)
Wend
in_len = *in - @text
If in_len
*in - SizeOf(character)
If *in\c > 31
ln_cnt + 1
EndIf
EndIf
If ln_cnt
lnum_w = 2 + Log10(ln_cnt)
out = Space(in_len + lnum_w * ln_cnt)
*in = @text
*out = @out
While *in\c
*cont = *out + lnum_w
*out = *cont - 2
l = ln
While l
*out\c = $30 | (l % 10)
*out - 1
l / 10
Wend
*out = *cont
ln + 1
Repeat
*out\c = *in\c
*in + SizeOf(character)
*out + SizeOf(character)
Until *in\c = #LF Or *in\c = 0
If *in\c = #LF
*out\c = #LF
*in + SizeOf(character)
*out + SizeOf(character)
EndIf
Wend
EndIf
ProcedureReturn out
EndProcedure
Windows (x64)
Raspberry Pi OS (Arm64)
Raspberry Pi OS (Arm64)
-
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Fastest way to modify a large string, byte-by-byte?
Thanks Wilbert, but I'm pretty happy with what I've got now, so I won't compare yours for speed (sorry).wilbert wrote:Different approach but I don't know if it is faster or slower
I do have another question that came up today in my app: I need to alphabetize (sort) all the lines in text$ now. Currently I'm putting each line in an array with StringField(), calling SortArray(), then rebuilding a new sorted text$ with CopyMemoryString() in a loop. For my 725 KB text, it takes around 20 seconds. You guessed it: too long.


[Edit] I was wondering maybe there's a way to "PeekS" an array, instead of rebuilding it with CopyMemoryString?
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
PureBasic: Born in 1998 and still going strong to this very day!
Re: Fastest way to modify a large string, byte-by-byte?
Hi MachineCode,
Sorry if my suggestion is daft, but here goes:
Could you make a structure like this where the lengths of the lines are a fixed string array:
You could then treat the text as a single string or line by line using the array.
If sort still works on the array you won't have to move the text.
Sorry if my suggestion is daft, but here goes:
Could you make a structure like this where the lengths of the lines are a fixed string array:
Code: Select all
Structure XX
S$
StructureUnion
Array ss${20}(1000)
EndStructureUnion
EndStructure
If sort still works on the array you won't have to move the text.
DE AA EB
Re: Fastest way to modify a large string, byte-by-byte?
Here is an untested procedure:MachineCode wrote:Thanks Wilbert, but I'm pretty happy with what I've got now, so I won't compare yours for speed (sorry).wilbert wrote:Different approach but I don't know if it is faster or slower
I do have another question that came up today in my app: I need to alphabetize (sort) all the lines in text$ now. Currently I'm putting each line in an array with StringField(), calling SortArray(), then rebuilding a new sorted text$ with CopyMemoryString() in a loop. For my 725 KB text, it takes around 20 seconds. You guessed it: too long.I've looked hard but can't see how to apply the "Character" type to this particular situation. Any tips on how I should approach this one? Thanks.
[Edit] I was wondering maybe there's a way to "PeekS" an array, instead of rebuilding it with CopyMemoryString?
Code: Select all
#soc = SizeOf(character)
Procedure.s sortText(text$)
NewList textLines.s()
*start = @text$
*end.character = *start
While *end\c
If *end\c = #LF ;assumes that #LF is the last character of the string
*end + #soc
AddElement(textLines())
textLines() = Space(*end - *start)
CopyMemory(*start, @textLines(), *end - *start))
*start = *end
Else
*end + #soc
EndIf
Wend
SortList(textLines(), #PB_Sort_Ascending)
*start = @text$
CopyMemoryString("", @*start) ;prime the copy function
ForEach textLines()
CopyMemoryString(@textLines())
Next
ProcedureReturn text$
EndProcedure
Last edited by Demivec on Mon Mar 04, 2013 8:25 am, edited 1 time in total.
Re: Fastest way to modify a large string, byte-by-byte?
The following code is not the best one, but it took 47-63ms(Unicode) on my PC.MachineCode wrote:...... alphabetize (sort) all the lines in text$ ......
Windows only.
Code: Select all
ImportC "msvcrt.lib"
CompilerIf #PB_Compiler_Unicode
StringField2(*string, *seps) As "_wcstok"
CompilerElse
StringField2(*string, *seps) As "_strtok"
CompilerEndIf
EndImport
CompilerIf #PB_Compiler_Version < 510
CompilerError "Use PB 5.10+"
CompilerEndIf
Procedure.s SortTextLines(text$)
Protected NewList Lines.s()
Protected Seps$ = #CRLF$
Protected Token, Size, *Buffer, *Pointer
If text$
Token = StringField2(@text$, @Seps$)
While Token
If AddElement(Lines())
Lines() = PeekS(Token)
Size + StringByteLength(Lines()) + 2 * SizeOf(Character)
EndIf
Token = StringField2(0, @Seps$)
Wend
SortList(Lines(), #PB_Sort_Ascending)
;line numbering.
; Protected Num, NumLength
; NumLength = Int(Log10(ListSize(Lines()))) + 1
; ForEach Lines()
; Num + 1
; Lines() = RSet(Str(Num), NumLength) + " " + Lines()
; Size + StringByteLength(Lines()) + 2 * SizeOf(Character)
; Next
*Buffer = AllocateMemory(Size + SizeOf(Character), #PB_Memory_NoClear)
If *Buffer
*Pointer = *Buffer
CopyMemoryString("", @*Pointer)
ForEach Lines()
CopyMemoryString(Lines() + #CRLF$)
Next
text$ = PeekS(*Buffer)
FreeMemory(*Buffer)
EndIf
EndIf
ProcedureReturn text$
EndProcedure
*Buffer = AllocateMemory(6000000)
*Pointer = *Buffer
CopyMemoryString("", @*Pointer)
For i = 1 To 60000
CopyMemoryString(Chr(Random(126, 33)) + Chr(Random(126, 33)) + RSet(Str(Random(999999)), 6) + " - test string - " + Chr(Random(126, 33)) + #CRLF$)
Next
L1 = MemoryStringLength(*Buffer)
t$ = PeekS(*Buffer)
MessageRequester("Original", PeekS(@t$, 500))
t = ElapsedMilliseconds()
t$ = SortTextLines(t$)
t = ElapsedMilliseconds() - t
L2 = Len(t$)
MessageRequester("Sorted", PeekS(@t$, 500))
MessageRequester("Result", Str(t) + " ms" + #CR$ + #CR$ + "Length:" + #CR$ + "ori " + Str(L1) + #CR$ + "sorted " + Str(L2))
---------------------------
Result
---------------------------
47 ms
Length:
ori 1680000
sorted 1680000
---------------------------
-
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Fastest way to modify a large string, byte-by-byte?
I'm now using PureBasic 5.10 (because FindString is twice as fast, and you all know I need speed for my text app).
Demivec, your code crashes with IMA at: "SortList(textLines(), #PB_Sort_Ascending)"
Breeze4me, your code is perfect! Sorts my 725 KB text in around 30 ms.
Damn I love these forums!
Demivec, your code crashes with IMA at: "SortList(textLines(), #PB_Sort_Ascending)"
Breeze4me, your code is perfect! Sorts my 725 KB text in around 30 ms.

Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
PureBasic: Born in 1998 and still going strong to this very day!
Re: Fastest way to modify a large string, byte-by-byte?
At the moment I'm trying to determine if this reveals a bug or a misunderstanding.MachineCode wrote:Demivec, your code crashes with IMA at: "SortList(textLines(), #PB_Sort_Ascending)"

Because I like the puzzle though, here is a slightly altered version that will at least give breeze4me a run for his money. Please test it with your actual 725 KB text to see how it does on real-world data.
Code: Select all
#soc = SizeOf(character)
;---------
Procedure.s sortText(text$)
NewList *textLines.string()
otext$ = text$ ;This is simply to reserve an output buffer of the same size as the original string
*start.character = @text$
*end.character = *start
;setup pointers to each substring
While *end\c
If *end\c = #LF ;assumes that #LF is the last character of each substring
*end\c = #Null ;replace #LF temporarily to mark it as a substring
*end + #soc
AddElement(*textLines())
*textLines() = *start ;simply point to the string segment without copying it
*start = *end
Else
*end + #soc
EndIf
Wend
SortStructuredList(*textLines(), #PB_Sort_Ascending, 0, #PB_String)
;assemble the output string from the pointers to the sorted substrings
*start = @otext$
CopyMemoryString("", @*start) ;prime the copy function
ForEach *textLines()
CopyMemoryString(*textLines())
*start\c = #LF ;restore the #LF and 'unmark' the substring
*start + 1
Next
ProcedureReturn otext$
EndProcedure
-
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Fastest way to modify a large string, byte-by-byte?
Misunderstanding? No, it was a direct copy-paste of your original code and when run, it crashed with that line. This was after I removed an extra closing bracket from your original code too.Demivec wrote:At the moment I'm trying to determine if this reveals a bug or a misunderstanding.![]()
Because I like the puzzle though, here is a slightly altered version that will at least give breeze4me a run for his money. Please test it with your actual 725 KB text to see how it does on real-world data.


Anyway, I tried your new code against breeze4me and it seems just as fast, but yours ends up having lots of blank lines at the top, because they're obviously being sorted too from all the carriage-returns in my text. Breeze's doesn't include them for some reason? If you can omit them, that'd be better because I hate "Import" in my code because I have no idea what it's doing and I hate having code that I can't understand (even if it works).
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
PureBasic: Born in 1998 and still going strong to this very day!