Page 3 of 4
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 6:12 am
by MachineCode
Faster, but still unacceptably slow (about 20 seconds). Thanks for trying, but I really want to see a "Character" version to complete the other two "Character" versions that I've learned.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 6:19 am
by Danilo
MachineCode wrote:One final plea for help (and once I get this sorted, I should be set -- because I've already learned how to change same-size strings and longer-to-shorter strings).
This last time, I need to learn how to quickly enlarge an existing string with new data. In the particular example below, I want to put line numbers (and a space) in front of each line in the existing string. I think my code is almost there, but I don't know how to get the new larger result.
Please help me, because once this is done, you'll have taught me to fish, instead of feeding me a fish.
Code: Select all
; Output in MessageRequester is to be:
; 8 eight
; 9 nine
; 10 ten
; With the above specific alignment.
text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
width=2 : lines=3
soc=SizeOf(Character)
*i.Character=@text$
*o.Character=AllocateMemory(999)
n=8 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next
While *i\c
*o\c=*i\c : *o+soc
If *i\c=#LF ; End of current line.
n+1 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next
EndIf
*i+soc
Wend
*o\c=0
MessageRequester("Result",text$) ; Shows original string.
MessageRequester("Result",PeekS(*o)) ; Shows null.
Good try! It shows an empty string because you increment *o in your loop. Save *o and use the saved value (original) for PeekS().
sizeOf(Character) is a constant value, so you can change soc to #soc everywhere. Same for width in this case.
Could be replaced with one PokeS():
Code: Select all
For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+#soc : Next
Code: Select all
; Output in MessageRequester is to be:
; 8 eight
; 9 nine
; 10 ten
; With the above specific alignment.
text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
#width=2 : lines=3
#soc=SizeOf(Character)
*i.Character=@text$
CompilerIf #PB_Compiler_Version > 500
*o.Character=AllocateMemory(999,#PB_Memory_NoClear) ; faster memory alloc
CompilerElse
*o.Character=AllocateMemory(999) ; slower memory alloc
CompilerEndIf
*o_orig = *o ; save original memory pointer
n=8 : s$=RSet(Str(n),#width," ")+" "
PokeS(*o,s$) : *o+Len(s$)*#soc
While *i\c
*o\c=*i\c : *o+#soc
If *i\c=#LF ; End of current line.
n+1 : s$=RSet(Str(n),#width," ")+" "
PokeS(*o,s$) : *o+Len(s$)*#soc ; use PokeS
EndIf
*i+#soc
Wend
*o\c=0
MessageRequester("Result",text$) ; Shows original string.
MessageRequester("Result",PeekS(*o_orig)) ; Shows new string
FreeMemory(*o_orig) ; free saved memory pointer
Be aware that RSet with a #width of 2 makes the line numbers max. 2 character in width! '999' and '9999' will get truncated to '99'!
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 6:32 am
by Psych
This is the quickest way I could think of using pointers, I am assuming you are ok with using pbs string concatenation.
You could I guess insert the characters into a memory block on the fly (stopping where you find the #crlf pair and inserting your line text) and then just peeks the memory afterwards, however there is no real way to know how many #crlf pairs there are, so you'd have to guess the memory size (or just allocate based on a formula assuming the file contains ALL #crlf pairs). Other than that this is the best I can do.
Code: Select all
Global line,*p.Character=@text$,*f=*p
While *p\c
If *p\c=#CR
*p+SizeOf(Character)
If *p\c=#LF
line+1
*p+SizeOf(Character)
r$+RSet(Str(line),2,"0")+" "+PeekS(*f,*p-*f)
*f=*p
EndIf
Else
*p+SizeOf(Character)
EndIf
Wend
MessageRequester("Test",r$)
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 6:38 am
by Psych
Are you saying you wish the parse the numbers in between the #crlf pairs and output the parsed value as a line number?
So "one"+#crlf$+"twenty"+#crlf$+"thirty"+#crlf$+"forty"+#crlf$+"fifty"
Would output
01 one
20 twenty
30 thirty
40 forty
50 fifty
In your message requester?
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 6:43 am
by MachineCode
Danilo, that's the stuff! Does what I want in about 30 ms on my 725 KB text. THANK YOU!
Also, I knew that about the "width" var; I only set it to 2 for my example. For my 725 KB text string it's actually set to 5 due to the number of lines being over 60000.
[Edit] Psych, I just wanted to insert line numbers in front of every line of text$. All examples (except Danilo's) took over 20 seconds to do it, which is unacceptable. His version does it in less than a quarter of a second.

Amazing. I am very grateful for all the lessons I learned in this thread. It's been a great topic to cover: the fast manipulation of a massive string.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 6:52 am
by Psych
It's worth pointing out that danillos example produced an extra line number with empty text when I tested it (unless this behaviour doesnt matter).
I guess PBs string concatenation was the slowing factor after all.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 6:55 am
by Danilo
MachineCode wrote:Danilo, that's the stuff! Does what I want in about 30 ms on my 725 KB text. THANK YOU!
Also, I knew that about the "width" var; I only set it to 2 for my example. For my 725 KB text string it's actually set to 5 due to the number of lines being over 60000.

It is your code, you wrote it. The only (common) mistake was you didn't save the original pointer *o. I made the same mistake before, so I always remember it now.
Just remembered you still use PB 5.00, so I changed the code little bit:
Code: Select all
CompilerIf #PB_Compiler_Version > 500
*o.Character=AllocateMemory(999,#PB_Memory_NoClear) ; faster memory alloc
CompilerElse
*o.Character=AllocateMemory(999) ; slower memory alloc
CompilerEndIf
Memory allocation without setting it to zero is faster, but it was just introduced with PB 5.10.
psych wrote:It's worth pointing out that danillos example produced an extra line number with empty text when I tested it (unless this behaviour doesnt matter).
I guess PBs string concatenation was the slowing factor after all.
It is correct and happens because text$ has an +#CRLF at end. Remove the last #CRLF and there is no extra line number added.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 8:03 am
by wilbert
Different approach but I don't know if it is faster or slower
Code: Select all
Procedure.s InsertLineNumbers(text.s)
Protected *in.Character = @text
Protected out.s, *cont, *out.Character
Protected in_len, ln_cnt, lnum_w, l, ln = 1
While *in\c
If *in\c = #LF
ln_cnt + 1
EndIf
*in + SizeOf(Character)
Wend
in_len = *in - @text
If in_len
*in - SizeOf(character)
If *in\c > 31
ln_cnt + 1
EndIf
EndIf
If ln_cnt
lnum_w = 2 + Log10(ln_cnt)
out = Space(in_len + lnum_w * ln_cnt)
*in = @text
*out = @out
While *in\c
*cont = *out + lnum_w
*out = *cont - 2
l = ln
While l
*out\c = $30 | (l % 10)
*out - 1
l / 10
Wend
*out = *cont
ln + 1
Repeat
*out\c = *in\c
*in + SizeOf(character)
*out + SizeOf(character)
Until *in\c = #LF Or *in\c = 0
If *in\c = #LF
*out\c = #LF
*in + SizeOf(character)
*out + SizeOf(character)
EndIf
Wend
EndIf
ProcedureReturn out
EndProcedure
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 3:26 pm
by MachineCode
wilbert wrote:Different approach but I don't know if it is faster or slower
Thanks Wilbert, but I'm pretty happy with what I've got now, so I won't compare yours for speed (sorry).
I do have another question that came up today in my app: I need to alphabetize (sort) all the lines in text$ now. Currently I'm putting each line in an array with StringField(), calling SortArray(), then rebuilding a new sorted text$ with CopyMemoryString() in a loop. For my 725 KB text, it takes around 20 seconds. You guessed it: too long.

I've looked hard but can't see how to apply the "Character" type to this particular situation. Any tips on how I should approach this one? Thanks.
[Edit] I was wondering maybe there's a way to "PeekS" an array, instead of rebuilding it with CopyMemoryString?
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 5:25 pm
by davido
Hi MachineCode,
Sorry if my suggestion is daft, but here goes:
Could you make a structure like this where the lengths of the lines are a fixed string array:
Code: Select all
Structure XX
S$
StructureUnion
Array ss${20}(1000)
EndStructureUnion
EndStructure
You could then treat the text as a single string or line by line using the array.
If sort still works on the array you won't have to move the text.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Sun Mar 03, 2013 5:49 pm
by Demivec
MachineCode wrote:wilbert wrote:Different approach but I don't know if it is faster or slower
Thanks Wilbert, but I'm pretty happy with what I've got now, so I won't compare yours for speed (sorry).
I do have another question that came up today in my app: I need to alphabetize (sort) all the lines in text$ now. Currently I'm putting each line in an array with StringField(), calling SortArray(), then rebuilding a new sorted text$ with CopyMemoryString() in a loop. For my 725 KB text, it takes around 20 seconds. You guessed it: too long.

I've looked hard but can't see how to apply the "Character" type to this particular situation. Any tips on how I should approach this one? Thanks.
[Edit] I was wondering maybe there's a way to "PeekS" an array, instead of rebuilding it with CopyMemoryString?
Here is an untested procedure:
Code: Select all
#soc = SizeOf(character)
Procedure.s sortText(text$)
NewList textLines.s()
*start = @text$
*end.character = *start
While *end\c
If *end\c = #LF ;assumes that #LF is the last character of the string
*end + #soc
AddElement(textLines())
textLines() = Space(*end - *start)
CopyMemory(*start, @textLines(), *end - *start))
*start = *end
Else
*end + #soc
EndIf
Wend
SortList(textLines(), #PB_Sort_Ascending)
*start = @text$
CopyMemoryString("", @*start) ;prime the copy function
ForEach textLines()
CopyMemoryString(@textLines())
Next
ProcedureReturn text$
EndProcedure
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Mon Mar 04, 2013 12:25 am
by breeze4me
MachineCode wrote:...... alphabetize (sort) all the lines in text$ ......
The following code is not the best one, but it took 47-63ms(Unicode) on my PC.
Windows only.
Code: Select all
ImportC "msvcrt.lib"
CompilerIf #PB_Compiler_Unicode
StringField2(*string, *seps) As "_wcstok"
CompilerElse
StringField2(*string, *seps) As "_strtok"
CompilerEndIf
EndImport
CompilerIf #PB_Compiler_Version < 510
CompilerError "Use PB 5.10+"
CompilerEndIf
Procedure.s SortTextLines(text$)
Protected NewList Lines.s()
Protected Seps$ = #CRLF$
Protected Token, Size, *Buffer, *Pointer
If text$
Token = StringField2(@text$, @Seps$)
While Token
If AddElement(Lines())
Lines() = PeekS(Token)
Size + StringByteLength(Lines()) + 2 * SizeOf(Character)
EndIf
Token = StringField2(0, @Seps$)
Wend
SortList(Lines(), #PB_Sort_Ascending)
;line numbering.
; Protected Num, NumLength
; NumLength = Int(Log10(ListSize(Lines()))) + 1
; ForEach Lines()
; Num + 1
; Lines() = RSet(Str(Num), NumLength) + " " + Lines()
; Size + StringByteLength(Lines()) + 2 * SizeOf(Character)
; Next
*Buffer = AllocateMemory(Size + SizeOf(Character), #PB_Memory_NoClear)
If *Buffer
*Pointer = *Buffer
CopyMemoryString("", @*Pointer)
ForEach Lines()
CopyMemoryString(Lines() + #CRLF$)
Next
text$ = PeekS(*Buffer)
FreeMemory(*Buffer)
EndIf
EndIf
ProcedureReturn text$
EndProcedure
*Buffer = AllocateMemory(6000000)
*Pointer = *Buffer
CopyMemoryString("", @*Pointer)
For i = 1 To 60000
CopyMemoryString(Chr(Random(126, 33)) + Chr(Random(126, 33)) + RSet(Str(Random(999999)), 6) + " - test string - " + Chr(Random(126, 33)) + #CRLF$)
Next
L1 = MemoryStringLength(*Buffer)
t$ = PeekS(*Buffer)
MessageRequester("Original", PeekS(@t$, 500))
t = ElapsedMilliseconds()
t$ = SortTextLines(t$)
t = ElapsedMilliseconds() - t
L2 = Len(t$)
MessageRequester("Sorted", PeekS(@t$, 500))
MessageRequester("Result", Str(t) + " ms" + #CR$ + #CR$ + "Length:" + #CR$ + "ori " + Str(L1) + #CR$ + "sorted " + Str(L2))
---------------------------
Result
---------------------------
47 ms
Length:
ori 1680000
sorted 1680000
---------------------------
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Mon Mar 04, 2013 11:33 am
by MachineCode
I'm now using PureBasic 5.10 (because FindString is twice as fast, and you all know I need speed for my text app).
Demivec, your code crashes with IMA at: "SortList(textLines(), #PB_Sort_Ascending)"
Breeze4me, your code is perfect! Sorts my 725 KB text in around 30 ms.

Damn I love these forums!
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Mon Mar 04, 2013 6:38 pm
by Demivec
MachineCode wrote:Demivec, your code crashes with IMA at: "SortList(textLines(), #PB_Sort_Ascending)"
At the moment I'm trying to determine if this reveals a bug or a misunderstanding.
Because I like the puzzle though, here is a slightly altered version that will at least give breeze4me a run for his money. Please test it with your actual 725 KB text to see how it does on real-world data.
Code: Select all
#soc = SizeOf(character)
;---------
Procedure.s sortText(text$)
NewList *textLines.string()
otext$ = text$ ;This is simply to reserve an output buffer of the same size as the original string
*start.character = @text$
*end.character = *start
;setup pointers to each substring
While *end\c
If *end\c = #LF ;assumes that #LF is the last character of each substring
*end\c = #Null ;replace #LF temporarily to mark it as a substring
*end + #soc
AddElement(*textLines())
*textLines() = *start ;simply point to the string segment without copying it
*start = *end
Else
*end + #soc
EndIf
Wend
SortStructuredList(*textLines(), #PB_Sort_Ascending, 0, #PB_String)
;assemble the output string from the pointers to the sorted substrings
*start = @otext$
CopyMemoryString("", @*start) ;prime the copy function
ForEach *textLines()
CopyMemoryString(*textLines())
*start\c = #LF ;restore the #LF and 'unmark' the substring
*start + 1
Next
ProcedureReturn otext$
EndProcedure
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Tue Mar 05, 2013 1:45 pm
by MachineCode
Demivec wrote:At the moment I'm trying to determine if this reveals a bug or a misunderstanding.
Because I like the puzzle though, here is a slightly altered version that will at least give breeze4me a run for his money. Please test it with your actual 725 KB text to see how it does on real-world data.
Misunderstanding? No, it was a direct copy-paste of your original code and when run, it crashed with that line. This was after I removed an extra closing bracket from your original code too.

Yes, you noticed that after.
Anyway, I tried your new code against breeze4me and it seems just as fast, but yours ends up having lots of blank lines at the top, because they're obviously being sorted too from all the carriage-returns in my text. Breeze's doesn't include them for some reason? If you can omit them, that'd be better because I hate "Import" in my code because I have no idea what it's doing and I hate having code that I can't understand (even if it works).