
Fastest way to modify a large string, byte-by-byte?
-
- Addict
- Posts: 1482
- Joined: Tue Feb 22, 2011 1:16 pm
Re: Fastest way to modify a large string, byte-by-byte?
Demivec, you're fantastic!
Where can I donate to you for showing me the light?

Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
Re: Fastest way to modify a large string, byte-by-byte?
Demivec's procedure, optimized a bit for speed:
Code: Select all
Procedure.s InitialCaps(text.s)
  Protected *b.Character = @text, cur, wordStarted = #False
  cur = *b\c
  While cur
    Select cur
      Case 97 To 122
        If Not wordStarted
          *b\c = cur - 32
          wordStarted = #True
        EndIf
      Case 65 To 90
        If wordStarted
          *b\c = cur + 32
        Else
          wordStarted = #True
        EndIf
      Case 39, '-', '_' ;apostrophe, hyphen and underscore treated as part of word
      Default
        wordStarted = #False
    EndSelect
    *b + SizeOf(Character)
    cur = *b\c
  Wend
  ProcedureReturn text
EndProcedure
Windows (x64)
Raspberry Pi OS (Arm64)
Re: Fastest way to modify a large string, byte-by-byte?
MachineCode wrote: Demivec, you're fantastic! Where can I donate to you for showing me the light?
@MachineCode: You're welcome. It isn't necessary to compensate me, but I do accept donations.

@wilbert: Thanks for completing the last bit of optimization. You saved me a step. The optimization removes the multiplication involved behind the scenes with using the index to access the characters of the string.
My benchmark marks the improvement at about 33% faster.
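For anyone who hasn't seen the indexed version, here is a minimal sketch of the difference between the two access styles. It is not the code from earlier in the thread: the CharArray structure and both CountSpaces procedures are made up for illustration. The indexed form has to work out the element address from the index on every access, while the pointer form just adds SizeOf(Character) once per character.

```purebasic
; Hypothetical structure: a zero-sized static array lets a pointer index
; into the string's characters without a fixed upper bound.
Structure CharArray
  c.c[0]
EndStructure

; Indexed access: each *t\c[i] derives the address from the index.
Procedure CountSpacesIndexed(text$)
  Protected *t.CharArray, i, count
  If text$ = "" : ProcedureReturn 0 : EndIf  ; avoid dereferencing an empty string
  *t = @text$
  While *t\c[i]
    If *t\c[i] = ' ' : count + 1 : EndIf
    i + 1
  Wend
  ProcedureReturn count
EndProcedure

; Pointer stepping: the pointer itself moves forward one character at a time.
Procedure CountSpacesPointer(text$)
  Protected *p.Character, count
  If text$ = "" : ProcedureReturn 0 : EndIf
  *p = @text$
  While *p\c
    If *p\c = ' ' : count + 1 : EndIf
    *p + SizeOf(Character)
  Wend
  ProcedureReturn count
EndProcedure

Debug CountSpacesIndexed("one two three")
Debug CountSpacesPointer("one two three")
```

Both procedures should report the same count; only the way they walk the string differs.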
@Edit: removed email address.
Last edited by Demivec on Thu Apr 25, 2013 2:55 pm, edited 1 time in total.
Re: Fastest way to modify a large string, byte-by-byte?
Edited code in this post because this first comment is wrong: You can also optimize further by first converting text$ to LCase() (do it before "While") and then totally removing the "Case 65 To 90" section. Remember, we're not interested in checking for 65 to 90 because everything is lowercase (97 to 122) except for what we manually change to uppercase.
Demivec: I've opened a PayPal account but it needs two business days to verify my bank account... will make a donation next week.
I can do one for wilbert too if you like? It won't be much each (probably just $10) but still, it's saved me a lot of speed hassles for my app.
Here's my own personal version that I'm using now (based on both your codes). I've also changed "97 To 122" to "'a' to 'z'" for readability, and compacted some lines and variables (it's just what I do). Also removed "Protected" as I don't need it and I'm sure it adds overhead (no matter how small) to my final exe.
Code: Select all
Procedure.s InitialCaps(text$)
  *b.Character=@text$
  cur=*b\c
  While cur
    Select cur
      Case 'a' To 'z' : If Not nextword : *b\c=cur-32 : nextword=1 : EndIf
      Case 'A' To 'Z' : If nextword : *b\c=cur+32 : Else : nextword=1 : EndIf
      Case 39 ; Ignore apostrophes as they're considered part of a word.
      Default : nextword=0
    EndSelect
    *b+SizeOf(Character)
    cur=*b\c
  Wend
  ProcedureReturn text$
EndProcedure
Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")

Last edited by MachineCode on Sat Mar 02, 2013 3:38 am, edited 1 time in total.
Re: Fastest way to modify a large string, byte-by-byte?
MachineCode wrote: You can also optimize further by first converting text$ to LCase() (do it before "While") and then totally removing the "Case 65 To 90" section. Remember, we're not interested in checking for 65 to 90 because everything is lowercase (97 to 122) except for what we manually change to uppercase.
The LCase() part actually increases the time. With LCase() you are going through the string twice. You end up looking at every character twice, once to convert it to lowercase and once to possibly convert it to uppercase. With the previous optimization you only looked at each character once and either left it the same or changed the case appropriately.
I thought you would be optimizing for time. If you are looking only for simplicity I guess your changes will be best.
Re: Fastest way to modify a large string, byte-by-byte?
Demivec wrote: I thought you would be optimizing for time.
I am, but I didn't realise LCase() would increase it.


Another question regarding optimization: should we be calculating SizeOf() over and over inside the While/Wend loop? This seems to work just fine, where it's calculated just once outside the loop:
Code: Select all
Procedure.s InitialCaps(text$)
  *b.Character=@text$
  s=SizeOf(Character)
  cur=*b\c
  While cur
    Select cur
      Case 'a' To 'z' : If Not nextword : *b\c=cur-32 : nextword=1 : EndIf
      Case 'A' To 'Z' : If nextword : *b\c=cur+32 : Else : nextword=1 : EndIf
      Case 39 ; Ignore apostrophes as they're considered part of a word.
      Default : nextword=0
    EndSelect
    *b+s : cur=*b\c
  Wend
  ProcedureReturn text$
EndProcedure
Last edited by MachineCode on Sat Mar 02, 2013 2:30 pm, edited 1 time in total.
Re: Fastest way to modify a large string, byte-by-byte?
MachineCode wrote: Another question regarding optimization: should we be calculating SizeOf() over and over inside the While/Wend loop?
SizeOf() is a compiler function. It is resolved at compile time, not executed on each pass through the loop, so there's no need to put the result inside a variable.
Code: Select all
Procedure.s DigitsOnly(text$)
  *b.Character=@text$
  cur=*b\c
  While cur
    If cur>47 And cur<58
      n$+Chr(cur)
    EndIf
    *b+SizeOf(Character)
    cur=*b\c
  Wend
  ProcedureReturn n$
EndProcedure
This should be faster for DigitsOnly(), but the approach only works when the result can never be longer than the input, because it is written back over the same buffer:
Code: Select all
Procedure.s DigitsOnly(text$)
  Protected *in.Character = @text$
  Protected *out.Character = *in
  While *in\c
    If *in\c ! $30 < 10
      *out\c = *in\c
      *out + SizeOf(Character)
    EndIf
    *in + SizeOf(Character)
  Wend
  *out\c = 0
  ProcedureReturn text$
EndProcedure
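In case the test `*in\c ! $30 < 10` looks odd: XOR with $30 turns the codes for '0' to '9' ($30 to $39) into 0 to 9, and pushes every other character code to 10 or above, so one comparison replaces the two-sided range check. A quick sketch to confirm that, with variable names chosen only for this example:

```purebasic
; Compare the XOR test against the plain range test for every 16-bit code.
Define c, viaXor, viaRange, mismatches

For c = 0 To 65535
  viaXor = #False
  viaRange = #False
  If (c ! $30) < 10 : viaXor = #True : EndIf          ; the single-comparison test
  If c >= '0' And c <= '9' : viaRange = #True : EndIf ; the ordinary range test
  If viaXor <> viaRange : mismatches + 1 : EndIf
Next

Debug mismatches ; 0 means both tests agree everywhere
```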
Code: Select all
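Procedure.s InitialCaps(text.s)
  Protected *in.Character = @text
  Protected lc, msk = $ffdf          ; $ffdf clears bit 5, forcing upper case
  While *in\c
    lc = *in\c | $20                 ; set bit 5: 'A'..'Z' become 'a'..'z'
    If ((lc - 1) ! $60) < 26         ; true only when lc is 'a'..'z'
      *in\c = lc & msk               ; upper case at word start, else lower case
      msk = $ffff                    ; inside a word: keep letters lower case
    ElseIf *in\c <> 39               ; apostrophe (39) stays part of the word
      msk = $ffdf                    ; any other character ends the word
    EndIf
    *in + SizeOf(Character)
  Wend
  ProcedureReturn text
EndProcedure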
Re: Fastest way to modify a large string, byte-by-byte?
Wilbert, thanks for showing me how to create a new string using the "Character" type. 
One more question regarding this thread: would it be even faster still to perhaps just use regular expressions to modify text$? I'm no good with them, though, so it'd be harder for me to implement. But I'm thinking maybe a regex to change all to initial caps might be faster than a While/Wend loop?
Re: Fastest way to modify a large string, byte-by-byte?
MachineCode wrote: One more question regarding this thread: would it be even faster still to perhaps just use regular expressions to modify text$? I'm no good with them, though, so it'd be harder for me to implement. But I'm thinking maybe a regex to change all to initial caps might be faster than a While/Wend loop?
Noooooo. Regexes are very slow. PB code will always beat them.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
Re: Fastest way to modify a large string, byte-by-byte?
Thanks for replying. I had no idea they were slow.
Re: Fastest way to modify a large string, byte-by-byte?
One final plea for help (and once I get this sorted, I should be set -- because I've already learned how to change same-size strings and longer-to-shorter strings). 
This last time, I need to learn how to quickly enlarge an existing string with new data. In the particular example below, I want to put line numbers (and a space) in front of each line in the existing string. I think my code is almost there, but I don't know how to get the new larger result.
Please help me, because once this is done, you'll have taught me to fish, instead of feeding me a fish.
Code: Select all
; Output in MessageRequester is to be:
;  8 eight
;  9 nine
; 10 ten
; With the above specific alignment.
text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
width=2 : lines=3
soc=SizeOf(Character)
*i.Character=@text$
*o.Character=AllocateMemory(999)
n=8 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next
While *i\c
  *o\c=*i\c : *o+soc
  If *i\c=#LF ; End of current line.
    n+1 : s$=RSet(Str(n),width," ")+" "
    For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next
  EndIf
  *i+soc
Wend
*o\c=0
MessageRequester("Result",text$) ; Shows original string.
MessageRequester("Result",PeekS(*o)) ; Shows null.
This works, but I fear that using n$ will slow it down:
Code: Select all
text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
width=2 : lines=3
soc=SizeOf(Character)
*i.Character=@text$
n=8 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : n$+Mid(s$,a,1) : Next
While *i\c
  n$+Chr(*i\c)
  If *i\c=#LF ; End of current line.
    n+1 : s$=RSet(Str(n),width," ")+" "
    For a=1 To Len(s$) : n$+Mid(s$,a,1) : Next
  EndIf
  *i+soc
Wend
MessageRequester("Result",n$)
Last edited by MachineCode on Sun Mar 03, 2013 5:42 am, edited 1 time in total.
Re: Fastest way to modify a large string, byte-by-byte?
Code: Select all
EnableExplicit
Global text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
Global i,c=CountString(text$,#CRLF$)
Global r$
For i=1 To c
  If StringField(text$,i,#CRLF$)
    r$+RSet(Str(i),2,"0")+" "+StringField(text$,i,#CRLF$)+#CRLF$
  EndIf
Next
MessageRequester("Test",r$)
----------------------------------------------------------------------------
Commenting your own code is admitting you don't understand it.
----------------------------------------------------------------------------
Re: Fastest way to modify a large string, byte-by-byte?
Psych, I tried that approach first but it's too slow on a 725 KB file (Win32api.txt). It takes over 30 seconds to complete. Hence the need to use the approach above with "Character" types, because all other such "Character" examples in this thread just rip through that 725 KB file in less than 50 ms. 

Re: Fastest way to modify a large string, byte-by-byte?
I did think it might be a performance issue. I'll rethink.
Give me 3 mins, see if the next one is any quicker.
Re: Fastest way to modify a large string, byte-by-byte?
Try this. The last routine used StringField(), which scans the text again from the start on every call. This version should be faster and shouldn't be far off a custom string search routine.
Code: Select all
EnableExplicit
Global text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
Global pos=1,line,f,r$
Repeat
  f=FindString(text$,#CRLF$,pos)
  If f
    line+1
    r$+RSet(Str(line),2,"0")+" "+Mid(text$,pos,f-pos+2)
    pos=f+2
  EndIf
Until Not f
MessageRequester("Test",r$)
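On the "Character" pointer attempt further up: PeekS(*o) shows nothing because *o has already been advanced to the terminating null by the time it is read, and text$ is unchanged because the output went to a separate buffer. Keeping the buffer's start address in its own variable is enough to get the result back. Below is a minimal sketch of that idea, not a tuned implementation. NumberLines() and its parameters are names made up for this example, and the buffer is sized from the line count instead of a fixed 999 bytes.

```purebasic
; Sketch: prefix every line with a right-aligned number, writing into a
; buffer allocated up front. NumberLines() is a hypothetical helper.
Procedure.s NumberLines(text$, first, width)
  Protected soc = SizeOf(Character)
  Protected lines, *buf, *i.Character, *o.Character
  Protected n = first, atLineStart = #True, prefix$, result$

  If text$ = "" : ProcedureReturn "" : EndIf   ; nothing to walk through

  ; Upper bound: each line gains at most width + 1 characters,
  ; because RSet() pads or truncates to exactly width.
  lines = CountString(text$, #LF$) + 1
  *buf = AllocateMemory((Len(text$) + lines * (width + 1) + 1) * soc)
  If *buf = 0 : ProcedureReturn "" : EndIf

  *i = @text$
  *o = *buf                                    ; *buf keeps the start address
  While *i\c
    If atLineStart                             ; write the number before the line
      prefix$ = RSet(Str(n), width, " ") + " "
      PokeS(*o, prefix$)
      *o + Len(prefix$) * soc
      n + 1
      atLineStart = #False
    EndIf
    *o\c = *i\c                                ; copy the current character
    *o + soc
    If *i\c = #LF : atLineStart = #True : EndIf
    *i + soc
  Wend
  *o\c = 0                                     ; terminate the output

  result$ = PeekS(*buf)                        ; read from the start, not from *o
  FreeMemory(*buf)
  ProcedureReturn result$
EndProcedure

Define sample$ = "eight" + #CRLF$ + "nine" + #CRLF$ + "ten" + #CRLF$
Debug NumberLines(sample$, 8, 2)
```

Unlike the original attempt, this version only writes a number when another line actually follows, so a trailing line break does not produce a dangling number at the end.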