Re: Fastest way to modify a large string, byte-by-byte?
Posted: Fri Mar 01, 2013 5:34 am
Demivec, you're fantastic!
Where can I donate to you for showing me the light?

Code: Select all
Procedure.s InitialCaps(text.s)
  Protected *b.Character = @text, cur, wordStarted = #False
  cur = *b\c
  While cur
    Select cur
      Case 97 To 122 ;'a' to 'z'
        If Not wordStarted
          *b\c = cur - 32
          wordStarted = #True
        EndIf
      Case 65 To 90 ;'A' to 'Z'
        If wordStarted
          *b\c = cur + 32
        Else
          wordStarted = #True
        EndIf
      Case 39, '-', '_' ;apostrophe, hyphen and underscore treated as part of word
      Default
        wordStarted = #False
    EndSelect
    *b + SizeOf(Character)
    cur = *b\c
  Wend
  ProcedureReturn text
EndProcedure
MachineCode wrote: Demivec, you're fantastic! Where can I donate to you for showing me the light?
@MachineCode: You're welcome. It isn't necessary to compensate me, but I do accept donations.
Code: Select all
Procedure.s InitialCaps(text$)
  *b.Character=@text$
  cur=*b\c
  While cur
    Select cur
      Case 'a' To 'z' : If Not nextword : *b\c=cur-32 : nextword=1 : EndIf
      Case 'A' To 'Z' : If nextword : *b\c=cur+32 : Else : nextword=1 : EndIf
      Case 39 ; Ignore apostrophes as they're considered part of a word.
      Default : nextword=0
    EndSelect
    *b+SizeOf(Character)
    cur=*b\c
  Wend
  ProcedureReturn text$
EndProcedure
Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")
MachineCode wrote: You can also optimize further by first converting text$ to LCase() (do it before "While") and then totally removing the "Case 65 To 90" section. Remember, we're not interested in checking for 65 to 90 because everything is lowercase (97 to 122) except for what we manually change to uppercase.
The LCase() part actually increases the time. With LCase() you go through the string twice: once to convert every character to lowercase, and once more to possibly convert some of them to uppercase. With the previous optimization you looked at each character only once and either left it alone or changed its case as needed.
Demivec wrote: I thought you would be optimizing for time.
I am, but I didn't realise LCase() would increase it.
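For reference, the LCase()-first variant MachineCode describes might look roughly like the sketch below. The name InitialCapsTwoPass is made up for illustration and is not from the thread. It is only meant to make the two traversals visible: LCase() reads and rewrites every character, and the While loop then visits every character again.

```purebasic
; Hypothetical sketch of the LCase()-first idea; InitialCapsTwoPass is an invented name.
Procedure.s InitialCapsTwoPass(text$)
  Protected *b.Character, nextword = #False
  text$ = LCase(text$)              ; pass 1: LCase() walks the whole string
  *b = @text$                       ; take the address only after the reassignment
  While *b\c                        ; pass 2: walk the whole string again
    Select *b\c
      Case 'a' To 'z'
        If Not nextword
          *b\c = *b\c - 32          ; first letter of a word becomes uppercase
          nextword = #True
        EndIf
      Case 39                       ; apostrophe counts as part of a word
      Default
        nextword = #False
    EndSelect
    *b + SizeOf(Character)
  Wend
  ProcedureReturn text$
EndProcedure

Debug InitialCapsTwoPass("hELLO wORLD, it's ME") ; Hello World, It's Me
```

The uppercase branch is gone, but the saving is smaller than the cost of the extra pass that LCase() adds.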
Code: Select all
Procedure.s InitialCaps(text$)
  *b.Character=@text$
  s=SizeOf(Character)
  cur=*b\c
  While cur
    Select cur
      Case 'a' To 'z' : If Not nextword : *b\c=cur-32 : nextword=1 : EndIf
      Case 'A' To 'Z' : If nextword : *b\c=cur+32 : Else : nextword=1 : EndIf
      Case 39 ; Ignore apostrophes as they're considered part of a word.
      Default : nextword=0
    EndSelect
    *b+s : cur=*b\c
  Wend
  ProcedureReturn text$
EndProcedure
MachineCode wrote: Another question regarding optimization: should we be calculating SizeOf() over and over inside the While/Wend loop?
SizeOf() is a compiler function. It is not a function that is executed each time. There's no need to put the result inside a variable.
Code: Select all
Procedure.s DigitsOnly(text$)
  *b.Character=@text$
  cur=*b\c
  While cur
    If cur>47 And cur<58
      n$+Chr(cur)
    EndIf
    *b+SizeOf(Character)
    cur=*b\c
  Wend
  ProcedureReturn n$
EndProcedure
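On the SizeOf() question, one way to convince yourself that it is resolved at compile time is that its result can be assigned to a constant, which only accepts values the compiler already knows. A small sketch, where #CharSize and CountChars are names invented for this example:

```purebasic
; #CharSize and CountChars are hypothetical names used only for this sketch.
; A constant needs a compile-time value, so this line compiling shows that
; SizeOf() is evaluated by the compiler rather than at run time.
#CharSize = SizeOf(Character)

Procedure.i CountChars(*p.Character)
  Protected n
  While *p\c
    n + 1
    *p + SizeOf(Character) ; a fixed step; nothing is computed here per iteration
  Wend
  ProcedureReturn n
EndProcedure

text$ = "12345"
Debug CountChars(@text$) ; 5
Debug #CharSize          ; 2 in a Unicode build, 1 in an ASCII build
```

So caching the value in a variable such as s or soc does no harm, but it is not expected to make the loop any faster.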
Code: Select all
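Procedure.s DigitsOnly(text$)
  Protected *in.Character = @text$
  Protected *out.Character = *in ; write position, never ahead of *in
  While *in\c
    If *in\c ! $30 < 10 ; XOR with $30 maps '0'..'9' to 0..9, anything else to 10 or more
      *out\c = *in\c
      *out + SizeOf(Character)
    EndIf
    *in + SizeOf(Character)
  Wend
  *out\c = 0 ; terminate the shortened string
  ProcedureReturn text$
EndProcedure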
Code: Select all
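Procedure.s InitialCaps(text.s)
  Protected *in.Character = @text
  Protected lc, msk = $ffdf ; $ffdf clears bit 5, giving uppercase
  While *in\c
    lc = *in\c | $20 ; set bit 5: folds 'A'..'Z' onto 'a'..'z'
    If ((lc - 1) ! $60) < 26 ; true only for the letters a..z and A..Z
      *in\c = lc & msk
      msk = $ffff ; rest of the word keeps the lowercase form
    ElseIf *in\c <> 39 ; apostrophe stays part of the word
      msk = $ffdf ; next letter starts a new word
    EndIf
    *in + SizeOf(Character)
  Wend
  ProcedureReturn text
EndProcedure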
MachineCode wrote: One more question regarding this thread: would it be even faster still to perhaps just use regular expressions to modify text$? I'm no good with them, though, so it'd be harder for me to implement. But I'm thinking maybe a regex to change all to initial caps might be faster than a While/Wend loop?
Noooooo. Regexes are very slow. For a simple character scan like this, PB code will beat them.
Code: Select all
; Output in MessageRequester is to be:
;  8 eight
;  9 nine
; 10 ten
; With the above specific alignment.
text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
width=2 : lines=3
soc=SizeOf(Character)
*i.Character=@text$
*o.Character=AllocateMemory(999)
n=8 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next
While *i\c
  *o\c=*i\c : *o+soc
  If *i\c=#LF ; End of current line.
    n+1 : s$=RSet(Str(n),width," ")+" "
    For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next
  EndIf
  *i+soc
Wend
*o\c=0
MessageRequester("Result",text$) ; Shows original string.
MessageRequester("Result",PeekS(*o)) ; Shows null.
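The empty result from PeekS(*o) is consistent with how the pointer is used: *o is advanced after every write, so once the loop ends it points at the terminator that was just written, not at the start of the buffer. A minimal sketch of the usual remedy, keeping the base address in a separate pointer (the names *buf and src$ are made up for this example):

```purebasic
; Hypothetical minimal example: *buf keeps the start of the buffer,
; *out is the moving write position.
src$ = "abc"
*in.Character = @src$
*buf = AllocateMemory((Len(src$) + 1) * SizeOf(Character))
If *buf
  *out.Character = *buf
  While *in\c
    *out\c = *in\c
    *out + SizeOf(Character)
    *in + SizeOf(Character)
  Wend
  *out\c = 0
  Debug PeekS(*out) ; empty: *out sits on the terminator
  Debug PeekS(*buf) ; abc
  FreeMemory(*buf)
EndIf
```

Applied to the code above, that would mean storing the result of AllocateMemory(999) in its own pointer and passing that pointer to PeekS().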
Code: Select all
text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
width=2 : lines=3
soc=SizeOf(Character)
*i.Character=@text$
n=8 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : n$+Mid(s$,a,1) : Next
While *i\c
  n$+Chr(*i\c)
  If *i\c=#LF ; End of current line.
    n+1 : s$=RSet(Str(n),width," ")+" "
    For a=1 To Len(s$) : n$+Mid(s$,a,1) : Next
  EndIf
  *i+soc
Wend
MessageRequester("Result",n$)
Code: Select all
EnableExplicit
Global text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
Global i,c=CountString(text$,#CRLF$)
Global r$
For i=1 To c
  If StringField(text$,i,#CRLF$)
    r$+RSet(Str(i),2,"0")+" "+StringField(text$,i,#CRLF$)+#CRLF$
  EndIf
Next
MessageRequester("Test",r$)
Code: Select all
EnableExplicit
Global text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
Global pos=1,line,f,r$
Repeat
  f=FindString(text$,#CRLF$,pos)
  If f
    line+1
    r$+RSet(Str(line),2,"0")+" "+Mid(text$,pos,f-pos+2)
    pos=f+2
  EndIf
Until Not f
MessageRequester("Test",r$)