Fastest way to modify a large string, byte-by-byte?
Posted: Thu Feb 28, 2013 2:07 pm
by MachineCode
One of my apps needs to change a string byte-by-byte. As a rough example:
Code: Select all
a$="hal"
For p=1 To Len(a$)
b$+Chr(Asc(Mid(a$,p,1))+1)
Next
Debug b$
The above code works fine... unless a$ is something large like 500 KB in size. Then it's waaaaaay too slow. What's the best way to make it faster on large strings when modifying byte-by-byte?
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Thu Feb 28, 2013 2:18 pm
by ts-soft
Code: Select all
Structure CharArray
char.c[0]
EndStructure
a$="hal"
*b.CharArray = @a$
c = Len(a$) -1
For i = 0 To c
*b\char[i] + 1
Next
Debug a$
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Thu Feb 28, 2013 2:55 pm
by MachineCode
Thanks. But how would I apply that now to more complex byte-by-byte, like this:
Code: Select all
Procedure.s InitialCaps(text$)
s=Len(text$)
For p=1 To s
pre=Asc(LCase(Mid(text$,p-1,1))) : cur=Asc(LCase(Mid(text$,p,1)))
If pre<>39 And cur>96 And cur<123 And (p=1 Or pre<97 Or pre>122) : cur-32 : EndIf
n$+Chr(cur)
Next
ProcedureReturn n$
EndProcedure
Sorry to ask you to do the work... I tried but kept getting crashes or wrong results.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Thu Feb 28, 2013 3:02 pm
by luis
About the original question, here is another way.
If it's OK to modify the source string:
Code: Select all
a$="hal"
*c.Character = @a$
While *c\c
*c\c + 1
*c + SizeOf(Character)
Wend
Debug a$
Otherwise, write the result into a second string:
Code: Select all
a$="hal"
b$ = Space(Len(a$))
*s.Character = @a$
*d.Character = @b$
While *s\c
*d\c = *s\c + 1
*d + SizeOf(Character)
*s + SizeOf(Character)
Wend
Debug b$
In case anyone is interested, this is the generated code for the first loop:
Code: Select all
; While *c\c
_While1:
MOV ebp,dword [p_c]
CMP byte [ebp],0
JE _Wend1
; *c\c + 1
MOV ebp,dword [p_c]
MOVZX ebx,byte [ebp]
INC ebx
PUSH ebx
MOV ebp,dword [p_c]
POP eax
MOV byte [ebp],al
; *c + SizeOf(Character)
INC dword [p_c]
; Wend
JMP _While1
_Wend1:
And this is the code generated using the structured pointer, as shown in the other posts in this thread:
Code: Select all
; For i = 0 To c
MOV dword [v_i],0
_For1:
MOV eax,dword [v_c]
CMP eax,dword [v_i]
JL _Next2
; *b\char[i] + 1
MOV ebp,dword [p_b]
PUSH ebp
MOV eax,dword [v_i]
POP ebp
ADD ebp,eax
MOVZX ebx,byte [ebp]
INC ebx
PUSH ebx
MOV ebp,dword [p_b]
PUSH ebp
MOV eax,dword [v_i]
POP ebp
ADD ebp,eax
POP eax
MOV byte [ebp],al
; Next
_NextContinue2:
INC dword [v_i]
JNO _For1
_Next2:
Take your pick.
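For anyone who would rather time the two loops than count instructions, here is a minimal sketch. It is not from any poster: the constant `#Size` and the variable names are assumptions, and it is meant to be compiled with the debugger off, since the debugger inflates both timings.

```purebasic
EnableExplicit

Structure CharArray
  char.c[0]
EndStructure

#Size = 500000 ; assumption: roughly the 500 KB mentioned in the first post

Define a$ = Space(#Size), b$ = Space(#Size)
Define *c.Character, *b.CharArray
Define i, last, t1, t2

; Loop 1: walk a plain character pointer until the terminating null
t1 = ElapsedMilliseconds()
*c = @a$
While *c\c
  *c\c + 1
  *c + SizeOf(Character)
Wend
t1 = ElapsedMilliseconds() - t1

; Loop 2: index into the string through the structured pointer
t2 = ElapsedMilliseconds()
*b = @b$
last = Len(b$) - 1
For i = 0 To last
  *b\char[i] + 1
Next
t2 = ElapsedMilliseconds() - t2

MessageRequester("Result", "pointer walk: " + Str(t1) + " ms, indexed: " + Str(t2) + " ms")
```

Both loops apply the same change, so a$ and b$ should be identical afterwards. On a buffer this small the timings may be only a few milliseconds either way, so repeat the measurement before drawing conclusions.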
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Thu Feb 28, 2013 3:19 pm
by MachineCode
Okay, I've done this but it's still really slow for large strings:
Code: Select all
Procedure.s InitialCaps(text$)
Structure CharArray
char.c[0]
EndStructure
text$=LCase(text$)
*b.CharArray=@text$
s=Len(text$)
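For p=0 To s-1 ; stop before the terminating null
pre=0 : If p>0 : pre=*b\char[p-1] : EndIf : cur=*b\char[p] ; never read before the start of the string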
If pre<>39 And cur>96 And cur<123 And (p=0 Or pre<97 Or pre>122) : cur-32 : EndIf
n$+Chr(cur)
Next
ProcedureReturn n$
EndProcedure
Debug InitialCaps("ALL words SHOULD have initial capital letters ONLY!")
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Thu Feb 28, 2013 3:20 pm
by ts-soft
Code: Select all
EnableExplicit
Structure CharArray
char.c[0]
EndStructure
Procedure.s InitialCaps(text$)
Protected i, cur, pre, c = Len(text$) -1
Protected b$ = LCase(text$)
Protected *b.CharArray = @b$
Protected n$
For i = 0 To c
If i > 0 : pre = *b\char[i - 1] : Else : pre = 0 : EndIf ; do not read before the start of the string
cur = *b\char[i]
If pre <> 39 And cur > 96 And cur < 123 And (i = 0 Or pre < 97 Or pre > 122) : cur - 32 : EndIf
n$ + Chr(cur)
Next
ProcedureReturn n$
EndProcedure
Debug InitialCaps("hal")
//edit: too late

Re: Fastest way to modify a large string, byte-by-byte?
Posted: Thu Feb 28, 2013 3:47 pm
by MachineCode
ts-soft wrote: //edit: too late

Yeah, but both mine and yours are still too slow on 500 KB text. I'm looking for something as fast as the native string commands, which do it in like 10 ms or so. There must be a way. Maybe memory blocks instead. I'll keep trying.
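On the memory-block idea, a minimal sketch, assuming the text is first copied into a buffer obtained from AllocateMemory() and then walked with a character pointer. The names `*mem`, `size` and `result$` are illustrative only, and nothing here has been timed against the string-pointer versions in this thread.

```purebasic
EnableExplicit

Define a$ = "hal"
; Room for the characters plus the terminating null, in the current character size
Define size = StringByteLength(a$) + SizeOf(Character)
Define *mem = AllocateMemory(size)
Define *c.Character
Define result$

If *mem
  PokeS(*mem, a$)            ; copy the string, including its null, into the block
  *c = *mem
  While *c\c                 ; walk until the terminating null
    *c\c + 1
    *c + SizeOf(Character)
  Wend
  result$ = PeekS(*mem)      ; read the modified text back as a string
  FreeMemory(*mem)
  Debug result$              ; shows "ibm"
EndIf
```

The loop is the same pointer walk as in the earlier posts, so any gain over working on the string directly would have to come from avoiding string copies elsewhere in the program.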
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Thu Feb 28, 2013 5:01 pm
by Demivec
This should be faster:
Code: Select all
Procedure.s InitialCaps(text$)
  Structure CharArray
    char.c[0]
  EndStructure
  Protected s, *b.CharArray, cur, pre, p, previousWasLetter = #False
  text$ = LCase(text$)
  *b.CharArray = @text$
  s = Len(text$)
  If s > 0
    cur = *b\char[0]
    If cur > 96 And cur < 123 : cur - 32 : previousWasLetter = #True : EndIf
    *b\char[0] = cur
    s - 1
    For p = 1 To s
      pre = cur : cur = *b\char[p]
      If pre <> 39 And cur > 96 And cur < 123
        If Not previousWasLetter
          cur - 32
          previousWasLetter = #True
        EndIf
      Else
        previousWasLetter = #False
      EndIf
      *b\char[p] = cur
    Next
  EndIf
  ProcedureReturn text$
EndProcedure
Debug InitialCaps("ALL words SHOULD have initial capital letters ONLY!")
Testing with a longer string and the debugger off, I get a benchmark of 47 ms for a string of 510,000 characters.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Thu Feb 28, 2013 5:31 pm
by skywalk
If you need speed, never do this: n$ + Chr(cur)
String concatenation is very slow.
Demivec's way with structured pointer is best.
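A minimal sketch of the difference, not taken from any post: the first loop builds the result by concatenation, the second allocates it once with Space() and fills it through a pointer. The constant `#Count` and all variable names are assumptions chosen for illustration.

```purebasic
EnableExplicit

#Count = 20000 ; assumption: small enough that the concatenation loop finishes quickly

Define i, t1, t2, slow$, fast$
Define *d.Character

; Concatenation: the string is extended on every iteration
t1 = ElapsedMilliseconds()
For i = 1 To #Count
  slow$ + Chr(65 + (i % 26))
Next
t1 = ElapsedMilliseconds() - t1

; Preallocation: reserve the full length once, then write each character in place
t2 = ElapsedMilliseconds()
fast$ = Space(#Count)
*d = @fast$
For i = 1 To #Count
  *d\c = 65 + (i % 26)
  *d + SizeOf(Character)
Next
t2 = ElapsedMilliseconds() - t2

MessageRequester("Result", "concatenation: " + Str(t1) + " ms, preallocated: " + Str(t2) + " ms")
```

Both loops produce the same text, so slow$ and fast$ can be compared to confirm the rewrite is equivalent. Raise `#Count` and compile with the debugger off to see how each approach scales on your own machine.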
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Fri Mar 01, 2013 1:03 am
by MachineCode
skywalk wrote: If you need speed, never do this
I know, but that's not the bottleneck, because I removed that line for a test and the procedure was just as slow without it.

Therefore, I knew it had to be the comparisons causing the slowdown.
@Demivec: I could kiss you!

Not really. But yours is fast, and does my 500 KB of text in about 20 ms! Thank you!
HOWEVER, it fails with apostrophes... give it this line:
Code: Select all
Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")
I know the sentence is not grammatically correct, but it's just to show the problem.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Fri Mar 01, 2013 1:23 am
by skywalk
MachineCode wrote: I know, but that's not the bottleneck, because I removed that line for a test and the procedure was just as slow without it. Therefore, I knew it had to be the comparisons causing the slowdown.
Huh? Integer comparisons are fast!
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Fri Mar 01, 2013 1:26 am
by MachineCode
Okay, but removing n$+Chr(cur) didn't speed anything up, is what I'm saying.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Fri Mar 01, 2013 2:17 am
by Demivec
MachineCode wrote: @Demivec: I could kiss you!

Not really. But yours is fast, and does my 500 KB of text in about 20 ms! Thank you!
HOWEVER, it fails with apostrophes... give it this line:
Code: Select all
Debug InitialCaps("ALL you're words 'SHOULD' have initial cap's and ''quotes'' working.")
I know the sentence is not grammatically correct, but it's just to show the problem.
The changes you needed, and the inputs the code has to handle, weren't really specified. To deal with those you need to nail down the input requirements and how each case should be handled. What about inputs like these: "that's", hypen-ated, under_scored, or a.c.m.e.? So far, I think the code only handles the last one correctly, by producing "A.C.M.E.". The rest depends on what is desired in each of those cases, as well as in nonsense cases like '-silly'.
Re: Fastest way to modify a large string, byte-by-byte?
Posted: Fri Mar 01, 2013 2:21 am
by MachineCode
I've edited my example (a few times) because it keeps coming up with errors. I thank you for getting me started, though.
Basically, the rules are: every word must start with a capital letter and be lowercase after that, where a "word" is any run of the letters A to Z, with an apostrophe counting as part of the word it sits in. So "A.C.M.E" is considered four words, and each would be capitalized. 'hi' should become "Hi", and "you're" should become "You're".
This test string:
Code: Select all
'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.
Should therefore become this:
Code: Select all
'Hi There'! Ev'ry Word 'Should' Have Initial Cap's Only And ''Quotes'' Working.
I'm still playing with your code and hope to nut it out soon.

Re: Fastest way to modify a large string, byte-by-byte?
Posted: Fri Mar 01, 2013 3:47 am
by Demivec
Here are two versions that do what you need.
The first one uses a For/Next loop:
Code: Select all
Procedure.s InitialCaps(text$)
  Structure CharArray
    char.c[0]
  EndStructure
  Protected s, *b.CharArray, cur, p, wordStarted = #False
  text$ = LCase(text$)
  *b.CharArray = @text$
  s = Len(text$)
  If s > 0
    s - 1
    For p = 0 To s
      cur = *b\char[p]
      Select cur
        Case 97 To 122
          If Not wordStarted
            cur - 32
            wordStarted = #True
          EndIf
        Case 39, '-', '_' ;apostrophe, hyphen and underscore treated as part of word
        Default
          wordStarted = #False
      EndSelect
      *b\char[p] = cur
    Next
  EndIf
  ProcedureReturn text$
EndProcedure
The second one switches to a While/Wend loop and removes the string functions Len() and LCase(). It handles both uppercase and lowercase characters in the loop:
Code: Select all
Procedure.s InitialCaps(text$)
  Structure CharArray
    char.c[0]
  EndStructure
  Protected *b.CharArray, cur, p, wordStarted = #False ; p declared so it also compiles under EnableExplicit
  *b.CharArray = @text$
  p = 0
  While *b\char[p] <> 0
    cur = *b\char[p]
    Select cur
      Case 97 To 122 ;lowercase letter: capitalize it if it starts a word
        If Not wordStarted
          cur - 32
          wordStarted = #True
        EndIf
      Case 65 To 90 ;uppercase letter: lower it unless it starts a word
        If wordStarted
          cur + 32
        Else
          wordStarted = #True
        EndIf
      Case 39, '-', '_' ;apostrophe, hyphen and underscore treated as part of word
      Default
        wordStarted = #False
    EndSelect
    *b\char[p] = cur
    p + 1
  Wend
  ProcedureReturn text$
EndProcedure
I roughly clock the second one as being about twice as fast as the first one.
This was my test code:
Code: Select all
CompilerIf #PB_Compiler_Debugger
  ; With the debugger on: show the results for the two sample inputs
  Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")
  Debug InitialCaps("What about inputs like this: that's, hypen-ated, under_scored, Or a.c.m.e.?")
CompilerElse
  ; With the debugger off: time one call on a long string
  Define t, a$, b$, i
  For i = 1 To 6500
    a$ + "'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working."
  Next
  t = ElapsedMilliseconds()
  b$ = InitialCaps(a$)
  t = ElapsedMilliseconds() - t
  MessageRequester("Result", Str(t) + " " + Str(Len(b$)))
CompilerEndIf
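The test code above prints results for inspection. A small self-check against the expected output MachineCode posted could look like the sketch below. `CapsSketch` is a hypothetical stand-in so that the block compiles on its own; it follows MachineCode's stated rules (only the apostrophe counts as part of a word), so swap in either InitialCaps version from this post to check that instead. The null-pointer guard is a defensive assumption about empty strings, not something established in this thread.

```purebasic
EnableExplicit

; Hypothetical stand-in for InitialCaps(), working on the by-value copy of text$
Procedure.s CapsSketch(text$)
  Protected *c.Character = @text$
  Protected inWord = #False
  If *c = 0 : ProcedureReturn "" : EndIf ; assumption: an empty string may have no buffer
  While *c\c
    Select *c\c
      Case 'a' To 'z' ; lowercase letter: capitalize it if it starts a word
        If Not inWord : *c\c - 32 : inWord = #True : EndIf
      Case 'A' To 'Z' ; uppercase letter: lower it unless it starts a word
        If inWord : *c\c + 32 : Else : inWord = #True : EndIf
      Case 39 ; apostrophe: neither starts nor ends a word
      Default
        inWord = #False
    EndSelect
    *c + SizeOf(Character)
  Wend
  ProcedureReturn text$
EndProcedure

Define src$  = "'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working."
Define want$ = "'Hi There'! Ev'ry Word 'Should' Have Initial Cap's Only And ''Quotes'' Working."
Define got$  = CapsSketch(src$)

If got$ = want$
  Debug "rules satisfied"
Else
  Debug "mismatch: " + got$
EndIf
```

A check like this makes it easy to add the other inputs raised earlier (hyphens, underscores, a.c.m.e.) once the desired result for each has been decided.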