Fastest way to modify a large string, byte-by-byte?

Posted: Thu Feb 28, 2013 2:07 pm
by MachineCode
One of my apps needs to change a string byte-by-byte. As a rough example:

Code: Select all

a$="hal"
For p=1 To Len(a$)
  b$+Chr(Asc(Mid(a$,p,1))+1)
Next
Debug b$
The above code works fine... unless a$ is something large like 500 KB in size. Then it's waaaaaay too slow. What's the best way to make it faster on large strings when modifying byte-by-byte?

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Thu Feb 28, 2013 2:18 pm
by ts-soft

Code: Select all

Structure CharArray
  char.c[0]
EndStructure

a$="hal"
*b.CharArray = @a$
c = Len(a$) -1

For i = 0 To c
  *b\char[i] + 1  
Next

Debug a$

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Thu Feb 28, 2013 2:55 pm
by MachineCode
Thanks. But how would I apply that now to more complex byte-by-byte, like this:

Code: Select all

Procedure.s InitialCaps(text$)
  s=Len(text$)
  For p=1 To s
    pre=Asc(LCase(Mid(text$,p-1,1))) : cur=Asc(LCase(Mid(text$,p,1)))
    If pre<>39 And cur>96 And cur<123 And (p=1 Or pre<97 Or pre>122) : cur-32 : EndIf
    n$+Chr(cur)
  Next
  ProcedureReturn n$
EndProcedure
Sorry to ask for you to do the work... I tried but kept getting crashes or failed results.

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Thu Feb 28, 2013 3:02 pm
by luis
About the original question ... another way:

If it's OK to modify the source string in place:

Code: Select all

a$="hal"
*c.Character = @a$

While *c\c
 *c\c + 1
 *c + SizeOf(Character)
Wend

Debug a$
Otherwise, to leave the source string untouched and write the result into a copy:

Code: Select all

a$="hal"
b$ = Space(Len(a$))
*s.Character = @a$
*d.Character = @b$

While *s\c
 *d\c = *s\c + 1
 *d + SizeOf(Character)
 *s + SizeOf(Character)
Wend

Debug b$

In case anyone is interested, this is generated code for the first loop:

Code: Select all

; While *c\c
_While1:
  MOV    ebp,dword [p_c]
  CMP    byte [ebp],0
  JE    _Wend1
; *c\c + 1
  MOV    ebp,dword [p_c]
  MOVZX  ebx,byte [ebp]
  INC    ebx
  PUSH   ebx
  MOV    ebp,dword [p_c]
  POP    eax
  MOV    byte [ebp],al
; *c + SizeOf(Character)
  INC    dword [p_c]
; Wend
  JMP   _While1
_Wend1:
And this is the code generated using the structured pointer, as shown in the other posts in this thread:

Code: Select all

; For i = 0 To c
  MOV    dword [v_i],0
_For1:
  MOV    eax,dword [v_c]
  CMP    eax,dword [v_i]
  JL    _Next2
; *b\char[i] + 1 
  MOV    ebp,dword [p_b]
  PUSH   ebp
  MOV    eax,dword [v_i]
  POP    ebp
  ADD    ebp,eax
  MOVZX  ebx,byte [ebp]
  INC    ebx
  PUSH   ebx
  MOV    ebp,dword [p_b]
  PUSH   ebp
  MOV    eax,dword [v_i]
  POP    ebp
  ADD    ebp,eax
  POP    eax
  MOV    byte [ebp],al
; Next
_NextContinue2:
  INC    dword [v_i]
  JNO   _For1
_Next2:
Take your pick.
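
One way to check whether the shorter generated code makes a measurable difference is to time both loops on a large buffer. The following is only a rough sketch: the constants `#Size` and `#Passes` are illustrative values that do not come from the thread, and the result will depend on the compiler version and on the debugger being switched off.

```purebasic
; Hypothetical timing harness; #Size and #Passes are illustrative values.
; Compile with the debugger off, otherwise the timings mean little.
#Size   = 500000
#Passes = 100

Structure CharArray
  char.c[0]
EndStructure

Define a$ = Space(#Size)
Define b$ = Space(#Size)
Define *c.Character
Define *b.CharArray = @b$
Define i, pass, last = #Size - 1
Define t1, t2

; Pointer walk (the While/Wend loop)
t1 = ElapsedMilliseconds()
For pass = 1 To #Passes
  *c = @a$
  While *c\c
    *c\c + 1
    *c + SizeOf(Character)
  Wend
Next
t1 = ElapsedMilliseconds() - t1

; Indexed structured pointer (the For/Next loop)
t2 = ElapsedMilliseconds()
For pass = 1 To #Passes
  For i = 0 To last
    *b\char[i] + 1
  Next
Next
t2 = ElapsedMilliseconds() - t2

MessageRequester("Pointer walk vs. index", Str(t1) + " ms vs. " + Str(t2) + " ms")
```

Both buffers start as spaces and each pass adds 1 to every character, so after the run `a$` and `b$` should hold identical content, which makes a handy sanity check that the two loops do the same work.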

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Thu Feb 28, 2013 3:19 pm
by MachineCode
Okay, I've done this but it's still really slow for large strings:

Code: Select all

Procedure.s InitialCaps(text$)

  Structure CharArray
    char.c[0]
  EndStructure

  text$=LCase(text$)
  
  *b.CharArray=@text$
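  s=Len(text$)-1 ; index of the last character; the terminating null is not processed

  For p=0 To s
    If p>0 : pre=*b\char[p-1] : Else : pre=0 : EndIf ; nothing precedes index 0, so don't read before the string
    cur=*b\char[p]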
    If pre<>39 And cur>96 And cur<123 And (p=0 Or pre<97 Or pre>122) : cur-32 : EndIf
    n$+Chr(cur)
  Next

  ProcedureReturn n$

EndProcedure

Debug InitialCaps("ALL words SHOULD have initial capital letters ONLY!")

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Thu Feb 28, 2013 3:20 pm
by ts-soft

Code: Select all

EnableExplicit

Structure CharArray
  char.c[0]
EndStructure

Procedure.s InitialCaps(text$)
  Protected i, cur, pre, c = Len(text$) -1
  Protected b$ = LCase(text$)
  Protected *b.CharArray = @b$
  Protected n$
  For i = 0 To c
    If i > 0 : pre = *b\char[i - 1] : Else : pre = 0 : EndIf ; avoid reading before the start of the string
    cur = *b\char[i]
    If pre <> 39 And cur > 96 And cur < 123 And (i = 0 Or pre < 97 Or pre > 122) : cur - 32 : EndIf
    n$ + Chr(cur)
  Next
  ProcedureReturn n$
EndProcedure

Debug InitialCaps("hal")
//edit: too late :wink:

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Thu Feb 28, 2013 3:47 pm
by MachineCode
ts-soft wrote://edit: too late :wink:
Yeah, but both mine and yours are still too slow on 500 KB text. I'm looking for something as fast as the native string commands, which do it in like 10 ms or so. There must be a way. Maybe memory blocks instead. I'll keep trying.
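
For what it's worth, a memory-block version of the original "add 1 to every character" example might look like the sketch below. It assumes nothing beyond the standard `AllocateMemory()`, `PokeS()`, `PeekS()` and `FreeMemory()` functions; the variable names are illustrative. Note that it is the same pointer walk as before plus two extra copies, so it is not obviously faster than working on the string directly.

```purebasic
; Sketch only: copy the string into a memory block, modify it there, read it back.
Define a$ = "hal"
Define size = StringByteLength(a$) + SizeOf(Character) ; room for the terminating null
Define *mem = AllocateMemory(size)
Define *c.Character
Define b$

If *mem
  PokeS(*mem, a$)            ; copies the string and its terminating null into the block
  *c = *mem
  While *c\c                 ; walk until the null terminator
    *c\c + 1
    *c + SizeOf(Character)
  Wend
  b$ = PeekS(*mem)           ; read the modified block back into a string
  FreeMemory(*mem)
EndIf

Debug b$ ; shows "ibm"
```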

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Thu Feb 28, 2013 5:01 pm
by Demivec
This should be faster:

Code: Select all

Procedure.s InitialCaps(text$)
  
  Structure CharArray
    char.c[0]
  EndStructure

  Protected s, *b.CharArray, cur, pre, p, previousWasLetter = #False

  text$ = LCase(text$)
  
  *b.CharArray = @text$
  s = Len(text$)
  If s > 0
 
    cur = *b\char[0]
    If cur > 96 And cur < 123: cur - 32: previousWasLetter = #True: EndIf 
    *b\char[0] = cur
    
    s - 1
    For p = 1 To s
      pre = cur: cur = *b\char[p]
      If pre <> 39 And  cur > 96 And cur < 123
        If Not previousWasLetter
          cur - 32
          previousWasLetter = #True
        EndIf
      Else
        previousWasLetter = #False
      EndIf
      
      *b\char[p] = cur
    Next
      
  EndIf 

  ProcedureReturn text$
EndProcedure

Debug InitialCaps("ALL words SHOULD have initial capital letters ONLY!")
Testing with a longer string and the debugger off I get a benchmark of 47ms for a string of 510000 characters.

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Thu Feb 28, 2013 5:31 pm
by skywalk
If you need speed, never do this :!:

Code: Select all

n$ + Chr(cur)
String concatenation is very slow.
Demivec's way with structured pointer is best.
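
A quick way to test this claim on your own machine is to build the same string twice, once by appending and once by preallocating with `Space()` and writing through a pointer. This is a rough sketch: `#Count` is an arbitrary illustrative size, and the numbers will vary with the PureBasic version and with the debugger on or off.

```purebasic
; Hypothetical micro-benchmark; #Count is an illustrative value.
; Run with the debugger disabled for meaningful timings.
#Count = 50000

Define i, t1, t2
Define slow$, fast$
Define *d.Character

; 1) Append one character at a time
t1 = ElapsedMilliseconds()
For i = 1 To #Count
  slow$ + "x"
Next
t1 = ElapsedMilliseconds() - t1

; 2) Allocate the full length once, then overwrite each character through a pointer
t2 = ElapsedMilliseconds()
fast$ = Space(#Count)
*d = @fast$
For i = 1 To #Count
  *d\c = 'x'
  *d + SizeOf(Character)
Next
t2 = ElapsedMilliseconds() - t2

MessageRequester("Concatenate vs. preallocate", Str(t1) + " ms vs. " + Str(t2) + " ms")
```

Both variants end up with the same string, so any difference in the two timings comes from how the string is built rather than from what it contains.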

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Fri Mar 01, 2013 1:03 am
by MachineCode
skywalk wrote:If you need speed, never do this :!:

Code: Select all

n$ + Chr(cur)
I know, but that's not the bottleneck, because I removed that line for a test and the procedure was just as slow without it. :) Therefore, I knew it had to be the comparisons causing the slowdown.

@Demivec: I could kiss you! :lol: Not really. But yours is fast, and does my 500 KB of text in about 20 ms! Thank you!

HOWEVER, it fails with apostrophes... give it this line:

Code: Select all

Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")
I know the sentence is not grammatically correct, but it's just to show the problem.

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Fri Mar 01, 2013 1:23 am
by skywalk
MachineCode wrote:I know, but that's not the bottleneck, because I removed that line for a test and the procedure was just as slow without it. Therefore, I knew it had to be the comparisons causing the slowdown.
Huh? Integer comparisons are fast!

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Fri Mar 01, 2013 1:26 am
by MachineCode
Okay, but removing n$+Chr(cur) didn't speed anything up, is what I'm saying.

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Fri Mar 01, 2013 2:17 am
by Demivec
MachineCode wrote:@Demivec: I could kiss you! :lol: Not really. But yours is fast, and does my 500 KB of text in about 20 ms! Thank you!

HOWEVER, it fails with apostrophes... give it this line:

Code: Select all

Debug InitialCaps("ALL you're words 'SHOULD' have initial cap's and ''quotes'' working.")
I know the sentence is not grammatically correct, but it's just to show the problem.
The changes you were needing and the inputs it would be required to work with weren't really specified. To deal with those you need to nail down the input requirements and how to handle them. What about inputs like this: "that's", hypen-ated, under_scored, or a.c.m.e.? So far, I think the code only handles the last one correctly by producing "A.C.M.E." . It will depend on what is desired in each of those conditions as well as nonsense conditions like '-silly'.

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Fri Mar 01, 2013 2:21 am
by MachineCode
I've edited my example (a few times) because it keeps coming up with errors. I thank you for getting me started, though. :)

Basically, the rules are: every word must start with a capital letter, where "word" = any string from A to Z only except if an apostrophe is part of it. So, "A.C.M.E" is considered 4 words, and each would be a capital. 'hi' should become "Hi", and "you're" should become "You're".

This test string:

Code: Select all

'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.
Should therefore become this:

Code: Select all

'Hi There'! Ev'ry Word 'Should' Have Initial Cap's Only And ''Quotes'' Working.
I'm still playing with your code and hope to nut it out soon. :)
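
The rules above boil down to a two-state scan: either a word is in progress or it is not. The first letter seen while no word is in progress is made uppercase, later letters are made lowercase, an apostrophe leaves the state alone, and anything else ends the word. Below is a minimal sketch under exactly those rules. `InitialCapsPtr` is a hypothetical name, and the null-pointer guard for empty input is a defensive assumption rather than something established in this thread.

```purebasic
; InitialCapsPtr is a hypothetical name used only for this sketch.
Procedure.s InitialCapsPtr(text$)
  Protected *c.Character = @text$
  Protected cur, wordStarted = #False

  ; Assumption: guard against a null string pointer for empty input.
  If *c = 0 : ProcedureReturn "" : EndIf

  While *c\c
    cur = *c\c
    Select cur
      Case 'a' To 'z'
        If Not wordStarted
          cur - 32               ; first letter of a word: make it uppercase
          wordStarted = #True
        EndIf
      Case 'A' To 'Z'
        If wordStarted
          cur + 32               ; letter inside a word: make it lowercase
        Else
          wordStarted = #True    ; first letter is already uppercase
        EndIf
      Case 39
        ; apostrophe: leave the current state unchanged
      Default
        wordStarted = #False     ; any other character ends the word
    EndSelect
    *c\c = cur
    *c + SizeOf(Character)
  Wend

  ProcedureReturn text$
EndProcedure

Define src$  = "'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working."
Define want$ = "'Hi There'! Ev'ry Word 'Should' Have Initial Cap's Only And ''Quotes'' Working."

If InitialCapsPtr(src$) = want$
  Debug "OK"
Else
  Debug "Mismatch: " + InitialCapsPtr(src$)
EndIf
```

Because `text$` is passed by value, the procedure edits its own copy in place and returns it, so no per-character concatenation is needed.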

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Fri Mar 01, 2013 3:47 am
by Demivec
Here are two versions that do what you need.


The first one uses a For/Next loop:

Code: Select all

Procedure.s InitialCaps(text$)
  
  Structure CharArray
    char.c[0]
  EndStructure

  Protected s, *b.CharArray, cur, p, wordStarted = #False

  text$ = LCase(text$)
  
  *b.CharArray = @text$
  s = Len(text$)
  If s > 0
 
    s - 1
    For p = 0 To s
      cur = *b\char[p]
      
      Select cur
        Case 97 To 122
          If Not wordStarted
            cur - 32
            wordStarted = #True
          EndIf
        Case 39, '-', '_' ;apostrophe, hyphen and underscore treated as part of word
        Default
          wordStarted = #False
      EndSelect
            
      *b\char[p] = cur
    Next
      
  EndIf 

  ProcedureReturn text$
EndProcedure
The second one switches to a While/Wend loop and removes the string functions Len() and LCase(). It handles both uppercase and lowercase characters in the loop:

Code: Select all

Procedure.s InitialCaps(text$)
  
  Structure CharArray
    char.c[0]
  EndStructure

  Protected *b.CharArray, cur, p, wordStarted = #False ; p declared explicitly so it also compiles under EnableExplicit

  *b.CharArray = @text$
  
  p = 0
  While *b\char[p] <> 0
    cur = *b\char[p]
    
    Select cur
      Case 97 To 122
        If Not wordStarted
          cur - 32
          wordStarted = #True
        EndIf
      Case 65 To 90
        If wordStarted
          cur + 32
        Else
          wordStarted = #True
        EndIf 
      Case 39, '-', '_' ;apostrophe, hyphen and underscore treated as part of word
      Default
        wordStarted = #False
    EndSelect
          
    *b\char[p] = cur
    p + 1
  Wend 

  ProcedureReturn text$
EndProcedure
I roughly clock the last one as being about two times faster than the first one.

This was my test code:

Code: Select all

CompilerIf #PB_Compiler_Debugger 
  Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")
  Debug InitialCaps("What about inputs like this: that's, hypen-ated, under_scored, Or a.c.m.e.?")
CompilerElse
  Define t, a$, i
  For i = 1 To 6500
    a$ + "'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working."
  Next
  t= ElapsedMilliseconds()
  b$ = InitialCaps(a$)
  t = ElapsedMilliseconds() - t
  MessageRequester("Result", Str(t) + " " + Str(Len(b$)))
CompilerEndIf