Fastest way to modify a large string, byte-by-byte?

Just starting out? Need help? Post your questions and find answers here.
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Fastest way to modify a large string, byte-by-byte?

Post by MachineCode »

One of my apps needs to change a string byte-by-byte. As a rough example:

Code: Select all

a$="hal"
For p=1 To Len(a$)
  b$+Chr(Asc(Mid(a$,p,1))+1)
Next
Debug b$
The above code works fine... unless a$ is something large like 500 KB in size. Then it's waaaaaay too slow. What's the best way to make it faster on large strings when modifying byte-by-byte?
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
ts-soft
Always Here
Always Here
Posts: 5756
Joined: Thu Jun 24, 2004 2:44 pm
Location: Berlin - Germany

Re: Fastest way to modify a large string, byte-by-byte?

Post by ts-soft »

Code: Select all

Structure CharArray
  char.c[0]
EndStructure

a$="hal"
*b.CharArray = @a$
c = Len(a$) -1

For i = 0 To c
  *b\char[i] + 1  
Next

Debug a$
PureBasic 5.73 | SpiderBasic 2.30 | Windows 10 Pro (x64) | Linux Mint 20.1 (x64)
Old bugs good, new bugs bad! Updates are evil: might fix old bugs and introduce no new ones.
Image
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fastest way to modify a large string, byte-by-byte?

Post by MachineCode »

Thanks. But how would I apply that now to more complex byte-by-byte, like this:

Code: Select all

Procedure.s InitialCaps(text$)
  s=Len(text$)
  For p=1 To s
    pre=Asc(LCase(Mid(text$,p-1,1))) : cur=Asc(LCase(Mid(text$,p,1)))
    If pre<>39 And cur>96 And cur<123 And (p=1 Or pre<97 Or pre>122) : cur-32 : EndIf
    n$+Chr(cur)
  Next
  ProcedureReturn n$
EndProcedure
Sorry to ask for you to do the work... I tried but kept getting crashes or failed results.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
luis
Addict
Addict
Posts: 3893
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Re: Fastest way to modify a large string, byte-by-byte?

Post by luis »

About the original question ... another way:

if it's ok to modify the source string

Code: Select all

a$="hal"
*c.Character = @a$

While *c\c
 *c\c + 1
 *c + SizeOf(Character)
Wend

Debug a$
else

Code: Select all

a$="hal"
b$ = Space(Len(a$))
*s.Character = @a$
*d.Character = @b$

While *s\c
 *d\c = *s\c + 1
 *d + SizeOf(Character)
 *s + SizeOf(Character)
Wend

Debug b$

In case anyone is interested, this is generated code for the first loop:

Code: Select all

; While *c\c
_While1:
  MOV    ebp,dword [p_c]
  CMP    byte [ebp],0
  JE    _Wend1
; *c\c + 1
  MOV    ebp,dword [p_c]
  MOVZX  ebx,byte [ebp]
  INC    ebx
  PUSH   ebx
  MOV    ebp,dword [p_c]
  POP    eax
  MOV    byte [ebp],al
; *c + SizeOf(Character)
  INC    dword [p_c]
; Wend
  JMP   _While1
_Wend1:
and this is the the code generated using the structured pointer as shown in the other posts in this same thread:

Code: Select all

; For i = 0 To c
  MOV    dword [v_i],0
_For1:
  MOV    eax,dword [v_c]
  CMP    eax,dword [v_i]
  JL    _Next2
; *b\char[i] + 1 
  MOV    ebp,dword [p_b]
  PUSH   ebp
  MOV    eax,dword [v_i]
  POP    ebp
  ADD    ebp,eax
  MOVZX  ebx,byte [ebp]
  INC    ebx
  PUSH   ebx
  MOV    ebp,dword [p_b]
  PUSH   ebp
  MOV    eax,dword [v_i]
  POP    ebp
  ADD    ebp,eax
  POP    eax
  MOV    byte [ebp],al
; Next
_NextContinue2:
  INC    dword [v_i]
  JNO   _For1
_Next2:
Take your pick.
Last edited by luis on Thu Feb 28, 2013 5:48 pm, edited 2 times in total.
"Have you tried turning it off and on again ?"
A little PureBasic review
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fastest way to modify a large string, byte-by-byte?

Post by MachineCode »

Okay, I've done this but it's still really slow for large strings:

Code: Select all

Procedure.s InitialCaps(text$)

  Structure CharArray
    char.c[0]
  EndStructure

  text$=LCase(text$)
  
  *b.CharArray=@text$
  s=Len(text$)

  For p=0 To s
    pre=*b\char[p-1] : cur=*b\char[p]
    If pre<>39 And cur>96 And cur<123 And (p=0 Or pre<97 Or pre>122) : cur-32 : EndIf
    n$+Chr(cur)
  Next

  ProcedureReturn n$

EndProcedure

Debug InitialCaps("ALL words SHOULD have initial capital letters ONLY!")
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
ts-soft
Always Here
Always Here
Posts: 5756
Joined: Thu Jun 24, 2004 2:44 pm
Location: Berlin - Germany

Re: Fastest way to modify a large string, byte-by-byte?

Post by ts-soft »

Code: Select all

EnableExplicit

Structure CharArray
  char.c[0]
EndStructure

Procedure.s InitialCaps(text$)
  Protected i, cur, pre, c = Len(text$) -1
  Protected b$ = LCase(text$)
  Protected *b.CharArray = @b$
  Protected n$
  For i = 0 To c
    pre = *b\char[i -1]
    cur = *b\char[i]
    If pre <> 39 And cur > 96 And cur < 123 And (i = 0 Or pre < 97 Or pre > 122) : cur - 32 : EndIf
    n$ + Chr(cur)
  Next
  ProcedureReturn n$
EndProcedure

Debug InitialCaps("hal")
//edit: to late :wink:
PureBasic 5.73 | SpiderBasic 2.30 | Windows 10 Pro (x64) | Linux Mint 20.1 (x64)
Old bugs good, new bugs bad! Updates are evil: might fix old bugs and introduce no new ones.
Image
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fastest way to modify a large string, byte-by-byte?

Post by MachineCode »

ts-soft wrote://edit: to late :wink:
Yeah, but both mine and yours are still too slow on 500 KB text. I'm looking for something as fast as the native string commands, which do it in like 10 ms or so. There must be a way. Maybe memory blocks instead. I'll keep trying.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
Demivec
Addict
Addict
Posts: 4260
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Fastest way to modify a large string, byte-by-byte?

Post by Demivec »

This should be faster:

Code: Select all

Procedure.s InitialCaps(text$)
  
  Structure CharArray
    char.c[0]
  EndStructure

  Protected s, *b.CharArray, cur, pre, p, previousWasLetter = #False

  text$ = LCase(text$)
  
  *b.CharArray = @text$
  s = Len(text$)
  If s > 0
 
    cur = *b\char[0]
    If cur > 96 And cur < 123: cur - 32: previousWasLetter = #True: EndIf 
    *b\char[0] = cur
    
    s - 1
    For p = 1 To s
      pre = cur: cur = *b\char[p]
      If pre <> 39 And  cur > 96 And cur < 123
        If Not previousWasLetter
          cur - 32
          previousWasLetter = #True
        EndIf
      Else
        previousWasLetter = #False
      EndIf
      
      *b\char[p] = cur
    Next
      
  EndIf 

  ProcedureReturn text$
EndProcedure

Debug InitialCaps("ALL words SHOULD have initial capital letters ONLY!")
Testing with a longer string and the debugger off I get a benchmark of 47ms for a string of 510000 characters.
User avatar
skywalk
Addict
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Fastest way to modify a large string, byte-by-byte?

Post by skywalk »

If you need speed, never do this :!:

Code: Select all

n$ + Chr(cur)
String concatenation is very slow.
Demivec's way with structured pointer is best.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fastest way to modify a large string, byte-by-byte?

Post by MachineCode »

skywalk wrote:If you need speed, never do this :!:

Code: Select all

n$ + Chr(cur)
I know, but that's not the bottleneck, because I removed that line for a test and the procedure was just as slow without it. :) Therefore, I knew it had to be the comparisons causing the slowdown.

@Demivec: I could kiss you! :lol: Not really. But yours is fast, and does my 500 KB of text in about 20 ms! Thank you!

HOWEVER, it fails with apostrophes... give it this line:

Code: Select all

Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")
I know the sentence is not grammatically correct, but it's just to show the problem.
Last edited by MachineCode on Fri Mar 01, 2013 2:25 am, edited 6 times in total.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
skywalk
Addict
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Fastest way to modify a large string, byte-by-byte?

Post by skywalk »

MachineCode wrote:I know, but that's not the bottleneck, because I removed that line for a test and the procedure was just as slow without it. Therefore, I knew it had to be the comparisons causing the slowdown.
Huh? Integer comparisons are fast!
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fastest way to modify a large string, byte-by-byte?

Post by MachineCode »

Okay, but removing n$+Chr(cur) didn't speed anything up, is what I'm saying.
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
Demivec
Addict
Addict
Posts: 4260
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Fastest way to modify a large string, byte-by-byte?

Post by Demivec »

MachineCode wrote:@Demivec: I could kiss you! :lol: Not really. But yours is fast, and does my 500 KB of text in about 20 ms! Thank you!

HOWEVER, it fails with apostrophes... give it this line:

Code: Select all

Debug InitialCaps("ALL you're words 'SHOULD' have initial cap's and ''quotes'' working.")
I know the sentence is not grammatically correct, but it's just to show the problem.
The changes you were needing and the inputs it would be required to work with weren't really specified. To deal with those you need to nail down the input requirements and how to handle them. What about inputs like this: "that's", hypen-ated, under_scored, or a.c.m.e.? So far, I think the code only handles the last one correctly by producing "A.C.M.E." . It will depend on what is desired in each of those conditions as well as nonsense conditions like '-silly'.
MachineCode
Addict
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fastest way to modify a large string, byte-by-byte?

Post by MachineCode »

I've edited my example (a few times) because it keeps coming up with errors. I thank you for getting me started, though. :)

Basically, the rules are: every word must start with a capital letter, where "word" = any string from A to Z only except if an apostrophe is part of it. So, "A.C.M.E" is considered 4 words, and each would be a capital. 'hi' should become "Hi", and "you're" should become "You're".

This test string:

Code: Select all

'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.
Should therefore become this:

Code: Select all

'Hi There'! Ev'ry Word 'Should' Have Initial Cap's Only And ''Quotes'' Working.
I'm still playing with your code and hope to nut it out soon. :)
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
User avatar
Demivec
Addict
Addict
Posts: 4260
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Fastest way to modify a large string, byte-by-byte?

Post by Demivec »

Here are two versions that do what you need.


The first one uses a For/Next loop:

Code: Select all

Procedure.s InitialCaps(text$)
  
  Structure CharArray
    char.c[0]
  EndStructure

  Protected s, *b.CharArray, cur, p, wordStarted = #False

  text$ = LCase(text$)
  
  *b.CharArray = @text$
  s = Len(text$)
  If s > 0
 
    s - 1
    For p = 0 To s
      cur = *b\char[p]
      
      Select cur
        Case 97 To 122
          If Not wordStarted
            cur - 32
            wordStarted = #True
          EndIf
        Case 39, '-', '_' ;apostrophe, hyphen and underscore treated as part of word
        Default
          wordStarted = #False
      EndSelect
            
      *b\char[p] = cur
    Next
      
  EndIf 

  ProcedureReturn text$
EndProcedure
The last one switches to a While/Wend loop and removes the string functions Len() and LCase(). Ithandles both upper and lower case characters in the loop:

Code: Select all

Procedure.s InitialCaps(text$)
  
  Structure CharArray
    char.c[0]
  EndStructure

  Protected *b.CharArray, cur, wordStarted = #False

  *b.CharArray = @text$
  
  p = 0
  While *b\char[p] <> 0
    cur = *b\char[p]
    
    Select cur
      Case 97 To 122
        If Not wordStarted
          cur - 32
          wordStarted = #True
        EndIf
      Case 65 To 90
        If wordStarted
          cur + 32
        Else
          wordStarted = #True
        EndIf 
      Case 39, '-', '_' ;apostrophe, hyphen and underscore treated as part of word
      Default
        wordStarted = #False
    EndSelect
          
    *b\char[p] = cur
    p + 1
  Wend 

  ProcedureReturn text$
EndProcedure
I roughly clock the last one as being about two times faster than the first one.

This was my test code:

Code: Select all

CompilerIf #PB_Compiler_Debugger 
  Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")
  Debug InitialCaps("What about inputs like this: that's, hypen-ated, under_scored, Or a.c.m.e.?")
CompilerElse
  Define t, a$, i
  For i = 1 To 6500
    a$ + "'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working."
  Next
  t= ElapsedMilliseconds()
  b$ = InitialCaps(a$)
  t = ElapsedMilliseconds() - t
  MessageRequester("Result", Str(t) + " " + Str(Len(b$)))
CompilerEndIf 
Post Reply