Fastest way to modify a large string, byte-by-byte?

MachineCode
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm

Re: Fastest way to modify a large string, byte-by-byte?

Post by MachineCode »

Demivec, you're fantastic! 8) Where can I donate to you for showing me the light?
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

Demivec's procedure, optimized a bit further for speed:

Code:

Procedure.s InitialCaps(text.s)
  
  Protected *b.Character = @text, cur, wordStarted = #False
  
  cur = *b\c
  While cur
  
    Select cur
      Case 97 To 122
        If Not wordStarted
          *b\c = cur - 32
          wordStarted = #True
        EndIf
      Case 65 To 90
        If wordStarted
          *b\c = cur + 32
        Else
          wordStarted = #True
        EndIf 
      Case 39, '-', '_' ;apostrophe, hyphen and underscore treated as part of word
      Default
        wordStarted = #False
    EndSelect
          
    *b + SizeOf(Character)
    cur = *b\c
  Wend  

  ProcedureReturn text
EndProcedure
Windows (x64)
Raspberry Pi OS (Arm64)
Demivec
Addict
Posts: 4260
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

MachineCode wrote: Demivec, you're fantastic! 8) Where can I donate to you for showing me the light?
@MachineCode: You're welcome. It isn't necessary to compensate me, but I do accept donations. :wink: You can PayPal me.


@wilbert: Thanks for completing the last bit of optimization. You saved me a step. The optimization removes the multiplication that happens behind the scenes when an index is used to access the characters of the string.

My benchmark shows it to be about 33% faster.
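For comparison, here is one possible index-based form of the same loop (a hypothetical sketch, not Demivec's earlier code, which isn't quoted here). Every read and write recomputes @text + i * SizeOf(Character), which is the multiplication Demivec refers to; walking a *b.Character pointer replaces it with a single addition per character. InitialCapsIndexed() is an illustrative name; PeekC() and PokeC() are PureBasic's built-in character peek/poke commands.

```purebasic
; Hypothetical index-based variant, for comparison only.
; Each access computes *base + i * SizeOf(Character): a multiply per character.
Procedure.s InitialCapsIndexed(text.s)
  Protected i, cur, wordStarted = #False
  Protected *base = @text                  ; start of this procedure's copy of the string

  cur = PeekC(*base)
  While cur
    Select cur
      Case 'a' To 'z'
        If Not wordStarted                 ; first letter of a word: make it upper case
          PokeC(*base + i * SizeOf(Character), cur - 32)
          wordStarted = #True
        EndIf
      Case 'A' To 'Z'
        If wordStarted                     ; later letters: make them lower case
          PokeC(*base + i * SizeOf(Character), cur + 32)
        Else
          wordStarted = #True
        EndIf
      Case 39, '-', '_'                    ; treated as part of a word
      Default
        wordStarted = #False               ; anything else ends the word
    EndSelect
    i + 1
    cur = PeekC(*base + i * SizeOf(Character))
  Wend

  ProcedureReturn text
EndProcedure

Debug InitialCapsIndexed("hello WORLD it's")   ; Hello World It's
```

Timing both versions on the same large string with ElapsedMilliseconds() is one way to check a figure like the 33% above on your own machine; the result will vary with compiler version and CPU.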


@Edit: removed email address.
Last edited by Demivec on Thu Apr 25, 2013 2:55 pm, edited 1 time in total.
MachineCode

Post by MachineCode »

Edit: I've changed the code in this post, because this first comment turned out to be wrong: You can also optimize further by first converting text$ to LCase() (do it before "While") and then totally removing the "Case 65 To 90" section. Remember, we're not interested in checking for 65 to 90 because everything is lowercase (97 to 122) except for what we manually change to uppercase.

Here's the version I'm using now (based on both of your versions). I've changed "97 To 122" to "'a' To 'z'" for readability and compacted some lines and variable names (it's just my style). I also removed "Protected", since I don't need it and I figured it adds some overhead (no matter how small) to my final exe.

Code:

Procedure.s InitialCaps(text$)

  *b.Character=@text$
  cur=*b\c

  While cur
    Select cur
      Case 'a' To 'z' : If Not nextword : *b\c=cur-32 : nextword=1 : EndIf
      Case 'A' To 'Z' : If nextword : *b\c=cur+32 : Else : nextword=1 : EndIf
      Case 39 ; Ignore apostrophes as they're considered part of a word.
      Default : nextword=0
    EndSelect
    *b+SizeOf(Character)
    cur=*b\c
  Wend

  ProcedureReturn text$

EndProcedure

Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")
Demivec: I've opened a PayPal account, but it needs two business days to verify my bank account, so I'll make a donation next week. ;) I can do one for wilbert too, if you like. It won't be much each (probably just $10), but it's still saved me a lot of speed hassles in my app.
Last edited by MachineCode on Sat Mar 02, 2013 3:38 am, edited 1 time in total.
Demivec

Post by Demivec »

MachineCode wrote: You can also optimize further by first converting text$ to LCase() (do it before "While") and then totally removing the "Case 65 To 90" section. Remember, we're not interested in checking for 65 to 90 because everything is lowercase (97 to 122) except for what we manually change to uppercase. ;)
The LCase() call actually increases the time, because it makes you go through the string twice: every character is looked at once to convert it to lowercase and again to possibly convert it to uppercase. With the previous optimization, each character is looked at only once and is either left alone or has its case changed as needed.

I thought you would be optimizing for time. If you are looking only for simplicity, I guess your changes will be best.
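To make the double pass concrete, here is a sketch of the LCase() idea being discussed (InitialCapsTwoPass() is an illustrative name). LCase() reads and rewrites every character once, and the loop then reads every character again; for plain ASCII input the result is the same as the single-pass version. The pointer is taken only after the LCase() assignment, because assigning a new value to a string may move it in memory.

```purebasic
; Sketch of the two-pass variant: LCase() first, then one capitalising pass.
Procedure.s InitialCapsTwoPass(text.s)
  Protected *b.Character, cur, wordStarted = #False

  text = LCase(text)          ; pass 1: LCase() walks the whole string
  *b = @text                  ; take the address only after the assignment
  cur = *b\c

  While cur                   ; pass 2: walk it again to capitalise word starts
    Select cur
      Case 'a' To 'z'
        If Not wordStarted
          *b\c = cur - 32
          wordStarted = #True
        EndIf
      Case 39                 ; apostrophe stays part of the word
      Default
        wordStarted = #False
    EndSelect
    *b + SizeOf(Character)
    cur = *b\c
  Wend

  ProcedureReturn text
EndProcedure

Debug InitialCapsTwoPass("hELLO wORLD it'S")   ; Hello World It's
```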
MachineCode

Post by MachineCode »

Demivec wrote: I thought you would be optimizing for time.
I am, but I didn't realise LCase() would increase the time. :shock: I just assumed that checking each character in my procedure would be slower than PureBasic's LCase() command, since that's ASM. Anyway, I've now removed it from my procedure, based on what you've told me above. :)

Another question regarding optimization: should we be calculating SizeOf() over and over inside the While/Wend loop? This version, which calculates it just once outside the loop, seems to work fine:

Code:

Procedure.s InitialCaps(text$)

  *b.Character=@text$
  s=SizeOf(Character)
  cur=*b\c

  While cur
    Select cur
      Case 'a' To 'z' : If Not nextword : *b\c=cur-32 : nextword=1 : EndIf
      Case 'A' To 'Z' : If nextword : *b\c=cur+32 : Else : nextword=1 : EndIf
      Case 39 ; Ignore apostrophes as they're considered part of a word.
      Default : nextword=0
    EndSelect
    *b+s : cur=*b\c
  Wend

  ProcedureReturn text$

EndProcedure
Last edited by MachineCode on Sat Mar 02, 2013 2:30 pm, edited 1 time in total.
wilbert

Post by wilbert »

MachineCode wrote:

Code:

Procedure.s DigitsOnly(text$)
  *b.Character=@text$
  cur=*b\c
  While cur
    If cur>47 And cur<58
      n$+Chr(cur)
    EndIf
    *b+SizeOf(Character)
    cur=*b\c
  Wend
  ProcedureReturn n$
EndProcedure
Another question regarding optimization: should we be calculating SizeOf() over and over inside the While/Wend loop?
SizeOf() is evaluated by the compiler, not at run time: the result is inserted as a constant, so nothing is recalculated on each pass through the loop. There's no need to store the result in a variable.
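A quick way to see that SizeOf() is resolved at compile time: it can initialise a constant, and PureBasic constants only accept values the compiler can work out. #CharSize is an illustrative name.

```purebasic
; SizeOf() is evaluated by the compiler, so it can be used to define a constant.
#CharSize = SizeOf(Character)   ; 2 in a Unicode build, 1 in an ASCII build

Define s$ = "abc"
Define *p.Character = @s$
*p + #CharSize                  ; equivalent to *p + SizeOf(Character)
Debug Chr(*p\c)                 ; b
```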

This should be faster for DigitsOnly(), but this approach only works if you know the result is never longer than the original string, because it writes the result back into that string ...

Code:

Procedure.s DigitsOnly(text$)
  
  Protected *in.Character = @text$
  Protected *out.Character = *in
  
  While *in\c
    If *in\c ! $30 < 10
      *out\c = *in\c
      *out + SizeOf(Character)
    EndIf
    *in + SizeOf(Character)
  Wend
  *out\c = 0
  
  ProcedureReturn text$
  
EndProcedure
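Two things make this DigitsOnly() work. The digits '0' to '9' are $30 to $39, so XOR-ing with $30 (! is PureBasic's XOR operator) maps them to 0 to 9 and every other character code to 10 or more, which lets one comparison replace cur>47 And cur<58. And since *out never gets ahead of *in, the kept characters can safely be written back into the same string. The loop below is a small check of the XOR test over every 16-bit character code (a sketch for illustration, not from the thread).

```purebasic
; Check that (c ! $30) < 10 is true exactly for '0'..'9'.
Define c, mismatches
For c = 1 To $FFFF
  If Bool((c ! $30) < 10) <> Bool(c >= '0' And c <= '9')
    mismatches + 1
  EndIf
Next
Debug mismatches   ; 0
```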
Here's also a different approach to InitialCaps

Code:

Procedure.s InitialCaps(text.s)
  
  Protected *in.Character = @text
  Protected lc, msk = $ffdf
  
  While *in\c
    lc = *in\c | $20
    If ((lc - 1) ! $60) < 26
      *in\c = lc & msk
      msk = $ffff
    ElseIf *in\c <> 39 
      msk = $ffdf
    EndIf
    *in + SizeOf(Character)
  Wend
  
  ProcedureReturn text
  
EndProcedure
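How the bit tricks in this version work for ASCII letters: OR-ing with $20 folds 'A'..'Z' onto 'a'..'z'; ((lc - 1) ! $60) < 26 is true only when the folded value is 'a'..'z'; and AND-ing with $ffdf clears bit 5 (giving the uppercase letter), while $ffff leaves the lowercase letter as it is. The check below (a sketch, not part of wilbert's post) compares the letter test with a plain range test over every 16-bit character code.

```purebasic
; Compare the bit-trick letter test with an explicit range test.
Define c, lc, mismatches
For c = 1 To $FFFF
  lc = c | $20                              ; fold ASCII upper case onto lower case
  If Bool(((lc - 1) ! $60) < 26) <> Bool((c >= 'a' And c <= 'z') Or (c >= 'A' And c <= 'Z'))
    mismatches + 1
  EndIf
Next
Debug mismatches                            ; 0
Debug Chr('q' & $ffdf)                      ; Q (clearing bit 5 gives upper case)
```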
MachineCode

Post by MachineCode »

Wilbert, thanks for showing me how to create a new string using the "Character" type. :)

One more question regarding this thread: would it be even faster still to perhaps just use regular expressions to modify text$? I'm no good with them, though, so it'd be harder for me to implement. But I'm thinking maybe a regex to change all to initial caps might be faster than a While/Wend loop?
skywalk
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Post by skywalk »

MachineCode wrote: One more question regarding this thread: would it be even faster still to perhaps just use regular expressions to modify text$? I'm no good with them, though, so it'd be harder for me to implement. But I'm thinking maybe a regex to change all to initial caps might be faster than a While/Wend loop?
Noooooo. Regexes are much slower for a simple job like this; a plain PB loop like the ones above will beat them.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
MachineCode

Post by MachineCode »

Thanks for replying. I had no idea they were slow.
MachineCode

Post by MachineCode »

One final plea for help (once I get this sorted, I should be set, because I've already learned how to change same-size strings and longer-to-shorter strings). :)

This last time, I need to learn how to quickly enlarge an existing string with new data. In the particular example below, I want to put line numbers (and a space) in front of each line in the existing string. I think my code is almost there, but I don't know how to get the new larger result.

Please help me, because once this is done, you'll have taught me to fish, instead of feeding me a fish. ;)

Code:

; Output in MessageRequester is to be:

;  8 eight
;  9 nine
; 10 ten

; With the above specific alignment.

text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$

width=2 : lines=3
soc=SizeOf(Character)

*i.Character=@text$
*o.Character=AllocateMemory(999)

n=8 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next

While *i\c
  *o\c=*i\c : *o+soc
  If *i\c=#LF ; End of current line.
    n+1 : s$=RSet(Str(n),width," ")+" "
    For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next
  EndIf
  *i+soc
Wend
*o\c=0

MessageRequester("Result",text$) ; Shows original string.
MessageRequester("Result",PeekS(*o)) ; Shows null.
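The empty result comes from *o itself: it is advanced after every write, so by the time PeekS(*o) runs it points at the terminating zero. Below is a sketch that keeps the start address, sizes the buffer once from CountString(), and only writes a number when a line actually has content (so no stray number appears after the final CRLF). NumberLines(), first and width are illustrative names; the padding mimics RSet() with spaces. Writing into one preallocated buffer also avoids growing a string like n$ one character at a time.

```purebasic
; Sketch: prefix every line of text$ with a right-aligned line number.
; Key fix: remember where the buffer starts, because *o moves while writing.
Procedure.s NumberLines(text$, first, width)
  Protected *i.Character = @text$
  Protected *start, *o.Character
  Protected n = first, atLineStart = #True
  Protected prefix$, result$
  ; Worst case per line: the number (a 64-bit Str() is at most 20 chars, or 'width') + a space.
  Protected lines = CountString(text$, #LF$) + 1
  Protected size = (Len(text$) + lines * (width + 21) + 1) * SizeOf(Character)

  *start = AllocateMemory(size)
  If *start = 0 : ProcedureReturn "" : EndIf
  *o = *start

  While *i\c
    If atLineStart                          ; a new line with content: write its number first
      prefix$ = Str(n)
      If Len(prefix$) < width : prefix$ = Space(width - Len(prefix$)) + prefix$ : EndIf
      prefix$ + " "
      CopyMemory(@prefix$, *o, Len(prefix$) * SizeOf(Character))
      *o + Len(prefix$) * SizeOf(Character)
      n + 1
      atLineStart = #False
    EndIf
    *o\c = *i\c                             ; copy the original character
    *o + SizeOf(Character)
    If *i\c = #LF : atLineStart = #True : EndIf
    *i + SizeOf(Character)
  Wend
  *o\c = 0                                  ; terminate the new string

  result$ = PeekS(*start)                   ; read from the start, not from *o
  FreeMemory(*start)
  ProcedureReturn result$
EndProcedure

text$ = "eight" + #CRLF$ + "nine" + #CRLF$ + "ten" + #CRLF$
Debug NumberLines(text$, 8, 2)
```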
[Edit]

This works, but I fear that using n$ will slow it down:

Code:

text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$

width=2 : lines=3
soc=SizeOf(Character)

*i.Character=@text$

n=8 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : n$+Mid(s$,a,1) : Next

While *i\c
  n$+Chr(*i\c)
  If *i\c=#LF ; End of current line.
    n+1 : s$=RSet(Str(n),width," ")+" "
    For a=1 To Len(s$) : n$+Mid(s$,a,1) : Next
  EndIf
  *i+soc
Wend

MessageRequester("Result",n$)
Last edited by MachineCode on Sun Mar 03, 2013 5:42 am, edited 1 time in total.
Psych
Enthusiast
Posts: 239
Joined: Thu Dec 18, 2008 3:35 pm
Location: Wales, UK

Post by Psych »

Code:

EnableExplicit
Global text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
Global i,c=CountString(text$,#CRLF$)
Global r$
For i=1 To c
  If StringField(text$,i,#CRLF$)
    r$+RSet(Str(i),2,"0")+" "+StringField(text$,i,#CRLF$)+#CRLF$
  EndIf
Next
MessageRequester("Test",r$)
----------------------------------------------------------------------------
Commenting your own code is admitting you don't understand it.
----------------------------------------------------------------------------
MachineCode

Post by MachineCode »

Psych, I tried that approach first, but it's too slow on a 725 KB file (Win32api.txt): it takes over 30 seconds to complete. That's why I want to use the "Character" approach above, because all the other "Character" examples in this thread rip through that same 725 KB file in less than 50 ms. :)
Psych

Post by Psych »

I suspected performance might be the issue. I'll rethink it.

Give me 3 minutes and I'll see if the next one is any quicker.
Psych

Post by Psych »

Try this. The last routine used StringField(), which has to scan the text from the beginning on every call. This one should be faster and shouldn't be far off a custom string-search routine.

Code:

EnableExplicit
Global text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
Global pos=1,line,f,r$
Repeat
  f=FindString(text$,#CRLF$,pos)
  If f
    line+1
    r$+RSet(Str(line),2,"0")+" "+Mid(text$,pos,f-pos+2)
    pos=f+2
  EndIf
Until Not f
MessageRequester("Test",r$)