Re: Fastest way to modify a large string, byte-by-byte?

Posted: Fri Mar 01, 2013 5:34 am
by MachineCode
Demivec, you're fantastic! 8) Where can I donate to you for showing me the light?

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Fri Mar 01, 2013 7:20 am
by wilbert
Demivec's procedure, optimized a bit for speed:

Code:

Procedure.s InitialCaps(text.s)
  
  Protected *b.Character = @text, cur, wordStarted = #False
  
  cur = *b\c
  While cur
  
    Select cur
      Case 97 To 122 ; 'a' To 'z'
        If Not wordStarted
          *b\c = cur - 32 ; first letter of a word: make it uppercase
          wordStarted = #True
        EndIf
      Case 65 To 90 ; 'A' To 'Z'
        If wordStarted
          *b\c = cur + 32 ; inside a word: make it lowercase
        Else
          wordStarted = #True
        EndIf
      Case 39, '-', '_' ; apostrophe, hyphen and underscore are treated as part of a word
      Default
        wordStarted = #False
    EndSelect
          
    *b + SizeOf(Character)
    cur = *b\c
  Wend  

  ProcedureReturn text
EndProcedure

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Fri Mar 01, 2013 4:27 pm
by Demivec
MachineCode wrote: Demivec, you're fantastic! 8) Where can I donate to you for showing me the light?
@MachineCode: You're welcome. It isn't necessary to compensate me, but I do accept donations. :wink: You can PayPal me.


@wilbert: Thanks for completing the last bit of optimization; you saved me a step. Stepping a pointer through the string removes the multiplication that happens behind the scenes when an index is used to access the characters.

My benchmark shows it is about 33% faster.


@Edit: removed email address.
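
To illustrate the indexing point above, here is a minimal sketch contrasting the two access styles. The procedure names and the space-counting task are made up for illustration, and the indexed form is only an assumption about what index-based access looks like, not the earlier code from this thread.

```purebasic
; Sketch only: two ways to walk a string, counting spaces in each.
; CountSpacesIndexed and CountSpacesPointer are hypothetical names.

Procedure CountSpacesIndexed(text.s)
  Protected i, count, length = Len(text)
  For i = 0 To length - 1
    ; The address is rebuilt as base + i * SizeOf(Character) for every character.
    If PeekC(@text + i * SizeOf(Character)) = ' '
      count + 1
    EndIf
  Next
  ProcedureReturn count
EndProcedure

Procedure CountSpacesPointer(text.s)
  Protected *b.Character = @text, count
  While *b\c
    If *b\c = ' '
      count + 1
    EndIf
    *b + SizeOf(Character) ; step to the next character with a single addition
  Wend
  ProcedureReturn count
EndProcedure

Debug CountSpacesIndexed("one two three")
Debug CountSpacesPointer("one two three")
```

Both calls should report 2. Any speed difference would have to be measured on real data, as with the benchmark mentioned above.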

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sat Mar 02, 2013 3:00 am
by MachineCode
Edited code in this post because this first comment is wrong: You can also optimize further by first converting text$ to LCase() (do it before "While") and then totally removing the "Case 65 To 90" section. Remember, we're not interested in checking for 65 to 90 because everything is lowercase (97 to 122) except for what we manually change to uppercase.

Here's my own version that I'm using now (based on both of your versions). I've also changed "97 To 122" to "'a' To 'z'" for readability, and compacted some lines and variables (it's just what I do). I also removed "Protected" as I don't need it and I'm sure it adds overhead (no matter how small) to my final exe.

Code:

Procedure.s InitialCaps(text$)

  *b.Character=@text$
  cur=*b\c

  While cur
    Select cur
      Case 'a' To 'z' : If Not nextword : *b\c=cur-32 : nextword=1 : EndIf
      Case 'A' To 'Z' : If nextword : *b\c=cur+32 : Else : nextword=1 : EndIf
      Case 39 ; Ignore apostrophes as they're considered part of a word.
      Default : nextword=0
    EndSelect
    *b+SizeOf(Character)
    cur=*b\c
  Wend

  ProcedureReturn text$

EndProcedure

Debug InitialCaps("'hI there'! ev'RY word 'SHOULD' have initial cap's ONLY and ''quotes'' working.")

Demivec: I've opened a PayPal account but it needs two business days to verify my bank account... will make a donation next week. ;) I can do one for wilbert too if you like? It won't be much each (probably just $10) but still, it's saved me a lot of speed hassles for my app.

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sat Mar 02, 2013 3:30 am
by Demivec
MachineCode wrote: You can also optimize further by first converting text$ to LCase() (do it before "While") and then totally removing the "Case 65 To 90" section. Remember, we're not interested in checking for 65 to 90 because everything is lowercase (97 to 122) except for what we manually change to uppercase. ;)
The LCase() part actually increases the time. With LCase() you go through the string twice: once to convert every character to lowercase, and once more to possibly convert some of them to uppercase. With the previous optimization you looked at each character only once and either left it the same or changed its case as needed.

I thought you would be optimizing for time. If you are looking only for simplicity I guess your changes will be best.
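
A sketch of the two-pass idea being discussed, assuming the LCase() call is placed at the top of the procedure. InitialCapsTwoPass is a hypothetical name; the point is only to show where the second trip through the string comes from.

```purebasic
; Sketch only: the LCase()-first variant, with both passes marked.
; InitialCapsTwoPass is a hypothetical name.

Procedure.s InitialCapsTwoPass(text$)
  Protected *b.Character, inWord

  text$ = LCase(text$) ; pass 1: LCase() visits every character
  *b = @text$

  While *b\c           ; pass 2: this loop visits every character again
    Select *b\c
      Case 'a' To 'z'
        If Not inWord
          *b\c = *b\c - 32 ; first letter of a word: make it uppercase
          inWord = #True
        EndIf
      Case 39 ; an apostrophe counts as part of the word
      Default
        inWord = #False
    EndSelect
    *b + SizeOf(Character)
  Wend

  ProcedureReturn text$
EndProcedure

Debug InitialCapsTwoPass("hELLO wORLD, it's ME") ; should show: Hello World, It's Me
```

The "Case 'A' To 'Z'" branch is gone, but the work it did has moved into LCase(), which is why the total time goes up rather than down.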

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sat Mar 02, 2013 3:33 am
by MachineCode
Demivec wrote: I thought you would be optimizing for time.
I am, but I didn't realise LCase() would increase it. :shock: I just assumed the procedure checking it would be slower than PureBasic's LCase() command, since that's ASM. Anyway, I've now removed it from my procedure, based on what you've told me above. :)

Another question regarding optimization: should we be calculating SizeOf() over and over inside the While/Wend loop? This seems to work just fine, where it's calculated just once outside the loop:

Code:

Procedure.s InitialCaps(text$)

  *b.Character=@text$
  s=SizeOf(Character)
  cur=*b\c

  While cur
    Select cur
      Case 'a' To 'z' : If Not nextword : *b\c=cur-32 : nextword=1 : EndIf
      Case 'A' To 'Z' : If nextword : *b\c=cur+32 : Else : nextword=1 : EndIf
      Case 39 ; Ignore apostrophes as they're considered part of a word.
      Default : nextword=0
    EndSelect
    *b+s : cur=*b\c
  Wend

  ProcedureReturn text$

EndProcedure

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sat Mar 02, 2013 7:48 am
by wilbert
MachineCode wrote:

Code:

Procedure.s DigitsOnly(text$)
  *b.Character=@text$
  cur=*b\c
  While cur
    If cur>47 And cur<58
      n$+Chr(cur)
    EndIf
    *b+SizeOf(Character)
    cur=*b\c
  Wend
  ProcedureReturn n$
EndProcedure

Another question regarding optimization: should we be calculating SizeOf() over and over inside the While/Wend loop?
SizeOf() is a compiler function: it is resolved to a constant at compile time, not executed on each pass through the loop, so there's no need to put the result inside a variable.

This should be faster for DigitsOnly(), but the approach only works if you know the result can never be longer than the input, because it overwrites the string in place:

Code:

Procedure.s DigitsOnly(text$)
  
  Protected *in.Character = @text$
  Protected *out.Character = *in
  
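  While *in\c
    If *in\c ! $30 < 10 ; '0'..'9' ($30..$39) XOR $30 gives 0..9
      *out\c = *in\c ; keep the digit, overwriting the string in place
      *out + SizeOf(Character)
    EndIf
    *in + SizeOf(Character)
  Wend
  *out\c = 0 ; terminate the shortened string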
  
  ProcedureReturn text$
  
EndProcedure

Here's also a different approach to InitialCaps:

Code:

Procedure.s InitialCaps(text.s)
  
  Protected *in.Character = @text
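  Protected lc, msk = $ffdf ; $ffdf clears bit 5 (uppercase), $ffff leaves it set (lowercase)
  
  While *in\c
    lc = *in\c | $20 ; force bit 5 on: 'A'..'Z' become 'a'..'z'
    If ((lc - 1) ! $60) < 26 ; true only when lc is 'a'..'z'
      *in\c = lc & msk ; uppercase at the start of a word, lowercase inside it
      msk = $ffff
    ElseIf *in\c <> 39 ; an apostrophe stays part of the word
      msk = $ffdf ; any other character starts a new word
    EndIf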
    *in + SizeOf(Character)
  Wend
  
  ProcedureReturn text
  
EndProcedure
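
The range tests in the two procedures above rely on XOR and OR tricks. The sketch below isolates them in two helpers and compares each against a plain range check for every code from 1 to 255. The helper names are made up for illustration, and Bool() is assumed to be available (it needs a reasonably recent PureBasic version).

```purebasic
; Sketch only: check the two bit tricks against ordinary comparisons.
; IsDigitXor and IsLetterXor are hypothetical helper names.

Procedure IsDigitXor(c)
  ; '0'..'9' are $30..$39, so XOR with $30 maps them to 0..9.
  ; Every other non-negative code gives 10 or more.
  ProcedureReturn Bool((c ! $30) < 10)
EndProcedure

Procedure IsLetterXor(c)
  Protected lc = c | $20 ; set bit 5: 'A'..'Z' become 'a'..'z'
  ; 'a'..'z' are $61..$7A; subtracting 1 and XORing with $60 gives 0..25.
  ProcedureReturn Bool(((lc - 1) ! $60) < 26)
EndProcedure

mismatches = 0
For c = 1 To 255
  plainDigit = Bool(c >= '0' And c <= '9')
  plainLetter = Bool((c >= 'a' And c <= 'z') Or (c >= 'A' And c <= 'Z'))
  If IsDigitXor(c) <> plainDigit Or IsLetterXor(c) <> plainLetter
    mismatches + 1
  EndIf
Next

Debug mismatches ; should show 0 if the tricks agree with the plain range checks
```

Each trick replaces two comparisons with one, which is where any gain over "Case 'a' To 'z'" style tests would come from.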

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sun Mar 03, 2013 3:10 am
by MachineCode
Wilbert, thanks for showing me how to create a new string using the "Character" type. :)

One more question regarding this thread: would it be even faster still to perhaps just use regular expressions to modify text$? I'm no good with them, though, so it'd be harder for me to implement. But I'm thinking maybe a regex to change all to initial caps might be faster than a While/Wend loop?

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sun Mar 03, 2013 3:43 am
by skywalk
MachineCode wrote: One more question regarding this thread: would it be even faster still to perhaps just use regular expressions to modify text$? I'm no good with them, though, so it'd be harder for me to implement. But I'm thinking maybe a regex to change all to initial caps might be faster than a While/Wend loop?
Noooooo. Regexes are very slow. PB code will always beat them.

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sun Mar 03, 2013 3:51 am
by MachineCode
Thanks for replying. I had no idea they were slow.

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sun Mar 03, 2013 5:11 am
by MachineCode
One final plea for help (and once I get this sorted, I should be set -- because I've already learned how to change same-size strings and longer-to-shorter strings). :)

This last time, I need to learn how to quickly enlarge an existing string with new data. In the particular example below, I want to put line numbers (and a space) in front of each line in the existing string. I think my code is almost there, but I don't know how to get the new larger result.

Please help me, because once this is done, you'll have taught me to fish, instead of feeding me a fish. ;)

Code:

; Output in MessageRequester is to be:

;  8 eight
;  9 nine
; 10 ten

; With the above specific alignment.

text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$

width=2 : lines=3
soc=SizeOf(Character)

*i.Character=@text$
*o.Character=AllocateMemory(999)

n=8 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next

While *i\c
  *o\c=*i\c : *o+soc
  If *i\c=#LF ; End of current line.
    n+1 : s$=RSet(Str(n),width," ")+" "
    For a=1 To Len(s$) : *o\c=Asc(Mid(s$,a,1)) : *o+soc : Next
  EndIf
  *i+soc
Wend
*o\c=0

MessageRequester("Result",text$) ; Shows original string.
MessageRequester("Result",PeekS(*o)) ; Shows null.

[Edit]

This works, but I fear that using n$ will slow it down:

Code:

text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$

width=2 : lines=3
soc=SizeOf(Character)

*i.Character=@text$

n=8 : s$=RSet(Str(n),width," ")+" "
For a=1 To Len(s$) : n$+Mid(s$,a,1) : Next

While *i\c
  n$+Chr(*i\c)
  If *i\c=#LF ; End of current line.
    n+1 : s$=RSet(Str(n),width," ")+" "
    For a=1 To Len(s$) : n$+Mid(s$,a,1) : Next
  EndIf
  *i+soc
Wend

MessageRequester("Result",n$)
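
One likely reason the first attempt above shows an empty result: by the time PeekS(*o) runs, *o has been advanced to the terminating null, so it no longer points at the start of the buffer. The sketch below is one possible fix under that assumption, keeping the start address in a separate pointer. The names *buf, c and result$ are made up for illustration, PokeS() replaces the Mid() loop, and the prefix is only written when another line follows, so that no dangling number appears after the last line.

```purebasic
; Sketch only: number each line of text$ by writing into a separate buffer.
; *buf keeps the start address; *o is only the write cursor.

text$ = "eight" + #CRLF$ + "nine" + #CRLF$ + "ten" + #CRLF$

width = 2
soc = SizeOf(Character)

*i.Character = @text$
*buf = AllocateMemory(999) ; fixed size taken from the post above; real input needs a computed size
*o.Character = *buf

n = 8 : s$ = RSet(Str(n), width, " ") + " "
PokeS(*o, s$) : *o + Len(s$) * soc ; copy the first prefix in one call

While *i\c
  c = *i\c
  *o\c = c : *o + soc
  *i + soc
  If c = #LF And *i\c <> 0 ; end of a line, and another line follows
    n + 1 : s$ = RSet(Str(n), width, " ") + " "
    PokeS(*o, s$) : *o + Len(s$) * soc
  EndIf
Wend
*o\c = 0 ; terminate the new string

result$ = PeekS(*buf) ; read from the start of the buffer, not from the cursor
FreeMemory(*buf)

Debug result$
```

The buffer has to be large enough for the original text plus one prefix per line; the fixed 999 bytes is only adequate for this small test string.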

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sun Mar 03, 2013 5:39 am
by Psych

Code:

EnableExplicit
Global text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
Global i,c=CountString(text$,#CRLF$)
Global r$
For i=1 To c
  If StringField(text$,i,#CRLF$)
    r$+RSet(Str(i),2,"0")+" "+StringField(text$,i,#CRLF$)+#CRLF$
  EndIf
Next
MessageRequester("Test",r$)

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sun Mar 03, 2013 5:46 am
by MachineCode
Psych, I tried that approach first but it's too slow on a 725 KB file (Win32api.txt). It takes over 30 seconds to complete. Hence the need to use the approach above with "Character" types, because all other such "Character" examples in this thread just rip through that 725 KB file in less than 50 ms. :)

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sun Mar 03, 2013 5:49 am
by Psych
I did think that it was a performance issue; I'll rethink.

Give me 3 mins, see if the next one is any quicker.

Re: Fastest way to modify a large string, byte-by-byte?

Posted: Sun Mar 03, 2013 6:07 am
by Psych
Try this. The last routine used StringField(), which scans the text from the start on every call. This one should be faster, and shouldn't be that far off a custom string search routine.

Code:

EnableExplicit
Global text$="eight"+#CRLF$+"nine"+#CRLF$+"ten"+#CRLF$
Global pos=1,line,f,r$
Repeat
  f=FindString(text$,#CRLF$,pos)
  If f
    line+1
    r$+RSet(Str(line),2,"0")+" "+Mid(text$,pos,f-pos+2)
    pos=f+2
  EndIf
Until Not f
MessageRequester("Test",r$)