How efficient is StringField?

Everything else that doesn't fall into one of the other PB categories.
User avatar
GedB
Addict
Addict
Posts: 1313
Joined: Fri May 16, 2003 3:47 pm
Location: England
Contact:

How efficient is StringField?

Post by GedB »

Does the StringField function parse the string from the beginning with each call, or does it 'remember' stuff.

For example, given the following code:

Code: Select all

Days.s = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"

;Using StringField
i = 1
Repeat 
  Day.s = StringField(Days, i, "|") 
  Debug Day.s
  i + 1
Until Day = ""
Would every call to StringField involve reading every byte of the string from the beginning?

If I am parsing a large file, am I better off writing my own parser so that only one pass is necessary?
PolyVector
Enthusiast
Enthusiast
Posts: 499
Joined: Wed Sep 17, 2003 9:17 pm
Location: Southern California
Contact:

Post by PolyVector »

It makes a pass each time it is called, you would want to impliment your own routine if you require something optimized for your specific application...

Here's a one-pass solution:

Code: Select all

Days.s = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday" 

;Using FindString
DayStart = 1 
Repeat 
  DayEnd=FindString(Days,"|",DayStart)
  Day.s = Mid(Days,DayStart,DayEnd-DayStart)
  Debug Day.s 
  DayStart=DayEnd+1 ;/Next Field... Add the number of characters in the delimiter
Until DayEnd=0
User avatar
GedB
Addict
Addict
Posts: 1313
Joined: Fri May 16, 2003 3:47 pm
Location: England
Contact:

Post by GedB »

Thankyou Mr Vector.
GreenGiant
Enthusiast
Enthusiast
Posts: 252
Joined: Fri Feb 20, 2004 5:43 pm

Post by GreenGiant »

I made a little test to see how much speed was gained by using the one-pass solution. I found on my computer that for a 60000 character random string using the StringField command was (very slightly) faster. I think my code's right, so see which is faster for you

Code: Select all

For a=1 To 60000
  string$=string$+Chr(97+Random(25))
Next

Delay(5000)

oldtime=GetTickCount_()

max=CountString(string$,"b")
For i=1 To max
  temp$ = StringField(string$, i, "b") 
  ;Debug temp$ 
Next

newtime=GetTickCount_()

Start = 1 
Repeat 
  Stop=FindString(string$,"b",Start) 
  temp$ = Mid(string$,Start,Stop-Start) 
  ;Debug temp$
  Start=Stop+1 
Until Stop=0

newtime2=GetTickCount_()

answer$="Speed test using a string with 60000 characters"+Chr(13)+Chr(10)+"Using StringField: "+Str(newtime-oldtime)+Chr(13)+Chr(10)+"Using FindString: "+Str(newtime2-newtime)
MessageRequester("Results",answer$,0)
The delay's just to give PB the chance to initialise itself by the way.
User avatar
GedB
Addict
Addict
Posts: 1313
Joined: Fri May 16, 2003 3:47 pm
Location: England
Contact:

Post by GedB »

String field wins with both Debug on and off on my machine:

Debug on:
StringField: 203
FindString: 219

Debug off:
StringField: 188
FindString: 203

This raises the question: Is it that StringField is just so fast that even with the overhead of multiple passes it wins? Or is it keeping state between calls?

So I tried updating the the loops so that the StringField had to switch between 2 different strings and delimters. This did not change a thing. StringField was still faster.

So this leaves the conclusion that StringField is one damn fine piece of well optimised code.

The question remains, however. Is it just so much faster it can afford to keep going back to the beginning or is it using some very fancy cacheing?

Code: Select all


For a=1 To 60000
  b_string$=b_string$+Chr(97+Random(25))
Next

For a=1 To 60000
  c_string$=c_string$+Chr(97+Random(25))
Next

Delay(5000)

oldtime=GetTickCount_()

b_max = CountString(b_string$,"b")
c_max = CountString(c_string$,"c")
If b_max > c_max
  max = b_max
Else
  max = c_max
EndIf
  

For i=1 To max
  temp$ = StringField(b_string$, i, "b")
  temp$ + StringField(c_string$, i, "c")
  ;Debug temp$
Next

newtime=GetTickCount_()

b_Start = 1
c_start = 1
b_Finished = #False
c_Finished = #False
Repeat
  If b_Finished = #False
    Stop=FindString(b_string$,"b",b_Start)
    temp$ = Mid(b_string$,b_Start,Stop - b_Start)
    ;Debug temp$
    b_Start=Stop+1
  EndIf
  If Stop = 0 
    b_Finished = #True
  EndIf
  If c_Finished = #False
    Stop=FindString(c_string$,"c",c_start)
    temp$ + Mid(c_string$,c_start,Stop-c_start)
    ;Debug temp$
    c_start=Stop+1
  EndIf
  If Stop = 0 
    c_Finished = #True
  EndIf
Until b_Finished And c_Finished

newtime2=GetTickCount_()

answer$="Speed test using a string with 60000 characters"+Chr(13)+Chr(10)+"Using StringField: "+Str(newtime-oldtime)+Chr(13)+Chr(10)+"Using FindString: "+Str(newtime2-newtime)
MessageRequester("Results",answer$,0)

User avatar
tinman
PureBasic Expert
PureBasic Expert
Posts: 1102
Joined: Sat Apr 26, 2003 4:56 pm
Location: Level 5 of Robot Hell
Contact:

Post by tinman »

A couple of guesses as to why:

FindString and/or Mid also count through from the start, making sure you don't run off the end of the string?

StringField uses a single character, and it is always a single character that is searched for in the string. FindString searches for a string making it a more complex comparison.

You'd probably find that replacing FindString with a bunch of peekb's makes it quicker.
If you paint your butt blue and glue the hole shut you just themed your ass but lost the functionality.
(WinXPhSP3 PB5.20b14)
El_Choni
TailBite Expert
TailBite Expert
Posts: 1007
Joined: Fri Apr 25, 2003 6:09 pm
Location: Spain

Post by El_Choni »

Changing the two Mid() for PeekS() increases speed:

Code: Select all

; temp$ = Mid(b_string$,b_Start,Stop - b_Start) 
    If Stop:temp$ = PeekS(@b_string$+b_Start-1, Stop-b_Start):EndIf
;...
; temp$ + Mid(c_string$,c_start,Stop-c_start) 
    If Stop:temp$+PeekS(@c_string$+c_Start-1, Stop-c_Start):EndIf
El_Choni
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3943
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

A little experiment. I don't know if it'll be much faster and if it is allowed to do it this way but it seems to work. I wasn't be able to test the array failed option becasue it didn't fail :wink:

Code: Select all

Procedure.l SplitStringByChar(StringToSplit.s, SplitChar.s, ArrayPtr.l)

; free the current array

! mov edx,[esp+8]
! call PB_FreeArray

; duplicate the string to split

! mov edx,[esp]
! lea ecx,[esp+8]
! call SYS_FastAllocateString

; replace all splitchars by 0 and push all pointers to the substrings

! mov edx,[esp+4]
! mov ah,byte [edx]
! xor ecx,ecx
! mov edx,[esp+8]
! _splitstring_loop0:
! inc ecx
! push edx
! _splitstring_loop1:
! mov al,byte [edx]
! inc edx
! And al,al
! jz _splitstring_cont  
! cmp al,ah
! jne _splitstring_loop1
! mov byte [edx-1],0
! jmp _splitstring_loop0

; correct stack if array allocation failed

! _ss_arrayfailed:
! pop ecx
! sal ecx,2
! add esp,ecx
! jmp _ss_exit

; allocate new array

! _splitstring_cont:
! push ecx
! mov ebx,ecx
! mov eax,4
! call PB_AllocateArray
! jz _ss_arrayfailed
! mov dword [eax],8
! pop ecx
! mov edx,ecx
! sal edx,2
! add edx,eax
! add eax,4

; set all pointers

! _splitstring_loop2:
! popd [edx]
! sub edx,4
! loop _splitstring_loop2  
! _ss_exit:
  ProcedureReturn
EndProcedure

; test the procedure

Dim B.s(0)

; supplying two times B() is required to free the old array

B() = SplitStringByChar("Hello, this is a test to see if the splitting of this string will be correct"," ",B())

ArrayLength = PeekL(@B()-8)

For i=0 To ArrayLength-1
 Debug B(i)
Next
PolyVector
Enthusiast
Enthusiast
Posts: 499
Joined: Wed Sep 17, 2003 9:17 pm
Location: Southern California
Contact:

Post by PolyVector »

I consistantly got best results with FindString.... One thing you have to consider is that even if FindString is slower, the more data your program has to process, the faster FindString(or a similar method) will be in comparison...
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3943
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

PolyVector wrote:I consistantly got best results with FindString....
Are you serious ???
After I posted my code, I decided to check for myself how fast it is. Of course I had to use the first testing routine because in it's current form, my routine doesn't support multiple split chars.

Running that first test on my computer my routine seems to be 280x faster (with debug off). :?:
User avatar
tinman
PureBasic Expert
PureBasic Expert
Posts: 1102
Joined: Sat Apr 26, 2003 4:56 pm
Location: Level 5 of Robot Hell
Contact:

Post by tinman »

GedB wrote:The question remains, however. Is it just so much faster it can afford to keep going back to the beginning or is it using some very fancy cacheing?
You'd have to think it always goes back to the beginning, otherwise this would not work:

Code: Select all

a$ = "eat|my|shorts"
Debug a$ + " " + Str(2) + " " + StringField(a$, 2, "|")

PokeB(@a$ + 1, Asc("|"))
Debug a$ + " " + Str(3) + " " + StringField(a$, 3, "|")
If it cached the last used index position then it should find "shorts" for the third index, even though I changed the string (not through any "usual" method. Sure, you could maybe do some mad MMU/cache tricks to show that the string has been modified, but perhaps overkill (and would it really be controllable by a user program?).

If you replace the FindString() with your own basic code that steps through the string, remembering the last position and Mid() with PeekS() as El_Choni suggested, the StringField gets left in the dust.
If you paint your butt blue and glue the hole shut you just themed your ass but lost the functionality.
(WinXPhSP3 PB5.20b14)
PolyVector
Enthusiast
Enthusiast
Posts: 499
Joined: Wed Sep 17, 2003 9:17 pm
Location: Southern California
Contact:

Post by PolyVector »

@wilbert
I was refering to the test earlier between FindString() and StringField()
Your asm is faster of course :)
Post Reply