Page 1 of 1
How efficient is StringField?
Posted: Thu Sep 23, 2004 10:57 am
by GedB
Does the StringField function parse the string from the beginning with each call, or does it 'remember' stuff.
For example, given the following code:
Code: Select all
Days.s = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
;Using StringField
i = 1
Repeat
Day.s = StringField(Days, i, "|")
Debug Day.s
i + 1
Until Day = ""
Would every call to StringField involve reading every byte of the string from the beginning?
If I am parsing a large file, am I better off writing my own parser so that only one pass is necessary?
Posted: Thu Sep 23, 2004 11:48 am
by PolyVector
It makes a pass each time it is called, you would want to impliment your own routine if you require something optimized for your specific application...
Here's a one-pass solution:
Code: Select all
Days.s = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
;Using FindString
DayStart = 1
Repeat
DayEnd=FindString(Days,"|",DayStart)
Day.s = Mid(Days,DayStart,DayEnd-DayStart)
Debug Day.s
DayStart=DayEnd+1 ;/Next Field... Add the number of characters in the delimiter
Until DayEnd=0
Posted: Thu Sep 23, 2004 12:05 pm
by GedB
Thankyou Mr Vector.
Posted: Thu Sep 23, 2004 1:44 pm
by GreenGiant
I made a little test to see how much speed was gained by using the one-pass solution. I found on my computer that for a 60000 character random string using the StringField command was (very slightly) faster. I think my code's right, so see which is faster for you
Code: Select all
For a=1 To 60000
string$=string$+Chr(97+Random(25))
Next
Delay(5000)
oldtime=GetTickCount_()
max=CountString(string$,"b")
For i=1 To max
temp$ = StringField(string$, i, "b")
;Debug temp$
Next
newtime=GetTickCount_()
Start = 1
Repeat
Stop=FindString(string$,"b",Start)
temp$ = Mid(string$,Start,Stop-Start)
;Debug temp$
Start=Stop+1
Until Stop=0
newtime2=GetTickCount_()
answer$="Speed test using a string with 60000 characters"+Chr(13)+Chr(10)+"Using StringField: "+Str(newtime-oldtime)+Chr(13)+Chr(10)+"Using FindString: "+Str(newtime2-newtime)
MessageRequester("Results",answer$,0)
The delay's just to give PB the chance to initialise itself by the way.
Posted: Thu Sep 23, 2004 4:51 pm
by GedB
String field wins with both Debug on and off on my machine:
Debug on:
StringField: 203
FindString: 219
Debug off:
StringField: 188
FindString: 203
This raises the question: Is it that StringField is just so fast that even with the overhead of multiple passes it wins? Or is it keeping state between calls?
So I tried updating the the loops so that the StringField had to switch between 2 different strings and delimters. This did not change a thing. StringField was still faster.
So this leaves the conclusion that StringField is one damn fine piece of well optimised code.
The question remains, however. Is it just so much faster it can afford to keep going back to the beginning or is it using some very fancy cacheing?
Code: Select all
For a=1 To 60000
b_string$=b_string$+Chr(97+Random(25))
Next
For a=1 To 60000
c_string$=c_string$+Chr(97+Random(25))
Next
Delay(5000)
oldtime=GetTickCount_()
b_max = CountString(b_string$,"b")
c_max = CountString(c_string$,"c")
If b_max > c_max
max = b_max
Else
max = c_max
EndIf
For i=1 To max
temp$ = StringField(b_string$, i, "b")
temp$ + StringField(c_string$, i, "c")
;Debug temp$
Next
newtime=GetTickCount_()
b_Start = 1
c_start = 1
b_Finished = #False
c_Finished = #False
Repeat
If b_Finished = #False
Stop=FindString(b_string$,"b",b_Start)
temp$ = Mid(b_string$,b_Start,Stop - b_Start)
;Debug temp$
b_Start=Stop+1
EndIf
If Stop = 0
b_Finished = #True
EndIf
If c_Finished = #False
Stop=FindString(c_string$,"c",c_start)
temp$ + Mid(c_string$,c_start,Stop-c_start)
;Debug temp$
c_start=Stop+1
EndIf
If Stop = 0
c_Finished = #True
EndIf
Until b_Finished And c_Finished
newtime2=GetTickCount_()
answer$="Speed test using a string with 60000 characters"+Chr(13)+Chr(10)+"Using StringField: "+Str(newtime-oldtime)+Chr(13)+Chr(10)+"Using FindString: "+Str(newtime2-newtime)
MessageRequester("Results",answer$,0)
Posted: Thu Sep 23, 2004 5:00 pm
by tinman
A couple of guesses as to why:
FindString and/or Mid also count through from the start, making sure you don't run off the end of the string?
StringField uses a single character, and it is always a single character that is searched for in the string. FindString searches for a string making it a more complex comparison.
You'd probably find that replacing FindString with a bunch of peekb's makes it quicker.
Posted: Thu Sep 23, 2004 5:11 pm
by El_Choni
Changing the two Mid() for PeekS() increases speed:
Code: Select all
; temp$ = Mid(b_string$,b_Start,Stop - b_Start)
If Stop:temp$ = PeekS(@b_string$+b_Start-1, Stop-b_Start):EndIf
;...
; temp$ + Mid(c_string$,c_start,Stop-c_start)
If Stop:temp$+PeekS(@c_string$+c_Start-1, Stop-c_Start):EndIf
Posted: Thu Sep 23, 2004 5:20 pm
by wilbert
A little experiment. I don't know if it'll be much faster and if it is allowed to do it this way but it seems to work. I wasn't be able to test the array failed option becasue it didn't fail
Code: Select all
Procedure.l SplitStringByChar(StringToSplit.s, SplitChar.s, ArrayPtr.l)
; free the current array
! mov edx,[esp+8]
! call PB_FreeArray
; duplicate the string to split
! mov edx,[esp]
! lea ecx,[esp+8]
! call SYS_FastAllocateString
; replace all splitchars by 0 and push all pointers to the substrings
! mov edx,[esp+4]
! mov ah,byte [edx]
! xor ecx,ecx
! mov edx,[esp+8]
! _splitstring_loop0:
! inc ecx
! push edx
! _splitstring_loop1:
! mov al,byte [edx]
! inc edx
! And al,al
! jz _splitstring_cont
! cmp al,ah
! jne _splitstring_loop1
! mov byte [edx-1],0
! jmp _splitstring_loop0
; correct stack if array allocation failed
! _ss_arrayfailed:
! pop ecx
! sal ecx,2
! add esp,ecx
! jmp _ss_exit
; allocate new array
! _splitstring_cont:
! push ecx
! mov ebx,ecx
! mov eax,4
! call PB_AllocateArray
! jz _ss_arrayfailed
! mov dword [eax],8
! pop ecx
! mov edx,ecx
! sal edx,2
! add edx,eax
! add eax,4
; set all pointers
! _splitstring_loop2:
! popd [edx]
! sub edx,4
! loop _splitstring_loop2
! _ss_exit:
ProcedureReturn
EndProcedure
; test the procedure
Dim B.s(0)
; supplying two times B() is required to free the old array
B() = SplitStringByChar("Hello, this is a test to see if the splitting of this string will be correct"," ",B())
ArrayLength = PeekL(@B()-8)
For i=0 To ArrayLength-1
Debug B(i)
Next
Posted: Thu Sep 23, 2004 10:56 pm
by PolyVector
I consistantly got best results with FindString.... One thing you have to consider is that even if FindString is slower, the more data your program has to process, the faster FindString(or a similar method) will be in comparison...
Posted: Fri Sep 24, 2004 4:54 am
by wilbert
PolyVector wrote:I consistantly got best results with FindString....
Are you serious ???
After I posted my code, I decided to check for myself how fast it is. Of course I had to use the first testing routine because in it's current form, my routine doesn't support multiple split chars.
Running that first test on my computer my routine seems to be 280x faster (with debug off).

Posted: Fri Sep 24, 2004 1:34 pm
by tinman
GedB wrote:The question remains, however. Is it just so much faster it can afford to keep going back to the beginning or is it using some very fancy cacheing?
You'd have to think it always goes back to the beginning, otherwise this would not work:
Code: Select all
a$ = "eat|my|shorts"
Debug a$ + " " + Str(2) + " " + StringField(a$, 2, "|")
PokeB(@a$ + 1, Asc("|"))
Debug a$ + " " + Str(3) + " " + StringField(a$, 3, "|")
If it cached the last used index position then it should find "shorts" for the third index, even though I changed the string (not through any "usual" method. Sure, you could maybe do some mad MMU/cache tricks to show that the string has been modified, but perhaps overkill (and would it really be controllable by a user program?).
If you replace the FindString() with your own basic code that steps through the string, remembering the last position and Mid() with PeekS() as El_Choni suggested, the StringField gets left in the dust.
Posted: Mon Sep 27, 2004 9:53 am
by PolyVector
@wilbert
I was refering to the test earlier between FindString() and StringField()
Your asm is faster of course
