Re: High speed split string
Posted: Wed Mar 14, 2018 11:38 pm
by linkerstorm
Hi.
Here is another, less "ASMish" split that is fast enough for general-purpose use. It relies on the good old C library, which fortunately ships with PB.
The version presented here is Unicode. For ASCII you can use "strstr" instead (plus a few small changes to the code).
The function returns the array length.
Code: Select all
ImportC "crtdll.lib"
  wcsstr.i (*str1, *str2)
EndImport
Procedure.i Split_wcsstr(Array StringArray.s(1), StringToSplit.s, Separator.s = " ")
  Protected c = CountString(StringToSplit, Separator)
  ; Separator size in bytes, so multi-character separators are skipped whole
  Protected SepBytes = Len(Separator) * SizeOf(Character)
  Protected *StringToSplit, *pfound
  Protected.i i
  ; We have to have a string to split
  If Len(StringToSplit) = 0
    ProcedureReturn -1
  EndIf
  ; Return the string as is if no separator is found
  If c = 0
    ReDim StringArray(0)
    StringArray(0) = StringToSplit
    ProcedureReturn ArraySize(StringArray()) + 1
  EndIf
  ReDim StringArray(c)
  *StringToSplit = @StringToSplit
  *pfound = wcsstr(*StringToSplit, @Separator)
  While *pfound
    ; Copy the field between the current position and the separator found
    StringArray(i) = PeekS(*StringToSplit, (*pfound - *StringToSplit) / SizeOf(Character))
    *StringToSplit = *pfound + SepBytes
    *pfound = wcsstr(*StringToSplit, @Separator)
    i + 1
  Wend
  ; Everything after the last separator is the final field
  StringArray(i) = PeekS(*StringToSplit)
  ProcedureReturn c + 1
EndProcedure
Enjoy !
Re: High speed split string
Posted: Fri Jun 10, 2022 10:14 pm
by AZJIO
Here the separator is any single character from the specified set, not an entire string. If separators are repeated in the string being split, the run is treated as a single separator, that is, no empty elements are added for it.
Code: Select all
EnableExplicit
Procedure SplitL2(String$, List StringList.s(), Separator$ = #CRLF$ + #TAB$ + #FF$ + #VT$ + " ")
  Protected *S.Integer = @String$
  Protected Len1, Len2, Blen, i, j
  Protected *memChar, *c.Character, *jc.Character
  Len1 = Len(Separator$)
  Len2 = Len(String$)
  ClearList(StringList())
  *c.Character = @String$
  *memChar = @Separator$
  For i = 1 To Len2
    *jc.Character = *memChar
    For j = 1 To Len1
      If *c\c = *jc\c
        *c\c = 0
        If *S <> *c
          AddElement(StringList())
          StringList() = PeekS(*S)
        EndIf
        *S = *c + SizeOf(Character)
        Break
      EndIf
      *jc + SizeOf(Character)
    Next
    *c + SizeOf(Character)
  Next
  AddElement(StringList())
  StringList() = PeekS(*S)
EndProcedure
Define S.s = "This is a test string to see if split and join are working."
Define NewList MyStrings.s()
SplitL2(S, MyStrings(), " ")
; Debug ListSize(MyStrings())
ForEach MyStrings()
  Debug MyStrings()
Next
Re: High speed split string
Posted: Sun Jun 12, 2022 7:43 am
by idle
I'm not at a computer, but as a general rule, pass strings in by address. Since strings are null-terminated, there's no need to check the length; just loop until you reach the null. That way you eliminate one pass over the string to get its length for the copy, and a second pass to get the length for the loop.
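As a rough illustration of that advice (a minimal sketch, not code from this thread): the hypothetical helper below, CountFieldsByAddress, receives the string by address and walks it until the terminating null, so neither a copy of the string nor a call to Len() is needed. The name, the signature, and the single-character separator are assumptions made for the example.

```purebasic
EnableExplicit

; Hypothetical helper (illustrative only): counts the separator-delimited
; fields of the null-terminated string at *Source without calling Len().
Procedure.i CountFieldsByAddress(*Source.Character, Separator.c = ' ')
  Protected Count = 1
  If *Source = 0                 ; nothing to scan
    ProcedureReturn 0
  EndIf
  While *Source\c                ; loop until the terminating null
    If *Source\c = Separator
      Count + 1
    EndIf
    *Source + SizeOf(Character)  ; advance by one character
  Wend
  ProcedureReturn Count
EndProcedure

Define Text.s = "pass strings by address"
Debug CountFieldsByAddress(@Text)
```

Only the address is passed, so the caller's string is read in place and never duplicated.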
Re: High speed split string
Posted: Sun Jun 12, 2022 10:16 am
by mk-soft
See SplitStringArray.
There the end of the string is detected via the NULL terminator, and there is an option for handling double quotes.
Re: High speed split string
Posted: Mon Jun 13, 2022 2:44 am
by AZJIO
idle
(+ While) About 3% faster: 150 -> 145 ms
(+ FindString) About twice as slow: 145 -> 300 ms
Code: Select all
EnableExplicit
DisableDebugger
Procedure SplitL2(String$, List StringList.s(), Separator$ = #CRLF$ + #TAB$ + #FF$ + #VT$ + " ")
  Protected *S.Integer = @String$
  Protected *jc.Character, *c.Character = @String$
  ClearList(StringList())
  While *c\c
    *jc.Character = @Separator$
    While *jc\c
      If *c\c = *jc\c
        *c\c = 0
        If *S <> *c
          AddElement(StringList())
          StringList() = PeekS(*S)
        EndIf
        *S = *c + SizeOf(Character)
        Break
      EndIf
      *jc + SizeOf(Character)
    Wend
    *c + SizeOf(Character)
  Wend
  AddElement(StringList())
  StringList() = PeekS(*S)
EndProcedure
Define S.s = "This is a test string to see if split and join are working."
Define NewList MyStrings.s()
Define i
Define StartTime = ElapsedMilliseconds()
For i = 1 To 100000
  SplitL2(S, MyStrings(), " ")
Next
MessageRequester("","Completed in " + Str(ElapsedMilliseconds() - StartTime) + " ms")
; Debug "Completed in " + Str(ElapsedMilliseconds() - StartTime) + " ms" + #CRLF$
; Debug ListSize(MyStrings())
ForEach MyStrings()
  Debug "|" + MyStrings() + "|"
Next
(+ FindString)
Code: Select all
While *c\c
  If FindString(Separator$, Chr(*c\c))
    *c\c = 0
    If *S <> *c
      AddElement(StringList())
      StringList() = PeekS(*S)
    EndIf
    *S = *c + SizeOf(Character)
  EndIf
  *c + SizeOf(Character)
Wend
Re: High speed split string
Posted: Mon Jun 13, 2022 4:24 am
by idle
Try this. It's not exactly the same, as it skips anything at or below the separator character.
Code: Select all
Procedure StringField_List(*source,List StringFields.s(),separator=' ')
  Protected *inp.Character
  ClearList(StringFields())
  If *source
    *inp = *source
    While *inp\c <> 0
      ; Advance to the end of the current field
      While (*inp\c > separator )
        *inp+2
      Wend
      AddElement(StringFields())
      StringFields()= PeekS(*source,(*inp-*source)>>1)
      If *inp\c <> 0
        ; Skip the separator run, but stop at the terminating null (0 <= separator)
        While *inp\c <> 0 And *inp\c <= separator
          *inp+2
          *source = *inp
        Wend
      Else
        Break
      EndIf
    Wend
  EndIf
EndProcedure
Define S.s = "This is a test string to see if split and join are working."
NewList strings.s()
StringField_List(@S,Strings())
ForEach strings()
  Debug Strings()
Next
Re: High speed split string
Posted: Mon Jun 13, 2022 5:33 am
by Demivec
@AZJIO: your code doesn't seem to be working properly yet. It is only detecting some of the separators.
I tested your code with this test string and it seemed to miss the LF and CR characters:
Code: Select all
Define S.s = "This is a test " + #LF$ + " string to " + #FF$ + #VT$ + #CRLF$ + " see " + #CR$ + " If split And join are working."
Re: High speed split string
Posted: Mon Jun 13, 2022 7:33 am
by AZJIO
@Demivec
Code: Select all
SplitL2(S, MyStrings(), " " + #FF$ + #VT$ + #CRLF$)
Re: High speed split string
Posted: Mon Jun 13, 2022 8:41 am
by Demivec
AZJIO wrote: Mon Jun 13, 2022 7:33 am
@Demivec
Code: Select all
SplitL2(S, MyStrings(), " " + #FF$ + #VT$ + #CRLF$)
Thanks for the hint. I had overlooked the test code passing in the Separator$.

Everything is working as it should now.
Re: High speed split string
Posted: Mon Jun 13, 2022 9:55 am
by AZJIO
idle wrote: Mon Jun 13, 2022 4:24 am
Try this its not exactly the same as it skips anything below the separator character
I had the same idea, given that the characters below the space are non-printable, but I wanted it to be universal, since in my program the user specifies which character to use. It can be a comma or a custom Unicode character such as some kind of shape.
Is reading faster if the length is specified?
In my tests, the difference was within the margin of error.
Code: Select all
StringList() = PeekS(*S, (*c - *S) >> 1)
Re: High speed split string
Posted: Mon Jun 13, 2022 11:10 am
by idle
I was only trying to show that it's faster to pass the string by its address. Your code is otherwise fine.
Re: High speed split string
Posted: Mon Jun 13, 2022 11:32 am
by AZJIO
idle wrote: Mon Jun 13, 2022 11:10 am
by its address.
In my code, separator characters in the string are overwritten with zeros, so I cannot work on the original string by address; that would corrupt it.
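One way to reconcile the two points, offered only as a sketch: if each field is copied with an explicit length, as in the PeekS(*S, (*c - *S) >> 1) line above, nothing has to be overwritten with zeros, and the string can then be passed by address. The procedure name SplitSetByAddress and its signature are hypothetical, the fallback to a space for an empty separator set is an assumption added here, and unlike SplitL2 no empty element is appended after a trailing separator.

```purebasic
EnableExplicit

; Hypothetical variant (illustrative only): splits on any character from
; Separator$ while only reading the source buffer.
Procedure SplitSetByAddress(*Source.Character, List StringList.s(), Separator$ = " ")
  Protected *S.Character = *Source   ; start of the current field
  Protected *c.Character = *Source   ; scanning position
  Protected *jc.Character

  ClearList(StringList())
  If *Source = 0
    ProcedureReturn
  EndIf
  If Separator$ = ""                 ; assumption: fall back to a space
    Separator$ = " "
  EndIf

  While *c\c
    *jc = @Separator$
    While *jc\c
      If *c\c = *jc\c
        If *S <> *c                  ; skip empty fields
          AddElement(StringList())
          StringList() = PeekS(*S, (*c - *S) / SizeOf(Character))
        EndIf
        *S = *c + SizeOf(Character)
        Break
      EndIf
      *jc + SizeOf(Character)
    Wend
    *c + SizeOf(Character)
  Wend

  If *S <> *c                        ; last field, if it is not empty
    AddElement(StringList())
    StringList() = PeekS(*S)
  EndIf
EndProcedure

Define S.s = "one, two;three"
NewList Words.s()
SplitSetByAddress(@S, Words(), ",; ")
ForEach Words()
  Debug Words()
Next
Debug S   ; the source string is unchanged
```

Whether this is actually faster than the copying version would need to be measured; the thread reports the length-specified PeekS as within the margin of error.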