PB Strings, Can be several thousand times too slow
-
- Enthusiast
- Posts: 480
- Joined: Thu Jul 27, 2006 4:06 am
As I mentioned:

Otherwise, I can't release the memory.
The Split function currently looks like this (but I want to clean it up, merge midmem() into it, and get rid of the function call):

Paul Dwyer
“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Sorry, didn't read all
How about this for a stand alone version though? (untested...)
Code: Select all
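Procedure.s MidFast(dwSrc.l, dwStart.l, dwEnd.l)
  ; dwSrc = address of the source string, dwStart = 1-based start position,
  ; dwEnd = number of bytes to copy. No bounds check, and byte offsets only (ASCII).
  If dwSrc
    If (dwEnd > 0) And (dwStart > 0)
      Define.s szRet
      szRet = Space(dwEnd)
      CopyMemory(dwSrc + dwStart - 1, @szRet, dwEnd)
      ProcedureReturn szRet
    EndIf
  EndIf
EndProcedure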
str.s = "Hello World"
Debug MidFast(@str, 1, 4)
Debug MidFast(@str, 8, 2)
Debug MidFast(@str, 7, 5)
Debug "-----------------"
Debug Mid(str, 1, 4)
Debug Mid(str, 8, 2)
Debug Mid(str, 7, 5)
I'm sure someone will come up with a better idea within the same context.
Just make sure you don't go out of bounds... it might not crash most of the time, but that's due to the fact that memory pages are allocated to the nearest power of 2... Making sure with a Len() call before the MidFast() is a good thing, perhaps with a macro.
Substituting the whole PB string set for a custom one could well be a better solution, though.
Null-terminated strings, yet with a small structure holding the string information, would be ideal.
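A minimal sketch of the bounds check suggested above, written as a procedure rather than a macro. MidChecked() and its parameters are hypothetical names, not from the thread. It assumes the caller measures the length once and passes it in, since calling Len() on every extraction would mean scanning a long string for its terminator each time, which is presumably the cost being avoided here.

```purebasic
; Hypothetical helper (not from the thread): clamp the request before reading.
; *Source   = address of the source string (for example @str)
; SourceLen = its length in characters, measured once by the caller
Procedure.s MidChecked(*Source, SourceLen.l, StartPos.l, Count.l)
  If *Source = 0 Or StartPos < 1 Or Count < 1 Or StartPos > SourceLen
    ProcedureReturn ""
  EndIf
  If Count > SourceLen - StartPos + 1
    Count = SourceLen - StartPos + 1   ; trim a request that runs past the end
  EndIf
  ProcedureReturn PeekS(*Source + (StartPos - 1) * SizeOf(Character), Count)
EndProcedure

Define str.s = "Hello World"
Define strLen.l = Len(str)             ; measure once, then reuse
Debug MidChecked(@str, strLen, 7, 5)   ; request fully inside the string
Debug MidChecked(@str, strLen, 7, 50)  ; request trimmed to the end of the string
Debug MidChecked(@str, strLen, 20, 3)  ; start past the end: empty string
```

The extra comparisons cost a little, so this is a trade of some speed for safety; the unchecked versions in this thread remain the faster option when the caller already knows the offsets are valid.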


Does szRet not go out of scope?
This sort of thing confuses me a little I admit
Paul Dwyer
superadnim, you've inspired me!
Your code fixes my mem leak bug and is fast. I wanted to get rid of the Space() function (not for any real reason I can put my finger on, though), so I came up with a cross between yours and mine. In actuality it's not any faster than yours, but both are thousands of times faster than PB's when the strings get longer!
Thanks!!

Code: Select all
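Procedure.s MidMem(*MainMem, StartPos.l, GetLen.l)
  ; *MainMem = address of the source string, StartPos = 1-based start position,
  ; GetLen = number of bytes to copy. No bounds check, and byte offsets only (ASCII).
  If *MainMem = 0 Or GetLen = 0
    ProcedureReturn ""
  EndIf
  *RetMem = AllocateMemory(GetLen)
  CopyMemory(*MainMem + StartPos - 1, *RetMem, GetLen)
  ReturnString.s = PeekS(*RetMem, GetLen)
  FreeMemory(*RetMem)   ; release the temporary buffer before returning
  ProcedureReturn ReturnString
EndProcedure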
Paul Dwyer
I couldn't help myself, and this also shows how brilliant macros are: no function call overhead, just pure PureBasic power.
And as a bonus, my MidMemLightning() also handles unicode just fine, unlike MidMem().
Obviously, just like MidMem(), there is no bounds check here either; if you want safety, use PB's Mid() instead (just a reminder to the newbies out there).
PS! You may notice the absence of any check on the string pointer or the length given to the macro, but PeekS handles a length of 0 just fine, and since you normally pass a string (so you normally always have valid memory it's pointing to), even if it's empty PeekS handles that OK as well.
So not checking that saves some CPU as well.

Code: Select all
EnableExplicit
Macro MidMemLightning(string,startpos,length)
PeekS(@string+(startpos*SizeOf(Character))-SizeOf(Character),length)
EndMacro
Procedure.s MidMem(*MainMem, StartPos.l, GetLen.l)
Protected *RetMem,ReturnString.s
If *MainMem = 0 Or GetLen = 0
ProcedureReturn ""
EndIf
*RetMem = AllocateMemory(GetLen)
CopyMemory(*MainMem + StartPos -1, *RetMem, GetLen)
ReturnString = PeekS(*RetMem,GetLen)
FreeMemory(*RetMem)
ProcedureReturn ReturnString
EndProcedure
Define str$,TestString$,Output$,Start1.l,Start2.l,Start3.l,Start4.l,i.l
str$ = "Hello World"
Debug MidMem(@str$, 1, 4)
Debug MidMem(@str$, 8, 2)
Debug MidMem(@str$, 7, 5)
Debug "-----------------"
Debug Mid(str$, 1, 4)
Debug Mid(str$, 8, 2)
Debug Mid(str$, 7, 5)
Debug "-----------------"
Debug MidMemLightning(str$, 1, 4)
Debug MidMemLightning(str$, 8, 2)
Debug MidMemLightning(str$, 7, 5)
DisableDebugger
TestString$ = Space(20000000)
Start1 = ElapsedMilliseconds()
For i = 1 To 100000
Output$ = "!" + Midmem(@TestString$,i*100,10) + "!"
Next
Start2 = ElapsedMilliseconds()
For i = 1 To 10000
Output$ = "!" + Mid(TestString$,i*100,10) + "!"
Next
Start3 = ElapsedMilliseconds()
For i = 1 To 100000
Output$ = "!" + MidMemLightning(TestString$,i*100,10) + "!"
Next
Start4 = ElapsedMilliseconds()
EnableDebugger
MessageRequester("Mid vs MidMem vs MidMemLightning", "Mid() = " + Str(Start3-Start2) + " (10000 rounds only as it's so slow)" + #CRLF$ + "MidMem() = " + Str(Start2-Start1) + " (100000 rounds)" + #CRLF$ + "MidMemLightning() ="+Str(Start4-Start3) + " (100000 rounds)" )
Legend!
Even without the macro, pulling the middle out of it without the extra alloc-copymemory-free should be a lot faster anyway.
It's always great when this forum gets competitive.
I don't have a compiler at work, so I'll test it when I get home, but I see what you've done: you've chopped out an unnecessary step!

Paul Dwyer
For some reason, instead of doing work on my own code I ended up fiddling with this instead :roll:
Your Split() plus the improved Mid macro, and two new Split implementations.
My implementations support unicode, yours doesn't.
A note about the two variants I made, the first scans to get the count to avoid the need to redim, the second variant does not do this but redims instead.
Both my implementations support an unlimited number of split strings and an unlimited number of dividers; only available memory should be the real limitation here.
My implementations also handle strings like "," and "this,is," and ",is", for example; yours seems to act a bit weird on those.
And here's the really interesting thing.
Run the source as it is and you'll get numbers like:
Split() = 14976
SplitLightning() = 23650
SplitLightning2() = 13213
But choose Compile without debugger in the IDE menu and you get numbers like these:
Split() = 5866
SplitLightning() = 4025
SplitLightning2() = 5023
It's not the first time I've seen this behaviour: even though DisableDebugger is used, there is still a lot of debugger overhead, so for speed/optimization tests it's wiser to compile without the debugger.
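A small sketch of how a benchmark could guard against this, assuming the compiler constant #PB_Compiler_Debugger is available in the PB version used. The loop body is only a placeholder for whichever routine is being timed, and the string size and round count are arbitrary.

```purebasic
; Sketch only: warn when the benchmark is built with the debugger attached.
CompilerIf #PB_Compiler_Debugger
  MessageRequester("Benchmark", "Debugger is enabled, timings will be inflated. Use 'Compile without debugger'.")
CompilerEndIf

Define i.l, start.l, elapsed.l, Output$
Define TestString$ = Space(100000)

start = ElapsedMilliseconds()
For i = 1 To 10000
  Output$ = Mid(TestString$, i, 10)    ; placeholder for the routine under test
Next
elapsed = ElapsedMilliseconds() - start

MessageRequester("Benchmark", "Elapsed: " + Str(elapsed) + " ms")
```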
Also try with shorter test strings and you'll see yet again different results.
Using only "this,is,a,test,string" as string instead of that long monster string I got this:
Split() = 7082
SplitLightning() = 2543
SplitLightning2() = 1310
And when doing a compile without debugger I got:
Split() = 2543
SplitLightning() = 515
SplitLightning2() = 421
So for shorter strings my SplitLightning2() seems best, and for medium strings SplitLightning(); for really long strings things even out more, as main memory overhead rears its head rather than CPU speed/cache etc.
There are probably faster ways to do this, and I'm pretty sure that an ASM variant would be the winner, but I'll leave that challenge to someone else.
Code: Select all
EnableExplicit
Macro MidMemLightning(string,startpos,length)
PeekS((@string)+((startpos)*SizeOf(Character))-SizeOf(Character),(length))
EndMacro
Structure MemoryArray
Byte.c[0]
word.w[0]
EndStructure
Procedure.l SplitFaster(StringArray.s(1), Text2Split.s, Delim.s) ;return count
Protected FindLen.l,MainLen.l,StringCount.l,*MainByteArray.MemoryArray,FoundPos.l
Protected *FindByteArray.MemoryArray,PrevPos.l,i.l,MainArrayLoop.l,EndSearchPos.l
FindLen = Len(Delim)
MainLen = Len(Text2Split)
Dim StringArray.s(1000)
StringCount = 0
*MainByteArray = @Text2Split ;*MainMem
*FindByteArray = @Delim ;*FindMem
PrevPos = 1
; Build BadChr Array
Dim BadChar.l(255)
; set all alphabet to max shift pos (length of find string plus 1)
For i = 0 To 255
BadChar(i) = FindLen + 1
Next
;Update chars that are in the find string to their position from the end.
For i = 0 To FindLen -1
BadChar(*FindByteArray\byte[i]) = FindLen - i
Next
MainArrayLoop = 1
EndSearchPos = MainLen - (FindLen -1)
While MainArrayLoop <= EndSearchPos
If CompareMemory(@Text2Split + MainArrayLoop, @Delim, FindLen) = 1
FoundPos = MainArrayLoop + 1
If StringCount % 1000 = 0 ; not really needed, doesn't have much of a speed increase. This used to do a lot in the old VB days
ReDim StringArray.s(StringCount + 1000)
EndIf
StringArray(StringCount) = MidMemLightning(Text2Split, Prevpos, Foundpos - PrevPos) ;Mid(Text2Split, Prevpos, Foundpos - PrevPos) ;"HEllo, this is some text" + #TAB$ + " " + #TAB$ + "esdfsdf"
StringCount = StringCount + 1
PrevPos = foundpos + Findlen
EndIf
;Didn't find the string so shift as per the table.
MainArrayLoop + BadChar(*MainByteArray.MemoryArray\byte[MainArrayLoop + FindLen])
Wend
;catch end
ReDim StringArray.s(StringCount)
StringArray(StringCount) = MidMemLightning(Text2Split, Prevpos, MainLen - PrevPos +1)
StringCount = StringCount + 1
ReDim StringArray.s(StringCount)
ProcedureReturn StringCount
EndProcedure
Procedure.l SplitLightning(StringArray.s(1),text$,delimiter$) ;returns stringcount
Protected delimiterlen.l,textlen.l,stringcount.l,stringindex.l=0
Protected *text.Character,*textend,*delimiter.Character,*delimiterend
delimiterlen=Len(delimiter$)
textlen=Len(text$)
If textlen And delimiterlen
stringcount=0
*text=@text$
*textend=@text$+(textlen*SizeOf(Character))
*delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))
For *text=@text$ To *textend Step SizeOf(Character)
For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
If *text\c=*delimiter\c
stringcount+1
Break
EndIf
Next
Next
If stringcount
Dim StringArray.s(stringcount-1) ;starts at index 0
*text=@text$
*textend=@text$+(textlen*SizeOf(Character))-SizeOf(Character)
*delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))-SizeOf(Character)
textlen=0
For *text=@text$ To *textend Step SizeOf(Character)
For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
If *text\c=*delimiter\c
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
stringindex+1
textlen=0
*text+SizeOf(Character)
Break
EndIf
Next
textlen+1
Next
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
EndIf
EndIf
ProcedureReturn stringcount
EndProcedure
Procedure.l SplitLightning2(StringArray.s(1),text$,delimiter$) ;returns stringcount
Protected delimiterlen.l,textlen.l,stringcount.l,stringindex.l=0
Protected *text.Character,*textend,*delimiter.Character,*delimiterend
delimiterlen=Len(delimiter$)
textlen=Len(text$)
If textlen And delimiterlen
*text=@text$
*textend=@text$+(textlen*SizeOf(Character))-SizeOf(Character)
*delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))-SizeOf(Character)
textlen=0
For *text=@text$ To *textend Step SizeOf(Character)
For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
If *text\c=*delimiter\c
ReDim StringArray.s(stringindex+1)
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
stringindex+1
textlen=0
*text+SizeOf(Character)
Break
EndIf
Next
textlen+1
Next
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
EndIf
ProcedureReturn stringindex
EndProcedure
Define i.l,n.l,start1.l,start2.l,start3.l,start4.l
Dim StringArray.s(0)
Debug "Split()"
n=SplitFaster(StringArray(),"this,is,a,test,string",",")
If n
For i=0 To n
Debug Str(i)+" = "+StringArray(i)
Next
EndIf
Debug ""
Debug "SplitLightning()"
n=SplitLightning(StringArray(),"this,is-a.test,string",",.-")
If n
n-1
For i=0 To n
Debug Str(i)+" = "+StringArray(i)
Next
EndIf
DisableDebugger
start1=ElapsedMilliseconds()
For i=1 To 100000
n=SplitFaster(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string",",")
Next
start2=ElapsedMilliseconds()
For i=1 To 100000
n=SplitLightning(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string",",")
Next
start3=ElapsedMilliseconds()
For i=1 To 100000
n=SplitLightning2(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string",",")
Next
start4=ElapsedMilliseconds()
EnableDebugger
MessageRequester("Split() vs SplitLightning()", "Split() = " + Str(start2-start1)+#CRLF$+"SplitLightning() = "+Str(start3-start2)+#CRLF$+"SplitLightning2() = "+Str(start4-start3))
Thanks, I'll spend some time with this.
If you want to turn your times on their head, use this same code but just change the splitter from "," to "string". Since I'm using quicksearch, the longer the delimiter, the faster it searches. If you are building a string to be split a lot later, put a looooong delimiter in there and it will get faster and faster! Swapping the delimiter for a longer one just to speed up the split would never pay off unless you intend to split the same data more than once, but if you have to build the string yourself at some point, the longer the better.
Paul Dwyer
Ah, I see the issue. Your split is using a string delimiter only, not a multiple delimiter.
I'll add a string mode option and the times should be the same as the current multi delimiter mode.
You see in the case of using "string" as a delimiter the routine is treating it as a multi delimiter like
"s" "t" "r" "i" "n" "g"
So obviously it's slower as that is 6 times more delimiters than just "string".
I'll mess with this code later tonight to add a string delimiter mode.
On top of that, the scan moves through the main string in steps the size of the match string.
It tries to match "g" in "string" against char 6 of the main string and, if it's not there, it slides forward by an amount depending on what the result of the comparison was.
If it was a "z" (or anything not in "string") then it can slide 6 chars and compare "g" again; if it spots an "r" then it slides 4 chars for a match. Because this table is built in advance, it becomes very fast for cases where the strings are long.
For a single char comparison like "," its power is lost, so it becomes debatable whether the algorithm is appropriate in this case at all.
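To make the table concrete, here is a stand-alone sketch that builds the shift values for the delimiter "string" the same way the BadChar array in the posted SplitFaster() appears to (ASCII characters assumed; the Shift array name is mine). Going by that code, the table defaults to the delimiter length plus one, so a character that is not in "string" gives a shift of 7, slightly more than the 6 in the description above, while "s" gives 6, "r" gives 4 and "g" gives 1.

```purebasic
; Sketch of the shift table for the delimiter "string" (ASCII assumed).
Define Delim.s = "string"
Define FindLen.l = Len(Delim)
Define i.l
Dim Shift.l(255)

; Default: the character does not occur in the delimiter, so skip past it entirely.
For i = 0 To 255
  Shift(i) = FindLen + 1
Next

; Characters in the delimiter: distance from their position to the end, plus one.
For i = 1 To FindLen
  Shift(Asc(Mid(Delim, i, 1))) = FindLen - i + 1
Next

Debug "z -> " + Str(Shift(Asc("z")))
Debug "s -> " + Str(Shift(Asc("s")))
Debug "r -> " + Str(Shift(Asc("r")))
Debug "g -> " + Str(Shift(Asc("g")))
```

With a one-character delimiter such as "," every entry is either 1 or 2, which is why the table gives so little benefit in that case.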
Paul Dwyer
Improved the code some more.
mode 0 (default) supports multi delimiter, where each character is a delimiter
mode 1 supports a case sensitive string delimiter, where the delimiter is treated as a full string
mode 2 supports a case insensitive string delimiter, where the delimiter is treated as a full string
mode 2 (case insensitive delimiter string) is slightly slower than mode 1 (case sensitive delimiter string)
mode 0 (case sensitive delimiter chars) should be the same as the previously posted code
Unicode or Ansi compile shows very little difference.
I got these numbers
Compile with debugger (F5):
Split() = 8143
SplitLightning() = 9734
SplitLightning2() = 6116
Compile without debugger:
Split() = 3151
SplitLightning() = 2169
SplitLightning2() = 1778
Even though I'm critical of the ReDim for each match (memory fragmentation/overhead etc.), I think I would recommend SplitLightning2(), as it seems to be faster than SplitLightning() in most cases.
So SplitLightning2() is rather interesting, and I think I might use that one myself. It would be very practical as an alternative to the StringField() function, and as StringField() does not support multi delimiters or string delimiters, it's just that much more flexible and could be used to split ini lines, html/xml and much more. And the case insensitive mode for delimiter strings is really cool.
Oh yeah, and consider this code public domain, do with as you please folks!
Code: Select all
EnableExplicit
Macro MidMemLightning(string,startpos,length)
PeekS((@string)+((startpos)*SizeOf(Character))-SizeOf(Character),(length))
EndMacro
Structure MemoryArray
Byte.c[0]
word.w[0]
EndStructure
Procedure.l SplitFaster(StringArray.s(1), Text2Split.s, Delim.s) ;return count
Protected FindLen.l,MainLen.l,StringCount.l,*MainByteArray.MemoryArray,FoundPos.l
Protected *FindByteArray.MemoryArray,PrevPos.l,i.l,MainArrayLoop.l,EndSearchPos.l
FindLen = Len(Delim)
MainLen = Len(Text2Split)
Dim StringArray.s(1000)
StringCount = 0
*MainByteArray = @Text2Split ;*MainMem
*FindByteArray = @Delim ;*FindMem
PrevPos = 1
; Build BadChr Array
Dim BadChar.l(255)
; set all alphabet to max shift pos (length of find string plus 1)
For i = 0 To 255
BadChar(i) = FindLen + 1
Next
;Update chars that are in the find string to their position from the end.
For i = 0 To FindLen -1
BadChar(*FindByteArray\byte[i]) = FindLen - i
Next
MainArrayLoop = 1
EndSearchPos = MainLen - (FindLen -1)
While MainArrayLoop <= EndSearchPos
If CompareMemory(@Text2Split + MainArrayLoop, @Delim, FindLen) = 1
FoundPos = MainArrayLoop + 1
If StringCount % 1000 = 0 ; not really needed, doesn't have much of a speed increase. This used to do a lot in the old VB days
ReDim StringArray.s(StringCount + 1000)
EndIf
StringArray(StringCount) = MidMemLightning(Text2Split, Prevpos, Foundpos - PrevPos) ;Mid(Text2Split, Prevpos, Foundpos - PrevPos) ;"HEllo, this is some text" + #TAB$ + " " + #TAB$ + "esdfsdf"
StringCount = StringCount + 1
PrevPos = foundpos + Findlen
EndIf
;Didn't find the string so shift as per the table.
MainArrayLoop + BadChar(*MainByteArray.MemoryArray\byte[MainArrayLoop + FindLen])
Wend
;catch end
ReDim StringArray.s(StringCount)
StringArray(StringCount) = MidMemLightning(Text2Split, Prevpos, MainLen - PrevPos +1)
StringCount = StringCount + 1
ReDim StringArray.s(StringCount)
ProcedureReturn StringCount
EndProcedure
;mode 0 (default) supports multi delimiter, where each character is a delimiter
;mode 1 supports a case sensitive string delimiter, where the delimiter is treated as a full string
;mode 2 supports a case insensitive string delimiter, where the delimiter is treated as a full string
Procedure.l SplitLightning(StringArray.s(1),text$,delimiter$,mode.l=0) ;returns stringcount
Protected delimiterlen.l,textlen.l,stringcount.l,stringindex.l=0,casesensitive.l=0
Protected *text.Character,*textend,*delimiter.Character,*delimiterend
delimiterlen=Len(delimiter$)
textlen=Len(text$)
If textlen And delimiterlen
stringcount=0
*text=@text$
*textend=@text$+(textlen*SizeOf(Character))
*delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))
If mode=0
For *text=@text$ To *textend Step SizeOf(Character)
For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
If *text\c=*delimiter\c
stringcount+1
*text+SizeOf(Character)
Break
EndIf
Next
Next
ElseIf (mode=1) Or (mode=2)
If mode=2 : casesensitive=1 : EndIf
If delimiterlen>textlen : delimiterlen=textlen : EndIf
For *text=@text$ To *textend Step SizeOf(Character)
If Not CompareMemoryString(*text,@delimiter$,casesensitive,delimiterlen)
stringcount+1
*text+(delimiterlen*SizeOf(Character))
EndIf
Next
EndIf
If stringcount
If mode<>0
stringcount+1
EndIf
Dim StringArray.s(stringcount-1) ;starts at index 0
*text=@text$
*textend=@text$+(textlen*SizeOf(Character))-SizeOf(Character)
*delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))-SizeOf(Character)
textlen=0
If mode=0
For *text=@text$ To *textend Step SizeOf(Character)
For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
If *text\c=*delimiter\c
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
stringindex+1
textlen=0
*text+SizeOf(Character)
Break
EndIf
Next
textlen+1
Next
Else
For *text=@text$ To *textend Step SizeOf(Character)
If Not CompareMemoryString(*text,@delimiter$,casesensitive,delimiterlen)
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
stringindex+1
textlen=0
*text+(delimiterlen*SizeOf(Character))
EndIf
textlen+1
Next
EndIf
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
EndIf
EndIf
ProcedureReturn stringcount
EndProcedure
;mode 0 (default) supports multi delimiter, where each character is a delimiter
;mode 1 supports a case sensitive string delimiter, where the delimiter is treated as a full string
;mode 2 supports a case insensitive string delimiter, where the delimiter is treated as a full string
Procedure.l SplitLightning2(StringArray.s(1),text$,delimiter$,mode.l=0) ;returns stringcount
Protected delimiterlen.l,textlen.l,stringcount.l,stringindex.l=0,casesensitive.l=0
Protected *text.Character,*textend,*delimiter.Character,*delimiterend
delimiterlen=Len(delimiter$)
textlen=Len(text$)
If textlen And delimiterlen
*text=@text$
*textend=@text$+(textlen*SizeOf(Character))-SizeOf(Character)
*delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))-SizeOf(Character)
If mode=0
textlen=0
For *text=@text$ To *textend Step SizeOf(Character)
For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
If *text\c=*delimiter\c
ReDim StringArray.s(stringindex+1)
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
stringindex+1
textlen=0
*text+SizeOf(Character)
Break
EndIf
Next
textlen+1
Next
Else
If mode=2 : casesensitive=1 : EndIf ; 1 = #PB_String_NoCase, same as in SplitLightning()
If delimiterlen>textlen : delimiterlen=textlen : EndIf
textlen=0
For *text=@text$ To *textend Step SizeOf(Character)
If Not CompareMemoryString(*text,@delimiter$,casesensitive,delimiterlen)
ReDim StringArray.s(stringindex+1)
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
stringindex+1
textlen=0
*text+(delimiterlen*SizeOf(Character))
EndIf
textlen+1
Next
EndIf
If stringindex
StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
stringindex+1
EndIf
EndIf
ProcedureReturn stringindex
EndProcedure
Define i.l,n.l,start1.l,start2.l,start3.l,start4.l
Dim StringArray.s(0)
Debug "Split()"
n=SplitFaster(StringArray(),"this,is,a,test,string,this,is,a,test,string","string")
If n
For i=0 To n
Debug Str(i)+" = "+StringArray(i)
Next
EndIf
Debug ""
Debug "SplitLightning()"
n=SplitLightning(StringArray(),"this,is,a,test,string,this,is,a,test,string","string",1)
If n
n-1
For i=0 To n
Debug Str(i)+" = "+StringArray(i)
Next
EndIf
DisableDebugger
start1=ElapsedMilliseconds()
For i=1 To 100000
n=SplitFaster(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string","string")
Next
start2=ElapsedMilliseconds()
For i=1 To 100000
n=SplitLightning(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string","string",1)
Next
start3=ElapsedMilliseconds()
For i=1 To 100000
n=SplitLightning2(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string","string",1)
Next
start4=ElapsedMilliseconds()
EnableDebugger
MessageRequester("Split() vs SplitLightning()", "Split() = " + Str(start2-start1)+#CRLF$+"SplitLightning() = "+Str(start3-start2)+#CRLF$+"SplitLightning2() = "+Str(start4-start3))
Interesting to see your comment on redim.
The first time I worked on this, I added about 100 elements to the array at a time, then cleaned up at the end to reduce the number of redims. It turned out, though, that this had absolutely no performance benefit (unlike in my VB days 7-8 years back, when that would have been a huge performance tweak).
I guess PB's memory management is a lot better.
It's going to take me a few days to digest the rest of this, thanks!
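For reference, the chunked growth described above could be sketched roughly like this. The chunk size, the Parts array and the dummy loop are illustrative only, and ArraySize() is assumed to be available in the PB version used; whether it beats a ReDim per element is exactly what the measurements in this thread call into question.

```purebasic
; Sketch only: grow a string array in chunks, then trim it once at the end.
#ChunkSize = 100

Dim Parts.s(#ChunkSize - 1)
Define Count.l = 0, i.l

For i = 1 To 250
  If Count > ArraySize(Parts())            ; array is full, add another chunk
    ReDim Parts.s(Count + #ChunkSize - 1)
  EndIf
  Parts(Count) = "item " + Str(i)
  Count + 1
Next

If Count > 0
  ReDim Parts.s(Count - 1)                 ; trim to the elements actually used
EndIf

Debug "Elements stored: " + Str(Count)
```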
Paul Dwyer