PB Strings, can be several thousand times too slow

superadnim
Enthusiast
Posts: 480
Joined: Thu Jul 27, 2006 4:06 am

Post by superadnim »

Isn't the OP's suggestion causing a memory leak, since the memory he allocates and then copies out with PeekS on return is never freed? IMO that's not cool.
pdwyer
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

As I mentioned:
The Split function currently looks like this: (But I want to clean it up and merge the midmem() into it and get rid of the function call)
Otherwise, I can't release the memory :?
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein

Post by superadnim »

Sorry, didn't read all :P

How about this for a stand alone version though? (untested...)

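Code:

; dwSrc is the address of the source string, dwStart the 1-based start position,
; and dwEnd the number of bytes to copy (a length, despite its name).
; Offsets are in bytes, so this only matches Mid() in an ANSI compile.
Procedure.s MidFast(dwSrc.l, dwStart.l, dwEnd.l)
	Protected szRet.s
	
	If dwSrc
		If (dwEnd > 0) And (dwStart > 0)
			
			szRet = Space(dwEnd) ; pre-size the result so CopyMemory has a buffer to fill
			
			CopyMemory(dwSrc + dwStart - 1, @szRet, dwEnd)
			ProcedureReturn szRet
			
		EndIf
	EndIf
	
	ProcedureReturn "" ; invalid arguments
	
EndProcedure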


str.s = "Hello World"

Debug MidFast(@str, 1, 4)
Debug MidFast(@str, 8, 2)
Debug MidFast(@str, 7, 5)
Debug "-----------------"
Debug Mid(str, 1, 4)
Debug Mid(str, 8, 2)
Debug Mid(str, 7, 5)

I'm sure someone will come up with a better idea within the same context.

Just make sure you don't go out of bounds... it might not crash most of the time, but that's only because memory is handed out in larger blocks than the string itself, so a read past the end usually still lands in memory the program owns. Checking with a Len() call before the MidFast() is a good thing, perhaps with a macro.
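For what it's worth, the Len() check could look roughly like this. It's only a sketch: MidChecked() is a made-up name, the length is passed in as a parameter so that Len() can be called once by the caller, and offsets are scaled by SizeOf(Character) on the assumption that the source is an ordinary PB string.

```purebasic
; Hypothetical wrapper: clamps the requested range before touching memory.
; SrcLen is the source length in characters, as returned by Len().
Procedure.s MidChecked(*Src, SrcLen.l, StartPos.l, Count.l)
  If *Src = 0 Or StartPos < 1 Or Count < 1 Or StartPos > SrcLen
    ProcedureReturn ""
  EndIf
  If Count > SrcLen - StartPos + 1
    Count = SrcLen - StartPos + 1   ; trim a request that runs past the end
  EndIf
  ProcedureReturn PeekS(*Src + (StartPos - 1) * SizeOf(Character), Count)
EndProcedure

Define text.s = "Hello World"
Define textLen.l = Len(text)   ; measured once, reused for every call

Debug MidChecked(@text, textLen, 7, 5)    ; World
Debug MidChecked(@text, textLen, 7, 50)   ; World (request clamped to the end)
Debug MidChecked(@text, textLen, 20, 2)   ; empty string (start is out of range)
```

The check costs a few comparisons per call, so it is a trade-off against the raw speed this thread is chasing.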

Replacing the whole PB string library with a custom one might well be a better solution, though.

Null-terminated strings that also carry a small structure with string information would be ideal.

:lol: should I bash the keyboard and give up?
:?

Post by pdwyer »

Does szRet not go out of scope?

This sort of thing confuses me a little, I admit.

Post by pdwyer »

superadnim, You've inspired me! :D

Your code fixes my mem leak bug and is fast. I wanted to get rid of the Space() function (not for any real reason I can put my finger on, though :P ), so I came up with a cross between yours and mine. It's actually not any faster than yours, but both are thousands of times faster than PB's when the strings get longer!

Thanks!!

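Code:

; Copies GetLen bytes starting at the 1-based StartPos into a scratch buffer,
; reads them back with PeekS, then frees the buffer so nothing leaks.
; Offsets are in bytes and the source range is not bounds-checked.
Procedure.s MidMem(*MainMem, StartPos.l, GetLen.l)  

    If *MainMem = 0 Or GetLen = 0
        ProcedureReturn ""
    EndIf
    
    *RetMem = AllocateMemory(GetLen)
    If *RetMem = 0 ; allocation failed
        ProcedureReturn ""
    EndIf
    CopyMemory(*MainMem + StartPos -1, *RetMem, GetLen)
    ReturnString.s = PeekS(*RetMem,GetLen)
    FreeMemory(*RetMem)
    
    ProcedureReturn ReturnString

EndProcedure 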

Rescator
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Post by Rescator »

I couldn't help myself, and this also shows how brilliant macros are: no function call overhead, just pure PureBasic power ;)
And as a bonus, my MidMemLightning() also handles Unicode just fine, unlike MidMem() :P
Obviously, just like MidMem(), there is no bounds check here either; if you want safety, use PB's Mid() instead. (Just a reminder to the newbies out there :) )

Code:

EnableExplicit

Macro MidMemLightning(string,startpos,length)
 PeekS(@string+(startpos*SizeOf(Character))-SizeOf(Character),length)
EndMacro

Procedure.s MidMem(*MainMem, StartPos.l, GetLen.l) 
 Protected *RetMem,ReturnString.s
    If *MainMem = 0 Or GetLen = 0
        ProcedureReturn ""
    EndIf
   
    *RetMem = AllocateMemory(GetLen)
    CopyMemory(*MainMem + StartPos -1, *RetMem, GetLen)
    ReturnString = PeekS(*RetMem,GetLen)
    FreeMemory(*RetMem)
   
    ProcedureReturn ReturnString

EndProcedure

Define str$,TestString$,Output$,Start1.l,Start2.l,Start3.l,Start4.l,i.l

str$ = "Hello World"

Debug MidMem(@str$, 1, 4)
Debug MidMem(@str$, 8, 2)
Debug MidMem(@str$, 7, 5)
Debug "-----------------"
Debug Mid(str$, 1, 4)
Debug Mid(str$, 8, 2)
Debug Mid(str$, 7, 5)
Debug "-----------------"
Debug MidMemLightning(str$, 1, 4)
Debug MidMemLightning(str$, 8, 2)
Debug MidMemLightning(str$, 7, 5)

DisableDebugger

TestString$ = Space(20000000)
Start1 = ElapsedMilliseconds()

For i = 1 To 100000
    Output$ = "!" + Midmem(@TestString$,i*100,10) + "!"
Next

Start2 = ElapsedMilliseconds()

For i = 1 To 10000
    Output$ = "!" + Mid(TestString$,i*100,10) + "!"   
Next

Start3 = ElapsedMilliseconds()

For i = 1 To 100000
    Output$ = "!" + MidMemLightning(TestString$,i*100,10) + "!"   
Next

Start4 = ElapsedMilliseconds()


EnableDebugger
MessageRequester("Mid vs MidMem vs MidMemLightning", "Mid() = " + Str(Start3-Start2) + " (10000 rounds only as it's so slow)" + #CRLF$ +  "MidMem() = " + Str(Start2-Start1) + " (100000 rounds)" + #CRLF$ +  "MidMemLightning() ="+Str(Start4-Start3) + " (100000 rounds)"  )

PS! You may notice the absence of any check on the string pointer or the length given to the macro, but PeekS handles a length of 0 just fine, and since you normally pass a string (so there is normally valid memory behind the pointer), PeekS handles an empty string OK as well.
So not checking also saves some CPU.
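One thing to watch with a macro like this: macros are plain text substitution, so an argument such as offset + 1 is pasted into the expression as-is. With the arguments unparenthesized, startpos*SizeOf(Character) would then multiply only the 1, which gives the wrong address as soon as SizeOf(Character) is not 1. A sketch with every argument wrapped in parentheses (MidMemSketch is a made-up name so it doesn't clash with the macro above):

```purebasic
; Hypothetical variant of the macro with every argument parenthesized,
; so expressions passed as arguments are evaluated as a whole.
Macro MidMemSketch(string, startpos, length)
  PeekS((@string) + ((startpos) - 1) * SizeOf(Character), (length))
EndMacro

Define text$ = "Hello World"
Define offset.l = 6

Debug MidMemSketch(text$, offset + 1, 5)   ; World
Debug MidMemSketch(text$, 1, 0)            ; empty string: a length of 0 is fine
```

As with the original, there is still no bounds check, so the caller has to keep the start position and length inside the string.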

Post by pdwyer »

Legend! 8)

Even without the macro, pulling the middle out without the extra alloc-CopyMemory-free should be a lot faster anyway.

It's always great when this forum gets competitive. :)

I don't have a compiler at work so I'll test it when I get home, but I see what you've done: you've chopped out an unnecessary step!

Post by Rescator »

For some reason, instead of working on my own code I ended up fiddling with this :roll:

Here is your Split() plus the improved Mid macro, and two new Split implementations.
My implementations support Unicode; yours doesn't.
A note about the two variants I made: the first scans the text once to get the count so it never needs to ReDim, while the second skips that pass and ReDims on every match instead.
Both of my implementations support an unlimited number of split strings and an unlimited number of dividers; available memory should be the only real limit.
My implementations also handle strings like "," and "this,is," and ",is", for example; yours seems to act a bit weird on those.

And here's the really interesting thing.
Run the source as it is and you'll get numbers like:
Split() = 14976
SplitLightning() = 23650
SplitLightning2() = 13213

But choose Compile without debugger in the IDE menu and you get numbers like these:
Split() = 5866
SplitLightning() = 4025
SplitLightning2() = 5023

Not the first time I've seen this behaviour: even though DisableDebugger is used there is still a lot of debugger overhead, so for speed/optimization tests it's wiser to compile without the debugger.
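If you want the test program itself to refuse to produce misleading numbers, something along these lines should work. It's a minimal sketch that relies on the #PB_Compiler_Debugger compiler constant; the loop body is just a stand-in for whatever you actually want to time.

```purebasic
CompilerIf #PB_Compiler_Debugger
  ; Built with the debugger: warn instead of reporting inflated timings.
  MessageRequester("Benchmark", "Debugger is on, timings would be inflated. Use 'Compile without debugger'.")
CompilerElse
  Define start.l, i.l, out$
  start = ElapsedMilliseconds()
  For i = 1 To 100000
    out$ = Mid("this,is,a,test,string", 6, 2)   ; stand-in workload
  Next
  MessageRequester("Benchmark", "100000 rounds = " + Str(ElapsedMilliseconds() - start) + " ms")
CompilerEndIf
```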

Also try with shorter test strings and you'll see different results yet again.
Using only "this,is,a,test,string" as the string instead of that long monster string I got this:
Split() = 7082
SplitLightning() = 2543
SplitLightning2() = 1310

And when doing a compile without debugger I got:
Split() = 2543
SplitLightning() = 515
SplitLightning2() = 421

So for shorter strings my SplitLightning2() seems best and for medium strings SplitLightning(); for really long strings things even out more, as main memory overhead rears its head rather than CPU speed/cache etc.

There are probably faster ways to do this, and I'm pretty sure that an ASM variant would be the winner, but I'll leave that challenge to someone else ;)

Code:

EnableExplicit

Macro MidMemLightning(string,startpos,length)
 PeekS((@string)+((startpos)*SizeOf(Character))-SizeOf(Character),(length))
EndMacro

Structure MemoryArray
    Byte.c[0]
    word.w[0]
EndStructure

Procedure.l SplitFaster(StringArray.s(1), Text2Split.s, Delim.s) ;return count
Protected FindLen.l,MainLen.l,StringCount.l,*MainByteArray.MemoryArray,FoundPos.l
Protected *FindByteArray.MemoryArray,PrevPos.l,i.l,MainArrayLoop.l,EndSearchPos.l

    FindLen = Len(Delim)
    MainLen = Len(Text2Split)
    Dim StringArray.s(1000)
    StringCount = 0
   
    *MainByteArray = @Text2Split  ;*MainMem
    *FindByteArray = @Delim       ;*FindMem

    PrevPos = 1

    ; Build BadChr Array
    Dim BadChar.l(255)
   
    ; set all alphabet to max shift pos (length of find string plus 1)
    For i = 0 To 255
        BadChar(i)  =  FindLen + 1
    Next
   
    ;Update chars that are in the find string to their position from the end.
    For i = 0 To FindLen -1
        BadChar(*FindByteArray\byte[i]) = FindLen - i   
    Next     

    MainArrayLoop = 1
    EndSearchPos = MainLen - (FindLen -1)
   
    While MainArrayLoop <= EndSearchPos
   
        If CompareMemory(@Text2Split + MainArrayLoop, @Delim, FindLen) = 1
            FoundPos = MainArrayLoop + 1
           
            If StringCount % 1000 = 0  ; not really needed, doesn't have much of a speed increase. This used to do a lot in the old VB days         
                ReDim StringArray.s(StringCount + 1000)
            EndIf
               
            StringArray(StringCount) = MidMemLightning(Text2Split, Prevpos, Foundpos - PrevPos) ;Mid(Text2Split, Prevpos, Foundpos - PrevPos) ;"HEllo, this is some text" + #TAB$ + "  " + #TAB$ + "esdfsdf"
            StringCount = StringCount + 1
            PrevPos = foundpos + Findlen

        EndIf
        ;Didn't find the string so shift as per the table.
        MainArrayLoop + BadChar(*MainByteArray.MemoryArray\byte[MainArrayLoop + FindLen])
    Wend
 
    ;catch end
    ReDim StringArray.s(StringCount)
    StringArray(StringCount) = MidMemLightning(Text2Split, Prevpos, MainLen - PrevPos +1)
    StringCount = StringCount + 1
    ReDim StringArray.s(StringCount)

    ProcedureReturn StringCount

EndProcedure

Procedure.l SplitLightning(StringArray.s(1),text$,delimiter$) ;returns stringcount
 Protected delimiterlen.l,textlen.l,stringcount.l,stringindex.l=0
 Protected *text.Character,*textend,*delimiter.Character,*delimiterend

 delimiterlen=Len(delimiter$)
 textlen=Len(text$)
 If textlen And delimiterlen

  stringcount=0
  *text=@text$
  *textend=@text$+(textlen*SizeOf(Character))
  *delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))
  For *text=@text$ To *textend Step SizeOf(Character)
   For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
    If *text\c=*delimiter\c
     stringcount+1
     Break
    EndIf
   Next
  Next

  If stringcount
   Dim StringArray.s(stringcount-1) ;starts at index 0
   *text=@text$
   *textend=@text$+(textlen*SizeOf(Character))-SizeOf(Character)
   *delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))-SizeOf(Character)
   textlen=0
   For *text=@text$ To *textend Step SizeOf(Character)
    For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
     If *text\c=*delimiter\c
      StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
      stringindex+1
      textlen=0
      *text+SizeOf(Character)
      Break
     EndIf
    Next
    textlen+1
   Next
   StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
  EndIf

 EndIf

 ProcedureReturn stringcount
EndProcedure

Procedure.l SplitLightning2(StringArray.s(1),text$,delimiter$) ;returns stringcount
 Protected delimiterlen.l,textlen.l,stringcount.l,stringindex.l=0
 Protected *text.Character,*textend,*delimiter.Character,*delimiterend

 delimiterlen=Len(delimiter$)
 textlen=Len(text$)
 If textlen And delimiterlen
  *text=@text$
  *textend=@text$+(textlen*SizeOf(Character))-SizeOf(Character)
  *delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))-SizeOf(Character)
  textlen=0
  For *text=@text$ To *textend Step SizeOf(Character)
   For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
    If *text\c=*delimiter\c
     ReDim StringArray.s(stringindex+1)
     StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
     stringindex+1
     textlen=0
     *text+SizeOf(Character)
     Break
    EndIf
   Next
   textlen+1
  Next
  StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
 EndIf

 ProcedureReturn stringindex
EndProcedure


Define i.l,n.l,start1.l,start2.l,start3.l,start4.l
Dim StringArray.s(0)

Debug "Split()"
n=SplitFaster(StringArray(),"this,is,a,test,string",",")
If n
 For i=0 To n
  Debug Str(i)+" = "+StringArray(i)
 Next
EndIf

Debug ""
Debug "SplitLightning()"
n=SplitLightning(StringArray(),"this,is-a.test,string",",.-")
If n
 n-1
 For i=0 To n
  Debug Str(i)+" = "+StringArray(i)
 Next
EndIf

DisableDebugger

start1=ElapsedMilliseconds()
For i=1 To 100000
 n=SplitFaster(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string",",")
Next
start2=ElapsedMilliseconds()
For i=1 To 100000
 n=SplitLightning(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string",",")
Next
start3=ElapsedMilliseconds()
For i=1 To 100000
 n=SplitLightning2(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string",",")
Next
start4=ElapsedMilliseconds()

EnableDebugger
MessageRequester("Split() vs SplitLightning()", "Split() = " + Str(start2-start1)+#CRLF$+"SplitLightning() = "+Str(start3-start2)+#CRLF$+"SplitLightning2() = "+Str(start4-start3))

Post by pdwyer »

Thanks, I'll spend some time with this.

If you want to turn your times on their head, use this same code but just change the splitter from "," to "string". Since I'm using quicksearch, the longer the delimiter, the faster it searches. If you are building a string that will be split a lot later, put a looooong delimiter in there and it will get faster and faster! It would never be worth swapping the delimiter for a longer one just for speed unless you intend to split the same data more than once, but if you have to build the string yourself at some point anyway, the longer the better.

Post by Rescator »

Ah, I see the issue. Your split uses a single string delimiter only, not multiple delimiters.
I'll add a string mode option, and the times should then be the same as the current multi-delimiter mode.

You see, when "string" is used as a delimiter the routine treats it as a multi-delimiter, like
"s" "t" "r" "i" "n" "g"

So obviously it's slower, as that is 6 times more delimiters than just "string".
I'll mess with this code later tonight to add a string delimiter mode.
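To make the difference concrete, here is a small sketch (not the code from this thread) that counts split points both ways. CountCharDelims() and CountStringDelims() are made-up helper names, and the second one leans on PB's FindString() rather than any of the fast routines above.

```purebasic
; Hypothetical helper: every character of Delims$ counts as its own delimiter.
Procedure.l CountCharDelims(Text$, Delims$)
  Protected *t.Character, *d.Character, count.l
  If Len(Text$) = 0 Or Len(Delims$) = 0
    ProcedureReturn 0
  EndIf
  *t = @Text$
  While *t\c                       ; walk the text up to its null terminator
    *d = @Delims$
    While *d\c                     ; compare against each delimiter character
      If *t\c = *d\c
        count + 1
        Break
      EndIf
      *d + SizeOf(Character)
    Wend
    *t + SizeOf(Character)
  Wend
  ProcedureReturn count
EndProcedure

; Hypothetical helper: Delim$ only counts when the whole string matches.
Procedure.l CountStringDelims(Text$, Delim$)
  Protected pos.l = 1, count.l
  If Len(Delim$) = 0
    ProcedureReturn 0
  EndIf
  Repeat
    pos = FindString(Text$, Delim$, pos)
    If pos = 0
      Break
    EndIf
    count + 1
    pos + Len(Delim$)              ; continue searching after this match
  Until pos > Len(Text$)
  ProcedureReturn count
EndProcedure

Debug CountCharDelims("this,is,a,test,string", "string")    ; 14
Debug CountStringDelims("this,is,a,test,string", "string")  ; 1
```

With the character interpretation almost every letter of the test text is a split point, which is why the timings look so different from a whole-string match.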

Post by pdwyer »

On top of that, the scan moves through the main string in steps the size of the match string.

It tries to match "g" in "string" against char 6 of the main string, and if it's not there, it slides forward by an amount that depends on the character it found.
If it was a "z" (or anything not in "string") then it can slide 6 chars and compare "g" again; if it spots an "r" then it slides 4 chars for a match. Because this table is prebuilt, it becomes very fast in cases where the strings are long.

For a single char comparison like "," its power is lost, so it becomes debatable whether the algorithm is appropriate in this case at all.
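For anyone who wants to play with the shift table on its own, here is a stripped-down sketch of the search part only. It follows the table-building loop in the posted SplitFaster() (default shift of pattern length + 1, lookup on the character just past the current window), so the exact shift amounts can differ by one from the description above. BuildShiftTable() and QuickFind() are made-up names, and the sketch assumes all character codes are below 256.

```purebasic
Global Dim Shift.l(255)   ; one shift per character code 0..255

; Fill the table: the default is pattern length + 1, characters in the
; pattern get their distance from the end of the pattern.
Procedure BuildShiftTable(Pattern$)
  Protected i.l, patLen.l = Len(Pattern$)
  For i = 0 To 255
    Shift(i) = patLen + 1
  Next
  For i = 1 To patLen
    Shift(Asc(Mid(Pattern$, i, 1)) & 255) = patLen - i + 1
  Next
EndProcedure

; Returns the 1-based position of Pattern$ in Text$, or 0 if it is absent.
Procedure.l QuickFind(Text$, Pattern$)
  Protected pos.l = 1, textLen.l = Len(Text$), patLen.l = Len(Pattern$)
  If patLen = 0 Or patLen > textLen
    ProcedureReturn 0
  EndIf
  BuildShiftTable(Pattern$)
  While pos + patLen - 1 <= textLen
    If CompareMemory(@Text$ + (pos - 1) * SizeOf(Character), @Pattern$, patLen * SizeOf(Character))
      ProcedureReturn pos
    EndIf
    If pos + patLen > textLen
      Break   ; no character after the window, so nothing left to try
    EndIf
    ; Look at the character just past the window and jump by its table entry.
    pos + Shift(PeekC(@Text$ + (pos + patLen - 1) * SizeOf(Character)) & 255)
  Wend
  ProcedureReturn 0
EndProcedure

BuildShiftTable("string")
Debug Shift(Asc("r"))   ; 4
Debug Shift(Asc("z"))   ; 7
Debug QuickFind("this,is,a,test,string", "string")   ; 16
```

With a one-character pattern every table entry is 1 or 2, which is the "power is lost" case: the scan ends up visiting nearly every character anyway.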

Post by Rescator »

Improved the code some more.
mode 0 (default) supports multi delimiter, where each character is a delimiter
mode 1 supports a case sensitive string delimiter, where the delimiter is treated as a full string
mode 2 supports a case insensitive string delimiter, where the delimiter is treated as a full string

mode 2 (case insensitive delimiter string) is slightly slower than mode 1 (case sensitive delimiter string).
mode 0 (case sensitive delimiter chars) should be the same as the previously posted code.
Unicode or ANSI compile shows very little difference.
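In the code, the two string modes differ only in the third argument handed to CompareMemoryString(). A quick way to see the effect, assuming (as the code in this post does) that 0 selects a case sensitive compare, 1 a case insensitive one, and that a result of 0 means "equal":

```purebasic
Define a$ = "STRING", b$ = "string"

; Third argument 0: case sensitive, so the strings differ (non-zero result).
Debug CompareMemoryString(@a$, @b$, 0)

; Third argument 1: case insensitive, so the strings compare equal (result 0).
Debug CompareMemoryString(@a$, @b$, 1)
```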

I got these numbers
Compile with debugger (F5):
Split() = 8143
SplitLightning() = 9734
SplitLightning2() = 6116

Compile without debugger:
Split() = 3151
SplitLightning() = 2169
SplitLightning2() = 1778


Even though I'm critical of the ReDim for each match (memory fragmentation/overhead etc.), I think I would recommend SplitLightning2(), as it seems to be faster than SplitLightning() in most cases.

So SplitLightning2() is rather interesting. I think I might use that one myself; it would be very practical as an alternative to the StringField() function, and as StringField() does not support multi delimiters or string delimiters, it's just that much more flexible and could be used to split ini lines, html/xml and much more. And the case insensitive mode for delimiter strings is really cool.

Oh yeah, and consider this code public domain, do with as you please folks! :)

Code:

EnableExplicit

Macro MidMemLightning(string,startpos,length)
 PeekS((@string)+((startpos)*SizeOf(Character))-SizeOf(Character),(length))
EndMacro

Structure MemoryArray
    Byte.c[0]
    word.w[0]
EndStructure

Procedure.l SplitFaster(StringArray.s(1), Text2Split.s, Delim.s) ;return count
Protected FindLen.l,MainLen.l,StringCount.l,*MainByteArray.MemoryArray,FoundPos.l
Protected *FindByteArray.MemoryArray,PrevPos.l,i.l,MainArrayLoop.l,EndSearchPos.l

    FindLen = Len(Delim)
    MainLen = Len(Text2Split)
    Dim StringArray.s(1000)
    StringCount = 0
   
    *MainByteArray = @Text2Split  ;*MainMem
    *FindByteArray = @Delim       ;*FindMem

    PrevPos = 1

    ; Build BadChr Array
    Dim BadChar.l(255)
   
    ; set all alphabet to max shift pos (length of find string plus 1)
    For i = 0 To 255
        BadChar(i)  =  FindLen + 1
    Next
   
    ;Update chars that are in the find string to their position from the end.
    For i = 0 To FindLen -1
        BadChar(*FindByteArray\byte[i]) = FindLen - i   
    Next     

    MainArrayLoop = 1
    EndSearchPos = MainLen - (FindLen -1)
   
    While MainArrayLoop <= EndSearchPos
   
        If CompareMemory(@Text2Split + MainArrayLoop, @Delim, FindLen) = 1
            FoundPos = MainArrayLoop + 1
           
            If StringCount % 1000 = 0  ; not really needed, doesn't have much of a speed increase. This used to do a lot in the old VB days         
                ReDim StringArray.s(StringCount + 1000)
            EndIf
               
            StringArray(StringCount) = MidMemLightning(Text2Split, Prevpos, Foundpos - PrevPos) ;Mid(Text2Split, Prevpos, Foundpos - PrevPos) ;"HEllo, this is some text" + #TAB$ + "  " + #TAB$ + "esdfsdf"
            StringCount = StringCount + 1
            PrevPos = foundpos + Findlen

        EndIf
        ;Didn't find the string so shift as per the table.
        MainArrayLoop + BadChar(*MainByteArray.MemoryArray\byte[MainArrayLoop + FindLen])
    Wend
 
    ;catch end
    ReDim StringArray.s(StringCount)
    StringArray(StringCount) = MidMemLightning(Text2Split, Prevpos, MainLen - PrevPos +1)
    StringCount = StringCount + 1
    ReDim StringArray.s(StringCount)

    ProcedureReturn StringCount

EndProcedure

;mode 0 (default) supports multi delimiter, where each character is a delimiter
;mode 1 supports a case sensitive string delimiter, where the delimiter is treated as a full string
;mode 2 supports a case insensitive string delimiter, where the delimiter is treated as a full string
Procedure.l SplitLightning(StringArray.s(1),text$,delimiter$,mode.l=0) ;returns stringcount
 Protected delimiterlen.l,textlen.l,stringcount.l,stringindex.l=0,casesensitive.l=0
 Protected *text.Character,*textend,*delimiter.Character,*delimiterend

 delimiterlen=Len(delimiter$)
 textlen=Len(text$)
 If textlen And delimiterlen

  stringcount=0
  *text=@text$
  *textend=@text$+(textlen*SizeOf(Character))
  *delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))
  If mode=0
   For *text=@text$ To *textend Step SizeOf(Character)
    For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
     If *text\c=*delimiter\c
      stringcount+1
      *text+SizeOf(Character)
      Break
     EndIf
    Next
   Next
  ElseIf (mode=1) Or (mode=2)
   If mode=2 : casesensitive=1 : EndIf
   If delimiterlen>textlen : delimiterlen=textlen : EndIf
   For *text=@text$ To *textend Step SizeOf(Character)
    If Not CompareMemoryString(*text,@delimiter$,casesensitive,delimiterlen)
     stringcount+1
     *text+(delimiterlen*SizeOf(Character))
    EndIf
   Next
  EndIf

  If stringcount
   If mode<>0
    stringcount+1
   EndIf
   Dim StringArray.s(stringcount-1) ;starts at index 0
   *text=@text$
   *textend=@text$+(textlen*SizeOf(Character))-SizeOf(Character)
   *delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))-SizeOf(Character)
   textlen=0
   If mode=0
    For *text=@text$ To *textend Step SizeOf(Character)
     For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
      If *text\c=*delimiter\c
       StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
       stringindex+1
       textlen=0
       *text+SizeOf(Character)
       Break
      EndIf
     Next
     textlen+1
    Next
   Else
    For *text=@text$ To *textend Step SizeOf(Character)
     If Not CompareMemoryString(*text,@delimiter$,casesensitive,delimiterlen)
      StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
      stringindex+1
      textlen=0
      *text+(delimiterlen*SizeOf(Character))
     EndIf
     textlen+1
    Next
   EndIf
   StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
  EndIf

 EndIf

 ProcedureReturn stringcount
EndProcedure

;mode 0 (default) supports multi delimiter, where each character is a delimiter
;mode 1 supports a case sensitive string delimiter, where the delimiter is treated as a full string
;mode 2 supports a case insensitive string delimiter, where the delimiter is treated as a full string
Procedure.l SplitLightning2(StringArray.s(1),text$,delimiter$,mode.l=0) ;returns stringcount
 Protected delimiterlen.l,textlen.l,stringcount.l,stringindex.l=0,casesensitive.l=0
 Protected *text.Character,*textend,*delimiter.Character,*delimiterend

 delimiterlen=Len(delimiter$)
 textlen=Len(text$)
 If textlen And delimiterlen
  *text=@text$
  *textend=@text$+(textlen*SizeOf(Character))-SizeOf(Character)
  *delimiterend=@delimiter$+(delimiterlen*SizeOf(Character))-SizeOf(Character)
  If mode=0
   textlen=0
   For *text=@text$ To *textend Step SizeOf(Character)
    For *delimiter=@delimiter$ To *delimiterend Step SizeOf(Character)
     If *text\c=*delimiter\c
      ReDim StringArray.s(stringindex+1)
      StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
      stringindex+1
      textlen=0
      *text+SizeOf(Character)
      Break
     EndIf
    Next
    textlen+1
   Next
  Else
   If mode=2 : casesensitive=1 : EndIf ; 1 = case insensitive compare, same as in SplitLightning()
   If delimiterlen>textlen : delimiterlen=textlen : EndIf
   textlen=0
   For *text=@text$ To *textend Step SizeOf(Character)
    If Not CompareMemoryString(*text,@delimiter$,casesensitive,delimiterlen)
     ReDim StringArray.s(stringindex+1)
     StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
     stringindex+1
     textlen=0
     *text+(delimiterlen*SizeOf(Character))
    EndIf
    textlen+1
   Next
  EndIf
  If stringindex
   StringArray(stringindex)=PeekS(*text-(textlen*SizeOf(Character)),textlen)
   stringindex+1
  EndIf
 EndIf

 ProcedureReturn stringindex
EndProcedure

Define i.l,n.l,start1.l,start2.l,start3.l,start4.l
Dim StringArray.s(0)

Debug "Split()"
n=SplitFaster(StringArray(),"this,is,a,test,string,this,is,a,test,string","string")
If n
 For i=0 To n
  Debug Str(i)+" = "+StringArray(i)
 Next
EndIf

Debug ""
Debug "SplitLightning()"
n=SplitLightning(StringArray(),"this,is,a,test,string,this,is,a,test,string","string",1)
If n
 n-1
 For i=0 To n
  Debug Str(i)+" = "+StringArray(i)
 Next
EndIf
DisableDebugger

start1=ElapsedMilliseconds()
For i=1 To 100000
 n=SplitFaster(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string","string")
Next
start2=ElapsedMilliseconds()
For i=1 To 100000
 n=SplitLightning(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string","string",1)
Next
start3=ElapsedMilliseconds()
For i=1 To 100000
 n=SplitLightning2(StringArray(),"this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string,this,is,a,test,string","string",1)
Next
start4=ElapsedMilliseconds()

EnableDebugger
MessageRequester("Split() vs SplitLightning()", "Split() = " + Str(start2-start1)+#CRLF$+"SplitLightning() = "+Str(start3-start2)+#CRLF$+"SplitLightning2() = "+Str(start4-start3))

Post by pdwyer »

Interesting to see your comment on redim.

The first time I worked on this, I added about 100 elements to the array at a time and then cleaned up at the end to reduce the number of ReDims. It turned out, though, that this had absolutely no performance benefit (unlike in my VB days 7-8 years back, when that would have been a huge performance tweak).

I guess PB's memory management is a lot better.
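For reference, the grow-in-chunks pattern I mean looks roughly like this. It's a sketch only: #ChunkSize and the Items() array are made-up names, and the loop just stores dummy strings.

```purebasic
#ChunkSize = 100   ; hypothetical growth step

Define capacity.l = #ChunkSize, count.l, i.l
Dim Items.s(capacity - 1)          ; room for the first 100 entries (index 0..99)

For i = 1 To 250
  If count = capacity              ; array is full: grow by another chunk
    capacity + #ChunkSize
    ReDim Items.s(capacity - 1)
  EndIf
  Items(count) = "item " + Str(i)
  count + 1
Next

ReDim Items.s(count - 1)           ; clean up: trim to the entries actually used
Debug count                        ; 250
Debug Items(count - 1)             ; item 250
```

Whether the chunking pays off is exactly what is worth timing on your own data, compiled without the debugger.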

It's going to take me a few days to digest the rest of this, thanks!