String Split() and Join() procs

Post by Lunasole »

There are many topics about this in Coding Questions (and I have personally needed such functions in PB several times). I think they are among the "must have" routines on both desktop and web, so it is strange that PB still has nothing built in that works at a good speed.
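
For context, here is a minimal sketch of the built-in route, using only CountString() and StringField(). The sample text and the variable names are just for illustration. As far as I know, StringField() has to scan the string from the beginning on every call, so this loop gets slow on long strings with many fields.

```purebasic
EnableExplicit

; Split with built-ins only (illustrative sketch, not the procs from this topic)
Define Text$ = "Macross: do you remember love?"
Define i, nFields = CountString(Text$, " ") + 1 ; number of fields = delimiters + 1
Dim Parts$(nFields - 1)

For i = 1 To nFields                            ; StringField() indexes from 1
  Parts$(i - 1) = StringField(Text$, i, " ")    ; each call searches Text$ again
Next

For i = 0 To nFields - 1
  Debug Parts$(i)
Next
```

This is fine for short strings; the procedures in this topic are meant for the cases where it is not.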

The author of a recent topic asked for high performance, and there is a cool, ultra-optimized version here: http://www.purebasic.fr/english/viewtop ... 12&t=65159.

However, I found the ASM code too long and unwieldy to use.
So here is another example, more "high-level" ^^
It is also modeled on the VB6 string Split() function.

Code:

EnableExplicit

;	2016			(c) Lunasole

; equivalent of VB6 Split()
; Out$():		array to receive output
; String$:		string to split
; Delimiter$:	a sequence of chars to split String$ by
; Limit:		maximum number of splits (-1 = no limit); the last item of the array receives the rest of String$
; Mode:			#PB_String_CaseSensitive or #PB_String_NoCase, passed on to FindString()
; RETURN:		number of Delimiter$ matches that were split on [also the size of the Out$() array]
Procedure SplitS(Array Out$(1), String$, Delimiter$, Limit = -1, Mode = #PB_String_CaseSensitive)
	Protected nC, mPos, lPos = 1, nDelimLen = Len(Delimiter$)
	Repeat
		mPos = FindString(String$, Delimiter$, lPos, Mode)
		If ArraySize(Out$()) < nC
			ReDim Out$(nC + 10240) ; grow in chunks of 10240 items (enlarge your array for just $2800 :3)
		EndIf
		If mPos And (Limit = -1 Or nC < Limit)
			Out$(nC) = Mid(String$, lPos, (mPos - lPos))
			lPos = mPos + nDelimLen
			nC + 1			
		Else
			Out$(nC) = Mid(String$, lPos)
			Break
		EndIf
	ForEver
	ReDim Out$(nC) ; trim output array
	ProcedureReturn nC
EndProcedure

; equivalent of VB6 Join()
; In$():		array whose items are joined
; Delimiter$:	string placed between the items
; RETURN:		string containing all array items
Procedure$ JoinS(Array In$(1), Delimiter$ = "")
	Protected sOut$, nC, nCMax = ArraySize(In$())
	For nC = 0 To nCMax
		If Not nC = nCMax
			sOut$ + In$(nC) + Delimiter$
		Else
			sOut$ + In$(nC)
		EndIf
	Next
	ProcedureReturn sOut$
EndProcedure

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Dim T$(0) ; output array
Debug "RETURN: " + SplitS(T$(), "Macross: do you remember love?", " ")

Debug "======="
Define t
For t = 0 To ArraySize(T$())
	Debug t$(t)
Next t

PS. JoinS() is still not optimized.
Last edited by Lunasole on Mon Jun 20, 2016 3:53 pm, edited 2 times in total.

Post by Sicro »

Code:

For nC = 0 To ArraySize(Source$())

ArraySize() is evaluated on each loop iteration. Result: slow.

Code:

String$ = Mid(String$, nPos + nDelimLen)

You create a new string each time. Result: slow.

Code:

While nPos
  ReDim sOut$(nC)
  [...]
  nC = nC + 1
Wend

You ReDim the array on each loop iteration. Result: slow.

This SplitS() is about 60% faster:

Code:

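; equivalent of VB6 Join()
; Source$():   array whose items are joined
; Delimiter$:  string placed between the items
; RETURN:      string containing all array items
Procedure$ JoinS(Array Source$(1), Delimiter$ = "")
  Protected nC, sOut$, nArraySize

  nArraySize = ArraySize(Source$()) ; read the size once, outside the loop
  For nC = 0 To nArraySize
    If nC > 0
      sOut$ + Delimiter$ ; delimiter goes between items only, not after the last one
    EndIf
    sOut$ + Source$(nC)
  Next

  ProcedureReturn sOut$
EndProcedure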

; equivalent of VB6 Split()
; sOut$():     array to receive output
; String$:     string to split
; Delimiter$:  a string to split by
; Limit:       maximum number of splits (-1 = no limit); the last item receives the rest
; RETURN:      index of the last item in sOut$()
Procedure SplitS(Array sOut$(1), String$, Delimiter$, Limit = -1, Mode = #PB_String_CaseSensitive)
  Protected nC, nPos, nDelimPos, nDelimLen, nArraySize, nReDimStep
  
  nReDimStep = 30 ; Use a high value if many separations have to be performed.

  nPos      = 1
  nDelimLen = Len(Delimiter$)
  ReDim sOut$(nReDimStep)
  nArraySize = nReDimStep
  If nDelimLen ; an empty delimiter means nothing to split on
    Repeat
      If Limit <> -1 And nC >= Limit : Break : EndIf
      nDelimPos = FindString(String$, Delimiter$, nPos, Mode)
      If nDelimPos = 0 : Break : EndIf
      sOut$(nC) = Mid(String$, nPos, nDelimPos - nPos) ; text between two delimiters
      nC + 1
      nPos = nDelimPos + nDelimLen
      If nC > nArraySize ; grow in steps instead of on every item
        nArraySize + nReDimStep
        ReDim sOut$(nArraySize)
      EndIf
    ForEver
  EndIf

  sOut$(nC) = Mid(String$, nPos) ; remainder after the last delimiter
  ReDim sOut$(nC)                ; trim output array
  ProcedureReturn nC
EndProcedure
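
A figure like "about 60% faster" depends on the input and the machine, so here is a rough sketch for measuring it yourself. It assumes the SplitS() version under test is saved in a file called SplitJoin.pbi; that file name and the #Fields workload size are made up for this example. The debugger slows PB code down a lot, so the result is shown with MessageRequester() and the test is best compiled with the debugger off.

```purebasic
EnableExplicit
XIncludeFile "SplitJoin.pbi" ; hypothetical file name holding the SplitS() version under test

#Fields = 200000 ; hypothetical workload size, adjust as needed

Dim R$(0)
Define Text$, tStart.q, tElapsed.q

; Space() + ReplaceString() builds "item,item,..." without a slow concatenation loop
Text$ = ReplaceString(Space(#Fields), " ", "item,")

tStart = ElapsedMilliseconds()
SplitS(R$(), Text$, ",")
tElapsed = ElapsedMilliseconds() - tStart

; MessageRequester() also works when the debugger is switched off
MessageRequester("SplitS()", Str(ArraySize(R$()) + 1) + " items in " + Str(tElapsed) + " ms")
```

Run it once per SplitS() version and compare the times; a single run is only a rough indication.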

Post by Lunasole »

I'll speed it up sometime when I need this for large strings (or someone else can do it if they need it before I do ^^)
Well, the time has come ^^
I needed the speed-up, so I just rewrote the SplitS() procedure; the new code has been added to the first post. It should be fast enough and even appears to have no bugs (however, I'm too lazy to write a unitmonkey-test) :)
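
A minimal sketch of such a test, in case someone wants one. It assumes SplitS() and JoinS() from the first post (without the demo lines at the bottom) are saved as SplitJoin.pbi; the file name and the Check() helper are made up for this example, and the expected values follow the first post's behaviour (Limit counts splits, the return value is the last index).

```purebasic
EnableExplicit
XIncludeFile "SplitJoin.pbi" ; hypothetical file name: SplitS() and JoinS() from the first post

; Hypothetical helper: prints one line per check to the debug output
Procedure Check(Passed, Label$)
  If Passed
    Debug "ok:   " + Label$
  Else
    Debug "FAIL: " + Label$
  EndIf
EndProcedure

Dim R$(0)
Define nLast

; plain split and round trip through JoinS()
nLast = SplitS(R$(), "a,b,c", ",")
Check(Bool(nLast = 2), "two delimiters give last index 2")
Check(Bool(ArraySize(R$()) = 2), "array is trimmed to the last index")
Check(Bool(R$(0) = "a"), "first item")
Check(Bool(R$(2) = "c"), "last item")
Check(Bool(JoinS(R$(), ",") = "a,b,c"), "JoinS() restores the input")

; delimiter longer than one character
nLast = SplitS(R$(), "a::b", "::")
Check(Bool(nLast = 1), "multi-character delimiter: last index 1")
Check(Bool(R$(1) = "b"), "multi-character delimiter: second item")

; delimiter not present at all
nLast = SplitS(R$(), "abc", ",")
Check(Bool(nLast = 0), "no delimiter: last index 0")
Check(Bool(R$(0) = "abc"), "no delimiter: whole string in item 0")

; Limit = 1 allows one split, the rest stays in the last item
nLast = SplitS(R$(), "a,b,c", ",", 1)
Check(Bool(nLast = 1), "Limit = 1: last index 1")
Check(Bool(R$(1) = "b,c"), "Limit = 1: remainder in last item")
```

Run it with the debugger on and look for FAIL lines in the debug output.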

PS. The magic number 10240 can also be adjusted to improve performance for much larger strings.