[PB4] Split() and Join() commands

Flype · Post by **Flype** » Fri Apr 28, 2006 9:27 pm

; Split() functions, Purebasic 4.0+

Macro CountArray(array)
  ( PeekL( @array-8 ) )
EndMacro

Procedure.l SplitArray(array.s(1), text.s, separator.s = ",") ; String to Array
  
  Protected index.l, size.l = CountString(text, separator)
  
  ReDim array.s(size)
  
  For index = 0 To size
    array(index) = StringField(text, index + 1, separator)
  Next
  
  ProcedureReturn size
  
EndProcedure
Procedure.s JoinArray(array.s(1), separator.s = ",") ; Array to String
  
  Protected index.l, result.s, size.l = CountArray(array()) - 1
  
  For index = 0 To size
    result + array(index)
    If (index < size)
      result + separator
    EndIf
  Next
  
  ProcedureReturn result
  
EndProcedure
Procedure.l SplitList(list.s(), text.s, separator.s = ",") ; String to List
  
  Protected index.l, size.l = CountString(text, separator)
  
  For index = 0 To size
    If AddElement(list())
      list() = StringField(text, index + 1, separator)
    EndIf
  Next
  
  ProcedureReturn size
  
EndProcedure
Procedure.s JoinList(list.s(), separator.s = ",") ; List to String
  
  Protected result.s, size.l = CountList(list()) - 1
  
  ForEach list()
    result + list()
    If (ListIndex(list()) < size)
      result + separator
    EndIf
  Next
  
  ProcedureReturn result
  
EndProcedure

; Examples

string.s = "abc,defg,hi,jklmop,qrs,tuv,wxyz"

; string -> array -> string

Dim a.s(0)

size.l = SplitArray(a(), string, ",")

For i = 0 To size
  Debug a(i)
Next

Debug JoinArray(a())

; string -> list -> string

NewList b.s()

If SplitList(b(), string)
  ForEach b()
    Debug b()
  Next
EndIf

Debug JoinList(b())

;--
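
The listing above targets PureBasic 4.0. Later posts in this thread (2013) already use the `Array` keyword and `ArraySize()`, and `SplitList()` has since become a built-in name. As a minimal sketch only, assuming a newer compiler, the list variants could look like this. `SplitToList()` and `JoinFromList()` are hypothetical names chosen to avoid the built-in, and `ListSize()` stands in for `CountList()`.

```purebasic
; Hypothetical names: SplitToList() and JoinFromList() are not from the thread.
; Assumes a newer PureBasic with the List keyword and ListSize().

Procedure.i SplitToList(List items.s(), text.s, separator.s = ",") ; String to List
  Protected index.i, count.i = CountString(text, separator)
  For index = 0 To count
    If AddElement(items())
      items() = StringField(text, index + 1, separator)
    EndIf
  Next
  ProcedureReturn ListSize(items()) ; number of elements, never 0 for one field
EndProcedure

Procedure.s JoinFromList(List items.s(), separator.s = ",") ; List to String
  Protected result.s, last.i = ListSize(items()) - 1
  ForEach items()
    result + items()
    If ListIndex(items()) < last
      result + separator
    EndIf
  Next
  ProcedureReturn result
EndProcedure

NewList Word.s()
SplitToList(Word(), "abc,defg,hi")
Debug ListSize(Word())          ; 3
Debug JoinFromList(Word(), "-") ; abc-defg-hi
```

Returning the element count rather than the separator count means an `If` test on the result also succeeds for a string that contains no separator at all.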

SCRJ · Post by **SCRJ** » Fri Apr 28, 2006 9:40 pm

Cool, thanks for sharing.

Flype · Post by **Flype** » Fri Apr 28, 2006 11:11 pm

Another version, this time following the PHP syntax.

http://php.net/manual/en/function.implode.php
http://php.net/manual/en/function.explode.php

Code: Select all

; 
; Join()/Split()
; 
; Php Syntax:
; implode.s( glue.s, pieces.s(1) )
; explode.l( separator.s, string.s , limit.l = 0 )
; 
; http://php.net/manual/en/function.implode.php
; http://php.net/manual/en/function.explode.php
; 

Macro CountArray(array)
  ( PeekL( @array-8 ) )
EndMacro

Procedure.l explode_array(array.s(1), separator.s, string.s, limit.l = 0) ; String to Array
  
  Protected index.l, size.l = CountString(string, separator)
  
  If (limit > 0)
    size = limit - 1
  ElseIf (limit < 0)
    size + limit
  EndIf
  
  ReDim array.s(size)
  
  For index = 0 To size
    array(index) = StringField(string, index + 1, separator)
  Next
  
  ProcedureReturn size
  
EndProcedure
Procedure.s implode_array(glue.s, pieces.s(1)) ; Array to String
  
  Protected index.l, string.s, size.l = CountArray(pieces()) - 1
  
  For index = 0 To size
    string + pieces(index)
    If (index < size)
      string + glue
    EndIf
  Next
  
  ProcedureReturn string
  
EndProcedure
Procedure.l explode_list(list.s(), separator.s, string.s, limit.l = 0) ; String to List
  
  Protected index.l, size.l = CountString(string, separator)
  
  If (limit > 0)
    size = limit - 1
  ElseIf (limit < 0)
    size + limit
  EndIf
  
  For index = 0 To size
    If AddElement(list())
      list() = StringField(string, index + 1, separator)
    EndIf
  Next
  
  ProcedureReturn size
  
EndProcedure
Procedure.s implode_list(glue.s, pieces.s()) ; List to String
  
  Protected string.s, size.l = CountList(pieces()) - 1
  
  ForEach pieces()
    string + pieces()
    If (ListIndex(pieces()) < size)
      string + glue
    EndIf
  Next
  
  ProcedureReturn string
  
EndProcedure

; Examples

input.s = "feel the power of purebasic :-)"

; string -> array -> string

Dim a.s(0)

size.l = explode_array(a(), " ", input, -3)

For i = 0 To size
  Debug a(i)
Next

Debug implode_array("  =|=  ", a())

; string -> list -> string

NewList b.s()

If explode_list(b(), " ", input, 5)
  ForEach b()
    Debug b()
  Next
EndIf

Debug "[" + implode_list("][", b()) + "]"

;--
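
One difference from PHP is worth noting: in the listing above a positive `limit` simply stops after `limit` fields, while PHP's `explode()` puts the unsplit rest of the string into the last element. The following is a rough sketch of that behaviour, not part of the original code. `ExplodeLimit()` is a hypothetical name, it handles positive limits only, and it assumes a newer PureBasic (the `Array` keyword, `Mid()` with an optional length).

```purebasic
; Hypothetical helper, not from the thread: positive limits only.
Procedure.i ExplodeLimit(Array parts.s(1), separator.s, text.s, limit.i)
  Protected count.i = CountString(text, separator) + 1 ; fields in text
  Protected i.i, pos.i = 1, nextPos.i

  If limit < 1 : limit = 1 : EndIf          ; sketch ignores zero/negative limits
  If count > limit : count = limit : EndIf

  ReDim parts(count - 1)
  For i = 0 To count - 2
    nextPos = FindString(text, separator, pos)
    If nextPos > pos
      parts(i) = Mid(text, pos, nextPos - pos)
    Else
      parts(i) = ""                         ; two separators in a row
    EndIf
    pos = nextPos + Len(separator)
  Next
  parts(count - 1) = Mid(text, pos)         ; last element keeps the remainder

  ProcedureReturn count
EndProcedure

Dim Part.s(0)
Define n = ExplodeLimit(Part(), " ", "feel the power of purebasic", 3)
Debug n        ; 3
Debug Part(2)  ; power of purebasic
```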

Flype · Post by **Flype** » Fri Jun 30, 2006 1:47 pm

For those who need true PHP syntax, we can 'patch' the built-in command 'StringField()' with a custom macro so that it accepts separators longer than one character.

Add this code at the top of the source code above and it should work.

Code: Select all

Macro StringField(string, index, separator)
  StringFieldEx(string, index, separator)
EndMacro

Procedure.s StringFieldEx(string.s, index.l, sep.s)
  
  Protected i.l, pos.l, field.s, lSep.l = Len(sep)
  
  For i = 1 To index
    pos = FindString(string, sep, 1)
    If pos
      field = Left(string, pos - 1)                 ; text before the separator
      string = Mid(string, pos + lSep, Len(string)) ; continue after the separator
    Else
      field = string                                ; last field: no separator left
      string = ""                                   ; any further field is empty
    EndIf
  Next
  
  ProcedureReturn field
  
EndProcedure

wilbert · Post by **wilbert** » Mon Mar 25, 2013 8:29 am

I know this is an old thread, and there are multiple threads about splitting and joining, but I just wanted to share my attempt.
It supports both Unicode and ASCII and was tested with PB 5.11.

Code: Select all

Procedure.i Split(Array StringArray.s(1), StringToSplit.s, Separator.s = " ")
  Protected c = CountString(StringToSplit, Separator)
  Protected i, l = StringByteLength(Separator)
  Protected *p1.Character = @StringToSplit
  Protected *p2.Character = @Separator
  Protected *p = *p1

  ReDim StringArray(c)
  While i < c
    While *p1\c <> *p2\c
      *p1 + SizeOf(Character)
    Wend
    If CompareMemory(*p1, *p2, l)
      CompilerIf #PB_Compiler_Unicode
        StringArray(i) = PeekS(*p, (*p1 - *p) >> 1)
      CompilerElse
        StringArray(i) = PeekS(*p, *p1 - *p)
      CompilerEndIf
      *p1 + l
      *p = *p1
      i + 1
    Else
      *p1 + SizeOf(Character)
    EndIf
  Wend
  StringArray(c) = PeekS(*p)
  ProcedureReturn c
EndProcedure

Procedure.s Join(Array StringArray.s(1), Separator.s = "")
  Protected r.s, i, l, c = ArraySize(StringArray())
  While i <= c
    l + Len(StringArray(i))
    i + 1  
  Wend
  r = Space(l + Len(Separator) * c)
  i = 1
  l = @r
  CopyMemoryString(@StringArray(0), @l)
  While i <= c
    CopyMemoryString(@Separator)
    CopyMemoryString(@StringArray(i))
    i + 1  
  Wend
  ProcedureReturn r
EndProcedure



; *** test code ***

Dim A.s(0)

S.s = "... ++ This is a test string ++ used to test split and join ++ ..."
For i = 1 To 18
  S + S
Next

t1 = ElapsedMilliseconds()
Split(A(), S, "++")
t2 = ElapsedMilliseconds()
S = Join(A(), "*")
t3 = ElapsedMilliseconds()

MessageRequester("Test result", "Split:" + Str(t2-t1) + " Join:" + Str(t3-t2))

Joris · Post by **Joris** » Mon Mar 25, 2013 9:36 am

Hi Wilbert,

I haven't worked with the code above, but I used Split and Join quite a lot in GB32, so I had already searched for equivalents in PB.
I found this one but haven't tested it yet: ExtractRegularExpression(#RegularExpression, String$, Array$()) should be useful for splitting.
It should do the same as the PHP command explode: http://php.net/manual/en/function.explode.php
I mention it because it isn't used yet in any of the sources above.

wilbert · Post by **wilbert** » Mon Mar 25, 2013 10:11 am

Joris wrote:I found this one but haven't tested it yet: ExtractRegularExpression(#RegularExpression, String$, Array$()) should be useful for splitting.

It looks interesting but I haven't got a clue what the proper expression would be to split a string.
Also I don't know how fast it is.

Joris · Post by **Joris** » Mon Mar 25, 2013 10:56 am

wilbert wrote:
Joris wrote:I found this one but haven't tested it yet: ExtractRegularExpression(#RegularExpression, String$, Array$()) should be useful for splitting.
It looks interesting but I haven't got a clue what the proper expression would be to split a string.
Also I don't know how fast it is.

Hm, good question. I got stuck on that...
The next question is which kind of regular expressions can be used, as there are many flavours and I don't know whether they all share the same definition. I thought Perl was the language with the richest regular expressions; which flavour PB's are based on I don't know (we must ask or find out).

Here are some regular expressions I can use in my editor (UltraEdit):

% Matches the start of line - Indicates the search string must be at the beginning of a line but does not include any line terminator characters in the resulting string selected.
$ Matches the end of line - Indicates the search string must be at the end of line but does not include any line terminator characters in the resulting string selected.
? Matches any single character except newline.
* Matches any number of occurrences of any character except newline.
+ Matches one or more of the preceding character/expression. At least one occurrence of the character must be found. Does not match repeated newlines.
++ Matches the preceding character/expression zero or more times. Does not match repeated newlines.
^b Matches a page break.
^p Matches a newline (CR/LF) (paragraph) (DOS Files)
^r Matches a newline (CR Only) (paragraph) (MAC Files)
^n Matches a newline (LF Only) (paragraph) (UNIX Files)
^t Matches a tab character
[xyz] A character set. Matches any characters between brackets.
[~xyz] A negative character set. Matches any characters NOT between brackets including newline characters.
^{A^}^{B^} Matches expression A OR B
^ Overrides the following regular expression character
^(…^) Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
The corresponding replacement expression is ^x, for x in the range 1-9. Example: If ^(h*o^) ^(f*s^) matches "hello folks", ^2 ^1 would replace it with "folks hello".

I added a few more, so it may be easier to find the equivalents in PB (I already tried some like \w and ... not compatible):

Code: Select all

\  Indicates the next character has a special meaning.  "n" on its own matches the character "n".  "\n" matches a linefeed or newline character.  See examples below (\d, \f, \n etc).
^  Matches/anchors the beginning of line.
$  Matches/anchors the end of line.
*  Matches the preceding character zero or more times.
+  Matches the preceding character one or more times.  Does not match repeated newlines.
.  Matches any single character except a newline character.  Does not match repeated newlines.
(expression)  Brackets or tags an expression to use in the replace command.  A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
The corresponding replacement expression is \x, for x in the range 1-9.  Example: If (h.*o) (f.*s) matches "hello folks", \2 \1 would replace it with "folks hello".
[xyz]  A character set.  Matches any characters between brackets.
[^xyz]  A negative character set.  Matches any characters NOT between brackets including newline characters.
\d  Matches a digit character.  Equivalent to [0-9].
\D  Matches a nondigit character.  Equivalent to [^0-9].
\f  Matches a form-feed character.
\n  Matches a linefeed character.
\r  Matches a carriage return character.
\s  Matches any whitespace including space, tab, form-feed, etc but not newline.
\S  Matches any non-whitespace character but not newline.
\t  Matches a tab character.
\v  Matches a vertical tab character.
\w  Matches any word character including underscore.
\W  Matches any nonword character.
\p  Matches CR/LF (same as \r\n) to match a DOS line terminator.

Sometimes it takes a bit of searching to find the right combination, but mostly they are very useful.

I just saw that the regular expressions for PB are explained here: http://www.pcre.org/pcre.txt
Quite a bunch to explore...

Little John · Post by **Little John** » Mon Mar 25, 2013 2:42 pm

Just a small remark:
PureBasic now has a built-in function called "SplitList()".
So for more clarity, maybe these functions should rather be called "SplitString()" and "JoinString()" or so.

Joris · Post by **Joris** » Mon Mar 25, 2013 2:53 pm

Little John wrote:Just a small remark:
PureBasic now has a built-in function called "SplitList()".
So for more clarity, maybe these functions should rather be called "SplitString()" and "JoinString()" or so.

"Small remark" indeed, as SplitList() has a completely different function: it splits at a certain position instead of on a 'regular condition'.

Little John · Post by **Little John** » Mon Mar 25, 2013 3:01 pm

Joris wrote:SplitList has a completely different function

Yes, of course.

Joris · Post by **Joris** » Mon Mar 25, 2013 3:53 pm

wilbert wrote:It looks interesting but I haven't got a clue what the proper expression would be to split a string.
Also I don't know how fast it is.

@Wilbert, in case you haven't noticed the link below, it has the solution (the speed... I don't know yet):
http://www.purebasic.fr/english/viewtop ... 13&t=54089

So, to split a string into words:

Code: Select all

If CreateRegularExpression(0, "\w+")
  Dim Result$(0)
  NbFound = ExtractRegularExpression(0, "abC ABc zbA abc", Result$())
  Debug NbFound
  For k = 0 To NbFound-1
    Debug Result$(k)
  Next
Else
  Debug RegularExpressionError()
EndIf
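
The expression above extracts words. To split on a separator instead, one possible approach (a sketch, not tested for speed) is to match every run of characters that is not the separator. The pattern `[^,]+` below is an assumption chosen for a comma separator. Unlike StringField(), it skips empty fields such as the one in "a,,b".

```purebasic
; Sketch: treat each run of non-comma characters as one field.
; Assumption: empty fields are not wanted, because "[^,]+" never matches them.
Define text$ = "abc,defg,hi,jklmop"
Define count, k
Dim Field$(0)

If CreateRegularExpression(0, "[^,]+")
  count = ExtractRegularExpression(0, text$, Field$())
  For k = 0 To count - 1
    Debug Field$(k)
  Next
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```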

skywalk · Post by **skywalk** » Mon Mar 25, 2013 3:59 pm

RegularExpressions are 10 to 100 times slower than PB code.
Use them only where speed is not a concern.

GJ-68 · Post by **GJ-68** » Sun Jun 25, 2017 9:54 am

@wilbert

You should test your Split function with a separator of more than one character, in a string where only the first character of the separator matches.
Example: StringToSplit = "ABCxzDEFxyGHI", Separator = "xy"

Fixed version:

Code: Select all

Procedure.i Split(Array StringArray.s(1), StringToSplit.s, Separator.s = " ")
  Protected c = CountString(StringToSplit, Separator)
  Protected i, l = StringByteLength(Separator)
  Protected *p1.Character = @StringToSplit
  Protected *p2.Character = @Separator
  Protected *p = *p1

  ReDim StringArray(c)
  While i < c
    While *p1\c <> *p2\c
      *p1 + SizeOf(Character)
    Wend
    If CompareMemory(*p1, *p2, l)
      CompilerIf #PB_Compiler_Unicode
        StringArray(i) = PeekS(*p, (*p1 - *p) >> 1)
      CompilerElse
        StringArray(i) = PeekS(*p, *p1 - *p)
      CompilerEndIf
      *p1 + l
      *p = *p1
      i + 1
    Else
      *p1 + SizeOf(Character)
    EndIf
  Wend
  StringArray(c) = PeekS(*p)
  ProcedureReturn c
EndProcedure

wilbert · Post by **wilbert** » Sun Jun 25, 2017 5:40 pm

GJ-68 wrote:@wilbert

You should test your Split function with a separator with more than one character and when the first char of sep matches.
Example: StringToSplit = "ABCxzDEFxyGHI", Separator = "xy"

Thanks for mentioning it.
I updated my code.
