[PB4] Split() and Join() commands

Flype
Addict
Posts: 1542
Joined: Tue Jul 22, 2003 5:02 pm
Location: In a long distant galaxy

Post by Flype »

Code: Select all

; Split() functions, Purebasic 4.0+

Macro CountArray(array)
  ( PeekL( @array-8 ) )
EndMacro

Procedure.l SplitArray(array.s(1), text.s, separator.s = ",") ; String to Array
  
  Protected index.l, size.l = CountString(text, separator)
  
  ReDim array.s(size)
  
  For index = 0 To size
    array(index) = StringField(text, index + 1, separator)
  Next
  
  ProcedureReturn size
  
EndProcedure
Procedure.s JoinArray(array.s(1), separator.s = ",") ; Array to String
  
  Protected index.l, result.s, size.l = CountArray(array()) - 1
  
  For index = 0 To size
    result + array(index)
    If (index < size)
      result + separator
    EndIf
  Next
  
  ProcedureReturn result
  
EndProcedure
Procedure.l SplitList(list.s(), text.s, separator.s = ",") ; String to List
  
  Protected index.l, size.l = CountString(text, separator)
  
  For index = 0 To size
    If AddElement(list())
      list() = StringField(text, index + 1, separator)
    EndIf
  Next
  
  ProcedureReturn size
  
EndProcedure
Procedure.s JoinList(list.s(), separator.s = ",") ; List to String
  
  Protected result.s, size.l = CountList(list()) - 1
  
  ForEach list()
    result + list()
    If (ListIndex(list()) < size)
      result + separator
    EndIf
  Next
  
  ProcedureReturn result
  
EndProcedure

; Examples

string.s = "abc,defg,hi,jklmop,qrs,tuv,wxyz"

; string -> array -> string

Dim a.s(0)

size.l = SplitArray(a(), string, ",")

For i = 0 To size
  Debug a(i)
Next

Debug JoinArray(a())

; string -> list -> string

NewList b.s()

If SplitList(b(), string)
  ForEach b()
    Debug b()
  Next
EndIf

Debug JoinList(b())

;--
Last edited by Flype on Fri Apr 28, 2006 9:40 pm, edited 1 time in total.
No programming language is perfect. There is not even a single best language.
There are only languages well suited or perhaps poorly suited for particular purposes. Herbert Mayer
SCRJ
User
Posts: 93
Joined: Sun Jan 15, 2006 1:36 pm

Post by SCRJ »

Cool, thanks for sharing. :D
Flype
Addict
Posts: 1542
Joined: Tue Jul 22, 2003 5:02 pm
Location: In a long distant galaxy

Post by Flype »

Another version, which follows the PHP syntax.

http://php.net/manual/en/function.implode.php
http://php.net/manual/en/function.explode.php

Code: Select all

; 
; Join()/Split()
; 
; Php Syntax:
; implode.s( glue.s, pieces.s(1) )
; explode.l( separator.s, string.s , limit.l = 0 )
; 
; http://php.net/manual/en/function.implode.php
; http://php.net/manual/en/function.explode.php
; 

Macro CountArray(array)
  ( PeekL( @array-8 ) )
EndMacro

Procedure.l explode_array(array.s(1), separator.s, string.s, limit.l = 0) ; String to Array
  
  Protected index.l, size.l = CountString(string, separator)
  
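  If (limit > 0)
    ; keep at most 'limit' fields (unlike PHP, the last one does not receive the rest of the string)
    If (limit - 1 < size)
      size = limit - 1
    EndIf
  ElseIf (limit < 0)
    ; drop the last -limit fields, but keep at least one
    size + limit
    If (size < 0)
      size = 0
    EndIf
  EndIf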
  
  ReDim array.s(size)
  
  For index = 0 To size
    array(index) = StringField(string, index + 1, separator)
  Next
  
  ProcedureReturn size
  
EndProcedure
Procedure.s implode_array(glue.s, pieces.s(1)) ; Array to String
  
  Protected index.l, string.s, size.l = CountArray(pieces()) - 1
  
  For index = 0 To size
    string + pieces(index)
    If (index < size)
      string + glue
    EndIf
  Next
  
  ProcedureReturn string
  
EndProcedure
Procedure.l explode_list(list.s(), separator.s, string.s, limit.l = 0) ; String to List
  
  Protected index.l, size.l = CountString(string, separator)
  
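  If (limit > 0)
    ; keep at most 'limit' fields (unlike PHP, the last one does not receive the rest of the string)
    If (limit - 1 < size)
      size = limit - 1
    EndIf
  ElseIf (limit < 0)
    size + limit ; drop the last -limit fields
  EndIf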
  
  For index = 0 To size
    If AddElement(list())
      list() = StringField(string, index + 1, separator)
    EndIf
  Next
  
  ProcedureReturn size
  
EndProcedure
Procedure.s implode_list(glue.s, pieces.s()) ; List to String
  
  Protected string.s, size.l = CountList(pieces()) - 1
  
  ForEach pieces()
    string + pieces()
    If (ListIndex(pieces()) < size)
      string + glue
    EndIf
  Next
  
  ProcedureReturn string
  
EndProcedure

; Examples

input.s = "feel the power of purebasic :-)"

; string -> array -> string

Dim a.s(0)

size.l = explode_array(a(), " ", input, -3)

For i = 0 To size
  Debug a(i)
Next

Debug implode_array("  =|=  ", a())

; string -> list -> string

NewList b.s()

If explode_list(b(), " ", input, 5)
  ForEach b()
    Debug b()
  Next
EndIf

Debug "[" + implode_list("][", b()) + "]"

;--
Flype
Addict
Posts: 1542
Joined: Tue Jul 22, 2003 5:02 pm
Location: In a long distant galaxy

Post by Flype »

For those who need true PHP syntax,
we can 'patch' the built-in command 'StringField()' with a custom macro so that it accepts separators longer than one character.

So, add this code at the top of the source code above and it should work.

Code: Select all

Macro StringField(string, index, separator)
  StringFieldEx(string, index, separator)
EndMacro

Procedure.s StringFieldEx(string.s, index.l, sep.s)
  
  Protected i.l, pos.l, field.s, lSep.l = Len(sep)
  
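  For i = 1 To index
    pos = FindString(string, sep, 1)
    If pos
      field = Left(string, pos - 1)
      string = Mid(string, pos + lSep, Len(string)) ; drop the field and its separator
    Else
      field = string ; last field: nothing remains after it
      string = ""    ; so any further index yields an empty field
    EndIf
  Next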
  
  ProcedureReturn field
  
EndProcedure
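
A quick way to check the multi-character behaviour is sketched below. The name FieldEx is hypothetical and the procedure is only a stand-alone variant of the helper above, written so the snippet runs on its own; it assumes that an index past the last field should yield an empty string, as the built-in StringField() does.

```purebasic
; Hypothetical stand-alone variant of the helper above (the name FieldEx is an assumption).
Procedure.s FieldEx(text.s, index.l, sep.s)
  Protected i.l, pos.l, field.s, lSep.l = Len(sep)
  For i = 1 To index
    pos = FindString(text, sep, 1)
    If pos
      field = Left(text, pos - 1)
      text = Mid(text, pos + lSep, Len(text)) ; continue after the separator
    Else
      field = text ; no separator left: this is the last field
      text = ""
    EndIf
  Next
  ProcedureReturn field
EndProcedure

Debug FieldEx("one<>two<>three", 1, "<>") ; one
Debug FieldEx("one<>two<>three", 3, "<>") ; three
Debug FieldEx("one<>two<>three", 4, "<>") ; empty string, index is past the last field
```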
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: [PB4] Split() and Join() commands

Post by wilbert »

I know this is an old thread and there are multiple threads about splitting and joining, but I just wanted to share my attempt.
It supports both Unicode and ASCII and was tested with PB 5.11.

Code: Select all

Procedure.i Split(Array StringArray.s(1), StringToSplit.s, Separator.s = " ")
  Protected c = CountString(StringToSplit, Separator)
  Protected i, l = StringByteLength(Separator)
  Protected *p1.Character = @StringToSplit
  Protected *p2.Character = @Separator
  Protected *p = *p1

  ReDim StringArray(c)
  While i < c
    While *p1\c <> *p2\c
      *p1 + SizeOf(Character)
    Wend
    If CompareMemory(*p1, *p2, l)
      CompilerIf #PB_Compiler_Unicode
        StringArray(i) = PeekS(*p, (*p1 - *p) >> 1)
      CompilerElse
        StringArray(i) = PeekS(*p, *p1 - *p)
      CompilerEndIf
      *p1 + l
      *p = *p1
      i + 1
    Else
      *p1 + SizeOf(Character)
    EndIf
  Wend
  StringArray(c) = PeekS(*p)
  ProcedureReturn c
EndProcedure

Procedure.s Join(Array StringArray.s(1), Separator.s = "")
  Protected r.s, i, l, c = ArraySize(StringArray())
  While i <= c
    l + Len(StringArray(i))
    i + 1  
  Wend
  r = Space(l + Len(Separator) * c)
  i = 1
  l = @r
  CopyMemoryString(@StringArray(0), @l)
  While i <= c
    CopyMemoryString(@Separator)
    CopyMemoryString(@StringArray(i))
    i + 1  
  Wend
  ProcedureReturn r
EndProcedure



; *** test code ***

Dim A.s(0)

S.s = "... ++ This is a test string ++ used to test split and join ++ ..."
For i = 1 To 18
  S + S
Next

t1 = ElapsedMilliseconds()
Split(A(), S, "++")
t2 = ElapsedMilliseconds()
S = Join(A(), "*")
t3 = ElapsedMilliseconds()

MessageRequester("Test result", "Split:" + Str(t2-t1) + " Join:" + Str(t3-t2))
Last edited by wilbert on Sun Jun 25, 2017 7:18 pm, edited 2 times in total.
Windows (x64)
Raspberry Pi OS (Arm64)
Joris
Addict
Posts: 890
Joined: Fri Oct 16, 2009 10:12 am
Location: BE

Re: [PB4] Split() and Join() commands

Post by Joris »

Hi Wilbert,

I haven't worked with the code above, but I used Split and Join quite a lot in GB32, so I had already searched for equivalents in PB.
I found this one but haven't tested it yet: ExtractRegularExpression(#RegularExpression, String$, Array$()) should be useful for splitting.
It should do the same as the PHP command explode: http://php.net/manual/en/function.explode.php
I mention it because it isn't used yet in any of the sources above.
Yeah I know, but keep in mind ... Leonardo da Vinci was also an autodidact.
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: [PB4] Split() and Join() commands

Post by wilbert »

Joris wrote: This one I found but haven't tested yet: ExtractRegularExpression(#RegularExpression, String$, Array$()) must be useful for split.
It looks interesting but I haven't got a clue what the proper expression would be to split a string.
Also I don't know how fast it is.
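
One possible expression, offered only as a sketch: instead of matching the separator, match each run of characters that is not the separator. The pattern "[^,]+" and the sample string are assumptions for illustration; note that this approach skips empty fields (as in "a,,b"), which StringField() would return as empty strings.

```purebasic
; Sketch: split on commas by extracting runs of non-comma characters.
; Assumption: empty fields between two commas are not wanted.
If CreateRegularExpression(0, "[^,]+")
  Dim Part$(0)
  Count = ExtractRegularExpression(0, "abc,defg,hi", Part$())
  For k = 0 To Count - 1
    Debug Part$(k)
  Next
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```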
Joris
Addict
Posts: 890
Joined: Fri Oct 16, 2009 10:12 am
Location: BE

Re: [PB4] Split() and Join() commands

Post by Joris »

wilbert wrote:
Joris wrote: This one I found but haven't tested yet: ExtractRegularExpression(#RegularExpression, String$, Array$()) must be useful for split.
It looks interesting but I haven't got a clue what the proper expression would be to split a string.
Also I don't know how fast it is.
Hm, good question. I got stuck on that...
The next question is which kind of regular expression can be used, as there are many flavours and I don't know whether they all share the same definitions. I thought Perl was the language with the most extensive regular expressions; which flavour PB's regular expressions are based on I don't know (we must ask or find out).

Here are some regular expressions I can use in my editor (UltraEdit):
% Matches the start of line - Indicates the search string must be at the beginning of a line but does not include any line terminator characters in the resulting string selected.
$ Matches the end of line - Indicates the search string must be at the end of line but does not include any line terminator characters in the resulting string selected.
? Matches any single character except newline.
* Matches any number of occurrences of any character except newline.
+ Matches one or more of the preceding character/expression. At least one occurrence of the character must be found. Does not match repeated newlines.
++ Matches the preceding character/expression zero or more times. Does not match repeated newlines.
^b Matches a page break.
^p Matches a newline (CR/LF) (paragraph) (DOS Files)
^r Matches a newline (CR Only) (paragraph) (MAC Files)
^n Matches a newline (LF Only) (paragraph) (UNIX Files)
^t Matches a tab character
[xyz] A character set. Matches any characters between brackets.
[~xyz] A negative character set. Matches any characters NOT between brackets including newline characters.
^{A^}^{B^} Matches expression A OR B
^ Overrides the following regular expression character
^(…^) Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
The corresponding replacement expression is ^x, for x in the range 1-9. Example: If ^(h*o^) ^(f*s^) matches "hello folks", ^2 ^1 would replace it with "folks hello".
I added a few more so it may be easier to find the equivalents in PB (I already tried some, like \w, and found them not compatible):

Code: Select all

\  Indicates the next character has a special meaning.  "n" on its own matches the character "n".  "\n" matches a linefeed or newline character.  See examples below (\d, \f, \n etc).
^  Matches/anchors the beginning of line.
$  Matches/anchors the end of line.
*  Matches the preceding character zero or more times.
+  Matches the preceding character one or more times.  Does not match repeated newlines.
.  Matches any single character except a newline character.  Does not match repeated newlines.
(expression)  Brackets or tags an expression to use in the replace command.  A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
The corresponding replacement expression is \x, for x in the range 1-9.  Example: If (h.*o) (f.*s) matches "hello folks", \2 \1 would replace it with "folks hello".
[xyz]  A character set.  Matches any characters between brackets.
[^xyz]  A negative character set.  Matches any characters NOT between brackets including newline characters.
\d  Matches a digit character.  Equivalent to [0-9].
\D  Matches a nondigit character.  Equivalent to [^0-9].
\f  Matches a form-feed character.
\n  Matches a linefeed character.
\r  Matches a carriage return character.
\s  Matches any whitespace including space, tab, form-feed, etc but not newline.
\S  Matches any non-whitespace character but not newline.
\t  Matches a tab character.
\v  Matches a vertical tab character.
\w  Matches any word character including underscore.
\W  Matches any nonword character.
\p  Matches CR/LF (same as \r\n) to match a DOS line terminator.
Sometimes it takes a bit of searching to combine the right ones, but mostly they are very useful.

I just saw that the regular expressions for PB are explained here: http://www.pcre.org/pcre.txt
Quite a bunch to explore...
Little John
Addict
Posts: 4789
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: [PB4] Split() and Join() commands

Post by Little John »

Just a small remark:
PureBasic now has a built-in function called "SplitList()".
So for clarity, these functions might better be called "SplitString()" and "JoinString()" or similar.
Joris
Addict
Posts: 890
Joined: Fri Oct 16, 2009 10:12 am
Location: BE

Re: [PB4] Split() and Join() commands

Post by Joris »

Little John wrote: Just a small remark:
PureBasic now has a built-in function called "SplitList()".
So for clarity, these functions might better be called "SplitString()" and "JoinString()" or similar.
"Small remark" yeah, as SplitList has a completely different function: it splits a list at a given position instead of on a 'regular condition'.
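
For comparison, here is a minimal sketch of the built-in, assuming the documented behaviour that SplitList() moves the elements from the current element onwards into a destination list. The list names and contents are made up for illustration.

```purebasic
; Sketch of the built-in SplitList(): it divides a list at the current element.
; Assumption: elements from the current one onwards move to the destination list.
NewList Source.s()
NewList Tail.s()

AddElement(Source()) : Source() = "a"
AddElement(Source()) : Source() = "b"
AddElement(Source()) : Source() = "c"
AddElement(Source()) : Source() = "d"

SelectElement(Source(), 2)   ; make "c" (index 2) the current element
SplitList(Source(), Tail())  ; split Source() at the current element

Debug "Source:"
ForEach Source()
  Debug Source()
Next

Debug "Tail:"
ForEach Tail()
  Debug Tail()
Next
```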
Little John
Addict
Posts: 4789
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: [PB4] Split() and Join() commands

Post by Little John »

Joris wrote: SplitList has a completely different function
Yes, of course.
Joris
Addict
Posts: 890
Joined: Fri Oct 16, 2009 10:12 am
Location: BE

Re: [PB4] Split() and Join() commands

Post by Joris »

wilbert wrote:It looks interesting but I haven't got a clue what the proper expression would be to split a string.
Also I don't know how fast it is.
@Wilbert, in case you haven't noticed the link below, it has the solution (as for the speed, I don't know yet):
http://www.purebasic.fr/english/viewtop ... 13&t=54089

So, to split a string into words:

Code: Select all

If CreateRegularExpression(0, "\w+")
  Dim Result$(0)
  NbFound = ExtractRegularExpression(0, "abC ABc zbA abc", Result$())
  Debug NbFound
  For k = 0 To NbFound - 1
    Debug Result$(k)
  Next
Else
  Debug RegularExpressionError()
EndIf
skywalk
Addict
Posts: 4218
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: [PB4] Split() and Join() commands

Post by skywalk »

RegularExpressions are 10 to 100 times slower than PB code.
Use them only where speed is not a concern. :wink:
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
GJ-68
User
Posts: 32
Joined: Sun Jun 23, 2013 1:00 pm
Location: France (68)

Re: [PB4] Split() and Join() commands

Post by GJ-68 »

@wilbert

You should test your Split function with a separator of more than one character, on a string where only the first character of the separator matches.
Example: StringToSplit = "ABCxzDEFxyGHI", Separator = "xy"

Fixed version:

Code: Select all

Procedure.i Split(Array StringArray.s(1), StringToSplit.s, Separator.s = " ")
  Protected c = CountString(StringToSplit, Separator)
  Protected i, l = StringByteLength(Separator)
  Protected *p1.Character = @StringToSplit
  Protected *p2.Character = @Separator
  Protected *p = *p1

  ReDim StringArray(c)
  While i < c
    While *p1\c <> *p2\c
      *p1 + SizeOf(Character)
    Wend
    If CompareMemory(*p1, *p2, l)
      CompilerIf #PB_Compiler_Unicode
        StringArray(i) = PeekS(*p, (*p1 - *p) >> 1)
      CompilerElse
        StringArray(i) = PeekS(*p, *p1 - *p)
      CompilerEndIf
      *p1 + l
      *p = *p1
      i + 1
    Else
      *p1 + SizeOf(Character)
    EndIf
  Wend
  StringArray(c) = PeekS(*p)
  ProcedureReturn c
EndProcedure
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: [PB4] Split() and Join() commands

Post by wilbert »

GJ-68 wrote:@wilbert

You should test your Split function with a separator with more than one character and when the first char of sep matches.
Example: StringToSplit = "ABCxzDEFxyGHI", Separator = "xy"
Thanks for mentioning.
I updated my code. :)