SplitString to list or array with option double-quotes (CSV)

mk-soft
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

SplitString to list or array with option double-quotes (CSV)

Post by mk-soft »

Maybe somebody can use this :wink:

Update v1.04
- Added function SplitStringList and SplitStringArray

Update v1.05
- Update SplitString: Minor performance improvement
- Added SplitParameter: Made public from EventDesigner

Update v1.06.2
- Added StringBetweenList
- Added StringBetweenArray

Code: Select all

;-TOP

; Comment : SplitString to list and array with option double-quotes
; Author  : mk-soft
; Version : v1.06.2
; Create  : 03.11.2017
; Update  : 23.07.2022
; Link GR : 
; Link EN : https://www.purebasic.fr/english/viewtopic.php?t=69557

; OS      : All
; License : MIT

; ***************************************************************************************

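; Splits String at each separator character and stores the fields in Result().
; Only the first character of Separator is used.
; With DQuote = #True, separators inside double-quoted fields are ignored
; and the surrounding quotes are stripped. Returns the number of elements.
Procedure SplitStringList(String.s, Separator.s, List Result.s(), DQuote = #False)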
  Protected *String.character, *Separator.character
  Protected *Start, *End, exit, lock, do, dq, len
  
  ClearList(Result())
  *String = @String
  *Separator = @Separator
  *Start = *String
  *End = *String
  
  If DQuote
    Repeat
      If *String\c = 0
        exit = #True
        do = #True
        If Not dq
          *End = *String
        EndIf
      Else
        If *String\c = '"'
          If Not lock
            lock = #True
            dq = #True
            *Start = *String + SizeOf(character)
          Else
            lock = #False
            *End = *String
          EndIf
        EndIf
        If *String\c = *Separator\c And Not lock
          do = #True
          If Not dq
            *End = *String
          EndIf
        EndIf
      EndIf
      If do
        AddElement(Result()) 
        len = (*End - *Start) / SizeOf(character)
        If Len > 0
          Result() = PeekS(*Start, len) 
        EndIf
        *Start = *String + SizeOf(character)
        do = #False
        dq = #False
      EndIf
      *String + SizeOf(character)
    Until exit
  Else  
    Repeat
      If *String\c = 0
        exit = #True
        do = #True
        *End = *String
      Else
        If *String\c = *Separator\c
          do = #True
          *End = *String
        EndIf
      EndIf
      If do
        AddElement(Result()) 
        len = (*End - *Start) / SizeOf(character)
        If Len > 0
          Result() = PeekS(*Start, len) 
        EndIf
        *Start = *String + SizeOf(character)
        do = #False
      EndIf
      *String + SizeOf(character)
    Until exit
  EndIf
  ProcedureReturn ListSize(Result())
EndProcedure

Procedure SplitStringArray(String.s, Separator.s, Array Result.s(1), DQuote = #False)
  Protected *String.character, *Separator.character
  Protected *Start, *End, exit, lock, do, dq, len , count, size
  
  size = 7
  Dim Result(size)
  *String = @String
  *Separator = @Separator
  *Start = *String
  *End = *String
  If DQuote
    Repeat
      If *String\c = 0
        exit = #True
        do = #True
        If Not dq
          *End = *String
        EndIf
      Else
        If *String\c = '"'
          If Not lock
            lock = #True
            dq = #True
            *Start = *String + SizeOf(character)
          Else
            lock = #False
            *End = *String
          EndIf
        EndIf
        If *String\c = *Separator\c And Not lock
          do = #True
          If Not dq
            *End = *String
          EndIf
        EndIf
      EndIf
      If do
        If size < count
          size + 8
          ReDim Result(size)
        EndIf
        len = (*End - *Start) / SizeOf(character)
        If Len > 0
          Result(count) = PeekS(*Start, len) 
        EndIf
        *Start = *String + SizeOf(character)
        count + 1
        do = #False
        dq = #False
      EndIf
      *String + SizeOf(character)
    Until exit
  Else
    Repeat
      If *String\c = 0
        exit = #True
        do = #True
        *End = *String
      Else
        If *String\c = *Separator\c
          do = #True
          *End = *String
        EndIf
      EndIf
      If do
        If size < count
          size + 8
          ReDim Result(size)
        EndIf
        len = (*End - *Start) / SizeOf(character)
        If Len > 0
          Result(count) = PeekS(*Start, len) 
        EndIf
        *Start = *String + SizeOf(character)
        count + 1
        do = #False
      EndIf
      *String + SizeOf(character)
    Until exit
  EndIf
  ReDim Result(count - 1)
  ProcedureReturn count
EndProcedure

; ----

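; Splits a call such as Name(p1, p2(1,2), "a,b") into its top-level parameters.
; Commas inside nested parentheses or double quotes do not split.
; With fc = #True, the function name is stored as the first element.
Procedure SplitParameterList(String.s, List Result.s(), fc=#False)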
  Protected *String.character 
  Protected *Start, *End, exit, lock, do, len, temp.s, cnt
  Protected level
  
  ClearList(Result())
  *String = @String
  If *String = 0
    ProcedureReturn 0
  EndIf
  *Start = *String
  *End = *String
  
  Repeat
    If *String\c = 0
      exit = #True
      *End = *String
    Else
      If *String\c = '"'
        If Not lock
          lock = #True
        Else
          lock = #False
        EndIf
      EndIf
      If Not lock
        If *String\c = '('
          If level = 0
            If fc ; Get Functionname
              AddElement(Result())
              len = (*String - *Start) / SizeOf(character)
              If Len > 0
                temp = PeekS(*Start, len)
                ReplaceString(temp, #TAB$, " ", #PB_String_InPlace)
                temp = Trim(temp)
                cnt = CountString(temp, " ")
                If cnt
                  temp = StringField(temp, cnt + 1, " ")
                EndIf
                Result() = temp
              EndIf
            EndIf
            *Start = *String + SizeOf(character)
          EndIf
          level + 1
        ElseIf *String\c = ')'
          level - 1
          If level = 0
            do = #True
            exit = #True
            *End = *String
          EndIf
        ElseIf *String\c = ',' And level = 1
          do = #True
          *End = *String
        EndIf
      EndIf
    EndIf
    If do
      AddElement(Result()) 
      len = (*End - *Start) / SizeOf(character)
      If Len > 0
        Result() = Trim(PeekS(*Start, len))
      EndIf
      *Start = *String + SizeOf(character)
      do = #False
    EndIf
    *String + SizeOf(character)
  Until exit
  
  FirstElement(Result())
  ProcedureReturn ListSize(Result())
  
EndProcedure

; ----

Procedure SplitParameterArray(String.s, Array Result.s(1))
  Protected *String.character 
  Protected *Start, *End, exit, lock, do, len, temp.s, cnt, c1
  Protected level
  
  Dim Result(0)
  
  *String = @String
  If *String = 0
    ProcedureReturn 0
  EndIf
  
  *Start = *String
  *End = *String
  c1 = 0
  
  Repeat
    If *String\c = 0
      exit = #True
      *End = *String
    Else
      If *String\c = '"'
        If Not lock
          lock = #True
        Else
          lock = #False
        EndIf
      EndIf
      If Not lock
        If *String\c = '('
          If level = 0
            If #True ; Get Functionname
              len = (*String - *Start) / SizeOf(character)
              If Len > 0
                temp = PeekS(*Start, len)
                ReplaceString(temp, #TAB$, " ", #PB_String_InPlace)
                temp = Trim(temp)
                cnt = CountString(temp, " ")
                If cnt
                  temp = StringField(temp, cnt + 1, " ")
                EndIf
                Result(c1) = temp
              EndIf
            EndIf
            *Start = *String + SizeOf(character)
          EndIf
          level + 1
        ElseIf *String\c = ')'
          level - 1
          If level = 0
            do = #True
            exit = #True
            *End = *String
          EndIf
        ElseIf *String\c = ',' And level = 1
          do = #True
          *End = *String
        EndIf
      EndIf
    EndIf
    If do
      c1 + 1
      If ArraySize(Result()) < c1
        ReDim Result(c1 + 10)
      EndIf
      len = (*End - *Start) / SizeOf(character)
      If Len > 0
        Result(c1) = Trim(PeekS(*Start, len))
      Else
        Result(c1) = ""
      EndIf
      *Start = *String + SizeOf(character)
      do = #False
    EndIf
    *String + SizeOf(character)
  Until exit
  
  ReDim Result(c1)
  ProcedureReturn c1
  
EndProcedure

; ----

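; Collects every substring of String found between Left and Right into Result().
; Stops at the first Left that has no matching Right.
; Returns the number of substrings found.
Procedure StringBetweenList(String.s, Left.s, Right.s, List Result.s())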
  Protected pos1, pos2, len1, len2
  
  ClearList(Result())
  len1 = Len(Left)
  len2 = Len(Right)
  
  Repeat
    pos1 = FindString(String, Left, pos1)
    If pos1
      pos1 + len1
      pos2 = FindString(String, Right, pos1)
      If pos2
        AddElement(Result())
        Result() = Mid(String, pos1, pos2 - pos1)
        pos1 = pos2 + len2
      Else
        Break
      EndIf
    Else
      Break
    EndIf
  ForEver
  ProcedureReturn ListSize(Result())
  
EndProcedure

; ----

Procedure StringBetweenArray(String.s, Left.s, Right.s, Array Result.s(1))
  Protected pos1, pos2, len1, len2, size, count
  
  Dim Result(0)
  len1 = Len(Left)
  len2 = Len(Right)
  
  Repeat
    pos1 = FindString(String, Left, pos1)
    If pos1
      pos1 + len1
      pos2 = FindString(String, Right, pos1)
      If pos2
        If size < count
          size + 8
          ReDim Result(size)
        EndIf
        Result(count) = Mid(String, pos1, pos2 - pos1)
        count + 1
        pos1 = pos2 + len2
      Else
        Break
      EndIf
    Else
      Break
    EndIf
  ForEver
  If count > 0
    ReDim Result(count - 1)
  EndIf  
  ProcedureReturn count
EndProcedure

; ----

; v1.01.0, 25.08.2022

Procedure.s StringBefore(String.s, StringToFind.s, StartPosition = 1, Mode = #PB_String_CaseSensitive)
  Protected r1.s, pos.i
  
  pos = FindString(String, StringToFind, StartPosition, Mode)
  If pos
    r1 = Left(String, pos - 1)
  EndIf
  ProcedureReturn r1
EndProcedure

Procedure.s StringAfter(String.s, StringToFind.s, StartPosition = 1, Mode = #PB_String_CaseSensitive)
  Protected r1.s, pos.i
  
  pos = FindString(String, StringToFind, StartPosition, Mode)
  If pos
    pos + Len(StringToFind)
    r1 = Mid(String, pos)
  EndIf
  ProcedureReturn r1
EndProcedure

; v1.03.3, 30.08.2022

Procedure.s TrimChars(String.s, ReplaceString.s = " ", FindChars.s = " ")
  DisableDebugger
  
  Protected result.s, firstFound, found, size_max, size_replace
  Protected *result, *findChars, *c.Character, *fc.Character, *rc.character, *tc.character, *ofs
  
  If Not Bool(String)
    ProcedureReturn ""
  EndIf
  
  size_max = StringByteLength(String) + SizeOf(Character)
  size_replace = StringByteLength(ReplaceString)
  
  *result = AllocateMemory(size_max + size_replace, #PB_Memory_NoClear)
  *rc = *result
  *c = @String
  *tc = @ReplaceString
  *findChars = @FindChars
  
  While *c\c
    *fc = *findChars
    While *fc\c
      If *c\c = *fc\c
        If Not firstFound
          firstFound = #True
          If *tc\c
            CopyMemory(*tc, *rc, size_replace)
            *rc + size_replace
            If *rc - *result >= size_max
              *ofs = *rc - *result
              size_max = MemorySize(*result)
              *result = ReAllocateMemory(*result, size_max + size_replace, #PB_Memory_NoClear) 
              *rc = *result + *ofs
            EndIf
          EndIf
        EndIf
        found = #True
        Break
      EndIf
      *fc + SizeOf(Character)
    Wend
    
    If Not found
      If firstFound : firstFound = #False : EndIf
      *rc\c = *c\c
      *rc + SizeOf(character)
    Else
      found = #False
    EndIf
    *c + SizeOf(Character)
  Wend
  
  *rc\c = 0
  result = PeekS(*result)
  FreeMemory(*result)
  ProcedureReturn result
  
  EnableDebugger
EndProcedure

; ***************************************************************************************

;-Examples

CompilerIf #PB_Compiler_IsMainFile
  
  Global Dim a1.s(0)
  Global NewList l1.s()
  Global Dim p1.s(0)
  Global NewList r1.s()
  Global text.s, count
  
  text = "0;1x;2xx;'Text with separator (;)';4xxxx;5xxxxx;'Text with linefeed " + #LF$ + "and separator (;)';End"
  text = ReplaceString(text, "'", #DQUOTE$)
  Debug "Text to List = " + text
  Global count = SplitStringList(text, ";", l1(), #True)
  Debug "Count = " + count
  ForEach l1()
    Debug "Index " + ListIndex(l1()) + " = [" + l1() + "]"
  Next
  Debug "--------"
  
  Debug "Text to Array = " + text
  Global count = SplitStringArray(text, ";", a1(), #True)
  Debug "Count = " + count
  For index = 0 To count - 1
    Debug "Index " + index + " = [" + a1(index) + "]"
  Next
  Debug "--------"
  
  text = "MyFunction   (p1, p2, p3(1,2,3) ,p4,'10,20,30',r1,r2,r3(b1(x,y)),r4)"
  text = ReplaceString(text, "'", #DQUOTE$)
  Debug "Parameter to List = " + text
  count = SplitParameterList(text, r1(), #True)
  Debug "Count = " + count
  ForEach r1()
    Debug "Index " + ListIndex(r1()) + " = [" + r1() + "]"
  Next
  Debug "--------"
  
  Debug "Parameter to Array = " + text
  count = SplitParameterArray(text, p1())
  Debug "Count = " + count
  For index = 0 To count
    Debug "Index " + index + " = [" + p1(index) + "]"
  Next
  Debug "--------"
  
  text = "<title>Find me : title</title><title>Hello world!</title> Data 123456789 <title>Catch me</title><title>Error missing End"
  Debug "StringBetweenList = " + text
  count = StringBetweenList(text, "<title>", "</title>", r1())
  Debug "Count = " + count
  ForEach r1()
    Debug "Index " + ListIndex(r1()) + " = [" + r1() + "]"
  Next
  Debug "--------"
  
  Debug "StringBetweenArray = " + text
  ;text = "<title>Find me : title</title>"
  count = StringBetweenArray(text, "<title>", "</title>", p1())
  Debug "Count = " + count
  For index = 0 To count - 1
    Debug "Index " + index + " = [" + p1(index) + "]"
  Next
  Debug "--------"
  
  text = "string_before keyword string_after"
  Debug "StringBefore/After = " + text
  Debug "Before: " + StringBefore(text, " keyword ")
  Debug "After: " + StringAfter(text, " keyword ")
  Debug "--------"
  
  text = "       " + #CRLF$ + #CRLF$ + "    Hello      World!   " + #LFCR$ + #TAB$ + "    "
  Debug "TrimChars(text) = [" + TrimChars(text, " ", #CR$ + #LF$ + #TAB$ + " ") + "]"
  text ="p u r e b a s i c"
  Debug "TrimChars(text, '-') = [" + TrimChars(text, "-") + "]"
  Debug "TrimChars(text, '') = [" + TrimChars(text, "") + "]"
  Debug "--------"
  
CompilerEndIf
Last edited by mk-soft on Fri Dec 15, 2023 12:03 pm, edited 9 times in total.
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
Sicro
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: SplitString to list with option double-quotes (CSV, TXT)

Post by Sicro »

8)

In my opinion, an array is a better fit than a linked list. When I split a large string into many substrings, I want fast access to the individual substrings afterwards.

With an array, you can jump directly to any entry.
With a linked list, every preceding entry has to be traversed before the desired one is reached, which makes access slow.
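
The difference can be illustrated with a minimal sketch. The names fields() and items() are made up for this example and do not appear in the code above; SelectElement() is the built-in list function that moves to a given position.

```purebasic
EnableExplicit

; Hypothetical example data: the same four substrings in an array and in a list
Dim fields.s(3)
NewList items.s()
Define i

For i = 0 To 3
  fields(i) = "field" + Str(i)
  AddElement(items())
  items() = "field" + Str(i)
Next

; Array: the index addresses the entry directly
Debug fields(2)

; List: SelectElement() has to step through the elements to reach position 2
If SelectElement(items(), 2)
  Debug items()
EndIf
```

For purely sequential processing with ForEach, the list is just as convenient; the difference matters mainly for random access.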
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version

Re: SplitString to list with option double-quotes (CSV, TXT)

Post by mk-soft »

That's right.

I have now changed it so that both variants are available, as List or as Array.

Update v1.04
- Added function SplitStringList and SplitStringArray
pdwyer
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Re: SplitString to list or array with option double-quotes (CSV)

Post by pdwyer »

Cheers!

I ran some performance comparisons of the three functions (parseCSV, regex and SplitString) on a large CSV file (27 MB, 50 columns, 100,000 rows).

At first I tested in debug mode, run straight from the compiler, and found that the regex was the fastest. Then I compiled them and tested again, and the regex was the slowest.

Compiled, SplitString was the fastest at about 750 ms, then parseCSV at about 1900 ms, then regex at about 4000 ms.
The regex ran at the same speed in debug mode, while the other two took about 5-6 seconds each.
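
For anyone who wants to repeat such a measurement, a minimal timing sketch could look like the following. It is not the benchmark used above: the test line is synthetic, the built-in StringField() stands in for whichever split function is being measured, and the variable names are assumptions made for this example. Because debug mode skews the results, the output goes to a console instead of Debug, so it also works in a compiled executable.

```purebasic
EnableExplicit

Define line.s, field.s
Define i, n, fieldCount, startTime.q

; Build a synthetic line with 50 columns (assumption: ";" as separator)
For i = 1 To 50
  If i > 1 : line + ";" : EndIf
  line + "col" + Str(i)
Next
fieldCount = CountString(line, ";") + 1

If OpenConsole()
  startTime = ElapsedMilliseconds()
  For n = 1 To 10000
    ; Replace this inner loop with the split function under test
    For i = 1 To fieldCount
      field = StringField(line, i, ";")
    Next
  Next
  PrintN("10000 lines split in " + Str(ElapsedMilliseconds() - startTime) + " ms")
  PrintN("Press Enter to exit")
  Input()
  CloseConsole()
EndIf
```

Running the same executable once with and once without the debugger shows how far the two modes diverge for a given function.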
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein

Re: SplitString to list or array with option double-quotes (CSV)

Post by mk-soft »

Update v1.05
- Update SplitString: Minor performance improvement
- Added SplitParameter: Made public from EventDesigner

Re: SplitString to list or array with option double-quotes (CSV)

Post by mk-soft »

Update v1.06.2
- Added StringBetweenList
- Added StringBetweenArray
Kwai chang caine
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: SplitString to list or array with option double-quotes (CSV)

Post by Kwai chang caine »

This can be useful
Works nice here :wink:
Thanks for sharing 8)
The happiness is a road...
Not a destination
ChrisR
Addict
Posts: 1127
Joined: Sun Jan 08, 2017 10:27 pm
Location: France

Re: SplitString to list or array with option double-quotes (CSV)

Post by ChrisR »

Thanks mk-soft for this SplitString code, it works fine.
If you're interested, or for others who might be, I've slightly modified the two procedures SplitStringList and SplitStringArray (renamed here to SplitStringMList and SplitStringMArray) to accept multi-character delimiters.

Code: Select all

;;-TOP
EnableExplicit

Procedure SplitStringMList(String.s, Separator.s, List Result.s(), DQuote = #False)
  Protected *String.character, *Separator.character
  Protected *Start, *End, exit, lock, do, dq, len, LenSeparator
  
  ClearList(Result())
  *String = @String
  *Separator = @Separator
  If *String\c And *Separator\c
    *Start = *String
    *End = *String
    LenSeparator = Len(Separator)
    If DQuote
      Repeat
        If *String\c = 0
          exit = #True
          do = #True
          If Not dq
            *End = *String
          EndIf
        Else
          If *String\c = '"'
            If Not lock
              lock = #True
              dq = #True
              *Start = *String + SizeOf(character)
            Else
              lock = #False
              *End = *String
            EndIf
          EndIf
          If *String\c = *Separator\c And Not lock
            If LenSeparator = 1 Or (MemoryStringLength(*String) >= LenSeparator And CompareMemoryString(*String, *Separator, #PB_String_CaseSensitive, LenSeparator) = #PB_String_Equal)
              do = #True
              If Not dq
                *End = *String
              EndIf
            EndIf
          EndIf
        EndIf
        If do
          AddElement(Result()) 
          len = (*End - *Start) / SizeOf(character)
          If Len > 0
            Result() = PeekS(*Start, len) 
          EndIf
          *String + LenSeparator * SizeOf(character)
          *Start = *String
          do = #False
          dq = #False
        Else
          *String + SizeOf(character)
        EndIf
      Until exit
    Else
      Repeat
        If *String\c = 0
          exit = #True
          do = #True
          *End = *String
        Else
          If *String\c = *Separator\c
            If LenSeparator = 1 Or (MemoryStringLength(*String) >= LenSeparator And CompareMemoryString(*String, *Separator, #PB_String_CaseSensitive, LenSeparator) = #PB_String_Equal)
              do = #True
              *End = *String
            EndIf
          EndIf
        EndIf
        If do
          AddElement(Result()) 
          len = (*End - *Start) / SizeOf(character)
          If Len > 0
            Result() = PeekS(*Start, len) 
          EndIf
          *String + LenSeparator * SizeOf(character)
          *Start = *String
          do = #False
        Else
          *String + SizeOf(character)
        EndIf
      Until exit
    EndIf
  EndIf
  ProcedureReturn ListSize(Result())
EndProcedure

Procedure SplitStringMArray(String.s, Separator.s, Array Result.s(1), DQuote = #False)
  Protected *String.character, *Separator.character
  Protected *Start, *End, exit, lock, do, dq, len, count, size, LenSeparator
  
  *String = @String
  *Separator = @Separator
  If *String\c And *Separator\c
    size = 7
    Dim Result(size)
    *Start = *String
    *End = *String
    LenSeparator = Len(Separator)
    If DQuote
      Repeat
        If *String\c = 0
          exit = #True
          do = #True
          If Not dq
            *End = *String
          EndIf
        Else
          If *String\c = '"'
            If Not lock
              lock = #True
              dq = #True
              *Start = *String + SizeOf(character)
            Else
              lock = #False
              *End = *String
            EndIf
          EndIf
          If *String\c = *Separator\c And Not lock
            If LenSeparator = 1 Or (MemoryStringLength(*String) >= LenSeparator And CompareMemoryString(*String, *Separator, #PB_String_CaseSensitive, LenSeparator) = #PB_String_Equal)
              do = #True
              If Not dq
                *End = *String
              EndIf
            EndIf
          EndIf
        EndIf
        If do
          If size < count
            size + 8
            ReDim Result(size)
          EndIf
          len = (*End - *Start) / SizeOf(character)
          If Len > 0
            Result(count) = PeekS(*Start, len) 
          EndIf
          *String + LenSeparator * SizeOf(character)
          *Start = *String
          count + 1
          do = #False
          dq = #False
        Else
          *String + SizeOf(character)
        EndIf
      Until exit
    Else
      Repeat
        If *String\c = 0
          exit = #True
          do = #True
          *End = *String
        Else
          If *String\c = *Separator\c
            If LenSeparator = 1 Or (MemoryStringLength(*String) >= LenSeparator And CompareMemoryString(*String, *Separator, #PB_String_CaseSensitive, LenSeparator) = #PB_String_Equal)
              do = #True
              *End = *String
            EndIf
          EndIf
        EndIf
        If do
          If size < count
            size + 8
            ReDim Result(size)
          EndIf
          len = (*End - *Start) / SizeOf(character)
          If Len > 0
            Result(count) = PeekS(*Start, len) 
          EndIf
          *String + LenSeparator * SizeOf(character)
          *Start = *String
          count + 1
          do = #False
        Else
          *String + SizeOf(character)
        EndIf
      Until exit
    EndIf
    ReDim Result(count - 1)
  EndIf
  ProcedureReturn count
EndProcedure

;-Example

CompilerIf #PB_Compiler_IsMainFile
  
  Global Dim a1.s(0)
  Global NewList l1.s()
  Global text.s, Index, count
  
  text.s = "0|-|1x|-|2xx|-|'Text with separator (|-|)'|-|4xxxx|-|5xxxxx|-|'Text with linefeed " + #LF$ + "and separator (|-|)'|-|End"   ; Use 1x|-2xx for testing instead of 1x|-|2xx)
  text = ReplaceString(text, "'", #DQUOTE$)
  Debug "Text to List (multi-characters separator: " +#DQUOTE$+ "|-|" +#DQUOTE$+ ") = " + text
  Global count = SplitStringMList(text, "|-|", l1(), #True)
  Debug "Count = " + count
  ForEach l1()
    Debug "Index " + ListIndex(l1()) + " = [" + l1() + "]"
  Next
  Debug "--------"
  
  text.s = "0|-|1x|-|2xx|-|'Text with separator (|-|)'|-|4xxxx|-|5xxxxx|-|'Text with linefeed " + #LF$ + "and separator (|-|)'|-|End"   ; Use 1x|-2xx for testing instead of 1x|-|2xx)
  text = ReplaceString(text, "'", #DQUOTE$)
  Debug "Text to Array (multi-characters separator: " +#DQUOTE$+ "|-|" +#DQUOTE$+ ") = " + text
  Global count = SplitStringMArray(text, "|-|", a1(), #True)
  Debug "Count = " + count
  For index = 0 To count - 1
    Debug "Index " + index + " = [" + a1(index) + "]"
  Next
  Debug "--------"
  
  text.s = "0;1x;2xx;'Text with separator (;)';4xxxx;5xxxxx;'Text with linefeed " + #LF$ + "and separator (;)';End"
  text = ReplaceString(text, "'", #DQUOTE$)
  Debug "Text to List (single-character separator: " +#DQUOTE$+ ";" +#DQUOTE$+ ") = " + text
  Global count = SplitStringMList(text, ";", l1(), #True)
  Debug "Count = " + count
  ForEach l1()
    Debug "Index " + ListIndex(l1()) + " = [" + l1() + "]"
  Next
  Debug "--------"
  
  text.s = "0;1x;2xx;'Text with separator (;)';4xxxx;5xxxxx;'Text with linefeed " + #LF$ + "and separator (;)';End"
  text = ReplaceString(text, "'", #DQUOTE$)
  Debug "Text to Array (single-character separator: " +#DQUOTE$+ ";" +#DQUOTE$+ ") = " + text
  Global count = SplitStringMArray(text, ";", a1(), #True)
  Debug "Count = " + count
  For index = 0 To count - 1
    Debug "Index " + index + " = [" + a1(index) + "]"
  Next
  Debug "--------"
  
CompilerEndIf
Edit: see mk-soft's answer below
Last edited by ChrisR on Thu Nov 10, 2022 12:42 pm, edited 2 times in total.

Re: SplitString to list or array with option double-quotes (CSV)

Post by mk-soft »

@ChrisR

Small bug: you must set the length parameter of CompareMemoryString to -1. Otherwise it can lead to a memory access error, because the comparison continues past the terminating null character.
If the length is set to -1, the comparison stops at the null character. See the PB help ;)

Re: SplitString to list or array with option double-quotes (CSV)

Post by ChrisR »

Yes, you're right. This happens when fewer characters remain in *String than the separator length.
I fixed it in the code above. Thanks.
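
The guard can be looked at in isolation. The sketch below is a minimal illustration, not part of the posted procedures: the helper name StartsWithSeparator and the test strings are made up for this example. It checks that enough characters remain before doing the fixed-length comparison, so CompareMemoryString() is never asked to compare beyond the terminating null character.

```purebasic
EnableExplicit

; Hypothetical helper: returns #True if the text at *Text begins with the separator
Procedure StartsWithSeparator(*Text, *Separator)
  Protected lenSeparator = MemoryStringLength(*Separator)
  
  ; Fewer characters left than the separator needs: no match, and no comparison
  If MemoryStringLength(*Text) < lenSeparator
    ProcedureReturn #False
  EndIf
  ProcedureReturn Bool(CompareMemoryString(*Text, *Separator, #PB_String_CaseSensitive, lenSeparator) = #PB_String_Equal)
EndProcedure

Define separator.s = "|-|"
Define complete.s = "|-|End"   ; full separator followed by more text
Define truncated.s = "|-"      ; string ends in the middle of a separator

Debug StartsWithSeparator(@complete, @separator)
Debug StartsWithSeparator(@truncated, @separator)
```

The first call reports a match; the second returns #False without running the comparison at all, which is the case that could otherwise read past the end of the string.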

Re: SplitString to list or array with option double-quotes (CSV)

Post by ChrisR »

Hello mk-soft,
Another small change and improvement: I do not really agree with how quotes (") and single quotes (') are currently handled (ReplaceString(text, "'", #DQUOTE$)).
I mean that a string starting with a quote (") should end with a quote ("), and any apostrophes (') between the two are kept as ordinary characters.
And vice versa: a string starting with an apostrophe (') should end with an apostrophe ('), with any quotes (") between the two kept as ordinary characters.
Up to you.

Code: Select all

;-TOP
EnableExplicit

Procedure SplitStringMList(String.s, Separator.s, List Result.s(), DQuote = #False)
  Protected *String.character, *Separator.character
  Protected *Start, *End, exit, lock, do, dq, dsq, len, LenSeparator
  
  ClearList(Result())
  *String = @String
  *Separator = @Separator
  If *String\c And *Separator\c
    *Start = *String
    *End = *String
    LenSeparator = MemoryStringLength(*Separator)
    If DQuote
      Repeat
        If *String\c = 0
          exit = #True
          do = #True
          If Not dq And Not dsq
            *End = *String
          EndIf
        Else
          If *String\c = '"' And Not dsq
            If Not lock
              lock = #True
              dq = #True
              *Start = *String + SizeOf(character)
            Else
              lock = #False
              *End = *String
            EndIf
          ElseIf *String\c = 39 And Not dq   ; Single quote '
            If Not lock
              lock = #True
              dsq = #True
              *Start = *String + SizeOf(character)
            Else
              lock = #False
              *End = *String
            EndIf
          EndIf
          If *String\c = *Separator\c And Not lock
            If LenSeparator = 1 Or (MemoryStringLength(*String) >= LenSeparator And CompareMemoryString(*String, *Separator, #PB_String_CaseSensitive, LenSeparator) = #PB_String_Equal)
              do = #True
              If Not dq And Not dsq
                *End = *String
              EndIf
            EndIf
          EndIf
        EndIf
        If do
          AddElement(Result()) 
          len = (*End - *Start) / SizeOf(character)
          If Len > 0
            Result() = PeekS(*Start, len) 
          EndIf
          *String + LenSeparator * SizeOf(character)
          *Start = *String
          do  = #False
          dq  = #False
          dsq = #False
        Else
          *String + SizeOf(character)
        EndIf
      Until exit
    Else
      Repeat
        If *String\c = 0
          exit = #True
          do = #True
          *End = *String
        Else
          If *String\c = *Separator\c
            If LenSeparator = 1 Or (MemoryStringLength(*String) >= LenSeparator And CompareMemoryString(*String, *Separator, #PB_String_CaseSensitive, LenSeparator) = #PB_String_Equal)
              do = #True
              *End = *String
            EndIf
          EndIf
        EndIf
        If do
          AddElement(Result()) 
          len = (*End - *Start) / SizeOf(character)
          If Len > 0
            Result() = PeekS(*Start, len) 
          EndIf
          *String + LenSeparator * SizeOf(character)
          *Start = *String
          do = #False
        Else
          *String + SizeOf(character)
        EndIf
      Until exit
    EndIf
  EndIf
  ProcedureReturn ListSize(Result())
EndProcedure

Procedure SplitStringMArray(String.s, Separator.s, Array Result.s(1), DQuote = #False)
  Protected *String.character, *Separator.character
  Protected *Start, *End, exit, lock, do, dq, dsq, len, count, size, LenSeparator
  
  *String = @String
  *Separator = @Separator
  If *String\c And *Separator\c
    size = 7
    Dim Result(size)
    *Start = *String
    *End = *String
    LenSeparator = MemoryStringLength(*Separator)
    If DQuote
      Repeat
        If *String\c = 0
          exit = #True
          do = #True
          If Not dq And Not dsq
            *End = *String
          EndIf
        Else
          If *String\c = '"'  And Not dsq
            If Not lock
              lock = #True
              dq = #True
              *Start = *String + SizeOf(character)
            Else
              lock = #False
              *End = *String
            EndIf
          ElseIf *String\c = 39  And Not dq    ; Single quote '
            If Not lock
              lock = #True
              dsq = #True
              *Start = *String + SizeOf(character)
            Else
              lock = #False
              *End = *String
            EndIf
          EndIf
          If *String\c = *Separator\c And Not lock
            If LenSeparator = 1 Or (MemoryStringLength(*String) >= LenSeparator And CompareMemoryString(*String, *Separator, #PB_String_CaseSensitive, LenSeparator) = #PB_String_Equal)
              do = #True
              If Not dq And Not dsq
                *End = *String
              EndIf
            EndIf
          EndIf
        EndIf
        If do
          If size < count
            size + 8
            ReDim Result(size)
          EndIf
          len = (*End - *Start) / SizeOf(character)
          If len > 0
            Result(count) = PeekS(*Start, len) 
          EndIf
          *String + LenSeparator * SizeOf(character)
          *Start = *String
          count + 1
          do = #False
          dq = #False
          dsq = #False
        Else
          *String + SizeOf(character)
        EndIf
      Until exit
    Else
      Repeat
        If *String\c = 0
          exit = #True
          do = #True
          *End = *String
        Else
          If *String\c = *Separator\c
            If LenSeparator = 1 Or (MemoryStringLength(*String) >= LenSeparator And CompareMemoryString(*String, *Separator, #PB_String_CaseSensitive, LenSeparator) = #PB_String_Equal)
              do = #True
              *End = *String
            EndIf
          EndIf
        EndIf
        If do
          If size < count
            size + 8
            ReDim Result(size)
          EndIf
          len = (*End - *Start) / SizeOf(character)
          If len > 0
            Result(count) = PeekS(*Start, len) 
          EndIf
          *String + LenSeparator * SizeOf(character)
          *Start = *String
          count + 1
          do = #False
        Else
          *String + SizeOf(character)
        EndIf
      Until exit
    EndIf
    ReDim Result(count - 1)
  EndIf
  ProcedureReturn count
EndProcedure

;-Example

CompilerIf #PB_Compiler_IsMainFile
  
  Global Dim a1.s(0)
  Global NewList l1.s()
  Global text.s, Index, count
  
  text.s = "0|-|1x|-|2xx|-|'Text with separator " +#DQUOTE$+ "|-|" +#DQUOTE$+ "'|-|4xxxx|-|5xxxxx|-|" +#DQUOTE$+ "Text With linefeed " + #LF$ + "And separator '|-|'" +#DQUOTE$+ "|-|End|-"
  ;text.s = "1x|-|'Text with separator " +#DQUOTE$+ "|-|" +#DQUOTE$+ "'|-|2xx|-"
  Debug "Text to List (multi-characters separator: " +#DQUOTE$+ "|-|" +#DQUOTE$+ ") = " + text
  count = SplitStringMList(text, "|-|", l1(), #True)   ; count is already declared Global above
  Debug "Count = " + count
  ForEach l1()
    Debug "Index " + ListIndex(l1()) + " = [" + l1() + "]"
  Next
  Debug "--------"
  
  text.s = "0|-|1x|-|2xx|-|'Text with separator " +#DQUOTE$+ "|-|" +#DQUOTE$+ "'|-|4xxxx|-|5xxxxx|-|" +#DQUOTE$+ "Text With linefeed " + #LF$ + "And separator '|-|'" +#DQUOTE$+ "|-|End|-"
  ;text.s = "1x|-|" +#DQUOTE$+ "Text with separator '|-|'" +#DQUOTE$+ "|-|2xx|-"
  Debug "Text to Array (multi-characters separator: " +#DQUOTE$+ "|-|" +#DQUOTE$+ ") = " + text
  count = SplitStringMArray(text, "|-|", a1(), #True)   ; count is already declared Global above
  Debug "Count = " + count
  For index = 0 To count - 1
    Debug "Index " + index + " = [" + a1(index) + "]"
  Next
  Debug "--------"
  
  ; text.s = "0;1x;2xx;'Text with separator " +#DQUOTE$+ ";" +#DQUOTE$+ "';4xxxx;5xxxxx;" +#DQUOTE$+ "Text With linefeed " + #LF$ + "And separator ';'" +#DQUOTE$+ ";End"
  ; Debug "Text to List (single-character separator: " +#DQUOTE$+ ";" +#DQUOTE$+ ") = " + text
  ; Global count = SplitStringMList(text, ";", l1(), #True)
  ; Debug "Count = " + count
  ; ForEach l1()
  ;   Debug "Index " + ListIndex(l1()) + " = [" + l1() + "]"
  ; Next
  ; Debug "--------"
  ; 
  ; text.s = "0;1x;2xx;'Text with separator " +#DQUOTE$+ ";" +#DQUOTE$+ "';4xxxx;5xxxxx;" +#DQUOTE$+ "Text With linefeed " + #LF$ + "And separator ';'" +#DQUOTE$+ ";End"
  ; Debug "Text to Array (single-character separator: " +#DQUOTE$+ ";" +#DQUOTE$+ ") = " + text
  ; Global count = SplitStringMArray(text, ";", a1(), #True)
  ; Debug "Count = " + count
  ; For index = 0 To count - 1
  ;   Debug "Index " + index + " = [" + a1(index) + "]"
  ; Next
  ; Debug "--------"
  
CompilerEndIf
==>
text.s = "1x;'Text with separator " +#Dquote$+ ";" +#Dquote$+ "';2xx"
==> with ReplaceString(text, "'", #Dquote$)
Text to List = 1x;"Text with separator ";"";2xx
Count = 4 - Index0 = [1x] - Index1 = [Text with separator ] - Index2 = [] - Index3 = [2xx]
==> should be
Text to List = 1x;'Text with separator ";"';2xx
Count = 3 - Index0 = [1x] - Index1 = [Text with separator ";"] - Index2 = [2xx]
text.s = "1x;" +#Dquote$+ "Text with separator ';'" +#Dquote$+ ";2xx"
==> with ReplaceString(text, "'", #Dquote$)
Text to Array = 1x;"Text with separator ";"";2xx
Count = 4 - Index0 = [1x] - Index1 = [Text with separator ] - Index2 = [] - Index3 = [2xx]
==> should be
Text to Array = 1x;"Text with separator ';'";2xx
Count = 3 - Index0 = [1x] - Index1 = [Text with separator ';'] - Index2 = [2xx]

Re: SplitString to list or array with option double-quotes (CSV)

Post by mk-soft »

That is not the norm. Double quotes are always used to enclose a field when separators can occur inside that field. This applies to all delimited text data files.
ChrisR
Addict
Posts: 1127
Joined: Sun Jan 08, 2017 10:27 pm
Location: France

Re: SplitString to list or array with option double-quotes (CSV)

Post by ChrisR »

I get it, no worries, and sorry for the inconvenience; I don't want to spam your thread. On first reading I did not understand why you use the following (I get it now: it works around the PB string syntax):

Code: Select all

text.s = "0;1x;2xx;'Text with separator (;)';4xxxx;5xxxxx;'Text with linefeed " + #LF$ + "and separator (;)';End"
text = ReplaceString(text, "'", #DQUOTE$)

I agree that in most cases double quotes are used, and that is what I usually use.
But personally, I prefer to be able to mix the two; it does not change the behavior when only double quotes are used.
So I'll keep that version to myself, in case I need it.

Single quotes are sometimes used too, as in the AutoIt3 help file, for example:
AutoIt3 help file wrote:
Strings
Strings are enclosed in double-quotes like "this". If you want a string to actually contain a double-quote use it twice like:
- "here is a ""double-quote"" - ok?"

You can also use single-quotes like 'this' and 'here is a ' 'single-quote' ' - ok?'

You can mix quote types to make for easier working and to avoid having to double-up your quotes to get what you want. For example if you want to use a lot of double-quotes in your strings then you should use single-quotes for declaring them:
- 'This "sentence" contains "lots" of "double-quotes" does it not?'

is much simpler than:
- "This ""sentence"" contains ""lots"" of ""double-quotes"" does it not?"
Quin
Enthusiast
Posts: 283
Joined: Thu Mar 31, 2022 7:03 pm
Location: United States

Re: SplitString to list or array with option double-quotes (CSV)

Post by Quin »

Is there a version of this that supports regular expressions, or at the very least multi/partial delimiters? I want to split at line breaks, so \r\n?\r?\n, or, if we're using multi-part delimiters, \r\n would match \r, \n, and \r\n. I can't seem to do that with this function without calling it multiple times in just the right order. Is there another way?
Thanks!
PB v5.40/6.10, Windows 10 64-bit.
16-core AMD Ryzen 9 5950X, 128 GB DDR5.
AZJIO
Addict
Posts: 1318
Joined: Sun May 14, 2017 1:48 am

Re: SplitString to list or array with option double-quotes (CSV)

Post by AZJIO »

Quin wrote: Sat Mar 16, 2024 4:01 am
multi/partial delimiters
SplitL2, SplitA2
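
For the specific line-break case asked about, one possible approach that needs only built-in string functions is to normalise every line ending to a single LF first and then split on LF. The sketch below is independent of the SplitL2 / SplitA2 functions named above (their code is not shown here); SplitLines() is a hypothetical helper name, and it assumes that changing CRLF and lone CR to LF in a copy of the text is acceptable.

```purebasic
; Sketch only: SplitLines() is a hypothetical helper, not part of the code posted in this thread.
; It treats CRLF, a lone CR and a lone LF all as one line break.
Procedure SplitLines(Text.s, List Result.s())
  Protected i, n
  
  ClearList(Result())
  ; Replace CRLF first, then any remaining lone CR, so each line break becomes exactly one LF
  Text = ReplaceString(Text, #CRLF$, #LF$)
  Text = ReplaceString(Text, #CR$, #LF$)
  ; Number of fields = number of separators + 1
  n = CountString(Text, #LF$) + 1
  For i = 1 To n
    AddElement(Result())
    Result() = StringField(Text, i, #LF$)
  Next
  ProcedureReturn ListSize(Result())
EndProcedure

; Usage: mixed line endings in one string
NewList lines.s()
SplitLines("a" + #CRLF$ + "b" + #CR$ + "c" + #LF$ + "d", lines())
ForEach lines()
  Debug "[" + lines() + "]"
Next
```

Replacing CRLF before lone CR matters: the other order would turn each CRLF into two line breaks. A trailing line break yields one final empty element, which the caller can drop if it is not wanted.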