[All Platforms] - String Tokeniser (OOP Paradigm)

Post by Env »

Hey again,

Here's a little piece of code that gives you a String Tokeniser (Tokenizer for our non-Brits) behind an OOP-style interface.

Because it supports multiple delimiters, this 'Object' or 'Class' offers something a little more advanced than StringField().
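
For comparison, here is a minimal sketch of the limitation being addressed: StringField() splits on one delimiter only, so any other separator stays inside the field. The sample string is made up for illustration.

```purebasic
; StringField() splits on a single delimiter string.
Define.s Sample = "alpha,beta;gamma"

Debug StringField(Sample, 1, ",") ; alpha
Debug StringField(Sample, 2, ",") ; beta;gamma  (the ";" is not treated as a separator)
```

With the tokeniser, both "," and ";" can be registered as delimiters at once.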

Instantiation
  • IStringTokeniser_New() - Create a new instance of the tokeniser.
  • IStringTokeniser_ReleaseAll() - Release ALL instances of the tokeniser.
Methods
  • AddDelimiter(Delimiter$) - Add a delimiter to the list. Each character of Delimiter$ is added as a separate delimiter, so one call can add several.
  • RemoveDelimiter(Delimiter$) - Remove a delimiter from the list. Each character of Delimiter$ is removed separately, so one call can remove several.
  • IsDelimiter(Delimiter$) - Check whether Delimiter$ (a single character) is a defined delimiter.
  • ParseString(String$) - Parse a string through the tokeniser. Returns the number of tokens found.
  • FirstToken() - Get the first token found. Returns an empty string if no tokens were found.
  • NextToken() - Get the next token found. Returns an empty string if no more tokens exist.
  • TokenPosition() - Get the starting position (1-based) of the current token in the parsed string. Returns -1 if there is no current token.
  • Release() - Release this instance of the tokeniser.
Code (Contains Usage Example)

; ----------------------------------------------------------------------------------------------------
; Title:        String Tokeniser (OOP Interface)
; Description:  Interface that provides a String Tokeniser.
; Author(s):    Michael R. King (mrking2910@gmail.com)
; Revision:     1
; Support:      Cross-Platform
;
; Notes:        Instances stored in a Linked List. Use *Object\Release() to free the allocation.
;
; ----------------------------------------------------------------------------------------------------

EnableExplicit

CompilerIf Defined(_PBI_ISTRINGTOKENISER_, #PB_Constant) = #False
  #_PBI_ISTRINGTOKENISER_ = #True
  
  ; - Interface Definition -
  Interface IStringTokeniser
    AddDelimiter(Delimiter$)      ; - Add a delimiter to the list. You can specify multiple delimiters per single call of this method.
    RemoveDelimiter(Delimiter$)   ; - Remove a delimiter from the list. You can specify multiple delimiters per single call of this method.
    IsDelimiter.a(Delimiter$)     ; - Check if the passed Delimiter$ (Single Character) is a defined delimiter.
    ParseString.l(String$)        ; - Parse a string through the tokeniser.  Returns number of tokens found.
    FirstToken.s()                ; - Get the first found token.  Returns an empty string if no tokens found.
    NextToken.s()                 ; - Get the next found token.  Returns an empty string if no more tokens exist.
    TokenPosition.l()             ; - Get the starting position in the parsed string of the current token.
    Release()                     ; - Release this instance of the tokeniser.
  EndInterface
  
  ; - Structures -
  Structure __IStringTokeniser_Token
    m_token.s
    m_location.l
  EndStructure
  
  ; - Instance Data -
  Structure __IStringTokeniser_Instance
    *m_funcTable
    *m_cToken.__IStringTokeniser_Token
    List m_delimiter.s()
    List m_token.__IStringTokeniser_Token()
  EndStructure
  
  ; - Instantiation List -
  Global NewList __IStringTokeniser_InstanceList.__IStringTokeniser_Instance()
  
  ; - Instantiation -
  Procedure.i IStringTokeniser_New()
    Protected *obj.__IStringTokeniser_Instance = AddElement(__IStringTokeniser_InstanceList())
    With *obj
      \m_funcTable = ?__IStringTokeniser_FuncTable
      \m_cToken = #Null
    EndWith
    ProcedureReturn *obj
  EndProcedure
  
  Procedure IStringTokeniser_ReleaseAll()
    ForEach __IStringTokeniser_InstanceList()
      With __IStringTokeniser_InstanceList()
        FreeList(\m_delimiter())
        FreeList(\m_token())
      EndWith
    Next
    ClearList(__IStringTokeniser_InstanceList())
  EndProcedure
  
  Procedure __IStringTokeniser_Release(*obj.__IStringTokeniser_Instance)
    ; - Find this instance in the global list, free its lists and remove it.
    ForEach __IStringTokeniser_InstanceList()
      If @__IStringTokeniser_InstanceList() = *obj
        With *obj
          FreeList(\m_delimiter())
          FreeList(\m_token())
          DeleteElement(__IStringTokeniser_InstanceList())
        EndWith
        Break ; - Each instance appears only once, so stop searching.
      EndIf
    Next
  EndProcedure
  
  ; - Object Methods -
  Procedure __IStringTokeniser_AddDelimiter(*obj.__IStringTokeniser_Instance, Delimiter$)
    Protected *this.IStringTokeniser = *obj
    Protected dCnt.l, dIx.l, d.s
    With *obj
      dCnt = Len(Delimiter$)
      If dCnt > 0
        If dCnt > 1
          For dIx = 1 To dCnt
            d = Mid(Delimiter$, dIx, 1)
            *this\AddDelimiter(d)
          Next
        Else
          If *this\IsDelimiter(Delimiter$) = #False
            AddElement(\m_delimiter())
            \m_delimiter() = Delimiter$
          EndIf
        EndIf
      EndIf
    EndWith
  EndProcedure
  
  Procedure __IStringTokeniser_RemoveDelimiter(*obj.__IStringTokeniser_Instance, Delimiter$)
    Protected *this.IStringTokeniser = *obj
    Protected dCnt.l, dIx.l, d.s
    With *obj
      dCnt = Len(Delimiter$)
      If dCnt > 0
        If dCnt > 1
          For dIx = 1 To dCnt
            d = Mid(Delimiter$, dIx, 1)
            *this\RemoveDelimiter(d)
          Next
        Else
          ForEach \m_delimiter()
            If \m_delimiter() = Delimiter$
              DeleteElement(\m_delimiter())
              ProcedureReturn
            EndIf
          Next
        EndIf
      EndIf
    EndWith
  EndProcedure
  
  Procedure.a __IStringTokeniser_IsDelimiter(*obj.__IStringTokeniser_Instance, Delimiter$)
    Protected *this.IStringTokeniser = *obj
    With *obj
      ForEach \m_delimiter()
        If \m_delimiter() = Delimiter$
          ProcedureReturn #True
        EndIf
      Next
    EndWith
    ProcedureReturn #False
  EndProcedure
  
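  Procedure.l __IStringTokeniser_ParseString(*obj.__IStringTokeniser_Instance, String$)
    Protected *this.IStringTokeniser = *obj
    Protected cCnt.l, cIx.l, c.s, t.s, tLoc.l
    tLoc = -1
    With *obj
      ClearList(\m_token())
      \m_cToken = #Null ; - Previous tokens are gone, so drop the stale current-token pointer.
      cCnt = Len(String$)
      ; - Run one position past the end so the final token gets flushed.
      For cIx = 1 To cCnt + 1
        c = Mid(String$, cIx, 1)
        If *this\IsDelimiter(c) Or cIx = (cCnt + 1)
          ; - Delimiter or end of string: store the pending token, if any.
          If Len(t) > 0
            AddElement(\m_token())
            \m_token()\m_token = t
            \m_token()\m_location = tLoc
          EndIf
          t = ""
          tLoc = -1
        Else
          ; - Ordinary character: remember where the token starts (1-based) and append it.
          If tLoc = -1
            tLoc = cIx
          EndIf
          t = t + c
        EndIf
      Next
      ProcedureReturn ListSize(\m_token())
    EndWith
  EndProcedure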
  
  Procedure.s __IStringTokeniser_FirstToken(*obj.__IStringTokeniser_Instance)
    Protected *this.IStringTokeniser = *obj
    With *obj
      If FirstElement(\m_token())
        \m_cToken = @\m_token()
        ProcedureReturn \m_cToken\m_token
      Else
        \m_cToken = #Null
      EndIf
      ProcedureReturn ""
    EndWith
  EndProcedure
  
  Procedure.s __IStringTokeniser_NextToken(*obj.__IStringTokeniser_Instance)
    Protected *this.IStringTokeniser = *obj
    With *obj
      If NextElement(\m_token())
        \m_cToken = @\m_token()
        ProcedureReturn \m_cToken\m_token
      Else
        \m_cToken = #Null
      EndIf
      ProcedureReturn ""
    EndWith
  EndProcedure
  
  Procedure.l __IStringTokeniser_TokenPosition(*obj.__IStringTokeniser_Instance)
    Protected *this.IStringTokeniser = *obj
    With *obj
      If \m_cToken <> #Null
        ProcedureReturn \m_cToken\m_location
      EndIf
      ProcedureReturn -1
    EndWith
  EndProcedure
  
  ; - Method Table -
  DataSection
    __IStringTokeniser_FuncTable:
    Data.i @__IStringTokeniser_AddDelimiter()
    Data.i @__IStringTokeniser_RemoveDelimiter()
    Data.i @__IStringTokeniser_IsDelimiter()
    Data.i @__IStringTokeniser_ParseString()
    Data.i @__IStringTokeniser_FirstToken()
    Data.i @__IStringTokeniser_NextToken()
    Data.i @__IStringTokeniser_TokenPosition()
    Data.i @__IStringTokeniser_Release()
  EndDataSection
  
CompilerEndIf ;_PBI_ISTRINGTOKENISER_

; -------------------------------------------------------------------------------------
; - DEMONSTRATION CODE - DEMONSTRATION CODE - DEMONSTRATION CODE - DEMONSTRATION CODE -
; -------------------------------------------------------------------------------------

; - Define Test String -
Define.s TestString = "Hello World. This is a test string!"

; - Configure Tokeniser -
Define *Tokeniser.IStringTokeniser = IStringTokeniser_New()
*Tokeniser\AddDelimiter(" ")

; - Parse String -

Debug "Parsing String: '" + TestString + "'."
*Tokeniser\ParseString(TestString)

; - Output Results -
Define.s Token = *Tokeniser\FirstToken()
While Token
  
  Debug "Token: '" + Token + "' found at: [" + Str(*Tokeniser\TokenPosition()) + "]."
  Token = *Tokeniser\NextToken()
  
Wend

; - Release Tokeniser -
*Tokeniser\Release()

I encourage you to rip this apart and offer suggestions, advice and constructive criticism. After all, a better solution to the problem is a better solution. ;)

Thanks :D


Alternatively, a Procedural version of this tokeniser can be found Here

Note - With OOP being a pretty taboo thing in the land of PureBasic, it is entirely your choice to use this code. Please don't open a debate about To OOP or Not To OOP. :P
Thanks!

Post by buddymatkona »

I like it. :)
A slightly different approach in the spirit of OOP: return an object, which in this case could be just a PB List.

*Tokeniser\Get_Tokens(TestString$,Separator$,MyTokenList.s())
ForEach MyTokenList() : MessageRequester("Token ", MyTokenList()): Next

Post by Env »

buddymatkona wrote: I like it. :)
A slightly different approach in the spirit of OOP: return an object, which in this case could be just a PB List.

*Tokeniser\Get_Tokens(TestString$,Separator$,MyTokenList.s())
ForEach MyTokenList() : MessageRequester("Token ", MyTokenList()): Next

For something that simple, I'd recommend the procedural version of this tokeniser: http://purebasic.fr/english/viewtopic.php?f=12&t=48440...

It offers the same functionality, but instead of an OOP-style interface you call a single function that writes the resulting tokens into a list for you.
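
To give a rough idea of what a single-function tokeniser can look like, here is a minimal sketch. It is not the code from the linked thread: the name TokeniseToList and its parameters are made up for illustration, and it assumes every character of Delimiters$ counts as a separator.

```purebasic
EnableExplicit

; Hypothetical helper (not the code from the linked thread): splits String$ on
; any character found in Delimiters$ and stores the tokens in Tokens().
; Returns the number of tokens found.
Procedure.i TokeniseToList(String$, Delimiters$, List Tokens.s())
  Protected cIx.i, cCnt.i = Len(String$)
  Protected c.s, t.s
  ClearList(Tokens())
  For cIx = 1 To cCnt
    c = Mid(String$, cIx, 1)
    If FindString(Delimiters$, c, 1)
      ; Delimiter: flush the pending token, skipping empty ones.
      If t <> ""
        AddElement(Tokens())
        Tokens() = t
        t = ""
      EndIf
    Else
      t = t + c
    EndIf
  Next
  ; Flush the final token if the string did not end on a delimiter.
  If t <> ""
    AddElement(Tokens())
    Tokens() = t
  EndIf
  ProcedureReturn ListSize(Tokens())
EndProcedure

; Usage: split on spaces and full stops.
NewList MyTokenList.s()
If TokeniseToList("Hello World. This is a test string!", " .", MyTokenList())
  ForEach MyTokenList()
    Debug MyTokenList()
  Next
EndIf
```

The caller owns the list here, so there is nothing to release afterwards. The trade-off is that token positions are not tracked in this sketch.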
Thanks!