Page 1 of 1

Parsing issues

Posted: Thu Feb 20, 2020 9:55 am
by kinglestat
I am trying to write an interpreter, but I am stuck on parsing. The idea of the code is to be compatible with Spiderbasic, which cannot use pointers and has a limited command set compares to purebasic. I have included the my code. I can't seem to get the parsing for nested brackets. I would appreciate some help.

Code: Select all


Enumeration
   #VAR_UNKNOWN
   #VAR_INTEGER
   #VAR_FLOAT
   #VAR_STRING
   #VAR_VARIANT
   #VAR_BRACKETS
EndEnumeration

Enumeration
   #REG_STRING
   #REG_BRACKET
EndEnumeration

Structure      stType
   name.s
   type.i
   i.i
   f.f
   s.s
EndStructure

Structure      stToken
   text.s
   type.i
   level.i
EndStructure

Structure      stLevel
   original.s
   level.i
   *p.stLevel
EndStructure

Global NewMap        mapToken.stToken()

Global               gTokens.i



Procedure            Evaluate( line.s )



EndProcedure

Procedure            ProcessBracket( string.s, List ll.stLevel() )
   Protected.s       temp, key
   Protected         i, j, k, level, total, t
   Protected Dim     res.s(0)
   Protected         *p.stLevel

   Debug #PB_Procedure

   total = ExtractRegularExpression( #REG_BRACKET, string, res() )
   
   If total
      If ListSize( ll() )
         *p    = ll()
         level = ll()\level
      EndIf
   
      AddElement( ll() )
         ll()\level     = level + 1
         ll()\original  = string
         ll()\p         = *p
   
      For i = 0 To total - 1
         gTokens + 1
         key = "$$Bracket" + Str( gTokens )
         
         AddMapElement( mapToken(), key, #PB_Map_NoElementCheck )
            mapToken()\text = res( i )
            mapToken()\type = #VAR_BRACKETS
            mapToken()\level= ll()\level
         
            Debug "Level: " + Str( ll()\level )
            Debug ll()\p\original + " , " + res( i )
            
            ;ll()\original     = ReplaceString( ll()\original, res( i ), key )
            ll()\p\original   = ReplaceString( ll()\p\original, res( i ), key )
            
            ProcessBracket( Mid( res( i ), 2 ), ll() )
               ;ll()\p\original   = ReplaceString( ll()\p\original, res( i ), key )
               ;res( i )          = key
            ;Else
            ;   PreviousElement( ll() )
            ;   Debug "No change"
            ;EndIf
      Next
   EndIf

   ProcedureReturn total
EndProcedure

Procedure.s          SplitSections( line.s )
   Protected.s       temp, key, string
   Protected         i, j, k, level, total, t
   Protected Dim     res.s(0)
   Protected NewList ll.stLevel()
   Protected         *p.stLevel
   
   total = ExtractRegularExpression( #REG_STRING, line, res() )
   
   If total
      For i = 0 To total - 1
         gTokens + 1
         key = "$$String" + Str( gTokens )
         
         AddMapElement( mapToken(), key, #PB_Map_NoElementCheck )
         mapToken()\text = res( i )
         mapToken()\type = #VAR_STRING
         line = ReplaceString( line, res(i), key )
      Next
   EndIf
   
   If FindString( line, Chr(34) )
      ; Syntax error; mismatched "
      ProcedureReturn "-1"
   EndIf
   
   AddElement( ll() )
      ll()\level = 0
      ll()\original = line

   If ProcessBracket( line, ll() )
      FirstElement( ll() )
      string = ll()\original
   EndIf

   
   ProcedureReturn string
   
EndProcedure

Procedure            Parse( line.s )
 
   Protected         pos.i
   Protected NewList llToken.stToken()

   ;Debug "1 " + Chr(34) + line + Chr(34)

   pos = Len( line )
   line = Left( line, pos - 1 )
   pos = 1
   
   ;Debug "2 " + Chr(34) + line + Chr(34)
   
   Debug "1." + line
   line = SplitSections( line )
   Debug "2." + line
   

EndProcedure


Procedure            Main()

   Protected         i
   Protected         pos
   Protected.s       temp, line
   
   ;If Not CreateRegularExpression( #REG_STRING, Chr(34) + "(.*?)" + Chr( 34 ) )
   If Not CreateRegularExpression( #REG_STRING, "'(.*?)'" )
      Debug RegularExpressionError()
   EndIf

   ;If Not CreateRegularExpression( #REG_BRACKET, "(1(?:\1??[^1]*?2))+" )
   If Not CreateRegularExpression( #REG_BRACKET, "\(((?>[^()]+)|(?R))*\)" )
      Debug RegularExpressionError()
   EndIf
   
   Restore program
   
   Repeat
      Read.s temp
      If temp = "--" : Break : EndIf
      
      Line + " " + temp
      
      If Right( temp, 1 ) = ";"
         parse( Trim( line ) )
         line = ""
      EndIf
   ForEver

EndProcedure


Main()

DataSection

program:
Data.s      "a=5"
Data.s      "b=6,c =7;"
Data.s      "d=a+b+c;"
Data.s      "('testing') 'hello' (cornflakes) ((a+(b-c))-(h+j));"
Data.s      "--"
Data.s      "(varA=Hello varB=World c=4) 6>>1 > c ? (varA+varB) : (varB+varA)"

;
; Language
; (const) ((default)int/float/string/variant)   <var>=value;
; while (condition ) / wend
;
EndDataSection

Re: Parsing issues

Posted: Thu Feb 20, 2020 5:15 pm
by Tenaja
I haven't looked at your code, but to handle nested brackets, it needs to be recursive descent. (Essentially, a bracket restarts the parse recursively with everything staying on the stack). Every "level" of priority starts a new function.

There's a million examples online, but Crenshaw has an excellent tutorial "how to build a compiler", which is an easy search. I successfully made a compiler of my own using his tutorial.

Re: Parsing issues

Posted: Fri Feb 21, 2020 7:10 pm
by srod
Yes when you actually get into it, a recursive descent parser can be quite simple and very very effective. Far more flexible than reverse-polish notation, for example. You do need to lay your data structures out very carefully though and have them set up to allow for recursion, but once you get it all mapped out, the code becomes quite straight forward.

Re: Parsing issues

Posted: Sat Feb 22, 2020 11:37 am
by Josh
To resolve a term correctly, I see three possibilities.
  1. Converting the term from an infix to a prefix/postfix notation using the shunting yard algorithm. We have to differentiate here:
    • Use in a compiler: An absolute must, since a large part of the work can be done at compile time and is therefore faster at runtime.
    • Use in an interpreter: In my opinion not recommended, because the effort is simply too great.
  2. Recursive procedure call for every opening parenthesis.
  3. Use of a queue. This is my personal favorite, because I feel heaving more control over the code when an error occur. Here is a very, very simplified example for creating a queue:

    Code: Select all

    EnableExplicit
    
    Structure PAIRS
      BracketOpen .i
      BracketClose.i
    EndStructure
    
    
    NewList Tokens$()
    NewList StackBrackets()
    NewList QueueBrackets.PAIRS()
    Define Term$
    Define IndexOpen
    Define IndexClose
    Define i
    
    
    Term$ = "2 * ( ( 3 + 4 ) * ( 5 - 6 ) + 7 )"
    
    For i = 1 To 17
      AddElement (Tokens$())
      Tokens$() = StringField (Term$, i, " ")
    Next
    
    ForEach Tokens$()
      Select Tokens$()
        Case "("
          AddElement (StackBrackets())
          StackBrackets() = @Tokens$()
        Case ")"
          AddElement (QueueBrackets())
          QueueBrackets()\BracketOpen  = StackBrackets()
          QueueBrackets()\BracketClose = @Tokens$()
          DeleteElement (StackBrackets())
      EndSelect
    Next
    
    ForEach QueueBrackets()
      ChangeCurrentElement (Tokens$(), QueueBrackets()\BracketOpen)  : IndexOpen  = ListIndex (Tokens$()) + 1
      ChangeCurrentElement (Tokens$(), QueueBrackets()\BracketClose) : IndexClose = ListIndex (Tokens$()) + 1
      Debug "" + IndexOpen + " - " + IndexClose
    Next