Parsing issues

Just starting out? Need help? Post your questions and find answers here.
kinglestat
Enthusiast
Enthusiast
Posts: 754
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Parsing issues

Post by kinglestat »

I am trying to write an interpreter, but I am stuck on parsing. The idea of the code is to be compatible with Spiderbasic, which cannot use pointers and has a limited command set compares to purebasic. I have included the my code. I can't seem to get the parsing for nested brackets. I would appreciate some help.

Code: Select all


Enumeration
   #VAR_UNKNOWN
   #VAR_INTEGER
   #VAR_FLOAT
   #VAR_STRING
   #VAR_VARIANT
   #VAR_BRACKETS
EndEnumeration

Enumeration
   #REG_STRING
   #REG_BRACKET
EndEnumeration

Structure      stType
   name.s
   type.i
   i.i
   f.f
   s.s
EndStructure

Structure      stToken
   text.s
   type.i
   level.i
EndStructure

Structure      stLevel
   original.s
   level.i
   *p.stLevel
EndStructure

Global NewMap        mapToken.stToken()

Global               gTokens.i



Procedure            Evaluate( line.s )



EndProcedure

Procedure            ProcessBracket( string.s, List ll.stLevel() )
   Protected.s       temp, key
   Protected         i, j, k, level, total, t
   Protected Dim     res.s(0)
   Protected         *p.stLevel

   Debug #PB_Procedure

   total = ExtractRegularExpression( #REG_BRACKET, string, res() )
   
   If total
      If ListSize( ll() )
         *p    = ll()
         level = ll()\level
      EndIf
   
      AddElement( ll() )
         ll()\level     = level + 1
         ll()\original  = string
         ll()\p         = *p
   
      For i = 0 To total - 1
         gTokens + 1
         key = "$$Bracket" + Str( gTokens )
         
         AddMapElement( mapToken(), key, #PB_Map_NoElementCheck )
            mapToken()\text = res( i )
            mapToken()\type = #VAR_BRACKETS
            mapToken()\level= ll()\level
         
            Debug "Level: " + Str( ll()\level )
            Debug ll()\p\original + " , " + res( i )
            
            ;ll()\original     = ReplaceString( ll()\original, res( i ), key )
            ll()\p\original   = ReplaceString( ll()\p\original, res( i ), key )
            
            ProcessBracket( Mid( res( i ), 2 ), ll() )
               ;ll()\p\original   = ReplaceString( ll()\p\original, res( i ), key )
               ;res( i )          = key
            ;Else
            ;   PreviousElement( ll() )
            ;   Debug "No change"
            ;EndIf
      Next
   EndIf

   ProcedureReturn total
EndProcedure

Procedure.s          SplitSections( line.s )
   Protected.s       temp, key, string
   Protected         i, j, k, level, total, t
   Protected Dim     res.s(0)
   Protected NewList ll.stLevel()
   Protected         *p.stLevel
   
   total = ExtractRegularExpression( #REG_STRING, line, res() )
   
   If total
      For i = 0 To total - 1
         gTokens + 1
         key = "$$String" + Str( gTokens )
         
         AddMapElement( mapToken(), key, #PB_Map_NoElementCheck )
         mapToken()\text = res( i )
         mapToken()\type = #VAR_STRING
         line = ReplaceString( line, res(i), key )
      Next
   EndIf
   
   If FindString( line, Chr(34) )
      ; Syntax error; mismatched "
      ProcedureReturn "-1"
   EndIf
   
   AddElement( ll() )
      ll()\level = 0
      ll()\original = line

   If ProcessBracket( line, ll() )
      FirstElement( ll() )
      string = ll()\original
   EndIf

   
   ProcedureReturn string
   
EndProcedure

Procedure            Parse( line.s )
 
   Protected         pos.i
   Protected NewList llToken.stToken()

   ;Debug "1 " + Chr(34) + line + Chr(34)

   pos = Len( line )
   line = Left( line, pos - 1 )
   pos = 1
   
   ;Debug "2 " + Chr(34) + line + Chr(34)
   
   Debug "1." + line
   line = SplitSections( line )
   Debug "2." + line
   

EndProcedure


Procedure            Main()

   Protected         i
   Protected         pos
   Protected.s       temp, line
   
   ;If Not CreateRegularExpression( #REG_STRING, Chr(34) + "(.*?)" + Chr( 34 ) )
   If Not CreateRegularExpression( #REG_STRING, "'(.*?)'" )
      Debug RegularExpressionError()
   EndIf

   ;If Not CreateRegularExpression( #REG_BRACKET, "(1(?:\1??[^1]*?2))+" )
   If Not CreateRegularExpression( #REG_BRACKET, "\(((?>[^()]+)|(?R))*\)" )
      Debug RegularExpressionError()
   EndIf
   
   Restore program
   
   Repeat
      Read.s temp
      If temp = "--" : Break : EndIf
      
      Line + " " + temp
      
      If Right( temp, 1 ) = ";"
         parse( Trim( line ) )
         line = ""
      EndIf
   ForEver

EndProcedure


Main()

DataSection

program:
Data.s      "a=5"
Data.s      "b=6,c =7;"
Data.s      "d=a+b+c;"
Data.s      "('testing') 'hello' (cornflakes) ((a+(b-c))-(h+j));"
Data.s      "--"
Data.s      "(varA=Hello varB=World c=4) 6>>1 > c ? (varA+varB) : (varB+varA)"

;
; Language
; (const) ((default)int/float/string/variant)   <var>=value;
; while (condition ) / wend
;
EndDataSection
I may not help with your coding
Just ask about mental issues!

http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
User avatar
Tenaja
Addict
Addict
Posts: 1959
Joined: Tue Nov 09, 2010 10:15 pm

Re: Parsing issues

Post by Tenaja »

I haven't looked at your code, but to handle nested brackets, it needs to be recursive descent. (Essentially, a bracket restarts the parse recursively with everything staying on the stack). Every "level" of priority starts a new function.

There's a million examples online, but Crenshaw has an excellent tutorial "how to build a compiler", which is an easy search. I successfully made a compiler of my own using his tutorial.
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Re: Parsing issues

Post by srod »

Yes when you actually get into it, a recursive descent parser can be quite simple and very very effective. Far more flexible than reverse-polish notation, for example. You do need to lay your data structures out very carefully though and have them set up to allow for recursion, but once you get it all mapped out, the code becomes quite straight forward.
I may look like a mule, but I'm not a complete ass.
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Parsing issues

Post by Josh »

To resolve a term correctly, I see three possibilities.
  1. Converting the term from an infix to a prefix/postfix notation using the shunting yard algorithm. We have to differentiate here:
    • Use in a compiler: An absolute must, since a large part of the work can be done at compile time and is therefore faster at runtime.
    • Use in an interpreter: In my opinion not recommended, because the effort is simply too great.
  2. Recursive procedure call for every opening parenthesis.
  3. Use of a queue. This is my personal favorite, because I feel heaving more control over the code when an error occur. Here is a very, very simplified example for creating a queue:

    Code: Select all

    EnableExplicit
    
    Structure PAIRS
      BracketOpen .i
      BracketClose.i
    EndStructure
    
    
    NewList Tokens$()
    NewList StackBrackets()
    NewList QueueBrackets.PAIRS()
    Define Term$
    Define IndexOpen
    Define IndexClose
    Define i
    
    
    Term$ = "2 * ( ( 3 + 4 ) * ( 5 - 6 ) + 7 )"
    
    For i = 1 To 17
      AddElement (Tokens$())
      Tokens$() = StringField (Term$, i, " ")
    Next
    
    ForEach Tokens$()
      Select Tokens$()
        Case "("
          AddElement (StackBrackets())
          StackBrackets() = @Tokens$()
        Case ")"
          AddElement (QueueBrackets())
          QueueBrackets()\BracketOpen  = StackBrackets()
          QueueBrackets()\BracketClose = @Tokens$()
          DeleteElement (StackBrackets())
      EndSelect
    Next
    
    ForEach QueueBrackets()
      ChangeCurrentElement (Tokens$(), QueueBrackets()\BracketOpen)  : IndexOpen  = ListIndex (Tokens$()) + 1
      ChangeCurrentElement (Tokens$(), QueueBrackets()\BracketClose) : IndexClose = ListIndex (Tokens$()) + 1
      Debug "" + IndexOpen + " - " + IndexClose
    Next
    
sorry for my bad english
Post Reply