Parsers

Just starting out? Need help? Post your questions and find answers here.
User avatar
blueznl
PureBasic Expert
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am
Contact:

Parsers

Post by blueznl »

I know how to parse something slowly, I'm just wondering how to parse something fast :-)

As for an example, what would be the smartest / fastest way to parse a few lines, for example a little piece of PureBasic code, let's say I would be writing an interpreter processing the following:

Code: Select all

OpenWindow(1,10,10,100,100,''TEST'',#PB_Window_ScreenCentered|#PB_Window_SystemMenu)
a = 0
a = a+1
Debug a
Suggestions welcome :-)
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
IdeasVacuum
Always Here
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK
Contact:

Re: Parsers

Post by IdeasVacuum »

... depends on the purpose, no? If you were parsing a PB source code to create, say, a C/C++ source code, you'd need at least a double- barrel approach I think - a 'dictionary' of keywords which are a straight swap, and some intelligent functions, for each specific bespoke code method that is not shared by the two languages. Or have I misinterpreted your requirement?
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
User avatar
blueznl
PureBasic Expert
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am
Contact:

Re: Parsers

Post by blueznl »

Hmm. First step is probably to discern between commands and expressions etcetera.

Code: Select all

a = a+1
OpenWindow(...)
StartDrawing
So, how to discern between those variations?
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
Little John
Addict
Addict
Posts: 4781
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Parsers

Post by Little John »

Hi blueznl,

does this code by srod do what you want (see also modification at the end of that page) ?

Regards, Little John
User avatar
blueznl
PureBasic Expert
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am
Contact:

Re: Parsers

Post by blueznl »

In a way, yes, but wouldn't it be faster to first discern between expressions and commands to speed up the process? I mean, what is the fastest way to see what is an expression and what is not?

Code: Select all

expressions...

a = a+1

commands (with or without parenthesis)...

StartDrawing
OpenWindow(...)
a = OpenWindow(...)

procedures...

MyProcedure(..)
For reasons I have to do this almost realtime, I can't parse all code then sit back and relax, unfortunately.

Edit: well, perhaps speed might not as much the issue as I thought, I need to rethink my strategy...
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Re: Parsers

Post by Trond »

I'd like to know how you would do it slowly, so I know what the reference range for slow and fast is.
Normaly the slow way is recursive decent parsing, which is easy to write by hand. The fast way is to first provide a lexer (very simple parser that splits up the code into keywords, numbers, names, and so on) and parse the output of the lexer with a bottom-up parser like LR (see bottom-up parsing and lr parser on wikipedia).
tranquilChaos
New User
New User
Posts: 1
Joined: Sat Mar 12, 2011 3:33 pm

Re: Parsers

Post by tranquilChaos »

To get an understanding how to most efficiently do this, I would suggest you read-up on grammars - the most commonly discussed is Context Free Grammar (CFG). This will show you how to break your code into trees that can be parsed based on the grammar constructs.

To get started, I would say start smaller for now. Drop the OpenWindow(param, param, param) function call. Focus on an interpreter that will allow you to declare a variable and assign to it. It's kind of confusing but in execution, a function exists in the stack like a variable. A variable is a pointer to a memory location that holds data, a function is a pointer to a memory location that holds data (code to be evaluated). So just keep it simple for now.

There will be two components of the simple interpreter: tokens and symbols. Tokens are the operations, like = or +. You will need to build a list of tokens to operate on. You can put them in an array or linked list. The next part, the symbols, are anything the user defines, like variables; they are symbolic of a value. You will likely want to store them in a hash map so they can be quickly located.

So your first step (after you have created your token table and reserved space for symbols) will be to read in the line to parse. From your example:

a = 0

We want to break-up the tokens and symbols. This is called lexxing. This particular example is easy, we can split the string on spaces. The result should be an array with elements like this:

[a][=][0]

The next step is called parsing. You will evaluate each element and act on it in a way that makes sense for your interpreter. In this case, read in the first element and scan the token list. You have no token named 'a' so you know it's a symbol. Add an entry in the hash map for the symbol 'a'. You will need to do a check to see if the entry already exists. The next element IS in the token table and the code should know it's an assignment operator and it sets a flag that the next element is the value assigned to the symbol 'a'. Using this approach you could then add a simple function called 'PRINT' to your token table. Then if you add:

PRINT a

Using the same technique as above, it's lexxed as [PRINT][a], then the parser finds the token 'PRINT' and knows the next element is a symbol. It grabs 'a' from the symbol hash map and operates on it (in this case, printing the value of a to the screen).

Does this help? I've written a small C compiler in C++ for college about a year ago. I used linked list for my token look-ups and a balanced binary tree for my symbol table. It was really fast. I'm pretty new to PB so some of the terminology might be off, but the approach should be the same. I'm working on small projects to learn it right now, but I could put together a simple interpreter like this and share it if you're still interested.
inc.
Enthusiast
Enthusiast
Posts: 406
Joined: Thu May 06, 2004 4:28 pm
Location: Cologne/GER

Re: Parsers

Post by inc. »

Just have a look at Remi Meiers "Lexer" which does a superior job on tokenizing PB Sources, even '#' Marks within Macros are handled correctly.
Check out OOP support for PB here!
Post Reply