Re: Compiler tutorial: suggest more stuff
Posted: Sat Jan 09, 2010 10:27 pm
Any news on more installments to the tutorial? So far it's been great!
http://www.purebasic.com
https://www.purebasic.fr/english/
Code: Select all
!MOV eax, 1
!PUSH eax
!MOV eax, 2
!POP ecx
!ADD eax, ecx
!PUSH eax
!MOV eax, 3
!POP ecx
!ADD eax, ecx
Code: Select all
!MOV eax, 1
!ADD eax, 2
!ADD eax, 3
The reason we do it this way is to allow for operator precedence (1+2*3=7, not 9) and parentheses later, without having to write complex code.4. Addition wrote:So our generated code for the expression "1+2" just got worse by a huge amount. Previously just two instructions, now it's five!
You just have to get used to this. It is possible to generate better code, but it is harder. For beginners, this will do. Remember, it's probably way faster than a normal interpreter would be.
Try the program with a few expressions and see if it generates valid code. That is the most important of all. If the generated code doesn’t work correctly, it doesn't matter it it's really, really fast.
The compiler has input an expression in normal form from your program. It has decided that it is grammatically correct and has converted it into postfix form. Now it has to understand it -- i.e. work out what it is asking for. The compiler does not actually perform the operations called for by the expression (that is done when you run the program) but it generates a stream of machine instructions that will have the effect of evaluating the expression. To give you a taste of what happens let me invent a totally fictitious compiler and equally fictitious machine language.
The expression (a + b)*(c + d ) would be converted into something like the following
Code: Select all
fetch value of a into register
add value of b to register
put result into temporary store T
fetch value of c into register
add value of d to register
multiply value of T into register
The point is that the actual machine operations are usually rather elementary things (on small computers there would probably be far more instructions used than in this example simply because the instructions are that much more elementary).
Let's get on with the problem. The beauty of postfix expressions is that they are very easy to evaluate. That's why I converted into postfix in the first place. And here is the evaluation algorithm, which once more uses the two `predicates' isvar(x) and isop(x):
Code: Select all
01 algorithm evaluate(s,n)
02 // s is a postfix string of length n
03 for i = 1 to n begin
04 if isvar( s(i) ) then push(value of s(i) )
05 if isop( s(i) ) then begin
06 x = pop
07 y = pop
08 do y s(i) x and push result (note the order)
09 end
10 end
11 x = pop
12 return x
13 end.
Also see:The basic action is this: as each variable appears in the expression its value is pushed onto the stack (4). When an operation appears (5) the top two values are taken off the stack (6,7) and this operation is performed on them. The result is pushed back onto the stack (8). This means that, at any stage, the next operator applies to the previous two values on the stack. At the end there should be just one value left in the stack -- the result. Pop this (11) and return it as the answer.
A compiler, as I said, does not actually perform the calculation -- you are not running the program yet. At line 8 the compiler will write the machine code for performing the operation, rather than actually performing it.
I can assure you that my approach is both the simplest and that it is correct. There is no rule which says how you have to do this. The only (and really the only) reason people convert into postfix notation is that they don't know how to easily compile infix expressions. But I do know how to easily compile infix expressions, so the conversion to postfix notation would just be a waste of time.I haven't really read much into your sources but your approach to "3. Expressions: a digit" and "4. Addition" is completely backwards.
The correct procedure is to convert your infix (normal left-right expressions) to postfix using a stack.
Woa! That is cheating big time. It basically makes the compiler not respect precedence rules, which is the problem we are trying to solve in the first place. They just skip the problem altogether.The difference between this and our previous infix syntax is that the original (1) and (2) have been compressed into the single production (1) which forces the presence of brackets.
You have been deceived. To compile via postfix you do two passes. The first from infix to postfix, the second from postfix to asm.Mistrel wrote:The reason postfix notation is efficient is because it can be done in a single pass.