Institutt for Informatikk
Universitetet i Bergen
INF 225 - Introduction to program translation - H07
Project Part 2 - Recursive-Descent Parser
This part is highly recommended to be solved in groups of two, but each student is to hand in his/hers own program!
Goal
You are going to implement a recursive-descent parser recognising
valid C- programs, checking
whether the program is type correct, and outputting the syntax tree and
symbol table for the program.
If the program is invalid, an error message is to be printed. Note, that you have also to design a syntax tree
structue for C- suitable for generation by your parser.
Deadline
Deadline for hand-in is Monday October 22th 10:00.
Contents of hand-in
The answers handed in must include:
-
Complete, fully documented, source code. Including the code from part 1.
-
Archive file with all classes we need to run the program.
-
Class diagram for your syntax tree structure.
-
A description of your program, in a seperate text file. Emphasize the parts requiring special consideration and
report shortly your design decisions. Avoid repeating descriptions, even when your code is repeated.
Do not include general text about program analysis and restrict
yourselves to your own code.
-
Instructions on how to run your program (command line syntax and options).
-
Important! If you have discussed the assignment with another student, the name of that student must
be stated in the documentation.
The answers are to be mailed to Yngve Devik Hammersland (yngvedh@gmail.com),
no later than the specified deadline. Answers handed in after the deadline will NOT be considered.
The program
The program is to read source code for a C- program,
and print the abstract syntax tree and symbol table to standard out
after the program has finished.
Any errors found in the source code should be reported
to standard err. All output should be in English.
We give some examples of what the output may look like. The output from
your program should be very similar to this!
We should be able to run your parser by typing something like:
Java -cp mycompiler.jar Parse inputfile.c-
And your scanner by typing something like:
Java -cp mycompiler.jar Scan inputfile.c-
Specifications
Read the C- specifications on page 491-496 very, very carefully!
We also make a few additional specifications:
-
Local variables shadowing formal parameters are allowed, i.e. you may declare a function with a formal parameter
and a local variable of the same name, with the result that the formal parameter is of no use whatsoever.
In such a case, a warning should be issued.
-
Void variables, void arrays, and void parameters are not allowed (a void parameter list is of course allowed).
-
Identifiers may not be overloaded within the same scope, i.e., the same identifier cannot be used
both for a function and for a global variable. (You may of course declare local variables
with the same identifier as a variable or function in an outer scope.)
Error messages
-
When the parser finds errors during syntax analysis, it should determine the type and cause
of the errors as exactly as possible.
-
Each reference should be reported with reference to line number and column number in the source file.
-
The parser should attempt to recover after an error.
We require the program to continue parsing after some errors:
-
Errors in expression.
Encountering for instance
if (a <= b <= c) {
foobar(a + b);
}
the program shall report an error in the condition, and continue parsing to analyse the statement block.
- Multiple declarations.
If you write
int foobar;
void foobar(int b) { ... };
in the same block, the program should report that foobar is already defined, but continue.
Each multiple declaration should cause but one error message, so if you call foobar both as
a function and as a variable after this multiple declaration, the program should not report another error.
- Simple syntax errors
Encountering for instance
int foo
int bar[3;
if (a = b { ... };
the program should report exactly three errors. In these situations, the parser will have to make
some intelligent guesses based on follow sets. Reasonable error messages might be something like
(where x = line number and y = column number):
Syntax error: possible missing ';' after declaration on line x(y).
Syntax error: possible missing ']' in array declaration on line x(y).
Syntax error: possible missing ')' in expression on line x(y).
Symbol table
-
All symbols in the program must be reported after analysis, not only the global ones.
-
For each symbol, the program must report name and type.
-
Remember that parameters to functions work as local variables.
In the next hand-in, you will need more attributes for functions and local variables,
on top of those required here. It may be wise to take this into account even at this point.
To handle local variables, you may use a stack of symbol tables. When you come to a compound statement,
you add a new table on top of the stack, and when you reach the end, you pop the stack.
Each node in the syntax tree may keep a reference to the symbol table stack that is valid in
this particular node. Thus the symbol tables are available from the syntax tree, even if the
parser routine throws them away. To print the symbol tables, you may for example traverse
the syntax tree and print the top-most table in each node starting or ending a scope.
Type checking
-
All variables and functions must be declared before use.
-
Function calls must have the right number of parameters, which must be of correct type.
-
Assignments must be of compatible types (integers in practice).
-
A return statement must be of correct type, i.e., non-empty in int functions and empty in
void functions. You do however not need to check that an int function actually contains a return statement.
-
The last declaration in a program must be a function declaration of the form
void main(void)
Examples
We give you a few examples of what the output may look like. Your output doesn't have to look exactly
like this, but it should be close.
Please use the example files!
-
Test 1
test1.c-
test1_output.txt
A program to perform selection sort on a 10 element array.
No errors or difficulties, this file should pass scanner and parser without errors.
-
Test 2
test2.c-
test2_output.txt
A program that does absolutely nothing. The code is pointless and complicated, but the syntax is correct.
Should pass scanner and parser without errors.
-
Test 3
test3.c-
test3_output.txt
Another program that does absolutely nothing. The code contains exactly 10 errors and 1 warning.
We require your scanner to find all the errors, no more and no less.
-
Test 4
test4.c-
test4_output.txt
Yet another program that does absolutely nothing. The code contains exactly 8 syntax errors.
In the example output, all the 8 errors are found and correctly classified, without creating error cascades.
Naturally, we require your parser to report all the errors when found individually in otherwise correct code.
Due to the obvious problems with error cascading and error recovery, we require your parser to detect and
correctly classify at least 4 of the errors in this particular test file, while at the same time keeping
error cascading to a minimum.
Hint: The use of first and follow sets is vital when dealing with these errors.
Hints
-
Use the example files.
-
Start with the simple things, and expand the program as you go.
-
Write documentation and source code in parallel.
Afterwards everything appears simple or forgotten.
-
There is no one way to solve the problem.
-
Discuss the exercise with your partner.
-
Start early. This takes time.
-
Start early. This takes time.
-
Start early. This takes time.