Institutt for Informatikk
Universitetet i Bergen
INF 225 - Innføring i Programoversettelse - H07

Project Part 1 - Scanning

This part is to be solved and handed in individually.


Write a scanner which recognises tokens in a C- source file (see Appendix A.1). We recommend writing the scanner with JLex.


The deadline for handing in is Monday 17th September at 10:00.

Contents of hand in

The answers must include:

The answers are to be mailed to Yngve Devik Hammersland no later than the specified deadline. Answers handed in after the deadline will NOT be considered. After you have delivered your answer, there will be a day on which you have to present and explain your program at the computer. The dates will be agreed with the group leader.

The problem

You are going to write a compiler for C-, which is described in the textbook, with a few changes:

  1. Nested comments should be allowed.
  2. Identifiers can contain digits, but must contain at least one letter.
  3. A comment works as a white space. The line "void/*...*/main(void)" is thus allowed.
  4. You should handle "error tokens": if the scanner detects an error, it should report this with a special token and try to continue scanning.
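
As an illustration of change 1, nested comments can be handled by counting how many comments are currently open. The following is only a minimal hand-written sketch in Java, not a JLex specification; the class name NestedCommentSketch and the method skipComment are hypothetical, and returning -1 for an unterminated comment is an assumption about where an error token could be produced.

```java
public class NestedCommentSketch {
    // Hypothetical helper: 'start' must be the index of the two characters
    // that open a comment. Returns the index just past the matching close,
    // or -1 if the input ends while a comment is still open.
    public static int skipComment(String src, int start) {
        int depth = 0;
        int i = start;
        while (i < src.length()) {
            if (src.startsWith("/*", i)) {
                depth++;            // one more comment level is open
                i += 2;
            } else if (src.startsWith("*/", i)) {
                depth--;            // innermost open comment is closed
                i += 2;
                if (depth == 0) {
                    return i;       // the outermost comment has ended
                }
            } else {
                i++;                // ordinary character inside the comment
            }
        }
        return -1;                  // assumption: caller emits an error token here
    }

    public static void main(String[] args) {
        String src = "void/* a /* b */ c */main(void)";
        int end = skipComment(src, 4);
        System.out.println(src.substring(end)); // prints main(void)
    }
}
```

In a generated scanner the same counter would typically live in a separate lexical state for comments, with the counter incremented and decremented in the actions for the opening and closing delimiters.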

To test the scanner you will need an executable driver program which reads C- source code and writes the tokens to stdout in exactly the following format:

  linenumber, start..end: token; attr= value\n

See also the example below. The driver program may very well be a main method in the Scanner class. The driver program is to be replaced by the parser in the next part of the project.
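
The output line could be assembled roughly as follows. This is a sketch under assumptions, not the required implementation: the class TokenLineSketch and the method format are hypothetical, and the choices to omit "..end" for one-character tokens and to omit the attribute part when there is none are read off the sample output in this document rather than stated by the format line itself.

```java
public class TokenLineSketch {
    // Hypothetical helper. 'attr' is the already formatted attribute text,
    // for example name="main", or null when the token has no attribute.
    public static String format(int line, int start, int end, String token, String attr) {
        StringBuilder sb = new StringBuilder();
        sb.append(line).append(", ").append(start);
        if (end > start) {
            sb.append("..").append(end);   // assumption: range only for longer tokens
        }
        sb.append(": ").append(token);
        if (attr != null) {
            sb.append("; ").append(attr);  // assumption: attribute part is optional
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(1, 1, 4, "Id", "name=\"main\""));
        System.out.println(format(1, 5, 5, "Vparen", null));
        System.out.println(format(3, 10, 11, "Num", "value=41"));
    }
}
```

Keeping the formatting in one place makes it easy to drop when the driver program is replaced by the parser.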

You have to write proper documentation for the scanner and the driver program. Make sure you include what problems you encounter in the implementation of the scanner and how you have solved them, as well as an overview of the classes of the scanner and how they may be used in the production of a parser.

Do not include general text about scanners, but restrict yourselves to your own code. If you use several classes, you have to explain what role each of them takes and how they interact.

You have to hand in the program as a jar file, so that we can run it.


Source code for C-:

main() {
  /* test-program */
  return 41;
}

Output from the driver program:

1, 1..4: Id; name="main"
1, 5: Vparen
1, 6: Hparen
1, 8: Vbrace
3, 3..8: Return
3, 10..11: Num; value=41
3, 12: Semicolon
4, 1: Hbrace


The list of tokens which must be recognised by the scanner is found on pages 491-492 of the textbook.



JLex is a tool for generating lexical analysers (scanners). JLex reads source files in its own format and outputs Java code which may be used in larger programs such as your compiler.

More information about JLex can be found in the Handouts folder on the course page.


Jar (Java archive) is a program which packs a set of class files into a single file. To make a jar file of all classes in the current directory you can run

  jar cf scanner.jar *.class

and to run the Scanner class from this file, you type

  java -cp scanner.jar Scanner 

Jar can do several other things as well, see the man page jar(1).