Institutt for Informatikk
Universitetet i Bergen
INF 225 - Innføring i Programoversettelse - H04

Project Part 1 - Scanning

This part is to be solved and handed in individually.


Write a scanner which recognises tokens in a C- source file. We recommend writing the scanner in JLex, but you can also use Flex or Flex++.


The deadline for handing in is Monday 13th September, 12:00 am.

Contents of hand in

The answers must include:

The answers are to be mailed to Magnus Hoff no later than the specified deadline. Answers handed in after the deadline will NOT be considered.

The problem

You are going to write a compiler for C-, which is described in the textbook, with a few changes:

  1. Nested comments should be allowed.
  2. Identifiers can contain numbers, but must contain at least one letter.
  3. A comment acts as whitespace. The line "void/*...*/main(void)" is thus allowed.
  4. You should handle "error tokens". This means that if the scanner detects an error, it should report this with a special token and try to continue scanning.
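
Nested comments cannot be matched by a single regular expression, so a scanner usually keeps a depth counter while it is inside a comment. The following Java sketch illustrates the idea on a plain string. The class name NestedCommentSketch and the method skipComment are hypothetical and not part of JLex or the assignment, and returning -1 for an unterminated comment is only one possible way to signal the situation in which an error token would be produced.

```java
// A sketch only: the class and method names are hypothetical.
final class NestedCommentSketch {

    /**
     * Skips a (possibly nested) comment that starts at index start in src.
     * Returns the index just past the matching closing delimiter, or -1 if
     * the input ends while a comment is still open (an error-token case).
     */
    static int skipComment(String src, int start) {
        if (!src.startsWith("/*", start)) {
            throw new IllegalArgumentException("no comment at index " + start);
        }
        int depth = 0;
        int i = start;
        while (i < src.length()) {
            if (src.startsWith("/*", i)) {
                depth++;            // a comment opens (outer or nested)
                i += 2;
            } else if (src.startsWith("*/", i)) {
                depth--;            // the innermost open comment closes
                i += 2;
                if (depth == 0) {
                    return i;       // the outermost comment is closed
                }
            } else {
                i++;                // ordinary character inside the comment
            }
        }
        return -1;                  // end of input inside a comment
    }

    public static void main(String[] args) {
        String src = "void/* outer /* inner */ still outer */main(void)";
        int end = skipComment(src, 4);
        System.out.println(src.substring(end)); // prints "main(void)"
    }
}
```

In a JLex specification the same counter would typically live in a separate lexical state for comments, with the depth incremented and decremented in the actions for the opening and closing delimiters.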

To test the scanner you will need an executable driver program which reads C- source code and writes the tokens to stdout in exactly the following format:

  linenumber, start..end: token; attr= value\n

See also the example below. The driver program may very well be a main method in the Scanner class. The driver program is to be replaced by the parser in the next part of the project.
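
As an illustration of the output format, here is a minimal Java sketch of a helper that builds one output line. The class TokenLineSketch and its format method are hypothetical names. The sketch assumes, based on the example output in this document, that a token occupying a single column is printed with only its start column, that the attribute part is omitted for tokens without an attribute, and that the attribute is written as name=value without a space.

```java
// A sketch only: the class and method names are hypothetical.
final class TokenLineSketch {

    /**
     * Builds one line of driver output, without the trailing newline.
     * attrName may be null for tokens that carry no attribute.
     */
    static String format(int line, int start, int end,
                         String token, String attrName, String attrValue) {
        StringBuilder sb = new StringBuilder();
        sb.append(line).append(", ").append(start);
        if (end != start) {
            sb.append("..").append(end);   // multi-column token: start..end
        }
        sb.append(": ").append(token);
        if (attrName != null) {
            sb.append("; ").append(attrName).append('=').append(attrValue);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(1, 1, 4, "Id", "name", "\"main\""));
        System.out.println(format(1, 5, 5, "Vparen", null, null));
        System.out.println(format(3, 10, 11, "Num", "value", "41"));
    }
}
```

Keeping the formatting in one place makes it easy to drop the driver later, since the parser in the next part only needs the token objects and not their printed form.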

You have to write proper documentation for the scanner and the driver program. Make sure you include what problems you encounter in the implementation of the scanner and how you have solved them, as well as an overview of the classes of the scanner and how they may be used in the production of a parser.

Do not include general text about scanners, but restrict yourselves to your own code. If you use several classes, you have to explain what role each of them takes and how they interact.

You have to hand in the program as a jar file, so that we can run it.


Source code for C-:

main() {
  /* test-program */
  return 41;
}

Output from the driver program:

1, 1..4: Id; name="main"
1, 5: Vparen
1, 6: Hparen
1, 8: Vbrace
3, 3..8: Return
3, 10..11: Num; value=41
3, 12: Semicolon
4, 1: Hbrace


The list of tokens which must be recognised by the scanner is found on pages 491-492 of the textbook.



JLex is a tool for generating lexical analysers (scanners). JLex reads source files in its own format and outputs Java code which may be used in larger programs like your compiler. (There are also Flex, which generates C code from a format similar to that of JLex, and Flex++, which generates C++ code.)

More information about JLex can be found in the Handouts folder on the course page.


Jar (Java archive) is a program that packs a set of class files into a single file. To make a jar file of all classes in the current directory, you can run

  jar cf scanner.jar *.class

and to run the Scanner class from this file, you type

  java -cp scanner.jar Scanner 

Jar can do several other things as well, see the man page jar(1).