COP 5621 - Homework 1 - Lexical Analyzer
Due Monday, September 23
Read the description of the Tiger lexical analyzer on pages 34-35.
To find out details of Tiger's tokens, you will also need to refer
to the Appendix on pages 522-531.
You can view the JLex reference manual at
http://www.cs.princeton.edu/~appel/modern/java/JLex/.
As mentioned in the syllabus, you need to set some UNIX environment
variables to access JLex.
Add the following lines to your .cshrc file:
setenv JAVA_HOME /depot/J2SE-1.4
setenv CLASSPATH "/homes/smithg/javatools:."
My /homes/smithg/javatools/tiger/chap2 directory contains
a number of files to help you; you can copy them to your directory
with the command
cp -rp /homes/smithg/javatools/tiger/chap2 .
The only file that you need to modify is Parse/Tiger.lex,
which contains the JLex specification.
Here are a few tips:
- Your lexer must return tokens of class
java_cup/runtime/Symbol.
You can create such objects by calling the class constructor
or by using the handy method tok defined in Tiger.lex.
But note that tok won't give the correct position
for complicated tokens like string literals, which aren't recognized
all at once; for such tokens, you'll need to call the class constructor
directly.
- The class ErrorMsg/ErrorMsg includes an error method
for printing nice error messages.
But you must tell it where each newline in the source file is,
by calling errorMsg.newline(yychar) whenever your lexer matches one.
- You should use JLex start states to deal with
nested comments and string literals.
- To test your lexer, you can try out the file test1.tig
and compare your output with mine in test1.out.
Feel free not to implement all the string literal escape sequences
described on page 527; you are only required to handle \n,
\", and \\.
But if you're curious, test2.tig and test2.out
demonstrate the weirder escape sequences.
And you can read an interesting explanation of ASCII control characters at
http://www.cs.fiu.edu/~smithg/cop5621/lowascii.html.
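
To illustrate the tip about errorMsg.newline, here is a minimal Java
sketch of why an error reporter needs the newline offsets: with them,
a character offset can be turned into a line and column.
LinePositions and its locate method are hypothetical names invented
for this illustration; this is not the ErrorMsg code in chap2, and it
assumes positions are 0-based character offsets of the kind yychar
supplies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the course's ErrorMsg class.
final class LinePositions {
    // Offsets of every newline reported so far, in increasing order.
    private final List<Integer> newlineOffsets = new ArrayList<>();

    // Record the offset of one newline character (assumed 0-based).
    void newline(int pos) {
        newlineOffsets.add(pos);
    }

    // Translate a character offset into "line.column", both 1-based.
    String locate(int pos) {
        int line = 1;
        int lineStart = 0;
        for (int nl : newlineOffsets) {
            if (nl >= pos) {
                break;              // this newline is at or after pos
            }
            line++;                 // pos lies beyond this newline
            lineStart = nl + 1;     // the next line starts right after it
        }
        return line + "." + (pos - lineStart + 1);
    }

    // Feed in the newlines of a small program, then locate "var".
    static String demo() {
        String src = "let\n  var x := 5\nin x end\n";
        LinePositions lp = new LinePositions();
        for (int i = 0; i < src.length(); i++) {
            if (src.charAt(i) == '\n') {
                lp.newline(i);
            }
        }
        return lp.locate(src.indexOf("var"));
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 2.3
    }
}
```

If a newline is never reported, every later position is attributed to
an earlier line, which is why each newline the lexer matches has to be
passed along.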
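
The tips about start states and token positions for string literals
can be pictured with a small hand-written scanner. The sketch below is
plain Java, not a JLex specification: the State enum stands in for
JLex start states, and StringLiteralScanner, StringToken, and scan are
hypothetical names. It handles only the three required escapes and
assumes the input contains no comments. The point to notice is that
the token's left position is remembered when the opening quote is
seen, since the literal is not recognized all at once.

```java
// Hypothetical illustration of start-state style scanning.
final class StringLiteralScanner {
    // Two lexer modes, standing in for JLex start states.
    private enum State { INITIAL, STRING }

    // Decoded text plus the offsets spanning the whole literal.
    static final class StringToken {
        final int left;      // offset of the opening quote
        final int right;     // offset just past the closing quote
        final String value;  // contents with escapes decoded

        StringToken(int left, int right, String value) {
            this.left = left;
            this.right = right;
            this.value = value;
        }
    }

    // Returns the first string literal in src, or null if there is none.
    static StringToken scan(String src) {
        State state = State.INITIAL;
        StringBuilder text = new StringBuilder();
        int start = -1;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            switch (state) {
                case INITIAL:
                    if (c == '"') {
                        state = State.STRING;  // enter the string mode
                        start = i;             // remember where it began
                        text.setLength(0);
                    }
                    break;
                case STRING:
                    if (c == '"') {
                        return new StringToken(start, i + 1, text.toString());
                    } else if (c == '\\') {
                        if (i + 1 >= src.length()) {
                            throw new IllegalArgumentException(
                                "unterminated escape at offset " + i);
                        }
                        char e = src.charAt(++i);  // consume the escape char
                        switch (e) {
                            case 'n':  text.append('\n'); break;
                            case '"':  text.append('"');  break;
                            case '\\': text.append('\\'); break;
                            default:
                                throw new IllegalArgumentException(
                                    "unsupported escape \\" + e);
                        }
                    } else {
                        text.append(c);
                    }
                    break;
            }
        }
        if (state == State.STRING) {
            throw new IllegalArgumentException(
                "unterminated string starting at offset " + start);
        }
        return null;
    }

    public static void main(String[] args) {
        StringToken t = scan("x := \"a\\\"b\\n\"");
        System.out.println(t.left + ".." + t.right);
    }
}
```

In a real lexer the left and right offsets would be passed to the
Symbol constructor together with the decoded value.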
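
Nested comments are the other case the tips mention. One common way
to handle them, sketched here in plain Java rather than JLex, is to
keep a depth counter that rises on each /* and falls on each */, and
to leave comment mode only when the counter returns to zero.
NestedComments and strip are hypothetical names, and the sketch
assumes that /* never appears inside a string literal, which a real
lexer would have to handle.

```java
// Hypothetical illustration of depth counting for nested comments.
final class NestedComments {
    // Returns src with /* ... */ comments removed; comments may nest.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int depth = 0;   // how many comments are currently open
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("/*", i)) {
                depth++;             // open one more level
                i += 2;
            } else if (depth > 0 && src.startsWith("*/", i)) {
                depth--;             // close the innermost level
                i += 2;
            } else {
                if (depth == 0) {
                    out.append(src.charAt(i));  // ordinary program text
                }
                i++;
            }
        }
        if (depth > 0) {
            throw new IllegalArgumentException("unclosed comment");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("a /* x /* y */ z */ b"));
    }
}
```

A single flag would not be enough here, because the first */ would end
the comment while an outer level is still open.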