In the final part of the project we will generate ASTs from textual representations of MiniJava programs.
java -jar mjavac.jar parse marshal inputProg.java out.xml
will write to out.xml a valid XML representation (per the schema) of the AST that corresponds to the input MiniJava program. (More about the XML representation of ASTs here.) Including line numbers in the resulting AST/XML file is completely optional.
In case of a parser/lexer error, no file should be created, an an error message starting with "Syntax error at line 11 of input." must be printed to stderr (not stdout!), where "11" is replaced by the line number where the error occurred. Use the supplied methods for error reporting as in the solution starter kit's parser example, including the code under scan with
which captures exceptions thrown by the lexer.
Make sure you report errors originating both from the lexer and the parser!
0
(which denotes an octal number, e.g. 012
) or 0x
(which denotes a hexadecimal number, e.g. 0xDEADBEAF
).
/* comments */
are not nested, i.e.,
you need to implement /* */
comments, but the
first */
terminates any open comment regardless
of how many /*
sequences have appeared before it.
You are also required to support single-line //
comments.
You must generate an AST that faithfully captures the syntax of the input program; printing the AST back to Java should be very similar to the original program, excepting indentation, spacing, and parentheses.
For example, the order of classes in the AST's classdecls
should be the same as the order of class declarations in the original program.
You do not need to include any line numbers in the ASTs you generate.
Anything that doesn't match the grammar is a syntax error that is captured at this stage. For example, defining a boolean[]
variable is impossible according to the MiniJava, and your parser should identify this as a syntax error. (Indeed, there is no AST that can express this.)
You may need to massage the MiniJava grammar so that CUP can use it to produce a parser.
Take advantage of precedence and associativity declarations in the
parser specification to keep the overall size of the parser
grammar small. In particular, exp ::= exp op exp
productions along with precedence and associativity declarations
for various operators will shorten the specification considerably
compared to a grammar that encodes that information in separate
productions like expr
, term
, factor
,
etc. In any case, be sure to perform the same choices expected in standard Java concerning precedence and associativity.
Hint/warning: Be sure to test precedence and associativity
carefully, and not just in the obvious cases involving operators
like +
and *
. You should be sure that
things like variables, array element references, parenthesized
subexpressions, and method calls in expressions interact correctly
with other operations and that the resulting ASTs have the right
structure (your AST print visitor can be very helpful here).
For example, a+b.f()
must have a parse tree that
corresponds to a+(b.f())
, not
(a+b).f()
.
Your grammar should not contain any reduce/reduce conflicts. Shift/reduce conflicts that generate warnings (but not errors) are allowed but discouraged.
Hint: Lists can be parsed by left recursion (cs ::= cs c |
) or right recursion (cs ::= c cs |
). Sometimes one would cause you conflicts that can disappear when you switch to the other in some non-terminal. Generally, a left recursion is preferable for efficiency reasons, but a right recursion more easily makes the grammar LR, so try right recursion when you encounter conflicts.
Note that
In our lexical analysis, parsing and AST, several elements have somewhat different behavior than in Java.
For example, main
is not a regular function, and the word main
can be a special token in your lexical analysis.
You can ask CUP to generate more debug information using dump
flags; the demo in the recitation included these flags in build.xml
.
Sit back and enjoy all your hard work coming together.
java -jar mjavac.jar parse compile inputProg.java out.ll
lli out.ll