I will provide working versions of early homeworks in case you need them to test the later ones. It is important that you keep going even if you miss an early homework. The due dates for the assignments are firm: there will be no extensions. Your grade will be calculated as follows:
A. Appel, Modern Compiler Implementation in Java. (Errata.)
Some book about Java. Pick your favorite.
Notes on the SPARC assembly language
The "dragon book" is the classical reference about lexical analysis and parsing: Compilers---Principles, Techniques, and Tools, A. Aho and R. Sethi and J. Ullman, Addison-Wesley, Reading, Mass., 1985.
A industrial strength compiler is a major effort (in the order of 100K lines of code and 5 man/years). Few of you will write such compilers. Most of you, however, will write compilers that process "little languages." For example, you may be writing scripts that read an input from an HTML form, parse it, check it for consistency, and generate HTML code as output.
This course will present the general principles, algorithms, and techniques that are helpful for (big and small) compilers. In addition, the course should provide you with:
let var x : int := 2 /* silly comment */ var y : int := 7 in x+y endThe first stage of compilation, lexical analysis, is to group the characters in significant units (called tokens). In that stage, we also eliminate white space and comments. The output of that stage is a list of tokens:
[ Let, Var, Id "x", Colon, Id "int", Assign, Intliteral 2, Var, Id "y", Colon, Id "int", Assign, Intliteral 7, In, Id "x", Plus, Id "y", End ]The next phase of compilation, parsing, is to group the tokens in meaningful "sentences." The output of this phase is an abstract syntax tree that prints as follows:
LetExp { decs = [ VarDec { init = IntExp 2, typ = SOME (-,15), var = { escape = ref true, name = - } pos = 7, }, VarDec { init = IntExp 7, pos = 28, typ = SOME (-,36), var = { escape = ref true, name = - } } ], body = SeqExp [ (OpExp { left = VarExp (SimpleVar (-,48)), oper = PlusOp, pos = 48, right = VarExp (SimpleVar (-,50)) }, 48) ], pos=3 }The heart of the compiler is this next phase, which performs checks on the syntax tree, and then generates an intermediate (machine independent) representation. The most important aspect of this phase is that it eliminates variable names and replaces them by either register names, or offsets in the stack frame.
MOVE(TEMP T101, ESEQ(SEQ(MOVE(MEM[4](BINOP(PLUS,TEMP T100,CONST -4)), CONST 2), MOVE(MEM[4]( BINOP(PLUS,TEMP T100,CONST -8)), CONST 7)), BINOP(PLUS, MEM[4](BINOP(PLUS,TEMP T100,CONST -4)), MEM[4](BINOP(PLUS,TEMP T100,CONST -8)))))The next phase is rather simple. It flattens the intermediate representation and does some straightforward optimizations.
LABEL L19 MOVE(MEM[4](BINOP(PLUS,TEMP T100,CONST -4)), CONST 2) MOVE(MEM[4](BINOP(PLUS,TEMP T100,CONST -8)), CONST 7) MOVE(TEMP T101, BINOP(PLUS, MEM[4](BINOP(PLUS,TEMP T100,CONST -4)), MEM[4](BINOP(PLUS,TEMP T100,CONST -8)))) JUMP(NAME L18) LABEL L18The final stage of our compiler is to convert the above intermediate representation to a machine-readable format. I've chosen the SPARC assembly language but we may consider other choices.
.global toymain .section ".text" toymain: save %sp, -136,%sp L19: mov 2 , %g2 st %g2, [%fp-16] ld [%fp-16] , %g3 st %g3, [%fp+-4] mov 7 , %g2 st %g2, [%fp-24] ld [%fp-24] , %g3 st %g3, [%fp+-8] ld [%fp+-4], %g2 st %g2, [%fp-32] ld [%fp+-8], %g2 st %g2, [%fp-36] ld [%fp-32], %g2 ld [%fp-36], %g3 add %g2, %g3, %g2 st %g2, [%fp-28] ld [%fp-28], %i0 ba L18 nop L18: jmp %i7+8 restoreThat's it!
Date | Topic | Exam/Quiz/Assignment Due | |
9/28 | Introduction | ||
9/30 | Introduction | ||
10/2 | Regular Expressions (Ch. 2) | ||
10/5 | JLex (Ch. 2; JLex manual) | ||
10/7 | Finite Automata (Ch. 2) | ||
10/9 | Finite Automata (Ch. 2) | ||
10/12 | Context-Free Grammars (Ch. 3) | Lexer | |
10/14 | CUP (Ch. 3; CUP manual) | ||
10/16 | Top-Down Parsers (Ch. 3) | ||
10/19 | Bottom-Up Parsers (Ch. 3) | ||
10/21 | Bottom-Up Parsers (Ch. 3) | ||
10/23 | Abstract Syntax (Ch. 4) | Parser | |
10/26 | OO Design Patterns | ||
10/28 | Review | ||
10/30 | Midterm | Abstract Syntax / Midterm | |
11/2 | Symbol Tables (Ch. 5.1-5.2) | ||
11/4 | Typechecking (Ch. 5.3-5.6) | ||
11/6 | Typechecking (Ch. 5.3-5.6) | ||
11/9 | Typechecking (Ch. 5.3-5.6) | ||
11/11 | Stack frames (Ch. 6) | Quiz 1 | |
11/13 | Static Links (Ch. 6) | Typechecking | |
11/16 | Intermediate Trees (Ch 7.1) | ||
11/18 | Expressions to Trees (Ch. 7.2) | ||
11/20 | Declarations to Trees (Ch. 7.3) | ||
11/23 | Canonical Trees; Traces (Ch. 8) | Quiz 2 | |
11/25 | SPARC Assembly | ||
11/27 | Thanksgiving | ||
11/30 | Instruction Selection (Ch. 9) | ||
12/2 | Register Allocation | ||
12/4 | Conclusion | ||
12/7-12/9 | Final Projects Due |
sabry@cs.uoregon.edu