Compiler Design CIS 631
These class notes (73 pages) were uploaded by Sam Rau on Wednesday, October 21, 2015. They belong to CIS 631 at Syracuse University, taught by staff in the Fall term, and have received 14 views since upload.
Table-Driven Parsers

- Both top-down and bottom-up parsers can be written that explicitly manage a stack while scanning the input, to determine whether the input can be generated from the grammar productions.
  - In top-down parsers the stack holds nonterminals, which are expanded by replacing a nonterminal with the right-hand side of one of its productions.
  - In bottom-up parsers the stack holds sequences of terminals and nonterminals, which are reduced by replacing the right-hand side of a production with its nonterminal.
- Both techniques use a table to guide the parser in deciding which production to apply, given the top of the stack and the next input symbol.

Top-Down and Bottom-Up Parsers

- Predictive parsers are top-down and non-backtracking, sometimes called LL(k):
  - Scan the input from Left to right.
  - Generate a Leftmost derivation from the grammar.
  - k is the number of lookahead symbols needed to make parsing deterministic.
- If a grammar is not in LL(k) form, removing left recursion and left-factoring may produce one.
- Not all context-free languages have an LL(k) grammar.
- Shift-reduce parsers are bottom-up parsers, sometimes called LR(k):
  - Scan the input from Left to right.
  - Produce a Rightmost derivation from the grammar.
- Not all context-free languages have LR grammars.

Nonrecursive Predictive Parser

- Replaces nonterminals on the top of the stack with the right-hand side of a production that can match the next input.
- Diagram: input tape a b ... <eot>; parser with stack; parsing table M; output.

Parsing Algorithm

The parser starts in a configuration with S <eot> on the stack.

  repeat
      let X be the top-of-stack symbol and a the next input symbol
      if X is a terminal symbol or <eot>:
          if X = a, pop X from the stack and get the next symbol
          else error
      else (X is a nonterminal):
          if M[X, a] = X -> Y1 Y2 ... Yk:
              pop X from the stack
              push Y1 Y2 ... Yk on the stack with Y1 on top
              output the production
          else error
  until the stack is empty

Example from the dragon book (Aho et al.): the expression grammar

  E -> E + T | T
  T -> T * F | F
  F -> ( E ) | id

can be rewritten to eliminate the left recursion:

  E  -> T E'
  E' -> + T E' | ε
  T  -> F T'
  T' -> * F T' | ε
  F  -> ( E ) | id
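The table-driven algorithm above can be sketched as a short program. This is a minimal illustration, not the notes' implementation: the LL(1) table for the left-factored expression grammar is hardcoded, `EPS` marks an epsilon production, and `$` stands in for the `<eot>` marker.

```python
# Hardcoded LL(1) table for the left-factored expression grammar.
EPS, EOT = (), '$'

TABLE = {
    ("E",  "id"): ("T", "E'"),      ("E",  "("): ("T", "E'"),
    ("E'", "+"):  ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", EOT): EPS,
    ("T",  "id"): ("F", "T'"),      ("T",  "("): ("F", "T'"),
    ("T'", "+"):  EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"):  EPS, ("T'", EOT): EPS,
    ("F",  "id"): ("id",),          ("F",  "("): ("(", "E", ")"),
}
TERMINALS = {"id", "+", "*", "(", ")", EOT}

def ll1_parse(tokens):
    """Return True if tokens (without the <eot> marker) derive from E."""
    input_syms = list(tokens) + [EOT]
    stack = [EOT, "E"]                   # start symbol on top of <eot>
    pos = 0
    while stack:
        x = stack.pop()
        a = input_syms[pos]
        if x in TERMINALS:               # terminal: must match the input
            if x != a:
                return False
            pos += 1
        else:                            # nonterminal: expand via the table
            rhs = TABLE.get((x, a))
            if rhs is None:
                return False
            stack.extend(reversed(rhs))  # push rhs so its first symbol is on top
    return pos == len(input_syms)
```

For example, `ll1_parse(["id", "+", "id", "*", "id"])` accepts, while `ll1_parse(["id", "+"])` hits an empty table entry and rejects.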
Parsing Table

- The table is indexed by the set of nonterminals in one direction and the set of terminals in the other. Any blank entries represent error states.
- Non-LL(1) grammars could have more than one rule in a table entry.

The table for the expression grammar (blank entries are errors):

         id        +           *            (         )        <eot>
  E      E->TE'                             E->TE'
  E'               E'->+TE'                           E'->ε    E'->ε
  T      T->FT'                             T->FT'
  T'               T'->ε       T'->*FT'               T'->ε    T'->ε
  F      F->id                              F->(E)

Constructing the LL Parsing Table

- For each production of the form N -> α in the grammar:
  - For each terminal a in First(α), add N -> α to M[N, a].
  - If ε is in First(α), then for each terminal b in Follow(N), add N -> α to M[N, b].
  - If ε is in First(α) and <eot> is in Follow(N), add N -> α to M[N, <eot>].
- All other entries are errors.

Bottom-Up Parsers

- Also called shift-reduce parsers, because each step will either:
  - Reduce a sequence of symbols on the stack that is the right-hand side of a production to its nonterminal, or
  - Shift an input symbol to the top of the stack.
- Diagram: input tape a b ... <eot>; parser with stack; parsing table M; output.

Shift-Reduce Parser Actions

- During the parse, the stack holds a sequence of terminal and nonterminal symbols representing the part of the input worked on so far; the input holds the remaining symbols.
- Parser actions:
  - Reduce: if the stack holds a sequence γβ and there is a production N -> β, we can replace β by N to get γN on the stack.
  - Shift: if there is no possible reduction, transfer the next input symbol to the top of the stack.
  - Error: otherwise.
- If after a reduce we get the start symbol on the top of the stack and there is no more input, we have succeeded.
- Parser example: p. 3 of the Stanford notes.

Handles

- During the parse, the term handle refers to a sequence of symbols on the stack that:
  - matches the right-hand side of a production, and
  - is a step along the path of producing a correct parse tree.
- Finding the handle, i.e. identifying when to reduce, is the central problem of bottom-up parsing.
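The table-construction rule above (add N -> α to M[N, a] for each a in First(α), and on Follow(N) when α can derive ε) can be sketched by computing First and Follow as fixpoints. This is an illustrative sketch, not the notes' code; the string `"eps"` stands in for ε and `"$"` for `<eot>`.

```python
EPS, EOT = "eps", "$"

# Left-factored expression grammar; anything not a key here is a terminal.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
NONTERMS = set(GRAMMAR)

def first_of_seq(seq, first):
    """First set of a symbol sequence, including EPS if it can vanish."""
    out = set()
    for sym in seq:
        syms = first[sym] if sym in NONTERMS else {sym}
        out |= syms - {EPS}
        if EPS not in syms:
            return out
    out.add(EPS)
    return out

def build_table(start="E"):
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:                                   # First fixpoint
        changed = False
        for n, prods in GRAMMAR.items():
            for rhs in prods:
                f = first_of_seq([s for s in rhs if s != EPS], first)
                if f - first[n]:
                    first[n] |= f
                    changed = True
    follow = {n: set() for n in NONTERMS}
    follow[start].add(EOT)
    changed = True
    while changed:                                   # Follow fixpoint
        changed = False
        for n, prods in GRAMMAR.items():
            for rhs in prods:
                syms = [s for s in rhs if s != EPS]
                for i, s in enumerate(syms):
                    if s not in NONTERMS:
                        continue
                    tail = first_of_seq(syms[i + 1:], first)
                    add = (tail - {EPS}) | (follow[n] if EPS in tail else set())
                    if add - follow[s]:
                        follow[s] |= add
                        changed = True
    table = {}
    for n, prods in GRAMMAR.items():                 # fill M[N, a]
        for rhs in prods:
            f = first_of_seq([s for s in rhs if s != EPS], first)
            for a in (f - {EPS}) | (follow[n] if EPS in f else set()):
                table[(n, a)] = rhs                  # a clash here = not LL(1)
    return table
```

Running `build_table()` reproduces the table shown above, e.g. M[E', $] = E' -> ε and M[F, (] = F -> ( E ).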
- Note that ambiguous grammars do not fit here, just as they did not for top-down parsing, because there may not be a unique handle at a given step (e.g. the dangling-else problem).

LR Parsing

- A specific way to implement a shift-reduce parser is an LR parser.
- This parser represents the state of the stack by a single state symbol on the top of the stack.
- It uses two parsing tables, action and goto:
  - For any parsing state and input symbol, the action table tells what action to take: Sn (shift and go to state n), Rn (reduce by rule n), Accept, or Error.
  - For any parsing state and nonterminal symbol N, the goto table gives the next state after a reduce to the nonterminal N has been performed.
- Example: p. 6 of the Stanford notes.

LR Parser

- A shift-reduce parser that encodes the stack with a state on the top of the stack.
- The top-of-stack state and the next input symbol are used to look up the parser's action and goto entries in the table.
- Diagram: input a b ... <eot>; stack X s1, Y s2, Z s3, <eot>; parsing table M; output.

Types of LR Parsers

- LR parsers can work on more general grammars than LL parsers: there is more history on the stack to make decisions than in top-down parsing.
- LR parsers differ in how they generate the action and goto tables.
- Types of parsers, listed in order of increasing power (ability to handle grammars) and decreasing efficiency (the parsing tables become very large): LR(0), SLR, LALR(1), LR(1).

LR(0) Parsing Tables

- Although not used in practice, LR(0) table construction illustrates the key ideas.
- An item (or configuration) is a production with a dot in the middle; e.g., there are three items from A -> XY:

    A -> .XY    A -> X.Y    A -> XY.

- The item represents how much of the production we have seen so far in the parsing process:
  - In the first item above, we expect to see something that can start an X next.
  - In the third item above, we expect to see something that can follow an A next.
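The action/goto driver loop described above can be sketched as follows. The tiny grammar S -> ( S ) | x and its hand-built table are illustrative assumptions, not taken from the notes; the driver itself is the standard loop: the stack holds states, and the top-of-stack state plus the lookahead index the action table.

```python
# Productions used by reduces: rule 1 is S -> ( S ), rule 2 is S -> x.
PRODS = {1: ("S", 3), 2: ("S", 1)}

# ACTION[state, terminal]: ('s', n) = shift to state n,
# ('r', n) = reduce by rule n, ('acc', 0) = accept; missing = error.
ACTION = {
    (0, "("): ("s", 2), (0, "x"): ("s", 3),
    (1, "$"): ("acc", 0),
    (2, "("): ("s", 2), (2, "x"): ("s", 3),
    (3, ")"): ("r", 2), (3, "$"): ("r", 2),
    (4, ")"): ("s", 5),
    (5, ")"): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    """LR driver: shift pushes a state; reduce pops |rhs| states and
    consults the goto table under the exposed state."""
    toks = list(tokens) + ["$"]
    stack = [0]
    pos = 0
    while True:
        act = ACTION.get((stack[-1], toks[pos]))
        if act is None:
            return False                     # error entry
        kind, n = act
        if kind == "s":
            stack.append(n)                  # shift
            pos += 1
        elif kind == "r":
            lhs, rhs_len = PRODS[n]          # reduce
            del stack[len(stack) - rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True                      # accept
```

For example, `lr_parse(["(", "(", "x", ")", ")"])` accepts, while `lr_parse(["(", "x"])` reaches a blank action entry and reports an error.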
Closure and Goto Operations

- Closure is defined to construct a configurating set for each item. For the starting item N -> .Y:
  - N -> .Y is in the set.
  - If Y begins with a terminal, we are done.
  - If Y begins with a nonterminal N', add all N' productions with the dot at the start of the right-hand side: N' -> .Z.
- For each configurating set and grammar symbol, the goto operation gives another configurating set: if a set of items I contains items of the form N -> W.XY, where W and Y are sequences but X is a single grammar symbol, then goto(I, X) contains N -> WX.Y.
- To create the family of configurating sets for a grammar, add an initial production S' -> S and construct the sets starting from S' -> .S.
- Use the sets as parser states; states containing an item that ends with the dot will be reduce states.
- Example: p. 11 of the Stanford notes.

Limitations of LR(0)

- Since there is no lookahead, the parser must know whether to shift or reduce based only on the parsing stack so far: a configurating set can call for only a shift or only a reduce, not a choice based on the input.
- Problematic examples:
  - Epsilon rules create shift-reduce conflicts if there are other rules.
  - Items like these have shift-reduce conflicts: T -> id. and T -> id.[E]
  - Items like these have reduce-reduce conflicts: with E -> V = E, V -> id, T -> id, the completed items V -> id. and T -> id. conflict.

SLR Parsing

- SLR(1) (simple LR) uses the same configurating sets, table structures, and parser operations.
- When assigning table actions, don't assume that any completed item should be reduced. Look ahead using the Follow set of the item: reduce by an item N -> Y. only if the next input symbol is in Follow(N).
- The configurating sets may have shift and reduce in the same set, but the Follow sets involved are required to be disjoint; this also requires that there are no reduce-reduce conflicts in the state.

LR(1) Parsing

- Although SLR(1) uses one lookahead symbol, it still does not use all of the information that could be kept in a parsing state by tracking what path led to each item.
- LR(1) parsing tables keep the lookahead in the parsing state and separate states so that they can have more detailed successor states.
- This leads to much larger numbers of states (thousands instead of hundreds) for typical programming-language grammars.
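The closure and goto operations described above can be sketched directly. An item is represented as a tuple (lhs, rhs, dot-position); the augmented toy grammar S' -> S, S -> ( S ) | x is an assumption for illustration.

```python
# Augmented toy grammar: keys are nonterminals, values are productions.
PRODUCTIONS = {"S'": [("S",)], "S": [("(", "S", ")"), ("x",)]}

def closure(items):
    """Add N -> .rhs for every nonterminal N appearing right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in PRODUCTIONS:
            for prod in PRODUCTIONS[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(items, sym):
    """Advance the dot over sym in every item where that is possible."""
    moved = {(l, r, d + 1) for (l, r, d) in items
             if d < len(r) and r[d] == sym}
    return closure(moved)

# Starting configurating set, built from S' -> .S
start = closure({("S'", ("S",), 0)})
```

Here `start` contains S' -> .S plus the two closure items S -> .(S) and S -> .x, and `goto(start, "(")` yields the set built from S -> (.S).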
- Example: p. 7 of the Stanford notes.

LALR(1) Parsing

- Compromises between the simplicity of SLR and the power of LR(1) by merging similar LR(1) states: identify a core of configurating sets and merge states that differ only in lookahead.
- This is not just SLR: LALR will have fewer reduce actions, but it may introduce reduce-reduce conflicts that LR(1) did not have.
- Constructing LALR(1) parsing tables is not usually done by brute force (constructing LR(1) and then merging sets); instead, as configurating sets are generated, each new set is examined to see whether it can be merged with an existing one.

More on LR Parsing

- Almost all shift-reduce parsing is done with automatically generated parser tables.
- Look at the types of parsers in available parser generators: http://en.wikipedia.org/wiki/Category:Parsing_algorithms (note: these are types of parsers, not types of trees): Bison/yacc, ANTLR, JavaCC, CocoR, Elkhound.
- LR errors can be reported by giving different error codes for different table entries. Example: p. 16 of the Stanford notes.

One More Type of Shift-Reduce Parsing

- Operator precedence parsing: useful for expression grammars and other types of ambiguities.
- Doesn't use a table; it uses operator precedence rules to resolve conflicts.
- It also fits in with the various types of LR parsers: in addition to the action table, the parsing algorithm can appeal to an operator precedence table.

General Context-Free Parsers

- All of the table-driven parsers work on grammars in particular forms and may not work for arbitrary CFGs, including ambiguous ones.
- General backtracking parsers.
- O(n^3): the CYK (Cocke-Younger-Kasami) algorithm; produces a forest of parse trees.
- Earley's algorithm: notable for carrying along partial parses (subtrees); the first of the chart parsers.
- General parallel parsers, which can be O(n^3): GLR copies the relevant parts of the LR stack and parses in parallel whenever there is a conflict; otherwise it behaves like LALR.

Code Generation

Instruction Selection

- Can be a simple translation of each three-address instruction into
target code: x = y + z becomes

  lw  $t0, y
  lw  $t1, z
  add $t2, $t0, $t1
  sw  $t2, x

- Concepts of code generation introduced here: register allocation and optimization.
- An algorithm for generating code for arithmetic expressions using a minimal number of registers.

Target Code Addresses

- Four areas of memory: Code, Static, Heap, and Stack.
- Can use one static base location for the Code and Static areas: procedures have a location offset within the code area; global variables are allocated in the static area and also given offsets there.
- Other program variables (local variables and formal parameters) are given offset locations relative to a stack activation-record pointer.

Basic Blocks and Flow Graphs

- Representation of intermediate code as a graph:
  - Nodes of the graph are basic blocks, where the flow of control can only enter at the first instruction and leave through the last.
  - Edges indicate which blocks can follow other blocks, representing the jumps in the code.
- Useful for discussing code generation.
- Defining basic blocks: separate a sequence of TAC (three-address code) into basic blocks by identifying leaders:
  - The very first instruction is a leader.
  - Any instruction that is the target of a jump is a leader.
  - Any instruction following a jump is a leader.

Example of Basic Blocks

Pseudocode to initialize a 10-by-10 array to be the identity matrix:

  for i from 1 to 10 do
      for j from 1 to 10 do
          a[i][j] = 0.0
  for i from 1 to 10 do
      a[i][i] = 1.0

Three-address code, assuming a is the starting address of the array in row-major form and each element takes 8 bytes:

   1: i = 1                   (leader)
   2: j = 1                   (leader)
   3: t1 = 10 * i             (leader)
   4: t2 = t1 + j
   5: t3 = 8 * t2
   6: t4 = t3 - 88
   7: a[t4] = 0.0
   8: j = j + 1
   9: if j <= 10 goto 3
  10: i = i + 1               (leader)
  11: if i <= 10 goto 2
  12: i = 1                   (leader)
  13: t5 = i - 1              (leader)
  14: t6 = 88 * t5
  15: a[t6] = 1.0
  16: i = i + 1
  17: if i <= 10 goto 13

Graph Representation

- Basic blocks connected by edges representing jumps.
- Add an entry and an exit point.
- Can also identify a set of nodes as a loop: the loop entry is the only node with a predecessor outside the loop, and every node in the loop has a path back to the entry.
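The three leader rules above can be sketched as a small program over the identity-matrix TAC. This is an illustrative sketch: instructions are kept as strings in a dict keyed by the 1-based instruction number from the listing, with each jump carrying its target index.

```python
# (instruction text, jump target or None), keyed by 1-based index.
TAC = {
    1:  ("i = 1", None),          2:  ("j = 1", None),
    3:  ("t1 = 10 * i", None),    4:  ("t2 = t1 + j", None),
    5:  ("t3 = 8 * t2", None),    6:  ("t4 = t3 - 88", None),
    7:  ("a[t4] = 0.0", None),    8:  ("j = j + 1", None),
    9:  ("if j <= 10 goto 3", 3),
    10: ("i = i + 1", None),      11: ("if i <= 10 goto 2", 2),
    12: ("i = 1", None),          13: ("t5 = i - 1", None),
    14: ("t6 = 88 * t5", None),   15: ("a[t6] = 1.0", None),
    16: ("i = i + 1", None),      17: ("if i <= 10 goto 13", 13),
}

def find_leaders(tac):
    n = max(tac)
    leaders = {1}                      # rule 1: the very first instruction
    for i, (_, target) in tac.items():
        if target is not None:
            leaders.add(target)        # rule 2: target of a jump
            if i + 1 <= n:
                leaders.add(i + 1)     # rule 3: instruction after a jump
    return sorted(leaders)

def basic_blocks(tac):
    """Each block runs from a leader up to (not including) the next."""
    ls = find_leaders(tac) + [max(tac) + 1]
    return [list(range(a, b)) for a, b in zip(ls, ls[1:])]
```

This recovers exactly the six leaders marked in the listing (1, 2, 3, 10, 12, 13), i.e. the blocks B1 through B6 of the flow graph.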
The flow graph for the example:

  B1: i = 1
  B2: j = 1
  B3: t1 = 10 * i
      t2 = t1 + j
      t3 = 8 * t2
      t4 = t3 - 88
      a[t4] = 0.0
      j = j + 1
      if j <= 10 goto B3
  B4: i = i + 1
      if i <= 10 goto B2
  B5: i = 1
  B6: t5 = i - 1
      t6 = 88 * t5
      a[t6] = 1.0
      i = i + 1
      if i <= 10 goto B6

Next Use and Liveness

- Use of a variable: if instruction i assigns a value to x, instruction j has x as an operand, and control can flow along a path from i to j with no intervening assignments to x, then j uses the value computed at i.
- Live variables: for each instruction x = y op z, determine for each of x, y, and z which instruction next uses that variable.
- These properties can be determined by making a backward pass over a basic block and recording the information in the symbol table.

Good Code Generation for Basic Blocks

- Some local optimization can be achieved by building a DAG representation of the basic block: a node for each instruction in the block, whose children are the previous statements giving the last definitions of its operands.
- Eliminate local common subexpressions: check whether there are nodes with the same operator and the same children.
- Dead-code elimination: a node with no ancestors and no live variables can be eliminated.
- Algebraic identities:
  - arithmetic identities, e.g. x + 0 = x and 0 + x = x;
  - reduction in strength, replacing expensive operations with cheaper ones, e.g. x^2 = x * x and 2*x = x + x;
  - constant folding: evaluate constant expressions at compile time, e.g. 2 * 3.14.

Simple Code Generation

- Generates code considering just one basic block, but introduces the ideas of register allocation.
- Assume we keep information about registers:
  - a register descriptor keeps track of which variable names have a current value in that register;
  - an address descriptor, for each program variable, keeps track of where the current value of that variable can be found.
- A GetRegister function gets an appropriate register for any operand of the TAC.
- For the instruction x = y + z, call GetRegister for each of the operands:
  - if y is not already in its register, emit lw Ry, y;
  - if z is not already in its register, emit lw Rz, z;
  - emit the instruction add Rx, Ry, Rz.
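The backward pass for next-use and liveness described above can be sketched as follows. This is an illustrative sketch, not the notes' code: each TAC instruction is a triple (x, y, z) meaning x = y op z, and the pass records, for the operands and result of each instruction, their liveness status as it stands just after that instruction (`None` means live with no known next use, an index means live with that next use, and `"dead"` means not live).

```python
def next_use_info(block, live_out):
    """block: list of (x, y, z) triples; live_out: variables live on exit.
    Returns one dict per instruction with the status of x, y, z after it."""
    live = {v: None for v in live_out}       # live on exit, next use unknown
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):  # backward over the block
        x, y, z = block[i]
        info[i] = {v: live.get(v, "dead") for v in (x, y, z)}
        live.pop(x, None)                    # x is assigned here: dead above
        live[y] = i                          # operands: live, next used at i
        live[z] = i
    return info

# d = (a + b + c) + (a + b), as three TAC instructions:
block = [("t1", "a", "b"), ("t2", "t1", "c"), ("d", "t2", "t1")]
info = next_use_info(block, live_out={"d"})
```

For instance, after instruction 1 the temporary t2 is live with next use at instruction 2, while after instruction 2 both temporaries are dead and only d survives the block.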
Simple Generation Updates the Descriptors

- At the end of a basic block: ignore TAC temporaries that are not live; for each program variable x, if the current value of x is not in memory, issue a sw instruction.
- Updating descriptors:
  - For the instruction lw Ry, y: change the register descriptor for Ry to hold only y; add Ry to the address descriptor of y as a location.
  - For the instruction sw x, Rx: change the address descriptor of x to include the memory address of x.
  - For each operation add Rx, Ry, Rz: the register descriptor of Rx holds only x; the address descriptor of x has only Rx (not any memory location); remove Rx from the address descriptor of any other variable.

The GetRegister Function

- Pick a register Ry for an operand y:
  - If y is already in a register, pick it; no load instruction is needed.
  - If y is not in a register but one is free, pick that register and load it.
  - If y is not in a register and none is free, consider candidate registers R:
    - any register holding a variable v whose descriptor says its current value is already in memory;
    - any register holding the variable x (the result of the instruction), provided x is not also one of the operands, since x will be overwritten anyway;
    - any register holding a variable v that is not used later;
    - otherwise, generate a store instruction sw R, v to spill v.
  - Repeat these steps for the other variables in the register, and pick a register needing the fewest stores.
- Pick a register Rx for the result variable x: in addition to the choices above, any register holding only x; and if y is not used later, use Ry to hold the result (e.g. for x = y + 10).

Peephole Optimization

- Another strategy is to generate naive code and then improve the quality of the target code with simple optimizations over a sliding window of instructions (the peephole):
  - redundant loads and stores;
  - eliminating unreachable code (for example, eliminating jumps over jumps);
  - flow-of-control optimizations: analyze jump sequences;
  - algebraic simplification and reduction in strength.
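One of the peephole rules above, removing redundant loads and stores, can be sketched with a two-instruction window: a load that immediately follows a store of the same register to the same location is unnecessary. This is a minimal illustration over tuple-encoded instructions, not a full peephole optimizer.

```python
def peephole(code):
    """Drop 'lw R, x' when it directly follows 'sw R, x'."""
    out = []
    for instr in code:
        if (out and instr[0] == "lw" and out[-1][0] == "sw"
                and out[-1][1:] == instr[1:]):
            continue                    # sw R, x ; lw R, x  ->  keep only the sw
        out.append(instr)
    return out

code = [("sw", "t0", "x"), ("lw", "t0", "x"), ("lw", "t1", "y")]
```

Here `peephole(code)` keeps the store and the unrelated load, and deletes the redundant reload of x into t0.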
Register Allocation and Assignment

- Global register allocation tries to keep frequently used variables in registers across basic-block boundaries: assign registers to the most active values in inner loops.
- Usage counts: an approximate formula for the benefit of allocating a register to x within loop L is

    sum over blocks B in L of ( use(x, B) + 2 * live(x, B) )

  - This assumes that loop iteration counts are large and that differences in how many times individual blocks execute are negligible.
  - Choose register allocations that maximize the benefit over the variable set; after allocating inner loops, go on to similar counts for outer loops.

Register Allocation by Graph Coloring

- A technique for allocating registers and managing memory spills.
- The first pass of the algorithm allocates variables to registers as if there were no limit, generating code with symbolic registers.
- The second pass assigns physical registers to symbolic ones and minimizes the cost of spills.
- A register-interference graph is constructed: nodes are symbolic registers, and an edge connects two nodes if one is live at a point where the other is defined.
- Try to color the graph with k colors, where k is the number of registers: a coloring assigns a color to each node so that no two adjacent nodes have the same color.
- Graph coloring is an NP-complete problem, but effective heuristic algorithms exist.
- If the graph cannot be colored with k colors, choose a register to spill.

Optimal Code Generation for Expressions

- Generate code one expression at a time instead of by basic blocks.
- Assign numbers to the expression tree (as in the abstract syntax tree), known as Sethi-Ullman numbers or Ershov numbers.
- Label the expression tree:
  - label any leaf 1;
  - the label of an interior node with one child is the label of its child;
  - the label of an interior node with two children is the larger of the two children's labels if they differ, or 1 + the label of a child if the labels are the same.
- Given a machine model where all operands must be in registers, one can prove that the label of a node is the fewest number of registers with which the expression can be evaluated with no stores.
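The labeling rule above can be sketched in a few lines. A tree node is either a leaf (a string) or a tuple (op, left, right); the label of a node is the number of registers needed to evaluate it with no stores.

```python
def label(node):
    """Sethi-Ullman / Ershov number of an expression tree node."""
    if isinstance(node, str):
        return 1                        # any leaf gets label 1
    _, left, right = node
    l, r = label(left), label(right)
    # larger label if they differ; one more register if they are equal
    return max(l, r) if l != r else l + 1

# (a - b) * (c + d): each subtree needs 2 registers, so the root needs 3.
tree = ("*", ("-", "a", "b"), ("+", "c", "d"))
```

By contrast, the left-leaning tree (a + b) + c still needs only 2 registers, which is why balanced subexpressions are the expensive case.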
Generating Code from Labeled Expressions

- A recursive algorithm generates machine code, starting at the root.
- The algorithm proceeds recursively with respect to a base register number b: if the label of a node is k, it uses k registers, which can be numbered with offsets 0 through k-1 from the base; the result appears in the last register. The root starts with base 1.
- If the node has two children with equal labels k - 1 (so the node's label is k):
  - generate code for the right child starting at base register b + 1; its result is in register R(b+k-1);
  - generate code for the left child using base register b; its result is in register R(b+k-2);
  - generate the instruction op R(b+k-1), R(b+k-2), R(b+k-1).
- If an interior node has two children with unequal labels:
  - generate code for the big child (label k) with base b; its result is in R(b+k-1);
  - generate code for the small child (label m < k) using base b; its result is in R(b+m-1);
  - generate the instruction op R(b+k-1), R(b+m-1), R(b+k-1).
- For a leaf representing x, with base register b, generate lw R(b), x.

Labeled Expressions with Register Spill

- If there are not at least k registers available (say only r < k), the previous algorithm must insert store instructions to spill register values into memory.
- Augment the previous algorithm to save the result of one child. For an interior node with label k greater than r:
  - generate code for the big child using base register 1; the result will be in register R(r);
  - generate the instruction sw R(r), tk, where tk is a temporary memory location used to evaluate nodes with label k;
  - generate code for the little child: if its label is r or greater, use base 1; if its label is j < r, use base r - j + 1; the result appears in R(r);
  - generate the instruction lw R(r-1), tk;
  - generate the node instruction op R(r), R(r), R(r-1) or op R(r), R(r-1), R(r), depending on the required operand order.

References

- Sethi, R. and J. D. Ullman, "The generation of optimal code for arithmetic expressions," JACM, 1970, pp. 715-728.
- William Waite, University of Colorado, computer science course online notes.
- Kenneth C. Louden, Compiler Construction: Principles and Practice, PWS Publishing, 1997.
- Aho, Lam, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 2006
(the purple dragon book).

Object-Oriented Symbol Tables and Type Checking

- In object-oriented languages, scope changes can occur at a finer-grained level than just block or procedure definition: each class has its own scope, and variables may be declared at any time.
- Notation: represent symbol tables as a combination of environments. An environment maps an identifier to its symbol-table entry, which we abbreviate to just the type here:

    e1 = { g -> string, a -> int }

- We indicate adding to the symbol table with +, where the addition carries the meaning of scope from right to left:

    e2 = e1 + { a -> float }

  means: look an identifier up in the rightmost environment first, and if it is not found, continue looking to the left.

Example: Scope in Java

   1  class C {                         // environment e0 already given for
   2    int a; int b; int c;            // predefined identifiers
   3    public void m() {
   4      System.out.println(a + c);    // lines 2-4: e1 = e0 + {a->int, b->int, c->int}
   5      int j = a + b;                // line 5:    e2 = e1 + {j->int}
   6      String a = "hello";
   7      System.out.println(a);        // lines 6-9: e3 = e2 + {a->string}
   8      System.out.println(j);
   9      System.out.println(b);
  10    }                               // line 10: back to e1
  11  }                                 // line 11: back to e0

- All examples in these slides are from Andrew Appel; see the last page for the reference.

Each Class Has an Environment

- There may be several active environments at once (multiple symbol tables).
- Class names are mapped to environments.

  package M;
  class E { static int a = 5; }
  class N { static int b = 10; static int a = E.a + b; }
  class D { static int d = E.a + N.a; }

  All are added to environment e0:

    e1 = { a -> int }             e2 = { E -> e1 }
    e3 = { b -> int, a -> int }   e4 = { N -> e3 }
    e5 = { d -> int }             e6 = { D -> e5 }
    e7 = e2 + e4 + e6

  Classes E, N, and D are all compiled in environment e7; M -> e7.

Symbol Table

- Each variable and formal-parameter name has a type.
- Each method name has its signature.
- Each class name has its variable and method declarations. For example:

  class B {
    C f; int j; int q;
    public int start(int p, int q) { int ret; int a; ... return ret; }
    public boolean stop(int p) { ... return false; }
  }

  Symbol table for B:
    Fields:  f: C, j: int, q: int
    Methods: start: int x int -> int, with params p: int, q: int and locals ret: int, a: int
             stop:  int -> bool, with param p: int

Typechecking Rules

- Additional rules include:
  - The new keyword: e = new C()
gives type C to e, as usual.
  - Method calls of the form e.m(<paramlist>): suppose e has type C; look up the definition of m in class C.
- Appel recommends the two-pass semantic-analysis strategy: the first pass adds identifiers to the symbol table; the second pass looks up identifiers and does the typechecking.

Example of Inheritance

- Note the variables in scope in the definition of await: passengers, position, v, this.
- In c.await(t), in the body of await, v.move will be the move method from Truck.

  class Vehicle {
    int position;
    void move(int x) { position = position + x; }
  }

  class Main {
    public static void main(...) {
      Truck t = new Truck();
      Car c = new Car();
      c.passengers = 2;
      c.move(60);
      t.move(70);
      c.await(t);
    }
  }

  class Car extends Vehicle {
    int passengers;
    void await(Vehicle v) {
      if (v.position < position)
        v.move(position - v.position);
      else
        this.move(10);
    }
  }

  class Truck extends Vehicle {
    void move(int x) {
      if (x <= 55) position = position + x;
    }
  }

Single Inheritance of Data Fields (Locations)

- To generate code for v.position, the compiler must generate code to fetch the field position from the object that v points to; v may actually be a Car or a Truck.
- Prefixing fields: when B extends A, the fields of B inherited from A are laid out at the beginning of the record, in the order they appear in A; fields not inherited are laid out in order afterwards.

  class A { int a = 0; }                        A: [a]
  class B extends A { int b = 0; int c = 0; }   B: [a, b, c]
  class C extends A { int d = 0; }              C: [a, d]
  class D extends B { int e = 0; }              D: [a, b, c, e]

Single Inheritance for Methods (Locations)

- A method instance is compiled into code that resides at a particular address.
- In the semantic-analysis phase:
  - each variable's symbol-table entry has a pointer to its class descriptor;
  - each class descriptor contains a pointer to its parent class and a list of method instances;
  - each method instance has a location.
- Static methods: a method call of the form c.f() for a method declared static depends on the type of the variable c, not the type of the object that c holds.
  - Get the class of c, call it C (in Java syntax the call is C.f(), making this clear).
  - Search class C for method f; if it is not
found, search the parent for f, then its parent, and so on. When f is found, possibly in some ancestor class A, the method call c.f() is compiled to the method instance in A.

Single Inheritance for Dynamic Methods

- Dynamic method lookup needs class descriptors: a method may be overridden in a subclass.
- To execute c.f(), the compiled code must:
  - fetch the class descriptor d from object c (at offset 0);
  - fetch the method-instance pointer p from the f offset of d;
  - call p.
- Instances of A, B, C, and D, and their class descriptors:

  class A { int x = 0; int f() {...} }
  class B extends A { int y = 0; int g() {...} }
  class C extends B { int g() {...} }
  class D extends C { int y = 0; int f() {...} }

  Descriptors:  A: [A.f]   B: [A.f, B.g]   C: [A.f, C.g]   D: [D.f, C.g]

  Notation: A.f is the instance of method f declared in class A.

Multiple Inheritance

- In languages that permit multiple inheritance, a class can extend several different parent classes; we cannot simply put all fields of all parents at fixed prefixes in every class.
- Instead, analyze all classes to assign one offset location for every field name, usable in every record with that field.
- Use a graph-coloring algorithm; this still yields large numbers of offsets with sparse use of offset numbers.

  class A { int a = 0; }
  class B { int b = 0; int c = 0; }
  class C { int d = 0; }
  class D extends A, B, C { int e = 0; }

Multiple Inheritance Solutions

- After graph coloring, assign offset locations and give a sparse representation that records which fields are in each record. This adds another level of indirection when accessing fields and methods:
  - fetch the class descriptor d from object c;
  - fetch the field-offset value from the descriptor;
  - fetch the method or data from the appropriate offset of d.
- The coloring of fields is done at link time; this can still have problems with dynamic linking, where a new class can be loaded at run time. That is solved with hash tables of field names and access algorithms with additional overhead.

Type Coercions

- Given a variable c of type C, it is always legal to treat c as any supertype of C: if C extends B and b has type B, then the assignment b = c is safe.
- The reverse is not true: the assignment c = b is safe only if b really is, at run time, an instance of C.
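The descriptor-based dispatch above can be sketched as follows. This is an illustrative model, not generated code: each descriptor maps method names to method instances, built by copying the parent's entries and then overriding, so a single fetch from the descriptor finds the right instance without walking the class chain at call time.

```python
def make_descriptor(parent, methods):
    """Copy the parent's method instances, then apply this class's
    declarations (overrides and additions)."""
    d = dict(parent) if parent else {}
    d.update(methods)
    return d

# The A/B/C/D hierarchy from the slide; instances use the "A.f" notation
# (method f as declared in class A).
A = make_descriptor(None, {"f": "A.f"})
B = make_descriptor(A, {"g": "B.g"})
C = make_descriptor(B, {"g": "C.g"})
D = make_descriptor(C, {"f": "D.f"})

def call(obj, name):
    """obj models (class descriptor, fields): fetch the descriptor at
    'offset 0', then fetch the method instance at name's offset."""
    descriptor, _fields = obj
    return descriptor[name]

d_obj = (D, {"x": 0, "y": 0})
```

Dispatching f on a D object finds D.f, while g finds C.g, matching the descriptor table in the slide.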
- Safe object-oriented languages (Modula-3, Java) add to any coercion from a superclass to a subclass a run-time typecheck that raises an exception unless b really is an instance of C.
- C++ is unsafe in this respect: it allows a static cast mechanism without run-time checking; the dynamic cast mechanism does add the run-time checking.

Private Fields and Methods

- In the symbol table, for every class C, keep a flag for every field and method indicating whether it is private.
- When typechecking c.v or c.f(), the compiler checks the symbol-table flag and must use context, i.e. whether it is inside the declaration of the object's class, to decide whether to allow the access. This is another example of inherited attributes in the typechecking system.
- Additional flags can be kept for access at the subclass or package level, and additional context must be kept by the typechecking algorithm.

Main Reference

- Andrew Appel, Modern Compiler Implementation in Java, second edition, Cambridge University Press, 2002.

Optimization Concepts

- Optimization seeks to improve the time or space used by the generated code without changing the behavior of the program.
- It is the compiler area where most research is currently being conducted.
- Many compiler optimization problems are NP-complete; approximations and heuristics are used instead (a "soup" approach: keep adding functionalities).
- Many modern compilers have extensive optimization, which may itself be quite time- and space-consuming; there are options to turn off optimization for faster compiles.

Global Optimization Framework

- Extend the idea of the program as a DAG of basic blocks to do global data-flow analysis.
- Previously discussed: optimizations within basic blocks as part of code generation.
- Look at the examples in the Stanford optimization handout: constant folding, constant and copy propagation.
- Function-preserving transformations can improve the program without changing what it computes: eliminating common
subexpressions, copy propagation, dead-code elimination.
- Loop optimizations: code motion for loop-invariant code; induction variables and reduction in strength.
- Primarily at the procedure level; cross-procedure data flow is more difficult.

Data for Data-Flow Analysis

- The compiler needs to collect information about the program as a whole, primarily about variable definitions.
- Data-flow equations are set up. A typical equation has the form

    out(S) = gen(S) ∪ (in(S) − kill(S))

  the information at the end of statement S either is generated within S, or enters at the beginning and is not killed during S.
- Different data-flow problems may be solved in the direction of control flow, finding out(S) in terms of in(S), or may require working backwards, finding in(S) in terms of out(S).

Definitions for Flow Graphs

- A point is between two statements or basic blocks.
- A path connects points.
- A node d is a dominator of a node n if every path in the flow graph to n also goes through d.
- A loop is a set of nodes with a single entry node, called the loop header, that dominates all other nodes in the loop; the loop must also have a path that iterates, i.e. goes back to the header.
- (Illustrated in the slides on the B1-B6 flow graph from the basic-blocks example.)

Reaching Definitions

- A definition of a variable x is a statement that assigns, or may assign, a value to x:
  - x = expr and read(x) are unambiguous assignments;
  - ambiguous assignments include passing x as a reference parameter to a procedure, or assigning through a pointer that could refer to x.
- A definition d reaches a point p if there is a path from the definition d to p such that d is not killed along the path.
- A definition d is killed if its variable x has another definition along the path.

Equations for Reaching Definitions

- Based on the structure of statements, we can define the equations for reaching definitions.
- Example: S is an assignment statement, a definition d: a = b + c:

    gen(S) = {d}
    kill(S) = Def(a) − {d}
    out(S) = gen(S) ∪ (in(S) − kill(S))

- Example: S is a sequence of two statements S1; S2:

    gen(S)  = gen(S2) ∪ (gen(S1) − kill(S2))
    kill(S) = kill(S2) ∪ (kill(S1) − gen(S2))
    in(S1) = in(S);  in(S2) = out(S1);  out(S) = out(S2)
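At the basic-block level, the out(S) = gen(S) ∪ (in(S) − kill(S)) equation is usually solved by forward fixpoint iteration over the flow graph, which can be sketched as follows. The three-block example at the bottom is an illustrative assumption, not taken from the notes.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """blocks: block names; preds[b]: predecessor names;
    gen[b]/kill[b]: sets of definition labels. Returns (in, out) maps."""
    out = {b: set(gen[b]) for b in blocks}
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # in(B) = union of out(P) over predecessors P
            in_[b] = (set().union(*(out[p] for p in preds[b]))
                      if preds[b] else set())
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out

# d1 defines x in B1, d2 redefines x in B2; B3 joins both paths.
blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": set()}
kill = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}
in_sets, out_sets = reaching_definitions(blocks, preds, gen, kill)
```

At the join block B3, both definitions of x reach: this is exactly the information ud-chains record per use.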
- Example: S has the body of a loop, while (...) S1:

    gen(S) = gen(S1)
    kill(S) = kill(S1)
    in(S1) = in(S) ∪ gen(S1)
    out(S) = out(S1)

Use-Definition Chains

- Reaching-definition information is often stored as use-definition chains, also known as ud-chains: for each use of a variable, a list of all the definitions of the variable that reach that use.
- Algorithms to solve the data-flow analysis may be difficult if break and continue statements are allowed; approaches include:
  - iterative algorithms: start with the sets empty and iterate, either forward or backward, adding definitions until convergence;
  - interval analysis.
- Similar techniques are used to define available expressions: an expression x + y is available at point p if every path from the initial node to p evaluates x + y, and after the last such evaluation there are no subsequent assignments to x or y.

Live-Variable Analysis

- For a variable x and a point p in the program, we want to know whether x's value can be used along some path starting from p. If so, x is live at p; otherwise x is dead at p. (This can be used in register allocation.)
- Define in(B) to be the set of variables live at the point immediately before block B, and out(B) the same at the end of the block. Also, def(B) is the set of variables defined in B prior to any use of that variable, and use(B) is the set of variables whose values may be used in B prior to any definition:

    in(B) = use(B) ∪ (out(B) − def(B))
    out(B) = union over all successors S of in(S)

Definition-Use Chains

- A calculation done in the same manner as live-variable analysis is definition-use chaining (du-chains).
- A variable is used in a statement if its value is required: b and c, but not a, are used in both a = b + c and a[b] = c.
- The du-chain computes, for a point p, the set of uses s of a variable x such that there is a path from p to s that does not redefine x.

Elimination of Global Common Subexpressions

- Given a flow graph with available-expression information, compute a revised flow graph eliminating common subexpressions.
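The live-variable equations above are solved backward, against the flow of control, again by fixpoint iteration. The following sketch mirrors the reaching-definitions solver but propagates from successors; the small loop-shaped example is an illustrative assumption.

```python
def liveness(blocks, succs, use, defs):
    """blocks: block names; succs[b]: successor names; use[b]/defs[b]:
    variable sets as defined above. Returns (in, out) maps."""
    in_ = {b: set() for b in blocks}
    out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # out(B) = union of in(S) over successors S
            out[b] = (set().union(*(in_[s] for s in succs[b]))
                      if succs[b] else set())
            new_in = use[b] | (out[b] - defs[b])
            if new_in != in_[b]:
                in_[b] = new_in
                changed = True
    return in_, out

# B1 initializes i and s, B2 is a loop body using i and n, B3 uses s.
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
use = {"B1": set(), "B2": {"i", "n"}, "B3": {"s"}}
defs = {"B1": {"i", "s"}, "B2": {"i", "s"}, "B3": set()}
live_in, live_out = liveness(blocks, succs, use, defs)
```

Note that only n is live into B1 (i and s are defined there before use), while i, n, and s are all live out of the loop body B2.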
form x = y + z such that y + z is available at the beginning of the block, search to find the evaluations of y + z in previous blocks, say w = y + z, and replace each with u = y + z; w = u. Then replace x = y + z in this block with x = u.
• This is a conservative algorithm; it doesn't find all possible common subexpressions.
• It doesn't always improve code, but later optimizations may.

Copy Propagation
• Various optimizations produce copy statements of the form s: x = y.
• Determine all the places where this definition of x is used, and substitute y for x in all those places. Every use u of x must meet two conditions:
    • statement s must be the only definition of x reaching u, i.e., the ud-chain for use u consists only of s
    • on every path from s to u there are no assignments to y; this requires setting up an additional data flow problem computing the use of copy statements

Detection of Loop-Invariant Computations
• Use ud-chains to detect computations whose value does not change as long as control stays within the loop.
• Consider an assignment x = y + z in a loop where all possible definitions of y and z are outside the loop; then y + z is loop invariant.
• Suppose there is another statement v = x + w where w is only defined outside the loop; then this statement is also loop invariant. The du-chain for x = y + z tells us where the value of x can be used, so we need only check those uses of x for operands w that are also defined outside the loop.

Code Motion
• Once we have found invariant statements, we want to perform code motion to move such statements outside the loop if possible.
• Before the header block of the loop we introduce a preheader block to hold the loop-invariant assignments.
• Conditions for moving a statement s: x = y + z:
    • the block containing s dominates all exit nodes of the loop
    • there are no other statements in the loop that assign to x
    • no use of x in the loop is reached by any definition of x other than s
• This is conservative; there are other code motion strategies.

Elimination of Induction Variables
• A variable x is called an induction variable of a
loop L if every time the variable changes value it is incremented or decremented by the same constant each time around the loop, e.g., the index variable i in a for loop.
• We first look for basic induction variables: those whose only assignments are of the form i = i + c or i = i − c for a constant c. This requires reaching definition information and loop-invariant information.
• We next look for additional induction variables j that are defined only in L and are a linear function of a basic induction variable.
• Modify the computations of the additional induction variables to use additions of constants instead of multiplications.

Difficulties for Code Optimization
• Dealing with aliases: two or more expressions denote the same memory address.
• Aliases can be introduced both by pointers and by procedure calls; handling them requires analysis of pointer assignments and data flow analysis of changed variables across procedure calls.

References
• Sethi, R. and J. D. Ullman, "The Generation of Optimal Code for Arithmetic Expressions," JACM, 1970, pp. 715-728.
• William Waite, University of Colorado, computer science course on-line notes.
• Kenneth C. Louden, Compiler Construction: Principles and Practices, PWS Publishing, 1997.
• Aho, Lam, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, Addison Wesley, 2006 (the purple dragon book).

CIS 631 / CSE 691 / CIS 400, Spring 2008
Notes for Code Generation Project

For the fourth part of the programming project, code generation, I have made a programming framework for code generation in Java. This can be used directly by students writing their project in Java, with modifications to fit their own Abstract Syntax Tree and Symbol Table designs, and it can be used for ideas by students writing their projects in other languages. The code generation operates by traversing the Abstract Syntax Tree and generating MIPS instructions at each node, in an order that produces a complete program. I am providing two partially written classes. The CodeGenerator class has a set of recursive methods for traversing the tree and generating code. The Code class appends
instructions to the output stream, with a set of methods that help you format the output. For this part of the project you will primarily complete the methods in the CodeGenerator class, some of which are either missing or only partially written.

In addition, I decided to add the allocation of simple variables to the existing code for the Parser and the Symbol Table. This is definitely not the only possible design; you could do the variable allocation directly in the CodeGenerator as you traverse the AST. In my design, the code generation class keeps the stack offset counter and interacts with the Parser and with the Symbol Table in allocating simple variables. Each variable is allocated to the runtime stack as either a global variable or a local variable (local to a procedure as a formal parameter name, or local to a block in a variable declaration). The code framework that I have provided does not discuss global variables, but they could be added. By allocating only local variables I didn't have to distinguish between local and global offsets in the symbol table.

I made two additions to the Parser that help manage the offset counter. When I parse a method declaration, I reset the offset counter to start counting again for the activation record of the newly entered method:

    // reset offset location counter at procedure entry:
    // offset 0 is used for return values, so offset 4 is the
    // first location for formal parameters
    CodeGenerator.setOffset(4);

Note that I have always left a space on the stack for a return value from a procedure, even if the procedure doesn't use it, as this simplifies code generation. When I parse a block, I have a flag that keeps track of whether this is the first block of a method or a local block, and in the former case I increment the offset counter to skip the standard saved parts ($fp and $ra) of the activation record:

    if (isMethodBlock) {
        // if it is the first block of a method, must jump the
        // stack allocation offset by 8
        CodeGenerator.incrementOffset(8);
    }

The reason that these offsets
are being managed by the parser is so that I can allocate simple variables as the parser inserts them into the symbol table. As part of my insertVariable method, I have a line that sets the location field in the new symbol table entry (of type Stentry):

    public static void insertVariable(Token name, Token type) {
        String actualName = name.text;
        Stentry actualEntry = new Stentry(actualName, 1, new STtype(type.text));
        actualEntry.setLocation(CodeGenerator.useOffset());
    }

In the CodeGenerator, when I write GenerateExpression, whenever the expression is an identifier I look up the name in the symbol table and get the value of the location. If this value is n, then the memory location of the variable on the stack is at offset n with respect to the current frame pointer. Note that the offsets are negative already; examples of n are -4, -8, -12, used as the address n($fp).

Please take a look at the CodeGenerator and Code classes, and don't hesitate to email me if there are further questions.
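As a footnote to the offset discussion above, the counter mechanics (setOffset, incrementOffset, useOffset) can be sketched as a tiny standalone class. The method names follow the notes, but this is a hypothetical illustration, not the actual project framework; in particular, how the negative $fp-relative offsets mentioned above relate to the counter is not fully recoverable from the notes, so this sketch keeps a plain positive counter and assumes negation happens elsewhere.

```java
// Minimal sketch of the stack-offset bookkeeping described in the notes.
// All names are illustrative; this is not the real CodeGenerator class.
public class OffsetDemo {
    private static int offset;

    static void setOffset(int o) { offset = o; }
    static void incrementOffset(int by) { offset += by; }

    // Hand out the current slot and advance by one 4-byte word.
    static int useOffset() { int loc = offset; offset += 4; return loc; }

    // Simulate parsing one method: two formals, then its first block.
    static int[] demo() {
        setOffset(4);            // slot 0 is reserved for the return value
        int p1 = useOffset();    // first formal parameter  -> 4
        int p2 = useOffset();    // second formal parameter -> 8
        incrementOffset(8);      // skip the saved $fp and $ra
        int v1 = useOffset();    // first local variable    -> 20
        return new int[]{p1, p2, v1};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Running demo() shows why the parser resets the counter per method: each activation record gets its own dense sequence of slots.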
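Finally, the iterative data-flow solving described in the optimization notes above, using the live-variable equations in(B) = use(B) ∪ (out(B) − def(B)) and out(B) = union of in(S) over successors S, can be demonstrated with a small self-contained sketch. The three-block CFG and all class and method names here are made up for the example.

```java
import java.util.*;

// Iterative live-variable analysis on a tiny hand-built CFG:
//   B0: a = 1        (defines a)
//   B1: b = a + 1    (uses a, defines b)
//   B2: print(b)     (uses b)
public class LiveVars {
    static final int N = 3;
    static int[][] succ = {{1}, {2}, {}};   // B0 -> B1 -> B2
    static Set<String>[] use, def, in, out;

    @SuppressWarnings("unchecked")
    public static Map<Integer, Set<String>> analyze() {
        use = new Set[N]; def = new Set[N]; in = new Set[N]; out = new Set[N];
        use[0] = set();     def[0] = set("a");
        use[1] = set("a");  def[1] = set("b");
        use[2] = set("b");  def[2] = set();
        for (int b = 0; b < N; b++) { in[b] = new HashSet<>(); out[b] = new HashSet<>(); }

        boolean changed = true;
        while (changed) {                    // iterate to a fixed point
            changed = false;
            for (int b = N - 1; b >= 0; b--) {   // a backward problem
                Set<String> newOut = new HashSet<>();
                for (int s : succ[b]) newOut.addAll(in[s]);
                Set<String> newIn = new HashSet<>(newOut);
                newIn.removeAll(def[b]);     // out(B) - def(B)
                newIn.addAll(use[b]);        // union use(B)
                if (!newIn.equals(in[b]) || !newOut.equals(out[b])) changed = true;
                in[b] = newIn; out[b] = newOut;
            }
        }
        Map<Integer, Set<String>> result = new HashMap<>();
        for (int b = 0; b < N; b++) result.put(b, in[b]);
        return result;
    }

    static Set<String> set(String... xs) { return new HashSet<>(Arrays.asList(xs)); }

    public static void main(String[] args) {
        // a is live into B1, b is live into B2, nothing is live into B0
        System.out.println(analyze());
    }
}
```

The same skeleton handles reaching definitions by iterating forward and swapping in the gen/kill equations.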