GLOBAL ENVIRONMNT 1
GLOBAL ENVIRONMNT 1 ENVIRON 0001A
Popular in Course
verified elite notetaker
verified elite notetaker
verified elite notetaker
PSY 205 - M001
verified elite notetaker
Eudijessica Melo de Oliveira
verified elite notetaker
verified elite notetaker
Popular in Environment
This 59 page Class Notes was uploaded by Alessia Gleason on Friday September 4, 2015. The Class Notes belongs to ENVIRON 0001A at University of California - Los Angeles taught by Staff in Fall. Since its upload, it has received 168 views. For similar materials see /class/177788/environ-0001a-university-of-california-los-angeles in Environment at University of California - Los Angeles.
Reviews for GLOBAL ENVIRONMNT 1
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/04/15
CS 352 Compilers Principles and Practice 1 Introduction 2 2 Lexical analysis 31 3 LL parsing 58 4 LR parsing 110 5 JavaCC and JTB 127 6 Semantic analysis 150 7 Translation and simplification 165 8 Liveness analysis and register allocation 185 9 Activation Records 216 1 Things to do make sure you have a working mentor account a start brushing up on Java review Java development tools a nd quot quot uwiuuumhtm add yourselfto the course mailing list by writing on a CS computer mailer add me to cs352 H oswng ror personal or classroom use rs granteo wrmoutree prowoeo that copes are notmaoe or orsmouteo ror p u v To copy om w m nm p mnn ron res Request permission to puorsn rrom nosxrngos puroue eou Chapter 1 Introduction Compilers What is a compiler a program r an executable program in another language a we expect the program produced by the compiler to be better in some way than the original What is an interpretel a program that reads an executable program and produces the results of running that program a usually this involves executing the source program in some fashion This course deals mainly with compilers Many of the same issues arise in interpreters Motivation nte rest Why study compiler construction Why build compilers Why attend class Isn t it a solved problem Machines are constantly changing Changes in architecture gt changes in compilers new features pose new problems a changing costs lead to different concerns a old solutions need reengineering Changes in compilers should prompt changes in architecture a New languages and features Compiler construction is a microcosm of computer science artificial intelligence algorithms th eory systems architecture greedy algorithms learning algorithms graph algorithms unionfind lattice theory for analysis allocation and naming locality synchronization pipeline management hierarchy management instruction set use Inside a compiler all these things come together Intrinsic Merit Compiler construction is challenging and fun interesting problems real results extremely complex interactions primary responsibility for performance blame new architectures gt new challenges Compilers have an impact on how computers are used Compiler construction poses some of the most interesting problems in computing I 39 Abstract view You have used several compilers sougce maci ine What qualities are important in a compiler CO e co e 1 Correct code errors 2 Output runs fast 3 Compiler runs fast Impl39cat39ons 4 Compile time proportional to program size 5 Support for separate compilation recognize legal and illegal programs 6 Good dIagnostIcs for syntax errors generate curred code 7 Works well with the debugger 8 Good diagnostics for flow anomalies 39 manage Storage Of all variables and owe 9 Cross language calls agreement on format for object or assembly code 10 Consistent predictable optimization yum 39 L39 Big step up from assembler higher level notations 9 ID Traditional two pass compiler A fallacy source machine code code s errors Implications intermediate representation IR Can we build ngtlt m compilers WIth n7quot components a front end maps legal code into IR 39 baCk and maps IR Unto talga maChine must encode all the knowledge in each front end simplify retargeting must represent all the features in one IR allows mumple from ends a must handle all the features in each back end a multiple passes gt better code Limited success With lowlevel le Front end source code Responsibilities recognize legal procedure a report errors produce IR preliminary storage map a shape the code for the back end Much of front end construction can be automated Front end source code a scanner IR errors Parser recognize contextfree syntax guide contextsensitive analysis a construct Rs produce meaningful error messages attempt error correction Parser generators mechanize much of the work Front end token s slugger errors Scanner maps characters into tokens the basic unit of syntax x x becomes ltid x ltid x ltid ygt character string value for a token is a leXeme typical tokens number id at I do end eliminates white space tabs blanks comments c a key issue is speed gt use specialized recognizer as opposed to lax Front end Contextfree syntax is specified with a grammar ltsheep noise baa l baa ltsheep noise This grammar defines the set of noises that a sheep makes under normal circumstances The format is called BackusNaur form BN F Formally a grammar G SNTP S is the start symbol N is a set of nonterminal symbols T is a set of terminal symbols P is a set of productions or rewrite rules P NeNu T Front end Front end Context free syntax can be put to better use Given a grammar quot4 quot ltgoalgt ltEprgt exprgt ltexprgt ltopgt ltterm gt lttermgt 1 2 3 4 lttermgt 5 6 7 number 1 ltexprgt l id 2 ltexprgt ltopgt lttermgt ltopgt 5 ltexprgt ltopgt y l 39 7 ltexprgt 2 ltexprgt ltopgt lttermgt y This grammar defines simple expressions with addition and subtraction 4 ltexprgt ltop over the tokens id and number 6 ltexprgt 2 y 3 lttermgt 2 y S ltgoalgt 5 2 T number id N ltgoalgt ltexprgt lttermgt ltopgt P1234567 To recognize a valid sentence in some CFG we reverse this process and build up a parse Front end Front end A parse can be represented by a tree called a parse or syntax tree So compilers often use an abstract syntax tree ltid2ygt ltidxgt ltnum2gt This is much more concise Abstract syntax trees ASTs are often used as an IR between front end and back end id xgt Obviously this contains a lot of unnecessary information Back end Responsibilities translate IR into target machine code choose instructions for each IR operation decide what to keep in registers at each point a ensure conformance with system inter ces Automation has been less successful here Back end Re errors gister Allocation have value in a register when used limited resources changes instruction choices can move loads and stores optimal allocation is difficult Modern allocalors often use an analogy to graph coloring Back end machine code errors Instruction selection produce compact fast code a use available addressing modes pattern matching problem ad hoc techniques tree pattern matching string pattern matching dynamic programming Traditional three pass compiler IR IR source 9 front middle back achlne cede cede errors Code Improvement analyzes and changes IR goal is to reduce runtime must preserve values Optimizer middle end s errors Modern optimizers are usually built as a set of passes Typical passes a constant propagation and folding The Tiger compiler code motion a reduction of operator strength E Z 3 quotquot E 32 W 3 Assembly m E a common subexpression elimination Alfgcllvv ls E Analvsls mm X E redundant store elimination Pass PassB i my 5 PassB 3 Passa Passl 39squot a dead code elimination 25 The Tiger compiler phases A Stra39ght39hne Ianguage A straightline programming language no loops or conditionals Sim t Sim Sim CompoundSLm Sim t id ssign trn Sim gt print ExpLisl PrintStrn Analysls check types of expresslons request translatlon of each Exp l id Exp t num NumExp Exp t Exp Binop Exp OpEXp Exp t Sim Exp quxp ExpLI39sI t Expo ExpLI39sI PairExpList ExpLI39sI t Exp LastExpList Binop gt Plus Binop gt Minus Binop t x Times Binop t 39 Analysls llveness calculates places Where each varlable 39 59 v t t t a53 blprintlaca 1 10xa printlb prints 8 7 Tree representation 53b printl ala 1l10 x a printb c ampamtm Asag tm c Dmme a opExp Assgsnn ansm l LasiExme NunExp ms Nur Exp b 1552th i lamp l PnntStm opExp l PmExansL NunExp Tunes lamp l l lamp LasiExme 1o 3 l l a OpExp lamp Mmhw NunExp a 1 This is a convenient internal representation for a compiler to use Chapter 2 Lexical Analysis Java classes for trees abstxact class stm class CompoundStm extends Stu 1311 Stml I11 i CompoundSthStm 51 Ste 52 stun51 stun52 class Assignsm extends Stm String id Exp exp AssignsthString 1 Exp a 3 class Pxintstm extends stm xpList nxps FrintstmxxpList n e e 3 abstxact class Exp las IdExp extends Exp String id IdExpString i idi Scanner source code class Numiixp extends Exp int mm Nnmzxpunt n mmn class DpExp extends Exp Exp left right in spar nal static int Flu51Himx52Tiuzes3Div4 UpExptExp 1 int 0 Exp 139 left1 opeto tightt class Esquxp extends Exp m stm Exp ax abstxact class ExpList class paixExpList extends ExpList Ex head Ex List tail public Pair 39xpList 39xp h EIpList t headh tailt class LastExpList extends ExpList p nad public Last 39xpList 39xp h head11 3D A errors maps characters into tokens the basic unit of syntax x become y S ltid x ltid x ltid ygt character string value for a token is a IeXeme typical tokens number I39d at I do end eliminates white space tabs blanks comments akeyissuei p e s s e d gt use specialized recognizer as opposed to lax H H mm ror personal or classroom use rs granreo wrmourree orovroeo mar copes are normaoe or orsmoureo ror prort or To copy omerwrse toreouorsn to oosron servers or to reorsmoure to lists reourres orrorsoeorrro oermrssron tooor ree Pequ uroue eclu esroermrssron to poolso rrom nosm gem1 32 Specifying patterns Specifying patterns A scanner must recognize various parts of the languages syntax A canrel Some parts are easy Other parts are much harder white space identifiers ltwsgt ltwsgt alphabetic followed bykalphanumerics 7 amp l ltwsgt t i t numbers integers 0 or digit from 19 followed by digits from 09 keywordsandoperat r I t t f 09 specified as literal patterns do end demma 539 m eger dlgl 5 mm reals integer or decimal E or digits 39om 09 comments complex real real opening and closing delimiters at at We need a powerful notation to specify these patterns 33 Operations on languages Regular r Operation De nition Patterns are o en specified as regular languages union ofL and M LUM 5 l S f L 0 51W Notations used to describe a regular language or a regular set include written L UM both regular expressions and regular gram ars concatenation ofL and M LM st l s r L and t C M written LM Regular expressions over an alphabet Z Kleene closure of L L JED L written Lt 1 a Is a RE denoting the set a positive closure ofL L UEI L39 written LJr ifa 2 then a is a RE denoting a ifr and s are REs denoting Mr and Ms then r is a RE denoting Mr 7 l s is a RE denoting LrULs rs is a RE denoting LrLs r is a RE denoting Ll r Ifws atlnnt a operators quot We assume closure then concatenation then alternation as the order of precedence 36 Examples identifier Ietter MalblclmlzlAlBlCl le digit 0l1i2l3l4l5l6l7 8l9 id gt letter letter i digit numbers integer r H 7 l e 0 l 1 l 2 l 3 l l 9 digit decimal gt integer digit real 7 integer l decimal E l digit complex r real real Numbers can get much more complicated Most programming language tokens can be described with REs We can use REs to build scanners automatically Examples Let Z 1171 1 alb denotes 171 2 albalb denotes aa hbmbb ie albalb mlabibaibb 3 a denotes SJLWLGMVH 4 11le denotes the set of all strings ofds and b s including 9 Le ale a b m ala b denotes a7bab7aab7aaab7aaaab7m Algebraic properties of REs From a regular expression we can construct a deterministic finite automaton DFA Recognizer for identi er error identi er letter gtalblclmiglAlBlClle digit Molll2l3l4l5l6l7l8l9 id gt letterletterl digit Code for the recognizer char e nertuchar state e 0 Ix code ror state 0 xI done err false tokenyalue e quotquot Ix empty string xI wh11e not done ss lt chanclassic bar state e nertstatec1assstate svithstate c Ix building an id xI toxenyalue lt toxenyalue char char e nertucharO Ix accept state xI e tirier yp men true error x toxemtype error done true brea return tokeustype Automatic construction Scanner generators automatically construct code from regular expression like descriptions construct a dfa use state minimization techniques emit code for the scanner table driven or direct code A key issue in automation is an interface to the parser lex is a scanner generator supplied with UNIX emits C code for scanner provides macro definitions for each token used in the parser Tables for the recognizer Two tables control the recognizer A 7772 07779 other 11 s 1 c are ass value letter letter digit other next est at e To change languages we can just change tables Grammars for regular languages n we place a restriction on the form of a grammar to ensure that it de scribes a regular language Provable fact For any RE 7 there is a grammarg such that r Lg The grammars that generate regular sets are called regular grammars Definition In a regular grammar all productions have one oftwo forms 1 A MA 2 A mm whereA is any nonterminal and a is any terminal symbol These are aiso called type 3 grammars Chomsky More regular languages Example the set of strings containing an even number of zeros and an even number of ones The REis00l11 01l1000l11 01l1000l11 Finite automata A nondeterministic finite automaton N FA consists of 1 a set of statesS 50775n 2 a set of input symbols 2 the alphabet 3 a transition function move mapping statesym bol pairs to sets of states 4 a distinguished start state so 5 a set of distinguished accepting or nal states F A Deterministic Finite Automaton DFA is a special case of an NFA 1 no state has a etransition and 2 for each state and input symbol a there is at most one edge labelled a leaving 5 A DFA accepts iff there exists a unique path through thetransition graph from the so to an accepting state such that the labels along the edges spell X 47 More regular expressions What about the RE ai b abb 7 State 50 has multiple transitions on a gt nondeterministic finite automaton DFAs and NFAs are equivalent 1 DFAs are clearly a subset ofNFAs 2 Any NFA can be converted into a DFA by simulating sets of simulta neous states a each DFA state corresponds to a set of NFA states a possible exponential blowup NFA to DFA using the subset 39 example 1 Constructing a DFA from a regular expression RE gtN FA ws m build NFA for each term conne t em with 9 moves NFA we moves to DFA construct the simulation the subset construction DFA gt minimized DFA merge compatible states DFA gt RE constructRfjRfk llREkl R11URfj 1 RE to NFA RE to NFA example albl abb Ne Na all NAB E 00 0 O NA albl NFA to DFA the subset construction lnput FA Output A DFA a With states Dstates and transrtrons Dtrans uchthatL D at N Method Letsbea state lnNandTbea set or states and using the following operatrons Definition et ofNFA States reachable from NFA State ol i estransltlol is alone set of NFA States reachable from some NFA State s in T on 6 transitions aone set 0 NFA States to which there l5 5 transition on input Symbol a from some NFA State 5 in T eclusurem mavema add state T secusuretsn t unmarked to Dstates while 1 unmarked state T m Dstates mark T foreach input symbol a Larsusur move a if U z Dstates then add U to Dstates unmarked Dtransra U endfor endwhile secusuretm is the start state or n A state era is acceptmg rr it eontams at least one acceptmg state m N Limits of regular languages Not all languages are regular One cannot construct DFAs to recognize these languages L fat L wcw l WCX Note neither of these is a regular expression DFAs cannot count But this is a little subtle One can construct DFAs for alternating 0 s and 1 s 8l101 8l0 sets of pairs of 0 s and 1s 01 l 10 r NFA to DFA using subset construction example 2 01241 D 1245619 1 2 3 4 678 E1 2 4 56710 C1 2 4 567 So what is hard Language features that can cause problems reserved words PLl had no reserved words if then then then else else else then signi cant blanks FORT N and Algol68 ignore blanks do 10 i 1 26 do 10 i 1 26 string constants special characters in strings newline tab quote comment delimiter nite closures some languages limit identifier lengths adds states to count length FORTRAN 66 gt 6 characters These can be swept under the rug in the language design How bad can it get 1 INTEGERFUNCTIONA 2 PAEAMETEElt16 2 3 IMPLICIT CHAMCTERth B A B 4 INTEGER EomATlt10IElt10DogE1 5 100 EoEMATlt4H3 a 200 FORMATM lt3 7 8 9 9E11 D09E112 IEltXgt1 10 IEltXgtH1 11 IFltXgt300200 12 300 CONTINUE 13 END c this is a comment 3 FILE1 14 END Example due to Drv FVKV Zadeck of IBM Corporation The role of the parser source code scanner amp errors Parser performs contextfree syntax analysis guides contextsensitive analysis constructs an intermediate representation a produces meaningful error messages attempts error correction For the nat few weeks we will look at parser construction H usklng for personal or classroom use ls granted Wlmom fee olovloeo mat cooles are notmaoe or Ellsmbuted for p u v To copy om w m nm 7 Nina Hm fee Pequestpemllsslon to poollsn from nosklngg porous eclu Chapter 3 LL Parsing Syntax analysis Contextfree syntax is specified with a contextfree grammar Formally a CFG G is a 4 tuple VV where V is the set of terminal symbols in the grammar For our purposes V is the set oftokens returned by the scanner Vquot the nonterminals is a set of syntactic variables that denote sets of su strings occurring in the language These are used to impose a structure on the grammar S is a distinguished nonterminal SE an denoting the entire set of strings in This is sometimes called a goal symbol P is a finite set of productions specifying how terminals and nonterminals can e com ined to orm rings in the lan uage Each production must have a single nonterminal on its le hand side The set V V UV is called the vocabulary ofG Notation and terminology lfA 77 39ythen 091B gt 0043 is a singlestep derimiiun using A 77 7 Similarly gt and gt denote derivations of 2 0 and 21 steps IfS B then B is said to be a senieniial form of G LG w C V is w w MG is called a sentence of G Note up BC V is mnV 61 Scanning vs parsing Where do we draw the line a 2A zja 2A 2 i 0 quot793 9El0 93 l l expr term op term Regular expressions are used to classify identifiers numbers keywords REs are more concise and simpler for tokens than a grammar 39 39 be built 39om 39 quotquot39 grammars Contextfree grammars are used to count brackets begin imparting structure expressions end if then else Syntactic analysis is complicated enough grammar for C has around 200 produc ions Factoring out lexical analysis as a separate phase makes compiler more manageable 63 Syntax analysis Grammars are o en written in BackusNaurform BNF Example 1 expr 2 EXPF EXPFXOPXEXPF 3 l mm 4 l m 5 op 2 6 l 7 i ii 8 l This describes simple expressions over numbers and identifiers In a BNF for a grammar we represent 1 nonterminals with angle brackets or capital letters 2 terminals with typewriter font or underine 3 productions as in the example Derivations We can view the productions ofa CFG as rewriting rules Using our example CFG goal U U U U U U ll 4 We have derived the sentence 1 2 a y We denote this goal 1 mm at id Such a sequence of rewrites is a derivation or a parse The process ofdiscovering a derivation is called parsing Derivations Rightmost derivation At each step we chose a nonterminal to replace For the string 1 2 a y goal gt This choice can lead to different derimtions gt gt Two are of particular interest gt gt gt Ie most derivation gt the lettmost nonterminal is replaced at each step gt rightmost derivation the rightmost nonterminal is replaced at each step Again goal 1 mm at id The previous example was a le most derivation Precedence Precedence These two derimtions point out a problem With the grammar It has no notion of precedence or implied order of evaluation To add precedence takes additional machinery 1 goal expr 2 expr expr term 3 l expr term 4 l 5 term term a factor 6 l term factor 7 l factor 8 factor mm a mum 9 i m Treewilk EVEgangquot wmpmes x 2 it y This grammar enforces a precedence on the derivation the wrong answer terms must be derived from expressions Should be 1 2 a y forces the correct tree Precedence Now for the string 1 2 a y goal gt gt ll ll ll ll ll ll ll Again goal 1 mm at id but this time we build the desired tree Ambiguity If a grammar has more than one derivation for a single sentential form then it is ambiguous Example 39 trnt s it exprthen stm t l if exprthen strntelse strnt l other stints Consider deriving the sentential form if E1 than if E then 31 6156 S It has two derivations This ambiguity is purely grammatical It is a contextfree ambiguity Precedence lt1dxgt ltnum2gt Treewalk evaluation computes x 2 a y Ambiguity May be able to eliminate ambiguities by rearranging the grammar strnt 39 matched unmatched matched if expr than matched else matched l other stints unmatched if expr than strnt l if expr than matched else unmatched 39quotquot but applies the common sense rule match each else With the closest unmatched then This is most likely the language designer s intent Ambiguity Ambiguity is o en due to confusion in the context 39ee specification Contextsensitive confusions can arise from overloading Example a f 17 In many Algollike languages i could be a function or subscripted variable Disambiguating this statement requires context a need values ofdeclarations not contextfree really an issue of type Rather than complicate parsing we will handle this separately Topdown versus bottomup Parsing the big picture Topdown p arsers start at the root of derivation tree and fill in picks a production and tries to match the input a may require backtracking some grammars are backtrackfree predictive Bottomup parsers start at the leaves and fill in a start in a state valid for legal first tokens as A k LW recognize valid prefixes use a stack to store both state and sentential forms tokens parser grammar generator parser code IR Our goal is a flexible parser generator system Topdown parsing A topdown parser starts With the root of the parse tree labelled With the start or goal symbol of the grammar To build a parse it repeats the following steps until the fringe of the parse tree matches the input string 1 At a node labelled A select a production A 77 0c and construct the appropriate child for each symbo o 0c 2 When a terminal is added to the 39inge that doesn t match the input string backtrack 3 Find the next node to be expanded must have a label in V The key is selecting the right production in step 1 gt should be guided by input string Simple expression grammar Example Recall our grammar for simple expressions Consider the input string 1 2 a y Example 1 goal expr 2 em expr ltterm 3 l expr term 4 l erm 5 term term factor 6 l term factor 7 l actor 8 factor 9 l id Prod n Sententlal form input 7 goal x 2 s y l x 7 2 s 2 x 2 7 y 4 x 2 7 y 7 x 2 7 y 9 x 2 7 y 7 x f 2 7 y 7 Tx 7 2 s y 3 ft 2 7 y 4 ft 2 7 y 7 ft 2 g y 9 ft 2 7 y 7 x f 2 7 y 7 2 2 g y 7 x 7 T s y 8 x 7 72 x y 7 x 2 1 s y 7 x f 7 y 5 x 7 t2 2 y 7 x 7 y s x 7 12 s y 7 x 7 2 r 2 y 7 x 7 2 Ty x 2 x 2y 7 x 7 2 s y 1 Leftrecursion Another possible parse for x 2 a y expr If the parser makes the wrong choices expansion doesn t terminate This isn t a good property for a parser to have Parsers should terminate wwwwwmw ltltltltltltltltltltltltltlt Topdown parsers cannot handle leftrecursion in a grammar Formally a gram mar is leftrecursive if FA C Vquot such thatA gt Act for some string on Our simple expression grammar is leftrecursive Eliminating leftrecursion To remove leftrecursion we can transform the grammar Consider the gramm ar 39agment foo foooc i B where on and B do not start with foo We can rewrite this as Bbar ocbar e where bar is a new nonterminal This fragment contains no leftrecursion Example This cleaner grammar defines the same language 1 goal expr 2 expr term expr 3 i term 7 war 4 i erm 5 term factor a term 6 i factor term 7 i factor 8 factor mm 9 i 11 It is rightrecursive free of e productions Unfortunately it generates different associativity Same syntax different meaning Exam ple Our expression grammar contains two cases of Ie recursion expr expr term i expr term i m term term a factor i term factor factor Applying the transformation gives expr term EXP expr termexpr i i termxexpr term factor term a terms factortermquot i i factor term a Wth this grammar a topdown parser will terminate backtrack on some inputs 52 Exam ple Our longsuffering expression grammar 1 2 term expr 3 ltcexmgtltexpvgt 4 lttermgtltexp gt 5 e 6 factor term 7 a factor term 8 factor term 9 e 10 factor mm 11 i id Recall we factored out leftrecursion How much lookahead is needed We sathaI opdown parsers may need to backtrack when they select the Wrong production Do we need arbitrary lookahead to parse CFGs in general yes a use the Earley or CookeYounger Kasami algorithms Aho Hopcroft and Ullman Problem 2 34 Parsing Translation and Compiling Chapter4 Fortunately a large subclasses of CFGs can be parsed with limited lookahead most programming language constructs can be expressed in a gram mar that falls in these subclasses Among the interesting subclasses are LL1 left to right scan leftmost derivation 1token lookahead and LR1 le to right scan rightmost derivation 1token lookahead Left factoring What ifa grammar does not have his propeny Sometimes we can transform a grammar to have this property For each nonterminalA nd the longest prefix on common to two or more of its alternatives if on c then replace all oftheA productions A ocB1 0ch l l on with A r orA A gtB1lel an where A is a new nonterminal Repeat until no two alternatives for a single nonterminal have a common prefix Predictive parsing Basic idea For any two productions177 06 l B we would like a distinct way of choosing the correct production to expand For some RHS on C G define FlRSToc as the set of tokens that appear rst in some string derived from on That is for some w C V w C FlRSToc iff on gt wy Key property Whenever two productionsA 77 0c and A 77 B both appear in the grammar we would like FlRSToc FlRST i This would allow the parser to make a correct choice with a lookahead of only one symbol The example grammar has his propeny Example Consider a rightrecursive version of the expression grammar 1 ltsoa1gt ltcxprgt 2 expr mmltexprgt 3 l ltmmgt7ltexprgt 4 l lt gt 5 term factor a term 6 l factor term 7 l factor 8 factor 9 l id To choose between productions 2 3 amp 4 the parser must see past the mm or 11 and look at the it or FlRST2 FlRST3 DFiRSTM 1 This grammar fails the test Note This grammars rightrassocatve Example There are two nonterminalsthat must be le factored expr term expr i term expr i rm term factor a term i i actor term factor Applying the transformation gives us cxpr wimXexpr expr expr i expr l a factor term a y ten n e 59 Example Sentential form xgoali Y Y Y Y Y 2 s y The next symbol determined each choice correctly 91 Example Substituting back into the grammar yields a factor term a ar term l lttmgt l a factor num l id Howwumuawm Now selection requires only a single token lookahead Note This grammars sm ngnrassumarve Back to leftrecursion elimination Given a le factored CFG to eliminate leltrecursion ifA Aoc then replace all oftheA productions A mummy whereNandA are new productions Repeat until there are no le recursive productions Generality Question By Iefi factoring and eliminating le recursion can we transform an arbitrary contextfree grammar to a form where it can be pre dictiver parsed with a single token lookahead Answer Given a contextfree grammar that doesn t meet our conditions it is undecidable whether an equivalent grammar edsts that does meet our conditions Many contextfree languages do not have such a grammar M0b ln21Uaquot1bZ ln 21 Must look past an arbitrary number of ds to discover the 0 or the 1 and so t determine the deriva Ion Recursive descent parsing term if factor ERROR then return E else return termprime termprime if token MULT then token m nerLtokeno return termO else if token DIV then token m nerLtokeno return termO else return 0K factor if token mm then n m nerLtokenO return 0K else if token ID then token m nerLtokeno return 0K else return Emh Recursive descent parsing Now we can produce a simple recursive descent parser 39om the right associative grammar token m nexthokenO if exprO man i token all then return ERROR xpr if termo ERROR then return ERROR else return exprprime exprprime if token PLUS then token m nexthokenO return exprO else if token MINUS then to en m nextjokeno return exprO else return UK Building the tree A quot 39 39 quot isIobuild 39 quot39 of the source code To build an abstract syntax tree we can simply insert code at the appropri ate points factor can stack nodes id num termprime can stack nodes 1 telD20 can pop 3 build and push subtree exprprime can stack nodes exprQ can pop 3 build and push subtree goalQ can pop and return tree Nonrecursive predictive parsing Observation Our time stack or call stack Using recursive procedure calls to implement a stack abstraction may not be particularly efficient This suggests other implementation methods explicit stack handcoded parser stackbased tabledriven parser Tabledriven parsers A parser generator system o en looks like Source code grammar parser generator This is true for both topdown LL and bottomup LR parsers Nonrecursive predictive parsing Now a predictive parser looks like source code scanner Rather than writing code we build tables Building tables can be automated Nonrecursive predictive parsing Input a string w and a parsing tableror G tos m 0 Stacktos 4 EOF Stacktos 4 Start Symbol token m nexthokenO repeat x 4 Stackitos if x is a terminal or EDF then if x token tnen pop token m nexthokenO else errorO else Xisanonterminal if MXtoken X H39le Yk then pop x S kayk 1 Y1 else errorO until x EDF Nonrecursive predictive parsing What we need now is a parsing IableM Our expression grammar lts parse table expr lttermgtltexprgt ltexpr rltexprgt 2 factor terms it ten nl 2 factor AHA s c n B l we use 39 to represent EUF FOLLOW For a nonterminalA define FOLLOWA as the set of terminals that can appear immediately to the right ofA in some sentential form Thus a nonterminal s FOLLOW set specifies the tokens that can legally appear a er i A terminal symbol has no FOLLOW set To build FOLLOWA 1 Put in FOLLOWgoal 2 lfA gt BB a Put FlRSTB e in FOLLOWB b lf ze ieA 77 0d ors C FiRsTB ie 59 9 then put FOLLOWA in FOLLOWB Repeat until no more additions can be made FIRST For a string of grammar symbols on define FlRSToc as a the set ofterminal symbols that begin strings derived from on aCVt on focgt ethen e C FlRSToc FiRsT0c contains the set oftokens valid in the initial position in on To build FiRsT 1 Ier V then FiRSTX is X 2 Ierethen add eto FlRSTUI 3 le gtY1Yz Yk a Put FlRSTlY17S in FiRSTi X b Vi1ltigkife C FlRSTltY1 FlRSTK1 ie Y1 Yv1 s then put FlRSTfK7S in FiRSTX c IfSC FiRSTYl m FiRSTYkl then put a in FiRSTX Repeat until no more additions can be made LL1 grammars Previous de nition A grammar G is LL1 iff for all nonterminalsA each distinct pair of pro ductionsA B and A mwysatisfy the condition FiRSTBnFiRSTy 1 What ifA gt a Revised de nition A grammar G is LL1 iff for each set ofproductionsA 77 ocl l 0ch laquot 1 FiRSToc1FiRSToc2FiRSToci are all pairwise disjoint 2 If if ethen FiRSTocnFOLLOWA 11 gj 5m j lfG is efree condition 1 is sufficient LL1 grammars Provable facts about LL1 grammars 1 No le recursive grammar is LL1 2 No ambiguous grammar is LL1 3 Some languages have no LL1 grammar 4 A e free grammar where each alternative expansion forA begins with a distinct terminal is a simple LL1 grammar Example S 77 aSl a is not LL1 because FlRST0S FiRSTa a S was accepts the same language and is LL1 Example Our longsuffering expression gramm ar S 77 a gtFT E gtTE T leTle E gtEl Ele F gtidlnum LL1 parse table construction Input Grammar G Output Parsing table M Method 1 V productionsA 77 on a Va FlRSToc addA onto MIAa b If e C FlRSTocI i vb e FOLLOWA addA 0c toll141 ii If s c FOLLOWA then addA onto MlAj 2 Set each undefined entry ofMto error If MAa with multiple entries then grammar is not LL1 Note recall ab V 50 was a A grammar that is not LL1 strut if expr then strut if expr then strut else strut Leftfactored strut stmt if expr then strut strut l else strut l s Now FlRSTstmt 9 else Also FOLLOWstmt else But FlRSTstmt FOLLOWstmt else 1 On seeing else conflict between choosing stmt 2 else strut and stmt gt grammar is not LL1 The fix Put priority on strut 2 else strut to associate else with clos est previous then Error recovery Key notion For each nonterminal construct a set ofterminals on which the parser can synchronize When an error occurs looking forA scan until an element DfSYNCHA is found Building SYNCH 1 a FOLLOWA gta SYNCHM 2 place keywords that start statements in SYNCHA 3 add symbols in FlRSTA to SYNCHM If we can t match a terminal on top of stack 1 pop the terminal 2 print a message saying the terminal was inserted 3 continuethe parse ie SYNCHa V a Some definitions Recall For a grammar G with start symbol S any string on such that Sgt 0c is called a sentential form lfoc C V then on is called a sentence in LG Otherwise it isjust a sentential form not a sentence in LG A leftsentential form is a sentential form that occurs in the leftmost deriva tion of some sentence A rightsentential form is a sentential form that occurs in the rightmost derivation of some sentence H oswng ror personal or classroom use rs granreo wrmoutree prowoeo mat copes are notmaoe or distributed ror p u v To copy om w m nm p mnn Hm res Pequestpermsson to puorsn rrom noskmgg puroue eou Chapter 4 LR Parsing Bottomup parsing Goal Given an input string w and a grammar G construct a parse tree by starting at the leaves and working to the root The parser repeatedly matches a rightsentential form from the language against the tree s upper fron ier At each match it applies a reduction to build on the frontier each reduction matches an upper frontier of the partially built tree to the RHS of some production a each reduction adds a node on top of the frontier The final result is a rightmost derivation in reverse Example Consider the gramm ar 1 S 77 aABe 2 A 77 At 3 i 4 B ex 11 and the input string abbcde Sentential Form cde I ANM The trick appears to be scanning the input and finding valid sentential forms Handles w The handleA 77 B in the parse tree for ocBw Handles What are we trying to nd A substring 0c ofthe tree s upper frontier that matches some production A l on where reducing 0c toA is one step in the reverse of a rightmost derivation We call such a string a handle Formally a handle of a rightsentential form 7 is a production A l B and a po sition in y where B may be found and replaced by A to produce the previous rightsentential form in a rightmost derivation of39y ie if539fm XAW gtm ocBw then A l B in the position following on is a handle ofocBw Because 7 is a rightsentential form the substring to the right of a handle contains only terminal symbols Handles Theorem lf 39 39 39g 39 formquot 39 L dle Proof by de nition 1 G is unambiguous gt rightmost derivation is unique 2 gt a unique productionA l B applied to take 39ylv1to 39y 3 gt a unique position k at whichA l B is applied 4 gt a unique handleA 77 B Example The Ie recursive apression gramm ar original form Prod n Sentential Form 1 goal expr goal 2 cm expr term 1 3 l ltexprgt term 3 am 4 l term 5 term a factor 5 term 2 term rr factor 9 r 1 6 i ltterm factor 7 factor r m 7 l factor 8 m a 1d 8 factor mm 4 r m 9 l M 7 mm r m 9 mm 2 id Stack implementation One scheme to implement a handlepruning bottomup parser is called a shiftreduce parser Shi reduce parsers use a stack and an input buffer 1 initialize stack with 2 Repeat until the top ofthe stack isthe goal symbol and the input token is a find the handle ifwe don t have a handle on top of the stack shi an input symbol onto the stack b prune the handle ifwe have a handIeA i B on the stack reduce 0 P0P l B l symbols offthe stack ii push A onto the stack Handlepruning Tk t t A bottom pr u 1 L 4 V y To construct a rightmost derivation Squaw142 Yn71 7nw we set i to n and apply the following simple algorithm for i H downto 1 1 find the handle AgaBr in 7 2 replace B with A to generate 7F This takes 2n steps where n is the length of the derivation Example back to x 2 a y Stack input Action m hill 1 goal 2 expn i 3 i r 3 expr 7 term termr 1 4 5 term termr factor fact r tractor m t l 39 id mi am a tractor reduce 5 am reduce 3 193 goal accept 1 shift until top of stack is the right end of a handle 2 Find the left end ufthe handle and reduce 5 shiits 9 reduces 1 accept Shiftreduce parsing Shiftreduce parsers are simple to understand A shiftreduce parser has just four canonical actions 1 Shi next input symbol is shi ed onto the top of the stack 2 reduce right end of handle is on top of stack locate le end of handle within the stack pop handle off stack and push appropriate nonterminal LHS w accept terminate parsing and signal success A error call an error recovery routine The key problem to recognize handles not covered in this course LRk grammars Formally a grammar G is LRk iff 1 53 mm gtm ocBw and 2 Sag 78x sun ocBy and 3 FlRSTklw FlRSTkly gt My 78 Le Assume sentential forms ocBw and ocBy with common prefix ocB and common k symbol lookahead FlRSTkU FlRSTkW such that ocBw re duces to mm and ocBy reduces to 78x But the common prefix means ocBy also reduces to aAy for the same re sult Thus aAy yBx LRk grammars lnformally we say that a grammar G is LRk if given a rightmost derivation SYam 72 swat we can for each rightsentential form in the derivation 1 isolate the handle of each rightsentential form and 2 determine the production by which to reduce by scanning 7 from le to right going at most k symbols beyond the right end ofthe handle of 7 Why study LR LR1 grammars are o en used to construct parsers We call these parsers LR1 parsers everyone s favorite parser virtually all contad 39ee programming language constructs can be ex pressed in an LR1 form LR quot 39 a determin istic bottomup parser efficient parsers can be implemented for LR1 grammars LR parsers detect an error as soon as possible in a le toright scan of the input LR grammars describe a proper superset ofthe languages recognized by predictive ie LL parsers LLk recognize use of a production A i B seeing first k symbols of B LRk recognize occurrence ofB the handle having seen all ofwhat is derived from B plus k symbols oflookahead 124 Left versus right recursion Parsing review Right Recursion needed for termination in predictive parsers requires more stack space a right associative operators Le Recursion works fine in bottomup parsers limits required stack space le associative operators Rule of thumb right recursion for topdown parsers le recursion for bottomup parsers Chapter 5 JavaCC and JTB Recursive descent A hand coded recursive descent parser directly encodes a grammar typically an LL1 grammar into a series of mutually recursive proce dures It has most of the linguistic limitations of LL 1 LLk An LLk pd er 39 quot seeing only the first k symbols of its right hand side LRk An LRk parser must be able to recognize the occurrence ofthe right hand side of a production alter having seen all that is derived from that right hand side with k symbols oflookahead The Java Compiler Compiler Can be thought of as Lex and Yacc for Javaquot It is based on LLk rather than LALR1 Grammars are written in EBNF The Java Compiler Compiler transforms an EBNF grammar into an L Lk p arser Th k I y just like aYacc grammar can have embedded action codewritten in C a The lookahead can be changed by writing LUUKAHEAD a The whole input is given in just one file not two The JavaCC input format One file a header token specifications for lexical analysis a gram m ar Generating a parser with JavaCC javacc fortranjj generates a parser with a specified name javac Main java Main java contains a call of the parser java Main lt progf parses the program progf The JavaCC input format Example of a token specification TUKEN lt INTEGERLITERAL quot1quotquot9quot quot0quotquot9quot I quot0quot gt Example ofa production void StatementListReturnO Statemento quotreturnquot Expressiono quotquot The Visitor Pattern For objectoriented programming the Visitor pattern enables the definition of a new operation on an object structure without changing the classes of the objects Gamma Helm Johnson Vlissides Design Patterns 1995 Sneak Preview When using the Visitor pattern a the set of classes must be fixed in advance and a each class must have an accept method First Approach Instanceof and Type Casts List 1 The List object int sum boolean proceed true while proceed if 1 instanceof Nil proceed false else if l instanceof Cons sum sum Gons 1Aead 1 Cons 1tai1 Notice the two type casts Advantage The code is written without touching the classes Ni and Drawback The code constantly uses type casts and instanceof to determine what class of object it is considering First Approach Instanceof and Type Casts The running Java example summing an integer list interface List class Nil implements List 0 class Cons implements List Second Approach Dedicated Methods The first approach is not objectoriented To access parts of an object the classical approach is to use dedicated methods which both access and act on the subobjects interface List int sum We can now compute the sum of all components of a given Listobject 1 by writing 15um0 Second Approach Dedicated Methods class Nil implements List public int sumO 0 class Cons implements List int head List t 1 public int sumO return head tailsum t 1 Advanta e Tquot r and the code can be written in a systematic way Disadvantage For each new operation on Listobjects new dedicated methods have to be written and all classes must be recompiled Third Approach The Visitor Pattern The purpose of the accept methods is to invoke the visit method in the Visitor which can handle the current object class Nil implements List b 39c void acceptVisitor v t v visitthis class Cons implements List int head List tail public void acceptVisitor v t v visitthis Third Approach The Visitor Pattern The Idea Divide the code into an object structure and a Visitor akin to Func tional Programming Insert an accept method in each class Each accept method takes a Visitor as argument A Visitor contains a visit method for each class overloading A method for a class Ctakes an argument oftype C interface List void acceptVisitor v interface Visitor void visit l x void visitCons x Third Approach The Visitor Pattern The control flow goes back and forth between the visit methods in the Visitor and the accept methods in the object structure class SumVisitor implements Visitor 1 39nt um 039 i s public void visitmil x 0 public void visitCons x sum sum xmead x tail acceptthis SumVisitor sv new SumVisitorO Lacce tsv System out printlnltsv sum Notice The visit methods describe both 1 actions and 2 access of subobjects P The Visitor pattern combines the advantages ofthe two other approaches Frequent Frequent recompilation Instanceof and type casts No Dedicated methods Yes The Visitor pattern No The advantage of rs New methods without recompilation arr ugtlllg vrsuurs Tools that use the Visitor pattern JJTree from Sun Microsystems and the Java Tree Builder 39om Pur due University both frontends for The Java Compiler Compiler from Sun Microsystems The Java Tree Builder The Java Tree Builder JTB has been developed here at Purdue in my group JTB is a frontend for The Java Compiler Compiler JTB supports the building of syntax trees which can be traversed using visitors JTB transforms a bare JavaCC grammar into three components a a JavaCC grammar with embedded Java code for building a syntax tree a one class for every form of syntax tree node and a a default visitor which can do a depthfirst traversal of a syntax tree 143 Visitors Summary Visitor makes adding new operations easy Simply write a new visitor A visitor gathers related operations It also separates unrelated ones Adding new classes to the object structure is hard Key consid on are you most likely to change the algorithm applied over an object structure or are you most like to change the classes of objects that make up the structure Visitors can accumulate state a Visitor can break encapsulation Visitors approach assumes that the inter ce of the data structure classes is powerful enough to let visitors do theirjob As a result the pattern o en forces you to provide public operations that access internal state which may compromise its encapsulation The Java Tree Builder The produced JavaCC grammar can then be processed by the Java Com piler Compiler to give a parser which produces syntax trees The produced syntax trees can now be traversed by a Java program by writing subclasses of the default visitor Program JavaCC JTB a Javacc grammar gt Java Compnera Parser grammar with embe ompiler dded c Java code Syntaxtreenode Syntax tree c1asses with accept methods Default visitor Using JTB generates jtboutjj generates a parser with a specified name Mainjava contains a call of the parser and calls to visitors builds a syntax tree for progf and executes the visitors jtb fortranjj javacc jtboutjj javac Main java java Main lt progf Example simplified JTB produces a syntaxtreenode class for Assignment public class Assignment implements Node PrimaryExpression f0 Assignmentoperator f1 Expression f2 public Assignment PrimaryExpression n0 Assignmentoperator n1 Expression n2 f0n0 f1 n1 f2n2 public void acceptvisitorVisitor v i v visitthis Notice the accept method it invokes the method visit for Assignment in the default visitor 147 Example simplified For example consider the Java 11 production void Assignment 0 PrimaryExpressionO Assignment peratoro Expression JT B produces Assignment Assignment 0 PrimaryExpression n0 0 gt1 0 m m m 0 Ex n0PrimaryExpressionO n1 sigment pera torO n2Expression return new Assignmentn0n1n2 Notice that the production returns a syntax tree represented as an Assignment o is Examplequot Iquot The default visitor looks like this public class DepthFirstVisitor implements Visitor f0 gt PrimaryExpressionO f1 gt AssignmentUperator f2 gt Expression public void visitAssignment n nf0acceptthis nf1acceptthis nf2acceptthis Notice the body of the method which visits each of the three subtrees of the Assignment node Exam ple 39 Here is an example of a program which operates on syntax trees for Java 11 programs The program prints the righthand side of every assignment The entire program is six lines39 public class Vprinussignmis extends DepthFirstVisitor void visitCAssignment n Wrettyprinter v new VPrettyPrinterO nf2acceptv mounprintlno nf2acceptthis When this visitor is passed to the root of the syntax tree the depthfirst traversal will begin and when Assignment nodes are reached the method visit in VprintAssignBHS is executed Notice the use of VPrettyPrinter It is a visitor which pretty prints Java 11 programs JTB is bootstrapped Semantic Analysis as discovered by the parser Semantic routines interpret meaning ofthe program based on its syntactic structure a two purposes finish analysis by deriving contextsensitive information begin synthesis by generating the IR or target code associated with individual productions of a contat free grammar or subtrees of a syntax tree H oswng ror personal or classroom use rs granreo wrmourree provided mar copes are normaoe or orsmpureo ror p u u To copy mm W m nm p mnn Hm res Request permission to puprsn rrom noskmgg puroue eou Chapter 6 Semantic Analysis Contextsensitive analysis What contextsensitive questions might the compiler ask Is 1 scalar an array or a function Is 1 declared before it is used Are any names declared but not used Which declaration ofx does this reference Is an expression typeconsistent Does the dimension ofa reference match the declaration Where can x be stored heap stack Does 5 reference the result of a malloc Is 1 defined before it is used Is an array reference in bounds Does function too produce a constant value NT PFOQDNQQwat Can p be implemented as a memofunction These cannot be answered with a contextfree grammar Contextsensitive analysis Why is contextsensitive analysis hard answers depend on values not syntax questions and answers involve nonlocal information answers may involve computation Several alternatives abstract syntax tree specify nonlocal computations attribute grammars automatic evaluators symbol tables central store for facts express checking code language design simplify language avoid problems Symbol table information What kind ofinformation might the compiler need a textual name a data type a dimension information for aggregates declaring procedure a lexical level of declaration storage class base address offset in storage a if record pointer to structure table a if parameter by reference or byvalue can it be aliased to what other names a number and type of arguments to functions Symbol tables For compiletime efficiency compilers often use a symbol table associates lexical names symbols with their attributes What items should be entered a variable names a defined constants a procedure and function names a literal constants and strings source text labels compilergenerated temporaries we ll get there r forquot 39 r em 439 L A symbol table is a compiletime structure Nested scopes blockstructured symbol tables What information is needed a when we ask about a name we want the most recent declaration the declaration may be from the current scope or some enclosing innermost scope overrides declarations from outer scopes Key point new declarations usually occur only in current scope What operations do we need a void put Symbol key Object value binds key to value I Obj ect get Symbol key returns value bound to key a void beginScopeO remembers current state of table a void endScopeO restores table to state at most recent scope that has not been ended May need to preserve list of locals for the debugger Attribute information Attributes are internal representation of declarations Symbol table associates names with attributes Names may have different attributes depending on their meaning variables type procedure level frame offset types type descriptor data sizyalignment constants type value procedures formals namestypes result type block information lo cal decls frame size Type descriptors Type descriptors are compiletime structures representing type expres sions eg char x char lpointerliriteger gtlt pointer or X pointer char char integer char integer Type expressions Type expressions are a textual representation for types 1 basic types boolean char integer real etc 2 type names 3 Luquot uuusu r Frquot 4mr r a arrayIT denotes array of elements type T index typeI eg array110integer b T1 x T2 denotes Cartesian product of type apressions T1 and T2 c records fields have names eg recorda x integer b x real d pointerlT denotes the type pointer to object oftype Tquot e D gtR denotes type offunction mapping domain D to rangeR eg integer x integer l integer Type compatibility Type checking needs to determine type equivalence Two approaches Name equivalence each type name is a distinct type Structural equivalence two types are equivalent iff they have the same structure a er substituting type expressions for type name a s t iff s and t are the same basic types awry5152 2 Wrt1tz iff 51 Etl and 52 212 pointed pointerlt iff s t 51522t14rt2i 512t1andszztz Type compatibility example Consider type link Teen var next link last link p Tcell q r Tcell Under name equivalence next and last have the same type a p q and 3 have the same type a p and next have different type Under structural equivalence all variables have the same type AdaPascalModulaZlTiger are somewhat confusing they treat distinct type definitions as distinct types so p has different type 39om q and 3 Type compatibility Pascalstyle name equivalence Build com piletime structure called a type graph each constructor or basic type creates a node a each name creates a leaf associated with the types descriptor next last I q r l x r i link pointer pointer pointer cell Type expressions are equivalent if they are represented by the same node in the grap Type compatibility recursive types Type compatibility recursive types Consider type link Teen cell record info Integer next 11 en We may want to eliminate the names from the type graph Eliminating name link from type graph for record cell record info integer next pointer cell Allowing cycles in the type graph eliminates cell cell record info integer next pointer Tiaer IR trees iNST integer constant i NAME Symbolic constantn a code label 71 TEMP Temporary one ofany ndmoerot registers t 8 Aooiication ofbinary operatora 0682 PLUS MWUS MUL DW AND OR x oitWise logical Chapter 7 Translation and Simplification EEEHSH FT arithm l L AiZ hff l to integer operands a MW Contents ota Word of rnernory starting at address 9 0 LL A 7 i M fei en ESEO Expression sequence evaluate s ior siderenects then 9 for result Se H n kinn tor personai or ciassroom use is granted Witnoutree provided mat copies are notmade or distrinuted tor protor To copy otnervwse torepuniisn to pastai servers ortoredistnnuteto iists requires priorspeciic permission andor tee Request permission to puniisn from noskmgg purdue eclu 165 166 Tiuer IR trees A TEMP e Evaluateeintoternporaryt K39nds 0f expresswns MOVE Expression kinds indicate how expression might be usedquot MEM 9 Evaluate ei yieiding address ai as into Word ata Exexp apressions that compute a value a Nxstm statements expressions that compute no value EiXP Evaluate e and discard resuit Cx conditionals jump to true and false destinations e Releop left right T t t i t dd 1 s rang eroom r0 0 a regs 9 quot lfThenElseExp expressionstatement depending on use are aii possioie values fore valuate a lnen 92 yieiding a and bi respectively compare a With 1 Conversion operators allow use of one form in context of another do 921 uggg Legationai operatoro i Signed and uriSig ed Weger unEx convert to tree expression that computes value ofinner tree LT GT LE GE si ned gtULT ULE UGT UGE unsigned uan convert to tree statement that computes Inner tree but returns no umptot iftl ue flffalse EO 39 SA Statement Mowed by uant f convert to statement that evaluates Inner tree and branches to a true destination if nonzero false destination otherWIse LA iiEL Denne constant value of name 7 as current code address NAMEin can be used as target otiumos caiis etc m7 iea Translating Tiger Simple variables fetch with a MEM M M BI OP ExMEMTEMP fpCONST k PLUS TEMP fp CONST k where fp is home frame of variable found by following static links k is offset ofvariable in that level Tiger array variables Tiger arrays are pointers to array base so fetch with a MEM like any other variable ExMEMTEMP rp CONST 1 Thus for eij ExMEMeunEx xiunEx CONST w i is index expression and w is word size all values are wordsized scalar in Tiger Note must first check array index i lt sizee runtime will put size in word preceding array base Control structures Basic blocks a a sequence of straightline code a if one instruction executes then they all execute a a maximal sequence ofinstructions without branches a label starts a new basic block Overview of control structure translation control flow links up the basic blocks ideas are simple implementation requires bookkeeping some care is needed for good code Translatina Tiaer Tiger record variables Agairi records are poiriters to record base so fetch like other variables Fore ExMEM t e Lli iEx CONST a Where a is the byte offset of the field i iri the record Note rnust check record ooihteris l iOl H ill l e J iOl hZel O String iiterais statically allocated so lost use the sthhg s iaoei ExNAMEabel Where the literal Will be emitted as word 11 label ascu quothello worldquot Record creation e f eh ef 9 in the prereraoiycc omeao rirst aiiocate e space theh ihitiaiize it Ex ESEOSEOMOVETEMP h exterhaiCaii aiiocRecoro i CONST n SEOMOVEMEMTEMP rye uhEm SEO MOVEMEMTEMP h CONST 71 1w 9n uriEX TEMP r where w is the Word size Array creation ag e e ExexterrialCall iriitArray q ul iEX e2 ul iEXD while loops while c do 5 1 evaluate c 2 iffalsejump to next statement a er loop 3 iftrue fall into loop body 4 branch to top of loop eg if notc jump done S jump test done Nx SEQSEQSEQLABEL est cuanbody done SEQSEQLABEL body suan JUMPNAME zest LABEL done repeat e until 92 evaluatecomparebrarich at bottom of loop for loops for i 21 to 22 do 5 1 evaluate lower bound into indalt variable 2 evaluate upper bound into limit variable 3 ifindex gt limitjump to next statement after loop 4 fall through to loop body 5 increment 39ndex 6 if index 3 limitjump to top ofloop body t1 e 21 t2 e 22 iftl gt232 jump done body 5 231 Ht 1 iftl g t2 jump body done For break statements a when translating a loop push the dune label on some stack a break simplyjumps to label on top of stack a when done translating loop and its body pop the label Com parisons Translate a op b as Releop aunEx bunEx When used as a conditional uantf yields CJUMPop aunEx bunEx t wheret and f are labels When used as a value unEx yields ESEQSEQMOVETEMP r CONST1 SEQuant f SE LABEL f SEQMOVETEMP r CONST 0 LABEL o TEMP r Function calls ew enl ExCALLNAME labsf slel en where 5 is the static link for the callee f found by following n static links from the caller n being the difference between the levels of the caller and the callee Conditionals Th k u 5 B I ifexpressions in Tiger abstract syntax egxlt 5 ampagtbturns into ifxlt 5then agtb else 0 Translate ifel then 22 else 23 into lfThenElseExpel 22 23 used as a value unEx yields ESEQSEQSEQ21 unQt f SEQSEQLABEL t SEQMOVETEMP r ezunEx JUMP join SEQLABEL f SEQMOVETEMP r e3unEx JUMPjoin LABEL join TEMP r As a conditional uantf yields SEQeluanttff SEQSEQLABELtt ezuant f SEQLABEL ff euant Conditionals Example Applying uanLf to if lt 5 then agt b else 0 SEQCJUMPLTxunEX CONST 5 tt SEQSEQLABEL tt CJUMPGT aunEX bunEX t f SEQLABEL ff JUMP f or more optimally SEQCJUMPLT XunEX CONST 5 tt j SEQLABEL tt CJUMPGT aunEX buneX t m Multidimensional arrays Array allocation constant bounds allocate in static area stack or heap no runtime descriptor is needed dynamic arrays bounds fixed at runtime allocate in stack or heap descriptor is needed dynamic arrays bounds can change at runtime allocate in hea descriptor is needed Onedimensional fixed arrays var a ARRAY 2 5 of integer ae translates to MEMTEMP fp CONST k Zw XCONST w e unEX where k is offset of static array from fp w is word size lnPa cal quot soAij is equivalent to Aii so can translate as above Multidimensional arrays Array layout Contiguous 1 Rowmajor Rightmost subscript varies most quickly A11 A12 A21 A22 Used in PL1 Algol Pascal C Ada Modula 3 2 Column major Le most subscript varies most quickly A11 A21 A12 A22 Used in FORTRAN By vectors u iguw mur of 39 r L 39quotm arrays rowmajorlayout array 1IJ1M of T array 1N of array 1P i of T no of elt s in dimension j UFLj1 position ofAi1 in nl in71 Ln719Dn in72 Ln72DnDnrl in 7L i1 L1D DZ which can be rewritten as variable part WW L1 2 Dquot LZD3ADn Lnern Lnl constant part address ofAi1 Hy in addressA variable part constant part x element size case statements caseEofV1Sl VSend One translation approach t expr jump test L1 code for S1 jump next L2 code for S2 jump next Ln code for Squot jump next test ift V1 jump L1 if t V2 jump Lg if t Vquot jum Ln code to raise runIime exception next case statements caseEofVl 51 V Squot end 1 evaluate the expression 2 find value in case list equal to value of expression 3 execute statement associated with value found 4 jump to next statement a er case Key issue finding the right case a sequence of conditional jumps small case set 0lcasesl binary search of an ordered jump table sparse case set 0logz lcasesl hash table dense case set 01 Simplification Goal 1 No SEQ or ESEQ Goal 2 CALL can only be subtree of EXP or MOVETEMP t Transformations li ESEQs up tree until they can become SEQs turn SEQs into linear list ESEOm ESEOQ L 9 ESEOSEOS1 32 e BlNOPwp ESEOQ e 92 ESEOQ BlNOPwp 91 92 MEMESEOamp ESEOampMEM JUMPESEO 81 SEO JUMPej CJUMPWQ VSEOQ CJUMPUD QL 2 2 F FOl ll ti 1 l woman 91 ESEom e2 ESEOMOVETEMPL91 an BNOPop TEMPLeQ mummy saw OVETEMP L 9 e ESEom 9 l 12 SEO CJUMPWpIEMPLeLt1 t2 S fVMOVHESEou el 9 EO MOVE9 e ALL 4 TEMPt Register allocation io 9 lns tru lun selection rnaonine Code errors Register allocation have value in a register when used Chapter 8 Liveness Analysis and Register Allocation limited resources changes instruction choices can move loads and stores optimal allocation is difficult gt NP com plete for k 21 registers Cupyright c 2EEE by Arituny L Husking Permission to make oigitai or naro copies or part or all orrnis Wu o o ai or oisrriouteo ror roit or u t rirst page To copy omervwse to repuoiisn to post on servers or to reoisrrioute to lists requires prior sperm permission oorree Request permission to pooisn from noskmgg puroue eclu ias iae Liveness analysis Control flow analysis Problem Before performing liveness analysis need to understand the control flow IR contains an unbounded number oftemporaries by bunding a control owgraph CFG machine has bounded number of registers n nodes may be individual program statements or basic blocks Approach edges represent potential flow of control temporaries with disjoint live ranges can map to same register Outedges from node n lead to successor nodes succn if not enough registers then spill some temporaries 7395 1955to quotOde 7 come from P EdECESSUF quotOdesi Wailquot ie keep them in memory The compiler must perform liveness analysis for each temporary Example It is live if it holds a value that may be needed in future a 0 z L1 b n1 c s c b 2 if a lt N goto L1 return c Liveness analysis Gathering liveness information is a form of data ow analysis operating over the CFG liveness of variables flows around the edges ofthe graph assignments de ne a variable v de v set of graph nodes that de ne v de n set ofvan39ables de ned by n occurrences ofv in expressions use it useiv set of nodes that use v 39 set ofvariables used in n Liveness v is live on edge e if there is a directed path from 2 to a use ofv that does not pass through any defi v v is livein at node n if live on any ofn s inedges v is liveout at n iflive on any of 11 s outedges v usen 2 v livein at n v livein at n 2 v liveout at all m predn v liveout at mv de n gt v livein at n Iterative solution for liveness foreachn innp 1 autn 1 repeat foreach n in39n inn aut39n autn in N menuautn defn autn ins until in 11 Notes a should order computation ofinner loop to follow the flow liveness flows backward along controlflow arcs from out to in n nodes can just as easily be basic blocks to reduce CFG size hm a I t 4 r Y along the way Liveness analysis variables livein at 11 variables liveout at n in resuch Succ n 1 2 ouIn Note inn Q usen inn Q ouIn de n usen and de n are constant independent of control flow Now v inn iff v usen orv ouIn de n Thus inquot usenu ouIn de nj Iterative solution for liveness Complexity for input program of sizeN gNnodes in CFG 25 N variables 2 Nelements per inou 2 0N time per setunion for loop performs constant number of set operations per node 2 0NZ time for for loop a each iteration of repeat loop can only add to each set sets can contain at most every variable 2 sizes of all in and out sets sum to 2N2 bounding the number of iterations ofthe repeat loop 2 worstcase complexity of OW ordering can cut repeat loop down to 23 iterations 2 0N or 0Nzl in practice Iterative solution for liveness Least fixed points There is o en more than one solution for a given dataflow problem see example Any solution to dataflow equations is a conservative approximation v has some later use downstream from n gt v C ouKn but not the converse 39 g 39 uie piugianijust means more registers may be needed Assuming a variable is dead when it is really live will break things May be many possible solutions but want the smallest the least fixpoint The iterative liveness computation computes this least fixpoint Register allocation by simplification 1 Build interference graph G for each program point a compute set of temporaries simultaneously live b add edge to graph for each pair in set 2 Simplify Color graph using a simple heuristic a suppose G has node m with degree lt K b ifG G m can be colored then so can G since nodes adjacent to 111 have at most K 1 colors c each such simplification will reduce degree of remaining nodes leading to more opportunity for simplification d leads to recursive coloring algorithm 3 Spill suppose j m ofdegree lt K a target some node temporary for spilling optimistically spilling node will allow coloring of remaining nodes b remove and continue simplifying Register allocation ip 9 lns tru lun selection errors rnacnine code Register allocation have value in a register when used a limited resources changes instruction choices a can move loads and stores optimal allocation is difficult gt NP com plete for k 21 registers Cupyright c 2EEE by Arituny L Husking Permission to make oigirai or naro copies or part or all orrnis wo o o 5 or distributed ror HJlt or u t r I copy ornervwse to repuoiisn to post on servers or to reoisrrioure a o Re u s W irstpage o o iists requires prior speoiio permission orree q esrperniission to puoiisn from noskmgg p e eclu t till 194 Register allocation by simplification continued 4 Select assign colors to nodes a start with empty graph b if adding nonspill node there must be a color for it as that was the basis for its removal c if adding a spill node and no color available neighbors already K colored then mark as an actual spill d repeat select 5 Start over it select has no actual spills then finished otherwise a rewrite program to fetch actual spills before each use and store a er each definition b recalculate liveness and repeat Coalescing Can delete a move instruction when sources and destination ddo not interfere w quot quot 39 those of s and d a In principle any pair ofnoninterfering nodes can be coalesced unfortunately the union is more constrained and new graph may no longer be K colorable overly aggressive Conservative coalescing Apply tests for coalescing that preserve colorability Suppose a and b are candidates for coalescing into node ab Briggs coalesce only ifab has lt K neighbors of signi cant degree 2 K simplify will first remove all insignificantdegree neighbors ab will then be adjacent to lt K neighbors simplify can then remove ab George 39 39 quot all 39g quot39 fere with b simplify can remove all insignificantdegree neighbors ofa remaining significantdegree neighbors ofa already interfere with b so coalescing does not increase the degree of any node Simplification with aggressive coalescing build aggressive coalesce Iterated register coalescing l mm l u u 2 Srmplrrv remove nonemoveerelated nodes of low degree one at a tlme a Coalesce conservatlvelv coalesce moverrelated nodes remove assoclated move lnstructlon o rt resultlng node ls nonrmoverrelated lt can now be slmollned 4 Freeze lfurlable to slmollrv orcoalesce a look formoverrelated node oflovvrdegree o freeze lts assoclated moves glve up hope orcoalesclng tnem c rlovv39 39 p 5 Splll lfl lo low degree nodes a select candldate forspllllrlg o remove to stack and contlrlue slmolllvlng 6 Select pop stack asslgmng colors lrlcludlrlg actual sollls 7 startover lfselecthas no actual solllstnen nmsned otnerWlse a rewrlte code to fetch actual sollls before eacn use and store aftereach deflrlltlorl o recalculate llveness and repeat ZED Iterated register coalescing Precolored nodes a t Lt t stack pointer ar guments return address return value a select and coalesce can give an ordinary temporary the same color as a precolored register ifthey don t interfere eg 39 be reused39 39 a tempo simplify freeze and spill cannot be performed on them s also precolored nodes interfere with other precolored nodes 80 treat precolored nodes as having infinite degree coalescing can use the George criterion Spilling Spills require repeating build and simplify on the whole program a To avoid increasing number of spills in future rounds of build can sim ply discard coalescences Alternatively preserve coalescences from before first potential spill discard those alter that point since unlike registers there is no limit on the number of stackframe loca tions Temporary copies of machine registers Since precolored nodes don t spill their live ranges must be kept short use move instructions N move callee save registers to fresh temporaries on procedure entry and back on exit spilling between as necessary registerpressure will spill the fresh temporaries as necessary other wise they can be coalesced with their precolored counterpart and the moves deleted 9 Callersave and calleesave registers Variables whose live ranges span calls should go to calleesave registers otherwise to callersave This is easy for graph coloring allocation with spilling calls interfere with callersave registers a a CrDSS ua 39 L39 with u I 1 u t as well as with the fresh temporaries created for callee save copies forcing a spill choose nodes with high degree but few uses to spill the fresh callee save temporary instead ofthe crosscall variable a this makes the original calleesave register available for coloring the crosscall variable Example cont Interference graph No opportunity for simplify or freeze all nonprecolored nodes have significant degree 2 K a Any coalesce will produce a new node adjacent to 2 K significant degree nodes Must spill based on priorities Node usesdefs usesdefs degree priority ou side loop inside loop 2 10gtlt 0 39 a l 4 050 b lt 1 10gtlt 1 3 4 275 c l 2 10x 0 I 6 033 a lt 2 10gtlt 2 4 550 1 10 3 1030 e x a Node c has lowest priority so spill it Example enter 0 3 a 39 1 b r2 1 0 e a loop 1 39 d b e 39 e 1 if e gt 0 goto loop 1 r3 0 return r1 3 live out Temporaries are a b c d e Assumetarget machinewith K 3 registers r1 r2 callersaveargumentresul r3 callee save h generator has already made arrangements to save 3 ex plicitly by copying into temporary a and back again we Example cont Interference graph with c removed a Only possibility is to coalesce a and e as will have lt K significant degree neighbors a er coalescing 1 will be lowdegree though high degree before Example cont Can now coalesce b with 12 or coalesce as and 1 a Example cont enter 1 7 Mcloc cl b 1 if e gt 0 goto loop 1 c 10 c return r1 r3 live out Example cont Cannot coalesce rlae with d because the move is constrained the nodes interfere Must simpliy d m a Graph now has only precolored nodes so pop nodes from stack col oring along the way d r3 a b e have colors by coalescing c must spill since no color can be found for it Introduce new temporaries c1 and oz for each usedef add loads be fore each use and stores alter each def Example cont New interference graph a As before coalesce awith 9 then b with r2 m 2 Example cont Example cont As before coalesce ae with 11 and simplify d Rewrite the program with this assignment dd 2 enter 12b 3 r3 Mcloc IS quota r1 r1 Pop 1 from stack select 3 All other nodes were coalesced or precol r2 2 r2 ored So the coloring is 3 2 0 a I r1 r1 r1 b 2 loop cr3 r2 r3r2 dr3 r1r11 331 if r1 gt 0 goto loop r1 2 r3 r3 Mc1oc r3 39 3 return r1 r3 live out 213 214 Example cant I Delete moves with source and destination the same coalesced enter Mcloc r3 r3 0 loop r2 2 r3 r2 r1 2 r1 1 if II gt 0 goto loop Chapter 9 Activation Records r1 r3 r3 Mc1oc return r1 r3 live out a One uncoalesced move remains The procedure abstraction The procedure abstraction The essentials Separate compilation a on entry establish p s environment allows us to build large programs a at a call preserve p s envtronment keeps compile times reasonable Y a on exit tear down p s envtronment requires independent procedures o In between addressability and proper lifetimes The linkage convention procedure F a social contract a machine dependent a division of responsibility The linkage convention ensures that procedures inherit a valid runtime environment and that they restore one for their parents Linkages execute at run time Code to make the linkage is generated at compile time H usklng ror personal or classroom use ls granted wrtnoutree orovloeo that cooles are notrnaoe or olstnouteo ror p u u To copy om w m nth n mnn Mn ree PequestJermlsslon to ouollsn from hosklngg ouroue eou Each system has a standard linkage 217 218 Procedure in kages Procedu re linkages quotgle39add esses The linkage divides responsibility between caller and callee argumemn E 3 Caller Gallee z e 5 E s E is mm 3 Call pre call prologue w Wm l allocate oaslc trarne l save reglsters state Wquot W 2 evaluate a store oararns 2 store rn dvnarnlc lll lk Assume that each procedure activation has variables 3 store return address 3 set new FF t d I I d f I 4 lump to cnlld 4 store statlc lan an assocla e ac lva Ion recur or rame a W 5 extend be frame run time n for local data tempamvles E 6 lnltlallze locals Assumpt39ons39 3 7 fallthroughto code RlSC architecture evedlems can always expand an allocated block immm Rem pusma my locals stored in ame gt coov return value l store return value a 2 deallocate oaslc trarne 2 restore state E Swag 3 restore parameters 3 cut back to oaslc trarne p3quot We ll coov out 4 restore parent s rn i lump to return address g Wmmm At compile time generate the code to do this At run time that code manipulates the frame a data areas Runtime storage organization To maintain the illusion of procedures the compiler can adopt some con ventions to govern memory use Code space a xed size statically allocated link lime Data space fixedsized data may be statically allocated variablesized data must be dynamically allocated a some data is dynamically allocated in code Control stack a dynamic slice of activation tree a return addresses a may be implemented in hardware Runtime storage organization Where do local variables go When can we allocate them on a stack Key issue is lifetime ufloca names Downward exposure called procedures may reference my variables a dynamic scoping ladcal scoping Upward exposure a can I return a reference to my variables functions that return functions continuationpassing style Mth only downward exposure the compiler can allocate the frames on the runtime call stack Runtime storage organization Typical memory layout high address free memory heap static data low address The classical scheme allows both stack and heap maximal freedom code and static data may be separate or intermingled Storage classes Each variable must be assigned a storage class base address Static variables addresses compiled into code relucalable usually allocated at compiletime limited to fixed size objects a control access with naming scheme Global variables almost identical to static variables a layout may be important exposed naming scheme ensures universal access Link editor must handle duplicate definitions Storage classes cont Procedure local variables Put them on the stack a if sizes are fixed iflifetimes are limited ifvalues are not preserved Dynamically allocated variables Must be treated differently callby reference pointers lead to nonlocal lifetimes usually an explicit allocation explicit or implicit deallocation Access to nonlocal data Two important problems arise How do we map a name into a leveloffset pair Use a blockstructured symbol table remember last lecture look up a name want its most recent declaration declaration may be at current level or any lower level Given a Ieveloffset pair what s the address Two classic approaches access links or static links displays Access to nonlocal data How does the code find nonlocal data at runtime Real globals visible everywhere naming convention gives an address initialization requires cooperation Lexical nesting view variables as leveloffset pairs chain of nonlocal access links a more expensive to find at runtim e Access to nonlocal data To find the value specified by La need current procedure level k kl gt local value a k gt1 gt nd is activation record a k lt1 cannot occur Maintaining access links static links a calling level k1 procedure 1 pass my FP as access link 2 my backward chain will work for lower levels a calling procedure at level 1 lt k 1 find link to level 1 1 and pass it its access link will work for lower levels compiletime The display To improve runtime access costs use a display a table of access links for lower levels a lookup is index from known offset takes slight amount oftime at call a a single display or one per frame a for level k procedure need k l slots Access with the display assume a value described by La nd slot as displaij add offset to pointer 39om slot displayUHaD Setting up the basic framequot now includes display manipulation Calllretu rn Assuming callee saves caller pushes space for return value caller pushes SP caller pushes space for return address static chain saved registers caller evaluates and pushes actuals onto stack caller sets return address callee s static chain performs call callee saves registers in registersave area uals callee allocates dynamic arrays as needed 0 N97 91quot PN on return callee restores saved registers jumps to return address 00 Caller must allocate much of stack frame because it computes the actual parameters Alternative is to put actuals below callee s stack frame in caller s common when hardware supports stack management eg VAX callee copies byvalue arraysrecords using addresses passed as ac Calls39 Saving and restoring registers callee saves callers registers callees registers all registers 1 3 5 caller saves 2 4 6 l 9 4s Call includes bitmap ofcaller s registers to be sayedrestored bestWitn sayerestore instructions to interpret bitmap directly rs s Callersayes and restores its own re i e o exceptlon5create some problems since uted g st Unstructured returns e g rlorlalocal got ore must be located and exec Backpatch code to saye registers used in callee on entry restore on exit e g i for Nonalocal gotos and exceptions must ul lWll ld dynamic cnain restoring calleeasaved registers Bitmap in callee s stack frame is used by callerto sayerestore bestWitn sayerestore instructions to interpret bitmap directly Ul lWll ld dynamic cnain as or Easy Nonalocal gotos and exceptions must restore all registers trom outermost callee Easy use utility routine to keep calls compact Nonalocal gotos and exceptions need only restore original registers trom caller Topaleft is best sayes fewerreglsters compact calling sequences MIPS procedure call convention Registers MIPS procedure call convention Philosophy Use full general calling sequence onlywhen necessary omit por tions of it where possible eg avoid using fp register whenever possible Classify routines as nonleaf routines routines that call other routines leaf routines routines that do not them selves call other routines leaf routines that require stack storage for locals leaf routines that do not require stack storage for locals MIPS procedure call convention Precall 0N Pass arguments use registers a pushed on the stack along with save space for at a3 Save callersaved registers if necessary Execute a jail instruction jumps to target address callee s first in struction saves return address in register ra 0 a3 remaining arguments are MIPS procedure call convention The stack frame argumentn virtual static link locals temporaries eZISeuJZJI argument build luwmemury MIPS procedure call convention Prologue 1 Leafprocedures that use the stack and nonleafprocedures a Allocate all stack space needed by routine b a sufficient space for arguments to routines called by this routine subu spframesize Save registers ra etc sw 31 framesizeframeoffset31 SH 17framesizeframeoffset431 sw 16framesizeframeoffset831 time constants 2 Emit code for routine
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'