Algorithmic Language Compilers (CS 553)
These 143 pages of class notes were uploaded by Betty Kertzmann on Tuesday, September 22, 2015. The notes belong to CS 553 at Colorado State University, taught by Staff in Fall. Since the upload, they have received 10 views. For similar materials see /class/210193/cs-553-colorado-state-university in Computer Science at Colorado State University.
Date Created: 09/22/15
Some Thoughts on Grad School

Goals:
- Learn how to learn a subject in depth.
- Learn how to organize a project, execute it, and write about it.

Iterate through the following:
- Read the background material.
- Try some examples.
- Ask lots of questions.
- Repeat.

You will have too much to do:
- Learn to prioritize; it is not possible to read ALL of the background material.
- Spend 2 hours of dedicated time EACH day on each class/project.
- Have fun and learn a ton.

CS 553 Lecture: Undergraduate Compilers Review

Announcements:
- Send me email if you have set up a group so we can set up a unix group. Include account names for the group.
- Each project is 10% of the grade. No curve.

Today:
- Overall structure of a compiler
- Lexical analysis (scanning)
- Syntactic analysis (parsing)
- Generating an AST
- Semantic analysis (type checking)
- Code generation

Structure of a Typical Compiler

    Analysis:  character stream -> lexical analysis -> tokens (words)
               -> syntactic analysis -> sentences -> semantic analysis -> annotated AST
    Synthesis: annotated AST -> IR code generation -> code generation -> target language

Structure of the MiniJava Compiler

    Analysis:  character stream -> lexical analysis -> tokens (words)
               -> syntactic analysis -> AST -> semantic analysis -> AST and symbol table
    Synthesis: AST and symbol table -> code generation -> MIPS

Course projects: PA2, MiniJava and MIPS warmup; PA3, constant expression compiler; PA4, add assignments and println; PA5, add control flow; PA6, add functions; PA7, add arrays and classes.

Lexical Analysis (Scanning)

Break the character stream into tokens (words).
- Tokens, lexemes, and patterns.
- Lexical analyzers are usually automatically generated from patterns (regular expressions), e.g., by lex.

Examples (table reconstructed from the garbled extraction):

    token       lexemes             pattern
    const       const               const
    if          if                  if
    relation    <, <=, =, ...       < | <= | = | ...
    identifier  foo, index          [a-zA-Z][a-zA-Z0-9]*
    number      3.14159, 570        [0-9]+ | [0-9]*.[0-9]+
    string      "hi mom"            "..."

For example, "const pi = 3.14159" scans to: const, identifier(pi), assign, number(3.14159).

Specifying Tokens with JLex

JLex example input file (a reconstructed excerpt; the extracted listing is garbled):
    package mj.parser;
    import java_cup.runtime.Symbol;
    LETTER=[A-Za-z]
    DIGIT=[0-9]
    UNDERSCORE="_"
    LETTDIG={LETTER}|{DIGIT}|{UNDERSCORE}
    ID={LETTER}{LETTDIG}*
    %%
    "&&"      { return new Symbol(sym.AND, new TokenValue(yytext(), yyline, yychar)); }
    "boolean" { return new Symbol(sym.BOOLEAN, new TokenValue(yytext(), yyline, yychar)); }
    {ID}      { return new Symbol(sym.ID, new TokenValue(yytext(), yyline, yychar)); }
    <<EOF>>   { return new Symbol(sym.EOF, new TokenValue("EOF", yyline)); }

Interaction Between Scanning and Parsing

- The parser repeatedly calls lexer.next_token(); the lexical analyzer consumes the character stream and returns one token at a time; the parser builds a parse tree or AST.

Recognizing Tokens with DFAs

- Example: a DFA that recognizes "if" versus identifiers, with states linked by letter and digit transitions. (The slide's state diagram is not recoverable from this extraction.)
- Ambiguity due to matching substrings is resolved by:
  - Longest match.
  - Rule priority.

Syntactic Analysis (Parsing)

Impose structure on the token stream.
- Limited to syntactic, i.e., high-level, structure.
- The parser often generates an abstract syntax tree (AST).
- Parsers are usually automatically generated from context-free grammars, e.g., by yacc, bison, CUP, javacc, or sablecc.

Example: "for i = 1 to 10 do a[i] = x * 5;" is scanned as

    for, id(i), equal, number(1), to, number(10), do,
    id(a), lbracket, id(i), rbracket, equal, id(x), times, number(5), semi

and parsed into a "for" node whose children represent the loop bounds and the assignment a[i] = x * 5.

Bottom-Up Parsing (Shift-Reduce)

Grammar:

    1: S -> E $
    2: E -> E + T
    3: E -> T
    4: T -> id

Rightmost derivation for "id + id + id" (expand the rightmost nonterminal first):

    S => E $ => E + T $ => E + id $ => E + T + id $ => T + id + id $ => id + id + id $

SableCC, yacc, and bison generate shift-reduce parsers:
- LALR(1): look-ahead, left-to-right scan, rightmost derivation in reverse, 1 symbol of lookahead.
- LALR is a parsing-table construction method that yields smaller tables than canonical LR.

Reference: Barbara Ryder's 198:515 lecture notes.

LR Parse Table

For the grammar above:

    State |  Action            |  Goto
          |  id    +      $    |  E   T
      0   |  s4                |  1   2
      1   |        s3   accept |
      2   |  r3    r3     r3   |
      3   |  s4                |      5
      4   |  r4    r4     r4   |
      5   |  r2    r2     r2   |

Look at the current state and input symbol to get the action:
- shift n: advance the input and push n on the stack.
- reduce k: pop |rhs| entries for grammar rule k (k: lhs -> rhs), then look up the state on top of the stack and the lhs in the goto table to get a state n, and push n.
- accept: stop and report success.
- error: stop and report failure.
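The longest-match and rule-priority disambiguation described in the scanning slides above can be sketched in a few lines. This is my own illustrative maximal-munch scanner, not the course's JLex-generated one; the rule set is hypothetical. Among all matching patterns the longest match wins, and ties go to the earlier (higher-priority) rule, so "if" lexes as a keyword while "iffy" lexes as an identifier.

```python
# Maximal-munch ("longest match") scanning with rule priority: a sketch.
import re

RULES = [                      # (token name, regex), in priority order
    ("IF",     r"if"),
    ("ID",     r"[a-zA-Z][a-zA-Z0-9]*"),
    ("NUMBER", r"[0-9]+(\.[0-9]+)?"),
    ("ASSIGN", r"="),
    ("WS",     r"\s+"),        # matched but skipped below
]

def scan(text):
    tokens, pos = [], 0
    while pos < len(text):
        best = None                          # (length, -rule index, name, lexeme)
        for i, (name, pat) in enumerate(RULES):
            m = re.compile(pat).match(text, pos)
            if m and m.end() > pos:
                cand = (m.end() - pos, -i, name, m.group())
                if best is None or cand[0] > best[0]:
                    best = cand              # longest match wins
                elif cand[0] == best[0] and cand[1] > best[1]:
                    best = cand              # earlier rule wins ties
        if best is None:
            raise SyntaxError(f"bad character at position {pos}")
        _, _, name, lexeme = best
        if name != "WS":
            tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens
```

For example, `scan("if iffy = 314")` yields the keyword, an identifier, an assign, and a number, in that order.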
Shift-Reduce Parsing Example

Parsing "a + b + c" with the table above:

    Stack            Input         Action
    0                a + b + c $   shift 4
    0 a 4            + b + c $     reduce 4 (T -> id)
    0 T 2            + b + c $     reduce 3 (E -> T)
    0 E 1            + b + c $     shift 3
    0 E 1 + 3        b + c $       shift 4
    0 E 1 + 3 b 4    + c $         reduce 4 (T -> id)
    0 E 1 + 3 T 5    + c $         reduce 2 (E -> E + T)
    0 E 1            + c $         shift 3
    0 E 1 + 3        c $           shift 4
    0 E 1 + 3 c 4    $             reduce 4 (T -> id)
    0 E 1 + 3 T 5    $             reduce 2 (E -> E + T)
    0 E 1            $             accept

Reference: Barbara Ryder's 198:515 lecture notes.

Subset of the MiniJava Expression Grammar

    Expression ::= Expression "+" Expression
                 | Expression "*" Expression
                 | <INTEGER_LITERAL>
                 | "(" Expression ")"

Full Expression Grammar

    Expression ::= Expression ("&&" | "<" | "+" | "-" | "*") Expression
                 | Expression "[" Expression "]"
                 | Expression "." "length"
                 | Expression "." Identifier "(" ExpressionList ")"
                 | <INTEGER_LITERAL>
                 | "true" | "false" | Identifier | "this"
                 | "new" "int" "[" Expression "]"
                 | "new" Identifier "(" ")"
                 | "!" Expression
                 | "(" Expression ")"

Grammar and AST Node Hierarchy

JavaCUP specification of part of the expression grammar:

    Expression ::= Expression "-" Expression
                 | Expression "*" Expression
                 | <INTEGER_LITERAL>
                 | "(" Expression ")"

AST node hierarchy: Node at the root, with subclasses Exp and Token; MinusExp, PlusExp, MulExp, and IntegerExp are subclasses of Exp.

Syntax-Directed Translation (AST Construction Example)

Grammar with production rules and actions:

    S -> E $
    E -> E + T    { new node("+", E.node, T.node) }
    E -> T
    T -> ID       { new leaf(id) }

For "a + b + c", the implicit parse tree follows the grammar, and the constructed AST is +(+(a, b), c).

Reference: Barbara Ryder's 198:515 lecture notes.

Using JavaCUP to specify the grammar and generate the AST: see src/mj/parser/mj.ast.cup.
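The table-driven driver loop behind the shift-reduce trace above can be sketched directly from the action/goto table. This is my own illustrative encoding of that table (state and rule numbers as in the notes), not the CUP-generated parser:

```python
# A table-driven LR driver for the notes' grammar:
#   1: S -> E $    2: E -> E + T    3: E -> T    4: T -> id
# "s4" = shift to state 4, "r3" = reduce by rule 3, "acc" = accept.
ACTION = {
    (0, "id"): "s4",
    (1, "+"): "s3", (1, "$"): "acc",
    (2, "+"): "r3", (2, "$"): "r3",
    (3, "id"): "s4",
    (4, "+"): "r4", (4, "$"): "r4",
    (5, "+"): "r2", (5, "$"): "r2",
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (3, "T"): 5}
RULES = {2: ("E", 3), 3: ("E", 1), 4: ("T", 1)}   # rule -> (lhs, |rhs|)

def parse(tokens):
    """Return the sequence of reductions performed, or raise on error."""
    stack, toks, reductions, i = [0], tokens + ["$"], [], 0
    while True:
        act = ACTION.get((stack[-1], toks[i]))
        if act is None:
            raise SyntaxError(f"unexpected token {toks[i]!r}")
        if act == "acc":
            return reductions
        if act.startswith("s"):                # shift: push state, advance input
            stack.append(int(act[1:]))
            i += 1
        else:                                  # reduce: pop |rhs|, goto on lhs
            r = int(act[1:])
            lhs, n = RULES[r]
            del stack[len(stack) - n:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(r)
```

Running `parse(["id", "+", "id"])` performs reductions 4, 3, 4, 2, matching the hand trace (only states are kept on the stack; the interleaved grammar symbols in the trace are for readability).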
Visitor Design Pattern

Situation:
- We want to perform some processing on all items in a data structure.
- We will be adding many different ways to process the items (different features).
- We will not be changing the classes of the data structure itself much.

Possibilities:
- For each feature, add a method to all of the classes.
  - Each new feature is spread over multiple files.
  - Sometimes we can't add methods to an existing class hierarchy.
- Use a large if-then-else statement in a visit method.
  - Pro: keeps all the code for the feature in one place.
  - Con: can be costly and involve lots of casting.
- Use the visitor design pattern (the style below is borrowed from SableCC).

SableCC Visitor Design Pattern

A reconstructed excerpt (the extracted listing is garbled):

    dstroot.apply(new EvalVisitor(dstroot));

    // in class MulExp
    public void apply(Switch sw) { ((Analysis) sw).caseMulExp(this); }

    // in class DepthFirstAdapter
    public void inMulExp(MulExp node)  { defaultIn(node); }
    public void outMulExp(MulExp node) { defaultOut(node); }
    public void caseMulExp(MulExp node) {
        inMulExp(node);
        if (node.getLExp() != null) node.getLExp().apply(this);
        if (node.getRExp() != null) node.getRExp().apply(this);
        outMulExp(node);
    }

    // in class EvalVisitor
    public void outMulExp(MulExp node) {
        stack.push(stack.pop() * stack.pop());
        if (node == root) System.out.println(stack.pop());
    }

Symbol Tables

- SymTable, Scope, and STE (symbol-table entry) classes are used in BuildSymTable.
- Operations include insert, lookup, pushScope(String), and popScope(); during analysis the open scopes form a stack, and overall the scopes form a tree in which each scope points to its enclosing scope.

Type Implementation in the MiniJava Compiler

    public class Type {
        public static final Type ARRAY = new Type();
        public static final Type BOOL  = new Type();
        public static final Type INT   = new Type();

        // class type map: key = class name, value = type
        private static final HashMap<String, Type> classTypes =
            new HashMap<String, Type>();
    }

- Only one instance of the Type object exists per atomic type and per class type, so to determine whether two types are equal we just compare references.
- The Type class does not know about inheritance.
- MiniJava types for the example program are Type.ARRAY, Type.BOOL, Type.INT, and the entries of Type.classTypes. (The slide's diagram is not recoverable from this extraction.)

Code Generation

Conceptually easy:
- The visitor that builds the symbol table (src/ast/visitors/BuildSymTable) allocates space for variables on the stack or in heap-allocated objects.
- A visitor over the AST then generates MIPS code.

The source of heroic effort on modern architectures:
- Instruction scheduling (for ILP).
- Register allocation.
- Alias analysis.
- More later.
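The depth-first visitor idiom above (children first, then an "out" hook that works on an explicit value stack) can be mimicked in a few lines of Python. This is my own sketch, not the SableCC-generated classes; the node and method names are chosen to echo them:

```python
# A Python analogue of the SableCC-style evaluating visitor above.
class IntegerExp:
    def __init__(self, value): self.value = value
    def apply(self, v): v.out_integer_exp(self)

class BinExp:
    def __init__(self, left, right): self.left, self.right = left, right
    def apply(self, v):                 # depth-first: children, then "out" hook
        self.left.apply(v)
        self.right.apply(v)
        self.out(v)

class PlusExp(BinExp):
    def out(self, v): v.out_plus_exp(self)

class MulExp(BinExp):
    def out(self, v): v.out_mul_exp(self)

class EvalVisitor:
    """Evaluates an expression tree with an explicit value stack,
    mirroring the stack.push(stack.pop() * stack.pop()) idiom above."""
    def __init__(self): self.stack = []
    def out_integer_exp(self, n): self.stack.append(n.value)
    def out_plus_exp(self, n):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a + b)
    def out_mul_exp(self, n):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a * b)

def evaluate(root):
    v = EvalVisitor()
    root.apply(v)
    return v.stack.pop()
```

Adding a new feature (say, pretty-printing) means writing one new visitor class rather than touching every node class, which is exactly the trade-off the slides describe.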
Stack Frame for MIPS: Example (Patt and Patel book)

The original slide steps through the MIPS assembly for a program along the lines of

    int foo(int x, int y) { return x + y; }
    void main() { printf("%d\n", foo(4, 5)); }

showing the caller pushing the arguments right to left, reserving a slot for the return value, and issuing jal foo; and the callee prologue saving $ra and $fp and setting up the new frame, with the epilogue popping locals, restoring $fp and $ra, and returning. (The extracted assembly listing is too garbled to reproduce.)

Patt and Patel Calling Convention for MIPS

A calling convention is a contract between the caller and the callee:
- The caller pushes parameters right to left onto the stack; the return-value slot (if the function returns a value) is pushed after them, immediately below the first parameter.
- $sp must be divisible by 4 for MIPS.
- $sp should always point at the top entry on the stack.

Standardizing the stack frame implementation for this course:
- $ra and $fp are stored on top of the return-value slot.
- Locals are stored on top of $ra and $fp.
- $fp is made to point at the first local variable, so the address of the first local is fp+0, the address of the second local is fp+4, and so on.
- The offsets of the incoming parameters differ based on whether there is a return value: with a return value, the first parameter is at fp+16 and the second at fp+20; without one, the first parameter is at fp+12 and the second at fp+16, and so on.

Structure of the MiniJava Compiler (recap): analysis (lexical analysis -> tokens -> syntactic analysis -> AST -> semantic analysis -> AST and symbol table) and synthesis (code generation -> MIPS), with projects PA2 through PA7 as listed earlier.

Concepts

Compilation stages in a compiler:
- Scanning, parsing, semantic analysis, intermediate code generation, optimization, code generation.
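The standardized frame layout above can be sketched as an offset calculator. The extracted text of the convention is garbled, so treat these numbers as one plausible reading of it (locals at fp+0, fp+4, ...; parameters starting at fp+16 when there is a return-value slot and fp+12 otherwise), not as the authoritative course convention:

```python
# Frame offsets under one reading of the course's MIPS convention.
# These offsets are an assumption based on the (garbled) notes above.
def local_offsets(num_locals):
    """Locals sit at fp+0, fp+4, ... because $fp points at the first local."""
    return [4 * i for i in range(num_locals)]

def param_offsets(num_params, has_return_value):
    """Incoming parameters, left to right; the extra word when a value is
    returned is the return-value slot sitting below the first parameter."""
    first = 16 if has_return_value else 12
    return [first + 4 * i for i in range(num_params)]
```

So for int foo(int x, int y), x would live at fp+16 and y at fp+20 under this reading.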
Lexical analysis (scanning):
- Tools: SableCC, lex, flex, etc.

Syntactic analysis (parsing):
- Tools: SableCC, yacc, bison, etc.
- Generation of the AST.

Semantic analysis:
- Symbol tables.
- Visitors over the AST.

Code generation to MIPS.

Parsing terms:
- See the attached slides; be familiar with these terms.

Next Time

Suggested exercises for the concepts covered this time:
- From the book: exercises 2.2.1, 2.2.2, and 2.3.1.
- Follow a while loop in MiniJava through to code generation:
  - What does the AST look like?
  - What does the IRT tree look like?
  - What is the MIPS-no-reg code?
  - How would we implement a do-while loop?

Next lecture: Compiling OOP.

Parsing Terms

Top-down parsing:
- LL(1): left-to-right reading of tokens, leftmost derivation, 1 symbol of lookahead.
- Predictive parser: an efficient, non-backtracking top-down parser that can handle LL(1) grammars.
- More generally, recursive descent parsing may involve backtracking.

Bottom-up parsing:
- LR(1): left-to-right reading of tokens, rightmost derivation in reverse, 1 symbol of lookahead.
- Shift-reduce parsers: for example, the parsers generated by bison, yacc, and SableCC.
- Methods for producing an LR parsing table:
  - SLR (simple LR).
  - Canonical LR (most powerful).
  - LALR(1).

Parsing Terms (continued)

Definitely know these terms.

Lexical analysis:
- Longest match and rule priority.
- Regular expressions.
- Tokens.

CFG (context-free grammar):
- Production rule.
- Terminal.
- Nonterminal.

Syntax-directed translation:
- Inherited attributes.
- Synthesized attributes.

Dynamic Compilation

Last time:
- Profiling.

Today:
- Dynamic compilation.

Motivation

Limitations of static analysis:
- Programs can have values and invariants that are known at run time but unknown at compile time; static compilers cannot exploit such values or invariants.
- Many of the motivations for profile-guided optimizations apply here.

Basic idea:
- Perform translation at run time, when more information is known.
- Traditionally, two types of translation are done:
  - Runtime code generation (JIT compilers).
  - Partial evaluation (staged compilation).

Partial Evaluation

Basic idea:
- Take a general program and partially evaluate it, producing a specialized program that is more efficient: e.g., f(a, b, c) -> f'(a, b), where f' has the third parameter hard-coded into its implementation. f' is typically more efficient than f.
- Exploits runtime constants: values that do not change during program execution (e.g., write-once variables).
- Exploiting runtime constants moves computation from run time to compile time:
  - Fold constants.
  - Eliminate memory operations.
  - Remove branches.
  - Unroll loops.

Applications with Runtime Constants

- Interpreters: the program being interpreted is a runtime constant.
- Simulators: the subject of the simulation (circuit, cache, network) is a runtime constant.
- Graphics renderers: the scene to render is a runtime constant.
- Extensible OS kernels: extensions to the kernel can be runtime constants.

Examples:
- A cache simulator might take the line size as a parameter.
- A partially evaluated simulator might produce a faster simulator for the special case of one particular line size.

Partial Evaluation (cont.)

An active research area with interesting theoretical results:
- We can partially evaluate an interpreter with respect to a program, i.e., compile the program (the 1st Futamura projection).
- We can partially evaluate a partial evaluator with respect to an interpreter, i.e., generate a compiler (the 2nd Futamura projection).
- We can partially evaluate a partial evaluator with respect to a partial evaluator, i.e., generate a compiler generator (the 3rd Futamura projection).
- Most PE research focuses on functional languages.
- Key issue: when do we stop partially evaluating the code when there is iteration or recursion?
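The interpreter example above can be shown in miniature. This is my own toy in the spirit of the 1st Futamura projection, not anything from the DyC papers: a general interpreter for a tiny "program" (a list of steps), and a specializer that folds the program, the runtime constant, away into a chain of closures with the dispatch removed:

```python
# Partial evaluation in miniature.
import operator

OPS = {"add": operator.add, "mul": operator.mul}

def interp(program, x):
    """General interpreter: re-dispatches on every op at every call."""
    for op, arg in program:
        x = OPS[op](x, arg)
    return x

def specialize(program):
    """Partially evaluate interp with respect to `program` (its runtime
    constant), producing a residual function with the dispatch folded away."""
    f = lambda x: x
    for op, arg in program:
        # capture the current chain, op, and constant in a fresh closure
        f = (lambda g, fn, a: (lambda x: fn(g(x), a)))(f, OPS[op], arg)
    return f
```

The residual function agrees with the interpreter on every input but no longer walks the program list or looks up ops.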
Dynamic Compilation with DyC

DyC (Auslander et al., 1996):
- Staged compilation: applies the ideas of partial evaluation, performing some of the partial evaluation at run time.
- Can handle more runtime constants than partial evaluation.
- Reminiscent of link-time register allocation, in the sense that the compilation is performed in stages.

Tradeoffs:
- Must overcome the runtime cost of the dynamic compiler:
  - Fast dynamic compilation -> low overhead.
  - High-quality dynamically generated code -> high benefit.
- Ideal: dynamically translate code once, then execute that code many times.
- Implication: don't dynamically translate everything; only perform dynamic translation where it will be profitable.

Applying Dynamic Compilation

System goal:
- Both fast dynamic compilation and high-quality compiled code.

How do we know what will be profitable?
- Let user annotations guide the dynamic compilation process.

System design:
- Dynamic compilation for the C language.
- Declarative annotations that:
  - Identify pieces of code to dynamically compile (dynamic regions).
  - Identify source-code variables that will be constant during the execution of dynamic regions.

Staged Compilation in DyC

- At static compile time, the static compiler consumes the annotated C code and produces a template, setup code, and directives; at dynamic compile time, the stitcher combines the template with runtime values to produce the executable program.
- Make the static compiler do as much work as possible; give the dynamic compiler as little work as possible.

Dynamically Compiled Code

The static compiler:
- Produces machine-code templates in addition to normal machine code; templates contain holes that will be filled with runtime constant values.
- Generates setup code to compute the values of these runtime constants.
- Together, the template and the setup code replace the original dynamic region; the setup code runs on the first entrance to the dynamic region.

The Dynamic Compiler (the Stitcher)

- Follows directives produced by the static compiler to copy code templates and to fill in the holes with the appropriate constants.
- The resulting code becomes part of the executable and is (hopefully) executed many times.
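The template-plus-stitcher staging above can be caricatured in Python. Real DyC patches machine-code templates; this sketch of mine only shows the staging idea, using a source template with holes that a "stitcher" fills with runtime constants before materializing executable code (all names here are hypothetical):

```python
# A caricature of DyC's template + stitcher staging.
TEMPLATE = """
def cache_index(addr):
    tag  = addr >> {tag_shift}                # hole: cache geometry
    line = (addr >> {block_shift}) & {line_mask}
    return tag, line
"""

def stitch(block_size, num_lines):
    """Fill the template's holes from runtime constants and 'stitch' it
    into real code. Assumes power-of-two sizes, as the shifts require."""
    block_shift = block_size.bit_length() - 1
    line_shift = num_lines.bit_length() - 1
    src = TEMPLATE.format(tag_shift=block_shift + line_shift,
                          block_shift=block_shift,
                          line_mask=num_lines - 1)
    ns = {}
    exec(src, ns)            # materialize the filled template
    return ns["cache_index"]
```

For the 512-line, 32-byte-block configuration used later in these notes, `stitch(32, 512)` yields a function that shifts by 14 for the tag and masks with 511 for the line, with no division and no loads of the cache geometry.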
The Annotations

The running example is an annotated cache lookup (reconstructed; the extracted listing is garbled, so the exact arithmetic and annotation syntax are approximate):

    CacheResult cacheLookup(void *addr, Cache *cache) {
        dynamicRegion(cache);            /* cache is a runtime constant */
        int blockSize = cache->blockSize;
        int numLines  = cache->numLines;
        int tag  = (int)addr / (blockSize * numLines);
        int line = ((int)addr / blockSize) % numLines;
        Set **setArray = cache->lines[line]->sets;
        int assoc = cache->associativity;
        unrolled for (int set = 0; set < assoc; set++) {
            if (dynamic(setArray[set]->tag) == tag)
                return CacheHit;
        }
        return CacheMiss;
    }   /* end of dynamic region */

- dynamicRegion(cache) identifies a block that will be dynamically compiled. Its arguments are runtime constants within the scope of the dynamic region; the static compiler will compute additional runtime constants that are derived from this initial set.
- dynamic: any type of data can be considered constant; in particular, the contents of arrays and pointer-based structures are assumed to be runtime constants whenever they are accessed by runtime-constant pointers. To ensure that this assumption is correct, users must insert the dynamic annotation to mark pointer dereferences that are not constant (here, the stored tags).
- unrolled directs the compiler to completely unroll a loop. Loop termination must be governed by runtime constants, and the static compiler can check whether the annotation is legal. Complete unrolling is a critical optimization: it allows induction variables to become runtime constants.
- dynamicRegion(key=cache) allows the creation of multiple versions of a dynamic region, each using different runtime constants. Separate code is dynamically generated for each distinct combination of values of the runtime constants.

The Need for Annotations

Annotation errors:
- Lead to incorrect dynamic compilation, e.g., incorrect code if a value is not really a runtime constant.

Automatic dynamic compilation is difficult:
- Which variables are runtime constants, and over which pieces of code? This is complicated by aliases and side effects: pointers can modify memory.
- Which loops are profitable to unroll? Estimating profitability is the difficult part.

The Static Compiler

Operates on a low-level IR:
- A CFG of three-address code in SSA form.

Tasks:
- Identifies runtime constants inside of dynamic regions.
- Splits the dynamic-region flow graph into set-up and template subgraphs.
- Optimizes the control flow for each procedure.
- Generates machine code, including the templates.
- In most cases, table space for runtime constants can be statically allocated (what do we do about unrolled loops?).
- Generates stitcher directives.

Detecting Runtime Constants

A simple dataflow analysis with the following transfer functions:
- x = y: x is a constant iff y is a constant.
- x = y op z: x is a constant iff y and z are constants and op is an idempotent, side-effect-free, non-trapping operation.
- x = f(y1, ..., yn): x is a constant iff the yi are constants and f is an idempotent, side-effect-free, non-trapping function.
- x = *p: x is a constant iff p is constant.
- x = *(dynamic p): x is not constant.

Detecting Runtime Constants (cont.)

Merging control flow:
- A value is considered constant after a merge only if it is constant along all predecessors.
- If the branch test is a runtime constant, we will always take one branch or the other, so merging x1 = 1 and x2 = 2 under a constant test still yields a constant: the merge (phi) is effectively idempotent.
- If the test is not constant, the merge is not idempotent, so the result x3 = phi(x1, x2) is not constant.

Integrated optimizations:
- For the best-quality code, optimizations should be performed across dynamic-region boundaries, e.g., global CSE and global register allocation.
- Optimizations can be performed both before and after the dynamic region has been split into setup and template code.
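The transfer functions above can be sketched for straight-line three-address code. This is my own simplification: the all-predecessors merge rule for control-flow joins is omitted, statements are (dst, op, args) triples, and the op names are hypothetical stand-ins for "idempotent, side-effect-free, non-trapping" operations:

```python
# Runtime-constant detection for straight-line three-address code.
PURE_OPS = {"copy", "add", "mul", "load"}   # idempotent, side-effect free

def runtime_constants(stmts, initial):
    """Return the set of names known constant after executing stmts,
    starting from the annotated runtime constants in `initial`."""
    consts = set(initial)
    for dst, op, args in stmts:
        known = all(isinstance(a, int) or a in consts for a in args)
        if op in PURE_OPS and known:
            consts.add(dst)        # x = y op z with constant operands
        else:
            consts.discard(dst)    # 'dynamic' loads, impure ops, unknown args
    return consts
```

On a fragment of the cache example, with cache annotated constant, the block size and derived geometry come out constant while anything touched by a dynamic load or a non-constant address does not.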
Restrictions on optimizing split code:
- Instructions with holes cannot be moved outside of their dynamic region.
- Induction variables become constant only for a given iteration of an unrolled loop, which limits how code that depends on them may move between dynamic regions.

The Stitcher

Performs directive-driven tasks:
- Patches holes in templates.
- Unrolls loops.
- Patches pc-relative instructions, such as relative branches.

Performs simple peephole optimizations:
- Strength reduction of multiplies, unsigned division, and modulus.

The End Result

The final dynamically generated code from our example, assuming a configuration of 512 lines, 32-byte blocks, 4-way set associative (reconstructed):

    CacheResult cacheLookup(void *addr, Cache *cache) {
        int tag  = (int)addr >> 14;
        int line = ((int)addr >> 5) & 511;
        Set **setArray = cache->lines[line]->sets;
        if (setArray[0]->tag == tag) goto L1;
        if (setArray[1]->tag == tag) goto L1;
        if (setArray[2]->tag == tag) goto L1;
        if (setArray[3]->tag == tag) goto L1;
        return CacheMiss;
    L1: return CacheHit;
    }

The Original Code Without Annotations

    CacheResult cacheLookup(void *addr, Cache *cache) {
        int blockSize = cache->blockSize;
        int numLines  = cache->numLines;
        int tag  = (int)addr / (blockSize * numLines);
        int line = ((int)addr / blockSize) % numLines;
        Set **setArray = cache->lines[line]->sets;
        int assoc = cache->associativity;
        for (int set = 0; set < assoc; set++) {
            if (setArray[set]->tag == tag)
                return CacheHit;
        }
        return CacheMiss;
    }

Performance Results

Two measures of performance:
- Asymptotic improvement: the speedup if the overhead were 0.
- Break-even point: the fewest number of iterations at which the dynamic compilation system is profitable.

Recoverable rows of the results table (two further rows and some units are illegible in this extraction):

    benchmark                asymptotic speedup   break-even point
    calculator               1.7                  916 interpretations
    matrix multiply          1.6                  3139
    sparse matrix multiply   1.8                  2645
    event dispatcher         1.4                  722 event dispatches
    quicksort                1.2                  3050 records
Evaluation

Today's discussion covered a simple caching scheme: set up once, reuse thereafter. More sophisticated schemes are possible:
- Caching multiple versions of code.
- Eager or speculative specialization.
- Different dynamic regions for different variables.

Recent progress on DyC:
- A more sophisticated language and compiler (Grant et al., 1999); more complexity is needed, and it is extremely difficult to annotate the applications.
- Automated insertion of annotations (Mock et al., 2000), using profiling to obtain value and frequency information.

Lessons

Is dynamic compilation worthwhile?
- Important for Java just-in-time compilers.
- Dynamo (HP Labs; Bala et al., 2000) dynamically translates binaries with no annotations. It achieved only modest performance improvements, but it has many other interesting uses (the DELI system):
  - Emulation of novel architectures.
  - Software sandboxing.
  - Software verification.

Next Time

Reading:
- Ch. 14; lecture: Compiling OOP.

Reuse Optimization

Last time:
- Dead code elimination.
- Common subexpression elimination (CSE).
- Copy propagation.
- Simple constants.

Today:
- Partial redundancy elimination (PRE).

Partial Redundancy Elimination (PRE)

Partial redundancy:
- An expression (e.g., x+y) is partially redundant at node n if some path from the entry node to n evaluates x+y, and there are no definitions of x or y between the last evaluation of x+y and n.

Elimination:
- Discover partially redundant expressions.
- Convert them to fully redundant expressions by inserting computations on the paths that lack them.
- Remove the redundancy.

PRE subsumes CSE and loop-invariant code motion.

Loop Invariance Example

PRE removes loop invariants:
- An invariant expression computed inside a loop is partially redundant.
- PRE converts this partial redundancy to full redundancy by inserting the computation before the loop.
- PRE then removes the redundancy inside the loop.

(The slide's CFG diagrams are not recoverable from this extraction.)

Implementing PRE: Lazy Code Motion (Knoop '92; Dragon book 2007)

Big picture:
- Use global (dataflow) analysis to discover where partial redundancy can be converted
to full redundancy.
- Global analysis also determines the latest possible point at which to create the redundancy.
- Insert code and remove the now fully redundant expressions.
- As in the textbook, we assume one statement per basic block.

Local Properties

- An expression is locally available (in e_gen(b)) in block b if it is computed at least once and its operands are not modified after its last computation in b.
- An expression is locally anticipated if it is computed at least once and its operands are not modified before its first evaluation.
- An expression is locally used (in e_use(b)) in block b if it is computed at least once. With only one statement per block, locally anticipated = e_use(b).
- An expression is locally killed (in e_kill(b)) in block b if any of its operands are defined in b.

Global Anticipability

Intuition:
- If e is globally anticipated at point p, then an evaluation of e at p will make the next evaluation of e redundant along all paths from p.

Flow functions (a backward, all-paths analysis):

    anticipated_out(n) = INTERSECTION over successors s of n of anticipated_in(s)
    anticipated_in(n)  = e_use(n) UNION (anticipated_out(n) - e_kill(n))

Global Availability for PRE

Intuition:
- Global availability for PRE is almost the same as available expressions, except that it depends on the results of global anticipability.
- If e is globally available at p, then an evaluation at p would create redundancy along all paths starting at p.

Flow functions (a forward, all-paths analysis):

    available_in(n)  = INTERSECTION over predecessors p of n of available_out(p)
    available_out(n) = (anticipated_in(n) UNION available_in(n)) - e_kill(n)
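The anticipability equations above can be run as a small iterative dataflow pass. This sketch of mine uses the notes' running example, assuming from the tables that B1 branches to B2 and B3 and that B2 falls through to B3 (the slide's diagram is garbled, so that CFG shape is an assumption):

```python
# Global anticipability as iterative backward dataflow:
#   ant_out(n) = intersection of ant_in(s) over successors s (empty at exits)
#   ant_in(n)  = e_use(n) | (ant_out(n) - e_kill(n))
def anticipated(succ, e_use, e_kill):
    nodes = list(succ)
    universe = set().union(*e_use.values())
    ant_in = {n: set(universe) for n in nodes}   # optimistic start
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if not succ[n]:
                out = set()                      # nothing anticipated at exit
            else:
                out = set(universe)
                for s in succ[n]:
                    out &= ant_in[s]
            new_in = e_use[n] | (out - e_kill[n])
            if new_in != ant_in[n]:
                ant_in[n], changed = new_in, True
    return ant_in
```

On the example (B1: a = b+c; B2: x = b+1; B3: a = b+c, no kills), b+c is anticipated everywhere and b+1 only at B2, matching the table in these notes.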
Earliest

Intuition:
- earliest(b) marks the earliest place an expression can be computed: where it is anticipated but not yet globally available.
- This does not require iterative dataflow analysis; it requires just one pass over all statements.
- We could place an expression-generation statement at the beginning of any block b where the expression is in earliest(b).

Function:

    earliest(n) = anticipated_in(n) - available_in(n)

Running example (reconstructed from the garbled tables): B1 is a = b + c and branches to B2 and B3; B2 is x = b + 1 and falls through to B3; B3 is a = b + c. Partial results:

    set                B1      B2          B3
    e_use              b+c     b+1         b+c
    e_kill             -       -           -
    anticipated_in     b+c     b+1, b+c    b+c
    available_in       -       b+c         b+c
    earliest           b+c     b+1         -

Postponable Expressions

Intuition:
- Calculates all program points, starting from earliest, to which an expression's computation can be postponed while preserving semantics and not introducing more redundancy.
- By computing an expression at the latest possible postponable point, we reduce the live range of the temporary that holds the expression's value.

Flow functions (a forward, all-paths analysis):

    postponable_in(n)  = INTERSECTION over predecessors p of n of postponable_out(p)
    postponable_out(n) = (earliest(n) UNION postponable_in(n)) - e_use(n)

Latest

Intuition:
- The last block to which the expression's computation can be postponed; if an expression is in latest(b), the last point at which it can be computed is the beginning of block b.
- Requires only one visit to each statement.

Function:

    latest(n) = (earliest(n) UNION postponable_in(n))
                INTERSECT (e_use(n) UNION
                           NOT(INTERSECTION over successors s of n of
                               (earliest(s) UNION postponable_in(s))))
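The chain anticipated -> available -> earliest above can be sketched as one forward pass plus a set difference. This is my own sketch: it takes anticipated_in as given (e.g., from the backward pass) and assumes the same diamond CFG for the running example:

```python
# PRE availability and earliest():
#   avail_in(n)  = intersection over preds p of avail_out(p)   (empty at entry)
#   avail_out(n) = (ant_in(n) | avail_in(n)) - e_kill(n)
#   earliest(n)  = ant_in(n) - avail_in(n)
def earliest(pred, ant_in, e_kill):
    nodes = list(pred)
    universe = set().union(*ant_in.values())
    avail_in = {n: set(universe) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if not pred[n]:
                new_in = set()                   # nothing available at entry
            else:
                new_in = set(universe)
                for p in pred[n]:
                    new_in &= (ant_in[p] | avail_in[p]) - e_kill[p]
            if new_in != avail_in[n]:
                avail_in[n], changed = new_in, True
    return {n: ant_in[n] - avail_in[n] for n in nodes}
```

On the running example this reproduces the earliest row of the table: b+c is earliest in B1, b+1 in B2, and nothing in B3.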
Global Used Expressions

Intuition:
- If e is globally used at p, then an evaluation of e at p will be used again along some path starting at p.
- If an expression is not in used_out(b), then a computation of the expression should not be placed at the beginning of block b, even if e is in latest(b).
- This is liveness analysis for expressions.

Flow functions (a backward, any-path analysis):

    used_out(n) = UNION over successors s of n of used_in(s)
    used_in(n)  = (e_use(n) - latest(n)) UNION (used_out(n) - latest(n))

Running example with all sets (values recomputed; the extracted tables are garbled):

    set                B1      B2          B3
    e_use              b+c     b+1         b+c
    e_kill             -       -           -
    anticipated_out    b+c     b+c         -
    anticipated_in     b+c     b+1, b+c    b+c
    available_in       -       b+c         b+c
    available_out      b+c     b+1, b+c    b+c
    earliest           b+c     b+1         -
    postponable_in     -       -           -
    postponable_out    -       -           -
    latest             b+c     b+1         -
    used_out           b+c     b+c         -
    used_in            -       b+c         b+c

Result: t = b + c is inserted at the start of B1 (where b+c is in both latest and used_out), and the evaluations of b + c in B1 and B3 become uses of t; b + 1 stays in B2 because it is never used again.

PRE Summary

Algorithm:
- Insert an empty block along each edge entering a block with more than one predecessor.
- Calculate the latest and used_out sets.
- For each expression e:
  - Create a temporary t to store e.
  - For each block b where e is in latest(b) INTERSECT used_out(b), add t = e to the beginning of b.
  - For each block b where e is in e_use(b) and not in (latest(b) - used_out(b)), replace the original e with t.

What's so great about PRE?
- A modern optimization that subsumes earlier ideas.
- It composes several simple dataflow analyses to produce a powerful result.
- It finds the earliest and latest points in the CFG at which an expression is anticipated.

Another Example

Now B2 redefines b (B1: a = b + c; B2: b = b + 1; B3: a = b + c), so b + c is killed in B2. Key sets (recomputed; the extracted table is garbled):

    set                B1      B2          B3
    e_use              b+c     b+1         b+c
    e_kill             -       b+c, b+1    -
    anticipated_in     b+c     b+1         b+c
    earliest           b+c     b+1         b+c
    used_out           -       -           -
    used_in            -       -           -

Because nothing is in both latest and used_out, PRE makes no insertions and no replacements here: the killed expression is correctly left alone.

Next Time

Assignments:
- HW1 due tomorrow.
- Midterm on Tuesday: email questions, and do the suggested exercises.

Low-Level Issues

Last lecture:
- Liveness analysis.
- Register allocation.

Today:
- More register allocation.

Later:
- Instruction scheduling.

Register Allocation

Problem:
- Assign an unbounded number of symbolic registers to a fixed number of architectural registers.
- Simultaneously live data must be assigned to different architectural registers.

Goal:
- Minimize the overhead of accessing data:
  - Memory operations (loads and stores).
  - Register moves.

Improvement 1: Simplification Phase (Chaitin '81)

Idea:
- Nodes with < k neighbors are guaranteed colorable, so remove them from the graph first.
- This reduces the degree of the remaining nodes.
- We must spill only when all remaining nodes have degree >= k.
- Referred to as pessimistic spilling.

Simplifying graph allocators: Chaitin's pipeline is build -> simplify -> spill -> color, with inserted spill code restarting the process; Briggs's allocator adds coalescing and spill-cost computation to the pipeline.

Algorithm (Chaitin):

    while interference graph not empty do
        while there exists a node n with < k neighbors do   // simplify
            remove n from the graph
            push n on a stack
        if any nodes remain in the graph then               // blocked: all degrees >= k
            pick a node n to spill (lowest spill cost or highest degree)
            add n to the spill set
            remove n from the graph
    if the spill set is not empty then
        insert spill code for all spilled nodes (store after each def, load before each use)
        reconstruct the interference graph and start over
    while the stack is not empty do                         // color (select)
        pop node n from the stack
        allocate n to a register

Example: attempt to 3-color a graph. One possible order pushes d, e, c, a, b, f onto the stack; nodes are then popped and colored in the reverse order f, b, a, c, e, d. (The slide's graph diagram is not recoverable from this extraction.)
graph amp start over CS Lecnu e RegisterAJIwmzon III 5 Example Attempt to 2c010r this graph ll Increasing Weight e QOU HISU blocked node 38553 Lemme Regisler mmalien l1 l0 Improvement 3 C oalescing Move instructions 7 Code generation can produce unnecessary move instructions mov t1 7 lfwe can assign el and e2 to the sarne register we can elirninate the move Idea 7 Iftl and e2 are not connected in the interference graph coalesce thern into a single vanab e Problem 7 Coalescing can increase the number ofedges and make a graph uncolorable e Lirnit coalescing coalesce to avoid uncolorable t1 t2 t2 graphs Coalescing Logistics Rule 7 When building the interference graph do NOT make virtual registers interfere due to copies J i t 7 11 me and 7 4 statement 51 52 then 51 and 52 can be coalesced eExarnple r a t abtu b7 cab a xrblw bquot zcy y Before Coalesc39lng After Coalesc39lng Computing the Interference Graph in MiniJava compiler Computing interference graph to enable coalescing Use results of live variable analysis for each ow graph node n do for each def in defn do for each temp in liveoutn do if not stmtn isa MOVE or use temp then E e E U def temp 3353 Leanne 1 Register Allocation Spilling If we can t nd a k coloring of the interference graph 7 Spill variables nodes until the graph is colorable Choosing variables to spill 7 Choose least frequently accessed variables 7 Break ties by choosing nodes with the most con icts in the interference graph 7 Yes these are heuristics 1255553 Lemme H More on Spilling Chaitin s algorithm restarts the whole process on spill 7 Necessary because spill code loads stores uses registers 7 Okay because it usually only happens a couple times Alternative 7 Reserve 23 registers for spilling 7 Don t need to start over 7 But have fewer registers to work with CS Lecnu e 1 39 Weighted Interference Graph Goal 7 Weights I r is execution frequency of r Static approximation 7 Use some reasonable scheme to rank variables 7 Some possibilities 7 Weightt 
num of times t is used in program 7 Weightt 10 x uses in loops uses in straightline code 7 Weightt 20 x uses in loops 2 x uses in straightline code uses in a branch statement 38553 Lemme Ii Register Allocation and Procedure Calls Problem 7 Register values may change across procedure calls 7 The allocator must be sensitive to this Two approaches 7 Work within a wellde ned calling convention 7 Use interprocedural allocation not covering this CS f Lecnu e 1 J Calling Conventions Goals 7 Fast calls pass arguments in registers minimal register savingrestoring 7 Languageindependent 7 Support debugging pro ling garbage collection etc Complicating Issues 7 Varargs 7 Passingreturning aggregates 7 Exceptions nonlocal returns SthmP l ngjmp 1355553 Lemme gt3 Architecture Review Caller and CalleeSaved Registers Partition registers into two categories 7 Callersaved 7 Calleesaved Callersaved registers 7 Caller must saverestore these registers when live across call 7 Callee is free to use them caller Examplg0 0 rcaller 4 save rcaller 9 00 restore rcaller r 9 CS 353 Lecnu e caller39 90 rcaller callee goo is free to modify r caller Architecture Review Caller and CalleeSaved Registers Calleesaved registers 7 Callee must saverestore these registers when it uses them 7 Caller expects callee to not change them Example caller foo reallee 4 90 reallee C8553 Leclm e callee 9000 save r reallee restore r goo promises not to modify r calle 0 Architectures with Callee and Caller Registers SPARC 7 hardwaresaved i0i7 oO08 Alpha 7 7 calleesaved out of 32 registers MIPS 7 callersaved t0t9 a0a3 v0v1 7 calleesaved s0s7 ra fp PPC 7 18 calleesaved 7 14 callersaved StarCore EABI 7 4 calleesaved 7 28 callersaved 3353 Lecnu e Ll Register Allocation and Calling Conventions Insensitive register allocation 7 Save all live callersaved registers before call restore after 7 Save all used callee saved registers at procedure entry restore at return 7 Suboptimal foo Avariable that is not live across calls should 
go in callersaved registers f Avariable that is live across multiple calls should g go in calleesaved registers Sensitive register allocation 7 Encode calling convention constraints in the IR and interference graph 7 How Use precolored nodes C8553 Leanne 22 Precolored Nodes Add architectural registers to interference graph 7 Precolored mutually interfering 7 Not simpli able 7 Not spillable Express allocation constraints 7 Integers usually can t be stored in oating point registers 7 Some instructions can only store result in certain registers 7 Callersaved and calleesaved registers I s2 s4 5 L f3 s1 oating point integer oating point integer CS 353 Lecuu e Precolored Nodes and Calling Conventions Calleesaved registers 7 Treat entry as def of all calleesaved registers 7 Treat eXit as use of them all 7 Allocator must spill calleesaved registers to use them foo def r3 Live range of calleesaved registers use r3 4 Callersaved registers 7 Variables live across call interfere with all callersaved registers 1255553 Lecim e Example r1 r2 caller f O saved def r3 t1 r3 r3 calleesaved a b a call goo t1 b r3 t1 user3 153 return CS f53 Lecture 7 Tradeoffs Calleesaved registers Decreases code size one procedure body may have multiple calls Small procedures tend to need fewer registers than large ones calleesave makes sense because procedure sizes are shrinking May increase execution time For longlived variables may save and restore registers multiple times once for each procedure instead of a single endtoend saverestore The larger problem 7 We re making local decisions for policies that require global information 1355553 Lecim e lo Problem with CalleeSaved Registers Runtime systems e g setjmp longjmp and debuggers need to know register values in any stack frame 7Ca11ersaved registers are on stack frame at known location 7 Callee saved registers F4 save r3 F3 r1 r2 callersaved r3 callee saved F2 save d r2 F1 Concepts Decision tree for register allocation Global approaches centered 
around an interference graph 7 greedy coloring 7 coloring with simpli cation Chaitin 7 coloring with simpli cation and optimistic spilling Briggs 7 coloring with simpli cation coalescing and optimistic spilling Register allocation across procedure calls 7 precolored nodes in the interference graph Next Time Lecture 7 Instruction scheduling CS 553 Lecture Instruction Scheduling Last time 7 Register allocation Today 7 Instruction scheduling 7 The problem Pipelined computer architecture 7 A solution List scheduling Distinction Schedu Lug Background Pipelining Basics Idea 7 Begin executing an instruction before completing the previous one Without Pipelinng With Pipelinng time gt time gt L Sf x Lemur Distinction Sohfzdullng Idealized Instruction DataPath Instructions go through several stages of execution Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Instruction Instructlon gt Decode amp gt Execute gt Memory gt Reglster Fetch Access Writeback Reglster Fetch IF gt IDRF gt EX gt MEM gt WB time 39IIF IID IEX MM WB a IIF IID EX MM WB 0 g IIF ID EX MM WB 3 IF ID EX MM WBI IF ID EX MMIWBI IF ID EleMIWBI CS Lecnu c lJIsU lXL39l lUIl Schedul Lug Pipelining Details Observations 7 Individual instructions are no faster but throughput is higher 7 Potential speedup determined by number of stages more or less 7 Filling and draining pipe limits speedup 7 Rate through pipe is limited by slowest stage 7 Less work per stage implies faster clock Modern Processors 7 Long pipelines 5 Pentium l4 Pentium Pro 20 or 31 Pentium 4 7 Issue 2 Pentium 4 UltraSPARC or more dead Compaq EV8 instructions per cycle 7 Dynamically schedule instructions from limited instruction window or statically schedule eg IA64 7 Hyperthreading or simultaneous multithreading 7 Speculate Outcome of branches Value of loads mle hismiutiun Schedul Lug What Limits Performance Data hazards 7 Instruction depends on result of prior instruction that is still in the pipe Structural hazards 7 Hardware cannot support certain instruction 
sequences because of limited hardware resources Control hazards 7 Control ow depends on the result of branch instruction that is still in the pipe An obvious solution 7 Stall insert bubbles into pipeline Distinction Schedu Lug Stalls Data Hazards Code add r1 r2 r3 r1 is the destination mul r4 r1 r1 r4 is the destination Pipeline picture time 39 IF ID EX MM WB g IF km EX MMWB 5 B I LEE x Lee ch Distinction Sohfzdullug Stalls Structural Hazards Code mul r1 r2 r3 Suppose multiplies take two cycles mul r4 r5 126 Pipeline Picture me B a E Q 5 B I 23353 Lemur ileumvim Scheduling Stalls Control Hazards Code bz r1 label if r10 branch to label add r2r3r4 Pipeline Picture me 39 SUI 4 suopomi CS 353 Lemur ileumvim Scheduling Hardware Solutions Data hazards 7 Data forwarding doesn t completely solve problem 7 Runtime speculation doesn t always work Structural hazards 7 Hardware replication expensive 7 More pipelining doesn t always work Control hazards 7 Runtime speculation branch prediction Dynamic scheduling 7 Can address all of these issues 7 Very successful L IISU HL39llUD Schedu Lug 539 Instruction Scheduling for Pipelined Architectures Goal 7 An efficient algorithm for reordering instructions to minimize pipeline stalls Constraints 7 Data dependences for correctness 7 Hazards can only have performance implications Possible Simplifications 7 Do scheduling after instruction selection and register allocation 7 Only consider data hazards L IISU HL39llUD Schedu Lug 10 Data Dependences Data dependence 7 A data dependence is an ordering constraint on 2 statements 7 When reordering statements all data dependences must be observed to preserve program correctness True or ow dependences 7 Write to variable X followed by a read of X read after write or RAW x 5 t Antidependences prln x 7 Read of variable X followed by a write WAR print x 5 f 1 Output dependences x a se dependences 7 erte to var1able X followed by x 6 another write to X WAW x 5 3353 Leann Distinction Scheduling 11 
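The three dependence kinds just defined (RAW, WAR, WAW) can be computed mechanically from per-instruction def/use sets, which is the first step of building the dependence DAG used in the scheduling slides that follow. A sketch, under the assumption that each instruction is summarized as a pair of register-name sets (this representation is hypothetical):

```python
def dependences(instrs):
    """Classify data dependences among instructions of a basic block.

    instrs: list of (defs, uses) pairs of register-name sets, in program
    order.  Returns edges (i, j, kind) for i < j, with kind in
    {'RAW', 'WAR', 'WAW'} -- the ordering constraints a scheduler must keep.
    """
    edges = []
    for j in range(len(instrs)):
        for i in range(j):
            defs_i, uses_i = instrs[i]
            defs_j, uses_j = instrs[j]
            if defs_i & uses_j:
                edges.append((i, j, 'RAW'))  # true/flow dependence
            if uses_i & defs_j:
                edges.append((i, j, 'WAR'))  # antidependence
            if defs_i & defs_j:
                edges.append((i, j, 'WAW'))  # output dependence
    return edges
```

Only the RAW edges are fundamental; the WAR and WAW edges are the "false" dependences that register renaming (next slide) removes.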
Register Renaming
Idea
- Reduce false data dependences by reducing register reuse
- Give the instruction scheduler greater freedom
Example
    add r1, r2, 1          add r1, r2, 1
    st  r1, fp+52          st  r1, fp+52
    mul r1, r3, 2    =>    mul r11, r3, 2
    st  r1, fp+40          st  r11, fp+40
- After renaming, the two pairs are independent and can be reordered:
    add r1, r2, 1
    mul r11, r3, 2
    st  r1, fp+52
    st  r11, fp+40

Phase Ordering Problem
Register allocation
- Tries to reuse registers
- Artificially constrains the instruction schedule
Just schedule instructions first?
- Scheduling can dramatically increase register pressure
Classic phase-ordering problem
- Tradeoff between memory and parallelism
Approaches
- Consider allocation & scheduling together
- Run allocation & scheduling multiple times: schedule, allocate, schedule

List Scheduling (Gibbons & Muchnick '86)
Scope
- Basic blocks
Assumptions
- Pipeline interlocks are provided, i.e. the algorithm need not introduce no-ops
- Pointers can refer to any memory address, i.e. no alias analysis
- Hazards take a single-cycle stall; here let's assume there are two:
  - a load immediately followed by an ALU op produces an interlock
  - a store immediately followed by a load produces an interlock
Main data structure: dependence DAG
- Nodes represent instructions
- An edge (s1, s2) represents a dependence between instructions: s1 must execute before s2
- Sometimes called a data dependence graph or data-flow graph

Dependence Graph Example
Sample code (op dst, src, src)
    1  addi r2, 1, r1
    2  addi sp, 12, sp
    3  st   a, r0
    4  ld   r3, 4(sp)
    5  ld   r4, 8(sp)
    6  addi sp, 8, sp
    7  st   0(sp), r2
    8  ld   r5, a
    9  addi r4, 1, r4
- Dependence graph [figure]
- Hazards in the current schedule: (3,4), (5,6), (7,8), (8,9)
- Any topological sort is okay, but we want the best one

Scheduling Heuristics
Goal
- Avoid stalls
Consider these questions
- Does an instruction interlock with any immediate successors in the dependence graph? How long is the delay - is it greater than 1?
- How many immediate successors does an instruction have?
- Is the instruction on the critical path?

Scheduling Heuristics (cont.)
Idea: schedule an instruction earlier when
- It does not interlock with the previously scheduled instruction (avoid stalls)
- It interlocks with its successors in the dependence graph (may enable successors to be scheduled without a stall)
- It has many successors in the graph (may enable successors to be scheduled with greater flexibility)
- It is on the critical path (the goal is to minimize time, after all)

Scheduling Algorithm
    Build dependence graph G
    Candidates <- set of all roots (nodes with no in-edges) in G
    while Candidates != {} do
        Select instruction s from Candidates    // using heuristics, in order
        Schedule s
        Candidates <- Candidates - {s}
        Candidates <- Candidates U exposed nodes
        // add to Candidates those nodes whose predecessors have all been scheduled

Scheduling Example
- Dependence graph [figure]
- Scheduled code:
    3  st   a, r0
    2  addi sp, 12, sp
    5  ld   r4, 8(sp)
    4  ld   r3, 4(sp)
    8  ld   r5, a
    1  addi r2, 1, r1
    6  addi sp, 8, sp
    7  st   0(sp), r2
    9  addi r4, 1, r4
- Hazards in the new schedule: (8,1)

Scheduling Example (cont.)
    Original code:           New schedule:
    1  addi r2, 1, r1        3  st   a, r0
    2  addi sp, 12, sp       2  addi sp, 12, sp
    3  st   a, r0            5  ld   r4, 8(sp)
    4  ld   r3, 4(sp)        4  ld   r3, 4(sp)
    5  ld   r4, 8(sp)        8  ld   r5, a
    6  addi sp, 8, sp        1  addi r2, 1, r1
    7  st   0(sp), r2        6  addi sp, 8, sp
    8  ld   r5, a            7  st   0(sp), r2
    9  addi r4, 1, r4        9  addi r4, 1, r4
- Hazards in the original schedule: (3,4), (5,6), (7,8), (8,9)
- Hazards in the new schedule: (8,1)

Complexity
Quadratic in the number of instructions
- Building the dependence graph is O(n^2)
- May need to inspect each instruction at each scheduling step: O(n^2)
- In practice, closer to linear

Example (10.6 in book)
Stalls
- LD takes two clocks, but an ST to the same address can directly follow, and any LD can directly follow
Flow dependences
- i1 to i2, i3 to i4, i4 to i5, i5 to i6
- i2 to i3
Antidependences
- i4 to i5
- i1 to i7, i3 to i7
Output dependences
- i3 to i4, i4 to i5
- i2 to
i7 CS Lemch hismminn Schedul Lug Kr Improving Instruction Scheduling Techniques 7 Register renaming Deal w1th data hazards 7 Schedullng loads 7 Loop unrolling 7 Software pipelining Deal with control hazards 7 Predication and speculation 23353 Lemm lnsmlciion Scheduling l A Register Renaming Idea 7 Reduce false data dependences by reducing register reuse 7 Give the instruction scheduler greater freedom Example add r1 r2 1 add r1 r2 1 st r1 fp52 st r1 fp52 mul r1 r3 2 mul 1211 r3 2 st r1 fp40 st r11 fp40 add r1 r2 1 mul r11 r3 2 st r1 fp52 st r11 fp40 23353 Lemm lnsmlciion Scheduling l 271 Scheduling Loads Reality 7 Loads can take many cycles slow caches cache misses 7 Many cycles may be wasted Most modern architectures provide nonblocking delayed loads 7 Loads never stall 7 Instead the use of a register stalls if the value is not yet available 7 Scheduler should try to place loads well before the use of target register instruction Scheduling l 25 Scheduling Loads cont Hiding latency 7 Place independent instructions behind loads load r1 r3 load r1 r3 load r2 r1 add r4 add r4 load r2 r1 41lllllll U12345678 U12345678 time time 7 How many instructions should we insert 7 Depends on latency 7 Difference between cache miss and cache hits are growing 7 If we underestimate latency Stall waiting for the load 7 If we overestimate latency Hold register longer than necessary Wasted parallelism 1ch instruction Scheduling l 46 Loop Unrolling Summary Bene t 7 Loop unrolling allows us to schedule code across iteration boundaries providing more scheduling freedom Issues 7 How much unrolling should we do 7 Try various unrolling factors and see which provides the best schedule 7 Unroll as much as possible within a code expansion budget 7 An alternative Software pipeljm39ng CS f Leann instruction Scheduling l 2 Software Pipelining Basic idea 7 Software pipelinng is a systematic approach to scheduling across iteration boundaries without doing loop unrolling 7 Try to move the long latency 
instructions to previous iterations of the loop 7 Use independent instructions to hide their latency 7 Three parts of a software pipeline 7 Kernel Steady state execution of the pipeline 7 Prologue Code to ll the pipeline 7 Epilogue Code to empty the pipeline CS f Leann instruction Scheduling l 30 Visualizing Software Pipelining Iteration O Iteration Iteration Iteration 3 Iteration 4 Software pipelined iteration 2003 EIsevier Science USA All rights reserved C8553 Lecture Instruction Scheduljng H 31 Software Pipelining versus Loop Unrolling Startup Winddown code code Number of overlapped operations a Software pipelining Time Proportional to number of Overlap between unrons unrolled iterations l I Number of overlapped operations b Loop unrolling Time 2003 Elsevier Science USA All rights reserved C8553 Lecture Instruction Scheduljng H 32 SW Pipelining Step 1 Construct DAG and Assign Registers int M100 H100 c100 ior a ilt1oo i an M1 an cu Exampl assumes In nzmjhncnanal amt and 5mg R1 R2 R3 yclg 11127113 Inmmm SLhedukng u 33 SW Pipelining sup 2 Unroll Satist Lawncies Find Pa ern SW Pipelining Step 3 Satisfy register constraints 7 WWW MM 3 SW Pipelining Step 1 Construct DAG and Assign Registers a 110 an an an cu int M100 H100 c100 ior 39 39 0 i Example assume v 739mactnm2 R1 R2 R3 xnmmm SLhedukng u 3 SW Pipelining Step 2 Unroll Satisfy Latencies Find Pattern This pattern does not world 8553 Lecture SW Pipelining Step 3 Satisfy register constraints R1 R2 R4 R3 R4 2 R1 R2 R4 2 CS553 Lecture Lnstrucnon Schedxumo n 38 w 7 SW Pipelining and Loop Unrolling Summary Unrolling removes branching overhead and helps tolerate data dependence latency SW pipelining maintains max parallelism in steady state through continuous tolerance of data dependence latency Both work best with loops that are parallel getting ILP by taking instructions from different iterations instruction Schedulin l 3 Software Pipelining Complications 7 What if there is control ow within the loop 7 Use control ow 
profiles to identify the most frequent path through the loop
- Optimize for the most frequent path
- How do we identify the most frequent path? Profiling

Control-Flow Analysis and Loop Detection
Last time
- Speeding up data-flow analysis
Today
- Control-flow analysis
- Loops
- Identifying loops using dominators
- Reducibility

Context
Data-flow
- Flow of data values from defs to uses
- Could alternatively be represented as data dependences
Control-flow
- Sequencing of operations
- Could alternatively be represented as control dependences
- e.g., evaluation of the then-code and else-code depends on the if-test

Why study control-flow analysis?
Finding loops
- Most computation time is spent in loops
- To optimize them, we need to find them
Loop optimizations
- Loop-invariant code hoisting
- Induction variable elimination
- Array bounds check removal
- Loop unrolling
- Parallelization
Identifying structured control flow
- Can be used to speed up data-flow analysis

Representing Control Flow
High-level representation
- Control flow is implicit in an AST
Low-level representation
- Use a control-flow graph (CFG)
- Nodes represent statements
- Edges represent explicit flow of control
Other options
- Control dependences in a program dependence graph (PDG)
- Dependences on explicit state in a value dependence graph (VDG) [Weise '94]

What Is Control-Flow Analysis?
Control-flow analysis discovers the flow of control within a procedure (e.g., builds a CFG, identifies loops)
- Example: [numbered statement list with labels L1-L3 and gotos, plus the corresponding CFG; the statements are too garbled in the source to reconstruct]

Loop Concepts
- Loop: [definition garbled in the source]
- Loop entry edge: source not in loop & target in loop
- Loop exit edge: source in loop & target not in loop
- Loop header node: target of a loop entry edge
- Natural loop: nodes with a path to the back edge without going through the header
- Back edge: target is the loop header & source is in the loop
- Loop tail node: source of a back edge
- Loop preheader node: single node that is the source of the loop entry edge
- Nested loop: loop whose header is inside another loop

Picturing Loop Terminology
- [Figure: a CFG annotated with entry edges, header, back edge, tail, and exit edge]

The Value of Preheader Nodes
Not all loops have preheaders
- Sometimes it is useful to create them
Without a preheader node
- There can be multiple entry edges
With a single preheader node
- There is only one entry edge
Useful when moving code outside the loop
- Don't have to replicate code for multiple entry edges

Identifying Loops
Why?
- Most execution time is spent in loops, so optimizing loops will often give the most benefit
Many approaches
- Interval analysis
  - Exploits the natural hierarchical structure of programs
  - Decomposes the program into nested regions called intervals
- Structural analysis, a generalization of interval analysis
- Identify dominators to discover loops
We'll look at the dominator-based approach

Dominator Terminology
- Dominators: d dom i if all paths from entry to node i include d
- Strict dominators: d sdom i if d dom i and d != i
- Immediate dominator: a idom b if a sdom b and there does not exist a node c (c != a, c != b) such that a sdom c and c sdom b
- Post-dominators: p pdom i if every possible path from i to exit includes p (equivalently, p dom i in the flow graph whose edges are reversed and whose entry and exit are interchanged)

Identifying Natural Loops with Dominators
Back edges
- A back edge of a natural loop is one whose target dominates its source
Natural loop
- The natural loop of a back edge (m, n), where n dominates m, is the set of nodes x such that n dominates x and there is a path from x to m not containing n
Example
- [Figure] The target of the edge d -> c does not dominate its source, so d -> c does not define a natural loop

Computing Dominators
Input: set of nodes N (in the CFG) and an entry node s
Output: Dom(i) = set of all nodes that dominate node i

    Dom(s) = {s}
    for each n in N - {s}
        Dom(n) = N
    repeat
        change = false
        for each n in N - {s}
            D = {n} U (intersection of Dom(p) over all predecessors p of n)
            if D != Dom(n)
                change = true
                Dom(n) = D
    until !change

Key idea
- If a node dominates all predecessors of node n, then it also dominates n
- x in Dom(p1) and x in Dom(p2) and x in Dom(p3) implies x in Dom(n)

Reducibility
Definition
- A CFG is reducible (well-structured) if we can partition its edges into two disjoint sets, the forward edges and the back edges, such that:
  - the forward edges form an acyclic graph in which every node can be reached from the entry node
  - the back edges' targets dominate their sources (equivalently, the graph can be collapsed by the T1 and T2 transformations)
- Structured control-flow constructs give rise to reducible CFGs
Value of reducibility
- Dominance is useful in identifying loops
- Permits interval analysis
- It is easy to calculate the CFG depth

Computing Dominators (example)
- [Figure: a small CFG over nodes s, n, p, q, r with the resulting Dom sets; the worked trace is garbled in the source]
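The iterative dominator algorithm above translates almost line for line into code. A sketch, assuming the CFG is given as a predecessor map (the function and parameter names are illustrative):

```python
def dominators(nodes, entry, preds):
    """Iterative dominator computation: d dominates n iff d appears on
    every path from entry to n.

    nodes: iterable of node ids; preds: dict node -> set of predecessors.
    Returns dom: node -> set of its dominators.
    """
    all_nodes = set(nodes)
    dom = {n: set(all_nodes) for n in all_nodes}  # initial guess: Dom(n) = N
    dom[entry] = {entry}                          # Dom(s) = {s}
    changed = True
    while changed:
        changed = False
        for n in all_nodes - {entry}:
            ps = preds[n]
            # Dom(n) = {n} U intersection of Dom(p) over predecessors p.
            new = ({n} | set.intersection(*(dom[p] for p in ps))) if ps else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
```

Once dom is computed, an edge n -> h is a back edge exactly when h is in dom[n], which is how the natural-loop slides use this result.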
EN7 5 Ifanode dorninates all Dornn N predecessors ofnode n then it repeal also dorninates no en change false predn D n u meant Donpp D s Dornn quot J39l icha ge x E Domp A x E Domp2 A x E Domp3 bx E Domn t mt tm lrtnwnni lttnppetmmn 3 Computing Dominators example Input Set ofnodesN and an entry node s Output Domi set ofall nodes that dominate node i Reducibility De niu39on e A CFG is reducible wellstructured ifwe can partition its edges into two disjoint sets the forward edges andthe back edges such that e The forward edges form an acyclic graph in which every node can be reached from the entry node 7 mt t i A d i t sources be convened 39 39 T1 and T2 transfonnalions Structured comrolr ow constructs give rise to reducible CFGs Value ofreducibilily e Dominance useful in identifying loops 7 permits interval analysis and it is easy to calculate th CFG depth itcer Lemma i onlwl iiliiwsil buyiDF39rcLlnn Doms s lt 139 5 lt quot5 lor euchn EN 5 s Dornn N repeal 5 change false many for eachnENr s D0msl5 D n u 6an D0mp Domq n p q r s ilD s Dornn Finally changetrue Dornq tbs DomnD Dornr r 5 until lchange Domp p 5 Dornn n pr 5 l f lt71myg Humid sino mil Lmlu Delvri iull l4 T1 and T2 T1 trunslormuu39on 7 remove selfcycles gt El T2 trunslormuu39on e ifnode n has aunique predecessor p then remove n and make all the successors for n be successors forp II n n itmtmwliwshii Lnupoet Cum 6 Handling Irreducible CFG s Node splitting 7 Can tum irreducible CFGs into reducible CFGs isxs39 irtciuia i nntmi lTVlle allli tttnppetcnmn i Why Go To All This Trouble Modern languages provide structured control llow 7 Shouldn t the compiler remember this information rather than throw it away and then recompute it Answers e may want to work on the binary code in which case such information is unavailab e 7 Most modern languages still provide a gate statement 7 Languages typically provide multiple types ofloops This analysis letsus treat them all uni ormly 7 We ma want acompilerwith multiple front 
ends for multiple languages rather than translate each langu e to a CFG translate each language to a canonical LIR and translate that representation once to a CF WW 1 mm lilnL u rm and Limp Dei t Concepts ConLroH low analysis ConLroH low graph CFG Loop terminology Identilying loops Dominators Reducibility itcer Lewdquot Next Time Assignments 7 Project 2 due the writeup is IMPORTANT Reading 7 ch 18 2 Lecture 7 Loop invariant code motion 1 quotmi w m an Compiling Object Oriented Languages What is an Obj ectOriented I anguage Last time Objects Person 7 Dynamic compilation 7 Encapsulate code and data Student Teacher Today Inheritance 7 Introduction to compiling object oriented languages 7 PrintllalllePerson p at are the issues evolution Person p new Person student 5 new student Subtype polymorphism 7 Can use a subclass wherever a parent class is expecte Printllallle p Printllallle s prepriiliand 7 Binding of quot dynamically based on the dynamic type f the receiver object use mm cm Laigniscs 1 mm c Dynamic binding message sends l rig ohmmnmwd Languages Inheritance of Instance Variables 39 Dynamic Binding Goal Problem 7 Lay out object for typeindependent instance Variable access The appropriate memOd depends 0 the dynamic type Ofme Obie eg prepriiiiand 0 Solution single inheritance Solution 7 Pre xing superclass elds are at beginning ofobject 7 39r for e 39 7 Encode pointer to class descriptor in each object Ex I I 7 Lay out methods in class descriptorjust like instance Variables m Name Name gt Usage summary 7 Load class descriptor pointer from object 7 Load method addr maczmm ess from descriptor aner Laigniscs 7 Jumpto method Lmi 0m rig ohmmnmwd Languages Why are ObjectOriented Languages Slow Style 7 Granularity low of small objects 7 Exploit dynamism Other reasons 7 Highlevel modem features such as safety 7 Garbage collection 13773 write em Lalgniscs Dynamism Code Dynamic binding 7 What code gets executed at a particular static message send 7 It depends and it may 
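The dynamic-binding usage summary above (load the class-descriptor pointer from the object, load the method address from the descriptor, jump) can be mimicked in a few lines. This is a toy model of prefixed method tables, not how any particular compiler or JVM lays them out; the PERSON/STUDENT descriptors and the new/send helpers are invented for illustration:

```python
# Each "class descriptor" holds a method table laid out as the slides
# describe: a subclass's table starts with the superclass's slots
# (prefixing), so a method keeps the same slot index in every subclass
# and dispatch needs no search at run time.

PERSON = {
    'methods': ['print_name'],                       # slot 0 = print_name
    'table':   [lambda self: 'Person:' + self['name']],
}

STUDENT = {
    'methods': ['print_name'],                       # slot 0 still print_name
    'table':   [lambda self: 'Student:' + self['name']],  # override in place
}

def new(desc, name):
    # Word 0 of every object: pointer to its class descriptor.
    return {'class': desc, 'name': name}

def send(obj, method):
    desc = obj['class']                    # 1. load class-descriptor pointer
    slot = desc['methods'].index(method)   # (resolved at compile time in reality)
    return desc['table'][slot](obj)        # 2. load method address; 3. jump
```

Because the slot index is fixed across the hierarchy, the same three-instruction sequence works whether the receiver is a Person or a Student, which is exactly the subtype polymorphism the slides describe.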
change Example class rectangle int lengthO int widthO int areaO return lengthO Widtho extends shape 7 7 i class square extends rectangle rectarea ze HH 1 nt 1ength returnsize Sqarea int widthO returnsize A Lermt 0m ng Obtntlimmted Languages Cost of Dynamic Binding Dir ect cost 7 Overhead of performing dynamic method invocation Indirect cost 7 Inhibim static analysis ofthe code Example int lengtno lt gt int widthO int areaO return lengthO widtho Class Driessen and Holzle OOPSLA 96 on C programs median of 52 tom execution time spent on dynamic dispatch 13773 write 9mm em Lalgniscs Dynamism Data Object instance types are not statically apparent 7 Need to be able to manipulate uniformly 7 Boxing wrap up all data and reference it with a pointer Example Integer n new lnteger33 type descriptor data mt Ohm Damned Languages in Cost of Dynamisrn Data Direct cost 7 Overhead of actually extracting data 7 eg 2 loads versus 0 if data already in a register Indir ect cost 7 More dif cult to statically reason about data H527 Um Style Sometimes programmers write C7style code in 00 languages 7 Easy just optimize it away Sometimes programmers actually exploit dynamism 7 Hard it can t just be optimized away Programmers create many small objects 7 Thwarm local analysis 7 Exacerbates dynamism problem 7 Huge problem for pure 00 languages Programmers create many small methods 7 Methods to encapsulate data 7 eg Methods to get and set member fields A Lermc mm g mammal Languages A Concrete Example Java High7level and modern 7 Objectoriented 7 Mobil portable standard bytecode IL 7 Multithreaded great for structuring distributed and UI programs 7 Garbage collected 7 Dynamic class loading 7 Reasonable exception system 7 Rich standard libraries H527 Um Approaches to 139 a Interpremtion 7 Extremely portable 7 Simple stack machine 7 Performance suffers 7 Interpretation overhead 7 Stack machine no registers Direct compilation 7 Compile the source or bytecodes to native code 7 Sacri ces portability 7 
Can have very good performance mammal Languages 4 Why is Java Slow Bytecode interpretation 7 Not a good answer 8555 me C39ompiiuig vizier Why is Java Slow Impediments to performance 7 Flexible array semantics 7 Runtime checks null pointers array bounds types 7 Precise exception semantics thwart optimization 7 Dynamic class loading thwarts optimization 7 Even the class hierarchy is dynamic 7 Multithreading introduces synchronization overhead 7 Lots of memory references poor cache performance and all the usual 00 challenges C8555 Lecture Compilmg wins 631ml Lmlgusz u i0 Scienti c Programming and Java Consider matrix multiplication for i0 iltm i for j0 jltp j for k0 kltn k CiJEj AEiJEk BEkJEj Costs 7 6 null pointer checks with just 2 oating point operations 7 6 index checks Can we optimize this code 7 Precise exception model 7 Exception semantics inhibit rem oval or reordering 7 No multidimensional arrays 7 Rows may alias Campaign liaect niemu More on Matrix Multiplication Why can t wejust do this if m lt Csize0 ampamp p lt Csizel ampamp m lt Asize0 ampamp r1 lt Asizel ampamp r1 lt Bsize0 ampamp p lt Bsizel for i0 iltm i for j0 jltp j for k0 kltn k Ci j AEi k BEk j else raise exception No outofbounds checks right 3901pimglt We Own Exceptions in Java Exceptions in Java are precise 7 The effects of all statements and expressions before a thrown exception must appear to have taken place and 7 The effects of all statements or expressions after a thrown exception must appear not to have taken place Implications 7 Must be very careful or clever when 7 Eliminating checks or 7 Reordering statem ents C8553 Lr uurc C39om1iugi39vlrlem Safe Regions Moreira et al TOPLAS 2000 Idea 7 Create two versions of a block of code 7 One is guaranteed not to except and is optimized accordingly 7 The other is used when the code might except if m lt Csize0 ampamp p lt Csizel ampamp m lt Asize0 ampamp n lt Asizel ampamp n lt Bsize0 ampamp p lt Bsizel for i0 iltm i for j0 jltp j for k0 kltn k Ci 
j AEi k BEk j safe region else for i0 iltm i for j0 jltp j for k0 kltn k Ci j AEi k BEk j unsafe region cw Java Arrays and Loop Transformations Java arrays 7 No multidimensional arrays 7 Instead use arrays of arrays can be ragged 7 Requires one memory reference for each array dimension 7 Rows may alias with one another Arrays are common in scienti c applications 7 Their use requires optimization for good performance 7 Large body of work on loop transformations makes assumptions 7 Arrays stored in contiguous memory 7 No aliasing among array elements 7 Arrays are not ragged Compng lt lbg c quotnamed Languages Java Arrays A U mm mm quotva 0mm romcy mm hnasmlwma 90MB n 39uwv mien m n MAHF AV woasm mums mm i mum mum mum 00113 An AEPJW n mm niscwwon smear WDMEIUD mm mNanm 39 WWW gt uauzcr no may noun mum ANqu mm Emma m Fowrn m 99mm 1 AA mm in mm 2 mm uwccl r39wNTEn we fumv women m n7 mmmv gasket 13me DDuL39lE naumi Denali mum mummy m oaJscr nssmlmm mum POWER 10 my mm quot 39 A quotW39 aloucv mum mom i unums mmw m mum rum mm IBM Ayxnemx Journal 391 2000 IBM Compng lt mgrc quotnamed Languages 7 Summary Implementing 001 requires handlin 7 member variables and inheritance 7 dynamic binding due to polymorphism Some 001 features that lead to inef cient code 7 ic code and a a 7 safety ie array bounds checks and precise exceptions 7 garbage collection Many sources or inefficiency in Java 7 The cost ofa c1eaner obj ecton39entedlanguage Progress in improving Java ef ciency 7 Sciena39iic application performance ie imitating mumdim arrays run Cmmlrl Lar W377 mm W Cmnpil Next Time Today Seminar on giving talks from 4L5 in USC 110 Assignments 7 Project 3 due today 7 Project 4 is posted Lecture 7 Garbage collection Generalizing Data ow Analysis Announcements 7 Read Section 93 in the book Today 7 Other types of data ow analysis 7 Reaching de nitions available expressions reaching constants 7 Abstracting data ow analysis What s common among the different analyses 23553 Lemm Generalizing 
Dara ow AnaJysis l Reaching Def39mitions De nition 7 A de nition statement d of a variable v reaches d node 11 1f there 1s a path from d to n such that v1s v d not rede ned along that path 39 g e v 39 Tl Uses of reaching de nitions d X 5 lt 7 Build use def chains Does this def of x reach 11 Constant propagation f V can we replace n with f 5 X 7 Loop invariant code motion 1 a 39 lt Reaching de nitions ofa andb i If 39 To determine whether its legal to move statement 4 or 39 out of the loop we need to ensure that there are no 2 X a b reaching de nitions of a orb inside the loop 6 23553 Lemm Cv eneraliZLng Dara ow AnaJysis Computing Reaching De nitions Assumption 7 At most one de nition per node 7 We can refer to de nitions by their node number Genn De nitions that are generated by node n at most one Killn De nitions that are killed by node n Defining Gen and Kill for various statement types statement Gens Kills statement Gens Kills s tbopc s de t7s s gotoL s t Mb s de t 7 s s L SIMalb Sifa s ifaopbgotoL sFfa s de t7s 23353 Leoner Cv enerahzing Dam ow Analysis A Better Formulation of Reaching Definitions Problem 7 Reaching de nitions gives you a set of de nitions nodes 7 Doesn t tell you what variable is de ned 7 Expensive to nd de nitions of variable v Solution 7 Reformulate to include variable eg Use a set of var def pairs ax by inlnl Xayb 3353 Leoner Generallzing Dam ow Analysis 4 Recall Liveness Analysis De nition 7 A variable is live at a particular point in the d 39 program if its value at that point will be used in the future dead otherwise g de v Uses of Liveness 7 Register allocation 7 Deadcode elimination If a is not live out of statement 1 then 39 statement 1 is dead code 5me Us fb x 3 I 23553 LCCH Ir Generalizing Dara ow AnaJysis Available Expressions De nition 7 An expression xy is available at node n if every path from the entry node to n evaluates xy and there are no de nitions of x or y after the last evaluation entry xy A xy A x and y not de ned xy 
Available Expressions for CSE

How is this information useful? Common subexpression elimination (CSE):
- If an expression is available at a point where it is evaluated, it need not be recomputed

Example (before and after CSE; 4 * i is available at the later uses):

    1: i = j                  1: i = j
    2: t = 4 * i              2: t = 4 * i
       b = 4 * i                 b = t
    3: c = 4 * i              3: c = t

Aspects of Dataflow Analysis

- Must or may: is the information guaranteed or merely possible?
- Direction: forward or backward
- Flow values: variables, definitions, ...
- Initial guess: universal or empty set
- Kill: what the semantics of a statement removes from the set
- Gen: what the semantics of a statement adds to the set
- Merge: how sets from two control paths compose

Must vs. May Information

Must information implies a guarantee; may information identifies possibilities.

                     May (e.g., liveness)        Must (e.g., available exprs)
    safe             overly large set            overly small set
    desired info     small set                   large set
    Gen              add everything that         add only facts that are
                     might be true               guaranteed to be true
    Kill             remove only facts that      remove everything that
                     are guaranteed false        might be false
    merge            union                       intersection
    initial guess    empty set                   universal set

Reaching Definitions: Must or May Analysis?

Consider the uses of reaching definitions: we need to know whether a definition *might* reach node n, so it is a may analysis.

Defining Available Expressions Analysis

- Must or may? Must
- Direction: forward
- Flow values: sets of expressions
- Initial guess: universal set
- Kill: set of expressions killed by statement s
- Gen: set of expressions evaluated by s
- Merge: intersection

Reaching Constants (aka Constant Propagation)

Goal
- Compute the value of each variable at each program point, where possible

Flow values
- Sets of (variable, constant) pairs

Merge function
- Intersection

Dataflow equations
- Effect of node n: x = c
    kill[n] = {(x, d) : for all constants d}
    gen[n]  = {(x, c)}
- Effect of node n: x = y + z
    kill[n] = {(x, d) : for all constants d}
    gen[n]  = {(x, c) : c = val(y) + val(z), (y, val(y)) ∈ in[n], (z, val(z)) ∈ in[n]}
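The reaching-constants equations above can be sketched the same way. In this simplified sketch, a flow value is a dict from variables to constants, "not yet computed" is modeled as None, and the meet keeps only the (var, const) pairs on which the merging paths agree; the statement encoding is invented for the example:

```python
# Reaching constants over a CFG. stmts[n] is ('const', x, c),
# ('add', x, y, z), or ('nop',).
def const_prop(succ, stmts, entry):
    nodes = list(succ)
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            preds[s].append(n)
    TOP = None                        # "no information yet": meet identity
    inn = {n: TOP for n in nodes}
    out = {n: TOP for n in nodes}
    def meet(a, b):                   # keep pairs both environments agree on
        if a is TOP: return b
        if b is TOP: return a
        return {v: c for v, c in a.items() if b.get(v) == c}
    def transfer(n, env):
        env = dict(env or {})
        op = stmts[n]
        if op[0] == 'const':          # x = c
            env[op[1]] = op[2]
        elif op[0] == 'add':          # x = y + z
            _, x, y, z = op
            if y in env and z in env:
                env[x] = env[y] + env[z]
            else:
                env.pop(x, None)      # x is no longer a known constant
        return env
    changed = True
    while changed:
        changed = False
        for n in nodes:
            acc = {} if n == entry else TOP   # nothing known at entry
            for p in preds[n]:
                acc = meet(acc, out[p])
            inn[n] = acc
            new_out = transfer(n, inn[n])
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return inn, out
```

The second test below merges two paths that assign x different constants, so the intersection-style meet drops x entirely.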
Reaching Constants Example

- Must or may? Direction? Initial guess?

Reality Check!

Some definitions and uses are ambiguous
- We can't tell whether (or what) variable is involved, e.g., *p = x: what variable are we assigning?
- Unambiguous assignments are called strong updates; ambiguous assignments are called weak updates

Solutions
- Be conservative
  - Sometimes we assume that it could be everything, e.g., a def of *p generating reaching definitions for all variables
  - Sometimes we assume that it is nothing, e.g., a def of *p killing no reaching definitions
- Try to figure it out: alias/pointer analysis (more later)

Side Effects

What happens at function calls?
- For example, the call foo(&x) might use or define
  - any local or heap variable x that has been passed by address/reference
  - any global variable

Solution
- How do we handle this for liveness (used for register allocation)?
- In general:
  - Be conservative: assume all globals and all variables passed by address/reference may be used and/or modified
  - Or figure it out: calculate side effects (an example of an interprocedural analysis)

Concepts

Dataflow analyses are distinguished by
- Flow values (initial guess, type)
- May/must
- Direction
- Gen
- Kill
- Merge

Complications
- Ambiguous references: strong/weak updates
- Side effects

Next Time

Lecture
- Lattice-theoretic foundation for dataflow analysis

Suggested exercises
- From the book: 9.2.1, 9.2.2, 9.2.6

Finding Bugs

Last time
- Runtime reordering transformations

Today
- Program analysis for finding bugs, especially security bugs
  - problem specification, motivation, approaches, remaining issues

Problem

What is a bug?
- A path in the code that causes a runtime exception
- A path through the code that causes incorrect results

Issues
- Exponentially many paths
- Cannot statically determine
  the path a program will take
- "Program testing can be used to find the presence of bugs, but never to show their absence." (Dijkstra, 1972)

Undecidability
- Soundness and completeness together are undecidable
- There is confusion in the literature about which is which:
  - sound: every reported error is genuine (no false positives)
  - complete: every genuine error is reported (no false negatives)

Motivation for the Automatic Detection of Bugs

Time spent in program maintenance
- Most software engineers spend the majority of their time doing maintenance
- Most time spent doing maintenance is time spent debugging

Costs due to bugs that allow security exploits (approximations published at CNET News.com, Jan. 31, 2003)
- Slammer: $950 million
- Code Red: $2.6 billion productivity loss
- LoveLetter: $8.8 billion
- Klez virus: $9.0 billion

Approaches to Finding Bugs

Approaches
- Strengthening the type system
- Static analysis to detect bug patterns
- Automated theorem proving
- Dynamic analysis
  - catch errors before they occur
  - find the cause of failures after the fact

Evaluating the different approaches
- How many false positives?
- How many false negatives?
- Extent of user intervention, or ease of use
- Efficiency of the approach

Example Bugs

Null dereference:

    if (p == null) p->open();

Array bounds error:

    int a[2]; a[2] = n;

Untrusted access (format string vulnerability):

    fgets(buffer, size, file);
    printf(buffer);

Static Analysis: FindBugs
- A project at the University of Maryland for finding bugs in Java
- Implementation steps:
  1. Think of a bug pattern
  2. Implement a detector for it
  3. Apply it to real software; hopefully it finds some real bugs, and it will probably produce some false warnings
  4. Add heuristics to reduce the percentage of false warnings
- Their experience: new detectors can usually be implemented quickly (somewhere between a few minutes and a few days); often detectors find more bugs than you would expect

Kinds of analysis used in implementing detectors
- Examination of method names, signatures, class hierarchy
- Linear scan of bytecode instructions using a state machine
- Method control flow graphs, dataflow analysis
- No
  interprocedural flow analysis or sophisticated heap analysis

Type Qualifiers (Shankar et al. '01)

Idea
- Add type qualifiers:

    tainted char *fgets(tainted char *buffer, int size, FILE *f);
    int printf(untainted char *format, ...);

- Use a type constraint solver to find errors; errors are type mismatches

Issues
- What is the type of strdup()?
- What happens when the value of strings changes?

How FindBugs Handles the Example Bugs

Null pointer dereferences
- Found 37 in rt.jar 1.5b59, 55 in eclipse 3.0

Array bounds checking
- Not an issue in Java

Untrusted code
- Can static fields, or the objects they refer to, be modified by untrusted code?
  - public non-final static fields
  - public static fields pointing to arrays
- Warnings: 254 in rt.jar 1.5b59, 967 in eclipse 3.0

Automated Theorem Proving: SAL at Microsoft
- Standard Annotation Language for interface pre- and post-conditions
- The focus is on buffer overruns and pointer usage
- SALinfer is a tool that determines specifications automatically

[Slide: chart of annotations, bug fixes, and bugs per code base. Manuvir Das, Microsoft Corporation, 6/12/05.]

SAL Example

- Requirement on foo: callers must pass a buffer that is len elements long
- Assumption made by foo: buf is count elements long
- Local checker: do the assumptions imply the requirements?
- Requirement on foo's argument: buf is len * 4 bytes long, because of the call memset(buf, 0, len * sizeof(int))
- Requirement on memset: callers must pass a buffer that is len bytes long
  (Manuvir Das, Microsoft Corporation, 6/12/05)

Dynamic Analysis: CCured

"CCured: Taming C Pointers" by George Necula, Scott McPeak, and Wes Weimer (May 22, 2002)
- Adds runtime checks to C programs for catching memory safety errors
- Requires user annotations
- The only thing that happens statically is figuring out what special type a pointer should be; we want the fastest possible type that can still catch any possible dynamic error
[Pipeline: C program → CCured translator → instrumented C program → compile & execute → success, or halt with a memory-safety violation.]
- Around 15-50 times faster than Purify

How CCured Handles the Example Bugs

New pointer types
- SAFE pointer: on use, does a null-pointer check
- SEQ pointer: on use, does a null-pointer check and an array-bounds check
- DYN pointer: on use, does a null-pointer check, a bounds check, and a type check (checks type casts)

Null pointer dereference
- Use a SAFE pointer

Array bounds
- Use a SEQ pointer

Untrusted access
- Has special handling for variable numbers of arguments

Remaining Issues

Evaluation of new techniques is tedious
- Must have a human determine if a reported problem is an actual bug
- Getting developers to fix the bug is another battle

Static analysis
- Whole-program versus partial-program analysis
- The quality of alias analysis affects the quality (number of false positives)

Comparing approaches is hard
- They might analyze different languages
- Experiments are performed on different benchmarks; different versions of the software make a different benchmark

Concepts

Approaches to bug detection
- Augmenting the type system
- Static analysis
- Automated theorem proving
- Dynamic analysis

Comparing bug detection techniques is tricky
- What is considered a real bug?
- How can we compare false positives with false negatives, and how can we determine them all?

Next Time

Lecture
- This is it: a review of what we covered this quarter; how does it all fit together; any requests?

Lattice-Theoretic Framework for Dataflow Analysis

Last time
- Generalizing dataflow analysis

Today
- Introduce lattice-theoretic frameworks for dataflow analysis

Context for the Lattice-Theoretic Framework

Goals
- Provide a single formal model that describes all dataflow analyses
- Formalize the notions of correct, conservative, and optimistic
- Correctness proof for IDFA (iterative dataflow analysis)
- Place bounds on the time
  complexity of dataflow analysis

Approach
- Define the domain of program properties (flow values) computed by dataflow analysis, and organize the domain of elements as a lattice
- Define the analyses in terms of lattice operations
- Exploit lattice theory in achieving the goals

Lattices

Define lattice L = (V, ⊓):
- V is a set of elements of the lattice
- ⊓ is a binary operator over the elements of V: meet, or greatest lower bound

Properties of ⊓
- x, y ∈ V implies x ⊓ y ∈ V (closure)
- x, y ∈ V implies x ⊓ y = y ⊓ x (commutativity)
- x, y, z ∈ V implies (x ⊓ y) ⊓ z = x ⊓ (y ⊓ z) (associativity)

[Figure: the lattice of 3-bit values 000 through 111, ordered bitwise.]

Lattices (cont.)

The induced partial order ⊑
- ⊓ imposes a partial order on V: x ⊑ y iff x ⊓ y = x

Top (⊤)
- A unique greatest element of V, if it exists: for all x ∈ V - {⊤}, x ⊑ ⊤

Bottom (⊥)
- A unique least element of V, if it exists: for all x ∈ V - {⊥}, ⊥ ⊑ x

Height of lattice L
- The longest path through the partial order from greatest to least element (top to bottom)

Dataflow Analysis via Lattices

Relationship
- Elements of the lattice V represent flow values (in[] and out[] sets), e.g., sets of live variables for liveness
- ⊤ represents the "best case" information (initial flow value), e.g., the empty set {}
- ⊥ represents the "worst case" information, e.g., the universal set
- ⊓ (meet) merges flow values, e.g., set union
- If x ⊑ y, then x is a conservative approximation of y, e.g., a superset

Dataflow Analysis via Lattices (cont.)
- Remember what these values represent: at each program point, a lattice element represents an in[] set or an out[] set

Dataflow Analysis Frameworks

A dataflow analysis framework consists of
- A set of flow values V
- A binary meet operator ⊓
- A set of flow functions F (also known as transfer functions)

Flow functions
- F = { f : V → V }; f describes how each node in the CFG affects the flow values
- Flow functions map program behavior onto lattices

Visualizing DFA Frameworks as Lattices

Example: liveness analysis with 3 variables, U = {v1, v2, v3}
- V = 2^U = { {}, {v1}, {v2}, {v3}, {v1,v2}, {v1,v3}, {v2,v3}, {v1,v2,v3} }
- Meet ⊓ = ∪ (set union)
- Top ⊤ = {}
- Bottom ⊥ = U = {v1, v2, v3}
- f_n(x) = Gen[n] ∪ (x - Kill[n])
- More conservative solutions are lower on the lattice
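The liveness lattice just described can be modeled in a few lines: V = 2^{v1,v2,v3}, meet is set union, and the induced order x ⊑ y iff x ⊓ y = x, so larger (more conservative) sets sit lower. This is only an illustrative sketch of the definitions:

```python
# The powerset lattice for liveness over U = {v1, v2, v3}.
def meet(x, y):          # liveness merges flow values with set union
    return x | y

def leq(x, y):           # induced partial order: x is below y iff x meet y = x
    return meet(x, y) == x

U = frozenset({'v1', 'v2', 'v3'})
TOP = frozenset()        # best case: nothing live
BOTTOM = U               # worst case: everything live
height = len(U)          # longest chain from TOP to BOTTOM takes |U| steps
```

Note the inversion: with union as the meet, the *superset* is the lower (more conservative) element, so BOTTOM is the full variable set.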
A Lattice Example

For a small example program:
- What is the dataflow set for liveness?
- What is the meet operation for liveness?
- What partial order does the meet operation induce?
- What is the liveness lattice for this example?

Recall Liveness Analysis

Dataflow equations for liveness

    in[n]  = use[n] ∪ (out[n] - def[n])
    out[n] = ∪ over s ∈ succ[n] of in[s]

Liveness equations in terms of Gen and Kill

    in[n]  = gen[n] ∪ (out[n] - kill[n])    (a use of a variable generates liveness)
    out[n] = ∪ over s ∈ succ[n] of in[s]    (a def of a variable kills liveness)

- Gen: new information that is added at a node
- Kill: old information that is removed at a node
- We can define almost any dataflow analysis in terms of Gen and Kill

More Examples

Reaching definitions
- V = 2^S, where S is the set of all defs
- ⊓ = ∪
- Top ⊤ = {}
- Bottom ⊥ = S
- in[n] = ∪ over p ∈ pred[n] of out[p]

Reaching constants
- V = 2^(V×C) for variables V and constants C
- ⊓ = ∩
- Top ⊤ = V × C (the universal set of pairs)
- Bottom ⊥ = {}

Direction of Flow

Backward dataflow analysis
- Information at a node is based on what happens later in the flow graph, i.e., in[] is defined in terms of out[]:

    in[n]  = gen[n] ∪ (out[n] - kill[n])
    out[n] = ∪ over s ∈ succ[n] of in[s]        (liveness)

Forward dataflow analysis
- Information at a node is based on what happens earlier in the flow graph, i.e., out[] is defined in terms of in[]:

    in[n]  = ∪ over p ∈ pred[n] of out[p]
    out[n] = gen[n] ∪ (in[n] - kill[n])         (reaching definitions)

Some problems need both forward and backward analysis
- e.g., partial redundancy elimination (uncommon)

Merging Flow Values

Live variables and reaching definitions
- Merge flow values via set union:

    Reaching definitions:  in[n] = ∪ over p ∈ pred[n] of out[p];  out[n] = gen[n] ∪ (in[n] - kill[n])
    Live variables:        out[n] = ∪ over s ∈ succ[n] of in[s];  in[n] = gen[n] ∪ (out[n] - kill[n])

- Why? When might this be inappropriate?
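The backward liveness equations above run to a fixed point with a simple iterative loop. This sketch uses union as the merge and an empty-set initial guess; the CFG and use/def encodings are invented for the example:

```python
# Iterative liveness: in[n] = use[n] | (out[n] - def[n]),
# out[n] = union of in[s] over successors s.
def liveness(succ, use, defn):
    nodes = list(succ)
    inn = {n: set() for n in nodes}   # optimistic initial guess: empty sets
    out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:               # any order converges; reverse is faster
            out[n] = set().union(*(inn[s] for s in succ[n])) if succ[n] else set()
            new_in = use[n] | (out[n] - defn[n])
            if new_in != inn[n]:
                inn[n] = new_in
                changed = True
    return inn, out
```

For node 1: a = f(b) followed by node 2: x = g(a), the analysis reports b live into node 1 and a live out of it.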
Available Expressions (cont.)

Dataflow equations

    in[n]  = ∩ over p ∈ pred[n] of out[p]
    out[n] = gen[n] ∪ (in[n] - kill[n])

- What is the initial guess? What is the meet operation? What does the lattice look like?

Plug it in to our general DFA algorithm:

    for each node n:
        in[n] = ⊤; out[n] = ⊤
    repeat
        for each node n:
            in[n]  = ∩ over p ∈ pred[n] of out[p]
            out[n] = gen[n] ∪ (in[n] - kill[n])
    until in[n] and out[n] stop changing for all n

Solving Dataflow Analyses

Goal
- For a forward problem, consider all possible paths from the entry to a given program point, compute the flow values at the end of each path, and then meet these values together
- Meet-over-all-paths (MOP) solution at each program point:

    MOP(n) = ⊓ over all paths p from entry to n of f_p(⊤)

Solving Dataflow Analyses (cont.)

Problems
- Loops result in an infinite number of paths
- Exponential blowup

Solution
- Compute meets early (at merge points) rather than at the end
- Maximum fixed point (MFP)

Questions
- Is this correct? Is this efficient? Is this accurate?

Correctness

Is v_MFP correct? Is v_MFP ⊑ v_MOP?
- Look at merges with two incoming paths p1 and p2:

    v_MOP = F(v_p1) ⊓ F(v_p2)
    v_MFP = F(v_p1 ⊓ v_p2)
    v_MFP ⊑ v_MOP   iff   F(v_p1 ⊓ v_p2) ⊑ F(v_p1) ⊓ F(v_p2)

- Observation: for all x, y ∈ V, x ⊓ y ⊑ x and x ⊓ y ⊑ y
- v_MFP is correct when the flow functions are monotonic

Monotonicity

- For all x, y ∈ V: x ⊑ y implies f(x) ⊑ f(y)
- If the flow function f is applied to two members of V, the result of applying f to the lesser of the two will be under the result of applying f to the greater
- Giving a flow function more conservative inputs leads to more conservative outputs, never more optimistic outputs

Why else is monotonicity important?
- For monotonic F over domain V, the maximum number of times F can be applied to itself before reaching a fixed point is height(V) - 1
- IDFA is guaranteed to terminate if the flow functions are monotonic and the lattice has finite height

Efficiency

Parameters
- n: number of nodes
  in the CFG
- k: height of the lattice
- t: time to execute one flow function

Complexity: O(n * k * t)

Example
- Reaching definitions

Accuracy

Distributivity
- f(x ⊓ y) = f(x) ⊓ f(y)
- v_MFP ⊑ v_MOP: F(v_p1 ⊓ v_p2) ⊑ F(v_p1) ⊓ F(v_p2)
- If the flow functions are distributive, MFP = MOP

Examples
- Reaching definitions: distributive
- Reaching constants: not distributive; suppose one path yields {x=3, y=2}, the other {x=2, y=3}, and f is the flow function for w = x + y:

    f(v_p1 ⊓ v_p2) = f({}) = {}
    f(v_p1) ⊓ f(v_p2) = {x=3, y=2, w=5} ⊓ {x=2, y=3, w=5} = {w=5}

  so MFP is strictly below MOP

Tuples of Lattices

Problem
- Simple analyses may require very complex lattices, e.g., reaching constants

Solution
- Use a tuple of lattices, one per variable: L = (V_T^N, ⊓^N) built from a component lattice L_T = (V_T, ⊓_T)
- Meet ⊓: pointwise application of ⊓_T
- Top ⊤: tuple of tops (⊤_T, ..., ⊤_T)
- Bottom ⊥: tuple of bottoms (⊥_T, ..., ⊥_T)
- Height(L) = N * height(L_T)

Tuples of Lattices Example

Reaching constants
- Previously: pairs (v, c) for variables v and constants c; V = 2^P
- Alternatively: per variable, a small lattice with ⊤ on top, the constants c in the middle (mutually incomparable), and ⊥ on the bottom; the whole problem is a tuple of these lattices, one for each variable

Examples of Lattice Domains

- The two-point lattice: ⊤ and ⊥
- A set of incomparable values, plus ⊤ and ⊥
- The powerset lattice: 2^S with {} on top and S on the bottom, or vice versa; isomorphic to a tuple of two-point lattices

Concepts

Lattices
- Conservative approximation
- Optimistic initial guess
- Dataflow analysis frameworks
- Tuples of lattices

Dataflow analysis
- Fixed point
- Meet over all paths (MOP)
- Maximum fixed point (MFP)
- Legal/safe/correct: monotonic
- Efficient
- Accurate: distributive

Next Time

Lecture
- Some transformations that you can implement for project 4:
  - Copy propagation
  - Constant propagation
  - Common subexpression elimination

Compiling Object-Oriented Languages

Last time
- Undergraduate compiler review

Today
- Introduction to compiling object-oriented languages
- What are the performance issues?

What is an Object-Oriented Programming Language?

Objects
- Encapsulate code and data

Inheritance
- Supports code reuse and software evolution

[Class hierarchy: Person, with subclasses Student and Teacher.]

Subtype polymorphism
- Can use a subclass wherever a parent class is expected:

    Person p = new Person();
    Student s = new Student();
    PrintName(p);
    PrintName(s);

Dynamic binding (message sends)
- Binding of a method name to code is done dynamically, based on the dynamic type of the receiver object, e.g., p.reprimand()

Implementation: Inheritance of Instance Variables

Goal
- Lay out objects for type-independent instance variable access

Solution (single inheritance)
- Prefixing: superclass fields are at the beginning of the object

Example

    Person     Student     Teacher
    name       name        name
                           salary

Implementation: Dynamic Binding

Problem
- The appropriate method depends on the dynamic type of the object, e.g., p.reprimand()

Solution
- Create a descriptor for each class (not each object) encoding the available methods
- Encode a pointer to the class descriptor in each object
- Lay out methods in the class descriptor just like instance variables

    Person        Student       Teacher
    getName       getName       getName
    reprimand     reprimand     party
                  workhard

Usage summary
- Load the class descriptor pointer from the object
- Load the method address from the descriptor
- Jump to the method

Why are Object-Oriented Languages Slow?

Dynamism
- Code
- Data

Style
- Granularity: lots of small objects
- Exploiting dynamism

Other reasons
- High-level modern features such as safety
- Garbage collection

Dynamism: Code

Dynamic binding
- What code gets executed at a particular (static) message send? It depends, and it may change.

Example

    class rectangle extends shape {
        int length; int width;
        int area() { return length * width; }
    }
    class square extends rectangle {
        int size;
        int length() { return size; }     // used by rect.area()
        int width()  { return size; }     // used by sq.area()
    }
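The descriptor-based dispatch summarized above (load the class-descriptor pointer, load the method address at a fixed index, jump) can be modeled with plain data structures; the classes, method indices, and names here are illustrative only:

```python
# A toy model of vtable dispatch: objects are dicts whose 'class' slot points
# at the class descriptor, which is just a list of methods at fixed indices.
# Index 0 = getName, index 1 = reprimand, shared across the hierarchy.
person_vt = [lambda self: self['name'],
             lambda self: "cannot reprimand " + self['name']]
student_vt = [lambda self: self['name'],
              lambda self: self['name'] + " is reprimanded"]

def send(obj, index):
    vtable = obj['class']      # load the class descriptor pointer from object
    method = vtable[index]     # load the method address from the descriptor
    return method(obj)         # jump to the method

p = {'class': person_vt, 'name': 'Pat'}
s = {'class': student_vt, 'name': 'Sam'}
```

Because the same index reaches a different method depending on the receiver's descriptor, the call site cannot know statically which body runs, which is exactly why dynamic binding inhibits inlining.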
Cost of Dynamic Binding

Direct cost
- Overhead of performing the dynamic method invocation

Indirect cost
- Inhibits static analysis of the code

Example

    class rectangle : shape {
        int length; int width;
        int area() { return length * width; }   // want to inline, assign to registers, etc.
    }

- Driesen and Hölzle (OOPSLA '96): in C++ programs, a median of 5.2% of total execution time is spent on dynamic dispatch

Dynamism: Data

Object instance types are not statically apparent
- Need to be able to manipulate objects uniformly
- Boxing: wrap up all data and reference it with a pointer

Example

    Integer n = new Integer(33);
    n ──> [ type descriptor | data: int 33 ]

Cost of Dynamism: Data

Direct cost
- Overhead of actually extracting the data, e.g., 2 loads versus 0 if the data is already in a register

Indirect cost
- More difficult to statically reason about the data

Style

Sometimes programmers write C-style code in OO languages
- Easy: just optimize it away

Sometimes programmers actually exploit dynamism
- Hard: it can't just be optimized away

Programmers create many small objects
- Thwarts local analysis
- Exacerbates the dynamism problem
- A huge problem for pure OO languages

Programmers create many small methods
- Methods to encapsulate data, e.g., methods to get and set member fields

A Concrete Example: Java

High-level and modern
- Object-oriented
- Mobile/portable: standard bytecode IL
- Multithreaded: great for structuring distributed and UI programs
- Garbage collected
- Dynamic class loading
- Reasonable exception system
- Rich standard libraries

Approaches to Implementing Java

Interpretation
- Extremely portable; simple stack machine
- Performance suffers: interpretation overhead, and a stack machine has no registers

Direct compilation
- Compile the source or bytecodes to native code
- Sacrifices portability; can have very good performance

Approaches to Implementing Java (cont.)

JIT compilation
- Still supports mobile
  code, with more effort
- Can have very good performance, but compilation time is critical
- The compiler can exploit dynamic information

JIT/dynamic compilation
- The compiler gets several chances on the same code
- The compiler can exploit changes in dynamic information

Why is Java Slow?

Bytecode interpretation?
- Not a good answer

Why is Java Slow?

Impediments to performance
- Dynamic class loading thwarts optimization; even the class hierarchy is dynamic
- Flexible array semantics
- Runtime checks: null pointers, array bounds, types
- Precise exception semantics thwart optimization
- Multithreading introduces synchronization overhead
- Lots of memory references: poor cache performance
- ...and all the usual OO challenges

Analysis with a Dynamic Class Hierarchy

Approaches
- Ignore it, i.e., disable dynamic class loading
- Exploit final classes and methods
- Conservative optimization, e.g., guarded devirtualization
- Track the validity of current code fragments and rebuild as necessary, e.g., with a resolution dependence graph; necessitates JIT/dynamic compilation

Scientific Programming and Java

Consider matrix multiplication:

    for (i = 0; i < m; i++)
        for (j = 0; j < p; j++)
            for (k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];

Costs
- 6 null pointer checks and 6 index checks for just 2 floating-point operations
- Can we optimize this code? The precise exception model inhibits the removal or reordering of checks
- No multidimensional arrays; rows may alias

More on Matrix Multiplication

Why can't we just do this?

    if (m <= C.size(0) && p <= C.size(1) &&
        m <= A.size(0) && n <= A.size(1) &&
        n <= B.size(0) && p <= B.size(1)) {
        for (i = 0; i < m; i++)
            for (j = 0; j < p; j++)
                for (k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];
    } else {
        // raise exception
    }

No out-of-bounds checks needed in the guarded loop, right?

Exceptions in Java

Exceptions in Java are precise:
- The effects of
  all statements and expressions before a thrown exception must appear to have taken place, and
- The effects of all statements or expressions after a thrown exception must appear not to have taken place

Implications
- Must be very careful (or clever) when eliminating checks or reordering statements

Safe Regions (Moreira et al., TOPLAS 2000)

Idea
- Create two versions of a block of code
- One is guaranteed not to except and is optimized accordingly; the other is used when the code might except

    if (m <= C.size(0) && p <= C.size(1) &&
        m <= A.size(0) && n <= A.size(1) &&
        n <= B.size(0) && p <= B.size(1)) {
        // safe region
        for (i = 0; i < m; i++)
            for (j = 0; j < p; j++)
                for (k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];
    } else {
        // unsafe region
        for (i = 0; i < m; i++)
            for (j = 0; j < p; j++)
                for (k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];
    }

Java Arrays and Loop Transformations

Java arrays
- No multidimensional arrays; instead, arrays of arrays (which can be ragged)
- Requires one memory reference for each array dimension
- Rows may alias with one another

Arrays are common in scientific applications
- Their use requires optimization for good performance
- The large body of work on loop transformations makes assumptions:
  - Arrays are stored in contiguous memory
  - No aliasing among array elements
  - Arrays are not ragged

[Figure: memory layout of Java arrays of arrays, from IBM Systems Journal 39(1), 2000.]

Summary

Implementing OOP requires handling
- Member variables and inheritance
- Dynamic binding due to polymorphism

Some OOP features that lead to inefficient code
- Dynamic code and data
- Programming style, i.e., use of dynamism and small function bodies
- Safety, i.e., array bounds checks and precise exceptions
- Garbage collection

Many sources of
  inefficiency in Java
- The cost of a cleaner object-oriented language

Progress in improving Java efficiency
- The greatest performance boost comes from eliminating interpretation overhead
- Scientific application performance, i.e., extending the language to include multidimensional arrays

Next Time

Assignments
- Project 1 due in one week

Reading
- Garbage collection readings on the web and in the book

Lecture
- Garbage collection

Garbage Collection

Last time
- Compiling object-oriented languages

Today
- Motivation behind garbage collection
- Garbage collection basics
- Garbage collection performance
- A specific example of using GC in C++

Acknowledgements
- These slides are based on Kathryn McKinley's slides on garbage collection, as well as E. Christopher Lewis's slides

Background

Static allocation: variables are bound to storage at compile time
- Pros: easy to implement
- Cons: no recursion; data structure sizes are compile-time constants; data structures cannot be dynamic

Stack allocation: dynamically allocate a stack frame for each procedure invocation
- Pros: recursion is possible; data structure sizes may depend on parameters
- Cons: stack-allocated data is not persistent; it cannot outlive the procedure for which it is defined

Heap allocation: arbitrary allocation and deallocation of objects in the heap
- Pros: solves the above problems; dynamic, persistent data structures
- Cons: very difficult to explicitly manage the heap

Memory Management

Ideal (not possible)
- Deallocate all data that will not be used in the future; what is garbage?

Manual/explicit
- The programmer deallocates with free or delete

Automatic/implicit
- Garbage collection

Explicit versus Automatic

Explicit
- Efficiency can be very high; gives programmers control
- More code to maintain
- Correctness is difficult:
  - core dump if an object is freed too soon (dangling pointers)
  - space is wasted if an object is freed
  too late
  - if you never free: at best you waste space, at worst you fail

Automatic
- Reduces programmer burden; eliminates sources of errors
- Integral to modern OOP languages, i.e., Java, C#
- Cannot determine all objects that won't be used in the future
- May or may not hurt performance

Key Issues

For both
- Fast allocation
- Fast reclamation
- Low fragmentation (wasted space)

For garbage collection
- How to discriminate between live objects and garbage?

Basic approaches to garbage collection
- Reference counting
- Reachability

Garbage Collection: Reference Counting

Idea
- For each heap-allocated object, maintain a count of the number of pointers to it
- When creating object X: rc(X) = 0
- When creating a new reference to object X: rc(X)++
- When removing a reference to object X: rc(X)--
- If the ref count goes to zero, free the object, i.e., place it on the free list

Example

    x = new Node();    // rc = 1
    y = x;             // rc = 2
    x = null;          // rc = 1
    y = null;          // rc = 0: reclaim

Complication
- What if the freed object contains pointers?

Reference Counting Analysis

How it handles the key issues
- Allocation is expensive, because it searches a free list
- Reclamation is local and incremental
- Fragmentation is high

Further analysis
- Relatively simple; very simple runtime system
- Cannot reclaim cyclic data structures (shifts the burden to the programmer)
- High runtime overhead: must manipulate ref counts on every reference update
- Space cost
- Complicates compilation

Trace Collecting

Observation
- Rather than explicitly keeping track of the number of references to each object, we can traverse all reachable objects and discard the unreachable ones

Details
- Start with a set of root pointers (the root set):
  - global pointers
  - pointers in the stack and registers
- Traverse objects recursively from the root set; visit reachable objects; unvisited objects are garbage
- We might visit an object even if it's dynamically dead, i.e., we are only conservatively approximating dead-object discovery

When do we collect? When the heap is full.
Mark-Sweep Collecting

A simple trace collector
- Trace reachable objects, marking them
- Sweep through all of the heap, adding unmarked objects to the free list and clearing the marks of marked objects

[Example figure: a heap with marked (T) and unmarked (F) objects.]

Mark-Sweep Collecting Analysis

How it handles the key issues
- Allocation is expensive, because it searches a free list
- Reclamation can result in the "embarrassing pause" problem, and tracing has poor memory locality
- Fragmentation is high

Further analysis
- Collects cyclic structures; simple
- Must be able to dynamically identify pointers in variables and objects
- More complex runtime system
- Space overhead is only one bit per data object

Mark-Compact (Copy) Collecting

Idea
- Move objects to a new heap while tracing

Details
- Divide the heap in half; the program allocates in from-space, while to-space is empty
- When from-space is full, copy the non-garbage from from-space to to-space; to-space ends up compact
- When visiting an object during tracing, leave a forwarding pointer in the from-space version; if the object is visited again, redirect the pointer to the to-space copy

[Figure: copying garbage collection, from-space to to-space.]

Mark-Compact Collecting Analysis

How it handles the key issues
- Allocation is very fast, since there is no free list to search
- Reclamation can still cause the embarrassing pause problem; tracing has poor memory locality; data must be copied between heaps and pointers updated because objects move
- Visits only reachable objects; no fragmentation

Further analysis
- Collects cyclic structures
- Requires twice the virtual memory
- Breadth-first traversal means to-space objects can have poor locality

Hybrid Collectors

Idea
- Different collection techniques may be combined

Example: mark-sweep/copy collector
- Big objects managed with mark-sweep (avoids copy time)
- Small objects managed with a copy collector

Analysis
- May be more efficient
- More complex
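The mark and sweep phases described above fit in a few lines; here the heap is simply a dict from object names to the names they reference (an encoding assumed for illustration). Note that the test below reclaims a cycle, which reference counting could not:

```python
# A compact sketch of mark-sweep collection over a toy heap.
def mark_sweep(heap, roots):
    """heap: dict name -> list of names it points to; roots: iterable of names.
       Returns (live, garbage) as sorted lists of object names."""
    marked = set()
    def mark(n):                       # trace reachable objects, marking them
        if n in marked or n not in heap:
            return
        marked.add(n)
        for child in heap[n]:
            mark(child)
    for r in roots:
        mark(r)
    # sweep: everything unmarked is garbage; survivors keep only their data
    garbage = sorted(n for n in heap if n not in marked)
    live = sorted(marked)
    return live, garbage
```

The cyclic pair c <-> d is unreachable from the roots, so the sweep reclaims it along with the isolated object e.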
Generational Collecting

Observation
- "Young" objects are most likely to die soon, while "old" objects are more likely to live on

Idea
- Exploit this fact by concentrating collection on young objects

Details
- Divide the heap into generations G0, G1, ..., with G0 for the youngest objects
- Collect G0 most frequently, G1 less frequently, etc.
- An object is tenured from one generation to the next after surviving several GCs

Result
- Usually we only have to collect a small sub-heap

Generational Collecting (cont.)

Additional issues
- Need to encode the age in the object
- The root set for objects in one generation may come from another generation
- Generation Gi should be k times larger than Gi-1
- Each generation may be collected with a different algorithm

Dealing with cross-generation pointers
- Older to younger (i.e., Gi to Gj for i > j) are uncommon:
  - search all of Gi, or
  - use write barriers
- Younger to older (i.e., Gj to Gi for i > j) are very common:
  - collect Gj when collecting Gi

Generational Collecting Analysis

How it handles the key issues
- Allocation in the youngest heap is fast if a copy collector is used
- Reclamation is fast, because collection runs on a smaller heap
- Fragmentation depends on the collector used in each heap

Further analysis
- Less memory is required if mark-sweep is used for the older generations; possibly better locality
- Still sometimes do full (slow) collections: the embarrassing pause
- Need to record the age of each object

Who Does What: Pointers

Issues
- In order to trace reachable objects, we must be able to dynamically determine what is a pointer
  - Imagine doing this in C; it is easier in Java. How?
  - Compiler support and/or runtime tagging; conventions about what can be a pointer
- What if we're not certain about what is a pointer?
  - Be conservative: assume anything that may be a pointer is one
  - May keep extra garbage; cannot move objects (no mark-compact)
  - Conservative garbage collectors can be used with C

Who Does What: Scheduling Garbage Collection

Generally
- When
allocation is no longer possible, garbage collection is necessary.
- The VM usually stops all mutator threads.
- JIT-generated code must properly handle the out-of-memory exception.
Write barriers (for reference counting and generational collection):
- Each time a write to a pointer occurs, a write barrier catches it and performs some action.
- Generational collection needs the write barrier to keep track of pointers from older generations to newer generations.
- Reference counting requires write barriers to detect when any pointer changes.

Performance Impact of Garbage Collection [Blackburn et al., 2004]
Experiment setup:
- Jikes RVM (a highly optimized research virtual machine)
- MMTk (the Memory Management Toolkit), a framework for constructing garbage collectors within Jikes RVM
- machines: Athlon (best performance), Pentium 4, PowerPC
- benchmarks: the SPEC JVM benchmarks and pseudojbb (a variant of SPEC JBB2000)
- garbage collection algorithms: semi-space (a copying tracing collector), mark-sweep, and reference counting

Blackburn et al. [2004]: Some Conclusions
- Contiguous allocation is better than free-list allocation: mutator performance is 5-15% better due to improved data locality.
- Generational collectors are better than whole-heap collectors: write-barrier overhead is only 2-14% and is outweighed by improvements in collection time.
- Tracing collection is better than reference counting: the overhead of reference counting is too expensive, although reference counting can be beneficial for the older generations of the heap.
- The nursery size in a generational collector should be about 4-8 MB. This debunks the myth that the size should be about the L2 cache size (512 KB); it has to reach the point where there is a constant number of roots (about 64 KB).

A Concrete Memory-Management Problem: OpenAnalysis
- The goal is to do program analysis of large programs, so we can't just leak memory.
- Explicit management is very error prone, and segfaults are difficult to debug.
- For analysis results, it isn't clear which objects should own which others.
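An aside on the write-barrier mechanics described just above: for reference counting, the barrier's job on every pointer store can be sketched as follows (the global tables are purely illustrative):

```python
refcount = {}   # object id -> current reference count
freed = []      # reclamation order, for illustration

def new_obj(name):
    refcount[name] = 0
    return name

def write_ref(old, new):
    """Write barrier: runs on every pointer store p = new (where p held old)."""
    if new is not None:
        refcount[new] += 1          # increment the new target first
    if old is not None:
        refcount[old] -= 1
        if refcount[old] == 0:      # count hit zero: reclaim immediately
            freed.append(old)
    return new

a, b = new_obj('a'), new_obj('b')
p = write_ref(None, a)   # p -> a
p = write_ref(a, b)      # retargeting p drops a's count to zero
```

This reclaims incrementally (no embarrassing pause), but it cannot reclaim cycles, which is one reason the tracing collectors above compare favorably.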
Because ownership of analysis results is unclear, an explicit management policy proved very problematic.
Options:
- use one of the conservative garbage collectors (these have portability issues)
- smart pointers

Smart Pointers
auto_ptr in the C++ standard; the basic idea:

void f() {
  auto_ptr<int> p(new int);
  *p = 5;
  // the dynamically created object is deleted when p goes out of scope
}

The problem is that only one auto_ptr can point to a particular object at any one time.

OA_ptr in OpenAnalysis
Goals:
- allow multiple smart pointers to refer to the same object (similar to shared_ptr in the Boost library)
- catch as many common errors statically as possible
- allow the smart pointer to be used just like a normal pointer, as much as possible
- use simple template mechanisms, so as not to confuse many C++ compilers
Details:
- implemented using reference counting: the OA_ptr class has a pointer to the object and a pointer to a reference counter

Some Tricky Stuff
Allowing polymorphism:

OA_ptr<foo> p;
p = new bar();   // bar is a subclass of foo
p->hello();      // hello is a virtual method
(*p).hello();

Overriding the access operators:

T* operator->() const { assert(mPtr != NULL); return mPtr; }
T& operator*() const  { assert(mPtr != NULL); return *mPtr; }

Allowing type conversions; we want to pass a subclass to a base class:

int foo(OA_ptr<base> b);

int main() {
  OA_ptr<sub> p;
  p = new sub();
  foo(p);
}

Implementing subclass-to-base-class assignment in the smart pointer, so that a subclass can be passed as the base class (see Stroustrup, pp. 349-350):

template <class T> class OA_ptr {
  ...
  // so that a subclass can be passed into a base class
  template <class T2>
  operator OA_ptr<T2>() const {
    return OA_ptr<T2>(mPtr, mRefCountPtr);
  }
};

Not a Perfect Solution
Decided against raw-pointer comparison, to catch more static errors:

OA_ptr<Location> loc = r->...;   // (call garbled in the source)
if (!loc.ptrEqual(NULL)) ...

Returning an OA_ptr:

OA_ptr<LocIterator> AliasMap::getMayLocs(MemRefHandle ref) {
  int setId = mMemRefToIdMap[ref];
  OA_ptr<AliasMap::LocIter> retval;
  retval = new
AliasMap::LocIter(mIdToLocSetMap[setId]);
  return retval;
}

Usage that causes dynamic errors:

OA_ptr<Location> loc;
NamedLoc myLoc;
loc = this;     // DO NOT assign the "this" pointer to an OA_ptr!!!
loc = &myLoc;   // DO NOT assign the address of a local to an OA_ptr!!!

Summary
Categorizing garbage collection algorithms:
- How is garbage identified? (reference counting or tracing)
- When is it collected? (incrementally, generationally, or all at once)
- What allocator is used? (a contiguous/bump allocator or a free list)
- What mechanism is used for reclamation? (copying, or putting objects on a free list)
- How is the heap space managed? (split for copying, or into generations)
Open problems in garbage collection:
- There is still no conclusive evidence that it is always faster or slower than explicit memory management.
- How can we measure whether it is or not?

Next Time
Tuesday:
- implementing GC for MiniJava
- starting register allocation
Assignments:
- Read the Project 2 writeup before Tuesday (email will be sent to the list when it has been posted).

CS553 Lecture: Introduction to Dataflow Analysis

Last time: implementing a mark-and-sweep GC.
Today:
- control-flow graphs
- liveness analysis
- register allocation

Dataflow Analysis
Idea: dataflow analysis derives information about the dynamic behavior of a program by examining only the static code.
Example: how many registers do we need for the program below?

1:     a = 0
2: L1: b = a + 1
3:     c = c + b
4:     a = b * 2
5:     if a < 9 goto L1
6:     return c

An easy bound is the number of variables used; a better answer is found by considering the dynamic requirements of the program.

Liveness Analysis
Definition: a variable is live at a particular point in the program if its value at that point will be used in the future (dead otherwise). To compute liveness at a given point, we need to look into the future.
Motivation: register allocation.
- A program contains an unbounded number of variables, but must
execute on a machine with a bounded number of registers.
- Two variables can use the same register if they are never in use at the same time (i.e., never simultaneously live). Register allocation uses liveness information.

Control-Flow Graphs (CFGs)
Definition: a CFG is a graph whose nodes represent program statements and whose directed edges represent control flow.
Example (for the program above): 1 → 2 → 3 → 4 → 5; from node 5, "Yes" loops back to 2 and "No" falls through to 6 (return c).

Terminology
- A CFG node has out-edges that lead to successor nodes and in-edges that come from predecessor nodes.
- pred(n) is the set of all predecessors of node n; succ(n) is the set of all successors of node n.
Examples:
- the out-edges of node 5 are (5,6) and (5,2)
- succ(5) = {2, 6}; pred(5) = {4}; pred(2) = {1, 5}

Liveness by Example
What is the live range of b?
- Variable b is read in statement 4, so b is live on the 3 → 4 edge.
- Since statement 3 does not assign into b, b is also live on the 2 → 3 edge.
- Statement 2 assigns b, so any value of b on the 1 → 2 and 5 → 2 edges is not needed; b is dead along those edges.
- So b's live range is 2 → 3 → 4.

Liveness by Example (cont.)
- Live range of a: a is live from 1 → 2 and again from 4 → 5 → 2; a is dead from 2 → 3 → 4.
- Live range of b: b is live from 2 → 3 → 4.
- Live range of c: c is live from entry → 1 → 2 → 3 → 4 → 5 → 2, and from 5 → 6.
Variables a and b are never simultaneously live, so they can share a register.

Uses and Defs
Def (definition): an assignment of a value to a variable.
- def_node(v): the set of CFG nodes that define variable v
- def(n): the set of variables that are defined at node n
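The succ/pred relations for the six-node example can be computed from the edge list; a small sketch (the node and edge sets are taken from the example above):

```python
# succ for the six-statement example: node 5 branches back to 2 or falls to 6.
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}

def pred(succ):
    """Invert the successor map to get each node's predecessors."""
    p = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            p[t].append(n)
    return p

p = pred(succ)
```

This reproduces the slide's examples: succ(5) = {2, 6}, pred(5) = {4}, pred(2) = {1, 5}.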
Use: a read of a variable's value.
- use_node(v): the set of CFG nodes that use variable v
- use(n): the set of variables that are used at node n
A more precise definition of liveness: a variable v is live on a CFG edge if
(1) there is a directed path from that edge to a use of v (a node in use_node(v)), and
(2) that path does not go through any def of v (no nodes in def_node(v)).

The Flow of Liveness
Dataflow: liveness of variables is a property that flows through the edges of the CFG.
Direction of flow: liveness flows backwards through the CFG, because the behavior at future nodes determines liveness at a given node (consider a; consider b). Later we'll see other properties that flow forward.

Liveness at Nodes
We have liveness on edges: the program points just before and just after each computation. How do we talk about liveness at nodes? Two more definitions:
- A variable is live-out at a node if it is live on any of that node's out-edges.
- A variable is live-in at a node if it is live on any of that node's in-edges.

Computing Liveness
Rules for computing liveness:
1. Generate liveness: if a variable is in use(n), it is live-in at node n.
2. Push liveness across edges: if a variable is live-in at node n, then it is live-out at all nodes in pred(n).
3. Push liveness across nodes: if a variable is live-out at node n and not in def(n), then it is also live-in at n.
Dataflow equations:
  in[n]  = use[n] ∪ (out[n] - def[n])
  out[n] = ∪ in[s] for s ∈ succ(n)

Solving the Dataflow Equations
Algorithm:

for each node n in the CFG:
  in[n] = ∅; out[n] = ∅           // initialize the solutions
repeat
  for each node n in the CFG:
    in'[n] = in[n]                // save the current
    out'[n] = out[n]              // results
    in[n]  = use[n] ∪ (out[n] - def[n])   // solve the dataflow equations
    out[n] = ∪ in[s] for s ∈ succ(n)
until in'[n] = in[n] and out'[n] = out[n] for all n   // test for convergence

This is iterative dataflow analysis (here, used for liveness analysis).

Example
The slide's table runs seven iterations over the six-node example; the sets grow until they converge to:
  node 1: in = {c},    out = {a,c}
  node 2: in = {a,c},  out = {b,c}
  node 3: in = {b,c},  out = {b,c}
  node 4: in = {b,c},  out = {a,c}
  node 5: in = {a,c},  out = {a,c}
  node 6: in = {c},    out = {}
using the dataflow equations for liveness:
  in[n] = use[n] ∪ (out[n] - def[n]);  out[n] = ∪ in[s] for s ∈ succ(n).

Liveness Analysis in the MiniJava Compiler
Currently:
- parse into an AST
- allocate space on the stack for locals and parameters, and space in the heap for member variables
- use the stack for expression evaluation
- generate MIPS code from the AST
To perform dataflow analysis:
- need an intermediate representation such as 3-address code
- use temporaries for parameters, locals, and expression results
- indicate the uses and defs of temporaries in each 3-address-code instruction
- create a control-flow graph with each 3-address-code instruction as a node

Register Allocation
Problem:
- assign an unbounded number of symbolic registers to a fixed number of architectural registers
- simultaneously live data must be assigned to different architectural registers
Goal: minimize the overhead of accessing data, i.e., memory operations (loads and stores) and register moves.

Scope of Register Allocation
Expression, local, loop, global, interprocedural.

Granularity of Allocation
What is allocated to registers?
- variables
- live ranges/webs (i.e., du-chains with common uses)
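Returning to the iterative algorithm above: it can be run on the six-node example to reproduce the table's fixed point. A sketch (the use/def sets are read off statements 1-6):

```python
def liveness(succ, use, defs):
    """Iterate in[n] = use[n] | (out[n] - def[n]), out[n] = union of in[s],
    until nothing changes (the fixed point)."""
    live_in = {n: set() for n in succ}
    live_out = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            new_out = set().union(*[live_in[s] for s in succ[n]])
            new_in = use[n] | (new_out - defs[n])
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out

succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use  = {1: set(), 2: {'a'}, 3: {'b', 'c'}, 4: {'b'}, 5: {'a'}, 6: {'c'}}
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}
live_in, live_out = liveness(succ, use, defs)
```

Because the equations are monotone and start from empty sets, the iteration always terminates; visiting nodes in reverse order would merely converge in fewer passes for this backward problem.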
- values (i.e., definitions; the same as variables under SSA)
(The slide's figure contrasts the three granularities on a small fragment with defs/uses s1-s6 of x and y: as variables, just x and y; as live ranges/webs, groups of du-chains such as {s1, s2, s4} and {s3, s5, s6}; as values, each definition separately.)
What are the tradeoffs? Each allocation unit is given a symbolic register name (e.g., t1, t2, etc.).

Global Register Allocation by Graph Coloring
Idea [Cocke 71]; first allocator [Chaitin 81]:
1. Construct the interference graph G = (N, E).
   - It represents the notion of "simultaneously live."
   - Nodes are units of allocation (e.g., variables, live ranges, values).
   - There is an edge (n1, n2) ∈ E if n1 and n2 are simultaneously live.
   - The relation is symmetric, but neither reflexive nor transitive.
2. Find a k-coloring of G, for k registers: adjacent nodes can't have the same color.
3. Allocate the same register to all allocation units of the same color; adjacent nodes must be allocated to distinct registers.
(Figure: a small interference graph over t1, t2, t3.)

Interference Graph Example
(The slide builds the interference graph for a fragment over variables a-e, with statements such as d = a + c and e = b + a; edges connect variables that are simultaneously live.)

Computing the Interference Graph
Use the results of live-variable analysis:

for each symbolic register (temporary) ti:
  for each symbolic register tj, j < i:
    for each def in the definitions of ti:
      if tj is live-out at def then E = E ∪ {(ti, tj)}

Options: treat all instructions the same, or treat MOVE instructions specially. Which is better?

Allocating Registers Using the Interference Graph
K-coloring: color the graph nodes using up to k colors, where adjacent nodes must have different colors. Allocating to k registers is equivalent to finding a k-coloring of the interference graph. But:
- optimal graph coloring is NP-complete
- optimal register allocation is NP-complete too, so we must approximate
- if we can't k-color the graph, we must spill

Register Allocation: Spilling
If we can't find a k-coloring of the interference graph, spill variables (nodes) until the graph is colorable.
Choosing variables to spill:
- choose arbitrarily, or
- choose the least frequently accessed variables
- break ties by choosing the nodes with the most conflicts in the interference graph.
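The interference-graph construction above, plus a simple greedy coloring with spilling, can be sketched together. The weights and k below are illustrative, and this is one heuristic consistent with the slides, not Chaitin's full allocator:

```python
def interference(defs, live_out):
    """Add an edge from each defined temp to everything live-out at the def."""
    edges = set()
    for n in defs:
        for d in defs[n]:
            for t in live_out[n]:
                if t != d:
                    edges.add(frozenset((d, t)))
    return edges

def greedy_alloc(nodes, edges, weight, k):
    """Color in decreasing weight order; spill when no color is free."""
    adj = {n: {m for e in edges if n in e for m in e if m != n} for n in nodes}
    colors, spilled = {}, []
    for n in sorted(nodes, key=lambda n: weight[n], reverse=True):
        used = {colors[m] for m in adj[n] if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[n] = free[0]    # reserve a register for n
        else:
            spilled.append(n)      # remove n from the graph: spill to the stack
    return colors, spilled

# The six-node example: a-c and b-c interfere, but a-b do not.
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}
live_out = {1: {'a', 'c'}, 2: {'b', 'c'}, 3: {'b', 'c'},
            4: {'a', 'c'}, 5: {'a', 'c'}, 6: set()}
E = interference(defs, live_out)
colors, spilled = greedy_alloc(['a', 'b', 'c'], E, {'a': 3, 'b': 2, 'c': 1}, k=2)
```

With k = 2, a and b end up sharing a register while c gets its own, matching the liveness discussion earlier; nothing spills.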
(Yes, these are heuristics.)

Spilling
(Figure: the original CFG and interference graph for variables a-f, then the same code after spilling, where each use of the spilled variable b loads it from memory, e.g. b1 = M[...], and each def stores it back.)

Simple Greedy Algorithm for Register Allocation

for each n ∈ N, selected in decreasing order of weight:
  if n can be colored then
    color it                       // reserve a register for n
  else
    remove n and its edges from the graph
    allocate n to the stack        // spill

(Figure: after spilling, the loads and stores of the spilled variable use reserved registers, e.g. r1 = M[...], M[...] = r2.)

Example
Attempt to 3-color this graph, in the weighted order a, b, c, d, e, f. What if you use a different order?
Example: attempt to 2-color this graph, in the weighted order a, b, c.

Concepts
Liveness:
- used in register allocation
- generating liveness
- flow and direction
- dataflow equations and analysis
Control-flow graphs.
Register allocation:
- scope of allocation
- granularity (what is being allocated to a register)
- the order in which allocation units are visited matters in all heuristic algorithms
- global approach: greedy coloring

Next Time
Reading:
- Ch. 8.4 and 9.2-9.2.5 (intro to dataflow analysis)
- Ch. 8.8 and the Briggs paper (register allocation)
Lecture:
- improvements to graph-coloring register allocators
- register allocation across procedure calls
Suggested exercises: see the schedule on the website.

CS553 Lecture: Undergraduate Compilers Review (cont.)

Announcements:
- Advice for the project writeups was posted on the mailing list.
- The SVN log will need more than one entry for a one-day extension.
Today:
- generating intermediate representations: the AST, 3-address code, Trees, Assem
- generating MIPS assembly

Structure of a Typical Compiler
Analysis: character stream → lexical analysis → tokens (words) → syntactic analysis → IR.

Program IR Code
Generation
- The AST is usually language dependent.
- An intermediate representation (IR) is a language-independent and target-independent representation. Examples:
  - 3-address code
  - RTL (used in GCC; like 3-address code)
  - LLVM (used in the LLVM compiler; like 3-address code, but typed)
  - the Tree data structure in the MiniJava compiler (a little different)
- AST → IR → target code.

IR Lowering
Goal: transform the AST into a lower-level IR. This simplifies the IR:
- removes high-level control structures (for, while, do, switch)
- removes high-level data structures (arrays, structs, unions, enums)
The result is assembly-like code (semantic lowering):
- control flow is expressed in terms of gotos
- each expression is very simple three-address code (e.g., a = b + c; x = t * c)

A Low-Level IR: Register Transfer Language (RTL)
- A linear representation; typically language independent.
- Nearly corresponds to machine instructions.
Example operations:
- assignment: x = y
- unary/binary ops: x = op y; x = y op z
- address-of: p = &y
- load: x = *(p + 4)
- store: *(p + 4) = y
- call: x = f(...)
- branch: goto L1
- conditional branch: if x == 3 goto L1

Example
Source code:
  for i = 1 to 10 do a[i] = x * 5
Low-level IR (RTL), reconstructed from the garbled slide:
  i = 1
loop1:
  t1 = x * 5
  t2 = a + sizeof(int) * i    // address of a[i]
  *t2 = t1
  i = i + 1
  if i <= 10 goto loop1

Compiling Control Flow
Switch statements: convert the switch into low-level IR, e.g.:

switch (c) {              t = c
  case 0: ...; break;     next1: if t != 0 goto next2
  case 1: ...; break;            ...; goto done
}                         next2: if t != 1 goto done
                                 ...; goto done
                          done:

Optimizations, depending on the size and density of the cases:
- create a jump table (store the branch targets in a table)
- use binary search

Compiling Arrays
- Array declaration: store the name, size, and type in the symbol table.
- Array allocation: call malloc, or create space on the runtime stack.
- Array referencing, e.g. A[i]:
  t1 = &A
  t2 = sizeof(A.elem)
  t3 = t2 * i
  t4 = t1 + t3

MiniJava Compiler Tree Language: Array Example
(The slide's figure shows the Tree-language IR for an array assignment, roughly a[i] = x + b[j], built from MOVE, MEM, BINOP(PLUS), BINOP(MUL), and CONST nodes; only the tree shape is partially recoverable from the source.)
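The A[i] address computation above can be generated mechanically. A sketch with one plausible temp numbering (the Temps helper is invented for illustration):

```python
class Temps:
    """Hands out fresh temporary names t1, t2, ..."""
    def __init__(self):
        self.n = 0
    def fresh(self):
        self.n += 1
        return f"t{self.n}"

def lower_index(array, index, elem_size):
    """Emit 3-address code computing the address of array[index]."""
    t = Temps()
    t1, t2, t3, t4 = (t.fresh() for _ in range(4))
    code = [
        f"{t1} = &{array}",
        f"{t2} = {elem_size}",     # sizeof(A.elem)
        f"{t3} = {t2} * {index}",
        f"{t4} = {t1} + {t3}",     # t4 holds the address of array[index]
    ]
    return t4, code

addr, code = lower_index('A', 'i', 4)
```

The returned temp is what a caller would wrap in a load or store; a real lowering would share one Temps instance across the whole procedure so names stay unique.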
MiniJava Compiler Tree Language: IF Example
if (p < 3) System.out.println(p); else System.out.println(3);
(The slide's figure shows the corresponding Tree-language IR: a CJUMP on p < 3 with true/false labels and CALL nodes for the println calls; only the shape is recoverable from the source.)

Structure of a Typical Compiler
Analysis: character stream → lexical analysis → syntactic analysis → sentences → semantic analysis → annotated AST.
Synthesis: IR code generation → target language.

Compiling Procedures
Properties of procedures:
- procedures define scopes
- procedure lifetimes are nested
- information related to a dynamic invocation of a procedure can be stored on a call stack, in an activation record (AR, or stack frame):
  - space for saving registers
  - space for passing parameters and returning values
  - space for local variables
  - the return address of the calling instruction
(Figure: a stack of activation records growing from higher toward lower addresses.)
Stack management:
- push an AR on procedure entry
- pop an AR on procedure exit
- Why do we need a stack?

Compiling Procedures (cont.)
Code generation for procedures:
- emit code to manage the stack
- Are we done? No: translate the procedure body.
  - References to local variables must be translated to refer to the current activation record.
  - Other references must go to the appropriate activation record or to global data space.

Code Generation
Conceptually easy: three-address code is a generic machine language (low-level IR).
The source of heroic effort on modern architectures:
- alias analysis
- instruction scheduling for ILP
- register allocation
- more later

MIPS Code Generation in the MiniJava Compiler
The Assem data structure has a list of uses, defs, and jump targets:
  add rd, rs, rt      →  add `d0, `s0, `s1
  beq rs, rt, label   →  beq `s0, `s1, `j0
  lw  rt, address     →  lw  `d0, (`s0)
  sw  rt, address     →  sw  `s0, (`s1)

Example MIPS Program
The slide compiles the following MiniJava program (reconstructed from the garbled source):

class MultipleParams {
public int ttingo 39 local 3 return thisFoouarlomll locanlomlS lomla I mblic int Foobar nt par39ar int pamnz int PEMBX return pnr39anl pnr39arz parans I text mam malrLFraneslze32 sw up 7405 m Fry l n subu lsp lsp malrLFrCmeslze L1 sw lm 780 L0 Slnk statement addu lsp lsp malrLFrCmeslze lw le Mlsp J lm text testlng testingfmneslze 4 sw NE 7405 we Fp lsp L3 subu tsp tsp testingjmmeslze sw m 780 11 m 1 sw m 724cm lw he 7243Fp 5w m 7123FD Hnmg man umplls y wle M C oncepts Representations 7 AST lowlevel IR RTL Code Generation 7 3address code 7 IR Trees in MiniJaVa Compil 7 Assumes in nite temporaries 7 Mips 7 Requires mapping of all temporaries to an actual register H527 Um mung mm Cmnpilrzn Pix1w Next Time Reading Ch 10 Lecture 7 Control Flow Graphs 7 Liveness Analysis NIKE ma 1 Humid y mm