Advanced Compilers EECS 583
Popular in Course
verified elite notetaker
Popular in Engineering Computer Science
This 69 page Class Notes was uploaded by Ophelia Ritchie on Thursday October 29, 2015. The Class Notes belongs to EECS 583 at University of Michigan taught by Scott Mahlke in Fall. Since its upload, it has received 17 views. For similar materials see /class/231532/eecs-583-university-of-michigan in Engineering Computer Science at University of Michigan.
Reviews for Advanced Compilers
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 10/29/15
EECS 583 Lecture 21 Review for Midterm University of Michigan March 26 2003 Mike Chu as guest speaker Midterm Information oz WhenW here gtgt Next Monday March 31 in class gtgt 4 10pm 640pm 25 hrs 9 Format Open book open notes 0 But don t try to learn how to modulo schedule during the test Bring a pencil or 2 No laptops oz Material Everything om lectureshomeworks is fair game up to and including register allocation but focus on the major topics No Trimaran speci cs Will be asked Studying oz Lecture notes are most important thing gtgt Go back and familiarize yourself with everything gtgt Work through examplesproblems done in lecture oz Today Work through problems from Winter 02 exam oz Notes on previous exam Way too long noone nished 0 This exam will be shorter Way too hard to grade this exam Will be better organized But problem style Will be similar for this exam 0 Emphasis is on understanding conceptsalgorithms and solving problems 0 A few questions which require some thinking cannot study for these EECS 583 Winter 02 Midterm Questions 1A Region Formation Trace Selection Identify the nontrivial traces Assume a threshold probability of 51 lB Region Formation Forming Superblocks Convert the traces into superblocks Redraw the resultant control ow graph below Remember that you must infer the pro le information for any basic blocks that you create lC Region Formation Forming Hyperblocks In class we discussed that when performing if conversion to form hyperblocks the resultant dependence height of the if converted code is the maximum across the constituent paths However at runtime only 1 path will really execute the other paths will have their predicates set to false thereby being converted into noops So perhaps this problem is a compiletime only problem that goes away at run time Does the dependence height problem go away at run time If no explain why If yes explain how the compiler should account for this 2A Loop Detection What is the average trip count of the loop w 1 t Loop r4 r1ltlt 2 r5 loadr4 r2 r2 7 10 r6 loadr2 r6 r6 1 store r1 r6 r7 r1 7 r2 100 p1 cmppiunr7 lt 4 branch p1 Exit 6 r8 r2 5 r9 r6 ltlt 3 r1 r1 2 r2 r2 7 1 EXltI p1 cmppiunr2 gt r10 branch p1 Loop 411 2B Induction Variable Detection List the basic and derived induction variables w 1 t Loop r4 r1ltlt 2 r5 loadr4 r2 r2 7 10 r6 loadr2 r6 r6 1 store r1 r6 r7 r1 7 r2 100 p1 cmppiunr7 lt 4 branch p1 Exit 6 r8 r2 5 r9 r6 ltlt 3 r1 r1 2 r2 r2 7 1 EX1t p1 cmppiunr2 gt r10 branch p1 Loop 411 2C Loop Optimization 101 r4 rl ltlt 2 r5 loadr4 r2 r2 7 10 r6 loadr2 r6 r6 1 store r1 r6 r7 r1 7 r2 pl cmppiunr7 lt 4 branch pl Exit 6 r8 r2 5 r9 r6 ltlt 3 rl rl 2 r2 r2 7 1 pl cmppiunr2 gt r10 branch pl Loop J39 Loop 41l Apply induction variable strength reduction to all derived induction variables to convert them into basic induction variables Show the transformed code below 100 Exit 2D Local Optimization In the following predicated code segment there are 4 computations of r2r3 Which pairs of operations can be optimized Via CSE rlr2r3 ifT p1p2cmppUNUC r4lt0ifT r4r2r3ifpl r2r2lifp2 rl0ifT r5r2r3ifpl r6r2r3 ifT IONUIbUJNH 2E ILP Optimization for i0 ilt100 iH val ai if val lt min min val x mm r20 l rl loadr2 pl cmpprl lt r3 r3 rl ifpl r2 r2 1 rl loadr2 pl cmpprl lt r3 r3 rl ifpl r2 r2 1 rl loadr2 pl cmpprl lt r3 r3 rl ifpl r2 r2 1 p2 cmpp r2 lt 100 branch p2 Loop l The following code segment on the left is a loop that searches for the minimum value in an array The compiler has unrolled the loop 3 times as shown on the right Now you want to apply ILP optimizations to eliminate as many dependences as possible Explain what optimizations you would apply and why Hint there are 2 optimizations you can apply though multiple applications of them How would you deal with the dependences associated with r3 the variable min Suggest an optimization strategy for dealing with r3 Hint rename with copy is not the right answer 10 3A Data ow Analysis 1 r1 2 r2 ly l BB2 3 r1 10adr2 4 r2 r2 1 BB3 BB4 5r3r1 8r7r11 6zr2r31 9r1r3 7r8r81 10r2r7ltlt3 BB5 11 r7 10adr2 12 store r7 r8 l 1 Compute the reaching de nitions GENKILL sets for each basic block 11 3B Data ow Analysis Compute the reaching de nition set OUT set along each control ow edge 1 r1 2 r2 in the code segment ly l BB2 3 r1 loadr2 4 r2 r2 1 BB3 BB4 5r3r1 8r7r11 6zr2r31 9r1r3 7r8r81 10r2r7ltlt3 BB5 11 r7 loadr2 12 store r7 r8 l l 12 3C Data ow Analysis Compilers often provide warnings if local variables can be used without rst being de ned in a function How would you go about guring out which variables require warnings You may use any type of data ow analysis you wish 3D Data ow Analysis Suppose you wanted to compute available upward exposed uses ie subsequent uses that are guaranteed to occur om a particular point in the program De ne the GEN KILL IN and OUT sets to perform this computation For a bonus point What optimization could make use of this information to be more effective 9Noone got the bonus point on the exam 4A Register Allocation Live Ranges 99 l a load100 2 b loada 8 b loadb 9 a loade 10 storea1 Construct the merged live ranges for each variable Note you should not merge live ranges unless they overlap 4B Register Allocation Interference Graph Draw the interference graph 1 a 10ad100 2 b 10ada 99 8 b 10adb 9 a 10ade 10 storea1 4C Register Allocation Graph Coloring l a load100 2 b loada 99 4 d 0 ltlt 2 5 store d0 8 b loadb 9 a loade 10 storea 1 Using the graph coloring algorithm discussed in class show the resulting allocation assuming 3 registers 3 color for the code segment You should indicate which variables are spilled and which ones are allocated 4D Register Allocation If the only analysis information you had was DUU D chains could you construct live ranges Brie y explain your answer 5A Control Flow Control Flow Graph Draw the control ow graph for the code segment Please indicate Where the updates to a b d e are located in your CFG Give each of your basic blocks a number do a loadX ifa gt 1 H d gt 2 ampamp e gt 3 b else if b gt 23 d e While e lt 100 5B Control Flow Dominators Post Dominators Compute the post dominator sets for each basic block ie for each BB specify Which basic blocks post dominate it 20 5C Control Flow Control Dependences Compute the control dependences for each basic block Use negative BB numbers to represent taken branch edges and positive BB numbers for fallthrough branch edges 21 5D Control Flow lf conversion If convert the code Show the code with 2target CMPPs and utilize unconditional predicates Whenever possible do a loadX i1 a gt 1 H d gt 2 ampamp e gt 3 b else if b gt 23 d e While e lt 100 22 5E Control Flow Backedge Coalescing In class we discussed the use of backedge coalescing to combine multiple backedges into a single one to enable more effective if conversion This may not always be a smart thing to do Brie y describe a situation where backedge coalescing would likely cause performance loss 23 6 Scalar Scheduling Processor model 2 fully pipelined function units 1 ALU does add mpy cmpp and br 1 MEM does load store i 100 Load store divide cause exceptions 1 r1 2 load r2 All other opcodes do not 2 r4 r1 r5 33 P1 2 WP r1 2 0 latencies Assume minmax 10 4 branch pl EXltl values are equal 5 r3 r1r3 1 r2 add src read 0 6 r5 load r3 r dst write 1 7 store r4 r5 mpy src read 0 8 r2 r7 r8 dst write 3 9 p2 cmppr3 lt 1 20 load src read 0 10 branch p2 EXit2 dst write 2 sync 1 l 70 r5 store src read 0 r2 dst wr1te sync 1 24 6A Scalar Scheduling Dependence Graph Draw the dependence graph for the code segment above assuming the restricted speculation model Which edges can be removed if nonexcepting i 100 versions of all operations are provided 1 r1 load r2 2 r4 r1 r5 3 p1 cmpp r1 0 4 branch pl Exitl 5 r3 r1 r3 6 r5 load r3 r1 r2 7 8 9 l 10 store r4 r5 r2 r7 r8 p2 cmppr3 lt 1 O branch p2 EXit2 20 l 70 r5 r2 25 6B Scalar Scheduling EstartLstart Calculate EstartLstart assuming nonexcepting versions of all operations are available Hint 100 remember there is an Lstart value per exit branch 1 r1 load r2 2 r4 r1 r5 3 p1 cmpp r1 0 4 branch pl Exitl 10 5 r3 r1 r3 6 r5 load r3 r1 r2 7 store r4 r5 8 r2 r7 r8 9 p2 cmppr3 lt1 20 10 branch p2 EXit2 l 70 r5 r2 26 6C Scalar Scheduling Priority Calculate the priority of each operation using the priority scheme discussed in class i 100 1 r1 load r2 2 r4 r1 r5 3 p1 cmpp r1 0 4 branch pl Exitl 10 5 r3 r1 r3 6 r5 load r3 r1 r2 7 store r4 r5 8 r2 r7 r8 9 p2 cmppr3 lt1 20 10 branch p2 EXit2 l 70 r5 r2 27 6D Scalar Scheduling List Scheduling Show the resultant schedule of the code segment You should employ cycle scheduling to derive this You need not show every step the nal schedule is i 100 all that is necessary 1 r1 load r2 2 r4 r1 r5 3 p1 cmpp r1 0 4 branch pl Exitl 10 5 r3 r1 r3 6 r5 load r3 r1 r2 7 store r4 r5 8 r2 r7 r8 9 p2 cmppr3 lt1 20 10 branch p2 EXit2 l 70 r5 r2 28 6E Scalar Scheduling Speculation Suppose you wanted to enable the compiler to be able to speculate store operations Assuming that you have taken care of the exception problem with nonexcepting store opcodes what additional problems do speculative stores cause and how would you overcome these problems You may assume any kind of architecture compiler support that you wish Just the basic idea here no need to write a book 29 7 Modulo Scheduling Processor model 4 fully pipelined function units LC 99 1 ALU does add 1 t 1 Multiplier does mpy Loop r3 1 loadrl0 1 Memory does load 141 1 Branch does brlc r5 l r3 l r4 l T6l391 will r5l391 Latencies Assume minmax Tll39ll THO 4 values are equal r2 l r20 4 add src read 0 brlc Loop dst write 1 l I mpy src read 0 dst write 3 load src read 0 dst write 2 30 7A Modulo Scheduling Dependence Graph Draw the dependence graph using the notation ltde1aydistancegt notation on each edge LC 99 it Loop r3 1 loadr10 r4 1 loadr20 r5 1 r3 1 r4 1 r6 1 r61 r5 1 r1 1 r10 4 r2 1 r20 4 brlc Loop l I 31 7B Modulo Scheduling ReeMIIRelel Calculate the ResMII RecMII and M11 LC 99 l Loop r3 1 10adr10 r4 l loadr20 r5 1 r3 1 r4 1 r6 1 r61 r5 1 r1 1 r10 4 r2 1 r20 4 brlc Loop l I 32 7C Modulo Scheduling Scheduling Process Generate the modulo schedule for the code segment Show the unrolled and rolled schedules LC 99 it Loop r3 1 loadr10 r4 l loadr20 r5 1 r3 1 r4 1 r6 1 r61 r5 1 r1 1 r10 4 r2 1 r20 4 brlc Loop l I 33 7D Modulo Scheduling Staging Predicates Show the nal assembly code With the staging predicates inserted LC 99 it Loop r3 1 loadr10 r4 l loadr20 r5 1 r3 1 r4 1 r6 1 r61 r5 1 r1 1 r10 4 r2 1 r20 4 brlc Loop l I 34 7E Modulo Scheduling ResMII How many fully pipelined function units of each type are required to achieve an M11 of 1 I for this loop LC99 1amp Loop r3 1 10adr10 r4 1 10adr20 r5 1 r3 1 r4 1 r6 1 r61 r5 1 r1 1 r10 4 r2 1 r20 4 brlc Loop l I 35 EECS 583 Class 2 Trimaran Installation Overview Basic Control Flow Analysis University of Michigan September I 0 2007 Reading Material 0 o See reading list on course webpage 0 o TrimaranHPLPD references o Class 1 EPIC An Architecture for InstructionLevel Parallel Processors Optional more indepth look at VLIWEPIC arch features 0 0 o Topic 1 Control ow analysis and optimization gtgt Today s class Review of 483 material Ch 94 104 from Red Dragon book Aho Sethi Ullman Next class 0 Trace selection for Compiling Large C Application Programs to Microcode The Superblock An Effective Technique for VLIW and Superscalar Compilation Trimaran Installation System Requirements 0 o EECS account Best if you have one for this class See me if you don t have one Q o We have 3 dualcore Linux machines for this class gtgt andrew wilma hugo feel free to use everyone will be given login access 9 Other DCO machines should work too if these get too busy or disks get full gtgt DCO gets grumpy about using too much disk space 9 to If you want to use your own machine Linux Fedora Core 26 should work Other avors of Unix should work but no guarantees o 15Gb disk space 0 Needed Software older versions likely to work but I m not certain perl gzip gunzip gnu make 380 autoconf 254 automake 17 need these verions gcc we use 4x 344 32 also should work Gnu ranlib ar binutils 29 215 gdb 6x ddd 3x tcsh Use this or you will have to translate some stuff emacs Vi pico or whatever text editor you prefer Graph Visualizing tools gtgt dot httpwwwresearchattcomswtoolsgraphviz TCLTK V80p2 or higher tclsh wish needed for Gui I don t recommend you use gui 3 Installation Step 1 O 00 Source on the course webpage trimaranf07tgz o In your home directory mkdir trimaran o tar zxvf trimaranf07tgz Will create 6 subdirectories openimpact elcor simu scripts benchmarks gui o Enviroment variablespath this is critical trimaranscriptsenvrc should have everything you need Or trimaranscriptsenvrcbash for you bash users Modify rst line if you don t put trimaran in HOMEtrimaran Add to your cshrc source trimaranscriptsenvrc So you don t forget 0 0 0 0 0 Installation Step 2 ozo Install impact ed trimaranopenimpact insta11openimpact 0 Fix errors that come up May be a path problem ie executable its looking for is not in your path so update your cshrcbashrc ozo Eleor install ed trimaranelcor make ozo Simu install ed trimaranSimu make ozo Gui install ed gui make Trimaran Overview C source Frontend parsing pro ling function inlining memory dep Openlmpact analys1s Also does some backend stuff reglon formatlon opt1m1zatlon l Lcode HPLPD speci c Lcode aka bridge code E1001 Backend analysis optimization scheduling register allocation code generation I Rebel Simu Emulator emulate VLIW semantics using C code compiled simulation J C emulation code Running Trimaran ozo Install trimaran verify your code works on strcpy and wc ozo Trimarangui I don t likeuse this oz tcc gcc equivalent sorta gtgt Command line way to invoke entire system Gui calls this gtgt tcc help to get options gtgt tcc bench strcpy to run strcpy with basic blocks gtgt Save output to examine to trimaranbenchmarks gtgt source compile ags link ags 9 input for pro ling the code output to check if ran properly gtgt To add a new benchmark follow the organization that is there Compiler Backend IR Our Input ozo Variable home location Frontend every variable in memory Backend maximal but safe register promotion 0 All temporaries put into registers 0 All local scalars put into registers except those accessed via amp All globals local arraysstructs unpromotable local scalars put in memory Accessed via load store ozo Backend IR intermediate representation machine independent assembly code really resource indep aka RTL register transfer language 3address code r1 r2 r3 or equivalently add r1 r2 r3 Opcode not machine independent HPLPD RISC Operands 9 Virtual registers in nite number of these 9 Special registers stack pointer pc etc macro regs o Literals compiletime constants 8 Control Flow O 9 Control transfer branch taken or fallthrough O Control ow Branching behavior of an application 9 9 gtgt What sequences of instructions can be executed 0 o Execution 9 Dynamic control ow Direction of a particular instance of a branch Predict speculate squash etc 9 99 Compiler 9 Static control ow gtgt Not executing the program Input not known so What could happen 9 99 Control ow analysis Determining properties of the program branch structure Determining instruction execution properties 9 Basic Block BB oz Group operations into units with equivalent execution conditions oz Defn Basic block a sequence of consecutive operations in which ow of control enters at the beginning and leaves at the end Without halt or possibility of branching except at the end Straightline sequence of instructions If one operation is executed in a BB they all are oz Finding BB s The rst operation starts a BB Any operation that is the target of a branch starts a BB Any operation that immediately follows a branch starts a BB Identifying BBS Example L1 r7 loadr8 L2 r1 r2 r3 L3 beq r1 0 L10 L4 r4 r5 r6 L5r1 r1 1 L6 beq II 100 L3 L7 beq r2 100L10 L8 r5 r9 1 L9 jump L2 L10 r9 load r3 L11 storer9 r1 Control Flow Graph CFG Defn Control Flow Graph Directed graph G VE Where each vertex V is a basic block and there is an edge E V1 BB1 9 V2 BB2 ifBB2 can immediately follow BBl in some execution sequence gtgt A BB has an edge to all blocks it can branch to gtgt Standard representation used by many compilers gtgt Often have 2 pseudo vertices 0 entry node 0 exit node CFG Example xz 239 B1XZ2 y2i y2z ifc them 1f 0 B2 else B3 else X X 1 B2 WV B 3 fallthrough y y XZXH XX 1 1 f 1 y 9 19 CSC g0t0B4 y y xX 1 yy 1 B4 ZZXy zXy Weighted CFG Pro ling Run the application on 1 or more sample inputs record some behavior gtgt Control ow pro ling edge pro le block pro le gtgt Path pro ling gtgt Cache pro ling gtgt Memory dependence pro ling Annotate control ow pro le onto a CFG 9 weighted CFG Optimize more effectively with pro le info gtgt Optimize for the common case gtgt Make educated guess Dominator DOM oz W Given a CFGV E Entry EXit a node X dominates a node y if every path from the Entry block to y contains X oz 3 properties of dominators Each BB dominates itself If X dominates y and y dominates z then X dominates z If X dominates z and y dominates z then either X dominates y or y dominates X ozo Intuition Given some BB Which blocks are guaranteed to have eXecuted prior to eXecuting the BB Dominator Examples Dominator Analysis oz Compute domBBi set of BBs that dominate BBi ozo Initialization Domentry entry Domeverything else all nodes oz Iterative computation While change do change false for each BB except the entry BB 9 tmpBB BB intersect of Dom of all predecessor BB s if tmpBB domBB l domBB tmpBB 1 change true Immediate Dominator ozo Defn Immediate dominator idom Each node n has a unique immediate dominator m that is the last dominator of n on any path from the initial node to n Closest node that dominates Dominator Tree First BB is the root node each node dominates all of its descendants WNt w DOM B DOM 14 5 146 147 19 BB1 1 BB2 BB3 BB4 BB5 BB6 BB7 Dom tree Class Problem Draw the dominator tree for the following CFG Post Dominator PDOM Reverse of dominator Defn Post Dominator Given a CFGV E Entry Exit a node X post dominates a node y if every path from y to the Exit contains X Intuition Given some BB which blocks are guaranteed to have executed after executing the BB pdomBBi set of BBs that post dominate BBi ozo Initialization Pdomexit exit Pdomeverything else all nodes oz Iterative computation While change do 0 change false 0 for each BB except the exit BB 9 tmpBB BB intersect of pdom of all successor BB s o if tmpBB pdomBB l pdomBB tmpBB 1 change true 21 Post Dominator Examples 22 Immediate Post Dominator ozo Defn Immediate post dominator ipdom Each node n has a unique immediate post dominator m that is the rst post dominator of n on any path from n to the Exit Closest node that post dominates First breadth rst successor that post dominates a node 23 Why Do We Care About Dommators oz Loop detection next subject Dominator gtgt Guaranteed to execute before gtgt Redundant computation an op is redundant if it is computed in a dominating BB gtgt Most global optimizations use dominance info 0 Post dominator gtgt Guaranteed to execute after gtgt Make a guess ie 2 pointers do not point to the same locn gtgt Check they really do not point to one another in the post dominating BB 24 Natural Loops oz Cycle suitable for optimization Discuss opti later oz 2 properties Single entry point called the m Header dominates all blocks in the loop Must be one way to iterate the loop ie at least 1 path back to the header from Within the loop called a backedge oz Backedge detection Edge X9 y Where the target y dominates the source X 25 Backedge Example Loop Detection oz Identify all backedges using Dom info oz Each backedge X 9 y de nes a loop Loop header is the backedge target y Loop BB basic blocks that comprise the loop All predecessor blocks of X for Which control can reach X Without going through y are in the loop ozo Merge loops with the same header Le a loop with 2 continues LoopBackedge LoopBackedgel LoopBackedge2 LoopBB LoopBBl LoopBB2 ozo Important property Header dominates all LoopBB 27 Loop Detection Example Important Parts of a Loop oz Header LoopBB oz Backedges BackedgeBB oz Exitedges ExitBB For each LoopBB examine each outgoing edge If the edge is to a BB not in LoopBB then its an exit ozo Preheader Preloop gtgt New block before the header falls through to header Whenever you invoke the loop preheader executed gtgt Whenever you iterate the loop preheader NOT executed All edges entering header Backedges no change 0 All others retarget to preheader O 99 Postheader Postloop analogous 29 ExitBBPreheader Example 30 Characteristics of a Loop oz Nesting generally Within a procedure scope Inner loop Loop with no loops contained Within it Outer loop Loop contained Within no other loops Nesting depth depthouter loop 1 depth depthparent or containing loop 1 oz Trip count average trip count How many times on average does the loop iterate for 10 IltlOO I 9 trip count 100 Ave trip count weightheader weightpreheader 31 Trip Count Calculation Example Calculate the trip counts for all the loops in the graph