Compiler Design CS 6241
Popular in Course
Popular in ComputerScienence
This 0 page Class Notes was uploaded by Alayna Veum on Monday November 2, 2015. The Class Notes belongs to CS 6241 at Georgia Institute of Technology - Main Campus taught by Staff in Fall. Since its upload, it has received 12 views. For similar materials see /class/234101/cs-6241-georgia-institute-of-technology-main-campus in ComputerScienence at Georgia Institute of Technology - Main Campus.
Reviews for Compiler Design
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 11/02/15
CS 6241 Class 16 Module Scheduling II Georgia Tech February 28 2008 Exam Information oz WhenW here Thursday March 6 in class 135pm 255pm 80 minutes oz Format Open book open notes But don t try to learn how to modulo schedule during the test Bring a pencil or 2 No laptops oz Material Everything from lectureshomeworks is fair game up to and including modulo scheduling but focus on the maior topics No Trimaran specifics will be asked Studying oz Lecture notes Go back and familiarize yourself with everything Work through examplesproblems done in lecture oz Notes No memorization Emphasis is on understanding conceptsalgorithms and solving problems A few questions which require some thinking cannot study for these Work problems wo looking at the answers oz Next class bring questions Topics oz Control ow analysisoptimization Dompdomcontrol dependence analysis Basic blocks traces superblocks if conversion hyperblocks Profile guided code layout oz Data ow analysis and optimization Liveness reaching defs ClassicILP optimizations and transformations oz Scheduling register allocation Dependence graphs Estart Lstart priority Acyclic scheduling control speculation Modulo scheduling Register allocation Modulo Scheduling Process oz Use list scheduling but we need a few twists II is predetermined starts at MII then is incremented Cyclic dependences complicate matters 0 EstartPriorityetc Consumer scheduled before producer is considered 9 There is a window where something can be scheduled Guarantee the repeating pattern oz 2 constraints enforced on the schedule Each iteration begin exactly 11 cycles after the previous one Each time an operation is scheduled in l iteration it is tentatively scheduled in subsequent iterations at intervals of II 0 MRT used for this Priority Function Heightbased priority worked well for acyclic scheduling makes sense that it will work for loops as well Acyclic 0 if X has no successors HeightX 2 MAX HeightY DelayXY otherwise fo r all Y succX Cyclic 0 if X has no successors HeightRX 2 MAX HeightRY EffDelayXY otherwise fo r all Y succX EffDelayXY DelayXY IIDistanceXY 5 Calculating Height 1 Insert pseudo edges from all nodes to branch with latency 0 distance 2 0 dotted edges Compute II For this example assume II 2 HeightR4 HeightR3 00 0 I39 of 22 39 20 HeightR2 39 00 gt14 HeightRl The Scheduling Window With cyclic scheduling not all the predecessors may be scheduled so a more exible earliest schedule time is 0 if X is not scheduled EY MAX for all X predY MAX 0 SchedTirneX EffDelayXY otherwise where EffDelayXY DelayXY IIDistanceXY Every 11 cycles a new loop iteration will be initialized thus every 11 cycles the pattern will repeat Thus you only have to look in a window of size 11 if the operation cannot be scheduled there then it cannot be scheduled Latest schedule timeY LY EY II l Loop Prolog and Epilog Kernel Epilo g Only the kernel involves executing full width of operations Prolog and epilog execute a subset rampup and rampdown Separate Code for Prolog and Epilog A0 Prolog Loop body B A2 B1 CO pipe with 4 ops C D A B C D Kernel Bn Cnl Dn2 Epilog Cn Dnl drain the DH pipe Generate special code before the loop preheader to fill the pipe and special code after the loop to drain the pipe Peel off IIl iterations for the prolog Complete II l iterations in epilog Removing PrologEpilog Kernel Disable usin predicated execution Execute loop kernel on every iteration but for prolog and epilog selectively disable the appropriate operations to filldrain the pipeline 10 Kernelonly Code Using Rotating Predicates A0 A1 B0 A2 B1 C0 A B C D Aiwa BifP1 C ifP2 DifP3 Bn Cn l Dn 2 Cn Dn l Dn P referred to as the staging predicate PM PM PM P3 1 0 0 0 A 1 1 0 0 A B 1 1 1 0 A B C 1 1 1 1 A B C D o gt H H 03 O o O DUO 11 Modulo Scheduling Architectural Support oz Loop requiring N iterations Will take N S l where S is the number of stages oz 2 special registers created LC loop counter holds N ESC epilog stage counter holds S oz Software pipeline branch operations Initialize LC N ESC S in loop preheader All rotating predicates are cleared BRFBBF 0 While LC gt 0 decrement LC and RRB PO l branch to top of loop 9 This occurs for prolog and kernel 0 If LC 2 0 then while ESC gt 0 decrement RRB and write a 0 into P0 and branch to the top of the loop 9 This occurs for the epilog 12 Execution History With LCESC LC 2 3 ESC 3 Remember 0 relative Clear all rotating predicates PO l A if P0 B if Pl C if P2 D if P3 PO BRFBBF LC ESC PO Pl P2 P3 OOOOl NW Ol NUJUJUJUJ OOOHHHH OOHHHHO OHHHHOO Hl l l OOO 39gtgtgtgt wwww 4 iterations 4 stages 11 1 Note 4 4 l iterations of kernel executed 0000 13 UUUU Modulo Scheduling Driver compute M11 11 MII budget BUDGETRATIO number of ops while schedule is not found do iterativeschedulell budget gtgt H Budgetratio is a measure of the amount of backtracking that can be performed before giving up and trying a higher 11 14 Modulo Scheduling Iterative Scheduler oz iterativeschedulell budget compute op priorities while there are unscheduled ops and budget gt 0 do op unscheduled op with the highest priority min 2 early time for op EY maXminII l t findslotop min max schedule op at time t o Backtracking phase undo previous scheduling decisions o Unschedule all previously scheduled ops that con ict with op budget 15 Modulo Scheduling Findslot oz findslotop min maX gtgt gtgt gtgt gt V gt gtgt gt lt Successively try each time in the range gtk for t min to max do 0 if op has no resource con icts in MRT at t O retumt gt lt Op cannot be scheduled in its specified range gt gt lt So schedule this op and displace all con icting ops gt if op has never been scheduled or min gt previous scheduled time of op return min else 0 return MINl prev scheduled time of op max 16 Modulo Scheduling Example resources 4 issue 2 alu 1 mem 1 br latencies add1 mpy3 ld 2 st 1 bi 1 for j0jlt100 j bli 811 26 LOOP r3 loadrl r4 r3 26 store r2 r4 39 r1 r1 4 39 r2 r2 4 pl 2 cmpp r1 lt r9 brct p1 Loop Step1 Compute to loop into form that uses LC LC 2 99 Loop r3 loadr1 up 1 2 r4 r3 26 3 store r2 r4 4 r1 r1 4 5 r2 r2 4 7 brlc Loop 17 Example Step 2 resources 4 issue 2 alu 1 mem 1 br latencies add1 mpy3 1d 2 st 1 hr 1 Step 2 DSA convert LC 2 99 LC 2 99 Loop 1 r3 loadr1 Loop 1 r31 loadr10 2 r4 r3 26 2 r4 1 r3 1 26 3 store r2 r4 3 store r20 r41 4r1r14 4r1 1r104 5 r2r24 5r2 1 r20 4 7 brlc Loop remap r1 r2 r3 r4 7 brlc Loop 18 Example Step 3 resources 4 issue 2 alu 1 mem 1 br Step3 Draw dependence graph latencies add1 mpy3 1d 2 2 st 2 1 bf 1 Calculate MII LC 2 99 0 0 Loop 1 r3l loadrl0 2 r4 l r3 1 26 3 store r20 r4l RCCMH 1 4 rl l rl0 4 RESMII 2 5 r2 l r20 4 M11 2 2 remap r1 r2 r3 r4 7 brlc Loop 19 Example Step 4 Step 4 Calculate priorities MAX height to pseudo stop node Iter2 Iterl lH5 lH5 2H3 2H3 3H0 3H0 4H4 4H0 5H0 71H 5H0 0 0 71H 20 Example Step 5 resources 4 issue 2 alu 1 mem 1 br Schedule brlc at time 11 l latencies add1 mpy3 ld 2 st 1 hr 1 Unrolled Rolled Schedule Schedule LC 2 99 0 1 Loop 1 r31 loadr10 2 2 r4 l r3 l 26 0 3 3 store r20 r41 1 7 4 4 r1 1 r10 4 5 r2 l r20 4 5 remap r1 r2 r3 r4 6 7 brlc LOOp aluo alul mem br 0 MRT 1 X 21 Example Step 6 Step6 Schedule the highest priority op Opl E 0 L 1 Place at time 0 0 2 Unrolled Rolled Schedule LC 2 99 Schedule 0 1 Loop 1 r3l loadrl0 2 2 r4 l r3 l 26 0 1 3 store r20 r4l 1 7 3 4rl lrl04 4 5 r2 l r20 4 5 remap r1 r2 r3 r4 6 7 brlc Loop aluo 31111 mem br 0 X MRT 22 Example Step 7 Step7 Schedule the highest priority op Op4 E 0 L 1 Place at time 0 0 2 Unrolled Rolled Schedule LC 2 99 Schedule 0 1 Loop 1 r3l loadrl0 2 2 r4 l r3 l 26 0 1 4 3 store r20 r4l 1 7 3 4rl lrl04 4 5 r2 l r20 4 5 remap r1 r2 r3 r4 6 7 brlc Loop aluo 31111 mem br 0 X X MRT 23 Example Step 8 Step8 Schedule the highest priority op Op2 E 2 L 3 Place at time 2 2 2 LC99 Loop 1 r3l loadrl0 2 r4 l r3 l 26 3 store r20 r4l 4 rl l rl0 4 5 r2 l r20 4 remap r1 r2 r3 r4 7 brlc Loop Rolled Schedule aluO alul mem br LIIPWNl O 6 Unrolled Schedule X X X MRT 24 Example Step 9 Step9 Schedule the highest priority op Op3 E 5 L 6 Place at time 5 5 2 Unrolled Rolled Schedule Schedule 1 4 LC99 r3 l loadrl0 r4l r3l 26 0 Loop 1 2 3 storer20r4l 1 7 3 4 5 rl l rl0 4 r2 l r20 4 remap r1 r2 r3 r4 7 brlc LOOp aluo alul mem br 0 X X X OUllgtL Jgt O MRT 25 Example Step 10 SteplO Schedule the highest priority op Op5 E 0 L 1 Place at time 1 l 2 Unrolled Rolled Schedule Schedule 1 4 5 LC99 r3 l loadrl0 r4 l r3 l 26 0 store r20 r4l rl lrl04 1 7 3 5 r2 l r20 4 remap r1 r2 r3 r4 7 brlc LOOp aluo alul mem br 0 X X X Loop LII bugs OUllgtL Jgt O MRT k X XX 26 Example Step 11 Stepl I calculate ESC SC 2 max unrolled sched length ii unrolled sched time of branch 2 rolled sched time of br iiesc SC 2 6 2 3 ESC 2 SC 1 Unrolled time of br 2 l 22 5 Rolled Schedule LC 2 99 Schedule 1 4 5 r3 l loadrl0 r4l r3l 26 0 Loop 1 2 3 store r20 r41 1 4 7 3 5 5 rl l rl0 4 r2 l r20 4 remap r1 r2 r3 r4 7 brlc LOOp aluo alul mem br 0 X X X GUIPWNl O MRT k X XX 27 Example Step 12 Finishing touches Sort ops initialize ESC insert BRF and staging predicate initialize staging predicate outside loop Staging predicate each LC 99 successive stage increment ESC 2 the index of the staging predicate pl0 l by 1 stage 1 gets pX0 Loop 1 r3l loadrl0 if pl0 Unmued 21 1V41 1F31 26 ifpllll Schedule 4 rl l rlO 4 ifpl0 3 store r20 r4l if pl2 0 1 4 Stage 1 5 r2 l r20 4 if pl0 1 5 7 brlc Loop if pl2 2 Stage 2 3 4 Stage 3 5 6 28 Example Dynamic Execution of the Code Loop LC99 ESC2 p10 1 1 r31 loadr10 if pl 0 2 r41 r31 26 ifp11 4 r11 r10 4 if p10 3 store r20 r41 if pl 2 5 r21 r20 4 if pl 0 7 brlc Loop if pl 2 time ops executed 29 Homework Problem latencies add1 mpy3 ld 2 st 1 hr 1 How many resources of each type are required to achieve an 111 schedule for i0 jlt100 j bj aj 26 1f the resources are nonpipelined how many resources of each type are required to achieve 111 LC 2 99 Assuming pipelined resources generate Loop 1 r3 loadrl the 111 modulo schedule 2 r4 r3 26 3 store r2 r4 4 rl 2 rl 4 5 r2 r2 4 7 brlc Loop 30 What if We Don t Have Hardware Support ozo No predicates gtgt Predicates enable kemel only code by selectively enablingdisabling operations to create prologepilog gtgt Now must create explicit prologepilog code segments oz No rotating registers gtgt Register names not automatically changed each iteration gtgt Must unroll the body of the software pipeline explicitly rename Consider each register lifetime i in the loop 0 Kmin min unroll factor MAXi ceilingEndi Starti 11 Create Kmin static names to handle maximum register lifetime Apply modulo variable expansion 31 No Predicates Qua Kernel only code with prolog rotating registers and 39 kernel predicates II l epilo g Without predicates must create explicit prolog and epilogs but no explicit renaming is needed as rotating registers take care of this 32
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'