Computer Architecture CS 6810
Popular in Course
Popular in ComputerScienence
Marian Kertzmann DVM
verified elite notetaker
This 16 page Class Notes was uploaded by Marian Kertzmann DVM on Monday October 26, 2015. The Class Notes belongs to CS 6810 at University of Utah taught by Alan Davis in Fall. Since its upload, it has received 22 views. For similar materials see /class/229976/cs-6810-university-of-utah in ComputerScienence at University of Utah.
Reviews for Computer Architecture
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 10/26/15
Lecture 7 Branch prediction Topics bimodal global local branch prediction Sections 2326 Dynamic Vs Static ILP Static ILP The compiler finds parallelism 9 no scoreboarding 9 higher clock speeds and lower power Compiler knows what is next 9 better global schedule Compiler can not react to dynamic events cache misses Can not reorder instructions unless you provide hardware and extra instructions to detect violations eats into the low complexitypower argument Static branch prediction is poor 9 even statically scheduled processors use hardware branch predictors Building an optimizing compiler is easier said than done A comparison of the Alpha Pentium 4 and ltanium statically scheduled lA 64 architecture shows that the ltanium is not much better in terms of performance clock speed or povger Control Hazards In the 5stage inorder processor assume always taken or assume always not taken if the branch goes the other way squash misfetched instructions momentarily forget about branch delay slots Modern inorder and outof order processors dynamic branch prediction instead of a default nottaken assumption either predict nottaken or predict takentoX or predict takentoY Branch predictor a cache of recent branch outcomes Pipeline without Branch Predictor PC4 In the 5stage pipeline a branch completes in two cycles 9 If the branch went the wrong way one incorrect instr is fetched 9 One stall cycle per incorrect branch Pipeline with Branch Predictor Branch Predictor In the 5stage pipeline a branch completes in two cycles 9 If the branch went the wrong way one incorrect instr is fetched 9 One stall cycle per incorrect branch Branch Mispredict Penalty Assume no data or structural hazards only control hazards every 5th instruction is a branch branch predictor accuracy is 90 Slowdown 1 1 stalls per instruction Stalls per instruction branches x mispreds x penalty 20 x 10 x 1 002 Slowdown 1102 if penalty 20 slowdown 114 1Bit Prediction For each branch keep track of what happened last time and use that outcome as the prediction What are prediction accuracies for branches 1 and 2 below while 1 for i0ilt10i branch1 for j0jlt20j branch2 2 Bit Prediction For each branch maintain a 2 bit saturating counter if the branch is taken counter min3counter1 if the branch is not taken counter max0counter1 lf counter gt 2 predict taken else predict not taken Advantage a few atypical branches will not influence the prediction a better measure of the common case Especially useful when multiple branches share the same counter some bits of the branch PC are used to index into the branch predictor Can be easily extended to Nbits in most processors Ne2 Correlating Predictors Basic branch prediction maintain a 2bit saturating counter for each entry or use 10 branch PC bits to index into one of 1024 counters captures the recent common case for each branch Can we take advantage of additional information gt If a branch recently went 01111 expect 0 if it recently went 11101 expect 1 can we have a separate counter for each case gt lfthe previous branches went 01 expect 0 if the previous branches went 11 expect 1 can we have a separate counter for each case Hence build correlating predictors 9 Global Predictor A single register that keeps track of recent history fOr all branches 8 bits 6 bits Also referred to as a twolevel predictor Local Predictor Also a twolevel predictor that only uses local histories at the first level Use 6 bits of branch PC to index into local history table 10110111011001 14bit history indexes into Table of 64 entries of 14bit neXt level histories for a single branch LocalGlobal Predictors Instead of maintaining a counter for each branch to capture the common case 9 Maintain a counter for each branch and surrounding pattern 9 Ifthe surrounding pattern belongs to the branch being predicted the predictor is referred to as a local predictor 9 Ifthe surrounding pattern includes neighboring branches the predictor is referred to as a global predictor Tournament Predictors A local predictor might work well for some branches or programs while a global predictor might work well for others Provide one of each and maintain another predictor to identify which predictor is best for each branch Alpha 21264 1K entries in level1 1K entries in level2 4K entries 12bit global history BranCh PC 4K entries Table of 2 bit Total capacity saturating counters 13 Branch Target Prediction In addition to predicting the branch direction we must also predict the branch target address Branch PC indexes into a predictor table indirect branches might be problematic Most common indirect branch return from a procedure can be easily handled with a stack of return addresses An Outof Order Processor Implementation Branch prediction and instr fetch W R1 6 R1 R2 R2 6 R1 R3 BEQZ R2 R3 6 R1 R2 R1 6 R3R2 Instr Fetch Queue Decode amp Rename Reorder Buffer ROB Instr 1 T1 Instr 2 T2 Instr 3 T3 Register File Instr 4 T4 R139R32 Instr 5 T5 Instr 6 T6 T16R1R2 gt T2 6 T1 R3 BEQZ T2 T4 6 T1 T2 Results written to T5 6 T4T2 ROB and tags broadcast to IQ Issue Queue IQ Title Bullet
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'