Research Seminar CS 7123
Popular in Course
Popular in ComputerScienence
verified elite notetaker
This 26 page Class Notes was uploaded by Mireya Heidenreich on Thursday October 29, 2015. The Class Notes belongs to CS 7123 at University of Texas at San Antonio taught by Turgay Korkmaz in Fall. Since its upload, it has received 12 views. For similar materials see /class/231415/cs-7123-university-of-texas-at-san-antonio in ComputerScienence at University of Texas at San Antonio.
Reviews for Research Seminar
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 10/29/15
396 IATAC Citations UTSA IATAC A Smart Predictor to Turn off L2 Cache Lines JAUME ABELLA Universitat Polite cnica de CataunyaBarceona ANTONIO GONZALEZ and XAVIER VERA Intel Barcelona Research Center Intel LabsUPC Published in AC M Transactions on Architecture and Code Optimization Vol 2 No 1 March 2005 pp 55 77 Presented by Anthony M Castaldo Anthony M Castaldo UTSACS IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 1 13 c IATAC Abstract UTSA Two Main Sources of Power Dissipation In a Processor 0 Dynamic Power Has been most important 0 Leakage Growing at 5x rate of Dynamic per generation Large caches occupy most area responsible for most leakage Caches are getting larger L2 is largest on chip cache consumes most power IATAC 65 of L2 lines turned off 0 0 o 0 Need to power them off 0 o IATAC 2 performance hit Anthony M Castaldo UTSACS IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 2 13 C Related Work Low Level Circuit Level work Anthony M Castaldo UTSACS Combine Caches of Different Voltages Abella and Gonz lez 2003 Gated Supply Voltage VDD lost contents Powell et al 2000 Drowsy Caches reduced VDD no loss Require two supply lines so higher overhead Flautner et al 2002 Drowsy Caches Gated Ground Non viable requires too accurate a supply voltage Agarwal et al 2002 SuperDrowsy Caches One VDD no losses more susceptible to soft errors Kim et al 2004 IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 3 13 C Related Work High Level Cache Organization Work Anthony M Castaldo UTSACS Sub Banking Multiple Line Buffers Bitline Segmentation Ghose and Kamble 1999 Vertical and Horizontal Cache Partitioning Su and Despain 1995 Dynamic or Static reconfiguration cache size or associativity Dynamic Balasubramonian et al 2002 Dropsho et al 2002 Static Albonesi 1999 Zhang et al 2002 2003 Power off L2 items if in L1 Li etal 2002 Cache the L1 Cache L1 Filter Cache Kin etal 1997 Encode Compress frequently used values Yang and Gupta 2002 Way PFCCllCtOl S lnoue et al 1999 Way Prediction and Remap for Non Conflicting Access Powell et al 2001 based on Batson and Vijaykumar 2001 and by Calder et al 1996 Switch off Cache Lines Not Expected To Be ReUsed Very similar to IATAC Zhou et al 2001 Kaxiras et al 2001 IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 4 13 v C Groundwork The One Graph To Remember Applu Flcerec D me between Hts l nma bible miss D me beween hIIs I Time beVorB miss g m g 100 5 z 3 n E q 9 w 10 y r 1 2 3 A 5 S 8 10 I2 15 2 4 5 3 Number of axon Number of icesIns Gcc Parser Tlme between nun TIma heron mls DTlme between rm I nms Debra mls moon a I I I g 7 I 7 15 B E S g 100 5 a 5 10 E 10 r 1 4 1 2 3 4 5 a 7 a 1 2 4 a a m 12 14 16 Nunnor or acenu Numbr al lamla Fig 1 Average time between hits to cache lines time between hits and the average time from the last access to replacement time before miss against varying numbers of accesses The results correspond to four representative programs note logarithmic scale Anthony M Castaldo UTSACS IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 5 13 39 6 Observations The Structure of L2 Access L1 filters out most Stride 0 access so L2 deals with Stride gt 0 o The access pattern in one cache line tends to be duplicated in other lines Local cache line predictors don39t share information across multiple lines Almost always time between hits lt time between misses a time between hits varies depending on how often a cache line is accessed 0 If we wait as long as time between hits WITHOUT a hit Power off the line Anthony M Castaldo UTSACS IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 6 13 v C IATAC Algorithmic Description 0 Tag is always on to detect prediction errors Also hits are counted even if line is powered off 0 Per Line Variables 0 counter is incremented every access 0 thits is the maximum time between consecutive hits 0 elapsed cycles since line was last accessed 0 decay cycles that must elapse to turn off line 0 wrong a bit to indicate line was prematurely turned off a onoff indicates whether line is powered up or down 0 Global Data all per access count tracked o AcumCounters Number of cache lines replaced 0 GlobalDecay Decay required 0 MaxGlobalDecay Maximum of values to the right of a given access count Anthony M Castaldo UTSACS IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 7 13 6 Implementation UTSA Anthony M Castaldo UTSACS Coarse Counters Only tick every 1000 cycles Heirarchical saves energy Two tested cache sizes Large 4MB and Medium 512KB MINCOUNT Ignore count if fewer recent hits than MINCOUNT Typically 8 MAXCOUNT Decay value for AcumCounters Cache size dependent tested at 1024 512KB and 8192 4MB AcumCounters All are halved if any hits MAXCOUNT Every hit on a tag increment counter check if wrong set decay Every miss on a tag Use thits to update GlobalDecay add to AcumCounter and adjust all if needed then set line decay IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 8 13 6 Evaluation Framework UTSA 0 Cache Decay Hu et al 2002 Kaxiras et al 2001 A Local Information approach relying upon temporal clustering o DecayN Fixed Decay Intervals Turn off cache line if N cycles elapse with no access 0 DecayAdap Adaptive Decay Intervals Dynamically finds decay intervals for each cache line individually but turns off tags and makes compounding mistakes 0 AMCPF Adaptive Mode Control Zhou et al 2001 Whole cache but dynamic decay interval Relies on tags remaining on to properly calculate ideal decay interval and PF 39Performance Factor39 as tuning parameter CACTI 30 timing power and area model for cache memory WATTCH architecture level power and perf simulator Spec2000 benchmark suite simulating lB instructions for each benchmark after 100M instruction 39warmup39 for caches Anthony M Castaldo UTSACS IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 9 13 C Energy Delay Product Measuring Energy Efficiency Exactly what it sounds like Energy Used Multiplied by Gate Delay Execution time o It is so EASY to reduce Energy a Decrease Clock Period 0 Reduce Supply Voltage 0 Reduce Capacitance use smaller transistors 0 But all of these increase Gate Delay and thus Execution Time 0 Processors vary in Energy consumption by a factor of 100 but if we look at Energy X Delay variance is reduced to single digits 0 EDP and later ED2P are measures of Energy Efficiency Anthony M Castaldo UTSACS IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 10 13 c Positive Impact Turned Off L2 Misses Tumsd off cache lines 512 KB Turned off cache lines 4 MB r rseaerz s z 39x39 239s Egsseeigggg ggggg 3 ltgg g ltlt5 Fig 6 L2 turno 39 cache line ratio for the different mechanisms L2 miss ratlo 512 KB L2 miss ratla 4 MB AMC 05 iATAC a gi AMC 05 AME 015 E e deceivide AMC 1125 AME 00315 Fig 7 L2 Iniss ratio for the different mechanisms dlcayAdap AMc 025 AMC o 125 me new AMC 003125 damySIZK ca2049K mum dunaim decayian rim256K MSIZK ducay1024K Anthony M Castaldo UTSACS IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 11 13 v 6 Negative Impact IPC Instructions Per Cycle Degradation 14 bmi b decayde damy128K Anthony M Castaldo IPC loss 512 KB a gtgtquot gt mgr 390 D Fig 5 UTSACS IPC loss 4 MB 221 14 AMC 0 5 AMC 03925 AMC 0125 AMC 10625 IATAC baseline 1 sz dacay1024K dacme decay4099 dacay8192K cayArhp AMC 05 AMG 025 MC 0125 AMC O325 AMC 003125 AMC 003125 IPC degradation for the different mechanisms IATAC A Smart Predictor to Turnoff L2 Cache Lines September 10 2007 IATAC 1213 6 Motivation UTSA 0 Algorithms are the core of many papers 0 The value is in the problem solved not in the code 0 Why should the reader bother learning your Algorithm What does better mean 0 Faster Theoretically in practice or both 0 Fewer resources such as memory disk code size 0 Less error a Improves the average case 0 Improves the worst case a Improves a class of cases 0 Broader applicability o Solves previously insoluble problems 0 Enables a tradeoff between resources eg time for memory 0 Converts static to dynamic or vice versa 0 What is the cost Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 1 15 6 What s In It UTSA What should we expect to find in an algorithmic paper Steps Detailed steps that make up the algorithm Data Inputs Outputs and internal data structures Scope How is the algorithm used Where does it apply Is it globally applicable or limited to a class Limitations Where does it fail Correctness o How do we know it works Theoretical proof Empirical proof 0 Can you demonstrate correctness 0 Complexity Analysis Time Space Error 0 At least the worst case 0 Sometimes the average expected case or distribution data Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 2 15 v C Formalisms UTSA SOme examples follow but the main categorizations are 0 List code Numbered steps for each action with quotGOTOquot statements Control structure is obscure and ideas are buried o Pseudocode Numbered lines of a block structured language Detailed structure is clear but statements are terse and overall idea less obvious o prose code Outline form Numbered major steps with sub numbered component steps combined with explanatory text 0 literate code Details are introduced gradually intermingled with discussions of underlying ideas and even asymptotic analysis proof of correctness or details of key insights Do not use flowcharts or other large diagrams No room for complexity comments and lack modularity Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 3 15 C Formalisms Example of PseudoCode WeightedEditSl 52 1 L1 1ens1 2 L2 1ens2 3 M 2 L1L2 4 F00 O 5 for i from 1 to L1 6 FiO Fi10Mi 7 for j from 1 to L2 8 FOj F0j1Mj 9 for i from 1 to L1 10 C Mi 11 for j from 1 to L1 12 C C1 18 Fij minFi1 j C Fij1 C Fi1j 1C isdiffSli S2j 14 returnFL1L2 Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 4 15 v C Formalisms Example of Prose Code 1 Set Penalty Set p lt 2 ks kt 2 Initialize data structure The boundaries of array F are initialized with the penalty for deletions at start of string for example Eye is the penalty for deleting i characters from the start of s a Set Foyo lt O b For each position i in s set Fiyo lt Fillyo p i c For each positionj in 139 set FOIJ lt Foyjll p j 8 Compute edit distance For each position i in s andj in t a The penalty is C p i j b The cost of inserting a character into 139 or equivalently deleting from s is I Fillyj C c The cost of deleting a character from t is D Fi 1 C d If 51 is identical to tj the replacement cost is set as R lt Fillyjll otherwise we set R lt Fillyjll C e Set Fiyj lt minl D R 4 Return Return stykr Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 5 15 v C Formalisms Example of Literate Code The major steps of the algorithm are as follows 1 Set the penalty 2 Initialize the data structure a dynamic programming array 8 Compute the edit distance We now examine these steps in detail 1 Set the penalty The main property we require of the penalty is that costs reduce smoothly from the start to the end of the string As we will see the algorithm proceeds by comparing each position i in s to each positionj in 139 Thus a diminishing penalty can be computed with the expression p i j where p is the maximum penalty By setting the penalty with a Set p lt 2Xks kt the minimum penalty is p ks kt p2 and the next smallest penalty is p2 1 This means that two errors regardless of the positions in the strings will outweigh one 2 Initialize the data structure a dynamic programming array Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 6 15 v 6 Details Or Not UTSA Algorithms are about ideas Details are distractions 0 Your Audience CS Journal readers or conference attendees 0 Write for those skilled in the art They don39t need you to 0 Code a loop to add up a list of numbers 0 Implement a binary search 0 Write a quicksort o Implement any textbook algorithm 0 Provide the detail needed to implement what you invented reference work by Newton Gauss Djikstra Rivest and Knuth Remember you are presenting a new insight not your wondrous coding style Insights are best understood by one or two key relationships you have discovered When the audience understands these the framework of your algorithm will fall into place The details should address points in the implementation that require guidance only you can provide Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 7 15 v 6 Notation UTSA Use mathematical notation not programming notation 0 Use positional notation X not xi use X not X n 0 Do not use 3939 or 39X39 to denote multiplication use X39 or Avoid specific language syntax 39 39ab039 39a39 39ac39 etc Do not assume your audience programs in C Do not use fori0 iltn i or anything else that requires a knowledge of the syntax of a specific language or programming tool 0 Show nesting by indentation or by numbering style as in an outline don39t use BeginBlock EndBlock curly braces etc 0 Use mathematic shortcuts X H superscripting and subscripting in place of loops function calls or braces 0 Good programming practice may be bad expositional practice Variable names of one character leave no room for ambiguity pq cannot be mistaken for p q Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 8 15 6 Environment Data Structures lO types OS Hardware Anthony M Castaldo UTSACS Describe possible inputs Describe possible outputs If there are hardware constraints specify them and use realistic assumptions current technology or likely improvements on it 0 Memory requirements 0 Offline storage requirements disk non volatile memory 0 Communication speeds Provide data types when ambiguous Be consistent with notation and descriptions lf int and quotintegerquot mean the same thing pick E If they do not define the difference Avoid pseudo code for structures if possible and use mathematical set description eg Each element is a triple string length positions in which positions is a list of byte offsets September 10 2007 Presenting Algorithms 9 15 i916 Performance Methodology How do you evaluate Performance 0 Formal Proof 0 Mathematical Modelling 0 Simulation 0 Experimentation Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 10 15 6 Performance Basis of Evaluation Anthony M Castaldo UTSACS Functionality Is your algorithm useful in more situations Speed Theoretical or Actual Measured how exactly Asymptotic performance Typical performance Real data Synthetic data Compared to o Benchmark 0 Standard Library 0 Specific existing package or canonical algorithm Be realistic Those versed in the art like your reviewers will spot comparisons chosen to make you look good and you may not get a second chance eg submission to a conference September 10 2007 Presenting Algorithms 11 15 6 Performance Big Numbers What are you measuring Is it what people care about Anthony M Castaldo UTSACS CPU time Memory requirements Disk requirements Memory trafficthroughput Disk trafficthroughput fetch time transfer rate Network trafficthroughput Other measures fewer collisions fewer errors more hits more resilient tougher to crack etc Be honest don39t leave out unflattering numbers If you are twice as fast but use ten times as much memory this is a tradeoff some people might want to make September 10 2007 Presenting Algorithms 12 15 39 6 As m totic Com lexit UTSA y p p y How Well Does It Scale 0 OO notation worst case 0 Q notation best case 0 90 notation bounded case 0 A Common Abuse Big O is often used to indicate complexity as in quotcomparison based sorting takes Onlog n time which is clear and acceptable even though formally we would say quotcomparison based sorting has an operation complexity of Qnog Be careful with ambiguous words quotquadraticquot quotconstantquot quotlinearquot quotlogarithmicquot quotexponentialquot Anthony M Castaldo UTSACS Presenting Algorithms September 10 2007 13 15 C Asym ptotic Com plexity Problems with Problems Anthony M Castaldo UTSACS Database Number of records Record length too An integer operation is unit cost Even in cryptography Does the dominant cost change with scale On disk accesses Onlog n comparisons Onlog n complexity But disk takes 5000000 nanoseconds and comparison takes 1 What happens when the problem size exceeds the various cache sizes What happens when it exceeds real memory Does the limit matter Consider floating point error where the bound for a certain widely used class of data is hundreds of thousands of times what we see in practice Limits are just caps and not always proportional expectation predictors Are your simplifying assumptionsjustifiable and reliable eg can you really assume the input data for a sort is randomly arranged and not already in order September 10 2007 Presenting Algorithms 14 15
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'