Class Note for CMPSCI 635 at UMass(2)
These 10 pages of class notes were uploaded on Friday, February 6, 2015. They belong to a course at the University of Massachusetts taught in Fall.
Cache Basics
CMPSCI 635: Modern Computer Architecture
Univ. of Mass. Amherst

Lecture Outline
- A brief history of computer memories
- The memory hierarchy
- Cache basics
  - Motivation
  - Cache metrics
  - Cache implementation
- Set associativity of cache

Early Memories
- Vacuum tubes, relays
- Cathode ray tubes
- Acoustical delay lines
- Magnetic drums
- Magnetic cores
- Punch cards
- Open-reel tape

Modern Memory
- Semiconductor-based RAM
- Magnetic disk
- Magnetic or optical tape
- Optical disk (CDs, DVDs)
- Magneto-optical disk
- Flash memories

Memory Technology
- Memory technology is mature
- Any new technology has a lot of catching up to do to be competitive
- New memory technologies have become much more difficult to introduce
- Memory needs are currently driven by the entertainment industry

DRAM vs. SRAM
- Dynamic RAM (DRAM) is implemented using capacitors to store charge
- DRAM must be periodically refreshed due to capacitor leakage
- SRAM uses flip-flops that do not need refreshing, so SRAM is faster
- SRAM is more complex and expensive

DRAM vs. SRAM
[Figure: SRAM and DRAM cell circuits, with SELECT and PLATE lines]

DRAM in Detail
[Figure: DRAM cell and array detail]

Memory Hierarchy
We use a memory hierarchy so that we can get varying amounts of cost per performance in a system.
[Figure: the memory hierarchy pyramid; performance and cost per megabyte increase toward the top]

The Performance Gap (myth)
[Figure: CPU vs. memory performance over time, 1980-2005]

The Performance Gap (truth)
- CPU performance cannot increase exponentially with respect to memory
- Access time to main memory is typically ~100 cycles
- This gap is very significant
- Opening gaps are between on- and off-chip memory, and between registers and L1 cache

Caches
- Probabilistic attempt to keep useful data closer to the CPU
- Important because of the performance distance between CPU and main memory
- Higher in the memory hierarchy than main memory, so it is faster and more costly
- Operates on the assumption that memory accesses exhibit locality

Basic Properties of Cache
- Inclusion: copies of a value exist in all levels below
- Coherence: all copies are the same
- Locality: programs access a limited portion of the address space during any window in time
- These properties may not strictly hold at all times

Cache Parameters
- Access time (CPU to memory)
- Size (bytes)
- Cost per unit (byte)
- Transfer bandwidth (sustained)
- Unit of transfer, a.k.a. line size (bytes moved)
- Set associativity

Hit Ratio
- Probability that an access hits at a given level
- A hit is an access to something already in cache
- Miss ratio = 1 - hit ratio
- Access frequency is the product of the hit ratio for a given level and the miss ratios of all levels above it
- Cache performance is not completely dependent on hit ratio

Average Access Time
- Sum of access frequencies for all levels multiplied by their corresponding access times
- Assumes no bypass
- Assumes that data resides in the lowest level (data is always present)
- Good design depends on simulation

Average Access Time
Average memory access time = L1 hit time + L1 miss rate x L1 miss penalty
L1 miss penalty = L2 hit time + L2 miss rate x L2 miss penalty
L2 miss penalty = L3 hit time + L3 miss rate x L3 miss penalty

Example
In 1000 memory references there are 40 misses in L1 and 20 misses in L2. Assume all are reads with no bypass. L1 hit time is 1 cycle, L2 hit time is 10 cycles, and the miss penalty is 100 cycles. What is the average memory access time?

Answer
L1 local miss rate = L1 local misses / total references = 40/1000 = 4%
L2 local miss rate = L2 local misses / L1 local misses = 20/40 = 50%
Average memory access time = L1 hit time + L1 miss rate x (L2 hit time + L2 miss rate x L2 miss penalty)
= 1 + 4% x (10 + 50% x 100) = 3.4 clock cycles

Reducing Avg. Access Time
- Make memory faster (costly)
- Reduce the cache miss rate
- Try to exploit more locality
- Change cache parameters
- Optimize code
- Make the cache bigger
- More elaborate schemes later

Programmed Transfer
- We explicitly specify transfers from main to secondary memory
- We can also explicitly specify transfers from registers to memory
- No language exists to provide information for better management of cache
- Compilers can be used to optimize for the cache
Linguistic Issues
- Languages restrict compiler context
- Higher-level constructs could enable scheduling of more transfers
- Compiler might reorder accesses to improve locality
- Currently done for loops; between functions is hard; irregular structures are even harder

Types of Locality
- Temporal: recently accessed data tends to be accessed again in the near future
- Spatial: data accesses tend to be close together
- Sequential: instructions (and also some data) tend to be accessed in consecutive order

Spatial Locality
- Widely found in codes
  - Array accesses
  - Instruction streams
- Spatial locality implies temporal locality, but not vice versa
- Exploited by larger line size
- Similar to sequential locality

Temporal Locality
- Tends to hold for many codes
  - Arrays in loops
  - Linked lists
  - Local variables, constants
- Exploited by smaller line size
- Does NOT imply spatial locality

Sequential Locality
- Especially true of instructions
- Subset of spatial locality
- Locality differences between data and instructions argue for split caches (instruction separate from data)

Distinguishing Localities
- Spatial hit: a hit on a word already in cache that has not been previously accessed
- Temporal hit: a hit on a word already in cache that has been previously accessed
- Eviction erases history with respect to these definitions

Lines in Cache
- A line stores a memory block, which is comprised of contiguous words
- Increased line sizes increase the probability that contiguous data is moved into the cache on a miss
- An increased number of lines decreases the probability that a specific line is evicted

Basic Cache Structure
- Comprised of lines, which store contiguous blocks
- Each line has a tag (high-order bits) to keep track of which memory block is present
- Position in cache is determined by low-order bits (except in fully associative caches)

Fully Associative Cache
[Figure: fully associative cache organization]

Fully Associative
- Pros
  - Any memory block can go anywhere in cache
  - Cache can be fully utilized
  - High hit rate; does not suffer conflict misses
- Cons
  - Requires one comparator per line (impractical)
  - Fan-in delays for compare and access
  - Replacement algorithms become impractical

Direct Mapped
[Figure: direct-mapped cache organization (data out, hit)]

Direct Mapped
- Pros
  - Only one comparator
  - Direct fetch and test of tag
  - Fast access
- Cons
  - Low utilization
  - Many conflict misses
  - Lower hit ratio

Cache Associativity
- A cache can be divided into sets
- K ways divide the cache into sets with K lines per set
- The number of sets is S = L/K, where L is the number of lines in the cache
- A block may go anywhere in its associated set
- Associativity is directly related to conflict misses

4-way Set Associative
[Figure: 4-way set-associative cache, with address fields and a 4-to-1 multiplexer]

K-way Associative
- An effective balance between the speed of direct mapped and the performance of fully associative
- K-wide comparator: almost as fast and cheap as direct mapped
- Hit ratio is almost like fully associative
- Up to K blocks with the same mapping may be in cache at once without conflicting

Example
- 4-way set-associative cache with 16K 4-word lines (1 word = 4 bytes)
- 256K-byte capacity
- 4K sets
- 4 ways, 4 comparators
- Each memory address can map to 4 locations in cache

Next Lecture
- More on reasons for misses
- Cache traces
  - Running an example
  - Generating
  - Analyzing
  - Statistical correctness
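The average-access-time formula and the worked L1/L2 example can be checked with a short script (a minimal sketch; the `amat` helper is an illustrative name, not from any library):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Figures from the example: 1000 references, 40 L1 misses, 20 L2 misses.
l1_miss_rate = 40 / 1000                       # 4% local miss rate
l2_miss_rate = 20 / 40                         # 50% local miss rate
l1_miss_penalty = amat(10, l2_miss_rate, 100)  # L2 hit time 10, miss penalty 100
print(round(amat(1, l1_miss_rate, l1_miss_penalty), 2))  # -> 3.4 clock cycles
```

Note the recursive shape: the L1 miss penalty is itself an AMAT over the next level down, which is exactly how the L2 and L3 terms enter the formula.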
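The claim that larger line sizes exploit spatial locality can be illustrated with a toy direct-mapped model (a hypothetical sketch; the `misses` helper and its parameters are assumptions chosen for illustration):

```python
def misses(addresses, line_size=64, num_lines=256):
    """Count misses in a toy direct-mapped cache (byte addresses, no timing)."""
    cache = {}                                 # set index -> resident tag
    count = 0
    for addr in addresses:
        block = addr // line_size              # strip the offset bits
        index, tag = block % num_lines, block // num_lines
        if cache.get(index) != tag:            # miss: fill the line
            cache[index] = tag
            count += 1
    return count

# 4096 accesses each; only the stride differs.
sequential = misses(range(4096))               # one miss per 64-byte line: 64
strided = misses(range(0, 4096 * 64, 64))      # every access is a new line: 4096
```

A stride-1 scan touches each 64-byte line 64 times, so only the first access per line misses; the stride-64 scan gets no reuse within a line, so the large line size buys nothing.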
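The associativity trade-off can be sketched as a small K-way simulator with LRU replacement (an illustrative model, not the course's tool; the class name and defaults are assumptions, with the defaults mirroring the 4-way example: 16K lines of 16 bytes, hence 4K sets):

```python
class SetAssocCache:
    """Toy K-way set-associative cache with LRU replacement."""

    def __init__(self, num_lines=16 * 1024, line_size=16, ways=4):
        self.line_size = line_size
        self.ways = ways
        self.num_sets = num_lines // ways      # S = L / K
        self.sets = [[] for _ in range(self.num_sets)]  # tags per set, LRU first

    def access(self, addr):
        """Return True on hit, False on miss (filling the line)."""
        block = addr // self.line_size         # strip the offset bits
        index = block % self.num_sets          # low-order bits pick the set
        tag = block // self.num_sets           # high-order bits identify the block
        lines = self.sets[index]
        if tag in lines:                       # hit: move to MRU position
            lines.remove(tag)
            lines.append(tag)
            return True
        if len(lines) == self.ways:            # set full: evict the LRU tag
            lines.pop(0)
        lines.append(tag)
        return False

# Two addresses that land in the same set at either associativity:
a, b = 0, 262144
dm = SetAssocCache(ways=1)                     # direct mapped: 16K sets
wa = SetAssocCache(ways=4)                     # 4-way: 4K sets
dm_hits = sum(dm.access(x) for x in (a, b, a, b))  # 0: the blocks thrash
wa_hits = sum(wa.access(x) for x in (a, b, a, b))  # 2: both blocks coexist
```

With one way the two blocks conflict on every access; with four ways the set holds both, so the repeat accesses hit. This is the sense in which associativity is directly related to conflict misses.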