These 11 pages of class notes were uploaded by Angelina West on Monday, September 28, 2015. The notes belong to CIT 595 at the University of Pennsylvania, taught by D. Palsetia in Fall. Since the upload, the material has received 21 views. For similar materials see /class/215408/cit595-university-of-pennsylvania in Computer Information Technology at the University of Pennsylvania.
Reviews for DIG SYSTEM ORG & DESIGN
Cache (CIT 595, Spring 2007)

The purpose of cache memory is to speed up data accesses for the processor by storing information in faster memory made of SRAM.
> SRAM access time is 3 ns to 10 ns
> DRAM access time is 30 ns to 90 ns
The data stored in the cache is data that the processor is likely to use in the very near future.
> SRAM is fast but small, so the cache stores only a subset of the data held in main memory.

Basic Terminology
Memory is divided into blocks; each block contains a fixed number of words.
Word = the size of the data stored in one location, e.g. 8 bits, 16 bits, etc.
One block is used as the minimum unit of transfer between main memory and cache. Hence each location in the cache stores one block.
[Figure: main memory locations 00 and 01 form Block 0, locations 10 and 11 form Block 1; the cache holds whole blocks.]

Mapping Scheme
If the CPU generates an address for a particular word in main memory and that data happens to be in the cache, the main memory address cannot be used as-is to access the cache. Hence a mapping scheme is required that converts the generated main memory address into a cache location.
The mapping scheme also determines where a block will be placed when it is originally copied into the cache.

Address Conversion to Cache Location
Address conversion is done by giving special significance to the bits of the main memory address. The address is split into distinct groups called fields.
> Just as instruction decoding is done based on certain bit fields.
The fields are a way to find:
> Which cache location?
> Which word in the block?
> Whether it is the right data we are looking for? (Some kind of unique identifier.)

Mapping Scheme 1: Direct Mapped Cache
In a direct mapped cache consisting of N blocks (i.e. N locations), block X of main memory maps to cache block Y = X mod N.
E.g., if we have 10 blocks of cache, block 7 of the cache may hold blocks 7, 17, 27, 37, ... of main memory.

Direct Mapped Scheme: Address Conversion
An n-bit main memory address is divided into three fields:
Tag | Block | Word
> Word: which word in the block
> Block: which location in the cache
> Tag: a unique identifier with respect to one block
Note: the tag is used to distinguish whether main memory block 7 or block 17 is stored in cache block 7.
[Figure: a cache with 4 blocks and 8 words per block.]

Example of Direct Mapped Scheme
Suppose our memory consists of 2^14 locations (words), the cache has 16 = 2^4 blocks, and each block holds 8 words. Thus main memory is divided into 2^14 / 2^3 = 2^11 blocks. Of the 14-bit address, we need 4 bits for the block field and 3 bits for the word field, and the tag is what's left over:
Tag (7 bits) | Block (4 bits) | Word (3 bits)
[Figure: a direct mapped cache with 16 blocks, each row holding a tag and data; the block field indexes the row, and the desired word is returned if the tags match.]

Example of Direct Mapped Cache (cont'd)
Suppose a program generates the address 1AA. In 14-bit binary this number is 0000011 0101 010. The first 7 bits of this address go in the tag field, the next 4 bits go in the block field, and the final 3 bits indicate the word within the block. So address 1AA maps to cache block 0101 with tag 0000011.

Direct Mapped Cache Example (cont'd)
However, suppose the program then generates the address 3AB. 3AB also maps to block 0101, but we will not find data for 3AB in the cache.
> The tags will not match, i.e. 0000111 of address 3AB is not equal to 0000011 of address 1AB.
Hence we get it from main memory. The block loaded for address 1AA would be evicted (removed) from the cache and replaced by the block associated with the 3AB reference, which carries tag 0000111.

Disadvantage of Direct Mapped Cache
Suppose a program generates a series of memory references such as 1AB, 3AB, 1AB, 3AB, ...
> The cache will continually evict and replace blocks.
The theoretical advantage offered by the cache is lost in this extreme case. Other cache mapping schemes are designed to prevent this kind of thrashing.

Address Breakup
Why is the address broken up in this particular manner, with the tag taken from the high-order bits? This is because of spatial locality: data from consecutive addresses are brought into the cache together.
If the high-order bits (i.e. the bits used for the tag) were instead used to determine the cache location, then values from consecutive addresses would map to the same location in the cache. Using the low-order bits for the block field sends consecutive blocks to different cache locations.
> Result: less thrashing.

Valid Cache Block
How do we know whether a block in the cache is valid or not? For example:
> When the processor first starts up, the cache will be empty and the tag fields in each location will be meaningless.
> Thus tag fields must be ignored initially, while the cache is starting to fill up.
For validity, another bit called the valid bit is added to each cache block to indicate whether the block contains valid information (0 = not valid, 1 = valid).
> All blocks at start-up are marked not valid.
> If data from main memory is brought into the cache for a particular block, then the valid bit for that block is set.
> The valid bit contributes to the cache size.
[Figure: the direct mapped cache with a valid field, shown holding the block for address 3AB.]

Calculating Cache Size
Suppose our memory consists of 2^14 locations (words), the cache has 16 = 2^4 blocks, and each block holds 8 words. There are 16 locations in the cache, each with 7 bits of tag, 8 words of data, and 1 valid bit.
Assume 1 word is 8 bits; then the total bits in a row = 8 x 8 + 7 + 1 = 72 bits = 9 bytes.
Cache size = 16 x 9 bytes = 144 bytes.

Hit or Miss in the Cache
A hit means that we actually found the data in the cache. A hit occurs when the valid bit is 1 AND the tag in the cache matches the tag field of the address. If both conditions hold, the data is in the cache.

Some More Terminology
The hit rate is the percentage of time data is found at a given memory level.
The hit time is the time required to access data at a given memory level.
The miss penalty is the time required to process a miss, i.e. to fetch the missing block from the next level of the memory hierarchy.

Scheme 2: Fully Associative Cache
Instead of mapping memory blocks to specific cache locations based on the memory address, we could allow a block to go anywhere in the cache. This way the cache would have to fill up before any blocks are evicted. This is how a fully associative cache works.
A memory address is partitioned into only two fields: the tag and the word.

Fully Associative Cache: Address Conversion
When the cache is searched, all tags are searched in parallel to retrieve the data quickly.
Searching all tags in parallel means more hardware cost than a direct mapped cache.
> Basically, we need n comparators, where n = the number of blocks in the cache.

Fully Associative: Which Block to Replace if the Cache Is Full?
Recall that a direct mapped cache evicts a block whenever another memory reference needs that cache location. With a fully associative cache we have no such mapping, so we must devise an algorithm to determine which block to evict from the cache.
The block that is evicted is called the victim block. There are a number of ways to pick a victim; we will discuss them shortly.

Scheme 3: Set Associative
A set associative cache combines the ideas of direct mapped cache and fully associative cache.
A set associative mapping is like a direct mapped cache in that a memory reference maps to a particular location in the cache. But that cache location can hold more than one main memory block; the cache location is then called a set.
> Instead of mapping anywhere in the entire cache (fully associative), a memory reference can map only to a subset of the cache.

Scheme 3: Set Associative (cont'd)
The number of blocks per set in a set associative cache varies according to the overall system design. For example, a 2-way set associative cache can be conceptualized as a table in which each set contains two different memory blocks, each with its own tag and valid bit.
A K-way set associative cache has K blocks per set.

Scheme 3: Address Conversion
Tag | Set | Word
Like a direct mapped cache, except the middle bits of the main memory address indicate the set in the cache.

K-Way Set Associative Cache Example
Suppose we have a main memory of 2^14 locations. This memory is mapped to a 2-way set associative cache having 16 blocks, where each block contains 8 words.
Number of sets = number of blocks in the cache / K.
> Since this is a 2-way cache, each set consists of 2 blocks, and there are 16/2 = 8 sets.
Thus we need 3 bits for the set and 3 bits for the word, giving the remaining 8 bits for the tag:
Tag (8 bits) | Set (3 bits) | Word (3 bits)
Advantage & Disadvantage of Set Associative
Advantage:
> Unlike a direct mapped cache, if an address maps to a set there is a choice of where to place the new block.
> If both slots are filled, then we need an algorithm that will decide which old block to evict, as in the fully associative case.
Disadvantage:
> The tags of each block in a set need to be matched in parallel to figure out whether the data is present in the cache: we need K comparators.
> Although the hardware cost for matching is less than fully associative (which needs n comparators, where n = number of blocks), it is more than direct mapped (which needs only one comparator).

Which Block to Replace?
With fully associative and set associative caches, a replacement algorithm (policy) needs to be used when it becomes necessary to evict a block from the cache.
The replacement policy we choose depends on the locality we are trying to optimize. E.g., if we are interested in temporal locality (i.e. referenced memory is likely to be referenced again soon, such as code within a loop), then we will keep the most recently used blocks.

Replacement Algorithm/Policy: LRU
LRU (least recently used) evicts the block that has been unused for the longest period of time. The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which will slow down the cache.

Replacement Algorithm/Policy: FIFO and Random
FIFO (first in, first out) evicts the block that has been in the cache the longest, regardless of when it was last used.
Random replacement does what its name implies: it picks a block at random and replaces it with a new block. Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes the way a direct mapped cache can.

What About Blocks That Have Been Written To?
While your program is running, it will modify some memory locations. We need to keep main memory and cache consistent if we are modifying data.
We have two options:
> Update the cache and main memory at the same time, OR
> Update the cache now and main memory at a later time.
The two choices are known as cache write policies.
Cache Write Policies: Write-Through
Write-through: update the cache and main memory simultaneously on every write.
Advantage:
> Keeps cache and main memory consistent at all times.
Disadvantage:
> All writes require a main memory access (a bus transaction), which slows down the system.
> If there is another read request for main memory due to a miss in the cache, the read request has to wait until the earlier write has been serviced.

Cache Write Policies (cont'd): Write-Back
Write-back (or copy-back): data that is modified is written back to main memory only when the cache block is going to be evicted (removed) from the cache.
Advantage:
> Faster than write-through, since time is not spent accessing main memory on every write.
> Writes to multiple words within a block require only one write to main memory.
Disadvantage:
> We need an extra bit in the cache to indicate which blocks have been modified.
> Like the valid bit, another bit called the dirty bit is introduced to indicate a modified cache block (0 = not dirty, 1 = dirty/modified).
> This adds to the size of the cache.
[Figure: a direct mapped cache row with a valid bit (V), a dirty bit (D), a tag, and the words within one block.]

What Affects the Performance of a Cache?
Programs that exhibit bad locality, e.g. poor spatial locality in matrix operations.
> Suppose matrix data is kept in memory row by row (known as row-major order), i.e. offset = row * NUMCOLS + column.
Poor code:
> for (j = 0; j < numcols; j++) for (i = 0; i < numrows; i++) ... x[i][j] ...
> i.e. x[i][j] is followed by x[i+1][j].
> The array is being accessed by column, so we are going to miss in the cache every time.
Solution: switch the for loops. C/C++ are row-major; FORTRAN and MATLAB are column-major.

Multilevel Cache
Most of today's systems employ multilevel cache hierarchies. The levels of cache form their own small memory hierarchy. A current-day processor uses:
Level 1 cache (8 KB to 64 KB), situated on the processor itself.
> Access time is typically about 4 ns.
Level 2 cache (64 KB to 2 MB), located external to the processor.
> Access time is usually around 15-20 ns.

Multilevel Caches (cont'd)
In a multilevel cache hierarchy, if the system uses an inclusive cache, the same data may be present at multiple levels of cache. Strictly inclusive caches guarantee that all data in a smaller cache also exists at the next higher level. Exclusive caches permit only one copy of the data. The trade-offs in choosing one over the other involve weighing the variables of access time, memory size, and circuit complexity.

Instruction and Data Caches
A unified (or integrated) cache is one where both instructions and data are cached. Many modern systems instead employ separate caches for data and instructions.
> This is called a Harvard cache.
Advantages:
> Allows accesses to be less random and more clustered.
> Less access time than a unified cache, which is typically larger.

Review of Cache Organization
Q1: Where can a block be placed in the cache level? (Mapping scheme)
Q2: How is a block found if it is in the cache? (Mapping scheme)
Q3: If the cache is full, where do we put the new block, i.e. which old block should we replace? (Block replacement policy)
Q4: If we write to a block in the cache, should we update main memory at the same time? (Write policy)

Looking Forward
We have studied the interaction between cache and main memory.
> This level of the memory hierarchy is managed by hardware.
Next we study the interaction between main memory and disk.
> That level is managed by hardware and the compiler.