Date Created: 09/19/15
CISC260 Machine Organization and Assembly Language

Memory Hierarchy and Cache

cisc260, Liao

[Figure: performance plot; axis values and curve labels are not recoverable from the extraction.]

Fact: Large memories are slow, and fast memories are small.

How do we create a memory that gives the illusion of being large, cheap, and fast (most of the time)?
- With hierarchy
- With parallelism

Technology   Cost/GB    Access time      Used for
SRAM         ~$10,000   ~1 ns            Cache
DRAM         ~$100      ~100 ns          Main memory
Hard disk    ~$1        ~10,000,000 ns   Virtual memory

Speed increases toward the top of the table; capacity increases toward the bottom.

Locality

Principle of Locality
- Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves.
- Temporal locality: recently referenced items are likely to be referenced in the near future.
- Spatial locality: items with nearby addresses tend to be referenced close together in time.

Locality Example

[Code figure: a loop that sums the elements of an array; the listing is not recoverable from the extraction.]

Data
- Reference array elements in succession (stride-1 reference pattern): spatial locality
- Reference sum each iteration: temporal locality

Instructions
- Reference instructions in sequence: spatial locality
- Cycle through loop repeatedly: temporal locality

Two questions to answer (in hardware):
- Q1: How do we know if a data item is in the cache?
- Q2: If it is, how do we find it?

Direct mapped
- For each item of data at the lower level, there is exactly one location in the cache where it might be, so lots of items at the lower level must share locations in the upper level.

[Figure: mapping of a 2^30-word main memory onto a 2^3-word direct-mapped cache. The memory address is split into tag, set, and byte offset fields, and the cache is an 8-entry x (1+27+32)-bit SRAM holding a valid bit, a tag, and a data word per set.]

Example: run the code on an 8-set, one-word-block cache, which is initially empty.

          addi $t0, $0, 5
    loop: beq  $t0, $0, done
          lw   $t1, 0x4($0)
          lw   $t2, 0xC($0)
          lw   $t3, 0x8($0)
          addi $t0, $t0, -1
          j    loop
    done:

What locality is being used? What is the miss rate?

[Figure: the eight cache sets for this example, Set 7 (111) through Set 0 (000).]
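One way to check an answer to the miss-rate question is to replay the loop's data accesses through a small model of a direct-mapped cache. The sketch below is only an illustration under stated assumptions: the function name `simulate_direct_mapped` and its parameters are hypothetical, words are assumed to be 4 bytes, instruction fetches are not modeled, and the load addresses (0x4, 0xC, 0x8) are read from the loop above.

```python
def simulate_direct_mapped(addresses, num_sets, words_per_block, word_bytes=4):
    """Return (misses, accesses) for a direct-mapped cache that starts empty.

    Hypothetical helper for illustration; 4-byte words are an assumption.
    """
    block_bytes = words_per_block * word_bytes
    tags = [None] * num_sets          # None marks an invalid (empty) set
    misses = 0
    for addr in addresses:
        block = addr // block_bytes   # block number in main memory
        index = block % num_sets      # the one set this block may occupy
        tag = block // num_sets       # remaining upper address bits
        if tags[index] != tag:        # empty set or a different block: miss
            misses += 1
            tags[index] = tag         # fetch the block, replacing the old one
    return misses, len(addresses)


# Data addresses touched by the loop: three loads per iteration, five iterations.
trace = [0x4, 0xC, 0x8] * 5
misses, total = simulate_direct_mapped(trace, num_sets=8, words_per_block=1)
print(f"{misses}/{total} misses, miss rate {misses / total:.2f}")
```

With these assumptions, only the first iteration misses (3 of 15 accesses), because the three addresses fall into three different sets and are reused on every later iteration. Changing `num_sets` and `words_per_block` lets the same trace be tried against other cache shapes.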
[Figure: fields of memory address 0x8000009C (tag, set, block offset, byte offset) in a direct-mapped cache with two sets and four-word blocks, followed by the cache datapath.]

Example: run the code on a 2-set, four-word-block cache, which is initially empty.

          addi $t0, $0, 5
    loop: beq  $t0, $0, done
          lw   $t1, 0x4($0)
          lw   $t2, 0xC($0)
          lw   $t3, 0x8($0)
          addi $t0, $t0, -1
          j    loop
    done:

What locality is being used? What is the miss rate?

[Figure: cache contents for this example. Set 0 holds tag 00...00 and the block mem[0x00...0C], mem[0x00...08], mem[0x00...04], mem[0x00...00].]

Memory and Cache Performance

Miss Rate = (number of misses) / (number of total memory accesses) = 1 - Hit Rate
Hit Rate  = (number of hits) / (number of total memory accesses) = 1 - Miss Rate

Suppose a program has 2000 data access instructions (loads or stores), and 1250 of these requested data values are found in the cache. The other 750 data values are supplied to the processor by main memory or disk memory. What are the miss and hit rates for the cache?

Solution: The miss rate is 750/2000 = 0.375 = 37.5%. The hit rate is 1250/2000 = 0.625 = 1 - 0.375 = 62.5%.

Suppose a computer system has a memory organization with only two levels of hierarchy, a cache and main memory, with access times and miss rates given as follows:

               Access time (cycles)   Miss rate
Cache          1                      10%
Main memory    100                    0%

What is the average memory access time?

Solution: Every access pays the 1-cycle cache access, and the 10% of accesses that miss also pay the 100-cycle main memory access. The average memory access time is 1 + 0.1 x 100 = 11 cycles.

Memory Performance: Impact on Performance

Suppose a processor executes at ideal CPI = 1.1 with an instruction mix of 50% arith/logic, 30% ld/st, and 20% control, and that 10% of data memory operations miss with a 50-cycle miss penalty.

CPI = ideal CPI + average stalls per instruction
    = 1.1 (cycles) + 0.30 (data mem ops/instr) x 0.10 (misses/data mem op) x 50 (cycles/miss)
    = 1.1 cycles + 1.5 cycles
    = 2.6

So 58% of the time (1.5/2.6) the processor is stalled waiting for memory. A 1% instruction miss rate would add an additional 0.5 to the CPI (0.01 x 50 cycles).

Cache block conflict

E.g., run this code on a machine with an 8-word cache:

          addi $t0, $0, 5
    loop: beq  $t0, $0, done
          lw   $t1, 0x4($0)
          lw   $t2, 0x24($0)
          addi $t0, $t0, -1
          j    loop
    done:

What is the miss rate?

In a row-major arrangement, the elements of a 3x3 matrix A, for example, are stored in memory like:

    A[0][0] A[0][1] A[0][2] A[1][0] A[1][1] A[1][2] A[2][0] A[2][1] A[2][2]

In a column-major arrangement, the elements of a 3x3 matrix A, for example, are stored in memory like:

    A[0][0] A[1][0] A[2][0] A[0][1] A[1][1] A[2][1] A[0][2] A[1][2] A[2][2]

Therefore, the order of looping through the matrix indices is essential for spatial locality.

Locality (with row-major storage):

    sum = 0;
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            sum += A[i][j];

No locality (with row-major storage):

    sum = 0;
    for (j = 0; j < 3; j++)
        for (i = 0; i < 3; i++)
            sum += A[i][j];

Advanced issues (CISC 360)
- Cope with cache block conflicts: multiway associative cache
- Write data
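To make the loop-order point above concrete, the following minimal sketch counts misses for the two traversal orders of a row-major matrix. Everything specific here is an assumption made for illustration and not taken from the slides: a 16x16 matrix of 4-byte words starting at address 0, a direct-mapped cache with 8 sets and four-word blocks, and the hypothetical helper names `count_misses` and `element_addresses`.

```python
def count_misses(addresses, num_sets=8, words_per_block=4, word_bytes=4):
    """Count misses in a direct-mapped cache that starts empty (assumed geometry)."""
    block_bytes = words_per_block * word_bytes
    tags = [None] * num_sets              # None marks an empty set
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        if tags[index] != tag:            # empty set or another block: miss
            misses += 1
            tags[index] = tag
    return misses


def element_addresses(n, row_by_row, word_bytes=4):
    """Byte addresses of A[i][j] in visiting order; A is stored row-major at address 0."""
    if row_by_row:
        order = [(i, j) for i in range(n) for j in range(n)]   # j varies fastest
    else:
        order = [(i, j) for j in range(n) for i in range(n)]   # i varies fastest
    return [(i * n + j) * word_bytes for i, j in order]


N = 16  # assumed matrix dimension, larger than the cache so blocks get replaced
row_misses = count_misses(element_addresses(N, row_by_row=True))
col_misses = count_misses(element_addresses(N, row_by_row=False))
print(f"row by row:       {row_misses}/{N * N} misses")
print(f"column by column: {col_misses}/{N * N} misses")
```

Under these assumptions the row-by-row order misses once per four-word block (64 of 256 accesses), while the column-by-column order misses on every access (256 of 256), since each step jumps a full row ahead and the fetched block is replaced before its neighbors are used. The exact counts depend on the assumed matrix and cache sizes.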


