Adv Computer Architecture (ECE 4100)
These class notes were uploaded by Cassidy Effertz on Monday, November 2, 2015. They belong to ECE 4100 at Georgia Institute of Technology - Main Campus, taught by Hsien-Hsin Lee in the Fall term.
Date Created: 11/02/15
Memory Hierarchy in a Multiprocessor
(Figure: shared cache; bus-based shared memory; fully-connected shared memory ("dancehall"); interconnection network.)

Cache Coherency
- The closest cache level is private.
- Multiple copies of a cache line can be present across different processor nodes.
- Local updates lead to an incoherent state.
  - The problem exists in both write-through and write-back caches.
- Bus-based: updates are globally visible.
- Point-to-point interconnect: updates are visible only to the communicating processor nodes.

Example: Write-back Cache (figure)

Example: Write-through Cache (figure)

Defining Coherence
An MP (multiprocessor) is coherent if the results of any execution of a program can be reconstructed by a hypothetical serial order.
Implicit definition of coherence:
- Write propagation: writes are visible to other processes.
- Write serialization: all writes to the same location are seen in the same order by all processes. (Extending this to all locations is called write atomicity.)
  - E.g., if w1 followed by w2 is seen by a read from P1, the two writes will be seen in the same order by all reads by other processors Pi.

Sounds Easy?
Initially A = 0 and B = 0. At T1, one processor writes A = 1 and another writes B = 2. As the updates propagate (T2, T3), one processor sees A's update before B's, while another sees B's update before A's.

Bus Snooping Based on Write-Through Cache
- All writes appear as transactions on the shared bus to memory.
- Two protocols:
  - Update-based protocol
  - Invalidation-based protocol

Bus Snooping: Update-based Protocol on Write-Through Cache
- Each processor's cache controller constantly snoops on the bus.
- Update local copies upon a snoop hit.

Bus Snooping: Invalidation-based Protocol on Write-Through Cache
(Figure legend: bus transaction; bus snoop.)
- Each processor's cache controller constantly snoops on the bus.
- Invalidate local copies upon a snoop hit.

A Simple Snoopy Coherence Protocol for a WT, No-Write-Allocate Cache
(State diagram; each arc is labeled "observed event / transaction generated".)
- Valid (V): PrRd / --; PrWr / BusWr; an observed BusWr / -- moves the line to I.
- Invalid (I): PrRd / BusRd moves the line to V; PrWr / BusWr leaves it in I (no write allocate).
Legend: processor-initiated transactions vs. bus-snooper-initiated transactions.

How About a Write-back Cache?
- WB caches reduce the bandwidth requirement.
- The majority of local writes are hidden behind the processor nodes.
- How to snoop?
- Write ordering?

Cache Coherence Protocols for WB Caches
- A cache has an exclusive copy of a line if
  - it is the only cache having a valid copy;
  - memory may or may not have it.
- Modified (dirty) cache line:
  - The cache having the line is the owner of the line, because it must supply the block.

Cache Coherence Protocol: Update-based Protocol on Write-back Cache
(Figure: a store to X is sent as an update bus transaction to the other sharers.)
- Update data for all processor nodes that share the same data.
- If a processor node keeps updating the same memory location, a lot of traffic is incurred.

Cache Coherence Protocol: Invalidation-based Protocol on Write-back Cache
(Figure: stores to X generate bus transactions that the other caches snoop.)
- Invalidate the data copies for the sharing processor nodes.
- Reduced traffic when a processor node keeps updating the same memory location.

MSI Write-back Invalidation Protocol
- Modified (dirty): only this cache has a valid copy.
- Shared: memory is consistent; one or more caches have a valid copy.
- Invalid.
- Write-back protocol: a cache line can be written multiple times before memory is updated.

MSI Write-back Invalidation Protocol (continued)
- Two types of requests from the processor: PrRd and PrWr.
- Three types of bus transactions posted by the cache controller:
  - BusRd: a PrRd misses the cache; memory or another cache supplies the line.
  - BusRdX (eXclusive read, read-to-own): a PrWr is issued to a line that is not in the Modified state.
  - BusWB: write-back due to replacement; the processor is not directly involved in initiating this operation.

MSI: Processor Requests (processor-initiated transitions)
- I: PrRd / BusRd → S; PrWr / BusRdX → M.
- S: PrRd / --; PrWr / BusRdX → M.
- M: PrRd / --; PrWr / --.

MSI: Bus Transactions (bus-snooper-initiated transitions)
- M: BusRd / Flush → S; BusRdX / Flush → I.
- S: BusRd / --; BusRdX / -- → I.
- Flush puts the data on the bus; both memory and the requestor grab the copy.
- The requestor gets the data either by a cache-to-cache transfer or from memory.

MSI: Another Possible Valid Implementation
- M: BusRd / Flush → I (instead of S), anticipating no more reads from this processor.
- A performance concern: this saves the "invalidation trip" if the requesting cache writes the shared line later.

Full MSI Write-back Invalidation Protocol
(State diagram combining the processor-initiated and bus-snooper-initiated transitions above.)

MSI Example
(Figure sequence: a series of BusRd and BusRdX transactions on line X, showing each cache's state and the memory copy of X.)

MESI Write-back Invalidation Protocol
- Reduces two types of unnecessary bus transactions:
  - a BusRdX that snoops and converts the block from S to M when you are the sole owner of the block;
  - a BusRd that gets the line in the S state when there are no sharers (which leads to the overhead above).
- Introduces the Exclusive state: one can write to the copy without generating a BusRdX.
- Illinois Protocol: proposed by Papamarcos and Patel in 1984; employed in Intel, PowerPC, and MIPS designs.

MESI: Processor Requests (Illinois Protocol)
- I: PrRd / BusRd(S) → S; PrRd / BusRd(not S) → E; PrWr / BusRdX → M.
- E: PrRd / --; PrWr / -- → M.
- S: PrRd / --; PrWr / BusRdX → M.
- M: PrRd / --; PrWr / --.
S = shared signal, asserted on the bus when another cache holds a copy.

MESI: Bus Transactions (Illinois Protocol)
- M: BusRd / Flush → S; BusRdX / Flush → I.
- E: BusRd / Flush' → S; BusRdX / Flush' → I.
- S: BusRd / Flush'; BusRdX / Flush' → I.
- Flush': flush for the data supplier; no action for the other sharers.
- Use a selection algorithm if there are multiple suppliers.
  - Most MESI implementations simply write to memory to supply the data.
  - Alternative: add an O state or force an update of memory.

Full MESI (Illinois) Protocol
(State diagram combining the processor-initiated and bus-snooper-initiated transitions above; S is the shared signal.)

MOESI Protocol
- Adds one additional state: Owner (O).
  - Similar to the Shared state.
  - The processor holding the line in O is responsible for supplying the data (the copy in memory may be stale).
- Employed by Sun UltraSPARC and AMD Opteron.
  - In the dual-core Opteron, cache-to-cache transfer is done through a system request interface (SRI) running at full CPU speed.
(Figure: two CPUs connected through the System Request Interface and a crossbar to the memory controller and HyperTransport.)

Cache State Transitions Based on CPU Requests
- Invalid: CPU read: place read miss on bus → Shared; CPU write: place write miss on bus → Exclusive.
- Shared: CPU read hit; CPU read miss: place read miss on bus; CPU write: place write miss on bus → Exclusive.
- Exclusive: CPU read hit and CPU write hit; CPU write miss: write back cache block, place write miss on bus; CPU read miss: write back block, place read miss on bus → Shared.
(From S. Yalamanchili)

Cache State Transitions Based on Bus Requests
- Shared: invalidate for this block, or write miss for this block → Invalid.
- Exclusive: write miss for this block: write back block, abort memory access → Invalid.
- Exclusive: read miss for this block: write back block, abort memory access → Shared.
(From S. Yalamanchili)

Implication on Multi-Level Caches
- How do you guarantee coherence in a multi-level cache hierarchy?
  - Snoop all cache levels? Intel's 8870 chipset has a "snoop filter" for quad-core.
  - Maintain the inclusion property: ensure that data in the inner level (e.g., L1) is also present in the outer level (e.g., L2).
    - Then only the outermost level (e.g., L2) needs to be snooped.
    - L2 needs to know when L1 has write hits:
      - use a write-through L1 cache, or
      - use write-back but maintain another "modified-but-stale" bit in L2.

Inclusion Property
- Not so easy:
  - Replacement: different levels observe different access activity, e.g., L2 may replace a line that is frequently accessed in L1.
  - Split L1 caches: imagine all caches are direct-mapped.
  - Different cache line sizes.

Inclusion Property (continued)
- Use specific cache configurations, e.g., a direct-mapped L1 plus a bigger direct-mapped or set-associative L2 with the same cache line size.
- Explicitly propagate L2 actions to L1:
  - An L2 replacement flushes the corresponding L1 line.
  - An observed BusRdX bus transaction invalidates the corresponding L1 line.
  - To avoid excess traffic, L2 maintains an inclusion bit for filtering (indicating whether the line is in L1).

Directory-based Coherence Protocol
(Figure: each memory block has a modified bit and presence bits, one for each node.)
- Snooping-based protocol:
  - N transactions for an N-node MP.
  - All caches need to watch every memory request from each processor.
  - Not a scalable solution for maintaining coherence in large shared-memory systems.
- Directory protocol:
  - Directory-based control of who has what.
  - HW overhead to keep the directory (~ # lines × # processors).

Directory-based Coherence Protocol (Full Map)
(Figure: processors with caches connected through an interconnection network to memory.)
- 1 modified bit for each cache block in memory.
- 1 presence bit for each processor, for each cache block in memory.

Directory-based Coherence Protocol (Limited Directory)
(Figure: processors P0, P1, ... connected through an interconnection network.)
- 1 modified bit for each cache block in memory.
- Presence encoding is NULL or not.
- Encoded presence bits (log2 N bits per pointer); in this example, each cache line can reside in at most 2 processors.

Distributed Directory Coherence Protocol
- A centralized directory is less scalable (contention).
- Distributed shared memory (DSM) for a large MP system.
- The interconnection network is no longer a shared bus.
- Maintain cache coherence (CC-NUMA).
- Each address has a "home".

Some Additional Concepts
- Local node: generates the memory reference (the request).
- Remote node: has a copy of the block.
- Home node: the directory/memory location of the memory reference.
- Messages are received in the order sent.
- The directory entry indicates the state of cached blocks and the members of the sharing set.
(From S. Yalamanchili)

Stanford DASH
- 4 CPUs in each cluster, 16 clusters in total.
- Invalidation-based cache coherence.
- The directory keeps one of 3 states for each cache block at its home node:
  - Uncached
  - Shared (unmodified state)
  - Dirty
(Figure: clusters with snoop buses and directories, connected by an interconnection network.)

DASH Memory Hierarchy
- Processor level
- Local cluster level
- Home cluster level (the address is at home; if dirty, the data must be obtained from the remote node that owns it)
- Remote cluster level

Directory Coherence Protocol: Read Miss
(Figure: a read miss on Z; data Z is shared (clean). After the miss is serviced, data Z is shared by 3 nodes.)

Directory Coherence Protocol: Write Miss
(Figure: a write miss on Z; the write to Z can proceed in P0.)

Directory Protocol: Some General Features
- The sharing set is the set of processors with a copy of a memory block.
- Implementation:
  - bit vectors and fully mapped entries;
  - linked lists.
- When using a linked directory:
  - update messages are propagated;
  - the requester is added to the head of the list.
- Invalidations reduce the size of the sharing set, while updates increase its size.
- Updates reduce new requests for the line; invalidations increase network traffic.
(From S. Yalamanchili)
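The simple snoopy protocol for a write-through, no-write-allocate cache described above needs only two states per line, Valid and Invalid. A minimal Python sketch, assuming a single cache line, an atomic bus that delivers every transaction to all other caches, and hypothetical class names (`Bus`, `WTCache`); it tracks states and bus traffic only, not data values:

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    VALID = "V"


class Bus:
    """Hypothetical atomic bus: every transaction is seen by all other caches."""

    def __init__(self):
        self.caches = []
        self.log = []  # (issuer, transaction) in bus order

    def broadcast(self, src, txn):
        self.log.append((src.name, txn))
        for cache in self.caches:
            if cache is not src:
                cache.snoop(txn)


class WTCache:
    """State of one line in a write-through, no-write-allocate cache."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.state = State.INVALID
        bus.caches.append(self)

    def pr_rd(self):
        if self.state is State.INVALID:
            self.bus.broadcast(self, "BusRd")  # PrRd / BusRd: I -> V
            self.state = State.VALID
        # PrRd / -- in V: read hit, no bus transaction

    def pr_wr(self):
        # PrWr / BusWr in both states: every write goes through to memory.
        # No write allocate: a write miss leaves the line Invalid.
        self.bus.broadcast(self, "BusWr")

    def snoop(self, txn):
        if txn == "BusWr":
            self.state = State.INVALID  # observed BusWr / --: V -> I
```

Because every store appears on the bus, invalidating on a snooped BusWr is enough to keep the copies coherent; the price is one bus transaction per write, which is the bandwidth cost that write-back caches aim to reduce.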
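The update- and invalidation-based write-back protocols above differ mainly in how much bus traffic repeated stores generate. The back-of-the-envelope model below is an illustrative assumption, not taken from the slides: one processor (P0) stores to a line shared by two idle sharers several times in a row, and each broadcast counts as one bus transaction. The function name `simulate` is hypothetical.

```python
def simulate(protocol, num_writes, sharers=("P1", "P2")):
    """Return (bus transactions, remaining sharers) after P0 stores to one
    shared line num_writes times in a row while the sharers stay idle.
    Simplified model: one bus transaction per broadcast."""
    if protocol not in ("update", "invalidate"):
        raise ValueError(f"unknown protocol: {protocol}")
    remaining = set(sharers)  # other caches still holding a copy
    owned = False             # P0 holds the line in the Modified state
    transactions = 0
    for _ in range(num_writes):
        if protocol == "update":
            transactions += 1     # every store is broadcast to the sharers
        elif not owned:
            transactions += 1     # one BusRdX invalidates all other copies
            remaining.clear()
            owned = True
        # once Modified, further stores hit locally with no bus traffic
    return transactions, len(remaining)


print(simulate("update", 10))      # (10, 2)
print(simulate("invalidate", 10))  # (1, 0)
```

The same model also illustrates the point from the directory slides: invalidations shrink the sharing set, while updates keep every sharer on it.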
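The MSI processor-request and bus-transaction transitions above can be combined into one per-line state machine. A minimal Python sketch, assuming one cache line, an atomic bus that delivers each transaction to every other cache, and hypothetical names (`Bus`, `MSICache`, `pr_rd`, `pr_wr`, `evict`); data values are not modeled, only states and the order of bus activity:

```python
from enum import Enum


class MSI(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


class Bus:
    """Hypothetical atomic bus shared by all MSI caches."""

    def __init__(self):
        self.caches = []
        self.log = []  # (cache, action) in the order it appeared on the bus

    def issue(self, src, txn):
        self.log.append((src.name, txn))
        for cache in self.caches:
            if cache is not src:
                cache.snoop(txn)


class MSICache:
    def __init__(self, name, bus):
        self.name, self.bus, self.state = name, bus, MSI.I
        bus.caches.append(self)

    # Processor-initiated transitions
    def pr_rd(self):
        if self.state is MSI.I:
            self.bus.issue(self, "BusRd")   # PrRd / BusRd: I -> S
            self.state = MSI.S
        # S or M: read hit, PrRd / --

    def pr_wr(self):
        if self.state is not MSI.M:
            self.bus.issue(self, "BusRdX")  # PrWr / BusRdX: I or S -> M
            self.state = MSI.M
        # M: write hit, PrWr / --

    def evict(self):
        if self.state is MSI.M:
            self.bus.issue(self, "BusWB")   # write back the dirty line
        self.state = MSI.I

    # Bus-snooper-initiated transitions
    def snoop(self, txn):
        if self.state is MSI.M and txn in ("BusRd", "BusRdX"):
            self.bus.log.append((self.name, "Flush"))  # supply the dirty data
        if txn == "BusRd" and self.state is MSI.M:
            self.state = MSI.S              # BusRd / Flush: M -> S
        elif txn == "BusRdX":
            self.state = MSI.I              # BusRdX / Flush (from M) or -- (from S)
```

Changing the `BusRd` case in `snoop` so that M moves directly to I gives the alternative valid implementation mentioned in the slides.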
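The Illinois MESI transitions above add the Exclusive state and the shared signal S. A minimal Python sketch, assuming one cache line, an atomic bus whose shared signal is the OR of every other cache's response, and hypothetical names (`Bus`, `MESICache`, `pr_rd`, `pr_wr`). As a simplification, only a Modified owner flushes and clean data is assumed to come from memory, so the Flush' cache-to-cache path and supplier selection are not modeled:

```python
from enum import Enum


class MESI(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class Bus:
    """Hypothetical atomic bus with a wired-OR shared signal."""

    def __init__(self):
        self.caches = []
        self.log = []

    def issue(self, src, txn):
        """Broadcast txn and return the shared signal S."""
        self.log.append((src.name, txn))
        shared = False
        for cache in self.caches:
            if cache is not src:
                shared = cache.snoop(txn) or shared
        return shared


class MESICache:
    def __init__(self, name, bus):
        self.name, self.bus, self.state = name, bus, MESI.I
        bus.caches.append(self)

    def pr_rd(self):
        if self.state is MESI.I:
            shared = self.bus.issue(self, "BusRd")
            # PrRd / BusRd(S) -> S; PrRd / BusRd(not S) -> E
            self.state = MESI.S if shared else MESI.E

    def pr_wr(self):
        if self.state is MESI.E:
            self.state = MESI.M             # PrWr / --: no BusRdX needed
        elif self.state in (MESI.I, MESI.S):
            self.bus.issue(self, "BusRdX")  # PrWr / BusRdX -> M
            self.state = MESI.M

    def snoop(self, txn):
        """Apply a snooped transaction; return True to assert S."""
        if self.state is MESI.I:
            return False
        if self.state is MESI.M:
            self.bus.log.append((self.name, "Flush"))
        if txn == "BusRd":
            self.state = MESI.S             # M, E, S -> S
        elif txn == "BusRdX":
            self.state = MESI.I             # M, E, S -> I
        return True
```

A first read by a lone processor lands in E, so its later write upgrades to M silently; that is exactly the BusRdX the Exclusive state was introduced to remove.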
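The explicit propagation of L2 actions to L1, filtered by an inclusion bit, can be sketched as follows. This is a minimal Python model with hypothetical names (`InclusiveL2`, `fill`, `l1_evict`, `evict`, `snoop_busrdx`); it tracks only which addresses are present, not sets, ways, or data, and counts how many back-invalidations actually reach L1:

```python
class InclusiveL2:
    """Hypothetical inclusive L2 with a per-line inclusion bit ("in L1?")."""

    def __init__(self):
        self.lines = {}     # address -> inclusion bit
        self.l1 = set()     # addresses currently cached in L1
        self.l1_msgs = 0    # back-invalidations sent up to L1

    def fill(self, addr, into_l1=True):
        self.lines[addr] = into_l1
        if into_l1:
            self.l1.add(addr)

    def l1_evict(self, addr):
        # L1 dropped the line on its own: clear the inclusion bit.
        self.l1.discard(addr)
        if addr in self.lines:
            self.lines[addr] = False

    def _back_invalidate(self, addr):
        # Only disturb L1 when the inclusion bit says the line is there.
        if self.lines.get(addr, False):
            self.l1.discard(addr)
            self.l1_msgs += 1

    def evict(self, addr):
        # An L2 replacement flushes the corresponding L1 line.
        self._back_invalidate(addr)
        self.lines.pop(addr, None)

    def snoop_busrdx(self, addr):
        # An observed BusRdX invalidates the line in L2 and, if present, in L1.
        if addr in self.lines:
            self._back_invalidate(addr)
            del self.lines[addr]
```

Without the inclusion bit, every L2 eviction and every snooped BusRdX would have to probe L1; the bit filters out probes for lines that L1 no longer holds.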
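The full-map directory above keeps one presence bit per processor plus one modified bit for each memory block. A minimal Python sketch of how such an entry at the home node might service the read-miss and write-miss cases from the slides; the class name `DirectoryEntry` and the message names (`data`, `fetch`, `invalidate`, `fetch_invalidate`) are hypothetical, and network ordering, acknowledgments, and transient states are not modeled:

```python
class DirectoryEntry:
    """Full-map directory entry for one memory block at its home node."""

    def __init__(self, num_nodes):
        self.presence = [False] * num_nodes  # one presence bit per node
        self.modified = False                # the single modified (dirty) bit

    def sharers(self):
        return [node for node, present in enumerate(self.presence) if present]

    def read_miss(self, requester):
        msgs = []
        if self.modified:
            owner = self.sharers()[0]        # a dirty block has exactly one holder
            msgs.append(("fetch", owner))    # owner returns the dirty data
            self.modified = False            # block becomes shared (clean)
        self.presence[requester] = True
        msgs.append(("data", requester))
        return msgs

    def write_miss(self, requester):
        msgs = []
        for node in self.sharers():
            if node != requester:
                kind = "fetch_invalidate" if self.modified else "invalidate"
                msgs.append((kind, node))
                self.presence[node] = False
        self.presence[requester] = True
        self.modified = True
        msgs.append(("data", requester))
        return msgs


d = DirectoryEntry(4)
for node in (1, 2, 3):
    d.read_miss(node)   # Z becomes shared (clean) by 3 nodes
print(d.write_miss(0))  # [('invalidate', 1), ('invalidate', 2), ('invalidate', 3), ('data', 0)]
```

The usage lines follow the read-miss and write-miss slides: three nodes share Z, and P0's write can proceed once the directory has invalidated the other copies.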
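The directory slides note a hardware overhead of roughly (# lines × # processors) for a full map, and a limited directory that stores encoded log2 N-bit pointers instead. A small calculator, assuming a 64-byte block, one modified bit per block, and (as a hypothetical encoding choice) one valid bit per pointer so that a pointer can encode NULL; the function names are illustrative:

```python
def pointer_bits(num_nodes):
    """Bits needed to name one of num_nodes nodes: ceil(log2 N), at least 1."""
    return max(1, (num_nodes - 1).bit_length())


def full_map_bits(num_nodes):
    """Full map: one presence bit per node plus one modified bit."""
    return num_nodes + 1


def limited_bits(num_nodes, num_pointers):
    """Limited directory: one modified bit plus num_pointers encoded node IDs,
    each with an assumed valid bit so a pointer can encode NULL."""
    return 1 + num_pointers * (pointer_bits(num_nodes) + 1)


def overhead(dir_bits, block_bytes=64):
    """Directory bits as a fraction of the data bits in one block."""
    return dir_bits / (block_bytes * 8)


for n in (16, 64, 256):
    full, limited = full_map_bits(n), limited_bits(n, 2)
    print(f"N={n:3d}: full map {full:3d} bits ({overhead(full):.1%}), "
          f"2 pointers {limited:2d} bits ({overhead(limited):.1%})")
```

Under these assumptions the full map grows linearly with N (257 bits, about half the data size, at 256 nodes), while two pointers grow only with log2 N (19 bits, under 4%), at the cost of limiting how many caches can hold the line at once.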