Computer Architecture CS 6810
This 22-page set of class notes was uploaded by Marian Kertzmann DVM on Monday, October 26, 2015. The notes belong to CS 6810 at the University of Utah, taught by Alan Davis in the Fall term.
Date Created: 10/26/15
Lecture 25: Interconnection Networks, Disks
Topics: flow control, router microarchitecture, RAID

Virtual Channel Flow Control
Each switch has multiple virtual channels per physical channel. Each virtual channel keeps track of the output channel assigned to the head flit and pointers to its buffered packets. A head flit must allocate the same three resources in the next switch before being forwarded. Having multiple virtual channels per physical channel allows two different packets to use the channel, so the resource is not wasted when one packet is idle.

Example
Wormhole: A is going from Node1 to Node4; B is going from Node0 to Node5.
[Figure: Node0 through Node5 in a row; under wormhole flow control some links sit idle while a packet is blocked (no free VCs/buffers).]
Virtual channel: [Figure: the same nodes with virtual channels; a packet is marked blocked (no free VCs/buffers).]
Traffic analogy: B is trying to make a left turn and A is trying to go straight. There is no left-only lane with wormhole, but there is one with VCs.

Buffer Management
Credit-based: the upstream node keeps track of the number of free buffers in the downstream node, and the downstream node sends back signals to increment the count when a buffer is freed. There must be enough buffers to hide the round-trip latency.
On/Off: the downstream node signals upstream when its buffers are close to being full (and again once space frees up). This reduces upstream signaling and counters, but can waste buffer space.

Deadlock Avoidance with VCs
VCs provide another way to number the links such that a route always uses ascending link numbers.
[Figure: link numbering across two VC planes.]
Alternatively, use West-first routing on the 1st plane and cross over to the 2nd plane in case you need to go West again (the 2nd plane uses North-last, for example).

Router Functions
Crossbar, buffers, arbiter, VC state and allocation, buffer management, ALUs, control logic.
Typical on-chip network power breakdown: 30% link, 30% buffers, 30% crossbar.

Virtual Channel Router
Buffers and channels are allocated per flit. Each physical channel is associated with multiple virtual channels; the virtual channels are allocated per packet.
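The virtual-channel and credit-based buffer management ideas above can be combined in a small simulation. This is a hypothetical sketch, not a router implementation: `CreditLink`, `send_one`, `drain`, and the buffer depth of 2 are illustrative names and parameters. It also assumes a credit returns in the same cycle the downstream buffer slot is freed; in hardware each credit takes a round trip, which is why the buffers must be deep enough to hide that latency. Packet A holds VC 0 but is blocked downstream, yet packet B keeps using the shared physical channel on VC 1.

```python
from collections import deque


class CreditLink:
    """One physical channel shared by several virtual channels (VCs),
    with credit-based flow control toward the downstream router.

    Assumption for this sketch: a credit returns to the upstream side in the
    same cycle the downstream router frees the buffer slot (no round trip).
    """

    def __init__(self, num_vcs, depth):
        self.depth = depth                                   # flit slots per downstream VC buffer
        self.credits = [depth] * num_vcs                     # upstream view of free downstream slots
        self.waiting = [deque() for _ in range(num_vcs)]     # flits queued upstream, per VC
        self.downstream = [deque() for _ in range(num_vcs)]  # downstream VC buffers
        self.next_vc = 0                                     # round-robin pointer over VCs

    def send_one(self):
        """Put at most one flit on the physical channel this cycle.

        A VC may send only if it has a flit and at least one credit; VCs are
        scanned round-robin, so packets interleave on the shared channel.
        Returns the flit sent, or None if every VC is empty or out of credits.
        """
        n = len(self.waiting)
        for i in range(n):
            vc = (self.next_vc + i) % n
            if self.waiting[vc] and self.credits[vc] > 0:
                flit = self.waiting[vc].popleft()
                self.credits[vc] -= 1
                # Credits guarantee the downstream buffer never overflows.
                assert len(self.downstream[vc]) < self.depth
                self.downstream[vc].append(flit)
                self.next_vc = (vc + 1) % n
                return flit
        return None

    def drain(self, vc):
        """Downstream router forwards one flit of `vc` and returns a credit."""
        flit = self.downstream[vc].popleft()
        self.credits[vc] += 1
        return flit


def demo(cycles=6):
    """Packet A (VC 0) is blocked downstream; packet B (VC 1) is not."""
    link = CreditLink(num_vcs=2, depth=2)
    link.waiting[0].extend(["A0", "A1", "A2", "A3"])
    link.waiting[1].extend(["B0", "B1", "B2"])
    sent = []
    for _ in range(cycles):
        flit = link.send_one()
        if flit is not None:
            sent.append(flit)
        if link.downstream[1]:  # B's downstream VC keeps draining; A's never does
            link.drain(1)
    return sent, link


sent, link = demo()
print(sent)  # ['A0', 'B0', 'A1', 'B1', 'B2']
```

After the run, A has used both of its credits and its remaining flits wait upstream, while B finishes. With a single channel and no VCs, B would have been stuck behind the blocked packet.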
In a virtual channel router, the flits of various VCs can be interleaved on the physical channel. For a head flit to proceed, the router has to first allocate a virtual channel on the next router. For any flit to proceed (including the head), the router has to allocate the following resources: buffer space in the next router (credits indicate the available space) and access to the physical channel.

Router Pipeline
Four typical stages:
RC (routing computation): the head flit indicates the VC that it belongs to, the VC state is updated, the headers are examined, and the next output channel is computed (note: this is done for all the head flits arriving on the various input channels).
VA (virtual-channel allocation): the head flits compete for the available virtual channels on their computed output channels.
SA (switch allocation): a flit competes for access to its output physical channel.
ST (switch traversal): the flit is transmitted on the output channel.
A head flit goes through all four stages; the other flits do nothing in the first two stages (this is an in-order pipeline, and flits cannot jump ahead). A tail flit also deallocates the VC.
In short: RC computes the output channel, VA allocates a VC for the head flit, SA competes for the output physical channel, and ST transfers data on the output physical channel.
[Figure: pipeline diagram over cycles 1-7 for a head flit, body flit 1, body flit 2, and a tail flit, including a STALL.]

Speculative Pipelines
Perform VA and SA in parallel. Note that SA only requires knowledge of the output physical channel, not the VC. If VA fails, the successfully allocated channel goes unused.
[Figure: speculative pipeline diagram over cycles 1-7 for a head flit, body flit 1, body flit 2, and a tail flit.]
Perform VA, SA, and ST in parallel: this can cause collisions and retries.
Typically VA is the critical path, so SA and ST can possibly be performed sequentially.
Router pipeline latency is a greater bottleneck when there is little contention, and when there is little contention, speculation will likely work well.
Pushing speculation further leads to a single-stage pipeline.
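The zero-load timing in the pipeline diagrams above can be reproduced with a short script, which makes the benefit of speculation concrete. This is a sketch under simplifying assumptions (no contention, no stalls, every speculation succeeds); `schedule`, `packet_cycles`, and the stage-group lists are hypothetical names. Under these assumptions a packet of N flits leaves a D-stage router in D + N - 1 cycles, so merging two stages saves one cycle at every router the packet crosses.

```python
# Stage groups: each inner list holds the stages done in the same cycle.
FOUR_STAGE = [["RC"], ["VA"], ["SA"], ["ST"]]
SPEC_VA_SA = [["RC"], ["VA", "SA"], ["ST"]]  # VA and SA speculatively in parallel


def schedule(groups, num_flits):
    """Per-flit map of cycle -> stage(s) for one packet at zero load.

    The head flit goes through every stage group, one group per cycle.
    Body and tail flits skip RC and VA; because the pipeline is in order,
    each does SA one cycle after the flit ahead of it, then ST.
    """
    sa = next(i for i, g in enumerate(groups) if "SA" in g)
    st = next(i for i, g in enumerate(groups) if "ST" in g)
    rows = [{cycle: "+".join(g) for cycle, g in enumerate(groups, start=1)}]
    for k in range(1, num_flits):
        sa_cycle, st_cycle = sa + 1 + k, st + 1 + k
        if sa_cycle == st_cycle:
            rows.append({sa_cycle: "SA+ST"})
        else:
            rows.append({sa_cycle: "SA", st_cycle: "ST"})
    return rows


def packet_cycles(groups, num_flits):
    """Cycle in which the tail flit finishes ST (depth + flits - 1)."""
    return max(schedule(groups, num_flits)[-1])


def print_table(groups, names=("Head", "Body 1", "Body 2", "Tail")):
    rows = schedule(groups, len(names))
    last = max(max(r) for r in rows)
    print("Cycle   " + "".join(f"{c:>6}" for c in range(1, last + 1)))
    for name, row in zip(names, rows):
        print(f"{name:<8}" + "".join(f"{row.get(c, ''):>6}" for c in range(1, last + 1)))


print_table(FOUR_STAGE)
print(packet_cycles(FOUR_STAGE, 4), packet_cycles(SPEC_VA_SA, 4))  # 7 6
```

The four-stage table ends at cycle 7, matching the 7-cycle span of the diagram; with VA and SA merged, the same packet finishes one cycle earlier.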
Intel Router
A recent Intel router, used for a 6x6 mesh: 16 B, > 3 GHz, wormhole with VC flow control.
[Figure: router power breakdown (arbiter, clock, buffers, crossbar) and router area breakdown (crossbar, buffers, other components).]
[Figure: router power versus link power.]
Speculative 4-stage pipeline: buffer read is not in parallel with switch arbitration, and crossbar traversal sets the cycle time.
[Figure: pipeline stages including switch arbitration, VC allocation, buffer read, and crossbar traversal.]
Source: Partha Kundu, "On-Die Interconnects for Next-Generation CMPs," talk at the On-Chip Interconnection Networks Workshop, Dec. 2006.

Magnetic Disks
A magnetic disk consists of 1-12 platters (metal or glass disks covered with magnetic recording material on both sides), with diameters between 1 and 3.5 inches. Each platter comprises concentric tracks (5-30K), and each track is divided into sectors (100-500 per track, each about 512 bytes). A movable arm holds the read/write heads for each disk surface and moves them all in tandem, so one cylinder of data is accessible at a time.

Disk Latency
To read or write data, the arm has to be placed on the correct track. This seek time usually takes 5 to 12 ms on average, and can take less if there is spatial locality.
Rotational latency is the time taken to rotate the correct sector under the head. On average the head waits half a rotation: 2 ms at 15,000 RPM, and more for slower disks.
Transfer time is the time taken to transfer a block of bits out of the disk; it depends on the transfer rate, typically 3-65 MB/second.
A disk controller maintains a disk cache (so spatial locality can be exploited) and sets up the transfer on the bus (controller overhead).

RAID
Reliability and availability are important metrics for disks.
RAID: redundant array of inexpensive (independent) disks. Redundancy can deal with one or more failures.
Each sector of a disk records check information that allows it to determine whether the disk has an error; in other words, redundancy already exists within a disk. When a disk read flags an error, we turn elsewhere for the correct data.
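Returning to the disk-latency components listed earlier: they simply add up to an average access time. The worked calculation below is a sketch with assumed drive parameters; the 6 ms seek, 50 MB/s transfer rate, and 0.2 ms controller overhead are hypothetical values chosen within or near the quoted ranges, not measurements, and 1 MB is taken as 10^6 bytes.

```python
def avg_rotational_latency_ms(rpm):
    """On average the desired sector is half a revolution away."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2


def transfer_time_ms(num_bytes, rate_mb_per_s):
    """Time to stream num_bytes off the platter (1 MB taken as 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 1e6) * 1e3


def avg_access_time_ms(seek_ms, rpm, num_bytes, rate_mb_per_s, controller_ms):
    """Seek + rotational latency + transfer + controller overhead."""
    return (seek_ms
            + avg_rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_s)
            + controller_ms)


# Hypothetical drive: 6 ms average seek, 15,000 RPM, 50 MB/s, 0.2 ms controller overhead
t = avg_access_time_ms(seek_ms=6, rpm=15_000, num_bytes=512,
                       rate_mb_per_s=50, controller_ms=0.2)
print(f"{t:.3f} ms")                                 # 8.210 ms
print(f"{avg_rotational_latency_ms(7_200):.2f} ms")  # 4.17 ms
```

For a single 512-byte sector, seek and rotation dominate and transfer time is only about 0.01 ms, which is why exploiting spatial locality (shorter seeks, controller caching) matters more than raw transfer rate for small requests.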
RAID 0 and RAID 1
RAID 0 has no additional redundancy (a misnomer): it uses an array of disks and stripes (interleaves) data across the array to improve parallelism and throughput.
RAID 1 mirrors (shadows) every disk: every write happens to two disks. Reads may go to the mirror only when the primary disk fails, or you may read both together and accept the quicker response. This is an expensive solution: high reliability at twice the cost.

RAID 3
Data is bit-interleaved across several disks, and a separate disk maintains parity information for a set of bits. For example, with 8 data disks, bit 0 is on disk0, bit 1 is on disk1, ..., bit 7 is on disk7, and disk8 maintains parity for all 8 bits. For any read, all 8 data disks must be accessed (we usually read more than a byte at a time), and for any write, all 9 disks must be accessed, as parity has to be recalculated.
High throughput for a single request, low cost for redundancy (overhead: 1/8 = 12.5%), low task-level parallelism.

RAID 4 and RAID 5
Data is block-interleaved: this allows us to get all our data from a single disk on a read (in case of a disk error, read all 9 disks). Block interleaving reduces throughput for a single request (only a single disk drive services the request) but improves task-level parallelism, as the other disk drives are free to service other requests.
On a write, we access the disk that stores the data and the parity disk. Parity information can be updated simply by checking whether the new data differs from the old data: new parity = old parity XOR old data XOR new data.

RAID 5
If we have a single disk for parity, multiple writes cannot happen in parallel, as all writes must update parity information. RAID 5 distributes the parity blocks across the disks to allow simultaneous writes.

RAID Summary
RAID 1-5 can tolerate a single fault. Mirroring (RAID 1) has a 100% overhead, while parity (RAID 3, 4, 5) has modest overhead.
Multiple faults can be tolerated by having multiple check functions; each additional check can cost an additional disk (RAID 6).
RAID 6 and RAID 2 (memory-style ECC) are not commercially employed.
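The parity check function used by RAID 3, 4, and 5 is a bitwise XOR across the disks of a stripe. The sketch below works at block granularity (as in RAID 4/5) and shows three things from the notes: rebuilding a failed disk, the small-write parity update that touches only the data disk and the parity disk, and a rotating parity placement for RAID 5. The function names are hypothetical, and `raid5_parity_disk` is one simple rotation assumed for illustration; real controllers use several different RAID 5 layouts.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (this is the parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


def small_write_parity(old_parity, old_data, new_data):
    """RAID 4/5 write: flip each parity bit whose data bit changed.

    Needs only the data disk and the parity disk, not the whole stripe.
    """
    return xor_blocks([old_parity, old_data, new_data])


def raid5_parity_disk(stripe, num_disks):
    """Hypothetical rotating layout: parity moves down one disk index per stripe."""
    return (num_disks - 1 - stripe) % num_disks


data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]  # one stripe on three data disks
parity = xor_blocks(data)                       # b"\x69\x69"

# Disk 1 fails: XOR of the surviving blocks and parity recovers it.
assert reconstruct([data[0], data[2]], parity) == data[1]

# Small write to disk 1: update parity without reading disks 0 and 2.
new_block = b"\x00\xff"
new_parity = small_write_parity(parity, data[1], new_block)
assert new_parity == xor_blocks([data[0], new_block, data[2]])

# RAID 5 with 4 disks: parity for stripes 0..3 lands on disks 3, 2, 1, 0.
print([raid5_parity_disk(s, 4) for s in range(4)])  # [3, 2, 1, 0]
```

The small write costs two reads (old data, old parity) and two writes (new data, new parity). With a dedicated parity disk (RAID 4) every such write hits the same disk; rotating the parity location spreads that traffic, which is what lets RAID 5 service writes to different stripes in parallel.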