These 28-page class notes were uploaded by Vito Kilback on Wednesday, September 23, 2015. They belong to CS281 (Systems Architecture) at Drexel University, taught by William Mongan in the fall.
Date Created: 09/23/15
Systems Architecture I
Lecture 10: Interfacing I/O Devices to Memory, Processor, and Operating System

Jeremy R. Johnson
Anatole D. Ruslanov
William M. Mongan

This lecture was derived from material in the text (Secs. 8.5-8.7). Some or all figures from Computer Organization and Design: The Hardware/Software Interface, Third Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 2004 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED). Contributions from Mary Jane Irwin.

Lec 10, Systems Architecture II

Introduction

Objective: To learn how an I/O device communicates with a user program.
- How is a user I/O request transformed into a device command and communicated to the device?
- How is data actually transferred to or from a memory location?
- What is the role of the operating system?

Topics:
- Role of the OS
- Giving commands to the I/O system
  - I/O commands
  - Memory-mapped I/O
- Communicating with the processor
  - Polling
  - Interrupts
- Transferring data between a device and memory
  - Direct Memory Access (DMA)
- Designing an I/O system

Characteristics of I/O

The responsibilities of the OS arise from three characteristics of I/O systems:
- The I/O system is shared by multiple programs using the processor.
- I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations. Because interrupts cause a transfer to kernel or supervisor mode, they must be handled by the OS.
- The low-level control of an I/O device is complex because it requires managing a set of concurrent events and because the requirements for correct device control are often very detailed.

I/O and the Operating System

- The operating system acts as the interface between the I/O hardware and the program requesting I/O.
- To protect the shared I/O resources, the user program is not allowed to communicate directly with the I/O device.
- Thus the OS must be able to:
  - give commands to I/O devices
  - handle interrupts generated by I/O devices
  - provide equitable access to the shared I/O resources
  - schedule I/O requests to enhance system throughput
- I/O interrupts result in a transfer of processor control to the supervisor (OS) process.

Functions of the OS

- The OS guarantees that a user's program accesses only the portions of an I/O device to which the user has rights.
- The OS provides abstractions for accessing devices by supplying routines that handle low-level device operations.
- The OS handles the interrupts generated by I/O devices.
- The OS tries to provide equitable access to the shared I/O resources, as well as schedule accesses in order to enhance system throughput.

Types of Communication Required

- The OS must be able to give commands to the I/O device (e.g., read, write, disk seek).
- The device must be able to notify the OS when it has completed an operation or has encountered an error.
- Data must be transferred between memory and an I/O device.

Giving Commands to I/O Devices

- Dedicated I/O instructions (e.g., Intel 80x86)
  - The command and device number are specified in the instruction.
  - The processor communicates the device address via a set of wires included as part of the I/O bus.
  - These instructions are illegal to execute while in user mode.
- Memory-mapped I/O
  - Portions of the address space are assigned to I/O devices.
  - Commands and data are written to special addresses.
  - Data and status information are read from special addresses.
  - The memory system ignores the operation (as determined by the address).
  - The I/O controller sees the operation and transmits it to the device.

Communication of I/O Devices and Processor

How the processor directs the I/O devices:
- Special I/O instructions
  - Must specify both the device and the command.
- Memory-mapped I/O
  - Portions of the high-order memory address space are assigned to each I/O device.
  - Reads and writes to those memory addresses are interpreted as commands to the I/O devices.
  - Loads and stores to the I/O address space can only be done by the OS.

How the I/O device communicates with the processor:
- Polling: the processor periodically checks the status of an I/O device to determine its need for service.
  - The processor is totally in control, but does all the work.
  - Can waste a lot of processor time due to speed differences.
- Interrupt-driven I/O: the I/O device issues an interrupt to the processor to indicate that it needs attention.

Communicating with the Processor

- Polling
  - The simplest way for an I/O device to communicate with the processor.
  - The I/O device simply puts information in a status register, and the processor must come and get the information.
  - The processor periodically checks status bits to see if it is time for the next I/O operation.
- Interrupt-driven I/O
  - The disadvantage of polling is that it wastes a lot of processor time.
  - When a device wants to notify the processor that it has completed some operation or that it needs attention, it causes the processor to be interrupted.
  - An interrupt is similar to an exception, except that:
    - it is asynchronous with respect to instruction execution
    - the processor must be notified of the device causing the interrupt
    - interrupts must be prioritized according to the devices that caused them

Overhead of Polling

Determine the impact of polling on three different devices. Assume 400 cycles for a polling operation and a 500 MHz clock. Determine the fraction of CPU time consumed in the following three cases; assume that you poll often enough that no data is lost and that the devices are potentially always busy.
- Mouse: must be polled 30 times per second.
- Floppy disk: transfers data to the processor in 16-bit units and has a transfer rate of 50 KB/sec.
- Hard disk drive: transfers data in 4-word chunks and can transfer at 4 MB/sec.

Overhead of Polling (solution)

1. Mouse: 30 accesses per second.
   - $30 \times 400 = 12{,}000$ cycles per second for polling
   - Fraction of processor clock cycles: $12 \times 10^3 / (500 \times 10^6) \approx 0.002\%$
2. Floppy drive: $(50\ \text{KB/sec}) / (2\ \text{bytes per polling access}) = 25\text{K}$ accesses/sec.
   - $25\text{K} \times 400 = 10 \times 10^6$ cycles per second for polling
   - Fraction of processor clock cycles: $10 \times 10^6 / (500 \times 10^6) = 2\%$
3. Hard drive: $(4\ \text{MB/sec}) / (16\ \text{bytes per polling access}) = 250\text{K}$ accesses/sec.
   - $250\text{K} \times 400 = 100 \times 10^6$ cycles per second for polling
   - Fraction of processor clock cycles: $100 \times 10^6 / (500 \times 10^6) = 20\%$

Transferring Data between a Device and Memory

- Using polling
  - Initiate the transfer and periodically check for completion.
  - Periodically check for updates from the device (e.g., a mouse).
- Interrupt-driven
  - The OS initiates the transfer and waits for an interrupt to indicate that the transfer has completed or that an error has occurred.
  - The OS still transfers data in small chunks and must communicate through interrupts many times during the complete I/O operation.
- Direct Memory Access (DMA)
  - Also interrupt-driven, but in this case the transfer is controlled by the device without intervention by the OS. An interrupt occurs only when the entire transfer is complete or an error occurs.
  - Appropriate for high-bandwidth devices with relatively large blocks of data.

Interrupt-Driven Input

[Figure: processor, memory, and a keyboard receiver. Memory holds the user program and the input interrupt service routine. Step 1: the receiver raises an input interrupt to the processor.]

Interrupt-Driven Input (continued)

[Figure: same layout, with the full sequence. 1: input interrupt. 2.1: save state. 2.2: jump to the input interrupt service routine. 2.3: service the interrupt. 2.4: return to user code.]

Interrupt-Driven Output

[Figure: processor, memory, and a display transmitter. Memory holds the user program and the output interrupt service routine. 1: output interrupt. 2.1: save state. 2.2: jump to the output interrupt service routine. 2.3: service the interrupt. 2.4: return to user code.]

Interrupt-Driven I/O

- An I/O interrupt is asynchronous with respect to instruction execution.
  - It is not associated with any instruction, so it doesn't prevent any instruction from completing.
  - The processor can pick its own convenient point to handle the interrupt.
- With I/O interrupts:
  - There must be a way to identify the device generating the interrupt.
  - Interrupts can have different urgencies, so they may need to be prioritized.
- Advantages of using interrupts:
  - Relieves the processor from having to continuously poll for an I/O event; user program progress is only suspended during the actual transfer of I/O data to/from user memory space.
- Disadvantage: special hardware is needed to
  - cause an interrupt (I/O device)
  - detect an interrupt and save the necessary information to resume normal processing after servicing the interrupt (processor)

Overhead of Interrupt-Driven I/O

- Assume the hard disk drive transfers data in 4-word chunks and can transfer at 4 MB/sec, with a 500 MHz clock.
- The overhead of each transfer, including the interrupt, is 500 cycles.
- The hard drive is transferring data only 5% of the time.
- The interrupt rate when the disk is busy is the same as the polling rate:
  - $250\text{K} \times 500 = 125 \times 10^6$ cycles per second for the disk
  - Fraction of processor clock cycles: $125 \times 10^6 / (500 \times 10^6) = 25\%$
- Assuming that the disk is only transferring data 5% of the time:
  - Fraction of processor clock cycles: $25\% \times 5\% = 1.25\%$
- Compare to polling: the absence of overhead when the disk is not active is the major advantage of an interrupt-driven interface.

Direct Memory Access (DMA)

- For high-bandwidth devices (like disks), interrupt-driven I/O would consume a lot of processor cycles.
- DMA: the I/O controller has the ability to transfer data directly to/from memory without involving the processor.
  1. The processor initiates the DMA transfer by supplying the I/O device address, the operation to be performed, the memory address (destination/source), and the number of bytes to transfer.
  2. The I/O DMA controller manages the entire transfer (possibly thousands of bytes in length), arbitrating for the bus.
  3. When the DMA transfer is complete, the I/O controller interrupts the processor to let it know that the transfer is complete.
- There may be multiple DMA devices in one system.
  - The processor and I/O controllers contend for bus cycles and for memory.

Overhead of DMA

- Assume the hard disk drive transfers data in 4-word chunks and can transfer at 4 MB/sec, with a 500 MHz clock.
- Assume the transfer uses DMA, and the initial DMA setup takes 1000 cycles.
- The overhead of the interrupt at completion is 500 cycles.
- If the average transfer is 8 KB, what fraction of the CPU is consumed if the disk is active 100% of the time (ignoring processor/DMA controller bus contention)?
- Each DMA transfer takes $(8\ \text{KB}) / (4\ \text{MB/sec}) = 2 \times 10^{-3}$ sec.
- Cycles/sec for the disk: $(1000 + 500\ \text{cycles/transfer}) / (2 \times 10^{-3}\ \text{sec/transfer}) = 750 \times 10^3$ clock cycles/sec
- Fraction of processor clock cycles: $750 \times 10^3 / (500 \times 10^6) = 0.15\%$

Issues with DMA

- With DMA there is another path to memory. This creates difficulties with virtual memory and caches.
- Should physical or virtual addresses be used?
  - If virtual, the DMA unit must translate to physical addresses.
  - If physical, the transfer must not cross page boundaries; otherwise the memory addresses would not be contiguous.
    - The transfer can be broken into a sequence of page-size transfers.
  - The OS must not remap memory during a DMA transfer.
- The value of a memory location as seen by the DMA unit and by the processor may differ.
  - Stale data (coherency) problem: the value in the cache is different from the value in memory.
  - Solved by routing through the cache or by cache flushing.

The DMA Stale Data Problem

- In systems with caches, there can be two copies of a data item: one in the cache and one in main memory.
  - For a DMA read (from disk to memory), the processor will be using stale data if that location is also in the cache.
  - For a DMA write (from memory to disk) with a write-back cache, the I/O device will receive stale data if the data is in the cache and has not yet been written back to memory.
- The coherency problem is solved by:
  1. Routing all I/O activity through the cache: expensive, with a large negative performance impact.
  2. Having the OS selectively invalidate the cache for an I/O read, or force write-backs for an I/O write (flushing).
  3. Providing hardware to selectively invalidate or flush the cache: needs a hardware snooper.

Review: Designing an I/O System

- Design an I/O system that ensures that latency is bounded by a certain amount.
- Design an I/O system to meet a set of bandwidth constraints given a workload.

Review: I/O Performance

Approach:
- Find the bandwidths of the individual components.
- Configure the components you can change to match the bandwidth of the bottleneck component you can't.
- Remember, it's just a units problem after that.

Example parameters:
- 300 MIPS CPU, 100 MB/s backplane bus
- 50K OS instructions + 100K user instructions per I/O operation
- SCSI-2 controllers (20 MB/s each), each accommodating up to 7 disks
- 5 MB/s disks with seek + rotational latency time = 10 ms
- 64 KB reads

Review: I/O Performance (I/O rates)

What is the maximum sustainable I/O rate? How many SCSI-2 controllers and disks does it require?

First, determine the I/O rates of the components we can't change:
- CPU: $(300\text{M inst/sec}) / (150\text{K inst/IO}) = 2000$ IO/s
- Backplane: $(100\text{M B/s}) / (64\text{K B/IO}) = 1562$ IO/s
- The peak I/O rate is determined by the bus: 1562 IO/s

Second, configure the remaining components to match that rate:
- Disk: $1 / [10\ \text{ms/IO} + (64\text{K B/IO}) / (5\text{M B/s})] = 43.9$ IO/s

Review: I/O Performance (disks and controllers)

- How many disks?
  - $(1562\ \text{IO/s}) / (43.9\ \text{IO/s per disk}) \approx 36$ disks
- How many controllers?
  - $(43.9\ \text{IO/s per disk}) \times (64\text{K B/IO}) = 2.74\text{M B/s}$ per disk
  - $(20\text{M B/s per SCSI controller}) / (2.74\text{M B/s per disk}) \approx 7.3$ disks per SCSI controller, so the limit of 7 disks per controller applies
  - $(36\ \text{disks}) / (7\ \text{disks per SCSI-2 controller}) \rightarrow 6$ SCSI-2 controllers

DMA Performance

- 500 MHz CPU
- The interrupt handler takes 400 cycles.
- A data transfer takes 100 cycles.
- 4 MB/s disk with a 16 B interface; the disk transfers data 50% of the time.
- DMA setup takes 1600 cycles; transfer one 16 KB page at a time.
- Compare the processor overhead using:
  - interrupt-driven I/O
  - DMA

DMA Performance (solution)

- Processor overhead for interrupt-driven I/O:
  - $0.5 \times (4\text{M B/s}) / (16\ \text{B/xfer}) \times (500\ \text{c/xfer}) / (500\text{M c/s}) = 12.5\%$
- Processor overhead with DMA:
  - The processor only gets involved once per page, not once per 16 B.
  - $0.5 \times (4\text{M B/s}) / (16\text{K B/page}) \times (2000\ \text{c/page}) / (500\text{M c/s}) = 0.05\%$

Impact of I/O on System Performance

- Elapsed time = CPU time + I/O time
- A benchmark executes in 100 seconds of elapsed time: CPU time = 90 seconds, I/O time = 10 seconds.
- If CPU time improves by 50% per year for the next five years but I/O time doesn't improve, how much faster will the benchmark program run at the end of five years?

After n years   CPU time            I/O time   Elapsed time   % I/O time
0               90 sec              10 sec     100 sec        10%
1               90/1.5 = 60 sec     10 sec     70 sec         14%
2               60/1.5 = 40 sec     10 sec     50 sec         20%
3               40/1.5 = 27 sec     10 sec     37 sec         27%
4               27/1.5 = 18 sec     10 sec     28 sec         36%
5               18/1.5 = 12 sec     10 sec     22 sec         45%

- The improvement in CPU performance over five years is $90/12 = 7.5$.
- The improvement in elapsed time is only $100/22 \approx 4.5$.
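
The year-by-year table on the "Impact of I/O on System Performance" slide can be reproduced with a short script. This is only a sketch: the function name `elapsed_time_table` and its parameters are made up for illustration, and it assumes "improves by 50% per year" means CPU time is divided by 1.5 each year, as the slide's arithmetic does.

```python
def elapsed_time_table(cpu_time, io_time, cpu_speedup_per_year, years):
    """Return one row per year: (year, cpu_time, io_time, elapsed, io_fraction)."""
    rows = []
    for year in range(years + 1):
        elapsed = cpu_time + io_time
        rows.append((year, cpu_time, io_time, elapsed, io_time / elapsed))
        # CPU time shrinks each year; I/O time is assumed not to improve.
        cpu_time /= cpu_speedup_per_year
    return rows


rows = elapsed_time_table(cpu_time=90.0, io_time=10.0,
                          cpu_speedup_per_year=1.5, years=5)

print("year  cpu(s)  io(s)  elapsed(s)  io share")
for year, cpu, io, elapsed, io_fraction in rows:
    print(f"{year:>4}  {cpu:6.1f}  {io:5.1f}  {elapsed:10.1f}  {io_fraction:8.1%}")

# Overall improvement factors after the last year.
cpu_improvement = rows[0][1] / rows[-1][1]
elapsed_improvement = rows[0][3] / rows[-1][3]
print(f"CPU improvement:     {cpu_improvement:.2f}x")
print(f"Elapsed improvement: {elapsed_improvement:.2f}x")
```

Because the script does not round the CPU time at each step, its improvement factors come out slightly higher than the slide's rounded 7.5 and 4.5, but the conclusion is the same: the fixed I/O time limits the overall speedup.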
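
The overhead figures on the polling, interrupt-driven I/O, and DMA slides all follow the same pattern: events per second times cycles per event, divided by the clock rate. A minimal sketch of that arithmetic follows. The helper names (`polling_fraction`, `interrupt_fraction`, `dma_fraction`) are hypothetical, and the units assume 1 KB = 10^3 bytes and 1 MB = 10^6 bytes, which is what the slides' numbers imply.

```python
CLOCK_HZ = 500e6  # 500 MHz processor clock, as assumed on the slides


def polling_fraction(polls_per_sec, cycles_per_poll=400, clock_hz=CLOCK_HZ):
    """Fraction of CPU cycles spent polling a device at the given rate."""
    return polls_per_sec * cycles_per_poll / clock_hz


def interrupt_fraction(bytes_per_sec, bytes_per_interrupt,
                       cycles_per_interrupt=500, busy=1.0, clock_hz=CLOCK_HZ):
    """Fraction of CPU cycles spent on per-chunk interrupts while the device is busy."""
    interrupts_per_sec = bytes_per_sec / bytes_per_interrupt
    return busy * interrupts_per_sec * cycles_per_interrupt / clock_hz


def dma_fraction(bytes_per_sec, bytes_per_transfer, setup_cycles,
                 interrupt_cycles, busy=1.0, clock_hz=CLOCK_HZ):
    """Fraction of CPU cycles spent on DMA setup plus the completion interrupt."""
    transfers_per_sec = bytes_per_sec / bytes_per_transfer
    return busy * transfers_per_sec * (setup_cycles + interrupt_cycles) / clock_hz


# "Overhead of Polling" slide
mouse = polling_fraction(30)             # 30 polls per second
floppy = polling_fraction(50e3 / 2)      # 50 KB/sec in 2-byte units
disk_poll = polling_fraction(4e6 / 16)   # 4 MB/sec in 16-byte chunks

# "Overhead of Interrupt-Driven I/O" slide: disk busy 5% of the time
disk_irq = interrupt_fraction(4e6, 16, busy=0.05)

# "Overhead of DMA" slide: 8 KB transfers, disk busy 100% of the time
disk_dma = dma_fraction(4e6, 8e3, setup_cycles=1000, interrupt_cycles=500)

# "DMA Performance" slide: 400-cycle handler + 100-cycle transfer, disk busy 50%
page_irq = interrupt_fraction(4e6, 16, cycles_per_interrupt=400 + 100, busy=0.5)
page_dma = dma_fraction(4e6, 16e3, setup_cycles=1600, interrupt_cycles=400, busy=0.5)

for name, value in [("mouse (polling)", mouse),
                    ("floppy (polling)", floppy),
                    ("hard disk (polling)", disk_poll),
                    ("hard disk (interrupts, 5%)", disk_irq),
                    ("hard disk (DMA, 8 KB)", disk_dma),
                    ("disk (interrupts, 50%)", page_irq),
                    ("disk (DMA, 16 KB pages)", page_dma)]:
    print(f"{name:<28}{value:.4%}")
```

Keeping the three mechanisms as separate functions makes the comparison explicit: polling pays on every check whether or not the device is active, interrupts pay per chunk only while the device is busy, and DMA pays once per block.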
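
The stale data problem described on the DMA slides can be illustrated with a toy model. The class `ToyMachine` below is purely hypothetical: it models memory and a write-back cache as Python dictionaries and is not how any real cache or DMA controller is implemented. It only shows why the OS must invalidate cached lines after a DMA read from a device and flush dirty lines before a DMA write to a device.

```python
class ToyMachine:
    """Toy model: main memory plus a write-back cache, both keyed by address."""

    def __init__(self):
        self.memory = {}
        self.cache = {}  # address -> (value, dirty flag)

    def cpu_read(self, addr):
        # On a miss, fill the cache from memory; on a hit, memory is not consulted.
        if addr not in self.cache:
            self.cache[addr] = (self.memory.get(addr, 0), False)
        return self.cache[addr][0]

    def cpu_write(self, addr, value):
        # Write-back policy: only the cache is updated, and the line is marked dirty.
        self.cache[addr] = (value, True)

    def flush(self, addr):
        # Force a dirty line back to memory (needed before a DMA write to a device).
        if addr in self.cache and self.cache[addr][1]:
            value = self.cache[addr][0]
            self.memory[addr] = value
            self.cache[addr] = (value, False)

    def invalidate(self, addr):
        # Drop the cached copy (needed after a DMA read from a device).
        self.cache.pop(addr, None)

    def dma_device_to_memory(self, addr, value):
        # DMA bypasses the cache and writes memory directly.
        self.memory[addr] = value

    def dma_memory_to_device(self, addr):
        # DMA bypasses the cache and reads memory directly.
        return self.memory.get(addr, 0)


machine = ToyMachine()

# Case 1: DMA read (device to memory) while the processor holds a cached copy.
machine.memory[0x100] = 1
machine.cpu_read(0x100)                      # brings the old value into the cache
machine.dma_device_to_memory(0x100, 2)       # device delivers new data to memory
stale_input = machine.cpu_read(0x100)        # processor still sees the old value
machine.invalidate(0x100)
fresh_input = machine.cpu_read(0x100)        # now re-fetched from memory

# Case 2: DMA write (memory to device) with a dirty line not yet written back.
machine.cpu_write(0x200, 7)
stale_output = machine.dma_memory_to_device(0x200)  # device gets the old memory contents
machine.flush(0x200)
fresh_output = machine.dma_memory_to_device(0x200)  # device gets the updated value

print(stale_input, fresh_input, stale_output, fresh_output)
```

The explicit `invalidate` and `flush` calls correspond to the OS-managed solution on the slide; a hardware snooper would perform the equivalent steps automatically.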
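
The "Review: I/O Performance" sizing exercise (find the bottleneck rate, then configure disks and controllers to match it) can be sketched as follows. The function `size_io_system` and its parameter names are assumptions for illustration, and the sketch uses decimal units throughout (1 KB = 10^3 bytes, 1 MB = 10^6 bytes).

```python
import math


def size_io_system(cpu_ips, instr_per_io, bus_bw, io_size,
                   disk_bw, disk_latency, ctrl_bw, max_disks_per_ctrl):
    """Size disks and controllers to match the bottleneck among the fixed components."""
    cpu_rate = cpu_ips / instr_per_io            # I/Os per second the CPU can drive
    bus_rate = bus_bw / io_size                  # I/Os per second the bus can carry
    peak_rate = min(cpu_rate, bus_rate)          # bottleneck of the fixed components

    disk_rate = 1.0 / (disk_latency + io_size / disk_bw)  # I/Os per second per disk
    disks = math.ceil(peak_rate / disk_rate)

    disk_throughput = disk_rate * io_size        # bytes per second per disk
    # A controller is limited both by its bandwidth and by its disk-count limit.
    disks_per_ctrl = min(max_disks_per_ctrl, math.floor(ctrl_bw / disk_throughput))
    controllers = math.ceil(disks / disks_per_ctrl)

    return {
        "cpu_rate": cpu_rate,
        "bus_rate": bus_rate,
        "peak_rate": peak_rate,
        "disk_rate": disk_rate,
        "disks": disks,
        "disks_per_ctrl": disks_per_ctrl,
        "controllers": controllers,
    }


design = size_io_system(
    cpu_ips=300e6,              # 300 MIPS CPU
    instr_per_io=50e3 + 100e3,  # OS + user instructions per I/O
    bus_bw=100e6,               # 100 MB/s backplane bus
    io_size=64e3,               # 64 KB reads
    disk_bw=5e6,                # 5 MB/s disks
    disk_latency=10e-3,         # 10 ms seek + rotational latency
    ctrl_bw=20e6,               # 20 MB/s per SCSI-2 controller
    max_disks_per_ctrl=7,
)

for key, value in design.items():
    print(f"{key:<16}{value:.1f}")
```

With decimal units the per-disk throughput comes out near 2.81 MB/s rather than the slide's 2.74 MB/s, which appears to use a binary megabyte; either way a controller is limited to 7 disks, so the result is 36 disks and 6 controllers.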