COMP ORG AND DESIGN
COMP ORG AND DESIGN CIS 371
Popular in Course
Popular in Computer & Information Science
This 16 page Class Notes was uploaded by Jackson Will on Monday September 28, 2015. The Class Notes belongs to CIS 371 at University of Pennsylvania taught by Staff in Fall. Since its upload, it has received 14 views. For similar materials see /class/215419/cis-371-university-of-pennsylvania in Computer & Information Science at University of Pennsylvania.
Reviews for COMP ORG AND DESIGN
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/28/15
Computer Organization and Design 215371 RothMartin Arithmetic Readings CIS 371 Unit 3 Arithmetic 0 PampH o Chapter3 215371 RothMartin Arithmetic This Unit Arithmetic o Alittle review 0 Binary 25 complement o Ripplecarry addition RCA 0 Fast integer addition 0 CarryselectC5eA o Shifters o Integer Multiplication and division 0 Floating point arithmetic i a I 215371 RothMartin Arithmetic 2 The of Fast Addition m reg le ALU datamem reg le 0 Addition of two numbers is most common operation 0 Programs use addition frequently 0 Loads and stores use addition for address calculation 0 Branches use addition to test conditions and calculate targets 0 All insns use addition to calculate default next PC 0 Fast addition critical to high performance 215371 RothMartin Arithmetic 4 Review Binary Integers 0 Computers represent integers in binary base2 3 11 4 100 5 101 30 11110 Natural since only two values are represented 0 Addition etc take place as usual carry the 1 etc 17 10001 5 101 22 10110 0 Some old machines use decimal base10 with only 01 30 011 000 Unnatural for digial logic implementation complicated amp slow 215371 RothMartin Arithmetic 5 What About Negative Integers Fixed Width 0 Sign magnitude 0 Unsigned plus one bit for sign 10 000001010 10 100001010 Matches our intuition from by hand decimal arithmetic Both 0 and 0 Addition is difficult 0 Range 2N 1 1 to 2N 1 1 Option II two s complement 2C 0 Leading Os mean positive number leading ls negative 10 00001010 10 11110110 One representation for 0 Easy addition 0 Range 2N 1 to 2N 1 1 215371 RothMartin Arithmetic 7 o On pencil and paper integers have infinite width 0 In hardware integers have fixed width 0 N bits 16 32 or 64 LSB is 20 MSB is 2 1 0 Range 0 to 2N 1 0 Numbers gt2N represented using multiple fixedwidth integers o In software 215371 RothMartin Arithmetic The Tao of 2C o How did 2C come about 0 Let s design a representation that makes addition easy 0 Think of subtracting 10 from 0 by hand 0 Have to borrow ls from some imaginary leading 1 0 100000000 10 00001010 10 011110110 0 Now add the conventional way 10 11110110 10 00001010 0 100000000 215371 RothMartin Arithmetic Bad idea a PLAbased Adder If any function can be expressed as twolevel logic 0 why not use a PLA for an entire 8bit adder Not small 0 Approx 215 AND gates each with 216 inputs 0 Then 216 OR gates each with 216 inputs 0 Number of gates exponential in bit width Not that fast either 0 An AND gate with 65 thousand inputs 2input AND gate 0 Manyinput gates made a tree of say 4input gates o 16input gates would have at least 8 logic levels 0 So at least 16 levels of logic for a 16 bit PLA 0 Even so delay is still logarithmic in number of bits There are better faster smaller ways 215371 RothMartin Arithmetic 17 CarrySelect Adder Theme Hardware Software o Carryselect adder 0 Do A158Bls8 twice once assuming C8 C0 0 once 1 0 Choose the correct one when CO7 finally becomes available Effectively cuts carry chain in half break critical path But adds mugtlt 0 Delay A1570 8 i570 81570 CO 215371 RothMartin Arithmetic 19 0 Hardware can do things that software fundamentally can t 0 And vice versa of course 0 In hardware it s easier to trade resources for latency 0 One example of this speculation 0 Slow computation is waiting for some slow input 0 Input one of two things 0 Compute with both slow choose right one later fast 0 Does this make sense in software Not on a uniprocessor 0 Difference hardware is parallel software is sequential 215371 RothMartin Arithmetic 18 MultiSegment CarrySlect Adder 0 Multiple segments A470 Example 5 5 6 bit 16 bit BM 0 Hardware cost 0 Still mostly linear 2gtlt 0 Compute each segment with 0 and 1 carryin 0 Serial mugtlt chain A15 1 0 Delay 3w swo 5bit adder 10 14 Two mugtltes 4 14 A15 1 215371 RothMartin Arithmetic 20 CarrySelect Adder Delay 0 What is carryselect adder delay two segment 39 dCO15 MAXdC0158I dCO7ro 2 dC015 MAgtlt28 28 2 18 o In general 2N2 2 N2 vs 2N for RCA What if we cut adder into 4 equal pieces 0 Would it be 2N4 2 10 Not quite dC015 MAXdC015712dC01170 2 dC015 MAgtlt24 MAgtltdCOnsdCO70 2 2 dC015 MAgtlt24MAgtlt24MAXdCO74dCO30 2 2 2 dC015 MAX24MAX24MAX2424 2 2 2 dC015 24 32 14 Nbit adder in M equal pieces 2NM M 12 0 16bit adder in 8 parts 2168 72 18 215371 RomMartin Ariihmetic Carry Lookahead Adder CLA 0 Calculate propagate and generate based on A B 0 Not based on carry in 0 Combine with tree structure 0 Prior years CLA covered in great detail 0 Dozen slides or so 0 Not this year 0 Take aways 0 Tree gives logarithmic delay 0 Reasonable area 215371 RomMartin Ariihmetic Another Option Carry Lookahead o Is carryselect adder as fast as we can go o Nope 0 Another approach to using additional resources 0 Instead of redundantly computing sums assuming different carries 0 Use redundancy to compute carries more quickly 0 This approach is called carry lookahead CLA 215371 RomMartin Ariihmetic Adders In Real Processors 0 Real processors superoptimize their adders 0 Ten or so different versions of CLA 0 Highly optimized versions of carryselect o Other gate techniques carryskip conditionalsum o Subgate transistor techniques Manchester carry chain c Combinations of different techniques 0 Alpha 21264 uses CLAC5eARippeCA 0 Used a different levels 0 Even more optimizations for incrementers Why 215371 RomMartin Ariihmetic 6 r Prefix Tree I 5 Carry Lookahead 4 32 g 3 Carry Select 32 bit g I I I 64bit lt ll Ripple Carry 2 o I 4 O Q 1 I b 0 O l l i i i 0 20 40 60 80 100 Delay F04 FIG quot147 Area vs delay of synthesized adders CIS371 RothMartin Arithmetic A 16bit ALU o All of these already in CLA addersubtracter 0 add A B sub A B check c not B is needed for subtraction o and A B are first level Gs c or A B are rst level Ps 0 0 X0 A B 1 39 Si AiABiAci A Build an ALU With functions addsub and or 1301xor What is still missing B a CIS371 RothMartin Arithmetic Add sum Subtraction Addition s Tricky Pal o Signmagnitude subtraction is mental reverse addition 0 2C subtraction is addition 0 How to subtract using an adder 0 subABaddA B o Negate B before adding fast negation trick B B 1 o Isn t a subtraction then a negation and two additions No an adder can implement AB1 by setting the carryin to 1 CIS371 RothMartin Arithmetic Shift and Rotation Instructions 0 Leftright shifts are useful 0 Fast multiplicationdivision by small constants next 0 Bit manipulation extracting and setting individual bits in words 0 Right shifts 0 Can be logical shift in Os or arithmetic shift in copies of MSB srl 110011 2 001100 sra 110011 2 111100 c Caveat sra is not equal to division by 2 of negative numbers 0 Rotations are less useful 0 But almost free if shifter is there 0 MIPS and LC4 have only shifts x86 has shifts and rotations CIS371 RothMartin Arithmetic A Simple Shifter o The simplest 16bit shifter can only shift left by 1 0 Implement using wires no logic 0 Slightly more complicated can shift left by 1 or 0 0 Implement using wires and a multiplexer mugtlt162t01 0 A A 0 A15 gt 0 215371 RothMartin Arithmetic 29 3rd Grade Decimal Multiplication Barrel Shifter o What about shifting left by any amount 0 15 0 16 consecutive leftshift by 1or0 blocks Would take too long how long 0 Barrel shifter 4 shiftleftbyXorO blocks X 1248 o What is the delay shift 0 Similar barrel designs for right shifts and rotations 215371 RothMartin Arithmetic 30 Binary Multiplication Same Refrain 19 multiplicand 12 multiplier 190 228 product 0 Start with product 0 repeat steps until no multiplier digits 0 Multiply multiplicand by least significant multiplier digit 0 Add to product 0 Shift multiplicand one digit to the left multiply by 10 0 shift multiplier one digit to the right divide by 10 0 Product of Ndigit Mdigit numbers may have NM digits 215371 RothMartin Arithmetic 31 19 010011 multiplicand 12 001100 multiplier o 000000000000 0 000000000000 76 000001001100 152 000010011000 0 000000000000 o 000000000000 228 000011100100 product i Smaller base gt more steps each is simpler 0 Multiply multiplicand by least significant multiplier digit 1 a no actual multiplication add multiplicand or not 0 Add to total we know how to do that 0 shift multiplicand left multiplier right by one digit 215371 RothMartin Arithmetic 32 Software Multiplication 0 Can implement this algorithm in software 0 Inputs md multiplicand and mr multiplier md md ltlt 1 shift left In gtgt 1 shift right C15371 RomMartin Ariihmetic Multiple Adder 0 Multiply by N bits at a time using N adders 0 Example N5 terms Pproduct Cmultipicand Mmultiplier o P MO C 0 M1 Cltlt1 0 M2 Cltlt2 0 M3 Cltlt3 0 0 Arrange like a tree to reduce gate delay critical path 0 Delay N2 vs Nlog N Not that simple depends on adder asyh apwrg nya hvgisus N log N WIth optimization Olog N Hardware Multiply Iterative 0 Control repeat 16 times 0 If least significant bit of multiplier is 1 0 Then add multiplicand to product 0 Shift multiplicand left by 1 0 shift multiplier right by 1 C15371 RomMartin Ariihmetic Consecutive Addition AB3 AB2 A181 AHBO BO 1n 7 Do I oo 33 s2 s1 s0 o 2 Nbit RC adders 2 dadd gate delays D0 o M Nbit RC adders delay 0 Nai39ve OMN 0 Actual OMN A333 D3A2Bz D2AiBi DiAUBo 3930 o M Nbit Carry Select I 0 Delay calculation tricky CD0 0 Carry Save Adder CSA 0 3to2 CSA tree adder 0 Delay Olog M log N 36 oo 33 s2 s1 s0 C15371 RomMartin Ariihmetic Hardware Software Part Deux 0 Recall hardware is parallel software is sequential o Exploit evaluate independent subexpressions in parallel o ExampleISABCD 0 Software 3 steps 151 AB 2 52 51C 3 S 52D Hardware 2 steps 151 AB 52CD 25 5152 o ExampleIISABC 0 Software 2 steps 151 AB 2 S 51C 0 Hardware 2 steps 151 AB 2 S 51C Actually hardware can do this in 12 steps 0 Subexpression parallelism egtltists below 16bit addition level 215371 RomMartin Ariihmetic 37 4th Grade Decimal Division Aside Strength Reduction 9 quotient 3 29 divisor dividend 2 remainder o Shift divisor left multiply by 10 until MSB lines up with dividend s 0 Repeat until remaining dividend remainder lt divisor c Find largest single digit q such that qdivisor lt dividend 0 Set LSB of quotient to q o Subtract qdivisor from dividend o Shift quotient left by one digit multiply by 10 o Shift divisor right by one digit divide by 10 215371 RomMartin Ariihmetic 39 0 Strength reduction compilers will do this sort of A 8 A gtgt 3 A5Altlt2 A 0 Useful for address calculation all basic data types are 2quotquot in size int A100 ampAN A Nsizeofint AN4 ANltlt2 215371 RomMartin Ariihmetic 38 Binary Division 1001 2 3 29 0011 011101 g 011000 5 000101 i 000011 2 000010 215371 RomMartin Ariihmetic 40 CIS 371 Computer Organization and Design Unit 11 Virtual Memory amp IO C15 371 RomMarlin Virtual Memory 8110 1 A Computer System Hardware 0 CPUs and memories 0 Connected by memory bus 0 IO peripherals storage input display network 0 With separate or builtin DMA 0 Connected by system bus which is connected to memory bus C15 371 RomMarlin Virtual Memory 8110 3 This Unit Virtualization and the OS o The operating system OS System SO ware o A superapplication 0 Hardware support for an OS Mem CPU IIO 0 Virtual memory 0 Page tables and address translation 0 TLBs and memory hierarchy issues 0 IO 0 Devices and buses 0 Polling and interrupts 0 DMA C15 371 RomMarlin Virtual Memory 8110 2 A Computer System App Software 0 Application software computer must do something 39A lic39atio39n sofwa39re Memory bus System IO bus bndge CPU opu C15 371 RomMarlin Virtual Memory 8110 A Computer System OS 0 Operating System OS virtualizes hardware for apps 0 Abstraction provides services eg threads files etc Simplifies app programming model raw hardware is nasty 0 Isolation gives each app illusion of private CPU memory 10 Simplifies app programming model Increases hardware resource utilization Memory bus System IO bus bndge opuls opu C15 371 RomMarlin Virtual Memory 8110 System Calls Operating System OS and User Apps o Controlled transfers tofrom OS 0 System Call a userlevel app function call to OS 0 Leave description of what you want done in registers o SYSCALL instruction also called TRAP or INT 0 Can t allow userlevel apps to invoke arbitrary OS code 0 Restricted set of legal OS addresses to jump to trap vector 0 Processor jumps to OS using trap vector 0 Sets privileged mode 0 OS performs operation 0 OS does a return from system call 0 Unsets privileged mode C15 371 RomMarlin Virtual Memory 8110 7 o Sane system development requires a split 0 Hardware itself facilitatesenforces this split Operating System OS a superprivileged process 0 Manages hardware resource allocationrevocation for all processes 0 Has direct access to resource allocation features 0 Aware of many nasty hardware details 0 Aware of other processes 0 Talks directly to inputoutput devices Userlevel apps ignorance is bliss o Unaware of most nasty hardware details 0 Unaware of other apps and OS 0 Egtltplicitly denied access to resource allocation features C15 371 RomMarlin Virtual Memory 8110 6 Interrupts o Exceptions synchronous generated by running app 0 Eg illegal insn 0 etc o Interrupts asynchronous events generated externally o Eg timer IO requestreply etc o Interrupt handling same mechanism for both 0 Interrupts are onchip signalsbits 0 Either internal eg timer exceptions or connected to pins 0 Processor continuously monitors interrupt status when one is high 0 Hardware jumps to some preset address in OS code interrupt vector 0 Like an asynchronous nonprogrammatic SYSCALL o Tlmer programmable onchip interrupt o Initialize with some number of microseconds o Timer counts down and interrupts when reaches 0 C15 371 RomMarlin Virtual Memory 8110 8 LC3P37X OS Support o Privilege modes 0 Two userkernel encoded in PSRl5 0 Separate address spaces 0 Low 12K XOOOOXZFFF high 12K XCOOOXFFFF reserved for OS 0 16bit Memory Protection Register MPR enforces 0 Call gates o TRAP 256 codes and R I39I39 o Interrupts 0 Supported in ISA 256 codes but not by simulator C15 371 RothMarlin Virtual Memory amp IO 9 Virtualizing Main Memory o How do multiple apps and the OS share main memory 0 Goal each application thinks it has infinite memory 0 One app may want more memory than is in the system c App s insndata footprint may be larger than main memory 0 Requires main memory to act like a cache with disk as backup 0 No notion of program not fitting in registers or caches Why 0 Solution 0 Part 1 treat memory as a cache 0 Store the overflowed blocks in swap space on disk 0 Part 2 add a level of indirection address translation C15 371 RothMarlin Virtual Memory amp IO 11 Virtualizing Processors o How do multiple apps and OS share the processors 0 Goal applications think there are an infinite of processors Solution timeshare the resource 0 Trigger a context switch at a regular interval 1ms o Preemptive app doesn t yield CPU OS forcibly takes it Stops greedy apps from starving others 0 Architected state PC registers 0 Save and restore them on context switches 0 Memory state 0 Nonarchitected state caches branch predictor tables etc o Ignore or flush Operating responsible to handle context switching 0 Hardware support isjust a timer interrupt C15 371 RothMarlin Virtual Memory amp IO 10 Virtual Memory VM Program 0 Programs use virtual addresses VA code heap stack 0 02N 1 0 VA size also referred to as machine size 0 Eg Pentium4 is 32bit Alpha is 64bit 0 Memory uses physical addresses PA 0 02M 1 typically MltN especially if N64 0 2quotquot is most physical memory machine supports 0 VA gtPA at page granularity 0P gtPP o By system 0 Mapping need not preserve contiguity 0 VP need not be mapped to any PP o Unmapped VPs live on disk swap C15 371 RothMarlin Virtual Memory amp IO 12 Virtual Memory VM Virtual Memory VM 0 Level of indirection like register renaming Application generated addresses are virtual addresses VAs Memory accessed using physical addresses PAs VAs translated to PAs at some coarse granularity OS controls VA to PA mapping for itself and all other processes Logically translation performed before every insn fetch load store 0 Physically hardware acceleration removes translation overhead OS controlled VAaPA mappings PAs physical memory C15 371 RomMarlin Virtual Memory 8110 13 Uses of Virtual Memory 0 More recently isolation and multiprogramming 0 Each app thinks it has 2N B of memory its stack starts 0gtltFFFFFFFF 0 Apps prevented from readingwriting each other s memory Also protection interprocess communication etc 215 371 RomMarlin Virtual Memory 8110 15 VM is an Old Idea Older than Caches 0 Original motivation singleprogram compatibility 0 IBM System 370 a family of computers with one software suite Same program could run on machines with different memory sizes Prior programmers egtltplicitly accounted for memory size 0 But also fullassociativity software replacement 0 Memory tmiss is high extremely important to reduce miss Parameter L2 Zns 10ns 10ns 30ns 8 64 KB 128 KB ZM B Block size 16 32B 32 256B C15 371 RomMarlin Virtual Memory 8110 14 VM The Basics 0 Programs use virtual addresses VA 0 VA size N aka machine size eg Core 2 Duo 48bit Alpha 64bit 0 Memory uses physical addresses PA 0 PA size M typically MltN especially if N64 0 2quotquot is most physical memory machine supports 0 VA gtPA at page granularity VP gtPP 0 Mapping need not preserve contiguity 0 VP need not be mapped to any PP unmapped VPs live on disk swap C15 371 RomMarlin Virtual Memory 8110 16 Address Translation Address Translation Mechanics I virtual address310 vPN 3 116 i POFS 150 translate don I touch El n E POFS 150 physical address250 o VA gtPA mapping called address translation 0 Split VA into virtual page number VPN 81 page offset POFS 0 Translate VPN into physical page number PPN o POFS is not translated VAePA VPN POFS s PPN POFS 0 Example above 0 64KB pages a 16bit POFS 0 32bit machine a 32bit VA a 16bit VPN 0 Maximum 256MB memory a 28bit PA a 12bit PPN C15 371 RomMarlin Virtual Memory 8110 17 Page Table Size 0 How big is a page table on the following machine 0 32bit machine 0 48 page table entries PTEs o 4KB pages 0 32bit machine a 32bit VA a 468 virtual memory 0 468 virtual memory 4KB page size a 1M VPs 0 1M VPs 4B PTE a 4MB 0 How big would the page table be with 64KB pages 0 How big would it be for a 64 bit machine 0 Page tables can get big c There are ways of making them smaller C15 371 RomMarlin Virtual Memory 8110 19 o How are addresses translated o In software for now but with hardware acceleration a little later 0 Each process allocated a page table PT 0 Software data structure constructed by OS 0 Maps VPs to PPs or to disk swap addresses 0 VP entries empty if page never referenced 0 Translation is table lookup struct pt 1 union int ppn diskblock int isvalid isdirty PTE struct PTE ptNUMVIRTUALPAGEs int translateint vpn if ptvpn isvalid return ptvpn ppn C15 371 RomMarlin Virtual Memory 8110 18 MultiLevel Page Table 0 One way multilevel page tables 0 Tree of page tables 0 Lowestlevel tables hold PTEs o Upperlevel tables hold pointers to lowerlevel tables 0 Different parts of VPN used to index different levels Example twolevel page table for machine on last slide 0 Compute number of pages needed for lowestlevel PTEs o 4KB pages 4B PTEs a 1K PTEspage 0 1M PTEs 1K PTEspage a 1K pages 0 Compute number of pages needed for upperlevel pointers 0 1K lowestlevel pages a 1K pointers 0 1K pointers 32bit VA a 4KB a 1 upper level page 215 371 RomMarlin Virtual Memory 8110 20 MultiLevel Page Table PT MultiLevel Page Table PT 0 20bit VPN 0 Upper 10 bits indegtlt 1stlevel table 0 Lower 10 bits indegtlt 2ndlevel table pt root 2ndlevel PTEs S struct union int ppn diskblock int isvalid isdir y struct struct PTE ptes1024 LZPT struct LZPT pt1024 int translateint vpn struct LZPT 12pt ptvpngtgt10 if 12pt Es 12ptgtptesvpnamp1023isvalid return 12ptgtptesvpnamp1023 ppn C15 371 RomMarlin Virtual Memory amp 10 21 Another Use of VM PageLevel Protection 0 Have we saved any space 0 Isn t total size of 2nd level tables same as singlelevel table ie 4MB 0 Yes but 0 Large virtual address regions unused 0 Corresponding 2ndlevel tables need not exist 0 Corresponding 1st level pointers are null 0 Example 2MB code 64KB stack 16MB heap 0 Each 2ndlevel table maps 4MB of virtual addresses 0 1 for code 1 for stack 4 for heap 1 1st level o 7 total pages 28KB much less than 4MB C15 371 RomMarlin Virtual Memory ampIO 22 Another InterProcess Communication o Pagelevel protection 0 Piggyback pagetable mechanism 0 Map VPN to PPN ReadWriteExecute permission bits 0 Attempt to execute data to write readonly data 0 Exception a OS terminates program 0 Useful for OS itself actually struct union int ppn diskblock int isvalid isdirty permissions PTE struct PTE ptNUMVIRTUALPAGES int translateint vpn int action if ptvpn isvalid Es lptvpn permissions amp action kill C15 371 RomMarlin Virtual Memory ampIO 23 o Interprocess communication through memory 0 OS maps different VPNs from multiple processes to one PPN 0 Or share files via the UNIX mmapO call C15 371 RomMarlin Virtual Memory ampIO 24 IO Devices 0 Primary characteristic data rate bandwidth 0 Latency really only an issue for disk and network 0 Contributing factors inputoutputboth partner 0 Interesting devices have high data rates C15 371 RomMarlin Virtual Memory 8110 Disk Latency 0 Disk readwrite latency has four components 0 Seek delay tmk head seeks to right track 0 Average of 5ms 15ms 0 Less in practice because of shorter seeks o Rotational delay twta on right sector rotates under head 0 On average time to go halfway around disk 0 Based on rotation speed RPM 0 10000 to 15000 RPMs o 3ms 0 Transfer time thawu data actually being transferred 0 Fast for small blocks 0 Controller delay tcontwuu controller overhead on either side 0 Fast no moving parts Iseek trotation ttransfer tonntrnller C15 371 RomMarlin Virtual Memory 8110 An Important IO Device Disk ad 0 Disk like stack of record players 0 Collection of platters 0 Each with readwrite head 0 Platters divided into concentric tracks 0 Head seeks to track 0 All heads move in unison 0 Each track divided into sectors 0 More sectors on outer tracks 0 Sectors rotate under head 0 Controller 0 Seeks heads waits for sectors 0 Turns heads onoff 0 May have its own cache a few MBs o Egtltploit spatial locality C15 371 RomMarlin Virtual Memory 8110 Disk Latency Example Example time to read a 4KB chunk assuming 0 128 sectorstrack 512 Bsector 6000 RPM 10 ms tseek 1 ms tcommller o 6000 RPM a 100 Rs a 10 msR a tombquot 10 ms 2 5 ms 0 4 KB page a 8 sectors a tIalnsfer 10 ms 8128 06 ms 39 tdisk tseek l39 troiation l39 thansfer l39 tcomroller 166 m5 tdisk105061166ms C15 371 RomMarlin Virtual Memory 8110 Disk Bandwidth Sequential vs Random 0 Disk is bandwidthinef cient for pagesized transfers 0 Sequential vs random accesses 0 Random accesses 0 One read each disk access latency 10ms o Randomly reading 4KB pages 0 10ms is 001 seconds a 100 access per second 0 4KB 100 accesssec a 400KBsecond bandwidth 0 Sequential accesses o Stream data from disk no seeks o 128 sectorstrack 512 Bsector 6000 RPM 0 64KB per rotation 100 rotationper sec 0 6400KBsec a 64MBsec o Sequential access is 10X or more bandwidth than random 0 Still no where near the 1GBsec to 10GBsec of memory C15 371 RomMarlin Virtual Memory amp IO 37 Typical IO Device Interface Some Old Example Disks Hitachi 0 Operating system talks to the IO device 0 Send commands query status etc 0 Software uses special uncached loadstore operations 0 Hardware sends these readswrites across IO bus to device 0 Direct Memory Access DMA o For big transfers the IO device accesses the memory directly 0 Example DMA used to transfer an entire block tofrom disk 0 Interruptdriven IO 0 The IO device tells the software its transfer is complete 0 Tells the hardware to raise an interrupt door bell 0 Processor jumps into the OS 0 Inefficient alternative polling C15 371 RomMarlin Virtual Memory amp IO 39 Ultrastar Travelstar Microdrive Diameter 35quot 25quot 10 Capacity 300 GB 40 GB 4 GB Cache 8 MB 2 MB 128KB RPM 10000 RPM 4200 RPM 3600 RPM Seek 45 ms 12 ms 12 ms Sustained Data Rate 100 MBs 40 MBs 10 MBs Interface ATA 0r SCSI ATA ATA Cost 450 120 70 Use Desktop Notebook some iPods 0 Flash nonvolatile CMOS storage 0 The new disk replacing disk in many C15 371 RomMarlin Virtual Memory amp IO 38 Brief History of DRAM o DRAM memory a major force behind computer industry 0 Modern DRAM came with introduction of IC 1970 o Preceded by magnetic core memory 19505 c More closely resembles today s disks than memory 0 And by mercury delay lines before that ENIAC o Recirculating vibrations in mercury tubes the one single development that put computers on their feet was the invention of a reliable form of memory namely the core memory It s cost was reasonable it was reliable and because it was reliable it could in due course be made large Maurice Wilkes Memoirs of a Computer Programmer 1985 C15 371 RomMarlin Virtual Memory amp IO 40
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'