Comp Architecture CSCI 4717
Popular in Course
verified elite notetaker
verified elite notetaker
verified elite notetaker
CH E 4253
verified elite notetaker
verified elite notetaker
verified elite notetaker
Popular in Communication Studies
This 57 page Class Notes was uploaded by Amira Von on Sunday October 11, 2015. The Class Notes belongs to CSCI 4717 at East Tennessee State University taught by David Tarnoff in Fall. Since its upload, it has received 48 views. For similar materials see /class/221372/csci-4717-east-tennessee-state-university in Communication Studies at East Tennessee State University.
Reviews for Comp Architecture
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 10/11/15
CSCI 47175717 Computer Architecture Topic Memory Management Reading Staings Sections 83 and 84 Memory Management Uniprogram memory split into two parts One for Operating System monitor One for currently executing program Multiprogram NonOS part is subdivided and shared among active processes Remember segment registers in the 8086 architecture Hardware designed to meet needs of 08 Base Address segment address Swapping Problem O Printing Network Keyboard etc is so slow compared with CPU that even in multiprogrammin system CPU can be idle most of the time Solutions Increase main memory Expensive Programmers will eventually use all ofthis memory for a single process Swapping What is Swapping Long term queue of processes stored on disk Processes swapped in as space becomes available As a process completes it is moved out of main memory If none of the processes in memory are ready ie a O blocked Swap out a blocked process to intermediate queue Swap in a ready process or a new process But swapping is an O process It could make the situation worse Disk lO is typically fastest ofall so it still is an improvement Partitioning Splitting memory into sections to allocate to processes including Operating System Two types Fixedsized partitions Variablesized partitions FixedSized Partitions continued Equal size or Unequal size partitions Process is fitted into smallest hole that will take it best fit Some wasted memory due to each block having a hole of unused memory at the end of its partition Leads to variable sized partitions partitions VariableSized Partitions Allocate exactly the required memory to a process This a to a hole at the end ofmemory too small to use Only one small hole less w ste When all processes are blocked swap out a process and bring in another New process may be smaller than swapped out process Reloaded process not likely to return to same place in memory it started in Another hole Eventually have lots of holes fragmentation Solutions to Holes in Variable Sized Partitions Coalesce Join adjacent holes into one large hole Compaction From time to time go through memory and move all hole nto one free block cf disk defragmentation Relocation No guarantee that process will load into the same place in memory Instructions contain addresses 7 Lunatiuns at data 7 Addresses fur instructiuns branching Logical address relative to beginning of program Physical address actual location in memory this time Base Address start of program or block of data Automatic conversion using base address Paging continued Split memory into equal sized small chunks page frames Split programs processes into equal sized small chun s pa es Allocate the required number page frames es Operating System maintains list of 39ee frames A process does not require contiguous page frames Paging continued Use page table to keep track ofhowt process is distributed through the pages in Now addressing becomes pa e numberrelative address within page which is mapped to ame numberrelat39ve address within frame Paging continued Paging Example Before Paging Example After Free name is In Process A pavmmt Virtual Memory Rememberthe Principle of Locality which states that active code ten s to cluster together and if a me ry item is used once it will most likely be used again Demand paging Do not require all pages of a process in memo Bring in pages as required Page Fault in Virtual Memory Required page is not in memory Operating System must swap in required pa e May need to swap out a page to make space Select page to throw out based on recent history Virtual Memory Bonus We do not need all ofa process in memory for it to run We can swap in pages as required So we can now run processes that are bigger than total memory available Main memory is called real memory Userprogrammer sees much bigger memory virtual memory Thrashing Too many processes in too little memory Operating System spends all its time apping Little or no real work is done Disk light is on all the time Solutions Better page replacement algorithms educe number of processes running Get more memory Page Table Structure VAX architecture each process may be allocated up to 231 2 GBytes ofvirtual memory broken in to 29512 byte pages Therefore each process may have a page table with 23l3992224 Meg entries This uses a bunch of memory Pages of Page Table Some processors solve this with a page directory that points to page tables each table 0 which is limited to a page and treated as such Another approach is the inverted page table strut ure Inverted Page Table Page tables based on logical program39s address space can be u e Alternatively restrict page table entries to real not virtual memory Problem 7 Simple page table says each line er table maps re legieai page 7 inverted Page Table need re have mapping algurithrn peeause there isn t a unerturune mapping eriegieaire Virtual pages Page of Page Table continued Translation Lookaside Buffer Every virtual memory reference causes two physical memory access Fetch page table entry Fetch data Use special cache for page table TLB Translation Lookaside Buffer contin ued Translation Lookaside Buffer continued Complexity Virtual address translated to a hysical address Reference to pEge table might be in TLB main memory or is Referenced word may be in cache main disk memory or freferenced word is on disk it must be copied or to main mem If in main memory or on disk block must be loaded to cache and cache table must be updated TLB and Cache Operation Segmentation Paging is not usually visible to the programmer Segmentation is visible to the programmer Usually different segments allocated to program and data May be a number of program and data segments Advantages of Segmentation Simpli es handling of growing data structures OS will expand or con ract the segment as needed Allows programs to be altered and recompiled independently without relinking and reloading Lends itselfto sharing among processes Lends itselfto protection since OS can specify certain privileges on a segmentby segment basis Some systems combine segmentation with Paging Recursion Many complex algorithmic functions can be broken into a repetitive application of a simple algorithm The typical recursion function begins with an initial value of n which is decremented with each recursive call until the last call reaches a terminal value of n o A recursive function contains a call to itself quotDefinition of recursion See recursionquot Recursion Fibonacci Numbers quot70 0 1 f39 2 NonRecursive Function f1bonaccl1nt n 1nt flbval f 7 return n els l 1 lt n 1 flbval 1 mlnus 2 flbval 1 mlnus 1 flbval1m1nusl f1bva1 17 flbval 1 flbval 1 n1nu er 1 1bvaliiim 1nu572 l l return r1bva171 l Recursion Factorial NonRecursive Function 1nt factor1a11nt n 1nt returnival 1 for 1nt 1 1 1 lt 1 n val returnival 1 r return returniva Recursive Function 1nt rfactor1a11nt n l if n 1 ll 0 return 1 else return nrfactorlaln e 1 Recursion Fibonacci Numbers continued Recursive Function int rfibonacci int n if n lln return I else return rfibonaccin 7 l rfibonacci n 7 2 Comparing Recursive and Non Recursive Functions Nonrecursive function has more variables Where does recursive function store values Nonrecursive function has more code 9 recursive requires less code and therefore less memory lnClass Exercise In groups discuss how recursion might affect an operating system Compare amp contrast iterative vs recursion algorithms in terms of growthmemory usage Pentium II Hardware for segmentation and paging Unsegmented unpaged 7 virtual address pnysieai address 7 Law urnplexity 7 Hi n perrdrmanee nsegmented paged 7 Memdry viewed as paged linear address spaee 7 Prdteetidn and management Via paging 7 Berkeley UNlX Pentium II continued Segmented unpaged 7 Cdiieetidn pr ideai address spaees 7 Prdteetidntd single byte level 7 Transiatiuntabie needed iS un emp When segment iS in meme Segmented paged 7 Segmentatiun used td denne idgieai memdry partitidns supieet td aeeess edntrdi 7 Paging manages aiideatidn drmemdryvvitmn partitidns 7 Unix System v Pentium II Segmentation Each virtual address is 16bit segment and 32 bit offset 2 bits of segment are protection mechanism 14 bits specify segmen ed virtual memory 232 4Gbyles Segmented 2 564 terabytes 7 Can be iarger7 depends dn wmen prdeess iS aetiye 7 Hair BK segments pr Aprtes iS gidpai 7 Hairis ideal and distinetrdreaen prdeess C m ta ta 3 m E Pentium II Protection Protection bits give 4 levels ofprivilege east Use oflevels so ware dependent Usually level 3 is for applications level 1 for 08 and level 0 for kernel level 2 not Level 2 may be used for apps that have internal security eg database Some instructions only work in level 0 Pentium II Paging Segmentation may be disabled in which case linear address space is used Two level page table lookup First page directory 7 24 entnes max 7 Splits4G linearrnernury intEI iEI24 page grdups uMMbyte 1N A ages 7 Can use dne page direetdryrdraii prdeesses dne per prdeess ur mixture Page direetdry rdr eurrent prdeess always in memdry Use TLB holding 32 page table entries Two page sizes available 4k or 4M Pentium SegmentJPaging Operation PowerPC Memory Management Hardware 32 bit paging with simple segmentation 7 B4 bit paging th mere puvverfui segmentatiun Or both do block address translation 7 Map 4 iarge biucks er mstruettens e 4 er memury te bypass paging 7 e g OStabies urgrapnicsframe buffers 32 bit effective address 712 hi byte seiectur 94kbyte pages 71B bit page id 9 64k pages persegment 7 4 bits indicate me et 16 segment registers Segment registers under 05 eentrei PowerPC 32bit Memory Management Formats PowerPC r t 32bit Address Translation CSCI 47175717 Computer Architecture Topic Functional View amp History Reading Sections 12 21 amp 23 wage 1 am Function All computer functions are comprised of four basic operations Data processing Data storage Data movement Control Data Processing The basic function of any computer is to process data Describes arithmetic and logical operations performed on a a Although end result may be complex there are few distinct types of data processing wage om Data Storage Long term Logging Data records Short term temp variables eg buffer containing the last key pressed program control data eg loop variables Data Movement Computer must be able to communicate with outside world Data must be accessible to devices outside computer Two types Peripheral Data communications Hm Dages om Data movement to a peripheral Data must be passed between computer and lO devices connected to computer Typically to simple devices Examples monitors and keyboards data acquisition peripheral control Data Movement to remote devices data communications Data communications is data movement over a longer range Typically to smart devices or other computers Control Something needs to monitor operation and maintain control of data processing data storage and data movement Automated control of computers resources m m l Operations Functional Data movement view 335 Figure 11 p 9 Figure 12a p 11 jm no Operations w Operations Storage Processing Figure 12b p 11 fromto storage Figure 120 p 11 Operations Processing from storage to lO M V D 13 and Figure 12d p 11 lnClass Exercise Determine which of the previous operations applies each of the following uses Router system Hard drive controller SETIHome Video capture or CD player Come up with additional examples for each of the previous operations Structure Top Level Computer Central Pvucessmg Unit Systems lntercunnemlu M V D 15 and Peripherals Cummumcatlun lines Figure 14 p 12 Structure The CPU Registers lntemalCPU l n evcunnectlun Structure The Control Unit Control Unit Sequencing Luglc CuntvulUnn Pegl ersand Decudevs M V D 17 am lnClass Exercise Think backto your first computer Try to recall the characteristics Processortype Processor speed Hz Memory size Characteristics such as Types of storage devices Cache Bus Network ENIAC Electronic Numerical Integrator And Computer Need Army s Ballistic Research Lab developed range and trajectory tables for new weapons Used gt200 people with desktop calculators to create trajectory tables for weapons ENIAC continued Mauchly EE professor and Eckert grad student at University of Pennsylvania39s Moore School of Electrical Engineering Proposed general purpose computer Started 1943 Finished 1946 1 yearto design 18 months to build Cost 500000 Too late for war effort ENIAC continued General purpose nature proven by using ENIAC to perform calculations for hydrogen bomb feasibility weather prediction cosmicray studies thermal ignition randomnumber studies windtunnel design ENIAC continued Programmed manually by 6000 switches programming took weeks Used 17468 vacuum tubes relays had been used up to this point Other components included 70000 resistors 10000 capacitors 1500 relays and 5 million soldered joints 30 tons 1800 square feet of floor space Consumed 160 kilowatts of electrical power ENIAC continued Twenty 10 digit accumulators Decimal base10 machine each digit represented by one of ten tubes ON 5000 additions per second 1000 times faster then any other device at that time o 357 multiplications per second 38 divisions per second ENIAC O Constants were loaded using switches Numbers changed during the course of computation were entered using punch cards or punch tape The basic memory device was a flipflip latch that had a neon lamp to represent its state von NeumannTuring Stored Program Computer ALU operates on binary data Main memory stores both instructions and data must be considerable in order to carry out long complicated sequences of operations Control unit interprets instructions from memory and causes them to be executed Input and output equipment operated by control unit est 4717 Computer Architecture Functional View amp History Page 25 of 34 Princeton Institute for Advanced Studies IAS First implementation of von Neumann stored program computer Completed 1952 Structure of IAS machine CPJ E C ZJ liri feisi39 PE E IEPP 39 est 4717 Computer Architecture Functional View amp History Page 27 of 34 IAS Memory 1000 x 40 bit words of either number or instruction Signed magnitude binary number 1 sign bit 39 bits for magnitude 2 x 20 bit instructions Left and right instructions left executed first 8bit opcode 12 bit address IAS Registers Set of registers storage in CPU Memory Buffer Register MBR Memory Address Register MAR Instruction Register IR Instruction Buffer Register IBR Program Counter PC Accumuator AC Multiplier Quotient MQ NM ComputerArchitecture runcllonal u a an A nmnmx luck I I m I Structure of IAS Figure 23 p 22 Functional View amp History Page so am IAS of instruction Figure 24 p 23 Moore s Law Gordon Moore cofounder o ntel He observed based on experience that number of transistors on a chip doubled every year Since 197039s growth has slowed a little Number oftransistors doubles every 18 months Cost ofa chi remained almost unchanged Higher packing density means shorter electrical paths gIvIng hIgher performa maller size gives increased exibilityportability Reduced power and cooling requiremen s Fewer system interconnections increases reliability csci A717 rCom u erAmm sc we Transistors Replaced vacuum tubes Smaller Cheaper Less heat dissipation Solid State device Made 39om Silicon S and Invented 1947 at Bell Labs by William Shockley et al csci A717 7 Comm mmmm mmw view 5 WW wage 2 073A mmm view 5 mm wage 33 073 E E CSCI 47175717 Computer Architecture Topic Buses Reading Stallings Sections 34 35 and 77 Inquot Buses Jane 1 Buses Common Characteristics Multiple devices communicating over a single set of wires Only one device can talk at a time orthe message is garble Each line or wire of a bus can at any one time contain a single binary digit Over time however a sequence of binary digits may be transferred These lines may and often do send information in parallel A computer system may contain a number of different buses Myquot EusesiPaae Buses Structure Serial versus parallel Around 50100 lines although it39s possible to have as few as 3 or 4 Lines can be classified into one of four groups Data lines Address Lines Control Lines Power Inquot Buses Jane 1 Buses Structure continued Bus lines parallel Data Address Control Power Bus lines serial Data address and control are sequentially sent down single wir There may be additional control lines Power Myquot ELsesiPaaeA Buses Structure continued Data Lines Passes data back and forth Number of lines represents width Address lines Designates location of source or destination Width of address bus specifies maximum memory capac39ty High order selects module and low order selects a location within the module Inquot Buses Jane 5 Bus Structure Control lines Because multiplg devices communicate on a line control is neede Timing Typical lines include Memory Read orWrite IIO Read or Write Transfer ACK Bus request grant Interrupt request Interrupt acknowledgement Clock Reset Myquot EusesiPageE Operation Sending Data Obtain the use ofthe bus Transferthe data via the bus Possible acknowledgement ems e 7 Operation Requesting Data Obtain the use ofthe bus Transfer the data request via the bus Wait for other module to send data Possible acknowledgement eases es Classic Bus Arrangement All components attached to bus STD bus Due to Moore39s law more and more functionality exists on a single board so major components re now on same board or even the same chip ems eS Physical Implementations Parallel lines on circuit boards ISA or PCI Ribbon cables IDE ems 7p e m Physical Implementations continued Strip connectors on mother boards P0104 External cabling USB or Firewire eases e M Single Bus Problems Lots of devices on one bus leads to Physically long buses e F39rupagatiun delayse Lun data paths mean that me Drdinatiun er bus use ean adversely affect perfurrnance e Reriemienstermmatien prubierns Aggregate data transfer approaches bus capacity Slower devices dictate the maximum bus speed ems 7p 912 Multiple Buses Multiple Buses Benefits Most systems use multiple buses to overcome Isolate processortomemory traf c from these problems lO traf c Requires bridge to buffer FIFO data due to Support wider variety ofinterfaces d39 eremes 39quot bus Speeds Processor has bus that connects as direct Sometimes lO devices also contain buffering interface to chip then an expansion FIFO interface inter ces it to external devices ISA Cache ifit exists may act as the inter ce to system bus arses 913 gm 4 91A Expansion Bus Example Mezzanine Approach Differences in Ho speeds demands separating devices Separate items that are highspeed and those that re not An additional highspeed bus is added to communicate with the faster devices and also the slower expansion bus Advantage is that highspeed devices are brought closer to processor mi mi Wm iniii Mezzanine Approach continued El Pentium mm m Example m v Source http WWW amd cumus enassetscuntentitypeWhiteipapeisianditechiducsQlEM put Bus Types Dedicated vs Time Multiplexed Dedicated Separate data amp address lines Time multiplexed Shared lines Address valid or data valid control line Advantage fewer lines Disadvantages More complex control Degradation ofperformance Inquot Eu esiPaae 19 Bus Types Physical Dedication Physically separating buses and controlling them with a quotchannel changer Advantages faster Disadvantages physically larger Myquot Eu esiPauezo Bus Arbitration Listening to the bus is not usually a problem Talking on the bus is a problem need arbitration to allow more than one module to control the bus at one time Arbitration may be centralised or distributed Inquot Eu esiPaae 1 Centralised vs Distributed Arbitration Centralised Arbitration Single hardware device controlling bus access Bus ControllerArbiter May be part of CPU or separate Distributed Arbitration Each module may claim the bus Access control logic is on all modules Modules work together to control bus Myquot Eu esiPaae Bus Timing Coordination of events on bus Synchronous controlled by a clock Asynchronous timing is handled by well defined specifications ie a response is delivered within a specified time after a request Buses 7 Page 21 Synchronous Bus Timing Events determined by clock signals Control Bus includes clock line A single 10 cycle is a bus cycle All devices can read clock line Usually sync on leadingrising edge Usually a single cycle for an event Analogy Orchestra conductor with baton Usually stricter in terms of its timing requirements Inquot Buses Jane 24 Synchronous Bus Timing EusesrPi 925 Asynchronous Timing Devices must have certain tolerances to uli provide responses to signal stim More exible allowing slower devices to communicate on same bus with faster devices Performance offaster devices however is limited to speed of bus gm 4 92s Asynchronous Timing Read 17 Asynchronous Timing Write Bus Width Wider the bus the betterthe data transfer rate orthe widerthe addressable memory space Serial Width is determined by lengthduration of ame Peripheral Component Interconnection PCI Bus Brief history Original PC came out with 8bit ISA bus which was slow but had enormous amount of existing equipment ForAT IBM expanded ISA bus to 16bit by adding connector manufacturers started making higher speed proprietary buses Intel released the patents to its PCI and this soon took over as the standard PCI Bus continued Brief list of PCI 22 characteristics General purpose Mezzanine or peripheral bus Supports single and multiprocessor architectures 32 or 64 bit multiplexed address and data Synchronous timing Centralized arbitration requires bus controller 49 mandatory lines see Table 33 Inquot Eu esiPaueH Required PCI Bus Lines Table 33 Systems lines clock and reset Address amp Data 32 time multiplexed lines for addressdata Parity lines Interface Control Hand shaking lines between bus controller and devices Selects devices Allows devices to indicates when they are ready Myquot Eu esiPaueJ Required PCI Bus Lines continued Arbitration Not shared Direct connection to PCI bus arbiter Error lines parity and criticalsystem Inquot Eu esiPaueJJ Optional PCI Bus Lines There are 51 optional PCI 22 bus lines Interrupt lines Not shared Multiple lines for multiple interrupts on a single device Cache support 64bit Bus Extension 2 lines to enable devices to agree to use 64bit transfer JTAGBoundary Scan For testing procedures Myquot Eu esiPaueM PCI Commands Transaction between initiator master and target Master claims bus During address phase of write 4 CBE lines signal the transaction type One or more data phases Inquot Buses 7 Page 5 PCI Transaction Types Interrupt acknowledge prompts identification from interrupting device Special cycle message broadcast IO read read to IO address space IO write write to IO address space Memory read 1 or 2 data transfer cycles Memory read line 3 to 12 data transfer cycles Memory read multiple more than 12 data transfers Inquot Buses Jane 36 PCI Transaction Types continued Mammy write 7 writing 1 ur mare cycles m memury Memory Write and invalidate 7 Writing 1 or more cycles tn Emmy allowing fur cache wmdback pulicy Cunfiguratiun read 7 reading F39Cl device s Eunfiguratiun up tn 25B Eunfiguratiun registers per device Cunfiguratiun Write 7 Writing F39Cl device s configuration up tn 25B Eunfiguratiun registers par de Dual address cycle 7 indicatiun at 647m addressing un 32 bit lines EusesrPi 937 PCI Read Timing Diagram m U l k39 mm 39 39 V 7 if lllnyl w Ems 7 935 PCI Bus Arbiter PCI Bus Arbitration Between Two Masters r rm 3 Higher Performance External Buses Historically parallel has been used for high speed perip erals eg SCSI parallel port zip drives rather than serial port High s eed serial however has begun to replace this need 39al communication also used to be restricted to pointtopoint communications Now there39s an increasing prevalence of multipoint w 2 EEE1394 FireWire Inexpensive alternative needed for SCSI Hig performance serial bus erial implies cheaper cabling fewer wires less shielding less synchronization Small connectors for smaller devices Low cost Easyto implement Also being used in digital cameras VCRs and TVs FireWire Configuration Daisy chaintree structure Mac OS Help indicates that daisy chain is preference for evices Up to 63 devices on single port really 64 ofwhich one is the interface itself Up to 1022 buses can be connected with ri es Automatic con guration for addressing No bus terminators Hot swappable EusesrPi w FireWire 3 Layer Stack Elisa 7p 944 FireV re 3 Layer Stack Physical Layer Transmission medium electrical and signaling characteristics Up to 400 Mb s Arbitration basic form Fair arbitration Urgent arbitration Link layer packet transmission Asynchronous lsochronous Arbitration Basic form Upon automatic con guration each tree designates a root Parentchild relationship forms tree topology oot acts as central arbi ra or Requests are rstcome rstserve Simultaneous requests are granted rst to the closest node to e root and second to the lower Two additional functions are used to best allocate the use ofthe bus 7 Fair arbitratiun 7 Urgentarbitratiun Fair arbitration Keeps one device from monopolizing the bus by allowing only one request during a set fairness interval At beginning of interval all devices set arbitrationenable flag bus access If bus access is granted arbitrationenable flag is cleared prohibiting bus access until next fairness interval Each device may compete for 39l39hey may use the bus up to 75 ofthe Urgent arbitration Allows overriding offairness interval by nodes con gured as having an urgent priority Provides support for devices with severe latency requirements or high throughput such as video time ie for eac nonurgent packet three urgent packets may be sent FireV re 3 Layer Stack Link Layer Transmission of data in packets Two types oftransmission supported Asynchronous Isochronous EusesrPi ea Asynchronous Packets contain Variable amount ofdata Several bytes of transaction layer information Addressing information Acknowledgement gm 4 950 Asynchronous Sequence Asynchronous Sequence continued Arbitration sequence gives one device control of the bus et transmission packet contains source and destination IDs type CRC and possible a a Acknowledgement gap allows destination to receive and process message Acknowled ement destination sends packet containing source and destinationle and code indicating action t ken Subaction gap idle period between packets to r avoid bus conten Ion Isochronous Fixed packet size wvariable amount of data Simpli ed addressing No acknowledgement Pen39odically cyclestart packet is sent indicating period where only isochronous packets are transmitted Transaction Requestresponse protocol This is what your code sees lt separates the programmerfrom the packetlevel stuff Isochronous Sequence W w 2 CSCI 47175717 Computer Architecture Topic Internal Memory Details Reading Stallings Sections 51 amp 53 Basic Organization Memory Cell Operation Represent two stablesemistable states representing 1 and 0 Capable of being written to at least once Capable of being read multiple times Semiconductor Memory Types Random Access Memory RAM Read Only Memory ROM Programmable Read Only Memory PROM Eraseable Programmable Read Only Memory EPROM Electronically Eraseable Programmable Read Only Memory EEPROM Flash Memory Random Access Memory Misnomer Last week we learned that the term Random Access Memory refers to accessing individual memory locations directly by address RAM allows reading and writing electrically of data at the byte level Two types Static RAM Dynamic RAM Volatile Read Only Memory ROM Sometimes can be erased for reprogramming but might have odd requirements such as UV light or erasure only at the block level Sometimes require special device to program ie processor can only read not write Types EPROM EEPROM Custom Masked ROM OTPROM FLASH ROM Uses Permanent storage nonvolatile Microprogramming Library subroutines Systems programs BIOS Function tables Embedded system code EPROM Written to only with a programmer Erased with ultraviolet light Positive 7 nbn7ybiatiie stbrage Witnbut battery 7 ean Write tb it but only Witn aid at brbgrammer Negative 7 brbgrammer reguirements 7 Ex ensiye 7 ideatibns must be erased berbre Writing Written to with either programmer or the processor electrically Erased with eI byleby byle electrically Positive EEPROM 39ther a programmer or the processor 7 nbn7ybiatiie membry Witnbut batteries a singieiueatibn at a time 7 only smaiier sizes availabi 7 extremely SiEIW Write times iEI ms vs iEIEItu 2mm n5 Custom masked ROM You send the ROM manufacturer your data and they mask it directly to the ROM only when you are selling large volume of a single pro uct Positive 7 beebmes eneabertb use rbr abbrbximateiy mbre than mun bans 7 ebmbbnents ebme rrbm enib manufamurer aiready brbgrammed and tested taking uuta manufacturing steb Negative 7 ebsts several tnbusand duiiars rbr eustbm mask 7 sbrtWare enanges are ebstiy 7 eannbt be rebrbgrammed Uses ruses tnatare burnedtbdisebnneetaibgiei and turn it gie u OTPROM tEI a in Written to by you using a brbgrammersimiiartb EF39ROM Once it s Written my the data is in there forever Positive 7 eneabertnan EF39ROM duetu eneaber baekaging 7 metre baekaging bbtibns than EF39ROM duetu iess ebnstraints iike erasure Winde 7 standard brttnesneir ebmbbnent 7 eneabertnan Custurn masked ROM up to abbut iEI EIEIEI deyiees Negative 7 to reprogram have to tner uutthe enib 7 Should only be used rbr stabie design FLASH e memories are basically EEPROMs except that erasure occurs at the block level in order to speed up e write process Nonvolatile This makes FLASH work like a fast solid state hard drive 7 nbn7ybiatiie 7 nignerdensitiestnan bbtn SRAM and DRAM Negative 7 brbeess brstbring data is at a bibek ieyei and sibWer ata eeii must be erased berbre Writing data tb it Flash Density Comparison same 9mm 1 Matas a de Suberbasaux C Mernury199 integrated om Engineering Corporation Scottsdale AZ Wine m smithsoriiarichvs si ediMicecdMEMe WTLE PDF Memory Cell Operation Figure 51 from textbook Connoi Cnnuol Select Dam in a Write b Read Dynamic RAM DRAM Bits stored as charge in capacitors Simpler construction Smaller per bit Less expensive Slower than SRAM maintenance and read overhead explained later Typical application is main memory Essentially analogue level of charge determines value DRAM Structure Figure 52a from textbook 14Il lillquot Trainile In mm unulml ll DRAM Operation Address line active when bit read or written Logic 1 closes transistor switch ie current ows Write Voltage to bit line High for 1 low for 0 Signal address line Controls transfer of charge to capacitor Read Address line selected transistor turns on Charge from capacitor fed via bit line to sense ampli er Compares with reference value to determine 0 or 1 Static RAM SRAM Essentially uses latches to store charge transistor circuit As long as power is present transistors do not lose charge no re 39es Very fast no sense circuitry to drive nor charge depletion Can be batterybacked A small battery is piggybacked to t e RAM chip an allows data to remain even when power is removed Not possible with DRAM More complex construction Larger per bit More expensive Used for Cache RAM because of speed and no need for large volume or high density SRAM Operation Figure 52b from textbook 1 mum mum Wm Mquot u n SRAM Operation Transistor arrangement gives stable logic state State 1 C1 high C2 low T1 T4 of39f T2 T3 on State 0 C2 high C1 low T2ampT3 of39f T1 ampT4 on Address line transistors T5 amp T6 act as switches connecting cell Write apply value to B amp compliment to B Read value is on line B SRAM vs DRAM Both volatile Power needed to preserve data Simpler to build smaller More dense Less expensive Needs refresh Larger memory units Used for cache DRAM Organization Details by example A 16Mbit chip can be organised as a 2048 x 2048 x 4 bit array This arrangement reduces the number of address pins Multiplex row address and column address 11 pins to address 2112048 Adding one more pin doubles range of values for rows and for columns and therefore increases capacity by factor of four DRAM Organization Details continued Mae 33 memener a was5e Number ofbits per addressr ir O 1 rir fowl quotrrFErFr391 wz I 39ipq ii able location Rows Columns DRAM Process Total number of address lines is halfthat of the total needed for the addressable locations A single addressable memory location has the address divided in half eg the MSB half representing the row address and the LSB half representing the column address This saves on pins DRAM Process continued quotRAS row address select strobes the row address in to its buffer or latch while quotCAS column address select strobes the column address into its buffer or latch Note one more pin on the address quadruples the size ofthe matrix doubles rows and doubles columns for an Increase by factor of four To make 16 bit wide data bus you39ll need four of these example modules DRAM Refresh Two things discharge a DRAM capacitor 7 Data read 7 Leakage current Need refreshing even when powered and idle once every few millisec nds Refresh circuit included on chip Even with added cost still cheaper than SRAM cost e resh process involves disabling chip then reading data and writing it back Performed by counting through rows Takes time Slows down apparent performance DRAM Organization Example may it Module Organization Using multiple memories in parallel to increase data width Module Organization Using chip selects to 39ncrease the number ofwords Advan ced D RAM Organization M Cache was the traditional wayto improve performance of the D M Basic DRAM is unchanged since rst RAM chips Enhanced DRAM Contains small SRAM as well SRAM acts as cache holding last line read Cache DRAM CDRAM Larger SRAM a ded Acts as either cache or serial buffer FPM and EDO DRAM Fast Page Mode FPM shortens cycle time by allowing processor to use the same row address but a different column address removes one step in the addressing sequence The data ofa single row is referred to as a quotpagequot Extended DataOut EDO allows the processor to overlap the data read cycle with the write for the next column add ess EDO result is a savings of approximately 10 ns for each read within a single page Synchronous DRAM SDRAM Access is synchronized with an external clock Address is presented to RAM nds data CPU waits in conventional DRAM CPU does not have to wait it can do something else Burst mode allows SDRAM to set up stream of data and re it out in block DDR SDRAM sends data twice per clock cycle leading amp trailing edge SDRAM Sample Timing RAMBUS or RDRAM Suggests transfer rates from 16 to 107 GBytes per secon Subsystem consists ofthe memory array the RAM controller and a wellde ned bus Bus de nition includes all components including the microprocessor and any other devices that may usei Vertical package all pins on one side called a s inline memory modules RIMMs Adopted by Intel for Pentium amp Itanium Bus definition Data exchange over 28 wires Different de nitions require bus lengths less than m long some de nitions are longer up to 2 m lon Bus addresses up to 320 RDRAM chips Communication protocol is packetbased Implements pipelined operation overlapping command an ata 800 to 1200 MHz operation lnititial access time 4 ns A erthat 16 GBps CSCI 47175717 Computer Architecture Topic Cache Memory Reading Stallings Chapter 4 Characteristics of Memory Location wrt Processor Inside CPU temporary memory or registers Inside processor L1 cache Motherboard main memory and L2 cache Main memory DRAM and L3 cache External peripherals such as disk tape and networked memory devices Characteristics of Memory Capacity Word Sizequot The natural data size for a processor A 32bit processor has a 32bit word Typically based on processor39s data bus width ie the width of an integer or an instruction Varying widths can be obtained by putting memory chips in parallel with same address lines Characteristics of Memory Capacity Addressable Unitsquot Varies based on the system39s ability to allow addressing at byte level etc Typically smallest location which can be uniquely addressed At mother board level this is the word It is a cluster on disks Addressable units N equals 2 raised to the power ofthe number of bits in the address bus Characteristics of Memory Unit of transferquot The number of bits read out of or written into memory at a time Internal Usually governed by data bus width ie a word External Usually a block which is much larger than a word Characteristics of Memory Access methodquot Based on the hardware implementation of the storage device Fourtypes Sequential Direct Random Associative Sequential Access Method Start at the beginning and read through in order Access time depends on location of data and previous location Example tape Direct Access Method Individual blocks have unique address Access is by jumping to vicinity then performing a sequential search Access time depends on location of data within quotblockquot and previous location Example hard disk Random Access Method Individual addresses identify locations exactly Access time is consistent across all locations and is independent previous access Example RAM Associative Access Method Addressing information must be stored with data in a general data location A specific data element is located by a comparing desired address with address portion of stored elements Access time is independent of location or previous access Example cache Performance Access Time Time between quotrequestingquot data and getting it RAM Time between putting address on bus and getting data lt39s predictable Othertypes Sequential Direct Associative Time it takes to position the readwrite mechanism at the desired location Not predictable Performance Memory Cycle time Primarily a RAM phenomenon Adds quotrecoveryquot time to cycle allowing for transients to dissipate so that next access is reliable Cycle time is access recovery Performance Transfer Rate Rate at which data can be moved RAM Predictable equals 1cycle time NonRAM Not predictable equals TN TA NR TN Average time to read or write N bits TA Average access time N Number ofbits R Transfer rate in bits per second Physical Characteristics Decay Power loss Degradation over time Volatility RAM vs Flash Erasable RAM vs ROM Power consumption More specific to laptops PDAs and embedded systems Physical Types Semiconductor RAM Magnetic Disk amp Tape Optical CD amp DVD Others Bubble old memorythat made a quotbubblequot of charge in an opposite direction to that of the thin magnetic material that on ic it was mounte Hologram new much like the hologram on your credit card laser beams are use to store computer generated data in three dimensions 10 times faster with 12 times the density Organization Physical arrangement of bits into words Not always obvious Nonsequential arrangements may be due to speed or reliability benefits eg interleaved Memory Hierarchy Tradeoffs among three key characteristics Amount Software will ALWAYS fiII available memor Speed Memory should be able to keep up with the processor Cost Whatever the market will bear Balance these three characteristics with a memory hierarch alogy Refrigerator amp cupboard fast access lowest v rie a freezer amp pantry slower access bettervariety grocery store slowest access greatest variety Memory Hierarch continued Implementation Going down the hierarchy has the following results Decreasing cost per bit cheaper Increasing capacity larger Increasing access time slower KEY Decreasing frequency of access of the memory by the processor Memory Hierarch continued More costly Access times More costly T If volume IS mounted Source Null Linda and Lobur Julia 2003 Computer Organization andArchiIecture p 236 Sudbury MA Jones and Bartlett Publishers M echanics of Technology The basic mechanics of creating memory directly affect the first three characteristics ofthe hierarchy Decreasing cost per bit Increasing capacity Increasing access time The fourth characteristic is met because of a principle known as locality of reference lnClass Exercise In groups examine the following code Identify how many times the processor quottouchesquot each piece of data and each line of code int values 8 I9 34 23 67 23 7 3 65 for count 0 con t lt 8 count sum valuescount For better results try the same exercise using the assembly language version found at I7week 03assvbdf httn39lfarultv et 11 Locality of Reference Due to the nature of programming instructions and data tend to cluster together loops subroutines and data structures Over a long period of time clusters will change Over a short period clusters will tend to be the same Breaking Memory into Levels Assume a hypothetical system has two levels of memory Level 2 should contain all instructions and data Level 1 doesn39t have room for everything so when a new cluster is required the cluster it replaces must be sent back to the level 2 These principles can be applied to much more than just two levels If performance is based on amount of memory rather than speed lower levels can be used to simulate larger sizes for higher levels eg virtual memory Memory Hierarchy Examples Example If 95 of the memory accesses are found in the faster level then the average access time might be 095001 uS 00501 u8 00095 00055 0015 uS Performance of a Simple Two Level Memory Figure 42 41 mm umm A n limqu m iumu mmiung nnh Luvl 1 quotIt mm c5047 444 4 4 Hierarchy List Registers volatile L1 Cache volatile L2 Cache volatile CDRAM main memory cache volatile Main memory volatile Disk cache volatile Disk nonvolatile Optical nonvolatile Tape nonvolatile cscr4717 39 M n 4 Cache What is it A cache is a small amount of fast memory What makes small fast Simpler decoding logic More expensive SRAM technology Close proximity to processor Cache sits between normal main memory and CPU or it may be located on CPU chip or module c5047 444 4 4 Cache continued Block Transfer Word Transfer 34717 4 n4 4 4 Cache operation overview CPU requests contents of memory location Check cache forthis data If present get from cache fast If not present one of two things happens read required block from main memory to cache then deliverfrom cache to CPU cache physically between CPU and bus read required block from main memory to cache and simultaneously deliverto CPU CPU and cache both receive data from the same data bus buffer c5047 4 n4 4 4 Going Deeper with Principle of Locality Cache quotmissesquot are unavoidable ie every piece of data and code thing must be loaded at least once What does a processor do during a miss It waits for the data to be loaded Power consumption varies linearly with clock speed and the square of the voltage Adjusting clock speed and voltage of processor has the potential to produce cubic cubed root power reductions httn39lvwvw isr Vt edu 39 39 39 39 WnPh nd Identify places in inclass exercise where this might happen cscr4717 39 k M n A 4 Cache Structure Cache Structure continued L Cache includes tags to identify the address of numl the block of main memory contained in a line 539 of the cache 2 Each word in main memory has a unique nbit a ress There are M2quotK block of K words in main memory or Cache contains C lines of K words each plus a tag uniquely identifying the block of K words 4 53t nigg39h p Memory Divided into Blocks Cache Design Mammy Add Size 2 Mapping Function 3 Elackuf KW 5 Replacement Algorithm Av Write Policy quot Block s e Number of Caches mm 271 Wmqu Cache size Typical Cache Organization Addm Cost More cache is expensive Speed 11 More cache is faster up to a point Larger decoding circuits slow up a cache Algorithm is needed for mappin main memory resses to lines in the cache This takes more time than just a direct RAM Mapping Functions A mapping function is the method used to locate a memory address within a cache It is used when copying a block from main memory to the cache and it is used again when trying to retrieve data from the cache There are three kinds of mapping functions Direct Associative Set Associative Cache Example These notes use an example of a cache to illustrate each of the mapping functions The characteristics of the cache used are Size 64 kByte Block size 4 bytes ie the cache has 16k 2 lines of 4 bytes Address bus 24bit ie 16M bytes main memory divided into 4M 4 byte blocks Direct Mapping Traits Each block of main memory maps to only one cache line ie if a block is in cache it will always be found in the same place Line number is calculated using the following function i j modulo m where i cache line number j main memory block number m number of lines in the cache Direct Mapping Address Structure Each main memory address can by divided into three elds Least Signi cant w bits identify unique word within a block Remaining bits s specify which block in memory These are divided into two el s Least signi cant r bits ofthese s bits identi es which line in Most signi cant sr bits uniquely identi es the block within a line ofthe cache Tag Bits identifying Bis identifying Word W in cac e offset into block Direct Mapping Address Structure continued Why are the rbits used to identify which line in cache More likely to have unique r bits than sr bits based on principle of locality of reference Direct Mapping Address Structure Example Tag sr Line or slotr Word w 14 24 bit address 2 bit word identifier 4 byte block 22 bit block identifier 8 bit tag 22 14 14 bit slot or line No two blocks in the same line have the same tag Check contents of cache by finding line and comparing tag Direct Mapping Cache Line Table m4 m71 2m71 3m712571 Direct Mapping Cache Organization Direct Mapping Examples What cache line number will the following addresses be stored to and what will the minimum address and the maximum address of each block they are in be ifwe have a cache with 4K lines of 16 words to a block in a 256 Meg memory space 28bit address Tag Lineorslotr Wordw a 9ABCDEF16 b 12345671B More Direct Mapping Examples Assume thata purtiun ut the tags lrithE cache in uurexarnple leeks like the table beluvv Which ufthe fulluvving addresses are Euntained in the cache a438EEEm bFlEEFFm cmeaeram MADEEFLE Direct Mapping Summary Address length s w bits Number of addressable units 25 W words or b es Block size line width 2W words or byles Number of blocks in main memory 25 Wl2W 25 Number of lines in cache m 239 Size oftag s r bits Direct Mapping pros amp cons Simple Inexpensive ixed location for given block If a re m cesses 2 blocks that map to the same line repeatedly cache misses are very high thrashing Associative Mapping Traits A main memory block can load into any line of cache Memory address is interpreted as Every line39s tag must be examined for a match ache searching gets expensive and slower Associative Mapping Address Structure ple 22 in example 22 bit tag stored with each 32 bit block of data Compare tag eld with tag entry in cache to check for hit Least signi cant 2 bits of address identify which of the four 8 bit words is required from 32 bit data block Fully Associative Cache Organization W V it 7 um mm Fully Associative Mapping Example sume thata emeh ufthetags in the eaehe ih uurexarnp looks like the table beluvv vvhieh er the fuiiDWlng addresses Euntalned in the eae e ie are aJASEEEEm bJFiEEFFm emEaEFam MADEEFLE Associative Mapping Summary Address length s w bits Number of addressable units 25 W words or b es Block size line size 2W words or bytes Number of blocks in main memory 25 Wl2W 2s Number of lines in cache undetermined Size oftag s bits Set Associative Mapping Traits Address length is s w bits Cache is divided into a number of sets v 2d k blockslines can be contained within each set klines in a cache is called a k way set associative mappin ber of lines in a cache vk k2d Size oftag sd bits Set Associative Mapping Traits continued Hybrid cit DirectandAssuciativE k i i tnis is basically direct mapping v lms is assciciative mappin E ins a number cit lines basically tne number cit iines divided bythe ndrnbercir sets A given black Black a can b 2 lines persetisthe rn mapstu any line Witnin its specified seti e g e in anyiine cit seti cist curnrncin cirganizaticin e Called 2 Way assciciative mappln e A given black can be in cine cit2 lines in unly cine specinc set 7 Significant irnprcivernent civer direct mapping KWay Set Associative Cache Organization Csciu rcom merArcm sc we CacheWerner 7e 9580751 How does this affect our example Let39s go to twoway set associative mapping Divides the 16K lines into BK sets This requires a 13 bit set numbe With 2 word bits this leaves 9 bits for the tag Blocks beginning with the addresses 000000i 00300016 01000016 01300016 02000016 02300015 Blocks beginning with the addresses 00000416 00300416 01000416 01300416 02000416 02300415 etc map to the same set Set 1 Set Associative Mapping Address Structure Note thatthere is one more bit in tne tag tnan for tnis sarne example using direct rnapping Therefore it is ervay set associative Compare tag field to see itwe have a nit Set Associative Mapping Example rcireacn cittne rciiicivving addresses answer thefulluvvlng ddesticins based cin a Zevvay set aSSDElathE cacne Witn 4K lines i i g 16 WDrdS Witntne rnain rnErnury cit size 25B Meg rnErnury space 2am address cacne set number Wiiitne pluck be stcired tci7 What Wiiitneir tag be What Will tne rn lmum address and tne maximum address cit eacn black they are in be i BAECDERE Tag 5139 Set 5 Word w 2 imamm n Set Associative Mapping Summary Address length s w bits Number of addressable units 25 W words or bytes Block size line size 2W words or bytes Number of blocks in main memory 25 W2W 25 Number of lines in set k Number of sets v 2d Number of lines in cache kv k 2d Size oftag s d bits Replacement Algorithms There must be a method for selecting which line in the cache is going to be replaced when there s no room for a new line Hardware implemented algorithm speed Direct mapping There is no need for a replacement algorithm with direct mapping Each block only maps to one line Replace that line Associative amp Set Associative Replacement Algorithms Least Recently used LRU Replace the block that hasn39t been touched in the longest period of time Two way set associative simply uses a USE bit When one block is referenced its USE bit is set while its partner in the set is cleared First in first out FIFO replace block that has been in cache longest Associative amp Set Associative Replacement Algorithms continued Least frequently used LFU replace block which has had fewest hits Random only slightly lower performance than usebased algorithms LRU FIFO and LFU Writing to Cache Must not overwrite a cache block unless main memory is up to date Two main problems If cache is written to main memory is invalid or if main memory is written to cache is invalid Can occur if lO can address main memory directly Multiple CPUs may have individual caches once one cache is written to all caches are invalid Write through All writes go to main memory as well as cache Multiple CPUs can monitor main memory traffic to keep local to CPU cache up to date Lots of traffic Slows down writes Write back Updates initially made in cache only Update bit for cache slot is set when update occurs If block is to be replaced write to main memory only if update bit is set Other caches get out of sync lO must access main memory through cache Research shows that 15 of memory references are er es Multiple ProcessorsMultiple Caches Even if a write through policy is used other processors may have invalid data in their caches In other words if a processor updates its cache and updates main memory a second processor may have been using the same data in its own cache which is now invalid Solutions to Prevent Problems with Multiprocessorcache systems Bus watching with write through each cache watches the bus to see if data they contain is b i g cessors must be using the write through policy Hardware transparency a quotbig brotherquot watches all caches and upon seeing an update to any processor39s cache it updates main memoryAND all ofthe caches Noncacheable memory Any shared memory identi ed with a chip select may not be cached Line Size There is a relationship between line size ie the number ofwords in a line in the cache and hit ratios As the line size block size goes up the hit ratio could p due to more words available to the principle of locality of reference As block size increases however the number of blocks goes down and the hit ratio will begin to go back down a era wh39 e Lastly as the block size increases the chances ofa hit to a word fartherfrom the initially referenced word goes dovm MultiLevel Caches Increases in transistor densities have allowed for caches to be placed inside processor chip Internal caches have very short wires within the chip itself and are therefore quite fast even fasterthen any zero waitstate memory accesses outside of the chip This means that a super fast internal cache level 1 can be inside ofthe chip while an external cache level 2 can provide access faster then to main memory Unified versus Split Caches Split into two caches one for instructions one for data Disadvantages Questionable as uni ed cache balances data and instructions merely with hit rate Hardware is simplerwith uni ed cache Advantage What a split cache is really doing is providing one che for the instruction decoder and one for the t nit a execu ion u This supports pipelined architectures Intel x86 caches 80386 no on chip cache 80486 8k using 16 byte lines and fourway set associative organization main memory had 32 address lines 4 Gig Pentium all versions Two on chip L1 caches Data amp instructions Pentium 4 L1 and L2 Caches L1 cache 7 SK bytes 7 E4 byte iines 7 Fuur Way set assbeiatiye L2 cache 7 Feeding buth Li cache 7 ZSBK 7 128 byte iines 7 8 Way set assueiatiye Pentium 4 Figure 413 Pentium 4 Operation Core Processor FetchDecade Unit 7 Fetches instructiuns frurn L2 cache 7 Deebde intb miembbs 7 Sture miembbs in Li cache Outufurderexecutiun iugic 7 Scheduies miembbs 7 Based ein data debendenee and resuurces 7 May sbeeuiatiyeiy execute Pentium 4 Operation Core Processor continued Executiun units 7 Execute miereeubs 7 Data frurn Li cache 7 Resuits in register Memury subsystem 7 L2 cache and systems bus Pentium 4 Design Reasoning Deebdes instruetieins intei RiSC iike mierdbbs befure Li cac e Miembbs fixed iength 7 Superscaiar pipeiining and scheduiing Pentium instructiuns icing Stcurnpiex Perfurrnance imbmyed by separati g decudingfrum scheduiing e pipeiining 7 Mere iat r7 ch14 n e Pentium 4 Design Reasoning n nue Data cache is Write back 7 Can be cunfigured tn Write thruug Li cache cuntruiied by 2 bits in register 7 CD ache disabie W nut Write thruugh 7 2 instructiunstu inyaiidate ush cache and Write back then inyaiidate Power PC Cache Organization BEIi 7 singie 3M 8 way set assuciative EDS 716mm X m wD way set assuciative EEI4 7 3 EiEI 7 B4iltb e 34 7 B4iltb Li cache 7 a way set assuciative 7 256K 512k ur 1M L2 cache 7 Mn way set assuciative PowerPC G4 Figure 414 Comparison of Cache Sizes Table 43 CSCI 47175717 Computer Architecture Topic Error Detection amp Correction Reading Stallings Section 52 Error Correction in Memory Types of errors hard or soft Hard Failure Permanent defect caused by 7 Harsh Envirunmentai abuse ineiuuing statie eieetrieity 7 Manufacturing uereet Wear suen as traee eresien Soft Error 7 Randum nen7uestruetive 7 Caused by eieetrieai urEMradiuactive giitenes 7 Nu permanent damage te mernury Error Detection amp Correction Additional information must be stored to detect these errors When M bits of data are stored they are run through function fwhere a K bit code is created MK bits are then stored in memo When data is read out it is once again run through function f and the resulting K bits of code are compared with the stored K bits of code In some cases the code can be corrected error correcting codes In all cases and error code is generated Error Correcting Code Function l rmr mini mu quottil n1 11 Hamming Error Correction Code One way to detect speci c bit errors is to use multiple parity bits eac 39 responsible for the parity of a smaller overlapping portion of the data A ipped bit in the data would show up as a parity error in the overlapping groups of which it was a member and not in the other groups This would handle singlebit corrections 4bit Hamming Code Below is an example ofa 4bit word broken into 3 groups each group has a parity bit to generate even an Dquot represent data bits while Pquot represent parity bits Gmup A 1 u 1 u Gruup B 1 1 1 1 Gruup c 1 u 1 u 4bit Hamming Code continued be an represented graphically using e intersecting circles 4bit Hamming Code continued Areas are de ned as 7 A and El butnut c 7 A and CbutnutE1 7 El and c euthth 7 A and El and 0 Each nonintersecting area contains a parity bit to make it and the three intersecting areas in a single circle have even parity A change in only one area will make parity odd in 2 or all 3 ofthe circles indicating which intersection changed General SingleBit Error Correction The mechanics ofthe typical error correctiondetection system are created with XOR gates Upon data retrieval two Kbit values are generated 7 The stdred Kbitvaide 7 The Keitvaide generated rrdrh the stdred data A bitbybit comparison is performed on these two values generating a Kbit resut El s 2 Dr 7 is lri bit pusitiuns Where Wu bits disagree 7 Kat result is eaiied a syndrome word Generation of Syndrome Word Storudh V 7 Gamma i 7am um Storcdi Z Calculated U quot39 1 quot 39 shied gt I 7 t39uicummiM 0 a bqutll 1 mu Syndrome Word All zeros means that the data was successfully retrieved m39l39l 9 data with M bits and K code bits then there re MK possible single bit errors ie there could be an error in the data OR the Kbit code a e word there are 291 minus one for the no error case possible values to represent singlebit errors Therefore forthe system to uniquely identify bit errors 2K1 3 MK 39n 9 7 239 m lt Single Error Correcting SEC Code Example Assume MB First how big does K have to be 3 2313337 7 is not311 K 2M 38M 15 is312 mm W cumrm W m tamm mm m a ram aims SEC Code Example continued Next decide what the values ofthe syndrome word represent 0 no errors in syndrome word or data Only one bit of syndrome word set to one 1000 100 0010 or 0001 error was in syndrome word and data needs no correction SEC Code Example continued The table beluvv is used tn identify which hits thhe MK hits at the Dmbined data and syndrume War are assDDiatEd Wit Which pussible values at the syndrume Ward 110987654321 MKExt 12 paaudn Paaum Multiple bits of syndrome quot 39 represented by syndrome word identi es which bit of data was flipped and needs to be corrected Dntahts D8 D7 D6 D5 D4 D3 D2 D1 cdde bxts c2 c4 c2 Cl SEC Code Example continued We need a system such that the XORing ofthe stored code or check bits with the code or check bits calculated identi es the position number from the table above This means that when a hit changes in the data then ones need to appear in the digi identifying that position Each code bit C8 C4 C2 and C1 is calculated by XORing a ofthe bits in that position that SEC Code Example continued 08D8 D7 D6 D5 C4D8 D4 D3 D2 02D7 D6 D4 D3 D1 C1D7 D5 D4 D2 D1 Single Error Correcting Double Error Detecting SECDED Code Duuble errurdetectiun Will nut currectduuble Errurs but itWill see if a Duble Errur has deedne Adds additiunal bit fur even parity tn the MK bits thhe data and check edde lfune bit changed the enange eadsed parity in dd frum even tn ddd Changing it back Will resture parity lftvvu bits changed parity stayed even and a edneendn Will fume paritytu dd tn ddd indicating a duuble Errur SECDED Example CSCI 47175717 Computer Architecture Topic Storage Media Reading Stallings Chapter 6 sm enema 91 Types of External Memory Magnetic Disk 7 RAlD 7 Remuvable Optical 7 CDROM 7 CDREmrdable CUR CURW 7 DVD Magnetic Tape Magnetic Disk sm enema e Physical Disk Disk substrate coated with magnetizable material iron oxiderust Substrate used to be aluminium now 7 lrnpruved surfaee uriiferrnity7 iriereases reliability 7 Reduetieri iri surfaee defects 7 Reduced readWrite errers 7 Luvver fly heights 7 Better stiffriess 7 Better shtickdamage resistance Read and Write Mechanisms Read and Write Mechanisms continued Recording and retrieval via conductive coils called a heads May be single readwrite head or separate ones During readwrite head is statio ary n actually moves radially to platters and platter rotates beneath head Hard Drive Write Current through coil produces magnetic eld Pulses sentto head Magnetic pattern recorded on sur ce below Hard Drive Read traditional Magnetic eld moving relative to coil produces current Analogous to a generator or alternator Coil can be the same for read and write Used with Floppies Older harddrives 5m enema e7 Hard Drive Read contemporary Separate read head close to write head Partially shielded magneto resistive MR Electrical resistance depends on direction 39c iel Passing current through it results in different voltage levels for different resistances High frequency operation Higher storage density and speed 5m enema es Data Organization and Formatting Data Organization and Formatting continued Concentric rings or tracks Track is same width as head Thousands oftracks per platter surface Intertrack gaps Gaps between tracks protect 7 possibly increase errors due in misalignment at head or interference from uthertracks Constant angular velocity Same number of bits pertrack variable packing density Tracks divided into sectors Minimum block size is one sector although may have more than one sector per block Typically hundreds of sectors pertrack May be xed orvariable in length Contemporary systems are fxedlength with 512 bytes being common Sectors also have gaps called intratrack or intersector gaps Imagine a matrix with the rows as tracks and the Twist matrix into a disk and see how much more Creates pie shaped sectors and concentric cks Capacity limited by density on inside track Constant Angular Velocity CAV columns 5 sectors packed the center is than the outside Regardless of head position sectors pass beneath it at the same constant speed Outer tracks waste with lower data density Multiple Zone Recording 5m eMedierz 913 Multiple Zone Recording ntinued Divide disk into zones typical number is 16 Each zone has xed bitssectors per track More complex circuitry to adjust for different data rates as heads move farther out 5m eMedierz 91A Identifying Sectors ST506 Example old u n Mum n H mm mm M um 0 mm Formatting Two kinds offormatting Low level allows hard drive to nd sectors OS level allows for le system Must be able to identify start of track and sector Format disk Additional information not available to user Marks tracks and sectors Characteristics of Hard Drives Head Motion Disk Portability i es Platters Head Mechanism Head Motion Fixed head vs heads on a movable arm Fixed head old One read write head per track Heads mounted on xed ridged arm Movable head Heads move radially across tracks One read write head per side Disk Portability Removable vs xed Remova le dis Examples oppy ZIP Jazz Can be removed from drive and replaced with ot er is Provides unlimited storage capacity Easy data transfer between systems Nonremovable disk permanently mounted in the drive an enema 913 Sides and Platters Single old or cheap vs double typical sided Single ormultiple pl tter One head per si e Heads arejoined and aligned Aligned tracks on each platter form cylinders Data is striped by cylinder reduces head movement Increases speed transfer rate an enema 920 Cylinders Cylinder Head mechanism There are a number ofcharacteristics of the head that affect drive performance Head size Distance of head from platter Head Mechanism Tradeoffs Smaller heads allow for higher densities but force head to be closerto the disk The closerthe head the greater risk of quotcrashes Distance ofhead 39om magnetic media Contact Floppy Fixed gap Flying Winchester Head rests en platter at rest vvnen platter Splrls alr pressure in head frurn platter Dat polarization corresponding to two values 1 and 0 Reas ns39 Data Encoding a is not stored as two directions ofmagnetic o 7 Hard drlvE heads detectthe Changesln rnagnerle direerlen nut rne direerlen errne eld 7 Dlrrleulr in read large blacks er all enes ur all zeres e eventually eenrreller Wuuld lese synenrenlzarlen One method for storing data uses a clock to de ne the bit positions and by watchin how the magnetic eld changes with respect to that clock indicates presence of one or zero Me FM Encoding MFM Encoding A magnetic eld change at the beginning and 39 JUS like FM EXCEPNnEt Changes at beginning middle of a bit time represents a ogic o of bit time are removed unless two 0 s are next netic eld change only at the t eaCh ther beginning represents a logic zero Called Modified Frequency Modulation MFM Referred to as Frequency Modulation FM mini 9 6 a 1 5 gt 5mm gt 14 v F a t fh JA 1c hd1gtJ LLLIA LL1 1 J JJ LLLL1 J J J Mm 111 I 0 n 1 U i I I quot 1111111 uiuii 11 single wk 1mm 1mm 5mm mm 925 Sign mm 925 RLL Encoding RLL Encoding continued Run Length Limited RLL uses polarity Goals ofencoding changes to represent sequences of its to ensure enough polarity changes to ratherthan individual 0 s or 1 maintain bit synchronization to ensure enough bit sequences are 39 41 min de ned so that any sequence ofones and 9H os can be handled and ti to allow for the highest number of bits to 011 1 represented with the fewest number 0 mm it t polarity changes 111m 1 RLL Encoding continued Latest Encoding Technology Note that the shortest period between polarity changes is one and a halfbit periods Improved encoding methods have been This produces a 50 increased data density introduced since the development of RLL Ver MFM enCOdan Use digital signal processing and er methods to realize better data densities i nhm 94 396 39 gt 7 39 quot 7 These methods include Partial Response Rm v j Z J L Maximum Likelihood PRML and WW 1 u o 1 o 1 1 1 0 Extended PRMLEPRML encoding LH W J t gmup 1m mm bu gimp lm gimp Me SMART SelfMonitoring Analysis amp Reporting Technology System SMART is a method used to predict hard drive failures Controller monitors hard drive functional parameters For example longer spinup times may indicate that the bearings are going bad SMART enabled drives can provide an alert to the computer39s BIOS warning of a parameter that is functioning outside of its normal range Attribute values are stored in the hard drive as an integer in the range from 1 to 253 The lowerthe value the worse the condition is De ending on the parameter and the manufacturer different failure thresholds are set for each ofthe parameters Sample SMART Parameters Power On Hours This indicates the age of the drive Spin Up Time A longer spin up time may indicate a problem with the assembly that spins the platters Temperature Higher temperatures also might indicate a problem with the assembly that spins the platters Head Flying Height A reduction in the flying height of a Winchester head may indicate it is about to crash into the platters Doesn t cover all possible failures IC failure or a failure caused by a catastrophic event Speed Queuing time waiting for Ho device to be useable Waiting for device if device is serving another Waiting for channel if device shares a channel with other devices multiplexing Disk rotating at a constant speed energy saver disk may s op Seek time Process of finding data on a disk Find correct track by moving head moveable head Selecting head fixed head takes no time Some details cannot be pinned down Ramping functions Distance between currenttrack and desired track Shorter distances and lighter components have e reduced seek tIm Rotational Latency Waiting for data to rotate uncler head Floppies 3600 RPM Hard Drives upto15000 RMP Average rotational delay is 12 time forfull rotation Total Access time Seek Latency Transfer Time Transfertime time it takes to retrieve the data as it passes underthe hea T brN where T transfer time b number of bytes to transfer N number ofbytes on a track ie bytes per mil revolution r rotation speed in RPS ie tracks per second Rotational Position Sensing RPS Allows other devices to use IIO channel while seek is in process When seek is complete device predicts when data will pass under heads a xed time before data is expected to come tries to reestablish communications 39 requestin processor if fails to reconnect must wait full disk turn before new attempt is made RPS miss 5 sm eMedlirPi e37 Random access File is arranged in contiguous sectors only one seek time pertrack File is scattered to different sectors or device is shared with multiple processes seek t39 e increased to once per sector sm eMedlirPi 935 Redundant Array of Independent Disks Rate of improvement in secondary storage has not kept up with that of processors or main mory In many system gains can be had through parallel systems In disk systems multiple requests can be serviced concurrently ifthere are multiple disks and the data for parallel requests is stored on different disks RAID continued Standardization of multidisk arrays levels 0 through 6 Not a hierarch Common characteristics 7 Set at pnysieal dlsks VlEWEd as single ldgieal driye by 7 Data distributed aerdss rnultiple pnysieal driyes dr array 7 Cari use redundant eapaeitytd stdre parity inrdrrnatidn td aid in errdr edrreetidndeteetidn Third characteristic is needed because multiple mechanisms mean that there are more possibilities for failure Striping User s data and applications see one logical drive Data is divided into strips Could be physical blocks sectors or some other unit The strips are then mapped to the different physical drives Striping continued W RAID 0 May not be considered RAID of cially as it doesn39t support third characteristic from above common characteristics No redundan Data striped across all disks Round Robin stripin Performance characteristics Increases speed since multiple data requests are probably in se u nce of strips and therefore can be done in parallel High IO request rate 5m 9mg w RAID 0 continued 5m 9mg e44 Expensive RAID 1 Mirrored Disks 2 copies of each stripe on separate disks Data is striped across disks just like RAID 0 Read from either slight performance increase 1 disk has shorter seek time Write to both slight performance drop one disk will ave longer seek time Recovery is simple swap faulty disk amp remirror no down time Performance characteristics Same as for RAID 0 since twice capacity is required likely to be limited to critical system software and data les 5m 9mg 946 RAID 1 continued 5m 9mg 946 RAID 2 Disks are synchronized to the point where each head is in s me osition on each i n a single read or write all disks are accessed simultaneous y Striped at the bit level Error correction calculated across corresponding bits on disks Multiple parity disks store Hamming code wparity SECDED error correction in corresponding position 5m 9mg 9N RAID 2 continued Error correction is redundant as Hamming and such are already used within stored data Only effective when many errors occur 0 s of redundancy Ex ensive P Not commercially accepted Performance characteristics Only one IO request at a time nonparallel 5m 9mg aw RAID 2 continued 5m Mam ea RAID 3 Similar to RAID 2 Only one redundant disk no matter how large the array Simple parity bit for each set of corresponding bils doesn39t actually detect failed drive but can replace it Data on failed drive can be reconstructed from surviving data and parity info 5m Mam 950 RAID 3 continued Example assume RAID 3 with 5 drives X4i X3i X2i X1i X0i Failed bit e X1i can be replaced with 39 4i X3i X2i X0i Equation derived from XOR39ing X4i X1i to both sides Performance characteristics Very high transfer rates Problem Only one lO request at a time non parallel 1 RAID 3 continued i RAID 4 Not commercially accepted Each disk operates independently Large strip s Bitby bit p rity calculated across stripes on each disk stored on parity disk e a 7 High lo request rates parallel 7 Less sulted furhlgh data transferrates RAID 4 continued Problem there is a write penalty with each write 1 old data strip must be read old parity strip must be read a new parily strip must be calculated a new parily strip must e store 2 3 4 5 new data must be stored RAID 4 continued Original parity calculation X40 X30 9 X20 9 X10 9 X0i New bit is stored eg X1i parity is recalculated X4390 X30 9 X20 9 X10 9 X0i X4390 X30 9 X20 9 X10 9 X00 9 X10 9 X10 X40 X40 9 X10 9 x10 5m 9mg 956 RAID 4 continued u my mmrwt mm 5m 9mg 95 RAID 5 Like RAID 4 except drops parity disk Parity strips are staggered across all data disks Round robin allocation for parity stripe Avoids RAID 4 bottleneck at parity disk Commonly used in network servers RAID 5 continued M ivul 0mm Mun RAID 6 Two parity calculations XOR parity is one ofthem Independent data check algorithm Stored in separate blocks on different disks User requirement of N disks needs N2 High data availability Three disks need to fail for data loss Signi cant write penalty RAID 6 continued innmum mm RAID Summary AAAAA w MM 0
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'