Computer Architecture

by: Ashleigh Dare

Computer Architecture ECS 201A


This 30 page Class Notes was uploaded by Ashleigh Dare on Tuesday September 8, 2015. The Class Notes belongs to ECS 201A at University of California - Davis taught by Staff in Fall. Since its upload, it has received 89 views. For similar materials see /class/191697/ecs-201a-university-of-california-davis in Engineering Computer Science at University of California - Davis.


Date Created: 09/08/15
Lecture 6: Storage Devices, Metrics, RAID, I/O Benchmarks, and Busses
Prof. Fred Chong
ECS 250A Computer Architecture, Winter 1999
(Adapted from CS252, UCB)

Motivation: Who Cares About I/O?
- CPU performance: 60% per year
- I/O system performance limited by mechanical delays (disk I/O): < 10% per year (I/O per sec or MB per sec)
- Amdahl's Law: system speedup limited by the slowest part
  - 10% I/O and 10x CPU => 5x performance (lose 50%)
  - 10% I/O and 100x CPU => 10x performance (lose 90%)
- I/O bottleneck:
  - Diminishing fraction of time in CPU
  - Diminishing value of faster CPUs

Storage System Issues
- Historical Context of Storage I/O
- Secondary and Tertiary Storage Devices
- Storage I/O Performance Measures
- Processor Interface Issues
- Redundant Arrays of Inexpensive Disks (RAID)
- ABCs of UNIX File Systems
- I/O Benchmarks
- Comparing UNIX File System Performance
- I/O Busses

Technology Trends
- Disk capacity now doubles every 18 months; before 1990, every 36 months
- Today: processing power doubles every 18 months
- Today: memory size doubles every 18 months (4X/3 yr)
- Today: disk capacity doubles every 18 months
- Disk positioning rate (seek + rotate) doubles every ten years!

Storage Technology Drivers
- Driven by the prevailing computing paradigm
  - 1950s: migration from batch to on-line processing
  - 1990s: migration to ubiquitous computing
    - computers in phones, books, cars, video cameras, ...
    - nationwide fiber optical network with wireless tails
- Effects on storage industry:
  - Embedded storage: smaller, cheaper, more reliable, lower power
  - Data utilities: high capacity, hierarchically managed storage

Historical Perspective
- 1956 IBM Ramac to early 1970s Winchester
  - Developed for mainframe computers, proprietary interfaces
  - Steady shrink in form factor: 27 in. to 14 in.
- 1970s developments
  - 5.25 inch floppy disk form factor
  - early emergence of industry standard disk interfaces: ST506, SASI, SMD, ESDI
- Early 1980s: PCs and first generation workstations
- Mid 1980s: client/server computing
  - Centralized storage on file server; accelerates disk downsizing: 8 inch to 5.25 inch
  - Mass market disk drives become a reality
    - industry standards: SCSI, IPI, IDE
    - 5.25 inch drives for standalone PCs
    - End of proprietary interfaces

Disk History
[Figure: data density (Mbit/sq. in.) and capacity of unit shown (MBytes). Legible points: 1973: 1.7 Mbit/sq. in.; 1979: 7.7 Mbit/sq. in., 2300 MBytes; 1997: 1450 Mbit/sq. in., 2300 MBytes; 1997: 3090 Mbit/sq. in., 8100 MBytes. The 1989 point is not legible.]
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

Historical Perspective
- Late 1980s/early 1990s: laptops, notebooks, (palmtops)
  - 3.5 inch, 2.5 inch, (1.8 inch form factors)
  - Form factor plus capacity drives market, not so much performance
    - Recently bandwidth improving at 40%/year
  - Challenged by DRAM, flash RAM in PCMCIA cards
    - still expensive, Intel promises but doesn't deliver
    - unattractive MBytes per cubic inch
  - Optical disk fails on performance (e.g., NEXT) but finds niche (CD ROM)

Disk History: MBits per square inch, DRAM as % of Disk over time
[Figure: time axis 1974 to 1998; legible annotations: 9 v. 22 Mb/si, 470 v. 3000 Mb/si.]
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

Alternative Data Storage Technologies: Early 1990s

Technology                  Cap (MB)    BPI     TPI   BPI*TPI (millions)  Data Xfer  Access Time
Conventional Tape:
  Cartridge (.25")             150    12000     104          1.2               92    minutes
  IBM 3490 (.5")               800    22860      38          0.9             3000    seconds
Helical Scan Tape:
  Video (8mm)                 4600    43200    1638         71                492    45 secs
  DAT (4mm)                   1300    61000    1870        114                183    20 secs
Magnetic & Optical Disk:
  Hard Disk (5.25")           1200    33528    1880         63               3000    18 ms
  IBM 3390 (10.5")            3800    27940    2235         62               4250    20 ms
  MO (5.25")                   640    24130   18796        454                 88    100 ms

Devices: Magnetic Disks
- Purpose:
  - Long-term, nonvolatile storage
  - Large, inexpensive, slow level in the storage hierarchy
- Characteristics:
  - Seek time (~8 ms avg): positional latency, rotational latency
  - Transfer rate: about a sector per ms (5-15 MB/s); blocks
  - Capacity: gigabytes; quadruples every 3 years (aerodynamics)
- [Figure labels: track, sector, cylinder, head, platter; 8 ms per rev, 0.25 ms per sector]
- Response time = Queue + Controller + Seek + Rot + Xfer
  (Controller + Seek + Rot + Xfer is the service time)

Disk Device Terminology
- Disk Latency = Queuing Time + Controller Time + Seek Time + Rotation Time + Xfer Time
- Order of magnitude times for 4K byte transfers:
  - Seek: 8 ms or less
  - Rotate: 4.2 ms @ 7200 rpm
  - Xfer: 1 ms @ 7200 rpm

Advantages of Small Form-factor Disk Drives
- Cost and environmental efficiencies
[Remainder of figure not recoverable.]

Tape vs. Disk
- Longitudinal tape uses same technology as hard disk; tracks its density improvements
- Disk head flies above surface, tape head lies on surface
- Disk fixed, tape removable
- Inherent cost-performance based on geometries:
  - fixed rotating platters with gaps (random access, limited area, 1 media / reader)
  - vs. removable long strips wound on spool (sequential access, "unlimited" length, multiple / reader)
- New technology trend: Helical Scan (VCR, camcorder, DAT)
  - Spins head at angle to tape to improve density

Current Drawbacks to Tape
- Tape wear out: helical 100s of passes vs. 1000s for longitudinal
- Head wear out: 2000 hours for helical
- Both must be accounted for in economic / reliability model
- Long rewind, eject, load, spin-up times; not inherent, just no need in marketplace (so far)
- Designed for archival

Automated Cartridge System
- STC 4400: 8 feet by 10 feet
- 6000 x 0.8 GB 3490 tapes = 5 TBytes in 1992; $500,000 O.E.M. price
- 6000 x 10 GB D3 tapes = 60 TBytes in 1998
- Library of Congress: all information in the world; in 1992, ASCII of all books = 30 TB

Library vs. Storage
- Getting books today as quaint as the way I learned to program
  - punch cards, batch processing
  - wander thru shelves, anticipatory purchasing
- Cost $1 per book to check out
- $30 for a catalogue entry
- 30% of all books never checked out
- Write only journals?
- Digital library can transform campuses
- Will have lecture on getting electronic information

Relative Cost of Storage Technology: Late 1995/Early 1996
[Table: price and $/MB for magnetic disks (5.25", 3.5", 2.5"), optical disks (5.25"), and PCMCIA cards (static RAM, flash RAM). Legible entries: 5.25" 9.1 GB at $2129; 3.5" drives at $1199 and $999; 2.5" 515 MB at $299. The $/MB column is not recoverable.]

Disk I/O Performance
- Metrics: response time, throughput
- [Figure: response time (ms) vs. throughput (% total BW); Proc -> Queue -> IOC -> Device]
- Response time = Queue + Device service time

Response Time vs. Productivity
- Interactive environments: each interaction or transaction has 3 parts:
  - Entry time: time for user to enter command
  - System response time: time between user entry and system replies
  - Think time: time from response until user begins next command
- What happens to transaction time as system response time shrinks from 1.0 sec to 0.3 sec?
  - Cases compared: keyboard entry and graphics entry (entry and think times not legible)

Response Time & Productivity
- [Figure: transaction time for conventional and graphics interfaces at 0.3 s and 1.0 s response]
- 0.7 sec off response saves 4.9 sec (34%) and 2.0 sec (70%) total time per transaction => greater productivity
- Another study: everyone gets more done with faster response, but novice with fast response = expert with slow

Disk Time Example
- Disk parameters: [list largely lost in extraction]; controller overhead is 2 ms
- Assume that disk is idle so no queuing delay
- What is average disk access time for a sector?
  - Ave seek + ave rot delay + transfer time + controller overhead
  - 12 ms + 0.5/(7200 RPM/60) + 8 KB/(4 MB/s) + 2 ms
  - 12 + 4.15 + 2 + 2 = 20 ms
- Advertised seek time assumes no locality: typically 1/4 to 1/3 advertised seek time: 20 ms => 12 ms

Processor Interface Issues
- Processor interface
  - Interrupts
  - Memory mapped I/O
- I/O control structures
  - Polling
  - Interrupts
  - DMA
  - I/O controllers
  - I/O processors
- Capacity, access time, bandwidth
- Interconnections
  - Busses

I/O Interface
- Independent I/O bus
  - Separate I/O instructions (in, out)
  - Lines distinguish between I/O and memory transfers
- Common memory and I/O bus: VME bus, Multibus-II, Nubus
  - 40 Mbytes/sec optimistically
  - 10 MIP processor completely saturates the bus!

Memory Mapped I/O
- Single memory and I/O bus
- No separate I/O instructions
- [Figure: CPU with ROM, RAM, and I/O on one bus]

Programmed I/O (Polling)
- Busy wait loop is not an efficient way to use the CPU unless the device is very fast!
- But checks for I/O completion can be dispersed among computationally intensive code

Interrupt Driven Data Transfer
- 1000 transfers at 1 ms each:
  - 1000 interrupts @ 2 µs per interrupt
  - 1000 interrupt services @ 98 µs each = 0.1 CPU seconds
- Device xfer rate = 10 MBytes/sec => 0.1 µs/byte => 1000 bytes = 100 µs
  - 1000 transfers x 100 µs = 100 ms = 0.1 CPU seconds
- Still far from device transfer rate! 1/2 in interrupt overhead

Direct Memory Access
- Time for 1000 transfers: 1 DMA set-up sequence, 1 interrupt, and 1 interrupt service sequence: 0.0001 second of CPU time (individual costs not legible)
- CPU sends a starting address, direction, and length count to the DMAC, then issues "start"
- DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory
- [Figure: CPU, memory, DMAC, IOC, and device on a memory-mapped I/O bus with ROM, RAM, and peripherals]

Input/Output Processors
- CPU issues instruction to IOP
- IOP looks in memory for commands (target device, where commands are)
- Device to/from memory transfers are controlled by the IOP directly; IOP steals memory cycles
- IOP interrupts when done

Relationship to Processor Architecture
- I/O instructions have largely disappeared
- Interrupt vectors have been replaced by jump tables
  - PC <- M[IVA + interrupt number] becomes PC <- IVA + interrupt number
- Interrupts:
  - Stack replaced by shadow registers
  - Handler saves registers and re-enables higher priority int's
  - Interrupt types reduced in number; handler must query interrupt controller

Relationship to Processor Architecture
- Caches required for processor performance cause problems for I/O
  - Flushing is expensive, I/O pollutes cache
  - Solution is borrowed from shared memory multiprocessors: "snooping"
- Virtual memory frustrates DMA
- Load/store architecture at odds with atomic operations
  - load locked, store conditional
- Stateful processors hard to context switch

Summary
- Disk industry growing rapidly, improves:
  - bandwidth 40%/yr
  - areal density 60%/year
- queue + controller + seek + rotate + transfer
- Response time vs. bandwidth
- Value of faster response time:
  - 0.7 sec off response saves 4.9 sec and 2.0 sec (70%) total time per transaction
  - everyone gets more done with faster response, but novice with fast response = expert with slow
- Processor interface today: peripheral processors, DMA, I/O bus, interrupts

Summary: Relationship to Processor Architecture
- I/O instructions have disappeared
- Interrupt vectors have been replaced by jump tables
- Interrupt stack replaced by shadow registers
- Interrupt types reduced in number
- Caches required for processor performance cause problems for I/O
- Virtual memory frustrates DMA
- Load/store architecture at odds with atomic operations
- Stateful processors hard to context switch

Network Attached Storage
- Decreasing disk diameters: 14" > 10" > 8" > 5.25" > 3.5" > 2.5" > 1.8" > 1.3" ...
  - high bandwidth disk systems based on arrays of disks
- Increasing network bandwidth: 3 Mb/s > 10 Mb/s > 50 Mb/s > 100 Mb/s > 1 Gb/s > 10 Gb/s
  - networks capable of sustaining high bandwidth transfers
- Network provides well defined physical and logical interfaces: separate CPU and storage system!
- Network file services: OS structures supporting remote file access
- Result: high performance storage service on a high speed network

Manufacturing Advantages of Disk Arrays
- Disk product families
  - Conventional: 4 disk designs (3.5", 5.25", 10", ...), low end to high end
  - Disk array: 1 disk design (3.5")

Replace Small # of Large Disks with Large # of Small Disks! (1988 Disks)

                IBM 3390 (K)   IBM 3.5" 0061   x70
Data Capacity   20 GBytes      320 MBytes      23 GBytes
Volume          97 cu. ft.     0.1 cu. ft.     11 cu. ft.
Power           3 KW           11 W            1 KW
Data Rate       15 MB/s        1.5 MB/s        120 MB/s
I/O Rate        600 I/Os/s     55 I/Os/s       3900 I/Os/s
MTTF            250 KHrs       50 KHrs         ??? Hrs
Cost            $250K          $2K             $150K

- Large data and I/O rates
- Disk arrays have potential for high MB per cu. ft., high MB per KW
- Reliability?

Array Reliability
- Reliability of N disks = Reliability of 1 disk / N
  - 50,000 hours / 70 disks = 700 hours
  - Disk system MTTF: drops from 6 years to 1 month!
- Arrays (without redundancy) too unreliable to be useful!
- Hot spares support reconstruction in parallel with access: very high media availability can be achieved

Redundant Arrays of Disks
- Files are "striped" across multiple spindles
- Redundancy yields high data availability
  - Disks will fail
  - Contents reconstructed from data redundantly stored in the array
    => Capacity penalty to store it
    => Bandwidth penalty to update
- Techniques:
  - Mirroring/Shadowing (high capacity cost)
  - Horizontal Hamming Codes (overkill)
  - Parity & Reed-Solomon Codes
  - Failure Prediction (no capacity overhead!); VaxSimPlus: technique is controversial

Redundant Arrays of Disks RAID 1: Disk Mirroring/Shadowing
- [Figure: recovery group of a disk and its shadow]
- Each disk is fully duplicated onto its "shadow"
  - Very high availability can be achieved
- Bandwidth sacrifice on write: logical write = two physical writes
- Reads may be optimized
- Most expensive solution: 100% capacity overhead
- Targeted for high I/O rate, high availability environments

Redundant Arrays of Disks RAID 3: Parity Disk
- [Figure: logical record 10010011 11001101 10010011 ... striped into physical records across the data disks, plus a parity disk P]
- Parity computed across recovery group to protect against hard disk failures
  - 33% capacity cost for parity in this configuration
  - wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time
- Arms logically synchronized, spindles rotationally synchronized
  - logically a single high capacity, high transfer rate disk
- Targeted for high bandwidth applications: scientific, image processing
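
The RAID 3 parity protection described above can be illustrated with a short sketch. This is not code from the lecture: the function names `parity` and `reconstruct` are hypothetical, and the group of 3 data disks plus 1 parity disk is an assumption chosen to match the 33% parity cost quoted on the slide. The byte values reuse the bit patterns shown in the figure.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length blocks byte by byte to form the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors and the parity."""
    return parity(list(surviving_blocks) + [parity_block])

# Hypothetical recovery group: 3 data disks + 1 parity disk.
data = [bytes([0b10010011]), bytes([0b11001101]), bytes([0b10010011])]
p = parity(data)

# Simulate a hard failure of data disk 1 and rebuild it from the rest.
rebuilt = reconstruct([data[0], data[2]], p)
assert rebuilt == data[1]
```

Because XOR is its own inverse, the same routine serves for computing parity and for reconstruction. Only one failed disk per recovery group can be rebuilt this way.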
Redundant Arrays of Disks RAID 5: High I/O Rate Parity
- A logical write becomes four physical I/Os
- Independent writes possible because of interleaved parity
- Reed-Solomon codes for protection during reconstruction
- Targeted for mixed applications
- [Figure: increasing logical disk addresses laid out across disk columns; the parity block P moves to a different column in each stripe (e.g., D12 P D13 D14 D15, then P D16 D17 D18 D19); labels: stripe, stripe unit]

Problems of Disk Arrays: Small Writes
- RAID-5 small write algorithm: 1 logical write = 2 physical reads + 2 physical writes
- [Figure: (1) read old data, (2) read old parity, XOR new data with old data and old parity, (3) write new data, (4) write new parity]

Subsystem Organization
- Path from host to disks: manages interface to host, DMA; array controller: control, buffering, parity logic; single board disk controllers: physical device control
- Single board disk controllers are often piggy-backed in small format devices
- Striping software off-loaded from host to array controller
  - no applications modifications
  - no reduction of host performance

System-Level Availability: Orthogonal RAIDs
- [Figure: array controller above several strings of disks]
- Redundant support components: fans, power supplies
- End to end data integrity: internal parity protected data paths
- With duplicated paths, higher performance can be obtained when there are no failures

Summary: Redundant Arrays of Disks (RAID) Techniques
- Disk Mirroring, Shadowing (RAID 1)
  - Each disk is fully duplicated onto its "shadow"
  - Logical write = two physical writes
  - 100% capacity overhead
- Parity Data Bandwidth Array (RAID 3)
  - Parity computed horizontally
  - Logically a single high data bw disk
- High I/O Rate Parity Array (RAID 5)
  - Interleaved parity blocks
  - Independent reads and writes
  - Logical write = 2 reads + 2 writes
  - Parity + Reed-Solomon codes
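
The RAID 5 small write rule (1 logical write = 2 reads + 2 writes) works because the new parity can be derived from the old data and old parity alone, without touching the other disks in the stripe. The sketch below is an illustration under assumptions: the helper names and the 4-block stripe are hypothetical and do not come from the slides.

```python
def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def full_parity(data_blocks):
    """Parity computed from scratch over every data block in the stripe."""
    result = bytes(len(data_blocks[0]))
    for block in data_blocks:
        result = xor_blocks(result, block)
    return result

def small_write_parity(old_data, old_parity, new_data):
    """New parity from the two blocks read in steps 1 and 2 of the algorithm."""
    return xor_blocks(xor_blocks(old_data, new_data), old_parity)

# Hypothetical stripe of 4 data blocks, 2 bytes each.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]
old_parity = full_parity(stripe)

# Overwrite block 2: read old data and old parity, then write new data and new parity.
new_block = b"\xaa\x55"
new_parity = small_write_parity(stripe[2], old_parity, new_block)
stripe[2] = new_block

# The shortcut agrees with recomputing parity over the whole stripe.
assert new_parity == full_parity(stripe)
```

The cost is four physical I/Os regardless of stripe width, which is why small writes are the weak point of parity arrays compared with mirroring.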
ABCs of UNIX File Systems
- Key issues
  - File vs. raw I/O
  - File cache size policy
  - Write policy
  - Local disk vs. server disk
- File vs. raw:
  - File system access is the norm: standard policies apply
  - Raw: alternate I/O system to avoid file system, used by data bases
- File cache size policy
  - % of main memory dedicated to file cache is fixed at system generation (e.g., 10%)
  - % of main memory for file cache varies depending on amount of file I/O (e.g., up to 80%)

ABCs of UNIX File Systems: Write Policy
- Flush file cache after fixed period (e.g., 30 seconds)
- Write Through with Write Buffer
- Write Back
- Write Buffer often confused with Write Back
  - Write Through with Write Buffer: all writes go to disk
  - Write Through with Write Buffer: writes are asynchronous, so the processor does not wait for the disk write
  - Write Back can be called Write Cancelling

ABCs of UNIX File Systems: Local vs. Server
- Unix file systems have historically had different policies (and even file systems) for local client vs. remote server
- NFS local disk allows 30 second delay to flush writes
- NFS server disk writes through to disk on file close
- With client file caches in addition to the server file cache:
  - NFS just writes through on file close; stateless protocol: periodically get new copies of file blocks
  - Other file systems check state and selectively invalidate or update

Network File Systems
[Figure: client and server software stacks. Client: application program, UNIX system call layer, virtual file system interface, then either the NFS client (remote accesses, via the network protocol stack) or the UNIX file system (local accesses, via the block device driver). Server: UNIX system call layer, virtual file system interface, file system. Client and server communicate over RPC/transmission protocols.]

Typical File Server Architecture
- [Figure: kernel containing NFS protocol & file processing, TCP/IP protocols, Unix file system, Ethernet driver, disk driver; path of an NFS request through a single processor file server]
- Limits to performance: data copying
  - read data staged from device to primary memory
  - copy again into network packet templates
  - copy yet again to network interface
- No specialization for fast processing between network and disk

AUSPEX NS5000 File Server
- Special hardware/software architecture for high performance NFS I/O
- Functional multiprocessing:
  - primary memory: I/O buffers
  - host processor (single board computer): UNIX frontend
  - processor specialized for protocol processing
  - file processor (independent file system): dedicated FS software
  - storage processor: manages 10 SCSI channels
- Enhanced VME backplane; parallel SCSI channels

AUSPEX Software Architecture
[Figure: Unix system call layer on the host processor, storage processor, primary memory, and disk arrays; arrows distinguish primary data flow from primary control flow; limited control interfaces.]

Berkeley RAID-II Disk Array File Server
- Low latency transfers mixed with high bandwidth transfers
- "Diskless supercomputers"
- To 120 disk drives

I/O Benchmarks
- For better or worse, benchmarks shape a field
  - Processor benchmarks classically aimed at response time for fixed sized problem
  - I/O benchmarks typically measure throughput, possibly with upper limit on response times (or 90% of response times)
- What if fix problem size, given 60%/year increase in DRAM capacity?
  - [Table: benchmark, size of data, % time I/O, year. Legible row: I/OStones, 1 MB, 26%. Second row not legible.]
  - Not much time in I/O
  - Not measuring disk (or even main memory)

I/O Benchmarks
- Alternative: self-scaling benchmark; automatically and dynamically increase aspects of workload to match characteristics of system measured
  - Measures wide range of current & future
- Describe three self-scaling benchmarks
  - Transaction processing: TPC-A, TPC-B, TPC-C
  - NFS: SPEC SFS (LADDIS)
  - Unix I/O: Willy

I/O Benchmarks: Transaction Processing
- Transaction Processing (TP, or On-line TP = OLTP)
  - Airline reservation systems & banks use TP
- Atomic transactions makes this work
- Each transaction => 2 to 10 disk I/Os & 5,000 and 20,000 CPU instructions per disk I/O
  - Efficiency of TP SW & avoiding disk accesses by keeping information in main memory
- Classic metric is Transactions Per Second (TPS)
  - Under what workload? how machine configured?

I/O Benchmarks: Transaction Processing
- Early 1980s great interest in OLTP
  - Expecting demand for high TPS (e.g., ATM machines, credit cards)
  - Tandem's success implied medium range OLTP expands
  - Each vendor picked own conditions for TPS claims, report only CPU times with widely different I/O
  - Conflicting claims led to disbelief of all benchmarks => chaos
- 1984 Jim Gray of Tandem distributed paper to Tandem employees and 19 in other industries to propose standard benchmark
- Published "A measure of transaction processing power," Datamation, 1985, by Anonymous et al.
  - To indicate that this was effort of large group
  - To avoid delays of legal department of each author's firm
  - Still get mail at Tandem to author

I/O Benchmarks: TP by Anon et al.
- Proposed 3 standard tests to characterize commercial OLTP
  - TP1: OLTP test, DebitCredit, simulates ATMs (TP1)
  - Batch sort
  - Batch scan
- Debit/Credit:
  - One type of transaction: 100 bytes each
  - Recorded 3 places: account file, branch file, teller file + events recorded in history file (90 days)
    - 15% requests for different branches
  - Under what conditions, how report results?

I/O Benchmarks: TP1 by Anon et al.
- DebitCredit scalability: size of account, branch, teller, history function of throughput

  TPS       Number of ATMs   Account-file size
  10        1,000            0.1 GB
  100       10,000           1.0 GB
  1,000     100,000          10.0 GB
  10,000    1,000,000        100.0 GB

  - Each input TPS => 100,000 account records, 10 branches, 100 ATMs
  - Accounts must grow since a person is not likely to use the bank more frequently just because the bank has a faster computer!
- Response time: 95% transactions take <= 1 second
- Configuration control: just report price (initial purchase price + 5 year maintenance = cost of ownership)
- By publishing, in public domain

I/O Benchmarks: TP1 by Anon et al.
- Problems
  - Often ignored the user network to terminals
  - Used transaction generator with no think time; made sense for database vendors, but not what customer would see
- Proposed minimum compliance list (13 pages); still, DEC tried IBM test on different machine with poorer results than claimed by auditor
- Created Transaction Processing Performance Council in 1988: founders were CDC, DEC, ICL, Pyramid, Stratus, Sybase, Tandem, and Wang; 46 companies today
- Led to TPC standard benchmarks in 1990, www.tpc.org
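
The DebitCredit scaling rule from the TP1 slides (each TPS adds 100,000 account records, 10 branches, and 100 ATMs) can be worked through numerically. This is a minimal sketch, not part of the original benchmark definition: the function name is hypothetical, and the 100-byte account record is an assumption chosen because it reproduces the 0.1 GB per 10 TPS file sizes in the scaling table.

```python
ACCOUNTS_PER_TPS = 100_000    # from the slide
BRANCHES_PER_TPS = 10         # from the slide
ATMS_PER_TPS = 100            # from the slide
ACCOUNT_RECORD_BYTES = 100    # assumption: matches the table's file sizes

def debitcredit_scale(tps):
    """Return the database and terminal population required at a claimed TPS."""
    accounts = ACCOUNTS_PER_TPS * tps
    return {
        "account_records": accounts,
        "branches": BRANCHES_PER_TPS * tps,
        "atms": ATMS_PER_TPS * tps,
        "account_file_gb": accounts * ACCOUNT_RECORD_BYTES / 1e9,
    }

for tps in (10, 100, 1_000, 10_000):
    scale = debitcredit_scale(tps)
    print(tps, scale["atms"], scale["account_file_gb"])
```

Tying the data set to the claimed throughput is what makes the benchmark self-scaling: a vendor cannot report a higher TPS without also provisioning and paying for a proportionally larger configuration.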
Old TPC Benchmarks Revised version of TP1DebitCredit Random TPq vs unitorm mart vs dumb affects instruction path length n 1n terminals per TPS vs 1nn anch record perTPs vs 7 Branch sca br 1n 7 Responsetime constraint 9 s2 seconds vs95 S1 7 Full disclosure approved byTP l e vs 7 Comp et TPS responsetime plots vssingle point TPCB Same as TPCA but withou r essing of requests 7 Responsetime makes nosense plots tps vs residence time time oftransaction resides in system These have been withdrawn as benchmarks rrc will to IIO Benchmarks TPCC Complex OLTP Models awholesale suppliermanaging orders Orderentry conceptual model forbenchmark Workload 5 transaction types Users and database scale lineany with throughput De nes fullscreen enduserinterface Metrics neworder rate tme and priceperfonnance sitme Approved July 1992 rrc vmv 7n IIO Benchmarks TPCD Complex c39 on Support Workload OLTP business operation Decision support business analysis historical Workload 17 adhoc transactions 7 eg impact on enue of eliminating companywide discount Synthetic generator of data Size determined by Scale Factor 100 GB 300 GB1TB3 TB10 TB Metrics Queries perCigabyte Hour Power QppDSize 3600 x SF ICeo Mean of queries Throughput chDSize 17 x SF I time3600 PricelPerformanceQphD Size Report time to load database indices stats too mm Approved April 1995 IIO Benchmarks TPCW Transactional Web Benchmark Represent any business retail store software distribution airline reservation electronic stock trades etc that markets and sells over the Internet Intranet Measure systems supporting users browsing ordering and conducting transaction oriented business activities Security including userauthentication and data encryption and dynamic page generation are important Before processing ofcustomerorderbyterminal operatorworking on LAN connected to database system Today customer accesses company site over lnternet connection browses both static and dynamically generated Web pa es and searches the databa efor 
oduct or customer information Customer also initiate finalize and c ec n product orders and deliveries Started 1197 hope to release Fall 1993 quotm 972 Page 12 TPCC Performance tpmc Rank Cnn g tpmc stpmc Dat base 1 16M R56666 517112 nndex6way1 5765366 11716 Oracle6661 2 HP P 9666 112256116my1 5211766 36117 Sybase ASE 3 Sun ultra E6666 Us 12 nnde x22my15167162 13116 Oracle6663 1 HP HP 9666 112266 116m 1 39 6917 9116 Sybase ASE 5Fujitsu GRANPO 666 Mndel666 3111693 35766366 Oracle6 6 Sun ultra E66 1 1 11 761 96 Oracle6663 tal pba5616611 nnde x 6 my1 36396 66 365 66 Oracle7 1773 SGI r rds 126 1 25369 26 139 61 INFORMIX 9 16M A5166e5ewert12way 2511975 12666 D62 1 Digilzl Alpba56166 5625116way1 2153766 11616 Sybase 56L Fr Imv 73 TPCC PricePerformance Itpmc Rank stpmc E tpmc Database 1167267 MS SQL 65 1696167 MS SQL 65 1652696 MS SQL 65 1366936 MS SQL 65 2 1656597 MS SQL 65 Cnn g AeerAltns 1966617m1 werEdge 6166 e 337 3762 3796 13391 13 1339113 1366936 MS SQL 65 11 655 76 quzn a 1156 Us 33939 1262667 MS SQL 65 z lt Fr Imv 71 Rank Cnn g 1 llcR w6rldMark 5 TPCD PerformancePrice 300 GB abnd thD 366pr Database 156 92 31176 217266 T a 211179666 517522 116 nnde1 5661 26296 196266 lnrnrmixxlzs 3D6A 33656 12777 131966 Oracle66661 1SunultraEnterprise6666 3 766 LINE 155366 lnrnrmixxlzs 55equentlluMA62666132way132323 16976 326366 Oracle6666 E nfig apnd thD 366pr 1 DcAmiollA 33656 12777 131966 Oracle66661 2 s ultraEnterpr 6666 32766 11776 1553 6 3 11179666 517522 t16nnde1 12 26296 196266 1 llch rldM rk5156 31176 217266 1 l a 5 SequenINLlMArQZ 32 way132323 16976 326366 Oraclenimu Fr Imv 75 TPCD Performance 1TB Rank Cnn g 0116 361pr Database 1 Sun Ultra E 16 xldamyj 123319 j 1353 Inlnmix Dyn 2 quotCR Wurlszrk 12 x drwzyj 121691 Z1 3 Terzdzlz 3 IBM RS SP 12 x rwzyj 7633 5 4 Z 95 DBZ LIDBV NOTE Inappropriate to compa results from different d es atabase slz Fr Imv Tn TPCD Performance 1TB Rank Cnn g 6116 1 D Database 1 SunultraE666611x21my1 129319 511563 135366 lnrnminyn ZNCRWanszrk 
Zxdrwzyj 121192 39123 216366 Teradata 316M RS SP Zx rwzyj 76336 51551 269566 D62uD6v5 Fr Imv 77 SPEC SFSILADDIS Predecessor NFSstones NFsstones synthetic benchmark that generates series of NFS requests from single client to test server reads writes amp commands ampfile sizes from otherstudies r Prnhlem1cllenlcnuld nnl always SHESS server 7 F and hlncksizvs nnl rezl39slic 7 Clients had In run SunOS Fr Imv 7x Page 13 SPEC SFSILADDIS 1993 Attempt by NFS companies to agree on standard Legato Auspex Data General DEC lnterphase Sun Like NFSstones 7 Run an multiple clients a netwurls ttu prevent huttleneclsl ame caching pulicy in all clients black amp15 partial hlncls I hlnck St Wn partial hlncls 7 Average respunsetime 5n i x v d 35 lull s n ssec increase capaci 1GB 7 Results plut ulserver luad lthruughputl vs respunsetime a numher ul users Assumes 1 user gt 1n NFS upssec Example SPEC SFS Result DEC Alpha 200 MHz 21064 aKl BKD 2MB L2 512 MB1 Gigaswitch DEC DSF1 v20 A FDDI networks 32 NFS Daemons 24 GB file size as Disks 16 controllers 34 le system a 4817 a gt9 in V 1 Willy UNIX File System Benchmark that gives insight into IID system behaviorChen and Patterson 1993 Self scaling to automatically explore system size u e atasize lucalityvia LRLI Gives hle cache 7 Percentage ul reads luwrillzs 17 quotiv reads 39ypicallysnnl 1 quotlv reads gives peak thruughput 7 Average lO Reguest size aernu ullic1 7 Percentage seguential reguests typically 5 quotiv 7 llumherulprucesses cuncurrency ulwurkluad tnumher prucesses issuing lO reguestsl u Fix rourparameters while vary one parameter quot mquot m m Searches space to find high throughput mm mm gwmwm mm mm Example Willy DS 5000 Sprite Ultrix AvgAccess Size 7 a Disktramc my to he dominatedhy writes 5 n 32 KB 13 KB 9quot Data touched le cache 2MB15 MB 2 MB wmeop mized File System a Data touched disk 36 MB 6 MB 0nly representatiun nn d39skis lug 2 Unix 39 Wquot quotms 1 5W Wquot WWW 5W Stream uut les directuries maps withuut seels 39 as 
am 32 MB memury ultr 39 File Cache size Write thruugh Ad t I In im Sprite Dynamic File Cache size Write hack Write cancellingl gs 59 9539 g n39VLims m in thnm Tm Slri easily acruss severaldisis Ll7 wrap Ii g i zmy 39D395 W quotquotquotquotquotquotquot Log Structured File System effective write cache of Versurning LFS much Smaller5lt8 MB than read cache 20 MB quotmm quotmm s Reads cached while writes are nnl gt 3 p atea mm Sprite39s Log Structured File System Large me caches ellective in reducing disk reads Willy DS 5000 Number Bytes Touched v Page 14 Interconnect Trends Summary IIO Benchmarks Scaling to track technological change TPC price perfo ance as nomalizing configuration feature Auditing to ensure no fou p ay Throughput with restricted response time is normal measure Historical Context of storage lD Secondary and Tertiary storage Devices Out lne storage lD Performance Measures Processorlnterface issues A Little Queuing Theory RedundantArrarys of inexpensive Disks RAID ABCs ofUNIX File Systems lD Benchmarks Comparing UNIX File System Performance lD Busses rm vmv n interconnect glue that interfaces computersystem components High speed hardware interfaces logical protocols Networks channels backplanes Netwnr C Backplane gtlunum l rl m 1m Dimnce Bandwidth in a mu Mins 4n a mum Mins 32D 7 IDEIEH Mins Latency h 1gtms medium 1Dwltps Rel39nliility Iuw medium high Extensive CRC Byte Fancy Byte Fancy messagerhased memnlyrmzpped narriiw pathmys 4 wide pathways distrihuted arh centralized arh quotmm rm vmv n Mmu viii m vim n xcxu i m Man 128 at ya 25 tanmmqu D we yes m 31 32 a 3 ago an MAM saneimp s W Mamie said Man t tonne amendment 0 Dpnml mel Dpnml clam AW 91m m Immn rick wrap 25 7 m a mama we Waiamxmtl In 155 in a whim Wmmnm m m on a immiuwe man a in m E3 a Maegan 2 2n 2i 7 Maxka 5m 5m 5m 25m mm mm mm ANSIm ms mam BI Distinctions begin to blur SCSI channel is like a bus Fulure us is like a channel td39sciinnectreciinnectl speed switching Izhrils HIPPHnrms Iinls i 
Bus-Based Interconnect
- Bus: a shared communication link between subsystems
  - Low cost: a single set of wires is shared multiple ways
  - Versatility: easy to add new devices; peripherals may even be ported between computers using a common bus
- Disadvantage
  - A communication bottleneck, possibly limiting the maximum I/O throughput
- Bus speed is limited by physical factors
  - the bus length
  - the number of devices (and, hence, bus loading)
  - these physical limits prevent arbitrary bus speedup

Bus-Based Interconnect
- Two generic types of busses:
  - I/O busses: lengthy, many types of devices connected, wide range in the data bandwidth, and follow a bus standard (sometimes called a channel)
  - CPU-memory buses: high speed, matched to the memory system to maximize memory-CPU bandwidth, single device (sometimes called a backplane)
  - To lower costs, low cost (older) systems combine the two
- Bus transaction
  - Sending address and receiving or sending data

Bus Protocols
- Multibus: 20 address, 16 data, 5 control, 50 ns pause
- Bus master: has ability to control the bus, initiates transaction
- Bus slave: module activated by the transaction
- Bus communication protocol: specification of the sequence of events and timing requirements in transferring information
- Asynchronous bus transfers: control lines (req, ack) serve to orchestrate sequencing
- Synchronous bus transfers: sequence relative to common clock

Synchronous Bus Protocols
[Timing diagram: Clock, Address, Data, Read; "begin read", "read complete"]
- Pipelined/split transaction bus protocol

Asynchronous Handshake (write transaction)
[Timing diagram: Address, Data, Read, Req, Ack]
- t0: Master has obtained control and asserts address, direction, data; waits a specified amount of time for slaves to decode target
- t1: Master asserts request line
- t2: Slave asserts ack, indicating data received
- t3: Master releases req
- t4: Slave releases ack

Asynchronous Handshake (read transaction)
- t0: Master has obtained control and asserts address, direction; waits a specified amount of time for slaves to decode target
- t1: Master asserts request line
- t2: Slave asserts ack, indicating ready to transmit data
- t3: Master releases req, data received
- t4: Slave releases ack
- Time-multiplexed bus: address and data share lines

Bus Arbitration
- Parallel (centralized) arbitration
[Figure: bus request and bus grant lines between masters and a central arbiter]

Bus Options

Option             | High performance                              | Low cost
Bus width          | Separate address & data lines                 | Multiplex address & data lines
Data width         | Wider is faster (e.g., 32 bits)               | Narrower is cheaper
Transfer size      | Multiple words has less bus overhead          | Single-word transfer is simpler
Bus masters        | Multiple (requires arbitration)               | Single master (no arbitration)
Split transaction? | Yes: separate request and reply packets get higher bandwidth (needs multiple masters) | No: continuous connection is cheaper and has lower latency
Clocking           | Synchronous                                   | Asynchronous

SCSI: Small Computer System Interface
- Clock rate: 5 MHz / 10 MHz (fast) / 20 MHz (ultra)
- Width: n = 8 bits / 16 bits (wide); up to n - 1 devices to communicate on a bus or "string"
- Devices can be slave ("target") or master ("initiator")
- SCSI protocol: a series of "phases", during which specific actions are taken by the controller and the SCSI disks
  - Bus Free: no device is currently accessing the bus
  - Arbitration: when the SCSI bus goes free, multiple devices may request (arbitrate for) the bus; fixed priority by address
  - Selection: informs the target that it will participate (Reselection if disconnected)
  - Command: the initiator reads the SCSI command bytes from host memory and sends them to the target
  - Data Transfer: data in or out, initiator: target
  - Message Phase: message in or out, initiator: target (identify, save/restore data pointer, disconnect, command complete)
  - Status Phase: target, just before command complete

SCSI "Bus": Channel Architecture
[Figure: peer-to-peer protocols]

1993 I/O Bus Survey (P&H, 2nd Ed.)
("..." marks entries that are not recoverable from the source)

Bus                | SBus    | TurboChannel | MicroChannel | PCI
Originator         | Sun     | DEC          | IBM          | Intel
Clock rate (MHz)   | ...     | ...          | ...          | 33
Addressing         | ...     | ...          | Physical     | Physical
Data sizes (bits)  | ...     | ...          | ...          | ...
Master             | Multi   | Multi        | Multi        | Multi
Arbitration        | Central | Central      | Central      | Central
32-bit read (MB/s) | ...     | ...          | 20           | 33
Peak (MB/s)        | ...     | ...          | 75           | 111 (222)
Max power (W)      | ...     | ...          | 13           | 25

1993 MP Server Memory Bus Survey
("..." marks entries that are not recoverable from the source)

Bus               | Summit    | Challenge | XDBus
Originator        | HP        | SGI       | Sun
Clock rate (MHz)  | ...       | ...       | ...
Split transaction?| Yes       | Yes       | ...
Address lines     | ...       | ...       | ...
Data lines        | ...       | 256       | ... (parity)
Data sizes (bits) | 512       | ...       | 512
Clocks/transfer   | ...       | 5         | ...
Peak (MB/s)       | 960       | 1200      | 1056
Master            | Multi     | Multi     | Multi
Arbitration       | Central   | Central   | Central
Addressing        | Physical  | Physical  | Physical
Slots             | 16        | 9         | 10
Busses/system     | 1         | 1         | 2
Length            | 13 inches | 12 inches | 17 inches

Summary: I/O Benchmarks
- Scaling to track technological change
- Price performance as normalizing configuration feature
- Auditing to ensure no foul play
- Throughput with restricted response time is the normal measure

Lecture 1: Cost/Performance, DLX, Pipelining, Caches, Branch Prediction
Prof. Fred Chong
ECS 250A Computer Architecture, Winter 1999
(Slides based upon Patterson, UCB CS252, Spring 1998)

Computer Architecture Is ...
"the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." (Amdahl, Blaauw, and Brooks, 1964)

Computer Architecture's Changing Definition
- 1950s to 1960s: Computer Architecture Course = Computer Arithmetic
- 1970s to mid 1980s: Computer Architecture Course = Instruction Set Design, especially ISA appropriate for compilers
- 1990s: Computer Architecture Course = Design of CPU, memory system, I/O system, Multiprocessors

Computer Architecture Topics
[Figure: topic map. Recoverable labels: Input/Output and Storage (RAID); Memory Hierarchy (L1 cache, L2 cache, DRAM, coherence, bandwidth, latency); VLSI; Instruction Set Architecture; Pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, DSP; instruction-level parallelism]

Computer Architecture Topics
[Figure: topic map. Recoverable labels: Multiprocessors; Networks and Interconnections; shared memory, message passing; interconnection network, network interfaces; processor-memory-switch; topologies, routing, bandwidth, latency, reliability]

ECS 250A Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century.
[Figure: Computer Architecture (instruction set design, organization, hardware) surrounded by Technology, Programming Languages, Applications, Operating Systems, Parallelism, Interface Design (ISA), Measurement & Evaluation]

Topic Coverage
Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2nd Ed., 1996.
- Performance/Cost, DLX, Pipelining, Caches, Branch Prediction
- ILP, Loop Unrolling, Scoreboarding, Tomasulo, Dynamic Branch Prediction
- Trace Scheduling, Speculation
- Vector Processors, DSPs
- Memory Hierarchy
- I/O
- Interconnection Networks
- Multiprocessors

ECS 250A: Staff
- Instructor: Fred Chong. Office: EU-II (room number garbled in source), chong@cs. Office hours: Mon 4-6pm or by appt.
- T.A.: Diana Keen. Office: EU-II 2239, keend@cs. TA office hours: Fri 1-5pm
- Class: Mon (time garbled in source)
- Text: Computer Architecture: A Quantitative Approach, Second Edition (1996)
- Web page: http://arch.cs.ucdavis.edu/chong/250A/
- Lectures available online before 1PM on the day of lecture

Grading
- Problem Sets: 35%
- 1 in-class exam (prelim simulation): 20%
- Project Proposals and Drafts: 10%
- Project Final Report: 25%
- Project Poster Session (CS colloquium): 10%

VLSI Transistors
[Figure: transistor symbols]

CMOS Inverter
[Figure: CMOS inverter, In -> Out]

CMOS NAND Gate
[Figure: CMOS NAND gate]

Integrated Circuits Costs
\[ \text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}} \]
\[ \text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}} \]
\[ \text{Dies per wafer} = \frac{\pi (\text{Wafer diam}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diam}}{\sqrt{2 \times \text{Die area}}} - \text{Test dies} \]
\[ \text{Die yield} = \text{Wafer yield} \times \left( 1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha} \right)^{-\alpha} \]
Die cost goes roughly with (die area)^4.

Real World Examples

Chip        | Metal layers | Line width | Wafer cost | Defects/cm^2 | Area (mm^2) | Dies/wafer | Yield | Die cost
386DX       | 2            | 0.90       | $900       | 1.0          | 43          | 360        | 71%   | $4
486DX2      | 3            | 0.80       | $1200      | 1.0          | 81          | 181        | 54%   | $12
PowerPC 601 | 4            | 0.80       | $1700      | 1.3          | 121         | 115        | 28%   | $53
HP PA 7100  | 3            | 0.80       | $1300      | 1.0          | 196         | 66         | 27%   | $73
DEC Alpha   | 3            | 0.70       | $1500      | 1.2          | 234         | 53         | 19%   | $149
SuperSPARC  | 3            | 0.70       | $1700      | 1.6          | 256         | 48         | 13%   | $272
Pentium     | 3            | 0.80       | $1500      | 1.5          | 296         | 40         | 9%    | $417

From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15.

Cost/Performance: What is the Relationship of Cost to Price?
- Component costs
- Direct costs (add 25% to 40%): recurring costs such as labor, purchasing, scrap, warranty
- Gross margin (add 82% to 186%): nonrecurring costs such as R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes
- Average discount to get list price: volume discounts and/or retailer markup
[Figure: stacked bars from component cost up to list price; recoverable labels: average discount 25% to 40% of list price, gross margin 34% to 39% of average selling price]

Chip Prices (August 1993)
Assume purchase of 10,000 units.

Chip        | Area (mm^2) | Mfg. cost | Price | Multiple | Comment
386DX       | 43          | $9        | $31   | 3.4      | Intense competition
486DX2      | 81          | $35       | $245  | 7.0      | No competition
PowerPC 601 | 121         | $77       | $280  | 3.6      |
DEC Alpha   | 234         | $202      | $1231 | 6.1      | Recoup R&D?
Pentium     | 296         | $473      | $965  | 2.0      | Early in shipments

Summary: Price vs. Cost
[Figure: bar charts comparing price and cost breakdowns; values not recoverable from the source]

Technology Trends: Microprocessor Capacity
[Figure: microprocessor capacity over time]
- CMOS improvements: die size 2X every 3 yrs; line width halves every 7 yrs

Memory Capacity (Single Chip DRAM)
[Figure and table: DRAM size and cycle time by year; the last recoverable entry is 2000: 256 Mb, 100 ns cycle time]

Technology Trends (Summary)

       | Capacity       | Speed (latency)
Logic  | 2x in 3 years  | 2x in 3 years
DRAM   | 4x in 3 years  | 2x in 10 years
Disk   | 4x in 3 years  | 2x in 10 years

Processor Performance Trends
[Figure: relative performance, 1965 to 2000]

Processor Performance (1.35X before, 1.55X now)
[Figure: performance by year, 1987 to 1997]

Performance Trends (Summary)
- Workstation performance (measured in SPECmarks) improves roughly 50% per year (2X every 18 months)
- Improvement in cost performance estimated at 70% per year

Computer Engineering Methodology
[Figure: design cycle built up over several slides; recoverable labels: Technology Trends, Benchmarks, Implementation Complexity]

Measurement Tools
- Benchmarks, traces, mixes
- Hardware: cost, delay, area, power estimation
- Simulation (many levels, including circuit)
- Rules of thumb
- Fundamental "laws"/principles

The Bottom Line: Performance (and Cost)
- Time to run the task (ExTime): execution time, response time, latency
- Tasks per day, hour, week, sec, ns ... (Performance): throughput, bandwidth

The Bottom Line: Performance (and Cost)
"X is n times faster than Y" means
\[ \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = n \]
- Speed of Concorde vs. Boeing 747
- Throughput of Boeing 747 vs. Concorde

Amdahl's Law
Speedup due to enhancement E:
\[ \text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E} \]
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

Amdahl's Law
\[ \text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right] \]
\[ \text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}} \]

Amdahl's Law
Floating point instructions improved to run 2X, but only 10% of actual instructions are FP:
\[ \text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old} \]
\[ \text{Speedup}_{overall} = \frac{1}{0.95} = 1.053 \]

Metrics of Performance
- Application: answers per month, operations per second
- Programming language, compiler, ISA: (millions) of instructions per second (MIPS); (millions) of (FP) operations per second (MFLOP/s)
- Datapath, control: megabytes per second
- Function units, transistors, wires, pins: cycles per second (clock rate)

Aspects of CPU Performance
[Table: which of program, compiler, instruction set, organization and technology affect instruction count, CPI and clock rate; entries not recoverable from the source]

Cycles Per Instruction
"Average cycles per instruction":
\[ \text{CPI} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}} = \frac{\text{Cycles}}{\text{Instruction count}} \]
\[ \text{CPU time} = \text{Cycle time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i \]
"Instruction frequency":
\[ \text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction count}} \]
Invest resources where time is spent!

Example: Calculating CPI
Base machine (Reg / Reg), typical mix:

Op     | Freq | Cycles | CPI(i) | (% Time)
ALU    | 50%  | 1      | 0.5    | (33%)
Load   | 20%  | 2      | 0.4    | (27%)
Store  | 10%  | 2      | 0.2    | (13%)
Branch | 20%  | 2      | 0.4    | (27%)
Total  |      |        | 1.5    |

SPEC: System Performance Evaluation Cooperative
- First round, 1989
  - 10 programs yielding a single number ("SPECmarks")
- Second round
  - SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
  - Example flag listing: March 93, DEC 4000 Model 610 [compiler flags not recoverable from the source]
- Third round, 1995
  - SPECint95 (8 integer programs) and SPECfp95 (10 floating point programs)
  - "benchmarks useful for 3 years"
  - Single flag setting for all programs: SPECint_base95, SPECfp_base95

How to Summarize Performance
- Arithmetic mean (weighted arithmetic mean) tracks execution time: \( \sum_i T_i / n \) or \( \sum_i W_i \times T_i \)
- Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: \( n / \sum_i (1/R_i) \) or \( n / \sum_i (W_i/R_i) \)
- Normalized execution time is handy for scaling performance, e.g., X times faster
  than SPARCstation 10. But do not take the arithmetic mean of normalized execution time; use the geometric mean: \( \left( \prod_i x_i \right)^{1/n} \)

SPEC First Round
- One program: 99% of time in a single line of code
- New front-end compiler could improve it dramatically
[Figure: per-benchmark SPEC ratios]

Impact of Means on SPECmark89 for IBM 550
[Table: ratio to VAX, time, and weighted time, before and after the compiler change, for gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv, plus geometric and arithmetic means; values not recoverable from the source]

Performance Evaluation
- "For better or worse, benchmarks shape a field"
- Good products are created when there are:
  - good benchmarks
  - good ways to summarize performance
- Given that sales is in part a function of performance relative to the competition, investment goes into improving the product as reported by the performance summary
- If benchmarks/summary are inadequate, then choose between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
- Execution time is the measure of computer performance!

Instruction Set Architecture (ISA)
[Figure: the instruction set as the layer between software and hardware]

Interface Design
A good interface:
- Lasts through many implementations (portability, compatibility)
- Is used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels
[Figure: one interface with several uses and several implementations over time]

Evolution of Instruction Sets
- Single accumulator (EDSAC 1950)
- Accumulator + index registers (Manchester Mark I, IBM 700 series 1953)
- Separation of programming model from implementation
  - High-level language based (B5000 1963)
  - Concept of a family (IBM 360 1964)
- General purpose register machines
  - Complex instruction sets (VAX, Intel 432 1977-80)
  - Load/store architecture (CDC 6600, Cray 1 1963-76)
- RISC (MIPS, SPARC, HP-PA, IBM RS6000, ... 1987)

Evolution of Instruction Sets
- Major advances in computer architecture are typically associated with landmark instruction set designs
  - Ex: Stack vs. GPR (System 360)
- Design decisions must take into account:
  - technology
  - machine organization
  - operating systems
- And they in turn influence these

A "Typical" RISC
- 32-bit fixed format instruction (3 formats)
- 32 32-bit GPR (R0 contains zero, DP take pair)
- 3-address, reg-reg arithmetic instruction
- Single address mode for load/store: base + displacement (no indirection)
- Simple branch conditions
- Delayed branch
see: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

Example: MIPS
[Figure: instruction formats with bit-field boundaries. Register-Register: Op, Rs1, Rs2, Rd, Opx. Register-Immediate: Op, Rs1, Rd, immediate. Branch: Op, Rs1, Rs2/Opx, immediate. Jump / Call: Op, target]

Summary, #1
- Designing to last through trends

       | Capacity       | Speed
Logic  | 2x in 3 years  | 2x in 3 years
DRAM   | 4x in 3 years  | 2x in 10 years
Disk   | 4x in 3 years  | 2x in 10 years

- 6 yrs to graduate => 16X CPU speed, DRAM/disk size
- Time to run the task: execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ...: throughput, bandwidth
- "X is n times faster than Y" means
\[ \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} \]

Summary, #2
- Amdahl's Law:
\[ \text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}} \]
- CPI Law:
\[ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} \]
- Execution time is the REAL measure of computer performance!
- Good products are created when there are good benchmarks and good ways to summarize performance
- Die cost goes roughly with (die area)^4
- Can the PC industry support engineering/research investment?

Pipelining: It's Natural!
- Laundry example
- Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 40 minutes
- "Folder" takes 20 minutes

Sequential Laundry
[Figure: timeline from 6 PM to midnight; each load takes 30 + 40 + 20 minutes, one after another]
- Sequential laundry takes 6 hours for 4 loads
- If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start Work ASAP
[Figure: timeline from 6 PM; stages of length 30, 40, 40, 40, 40, 20 minutes]
- Pipelined laundry takes 3.5 hours for 4 loads

Pipelining Lessons
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously

Computer Pipelines
- Execute billions of instructions,
  so throughput is what matters
- DLX desirable features: all instructions same length, registers located in the same place in the instruction format, memory operands only in loads or stores

5 Steps of DLX Datapath (Figure 3.1, page 130)
[Figure: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back]

Pipelining Lessons (continued)
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to "fill" the pipeline and time to "drain" it reduce speedup

Pipelined DLX Datapath (Figure 3.4, page 137)
[Figure: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back]
- Data stationary control: local decode for each instruction phase / pipeline stage

Visualizing Pipelining (Figure 3.3)
[Figure: instructions in program order against time in clock cycles]

It's Not Easy for Computers
- Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  - Data hazards: instruction depends on the result of a prior instruction still in the pipeline (missing sock)
  - Control hazards: pipelining of branches & other instructions
- Common solution: stall the pipeline until the hazard is resolved, inserting "bubbles" in the pipeline

[Figures for the following slides are not recoverable from the source; recoverable slide titles: One Memory Port / Structural Hazards (two slides), Speed Up Equation for Pipelining, Three Generic Data Hazards (three slides)]

Software Scheduling to Avoid Load Hazards
Try producing fast code for
    a = b + c;
    d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:          Fast code:
LW  Rb,b            LW  Rb,b
LW  Rc,c            LW  Rc,c
ADD Ra,Rb,Rc        LW  Re,e
SW  a,Ra            ADD Ra,Rb,Rc
LW  Re,e            LW  Rf,f
LW  Rf,f            SW  a,Ra
SUB Rd,Re,Rf        SUB Rd,Re,Rf
SW  d,Rd            SW  d,Rd

Control Hazard on Branches: Three Stage Stall
[Figure: branch followed by three stalled instructions]

Branch Stall Impact
- If CPI = 1, 30% branch, stall 3 cycles => new CPI = 1.9!
- Two part solution:
  - Determine branch taken or not sooner, AND
  - Compute taken branch address earlier
- DLX branch tests if register = 0 or != 0
- DLX solution:
  - 1 clock cycle penalty for branch versus 3

Pipelined DLX Datapath (Figure 3.22, page 163)
[Figure: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back]
- This is the correct 1 cycle latency implementation!

Four Branch Hazard Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
  - Execute successor instructions in sequence
  - "Squash" instructions in pipeline if branch actually taken
  - Advantage of late pipeline state update
  - 47% of DLX branches not taken on average
  - PC+4 already calculated, so use it to get next instruction
#3: Predict Branch Taken
  - 53% of DLX branches taken on average
  - DLX still incurs a 1 cycle branch penalty
  - Other machines: branch target known before outcome

Four Branch Hazard Alternatives
#4: Delayed Branch
  - Define branch to take place AFTER a following instruction

        branch instruction
        sequential successor 1
        sequential successor 2
        ........
        sequential successor n      <- branch delay of length n
        branch target if taken

  - 1 slot delay allows proper decision and branch target address in 5 stage pipeline
  - DLX uses this
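
The arithmetic quoted in this lecture (the Amdahl's Law floating point example, the branch stall impact, and the laundry timings) can be checked with a short script. The following is a minimal sketch in Python and is not part of the original slides: the function names are illustrative assumptions, and the laundry model assumes each stage handles one load at a time, with finished loads allowed to wait until the next stage is free.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Effective CPI when every branch adds a fixed number of stall cycles."""
    return base_cpi + branch_fraction * stall_cycles


def laundry_minutes(stage_minutes, loads, pipelined):
    """Total time to push `loads` loads through the stages, in order."""
    if not pipelined:
        # Each load finishes every stage before the next load starts.
        return loads * sum(stage_minutes)
    # finish[s] is the time at which stage s last became free.
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which this load leaves the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, finish[s])  # wait for the load and the stage
            finish[s] = start + minutes
            ready = finish[s]
    return finish[-1]


if __name__ == "__main__":
    # FP instructions run 2x faster but are only 10% of the instructions.
    print(round(amdahl_speedup(0.10, 2.0), 3))
    # CPI = 1, 30% branches: 3-cycle stall versus the 1-cycle DLX penalty.
    print(round(cpi_with_branch_stalls(1.0, 0.30, 3), 2))
    print(round(cpi_with_branch_stalls(1.0, 0.30, 1), 2))
    # Washer 30 min, dryer 40 min, folder 20 min, 4 loads.
    stages = [30, 40, 20]
    print(laundry_minutes(stages, 4, pipelined=False) / 60)
    print(laundry_minutes(stages, 4, pipelined=True) / 60)
```

With the lecture's numbers, the pipelined laundry time is set by the 40-minute dryer, the slowest stage, which is the "pipeline rate limited by slowest pipeline stage" lesson in numeric form.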

