New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here


by: Mr. Hayley Barton


Mr. Hayley Barton

GPA 3.93

K. Asanovic

Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

K. Asanovic
Class Notes
25 ?




Popular in Course

Popular in ComputerScienence

This 61 page Class Notes was uploaded by Mr. Hayley Barton on Thursday October 22, 2015. The Class Notes belongs to COMPSCI 152 at University of California - Berkeley taught by K. Asanovic in Fall. Since its upload, it has received 23 views. For similar materials see /class/226655/compsci-152-university-of-california-berkeley in ComputerScienence at University of California - Berkeley.


Reviews for COMP ARCH & ENG


Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 10/22/15
CS 152 Computer Architecture and Engineering Lecture 1 Introduction Krs te Asanuvu Eiecmcai Engmeenng and oumpmev Sciences Unwevs v m Cammma at EevkeiEv imp mam kmkyganhm humlusteec5hexlmleyedncslsz EDSAE Unuers ym Eamhndge m 15m mm cmst W 1 mman What is Computer Architecture Application A Gap too large to bridge in one step but there are exceptions e g magnetic compass V V l Physics In its broadest de nition computer architecture is the design of the abst rac tion layers that allow us to implement information processing applications ef ciently using quot L39 39 technologies mman Cmszr jllng39 s a Abstraction Layers In Modern Systems Annmaan Ax mm 1 mmmgLan ua 2 swims ommmswymmmym avcggggwe UnlprocessorPerformance VAgtlt 25lveav1578tu1586 wacwae 52Neav1586tu znnz R Sc Xge 77Neayl 2tu pvesent mm cmst W The End of the Unlprocessor Era Single biggest change in the history of computing systems mm cs szsvunv m Conventlonal Wlsdom In ComputerArchltecture omccnvcmcnawwsccm FumeistveE Tiansismvsexpenswe wccnvcnucnawwsccm PcwcvwawPcwcvexpcnswemansmmswcc Can pm mcvc un cmp than can a umtu tum an mch Sm memmcveasmuin vuctmnmveiPavaHehsmwacumpHevS mncvaucn mpcnmm supevscaiav uututumev Specuialmm vuvv NWCW LPWaH iawutmmimsmngvetumsunmmeHWtuHLP cm om cvv Mumpucsavc siuw M urv acccss istas1 New Memmvwai Mcmuw siuw mumpucsvm mu Huck Evciestu DRAM memuw a Hunks 1m mummv mch ummcccssmpcncvmancc zxm 5w New cw PuwevWau my WaH o Mcch WaH EnckWaH umpccsscvpccmnccmz i w e Sea change in chm ccsmn mummcwcvcs 2x mcccssms patch pi Zveavs Mme simpiev Pvacessmsave we Pavm emmem cs szsvunv m Sea Change n Ch p De gn V intei mm 1571 Han pvucessuv 2312tvansis1uvs a MHZi 1D micmn PM OS 11 mm2 chip RiSCH 1583 32min asiaee PipEimE 6D 76Dtvansis1uvs 3 MHz 3 micmn NM 03 an mmzcmp izsmmnpiuuesmcmncnos 312 RISC IIOFPUOIEZEhEODczche RiSCHshinkstaDD2mm1m55nm CamesviaDRAM mi tva issmSRm Processor is the new transistor Imam K2515259an ng Deja vu all over again Muitiprucessurs imminent in 197D EDS ans todax s processors are nearing an impasse as techriuiugies appmae the speed driidnt David Mitcheii Tne Transpurer The Time is NON i 989 Transputer s premature acusmm muinprdeessdrs medm oeatuniprucessurs aPrucrasnnatiun rewarded 2x sed perf ii ayeais We are dedieannd aii dr uurmture product deveiupmerit to muiticure designs inis is a sea enande in mm mm Diireienee is aii mierdpmeessdredmpanies have smie d muitiprucessurs AMD intei iEM sun aii new Appies 2 OPUS aPrucrasnnatiun penaiized 2gtlt sequemai pen 5yrs iggest programming enaiiende ndm itd 2 OPUS OtEHirH President intei 2mm nedt Imam K2515259an ng Problems WIth Sea Change A gu thms Prugrammmg Languages summer Operatmg Systems Archwtectures wararwes nut ready m supp y Threadieve ParaHehsm ur Dataieve ParaHehsmfur mun CPU mp Ammmures nut readyfur mun CPUsch 7 um nstm mmeve PamHehsm cannm he sawed hv mm man mm vmtevsa ane but my cannat he sawed mm Pamupmmn mavmneds Need a rewaran uf 3 the abstracter ayers m the cumputmg system stack mm cs szspv nv m Abstract on Layers n Modem Systems Annmaan A guvmm evalmu v Emenua Machmes c Ema 0 WWW M 9233 Eenahm woes new Rewgamn m campmev magma mamas Wm mm cs szswnv m n The New C5152 New cm 52 fucuses an lnteractlun er sumvars and hardware 7 male avmneduve and less dlgnal enerreeme t r navvasapavmeFPGAdesgnlabclassmsl bln Fallzuu y me Mme amhneamal ldeasvle ll Explme sung Much ufthe malerlal yuu H learn mlslerrn was prevluusly m 05252 Same Mme cavern C551 C vst swm c5252 neav v 2a veavsagal Mavhe evewl veers srm cszszescsl 527gtcsslc7 Class eenlalns labs based an varluus dlffErErlt machlne desl rls Expevlrllem mh hawavml eduml mechanlmsvka m mach2 an real sunvware mm Eslszspllnv m CS 152 Course Focus unaerslarrame me aeslerr lechmuuesr macmrre luclulesr lechnuluw Vaclulsr evaluallun melhuds lhalwlll aelermme me 1mm meampmers m zlsl oerrlurv mmquot parallelism ma rInlerraee Dexlgn hompmerr hchl ec u ISA Ha quotw f e squota mull Oplra ns Measurement ampM5m Synms Evaluation 7 cslszsvllnv m The New C5152 Execu ve Summary mg Pvauessm Wm meaecews mm m c5152 i Pms metemnamgv hehmd mum s mummy ms mm mummy C5152 AdmlnlstrIVIa ns1mctuv w mmanm om 57a Sada HaH himm Omce Huuvs M13D72 SDPM emaMu cun vm 575 Sada r A awn seamen mumm E a o Omce Lemmas mm 5 we 3mm 32 Sada mav chanEE Sectmn F124pm zaaw neue v mavchanua szt Commemmmzecme A QuanmanveApproam 42 mm Oct zane Reamngsassmnewumms edmun mum use eavhevEds Webpage mpummecsneww em 5152 Lactuves ava ame unhne heme nuun dav wecme mm mummy C5152 Structure and Syllabus SM modu e5 1 Swmme macmne uesmn SAS mmmpmmammmm unmpehned macmnes Hun Lam swmme pmehnes mg Somerbuammg uurutumev ssuE m 2 M 3 Vmua memurv Mam Excemmns mtenums b 5 quot25 vuvv macmnes Exphmw pavew mucessuvsvecuvmac muw me In ch Mumpvucessuv avcmecmves cache cuhevencE memurv mudg s smcmumzauun mm cmst W CS1 52 Course Components 2m Prumem Sets une per mudu e mendedta he p yau m We malena Fee mm msmss Mh my mum and WNW hm mm m M mm mm mm based mm an em hm nmnesassumema w have v mkedthmugha wmh ems Sammnsre easedanevPSshanded 4 Omaha une per mudu e Mass a 2mm rm 9mm m mmpmevs m sets an abs 3 4 Labsune per Labs use mm mu my swmatmszmech Schs waded musapenrended sewansm saw ab mm cmst W C5152 Labs Each ab has uwecteu mus upenrended asswgnments RaugthEISEI pm mgvade Dxre Ed pumun s mended m Ensure students Eam mam cuncepts behmd ab mu sum mus Penny avm mu and mu mthew avm ab mm Openrended asswgment s m auqu yuu m shuvv yuur reatmty mum ane day mnvpmyed s g AN an amhneauva Mes and mm mm HEEBWE vesMsOK mem amah e Smdemscan a We Gmup apenrended ah vapans must be m ed m sapavate v Smdemscan vka m umem gmupsmvdmevem ass gnmems mm cmst m u Related Courses FurIla Ammajuls mums Sylum nmnI mm nelgn Newman Archnzmn m cu mm cs szsvunv m ComputerArc ecture A ttle o Thruughuutthe cuurse WE H use a hws turma narratwe e he p eheeraahe Why Eenam eeas aruse why wehy abuut ewe new he des gn prunes ahe EXp ams Why Eenam eee s ehs Ware taken Because future technu ugwes mwght be as EDNS LramEd as u der ehes These whe gnure hwstury are deemed e repeat n Every msake heee h hemhe eeseh wasa su heee h hheehpeeys heh hemeehpeee whee hem mm cs szsvuhvnv Charles Babbage 1 79 Luczszn preresser n1 Mahemmes ozmhnuee unuersny 132771339 Charles Babbage Dl elelme E gme 1823 AnaWC EngHe 1833 7 mmmowmmmn mono cumpmen AppIcaoo n r Mathematma Tah es r mommy autma Tah es 7 New Background 7 m mrmn Pa vrmrma funman can he avvmxxmated by a Wemms Technology mechamcaquotqears15c uard395aamsxmwe ca cu atms mm cmst W erence Eng ne A machineln cnmr ule mathematical Izhles Wexerslva Mvcammuausmndmn mbeappmmmwapawm 7 MVPDNnammcanbecamvukd mmd lgsnceub es Anexamme m W41 m n m n m a1n mm 1 M d1quot39n1 d1n1 2 all you need IS an adder cmst W leference Englne 1823 Bahbage spapenspubhwed 1834 rbe babei is ieaa bv saneaiza bis sun in Sweaen 42 aabbaae owes abibe iaea avbniaina n be isania Anaiyiie Engine 1855 e sanemzaisbiavsbisnaenneaiibe PansWaHdFare e canmmpmeawmh aeaieebawnaniai Speed mall azeaiannanbeisbeininmei Now the machine IS at the Smithsonian mm cs szswnv aa Analytlc Englne 1833 Babbage papervvas pubhshed e mncvwed dqug a hmmsm Me devemeerlflz he d eremverlgme nsplratlun iamoaio Looms iamsveie Dunlme bybunanea mm We set mcavdswth ma Punched baies aiaaieaibe Panen mv eave a pmgrzm We same set mcavds mum be used Wm dmevem eaiaieaibeaasa numbers 1871 Babbage mes ibemaebneemansanieanzea It is not clear if the analytic engine could be bull e ven today being only mechanical technolcqy mm cs szswnv aa Analytic Engine m rm canoemm gunmanquotmuse cnmrul r The storemw en aHvaHamesm b2 upevated upun asWEH as aH MUSE euanmeswncn have ansen Yvumme vesuns m meupevauunsave med 2 Themm nemneeuanmesanuunubeepemeuupun ave aw muugm Tne program an Operau vanab e venemez venemea An Dueratmn m the mm requneu feedmu twu puneneu cards and Druducmg a new puneneu cardfnrthe store An opereuon 03 ener the sequence was eso prowdem mun meennm The t programmer m Byrnn HIE my anelzcequot 1315752 u Ada39stumr was Babbage nnnsem mun meemm Babbage s Influence abbage s deas had great m uence ater pnmamy because of 7 ngr enables Wm Mum quotmes m Eabbage s ectuves m new 7 LadyLoveace Wnu quotans ale nabvea s quotmes m Enuhsh andmumugmv mammm AnaMm Emma weaves ageb ar patter m the ear ytwentweth century rthe focus smned to ana og computers bu 7 HarvardWarm mm m 1944 a very close m 59er 0 me AnaWt Enamel cmst W Harvard Mark I 8th m 1944 m BM End cott aboratones rHuWamAwke ssu saw Ned E552 vaWsm a m hamca b hadsu cumquedve aVsandE rvvemhed mns 7A urmauneucaHv ndhad75JUUUcumpun 5 h nmngcxuckmamea evew 0015mm BBHZ ance 3 secand n raddma 52 nds fur mm uphcanan mute m a sme ca cu atm Broke down once a Week cmst W Linear Equation Solver lnhn mnzsnm lnwz Stale unwersty man 7 Nanasu hu heaneavKummnvaev u we auumeee s mkuvasebma m neeeemem Ds amcp meek Mes an veweshed Eapacnms Appmanon 7 meme mew amevermaw equatmns Banground 7 annevsv awe Dmevermaw Ana vlev en ahaHg camputer Technology 7 YubesandE e mmemamca ve avs Aranasorr deaded that the correct mode of computation was Using electronic binary mans mm cs szspwvv m Electronic Numerical Integrator AC and Computer ENI nspwed bv Atanasu and EEW Eckenand Maucmv desmned andbum we 1mmameUnueysnmpennwwama Themst cump e e v mechumc upevalmnah uEnevaWumuse ana vuca ca cma u 7 3 72 Squave metevs 2mm e an 7 Read m 2n cams P21 mnme 7 Addmanmak mm vs DMsan 5 ms WWVZ mm ApplicationBammcca cu atmns ang e f acatmn tad wmd crass wmd aw eeneey temperature wetht w she DmDeHant mews eeeeemm x1 D crete Variable Automa In Computer EDVAC EN AC Spmmammmg W amwasextema Sequences ahnstm mns were exemled meepeneemw a he reams a he eememn e Human Menenler veewee mtake memans am avamef Ecke MauchW JuhnvunNeumannandmhevsdesmned EDVAC 15mm suwems pmmem e Saan v asme sfmedpmgram campufer e program can be manipulated as data rm man are 79907 on EDVACWaS pubhshed m moi mum had vun Neumann s swgnamvew e m 1 anme mun maneSpahs summed We hanm m mventmg Me mmpuferm Jam Atanasaquot mm cs szsvunv m n Stored Program Computer Program A sequence of mstrucuons How r commlmsrmmon Sequenangv manuacunfm m cu mms autumabc tantra exfemaWiperfape Harvard Mark w new zuseszWvz meeme mug board mm was yeeeunwmemury mm ma yeeewmememury sum 1967cmeen7 cs szsvunv m 4 Technology Issues ENIAC 13mm tubes 2 l sdiuit nurnbers ENIAC nao many asynchronous parallel UNIS EDVAC 4mm tubes 2mm Ward storage mercurv deiav hnes but only one was actlve ata tune BINAC tor reha tv DI b on t agr Two prooessors mat dwecked eaon omer H work well because processors never eeo csiszsvunv m Dominant Problem Reliability Mean trrne between tenures war MIT39s Whirlwind with an MTBF rsz rnrn was perhaps t able machine r the rnos r n Reasons tor unrenabmty 1 Vaeuurn Tubes 2 storage rnemurn aeoustre oeiav hnes a Wrmarn t b Seieeoons ercurv deiav hnes u es csiszsvunv m Com m ercial Activity 194852 IBM s SSEC follow on 39om Harvard Mark I Selective Sequence Electronic Calculator 7 in Word store 7 instructions constraints and tables at datawere read rrum paper tapes 7 es Tape reading statinnsl 7 Tapes could be gluedtugethertu term a luupl 7 Data could be output lri u n to he phase at computation and read ill the elt hase Er computation mama csuszspunwna And then there was IBM 701 l IBM 701 30 machines were sold in 1953 used CRTs as main memorv 72 tubes of 32x32b each IBM 650 a cheaper drum based m e more than 120 were sold in 1954 Why was IBM late getting into compute technology IBM was E making too much m ven Without computers y IBM revenues were doubling everv 4 to 5 veers in 4039s and 5039s mama csuszspunwna Computers In mid 50 s amvvave was expensue vaes WEYE smaH 1 nun Wums Navesdemsv em sunv ave Mammyaccessnmewasmm5mmsmwevmanmepmcessm owe e structer exeumanwue v asMaHv damnated mm memury reference we The sum Io aesrgn comaex commmums m Execute an mmcnunwasme 22mm desmn ouncem asuppuseum me speedumecudmu uvanALU upevauun ngvammev s ww mm macmne was msEpavame 1mm me ac ua hamwave Wmememauun mm cmst W x The IBM 650 19534 Mayhem Drum 1000 mm 3 M Acme Instruction mam decimal moudmg new Words mogvam comer mgmna may ALU accumulator From 660 Mama mew Programmer s View of the IBM 650 A drum madnme Wxth 44mstmc mons Instructan on 1234 mug Luad the contents m loeguon 1234 me me olsmbotlon put m also me me upperaccomolaton oweraccumulamr u tevu sud men go u loeaoon mug fur the next mstructmn Good programmers opomrzeo tne placement of rnstructrons on the drum to reduce latencyl mm 5 525 my 4 The Earliest Instruction Sets SInae Accumulator r A carrvrover from the calmletors L to x Ac M x M V AC ADD x AC AC Mgtlt sue x MUL x Inva ved a umlent re lster DIV x q q SHIFY LEFY nc 2 MAC stFr new me x pc x JGE x r Ac 2 u then pc x LOAD m x Ac mm address e dMX STORE m x Typlcally less than 2 dozen mstructlonsl cs szswnvnv a 9 mm 9 Single Accumulator Machine LOOP LOAD N JGE DONE non ONE srons N n LOAD n r2 mo 5 r3 srons c JUMP LOOP DONE m How to modify me aaaressesA B and c 7 mm cmst W a SelfModifying Code LOOP Lona N ms DONE m ONE STORE n LOAD A Eammmmfm F2 m a as MSW7W F3 WORE c msbuchan mm 1 1 mm mm 10 5 5pm 5 A LOP cmst m u Index Registers a Tom Krmum Mm aster unxvemnr mm 5039 One or more specialized registers to simplify address calculation Modxfy exxstxng xnstructxons xono x xx Ac Mgtlt xx non x xx nc Ac Mgtlt xx Add new mstrucuons to mampu ate Index registers JZx x xx xrxxu MEN Pc x else xx xx 1 Loan x xx xx Mgtlt truncatedta tlx Index registers have accumulatorsIke Chara crensm mm csxszsvxxnv m s Using Index Registers x x A LOOP JZx DON x Lnsm xx Lnsra xx sroRE Lnsrc xx JUMP LOOP Lnsrn ONE Program does not modify Itself Ef ciency has Improved dramabcally ops Iter m mdex re 5 W x x u xnstructmn fetch 52 q 7 1 averandfetch 2 xx stare 1 5 e x Instruwans are 1 m 2 hxts ijer Index rellxsters thh ALUnhke mrcmtrv Camdexmn 0 mm csxszsvxxnvnv Costs Operations on Index Registers To maement mdex regxster by x ncgtxgtlt newnstUctan xcemc x xx gtnc Hewmsbucha a so the AC must be saved and restaed It mav be better to xnoement IX dxrectw chx x xx xx xx x More xnstmctxons to mampu ate xndex regxster sToREx x xx Mgtlt gtxx Extended m m a We IX begins to look like an accumulator e a mdex regxsters evera accumuxators gt eneral Purpose Registers mm csxszspwwnv n Modes Evolut on of Addr s 1 Emma accummaxux absumte addvess o x 2 angxe accummaxux xmex xegxsxexe LOAD xxx 3 xnmxecxxun AD x e Mumme accummamvsx muex 1295121 mmxecxxun LOAD RJX X m LOAD Rxxx x 5 xnmxecxmmugmegxsxexs LOAD mm BTheWuvks LOAD RMR JRQ R xndex baseaddv mm csxszsvnnv m Variety of Instruct on Formats Two address formats the des unatmn s same as one ufthe uperand suurces Reg X REE s REE Reg xMemHaReg a e m xmnhespecmedmrec vmwaavegxsev e e e we address esemsnsn Wesme nc ude meme dredmrh Three address formats One des unatmn and up In Wu uperand suurces per msmetmn Reg XREQ a Ree Re m f w Reg xMemMaReg Re Re 39 MN mm cs szsvunv m n Formats More lnstruc 39 Zero address formats operands on a stack Ramsey and Mispr eMlsPl39Mlspr mm s MlMlsPll IHHI man s e Stack can be m veemevs mm memurv usuaHv mp o1 sack cached m veemevs O e address formats Accumu ator machmes e Accummamv s amavs umenmphcn upevand Many different formats are possible mm cs szsvunv m Data Formats and Memory Addresses Datafurm t a Wes Hawwums Wuvdsand duumewums Same SSUES We WWW Mis re teeeigiggmii am emquot I vs We Emquot m a eAddresses Word 5 g y39 i nmen Su puseme memurv is umamzed m azepnwums Can awum addvess beam NW at n or 8 7 mm cs szsvunv m Software Developments up to 1955 Libraries of numerwca routmes e F a my Bum were erranseenaemai functmns new mamvu atmn equatmn 5a ver5 195560 ngh level Languages e Furtran 1955 Operating Systems e enssemmers waders LmkerS Camv ers u mg Diagrams m kEED track af 1 n iges Madnnes reqmred experlenced operators e Mas us he 2W sa ware mm cs szsvunv m Compat y Problem at IBM 5v esrw 6039s IBM had 4 Incompatible Ines of computers 7 1 u a my 55 a 7u74 7n2 a man 14m a 7mm Each svstem had ts own Instructmn set IO svstem and Secnndarv Stars 2 busmess smem c rea tme 350 mm cs szsvunv m IBM 360 Design Premises Amam Blaauwami Brooks 1954 The desmn mus end Msemu growm and successor macmnes Geneva melhud mvcunnecung H0 dev s g S Machme mus1 be capame m supemsmg rIseYwnhum manuahmewemmn Bum hardware Yaw2 checkmg and ucatmu amsm veduce dawn me Swmmem assemmE W emswnh vedundam H0 dewces memunes etc 1m Yam tolerance SamemumemsvEuuwed uatmgpmmwmds amenhan 36m mm cs szswnv m 5 IBM 36 A Genera urpose Register GPR Machine Processor State ABGenevaLPumuseSZVbMR39Emsmvs maybe used asmdex amuse regrster r I has ecspmpemes e F ualmqumB rbnREm Is 7A PI vamS amsWum F39C Co m odesC omen cumams a neon address mats 167m ha rwums 32m wows won duumerwums m omen s ww Wigs aveE ms mom mm cs szsvunv m s IBM 360 Initial Implementations Model 30 V V V Model 70 torage SK 7 54 KB 255K 7 512 KB Dara arh Erbwt mot crrcurmelay an nsecIevew 5 nsecIevew Local smre Mam store Tran stDrREQIStErs Conrmlsmre Read onw Jusee Cnnventmna mrcmts IBM 350 rnstructron set arcnrtecture ISA competer nro tne underlying tecnnoogrce differences between various model Mrestone Tne rst true ISA designed as portable hardwarersoftware Interface 1137 minor moorncetrons rt strI surwves tooeyI mm cs szsvunv m s IBM 360 45 years later The zSeries 210 Microprocessor A 4 GHz m lEM E nm SOl CMOSlechnulugy Quad cure desgn Duswssus mruvdev superscalav cxsc p pelme Oulrulruvdev memuvy accesses Redundant dalapalh wym s mm mama w W Parallel ampqu m vstmscamvaved lrcache 128KB L1 lEIKenlvy Branch Target Butler m w bmvevlasuvvmlcammevmalwamaam Hardware m demmal llualmgrpuml amhmelm lmvmun m Mm mm EEE Mme March 2008 And in conclusion Computer Architecture gtgt ISAs and RTL C8152 is about interaction ofhardware and soltware and design of appropriate abstraction Ia ers Computer architecture is shaped by technology and applications 7 sttury pmvmes lessonsfurthe future Computer Science at the crossroads from sequential to parallel computin s Salvatmn reqmresmnuvatmn m many elds mcludmg computer archwtemur Thursday is Intro to Simicsquot section with Scott Read Chapter 1 then Appendix B for next time manna cs szsnnms 5 Acknowledgements These shdes Bantam matenax devempeu and upynght by mm DadeanevsuMUCE MW matena derwed frum uurse E 823 UCE matena denved frum uurse 05252 CS 152 Computer Architecture and Engineering Lecture 4 Pipelining Krs e Asanovic Electrical Engineering and Cumputer Sciences University Dr Callfurnla at Berkeley ttp ww eecs berkeley edIIkrste http insteecsberke1ey educsl 2 l Last time in Lecture 3 Microcoding became less attractive as gap between M speeds reduced Com x instruction sets dif cult to pipeline so dif cult to incre se perform nce s gate count grew Ironlaw explains architecture design space 7 Trade instructlunspmgram cyclesinstmctlun andumycycle s designed for ef cient plpellned implementations slmllart al mmqu e inspired by Earlier Cray machines l 2 2mm CSlSQSDHng us Iron Law of Processor Performance Cycles rogram Instruc Ion yc Instructions per program depends on source code compiler technology and ISA rogram e Cycls per instructions CPI depends upon the ISA and the microarchitecture Time per cycle depends upon h t e microarchitecture and the base technology Micruarchitecture Singlercycle unpipelined w w Pumiin mmm CSiSQSDHng us An Ideal Pipeline stage 4 All objects go through the same stages No sharing of resources between anv two stages Propagation delav through all pipeline stages is equal The scheouhhg or an object entering the pipeline t is not attecteo by the objects li i other s age ese conditions genera1y hold torinoustria assembly i s But can an instruction plpellne satisfy tne last condition mimm CSiSQSDHng us Pipelined MIPS To pipeline MIPS First build MIPS without pipelining with C 1 Next add pipeline registers to reduce cycle time while maintaining CPI1 mmm csiszspiine us 5 Pipelined Datapath fetch decade 81 Regnfetch execute memory 1 phase phase phase phase Clock period can be reduced by dlvldll ig tne execution of an instruction into multiple cvcles txMi tRri tALUi tow tRW tom MOUSEY However CPI WIl Increas unless Instructions are pipemed mmm CSlSQSDHng us 5 Technology Assumptions A small amount of verv fast memorv caches backed up by a large slower memorv Fast ALU at least for integers Multiported Register les slowerl Thus the following timing assumption is reasonable 5stage pipelined Harvard architecture will be n the focus of our detailed desig mmm CSlSQSDHng us 5Stage Pipelined Execution Memor Match Decode Reg Fetch Execute IF ID EX MA 6 t7 CSlSQSDHng us 5Stage Pipelined Execution R esource Usage Diagram IrFetch Decode Reg Fetch Execute Memory IF ID EX MA tme to t t2 3 4 t5 t6 t7 IF 12 13 i 15 ID i 12 13 i 15 9 EX i 12 13 i 15 g MA n 12 13 i 15 WE u i 13 1 2mm CSiSQSDHng us Pipelined Execution ALU Instructions Not qmte correct We need an Instructlon Reg IR for eacn stage mmm CSiSQSDHng us Pipelined MIPS Datapath withoutjumps F D Control points Need to Be Connected mmm CSlSQSDHng as u 39 How Instructions can Interact with each other in a pipeline An instruction in the pipeline may need a resource being used by another instruction in the pipeline 9 structural hazard An instruction may depend on something produced by an earlier instruction Dependence may e for a data value 9 data hazard Dependence may be for the next instruction39s address 9 control hazard branches exceptions mmm CSlSQSDHng as n Data Hazards r1 a 1390 10 men 17 r1 is stale Oops mmm cs1 525mg as u C5152 Administrivia PS 1 due Tuesday Feb 10 in class Section covering PS 1 on Wednesday Feb 11 e Ruummme TED First Quiz on Thursday Feb 12 e n 1ass c1usedrbuuk nu cumputers urca1cu1aturs 7 Owners Matures 175 tmsweews hectures Lecture 7 Tuesday Feb 17 in 320 Soda Lecture 8 Thursday Feb 19 back in 306 Soda See website for full schedule mmm cs1 525mg as u Resolving Data Hazards 1 Strategy 1 Wait for the result to be available by freezing earlier pipeline stages 9 interlocks mimm CS SQSDHHE us Feedback to Resolve Hazards mimm Laterstages prwide dependence infurmauuntu eamer stages Which can stsii omi instructions Controiiing a pipeiine ii i this manner Works provided tne Instruction etstege I1 can complete Witnout any Interference from Instructions In stages 1 to I otherwise deadiocks mav occur CS SQSDHHE us lnterlocks to resolve Data Hazards Stall Condition mm m MemDW 1er010 r4ltr1 17 2mm CSlSQSDHng as n Stalled Stages and Pipeline Bubbles tt4St6t7 t 3 w l 2 r4ltr117 152an m2 Egtlt2 MA2 WES 3 2 IF IF 1D3 Egtlt3 MA3 WE3 IF stalled stages tme t0 t1 t2 t3 t4 t5 t6 7 IF 1 12 13 I3 I 1 lD n 12 12 12 12 3 esource EX H nop nop Hop 12 I3 1 sage MA op op Hop 12 13 WE mop Hop NOD12 1 nap pipeline bubble mmm CSlSQSDHng as u Interlock Control Logic Comp steg lnstru mmm e source registers CSlSQSDHng us ere th of the lnstructlon W the decode e WltH the destination register of the uncommitted Cthl lS ignoring jumps amp Interlock Control Logic branches Should We alwavs stall lfthe rs h not everv ll l mmm structlon reads a reg CSlSQSDHng as eld matches ot everv lnstructlon ertes e reglster gtWe some rd7 lstel re i Source amp Destination Registers Retype lEI Irtype El Hype IEl source5 destlnatlon ALU rd e rs tuncrt rs rt rd ALUi rt 9 rs op irnrn rs rt LW rteMrsimm rs rt SW M rs imm e rt rs rt BZ can rs true pce PC irnrn rs false PC 9 PC 4 rs J PC 9 PC irnrn JAL r31lt PC PC 9 PC irnrn 31 JR pce rs rs JALR r31lt PC PC 9 rs rs 31 2mm CSiSQspHng us 1 Deriving the Stall Signal Cm Cm w Case opcode rel Case opcode ALU a rd ALU ALUi ALUi LW art r JAL JALR a R31 39 a on r 3 off We Case opcode ALU ALUi LW cm s o re2 Case opcode JAL JALR a on e a or a err Cmu Hazards due to Loads amp Stores Stall Condition Wm W r17 r35 v l m in mm Mr17 lt r2 Is there any possible data hazard r4 lt Mr35 In thIs Instruction sequence mmm cmst us 1 Load amp Store Hazards Mr17 e r2 r17 r35 a data hazard r4ltMi 5S However the hazard is avoided because our memory system completes writes in one cycle I LoadStore hazards are sometimes resolved in the pipeline and sometims in the memory system iiself More on this later in the course mmm CSiSQSDHng us 4 Resolving Data Hazards 2 Strategy 2 cute data as soon as possible after it is calculated to the earlier pipeline stage 9 bypass mmm cmst us 5 Bypassing tlme t0 t1 t2 393 t4 t5 t6 t7 warmrm u i In 12 r4ltr117 in 152132 lD2 lD2 Egtlt2 MA2 W82 l 1F3 1F3 1F3 1F3 1D3 Egtlt3 MA3 stalledsrages 1F in Ex lFS in5 3 1 15 Each stall or kill introduces a bubble in the pipeline gtCPI gt 1 can get the data from ut A new datapath ie a bypass the output of the ALU to its inp t t 1 mien91 12 r4ltr117 2 3 3 3 W5 1 IF in Ex MA4 WE l5 lFS 1D5 Egtlt5 MAS W35 5 CSlSQSDHng us Adding a Bypass 1 11am 0 r4ltr1 17 s mmm When does this bypass help 101 m H thW 1 r4ltr1 17 r31 17 no 21 391 1 r4 e e y 25152vaan US mmm The Bypass Signal Deriving it from the Stall Signal sbaH ASrc 9st WeE relD No because from th1s by 5th weE mto two components Werbvpass WerstaH rsD WSM WeM rsD WSW Wew relD no WSE weE no WSM WeM 019 WSW Wew reZD W Case opcode We Case opcode ALU ALU LW Ws s 0 on ALU LW art JAL JALR JAL JALR c1131 Is thws correct7 om ALU and AM mstmchons can bene t pass 25152vaan as Bypass and Stall Signals 5th weE mto two components Werbvpass WerstaH WerstaHE Case opcodeE LW 3 W s o a on AL JALR Asrc rsD WsE w sbaH rsD WSE WerstaHE rsrwsM WeM rsDWsW Wew mo Hm WsE weE m WsM WeM no WSW Wew reZD mmm Cs SQSvnng us a Fully Bypassed Datapath suH pcmm Is there still a need for the stall Signal staH WE W590 relD WE wszso re2D mmm Cs SQSvnng us an Resolving Data Hazards 3 Strategy 3 Speculate on the dependence Two cases Guessed correctly 9 do nothing Guessed incorrectly 9 kill and rstart mmm CSlSQSDHng us Next Time Control Hazards BranchesJum 5 ExceptionsInterrupts mmm CSlSQSDHng us Acknowledgements These slides contain material developed and 39 by 7 ANin MT 7 David Pattersun UCB MIT malerial derived from course 6823 UCB malerial derived from course C8252 mmm Csl525wmg us CS 152 Computer Architecture and Engineering Lecture 7 Memory Hierarchyll Krste Asanovic Electrlcal Englneerlng and Cumputer Sclences University cur Callrdrnla at Berkeley ww eecs berkeley edIIkrste inst eecs berkele39y educsl 2 http htt l Last time in Lecture 6 Dynamic RAM DRAM is main form ofmain memory storage in use toda e Huldsyalues an small capaclturs need refresnlrlg nence dynamlc e slew multlrstep access precnarge read rdw read ulumn Static RAM SRAM is faster but more expensive 7 Usedtu pulld Dnrchlp memcwyrdrcacnes Caches exploit two forms of predictability in memory reference e ms 7 Tempur l lucallty same lucatlurl llKElytu pe accessed agalrl seen 7 Spatlal lucallty nelgnpdnng lucatlurl llKElyt pe accessed sun Cache holds small set ofvalues in fast mem SRAM close to processor 7 Needtu deyeldp search scneme mnnd values ln cacne and replacement pullcy in make space far newly accessed lucatluns 2mm CSlSQSDllng us Placement Policy nnnnnzz amckNumber mmsmgmmmsu Memory setNumber u 1 2 3 uxzcussv came mm mmmm Fqu 27Wav Set Dwrect Assocwatwe Assocwatwe pped mock 12 anvwhere anvwhere m onw mto can be Wed set 0 Neck 4 12 mod4 12 mod 8 2mm CS SQSDHHE us DirectMapped Cache B Dck Data B ock Data Word or the CS SQSDHHE us 2Way SetAssociative Cache 2mm CS SQSDHHE us Data Word or the Fully Associative Cache CS SQSDHHE us Replacement Policy In an associative cache which block from a set should be evicted when the set becomes full 0 Random Least Recentiv Used LRU LRU cache state rnust be u dated on every access true irnpiernentauon only reasibie rorsrnaii sea away pseudoeLRu binary tree orten used ror Ava Way First in First Cut HFO a k a RoundeRobin used in highly associauve caches Not Least Recendy Used NLRU r0 With exception ror rnost recendy used biocilt or biocilts This is a secondeorder effect Replacement only happens on misses imm csiszspiing us 7 Block Size and Spatial Locality Block is unit of transfer between tne cacne and rnernorv 4W gii2gtl cki Spiitcpu block address address ab bits b bi 2b block size ailta iine size in bvtes Larger block size has distinct nardware advantages iess tag overhead exploit rast burst transrers rrorn DRAM exploit rast burst transrers over Wide busses What are the disadvantages ofihcreasihg block size Fewer blocks gt more conflicts Can Waste bandwidth am CSiSQSpiing us I nte raction CPUCache i ne 5stage p p Ta Memarv Cantra Cache mu Data fram Lawer Leve s m Memaw Hierath 2mm CSiSQSDHng us Improving Cache Performance Average memory access time i time Miss rate x Miss penalty To improve performance 0 reduce the hit time 0 reduce the miss rate 0 reduce the miss penalty What is the simplest design strategy Biggest cache that doesn t increase hit time past 12 cycles a rox 8 2 B in modem technoo desygn Issues more comaex wym outrofrorder superscaar processors 2mm CSiSQSDHng us SerialversusParallel Cache and Memory access a is HlT RATlO Fractlun Elf references in cache 17 a is MlSS RATlO Remaining references Addr Addr Mall l Processor CACHE Memory Data Data Average access tirne for serial search tam 1 o tmgm Addr Mall l Processor CACHE Memory Data Data Average access tirne for parallel search a tcathe 1 o tmEm Savll lgs are usuallv srnall mgmgtgt tame hit ratio a high High bandWidth required for memorv pat Complexltv of handling parallel paths can slow tim 2mm csiszspiha De Causes for Cache Misses Compulsory firstareference to a block allta cold start misses a misses that would occur even with in nite cache Capaclty cache is mo small to hold all data needed b the re ram 7 misses that would occur even under pertect replacement pollcv Con ict misses that occur because of oollisions due no blockaplacement strat gy ld e a rnisses that Wou not occur With full assoaatlvltv 2mm csiszspiha as n Effect of Cache Parameters on Performance 0 Larger cache size reduces capacity and conflict misses hit time will increase 0 Higher associativity reduces conflict misses may incr se hit time 0 Larger block size reduces compulsory and capacity reloa increases conflict misses and miss pena 2mm CSiSQSDHng as d misses lty Write Policy Choices Cache hit n39te through wnte both eaene e memory gtgt generally ntgnertrarn but simplifies caene eunerenee write back wnte eaene uni memory is wntten only wnen tne entry is evicted gtgt a may bitperbluck anmrtner reduce the tram Cache miss 7 no write allocate uniy wnte to main memory 7 write allocate aka fetch on wnte teten into eaene Common combinations 7 wnte tnreugn and ne wnte allocate e wnte back With wnte allocate mam CsiSQSDHng us Write Performance Data Word or Byte CSlSQSDHng us Reducing Write Hit Time Problem Writes take two cycles in memory stage one cycle for tag check plus one cycle for data write if hit Solution s Design data RAM that can perform read and write in one cycle restore old value afiertag miss Fullyassociative CAM Tag caches Word line only enabled if hit Pipelined writes Hold write data for store in single buffer ahead of cache write cache data during next store39s tag check 2mm CSlSQSDHng us C5152 Administrivia 2mm CS SQSDHHE us Pipelining Cache Writes Load Data to CPU Write Buffer to Reduce Read Miss Penalty Evlcted dlrty llnes ror Wnteback cacne OR All ertes ll l wntetnrd cacne Processor is not stalled on writes and read misses can go ahead ofwrite to main memory Problem erte nmermay nuld updated value at lucallun needed by a read mlss slmple scheme an a read miss wallrurln Write ndnerm u emp Faster scheme CheEKerte buffer addresses agalrlst read miss addresses if rlEI match allm read miss tn gm ahead Elf Writes else return value lrl Write bl er mm Cslstvllng us 9 Blocklevel Optimizations Tags are too large ie too much over l ple suldllun Largerblueks but miss penalty cudld be large Subblock placement aka sector che e A Valid bit added in dnlls srnallerlnan full black called subrblueks 7 Only read a subrbuek miss 7 lra lag matches is ms word lri lne cache7 2mm CSlSQSDllng us SetAssociative RAMTag Cache tag Status one tag slams on Nut Energyrefflclent eAtag aha aa1a Ward l5 read frum every Way Tvvurphase appmach rFlrst readtags thenlust read datafrum selected Way eMure Energyremclent 5 eDuunles la1ehcy lh Ll 704 M L2 and abuve WW7 2mm CSlSQSDHng as u Multilevel Caches A mEmDryEannDth large and fast lncreaslngslzesufcacheateachlevel a Local mlSS rate mlSSeS ll l cache accesses to cadwe Global mlSS rate mlsses ll l cache CPU memorv accesses lVllsses per ll lStl uCthl l mlsses ll l cache number oflhstructlohs 2mm CSlSQSDHng us a A Typical Memory Hierarchy c2008 Splitmstruction ampdata Muinpie mterleaved primarv cacnes memorv banks onrchip SRAM offrdwip DRAM Multiported register le Large uni ed secondarv cacne onrchi part or cpu 3 SRAM 2mm CSlSQSDHng us 1 Presence of L2 influences L1 design Use smaller L1 ifthere is also L2 7 Trade increased L1 miss rate furreduced L1 hittime and reduced L1 miss penalty 7 Reduces average access Energy Use simpler writethrough L1 with onchip L2 7 Writerback L2 cacne absurbs Write traffic duesn t dc crrcnrp 7 must une L1 miss requesi per L1 access nu dirty victim wnte back simplifies pipeline cuntrul 7 Sim lifies ccnerence issues 7 Simplifies Errurrecuvery in L1an usejustparity bits in L1 and maimed frum L2 Wnen parity errcr detected an L1 read 2mm CSlSQSDHng us 4 Inclusion Policy Inclusive multilevel cache 7 ulds EDplES ufdata m uuter cache 7 External access need unlycheck DutEr cache M st mmuncase Exclusive multilevel caches 7 lnnercache may huld data mm m uuter cache 7 Swap llnes between lnnernuter caches an mlss 7 Use m AMD Athlun Wlm 64KB pnmary and 256KB secundary ca 2 Why choose one type or the other man CSlS25DHng us ltanium2 OnChip Caches IntelHP 2002 Level 1 16KB 4way s a 643 line quadport 2 load2 store single cycle latency gt Level 2 256KB 4way sa 1283 line quadFort 4 load or4 store Ive cycle latency Level 3 3MB 12way sa 1283 line Single 323 port twelve cycle latency Acknowledgements These slides contain material developed and copyright by 7 ANin MT 7 David Pattersun UCB MIT material derived from course 6823 UCB material derived from course C8252 2mm Csl525wing us


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Steve Martinelli UC Los Angeles

"There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

Jennifer McGill UCSF Med School

"Selling my MCAT study guides and notes has been a great source of side revenue while I'm in school. Some months I'm making over $500! Plus, it makes me happy knowing that I'm helping future med students with their MCAT."

Bentley McCaw University of Florida

"I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"


"Their 'Elite Notetakers' are making over $1,200/month in sales by creating high quality content that helps their classmates in a time of need."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.