Intro MachineOrganization w
Intro MachineOrganization w CS 240
Popular in Course
verified elite notetaker
Popular in ComputerScienence
This 126 page Class Notes was uploaded by Jordon Hermiston on Thursday October 29, 2015. The Class Notes belongs to CS 240 at Wellesley College taught by Staff in Fall. Since its upload, it has received 82 views. For similar materials see /class/230937/cs-240-wellesley-college in ComputerScienence at Wellesley College.
Reviews for Intro MachineOrganization w
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 10/29/15
Talking To compu rers Program compila rion linking and loading 65240 Computer Organization Department of Compu ler Science Wellesley College A Transla rion hierarchy for C aSSembly code aSSembler object code execuTable Program Transla rion library rou fines 062 Compiler structure Source Syntactic Program Tokens Structure Semantic Scanner Parser Roufines c haracter swear Intermediate boland Attribute Tables used by all phases of the compiler Representation Intermediate Representation Code Generator Target code Program translation 063 Source Program Tokens LeXIcol tokens 61huracfer stream Type Program Structure Token Name begin BEGIN Keywords and END write WRITE read REA D Identifier Integer variable ID Literals Integer constants INTLIT 39 ASSIGNOP Operators PLUSOP MINUSOP SEMICOLON Punctuation LPAREN RPAREN I COMMA 064 Program translation Program Tokens Nontokens 6mm stream Type Program Structure Comments try again Preprocessor includeltstdio hgt directives define NUMS 5 6 macros NUMS blanks tabs and newlines Program translation 065 Parsing o The parser groups Tokens into units specified by the production rules of a context free grammar Source Syntactic Program Structure character am S mbol and Attribute Tables Program translation 066 C I am fluenT in over six million forms 0 The inpuT To The C of communicaTion compiler is a sTring of ASCII characTers Before doing anyThing else The compiler musT deTermine The synTacTic sTrucTure of This sTring 0 Program TranslaTion 067 SynTacTic Tokens STrLIcTLIre WM x d a x C a b c d b a b c i d a d b c b C Which one is The righT one Program TranslaTion 068 A simple ar39ifhmefic grammar E EFEFF F F6F66 5 6H H CVE a o1 I 2 aIbIc Program Translafion 069 Parsing a b c 0 Empty sfuck marker Program Trans Iafi on 0610 Scanning a variable A poinfer is pushed Program Translafion 0611 Scan Look at buf don39f scan past Check operand under To of sfack Since L has lower precedence Than we scan and push Program Trans Iafi on 0612 Scan b No problems here Program Translafion 0613 Scan Look af but don39f scan past we scan and pus Program Translafion Check operand under Top of sfack Since has lower precedence Than 0614 Scan c Program Trans Iafi on 0615 Scan Second Look af but don39f scan past Check operand under Top of stack Since has higher precedence Than we re uce Program Trans Iafi on 0616 Reduce using r39ule F F 6 allocate storage pop left operan pop operator pop right allocated node Program translation 0617 And repeat Still looking at Check operand under top of stack Since has equal precedence with they are the same symbol we reduce Program translation 0618 Affer r educTion Egt E F Program Translafion 0619 STiII Scanning Second Check operand under Top of sfuck Since L has lower precedence Than we scan and push Program Translafion 06 20 AfTer pushing and d BLIT now we39ve reached The end of The expression We redLIce like crazy Program Trans IaTi on 0621 AfTer pushing and d BLIT now we39ve reached The end of The expression We reduce like crazy Program Trans IaTi on 06 22 Generating code Source Syntactic Program Tokens Structure Semantic 4 Scanner H Parser l D R outInes character Shear Intermediate Representation boland Attribute Tables Program translation 0623 Syntactic Structure Semantic Generating OSSembly code 0 Perform a postorder ImermEdiale Representation traversal of the tree generating assembly along lthe wag sf w to a p lw tl bfp d lw t2 cfp mul t3 tl t2 5 add t4 t0 t3 b c lw t5 dfp add t6 t4 t5 And just where did mul come from Program translation 06 24 Coming attractions Code optimization Source Syntactic Program Tokens Structure Semantic Scanner Parser Roufines character stream Intermediate Representation boland Attribute Tables Imermemule Representation Code Generator Target code used by all phases of the compiler Program translation 06 25 The OSSembler39 assembly code Program translation 06 26 Look familiar o A scanner is sTill needed To group ASCII inTo meaningful uni Ts Assembly ObjecT code Tokens file code Scanner generaTion characTer sTream Used by The assembler To Symbol Table keep Track of addresses corresponding To labels 0 However since assembly is The symbol form of whaT The machine undersTands code generaTion is much easier Program TranslaTion 0627 For your convenience o Assemblers accepT numbers in a varieTy of bases 0 They also TreaT common variaTions of machine language as if They were insTrucTions For example move t0 tl is TranslaTed as add t0 zero tl 0 There is no blT branch on less Than insTrucTion IT is TranslaTed as Program TranslaTion 0628 14 Anatomy of an object file Object file header describes size and pieces of file Text segment contains machine d Static data contains long lived data Relocation information identifies instructions an data hat depend on absolute addresses Symbol table contains remainin labels such as external references Debugging information concerning w modules were compile Program translation 06 29 The linker assembly code linker executable library routines Program translation 0630 Linking in Three sTeps 1 Place code and daTa modules symbolically in memory 2 DeTermine The addresses of daTa and insTrucTion labels 3 PaTch boTh The inTernal and exTernal references Program TranslaTion 0631 Linking SchemaTic Object file ExecuTable file main main jal call prian jal prian prinT 39 C library RelocaTion info Program Trans IaTi on 0632 Exploi ring memory hierarchy Cache basics C5240 Computer Organization Deparfmen l of CompuTer Science Wellesley College Basic s rruc rure of memory hierarchy Speed Size CosT bis Technology lt ns 32 high CPU 055 n3 KBMB 4K10K SRAM Cache per GB 5070 ns MBGB Mam 100200 DRAM memory per GB 5M10M ns GBTB 502 Magne lic Ma n ric g39j per GB disk Cache basics 202 Copying blocks of informaTion o A memory hierarchy can consisT of many levels buT daTa is copied beTween only Two adjacenT levels aT a Time 0 We focus on jusT Two levels The upper The one closer To The processor and The lower Processor DaTa block is Transferred Cache basics 203 BaTTing average a If The daTa requesTed by The processor appears in The some block in The upper level iT39s a hiT OTherwise iT39s a miss The hiT raTio is The fracTion hiTs To ToTal requesTs HiT Time is The Time To access The upper level of memory Miss penalTy is The Time To replace a block in The upper level wiTh The corresponding block from The lower level lus The Time To deliver This lock 0 o 0 Processor DaTa block is Transferred Cache basics 204 DirecTmapped cache Cache basics 205 Tags and valida rion sTicker39s MIPS words are aligned To mul riples of 4 byTes so The leas r significanT 2 biTs of every address are ignored Cache basics 206 Miss raTe versus block size 0 Larger blocks exploiT spaTial localiTy To lower The miss raTes 0 However The miss raTe may increase if The block size becomes significanT fracTional of The cache size Why 0 Increased block size also increases The miss Black size bytes 1 KB penalTy 39m 9256 KB Cache basics 20 7 Handling cache misses 1 Send The original PC value currenT PC 4 To The memory 2 InsTrucT main memory To perform a read and waiT for The memory To compleTe iTs access 3 WriTe The cache puTTing The daTa from memory in The daTa porTion wriTin The upper biTs of The a dress inTo The Tag field and Turning The valid biT on 4 ResTarT The insTrucTion execuTion aT The firsT sTep which will refeTch The insTrucTion Cache basics 20 8 WhaT39s wr39ong wiTh This picTur e A sTore instruction Processor wriTes daTa inTo The cache Cache basics 209 Wr iTeThr ough wriTes daTa inTo The cache and memory aT The same Cache basics 2010 and intoa 39w39r39ite buffer A store instruction Processor writes data into the cache main memory Cache basics 2011 Writeback instruction Processor writes data ONLYinto the cache The modified block is written into main memory only when It Is r eplaced Cache basics 2012 An example Infrinsi ry Fas rVlATH o MIPS ar chi rec rur e wi rh a 12s rage pipeline and a simple cache implemen ra rion 0 To pipeline wi rhou r s ralling separ a re ins rr uc rion and da ra caches are used 0 Each cache is 16 KB or 4K wor ds wi rh 16wor d blocks Cache basics 2013 256 block cache wi rh 16 words per block Addvess shuwmg bit pusmuns Cache basics 2014 ApproximaTe insTr39ucTion and daTa miSSes for SPECZOOO benchmarks InsTr ucTion DaTa EffecTive miss r39aTe miss r39aTe combined miss r39aTe 04 114 32 This isn39T The whole sTory The ulTimaTe measure is The effecT of memory on program execuTion Time More soon Cache basics 2015 Increasing physical or39 logical memor39y widTh a Onerv mdrwde memaw mgammlmn Assume four word cache block 1 bus cycle To send address 15 bus cycles for each DRAM access iniTiaTe 1 bus cycle To send word Cache basics 2016 RepresenTing numbers dec oc r hex and friends 65240 Computer Organization Depar rmen r of Compu rer Science Wellesley College Our mu rual friend 0 Conversion To and from hex is Too BAle 10 o Oc ral is also usefulquot 0 1 2 3 0 Why do machine language programmers confuse Halloween wi rh Chris rmas And The same conversion Trick works be rween binary and oc ral Why Numbers 92 Frankly oTher conversions are no where as nice 0 Given a number num we find dquot d d1 do so Thaf numdrquotd1rquot391md1r1dor o The algori rhm is simple bu139 redious for i lt 0 to n do dl lt num mod I num lt num div r num dV r d1r1 d0 r 7 num div r r num mod r Repeat Numbers 93 AriThmeTic is The same everywhere Addi rion Subfracfion 100112 CDB16 001012 c3E16 Mulfiplica rion Division 1345 x 275 112 10001g Numbers 94 RepreSenTing a range of numbers 0 Whaf range of numbers can be represenfed using an nbi r binary number Numbers 95 Signed magni iude repreSen raTion 0 Using n bi139s we could represenf The na rural numbers up o 2quot o 1 2 3 4 5 2quot1 l i i i i i i 000 o 111 1 o or we could reserve one bi139 usually The msb To represen r a number39s sign 2n11 2 1 0 l 2 2 391 l i i i l i i i sign bit 1111 Slgl39l ll Q111 negative Posmve Numbers 96 Some problems with signed magnitude 0 Additional circuitry is needed to add positive numbers to numbers 0 1 1 0 1 represents 1101 1 1 1 0 1 represents 1101 0 And how exactly does one represent zero And why is this a problem Numbers 97 Click and Clack ride again 0 An automobile odometer D31 provides an elementary solution provided it runs backwards as well as forwards number line 500 3 2 1 o 1 2 3 499 number line 997 998 999 000 001 002 003 499 l l l l l l l l l l l l Rule for taking the negative of a number Subtract number from 999 to avoid borrowing then a Numbers 98 Digital Tappet Brotheres 0 We build a binary odometer D31 0 The number line works exactly the same as before number line number line 100 101 110 111 000 001 010 011 Rule for taking the negative of a number Subtract number from 111 to avoid borrowing then a Numbers 99 Addition works perfectly 0 We retain our handydandy 3bit binary odometer D11 1 3 2 l l 2 4 3 And there is no need for a separate circuit to do subtraction But overflow remains a clear and present danger Numbers 910 Defecfing overflow 0 Some language C for example ignore infeger overflow thers require fhaf The program be nofified o MIPS defecfs overflow wi rh an excep rion o The excepfion program counfer EPC confains The address of The offending insfrucfion Numbers 911 00001000 Mulfiplicafion hardware 00010000 0010 0000 o The multiplication 010 00quotquot algorifhm we learned in Mul plicand grammar school leads To skiff le 1001 a simple if somewhaf 64 ms 838 inefficienf design 0001 1000 Mulfiplicand M H I x 1001 Mul riplier 64 AL 39p le39quot Shlff right 1 0 0 0 0000 v 0 0 0 0 Producf l O O O Wrife Co nfrol 1001000 Producf I 64 bifs 0000 0000 00001000 00001000 0000 1000 Numbers 912 0100 1000 RepresenTing reals 0 Numbers wiTh fracTions 69 314159265 7c 0 Very small numbers 69 000000001 10 x 10399 0 Very large numbers 69 3155760000 315576 x 109 Numbers 917 Binary fracTions 0 W6 can also show binary numbers in normalized scienTific noTaTion using The noTaTion 1xxxx2 x ZYYYY 0 We now call The quot symbol The binary poinT We also call This formaT floaTin9 poinTquot because The binary poinT is noT fixed Numbers 918 Life is noT also so easy a WhaT if The fracTional parT is noT an exacT power of Two We will need To approximaTe 274 00012 2062510 25 000012 0312510 26 0000012 01562510 27 00000012 007812510 2E 000000012 003906310 279 0000000012 001953110 2710 00000000012 000976610 eTc 0 Then sum To The degree of precision necessary For example whaT does 310 Numbers 919 FIoaTingpoinT numbers 1S399n X fr39OC OI39I X Zexponem o A fixed word size means ThaT There is a Tradeoff beTween accuracy and range a more biTs for fracTion gives more precision of fracTion a more biTs for exponenT increases range of numbers ThaT can be represenTed Numbers 9 20 IEEE 754 floatingpoint standard a Single precision 313029282726252A2322212019181716151A13121110987 5 5 A 3 2 1 o l 5 exponent fraction 1 bit 8 bits 23 bits 0 Double precIsIon 313029282726252423222120191817161514131211109876543210 l 5 exponent fraction 1 bit 11 bits 20 bits 313029282726252A2322212019181716151A13121110987 5 5 A 3 2 1 o l fraction continued 32 bits Numbers 921 Water over and under the dam o Overflow is now the case w ere the ositive exponent is too large to be represented in the 0 But now there is also underflow ie the case where the negative exponent is too lar e to fit in the exponent field In this case t e fractional part has ecome so small that it cannot be represented Numbers 922 Some unpleasanT deTails a o o Placing The exponenT before The fracTion simplifies sorTing of floaTing poinT numbers using inTeger comparison insTrucTions BuT whaT abouT negaTive exponenTs They look big because The leading ding is all ExponenT is biased To make sorTing easier The bias is subTracTed from The normal unsigned value To deTermine The real value Bias 127 for single precision and 1023 for double precision 1S 9quot x 1 fracTion x ZBXPW39quot b as Nimbers 923 My kingdom for an example ConverT 7510 To The IEEE FloaTingPoinT STandard using single precision 1 Now do The same for doublepoinT precision Nimbers 924 Forms of Address MIPS addressing modes 65240 Computer Organization Department of Computer Science Wellesley College i gt 0 Originally computers crunched numbers There is no reason for any individual to have a computer in his homequot Ken OISen DEC Founder Addressing modes 52 American S randard Code for Informa rion In rerchange ASCII Char ASCII Char ASCII Char ASCII Char ASCII Char ASCII Char O Null 32 space 48 O 64 96 112 p 1 33 I 49 1 65 A 97 a 113 q 2 34 50 2 66 B 98 b 114 r 3 35 51 3 67 C 99 c 115 S 4 EOT 36 52 4 68 D 100 d 116 139 5 37 7 53 5 69 E 101 e 117 u 6 ACK 38 amp 54 6 70 F 102 f 118 V 7 39 55 7 71 G 103 g 119 W 8 kap 40 56 8 72 H 104 h 120 X 9 Tab 41 57 9 73 I 105 i 121 y 10 LF 42 58 74 J 106 122 z 11 43 59 75 K 107 k 123 12 FF 44 60 lt 76 L 108 I 124 I 15 47 63 79 O 111 o 127 DEL Addressing modes 53 Charac rers and sTrings o MIPS provides ins rruc rions To load and sTore byTes To andfrominemory lb t0 0Sp sb t0 0gp 0 There are several ways of combining characTers inTo s rrings 1 s rring leng rh in firs r posi rion 2 s rring leng rh in an accompanying variable 3 end of s rring charac rer in las r posi rion Addressing modes 54 Compiling a C sTr39ing copy void strcpy char x char y t3 int i 50 i 0 While xi Yi 1 39 513 i 1 a0quot a1quot Addressing modes 65 STar39T by saving sO 5 void strcpy char x char y t3 int i 50 i 0 while xi yi o39 i 1 513quot strcpy addi sp sp 4 sw 50 0sp a0quot a1 Addressing modes 56 IniTialize i To 0 122 void 5trcpy char x char y t3 int i 50 i 0 while xi yi o39 i 1 513 strcpy addi 5p 5p 4 5w 50 05p sac add 50 zero zero 5amp1 Addressing modes 57 Begin The loop m stz CECE void 5trcpy char x char y t3 j lt sso Illlllll i 0 while xi yi 39 o39 i 1 5pquot strcpy addi 5p 5p 4 5w 50 05p sac add 50 zero zero Ll add tl 50 al a1quot 58 Addressing modes Load yi in t2 121 t2 void 5trcpy char x char y t3 int i 50 i 0 while xi yi o39 i 1 513 strcpy addi 5p 5p 4 5w 50 05p sac add 50 zero zero Ll add tl 50 al lb t2 0tl 5amp1 Addressing modes 59 STor39e byTe in xi 5 t2 void 5trcpy char x char y t3 int i 50 i 0 while xi yi 1 o39 i 1 5pquot strcpy addi 5p 5p 4 5w 50 05p sac add 50 zero zero Ll add tl 50 al lb t2 0tl add t3 50 a0 a1quot 5b t2 0t3 510 Addressing modes Are we done void 5trcpy char x char y t3 int i 50 i 0 while xi yi o39 i 1 513 strcpy addi 5p 5p 4 5w 50 05p sac add 50 zero zero Ll add tl 50 al lb t2 0tl add t3 50 a0 5611 5b t2 0t3 beq t2 zero L2 Addressing modes 511 No Then i amp loop 5 t2 strcpy t3 addi 5p 5p 4 50 5w 50 05p add 50 zero zero Ll add tl 50 al 559quot lb t2 0tl add t3 50 a0 5b t2 0t3 a0quot beq t2 2ero L2 addi 50 50 1 a1quot Addressing modes 512 Yes Then resTor39e 30 u strcpy 123 addi 5p 5p 4 5w 50 05p add 50 zero zero L1 add t1 50 a1 lb t2 o tl add t3 50 a0 5b t2 o t3 beq t2 zero L2 addi 50 50 1 j L1 L2 1w 50 05p addi 5p 5p 4 jr ra Addressing modes 513 add t0 51 52 insiruciion operation second register sniff amouni source operand coming soon 3 i 000000 10001 10010 01000 00000 100000 6 biTs 5 biTs stiregkier source operand 5 biTs 5 biTs 5 biTs 6 biTs register desiinaiion funciion code rand Addressing modes 514 addi t0 51 1024 op rs rt constant or address 6 biTs 5 biTs 5 biTs 16 biTs sTill 32 biTs Addressing modes 515 Wha r abouT larger consfam s lui t0 255 t0 is register 8 001111 00000 01000 0000 0000 1111 1111 Transfers 16bif immediufe consfunf reglster 8 l 0000 0000 1111 1111 l 0000 0000 0000 0000 fills lower 16 bifs will zeros Bu r whaT abou r The lower 16 bi rs Addressing modes 516 OR immediaTe To The rescue lui t0 255 t0 is register 8 001111 00000 01000 0000 0000 1111 1111 Transfers 16biT immediaTe consTanT reglster 8 0000 0000 1111 1111 0000 0000 0000 0000 OR of These Two values goes ori Sto Sto 5 inTo lower 16biTs of T0 I I 001101 01000 01000 0000 0000 0000 0101 Addressing modes 517 HardwareSofTware inTerface o The MIPS assembler Takes care of breaking large consTanTs inTo pieces and Then reassembling Them i nTo a regisTer o A special regisTer aT is reserved for This Task Addressing modes 518 JType insTr39uc rion forma r j 1025 goto location 1025 000010 00 0000 0000 0000 0010 0000 0001 2 6 biTs 26 biTs jump address Memory jump des rina rion insTrucTion Pro ram Coun rer PC 26bit desiinufion is concufenafed with upper 4 bit of PC Addressing modes 519 CondiTional branches are more limiTed bne s0 sl address 000101 10000 10001 address 6 biTs 5 biTs 5 biTs 16 biTs still 32 biTs Addressing modes 520 ElemenTary my dear WaTson o CondiTionaI branches are 0 found in quotloopsquot and in if sTaTernenTs so They Tend To ranch To a nearby locaTion Since The PC conTains The address of The currenT ocaTion we can branch To wiThin 215 words of The currenT locaTion using PC reaTive addressing Addressing mudes 5721 PCr elaTive addressing bne tO 55 Exit Program CounTer Memory branch desTinaTion insTrucTion Addressing mudes 5722 Compare wiTh basa addressing bne t0 55Exit m Memory branch des rina rion insTrLIcTion Program Coun rer lw t0 12tl m Memory word or by re operand Addressing modes 523 Branching offSeTs in machine language while savei k i 1 address op r5 rt rd shamt funct Loop 511 t1 53 2 80000 0 0 19 9 2 0 add t1 t1 56 80004 0 9 22 9 0 32 1w t0 0t1 80008 35 9 8 0 bne t0 55 Exit 80012 5 8 21 2 addi 53 53 1 80016 8 19 19 1 j Loop 80020 2 20000 Exit address or offset Addressing modes 524 This week39s puzzler 0 Suppose we are given a brunch on register 50 being eqqu to register s1 7a m E n much greater branching distance Addressing mudes 525 The five MIPS addressing modes 0 It is sometimes necessary when reading a core dump for example to reverse engineer machine language into assembly 00af8020 Addressing mudes 526 Decisions Shif ring and branching 65240 Computer Organization n Depar rmen r of Compu rer Science p Wellesley College Programmer39s cheer39 Shiff To The lef r Shiff To The righ r Pop up Push down By re by re byfe Decisions 32 Logical oper a ons or immediufe 5 srl add add 515253 5152 53 Arithmefic subfr acf sub 515253 5152 53 add immediufe addi 5152100 5152 100 and and 515253 5152 amp 53 or or 515253 5152 I 53 nor nor 515253 5152 I 53 Logical and immediufe andi 5152100 5152 amp 100 ori 5152100 5 515210 srl 515210 5152 I 100 5152 10 5152 gtgt 10 Dafu Transfer loud word sfore word Iw 5110052 sw5110052 51Mem52100 Mem5210051 Decisions 33 sll 2 30 4 op rs rt rd shamt funct I 000000 I 00000 I 10000 I 01010 I 00100 I 00000 I 6 biTs 5 biTs 5 biTs 5 biTs 5 biTs 6 biTs Rfype for regisfer or Rformaf Decisions 34 andi 12 50 155 0 AND applied can be used To mask or39 conceal cer39Tain biT paT rer ns by forcing 0s 0 For example suppose s0 held The biT pa r rer39n 72 A8 88 AD 0 Similarly The OR oper39a rion may be used for39ce 1s And there was mounting in hot haste the steed The mustering squadron and the cluttering car And swiftly forming in the ranks of war And deep the thunder peal on peal afar And near the beat of the alarming drum Roused up the soldier ere the morning star While thronged the citizens with terror dumb 0r whispering with white lips the foe they come they come war Decisions 35 andi T2 50 15 op rs rt constant 001010 10000 01010 0000 0000 0000 1111 6 bi rs 5 bi rs 5 bi rs 16 bi rs I Type for immedia re or39 I for39ma r Decisions 36 Alas no NOT 0 In keeping wi rh The Twooperand forma r The designers of MIPS decided agains r a NOT opera rion o A NOR opera rion is offered in consola rion nor t0 tl t2 t0 tl t3 How does This help Decisions 37 Condifional branches 0 MIPS includes rwo decisionmaking ins rruc rions similar To The infamous if wi rh go roquot 0 Branch if equal beq registerl registerZ label 0 Branch if no r equal beq registerl registerZ label Decisions 38 Compiling ifTheneISe clau3es o Lef39s Try an easy one if i j fgh else f g h 0 We assume variable f roj correspond To regis rers Exit sO ro s4 Decisions 39 Compiling while loops Loop39 calc addr of savei ld to tl 0 One only sligh rly harder while savei k i 1 0 We assume Thai i and k tl k correspond To regis rers s3 and r5 and The base address of save is in s6 Decisions 310 Checking for39 inequalify How abou r while i lt k i1 0 We assume Thaf i and k correspond To regis rer s s3 and r 5 0 Yes Virgina There is a slfi Loop slt t0ik Decisions 311 Tinker An 05 example 65240 CompuTer OrganizaTion DeparTmenT of CompuTer Science Wellesley College Long ago in a place noT far away 0 Before The compuTer science deparTmenT before This room was builT before you were born 0 There was ExTradeparTmenTal 230 OS Tink 232 A Tinker roy machine 233 Tinker ins rruc rion se r 234 Tinker in acTion 235 Mul riprocessing Tinker 0 As if s rands Tinker39 is a single user39 sys rem 0 Our39 goal in This lec rur39e is To Turn Tinker39 info a mul ripr39ocessing sys rem capable of running up To Threeindependenf pr39ocesses a 34 39 v 39n 39 4 pg 3 Va 4 39 39 39 u 0 V V J u In k Aquot y ampquot quotI o u a 39Ixquot7 quot r aquot illLL quotaft2 Which we label with process IDs A B and C 05 Tink 236 Firs r we add a backing s rore 237 Tinker39s memory 238 Vi r rual addressing Imagine process A was execu ring ins rruc rion STORE 52 Wha r physical address does 52 reference 239 01 course How migh r The operating sys rem Tell which is which 2310 Kernel versus User mode 0 When The operaTing sysTem is running we Turn off virTual addressing and all addresses are physical addresses 0 Thus The operaTing sysTem runs in one of Two modes kernel and user 0 Kernel mode geTs a iTTe extra help from The IN A NUTSHELL 1 1 1pm r r min 3 3984 aha1 O REILLY39 This is someThing of an oversimplificaTion buT in any case some essenTial OS kernel musT always reside in memory 05 Tink 2311 Hidden hardware 0 Scheduling is performed by means of a run queue which holds The ID number of each process ready To execuTe 0 Special exTended insTrucTions ENQUEUE and DEQUEUE manage The FIFO run queue And where pray Tell do The exTra opcodes come from OS Tink 2312 Run run queue run RODIUIOD IIIO PICa lllbl I LIIll value of The AC sTored in ResTores preinTerrupT page Tame word 8 and and kernelUser mOde wriTes preinTerrupT PC Si mode STor39eCl In 00 sTored in page Table word 9 nTo locaTion OO RTI sTands for ReTurn from InTerrupT OS Tink 2313 On line sTorage o Tinker ordinarily execuTes insTrucTions sequenTially However in a mulTiprocessing environmenT The OS someTimes needs To sTep in 0 These siTuaTions are handled by inTerrust or Traps consisTing of Transfers To specific addresses in The operaTing sysTem kernel 0 To geT back from an inTerrupT The sysTem musT have boTh The old PC and The previous mode In Tinker address 00 holds boTh values Why 2314 Good nighT moon 0 When inTerrust occur from user mode The currenT process is puT To sleep and The ocTive conTexT is swiTches To onoTher process STop Talking while I39m inTerrupTing o A quonTum inTerrupT is used To ensure ThoT no process dominoTes The machine 0 When a clock inTerrupT occurs The hardware suspends The currenT process by sToring iTs PC and mode in address OO and Traps To CLOCK 2316 InpuT waiT o In a single user sysTem iT is 0 reasonable To The INPUT insTrucTion To waiT unTil The user enTers a value before conTinuing This is noT such a good idea in a mulTiprocessing environmenT In user mode The execuTion of The INPUT insTrucTion is handled by signaling The appropriaTe IO device and Trapping To IWAIT in The OS WhaT does Tinker do aT address IWAIT OS Tink 23 17 You have mail 0 0 When inpuT arrives The hardware puTs The ID of The process requesTing The IO inTo memory locaTion IPID The acTual inpuT in memory locaTion IVALU and Traps To IREADY in The 05 Your job is To wriTe The code for IREADY HinT This code uses Two special insTrucTions GE39IINS and GETPAGE described on The IasT page of These slides and in your assignmenT OS Tink 23 18 OS Tink JUIIIIIIUI 7 UI OAIOIIUOU IIIJII UbIIUIIJ DS39 nk DS39 nk 2319 2320 10 Maintaining control A simple implementation 65240 Computer Organization Department of Computer Science Wellesley College Our data path Add 4 Add T PCSr39c RegDst RegWr ite ALUSr39c MemWr39ite MemtoReg l ALU control 1 zero Read Addr 1 Insg ggron Read gt Address y Read Addr 2 D0 1 l PC Read Dam R d D t I f v ALU M ea a a Address ns me Wr39ite Addr Read emery Data 2 Write Data Wr ite Data r MemRead Sign 16 Extend 32 Control 172 OrchesTraTing a compuTaTion o SiTTing aT a console hooked To each of The nine conTrol lines we could asserT conTrol lines necessary To execuTe each machine insTr39ucTion o The quesTion is How do we do design uniTs ThaT do This auTomaTically 0 We sTarT wiTh The ALU conTr39oller ConTrol 173 Recall sumrau set on less man NOR ALU apemnon Over ow Carryout ConTrol 174 ALUOp Main Instr3126gt Con l39r ol Setting ALU control bits Other control lines Instr2521 ovf gt Instruction Read Addr 1 R Memory Instr2016 D60 1 Read Addr39 2 am y gt Read Instr3931O Instr P Address 39 ALU 15 11 gt gtWr39Ite Addr39 Read 0 Data 2 A gtWr39lte Data 1 gt 4 2 5399quot ALU Instr 16 Ex39rend 32 control 15 0 Instr50 A Control 175 Instruction Instruction Desired ALU control opcode operation ALU action input LW 7 Mlir 00171giad rdi 7 xxxxxx 3 7 7A 7 0010 7 00 store word Vii X0XXX 000 0010 Branch equal 0 brancihiequal XXXXXX subtract 01107 7 Btype 10 add 100000 la dd r 0010 7 R type 10 subtract 100010 subtract 01107 i R type r 30 100100 and if 0000 7 R31pr 77 10 7 7 77 1001071 or 7 0001 Rtype i077 5amp11 less than 101010 set on less than 0111 7 Control 176 Fr39om Tr39uTh Table To ALU conTr39ol mm mm ConTrol 177 ImplemenTing The main conTr39ol uniT o The main conTrol uniT Takes The 6 biT opcode field as ianITs and produces as oquLITs The values for The 9 conTrol lines o In The following slides we sTLIdy how These conTrol lines make The compuTer Tick ConTrol 178 The da ra pa rh wi rh confrol signals Inslr31726 kegWriie Instruction Data Write Data We sfar39f wiTh RinsTr39ucTions Tells LIs whaf ALU To perform Second operand Firs operand Result goes here 313029252726252423222120191517161514131211109 a 7 6 5 4 3 2 1 l l rs i It i rd l l 6 5 f 5 f 5 l 5 f 6 To read lYIPLIl S To wrife of register file address for register file Rfype for regisfer or Rformaf include add sub and or and 511 Confrol 1710 Rinstructions data and control signals Inslr31726 kegWrite Instruction Data Write Data Iinstr39uctions come next Tells us what ALU to perform Value read or written for lw Base register or SW offsef or address 313029252726252423222120191517161514131211109 a 7 6 5 4 3 2 1 l Opcode l rs l rt l Constant l 5 f 5 f 16 l To read inputs Si ned of register file extended multiplied by 4 and added to PC Itype include 1w sw bag and bnq Recall 1w t0 1200t1 Control 1712 I instr uctions data and control signals Insir31726 kegWr ite Instruction Dutu Write Dutu lw t0 1200tl Insir31726 kegWr ite Instruction Dutu Write Dutu Branch instructions are also Ifor39mat Tells us what ALU to perform Vaiue read or written for Iw Base register or SW offsef or address 313029252726252423222120191517161514131211109 a 7 6 5 4 3 2 1 Opcode rs rt Constant i 5 5 16 To read inputs Signed of register file extended multipled by 4 and added to PC Itype include 1w sw bag and bnq Control 1715 beq 51 52 100 Insir31726 kegWrite Instruction Data Write Data beq 51 52 100 Inwmzo kegWr ite Instruction Data Write Dutu Confrol lines in Sum Memto Reg Mem Mem Confrol 1718 In all the excitement 0 we left outan important instruction The unconditional jump j location resets the PC to a new value calculated as an offset of the old PC It39s implementation requires a small addition to our implementation 0 o Control 1719 j 1000 000010 00 0000 0000 0010 0101 0000 0000 6bits 26 bits still 32 bits Replace the lower 28 bits of the PC with the lower 26 bits of the instruction left shifted by 2 multiplied by 4 Control 1720 j 1000 Repluce The lower 28 insTrucTion offseT by InsTrucTion250 Instruction Mmuy Read Address 16 InsTrucTion150 biTs of The PC wiTh The lower 26 biTs of The 2 ConTrol 1721 adding jump Inwmzo InsTrucTion kegWriTe DuTu WriTe DuTu Single cycle pros and cons Con Uses The clock cycle inefficienle since The clock musT be Timed To accommodaTe The slowesT insTrucTion uynlz 1 Cycle 2 fl li l lw I sw Waste l Con May be wasTeful of area since some funcTional uniTs eg adders musT be duplicaTed since They can noT be shared during a clock cycle Big Pro Is simple and easy To undersTand ConTrol 17 23 VirTual memory A new level of cache 65240 Computer Organization DeparTmenT of Compu rer Science Wellesley College Crowded memory 0 To ral To ral memory reiuired reguired by a col ec rion 0 programs running of once on a compu rer may be larger rhan main memory 0 However only a frac rion of This memory is ac rively used of any given momen r o Vir rual memory allows programs To share memory by using main memory as a cache for secondary s rorage VirTual memory 222 User39 pr39oTecTion 0 0 0 Each program is compiled inTo iTs own privaTe address space a separaTe range of memory accessible To on y This program VirTual memory TranslaTes The program s address space inTo physical addresses IT also enforces proTecTion of a program s address space from oTher LA programs VirTLIal memory 223 0 0 0 room aT The Inn In The aTe 1950s The leading scienTific compuTer The IBM 650 had only 2000 words of memory Memory was precious Programmers spenT spenT a loT of Time Trying To squeeze programs inTo Tiny memory ForTunaTely The enTire program need noT be in memory ThroughouT iTs execuTion VirTLIal memory 224 Overlays o A Technique called overlays was used To allow a program To be larger Than The amounT of memory allocaTed To iT Symbol Table 14K Common roLITines Overlay driver 2K 8K Pass 1 Pass 2 10K VirTual memory 225 Problems wiTh overlays o The programmer was responsible for breaking her program inTo pieces deciding where in secondary memory each piece goes Address space 12287 I 4 4K main memory 8191 7 4095 4096 a F O O I CLIrrenT mapping and arranging for The TransporT beTween main memory and secondary sTorage VirTual memory 226 Vir39Tual memory a In 1961 a group a r Manches rer England pro osed a me rhod for performing The overlay process au roma rica y Address space 4K main memory Currenf mapping a A jump from loca rion 5012 To loca rion 9140 causes an au roma ric page faul r The operaTing sys rem is called wi rhou r The programmer begin aware Virfual memory 227 Vir Tual memory in acTion 1 Confenfs of main memory would be save on disk 2 Words 819212287 would be Address space loaded info main memory 3 Address map would be updated as shown 4K main memory 7 4095 O CLirrenf mapping 4 Program execufion resumes as Though nothing happened Virfual memory 228 Vir39ruol memory 0 Vir rual address space is 0 0 0 broken info a number of equal sized pages Memory is broken info The same size chunks known as page frames Some pages in vir139ual memory are in physical memory some are on disk A vir139ual memory miss is called a page faul r hVSical addiesses Virfual memory 229 Mapping from virTual To physical address 0 The address is broken info a virfual page number and a page offse r o The page size in This 0 example is 212 4 KB The size of physical pages is 1 6B while vir139ual memory is 4 6B Why Vinuai address muzazazv swammuaa 32m Ymn 3 2925 27 151313121Mu95 32m Page ausei Physical address Virfual memory 2210 Page Tables 0 Each program has iTs own page Table ThaT resides in memory a The page Table is indexed wiTh The virTual page number and poinTs To The acTual page in physical memory eiTher main memory or backing sTore VirTuaI memory 2 211 Page Table regisTer poinTs To page Table ansmi address VirTuaI memory 2212 Page faul rs page is on backing store Operating System trap Reference 4 A i restart instruction Reset free Page frame table page physical memory VirTuaI memory 2213 Page replacemem algoriThmsquot l ramc validinvalid bil D c l VlCIlm 0 Chang to invalid 139 Q swap E reset page desired page table for page in table new page backing store physical memory When physical memory fills up and a page faul r occurs somebody39s go r To go Vi rTuaI memory 2214 FIFOquot 0 When a page musT be replaced The oldesT page in memory sleeps wiTh The fishes reference string 70120304230321201701 IE MEI ENE llll I III IIEEE II page frames This is The simplesT buT requires some form of Time sTamp Vi rTual memory 2215 Belady39s anomaly o FIFO s performance isn39T always opTimal 0 IT also suffers from an 14 odd liTTle anomaly Consider The reference sTring 1 2 3 4 1 2 5 1 2 3 4 5 Number of page faults Number of frames Vi rTual memory 2216 OpTimal replacemenT 0 Replace The page which will noT be used for The longesT period of Time reference string 7 701 0120304230321201 NEE E E E E III II page frames UnforTunaTer The opTimaI page replace algoriThm is difficulT To implemenT since iT requires predicTing The fuTure VirTuaI memory 2217 LeasT recenle used 0 Replace The page ThaT has noT been used for The longesT period of Time reference string 70120304230321201701 III II II II IEEE E EM E E III page frames We Try To predicT The fuTure by observing The pasT VirTuaI memory 2218 Reference biTs o ImplemenTing an achIraTe U is Too expensive since iT requires LIpdaTing a daTa sTrLIcTLIre on every memory reference a MosT sysTems provide a use or reference biT which is seT whenever a page is accesses a The OS periodically clears The reference biTs so iT can deTermine which pages have been Touched during a Time period Virtual memory 2219 WhaT abouT wriTes o WriTes back To The disk Take millions of processor cycles so building a wriTe buffer is noT pracTical InsTead virTLIal memory sysTems musT use wriTe back copying The page back To disk when iT is replaced a Virtual memory 2220 The TranslafionIookaside buffer LB N ua page Physme page numbev VathmyRev Yag addvess Physxm memaw mm page gDvaev mmskaddvess Typical values 16512 entries missrafe 01 1 o misspenulfy 10 100 cycles Virfual memory 2221 Vuma addvess In rr i ns iTy FasTMA TH TL B Virfual memory 2222 eaa Virtua memury 22723 Raining on our parade 0 When a TLB miss occurs MIPS hardware saves the pag number in a special register and generates an exception 0 The OS handles the miss in software by indexing the page table through the page table register The system places the physical address from the page table into the TBL k2 about 13 clock cycles assum Ta 5 instruction and data cache respectiv ing the code and page table entry are in the eiy Virtua memury 2224 Thrashing o If The working se r is larger Than The number of available page frames no algori rhm shor r of OPT will give good resul rs l Virtual page 0 Virtual page 8 Virtual page 8 Virtual page 1 Virtual page 1 Virtual page 0 Virtual page 2 Virtual page 2 Virtual page 2 Virtual page 3 Virtual page 3 Virtual page 3 Virtual page 4 Virtual page 4 Virtual page 4 Virtual page 5 Virtual page 5 Virtual page 5 Virtual page 6 Virtual page 6 Virtual page 6 Virtual page 7 Virtual page 7 Virtual page 7 a b c Assume a LRU algorithm Vir rual memory 2225 13 Thanks for The memories Memory design C5240 Computer Organization Department of Compu fer Science Wellesley College Modular design Compu ler Hardware You are here 1 Majors Blocks l Medium Scale Digi39ral ParTs l Fundamen39ral Elemen39rs Complete Systems ALU Confroller IO Memory Regisfers Decoders Compara39rors Coun39rers Clocks Gates and FlipFlops Compu l39er memory 142 SRAMs and DRAMs o RegisTers and regisTer files provide The basic building blocks for small memories 0 Larger memories are builT using eiTher SRAM sTaTic random access memories E w or DRAM dynamic random access memories Com puTer memory 143 SRAMs o SRAMs are inTegraTed circuiTs ThaT are memory arrays wiTh a single access porT ThaT can Addreas X o prowde eITher a read or a Chipselect gt 16 0 Typical access Time varies owmmme Z qulle mum Wr39tee bl from 24 ns for The I quota Q fasTesT smaller CMOS sinus 013 parTs To 820 for The largesT parTs gt32 M Com puTer memory 144 Wr i ring To SRAMS o For wrifes bo ih dafa and The address as well as confrol signals musi be given c There are sei39upi39ime and holdTime requiremeni39s for The address and dafa Ines o The Wrii39e enable signal is nof a clock edge bui a pulse wifh a minimum widfh requirement Mavens A Chip salsa 4 sum Damn Hi Oummenabieg 2W 6 Wm nabi Dinii l E Cumpuler memury 145 Tris ra re buffers o If is nof praci39ical To build 64K x 1 39 same way we did regisi39er files a Rafher large memories are implemenfed wifh a shared oufpuf line call a bii line Cumpuler memury 14o Wired ORS Danger Will Robinson Compu rer39 memor39y 147 Noninver ring buffer Compu rer39 memor39y 148 Organiza rion of a 4M x 8 SRAM Compu rer39 memor39y 149 Compu rer39 memory 14 10 DRAMS o In SRAM The sTored value is kepT on a pair of inverTing gaTes IT doesn39T fade o In dynamic RAMThe value mummy i Is kepT In a cell as a charge in a capaciTor C n o A single TransisTor is used was my To access This sTored charge Cumpuier memury 1411 DRAM design Acumen M1 Eaiumn 11mg Cumpuier memury 1412 Taking advantage of locali ry o The principle of locality sfafes Thai programs access a relafively sma porfion of Their address space at any instant of Time a Localify in programs arises from nafLIraI program sfrucfures Compmzr memory nus Memory hierarchy Pamnnnl Sun In Compmzr memory 1444 It39s alive 0 The principle of locality makes having a memory hierarchy quotworkquot 0 More recently accessed data items are kept closer to the processor Temporal locality 0 Blocks of multiple 1 contiguous words in memory are moved to 7 39 upper levels spatial 3 9 9 3 locality q 1 s c l A A Computer memory 1415 Basic building blocks Combina rional circui rs 65240 Computer Organization DeparTmenT of CompuTer Science Wellesley College Modular design 0 In The old days compuTers were consTrucTed gaTe by gaTe Today The building blocks are somewhaT larger CompuTer Hardware l Majors Blocks l Medium Scale Digi39lal ParTs l FundamenTal Elemen ls CompleTe SysTems ALU Confroller IO Memory RegisTers Decoders ComparaTors CounTers Clocks Gates and FlipFlops Digi lal circui ls 112 Decoders o The mos r common decoder has on nbi r inpu r and 2quot oneasserted ou rpu rs and is implemen red using code de rec ror39s 33 Do BO gtO 3 D1 31 gto BZ 130 D1 gt 132 133 134 135 136 D7 V Digital cir39cui39rs 113 To do Digital cir39cui39rs 114 Encoders do jus r The opposi re 0 Useful for cap ruring a large number of one asser red signals DO succinc rly D1 0 For example Bo gt 1 n n an D6 gt Digi ral circui rs 115 Encoders can be hardwired 000 001 010 011 100 101 110 111 Digi ral circui rs 116 STeer ing cir cui rs Mul riplexer selecT Demul riplexer 0 D selecT Digifal ci rcuifs 117 Binar y con rr39ol o Multiplexers generally have 2quot data inputs one data output and n control lines 0 Here39s The world39s smallest Mux Digifal circuifs 118 Implementing Muxs with encoders Digital cir39cuits 119 Muxs as combinational circuits 0 A Mux and a voltage source are all we need to D o gt Implement our39 old fr39lend the votmg machme D1 D2 gt D3 gt 13911 B L M 39 D4 gt U U 1 III El 1 U D D5 I El 1 1 1 1 D n 13 D6 1 El 1 1 1 1 n 1 D7 1 1 1 1 gt W gt m gt Digital cir39cuits 1110 Shif rers D0 D1 DZ D3 D4 D5 De D7 CLDCI I I I I I I so S1 82 83 S4 S5 56 S7 Digital circui39rs 1111 Arrays of logic elemen rs 0 Many combina rional opera rions mus r be performed on en rire words 0 We build an array of logic elemen rs which is represen red simply by showing rha r a given op era rion will happen ro r e en rire collec rion o A bus is a collec rion of da ra lines rha r is rrea red as a single logical signal Select Select 9 A 25 M U 32 C B323 X A0 U CO BO Digi39ral circui39rs 1112 SUHof producs o A sum of producfs represeni39ai39ion is The logical sum OR of producfs AND quotThink of summing code detectors Digilal mm 11713 PLAs A programmable logic fwo s139ages of logic mm mm o The first sfai39e is an array of AND gafes which form The producf Terms somefimes called mini erms Product mm mm o The second s139age sums The minferms Digilal mm 11714 For example 0 Here is our old friend Recall D is logical OR E checks for zxaci39ly Two 1s and F is logical AND Digilal mm 11715 PLA implemen ra rion Digilal mm 11716 Some rimes drawn as mums mm pm 0mm Dwgwm mm 11717 When Though of as a ROM mums Iogz height AND puma Outputs width ROM is a fully decoded PLA for a given width and height Dwgwm mm 1143 Pipe Dreams A pipelined da rapa rh 65240 Computer Organization DeparTmenT of CompuTer Science Wellesley College Redesigning The da rapaTh for pipelining 1 IF InsTrucTion IF Inslruction fetch ID Instrudion deoode EX Execute MEM Memory access WB W39te back 61 C register le read address calculation ID InsTrucTion decode and regisTer file read EX Execu rion or address calculaTion 4 Mem DaTa memory access 5 W8 WriTe back lquot 9 Pipelining 192 Pr eTending each insTr39ucTion has iTs own daTapaTh 0 Under This assumpTion Three insTr ucTion would need Three daTapaThs much was 21cczicca m m m o InsTead we add regisTer s in beTween clock cycles so ThaT por Tions of The Wszzmtm daTapaTh can be shared w 51 mm E w x swan Pipelining 193 Pipelined version of our39 daTapaTh Pipelining 194 InsTr39ucTion fefch lw tl lOOsO Pipelinng 195 By comparison 0 In The au rhor39s mul ricycle implemen ra rion an ins rr ucfion regis rer is added To save memory ou rpufs o The IFID regis rer is connecfed in a similar fashion Pipelinng 196 Decode and r egiSTer file r39ead lw tl lOOsO Pipelinng 197 Execu ie or address calculaTion lw tl lOOsO Pipelinng 198 Memory access amp wr39iTe back sTages W Pipelinng 1910 sw tl lOOsO Pipelinng 1911 sw tl lOOsO Fg rA Pipelinng 1912 ImporTanT safeTy Tip 0 Each logical componenT of The daTapaTh insTr39ucTion memor39y r39egisTer39 r39ead por39Ts ALU daTa memory and r39egisTer39 wriTe por39T can be used only wiThin a single pipeline sTage o OTher39wise The specTer39 of sTr39ucTur39al hazard r39ear39s iTs ugly head Pipelining 1913 A bug in The works We r39eTur39n To The Wr39iTe back sTage of lw t1 100 50 Pipelining 1914 The correcfed pipelined dafapafh vlpelhmg ma Pipelined confrol 0 We start with a design that views the problem through rosecolored glasses Later sections in the fax remove these glasses to reveal hazards of the reel world vlpelhmg ma Pipelined daTapaTh wifh confrol signals Pipelining 1917 Exfending pipeline regisfer39s To include conTr39ol lines er DEX EXMEM MEMWB Pipelining 1918 Ready To ship Pipelining 1919 Well noT qui re 0 There39s s rill The le problem of confrol and dafa hazards o Lef39s looks 0139 a sequence lousy wifh dafa hazards sub 2 1 3 and 12 2 5 or 13 6 2 add 14 2 2 sw 15 1002 Pipelining 1920 Pipelining 1922 Ah but look closaly Pipelining 1921 Pipelined dependences MIPS insTrucTion formaT insfrucfion operation second register sniff amounf source operand coming soon V V V op rs rt rd shamt funct 6 biTs 5 biTs 5 biTs 5 biTs 5 biTs 6 biTs first register register destinufion funcfion code source operand o erund Rfype for regisfer or Rformaf Pipelining 1923 EXMEMRegisTer39r39d IDEXRegisTerRs 2 an 3552 manta quotHijm I DEX RegisferRs Pipelining 1924 A noTaTion for daTa dependences We classify rwo pairs of hazard condi rions 1a EXMEMRegis rer Rd IDEXRegis rer Rs 1b EXMEMRegis rer Rd IDEXRegis rer Rf 20 MEMWBRegisfer Rd IDEXRegis rer Rs 2b MEMWBRegisfer Rd IDEXRegis rer Rf Pipelining 1225 MEMWBRegis rer39rd IDEXRegisTer39RT 2 an 3552 manta quotMawma Pipelining 1926 We can now deTecT hazards how do we forward The proper daTa so can cc cm on on on ocl ecu wndmmi In In in in 1H ao 10 4n ao innmama x x x m x x x x x 01 x x The required daTe exisTs in Time for later insTrucTions in T e pipeline regisTer may as 100M SoluTion We Take The inpuTs To The ALU from any pipeline regisTer raTher Than 39 T The IDEX Pipelining 1927 Resolving EX Type 1 daTa hazards if EXMEM RegWriTe and EXMEM RegisTeer X 0 and EXMEM RegisTeer IDEX RegisTerRT Then ForwardB 10 IDIEX any MEINIB if EXMEM RegWriTe and EXMEM RegisTeer t O and EXMEM RegisTeer IDEX RegisTerRs Then ForwardA 10 Pipelining 1928 IT is noT always possible To for39war39d M doawl Col no on cc gas can ocv ecu DC Maxim mun mum mans msiias l if IDEX MemRead and IDEXRegisTerRT IFID RegisTerRs a r E RegisTerRT IFID RegisTerRT Then STall The Pipeline Pipelining 1929 STalling The pipeline llall ndm r em cc so so ecu ecu em ecu on new Pingn mm mm mamaim If we sTall The ID sTage insTrucTion Then we musT also sTall The IF sTage 39 cTion E i Mm E mun mum El Ema will was 7 The sTall is accomplished by prevenTing The PC and The IFID regisTers from changing Pipelining 1930 Of cour Se while The from half is sTalled The back half musT do noThing Mannheim r em cc so so ac ecu em ecu on em Pingn mm mm mi 39 El This is clone by inserTing l h naps inTo The pipeline W l l l lly i luau mints By idenTifying The hazard in The ID sTage we inserT naps by changing The EX MEM and WB conTrol fields of The IDEX pIpeline regisTer To Pipelining 1931 Pipelined conTr39ol overview so for Pipelining 1932 Branch hazards mmmmn cc cc cc on on own on on PlumII ew H W 3 gquot min quoti Equot We are predicfing branch no fakequot and need To flush The pipeline if we are wron Pipelining 1233 Flushing insfr39ucfions Pipelining 1934 A 2biT predicfion If The branch is Taken we have a penaITy of one cycle For our simple design This is reasonable WiTh deeper pipelines penaITy increases and sTa ric branch predic rion drasTicaIIy hur rs performance Solu rion dynamic branch predic rion um him Pipelining 1935 Assignment 6 Computer Science 240 Fall 2008 Due Thursday November 3 Reading Paterson and Hennessy B5 B8 61 Using the three variable multiplexer chip given on page 4 10 of the notes implement a function whose output is the parity of the inputs that is the output is 1 if and only if an odd number of inputs are 1 62 Put on your thinking cap The three variable multiplexer chip in 4 10 is actually capable of computing an arbitrary function of four Boolean variables Describe how and an example draw the logic diagram for the function that is 0 if the English word for the truth table row has an even number of letters 1 if it has an odd number of letter Hint If we call the fourth input variable D the eight input lines may be wired to V ground D or D 63 In lecture we designed an n bit adder The final carry out sets a carry flag in the processor status word It could also be used to set the overflow flag but only if n bit binary representation is used Why Design using a minimum of additional logic a circuit that tests for an overflow condition when 2 s complement representation is used That is the output of your circuit should be 1 if and only if an overflow condition has occurred Explain how your circuit works 64 A common MSI chip is a 4 bit adder Four of these chips can be hooked up to form a 16 bit adder How may pins would you expect the 4 bit adder chip to have Why 65 Implement a switching network that has two data inputs A and B two data outputs C and D and a control input S If S equals 1 the network is in pass through mode and C should equal A and D should equal B If S equals 0 the network is in crossing mode and C should equal B and D should equal A 66 Because of the potential for data corruption in large memories most computer systems use some sort of error checking code to detect possible corruption of data One simple code that is heavily used is a parity code In a parity code the number of 1s in a word is counted the word has odd parity if the number of 1s is odd and even otherwise When a word is written into memory the parity bit is also written 1 for odd 0 for even Then when the word is read out the parity bit is read and checked If the parity of the memory word and the stored parity bit do not match an error has occurred A 1 bit parity scheme can detect at most 1 bit of error in a data item if there are 2 bits of error then a 1 bit parity scheme will not detect any errors since the parity will match the data with two errors Actually a 1 bit parity scheme can detect any odd number of errors however the probability of having three errors is much lower than the Assignment 6 Page 2 Computer Science 240 probability of having two so in practice a 1bit parity code is limited to detecting a single bit of error Of course a parity code cannot tell which bit in a data item is in error 1 Construct the truth table for a fourinput oddparity function 9 Implement the fourinput oddparity function with AND and OR gates using bubbled inputs and outputs 6 Implement the fourinput oddparity function with a PLA 67 What would the behavior of the 32bit ALU be for control inputs Ainvert1 Bnegatel Operation11 For control inputs Ainvert1 Bnegate0 Operation 2 11 Be thorough 68 The ALU supported set on less than s It using just the sign bit of the adder Let s try a set on less than operation using the values 7ten and 6m To make it simpler to follow the example let s limit the binary representations to 4 bits 1001 and 0110 two two 1001 0110 1001 two 1010 0011 two two two two This result would suggest that 7 gt 6 which is clearly wrong Hence we must factor in over ow in the decision Modify the 1bit ALU in Figure B5 10 on page B 33 to handle 5 1t correctly Make your changes on a photocopy of this figure to save time 69 A 16bit ALU is built up of 16 1bit ALUs each one having an add time of 10 ns If there is an additional 1 ns delay for propagation from one ALU to the next how long does it take for the result of a 16bit add to appear 610 Redraw the waveforms given below and underneath them plot the waveform of Q for the clocked D LATCH introduced in lecture 611 What is the quiescent state of the S and R inputs to an SR latch built using two NAND gates Solution The NAND latch is wired the same way as the NOR latch Normally both inputs should be 1 to achieve consistency between the input and output Assignment 6 Page 3 Computer Science 240 612 Assuming infinite speed along the wires and a 10 ns delay per NOR gate what is the delay for output to settle to its final value for each of the 4 different input combinations to an SR latch Why fully explain your answers Input Delay ns Hot OW HHOOUJ