Computer Architecture I
Computer Architecture I CSE 30321
Popular in Course
Popular in Computer Science and Engineering
This 0 page Class Notes was uploaded by Mrs. Damaris Hyatt on Sunday November 1, 2015. The Class Notes belongs to CSE 30321 at University of Notre Dame taught by Staff in Fall. Since its upload, it has received 24 views. For similar materials see /class/232750/cse-30321-university-of-notre-dame in Computer Science and Engineering at University of Notre Dame.
Reviews for Computer Architecture I
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 11/01/15
For the sequence of instructions shown below show how they would progress through the pipeline EL Assume that forwarding HAS been implemented We will predict that any branch instruction is NOT TAKEN Branches or Jumps are resolved after the EX stage Assume that register 8 Min etiqu a 1 for the 1S Beq instruction Assume that register 17 Mega26 for the 2quotd Beq instruction 1 2 8 4 5 6 7 8 9 10 1 2 3 8 9 10 instruction as progressing through pipeline 7 0 20 11 7 12 1 1 024 17 17 1 17 26 Y 5 6 7 8 5 5 17 17 1 17 010 1 2 3 8 9 10 i Assume that this sequence of code is executed 100 times How many cycles does the pipelined implementation take ii How many cycles would this code take in a multicycle implementation From Part 1 you can see that it takes 17 clock cycles to execute 12 instructions However we can start the next iteration in clock cycle 14 Therefore it really onlytakes 13 cycles for each iteration and 17 00s for the last one Therefore iterations 1 through 99 take 13 00s each 0 18 x 99 1287 00s Iteration 100 takes 17 00s Therefore 1287 00s 17 00s 1304 00s For the multicycle implementation we have 0 9 instructions that take 4 00s 0 2 instructions that take 3 00s 0 1 instruction that takes 5 00s Therefore each iteration takes 9x4 2x3 1x5 36 6 5 47 00s If there are 100 iterations then 4700 00s are required Pipelining gives us a speed up of 4700 1304 36 for this implemention Little to no extra HW is needed CSE 30321 MIPS Single Cycle Dataflow The goals of this lecture are to show how IAs map to real HW and affect the organization of processing logic and to set up a discussion of pipelining other principles of modern processing The organization of a computer Von Neumann Model Stored program machine instructions are represented as numbers Programs can be stored in memory to be readwritten just like numbers Today we39ll talk Functions of Each Component Datapath performs data manipulation operations arithmetic logic unit ALU floating point unit FPU Control directs operation of other components finite state machines micro programming Memory stores instructions and data random access vs sequential access volatile vs non volatile RAMs SRAM DRAM ROMs PROM EEPROM disk tradeoff between speed and costbit InputOutput and IO devices interface to the environment mouse keyboard display device drivers The Performance Perspective Performance of a machine determined by Instruction count clock cycles per instruction clock cycle time Processor design datapath and control determines Clock cycles per instruction Clock cycle time We will discuss a simplified MIPS implementation MIPS Instruction Formats All MIPS instructions are 32 bits 4 bytes long Rt e yp 31 2625 2120 1615 1110 6 5 o op6 rs 5 115 rd 5 shamt 5 funct 6 IType 31 2625 2120 1615 o OpS rs5 rt5 AddressImmediatevalue16 Jtype 31 26 25 o Op6 Target address 26 Review of Design Steps Instruction Set Architecture gt RTL representation RTL representation gt Datapath components Datapath interconnects Datapath components gt Control signals Control signals gt Control logic Writing RTL How many states cycles should an instruction take CPI Datapath component sharing 131i 5 Let39s talk abou a enerally on the board first Let39s just look at our instruction formats and quotderivequot a simple datapath We need to make all of these instruction formats workquot The MIPS Subset To simplify things a bit we39ll just look at a few instructions memory reference Iw sw arithmetic logical add sub and or slt branching beq j Organizational overview fetch an instruction based on the content of PC decode the instruction fetch operands 39 read one or two registers At simplest level this is how Von Neumann execme RISC model works 39 effective address calculationarithmetic logica operationscomparison store result 39 write to memory write to register update PC Implementation Overview simplest view of Abstract Simplified View Von Neumann RISC uP Address A 3 Instruction l Instruction R 39 Clk Memory Rw 2all 9 l Clk Clk 2 types of sugnals data and control Clocking strategy All storage elements clocked by same clock edge What we39ll do look at instruction encodings look at datapath development discuss how we generate the control signals to make the datapath elements work Review of Design Steps Instruction set Architecture gt RTL representation ie P 6 C 4 RTL representation gt or 4 6 3 2 Datapath components Datapath interconnectsN quot33d lhese l d Datapath components gt Control signals Control signals gt Control logic quot33 lhese 0 d need these to do J Writing RTL How many states cycles should an instruction take CPI Datapath component sharing What to be Done for Each Instruction Decode Fetch operands wnie back How many cycles should The above Take You are The archiTecT so you decide Less cycles gt more To be done in one cycle Instruction Fetch Unit FeTch The insTrucTion memPC UpdaTe The program counTer sequenTial code PC lt PC4 branch and jump PC lt someThing elsequot Single Cycle Implementation Each insTrucTion Takes one cycle To compleTe We waiT for everyThing To seTTle down and The righT Thing To be done ALU mighT noT produce righT answerquot righT away why we use wriTe signals along wiTh clock To deTermine when To wriTe Cycle Time deTermined by lengTh of The longesT a w Let39s say we want to fetch an Rtype instruction arithmetic InstrucTion format 31 2625 2120 1615 1110 65 o l op6 l rs 5 l 115 l rd 5 l shamt5 funcHG 39 RTL InsTrucTion feTch memPC ALU operaTion regrd lt regrs op regrT Go To nexT insTrucTion Pc lt PC 4 Ra Rb and Rw are from insTrucTion39s rs rT rd fields AcTual ALU operation and regisTer wriTe should occur afTer decoding The insTrucTion Datapath for RType Instructions Reg er ALUctr IType ArithmeticLogic Instructions Instruction format rs Ra BusA m 5 I Rb 32 rd 715 Rw 32 h 31 26 25 21 20 Registers 2 Clkogt 2 BusB 1615 0 Op6 rs5 rt5 Addresslmmediatevalue16 Register timing Register can always be re RTL for arithmetic operations eg ADDI Instruction fetch memPC d Add operation regrt lt regrs SignExtimm16 Register write only happens when RegWr is set to high and at the falling 6 To quotex ins h uctiom PC lt PC 4 edge of the clock Also immediate instructions Datapath Type AL IType LoadStore Instructions Instructions note that we reuse ALU Instruction format RegWri ALUctr 5 Pa 31 26 25 21 20 1715 3 quots 7 39 Op6 rs 5 m5 AddressImmediate value 16 pt 5 Rb 32 aw 32I bit BusB 32 Zka Reg39sm s RTL for loadstore operations eg LW Instruction fetch memPC RegDst Compute memory address Addr lt regrs rd rt SignExtimm16 Load data into register regrt lt memAddr 60 to next instruction Pc lt P In MIPS destination registers are in different How about Store places in opcode therefore e need a mux same thing just skip 3rd step memaddr 6 regrs Datapath for adStore Instructions Reg Dst Addr Data Memory WrEn IType Branch Instructions Instruction format 31 2625 2120 1615 0 Op6 rs5 rt5 AddressImmediatevalue16 RTL for branch operations eg BEQ Instruction fetch memPC Compute conditon Cond lt regrs regrt Calculate the next instruction39s address if Cond eq 0 then PC lt PC 4 SignExdimm16 x 4 else Clk o To Instruction Mem i ALUctr rs E R 32 m 5 Rb R 32 bl 5 R 39 t 5 ng as we39ll define this next ZEN will need PC zero tes condition from ALU RegDst rd FT Next Address Logic comams PC 4 why 30 subtlety see Chapter 5 in your text May not want to u u change PC if BEQ 0 condition not met this stuff happens anyway so we have to be sure we don39t 16 i change things we don39t want to changequot imm16 Instruction Memory implicitly says 4 if branch instruction AND 0 can automatically generate control signal Branch Zero When does the correct new PC become available Can we do better JType Jump Instructions Instruction format 31 26 25 Op 6 Targetaddress26 RTL operations eg BEQ Instruction fetch memPC Set up PC PC lt PC 4lt3129gt CO NCATtarget lt 25 0 gt x 4 A Single Cycle Datapath Instmction memory Add Jump Instruction Fetch Unit why PClt3128gt subtlety see Page 383 in your text New address in jump instructior PClt3128gt Instructionlt 25 PC CorryIn use 4 M53 of PC Instruction Memory Let39s trace a few instructions For example Add 5 6 7 sw 09 10 Sub 1 2 3 LW 11 012 The HW needed plus control US1335 Control Logic Ie now we need to make the HW do what we want it to do add subtract etc when we want it to When we talk about control we talk about these blocks Implementing Control Implementing Control Implementation Steps I Implementation Steps Identify control inputs and control output control wards 1 Idenhfy Conlml mpu l s and Conlr l OUTPUTS Make a comm Signal Table for each cycle 2 Make a control sugnal table for each cycle Derive comm logic from The comm Table 3 Derive control logic from the control table This logic can take on many forms combinational logic 9 Do we quotad a FSM here39 ROMS microcode or combinations Control outputs RegDst MennoReg l RegWrite Control inputs mim t 0pcode 5 bits Control ALUSm Func 6 bits Palh ALuctr Branch Jump Single Cycle Control InputOutput Control Inputs 4 Opcode 6 bits Step 1 Idenitfy inputs amp outputs How about R ty e instructions Control Outputs RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch Jump ALUctr these are rows these are columns Ste 2 Make a control signal table for each cycle For MIPS we have to build a Main Control Block and an ALU Control Block Control Signal Table R type inputs Add am I w sw 39l BFQ Func input 100000 100010 xxxxxx xxxxxx xxxxxx Op input 000000 000000 100011 101011 000100 RegDst 1 1 0 X X ALUSrc 0 0 1 1 0 Mem to Reg 0 0 1 X X Reg Write 1 1 1 0 0 Mem Read 0 0 1 0 0 Mem Write 0 0 0 1 0 Branch 0 0 0 0 1 ALUOp Add Sub 00 00 01 outputs Let hires l 1 Main ILUOP 6 Control V 2 OPCOde Other cnt signals Our 2 blocks of control logic Use OP field to generate ALUOp encoding Control signal fed to ALU control block Use Func field and ALUOp to generate ALUctr decoding Specifically sets 3 ALU control signals B Invert Carry in operation Main control ALU control Main Control C0quotquot l 3 Outputs of main control become inputs to ALU control R type lw sw beq ALU OPEM Oquot R typequot add add subtract We have 8 bits ALUO 10 gt l 1 0 0 01 of input to our ALU control Or in other words 00 ALU performs add 01 ALU performs sub 10 ALU does what function code says block we need 3 bits of utp Generating ALUctr We want these outputs I ALU Operation l and l or l add l sub l slt I and 00 ALUctrlt220gt loooloo1l l 111 or 01 ALUctrlt2gt B negate C in 81 B inver39t Illvert B and C in adder 39 Q ALUctrlt1gt Select ALU Output st be a 1 for less 11 ALUctrlt0gt Select ALU Output subtract We have these inputs uquotclt5o ALUctr 36and100100 lwsw 37or 1oo1o1 beq 32add100000 34sub100010 42 slt 1 o 1 o 1 0 Raw can ignore these they re the same for all This table is used to generate the actual Boolean logic gates that produce ALUctr lwsw beq Could generate gates by hand often done wSW ALUctr nned 110110 or ALUctrlt 1 gt gt F2 ox r 1 Ex ALUctrlt2gt F1 1x 0 and ALUctrlt02 SUBBEQ F0 ux quot 0x I 1r Recall Single cycle MIPS machine Recall for MIPS we have to build a Main Control Block and an ALU Control Block Well here39s what we did We came up with the information to generate this logic which would fit here in the datapath ALUOP Single cycle versus multicycle and again remember realistically logic ISAs insturction types etc would be much more complex we39d also have to route all signals toowhich may affect how we39d like to organzie processing logic SingleCycle Implementation Singlecycle fixedlength clock CPI 1 Clock cycle propagation delay of the longest datapath operations among all instruction types Easy to implement Singlecycle variablelength clock CPI 1 Clock cycle 2 7otype i instructions propagation delay of the type instruction datapath operations Better than the previous but impractical to implement Disadvantages What if we have floating point operations How about component usage I i Multiple Cycle Alternative Break an instruction into smaller steps Execute each step in one cycle Execution sequence Balance amount of work to be done Restrict each cycle to use only one major functional unit At the end of a cycle Store values for use in later cycles why Introduce additional quotinternalquot registers The advantages Cycle time much shorter Diff inst take different of cycles to complete Functional unit used more than once per instruction I A CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 1 Lecture 08 Introduction to the MIPS ISA i Procedure Calls in MIPS University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 3 MIPS Registers and the conventions associated with them University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS MIPS Datapath 6 bit opcodes More ways to MIPS Instructions address memory are 32 bits plus 6 function codes more functionality University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 4 MIPS Instruction Types Instructions are characterized into basic types For each type 32 bits of instruction are interpreted differently 3 types of instructions in MIPS R type I type J type In other words As seen with Add instruction encoding broken down into X different fields With MIPS only 3 ways X of bits arranged Think about datapath Why might this be good University of Notre Dame CSE 30321 Lecture 08 Introduction In the MIPS ISA Procedure calls in MIPS 5 m A qu1ck look more complex lSAs Da al a Path of Add from start to finish Controller Add 0010 0001 0010 0011 Bits for Load 0 also sent 0010 0011 M 10 AVE More types more multiplexor inputs signal rout39ng University of Notre Dame Rtype Instructions El All instructions have 3 operands El All operands must be registers El Operand order is fixed destination first El Example Ccode A B c Assume that A B C are stored in registers s0 s1 s2 MIPS code sub s0 s1 s2 Machine code 00000010001 1001010000 XXXXX 100010 El Other Rtype instructions I addu mult and or sll srl University of Notre Dame CSE 30321 Lecture 08 Introduction In the MIPS ISA Procedure calls in MIPS 6 RType Assembly and Machine Format El Rtype All operands are in registers Assembly add 7 add rd rs rt RFrd RFrsRFrt f I 39 agd opfu no 31 f 26 25 2120 1615 6 5 ma up 6 rs 5 rt 5 rd 5 shamt 5 funct 6 Machine B 000000 00111 01000 01001 xxxxx 100000 D 0 7 8 9 x 32 University of Notre Dame lType Instructions ltype One operand is an immediate value and others are in registers Example addi s2 SS1 128 addi rt rs lmm 392 x YRF18 RF17128 a 3 31 326 25 1120 1 15 Kg 0 0p 6 rs 5 rt5 AddressImmediate value 16 B 001000 10001 10010 0000000010000000 D 8 17 18 128 University of Notre Dame lType Instructions Another Example Itype One operand is an immediate value and others are in registers Example lw s3 32t0 RF19DMRF832 f 3 39 a f f x n 31 3quot 26 25 quot21 20 vs 15 quottax 0 Op 6 rs 5 rt 5 AddressImmediate value 16 B100011 01000 10011 D 35 8 19 0000000000100000 How about load the next word in memory University of Notre Dame Byte addressability What immediate values are encoded in an ltype instruction for example are affected by the fact that MIPS data words are byte addressable Let s look at Questions 1 and 2 on the board University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 10 IType Instructions Yet Another Example Itype One operand is an immediate value and others are in registers Example Again bne t0 t1 Again mR 8RI19 PCPC 4 Imm4 f39 else PCPC4 31 g 25 120 1 15 o one rs5 rt 5 AddressImmediate value 16 B 00101 01000 01001 0000 0000 0001 0000 5 PCrelative addressing University of Notre Dame JType Instructions El Jtype only one operand the target address Example j 3 Goto addr 3 x 4 Le goto addr 12 31 3936 25 mam o Op 6 Target address 26 B 000010 00000000000000000000000011 39 3 University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 13 In class examples University of Notre Dame Practical Procedures Have already started to see that you don t make N copies of for loop body Thus Might look like this for i0 iltN i N2Y 3 a b C 35 N N 1 sub 2 21 d a 3 oopadd456 abc f d l add748 dae awsaszsw fd add 331 1 1132 f110 oop sub 11 2 3 bneq 11 0 oop You wouldn t make multiple copies of a machine instruction function either University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 14 Size of Immediate Operand Mumr DI Uh new In lrllmodlnb University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 16 Practical Procedures For example Might look like this int mainvoid 6 n an arg reg int i int 339 add 5 0 7 arg reg 7 jpower j powerlti 7 ca int powerint i int n poweri add 3 0 0 39 j k sub 5 51 for jo jltn j P3 mUt5 6 6 k 11 add 3 31 return k SUb 11 51 3 bneq 11 0 00 add 2 6 0 data n ret reg jca Advantage Much greater code density especially valuable for library routines etc University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 17 Procedure calls are so common that there s significant architectural support University of Notre Dame MIPS Procedure Handling cont El What about passing parameters and return values I registers 4 7 a0a3 are used to pass first 4 parameters I returned values are in 2 and 3 v0v1 El 32x32bit GPRs General purpose registers I 0 zero 32 bits I 2 3 v0 v1 return values ro I 4 7 a0 a3 arguments 391 l 8 15 t0 t7 temporaries l 16 23 s0 s7 saved l 24 25 t8 t9 more temporaries I 31 ra return address 3931 HI University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 18 MIPS Procedure Handling CI The big picture re 0 r1 Caller Callee s v 31 ra return address F wrwbn1 bo PC HI Lo El Need jump and return I jal ProcAddr issued in the caller jumps to ProcAddr save the return instruction address in 31 PC JumpAddr RF31PC4 Ijr 31 ra last instruction in the callee jump back to the caller procedure PC RF31 University of Notre Dame In class example University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS More complex cases El Register contents across procedure calls are designated as either caller or callee save El MIPS register conventions I t v a not preserved across call caller saves them if required I s ra fp preserved across call callee saves them if required I See PampH FIGURE 218 p88 for a detailed register usage convention I Save to where More complex procedure calls I What if your have more than 4 arguments D I What if your procedure requires more registers than available I What about nested procedure calls I What happens to ra if proc1 calls proc 2 which calls proc3 University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS Where is the stack located Memory Structure Lower Mem Reserved Addr Instruction segment I Data Esegment V Higher Stack MEIquot Esegment m Addr University of Notre Dame Top of stack sp i 21 23 CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS Z The stack comes to the rescue El Stack IA dedicated area of memory I FirstlnLastOut FILO I Used to gt Hold values passed to a procedure as arguments gt Save register contents when needed gt Provide space for variables local to a procedure El Stack operations lpush place data on stack sw in MIPS I pop remove data from stack lw in MIPS El Stack pointer lStores the address of the top of the stack l29 sp in MIPS University of Notre Dame CSE 30321 Lecture 08 Introduction to the MIPS ISA Procedure calls in MIPS 24 Call frames El Each procedure is associated with a call frame El Each frame has a frame pointer fp 30 main Snap shots of stack proc1 Local proc1 varlables 2 Rials s proc 93 ifp procZ Argument 5 Argument 5 Argument 6 is in 46m proc3 mam University of Notre Dame CSE 30321 Computer Architecture 1 m 2009 Lecture 13 In Class Notes and Problems October 6 2009 Question 1 First we briefly review the notion of a clock cycle CC Generally speaking a CC is the amount of time required for i a set of inputs to propagate through some combinational logic and ii for the output of that combinational logic to be latched in registers The inputs to the combinational logic would usually come from registers too Thus for the MIPS single cycle datapath that was derived in class last week the combinational logic referred to above would really be the instruction memory register file ALU data memory and register file The inputs come from instruction memory and the output might be the main register file in the case of an ALU instruction or the PC in the case of a branch instruction Putting the MIPS datapath aside its generally a good idea to minimize the logic on the critical path between two registers as this will help shorted the required clock cycle time will result in an increase in clock rate which generally means better performance To be more quantitative hypothetically let s say that there are two gate mappings that can implement the logic function that we need to evaluate The inputs to these gates would come from 1 set of registers and the outputs from these gates would be stored in another set of registers The composition of each is specified in the table below I AND gates I NOR gates Design 1 I 7 I 17 Design 2 I 24 I 4 Furthermore the delay associated with each gate is also listed below AND gate NOR gate 2 picoseconds 1 picosecond Which design will lead to the shorted CC time Design 1 7 ANDs 2 psAND 17 NORs 1 psNOR 13 XORs 3 psXOR 14ps17ps39ps70ps Design 2 24 ANDs 2 psAND 4 NORs 1 psNOR 7 XORs 3 psXOR 48ps4ps21ps73ps What is the clock rate for that design in GHz Clock rate 1 CC Clock rate 1 70ps Clock rate 1 7o1o 12 Clock rate 1428 x1011 Hz Clock rate 1428 x1011 Hz 1 GHz1O9 Hz Clock rate 143 GHZ 39 The longest critical path determines the clock cycle time Question 2 Vlhthout Ioo ng back in your notes write out the register transfer language RTL for a generic MIPS R type type and Jtype instruction Dont be overly concerned with getting all of the sign extensions etc exactly right For your reference the instruction encodings for the R and J instruction types are shown below RType add 59 57 8 31 5 or funct 6 Type 31 2625 n one l r31 JType J 3 n is 25 a 0516 Targul address 26 Answers Your Solution My Solution RType get 6 MemPC Reg d 6 RegRs op RegRt Pc 6 Pc 4 IType Iwsw get Mem PC Addr 6 RegRs lt0000ngt ltimm150gt if Iw RegRt 6 MemAddr if sw Memaddr 6 RegRs 6 Pc 4 l e M or sw longer latency l 7 01 registermemory transfers in each lTYPe beCi get 6 MemPC Cond FiegFis FiegFit if cond met PC 6 new value PC 6 PC 4 signextimm150x4 if cond not met PC 6 PC 4 JType get 6 MemPC PC 6 ltPC extensiongt ltFi250gt For 0 vi There s a lot of stuff common between instructions For example RType Everything the same except what the ALU does Looking at Rtype vs Itype 0 Still lots of stuff the same register reads ALU operation register write back Question 3 In class last week you derived the single cycle datapath for the MIPS ISA A block diagram is shown below instruction Data memory Assume that the following latencies would be associated with the datapath above Questions a What typeclass of instruction in the MIPS ISA will set CC time and hence clock rate eg ALU type conditional branch load store etc b Given the latencies in the above table what would the clock rate be for our single cycle datapath Note for both a and b pay particularly close attention to the diagram shown above and try to de ne the critical path Answers ist the load instruction sets the critical path For the load we must Get the instruction encoding from memory Read data from the register file Perform an ALU operation Access data memo Write data back to the register file Note that we can ignore the overhead associated with PC 6 PC 4 bc this would happen in parallel with operations Second the clock rate would just be the inverse of the some of the delays 2ns2ns3ns2ns2ns 11 ns 111 ns91 MHZ In a single cycle implementation the instruction with the longest critical path sets the clock rate for EVERY instruction Think about Amdahl s Law this is good if we can improve something that we use often 0 but here we re sort of making a lot of things worse 0 How worse We ll see next Question 5 Let s look at the impact of the load instruction s domination of the CC time a bit more Recall that a load lw needs to do the following a Get MemoryPC b Read data from the register file and sign extend an immediate value c Calculate an address d Get data from MemoryAddress e Write data back to the register file With this design let s assume that each step takes the amount of time listed in the table a H9 H2 HQ Its l 4ns I2ns I2ns I4ns I2ns I for simpler math Part 1 With these numbers what is the CC time A 14ns However store sw instruction only needs to do steps A B C and D 0 Therefore a store only really needs 12 ns Similarly add instruction only needs to do steps A B C and E 0 Therefore could do an add in 10 ns Still with single CC design both of the above instructions take 14 ns Our CC time is limited by the longest instruction Part 2 Let s quantify performance hit that we re taking and assume that we could somehow execute each instruction in the amount of time that it actually takes Assume the following I ALU I lw I sw I Branch jump I 45 20 15 20 Per the previous discussion we can assume that ALU instructions take 10 ns lw s take 14 ns and sw s take 12 ns We can also estimate how long it takes to do a branchjump Assume that steps A B C and E are needed 0 A Fetch B Read data to compare C Subtract to do comparison D Update PC like a register write back CPU time instruction program x seconds instruction CPU time single CC 2 ix 14 ns 14i CPU time variable CC ix45x102x1415x122x10 111i Potential performance gain 2 14i 111 i 27 By using a single long CC we re losing performance Part a Unfortunately such a variable CC is not practical to implement however there is a way to get some of the above performance back To make this solution work want to balance the amount of work done per step 7 0 Because if every step has to take the same amount of time if we have 2 2 2 2 and 4 ns we re still at 4 As an example let s see what happens if we can make each step take 3 ns i x CC instruction x s CC ix45x4 2x5 15x4 15x4x3 118i CPU time 3 ns multicycle 118i is not as good as 11i but its better than 14 9 19 vs 27 Now we have a shorter clock rate and each instruction takes an integer number of CCs There is a catch to this approach We have to store the intermediate results Remember a CC is defined as the time some logic was evaluated and the result was latched and that inputs often come from another previous latch Realistically the latching time of the intermediate registers will add a nominal overhead to each stage Part 5 Let s see how the overhead might affect performance We ll assume its 01 ns CPU time 3 ns overhead 2 i x CC instruction x s CC i x 45x 4 2 x 5 15 x 4 15 x 4 x 3O1 i x 18 1 06 06 x31 i x 4 x 31 124i Again we lose a little performance but we re still better than the single cycle method 0 14i 124i 13 Every approach is better than the single cycle approach 0 111ilt118ilt13ilt14i The 118i and 13i approach are feasible Question 5 If we take the above approach how should we break up the instructions Think about each step what needs to be done to ensure Fl I and J type instructions all work Hint Think about each instruction and think about what can be done in parallel 1 IF 2 ID g EX 4 M RType E WB every every lwsw lw lw Fetch instruction at Decode instruction Compute memory Read memory store rite MDR to register address in PC type ie send opcode address result in MDR file to control logic memorydatareg every every Fttype Fttype Store result in Store results read Perform correct Write ALUout to instruction register from the register file logical operation reference 31 every jump Branch sw increment PC Compute new PC Update PC post Write to memory address 39 All Store intermediate result in ALU Note All use ALU Let s talk about more What if we have an Rtype instruction l opcode l rs it rd l shamt l function code 0 Would send rs and it encodings to RF as indices 0 Data would be stored in intermediate registers A and B What if we have a Jtype instruction l opcode l offset 0 Answer Do the same as above anyway There s no harm and no control signals based on opcode available to end of CC anyhow 0 Similarly we can calculate a new PC address ie for a jump or branch even if we don t know if we have a jump or a branch Again it does no harm it subscribes to ASAP philosophy We can do this now and save a CC later