Processor Design CS 3220
These class notes were uploaded by Alayna Veum on Monday, November 2, 2015. They belong to CS 3220 at Georgia Institute of Technology - Main Campus, taught by Staff in Fall.
Date Created: 11/02/15
CS 3220, November 18, 2008, Georgia Tech

- How fast is the system? What determines this?
- Dealing with timing, i.e., how to make stuff faster
- Xilinx ISE-specific timing issues

- Logic/gate delay
- Wire delay
- Flip-flop/register/latch setup and hold times
- Noise

- Complexity of logic
  - Have you minimized the Boolean expressions? How many inputs? How many minterms?
  - Is the overall logical structure too deep?
    - Really long carry chain in an adder
    - Just too much to do (multiplier)
- How is the logic physically implemented?
  - Raw gates? And what kind? Any gates, or perhaps only NAND gates?
  - LUTs?

- Consider trying to drink a glass of water through a straw that is 6 inches long versus a straw that is 6 yards long. Which will take longer?
- Depending on the distance between source and sink logic, the wires can potentially add a nontrivial amount of delay.
- For high clock speeds, cycle times are very short, so even a small amount of wire delay can really hurt.
[Figure not recoverable.]

- Fanout is the number of consumers of a value.

    wire [31:0] foo = x ... y;        // fanout of foo is equal to 3
    wire        bar = ... foo ...;
    wire [31:0] blah = ... foo ...;
    wire [31:0] blorch = foo ... z;
    (operators lost in the source)

- How long would it take you to drink 3 glasses of water through three straws stuck in your mouth at the same time?

[Figure: register with D, Q, and CLK; timing diagram showing the clock-to-Q delay (tc2q) and the window in which DATA must be STABLE.]
- Think about handing off a baton in a relay race.
  - You've got to hold the baton steady for at least a certain minimum amount of time for your partner to successfully grab the baton (setup time).
  - Then you've got to keep holding on long enough to be sure that your partner really does have a solid grip (hold time).
[Two figures. Pictures from Rabaey, Chandrakasan, and Nikolic.]

- Electromagnetic radiation: ever
notice that your cordless phone gets staticky when your roommate turns on the microwave oven?
- High-energy particle strikes: alpha particles from space, naturally occurring isotopes
- Thermal variation: circuits are slower at higher temperatures
  - Part of why overclockers are so obsessed with cooling

- Crosstalk between nearby wires
  - Capacitive coupling
  - Inductive coupling

- Ground bounce / power supply noise
[Figure: water tank analogy for pressure drop.]

- Parametric variations in manufacturing
  - Not all wires are drawn the exact same width.
  - Not all transistors are drawn the exact same length/width.
  - Not all transistor gates have the same thickness.
  - Not all transistor substrates have the same dopant concentrations.
- Why do you think that, even for the exact same design, some chips are faster than others?
- This is not noise in a dynamic sense, but for the purposes of designing the timing of circuits, it can have a similar impact.

[Figure: clock cycle time. Extra noise margin => decrease in f.]

[Figure: components of the clock cycle time, annotated as follows.]
- Not much you can do about this.
- Also not a huge (technology-dependent) amount you can do about this in an FPGA design; there are some things you can do in an ASIC.
- You have some control over this.

- Alternative algorithms: Lookahead-Carry Adder instead of Ripple-Carry Adder
  - Reduces critical-path gate delay at the cost of more logic
- Pipelining: reduce the amount of logic per stage
  - At the cost of more stages plus the corresponding overhead: more latches, more control logic, more bypassing
- Faster logic implementations
  - No real control over this at the HDL level
  - Dynamic or domino logic: faster, but consumes more power, potential noise issues

- Not all stages have the same amount of logic. Can you shuffle some around?
[Figure: pipeline registers between the Exec and WriteBack stages.]
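The point about shuffling logic between pipeline stages can be sketched numerically. The following is a minimal Python model, not anything from the course materials: it assumes the clock period is set by the slowest stage plus a fixed register overhead, and it ignores wire delay and noise margin. All delay values and the names `cycle_time_ns`, `max_frequency_mhz`, and `REGISTER_OVERHEAD_NS` are hypothetical.

```python
def cycle_time_ns(stage_delays_ns, register_overhead_ns):
    """Clock period is bounded by the slowest stage plus register overhead."""
    return max(stage_delays_ns) + register_overhead_ns


def max_frequency_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


# Hypothetical numbers, chosen only for illustration.
REGISTER_OVERHEAD_NS = 1.0       # assumed setup time plus clock-to-Q delay
unbalanced = [4.0, 9.0, 3.0]     # 16 ns of logic in total, one slow stage
balanced = [5.0, 6.0, 5.0]       # the same 16 ns of logic, shuffled around

period_unbalanced = cycle_time_ns(unbalanced, REGISTER_OVERHEAD_NS)
period_balanced = cycle_time_ns(balanced, REGISTER_OVERHEAD_NS)

print(period_unbalanced, max_frequency_mhz(period_unbalanced))
print(period_balanced, max_frequency_mhz(period_balanced))
```

Under these assumptions the total amount of logic is unchanged, yet the balanced arrangement allows a shorter clock period because only the slowest stage matters.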
- Fanout too high: slow
[Figures: fanout tree; exponential horn.]

- Floorplanning: try to physically place the source and sink closer together to reduce wire delays
- Wire buffering

- Switching from RCA to LCA requires more gates but reduces gate delay;
  - more gates may require more area, thereby increasing wire delay between the gates;
  - larger area may push other modules further away, thereby increasing their wire delays;
  - to compensate, we change their algorithms too to reduce their logic delays, but that increases their area;
  - to compensate, we add repeaters to the wires, but the repeaters take up area and cause other floorplanning problems;
  - etc., etc., etc.

- What does the tool tell you?
  - After place-and-route, ISE performs timing estimates of your circuits.
  - Can use PAR results to estimate wire lengths.
  - See example.

    // synthesis attribute ALLCLOCKNETS of FPGA_top_level is 40ns

- Xilinx actually scans your comments for directives.
- The synthesis, place and route, etc. will attempt to optimize the logic/wiring to achieve this overall clock cycle period for the entire design.
- Warnings will be issued if it can't make timing.

- Xilinx ISE reports specific paths that failed to meet timing (reran with a 20 ns constraint here):

    Timing Summary:
    Speed Grade: -4
      Minimum period: 22.154ns (Maximum Frequency: 45.139MHz)
      Minimum input arrival time before clock: 6.549ns
      Maximum output required time after clock: 22.851ns
      Maximum combinational path delay: 8.424ns
    WARNING:Xst:2245 - Timing constraint is not met.

- Possibly use any of the previously discussed techniques to target these paths, e.g., pipeline it.

[Screenshot: Processes for FPGA_top_level.]
- Will give you an HTML type of report listing all timing paths that failed to make timing; go fix those.
- If
all else fails, slow down the clock.

CS 3220, August 26, 2008
Slide materials based on Kime, Schulte, Saluja (UWisc) and Roth (UPenn)

- Simulation vs. Synthesis
- Modules and Primitives
- Structural Descriptions
- Language Conventions
- Data Types

- A Hardware Description Language (HDL)
  - It is a form of a programming language.
  - Designed to support direct compilation from HDL -> HW
    - Although not all language constructs are synthesizable
  - It is a parallel programming language.
    - Not in the multicore/multithreaded sense
    - Statements are not necessarily executed sequentially.
    - Appropriate, since hardware gates execute in parallel

- Simulation: process of interpreting your HDL to tell you how the code will behave
  - For simulation, pretty much all language constructs are supported.
- Synthesis: process of converting the HDL code into a format targeted at a specific hardware implementation or embodiment
  - Some language constructs and their use in a Verilog description make simulation efficient and are ignored by synthesis tools.
  - Synthesis tools typically accept only a subset of the full Verilog language constructs.

- Structural: instantiation of primitives and modules
- RTL/Dataflow: continuous assignments
- Behavioral: procedural assignments

Structural:

    module fulladd (A, B, CI, S, CO);
      input A, B, CI;
      output S, CO;
      wire N1, N2, N3;
      halfadd HA1 (A, B, N1, N2),
              HA2 (N1, CI, S, N3);
      or P1 (CO, N3, N2);
    endmodule

    module halfadd (X, Y, S, C);
      input X, Y;
      output S, C;
      xor (S, X, Y);
      and (C, X, Y);
    endmodule

RTL (Register-Transfer Level):

    module fa_rtl (A, B, CI, S, CO);
      input A, B, CI;
      output S, CO;
      assign S = A ^ B ^ CI;                   // continuous assignment
      assign CO = A & B | A & CI | B & CI;     // continuous assignment
    endmodule

Behavioral (looks a little more C-like):

    module fa_bhv (A, B, CI, S, CO);
      input A, B, CI;
      output S, CO;
      reg S, CO;               // required to hold values between events
      always @(A or B or CI)
        begin
          S <= A ^ B ^ CI;     // procedural assignment
          CO <= A & B | A & CI | B & CI;
          // procedural assignment
        end
    endmodule

- Gate level: and, nand, or, nor, xor, xnor, buf, not
- Switch level
  - Can simulate/synthesize to transistor level
  - We won't use these in this class.

- No declaration; can only be instantiated
- All outputs appear in the list before any inputs.
- Optional: delay, name of instance

    and N25 (Z, A, B, C);                   // instance name
    and #10 (Z, A, B, X), (X, C, D, E);     // delay
    /* Usually better to provide instance name for debugging. */
    or N30 (SET, Q1, AB, N5),
       N41 (N25, ABC, R1);
    and #10 N33 (Z, A, B, X);               // name + delay

- Example: wire W;
  - Simple: a connection from point A to point B (and possibly C, D, etc.)
- Vectors (buses): wire [7:0] W1, W2;   // declare two 8-bit buses
- Ranges
  - Bound to type, not variable name
    - Ex: wire [7:0] W1, [15:0] W2;   // not allowed
  - Doesn't have to be [N-1:0] for an N-bit bus
    - Ex: wire [0:7] w1; wire [15:8] w2;
  - Try to just use [N-1:0] for this class.

- The module concept
  - Basic design unit
  - Modules are:
    - Declared
    - Instantiated
  - Module declarations cannot be nested.

Example:

    module whatisthis (S, A, B, O);
      input S, A, B;
      output O;
      wire O;
      wire S_, AnS_, BnS;
      not (S_, S);
      and (AnS_, A, S_);
      and (BnS, B, S);
      or (O, AnS_, BnS);
    endmodule

- Interface specification

    module mux2 (S, A, B, O);
      input S, A, B;
      output O;
      wire O;

  - Outputs must be redeclared wire so we know to use wire assignment.
  - Can also have inout: bidirectional wire
- Internal wires, i.e., local variables

      wire S_, AnS_, BnS;

- Implementation

      not (S_, S);
      and (AnS_, A, S_);
      and (BnS, B, S);
      or (O, AnS_, BnS);

- Identifiers must not be keywords.
- Ports
  - First example of signals
  - Scalar: e.g., E_n
  - Vector: e.g., A[1:0], A[0:1], D[3:0], and D[0:3]
    - Range is MSB to LSB.
    - Can refer to partial ranges: D[2:1]
  - Type: defined by keywords
    - input
    - output
    - inout (bidirectional)

    module mux2 (S, A, B, O);       // same mux2 as above
      ...
    endmodule

    module mux4 (S, X, O);
      input [1:0] S;
          // 2-bit select
      input [3:0] X;                // 4 inputs
      output O;
      wire O;
      wire X32, X10;
      mux2 M32 (S[0], X[3], X[2], X32);
      mux2 M10 (S[0], X[1], X[0], X10);
      mux2 M4 (S[1], X32, X10, O);  // port list lost in the source; this one is assumed
    endmodule

- By position association

    module mux4 (S, X, O);
    mux4 M4 (sel[1:0], values[3:0], mux4val);
    // S = sel[1:0], X = values[3:0], O = mux4val

- By name association

    mux4 M4 (.O(mux4val), .S(sel[1:0]), .X(values[3:0]));

  - Same port mappings
  - More typing, but helps to reduce bugs (clearer)

- Empty port connections

    module mux4 (S, X, O);
    mux4 M4 ( , values[3:0], mux4val);

  - Inputs S[1:0] are at high-impedance state (z).
  - A second instantiation, garbled in the source, leaves the output unconnected: output O unused.

- { } is concatenate.
- Example: assume module fulladd (A, B, CIN, S, COUT) is already declared elsewhere.

    module addarray (A, B, CIN, S, COUT);
      input [7:0] A, B;
      input CIN;
      output [7:0] S;
      output COUT;
      wire [7:1] carry;
      fulladd FA[7:0] (A, B, {carry, CIN}, S, {COUT, carry});
      // instantiates eight fulladd modules
    endmodule

- Referred to as a memory:

    reg [N-1:0] mem [M-1:0];

- Can access individual memory words as register vectors:

    mem[idx]     // N-bit word at idx

- Case sensitivity
  - Verilog is case-sensitive.
  - Some simulators are case-insensitive.
  - Advice: don't use the case-sensitive feature!
  - Keywords are lower case.
- Different names must be used for different items within the same scope.
- Identifier alphabet:
  - Upper and lower case alphabeticals
  - Decimal digits
  - Underscore

- Maximum of 1024 characters in an identifier
- First character is not a digit.
- Statement terminated by ;
- Free format within a statement, except within quotes
- Strings are enclosed in double quotes and must be on a single line.
- Comments:
  - All characters after // in a line are treated as a comment.
  - Multi-line comments begin with /* and end with */.
- Built-in system tasks or functions begin with $.

- Verilog signal values
  - 0: logical 0 or FALSE
  - 1: logical 1 or TRUE
  - x, X: unknown logic value
    - 0 or 1, or error; don't know / don't care
  - z, Z: high-impedance condition
    - not connected to anything
    - physically no current flowing

- Format: <size><base_format><number>
  - <size>: decimal specification of the number of bits
    - default is unsized and machine-dependent, but at least 32 bits
  - <base_format>: ' followed by the arithmetic base of the number
    - <d> <D>: decimal (default base if no <base_format> is given)
    - <h> <H>: hexadecimal
    - <o> <O>: octal
    - <b> <B>: binary
  - <number>: value given in the base of <base_format>
    - _ can be used for reading clarity.
    - If the first character of a sized binary number is 0 or 1, the value is 0-filled up to size. If it is x or z, the value is extended using x or z, respectively.

- Examples:

    6'b010_111    gives 010111
    8'b0110       gives 00000110
    8'b1110       gives 00001110
    4'bx01        gives xx01
    16'H3AB       gives 0000001110101011
    24            gives 0...0011000
    5'O36         gives 11110
    16'Hx         gives xxxxxxxxxxxxxxxx
    8'hz          gives zzzzzzzz

- Nets
  - Used for structural connectivity
- Registers
  - Abstraction of storage (may or may not be real physical storage)
- Properties of both
  - Informally called signals
  - May be either scalar (one bit) or vector (multiple bits)

- wire: connectivity only; no logical
- tri: same as wire, but indicates it will be 3-stated in hardware
- supply0: global net GND
- supply1: global net VCC (VDD)

    wire x;
    wire x, y;
    wire [15:0] data, address;
    wire address = offset + index;
    tri [31:0] data_bus, operand_bus;

- Value implicitly assigned by connection to primitive or module output

- Declaration of parameters

    parameter A = 2'b00, B = 2'b01, C = 2'b10;
    parameter regsize = 8;
    reg [regsize - 1:0] a;    /* illustrates use of parameter regsize */

- reg: stores a logic value
- integer: stores values which are not to be stored in hardware
  - Defaults to simulation computer register length or 32 bits, whichever is larger
  - No ranges or arrays supported
  - May yield excess hardware if the value needs to be stored in hardware; in such a case, use a sized register.

CS 3220, September [day illegible] 2008
Some examples taken
from James M. Lee, Verilog Quickstart, 1997.

- Today's agenda
  - What exactly is going on in the simulation?
  - What other useful constructs are there?
  - Common pitfalls
  - Modeling style tradeoffs

- To understand behavioral modeling, you need to understand how the underlying simulation occurs.
  - Things happen in time steps.
  - The simulator keeps an event list of all events that occur.
  - Events can spawn other events.
  - The order of processing events in the same time cycle depends on delays, assignment types, and other factors.

- Initial
  - Execute only at the start of simulation time.
  - Execute until the end of the block and then end.
  - May have more than one initial block
  - Does not mean initialization
    - Think of it as "start here at time zero".
- Always
  - Execute during every time step of simulation.

- What happens here?

    module three_inits;
      initial $display("Statement 1");
      initial $display("Statement 2");
      initial $display("Statement 3");
    endmodule

  Not predictable: we have a race condition, and you should avoid writing anything like this, let alone write something that depends on any specific ordering of these blocks.

- What happens here (assume start at time zero)?

    module three_inits;
      initial begin
        #1 $display("Block 1 S1");
           $display("Block 1 S2");
        #2 $display("Block 1 S3");
      end
      initial begin
           $display("Block 2 S1");
        #2 $display("Block 2 S2");
        #2 $display("Block 2 S3");
      end
    endmodule

  Output: Block 2 S1, Block 1 S1, Block 1 S2, Block 2 S2, Block 1 S3, Block 2 S3.

- Begin-end is sequential.
  - Even though more than one statement may be executed in the same time step, the simulator still honors the order.
  - Delays are additive (see previous example).
- Fork-join is in parallel.
  - All statements execute concurrently.
  - Delays are independent.

- What happens here (assume start at time zero)?

    module three_inits;
      initial fork
        #1 $display("Block 1 S1");
           $display("Block 1 S2");
        #2 $display("Block 1 S3");
      join
      initial fork
           $display("Block 2 S1");
        #2 $display("Block 2 S2");
        #2 $display("Block 2 S3");
      join
    endmodule

  Output: Block 1 S2, Block 2 S1, Block 1 S1, Block 1 S3, Block 2
S2, Block 2 S3.

- You may have run across this already, but any signal never seen before is assumed to be a wire.
  - I haven't encouraged this because omission could lead to unintended results.
  - But there are useful cases for this, i.e., throwaway temporary signals.

    module weird4mux (input [3:0] i1, input [3:0] i2, input [2:0] sel, output o);
      mux4 M4_1 (i1[0], i1[1], i1[2], i1[3], sel[1:0], M4_1OUT);
      mux4 M4_2 (i2[0], i2[1], i2[2], i2[3], sel[1:0], M4_2OUT);
      mux2 M2 (M4_1OUT, M4_2OUT, sel[2], o);
      // M4_1OUT and M4_2OUT are not declared, but legal
    endmodule

  - Can be dangerous when using different sized input vectors

- We've already seen $display and $write.
  - Like printf in C, but $display inserts a newline.
  - Can use some printf-style formatting, e.g., %d
    - Not identical though: %h for hexadecimal (as opposed to %x in C); %b for binary (doesn't exist in standard C)
- Files: useful if you have a huge log of output from $display/$write/$monitor

      integer fp;
      initial begin
        fp = $fopen("myfile.txt");
        $fdisplay(fp, "Hello World");
        $fclose(fp);
      end
    endmodule

  - Each file id only has one bit set, so you can "or" them together; 1 is reserved for standard output.

      integer f1, f2;
      initial begin
        f1 = $fopen("file1.txt");
        f2 = $fopen("file2.txt");
        $fdisplay(f1 | f2, ...);
        $fclose(f1);
        $fclose(f2);
      end
    endmodule

- Useful for large arrays of storage, e.g., a register file
- Memory elements are treated as a group.

    reg [7:0] a, b[0:15], c[971:960];
    reg d, e[8:13];

  - a is just an 8-bit register.
  - b is a 16-entry memory with 8 bits per entry/word.
  - c is a 12-entry memory with 8 bits per entry/word.
  - d is just a one-bit register.
  - e is a 6-entry register with 1-bit entries.

- Elements are atomic by themselves: you can't directly address bits within entries.

    reg [7:0] a, b[0:15], c[971:960];
    reg d, e[8:13];

    b[3]        // refers to the fourth 8-bit word in memory b
    b[3][5]     // illegal
    a = b[3];
    a[5]        // this is what you need to do

- Procedural assignment

    reg [7:0] a, b;      // destination is never a wire

  [A first example, declaring reg [7:0] a, b, c, d, assigns a = 5 and b = 8 and then updates c and d after a #10 delay ("these occur at time step ten"). Its operators are lost in the source.]

- What happens?

    reg [7:0] a, b;
    initial begin
      a = 5;
      b = 8;
      fork
        a = b;
        b = a;
      join
    end

  Race condition: results indeterminate!
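The race in the fork/join example above can be made concrete with a small model. This is a Python sketch under a stated assumption, not simulator code: it treats each blocking assignment as taking effect immediately and simply tries both orders in which a simulator might schedule the two statements. The names `run` and `statements` are illustrative only.

```python
from itertools import permutations


def run(order):
    """Apply blocking assignments in the given order and return (a, b)."""
    regs = {"a": 5, "b": 8}          # initial values from the slide
    for dst, src in order:
        regs[dst] = regs[src]        # blocking: the write is visible at once
    return regs["a"], regs["b"]


# The two statements inside fork ... join:  a = b;  b = a;
statements = [("a", "b"), ("b", "a")]

outcomes = {order: run(order) for order in permutations(statements)}
for order, result in outcomes.items():
    print(order, "->", result)
```

In this model one ordering leaves both registers at 8 and the other leaves both at 5; neither ordering swaps the values, and which one occurs is left to the simulator.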
- Nonblocking assignment

    reg [7:0] a, b;
    initial begin
      a = 5;
      b = 8;
      fork
        a = b;
        b = a;
      join
    end

    reg [7:0] a, b;
    initial begin
      a = 5;
      b = 8;
      begin
        a <= b;
        b <= a;
      end
    end

  - Evaluation of the RHS happens right away, but assignment to the LHS is deferred until later (end of time step).

- Standalone: #4 y = x;
  - Wait four time steps and then do y = x.
- Intra-operation: #2 y = #4 x;
  - Wait two steps, sample x, wait four, assign y.
- Rising/falling:
  - nand #3 (x, y, z);        // nand with 3-step latency
  - nand #(2,4) (x, y, z);    // 2-step rise, 4-step fall
- Only for testbenches! During synthesis, these delays are typically ignored.
  - What kind of a gate would #1 y = x or y = #1 x look like in real hardware?

- Pretty much all of the common C/Java operators
- No increment/decrement
- A few extra equality operators: literal equality (===, !==)

- Pay attention to sizes of inputs/outputs.
  - Bitwise operators (&, |, ^, ~) have input size = output size.
  - Logical operators (&&, ||, !) have output size = 1.
- Even with arithmetic operations:

    reg [7:0] x, y;
    x = 8'b10110010;
    y = 8'b10101111;
    wire [8:0] z = x + y;     // shorthand for continuous assignment

- What is the value of z[8]?

- Reduction operators (discussed on the newsgroup)

    reg [7:0] x;
    wire y = ^x;      // computes parity of x
    wire z = ~|x;     // computes NOR of all of x's bits

  - N bits in, always one bit out: watch your sizes.
- Examples:

    & 10101101 -> 0     | 10101101 -> 1     ^ 10101101 -> 1
    & 110011zz -> 0     | 110011zz -> 1     ^ 110111zz -> x
    & 1111111x -> x     | 1111111x -> 1     ^ 1111111x -> x

    module mux2SL (a, b, s, o);
      input a, b, s;
      output o;
      not (s_, s);
      and (AnS_, a, s_);
      and (BnS, b, s);
      or (o, AnS_, BnS);
    endmodule

    module mux2AD (a, b, s, o);
      input a, b, s;
      output o;
      reg o;                      // needed for assignment inside always
      always @(a or b or s)
        if (s) o = b;
        else o = a;
    endmodule

    module mux2CA (a, b, s, o);
      input a, b, s;
      output o;
      assign o = s ? b : a;
    endmodule

    module reg16 (q, d, clk, clrn);
      input [15:0] d;
      input clk, clrn;
      output [15:0] q;
      ...
    endmodule

  [The always block of reg16 is garbled in the source. It tests clrn and either loads q from d on the positive edge of clk or sets q to 0. The slide notes: "Subtle glitch here; how?"]

[Figure labels: Not synthesizable; Sequential (register/FSM).]

ISA reference. Last revision: October 6, 2008

This is a reference document for the ISA of the processor
we will be implementing in CS3220. This document lists all of the default supported instructions and defines the expected behavior for each instruction. The ISA is very simple; it has been designed to be compact, relatively easy to decode and implement, and relatively easy to write assembly code for directly. Note that this document may be updated from time to time (check the "Last revision" timestamp above) as the course progresses, to reflect changes in the ISA that improve the project experience.

Opcode bitmap
[Opcode bitmap table not recoverable.]
Legend: o = opcode bits, E = extended opcode bits, d = destination register bits, r = source register bits, s = displacement bits, i = immediate bits, z = memory size bits, x = unused bits (set to zero).

Registers and memory
There are sixteen total registers (R0-R15) in our ISA, plus the program counter (PC). The ISA only supports a flat, non-protected physical memory space (no virtual memory). The ISA does not support interrupts; all I/O, timing, etc. must be handled directly by the software. I/O devices are all memory mapped into prespecified memory addresses. Memory is addressed with 32-bit physical addresses, although due to the limited size of the memory, the upper bits are usually ignored.

Special registers
R0: Hardwired zero register (value is always zero, writes are ignored)
R14: Jump register (holds conditional branch condition, jump target, or return address)
R15: Stack register

NAME: Arithmetic Operation
OPCODE: 0000
Instruction format: four 4-bit fields at bits 15-12, 11-8, 7-4, and 3-0 (field labels not legible in the source).
Generic arithmetic operation. The destination register (d) and source register (r) are combined by a function as specified by the E-field. All operations are treated as 32-bit integers. No overflow is detected or reported for any operations.

    EEEE:      0000  0001  0010  0011  0100  0101  0110  0111
    Operation: (names missing in the source)
    EEEE:      1000  1001  1010  1011  1100  1101  1110  1111
    Operation: AND   OR    CMPZ  MOV   XOR   SXB0  SXB1  SXB2

Reg[dddd] = Reg[dddd] op Reg[rrrr]
ADD/SUB: Reg[dddd] = Reg[dddd] +/- Reg[rrrr]
MUL: Reg[dddd] = Reg[dddd] * Reg[rrrr] (no overflow handling)
NEG: Reg[dddd] = -Reg[rrrr] (treated as signed
32-bit integer and negated via 2's complement)
SHL: Reg[dddd] = Reg[dddd] << Reg[rrrr] (negative shift amount undefined)
SHRL: Reg[dddd] = Reg[dddd] >> Reg[rrrr] (shift-right logical, zero-fill MSBs)
SHRA: Reg[dddd] = Reg[dddd] >> Reg[rrrr] (shift-right arithmetic, sign-extend MSBs)
NOT: Reg[dddd] = ~Reg[rrrr] (bitwise complement of all bits)
CMPZ: Reg[dddd] = (Reg[rrrr] == 0) ? 1 : 0 (compare with zero)
MOV: Reg[dddd] = Reg[rrrr]
AND/OR/XOR: Reg[dddd] = Reg[dddd] op Reg[rrrr] (bitwise logical operation on all bits)
SXBn: sign-extend from byte n. Sign-extends the source register starting from the most significant bit of byte n. Specifically:
  SXB0: s = Reg[rrrr][7];  Reg[dddd] = { 24'bss...s, Reg[rrrr][7:0] }
  SXB1: s = Reg[rrrr][15]; Reg[dddd] = { 16'bss...s, Reg[rrrr][15:0] }
  SXB2: s = Reg[rrrr][23]; Reg[dddd] = { 8'bssssssss, Reg[rrrr][23:0] }
Note that any bits left of the sign bit in the original source register will be overwritten by the sign extension procedure.

NAME: Arithmetic Operation with Immediate
OPCODE: 0001
Instruction format: four 4-bit fields at bits 15-12, 11-8, 7-4, and 3-0 (field labels not legible in the source).
Generic arithmetic operation. Same behavior as Opcode 0000, but the second operand (Reg[rrrr]) has been replaced with a zero-extended immediate value.
  Reg[dddd] = Reg[dddd] op { 28'h0000000, iiii }   for operations with two operands
  Reg[dddd] = op { 28'h0000000, iiii }             for operations with one operand (NEG, NOT)
Note that the sign-extend instructions are not supported with an immediate format because they simply do not make sense.

    EEEE:      0000  0001  0010  0011  0100  0101  0110
    Operation: ADD   SUB   MUL   NEG   SHL   SHRL  SHRA
    EEEE:      1000  1001  1010  1011  1100  1101  1110
    Operation: AND, OR, XOR (alignment of names to codes lost in the source)

NAME: << Not Defined >>
OPCODE: (not legible in the source)
This portion of the opcode space has not been defined and has been reserved for future use. During the final phase of the project, you may choose to augment/extend the ISA with new instructions to yield a more efficient design. This would be the logical place to define new instructions.

NAME: Conditional Branch
OPCODE: 0100, 0101, 0110
Instruction format: bits 15-12 and bits 11-0.
Conditional branch based on opcode-selected condition. If the condition is true (see below), the branch is taken. The taken target is computed by adding
the sign-extended displacement (left-shifted by one) to the fall-through address. The displacement (s-field) is effectively the number of instructions to branch forward or backward by, since all instructions are two bytes long. All conditions are based on the jump register (R14).
  FallThroughPC = PC + 2
  TargetPC = PC + 2 + (sign-extend(ssssssssssss) << 1)
  BEQZ (0100): PC = (Reg[14] == 32'b0...0) ? TargetPC : FallThroughPC
  BGEZ (0101): PC = (Reg[14] >= 32'b0...0) ? TargetPC : FallThroughPC
  BGTZ (0110): PC = (Reg[14] > 32'b0...0) ? TargetPC : FallThroughPC
Note: A "less than" condition can be generated by simply using a NEG instruction on the jump register prior to a BGEZ conditional branch. (Strictly, NEG followed by BGEZ tests for less than or equal to zero; NEG followed by BGTZ tests for less than zero.) Similarly, "not equal to zero" can be generated by using a CMPZ instruction prior to BEQZ.

NAME: Jump
OPCODE: 0111
Instruction format: bits 15-12, bits 11-8, and bits 7-0.
Jump to the address specified in the d-field register. Place the return address in the jump register.
  Reg[14] = PC + 2   (PC = the current PC of the jump instruction)
  PC = Reg[dddd]

NAME: Load
OPCODE: 1000
Instruction format: bits 15-12, 11-8, 7-4, 3-2 (unused), and 1-0 (zz).
Load a value from memory. The unused bits (3 and 2) should be set to zero; otherwise the behavior is undefined. The z-field determines the memory size of the load. All byte- and word-sized loads zero-extend the loaded result. Use the sign-extension instructions if necessary. Memory operations assume a little-endian format. Addresses are assumed to be aligned with the memory size; failure to provide aligned addresses may result in unexpected/undefined behavior.
  zz=0: byte (8-bit); zz=1: word (16-bit); zz=2: dword (32-bit); zz=3: undefined
  LDB: Reg[dddd] = { 24'b0...0, MEM[Reg[rrrr]] }
  LDW: Reg[dddd] = { 16'h0000, MEM[Reg[rrrr]+1], MEM[Reg[rrrr]] }
  LDD: Reg[dddd] = { MEM[Reg[rrrr]+3], MEM[Reg[rrrr]+2], MEM[Reg[rrrr]+1], MEM[Reg[rrrr]] }

NAME: Store
OPCODE: 1001
Instruction format: bits 15-12, 11-8, 7-4, 3-2 (unused), and 1-0 (zz).
Store a value to memory. The unused bits (3 and 2) should be set to zero; otherwise the behavior is undefined. The z-field determines the memory size of the store. Addresses are assumed to be aligned with the memory size; failure to provide aligned addresses may result in
unexpected/undefined behavior.
  zz=0: byte (8-bit); zz=1: word (16-bit); zz=2: dword (32-bit); zz=3: undefined
  STB: MEM[Reg[rrrr]] = Reg[dddd][7:0]
  STW: { MEM[Reg[rrrr]+1], MEM[Reg[rrrr]] } = Reg[dddd][15:0]
  STD: { MEM[Reg[rrrr]+3], MEM[Reg[rrrr]+2], MEM[Reg[rrrr]+1], MEM[Reg[rrrr]] } = Reg[dddd]

NAME: Load with Stack Pointer
OPCODE: 1010
Same as the regular load instruction, except that the address is calculated by adding an immediate value to the value in the stack pointer (R15). The z-field determines the memory size of the load. Addresses are assumed to be aligned with the memory size; failure to provide aligned addresses may result in unexpected/undefined behavior.
Instruction format: bits 15-12, 11-8, 7-2, and 1-0.
  addr = Reg[15] + { 26'b0...0, ssssss }
  LDSB: Reg[dddd] = { 24'b0...0, MEM[addr] }
  LDSW: Reg[dddd] = { 16'b0...0, MEM[addr+1], MEM[addr] }
  LDSD: Reg[dddd] = { MEM[addr+3], MEM[addr+2], MEM[addr+1], MEM[addr] }

NAME: Store with Stack Pointer
OPCODE: 1011
Same as the regular store instruction, except that the address is calculated by adding an immediate value to the value in the stack pointer (R15). The z-field determines the memory size of the store. Addresses are assumed to be aligned with the memory size; failure to provide aligned addresses may result in unexpected/undefined behavior.
Instruction format: bits 15-12, 11-8, 7-2, and 1-0.
  addr = Reg[15] + { 26'b0...0, ssssss }
  STSB: MEM[addr] = Reg[dddd][7:0]
  STSW: { MEM[addr+1], MEM[addr] } = Reg[dddd][15:0]
  STSD: { MEM[addr+3], MEM[addr+2], MEM[addr+1], MEM[addr] } = Reg[dddd]

NAME: Move Immediate to Byte n
OPCODE: 1100, 1101, 1110, 1111
Instruction format: bits 15-12, 11-8, and 7-0.
Move an immediate value (only the 8 bits, no sign or zero extension) into byte position n of the destination register.
  MIB0 (1100): Reg[dddd] = { Reg[dddd][31:8], 8'biiiiiiii }
  MIB1 (1101): Reg[dddd] = { Reg[dddd][31:16], 8'biiiiiiii, Reg[dddd][7:0] }
  MIB2 (1110): Reg[dddd] = { Reg[dddd][31:24], 8'biiiiiiii, Reg[dddd][15:0] }
  MIB3 (1111): Reg[dddd] = { 8'biiiiiiii, Reg[dddd][23:0] }

Assembly Mnemonic Summary
Native instructions were shown in regular type (e.g., "ADD"); mnemonics that are not native instructions were shown in blue italics (e.g., "BLTZ"). The type styling is lost in this copy.

Basic Arithmetic Operations
ADD: Add two registers
ADDI: Add register with 4-bit immediate
AND: Bitwise AND of two registers
ANDI: Bitwise AND of register with 4-bit immediate
CMPZ: Compare with zero
MOV: Move one register to another
MUL:
  Multiply two registers
MULI: Multiply register with 4-bit immediate
NEG: Negate (2's complement) register
NEGI: Negate (2's complement) 4-bit immediate
NOT: Bitwise NOT/inversion of register
NOTI: Bitwise NOT/inversion of 4-bit immediate
OR: Bitwise OR of two registers
ORI: Bitwise OR of register with 4-bit immediate
SHL: Left shift of two registers
SHLI: Left shift of one register by a 4-bit immediate
SHRL: Logical right shift of two registers
SHRLI: Logical right shift of one register by a 4-bit immediate
SHRA: Arithmetic right shift of two registers
SHRAI: Arithmetic right shift of one register by a 4-bit immediate
SUB: Subtract two registers
SUBI: Subtract register with 4-bit immediate
SXB0: Sign extend from MSB of byte 0
SXB1: Sign extend from MSB of byte 1
SXB2: Sign extend from MSB of byte 2
XOR: Bitwise XOR of two registers
XORI: Bitwise XOR of register with 4-bit immediate

CS 3220, November 20, 2008

- What's the difference? Loosely (but the two are obviously related):
  - Verification: Are you building it correctly?
    - This is what engineers, programmers, etc. usually worry about.
      - Does the divider work correctly?
      - Does the robot explode if spoken to in French?
  - Validation: Are you building the correct thing?
    - This is what designers, marketing, etc. usually worry about.
      - Did the customer even need a division unit in their CPU?
      - What language did the Army specify self-destruct codewords in?

- If testing a structurally-implemented module, you can compare its output to a behavioral implementation.
- What if you only have one implementation?
  - Model checking
    - Use some higher-level description to describe behavior and/or properties of operation.
    - Feed the implementation and models to a model checker.
  - Model checkers make use of SAT solvers, constraint solvers, and other automated techniques; they typically either prove correctness or provide counterexample(s).

- For true black-box testing, full (or even close to full) coverage can be very hard to achieve in practice.
  - Consider
a 32-bit adder: two 32-bit inputs = 64-bit test vector
  - Brute force: 2^64 = 18,446,744,073,709,551,616
    - That's 18 quintillion input possibilities.
    - Assuming you can test 1 billion inputs per second, this would still take you over 584 years to verify.
  - Let's say you only have 6 months to verify the adder.
    - You'll only be able to verify 0.086% of the possible inputs.
  - 64-bit adder: 10.7 quintillion millennia for full verification
  - 128-bit SIMD adder: ha ha
  [A side note beginning "With a little more than ... processors you ..." is garbled in the source.]

- Randomly select n-bit vectors as stimuli.
- How many vectors should be used?
  - More vectors generally increase fault coverage.
- Should they be chosen with uniform distribution?
[The remainder of this slide is garbled in the source.]

- Circuits are usually not black boxes (grey-box testing), especially if we're the ones that built them in the first place.
- So make use of that knowledge.
  - [Garbled in the source] ... test both high and low.

- Generally more difficult to implement
  - Verilog: need to change the interface to export internal signals (can see them in the graphical timing viewer)
  - Hardware: very expensive

- A test vector is not just an n-bit input, but a sequence of n-bit inputs.
  - There are effectively an infinite number of possible input sequences.
  - Luckily, there are only a finite number of states (duh) as well as a finite number of state transitions.
  - Choose input sequences that:
    - (a) visit every state
    - (b) traverse every edge in the state transition diagram

[Figure: FSM state transition diagram.]
- An input sequence of 0 1 0 1 would visit all states and all edges.

[Figure: input 0101 produces output 0110. Is the FSM implemented correctly?]
- For this implementation: yes.
- For this implementation: no. Two edges are not exercised by the test sequence, and they're incorrect in this case. This implementation just happens to correctly output the first four bits.

- By knowing the implementation, you can craft specific sets of
input sequences to efficiently test the functionality of the FSM
- Without such knowledge, you can still test the FSM, but it is a lot harder to guarantee correctness
  - Need a lot more input sequences to provide confidence
- Partial information can still be used to reduce the number of test sequences
  - E.g., knowing the total number of states, although perhaps not the encoding or the exact transitions
  - Formal reasoning (e.g., pumping lemma) can reduce the number/length of required inputs

- Many systems contain Built-In Self-Testing (BIST)
  - On boot-up (or really whenever the system wants to), invoke the BIST to verify the correct operation of the system
  - Older systems would display memory checks at boot-up: Power-On Self Test, or POST
  - Current systems still do this; they just don't display everything anymore

- Normal verification techniques can be very slow
  - E.g., simulating thousands or millions of test vectors
  - Simulation is orders of magnitude slower than real HW
- Idea: do what you need to in simulation to get as much right as possible
  - Then let BIST run a much, much larger number of test inputs in much less time
  - Can build a simple PRNG and verifier

- If you add two random numbers, how do you know if the sum is correct?
  - Well, you would add them together
  - With what? An adder, right? But isn't that what we're testing? So how do you know the reference adder is correct?
- One approach: undo the operation
  - If you just added X+Y, then if you subtract out Y, do you get X back?
  - Assumes the error itself is not symmetric, i.e. that it does not also get undone by the subtraction operation
  - Requires an easily invertible function (e.g., not easy for multiplication)
- Consistency checks
  - odd+odd=even, odd+even=odd, even+even=even; signs; widths of operands
  - Exploit the structure of the circuit, numerical properties of the operation, etc.
- Reuse simpler components
  - To verify the multiplier, make the adder perform long addition
  - M*N = M+M+...+M (N times)
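The BIST checks described above (undoing the operation, parity consistency, and re-deriving a product by repeated addition) can be sketched in software. This is a minimal model under stated assumptions, not the course's BIST hardware: `random.Random` with a fixed seed stands in for a hardware PRNG, and `stuck_bit_add` is a hypothetical fault invented purely to show a failing case.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit datapaths


def good_add(x, y):
    return (x + y) & MASK


def good_sub(x, y):
    return (x - y) & MASK


def stuck_bit_add(x, y):
    """Hypothetical faulty adder: bit 7 of the sum is stuck at 0."""
    return good_add(x, y) & ~(1 << 7) & MASK


def check_add(add, sub, x, y):
    """Self-check one addition without a trusted reference adder."""
    s = add(x, y)
    undo_ok = sub(s, y) == x                    # subtract Y back out: do we get X?
    parity_ok = (s & 1) == ((x ^ y) & 1)        # odd+odd=even, odd+even=odd, ...
    return undo_ok and parity_ok


def run_bist(add, sub, seed=1234, vectors=1000):
    """Apply PRNG-generated operands; return the number of failing vectors."""
    rng = random.Random(seed)  # known seed, so the sequence is reproducible
    failures = 0
    for _ in range(vectors):
        x, y = rng.getrandbits(32), rng.getrandbits(32)
        if not check_add(add, sub, x, y):
            failures += 1
    return failures


def mul_by_repeated_add(add, m, n):
    """Reference product for checking a multiplier: M added N times."""
    acc = 0
    for _ in range(n):
        acc = add(acc, m)
    return acc
```

Note that the undo check relies on the subtractor not sharing the adder's fault, which is the asymmetry assumption stated in the notes.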
Use known results
- Don't use a PRNG for the input generator; rather, use known inputs
  - Or use the PRNG with a known seed so that you still know what test sequences it will generate
- Keep a table of expected outputs

- What to do next?
  - Need observation and control
- For our projects we can make use of the LCD and LEDs
  - but we directly wire specific signals to these outputs
  - also not convenient for observing a lot of information
  - e.g., all of the state inside the processor: could be many kilobytes of registers, flip-flops, etc.

[Figure: scan chain running from the Scan-In Port/Pin to the Scan-Out Port/Pin] Forms one gigantic shift register.

[Figure: scan register, with labels Normal Input, Normal Output, Scan Mode, From Previous Scan Register in Chain, To Next Scan Register in Chain] ... not all registers are connected into the chain.

September 16, 2008, Georgia Tech

- Last time we looked at very structural implementations of FSMs
  - This gives control over the implementation and maintains synthesizability
  - But sometimes you don't want to deal with each and every AND gate and inverter

    module myFSM(/* ports */);
      reg  [3:0] state;
      wire [3:0] state_n;

      // structural constructs for next-state and output
      assign state_n[0] = in1 ? in2 : in3;
      wire temp;
      and(temp, in1, in2, in4);
      assign out1 = temp;

      // state update (even this isn't strictly structural)
      always @(posedge clock)
      begin
        if (reset)
          state <= 4'b0000;
        else
          state <= state_n;
      end
    endmodule

- Use two always blocks
  - The first one corresponds to the implementation of the next-state and output logic
    - triggered on changes to any of its inputs
    - primarily for simulation efficiency
  - The second one is pretty much the same as before (state transfers)
    - triggered on clock and any other asynchronous inputs (e.g., reset)

2-bit gray-code counter: 00 -> 01 -> 11 -> 10; output y goes high when state is 2'b11. Structural version:

    module FSMfoo(clock, reset, y);
      input clock, reset;
      output y;
      reg  [1:0] state;
      wire [1:0] state_n;

      // output logic
      and(y, state[0], state[1]);
      // next-state logic
      buf(state_n[1], state[0]);
      not(state_n[0], state[1]);

      // state transition (active-low reset, matching the negedge trigger)
      always @(posedge clock or negedge reset)
      begin
        if (!reset)
          state <= 2'b00;
        else
          state <= state_n;
      end
    endmodule

The same FSM with a behavioral always block for the combinational logic. state_n is defined as a register now, which allows you to make procedural assignments; the compiler will detect that this "register" is not actually needed.

    module FSMfoo(clock, reset, y);
      input clock, reset;
      output y;
      reg y;                // y is assigned procedurally, so it must be a reg
      reg [1:0] state;
      reg [1:0] state_n;    // defined as a register now

      always @(state)
      begin
        // output logic
        y = state[0] & state[1];
        // next-state logic
        state_n[1] = state[0];
        state_n[0] = ~state[1];
      end

      // state transition
      always @(posedge clock or negedge reset)
      begin
        if (!reset)
          state <= 2'b00;
        else
          state <= state_n;
      end
    endmodule

Counter that just counts up; when the counter value reaches max, assert the output signal, but for a parameterizable number of cycles:

    module counter(clock, reset, out);
      parameter Width = 8;
      parameter HoldLength = 1;
      input clock, reset;
      output out;
      reg [Width-1:0] counter;
      reg [Width-1:0] counter_n;
      reg [Width-1:0] hold;
      reg [Width-1:0] hold_n;

      assign out = (hold != 0);

      always @(counter or hold)
      begin
        counter_n = counter + 1;
        if (hold == 0 && counter == (1 << Width) - 1)
          hold_n = 1;                 // set hold
        else if (hold >= HoldLength)  // held for HoldLength cycles: release
          hold_n = 0;
        else if (hold != 0)
          hold_n = hold + 1;          // still holding: increment
        else
          hold_n = hold;
      end

      always @(posedge clock or negedge reset)
      begin
        if (!reset) begin
          counter <= 0;
          hold <= 0;
        end
        else begin
          counter <= counter_n;
          hold <= hold_n;
        end
      end
    endmodule

    // example instantiation
    counter #(5, 12) myctr(clk, myctr_go, myctr_out);

Next-state logic written as a case statement (combine with `define for state names):

    reg [1:0] state;
    ...
      case (state)
        2'b00: state_n = 2'b01;
        2'b01: state_n = 2'b11;
        2'b10: state_n = 2'b00;
        2'b11: state_n = 2'b10;
        default: state_n = 2'b00;
      endcase
    ...
      if (reset)
        state <= 0;
      else
        state <= state_n;
    end

- In this course, we will implement a simple processor
  - The CPU will implement a simple ISA
  - Combination of some RISC and some CISC qualities
- Overview
  - 32-bit architecture
    - 32-bit registers, 32-bit PC, 32-bit memory
  - Compact instructions
    - Only 16 bits per instruction
    - Accumulator-style register specifications

- 16 general-purpose
registers, R0-R15, 32 bits each
  - R0 is hardwired to always equal zero
    - any writes to R0 are ignored/discarded
  - R14 is the jump register
    - used in conditional branches
  - R15 is the stack pointer
- Memory is a simple, flat 32-bit physical space
  - No translation (page tables, TLBs, etc.)
  - No protection or ownership
  - I/O is memory mapped

[Opcode table: 0000 = Arith, 0001 = Arith w/ Imm; the entries for 0010 through 1101 are not recoverable from the slide]

Arith: OPCODE = 0000
- Accumulator-style register specifiers
  - Only specify one destination and one source
  - The other source is the destination
  - Ex: ADD R2, R3 -> R2 = R2 + R3
  - This is more CISC-y (e.g., x86 uses this approach)
- The specific operation is specified by the E-field (bits 3:0)
  - MIPS (a RISC ISA) has a similar format

[E-field table: EEEE codes 0000-0111 and 1000-1111; operations recovered from the slide: AND, OR, NAND, NOR, XOR, SXB0, SXB1, SXB2; the code-to-operation mapping is not recoverable]

- ADD, SUB, MUL, XOR, AND, ...: they do what you expect
  - Reg[d] = Reg[d] op_E Reg[r]
- SHL, SHRL, SHRA: shift left/right, with two flavors of shift-right
  - SHRL: shift-right logical, MSBs filled with zeros
  - SHRA: shift-right arithmetic, MSBs filled with the original msb
- NEG, NOT: only one operand
  - Reg[d] = -Reg[r]
  - Reg[d] = ~Reg[r]
- SXBn: sign extend from byte n
  - these end up being useful for loading immediate values

- SXB0: sign-extend from byte 0
- If R1 contains 32'h1037A20C, then SXB0 R3, R1 gives:
    R1 = 00010000 00110111 10100010 00001100
    R3 = 00000000 00000000 00000000 00001100
- If R1 contains 32'h1037A2A6, then SXB0 R3, R1 gives:
    R1 = 00010000 00110111 10100010 10100110
    R3 = 11111111 11111111 11111111 10100110
- SXB1, SXB2 do the same, but take the sign from the MSB of byte 1 or byte 2 of the source:
    SXB1: 00010000 00110111 10100010 10100110
    SXB2: 00010000 00110111 10100010 10100110

Arith w/ Imm: OPCODE = 0001
- Same as
regular arithmetic, but the source is now a 4-bit immediate value
  - Ex: ADDI R2, 3 -> R2 = R2 + 3
  - The immediate value is zero-extended
  - Only lets you use a constant up to 15, but most constants are small
  - Would like longer immediates, but this is a tradeoff to keep the instruction format small and uniform in length
- Example: SUBI R7, 13 -> 0001 0111 1101 0001

Conditional branches (0110)
- BEQZ, BGEZ, BGTZ: taken branch if equal to zero, >= zero, > zero
  - Taken: PC = PC + 2 + signextend(s-field)
  - Not taken: PC = PC + 2 as usual
- If *what* is >= or > zero?
  - Implicitly use the jump register R14
  - More CISC-y style: some x86 instructions use EAX as source/dest without explicitly naming it
  - Tradeoff between flexibility (testing any register) versus branch distance (size of the s-field)

- BNEZ, BLTZ, BLEZ don't exist
  - Must use a two-instruction sequence
  - Instead of BNEZ 18, need to use:
      NOT R14, R14
      BEQZ 18
- Later, when we build an assembler, we will allow BNEZ, BLTZ and BLEZ mnemonics
  - The assembler is responsible for expanding them into the two-instruction sequence (sort of like preprocessor macros)

JMP
- PC <= Reg[d]; R14 <= PC + 2 (the jump register stores the return address)
  - Note the parallel assignment
- The return address can be saved for subroutine exit:
    JMP R6              (assume the function address is stored in R6)
    Store R14 -> memory
    ... do stuff ...
    Load R2 <- memory
    JMP R2              (return; ignore the address written to R14)

Loads and stores
- The z-field specifies the memory operation size
  - 00 = B (byte), 01 = W (word), 10 = D (double word)
- For byte and word loads, the upper bits are zero-extended
  - Use SXBn instructions if necessary
  - LDW R1, R2 -> R1 = {16'h0000, Mem[R2+1], Mem[R2]}
- Assumes aligned accesses
  - Behavior is undefined for unaligned accesses
- Stores are similar, but with no zero-extension issues

Stack-relative loads
- Similar to a regular load, but the register operand is replaced with the implicit stack pointer (R15) plus a zero-extended immediate displacement (s-field)
  - LDSW R3, 24 -> R3 = {16'h0000, Mem[R15+24+1], Mem[R15+24]}
- Convenient for quick accesses to local variables
  - Assumes the stack frame is 64 bytes or less
  - Assumes the stack grows upwards.
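The load semantics above can be modeled in a few lines. This is a minimal sketch, not the course's reference implementation: the byte order follows the `{16'h0000, Mem[R2+1], Mem[R2]}` formula in the notes, while the function names, the `bytearray` memory, and the treatment of D as a 4-byte access are assumptions made for illustration.

```python
# z-field sizes in bytes; "D" as 4 bytes is an assumption (32-bit registers).
SIZES = {"B": 1, "W": 2, "D": 4}


def load(mem, addr, size):
    """Zero-extended load: Mem[addr] is the least significant byte."""
    nbytes = SIZES[size]
    if addr % nbytes != 0:
        # The ISA leaves this undefined; the model simply refuses.
        raise ValueError("unaligned access")
    value = 0
    for i in range(nbytes):
        value |= mem[addr + i] << (8 * i)
    return value  # upper bits stay 0, i.e. zero extension


def load_stack(mem, regs, disp, size):
    """LDSx: address is the stack pointer R15 plus a zero-extended displacement."""
    return load(mem, regs[15] + disp, size)


if __name__ == "__main__":
    mem = bytearray(256)
    mem[0x40], mem[0x41] = 0x34, 0x12
    regs = [0] * 16
    regs[2] = 0x40
    regs[15] = 0x28
    print(hex(load(mem, regs[2], "W")))         # models LDW R1, R2
    print(hex(load_stack(mem, regs, 24, "W")))  # models LDSW R3, 24
```

A sign-extended result would need a following SXBn instruction, as the notes point out.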
Note that upward stack growth is different than x86, for example.

[Instruction format figure: MIB0, MIB1, MIB2, MIB3]

- Ideally we would like to directly say something like "MOV R1, 0xdeadbeef", which would cause the 32-bit literal to be placed in register R1
  - Note that our 16-bit instruction can't possibly do this in a single instruction
- Could get away with a single move-immediate instruction
  - but that requires multiple instructions with extra shifts/masks to load a full 32-bit value into a register
- We waste a bit of opcode space for four distinct instructions
  - when combined with the SXBn insts, you can load constants into registers relatively quickly
- Recall that for 4-bit immediates we can use the Arith w/ Imm instructions
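How the MIBn and SXBn instructions might combine to build constants can be sketched as follows. The `sxb` model follows the sign-extension examples given earlier in these notes. The `mib` model is an assumption: the slide's definition of MIBn did not survive, so the sketch supposes that MIBn writes an 8-bit immediate into byte n of the destination and leaves the other bytes unchanged.

```python
MASK32 = 0xFFFFFFFF


def sxb(n, value):
    """SXBn: keep bytes 0..n and fill the upper bits with the MSB of byte n."""
    width = 8 * (n + 1)
    low = value & ((1 << width) - 1)
    if low >> (width - 1):                   # sign bit of byte n is set
        low |= MASK32 & ~((1 << width) - 1)  # fill the upper bits with ones
    return low


def mib(n, reg_value, imm8):
    """ASSUMED MIBn semantics: replace byte n with imm8, keep the rest."""
    shift = 8 * n
    return (reg_value & ~(0xFF << shift) & MASK32) | ((imm8 & 0xFF) << shift)


if __name__ == "__main__":
    # Full 32-bit literal: four move-immediate-byte steps.
    r1 = 0
    for n, byte in enumerate([0xEF, 0xBE, 0xAD, 0xDE]):
        r1 = mib(n, r1, byte)
    print(hex(r1))

    # Small negative constant: one MIB0 followed by one SXB0.
    r2 = sxb(0, mib(0, 0, 0xFE))
    print(hex(r2))
```

Under that assumption, a small signed constant costs two instructions instead of four, which is one reading of "relatively quickly" in the notes.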