cs2214


Description

School: New York University
Department: Mechanical Engineering
Course: Computer Architecture and Organization
Professor: Haldun Hadimioglu
Term: Spring 2017
Name: Comparch Final Study Guide
Description: Covers the whole semester
Uploaded: 05/05/2017


Shlomi Oved CS-UY 2214 Comp Arch Notes

Lecture
• CS2214: Computer Architecture and Organization
• CS course
• Hardware course
  o Designing computers
    ▪ Designing microprocessors
    ▪ Designing memory
  o Design by using the state machine technique
    ▪ Any digital system can be designed this way.
      • Such as processors, GPUs, memory, I/O controllers, …
• Computer classification (based on speed, cost, power consumption, weight, size)
  o Supercomputers
    ▪ Scientific apps
  o Servers (mainframes)
  o Desktop computers
  o Embedded computers
• Computer architecture
  o The view of a computer by a machine language programmer
    ▪ A machine language is in terms of ones and zeros.
    ▪ Instruction set (command set), data representation, memory, input/output, control modes, …
      • Intel x86 → CISC (Complex Instruction Set Computer) architecture
      • MIPS (cable box), ARM (mobile) → RISC (Reduced Instruction Set Computer) architecture
      • We will design a MIPS-based system.
      • Intel Atom: Intel's low-power x86 line (Intel gave up on mobile processing to ARM).
• Computer organization is a set of resources that implement (support) the architecture.
  o Central Processing Unit (CPU) or core, memory, I/O controllers
  o The CPU runs machine language programs.
  [Figure: single-core computer. The processor (CPU with SRAM cache memory) connects over buses to DRAM memory and to I/O controllers for the keyboard, mouse, hard disk, and Flash-EPROM memory (SSD).]
• Designing a computer
  o MIPS-microprocessor-based computer → RISC
  o Use design principles
    ▪ Make the common case fast
    ▪ Smaller/simpler systems and concepts are faster/cheaper
    ▪ RISC systems!
  o Use the layered-computer-design concept
    ▪ Top-down design
1. Application Layer
  o Scientific apps
    a) Simple data elements: integers & real numbers (floating-point numbers)
    b) Arithmetic/logic operations
    c) Complex data elements: arrays (vectors, matrices)
    d) Stepping through elements of arrays
      ▪ Loops
2. Computational Method
  Determines the nature of operations & operands and how operations are initiated.
  a) Control flow: operations are performed according to a list of sequential operations. Control flow hides parallelism.
     Ex: A = B + C * D − E / F is run one operation at a time as a list of ADD, SUB, MUL and DIV operations.
  b) Data flow: an operation starts when its operands are ready. Parallelism is exposed.
     [Figure: data-flow graph in which inputs B, C, D, E, F feed the +, −, *, / operators to produce A.]
  c) Demand-driven: an operation is initiated when its result is demanded. Parallelism is exposed.
  d) Systolic computing: limited to scientific applications; a poor fit for other applications (web and others).
  e) Neural computing: emulates neurons using transistors.
3. Algorithms: an algorithm is a mechanical procedure that accepts inputs and generates results after performing a finite number of steps.
  o Sequential algorithms!
4. High-Level Language: programs implementing algorithms, where steps are converted to statements.
  o Sequential programs
  o FORTRAN, C
5. Operating Systems: needed to abstract (hide) hardware details
  • Unix, Linux, …
6. Computer Architecture: machine language instructions, memory, input/output and others form the computer architecture.
  o Sequential instructions
7. Computer Organization (Microarchitecture): the set of resources that support the computer architecture
  o CPU, memory, I/O controllers
  o One CPU → sequential system
8. Digital Logic: digital (logic) circuits implement the CPU, memory, I/O controllers, etc.
  o Gates & flip-flops form digital circuits
  o A flip-flop stores a bit
9. Transistor: digital circuits (gates & flip-flops) are implemented by transistor circuits.
  (Full parallelism is available with Digital Logic/Transistor-level FPGA chips → reconfigurable chips: chips that can run programs without software.)

Recitation (Joe, JAB995@nyu.edu), Comp Arch Section B1
• Number systems
  • Binary (base 2): unsigned (non-negative only) and 2's complement (positive and negative numbers)
  • Hexadecimal (base 16): digits 0-9, A-F
  • Decimal (base 10)
  • Positional value: d_n*10^n + d_(n-1)*10^(n-1) + d_(n-2)*10^(n-2) + … + d_1*10^1 + d_0*10^0
• Decimal to binary (unsigned)
  • (50)_10 → (?)_unsigned
  • 50/2 = 25, R = 0   Least Significant Bit (LSB)
  • 25/2 = 12, R = 1
  • 12/2 = 6,  R = 0
  • 6/2 = 3,   R = 0
  • 3/2 = 1,   R = 1
  • 1/2 = 0,   R = 1   Most Significant Bit (MSB)
  • (50)_10 → (110010)_unsigned
• Binary to decimal
  • (110010)_unsigned
  • Generalized formula (b = base): d_n*b^n + d_(n-1)*b^(n-1) + … + d_1*b^1 + d_0*b^0
  • 1*2^5 + 1*2^4 + 0*2^3 + 0*2^2 + 1*2^1 + 0*2^0
  • 32 + 16 + 2 = (50)_10
• 2's complement
  • Look at the MSB:
    1. If it is 0, the number is positive.
    2. If it is 1, the number is negative.
  • (50)_10 → (?)_2's complement
  • The unsigned conversion gives 110010. Its MSB is 1, which would read as negative, so a 0 must be added on the left because 50 is not negative.
  • (0110010)_2's complement = (50)_10
  • For unsigned and positive 2's complement numbers, appending zeros to the left of the MSB does not change the number.
• Negative decimal to 2's complement
  • (−50)_10
  1. Convert the negative decimal number into a positive decimal number: (−50)_10 → (50)_10
  2. Get the binary form of the positive number: (50)_10 → (0110010)_2 (the extra leading zero is there because this is 2's complement and the value is positive 50, not negative 50).
  3. Convert the binary number into its negative:
    o Flip the bits (zeros become ones and ones become zeros): 0110010 → 1001101
    o Add 1: 1001101 + 0000001 = (1001110)_2's complement
  • (−50)_10 → (1001110)_2's complement
• Negative 2's complement to decimal
  • (1001110)_2's complement
  1. Convert to the positive 2's complement number (flip the bits and add 1):
     1001110 → 0110001, then 0110001 + 0000001 = (0110010)_2's complement
  2. Read the magnitude: 2^5 + 2^4 + 2^1 = 32 + 16 + 2 = 50, so the original number is (−50)_10.
• Decimal to hexadecimal (base 16)
  Decimal  Hex  Binary
  0        0    0000
  1        1    0001
  2        2    0010
  3        3    0011
  4        4    0100
  5        5    0101
  6        6    0110
  7        7    0111
  8        8    1000
  9        9    1001
  10       A    1010
  11       B    1011
  12       C    1100
  13       D    1101
  14       E    1110
  15       F    1111
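The conversion steps in this recitation can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`to_unsigned_binary`, `to_twos_complement`, `from_twos_complement`) and the explicit `width` parameter are assumptions made for the example, not something defined in the course handouts.

```python
def to_unsigned_binary(n: int) -> str:
    """Repeated division by 2; remainders come out LSB first."""
    if n < 0:
        raise ValueError("unsigned conversion needs a non-negative number")
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))  # MSB first


def to_twos_complement(n: int, width: int) -> str:
    """Encode n in `width` bits: flip the bits of |n| and add 1 when n < 0."""
    if not -(1 << (width - 1)) <= n < (1 << (width - 1)):
        raise ValueError("value does not fit in the requested width")
    if n >= 0:
        return to_unsigned_binary(n).zfill(width)
    magnitude = to_unsigned_binary(-n).zfill(width)
    flipped = "".join("1" if b == "0" else "0" for b in magnitude)
    return to_unsigned_binary(int(flipped, 2) + 1).zfill(width)[-width:]


def from_twos_complement(bits: str) -> int:
    """Decode a 2's complement bit string; MSB = 1 means negative."""
    if bits[0] == "0":
        return int(bits, 2)
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    return -(int(flipped, 2) + 1)


if __name__ == "__main__":
    print(to_unsigned_binary(50))           # 110010
    print(to_twos_complement(-50, 7))       # 1001110
    print(from_twos_complement("1001110"))  # -50
    print(format(50, "X"))                  # 32 (hexadecimal)
```

The width has to be chosen by the caller because, as the notes point out, the same bit pattern 110010 means 50 as an unsigned number but a negative value in 6-bit 2's complement.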






• (50)_10 to hexadecimal
• Method 1
  1. Convert decimal to binary
  2. Convert binary to hex
  • Step 1: (50)_10
    50/2 = 25, R = 0 (LSB)
    25/2 = 12, R = 1
    12/2 = 6,  R = 0
    6/2 = 3,   R = 0
    3/2 = 1,   R = 1
    1/2 = 0,   R = 1 (MSB)
  • Step 2: (110010)_2 → 11 0010 → 0011 0010 (pad the leftmost group to 4 bits) → 0011 = 3, 0010 = 2, therefore (50)_10 ↔ (32)_16 ↔ 0x32
• Method 2 (repeated division by 16)
  50/16 = 3, R = 2 (LSB)
  3/16 = 0,  R = 3 (MSB)
  (32)_hex
• (−50)_10 → (11001110)_2's complement → (CE)_hex
• Unsigned binary addition
  • (50)_10 + (25)_10 = (75)_10
  • (25)_10 → (11001)_unsigned
  • 110010 (50) + 011001 (25) = 001011 with a carry out of 1
  • A carry out of 1 means overflow in unsigned binary addition.
• 2's complement subtraction
  • (50)_10 − (25)_10 = 50 + (−25)
  • Operands: 0110010 (50) and 0011001 (25)
  1. Change the sign of the second binary number (flip the bits and add 1)
  2. Perform an addition
  • Step 1: 0011001 → 1100110, then + 1 → 1100111 (−25)
  • Step 2: 0110010 + 1100111 = 0011001 (with a carry out of 1, which is ignored) = (25)_10
• Overflow (2's complement)
  1. An addition of a positive and a negative 2's complement number NEVER overflows.
  2. Overflow: two positive numbers are added and the result is negative.
  3. Overflow: two negative numbers are added and the result is positive.

Lecture 1/30/17
• Computer layers
  1. Applications
  2. Computational Method
  3. Algorithms
  4. High-Level Language
  5. Operating Systems
  6. Computer Architecture (HW/SW interface)
  7. Computer Organization
  8. Digital Logic
  9. Transistor

Designing a MIPS-Based Computer
1. Applications: scientific
  • Number crunching
  a) Simple data elements:
    o Integers, floating-point numbers
  b) Operations on them:
    o Arithmetic/logic
  c) Complex data elements:
    o Arrays
  d) Operations on them:
    o Stepping through elements
      ▪ Loops
Developing programs
▪ A compiler translates a high-level language (C/C++/Python/…) program into a special file called an object file:
  o The object file has:
    o The machine language program (text segment)
    o Global (static) data
    o Information needed to link the program to other programs
▪ The object file is stored on the disk.
▪ The linker links separately compiled programs and libraries, generates an executable file, and stores it on the disk.
▪ The loader loads the executable from disk into memory to run the software (app).
▪ DLLs = Dynamically Linked Libraries
▪ Then the CPU runs the software (app) by accessing memory.
6. Computer Architecture
  o The view of a computer by a machine language programmer:
  o Machine language instructions, data representation, memory, input/output, …
  1. Data Representation
    o Word length = 32 bits
    o Largest integer size
    I. Integers (1 byte = 8 bits)
      o 1-/2-/4-byte unsigned numbers
      o 1-/2-/4-byte 2's complement numbers
    II. Floating-point
      o 32-bit (single-precision) FP numbers
      o 64-bit (double-precision) FP numbers
      o IEEE-754 FP standard
    III. Characters
      o 8-bit ASCII code
      o 16-bit Unicode
  2. Register Model
    o A register keeps information
    o Registers keep data and addresses
    o They are faster than memory
    o They keep operands/results of arithmetic/logic (A/L) operations
    o Registers are in the CPU & I/O controllers
    I. 32 32-bit integer registers (General-Purpose Registers = GPRs); they also hold addresses
      o R0, R1, R2, …, R31
      o R0 is always 0
      o R31 keeps the return address from functions
    II. Two 32-bit registers to keep the results of integer MUL & DIV (Hi, Lo). Together they keep 64 bits to avoid overflow.
    III. 32 32-bit FP registers
      o F0, F1, F2, …, F31
      o F(even), F(even+1) form a pair → ex: (F0, F1)
    IV. A 1-bit Cond register to keep the result of FP compares
      o It is a flag
    V. A 32-bit instruction pointer, pointing at the next instruction to run
      o Program Counter (PC) (the control flow)
    VI. Registers for the OS
      o System registers
      o Status register
    VII. Registers for I/O controllers
      o Status (condition) register
      o Data register

Lecture 2/1/17
Designing a MIPS-Based Computer
1. Applications: scientific
  • Number crunching
2. Computer Architecture
  a. Data Representation
  b. Register Model & Word Length
    • The CPU & I/O controllers have registers
  c. Input/Output
    • Input ≡ data transfer from an I/O device to memory
    • Output ≡ data transfer from memory to an I/O device
    • Memory ↔ I/O controllers ↔ I/O devices (buses and ports)
    • Methods & forms of data transfers, and what happens on I/O completions & I/O problems
      o How much is the CPU involved?
        ▪ The CPU should interact with the I/O controllers so that they can continue independently on their own.
        ▪ How does the CPU point at them?
        ▪ Memory-mapped I/O ≡ I/O controllers are treated like memory ≡ one set of instructions for both memory and I/O controllers (LW, load from memory, and SW, store to memory, are used for both).
          [Figure: one address space shared by memory and the I/O controllers.]
        ▪ With virtual memory, the memory space size is not affected.
  d. Control Modes: a control mode indicates which operations & addresses are applicable at the moment to the user
    o User vs. system (kernel) modes
    o The state of the program must be saved before switching to the other mode & restored to resume the program.
    o State: PC (Program Counter), GPRs (General-Purpose Registers; software conventions reduce the number of registers that need to be saved), flags, …
    o Management of the transition (switch) between modes
  e. Interrupts: an interrupt is an event that forces the CPU to stop running the current program & start running an interrupt program (function, handler)
    I. External: reset, timer, I/O controller, …
    II. Internal ≡ exceptions: arithmetic overflow, invalid address
    III. Software ≡ traps ≡ system calls
  f. Addressing ≡ accommodating memory & I/O controllers ≡ accommodating instructions + data + addresses
    o 32 address bits → 2^32 bytes (4 GB)
    o Memory is a 2-D array
      ▪ Rows ≡ locations
        • Locations have unique identifiers = addresses
      ▪ Columns ≡ bits/location
        • 32 bits/location
      ▪ Byte addressing
        • Big-endian addressing (left to right)
          Location   Byte addresses
          100        100 101 102 103
          104        104 105 106 107
          108        108 109 10A 10B
          10C        10C 10D 10E 10F
      ▪ No word boundary crossing (an item cannot cross into another location)
        • The address of an item is divisible by the length of the item in bytes.
        • Sizes of the address space portions:
          User: 2 GB, locations 0 to 7FFFFFFC
          System: 2 GB, locations 80000000 to FFFFFFFC (memory and I/O controllers)

2/3/17 Recitation
Digital Logic and Operators
AND (X*Y = Output)
Truth Table
  X  Y  Output
  0  0  0
  0  1  0
  1  0  0
  1  1  1






OR (X+Y = Output)
Truth Table
  X  Y  Output
  0  0  0
  0  1  1
  1  0  1
  1  1  1

NOT (X' = Output)
Truth Table
  X  Output
  0  1
  1  0

NAND ((X*Y)' = Output)
Truth Table
  X  Y  Output
  0  0  1
  0  1  1
  1  0  1
  1  1  0





NOR ((X+Y)' = Output)
Truth Table
  X  Y  Output
  0  0  1
  0  1  0
  1  0  0
  1  1  0


XOR (exclusive-or) (X⨁Y = Output)
(When there is an odd number of ones among the inputs, the output is a one.)
Truth Table
  X  Y  Output
  0  0  0
  0  1  1
  1  0  1
  1  1  0

Ex: three-input XOR
  X  Y  Z  Output
  0  0  1  1
  0  1  1  0
  1  1  1  1
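The truth tables above can be regenerated mechanically. The sketch below is only an illustration in Python; the `GATES` dictionary and the helper names `truth_table` and `xor_n` are made up for this example.

```python
from itertools import product

# Two-input gates on single bits (0 or 1).
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "NAND": lambda x, y: 1 - (x & y),
    "NOR": lambda x, y: 1 - (x | y),
    "XOR": lambda x, y: x ^ y,
}


def not_gate(x: int) -> int:
    """Single-input NOT."""
    return 1 - x


def truth_table(gate):
    """Rows (X, Y, Output) in the same order as the tables in the notes."""
    return [(x, y, gate(x, y)) for x, y in product((0, 1), repeat=2)]


def xor_n(*bits: int) -> int:
    """Multi-input XOR: outputs 1 when an odd number of inputs are 1."""
    return sum(bits) % 2


if __name__ == "__main__":
    for name, gate in GATES.items():
        print(name, truth_table(gate))
    print(xor_n(0, 0, 1), xor_n(0, 1, 1), xor_n(1, 1, 1))  # 1 0 1
```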

Software Conventions for Register Usage
  R0: always 0 (read only)
  R1: assembler temporary register
  R2-R3: function results
  R4-R7: function arguments (parameters)
  R8-R15: temporary registers
  R16-R23: temporary registers (saved by the callee)
  R24-R25: temporary registers

C-like
  int a, b, c;
  a = b + c;

MIPS/EMY Assembly Language Program
  LW   $t1, b
  LW   $t2, c
  ADD  $t3, $t1, $t2
  SW   $t3, a
  b:  .word 0x65E
  c:  .word 0x2C
  a:  .word 0x0

MIPS/EMY Mnemonic Machine Language Program
  400000  LW  R9, 0(R8)       # R9 ← M[10000000], R9 ← 65E
  400004  LW  R10, 4(R8)      # R10 ← M[10000000 + 4], R10 ← 2C
  400008  ADD R11, R9, R10    # R11 ← R9 + R10, R11 ← 65E + 2C
  40000C  SW  R11, 8(R8)      # M[R8 + 8] ← R11, M[10000008] ← 68A

Global Data
  10000000  65E
  10000004  2C
  10000008  68A
  Note: R8 has 10000000, so LW R10, 4(R8) performs R10 ← M[10000000 + 4].

  LW Rt, Offset(Rs)    Rs = source (base) register of the load; Rt ← M[Rs + Offset]
  ADD Rd, Rs, Rt       # Rd ← Rs + Rt
  SW Rt, Offset(Rs)    M[Rs + Offset] ← Rt
A word is 32 bits, so consecutive words are 4 bytes apart in memory (4*8 = 32 bits).
ADD R11, R9, R10: R11 ← R9 + R10
65E + 2C = 0110 0101 1110 + 0000 0010 1100 = 0110 1000 1010 = 68A, so R11 = 68A

Handout #4
Note 3:
- Memory is passive
- Memory can't perform architectural operations on data
- It can only keep instructions and data
- Another unit is implied in machine language programs to manipulate memory locations
- This implied unit is visible in the microarchitecture (the CPU)
Note 4:
RISC computers
- Require explicit instructions (LW) to read data from memory to the CPU
- Similarly for storing data to memory (SW)
- Only LOADs and STOREs can access memory
CISC computers
- Have more complex arithmetic/logic/floating-point instructions, which can specify memory locations rather than registers
RISC (see the table on Handout 4, Note 8)
- There are 7 memory accesses in that table (4 instruction fetches, 2 data reads, 1 data write).
CISC, however: ADD M[z] M[x] M[y]
- 1 memory access for instruction read, 2 for data read, 1 for data write
- Only 4 memory accesses
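To make the count of 7 memory accesses concrete, the four-instruction load/store program for a = b + c can be traced with a toy interpreter. This is a rough sketch in Python, not the EMY CPU itself: the tuple encoding of instructions and the function name `run_load_store_program` are assumptions for illustration, while the addresses and data values are the ones used in the notes.

```python
def run_load_store_program():
    """Trace LW/LW/ADD/SW for a = b + c and count memory accesses."""
    regs = {8: 0x10000000, 9: 0, 10: 0, 11: 0}   # R8 holds the data base address
    mem = {0x10000000: 0x65E,                    # b
           0x10000004: 0x2C,                     # c
           0x10000008: 0x0}                      # a
    program = [
        ("LW", 9, 0, 8),      # LW  R9, 0(R8)
        ("LW", 10, 4, 8),     # LW  R10, 4(R8)
        ("ADD", 11, 9, 10),   # ADD R11, R9, R10
        ("SW", 11, 8, 8),     # SW  R11, 8(R8)
    ]
    accesses = 0
    for op, a, b, c in program:
        accesses += 1                            # instruction fetch (IF)
        if op == "LW":                           # a = Rt, b = offset, c = Rs
            regs[a] = mem[regs[c] + b]
            accesses += 1                        # data read
        elif op == "SW":                         # a = Rt, b = offset, c = Rs
            mem[regs[c] + b] = regs[a]
            accesses += 1                        # data write
        elif op == "ADD":                        # a = Rd, b = Rs, c = Rt
            regs[a] = (regs[b] + regs[c]) & 0xFFFFFFFF
    return regs, mem, accesses


if __name__ == "__main__":
    regs, mem, accesses = run_load_store_program()
    print(hex(mem[0x10000008]), accesses)  # 0x68a 7
```

Only the LW and SW branches touch `mem` for data, which is the load/store property of RISC described in Note 4.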


02/06/17 Lecture
Designing a MIPS-Based Computer
1. Applications
  • Scientific apps
    o Number crunching operations
2. Computer Architecture
  a. Data Representation & Word Length
  b. Register Model
  c. Input/Output
  d. Control Modes
  e. Interrupts
  f. Addressing
  g. Machine Language Instructions ≡ Machine Language Instruction Set ≡ architectural operations, arguments & addressing modes
    • An application specifies operations
    • App operations are implemented by high-level language (HLL) statements
    • HLL statements are implemented by machine language (ML) instructions
    • An app operation is implemented by multiple HLL statements.
    • A single HLL statement is implemented by multiple ML instructions.
    • An HLL statement has variables & operation(s)
    • An ML instruction has architectural operations, arguments & addressing modes
    • Is memory needed to keep variables?
      o Each variable is a memory location
        ▪ Static (global) & local variables
      o Today: memory can't perform operations, so we need CPUs
a = b + c: an HLL statement with 3 variables and an operation
1. CISC instruction with 3 memory accesses for data (ONLY ONE ML INSTRUCTION!):
   ADD (R8) (a), (R2) (b), (R10) (c)
2. RISC instructions (4 ML instructions):
   LW   R9, 0(R8)
   LW   R10, 4(R8)
   ADD  R11, R9, R10
   SW   R11, 8(R8)
Ex: CISC vs. RISC
  a = b + c
  d = a + e
  f = d + b + c
  CISC: 10 memory accesses for data (actually 12, since it can't add 3 operands at once)
  RISC, worst case: 8 memory accesses for data
  RISC, best case: 3 memory accesses for data
  RISC, usual case: 6 memory accesses for data
▪ Arithmetic/logic CISC instructions access memory for data, but arithmetic/logic RISC instructions do not.
  o RISC ≡ L/S architecture (only load and store instructions access memory for data).
▪ Machine language instructions ≡ 1s & 0s
▪ Mnemonic machine language instructions
  o A mnemonic for each architectural operation
  o All instruction & data addresses are specified
  o All addresses & data elements are specified in hexadecimal coding
  o Special characters specify registers & addressing modes
▪ MIPS instruction set
  o EMY instruction set ≡ 9 instructions
▪ Arithmetic/logic, data transfer, control instructions
▪ An ML instruction specifies an architectural operation, arguments & addressing modes.
▪ Each ML instruction uses an instruction format that indicates how to interpret its bits
  o Register, Immediate & Jump
▪ Instruction format fields
  o The opcode specifies the architectural operation
  o A number of register fields specify registers
  o If necessary, additional fields specify addressing modes
o 32-bit integer addition (2's complement)
  ▪ Syntax:
    • ADD Rd, Rs, Rt
  ▪ Semantics:
    • Rd ← Rs + Rt, then
    • if overflow, generate an internal interrupt (exception)
  ▪ Format, addressing modes, memory accesses
    • Format: Register format since Rd is needed
      Opcode (6 bits)  Rs (5 bits)  Rt (5 bits)  Rd (5 bits)  Shift Amount (5 bits)  Function (6 bits)
      000000                                                                         100000 ← ADD


• Function = 2nd opcode
• An addressing mode specifies how an argument is pointed at: by an address or by the instruction
• 3 arguments: Rd, Rs, Rt
  • Rd is a destination register directly indicated by the instruction → Register Addressing Mode
  • Rs & Rt are source registers directly specified by the instruction → Register Addressing Mode
• Number of memory accesses made (by the CPU) to run the instruction:
  • 1 memory access to fetch (read) the instruction

7. Machine Language Instructions ≡ Machine Language Instruction Set ≡ architectural operations, arguments & addressing modes
• 32-bit 2's complement integer addition
  ▪ Syntax:
    • ADD Rd, Rs, Rt
  ▪ Semantics:
    • Rd ← Rs + Rt, then if overflow, generate an arithmetic overflow internal interrupt (exception)
  • Format, addressing modes & memory accesses
    o Format: Register format since Rd is used
    o 3 arguments: Rd, Rs, Rt
      ▪ 3 addressing modes
      ▪ Rd is a destination argument, a register, directly specified by the instruction → Register Addressing Mode
      ▪ Rs & Rt are source arguments, registers, directly specified by the instruction → Register Addressing Mode
      ▪ 1 memory access made by the CPU to run the instruction (instruction fetch)
• 32-bit 2's complement integer subtraction
  ▪ Syntax: SUB Rd, Rs, Rt
  ▪ Semantics: Rd ← Rs − Rt, then if overflow, generate an arithmetic overflow internal interrupt (exception)
  • Format, etc.
    o Format: Register format since Rd is needed
    o The rest is the same as ADD

Register Format
  Field:  Opcode  Rs  Rt  Rd  Shift Amount  Function
  Bits:   6       5   5   5   5             6

  Opcode  Function  Instruction
  000000  100000    ADD
  000000  100010    SUB
  000000  101010    SLT
  000000  100100    AND
  000000  100101    OR

• 32-bit 2's complement compare (less than)
  o Syntax: SLT Rd, Rs, Rt
  o Semantics: if Rs < Rt then Rd ← 1 else Rd ← 0
  • Format, etc.
    o Format: R format since Rd is needed
    o 5 arguments: Rd, Rs, Rt, 0, 1
    o Rs, Rt, Rd: same as ADD; 0 & 1 are implied by the instruction → Implied Addressing Mode
    o 1 memory access made by the CPU to run the instruction: instruction fetch (read)
• 32-bit AND
  o AND Rd, Rs, Rt
  o Rd ← Rs & Rt (& → bitwise AND)
  o The rest is the same as ADD except for the function field
  o Ex: AND R8, R9, R10, that is, R8 ← R9 & R10
      R9   . . . 0 1 0 1 1 0
      R10  . . . 0 1 1 0 1 0
      R8   . . . 0 1 0 0 1 0

• 32-bit OR
  • OR Rd, Rs, Rt
  • Rd ← Rs | Rt (| → bitwise OR)
  • The rest is the same as ADD except for the function field
  • Ex: OR R8, R9, R10 → R8 ← R9 | R10
      R9   . . . 0 1 0 1 1 0
      R10  . . . 0 1 1 0 1 0
      R8   . . . 0 1 1 1 1 0

• These 32 bits are an instruction:
  • 0000 0001 0110 1001 0101 0000 0010 0000
  • Start by checking the leftmost 6 bits (the opcode)
  • It is 000000 (which indicates the Register format)
  • So we have to look at the last 6 bits (the function field)
  • Opcode: 000000
  • Rs: 01011 = R11
  • Rt: 01001 = R9
  • Rd: 01010 = R10
  • Shift Amount: 00000
  • Function: 100000
  • ADD R10, R11, R9
• 32-bit load from memory
  o LW Rt, Disp(Rs)
  o Rt ← M[Rs + Disp±]   (Disp± is the sign-extended displacement; Rs + Disp± is the effective address)
  • Format, etc.
    o Immediate format since a displacement is needed
    o 2 arguments: Rt is a destination argument, a register, directly specified by the instruction: Register Addressing Mode
    o A memory location is the source argument, whose address is calculated by adding Rs & the sign-extended displacement: 2-byte signed Displacement Addressing Mode
    o 2 memory accesses: 1 to fetch the instruction and 1 to read a data element
  • Ex:
      400000  LW  R9, 0(R8)      → R9 ← M[R8 + 0±] → R9 ← M[R8]
      400004  LW  R10, 4(R8)     → R10 ← M[R8 + 4±] → R10 ← M[R8 + 4]
      400008  ADD R11, R9, R10
      40000C  SW  R11, 8(R8)     → M[R8 + 8±] ← R11 → M[R8 + 8] ← R11
    with the data below and R8 pointing at the first word:
      10000000  65E  ← R8
      10000004  2C
      10000008  0
  • Equivalently, if R8 points at the last word (10000008), negative displacements are used:
      LW R9, (−8)_10(R8)
      LW R10, (−4)_10(R8)
      SW R11, 0(R8)

Immediate Format
  Field:  Opcode  Rs  Rt  Immediate
  Bits:   6       5   5   16

  Opcode 100011 → LW
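The Register and Immediate field layouts above are enough to decode an instruction word by shifting and masking. The following Python sketch assumes only the function codes and the LW opcode listed in these notes; the names `decode`, `R_FUNCTIONS` and `I_OPCODES` are illustrative, and any other instruction raises a `KeyError`.

```python
# Function codes for opcode 000000 (Register format) and the one
# Immediate-format opcode introduced so far in the notes.
R_FUNCTIONS = {0b100000: "ADD", 0b100010: "SUB", 0b101010: "SLT",
               0b100100: "AND", 0b100101: "OR"}
I_OPCODES = {0b100011: "LW"}


def decode(word: int) -> str:
    """Decode a 32-bit instruction word into mnemonic notation."""
    opcode = (word >> 26) & 0x3F      # leftmost 6 bits
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if opcode == 0:                   # Register format: look at the function field
        rd = (word >> 11) & 0x1F
        function = word & 0x3F        # rightmost 6 bits
        return f"{R_FUNCTIONS[function]} R{rd}, R{rs}, R{rt}"
    disp = word & 0xFFFF              # Immediate format: 16-bit field
    if disp & 0x8000:                 # sign-extend the displacement
        disp -= 0x10000
    return f"{I_OPCODES[opcode]} R{rt}, {disp}(R{rs})"


if __name__ == "__main__":
    # 0000 0001 0110 1001 0101 0000 0010 0000 from the notes
    print(decode(0x01695020))   # ADD R10, R11, R9
    print(decode(0x8D0A0004))   # LW R10, 4(R8)
    print(decode(0x8D09FFF8))   # LW R9, -8(R8)
```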

• 32-bit store
  o SW Rt, Disp(Rs)
  o M[Rs + Disp±] ← Rt   (Disp± is the sign-extended displacement; Rs + Disp± is the effective address)
  • Format, etc.
    o I format since a Disp is needed
    o 2 memory accesses: IF (instruction fetch) and DW (data write)

Recitation 2/10/17
• Handout 4, Page 4, #2
• Write a mnemonic machine language program that implements the following high-level statement:
  • a = b | c
  • The statement ORs two variables and stores the result in another variable.
  • Assume: variable "a" is in memory location 10000004,
            "b" is in R8 and contains 27,
            "c" is in memory location 10000000 and contains 4A0.
  • The program starts at location 400000. R9 contains 10000000 initially.
  • 400000  LW R10, 0(R9)     # Load from memory to register: R10 ← M[10000000], R10 ← 4A0
  • 400004  OR R11, R8, R10   # OR two registers: R11 ← R8 | R10, R11 ← 4A7
  • 400008  SW R11, 4(R9)     # Store from register to memory: M[10000004] ← R11, M[10000004] ← 4A7

  Instruction       PC      R8  R9        R10  R11  M[10000000]  M[10000004]  Mem. Accesses
  Initial           400000  27  10000000  ?    ?    4A0          0            -
  LW R10, 0(R9)     400004  NS  NS        4A0  NS   NS           NS           2: IF & DR
  OR R11, R8, R10   400008  NS  NS        NS   4A7  NS           NS           1: IF
  SW R11, 4(R9)     40000C  NS  NS        NS   NS   NS           4A7          2: IF & DW
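Going the other way, the three mnemonic instructions of this program can be packed into 32-bit words using the Register and Immediate formats. This is a hedged sketch in Python: the helper names `encode_r` and `encode_i` are invented here, and the opcode and function values used (100011 for LW, 101011 for SW, 100101 for OR) are the standard MIPS encodings that also appear in the format tables of these notes.

```python
def encode_r(rs: int, rt: int, rd: int, function: int, shamt: int = 0) -> int:
    """Register format: opcode 000000 | Rs | Rt | Rd | shift amount | function."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | function


def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Immediate format: opcode | Rs | Rt | 16-bit immediate/displacement."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# The a = b | c program from the recitation.
program = [
    encode_i(0b100011, rs=9, rt=10, imm=0),          # 400000  LW R10, 0(R9)
    encode_r(rs=8, rt=10, rd=11, function=0b100101),  # 400004  OR R11, R8, R10
    encode_i(0b101011, rs=9, rt=11, imm=4),          # 400008  SW R11, 4(R9)
]

if __name__ == "__main__":
    for address, word in zip((0x400000, 0x400004, 0x400008), program):
        print(f"{address:06X}  {word:08X}")
    print(hex(0x27 | 0x4A0))  # 0x4a7, the value stored into a
```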

Note 3: What if the base register of SW was R10 by mistake?
  SW R11, 4(R10): R10 = 4A0 → the store goes to M[4A4]
  SW Rt, Offset(Rs)

HW 1: Relevant Questions and Answers
Q1) The EMY machine language set does NOT have the following instruction:
  400000  ADDRM R8, 500(R9)    # R8 ← R8 + M[R9 + 500±]
i) Implement the instruction by using a few actual EMY instructions in mnemonic notation.
  400C00  LW R10, 500(R9)      # Load the memory operand to R10
  400C04  ADD R8, R8, R10      # Add the register and memory operand
ii) Assume this instruction is added to the EMY instruction set. Describe its syntax, semantics, format, etc.
  Syntax: ADDRM Rt, Disp(Rs)
  Semantics: Rt ← Rt + M[Rs + Disp±]; if overflow, generate an overflow exception
  Format: I-format because a displacement is needed:
    Field:  Opcode  Rs  Rt  Displacement
    Bits:   6       5   5   16

  Rt is both a source and the destination register. This one register argument therefore uses the Register Addressing Mode.
  The third argument is a memory location whose address is the sum of a register and a displacement. Therefore, the 2-byte signed Displacement Addressing Mode is used.
  Two memory accesses are made for this instruction: 1 for instruction fetch and 1 for data read.
  ADDRM stands for Add Register Memory.
R-format
    Field:  Opcode  Rs  Rt  Rd  Shift Amount  Function
    Bits:   6       5   5   5   5             6

Q2) 400300  ADDMRM R8, (R9, R10)   # M[R9] ← R8 + M[R9 + R10]
i) Implement this instruction:
  400300  ADD R11, R9, R10   # Calculate the effective address of the memory operand
  400304  LW  R12, 0(R11)    # Load the memory operand
  400308  ADD R13, R8, R12   # Add the register and memory operand
  40030C  SW  R13, 0(R9)     # Store the result into the other memory location
ii) Syntax: ADDMRM Rd, (Rs, Rt)
  Semantics: M[Rs] ← Rd + M[Rs + Rt]; if overflow, generate an overflow exception
  Format: R-format since 3 registers (Rs, Rt, Rd) are needed:
    Field:  Opcode  Rs  Rt  Rd  Shift Amount  2nd Opcode
    Bits:   6       5   5   5   5             6

  3 arguments are used by the instruction…
  We perform 3 memory accesses: 1 for instruction fetch, 1 for data read, and 1 for data write.
  ADDMRM stands for Add Memory and Register to Memory.

2/13/17 Lecture
• 32-bit store to memory
  o SW Rt, Disp(Rs)
  o M[Rs + Disp±] ← Rt   (Rs + Disp± is the effective address)
    Disp is a 16-bit signed displacement (in terms of bytes); Disp± is its sign-extended value
  o Format, etc.
    ▪ Immediate format, since we need a displacement
    ▪ 2 arguments: Rt is a source argument, a register, directly specified by the instruction: Register Addressing Mode. A memory location is the destination, whose address is calculated by adding Rs and the sign-extended displacement: 2-byte signed Displacement Addressing Mode.
    ▪ 2 memory accesses:
      ▪ 1 to fetch the instruction
      ▪ 1 to store data
• Control instructions
  o Skipping a few instructions (short range) conditionally: conditional branches
  o Long-range skipping unconditionally: jumps
• Branch if equal
  o BEQ Rs, Rt, Offset
  o If Rs = Rt then PC ← PC + Offset± * 4   (effective address)
  o Offset is a 16-bit signed offset in terms of words (locations)

Immediate Format
  Field:  Opcode  Rs  Rt  Immediate
  Bits:   6       5   5   16

  Opcode  Instruction
  101011  SW
  000100  BEQ

• Format, etc.:
  o I format since we need an offset
  o 4 arguments:
    o Rs & Rt …
    o The PC is implied by the instruction, so it uses the Implied Addressing Mode
    o An address is the source argument, calculated by adding the PC & the sign-extended, multiplied-by-4 offset: 2-byte signed PC-Relative Addressing Mode
  o 1 memory access to fetch the instruction
  o Ex:
      400E04  ADD
      400E08  SUB
      400E0C  BEQ R8, R9, 1
      400E10  OR
      400E14  AND
    If R8 = R9, we branch. We are skipping 1 instruction, so the offset is 1. The offset is counted from the instruction after the branch (400E10), so going back to ADD would need an offset of −3.
  o Branch taken vs. untaken
  o Ex: if a = b (R8 = R9) then c = d + e (R10 = R11 + R12) else f = g − h (R13 = R14 − R15)
      400100  BEQ R8, R9, 2
      400104  SUB R13, R14, R15
      400108  BEQ R8, R8, 1       (always taken, so it acts as an unconditional jump)
      40010C  ADD R10, R11, R12
      400110
  o 26-bit unconditional jump
    ▪ J Address
    ▪ PC ← PC[31:28], (Address * 4)   (effective address; the 26-bit address times 4 gives 28 bits)
      • Address is in terms of words (locations)
    ▪ Format, etc.
      • Jump format since a 26-bit address is needed
      • 2 arguments: the PC is the destination argument, implied by the instruction, so it uses the Implied Addressing Mode. A memory address is the source argument, calculated by multiplying the 26-bit address by 4 and attaching it to the leftmost 4 bits of the PC: 26-bit PC-Direct Addressing Mode.

Jump Format
  Field:  Opcode  Address
  Bits:   6       26

  Opcode 000010 → J

Ex:
  00400BE0  J 101000
  …
  00404000  ADD
  00404000 (effective address) → divide by 4 → 101000 (address field)
Ex:
  o 400100  BEQ R8, R9, 2
  o 400104  SUB R13, R14, R15
  o 400108  J 100044       (the result of dividing 400110 by 4)
  o 40010C  ADD R10, R11, R12
  o 400110
• 32-bit jump (jump register)
  o JR Rs
  o PC ← Rs
  o Format, etc.
    ▪ R format since Rs is needed, even though the I format could also be used
      • Opcode: 000000
      • Function: 001000
    ▪ 2 arguments: PC & Rs. The PC is implied by the instruction, so it uses the Implied Addressing Mode, and Rs uses the Register Addressing Mode.
    ▪ 1 memory access to fetch the instruction
  o JR R31 (R31 is the return address register)
  o Suppose the following function:
      4A00B0  ADD
      …
      4A0104  JR R31
  o JR R31 is the return from the function

2/15/17 Lecture
7. MIPS Machine Language Instructions
• ADD, SUB, SLT, AND, OR, LW, SW, BEQ, J (the EMY CPU will run them)
• JR → return from functions + …
• Jump to functions
  o Jump And Link
  o JAL Address
  o R31 ← PC, then PC ← PC[31:28], (Address * 4)   (effective address)
  o Format, etc.
    ▪ J format since we need a 26-bit address
    ▪ 4 arguments: R31, PC, PC, (PC[31:28], (Address * 4))
    ▪ …
    ▪ 1 memory access to fetch the instruction

Jump Format
  Field:  Opcode  Address
  Bits:   6       26

  Opcode 000011 → JAL
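The effective-address rule for J and JAL (keep the leftmost 4 bits of the PC, append the 26-bit address times 4) can be checked numerically. The Python sketch below uses hypothetical helper names (`jump_target`, `jump_field`, `jal`), and it assumes the simplified model of these notes in which the PC already points at the next instruction when the jump executes, so JAL saves the jump's address plus 4.

```python
def jump_target(pc: int, address_field: int) -> int:
    """PC <- PC[31:28], (Address * 4) for a J/JAL located at `pc`."""
    next_pc = pc + 4                       # assumption: PC already incremented
    return (next_pc & 0xF0000000) | ((address_field & 0x03FFFFFF) << 2)


def jump_field(target: int) -> int:
    """26-bit address field for a target: drop PC[31:28], divide by 4."""
    if target % 4:
        raise ValueError("jump targets must be word aligned")
    return (target & 0x0FFFFFFF) >> 2


def jal(pc: int, address_field: int):
    """Return (value saved in R31, new PC) for a JAL located at `pc`."""
    return pc + 4, jump_target(pc, address_field)


if __name__ == "__main__":
    print(hex(jump_field(0x00404000)))            # 0x101000
    print(hex(jump_target(0x400108, 0x100044)))   # 0x400110
    link, target = jal(0x400EA8, 0x102002)
    print(hex(link), hex(target))                 # 0x400eac 0x408008
```

Dividing by 4 loses nothing because instructions are word aligned: the two low bits of every instruction address are 00.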

Ex:
  Main program:
    400EA4  ADD
    400EA8  JAL 102002
    400EAC  SUB          (R31 gets the address of this instruction)
  Function:
    408008  OR
    …
    408E0C  JR R31
• Branch if not equal
  o BNE Rs, Rt, Offset
  o If Rs ≠ Rt then PC ← PC + Offset± * 4   (effective address; Offset± is the sign-extended offset)
  o I format since we need a 16-bit offset
  o The rest is the same as BEQ
  o 1 memory access to fetch the instruction

Immediate Format
  Field:  Opcode  Rs  Rt  Immediate
  Bits:   6       5   5   16

  Opcode  Instruction
  000101  BNE
  001000  ADDI
  001010  SLTI
  001100  ANDI
  001101  ORI
  001111  LUI
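For BEQ and BNE, the 16-bit offset field counts words relative to the instruction after the branch, which is why an offset of 1 skips exactly one instruction. A small Python sketch of that arithmetic follows; the function names `branch_offset` and `branch_target` are assumptions for illustration, and the "PC + 4" term models the PC having already advanced past the branch.

```python
def branch_offset(branch_pc: int, target: int) -> int:
    """16-bit offset field (as an unsigned value) for a branch at `branch_pc`."""
    delta = target - (branch_pc + 4)       # distance from the next instruction
    if delta % 4:
        raise ValueError("branch targets must be word aligned")
    offset = delta // 4                    # offsets are in words, not bytes
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError("target is out of range for a 16-bit offset")
    return offset & 0xFFFF                 # 2's complement, 16 bits


def branch_target(branch_pc: int, offset_field: int) -> int:
    """Address reached when the branch at `branch_pc` is taken."""
    if offset_field & 0x8000:              # sign-extend the 16-bit field
        offset_field -= 0x10000
    return branch_pc + 4 + offset_field * 4


if __name__ == "__main__":
    print(hex(branch_target(0x400E0C, 1)))           # 0x400e14 (skips one instruction)
    print(hex(branch_target(0x400E0C, 0xFFFD)))      # 0x400e04 (offset -3, back to ADD)
    print(hex(branch_offset(0x400F14, 0x40100C)))    # 0x3d
    print(hex(branch_offset(0x400F14, 0x400308)))    # 0xfcfc
```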

• a = b + 25 (How do we deal with constants?)
• 32-bit 2's complement immediate add
  o ADDI Rt, Rs, Imm
  o Rt ← Rs + Imm±, then if overflow, generate the internal interrupt signal (Imm± is the sign-extended Imm)
  o Format, etc.
    ▪ I format since we need a 16-bit immediate data element
    ▪ 3 arguments: Rt, Rs, immediate data element
    ▪ Rs, Rt: Register Addressing Mode
    ▪ The immediate data is obtained from the 16-bit Imm by sign extension: 2-byte signed Immediate Addressing Mode
    ▪ 1 memory access to fetch the instruction
  Ex: a = b + 25
    400A04  ADDI R8, R9, 19     (19 hex = 25 decimal)
▪ 32-bit 2's complement immediate compare
  o SLTI Rt, Rs, Imm
  o If Rs < Imm± then Rt ← 1 else Rt ← 0
  o Format, etc.
    ▪ I format since we need a 16-bit Imm data element
▪ 32-bit immediate AND
  o ANDI Rt, Rs, Imm
  o Rt ← Rs & Imm*   (Imm* is the zero-extended Imm)
  o I format since we need a 16-bit immediate data element
  o 3 arguments: Rs, Rt, immediate data element
  o Rs, Rt: Register Addressing Mode
  o The immediate data element is obtained by catenating 16 zeros to the left of the Imm field: 2-byte unsigned Immediate Addressing Mode
  ▪ Masking
      a:     010…0 1011 0111 1111 0100
      Imm*:  000…0 0000 1111 1111 0000
      ANDI:  000…0 0000 0111 1111 0000
      ORI:   010…0 1011 1111 1111 0100
▪ 32-bit immediate OR
  o ORI Rt, Rs, Imm
  o Rt ← Rs | Imm*
  o I format since we need a 16-bit immediate data element
  o 3 arguments: Rs, Rt, immediate data element
  o Rs, Rt: Register Addressing Mode
  o The immediate data element is obtained by catenating 16 zeros to the left of the Imm field: 2-byte unsigned Immediate Addressing Mode
▪ Load Upper Immediate
  ▪ Initializes the leftmost bits of a register with a constant.
  o LUI Rt, Imm
  o Rt ← Imm, 0000   (Imm goes into the upper 16 bits; the lower 16 bits are zeros)
  o I format since we need a 16-bit immediate data element
  o Rs is not used
  o Ex: a = b + (6789ABCD)_16
  o 40EF00  LUI R8, 6789          # R8 ← 6789,0000
  o 40EF04  ORI R8, R8, ABCD      # R8 ← R8 | 0000ABCD ≡ R8 ← 6789ABCD
• 32-bit NOR
  o NOR Rd, Rs, Rt
  o Rd ← NOT (Rs | Rt)
  o R format since we need Rd
  o Opcode: 000000
  o Function: 100111
  o (The rest is the same as AND/OR)
  o Ex: R8 ← NOT R8
      NOR R8, R8, R0
      NOR R8, R8, R8      # equivalent alternative

Recitation 2/17/17
▪ Handout 4- Page 4
3. Writing a function to perform absolute value
Notes: The number "k" is passed in parameter register R4. The result is returned in R2.
    400400  ADD R2, R4, R0      # R2 ← R4 + R0 ≡ R2 ← R4
    400404  SLT R8, R4, R0      # Is R4 < R0? ≡ Is "k" < 0?
    400408  BEQ R8, R0, 1       # No (R8 = 0): skip the SUB instruction. Yes (R8 = 1): perform SUB
    40040C  SUB R2, R0, R4      # R2 ← R0 − R4 ≡ negate R4
    400410  JR  R31
Set if less than: SLT Rd, Rs, Rt
    If Rs < Rt, Rd ← 1
    Else Rd ← 0

Assume all registers have values preloaded.
    If a = b (R8 = R9) then c = d + e (R10 = R11 + R12)
    else f = g − h (R13 = R14 − R15)
    k = m | p (R16 = R17 | R18)

    400A00  BNE R8, R9, 2        # a != b?
    400A04  ADD R10, R11, R12    # c = d + e
    400A08  J 100284             # skip the else part
    400A0C  SUB R13, R14, R15    # f = g − h
    400A10  OR  R16, R17, R18    # k = m | p
J 100284 → 0000 | 00 0001 0000 0000 0010 1000 0100 | 00 → 00400A10 → 400A10

Jump and Branch
Jump Address (26 bits): PC ← PC[31-28], (Address ∗ 4)
    The leftmost 4 bits come from the PC; the remaining 28 bits (26 + 2 bits) come from Address ∗ 4.
[Figure: the 4 GB address space as sixteen 256 MB segments selected by PC[31-28] = 0000 (0), 0001 (1), 0010 (2), 0011 (3), …, 1111 (F). The first 8 segments are user space (2 GB); the last 8 are system space (2 GB).]

Ex:
    00400000  J 100A00
    PC[31-28]: 0000
    Address: | 00 0001 0000 0000 1010 0000 0000 | 00 → 0402800

Ex:
    040002C   SUB R12, R13, R14
    . . .
    40040C    J 940294 (jump down) / 10000B (jump up)
    . . .
    2500A50   ADD R8, R9, R10
1. Jump to address 2500A50 (implicit 0000 to the left)
   Convert to binary:
       0000      | 0010 0101 0000 0000 1010 0101 00 | 00
       PC[31-28] |   9    4    0    2    9    4      |
   Dividing by 4 is a right shift by 2 bits: 940294
2. Jump to address 040002C
       0000      | 0000 0100 0000 0000 0000 0010 11 | 00
       PC[31-28] |   1    0    0    0    0    B      |
   Dividing by 4 is a right shift by 2 bits: 10000B

Branch: PC ← PC + 4 + Offset± ∗ 4 (Effective Address)
    400308
    . . .
    400F14  BEQ R1, R2, 3D / FCFC
    . . .
    40100C
1. Branch to 40100C (from 400F18, because of PC + 4)
      0040100C → 0000 0000 0100 0000 0001 0000 0000 1100
    − 00400F18 → 0000 0000 0100 0000 0000 1111 0001 1000
   Convert the SUB to an ADD of the 2's complement of 00400F18 (invert every bit except the last 4, 1000):
      0000 0000 0100 0000 0001 0000 0000 1100
    + 1111 1111 1011 1111 1111 0000 1110 1000
    -----------------------------------------
      0000 0000 0000 0000 0000 0000 1111 0100   (carry out of 1; no overflow when adding a positive and a negative number)
   In hex: 000000F4 → F4 = Offset± ∗ 4
   Offset± = F4 / 4. Dividing by 4 shifts right two bit positions: 1111 0100 → 11 1101 = 3D
2. Branch to 400308 (see the first example for the SUB → ADD conversion)
      00400308 → 0000 0000 0100 0000 0000 0011 0000 1000
    − 00400F18 → 0000 0000 0100 0000 0000 1111 0001 1000
    -----------------------------------------
      1111 1111 1111 11 | 11 1111 0011 1111 0000   (the I format can only use a 16-bit offset)
   11 1111 0011 1111 0000 → F3F0 / 4 → 1111 1100 1111 1100 → FCFC

Lecture 02/22/17

02/24/17- Recitation
4. Write a function for multiplying 2 numbers, y and z, both of which are always greater than 0. Assume y & z are passed in R4 & R5 respectively.
    401050  ADD  R8, R0, R0        # We clear R8
    401054  ADD  R9, R4, R0        # We move R4, that is y, to R9
    401058  ADD  R8, R5, R8        # We add R5, z, to R8 (add z to the temporary result)
    40105C  ADDI R9, R9, (−1)10    # We subtract 1 from y
    401060  BNE  R9, R0, (−3)10    # Is it the end (is y equal to zero)? If not, go to 401058
    401064  ADD  R2, R8, R0        # The end. We move the result to R2 (software register conventions)
    401068  JR   R31               # We return from the function

Execution Table (4 ∗ 2)
    Instruction            PC      R2  R4     R5     R8  R9  R31     Memory Accesses
    Initial                401050  ?   (2)10  (4)10  ?   ?   401000  -
    ADD R8, R0, R0         401054  NS  NS     NS     0   NS  NS      1: IF
    ADD R9, R4, R0         401058  NS  NS     NS     NS  2   NS      1: IF
    ADD R8, R5, R8         40105C  NS  NS     NS     4   NS  NS      1: IF
    ADDI R9, R9, (−1)10    401060  NS  NS     NS     NS  1   NS      1: IF
    BNE R9, R0, (−3)10     401058  NS  NS     NS     NS  NS  NS      1: IF
    ADD R8, R5, R8         40105C  NS  NS     NS     8   NS  NS      1: IF
    ADDI R9, R9, (−1)10    401060  NS  NS     NS     NS  0   NS      1: IF
    BNE R9, R0, (−3)10     401064  NS  NS     NS     NS  NS  NS      1: IF
    ADD R2, R8, R0         401068  8   NS     NS     NS  NS  NS      1: IF
    JR R31                 401000  NS  NS     NS     NS  NS  NS      1: IF
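The jump and branch encodings worked through earlier in these notes follow two small formulas, which can be checked mechanically. The sketch below is only an illustration: the function names `jump_field`, `jump_target`, and `branch_offset` are made up for this example and do not come from the course handouts.

```python
def jump_field(target: int) -> int:
    """26-bit Address field of a J instruction: the target divided by 4."""
    return (target >> 2) & 0x3FFFFFF


def jump_target(pc: int, field: int) -> int:
    """PC <- PC[31-28], (Address * 4)."""
    return (pc & 0xF0000000) | (field << 2)


def branch_offset(branch_addr: int, target: int) -> int:
    """16-bit Offset field for a branch at branch_addr going to target.

    The offset is measured from PC + 4 and counted in words, so
    Offset = (target - (branch_addr + 4)) / 4, kept as a 16-bit pattern.
    """
    diff = target - (branch_addr + 4)
    if diff % 4 != 0:
        raise ValueError("target is not word aligned")
    offset = diff >> 2
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("offset does not fit in 16 bits")
    return offset & 0xFFFF


print(hex(jump_field(0x400A10)))
print(hex(branch_offset(0x400F14, 0x40100C)))
print(hex(branch_offset(0x400F14, 0x400308)))
```

A backward branch produces a negative word offset, which is why the 16-bit pattern for the second branch starts with 1s.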

a = 25 − b
    400000  SUB  R10, R0, R9       # b = R9; R10 ← 0 − b ≡ R10 ← (−b)
    400004  ADDI R8, R10, (19)16   # R8 ← (25)10 + R10 ≡ (25)10 + (−b); (19)16 ≡ (25)10
(Assume that b = 6)

Execution Table
    Instruction         PC      R8      R9  R10     Memory Accesses
    Initial             400000  ?       6   ?       -
    SUB R10, R0, R9     400004  NS      NS  (−6)10  1: IF
    ADDI R8, R10, 19    400008  (19)10  NS  NS      1: IF
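The two-instruction sequence above can be traced with a few lines of code. This is a minimal sketch, assuming 32-bit wraparound arithmetic and ignoring the overflow interrupt; the helper names (`sign_extend16`, `to_signed32`, `sub`, `addi`) are hypothetical and chosen only for this illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Interpret a 16-bit field as a 2's complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def to_signed32(value: int) -> int:
    """Interpret a 32-bit register value as a 2's complement number."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def sub(rs: int, rt: int) -> int:
    """Rd <- Rs - Rt, kept to 32 bits (overflow detection omitted)."""
    return (rs - rt) & MASK32


def addi(rs: int, imm16: int) -> int:
    """Rt <- Rs + sign-extended Imm, kept to 32 bits (overflow detection omitted)."""
    return (rs + sign_extend16(imm16)) & MASK32


b = 6
r10 = sub(0, b)        # R10 <- 0 - b
r8 = addi(r10, 0x19)   # R8 <- R10 + (19)16
print(hex(r10), to_signed32(r8))
```

The register R10 holds the bit pattern of −6, and adding the immediate (19)16 = (25)10 leaves (19)10 in R8, matching the execution table.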

Compiling a While Loop
    while (A[i] == k)      (Look for the first "i" where A[i] ≠ k)
        i = i + 1;

    40A000  SLL  R8, R9, 2        # R9 == i; R8 = i ∗ 4 for byte addressing
    40A004  ADD  R11, R10, R8     # R10 has the address of A[0]
    40A008  LW   R12, 0(R11)      # Load A[i] to R12
    40A00C  BNE  R12, R13, 2      # If A[i] ≠ k, exit the loop {k = R13}
    40A010  ADDI R9, R9, (1)10    # Increment i
    40A014  J    102800           # Go back to 40A000 for another iteration
    40A018                        # Out of the loop

Shorthand Execution Table:
    SLL   i = 0 → R8 = 0
    ADD   R11 → A[0]
    LW    R12 = A[0]
    BNE   {Don't take branch}
    ADDI  i += 1
    J
    SLL   i = 1 → R8 = 4
    ADD   R10 + R8 = address of A[0] + 4 bytes → address of A[1] = R11
    LW    R12 = A[1]
    BNE

Coding LB & SB (load byte & store byte)
    400A00  LB R8, 0(R9)          # R9 has 10000003
    Memory at 10000000:  FE 78 9A BC   (bytes at 10000000, 10000001, 10000002, 10000003)
    R8 ← M[10000003] ≡ R8 ← BC± ≡ R8 ← sign-extended BC
    BC → 1011 1100 → 1111 … 1111 1011 1100 → F…F BC (6 F's, then B, C)
LB Coding: LB uses the I format since it needs a displacement
    10 0000 01001 01000 0000 0000 0000 0000
    Opcode: 10 0000 (LB)   Rs: 01001 (R9)   Rt: 01000 (R8)   Disp: 0000 0000 0000 0000 (0)

    40C004  SB R10, (−1)10(R9)    # R9 has 10000003 & R10 has ABCDEF
    R10 = ABCDEF (EF is the byte to store)
    Memory at 10000000:  FE 78 EF BC   (EF takes the place of 9A)
    M[10000003 − 1 = 10000002] ← EF
SB Coding: SB uses the I format since it needs a displacement
    10 1000 01001 01010 1111 1111 1111 1111
    Opcode: 10 1000 (SB)   Rs: 01001 (R9)   Rt: 01010 (R10)   Disp: 1111 1111 1111 1111 (−1)

SWAP(v, k, p)   (temp = v[k], v[k] = p, p = temp)   v[k] ↔ p
    SWAP: LW   R8, 0(R9)          # R9 points to "k"
          ADDI R10, R0, 4         # R10 gets 4
          MULT R10, R8            # (Hi, Lo) ← 4 ∗ k
          MFLO R11                # R11 gets 4 ∗ k (MFLO = Move from Lo)
          ADD  R12, R12, R11      # R12 points at v[k]; R12 originally pointed at v[0]
          LW   R13, 0(R12)        # R13 = temp = v[k]
          LW   R14, 0(R15)        # R15 → "p"
          SW   R13, 0(R15)        # "p" ← v[k] = temp
          SW   R14, 0(R12)        # v[k] ← p

02/27/17- Lecture
7.
MIPS Machine Language Instructions
• 32-bit 2's Complement Multiply
  o MULT Rs, Rt
  o (Hi, Lo) ← Rs ∗ Rt ((Hi, Lo) is 64 bits, so there will be no overflow)
  o R format, to avoid using an extra opcode combination. We could also use the I format.
    ▪ Opcode: 000000
    ▪ Function: 011000
  o 1 memory access to fetch the instruction
• 32-bit Unsigned Multiply
  o MULTU Rs, Rt
  o (Hi, Lo) ← Rs ∗ Rt (64 bits)
  o R format
    ▪ Opcode: 000000
    ▪ Function: 011001
  o 1 memory access to fetch the instruction
• 32-bit 2's Complement Divide
  o DIV Rs, Rt
  o Lo ← Quotient of Rs / Rt
  o Hi ← Remainder of Rs / Rt
  o R format
    ▪ Opcode: 000000
    ▪ Function: 01 1010
  o 1 memory access to fetch the instruction
• 32-bit Unsigned Divide
  o DIVU Rs, Rt
  o Lo ← Quotient of Rs / Rt
  o Hi ← Remainder of Rs / Rt
  o R format
    ▪ Opcode: 000000
    ▪ Function: 01 1011
  o 1 memory access to fetch the instruction
• Example with a negative dividend and a positive divisor: Quotient = −4; Remainder = −1
• The same operands, both positive: Quotient = 4; Remainder = 1
• Dividend / Divisor = Rs / Rt
• Dividend = Quotient ∗ Divisor + Remainder
• The quotient is negative if Rs and Rt have opposite signs.
• The remainder has the sign of Rs.
• Move from Hi
  o MFHI Rd
  o Rd ← Hi
  o R format since we need Rd.
    ▪ Opcode: 000000
    ▪ Function: 01 0000
    ▪ 1 memory access to fetch the instruction
• Move from Lo
  o MFLO Rd
  o Rd ← Lo
  o R format since we need Rd.
    ▪ Opcode: 000000
    ▪ Function: 01 0010
    ▪ 1 memory access to fetch the instruction
• Floating-Point Numbers
  o Numbers with points (decimals)
  o Very large & very small numbers in terms of magnitude
  o The same value can be written with different exponents: 2.57 ∗ 10^n = 257 ∗ 10^(n−2)
  o IEEE-754 FP Standard
    ▪ Single-precision (32-bit)
    ▪ Double-precision (64-bit)
    ▪ ±S ∗ 2^e   S: significand, e: exponent (unbiased)
    ▪ The significand is 1.M or 0.M
    ▪ 1.M: Normalized (for example, a significand of 1.011)
    ▪ 0.M: Denormalized (for example, a significand of 0.1001; the value is close to 0)
• Single-Precision Format
    1 bit                   8 bits                  23 bits
    Sign (0 → +, 1 → −)     BE (Biased Exponent)    Mantissa (fraction); the point is implied
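The three-field layout above can be exercised directly on 32-bit patterns. The sketch below is an illustration, not part of the course material: `decode_single` and `encode_single` are hypothetical names, the bias of 127 and the special exponent values 0 and 255 are taken from the IEEE-754 single-precision standard, and Python's `struct` module is used only to obtain the bit pattern of a float.

```python
import math
import struct

BIAS = 127  # single-precision exponent bias (BE = e + 127)


def decode_single(bits: int) -> float:
    """Split a 32-bit pattern into sign, BE, and mantissa and rebuild the value."""
    sign = (bits >> 31) & 0x1
    be = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if be == 255:                      # infinity or Not a Number
        value = math.inf if mantissa == 0 else math.nan
    elif be == 0:                      # zero or denormalized: 0.M
        value = (mantissa / 2**23) * 2.0 ** (1 - BIAS)
    else:                              # normalized: 1.M
        value = (1 + mantissa / 2**23) * 2.0 ** (be - BIAS)
    return -value if sign else value


def encode_single(x: float) -> int:
    """Return the single-precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


# Pattern 0 10000001 00101000000000000000000
print(decode_single(0b0_10000001_00101000000000000000000))
print(format(encode_single(10.5), "032b"))
```

Printing the encoded pattern as 32 binary digits makes it easy to read off the sign bit, the 8-bit BE, and the 23-bit mantissa by hand.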

BE = e + 127, to have a positive exponent for faster operations (compare, …)
1. 0 → BE = 0 & M = 0
2. ∞ → BE = 255 & M = 0
3. Normalized: 0 < BE < 255
4. Denormalized: BE = 0 & M ≠ 0
5. Not a Number (NaN): BE = 255 & M ≠ 0; for example ∞ − ∞ or √(−5)
[Figure: number line of single-precision values: negative infinity, negative normalized, negative denormalized, a gap around 0 that can't be represented, positive denormalized, positive normalized, positive infinity]

Ex: (0 10000001 001010…0)IEEE-754 = (?)10
    The sign bit is 0, so the number is positive.
    BE = 10000001 = 1 ∗ 2^7 + 1 ∗ 2^0 = 128 + 1 = 129
    0 < BE < 255 → Normalized
    BE = e + 127 → e = 129 − 127 = 2
    M = .00101 = 1 ∗ 2^−3 + 1 ∗ 2^−5 = .125 + .03125 = .15625
    Value = +1.15625 ∗ 2^2 = (4.625)10

Ex: (10.5)10 = (?)IEEE-754, single precision
    For the integer part 10:
        10 / 2 = 5 & 0 (LSB)
         5 / 2 = 2 & 1
         2 / 2 = 1 & 0
         1 / 2 = 0 & 1 (MSB)   → 1010
    For the fraction: 0.5 ∗ 2 = 1.0 → .1
    Therefore: 1010.1 = 1010.1 ∗ 2^0 = 1.0101 ∗ 2^3
    e = 3 → BE = 3 + 127 = 130
        130 / 2 = 65 & 0 (LSB)
         65 / 2 = 32 & 1
         32 / 2 = 16 & 0
         16 / 2 = 8 & 0
          8 / 2 = 4 & 0
          4 / 2 = 2 & 0
          2 / 2 = 1 & 0
          1 / 2 = 0 & 1 (MSB)
    BE = 10000010; Sign = 0 since the number is positive
    Therefore: (0 10000010 01010000000000000000000)IEEE-754

3/1/17 Lecture
7. MIPS Machine Language Instructions
• Floating-Point Numbers
  o Floating-Point Instructions
    ▪ Computer Organization Layer
[Figure: today's microprocessor: Integer CPU (GPRs), System Coprocessor CP0 (System Registers), FP Coprocessor CP1 (FP Regs), CP2, CP3]
  o A coprocessor is a processor with a special function
  o CP2 & CP3 are reserved for future use but never used
• Single-Precision Load
  o LWC1 FRt, Disp(Rs)
  o FRt ← M[Rs + Disp±] (Effective Address)
  o I format since we need a displacement
    ▪ Opcode: 11 0001
  o 2 memory accesses
    ▪ 1 to fetch the instruction & 1 to read data
  o Ex: 4050A4  LWC1 F2, 0(R8)
• Single-Precision Store
  o SWC1 FRt, Disp(Rs)
  o M[Rs + Disp±] (Effective Address) ← FRt
  o I format since we need a displacement
    ▪ Opcode: 11 1001
  o 2 memory accesses
    ▪ 1 to fetch the instruction & 1 to write data
  o Ex: 405004  SWC1 F25, 0(R12)
• Single-Precision ADD
  o ADD.S FRd, FRs, FRt
  o FRd ← FRs + FRt & generate an interrupt if there is an overflow AND the interrupt is enabled.
  o Modified R format
  o 1 memory access to fetch the instruction
  o Ex: 400E00  ADD.S F0, F15, F12
• Double-Precision ADD
  o ADD.D FRd, FRs, FRt
  o (FRd, FRd+1) ← (FRs, FRs+1) + (FRt, FRt+1) & generate an internal interrupt if there is an overflow AND the interrupt is enabled.
  o Modified R format
  o 1 memory access to fetch the instruction
  o Ex: 405E04  ADD.D F6, F18, F10
        (F6, F7) ← (F18, F19) + (F10, F11)
• Single Precision: SUB.S, MUL.S, DIV.S
• Double Precision: SUB.D, MUL.D, DIV.D
• Single-Precision Compare
  o C._ _.S
  o (The blank can be: EQ, NE, LT, GT, LE, GE)
  o C.LT.S FRt, FRs
  o If FRt < FRs then Cond ← 1
    Else Cond ← 0
  o Modified R format
  o 1 memory access to fetch the instruction
• Double-Precision Compare
  o C._ _.D FRt, FRs
• Branch if Cond is True (1)
  o BC1T Offset
  o If Cond = 1 then
    ▪ PC ← PC + (Offset± ∗ 4) (Effective Address)
  o Modified I format
  o 1 memory access to fetch the instruction
• Branch if Cond is False (0)
  o BC1F Offset
• Computer Organization (Microarchitecture) Layer
• It supports the Computer Architecture layer by means of the CPU, memory, I/O controllers, …
• The CPU, Memory and I/O Controllers are digital systems.
• Digital Systems
• A digital system performs micro operations.
• Micro operations are simple operations such as add, sub, compare, shift, load from memory, store to memory, etc.
• The Data Unit (Datapath) performs micro operations
• The Control Unit controls the Data Unit ≡ it determines the sequence of micro operations in the Data Unit
  o Sequencer
• Registers: store (keep) data
  o Flip-flops implement registers
    ▪ A flip-flop stores (keeps) 1 bit
• ALU (Arithmetic/Logic Unit): performs arithmetic/logic operations
  o Gates implement ALUs.
  o Buses interconnect registers & ALUs
    ▪ Wires implement buses.
• Designing Digital Systems
  o Describe a digital system in detail
    1. Draw a circuit diagram
       a. Works for simple digital systems
    2. Write a program in a hardware description language (HDL), such as VHDL or Verilog HDL, or in a software language (C, C++)

3/3/17- Recitation
Handout 4, Question 5:
(M[10002000] = A36 and M[10002004] = 1B9 initially; neither location is stored to, so both are NS in every row.)
    Instruction            PC      R2   R4  R5        R8  R9   R10       R11  R31     Memory Access
    Initial                400600  ?    2   10002000  ?   ?    ?         ?    400200  --
    ADD R8, R0, R4         400604  NS   NS  NS        2   NS   NS        NS   NS      1: IF
    ADD R9, R0, R0         400608  NS   NS  NS        NS  0    NS        NS   NS      1: IF
    ADD R10, R0, R5        40060C  NS   NS  NS        NS  NS   10002000  NS   NS      1: IF
    LW R11, 0(R10)         400610  NS   NS  NS        NS  NS   NS        A36  NS      2: IF, DR
    ADD R9, R9, R11        400614  NS   NS  NS        NS  A36  NS        NS   NS      1: IF
    ADDI R10, R10, 4       400618  NS   NS  NS        NS  NS   10002004  NS   NS      1: IF
    ADDI R8, R8, (−1)10    40061C  NS   NS  NS        1   NS   NS        NS   NS      1: IF
    BNE R8, R0, (−5)10     40060C  NS   NS  NS        NS  NS   NS        NS   NS      1: IF
    LW R11, 0(R10)         400610  NS   NS  NS        NS  NS   NS        1B9  NS      2: IF, DR
    ADD R9, R9, R11        400614  NS   NS  NS        NS  BEF  NS        NS   NS      1: IF
    ADDI R10, R10, 4       400618  NS   NS  NS        NS  NS   10002008  NS   NS      1: IF
    ADDI R8, R8, (−1)10    40061C  NS   NS  NS        0   NS   NS        NS   NS      1: IF
    BNE R8, R0, (−5)10     400620  NS   NS  NS        NS  NS   NS        NS   NS      1: IF
    ADD R2, R9, R0         400624  BEF  NS  NS        NS  NS   NS        NS   NS      1: IF
    JR R31                 400200  NS   NS  NS        NS  NS   NS        NS   NS      1: IF
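The execution table above traces a loop that adds consecutive memory words into a running total. As a rough cross-check of the final result, the sketch below mirrors that loop in software; the function name `sum_words` and the dictionary used as memory are assumptions made for this illustration only.

```python
def sum_words(memory: dict, base: int, count: int) -> int:
    """Add `count` consecutive 32-bit words starting at `base`.

    Mirrors the traced loop: load a word, add it to the total, advance the
    address by 4, decrement the counter, and repeat while the counter is
    not zero. As in the trace, the body runs before the test, so count
    must be at least 1.
    """
    total = 0
    address = base
    remaining = count
    while True:
        total = (total + memory[address]) & 0xFFFFFFFF  # LW + ADD
        address += 4                                    # ADDI R10, R10, 4
        remaining -= 1                                  # ADDI R8, R8, -1
        if remaining == 0:                              # BNE not taken
            break
    return total


memory = {0x10002000: 0xA36, 0x10002004: 0x1B9}
print(hex(sum_words(memory, 0x10002000, 2)))
```

The loop body executes twice for a count of 2, which matches the two LW rows in the table.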

• Relevant Questions and Answers (HW 2)
• Q5. Moving location to store to:
      10000350
    +      250
    ----------
      100005A0
• Q11.
• Compiling a Case/Switch Statement
    switch (k) {
      case 0: a = b + c; break;
      case 1: a = b * c; break;
      case 2: a = b / c; break;
      case 3: a = -a; break;
    }

    Memory Address   Data Stored
    1000F000         400C20      # Case 0
    1000F004         400C28      # Case 1
    1000F008         400C34      # Case 2
    1000F00C         400C40      # Case 3

R9 = k, R11 = 1000F000, R13 = a, R14 = b, R15 = c
    400C00  SLT  R8, R9, R0        # k < 0?
    400C04  BNE  R8, R0, (15)      # If k < 0, don't execute the switch
    400C08  SLTI R8, R9, (4)       # k < 4?
    400C0C  BEQ  R8, R0, (13)      # If k > 3, don't execute the switch
    400C10  SLL  R10, R9, 2        # R10 = k ∗ 4
    400C14  ADD  R11, R11, R10     # R11 points to the table entry for case k
    400C18  LW   R12, 0(R11)       # Read in the address of the case code
    400C1C  JR   R12               # Jump to the case code
    400C20  ADD  R13, R14, R15     # Case 0: a = b + c
    400C24  BEQ  R0, R0, 7         # Exit after case 0
    400C28  MULT R14, R15          # Case 1: a = b ∗ c (stored in the Hi, Lo registers)
    400C2C  MFLO R13               # Assume the answer fits in the lower 32 bits
    400C30  BEQ  R0, R0, 4         # Exit after case 1
    400C34  DIV  R14, R15          # Case 2: a = b / c
    400C38  MFLO R13               # Move the quotient from Lo to R13
    400C3C  BEQ  R0, R0, 1         # Exit after case 2
    400C40  SUB  R13, R0, R13      # Case 3: a = −a
    400C44                         # All breaks come to this exit

3/6/17 Lecture
Digital Systems
• A digital system performs micro operations.
• The Data Unit performs micro operations
• The Control Unit controls the Data Unit
• Designing a digital system ≡ describing a digital system
  1. Write a program
  2. Draw a circuit diagram
• Components of a digital system
  o Registers: store information
    ▪ FFs implement registers
  o ALUs: perform arithmetic/logic operations (add, sub, AND, OR, shift, compare, …)
    ▪ Gates implement them
  o Buses: interconnect registers and ALUs
    ▪ Wires implement buses.
  o Sequencer: determines the sequence of micro operations in the Data Unit.
    ▪ Gates & flip-flops implement it.
• Typical Data Unit View
[Figure: typical data unit with ABUS, BBUS, the ALU (ALUcontrol), and OBUS]
• Asynchronous digital systems
  o No clock signal
[Figure: clock signal with its clock periods marked along the time axis]
• The clock is a periodic signal that synchronizes components
  o Synchronous digital system
• Clock speed (rate) is in terms of Hz (Hertz) ≡ clock frequency
  o 1 Hz means 1 clock period per second ≡ the clock period duration is 1 second.
  o Clock frequency = 1 / Clock period
  o Ex: Clock frequency = 1 GHz = 10^9 Hz
    ▪ 10^9 clock periods/sec
    ▪ Clock period = 1 / Clock frequency = 1 / 10^9 = 10^−9 sec = 1 nanosecond
    ▪ The clock period duration is determined by observing the longest operations (add, sub, memory accesses)
    ▪ Then, we add waiting time to account for temperature and humidity increases up to the prescribed values in the data sheet
    ▪ Whether we write a program or draw a circuit diagram, we start with a state diagram that shows, in a short way, which operations happen when
• States ≡ circles; they indicate the micro operations happening at the moment
  o A state takes 1 clock period
  o Each state has a unique number to identify it
  o Only one state happens in a clock period
• Arrows indicate the sequence of states to be traced
  o They indicate the next state.
  o RTL Notation (Register Transfer Level/Language)
      State 24: A ← PC + GPR[Rs]
      State 25: MDR ← M[A]; PC ← A
          MDR[0] = 1 → State 26: A ← GPR[Rs]
          MDR[0] = 0 → State 27: GPR[Rd] ← GPR[Rs] + GPR[Rt]
    ▪ Registers take on their values the clock period after they are stored
    ▪ (MDR[0] is checked with the old value by the sequencer)

3/10/17- Recitation
Example function: y(a, b, c), a sum of two product terms
Full Adder circuits
[Figure: full adder with inputs A, B, C and outputs Cout and Sum]
Adder Truth Table
    a  b  c  Cout  Sum
    0  0  0  0     0
    0  0  1  0     1
    0  1  0  0     1
    0  1  1  1     0
    1  0  0  0     1
    1  0  1  1     0
    1  1  0  1     0
    1  1  1  1     1
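The truth table above can be reproduced with two one-line Boolean expressions, and chaining the adder bit by bit gives a ripple-carry adder. This is a sketch for checking the table, not the circuit from the handout: `full_adder` and `ripple_carry_add` are names chosen for this example, Sum is written as a three-input XOR (equivalent to the four minterms with an odd number of 1s), and Cout uses the majority form ab + bc + ac.

```python
def full_adder(a: int, b: int, c: int) -> tuple:
    """Return (Cout, Sum) for one-bit inputs a, b and carry-in c."""
    total = a ^ b ^ c                     # 1 when an odd number of inputs is 1
    carry = (a & b) | (b & c) | (a & c)   # 1 when at least two inputs are 1
    return carry, total


def ripple_carry_add(x: int, y: int, width: int = 32) -> tuple:
    """Add two unsigned numbers with a chain of full adders.

    Returns (sum truncated to `width` bits, final carry out).
    """
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        carry, s = full_adder(a, b, carry)   # carry ripples to the next bit
        result |= s << i
    return result, carry


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, *full_adder(a, b, c))
```

Each full adder has to wait for the carry from the bit below it, which is where the name "ripple-carry" comes from.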

Sum(a,b,c) = a'b'c + a'bc' + ab'c' + abc
Cout(a,b,c) = a'bc + ab'c + abc' + abc → simplified: ab + bc + ac
(x' denotes NOT x.)

Ripple-Carry Adder
      a31, a30, …, a0
    + b31, b30, …, b0
    -----------------
      s31, s30, …, s0
[Figure: 32-bit adder built from full adders (FA), shown with the CPU GPRs, XOR gates, a Select signal, and the multiplier]

Decoders
• There are 3 types: Binary, BCD to decimal, and Binary to 7-segment
• 1. Binary Decoders
  o Memory chips have them
  o Size: n-to-2^n DCD (decoder)
    ▪ n data inputs representing an n-bit unsigned number
    ▪ 2^n data outputs
  o No more than one output can be 1 at a time
    ▪ If the n-bit number is k, output k is 1
  o Ex: 2-to-4 DCD
[Figure: 2-to-4 decoder]

Execution Table
1. Flip-Flops (Handout 3- Page 19)
   - A flip-flop has 2 outputs: Q and Q'
   - The CE (clock enable) input enables/disables the clock input, c.
     • If CE = 0, the clock input cannot be used
   - The clock input (c) indicates when to store in the FF.
Registers
   - A sequential circuit used to store data temporarily
   - Note: a register that is stored a value in a particular clock period (cp) actually gets the value at the beginning of the following cp.

3/20/17- Lecture
2. High-Level State Diagram
    State 24: A ← PC + GPR[Rs]
    State 25: MDR ← M[A]; PC ← A
        MDR[0] = 1 → State 26: A ← GPR[Rs]
        MDR[0] = 0 → State 27: GPR[Rd] ← GPR[Rs] + GPR[Rt]

Digital System Design
1. Determine the interaction between communicating digital systems
   • For example: CPU and Memory
2. Get the High-Level State Diagram
   • It describes micro operations
3. Get the Data Unit
4. Get the Low-Level State Diagram
   • It describes which control signal is on when
5. Get the Control Unit

1. Interaction: Digital System ↔ Memory
[Figure: Digital System connected to Memory by the 32-bit MABUS, 32-bit MRBUS, 32-bit MWBUS, and the MemRead and MemWrite signals]
• MABUS (Memory Address Bus), MRBUS (Memory Read Bus), MWBUS (Memory Write Bus)
• Memory accesses take one full clock period, like ADD & SUB
Transferring data between digital systems/components
  a. Point-to-Point Connections (allow for parallelism but are too expensive)
  b. Busing: a bus is a shared set of wires where only one source is connected to the bus at a time

3. Data Unit
[Figure: data unit with the GPRs (Rd, Store GPR) and the MDR (loaded from MRBUS, Store MDR)]
    State 24: ABUS = PC; BBUS = GPR[Rs]; OBUS = ABUS + BBUS; A ← OBUS
    State 25: MDR ← MRBUS; ABUS = A; OBUS = ABUS; PC ← OBUS; MABUS = A
        MDR[0] = 1 → State 26: ABUS = GPR[Rs]; OBUS = ABUS; A ← OBUS
        MDR[0] = 0 → State 27

4. Low-Level State Diagram
[Figure: low-level state diagram for states 24 to 27. Each state lists the control signals that are on: the bus select signals, ALUctrl = 0010 for the additions in states 24 and 27, MemRead = 1 in state 25, and the store signals of the registers written in that state.]

3/22/17 Lecture
Page 5 of Handout 7
1. Develop the architecture: MIPS
   LW, SW, ADD, SUB, SLT, AND, OR, BEQ, J
2. CPU-Memory interaction in terms of buses & control signals
3. Design the CPU: High-Level State Diagram, Low-Level State Diagram, Control Unit, etc.
   For 9 instructions → EMY
   a. For 9 instructions
      i. LW Rt, Disp(Rs) → Rt ← M[Rs + Disp±] (Effective Address)
      ii. I Format
      iii. Implement the architecture operations by means of micro operations.
           Some micro operations are common to all instructions.
           • IF Cycle (Instruction Fetch)
             o Fetch the instruction
             o Update the PC
           • ID Cycle (Instruction Decode)
             o Read Rs & Rt to A & B
           • EX Cycle (Execute Cycle)
             o Start implementing the architecture operations
             o LW/SW: calculate the effective address
             o A/L (Arithmetic/Logic):
             o BEQ:
             o J:
           • MEM Cycle (Memory Cycle)
             o LW/SW: access memory
             o A/L (Arithmetic/Logic):
           • WB Cycle (Write Back Cycle)
             o LW: write back the data read from memory

    CPI(LW) = 5 → states 0, 1, 2, 3, 4
    CPI(ADD) = CPI(SUB) = CPI(AND) = CPI(OR) = CPI(SLT) = 4 → states 0, 1, 6, 7
    CPI(BEQ) = 3 → states 0, 1, 8
    CPI(ADDRM) = 6 → states 0, 1, 2, 3, 16, 17

    IR: Instruction Register
    ALUout
    MDR: Memory Data Register
    CPI ≡ Clock Periods per Instruction
    Unpipelined: the CPU runs one instruction at a time
    Pipelined: the CPU runs multiple instructions at a time

      iv. Modify the high-level state diagram for each remaining instruction, one by one.
          • SW Rt, Disp(Rs): M[Rs + Disp±] ← Rt; I Format
          • ADD Rd, Rs, Rt: Rd ← Rs + Rt & … (if overflow, generate interrupt); R Format
          • SUB Rd, Rs, Rt: Rd ← Rs − Rt; R Format
          • BEQ Rs, Rt, Offset: If Rs = Rt then PC ← PC + (Offset±) ∗ 4; I Format
          • J Address: PC[27:0] ← Address ∗ 4; J Format
          • ADDRM Rt, Disp(Rs): Rt ← Rt + M[Rs + Disp±]; I Format

IR Formats
R-Format
    6 bits   5 bits  5 bits  5 bits  5 bits        6 bits
    Opcode   Rs      Rt      Rd      Shift Amount  Function

J-Format
    6 bits   26 bits
    Opcode   Address

I-Format
    6 bits   5 bits  5 bits  16 bits
    Opcode   Rs      Rt      DOImm

03/24/17- Recitation
Handout #6- Page 7
Digital System Design Basics
    State 24: A ← PC + GPR[Rs]
    State 25: MDR ← M[A]; PC ← A
        MDR[0] = 1 → State 26: A ← GPR[Rs]
        MDR[0] = 0 → State 27: GPR[Rd] ← GPR[Rs] + GPR[Rt]

State Diagram
  i. BEQ Offset: If ACCUM = 0 then PC ← PC + (0,0,Offset)
  ii. ADDRM Disp: ACCUM ← ACCUM + M[0,0,Disp]
  iii. ADDM Disp: M[0,0,Disp] ← ACCUM + M[0,0,Disp]
ACCUM ← ACCUM + M[0,0,Disp]
ACCUM: Accumulator
  ▪ Architectural register
  ▪ Accumulates results
  ▪ Single GPR

    CP       State  PC   MAR   IR    MDR  ACCUM  M[200]  M[3204]
    Initial  ----   200  ?     ?     ?    1A     3000    7
    1        0      NS   NS    NS    NS   NS     NS      NS
    2        1      NS   200   NS    NS   NS     NS      NS
    3        2      204  NS    3000  NS   NS     NS      NS
    4        3      NS   3204  NS    NS   NS     NS      NS
    5        4      NS   NS    NS    7    NS     NS      NS
    6        0      NS   NS    NS    NS   21     NS      NS

HW 3 Q&A
2. Syntax: ADDMR Rt, Rs, Offset
   Architectural Operation: Rt ← Rt + M[Rs + (Offset± ≪ 2)]
   Format: I-Format
   i. Show the modified High-Level State Diagram
   ii. Show the modified portion of the Data Unit
   Every clock period: A ← Rs; B ← Rt
   i. IR ← M[PC]; PC ← PC + 4
      ALUout ← PC + (Offset± ≪ 2)
      ALUout ← A + (Offset±)
      [Figure: modified high-level state diagram; unchanged states are labeled "same"]
   ii. [Figure: modified portion of the data unit]

HW 3 Q&A
6. Syntax: COMPMR Rd, (Rs), Rt
   Architectural Operation: If M[Rs] < Rt then Rd ← 1, else Rd ← 0
   Format: R-Format
   i. Update the High-Level State Diagram & Datapath. What is CPI(COMPMR)?

3/27/17- Lecture
EMY CPU Datapath
[Figure: EMY CPU datapath]

EMY CPU Low-Level State Diagram
(If a signal's value is not indicated, the signal isn't needed and its value is zero.)
    State 0:  IRWrite=1; IorD=0; ALUSrcA=0; ALUSrcB=01; ALUop=00; PCWrite=1; PCSrc=00; MemRead=1
    State 1:  ALUop=00; ALUSrcA=0; ALUSrcB=11
    State 2:  ALUop=00; ALUSrcA=1; ALUSrcB=10
    State 3:  MemRead=1; IorD=1
    State 4:  MemtoReg=1; RegWrite=1; RegDst=0
    State 5:  MemWrite=1; IorD=1
    State 6:  ALUSrcA=1; ALUSrcB=00; ALUop=10
    State 7:  MemtoReg=0; RegWrite=1; RegDst=1
    State 8:  ALUSrcA=1; ALUSrcB=00; ALUop=01; PCSrc=01; PCWriteCond=1
    State 9:  PCWrite=1; PCSrc=10
    State 10: Invalid Opcode
    State 11: Overflow

Modify Diagrams for ADDRM
New states in the High-Level State Diagram:
[Figure: high-level state diagram with the new ADDRM states 16 and 17]
Changes to the Datapath:
[Figure: datapath changes for ADDRM]
Changes to the Low-Level State Diagram:
    States 0 to 11 are unchanged, except that ALUSrcA is now 2 bits wide (0 becomes 00 and 1 becomes 01).
    State 16: ALUSrcA=10; ALUSrcB=00; ALUop=00
    State 17: RegDst=0; MemtoReg=0; RegWrite=1

3/29/17- Lecture
Control Unit Design
• The Control Unit controls the Data Unit
• The Low-Level State Diagram describes the Control Unit
• Controlling the Data Unit
  o Determine the micro operations in the Data Unit
    ▪ Control signals indicate them
  o Which state is next?
    ▪ Next state signals indicate it
[Figure: EMY CPU low-level state diagram]
• The Control Unit generates control & next state signals based on the current state & the status signals from the Data Unit.
[Figure: control unit drawn as a cloud, with status and current state inputs and control and next state outputs]
• Two ways to implement the cloud
  1. Hardwiring: gates & FFs generate the control & next state signals
  2. Microprogramming: a memory module stores the control & next state values
     • The memory module is used as a look-up table
• Hardwired EMY Control Unit
[Figure: hardwired EMY control unit with next state lines NS3-NS0]
• RegWrite is 1 when it is state 4 or state 7
  o RegWrite = S4 + S7 (S4 OR S7)
• PCWrite is 1 when it is state 0 or state 9
  o PCWrite = S0 + S9 (S0 OR S9)
• ALUSrcA is 1 when it is state 2 or state 6 or state 8
  o ALUSrcA = S2 + S6 + S8 (S2 OR S6 OR S8)
• ALUSrcB0 is 1 when it is state 0 or state 1
  o ALUSrcB0 = S0 + S1 (S0 OR S1)
• ALUop0 is 1 when it is state 8
  o ALUop0 = S8
• ALUCtrl0 is 1 when ALUop = 10 and the function is either OR (25) or SLT (2A)
  o ALUCtrl0 = ALUop1 ∗ (NOT ALUop0) ∗ (Function Decoder output 37 + Function Decoder output 42)
• NS3 is 1 when it is state 1 & the opcode is either BEQ (4) or J (2)
  o NS3 = S1 (OPCDCD4 + OPCDCD2) (Opcode Decoder output 4 OR Opcode Decoder output 2)

Modifying EMY Control Unit (add ADDRM)
The State Register has 5 bits; there is a 5-to-32 state decoder and 5 NS lines: NS4-NS0
• Renamed signals
  o ALUSrcA is renamed ALUSrcA0 (rightmost bit)
• Modified signals
  o RegWrite is 1 when it is state 4 or state 7 or state 17
  o RegWrite = S4 + S7 + S17 (S4 OR S7 OR S17)
  o NS0 is 1 when … (fill in later)
• New signals
  o ALUSrcA1 is 1 when it is state 16
  o ALUSrcA1 = S16
  o NS4 is 1 when it is (state 3 and ADDRM (17)) or state 16
  o NS4 = (S3 ∗ OPCDCD23) + S16 [(S3 AND OPCDCD23) OR S16]

3/31/17- Recitation
    Clock Period  State  REGA  REGB  REGC  REGD
    Initial       --     ?     20    ?     ?
    1             0      NS    NS    NS    NS
    2             1      20    NS    NS    NS
    3             2      NS    NS    24    NS
    4             3      NS    NS    NS    44
    5             0      NS    2     NS    NS
    6             1      2     NS    NS    NS

(44)16 → 0000 0000 0000 0000 0000 0000 0100 0100 → 0000 0000 0000 0000 0000 0000 0000 0010

2.
    Clock Period  State  Reset  REGA      REGB  REGC  MDR  M[10000000]
    Initial       --     --     10000000  ?     2     ?    E
    1             0      0      NS        NS    NS    NS   NS
    2             1             NS        NS    NS    E    NS
    3             2             NS        E     1     NS   NS
    4             3             NS        F     NS    NS   NS
    5             4             NS        NS    NS    F    NS
    6             0      0      10000004  NS    NS    NS   F

Purpose: Starting from the base memory location in REGA, we increment the contents of memory locations by 1, for REGC locations. When complete, we do nothing in state 5 until the reset signal is 1; then we get new inputs for REGA and REGC and restart.

3. Add JR to EMY CPU
   JR: Syntax: JR Rs
   Architectural Operation: PC ← Rs
   High Level: [Figure: states 0 and 1 stay the same; JR gets a new state that performs PC ← A; all other instructions use states 2-9 as before]
   Data Path: [Figure: the PCSource MUX gets a new input, 3, which carries A to the PC]

4. Add JAL to EMY CPU
   JAL: Syntax: JAL Address
   Architectural Operation: R31 ← PC; PC ← PC[31-28], Address ≪ 2
   High Level: [Figure: states 0 and 1 stay the same; all other instructions use states 2-8 as before; state 9 is the same as for J; JAL adds state 16: R31 ← PC]
   Data Path: [Figure: the RegDst MUX (2-bit select) chooses the Write Reg input of the GPR register file from 0: IR[20-16], 1: IR[15-11], 2: (11111); the MemtoReg MUX (2-bit select) chooses Write Data from 0: ALUout, 1: MDR, 2: PC]

Q&A
Q8. High Level: State 2: ALUout ← A + Imm±
    Instruction:
    Format: I-Format
    Syntax: ADDRIM (Rt)++, Rs, Imm
    Architectural Operation: M[Rt] ← Rs + Imm±; Rt ← Rt + 4
    Datapath:
    MABUS: Memory Address BUS
    MRBUS: Memory Read BUS
    MWBUS: Memory Write BUS
    [Figure: the IorD MUX (2-bit select) gets a new input, 2, which carries B to MABUS; a new MUX (Sel) chooses between B and ALUout to drive MWBUS]

4/3/17- Lecture
[Figure: control unit drawn as a cloud]
• Hardware implementation of the cloud?
  1. Hardwiring: gates & FFs generate the control & next state signals
  2. Microprogramming: a memory module in the Control Unit stores the control & next state values and is used as a look-up table.
• Each location of the memory corresponds to a state in the low-level state diagram
[Figure: microprogrammed control unit: the status signals and the current state (State Register) go through gates to the memory (ROM), which supplies the next state]
• Each location generates the control signals for the Data Unit → its content is called a Control word ≡ Microinstruction
• The whole content is the control program ≡ Microprogram
• The memory is called the control memory ≡ Micromemory
[Figure: EMY microprogrammed control unit: a 16x21-bit micromemory addressed by the 4-bit State Register; a 4-bit 4-to-1 MUX, selected by the 2 AddCtrl bits, picks the next state from inputs 0 (state 0), 1 (D.R.I), 2 (D.R.II), and 3 (ADDer: current state + 1); the Dispatch ROMs are addressed by the 6-bit opcode]

D.R.I. For State 1
    Loc  Content
    0    6        R Format
    2    9        J
    4    8        BEQ
    17   2        ADDRM
    23   2        LW
    2B   2        SW

D.R.II. For State 2
    Loc  Content
    17   3        ADDRM
    23   3        LW
    2B   5        SW

Microcode ≡ Microinstructions in terms of 1s & 0s
    Loc  PCW  PCWCond  IorD  MemRead  MemWrt  IRW  MemtoReg  PCSrc  ALUop  ALUSrcB  ALUSrcA  RW  RegDst  AddCtrl
    0    1    0        0     1        0       1    0         00     00     01       0        0   0       11
    1    0    0        0     0        0       0    0         00     00     11       0        0   0       01
    2    0    0        0     0        0       0    0         00     00     10       1        0   0       10
    3    0    0        1     1        0       0    0         00     00     00       0        0   0       11
    4    0    0        0     0        0       0    1         00     00     00       0        1   0       00
    5    0    0        1     0        1       0    0         00     00     00       0        0   0       00
    6    0    0        0     0        0       0    0         00     10     00       1        0   0       11
    Rows added for ADDRM (ALUSrcA is 2 bits and AddCtrl is 3 bits):
    16   0    0        0     0        0       0    0         00     00     01       10       0   0       011
    17   0    0        0     0        0       0    0         00     00     00       00       1   0       000

• Adding ADDRM to the architecture
• Modify the Control Unit
  1. Since we have states 16 & 17, the State Register is 5 bits wide
  2. The micromemory receives 5 address bits. It has 32 locations
  3. Since the State Register has 5 bits, the MUX is a 5-bit MUX
  4. Since the State Register has 5 bits, the ADDer is a 5-bit ADDer
  5. Since state 3 has 2 branches, a new Dispatch ROM is needed: D.R.III
  6. D.R.III.
         Loc  Content (Decimal)
         17   16                 ADDRM
         23   4                  LW

  7. D.R.III is attached to the MUX like the other Dispatch ROMs. D.R.III is connected to input 4 of the MUX. The MUX is an 8-to-1 MUX
     (5 bits per arrow; the arrows are numbered from 0 to 4)
  8. Since the MUX is an 8-to-1 MUX, it needs 3 select signals: 3 AddCtrl bits
  9. D.R.I & D.R.II must have 5 bits/location
  10. D.R.I & D.R.II are modified to include ADDRM
  11. ALUSrcA has 2 bits to run ADDRM
  12. Since we added 2 bits to each location, we have 23 bits per micromemory location: 32x23-bit
[Figure: 5-bit 8-to-1 MUX with inputs 0 (state 0), 1 (D.R.I), 2 (D.R.II), 3 (ADDer), and 4 (D.R.III)]

4/5/17- Lecture
Computer Performance
    Performance = 1 / Execution Time
    Execution Time = CPU Time + Non-overlapped I/O Time
    Execution Time ≈ CPU Time
    CPU Time = CPUusertime + CPUsystemtime
    Execution Time ≈ CPUusertime ≅ CPUtime
    CPUtime = (Number of clock periods for program) ∗ (Clock period)
    CPUtime = (Number of clock periods for program) / (Clock frequency)
    Number of clock periods for program = Σ (CPI_i ∗ N_i), summed over the instruction types i
        (N_i = number of instructions of type i run)
    NI = number of instructions run for the program = Σ N_i
    CPIaverage = average number of clock periods per instruction for the program
               = (Number of clock periods for program) / NI
    CPUtime = NI ∗ CPIaverage ∗ Clock period
    We assume an ideal memory.
    MIPSavg (Million Instructions Per Second) = average number of instructions run for the program per second, in millions
            = NI / (CPUtime ∗ 10^6) = Cfreq / (CPIaverage ∗ 10^6)
    MFLOPSavg = (Number of floating-point operations run for the program) / (FP time for the program ∗ 10^6)
        (also Giga, Tera, Peta)

An App Run
    Instruction  CPI_i  N_i
    ADD          4      10M
    MULT         6      1.5M
    LOAD         5      2.5M
    STORE        4      0.35M
    BRANCH       3      0.15M
    FPADD        8      5M
    FPDIV        20     0.5M
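The performance formulas and the instruction mix above can be evaluated mechanically. The sketch below is only an illustration: it assumes a 1 GHz clock for the run, and the names `MIX`, `CLOCK_FREQUENCY_HZ`, and the result variables are made up for this example.

```python
# Instruction mix from the "An App Run" table: name -> (CPI_i, N_i)
MIX = {
    "ADD": (4, 10_000_000),
    "MULT": (6, 1_500_000),
    "LOAD": (5, 2_500_000),
    "STORE": (4, 350_000),
    "BRANCH": (3, 150_000),
    "FPADD": (8, 5_000_000),
    "FPDIV": (20, 500_000),
}
CLOCK_FREQUENCY_HZ = 1_000_000_000  # assumed: 1 GHz
FP_INSTRUCTIONS = ("FPADD", "FPDIV")

# NI = sum of N_i; clock periods = sum of CPI_i * N_i
ni = sum(n for _, n in MIX.values())
clock_periods = sum(cpi * n for cpi, n in MIX.values())
cpi_average = clock_periods / ni
cpu_time = clock_periods / CLOCK_FREQUENCY_HZ          # seconds
mips_avg = ni / (cpu_time * 1e6)

# Floating-point part of the run
fp_ops = sum(MIX[name][1] for name in FP_INSTRUCTIONS)
fp_periods = sum(MIX[name][0] * MIX[name][1] for name in FP_INSTRUCTIONS)
fp_time = fp_periods / CLOCK_FREQUENCY_HZ              # seconds
mflops_avg = fp_ops / (fp_time * 1e6)

print(ni, clock_periods, cpi_average, cpu_time, mips_avg, mflops_avg)
```

Keeping the instruction counts as integers means the clock-period total is exact; only the final ratios are floating-point values.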

Cfreq = 1 GHz = 10^9 Hz
Cperiod = 1 / Cfreq = 1 / 10^9 = 10^−9 sec = 1 nanosecond
NI = Σ N_i = 10M + 1.5M + 2.5M + 0.35M + 0.15M + 5M + 0.5M = 20M = 20 ∗ 10^6
Number of clock periods for program = Σ (CPI_i ∗ N_i)
    = 4 ∗ 10M + 6 ∗ 1.5M + 5 ∗ 2.5M + 4 ∗ 0.35M + 3 ∗ 0.15M + 8 ∗ 5M + 20 ∗ 0.5M
    = 113.35M = 113.35 ∗ 10^6
CPIaverage = (Number of clock periods for program) / NI = (113.35 ∗ 10^6) / (20 ∗ 10^6) = 5.67
CPUtime = NI ∗ CPIaverage ∗ Cperiod = 20 ∗ 10^6 ∗ 5.67 ∗ 10^−9 = 113.35 ∗ 10^−3 sec
MIPSaverage = NI / (CPUtime ∗ 10^6) = (20 ∗ 10^6) / (113.35 ∗ 10^−3 ∗ 10^6) = 176.44
Number of FP operations for program = N(FPADD) + N(FPDIV) = 5M + 0.5M = 5.5M = 5.5 ∗ 10^6
FP time for program = (Number of FP clock periods for program) ∗ (Clock period)
    = (CPI(FPADD) ∗ N(FPADD) + CPI(FPDIV) ∗ N(FPDIV)) ∗ Cperiod
    = (8 ∗ 5M + 20 ∗ 0.5M) ∗ 10^−9 = 50 ∗ 10^−3 sec
MFLOPSaverage = (Number of FP operations for program) / (FP time for program ∗ 10^6)
    = (5.5 ∗ 10^6) / (50 ∗ 10^−3 ∗ 10^6) = 110

Clock Doubling
Instructions run = ADD + ADD + ADD + 5 ∗ (LW + ADD + ADDI + ADDI + BNE) + ADD + JR
Cfreq = 1 GHz = 10^9 Hz
    CPI(ADD) = 4 → states 0, 1, 6, 7
    CPI(LW) = 5 → states 0, 1, 2, 3, 4
    CPI(ADDI) = 4 → states 0, 1, 16, 17
    CPI(BNE) = 3 → states 0, 1, 16
    CPI(JR) = 3 → states 0, 1, 16
Cperiod = 1 / Cfreq = 1 / 10^9 = 10^−9 sec = 1 nanosecond
CPUtime = (Number of cperiods for program) ∗ Cperiod
    = (CPI(ADD) ∗ N(ADD) + CPI(LW) ∗ N(LW) + CPI(ADDI) ∗ N(ADDI) + CPI(BNE) ∗ N(BNE) + CPI(JR) ∗ N(JR)) ∗ Cperiod
    = (4 ∗ 9 + 5 ∗ 5 + 4 ∗ 10 + 3 ∗ 5 + 3 ∗ 1) ∗ 10^−9 = 119 ∗ 10^−9 sec = 119 nanoseconds
What if Cfreq = 2 GHz? (States marked ∗ access memory and now take 2 clock periods.)
    CPI(ADD) = 5 → states 0∗, 1, 6, 7
    CPI(LW) = 7 → states 0∗, 1, 2, 3∗, 4
    CPI(ADDI) = 5 → states 0∗, 1, 16, 17
    CPI(BNE) = 4 → states 0∗, 1, 16
    CPI(JR) = 4 → states 0∗, 1, 16
Cperiod = 1 / Cfreq = 1 / (2 ∗ 10^9) = 0.5 ∗ 10^−9 sec = 0.5 nanoseconds
CPUtime = 77 ns

4/7/17- Recitation