Shlomi Oved CS-UY 2214 Comp Arch Notes

Lecture
• CS2214: Computer Architecture and Organization
  o CS course
  o Hardware course
    ▪ Designing computers: designing microprocessors and designing memory
    ▪ Design by using the state machine technique
      • Any digital system can be designed this way, such as processors, GPUs, memory, I/O controllers, …
• Computer Classification (based on speed, cost, power consumption, weight, size)
  o Supercomputers
    ▪ Scientific apps
  o Servers (mainframes)
  o Desktop computers
  o Embedded computers
• Computer Architecture
  o The view of a computer as seen by a machine language programmer
    ▪ A machine language is expressed in ones and zeros.
    ▪ Instruction set (command set), data representation, memory, input/output, control modes, …
      • Intel x86 → CISC (Complex Instruction Set Computer) architecture
      • MIPS (cable boxes), ARM (mobile) → RISC (Reduced Instruction Set Computer) architecture
      • We will design a MIPS-based system.
      • Intel's low-power line: Atom, which is still x86 (Intel gave up on mobile processing to ARM).
• Computer Organization is the set of resources that implement (support) the architecture.
  o Central Processing Unit (CPU) or core, memory, I/O controllers
  o The CPU runs machine language programs

[Figure: block diagram of a single-core processor (CPU with SRAM cache memory) connected by buses to DRAM memory and to I/O controllers for the keyboard, mouse, hard disk, and Flash-EPROM memory (SSD).]

• Designing a Computer
  o MIPS-microprocessor-based computer → RISC
  o Use design principles
    ▪ Make the common case fast
    ▪ Smaller/simpler systems and concepts are faster/cheaper
    ▪ RISC systems!
  o Use the layered-computer-design concept
    ▪ Top-down design
1. Application Layer
  o Scientific apps
    a) Simple data elements: integers and real numbers (floating-point numbers)
    b) Arithmetic/logic operations
    c) Complex data elements: arrays (vectors, matrices)
    d) Stepping through the elements of arrays
      ▪ Loops
2. Computational Method: determines the nature of operations and operands and how operations are initiated
    a) Control Flow: operations are performed according to a list of sequential operations. Control flow hides parallelism.
       Ex: A = (B + C) * (D − E) / F
         ADD
         SUB
         MUL
         DIV
    b) Data Flow: an operation starts when its operands are ready. Parallelism is exposed.
[Figure: data-flow graph in which the inputs B, C, D, E, F feed the +, −, *, and / operations to produce A.]

    c) Demand-Driven: an operation is initiated when its result is demanded. Parallelism is exposed.
    d) Systolic Computing: limited to scientific applications and a poor fit for other applications (web and others).
    e) Neural Computing: emulates neurons using transistors.
3. Algorithms: an algorithm is a mechanical procedure that accepts inputs and generates results after performing a finite number of steps.
  o Sequential algorithms!
4. High-Level Language: programs implement algorithms, with steps converted to statements.
  o Sequential programs
  o FORTRAN, C
5. Operating Systems: needed to abstract (hide) hardware details
  o Unix, Linux, …
6. Computer Architecture: machine language instructions, memory, input/output, and others form the computer architecture
  o Sequential instructions
7. Computer Organization (Microarchitecture): the set of resources that support the computer architecture
  o CPU, memory, I/O controllers
  o One CPU → sequential system
8. Digital Logic: digital (logic) circuits implement the CPU, memory, I/O controllers, etc.
  o Gates and flip-flops form digital circuits
  o A flip-flop stores a bit
9. Transistor: digital circuits (gates and flip-flops) are implemented by transistor circuits
  (Full parallelism is available at the digital logic/transistor level with FPGA chips: reconfigurable chips that can run programs without software.)

Recitation
Joe (JAB995@nyu.edu), Comp Arch Section B1
• Number Systems
  o Binary (base 2): unsigned (non-negative only) and 2's complement (positive and negative numbers)
  o Hexadecimal (base 16): digits 0-9, A-F
  o Decimal (base 10)
    ▪ $d_n 10^n + d_{n-1} 10^{n-1} + d_{n-2} 10^{n-2} + \cdots + d_1 10^1 + d_0 10^0$
• Decimal to Binary (unsigned)
  o $(50)_{10} \to (?)_{\text{unsigned}}$
    ▪ 50/2 = 25, R = 0 (least significant bit, LSB)
    ▪ 25/2 = 12, R = 1
    ▪ 12/2 = 6, R = 0
    ▪ 6/2 = 3, R = 0
    ▪ 3/2 = 1, R = 1
    ▪ 1/2 = 0, R = 1 (most significant bit, MSB)
  o $(50)_{10} \to (110010)_{\text{unsigned}}$
• Binary to Decimal
  o $(110010)_{\text{unsigned}}$
  o Generalized formula (b = base):
    ▪ $d_n b^n + d_{n-1} b^{n-1} + \cdots + d_1 b^1 + d_0 b^0$
    ▪ $1 \cdot 2^5 + 1 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0$
    ▪ $32 + 16 + 2 = (50)_{10}$
• 2's Complement
  o Look at the MSB:
    ▪ 1. If it is 0, the number is positive (or zero).
    ▪ 2. If it is 1, the number is negative.
  o $(50)_{10} \to (?)_{\text{2's complement}}$
    ▪ The unsigned conversion gives 110010. Its MSB is 1, which would read as negative, so a leading 0 must be added because 50 is not negative.
    ▪ $(0110010)_{\text{2's complement}} = (50)_{10}$
  o Unsigned and positive 2's complement numbers
    ▪ Appending zeros to the left of the MSB does not change the number.
• Negative Decimal to 2's Complement
  o $(-50)_{10}$
  o 1. Convert the negative decimal number into a positive decimal number.
    ▪ $(-50)_{10} \to (50)_{10}$
  o 2. Get the binary form of the positive decimal number.
    ▪ $(50)_{10} \to (0110010)_2$ (the extra leading zero is needed because this is 2's complement and the value is positive 50, not negative 50)
  o 3. Convert the binary number into its negative.
    ▪ Flip the bits (turn zeros into ones and ones into zeros): 0110010 → 1001101
    ▪ Add 1: 1001101 + 0000001 = $(1001110)_{\text{2's complement}}$
  o $(-50)_{10} \to (1001110)_{\text{2's complement}}$
• Negative 2's Complement to Decimal
  o $(1001110)_{\text{2's complement}}$
  o 1. Convert to the positive 2's complement number (flip the bits and add 1).
    ▪ 1001110 → 0110001
    ▪ 0110001 + 0000001 → $(0110010)_{\text{2's complement}} \to 2^5 + 2^4 + 2^1 = 32 + 16 + 2 = 50$
  o 2. The original number is therefore −50.
• Decimal to Hexadecimal (base 16)
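| Decimal | Hex | Binary |
|---------|-----|--------|
| 0 | 0 | 0000 |
| 1 | 1 | 0001 |
| 2 | 2 | 0010 |
| 3 | 3 | 0011 |
| 4 | 4 | 0100 |
| 5 | 5 | 0101 |
| 6 | 6 | 0110 |
| 7 | 7 | 0111 |
| 8 | 8 | 1000 |
| 9 | 9 | 1001 |
| 10 | A | 1010 |
| 11 | B | 1011 |
| 12 | C | 1100 |
| 13 | D | 1101 |
| 14 | E | 1110 |
| 15 | F | 1111 |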
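The conversions worked through in this recitation can be checked with a short script. This is only a sketch: the function names (`to_unsigned_binary`, `to_twos_complement`, `from_twos_complement`) are made up for illustration and are not part of the course material.

```python
def to_unsigned_binary(value: int) -> str:
    """Repeated division by 2; the remainders, read last to first, are the bits."""
    if value < 0:
        raise ValueError("unsigned conversion needs a non-negative value")
    bits = ""
    while True:
        value, remainder = divmod(value, 2)
        bits = str(remainder) + bits  # the first remainder ends up as the LSB
        if value == 0:
            return bits


def to_twos_complement(value: int, width: int) -> str:
    """Encode value in `width` bits of 2's complement."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the requested width")
    # Masking to `width` bits gives the same pattern as flip-the-bits-and-add-1.
    return format(value & ((1 << width) - 1), f"0{width}b")


def from_twos_complement(bits: str) -> int:
    """Decode a 2's complement bit string; an MSB of 1 means negative."""
    raw = int(bits, 2)
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


print(to_unsigned_binary(50))            # 110010
print(to_twos_complement(50, 7))         # 0110010
print(to_twos_complement(-50, 7))        # 1001110
print(from_twos_complement("1001110"))   # -50
print(format(50, "X"))                   # 32
```

The width has to be chosen by the caller, which mirrors the point made above: 50 needs 7 bits in 2's complement even though 6 bits are enough when it is unsigned.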
• $(50)_{10}$ to hexadecimal
  o Method 1
    ▪ 1. Convert decimal to binary
    ▪ 2. Convert binary to hex
    ▪ Step 1: $(50)_{10}$
      • 50/2 = 25, R = 0 (LSB)
      • 25/2 = 12, R = 1
      • 12/2 = 6, R = 0
      • 6/2 = 3, R = 0
      • 3/2 = 1, R = 1
      • 1/2 = 0, R = 1 (MSB)
    ▪ Step 2: $(110010)_2$ → 11-0010 → 0011-0010 (pad the left group with zeros) → 0011 = 3, 0010 = 2; therefore $(50)_{10} \leftrightarrow (32)_{16}$ ↔ 0x32
  o Method 2
    ▪ 50/16 = 3, R = 2 (least significant digit)
    ▪ 3/16 = 0, R = 3 (most significant digit)
    ▪ $(32)_{\text{hex}}$
  o $(-50)_{10} \to (11001110)_{\text{2's complement}} \to (CE)_{\text{hex}}$
• Unsigned Binary Addition
  o $(50)_{10} + (25)_{10} = (75)_{10}$
  o $(25)_{10} \to (11001)_{\text{unsigned}}$
  o 110010 (50) + 011001 (25) = 001011 with a carry out of 1
  o The carry out of 1 signals overflow in unsigned binary addition: 75 does not fit in 6 bits.
• 2's Complement Subtraction
  o $(50)_{10} - (25)_{10} = 50 + (-25)$
  o 0110010 (50) − 0011001 (25)
  o 1. Change the sign of the second binary number (flip the bits and add 1)
  o 2. Perform an addition
  o Step 1: 0011001 → 1100110 + 1 → 1100111 (−25)
  o Step 2: 0110010 + 1100111 = 0011001 (25), with a carry out of 1 that is discarded
• Overflow (2's complement)
  o 1. Adding a positive and a negative 2's complement number NEVER overflows.
  o 2. Adding two positive numbers overflows when the result is negative.
  o 3. Adding two negative numbers overflows when the result is positive.

Lecture 1/30/17
• Computer Layers
  1. Applications
  2. Computational Method
  3. Algorithms
  4. High-Level Language
  5. Operating Systems
  6. Computer Architecture (HW/SW interface)
  7. Computer Organization
  8. Digital Logic
  9. Transistor

Designing a MIPS-Based Computer
1. Applications: Scientific
  • Number crunching
    a) Simple data elements:
      o Integers, floating-point numbers
    b) Operations on them:
      o Arithmetic/logic
    c) Complex data elements:
      o Arrays
    d) Operations on them:
      o Stepping through elements
        ▪ Loops
  Developing programs
    ▪ A compiler translates a high-level language (C/C++/Python/…) program into a special file called an object file. The object file has:
      o The machine language program (text segment)
      o Global (static) data
      o Information needed to link the program to other programs
    ▪ The object file is stored on the disk.
    ▪ The linker links separately compiled programs and libraries, generates an executable file, and stores it on the disk.
    ▪ The loader loads the executable from disk to memory to run the software (app).
    ▪ DLLs = Dynamically Linked Libraries
    ▪ Then the CPU runs the software (app) by accessing memory.
6. Computer Architecture
  o The view of a computer as seen by a machine language programmer:
  o Machine language instructions, data representation, memory, input/output, …
  1. Data Representation
    o Word length = 32 bits
    o This is the largest integer size
    I. Integers (1 byte = 8 bits)
      o 1-/2-/4-byte unsigned numbers
      o 1-/2-/4-byte 2's complement numbers
    II. Floating-Point
      o 32-bit (single precision) FP numbers
      o 64-bit (double precision) FP numbers
      o IEEE-754 FP standard
    III. Characters
      o 8-bit ASCII code
      o 16-bit UNICODE
  2. Register Model
    o A register keeps information
    o Registers keep data and addresses
    o They are faster than memory
    o They keep operands/results of arithmetic/logic (A/L) operations
    o Registers are in the CPU & I/O controllers
    I. 32 32-bit integer registers (General-Purpose Registers = GPRs). They also contain addresses.
      o R0, R1, R2, …, R31
      o R0 is always 0
      o R31 keeps the return address from functions
    II. 2 32-bit registers to keep the results of integer MUL & DIV (Hi, Lo). Together they keep 64 bits to avoid overflow.
    III. 32 32-bit FP registers
      o F0, F1, F2, …, F31
      o A double-precision value uses an even/odd register pair $(F_{\text{even}}, F_{\text{even}+1})$, e.g., (F0, F1)
    IV. A 1-bit Cond register to keep the result of FP compares
      o It is a flag
    V. A 32-bit instruction pointer, pointing at the next instruction to run
      o Program Counter (PC) (the control flow)
    VI. Registers for the OS
      o System registers
      o Status register
    VII. Registers for I/O controllers
      o Status (condition) register
      o Data register

Lecture 2/1/17
Designing a MIPS-Based Computer
1. Applications: Scientific
  • Number crunching
2. Computer Architecture
  a. Data Representation
  b. Register Model & Word Length
    • The CPU & I/O controllers have registers
  c. Input/Output
    • Input ≡ data transfer from an I/O device to memory
    • Output ≡ data transfer from memory to an I/O device
    • Memory ↔ I/O controllers ↔ I/O devices (buses and ports)
    • Methods & forms of data transfers, and what happens with I/O completions & I/O problems
      o How much is the CPU involved?
        ▪ The CPU should interact with I/O controllers only to start them, so that they continue independently on their own.
        ▪ How does the CPU point at them?
        ▪ Memory-Mapped I/O ≡ I/O controllers are treated like memory ≡ one set of instructions for both memory and I/O controllers (LW, load from memory, and SW, store to memory, are used for both)
[Figure: memory and the I/O controllers drawn side by side inside one address space.]

        ▪ One address space
        ▪ Virtual memory keeps the size of the memory space from being affected
  d. Control Modes: a control mode indicates which operations & addresses are applicable at the moment to the user
    o User vs. system (kernel) modes
    o The state of the program must be saved before switching to the other mode & restored to resume the program.
    o State: PC (program counter), GPRs (general-purpose registers; software conventions reduce the number of registers that need to be saved), flags, …
    o Management of the transition (switch) between modes
  e. Interrupts: an interrupt is an event that forces the CPU to stop running the current program & start running an interrupt program (function, handler)
    I. External: reset, timer, I/O controller, …
    II. Internal ≡ exceptions: arithmetic overflow, invalid address
    III. Software ≡ traps ≡ system calls
  f. Addressing ≡ accommodating memory & I/O controllers ≡ accommodating instructions + data + addresses
    o 32 address bits → $2^{32}$ bytes (4 GB)
    o Memory is a 2-D array
      ▪ Rows ≡ locations
        • Locations have unique identifiers = addresses
      ▪ Columns ≡ bits/location
        • 32 bits/location
      ▪ Byte addressing
        • Big-endian addressing (bytes are numbered left to right)

[Figure: memory drawn as four 32-bit locations at word addresses 100, 104, 108, and 10C; the bytes within the locations are numbered left to right as 100 101 102 103, 104 105 106 107, 108 109 10A 10B, and 10C 10D 10E 10F.]
      ▪ No word boundary crossing
      ▪ (An item can't cross into another location, as in the layout shown below.)

[Figure: memory with a data item laid out across the boundary between two locations, which is not allowed.]

        • The address of an item must be divisible by the length of the item in bytes.
        • Sizes of address space portions: User: 2 GB, System: 2 GB
[Figure: one address space shared by memory and the I/O controllers, starting at address 0; the labeled addresses are 0, 7FFFFFFC, 80000000, and FFFFFFFC, with the 2 GB user portion below 80000000 and the 2 GB system portion from 80000000 up.]
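The alignment rule and the big-endian byte numbering described above can be illustrated with a small sketch. The helper names `is_aligned` and `big_endian_bytes` are hypothetical, and the word value 0x0000065E is just sample data.

```python
def is_aligned(address: int, size_in_bytes: int) -> bool:
    """An item is aligned when its address is divisible by its length in bytes."""
    return address % size_in_bytes == 0


def big_endian_bytes(word: int, base_address: int) -> dict:
    """Map each byte address of a 32-bit word to the byte stored there (big endian)."""
    data = word.to_bytes(4, byteorder="big")  # most significant byte comes first
    return {base_address + i: byte for i, byte in enumerate(data)}


print(is_aligned(0x104, 4))   # True: a word may start at 104
print(is_aligned(0x106, 4))   # False: a word at 106 would cross into location 108
print(is_aligned(0x106, 2))   # True: a 2-byte item may start at 106

layout = big_endian_bytes(0x0000065E, 0x100)
print({hex(a): hex(b) for a, b in layout.items()})
# {'0x100': '0x0', '0x101': '0x0', '0x102': '0x6', '0x103': '0x5e'}
```

With big-endian numbering the lowest byte address (100) holds the most significant byte, which is why the bytes read left to right in the memory figure.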
Recitation 2/3/17
Digital Logic and Operators

AND (X*Y = output) Truth Table

| X | Y | Output |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
OR (X+Y = output) Truth Table

| X | Y | Output |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |
NOT ($\overline{X}$ = output) Truth Table

| X | Output |
|---|--------|
| 0 | 1 |
| 1 | 0 |
NAND ($\overline{XY}$ = output) Truth Table

| X | Y | Output |
|---|---|--------|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
NOR ($\overline{X + Y}$ = output) Truth Table

| X | Y | Output |
|---|---|--------|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |
XOR (exclusive-or) ($X \oplus Y$ = output) Truth Table
(When the inputs contain an odd number of ones, the output is a one.)

| X | Y | Output |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Ex (three inputs):

| X | Y | Z | Output |
|---|---|---|--------|
| 0 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 |
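The truth tables above can be regenerated mechanically. The sketch below models each gate on single bits (0 or 1); the names `GATES`, `truth_table`, and `xor_n` are illustrative choices, not course-provided code.

```python
from itertools import product

# Each two-input gate as a function on bits (0 or 1).
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "NAND": lambda x, y: 1 - (x & y),
    "NOR": lambda x, y: 1 - (x | y),
    "XOR": lambda x, y: x ^ y,
}


def not_gate(x: int) -> int:
    """NOT inverts a single bit."""
    return 1 - x


def truth_table(name: str) -> list:
    """Rows (X, Y, Output) in the same order as the tables above."""
    gate = GATES[name]
    return [(x, y, gate(x, y)) for x, y in product((0, 1), repeat=2)]


def xor_n(*inputs: int) -> int:
    """Multi-input XOR: the output is 1 when an odd number of inputs are 1."""
    return sum(inputs) % 2


for name in GATES:
    print(name, truth_table(name))
print(xor_n(0, 0, 1), xor_n(0, 1, 1), xor_n(1, 1, 1))  # 1 0 1
```

The last line reproduces the three rows of the three-input XOR example.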
Software Conventions for Register Usage
  R0: Always ∅ (read only)
  R1: Assembler temporary register
  R2-R3: Function results
  R4-R7: Function arguments (parameters)
  R8-R15: Temporary registers
  R16-R23: Temp registers (saved by the callee)
  R24-R25: Temporary registers

C-like:
  int a, b, c;
  a = b + c;

MIPS/EMY Assembly Language Program
  LW $t1, b
  LW $t2, c
  ADD $t3, $t1, $t2
  SW $t3, a
  b: .word 0x65E
  c: .word 0x2C
  a: .word 0x0

MIPS/EMY Mnemonic Machine Language Program
  400000 LW R9, 0(R8) #R9 ← M[10000000], R9 ← 65E
  400004 LW R10, 4(R8) #R10 ← M[10000000 + 4], R10 ← 2C
  400008 ADD R11, R9, R10 #R11 ← R9 + R10, R11 ← 65E + 2C
  40000C SW R11, 8(R8) #M[R8 + 8] ← R11, M[10000008] ← 68A

Global Data
  10000000 65E
  10000004 2C
  10000008 68A
  Note: R8 has 10000000, so LW R10, 4(R8) performs R10 ← M[10000000 + 4].

  LW Rt, Offset(Rs) #Rt ← M[Rs + Offset]; Rs is the source register that holds the base address
  ADD Rd, Rs, Rt #Rd ← Rs + Rt
  SW Rt, Offset(Rs) #M[Rs + Offset] ← Rt

A word is 32 bits, so consecutive words are 4 bytes apart in memory (4*8 = 32 bits).
ADD R11, R9, R10: R11 ← R9 + R10
  65E + 2C = 0110 0101 1110 + 0000 0010 1100 = 0110 1000 1010 = 68A, so R11 = 68A

Handout #4
Note 3:
  - Memory is passive
  - Memory can't perform architectural operations on data
  - It can only keep instructions and data
  - Another unit is implied in machine language programs to manipulate memory locations
  - This implied unit is visible in the microarchitecture (the CPU)
Note 4:
  RISC computers
  - Require explicit instructions (LW) to read data from memory into the CPU
  - Similarly require explicit instructions (SW) to store data to memory
  - Only LOADs and STOREs can access memory
  CISC computers
  - Have more complex arithmetic/logic/floating-point instructions, which can specify memory locations rather than only registers
RISC (see the table on Handout 4, Note 8): there are 7 memory accesses in that table.
For CISC, however:
  Memory accesses: 1 for instruction read, 2 for data reads, 1 for data write
  Only 4 memory accesses
  CISC instruction: ADD M[z], M[x], M[y]
Lecture 02/06/17
Designing a MIPS-Based Computer
1. Applications
  • Scientific apps
    o Number-crunching operations
2. Computer Architecture
  a. Data Representation & Word Length
  b. Register Model
  c. Input/Output
  d. Control Modes
  e. Interrupts
  f. Addressing
  g. Machine Language Instructions ≡ Machine Language Instruction Set ≡ architectural operations, arguments & addressing modes
    • An application specifies operations.
    • App operations are implemented by high-level language (HLL) statements.
    • HLL statements are implemented by machine language (ML) instructions.
    • An app operation is implemented by multiple HLL statements.
    • A single HLL statement is implemented by multiple ML instructions.
    • An HLL statement has variables & operation(s).
    • An ML instruction has architectural operations, arguments & addressing modes.
    • Memory is needed to keep variables.
      o Each variable is a memory location
        ▪ Static (global) & local variables
      o Today: memory can't perform operations, so we need CPUs

a = b + c: an HLL statement with 3 variables and an operation
  1. CISC: ADD (R8) (a), (R2) (b), (R10) (c)
     ONLY ONE ML INSTRUCTION, a CISC instruction with 3 memory accesses for data
  2. RISC instructions:
     LW R9, ∅(R8)
     LW R10, 4(R8)
     ADD R11, R9, R10
     SW R11, 8(R8)
     4 ML instructions

Ex: CISC vs. RISC
  a = b + c
  d = a + e
  f = d + b + c
  CISC: 10 memory accesses for data (actually 12, since one instruction can't add 3 operands at once)
  RISC, worst case: 8 memory accesses for data
  RISC, best case: 3 memory accesses for data
  RISC, usual case: 6 memory accesses for data
    ▪ Arithmetic/logic CISC instructions access memory for data, but arithmetic/logic RISC instructions do not.
      o RISC ≡ L/S (load/store) architecture: only load and store instructions access memory for data.
    ▪ Machine language instructions ≡ 1s & 0s
    ▪ Mnemonic machine language instructions
      o A mnemonic for each architectural operation
      o All instruction & data addresses are specified
      o All addresses & data elements are specified in hexadecimal
      o Special characters specify registers & addressing modes
    ▪ MIPS Instruction Set
      o EMY Instruction Set ≡ 9 instructions
    ▪ Arithmetic/logic, data transfer, and control instructions
    ▪ An ML instruction specifies an architectural operation, arguments & addressing modes.
    ▪ Each ML instruction uses an instruction format that indicates how to interpret its bits
      o Register, Immediate & Jump formats
    ▪ Instruction format fields
      o The opcode specifies the architectural operation
      o A number of register fields specify registers
      o If necessary, additional fields specify addressing modes
  o 32-bit Integer Addition (2's complement)
    ▪ Syntax:
      • ADD Rd, Rs, Rt
    ▪ Semantics:
      • Rd ← Rs + Rt, then
      • if overflow, generate an internal interrupt (exception)
    ▪ Format, addressing modes, memory accesses
      • Format: Register format, since Rd is needed

| Opcode (6 bits) | Rs (5 bits) | Rt (5 bits) | Rd (5 bits) | Shift Amount (5 bits) | Function (6 bits) |
|-----------------|-------------|-------------|-------------|-----------------------|-------------------|
| 000000 | | | | | 100000 → ADD |
      • Function = 2nd opcode
      • An addressing mode specifies how an argument is pointed at
      • 3 arguments: Rd, Rs, Rt
      • Rd is a destination register directly indicated by the instruction → Register Addressing Mode
      • Rs & Rt are source registers directly specified by the instruction → Register Addressing Mode
      • Number of memory accesses made (by the CPU) to run the instruction:
        • 1 memory access to fetch (read) the instruction

7. Machine Language Instructions ≡ Machine Language Instruction Set ≡ architectural operations, arguments & addressing modes
  • 32-bit 2's Complement Integer Addition
    ▪ Syntax:
      • ADD Rd, Rs, Rt
    ▪ Semantics:
      • Rd ← Rs + Rt, then if overflow, generate an arithmetic overflow internal interrupt (exception)
    ▪ Format, addressing modes & memory accesses
      o Format: Register format, since Rd is used
      o 3 arguments: Rd, Rs, Rt
        ▪ 3 addressing modes
        ▪ Rd is a destination argument, a register, directly specified by the instruction → Register Addressing Mode
        ▪ Rs & Rt are source arguments, registers, directly specified by the instruction → Register Addressing Mode
        ▪ 1 memory access made by the CPU to run the instruction (instruction fetch)
  • 32-bit 2's Complement Integer Subtraction
    ▪ Syntax: SUB Rd, Rs, Rt
    ▪ Semantics: Rd ← Rs − Rt, then if overflow, generate an arithmetic overflow internal interrupt (exception)
    ▪ Format, etc.
      o Format: Register format, since Rd is needed
      o The rest is the same as ADD

Register Format

| Opcode (6 bits) | Rs (5 bits) | Rt (5 bits) | Rd (5 bits) | Shift Amount (5 bits) | Function (6 bits) |
|-----------------|-------------|-------------|-------------|-----------------------|-------------------|
| 000000 | | | | | 100000 → ADD |
| 000000 | | | | | 100010 → SUB |
| 000000 | | | | | 101010 → SLT |
| 000000 | | | | | 100100 → AND |
| 000000 | | | | | 100101 → OR |
  • 32-bit 2's Complement Compare (Less Than)
    o Syntax: SLT Rd, Rs, Rt
    o Semantics: if Rs < Rt then Rd ← 1 else Rd ← ∅
    o Format, etc.
      ▪ Format: R format, since Rd is needed
      ▪ 5 arguments: Rd, Rs, Rt, ∅, 1
      ▪ Rs, Rt, Rd: same as ADD; ∅ & 1 are implied by the instruction → Implied Addressing Mode
      ▪ 1 memory access made by the CPU to run the instruction: instruction fetch (read)
  • 32-bit AND
    o AND Rd, Rs, Rt
    o Rd ← Rs & Rt (& is bitwise AND)
    o The rest is the same as ADD except for the function field
    o AND R8, R9, R10
    o R8 ← R9 & R10

| Register | … | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0 |
|----------|---|---|---|---|---|---|---|
| R9 | … | 0 | 1 | 0 | 1 | 1 | 0 |
| R10 | … | 0 | 1 | 1 | 0 | 1 | 0 |
| R8 | … | 0 | 1 | 0 | 0 | 1 | 0 |
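  • 32-bit OR
    o OR Rd, Rs, Rt
    o Rd ← Rs | Rt (| is bitwise OR)
    o The rest is the same as ADD except for the function field
    o OR R8, R9, R10 → R8 ← R9 | R10

| Register | … | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0 |
|----------|---|---|---|---|---|---|---|
| R9 | … | 0 | 1 | 0 | 1 | 1 | 0 |
| R10 | … | 0 | 1 | 1 | 0 | 1 | 0 |
| R8 | … | 0 | 1 | 1 | 1 | 1 | 0 |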
  • Decoding example: these 32 bits are an instruction:
    o 0000 0001 0110 1001 0101 0000 0010 0000
    o Start by checking the leftmost 6 bits (the opcode)
    o It is 000000, which indicates the Register format
    o So we have to look at the last 6 bits (the function field)
      ▪ Opcode: 000000
      ▪ Rs: 01011 = R11
      ▪ Rt: 01001 = R9
      ▪ Rd: 01010 = R10
      ▪ Shift Amount: 00000
      ▪ Function: 100000
    o ADD R10, R11, R9
  • 32-bit Load from Memory
    o LW Rt, Disp(Rs)
    o Rt ← M[Rs + SignExt(Disp)], where Rs + SignExt(Disp) is the effective address
    o Format, etc.
      ▪ Immediate format, since a displacement is needed
      ▪ 2 arguments: Rt is a destination argument, a register, directly specified by the instruction → Register Addressing Mode
      ▪ A memory location is the source argument; its address is calculated by adding Rs & the sign-extended displacement → 2-byte signed Displacement Addressing Mode
      ▪ 2 memory accesses: 1 to fetch the instruction and 1 to read a data element
    o Example:
      ▪ 400000 LW R9, ∅(R8) → R9 ← M[R8 + 0] → R9 ← M[R8]
      ▪ 400004 LW R10, 4(R8) → R10 ← M[R8 + 4]
      ▪ 400008 ADD R11, R9, R10
      ▪ 40000C SW R11, 8(R8) → M[R8 + 8] ← R11
      ▪ Global data, with R8 pointing at 10000000:
        • 10000000 65E ← R8
        • 10000004 2C
        • 10000008 0
      ▪ Equivalently, if R8 pointed at 10000008 instead, the displacements would be negative: LW R9, $(-8)_{10}$(R8); LW R10, $(-4)_{10}$(R8); SW R11, ∅(R8)

Immediate Format

| Opcode (6 bits) | Rs (5 bits) | Rt (5 bits) | Immediate (16 bits) |
|-----------------|-------------|-------------|---------------------|
| 100011 → LW | | | |
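The field layouts above are enough to decode an instruction word by shifting and masking. The following is a minimal sketch, not the course's decoder: `decode` and `sign_extend_16` are hypothetical names, and the lookup tables contain only the encodings listed so far in these notes (the five R-format functions and LW), so any other word raises a `KeyError`.

```python
# Function codes for opcode 000000 (Register format) and the one I-format opcode seen so far.
R_FUNCTIONS = {0b100000: "ADD", 0b100010: "SUB", 0b101010: "SLT",
               0b100100: "AND", 0b100101: "OR"}
I_OPCODES = {0b100011: "LW"}


def sign_extend_16(value: int) -> int:
    """Interpret a 16-bit field as a signed 2's complement number."""
    return value - 0x10000 if value & 0x8000 else value


def decode(word: int) -> str:
    """Split a 32-bit instruction word into fields and return its mnemonic form."""
    opcode = (word >> 26) & 0x3F   # leftmost 6 bits
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if opcode == 0:                # Register format: look at the function field
        rd = (word >> 11) & 0x1F
        function = word & 0x3F
        return f"{R_FUNCTIONS[function]} R{rd}, R{rs}, R{rt}"
    disp = sign_extend_16(word & 0xFFFF)  # Immediate format: 16-bit displacement
    return f"{I_OPCODES[opcode]} R{rt}, {disp}(R{rs})"


# 0000 0001 0110 1001 0101 0000 0010 0000 is the word decoded by hand above.
print(decode(0b00000001011010010101000000100000))  # ADD R10, R11, R9
print(decode(0x8D0A0004))                          # LW R10, 4(R8)
print(decode(0x8D09FFF8))                          # LW R9, -8(R8)
```

The last call shows the sign extension at work: the 16-bit field FFF8 is read as the displacement −8.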
  • 32-bit Store
    o SW Rt, Disp(Rs)
    o M[Rs + SignExt(Disp)] ← Rt, where Rs + SignExt(Disp) is the effective address
    o Format, etc.
      ▪ I format, since a Disp is needed
      ▪ 2 memory accesses: instruction fetch (IF) and the data access

Recitation 2/10/17
• Handout 4, Page 4, #2
  o Write a mnemonic machine language program that implements the following high-level statement:
    ▪ a = b | c
  o The statement ORs two variables and stores the result in another variable.
  o Assume:
    ▪ Variable "a" is in memory location 10000004.
    ▪ "b" is in R8 and contains 27.
    ▪ "c" is in memory location 10000000, which contains 4A0.
    ▪ The program starts at location 400000. R9 contains 10000000 initially.
  o Program:
    ▪ 400000 LW R10, 0(R9) #Load from memory to register: R10 ← M[10000000], R10 ← 4A0
    ▪ 400004 OR R11, R8, R10 #OR two registers: R11 ← R8 | R10, R11 ← 4A7
    ▪ 400008 SW R11, 4(R9) #Store from register to memory: M[10000004] ← R11, M[10000004] ← 4A7

| Instruction | PC | R8 | R9 | R10 | R11 | M[10000000] | M[10000004] | Mem. Accesses |
|-------------|----|----|----|-----|-----|-------------|-------------|---------------|
| Initial | 400000 | 27 | 10000000 | ? | ? | 4A0 | 0 | - |
| LW R10, 0(R9) | 400004 | NS | NS | 4A0 | NS | NS | NS | 2: IF & DR |
| OR R11, R8, R10 | 400008 | NS | NS | NS | 4A7 | NS | NS | 1: IF |
| SW R11, 4(R9) | 40000C | NS | NS | NS | NS | NS | 4A7 | 2: IF & data write |
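The trace table above can be reproduced with a tiny interpreter. This is a rough sketch under simplifying assumptions: instructions are stored as tuples rather than 32-bit words, only LW, SW, OR, and ADD are handled, overflow exceptions are ignored, and the function name `run` is invented for the example.

```python
def run(program: dict, registers: dict, memory: dict, pc: int) -> int:
    """Execute instructions until PC leaves the program; return the memory access count."""
    accesses = 0
    while pc in program:
        op, a, b, c = program[pc]
        accesses += 1                          # instruction fetch (IF)
        if op == "LW":                         # LW Ra, b(Rc): Ra <- M[Rc + b]
            registers[a] = memory[registers[c] + b]
            accesses += 1                      # data read
        elif op == "SW":                       # SW Ra, b(Rc): M[Rc + b] <- Ra
            memory[registers[c] + b] = registers[a]
            accesses += 1                      # data write
        elif op == "OR":                       # OR Ra, Rb, Rc
            registers[a] = registers[b] | registers[c]
        elif op == "ADD":                      # ADD Ra, Rb, Rc (kept to 32 bits)
            registers[a] = (registers[b] + registers[c]) & 0xFFFFFFFF
        pc += 4                                # every instruction is one 4-byte word
    return accesses


program = {
    0x400000: ("LW", 10, 0, 9),
    0x400004: ("OR", 11, 8, 10),
    0x400008: ("SW", 11, 4, 9),
}
registers = {8: 0x27, 9: 0x10000000}
memory = {0x10000000: 0x4A0, 0x10000004: 0}

total = run(program, registers, memory, 0x400000)
print(hex(registers[10]), hex(registers[11]), hex(memory[0x10000004]), total)
# 0x4a0 0x4a7 0x4a7 5
```

The total of 5 accesses matches the last column of the table: 2 for LW, 1 for OR, and 2 for SW.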
Note 3: What if the base register of the SW was R10 by mistake?
  SW R11, 4(R10): R10 = 4A0 → the store goes to M[4A4]
  (Syntax reminder: SW Rt, Offset(Rs))

HW 1: Relevant Questions and Answers
Q1) The EMY machine language set does NOT have the following instruction:
  400000 ADDRM R8, 500(R9) #R8 ← R8 + M[R9 + SignExt(500)]
  i) Implement the instruction by using a few actual EMY instructions in mnemonic notation.
    400C00 LW R10, 500(R9) #Load the memory operand to R10
    400C04 ADD R8, R8, R10 #Add the register and the memory operand
  ii) Assume this instruction is added to the EMY instruction set. Describe its syntax, semantics, format, etc.
    Syntax: ADDRM Rt, Disp(Rs)
    Semantics: Rt ← Rt + M[Rs + SignExt(Disp)]; if overflow, generate an internal interrupt
    Format: I format, because a displacement is needed:

| Opcode (6 bits) | Rs (5 bits) | Rt (5 bits) | Displacement (16 bits) |
|-----------------|-------------|-------------|------------------------|
| | | | |

    Rt is both a source and the destination register, so this one register argument uses the Register Addressing Mode.
    The other source argument is a memory location whose address is the sum of a register and a displacement, so it uses the 2-byte signed Displacement Addressing Mode.
    Two memory accesses are made for this instruction: 1 for instruction fetch and 1 for data read.
    ADDRM stands for Add Register Memory.

R format

| Opcode (6 bits) | Rs (5 bits) | Rt (5 bits) | Rd (5 bits) | Shift Amount (5 bits) | Function (6 bits) |
|-----------------|-------------|-------------|-------------|-----------------------|-------------------|
| | | | | | |

Q2) 400300 ADDMRM R8, (R9, R10) #M[R9] ← R8 + M[R9 + R10]
  i) Implement this instruction:
    400300 ADD R11, R9, R10 #Calculate the effective address of the memory operand
    400304 LW R12, 0(R11) #Load the memory operand
    400308 ADD R13, R8, R12 #Add the register and the memory operand
    40030C SW R13, 0(R9) #Store the result into the other memory location
  ii) Syntax: ADDMRM Rd, (Rs, Rt)
    Semantics: M[Rs] ← Rd + M[Rs + Rt]; if overflow, generate an internal interrupt
    Format: R format, since 3 registers (Rs, Rt, Rd) are needed:

| Opcode (6 bits) | Rs (5 bits) | Rt (5 bits) | Rd (5 bits) | Shift Amount (5 bits) | 2nd Opcode (6 bits) |
|-----------------|-------------|-------------|-------------|-----------------------|---------------------|
| | | | | | |
    3 arguments are used by the instruction…
    We perform 3 memory accesses: 1 for instruction fetch, 1 for data read, and 1 for data write.
    ADDMRM stands for Add Memory and Register to Memory.

Lecture 2/13/17
  • 32-bit Store to Memory
    o SW Rt, Disp(Rs)
    o M[Rs + SignExt(Disp)] ← Rt, where Rs + SignExt(Disp) is the effective address
    o Disp is a 16-bit signed displacement (in terms of bytes)
    o Format, etc.
      ▪ Immediate format, since we need a displacement
      ▪ 2 arguments: Rt is a source argument, a register, directly specified by the instruction → Register Addressing Mode. A memory location is the destination; its address is calculated by adding Rs and the sign-extended displacement → 2-byte signed Displacement Addressing Mode
      ▪ 2 memory accesses:
        • 1 to fetch the instruction
        • 1 to store the data
  • Control Instructions
    o Skipping a few instructions (short range) conditionally: conditional branches
    o Long-range skipping unconditionally: jumps
  • Branch If Equal To
    o BEQ Rs, Rt, Offset
    o If Rs = Rt then
    o PC ← PC + SignExt(Offset) * 4 (the effective address)
    o Offset is a 16-bit signed offset in terms of words (locations)

Immediate Format

| Opcode (6 bits) | Rs (5 bits) | Rt (5 bits) | Immediate (16 bits) |
|-----------------|-------------|-------------|---------------------|
| 101011 → SW | | | |
| 000100 → BEQ | | | |
    o Format, etc.
      ▪ I format, since we need an offset
      ▪ 4 arguments:
        • Rs & Rt…
        • PC is implied by the instruction, so it uses the Implied Addressing Mode
        • An address is the source argument, calculated by adding PC & the sign-extended, multiplied-by-4 offset → 2-byte signed PC-Relative Addressing Mode
      ▪ 1 memory access to fetch the instruction
    o Example:
      ▪ 400E04 ADD
      ▪ 400E08 SUB
      ▪ 400E0C BEQ R8, R9, 1 (If R8 = R9 we branch; we are skipping 1 instruction, so the offset is 1. To go back to the ADD instead, the offset would be −3.)
      ▪ 400E10 OR
      ▪ 400E14 AND
    o Branch taken vs. untaken
    o If a = b (R8 = R9) then c = d + e (R10 = R11 + R12)
    o Else f = g − h (R13 = R14 − R15)
      ▪ 400100 BEQ R8, R9, 2
      ▪ 400104 SUB R13, R14, R15
      ▪ 400108 BEQ R8, R8, 1 (always taken, so it acts as an unconditional jump)
      ▪ 40010C ADD R10, R11, R12
      ▪ 400110
  • 26-bit Unconditional Jump
    o J Address
    o PC ← PC[31:28], (Address * 4) (the effective address; Address * 4 is 28 bits)
      ▪ Address is in terms of words (locations)
    o Format, etc.
      ▪ Jump format, since a 26-bit address is needed
      ▪ 2 arguments: PC is the destination argument, implied by the instruction, so it uses the Implied Addressing Mode. A memory address is the source argument, calculated by multiplying the 26-bit address by 4 and attaching it to the leftmost 4 bits of PC → 26-bit PC-Direct Addressing Mode

Jump Format

| Opcode (6 bits) | Address (26 bits) |
|-----------------|-------------------|
| 000010 → J | |
    o Example:
      ▪ 00400BE0 J
      ▪ 00404000 ADD
      ▪ 00404000 (effective address) → divide by 4 → 101000 (address field)
    o The if/else example again, using a jump:
      ▪ 400100 BEQ R8, R9, 2
      ▪ 400104 SUB R13, R14, R15
      ▪ 400108 J 100044 (the result of dividing 400110 by 4)
      ▪ 40010C ADD R10, R11, R12
      ▪ 400110
  • 32-bit Jump
    o JR Rs
    o PC ← Rs
    o Format, etc.
      ▪ R format, since Rs is needed, even though the I format could also be used
        • Opcode: 000000
        • Function: 001000
      ▪ 2 arguments: PC & Rs. PC is implied by the instruction, so it uses the Implied Addressing Mode, and Rs uses the Register Addressing Mode.
      ▪ 1 memory access to fetch the instruction
    o JR R31 (R31 is the return register)
    o Suppose the following function:
      ▪ 4A00B0 ADD
      ▪ …
      ▪ 4A0104 JR R31
    o JR R31 is the return from the function

Lecture 2/15/17
7. MIPS Machine Language Instructions
  • ADD, SUB, SLT, AND, OR, LW, SW, BEQ, J (the EMY CPU will run them)
  • JR → return from functions + …
  • Jump to Functions
    o Jump And Link
    o JAL Address
    o R31 ← PC, then
    o PC ← PC[31:28], (Address * 4) (the effective address)
    o Format, etc.
      ▪ J format, since we need a 26-bit address
      ▪ 4 arguments: R31, PC, PC, (PC[31:28], (Address * 4))
      ▪ …
      ▪ 1 memory access to fetch the instruction

Jump Format

| Opcode (6 bits) | Address (26 bits) |
|-----------------|-------------------|
| 000011 → JAL | |
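The 26-bit address field shared by J and JAL can be computed from a target address, and the target rebuilt from the field, with a couple of shifts. The sketch below uses the hypothetical helper names `jump_field` and `jump_target`; it assumes targets are word-aligned and lie in the same 256 MB region as the PC.

```python
def jump_field(target: int) -> int:
    """26-bit address field for J/JAL: drop PC[31:28], then divide by 4."""
    if target % 4 != 0:
        raise ValueError("jump targets must be word-aligned")
    return (target & 0x0FFFFFFF) >> 2   # dividing by 4 is a right shift by 2 bits


def jump_target(pc: int, field: int) -> int:
    """Effective address: the leftmost 4 bits of PC followed by field * 4 (28 bits)."""
    return (pc & 0xF0000000) | (field << 2)


print(hex(jump_field(0x00404000)))             # 0x101000
print(hex(jump_field(0x00400110)))             # 0x100044
print(hex(jump_target(0x00400BE0, 0x101000)))  # 0x404000
```

The first two results match the J examples above (101000 and 100044).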
    o Example:
      ▪ 400EA4 ADD
      ▪ 400EA8 JAL (102002)
      ▪ 400EAC SUB (R31 gets the address of this instruction)
      ▪ (The three instructions above are in the main program.)
      ▪ 408008 OR
      ▪ 408E0C JR R31
      ▪ (These are the function's instructions.)
  • Branch If Not Equal To
    o BNE Rs, Rt, Offset
    o If Rs ≠ Rt then
    o PC ← PC + SignExt(Offset) * 4 (the effective address)
    o I format, since we need a 16-bit offset
    o The rest is the same as BEQ
    o 1 memory access to fetch the instruction

Immediate Format

| Opcode (6 bits) | Rs (5 bits) | Rt (5 bits) | Immediate (16 bits) |
|-----------------|-------------|-------------|---------------------|
| 000101 → BNE | | | |
| 001000 → ADDI | | | |
| 001010 → SLTI | | | |
| 001100 → ANDI | | | |
| 001101 → ORI | | | |
| 001111 → LUI | | | |
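For BEQ and BNE, the 16-bit offset field can be derived from the branch and target addresses. This sketch assumes, as the branch examples above imply (an offset of 1 skips one instruction), that PC already points at the instruction after the branch when the offset is added. The names `branch_offset_field` and `branch_target` are made up for illustration.

```python
def branch_offset_field(branch_address: int, target: int) -> int:
    """16-bit offset field for BEQ/BNE, counted in words from the next instruction."""
    delta = target - (branch_address + 4)   # PC already points past the branch
    if delta % 4 != 0:
        raise ValueError("branch targets must be word-aligned")
    offset = delta // 4
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError("target is out of range for a 16-bit signed offset")
    return offset & 0xFFFF                  # 2's complement encoding in 16 bits


def branch_target(branch_address: int, field: int) -> int:
    """Effective address: (branch address + 4) + sign-extended field * 4."""
    offset = field - 0x10000 if field & 0x8000 else field
    return branch_address + 4 + offset * 4


print(branch_offset_field(0x400E0C, 0x400E14))        # 1: skip one instruction
print(hex(branch_offset_field(0x400E0C, 0x400E04)))   # 0xfffd, i.e. -3
print(branch_offset_field(0x400100, 0x40010C))        # 2
print(hex(branch_target(0x400108, 1)))                # 0x400110
```

A negative offset such as −3 is stored as FFFD, which is why the field has to be sign-extended before it is multiplied by 4.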
  • a = b + 25 (How do we deal with constants?)
  • 32-bit 2's Complement Immediate Add
    o ADDI Rt, Rs, Imm
    o Rt ← Rs + SignExt(Imm), then if overflow, generate the internal interrupt signal
    o Format, etc.
      ▪ I format, since we need a 16-bit immediate data element
      ▪ 3 arguments: Rt, Rs, immediate data element
      ▪ Rs, Rt: Register Addressing Mode
      ▪ The immediate data is obtained from the 16-bit Imm by doing a sign extension → 2-byte signed Immediate Addressing Mode
      ▪ 1 memory access to fetch the instruction
    o a = b + 25: 400A04 ADDI R8, R9, 19 (25 decimal = 19 hex)
  • 32-bit 2's Complement Immediate Compare
    o SLTI Rt, Rs, Imm
    o If Rs < SignExt(Imm) then Rt ← 1
    o Else Rt ← ∅
    o Format, etc.
      ▪ I format, since we need a 16-bit immediate data element
  • 32-bit Immediate AND
    o ANDI Rt, Rs, Imm
    o Rt ← Rs & ZeroExt(Imm)
    o I format, since we need a 16-bit immediate data element
    o 3 arguments: Rs, Rt, immediate data element
    o Rs, Rt: Register Addressing Mode
    o The immediate data element is obtained by concatenating 16 ∅s to the left of the Imm field → 2-byte unsigned Immediate Addressing Mode
    o Masking:
      ▪ a: 010…0 1011 0111 1111 0100 (low 16 bits shown)
      ▪ ZeroExt(Imm): 000…0 0000 1111 1111 0000
      ▪ ANDI: 000…0 0000 0111 1111 0000
      ▪ ORI: 010…0 1011 1111 1111 0100
  • 32-bit Immediate OR
    o ORI Rt, Rs, Imm
    o Rt ← Rs | ZeroExt(Imm)
    o I format, since we need a 16-bit immediate data element
    o 3 arguments: Rs, Rt, immediate data element
    o Rs, Rt: Register Addressing Mode
    o The immediate data element is obtained by concatenating 16 ∅s to the left of the Imm field → 2-byte unsigned Immediate Addressing Mode
  • Load Upper Immediate
    o Initializes the leftmost 16 bits of a register with a constant
    o LUI Rt, Imm
    o Rt ← Imm, 0000 (Imm in the upper half, zeros in the lower half)
    o I format, since we need a 16-bit immediate data element
    o Rs is not used
    o Example: a = b + $(6789ABCD)_{16}$ needs the 32-bit constant in a register first:
      ▪ 40EF00 LUI R8, 6789 #R8 ← 67890000
      ▪ 40EF04 ORI R8, R8, ABCD #R8 ← R8 | 0000ABCD ≡ R8 ← 6789ABCD
  • 32-bit NOR
    o NOR Rd, Rs, Rt
    o Rd ← $\overline{Rs \mid Rt}$
    o R format, since we need Rd
    o Opcode: 000000
    o Function: 100111
    o (The rest is the same as AND/OR)
    o Ex: R8 ← $\overline{R8}$ can be written either way:
      ▪ NOR R8, R8, R0
      ▪ NOR R8, R8, R8

Recitation 2/17/17
  ▪ Handout 4, Page 4, #3: write a function that performs absolute value.
  Notes: the number "k" is passed in parameter register R4. The result is returned in R2.
    400400 ADD R2, R4, R0 #R2 ← R4 + R0 ≡ R2 ← R4 (initially assume k is positive)
    400404 SLT R8, R4, R0 #Is R4 < R0? ≡ Is "k" < 0?
    400408 BEQ R8, R0, 1 #No (R8 = 0): skip the SUB instruction. Yes (R8 = 1): perform the SUB.
    40040C SUB R2, R0, R4 #R2 ← R0 − R4 ≡ negate R4
    400410 JR R31
  Set if less than: SLT Rd, Rs, Rt
    If Rs < Rt, Rd ← 1
    Else Rd ← 0
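The absolute-value routine above can be traced one instruction at a time. The sketch below mirrors the five instructions with a small register dictionary; `absolute_value` and `to_signed` are hypothetical names, and the arithmetic overflow exception (which a real SUB would raise for the most negative 32-bit value) is not modeled.

```python
def to_signed(value: int) -> int:
    """Wrap a Python int to a 32-bit 2's complement value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def absolute_value(k: int) -> int:
    """Follow the routine: the argument arrives in R4 and the result leaves in R2."""
    r = {0: 0, 2: 0, 4: k, 8: 0}
    r[2] = to_signed(r[4] + r[0])          # 400400 ADD R2, R4, R0
    r[8] = 1 if r[4] < r[0] else 0         # 400404 SLT R8, R4, R0
    if r[8] != r[0]:                       # 400408 BEQ R8, R0, 1 (falls through when k < 0)
        r[2] = to_signed(r[0] - r[4])      # 40040C SUB R2, R0, R4
    return r[2]                            # 400410 JR R31


print(absolute_value(-50), absolute_value(50), absolute_value(0))  # 50 50 0
```

Copying R4 into R2 first means the non-negative case needs no further work, so the branch only has to skip a single SUB.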
![]()
Assume all registers have values preloaded
If a=b (R8=R9) then c=d+e (R10=R11+R12) else f=g-h (R13=R14-R15)
k=m|p (R16=R17|R18)
400A00 BNE R8, R9, 2 #a!=b? If so, go to 400A0C
400A04 ADD R10, R11, R12 #c=d+e
400A08 J 100284 #skip the else part
400A0C SUB R13, R14, R15 #f=g-h
400A10 OR R16, R17, R18 #k=m|p
J 100284 → 0000 | 00 0001 0000 0000 0010 1000 0100 | 00 → 00400A10 → 400A10
Jump and Branch
Jump Address (26 Bits)
PC ← PC[31−28], (Address ∗ 4)
32 bits = 4 bits + (26 + 2 bits)
[Figure: memory map. The address space is split into sixteen 256 MB segments selected by PC[31−28]: 0000 = 0, 0001 = 1, 0010 = 2, 0011 = 3, …, 1111 = F. The first 8 segments are user space (2 GB), the last 8 are system space (2 GB). Address 00400000 lies in segment 0.]
J 100A00
PC[31-28]: 0000
Address: | 00 0001 0000 0000 1010 0000 0000 | 00 → 00402800
40002C SUB R12, R13, R14
. . .
40040C J 940294 (Jump down) / 10000B (Jump up)
. . .
2500A50 ADD R8, R9, R10
1. Jump to Address 2500A50 (Implicit 0000 to the left)
Convert to Binary: 0000 | 0010 0101 0000 0000 1010 0101 00 | 00
PC[31-28] | 0 9 4 0 2 9 4 | Dividing by 4 causes a right shift by 2 → 940294
2. Jump to Address 40002C
0000 | 0000 0100 0000 0000 0000 0010 11 | 00
PC[31-28] | 0 1 0 0 0 0 B | Dividing by 4 causes a right shift by 2 → 10000B
Branch: PC ← PC + 4 + (Offset± ∗ 4) (Effective Address)
400308
. . .
400F14 BEQ R1, R2, 3D/FCFC
. . .
40100C
1. Branch to 40100C (from 400F18 because of PC+4)
0040100C → 0000 0000 0100 0000 0001 0000 0000 1100
− 00400F18 → 0000 0000 0100 0000 0000 1111 0001 1000
Convert the SUB into an ADD of the 2's complement:
0000 0000 0100 0000 0001 0000 0000 1100
+ 1111 1111 1011 1111 1111 0000 1110 1000 (to negate 00400F18, invert every bit to the left of the rightmost 1; here the last 4 bits stay 1000)
= 0000 0000 0000 0000 0000 0000 1111 0100 (carry out of 1; no overflow when adding a positive and a negative number)
(hex:) 000000F4 → F4
F4 = Offset± ∗ 4 → Offset± = F4/4. Dividing by 4 removes the last two bits: 1111 0100 → 11 1101 = 3D
2. Branch to 400308 (see the first example for the SUB→ADD conversion)
00400308 → 0000 0000 0100 0000 0000 0011 0000 1000
− 00400F18 → 0000 0000 0100 0000 0000 1111 0001 1000
= 1111 1111 1111 11|11 1111 0011 1111 0000 (I Format can only use a 16-bit offset)
11 1111 0011 1111 0000 → F3F0/4 → 1111 1100 1111 1100 → FCFC
Lecture 02/22/17
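The jump and branch encodings worked out by hand above can be cross-checked with a short Python sketch. The function names (`jump_field`, `jump_target`, `branch_offset`) are hypothetical helpers chosen for illustration; only the formulas PC ← PC[31−28],(Address ∗ 4) and PC ← PC + 4 + (Offset± ∗ 4) come from the notes.

```python
def jump_field(target: int) -> int:
    """26-bit Address field of J: drop PC[31-28] and the two low zero bits."""
    return (target >> 2) & 0x3FFFFFF


def jump_target(pc: int, field: int) -> int:
    """Rebuild the 32-bit target: PC[31-28] followed by Address * 4."""
    return (pc & 0xF0000000) | (field << 2)


def branch_offset(branch_pc: int, target: int) -> int:
    """16-bit Offset field of a branch stored at branch_pc."""
    # Signed word distance measured from PC + 4 (arithmetic shift keeps the sign).
    words = (target - (branch_pc + 4)) >> 2
    return words & 0xFFFF  # keep the low 16 bits (2's complement)


if __name__ == "__main__":
    print(hex(jump_field(0x00400A10)))               # field for J to 400A10
    print(hex(jump_field(0x02500A50)))               # jump down example
    print(hex(jump_field(0x0040002C)))               # jump up example
    print(hex(branch_offset(0x400F14, 0x40100C)))    # forward branch
    print(hex(branch_offset(0x400F14, 0x400308)))    # backward branch
```

Running it reproduces the fields derived in the examples, including the backward-branch offset that needs the 2's complement form.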
![]()
![]()
![]()
![]()
![]()
02/24/17- Recitation
4. Write a function for multiplying 2 numbers, y and z, both of which are always greater than ∅. Assume y & z are passed into R4 & R5 respectively.
401050 ADD R8, R0, R0 # We clear R8
401054 ADD R9, R4, R0 # We move R4, that is y, to R9
401058 ADD R8, R5, R8 # We add R5, z, to R8 (add z to the temporary result)
40105C ADDI R9, R9, (−1)10 # We subtract 1 from y
401060 BNE R9, R0, (−3)10 # Is it the end (is y equal to zero)? If not, go to 401058
401064 ADD R2, R8, R0 # The end. We move the result to R2 (software register conventions)
401068 JR R31 # We return from the function
Execution Table (4*2)
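| Instruction | PC | R2 | R4 | R5 | R8 | R9 | R31 | Memory Accesses |
|---|---|---|---|---|---|---|---|---|
| Initial | 401050 | ? | (2)10 | (4)10 | ? | ? | 401000 | - |
| ADD R8, R0, R0 | 401054 | NS | NS | NS | 0 | NS | NS | 1:IF |
| ADD R9, R4, R0 | 401058 | NS | NS | NS | NS | 2 | NS | 1:IF |
| ADD R8, R5, R8 | 40105C | NS | NS | NS | 4 | NS | NS | 1:IF |
| ADDI R9, R9, (−1)10 | 401060 | NS | NS | NS | NS | 1 | NS | 1:IF |
| BNE R9, R0, (−3)10 | 401058 | NS | NS | NS | NS | NS | NS | 1:IF |
| ADD R8, R5, R8 | 40105C | NS | NS | NS | 8 | NS | NS | 1:IF |
| ADDI R9, R9, (−1)10 | 401060 | NS | NS | NS | NS | 0 | NS | 1:IF |
| BNE R9, R0, (−3)10 | 401064 | NS | NS | NS | NS | NS | NS | 1:IF |
| ADD R2, R8, R0 | 401068 | 8 | NS | NS | NS | NS | NS | 1:IF |
| JR R31 | 401000 | NS | NS | NS | NS | NS | NS | 1:IF |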
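As a cross-check of the execution table, here is a minimal Python sketch of the same repeated-addition loop. The function name `multiply` and the variable names are illustrative only; each statement is annotated with the instruction it mirrors, and like the assembly it assumes y > 0.

```python
def multiply(y: int, z: int) -> int:
    """Multiply by repeated addition, mirroring the recitation routine (y > 0)."""
    r8 = 0               # ADD  R8, R0, R0   clear the running result
    r9 = y               # ADD  R9, R4, R0   loop counter = y
    while True:
        r8 += z          # ADD  R8, R5, R8   add z once per iteration
        r9 -= 1          # ADDI R9, R9, -1
        if r9 == 0:      # BNE  R9, R0, -3   falls through when R9 == 0
            break
    return r8            # ADD  R2, R8, R0   result returned in R2


if __name__ == "__main__":
    print(multiply(2, 4))  # same inputs as the execution table (R4 = 2, R5 = 4)
```

The loop body runs y times, so the instruction count (and the number of instruction fetches) grows linearly with y.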
a=25-b
400000 SUB R10, R0, R9 #b=R9, R10 ← 0 − b ≡ R10 ← (−b)
400004 ADDI R8, R10, (19)16 #R8 ← (25)10 + R10 ≡ (25)10 − b, (19)16 ≡ (25)10
(Assume that b=6)
Execution Table
| Instruction | PC | R8 | R9 | R10 | Memory Accesses |
|---|---|---|---|---|---|
| Initial | 400000 | ? | 6 | ? | - |
| SUB R10, R0, R9 | 400004 | NS | NS | (−6)10 | 1:IF |
| ADDI R8, R10, 19 | 400008 | (19)10 | NS | NS | 1:IF |
Compiling a While Loop
while (A[i] == k) (Look for the first "i" where A[i] ≠ k)
i=i+1;
40A000 SLL R8, R9, 2 #R9==i, R8 = i ∗ 4 for byte addressing
40A004 ADD R11, R10, R8 #R10 has the A[∅] address
40A008 LW R12, ∅(R11) #Load A[i] to R12
40A00C BNE R12, R13, 2 #If A[i] ≠ k, exit the loop {k=R13}
40A010 ADDI R9, R9, (1)10 #Increment i
40A014 J 102800 #go back to 40A000 for another iteration (40A000/4 = 102800)
40A018 #out of the loop
Shorthand Execution Table:
SLL i = 0 → R8 = 0
ADD R11 → A[0]
LW R12=A[0]
BNE {Don't take branch}
ADDI i+=1
J
SLL i = 1 → R8 = 4
ADD R10 + R8 = A[0] + 4 bytes → A[1] = R11
LW R12=A[1]
BNE
Coding LB & SB (load byte & store byte)
400A00 LB R8, 0(R9) #R9 has 10000003
Memory word at 10000000: FE 78 9A BC (bytes at 10000000, 10000001, 10000002, 10000003)
#R8 ← M[10000003] ≡ R8 ← (BC)16 sign-extended ≡ R8 ← FFFFFFBC
(BC)16 → 1011 1100
1111 … 1111 1011 1100 = F … F BC (6 F's, then B C)
LB Coding: LB uses the I-format since it needs a displacement
10 0000 01001 01000 0000 0000 0000 0000
Opcode: 10 0000 (LB) Rs: 01001 (R9) Rt: 01000 (R8) Disp: 0000 0000 0000 0000 (0)
40C004 SB R10, (−1)10(R9) #R9 has 10000003 & R10 has ABCDEF
R10 = ABCDEF (EF is the byte to store)
Memory word at 10000000: FE 78 EF BC (EF takes the place of 9A)
#M[10000003−1=10000002] ← EF
SB Coding: SB uses the I format since it needs a displacement
10 1000 01001 01010 1111 1111 1111 1111
Opcode: 10 1000 (SB) Rs: 01001 (R9) Rt: 01010 (R10) Disp: 1111 1111 1111 1111 (−1)
SWAP(v,k,p) (temp=v[k], v[k]=p, p=temp) v[k] ↔ p
SWAP: LW R8, 0(R9) #R9 points to "k"
ADDI R10, R0, 4 #R10 gets 4
MULT R10, R8 #(Hi, Lo) ← 4 ∗ k
MFLO R11 #R11 gets 4*k (MFLO=Move from Lo)
ADD R12, R12, R11 #R12 points at v[k]; R12 originally pointed at v[0]
LW R13, 0(R12) #R13=temp=v[k]
LW R14, 0(R15) #R15 → "p"
SW R13, 0(R15) #"p" ← v[k] = temp
SW R14, 0(R12) #v[k] ← p
02/27/17- Lecture 7.
MIPS Machine Language Instructions
• 32-bit 2's Complement Multiply
o MULT Rs, Rt
o (Hi, Lo) ← Rs ∗ Rt ((Hi, Lo) is 64 bits so there will be no overflow)
o R format, to avoid using an extra opcode combination. We could also use the I format.
▪ Opcode: 000000
▪ Function: 011000
o 1 memory access to fetch the instruction
• 32-bit Unsigned Multiply
o MULTU Rs, Rt
o (Hi, Lo) ← Rs ∗ Rt (64 bits)
o R format
▪ Opcode: 000000
▪ Function: 011001
o 1 memory access to fetch the instruction
• 32-bit 2's Complement Divide
o DIV Rs, Rt
o Lo ← Quotient of Rs/Rt
o Hi ← Remainder of Rs/Rt
o R format
▪ Opcode: 000000
▪ Function: 01 1010
o 1 memory access to fetch the instruction
• 32-bit Unsigned Divide
o DIVU Rs, Rt
o Lo ← Quotient of Rs/Rt
o Hi ← Remainder of Rs/Rt
o R format
▪ Opcode: 000000
▪ Function: 01 1011
o 1 memory access to fetch the instruction
• e.g. −9/2 → Quotient = −4; Remainder = −1
• e.g. 9/2 → Quotient = 4; Remainder = 1
• Dividend/Divisor = Rs/Rt
• Dividend = Quotient ∗ Divisor + Remainder
• Quotient is negative if (Rs, Rt) have opposite signs.
• Remainder has the sign of Rs.
• Move from Hi
o MFHI Rd
o Rd ← Hi
o R Format, since we need Rd.
▪ Opcode: 000000
▪ Function: 01 0000
▪ 1 memory access to fetch the instruction
• Move From Lo
o MFLO Rd
o Rd ← Lo
o R Format, since we need Rd.
▪ Opcode: 000000
▪ Function: 01 0010
▪ 1 memory access to fetch the instruction
• Floating-Point Numbers
o Numbers with points (decimals)
o Very large & very small numbers in terms of magnitude
o 2.57 ∗ 10^n = 257 ∗ 10^(n−2)
o IEEE-754 FP Standard
▪ Single-precision (32-bit)
▪ Double-precision (64-bit)
▪ ±M ∗ 2^e, M: significand, e: exponent (unbiased)
▪ 1.M or 0.M
▪ 1.M: Normalized
▪ 0.M: Denormalized
o Ex:
o 1.M: 1.011 ∗ 2^e
o 0.M: 0.1001 ∗ (a very small power) (close to 0)
• Single Precision Format
| 1 bit | 8 bits | 23 bits |
|---|---|---|
| Sign (0 → +, 1 → −) | BE (Biased Exponent) | Mantissa (fraction) (the point is implied) |
BE=e+127 to have a positive exponent for faster operations (compare,…)
1. ∅ → BE = ∅ & M = ∅
2. ∞ → BE = 255 & M = ∅
3. Normalized: 0<BE<255
4. DeNormalized: BE=∅ & M ≠ ∅
5. Not a Number (NaN): BE = 255 & M ≠ ∅; e.g. 0/0; ∞ − ∞; √(−5)
[Figure: floating-point number line. From left to right: Negative Infinity, Normalized (negative), DeNormalized (negative), a gap around ∅ that can't be represented, DeNormalized (positive), Normalized (positive), Positive Infinity.]
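The five single-precision cases listed above (zero, infinity, normalized, denormalized, NaN) can be sketched as a small Python decoder. The names `decode_single` and `encode_single` are hypothetical; the 2^(−126) scale used for denormalized values is taken from the IEEE-754 standard rather than from these notes, and `encode_single` simply relies on Python's `struct` module.

```python
import struct


def decode_single(bits: int) -> float:
    """Decode a 32-bit IEEE-754 single-precision pattern into a Python float."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    be = (bits >> 23) & 0xFF          # 8-bit biased exponent
    frac = bits & 0x7FFFFF            # 23-bit mantissa field
    m = frac / 2**23                  # fraction M as a value in [0, 1)
    if be == 255:                     # infinity (M = 0) or NaN (M != 0)
        return float("nan") if frac else sign * float("inf")
    if be == 0:                       # zero (M = 0) or denormalized 0.M
        return sign * m * 2.0**-126   # assumption: scale from IEEE-754
    return sign * (1.0 + m) * 2.0**(be - 127)   # normalized 1.M, e = BE - 127


def encode_single(x: float) -> int:
    """Return the 32-bit single-precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


if __name__ == "__main__":
    # 0 10000001 0010100...0 : sign 0, BE 129, M = .00101
    print(decode_single(0x40940000))
    print(hex(encode_single(10.5)))
```

The first call decodes the bit pattern with BE = 129 and M = .00101; the second shows the pattern produced for 10.5.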
(0 10000001 001010 … 0) IEEE-754 = (?)10
Sign bit is 0, so the number is positive
BE = 10000001 = 1 ∗ 2^7 + 1 ∗ 2^0 = 128 + 1 = 129
0 < BE < 255: Normalized
BE = e + 127 → e = 129 − 127 = 2
M = .00101 = 1 ∗ 2^−3 + 1 ∗ 2^−5 = .125 + .03125 = .15625 (bit weights: 2^−1, 2^−2, 2^−3, 2^−4, 2^−5)
= +1.15625 ∗ 2^2 = (4.625)10
Ex: (10.5)10 = (?) IEEE-754
For int 10:
10/2 = 5 & ∅ (LSB)
5/2 = 2 & 1
2/2 = 1 & 0
1/2 = 0 & 1 (MSB)
→ 1010
For Fraction: 0.5 ∗ 2 = 1.0 → .1
Therefore: 1010.1 = 1010.1 ∗ 2^0 = 1.0101 ∗ 2^3
e = 3 → BE = 3 + 127 = 130
130/2 = 65 & 0 (LSB)
65/2 = 32 & 1
32/2 = 16 & 0
16/2 = 8 & 0
8/2 = 4 & 0
4/2 = 2 & 0
2/2 = 1 & 0
1/2 = 0 & 1 (MSB)
BE=10000010 | Sign=0 since it is positive
Therefore: (0 10000010 01010000000000000000000) IEEE-754
3/1/17 Lecture
7. MIPS Machine Language Instructions
• Floating-Point Numbers
o Floating-Point Instructions
▪ Computer Organization Layer
Today's microprocessor:
[Figure: today's microprocessor. Integer CPU (GPRs), System Coprocessor CP0 (System Registers), FP Coprocessor CP1 (FP Regs), and coprocessors CP2 and CP3.]
o A coprocessor is a processor with a special function
o CP2 & CP3 are reserved for future use but were never used
• Single-Precision Load
o LWC1 FRt, Disp(Rs)
o FRt ← M[Rs + Disp±] (Effective Address)
o I format since we need a displacement
▪ Opcode: 11 0001
o 2 memory accesses
▪ 1 to fetch the instruction & 1 to read data
• Ex:
• 4050A4 LWC1 F2, 0(R8)
• Single-Precision Store
o SWC1 FRt, Disp(Rs)
o M[Rs + Disp±] (Effective Address) ← FRt
o I format since we need a displacement
▪ Opcode: 11 1001
o 2 memory accesses
▪ 1 to fetch the instruction & 1 to write data
• Ex:
• 405004 SWC1 F25, 0(R12)
• Single-Precision ADD
o ADD.S FRd, FRs, FRt
o FRd ← FRs + FRt & generate an interrupt if there is an overflow AND the interrupt is enabled.
o Modified R Format
o 1 memory access to fetch the instruction
• Ex:
• 400E00 ADD.S F0, F15, F12
• Double-Precision ADD
o ADD.D FRd, FRs, FRt
o (FRd, FRd+1) ← (FRs, FRs+1) + (FRt, FRt+1) & generate an internal interrupt if there is an overflow AND the interrupt is enabled.
o Modified R Format
o 1 memory access to fetch the instruction
• Ex:
• 405E04 ADD.D F6, F18, F10
• (F6, F7) ← (F18, F19) + (F10, F11)
• Single Precision: SUB.S, MUL.S, DIV.S
• Double Precision: SUB.D, MUL.D, DIV.D
• Single-Precision Compare
o C._ _.S
o (The blank can be: EQ, NE, LT, GT, LE, GE)
o C.LT.S FRt, FRs
o If FRt<FRs then
o Cond ← 1 else
o Cond ← ∅
o Modified R format
o 1 memory access to fetch the instruction
• Double-Precision Compare
o C._ _.D FRt, FRs
• Branch if Cond is True (1)
o BC1T Offset
o If Cond=1 then
▪ PC ← PC + (Offset± ∗ 4) (Effective Address)
o Modified I format
o 1 memory access to fetch the instruction
• Branch if Cond is False (0)
o BC1F Offset
• Computer Organization (Microarchitecture) Layer
• It supports the Computer Architecture layer by means of the CPU, memory, I/O controllers…
• The CPU, Memory and I/O Controllers are digital systems.
• Digital Systems: Today's microprocessor:
[Figure: today's microprocessor. Integer CPU (GPRs), System Coprocessor CP0 (System Registers), FP Coprocessor CP1 (FP Regs), and coprocessors CP2 and CP3.]
• Digital Systems
• A digital system performs micro operations. • Micro operations are simple operations such as add, sub, compare, shift, load memory, store to memory, etc…
![]()
• Data Unit (Datapath) performs micro operations
• Control Unit controls the Data Unit ≡ It determines the sequence of micro operations in the Data Unit
o Sequencer
• Registers: Store (keep) data
o Flip-flops implement registers
▪ A flip-flop stores (keeps) 1 bit
• ALUs (Arithmetic/Logic Units) perform arithmetic/logic operations
o Gates implement ALUs.
o Buses interconnect registers & ALUs
▪ Wires implement buses.
• Designing Digital Systems
![]()
o Describe a digital system in detail
1. Draw a circuit diagram
a. Works for simple digital systems
2. Write a program in a hardware description language (HDL) (VHDL, Verilog HDL) or a software language (C, C++)
3/3/17- Recitation
Handout 4, Question 5:
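| Instruction | PC | R2 | R4 | R5 | R8 | R9 | R10 | R11 | R31 | Memory Access | M[10002000] | M[10002004] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Initial | 400600 | ? | 2 | 10002000 | ? | ? | ? | ? | 400200 | -- | A36 | 1B9 |
| ADD R8, R0, R4 | 400604 | NS | NS | NS | 2 | NS | NS | NS | NS | 1:IF | NS | NS |
| ADD R9, R0, R0 | 400608 | NS | NS | NS | NS | 0 | NS | NS | NS | 1:IF | NS | NS |
| ADD R10, R0, R5 | 40060C | NS | NS | NS | NS | NS | 10002000 | NS | NS | 1:IF | NS | NS |
| LW R11, 0(R10) | 400610 | NS | NS | NS | NS | NS | NS | A36 | NS | 2:IF, DR | NS | NS |
| ADD R9, R9, R11 | 400614 | NS | NS | NS | NS | A36 | NS | NS | NS | 1:IF | NS | NS |
| ADDI R10, R10, 4 | 400618 | NS | NS | NS | NS | NS | 10002004 | NS | NS | 1:IF | NS | NS |
| ADDI R8, R8, (−1)10 | 40061C | NS | NS | NS | 1 | NS | NS | NS | NS | 1:IF | NS | NS |
| BNE R8, R0, (−5)10 | 40060C | NS | NS | NS | 1 | NS | NS | NS | NS | 1:IF | NS | NS |
| LW R11, 0(R10) | 400610 | NS | NS | NS | NS | NS | NS | 1B9 | NS | 2:IF, DR | NS | NS |
| ADD R9, R9, R11 | 400614 | NS | NS | NS | NS | BEF | NS | NS | NS | 1:IF | NS | NS |
| ADDI R10, R10, 4 | 400618 | NS | NS | NS | NS | NS | 10002008 | NS | NS | 1:IF | NS | NS |
| ADDI R8, R8, (−1)10 | 40061C | NS | NS | NS | 0 | NS | NS | NS | NS | 1:IF | NS | NS |
| BNE R8, R0, (−5)10 | 400620 | NS | NS | NS | 0 | NS | NS | NS | NS | 1:IF | NS | NS |
| ADD R2, R9, R0 | 400624 | BEF | NS | NS | NS | NS | NS | NS | NS | 1:IF | NS | NS |
| JR R31 | 400200 | NS | NS | NS | NS | NS | NS | NS | NS | 1:IF | NS | NS |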
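The routine traced in the table sums a block of memory words. A minimal Python sketch of the same loop follows, assuming the count arrives in R4 and the base address in R5 as in the table; the function name `sum_words` and the dictionary used to model memory are illustrative only.

```python
def sum_words(memory: dict, base: int, count: int) -> int:
    """Sum `count` 32-bit words starting at `base`, mirroring the traced loop."""
    remaining = count            # ADD  R8, R0, R4
    total = 0                    # ADD  R9, R0, R0
    addr = base                  # ADD  R10, R0, R5
    while True:
        word = memory[addr]      # LW   R11, 0(R10)   (IF + data read)
        total += word            # ADD  R9, R9, R11
        addr += 4                # ADDI R10, R10, 4   next word
        remaining -= 1           # ADDI R8, R8, -1
        if remaining == 0:       # BNE  R8, R0, -5    loops while R8 != 0
            break
    return total                 # ADD  R2, R9, R0


if __name__ == "__main__":
    mem = {0x10002000: 0xA36, 0x10002004: 0x1B9}   # values from the table
    print(hex(sum_words(mem, 0x10002000, 2)))
```

With the two memory words from the table, the result matches the value that ends up in R2.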
• Relevant Questions and Answers (HW 2)
• Q5. Moving the location to store to:
• 10000350 + 250 = 100005A0 (hex)
• Q11.
![]()
![]()
• Compiling a Case/Switch Statement
• switch(k){
o case 0: a=b+c; break;
o case 1: a=b*c; break;
o case 2: a=b/c; break;
o case 3: a=-a; break;
o }
| Memory Address | Data Stored |
|---|---|
| 1000F000 | 400C20 #Case 0 |
| 1000F004 | 400C28 #Case 1 |
| 1000F008 | 400C34 #Case 2 |
| 1000F00C | 400C40 #Case 3 |
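The jump table above turns the switch into a bounds check followed by one indexed load and one indirect jump. A minimal Python sketch of that dispatch step is given here; `JUMP_TABLE`, `EXIT`, and `dispatch` are hypothetical names, and the exit address 400C44 is the one used by the assembly listing for this example.

```python
# Jump table contents, copied from the memory listing (address -> case code address).
JUMP_TABLE = {
    0x1000F000: 0x400C20,  # case 0
    0x1000F004: 0x400C28,  # case 1
    0x1000F008: 0x400C34,  # case 2
    0x1000F00C: 0x400C40,  # case 3
}
EXIT = 0x400C44  # address reached when k is out of range or after a break


def dispatch(k: int, base: int = 0x1000F000) -> int:
    """Return the code address the switch jumps to for selector k."""
    if k < 0 or k >= 4:          # SLT / SLTI range checks
        return EXIT
    entry = base + (k << 2)      # SLL by 2: each table entry is 4 bytes
    return JUMP_TABLE[entry]     # LW the target, then JR to it


if __name__ == "__main__":
    for k in (-1, 0, 1, 2, 3, 4):
        print(k, hex(dispatch(k)))
```

The cost of the dispatch is the same for every case, which is the main advantage over a chain of compares and branches.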
R9=k, R11=1000F000, R13=a, R14=b, R15=c
• 400C00 SLT R8, R9, R0 # k<0?
• 400C04 BNE R8, R0, (15) # If k<0, don't execute the switch
• 400C08 SLTI R8, R9, (4) # k<4?
• 400C0C BEQ R8, R0, (13) # If k>3, don't execute the switch
• 400C10 SLL R10, R9, 2 # R10=k*4
• 400C14 ADD R11, R11, R10 # R11 points to jump-table entry k
• 400C18 LW R12, 0(R11) # read in the address of the case code
• 400C1C JR R12 # jump to the case code
• 400C20 ADD R13, R14, R15 # case 0: a=b+c
• 400C24 BEQ R0, R0, 7 # exits after case 0
• 400C28 MULT R14, R15 # case 1: a=b*c (stored in the Hi, Lo registers)
• 400C2C MFLO R13 # assume the answer fits in the lower 32 bits
• 400C30 BEQ R0, R0, 4 # exits after case 1
• 400C34 DIV R14, R15 # case 2: a=b/c
• 400C38 MFLO R13 # move the quotient from Lo to R13
• 400C3C BEQ R0, R0, 1 # exits after case 2
• 400C40 SUB R13, R0, R13 # case 3: a=-a
• 400C44 # all breaks come to this exit
3/6/17 Lecture
Digital Systems
• A digital system performs micro operations.
![]()
• Data unit performs micro operations
• Control unit controls the Data Unit
• Designing a digital system ≡ Describing a digital system
• 1. Write a program
• 2. Draw a circuit diagram
• Components of a digital system
o Registers: Store information
▪ FFs implement registers
o ALUs: Perform arithmetic/logic operations (add, sub, AND, OR, shift, compare…)
▪ Gates implement them
o Buses: Interconnect Registers and ALUs
▪ Wires implement buses.
o Sequencer: determines the sequence of micro operations in the data unit.
▪ Gates & Flip Flops implement it.
• Typical Data Unit View
[Figure: typical data unit; labels: ABUS, BBUS, ALUcontrol, OBUS.]
• Asynchronous digital systems
o No clock signal
[Figure: clock signal over time t, divided into consecutive clock periods (CP1, CP2, CP3, CP4).]
• Clock is a periodic signal that synchronizes components
o Synchronous digital system!
• Clock speed (rate) is in terms of Hz (Hertz) ≡ clock frequency
o 1 Hz means 1 clock period per second ≡ clock period duration is 1 second.
o Clock frequency = 1 / clock period
o Clock frequency = 1 GHz = 10^9 Hz
▪ 10^9 clock periods/sec
▪ Clock period = 1 / clock frequency = 1/10^9 = 10^−9 s = 1 nanosecond
▪ Clock period duration is determined by observing the longest operations (add, sub, memory accesses)
▪ Then, we add waiting time to account for temperature and humidity increases up to the prescribed value in the data sheet!
▪ Whether we write a program or draw a circuit diagram, we start with a state diagram that shows in a compact way which operations happen when
• States ≡ Circles indicate the micro operations happening at the moment
o A state takes 1 clock period
o Each state has a unique number to identify it
o Only one state happens in a clock period
• Arrows indicate the sequence of states to be traced
o They indicate the next state.
o RTL Notation (Register Transfer Level/Language)
o State 24: A ← PC + GPR[Rs]
o State 25: MDR ← M[A]; PC ← A (both done in parallel)
o After State 25: if MDR[0]=1 go to State 26; if MDR[0]=∅ go to State 27
o State 26: A ← GPR[Rs]
o State 27: GPR[Rd] ← GPR[Rs] + GPR[Rt]
▪ Registers take on their values the clock period after they are stored
▪ (MDR[0] is checked with the old value by the sequencer)
3/10/17- Recitation
y(a,b,c) = (a two-term sum of products)
Full Adder circuits
[Figure: full adder block with inputs A, B, C and outputs Cout and Sum.]
Adder Truth Table
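| a | b | c | Cout | Sum |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 |
| 0 | 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 |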
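The truth table above can be checked, and chained into a ripple-carry adder, with a short Python sketch. The names `full_adder` and `ripple_carry_add` are illustrative; the XOR form of Sum is used here as an equivalent of the sum-of-products expression, and Cout uses the simplified form ab + bc + ac.

```python
def full_adder(a: int, b: int, c: int) -> tuple:
    """One-bit full adder: returns (Cout, Sum) for input bits a, b, c."""
    s = a ^ b ^ c                          # Sum is 1 when an odd number of inputs are 1
    cout = (a & b) | (b & c) | (a & c)     # Cout = ab + bc + ac
    return cout, s


def ripple_carry_add(x: int, y: int, width: int = 32) -> tuple:
    """Add two unsigned numbers bit by bit; returns (sum, carry out)."""
    carry = 0
    result = 0
    for i in range(width):                 # bit 0 first; the carry ripples upward
        carry, s = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, *full_adder(a, b, c))
    print(ripple_carry_add(0xA36, 0x1B9))
```

Because each stage waits for the carry of the stage below it, the delay of this adder grows with the number of bits.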
Sum(a,b,c) = a'b'c + a'bc' + ab'c' + abc (x' denotes NOT x)
Cout(a,b,c) = a'bc + ab'c + abc' + abc → simplified: Cout = ab + bc + ac
Ripple-Carry Adder
• A31, A30, …, A0
• + B31, B30, …, B0
• ____________________
• S31, S30, …, S0
[Figure: ripple-carry adder built from a chain of full adders (FA FA FA …), each taking one bit of A, one bit of B and the carry from the previous stage; Cin on one end, Cout on the other.]
[Figure: CPU datapath fragment. GPRs, a multiplexer with a Select input, XOR gates, and a 32-bit adder connected by 32-bit buses.]
Decoders
• There are 3 types: Binary, BCD to decimal, and Binary to 7 segment
• 1. Binary Decoders
o Memory chips have them
o Size: n-to-2^n DCD (decoder)
▪ n data inputs representing an n-bit unsigned number
▪ 2^n data outputs
o No more than one output can be 1 at a time
▪ If the n-bit number is k, output k is 1
2 to 4 DCD
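The decoder rule (if the n-bit input is k, output k is 1 and all others are 0) can be sketched in Python as follows. The function name `binary_decoder` is a hypothetical helper, not something from the handout.

```python
def binary_decoder(value: int, n: int) -> list:
    """n-to-2^n decoder: returns the 2^n output bits for an n-bit input value."""
    if not 0 <= value < 2**n:
        raise ValueError("input does not fit in n bits")
    # Exactly one output is 1: the one whose index equals the input number.
    return [1 if k == value else 0 for k in range(2**n)]


if __name__ == "__main__":
    for k in range(4):                 # 2-to-4 decoder
        print(k, binary_decoder(k, 2))
```

A state decoder in a control unit works the same way: the state register value selects exactly one of the state lines.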
![]()
Ex: Execution Table
![]()
1. Flip Flops (Handout 3-Page 19)
![]()
- A flip-flop has 2 outputs: Q and Q' (the complement of Q)
- The CE (clock enable) input enables/disables the clock input.
• If CE=0, the clock input cannot be used
- The clock input (c) indicates when to store on the FF.
Registers
- Sequential circuit used to store data temporarily
- Note: a register that stores a value in a particular clock period (cp) actually takes on the value at the beginning of the following cp.
3/20/17- Lecture 2.
![]()
High-Level State Diagram
![]()
� ← �� + ���[��]
![]()
25
![]()
![]()
![]()
��� ← �[�];
![]()
![]()
�� ← �
![]()
MDR[0]=1 MDR[0]=0 26 27
![]()
![]()
![]()
![]()
![]()
24 � ← ���[��]
![]()
![]()
![]()
![]()
���[��] ← ���[��] + ���[��] Digital System Design 1. Determine interaction between communicating digital systems • For example: CPU and Memory 2. Get High-level State Diagram • It describes micro-operations 3. Get Data Unit 4. Get Low-Level State Diagram • It describes which control signal is on when 5. Get Control Unit 1. Interaction: Digital System↔Memory 32 MABUS
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
MemRead m ety syS latigiDMemWrite
![]()
32 MRBUS 32 MWBUS
![]()
![]()
![]()
• MABUS (Memory Address Bus), MRBUS (Memory Read Bus), MWBUS (Memory Write Bus)
• Memory accesses take one full clock period like ADD & SUB
Transferring data between digital systems/components
![]()
a. Point-to-Point Connections (allows for parallelism but is too expensive)
b. Busing: A bus is a shared set of wires where only one source is connected to the bus at a time!
![]()
50 Shlomi Oved CS-UY 2214 Comp Arch Notes
![]()
Rd5
![]()
Store GPR3. Data Unit MRBUSStore
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
MDR
![]()
![]()
![]()
MDR
![]()
![]()
![]()
![]()
24 ���� = ��; ���� = ��� �� ; ���� = ���� + ����; � ← ���� 25
![]()
![]()
��� ← �����; ���� = �; ���� = ����; �� ← ����; ���� = � MDR[0]=1 MDR[0]=0
![]()
![]()
ABUS=GPR[Rs];
![]()
![]()
![]()
OBUS=ABUS; � ← ���� 51 Shlomi Oved CS-UY 2214 Comp Arch Notes 4. Low-Level State Diagram 24 ���!!"# = ∅1; ���!"#$ = ∅∅; ������� = 0010; ���!"#$ = ∅1; �����! = 1 25 MemRead=1; ���!!"# = ∅∅; �����!"# = 1; ���!"#$ = ∅1; ���!"#$ = ∅∅; �����!" = 1 MDR[0]=1 MDR[0]=0
![]()
![]()
26 27 ���!"#$ = 10; ���!"#$ = 10; ���!!"# = 01; ���!"#$ = 00; ���!"#$ = 0010; ���!"#$ = 01; �����! = 1; �����!"# = 1; 3/22/17 Lecture Page 5 of Handout 7 1. Develop Architecture: MIPS LW, SW, ADD, SUB, SLT, AND, OR, BEQ, J 2. CPU-Memory interaction in terms of buses & control signals 3. Design CPU- Lower Level State Diagram, Upper Level State Diagram, Control Unit and etc. For 9 instructions→ ��� a. For 9 instructions i. �� ��,���� �� → �� ← � �� + ����! (��������� �������) ii. I Format iii. Implement architecture operations by means of micro operations Some micro operations are common to all instructions. • IF Cycle (Instruction Fetch) o Fetch Instruction o Update PC • ID Cycle (Instruction Decode) o Read Rs & Rt to A & B • EX Cycle (Execute Cycle) o Start implementing architecture operations o LW/ :Calculate the effective address o A/L(Arithmetic Logic): o BEQ: o J: • MEM Cycle (Memory Cycle) o LW/ :Access Memory o A/L(Arithmetic Logic): • WB Cycle (Write Back Cycle) o LW: Write back data read from memory52 Shlomi Oved CS-UY 2214 Comp Arch Notes ���!" = 5 → 0,1,2,3,4 ���!"" = ���!"# = ���!"# = ���!" = ���!"# = 4 → 0, 1, 6,7 ���!"# = 3 = 0, 1, 8
![]()
CPI_ADDRM = 6 → 0, 1, 2, 3, 16, 17
IR: Instruction Register
ALUout
MDR: Memory Data Register
CPI ≡ Clock Periods per Instruction
Unpipelined: CPU runs one instruction at a time
Pipelined: CPU runs multiple instructions at a time.
iv. Modify the high-level state diagram for each remaining instruction one by one.
• SW Rt, Disp(Rs)
o M[Rs + Disp±] ← Rt
o I Format
• ADD Rd, Rs, Rt
o Rd ← Rs + Rt & … (on overflow, generate an interrupt)
o R Format
• SUB Rd, Rs, Rt
o Rd ← Rs − Rt
o R Format
• BEQ Rs, Rt, Offset
o If Rs=Rt then PC ← PC + (Offset±) ∗ 4
o I Format
• J Address
o PC[27:0] ← Address ∗ 4
o J Format
• ADDRM Rt, Disp(Rs)
o Rt ← Rt + M[Rs + Disp±]
o I Format
IR Formats
R-Format

| 6 | 5 | 5 | 5 | 5 | 6 |
|---|---|---|---|---|---|
| Opcode | Rs | Rt | Rd | Shift Amount | Function |

J-Format

| 6 | 26 |
|---|---|
| Opcode | Address |

I-Format

| 6 | 5 | 5 | 16 |
|---|---|---|---|
| Opcode | Rs | Rt | Disp/Offset/Imm |

03/24/17- Recitation
Handout #6- Page 7
Digital System Design Basics
![]()
� ← �� + ���[��] (24)
![]()
��� ← �
![]()
[�],�� ← � (25)
![]()
���
![]()
[�] = �
![]()
![]()
���[�] = �
![]()
� ← ���[��] (26)
![]()
���[��] ← ���[��] + ���[��] (27)
![]()
55 Shlomi Oved CS-UY 2214 Comp Arch Notes State Diagram
![]()
i. BEQ Offset: IF ACCUM=0 then �� ← �� + (0,0,������)
![]()
ii. ADDRM Disp: ����� ← ����� + � 0,0,���� iii. ADDM Disp: � 0,0,���� ← ����� + � 0,0,���� ����� ← ����� + �[0,0,����] �����: �����������: ▪ Architectural Register ▪ Accumulates Results ▪ Single GPR56 Shlomi Oved CS-UY 2214 Comp Arch Notes
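| CP | State | PC | MAR | IR | MDR | ACCUM | M[200] | M[3204] |
|---|---|---|---|---|---|---|---|---|
| Initial | ---- | 200 | ? | ? | ? | 1A | 3000 | 7 |
| 1 | 0 | NS | NS | NS | NS | NS | NS | NS |
| 2 | 1 | NS | 200 | NS | NS | NS | NS | NS |
| 3 | 2 | 204 | NS | 3000 | NS | NS | NS | NS |
| 4 | 3 | NS | 3204 | NS | NS | NS | NS | NS |
| 5 | 4 | NS | NS | NS | 7 | NS | NS | NS |
| 6 | 0 | NS | NS | NS | NS | 21 | NS | NS |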
HW 3 Q&A 2. Syntax: ADDMR Rt, Rs, Offset Architectural Operation: �� ← �� + � �� + ����� ! ≪ 2)] Format: I-Format i. Show Modified High-Level State Diagram ii. Show Modified Portion of the Data Unit Every Clock Period: � ← �� � ← �� i. �� ← �[��]
![]()
![]()
![]()
![]()
![]()
![]()
![]()
�� ← �� + 4 ������ ← �� + ([�����!] ≪ 2) ������ ← � + ([�����!])Picture below shows the things labeled as the same 57 Shlomi Oved CS-UY 2214 Comp Arch Notes ii.
![]()
58 Shlomi Oved CS-UY 2214 Comp Arch Notes HW 3 Q&A 6. Syntax: COMPMR Rd,(Rs), Rt Architectural Operation: If M[Rs]<Rt then �� ← 1 ���� �� ← 0 Format: R-Format i. Update High-Level & Datapath. What is ���!"#$#%?
![]()
3/27/17- Lecture
![]()
������ ← � �� �
![]()
���[��] ← ��� 59 Shlomi Oved CS-UY 2214 Comp Arch Notes EMY CPU Datapath EMY CPU Low-Level State Diagram
![]()
(If a signal's value is not indicated, it isn't needed and its value is zero)
State 0: IRWrite=1; IorD=0; ALUSrcA=0; ALUSrcB=01; ALUop=00; PCWrite=1; PCSrc=00; MemRead=1
State 1: ALUop=00; ALUSrcA=0; ALUSrcB=11;
State 2: ALUop=00; ALUSrcA=1; ALUSrcB=10
State 3: MemRead=1; IorD=1;
State 4: MemtoReg=1; RegWrite=1; RegDst=0;
State 5: MemWrite=1; IorD=1;
State 6: ALUSrcA=1; ALUSrcB=00; ALUop=10;
State 7: MemtoReg=0; RegWrite=1; RegDst=1;
State 8: ALUSrcA=1; ALUSrcB=00; ALUop=01; PCSource=01; PCWriteCond=1;
State 9: PCWrite=1; PCSrc=10;
State 10: Invalid Opcode
State 11: Overflow
Modify Diagrams for ADDRM
New Instructions to High Level State Diagram:
Changes to Datapath
![]()
Change to Low-Level State Diagram
State 0: IRWrite=1; IorD=0; ALUSrcA=00; ALUSrcB=01; ALUop=00; PCWrite=1; PCSrc=00; MemRead=1
State 1: ALUop=00; ALUSrcA=00; ALUSrcB=11;
State 2: ALUop=00; ALUSrcA=01; ALUSrcB=10
State 3: MemRead=1; IorD=1;
State 4: MemtoReg=1; RegWrite=1; RegDst=0;
State 5: MemWrite=1; IorD=1;
State 6: ALUSrcA=01; ALUSrcB=00; ALUop=10;
State 7: MemtoReg=0; RegWrite=1; RegDst=1;
State 8: ALUSrcA=01; ALUSrcB=00; ALUop=01; PCSource=01; PCWriteCond=1;
State 9: PCWrite=1; PCSrc=10;
State 10: Invalid Opcode
State 11: Overflow
State 16: ALUSrcA=10; ALUSrcB=00; ALUop=00;
State 17: RegDst=0; MemtoReg=0; RegWrite=1;
![]()
3/29/17- Lecture
Control Unit Design
• Control Unit controls the Data Unit
• Low-Level State Diagram describes the Control Unit
• Controlling the Data Unit
o Determine micro operations in the Data Unit
▪ Control signals indicate them
o Which state is next?
▪ Next state signals indicate it
EMY CPU Low Level State Diagram
![]()
• The Control Unit generates control & next state signals based on the current state & the status signals from the Data Unit.
![]()
![]()
![]()
?
![]()
![]()
13
![]()
![]()
4
![]()
![]()
4 63 Shlomi Oved CS-UY 2214 Comp Arch Notes
![]()
![]()
![]()
![]()
![]()
![]()
• Two ways to implement the cloud
1. Hardwiring: Gates & FFs generate control & next state signals
2. Microprogramming: A memory module stores the control & next state values
• The memory module is used as a look-up table
• Hardwired EMY Control Unit (next state lines NS3-NS0)
• RegWrite is 1 when it is state 4 or state 7
• RegWrite = S4 + S7 (S4 OR S7)
[Figure: OR gate with inputs S4, S7 and output RegWrite]
• PCWrite is 1 when it is state 0 or state 9
• PCWrite = S0 + S9 (S0 OR S9)
[Figure: OR gate with inputs S0, S9 and output PCWrite]
• ALUSrcA is 1 when it is state 2 or state 6 or state 8
• ALUSrcA = S2 + S6 + S8 (S2 OR S6 OR S8)
[Figure: OR gate with inputs S2, S6, S8 and output ALUSrcA]
• ALUSrcB0 is 1 when it is state 0 or state 1
• ALUSrcB0 = S0 + S1 (S0 OR S1)
[Figure: OR gate with inputs S0, S1 and output ALUSrcB0]
• ALUop0 is 1 when it is state 8
• ALUop0 = S8
• S8 → ALUop0
• ALUCtrl0 is 1 when ALUop=10 and the function is either OR (25) or SLT (2A)
• ALUCtrl0 = ALUop1 ∗ NOT(ALUop0) ∗ (function decoder output 37 + function decoder output 42)
• NS3 is 1 when it is state 1 & the instruction is either BEQ (4) or J (2)
• NS3 = S1 (OPCDCD4 + OPCDCD2) (Opcode Decoder output 4 OR Opcode Decoder output 2)
Modifying EMY Control Unit (Add instructions for ADDRM)
State Register has 5 bits, 5-to-32 state decoder, 5 NS lines: NS4-NS0
• Renamed Signals
o ALUSrcA is renamed ALUSrcA0 (rightmost bit)
• Modified Signals
o RegWrite is 1 when it is state 4 or state 7 or state 17
o RegWrite = S4 + S7 + S17 (S4 OR S7 OR S17)
[Figure: OR gate with inputs S4, S7, S17 and output RegWrite]
o NS0 is 1 when … (Fill in later)
• New Signals
o ALUSrcA1 is 1 when it is state 16.
o ALUSrcA1 = S16
o S16 → ALUSrcA1
o NS4 is 1 when it is (state 3 and ADDRM (17)) or state 16
o NS4 = (S3 ∗ OPCDCD23) + S16 [(S3 AND OPCDCD23) OR S16]
3/31/17- Recitation
![]()
66 Shlomi Oved CS-UY 2214 Comp Arch Notes
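| Clock Period | State | REGA | REGB | REGC | REGD |
|---|---|---|---|---|---|
| Initial | -- | ? | 20 | ? | ? |
| 1 | 0 | NS | NS | NS | NS |
| 2 | 1 | 20 | NS | NS | NS |
| 3 | 2 | NS | NS | 24 | NS |
| 4 | 3 | NS | NS | NS | 44 |
| 5 | 0 | NS | 2 | NS | NS |
| 6 | 1 | 2 | NS | NS | NS |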
(44)!" → 0000 0000 0000 0000 0000 0000 0100 0100 → 0000 0000 0000 0000 0000 0000 0000 0010 2.
![]()
67 Shlomi Oved CS-UY 2214 Comp Arch Notes
Clock Period
State
Reset
REGA
REGB
REGC
MDR
M[10000000]
Initial
--
--
10000000
?
2
?
E
1
0
0
NS
NS
NS
NS
NS
2
1
NS
NS
NS
E
NS
3
2
NS
E
1
NS
NS
4
3
NS
F
NS
NS
NS
5
4
NS
NS
NS
F
NS
6
0
0
10000004
NS
NS
NS
F
Purpose: Starting from the base memory location in REGA, we increment the contents of each memory location by 1, for REGC locations. When complete, do nothing in state 5 until the reset signal is 1; then get new inputs for REGA and REGC and restart.
3. Add JR to EMY CPU
JR: Syntax: JR Rs
Architectural Operation: PC ← Rs
High Level:
![]()
1 0
![]()
![]()
![]()
same
![]()
![]()
![]()
same Data Path:
![]()
![]()
JR All others
![]()
![]()
![]()
![]()
States
![]()
2-9 Same 0 M
![]()
![]()
![]()
1 U
![]()
![]()
2 X
![]()
3 6
![]()
![]()
PC← �
![]()
To PC
![]()
From A
![]()
![]()
PCSource
![]()
![]()
4 68 Shlomi Oved CS-UY 2214 Comp Arch Notes 4. Add JAL to EMY CPU JAL: Syntax: JAL Address �31 ← �� Architectural OP: �� ← �� 31 − 28 , ������� ≪ 2 High Level: 0
![]()
![]()
![]()
same
![]()
1
![]()
![]()
![]()
All others
![]()
States
![]()
2-8 Same
![]()
9
![]()
![]()
same
![]()
![]()
J
![]()
JAL
![]()
16
![]()
![]()
![]()
![]()
R31← ��
![]()
![]()
same
![]()
Data Path:
![]()
RegDst
![]()
2
![]()
![]()
IR=Instruction Register
![]()
[20-16]
![]()
[15-11]
![]()
0 M 1 U 2 X 2
![]()
![]()
![]()
![]()
![]()
![]()
(11111) Write Reg
![]()
![]()
GPR Register
![]()
ALUout
![]()
![]()
MDR
![]()
![]()
PC
![]()
Q& 69 Write Data
![]()
![]()
![]()
![]()
0 M 1 U 2 X 3
![]()
![]()
![]()
2 File
![]()
Mem-To-Reg Shlomi Oved CS-UY 2214 Comp Arch Notes Q&A Q8. High Level:
![]()
State 2: ������ ← � + �����! Instruction: Format: I-format Syntax: ADDRIM (Rt)++, Rs, Imm Architectural Operation: M[Rt]← �� + ���!; �� ← �� + 4 Datapath: MABUS: Memory Access BUS 0 M
![]()
![]()
![]()
1 U
![]()
![]()
2 X
![]()
1 B
![]()
![]()
MABUS MRBUS: Memory Read BUS MWBUS: Memory Write BUS
![]()
![]()
2
![]()
IorD
![]()
![]()
![]()
B
![]()
ALUout 0 M
![]()
![]()
![]()
1 U 2 X
![]()
7
![]()
![]()
Sel MWBUS
![]()
![]()
MWBUS NEW
![]()
MUX70 Shlomi Oved CS-UY 2214 Comp Arch Notes 4/3/17- Lecture
![]()
?
![]()
![]()
![]()
13
![]()
![]()
![]()
21
![]()
4
![]()
![]()
• How is the cloud implemented in hardware?
1. Hardwiring: Gates & FFs generate control & next state signals
2. Microprogramming: A memory module in the Control Unit stores the control & next state values and is used as a look-up table.
• Each location of the memory corresponds to a state in the low-level state diagram
[Figure: microprogrammed control unit; labeled blocks: Status, Memory (ROM), Gates, and State Register, with Current State and Next State signals.]
• Each location generates control signals for the Data Unit. Its content is called a control word ≡ microinstruction.
• The whole content is the control program ≡ microprogram.
• The memory is called control memory ≡ micromemory.
[Figure: control unit for Group I & III instructions. A 16x21-bit micromemory is addressed by the 4-bit current state held in the State Register. A 4-bit 4-to-1 MUX (inputs 0-3), selected by the 2-bit Add Ctrl field, produces the next state; its sources include an adder (ADD) and the dispatch ROMs D.R.I and D.R.II, which are indexed by the 6-bit opcode.]

D.R.I. For State 1

| Loc (opcode, hex) | Content (next state) | Instruction |
|---|---|---|
| 0 | 6 | R Format |
| 2 | 9 | J |
| 4 | 8 | BEQ |
| 17 | 2 | ADDRM |
| 23 | 2 | LW |
| 2B | 2 | SW |

D.R.II. For State 2

| Loc (opcode, hex) | Content (next state) | Instruction |
|---|---|---|
| 17 | 3 | ADDRM |
| 23 | 3 | LW |
| 2B | 5 | SW |
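The next-state selection can be sketched in Python using the two dispatch ROMs above. The encoding of the Add Ctrl field (0 = return to state 0, 1 = D.R.I, 2 = D.R.II, 3 = current state + 1) is an assumption inferred from the AddCtrl values listed in the microcode for states 0 to 6; the names `next_state`, `ADD_CTRL`, and `trace` are hypothetical.

```python
# Dispatch ROMs: opcode (hex) -> next state, copied from the tables above.
DISPATCH_ROM_1 = {0x00: 6, 0x02: 9, 0x04: 8, 0x17: 2, 0x23: 2, 0x2B: 2}
DISPATCH_ROM_2 = {0x17: 3, 0x23: 3, 0x2B: 5}

# AddCtrl field of micromemory locations 0-6 (from the microcode listing).
ADD_CTRL = {0: 3, 1: 1, 2: 2, 3: 3, 4: 0, 5: 0, 6: 3}

def next_state(state, add_ctrl, opcode):
    """Model of the 4-to-1 next-state MUX (input numbering is an assumption)."""
    if add_ctrl == 0:
        return 0                        # back to instruction fetch
    if add_ctrl == 1:
        return DISPATCH_ROM_1[opcode]   # branch on opcode after state 1
    if add_ctrl == 2:
        return DISPATCH_ROM_2[opcode]   # branch on opcode after state 2
    return (state + 1) & 0xF            # adder: next sequential state

def trace(opcode):
    """List the states one instruction passes through, starting at state 0."""
    states, state = [], 0
    while True:
        states.append(state)
        state = next_state(state, ADD_CTRL[state], opcode)
        if state == 0:
            return states

lw_states = trace(0x23)   # LW
sw_states = trace(0x2B)   # SW
```

Tracing LW visits states 0, 1, 2, 3, 4 and SW visits states 0, 1, 2, 5, so the length of each trace is that instruction's clock-period count.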
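Microcode ≡ Microinstructions in terms of 1s & 0s

| Loc | PCW | PCWCond | IorD | MemRead | MemWrt | IRW | MemtoReg | PCSrc | ALUop | ALUSrcB | ALUSrcA | RW | RegDst | AddCtrl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 00 | 00 | 01 | 0 | 0 | 0 | 11 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 00 | 00 | 11 | 0 | 0 | 0 | 01 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 00 | 00 | 10 | 1 | 0 | 0 | 10 |
| 3 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 00 | 00 | 00 | 0 | 0 | 0 | 11 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 00 | 00 | 00 | 0 | 1 | 0 | 00 |
| 5 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 00 | 00 | 00 | 0 | 0 | 0 | 00 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 00 | 10 | 00 | 1 | 0 | 0 | 11 |
| 7 | | | | | | | | | | | | | | |
| 8 | | | | | | | | | | | | | | |
| 9 | | | | | | | | | | | | | | |
| 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 00 | 00 | 01 | 10 | 0 | 0 | 011 |
| 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 00 | 00 | 00 | 00 | 1 | 0 | 000 |

• Adding ADDRM to Architectural operation.
• Modify Control Unit
1. Since we have states 16 & 17, the State Register is 5 bits wide.
2. The micromemory receives 5 address bits, so it has 32 locations.
3. Since the State Register has 5 bits, the MUX is a 5-bit MUX.
4. Since the State Register has 5 bits, the ADDer is a 5-bit ADDer.
5. Since state 3 has 2 branches, a new Dispatch ROM is needed: D.R.III.
6. D.R.III: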
| Loc (opcode, hex) | Content (Decimal) | Instruction |
|---|---|---|
| 17 | 16 | ADDRM |
| 23 | 4 | LW |
7. The MUX is attached to D.R.III like the other dispatch ROMs. D.R.III is connected to input 4 of the MUX, so the MUX becomes an 8-to-1 MUX (5 bits per arrow, arrows numbered from 0 to 4).
8. Since the MUX is an 8-to-1 MUX, it needs 3 select signals: 3 address control (AddCtrl) bits.
9. D.R.I & D.R.II must have 5 bits/location.
10. D.R.I & D.R.II are modified to include ADDRM.
11. ALUSrcA has 2 bits to run ADDRM.
[Figure: 5-bit 8-to-1 MUX with inputs, from 4 down to 0: D.R.III, ADDer, D.R.II, D.R.I, and 0.]
12. Since we added 2 bits to each location, we have 23 bits per micromemory location: 32x23-bit.

4/5/17- Lecture
Computer Performance

$$\text{Performance} = \frac{1}{\text{Execution Time}}$$

Execution Time = CPU Time + Non-overlapped I/O Time

$$\text{Execution Time} \approx \text{CPU Time}$$

$$\text{CPU Time} = \text{UserCPUTime} + \text{SystemCPUTime}$$

$$\text{Execution Time} \approx \text{UserCPUTime} \cong \text{CPUtime}$$

$$\text{CPUtime} = (\text{Number of clock periods for program}) \times (\text{Clock period})$$

$$\text{CPUtime} = \frac{\text{Number of clock periods for program}}{\text{Clock frequency}}$$

$$\text{Number of clock periods for program} = \sum_{i=1}^{n} (\text{CPI}_i \times I_i)$$

($I_i$ = number of instructions of type $i$ run)

NI = number of instructions running for program:

$$NI = \sum_{i=1}^{n} I_i$$

$\text{CPI}_{\text{average}}$ is, on average, the number of clock periods taken for each instruction for the program:

$$\text{CPI}_{\text{average}} = \frac{\text{Number of clock periods for program}}{NI}$$

$$\text{CPUtime} = NI \times \text{CPI}_{\text{average}} \times \text{Clock period}$$

We assume an ideal memory.

$\text{MIPS}_{\text{avg}}$ (Million Instructions Per Second) = average number of instructions run for the program per second, in millions:

$$\text{MIPS}_{\text{avg}} = \frac{NI}{\text{CPUtime} \times 10^6} = \frac{\text{Cfreq}}{\text{CPI}_{\text{average}} \times 10^6}$$

$$\text{MFLOPS}_{\text{avg}} = \frac{\text{Number of floating point operations for program}}{\text{FP time for program} \times 10^6}$$

(Also Giga, Tera (trillion), Peta: GFLOPS, TFLOPS, PFLOPS.)
An App Run

| Instruction | $\text{CPI}_i$ | $I_i$ |
|---|---|---|
| ADD | 4 | 10M |
| MULT | 6 | 1.5M |
| LOAD | 5 | 2.5M |
| STORE | 4 | 0.35M |
| BRANCH | 3 | 0.15M |
| FPADD | 8 | 5M |
| FPDIV | 20 | 0.5M |
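The table's numbers can be plugged into the performance formulas with a short Python sketch. The function name `performance` and the dictionary layout are hypothetical; the 1 GHz clock is the value used for this example, and ideal memory is assumed as in the lecture.

```python
# (CPI_i, I_i) per instruction type, copied from the "An App Run" table.
APP_RUN = {
    "ADD": (4, 10_000_000),
    "MULT": (6, 1_500_000),
    "LOAD": (5, 2_500_000),
    "STORE": (4, 350_000),
    "BRANCH": (3, 150_000),
    "FPADD": (8, 5_000_000),
    "FPDIV": (20, 500_000),
}
FP_INSTRUCTIONS = ("FPADD", "FPDIV")

def performance(mix, clock_hz, fp_names=FP_INSTRUCTIONS):
    """Compute NI, clock periods, CPI_average, CPUtime, MIPS and MFLOPS."""
    ni = sum(count for _, count in mix.values())
    cycles = sum(cpi * count for cpi, count in mix.values())
    cpu_time = cycles / clock_hz                       # seconds
    fp_ops = sum(mix[name][1] for name in fp_names)
    fp_cycles = sum(mix[name][0] * mix[name][1] for name in fp_names)
    fp_time = fp_cycles / clock_hz                     # seconds spent in FP instructions
    return {
        "NI": ni,
        "cycles": cycles,
        "CPI_average": cycles / ni,
        "CPUtime_s": cpu_time,
        "MIPS": ni / (cpu_time * 1e6),
        "MFLOPS": fp_ops / (fp_time * 1e6),
    }

result = performance(APP_RUN, clock_hz=1e9)
```

For this mix the sketch gives NI = 20M instructions and 113.35M clock periods, so the average CPI is about 5.67, CPUtime is about 113.35 ms, MIPS is about 176.44, and MFLOPS is about 110.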
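Cfreq = 1 GHz = $10^9$ Hz

$$\text{Cperiod} = \frac{1}{\text{Cfreq}} = \frac{1}{10^9} = 10^{-9}\ \text{s} = 1\ \text{nanosecond}$$

$$NI = \sum_{i=1}^{n} I_i = 10M + 1.5M + 2.5M + 0.35M + 0.15M + 5M + 0.5M = 20M = 20 \times 10^6$$

$$\text{Number of cperiods for program} = \sum_{i=1}^{n} \text{CPI}_i \times I_i = 4 \times 10M + 6 \times 1.5M + 5 \times 2.5M + 4 \times 0.35M + 3 \times 0.15M + 8 \times 5M + 20 \times 0.5M = 113.35M = 113.35 \times 10^6$$

$$\text{CPI}_{\text{average}} = \frac{\text{Number of clock periods for program}}{NI} = \frac{113.35 \times 10^6}{20 \times 10^6} = 5.6675 \approx 5.67$$

$$\text{CPUtime} = NI \times \text{CPI}_{\text{average}} \times \text{Cperiod} = 20 \times 10^6 \times 5.6675 \times 10^{-9} = 113.35 \times 10^{-3}\ \text{s}$$

$$\text{MIPS}_{\text{average}} = \frac{NI}{\text{CPUtime} \times 10^6} = \frac{20 \times 10^6}{113.35 \times 10^{-3} \times 10^6} = 176.44$$

$$\text{Number of FP operations for program} = I_{\text{FPADD}} + I_{\text{FPDIV}} = 5M + 0.5M = 5.5M = 5.5 \times 10^6$$

$$\text{FP time for program} = (\text{Number of FP clock periods for program}) \times (\text{Clock period}) = (\text{CPI}_{\text{FPADD}} \times I_{\text{FPADD}} + \text{CPI}_{\text{FPDIV}} \times I_{\text{FPDIV}}) \times \text{Cperiod} = (8 \times 5M + 20 \times 0.5M) \times 10^{-9} = 50 \times 10^{-3}\ \text{s}$$

$$\text{MFLOPS}_{\text{average}} = \frac{\text{Number of FP ops for program}}{\text{FP time for program} \times 10^6} = \frac{5.5 \times 10^6}{50 \times 10^{-3} \times 10^6} = 110$$

Clock Doubling
Instructions Run = ADD + ADD + ADD + 5(LW + ADD + ADDI + ADDI + BNE) + ADD + JR
Instruction counts: $I_{\text{ADD}} = 9$, $I_{\text{LW}} = 5$, $I_{\text{ADDI}} = 10$, $I_{\text{BNE}} = 5$, $I_{\text{JR}} = 1$
Cfreq = 1 GHz = $10^9$ Hz
$\text{CPI}_{\text{ADD}} = 4$ → states 0, 1, 6, 7
$\text{CPI}_{\text{LW}} = 5$ → states 0, 1, 2, 3, 4
$\text{CPI}_{\text{ADDI}} = 4$ → states 0, 1, 16, 17
$\text{CPI}_{\text{BNE}} = 3$ → states 0, 1, 16
$\text{CPI}_{\text{JR}} = 3$ → states 0, 1, 16

$$\text{Cperiod} = \frac{1}{\text{Cfreq}} = \frac{1}{10^9} = 10^{-9}\ \text{s} = 1\ \text{nanosecond}$$

$$\text{CPUtime} = (\text{Number of cperiods for program}) \times \text{Cperiod} = (\text{CPI}_{\text{ADD}} \times I_{\text{ADD}} + \text{CPI}_{\text{LW}} \times I_{\text{LW}} + \text{CPI}_{\text{ADDI}} \times I_{\text{ADDI}} + \text{CPI}_{\text{BNE}} \times I_{\text{BNE}} + \text{CPI}_{\text{JR}} \times I_{\text{JR}}) \times \text{Cperiod}$$

$$= (4 \times 9 + 5 \times 5 + 4 \times 10 + 3 \times 5 + 3 \times 1) \times 10^{-9} = 119 \times 10^{-9}\ \text{s} = 119\ \text{nanoseconds}$$

What if $\text{CPU}_{\text{freq}}$ = 2 GHz? States marked ∗ (the memory-access states) now take 2 clock periods each:
$\text{CPI}_{\text{ADD}} = 5$ → states 0∗, 1, 6, 7
$\text{CPI}_{\text{LW}} = 7$ → states 0∗, 1, 2, 3∗, 4
$\text{CPI}_{\text{ADDI}} = 5$ → states 0∗, 1, 16, 17
$\text{CPI}_{\text{BNE}} = 4$ → states 0∗, 1, 16
$\text{CPI}_{\text{JR}} = 4$ → states 0∗, 1, 16

$$\text{Cperiod} = \frac{1}{\text{Cfreq}} = \frac{1}{2 \times 10^9} = 0.5 \times 10^{-9}\ \text{s} = 0.5\ \text{nanoseconds}$$

$$\text{CPUtime} = (5 \times 9 + 7 \times 5 + 5 \times 10 + 4 \times 5 + 4 \times 1) \times 0.5 \times 10^{-9} = 154 \times 0.5 \times 10^{-9}\ \text{s} = 77\ \text{ns}$$

4/7/17-Recitation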