
Computer Architecture

by: Marian Kertzmann DVM

Computer Architecture CS 6810



These 17 pages of class notes were uploaded by Marian Kertzmann DVM on Monday, October 26, 2015. The notes belong to CS 6810 at the University of Utah, taught by Alan Davis in Fall. Since upload, they have received 59 views. For similar materials, see /class/229976/cs-6810-university-of-utah in Computer Science at the University of Utah.


Date Created: 10/26/15
Lecture 11: ILP Innovations and SMT

Today: out-of-order example, ILP innovations, SMT (Sections 3.5 and supplementary notes)

OOO Example

- Assumptions: same as HW 4, except there are 36 physical registers and 32 logical registers
- Estimate the issue time, completion time, and commit time for the sample code

Original code      Renamed code        InQ    Iss    Comp   Comm
ADD R1, R2, R3     ADD P33, P2, P3     i      i+1    i+6    i+6
LD  R2, 8(R1)      LD  P34, 8(P33)     i      i+2    i+8    i+8
ADD R2, R2, 8      ADD P35, P34, 8     i      i+4    i+9    i+9
ST  R1, (R3)       ST  P33, (P3)       i      i+2    i+8    i+9
SUB R1, R1, R5     SUB P36, P33, P5    i+1    i+2    i+7    i+9
LD  R1, 8(R2)      LD  P1, 8(P35)      i+7    i+8    i+14   i+14
ADD R1, R1, R2     ADD P2, P1, P35     i+9    i+10   i+15   i+15

After the SUB takes P36, the last two instructions must wait for registers to be freed. The second LD enters the queue at i+7, one cycle after the first ADD commits and releases P1. The final ADD enters at i+9, one cycle after the first LD commits and releases P2.

Reducing Stalls in Rename/Regfile

- Larger ROB / register file / issue queue
- Virtual physical registers: assign virtual register names to instructions, but assign a physical register only when the value is made available
- Runahead: while a long instruction waits, let a thread run ahead to prefetch (this thread can deallocate resources more aggressively than a processor supporting precise execution)
- Two-level register files: values being kept around in the register file for precise exceptions can be moved to the 2nd level

Stalls in Issue Queue

- Two-level issue queues: the 2nd level contains instructions that are less likely to be woken up in the near future
- Value prediction: tries to circumvent RAW hazards
- Memory dependence prediction: allows a load to execute even if there are prior stores with unresolved addresses
- Load hit prediction: instructions are scheduled early, assuming that the load will hit in cache

Functional Units

- Clustering: allows quick bypass among a small group of functional units; FUs can also be associated with a subset of the register file and issue queue

Thread-Level Parallelism

- Motivation:
  - a single thread leaves a processor under-utilized for most of the time
  - by doubling processor area, single-thread performance barely improves
- Strategies for thread-level parallelism:
  - multiple threads share the same large processor -> reduces under-utilization, efficient resource allocation: Simultaneous Multi-Threading (SMT)
  - each thread executes on its own mini processor -> simple design, low interference between threads: Chip Multi-Processing (CMP)

How are Resources Shared?

[Figure: issue slots over cycles for three designs: Superscalar, Fine-Grained Multithreading, Simultaneous Multithreading. Each box represents an issue slot for a functional unit, marked as Thread 1, Thread 2, Thread 3, Thread 4, or Idle. Peak throughput is 4 IPC.]

- A superscalar processor has high under-utilization: not enough work every cycle, especially when there is a cache miss
- Fine-grained multithreading can only issue instructions from a single thread in a cycle: it cannot find max work every cycle, but cache misses can be tolerated
- Simultaneous multithreading can issue instructions from any thread every cycle: it has the highest probability of finding work for every issue slot

What Resources are Shared?

- Multiple threads are simultaneously active (in other words, a new thread can start without a context switch)
- For correctness, each thread needs its own PC and its own logical regs (and its own mapping from logical to phys regs)
- For performance, each thread could have its own ROB (so that a stall in one thread does not stall commit in other threads), I-cache, branch predictor, D-cache, etc. (for low interference), although note that more sharing -> better utilization of resources
- Each additional thread costs a PC, rename table, and ROB: cheap!

Pipeline Structure

[Figure: pipeline structure with the labels "Private/Shared Front-end", "Private Front-end", and "Shared Exec Engine". Question posed on the slide: what about the RAS and LSQ?]

Resource Sharing

[Figure: each thread has its own Instr Fetch and Instr Rename stages; the renamed instructions from both threads feed a shared Issue Queue and Register File.]

Thread-1 code        Renamed
R1 <- R1 + R2        P73 <- P1 + P2
R3 <- R1 + R4        P74 <- P73 + P4
R5 <- R1 + R3        P75 <- P73 + P74

Thread-2 code        Renamed
R2 <- R1 + R2        P76 <- P33 + P34
R5 <- R1 + R2        P77 <- P33 + P76
R3 <- R5 + R3        P78 <- P77 + P35

Performance Implications of SMT

- Single-thread performance is likely to go down (caches, branch predictors, registers, etc. are shared); this effect can be mitigated by trying to prioritize one thread
- While fetching instructions, thread priority can dramatically influence total throughput; a widely accepted heuristic (ICOUNT): fetch such that each thread has an equal share of processor resources
- With eight threads in a processor with many resources, SMT yields throughput improvements of roughly 2-4
- Alpha 21464 and Intel Pentium 4 are examples of SMT

Pentium4 Hyper-Threading

- Two threads: the Linux operating system operates as if it is executing on a two-processor system
- When there is only one available thread, it behaves like a regular single-threaded superscalar processor
- Statically divided resources: ROB, LSQ, issue queue; a slow thread will not cripple throughput (might not scale)
- Dynamically shared: trace cache and decode (fine-grained multi-threaded, round-robin), FUs, data cache, bpred

Multi-Programmed Speedup

[Table: Benchmark, Best Speedup, Worst Speedup, Avg Speedup, with one row per benchmark: gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf, wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi, and an overall row. The numeric entries are not legible in this copy.]

- sixtrack and eon do not degrade their partners (small working sets?)
- swim and art degrade their partners (cache contention?)
- Best combination: swim & sixtrack; worst combination: swim & art
- Static partitioning ensures low interference: worst slowdown is 0.9
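
The renaming step in the out-of-order example earlier in these notes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's reference solution: the function name `rename`, the tuple encoding of instructions, and the initial mapping of R1..R32 to P1..P32 (leaving P33..P36 free) are all choices made here for illustration. It also ignores timing entirely: when the free list is empty, it simply assumes the oldest in-flight instruction commits, which releases the physical register that previously held that instruction's destination.

```python
from collections import deque


def rename(program, num_logical=32, num_physical=36):
    """Rename logical registers to physical ones using a free list.

    Assumption (not from the notes): R1..R32 start mapped to P1..P32.
    Simplification: a stall on an empty free list is modeled by letting
    the oldest in-flight instruction commit immediately.
    """
    rename_table = {f"R{i}": f"P{i}" for i in range(1, num_logical + 1)}
    free_list = deque(f"P{i}" for i in range(num_logical + 1, num_physical + 1))
    in_flight = deque()  # per instruction: the register it frees at commit
    renamed = []

    for op, dest, srcs in program:
        # Sources are read through the table before the destination is remapped.
        # Anything that is not a register name (e.g. the immediate "8") passes through.
        new_srcs = [rename_table.get(s, s) for s in srcs]

        new_dest, previous = None, None
        if dest is not None:
            while not free_list:
                if not in_flight:
                    raise RuntimeError("no free physical register and nothing to commit")
                freed = in_flight.popleft()  # oldest instruction commits
                if freed is not None:
                    free_list.append(freed)
            previous = rename_table[dest]    # released when this instruction commits
            new_dest = free_list.popleft()
            rename_table[dest] = new_dest

        in_flight.append(previous)
        renamed.append((op, new_dest, new_srcs))
    return renamed


# The sample code from the example, encoded as (opcode, dest, sources).
PROGRAM = [
    ("ADD", "R1", ["R2", "R3"]),
    ("LD",  "R2", ["8", "R1"]),    # LD R2, 8(R1)
    ("ADD", "R2", ["R2", "8"]),
    ("ST",  None, ["R1", "R3"]),   # ST R1, (R3): no destination register
    ("SUB", "R1", ["R1", "R5"]),
    ("LD",  "R1", ["8", "R2"]),    # LD R1, 8(R2)
    ("ADD", "R1", ["R1", "R2"]),
]

for op, dest, srcs in rename(PROGRAM):
    print(op, dest, srcs)
```

Under these assumptions the sketch reproduces the renamed column of the example: the first five instructions use P33 through P36, and the last two reuse P1 and P2 once the first ADD and the first LD have committed.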

