High Perform Comput Arch

by: Alayna Veum

High Perform Comput Arch CS 6290

These class notes were uploaded by Alayna Veum on Monday, November 2, 2015. They belong to CS 6290 (Computer Science) at Georgia Institute of Technology - Main Campus, taught by Staff in Fall.

Date Created: 11/02/15
CS6290: Memory
Georgia Tech

Views of Memory
- Real machines have limited amounts of memory: 640KB? A few GB? (This laptop: 2GB.)
- The programmer doesn't want to be bothered.
  - Do you think, "Oh, this computer only has 128MB, so I'll write my code this way"?
  - What happens if you run on a different machine?

Programmer's View
- Example: 32-bit memory.
- When programming, you don't care about how much real memory there is.
- Even if you use a lot, memory can always be paged to disk.
[Figure: address space from 0 to 4GB containing Kernel, Text, Data, Heap.]

Programmer's View (really: the Program's View)
- Each program/process gets its own 4GB space.
[Figure: two processes, each with its own Kernel, Text, Data, Heap.]

CPU's View
- At some point, the CPU is going to have to load from/store to memory; all it knows is the real (AKA physical) memory,
- which unfortunately is often < 4GB,
- and is never 4GB per process.

Pages
- Memory is divided into pages, which are nothing more than fixed-sized and aligned regions of memory.
- Typical size: 4KB/page (but not always).
    0-4095         Page 0
    4096-8191      Page 1
    8192-12287     Page 2
    12288-16383    Page 3

Page Table
- Map from virtual addresses to physical locations.
- The page table implements this V-to-P mapping.
[Figure: virtual addresses 0K, 4K, 8K, 12K mapped through the page table to physical addresses 0K, 4K, 8K, 12K.]

Need for Translation
- Virtual address 0xFC519088 = virtual page number 0xFC519 + page offset 0x088.
- The page table maps VPN 0xFC519 to physical page number 0x00152.
- Physical address sent to main memory: 0x00152088.

Simple Page Table
- Flat organization: one entry per page.
- Entry contains physical page number (PPN) or indicates page is on disk or invalid.
- Also meta-data (e.g., permissions, dirtiness, etc.).

Multi-Level Page Tables
[Figure: the virtual page number is split into Level 1 and Level 2 indices, followed by the Offset; the lookup yields the Physical Page Number.]

Choosing a Page Size
- Page size is inversely proportional to page table overhead.
- Large page size permits more efficient transfer to/from disk (vs. many small transfers), like downloading from the Internet.
- Small page leads to less fragmentation; a big page is likely to have more bytes unused.

CPU Memory Access
- Program deals with virtual addresses: "Load R1 = 0[R2]".
- On a memory instruction:
  1. Compute virtual address (0[R2]).
  2. Compute virtual page number.
  3. Compute physical address of VPN's page table entry.
  4. Load mapping. (Steps 3-4 could be more, depending on page table organization.)
  5. Compute physical address.
  6. Do the actual Load from memory.

Impact on Performance
- Every time you load/store, the CPU must perform two (or more) accesses!
- Even worse, every fetch requires translation of the PC!
- Observation: once a virtual page is mapped into a physical page, it'll likely stay put for quite some time.

Idea: Caching
- Not caching of data, but caching of translations.
[Figure: virtual-to-physical page mapping, with one cached translation: VPN 8 -> PPN 16.]

Translation Cache: TLB
- TLB = Translation Look-aside Buffer.
[Figure: virtual address -> TLB -> physical address -> cache tags and data -> hit.]
- If TLB hit, no need to do page table lookup.
- Note: data cache is accessed by physical addresses now.

PAPT Cache
- Previous slide showed a Physically-Addressed, Physically-Tagged cache; sometimes called PIPT (I = Indexed).
- Con: TLB lookup and cache access serialized (caches already take > 1 cycle).
- Pro: cache contents valid so long as page table not modified.

Virtually Addressed Cache
[Figure: the virtual address accesses a virtually tagged cache; on a cache miss, the TLB supplies the physical address to the L2 cache.]
- Pro: latency, no need to check TLB.
- Con: cache must be flushed on process change.

Virtually Indexed, Physically Tagged
[Figure: the virtual address indexes the cache while the TLB produces the physical address in parallel; the physical tag is compared against the cache tags to determine a hit.]
- Pro: latency, TLB parallelized.
- Pro: don't need to flush the cache on process swap.
- Con: limit on cache indexing (can only use bits not from the VPN/PPN).

TLB Design
- Often fully associative. For latency, this means few entries. However, each entry is for a whole page.
- Ex.: 32-entry TLB, 4KB page: how big of a working set while avoiding TLB misses?
- If many misses:
  - Increase TLB size (latency problems).
  - Increase page size (fragmentation problems).

Process Changes
- With physically tagged caches, don't need to flush cache on context switch. But TLB is no longer valid!
- Add process ID to translation, e.g. PID 0, VPN 8 -> PPN 28; PID 1 -> PPN 44.

SRAM vs. DRAM
- DRAM = Dynamic RAM.
- SRAM: 6T per bit, built with normal high-speed CMOS technology.
- DRAM: 1T per bit, built with special DRAM process optimized for density.

Hardware Structures
[Figure: SRAM cell and DRAM cell, each shown with its wordline and bitline connections.]

Implementing the Capacitor
[Figure: trench cell cross-section showing cell plate Si, cap insulator, refilling poly, Si substrate, storage node poly, and field oxide.]
DRAM figures on this slide were taken from Prof. Nikolic's EECS141/2003 lecture notes from UC Berkeley.

DRAM Chip Organization
[Figure: the row address selects a row of the memory cell array; the column address selects which bits go to the data bus.]

DRAM Chip Organization (2)
- Differences with SRAM:
  - Reads are destructive: contents are erased after reading.
  - Row buffer: read lots of bits all at once, and then parcel them out based on different column addresses. Similar to reading a full cache line, but only accessing one word at a time.
- "Fast-Page Mode" (FPM) DRAM organizes the DRAM row to contain bits for a complete page; the row address is held constant, followed by fast reads from different locations in the same page.

DRAM Read Operation
[Figure: row address 0x1FE selects a row of the memory cell array into the sense amps; column address 0x009 selects, through the column decoder, the bits placed on the data bus.]
- Accesses need not be sequential.

Destructive Read
[Figure: bitline voltage and storage cell voltage over time as the wordline is enabled and then the sense amp is enabled.]
- After a read of 0 or 1, the cell contains something close to 1/2.

Refresh
- So after a read, the contents of the DRAM cell are gone.
- The values are stored in the row buffer.
- Write them back into the cells for the next read in the future.

Refresh (2)
- Fairly gradually, the DRAM cell will lose its contents even if it's not accessed (gate leakage). This is why it's called "dynamic".
- Contrast to SRAM, which is "static" in that once written, it maintains its value forever (so long as power remains on).
- All DRAM rows need to be regularly read and re-written.
- If it keeps its value even if power is removed, then it's "non-volatile" (e.g., flash, HDD, DVDs).

DRAM Read Timing
- Accesses are asynchronous: triggered by RAS and CAS signals, which can in theory occur at arbitrary times (subject to DRAM timing constraints).
[Figure: timing diagram showing row address, column address, row access, column access, data transfer, and DQ.]

SDRAM Read Timing
- Double-Data Rate (DDR) DRAM transfers data on both rising and falling edge of the clock.
- Command frequency does not change.
[Figure: timing diagram showing row address, column address, row access, column access, data transfer overlap, and DQ.]
Timing figures taken from "A Performance Comparison of Contemporary DRAM Architectures" by Cuppu, Jacob, Davis and Mudge.

Rambus (RDRAM)
- Synchronous interface.
- Row buffer cache: last 4 rows accessed are cached, giving a higher probability of a low-latency hit. DRDRAM increases this to 8 entries.
- Uses other tricks since adopted by SDRAM: multiple data words per clock, high frequencies.
- Chips can self-refresh.
- Expensive for PCs; used by X-Box, PS2.

Example Memory Latency Computation
- FSB freq = 200 MHz, SDRAM.
- RAS delay = 2, CAS delay = 2.
- Access sequence: A0, A1, B0, C0, D3, A2, D0, C1, A3, C3, C2, D1, B1, D2.
- What's this in CPU cycles? (Assume 2GHz.)
- Impact on AMAT?

More Latency
- More wire delay getting to the memory chips.
- Significant wire delay just getting from the CPU to the memory controller (plus the return trip).
- Width/speed varies depending on memory type.
[Figure: CPU and caches connected over a 128-bit 100MHz bus to the memory controller, which connects to x16 DRAM.]

Memory Controller
[Figure: commands and data to/from the CPU pass through a read queue, write queue, and response queue; a scheduler inside the memory controller issues requests to Bank 0 and Bank 1.]
- Like a write-combining buffer, the scheduler may coalesce multiple accesses together, or re-order to reduce the number of row accesses.

Memory Reference Scheduling
- Just like registers, need to enforce RAW, WAW, WAR dependencies.
- No "memory renaming" in the memory controller, so enforce all three dependencies.
- Like everything else, still need to maintain the appearance of sequential access.
  - Consider multiple read/write requests to the same address.

Example Memory Latency Computation (2)
- FSB freq = 200 MHz, SDRAM.
- RAS delay = 2, CAS delay = 2.
- Scheduling in the memory controller.
- Access sequence: A0, A1, B0, C0, D3, A2, D0, C1, A3, C3, C2, D1, B1, D2.
- Think about hardware complexity.

So what do we do about it?
- Caching: reduces average memory instruction latency by avoiding DRAM altogether.
- Limitations:
  - Capacity: programs keep increasing in size.
  - Compulsory misses.

Faster DRAM Speed
- Clock FSB faster: DRAM chips may not be able to keep up.
- Latency dominated by wire delay.
  - Bandwidth may be improved (DDR vs. regular) but latency doesn't change much.
  - Instead of 2 cycles for row access, may take 3 cycles at a faster bus speed.
- Doesn't address latency of the memory access.
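
Worked example: address translation. The "Need for Translation" and "Simple Page Table" slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: 4KB pages, a flat page table modeled as a dict from VPN to PPN, and hypothetical names (`translate`, `page_table`) that do not come from the notes.

```python
PAGE_SIZE = 4096                            # 4KB pages, the typical size in the notes
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # 12 bits of page offset


def translate(vaddr, page_table):
    """Translate a virtual address with a flat page table (dict: VPN -> PPN).

    Hypothetical helper for illustration; a real page table entry would
    also carry permissions, a dirty bit, and an on-disk/invalid indication.
    """
    vpn = vaddr >> OFFSET_BITS              # upper bits select the page
    offset = vaddr & (PAGE_SIZE - 1)        # lower bits pass through unchanged
    if vpn not in page_table:
        raise LookupError(f"page fault: VPN {vpn:#x} is not mapped")
    return (page_table[vpn] << OFFSET_BITS) | offset


# The mapping used on the slide: VPN 0xFC519 -> PPN 0x00152.
page_table = {0xFC519: 0x00152}
paddr = translate(0xFC519088, page_table)
print(f"{paddr:#010x}")                     # 0x00152088
```

Only the page number is translated; the 12-bit offset is copied as is, which is why pages must be fixed-size and aligned.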
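
Worked example: TLB reach and PID-tagged entries. The "TLB Design" slide asks how large a working set a 32-entry TLB with 4KB pages can cover, and the "Process Changes" slide suggests adding a process ID to each translation. The sketch below is a toy model, not the design from the lecture: the class name `TinyTLB`, the LRU replacement policy, and the choice of VPN 8 for the PID 1 entry are all assumptions made for illustration.

```python
from collections import OrderedDict


def tlb_reach(entries, page_size):
    """Bytes of memory covered when every TLB entry maps a distinct page."""
    return entries * page_size


class TinyTLB:
    """Toy fully associative TLB keyed by (pid, vpn), with LRU replacement.

    LRU is an assumption here; the notes do not name a replacement policy.
    """

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()            # (pid, vpn) -> ppn, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, pid, vpn, page_table):
        key = (pid, vpn)
        if key in self.map:                 # TLB hit: no page table lookup
            self.hits += 1
            self.map.move_to_end(key)
            return self.map[key]
        self.misses += 1                    # TLB miss: consult the page table
        ppn = page_table[key]
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)    # evict the least recently used
        self.map[key] = ppn
        return ppn


print(tlb_reach(32, 4096) // 1024, "KB")    # 128 KB

# Same VPN, different processes, different physical pages.
page_table = {(0, 8): 28, (1, 8): 44}
tlb = TinyTLB(entries=32)
tlb.lookup(0, 8, page_table)                # miss, fills the entry
tlb.lookup(0, 8, page_table)                # hit
tlb.lookup(1, 8, page_table)                # miss: other PID, no flush needed
print(tlb.hits, tlb.misses)                 # 1 2
```

Because the key includes the PID, a context switch does not invalidate existing entries; the other process simply misses on its own keys.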
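
Worked example: memory latency with and without scheduling. The two "Example Memory Latency Computation" slides leave the arithmetic as an exercise. The sketch below is one possible reading, not the lecture's official answer. It assumes that letters name DRAM rows and digits name columns, that a single bank keeps one row open at a time, that opening a new row costs the RAS delay and every access costs the CAS delay, and that precharge, data transfer, and wire delay are ignored. It also assumes all fourteen accesses are independent reads, so reordering cannot violate RAW/WAW/WAR dependencies. The function names are hypothetical.

```python
FSB_HZ = 200_000_000                 # 200 MHz front-side bus
CPU_HZ = 2_000_000_000               # 2 GHz CPU
RAS_DELAY = 2                        # bus cycles to open a row
CAS_DELAY = 2                        # bus cycles to access a column
CPU_CYCLES_PER_BUS_CYCLE = CPU_HZ // FSB_HZ    # 10


def bus_cycles(accesses):
    """Total bus cycles for (row, column) accesses on one open-row bank."""
    open_row = None
    total = 0
    for row, _col in accesses:
        if row != open_row:          # row miss: pay RAS to open the new row
            total += RAS_DELAY
            open_row = row
        total += CAS_DELAY           # every access pays CAS
    return total


def group_by_row(accesses):
    """Reorder so accesses to the same row are adjacent (first-seen order)."""
    groups = {}
    for row, col in accesses:
        groups.setdefault(row, []).append((row, col))
    return [access for group in groups.values() for access in group]


sequence = "A0 A1 B0 C0 D3 A2 D0 C1 A3 C3 C2 D1 B1 D2".split()
accesses = [(name[0], int(name[1:])) for name in sequence]

in_order = bus_cycles(accesses)
scheduled = bus_cycles(group_by_row(accesses))
print(in_order, in_order * CPU_CYCLES_PER_BUS_CYCLE)      # 52 520
print(scheduled, scheduled * CPU_CYCLES_PER_BUS_CYCLE)    # 36 360
```

Under these assumptions the in-order stream opens a row 12 times while the grouped stream opens only 4, which is the kind of saving the memory controller's scheduler is after. A real scheduler would also have to check for requests to the same address before reordering.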

