
Class Note for COSC 6385 with Professor Gabriel at UH



This 12 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Houston taught by a professor in Fall. Since its upload, it has received 14 views.


Date Created: 02/06/15
COSC 6385 Computer Architecture
Multi-Processors (II): The IBM Cell, Intel Larrabee and Nvidia G80 processors
Edgar Gabriel, Fall 2008

References

Intel Larrabee
[1] L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, P. Hanrahan: "Larrabee: a many-core x86 architecture for visual computing", ACM Trans. Graph., Vol. 27, No. 3, August 2008, pp. 1-15.

IBM Cell processor
[2] C. R. Johns, D. A. Brokenshire: "Introduction to the Cell Broadband Engine Architecture", IBM Journal of Research and Development, vol. 51, no. 5, pp. 503-519.
[3] M. Kistler, M. Perrone, F. Petrini: "Cell Multiprocessor Communication Network: Built for Speed", IEEE Micro, vol. 26, no. 3, pp. 10-23.

Nvidia G80
[4] Scott Wasson: "Nvidia GeForce 8800 graphics processor", techreport.com.

Larrabee Motivation

Comparison of two architectures with the same number of transistors:
- Half the performance of a single stream for the simplified core
- 40x increase for multistream executions

                      2 out-of-order cores   10 in-order cores
  Instruction issue   4                      2
  VPU per core        4-wide SSE             16-wide
  L2 cache size       4 MB                   4 MB
  Single stream       4 per clock            2 per clock
  Vector throughput   8 per clock            160 per clock

Larrabee Overview

- Many-core visual computing architecture
- Based on x86 CPU cores
  - Extended version of the regular x86 instruction set
  - Supports subroutines and page faulting
- The number of x86 cores can vary depending on the implementation and processor version

Overview of a Larrabee Core

- x86 core derived from the Pentium processor
  - No out-of-order execution
  - Standard Pentium instruction set with the addition of 64-bit instructions
  - Instructions for prefetching data into the L1 and L2 caches
  - Support for 4 simultaneous threads, with separate registers for each thread
- Each core is augmented with a wide vector processor (VPU)
- 32 KB L1 instruction cache, 32 KB L1 data cache
- 256 KB local subset of the L2 cache
- Coherent L2 cache across all cores

Vector Processing Unit in Larrabee

- 16-wide VPU executing integer, single-precision and double-precision floating-point operations
- The VPU supports gather/scatter operations: the 16 elements can be loaded from or stored to up to 16 different addresses
- Support for predicated instructions using a mask control register (if-then-else statements)

Inter-Processor Ring Network

- Bidirectional ring network
- 512 bits wide per direction
- Routing decisions are made before a message is injected into the network

Larrabee Programming Models

- Most applications can be executed without modification due to the full support of the x86 instruction set
- Support for POSIX threads to create multiple threads
  - API extended by thread affinity parameters
- Recompiling code with Larrabee's native compiler automatically generates the code that uses the VPUs
- Alternative parallel approaches
  - Intel Threading Building Blocks
  - Larrabee-specific OpenMP directives

IBM Cell Overview

- Cell Broadband Engine Architecture (CBEA), defined by a consortium of IBM, Sony and Toshiba
- Originally targeting the multimedia industry
  - E.g. PlayStation 3, Toshiba HDTV
  - Also available in regular computer blades sold by IBM (IBM QS20, QS21, QS22)
- Main idea: a heterogeneous multi-core processor consisting of
  - one or more general-purpose processor elements (PPE), and
  - one or more synergistic processor elements (SPEs)

Cell Architecture

- Two generations available so far
  - Cell: 204.8 GFLOPS single-precision peak performance, 14.6 GFLOPS double-precision peak performance
  - PowerXCell 8i (2008): 204.8 GFLOPS single-precision peak performance, 102.4 GFLOPS double-precision peak performance
  - Both have 1 PPE and 8 SPEs

General Purpose Processor (PPE)

- Based on the IBM PowerPC processor
- Supports multiple simultaneous operating environments (virtualization)
  - E.g. it can execute an instance of a real-time operating system and an instance of a non-real-time operating system
- Performs management and application control functions

Synergistic Processor Element (SPE)

- SIMD processor used for offloading compute-intensive, data-parallel operations from the PPE
- Each SPE has its own local storage and can access data only from the local storage
  - Current versions of the Cell processor: 256 KB local storage
- The local storage is connected to main memory through a Memory Flow Controller (MFC)
- The MFC moves data from main memory to local storage, or between two SPEs

Synergistic Processor Element (SPE) II

- Each SPE has 128 registers
- Each register is 128 bits wide and can be used to hold
  - sixteen 8-bit integers, or
  - eight 16-bit integers, or
  - four 32-bit integers or single-precision floating-point numbers, or
  - two 64-bit integers or double-precision floating-point numbers
- Most instructions supported by the synergistic processor unit utilize all elements in a register

[Figure: simplified representation of a current Cell processor]

Element Interconnect Bus

- PPE and SPEs communicate through the Element Interconnect Bus
  - Contains a shared command bus
    - Sets up end-to-end transactions
    - Used in the coherence protocol
  - Point-to-point data interconnect
    - Four 16-byte-wide rings: two used for clockwise data transfers, two for counterclockwise data transfers
    - Each ring transfers 128-byte packets (one cache block)
    - Communication cost between two elements varies between 1 hop and 6 hops
  - Overall bandwidth: 204.8 GB/s

Comparison IBM Cell and Intel Larrabee

- Both use a large number of small and simple cores
- Both use a high-bandwidth ring bus to communicate between the cores
- Intel Larrabee is homogeneous, while IBM Cell is a heterogeneous processor (difference between PPE and SPE)
- IBM Cell requires data to be moved explicitly to the local store, while Larrabee can address any memory area
- Programs for the Cell have to be written taking into account the limited amount of memory available to an SPE

Nvidia G80

- Parallel stream processor
- Each green block in the block diagram is a stream processor (SP)
- 16 stream processors are grouped and connected by an L1 cache
- Each G80 has 8 groups with 16 SPs: 128 SPs in total
- Each SP is a generalized processor running at 1.35 GHz
- Each SP operates on a single element (scalar)
- The groups are connected by a crossbar-style switch, which also connects them to six ROPs
- Each ROP has its own L2 cache and a 64-bit-wide interface to graphics memory (frame buffer)
  - 6 x 64 bits = 384-bit path to memory

Performance comparison

[Figure: ray tracing application, frames/sec for the Stanford Bunny at 1024x1024, G80 vs. Cell. Source: IBM, gametomorrow.com]
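
The Larrabee VPU features described in these notes (16-wide vectors, gather/scatter from up to 16 different addresses, and predication through a mask register) can be illustrated with a small Python sketch. It only emulates the semantics lane by lane and is not Larrabee code. The function names gather, scatter and predicated_select are hypothetical and chosen purely for illustration, and the if-then-else being vectorized ("if x < 0 then y = 0 else y = 2*x") is an assumed example.

```python
from typing import List, Sequence

VPU_WIDTH = 16  # number of lanes per vector, as stated in the notes


def gather(memory: Sequence[float], addresses: Sequence[int]) -> List[float]:
    """Load one element per lane; each lane may use a different address."""
    assert len(addresses) == VPU_WIDTH
    return [memory[a] for a in addresses]


def scatter(memory: List[float], addresses: Sequence[int],
            values: Sequence[float], mask: Sequence[bool]) -> None:
    """Store one element per lane, but only in lanes whose mask bit is set."""
    assert len(addresses) == len(values) == len(mask) == VPU_WIDTH
    for addr, value, enabled in zip(addresses, values, mask):
        if enabled:
            memory[addr] = value


def predicated_select(mask: Sequence[bool], then_vals: Sequence[float],
                      else_vals: Sequence[float]) -> List[float]:
    """Emulate if-then-else: both branches are computed for all lanes,
    and the mask decides per lane which result is kept."""
    assert len(mask) == len(then_vals) == len(else_vals) == VPU_WIDTH
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]


if __name__ == "__main__":
    memory = [float(i) for i in range(-16, 16)]   # 32 values: -16.0 .. 15.0
    addresses = list(range(0, 32, 2))             # 16 strided addresses

    x = gather(memory, addresses)                 # one vector load
    mask = [v < 0 for v in x]                     # contents of the mask register
    then_branch = [0.0] * VPU_WIDTH               # y = 0      (computed for all lanes)
    else_branch = [2.0 * v for v in x]            # y = 2 * x  (computed for all lanes)
    y = predicated_select(mask, then_branch, else_branch)

    # Write back only the lanes that took the else branch.
    out = [-1.0] * len(memory)
    scatter(out, addresses, y, [not m for m in mask])
    print(y)
    print(out)
```

The point of the sketch is that a branch inside a loop does not prevent vectorization: all 16 lanes execute both sides of the condition, and the mask selects which result is kept or written, so there is no per-element control flow.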

