
Special Topics

by: Cassidy Effertz




About this Document

Class Notes


These Class Notes were uploaded by Cassidy Effertz on Monday, November 2, 2015. The Class Notes belong to ECE 4893 at Georgia Institute of Technology - Main Campus, taught by Staff in Fall. Since upload, they have received 16 views. For similar materials, see /class/233913/ece-4893-georgia-institute-of-technology-main-campus in ELECTRICAL AND COMPUTER ENGINEERING at Georgia Institute of Technology - Main Campus.




Date Created: 11/02/15
Larrabee: A Many-Core x86 Architecture for Visual Computing from Intel

Prof. Hsien-Hsin S. Lee
School of Electrical and Computer Engineering, Georgia Tech

Disclaimer: this material does not necessarily represent the official opinions of Intel, Nvidia, or Georgia Tech.

Vision, Ambition, and Design Goals

- Intel: "Software is the New Hardware"
- Intel: x86 ISA makes parallel programming easier
- Better flexibility and programmability
  - Supports subroutine calls and page faulting
- Mostly software rendering pipeline, except texture filtering
- Note the general goals for current-day GPGPU designers (and for Intel's Larrabee architects as well):
  - performance per mm2
  - performance per watt

The Larrabee Architecture

[Figure: block diagram showing x86 cores, coherent L2 subsets, and fixed-function logic]

- Lots of x86 cores (8 to 64)
- Fully coherent cache hierarchy

Conventional GPGPU Pipeline vs. Larrabee

[Figure: conventional GPGPU pipeline based on DirectX 10, compared with Larrabee's fully programmable pipeline]

x86 Core

- LRB's in-order core is the original Pentium (P54C, i.e., pre-MMX), plus:
  - 64-bit extensions
  - Larger L1 caches, a shared L2
  - 4-way multithreading
  - 16-wide VPU (Vector Processing Unit)
- Rumor has it that this is the thoroughly debugged P54C given back by the Pentagon, who got the original RTL from Intel to develop their radiation-hardened version (which I really doubt)
- Compatibility is the keyword

Single Larrabee Core

[Figure: block diagram of a single Larrabee core]

Dual-Issue Core

- Relies on the compiler to pair two instructions for the asymmetric pipes (same as P54C)
- Primary instruction pipe (U-pipe): all instructions
- Secondary, more restricted pipe (V-pipe): ld/st, simple ops, [illegible], cache manipulation instructions, vector st
- 1 GHz, 32 cores to reach 1 TeraFLOPS

Shared L2

- Divided L2: each core has a local L2 subset (256KB each)
- Enables parallel lookup among cores
- One core can access the others' subsets directly
- The entire L2 is coherent (no hassle like Cell DMA)
- The SIGGRAPH paper shows a 4MB L2, indicating 16 cores

Cache Control Instructions

- Each core can:
  - Access its local subset of L2 (256KB) fast
  - Access the other cores' L2 shares too
- Control for non-temporal streaming data (SSE)
  - Prefetch to L1 or L2 only
  - Mark a streaming cache line for early eviction
- Render target kept in L2 (e.g., FB, ZB, SB, etc.)

Ring Network

- Bi-directional ring network
- All cores, L2, and the block of FF (fixed-function) logic are attached to it
- 512 bits wide in each direction
- Simpler than a mesh, easy wire routing
- One clock cycle for each stop (a hop)
- The number of nodes between two parties determines the latency
- Worst case: halfway around the ring
- Ring latency is small compared to DRAM access
- With more than 16 cores, multiple hierarchical rings will be needed (think about KSR MPP)

4-Way MT

- Four x86 contexts to support 4 hardware threads
- One thread picked per clock
- MT is especially helpful:
  - When the compiler fails to schedule code without stalls
  - Upon L1 misses
  - Can hide long vector instruction latency
- Can switch threads on every clock

VPU (1/2)

- 16-wide integer and single-precision FP, 8-wide double-precision FP
- Ternary operands; one source can come from memory
- Free predication on every instruction
  - 16-bit predicate registers (one "enable" per lane)
- Gather/scatter instructions
  - Read/write 16 results to/from 16 different offsets
- 1/3 the area of the LRB core

VPU (2/2)

[Figure: VPU block diagram; legible labels: Mask Registers, Replicate, 16-wide Vector ALU]

Fixed Function Logic (1/3)

- Modern GPGPUs have the following done in HW: texture filtering, display processing, post-shader alpha blending, rasterization, interpolation, etc.
- LRB does all of it in SW except the Texture Sampler Units
  - Much faster than a software approach (12x to 40x)
  - Texture filtering still most commonly uses 8-bit operations
  - Efficiently selecting unaligned 2x2 quads requires specialized pipelined gather logic
  - Filtering on the VPU requires an impractical amount of RF bandwidth
  - On-the-fly texture decompression is drastically more efficient in dedicated hardware

Fixed Function Logic (2/3)

- Similar to typical GPU texture logic
- 32KB texture cache per core
- Supports all the usual operations
  - DX10 compressed texture formats
  - Mipmapping
  - Anisotropic filtering

Fixed Function Logic (3/3)

- Cores pass commands to the texture units through the L2 and receive results the same way
- Virtual-to-physical page translation
  - Reports any page misses to the core
  - Retries the texture filter command after the page is in memory
- LRB can still perform texture operations on the cores if the performance is fast enough in software

Simulation Data (from SIGGRAPH paper)

[Figure: scaled performance vs. number of Larrabee units (1 GHz cores). Left panel: "Scalable Performance for 3D games" (F.E.A.R., Half-Life 2 Episode 2, Gears of War). Right panel: "Scalable Performance for 3D game Physics". Source: SIGGRAPH'08]

Simulation Data (from SIGGRAPH paper)

[Figure: scaling vs. number of Larrabee units (1 GHz cores). Panels: "Scalable RT (ray tracing)" and "Non-graphics apps & kernels"; series labels are largely illegible in the extraction. Source: SIGGRAPH'08]

Simulation Data (from SIGGRAPH paper)

[Figure: number of LRB units needed for 60fps in F.E.A.R., Gears of War, and Half-Life 2 Episode 2. The X axis is the 25 tested frames. Source: SIGGRAPH'08]

Profile Breakdown for Title Games

[Figure: stacked percentage breakdown for F.E.A.R., Gears of War, and Half-Life 2 Episode 2. Legend: Alpha Blend, Pixel Shade, Pixel Setup, Depth Test, Rasterization, Vertex Shade, Pre-Vertex]

- Modern games: 70% pixel setup/shading, 10% depth, 10% rasterization, 10% vertex shading

Source: Tom Forsyth, Intel, SIGGRAPH'08

View from Nvidia

Source: "A viewpoint from NVIDIA" (PDF hosted under www.pcper.com/images/news/; the exact file name is garbled in the extraction). I don't know who actually wrote this article.

- HPC developers said:
  - "Easier parallel computing on x86 multicore" has not proven true
  - Applications struggle to scale from 2 to 4 cores
  - Why are people not using quad cores with 4-wide SIMD?
  - We'd like to know what has changed in Larrabee
- Questions from Nvidia:
  - Will apps written for today's Intel CPUs run unmodified on Larrabee?
  - Will apps written for Larrabee run unmodified on today's Intel multi-core CPUs?
  - The SIMD part of Larrabee is different from Intel's CPUs, so won't that create compatibility problems?
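
Two numbers quoted in the slides above can be checked with a short sketch: the ring latency rule (one clock per hop, worst case halfway around the ring) and the "1 GHz, 32 cores to reach 1 TeraFLOPS" figure. This is an illustration and not Intel's model. The function names are hypothetical, the ring is assumed to have one stop per agent, and the peak figure assumes each of the 16 single-precision lanes retires one multiply-add per clock, counted as two floating-point operations. That last assumption is not stated in the slides.

```python
def ring_hops(src: int, dst: int, num_stops: int) -> int:
    """Hops between two stops on a bi-directional ring.

    Assumes traffic takes the shorter of the two directions and that each
    hop costs one clock cycle, so the result is also the latency in clocks.
    """
    if num_stops <= 0:
        raise ValueError("num_stops must be positive")
    clockwise = (dst - src) % num_stops
    return min(clockwise, num_stops - clockwise)


def worst_case_hops(num_stops: int) -> int:
    """Worst case on a bi-directional ring: halfway around."""
    return num_stops // 2


def peak_sp_flops(cores: int, clock_hz: int, lanes: int = 16,
                  flops_per_lane_per_clock: int = 2) -> int:
    """Peak single-precision FLOPS.

    flops_per_lane_per_clock=2 is an assumption (a multiply-add counted
    as two operations); lanes=16 comes from the 16-wide VPU in the slides.
    """
    return cores * clock_hz * lanes * flops_per_lane_per_clock


if __name__ == "__main__":
    print(ring_hops(1, 15, 16))        # shorter direction wraps around
    print(worst_case_hops(16))         # halfway around a 16-stop ring
    print(peak_sp_flops(32, 10**9))    # 32 cores at 1 GHz
```

Under these assumptions, 32 cores at 1 GHz give 1.024 x 10^12 FLOPS, which is consistent with the 1 TeraFLOPS figure on the Dual-Issue Core slide. Without the multiply-add assumption the same configuration would reach only half of that.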

