SPECIAL TOPICS CDA 6938
University of Central Florida
These 14 pages of class notes were uploaded by Genoveva Bogisich on Thursday, October 22, 2015. The notes belong to CDA 6938 at University of Central Florida, taught by Huiyang Zhou in Fall.
Date Created: 10/22/15
CDA 6938: Multi-Core/Many-Core Architectures and Programming
htt cscsucfeducoursesCDA6938
Prof. Huiyang Zhou
School of Electrical Engineering and Computer Science
University of Central Florida

Outline
• Administration
• Motivation
  - Why multi-core/many-core processors
  - Why GPGPU
  - GPU vs CPU
• The brief history of GPGPU
• An overview of AMD/ATI streaming processors and the software development toolset (Brook+ and CAL)
• An overview of Nvidia G80 and CUDA

Description (Syllabus)
• High performance computing on multi-core/many-core architectures
• Focus
  - Data-level parallelism, thread-level parallelism
  - How to express them in various programming models
  - Architectural features with high impact on the performance
• Prerequisite
  - CDA5106: Advanced Computer Architecture I
  - C programming

Description (cont.)
• Textbook
  - No required textbooks; four optional ones
  - Papers & notes
• Tentative grading policy
  - +/- policy will be used
  - Homework: 25%
  - Participation in discussion: 10%
  - Project: 65% (including two in-class presentations)
  - A: 90~100, B+: 85~90, B: 80~85, B-: 75~80

Who am I
• Assistant Professor at School of EECS, UCF
• My research area: computer architecture, back-end compiler, embedded systems
  - High performance, power/energy efficient, fault tolerant (e.g., GPGPU)
  - Architectural support for software debugging
  - Architectural support for information security

Topics
• AMD/ATI GPU architectures and the programming model for GPGPU (Brook+ and CAL); several guest lectures from AMD
• NVidia GPU architectures and the programming model for GPGPU (CUDA)
• IBM Cell BE architecture and the programming model for GPGPU
• CPU/GPU tradeoffs
• Data-level parallelism and the associated programming patterns
• Thread-level parallelism and the associated programming patterns
• Future multi-core/many-core architectures
• Future programming support for multi-core/many-core processors

Assignments
• Homework
  - #0: "Hello world" using emulators (running on CPU) of GPUs
  - Programming assignments: 3 sets
• Projects
  - Select one processor model from NVidia G80, ATI
    streaming processors, and IBM Cell processors
  - Select (or find your own) an application
  - Try to improve the performance using the GPU that you selected
  - Cross-platform comparison

Experiments
• Lab: HEC 238 (PS3) and HEC 242 (computers with ATI/NVidia graphics cards)
• Get the access to the lab and Q&A: Yi Yang, yangyigmailcom
• Schedule the time

Acknowledgement
Some material, including lecture notes, is based on the lecture notes of the following courses:
• Programming Massively Parallel Processors (UIUC)
• Multicore Programming Primer: Learn and Compete for the PS3 Cell Processors (MIT)
• Multicore and GPU Programming for Video Games (GaTech)

Computer Science at a Crossroads (D. Patterson)
• Old CW: uniprocessor performance 2X / 1.5 yrs
• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
  - Uniprocessor performance now 2X / 5 yrs
  ⇒ Sea change in chip design: multiple cores (2X processors per chip / 2 years)
  - More, simpler processors are more power efficient

The Free (Performance) Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
"The biggest sea change in software development since the OO revolution is knocking at the door, and its name is Concurrency." (Herb Sutter)

Problems with Sea Change
• Algorithms, programming languages, compilers, operating systems, architectures, libraries, ... not ready to supply thread-level parallelism or data-level parallelism for 1000 CPUs/chip
• Architectures not ready for 1000 CPUs/chip
  - Unlike instruction-level parallelism, this cannot be solved by computer architects and compiler writers alone, but it also cannot be solved without the participation of computer architects
• Modern GPUs run hundreds or thousands of threads per chip
• Shifts from instruction-level parallelism to thread-level parallelism / data-level parallelism
  - GPGPU is one such example

GPU at a Glance
• Designed for graphics applications
  - Trend: converging the different functions into a programmable model
• To suit graphics applications
  - High bandwidth: [...] GB/s GPU vs 8.4 GB/s CPU (last spring); 115.2 GB/s ATI HD 4870; 141.7 GB/s GTX 280
  - High FP processing
    power: 400~500 GFLOPS GPU vs 30~40 GFLOPS CPU (last spring); 1.2 TFLOPS ATI HD 4870; 933 GFLOPS GTX 280
• Can we utilize the processing power to perform computing besides graphics?
  - GPGPU

GPU: G80 die photo (90 nm tech)

IBM Power 6
• Outstanding feature: 4.7 GHz, 2 cores with symmetric [...]

Inside the CPU core (CDA5106)
• Power 5 die

GPU die: GTX 280 (65 nm)
[Die photo labels: processor cores, texture]

NVidia G80
• Some outstanding features:
  - 16 highly threaded SMs, >128 FPUs
  - Shared memory per SM: 16 KB
  - Constant memory: 64 KB
[Block diagram labels: Host, Input Assembler, Parallel Data Cache]

GPU vs CPU
The GPU is specialized for compute-intensive, highly data-parallel computation (exactly what graphics rendering is about), so more transistors can be devoted to data processing rather than data caching and flow control.
[Figure labels: Control, Cache]

GPU vs CPU
• CPU: all this on-chip real estate is used to achieve performance improvement transparent to software developers
  - Sequential programming model
  - Moving towards multi-core and many-core
• GPU: more on-chip resources used for floating-point computation
  - Requires a data-parallel programming model
  - Exposes architecture features to software developers, and software needs to explicitly take advantage of those features to achieve performance

[Figures from Appendix A, Copyright Elsevier, Inc., all rights reserved: system diagram with front side bus; graphics pipeline stages (input assembler, shaders, rasterizer, raster operations / output merger); GeForce 8800 processors with shared memory]

Things to know for a GPU processor
• Thread execution model
  - How the threads are executed, how to synchronize threads
  - How the instructions in each/
    multiple threads are executed
• Memory model
  - How the memory is organized
  - Speed and size considerations for different types of memories
  - Shared or private memory; if shared, how to ensure the memory ordering
• Control flow handling
• Instruction Set Architecture
• Support
  - Programming environment
  - Compiler, debugger, emulator, etc.

HW and SW support for GPGPU
• Nvidia GeForce 8800 GTX vs GeForce 7800
  - Slides from the Nvidia talk given at Stanford Univ.
• Programming models
  - PeakStream
  - RapidMind
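
The questions listed under "Things to know for a GPU processor" (how threads execute and synchronize, and which memory is shared) can be made concrete with a small CPU-side emulation, in the spirit of the "Hello world using emulators" homework. The sketch below is an illustration only, not code from the course or from any GPU toolkit: the names `BLOCK_SIZE`, `block_reduce`, and `parallel_sum` are hypothetical, the per-block list stands in for per-SM shared memory, and each barrier is emulated by finishing one loop over the threads before starting the next.

```python
"""CPU emulation of a data-parallel sum, one thread block at a time.

Assumption: every name here is hypothetical and chosen for illustration;
this mimics the block / thread / shared-memory structure, not a real GPU API.
"""

BLOCK_SIZE = 8  # hypothetical threads per block; must be a power of two


def block_reduce(data, block_id, block_size=BLOCK_SIZE):
    """Sum one block's slice of `data` the way a thread block might."""
    # Per-block "shared memory": visible to the threads of this block only.
    shared = [0.0] * block_size

    # Phase 1: each thread loads one element (zero-padded past the end).
    for tid in range(block_size):
        gid = block_id * block_size + tid  # global index of this thread
        shared[tid] = data[gid] if gid < len(data) else 0.0
    # End of loop acts as a barrier: all loads finish before any read below.

    # Phase 2: tree reduction; half of the active threads retire each step.
    stride = block_size // 2
    while stride > 0:
        for tid in range(stride):  # only threads with tid < stride are active
            shared[tid] += shared[tid + stride]
        # End of loop acts as a barrier before the next, narrower step.
        stride //= 2

    return shared[0]  # thread 0 holds the block's partial sum


def parallel_sum(data, block_size=BLOCK_SIZE):
    """Cover `data` with enough blocks, then combine the partial sums."""
    num_blocks = (len(data) + block_size - 1) // block_size
    partial = [block_reduce(data, b, block_size) for b in range(num_blocks)]
    return sum(partial)
```

For example, `parallel_sum([1.0] * 20)` covers the input with three blocks of eight threads (the last one zero-padded) and returns `20.0`. Running the threads one after another is safe here because, within a reduction step, active threads write only indices below `stride` and read only their own slot plus one at or above `stride`; on real hardware that ordering would have to be enforced by an explicit barrier, which is the "how to ensure the memory ordering" question from the slide.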