Class Note for EECS 700 with Professor Kulkarni at KU
These 16 pages of class notes were uploaded on Friday, February 6, 2015, for a course at Kansas.
Date Created: 02/06/15
Cell GC: Using the Cell Synergistic Processor as a Garbage Collection Coprocessor
Chen-Yong Cher, Michael Gschwind
IBM T. J. Watson Research Center

Presented by Arturo Ramos
The University of Kansas
EECS 700 Virtual Machines, Spring 2009

Introduction
But first: what is the Cell Broadband Engine processor (Cell BE)?
- Heterogeneous multiprocessor found in supercomputers, servers, and the PlayStation 3
- Designed to give a programmer more control over the internal workings of the hardware
- Designed for number crunching: graphics, parallel processing of large data sets
- What about programs not designed for parallelism?

Cell Memory Architecture: The Difference Between Cell and Conventional CPUs
- A conventional CPU has multiple levels of cache, which help hide latency when running random-access applications
  - Great for single-processor systems
  - Bad for multiprocessor systems (NUMA ring a bell?)
  - Coherence problems
- Cell's innovation
  - Only the PPE has cache
  - The 8 SPEs have local stores

Local Stores
- A local store is a large area of low-latency memory used to store both data and instructions
- The programmer tells the SPE how to use its local store, unlike a cache, which is managed directly by hardware
- Effectively, it is both a cache and a data area for the SPE
- PPE: 32KB instruction cache, 32KB data cache, 512KB L2 cache
- SPE: 256KB local store
- Normally a local store holds the SPE's program and a large data set for it to work on
- What if we want to use it for something other than large data sets?

Running Java on the Cell BE
- In previous experiments, the Cell processor was successfully tested for executing Java applications
- Problem: many system functions run only on the PPE
  - Type resolution
  - Garbage collection
- This means that the entire CPU stalls when the garbage collection routine is called
- Solution: offload the garbage collection routine to an SPE so that the PPE can continue to execute the Java application

The Challenge
Why is using an SPE to do garbage collection such a big deal?
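One way to see why is to model how an SPE reaches main memory. The sketch below is a toy Python model, not Cell SDK code: `ToySPE`, `dma_get`, `sum_array`, `chase_pointers`, and `build_scattered_list` are hypothetical names, and the 16-word block size is an arbitrary assumption. It only counts explicit transfers for two access patterns, streaming a contiguous array and following pointers scattered through memory.

```python
# Toy model (an assumption for illustration, not Cell SDK code): main memory is
# a Python list, and the SPE may only read it through explicit block transfers.
class ToySPE:
    def __init__(self, main_memory, block_words=16):
        self.main_memory = main_memory
        self.block_words = block_words   # words moved per transfer (assumed)
        self.transfers = 0               # number of explicit transfers issued

    def dma_get(self, start):
        """Copy one block from main memory into the local store and return it."""
        self.transfers += 1
        return self.main_memory[start:start + self.block_words]


def sum_array(spe, start, length):
    """Sequential access: stream a contiguous array block by block."""
    total = 0
    end = start + length
    for offset in range(start, end, spe.block_words):
        block = spe.dma_get(offset)
        total += sum(block[:min(spe.block_words, end - offset)])
    return total


def chase_pointers(spe, head):
    """Random access: follow 'next' indices; every hop needs its own transfer."""
    hops = 0
    node = head
    while node != -1:
        node = spe.dma_get(node)[0]   # only the first word of the block is used
        hops += 1
    return hops


def build_scattered_list(size, count, stride):
    """Link `count` nodes placed `stride` words apart (wrapping); -1 ends the list.

    `stride` and `size` should be coprime so that node positions do not collide.
    """
    memory = [-1] * size
    positions = [(k * stride) % size for k in range(count)]
    for here, nxt in zip(positions, positions[1:]):
        memory[here] = nxt
    return memory
```

In this model, summing 64 contiguous words costs one transfer per 16-word block, while walking a 64-node list built with `build_scattered_list(256, 64, 37)` costs one transfer per node, with most of each transferred block going unused.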
Garbage collection is a local store's worst enemy.
- Local stores are meant to provide fast access to contiguous data so that an SPE can process the data quickly (sequential access)
- Garbage collection dereferences pointers across a large memory space in order to find objects no longer referenced (random access)

The Garbage Collection Algorithm
- Boehm-Demers-Weiser (BDW) mark-lazy-sweep collector
- A form of the mark-and-sweep GC algorithm
- Used in many different languages and platforms
- Due to its popularity, it is well tested and tuned
- Marks reachable objects, using a mark stack to hold objects that still need to be scanned
- Sweeps away unmarked objects lazily, only when the space is needed by a new allocation

The Testing System
- Cell Blade Server with 2 Cell CPUs (2 PPEs, 16 SPEs)
- 3.2 GHz
- 1 GB RAM
- Cell BE Linux 2.6.20-CBE
- GNU Compiler for Java (gcj)
- Benchmarks: SPECjvm98, JOlden

The Implementation
- The Cell PPE runs a Java application
- One SPE is used to run the BDW GC marking routine only (the "mark phase")
- The Cell PPE still handles much of the GC routine that interacts directly with the JVM
- When the SPE begins to run, the PPE is not allowed to change anything in the heap until the garbage collector has finished ("stop the world")
- When the SPE is done with its workload, it resynchronizes with the PPE, passing it a descriptor to its mark stack

The Implementation (continued)
- The simplest way to implement the mark phase: make the SPE transfer each reference to its local store, one at a time, every time it needs to check a reference (baseline)
- We need some method to cache references so that we do not have to load everything into the local store one reference at a time
- Two solutions:
  - Operand buffers
  - Software caching

Operand Buffers
- A method of retrieving an entire block of pointers rather than a single pointer
- Example: an array of pointers
  - Load a block of pointers from ptr vs. load ptr[i]
- When the SPU receives a reference to check, it retrieves the entire block rather than just the reference

Software Caches
- An operand buffer is only good for contiguous data
- Luckily, the Cell SDK distributed by IBM comes with a Software Cache component
- Emulates a hardware cache through software
- Con: large performance cost, because every cache lookup runs in software
- Pros:
  - The software cache can be configured to any size
  - The replacement strategy can be configured for the data structure being used

Results: Garbage Collection Caching
- Performance is enhanced with any form of caching
- 400% to 600% faster with a hybrid software cache and operand buffer method
[Figure: speedup in GC marking time; series: operand buffer, 128KB SW cache, SW cache + operand buffer]

Results: Garbage Collection on SPE vs. PPE
- Even with caching methods implemented and up to 600% speedup, the SPE still cannot perform as well as the PPE
- The SPE is, however, almost as good in most cases
[Figure: normalized mark performance, SPE versus PPE; series: PPE, SPE (SW cache + operand buffer); benchmarks shown include compress and db]

Conclusion
- Running garbage collection on coprocessors that use a local-memory-based hierarchy can be done
- This allows the main processor to be used for the important user applications while the coprocessor does the mundane garbage collection task
- Since local stores are explicitly managed, programmers can implement their own caching techniques
  - 400% to 600% performance improvement over no caching at all

Questions
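The mark phase and the three ways of fetching references described in the slides (baseline, operand buffer, software cache) can be sketched as follows. This is a minimal Python simulation under stated assumptions, not the authors' implementation and not the Cell SDK API: the object layout (`[count, ref0, ref1, ...]`), the class names `Baseline`, `OperandBuffer`, and `SoftwareCache`, the helper `layout`, the 8-word block and line sizes, and the LRU replacement policy are all hypothetical choices made for illustration. It counts transfers from main memory only and does not model the hybrid scheme or real timings.

```python
from collections import OrderedDict


class Baseline:
    """No caching: one transfer for every word read from main memory."""
    def __init__(self, memory):
        self.memory, self.transfers = memory, 0

    def read_object(self, addr):
        count = self._word(addr)                       # header: number of refs
        return [self._word(addr + 1 + i) for i in range(count)]

    def _word(self, addr):
        self.transfers += 1
        return self.memory[addr]


class OperandBuffer(Baseline):
    """Fetch an object's header and references as contiguous blocks."""
    def __init__(self, memory, block_words=8):
        super().__init__(memory)
        self.block_words = block_words

    def read_object(self, addr):
        words = self._block(addr)
        count = words[0]
        while len(words) < 1 + count:                  # object spans more blocks
            words += self._block(addr + len(words))
        return words[1:1 + count]

    def _block(self, addr):
        self.transfers += 1
        return self.memory[addr:addr + self.block_words]


class SoftwareCache(Baseline):
    """Word reads go through a small LRU cache of fixed-size lines."""
    def __init__(self, memory, line_words=8, lines=4):
        super().__init__(memory)
        self.line_words, self.lines = line_words, lines
        self.cache = OrderedDict()                     # line base -> words

    def _word(self, addr):
        base = addr - addr % self.line_words
        if base in self.cache:
            self.cache.move_to_end(base)               # hit: refresh LRU order
        else:
            self.transfers += 1                        # miss: fetch whole line
            self.cache[base] = self.memory[base:base + self.line_words]
            if len(self.cache) > self.lines:
                self.cache.popitem(last=False)         # evict least recently used
        return self.cache[base][addr - base]


def layout(objects):
    """Pack objects (lists of child indices) into memory as [count, refs...]."""
    addresses, next_free = [], 0
    for refs in objects:
        addresses.append(next_free)
        next_free += 1 + len(refs)
    memory = []
    for refs in objects:
        memory.append(len(refs))
        memory.extend(addresses[i] for i in refs)
    return memory, addresses


def mark(roots, reader):
    """Mark every object reachable from the roots, using an explicit mark stack."""
    marked, stack = set(roots), list(roots)
    while stack:
        addr = stack.pop()
        for child in reader.read_object(addr):
            if child not in marked:
                marked.add(child)
                stack.append(child)
    return marked
```

For example, with `memory, addresses = layout([[1, 2, 3], [4], [4], [], [], [0]])`, calling `mark([addresses[0]], reader)` with each of the three readers produces the same marked set and leaves the last object unmarked. The readers differ only in their `transfers` counter: the baseline issues one transfer per word, while the operand buffer and the software cache issue fewer on this small heap.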