Popular in Course
verified elite notetaker
Popular in ComputerScienence
This 8 page Class Notes was uploaded by Jonas Bartell on Monday October 26, 2015. The Class Notes belongs to CS2001 at University of Pittsburgh taught by Staff in Fall. Since its upload, it has received 40 views. For similar materials see /class/229394/cs2001-university-of-pittsburgh in ComputerScienence at University of Pittsburgh.
Reviews for RSRCHTOPICSCOMPUTRSCIENCE
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 10/26/15
Continuous Compilation A New Approach to Aggressive and Adaptive Code Transformation i r 7 Bruce Chitiers quot Universilyan illsDurgh me Pittsourgh Pennsyyania USA l r chiderscsoit u http www cs pitt educoco In a t This Workis a cotaporation ofmany eopel Sponsored 2 Ne uh gt4 Generation systems Nationai Science Foundation Code Optimization Sophisticated aigorithms existfor many optimizations that do quite Well We are at the point or diminishing returns in applying optimizations 7 small gains are considered good he challenge is to go beyond current optimization improvements Continuous Compilation A new approach Apply optimizations ooth staticaiiy at compiietime and dynamically at rune time With static planning Plan ror both static and dynamic optimizations o unuorstanuirto ti nsulzxiswvu optimizations o Emcacv oipoth static aha dvnamic optimizations Determine what optimizations to apply where to applythem the order in which to applythem and their parameters Outline introduction Continuous Compilation CoCo Framework Proritdriyen Optimization system 7 Sullware oynamit translation 7 Pvugvam in rumenlaliun and Unlimlzaliun u mary Continuous Compilation Sui cmvimm 39xilg vmmu ammo mo mm N m ram Hummus anomw mm p as roammm mum Target applications tong running piogiamstnat naye diiieient phases oiexecution CoCo Framework ppm omit Page 1 CoCo s Preliminary Prediction Framework Predict the impact of optimizations With estimates ofbene t an p D ut what optimizations to apply Taking into account the code context and machine resources Challenges Performance varies widely based on r o e c ritext e g iooptrip count 7 Configuration of optimizations e g loop unroiiingfactor 7 Machine configurations e g cacne configuration 7 The order of optimizations Many resources impact overall performance nfiguration r instruction scneduiing rules 7 Register numbers and types i 0 g a o Framework for Predicting Optimizations Plug n Play Models Ext 1 N o a 3 mm 3 2 g E to machine resources O a Framework consisting of models for Predicting the impact of Optimizations Consider both loop and scalar optimizations eg PRE ract code context Page 2 Motivation Performance problems of the optimizations r Optimizations may degrade performance in som circumstances 7 Optimizations nteract Wl i tn one anotner by enaoiing and disaoiing otner optimizations Optimizations o en applied in an ad hoc fashion 7 Simple neuristic always apply if applicable 7 Predetermined order to apply optimizations 7 Fixed configurations of optimizations Our Approach Build and develop analytic modelsto predict when to apply an optimization without actually applying the optimization 7 Need models of particular optimizations 7 Need models of tne code 7 Need models of tne resources tnat are effected Based on analytic mode what optimizations to ap don t need accurate modeisi accurate enougn to do tne estima ls make decisions about ply uSt the trend needs to be tes Loop Code Model ub loop SprRZfS 1 RefxAgtltIC Express code characteristics that effect resources For loops and cache 1 loop header 2 array references 3 reference sequence Loop Optimization Model Eg Interchange v flt gltltR gltR vii 6 Wm ho lots 1AAIlt gtA Model how optimization transforms code model For loops and cache sequence offunctions that effect aspects of the loops representation code model Loop Resource Model Cache i croup rererences by temporai or spatiai reuse and compute conriicts 2 Pick representative rer rrom eacn group 3 Compute misses ror eacn representative by reuse and con icts Simplified Ghosh model How the code model effects machine resources For loops and cache with code model how the array reference pattern effects both cache misses and hits Loop Optimization Prediction Prediction of the between unoptimizeo and optimized Compare transformed amp nontransformed code For loops and cache Before amp a er loop optimization and whether optimization had reductionincrease in cache misses Experiments Benchmarks MediaBench DSPStone and others Machine model 7 SimpleScalar microarcnitecture simuiator e inrorder singie iSSue nonrblocking cacne MB cache direct mapped and 32B iine size 7 Simiiar to ARM s 94x iBivi Power PC 405 Usefulness of FPO for loop optimizations Prediction accuracy of FPO for loop optimizations Applications of FPO optimizations Ad hoc Always Applying 50 Ellmerchange lTIImg DReversal 25 0 25 50 3 Naif Page 3 Selectively Applying Loop Optimizations 50 uinterehange ITiling DRevErsal 25 n N N eN N N N N 3N amp 4 s 96 W 0 5 6 BE 3 h 5 5904 p e g 63 s f Age is Loop Opt Prediction Accuracy Accurate trend 7 Whether an optimization is beneficial ornot Correct prediction 7 When prediction matches actual execution behavior Prediction accuracy 7 Single loop nest varyingtrlp count 7 Multiple loop nests the number of loop nests FPO Scalar Optimizations Transformations that operate on scalar code 7 g constant propogation dead code eiimination partiai re undancy elimination Can have several impacts 7 Reduce amount of computation 7 Change register pressure tor the better ortor the Worsel 7 May change memory reterencing pattern and cache benaVlor FPO initially considers e Attect on computation 7 How register pressure heips or hurts spills and reioads Page 4 Choose Most Beneficial Loop Optimization irkernel 33 grde ma era piqued a was lms 923M llmevmange uriiing EIReversal uunmiiing IFlenn IDis1ril2ulinn DNntApPlv Loop Opt Prediction Accuracy Prediction accuracy for single loop nest Similar results for multiple loop nests Impact of PRE RE 331 Model 1 ltIris exp USE 3d 5dgt In EF 3d Sdgt 2 we EXD USE 35 ssgt ltns 3 USE 35 55 3 lt33 n DEF 33 s3gt ltIris VDEF 33 s3gt ltIris n DEF 33 s3 11gt ltIris v usE 33 5311gt Resuurte Model regs 331 Model or register zllutahm rr3mewurk rm Pradl l a npnmiunons quotmediumquot Enginequot x 01 load 3 stores mue3sed ur deue3sed Using a Heuristic to Make D Approaches for PRE ecIsIons Hmim39c39d vm PRE Hw s c39d vm LCIM m I aesrneunsn 11 Heunsnea REM H H H H H H H H o 4 8 15 o 4 8 15 i a guy 3 50 3 75 3 78 410 2 90 3 29 5 40 3 27 E E vpr 122 0 75 181 183 70 40 70 38 0 52 0 69 E mcf 237 235 231 222 250 262 258 247 24 parser 125 150 170 135 2 55 286 199 2 23 g vnnzx 4 73 525 4 66 3 86 4 88 569 4 99 5 28 E 2 1111112 7 35 7 52 819 7 91 7 02 735 6 70 4 57 twnlf 1 07 0 88 114 0 02 0 52 0 38 214 191 W W m pm mm mm W WNW Scalar Opt Prediction Accuracy CoCo Framework Static Jamale Dnamc comer CoCo RunTime System 0000 RunTime System Based on Software Dynamic Strata SDT developed at UVA amp Pitt Translation n n r i 3 amicIranslator Layerofso warebetweenapplication S PU binary and the 0 IC time op Imlzatio 7 Memery management Provides basic functionality for run n 05 1 Application39s instructions are Caching examined and modi d before being 7 c e anal 1 Overheadredu 1 1e executed on the CPU Uses include binary translation dynamic optimization amp others o 7 Program Instrumentation Includes target Interf ce 7 Handles all interactiuns between VM applicatlun binary and target D sCPU 39 VSPARt RnIari PieLinux y Page 5 P Translation and Execution 1 F tc e o l 3 m nate fragment 0 transfer and execute fragment Software Dynamic Translation ts next instruction Strata Software Dynamic Translation Awllcillm mer Ovuanng swan CPU slowdown Strata Performance D MIPS I SPARC D x86 D DynamoRIO No optimizations appEda basic runrtlrne system oyemead Approach Instrumentation Optimization Costs associated instrumentation Transform instrumentation to reduce costs Applied for static or dynamic instrumentation 7 Probe count Number of Payload partial lnllnlng Reduce pro probes executed 7 Probe cost intercept program execution e cost be costfor not code Page 6 Overhead Reduction in Strata Improve runtime performance of translated code without applying code optimizations Ef cient execution in fragment cache 7 Linking brancnes in tnetranslated cac e n 7 Handling indirect branches lnllne in the translated code Program instrum entation i n o E o m 3 8 g ristrurneritatloril rt rlse ed 7 Reduce the cost of an individual piece of instrumentation code Program Instrumentation CoCo monitors for runtime conditions 7 Wnen conditions are round Apply sorne optimization plan Program monitoring needs instrumentation a instrumentation code is inserted into translated binary a Gathers lnforrnatlon may inyollte some action 7 ventedrlven rnode Challenges 7 Dynamic instrumentation insert and remove at runetlrne a Portable mecnanisms l e i retargetablllty ow runtime overhead including tne insertion amp remoyal Example Cache Simulation Cache simulation can be slooooow Directexecution cache simulators 7 Popular amp compiled slrnulatlonr Snade Ernbra FasteSlrn a instrument eyery load and store a call data cacne simulator 7 instrument eyery basic blocka call instruction cacne simulator Used INSOP and Strata to build a very fast directexecution simulator Comparison of Cache Simulators Average improvement over StrataEmbra is 24x Related Work Effective optimization 7 Adaptiye optimizing compiiers Cooperozm ampWhalley04 e iteratiye compiiation KniirieriburgOS 7 Optimization space exploration Uriantaiyiiisosi e Anaiytic modeis iWoitgi sariltar 97 McKinleyQB Hu02 Software dynamic translation 7 DynamoRio Dynamo Moio yuicanvvaiiltaoout DELl Cache simulation 7 snade Emora FastrCache FaStSim Summary A new planningbased approach to compilation called Continuous Compilation CoCo Apply whole suite of optimizations with constant re nement of optimizations and plans for them Results 7 Highly accurate predictionstor simpie loop optimizations 7 Highly accurate predictions for scalar optimization 7 Low overhead runtime system based on SDT e lNSOP reduces instrumentation cost cacne simuiation Collaborators Many students have participated including FPO Min Zhao Pitt Program instrumentation Naveen Kumar Pitt Strata Kevin Scott UVAGoogle and Naveen Kumar Overhead reduction Kevin Scott Naveen Kumar Jason Mars Pitt Other Faculty Mary Lou Soffa Jack Davidson UVA Sponsored by the National Science Foundation Next Generation Systems 20022003 and 20032006 Current Areas of Focus Continuous compilation So ware dynamictranslation 7 Low oyernead dynamic transiation 7 New appiications to arcnitecture simuiation security Debugging ofdynamically translated code amicaiiy optimize c iltin e Dynamicaiiy compiied simuiation So error detection amp recovery based on SDT Poweraware memory systems Current Projects Debugging dyn translatedoptimized code N Kumar Ondem instrumentatlo and structural software testing with dynamic 39 n J Misurda and J Clause Staticdynamic optimization planning S Zhou Optimization checking Y Huang I A puwe N Memory systems for cognitive processing Reuse through Speculation on Traces M Pilla from UFRG 8 Page 7 Selected Past Projects Instruction code compressiondecompression Ondemand code downloading for Smartcards Program profiling primitives and pro ling language Software based value reuse on traces Power onShut down of superscalar functional units Memory bus reordering for power reduction Processordriven DVS based on IPCpeak demands Data widthsen tive VLIWs amp scheduling Applicationspecific processors automatic design and target architectures Let s Talk 6409 Sennott Square Of ce hours MW 13 PM Or by appointment Send e mail childerscspittedu See selected papers online httplwwwcspitteduchilders Page 8 082002 Projects Debugging for code security with SDT Dynamically compiled amp sampled architecture simulation Pro tdriven optimization for other constraints g p e code size Selfchecking programs for so errors eg memory bit ips Domain speci c languages for structural testing and au omatic planning Recon gurable custom memory systems And many others or your ideas
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'