Class Note for EECS 700 with Professor Kulkarni at KU
These 89 pages of class notes were uploaded by a notetaker on Friday, February 6, 2015. The notes belong to a course at the University of Kansas taught in the Fall. Since upload, they have received 17 views.
Adaptive Optimization in the Jalapeño JVM
By Matthew Arnold, Stephen Fink, David Grove, Michael Hind, and Peter F. Sweeney
Presentation by Michael Jantz, The University of Kansas

Introduction
The dynamic nature of the Java programming language presents both a large challenge and a great opportunity for high-performance Java implementations. Features such as dynamic class loading and reflection prevent straightforward application of traditional static compilation and interprocedural optimization. As a result, much effort has been invested in developing dynamic compilers for Java.

Dynamic Compilation
Because dynamic compilation occurs during application execution, dynamic compilers must carefully balance optimization effectiveness against compilation overhead to maximize system performance. However, dynamic compilers can also exploit runtime information to perform optimizations beyond the scope of a purely static compilation model.

A Little History
Early VMs employed just-in-time (JIT) compilers that relied on simple static strategies to choose compilation targets, typically compiling with a fixed set of optimizations (Smalltalk-80, Self-91). Later, more sophisticated VMs began dynamically selecting a subset of all executed methods for optimization, focusing optimization effort on program hot spots (Self-93, HotSpot, early Jalapeño).

Online Feedback-Directed Optimizations
Research has explored more aggressive forms of dynamic compilation, using runtime information to tailor the executed code to its current environment. Most of these systems are not fully automatic, and most of their techniques have not appeared in mainstream JVMs. However, this research has demonstrated that online feedback-directed optimizations can yield substantial performance improvements.

Jalapeño's AOS
Jalapeño is a research JVM developed at IBM, primarily targeted at server applications. Since the time of this writing, it has become open source and is now known as the Jikes RVM (Research Virtual Machine); it is widely used in research. This presentation gives an overview of the Jalapeño Adaptive Optimization System (AOS), a key component of the Jalapeño VM, as it was at the time of this writing. Later, we will cover in greater detail the controller model employed by this system.

Outline
Background; AOS Architecture; Current Controller Model; Experimental Results; Conclusion

Background
Jalapeño itself is written in Java. The techniques described here therefore apply not only to application code but to the JVM itself: we may apply adaptive optimization to the JVM subsystems, i.e., the compilers, the thread scheduler, the garbage collector, and the AOS itself. Drawback: to avoid having to run Jalapeño on another JVM, users must build a boot image file. This file is a snapshot of a Jalapeño JVM written out to a file; Jalapeño uses this image to bootstrap itself into a running VM.

Compilation in Jalapeño
Jalapeño employs a compile-only strategy: all methods are compiled to native code before they execute. There are two compilers in Jalapeño. The baseline compiler translates bytecodes into native code by simulating Java's operand stack; it does not perform register allocation and performs only slightly better than bytecode interpretation. The optimizing compiler translates bytecodes into an IR upon which it performs a variety of optimizations; it uses an efficient linear-scan register allocator and may be applied at one of three levels, which successively apply more aggressive and more expensive optimizations.

Compilation Rates

Compiler      Bytecode bytes/ms
Baseline         274.14
Opt Level 0        8.77
Opt Level 1        3.59
Opt Level 2        2.07

Table 1: Average compilation rates on the SPECjvm98 benchmark suite.

Java Threads and Scheduling
Jalapeño multiplexes Java threads onto JVM virtual processors. Jalapeño implements a scheduler in user mode to schedule Java threads onto virtual processors.
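The compile-only strategy above can be illustrated with a toy sketch. This is not Jalapeño code; the class, names, and the use of a Python lambda as stand-in "native code" are all hypothetical, meant only to show that a method is compiled before its first execution rather than interpreted.

```python
# Toy sketch of a compile-only strategy (illustrative, not Jalapeno's code):
# every method is compiled to "native" code before its first execution;
# there is no interpreter fallback.

class Method:
    def __init__(self, name, body):
        self.name = name
        self.body = body          # stand-in for the method's bytecodes
        self.compiled = None      # no native code installed yet

    def invoke(self, *args):
        if self.compiled is None:
            # First invocation: compile on the spot, then run native code.
            self.compiled = self._baseline_compile()
        return self.compiled(*args)

    def _baseline_compile(self):
        # Stand-in for bytecode -> machine-code translation.
        return self.body

square = Method("square", lambda x: x * x)
assert square.compiled is None      # not compiled until first invocation
assert square.invoke(7) == 49       # first call triggers compilation
assert square.compiled is not None  # "native" code is now installed
```

A real system would dispatch through a lazy-compilation stub rather than an `if` check, but the effect (compile on first invocation, reuse thereafter) is the same.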
Virtual processors are essentially threads that are schedulable by the operating system. The system supports a quasi-preemptive thread scheduler. Each compiler generates yield points, which are program points where the running thread checks a dedicated bit in a machine control register to determine whether it should yield the virtual processor. A timer interrupt periodically sets this dedicated bit; when a running thread next reaches a yield point, the check of this bit results in a call to the scheduler. Yield points are currently inserted in method prologues and on loop back edges.

Compilation Scenarios
[Figure 1: Compilation scenarios in the Jalapeño JVM, showing executing code, lazy compilation stubs, unresolved references, the class loader, the compilers (baseline/opt), profiling data, the Adaptive Optimization System, and recompilation plans.]

Notice from Figure 1 that a Jalapeño compiler can be invoked in one of three ways. First, when executing code reaches an unresolved reference, causing a new class to be loaded, the class loader invokes a compiler to compile the class initializer. Second, when the executing code attempts to invoke a method that has not yet been compiled (methods are initialized to lazy compilation stubs when their class is loaded). Third, the AOS can invoke a compiler when profiling data suggests that recompiling a method with additional optimizations may be beneficial. The remainder of this talk will focus on this third scenario.

System Architecture
[Figure 2: Architecture of the Jalapeño Adaptive Optimization System. Instrumented and optimized code, hardware performance monitors, and VM instrumentation feed raw data to organizers in the runtime measurements subsystem; the controller consumes formatted data from the organizer event queue, consults the AOS database, and passes compilation plans through the compilation queue to the compilation threads.]

Major Subsystems
We will take each of these one at a time in more detail: the runtime measurements subsystem, the controller, the recompilation subsystem, and the AOS database.

Runtime Measurements Subsystem
The runtime measurements subsystem (RMS) gathers information about the executing methods, summarizes the information, and then either (a) passes this information to the controller via the organizer event queue or (b) records the information in the AOS database.

Runtime Measurements Subsystem (cont.)
Raw profiling data is gathered from several sources: instrumentation in executed application code, hardware performance monitors, and instrumentation in the VM itself. The controller directs the data monitoring and creates organizer threads to process the raw data at specific time intervals. Each organizer analyzes the raw data and either packages it into a form suitable for direct use by the controller, adds the information to the organizer event queue for the controller to process, or records the information for later queries by other AOS components.

The Current RMS
Sampling is performed using existing Jalapeño mechanisms: whenever the scheduler is entered because a timer interrupt has occurred and the executing thread has reached a yield point, instrumentation in the JVM records the current method before it switches to a new thread. Timer interrupts occur every 10 ms, resulting in roughly 100 samples/sec. Two organizers periodically process these samples. The hot-method organizer searches for methods with a percentage of samples above a certain threshold; the threshold varies from 0.25 to 1. The adaptive inlining organizer is used to guide inlining decisions; it searches for hot call edges, i.e., caller/callee method pairs with a high frequency of samples attributed to the entry of the callee. This threshold is initialized to 1 and is periodically reduced until it reaches 0.2. For both samplers, a decay mechanism is used to weight more recent behavior more heavily.
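The decayed-sample idea behind the hot-method organizer can be sketched in a few lines. The constants and function names below are hypothetical (the real thresholds and decay schedule are Jalapeño's own); the point is only that periodic decay makes recent samples count for more than old ones.

```python
# Sketch of a decaying sample counter, in the spirit of the hot-method
# organizer. DECAY and HOT_THRESHOLD are made-up illustrative constants.

DECAY = 0.95          # multiplier applied to accumulated samples each period
HOT_THRESHOLD = 0.01  # "hot" if above 1% of total weighted samples

samples = {}          # method name -> weighted sample count

def record_sample(method):
    """Called from the timer-driven yield-point instrumentation."""
    samples[method] = samples.get(method, 0.0) + 1.0

def decay_samples():
    """Run periodically so older behavior is weighted less heavily."""
    for m in samples:
        samples[m] *= DECAY

def hot_methods():
    total = sum(samples.values())
    return {m for m, w in samples.items() if w / total > HOT_THRESHOLD}

# 96 old samples for mainLoop, decayed once, then 4 fresh for helper:
for _ in range(96):
    record_sample("App.mainLoop")
decay_samples()
for _ in range(4):
    record_sample("Util.helper")

assert "App.mainLoop" in hot_methods()
assert "Util.helper" in hot_methods()   # 4 / 95.2 is about 4%, above 1%
```

Repeated decay periods would eventually let a method that stops executing fall back below the threshold, which is the behavior the slide describes.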
Controller
The controller orchestrates the other components of the AOS. Specifically, it coordinates the activities of the RMS and the recompilation subsystem. Based on information from the RMS, the controller performs a cost-benefit analysis (described later) to either instruct the RMS to continue or change its profiling strategy, or recompile one or more methods to improve their performance. It uses priority queues to communicate with the other subsystems: it extracts measurement events from a queue filled by the RMS (the organizer event queue), and it inserts recompilation decisions into a queue that the compilation threads process (the compilation queue).

System Architecture
[Figure 2 repeated: Architecture of the Jalapeño Adaptive Optimization System.]

Recompilation Subsystem
This subsystem consists of threads that invoke compilers. These threads extract and execute compilation plans that are inserted into the compilation queue by the controller. Recompilation occurs in threads separate from the application and may occur in parallel. Note that this is different from the initial lazy compilation, which occurs the first time a method is invoked: lazy compilation occurs in the application thread that attempted to invoke the uncompiled method. The output of the compiler is then installed into the JVM. Currently, previous activations of a freshly compiled method continue to use the old compiled code until each activation completes.

AOS Database
The AOS database provides a repository where the AOS records decisions, events, and static analysis results. The various adaptive system components may query this database as needed. For example, the controller uses the AOS database to record compilation plans and to track the status and history of methods selected for recompilation.

System Architecture
[Figure 2 repeated: Architecture of the Jalapeño Adaptive Optimization System.]

Current Controller Model
The central role of the controller is to determine whether it is profitable to recompile a hot method with additional optimizations and, if so, which optimization level to use. This decision is made using a simple cost-benefit analysis, which I will describe now.

Current Controller Model (cont.)
Number the optimization levels available to the controller from 0 to N. For a method m currently compiled at level i, the controller estimates the following quantities:
- T_i: the expected time the program will spend executing method m if m is not recompiled.
- C_j: the cost of recompiling method m at optimization level j, for i <= j <= N. (New profiling information may enable additional speedups even over the previous version compiled at level i.)
- T_j: the expected time the program will spend executing method m in the future if m is recompiled at level j.
Using these estimated values, the controller minimizes the expected future running time of a compiled version of m, i.e., it chooses the j that minimizes C_j + T_j. If C_j + T_j < T_i, the controller decides to recompile at level j; otherwise, it does not.

Estimating T_i
The factors in this model are unknowable in practice, and estimating them is an ongoing research problem. The current controller uses the following method to estimate T_i, the expected future execution time of method m if m is not recompiled. The controller assumes that if a program has run for n seconds, it will continue to execute for exactly n more seconds.
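The decision rule just described can be written down directly. The sketch below is illustrative, not the controller's actual code; the rate and speedup constants are the geometric means the authors report (covered shortly in Table 1), and the function name and interface are my own.

```python
# Sketch of the controller's cost-benefit decision: pick the level j
# minimizing C_j + T_j, recompiling only if that beats T_i (do nothing).
# Rates/speedups are the paper's Table 1 geometric means; the rest is
# a hypothetical interface.

RATE = {0: 9.56, 1: 3.65, 2: 1.78}        # bytecode bytes per ms at level j
SPEEDUP = {"base": 1.00, 0: 3.36, 1: 5.07, 2: 5.48}  # vs. baseline code

def choose_level(t_i_sec, bytecode_bytes, cur_speedup):
    """Return (best_level, expected_future_sec); best_level is None if
    keeping the current code is cheapest."""
    best = (None, t_i_sec)                          # "do nothing" costs T_i
    for j in RATE:
        c_j = bytecode_bytes / RATE[j] / 1000.0     # compile cost, seconds
        t_j = t_i_sec * cur_speedup / SPEEDUP[j]    # future time in m, sec
        if c_j + t_j < best[1]:
            best = (j, c_j + t_j)
    return best

# A baseline-compiled, 1000-byte method expected to run 1 more second:
level, cost = choose_level(1.0, 1000, SPEEDUP["base"])
assert level == 0          # level 0 wins: cheap compile, decent speedup
```

For a method expected to run much longer, the same rule shifts toward higher levels, e.g. `choose_level(100.0, 1000, 1.00)` selects level 2, since the larger compile cost is amortized over far more execution time.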
Define T_f to be the future expected running time of the program. Using a weighted average of the samples described earlier (recall that sample weight starts at one and decays periodically), the model estimates P_m, the percentage of future time that will be spent in each method m. The controller then predicts the future time spent in each method as

    T_i = T_f * P_m    (1)

Estimating T_j
Assuming we can estimate the relative speedup of code at optimization level k compared to level 0, we can calculate the expected time the program will spend executing method m in the future if m is recompiled. Let S_k be the speedup estimate for code at optimization level k relative to level 0. Then, if method m is at level i, the future expected running time if we recompile at level j is

    T_j = T_i * S_i / S_j    (2)

Estimating Compilation Cost and Relative Speedup
This simple analytical model requires two parameters for each optimization level: the cost to compile at each level, C_j, and the expected speedup for recompiling at each level, S_j. The authors estimate these values by running a configuration that compiles all invoked methods with the designated compiler, with no profiling or recompilation occurring (i.e., a non-adaptive system), on the SPECjvm98 benchmarks with input size 100. They gathered the compilation rate (bytecode bytes/ms), which is used for C_j, and the speedup S_j. Their results are presented in Table 1 on the following slide. The averages given in the last line are used as the default parameters of the analytical model.

Table 1
Table 1: Compilation rate and speedup of the SPECjvm98 benchmarks run with input size 100. This configuration is not adaptive: it compiles all invoked methods, and no profiling or recompilation occurs. Speedup is measured as the best of five runs relative to a baseline-compiled system.

               Compilation Rate (bytes/ms)      Speedup over Baseline
Benchmark   Baseline  Opt 0  Opt 1  Opt 2   Baseline  Opt 0  Opt 1  Opt 2
compress      318.76   9.59   3.16   1.69     1.00     5.42   6.92   7.50
jess          287.16   9.16   3.56   1.73     1.00     3.10   5.14   5.33
db            350.00   9.55   3.20   1.62     1.00     2.54   2.69   2.90
javac         327.00  10.00   4.42   1.87     1.00     1.21   3.14   3.46
mpeg          479.16  10.12   4.00   2.08     1.00     7.00  10.34  11.69
mtrt          336.38   9.33   3.43   1.70     1.00     3.87   6.57   6.68
jack          369.09   9.23   3.98   1.81     1.00     3.43   4.22   4.73
Geo. Mean     348.45   9.56   3.65   1.78     1.00     3.36   5.07   5.48

Example
Suppose the weighted samples suggest an application spends 10% of its execution time in method m, and the program has been running for 10 seconds:

    T_i = 10s * 10% = 1 second

Also suppose m is composed of 1000 bytecode bytes, is currently baseline compiled, and the controller is checking whether it would be beneficial to recompile using the optimizing compiler at level 0:

    T_j = 1s * 1 / 3.36 = 0.30s

Now we add in the compile time and check whether it would be beneficial to recompile this method:

    C_j = 1000 bytes / 9.56 bytes/ms = 105 ms;  0.30s + 0.105s = 0.405s < 1s

Thus the controller chooses to recompile the method with at least optimization level 0. Note that the controller checks the other optimization levels as well to determine which is optimal.

Evaluation
The implementation of the controller model presented here rests on many simplifying assumptions. Evaluation of this model is a topic of ongoing and future work. In this section, the authors present several experiments used to evaluate the effectiveness of the approach employed in Jalapeño at the time of this writing. For the sake of time, I will only present two of these experiments here.

Experimental Methodology
The experimental evaluations presented here were performed on an IBM F50 Model 7025 with two 333 MHz PPC604e processors running AIX v4.3. The system has 1 GB of main memory. All experiments were performed using Jalapeño's non-generational copying garbage collector. The Jalapeño boot image was compiled using the optimizing compiler at level 2; the optimizing compiler and the adaptive optimization system were included in the boot image.

Multi-Level Recompilation
The first experiment is intended to evaluate the effectiveness of the adaptive multi-level recompilation system described earlier by comparing its performance to both JIT and simple single-level adaptive configurations of Jalapeño.
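Before turning to the results, the arithmetic from the example slide above can be double-checked in code. Everything here restates the slide's numbers (Table 1 geometric means); the variable names are my own.

```python
# Re-deriving the example slide's numbers from Eqs. (1) and (2) and the
# Table 1 geometric means. Variable names are illustrative.

t_f = 10.0          # program has run 10 s, so assume 10 s remain
p_m = 0.10          # 10% of weighted samples fall in method m
t_i = t_f * p_m     # Eq. (1): expected future time in m if left alone
assert t_i == 1.0

rate_opt0 = 9.56                 # bytecode bytes/ms at opt level 0
c_0 = 1000 / rate_opt0           # ms to compile 1000 bytecode bytes
assert abs(c_0 - 104.6) < 0.1    # roughly the 105 ms on the slide

s_0 = 3.36                       # speedup of opt level 0 over baseline
t_0 = t_i * 1.00 / s_0           # Eq. (2) with S_i = 1 (baseline)
assert abs(t_0 - 0.298) < 0.001  # roughly 0.30 s

total = c_0 / 1000 + t_0         # C_j + T_j, in seconds
assert total < t_i               # about 0.40 s < 1 s: recompile at level 0
```

The small gap between 0.40 s here and the slide's 0.405 s comes from the slide rounding C_j up to 105 ms and T_j down to 0.30 s before summing.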
To allow the experiments to focus on the recompilation decisions, none of the configurations perform feedback-directed optimizations (i.e., no adaptive inlining). For each benchmark, the following configurations were run:
- the baseline compiler as a JIT;
- the optimizing compiler at each level as a JIT;
- an adaptive single-level configuration using the optimizing compiler at each level;
- the adaptive multi-level system using the optimizing compiler at any of its three levels.

Multi-Level Recompilation (cont.)
The experiment was run on the SPECjvm98 benchmarks, the Jalapeño optimizing compiler, and the Volano benchmark (a multithreaded server application). Program startup and steady-state performance are distinguished in the results. Startup performance was gathered by running each benchmark with small input sizes, causing each benchmark to run and then quickly terminate; steady-state performance was gathered over longer program executions. The results for SPECjvm98 and the optimizing compiler are the minimum elapsed time from five runs of each benchmark; Volano performance is reported in terms of message throughput.

[Figure 5: Startup performance. Speedup over the JIT baseline for compress, jess, db, javac, mpegaudio, mtrt, jack, optcompiler, volano, and the geometric mean, under JIT Opt Levels 0-2, Adaptive Opt Levels 0-2, and the Adaptive Multi-Level configuration.]

Startup Performance
Adaptive recompilation clearly delivers better performance than the JIT configurations; compile-time overhead plays a large role in this regime. Note that for all benchmarks, increasing the optimization level in the JIT configuration always causes performance to suffer. The best single level of adaptive optimization varies among benchmarks between levels 0 and 1. For this reason, the multi-level optimization strategy delivers the best overall performance.

[Figure 6: Steady-state performance for the same benchmarks and configurations.]

Steady-State Performance
Each adaptive configuration is competitive with its JIT counterpart. The multi-level adaptive system delivers the best performance of the adaptive configurations, with overall performance within 2% of the best JIT configuration.

Good Performance in Both Startup and Steady State
Figures 5 and 6 show that no single fixed strategy suits a workload with programs that execute for different lengths of time. For long-running programs, the highest optimization level delivers the best performance; for short-running programs, the highest optimization levels deliver the worst performance. The adaptive multi-level system attains high performance in both the startup and steady-state regimes.

The Controller Model's Predictive Abilities
The second experiment explores the effects of predicting the future execution time of an application. The system was modified to take the expected execution time of an application as an argument, and this argument was varied to observe the application's runtime behavior compared with the heuristic described earlier. Note that adaptive inlining is enabled in this experiment.

Tables 2 and 3

javac          Total   Threads                  Compiled
Configuration  Time    main   compile  gc      Total  Methods  Opt 0  Opt 1    Opt 2
heuristic      57.67   49.94   4.98    2.75     266    205      93    99 (52)  13 (9)
exact          57.15   49.42   4.90    2.83     225    188      78    94 (29)  16 (8)
large          67.53   51.93  11.55    4.05      66     66       0     0       66 (0)

Table 2: This table presents the runtime behavior of javac when the application's predicted execution time varies.
jack           Total   Threads                  Compiled
Configuration  Time    main   compile  gc      Total  Methods  Opt 0  Opt 1    Opt 2
heuristic      43.85   40.82   1.83    1.20     159    123      76    44 (33)   3 (3)
exact          42.79   39.80   1.79    1.20     121     98      47    45 (21)   6 (2)
large          46.01   42.35   2.39    1.27      23     23       0     0       23 (0)

Table 3: This table presents the runtime behavior of jack when the application's predicted execution time varies.

Interpreting the Results
Results are shown for the javac and jack benchmarks. Interpret the tables as follows. Row 1 shows the runtime behavior when the heuristic is used. Row 2 shows the runtime behavior when the application's exact execution time is known in advance. Row 3 shows the runtime behavior when the application's expected execution time is three orders of magnitude larger than its actual execution time. The Total Time column is the application's execution time; this is subdivided in the Threads columns to show how much time the application spent in threads that contribute significantly to execution time. The Compiled columns show how many times the optimizing compiler was invoked, at what level of optimization, and for how many methods. The second number (in parentheses) in some columns represents the number of methods that are recompiled multiple times before obtaining this level of optimization; no method is ever recompiled more than twice.

Observations
Generally, exact prediction of execution time is slightly better than the heuristic. The fact that predicting execution time exactly performs only slightly better indicates that the heuristic performs well in practice. Grossly mispredicting execution time has significant performance implications: as expected, when execution time is grossly over-predicted, the controller is too aggressive with compilation.

Conclusions
The adaptive optimization system of the Jalapeño JVM was presented; in particular, the controller component of this system was described in detail. It was shown that this multi-level optimization strategy can deliver robust performance in both the startup and steady-state program regimes, competitive with the best alternative in each regime. Finally, it was shown that the current heuristic of assuming a program will execute for twice its current duration is an effective predictor.

Conclusions (cont.)
The Jalapeño system provides a flexible infrastructure for future research on online optimization. Future research topics include automatic specialization, profile-directed memory layout optimizations, refinements to the recompilation analytic model, and consideration of larger server codes based on IBM middleware products. Expect the Jalapeño AOS to play a key role in efforts to improve Java server performance.


A Comparison of Software and Hardware Techniques for x86 Virtualization
Keith Adams (VMware), Ole Agesen (VMware)
Presentation by Jordan and Justin Ehrlich, University of Kansas EECS 750 (Douglas Niehaus)

1 Introduction
x86 OS virtualization has required methods that go beyond traditional trap-and-emulate. Software VMMs (VMware Workstation and Virtual PC) use binary translation (BT); Xen (software, pre-3.0) uses paravirtualization. Recent hardware-assisted VMMs (Xen 3.0 hardware mode) supposedly rarely offer performance gains over software. Virtual PC, VMware, and Parallels now all support native virtualization through hardware (post-paper).

2 Classical Virtualization
Requirements (Popek and Goldberg):
- Fidelity: software on the VMM executes identically to its execution on hardware, barring timing effects.
- Performance: an overwhelming majority of guest instructions are executed by the hardware without the intervention of the VMM.
- Safety: the VMM manages all hardware resources.
The paper's definition: classically virtualizable means the architecture can be virtualized purely by trap-and-emulate. In this sense, x86 is NOT classically virtualizable. Concepts used in classical virtualization: deprivileging, primary and shadow structures, and memory traces.

3 Software Virtualization
x86 obstacles: visibility of privileged state, and a lack of traps when privileged instructions run at user level.
These obstacles disrupt virtualization of binary-only OSs like Windows. They can be overcome by having the guest OS operate on an interpreter rather than the physical CPU; an interpreter is inefficient, however. Binary translation can be used instead and can provide much better performance.

Binary Translation
Binary translation avoids costly privileged-instruction traps. The translator is: binary, dynamic, on demand, system level, subsetting, and adaptive.

Adaptive Binary Translation
Simple binary translation eliminates privileged-instruction traps. Adaptive binary translation additionally eliminates non-privileged instruction traps (loads and stores that access sensitive data such as page tables). It still suffers some dynamic overhead (jumping to the replacement translation) and static overhead (code patching, loss of i-cache contents). This can be controlled with hysteresis to ensure a low adaptation frequency.

Hardware Virtualization
- AMD: AMD-V (svm flag in /proc/cpuinfo): Athlon 64, F-family Opterons (2nd generation and later), Phenom.
- Intel: Intel VT (vmx flag in /proc/cpuinfo): Pentium 4 (6x1 and 6x2 models) and Pentium D, Core and Core 2 (with a few exceptions), Xeon (Core generation).
Both do the same thing but have different semantics. Both now implement a virtual IOMMU, which was added after the paper was published.

x86 Architecture Extensions
A new run mode has been added, called guest mode, and a new instruction called vmrun enters it. When vmrun is executed, the hardware loads guest state from the VMCS, and execution proceeds until some exit condition described in the VMCS is reached. On exit, host state (supplied by the VMM) is loaded, and the VMM runs in order to emulate the operation that caused the exit. Once the exit reason has been serviced, vmrun is executed again and the system returns to guest mode.

VMCS
What the paper calls the Virtual Machine Control Block (VMCB), Intel calls the Virtual Machine Control Structure (VMCS). It stores state information for each guest as well as information about the host, and contains fields that aid the VMM in handling an exit, describing the reason for the exit and what to do next. The VMCS format is not a published standard but is defined by the implementation. VMware programs the VMCS to exit on guest page faults, TLB flushes, address-space switches, and I/O instructions (this list may be out of date).
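The vmrun/exit/re-enter cycle just described amounts to a simple event loop in the VMM. The sketch below is a toy model, not VMware code: the class, the scripted exit reasons, and the handler are all hypothetical stand-ins for the hardware mechanism.

```python
# Toy model of the hardware-assisted execution loop: "vmrun" enters guest
# mode, the CPU exits back to host mode with a reason recorded in the
# VMCS/VMCB, the VMM emulates the operation, then re-enters the guest.
# All names here are illustrative.

class VMCS:
    def __init__(self):
        self.exit_reason = None
        self.guest_state = {"rip": 0}   # stand-in for saved guest state

def vmrun(vmcs, exit_script):
    """Stand-in for hardware: run the guest until some exit condition,
    here scripted as a sequence of exit reasons."""
    vmcs.exit_reason = next(exit_script)
    return vmcs.exit_reason

def handle_exit(vmcs, handled):
    """Host-mode VMM: emulate the operation that caused the exit."""
    handled.append(vmcs.exit_reason)

vmcs = VMCS()
handled = []
script = iter(["page_fault", "io_instruction", "halt"])
while True:
    reason = vmrun(vmcs, script)
    if reason == "halt":
        break
    handle_exit(vmcs, handled)

assert handled == ["page_fault", "io_instruction"]
```

The performance story in the paper follows directly from this shape: time in guest mode is native speed, and all overhead is concentrated in the exits and their handlers.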
Control bits allow flexibility in trust, such as access to certain I/O devices or interrupt handling. VMware can reuse its old software-emulation code, including MMU virtualization, logging, synchronization, and device models.

Shadow Structures
Shadow structures are derived from guest-level primary structures, such as the page-table pointer register, the processor status register, page tables, and the time-stamp counter. Read accesses to the primary structures return shadow values and cause no exit; writes cause an exit, allowing the VMM to handle them.

Tracing
Page-protection tracing is applied through shadows of all page tables. Every page-table modification causes an exit and a corresponding modification to the shadow. As untouched regions of the guest's virtual address space are accessed, page faults cause exits.

Page Faults
True page faults are caused by violations of the protection policy encoded in the guest PTEs; these are forwarded to the guest. Hidden page faults are caused by misses in the shadow table; these cause the VMM to construct an appropriate shadow PTE and resume guest execution. They are "hidden" because they have no guest-visible effect. This allows full management of memory by giving the VMM control of all page directories and tables; the guest's memory is mapped into the virtual memory space of the VMM.

Example: fork
On the system call, the CPL changes from 3 to 0. The guest's trap and system-call vectors are already loaded, so the transition happens without VMM intervention. The guest uses copy-on-write: each guest page-table write causes an exit, and the VMM emulates the effect on the traced page and reflects it into the shadow page table. When the guest scheduler context-switches to the child, loading the child's page-table pointer, this causes an exit, allowing the VMM to construct a new shadow page table and point the VMCS's page-table register at it.

Example: fork (cont.)
As the child runs, it touches pieces of its address space that are not yet mapped by the shadow page tables. This causes exits, allowing the VMM to service the hidden page fault, update the shadow page table, and resume guest execution.
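The traced-write and hidden/true fault behavior can be captured in a tiny simulation. This is a toy model under stated assumptions, not VMware's implementation: flat dictionaries stand in for multi-level page tables, and the guest-physical to host-physical map is hypothetical.

```python
# Toy model of shadow paging: guest page-table writes are traced (they
# invalidate the shadow, modeling an exit); shadow misses are "hidden"
# faults the VMM services; guest protection violations are "true" faults
# forwarded to the guest. Illustrative only.

guest_pt = {}       # guest-virtual page -> (guest-physical page, writable)
shadow_pt = {}      # guest-virtual page -> host-physical page
g2h = {0x10: 0xA0, 0x11: 0xA1}   # guest-physical -> host-physical (made up)

def guest_write_pte(gva, gpa, writable):
    guest_pt[gva] = (gpa, writable)
    shadow_pt.pop(gva, None)      # traced write: VMM invalidates the shadow

def access(gva, write=False):
    if gva in shadow_pt and not write:
        return ("hit", shadow_pt[gva])        # hardware TLB/shadow, no exit
    if gva not in guest_pt:
        return ("true_fault", None)           # guest never mapped this page
    gpa, writable = guest_pt[gva]
    if write and not writable:
        return ("true_fault", None)           # guest's own protection policy
    shadow_pt[gva] = g2h[gpa]                 # hidden fault: VMM fills shadow
    return ("hidden_fault", shadow_pt[gva])

guest_write_pte(0x1, 0x10, writable=True)
assert access(0x1) == ("hidden_fault", 0xA0)  # first touch fills the shadow
assert access(0x1) == ("hit", 0xA0)           # later reads cause no exit
assert access(0x2)[0] == "true_fault"         # unmapped: forwarded to guest
```

As in the fork example, a burst of guest PTE writes here would invalidate shadow entries one by one, each later access paying one hidden fault to repopulate them.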
If memory accesses cause true page faults, imposed by the guest's own protection policy, the faults are returned to the guest.

[Figure: Guest page table vs. shadow page table. When a load of the page-directory pointer into CR3 is encountered, an exit is performed; the value of the guest's virtual pointer is given to the VMM, which in turn loads the real value into CR3 before handing control back to the guest. A companion diagram shows Xen's mapping of guest page frame numbers to machine page frame numbers.]

Software vs. Hardware
Hardware extensions allow classical virtualization on the x86. The overhead comes with exits: with no exits, the guest runs at native speed.
Hardware advantages:
- Code density is preserved (no translation).
- Precise exceptions: BT performs extra work to recover guest state for faults and interrupts in non-IDENT code.
- System calls run without VMM intervention.
Software advantages:
- Trap elimination: traps are replaced with callouts, which are usually faster.
- Emulation speed: callouts jump straight to an emulation routine, whereas hardware must fetch and decode the trapping instruction before emulating it.
- Callout avoidance: BT can avoid many callouts by using in-TC emulation.

[Figure: User-level benchmark results (% of native, higher is better) for gzip, vpr, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf, and specjbb under the software VMM and the hardware VMM.]

[Figure: Macrobenchmarks (% of native, higher is better): compile (Linux/Windows), Apache (Linux/Windows), LargeRAM, and 2DGraphics, for the software and hardware VMMs.]

[Figure: Paravirtualization vs. hardware: Xen, hardware VMM, and native compared across SPECINT, SPECFP, SPECJBB, BYTE, and SYSBENCH (threads, mutex, memory, fileio, OLTP CPU) benchmarks.]

[Figures 4 and 5: Virtualization nanobenchmarks and sources of virtualization overhead in an XP boot/halt; software VMM vs. hardware VMM overhead for syscall, in, cr8wr, call/ret, pgfault, divzero, and ptemod.]

Future Microarchitecture
Expected improvements include cycle reductions, hardware MMU support, and an IOMMU.

Cycle reductions:

Operation             3.8 GHz P4 672   2.66 GHz Core 2 Duo
VM entry                   2409               937
Page fault VM exit         1931              1186
VMCB read                   178                52
VMCB write                  171                44

forkwait results (hardware vs. software) on the P4 and the Core 2 are also reported.

[Figure: TLB fill with hardware MMU support: the hardware walker traverses the guest page table (gVA to gPA) and the VMM's nested page table (gPA to hPA) to produce the final translation.]

Conclusion
Hardware-assisted virtualization is currently slower than software; perhaps the future will provide better solutions. Hardware virtualization now allows classical virtualization on the x86. Hybrid VMMs are in the works, with betas available. The results were surprising to me in that the hardware-assisted VMM did not outperform the software VMM.