Outline for EECS 700 with Professor Kulkarni at KU
This 63-page set of class notes was uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Kansas taught by a professor in Fall. Since its upload, it has received 17 views.
Date Created: 02/06/15
Emulation
EECS 700 Virtual Machines, Spring 2009

Outline
- Emulation
- Interpretation: basic, threaded, direct threaded, other issues
- Binary translation: code discovery, code location, other issues
- Control transfer optimizations

Key VM Technologies
- Emulation: a binary in one ISA is executed on a processor supporting a different ISA.
- Dynamic optimization: a binary is improved for higher performance.
  - It may be done as part of emulation.
  - It may optimize the same ISA (no emulation needed).
[Figure: emulation and optimization]

Emulation vs. Simulation
- Emulation: a method for enabling a (sub)system to present the same interface and characteristics as another.
  - Ways of implementing emulation:
    - interpretation: relatively inefficient, instruction-at-a-time
    - binary translation: block-at-a-time, optimized for repeated execution
  - e.g., the execution of programs compiled for instruction set A on a machine that executes instruction set B
- Simulation: a method for modeling a (sub)system's operation.
  - The objective is to study the process, not just to imitate the function.
  - Typically, emulation is part of the simulation process.

Definitions
- Guest: the environment being supported by the underlying platform.
- Host: the underlying platform that provides the guest environment.
[Figure: guest supported by host]

Definitions (2)
- Source ISA or binary: the original instruction set or binary, i.e., the ISA to be emulated.
- Target ISA or binary: the ISA of the host processor, i.e., the underlying ISA.
- Source/target refer to ISAs; guest/host refer to platforms.
[Figure: source ISA emulated by target ISA]

Emulation
- Required for implementing many VMs.
- The process of implementing the interface and functionality of one (sub)system on a (sub)system having a different interface and functionality.
  - e.g., terminal emulators such as those for the VT100 (xterm, PuTTY)
- Instruction set emulation: binaries in the source instruction set can be executed on a machine implementing the target instruction set.
  - e.g., the IA-32 Execution Layer
Interpretation vs. Translation
- Interpretation
  - simple and easy to implement; portable
  - low performance
  - threaded interpretation
- Binary translation
  - complex implementation
  - high initial translation cost, small execution cost
  - selective compilation
- We focus on user-level instruction set emulation of program binaries.

Interpreter State
- An interpreter needs to maintain the complete architected state of the machine implementing the source ISA:
  - registers: program counter, condition codes, Reg 0 ... Reg n-1
  - memory: code, data, stack

Decode-Dispatch Interpreter
- A decode-and-dispatch interpreter steps through the source program one instruction at a time:
  - decode the current instruction
  - dispatch to the corresponding interpreter routine
- Very high interpretation cost.

    while (!halt && !interrupt) {
        inst = code[PC];
        opcode = extract(inst,31,6);
        switch (opcode) {
            case LoadWordAndZero: LoadWordAndZero(inst); break;
            case ALU:             ALU(inst); break;
            case Branch:          Branch(inst); break;
            ...
        }
    }

Each case calls a routine from the instruction function list.

Decode-Dispatch Interpreter (2)
Instruction function: Load

    LoadWordAndZero(inst) {
        RT = extract(inst,25,5);
        RA = extract(inst,20,5);
        displacement = extract(inst,15,16);
        if (RA == 0) source = 0;
        else source = regs[RA];
        address = source + displacement;
        regs[RT] = (data[address] << 32) >> 32;
        PC = PC + 4;
    }

Decode-Dispatch Interpreter (3)
Instruction function: ALU

    ALU(inst) {
        RT = extract(inst,25,5);
        RA = extract(inst,20,5);
        RB = extract(inst,15,5);
        source1 = regs[RA];
        source2 = regs[RB];
        extended_opcode = extract(inst,10,10);
        switch (extended_opcode) {
            case Add:         Add(inst); break;
            case AddCarrying: AddCarrying(inst); break;
            case AddExtended: AddExtended(inst); break;
            ...
        }
        PC = PC + 4;
    }

Decode-Dispatch Efficiency
- Decode-dispatch loop:
  - mostly serial code
  - case statement (a hard-to-predict indirect jump)
  - call to the function routine
  - return
- Executing an add instruction takes approximately 20 target instructions: several loads, stores, and shift/mask steps.
- Hand-coding can lead to better performance, e.g., DEC/Compaq FX!32.
Indirect Threaded Interpretation
- The high number of branches in decode-dispatch interpretation reduces performance: an overhead of 5 branches per instruction.
- Threaded interpretation improves efficiency by reducing branch overhead:
  - append the dispatch code to each interpretation routine (removes 3 branches)
  - this threads the function routines together

Indirect Threaded Interpretation (2)

    LoadWordAndZero:
        RT = extract(inst,25,5);
        RA = extract(inst,20,5);
        displacement = extract(inst,15,16);
        if (RA == 0) source = 0;
        else source = regs[RA];
        address = source + displacement;
        regs[RT] = (data[address] << 32) >> 32;
        PC = PC + 4;
        if (halt || interrupt) goto exit;
        inst = code[PC];
        opcode = extract(inst,31,6);
        extended_opcode = extract(inst,10,10);
        routine = dispatch[opcode,extended_opcode];
        goto *routine;

Indirect Threaded Interpretation (3)

    Add:
        RT = extract(inst,25,5);
        RA = extract(inst,20,5);
        RB = extract(inst,15,5);
        source1 = regs[RA];
        source2 = regs[RB];
        sum = source1 + source2;
        regs[RT] = sum;
        PC = PC + 4;
        if (halt || interrupt) goto exit;
        inst = code[PC];
        opcode = extract(inst,31,6);
        extended_opcode = extract(inst,10,10);
        routine = dispatch[opcode,extended_opcode];
        goto *routine;

Indirect Threaded Interpretation (4)
- Dispatch occurs indirectly through a table:
  - interpretation routines can be modified and relocated independently
- Advantages:
  - the binary (intermediate) code is still portable
  - improves efficiency over basic interpretation
- Disadvantages:
  - code replication increases interpreter size

Indirect Threaded Interpretation (5)
[Figure: decode-dispatch vs. threaded interpretation, showing source code, "data" accesses, and interpreter routines]

Predecoding
- Parse each instruction into a predefined structure to facilitate interpretation:
  - separate the opcode, operands, etc.
  - reduces shifts/masks significantly
  - more useful for CISC ISAs
- Changes to the input binary damage portability.

    lwz r1, 8(r2)     07  1  2  08    load word and zero
    add r3, r3, r1    08  3  1  03    add
    stw r3, 0(r4)     37  3  4  00    store word
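The indirect threaded scheme above can be sketched in C. This sketch assumes the GCC/Clang "labels as values" extension (&&label and goto *ptr), which standard C does not provide. The toy encoding (opcode in bits 31..24, register and immediate fields below it), the opcodes OP_LI/OP_ADD/OP_HALT, and the function run_threaded are all hypothetical. As in the slides, every routine ends with its own copy of the dispatch code, which indexes a central table and jumps indirectly. A halt opcode stands in for the halt/interrupt check.

```c
#include <stdint.h>

/* Hypothetical toy encoding: bits 31..24 opcode, 20..16 rt, 12..8 ra, 7..0 rb or imm. */
enum { OP_HALT, OP_LI, OP_ADD, NUM_OPS };

#define ENC(op, rt, ra, rb) \
    (((uint32_t)(op) << 24) | ((uint32_t)(rt) << 16) | ((uint32_t)(ra) << 8) | (uint32_t)(rb))
#define F_RT(i)  (((i) >> 16) & 31u)
#define F_RA(i)  (((i) >> 8) & 31u)
#define F_RB(i)  ((i) & 31u)
#define F_IMM(i) ((i) & 0xffu)

/* Runs code[] until OP_HALT; returns the index of the halting instruction. */
uint32_t run_threaded(const uint32_t *code, uint32_t regs[32]) {
    /* Central dispatch table: opcode -> routine address (GCC/Clang extension). */
    static void *const dispatch[NUM_OPS] = { &&do_halt, &&do_li, &&do_add };
    uint32_t pc = 0, inst;

    /* Dispatch code replicated at the end of every routine: fetch, decode, jump. */
#define DISPATCH() do { inst = code[pc]; goto *dispatch[inst >> 24]; } while (0)

    DISPATCH();
do_li:                                   /* rt = imm */
    regs[F_RT(inst)] = F_IMM(inst);
    pc++;
    DISPATCH();
do_add:                                  /* rt = ra + rb */
    regs[F_RT(inst)] = regs[F_RA(inst)] + regs[F_RB(inst)];
    pc++;
    DISPATCH();
do_halt:
#undef DISPATCH
    return pc;
}
```

Suppose the intermediate form stored dispatch[op] itself in place of the opcode. That change would turn this into direct threading, at the cost of tying the intermediate code to the interpreter's memory layout.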
Predecoding (2)

    struct instruction {
        unsigned long op;
        unsigned char dest, src1, src2;
    } code[CODE_SIZE];

    Load_Word_and_Zero:
        RT = code[TPC].dest;
        RA = code[TPC].src1;
        displacement = code[TPC].src2;
        if (RA == 0) source = 0;
        else source = regs[RA];
        address = source + displacement;
        regs[RT] = (data[address] << 32) >> 32;
        SPC = SPC + 4;
        TPC = TPC + 1;
        if (halt || interrupt) goto exit;
        opcode = code[TPC].op;
        routine = dispatch[opcode];
        goto *routine;

Direct Threaded Interpretation
- Allows even higher efficiency by removing the memory access to the centralized dispatch table.
- Requires predecoding.
- Depends on the locations of the interpreter routines, so it loses portability.

    001048d0  1  2  08    load word and zero
    00104800  3  1  03    add
    00104910  3  4  00    store word

Direct Threaded Interpretation (2)
- Predecode the source binary into an intermediate structure.
- Replace the opcode in the intermediate form with the address of the interpreter routine.
- This removes the memory lookup of the dispatch table.
- It limits portability, since the exact locations of the interpreter routines are needed.

Direct Threaded Interpretation (3)

    Load_Word_and_Zero:
        RT = code[TPC].dest;
        RA = code[TPC].src1;
        displacement = code[TPC].src2;
        if (RA == 0) source = 0;
        else source = regs[RA];
        address = source + displacement;
        regs[RT] = (data[address] << 32) >> 32;
        SPC = SPC + 4;
        TPC = TPC + 1;
        if (halt || interrupt) goto exit;
        routine = code[TPC].op;
        goto *routine;

Direct Threaded Interpretation (4)
[Figure: source code, intermediate code, and interpreter routines in direct threaded interpretation]

Interpreter Control Flow
- Decode for a CISC ISA.
- Individual routines for each instruction.

Interpreter Control Flow (2)
- For CISC ISAs: multiple-byte opcodes; make the common cases fast.

Binary Translation
- Translate the source binary program to a target binary before execution.
- Binary translation is the logical conclusion of
  predecoding: it gets rid of parsing and jumps altogether.
- It allows optimizations on the native code.
- It achieves higher performance than interpretation.
- It needs a mapping of the source state onto the host state (state mapping).

Binary Translation (2)
x86 source binary (destination operand first):

    addl edx, 4(eax)
    movl 4(eax), edx
    add  eax, 4

Translate to a PowerPC target:
- r1 points to the x86 register context block
- r2 points to the x86 memory image
- r3 contains the x86 ISA PC value

Binary Translation (3)

    lwz   r4,0(r1)     ; load eax from register block
    addi  r5,r4,4      ; add 4 to eax
    lwzx  r5,r2,r5     ; load operand from memory
    lwz   r4,12(r1)    ; load edx from register block
    add   r5,r4,r5     ; perform add
    stw   r5,12(r1)    ; put result into edx
    addi  r3,r3,3      ; update PC (3 bytes)
    lwz   r4,0(r1)     ; load eax from register block
    addi  r5,r4,4      ; add 4 to eax
    lwz   r4,12(r1)    ; load edx from register block
    stwx  r4,r2,r5     ; store edx value into memory
    addi  r3,r3,3      ; update PC (3 bytes)
    lwz   r4,0(r1)     ; load eax from register block
    addi  r4,r4,4      ; add immediate
    stw   r4,0(r1)     ; place result back into eax
    addi  r3,r3,3      ; update PC (3 bytes)

Binary Translation (4)
[Figure: source code and binary-translated target code]

State Mapping
- Maintaining the state of the source machine on the host (target) machine:
  - the state includes source registers and memory contents
  - source registers can be held in host registers or in host memory
- Holding source registers in host registers reduces loads/stores significantly.
- This is easier if there are more target registers than source registers.

Register Mapping
- Map source registers to target registers:
  - spill registers if needed (when there are fewer target registers than source registers)
  - map on a per-block basis
- Reduces loads/stores significantly, which improves performance.
[Figure: source ISA registers (program counter, stack pointer, reg 1 ... reg n) mapped onto target ISA registers R1 ... RN; source memory image]

Register Mapping (2)
- r1 points to the x86 register context block
- r2 points to the x86 memory image
- r3 contains the x86 ISA PC value
- r4 holds
  x86 register eax
- r7 holds x86 register edx, etc.

    addi  r16,r4,4     ; add 4 to eax
    lwzx  r17,r2,r16   ; load operand from memory
    add   r7,r17,r7    ; perform add of edx
    addi  r16,r4,4     ; add 4 to eax
    stwx  r7,r2,r16    ; store edx value into memory
    addi  r4,r4,4      ; increment eax
    addi  r3,r3,9      ; update PC (9 bytes)

Predecoding vs. Binary Translation
- Predecoding still requires the interpretation routines.
- After binary translation, the code can be executed directly.

Code Discovery Problem
- It may be difficult to statically translate or predecode the entire source program.
- Consider the x86 byte sequence:

    31 c0 8b b5 00 00 03 08 8b bd 00 00 03 00

  Decoded starting at the 8b, the bytes 8b b5 00 00 03 08 form movl esi, 0x08030000(ebp) (destination operand first). Decoded starting one byte later, at the b5, the bytes b5 00 form mov ch, 0. Without knowing where execution enters, a static translator cannot tell which decoding is correct.

Code Discovery Problem (2)
- Contributors to the code discovery problem:
  - variable-length (CISC) instructions
  - indirect jumps
  - data interspersed with code
  - padding instructions to align branch targets
[Figure: source ISA instruction stream containing data, a pad for instruction alignment, and an indirect jump to an unknown target]

Code Location Problem
- Mapping the source program counter to the destination PC for indirect jumps: indirect jump addresses in the translated code still refer to source addresses.

x86 source code (destination operand first):

    movl eax, 4(esp)     ; load jump address from memory
    jmp  eax             ; jump indirect through eax

PowerPC target code:

    addi  r16,r11,4      ; compute x86 address
    lwzx  r4,r2,r16      ; get x86 jump address from x86 memory image
    mtctr r4             ; move to count register
    bctr                 ; jump indirect through ctr

Simplified Solutions
- Fixed-width RISC ISAs: instructions are always aligned on fixed boundaries.
- Use special instruction sets (e.g., Java):
  - no jumps/branches to arbitrary locations
  - no data or pads mixed with instructions
  - all code can then be discovered
- Use incremental dynamic translation.

Incremental Code Translation
- First interpret, and perform code discovery as a by-product.
- Translate code incrementally, as it is discovered:
  - place translated code in a code cache
  - use a lookup table to save source-to-target PC mappings
- Emulation process:
  - execute a translated block
  - look up the next source PC in the lookup table
  - if it has been translated, jump to the target PC; otherwise, interpret and translate

Dynamic Basic Block
- The unit of translation during dynamic translation.
- Leaders identify the starts of static basic blocks:
  - the first program instruction
  - the instruction following a branch or jump
  - the target of a branch or jump
- Runtime control flow identifies dynamic blocks:
  - a dynamic block starts at the instruction following a taken branch or jump at runtime

Dynamic Basic Block (2)
[Figure: the same code (add, load, store, brcond skip, ..., brcond loop) partitioned into static basic blocks and into dynamic basic blocks]

Flow of Control
- Even after all blocks are translated, control flows between the translated blocks and the emulation manager (EM).
- The EM connects the translated blocks during execution.
- Optimizations can reduce the overhead of going through the EM between every pair of translated blocks.

Flow of Control (2)
[Figure: control flow between translated blocks and the emulation manager]

Tracking the Source PC
- Update the SPC as part of the translated code, or place the SPC in a stub.
- General approach:
  - translated code returns to the EM via a branch-and-link (BL)
  - the SPC is placed in a stub immediately after the BL
  - the EM uses the link register to find the SPC and hashes it to find the next target code block
[Figure: code block ending in a branch-and-link to the EM, followed by the next source PC]

Emulation Manager Flowchart
- Start with the SPC and look up SPC → TPC in the map table.
- Hit in the table: branch to the TPC and execute the translated block.
- No hit: use the SPC to read instructions from the source memory image; interpret, translate, and place the translation into translation memory; write the new SPC → TPC mapping into the table.
- Get the SPC for the next block and repeat.
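A minimal sketch of incremental translation driven by an emulation manager, following the flowchart above. The EM looks up the SPC in a map table, translates a dynamic basic block on a miss, records the SPC → TPC mapping, executes the block, and repeats. Several pieces are hypothetical: the toy source ISA (OP_LI, OP_ADD, OP_ADDI, OP_BNEZ, OP_HALT), the open-addressing map table, and all names. The sketch also simplifies in three ways. A "translated block" is a copied instruction array standing in for generated host code. A miss translates directly instead of interpreting first. The code cache is never evicted.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy source ISA:
   OP_LI: r[a] = b   OP_ADD: r[a] = r[b] + r[c]   OP_ADDI: r[a] = r[b] + c
   OP_BNEZ: if (r[a] != 0) goto b   OP_HALT: stop */
enum { OP_LI, OP_ADD, OP_ADDI, OP_BNEZ, OP_HALT };
typedef struct { int op, a, b, c; } SrcInst;

#define MAX_BLOCK 16
#define SPC_HALT UINT32_MAX

/* A translated dynamic basic block (stand-in for generated target code). */
typedef struct {
    uint32_t spc;               /* SPC of the block's first instruction */
    int n;                      /* instructions in the block */
    SrcInst insts[MAX_BLOCK];
} Block;

#define MAP_SIZE 64             /* power of two; assumed never to fill */
typedef struct {
    Block *slot[MAP_SIZE];      /* SPC -> translated block (the "TPC") */
    int translations;
} Emulator;

static size_t probe(const Emulator *em, uint32_t spc) {
    size_t i = (size_t)((spc * 2654435761u) & (MAP_SIZE - 1));
    while (em->slot[i] && em->slot[i]->spc != spc)
        i = (i + 1) & (MAP_SIZE - 1);            /* linear probing */
    return i;
}

/* Translate one dynamic basic block: from spc through the first branch/halt. */
static Block *translate(Emulator *em, const SrcInst *code, uint32_t spc) {
    Block *b = calloc(1, sizeof *b);
    if (!b) abort();
    b->spc = spc;
    for (;;) {
        SrcInst in = code[spc + (uint32_t)b->n];
        b->insts[b->n++] = in;
        if (in.op == OP_BNEZ || in.op == OP_HALT || b->n == MAX_BLOCK) break;
    }
    em->translations++;
    return b;
}

/* Run a translated block; return the next SPC (or SPC_HALT). */
static uint32_t execute(const Block *b, int32_t *r) {
    for (int i = 0; i < b->n; i++) {
        const SrcInst *in = &b->insts[i];
        switch (in->op) {
        case OP_LI:   r[in->a] = in->b; break;
        case OP_ADD:  r[in->a] = r[in->b] + r[in->c]; break;
        case OP_ADDI: r[in->a] = r[in->b] + in->c; break;
        case OP_BNEZ: if (r[in->a] != 0) return (uint32_t)in->b; break;
        case OP_HALT: return SPC_HALT;
        }
    }
    return b->spc + (uint32_t)b->n;              /* fall through */
}

/* Emulation manager: look up SPC, translate on a miss, execute, repeat. */
void emulate(Emulator *em, const SrcInst *code, int32_t *r) {
    uint32_t spc = 0;
    while (spc != SPC_HALT) {
        size_t i = probe(em, spc);
        if (!em->slot[i])
            em->slot[i] = translate(em, code, spc);   /* new SPC -> TPC mapping */
        spc = execute(em->slot[i], r);
    }
}

void emulator_free(Emulator *em) {
    for (size_t i = 0; i < MAP_SIZE; i++) { free(em->slot[i]); em->slot[i] = NULL; }
}
```

Consider a six-instruction countdown loop starting at SPC 0 with its body at SPC 2. The EM translates three dynamic blocks, starting at SPCs 0, 2, and 5, and the last pass through the loop body hits in the map table. The block at SPC 2 overlaps the block at SPC 0, which is normal for dynamic basic blocks.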
Translation Chaining
- Translation blocks are linked into chains.
- If the successor block has not yet been translated, code is inserted to jump to the EM.
  - Later, after jumping to the EM, if the EM finds that the successor block has been translated, the jump is modified to point directly to the successor.

Translation Chaining (2)
[Figure: translated blocks without chaining and with chaining]

Translation Chaining (3)
[Figure: creating a link: get the next SPC, look up the successor, and set the predecessor's jump to the successor's TPC]

Translation Chaining (4)
PowerPC translation (block addresses shown in the figure: 9AC0, 9AE4, 9C08):

    9AC0: lwz    r16,0(r4)     ; load value from memory
          add    r7,r7,r16     ; accumulate sum
          stw    r7,0(r5)      ; store to memory
          addic. r5,r5,-1      ; decrement loop count, set cr0
          beq    cr0,pc+12     ; branch if loop exit
          bl     F000          ; branch & link to EM
          4FDC                 ; save source PC in link register
          b      9c08          ; branch along chain
          51C8                 ; save source PC in link register
          ...
    9C08: stw    r7,0(r6)      ; store last value of edx
          xor    r7,r7,r7      ; clear edx
          bl     F000          ; branch & link to EM
          6200                 ; save source PC in link register

Software Indirect Jump Prediction
- For blocks ending with an indirect jump, chaining cannot be used, because the destination can change.
- SPC → TPC map table lookup is expensive.
- Indirect jump locations seldom change:
  - use profiling to find the common jump addresses
  - inline the frequently used SPC addresses, with the most frequent SPC destination addresses given first

    if (Rx == addr1) goto target1;
    else if (Rx == addr2) goto target2;
    else if (Rx == addr3) goto target3;
    else hash_lookup(Rx);   /* do it the slow way */

Dynamic Translation Issues
- Tracking the source PC: the SPC is used by the emulation manager and the interpreter.
- Handling self-modifying code: programs perform stores that modify code at runtime.
- Handling self-referencing code: programs perform loads from the source code.
- Providing precise traps: provide a precise source state at traps and exceptions.
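The inline-compare sequence for software indirect jump prediction can be sketched in C. The sketch models translated blocks as C functions (TargetBlock) and uses a call in place of the jump into translated code. The SPC values, the profile order, the block functions, and map_lookup are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*TargetBlock)(int);   /* stands in for a translated code block */

static int block_a(int x) { return x + 1; }
static int block_b(int x) { return x * 2; }
static int block_c(int x) { return -x; }

/* Slow path: the SPC -> TPC map table (a linear table here, for brevity). */
typedef struct { uint32_t spc; TargetBlock tpc; } MapEntry;
static const MapEntry map_table[] = {
    {0x1000u, block_a}, {0x2000u, block_b}, {0x3000u, block_c},
};
static int slow_lookups;           /* counts trips down the slow path */

static TargetBlock map_lookup(uint32_t spc) {
    slow_lookups++;
    for (size_t i = 0; i < sizeof map_table / sizeof map_table[0]; i++)
        if (map_table[i].spc == spc)
            return map_table[i].tpc;
    return NULL;                   /* a real EM would interpret and translate here */
}

/* What a translator might emit for an indirect jump through Rx once profiling
   showed 0x2000 as the most frequent target and 0x1000 as the next. */
int indirect_jump(uint32_t rx, int arg) {
    TargetBlock t;
    if (rx == 0x2000u)      t = block_b;          /* most frequent target first */
    else if (rx == 0x1000u) t = block_a;
    else                    t = map_lookup(rx);   /* do it the slow way */
    return t ? t(arg) : 0;
}
```

The inline comparisons cost a few instructions on a hit and fall back to the full map-table lookup only for targets that the profile did not predict.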
Same-ISA Emulation
- Same source and target ISAs.
- Applications: simulation, OS call emulation, program shepherding, performance optimization.

Instruction Set Issues
- Register architectures: register mappings, reservation of special registers
- Condition codes: lazy evaluation, as needed
- Data formats and arithmetic: floating point, decimal, MMX
- Address resolution: byte vs. word addressing
- Data alignment: natural vs. arbitrary
- Byte order: big/little endian

Register Architectures
- GPRs of the target ISA are used for:
  - holding source ISA GPRs
  - holding source ISA special-purpose registers
  - pointing to the register context block and memory image
  - holding intermediate emulator values
- Issues:
  - fewer target ISA registers than source ISA registers
  - prioritizing the use of target ISA registers

Condition Codes
- Condition codes (CCs) are not used uniformly:
  - the IA-32 ISA sets CCs implicitly
  - SPARC and PowerPC set CCs explicitly
  - the MIPS ISA does not use CCs
- Neither ISA uses CCs: nothing to do.
- Source ISA does not use CCs, target ISA does: easy; additional instructions generate the CC values.

Condition Codes (cont.)
- Source ISA has explicit CCs, target ISA has none: emulating the CCs is trivial.
- Source ISA has implicit CCs, target ISA has none: very difficult and time-consuming to emulate; CC emulation may be more expensive than instruction emulation.

Condition Codes (cont.): Lazy Evaluation
- CCs are seldom used, so only generate them when required:
  - store the operands and the operation that set each condition code
- Optimizations can also analyze the code to detect cases where the CCs generated will never be used.

Lazy Condition Code Evaluation
x86 source:

            add  ecx,ebx
            jmp  label1
            ...
    label1: jz   target

PPC-to-x86 register mappings:

    R4  ↔ eax
    R5  ↔ ebx
    R6  ↔ ecx

Registers used by the emulation code:

    R24 ↔ scratch register used by emulation code
    R25 ↔ condition code operand 1
    R26 ↔ condition code operand 2
    R27 ↔ condition code operation
    R28 ↔ jump table base address

(R25 through R27 are used for lazy condition code emulation.)

Lazy Condition Code Evaluation (2)

            mr    r25,r6          ; save operands
            mr    r26,r5          ; and opcode for
            li    r27,"add"       ; lazy condition code emulation
            add   r6,r6,r5        ; translation of add
            b     label1
            ...
    label1: bl    genZF           ; branch and link genZF code
            beq   cr0,target      ; branch on condition flag
            ...
    genZF:  add   r29,r28,r27     ; add "opcode" to jump table base
            mtctr r29             ; copy to counter register
            bctr                  ; branch via jump table
            ...
    "add":  add.  r24,r25,r26     ; perform PowerPC add, set cr0
            blr                   ; return

Data Formats and Arithmetic
- Maintain compatibility of data transformations.
- Data formats and arithmetic operations are standardized:
  - two's complement representation
  - IEEE floating-point standard
  - basic logical/arithmetic operations are mostly present
- Exceptions:
  - IA-32 FP uses 80-bit intermediate results
  - PowerPC and HP PA have multiply-and-add (FMAC), which has higher precision on intermediate values
  - integer divide vs. using FP divide to approximate it
  - ISAs may have different immediate lengths

Memory Address Resolution
- ISAs differ in the data sizes they can access: e.g., loads/stores of bytes, halfwords, and full words, as opposed to only bytes and words.
- Emulating a less powerful ISA: no issue.
- Emulating a more powerful ISA:
  - loads: load the entire word, mask the unneeded bits
  - stores: load the entire word, insert the data, store the word

Memory Data Alignment
- Aligned memory access: word accesses are performed with the two low-order address bits 00, halfword accesses must have the lowest bit 0, etc.
- If the target ISA does not allow unaligned access:
  - break up all accesses into byte accesses
  - some ISAs provide supplementary instructions to simplify unaligned accesses
  - unaligned accesses can trap and then be handled

Byte Order
- The ordering of bytes within a word may differ: little endian and big endian.
- Target code must perform byte ordering:
  - the guest data image is generally maintained in the same byte order as assumed by the source ISA
  - emulation software modifies addresses when bytes within words are addressed, which can be very inefficient
- Some target ISAs support both byte orders, e.g., MIPS, IA-64.
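Two of the instruction set issues above can be sketched in C: lazy condition code evaluation, and byte accesses to a big-endian guest image on a target that has only word loads/stores. The sketch makes three simplifying assumptions. Only the zero flag of add and sub is modeled. The guest image is held as host 32-bit words whose values are the big-endian interpretation of each group of four guest bytes. All names (LazyCC, gen_zf, emu_load_byte, emu_store_byte, ...) are hypothetical. Here gen_zf plays the role of the slides' genZF routine, with a switch in place of the jump table.

```c
#include <stdint.h>

/* Lazy condition codes: record the last flag-setting operation and its operands
   (the roles of R25, R26, and R27 in the slides); compute flags only on use. */
typedef enum { CC_ADD, CC_SUB } CcOp;     /* hypothetical subset of x86 ops */
typedef struct { uint32_t op1, op2; CcOp op; } LazyCC;

uint32_t emu_add(LazyCC *cc, uint32_t a, uint32_t b) {
    cc->op1 = a; cc->op2 = b; cc->op = CC_ADD;   /* record; no flags computed */
    return a + b;
}

uint32_t emu_sub(LazyCC *cc, uint32_t a, uint32_t b) {
    cc->op1 = a; cc->op2 = b; cc->op = CC_SUB;
    return a - b;
}

/* Like genZF: recompute the zero flag from the saved operation on demand. */
int gen_zf(const LazyCC *cc) {
    switch (cc->op) {
    case CC_ADD: return (uint32_t)(cc->op1 + cc->op2) == 0;
    case CC_SUB: return (uint32_t)(cc->op1 - cc->op2) == 0;
    }
    return 0;
}

/* Big-endian guest: byte 0 of each word is the most significant byte. */
static unsigned be_shift(uint32_t addr) { return 24u - 8u * (addr & 3u); }

/* Byte load on a word-only target: load the entire word, mask the rest. */
uint8_t emu_load_byte(const uint32_t *mem, uint32_t addr) {
    return (uint8_t)(mem[addr >> 2] >> be_shift(addr));
}

/* Byte store on a word-only target: load word, insert data, store word. */
void emu_store_byte(uint32_t *mem, uint32_t addr, uint8_t v) {
    uint32_t w = mem[addr >> 2];
    w &= ~(0xffu << be_shift(addr));             /* clear the target byte */
    w |= (uint32_t)v << be_shift(addr);          /* insert the new data */
    mem[addr >> 2] = w;
}
```

Keeping each guest word's value in source (big-endian) order makes word accesses direct. Only sub-word accesses pay for the byte-lane computation. On a little-endian host, that computation amounts to XOR-ing the low two address bits with 3.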