These Class Notes were uploaded by Louisa O'Kon on Sunday, November 1, 2015, and have received 6 views since upload. They cover ECEN 5253 at Oklahoma State University, taught by Staff in Fall. For similar materials see /class/232914/ecen-5253-oklahoma-state-university in ELECTRICAL AND COMPUTER ENGINEERING at Oklahoma State University.

ECEN 5253 Digital Computer Design

Virtual Memory

All of the addresses generated by the processor must be found in physical main memory, at the bottom of the cache memory hierarchy. Main memory size on current systems is usually limited (to about 1 GB currently) by the expense of enlarging the memory. Two problems with a finite amount of main memory can be handled with a virtual memory system design:

1. Programs can be bigger than the available main memory (no matter how big memory is, someone will want more). Without a virtual memory, programs must be divided into pieces called segments that fit into physical memory. The segments are designed by the programmer to be independent, so that they do not have to be in memory at the same time. The programmer has to write code to bring the segments in from disk to main memory, overlaying the segments as needed to fit into the available memory space.

2. Memory must be shared between different processes: more than one process must be in memory at the same time for context switching to be efficient. The operating system assigns different parts of memory to each process. The program code for each process must be relocatable, to allow the operating system to put the process code anywhere at any time.

To solve these problems in a manner transparent to users, the address generated by the processor (the logical address or virtual address) is translated by the virtual memory into a physical address that can be anywhere in main memory (fig. 7.19, p. 512). When there is not enough main memory, disk storage is used to store main memory contents until needed. In this way each process can generate addresses for a virtual machine consisting of the entire address space. With the following correspondence in terminology, virtual memory acts very much like another level in the cache hierarchy, with disk as the next lowest level of memory (see fig. 7.22, p. 518):

    cache     <-->  virtual memory
    miss      <-->  page fault / address fault
    block     <-->  page / segment

[Figure: physical memory holding variable-sized segments with unused gaps between them (external fragmentation), versus fixed-size pages whose unused tails waste space (internal fragmentation).]
Virtual Memory -- December 8, 2005 -- page 1 of 11

Information is transferred between disk and main memory either in variable-sized segments or fixed-size pages. With segments, external fragmentation limits memory usage efficiency. With pages, internal fragmentation is a problem.

Program Segments

It has become common practice to divide program code into segments by function. These segments are designed with memory protection in mind rather than overlaying. Four common types of segments are:

1. code -- fetch an instruction
2. absolute data -- accessed with absolute addressing mode
3. stack data -- accessed with a stack (base pointer) register
4. data -- accessed any other way

The point of these segments is to restrict the ways different segments can be used. It is taken to be an unintended error, for example, to write data into the program segment or to fetch instructions from one of the data segments. Many systems support the user defining as many segments as he/she likes, and also allow the user to control the access to each segment by setting protection bits for each segment.

Segmented Virtual Memory

Typical hardware support required for virtual memory with variable-length segments is shown below.

[Figure: the virtual address selects one of four segment registers (code, absolute, stack, data); an adder combines the segment base with the offset to form the physical address, a comparator against the segment length raises an address fault, and read/write protection bits raise a protection fault.]

Intel processors allow a much larger number of segments than the 4 shown above. The segment portion of the virtual address selects a segment descriptor from an area of memory called the segment descriptor table. Segment descriptors take the place of the segment registers. Intel processors also have an elaborate protection scheme that allows user processes to call the operating system without using interrupts. This has advantages for pipelined processors.
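The segment translation just described (select a segment register, add the offset to the base, compare against the segment length, check the protection bits) can be sketched in software. The segment table values below are hypothetical, chosen only for illustration; a real MMU performs these checks in hardware, in parallel with the access:

```python
# Sketch of segmented address translation with hypothetical segment values.
# Each segment register holds a base address, a length (limit), and protection bits.
SEGMENTS = {
    "code":     {"base": 0x10000, "limit": 0x4000, "prot": {"r", "x"}},
    "absolute": {"base": 0x20000, "limit": 0x1000, "prot": {"r", "w"}},
    "stack":    {"base": 0x30000, "limit": 0x2000, "prot": {"r", "w"}},
    "data":     {"base": 0x40000, "limit": 0x8000, "prot": {"r", "w"}},
}

def translate(segment, offset, access):
    """Translate (segment, offset) to a physical address, or raise a fault."""
    seg = SEGMENTS[segment]
    if offset >= seg["limit"]:      # comparator: offset beyond segment length
        raise MemoryError("address fault")
    if access not in seg["prot"]:   # protection bits: e.g. no writes to code
        raise MemoryError("protection fault")
    return seg["base"] + offset     # adder: base + offset

print(hex(translate("code", 0x0123, "x")))   # 0x10123
```

Note that writing to the code segment or fetching past the segment limit both raise a fault here, matching the "unintended error" cases described above.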
Memory Paging

The difficulty of dealing with variable-sized segments has led most hardware manufacturers to divide memory into fixed-size pages, usually several KB in length. Hardware support for a simple paged virtual memory is shown in fig. 7.21, p. 517. Note that no adder is required, since pages always begin at an address that ends in all 0's. No comparator is needed, since all pages are the same length.

The total amount of memory taken up by the page tables must be kept to a reasonable size. The memory address, consisting of n_a bits, is divided into two fields: n_v bits in the virtual page number and n_o bits in the offset, where n_v + n_o = n_a.

    | virtual page number (n_v bits) | offset (n_o bits) |

    number of words in virtual memory = 2^n_a = 2^n_v x 2^n_o
    number of pages in virtual memory = 2^n_v
    size of a page in memory = 2^n_o words

Since there must be one page table entry (PTE) for each page in the memory, the page table itself looks like the following, where n_p is the total number of bits in a page table entry:

    PTE = | v | physical page | protection |    (n_p bits)

For large memories the page tables can get quite large. For example, if the page size is 8K bytes (DEC Alpha AXP 21064), then n_o must be 13 (8K = 2^13). If the address is 43 bits, then n_v must be 30 (43 - 13) and there will be 2^30 = 1G PTEs. The size of the PTE depends on the maximum number of physical pages implemented in memory; around 8 bytes is usually sufficient for the physical memory address and the protection bits. This gives

    page table size = no. of pages x PTE size = 1G x 8 bytes = 8 GB

Thus just the page table would be about as large as, or larger than, a typical physical memory.

Multi-level Paging

Multilevel paging can reduce the total size of the page table. Consider the following three-level paging scheme with the same numbers as the previous example. The original logical page number is divided into three fields, each with its own page tables. The page table register points to the level 1 page table. The level 1 and 2 page
table entries contain the physical page number of the next-level page table. The level 3 page table contains the physical page number of the desired memory location. The lowest three bits of the PTE address are zero, since each PTE is 2^3 = 8 bytes in length.

    | level 1 (10) | level 2 (10) | level 3 (10) | offset (13) |    (43-bit logical address)

[Figure: the page table register points to the level 1 page table; each level 1 PTE points to a level 2 page table, each level 2 PTE points to a level 3 page table, and the level 3 PTE supplies the 30-bit physical page number that is combined with the offset to form the physical address.]

The logical page address has been divided into three 10-bit fields so that the page tables for each of the three levels will be the same size:

    level 1 PTEs = level 2 PTEs = level 3 PTEs = 2^10 = 1K
    PTE size = 8 bytes, therefore page table size = 1K x 8 bytes = 8KB

A page table fits exactly into one page, which simplifies the task of managing the page tables. A single page table is much smaller than the 8GB page table with single-level paging. Unfortunately, there is now more than one page table. There is one page table at the first level, but each level 1 PTE points to a different level 2 page table. There may be as many as 2^10 level 2 page tables, and for each level 2 PTE there can be as many as 2^10 level 3 page tables. The total memory taken up by page tables can be as much as

    memory for page tables = tables x table size = (1 + 2^10 + 2^10 x 2^10) x 8KB ~ 1M x 8KB = 8GB

which does not save space. However, it is not necessary for all of the level 2 and level 3 page tables to be present if the program segments do not use all of the address space. For example, a program can fit into a single level 3 page table if it is shorter than 2^10 x 2^13 = 8M bytes. Having a single level 1 page table, a single level 2 page table, and a single level 3 page table would require only 24KB, which is acceptable. Although the maximum amount of memory is not reduced, clearly the average amount of memory used by page tables is reduced by using multilevel paging.

This design has several advantages:

1. Page tables fit naturally into the memory page.

2. The physical address is
generated by concatenation, not addition, which is much faster.

3. The physical address in the PTE is concatenated (not added, as with segments) with a field in the virtual address to produce an address for the next-level page table, which allows the page tables to be anywhere in memory.

4. The disk address can be stored in the PTE of a page not in memory (an invalid page).

5. There are protection and flag bits available for each page, including the page tables themselves. In addition to read, write, and execute protection, additional flag bits usually include a valid-page bit, which indicates whether the page is in memory (v = 1) or on disk (v = 0). Other flag bits might indicate whether the page is uncacheable (blocks from this page always generate a cache miss) or unpageable (cannot be swapped back out to disk while the process is running).

Translation Lookaside Buffer

A paged main memory requires the use of one or more page tables for each memory access. In our previous example, three page tables are read for each main memory access. This seems to mean that a total of four main memory operations (three page table reads and then the data transfer) are needed each time there is a cache miss. Note that these extra memory operations to read the page tables are done by the Memory Management Unit (MMU); the CPU just generates the virtual address. Also, these extra main memory cycles only affect the cache fill time T_fill, and it is still possible to have a short average memory access time T_mem if the cache hit probability is high.

The time for cache fill, T_fill, can be considerably reduced if the page tables themselves can be in cache. The page tables would not be in the same cache as the instruction or data cache, since those are designed to respond to addresses generated by the CPU, not the MMU. Furthermore, it is not necessary to store all of the page tables in the cache. All that is needed is a translation between the main memory page
number in the virtual address (called the virtual page number, or VPN) and the physical page number (PPN). This cache of translations between the VPN and PPN is called a translation lookaside buffer (TLB); see fig. 7.23, p. 522. The TLB is like a single-level page table kept permanently in cache, but there is not enough room in the TLB to store all of the single-level page table entries for all of main memory.

On a cache miss, if one gets a TLB hit, then it is not necessary for the MMU to read the page tables in main memory to find the correct memory location for the CPU. A TLB miss is the only time that the cache fill time would be increased by reading the page tables. A high TLB hit probability would almost eliminate the added delays of reading the page tables. A very high TLB hit rate is achieved with fairly small TLBs (~1K entries).

Conflict misses in the TLB are minimal if the VPNs generated by the processor are random. Unfortunately, the low bits of the VPN should be random but are not, because of the way addresses are assigned to the various segments of the program. Lower miss rates are achieved if the bits in the VPN are hashed (mixed up) to generate a pseudorandom cache address at which to store the PPN entry. The hashing function randomizes TLB addresses to minimize contention for the same locations in the TLB. The hashing function must be implemented in hardware to be very fast.

When a context switch occurs, all of the TLB entries for the current process must be flushed, just as the cache must be flushed. However, it is not necessary to flush those page translations that correspond to operating system addresses. It is common to use two different TLBs, one for the current process and one for the operating system. Then only the TLB for the process needs to be flushed when doing a context switch.

Cache with a TLB

The question we have avoided up to now is what address is used for the cache: the virtual address (VA) or the physical address (PA)?

1. Put the
TLB in series with the data cache and store PAs as the data cache tag, as in fig. 7.24, p. 525.

PA tag advantages: There is a one-to-one correspondence between cache and main memory; a block in main memory is represented by only one block in cache. If data in cache is changed (CPU write), it is not necessary to worry about updating other blocks in cache that correspond to the same main memory location.

PA tag disadvantages: The cache must wait for the TLB to provide the physical address. This puts the TLB on the critical timing path that determines the processor clock period. This disadvantage can be overcome by putting the TLB in a separate pipeline stage of a superpipelined processor.

2. Put the TLB and data cache in parallel, store VAs in the data cache tag, and use an RTB to remove alias VAs from the data cache.

[Figure: the CPU's VA (VPN + offset) indexes the data cache directly while the TLB translates in parallel; on a miss, the reverse translation buffer replaces any stored alias VA.]

VA tag advantages: If the VA is used to address the cache, the cache does not wait for the TLB. The Reverse Translation Buffer (RTB) does not have a significant impact on memory delay, since the RTB is used only on a cache miss.

VA tag disadvantages: Unfortunately, there is not necessarily a one-to-one correspondence between VAs and PAs. For example, the operating system and the current process may use different VAs to refer to the same physical address space. The duplicate VAs are called aliases. When a CPU write modifies data in the cache, data consistency requires that cache locations corresponding to all aliases also be updated. One way to do this is to allow only one valid alias in cache at a time, and thus avoid having to update aliases. A reverse translation buffer (RTB) is used to store the VA corresponding to each PA. When a PA is generated by the TLB on a cache miss, the RTB is checked to see if another VA already corresponds to this PA. If so, then the old VA entry is invalidated before the new one is read into data
cache. Unfortunately, the RTB would need an entry for each main memory location (each PA), which would be too large. The VA cache is not commonly used because of the difficulty of removing aliases.

Memory Design Example

The memory hierarchy for the DEC Alpha AXP 21064 is shown below. Note the following features of the design:

- The CPU generates a 43-bit virtual address and uses 64-bit instruction/data words.

- Memory page size is 8KB, which means that the least significant 13 bits of the address are the page offset and the most significant 30 bits are the virtual page number.

- There is a separate on-chip TLB for instructions (ITLB) and data (DTLB). The ITLB (12 entries) can be smaller than the DTLB (32 entries) because of the greater sequentiality of instructions. The ITLB and DTLB are small enough to be implemented as fully associative caches, eliminating conflict misses without the use of a hashing function for the TLB address. The TLBs translate the 30-bit virtual page number to a 21-bit physical page number. With 13 bits for the page offset, this gives a 34-bit physical address, sufficient for a 16GB byte-addressable physical memory.

- There is a separate on-chip direct-mapped instruction cache (ICACHE) and data cache (DCACHE) that use physical addresses provided by the ITLB and DTLB, respectively. The caches are identical in size, with 32-byte blocks and 256 lines, for a total capacity of 8KB each. The on-chip caches are small enough that the 8-bit cache index can be determined entirely from the high-order 8 bits of the 13-bit page offset (the lower 5 bits are the block offset). This means that the caches do NOT wait for the TLB to provide the physical page number, since the page offset is the same in the physical and virtual address. The cache provides read data at the same time that the TLB provides the physical page number used to compare with the 21-bit tag in the cache.

- An on-chip prefetch buffer is provided for the ICACHE, consisting
of a single block. A 256-bit bus connects the prefetch buffer to the ICACHE, so that the prefetched block can be loaded in a single clock cycle. The prefetch buffer tag size is 29 bits (34 bits of physical address less the 5 bits for the block offset).

- The DCACHE uses a write-through strategy with a multi-block write buffer. On a write hit, data is written to the write buffer and a delayed write buffer. On a write miss, data is written to the write buffer only (no write allocate). Since blocks are stored as one entry in the write buffer, the write buffer tag size is also 29 bits.

- Although any off-chip level 2 cache could be used, the Alpha is designed to work with a direct-mapped cache with 29 bits of address and 256 bits (32 bytes, or 1 block) of data or instructions at each location. Although the block size is 32 bytes (256 bits), the external memory bus is only 128 bits (16 bytes), so it takes two bus cycles to send the complete block.

[Figure: block diagram of the 21064 memory hierarchy. The CPU sends the 43-bit virtual instruction address to the fully associative 12-entry ITLB and the virtual data address to the fully associative 32-entry DTLB; each TLB yields a 21-bit physical page number (address bits 33..13). The direct-mapped 256-line ICACHE and DCACHE are indexed by page-offset bits 12..5, hold 256-bit blocks, and compare physical tags (bits 33..13). A 1-block fully associative prefetch buffer feeds the ICACHE, and a 4-entry fully associative write buffer plus a 1-entry delayed write buffer sit between the DCACHE and the off-chip interface.]

Cache Controller

In the event of a cache miss or TLB miss, the processor must be stalled while the cache
fill sequence takes place or the page tables are read. Even under normal conditions (TLB hit and cache hit), extra hardware must check for access privileges and that there is a valid page in memory (it might have been swapped out to disk). This work is done by the cache controller hardware.

The cache controller, along with the on-chip cache, is often called the Memory Management Unit (MMU). The MMU includes all of the hardware necessary to fool the processor into thinking that it has a large, very fast memory. The MMU does this by stalling the processor when necessary while the cache controller in the MMU loads the TLB and caches.

The cache controller is usually implemented as a small state machine (Moore machine). A typical flow chart for the design of the controller state machine is shown in fig. 7.25, p. 526. Part of the work can be done by the processor executing protected code in a special read-only memory (ROM). Doing this requires interrupting the processor, as discussed in a previous section.

Memory System Examples

A photograph of the AMD Opteron (same instruction set as the Intel Pentium IV) is shown in fig. 7.33, p. 546. Note the huge L2 on-chip cache, the smaller L1 instruction and data caches, and the relatively small area for the execution unit. Both of these processors are superscalar processors, which will be discussed next semester. The on-chip TLB organization of the Intel Pentium IV and the AMD Opteron is compared in fig. 7.34, p. 547, and the on-chip cache organization is compared in fig. 7.35, p. 548. Finally, the on-chip memory systems of several processors are compared in fig. 7.36, p. 553.

Floating Point Processing Unit

The floating point unit contains hardware for arithmetic operations on numbers in floating point format instead of integer format. The increased transistor count in modern integrated circuits makes it possible to include the floating point unit as part of the processor.

Representation of Floating Point Numbers

Floating
point numbers are the binary analogues to scientific notation with decimal numbers. For example, 5.12 x 10^-7 uses a significand of 5.12 and an exponent of -7 to represent the number 0.000000512.

One important property of scientific notation and floating point numbers is that the representation is not unique: 0.512 x 10^-6, 5.12 x 10^-7, and 51.2 x 10^-8 all represent the same number. To make scientific notation unique, the number must be normalized by putting the decimal point to the right of the first nonzero digit in the significand. Thus 5.12 x 10^-7 is the only properly normalized representation.

A normalized binary floating point number will always have a 1 as the first digit of the significand, since there is only one nonzero digit in binary. For example, 0.00101_2 is represented as 1.01_2 x 2^-3 in normalized binary floating point.

The IEEE standard 754-1985 has become the industry standard for floating point numbers. It defines a 32-bit format for single precision representation and a 64-bit format for double precision representation:

    single:  | s (bit 31) | eb (bits 30..23) | f (bits 22..0) |    value = (-1)^s x 1.f x 2^(eb - 127)
    double:  | s (bit 63) | eb (bits 62..52) | f (bits 51..0) |    value = (-1)^s x 1.f x 2^(eb - 1023)

The significand is represented in sign-magnitude form, with the first bit not represented explicitly since it is always 1. Only the binary fraction part f of the significand is stored explicitly in the number format. Unfortunately, the choice of sign-magnitude for the significand means that the representation of zero is redundant (+0 and -0).

The exponent is represented in biased form. To get the correct value of the exponent e, take the unsigned biased exponent eb and subtract the bias (either 127 or 1023). Biasing is used so that the smallest (most negative) exponents will have eb = 0...0, which makes it more natural to represent 0.0 as a word of all 0's. Here are several other special representations defined by the IEEE floating point format:

    s      eb       f
    0/1    0...0    0...0    zero (+-0)
    0/1    0...0    != 0     denormalized small number: +-0.f x 2^Emin
    0/1    1...1    0...0    infinity (+-infinity)
    0/1    1...1    != 0     NaN (not a number)

Note that Emin = -126 for single precision and Emin = -1022 for double precision.

Floating Point Multiplication

Multiplication in floating point is straightforward: multiply the significands and add the exponents,

    (s1 x 2^e1) x (s2 x 2^e2) = (s1 x s2) x 2^(e1 + e2)

However, the implementation details are somewhat more complicated, as shown in the figure below.

[Figure: block diagram of a floating point multiplier: the sign bits, biased exponents (eb1, eb2), and fractions (f1, f2) are processed separately; the exponents are added, the significands multiplied, and the result normalized under control logic.]

Floating Point Processing Unit -- August 27, 2008 -- page 2 of 6
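The single-precision field layout above can be checked numerically. This sketch packs a value into the 32-bit IEEE 754 encoding with Python's standard struct module and pulls out the sign, biased exponent, and fraction fields; the test value 0.15625 is the 0.00101_2 = 1.01_2 x 2^-3 example from the text:

```python
import struct

def fields_754_single(x):
    """Return (sign, biased exponent, fraction) of x in IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit image of the float
    s  = bits >> 31               # bit 31: sign
    eb = (bits >> 23) & 0xFF      # bits 30..23: biased exponent
    f  = bits & 0x7FFFFF          # bits 22..0: fraction (hidden leading 1 not stored)
    return s, eb, f

# 0.00101_2 = 1.01_2 x 2^-3, so eb = -3 + 127 = 124 and f = .01_2 = 0x200000
s, eb, f = fields_754_single(0.15625)
print(s, eb, hex(f))   # 0 124 0x200000
```

Reconstructing the value as (-1)^s x (1 + f/2^23) x 2^(eb - 127) = 1.25 x 2^-3 = 0.15625 confirms the format equation given above.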

