CS439 Midterm 2
This 23 page Study Guide was uploaded by Abelardo Notetaker on Sunday February 21, 2016. The Study Guide belongs to C S 439 at University of Texas at Austin taught by Norman, A in Spring 2016. Since its upload, it has received 106 views. For similar materials see PRINCIPLES OF COMPUTER SYS-C S in Computer Science at University of Texas at Austin.
Memory Overview
Virtual address space: the collection of addresses that a process can access

Multiple Programs Share Memory: Requirements
1. Transparency:
● Multiple processes must be able to coexist in memory
● No process is aware that memory is shared
● Processes don't care which segment of memory they get
2. Safety:
● Processes must not corrupt each other
● Processes must not corrupt the OS
3. Efficiency:
● CPU and memory performance is not badly degraded due to sharing

Relocation
Base address: first/smallest physical address of the process (aka relocation address)
Limit address: largest physical address the process can access
E.g., memory runs from address 0 to 2400, and the OS is loaded into memory from address 2000 to 2400:
Maximum address = memory size – OS size = 2400 – 400 = 2000
Process A is loaded into memory from address 0 (base address) to 400 (limit address)
The OS assigns each process its physical memory

Relocation: Two types
Static: The OS adjusts the addresses in a process to reflect its location in memory. Once the process is assigned a place in memory and starts executing, the OS cannot move it.
Dynamic: Hardware adds the relocation register to the virtual address to get the physical address, and in parallel compares the address with the limit register to check that it is within bounds; if the test fails, an exception is raised.

Static relocation: 4 stages in the compilation pipeline: Compilation (turn instructions into assembly and method headers into labels), Assembly (assemble into machine code; pointer math is done here), Linking (add all #include libraries), and Loading (assign and copy into physical memory).

Dynamic relocation:
1. The program gives a logical address to the CPU.
2. The CPU generates the virtual/logical address.
3. The MMU does 2 things at once:
● Checks the virtual address or instruction from the CPU against the limit register. If the address is higher than the bound register, it raises a memory exception (hardware interrupt) that is handled by the OS.
● Adds the base register to the virtual address to get the physical address
NOTE: In dynamic relocation, adding the base register and checking against the limit register are done in parallel.
Note: Even when the address is successfully translated, you are not protected from memory errors: you could still accidentally access memory in your address space that hasn't been initialized yet.
Advantages:
● The OS can easily move a process during execution
● The OS allows processes to grow over time
● Protection is easy, due to the limit/bound register, so other processes' pages cannot be corrupted
● Simple, fast hardware: two registers (base and limit)
Disadvantages:
● Requires contiguous allocation
● Sharing is hard
● Degree of multiprogramming is limited
o All active processes must fit in memory
o Each process is limited to the physical memory size
● Slows down hardware due to the add on every memory reference
● Complicates memory management

Memory allocation policies: Evaluation
Minimize wasted space. Two types of wasted space:
● External fragmentation: unused memory between units of allocation
● Internal fragmentation: unused memory within a unit of allocation

Types of memory allocation policies:
First Fit: To allocate n bytes, use the first available free block of size n bytes or larger.
Requirements: free block list sorted by address; allocation requires a search for a suitable block; deallocation requires a check for merging with adjacent free blocks.
Advantages:
● Simplest implementation
● Tends to produce larger free blocks toward the end of the address space
Disadvantages:
● Slow allocation
● External fragmentation
Best Fit: To allocate n bytes, use the smallest available free block of size n bytes or larger.
Goal: avoid fragmenting big free blocks, to minimize the size of resulting external fragments.
Requirements: free block list sorted by size; allocation requires a search for a suitable block; deallocation requires a check for merging with adjacent free blocks.
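The fit policies can be compared with a small runnable sketch (worst fit, described next, takes the largest adequate block instead; the block sizes and request size here are invented for illustration):

```python
def first_fit(free_blocks, n):
    """Return index of the first free block of size >= n, or None."""
    for i, size in enumerate(free_blocks):
        if size >= n:
            return i
    return None

def best_fit(free_blocks, n):
    """Return index of the smallest free block of size >= n, or None."""
    candidates = [(size, i) for i, size in enumerate(free_blocks) if size >= n]
    return min(candidates)[1] if candidates else None

def worst_fit(free_blocks, n):
    """Return index of the largest free block of size >= n, or None."""
    candidates = [(size, i) for i, size in enumerate(free_blocks) if size >= n]
    return max(candidates)[1] if candidates else None

free = [300, 600, 350, 200, 750]   # free block sizes, in address order
print(first_fit(free, 350))        # -> 1 (600 is the first block >= 350)
print(best_fit(free, 350))         # -> 2 (350 fits exactly, no fragment)
print(worst_fit(free, 350))        # -> 4 (750 leaves the largest remainder)
```

Note how best fit leaves no external fragment here, while first fit leaves 250 bytes and worst fit leaves 400 bytes; a real allocator would then split the chosen block and coalesce on free.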
Advantages:
● Works well when most allocations are of small size
● Relatively simple
Disadvantages:
● External fragmentation
● Slow deallocation (since the free block list is sorted by size, you must manually search the list for adjacent free blocks, merge them, and reinsert the result in the right location)
Worst Fit: To allocate n bytes, use the largest available free block of size n bytes or larger.
Goal: avoid having too many tiny fragments.
Requirements: free block list sorted by size; allocation is fast; deallocation requires a check to see if the freed partition can be merged with any adjacent free partitions.
Advantages:
● Works best if allocations are of medium size
Disadvantages:
● External fragmentation
● Slow deallocation (same reason as best fit)
● Tends to break up large free blocks, so that large partitions cannot be allocated

Eliminating fragmentation
Compaction: relocate programs to coalesce holes
● Removes gaps between programs in memory
Swapping: preempt processes (roll them out to disk) and reclaim their memory
● Allows the total amount of memory in use to exceed the total amount of physical memory.
● If a program hasn't been used in a while, it can be rolled out to disk to allow other programs to be loaded into memory and run.
● Adds extra state to the process life cycle. Processes that are swapped out of memory are suspended. Processes can be suspended from the blocked state or the ready queue. They are not resumed until added to the ready queue again.

Virtual Memory: Mechanisms
Overlays
● Program divided into pieces/overlays
● An overlay manager swaps overlays in and out
● Originally the programmer had to manually divide the program into pieces; this was later automated
Virtual Memory
● Process's view of memory
● Can be larger than the size of physical memory
● Only portions of the virtual address space are in physical memory at any one time
● Automatically divides the process address space into pages
Paging
1.
Divides a process's virtual address space into fixed-size pages
2. Stores a copy of that address space on disk
3. Views physical memory as a series of equal-sized page frames
4. Moves the pages into frames in memory
5. Manages the pages in memory

Page Frames
Physical memory is partitioned into equal-sized frames. A frame is accessed using its frame number f and its frame offset o. There are f_MAX frames and o_MAX bytes per frame. A physical address is divided into log2(f_MAX) bits for the frame number and log2(o_MAX) bits for the offset. The number of bits depends on the system.

Virtual Memory
A process's virtual address space is partitioned into equal-sized pages; |page| = |frame|. The system's page size is equal to its frame size. A page is accessed using its page number p and its page offset o. There are p_MAX pages and o_MAX bytes per page. A virtual address is divided into log2(p_MAX) bits for the page number and log2(o_MAX) bits for the offset.

Pages map to frames; not all pages are mapped at all times.
Page table: maps virtual pages to physical frames, and makes it possible to locate frames even when they are not allocated contiguously in memory.

Steps of virtual-to-physical memory translation
1. The program gives a virtual address to the CPU to translate
2. The MMU splits the virtual address into its page number and offset
3. Since the offset is the same in virtual and physical memory, it is passed along unchanged
4. The page number is translated into a frame number
● The page number is the index of the correct entry in the page table
● If the page exists in memory, the frame number is sent along
● If it does not exist in memory but exists on disk, the corresponding page is moved from disk into physical memory and its frame number is recorded
5.
Append the offset to the end of the frame number to get the full physical address

Advantages:
Reduces or eliminates external fragmentation
Gives processes the ability to grow easily
Allows processes to use more memory than is physically available
Easy to allocate and deallocate memory to/from processes
Ability to easily share memory between processes (memory used by a process no longer needs to be contiguous)
Protection (1 page table per process keeps different processes from corrupting each other's page tables and accessing each other's memory)

Page Table Entry Content
Flags: dirty bit (set whenever the page is referenced by a write operation), resident bit (set if the page has been allocated in memory), clock/reference bit (set whenever the page is referenced for any reason, whether load or store)
Frame number
There is a page table base register (PTBR) that holds the base address of the page table.

E.g.: Consider a paging system with 16 pages and a page size of 256 bytes. The system has 1024 bytes of physical memory.
Physical address = log2(physical memory size) = log2(1024) = 10 bits
Page number = log2(p_MAX) = log2(16) = 4 bits
Offset = log2(page size) = log2(256) = 8 bits
Virtual address = bits for page number + bits for offset = 4 + 8 = 12 bits
Frame size = page size = 256 bytes

E.g. 2: System with a 16-bit address, 32KB of physical memory, and 1024-byte pages.
Physical address = log2(32 * 2^10) = 15 bits
Virtual address = 16 bits, because it is a 16-bit address system
Offset = log2(page size) = log2(1024) = 10 bits
Frame number = bits for physical address – bits for offset = 15 – 10 = 5 bits
Page number = bits for virtual address – bits for offset = 16 – 10 = 6 bits

Performance Issues: Speed
Problem: virtual memory references require two memory references: one
access to get the page table entry, one access to get the data.
The Translation Lookaside Buffer (TLB) in the Memory Management Unit (MMU) sits between the processor and the system bus for physical addresses.
Note: the TLB is a cache for these translations.

Improving speed: TLB
Cache recently accessed page-to-frame translations in a TLB
● On a TLB hit, the physical page number is obtained in 1 cycle
● On a TLB miss, the translation is loaded into the TLB
● Has a high hit ratio (it exploits locality within the system)
● The system simultaneously sends the page number to both the TLB and the page table.
o On a TLB hit, stop the search in the page table and send the frame number along
o On a TLB miss, look in the page table for the frame number, send the frame number along, and update the TLB

Performance Issues: Space
Two sources: data structure overhead (the page table) and fragmentation (how large should a page be?)
The page table can be very large.

Dealing with large page tables: Multi-Level Paging
Add additional levels of indirection to the page table by subdividing the page number into k parts. This creates a k-depth "tree" of page tables. Each level points to the next-level page table, except the last-level page table, which holds the bits/frame number usually associated with a page table entry.
Multi-level page tables save space because you only have to allocate the page-table levels that you need. Note: you will always have exactly one first-level page table.

Example (hw6 question 1): 32-bit machine, address divided into 4 segments: 10-bit, 8-bit, 6-bit, and 8-bit offset. Uses a 3-level page table.
1. Page size = 2^(offset bits) = 2^8 = 256 bytes
2.
Memory size = 256KB
● Pages = memory size / page size = 256K / 256 = 1024 pages
● Level 3 has 2^6 = 64 entries
● Number of level-3 tables to map 1024 pages = ceiling(1024 / 2^6) = 16
● Level 2 has 2^8 = 256 entries
● Level 2 needs just 1 table
● Level 1 needs just 1 table
● Page table size = (number of tables at level 1 * entries per level-1 table + number of tables at level 2 * entries per level-2 table + … + number of tables at level n * entries per level-n table) * 4
● Page table size = (1 * 2^10 + 1 * 2^8 + 16 * 2^6) * 4 = 9216 bytes

Example 2: Two-level paging, P1 = 3 bits, P2 = 4 bits
1. The first 3 bits index the first-level page table: their value is added to the PTBR.
2. The next 4 bits index the second-level page table. From the first-level page table entry we get the address of the beginning of the second-level page table; we then add the value of the second-level bits of the virtual address to this address. This gives us the address of the corresponding entry in the second-level page table. Now we can read the relevant bits and frame number from the right second-level page table entry.
3. The offset bits are passed along and appended after the frame number bits.
The TLB is very important now, because each page table lookup requires a 2-step lookup instead of a single one.

Inverted Page Table
● Each frame is associated with an entry
● As the CPU generates virtual addresses, where is the physical page? This would require searching the entire page table. The TLB could help, but the TLB is limited in size.

Hashed Inverted Page Table in Use
● Uses a hash table: hash page numbers to find the corresponding frame number
● The page frame number is not explicitly stored (1 frame per entry)
● Protection, dirty, used, and resident bits are also in the entry
● The hash function takes the PID of the currently running process and the page number. The result of the hash function is added to the PTBR to get the address of the right page table entry.
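The two-level walk in Example 2 can be sketched as follows. The example only fixes P1 = 3 and P2 = 4 bits, so the 9-bit offset and the table contents below are invented for illustration:

```python
# Assumed split: 3-bit level-1 index, 4-bit level-2 index, 9-bit offset.
P1_BITS, P2_BITS, OFFSET_BITS = 3, 4, 9

def translate(vaddr, level1):
    """Walk a two-level page table (nested dicts) to a physical address."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    p2 = (vaddr >> OFFSET_BITS) & ((1 << P2_BITS) - 1)
    p1 = vaddr >> (OFFSET_BITS + P2_BITS)
    level2 = level1[p1]            # first-level entry -> second-level table
    frame = level2[p2]             # second-level entry -> frame number
    return (frame << OFFSET_BITS) | offset   # append offset to frame number

# level-1 table: p1 -> second-level table (p2 -> frame number); invented
page_tables = {0: {0: 5, 1: 9}, 1: {0: 2}}
print(hex(translate(0b000_0001_000000011, page_tables)))  # -> 0x1203
```

Here p1 = 0 selects the first second-level table, p2 = 1 yields frame 9, and the 9-bit offset 3 is appended, giving 9 * 512 + 3 = 0x1203.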
● Once at the right page table entry, the PID and page number saved in the entry are checked against the given PID and page number. If they do not match, the frame isn't holding information for that page, and the page needs to be swapped in. If they match, the right frame has been found and the frame number is sent along.

Why use hashed/inverted page tables?
● Forward-mapped page tables don't scale to larger virtual address spaces.

The BIG Picture
1. The CPU generates a virtual address
2. The virtual address is used to reference a virtually addressed cache. If there's a match, we're finished
3. Else, the address is used to access the TLB; if there is a match, the resulting physical address is used to access the physically addressed cache
4. Else, the translation must come from the page table, which is located using the PTBR and then indexed using the page number. The resulting frame number is appended with the offset and used to access the physically addressed cache.
5. If there is a match in the physically addressed cache, we're finished. Otherwise, we must access main memory.

● A process's Virtual Address Space (VAS) is its context
o Contains its code, data, stack
● Code pages are stored in a user's file on disk
o Some are currently residing in memory; most are not
● Data and stack pages are also stored in a file
o Not visible to the user
o The file only exists while a program is executing
● The OS determines which portions of a process's VAS are mapped in memory at any one time

Page Faults
● References to non-mapped pages
● Page fault handling steps:
1. The processor runs the interrupt handler
2. The OS blocks the running process
3. The OS starts a read of the unmapped page
4. The OS resumes/initiates some other process
5. The read of the page completes
6. The OS maps the missing page into memory
7. The OS restarts the faulting process

Paging Policy
● What is a good page size?
● How many processes are too many?
● When to load a page?
o Demand paging: the OS loads a page the first time it is referenced
o Prepaging: the OS guesses in advance which pages the process will need and preloads them

Initializing Memory
Prepaging
1. A process needing k pages arrives
2. If k page frames are free, the OS allocates all k pages to the free frames
3. The OS puts the first page in a frame and puts the frame number in the first entry in the page table, and so on
4. The OS marks all TLB entries as invalid (flushes the TLB)
5. The OS starts the process
6. As the process executes, the OS loads TLB entries as each page is accessed, replacing an existing entry if the TLB is full
Demand Paging
1. A process arrives
2. The OS stores the process's VAS on disk and does not put any of it into memory
3. The OS marks all TLB entries as invalid (flushes the TLB)
4. The OS starts the process
5. As the process executes:
a. Pages are faulted in as they are needed
b. The OS loads TLB entries as each page is accessed, replacing an existing entry if the TLB is full

Performance of Demand Paging
It is possible for a process to access a new page of memory with each instruction, but processes tend to exhibit locality of reference:
● Temporal locality: if a process accesses an item in memory, it will tend to reference the same item again soon
● Spatial locality: if a process accesses an item in memory, it will tend to reference an adjacent item soon

Page fault steps when memory is full. The OS:
1. Selects a page to replace
2. Invalidates the old page in the page table
3. Starts loading the new page into memory from disk
4. Context switches to another process while the I/O is being done
5. Gets an interrupt that the page is done loading into memory
6. Updates the page table entry
7.
Continues the faulting process

Page Replacement Algorithms
Goal: reduce the frequency of page faults
FIFO
● Throw out the oldest page
● Implemented with a single pointer
Optimal
● Look into the future and throw out the page that will be accessed farthest in the future
● Keep a list of when each page is needed next; the highest time stamp gets evicted
Least Recently Used (LRU)
● Throw out the page that has not been used for the longest time
● Keep a list of when each page was last used; the lowest time stamp page gets replaced on a page fault
Clock: approximation of the LRU algorithm
● The clock hand points to the oldest page
● The clock hand sweeps over pages looking for one with a reference bit of 0
o Replaces pages that haven't been referenced for one complete revolution of the clock
● Pseudocode for the algorithm:
o func clock_replacement
▪ while (victim page not found) do
● if (reference bit for current page is 0) then
o replace current page
● else
o reset reference bit
● advance clock pointer
▪ end while
o end clock_replacement
Second Chance
● It is cheaper to replace a page that has not been written, since it need not be written back to disk
● Check both the reference bit and the modify/dirty bit to determine which page to replace
o (reference, modify) pairs form classes:
▪ (0, 0): not recently used or modified; replace
▪ (0, 1): not recently used but modified
▪ (1, 0): recently used but not modified
▪ (1, 1): recently used and modified
● Algorithm: the OS goes around at most three times searching for the (0, 0) class:
1. If the OS finds (0, 0), it replaces that page
2. If the OS finds (0, 1), it clears the dirty bit and moves on, but remembers the page is dirty; the page is written back only if evicted
3. For pages with the reference bit set, the reference bit is cleared
4. On the second pass (if no (0, 0) page was found on the first), pages that were (0, 1) or (1, 0) may have changed class
● During the first "sweep" of the hand, the used/reference/clock bit is cleared
● During the second "sweep" of the hand, the dirty/modify bit is cleared

Local vs.
global page replacement
● Local page replacement considers only the pages owned by the faulting process
o Fixed number of pages per process
● Global page replacement considers all pages

How much memory do we allocate to a process? It depends.
The Principle of Locality
o 90% of execution is sequential
o Most iterative constructs consist of a relatively small number of instructions
o When processing large data structures, the dominant cost is sequential processing of individual structure elements

Explicitly Using Locality
● Working set: the set of all pages that a process referenced in the last T seconds
● T is called the window size
● Keeps track of whether or not each page is in memory, instead of keeping track of what each frame is holding
● Keeps track of the "age" of each page in memory
o When a page becomes T + 1 ticks old, it is kicked out of memory, even when no page fault occurs
● If T is too small? Pages will be tossed out too often
● If T is too large? Thrashing
Thrashing
● Occurs when memory is overcommitted and pages are tossed out while they are still in use
● Many memory references cause pages to be faulted in
● To limit thrashing in multiprogramming, we make T much bigger than 2 million instructions

Load Control
● Refers to the number of processes that can reside in memory at one time
● The working set provides implicit load control by only allowing a process to execute if its working set fits in memory
● If the total number of pages needed is greater than the number of frames available, then processes are swapped out to disk
o Put onto swap (aka the paging disk)
o Adds another stage to the process life cycle: suspended
▪ A process can go from any of the other states to suspended, usually blocked or ready
▪ A process can go from suspended to ready

Page Size
● Benefits of small pages: more effective memory use, higher degree of multiprogramming possible
● Benefits of large pages: smaller page tables, reduced I/O time, fewer page faults
● Page sizes are growing slowly because
o Memory
is cheap; page tables could get huge with small pages, and internal fragmentation is less of a concern
o CPU speed is increasing faster than disk speed, so page faults cause a larger slowdown
Note: An application cannot modify its own translation tables, because the page table for a process is in kernel address space.

Summary:
All algorithms do poorly on typical processes if processes have insufficient physical memory.
All algorithms approach optimal as the physical memory allocated to a process approaches the virtual memory size.

Paging: The Cost
● Translating from a virtual address to a physical address is time consuming
● Requires hardware support (TLB) to be decently efficient
● Requires a more complex OS to maintain the page table
The expense of memory accesses and the flexibility of paging make paging cost effective.

Heap Memory Management
● The program/runtime system requests memory from the OS for the heap
● The OS gives memory to a process 1 to k pages at a time
● The runtime system manages the heap memory
● The memory is not returned to the OS until the program ends
What is in the heap? Dynamically allocated program objects and data, allocated using new/malloc. Needed when the required memory size is not known until the program runs.
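Looking back at the page-replacement algorithms, the summary above ("all algorithms approach optimal...") can be made concrete by counting faults. A small simulation of FIFO and LRU, with an invented reference string:

```python
from collections import OrderedDict, deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement."""
    frames, order, faults = set(), deque(), 0
    for p in refs:
        if p in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            frames.remove(order.popleft())   # evict the oldest-loaded page
        frames.add(p)
        order.append(p)
    return faults

def lru_faults(refs, nframes):
    """Count page faults under LRU replacement (OrderedDict keeps recency)."""
    frames, faults = OrderedDict(), 0
    for p in refs:
        if p in frames:
            frames.move_to_end(p)            # touched: most recently used
            continue
        faults += 1
        if len(frames) == nframes:
            frames.popitem(last=False)       # evict least recently used
        frames[p] = None
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # -> 9
print(lru_faults(refs, 3))   # -> 10
```

With 4 frames, this reference string gives FIFO 10 faults, more than with 3 frames (Belady's anomaly), while LRU never gets worse with more frames.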
Need to allocate and deallocate fast.
Two categories of heap memory management
● Explicit memory management
○ The program explicitly manages all of the memory (allocate: malloc, deallocate: free)
○ Anything may or may not be a pointer
● Automatic memory management (garbage collection)
○ The program explicitly allocates memory, but the runtime system manages it
○ Allocation: new
○ Deallocation: none
○ Pointers: the program and runtime system know all pointers
○ The runtime system must discriminate between live (in use) objects and garbage

Explicit Memory Management
User: requests a number of bytes to allocate, and deallocates when needed
Runtime system: receives requests for memory allocation and deallocation, decides where to allocate requests, and, if an allocation doesn't fit, requests more memory from the OS
Runtime requirements:
● Handle arbitrary request sequences (allocated and freed in any order)
● Make immediate responses to requests (cannot buffer/reorder requests)
● Use only the heap (any data structure used is stored in the heap)
● Align blocks
● Not modify allocated blocks (only free blocks can be changed)

Allocation Techniques
1. Bump pointer
○ Contiguous allocation (for all requested blocks)
○ The pointer begins at the start of the heap
○ As requests arrive, bytes are allocated and the pointer is "bumped" past the allocation
○ When memory is deallocated, the gaps stay, because bump-pointer allocation doesn't move memory once it has been allocated
○ Once memory is deallocated, pointers to it should be nulled out; otherwise they still point into the heap and could accidentally be used, whereas a nulled pointer gives a SEGFAULT instead
2.
Free List
○ Divides memory into blocks of some size
○ Maintains a free list
■ Must be stored in the heap
■ Uses a special structure for free blocks
○ To allocate memory, find a block in the free list
■ If the right size does not exist, carve up a bigger piece
○ To deallocate memory, put the memory back on the free list
○ A free block in memory contains
■ A pointer to the next free block
■ The size of the free block
■ Free space (that can be allocated to the user; this is the pointer returned to the user)
○ Circular linked list (may be ordered by address or by size)
○ Choosing a spot:
■ Perfect fit
● Remove the block from the list
● Link the previous and next elements together
● Return the free space pointer
■ Block is too big
● Divide the free block into two blocks
● Keep the first (now smaller) block in the free list
● Allocate the second block to the user
○ Coalesce deallocated memory with neighboring free blocks
○ Performance of best fit:
■ Slow
■ Trouble: free chunks are different sizes
■ Solution: binning
● Divide the list by chunk size
○ Binning strategies
■ Exact fit: have a bin for each chunk size, up to a limit
● Advantages: no search for requests up to that size
● Disadvantages: many bins, each storing a pointer
● EXCEPT FOR: a final bin for all larger free chunks, useful for allocating large amounts of data and for splitting to create smaller chunks
■ Range: have each bin cover a range of sizes, up to a limit
● Advantages: fewer bins
● Disadvantages: need to search for a big enough chunk
● EXCEPT FOR: a final bin for all larger free chunks, useful for allocating large amounts of data and for splitting to create smaller chunks

Deallocation with Free
● Given a pointer by the user
● The free function inserts the block into the list
○ Identify the start of the entry
○ Find the location in the free list
○ Add to the list, coalescing entries if needed
● Coalescing with neighbors
○ Check if the block is contiguous with its upper and lower neighbors
● Correctness of free
○ Free an object too soon: core dump
○ Free an object too late: wasted space
○ Never free: at best waste, at
worst fail

Garbage Collection: Automatic Memory Management
● Less user code compared to explicit memory management
● Eliminates sources of errors
● Less user code to get correct
● Protects against common classes of memory errors
○ No free(), thus no premature free(), no double free(), and no forgetting to free()
● Not perfect; memory can still leak
● Integral to modern OOP languages

Safe Pointers
● Used in managed languages that provide garbage collection
● Programs may not access arbitrary addresses in memory
● The compiler can identify and provide to the garbage collector all the pointers; thus, once garbage, always garbage
● The runtime system can move objects by updating pointers
"Unsafe" languages can do non-moving GC by assuming anything that looks like a pointer is one.

Performance efficiency
● Identifying garbage is hard and expensive
○ Time is proportional to the number of dead or live objects
● Collecting less frequently reduces total time but increases space requirements and pause time (the time when the program is paused while garbage is collected)

Garbage Collection Fundamentals
● Allocation
○ Free list
○ Bump allocation
● Identification
○ Tracing (implicit)
■ Traces pointers between program objects to identify what is reachable in the program
○ Reference counting (explicit)
■ The program keeps a count of how many objects are using a pointer
● Reclamation: taking back the heap
○ Sweep-to-free
○ Compact
○ Evacuate
NOTE: Dead objects cannot be identified directly by the compiler or runtime system.
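Reference counting tracks incoming pointers per object, and a count of zero means garbage. A tiny sketch with an invented object graph shows why a dead cycle is never reclaimed this way:

```python
# Object graph (invented): each heap object lists the objects it points to.
heap = {'A': ['B'], 'B': [], 'C': ['D'], 'D': ['C']}   # C <-> D is a dead cycle
roots = ['A']                                          # only A is reachable code-side

counts = {obj: 0 for obj in heap}
for obj in roots:
    counts[obj] += 1             # count pointers from the roots
for obj, targets in heap.items():
    for t in targets:
        counts[t] += 1           # count heap-to-heap pointers

garbage = [obj for obj, c in counts.items() if c == 0]
print(garbage)   # -> []  even though C and D are unreachable
```

C and D each keep the other's count at 1, so neither ever reaches zero; tracing collectors (next) handle this case.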
Any object the program cannot reach is garbage (aka unreachable objects).
Finding reachable objects
● Reference counting (explicit definition): a reference count of zero means garbage; does not work for cycles
● Tracing: trace reachability from the program roots (registers, stack, static variables); objects not traced are unreachable

Tracing
● Marks the objects reachable from the roots as reachable and then performs a transitive closure over them
● The global variables, stack variables, and registers all point to space allocated on the heap
● Steps:
○ Mark the objects on the heap that are directly pointed to by our variables (the roots of our example)
○ Keep marking any objects that are reachable from those objects in the heap
○ "Sweep" all dead objects: reclaim unmarked objects so the heap memory can be recycled
○ Wait for the garbage collection algorithm to run again

Garbage Collection Algorithms: Evaluation
● Space efficiency
● Efficiency of the allocator
○ Time
○ Locality of contemporaneous objects
● Time to collect garbage

Reclamation
● Non-copying
○ Free-list allocation and reclamation
○ Example: Mark-Sweep
● Copying
○ Uses bump-pointer allocation
○ En masse reclamation
○ Examples: Mark-Compact, Semi-Space

Garbage Collectors
● Mark-Sweep
○ Free list + trace + sweep-to-free
○ Free list organized by size (binning)
○ Grabs free space off the free list until no more memory of the right size is available, which triggers a collection
○ Objects are allocated on the heap
■ More objects allocated on the heap and referenced by the first object
○ The mark phase finds the reachable objects
○ The sweep phase puts unreachable ones on the free list
■ Can be incremental by organizing the heap in blocks and sweeping one block at a time on demand
○ Advantages:
■ Good space efficiency
■ Incremental object reclamation possible
■ Simple, very fast collection
○ Disadvantages:
■ Relatively slow allocation time
■ Poor locality of contemporaneously allocated objects
● Contemporaneous = allocated close to each other in time
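The mark phase described above (a transitive closure from the roots) can be sketched as follows; the object graph is invented, and it includes a dead cycle to show that tracing reclaims what reference counting cannot:

```python
# Object graph: object id -> ids it points to (invented for illustration).
heap = {'A': ['B'], 'B': ['C'], 'C': [], 'D': ['E'], 'E': ['D']}  # D <-> E cycle
roots = ['A']

def mark(roots, heap):
    """Mark everything transitively reachable from the roots."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(heap[obj])   # follow every pointer in the object
    return marked

live = mark(roots, heap)
garbage = set(heap) - live            # the "sweep" would reclaim these
print(sorted(live))     # -> ['A', 'B', 'C']
print(sorted(garbage))  # -> ['D', 'E']  (the cycle is collected)
```

In mark-sweep, the swept objects would go back on the free list; in mark-compact or semi-space, the live set would instead be copied.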
● Mark-Compact
○ Bump allocation + trace + compact
○ Similar to Mark-Sweep
■ Extra reclamation step: copy all objects remaining in the heap to one end of the heap (compact)
■ This step allows mark-compact to use a bump pointer
○ Advantages:
■ Good locality
■ Space efficient
■ The allocator is fast
○ Disadvantages:
■ Total performance is worse than that of mark-sweep
■ The garbage collector has an expensive multi-pass collection
● Slower than mark-sweep
● Semi-Space
○ Bump allocation + trace + evacuate
○ Fast bump-pointer allocation
○ Requires copying collection
○ Must free en masse
○ Reserves half of the heap to copy into, in case all objects are reachable
○ Mark phase
■ Copies each object when the collector first encounters it
● The object is copied into the "to space" in the other half of the heap
■ Installs forwarding pointers
■ Reclaims the "from space" en masse
● The "to space" ends up with only reachable objects in it
● Then the other half is just wiped clean
■ Allocation starts again into the "to space"
○ The "to space" becomes the "from space" as reachable objects are copied into the other part of the heap
○ Advantages:
■ Fast allocation
■ Locality of contemporaneously allocated objects
■ Locality of objects connected by pointers
■ Best performance for large heap sizes
○ Disadvantages:
■ Half of the heap is wasted

Heap Organization
● Young objects die more quickly than old ones
● Organize the heap into young and old spaces; collect young objects preferentially
● When the young space fills up
○ Collect it, copying survivors to the old space
● When the old space fills up
○ Collect both spaces; generalize to m generations

Taxonomy of Design Choices
● Incrementality
○ Bounded tracing time
● Composability
○ Creation of hybrid collectors
○ Example: generational heap with semi-space
● Concurrency
○ Mutator/tracer and GC operate concurrently
● Parallelism
○ Concurrency among multiple GC threads
● Distribution across multiple machines

Architecture of I/O Systems
● Bus for communication between devices and the CPU
● Device port consisting of 4
registers:
○ Status: holds the state of the device
○ Control
○ Data-in
○ Data-out
● Controller: receives commands from the system bus and turns those commands into device actions
● The device itself

Devices
● Characteristics:
○ Transfer unit: character or block
○ Access method: sequential or random
○ Timing: synchronous or asynchronous
○ Shared or dedicated
○ Speed
○ Operation: input, output, or both
● 3 ways of communicating:
○ Polling (check at every polling interval whether the I/O device is idle)
■ Advantages:
● Handles data promptly (required where data is lost if it is not transferred fast enough)
■ Disadvantages:
● Time wasted on the CPU side
● Data may be lost if the device is faster than the polling rate
● Device-bound if the device is slower than the polling rate
○ Interrupts (a command is sent and the OS does other things in the meantime; when the I/O is done, the device raises an interrupt to let the OS know it is done, and the OS starts the next operation for the device)
■ Advantages:
● Efficient use of the CPU after a command is requested
■ Disadvantage:
● An interrupt must be handled for every transfer, which adds overhead
○ Direct Memory Access (the device can write directly to memory; the DMA controller operates the bus and interrupts the CPU when the entire transfer is complete, instead of when each word is ready)
■ Device controller components:
● Buffer
● DMA registers: memory address and count
● Disk controller connects to the drive

Improving Performance:
● Buffering minimizes the time a user process is blocked on a write
● Caching improves disk performance by reducing the number of disk accesses; recently used disk blocks are kept in main memory after the I/O call that brought them into memory completes

I/O Services Provided by the OS
● Uniform naming of files and devices
● Access control (protection)
● Operations appropriate to the devices
● Device allocation and release
● Buffering, caching, and spooling for efficient communication
● I/O scheduling
● Error handling and failure recovery
● Device drivers

Disks (aka Secondary Storage)
Two types:
● Magnetic disks
● Flash memory
Anatomy of a Magnetic Disk:
●
● Sector: smallest contiguous collection of bytes on disk
● Track: a contiguous collection of sectors
● Surface: a concentric collection of tracks; circular in shape
● Platter: thin disk covered in magnetic material
  ○ Each platter has two surfaces: top and bottom
● Cylinder: set of tracks on different surfaces with the same track index
● Head: used to read/write
● Arm: extends the head
● Surfaces are made up of tracks
● Tracks are made up of sectors

Disk Operations
Seek time: heads move to the appropriate track
Rotation time: time for the sector to appear under the head
Transfer time: read/write the sector as it spins by
● Transfer rate is faster on outer tracks than inner tracks

Disk Head Scheduling
● Goal: the OS maximizes disk I/O throughput by minimizing head movement through disk head scheduling
  ○ Seek time takes the longest
● FIFO: schedule head movements as requests arrive in the queue
● Shortest Seek Time First (SSTF): rearrange the queue to minimize the distance the head jumps between requests; sorts by distance
● SCAN/Elevator: move the head in one direction until the end of the disk is reached, then reverse
● LOOK: the head reverses when no more requests exist between the current head position and the approaching edge of the disk
● Circular SCAN (C-SCAN): move the head in one direction until an edge of the disk is reached, then reset to the opposite edge
● C-LOOK: the head is reset when no more requests exist between the current head position and the approaching edge of the disk

Improve performance: Partitioning
● Goal: minimize the largest possible seek time
● Partition: a collection of cylinders
● Each partition is a logically separate disk

Reduce overhead
● Goal: minimize seek time and rotational latency
  ○ Make the disk smaller
  ○ Spin faster
  ○ Schedule the disk head
  ○ Lay out data on disk to reduce stride
  ○ Place commonly used files on outer tracks
  ○ Choose block size carefully
    ■ Too small: low transfer rate (more seeks needed for the same amount of data)
    ■ Too big: leads to internal fragmentation

Flash Storage: Anatomy
● No moving parts
  ○ More resistant to physical damage
  ○ Better random access
  ○ Less power
● Made of NAND flash units
  ○ 1 block = 128 pages

Operations
● Read page
● Write page
  ○ Can only write to an empty page

Note: not very durable; cells stop reliably storing a bit after many erasures, after a few years without power, or after a nearby cell is read many times (read disturb)

Improve durability
● Error-correcting codes
● Management of defective pages/erasure blocks

Approaches to improving performance:
● Reduce the number of times data is copied by caching values in memory
● Reduce the frequency of interrupts by using large data transfers when possible
● Offload computation from the main CPU by using DMA controllers

Read/write overhead = seek time + rotational delay + transfer time
Higher bandwidth for outer tracks than inner ones
● Same RPM but more space

File System
File: named collection of related information recorded on secondary storage
2 parts:
● Data: user or application information
● Metadata: attributes added and managed by the OS
  ○ Name, type, location, security info, etc.

Hard link: a mapping from a name to a specific underlying file or directory
Soft link: a mapping from a file name to another file name

File API
● Create() creates a new file with its metadata and adds an entry to the directory that contains the file
● Link() creates a hard link, a new name for some underlying file; the OS keeps track of the number of directory entries pointing to that file; both names point to the same underlying file
● Unlink() removes a name for a file from its directory; if it was the last link, the file and its resources are deleted
● Open() creates a per-file data structure referred to by a file descriptor.
  ○ Returns the file descriptor to the caller
● Close() closes the file
● Read() OS reads <size> bytes into <buffer>
● Write() OS writes <buffer> into <file>
● Seek() updates the file pointer
● Fsync() does not return until all data is written to persistent storage

File System exam topics: what files are; what the OS does on file open, close, etc.; inodes; directories; layout of files on disks (linked, direct, indexed, multilevel indexed, extents); locality and performance optimizations; file system consistency, including sources of inconsistency and methods for achieving consistency such as transactions, journaling file systems, and copy-on-write file systems.
RAID (probably)
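The disk head scheduling policies above can be made concrete with a small simulation. This is an illustrative sketch, not course code (the function names and the request queue are my own choices): it computes total head movement, measured in cylinders, for FIFO and SSTF over the same queue.

```python
def fifo(requests, head):
    """Service requests in arrival order; return total head movement."""
    total = 0
    for r in requests:
        total += abs(r - head)  # seek distance to the next arrival
        head = r
    return total

def sstf(requests, head):
    """Shortest Seek Time First: always service the closest pending request."""
    pending, total = list(requests), 0
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total

# Example queue of cylinder numbers, head starting at cylinder 53:
queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fifo(queue, 53))  # 640
print(sstf(queue, 53))  # 236
```

SSTF cuts head movement sharply here, but note it can starve a distant request if nearby requests keep arriving, which is why SCAN/LOOK variants exist.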