Class Note for CMPSCI 377 at UMass(46)
Class Note for CMPSCI 377 at UMass(46)
Popular in Course
Popular in Department
This 4 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Massachusetts taught by a professor in Fall. Since its upload, it has received 17 views.
Reviews for Class Note for CMPSCI 377 at UMass(46)
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 02/06/15
CMPSCI 377 Operating Systems Garbage Collection 51 Garbage Collection As mentioned in the previous lectures when a program requests memory region of some given size to the allocator via malloc for instance the allocator acts as a layer between the application and the OS The allocator after getting a block of the the requested size gives a pointer back the requesting programi After it is done with the memory block it is obligation of the programmer to give that memory back to the systemi On one hand languages such as CC require the user to manually return all their used memory whenever it will no longer be used Java on the other hand free memory automatically in the sense that Java returns the memory to the system whenever it it is safe to do so We call this process of automatically deallocating memory garbage collection Although the use of free and delete looks relatively simple it can be tricky to get them right in large programs If we forget to free objects we end up with memory leaks if we free memory too soon we end up with dangling pointersi77 Also we can try to do weird things like performing double frees etci Therefore a runtime system that manages the return of memory automatically is clearly usefuli The most important concept to correctly implement a garbage collector is that of reachable objects a reachable object is any object that can still be reached through a chain iiei directly or indirectly of live pointers such as the stack or global variables Letls analyze the following code node x new node happy node ptr X After we run this code we have two pointers pointing to the same object If we then run delete x z is still alive because it can be reached through pt ri Morever since the memory area that used to belong to z is now free it could in theory be reused by some new object ie some new object allocated sometime after the delete commandi ln this case pt r would still point to that same memory area but now that a area would contain something unexpected and completely different from happylli This is a situation that is clearly undesiredi 52 Algorithms for doing garbage colllection ln summary any algorithm for doing garbage collection has to reclaim objects that are in memory but that can no longer be reached through any pointers These objects are clearly useless since the program cannot name them in order to use them Their memory areas can therefore be returned to the systemi 51 52 Chapter 5 Garbage Collection The big question inherent in all garbage collection algorithms is then how can we know whether or not something is still reachable7 Some of the classical algorithms for determining this are marksweep reference counting and semispace 521 GC Performance GC gets a lot of bad press for having poor performance As we will see later some of these arguments have merit and some donlt GC can impair performance in many ways The most obvious is the extra overhead for nding and reclaiming reachable objects GC also has secondary effects such as reducing cache and page locality GC systems generally require more memory than mallocfree systems to attain the same performance since they delay reclamation of dead objects 522 The Classical Algorithms 5221 Marksweep The objects that a program can access directly are those objects which are referenced by local variables on the processor stack as well as by any static variables that refer to objects In the context of garbage collection these variables are called the roots An object is indirectly accessible if it is referenced by a eld in some other directly or indirectly accessible object An accessible object is said to be reachable Conversely an object which is not reachable is garbage The idea of marksweep is relatively straightforward and proceeds in two phases The rst phase called the mark phase recursively visits every object accessible from the roots pointers and marks them as reachable This is a standard graph search problem and can be performed by breadth or depth rst search After marking the collector performs a sweep phase where it linearly traverses all memory and frees all unmarked objects Marksweep collectors are generally build on top of freelist allocators so the free operation is generally the same as in a mallocfree system Sweeping can be done incrementally and lazily ie during allocation in order to reduce pause time 5222 Reference counting The idea of reference counting is to maintain for every object the total number of references to that object ie the number of incoming pointers Whenever the number of references is zero we know that the object is not accessible through any pointers and thus is garbage Whenever we modify a pointer we decrement the previous targetls reference count and increment the new targets If the old targetls count is now zero the collector frees the object Whenever the collector deletes a dead object it decrements the reference counts for objects referred to by the dead object and so forth recursively One big problem of reference counting though is how to deal with cycles Suppose object A points to B and B points to A If these objects can only be reached through each other we have to free them too even though their counters are not zero The commonly adopted solution to this problem is to run marksweep now and then in order to remove cyclic references and then run normal reference counting on the rest of the time Chapter 5 Garbage Collection 53 5223 Semispace Semispace works by maintaining two disjoint areas in the heap these are called the fromspace and the tospace The algorithm allocates memory only from the fromspace7 without ever worrying about garbage collection The allocator is a simple bumppointer allocator through fromspace When we nally run out of space in the fromspace7 we recursively copy all reachable objects7 ie those that are not garbage We then move each of these live objects to the to space7 taking care to leave the objects new address7 called its forwarding pointer7 in its location in fromspace When the collector follows a pointer to an alreadycopied object7 it does not recopy the object7 instead it updates the pointer to point to the appropriate address in to space by reading the objects forwarding pointer After having done all this moving7 only reachable objects are in the to space At this point7 the system ips the roles of the spaces7 making to space fromspace and vice versa Thus subsequent allocations occur in the space in which the objects were newly copied The process is repeated again when the new fromspace eventually runs out of space This method has as advantages the fact that it compacts objects in the memory7 thus increasing locality also7 when performing allocation from one of the spaces7 it can use simple pointer bumping7 which is very fast However7 this approach doubles the memory requirements7 signi cantly reducing the performance advantages of semispace collection 523 GC Tweaks 524 Generational GC Generational GC consists of an optimization used by copying collectors The key assumption made by gen erational GC is that typically most objects die young this is called the generational hypothesis Whenever this assumption is true7 we can optimize the GC for the common case In order to do so7 we rst allocate objects into a nursery7 which is a small area of memory say 4MB The collector collects the nursery very frequently every time it lls up since according to the generational hypothesis7 most of nursery objects will be garbage Also notice that7 since most of them will indeed be garbage7 copying out all reachable objects in the nursery takes little time After performing GC in the nursery7 we copy out all the survivors into the mature space by doing so7 we are assuming that all objects that survived the nursery will most likely live for a long time Notice that we still have to collect the mature space however7 since those objects are not likely to die soon7 we typically donlt have to collect that area very frequently 525 Conservative GC Conservative GC is a method applicable to programming languages without strict type safety7 ie7 languages which can hide pointers ln Java7 the runtime system can identify pointers precisely7 which is not the case for C and C ln Conservative GC we donlt want to have to copy objects7 like in Generational GC7 since we might end up modifying program data that look like pointers A conservative garbage collector is one that7 instead of knowing the exact location of each object7 discovers a set of possible live objects by scanning a region of memory7 looking for values that may be pointers The possible pointers found may or may not actually be pointers7 but we are assured that all live objects referred to from that particular area are found by the garbage collector Since we must discover the areas that might be objects7 we also have to know how to identify pointers usually7 if something looks like a pointer7 we assume it is a pointer Then7 conservative GC trace throughs those pointers marking everything that is still reachable Some of the drawbacks of this method are 1 areas of memory that look like pointers to objects7 but arenlt7 may cause garbage objects to be retained as long as the fake pointer exists this increases memory usage of the garbage collector7 and can cause other limited resources to be exhausted and 2 the methods of nding exactly which areas to trace 54 Chapter 5 Garbage Collection isnlt always portable and 5 programs may hide pointers which will cause the GC to reclaim live objects causing the program to crash 526 Comparing GC to malloc It should be clear that performing manual memory management is usually faster but trickier than using GCI How then can we measure the performance loss caused by the use of a garbage collector In order to do so we rst have to assume a perfect malloc ie some process that magically knows exactly what to free and when so that nothing is wasted this process can be used to measure the least amount of memory a program can use Obviously any GC mechanism typically uses more memory than that since it does not free objects all the time By comparing the performance of the perfect malloc with different types of garbage collectors we notice that GCs when executed in an environment with lots of available memory can perform as fast as a system that does explicit memory management The key concept here is that of trading space for time if we collect all the time which requires a lot of processing time we are guaranteed to use the least memory possible if we on the other hand have a lot of space to use we can collect very infrequently and thus be very fast In general GC can take almost no time extra when compared to explicit memory management if given enough memory Eigi if we are willing to spend 4x more space than usual garbage collection can run as fast as explicit memory management if on the other hand we donlt give it too much space it will have to run very frequently and thus will demand much more processing time than explicit memory management In face of this tradeoff one might ask when it is advisable to use a garbage collector Usually I if you have lots of memory and 2 if we really want to avoid bugs related to poor memory management If however there is competition with other processes or if you need to guarantee very fast execution time eg scienti c applications realtime systems etc then GC may not be a good idea
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'