CMPSCI 377 Operating Systems, Fall 2008
Lecture 17: November 4
Lecturer: Prashant Shenoy    Scribe: Shashi Singh

17.1 Demand Paging

Demand paging is an application of virtual memory. In a system that uses demand paging, the operating system copies a disk page into physical memory only if an attempt is made to access it, i.e., if a page fault occurs. It follows that a process begins execution with none of its pages in physical memory, and many page faults will occur until most of the process's working set of pages is located in physical memory. This is an example of a lazy loading technique. The page table indicates whether a page is on disk or in memory using a valid bit; once a page is brought from disk into memory, the OS updates the page table and the valid bit. For efficiency reasons, memory accesses must reference pages that are in memory the vast majority of the time. In practice this is mostly true because of locality: the working set of a process must fit in memory. The 90/10 rule states that 90% of the time the program is executing only 10% of the code.

17.1.1 Advantages

- Demand paging does not load pages that are never accessed, so it saves memory for other programs and increases the degree of multiprogramming.
- There is less loading latency at program startup.
- There is less initial disk overhead because of fewer page reads.
- It does not need extra hardware support beyond what paging needs, since a protection fault can be used to generate a page fault.
- Pages can be shared by multiple programs until one of them modifies a page, so a technique called copy-on-write can be used to save even more resources.
- It allows large programs to run on a machine that does not have enough memory to hold the entire program.
- This method is easier for the programmer than the old manual overlay technique.

17.1.2 Disadvantages

- Individual programs face extra latency when they access a page for the first time. Prepaging, a method of remembering which pages a process used when it last executed and preloading a few of them, can be used to improve performance.
- Programs running on low-cost, low-power embedded systems may not have a memory management unit that supports page replacement.
- Memory management with page replacement algorithms becomes slightly more complex.

17.1.3 Prepaging

The OS guesses in advance which pages the process will need and preloads them into memory. If the guess is right most of the time, fewer page faults occur, which in turn allows more overlap of CPU and I/O. The branches in code make it difficult to guess the set of pages that should be prepaged.

17.1.4 Implementation of Demand Paging

A valid bit of 1 in the page table indicates that the corresponding page is in memory; a valid bit of 0 indicates that the page is not in memory, i.e., it is either on disk or an invalid address. If the page is not in memory, the first reference traps to the OS. The OS, upon verifying that the address is valid, performs the following steps (a rough sketch of this flow appears below):

1. selects a page to replace (page replacement algorithm)
2. invalidates the old page in the page table
3. starts loading the new page into memory from disk
4. context switches to another process while the I/O is being done
5. gets an interrupt indicating that the page is loaded in memory
6. updates the page table entry
7. continues the faulting process
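The fragment below is a hedged, very simplified model of the seven steps above, not from the notes. Every type and helper in it (Frame, PageTableEntry, pick_victim, read_page_from_disk, the frame/page-table sizes) is a stand-in invented for illustration, not a real kernel interface; a real handler would also verify the faulting address and overlap the disk I/O with other processes.

    #include <vector>

    struct Frame { int owner_page = -1; };                   // a physical frame
    struct PageTableEntry { bool valid = false; int frame = -1; };

    std::vector<Frame> frames(4);                            // tiny "physical memory"
    std::vector<PageTableEntry> page_table(16);              // one process's page table

    int pick_victim() {                                      // trivial FIFO stand-in for a replacement algorithm
        static int next = 0;
        int f = next;
        next = (next + 1) % static_cast<int>(frames.size());
        return f;
    }

    void read_page_from_disk(int /*page*/, int /*frame*/) {
        // steps 3-5: would start the disk read, run another process, and resume on the interrupt
    }

    void handle_page_fault(int page) {
        int f = pick_victim();                               // 1. choose a page to replace
        if (frames[f].owner_page >= 0)
            page_table[frames[f].owner_page].valid = false;  // 2. invalidate the old page's entry
        read_page_from_disk(page, f);                        // 3-5. load the new page from disk
        frames[f].owner_page = page;
        page_table[page] = {true, f};                        // 6. update the page table entry
        // 7. the faulting process is made runnable again and retries the access
    }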
17.1.5 Swap Space

Swap space is a portion of the disk that is reserved for storing pages that are evicted from memory. If a page containing code is removed from memory, we can simply discard it, since it is unchanged and can be reloaded from disk. If a page containing data is removed from memory, we need to save the data (which may have changed since it was last written) so that it can be reloaded if the process it belongs to refers to it again; such pages are stored in the swap space. At any given time, a page of virtual memory might exist in one or more of: the file system on disk, physical memory, or the swap space. The page table must be sophisticated enough to figure out where to find a page.

17.1.6 Performance of Demand Paging

In the worst case, a process could access a new page with each instruction, but typically processes exhibit locality of reference. Temporal locality: if a process accesses an item in memory, it will tend to reference the same item again soon. Spatial locality: if a process accesses an item in memory, it will tend to reference an adjacent item soon. If p is the probability of a page fault (0 <= p <= 1) and ma is the memory access time, then

    effective access time = (1 - p) * ma + p * page_fault_time

17.2 Page Replacement Algorithms

- MIN: This is the theoretically optimal page replacement algorithm. When a page needs to be swapped in, the operating system swaps out the page whose next use will occur farthest in the future. This algorithm cannot be implemented in a general-purpose operating system, because it is impossible to compute reliably how long it will be before a page is going to be used, except when all software that will run on the system is known beforehand and is amenable to static analysis of its memory reference patterns, or when only a class of applications allowing run-time analysis is permitted. Despite this limitation, algorithms exist that can offer near-optimal performance: the operating system keeps track of all pages referenced by the program and uses those data to decide which pages to swap in and out on subsequent runs. This can offer near-optimal performance, but not on the first run of a program, and only if the program's memory reference pattern is relatively consistent each time it runs.
- FIFO: The first-in, first-out (FIFO) page replacement algorithm is a low-overhead algorithm that requires little bookkeeping on the part of the operating system. The idea is obvious from the name: the operating system keeps track of all the pages in memory in a queue, with the most recent arrival at the back and the earliest arrival at the front. When a page needs to be replaced, the page at the front of the queue (the oldest page) is selected. While FIFO is cheap and intuitive, it performs poorly in practice, so it is rarely used in its unmodified form.
- LRU: The least recently used (LRU) page replacement algorithm keeps track of page usage over a short period of time. LRU works on the idea that pages that have been most heavily used in the past few instructions are most likely to be used heavily in the next few instructions too. While LRU can provide near-optimal performance in theory, it is rather expensive to implement in practice; there are a few implementation methods that try to reduce the cost while keeping as much of the performance as possible (a small bookkeeping sketch appears after this list).
- Random: The random replacement algorithm replaces a random page in memory. This eliminates the overhead of tracking page references. It usually fares better than FIFO, and for looping memory references it is better than LRU, although generally LRU performs better in practice.
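As a hedged illustration of the LRU bookkeeping mentioned above (not the implementation used by any real OS), the class below tracks a fixed number of frames with a list ordered by recency plus a map for O(1) lookup; access() returns the page that would be evicted, or -1 if no eviction is needed. The names (LruFrames, access) are invented for this sketch.

    #include <list>
    #include <unordered_map>

    class LruFrames {
    public:
        explicit LruFrames(std::size_t capacity) : capacity_(capacity) {}

        // Record a reference to `page`; return the victim page on eviction, else -1.
        int access(int page) {
            auto it = where_.find(page);
            if (it != where_.end()) {                  // hit: remove old position, reinsert at front
                order_.erase(it->second);
            } else if (order_.size() == capacity_) {   // miss with all frames full: evict the LRU page
                int victim = order_.back();
                order_.pop_back();
                where_.erase(victim);
                order_.push_front(page);
                where_[page] = order_.begin();
                return victim;
            }
            order_.push_front(page);                   // front = most recently used
            where_[page] = order_.begin();
            return -1;
        }

    private:
        std::size_t capacity_;
        std::list<int> order_;                                      // recency order of resident pages
        std::unordered_map<int, std::list<int>::iterator> where_;  // page -> position in order_
    };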
CMPSCI 377 Operating Systems, Fall 2008
Lecture 10: October 2
Lecturer: Prashant Shenoy    Scribe: Shashi Singh

10.1 Monitors

You may get the impression from studying all the semaphore examples that signal and wait calls can end up scattered everywhere in your program in a not-so-well-structured way. If you get that feeling, the concept of a monitor comes to the rescue. A monitor has four components: initialization, private data, monitor procedures, and a monitor entry queue. The initialization component contains the code that is used exactly once, when the monitor is created. The private data section contains all private data, including private procedures, that can only be used within the monitor; these private items are not visible from outside the monitor. The monitor procedures are procedures that can be called from outside the monitor. The monitor entry queue contains all threads that called monitor procedures but have not yet been granted permission to enter. Thus a monitor looks like a class, with the initialization, private data, and monitor procedures corresponding to the constructors, private data, and methods of that class; the only major difference is that classes do not have entry queues.

Monitors are meant to be used in a concurrent (multithreaded or multiprocess) environment in which multiple threads or processes may call the monitor procedures at the same time asking for service. A monitor therefore guarantees that at any moment at most one thread can be executing in the monitor. When a thread calls a monitor procedure, we can view the called monitor procedure as an extension of the calling thread; while the called procedure is executing, we say the calling thread is in the monitor. Now, if two threads were in the monitor at once (i.e., executing two, possibly identical, monitor procedures), some private data could be modified by both threads at the same time, causing race conditions. Therefore, to guarantee the integrity of the private data, a monitor enforces mutual exclusion implicitly. More precisely, if a thread calls a monitor procedure, that thread is blocked if another thread is already executing in the monitor. Threads that are not granted permission to enter are queued on the monitor entry queue, outside the monitor. When the monitor becomes empty (i.e., no thread is executing in it), one of the threads in the entry queue is released and granted permission to execute the called monitor procedure. Although we say "entry queue", you should not take it literally: when a thread must be released from the entry queue, you should not assume any particular policy for which thread will be released. In summary, monitors ensure mutual exclusion automatically, so that no more than one thread can be executing in a monitor at any time. This is a very usable and handy capability.
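The sketch below is not from the notes; it shows, under the assumption that a pthread mutex stands in for the monitor lock, how a monitor maps onto a C++ class: private data, public "monitor procedures" that take the internal lock on entry, and an implicit entry queue formed by the threads blocked on that lock. The class and member names are invented for illustration.

    #include <pthread.h>

    class Account {
    public:
        Account()  { pthread_mutex_init(&lock_, nullptr); }
        ~Account() { pthread_mutex_destroy(&lock_); }

        void deposit(int amount) {           // a "monitor procedure"
            pthread_mutex_lock(&lock_);      // callers queue here: at most one thread is inside
            balance_ += amount;
            pthread_mutex_unlock(&lock_);
        }

        int read_balance() {
            pthread_mutex_lock(&lock_);
            int b = balance_;
            pthread_mutex_unlock(&lock_);
            return b;
        }

    private:
        int balance_ = 0;                    // private data: only touched while holding lock_
        pthread_mutex_t lock_;
    };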
10.2 Condition Variables

A thread in a monitor may have to block itself because its request cannot be completed immediately. This waiting for an event (e.g., an I/O completion) to occur is realized by condition variables. A condition variable indicates an event and has no value; more precisely, one cannot store a value into, nor retrieve a value from, a condition variable. If a thread must wait for an event to occur, it waits on the corresponding condition variable. If another thread causes an event to occur, that thread simply signals the corresponding condition variable. Thus, a condition variable has a queue for the threads that are waiting for the corresponding event to occur, and the original monitor is extended accordingly: the private data section can now contain a number of condition variables, each of which has a queue for threads to wait on.

10.2.1 Condition Variable Operations: Wait and Signal

There are only two operations that can be applied to a condition variable: wait and signal. When a thread executes a wait call on a condition variable (inside a monitor, of course), it is immediately suspended and put into the waiting queue of that condition variable. The thread is thus suspended, waiting for the event represented by the condition variable to occur. Because the calling thread is the only thread running in the monitor, it "owns" the monitor lock; when it is put into the waiting queue of a condition variable, the system automatically takes the monitor lock back. As a result, the monitor becomes empty and another thread can enter. Eventually, some thread will cause the event to occur. To indicate that a particular event has occurred, a thread calls the signal method on the corresponding condition variable. At this point there are two cases to consider. First, if there are threads waiting on the signaled condition variable, the monitor allows one of the waiting threads to resume its execution and gives it the monitor lock back. Second, if no thread is waiting on the signaled condition variable, the signal is lost, as if it had never occurred. There is a third operation, broadcast(), that wakes up all waiting threads in the queue instead of just one.

10.2.2 Semaphore vs. Condition Variable

- Semaphores can be used anywhere in a program, but should not be used in a monitor. Condition variables can only be used in monitors.
- With semaphores, wait does not always block the caller (i.e., not when the semaphore counter is greater than zero). With condition variables, wait always blocks the caller.
- With semaphores, signal either releases a blocked thread, if there is one, or increases the semaphore counter. With condition variables, signal either releases a blocked thread, if there is one, or the signal is lost as if it had never happened. So semaphores maintain a history of past signals, whereas condition variables do not.
- With semaphores, if signal releases a blocked thread, the caller and the released thread both continue. With condition variables, if signal releases a blocked thread, the caller either yields the monitor (Hoare type) or continues (Mesa type); only one of the caller or the released thread can continue, not both.

10.2.3 Monitor Types

A signal on a condition variable causes a waiting thread of that condition variable to resume its execution, and the released thread gets its monitor lock back. So what happens to the lock owned by the signaling thread? The signaling thread must be running in order to signal, so it must own the monitor lock. If the released thread then ran as well, both it and the signaling thread would own the monitor lock and both would be running in the monitor, violating the mutual exclusion requirement of monitors. Therefore, to guarantee mutual exclusion, either the released thread or the signaling thread can run, but not both. The choice of which thread should run creates (at least) two types of monitors: the Hoare type and the Mesa type.
10.2.3.1 The Hoare-Type Monitor

In Hoare's original 1974 monitor definition, the signaler yields the monitor to the released thread. More specifically, if thread A signals a condition variable CV on which threads are waiting, one of the waiting threads, say B, is released immediately. Before B can run, A is suspended and its monitor lock is taken away by the monitor; the lock is then given to the released thread B, so that when B runs it is the only thread executing in the monitor. Sometime later, when the monitor becomes free, the signaling thread A will have a chance to run. This type of monitor has its merit: if thread B was waiting on condition variable CV when thread A signaled, then B entered the monitor earlier than A did, and A yields the monitor to a "senior" thread that might have a more urgent task to perform. Because the released thread runs immediately after the signaler indicates that the event has occurred, the released thread can run without worrying about whether the state has changed between the time the condition variable was signaled and the time the released thread runs. This simplifies the programming effort somewhat.

10.2.3.2 The Mesa-Type Monitor

Mesa is a programming language developed by a group of Xerox researchers in the late 1970s that supports multithreaded programming. Mesa also implements monitors, but in a different style, for efficiency reasons. With Mesa, the signaling thread continues and the released thread yields the monitor. More specifically, when thread A signals a condition variable CV and there is no waiting thread, the signal is lost, just as with the Hoare type. If condition variable CV has waiting threads, one of them, say B, is released; however, B does not get the monitor lock back immediately. Instead, B must wait until the monitor becomes empty to get a chance to run, while the signaling thread A continues to run in the monitor.
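The following sketch is not from the notes; it illustrates why the monitor type matters in practice. pthread condition variables have Mesa-style semantics, so the waiter must re-check its condition in a loop; under Hoare semantics a single "if" would be safe, because the state cannot change between the signal and the waiter's resumption. The variable and function names (ready, wait_until_ready, make_ready) are placeholders.

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool ready = false;              // the "event" the waiter is interested in

    void wait_until_ready() {
        pthread_mutex_lock(&lock);
        while (!ready)                      // Mesa: "while", not "if" - the state may change before we run
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }

    void make_ready() {
        pthread_mutex_lock(&lock);
        ready = true;
        pthread_cond_signal(&cond);         // under Hoare semantics the waiter would run right here
        pthread_mutex_unlock(&lock);        // under Mesa semantics the signaler simply continues
    }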
CMPSCI 377 Operating Systems, Fall 2008
Lecture 0: September 2
Lecturer: Prashant Shenoy    Scribe: Shashi Singh

0.1 General Information

This class is taught by Professor Prashant Shenoy in CMPSCI 142, every Tuesday and Thursday from 1:00pm to 2:15pm. For all information related to the course (syllabus, projects, etc.), please visit the course website at http://lass.cs.umass.edu/~shenoy/courses/377. Although the class name is Operating Systems, this course will deal mostly with large-scale computer systems. The main question to be studied is how to build correct, reliable, and high-performance computer systems. In order to answer this question, problems such as memory management, concurrency, scheduling, etc. will be studied.

0.1.1 Getting Help

Prof. Shenoy has office hours on Tuesdays and Thursdays from 2:15pm to 3:15pm in his office, CS 336. The TA has office hours on Mondays and Fridays from 1:00pm to 2:00pm in his office, LGRT 220. In addition to office hours, discussion sections will be led by the TA on Wednesdays from 12:20pm to 1:10pm in CMPSCI 142. Attendance is very important, since sample test problems will be discussed and the main concepts presented during the last couple of classes will be reinforced. Lecture notes will also be available on Prof. Shenoy's course website. The textbook for this class is Operating System Concepts, Silberschatz et al., 7th or 8th edition.

0.1.2 Grading

Students' grades will depend both on exams (two exams, each worth 20% of the final grade) and on projects (40% of the final grade); homeworks will constitute the remaining 20%. Prof. Shenoy adopts a very strict late policy, so please keep that in mind when deciding when to start working on your assignments. All the projects can be done in groups of 2 or 3; you are welcome to do the projects alone too. Students will have 3 bonus submissions and a total of 3 late days across all projects. The grading of all projects will be performed by an autograder system, so you will have the chance to assess how well you're doing several days before the due date. Do not cheat: an automatic system for finding cheaters will be used, and you will be caught.

0.2 Introduction to Operating Systems

The term Operating System (OS) is often misused. It is common, for example, for people to speak of an OS when they are in fact referring to an OS together with a set of additional applications (e.g., on Windows: Notepad, Windows Explorer, etc.). The traditional view of the area, however, defines an OS differently: the OS is the layer that stands between user-level applications and the hardware, and its main goal is to hide the complexity of the hardware from the applications. The important concept here is abstraction: an OS abstracts architectural details, giving programs the illusion of existing in a homogeneous environment. The OS effectively makes programs believe that they live in a machine with large amounts of memory, a dedicated processor, and so on. It is also the OS's function to manage the computer's resources (e.g., the OS decides which process to run, when, and for how long).

0.2.1 A Brief History of Operating Systems

Operating systems have evolved tremendously in the last few decades. In fact, it would probably be impossible for someone coming directly from the 1960s to perceive any resemblance between modern OSs and the OSs of those days. The first approach to building operating systems, taken during the 1940s through the 1960s, was to allow only one user and one task at a time. Users had to wait for a task to finish before they could specify another task or even interact with the computer. In other words, not only were OSs mono-user and mono-task, there was also no overlap between computation and I/O.

The next step in the development of OSs was to allow batch processing. Multiple jobs could now be executed in batch mode: a program was loaded, executed, its output was generated, and then the cycle restarted with the next job. Although in this type of processing there was still no interference or communication between programs, some type of protection (from poorly written or malicious programs, for instance) was clearly needed. Overlap between I/O and computation was the next obvious problem to be addressed; of course, this new feature brought with it a series of new challenges, such as the need for buffers, interrupt handling, etc. Although the OSs of this time allowed users to interact with the computer while jobs were being processed, only one task at a time was permitted. Multiprogramming solved this, and it became the task of the operating system to manage the interactions between the programs (e.g., which jobs to run at each time, how to protect a program's memory from others, etc.). All of these were complex issues that effectively led to OS failures in the old days, and they eventually drew attention to the need to design OSs in a scientific manner.

During the 1970s, hardware became cheap but human operators (users) were expensive. During this decade, interaction was done via terminals, through which a user could send commands to a mainframe. This was the Unix era. Response time and thrashing became problems to be dealt with, and OSs started to treat programs and data in a more homogeneous way. During the 1980s, hardware became even cheaper. It was then that PCs became widespread, and simple OSs such as DOS and MacOS were used; DOS, for example, was so simple that it didn't have any multiprogramming features. From the 1990s until today, hardware has become cheaper still. Processing demands keep increasing, and full-featured OSs such as Windows NT, Mac OS X, and Linux finally became available for PCs.
If there is a lesson to be learnt from all this history, it is that it is very hard to outline trends for the future. After all, computers have advanced 9 orders of magnitude (in terms of speed, size, and price) in the last 50 years. Moore's law also seems to be running out of steam, mainly due to fundamental physics limits. However, we can make some guesses about what to expect in the next few years: multiple cores, unreliable memory, serious power/heat constraints, trading off compute power for reliability, etc.

CMPSCI 377 Operating Systems, Fall 2008
Lecture 3: September 9
Lecturer: Prashant Shenoy    Scribe: Shashi Singh

3.1 Operator Overloading in C++

Operator overloading allows you to provide an intuitive interface to users of your class, and makes it possible for templates to work equally well with classes and built-in/intrinsic types. It allows C/C++ operators to have user-defined meanings on user-defined types (classes). Overloaded operators are syntactic sugar for function calls:

    class Fred {
    public:
        // ...
    };

    #if 0
        // Without operator overloading:
        Fred add(const Fred& x, const Fred& y);
        Fred mul(const Fred& x, const Fred& y);

        Fred f(const Fred& a, const Fred& b, const Fred& c)
        {
            return add(add(mul(a,b), mul(b,c)), mul(c,a));   // Yuk...
        }
    #else
        // With operator overloading:
        Fred operator+ (const Fred& x, const Fred& y);
        Fred operator* (const Fred& x, const Fred& y);

        Fred f(const Fred& a, const Fred& b, const Fred& c)
        {
            return a*b + b*c + c*a;
        }
    #endif

By overloading standard operators on a class, you can exploit the intuition of the users of that class. This lets users program in the language of the problem domain rather than in the language of the machine. The ultimate goal is to reduce both the learning curve and the defect rate.

3.2 The C++ Standard Template Library

As you already know, it is tough to program without good data structures. Fortunately, most C++ implementations have them built in: the Standard Template Library (STL). The two containers you may need for 377 are queues and maps; you should know what these are already. Using the STL is pretty straightforward. Here is a simple example:

    #include <iostream>
    #include <queue>
    using namespace std;

    queue<int> myQueue;

    int main(int argc, char** argv)
    {
        myQueue.push(10);
        myQueue.push(11);

        cout << myQueue.front() << endl;   // 10
        myQueue.pop();
        cout << myQueue.front() << endl;   // 11
        myQueue.pop();
        cout << myQueue.size() << endl;    // zero
    }

The type inside the angle brackets says what the queue myQueue will hold. push inserts a COPY of what you pass it. front gives you a reference to the object, unless you copy it. pop pops the element off the queue and discards it. Often you will have a queue or a map of pointers. Here is a more complex example you should now be able to understand:

    #include <iostream>
    #include <map>
    using namespace std;

    class IntCell {
    public:
        IntCell(int initialValue = 0);
        int getValue();
        int setValue(int val);
    private:
        int storedValue;
    };

    IntCell::IntCell(int initialValue) { storedValue = initialValue; }
    int IntCell::getValue() { return storedValue; }
    int IntCell::setValue(int val) { storedValue = val; return storedValue; }

    // In a map, the first template parameter is the key.
    map<int, IntCell*> myMap;

    int main(int argc, char** argv)
    {
        IntCell* a;
        int i, max = 100;
        for (i = 0; i < max; i++) {
            a = new IntCell();
            a->setValue(max - i);
            myMap[i] = a;                  // inserts a copy of the pointer
        }
        for (i = 0; i < max; i++) {
            a = myMap[i];
            cout << a->getValue() << endl;
            delete a;
            myMap[i] = NULL;               // good idea
        }
        myMap[0]->setValue(0);             // Quiz: can I do this?
    }

Think about what the output of this should be.

3.3 Computer Architecture

Although computer architecture problems underlie a lot of the science involved in designing operating systems, in this class we will review just some of the important topics that are relevant to making OSs fast. Specifically, we will focus on the memory hierarchy and on good policies for improving caching mechanisms.
3.3.1 Memory Hierarchy

Modern computers have several types of memory. The most obvious ones are the RAM and the hard disk. There are, however, at least two other types of memory that are important: the registers and the cache memory.

3.3.1.1 Registers (and a little bit on context switching)

Registers are specialized pieces of memory in the CPU, and can be seen as dedicated names for specific words of memory. Some of them are general purpose, such as AX, BX, and CX, which on an x86 can be used for addition, multiplication, etc. Other registers have special purposes: the SP (stack pointer), for instance, is used to point to one of the ends of the stack; the PC (program counter) is used to point to the current instruction being executed; and so on.

One of the problems that an operating system has to deal with is how to make a machine with several simultaneous processes get by with only one set of registers. In other words, the CPU has only one program counter, one stack pointer, etc., while each process has a different instruction being executed, a different stack pointer, and so on. How, then, is it possible for the OS to manage multiple concurrent processes with just one set of registers available? The secret is to realize that, on a single CPU, there are never two processes running at the same time. What actually happens is that the operating system keeps alternating between processes very fast, giving us the impression of true concurrency. From now on, we will call this replacement of the active process by another one a context switch. During a context switch, the OS has to save the current register values (i.e., copy them from the CPU to RAM) and reload the saved registers of the process being activated (i.e., copy them from RAM to the CPU). In this way, even though at any given time there may be many saved register sets, there is only one set of registers actually in use: that of the currently active process.
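The notes do not include code for this, but the idea of saving and restoring a register set can be seen at user level with the (now somewhat dated) POSIX ucontext API, as a hedged sketch: swapcontext() saves the current registers into one ucontext_t and restores another, which is essentially what the OS does for whole processes on a real context switch. The names (task, task_stack) are illustrative; this assumes a Linux-like system where ucontext.h is still available.

    #include <ucontext.h>
    #include <cstdio>

    static ucontext_t main_ctx, task_ctx;
    static char task_stack[64 * 1024];          // the "task" runs on its own stack

    static void task() {
        puts("task: running");
        swapcontext(&task_ctx, &main_ctx);      // save task's registers, restore main's
        puts("task: resumed, finishing");
    }

    int main() {
        getcontext(&task_ctx);                  // start from the current context...
        task_ctx.uc_stack.ss_sp   = task_stack; // ...but give it a different stack
        task_ctx.uc_stack.ss_size = sizeof(task_stack);
        task_ctx.uc_link = &main_ctx;           // where to go when task() returns
        makecontext(&task_ctx, task, 0);

        swapcontext(&main_ctx, &task_ctx);      // "context switch" into task()
        puts("main: back from task");
        swapcontext(&main_ctx, &task_ctx);      // switch in again; task() runs to completion
        puts("main: done");
        return 0;
    }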
3.3.1.2 Cache Memory

As mentioned before, registers are incredibly fast pieces of memory located directly in the CPU, but there are few of them. RAM, on the other hand, is much slower than the registers but has a much bigger capacity. Is there any type of memory in between? Yes: cache memory. Every access to main memory is expensive in terms of CPU cycles (typical access times are around 100 cycles) [1]. The role of cache memory is to provide a faster-than-RAM memory at a more affordable cost. Since the cache is not as big as main memory, we need smart ways to manage it. Usually caches hold recently accessed data or instructions; the crucial assumption that justifies this approach is that data accessed recently will probably be needed again in the near future, and that data near the data just accessed might also be needed soon. Notice that even though "cache" is the generic term for this type of memory, there are in fact several levels of cache: the L1 cache, which is on-chip (on the CPU), small, very fast, very expensive, and of very low capacity; the L2 cache, usually on or next to the processor, larger than L1, still more expensive than RAM but faster, with a capacity on the order of megabytes; and the L3 cache, which is fairly large and sits on the bus.

One of the most important ways to measure the performance gained by caching is the hit rate: the percentage of accesses to data that is already cached and therefore does not need to be fetched from main memory. Very small differences in the hit rate can make a huge difference in performance; a drop of even a fraction of a percent can make the computer dramatically (even thousands of times) slower. When the cache starts being used, we say that it is cold, meaning that the data we will soon need is not yet cached. While the cache is cold it is unavoidable to incur some initial misses; these are called compulsory misses. When they occur, the computer fetches the needed data from main memory and uses it to populate the cache. One important detail, however, is that the cache is not populated with individual bytes: instead, full lines are brought into the cache (e.g., 128 bytes at a time). The assumption behind this decision is that data usually exhibits spatial locality, i.e., bytes near the one just accessed are likely to be needed in the near future. After the cache has been filled in, we say it has been warmed up; the cache is then expected to be holding the most frequently used data. Besides compulsory misses, there are two other types of misses: capacity misses, which occur due to the finite size of the cache [2], and conflict misses, which occur whenever a different memory location had to be loaded into the same cache line as the one used by the needed data.

Since the cache is finite, and because of associativity (to be discussed next), a policy to manage the cache memory is needed. Ideally the cache would be as large as main memory (in which case we wouldn't need any main memory, of course!); since it cannot be that large, we must find a smart way to decide which data to keep and which data to evict from the cache when we run out of space. The most popular policy for managing a cache is known as the Least Recently Used (LRU) policy. According to LRU, whenever we need to free some space in the cache, we throw away the data we used furthest in the past. The assumption that justifies this behavior is that we can often use the past as a predictor of the future; thus it is reasonable to guess that the things we used furthest in the past will probably be needed furthest in the future. Of course, this is not a certainty; maybe the line just evicted will be needed two instructions ahead, we don't know. Even so, LRU turns out to be a pretty decent policy for managing a cache.

Another important issue related to caching is associativity. Ideally, we would want any memory position from main memory to be storable in any part of the cache. This would be a fully associative cache; however, this type of cache is very expensive to manufacture and requires very complicated logic. Thus it is usual to restrict the places in the cache where each memory position of main memory can be stored. For example, in a 2-way set associative cache, any piece of memory can end up in one of two places; in a 1-way (direct-mapped) cache, each piece of memory always goes to the same place in the cache [3].

Finally, it is important to understand the implications of a context switch on the cache. Since the new process being activated is unlikely to share data or instructions with the previous process, it is reasonable to assume that the cache will turn cold again. This brings us to the conclusion that context switches make the data in the cache useless, causing misses and increasing latency.

[1] Compare the orders of magnitude of the cycles required to access each type of memory: registers, 1 cycle of latency; L1 cache, 2 cycles; L2 cache, 7 cycles; RAM, 100 cycles; disk, 40 million cycles; network, 200 million cycles.
[2] I.e., these occur when the needed data had to be evicted due to lack of space and thus is no longer present in the cache.
[3] This can be a problem if, for instance, A and B map to the same place and we need to read A and B in turns.
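To make the spatial-locality assumption above concrete, here is a small self-contained experiment (not from the notes; the matrix size N is an arbitrary choice): summing a matrix row by row follows the cache lines that were just fetched, while summing it column by column touches a new line on almost every access, so the second traversal is typically several times slower even though it does the same arithmetic.

    #include <chrono>
    #include <iostream>
    #include <vector>

    int main() {
        const int N = 4096;
        std::vector<int> a(static_cast<std::size_t>(N) * N, 1);

        auto time_sum = [&](bool row_major) {
            auto t0 = std::chrono::steady_clock::now();
            long long sum = 0;
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                    sum += row_major ? a[static_cast<std::size_t>(i) * N + j]   // cache-friendly
                                     : a[static_cast<std::size_t>(j) * N + i];  // one new line per access
            auto t1 = std::chrono::steady_clock::now();
            std::cout << (row_major ? "row-major:    " : "column-major: ")
                      << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
                      << " ms (sum=" << sum << ")\n";
        };

        time_sum(true);
        time_sum(false);
        return 0;
    }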
3.3.2 Kernel Mode and System Calls

It is a function of the kernel to protect the OS itself from users, and to protect users from one another. In order to do so, the kernel only allows privileged code to execute in what is called kernel mode. For example, when a system call is made to read something from the disk, the OS makes sure, among other things, that the user who requested the read is not trying to access another user's files. Every time the CPU goes into kernel mode, all pipelines are flushed, the context is saved, the corresponding code is executed in kernel land, and then the system returns to user mode, restoring the context. It is also important to realize that the hardware does have some support to ease the implementation of this separation between user and kernel modes.

3.3.3 Timers and Interrupts

Modern OSs obviously have to deal with multiple processes running at the same time. Also, the system must remain responsive, in the sense that I/O must be permitted even when the CPU is busy processing. In order to make the system respond to certain events periodically, timers and interrupts are used. For example, when the timer that bounds a process's execution time goes off, the processor is interrupted, the current process stops, the OS takes control through the interrupt handler, and finally the scheduler chooses the next process. Interrupts also signal I/O events, such as the arrival of a network packet, the completion of a disk read, etc.

3.3.4 Traps and Interrupts

A great deal of the kernel consists of code that is invoked as the result of an interrupt or a trap. While the words "interrupt" and "trap" are often used interchangeably in the context of operating systems, there is a distinct difference between the two. An interrupt is a CPU event triggered by some external device; a trap is a CPU event triggered by a program. Traps are sometimes called software interrupts. They can be deliberately triggered by a special instruction, or they may be triggered by an illegal instruction or an attempt to access a restricted resource.

When an interrupt is triggered by an external device, the hardware saves the status of the currently executing process, switches to kernel mode, and enters a routine in the kernel. This routine is a first-level interrupt handler. It can either service the interrupt itself or wake up a process that has been waiting for the interrupt to occur. When the handler finishes, it usually causes the CPU to resume the process that was interrupted; however, the operating system may schedule another process instead.

When an executing process requests a service from the kernel using a trap, the process status information is saved, the CPU is placed in kernel mode, and control passes to code in the kernel called the system service dispatcher. It examines parameters set before the trap was triggered (often information in specific CPU registers) to determine what action is required; control then passes to the code that performs the desired action. When the service is finished, control is returned either to the process that triggered the trap or to some other process. Traps can also be triggered by a fault, in which case the usual action is to terminate the offending process. On some systems it is possible for applications to register handlers that will be invoked when certain conditions occur, such as a division by zero.
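As a hedged illustration of that last point (not from the notes, and platform-specific: on Linux/x86 an integer division by zero raises SIGFPE), the program below registers a handler for the fault-triggered trap. Returning from such a handler would re-execute the faulting instruction, so the handler prints a message with an async-signal-safe call and exits.

    #include <csignal>
    #include <unistd.h>

    static void on_fpe(int) {
        const char msg[] = "caught SIGFPE (division by zero)\n";
        write(STDERR_FILENO, msg, sizeof(msg) - 1);   // async-signal-safe output
        _exit(1);                                     // returning would re-run the faulting instruction
    }

    int main() {
        struct sigaction sa{};
        sa.sa_handler = on_fpe;
        sigaction(SIGFPE, &sa, nullptr);              // register the user-level fault handler

        volatile int zero = 0;
        return 1 / zero;                              // triggers the trap; the handler runs instead
    }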
3.3.5 Synchronous and Asynchronous I/O

There are two types of input/output (I/O): synchronous and asynchronous. Asynchronous I/O is also referred to as overlapped I/O. In synchronous I/O, a thread starts an I/O operation and immediately enters a wait state until the I/O request has completed. A thread performing asynchronous I/O sends an I/O request to the kernel by calling an appropriate function; if the request is accepted by the kernel, the calling thread continues processing another job until the kernel signals to the thread that the I/O operation is complete. It then interrupts its current job and processes the data from the I/O operation as necessary. In situations where an I/O request is expected to take a large amount of time, such as a refresh or backup of a large database or a slow communications link, asynchronous I/O is generally a good way to optimize processing efficiency. However, for relatively fast I/O operations, the overhead of processing kernel I/O requests and kernel signals may make asynchronous I/O less beneficial, particularly if many fast I/O operations need to be made; in this case, synchronous I/O would be better.

3.3.6 Memory-Mapped I/O

Memory-mapped I/O is an I/O scheme where the device's own on-board memory is mapped into the processor's address space. Data to be written to the device is copied by the driver into the device memory, and data read in by the device is available in the shared memory for copying back into system memory. Memory-mapped I/O is frequently used by network and video devices. Many adapters offer a combination of programmed I/O and memory-mapped modes, where the data buffers are mapped into the processor's memory space and the internal device registers that provide status are accessed through the I/O space. The adapter's memory is mapped into the system address space through the PCI BIOS, a software setup program, or by setting jumpers on the device. Before the kernel can access the adapter's memory, it must map the adapter's entire physical address range into the kernel's virtual address space using the functions supplied by the driver interface.

3.3.7 Virtual Memory

Virtual memory is a technique which gives an application program the impression that it has contiguous working memory, while in fact that memory may be physically fragmented and may even overflow onto disk storage. Systems that use this technique make programming of large applications easier and use real physical memory (e.g., RAM) more efficiently than those without virtual memory. Note that "virtual memory" is not just "using disk space to extend physical memory size": extending memory is a normal consequence of using virtual memory techniques, but it can also be done by other means, such as overlays or swapping programs and their data completely out to disk while they are inactive. The definition of "virtual memory" is based on tricking programs into thinking they are using large blocks of contiguous addresses. All modern general-purpose computer operating systems use virtual memory techniques for ordinary applications such as word processors, spreadsheets, multimedia players, accounting, etc. Few older operating systems, such as the DOS of the 1980s or those for the mainframes of the 1960s, had virtual memory functionality.
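The notes do not include code here, but a user-level analogue of the mapping idea from Sections 3.3.6 and 3.3.7 is POSIX mmap(): a file (here "notes.txt", a placeholder name) is mapped into the process's address space, so reads become ordinary memory accesses instead of read() calls. This is a hedged sketch of mapping a file, not of mapping device memory, which only a driver can do.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int fd = open("notes.txt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 1; }

        // Map the whole file read-only into our address space.
        char* p = static_cast<char*>(mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        std::printf("first byte of the mapped file: %c\n", p[0]);   // a plain memory access

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }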
CMPSCI 377 Operating Systems, Fall 2008
Lecture 11: October 7
Lecturer: Prashant Shenoy    Scribe: Shashi Singh

11.1 Readers-Writers Problem

In computer science, the readers-writers problems are examples of a common computing problem in concurrency. They deal with situations in which many threads must access the same shared memory at one time, some reading and some writing, with the natural constraint that no process may access the shared data for reading or writing while another process is in the act of writing to it. In particular, two readers are allowed to access the shared data at the same time.

11.1.1 Three Types of Solutions

1. Reader-preferred: waiting readers go before waiting writers. A constraint is added that no reader shall be kept waiting if the shared data is currently open for reading.
2. Writer-preferred: waiting writers go before waiting readers. A constraint is added that no writer, once added to the queue, shall be kept waiting longer than absolutely necessary.
3. Neither preferred: try to treat readers and writers fairly. A simple queue is not good enough, because we want parallel readers whenever possible.

In fact, the solutions implied by the first two problem statements (reader-preferred and writer-preferred) can result in starvation: the reader-preferred variant may starve writers in the queue, and the writer-preferred variant may starve readers. Therefore, the neither-preferred variant is sometimes proposed, which adds the constraint that no thread shall be allowed to starve; that is, the operation of obtaining a lock on the shared data will always terminate in a bounded amount of time. Solutions to this variant will necessarily sometimes require readers to wait even though the shared data is open for reading, and sometimes require writers to wait longer than absolutely necessary. Look at the lecture slides for pseudocode for these three different solutions.

11.1.2 pthreads Read-Write Locks

The pthread library supports read-write locks, so you do not have to implement the synchronization between readers and writers yourself. A thread can acquire a read lock or a write lock: multiple threads can hold the same read lock concurrently, but only one thread can hold a write lock. The relevant pthread routines are pthread_rwlock_init, pthread_rwlock_rdlock, pthread_rwlock_wrlock, and pthread_rwlock_unlock.
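As a minimal usage sketch of the routines just listed (not from the notes; the variable names and the trivial reader/writer bodies are illustrative):

    #include <pthread.h>

    static pthread_rwlock_t rw;
    static int shared_value = 0;

    void* reader(void*) {
        pthread_rwlock_rdlock(&rw);     // many readers may hold this at once
        int v = shared_value;
        pthread_rwlock_unlock(&rw);
        (void)v;
        return nullptr;
    }

    void* writer(void*) {
        pthread_rwlock_wrlock(&rw);     // exclusive: waits until all readers and writers are gone
        ++shared_value;
        pthread_rwlock_unlock(&rw);
        return nullptr;
    }

    int main() {
        pthread_rwlock_init(&rw, nullptr);
        pthread_t r, w;
        pthread_create(&r, nullptr, reader, nullptr);
        pthread_create(&w, nullptr, writer, nullptr);
        pthread_join(r, nullptr);
        pthread_join(w, nullptr);
        pthread_rwlock_destroy(&rw);
        return 0;
    }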
11.2 Dining Philosophers Problem

The dining philosophers problem is an illustrative example of a common computing problem in concurrency, and a classic multi-process synchronization problem. It is summarized as follows: a group of philosophers sit at a table doing one of two things, eating or thinking. While eating, they are not thinking, and while thinking, they are not eating. The philosophers sit at a circular table with a large bowl of spaghetti in the center. A chopstick is placed between each pair of philosophers, so each philosopher has one chopstick to the left and one chopstick to the right. As spaghetti is difficult to serve and eat with a single chopstick, it is assumed that a philosopher must eat with two chopsticks, and can only use the chopsticks on his or her immediate left and right. The philosophers never speak to each other, which creates a dangerous possibility of deadlock: every philosopher holds a left chopstick and waits perpetually for a right chopstick (or vice versa).

Originally used as a means of illustrating the problem of deadlock, this system reaches deadlock when there is a "cycle of unwarranted requests": philosopher P1 waits for the chopstick grabbed by philosopher P2, who is waiting for the chopstick of philosopher P3, and so forth, making a circular chain. Starvation might also occur independently of deadlock if a philosopher is unable to acquire both chopsticks because of a timing issue. For example, there might be a rule that the philosophers put down a chopstick after waiting five minutes for the other chopstick to become available, and wait a further five minutes before making their next attempt. This scheme eliminates the possibility of deadlock (the system can always advance to a different state), but it still suffers from the problem of livelock. A livelock is similar to a deadlock, except that the states of the processes involved constantly change with respect to one another, with none progressing. If all the philosophers appear in the dining room at exactly the same time and each picks up the left chopstick at the same time, the philosophers will wait five minutes until they all put their chopsticks down, and then wait a further five minutes before they all pick them up again.

The lack of available chopsticks is an analogy to the locking of shared resources in real computer programming, a situation known as concurrency. Locking a resource is a common technique to ensure that the resource is accessed by only one program or chunk of code at a time. When the resource a program is interested in is already locked by another one, the program waits until it is unlocked. When several programs are involved in locking resources, deadlock can happen, depending on the circumstances. For example, suppose one program needs two files to do its processing; when two such programs each lock one file, both programs wait for the other one to unlock the other file, which will never happen.

11.2.1 Solution to the Dining Philosophers Problem

A naive solution is to first wait for the left chopstick using a semaphore; after successfully acquiring it, wait for the right chopstick; after both chopsticks have been acquired, eat; when done eating, release the two chopsticks one by one, in the same order, by calling signal. Though simple, this solution can lead to a deadlock in which every philosopher around the table is holding his or her left chopstick and waiting for the right one. A variation of this solution makes a philosopher release the left chopstick if the right one cannot be acquired, but even this can lead to livelock because of timing issues, as explained before. The correct solution is to let a philosopher acquire both chopsticks or none. This can be done within a monitor, wherein only one philosopher at a time is allowed to check for the availability of chopsticks inside a test function. Look at the lecture slides for the complete pseudocode; a hedged pthread-based sketch appears below.
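The sketch below is not the pseudocode from the slides; it is one possible rendering of the "acquire both chopsticks or none" monitor idea, with a pthread mutex playing the monitor lock and one condition variable per philosopher. All names (N, init_table, pickup, putdown, can_eat) are invented for illustration.

    #include <pthread.h>

    enum State { THINKING, HUNGRY, EATING };
    const int N = 5;

    static State state[N];                                   // zero-initialized: everyone THINKING
    static pthread_mutex_t monitor_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  can_eat[N];                       // initialized in init_table()

    void init_table() {
        for (int i = 0; i < N; ++i) pthread_cond_init(&can_eat[i], nullptr);
    }

    // Grant both chopsticks to philosopher i only if neither neighbour is eating.
    static void test(int i) {
        int left = (i + N - 1) % N, right = (i + 1) % N;
        if (state[i] == HUNGRY && state[left] != EATING && state[right] != EATING) {
            state[i] = EATING;
            pthread_cond_signal(&can_eat[i]);
        }
    }

    void pickup(int i) {                                     // called before eating
        pthread_mutex_lock(&monitor_lock);
        state[i] = HUNGRY;
        test(i);
        while (state[i] != EATING)                           // Mesa-style: re-check after waking
            pthread_cond_wait(&can_eat[i], &monitor_lock);
        pthread_mutex_unlock(&monitor_lock);
    }

    void putdown(int i) {                                    // called after eating
        pthread_mutex_lock(&monitor_lock);
        state[i] = THINKING;
        test((i + N - 1) % N);                               // a neighbour may now be able to eat
        test((i + 1) % N);
        pthread_mutex_unlock(&monitor_lock);
    }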
move in memory Execution time Compiler generates an address and OS cana place it anywhere in memory 142 Uniprogramming Perhaps the simplest model for using memory is to provide uniprogramming without memory protection where each application runs with a hardwired range of physical memory addresses Given that a uniprogram ming environment allows only one application to run at a time an application can use the same physical addresses every time even across reboots Typically applications use the lower memory addresses low memory and an operating system uses the higher memory addresses high memory An application can address any physical memory location OS is protected from process by checking addresses used by process 14 1 142 Lecture 14 October 21 143 Multiprogramming One step beyond the uniprogramming model is to provide multiprogramming without memory protection When a program is copied into memory a linkerloader alters the code of the program loads stores jumps to use the address of where the program lands in memory In this environment bugs in any program can cause other programs to crash even the operating system The third model is to have a multiprogrammed operating system with memory protection Memory protection keeps user programs from crashing one another and the operating system 144 Relocation The role of relocation the ability to execute processes independently from their physical location in memory is central for memory management virtually all the techniques in this eld rely on the ability to relocate processes ef ciently The need for relocation is immediately evident when one considers that in a general purpose multiprogramming environment a program cannot know in advance before execution ie at compile time what processes will be running in memory when it is executed nor how much memory the system has available for it nor where it is located Hence a program must be compiled and linked in such a way that it can later be loaded starting from an unpredictable address in memory an address that can even change during the execution of the process itself if any swapping occurs Its easy to identify the basic requirement for a binary executable program to be relocatable all the references to memory it makes during its execution must not contain absolute ie physical addresses of memory cells but must be generated relatively ie as a distance measured in number of contiguous memory words from some known point The memory references a program can generate are of two kinds references to instructions ad references to data The former kind is implied in the execution of program branches or subroutine calls a jump machine instruction always involves the loading of the CPU program counter register with the address of the memory word containing the instruction to jump to The executable code of a relocatable program must then contain only relative branch machine instructions in which the address to branch to is speci ed as an increment or decrement with respect to the address of the current instruction or to the content of a register or memory word The latter kind comes into play when whenever program variables including program execution variables like a subroutine call stack are accessed In this case relocation is made possible by the use of indexed or increment processor addressing modes in which the address of a memory word is computed at reference time as the sum of the content of a register plus an increment or a decrement As well see later the memory references of a process in a multitasking 
environment must somehow be bounded so to protect from unwanted interferences memory areas like the unwritable parts of the process itself or the memory areas containing the images of other processes etc This is usually accomplished in hardware by comparing the address of each memory reference produced by a process with the content of one or more bound registers or memory words so that the processor traps an exception to block the process should an illegal address be generated 145 Memory Allocation Usually memory is allocated from a large pool of unused memory area called the heap In C dynamic allocationdeallocation must be manually performed using commands like malloc free new and delete Malloc allocates space of a given size and gives a pointer back to the programmer the programmer then can do whatever he wants with it The new command on the other hand allocates a speci c object of a given Lecture 14 October 21 143 size The general way in which dynamic allocation is done is that the program asks the memory manager to allocate or free objects or multiple pages then the memory manager asks the OS to allocatefree pages or multiple pages Notice however that the allocator does not give the whole allocated page back to the program it just gives what it asked for The rest of the page ie the parts not given to the program is saved for future memory requestsi 146 Allocation techniques Since most of the programs require a lot of memory allocationdeallocation we expect the memory manage ment to be fast to have low fragmentation make good use of locality and be scalable to multiple processors We say that fragmentation occurs when the heap due to characteristics of the allocation algorithm gets broken up77 into several unusable spaces or when the overall utilization of the memory is compromised External fragmentation happens when there is waste of space outside ie in between allocated objects internal fragmentation happens when there is waste of space inside an allocated areal Remember that the allocator might at any single time have several pages with unused space from which it could draw pieces of memory to give to requesting programs There are several ways to decide what are the good spots among those with free memory from which to take memory The rst t method nds the rst chunk of desired size and returns it it is generally considered to be very fast best t nds the chunk that wastes the least of space and worst t takes memory from the largest existent spot and as a consequence maximizes the free space available As a general rule rst t is faster but increases fragmentationi One useful technique that can be applied with these methods is always to coalesce free adjacent objects into one big free objecti Compaction It is a process that reduces the amount of fragmentation in le systems It does this by physically organizing the contents of the disk to store the pieces of each le close together and contiguouslyi It also attempts to create larger regions of free space to impede the return of fragmentation Some defragmenters also try to keep smaller les within a single directory together as they are often accessed in sequence 147 Keeping track of free memory How do we keep track of where the free pieces of memory are One idea is to maintain a set of linkedlists of free space each linkedlist will store free chunks of some given size say one list for chunks of 4 bytes one for chunks of 8 16 etc This approach resembles the best t algorithm since we can easily nd the list with chunks of memory that are the 
closest to what we need the difference of course is that usually the linked lists store not objects of arbitrary size but only powers of two Also we can obviously never coalesce adjacent elements of the lists since then the very idea of segregating by size classes wouldnlt make sense anymore Another possible technique to keep track of free memory is called Big Bag of Pages BiBOP In this technique a bunch of pages usually segregated by size classes is maintained such that there is a header on each page on this header among other things we can nd a pointer to the next page of objects of that given size and also a pointer to the rst free slot inside that page Using the right optimizations we can make BiBOP nd and manage free slots of memory in 01 CMPSCI 377 Operating Systems Fall 2008 Lecture 8 September 25 Lecturer Pmshant Shenoy Scribe Shashz39 Singh 81 Synchronization As we already know threads must ensure consistency otherwise race conditions nondeterministic results might happen Now consider the too much milk problem two people share the same fridge and must guarantee that there7s always milk but not too much milk How to solve it First we consider some important concepts and their de nitions 0 Mutex prevents things from operating on the same data at the same time 0 Critical section a piece of code that only one thread can execute at a time 0 Lock a mechanism for mutual exclusion the program locks on entering a critical section acesses the shared data and then unlocks Also a program waits if it tries to enter a locked section 0 Invariant something that must always be true when not holding the lock For the above mentioned problem we want to ensure some correctness properties First we want to guarantee that only one person buys milk when it is neeed this is the safety property aka nothing bad happens Also we want to ensure that someone does buy milk when needed the progress property aka something good eventually happens Now consider that we can use the following atomic operations when writing the code for the problem 0 leave a note equivalent to a lock 0 remove a note equivalent to a unlock o don7t buy milk if there s a note equivalent to a wait Our rst try could be to use the following code on both threads if no milk and no note leave note buy milk remove note Unfortunately this doesn7t work because both threads could simultaneously verify that there s no note and no milk and t en both would simultaneously leave a note an buy more milk The problem in this case is that we end up with too much milk safety property not met Now consider our solution 2 Thread A 82 Lecture 8 September 25 leave note quotAquot if no note quotBquot if no milk buy milk remove note quotAquot Thread B leave note quotBquot if no note quotAquot if no milk buy milk remove note quotBquot The problem now is that if both threads leave notes at the same time neither will ever do anything Then we end up with no milk at all which means that the progress property not met Solution 3 will be discussed on the next class 82 Concurrency When programming with threads processes or with any type of program that has to deal with shared data we have to take into account all possible interleaving of these processes In other words in order to guarantee that concurrent processes are correct we have to somehow guarantee that they generate the correct solution no matter how they are interleavedl Earlier we discussed the Too Much Milk77 problem and realized that its very dif cult to come up with an approach that always solves it properly Let 
CMPSCI 377 Operating Systems Fall 2008
Lecture 8: September 25
Lecturer: Prashant Shenoy    Scribe: Shashi Singh

8.1 Synchronization

As we already know, threads must ensure consistency; otherwise, race conditions (non-deterministic results) might happen. Now consider the "too much milk" problem: two people share the same fridge and must guarantee that there is always milk, but not too much milk. How do we solve it? First we consider some important concepts and their definitions:

- Mutex: prevents multiple threads from operating on the same data at the same time.
- Critical section: a piece of code that only one thread can execute at a time.
- Lock: a mechanism for mutual exclusion; the program locks on entering a critical section, accesses the shared data, and then unlocks. A program also waits if it tries to enter a locked section.
- Invariant: something that must always be true when the lock is not held.

For the above-mentioned problem, we want to ensure some correctness properties. First, we want to guarantee that only one person buys milk when it is needed; this is the safety property (a.k.a. "nothing bad happens"). Also, we want to ensure that someone does buy milk when needed; this is the progress property (a.k.a. "something good eventually happens"). Now consider that we can use the following atomic operations when writing the code for the problem:

- leave a note (equivalent to a lock)
- remove a note (equivalent to an unlock)
- don't buy milk if there is a note (equivalent to a wait)

Our first try could be to use the following code on both threads:

    if (no milk and no note) {
        leave note
        buy milk
        remove note
    }

Unfortunately this doesn't work, because both threads could simultaneously verify that there is no note and no milk, and then both would simultaneously leave a note and buy more milk. The problem in this case is that we end up with too much milk: the safety property is not met. Now consider our solution 2:

    Thread A:
    leave note "A"
    if (no note "B")
        if (no milk)
            buy milk
    remove note "A"

    Thread B:
    leave note "B"
    if (no note "A")
        if (no milk)
            buy milk
    remove note "B"

The problem now is that if both threads leave notes at the same time, neither will ever do anything. We then end up with no milk at all, which means that the progress property is not met. Solution 3 will be discussed in the next class.

8.2 Concurrency

When programming with threads, processes, or any type of program that has to deal with shared data, we have to take into account all possible interleavings of these processes. In other words, in order to guarantee that concurrent processes are correct, we have to somehow guarantee that they generate the correct solution no matter how they are interleaved. Earlier we discussed the "Too Much Milk" problem and realized that it is very difficult to come up with an approach that always solves it properly. Let us now consider an approach that does work:

    Thread A:
    leave note A
    while (note B)
        do nothing
    if (no milk)
        buy milk
    remove note A

    Thread B:
    leave note B
    if (no note A)
        if (no milk)
            buy milk
    remove note B

This approach, unlike the two examples considered in the previous class, does work. However, it is not easy to convince oneself that these two algorithms, taken together, always produce the desired behavior. Moreover, these pieces of code have some drawbacks. First, notice that Thread A goes into a loop waiting for B to release its note. This is called busy waiting, and is generally not a good idea, because Thread A wastes a lot of CPU and cannot execute anything useful while B is not done. Also notice that even though both threads try to perform the exact same thing, they do it in very different ways. This is a problem, especially if we were to write, say, a third thread: that third thread would probably look very different from both A and B, and this kind of asymmetric code does not scale very well.

So the question is: how can we guarantee correctness and at the same time avoid all these drawbacks? The answer is that we can augment the programming language with high-level constructs capable of solving synchronization problems. Currently the best known constructs used to deal with concurrency problems are locks, semaphores, and monitors.

8.2.1 Locks/Mutex

Locks (also known as mutexes) provide mutual exclusion for shared data inside a critical section. They are implemented by means of two atomic routines: acquire, which waits for a lock and takes it when possible, and release, which unlocks the lock and wakes up the waiters. The rules for using locks/mutexes are the following:

1. only one thread can hold the lock at a time;
2. the lock must be acquired before accessing shared data;
3. the lock must be released after use;
4. the lock is initially released.

The syntax for using locks in C/C++ is the following:

    pthread_mutex_init(&l, NULL);
    ...
    pthread_mutex_lock(&l);
    update data;   // this is the critical section
    pthread_mutex_unlock(&l);

Let us now try to rewrite the "Too Much Milk" problem in a cleaner and more symmetric way using locks. In order to do so, the code for Thread A (and also for Thread B) has to be the following:

    pthread_mutex_lock(&l);
    if (no milk)
        buy milk;
    pthread_mutex_unlock(&l);

This is clearly much easier to understand than the previous solutions; it is also more scalable, since all threads are implemented in the exact same way.

Now, how could one go about implementing locks? No matter how we choose to implement them, we must have some hardware support. One possibility is to disable interrupts, since interrupts are the only way the CPU has to change what it is doing; in other words, if we keep the CPU from switching processes, we can guarantee that only one process (the active one) will have access to the shared data. Another option is to make use of atomic operations such as test&set. This operation, which usually corresponds to an assembly instruction, is such that test&set(x) returns 1 if x == 1; otherwise (if x == 0) it returns 0 and sets x to 1. All of this is, of course, implemented atomically. Having this type of atomic operation, one could implement thread_lock(l) simply as

    while (test&set(l) == 1)
        ;  // do nothing

and thread_unlock(l) simply as

    l = 0;
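As a sketch of how such a spinlock can be built on top of an atomic test-and-set instruction, here is a version using C++11 std::atomic_flag (which exposes test_and_set) rather than raw assembly; the names spin_lock and spin_unlock are illustrative, not part of any standard API.

    #include <atomic>

    std::atomic_flag lock_word = ATOMIC_FLAG_INIT;   // clear means "free"

    void spin_lock() {
        // test_and_set atomically sets the flag and returns its previous value;
        // keep spinning (busy waiting) while someone else already holds the lock.
        while (lock_word.test_and_set(std::memory_order_acquire))
            ;  // do nothing
    }

    void spin_unlock() {
        lock_word.clear(std::memory_order_release);  // equivalent to l = 0
    }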
Summary

- Communication between threads is done implicitly via shared variables.
- Critical sections are regions of code that access shared variables.
- Critical sections must be protected by synchronization:
  - We need primitives that ensure mutual exclusion.
  - Writing personalized solutions to concurrency is tricky and error-prone.
  - The solution is to introduce general high-level constructs into the language, such as pthread_mutex_lock() and pthread_mutex_unlock().

CMPSCI 377 Operating Systems Fall 2008
Lecture 5: September 16
Lecturer: Prashant Shenoy    Scribe: Shashi Singh

5.1 Process Management

5.1.1 Process

A process is an instance of a computer program that is being sequentially executed by a computer system that has the ability to run several computer programs concurrently. A computer program itself is just a passive collection of instructions, while a process is the actual execution of those instructions. Several processes may be associated with the same program; for example, opening several windows of the same program typically means more than one process is being executed. The state of a process consists of: the code for the running program; its static data; its heap and the heap pointer (HP); the program counter; the stack and the stack pointer; the values of the CPU registers; the set of OS resources in use (list of open files, etc.); and the process execution state (new, ready, running, etc.).

5.1.2 Process Execution State

Processes go through various process states, which determine how the process is handled by the operating system kernel. The specific implementations of these states vary between operating systems, and the names of the states are not standardised, but the general high-level functionality is the same. When a process is first started/created, it is in the new state. It needs to wait for the process scheduler of the operating system to set its status to "new" and load it into main memory from a secondary storage device (such as a hard disk or a CD-ROM). Once it is loaded into memory it enters the ready state. Once the process has been assigned to a processor by the short-term scheduler, a context switch is performed (loading the process onto the processor) and the process state is set to running, where the processor executes its instructions. If a process needs to wait for a resource (such as waiting for user input, or waiting for a file to become available), it is moved into the waiting state until it no longer needs to wait; then it is moved back into the ready state. Once the process finishes execution, or is terminated by the operating system, it is moved to the terminated state, where it waits to be removed from main memory. The OS manages the multiple active processes using state queues.

5.1.3 Process Control Block

A Process Control Block (PCB) is a data structure in the operating system kernel containing the information needed to manage a particular process. The PCB is "the manifestation of a process in an operating system." A PCB will include: the identifier of the process (a process identifier, or PID); register values for the process, including the program counter; the address space of the process; its priority; process accounting information (such as when the process was last run, how much CPU time it has accumulated, etc.); a pointer to the next PCB, i.e. the PCB of the next process to run; and I/O information (I/O devices allocated to this process, list of opened files, etc.). Since the PCB contains critical information about the process, it must be kept in an area of memory protected from normal user access. In some operating systems the PCB is placed at the beginning of the kernel stack of the process, since that is a convenient protected location.
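Purely as an illustration of the kind of fields listed above (this is not the layout used by any particular kernel, and the field names and sizes are assumptions), a PCB could be sketched as:

    enum ProcState { NEW, READY, RUNNING, WAITING, TERMINATED };

    struct ProcessControlBlock {
        int pid;                        // process identifier (PID)
        ProcState state;                // current execution state
        unsigned long program_counter;  // saved program counter
        unsigned long registers[32];    // saved general-purpose registers (count is illustrative)
        void* address_space;            // description of the process's address space
        int priority;                   // scheduling priority
        long cpu_time_used;             // accounting: accumulated CPU time
        ProcessControlBlock* next;      // next PCB in the current state queue
        int* open_files;                // I/O information, e.g. table of open file descriptors
    };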
5.1.4 Process State Queues

The OS maintains the PCBs of all processes in state queues. The PCBs of all processes in the same execution state are placed in the same queue. When the state of a process changes, its PCB is unlinked from its current queue and moved to its new state queue. The OS can use different policies to manage each queue (FIFO, Round Robin, Priority, etc.). Each I/O device has its own wait queue.

5.1.5 Context Switch

A context switch is the process of storing and restoring the state (context) of a CPU so that multiple processes can share a single CPU resource. The context switch is an essential feature of a multitasking operating system. Context switches are usually computationally intensive, and much of the design of an operating system is aimed at optimizing the use of context switches. There are three scenarios in which a context switch needs to occur: multitasking, interrupt handling, and user/kernel mode switching. In a context switch, the state of the first process must be saved somehow, so that, when the scheduler gets back to the execution of the first process, it can restore this state and continue. The state of the process includes all the registers that the process may be using, especially the program counter, plus any other operating-system-specific data that may be necessary. Often, all the data necessary for this state is stored in one data structure, called the process control block.

5.1.6 Creating a Process (the fork System Call)

A process can create other processes to do work. In computing, when a process forks, it creates a copy of itself, which is called a "child process"; the original process is then called the "parent process." More generally, a fork in a multithreading environment means that a thread of execution is duplicated, creating a child thread from the parent thread. Under Unix and Unix-like operating systems, the parent and child processes can tell each other apart by examining the return value of the fork system call: in the child process, the return value of fork is 0, whereas in the parent process it is the PID of the newly created child process. The fork operation creates a separate address space for the child. The child process has an exact copy of all the memory segments of the parent process, though if copy-on-write semantics are implemented, actual physical memory may not be assigned immediately (i.e., both processes may share the same physical memory segments for a while). Both the parent and the child possess the same code segments, but execute independently of each other.

The child process usually executes one of the exec functions to do something useful. The exec functions of Unix-like operating systems are a collection of functions that cause the running process to be completely replaced by the program passed as an argument to the function. As no new process is created, the process ID (PID) does not change across an exec, but the data, heap, and stack of the calling process are replaced by those of the new program. In the execl, execlp, execv, and execvp calls, the child process inherits the parent's environment. The parent process, after creating the child process, may issue a wait system call, which suspends the execution of the parent process while the child executes. When the child process terminates, it returns an exit status to the operating system, which is then returned to the waiting parent process; the parent process then resumes execution.
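A minimal sketch of the fork/exec/wait pattern described above follows; error handling is mostly omitted, and the program being exec'd (ls) is just an example.

    #include <cstdio>
    #include <cstdlib>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        pid_t pid = fork();                  // duplicate the calling process
        if (pid == 0) {
            // child: fork returned 0, so replace this image with a new program
            execlp("ls", "ls", "-l", (char*)NULL);
            perror("execlp");                // only reached if exec failed
            exit(1);
        } else {
            // parent: fork returned the child's PID; wait for it to finish
            int status;
            waitpid(pid, &status, 0);
            printf("child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
        }
        return 0;
    }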
5.1.7 Process Termination

On process termination, the OS reclaims all resources assigned to the process. In Unix, a process can terminate itself using the exit system call. A process can terminate another process, if it has the privilege to do so, using the kill system call.

5.1.8 Cooperating Processes

Cooperating processes work with each other to accomplish a single task. This may improve performance by overlapping activities or performing work in parallel. It makes it easy to share information between tasks, and it can enable an application to achieve a better program structure, as a set of cooperating processes where each is smaller than a single monolithic program. Distributed systems are examples of cooperating processes in action.

In computer science, the producer-consumer problem (also known as the bounded-buffer problem) is a classical example of a multi-process synchronization problem. The problem describes two processes, the producer and the consumer, who share a common, fixed-size buffer. The producer's job is to generate a piece of data, put it into the buffer, and start again. At the same time, the consumer is consuming the data, i.e. removing it from the buffer, one piece at a time. The problem is to make sure that the producer won't try to add data into the buffer if it is full, and that the consumer won't try to remove data from an empty buffer.

5.1.9 Interprocess Communication

Inter-Process Communication (IPC) is a set of techniques for the exchange of data among two or more threads in one or more processes. The processes may be running on one or more computers connected by a network. IPC techniques are divided into methods for message passing, synchronization, shared memory, and remote procedure calls (RPC). The method of IPC used may vary based on the bandwidth and latency of communication between the threads, and on the type of data being communicated.

5.1.9.1 Message Passing

Message passing is a form of communication used in interprocess communication, in which communication happens by sending messages to recipients. Each process should be able to name the other processes. The consumer is assumed to have an infinite buffer size. The sender typically uses the send() system call to send messages, and the receiver uses the receive() system call to receive them. These system calls can be either synchronous or asynchronous.

5.1.9.2 Shared Memory

Shared memory is memory that may be simultaneously accessed by multiple programs with the intent of providing communication among them. One process creates an area in RAM which other processes can access; this is typically done using system calls such as mmap or shmget. Since both processes can access the shared memory area like regular working memory, this is a very fast way of communicating, as opposed to other IPC mechanisms. On the other hand, it is less powerful: for example, the communicating processes must be running on the same machine (whereas other IPC methods can use a computer network), and care must be taken to avoid issues if processes sharing memory are running on separate CPUs and the underlying architecture is not cache coherent. Also, since shared memory is inherently non-blocking, it can't by itself be used to achieve synchronization.
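As a small illustration of shared memory combined with fork, consider the sketch below. It is only a sketch: error checking is omitted, MAP_ANONYMOUS is assumed to be available (it is on Linux and BSD), and, as noted above, shared memory alone provides no synchronization, so the parent relies on wait rather than on the shared region itself.

    #include <cstdio>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        // Create an anonymous shared mapping visible to both parent and child.
        int* shared = (int*)mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        *shared = 0;

        if (fork() == 0) {
            *shared = 42;        // child writes into the shared region
            return 0;
        }
        wait(NULL);              // parent waits, then reads the child's value
        printf("value written by child: %d\n", *shared);
        munmap(shared, sizeof(int));
        return 0;
    }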
CMPSCI 377 Operating Systems Fall 2008
Lecture 6: September 18
Lecturer: Prashant Shenoy    Scribe: Shashi Singh

6.1 Interprocess Communication

Remember that by using fork we can split one process into two, and that the input of each process will basically be whatever the state of the program was right before the fork. Also notice that after the fork each process is free to follow its own path of execution, and that its output will be given by whatever value it returns via the exit call. Now, considering that a machine can have several processes running in parallel, we might want to make them communicate with each other. In the last class we discussed some of the possibilities for doing that: signals (sending/receiving simple integer numbers), mmap (implicit communication by memory sharing), pipes (unidirectional communication channels), and sockets (explicit message passing).

6.2 Threads

First, remember that different processes keep their own data in distinct address spaces. Threads, on the other hand, explicitly share their entire address space with one another. Although this can make things a lot faster, it comes at the cost of making programming a lot more complicated. In Unix/POSIX, the threads API is composed of two main calls:

- pthread_create, which starts a separate thread;
- pthread_join, which waits for a thread to complete.

Notice that the general syntax for using these is:

    pid = pthread_create(&tid, NULL, function_ptr, argument);
    pthread_join(tid, &result);

Example:

    void *run(void *d) {
        int q = (int) d;
        int v = 0;
        for (int i = 0; i < q; i++)
            v = v + some_function_call();
        return (void *) v;
    }

    int main() {
        pthread_t t1, t2;
        int r1, r2;
        pthread_create(&t1, NULL, run, (int *) 100);
        // the last parameter is a hack: it should be a pointer, but we can pass
        // the desired data (say, an int) as if it were a pointer
        pthread_create(&t2, NULL, run, (int *) 666);
        pthread_join(t1, (void **) &r1);
        pthread_join(t2, (void **) &r2);
        cout << "r1 = " << r1 << ", r2 = " << r2 << endl;
    }

Notice that the above threads maintain different stacks and different sets of registers; except for those, however, they share their entire address space. Also notice that if you were to run this code on a 2-core machine, it would be expected to run roughly twice as fast as on a single-core machine. If you ran it on a 4-core machine, however, it would run only as fast as on the 2-core machine, since there would not be enough threads to exploit the available parallelism.
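Since the notes themselves call the integer-cast argument a hack, here is a hedged alternative sketch showing the more conventional way to pass data to a thread: through a pointer to a caller-owned structure. The ThreadArg type and the constant loop body are illustrative assumptions standing in for some_function_call.

    #include <pthread.h>
    #include <iostream>
    using namespace std;

    struct ThreadArg {
        int iterations;   // input to the thread
        int result;       // output filled in by the thread
    };

    void *run(void *p) {
        ThreadArg *arg = (ThreadArg *) p;   // recover the typed pointer
        int v = 0;
        for (int i = 0; i < arg->iterations; i++)
            v = v + 1;                      // stand-in for some_function_call()
        arg->result = v;                    // return the result through the struct
        return NULL;
    }

    int main() {
        pthread_t t;
        ThreadArg arg = {100, 0};
        pthread_create(&t, NULL, run, &arg);   // pass a real pointer, no integer casts
        pthread_join(t, NULL);
        cout << "result = " << arg.result << endl;
        return 0;
    }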
Lecture 6 September 18 63 general to support the needs of all programmers languages runtimes etc For such negrained concurrency we need even cheaper threads To make threads cheap and fast they need to be implemented at user level Userlevel threads are managed entirely by the runtime system userlevel library A thread is simply represented by a program counter registers stack and small thread control block TCB Creating a new thread switching between threads and synchronizing threads are done via function calls without any kernel involvement Userlevel threads are about 100 times faster than kernel threads But userlevel threads are not a perfect solution They are invisble to the 03 As a result the OS cana make poor decisions like scheduling a process with idle threads blocking a process whose thread initiated an lO even though the process has other threads that can execute unscheduling a process with a thread holding a lock even when other threads do not hold any locksi Solving this problem requires communication between the kernel and the userlevel thread manageri

