Class Note for EECS 678 with Professor Kulkarni at KU
Introduction to PThreads and Basic Synchronization
Michael Jantz, Dr. Prasad Kulkarni

Introduction

In this lab, we will learn about some basic synchronization issues using PThreads. Grab the tar file for this lab:

    http://people.eecs.ku.edu/mjantz/678/labs/lab06/lab06.tar

Untar it and make the starter code:

    tar xvzf lab06.tar
    cd lab06
    make

Helpful man pages: pthreads(7), pthread_create, pthread_join, pthread_mutex_init, pthread_mutex_lock.

Threads vs. Processes

Multiple processes work well when the jobs performed by each process are unrelated. However, when a task requires multiple threads of control operating over the same state information (e.g., a game with several characters), a multi-process model has several disadvantages:

- Maintaining a copy of the same state information for each process is redundant.
- Exchanging state information requires expensive IPC mechanisms.
- With a task's work divided between multiple processes, the overhead of many context switches decreases performance.

The thread model was invented to address these issues. By allowing multiple threads of control to share the same address space, threads exchange state information efficiently.

When you create a new process, an entire copy of the parent process's process control block (PCB) is copied to the child. When you create a new thread, only the components of the PCB that are necessary to create a new thread of control are actually allocated for the child: a new PC, registers, stack, and some other miscellaneous info. The code and data regions of the PCB are not copied, but are now shared between the parent and the child. For these reasons, threads are often thought of as "lightweight processes."

Thread Pros and Cons

By allowing threads of control to share a common address space, the thread model provides an efficient way to multitask a job over several threads of control. State information for each thread is updated automatically without the use of IPC mechanisms. Context switching from thread to thread
is much more efficient than switching the context of an entire process. These advantages, however, come with their own drawbacks: the use of shared data must be synchronized between threads, and functions used by multiple threads of control must be reentrant. A reentrant function must not hold static data over successive calls or return a pointer to static data.

Creating Threads

In Linux, you use the fork() system call to create processes. fork() makes use of the clone() system call to clone the calling process's process control block for the child:

    int clone(int (*fn)(void *), void *child_stack, int flags, void *arg);

Operating systems provide clone() primarily as a way to implement lightweight processes. The flags argument allows the user to specify exactly which parts of the process control block the parent and child should share, as well as some other relevant properties. For example, calling clone() with the CLONE_FILES bit set in the flags argument will create a child which shares the parent's open file table, as opposed to creating a copy of the parent's file descriptor table for the child.

Thread Libraries

For many reasons, thread standards are often implemented and distributed as libraries: a common standard allows for portable software; libraries are an efficient way of distributing widely used user-level code; and depending on which metrics the programmer cares about, different user-level designs of the thread system will yield much different performance. In today's lab we will use the Native POSIX Thread Library, which implements the POSIX standard for Linux.

The POSIX Standard

The POSIX standard specifies a set of interfaces (i.e., functions and header files) that can be used for threaded programming. These interfaces require that the underlying implementations meet a certain set of criteria. Most basically, the underlying implementation must implement threading as described above. Specifically: a single process may contain multiple threads, all of which execute the same program. Threads share the same global memory (data and
heap segments), while each thread has its own stack (automatic variables). There are several other characteristics specified by the standard; e.g., threads must share a common process ID, but also have a unique thread ID. See the pthreads(7) man page for a complete description of the standard.

Native POSIX Thread Library

The Native POSIX Thread Library (NPTL) implements the POSIX standard on modern Linux distributions. A few notes on the design of NPTL:

NPTL is a so-called 1 x 1 thread library. This means every thread created by NPTL in user mode corresponds to exactly one schedulable entity in the kernel. In the case of Linux, this entity is a kernel thread, the lightest-weight schedulable entity the kernel uses. Contrarily, an m x n model typically has more threads in user land than there are schedulable entities as seen by the kernel. In this model, the thread library is responsible for scheduling threads unknown to the kernel. The advantage of the m x n model is that it eliminates most of the overhead of context switching between threads in the same process. The main disadvantages are the overhead introduced by having to communicate scheduling information from kernel mode to user mode whenever a scheduling decision has to be made, and that a user-mode scheduler is a fairly large piece of software which would have to be maintained. For these reasons, NPTL uses the 1 x 1 model, which is a relatively thin library layer.

ptcount.c

Open ptcount.c and read through it to get a feel for the program's structure. As it is now, the program doesn't do much. The goal of this lab is to learn how PThreads work by modifying this program to create some number of threads executing the same routine. The routine each thread executes will spin in two loops, bound by the OUTER_LOOP_MAX and INNER_LOOP_MAX command line arguments, incrementing some global integer for each iteration of the inner loop.

pthread_create

The first step is to actually create each pthread using the library routine pthread_create:

    int pthread_create(pthread_t *restrict thread, const
                       pthread_attr_t *restrict attr,
                       void *(*start_routine)(void *),
                       void *restrict arg);

The arguments are as follows:

- thread: on return, an address pointing to the allocated thread object.
- attr: a pointer to the attribute object for this thread. Attribute objects provide a mechanism for changing the configurable aspects of a thread.
- start_routine: a pointer to the routine where the thread should start its execution. For this lab, the start routine for all of our threads will be inc_count.
- arg: a pointer to the arguments passed to the start routine. Each thread's start_routine is only allowed one argument; thus, multiple arguments are packaged in a struct.

pthread_create (cont.)

In this example, the easiest way to create all of our threads is in a for loop. In order to save time, I've provided the syntax you should use here, but do make sure you understand what this call and each of its arguments mean:

    for (i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], &attr, inc_count, (void *) t_args);

When pthread_create returns, the thread pointed to by &threads[i] is live, i.e., it is ready to run and is waiting to be scheduled, or has started running from inc_count already.

Waiting on the Threads

You'll recall from lab 3 that we used waitpid to tell the parent process to wait for its children. We achieve similar behavior for threads with join:

    int pthread_join(pthread_t thread, void **value_ptr);

pthread_join will suspend execution of the calling process or thread until the target thread (the thread argument) has terminated. value_ptr is a pointer to the data returned by the pthread_exit library call. We do not use this data, so you can just pass in NULL here. Again, use pthread_join in a for loop:

    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

Testing the Threaded Program

Now is a good time to go ahead and test the threaded program. To run the program, use:

    bash-3.2$ ./ptcount OUTER_LOOP_MAX INNER_LOOP_MAX

where OUTER_LOOP_MAX is the number of times the outer loop in each thread should iterate and INNER_LOOP_MAX is the number of times the inner loop in each thread
should iterate. If you created and joined the PThreads correctly, you should see something like:

    bash-3.2$ ./ptcount 10 50
    Thread 0 finished. Counted: 500
    Thread 1 finished. Counted: 500
    Thread 2 finished. Counted: 500
    Main: Waited on 3 threads. Final value of count: 1500
    Done

which is exactly what we wanted: each thread increments the count 500 times (10 * 50 = 500), and the final value of count is 1500 because we have three threads.

Puzzling Behavior

Though these initial tests seem to show that ptcount is working correctly, running ptcount with larger loop boundaries results in very puzzling behavior. The makefile has a target called test that runs ptcount with sufficiently large loop boundaries. The output should look something like this:

    bash-3.2$ make test
    ./ptcount 10 500000
    Thread 0 finished. Counted: 5000000
    Thread 1 finished. Counted: 5000000
    Thread 2 finished. Counted: 5000000
    Main: Waited on 3 threads. Final value of count: 10036976
    Done

Each thread reports that it counts up to 5 million, but the final count is much less than 15 million. Note that in ptcount.c, each thread prints out a local variable to report its own count. Furthermore, if you run the test again, the final count changes, but it is still not the expected result. What is going on?

Incrementing Count

Each thread updates the count using the ++ operator in C: count++. Observe that this one-line statement in C is actually implemented by three instructions on x86 hardware:

    mov CADDR, eax    ; Move count into a register
    add 1, eax        ; Add 1 to count
    mov eax, CADDR    ; Store count back into memory

Now recall that each thread is sharing only one copy of count, which is loaded from memory and stored again each time the thread increments it. While this does not explain our behavior yet, it's necessary for understanding the solution.

A Preemptive Scheduler

Recall from lecture that the Linux scheduler is preemptive. That is, the scheduler has the ability to stop a running process, with the intention of resuming it later, and replace it with another running process. Now imagine a
scenario where the scheduler just happened to interleave our running threads in the following way:

    Time  T1              T2
    1     mov CADDR, eax
          (T1 preempted)
    2                     mov CADDR, eax
    3                     add 1, eax
    4                     mov eax, CADDR
          (T2 preempted)
    5     add 1, eax
    6     mov eax, CADDR

An Interleaving Problem

Observe that at time 1, T1 loads the count value from memory. Next, T1 is preempted and T2 begins executing. In its execution, it loads the same value from memory. T2 continues, incrementing the value it had loaded and storing it back into memory. Now, at time 5, T2 is preempted by T1. At this time, T1 still holds the value it loaded at time 1 in eax. T1 proceeds to increment this value and stores it back into memory. When T1 stores its value of count back into memory, the work done by T2 is essentially lost. This explains why the value read by the parent process after the threads had finished incrementing is substantially lower than what you might expect. When each thread runs long enough to be preempted by the scheduler several times, and when each thread is running code with a significant percentage of instructions that will cause this problem when preempted, there is bound to be some inconsistency in the data being operated on.

Questions

To test your understanding, you should answer the following questions:

- Why was there no inconsistency in the count when you tested with smaller inner and outer loop bounds?
- Why are the local variables that are printed out always consistent?

Fixing the Problem

Regions of code that update shared data are known as critical sections. In order to ensure proper synchronization between cooperating threads, we need some way of enforcing the property that only one thread may execute inside a critical section at a time. Solution: place mutex locks around critical sections. The operating system ensures that mutual exclusion between threads executing inside code regions locked by mutex locks is enforced.

pthread_mutex_lock

NPTL provides library calls which implement a mutex lock:

    int pthread_mutex_lock(pthread_mutex_t *mutex);
    int pthread_mutex_unlock(pthread_mutex_t *mutex);

Place
mutex lock and unlock calls around the inner for loop:

    pthread_mutex_lock(&count_mutex);
    for (j = 0; j < my_args->inl; j++)
        count++;
    pthread_mutex_unlock(&count_mutex);

Now the operating system allows only one of the running threads to hold count_mutex at a time. When a thread reaches pthread_mutex_lock and count_mutex is already held, the operating system blocks the calling thread. All threads blocking on count_mutex are woken up when the lock is released by pthread_mutex_unlock.

Final Solution

When you have built the example with the locks in place, ptcount should produce the following output:

    bash-3.2$ make test
    ./ptcount 10 500000
    Thread 0 finished. Counted: 5000000
    Thread 2 finished. Counted: 5000000
    Thread 1 finished. Counted: 5000000
    Main: Waited on 3 threads. Final value of count: 15000000
    Done