New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here


by: Addison Beer


Addison Beer
GPA 3.76


Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

Class Notes
25 ?




Popular in Course

Popular in Mathematics (M)

This 96 page Class Notes was uploaded by Addison Beer on Wednesday September 9, 2015. The Class Notes belongs to MATH 381 at University of Washington taught by Staff in Fall. Since its upload, it has received 6 views. For similar materials see /class/192102/math-381-university-of-washington in Mathematics (M) at University of Washington.




Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 09/09/15
Mathematical Modeling Math 381 Course Notes University of Washington Prof Tim Chartier Winter 2008 R J LeVeque Acknowledgments These notes are based on many quarters of teaching Math 381 The original notes were compiled by Randy LeVeque in Applied Math at UWi Tim Chartier and Anne Greenbaum contributed signi cantly to the current version in 20012002 Sara Billey contributed to the manuscript in the fall of 2004 Tim Chartier made further contributions as he taught a version of this course at Davidson College in the 2005 Jim Burke contributed a portion of the chapter on linear programming We thank them for their efforts Since this document is still evolving7 we appreciate the input of our readers in improving the text and its content Math 381 Notes 7 University of Washington 7 Winter 2008 Contents 1 Introduction to Modeling lil Lifeboats and life vests i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 12 The goals of 381 i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 2 Graph Theory and Network Flows 2 1 Graphs and networks i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 22 The traveling salesman problem i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 221 Brute force search i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 222 Counting the possibilities i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 223 Enumerating all possibilities i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 224 Branch and bound algorithms i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 23 Complexity theory i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 24 Shortest path problems i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 241 Word ladders i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 242 Enumerating all paths and a branch and bound algorithm i i i i i i i i i 243 Dijkstra s algorithm for shortest paths i i i i i i i i i i i i i i i i i i i i 25 Minimum spanning trees i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 251 Kruskal7s algorithm i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 252 Primls algorithm i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 253 Minimum spanning trees and the Euclidean TSP i i i i i i i i i i i i i i 26 Eulerian closed chains i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 27 73 N73 and NPcomplete problems i i i i i i i i i i i i i i i i i i i i i i i i i i 3 Stochastic Processes 31 Basic Statistics i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 32 Probability Distributions and Random Variables i i i i i i i i i i i i i i i i i i 321 Expected Value i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 33 The Central Limit Theorem i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 34 Buffon7s Needle i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 4 Monte Carlo Methods 41 A sh tank modeling problem i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 42 Monte Carlo simulation i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 421 Comparison of strategies i i i i i i i i i i i i i i i i i i i i i i i i i i i i 43 A hybrid strategy i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 44 Statistical analysis i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 45 Fish tank simulation code i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 46 Poisson processes i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 47 Queuing theory i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i ii CONTENTS 5 Markov Chains and Related Models 45 51 A Markov chain model of a queuing problem i i i i i i i i i i i i i i i i i i i i i i i i 45 52 Review of eigenvalues and eigenvectors i i i i i i i i i i i i i i i i i i i i i i i i i i i i 48 5 3 Convergence to steady state i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 50 54 Other examples of Markov Chains i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 52 5471 Breakdown of machines i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 52 542 Work schedules i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 53 543 The sh tank inventory problem i i i i i i i i i i i i i i i i i i i i i i i i i i i 55 544 Card shuf ing i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 56 55 General properties of transition matrices i i i i i i i i i i i i i i i i i i i i i i i i i i i 57 5 6 Ecological models and Leslie Matrices i i i i i i i i i i i i i i i i i i i i i i i i i i i i 59 5 61 Harvesting strategies i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 61 6 Linear Programming 63 6 1 lntroduction i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 63 6171 What is optimization i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 63 612 An Example i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 63 6 13 Sensitivity Analysis i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 67 6 14 Duality Theory i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 69 6175 LPs in Standard Form and Their Duals i i i i i i i i i i i i i i i i i i i i i i i 70 62 A pig farming model i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 73 6 21 Adding more constraints i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 75 6 22 Solving the problem by enumeration i i i i i i i i i i i i i i i i i i i i i i i i i 75 6 23 Adding a third decision variable i i i i i i i i i i i i i i i i i i i i i i i i i i i 77 624 The simplex algorithm i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 77 63 Complexity and the simplex algorithm i i i i i i i i i i i i i i i i i i i i i i i i i i i i 80 7 Integer Programming 81 71 The lifeboat problem i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 81 7 2 Cargo plane loading i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 83 73 Branch and bound algorithm i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 84 7 4 Traveling salesman problem i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 86 7471 NP completeness of integer programming i i i i i i i i i i i i i i i i i i i i i i 87 75 l in Recreational Mathematics i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 87 7571 An Example Puzzle i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 88 7 52 Formulation of the l i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 88 7 6 Recasting an IP as a 01 LP i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 89 Math 381 Notes 7 University of Washington 7 Winter 2008 Chapter 1 Introduction to Modeling 11 Lifeboats and life vests In a January 1998 article Most Ferries Short Of Life Rafts 7 Is It A Titanic Problem Or A Reasonable CostControl Measure77 the Seattle Times announced that Washington State Ferries which service the Puget Sound area see Figure 11 were not carrying enough lifeboats for all passengers In fact as we will see there is a reason for this decision The ferries do carry enough life vests to make up the difference However the Washington ferries traverse cold waters As such it would be better to carry more lifeboats since your chances of surviving are better in a lifeboat than in cold water However lifeboats take up much more space and so there is a tradeoff 7 the capacity of the ferries would be greatly reduced if they had to carry enough lifeboats for everyone Suppose the XYZ Boatbuilders have come to us with the following problem They are designing a new ferry boat and want to determine the number of lifeboats and life vests to equip it with They give us the following information 0 Each vest holds 1 person and requires 005 m3 of space to store 0 Each boat holds 20 people and requires 21 m3 of space to store 0 We must have capacity for 1000 people in one or the other 0 The total space to be devoted to this equipment is at most 85 m3 How many of each should they install One might go through the following steps in determining a solution 1 First see if it s possible to accommodate everyone in lifeboats which would be best This would require 100020 X 21 105m3 of space so the answer is no we must use some combination 2 Make sure there7s enough space for life vests for everyone otherwise the problem is impossible 1000 X 05 50mg so it would be possible to provide only life vests and have space left over But for the best solution we7d like to use more of the available space and provide as many boats as possible 9 Figure out how many boats and vests are used if we exactly use all the available space and provide exactly the necessary capacity We can set this up as a mathematical problem There are two unknowns 11 number of vests 12 7 number of boats Introduction to Modeling Figure 11 The Washington State Ferries service the Puget Sound area7 which includes the metroi Seattle area Pictures taken from httpwwwwsdotwagovferries 4 01 03 K and two equations to be satis ed7 one saying that the total capacity must be 1000 people and the other saying that the total volume must be 85mg This is a linear system of equations 011171 02172 C 11 V1951 V2952 V where 01 1 capacity of each vest Cg 20 capacity of each boat C 1000 total capacity needed V1 005 volume of each vest V2 21 volume of each boat V 85 total volume available Solve the mathematical problem Equation 11 is just a 2 x 2 linear system so its easy to solve7 obtaining x1 3636364 number of vests x2 318182 number of boats Interpret the mathematical solution in terms of the real problem Clearly we cant actually provide fractional vests or boats7 so we ll have to x up these numbers The only way to do this and keep the volume from increasing is to reduce the number of boats to 31 and increase the number of vests to keep the capacity at 1000 Since 31 x 20 620 this means we need 380 vests The total volume required is 8417713 Present the answer to the XYZ Company At which point they say You must be crazy Every body knows that the lifeboats are arranged on symmetric racks on the two sides of the boat7 and so there must be an even number of lifeboats Fix up the solution using this new information There must be 30 boats and 400 vests Reconsider the solution to see if we7ve missed anything Notice that the original mathematical solution had 352 almost equal to 32 This suggests that if we had just a little more space 85277137 we could increase the number of boats to 32 instead of 307 a substantial improvement Suggest 12 The goals of 381 3 Figure 12 Balancing the load of ferries is important as the boats carry great weights Picture taken from httpwwwwsdotwagovferries this to XYZ They respond Well sure the 85m3 gure was just an approximation We can easily nd another 02m3if it means we can t in two more boats This simple example illustrates some of the issues involved in mathematical modeling 0 The problem is usually not presented as a standard mathematical problem but must be worked into this form 0 There may be many different ways to model the same problem mathematically In the above example we reduced it to a linear system But for more complicated problems of this same type this would not be possible and we would have to deal directly with the fact that we are given inequality constraints eg the total volume must be no more than 85mg rather than equalities The problem is better set up as a linear programming problem a type of discrete optimization problem in which we try to maximize the number of boats subject to two sets of constraints an upper bound on volume and a lower bound on capacity Actually it is an integer programming problem since we also have the constraint that the solution must be a pair of integers There are standard methods for solving these types of problems 0 Once the mathematical problem has been solved eg the linear system above we need to interpret the answer to this problem in terms of the original problem Does it make sense 0 Often the information presented is not complete or necessarily accurate Here for example we weren7t told to begin with the information that the number of boats had to be even or that the available volume was only approximate o Often the problem is far too hard to model in detail and solve exactly This example was quite straightforward in this regard but typically one must make decisions about what aspects to consider and which can be safely ignored or must be ignored to get a solvable problem Often there is no right answer though some answers are much better than others 12 The goals of 381 The goals of Math 381 are the following 0 Learn about the modeling process How does one take an illde ned real world problem and reduce it to a mathematical problem that can be solved by standard techniques What issues are Introduction to Modeling involved in dealing with real data and complex problems where we can often only hope to model some small piece of the total problem In order to have any chance of making a model that reduces to a standard mathematical problem77 we must learn what some of these standard problems are For example in the above problem we recognized early on that we could set it up as a linear system of equations and we knew that this would be a good thing to do since there are standard methods for solving such systems such as Gaussian elimination We need some more tools in our tool bag in order to do modelingl We will see a variety of standard problems and how each can arise in many different contexts These will be introduced in the course of looking at a number of modeling problemsl Math 381 concentrates on discrete modeling which means the mathematical problems typically involve a discrete set of values or objects eigl the life vests and lifeboatsi Moreover there are typically a discrete set of possibilities to choose between Trying out all of the possibilities to see which is best might be one possible approach but is typically very inef cient We want to use mathematical techniques to make the job easier For example it is clear that the solution to the problem above must consist of at most 1000 vests since this would accommodate everyone and at most 40 lifeboats since this would use all the available space One could try all possible combinations and nd the right mix by trial and error but as mathematicians we recognize that it is much easier to simply solve a 2 X 2 linear systemi There are many standard mathematical problems and techniques which are useful in discrete modelingl Some of these are 7 Linear programming and integer programming Ii i i Combinatoiic and ial 1 Graph theory Network flows and network optimization Dynamic programming Queuing theory Stochastic processes and Markov chains Monte Carlo simulation Difference equations and recurrence relations Discrete dynamical systems Game theory Decision analysis 7 Time series analysis and forecasting Many modeling problems are better solved with continuous models in which case different mathe matical techniques are used for example di erential equations Such models are studied in AMath 383 The main emphasis of this course will be on the modeling process taking a problem and setting it up as a mathematical problem and interpreting the results of the model Along the way we also will learn something about techniques for solving some of the standard problems listed above eg solving a linear programming problem by the Simplex Method This is necessary in order to solve the modeling problem but the study of methods for solving these problems is not the main emphasis of this course ii 1i 1 There are other courses which on the aspects of these problems and the methods which have been developed to solve theml Some of these are u i 1 12 The goals of 381 5 Math 461 Math 462 Math 407 Math 408 Math 409 Discrete optimization Math 491 492 Stochastic processes CSE 373 Data structures and algorithms CSE 417 Algorithms and Complexity Industrial Engineering 410 411412 and 424 Operations Research and Simulation Combinatorics Graph Theory Linear optimization Nonlinear optimization This course will provide an introduction to many of these topics which might motivate you to take some of these other courses and pursue the mathematics in more dept There are also other related courses in other departments including Industrial Engineering and Management Science See the description of the Operations Research Option in the ACMS BS Degree Program for a list of some of these http www mathwashington eduacms Another goal of the course is to give you experience in working together with others on a mathe matical problem Discussion will be encouraged in class and collaboration is encouraged outside of class The projects must be done in groups This is the way most modeling is done in the real world by teams of mathematicians or more commonly by a mathematician working together with scientists engineers or specialists in another discipline The project will involve a paper Scienti c writing even when it is written to a nonmathematical audience which such papers often are is an important part of any realworld modeling project and an important skill to master During the quarter you also will read some papers written by others where modeling problems are presented and do some shorter writing assignments Introduction to Modeling Math 381 Notes 7 University of Washington 7 Winter 2008 Chapter 2 Graph Theory and Network Flows 21 Graphs and networks Imagine you are planning a trip around the United States based on a map showing distances between various cities and points of interest The map has a labeled point for each city and lines connecting nearby cities are labeled by the distance between the two cities One might ask a question like How should we drive if we want to visit every city on the map77 This is an example of a network flow problem from graph theory More generally a graph is a collection of vertices or nodes which are connected by edgesi A graph is connected if there is a way to get from any vertex to any other vertex by following edgesi Figure 21 shows a few types of connected graphs that arise in many different applications 0 A directed graph or digraph has arrows on edges indicating what direction one can move between nodes 0 A network has weights on the edges often indicating the cost of moving from one vertex to the neighbor eg the distance or driving time in the map example A network could also be directed for example if some routes on the map are oneway streets 0 A tree is a connected graph which contains no circuitsi A family tree is one familiar examplei Graph Directed graph Network Tree Figure 21 Some common graphs The vertices and edges of a graph might represent any number of different things depending on the application For example the vertices in a digraph might represent legal positions in a game with an arrow from Vertex A to Vertex B if you can go from Position A to Position B by a single legal mover Or the vertices could represent individuals in a hierarchical organization with an edge from A to B if Person A reports to Person B These and other examples are discussed in Rob76li 7 8 Graph Theory and Network Flows 22 The traveling salesman problem A famous network ow problem is the traveling salesman problem or TSP for short This name comes from one particular application but the same mathematical problem arises in many contexts GENERAL FORMULATION OF TSP Given a network with N 1 vertices having an edge between every pair of vertices what is the shortest path through the network which starts at a particular vertex Visits every other node exactly once and returns to the original vertex Studying the TSP gives a good introduction to the mathematical theory of combinatorics and complexity of algorithms Consider the TSP for a salesman who wants to start in Seattle Visit 4 other cities and end up back in Seattle What is the optimal order in which to Visit the other cities 221 Brute force search One way to solve this problem is to simply enumerate all the possible routes determine the total length of each and choose the shortest This is called the brute force approach since we arenlt using any sophisticated ideas to do an intelligent search for the best route we re just doing the obvious thing It may still take some thought however to gure out how to systematically enumerate and check each possibility without missing any routes 222 Counting the possibilities The rst question is how many possible routes are there This will give us some guidance when designing an algorithm to check them all and also will tell us something about how long it will take to do this brute force search Counting questions like this are one aspect of the mathematical eld of combinatorics To do a count for this 4 city TSP note that we must choose one city to Visit rst and there are 4 possibilities Once we7ve chosen that city we must choose which city to Visit second and there are only 3 possibilities since this must be a different city Then we have to choose the city to Visit third and there are only 2 unvisited cities left Finally the fourth city is determined since there is only one city left This is illustrated as a decision tree in Figure 22 where the four cities are labeled A B C Di Starting from the city of origin 0 we have 4 choices then 3 choices then 2 and ll The total number of distinct routes is 41432124 Similarly for the TSP with N cities actually Nl cities including Seattle the number of possible routes we must check is NNNilN7232li Note that what we have counted is the number of ways one can permute the N cities In general for any set of N distinct objects there are N possible orderings or permutations of the N objects With only 4 cities we could quickly compute the traveling salesman s path length for all 24 permu tations by hand and determine the shortest route But unfortunately Nl grows very rapidly with N eg 5 120 10 3628800 20 2432902008176640000 50 m 3x105 For a TSP with 10 cities we would need to check about 36 million possibilities still fairly easy on a computer To get an idea of how long this would take consider a 500 MHZ PC This is a computer which performs 500 million instructions or cycles every second Here instruction means a low level machine operation and it typically takes many cycles to perform one arithmetic operation such 22 The traveling salesman problem 9 as an addition or multiplication These are called oating point operations since scienti c notation is usually used to represent a large range of values on the computer For computations of the sort mathematicians engineers and scientists do a useful measure of a computer s speed is the number of oating point operations per second called flops Machine speed is often measured in megaflops millions of oating point operations per second A 500 MHZ machine might do about 40 megaflops Supercomputer speeds are often measured in gigaflops billions of flops or teraflops trillions of flops Returning to the traveling salesman problem note that to compute the length of a single route requires about N additions adding up the length between each pair of cities on the route as well as more work to keep track of which route to examine next which route is shortest etc Lets not worry too much about the exact cost but suppose that we are working on a machine that can examine ten million routes per second Then solving the lOcity TSP would take about 13 of a second Not bad But already with 20 cities it would be difficult to nd the answer using this brute force approach We now have about 24 X 1018 routes to examine and at a rate of 10 million per second this will take 24 X 1011 seconds which is about 7715 years Even if we move up to a supercomputer that is a thousand times faster it would take almost 8 years For a TSP with 50 cities there is no conceivable computer that could nd the solution in our lifetimes using the brute force approac Many simple problems suffer from this kind of combinatorial explosion of possible solutions as the problem size increases which makes it impossible to simply enumerate an check every possibility Nonetheless such problems are solved daily using more sophisticated mathematical techniques 223 Enumerating all possibilities For an Ncity TSP with N sufficiently small we could solve by enumeration This requires some algorithm for systematically checking every possibility Figure 22 shows one way to think about enu merating all the possible routes for the 4city TSP In the rightmost column is an enumeration of all 24 routes in the lexicographic order so called because this is the order that would be obtained by putting the 24 strings representing different routes into alphabetical order Note that the Si 6 routes starting with A come rst then the 6 routes starting with B and so on Within each set the routes are grouped by which city comes second ordered from lowest to highest in lexicographic ordering 224 Branch and bound algorithms lnstead of evaluating the cost of every path appearing in the tree of Figure 22 one might be able to rule out some paths early if their cost exceeds that of the best path discovered so far Consider for example the network in Figure 23 One might rst try a greedy algorithm that takes the shortest edge to a new city at each step This leads to the path OADCB O which has a total cost of 15 After discovering this any path in Figure 22 that uses edge BD or equivalently DB can be eliminated since the cost of that edge alone is 16 By continuing to check the other paths that start with OA one nds that the path OABCD O has a cost of only 14 Examining next the paths that begin with 0D and discarding any partial paths whose cost exceeds 14 one nds that ODABCO costs only 11 and this enables one to eliminate even more possibilities Paths beginning with OC and OB still must be checked but as soon as the cost of a partial path exceeds 11 that branch of the tree can be eliminated from further consideration This type of algorithm is sometimes called a branch and bound algorithm The key idea is to obtain some sort of bound on how good the shortest path within a given subtree could possibly be and if a less costly path has already been identi ed to prune away those branches of the tree thereby reducing the required computation If we are unlucky then we might not be able to prune much of the tree and in the worstcase scenario we have to examine the entire tree anyway which is no improvement in running time However in practice a good branch and bound algorithm will typically be much faster than pure enumeration and problems are solved this way which would otherwise be intractable 10 Graph Theory and Network Flows W WWW 00gt gag gtUgtU gag U agt o lt BDC 7 BDCA o CAB 7 CABD CA lt CAD 7 CADB CBA 7 CBAD C CB lt CBD 7 CBDA CDA 7 CDAB CD lt 7 CDB CDBA DAB 7 DABC DA lt CAC 7 DACB D DB lt DBA 7 DBAC DEC 7 DBCA B lt DCA 7 DCAB DCB 7 DCBA Figure 22 Decision tree for enumerating all possible routes in a traveling salesman problem starting from City 0 and Visiting four other cities A B an D The right column shows an enumeration of all permutations of the 4 cities in lexicographic orderi Seattle O 1 1 D 6 C Figure 23 A traveling salesman network starting from Seattle and Visiting four other cities A B C 23 Complexity theory 11 Many other algorithms have been developed for the traveling salesman problem and for various special cases of this probleml An introduction to many of these can be found in LLASQOL 23 Complexity theory For a particular mathematical problem say the traveling salesman problem there is often a parameter N which measures the size of the problem eg the number of vertices in the network If we have an algorithm for solving the problem then we are interested in knowing how much work the algorithm requires as a function of Ni In particular we want to know how fast this grows as N increases For some algorithms the amount of work required or time required on a computer is linear in N meaning it is roughly cN where c is some constant If we double the size of the problem then the amount of work required roughly doubles We say that such an algorithm requires ON work The big Oh stands for order 7 it takes order N work and we re usually not too concerned about exactly what the size of the constant c is For some algorithms the work required increases much faster with Ni An 0N2 algorithm requires work that is proportional to N2 which means that doubling N increases the work required by a factor of 4 If an algorithm requires 0Np work for some p then it is said to be a polynomial time algorithml The larger p is the more rapidly the work grows with Ni Here are some examples of polynomial time algorithms 0 Finding the largest of N values Checking the value of each item and keeping track of the largest value seen takes ON operations 0 Solving a system of N linear equations in N unknowns using the Gaussian elimination algorithm takes 0N3 operations o If the linear system is upper triangular then it only takes 0N2 operations using back substi tution i o Dijkstra s algorithm for nding the shortest path between two vertices in a network see below requires 0N2 work in general For most problems we will discuss with polynomial time algorithms the value of p is typically fairly small like 1 2 or 3 in the examples above Such problems can generally be solved fairly easily on a computer unless N is extremely large There are other problems which are essentially much harder however and which apparently require an amount of work which grows exponentially with the size of N like 02N or like ONll The traveling salesman problem is in this category While many algorithms have been developed which work very well in practice there is no algorithm known which is guaranteed to solve all TSP s in less than exponential time See Section 27 for more about this 24 Shortest path problems A class of problems that arise in many applications are network ow problems in which we wish to nd the shortest path between two given vertices in a network The problem of nding the shortest route on a map is one obvious example but many other problems can be put into this form even if they donlt involve distance in the usual sense 241 Word ladders A word ladder is a sequence of valid English words each of which is identical to the previous word in all but one letter ltls fun to try to construct word ladders to go from some word to a very different word with as few intermediate words as possible For example we can make FLOUR into BREAD as follows 12 Graph Theory and Network Flows Given two English words with the same number of letters how could we determine the shortest word ladder between them Assuming there is one or determine that none exists This can be set up as a network ow problem Suppose we are considering Nletter words then we could in principle construct a graph where the vertices consist of all Nletter words in the English language We draw an edge from one vertex to another if those two words are simple mutations of each other in which only a single letter has been changed A word ladder then corresponds to a path through this graph from one word to the other In fact this graph with nearly 6000 vertices has been constructed for N 5 by Don Knuth and is available on the web see the pointer from the class webpage Word ladders and algorithms for manipulating these graphs are discussed at length in Knu93 which describes a general platform for combinatorial computing Finding the shortest word ladder is equivalent to the shortestpath problem through a network if we assign every edge in the graph the weight 1 Given a network and two speci c vertices how do we nd the shortest path between them For small networks this is easy to do by trial and error particularly if the graph is drawn in such a way that the physical distance between the vertices roughly corresponds to the weight of the edge as is the case for a map But the weights may have nothing to do with physical distance and at any rate if there are thousands or millions of vertices and edges it may be impossible to even draw the graph 242 Enumerating all paths and a branch and bound algorithm One approach to solving a shortestpath problem would be to enumerate all possible paths between the start and end vertices compute the length of each path and choose the shortest To enumerate all paths we could build a decision tree77 similar to what was illustrated in Figure 22 for the TSP The problem with this approach is that for a graph with N vertices there may be very many possible paths to choose between How many In the worst case there would be an edge connecting each vertex with each of the N 7 1 other vertices In this case we could visit any or all of the other vertices on the way between the start and nish and the number of possible paths would be exponential in N just as for the TSP The set of all possible paths could be searched more ef ciently using a branch and bound algorithm similar to that described for the TSP However for this problem there turn out to be much more ef cient algorithms which are guaranteed to give the answer with relatively little work 243 Dijkstra7s algorithm for shortest paths Dijkstra 5 algorithm is a systematic procedure for nding the shortest path between any two vertices Actually this algorithm solves the problem by embedding it into what appears to be a harder problem that of nding the shortest route to all other vertices from a given vertex The set of all shortest routes is built up in N steps by nding the closest node in Step 1 the next closest node in Step 2 and so on By the end we know the best route to every other node and in particular to the destination node Each step requires examining the distance from the node that was solved in the previous step to all unsolved nodes which can be reached from this node This requires examining at most N 7 l edges so the work required is ON in each of the N steps and hence the total work is 0N2 To implement it on a computer for a large network requires some suitable data structure for storing the network with information about which vertices are connected to one another and the weight of each edge One way to represent a network or directed network on a computer is with a matrix The rows 25 Minimum spanning trees 13 and columns correspond to the nodes of the network and the ij entry of the matrix is the weight on the edge from node i to node j or if there is no edge from i to j that entry can be set to A matrix representation of the network in Figure 23 is OABCD 0 02543 Al20291 Bl520116 0 49106 Dl311660 Well go through some simple examples of Dijkstra s algorithm in class It is described in many books eg HL95 and Rob84 A writing assignment will be to produce your own description of this algorit m 25 Minimum spanning trees Another type of problem that arises in many applications is to connect a collection of vertices at the lowest possible cost Examples include highway construction and telephone line installation the vertices represent cities and the roads or phone lines correspond to edges constructed so that there is some path between every pair of vertices ln graph theoretic terms this is accomplished via a minimal cost spanning tree Recall that a tree is a graph that is connected and has no circuits If a group of vertices is connected by edges that do contain a circuit then it is not hard to see that one or more of the edges can be removed without destroying the connectedness of the graph This is illustrated in Figure 24 Since each edge has a nonnegative cost associated with it clearly the least expensive way to connect the vertices will be via a graph with no circuits ie a spanning tree Figure 24 Graphs with circuits and corresponding spanning trees In a spanning tree the number of edges is always one less than the number of vertices e N 7 1 Each graph in Figure 24 has N 5 vertices and the two trees have 6 4 edges There are two well known algorithms for nding a minimal cost spanning tree in a connected network Kruskal7s algorithm and Primls algorithm 251 Kruskal7s algorithm Kruskal7s algorithm begins by sorting the edges of the graph in order of increasing cost For the network shown in Figure 25 this order is AB8 CD10 CE12 DE22 BC53 BE54 AD63 and AE70 Each edge is then considered in turn and if it does not form a circuit with edges already in the good set it is added to the good set This is continued until the good set has N71 edges Since the good set has no circuits it is a tree If all edges have been examined and the good set still does not have N 7 1 edges it means that the original graph must not have been connected and this fact can be reported to the user The rst edges added to the tree from Figure 25 are AB 14 Graph Theory and Network Flows CD and Edge DE is then examined but it is found to form a circuit with edges already in the tree so this edge is skipped and edge BC is added to the tree Now the tree has N 7 l 4 edges so the algorithm is nished The edges of the minimum spanning tree are AB CD CE and BC A 8 B 70 5 63 53 E 22 1 D 10 C Figure 25 Minimum spanning tree 252 Prim7s algorithm Primls algorithm for nding a minimal cost spanning tree avoids the initial sorting phase of Kruskalls algorithm The two algorithms generate the same result however when the minimal cost spanning tree is unique Primls algorithm begins by choosing any vertex from the graph and including it in the tree T For the graph of Figure 25 we could choose for instance vertex D It then nds that edge in the graph joining a vertex in T to one not in T which has the lowest cost CD in our example This step is repeated until the tree has N 71 edges or until there are no more edges joining vertices of T to vertices not in T In the latter case it reports that the graph is not connected In our example problem the next edge to be added is We then search for the lowest cost edge joining C D or E to one of the remaining nodes and add edge BC to the tree Finally the lowest cost edge between either B C D or E and A is added namely AB Exercise Try using Prim s algorithm with different starting vertices Convince yourself that it generates the same minimal cost spanning tree or one with the same cost if there is more than one minim cost spanning tree no matter where you start Do you see why Kruskal7s algorithm and Primls algorithm generate the same result Both Kruskal7s algorithm and Primls algorithm are polynomial time algorithms The best imple mentations of these algorithms run in OElog2 N time where N is the number of vertices and E the number of edges in the original graph The number of edges in a graph is at most Lagil so the work required is ON2 log2 N 2 253 Minimum spanning trees and the Euclidean TSP Often the weights on the edges of a network represent actual distances between cities or other destination points In this case the weight or cost of going from vertex A to vertex B is the same as that of going from B to A and the costs also satisfy the tiizmgle inequality costAB costBC 2 costAC When the weights on a network for the traveling salesman problem represent actual distances between points in a plane the problem is called a Euclidean TSP 26 Eulerian Closed Chains 15 While there is no known polynomial time algorithm for solving the Euclidean TSP there is an easy way to nd a route whose cost is within a factor of two of the optimal Construct a minimum spanning tree Starting at the origin node move along edges of the minimum spanning tree until you can go no further Then backtrack along these edges until you reach a point from which you can traverse a new edge to visit a new node or until all of the nodes have been visited and you can return to the origin This is illustrated in Figure 26 where the minimum spanning tree is marked and the suggested path is ODOABCBAOi The total cost of this path is twice the cost of the minimum spanning tree Another possible path is OABCBAODO which also traverses each edge of the minimum spanning tree twicei A 8 B Figure 26 Euclidean TSP network with minimum spanning tree Path ODOABCBAO costs twice as much as the minimum spanning tree and tour ODABCO is within a factor of two of optim The path constructed in this way is not a legitimate tour for the TSP since it visits some nodes more than once But because the weights satisfy the triangle inequality any path between nodes that visits other nodes along the way that have already been visited can be replaced by a direct route between the given nodes and the cost will be no greater Thus one arrives at a tour whose cost is at most twice that of the minimum spanning tree For the example in Figure 26 this tour is ODABCO and has a total cost of 40 Had we chosen the other path OABCBAODO it would have been replaced by the tour OABCDO which has a total cost of 34 and turns out to be the optimal tour for this problem Since the minimal cost spanning tree is the lowest cost set of edges that connects all of the vertices any traveling salesman tour that visits all of the vertices must cost at least as much as the minimal cost spanning tree 25 for our example This gives a lower bound on the cost of the optimal tour while the tour derived above whose cost is within a factor of two of this value provides an upper bound This information can be used within a branch and bound algorithm to hone in on the actual optimal touri 26 Eulerian closed chains Still another type of graph thoeretic problem that arises in many applications is that of nding an Eulerz39zm closed chain A path that starts at a given node traverses each edge of the graph exactly once and returns to the starting point This problem might arise for example in scheduling snowplows or street sweepers If the edges of the graph represent streets then the object is to plow or sweep each street once but if possible not to backtrack over streets that have already been servicedi This type of problem is also popular in many mathematical games In fact it is said that graph theory was invented by Euler in the process of settling the famous Konigsberg bridge problemi The situation is pictured in Figure 2 There are seven bridges and the question is whether one can start at a particular point cross each bridge exactly once and return to the starting point If you try it for a while you can probably convince yourself that the answer is NOW To see this in a more formal way let the land masses in the problem be represented by nodes and the bridges by edges in a multigraph a graph that can have more than one edge between a pair of nodes This multigraph is shown in Figure 28 Note that if we start at a particular node in the multigraph and travel along one edge to a different node then we must leave that node along a different edger If there is only one edge through the node then this will not be possible Moreover if there are three or ve or 16 Graph Theory and Network Flows Figure 27 The bridges of Konigsberg A C Figure 28 Multigraph representing the bridges of Konigsberg any odd number of edges through that node then in order to traverse all of these edges we will have to enter then leave the node until there is only one unused edge remaining But at that point we are stuck because we must enter the node in order to use that edge but then we cannot leave The same holds for the starting node If there is only one edge through it then we can leave but never return and if there are an odd number of edges through it then we can leave and return a certain number of times but eventually we will be at the starting node with one unused edge remaining to traverse The degree of a vertex is de ned as the number of edges through that vertexi Using the above sort of reasoning Euler proved the following Theorem Euler1736 A multigraph G has an Eulerian closed chain if and only if it is connected up to isolated vertices and every vertex of G has even degree The condition that the graph be connected up to isolated vertices just means that there is no edge that cannot be reached at all There is no Eulerian closed chain in Figure 28 since for example vertex D has degree 3 In fact all of the vertices in this multigraph have odd degree A good way to establish the suf ciency of Eulerls theorem is by describing an algorithm for computing an Eulerian closed chain in an even connected multigraphi Consider for example the multigraph in Figure 29 Start at any vertex and travel to other vertices along unused edges until you can go no further For example we might start at A go to B C D back to B and then to Al For each of the vertices visited we use an even number of edges 7 one to enter and one to leave 7 each time we visit the vertex so that the number of unused edges through each vertex remains event Moreover once we leave the initial vertex it is the only one with an odd number of unused edges through it for every other vertex if we enter we can also leave Therefore our path can end only back at the original vertexi At this point we may not have traversed all of the edges of the multigraph so go to any vertex that has some unused edges through it vertex C for example and repeat Take one of the unused edges to a new vertex and again continue until you can go no rt er For example we might go from C to E to F to G to E to C This path can be combined with our previous one to form the chain 27 73 N73 and NPComplete problems 17 D F Figure 29 Multigraph with an Eulerian closed chain ABC 7 EFGEC 7 DBA There are still two unused edges in the multigraph so we repeat by going from D to F and back to Di Combining this with the previous chain gives an Eulerian closed chain ABCEFGECD 7 FD 7 BA The amount of work required to nd an Eulerian closed chain is proportional to the number of edges in the multigraphi 27 73 N73 and NPcomplete problems We have seen that some problems such as the shortest path problem the minimum spanning tree problem and the Eulerian closed chain problem can be solved in a number of steps that is polynomial in the size of the problemi Such problems are in the class 73 the set of all problems that can be solved in Ptimei Some problems such as the traveling salesman problem might lie in the class 73 but probably don7ti If so nobody has yet found the Ptime algorithmi This problem does lie in a larger class called NP howeveri NP stands for nondetermz39m39stz39c polynomial and the exact de nition is a bit complicated The basic idea is that for a problem in this class it is at least possible to check one possible solution in polynomial time eg for the TSP it is possible to compute the length of any one possible path with only 0N operations For such problems there is at least a chance one could nd a Ptime algorithm to nd the best path if we could search ef ciently enough A problem which is not in N73 is more clearly hopelessi For example suppose we consider a variant of the TSP in which we specify a number B and we wish to nd all tours on a given set of N cities which have total length B or less Then it is possible to construct examples in which there are exponentially many paths in the solution set so even writing down the solution if we knew it would take exponentially long and clearly there s no hope of a P time algorithm to solve the problemi Note that any problem in P is also in N73 so 73 is a subset of N71 ls it a proper subset In other words are there problems in N73 which are not in 73 Or does 73 N73 This is one of the great unsolved problems of theoretical computer science Some progress was made toward this problem when Karp Kar72 showed that many hard problems lie in a set of problems within N73 called the NP complete problems which have the following property if any one of them can be solved in P time then every problem in N73 can be solved in Ptime and so in fact 73 N71 The TSP is NPcomplete and so are many other famous and easytostate problems such as the Hamiltonian circuit problem and the integer programming problemi For more discussion of such problems see for example GJ79 Sip97li Graph Theory and Network Flows Math 381 Notes 7 University of Washington 7 Winter 2008 Chapter 3 Stochastic Processes So far we have studied deterministic processes ie situations where the outcome is completely deter mined by our decisions In real life however there are many situations which are out of our control or where things happen with a certain probability These situations are called stochastic processes or random processes or probabilistic processes For example the amount of time we have to wait for the next bus to arrive is a stochastic process Also the number of salmon returning to spawn in the UW Hatchery each year the number of games won by the Husky volleyball team and the daily Dow Jones Industrial Average are other examples A thorough understanding of statistical modeling is beyond the scope of this course Still your modeling capabilities can be enhanced by a brief introduction to some fundamental concepts from probability and statistics including the mean median and variance of a probability distribution random variables and the Central Limit Theoremi 31 Basic Statistics In general terms a statistic is a number that captures important information about a data set If we capture the right information we can use the statistic to predict future outcomes The primary example of a statistic is the average or mean For example if we learn that bus number 43 comes every 10 minutes on average during rush hour then it helps us predict what time we will arrive at our destination For the data set X z i l 21 n the mean of X denoted n or EX is de ned as 21 Ii TL M To nd the median order the values I in ascending or descending order If n is odd then the median equals rm where m Otherwise the median equals the average of mm and zm1 where m 9 To analyze the behavior of averages consider the following data set Let X be the number of minutes we wait for the bus on the rst 51 days of class 3759 1939 6273 7165 1146 5973 0099 9048 6991 5113 6649 4199 5692 3972 7764 3654 7537 6318 4136 4893 1400 7939 2344 6552 1859 5668 9200 5488 8376 7006 8230 8447 9316 3716 9827 6739 3678 3352 4253 8066 9994 6208 6555 5947 7036 9616 20 Stochastic Processes 7313 3919 5657 4850 0589 To aid in calculating the median7 the following sorted list is also included 0099 3716 5488 6555 7939 9994 0589 3759 5657 6649 8066 1146 3919 5668 6739 8230 1400 3972 5692 6991 8376 1859 4136 5947 7006 8447 1939 4199 5973 7036 9048 2344 4253 6208 7165 9200 3352 4850 6273 7313 9316 3654 4893 6318 7537 9616 3678 5113 6552 7764 9827 The mean and median of X equal 57715 and 579737 respectively Consider changing the largest value of X from to 97994 to 100 Now7 the mean equals 7483 Such a change in the mean is expected given the large change in a value of the corresponding data set On the other hand7 the median remains unchanged Our change to X altered only the value of the largest elementi Therefore7 the value of the middle element7 which equals the median7 remained unchanged Suppose instead7 we had changed the largest value of X from 09994 to another random number between 0 and 17 say 05484 Now7 the mean and median equal 05626 and 059477 respectively in such a case7 the mean still changes7 albeit by a small amount Take a moment and re ect on why the median changes in this case Larger data sets say7 of size 17000 or 107000 exhibit similar behavior although the change in the mean is seen in less signi cant digits The median also changed to 058197 can you see why Check your understanding of these important concepts by verifying for yourself that this should occur Another important statistic is the vznizmce of the data set The variance gives us information about how close the data is centered around the mean iii The variance of the data set X z i 17 27 7 7 7 7 n is de ned as 1 1 7 2 VarX n 71 7 M 7 371 z The variance for the original data set above is 672847 This tells us we would probably never have to wait for the bus for more than 12 minutes On the other hand7 replacing 97994 by 100 we get a variance of 180 32 Probability Distributions and Random Variables in order to model stochastic processes7 we rst need a mathematical notation that can describe a wide variety of situations We wil in introduce the concepts of a probability distribution and a random variable through examplesi Let Y denote any one of the six possible outcomes that can be observed when a fair die is rolled Since a fair die is equally likely to roll any of the values from 1 to 67 there is a 16 h chance of rolling a given number We will write this as PY y for y 1727i 7 7 7 67 Equivalently7 we are de ning a probability distribution function py 16 for y 17 27 7 7 7 7 67 Such a probability distribution can be written as lyll1l2l34 l6l lpy H 16l16l16l16l16l16l 1Some authors de ned the sample variance to be VarX Egg 7 M2 instead 32 Probability Distributions and Random Variables Figure 32 Histograms for tossing a fair die 50 500 and 5000 times In this example Y is a random variable it is a variable that we cannot solve for explicitly but instead one that takes on a different xed value each time we run a trial by rolling the diet Random variables are very useful in probability because we can manipulate them just like ordinary variables in equations For example one could write P2 lt Y S 5 to mean the probability that a roll is among the numbers 3 4 5 A random variable Y is said to be discrete if it can assume only a nite or countably in nite number of distinct valuesl Every discrete random variable comes with an associated probability distribution function PY y An experiment where every value is equally likely to occur is said to have a uniform distribution as its underlying probability distribution Figure 31 a depicts a probability histogram for a uniform distribution In contrast Figure 31 b is an example of a probability histogram for a nonuniform distribution Another way to recognize a uniform distribution is by noting the probability distribution has the form My 7 total outcomes Therefore this type of probability is closely related to the study of counting methods known as combi natoricsl Clearly a probability histogram for Y should exhibit such a uniform distribution Yet each roll of the die is independent from the previous rollsl In other words the die has no memory of its last rolli As any gambler will attest the uniform distribution does not imply that a streak of 47s is impossible To see this graphically histograms of 50 500 and 5000 such experiments are depicted in Figure 22 Stochastic Processes 32 a b and c respectively As the number of experiments increases the histograms converge toward the corresponding uniform distribution Note that even 10000 experiments will not produce a histogram that exactly replicates the distribution in Figure 31 a Seen from the perspective of a gambler indeed guessing the value of a single roll of a fair die is mere chance even with knowledge of a previous outcomel Yet from the perspective of a casino the underlying distribution can place the odds heavily in the houses favor over many such experiments There are many additional probability distributions The modelerls challenge is to choose the one that most closely approximates the data 321 Expected Value The expected value E of a discrete random variable Y with probability function py is de ned to be EC Zy yl y Recall that the probability function for the rolling of a fair die is py 16 for y 12Hl6l Therefore EY equals 251 i6 35 33 The Central Limit Theorem Let Y1 Y2 l l l Yn be independent and identically distributed random variables Then Y Y1 Y2 l H Yn is the random variable measuring the mean over n trialsl What can be said about the distribution of Y The Central Limit Theorem answers this question Theorem 331 Central Limit Theorem Let Y1Y2HlYn be independent and identically dis tributed random variables Then the sample average Y tends to a normal distribution as the sample size tends to The mean of this distribution will be the mean of the population and the variance of this distribution will be the variance of the population divided by the sample size This all holds providing the mean and variance of the population is nite and there is random sampling Take a moment and consider the power of this theoreml The theorem allows any population In fact two populations may have widely differing underlying distributions such as the uniform and nonuniform probability distributions in Figure Sill Yet if multiple independent experiments occur for either population the resulting distribution of the sample average tends toward a normal distribution as sample size increases Let Y and X denote any one of the six possible outcomes available from tossing a fair and weighted die respectively The probability function py 16 for y 12 l l l 6 The probability distribution for X is M H l l5 l6l lpz H 18l13l112l524l112l16l Figure 31 a and b are the probability histograms for Y and X respectively Suppose we throw the fair die 50 times and record the sample average of the 50 outcomesl Repeating this 1000 times and plotting the results produces the histogram in Figure 33 a Repeating such an experiment with the weighted die produces the histogram in Figure 33 Both histograms re ect a normal distribution Should this be expected lndeed it is by the powerful Central Limit Theoreml If this is not readily apparent take a moment and compare our experiments to Theorem 3l3lll Note that convergence is slower for highly screwed distributionsl That is larger sample sizes are needed for a histogram of sample averages to re ect the underlying normal distribution of such an experiment We will see such behavior in the convergence of histograms for different experiments modeled with Monte Carlo methods in the next chapter 34 Buffon s Needle Figure 373 Histograms taken from sample averages a for a uniform distribution and b a nonuniform distribution 1 unit a b Figure 34 Buffonls Needle experiment a depicts the experiment where a needle of unit length is randomly dropped between two lines of unit length apart In b7 D denotes the distance between the needles midpoint and the closest line 9 is the angle of the needle to the horizontal 34 Buffon s Needle ln l7777 Comte de Buffon described an experiment that relates the real number Tr to a random event Speci cally7 a straight needle of unit length is dropped at random onto a plane ruled with straight lines of unit length apart as depicted in Figure 34 a What is the probability p that the needle will intersect one of the lines Before the time of digital computers7 laptops and calculators7 the term computer referred to a job title Parallel computing meant calculating with a large number of mathematiciansi Comte de Buffon performed the experiment repeatedly to determine pi Such hand computation limited the feasibility of mathematical experiments before the advent of the digital computer Comte de Buffon also derived the value of p mathematicallyi Let D denote the distance between the needles midpoint and the closest line7 and let 9 be the angle of the needle to the horizontal Refer to Figure 34 Here both D and 9 are random variables The needle crosses a line if 1 D S Esint l Consider the plot of sint9 as seen in Figure 375 The probability of the needle intersecting a line is the ratio of the area of the shaded region to the area of the rectangle This is the continuous version 24 Stochastic Processes Figure 35 Plot of sint aids in analysis of Buffon7s Needlei of the uniform distribution The area of the rectangle is clearly 7r2i The area of the shaded region is 7r1 0 Esin9d9l Which implies that the probability of a hit is Hence an estimate of 7r is total number of dropsgt total number of hits Just imagine the next time that you drop a pencil you have completed the rst step in a Monte Carlo experiment in estimating 7r Yet it takes heroic patience before an accurate estimate emergesi We Will see in class that Buffon s Needle experiment indeed produces an estimate of 7r but is slow to converge Math 381 Notes 7 University of Washington 7 Winter 2008 Chapter 4 Monte Carlo Methods Monte Carlo methods model physical phenomenon through the repeated use of random numbers It can seem surprising that meaningful insights can be made from such results Yet Monte Carlo methods are used effectively in a wide range of problems including physics mechanics and economics Buffon s Needle is a classic example of Monte Carlo methods and the ability of random numbers to generate meaningful results Note that no essential link exists between such methods and a computer Yet computers enormously enhance the methods7 effectiveness Monte Carlo methods are essentially carefully designed games of chance that enable research of interesting phenomenoni While Monte Carlo methods have been used for centuries important advancements in their use as a research tool occurred during World War II with mathematicians including Neumann and Ulam and the beginnings of the modern computer Today Monte Carlo methods are used in a wide variety of elds to model the physical world With our knowledge of statistics and ideas regarding the use of random numbers in simulation we are ready to examine problems that we can model and study with Monte Carlo methods 41 A sh tank modeling problem The XYZ Tropical Fish Store has come to us for advice on the following problemi They carry 150 gallon sh tanks which are quite large and take up a lot of shelf space in the store They observe that on average they only sell about 1 each week so they don7t want to have too many in stock at any one time On the other hand these are a highpro t item and if they don7t have one in stock when a customer comes in the customer will go elsewhere Moreover it takes 5 days for a new one to be delivered by the distributor after they place an order They are trying to decide on the best strategy for ordering the 150 gallon tanksi They are considering these two in particular 1 Strategy 1 Order a new tank each time one is sold Then they will never have more than one in stock at a time so they won t waste much shelf space but they may often be caught without any E0 Strategy 2 Order a new tank once a week arriving 5 days later Since they sell one a week on average it may be good to have one arrive each week on average Therels probably less chance of being caught without one in stock but if there aren t any customers for several weeks in a row they will accumulatei The question is which strategy is better or is there another that is better yet Think about some of the questions that might naturally arise if you were going to try to determine an answer What questions should you ask the store owners before you even try to model it What 25 26 Monte Carlo Methods other information would you need to know before you could hope to evaluate the strategies Thinking about this should help to suggest what is important in a mathematical model Here are a few examples 0 How much is the pro t on one of these sh tanks relative to other items in the store that would have to be eliminated in order to stock more sh tanks This is one thing we would need to know in order to make any sort of quantitative comparison of different strategies What is the cost of having additional tanks in stock What is the cost of losing a customer by not having a tank available Besides the lost sale itself there might be the cost of losing that customers business in the future if they decide to buy all their sh and supplies from the same place they buy the tank 0 What do you mean by we sell one a week on average ls there one person in town who comes in every Wednesday at noon and always buys one tank Or are there 52 people in town who always give a tank as a Christmas present so there are lots of sales in the month of December and none in other months Either scenario would lead to one a week on average over the year but would lead to very different ordering strategies This is a classic example of an inventory problem Variants of this problem are constantly being solved by businesses which often have to decide on an ordering strategy for many different products simultaneously even thousands for a grocery store for example ere we consider the two simple strategies mentioned above based on some simpli ed assumptions of customer behavior In particular welll assume the sales are uniform over the year Well also assume that an average of once a week means that each day there is a 17 chance that a customer will come This isnlt completely realistic surely there s some chance that two customers will appear in the same day Later we7ll consider this further Note in making these assumptions that we follow a primary rule of model building It is best to start with a simple model and re ne it incrementally Trying to start with a completely realistic model is usually hopeless 42 Monte Carlo simulation The approach we will consider at this stage is a simulation of how the competing strategies might work his means we write a computer program which simulates several weeks 0 how t e storels inventory uctuates assuming one strategy or the other and see which pays off better Of course we don7t know exactly when customers arrive but we can model the arrival of customers at random times by generating random numbers on the computer and using these values to indicate whether a customer arrives on each day or not For example in MATLAB the command rand 1 generates 1 random number which is uniformly distributed between 0 and 1 If this number is less than 17 we can say that a customer comes if it is greater than 17 we can say that a customer does not come On the next page is a simple MATLAB program which simulates the strategy of ordering a new sh tank only when the one in stock is sold In this case there is always at most one tank in stock which simpli es the program The program prints out information about what happens each day and keeps track of statistics over the course of the simulation The parameters have been changed here to make it possible to see more action in a short simulation The probability of a customer each day is 13 and it only takes 2 days for a new tank to be delivered after ordering The program has been simpli ed by leaving out some commands to keep track of totals and display results more complete program which also allows comparison of different strategies can be found in Section 45 and downloaded from the class webpage 42 Monte Carlo 39 27 a 13 X probability of a customer each day daysfordelivery 2 Z days from order to delivery of new tanks stock 1 Z number of tanks in stock deliv 1 X number of days until delivery of tank on order 1 means none on order totalcust O totalsold 0 totallost O for week 13 for weekday 17 d 0 lost 0 if deliv 0 stock stock1 Z a new tank is delivered end if deliv gt 0 deliv deliv1 Z days till next delivery end randomnum rand1 generate random number if randomnum lt a use this number to tell if a customer arrived customers 1 else customers 0 end if customers if stockgt0 Z we have a tank to sell the customer stock stock1 if deliv lt O deliv daysfordelivery we sold a tank and now order another one end else lost lost1 we didn t have a tank and lost a customer end end if customers keep track of total statistics totalcust totalcust customers totalsold totalsold sold totallost totallost lost dispweek weekday stock customers sold lostl display results for this day end loop on weekday end loop on week Z output total statistics now Figure 41 First pass at a MATLAB program for Monte Carlo simulation of the sh tank modeling problemi Monte Carlo Methods Here are the results of one sample simulation week day stock WWWWWWWMMMMMMMHHHHHHH NmmpmMHummpmMHummpmmH HHHHOOOD b OOOOOOOOOD b H cust OOOOOb b OOb Ob Ob b OOb OOO sold lost OOOOOOb OOOOb OOb OOb OOO total over entire simulation customers sold lost 3 OOOOOb OOOHOOOHOOOOOOO It s hard to come to any conclusion from this one short simulation The fact that we could only serve 4 of the 7 customers was due to the particular pattern in which they arrived If they had appeared differently7 the numbers could have been very different get a better feel for how this strategy works in the long run7 we can run the simulation for many more weeks For example we might run it for 104 weeks 2 years rather than only 3 weeks We might also run the simulation several times over to see if we get similar results even though the exact pattern of customer arrivals will be different in each case Here are the results of 10 different simulations7 each for 104 weeks customers sold 241 147 223 135 255 149 266 158 237 144 235 144 255 148 250 147 227 138 249 152 lost fraction served oooooooooo Notice that we expect roughly 104 X 73 243 customers to appear over this time period since we7re using the rate of 1 every 3 days In all of the above results we have roughly this number Much 42 Monte Carlo 39 29 more interesting is the fact that in each simulation we end up being able to serve about 60 of these customers This might be one number we want to use in comparing different strategies Markov Chains By doing the above simulations we came to the conclusion that with this strategy and particular set of parameters we could serve about 60 of the customers who come looking for a 150 gallon tank It turns out that we can predict this number using mathematics without doing any simulations if we use the theory of Markov chains We will study this later in the quarter 421 Comparison of strategies Section 45 contains a more complex program for this Monte Carlo simulation which allows comparison of different strategies The variable orderwhenout is a ag which determines whether to order a new tank whenever we run out The variable fixedde1ivery determines whether there is a xed schedule of deliveries once every so many days Strategy 1 corresponds to orderwhenout 1 fixedde1ivery 0 quot0 no standing order while Strategy 2 uses orderwhenout fixedde1ivery 7 quot0 arrivals every 7th day Percentage of customers served is one important number to compare More important to the busi ness however might be the expected pro t over time with different strategies This depends not only on how many customers we serve or lose but also on the cost of any overstocki For Strategy 1 tested above there is never more than 1 tank in stock and so no overstocki Strategy 2 can easily lead to overstoc Here are the results of a few 104week simulations with each strategy We also compute the pro t for each simulation and the average pro t over 100 simulations with each strategy This pro t is based on the assumed pro t per sale cost of losing a customer and cost per night of overstock as illustrated in the programi Strategy 1 customers sold lost fractionserved overstock endstock profit 106 63 43 5 0 1 8300 100 61 39 0610 0 1 83000 104 58 46 0 558 0 1 70000 104 60 44 0 577 0 0 760 00 94 55 39 0 585 0 1 71000 90 59 31 0 656 0 0 87000 etc 108 58 50 0 537 0 1 66000 101 59 42 0 584 0 0 76000 94 54 40 0 574 0 0 68000 99 61 38 0 616 0 1 84000 98 55 43 0 561 0 1 67000 averageprofit 7879000 30 Monte Carlo Methods Strategy 2 customers sold lost fractionserved overstock endstock profit 93 1 47 3 6750 99 98 1 0 990 1802 7 1049 00 98 95 3 0 969 2716 10 512 00 96 96 0 1 000 3690 9 75 00 99 98 1 0 990 3098 7 401 00 87 87 0 1 000 6811 18 1665 50 106 101 5 0 953 3450 4 245 00 etc 94 94 0 1 000 2867 11 446 50 126 103 23 0 817 493 2 1583 50 106 101 5 0 953 2023 4 958 50 131 104 27 0 794 709 1 1455 50 109 104 5 0 954 637 1 1711 50 91 88 3 0 967 6135 17 1337 50 91 90 1 0 989 4165 15 292 50 101 101 0 1 000 3899 4 70 50 101 100 1 0 990 3755 5 112 50 averageprofit 3427050 Note some interesting features of these simulations 0 The percentage of customers served is generally larger with Strategy 2 usually above 90 while with Strategy 1 the percentage is around 60l However the average pro t is larger with Strategy 1 This is because Strategy 2 generally leads to an overstockl Customers are rarely lost but there7s a large cost associated with this overstock which reduces profit The results for Strategy 1 have much smaller variance than for Strategy 2 The pro t for each individual simulation is typically not too far from the average For Strategy 2 the values are all over the place Some simulations give much higher profits than Strategy 1 while many simulations resulted in negative pro ts a net loss This is seen more strikingly by plotting histograms of the pro ts over each set of 100 simulations as shown in Figure 412 If the business is lucky pro ts may be higher with Strategy 2 but there is a great risk of being unluckyl Note one problem with Strategy 2 as implemented so far There is no mechanism to stop automatic delivery if the stock goes too high Since customers arrive at the same average rate as new tanks a high stock will tend to stay highl Figure 413 shows the stock of tanks as a function of day in the course of one particular unlucky simulationl By the end of two years the stock has grown to 20 tanksl 43 A hybrid strategy A better ordering strategy might be to have a standing order for a tank to arrive every N days where N is larger than 7 and then to also order a new tank whenever we run out In the program N is denoted by fixedde1iveryl Figure 414 shows histograms of the performance of this strategy for two different values of N N 9 and N 14 43 A hybrid strategy 31 suategv 1 SivalEEV 2 25 7 25 In 7 In 15 a m 7 m 5 7 5 71 u 7mm 7m I am mun 15m 2 an 73 uuu 72m 7mm 7mm 7mm 7m I am mun 15m mun Figure 42 Histograms of pro t observed over 100 simulations With two different strategies 3 0 Figure 43 History of number of tanks in stock each day for one particular Hvbm SualEW W m as simulation With Strategy 2 Hybrid Sivategy wnhw m mm 5 5 1D I 1EIEIEI SEIEI I SEE WEIEIEI WEED ZUUEI 4D I WEIEIEI SEIEI I SEE l WEED Figure 44 Histograms of pro t observed over 100 simulations With hybrid strategiesi 32 Monte Carlo Methods 44 Statistical analysis In comparing the results of simulations with different strategies we need to be able to reach sensible conclusions from a large quantity of experimental data k Typically we would also like to be able to quantify how con dent we are in the conclusions we reach how sensitive these conclusions are to the parameters we chose in the model etc This is the realm of statistics and sensible analysis of Monte Carlo simulations generally requires a good knowledge of this eld One statistic we have already looked at is the average pro t observed for each strategy If one strategy gives a larger average pro t over many simulations than another strategy it might be the better strategy On the other hand we have also observed that some strategies give results with much greater variance between simulations than other strategies In practice the owners of the store donlt really care about the average behavior over hundreds of simulations they care about what will happen in the one real experience they will have with whatever strategy they adopt The point of doing many simulations is to get some experience with what range of different outcomes they might reasonably expect to observe in practice Assuming of course that the model is good enough to match reality in some sense Looking at the histograms of Figure 42 we see that with Strategy 1 we might be able to predict fairly con dently that the pro t which will actually be observed assuming the model is good enough will lie somewhere between 600 and 1000 With Strategy 2 we can only predict that it will probably lie somewhere between 2000 and 2000 Even if the average observed with Strategy 2 were higher than the average pro t observed with Strategy 1 the strategy with the smaller variance might be preferable in practice because there would be less risk involved If we do N simulations and obtain N different values y1 y2 m yN for the observed pro t then the average sample mean is de ned as l N 1 M2 9 yin H The sample variance is 1 N 2 7 7 7 2 s 7 N711yz y and the sample standard deviation is s xs We can view the N values observed as N observations of a random variable chosen from some dis tribution Typically we don7t know what this distribution is and the point of doing a Monte Carlo simulation is to gain some information about the distribution With a large enough sample the sample mean should approximate the mean of the true distribution and the sample variance should approximate its variancei How big a sample is large enough That depends on the distribution Comparing the histograms of Figure 42 for example we see that 100 trials gives more information about the distribution of results in Strategy 1 than with Strategy 2 To get a better idea of the underlying distributions we might repeat these experiments with many more simulations Figure 45 shows histograms of the pro ts observed when 5000 simulations are performed with each strategyi Here we get a much better sense of the distributions For Strategy 1 we merely con rm what we already guessed from the 100 simulations of Figure 42 The distribution has a fairly Gaussian77 shape with almost all results lying between 600 and 1000 In fact it can be shown that the distribution of observed pro ts should be approximately normally distributed Gaussian This follows from the Central Limit Theorem as we will be better able to understand after studying the Markov Chain interpretation of this model The sample mean from this particular set of runs is about 783 and the standard deviation is about 83 For a normally distributed random variable it can be shown that the value observed will be farther than twice the standard deviation away from the mean only about 5 of the time So if the pro t with 45 Fish tank 39 code 33 Shaleg SualEgV 2 over 5000 simulations with two different strategies Figure 45 Histograms of pro t observed Strategy 1 is essentially normally distributed then we expect that the pro t observed should lie between 8 7 2 83 617 and 783 2 83 949 about 95 of the time For Strategy 2 we see in Figure 45 that the distribution of pro ts is certainly not a normal distri bution Note that the pro t would not be expected to exceed about 104 20 2080 even with a perfect strategy why not7 which partly accounts for the sharp cutoff at the higher end The fact that a very large overstock can accumulate accounts for the wide spread to the left The average pro t observed is 439 and the standard deviation is 1020 In 26 of the simulations the observed pro t was negative There are many other statistical questions that one might ask about the data obtained from these simulations and various ways that the model might be improved We will discuss some of these in class 45 Fish tank simulation code set parameters nsim 100 X number of different simulations to do nweeks 104 X number of weeks in each simulation ndays 7nweeks number of days in each simulation a 17 probability of a customer each day daysfordelivery 5 days from order to delivery of new tanks orderwhenout 1 Z 1 gt O gt don t order when out of tanks fixeddelivery 12 gt0 gt standing order for a new tank 39 every so many days profits and losses saleprofit 20 Z profit from selling one tank lostloss 10 loss from losing a customer overstockloss 50 cost of each tank overstock per night initialize profit zerosnsim1 print column headings Monte Carlo Methods disp customers sold lost fractionserved overstock endstock profit array of random numbers to use each day totallost 0 stock 1 X number of tanks in stock deliv 1 X number of days until delivery of tank on order 1 means none on order overstock 0 increment every night by number of excess tanks in stock main loop for a single simulation day O for week 1 nweeks for weekday 17 day day1 day in the simulation sold 0 lost 0 if deliv stock stock1 a new tank is delivered at the beginning of the day if deliv gt 0 deliv deliv1 days till next delivery end if modday fixeddelivery 0 X A new tank is delivered every so many days regardless of stock stock stock1 end use random number to decide how many customers arrived Here assume 0 or 1 if randomnumsday lt a customers 1 else customers 0 end if customers1 we have a tank to sell the customer sold sold1 stock stock1 lost lost1 we didn t have a tank and lost a customer if orderwhenout amp stockO amp deliv lt O 46 Poisson processes 35 none in stock and none on order deliv daysfordelivery order another end if stock gt 1 overstock overstock stock 1 end keep track of total statistics totalcust totalcust customers totalsold totalsold sold totallost totallost lost stockrecordday stock keep track of stock on each day end loop on day end loop on week fractionserved totalsold totalcust profitsim totalsoldsaleprofit totallostlostloss overstockoverstockloss output total statistics dispsprintf 60f Z60f Z80f Z123f Z110f Z80f Z122f totalcust totalsold totallost fractionserved overstock stock profitsim end loop on sim compute and print average profit over all simulations averageprofit sumprofit nsim disp disp average profit 7 sprintf 102f averageprofit standard deviation variance sumprofit averageprofit 2 nsim1 standarddeviation sqrtvariance disp standard deviation 7 sprintf 102f standarddeviation disp plot histogram of profits over all simulations histprofit1500502000 axis1500 2000 0 30 46 Poisson processes In the Monte Carlo simulation of the sh tank inventory problem7 we assumed that each day there was a probability a of a customer coming in The MATLAB code contained the following lines randomnum rand1 quot0 generate random number if randomnum lt a quot0 use this number to tell if a customer arrived customers 1 36 Monte Carlo Methods else customers 0 end This assumes that each day there is either 0 or 1 customers This is not very realistic since some days we would expect 2 or even more to appear if they come at random timesi Moreover we might want to simulate a situation in which there are typically many customers each day instead of only one customer every few days ore generally let a represent the average number of customers per day If a is very small then we can interpret a as the probability that a customer comes on any given day and ignore the possibility of more than one customer This isnlt very realistic if a is somewhat larger If a gt 1 then it is impossible to interpret a as a probability and clearly unrealistic to assume there will be either 0 or 1 customer It would be better if we could predict the probabilities pk that exactly k customers will appear in a given day and then use something like randomnum rand1 Z generate random number if randomnum lt p0 customers 0 else if randomnum lt p0p1 customers 1 else if randomnum lt p0p1p2 customers 2 else customers 3 Z assume negligible chance of more Here we have assumed that a is small enough that the chance of more than 3 customers is negligible 139e p0 101 p2 is very close to 1 One way to estimate these probabilities would be to do a Monte Carlo simulation We could split a day up into N much smaller pieces hours minutes seconds or even smaller and assume that in each small piece there is really no chance that more than one customer will appear and that the probability of one customer appearing is p aNi We could simulate many days and see what fraction of the days there are k customers for each kl Note that if N is large enough then p aN will be very small even if a gt 1 so it is valid to view this as a probability Here are the results of a 10 trials with N 20 and a ll Generated using poissonm from the webpage Each string has 20 bits with 0 representing no customer during this interval77 and 1 representing a customer during this interval The rst day two customers arrived the second day only one etci 46 Poisson processes 37 00000010001000000000 00000100000000000000 00010100000000000000 00000000000000000000 00000000000000000100 00000000000000000000 00000000000000000000 00000000100000000010 00110000000000000000 10100000000010000000 In this short simulation 30 ofthe days no customer arrived 20 had 1 customer 40 had 2 customers and there was 1 day 10 when 3 customers arrived If we ran another 10 day simulation we7d probably get different results To estimate the probabilities more reliably we should do a simulation with many more days Below are the results of simulations with 10000 days This was repeated 3 times and we get slightly different results each time Run 1 Run 2 Run 3 Poisson 0 customers 03686 03580 03606 03679 1 customer 03722 03717 03767 03679 2 customers 01837 01911 01883 01839 3 customers 00575 00646 00596 00613 4 customers 0 0 00122 00153 5 customers 0 0151 00120 00021 00031 6 customers 00025 00024 00004 00005 7 customers 00004 00002 00001 0 0001 Note that by running the simulation 3 times we get some reassurance that the value of 10000 is large enough to give values that don7t change much when we rerun the simulation If we had run it for only 100 days each we would see quite a bit of variation from run to run and would know that we donlt have reliable valuesl Here it seems fairly clear that p0 m 0136 In m 0137 p2 m 0118 etc Poisson distribution In fact we can compute these values without doing a simulation It can be shown mathematically that probabilities follow a Poisson distn39bution and that for any value of a the average number of customers per day the probability of observing exactly k customers in a single day is ake pk k A In particular P0 57 p1 acid 1 p2 1267 1 p3 gage The values for the case a 1 are shown in the last column of the table above and they compare well with the values obtained from the three 10000 day simulations Note that 00 a2 a3 2px a lt1aiimgt e e 1 H 2 3 Monte Carlo Methods The probabilities all add up to one as we should expect The formulas for the Poisson distribution can be derived by using a combinatorial argument based on the probability of the event occurring in each small time period and the number of ways it is possible to have 0 1 2 events occurring over the entire day For this to be valid we must assume that customers appear randomly with probability aN in any small increment of 1N day without regard to the past history The arrival of customers is then said to be a Poisson process The sequence of tests to see whether there is a customer in each interval also called a series of Bernoulli trials To see how these values arise rst consider a simpler case where we look at a sequence of only 3 Bernoulli trials with a probability p of success in each trial denoted by 1 and probability 1 7 p of failure denoted by 0 There are 23 8 possible strings that can arise from these 3 independent trialsi These are listed below along with the probability of observing each 000 1 7 p3 om a7m ow a7m on a7m 1w a7m 1m a7m 1w a7m 111 p3 411 Note that there are 3 ways to observe 1 customer during the entire day and 3 ways to observe 2 customers but only 1 way to observe 0 or 3 customers Adding up the probabilities we nd the 3 following values for pk the probability of observing k customers when N 3 0 customers 1 7 MS P33 1 customers 31 7 p2p p53 4 2 2 customers 31 7 pp2 pf 3 customers p3 1033 Note that these probabilities sum to 1 We can see this easily using the fact that z y3 7 13 3z2y Susy y3 with z 1 7 p and y p In fact we can use this type of binomial expansion to easily compute the probabilities plgN when N is larger without enumerating all 2N possible strings and counting things up We have N N N 1yNINlt1gtIN 1ylt2gt1N 2y2mltN71gtzyN 1yN 43 The values lt I gt N choose k are often called the binomial coe cientsi The probabilities we seek are pEN 1 7 0N N 7 p9lt1gta7mNNa7mNa N 1 4 4 wlt2gt w f w NW mM2 etci 47 Queuing theory 39 This is called the binomial distribution for the total number of successes in N Bernoulli trials To relate this to the Poisson distribution recall that we want to determine the probability of observing k customers in the case where N is very large and p aN where a is a xed value representing the average number of customers per day The probability p0 is given by N a N a i lt4 Similarly peggzvltvigtMlltigt amen M ae al 47 Queuing theory As another example of Monte Carlo simulation we will look at an example from queuing theoryl Suppose customers arrive at a service center or cashier and have to line up to wait for service Queuing theory is the study of how such lines are expected to behave and the comparison of different strategies For example if there are multiple servers we might want to compare the strategy of having independent lines for each as is common in grocery stores or have a single line with the next customer going to the rst available server as is common in banks and airport checkin counters Which approach minimizes the average time a customer has to wait for example There is a large literature on this subject see for example HL95ll Here we will look at the simplest example of a single server and discuss how Monte Carlo simulation might be used to study the expected length of the line The MATLAB code in Figure 46 shows a simple simulation of a queue Customer arrivals are modeled with a Poisson process with average rate a per minute They must wait in the queue until the single server is free The amount of time required to serve them is also random with average rate of 12 people per minute In this simulation this is also assumed to be Poisson process with probability of roughly bN that they will be nished in any small subunit of time lNl This may not be very realistic A better model might be that there is some minimal service time plus a random length of time after that or some other distribution of service times One could model this by splitting each minute up into N subunits and generating a random number to determine whether a new customer arrives and a second random number to determine whether the customer currently being served nishes Instead this simulation works by rst generating a set of arrival times and service times for each customer and then computing the total time that each customer must wait on the basis of these times The times are generated using an exponential distribution Probability that interarrival time is larger than T e aTl This can be shown to be the proper distribution of service times for a Poisson distribution in the following manner Suppose we split up the time since the last arrival into units T2 3 l H rather than minutes Then in each unit the expected number of arrivals is aTl According to the Poisson distribution the probability of exactly 0 arrivals in the rst of these periods is thus 6quot but this is exactly the probability that the next arrival time is more than T units later To generate random numbers from an exponential distribution with parameter a we can rst gener ate a random number 7 uniformly distributed in the interval 0 1 using rand in MATLAB for example and then compute l 7 log39rl 40 Monte Carlo Methods Note that log is the natural logarithm command in MATLAB Figure 47 shows the results of three simulations using the parameters a l and b 15 for 1000 customers eachi Two plots are shown for each simulation on the left is the length of the queue as a function of time and on the right is a histogram of the total time spent by each customer including both the time in line and the time being served While the 3 simulations show variation in details some similarities are observed Note that although the server can handle 50 more customers than are arriving on average the queue length can grow quite long due to the randomness of arrivalsi In all three simulations the queue length is more than 10 at times Also looking at the histogram we see that there are some customers who have to wait as long as 10 or 12 minutes in each simulation though the average total time is more like 2 minutes Figure 48 shows another set of simulations in which the parameter b 11 is used instead in which case the average service time 1 m 091 is even closer to the average interarrival timer Again the details vary between simulations but it is clear that now the queues will typically grow to be of length greater than 20 at times and the average waiting time is much longer There are now typically some customers who have to wait more than 20 minutes From comparing these sets of results it should be clear that such simulations can be effectively used to predict how badly queues may behave if the proper parameters are estimated One can also use Markov chain ideas to study the expected behavior of such systems and determine expected values for quantities such as the total waiting time In fact it can be shown that for arbitrary parameters a lt b the expected total waiting time is 102 7 a Note that this is consistent with what is observed in these simulations We have only looked at the simplest situation of a single queue and a single server This model is called an MMl queuei The rst M indicates that the arrival times are exponentially distributed or Markovian the second M indicates that the service times are also Markovian and the 1 indicates there is a single server More complicated models would be required to model a bank or super market with multiple lines and servers for example Various modi cations of this model are possible For example we could assume a maximum queue length beyond which customers go away rather than joining the queue In this case there are a nite set of states possible different queue lengths and it is possible to derive a Markov chain model of transitions as we will look at later 47 Queuing theory 41 Simple queuing theory simulation MMl queue Single server single queue a 1 Z average number of arrivals per minute b 15 average number of people served per minute ncust 1000 Notation at arrival time of a person joining the queue st service time once they reach the front ft finish time after waiting and being served initialize arrays at zerosncust1 ft zerosncust1 Generate random arrival times assuming Poisson process r randncust1 iat 1a logr Generate int i aiii al times 1 4 L L at1 iat1 Arrival time of first customer for i2ncust ati ati1 iati arrival times of other customers end Generate random service times for each customer r randncust1 st 1b logr service times for each Compute time each customer finishes ft1 at1st1 finish time for first customer for i2ncust compute finish time for each customer as the larger of arrival time plus service time if no wait Z finish time of previous customer plus service time if wait fti maxatisti fti1sti end totaltime ft at total time spent by each customer waittime totaltime st time spent waiting before being served aveservicetime sumstncust avewaittime sumwaittimencust avetotaltime sumtotaltimencust Plot histogram of waiting times histtotaltime0520 Figure 46 Queuing theory simulation The code shown here is simpli ed and doesn7t show the corn putation of the queue length Which is plotted here See the course webpage for the full In le7 queuemi 42 Monte Carlo Methods queue length vs ime histogram of waiting times a 1 35 b 1 5 average total ime 214B7 average service time 06786 histogram of waiting times a 1 M 35 b 15 average total ime 22084 average service time 06693 m a a r IMIWM IL rlmx lrl 1 Mr m hm HHJM n mm mm ann um Sun EDD mu Bun ann 1mm n 2 a a a m 12 m 16 m m queue lengm VS ime histogram of waiting times a1 35D b15 average total ime 21234 average service time 067806 5 11L r um n Bun B L km N mm Figure 47 4 7 Queuing theory 43 queue length vs time histogram of wai ing limes 18D 7 a 1 25 151 7 b 11 1w 7 2m 1m 7 average total ime 62679 15 1mm 7 average service time 091675 Bu 7 1n 6n 7 w 7 5 2n n n 1 mm mm EDD Bun mun 1 an n 5 1m 15 21 25 3D 35 w queue lengm VS quotme histogram of wai ing limes 18D 7 a 1 3m 1m 7 b 11 25 1w 7 mm averagelolal ime 113831 2m 1m 7 15 7 average service time 088394 Bu 7 7 m an 7 7 w 7 5 2n 7 1 1 1 1 1 1 1 1 1 1 1mm Inn sun mu Sun EDD mu Bun am 1 an 1 1m 15 21 25 3D 35 w queue lengm VS quotme histogram of wai ing limes 18D 7 a 1 151 7 b 11 1w 7 12m 7 average total ime 477 1m 7 7 average service time 0B7122 n 21111 on EDD m1 mun 1 an Figure 428 44 Monte Carlo Methods Math 381 Notes 7 University of Washington 7 Winter 2008 Chapter 5 Markov Chains and Related Models Markov Chains or processes are named for Andrei Andreevich Markov 185671922 Markov was a student under the famous mathematician P L Chebyshev at the Petersburg Universityi Markov Chains affect the world in which we liver In fact if you type a query into Google the ordering of the results is produced using a Markov Chain We will also brie y study a model of card shuf ing that uses Markov Chains to study the randomization produced by shuf ing a deck of cards Markov himself found a practical use for Markov Chains when he modeled the alteration of vowels and consonants in Russian literary works Figure 51 Andrei Andreyevich Markov 185671922 51 A Markov chain model of a queuing problem To introduce the theory of Markov chains let s consider a simple queuing theory model in which there is a single server and there are only three states which the queue can be in at any given time State 1 No customers are present State 2 A customer is being served State 3 A customer is being served and another is waiting Assume that the queue is never longer if a customer arrives when the queue is in State 3 then that customer disappears 46 Markov Chains and Related Models 113 C A 9 L 1q Figure 52 Transition diagram for a simple queuing theory model with three states Suppose that arrivals and service are Poisson processes and over a given short time interval there is a probability p of a new customer arriving and a probability 4 that any customer currently being served will nish if the time interval is very short then these are very small and we can assume there is negligible chance of more than one customer arriving or nishing and also negligible chance of both one customer nishing and another arriving in the same period We could perform a Monte Carlo simulation of this queue by keeping track of what state it is in and each time period either staying in this state or moving to a different state depending on the value of some random number randomnum rand1 if randomnum lt p it a new customer arrives if statelt3 state state 1 end end if randomnum gt 1q it a customer finishes if stategt1 state state 1 end end Note that we7ve done the tests in such a way that if p and q are both very small then we can t have both happeni Figure 52 shows the transition diagram between the three states For example if we are in State 2 then with probability p we move to State 3 if a customer arrives with probability 4 we move to State 1 if a customer nishes and with probability 1 7 p 7 q we stay in State 2 if neither happens if we do a single simulation then we will move around between the three states We might be interested in determining what percentage of the time we are in each of the three states over the long terml We could determine this for many different simulations and collect some statistics if we do this 51 A Markov Chain model of a queuing problem 47 with p 005 and q 0108 for example we will observe that about 50 of the time we are in state 1 about 31 in State 2 and about 19 in State 3 Another way to get these number is to run lots of simulations simultaneously and keep track of what percentage of the simulations are in each state at each time period After a while we might expect to see the same percentages as reported above whyi Here7s what we observe if we do 10000 simulations all starting in State 1 and print out how many of the simulations are in each of the three states at each time time period State 1 State 2 State 3 0 10000 0 0 1 9519 481 0 2 9073 891 36 3 8685 1240 75 4 8379 1491 130 5 8096 1702 202 6 7812 1914 274 etc 95 4949 3134 1917 96 4913 3166 1921 97 4920 3161 1919 98 4958 3148 1894 99 4982 3072 1946 100 4995 3038 1967 etc lnitially all simulations were in State 1 After one time period a customer had arrived in 481 of the simulations or 481 of the simulations about what we7d expect since p 0105 Since its impossible for 2 customers to arrive in a single period there are still no simulations in State 3 after one period After 2 periods however it is possible to be in State 3 and in fact 36 of the simulations reached this state How does this agree with the number you should expect Notice that after many time periods for n around 100 we appear to have reached a sort of steady state with roughly 50 of the simulations in State 1 31 in State 2 and 19 in State 3 Of course each individual simulation keeps moving around between states but the number in each stays roughly constant Note that these are the same percentages quoted above What we now want to explore is whether we can determine these values without having to do thousands of simulations In fact we can by using the theory of Markov chains Let min be the fraction of simulations which are in State It after n steps for k 1 2 3 and let M be the vector with these three components If all simulations start in State 1 then we have 1 vlt gt 0 1 5 1 0 What fractions do we expect for v0 Since each simulation moves to State 2 with probability p 0105 or stays in State 1 with probability 1 7 p 0195 we expect 0195 111 0 05 0 48 Markov Chains and Related Models Now consider the general case Suppose we know 1A and we want to determine v 1l From the transition diagram of Figure 52 we expect W 0 rgt 0 emu qv 0v v n min 1 7 p 7 WA avg 52 0m00umw emwr We can write this in matrixvector notation as Wt 1 AM 503 where the transition matrix A is 1 7 p q 0 0095 08 0 A p 17 p 7 q q 0 05 0 87 008 l 54 0 p 1 7 q 0 005 0092 The rightmost matrix above is for the speci c values p 005 and q 008 From the relation 53 we can compute 95 9065 8685 111 05 vlt2gt 0910 113 01247 etc 505 0 0025 0069 Note that these values agree reasonably well with what is seen in the simulation after 1 2 and 3 steps To predict the percentages seen later we need only multiply repeatedly by the matrix Al Alternatively note that M A2v0 and in general 1A Anvw We can compute 0496525 0496498 vlt99gt A99vlt gt 00309994 vlt1 gt A1 vlt gt 00309999 0193481 00193502 Note that again the values in the simulation are close to these values Also notice that these two values are very close to each other Just as we conjectured from the simulation we have reached a steady state in which the percentages do not change very much from one time period to the next If we kept applying A we would nd little change in future steps even after thousands of steps eg 004961240310077407 004961240310076444 vlt1 gt A1 vlt gt 0 3100775193798381 M10000 A1 vlt gt 0 3100775193797780 001937984496123990 001937984496123613 As it increases the vectors 1A are converging to a special vector i which has the property that A13 i 56 so that a step leaves the percentages unchangedl Note that the relation 56 means that this vector is an eigenvector of the matrix A with eigenvalue A 1 52 Review of eigenvalues and eigenvectors Let A be a square m X m matrix with real elementsl Recall from linear algebra that a scalar value A is an eigenvalue of the matrix A if there is a nonzero vector 7 the corresponding eigenvector for which AT m 5 7 52 Review of eigenvalues and eigenvectors 49 lf 7 is an eigenvector then so is our for any scalar value 1 The set of all eigenvectors corresponding to a given eigenvalue forms a linear subspace of lle In general A has m eigenvalues A1 Hi Am which can be determined by nding the roots of a polynomial of degree m in A This arises from the fact that if 57 holds then A 7 AIT 0 where I is the m X m identity matrix Since 7 is a nonzero nullvector of this matrix it must be singularl Hence its determinant is zero detA 7 AI 0 Computing the determinant gives the polynomial the eigenvalues are distinct then the eigenvectors Ti corresponding to the different M will be linearly independent and so we can form a matrix R with these vectors as columns which will be a nonsingular matrixl Then An Ain leads to AR RA 58 where A is the diagonal matrix with the values M on the diagonal Note that the order is important In general AR RA and RA AR since matrix multiplication is not commutative Since R is nonsingular it has an inverse R 1 and we can rewrite 58 as A RARill 59 We might be able to do all this even if the eigenvalues are not distinct but this case is a bit more subtle One use of the decomposition 59 is to compute powers of the matrix Al Note that A2 RAR 1RAR 1 RAR 1RAR 1 5 10 RAAR 1 4RA2R71 and similarly A RA R 1 aim for any n This is easy to work with because A is just the diagonal matrix with the values A on the diagonal ln MATLAB it is easy to compute eigenvalues and eigenvectorsl For example for the matrix A of 54 we can nd the eigenvalues by doing gtgt p 05 q 08 gtgtA1p q 0 P 1Pq q o p 1 q gtgt eigA ans 10000 08068 09332 Markov Chains and Related Models If we also want to nd the eigenvectors7 we can do gtgt RLambda eigA R 0 8050 04550 07741 05031 08146 01621 03144 03597 06120 Lambda 10000 0 0 0 08068 0 0 0 09332 This returns the matrices R and A 53 Convergence to steady state Note that one of the eigenvalues of A computed above is equal to l and the others are both smaller than 1 in magnitude As we will see7 this means the Markov process converges to a steady state Recall that Anlso as ngtoo ifMlltL Mn 1 for all n ifW 17 Mnlaoo as ngtoo ifMlgtL 512 So as n A 007 the matrix A approaches gtgtR 100000000 invR ans 04961 04961 04961 03101 03101 03101 01938 01938 01938 as we can easily con rm7 elgl7 gtgt A 100 ans 04965 04960 04954 03100 03101 03102 01935 01939 01944 53 Convergence to steady state 51 Now recall that when we multiply a matrix A by a vector v we are forming a new vector which is a linear combination of the columns of the matrix The elements of 1 give the weights applied to each column If aj is the j7th column of A and vj is the j7th element of 1 then the vector A1 is Av alvl agvg agvgi In particular if 110 l 0 0 then Av0 is just the rst column of Al Similarly Amovw is the rst column of the matrix A1001 Note that for the example we have been considering all three columns of A100 are nearly the same and for larger n the three columns of A would be even more similar This has an important consequence Suppose we started with different initial conditions77 110 instead of 110 1 0 0 which corresponded to all simulations being in State 1 to start For example we might start with 110 015 03 02y if half the simulations start in State 1 30 in State 2 and 20 in State 3 Then what would distribution of states would we see in the long run After n steps the expected distribution is given by Anvw After 100 steps gtgt V100 A 100 5 3 21 V100 04961 03101 01938 The same steady state as before Since all three columns of A100 are essentially the same and the elements of 110 add up to l we expect Amovw to be essentially the same for any choice of v 1 Even though the steady state behavior is the same no matter how we start the transient behavior over the rst few days will be quite different depending on how we start For example starting with 110 015 03 02y instead of 51 gives i499 4982 4976 111 1302 vlt2gt 3036 113 13049 1199 1982 1975 etc 5113 instead of 515 Another way to look at the behavior of v as we multiply repeatedly by A is to decompose the initial vector 110 as a linear combination of the eigenvectors o um elm 627 2 can 514 Since the vectors 7 1 7 2 and 7 3 are linearly independent this can always be done for any 110 and the solution c cl Cg 03y is unique In fact the vector 5 is the solution of the linear system RC 110 since this is exactly what 514 states writing the matrixvector multiply as a linear combination of the columns Now consider what happens if we multiply 110 from 5114 by the matrix A 111 Avw 01AM CQATQ CgATg 5 15 ClAlTl C2A2T2 CSASTS Multiplying by A again gives 112 Ava clAlAn CQAQATQ CgAgAm 5 16 01A12T1 C222T2 63 A32T3 52 Markov Chains and Related Models 01 Figure 53 Transition diagram for the modeling machine breakdown The three states are State 1 Working State 2 Temporarily Broken and State 3 Permanently Brokenl In general we have vol Anvw 01A1n39r1 Cg A2 T2 03A3nrgl 51 Again recall from 512 that as n gets large A1 A 0 for any eigenvalue that is smaller than one in magnitude For this particular problem A1 1 while the others are smaller and so UM A 0171 for large n This means the steady state is a multiple of 7 1 which is again an eigenvector of A as we already knewl Note that the smaller M is the faster A goes to zero So from 517 we can see how quickly we expect to approximately reach a steady state This depends also on how large the values cj are which depends on the particular initial data chosen If U0 happens to be a steady state already a multiple of 7 1 then Cg Cg i 54 Other examples of Markov Chains 541 Breakdown of machines A certain of ce machine has three states Working Temporarily Broken and Permanently Brokenl Each day it is in one of these states The transition diagram between the states is shown in Figure 53 If the machine is working on Day 1 what is the probability that it is still functional ie not permanently broken a year from now In this example we will consider at a single machine and talk about the probability that this machine is in each state Alternatively we could consider a pool of many machines and consider what fraction of the machines are in each state on a given day We can answer this question by computing the 3657th power of the transition matrix gtgt A 90 6 0 095 4 0 005 0 1 gtgt A 365 54 Other examples of Markov Chains 53 ans 01779 0 1792 0 00284 00286 0 07937 07922 10000 The rst column of this matrix gives the probability of being in each state on Day 366 with our assumption that it is working on Day 17 so 110 l 0 We see that the probability that the machine is permanently broken in 07937 and so there is about a 21 chance that the machine is still functional Note that the probabilities would be about the same if we7d started in the Temporarily Broken state on Day 1 by looking at the second column of this matrix Why does this make sense Why is the third column so different How about after 5 years7 or 1825 days We nd gtgt A 1825 ans 00003 00003 0 00001 00001 0 09996 09996 10000 so by this time the machine is almost surely dead whatever state we started in For this Markov chain the steady state solution is the vector 0 0 l as we can see by computing the eigenvalues and eigenvectors gtgt ER Lambda eigA R 0 07096 06496 0 0 7045 0 1036 10000 00051 07532 Lambda 10000 0 0 0 03043 0 0 0 09957 State 3 Permanently Broken is called an absorbing statel No matter what state we start in7 we will eventually end up in State 3 with probability l7 and once in State 3 we can never leave 542 Work schedules Suppose the XYZ Factory is operating every day and has a work schedule where each employee works every other day Say that an employee is in State 1 on work days and in State 2 on off days Then the 54 Markov Chains and Related Models transition diagram is simply since with probability 1 the employee switches states each day In this case the transition matrix is 0 l A 7 1 0 l a If 70 of the employees are working the rst day then we have 07 0 3 0 7 0 3 0 7 lt1 7 lt2 7 a olgl ml olgl ml and so on In this case we do not reach a single steady state7 but we do have a very regular pattern of cycling back and forth between two states This can again be understood by looking at the eigenvalues and eigenvectors of the matrix gtgtA 0 1 1 0 54 Other examples of Markov Chains 55 gtgt ER Lambda eigA R 07071 07071 0 7071 07071 Lambda 1 0 1 Both eigenvalues have magnitude 1 If we decompose 110 in terms of the eigenvectors7 110 cln 02m then we have 110 0171 T1 52m The rst terms ips sign every day and accounts for the oscillation In the special case where 110 is a multiple of T2 we would expect cl 0 and in this case the solution should be truly steady Since the elements of 1 must sum to 17 the only multiple of 7 2 we can consider is 0 7 05 05 This makes sense if the factory starts with 50 of employees working on the rst day then 50 will be working every day 543 The sh tank inventory problem The Monte Carlo simulation shown in Figure 41 gave results indicating that with this particular ordering strategy and set of parameters7 we could serve about 60 of the customers who come looking for a 150 gallon sh tank We can derive this using a Markov chain For the situation modeled in this program the store can be assumed to always be in one of six states each day l A customer arrives and there7s a tank in stock7 2 A customer arrives7 there7s none in stock but one due to arrive the next day7 3 A customer arrives7 there7s none in stock but one due to arrive in two days 4 No customer arrives7 but there s a tank in stock7 5 No customer arrives7 there s none in stock but one due to arrive the next day7 6 No customer arrives7 there s none in stock but one due to arrive in two days Working out the probabilities of moving from each state to each of the others7 we nd the transition matrix 0 p 0 p p 0 0 0 p 0 0 p 7 p 0 0 0 0 0 Aquot 0 17p 0 17p 17p 0 61 0 0 17p 0 0 17p 17p 0 0 0 0 0 56 Markov Chains and Related Models where p 13 is the probability of a customer arriving on any given day This matrix has three nonzero eigenvalues including A 1 The eigenvector corresponding to A 1 when normalized so the elements sum to 1 is found to be 3 012000 1 010667 1 010667 15 6 014000 2 011333 2 011333 The components of this vector are the expected fraction of days that the store is in each of the six possible states in the long term steady state From this we can see that the fraction of time a customer arrives and is served can be computed from the components of i as 06 3 fraction served 131131 2 133 m Exactly 60 as observed in the Monte Carlo simulation 544 Card shu ing Another sort of application of Markov chains is to model a 7i le shu le of a deck of cards This application received a great deal of attention some years back when it was used to determine how many shu es are necessary to randomize a deck of cards K0190 The answer turns out to be 7 The model was developed by GilbertGil55 Shannon and by Reeds ReeSl and it was used by Bayer and Diaconis 131392 to answer the question about randomizing an initially sorted deck of cards The ri e shu e is modeled as follows A deck of n cards is divided into two heaps with the probability of a heap containing k cards being given by a binomial distribution 2 2 l The heaps are then ri ed together into a single pile with the probability of dropping the next card from a particular heap being proportional to the number of cards in that eapi This is generally considered to be a reasonable model of the way humans perform a ri e shu e although this has been debated With this model the process of card shu ing is represented as a Markov chain Given the current ordering of the cards one can write down the probability of achieving any other ordering after the next shu ei Unfortunately with a deck of 52 cards there are 521 possible orderings and this would seem to render the problem intractable from a computational point of Viewl The great contribution of Bayer and Diaconis was to notice that the number of states of this system could be drastically reduced Given only the number of rising sequences in the current ordering one could determine the probability of any given number of rising sequences after the next shu ei A rising sequence is de ned as a maximal sequence of consecutively numbered cards found perhaps interspersed with other cards during one pass through the deck For example the ordering 146253 has three rising sequences 123 45 and 6 With this observation the number of states of the system is reduced from nl to n and for a deck of n 52 cards it now becomes an easy problem to solve numericallyl An excellent description of the problem and solution from more of a linear algebra point of view can be found in Jonsson and Trefethen JTQS where they also supply a MATLAB code that forms the probability transition matrix P and a related matrix A P 7 P00 whose kth power gives the difference between the state of the system after k shu es and that after in nitely many shu esl A version of their code is available on the class web page The entries of the probability transition matrix are v7 in n1 P1172 lt2i7jgtozi7 55 General properties of transition matrices 57 Steadyrstate Distributlnn er Rising sequences prnhamhly 2m 25 an 35 N umber er nsmg sequences Figure 514 Steadystate distribution of number of rising sequences where aj denotes the number of permutations of 1111 n that have exactly j rising sequences The ajls are known as Eulerz39zm numbers and they satisfy the recurrence Auk kArilJc T i k 1A7 71k717 where A11 1 Alk 0 for k f 1 and aj Ami See SloOO for more details on this sequence The eigenvalues of the probability transition matrix are known to be 2 j 011 1 i n 7 1 and the eigenvector corresponding to eigenvalue 1 is known as wel v a1lllanTnl The entries of this vector are the probabilities of each possible number of rising sequences after the deck has been shuf ed in nitely many times The vector v is plotted in Figure 5 i t can be determined from the entries of 1 that for example with probability 19999 a wellshuf ed deck has between 19 and 34 rising sequences This begins to give us a clue as to why the system behaves as it does If the deck is initially sorted in increasing order 7 so that there is one rising sequence and the initial state vector is so 1 01 l i 0T 7 then after one shuf e there will be at most two rising sequencesl Do you see why When you take a deck with two rising sequences cut it in two and rifHe the two portions together the resulting ordering can have at most four rising sequences etc After j shuf es there are at most Zj rising sequences so that for j S 4 it is easy to determine that the ordering is not randoml Try the experiment Shuf e a deck of cards four times and count the number of rising sequencesl It will be no more than 16 Then shuf e several more times at least three more times and with very high probability you will count at least 19 rising sequencesl Figure 55 shows a plot of the difference between the state vector after k shuf es and that after in nitely many shuf es 21 vi 7 Pksoml This is called the total van39atz39zm norm and is the measure used by Bayer and Diaconisl The deck is deemed to be suf ciently well shuf ed when this difference drops below 5 Notice that this happens after k 7 shuf es 55 General properties of transition matrices Transition matrices are special types of matrices which have some general properties which we summa rize here 0 Every element in a transition matrix must lie between 0 and 1 since it is a probability Markov Chains and Related Models A 5 6 Number nr shuf es Figure 55 Convergence toward the steadystate in total variation norm Each column of a transition matrix sums to 1 This is because the elements in the j7th column represent the probabilities of moving from State j to each of the other states Since we must move somewhere all of these probabilities must add up to 1 On the other hand the rows do not need to sum to 1 Note In many books transition matrices are de ned differently with aij being the probability of moving from State 239 to State j instead of being the probability of moving from Statej to State 239 as we have used In this case the transition matrix is the transpose of ours and the rows sum to 1 instead of the columns lf w is the row vector with all components equal to 1 then wA wl This is just a restatement of the fact that all columns must sum to 1 But this also shows that w is a left eigenvector of A with eigenvalue A 1 This proves that A l is always an eigenvalue of any transition matrix Recall that in general the left eigenvectors of A row vectors Z for which A AZ are different from the right eigenvectors but the eigenvalues are the same In fact the left eigenvectors Z are the rows of the matrix R l the inverse of the matrix of right eigenvectors It can also be shown that all eigenvalues ofa transition matrix satisfy M S 1 so some components of an eigenexpansion such as 514 decay while others have constant magnitude but none can grow exponentia yl If M 1 then the eigenvector Tj can be used to nd a potential steady state of the systemi Since the elements of 1 must sum to l we must normalize Tj to have this suml Recall that any scalar multiple Cj l j is also an eigenvector so we choose Cj l inj where Tij are the elements of the vector le For example if we use Matlab and obtain an eigenvector such as 7 1 708050 7 05031 7 03144 we can compute the normalized eigenvector as 04961 73 0 3101 708050 7 050317 03144 01938 This is the expected steady state Note that normalizing an eigenvector gives a reasonable result only if all the elements of the eigenvector have the same sign What can you 1 p u i a out 39 t 0 col 1 quot to eigenvalues M with Aj lt l 56 Ecological models and Leslie Matrices 59 56 Ecological models and Leslie Matrices Similar matrix techniques can be used in other situations that are not exactly Markov chains As an example suppose we want to study the population of a certain species of bird These birds are born in the spring and live at most 3 years and we will keep track of the population just before breeding There will be 3 types of birds Age 0 born the previous spring Age 1 and Age 2 Let von vln and v n represent the number of females in each age class just before breeding in Year 71 To model how the population changes we need to know 0 Survival rates Suppose 20 of Age 0 birds survive to the next spring and 50 of Age 1 birds survive to become Age 2 o Fecundity rates Suppose females that are 1 year old produce a clutch of a certain size of which 3 females are expected to survive to the next breeding season Females that are 2 years old lay more eggs and suppose 6 females are expected to survive to the next spring Then we have the following model v8n1 31 611 vinTD 02118 519 v nTD 051 In matrixvector form 0 3 0 v3 NH 02 0 0 v5 520 0 05 0 vltngt 2 Note that in this case the columns of the matrix do not sum to 1 and some of the entries do not represent probabilities However we can predict the future population by iterating with this matrix just as we did using transition matrices for a Markov chain and the eigenvalues and eigenvectors will give useful information about what we expect to see in the long run This form of matrix based on survival rates and fecundities is called a Leslie matn39x See Jef78 SK87 for more discussion Suppose we start with a population of 3000 females 1000 of each age Then we nd 1000 3000 0000 vlt gt 1000 v0 200 vlt2gt 1800 113 720 1000 500 100 900 and so on If we plot the population in each age group as a function of Year we see what is shown in Figure 56a Clearly the population is growing exponentially Figure 56b shows the fraction of the population which is in each age class as a function of year This settles down to a steady state fter 20 years the population is 1120 20833 3873 1797 while the percentage in each age class is 07861 01461 00678 We can compute that between the last two years the total population grew by a factor of 10760 Since 1120 A20v0 we can predict all this from the eigenvalues and eigenvectors of A We nd 0 30000 60000 02000 0 0 05000 0 60 x m pupulalmn m each age gmup Markov Chains and Related Models quotemuquot m pupulalmn m each age mun Age n A921 Age 2 5 m 15 m 5 Figure 516 Bird population with agg 0 5 and initial data 110 1000 1000 1000 gtgt RLambda eigA R 09796 06990 01821 00149 00846 01110 Lambda 10759 0 0 05380 0 0 gtgt R1sumR 1 ans 07860 01461 00679 06459i 02545i 01297i 05179i 06990 06459i 00149 02545i 01110 01297i 05380 05179i We see that A1 110759 is greater than 1 and accurately predicts the growth rate of the total population The other two eigenvalues are complex and have magnitude 0174687 so the corresponding components in the initial vector 110 die out This transient behavior which results from the fact that the initial population distribution equal number of each age is not the steady state distribution The rst eigenvector T1 normalized to have sum 17 gives the fraction expected in each age class in the steady state If we had started with a population that was distributed this way7 then we would see smooth behavior from the beginning with no transienti See the example in the next section 56 Ecological models and Leslie Matrices 61 561 Harvesting strategies We can use this model to investigate questions such as the following Suppose we want to harvest a certain fraction of this population7 either to control the exponential growth or as a food source or both How much should we harvest to control the growth without causing extinction Harvesting will decrease the survival rates We might like to decrease the rates to a point where the dominant eigenvalue is very close to l in value If we decrease the rate too much7 then all eigenvalues will be less than 1 and the population size will decrease exponentially to extinction For example7 suppose we harvest an additional 20 of the 1year olds7 so the survival rate agg falls from 05 to 03 Then we nd that the largest eigenvalue of A is 098307 giving the graphs shown in Figure 57 Domitian ataam age group mm amputation in each age group SHED 7DDD Emma 5000 ODDD 3000 2000 02 mu m 5 u 15 2a 5 u 15 2a Figure 57 Bird population with W 03 and initial data vlt gt 1000 1000 1000 Of course it might be more reasonable to assume the population starts out with a steadystate distribution of ages For the new matrix the relevant scaled eigenvector is 7 1 7902 01608 00491 and for a population of 3000 birds this suggests we should take 110 30007 1 2371 482 147 Then we obtain the boring graphs shown in Figure 58 Note that we now have exponential decay of the population and eventual extinction However7 over the 20year span shown here the total population is predicted to decrease from 3000 to 3000098320 2129 and probably our model is not even accurate out this far Assuming itls accurate at all To judge this7 one would have to investigate what assumptions the model is based on and how far in the future these assumptions will be valid Domitian ataam age group mm amputation in each age group 2am u a am as mum at Figure 58 Bird population with agg 03 and initial data having the expected steadystate distribution of ages 62 Markov Chains and Related Models Math 381 Notes 7 University of Washington 7 Winter 2008 Chapter 6 Linear Programming 61 Introduction Section 61 of the notes is adapted from an introduction to linear programming written by Professor J Burke for Math 407 Some of the discussion in the original notes which introduces duality theory and sensitivity analysis has been eliminated Burke s original notes can be found on the webpage httpwwwmathwashington edu burkecrs407HTMLnoteshtml Other introductions to linear programming can be found in many references including HL95 and ChvSS 611 What is optimization A mathematical optimization problem is one in which some function is either maximized or minimized relative to a given set of alternatives The function to be minimized or maximized is called the objective function and the set of alternatives is called the feasible region or constraint region In this course the feasible region is always taken to be a subset of IR real ndimensional space and the objective function is a function from IR to R We further restrict the class of optimization problems that we consider to linear programming problems or LPs An LP is an optimization problem over 1R wherein the objective function is a linear function that is the objective has the form 0111 C212 071 for some ci 6 R i 1 n and the feasible region is the set of solutions to a nite number of linear inequality and equality constraints of the orm ai111ai212mainInS5i i173 and ai111ai212mam1nbi i817m Linear programming is an extremely powerful tool for addressing a wide range of applied optimization problems A short list of application areas is resource allocation production scheduling warehousing layout transportation scheduling facility location ight crew scheduling parameter estimation 612 An Example To illustrate some of the basic features of LP we begin with a simple twodimensional example In modeling this example we will review the four basic steps in the development of an LP model 1 Determine and label the decision vaiiables 64 Linear Programming 2 Determine the objective and use the decision variables to write an expression for the objective function 9 Determine the explicit constraints and write a functional expression for each of them F Determine the implicit constraints PLASTIC CUP FACTORY A local familyowned plastic cup manufacturer wants to optimize their production mix in order to maximize their profit They produce personalized beer mugs and champaign glasses The profit on a case of beer mugs is 25 while the profit on a case of champaign glasses is 20 The cups are manufactured with a machine called a plastic extruder which feeds on plastic resins Each case of beer mugs requires 20 lbs of plastic resins to produce while champaign glasses require 12 lbs per case The daily supply of plastic resins is limited to at most 1800 pounds About 15 cases of either product can be produced per hour At the moment the family wants to limit their work day to 8 hours We will model the problem of maximizing the profit for this company as an LP The first step in our modeling process is to determine the decision variables These are the variables that represent the quantifiable decisions that must be made in order to determine the daily production schedule That is we need to specify those quantities whose values completely determine a production schedule and its associated profit In order to determine these quantities one can ask the question If I were the plant manager for this factory what must I know in order to implement a production schedule77 The best way to determine the decision variables is to put oneself in the shoes of the decision maker and then ask the question What do I need to know in order to make this thing work77 In the case of the plastic cup factory everything is determined once it is known how many cases of beer mugs and champaign glasses are to be produced each day Decision Variables B of cases of beer mugs to be produced daily C of cases of champaign glasses to be produced daily You will soon discover that the most difficult part of any modeling problem is the determination of decision variables Once these variables are correctly determined then the remainder of the modeling process usually goes smooth y After specifying the decision variables one can now specify the problem objective That is one can write an expression for the objective function Objective Function Maximize profit where profit 25B 20C The next step in the modeling process is to express the feasible region as the solution set of a finite collection of linear inequality and equality constraints We separate this process into two steps 1 determine the explicit constraints and 2 determine the implicit constraints The explicit constraints are those that are explicitly given in the problem statement In the problem under consideration there are explicit constraints on the amount of resin and the number of work hours that are available on a daily basis Explicit Constraints resin constraint 20B 12C S 1800 6 1 Introduction 65 work hours constraint B C S 8 This problem also has other constraints called implicit constraints These are constraints that are not explicitly given in the problem statement but are present nonetheless Typically these constraints are associated with natural or common sense77 restrictions on the decision variable In t e cup factory problem it is clear that one cannot have negative cases of beer mugs and champaign glasses That is both E and C must be nonnegative quantities Implicit Constraints 0 S B 0 S C The entire model for the cup factory problem can now be succinctly stated as 73 max 25B 20C subject to 20B 12C S 1800 B 0 3 8 0 S B C Since this problem is two dimensional it is possible to provide a graphical solution The rst step toward a graphical solution is to graph the feasible region To do this rst graph the line associated with each of the linear inequality constraints Then determine on which side of each of these lines the feasible region must lie don t forget the implicit constraintsl Once the correct side is determined it is helpful to put little arrows on the line to remind yourself of the correct side Then shade in the resulting feasible region The next step is to draw in the vector representing the gradient of the objective function at the origin Since the objective function has the form f117 I2 6111 C2127 the gradient of f is the same at every point in R2 5 Vf11712 1 62 Recall from calculus that the gradient always points in the direction of increasing function values Moreover since the gradient is constant on the whole space the level sets of f associated with different function values are given by the lines perpendicular to the gradient Consequently to obtain the location of the point at which the objective is maximized we simply set a ruler perpendicular to the gradient and then move the ruler in the direction of the gradient until we reach the last point or points at which the line determined by the ruler intersects the feasible region In the case of the cup factory problem B 7 45 C 7 75 We now recap the steps followed in the solution procedure given above this gives the solution to the LP as Step 1 Graph each of the linear constraints and indicate on which side of the constraint the feasible region must lie Don7t forget the implicit constraints Step 2 Shade in the feasible region Step 3 Draw the gradient vector of the objective function Step 4 Place a straightedge perpendicular to the gradient vector and move the straightedge either in the direction of the gradient vector for maximization or in the opposite direction of the gradient vector for minimization to the last point for which the straightedge intersects the feasible region The set of points of intersection between the straightedge and the feasible region is the set of solutions to the LP 66 Linear Programming Z ZEIBIZC18UEI 12 211K Dbjecnve nurmal nZS zn39 B c 45 75 9 W 8 if upumal value 2525 feasxble regmn 515 015 8 2 W x x w x x x x x x x x x r x x x x x x B 1 2 3 4 5 6 7 89 10 1M2 13 14 15 Figure 61 6 1 Introduction 67 The solution procedure described above for two dimensional problems reveals a great deal about the geometric structure of LPs that remains true in n dimensions We will explore this geometric structure more fully as the course evolves But for the moment we continue to study this 2 dimensional LP to see what else can be revealed about the structure of this problem Before leaving this section we make a nal comment on the modeling process described above We emphasize that there is not one and only one way to model the Cup Factory problem or any problem for that matter In particular there are many ways to choose the decision variables for this problem Clearly it is sufficient for the shop manager to know how many hours each days should be devoted to the manufacture of beer mugs and how many hours to champaign glasses From this information everything else can be determined For example the number of cases of beer mugs that get produced is 15 times the number of hours devoted to the production of beer mugs Therefore as you can see there are many ways to model a given problem But in the end they should all point to the same optimal process 613 Sensitivity Analysis One of the most important things to keep in mind about LPs that arise from models of real world phenomenon is that the input data associated with the problem speci cation not only may change over time but is also subject to measurement error For example in the case of the cup factory the pro t levels for both beer mugs and champaign glasses are no doubt subject to seasonal variations Prior to the New Year the higher demand for champaign glasses forces up the sale price and consequently their pro tability As St Patrick7s Day approaches the demand for champaign glasses drops but the demand for beer mugs soars In June demand for champaign glasses again rises due to the increase in marriage celebrations Then just before the Fourth of Ju y the demand for beer mugs returns These seasonal uctuations clearly may have an effect on the optimal solution and the optimal value A natural question to ask in this regard is What is the smallest pro t level required in order that champaign glasses remain in the optimal production mix Alternatively one could ask what is the smallest pro t level at which manufacturing champaign glasses remains ef cient The least level of pro tability for champaign glasses is called the breakeven price for champaign glasses Breakeven Prices The breakeven price for champaign glasses is easily computed Recall that the optimal solution is obtained by holding a ruler perpendicular to the objective gradient and then moving the ruler in the direction of the objective gradient to the furthest point at which the line determined by the ruler and the feasible region intersect Thus to determine the breakeven price for champaign glasses we need to determine the smallest tilt of the objective gradient for which the optimal solution changes from producing both beer mugs and champaign glasses to producing just beer mugs This occurs when the objective gradient is parallel to any normal to the constraint line associated with the resin bound That is if the objective is fBC aB 60 then we want 3 Q B 7 12 where the numbers 20 and 12 are coefficients in the resin constraint Holding a xed at 25 we get 12 i 25 15 B 20 That is the breakeven price for champaign glasses is 15 a case Similarly one computes the break even price for beer mugs by considering the labor hours constraint and arrives at 20 per case of beer mugs 68 Linear Programming Observe that if we set the per case pro tability of champaign glasses to 15 while keeping the pro tability of beer mugs xed we nd that every point on the line segment connecting the two points J 2 J J J and J 2 J J 900 J is optimal for the new LP Thus the set of optimal solutions to an LP can be in nite Marginal Values Next we consider the effect of uctuations in the availability of resources on both the optimal solution and the optimal value in the case of the cup factory there are two basic resources consumed by the production process plastic resin and labor hours in order to analyze the behavior of the problem as the value of these resources is perturbed we rst observe a geometric property of the optimal solution namely that if an optimal solution to an LP exists then there is at least one optimal solution that is a vertex of the feasible region Next note that as the availability of a resource is changed the constraint line associated with that resource moves in a parallel fashion along a line normal to the constraint Thus at least for a small range of perturbations to the resources the vertex associated with the current optimal solution moves but remains optimal We caution that this is only a generic property of an optimal vertex and there are examples for which it fails For example the feasible region can be made empty under arbitrarily small perturbations of the resources These observations lead us to conjecture that the solution to the LPs U6162 max25B 20C subject to 20B 12C S 1800 61 19 0 3 9 52 0 S BC lies at the intersection of the two lines 20B 12C 1800 61 and B C 8 62 for small values of 61 and 62 name y B 45 7 62 el C 75 62 761 and U6162 2625 32362 361 it can be veri ed by direct computation that this indeed yields the optimal solution for small values of 61 an Next observe that the value U6162 can now be viewed as a function of 61 and 62 and that this function is differentiable at J 1 J J 0 J with 2 0 WWW J 32 J The number 2 is called the marginal value of the resin resource at the optimal solution J 2 J J J 375 and the number 7 is called the marginal value of the labor time resource at the optimal solution B 45 J C J J 75 We have the follow1ng interpretation for these marginal values each additional pound of resin beyond the base amount of 1800 lbs contributes 2 to the pro t and each additional hour of labor beyond the base amount of 8 hours contributes 323 to the pro t Using this information one can answer certain questions concerning how one might change current operating limitations For example if we can buy additional resin from another supplier how much more per pound are we willing to pay than we are currently paying Answer 2 per pound is the most we are willing to pay beyond what we now pay why Or if we are willing to add overtime hours what 6 1 Introduction 69 is the greatest overtime salary we are willing to pay Of course7 the marginal values are only good for a certain range of uctuation in the resources7 but within that range they provide valuable information 614 Duality Theory We now brie y discuss the economic theory behind the marginal values and how the hidden hand of the market place77 gives rise to them This leads in a natural way to a mathematical theory of duality for linear programming Think of the cup factory production process as a black box through which the resources ow Raw resources go in one end and exit the other When they come out the resources have a different form7 but whatever comes out is still comprised of the entering resources However7 something has happened to the value of the resources by passing through the black box The resources have been purchased for one price as they enter the box and are sold in their new form as they leave The difference between the entering and exiting prices is called the pro t Assuming that there is a positive pro t the resources have increased in value as they pass through the production process The marginal value of a resource is precisely the increase in the per unit value of the resource due to the production process Let us now consider how the market introduces pressures on the pro tability and the value of the resources available to the market place We take the perspective of the cup factory vs the market place The market place does not want the cup factory to go out of business On the other hand7 it does not want the cup factory to see a pro t It wants to keep all the pro t for itself and only let the cup factory just break even It does this by setting the price of the resources available in the market place That is7 the market sets the price for plastic resin and labor and it tries to do so in such a way that the cup factory sees no pro t and just breaks even Since the cup factory is now seeing a profit7 the market must gure out by how much the sale price of resin and labor must be raised to reduce this pro t to zero This is done by minimizing the value of the available resources over all price increments that guarantee that the cup factory either loses money or sees no pro t from both of its products If we denote the per unit price increment for resin by R and that for labor by L7 then the pro t for beer mugs is eliminated as long as 1 20R 7L gt 25 15 7 since the left hand side represents the increased value of the resources consumed in the production of one case of beer mugs and the right hand side is the current pro t on a case of beer mugs Similarly7 for champaign glasses7 the market wants to choose R and L so that 1 12R 7L gt 20 15 7 Now in order to maintain equilibrium in the market place7 that is7 not drive the cup factory out of business since then the market realizes no pro t at all7 the market chooses R and L so as to minimize the increased value of the available resources That is7 the market chooses R and L to solve the problem D minimize 1800R 8L subject to 20R L 2 25 This is just another LP It is called the dual to the LP 73 in which the cup factory tries to maximize pro t Observe that if C j is feasible for 73 and I j is feasible for 39D then 253 200 g 20R LB 12R ch R20B 120 M B 0 g 1800R 8L 70 Linear Programming Thus the value of the objective in 73 at a feasible point in 73 is bounded above by the objective in D at any feasible point for D In particular the optimal value in 73 is bounded above by the optimal value in D The strong duality theorem77 states that if either of these problems has a nite optimal value then so does the other and these values coincide In addition we claim that the solution to D is given by the marginal values for 73 That is J I J J Si jg J is the optimal solution for D In order to show this we need only show that J I J J 3532 J is feasible for D and that the value of the objective in D at J L J Si fg J coincides with the value of the objective in 73 at J 2 J J First we check feasibility 5 375 0lt7 0lt7 8 2 5 1 375 207 i7gt25 81 2 i 5 1 375 127 77gt205 81 2 Next we check optimality 5 375 2545207526251800 87l This is a most remarkable relationship We have shown that the marginal values have three distinct and seemingly disparate interpretations 1 The marginal values are the partial derivatives of the value function for the LP with respect to resource availability 2 The marginal values give the per unit increase in value of each of the resources that occurs as a result of the production process and 9 The marginal values are the solutions to a dual LP D 615 LPs in Standard Form and Their Duals Recall that a linear program is a problem of maximization or minimization of a linear function subject to a nite number of linear inequality and equality constraints This general de nition leads to an enormous variety of possible formulationsl In this section we propose one xed formulation for the purposes of developing an algorithmic solution procedure We then show that every LP can be recast in this forml We say that an LP is in standard form if it has the form 73 maximize 0111 0212 cnzn subject to ailzl aigzg ainzn S bi for i12lum OSzj forjl2iuni Using matrix notation we can rewrite this LP as 73 maximize 5T1 subject to AI S b 0 S I 7 where the inequalities AI S b and 0 S I are to be interpreted componentwisel 6 1 Introduction 71 Transformation to Standard Form Every LP can be transformed to an LP in standard formi This process usually requires a transformation of variables and occasionally the addition of new variables In this section we provide a stepbystep procedure for transforming any LP to one in standard formi minimization A maximization To transform a minimization problem to a maximization problem just multiply the objective function by 71 linear inequalities If an LP has an inequality constraint of the form ai111 ai212 39 39 39 ainrn 2 bi This inequality can be transformed to one in standard form by multiplying the inequality through by 71 to get ailrl ai212 39 39 39 ainIn S 4 linear equation The linear equation ai111 39 39 39 ainrn bi can be written as two linear inequalities ai111 39 39 39 ainrn S bi and ai111 39 39 39 ain1n 2 bi The second of these inequalities can be transformed to standard form by multiplying through by variables with lower bounds If a variable 11 has lower bound li which is not zero li S Ii one obtains a nonnegative variable wi with the substitution zl wi lit In this case7 the bound li S 11 is equivalent to the bound 0 S wit variables with upper bounds If a variable 11 has an upper bound ui S one obtains a nonnegative variable wi with the substitution zl ui 7 wii In this case7 the bound 11 S ui is equivalent to the bound 0 S wit variables with interval bounds An interval bound of the form li S 11 S ui can be transformed into one nonnegativity constraint and one linear inequality constraint in standard form by making the substitution zl wi lit In this case7 the bounds li S 11 S ui are equivalent to the constraints 0 S wi and wi S ui 7 lit 72 Linear Programming free van39ables Sometimes a variable is given without any bounds Such variables are called free variables To obtain standard form every free variable must be replaced by the difference of two nonnegative variablesi That is if z is free then we get ziui7vi with 0 S ui and 0 S vi To illustrate the ideas given above we put the following LP into standard formi minimize 311 7 12 subject to 711 612 7 13 14 2 73 712 14 5 13 14 S 2 1S 12713 S 57 S 14 S 2 First we rewrite the objective as maximize 7 311 12 Next we replace the rst inequality constraint by the constraint 117612137I4S3i The equality constraint is replaced by the two inequality constraints 712 14 S 5 7712 14 S 75 Observe that the variable 11 is free so we replace it by 11 yi y2 With 0 S 9170 S y2 The variable 12 has a nonzero lower bound so we replace it by zgy37lwith0Sy3i The variable 13 is bounded above so we replace it by 13 57y4 withOSy4i The variable 14 is bounded below and above so we replace it by I4y572with0Sy5 andy5S4i Putting it all together we get the following LP in standard form maximize 73y1 3y yg subject to M 7 y2 7 6 7 M 7 ya S 710 7 ya S 14 i 793 ya S 14 i 94 ya S 1 95 S 4 0 S y17y27y37y47y5 62 A pig farming model 73 62 A pig farming model Suppose a pig farmer can choose between two different kinds of feed for his pigs corn or alfalfa He can feed them all of one or the other or some mixture The pigs must get at least 200 units of carbohydrates and at least 180 units of protein in their daily diet The farmer knows that 0 corn has 100 unitskg of carbohydrate and 30 unitskg of protein 0 alfalfa has 40 unitskg of carbohydrate and 60 unitskg of protein Here are a few questions the farmer might be faced with in choosing a good diet for the pigs 1 If the farmer uses only corn7 how many kgday does he need for each pig 2 If the farmer uses only alfalfa7 how many kgday does he need for each pig 3 Suppose corn and alfalfa each cost 30 cents kg How expensive would each of the above diets be 4 With these prices7 is there some mixture of the two grains that would satisfy the dietary require ments and be cheaper 5 What is the best mix if the price of alfalfa goes up to lkg What if it goes up only to 60 cents kg The problem of choosing the mix which minimizes cost can be set up as a linear programming problem minimize 0111 0212 subject to 100m 4012 2 200 61 3011 6012 2 180 11 12 2 0 where 11 and 12 are the quantity of corn and alfalfa to use in kilograms and cl and Cg are the cost per kilogram of each Multiplying the objective function and constraints by 71 would allow us to put this in the standard form discussed above Linear programming problems can be solved in MATLAB7 which requires a slightly different standard form7 since it solves a minimization problem rather than a maximization problem gtgt help linprog LINPRUG Linear programming XLINPROGfAb solves the linear programming problem min f x subject to Ax lt b x XLINPRUGfAbAeqbeq solves the problem above while additionally satisfying the equality constraints Aeqx beq XLINPROGfAbAeqbeqLBUB defines a set of lower and upper bounds on the design variables X so that the solution is in the range LB lt X lt UB Use empty matrices for LB and UB if no bounds exist Set LBi Inf if Xi is unbounded below 74 Linear Programming set UBi Inf if Xi is unbounded above XLINPROGfAbAeqbeqLBUBX0 sets the starting point to X0 This option is only available with the activeset algorithm The default interior point algorithm will ignore any nonempty starting point XLINPRUGfAbAeqBeqLBUBXOUPTIONS minimizes with the default optimization parameters replaced by values in the structure OPTIONS an argument created with the UPTIMSET function See UPTIMSET for details Use options are Display Diagnostics TolFun LargeScale MaxIter Currently only final and off are valid values for the parameter Display when LargeScale is off iter is valid when LargeScale is on XFVALLINPRUGfAb returns the value of the objective function at X FVAL f X XFVALEXITFLAG LINPROGfAb returns EXITFLAG that describes the exit condition of LINPROG If EXITFLAG is gt 0 then LINPRUG converged with a solution X 0 then LINPRUG reached the maximum number of iterations without converging lt 0 then the problem was infeasible or LINPRUG failed XFVALEXITFLAGUUTPUTj LINPRUGfAb returns a structure OUTPUT with the number of iterations taken in OUTPUTiterations the type of algorithm used in UUTPUTalgorithm the number of conjugate gradient iterations if used in UUTPUTcgiterations XFVALEXITFLAG0UTPUTLAMBDALINPRUGfAb returns the set of Lagrangian multipliers LAMBDA at the solution LAMBDA ineqlin for the linear inequalities A LAMBDA eqlin for the linear equalities Aeq LAMBDAlower for LB and LAMBDAupper for UB NOTE the LargeScale the default version of LINPRUG uses a primaldual method Both the primal problem and the dual problem must be feasible for convergence Infeasibility messages of either the primal or dual or both are given as appropriate The primal problem in standard form is min f x such that Ax b x gt 0 The dual problem is max b y such that A y s f s gt 0 62 A pig farming model To solve the optimization problem with costs cl 02 307 we can do the following gtgt f 3030 gtgt A 100 40 30 60 gtgt b 200180 gtgt vlb 00 gtgt linprogfAb vlb Optimization terminated successfully 10000 2 5000 This solution is illustrated graphically in Figure 62a If the cost of alfalfa goes up to li00kg gtgt f30100 gtgt linprogfAb vlb Optimization terminated successfully 6 0000 0 This solution is illustrated graphically in Figure 62b In class we will study the graphical interpreta tion of these and other related problems 621 Adding more constraints Suppose we add some new constraints to the problemi For example7 the farmer might want to keep the pigs from getting too fat by requiring that the total daily diet should be at most 5 kg7 adding the constraint 11 12 S 5 ln addition7 suppose pigs should eat at most 3 kg of any particular grain7 so that 11 S 3 and 12 S 3 This problem can be solved as follows gtgt A 100 40 30 60 1 1 gtgt b 200 180 5 gtgt vlb 0 0 gtgt vub 3 3 gtgt linprogfAb vlbvub Optimization terminated successfully ans 3 0000 1 5000 This solution is illustrated graphically in Figure 63 622 Solving the problem by enumeration Each constraint corresponds to a straight line7 so the feasible set is a convex polygon The objective function we are minimizing is also linear7 so the solution is at one of the corners of the feasible set Corners correspond to points where two constraints are satis ed simultaneously7 and can be found by solving a linear system of two equations from these two constraints in two unknowns 11 and 12 If we knew which two constraints were binding at the solution7 we could easily solve the problemi Unfortunately we don7t generally know this in advance 76 Linear Programming a b Figure 62 Graphical solution of the pigfarming problems in the 1112 plane In each case the solid lines show constraints and the dashed line is the iso cost line passing through the optimal solution7 Which is one of the corners of the feasible regioni Shade in the feasible region in each gure Figure 63 Graphical solution of the pigfarming problem With more constraints The solid lines show constraints and the dashed line is the iso cost line passing through the optimal solution7 Which is one of the corners of the feasible regioni Shade in the feasible regionl 62 A pig farming model 77 One approach to solving the problem would be to consider all possible corners by solving all possible sets of 2 equations selected from the various constraints For each choice of 2 equations we could o Solve the linear system for these two constraints 0 Check to see if the resulting solution is feasible it might violate other constraintsl o If it is feasible compute and record the value of the objective function at this point After doing this for all choices of two constraints the best value of the objective function found gives the solution The problem with this for larger problems anyway is that there is a combinatorial explosion of possible choices If there are m constraints and two decision variables then there are 7 mm 7 l 0m2 possible choices 623 Adding a third decision variable If there are more decision variables then the problem is even worse For example if we consider 3 grains instead of only 2 then we would have three variables 11 12 and 13 to consider In this case the pictures are harder to draw Each constraint corresponds to a plane in 3space and the feasible set is a convex body with planar sides Changing the value of the objective function sweeps out another set of planes and again the optimal solution occurs at a corner of the feasible set These corners are points where 3 of the constraints are simultaneously satis ed so we can nd all possible corners by considering all choices of 3 constraints from the set of m constraints There are now 7 gt 0m3 ways to do this 624 The simplex algorithm Practical linear programming problems often require solving for hundreds of variables with thousands of constraints In this case it is impossible to enumerate and check all possible corners of the feasible set which is now an object in a linear space with dimension in the hundreds The simplex algorithm is a method that has proved exceedingly effective at nding solutions in much less time though like branch and bound algorithms in the worst case it can still have exponential running time The basic idea is to start by identifying one corner of the feasible region and then systematically move to adjacent corners until the optimal value of the objective function is obtained To gain a basic understanding of the simplex method consider the following LP max 211 7 312 62 subject to 211 12 S 12 11 7 312 S 5 0 S 11712 The corresponding feasible region is depicted in Figure 64 The simplex method uses a series of elementary matrix operations Therefore the system of lin ear equalities in LP 62 must be reformulated into a system of linear equalities constraints The introduction of slack van39ables facilitates such a conversion of the constraints For instance the inequality 211 S 12 becomes 2zlzgzg 12 with the addition of the slack variable 13 2 0 Loosely speaking 13 makes up the slack in the inequality constraint 211 12 S 12 Take a moment and verify that the equality and inequality constraints are equivalent Such veri cation is an important part of mathematical modeling 78 Linear Programming 2Mgtlt2 12 X173gtlt2 5 Figure 614 Feasible region for LP The constraint 11 7312 g 5 still remains to be converted into a linear equality With a slack variable In an attempt to reduce decision variables one may be tempted to use 13 again as a slack variable Which would form the linear equality 11 7 312 13 g 5 Note that our linear system 62 does not imply that 211 12 7 12 11 7 312 7 5 or loosely speaking that there are equal amounts of slac 77 in each constraintl Forcing the same slack variable in each constraint changes the problem we are solvingl Take a moment and consider the implications of using 13 for each constraint in 612 Therefore another slack variable 14 Z 0 is introduced into the second constraint Which becomes 11 7 312 14 5 In general each inequality constraint should use a unique slack variablel Therefore LP 62 becomes max 2 211 7 312 63 subject to 211 12 13 12 11 7 312 14 5 OS 11 i1234 Notice 2 equals the optimum of the objective function Having obtained a system of linear equality constraints our system is prepared for matrix compu tation Traditionally the simplex method forms tableau1 Which are equivalent to matrices The initial simplex tableau for LP 63 is lt7 272113130 lt7 2117171271713 12 7117312145 Solutions can be read directly from the tableau Since the column associated With variable 11 contains more than 1 nonzero value 11 0 In contrast the column associated With 13 contains only one nonzero value Which implies that 13 121 121 Therefore this tableau leads to the solution 62 A pig farming model 79 11 12 013 1214 5 and 2 0 Considering more details of the method requires that we establish additional terminology regarding a simplex tableau 2 11 12 13 I4 RHS 1 l a 3 0 0 0 indicators 0 2 1 1 0 12 0 1 3 0 1 5 Basic variables are nonzero variables in the tableau and correspond to a corner point of the feasible region In the current tableau 13 and 14 are basic variables associated with the corner point 00 since 11 0 and 12 0 The basic idea of the simplex algorithm is to change which variables in the system are basic variables in order to improve the value of 2 The numbers in shaded boxes in the top row of the tableau are called indicators Let ai denote the indicator in the column associated with variable Ii Therefore the algorithmic steps for the simplex algorithm as applied to the simplex tableau are as follows H Check the indicators for possible improvement 7 If all the indicators are nonnegative then the solution is optimal else go to Step 2 E0 Check for unboundedness 7 If ai lt 0 and no positive element in the column associated with variable I exists then the problem has an unbounded objective value Otherwise nite improvement in the objective function is possible go to Step 3 CAD Select the entering variable 7 The most negative indicator identi es the entering variable That is select ai such that ai lt 0 and ai is the most negative indicator The entering variable is z and will be made nonzero The column associated with z is called the pivot column F Select the departing variable 7 This step determines which basic variable will become zero Find a quotient for each row by dividing each number from the right side of the tableau by the corre sponding number from the pivot column The pivot row is the row corresponding to the smallest quotient The pivot element is the element in the pivot row and pivot column 01 Pivot creating a new tableau Use elementary row operations on the old tableau so the column associated with 11 contains a zero in every entry except for a l in the pivot position Go to Step 1 Applying these steps to our tableau we nd the following 11 12 13 I4 RHS Quotients 122 6 51 1 pivot row pivot column Verify these results Note that 11 is the entering variable and 14 the departing variable For 11 to become nonzero we apply elementary row calculations to form the following tableau 80 Linear Programming This tableau leads to the solution 11 5 12 0 13 2 I4 0 and 2 10 The basic variables are indeed 11 and zgi Refer to Figure 64 to verify that our solution is a corner point of the feasible solution Moreover the current tableau represents an increase in the optimal value Yet the indicators identify that possible improvement still exists by selecting 12 as an entering variable The step of selecting the departing variable is left to the reader Elementary row operations produce the following tableau 2 11 12 13 I4 RHS 17 27 27 37 17 417 This tableau leads to the solution 11 212 2 zg I4 0 and 2 El The indicators are all positive which represents that the optimal value 2 cannot be improved Therefore the solution has been found Implementation of the simplex algorithm requires a great deal of linear algebra and bookkeeping Imagine performing elementary row operations on a simplex tableau with thousands of decision vari ables Fortunately software packages exist that are aimed specifically at solving linear programming and related problems One popular package is LINDO which is available on some campus computers The MATLAB function linprog implements the simplex algorithmi Program packages such as LINDO and MATLAB demonstrate the need for effective computation in the field of optimization 63 Complexity and the simplex algorithm The simplex algorithm was invented by George Dantzig in 1947 Dan63li Klee from UW and Minty KM72 proved that the simplex algorithm is exponential in the worst case but is very efficient in prac tice Khachian Kha79 invented the ellipsoid algorithm for nding solutions to linear programming problems in 1979 t at he showed runs in polynomial time Another more efficient algorithm due to KarmarkarKar84 uses an interiorpoint method instead of walking around the boundary This algo rithm appeared in 1984 In 2001 Spielman and Teng STOZ introduced a new complexity class they call polynomial smoothed complexity and they showed that the simplex algorithm is the classic example in this class Their work attempts to identify algorithms which are exponential in the worst case but which work efficiently in practice like the simplex algorithmi Math 381 Notes 7 University of Washington 7 Winter 2008 Chapter 7 Integer Programming 71 The lifeboat problem Many linear programming problems arising from real problems have the additional complication that the desired solution must have integer values For example in the lifeboat problem considered in Chapter 1 we must determine how best to equip a ferry with lifeboats and life vests maximizing the number of lifeboats subject to certain constraints The solution to the resulting linear programming problem will typically require some fractional number of lifeboats and life vests which is not possible Instead we must nd the best integer solution Our first guess at how to solve such a problem might be to simply round off the values that come from the LP solution This usually does not work however For example for the values considered in Chapter 1 the LP solution was 11 3636364 number of vests 12 318182 number of boats Trying to round this to 11 364 12 32 would give an infeasible solution requiring too much volume The best integer solution was found to be 11 381 12 31 We had to reduce 12 to 31 and then the number of vests 11 went up substantiallyi Figure 71 shows the graphical solution to this problem with a different set of parameters C1 1 capacity of each vest C2 2 capacity of each boat C ll total capacity needed V1 1 volume of each vest V2 31 volume of each boat V 15 total volume available The dashed line 12 3636 shows the maximum value of the objective function 12 obtained at the LP solution 11 3727 12 3636 The feasible integer points are indicated by the dark dots and we see that the best integer solution is 11 8 12 2 In this case we can7t even use the approach of Chapter 1 of reducing the 12 to the nearest integer and increasing 11 as neede i For most lP integer programming problems there is no direct way to nd the solution Recall that for the LP problem we know the optimum occurs at a corner of the feasible set and the corners can be found by solving an appropriate linear system of equations corresponding to the binding constraintsi For the IP problem typically none of the inequality constraints are exactly binding at the solution In simple cases one might be able to enumerate all integer points in the feasible set and check each but for problems with many variables and with a large range of possible integer values for each this is usually out of the question 82 Integer Programming Figure 71 Graphical solution of the lifeboat problem in the 1112 plane The solid lines show con straints and the dashed line is the objective function contour line passing through the optimal solution7 Which is one of the corners of the feasible regioni When integer constraints are added7 only the points marked by dots are feasible 72 Cargo plane loading 83 Some lP problems have a sufficiently special form that the solution is easily found For example certain transportation problems such as in the example we considered of canneries and warehouses have the property that the LP solution can be shown to be integervaluedl For other classes of l problems there are special techniques which are effective For many problems a branch and bound approach must be used in which linear programming problems are used to obtain boundsl This is illustrated in the next section for a relatively simple class of problems called zeroone integer programming since each decision variable is restricted to take only the values 0 or 1 Many practical problems can be put in this forml Often each value represents a yesno decision that must be made 72 Cargo plane loading Suppose you have a cargo plane with 1000 cubic meters of space available and you can choose from the following sets of cargos to load the plane Cargo Volume Pro t 1 410 200 2 600 60 3 220 20 4 450 40 5 330 30 For each cargo the volume and pro t expected from shipping this cargo are listed There are two possible scenarios 0 We are allowed to take any amount of each cargo up to the total volume with the pro t adjusted accordingly Elgi taking half the cargo yields half the pro t 0 We must take all of the cargo or none of it The second scenario is more realistic and gives a zeroone integer programming probleml The rst scenario gives a linear programming problem with continuous variables and is easier to solve using the simplex method for example It will be useful to consider problems of the rst type in the process of solving the integer programming problem as discussed in the next section Let be the volume of the jlth cargo Pj the pro t and V 1000 the total volume available in the cargo planer Introduce the decision variable zj to represent the fraction of the jlth cargo that we decide to take In the rst scenario above zj can take any value between 0 and 1 while for the actual problem each zj must be either 0 or 1 The integer programming problem then has the form maximize E szj 71 1 subject to E m g V 72 j 0 g I g 1 7 3 zj integer 74 If we drop constraint 74 then we have the linear programming problem corresponding to the rst scenario again For this small problem we can easily solve the integer programming problem by simply enumerating all possible choices of cargosl If we have It cargos to choose from then for each cargo the value zj can take one of two values and so there are 2k possible combinations of cargo In this case It 5 and there 84 Integer Programming are 32 possibilities to consider Not all will be feasible since the volume constraint may be exceeded Here is the enumeration of all 32 possibilities with the pro t arising from each which is set to zero for infeasible combinations cargo choice profit 1 5 HHHHHHHHHHHHHHHHOOOOOOOOOOOOOOOO HHHHHHHHOOOOOOOOHHHHHHHHOOOOOOOO HHHHOOOOHHHHOOOOHHHHOOOOHHHHOOOO HHOOHHOOHHOOHHOOHHOOHHOOHHOOHHOO HOHOHOHOHOHOHOHOHOHOHOHOHOHOHOHO 0000000000 The best combination is 101017 meaning we should take Cargos 17 37 and 5 For a larger problem it might not be possible to enumerate and check all possibilities Moreover the real problem might be more complicated7 for example a shipping company might have many planes and different sets of cargos and want to choose the optimal way to split them up between planesi 73 Branch and bound algorithm lnteger programming problems are often solved using branch and bound algorithmsi This is easily illustrated for the cargo plane problem since this is a particularly simple form of integer programming7 a 01 integer program in which each decision variable can take only the values 0 or 1 This suggests building up a binary tree by considering each cargo in turn Part of such a tree is shown in Figure 72 73 Branch and bound algorithm 85 cargo 11 U13 0 cargo 10101 pro t 250 cargo 10100 pro t 220 cargo 10011 pro t 0 cargo 10010 prom240 cargo 101 UB 253 o cargo 100 UB 253 1 cargo 1 cargo 1010 UB 250 cargo 1001 UB 2527 cargo 1000 UB 230 Figure 712 Illustration of the branch and bound algorithm for solving the cargoloading problemr In each box the values of some 01 integer values are speci ed and then the linear programming problem is solved with the other variables allowed to take any value between 0 and 11 This gives an upper bound UB on the best that could be obtained with this partial cargor lf UB is 0 then this choice is infeasible UB 259 cargo 10 UB 253 o cargo 0 UB 95 4 The root is to the left where the cargo is represented as meaning all ve decision variables are unspeci edr From here we branch to 1 and O in which we either take or do not take Cargo 11 Each of these has two branches depending on whether we take or do not take Cargo 2 Continuing in this way we could build up a full tree whose leaves would be the 32 possible choices of cargosr The goal of the branch and bound algorithm is to avoid having to build up the entire tree by obtaining a bound at each branching point for the best possible solution we could nd down that branch of the tree In this case we want an upper bound on the best pro t we could possibly nd with any selection that contains the pattern of zeroes and ones already speci ed along with any possible choice of zeroes and ones for the cargos which are still denoted by 1 It is exactly these selections which lie further down the tree on this branch We want to obtain this bound by doing something simpler than enumerating all possibilities down this branch and computing each pro t 7 this is what we want to avoi r The essential idea is to instead solve a linear programming problem in which the variables still denoted by are allowed to take any value between 0 and 11 This is more easily solved than the integer programming problemr The solution to this problem may involve noninteger values of some I but whatever pro t it determines perhaps with these partial cargos is an upper bound on the pro t that could be obtained with the integer constraints imposed Make sure you understand why For example at the root node we obtain an upper bound on the pro t of any selection of cargos by solving the linear programming problem 711 with the constraints 72 and 713 only dropping all integer constraints The solution is 11 l 12 0983 an 13 I4 I5 0 with a pro t of 2591 This isnlt a valid solution to the integer problem but any valid selection must have a pro t that is no larger than this At the node 1 we solve the linear programming problem 711 with the constraints 72 and 713 and also the constraint 11 l by changing the bounds on 11 to l S 11 S 11 Or simply reduce the number of decision variables and modify the objective function appropriately The solution is the same as found at the root node since the solution there ha At the node 0 we solve the linear programming problem 711 with the constraints 72 and 713 and also the constraint 11 01 This gives an upper bound on the pro t of only 9636 if Cargo 1 is not taken Again this requires taking fractional cargos and any integer selection with 11 0 will be even worse 86 Integer Programming It looks more promising to take Cargo 1 than to not take it and so we explore down this branch rst We next try 11 taking Cargo 1 and Cargo 2 and solving a linear programming problem for the other variables but we nd this is infeasible since Cargo 1 and 2 together require too much volume At this point we can prune off this branch of the tree since every selection which lies down path must be infeasible Continuing this way we eventually work down to a leaf and nd an actual integer selection with an associated pro t From this point on we can also prune off any branches for which the upper bound on the pro t is smaller than the actual pro t which can be obtained by a valid selection So for example all of the tree lying beyond the node 0 can be pruned off without further work since the upper bound of 96136 is smaller than leaf values we have already found 74 Traveling salesman problem We now return to the Traveling Salesman Problem TSP of Chapter 2 and see how this can be set up as an integer programming problem Consider a graph with N vertices labeled 1 through N and let dij be the distance from i to j Recall that we wish to nd the path through all nodes which minimizes the total distance traveled To set this up as an IP problem let zij be 1 if the edge from Node 239 to Nodej is used and 0 otherwise For example with N 5 the path 1 A 2 A 3 A 5 A 4 A 1 would be encoded by setting 112 I23 135 154 I41 1 and the other 15 zij values to 0 For a graph with N nodes there will be NN 7 1 decision variables Now we wis N N minimize Z Z dijzij 7 5 i1 j1 subject to various constraints One constraint is that we must use exactly one edge out of each node This leads to N Z 1 1 7 6 j1 for each i 1 2 i i i Ni Also we must use exactly one edge into each node so N 21 1 7 7 i1 for eachj 1 2 Hi N This is not enough for the constraints would allow things like 112 123 131 1 and 145 I54 1 7 8 with all the other zij 0 This is not a single TSP tour but rather two disconnected tours To rule out the possibility of subtours we must add other constraints This can be done in various ways One possibility is to introduce a new set of decision variables ul 1 i i uN which do not appear in the objective function but which have the N 7 12 constraints uiiujJerZjgNiL for ij23 iiiNi 719 Note that ul does not appear in these constraints The idea is that if we have a set of 117 which correspond to a valid TSP tour then it should be possible to choose some values ui so that these constraints are satis ed so that all such tours are still feasible On the other hand for a set of zij that contain subtours such as 718 it should be impossible to nd a set of ui satisfying 7 9 and hence such sets will no longer be feasible 75 IP in Recreational 87 To see why it is impossible to satisfy these constraints if there are subtours consider 78 for example The second subtour de ned by 145 I54 1 does not involve Node 1 The constraints 79 fori4 j5andfori5 j4yield u4 7 u5 5 S 4 u5 7 u4 5 S 4 Adding these together gives 10 S 8 which cannot be satisfied no matter what values we assign to u4 and u5 So we cannot possibly satisfy both of the constraints above simultaneously A similar argument works more generally If there are subtours then there is always a subtour that doesn7t involve Node 1 and writing down the constraints 79 for each ij in this subtour gives a set of equations that when added together leads to cancellation of all the ui values and an inequality that can never be satisfied On the other hand we wish to show that if we consider a valid TSP tour with no subtours then we can find an assignment of ui values that satisfies all these constraints The basic idea is to set ul 0 and then follow the tour starting at Node 1 setting the ui variables according to the order in which we visit the nodes This is best illustrated with an example 17275747371 yields u10 1121 u52 u43 1134 We can show that this satisfies all the constraints in 79 First consider the pairs ij for which zij 0 In this case 79 reads ui 7 uj S N 7 1 This will certainly be true since all the values we assigned to u variables lie between 0 and N 7 1 Next consider the pairs ij for which zij 1 This means we use the edge from Node 239 to j in our tour Nodej is visited immediately after Node 239 and hence uj ui 1 So we can compute that ui7ujNzij7lNN7l and hence the inequality in 79 is satisfied with equality in this case 741 NPcompleteness of integer programming We just showed that it is possible to reformulate any traveling salesman problem as an integer program ming problem This can be used to prove that the general integer programming problem is NPcomplete For if there were a polynomialtime algorithm to solve the general integer programming problem then we could use it to solve the TSP and hence have a polynomialtime algorithm for the TSP 75 IP in Recreational Mathematics Integer Programming also applies to problems in recreational math such as the challenging brain teasers found in puzzle books available at bookstores or on the World Wide Web at sites such as rec puzzles Whether tinkering with such problems in your mind or applying lP these problems offer fun challenges 88 Integer Programming I7 15 15 North castle 18 Castle 14 O O O O 11 I2 13 a b Figure 73 The wellguarded castle puzzle involves l2 guards posted around a square castle a Each guard is depicted with a solid dot Note that l soldier is posted at each corner of the castle and 2 soldiers on each wall b The associated I can be formulated with 8 decision variables 751 An Example Puzzle To illustrate lP7s in recreational math we begin with a simple brain teaser based on a puzzle from the book The Moscow Puzzles by B A Kordemsky Kor92 THE WELL GUARDED CASTLE There is unrest in the land and King Husky declares that the castle be guarded on every side Guards will be posted along a wall or at a corner of the square castle When posted on a corner a soldier can watch 2 walls Other soldiers are posted along the face of a wall and watch only 1 side of the castle The king orders 12 of his bravest soldiers to guard the stone fortress and initially plans to post 2 guards on each wall and 1 soldier on each corner as depicted in Figure 751 a Aid King Husky in securing the kingdom by nding an arrangement of the soldiers such that there are 5 soldiers watching each wall Since the problem serves as a brain teaser you may be able to derive an answer by inspection or trial and error imagine a much larger problem with thousands of soldiers in such cases the process needed to nd a solution could easily move beyond the scope of mental calculation Therefore this puzzle serves as an illustrative example of formulating such problems as integer programming problems 752 Formulation of the IP The solution process begins by determining the decision variables As in linear programming the best way to determine these variables is to put oneself in the shoes of the decision maker and ask the question What do I need to know in order to fully determine the solution77 in the case of the wellguarded castle we must know the number of soldiers positioned along each wall and at each corner Hence we de ne 8 decision variables I for i 8 where 1 equals the number of soldiers posted at position i The positions are numbered counterclockwise beginning at the southwest corner as seen in Figure 751 Recall the king declared that 5 soldiers watch each wall Again a soldier on a corner can watch 2 walls while a soldier along a wall watches only 1 This leads to the following constraints southern wall constraint 11 12 13 5 eastern wall constraint 13 14 15 5 northern wall constraint 15 15 17 5 western wall constraint 17 13 11 5 76 Recasting an IP as a 01 LP 89 What is the puzzles objective That is what is a suitable objective function As you may have noticed there are several solutions to this puzzle For instance the guards can be positioned such that 1154andzgz4zgzglorbysymmetry13174andzgz4z zgli Objective functions to maximize 11 15 or 13 17 could produce these answers respectively It is also possible to reduce the number of guards on the corners such that 11 I5 zg I7 2 and 12 I4 15 13 1 Take a moment and consider what objective function could produce this answer As we see the choice of an objective function determines which of such solutions is optimal It should be noted that in some cases if you desire a particular solution then additional constraints may be needed Suppose King Husky wants the total number of soldiers watching the corners to be at a minimumi Then our objective function becomes min 11 13 15 17 One constraint remains The king issues 12 guards which implies 211 11 12 The entire model for the wellguarded castle puzzle can now be succinctly stated as P minzlzgz5z7 subject to zlzgzg 5 zsz4zs 5 Iszsz7 5 171811 5 211 12 Ughgs i12m8 In class we will look at other puzzles and pose the problems as IP problems The interested reader n i 1 Af u i 77 is encouraged to visit the web site Puzzles lnteger r 1n httpwww chlonddemonco ukacademicpuzzleshtml to nd additional puzzlesi 76 Recasting an IP as a 01 LP In the last section P posed the wellguarded castle puzzle as an lPi Yet posing an IP does not imply that it is solved King Husky like an executive of a company who hires a consulting rm to model a phenomenon demands an answer from a model that he can trust Presenting King Husky with a properly posed IP and no solution might result in ones being thrown over that mighty wal l Solving P as an LP allows variables in the solution to contain nonzero decimal valuesi Such a solution relaxes the constraint that the decision variables are integers Hence an IP that does not enforce integral solutions is called a relaxed LPi lmagine presenting King Husky with the solution that 11 soldiers be posted on the southern walli Even a cruel leader would recognize the foolishness of such an action Still solving an IP as a relaxed LP can yield intuition on the nal solution Yet deciphering the optimal or even near optimal integer solution from a real numbered solution is often a difficult if not impossible task A 01 LP also called a 01 lP is a linear programming problem where decision variables can be considered as boolean variables attaining only the values 0 or 1 Moreover every lP can be recast as a 01 LPi Techniques such as branch and bound can be applied to solve a 01 LPi Our solution can then be recast from the 01 LP back to the integer solution of the corresponding lPi Assume Q is an lP and z is an integer decision variable of We introduce the decision variables 90 Integer Programming di1di2 idik and let k z ZjSldij 11 710 dil 2di2 A A A Zk ldik where dij 0 or lj 12Hiki For P the decision variables are bounded below by 0 and above by 5 Therefore it is suitable to let 11 du 2d12 4d13i Note that if du dlg l and dlg 0 then 11 3 Similarly every possible value of 11 can be determined by a suitable linear combination of du dlg and dlgi Note that the decision variables dij can also be viewed as the binary representation of the associated value of mix Completing similar substitutions for each decision variable in P the corresponding 01 LP becomes min d11 2d12 4d13 d31 2d32 4d33 d51 2d52 4d73 d71 2d72 4d73 subject to 211 22121 1dij 12 d11 2d12 4d13 d21 2d22 4d23 d31 2d32 4d33 5 d31 2d32 was d41 2d42 4d43 d51 2d52 4d53 5 dsi 2d52 4d53 d61 2d52 4d53 d71 2d72 4d73 5 d71 2d72 4d73 den 2d32 4d33 d11 2d12 4d13 5 0 g d S1i12iu8j123i Such a problem can be solved using a branch and bound algorithm or linear programming software What if our problem did not have an upper bound on the decision variables In such cases the IP is recast to a 01 LP with a large bound on each decision variable Then the bound is incremented by a xed value say between 10 and 50 until the solution no longer changes at which point the solution is found Integer programming can model a diverse range of problems In this chapter you learned techniques and some of the dif culties of solving 1 problems Math 381 Notes 7 University of Washington 7 Winter 2008 Bibliography 131392 ChvSS DAQS Dan63 Dyk84 cuss GJ79 HL95 Jef78 mas Kar72 Kar84 Kha79 KM72 Knu93 K0190 Di Bayer and Pi Diaconisi Trailing the dovetail shuf e to its lairi Ann Appl Prob 222947313 1992 Vasek Chvatali Linear Programming Wi Hi Freeman New York 1983 T57i74iC54i Al Dolan and l Aldousi Networks and Algorithms An Introductory Approach John Wiley amp Sons Chichester 1993 George Dantzigi Linear programming and extensions Princeton University Press Princeton NJ 1963 Di Pi Dykstrai Mathematical Programming for Natural Resource Management McGrawHill 1984 El Gilberti Bell labs technical report 1955 Mr Ri Garey and Di S Johnson Computers and Intractalzility A guide to the Theory of NPCompletenessi Wi Hi Freeman San Francisco 1979 F S Hillier and Gr Ji Lieberman Introduction to operations research McGrawHill New York 1995 l Ni Ri Jeffersi An Introduction to Systems Analysis with Ecological Applications University Park Press London 1978 Gr Jonsson and L N Trefetheni A numerical analyst looks at the cutoff phenomenon7 in card shuf ing and other markov chains In Numerical Analysis 1997 D F Cri ths D J Higham and C A Watson eds pages 1507178 Addison Wesley Long man Ltd 1998 R Mi Karpi Reducibility among combinatorial problems In Complexity of Computer Compu tations Proc Sympos IBM Thomas J Watson Res Center Yorktown Heights NY pages 857103 Plenum 1972i Ni Karmarkari A new polynomialtime algorithm for linear programming Combinatorica 423737395 1984 Li Khachiani A polynomial algorithm in linear programming Soviet Math Dokl 2021917194 1979 Victor Klee and Gi Mintyi How good is the simplex algorithmi ln Inequalities III pages 1597172 Academic Press 1972 Di El Knuthi The Stanford CraphBase A Platform for Combinatorial Computing Addison Wesley New York 1993 Gr Kolatai ln card shuf ing 7 is winning number In New York Times page C1 1990 91 92 BIBLIOGRAPHY Kor92 Bi Kordemskyl The Moscow Puzzles Dover reprint edition 1992 man problem 39 a guided tour 0 bub Chichester 1990i LLASQO E L Lawler Ji Ki Lenstra Al Hi Go Rinnooy Kan and Di Bi Shmoysl The Traveling sales JJJ WWTL Mak73 Di Pi Makil Mathematical models and applications with emphasis on the social life and management sciences PrenticeHall Englewood Cliffs 1973 Min78 El Miniekal Optimization algorithms for networks and graphs Marcel Dekker New York 1978 ReeSl Ji Reeds Theory of shuf ing unpublished manuscript 1981 Rob76 Fred S Roberts Discrete mathematical models with applications to social biological and environmental problems PrenticeHall Englewood Cliffs NJ 1976 Rob84 Fred S Roberts Applied combinatoricsl PrenticeHall Englewood Cliffs Nil 1984 Ros90 Si Ml Ross A Course in Simulation Macmillan Publishing Company New York 1990 SCCH97 Go C Smith C Li Cheeseman and R S CliftonHadley Modelling the control of bovine tuberculosis in badgers in england culling and the release of lactating females J Applied Ecology 341137571386 1997 Sip97 Ml Sipserl Introduction to the theory of computation PWS Publ Co Boston 1997 SK87 Go Li Swartzman and S P Kaluznyl Ecological Simulation Primerl MacMillan Publishing 1987 Ski90 Si Si Skienal Implementing Discrete Mathematics Combinatorics and Graph Theory with Mathematical AddisonWesley Redwood City CA 1990 S1000 Neil Sloanel The on line encyclopedia of integer sequences In A008292l httpwwwlreseachlattlcom njassequences 2000i STOZ Daniel Spielman and Shang Hua Tengl Smoothed analysis of algorithms In Proceedings of the International Congress of Mathematicians volume 1 pages 5977606 2002 UMAQS UMAP Journal MCM The First Ten Years special edition of the UMAP Journal COMAP 1995i ZP97 Zi Zhou and WW Panl Analysis of the Viability of a giant panda population J Applied Ecology 3423637374 1997


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Steve Martinelli UC Los Angeles

"There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

Amaris Trozzo George Washington University

"I made $350 in just two days after posting my first study guide."

Bentley McCaw University of Florida

"I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"


"Their 'Elite Notetakers' are making over $1,200/month in sales by creating high quality content that helps their classmates in a time of need."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.