Design & Theory of Algorithms (CSE 830)
These 47 pages of class notes were uploaded by Donnell Kertzmann on Saturday, September 19, 2015. The notes belong to CSE 830 at Michigan State University, taught by Eric Torng in Fall. Since upload, they have received 63 views.
Date Created: 09/19/15
Reduction Worksheet

- Hamiltonian Path <=p Hamiltonian Circuit
  - Transformation R
    * Input to R: graph G = (V, E), an input to HP
    * Output of R: graph G' = (V', E'), an input to HC, where V' = V union {z} and E' = E union {(z, v) | v in V}
  - Argument that the output of R, namely G', has polynomial size
    * What is the size of G' relative to G?
  - Yes-to-yes argument: if G has a HP, then G' has a HC
    * Let v1, v2, ..., v|V| be the HP in G
    * Then ____ is a HC in G'
    * Briefly argue why
  - Yes-from-yes argument (i.e., no-to-no argument): if G' has a HC, then G has a HP
    * Let v1, v2, ..., v|V|, z be the HC in G'
    * Then ____ is a HP in G
    * Briefly argue why
- Hamiltonian Circuit <=p Hamiltonian Path
  - Transformation R
    * Input to R: graph G = (V, E), an input to what?
    * Output of R: graph G' = (V', E'), an input to what?
    * Give a possible transformation
  - Argument that the output of R, namely G', has polynomial size
    * What is the size of G' relative to G?
  - Yes-to-yes argument
  - Yes-from-yes argument (i.e., no-to-no argument)

- Quick check on reductions: suppose you wish to show that SAT <=p CLIQUE (see list of problems)
  - You need to come up with a transformation R that satisfies certain properties
    * What should the input to R be?
    * What should the output of R be?
    * What relationship should there be between the input to R and the output of R?
- Restriction technique: show that a special case of the problem you are interested in (that is, a restriction of the problem) is NP-complete
  - Example: Long Path problem
    * Input: graph G = (V, E), integer k. Yes/no question: does G have a simple path of length at least k?
    * A special case of the long path problem is identical to one of the problems on the list of problems. Identify the appropriate problem and describe the special case.
- Local replacement technique: make many independent, non-interacting local changes to the structure
  - Example: reducing SAT to 3SAT. Suppose an input instance to SAT consists of the following 3 clauses: (x1 or x2 or x3 or x4 or x5), (not x2 or not x3), (not x1 or x2 or x3 or not x4). Give the resulting clauses in the input instance to 3SAT after applying the local replacement technique.
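The local replacement step for long clauses can be sketched in code. This is a minimal sketch, not from the notes: the clause encoding (a signed integer i means x_i, -i means not x_i), the helper name `to_3sat`, and the fresh splitting variables are all assumptions, and the padding of clauses shorter than three literals is omitted.

```python
def to_3sat(clause, next_var):
    """Split one SAT clause into an equisatisfiable chain of 3-literal clauses.

    Returns (list of clauses, next unused variable index). Clauses with <= 3
    literals pass through unchanged; a clause (l1 v l2 v ... v lm) becomes
    (l1 v l2 v y1), (-y1 v l3 v y2), ..., (-yk v l_{m-1} v l_m)
    using fresh variables y1..yk.
    """
    m = len(clause)
    if m <= 3:
        return [list(clause)], next_var
    out = []
    y = next_var
    out.append([clause[0], clause[1], y])     # first chain link
    for lit in clause[2:-2]:                  # middle links, one per literal
        out.append([-y, lit, y + 1])
        y += 1
    out.append([-y, clause[-2], clause[-1]])  # last chain link
    return out, y + 1
```

For the 5-literal clause above, the chain uses two fresh variables and produces three 3-literal clauses.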
- Component design technique
  - Example: 3SAT <=p Independent Set
  - Suppose the input instance to 3SAT has the following clauses: (x1 or x2 or x3), (not x1 or x2 or not x4), (x2 or x3 or x4). Give the resulting Independent Set input instance.
  - What would change if we directly reduced SAT to Independent Set?

Lecture 8: Greedy Algorithms
Single-source shortest path (Dijkstra's algorithm); scheduling

Shortest Path Problem
- Problem specification
  - Input: weighted graph G = (V, E), source node s, weight (length) function w on each edge e in E
  - Task: compute the length of the shortest path from s to every node v in V, and compute the actual shortest paths from s to every node v in V

Dijkstra's algorithm
- Initialize the set of completed nodes S to be {s}
- Initialize D(v) to be the length of the shortest path to v in which all intermediate nodes are in S
- Selection rule: choose the node v in V - S such that D(v) is minimized
- Implementation
  - Initialize S to {s}; initialize D(v) to w(s, v) for all v in V - S
  - Loop: find the minimum D(v) value, add v to S, and update D(u) for the other nodes in V - S based on the addition of v to S
- Running time analysis: O(|V|^2) time if we use a simple linear search to find the minimum element and perform the updates.
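The O(|V|^2) linear-search implementation just described can be sketched as follows. This is a sketch, not from the notes: the adjacency-map graph format and the function name are assumptions.

```python
INF = float('inf')

def dijkstra(graph, s):
    """graph: dict node -> dict neighbor -> weight (nonnegative).
    Returns D[v] = length of the shortest s-to-v path, in O(|V|^2) time."""
    D = {v: INF for v in graph}
    D[s] = 0
    S = set()                                  # completed nodes
    while len(S) < len(graph):
        # selection rule: v in V - S minimizing D[v], by linear search
        v = min((u for u in graph if u not in S), key=lambda u: D[u])
        S.add(v)
        # update D[u] for neighbors of v still outside S
        for u, w in graph[v].items():
            if u not in S and D[v] + w < D[u]:
                D[u] = D[v] + w
    return D
```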
- Find and update can be reduced to O(log n) operations per edge if we use a heap plus an array of lists of the edges leaving each node
  - Heap elements are keyed on the D(v) values
  - Items in the heap are updated using the list of edges leaving v; each update costs O(log n) time
  - Updates happen at most once per edge: O(|E| log |V|) total
  - Which implementation is better when?

Proof of correctness
- Assume Dijkstra's algorithm does not compute the length of all shortest paths from s. Let v be the first node whose shortest path length is computed incorrectly, and let S be the set of nodes whose shortest paths were computed correctly by Dijkstra prior to adding v to the processed set of nodes.
- The shortest path to v must include some node u not in S: Dijkstra's algorithm has just used the shortest path from s to v using only nodes in S when it added v to S.
- Either u or some node on the shortest path to u must be a node in V - S whose shortest path contains only nodes in S; call this node y. (Two nodes cannot both be on each other's shortest path.)
- The length of the shortest path to y must be at least the length of the path computed to v, or else we would have chosen y instead of v to add at this step.
- The length of the path from y to v must then be < 0, or else the alternate shortest path has at least the same length as the one found by Dijkstra's algorithm. No path can have negative length, and thus we have a contradiction.
- There is no swap this time, just the derivation of a contradiction; u (or y) in a sense takes the place of the swap element: we swap the shortest path to v that goes through u with the path that uses only nodes in S.

Minimum average (total) waiting time scheduling
- Problem specification
  - Input: set of jobs 1, 2, ..., n with processing times t_i
  - Task: find an ordering of these jobs on a single processor that minimizes T, the sum of their waiting times in the system: T = sum_i (time in system for job i) = sum_i c_i, where c_i is the completion time of job i
- Greedy algorithm: the Shortest Processing Time first (SPT) algorithm.
- There are n! different permutations of the n jobs. What is a greedy strategy for ordering them? Choose jobs in order of increasing processing time.
- Implementation: sort jobs by processing time, O(n log n)
- Proof of correctness
  - Assume SPT is not optimal. Consider an optimal schedule (permutation of the n jobs) P, and number the jobs by their order in P.
  - Since SPT is not optimal, there must be some job i such that t_i > t_{i+1}.
  - Construct schedule P' by swapping jobs i and i+1. The time in the system for all other jobs is identical to P; the time in the system for job i in P' is identical to that of job i+1 in P; and the time in the system for job i+1 in P' is strictly less than that of job i in P. This is because t_i > t_{i+1}.
  - This is a contradiction, as P was supposed to be optimal. Thus SPT is optimal.
- Alternative proof of correctness: writing out the sum, we get T = n*t_1 + (n-1)*t_2 + (n-2)*t_3 + ... + 2*t_{n-1} + t_n. This is clearly minimized by making t_1 the minimum possible, t_2 the second minimum possible, and so on.

Minimizing average waiting time with release times and preemptions
- Problem changes: jobs have release times r_i, and jobs can be preempted.
  - Release times: job i cannot be processed until time r_i (in the problem above, r_i = 0 for all i).
  - Preemptions: we can process a job, stop it to process another job, and then resume processing the original job with no penalty.
- Input: set of jobs 1, 2, ..., n with processing times t_i and release times r_i
- Task: find a preemptive schedule of these jobs on a single processor that minimizes T, the sum of their waiting times in the system: T = sum_i (time in system for job i) = sum_i (c_i - r_i)
- Greedy strategy? Implementation cost? Proof of correctness?

Maximizing profit of jobs with deadlines
- Problem specification
  - Input: set of jobs 1, 2, ..., n with processing times 1, profits g_i > 0, and integer deadlines d_i > 0
  - Task: find a schedule of jobs that maximizes the sum of the profits of the jobs that complete by their deadline
- Greedy algorithm: choose jobs in order of profit
- Implementation
  - Initialize S to be empty
  - Loop: choose the job j with maximum profit; check whether S union {j} can be legally scheduled (feasibility check); if so, add j to S
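The greedy loop above can be sketched as follows, using a deadline-ordered feasibility test for unit-time jobs. This is a sketch, not from the notes: the function names and the `(profit, deadline)` tuple format are assumptions.

```python
def feasible(deadlines):
    """Unit-time jobs are schedulable iff, in deadline order, the i-th job
    (1-indexed) has deadline at least i."""
    return all(d >= i for i, d in enumerate(sorted(deadlines), start=1))

def max_profit_jobs(jobs):
    """jobs: list of (profit, deadline) with unit processing times.
    Greedy by profit with a feasibility check before each addition."""
    chosen = []
    for profit, d in sorted(jobs, reverse=True):       # highest profit first
        if feasible([dl for _, dl in chosen] + [d]):   # can S union {j} be scheduled?
            chosen.append((profit, d))
    return sum(p for p, _ in chosen)
```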
- Feasibility check lemma: a set of jobs is feasible if and only if the deadline-ordered schedule is feasible.
  - Proof: assume the jobs are numbered in order of increasing deadline. Suppose the deadline-ordered schedule is not feasible. Then some job j is the first to not complete by its deadline. This means there are j jobs whose deadlines are at most j - 1, so the set of jobs is not feasible.
- Analysis of the algorithm
  - Sorting by profit: O(n log n)
  - Feasibility check: inserting the new job into the deadline order can take linear time; the loop occurs n times, so the total time is O(n^2)
- Proof of correctness
  - Greedy chooses job set I with schedule S_I; optimal chooses job set J with schedule S_J.
  - Rearrange the schedules to make sure common jobs are scheduled in the same time slot: suppose job a occurs in S_I in time slot p while job a occurs in S_J in time slot q, with p < q. We can swap the contents of time slots p and q in S_I without affecting feasibility; a similar swap handles the reverse case. Call the new schedules S_I' and S_J'.
  - There must be a difference somewhere, or else I = J. So suppose there is a time t when the job scheduled in S_I' differs from the job scheduled in S_J'.
  - There cannot be a gap in S_J' at time t, or we could add the job and produce a better schedule. There cannot be a gap in S_I' at time t, or greedy would have added the job.
  - The job in J cannot have higher profit than the job in I: greedy would have considered the higher-profit job first. The job in I cannot have higher profit than the job in J: we could swap jobs to produce a better schedule, contradicting the optimality of J.
  - The only possibility is that the jobs have equal profits. Thus I and J produce equal profits, and greedy is optimal as well.

Concepts of Completeness and NP
- Utility of relative classification results
  - Consider a pair of problems Pi and Pi'
    * What does Pi' <=p Pi mean? It means that if Pi is in P, then Pi' is in P. Intuitively, Pi is at least as hard as Pi'.
    * What do Pi' <=p Pi and Pi <=p Pi' together mean? They mean that if either problem is in P, then the other one is as well. Intuitively, these two problems are equivalent in difficulty.
    * In isolation, these results have relatively little impact unless you care about these two specific problems.
  - Consider a set of problems C
    * What does "for all Pi' in C, Pi' <=p Pi" mean? It means that if Pi is in P, then all the problems in C are in P. Intuitively, Pi is the hardest problem in the set {Pi} union C.
    * What does "for all Pi', Pi'' in C, Pi' <=p Pi''" mean? It means that if any one of the problems in C is in P, then they all are. Intuitively, the problems in C are equivalent in difficulty.
    * The importance of these results really depends on the class C.
- Definition of C-complete and C-hard. Let C be a set of problems.
  - C-hard
    * A problem Pi is C-hard if "for all Pi' in C, Pi' <=p Pi" holds.
    * That is, if Pi is in P, then all of C is in P.
  - C-complete
    * A problem Pi is C-complete if Pi is C-hard and Pi is in C.
    * That is, Pi is in C and is the hardest problem in C with respect to membership in P.
- Observations about C-complete and C-hard
  - All C-complete problems are equivalent in difficulty with respect to membership in P.
  - Proving a new problem Pi is C-hard:
    * If there is a known C-complete problem Pi', we can prove Pi is C-hard by showing that Pi' <=p Pi. This follows from the transitivity of polynomial-time many-one reductions.
    * If there is no known C-complete problem, we require some method for proving that all problems in C polynomial-time many-one reduce to Pi.
- Polynomial-time many-one reductions are transitive: if Pi1 <=p Pi2 and Pi2 <=p Pi3, then Pi1 <=p Pi3.
- Proof
  - Let R1 be the reduction used to prove Pi1 <=p Pi2, and let R2 be the reduction used to prove Pi2 <=p Pi3.
  - Let x be an input to Pi1, and define R3(x) = R2(R1(x)).
  - Argument that R3(x) is a yes input to Pi3 if and only if x is a yes input to Pi1:
    * Because R1 is a reduction between Pi1 and Pi2, we know that R1(x) is a yes input to Pi2 if and only if x is a yes input to Pi1.
    * Because R2 is a reduction between Pi2 and Pi3, we know that R2(R1(x)) is a yes input to Pi3 if and only if R1(x) is a yes input to Pi2.
    * Applying the transitivity of "if and only if", we get the desired relationship: R3(x) is a yes input to Pi3 if and only if x is a yes input to Pi1.
  - Argument that R3 is polynomial-time computable:
    * Let R1 take time O(n^c1) and R2 take time O(n^c2), and let n be the size of x.
    * The R1 portion of R3 takes time O(n^c1). Furthermore, R1(x) has size at most max(n, n^c1).
    * Therefore the R2 portion of R3 takes time at most max(n^c2, (n^c1)^c2) = max(n^c2, n^(c1*c2)).
    * In either case, the total time taken by R3 is polynomial in n.
- A potentially interesting C to consider is the class NP, because:
  - it includes many interesting problems;
  - it seems unlikely that it is identical to P;
  - we can show that many interesting problems have the above property of being NP-complete.
- NP-hard and NP-complete
  - A problem Pi is NP-hard if for all Pi' in NP, Pi' <=p Pi.
  - A problem Pi is NP-complete if Pi is in NP and Pi is NP-hard.
  - Proving a problem Pi is NP-complete:
    1. Show Pi is in NP (usually the easy step).
    2. Prove that for all Pi' in NP, Pi' <=p Pi, either directly (table method / simulation-based method: Cook's theorem), or by showing that Pi' <=p Pi where Pi' is a known NP-hard problem.
- The importance of NP-completeness and the "Is P = NP?" question
  - Practitioner's point of view
    * There exist a large number of interesting and seemingly different problems which have been proven to be NP-complete.
    * The "P = NP?" question represents the question of whether or not all of these different and interesting problems belong to P.
    * As the set of NP-complete problems increases, the question becomes more interesting and important.
  - Theoretician's point of view
    * We will show later that NP is exactly the set of problems which can be verified in polynomial time.
    * Thus "Is P = NP?" can be rephrased as follows: is it true that any problem which can be "verified" in polynomial time can also be "solved" in polynomial time?
  - Hardness implications
    * It seems unlikely that the problems which can be verified in polynomial time can all also be solved in polynomial time.
    * If you believe that, this implies that P != NP.
    * Thus, proving a problem to be NP-complete is a hardness result.
Complexity class NP
- Definitions
  * Let Pi be a decision problem, and let I be an input instance for Pi.
  * Let Y_Pi be the set of yes input instances and N_Pi the set of no input instances.
  * Let C(I) be a certificate that instance I is in Y_Pi.
  * P is the set of problems that can be solved in polynomial time: if Pi is in P, there exists an algorithm that, when given I, can determine whether I is a yes input instance in polynomial time.
- Informal: NP is the set of problems that can be verified in polynomial time.
- More formal: if Pi is in NP, then there exists an algorithm A with the following properties:
  * For all I in Y_Pi, there exists a polynomial-sized certificate C(I) such that A(I, C(I)) = yes.
  * For all I in N_Pi and all polynomial-sized certificates C(I), A(I, C(I)) = no.
  * Furthermore, A runs in polynomial time.
- Example: the Independent Set problem
  * Certificate: a candidate independent set C.
  * Algorithm: check whether C is an independent set of size at least K.
  * For any input I that is a yes input instance, there does exist a C(I) such that A(I, C(I)) returns yes; for any input I that is a no input instance, there does NOT exist a C(I) such that A(I, C(I)) ever returns yes.
- Certificates and verification algorithms
  * In most cases, you can think of the certificate C(I) as a candidate solution for input instance I; for example, in the independent set problem, the certificate was a candidate independent set.
  * In this case, the verification algorithm simply verifies that C(I) is a true solution for I. For yes input instances, such a candidate solution will exist; for no input instances, no such candidate solution can exist.
- Proving a problem is in NP
  * Describe what the certificate C(I) will be for any input instance I.
  * Describe the verification algorithm (usually trivial).
  * Argue that all yes input instances, and only yes input instances, have an appropriate certificate C(I) (also usually trivial).
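The independent set verifier described above can be sketched as a polynomial-time algorithm A(I, C(I)). This is a sketch, not from the notes: the edge-list instance format and the function name are assumptions.

```python
def verify_independent_set(edges, k, cert):
    """A(I, C): accept iff cert is an independent set of size >= k
    in the graph given by the edge list. Runs in polynomial time."""
    nodes = set(cert)
    if len(nodes) < k:
        return False
    # independent: no edge has both endpoints inside the certificate
    return not any(u in nodes and v in nodes for u, v in edges)
```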
Lecture 7: Greedy Algorithms
Making change; minimum spanning trees; knapsack problem

Basic technique
- Optimization problem: many feasible solutions, some better than others by some criterion (the objective function).
- Choices: we can break the problem down into a sequence of local choices (the selection function).
- Greedy technique: make the local choice that is locally optimal.
- Proof techniques alternate between two strategies:
  1. Prove the algorithm is NOT optimal by finding a counterexample.
  2. Prove the algorithm IS optimal by observing specific structure in the problem.

Making Change
- Objective function: number of coins.
- Finding a counterexample:
  - Problem 6.2: coins 3, 6, 12, 24, 30
  - Problem 6.4: coins 1, 10, 25, 50, 100
- Finding a counterexample or proving the greedy algorithm is optimal:
  - Problem 6.3: coins 1, 2, 5, 10, 20, 25, 50
  - Problem 6.5: coins 1, 2, 4, 8, 16, 32, ..., 2^k
- Is there a coin we could eliminate from Problem 6.2 so that greedy becomes optimal? Which coin, and why?

Minimum Spanning Tree
- Problem specification
  - Input: weighted connected graph G = (V, E), weight (length) function w on each edge e in E
  - Task: compute a spanning tree of G of minimum total weight
  - Spanning tree: if there are n nodes in G, a spanning tree consists of n - 1 edges such that no cycles are formed

Kruskal's algorithm
- Selection rule: add the minimum-weight edge that does not form a cycle.
- Implementation
  - Sort the edges by increasing length, or use a heap to organize the edges (creating the heap can be done in O(|E|) time; extracting the element with minimum value may take O(log |E|) time).
  - Initialize the set of connected components to be the individual nodes, using the disjoint-set structure from section 5.9: Find(v) determines which connected component a node v belongs to; Merge(cc1, cc2) merges two connected components.
- Running time analysis: O(|E| log |E|) time dominates, for sorting the edges or for the heap implementation.
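Kruskal's selection rule with a simple disjoint-set structure can be sketched as follows. This is a sketch, not from the notes: the `(weight, u, v)` edge format is an assumption, and the union-find here uses path compression only.

```python
def kruskal(n, edges):
    """edges: list of (w, u, v) with nodes 0..n-1; returns MST total weight,
    assuming the graph is connected."""
    parent = list(range(n))
    def find(x):                         # Find: which component x belongs to
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total = 0
    for w, u, v in sorted(edges):        # edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                     # adding (u, v) forms no cycle
            parent[ru] = rv              # Merge the two components
            total += w
    return total
```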
Prim's algorithm
- Initialize the connected component N to be any single node.
- Selection rule: add the minimum-weight edge connecting N to V - N.
- Implementation
  - For all nodes v in V - N, maintain the nearest node in N to v, as well as its distance.
  - Use these lists to select the next node to add to N; update both lists each time a node is added to N.
- Running time analysis: the work involved in adding each edge (and thus a new node v) to N is O(|V|). This needs to be done |V| - 1 times, giving O(|V|^2) time for this algorithm.
- Comparison of efficiencies: when is Prim's algorithm advantageous? When is Kruskal's algorithm advantageous? Note that there are other, more efficient algorithms available.

Proofs of correctness
- Kruskal's proof of correctness
  - Let T be a minimum spanning tree, and let T' be the tree formed by Kruskal's algorithm when it utilizes edges in T whenever a tie needs to be broken.
  - Assumption: T' has weight greater than T (otherwise Kruskal's algorithm has produced an optimal solution).
  - Let e be the smallest-weight edge in T' that is not in T. Add e to T and consider the cycle CY that is formed.
  - There must be some edge e' on cycle CY that has weight greater than e. (Why?)
  - Replace e' by e in T. We have a new spanning tree, and this new spanning tree has smaller weight. This contradicts the fact that T was a minimum spanning tree.
- Prim's proof of correctness
  - Let T be a minimum spanning tree, and let T' be the tree formed by Prim's algorithm when it utilizes edges in T whenever a tie needs to be broken.
  - Let e be the first edge added to T' that is not in T, and let N be the existing connected component formed by Prim's algorithm up to this time.
  - In T there must be an edge e' that connects N to V - N and has weight greater than e. (Why?)
  - Replace e' by e in T. We have a new spanning tree, and this new spanning tree has smaller weight. This contradicts the fact that T was a minimum spanning tree.
- Common features of both proofs: start with a supposed optimal solution; identify a first element that is in the greedy solution but not in the optimal solution; swap to derive a contradiction of optimality.
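The nearest-node bookkeeping in Prim's algorithm can be sketched as follows. This is a sketch, not from the notes: the adjacency-map graph format is an assumption, the graph is assumed connected and symmetric, and the nearest-node list is simplified to a distance map.

```python
INF = float('inf')

def prim(graph):
    """graph: dict node -> dict neighbor -> weight (symmetric, connected).
    Returns the MST total weight in O(|V|^2) time."""
    start = next(iter(graph))
    N = {start}                                   # current component
    # for each v outside N, distance of the cheapest edge from N to v
    dist = {v: graph[start].get(v, INF) for v in graph if v != start}
    total = 0
    while len(N) < len(graph):
        v = min(dist, key=dist.get)               # nearest node outside N
        total += dist.pop(v)
        N.add(v)
        for u, w in graph[v].items():             # update the distance list
            if u in dist and w < dist[u]:
                dist[u] = w
    return total
```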
Lecture 3: Asymptotic Notation
1. o: the little-oh of; 2. O: the big-oh of; 3. Theta: the theta of; 4. Omega: the omega of.

Intuition
- We want to discuss the asymptotic performance of algorithms, ignoring constant factors.
- Why do we ignore constant factors? Implementation issues can speed up an algorithm by constant factors (improved hardware, slight improvements in code); we want to understand how effective an algorithm is independent of these factors.
- What is asymptotic analysis? It is an analysis where we focus on the infinite set of large n, ignoring small values of n. (Picture: n on the x-axis, cost on the y-axis, with a finite number of small n cut off.)
- These notations represent sets of functions: O(f(n)) is a set of functions. However, we will use one-way equalities like n log n = O(n^2). This really means that the set {n log n} is a subset of the set of functions described by O(n^2). Incorrect notation: O(n^2) = n log n. Analogy: a dog is an animal, but not every animal is a dog.
- Domain and range assumptions about the functions f we will study: f maps N to the nonnegative reals. The analysis is still essentially valid without these assumptions.

1. Little-oh notation: f(n) = o(g(n))
   - Informal: f grows more slowly than g.
   - Formal: lim_{n -> infinity} f(n)/g(n) exists and is 0.
   - Example: n^2 = o(n^5), since lim n^2/n^5 = lim 1/n^3 = 0.
2. Big-oh notation: f(n) = O(g(n))
   - Informal: f's growth rate is no larger than g's growth rate, within a constant factor (could be the same or slower).
   - Formal: there exist constants c and n0 such that for all n > n0, f(n) <= c * g(n). (Equivalently, when lim_{n -> infinity} f(n)/g(n) exists, it is a constant, not infinity.)
3. Omega notation: f(n) = Omega(g(n))
   - Informal: f's growth rate is no smaller than g's growth rate, within a constant factor (could be the same or larger).
   - Formal: there exist constants c and n0 such that for all n > n0, f(n) >= c * g(n). (Equivalently, when lim_{n -> infinity} g(n)/f(n) exists, it is a constant, not infinity.)
   - Other formulations: f(n) = Omega(g(n)) if and only if g(n) = O(f(n)), if and only if f(n) is not o(g(n)).
   - Omega is used to express lower bounds. It is awkward to say "n^2 is O(time to execute this algorithm)"; better is "Alg is Omega(n^2)".
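The formal big-oh definition above can be illustrated with a toy numeric check (a sampled check, not a proof; the helper name and the sampled range are assumptions of this sketch).

```python
import math

def holds_big_oh(f, g, c, n0, n_max):
    """Check the witnessed definition f(n) <= c*g(n) for all sampled n
    with n0 < n < n_max. A True result only fails to refute the claim."""
    return all(f(n) <= c * g(n) for n in range(n0 + 1, n_max))

# n log n = O(n^2): c = 1, n0 = 1 work on the sampled range
nlogn_is_O_nsq = holds_big_oh(lambda n: n * math.log2(n), lambda n: n * n, 1, 1, 1000)
# the reverse one-way equality fails immediately (e.g. at n = 2)
nsq_is_O_nlogn = holds_big_oh(lambda n: n * n, lambda n: n * math.log2(n), 1, 1, 1000)
```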
Approximation Algorithms
- Main issues
  - Finding a good lower (or upper) bound for the optimal solution value.
  - Relating your algorithm's solution value to this bound.
  - Examples: scheduling and vertex cover.
- Basic setting
  - We work with an optimization problem (not a decision problem), where we are trying to minimize or maximize some value.
  - Examples
    * Vertex Cover. Input: graph G = (V, E). Task: find a vertex cover of minimum size. Value to be minimized: vertex cover size.
    * Independent Set. Input: graph G = (V, E). Task: find an independent set of maximum size. Value to be maximized: independent set size.
  - These problems are NP-hard
    * The decision version of each of these problems is NP-complete.
    * Thus we probably cannot find a polynomial-time algorithm that solves the problem optimally for all input instances.
  - Notation
    * Let Pi be the problem under consideration, and let I be an input instance of Pi.
    * Let OPT denote the optimal algorithm, and let A denote the algorithm under consideration.
    * For any algorithm A and any input instance I, let A(I) denote the value of A's solution for input instance I.
  - Absolute approximation (c-absolute-approximation algorithm)
    * There exists c such that for all I, A(I) <= OPT(I) + c, for a minimization problem Pi. For example, A is a 2-absolute-approximation algorithm for vertex cover if A can always find a vertex cover at most 2 nodes larger than the optimal size.
    * There exists c such that for all I, A(I) >= OPT(I) - c, for a maximization problem Pi. For example, A is a 2-absolute-approximation algorithm for independent set if A can always find an independent set at most 2 nodes smaller than the optimal size.
    * Very few NP-hard problems have polynomial-time absolute approximation algorithms.
  - Relative approximation (c-approximation algorithm)
    * There exists c such that for all I, A(I) <= c * OPT(I), for a minimization problem Pi. For example, A is a 2-approximation algorithm for vertex cover if A can always find a vertex cover of at most twice the optimal size.
    * There exists c such that for all I, A(I) >= (1/c) * OPT(I), for a maximization problem Pi. For example, A is a 2-approximation algorithm for independent set if A can always find an independent set of at least half the optimal size.
    * Most of our focus is on finding good relative approximation algorithms; unless explicitly stated otherwise, when I talk about approximation algorithms I mean relative approximation algorithms.
Example: multiprocessor scheduling to minimize makespan
- Input
  * Set of n jobs with processing times x_1, ..., x_n
  * Number of machines m
- Task
  * Schedule the jobs on the m machines with the goal of minimizing the makespan (maximum completion time of any job) of the schedule.
- Initial thoughts about this problem
  * Suppose m = 1: is this problem hard?
  * We know this problem is hard for m >= 2 because of a reduction from the Partition problem.
- Graham's list scheduling algorithm (greedy)
  - Take the jobs in an arbitrary order.
  - Place each job on the currently least loaded machine.
  - This is a greedy algorithm.
- Proof of the (2 - 1/m)-approximation factor
  - Greedy's cost
    * Let job j be the last job to complete under greedy, and let h be the load of the machine j is placed onto before job j is placed.
    * Greedy's cost is h + x_j.
  - Bounds on OPT(I)
    * OPT(I) >= max_i x_i >= x_j
    * OPT(I) >= (1/m) * sum_i x_i
  - Relating the two costs
    * h <= (1/m) * sum_{i != j} x_i. This follows because greedy places j onto the least loaded machine available; in the worst case, the least loaded machine has height equal to the average load of the remaining n - 1 jobs (if job j is the last job and the other jobs can be evenly partitioned).
    * This gives us:
      GREEDY(I) = h + x_j
                <= (1/m) * sum_{i != j} x_i + x_j
                 = (1/m) * sum_i x_i + (1 - 1/m) * x_j
                <= OPT(I) + (1 - 1/m) * OPT(I)
                 = (2 - 1/m) * OPT(I)
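Graham's list scheduling can be sketched in a few lines. This is a sketch, not from the notes: the function name and list-of-times input format are assumptions.

```python
def list_schedule(jobs, m):
    """jobs: processing times in arbitrary order; m machines.
    Place each job on the currently least loaded machine; return the makespan."""
    loads = [0] * m
    for x in jobs:
        i = loads.index(min(loads))   # currently least loaded machine
        loads[i] += x
    return max(loads)
```

On jobs [3, 3, 2, 2, 2] with m = 2, list scheduling gives makespan 7 while the optimum is 6 ({3, 3} vs {2, 2, 2}), illustrating that the bound is not tight to 1.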
- Longest job first (LJF)
  - Same as Graham's list scheduling, but first sort the jobs into non-increasing order by size.
- Proof of the (4/3 - 1/(3m))-approximation ratio
  - Let job j be the last job to complete under LJF.
  - Without loss of generality, remove all jobs smaller than job j:
    * LJF's cost cannot decrease, because all the removed jobs are scheduled after job j and thus do not affect the completion time of job j under LJF.
    * OPT's cost cannot increase by removing jobs.
  - LJF's cost: let h be the load of the machine j is placed onto before job j is placed; LJF's cost is h + x_j.
  - Bound on OPT(I): OPT(I) >= (1/m) * sum_i x_i
  - Two cases, based on the size of x_j.
  - Case 1: x_j <= OPT(I)/3
    * As before, h <= (1/m) * sum_{i != j} x_i, because LJF places j onto the least loaded machine available; in the worst case, the least loaded machine has height equal to the average load of the remaining jobs (if job j is the last job and the other jobs can be evenly partitioned).
    * LJF(I) = h + x_j <= (1/m) * sum_i x_i + (1 - 1/m) * x_j <= OPT(I) + (1 - 1/m) * OPT(I)/3 = (4/3 - 1/(3m)) * OPT(I)
  - Case 2: x_j > OPT(I)/3
    * Since the smallest job in the (reduced) instance has size strictly larger than OPT(I)/3, in the optimal schedule no machine has more than 2 jobs scheduled on it.
    * In this case, the optimal way to schedule the jobs is to do exactly what LJF will do: pair up the largest remaining jobs with the smallest remaining jobs.
    * Thus, in this case, LJF(I) = OPT(I).
  - Since in both cases we have shown LJF(I) <= (4/3 - 1/(3m)) * OPT(I), the result follows.

Example: Vertex Cover
- Input: graph G = (V, E)
- Task: find a vertex cover of minimum possible size.
- NP-hardness result: we know this problem is NP-hard because of a reduction from the Independent Set problem.
- Greedy is NOT good for vertex cover
  - Choose a node with the highest updated degree, breaking ties arbitrarily; update the degrees of the remaining nodes; repeat.
  - This can produce a solution log n times as large as optimal.
- Maximal matching algorithm
  - Find a maximal matching in the graph.
    * A matching is a set of edges such that no two edges in the matching share a node.
    * A maximal matching is a matching that cannot be increased by the addition of an edge without violating the condition that no two edges share a node.
    * A maximum matching is a largest possible maximal matching. Maximum matchings can be found in polynomial time, but the algorithm is fairly sophisticated.
    * Maximal matchings can be found efficiently by processing the edges one at a time, retaining an edge if and only if it does not share a node with any of the already chosen edges.
  - Choose both nodes of every edge in the matching to be in the vertex cover.
    * This set of nodes must be a vertex cover. Suppose it is not: then there is some edge (u, v) that does not include a node in the candidate vertex cover. In this case, edge (u, v) can be added to the maximal matching, contradicting the claim that we had a maximal matching.
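The maximal-matching vertex cover algorithm can be sketched as follows: scan the edges once, keeping an edge exactly when it shares no node with the edges kept so far, and take both endpoints of every kept edge. This is a sketch, not from the notes: the edge-list format and function name are assumptions.

```python
def matching_vertex_cover(edges):
    """Greedy maximal matching; return both endpoints of every matched edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge joins the matching
            cover.add(u)
            cover.add(v)
    return cover
```

On the path (1,2), (2,3), (3,4) the cover is {1, 2, 3, 4}, exactly twice the optimal cover {2, 3}, showing the factor of 2 is tight.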
Proof of the 2-approximation ratio
- Let the matching size be M. Our algorithm's cost is 2M.
- OPT(I) >= M
  * This follows because each edge in the matching must be covered by at least one node.
  * Since no two edges in the matching share a node, at least one of the two nodes of each matching edge must be chosen, and these chosen nodes are all distinct.

Example: Bin Packing
- Input
  * Set of n objects of sizes s_1, ..., s_n
  * Infinite number of bins of size B
- Task
  * Find a packing of the n objects into the minimum possible number of bins.
- Bin packing is NP-complete: we can reuse the reduction from Partition to makespan scheduling.
- Algorithms
  - First Fit
    * Arbitrarily order the items; number the bins by the order they are opened (initially no bins are open).
    * When working with item i, place it into the first open bin it fits into. If it does not fit into any open bin, open a new bin and place it into that bin.
  - Best Fit
    * Arbitrarily order the items.
    * When working with item i, place it into the open bin it fits most tightly into. If it does not fit into any open bin, open a new bin and place it into that bin.
  - First Fit Decreasing: sort the items into non-increasing order by size, then run First Fit.
  - Best Fit Decreasing: sort the items into non-increasing order by size, then run Best Fit.
- Proof that all of these algorithms are at least 2-approximations
  * There is at most 1 open bin that is more than half empty; if not, any of these algorithms would have combined the items in the two at-least-half-empty bins into one bin.
  * Thus an upper bound on the number of bins used by any of these algorithms, ignoring the one possibly more-than-half-empty bin, is 2 * (sum_i s_i) / B.
  * Clearly OPT(I) >= (sum_i s_i) / B.
  * The result of 2 follows.
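The First Fit rule above can be sketched directly. This is a sketch, not from the notes: the function name and the list-of-lists bin representation are assumptions.

```python
def first_fit(sizes, B):
    """Place each item into the first open bin it fits into, opening a new
    bin when none fits. Bins are numbered by opening order; returns bins."""
    bins = []
    for s in sizes:
        for b in bins:
            if sum(b) + s <= B:
                b.append(s)
                break
        else:
            bins.append([s])      # item fits nowhere: open a new bin
    return bins
```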
- FF and BF have approximation ratios of 17/10, ignoring a constant additive factor; FFD and BFD have approximation ratios of 11/9, ignoring a constant additive factor. The proofs of these results are very long and involved.

Example: Traveling Salesperson
- Input
  * List of n cities
  * Distances d(i, j) between each pair of cities
- Task
  * Find a tour of the cities of minimum possible total length.
- No polynomial-time c-approximation algorithm exists for this problem unless P = NP.
  * If there is a polynomial-time c-approximation algorithm for this problem, then Hamiltonian Cycle can be solved in polynomial time.
  * Take an arbitrary input instance (graph G = (V, E)) of Hamiltonian Cycle and turn it into a TSP instance as follows: for each node in V, create a city; for each edge (i, j) in E, set d(i, j) = 1; for each pair of vertices (i, j) not in E, set d(i, j) = c * n.
  * If G has a Hamiltonian cycle, then the optimal tour in the TSP instance has length n. If G does not have a Hamiltonian cycle, then the optimal tour in the TSP instance has length at least n - 1 + c * n.
  * Apply our c-approximation algorithm to the TSP instance. If it returns an answer of at most c * n, then we know our original graph had a Hamiltonian cycle; if it returns an answer greater than c * n, then we know our original graph did not have a Hamiltonian cycle. Thus we could solve the Hamiltonian Cycle problem in polynomial time using our c-approximation algorithm for TSP.

Example: Metric Traveling Salesperson
- Input
  * List of n cities
  * Distances d(i, j) between each pair of cities, satisfying the triangle inequality: d(i, k) <= d(i, j) + d(j, k). (Note that the distances in the proof above, showing that unrestricted TSP is hard to approximate, do not satisfy this constraint.)
- Task
  * Find a tour of the cities of minimum possible total length.
- Greedy algorithm: the best guarantee is O(log n) * OPT(I).
- MST algorithm
  - Find a minimum spanning tree T.
  - Double all the edges in T to create an Eulerian graph.
  - Take an Euler tour of the graph, using shortcuts to avoid visiting cities more than once. Shortcuts cannot increase the cost, because of the triangle inequality.
visiting cities more than once gtk Shortcuts cannot increase cost because of triangle inequality 0 Proof of 2 approximation ratio 7 Let CT be the total weight of the minimum spanning tree T 7 OPTI 2 CT gtk Remove an edge from the optimal tour gtk We now have a path which is one possible spanning tree gtk The minimum spanning tree must have cost no more than this path 7 MSTI S 20T gtk The Eulerian tour has cost exactly 20T gtk When taking shortcuts7 this cost may decrease but cannot increase because of triangle inequality 0 Christo des7 matching improvement 7 Find a minimum spanning tree T 7 Find all the nodes in T with odd degree 7 Find a minimum weight matching M of these nodes 7 Create an Eulerian graph by adding the edges in M to those in T 7 Take an Euler tour of the graph using shortcuts to avoid visiting cities more than once 0 Proof of 32 approximation ratio 7 Let CT be the total weight of the minimum spanning tree T and let CM be the total weight of M 7 OPTI 2 CT gtk Remove an edge from the optimal tour gtk We now have a path which is one possible spanning tree gtk The minimum spanning tree must have cost no more than this path 7120PTI2 CM Lecture 6 Divideand Conquer Algorithm s Quicksort Algorithm and MultiplicationExponentiation Problems Quicksort Algorithm Choose a partition element Partition list of elements into smaller and larger lists Recurse on smaller and larger lists Running time analysis Tn Tsmaller Tlarger n Bestcase Tsmaller and Tlarger are both TnZ Tn 2 TnZ n Tl 1 Apply master theorem to see result as n log n Worstcase Tsmaller or Tlarger Tnl other is T0 Tn Tnl n Applying inhomogeneous equation approach we get rootl with mult 3 This leads to an nz complexity This can also be observed by unrolling the recurrence Averagecase analysis Assumptions All items are distinct Number of comparisons needed to partition array is nl All permutations are equally likely tn nl lnE1ton til tni for n gt 1 tO 0 2H tni tn1 tnz to 2H 0 n tiI tn n1 2n 2H 0 mo 
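The algorithm just analyzed can be written out concretely. A minimal Python sketch, using the first element as the partition element (an assumption of this sketch — the notes leave the pivot choice open):

```python
def quicksort(items):
    """Quicksort as described: pick a partition element, split the rest
    into smaller and larger lists, and recurse on each side."""
    if len(items) <= 1:
        return items
    pivot = items[0]  # assumed pivot choice; the notes leave this open
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

Picking the pivot at random instead makes the Θ(n²) worst case unlikely on any fixed input, which is one answer to the pivot-selection question raised next.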
Constructive induction (or clever manipulation) leads to the Θ(n log n) result. Can we modify how quicksort chooses its partition element to guarantee Θ(n log n) behavior in the worst case?

Matrix Multiplication
How many operations are needed to multiply two 2-by-2 matrices?
- Traditional approach: 8 two-number multiplications and 4 additions.
- Strassen's approach (1969): 7 two-number multiplications and 18 additions/subtractions. See problem 7.22.
How does this change things for multiplying n-by-n matrices? Let's assume that n is a power of 2.
- Traditional approach: n³ two-number multiplications, n³ − n² two-number additions. Complexity is Θ(n³).
- Divide-and-conquer approach: T(n) = 7T(n/2) + 18 matrix additions of matrices of dimension n/2, i.e., T(n) = 7T(n/2) + 18(n²/4), since each matrix addition/subtraction involves n²/4 two-number operations. Applying the master method, we get that the complexity is Θ(n^{lg 7}).
- Can we do better than 7 multiplications for a 2-by-2 matrix? No: Hopcroft and Kerr have shown that 7 multiplications are necessary.
- Comparison at n = 70: direct multiplication, 70³ = 343,000; Strassen's method, 70^{lg 7} ≈ 150,000.
- The current best known matrix multiplication is O(n^{2.376}), by Coppersmith and Winograd; hidden constants make this impractical.
- Conjecture: two n-by-n matrices can be multiplied in time O(n^{2+ε}), 0 < ε < 1.
- Matrix multiplication can sometimes be used in graph algorithms as a fundamental step (the edges in a graph can be represented as a matrix), so theoretical improvements in the efficiency of this operation can lead to theoretical improvements in those algorithms.
Homework: For a k-by-k multiplication, how many two-number multiplications can be done while still improving on Strassen's algorithm? What effect does the number of additions have on the final complexity of the algorithm?

Exponentiation
Computing x = a^n. Let a have m bits; the cost to compute is T(m, n).
Definition: let M(q, s) be the time required to multiply a q-bit number by an s-bit number, q ≤ s.
- Using the classical technique, M(q, s) = Θ(qs).
- Using the divide-and-conquer technique, M(q, s) = Θ(s · q^{lg 3 − 1}). Review section 7.1.
Obvious approach: compute x = a^n with n − 1 multiplications of a. Let m denote the number of bits in a:
  a · a: M(m, m)
  a · a²: M(m, 2m)
  a · a³: M(m, 3m)
  ...
  a · a^{n−1}: M(m, (n−1)m)
Total cost (taking the worst case): T(m, n) = Σ_{i=1}^{n−1} M(m, im).
- Plugging in M(m, im) = Θ(im²) gives T(m, n) = Σ_{i=1}^{n−1} im² = Θ(m²n²).
- Plugging in M(m, im) = Θ(im · m^{lg 3 − 1}) = Θ(i · m^{lg 3}) gives T(m, n) = Σ_{i=1}^{n−1} i · m^{lg 3} = Θ(m^{lg 3} n²).
Divide-and-conquer approach:
  a^n = (a^{n/2}) · (a^{n/2})  if n is even
  a^n = a · a^{n−1}            if n is odd
  a¹ = a
How many multiplications/squarings will occur in this computation? Let N(n) denote this quantity:
  N(n) = 0 if n = 1
  N(n) = N(n/2) + 1 if n is even
  N(n) = N(n−1) + 1 if n is odd
This function is NOT nondecreasing. Re-analyze the case when n is odd: N(n) = N(n−1) + 1 = N((n−1)/2) + 2 = N(⌊n/2⌋) + 2. When n is even, we observe N(n) = N(⌊n/2⌋) + 1. Thus N(⌊n/2⌋) + 1 ≤ N(n) ≤ N(⌊n/2⌋) + 2 for n > 1. This easily evaluates to Θ(lg n), which is less than the n − 1 multiplications from before.

Searching Techniques: Backtracking, Branch and Bound
o Main issues
- How to organize the candidate solution space
- How to traverse/unravel the candidate solution space
- Techniques for pruning the search
- Analysis of searching algorithms
- Examples: n-queens problem and subset-sum problem
o General template
- Candidate solution space
  * Candidate solutions can generally be represented as an n-tuple (x1, x2, ..., xn). In some representations and some problems, the candidate solutions all have the same length n; for others, the candidate solutions have different lengths. There should be a finite number of choices for each component of the n-tuple.
  * One can visualize the candidate solution space as a tree or directed acyclic graph (dag).
  * The size of the candidate solution space depends on the cleverness of your representation of solutions. You need to make sure you include all optimal solutions; at the same time, you want to minimize the number of extraneous candidate solutions.
- Criterion function P(x1, x2, ..., xn)
  * Candidate solutions must minimize, maximize, or satisfy some criterion function P.
- Traversal algorithm
  * Thinking of the space as a tree or dag, we can use traditional
graph traversal techniques such as DFS or BFS. Backtracking is generally a DFS through this solution space. A breadth-first search is more appropriate if there is no upper bound on the size of a vector (the solution space is potentially infinite).
  * We can also use pruning techniques to limit the number of candidate solutions explored. Use of such a pruning technique is typically referred to as branch and bound. We must evaluate the trade-off between time spent evaluating the pruning condition and time saved by limiting the search space.
o Example: n-queens problem
- Input:
  * Positive integer n
- Task:
  * Place n queens on an n-by-n chessboard so that no 2 attack each other (none are on the same row, column, or diagonal), or report that this is impossible.
Solve the n-queens problem for n = 1, 2, 3.
o n-queens problem with n = 4
o Solution 1
- Solution space
  * Let x_i represent the position of the i-th queen, for 1 ≤ i ≤ 4.
  * There are 16 different squares; number them from 1 to 16.
  * Size of the solution space is 16⁴ = 2^16.
- Criterion function P
  * P(x1, x2, x3, x4) = 1 if and only if all 4 positions are in unique rows, columns, and diagonals.
- Brute-force traversal
  * Pseudocode:
    for (i=1; i<=16; i++)
      for (j=1; j<=16; j++)
        for (k=1; k<=16; k++)
          for (l=1; l<=16; l++)
            if (P(i,j,k,l) == 1) print(i,j,k,l)
How can we improve upon this solution strategy?
  * We can reduce the candidate search space in several ways:
    - Assume that x1 < x2 < x3 < x4; this removes some redundancies and prunes the search space to C(16,4) = 1820.
    - We know each queen must be in a different row, so assume that queen 1 is in row 1, queen 2 is in row 2, queen 3 is in row 3, and queen 4 is in row 4, giving 4⁴ = 256 nodes to search.
  * We can prune impossible searches earlier by detecting column and/or diagonal conflicts earlier. If queens 1 and 2 are both in column 1, they already attack each other, and thus we do not need to explore any further in this subtree.
o Solution 2 (using the ordering constraint on x)
- Solution space
  * Let x_i represent the position of the i-th queen, for 1 ≤ i ≤ 4.
  * Add the constraint x_i < x_{i+1}.
  * Size of the solution space is C(16,4) = 1820.
- Criterion function P
  * P(x1, x2, x3, x4) = 1 if and only if all 4 positions are in unique rows, columns, and diagonals.
- Brute-force traversal
  * Pseudocode:
    for (i=1; i<=16; i++)
      for (j=i+1; j<=16; j++)
        for (k=j+1; k<=16; k++)
          for (l=k+1; l<=16; l++)
            if (P(i,j,k,l) == 1) print(i,j,k,l)
o Solution 3 (using the one-queen-per-row constraint)
- Solution space
  * Let x_i represent the column of the i-th queen, for 1 ≤ i ≤ 4; assume that queen i is in row i.
  * Size of the solution space is 4⁴ = 256.
- Criterion function P
  * P(x1, x2, x3, x4) = 1 if and only if all 4 positions are in unique rows, columns, and diagonals.
- Brute-force traversal
  * Pseudocode:
    for (i=1; i<=4; i++)
      for (j=1; j<=4; j++)
        for (k=1; k<=4; k++)
          for (l=1; l<=4; l++)
            if (P(i,j,k,l) == 1) print(i,j,k,l)
o Solution 4 (one queen per row, with some pruning)
- Solution space
  * Let x_i represent the column of the i-th queen, for 1 ≤ i ≤ 4; assume that queen i is in row i.
  * Size of the solution space is 4⁴ = 256.
- Criterion function P
  * P(x1, x2, x3, x4) = 1 if and only if all 4 positions are in unique rows, columns, and diagonals.
- Smarter traversal: detect conflicts before we have complete candidate solutions.
  * Types of conflicts:
    - Column conflicts: two queens conflict if their x values are identical.
    - Diag45 conflicts: two queens i and j are on the same 45-degree diagonal if x_i − i = x_j − j.
    - Diag135 conflicts: two queens i and j are on the same 135-degree diagonal if x_i + i = x_j + j.
  * Assuming the first i queens are compatible, we check that the (i+1)-st queen is compatible when adding it.
- Pseudocode:
    queens(int k, set col, set diag45, set diag135)
      // global array x[1..k] contains choices so far
      // col     = { x[i]     : 1 <= i <= k }
      // diag45  = { x[i] - i : 1 <= i <= k }
      // diag135 = { x[i] + i : 1 <= i <= k }
      if (k == 4) then print(x)   // we have a complete solution
      else                        // explore all (k+1)-extensions of the solution
        for (j=1; j<=4; j++)
          if (j not in col and j-(k+1) not in diag45 and j+(k+1) not in diag135)
            // j works with previous choices
            x[k+1] = j
            queens(k+1, col ∪ {j}, diag45 ∪ {j-(k+1)}, diag135 ∪ {j+(k+1)})
o n-queens problem with n = 8
- If we use Solution 1, we need to explore a 64⁸-size solution space.
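The pruned search of Solution 4 can be rendered as a short program. A minimal Python sketch, generalized to any n (the set-passing mirrors the pseudocode; function and variable names are this sketch's own):

```python
def queens(n):
    """Backtracking search for n-queens (Solution 4): x[i] is the column of
    the queen in row i; prune on column and both diagonal conflicts."""
    solutions = []
    x = []

    def extend(col, diag45, diag135):
        k = len(x)
        if k == n:
            solutions.append(list(x))
            return
        for j in range(n):
            # j works with the previous choices if it creates no conflict
            if j not in col and (j - k) not in diag45 and (j + k) not in diag135:
                x.append(j)
                extend(col | {j}, diag45 | {j - k}, diag135 | {j + k})
                x.pop()

    extend(set(), set(), set())
    return solutions

print(len(queens(4)))  # 2 solutions for n = 4
```

Running it for n = 8 finds the 92 solutions of the classic puzzle; the pruning keeps the explored tree far smaller than the 8⁸-node space of Solution 3.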
- With Solution 2, we cut this down to C(64,8) = 4,426,165,368.
- With Solution 3, we cut this down to 8⁸ = 16,777,216.
- With pruning, we cut the tree down to 2057 nodes.
- If we need only one solution, we find it after exploring 114 nodes.
o n-queens problem with n = 12
- With Solution 3, we have a solution space of 12¹² nodes.
- With pruning, we cut the tree down to 856,189 nodes.
- The first solution is found after exploring 262 nodes.
o Example: Subset-sum problem
- Input:
  * Positive integer vector x of size n: x1, x2, ..., xn
  * Integer M
- Task:
  * Find a subvector (not necessarily contiguous) such that the sum of the subvector is M.
- Example: x = (11, 13, 24, 7)
  * M = 31
  * Solutions: (11, 13, 7) and (24, 7)
  * Alternate representations: (1, 2, 4) and (3, 4)
  * Fixed-length alternate representations: (1, 1, 0, 1) and (0, 0, 1, 1)
o Solution 1
- Solution space
  * There are 2ⁿ possible solutions.
  * A solution has the form (d1, ..., dk), where d_i < d_{i+1} and k ≤ n.
  * Solutions have variable length and correspond to an unbalanced tree (see accompanying slide in class).
  * Note: all nodes in the tree are solutions.
- Criterion function
  * Check whether the current solution subvector sums to M.
- Traversal algorithm
  * Pseudocode:
    subsetsum(int k, int currentsum)
      // global array d[1..k] contains indices of choices so far
      // currentsum = sum of choices so far
      if (currentsum == M) print(d)
      else  // explore all extensions of the solution
        for (j = d[k]+1; j <= n; j++)
          d[k+1] = j
          subsetsum(k+1, currentsum + x[j])

Cryptography
o Scenario
- Alice wants to send Bob a message x ∈ {0,1}* without letting anyone else know what x is.
- Alice and Bob can compute with complete secrecy; Alice and Bob cannot communicate without eavesdroppers.
o Cryptographic protocol
- Alice privately computes y = E(e, x) and sends y to Bob.
- Bob privately computes x = D(d, y) to recover the message x.
- Assumptions about E, e, D, d:
  * D and E are inverse functions with proper keys e and d.
  * D and E should be easily computable, i.e., in FP.
  * The functions D and E are publicly known.
  * Key d is not publicly known.
  * Key e's secrecy varies depending on the cryptographic protocol.
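These assumptions can be made concrete with the simplest possible E/D pair: XOR against a shared key. This is a minimal Python sketch of exactly the one-time pad protocol discussed next (the bit patterns match that example):

```python
def E(e, x):
    """Encrypt message x (bits packed into an int) with key e by XOR."""
    return x ^ e

def D(d, y):
    """Decrypt: with d = e, XOR is its own inverse, so D undoes E."""
    return y ^ d

e = d = 0b0101           # shared secret key
x = 0b1001               # Alice's message
y = E(e, x)              # Alice sends y = 0b1100 to Bob
assert D(d, y) == x      # Bob recovers x exactly
```

E and D are trivially in FP here; everything interesting about the protocol lies in how the key is chosen and shared.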
o Evaluation criteria for cryptographic protocols
- Implementation issues
- Security guarantee

Private-Key Cryptography
o Encryption key e is known only to Alice (and maybe Bob).
o Cryptanalysis (code-breaking) problem
- Input: string y
- Knowledge of E and D
- No knowledge of e or d
- Question: what is x such that E(e, x) = y?
o Example protocol: one-time pad
- Implementation
  * e = d and |d| = |x|
  * E(e, x) = x ⊕ e and D(d, y) = y ⊕ d
  * Example: with x = 1001 and d = 0101, y = 1100
- Security guarantee
  * The code breaker can only learn the length of the message without extra information.
  * Every string s of length |y| is equally likely to be x, because there exists a string k such that s ⊕ k = y.
  * Without any more information, this length is the only information gained by the eavesdropper.
  * We can work on ways to disguise even the length.
- Implementation issues
  * E and D are extremely simple functions.
  * Initialization is required to first establish the secret key d, and this key itself must be communicated in secret by some means.
  * Key d must be the length of the message, and no reuse of key bits is allowed (or else the code breaker can gain information), which means frequent retransmission of key bits is needed.
  * A separate key is needed for each pair of communicators (multiple clients).
  * What about someone who doesn't know Bob (a second, new client)?

Public-Key Cryptography
o Diffie and Hellman were the pioneers.
o This gets around many of the implementation problems we just discussed, but at some loss of security.
o Bob publicizes a universal encryption key e.
o Decryption (code-breaking) problem
- Input: string y
- Knowledge of D and E
- Knowledge of e
- No knowledge of d
- Question: what is x such that E(e, x) = y?
o Implementation bottlenecks disappear, though E and D may be more complex than exclusive-or.
o Security
- The protocol is breakable: an eavesdropper can compute x with absolute certainty with no extra information.
- Try all strings x, see if E(e, x) = y, and stop when a match is found.
- In fact, this demonstrates the decryption problem is in FNP, assuming E and D are in FP.
- Thus we cannot have a public-key protocol which is not breakable. We can only hope a public-key protocol requires significant computational time and expense to break. Furthermore, this type of security can only exist if P ≠ NP.
o One-way functions (f = the encryption function)
- f is a one-to-one function, so f⁻¹ is defined.
- f ∈ FP.
- |x|^{1/k} ≤ |f(x)| ≤ |x|^k for some k (f neither shrinks nor stretches its input more than polynomially).
- f⁻¹ ∉ FP; but, as we saw, it must be in FNP.
o Possible one-way functions
- Integer multiplication (of prime numbers, for one-to-one)
  * The inverse function is factorization.
- Exponentiation modulo a prime
  * prime p
  * primitive root r: r ≠ 1 and r^{(p−1)/q} ≠ 1 for all prime divisors q of p − 1
  * f(x) = r^x mod p
  * The inverse function is the discrete logarithm problem.
o RSA and its possible one-way function
- Encryption key (pq, e)
  * p and q are prime numbers
  * |p|, |q| should be in the hundreds
  * Let φ(pq) = pq − p − q + 1 (the Euler function on pq)
  * e should be relatively prime to φ(pq)
- Decryption key (pq, d)
  * d is also relatively prime to φ(pq)
  * ed = 1 + k·φ(pq) for some integer k
- Encryption function: E(e, pq, x) = x^e mod pq
- Decryption function: D(d, pq, y) = y^d mod pq = x^{ed} mod pq = x^{1 + k·φ(pq)} mod pq = x mod pq, because x^{φ(pq)} ≡ 1 mod pq
- Hardness of inverting RSA
  * If you can factor in polynomial time, you can invert RSA in polynomial time: factor pq into p and q, compute φ(pq) = pq − p − q + 1, compute d from φ(pq) and e via Euclid's algorithm, and finally compute x = y^d mod pq.
  * It can be shown that if you can invert variants of RSA in polynomial time, you can factor in polynomial time.
  * Thus these variants are as hard to break as factoring.

Cryptography and Complexity Theory
o Can we link the existence of one-way functions to P ≠ NP?
- No, but I'm not going to get into this.
o Would this be meaningful? No.
- We have based complexity theory and NP-completeness on worst-case performance of algorithms.
- In cryptography, this is not acceptable: we want all cases, or almost all cases, to require exponential time to decrypt.
o Strong one-way functions
- The first 3 conditions are the same as for one-way functions.
- There is no integer k and no algorithm which, for large enough n, in time O(n^k) successfully computes f⁻¹(y) for at least a 1/n^k fraction of the strings y of length n.
- That is, no polynomial-time algorithm successfully inverts f on a polynomial fraction of the inputs of length n.
o Trapdoor functions
- strong one-way functions, where
- we can generate keys relatively quickly, and
- we can decode efficiently given an extra decryption key.
o RSA seems to be a trapdoor function.

Randomization
o There are still some security problems.
- Maybe some important messages are easy to decrypt for certain decrypting algorithms (quicksort worst-case analogy):
  * ATTACK AT DAWN
  * SELL IBM
- An eavesdropper could notice message repetitions.
- 1-bit messages b.
o Solution: Randomized RSA
- Implementation
  * Choose a random integer k ≤ pq/2.
  * Send y = (2k + b)^e mod pq.
  * Upon decryption, Bob looks only at the last bit to recover the message.
- Question: how secure is this one bit b? Alternatively: can the decrypter get the last bit without getting k as well?
- It can be proven that any method which guesses b from (2k + b)^e mod pq, pq, and e with probability significantly greater than 1/2 can be used to break RSA.
o We can apply this method to deal with other problems too.
- Implementation
  * Break x into bits b1, b2, ..., bn.
  * Choose k1, k2, ..., kn randomly.
  * Send encrypted (2k1 + b1), (2k2 + b2), ..., (2kn + bn).
- This avoids the repetition problem: no one message is always easy; every message is unlikely to be easy, no matter what decrypting algorithm is used.

Zero-Knowledge Proofs
o Setting
- Alice and Bob are taking a take-home final.
- One of the questions asks them to 3-color a large graph.
- Alice finds a 3-coloring quickly and gloats to Bob that she is done.
- Bob doesn't believe her, since he hasn't been able to find this 3-coloring.
- Alice wants to convince Bob she really has found the 3-coloring, but she doesn't want to help Bob find it.
- Such a proof is called a zero-knowledge proof protocol.
o Example
- Alice has a 3-coloring C, which is a mapping from V to {00, 01, 10}.
  * We use two bits b_i and b_i′ to encode a node's color.
- Protocol Round 1, Alice's actions:
  * Alice chooses a random permutation of the 3 colors {00, 01, 10}.
  * Alice chooses |V| RSA key combinations (p_i, q_i, d_i, e_i).
  * For each node i, Alice computes y_i = (2x_i + b_i)^{e_i} mod p_i·q_i and y_i′ = (2x_i′ + b_i′)^{e_i} mod p_i·q_i, where x_i and x_i′ are random numbers at most p_i·q_i / 2.
  * Alice sends (e_i, p_i·q_i, y_i, y_i′) to Bob, for all i ∈ V.

Lecture 10: Dynamic Programming — Shortest Paths, Chained Matrix Multiplication

All-Pairs Shortest Path Problem
Input: edge-weighted graph G = (V, E), with a weight w(e) on each edge.
Task: compute the length of the shortest path from every node i to every node j in G; perhaps also some representation of the set of optimal paths.
1. For every pair of nodes i and j, we want to compute the shortest path from i to j using ANY nodes as intermediate nodes. We can create subproblems by restricting the set of intermediate nodes we work with. Arbitrarily order the nodes from 1 to n. Define D(i,j,k) to be the problem of finding the shortest path from node i to node j with only nodes in {1..k} as intermediate nodes. Our original problem corresponds to computing D(i,j,n) for all pairs of nodes i and j.
2. Recurrence relation for D(i,j,k). A shortest path from i to j using nodes 1..k either uses node k or it does not:
   a. If it uses node k, then we need a shortest path from i to k using nodes 1 through k−1, and a shortest path from k to j using nodes 1 through k−1.
   b. If it does not use node k, then use the shortest path from i to j using nodes 1 through k−1.
   D(i,j,k) = min( D(i,j,k−1), D(i,k,k−1) + D(k,j,k−1) )
   D(i,j,0) = length of the edge from i to j (no intermediate nodes allowed)
3. Table visualization: 3-dimensional, because we have three parameters. We can eliminate the k-dimension by reusing space. We can even use just one 2-dimensional array, because the intermediate results in the k-th row and column will not change during the k-th update.
4. Order of computation: the base conditions give an initial value in each entry of the 2-D array; increment k by 1; for each k, we can now compute in any arbitrary manner over the i and j dimensions.
5. Computing actual paths: use an auxiliary array P[i][j], initially set to 0. At the end, P[i][j] is set to the last intermediate node k used to shorten the path length.
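The recurrence, the single reused 2-D table, and the P array can be sketched together. A minimal Python version (using INF for a missing edge is this sketch's convention):

```python
INF = float('inf')

def all_pairs(w):
    """All-pairs shortest paths from the recurrence above: w is an n x n
    matrix of edge lengths (INF where there is no edge, 0 on the diagonal).
    A single 2-D table D is reused across all values of k."""
    n = len(w)
    D = [row[:] for row in w]        # D[i][j] starts at the base condition
    P = [[0] * n for _ in range(n)]  # last intermediate node used (0 = none)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
                    P[i][j] = k + 1  # record k, 1-indexed as in the notes
    return D, P

D, P = all_pairs([[0, 3, INF], [INF, 0, 4], [1, INF, 0]])
print(D[0][2])  # 7: the 0 -> 1 -> 2 path beats the missing direct edge
```

Reusing one table is safe because, during the k-th pass, entries in row k and column k cannot change, exactly as step 3 observes.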
The actual shortest path can then be computed by chasing this intermediate node k and recursing to find the (i,k) and (k,j) paths used.
Running-time analysis: the time required for each table entry in the 3-D array is constant; thus the time is Θ(n³). Space requirement: if we maintain the 3-D table, Θ(n³); if we maintain 2-D tables, Θ(n²).
Comparison to using Dijkstra's algorithm as a subroutine: Dijkstra's algorithm has a time complexity of min(n², m log n) per source, where m is the number of edges. When is this algorithm better, and when is Dijkstra's algorithm better?

Chained Matrix Multiplication Problem
Input: a list of n matrices to be multiplied together using the traditional matrix multiplication algorithm.
Task: compute the optimal ordering of the multiplications to minimize the total number of scalar multiplications.
Example input: M = M1 · M2 · M3 · M4, where M1 is 13×5, M2 is 5×89, M3 is 89×3, and M4 is 3×34. Multiplying an i×j matrix by a j×k matrix requires i·j·k scalar multiplications. Some possible orderings:
  ((M1·M2)·M3)·M4, requiring 10,582 scalar multiplications
  (M1·M2)·(M3·M4), requiring 54,201 scalar multiplications
  (M1·(M2·M3))·M4, requiring 2,856 scalar multiplications
  M1·((M2·M3)·M4), requiring 4,055 scalar multiplications
  M1·(M2·(M3·M4)), requiring 26,418 scalar multiplications
The difference between the fastest and slowest orderings above is roughly a factor of 19.
Define the dimension vector d0, d1, ..., dn, where matrix M_i has dimensions d_{i−1} × d_i.
1. We want to compute the cheapest order for multiplying matrices 1 through n. We can create subproblems by restricting the list of matrices to be multiplied. Define C(i,j) to be the problem of finding the cheapest order for multiplying matrices i through j. Our original problem corresponds to computing C(1,n).
2. Recurrence relation for C(i,j). Suppose the last pair of matrices to be multiplied in computing M_{i..j} are M_{i..k} and M_{k+1..j}. Then
   C(i,j) = C(i,k) + C(k+1,j) + (cost of this last multiplication) = C(i,k) + C(k+1,j) + d_{i−1}·d_k·d_j
   C(i,j) = min_{i ≤ k < j} [ C(i,k) + C(k+1,j) + d_{i−1}·d_k·d_j ]
   C(i,i) = 0
3. Table visualization: 2-dimensional, because we have two parameters; upper triangular.
4. Order of computation: the base conditions give us the diagonal; move along the antidiagonals.
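The recurrence and the antidiagonal order of computation can be sketched as follows. A minimal Python version that returns only the optimal cost C(1,n), not the P table (the function name is this sketch's own):

```python
def matrix_chain(d):
    """Chained matrix multiplication DP: d = [d0, d1, ..., dn], where matrix
    i has dimensions d[i-1] x d[i].  Returns the minimum total number of
    scalar multiplications, C(1, n)."""
    n = len(d) - 1
    # C[i][j] = cheapest cost for matrices i..j (1-indexed); C[i][i] = 0
    C = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):        # antidiagonals: subchain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            C[i][j] = min(C[i][k] + C[k + 1][j] + d[i - 1] * d[k] * d[j]
                          for k in range(i, j))
    return C[1][n]

print(matrix_chain([13, 5, 89, 3, 34]))  # 2856
```

On the dimension vector of the example above it recovers the 2,856-multiplication optimum, i.e., the ordering (M1·(M2·M3))·M4.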
5. Computing the actual ordering: use an auxiliary array P[i][j], initially set to 0. At the end, P[i][j] is set to the intermediate split point k used to multiply matrices i through j.

Example Reductions
1. HAMILTONIAN PATH ≤p SATISFIABILITY
o HAMILTONIAN PATH
- Input:
  * Graph G = (V, E)
- Yes/No question:
  * Does there exist a Hamiltonian path (a path that visits each node exactly once) in G?
o SATISFIABILITY
- Input:
  * Set of variables X
  * Set of clauses C, where each clause is a disjunction of literals
- Yes/No question:
  * Is there a truth assignment to the variables that satisfies all the clauses?
o Input to the reduction (input to HAM PATH): graph G = (V, E).
o Output of the reduction (input to SAT): Boolean expression φ in CNF with variables X and clauses C.
o Description of φ in terms of G:
- If |V| = n, then X = { x_{i,v} : 1 ≤ i ≤ n, v ∈ V }.
  * Meaning of x_{i,v}: node v is the i-th node of the Hamiltonian path.
- The set of clauses C will enforce this meaning:
  * For each node v, node v must appear in the path: (x_{1,v} ∨ x_{2,v} ∨ ... ∨ x_{n,v}) — n clauses of length n.
  * For each node v, node v must appear only once in the path: (¬x_{i,v} ∨ ¬x_{j,v}) for 1 ≤ i < j ≤ n — O(n³) clauses of length 2.
  * For 1 ≤ i ≤ n, some node must be the i-th node: (x_{i,v1} ∨ x_{i,v2} ∨ ... ∨ x_{i,vn}) — n clauses of length n.
  * For 1 ≤ i ≤ n, two nodes v and w cannot both be the i-th: (¬x_{i,v} ∨ ¬x_{i,w}) for each pair of distinct nodes v, w — O(n³) clauses of length 2.
  * For each non-edge (v,w) ∉ E, w cannot follow v in the Hamiltonian path: (¬x_{i,v} ∨ ¬x_{i+1,w}) for 1 ≤ i ≤ n−1 — O(n³) clauses of length 2.
o To show R is a polynomial-time reduction, we must prove that (a) G ∈ HAMILTONIAN PATH if and only if R(G) = φ ∈ SATISFIABILITY, and (b) R can be computed in polynomial time.
- I will only require you to show that R(x) is of polynomial size; proving polynomial time is often fuzzy. On a qualifier you may be asked to prove polynomial-time computability.
o Thus we know that HAMILTONIAN PATH is no harder than SATISFIABILITY with respect to being in P.
2. HAMILTONIAN PATH ≤p HAMILTONIAN CYCLE
o HAMILTONIAN PATH
- Input: graph G = (V, E).
- Yes/No question: does there exist a Hamiltonian path (a path that visits each node exactly once) in G?
o HAMILTONIAN CYCLE
- Input: graph G = (V, E).
- Yes/No question: does there exist a Hamiltonian cycle (a cycle that visits each node exactly once) in G?
o Input to the reduction (input to HAM PATH): graph G = (V, E).
o Output of the reduction (input to HAM CYCLE): graph G′ = (V′, E′).
- Describe a G′ that satisfies the desired property: G′ is a yes input instance to HAM CYCLE if and only if G is a yes input instance to HAM PATH.
o This proves that HAM PATH is no harder than HAM CYCLE with respect to being in P.
3. Independent Set ≤p Clique
o Independent Set decision problem
- Input:
  * Graph G = (V, E), integer K
- Yes/No question:
  * Does there exist an independent set in G of size K?
o Clique decision problem
- Input:
  * Graph G = (V, E), integer K
- Yes/No question:
  * Does there exist a clique in G of size K?
o Input to the reduction (input to Independent Set): graph G = (V, E), integer K.
o Output of the reduction (input to Clique): graph G′ = (V′, E′), integer K
  * V′ = V
  * E′ is the complement of E; that is, (u,v) ∈ E′ if and only if (u,v) ∉ E
o Argument of correctness
- Yes input instance maps to yes:
  * If (G, K) is a yes input instance to Independent Set, then there is an independent set of size K in G.
  * When we look at the edge-complement graph G′, the same nodes that were an independent set in G now form a clique in G′.
  * Since we kept K the same, G′ has a clique of size K, and (G′, K) is a yes input instance to Clique.
- No input instance maps to no (a yes for Clique must have come from a yes for Independent Set):
  * If (G′, K) is a yes input instance to Clique, then there is a clique of size K in G′.
  * When we look at the original graph G, the same nodes that were a clique in G′ now form an independent set in G.
  * Since we kept K the same, G has an independent set of size K, and (G, K) is a yes input instance to Independent Set.
o This proves that Independent Set is no harder than Clique with respect to being in P.
o The exact same reduction can be used to prove the reverse result, and we see that Independent Set and Clique are "equivalent" with respect to being in P.
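The complement-graph reduction can be checked mechanically on a small instance. A minimal Python sketch (helper names are this sketch's own):

```python
from itertools import combinations

def complement_edges(n, edges):
    """The reduction R: keep the vertex set {0..n-1}, complement the edges."""
    es = {tuple(sorted(e)) for e in edges}
    return {p for p in combinations(range(n), 2) if p not in es}

def is_independent_set(edges, S):
    es = {tuple(sorted(e)) for e in edges}
    return all(tuple(sorted(p)) not in es for p in combinations(S, 2))

def is_clique(edges, S):
    es = {tuple(sorted(e)) for e in edges}
    return all(tuple(sorted(p)) in es for p in combinations(S, 2))

# Path 0 - 1 - 2: {0, 2} is an independent set in G and, as the
# correctness argument predicts, a clique in the complement G'.
n, edges = 3, [(0, 1), (1, 2)]
assert is_independent_set(edges, [0, 2])
assert is_clique(complement_edges(n, edges), [0, 2])
```

Since K is unchanged by the reduction, checking the two predicates on the same node set is all the correctness argument requires.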
4. Independent Set ≤p Vertex Cover
o Independent Set decision problem
- Input:
  * Graph G = (V, E), integer K
- Yes/No question:
  * Does there exist an independent set in G of size K?
o Vertex Cover decision problem
- Input:
  * Graph G = (V, E), integer K
- Yes/No question:
  * Does there exist a vertex cover in G of size K, where a vertex cover is a set of nodes C such that for every edge (u,v) ∈ E, at least one of u and v is in C?
o Input to the reduction (input to Independent Set): graph G = (V, E), integer K.
o Output of the reduction (input to Vertex Cover): graph G = (V, E), integer n − K, where n is the number of nodes in V.
o Argument of correctness
- Yes input maps to yes input:
  * If (G, K) is a yes input instance to Independent Set, then G has an independent set C of size K.
  * This means no edge in G has both endpoints in C.
  * Consider the vertex set V − C. It has size n − K because it contains all the other vertices in G.
  * It must be a vertex cover: otherwise, some edge would have both endpoints in V − (V − C) = C, and this is not possible since C is an independent set.
  * Thus (G, n − K) is a yes input instance to Vertex Cover.
- No input maps to no input (a yes input comes from a yes input):
  * If (G, n − K) is a yes input instance to Vertex Cover, then G has a vertex cover C of size n − K.
  * This means every edge in E has at least one endpoint in C.
  * Consider the vertex set V − C. It has size n − (n − K) = K because it contains all the other vertices in G.
  * It must be an independent set: otherwise, some edge would have both endpoints in V − C, i.e., no endpoint in C, and this is not possible since C is a vertex cover.
  * Thus (G, K) is a yes input instance to Independent Set.
o This proves that Independent Set is no harder than Vertex Cover with respect to being in P.
o The exact same reduction can be used to prove the reverse result, and we see that Independent Set and Vertex Cover are "equivalent" with respect to being in P.
5. Circuit SAT ≤p SAT
o Circuit SAT
- Input: a Boolean circuit with variable input gates.
- Y/N question: is there an assignment of truth values to the input gates that leads to the circuit evaluating to true?
o SAT
- Input: a Boolean expression consisting of a set of clauses C over a Boolean variable set X.
- Y/N question: is there an assignment of truth values to the variables that leads to all the clauses evaluating to true?
o Input to the transformation (input to Circuit SAT): a Boolean circuit C.
o Output of the transformation (input to SAT): a Boolean expression φ consisting of a set of clauses C′ and a set of variables X, as follows:
- For each variable gate x_j in the circuit, create a variable x_j in the SAT instance.
- For each gate g_i in the circuit (including variable gates), create a variable g_i in the SAT instance.
  * Intuition: the truth assignment of g_i in the SAT instance is the output value of the corresponding gate in the Circuit SAT instance.
- Use the local replacement technique to create clauses as follows:
  * Case: gate g_i is a variable gate x_j: (¬g_i ∨ x_j), (g_i ∨ ¬x_j)
  * Case: gate g_i is a true gate: (g_i)
  * Case: gate g_i is a false gate: (¬g_i)
  * Case: gate g_i is a NOT gate with input gate h_j: (g_i ∨ h_j), (¬g_i ∨ ¬h_j)
  * Case: gate g_i is an OR gate with input gates h_j and h_k: (¬h_j ∨ g_i), (¬h_k ∨ g_i), (h_j ∨ h_k ∨ ¬g_i)
  * Case: gate g_i is an AND gate with input gates h_j and h_k: (¬g_i ∨ h_j), (¬g_i ∨ h_k), (¬h_j ∨ ¬h_k ∨ g_i)
  * For the output gate g_o, add the clause (g_o).
6. SAT ≤p 3SAT
o kSAT is SAT with the additional restriction that each clause has at most k literals.
o The transformation uses the local replacement technique. Identify any clause with n > 3 literals, for example (l1 ∨ l2 ∨ l3 ∨ l4 ∨ l5 ∨ ... ∨ ln).
o Create n − 2 new clauses using n − 3 new variables z_i as follows:
  (l1 ∨ l2 ∨ z1), (¬z1 ∨ l3 ∨ z2), (¬z2 ∨ l4 ∨ z3), ..., (¬z_{n−3} ∨ l_{n−1} ∨ l_n)
o Key idea: the n − 3 new variables can satisfy exactly n − 3 of the n − 2 new clauses, so at least one of the new clauses must be satisfied by an old literal.
7. 3SAT ≤p 3-OCC-3SAT
o 3-OCC-3SAT: each variable appears at most 3 times, each literal at most twice.
o Suppose variable x_i appears k > 1 times in the 3SAT instance.
o Replace the j-th occurrence of x_i by a new variable x_{i,j}, for 1 ≤ j ≤ k.
o Add k new clauses: (¬x_{i,1} ∨ x_{i,2}), (¬x_{i,2} ∨ x_{i,3}), ..., (¬x_{i,k−1} ∨ x_{i,k}), (¬x_{i,k} ∨ x_{i,1}).
  This cycle of implications forces all copies x_{i,j} to take the same truth value.
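The clause-splitting step of the SAT-to-3SAT reduction can be sketched directly. A minimal Python version (representing literals as strings with a leading '-' for negation is this sketch's convention):

```python
def split_clause(clause, fresh):
    """Split a clause with n > 3 literals into n-2 three-literal clauses
    using n-3 new variables drawn from the iterator `fresh`."""
    n = len(clause)
    if n <= 3:
        return [clause]
    z = [next(fresh) for _ in range(n - 3)]
    out = [[clause[0], clause[1], z[0]]]        # (l1 v l2 v z1)
    for i in range(1, n - 3):
        out.append(['-' + z[i - 1], clause[i + 1], z[i]])
    out.append(['-' + z[-1], clause[n - 2], clause[n - 1]])
    return out

fresh = iter('z%d' % i for i in range(100))
print(split_clause(['l1', 'l2', 'l3', 'l4', 'l5'], fresh))
# [['l1', 'l2', 'z0'], ['-z0', 'l3', 'z1'], ['-z1', 'l4', 'l5']]
```

A clause of n literals always yields exactly n − 2 clauses, matching the count used in the key idea above.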
8. 3SAT ≤p MAX 2SAT
o MAX 2SAT
- Input: a Boolean expression φ consisting of a set of clauses C, where each clause contains at most 2 literals, and an integer K.
- Y/N question: can K clauses in C be satisfied by some truth assignment to the variables?
o The transformation uses the local replacement technique.
- Take a clause (l1 ∨ l2 ∨ l3).
- Create a new variable w unique to this clause.
- Create 10 clauses as follows (symmetric with respect to l1, l2, l3):
  * (l1), (l2), (l3), (w)
  * (¬l1 ∨ ¬l2), (¬l2 ∨ ¬l3), (¬l1 ∨ ¬l3)
  * (l1 ∨ ¬w), (l2 ∨ ¬w), (l3 ∨ ¬w)
- Set K = 7|C|, where |C| is the number of clauses in the original 3SAT instance.
o Argument of correctness
- If at least one of l1, l2, and l3 is true, we can satisfy exactly 7 of the 10 clauses by some truth assignment to w, and we can never satisfy more than 7.
- If all of l1, l2, and l3 are false, we can satisfy at most 6 of the clauses (by setting w false).

Lecture 2: Example Algorithm Analysis
Finding the maximum element in an array. Lessons to learn:
- Average-case versus worst-case analysis
- Average-case analysis requires assumptions about inputs
- Recurrence relations and their importance in the analysis of algorithms
- Why focusing on order of growth rather than exact analysis is preferable
Maximum Element Problem
Specification. Input: n integers in X[1..n]. Output: integers m and j such that m = X[j] = max X[1..n], with j as large as possible.
Proposed algorithm:
1. Initialize: m = X[n]; j = n; k = n−1
2. All tested? If k = 0, the algorithm terminates
3. Compare: if X[k] < m, goto step 5
4. Update: m = X[k]; j = k
5. Decrease: k = k−1; goto step 2
Exact analysis of the algorithm:
  Step number | Number of times executed
  1 | 1
  2 | n
  3 | n−1
  4 | A
  5 | n−1
What can we say about A?
- Best case: A = 0 (X[n] is the max element)
- Worst case: A = n−1 (array X is sorted in decreasing order)
- Average case:
  * Assumption 1: let's assume all items are distinct.
  * Assumption 2: let's assume that all permutations are equally likely.
  * Brute-force analysis for the specific value n = 3: 6 permutations, equally likely:
    X1 < X2 < X3: A = 0
    X1 < X3 < X2: A = 1
    X2 < X1 < X3: A = 0
    X2 < X3 < X1: A = 1
    X3 < X1 < X2: A = 1
    X3 < X2 < X1: A = 2
  * Average cost: (0 + 1 + 0 + 1 + 1 + 2)/6 = 5/6
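The brute-force tabulation for n = 3 can be verified by simulating the algorithm over all permutations. A minimal Python sketch:

```python
from itertools import permutations
from fractions import Fraction

def count_updates(X):
    """Run the max-element scan (steps 1-5): start at X[n], move left,
    and count how many times step 4 (the update) executes."""
    m, A = X[-1], 0
    for k in range(len(X) - 2, -1, -1):
        if not (X[k] < m):   # step 3's comparison fails -> step 4 updates
            m, A = X[k], A + 1
    return A

# Brute-force average over all 6 permutations of 3 distinct items:
avg = Fraction(sum(count_updates(list(p)) for p in permutations([1, 2, 3])), 6)
print(avg)  # 5/6, matching the hand computation
```

The same loop also confirms the extremes: an increasing array gives A = 0 and a decreasing array gives A = n − 1.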
answered usually accompanied by a set of parameters free variables A mappingrelation between a set of input instances domain and an output set range Problem Specification Give a general description of its parameters specify what a typical input instance looks like 2 The properties demanded by the solution what you should do with the input instance what the output should be with respect to the input Problem Types Search Find an X in the input satisfying property y Structuring Transform the input to satisfy property y Construction Build an X to satisfy property y Optimization Find the bestX satisfying property y Decision Decide whether the input instance satisfies property y Adaptive Maintain property y over time Classifying Problems by Difficulty onceptually hard we have difficulty understanding the problem Analytically hard we know how to solve the problem but have difficulty estimating how long it would take to solve the problem Computationally hard we know how to solve the problem have an algorithm However for any algorithm which solves the problem some input instances require a large amount of resources to solve Computationally unsolvable problem No algorithm exists to solve such a problem Example Specification T SP39 Input A finite set C c1 c2 cn of cities or each pair of cities cg cj the distance dij between them Output A permutation ltc1 c2 cnxgt that minimizes the expression Eltj1ton1 djj1 dn1 over all permutations of the cities Example instance C has size 4 dl2 10 dl3 5 dl4 9 d23 6 d24 9 d34 3 Solution 1 3 4 2 with sum 27 Algorithm Intuitive definition 1 The idea behind any computer programprocedure 2 The thing that stays the same whether you do it in C or Java or Pascal etc 3 The method that solves a problem Definition A finite set of rules which gives a sequence of operations for solving a specific type of problem The execution of any algorithm must not involve any subjective definitions nor must it call for the use of intuition or creativity Bad examples 
"Add salt to taste" or "cook until tender". See 25 for more.

Lecture 5: Divide-and-Conquer Algorithms (Integer Multiplication and Selection Problems)

Description of the basic technique:
1. Decompose the input instance into a number of smaller subinstances.
2. Solve each of the subinstances.
3. Combine the subsolutions to obtain a solution to the original input instance.
Key issue: how do we solve the subproblems, by direct solution or by recursive call? There is a tradeoff depending on problem size.

Multiplication of two n-digit integers
- Assume n is a perfect power of 2.
- Operations we will measure: how many 1-digit additions/shifts are required.
- Direct solution: n^2 one-digit additions/shifts required; let's say cn^2 work for some constant c.

One-step Divide-and-Conquer
- The two integers can be represented as wx and yz, where w, x (and y, z) are the high and low n/2-digit halves. Their multiplication can be broken down into the addition of:
  1. w times y, shifted n places;
  2. (w times z plus x times y), shifted n/2 places;
  3. x times z.
- Also observe that (w + x)(y + z) = wy + wz + xy + xz. Thus we can compute all three of the above parts by doing essentially 3 multiplications of n/2-digit numbers: wy, xz, and (w + x)(y + z), since wz + xy = (w + x)(y + z) - wy - xz. We do need some extra additions/shifts (about n in total), and the last multiplication may be of (n/2 + 1)-digit numbers rather than n/2-digit numbers.
- Analysis: T(n) = 3c(n/2)^2 + dn = (3/4)cn^2 + dn. The n^2 term dominates for large n, and we have reduced the time by 1/4.

Full Divide-and-Conquer
- To solve each subproblem, recursively call the same routine.
- Analysis: T(n) = 3T(n/2) + dn, T(1) = A, where A is some constant.
- Complexity Theta(n^(lg 3)) by Master Theorem analysis; the same answer can be worked out using substitution and recurrence relation techniques.

When to do a direct versus recursive solution
- Theoretically, doing the recursive solution all the way down is fine. However, there is typically a tradeoff point at which the direct solution is preferable; we then use the direct solution once the problem size dips below that level.
- Calculating this tradeoff point precisely requires having a good sense of all the constants in play, including lower-order terms.
- Example: n = 1024. Direct solution time complexity n^2, so direct solution cost = 1024^2. Recursive solution time complexity T(n) = 3T(n/2) + 16n, which solves to
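The three-multiplication scheme above (Karatsuba's algorithm) together with the direct/recursive tradeoff can be sketched in Python as follows. The THRESHOLD cutoff stands in for the tradeoff point the notes discuss; its value here is arbitrary, and the names are mine, not from the notes.

```python
THRESHOLD = 4  # below this many digits, fall back to the direct solution

def karatsuba(x, y, n):
    """Multiply non-negative x and y, each of at most n (decimal) digits."""
    if n <= THRESHOLD:
        return x * y                # direct solution below the tradeoff point
    half = n // 2
    base = 10 ** half
    w, x_lo = divmod(x, base)       # x = w * 10^half + x_lo
    y_hi, z = divmod(y, base)       # y = y_hi * 10^half + z
    wy = karatsuba(w, y_hi, half)
    xz = karatsuba(x_lo, z, half)
    # (w + x_lo)(y_hi + z) = wy + wz + xy + xz, so the middle term is:
    mid = karatsuba(w + x_lo, y_hi + z, half + 1) - wy - xz
    return wy * base * base + mid * base + xz
```

The `half + 1` in the third recursive call reflects the note that the last multiplication may involve (n/2 + 1)-digit numbers, since the sums can carry.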
T(n) = 33n^(lg 3) - 32n. Recursive solution cost = 33 * 1024^(lg 3) - 32 * 1024 = 33 * 59049 - 32768 = 1,915,849, so the direct solution (1024^2 = 1,048,576) is roughly half the cost of the recursive solution.

Lecture 9: Dynamic Programming and the Knapsack Problem

Basic Idea: there are two main ideas behind dynamic programming.

1. The principle of optimality. This principle applies to some problems but not to others; for problems that it does apply to, dynamic programming is a viable design technique.
   a. Given an optimal sequence of decisions, any subsequence must also be optimal.
   b. Suppose solution S is optimal for problem P. Suppose we decompose problem P into subproblems P1 through Pk, and that S can be decomposed into pieces S1 through Sk corresponding to the subproblems. Then each solution Si is optimal for subproblem Pi.
   Example: shortest paths. A shortest path from s to t that visits node u can be decomposed into an s-u piece and a u-t piece. The s-u piece must be a shortest path from s to u; likewise for the u-t piece. Conclusion: dynamic programming can be used for shortest paths.
   Counterexample: longest paths. A longest simple path from s to t that visits node u can likewise be decomposed into s-u and u-t pieces. However, the s-u piece is not necessarily a longest path from s to u; likewise for the u-t piece. (Why? A longer s-u path might reuse vertices that the u-t piece needs, so it cannot be spliced in while keeping the path simple.) Conclusion: dynamic programming cannot be used for longest paths.

2. Recursion in a bottom-up fashion.
   - Computing Fibonacci numbers: F(n) = F(n-1) + F(n-2). Top-down recursion is very inefficient: many F(i) values are computed multiple times. Bottom-up computation is much more efficient: each F(i) value is computed just once.
   - Computing the binomial coefficient: C(n, k) = C(n-1, k-1) + C(n-1, k). Top-down computation is very inefficient: many C(i, j) values are computed multiple times. Bottom-up computation is much more efficient: each C(i, j) value is computed once.

Key implementation issues:
1. Identify subsolutions that may be useful in computing the whole solution (often you need to introduce parameters).
2. Develop a recurrence relation.
3. Set up the table of values to be computed; the dimensionality of the table is determined by the number of parameters. (Steps 2 and 3 are often the same step.)
4.
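The Fibonacci contrast above can be made concrete with a short sketch: the bottom-up version computes each F(i) exactly once, in O(n) additions, whereas the naive top-down recursion recomputes values exponentially often. The function name is mine, not from the notes.

```python
def fib_bottom_up(n):
    """F(n) via bottom-up computation: each F(i) is computed just once."""
    if n < 2:
        return n
    prev, cur = 0, 1            # F(0), F(1)
    for _ in range(n - 1):      # build F(2), F(3), ..., F(n) in order
        prev, cur = cur, prev + cur
    return cur
```

Only the two most recent values are kept, so this version even avoids storing the full table; the same bottom-up order of evaluation is what makes it possible.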
Determine the order in which the table's values are computed.
5. Backtrack through the table to obtain the complete solution, not just the solution value.
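The bottom-up binomial coefficient computation mentioned above illustrates the table-based steps: the recurrence C(n, k) = C(n-1, k-1) + C(n-1, k) (step 2) yields a 2-D table, one dimension per parameter (step 3), filled row by row so both subvalues already exist (step 4). A minimal sketch, with names of my own choosing:

```python
def binomial(n, k):
    """Bottom-up C(n, k) via the recurrence C(n,k) = C(n-1,k-1) + C(n-1,k)."""
    # Step 3: a 2-D table; dimensionality = number of parameters (n and k).
    C = [[0] * (k + 1) for _ in range(n + 1)]
    # Step 4: fill rows in increasing i, so row i-1 is complete when needed.
    for i in range(n + 1):
        C[i][0] = 1                 # base case: C(i, 0) = 1
        for j in range(1, min(i, k) + 1):
            C[i][j] = C[i - 1][j - 1] + C[i - 1][j]
    return C[n][k]
```

Each C(i, j) is computed once, for O(nk) additions, versus the exponentially many recomputations of the top-down recursion.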