# Dsgn&Analysis CS 3510

This set of Class Notes was uploaded by Alayna Veum on Monday, November 2, 2015. The Class Notes belong to CS 3510 at Georgia Institute of Technology - Main Campus, taught by Staff in Fall. Since its upload, it has received 19 views. For similar materials see /class/234155/cs-3510-georgia-institute-of-technology-main-campus in Computer Science at Georgia Institute of Technology - Main Campus.


Date Created: 11/02/15

# Design and Analysis of Algorithms, Lecture 2

Edyta Szymańska (edyta@cc.gatech.edu), CS 3510 A, Fall 2005

### Model of computation

- Turing machine: convenient in the Theory of Computation.
- Random-access machine (RAM) model:
  - sequential execution;
  - arithmetic operations in unit time;
  - data movement (load, store, copy) in constant time;
  - limited word size: an integer n is represented by c · log n bits for some c ≥ 1.

### Sorting algorithms

Input: a sequence of n numbers ⟨a₁, a₂, …, aₙ⟩.

Output: a permutation ⟨a′₁, a′₂, …, a′ₙ⟩ of the input such that a′₁ ≤ a′₂ ≤ … ≤ a′ₙ.

Warm-up: INSERTION SORT.

Example (the sorted prefix grows by one element per step; the current key is inserted into it):

    5 2 4 6 1 3
    2 5 4 6 1 3
    2 4 5 6 1 3
    2 4 5 6 1 3
    1 2 4 5 6 3
    1 2 3 4 5 6

### Insertion sort: pseudocode

Input: an array A[1..n].

    INSERTION-SORT(A)
      for j = 2 to n do
        key = A[j]
        i = j - 1
        while i > 0 and A[i] > key do
          A[i+1] = A[i]
          i = i - 1
        A[i+1] = key

Per-line cost and number of executions:

| Line | Cost | Times |
|---|---|---|
| for j = 2 to n | c₁ | n |
| key = A[j] | c₂ | n − 1 |
| i = j − 1 | c₃ | n − 1 |
| while i > 0 and A[i] > key | c₄ | Σⱼ₌₂ⁿ tⱼ |
| A[i+1] = A[i] | c₅ | Σⱼ₌₂ⁿ (tⱼ − 1) |
| i = i − 1 | c₆ | Σⱼ₌₂ⁿ (tⱼ − 1) |
| A[i+1] = key | c₇ | n − 1 |
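The INSERTION-SORT pseudocode above translates directly into Python; a minimal sketch, using 0-based indices instead of the notes' A[1..n]:

```python
def insertion_sort(a):
    """Sort the list a in place, mirroring the INSERTION-SORT pseudocode."""
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift elements of the sorted prefix a[0..j-1] that are
        # greater than key one position to the right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a
```

For example, `insertion_sort([5, 2, 4, 6, 1, 3])` returns `[1, 2, 3, 4, 5, 6]`, passing through exactly the intermediate states shown in the example above.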
### Analysis of insertion sort

- T(n): the running time of the algorithm on an input of size n.
- cᵢ: the cost of the i-th line, a constant.
- tⱼ: the number of times the while-loop test is executed for a given j, j = 2, …, n.

Best case: T(n) is a linear function of n, T(n) = Θ(n).

Worst case: T(n) is a quadratic function of n, T(n) = Θ(n²).

Average case: tⱼ is about half of the worst case, but T(n) is still quadratic, asymptotically the same as the worst case.

Summary: an incremental sorting algorithm with quadratic running time.

### Divide and conquer paradigm

- **Divide** the problem into several subproblems similar to the original problem.
- **Conquer**: solve the subproblems recursively; if a subproblem is small enough, solve it using the simplest technique.
- **Combine** the subproblem solutions into a solution of the original problem.

### Divide and conquer for sorting: merge sort

- Divide the sequence into two subsequences of n/2 elements each.
- Conquer: sort the subsequences recursively; if the length is 1, we are done.
- Combine: merge the sorted subsequences to get the sorted answer.

Example: 5 2 4 6 1 3 2 6.

    Divide:   5 2 4 6 | 1 3 2 6
    Conquer:  2 4 5 6 | 1 2 3 6
    Merge:    1 2 2 3 4 5 6 6

### Merge sort: pseudocode

    MERGE-SORT(A, p, r)
      if p < r then
        q = floor((p + r) / 2)
        MERGE-SORT(A, p, q)
        MERGE-SORT(A, q + 1, r)
        MERGE(A, p, q, r)

Initial call: MERGE-SORT(A, 1, n).

### Merging

MERGE(A, p, q, r) assumes A[p..q] and A[q+1..r] are sorted and leaves A[p..r] sorted. See the pseudocode in the textbook. Time for MERGE: Θ(n).

### Analysis of divide and conquer algorithms

- T(n): the running time on a problem of size n.
- If n ≤ c, then T(n) = Θ(1). (Θ is asymptotic notation: drop constants and lower-order terms; Θ(1) denotes a constant.)
- D(n): divide time.
- C(n): combine time.
- The general divide-and-conquer recurrence, for a subproblems of size n/b each:

    T(n) = Θ(1) if n ≤ c; T(n) = a · T(n/b) + D(n) + C(n) otherwise.
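The MERGE-SORT pseudocode above can be sketched in Python; MERGE is written out here, since the notes defer it to the textbook:

```python
def merge_sort(a):
    """Return a sorted copy of a, following the MERGE-SORT pseudocode."""
    if len(a) <= 1:               # length 1 (or 0): already sorted
        return a
    q = len(a) // 2
    left = merge_sort(a[:q])      # conquer: sort each half recursively
    right = merge_sort(a[q:])
    return merge(left, right)     # combine

def merge(left, right):
    """Merge two sorted lists into one sorted list in Theta(n) time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])          # at most one of these two is nonempty
    out.extend(right[j:])
    return out
```

Running `merge_sort([5, 2, 4, 6, 1, 3, 2, 6])` reproduces the worked example: the halves sort to `[2, 4, 5, 6]` and `[1, 2, 3, 6]`, which merge to `[1, 2, 2, 3, 4, 5, 6, 6]`.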
### Analysis of merge sort

- D(n) = Θ(1)
- C(n) = Θ(n)

This gives the recurrence

    T(n) = Θ(1) if n = 1; T(n) = 2 T(n/2) + Θ(n) if n > 1.

Solving the merge-sort recurrence:

- unroll the recurrence (draw a recursion tree) to generate a guess;
- prove the guess using induction;
- in general we may need to use the Master Theorem.

The recursion tree (described in words, since the original figure did not survive extraction): the root costs cn, the next level has two nodes of cost cn/2, the next has four of cost cn/4, and so on. There are log n levels and every level contributes cn in total, so T(n) = Θ(n log n).

### Asymptotic notation

The asymptotic order of growth (rate of growth) is used for comparing the relative performance of alternative algorithms. What counts as a "step" depends on

- the way the program is compiled, and
- the architecture being used for computing,

and thus the step count may grow or shrink by a constant factor. Example: suppose the running time is T(n) = 16n² + 35n + 8 steps; if one step ≈ 25 low-level machine instructions, then we have T(n) = 400n² + 875n + 200 instructions. We can therefore omit constant factors and lower-order terms and use asymptotic notation:

- upper bounds: O, o;
- lower bounds: Ω, ω;
- tight bounds: Θ.

### O-notation: asymptotic upper bound

Let T(n) be a nonnegative function, say the worst-case running time of an algorithm on an input of size n. Given another nonnegative function f(n), we say that T(n) is O(f(n)) (and write T(n) = O(f(n))) if there exist c > 0 and n₀ ≥ 0 such that for all n ≥ n₀ we have T(n) ≤ c · f(n).
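As a concrete instance of the definition, here is a quick numeric spot-check (not a proof, since it only tests finitely many n) that T(n) = 3n² + 2n + 5 is O(n²), with witnesses c = 9 and n₀ = 2:

```python
# Spot-check the O-notation definition: T(n) <= c * n^2 for all n >= n0.
# A finite check like this cannot replace the algebraic argument; it only
# makes the roles of c and n0 concrete.
T = lambda n: 3 * n * n + 2 * n + 5
c, n0 = 9, 2
assert all(T(n) <= c * n * n for n in range(n0, 10**4))
assert T(1) > c * 1 * 1   # n0 matters: the bound fails at n = 1 (10 > 9)
```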
Examples:

- T(n) = 3n² + 2n + 5 ⇒ T(n) = O(n²): for n ≥ 2 we have 2n + 5 ≤ 6n², and thus 3n² + 2n + 5 ≤ 9n², so T(n) = O(n²) with c = 9.
- T(n) = n² + 3n ⇒ T(n) = O(n²).
- T(n) = an² + bn + c ⇒ T(n) = O(n²).

*UC Berkeley, CS 170: Efficient Algorithms and Intractable Problems. Handout 23. Lecturer: Satish Rao. Nov 25, 2003.*

# Notes 23 for CS 170

## 1 NP-completeness of Circuit-SAT

We will prove that the circuit satisfiability problem (CSAT), described in the previous notes, is NP-complete. Proving that it is in NP is easy enough: the verifier V takes in input the description of a circuit C and a sequence of n Boolean values x₁, …, xₙ, and V(C, x₁, …, xₙ) = C(x₁, …, xₙ); i.e., V simulates (or evaluates) the circuit.

Now we have to prove that for every decision problem A in NP we can find a reduction from A to CSAT. This is a difficult result to prove, and it is impossible to prove it really formally without introducing the Turing machine model of computation. We will prove the result based on the following fact, of which we give only an informal proof.

THEOREM 1. Suppose A is a decision problem that is solvable in p(n) time by some program P, where n is the length of the input. Also assume that the input is represented as a sequence of bits. Then for every fixed n there is a circuit Cₙ of size about O(p(n)² (log p(n))^O(1)) such that for every input x = (x₁, …, xₙ) of length n we have A(x) = Cₙ(x₁, …, xₙ). That is, circuit Cₙ solves problem A on all the inputs of length n. Furthermore, there exists an efficient algorithm, running in time polynomial in p(n), that on input n and the description of P produces Cₙ.

The algorithm in the "furthermore" part of the theorem can be seen as the ultimate CAD tool: on input, say, a C program that computes a Boolean function, it returns the description of a circuit that computes the same Boolean function. Of course, the generality is paid for in terms of inefficiency, and the resulting circuits are fairly big.

PROOF (Sketch). Without loss of generality, we can assume that the language in which P is written is some very low-level machine language (otherwise we can compile it). Let us restrict ourselves to inputs of length n. Then P runs in at most p(n) steps and accesses at most p(n) cells of memory. At any step, the global state of the program is given by the content of those p(n) cells plus O(1) registers (program counter, etc.). No register or memory cell needs to contain numbers bigger than p(n), or, equivalently, no register or memory cell needs to hold more than lg p(n) = O(log n) bits. Let q(n) = p(n) · O(log n) denote the size of the whole global state. We maintain a q(n) × p(n) tableau that describes the computation: row i of the tableau is the global state at time i. Each row of the tableau can be computed from the previous one by means of a small circuit of size about O(q(n)); in fact, the microprocessor that executes our machine language is such a circuit (this is not totally accurate). □

Now we can argue about the NP-completeness of CSAT. Let us first think of how the proof would go if, say, we wanted to reduce the Hamiltonian cycle problem to CSAT. Given a graph G with n vertices and m edges, we would construct a circuit that, given in input a sequence of n vertices of G, outputs 1 if and only if the sequence of vertices is a Hamiltonian cycle in G. How can we construct such a circuit? There is a computer program that, given G and the sequence, checks whether the sequence is a Hamiltonian cycle, so there is also a circuit that, given G and the sequence, does the same check. Then we hard-wire G into the circuit, and we are done. It remains to observe that the circuit is a YES instance of CSAT if and only if the graph is Hamiltonian.

The example should give an idea of how the general proof goes. Take an arbitrary problem A in NP; we show how to reduce A to Circuit Satisfiability. Since A is in NP, there is a polynomial-time computable algorithm V_A and a polynomial p_A such that A(x) = YES if and only if there exists a y with length(y) ≤ p_A(length(x)) such that V(x, y) outputs YES. Consider now the following reduction: on input x of length n, we construct a circuit C that, on input y of length p(n), decides whether V(x, y) outputs YES. Since V runs in time polynomial in n + p(n), the construction can be done in polynomial time. Now we have that the circuit is satisfiable if and only if x ∈ A.
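The verifier V described above does nothing more than evaluate the circuit on the given input bits. A minimal sketch, where the gate-list representation (pairs of an operation name and input indices, in topological order) is my assumption, not a format from the notes:

```python
def eval_circuit(gates, inputs):
    """Evaluate a Boolean circuit: this is the CSAT verifier V.

    gates: list of (op, args) in topological order, where
      ("IN", (i,))   reads input bit i,
      ("NOT", (g,)), ("AND", (g, h)), ("OR", (g, h)) reference earlier gates.
    The last gate is taken to be the output gate.
    """
    val = []
    for op, args in gates:
        if op == "IN":
            val.append(bool(inputs[args[0]]))
        elif op == "NOT":
            val.append(not val[args[0]])
        elif op == "AND":
            val.append(val[args[0]] and val[args[1]])
        elif op == "OR":
            val.append(val[args[0]] or val[args[1]])
        else:
            raise ValueError("unknown gate " + op)
    return val[-1]
```

For instance, the circuit x₀ ∧ ¬x₁ is `[("IN", (0,)), ("IN", (1,)), ("NOT", (1,)), ("AND", (0, 2))]`, and evaluating it on (1, 0) yields True.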
## 2 Proving More NP-completeness Results

Now that we have one NP-complete problem, we do not need to start "from scratch" in order to prove more NP-completeness results. Indeed, the following result clearly holds.

LEMMA 2. If A reduces to B and B reduces to C, then A reduces to C.

PROOF. If A reduces to B, there is a polynomial-time computable function f such that A(x) = B(f(x)); if B reduces to C, there is a polynomial-time computable function g such that B(y) = C(g(y)). Then we can conclude that A(x) = C(g(f(x))), where g(f(·)) is computable in polynomial time. So A does indeed reduce to C. □

Suppose that we have some problem A in NP that we are studying, and that we are able to prove that CSAT reduces to A. Then every problem in NP reduces to CSAT (which we have just proved), and CSAT reduces to A, so it is also true that every problem in NP reduces to A; that is, A is NP-hard. This is very convenient: a single reduction from CSAT to A shows the existence of all the infinitely many reductions needed to establish the NP-hardness of A. This is a general method:

LEMMA 3. Let C be an NP-complete problem and A be a problem in NP. If we can prove that C reduces to A, then it follows that A is NP-complete.

Right now, literally thousands of problems are known to be NP-complete, and each one (except for a few "root" problems like CSAT) has been proved NP-complete by way of a single reduction from another problem previously proved to be NP-complete. By the definition, all NP-complete problems reduce to each other, so the body of work that led to the proof of the currently known thousands of NP-complete problems actually implies millions of pairwise reductions between such problems.

## 3 NP-completeness of SAT

We defined the CNF Satisfiability Problem (abbreviated SAT) above. SAT is clearly in NP; in fact, it is a special case of Circuit Satisfiability. (Can you see why?) We want to prove that SAT is NP-hard, and we will do so by reducing from Circuit Satisfiability.

First of all, let us see how *not* to do the reduction. We might be tempted to use the following approach: given a circuit, transform it into a Boolean CNF formula that
computes the same Boolean function. Unfortunately, this approach cannot lead to a polynomial-time reduction. Consider the Boolean function that is 1 iff an odd number of its inputs is 1: there is a circuit of size O(n) that computes this function for inputs of length n, but the smallest CNF for this function has size more than 2ⁿ. This means we cannot translate a circuit into a CNF formula of comparable size that computes the same function. But we may still be able to transform a circuit into a CNF formula such that the circuit is satisfiable iff the formula is satisfiable, even though the circuit and the formula compute somewhat different Boolean functions.

We now show how to implement the above idea. We will need to add new variables. Suppose the circuit C has m gates, including input gates; then we introduce new variables g₁, …, gₘ, with the intended meaning that variable gⱼ corresponds to the output of gate j. We make a formula F which is the AND of m + 1 subexpressions. There is a subexpression for every gate j, saying that the value of the variable for that gate is set in accordance with the values of the variables corresponding to the inputs of gate j; and there is an (m+1)-th term saying that the output gate outputs 1. (There is no subexpression for the input gates.)

For a gate j which is a NOT applied to the output of gate i, we have the subexpression

    (gᵢ ∨ gⱼ) ∧ (¬gᵢ ∨ ¬gⱼ).

For a gate j which is an AND applied to the outputs of gates i and l, we have the subexpression

    (¬gⱼ ∨ gᵢ) ∧ (¬gⱼ ∨ gₗ) ∧ (gⱼ ∨ ¬gᵢ ∨ ¬gₗ).

Similarly for OR. This completes the description of the reduction. We now have to show that it works.

Suppose C is satisfiable; then consider setting each gⱼ equal to the output of the j-th gate of C when a satisfying set of values is given in input. Such a setting for g₁, …, gₘ satisfies F. Conversely, suppose F is satisfiable, and give in input to C the part of the satisfying assignment of F corresponding to the input gates of C. We can prove by induction that the output of gate j in C equals gⱼ, and therefore the output gate of C outputs 1. So C is satisfiable if and only if F is satisfiable.
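The gate subexpressions above can be generated mechanically. A sketch, where the encoding of literals as signed integers (+v for gᵥ, −v for ¬gᵥ, in the style of the DIMACS CNF format) is my choice, not the notes':

```python
def not_clauses(j, i):
    """Clauses enforcing g_j <-> NOT g_i."""
    return [[i, j], [-i, -j]]

def and_clauses(j, i, l):
    """Clauses enforcing g_j <-> (g_i AND g_l)."""
    return [[-j, i], [-j, l], [j, -i, -l]]

def or_clauses(j, i, l):
    """Clauses enforcing g_j <-> (g_i OR g_l)."""
    return [[j, -i], [j, -l], [-j, i, l]]

def satisfies(clauses, assignment):
    """Check a CNF (list of clauses) against a dict var -> bool."""
    return all(any((lit > 0) == assignment[abs(lit)] for lit in clause)
               for clause in clauses)
```

As a sanity check, `and_clauses(3, 1, 2)` is satisfied by {g₁=1, g₂=0, g₃=0} (the gate output matches its inputs) but violated by {g₁=1, g₂=0, g₃=1}.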
## 4 NP-completeness of 3SAT

SAT is a much simpler problem than Circuit Satisfiability if we want to use it as a starting point of NP-completeness proofs. We can use an even simpler starting point: 3-CNF Formula Satisfiability, abbreviated 3SAT. The 3SAT problem is the same as SAT, except that each OR is over precisely 3 (possibly negated) variables. For example, the following is an instance of 3SAT:

    (x₃ ∨ ¬x₄ ∨ x₅) ∧ (x₁ ∨ ¬x₃ ∨ ¬x₄) ∧ (¬x₂ ∨ x₃ ∨ x₅)    (1)

Certainly 3SAT is in NP, just because it is a special case of SAT. In the following we will need some terminology: each little OR in a SAT formula is called a *clause*; each occurrence of a variable, complemented or not, is called a *literal*.

We now prove that 3SAT is NP-complete by reduction from SAT. Take a formula F of SAT; we transform it into a formula F′ of 3SAT such that F′ is satisfiable if and only if F is satisfiable. Each clause of F is transformed into a subexpression of F′. Clauses of length 3 are left unchanged.

A clause of length 1, such as (x), is changed as follows:

    (x ∨ y₁ ∨ y₂) ∧ (x ∨ ¬y₁ ∨ y₂) ∧ (x ∨ y₁ ∨ ¬y₂) ∧ (x ∨ ¬y₁ ∨ ¬y₂),

where y₁ and y₂ are two new variables added specifically for the transformation of that clause.

A clause of length 2, such as (x₁ ∨ x₂), is changed as follows:

    (x₁ ∨ x₂ ∨ y) ∧ (x₁ ∨ x₂ ∨ ¬y),

where y is a new variable added specifically for the transformation of that clause.

A clause of length k ≥ 4, such as (x₁ ∨ … ∨ xₖ), is changed as follows:

    (x₁ ∨ x₂ ∨ y₁) ∧ (¬y₁ ∨ x₃ ∨ y₂) ∧ (¬y₂ ∨ x₄ ∨ y₃) ∧ … ∧ (¬yₖ₋₃ ∨ xₖ₋₁ ∨ xₖ),

where y₁, …, yₖ₋₃ are new variables added specifically for the transformation of that clause.

We now have to prove the correctness of the reduction.

- We first argue that if F is satisfiable, then there is an assignment that satisfies F′. For the shorter clauses we just set the y variables arbitrarily; for the longer clauses it is slightly trickier.
- We then argue that if F is not satisfiable, then F′ is not satisfiable. Fix an assignment to the x variables. Then there is a clause in F that is not satisfied, and we argue that one of the derived clauses in F′ is not satisfied.
## 5 Some NP-complete Graph Problems

### 5.1 Independent Set

Given an undirected, non-weighted graph G = (V, E), an *independent set* is a subset I ⊆ V of the vertices such that no two vertices of I are adjacent. This is similar to the notion of a matching, except that it involves vertices rather than edges. We will be interested in the following optimization problem: given a graph, find a largest independent set. We have seen that this problem is easily solvable in forests; in the general case, unfortunately, it is much harder. The problem models the execution of conflicting tasks, it is related to the construction of error-correcting codes, and it is a special case of more interesting problems. We are going to prove that it is not solvable in polynomial time unless P = NP.

First of all, we need to formulate it as a decision problem:

- Given a graph G and an integer k, does there exist an independent set in G with at least k vertices?

It is easy to see that the problem is in NP; we have to see that it is NP-hard. We will reduce 3SAT to Maximum Independent Set. Starting from a formula φ with n variables x₁, …, xₙ and m clauses, we generate a graph G_φ with 3m vertices, and we show that the graph has an independent set with at least m vertices if and only if the formula is satisfiable. (In fact, we show that the size of the largest independent set in G_φ equals the maximum number of clauses of φ that can be simultaneously satisfied, which is more than what is required to prove the NP-completeness of the problem.)

The graph G_φ has a triangle for every clause in φ; the vertices in the triangle correspond to the three literals of the clause. Vertices in different triangles are joined by an edge iff they correspond to two literals that are one the complement of the other. In Figure 1 we see the graph resulting from applying the reduction to the following formula:

    (x₁ ∨ ¬x₅ ∨ ¬x₃) ∧ (¬x₁ ∨ x₃ ∨ x₄) ∧ (x₃ ∨ x₂ ∨ ¬x₄)

To prove the correctness of the reduction, we need to show that

- if φ is satisfiable, then there is an independent set in G_φ with at least m vertices;
- if there is an independent set in G_φ with at least m vertices, then φ is satisfiable.

**From satisfaction to independence.** Suppose we have an assignment of Boolean values to the variables x₁, …, xₙ of φ such that all the clauses of φ are satisfied. This means that for every clause, at least one of its literals is satisfied by the assignment. We construct an independent set as follows: for every triangle we pick a node that corresponds to a satisfied literal (breaking ties arbitrarily). It is impossible that two such nodes are adjacent, since only nodes corresponding to a literal and its negation are adjacent, and they cannot both be satisfied by the assignment.

*Figure 1: The reduction from 3SAT to Independent Set. (Figure not reproduced in this copy.)*

**From independence to satisfaction.** Suppose we have an independent set I with m vertices. We must have exactly one vertex of I in every triangle, since two vertices in the same triangle are always adjacent. Let us fix an assignment that satisfies all the literals corresponding to vertices of I, and assign values to the other variables arbitrarily. This is a consistent rule for generating an assignment, because we cannot have both a literal and its negation in the independent set. Finally, we note that every clause is satisfied by this assignment.

**Wrapping up:**

- We showed a reduction φ → (G_φ, m) that, given an instance of 3SAT, produces an instance of the decision version of Maximum Independent Set, with the property that φ is satisfiable (a YES answer for the 3SAT problem) if and only if G_φ has an independent set of size at least m.
- We knew 3SAT is NP-hard.
- Then Max Independent Set is also NP-hard, and so also NP-complete.

### 5.2 Maximum Clique

Given an undirected, non-weighted graph G = (V, E), a *clique* K is a set of vertices K ⊆ V such that any two vertices in K are adjacent. In the MAXIMUM CLIQUE problem, given a graph G we want to find a largest clique. In the decision version, given G and a parameter k, we want to know whether or not G contains a clique of size at least k.
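The duality between cliques and independent sets (a set of vertices is a clique in G exactly when it is independent in the complement of G) can be sketched as follows; the helper names are mine:

```python
def complement(n, edges):
    """Complement of an n-vertex graph; edges is a set of frozenset pairs."""
    all_pairs = {frozenset((u, v)) for u in range(n) for v in range(u + 1, n)}
    return all_pairs - set(edges)

def is_clique(vertices, edges):
    """Every pair inside `vertices` must be an edge."""
    return all(frozenset((u, v)) in edges
               for u in vertices for v in vertices if u < v)

def is_independent(vertices, edges):
    """No pair inside `vertices` may be an edge."""
    return all(frozenset((u, v)) not in edges
               for u in vertices for v in vertices if u < v)
```

On a 4-vertex graph whose only edges form the triangle {0, 1, 2}, the set {0, 1, 2} is a clique in G and an independent set in the complement of G.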
It should be clear that the problem is in NP. We can prove that Maximum Clique is NP-hard by reduction from Maximum Independent Set. Take a graph G and a parameter k, and consider the graph G′ such that two vertices in G′ are connected by an edge if and only if they are *not* connected by an edge in G. We can observe that every independent set in G is a clique in G′, and every clique in G′ is an independent set in G. Therefore G has an independent set of size at least k if and only if G′ has a clique of size at least k.

### 5.3 Minimum Vertex Cover

Given an undirected, non-weighted graph G = (V, E), a *vertex cover* C is a set of vertices C ⊆ V such that for every edge (u, v) ∈ E, either u ∈ C or v ∈ C (or possibly both). In the MINIMUM VERTEX COVER problem, given a graph G we want to find a smallest vertex cover. In the decision version, given G and a parameter k, we want to know whether or not G contains a vertex cover of size at most k.

It should be clear that the problem is in NP. We can prove that Minimum Vertex Cover is NP-hard by reduction from Maximum Independent Set. The reduction is based on the following observation.

LEMMA 4. If I is an independent set in a graph G = (V, E), then the set of vertices C = V − I that are not in I is a vertex cover in G. Furthermore, if C is a vertex cover in G, then I = V − C is an independent set in G.

PROOF. Suppose C is not a vertex cover: then there is some edge (u, v) neither of whose endpoints is in C. This means both endpoints are in I, and so I is not an independent set, which is a contradiction. For the "furthermore" part, suppose I is not an independent set: then there is some edge (u, v) ∈ E such that u ∈ I and v ∈ I; but then we have an edge in E neither of whose endpoints is in C, and so C is not a vertex cover, which is a contradiction. □

Now the reduction is very easy: starting from an instance (G, k) of Maximum Independent Set, we produce an instance (G, n − k) of Minimum Vertex Cover.

## Some NP-complete Numerical Problems

### 5.4 Subset Sum

The Subset Sum problem is defined as follows:

- Given a sequence of integers a₁, …, aₙ and a parameter k,
- decide whether there is a subset of the integers whose sum is exactly k. Formally, decide whether there is a subset I ⊆ {1, …, n} such that Σ_{i∈I} aᵢ = k.

Subset Sum is a true decision problem, not an optimization problem forced to become a decision problem. It is easy to see that Subset Sum is in NP. We prove that Subset Sum is NP-complete by reduction from Vertex Cover. We have to proceed as follows:

- start from a graph G and a parameter k;
- create a sequence of integers and a parameter k′;
- prove that the graph has a vertex cover with k vertices iff there is a subset of the integers that sums to k′.

Let G = (V, E) be our input graph with n vertices (assume for simplicity that V = {1, …, n}), and let k be the parameter of the vertex cover problem. We define integers a₁, …, aₙ (one for every vertex), integers b₍ᵢ,ⱼ₎ (one for every edge (i, j) ∈ E), and a parameter k′. We will define the integers aᵢ and b₍ᵢ,ⱼ₎ so that if we have a subset of the aᵢ and the b₍ᵢ,ⱼ₎ that sums to k′, then the subset of the aᵢ corresponds to a vertex cover C in the graph, and the subset of the b₍ᵢ,ⱼ₎ corresponds to the edges of the graph having exactly one endpoint in C. Furthermore, the construction will force C to have size k.

How do we define the integers in the Subset Sum instance so that the above properties hold? We represent the integers in a matrix: each integer is a row, and the row should be seen as the base-4 representation of the integer, with |E| + 1 digits. The first column of the matrix (the "most significant digit" of each integer) is special: it contains 1 for the aᵢ's and 0 for the b₍ᵢ,ⱼ₎'s. Then there is a column (digit) for every edge: the column for edge (i, j) has a 1 in the rows of aᵢ, aⱼ, and b₍ᵢ,ⱼ₎, and 0 everywhere else. The parameter k′ is defined as

    k′ = k · 4^|E| + Σ_{l=0}^{|E|−1} 2 · 4^l.

This completes the description of the reduction.
that i ∈ C, and all the integers b_ij such that exactly one of i and j is in C. Then, when we sum these integers, doing the operations in base 4, we have a 2 in all digits except the most significant one; in the most significant digit we are summing 1 exactly k times. The sum of the chosen integers is thus k'.

From Subsets to Covers. Suppose we find a subset C ⊆ V and E' ⊆ E such that

    Σ_{i∈C} ai + Σ_{(i,j)∈E'} b_ij = k'.

First note that we never have a carry in the less significant digits: operations are in base 4, and there are at most three 1's in every column. Since the b_ij's can contribute at most one 1 in every column, and k' has a 2 in all the less significant digits, it means that for every edge (i, j), C must contain either i or j. So C is a cover. Every ai is at least 4^|E|, and k' gives a quotient of k when divided by 4^|E|, so C cannot contain more than k elements.

Design and analysis of algorithms. Lectures 38, 39 & 40. Edyta Szymanska, edyta@cc.gatech.edu. CS 3510 A, Fall 2005.

Euclid's algorithm for greatest common divisor

Given two integers a and b, compute the largest integer which divides both of them.

Example: GCD(1035, 759). If we can factorize, then 1035 = 3² · 5 · 23 and 759 = 3 · 11 · 23, and thus GCD(1035, 759) = 3 · 23 = 69. But no polynomial procedure is known for factoring integers; a different method must be used.

Euclid's algorithm for greatest common divisor
Idea.

Proposition: If a > b, then gcd(a, b) = gcd(a mod b, b).

    function EUCLID(a, b)
    Input: two positive integers a, b with a ≥ b
    Output: gcd(a, b)
      if b = 0 then return a
      return EUCLID(b, a mod b)

Running time.

Claim: If a ≥ b, then a mod b ≤ a/2.

Number of recursive calls: at most 2 log b. Since n = log a + log b, we have O(n) rounds, and the total time is O(n · n²) = O(n³) (O(n²) per division).

An extension to Euclid's algorithm

How can we check that d is the greatest common divisor of a and b?

Claim: If d|a and d|b and d = ax + by for some integers x, y, then d = gcd(a, b).

Claim: For any inputs a, b, the
Extended Euclid algorithm returns integers x, y, d such that gcd(a, b) = d = ax + by.

    function EXTENDED-EUCLID(a, b)
    Input: two positive integers a, b with a ≥ b
    Output: integers x, y, d such that d = gcd(a, b) = ax + by
      if b = 0 then return (1, 0, a)
      (x, y, d) = EXTENDED-EUCLID(b, a mod b)
      return (y, x − ⌊a/b⌋·y, d)

Modular division

In regular arithmetic, every a ≠ 0 has an inverse. In modular arithmetic, x is a multiplicative inverse of a if ax ≡ 1 (mod m). We denote x = a⁻¹; at most one such x modulo m exists (why?).

Proposition: For any a < m, a has a multiplicative inverse modulo m if and only if it is relatively prime to m. When this inverse exists, it can be found in time O(log³ m).

Use the Extended Euclid algorithm to find the inverse of a, if a and m are relatively prime.

Primality testing
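To make these procedures concrete, here is a minimal Python sketch of EUCLID, EXTENDED-EUCLID, and inverse-finding via the latter (the function names are mine):

```python
def euclid_gcd(a, b):
    """Euclid's rule: gcd(a, b) = gcd(b, a mod b); gcd(a, 0) = a."""
    if b == 0:
        return a
    return euclid_gcd(b, a % b)

def extended_euclid(a, b):
    """Return (x, y, d) with d = gcd(a, b) and a*x + b*y = d."""
    if b == 0:
        return (1, 0, a)
    # solve for (b, a mod b), then rearrange the certificate:
    # d = b*x + (a mod b)*y = a*y + b*(x - (a // b)*y)
    x, y, d = extended_euclid(b, a % b)
    return (y, x - (a // b) * y, d)

def mod_inverse(a, m):
    """Multiplicative inverse of a modulo m; exists iff gcd(a, m) = 1."""
    x, _, d = extended_euclid(a, m)
    if d != 1:
        raise ValueError("a and m are not relatively prime")
    return x % m
```

For example, `extended_euclid(1035, 759)` returns a certificate (x, y, 69) with 1035x + 759y = 69, matching the factorization example above.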
Theorem (Fermat's Little Theorem): If m is prime, then for every 1 ≤ a < m we have a^(m−1) ≡ 1 (mod m).

This is promising, but notice that it is not an equivalence statement. However, if m is composite and a is picked at random, it is unlikely that a^(m−1) ≡ 1 (mod m). This motivates a randomized algorithm, PRIMALITY1.

    PRIMALITY1(m)
    Input: a positive integer m
    Output: yes/no
      Pick an integer a < m at random
      if a^(m−1) ≡ 1 (mod m) then return yes
      else return no

Primality testing: analysis

PRIMALITY1 is a randomized algorithm. Properties:
- it may give an incorrect answer;
- one of the two possible outputs (yes/no) is always correct (which one?).

The algorithm belongs to the class of Monte Carlo algorithms with one-sided error.

Carmichael
numbers

Bad news for the PRIMALITY1 algorithm: there exist composite numbers m, named after Carmichael, which satisfy a^(m−1) ≡ 1 (mod m) for all values of a relatively prime to m. Fortunately, Carmichael numbers are extremely rare (the first three are 561, 1105, 1729), and we will ignore them for a moment.

Claim: If a^(m−1) ≢ 1 (mod m) for some a relatively prime to m, then this must hold for at least half the choices of a < m.

Primality testing: analysis

Assuming that there are no Carmichael numbers, we have:
- if m is prime, then a^(m−1) ≡ 1 (mod m) for all a < m;
- if m is composite, then a^(m−1) ≡ 1 (mod m) for at most half the values of a < m.

The algorithm PRIMALITY1 therefore has the following guarantee:
- Pr[PRIMALITY1 returns yes when m is prime] = 1;
- Pr[PRIMALITY1 returns yes when m is composite] ≤ 1/2.

Lower the error probability

Independent
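A sketch of the Fermat trial in Python follows (the function names and the repetition wrapper are mine; `pow(a, m - 1, m)` is Python's built-in fast modular exponentiation):

```python
import random

def primality1(m):
    """One Fermat trial on m >= 3: answer True ('probably prime') if
    a^(m-1) = 1 (mod m) for a single random base a.
    A False ('composite') answer is always correct."""
    a = random.randrange(1, m)        # pick 1 <= a < m at random
    return pow(a, m - 1, m) == 1      # fast modular exponentiation

def fermat_test(m, trials=20):
    """My wrapper: repeat the trial. Ignoring Carmichael numbers, a
    composite m survives all trials with probability <= 2^(-trials)."""
    return all(primality1(m) for _ in range(trials))
```

Note the one-sided error in action: a prime always answers yes, while a composite non-Carmichael number is caught with overwhelming probability once the trial is repeated.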
repetitions can reduce the error probability.

    PRIMALITY2(m)
    Input: a positive integer m
    Output: yes/no
      Pick integers a1, a2, ..., ak < m at random
      if ai^(m−1) ≡ 1 (mod m) for all i = 1, 2, ..., k then return yes
      else return no

Pr[PRIMALITY2 returns yes when m is composite] can be made arbitrarily small by choosing k large enough: it is at most 2^(−k).

Miller-Rabin primality test

This is an algorithm which handles the Carmichael numbers; see the Advanced Notes for details.

Complexity of Primes

Recent advances: the problem of deciding whether or not an integer is a prime can be solved in time polynomial in the number of bits necessary to represent the integer at hand (that is, roughly the logarithm of the integer itself).

References: M. Agrawal, N. Kayal, and N. Saxena, "PRIMES is in P".

Dynamic Programming

So far in this book we have seen some elegant principles for the design of algorithms: divide-and-conquer, graph exploration, greedy choice. These techniques do work for some problems, but they are certainly specialized; they are delicate tools. We now turn to the two sledgehammers of the algorithms craft: dynamic programming and linear programming. They are both techniques of very broad applicability, which can be
invoked when more specialized methods fail, but this generality often comes with a reduction in efficiency.

1 Shortest paths in dags, revisited

At the conclusion of our study of shortest paths, we observed that the problem is especially easy in directed acyclic graphs (dags). Let's recapitulate this case, because it lies at the heart of dynamic programming. The special distinguishing feature of a dag is that its nodes can be linearized; that is, they can be arranged on a line so that all edges go from left to right (Figure 1.1). To see why this helps with shortest paths, suppose we want to figure out distances from node S to the other nodes. For concreteness, let's focus on node D. The only way to get to it is through its predecessors, B or C; so to find the shortest path to D, we need only compare these two routes:

    dist(D) = min{dist(B) + 1, dist(C) + 3}.

A similar relation can be written for every node. If we compute these dist values in the left-to-right order of Figure 1.1, we can always be sure that by the time we get to a node v, we already have all the information we need to compute dist(v). We are therefore able to compute all distances in a single pass:

    initialize all dist(·) values to ∞
    dist(s) = 0
    for each v ∈ V \ {s}, in topological order:
        dist(v) = min over edges (u, v) ∈ E of {dist(u) + l(u, v)}

Notice that this algorithm is solving a collection of subproblems, {dist(u) : u ∈ V}. We start with the smallest of them, dist(s), since we immediately know its answer to be zero. We then proceed with progressively "larger" subproblems, distances to vertices further and further to the right. That is, we think of a subproblem as large if it is far along in the linearization, that is, if we need to have solved a lot of other subproblems before we can get to it.

This is a very general technique. At each node, we compute some function of the values of the node's predecessors. It so happens that our particular function is a minimum of sums, but we could just as well make it a maximum, in which case we would get longest paths in the dag. Or we could use a product instead of a sum inside the brackets,
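The single-pass computation above can be sketched in Python (a sketch; representing the dag by predecessor lists, with nodes already numbered in topological order, is my choice):

```python
import math

def dag_shortest_paths(n, preds, s):
    """Shortest distances from s in a dag whose nodes 0..n-1 are
    already in topological order. preds maps v to a list of
    (u, length) pairs, one per edge (u, v)."""
    dist = [math.inf] * n
    dist[s] = 0
    for v in range(n):                      # left-to-right order
        for u, length in preds.get(v, []):  # all predecessors known
            dist[v] = min(dist[v], dist[u] + length)
    return dist
```

On a small four-node dag, `dag_shortest_paths(4, {1: [(0, 1)], 2: [(0, 4), (1, 2)], 3: [(1, 6), (2, 1)]}, 0)` returns `[0, 1, 3, 4]`: by the time node 3 is reached, dist of both its predecessors is final.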
in which case we would end up computing the path with the smallest product of edge lengths.

Dynamic programming is a very powerful algorithmic paradigm in which a problem is solved by identifying a collection of subproblems and tackling them one by one, smallest first, using the answers to small problems to help figure out larger ones, until the whole lot of them are solved. In dynamic programming we are not given a dag; the dag is implicit. Its nodes are the subproblems we define, and its edges are the dependencies between the subproblems: if subproblem B needs the answer to subproblem A in order to be solved, then there is a (conceptual) edge from A to B. In this case A is thought of as a smaller subproblem than B, and it will always be smaller in an obvious sense.

But it's time we saw an example.

Figure 1.1: A dag and its topological ordering.

2 Longest increasing subsequences

(Connections: statistical tests for pseudorandomness; card games.)

In the longest increasing subsequence problem, the input is a sequence of numbers a1, ..., an. A subsequence is any subset of these numbers taken in order, something of the form a_i1, a_i2, ..., a_ik where 1 ≤ i1 < i2 < ... < ik ≤ n, and an increasing subsequence is one in which the numbers are getting strictly larger. The task is to find the increasing subsequence of greatest length. For instance, the longest increasing subsequence of 5, 2, 8, 6, 3, 6, 9, 7 is 2, 3, 6, 9:

    5  2  8  6  3  6  9  7

In this little example, arrows drawn above the sequence would denote transitions between consecutive elements in the optimal solution. More generally, to better understand the solution space, let's create a graph of all permissible transitions: establish a node i for each element ai, and add directed edges (i, j) whenever it is possible for ai and aj to be consecutive elements in an increasing subsequence, that is, whenever i < j and ai < aj (Figure 2.1). Notice that (1) this graph G = (V, E) is a dag, since it respects the linear order 1, 2, ..., n, and (2) there is a one-to-one correspondence between increasing subsequences and paths in this dag. Therefore, our goal is
simply to find the longest path in the dag. Here is the algorithm:

    for j = 1, 2, ..., n:
        L(j) = 1 + max{L(i) : (i, j) ∈ E}
    return max_j L(j)

L(j) is the length of the longest path (the longest increasing subsequence) ending at j, plus one, since strictly speaking we need to count nodes on the path, not edges. By reasoning in the same way as we did for shortest paths, we see that any path to node j must pass through one of its predecessors, and therefore L(j) is one plus the maximum L-value of

Figure 2.1: The dag of increasing subsequences.

these predecessors. If there are no edges into j, we take the maximum over the empty set, which is zero. And the final answer is the largest L(j), since any ending position is allowed.

This is dynamic programming. In order to solve our original problem, we have defined a collection of subproblems {L(j) : 1 ≤ j ≤ n} with the following key property, which allows them to be solved in a single pass:

    There is an ordering on the subproblems, and a relation which shows how to solve a subproblem given the answers to "smaller" subproblems, that is, subproblems which appear earlier in the ordering.

In our case, each subproblem is solved using the relation L(j) = 1 + max{L(i) : (i, j) ∈ E}, an expression which involves only smaller subproblems. How long does this step take? It requires the predecessors of j to be known; for this, the adjacency list of the reverse graph G^R, constructible in linear time, is handy (can you see how?). The computation of L(j) then takes time proportional to the indegree of j, giving an overall running time linear in |E|. This is at most O(n²), the maximum being when the input array is sorted in increasing order. Thus the dynamic programming solution is both simple and efficient.¹

There is one last issue to be cleared up: the L-values only tell us the length of the optimal subsequence, so how do we recover the subsequence itself? This is easy: while computing L(j), we should also note down prev(j), the predecessor through which the longest path to j was realized. The optimal subsequence can then be reconstructed by following these
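The L(j) computation, together with the prev(j) backpointers just described, can be sketched as follows (a minimal Python sketch; names are mine):

```python
def longest_increasing_subsequence(a):
    """L(j) = 1 + max{L(i) : i < j, a[i] < a[j]}; prev backpointers
    let us recover the subsequence itself, not just its length."""
    n = len(a)
    L = [1] * n
    prev = [None] * n
    for j in range(n):
        for i in range(j):                       # predecessors of j
            if a[i] < a[j] and L[i] + 1 > L[j]:
                L[j] = L[i] + 1
                prev[j] = i
    j = max(range(n), key=lambda k: L[k])        # best ending position
    seq = []
    while j is not None:                         # walk the backpointers
        seq.append(a[j])
        j = prev[j]
    return seq[::-1]
```

On the running example it recovers 2, 3, 6, 9, the subsequence named in the text.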
backpointers.

3 Edit distance

When a spell checker encounters a possible misspelling, it looks up its dictionary for other words that are close by. What is the appropriate notion of closeness in this case? A very natural measure of the distance between strings x and y is the minimum number of edits (insertions, deletions, and substitutions of characters) needed to transform x into y.

¹However, if one is willing to forgo some simplicity, a faster O(n log n) algorithm does exist (see Exercise).

Figure 2.2: The recursion tree for a sorted sequence of length five: L(5) calls L(1), ..., L(4), each of which recurses in turn, so the same subproblems appear over and over.

Recursion? No, thanks.

The formula for L(j) also suggests an alternative, recursive algorithm. Wouldn't that be even simpler? Actually, recursion is a very bad idea: the resulting procedure would require exponential time. To see why, suppose that the dag contains all edges (i, j) for all i < j, that is, the given sequence of numbers a1, ..., an is sorted. In that case the formula for subproblem L(j) becomes L(j) = 1 + max{L(1), ..., L(j − 1)}. Figure 2.2 unravels some of the initial stages of the recursion for a small instance, n = 5. The key point to notice is that the same subproblems get solved over and over again. In contrast, recursion was a perfect fit for divide-and-conquer. The reason is that in divide-and-conquer we decompose a problem into independent (disjoint) subproblems, so the hierarchy of subproblems is a tree (recall the recursion-tree figure of that chapter). In contrast, in dynamic programming the subproblems we invoke have many common subproblems: the hierarchy of subproblems is, as we know, a dag, with a potentially exponential number of paths (recursive calls).

For instance, the edit distance between "to" and "fro" is 2 (to → ro, a substitution; ro → fro, an insertion), and clearly one edit isn't enough to do the job. Notice that edit distance is symmetric, in the sense that it doesn't depend on the order of its two arguments.

Let's figure out how to compute this distance.

A dynamic programming solution. When solving a problem by dynamic programming, the most crucial question is: what are the subproblems? As long
as they are chosen so as to have the property we discussed earlier, it is an easy matter to write down the algorithm: iteratively solve one subproblem after the other, in order of increasing size.

Programming? The origin of the term "dynamic programming" has very little to do with writing code. It was first coined by Richard Bellman in the 1950s, a time when computer programming was an esoteric activity practiced by so few people as to not even merit a name. Back then programming meant "planning", and dynamic programming was conceived to optimally plan multistage processes. The dag of Figure 2.1 can be thought of as describing the possible ways in which such a process can evolve: each node denotes a state, the leftmost node is the starting point, and the edges leaving a state represent possible actions, leading to different states in the next unit of time. The etymology of linear programming, the subject of the next chapter, is similar.

Our goal is to find the edit distance of two strings, say EXPONENTIAL and POLYNOMIAL. What is a good subproblem? Well, it should go part of the way towards converting one string to the other: so how about looking at the smallest number of edits needed to transform some prefix of the first string, such as EXPO, into some prefix of the second, say POL? This is certainly plausible, but we also need to express large subproblems in terms of smaller ones. Let's think. The edit distance between EXPO and POL, call it E(4, 3), is the number of edits needed to convert the first string into the second one. There are many ways to perform this transformation. We could start at the left-hand side of EXPO and systematically move right, gradually converting to POL as we go along:

    EXPO → XPO (delete E) → PO (delete X) → POL (insert L)

Or we could move right-to-left, so that the next edit is always at the rightmost unmatched position of the source string:

    EXPO → EXPOL (insert L) → EPOL (delete X) → POL (delete E)

Or, equally, middle-out. What should we do? Right-to-left seems the best strategy: if we can at least partially
align the rightmost characters of the two strings, we should then be able to fall back on smaller subproblems, like EXP → POL or EXP → PO. Now, given that the first edit will be at the right end of EXPO, there are only three things it can be: (a) insert L; (b) delete O; (c) substitute O → L. In the first case, the remaining edits must convert EXPO to PO, which we would call subproblem E(4, 2). In case (b), the remaining subproblem is to convert EXP to POL, that is, E(3, 3). And in case (c), what's left is EXP → PO, that is, E(3, 2). In short, we have expressed E(4, 3) as one edit followed by one of the smaller subproblems E(4, 2), E(3, 3), E(3, 2). We don't know which one, so we just try them all and take the best:

    E(4, 3) = 1 + min{E(4, 2), E(3, 3), E(3, 2)}.

More generally, to compute the edit distance between two words, word x with m letters and word y with n letters, we define E(i, j) to be the edit distance between the prefixes x[1..i] and y[1..j]. Obviously, our final objective is to compute E(m, n). To figure out E(i, j), let's look at two cases. If the ith letter of x differs from the jth letter of y, the first operation in the transformation must be one of the following:

- delete x[i], in which case E(i, j) = E(i − 1, j) + 1; or
- insert y[j], implying E(i, j) = E(i, j − 1) + 1; or
- substitute x[i] → y[j], implying E(i, j) = E(i − 1, j − 1) + 1.

On the other hand, if the end characters of the two prefixes match, the first two options above remain legitimate, but for the third there is no increase in edit distance: it becomes E(i, j) = E(i − 1, j − 1). Therefore we can calculate E(i, j) as the minimum of these three possibilities. Specifically, we can express E(i, j) in terms of the smaller subproblems E(i − 1, j − 1), E(i − 1, j), and E(i, j − 1).

We are almost done. There just remain the base cases of the dynamic programming, the very smallest subproblems. In the present situation these are E(0, ·) and E(·, 0), both of which are easily solved. E(0, j) is the fastest way to convert the 0-length prefix of x, namely the empty string, to the first j letters of y: clearly, j insertions. And similarly E(i, 0) = i, for i deletions. At this point, the algorithm for edit distance basically writes itself. For convenience, let diff(i, j) be 1 if the ith
letter of x differs from the jth letter of y, and 0 otherwise. Then:

    for i = 0, 1, ..., m:  E(i, 0) = i
    for j = 1, 2, ..., n:  E(0, j) = j
    for i = 1, 2, ..., m:
        for j = 1, 2, ..., n:
            E(i, j) = min{E(i − 1, j) + 1, E(i, j − 1) + 1, E(i − 1, j − 1) + diff(i, j)}
    return E(m, n)

The collection of subproblems forms a two-dimensional table E(i, j), and this procedure fills in the table column by column, top to bottom and left to right (Figure 3.1). Actually, there are many valid orderings; we just need to make sure that we always get to (i − 1, j − 1), (i − 1, j), and (i, j − 1) before (i, j). For instance, we could fill the table row by row, or in diagonals of increasing i + j. Each entry takes constant time to fill in, so the overall running time is just the size of the table, O(mn). And in our example, the edit distance turns out to be 6, realized for instance by

    EXPONENTIAL → XPONENTIAL → PONENTIAL → POLENTIAL → POLYNTIAL → POLYNOTIAL → POLYNOMIAL.

The underlying dag. Every dynamic program has an underlying dag structure: think of each node as representing a subproblem, and each edge as a precedence constraint on the order in which the subproblems can be tackled. Having nodes u1, ..., uk point to v means "subproblem v can be solved once the answers to u1, ..., uk are known." In our present edit distance application, the nodes of the underlying dag correspond to subproblems, or equivalently to positions (i, j) in the table. Its edges are the precedence constraints, of the form (i − 1, j) → (i, j), (i, j − 1) → (i, j), and (i − 1, j − 1) → (i, j) (Figure 3.1).

Figure 3.1: The edit distance between EXPONENTIAL and POLYNOMIAL. Left: the final table of values found by dynamic programming. Right: the underlying dag, and a shortest path through it.

Of mice and men. (Discussion of genomics and the extra tricks used in BLAST.)

In fact, we can take things a little further, and put weights on the edges so that the edit distances are given
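The table-filling procedure for E(i, j) transcribes directly into Python (a minimal sketch; the function name is mine):

```python
def edit_distance(x, y):
    """E(i,j) = min{E(i-1,j)+1, E(i,j-1)+1, E(i-1,j-1)+diff(i,j)},
    with base cases E(i,0) = i and E(0,j) = j."""
    m, n = len(x), len(y)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][0] = i                      # i deletions
    for j in range(n + 1):
        E[0][j] = j                      # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            E[i][j] = min(E[i - 1][j] + 1,          # delete x[i]
                          E[i][j - 1] + 1,          # insert y[j]
                          E[i - 1][j - 1] + diff)   # match/substitute
    return E[m][n]
```

Each entry takes constant time, so the whole table costs O(mn), as noted above.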
by shortest paths in the dag. To see this, set all edge lengths to 1, except for the diagonal edges {(i − 1, j − 1) → (i, j) : x[i] = y[j]} (shown dotted in the figure), whose length is 0. The final answer is then simply the distance between nodes s = (0, 0) and t = (m, n). One possible shortest path is shown, the one that yields the alignment we found earlier. On this path, each move down is a deletion, each move right is an insertion, and each diagonal move is either a match or a substitution. By altering the weights on this dag, we can allow generalized forms of edit distance, in which insertions, deletions, and substitutions have different associated costs.

4 Knapsack

During a robbery, a burglar finds much more loot than he had expected, and has to decide what to take. His bag (or "knapsack") will hold a total weight of at most W pounds. There are n items to pick from, of weight w1, ..., wn and dollar value v1, ..., vn. What's the most valuable combination of items he can fit into his bag?² For instance, take W = 10 and

    Item  Weight  Value
    1     6       $30
    2     3       $14
    3     4       $16
    4     2       $9

²If this application seems frivolous, replace "weight" with "CPU time" and "only W pounds can be taken" with "only W units of CPU time are available". Or use bandwidth instead of CPU time, etc. The knapsack problem generalizes a wide variety of resource-constrained selection tasks.

There are two versions of this problem. If there are unlimited quantities of each item available, the optimal choice is to pick item 1 and two of item 4 (total: $48). On the other hand, if there is one of each item (the burglar has broken into an art gallery, say), then the optimal knapsack contains items 1 and 3 (total: $46). As we shall see a few chapters down the line, neither version of this problem is likely to have a polynomial-time algorithm. However, using dynamic programming, they can both be solved in O(nW) time, which is reasonable when W is small, but is not polynomial, since the input size is proportional to log W, not W.

Let's start with the version which allows repetition. As always, the main question in dynamic programming is: what are the subproblems? Well, in
this case we can shrink the original problem in two ways: we can either look at smaller knapsack capacities w ≤ W, or we can look at fewer items (for instance, items 1, 2, ..., j, for j ≤ n). It usually takes a little experimentation to figure out exactly what works.

The first restriction calls for smaller capacities. Accordingly, define

    K(w) = maximum value achievable with a knapsack of capacity w.

Can we express this in terms of smaller subproblems? Well, if the optimal solution to K(w) includes item i, then removing this item from the knapsack should yield an optimal solution to K(w − wi). Of course, we don't know i, so let's try all possibilities:

    K(w) = max over i with wi ≤ w of {K(w − wi) + vi},

where, as usual, our convention is that the maximum over an empty set is zero. We're done! The algorithm now writes itself, and it is characteristically simple and elegant:

    K(0) = 0
    for w = 1 to W:
        K(w) = max{K(w − wi) + vi : wi ≤ w}
    return K(W)

This algorithm fills in a one-dimensional table of length W + 1, in left-to-right order. Each entry can take up to O(n) time to compute, so the overall running time is O(nW).

As always, we can also construct the underlying dag. Doing so gives us a startling insight: this particular variant of knapsack is nothing other than longest path in a dag! (Check: the dag of this dynamic program is very simple. The vertices are all integers between 0 and W, and there is an edge from w' to w whenever w = w' + wi for some i ≤ n.)

Onto the second variant: what if repetitions are not allowed? Our earlier subproblems now become completely useless, because knowing that the value K(W − wn) is very high doesn't help us: we don't know whether or not item n already got used up in this partial solution. We must therefore refine our concept of a subproblem. Our subproblems need to carry additional information about the items being used. We add a second parameter, 0 ≤ j ≤ n:

    K(w, j) = maximum value achievable using a knapsack of capacity w and items 1, ..., j.

The answer we seek is K(W, n). How can we express a subproblem K(w, j) in terms of smaller subproblems? Quite simply: either item j is needed to
achieve the optimal value, or it isn't:

    K(w, j) = max{K(w − wj, j − 1) + vj, K(w, j − 1)},

where the first case is invoked only if wj ≤ w. In other words, we can express K(w, j) in terms of subproblems K(·, j − 1). The algorithm then consists of filling out a two-dimensional table, with W + 1 rows and n + 1 columns. Each table entry takes just constant time, so even though the table is much larger than in the previous case, the running time remains the same, O(nW). Here's the code:

    initialize all K(0, j) = 0 and all K(w, 0) = 0
    for j = 1 to n:
        for w = 1 to W:
            if wj > w:  K(w, j) = K(w, j − 1)
            else:  K(w, j) = max{K(w, j − 1), K(w − wj, j − 1) + vj}
    return K(W, n)

5 Chain matrix multiplication

Suppose that we want to multiply four matrices, A × B × C × D, of dimensions 50 × 20, 20 × 1, 1 × 10, and 10 × 100, respectively. This will involve iteratively multiplying two matrices at a time. Matrix multiplication is not commutative (in general, A × B ≠ B × A), but it is associative, which means, for instance, that A × (B × C) = (A × B) × C. Thus we can compute our product of four matrices in many different ways, depending on how we parenthesize it. Are some of these better than others?

Multiplying an m × n matrix by an n × p matrix takes mnp multiplications (to a good enough approximation). Using this formula, let's compare several different ways of evaluating A × B × C × D:

    Parenthesization      Cost computation                        Cost
    A × ((B × C) × D)     20·1·10 + 20·10·100 + 50·20·100         120,200
    (A × (B × C)) × D     20·1·10 + 50·20·10 + 50·10·100          60,200
    (A × B) × (C × D)     50·20·1 + 1·10·100 + 50·1·100           7,000

As you can see, the order of multiplications makes a big difference in the final running time! Moreover, the natural greedy approach (always perform the cheapest matrix multiplication available) leads to the second parenthesization shown here, and is therefore a failure.

How do we determine the optimal order, if we want to compute A1 × A2 × ... × An, where the Ai's are matrices with dimensions m0 × m1, m1 × m2, ..., m(n−1) × mn, respectively? The first thing to notice is that a particular parenthesization can be represented very naturally by a binary tree, in which the individual matrices
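Both knapsack variants above can be sketched in Python (minimal sketches; the function names are mine):

```python
def knapsack_unbounded(W, weights, values):
    """With repetition: K(w) = max{K(w - w_i) + v_i : w_i <= w}."""
    K = [0] * (W + 1)
    for w in range(1, W + 1):
        K[w] = max((K[w - wi] + vi
                    for wi, vi in zip(weights, values) if wi <= w),
                   default=0)           # max over empty set is zero
    return K[W]

def knapsack_01(W, weights, values):
    """Without repetition: K(w, j) uses only items 1..j."""
    n = len(weights)
    K = [[0] * (n + 1) for _ in range(W + 1)]
    for j in range(1, n + 1):
        wj, vj = weights[j - 1], values[j - 1]
        for w in range(1, W + 1):
            if wj > w:
                K[w][j] = K[w][j - 1]
            else:
                K[w][j] = max(K[w][j - 1], K[w - wj][j - 1] + vj)
    return K[W][n]
```

On the burglar's example (W = 10, weights 6, 3, 4, 2, values $30, $14, $16, $9), the first returns 48 and the second 46, matching the two optimal knapsacks described earlier.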
correspond to the leaves, the root is the final product, and interior nodes are intermediate products (Figure 5.1). The possible orders in which to do the multiplication correspond to the various binary trees with n leaves, whose number is exponential in n (check!). We certainly cannot try each tree, and with brute force thus ruled out, we turn to dynamic programming.

The binary trees of Figure 5.1 are suggestive: for a tree to be optimal, its subtrees must also be optimal. What are the subproblems corresponding to subtrees? They are products of the form A_i × A_{i+1} × ··· × A_j. Let's see if this works: for 1 ≤ i ≤ j ≤ n, we define

C(i, j) = minimum cost of multiplying A_i × A_{i+1} × ··· × A_j.

The size of this subproblem is the number of matrix multiplications, j - i. The smallest subproblem is when i = j, in which case there's nothing to multiply, so C(i, i) = 0. For j > i, consider the optimal subtree for C(i, j). The first branch in this subtree, the one at the top, will split the product in two pieces, of the form A_i × ··· × A_k and A_{k+1} × ··· × A_j, for some k between i and j. The cost of the subtree is then the cost of these two partial products, plus the cost of combining them: C(i, k) + C(k + 1, j) + m_{i-1}·m_k·m_j. And we just need to find the splitting point k for which this is smallest:

C(i, j) = min_{i ≤ k < j} {C(i, k) + C(k + 1, j) + m_{i-1}·m_k·m_j}.

We are ready to code! In the following, the variable s denotes subproblem size.

for i = 1 to n: C(i, i) = 0
for s = 1 to n - 1:
    for i = 1 to n - s:
        j = i + s
        C(i, j) = min{C(i, k) + C(k + 1, j) + m_{i-1}·m_k·m_j : i ≤ k < j}
return C(1, n)

The subproblems constitute a two-dimensional table, each of whose entries can take O(n) time to compute. The overall running time is then O(n³).

6 Shortest Paths

We started this chapter with the most elementary, stripped-down dynamic programming algorithm: finding shortest paths in dags. Not surprisingly, it turns out that more elaborate dynamic programming algorithms help solve quite sophisticated shortest-path problems.

Shortest Reliable Paths. Life is complicated, and abstractions such as graphs, edge lengths, and shortest paths rarely capture the whole truth. For example, in a communications network,
even though edge lengths may faithfully reflect transmission delays, there may be other considerations involved in choosing a path. For example, each extra edge in the path may be an extra "hop," fraught with uncertainties and dangers of packet loss. For this reason, we may avoid paths that have too many edges. For example, in the graph of Figure 2, the shortest path from s to t has four edges, while another path, a little longer but with two edges, exists. If four edges translate to prohibitive unreliability, we may have to choose the latter.

Suppose then that we are given a graph G with lengths on the edges, two nodes s and t, and an integer Q, and we want the shortest path from s to t with Q or fewer edges. One could try to adapt Dijkstra's algorithm to this new task. The problem is that Dijkstra's algorithm focuses on the dist information, the length of the shortest path to the current vertex. It does not remember the number of hops in that path, presently a crucial piece of information. In dynamic programming, the trick is to choose subproblems so that all vital information is remembered and carried forward. In this case, let us define, for each vertex v and each integer i ≤ Q, dist(v, i) to be the length of the shortest path from s to v that has i edges. Initially, dist(v, 0) is ∞ for all vertices except s, for which it is 0. The update equation is, naturally enough,

dist(v, i) = min_{(u, v) ∈ E} {dist(u, i - 1) + l(u, v)}.

Need we say more?

All-pairs Shortest Paths. What if we want to find the shortest path between all pairs of vertices u and v? One way would be to execute the Bellman-Ford algorithm (since there may be negative edges) |V| times, once for each starting node u, for a total of O(|V|²|E|). A better algorithm, called the Floyd-Warshall algorithm, takes time O(|V|³) and is based on dynamic programming.

We want to find the distances between all pairs of vertices in a graph. What is a subproblem of this? Solving the problem for more and more pairs or starting points is not a good idea; it leads to the O(|V|²|E|) algorithm. One idea comes to mind: the shortest path u, w_1, w_2, …,
w_k, v between u and v uses several intermediate nodes, possibly zero of them. When no intermediate nodes are allowed, we know the answer: direct edges, whenever they exist. What if we allow more and more intermediate nodes? This works, but it would be a |V|-fold repetition of the shortest reliable paths algorithm of the previous subsection (with Q = |V| - 1), for a total of O(|V|⁴) steps; even the |V| repetitions of Bellman-Ford beat that.

Here is a more radical idea for building upon the "direct edges" solution: suppose that we restrict the set I of vertices that can be used as intermediate nodes in a path. In the beginning, I is empty and only direct edges are allowed. Then we add to I the nodes of the graph, one by one, updating the shortest path lengths. When I = V, all vertices are allowed to be on all paths, and we have found the true shortest paths between all vertices of the graph.

To be more concrete, suppose that we have numbered the vertices in V as 1, 2, …, n, and let dist(i, j, k) denote the length of the shortest path from i to j using as intermediate nodes only nodes from the set {1, 2, …, k}. Initially, dist(i, j, 0) is the length of the direct edge between i and j, if it exists, and is ∞ otherwise.

What happens when we add a new node k to I? We must reexamine all pairs i and j and see whether using k as a stop makes the shortest path from i to j any shorter. But this is easy to check: a shortest path from i to j that uses k, along with possibly other lower-numbered intermediate nodes, goes through k just once (why? because we assume that there are no negative cycles). And we have already calculated the length of the shortest path from i to k and from k to j using only lower-numbered vertices. That is, we just need to check whether dist(i, k, k - 1) + dist(k, j, k - 1) is smaller than dist(i, j, k - 1); if so, dist(i, j, k) must be updated. Here is the Floyd-Warshall algorithm:

for i = 1 to n:
    for j = 1 to n:
        if (i, j) ∈ E: dist(i, j, 0) = l(i, j)
        else: dist(i, j, 0) = ∞
for k = 1 to n:
    for i = 1 to n:
        for j = 1 to n:
            dist(i, j, k) = min{dist(i, k, k - 1) + dist(k, j, k - 1), dist(i,
j, k - 1)}

As promised, the algorithm takes O(|V|³) time.

The Traveling Salesman Problem. A traveling salesman is getting ready for a big sales tour. Starting at his hometown, suitcase in hand, he will conduct a journey in which each of his target cities is visited exactly once before he returns home. Given the pairwise distances between cities, what is the best order in which to visit them, so as to minimize the overall distance traveled?

Denote the cities by 1, …, n, the salesman's hometown being 1, and let D = (d_ij) be the matrix of intercity distances. The goal is to design a tour which starts and ends at 1, includes all other cities exactly once, and has minimum total length. Figure 6.1 shows an example involving five cities. Can you spot the optimal tour? Even in this tiny example, it is tricky for a human to find the solution; imagine what happens when hundreds of cities are involved.

It turns out this problem is also difficult for computers. In fact, the traveling salesman problem (TSP) is one of the most notorious computational tasks. There is a long history of attempts at solving it, a long saga of failures and partial successes, and, along the way, major advances in algorithms and complexity theory. The most basic piece of bad news about the TSP, which we will better understand in a couple of chapters, is that it is highly unlikely to be solvable in polynomial time.

How long does it take, then? Well, the brute-force approach is to evaluate every possible tour and return the best one. Since there are (n - 1)! possibilities, this strategy takes O(n!) time. We will now see that dynamic programming yields a much faster solution, though not a polynomial one.

What is the appropriate subproblem for the TSP? Subproblems refer to partial solutions, and in this case the most obvious partial solution is the initial portion of a tour. Suppose we have started at city 1 as required, have visited a few cities, and are now in city j. What information do we need in order to extend this partial tour? We certainly need to know j, since this will determine which
cities are most convenient to visit next. And we also need to know all the cities visited so far, so that we don't repeat any of them. Here, then, is an appropriate subproblem.

For a subset of cities S ⊆ {1, 2, …, n} which includes 1, and j ∈ S, let C(S, j) be the length of the shortest path visiting each node in S exactly once, starting at 1 and ending at j.

Now let's express C(S, j) in terms of smaller subproblems. We need to start at 1 and end at j; what should we pick as the second-to-last city? It has to be some i ∈ S, in which case the overall path length is the distance from 1 to i, namely C(S - {j}, i), plus the length of the final edge, d_ij. We must pick the best such i:

C(S, j) = min_{i ∈ S, i ≠ j} {C(S - {j}, i) + d_ij}.    (6.1)

The subproblems are ordered by |S|. Here's the code.

for j = 2 to n: C({1, j}, j) = d_1j
for s = 3 to n:
    for all subsets S ⊆ {1, 2, …, n} of size s and containing 1:
        for all j ∈ S, j ≠ 1:
            C(S, j) = min{C(S - {j}, i) + d_ij : i ∈ S, i ≠ j}
return min_{j ≠ 1} {C({1, …, n}, j) + d_j1}

There are at most 2^n · n subproblems, and each one takes linear time to solve. The total running time is therefore O(n² 2^n).

7 Independent sets in trees

A subset of nodes S ⊆ V is an independent set of graph G = (V, E) if there are no edges between them. For instance, in Figure 7.1 the nodes {1, 5} are an independent set, but the nodes {1, 4, 5} are not, because of the edge between 4 and 5. The largest independent set is {2, 3, 6}.

Like several other problems we have seen in this chapter (knapsack, traveling salesman problem), finding the largest independent set in a graph is believed to be intractable. However, when the graph happens to be a tree, the problem can be solved in linear time, using dynamic programming. And what are the appropriate subproblems? Already in the chain matrix multiplication problem, we noticed that the layered structure of a tree provides a natural definition of a subproblem, as long as one node of the tree has been identified as a root.

Here's the algorithm: start by rooting the tree at any node r. Now each node defines a subtree, the one hanging from it. This immediately suggests subproblems:

I(u) = the size of the largest independent set of the subtree hanging from u.

Our final goal
is I(r).

Dynamic programming proceeds, as always, from smaller subproblems to larger ones; that is to say, bottom-up in the rooted tree. Suppose we know the largest independent sets for all subtrees below a certain node u; in other words, suppose we know I(w) for all descendants w of u. How can we compute I(u)? Let's split the computation into two cases: any independent set either includes u or it doesn't (Figure 7.2):

I(u) = max{1 + Σ_{grandchildren w of u} I(w),  Σ_{children w of u} I(w)}.

If the independent set includes u, then we get one point for it, but we aren't allowed to include the children of u; therefore we move on to the grandchildren. This is the first case in the formula. On the other hand, if we don't include u, then we don't get a point for it, but we can move on to its children.

The number of subproblems is exactly the number of vertices. With a little care, the running time can be made linear, O(|V|).

Memoization

In dynamic programming, we write out a recursive formula which expresses large problems in terms of smaller ones, and then use it to fill out a table of solution values in a bottom-up manner, from smallest subproblem to largest. The formula also suggests a recursive algorithm, but we saw earlier that naive recursion can be terribly inefficient, because it solves the same subproblems over and over again. What about a more intelligent recursive implementation, one that remembers its previous invocations and thereby avoids repeating them?

Such an algorithm would use hashing to store the values of K(w) which have already been computed. At each recursive call requiring K(w), the algorithm would first check if the answer is already in the hash table, and proceed to its calculation only when it is not. This trick is called memoization.

knapsack(W):
declare a hash table H, initially empty, containing values of K(w) indexed by w
function knapsack(w):
    if w is in H: return K(w)
    else:
        K(w) = max{knapsack(w - w_i) + v_i : w_i ≤ w}
        insert K(w) into H, with key w
        return K(w)

Since this algorithm never repeats a subproblem, its running time is O(nW), just like the dynamic
program. However, the constant factor in this big-O is substantially larger, because of the overhead of recursion.

In some cases, though, memoization pays off. Here's why: dynamic programming automatically solves every subproblem that could conceivably be needed, while memoization only ends up solving the ones that are actually used. For instance, suppose that W and all the weights w_i are multiples of 100. Then any subproblem K(w) is useless unless 100 divides w. The memoized recursive algorithm will never look at these extraneous table entries.

Figure 5.1: Three parenthesizations of A × B × C × D, drawn as binary trees.

On Time and Memory

The amount of time it takes to run a dynamic programming algorithm is easy to discern, once you know the dag of the subproblems: it is the total number of edges in the dag! The reason is simple: all we do in dynamic programming is visit the nodes in linearized order, and at each node examine, one by one, the edges coming into it, typically doing constant work for each edge. This way, each edge is examined once.

But how much computer memory is required? There is no simple parameter of the dag that characterizes the memory requirements of the algorithm, but usually one can get away with much less memory than one word for every subproblem (vertex in the dag). The reason is that the space used to store the value of a subproblem can be released for reuse as soon as all larger subproblems that depend on this subproblem have been solved. For example, in the Floyd-Warshall algorithm, the value of dist(i, j, k) is not needed after we are done computing the dist(·, ·, k + 1) values. Therefore one needs only two |V| × |V| arrays to store the dist values, one for odd values of k and one for even values: when computing dist(i, j, k), we overwrite the value of dist(i, j, k - 2).

And let us not forget that, as always in dynamic programming, we also need one more array, prev(i, j), storing the next-to-last vertex in the current shortest path from i to j, a value that must be updated along with dist(i, j). We omit this
mundane but crucial bookkeeping step from our dynamic programming algorithms. (Can you see why the edit distance dag in Figure 3.1 only needs memory equal to the length of the shorter of the two strings?)

Figure 6.1: The optimal traveling salesman tour has length ten.

Figure 7.1: The largest independent set in this graph has size three.

Design and Analysis of Algorithms, Lectures 36 & 37
Edyta Szymańska (edyta@cc.gatech.edu)
CS3510 A, Fall 2005

Subset sum (a numerical problem)

Given a sequence of integers a_1, …, a_n and a parameter k, decide whether there is a subset of the integers whose sum is exactly k. Formally: does there exist a subset I ⊆ {1, …, n} such that

Σ_{i ∈ I} a_i = k?

Theorem. Subset Sum is NP-complete.

Proof: reduction from Vertex Cover.

From Vertex Cover to Subset Sum

Idea of the reduction:
- Start from a graph G and a parameter k.
- Create a set of integers and a parameter k'.
- Prove that G has a vertex cover of size k if and only if there is a subset of the integers that sums to k'.
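As a preview of the construction defined on the next slides, here is a sanity check of the cover-to-subset direction on the slides' running example (the graph on vertices 1..5 with edges 12, 15, 25, 23, 35, 45, 34 and k = 3); the cover {2, 3, 5} and all function names are our own choices:

```python
# Example graph from the slides: vertices 1..5, m = 7 edges, cover size k = 3.
E = [(1, 2), (1, 5), (2, 5), (2, 3), (3, 5), (4, 5), (3, 4)]
k = 3
m = len(E)

# Edge number p occupies base-4 digit position m - 1 - p; column X is position m.
def a(i):                       # integer for vertex i
    return 4**m + sum(4**(m - 1 - p) for p, e in enumerate(E) if i in e)

def b(p):                       # integer for edge number p
    return 4**(m - 1 - p)

k_prime = k * 4**m + sum(2 * 4**l for l in range(m))

# A vertex cover C of size k, together with the edges having exactly one
# endpoint in C, should sum to exactly k_prime.
C = [2, 3, 5]
one_end = [p for p, (u, v) in enumerate(E) if (u in C) != (v in C)]
total = sum(a(i) for i in C) + sum(b(p) for p in one_end)
print(total == k_prime)   # → True
```

Each edge column receives a 2 (from two covered endpoints, or from one endpoint plus the edge integer), and column X counts the k chosen vertices, so the sum matches k'.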
From Vertex Cover to Subset Sum

Let G be given, with V = {1, 2, …, n} and some k. We want a construction with the following properties:
- integers in two groups: a_i's for the vertices i ∈ V, and b_ij's for the edges (i, j) ∈ E;
- a parameter k';
- if we have a subset I of the integers that sums up to k', then the a_i's in I correspond to a vertex cover C in G, and the b_ij's correspond to edges of G which have exactly one endpoint in C;
- the construction will force |C| = k.

Defining the integers:
- Create a matrix with n + m rows, corresponding to all the a_i's and b_ij's, and m + 1 columns, corresponding to the b_ij's plus one extra column X.
- In column X, put a 1 for all a_i's, and a 0 otherwise.
- In column (i, j), put a 1 for a_i, a_j and b_ij, and 0 elsewhere.
- Define k' = k · 4^m + Σ_{l=0}^{m-1} 2 · 4^l.

Each row is the representation of an integer in base 4.

Example: V = {1, …, 5}, E = {12, 15, 25, 23, 35, 45, 34} (so m = 7), k = 3:

        X  12  15  25  23  35  45  34    value
a_1     1   1   1   0   0   0   0   0    21504
a_2     1   1   0   1   1   0   0   0    20800
a_3     1   0   0   0   1   1   0   1    16465
a_4     1   0   0   0   0   0   1   1    16389
a_5     1   0   1   1   0   1   1   0    17684
b_12    0   1   0   0   0   0   0   0     4096
b_15    0   0   1   0   0   0   0   0     1024
b_25    0   0   0   1   0   0   0   0      256
b_23    0   0   0   0   1   0   0   0       64
b_35    0   0   0   0   0   1   0   0       16
b_45    0   0   0   0   0   0   1   0        4
b_34    0   0   0   0   0   0   0   1        1

k' = 3 · 4^7 + 2 · (4^6 + ··· + 4^0) = 60074.

Reduction correctness: from Covers to Subsets
- Suppose there exists a vertex cover C with |C| = k.
- Choose the a_i's for i ∈ C, and the b_ij's for the edges with exactly one endpoint in C.
- When we sum up these integers in base 4, we get a 2 in every digit except for column X; there we get a 1, k times.
- Thus the sum is k'.

Reduction correctness: from Subsets to Covers
- Suppose we find a subset C ⊆ V and E' ⊆ E such that

  Σ_{i ∈ C} a_i + Σ_{(i, j) ∈ E'} b_ij = k'.

- Observe that we never have a carry in the m less significant digits: there are at most 3 ones in every column (that's why base 4 works).
- The b_ij's can contribute at most 1 in every column, and k' has a 2 in all the m less significant digits; thus for every (i, j) ∈ E we have either i ∈ C or j ∈ C, and C is a cover.
- For every i, a_i ≥ 4^m, and k' gives a quotient of k when divided by 4^m. Thus C cannot contain more than k elements.

Unary subset sum is in P

Unary Subset Sum: given integers a_1, a_2, …, a_n and an integer K, all in unary, is there a subset of these integers that sums exactly to K?

Input size: n + K bits.

Solution: dynamic programming (see the algorithm for the Knapsack problem)
will solve it in time O(n²K). O(n²K) is polynomial in the size of the input in unary, but exponential if K is given in binary.

More NP-complete problems

CSAT, 3SAT, Independent Set, Hamiltonian Cycle, 3D Matching, Vertex Cover, Clique, TSP, Set Cover, Knapsack, Dominating Set.

See Computers and Intractability: A Guide to the Theory of NP-Completeness, by Michael R. Garey and David S. Johnson, W. H. Freeman, 1979.

An Epilogue: Undecidability

Are there problems that are not even in NP? YES. There are problems for which there are NO ALGORITHMS at all.

Example: there is a barber in a town who shaves every man who doesn't shave himself. Who shaves the barber?

The halting problem

Consider the task of writing the following Boolean function, term(P, X), where
- P is a program (in the same language), and
- X is a data file.

term(P, X) returns TRUE if program
P with input file X eventually terminates, and FALSE if program P loops on file X forever. Is this task possible to perform?

The Halting Problem

NO. Proof: suppose that we have written such a Boolean function and used it in the following program:

Boolean function diag(P):
    if term(P, P) then loop forever

Run diag(diag). Does it terminate? It does if it does not!

Design and Analysis of Algorithms, Lectures 25 & 26
Edyta Szymańska (edyta@cc.gatech.edu)
CS3510 A, Fall 2005

Edit distance

Application: spell checkers.

    ocurrance
    Perhaps you mean: occurrence?

"ocurrance" and "occurrence" are close by. How to define closeness? The minimum number of edits (insertions, deletions and substitutions of characters) needed to transform string x into string y.
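This definition translates directly into a naive recursion on the last characters of the two strings; a minimal sketch (the function name edits is ours):

```python
def edits(x: str, y: str) -> int:
    """Minimum number of insertions, deletions and substitutions of
    characters needed to transform x into y (naive recursion)."""
    if not x:
        return len(y)            # insert every remaining character of y
    if not y:
        return len(x)            # delete every remaining character of x
    if x[-1] == y[-1]:           # last characters already match
        return edits(x[:-1], y[:-1])
    return 1 + min(edits(x[:-1], y),         # delete the last character of x
                   edits(x, y[:-1]),         # insert the last character of y
                   edits(x[:-1], y[:-1]))    # substitute it

print(edits("to", "fro"))   # → 2
```

This recursion recomputes the same prefix pairs over and over; the slides develop exactly this observation into a table-filling dynamic program.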
Example 1: d(to, fro) = 2: to → fo (substitution) → fro (insertion).

Edit distance

Solution: dynamic programming. What are the subproblems?

Example 2: d(exponential, polynomial). A possible subproblem: the smallest number of edits needed to transform some prefix of x, say "expo", into some prefix of y, say "pol".

How to transform "expo" into "pol"?
- left to right
- right to left
- middle-out

Edit distance

INPUT: two words, x[1 … m] and y[1 … n].
OUTPUT: E(m, n), where E(i, j) is the edit distance between the prefixes x[1 … i] and y[1 … j].

To figure out E(i, j), compare x[i] and y[j]. If x[i] ≠ y[j], then we have
one of the following operations:
- delete x[i]: E(i, j) = E(i - 1, j) + 1;
- insert y[j]: E(i, j) = E(i, j - 1) + 1;
- substitute x[i] → y[j]: E(i, j) = E(i - 1, j - 1) + 1.

Otherwise x[i] = y[j], and we have E(i, j) = E(i - 1, j - 1).

Base cases: E(0, j) = j and E(i, 0) = i.

Let diff(i, j) = 1 if x[i] ≠ y[j], and 0 otherwise.

Algorithm for edit distance

EDIT-DISTANCE(x, y)
for i = 1 to m: E(i, 0) = i
for j = 1 to n: E(0, j) = j
for i = 1 to m:
    for j = 1 to n:
        E(i, j) = min{E(i - 1, j) + 1, E(i, j - 1) + 1, E(i - 1, j - 1) + diff(i, j)}
return E(m, n)

The collection of subproblems forms a two-dimensional table E(i, j). Running time: O(mn).

Edit distance solution
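The table-filling algorithm above translates line for line into runnable code; a sketch in Python (the names E and diff follow the slides):

```python
def edit_distance(x: str, y: str) -> int:
    m, n = len(x), len(y)
    # E[i][j] = edit distance between the prefixes x[1..i] and y[1..j]
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][0] = i                            # delete all of x[1..i]
    for j in range(n + 1):
        E[0][j] = j                            # insert all of y[1..j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            E[i][j] = min(E[i - 1][j] + 1,         # delete x[i]
                          E[i][j - 1] + 1,         # insert y[j]
                          E[i - 1][j - 1] + diff)  # substitute, or free match
    return E[m][n]

print(edit_distance("exponential", "polynomial"))   # → 6
```

Each of the mn entries is computed in constant time from its three neighbors, matching the O(mn) bound.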
[Slide figure: the table E(i, j) filled in for Example 2.]

The underlying dag

Every dynamic programming algorithm has an underlying dag structure.

Nodes: the positions (i, j). Edges: the precedence constraints

(i - 1, j) → (i, j),   (i, j - 1) → (i, j),   (i - 1, j - 1) → (i, j).

[Slide figure: the dag of positions for x = EXPONENTIAL, y = POLYNOMIAL.]

In fact, we can reduce the edit distance problem to shortest paths in this dag, with weights on the edges. How?

Knapsack

INPUT: n items of weights w_1, …, w_n and values v_1, …, v_n, and a knapsack of fixed capacity W.
OUTPUT: the most valuable subset S of items that fits into the given knapsack, i.e.,
S maximizes Σ_{i ∈ S} v_i subject to the restriction Σ_{i ∈ S} w_i ≤ W.

The problem has more parameters than our previous examples (longest increasing subsequence, edit distance).

Example: W = 10.

item   weight   value
 1        6       30
 2        3       14
 3        4       16
 4        2        9

Version I: unlimited quantities of each item are available. Optimal solution: 1 × 30 + 2 × 9 = 48.

Version II: only one of each item is available. Optimal solution: 30 + 16 = 46.

Neither version of this problem is likely to have a polynomial-time algorithm.

Knapsack with repetitions

Can a greedy method find an optimal solution for this problem?

Subproblems:
- smaller knapsack capacities w ≤ W;
- fewer items: {1, 2, …, j}, j ≤ n.

Define
K(w) = maximum value achievable with a knapsack of capacity w.

How can we express K(W) in terms of smaller subproblems? If the optimal solution to K(W) includes item i, then removing it from the knapsack should yield an optimal solution. Trying all possibilities for i, we get

K(w) = max_{i : w_i ≤ w} {K(w - w_i) + v_i}.

KNAPSACK(n, w, v, W)
K(0) = 0
for w = 1 to W:
    K(w) = max{K(w - w_i) + v_i : w_i ≤ w}
return K(W)

Running time: O(nW), i.e., pseudopolynomial; efficient only for small values of W.

For the running example (W = 10), the table begins K(0) = 0, K(1) = 0, K(2) = 9, K(3) = 14, and is filled in the same way up to K(10).

Underlying dag: the vertices are the integers from 0 to W, with an edge (w', w) whenever w = w' + w_i for some i ≤ n. The optimal solution is the longest path in this dag.
## 0/1 Knapsack

The subproblems now need to carry information about which items may be used. Define

K(j, w) = the maximal value achievable with a knapsack of capacity w and items 1, 2, …, j.

We need K(n, W). Expressing K(j, w) in terms of smaller subproblems: item j is either left out or packed (when it fits), so

K(j, w) = max{ K(j−1, w), v_j + K(j−1, w − w_j) }

KNAPSACK(n, w_1 … w_n, v_1 … v_n, W)
    for j = 0 to n do K(j, 0) = 0
    for w = 0 to W do K(0, w) = 0
    for j = 1 to n do
        for w = 1 to W do
            if w_j > w then K(j, w) = K(j−1, w)
            else K(j, w) = max{ K(j−1, w), v_j + K(j−1, w − w_j) }
    return K(n, W)

Example (W = 10):

| j \ w | 0 | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 |
|-------|---|---|---|----|----|----|----|----|----|----|----|
| 0     | 0 | 0 | 0 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 1     | 0 | 0 | 0 | 0  | 0  | 0  | 30 | 30 | 30 | 30 | 30 |
| 2     | 0 | 0 | 0 | 14 | 14 | 14 | 30 | 30 | 30 | 44 | 44 |
| 3     | 0 | 0 | 0 | 14 | 16 | 16 | 30 | 30 | 30 | 44 | 46 |
| 4     | 0 | 0 | 9 | 14 | 16 | 23 | 30 | 30 | 39 | 44 | 46 |

## Greedy algorithms¹

### 1 Minimum spanning trees

Suppose you are asked to network a collection of computers by linking selected pairs of them. This translates into a graph problem in which nodes are computers, undirected edges are potential links, and the goal is to pick enough of these edges so that the nodes are connected. But this is not all: each link also has a maintenance cost, reflected in that edge's weight. What is the cheapest possible network? One immediate observation is that the optimal set of edges
cannot contain a cycle, because removing an edge from this cycle would reduce the cost without compromising connectivity:

Property 1. Removing a cycle edge cannot disconnect a graph.

So the solution must be connected and acyclic; undirected graphs of this kind are called trees. The particular tree we want is the one with minimum total weight, known as the minimum spanning tree. Here is its formal definition.

Input: An undirected graph G = (V, E); edge weights w_e.
Output: A tree T = (V, E′), with E′ ⊆ E, which minimizes weight(T) = ∑_{e∈E′} w_e.

In the example above, the minimum spanning tree has a cost of 16. However, this is not the only optimal solution. Can you spot another?

### 1.1 A greedy approach

Kruskal's minimum spanning tree algorithm starts with the empty graph and then selects edges from E according to the following rule:

Repeatedly add the next lightest edge which doesn't produce a cycle.

¹ Copyright © 2004 S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani.

*Figure 1.1: Kruskal's algorithm.*

In other words, it constructs the tree edge by edge and, apart from taking care to avoid cycles, simply picks whichever edge is cheapest at the moment. This is a greedy algorithm: every decision it makes is the one with the most obvious immediate advantage.

Figure 1.1 shows an example. We start with an empty graph and then attempt to add edges in increasing order of weight:

B–C, C–D, B–D, C–F, D–F, E–F, A–D, A–B, C–E, A–C.

(Some ties have been broken arbitrarily.) The first two succeed, but the third, B–D, would produce a cycle if added. So we ignore it and move along. The final result is a tree with cost 14, the minimum possible.

At the outset of Kruskal's algorithm, each node is in a connected component all by itself. Then edges start being added, and the connected components get bigger and fewer in number. An edge e = {u, v} is incorporated only if it doesn't produce a cycle, that is, only if u and v lie in different components. The precise effect of adding the edge is to merge these two connected components, and thereby to reduce the overall number of components by one. Suppose there are n nodes. Over the course of this incremental process, the number of connected components decreases from n to one, which means that n − 1 edges must have been added along the way.

This consideration does not apply solely to the trees produced by Kruskal's algorithm. For any tree, imagine starting with an empty graph, with nodes but no edges, and then inserting the tree's edges one by one, in any arbitrary order. Our earlier argument still holds:

Property 2. A tree on n nodes has n − 1 edges.

What happens when an edge e is removed from a tree? Well, we can pretend that e was the last edge to be added when the tree was being built (recall that the ordering was irrelevant). By rewinding the tree construction process one step, we find we must have two connected components. Neither of them can contain cycles, so they must be trees:

Property 3. Removing an edge from a tree breaks it into two smaller trees.

Many more properties of trees can be derived by reasoning in this way. We now use similar logic to establish a simple rule which justifies the correctness of a whole slew of greedy minimum spanning tree algorithms, including Kruskal's.

### 1.2 The cut property

Suppose that in the process of building a minimum spanning tree (henceforth abbreviated MST) we have already chosen some edges X ⊆ E and are so far on the right track: there is an MST containing these edges.
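The edge-by-edge construction described above can be sketched with a simple union-find (disjoint-set) structure: `find` checks whether the two endpoints lie in the same connected component, and the merge step joins the two components. A minimal sketch with names of my own choosing (the notes give the algorithm only in prose):

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) with 0 <= u, v < n.
    Returns (total weight, list of tree edges)."""
    parent = list(range(n))         # each node starts in its own component

    def find(x):                    # root of x's component, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, total = [], 0
    for w, u, v in sorted(edges):   # next lightest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                # u, v in different components: no cycle
            parent[ru] = rv         # merge the two components
            tree.append((u, v))
            total += w
    return total, tree


# Toy example (not the figure from the notes): 4 nodes, MST weight 1 + 2 + 3 = 6.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 2), (5, 1, 3)]
print(kruskal(4, edges))  # (6, [(0, 1), (1, 2), (2, 3)])
```

Consistent with Property 2, the loop stops adding edges once n − 1 of them are in place, since every further edge closes a cycle.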
