
# Algorithms & Data Structures (CS 561)

UNM

This 109-page set of class notes was uploaded by Trent Dare on Wednesday, September 23, 2015. The notes belong to CS 561 at the University of New Mexico, taught by Jared Saia in Fall. Since its upload, it has received 21 views. For similar materials, see /class/212192/cs-561-university-of-new-mexico in Computer Science at the University of New Mexico.

Date Created: 09/23/15

### CS 561 Lecture 7: Today's Outline

- Randomized Quicksort
- Sorting Lower Bound
- Bucket Sort
- Dictionary ADT

Jared Saia, University of New Mexico

### R-Partition and Randomized Quicksort

PRE (R-Quicksort): A[p..r] is the array to be sorted, p ≥ 1 and r ≤ size of A. POST: A[p..r] is in sorted order.

PRE (R-Partition): A[p..r] is the array to be partitioned, p ≥ 1 and r ≤ size of A. Let A' be the array after the function is run. Then A'[p..r] contains the same elements as A[p..r]. POST: all elements in A'[p..q-1] are ≤ A'[q], and all elements in A'[q+1..r] are > A'[q], where q is the value returned.

```
R-Quicksort(A, p, r):
  if p < r:
    q = R-Partition(A, p, r)
    R-Quicksort(A, p, q-1)
    R-Quicksort(A, q+1, r)

R-Partition(A, p, r):
  i = Random(p, r)    // a random number between p and r
  exchange A[r] and A[i]
  return Partition(A, p, r)
```

### Analysis

- R-Quicksort is a randomized algorithm: its run time is a random variable.
- We'd like to analyze the expected run time of R-Quicksort.
- To do this, we first need to learn some basic probability theory.

### Plan of Attack

"If you get hold of the head of a snake, the rest of it is mere rope." — Akan proverb

- We will analyze the total number of comparisons made by quicksort.
- We will let X be the total number of comparisons made by R-Quicksort.
- We will write X as the sum of a bunch of indicator random variables.
- We will use linearity of expectation to compute the expected value of X.

### Notation

- Let A be the array to be sorted.
- Let z_i be the i-th smallest element in the array A.
- Let Z_ij = {z_i, z_{i+1}, ..., z_j}.

### Indicator Random Variables

- Let X_ij be 1 if z_i is compared with z_j, and 0 otherwise.
- Note that X = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} X_ij.
- Further note that, by linearity of expectation, E[X] = E[Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} X_ij] = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} E[X_ij].

### Questions

- Q: What is E[X_ij]? A: It is P(z_i is compared to z_j).
- Q: What is P(z_i is compared to z_j)? A: It is P(either z_i or z_j is the first element in Z_ij chosen as a pivot).
- Note that the number of elements in Z_ij is j - i + 1, so P(z_i chosen as first element of Z_ij) = 1/(j - i + 1), and likewise for z_j.
- Why? If no element in Z_ij has been chosen as a pivot yet, then no two elements in Z_ij have yet been compared, all of Z_ij is in the same sublist, and each of its j - i + 1 elements is equally likely to be chosen first.
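To make the R-Quicksort / R-Partition outline above concrete, here is a minimal runnable sketch. The function names and the Lomuto-style partition loop are illustrative choices on my part, not from the notes; the notes only specify the pre/post conditions and the random pivot swap.

```python
import random

def r_partition(a, p, r):
    # Swap a uniformly random element into the pivot position,
    # then do a standard Lomuto partition around a[r].
    i = random.randint(p, r)
    a[r], a[i] = a[i], a[r]
    pivot, store = a[r], p
    for j in range(p, r):
        if a[j] <= pivot:
            a[store], a[j] = a[j], a[store]
            store += 1
    a[store], a[r] = a[r], a[store]
    return store

def r_quicksort(a, p=None, r=None):
    """In-place randomized quicksort on a[p..r] (inclusive)."""
    if p is None:
        p, r = 0, len(a) - 1
    if p < r:
        q = r_partition(a, p, r)
        r_quicksort(a, p, q - 1)
        r_quicksort(a, q + 1, r)
    return a
```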
- If some element in Z_ij other than z_i or z_j is chosen first, then z_i and z_j will be split into separate sublists and hence will never be compared.
- Thus P(z_i or z_j is the first element in Z_ij chosen as a pivot) = 1/(j - i + 1) + 1/(j - i + 1) = 2/(j - i + 1).

### Putting It Together

E[X] = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} E[X_ij]
     = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} P(z_i is compared to z_j)
     = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} 2/(j - i + 1)
     = Σ_{i=1}^{n-1} Σ_{k=1}^{n-i} 2/(k + 1)      (substituting k = j - i)
     < Σ_{i=1}^{n-1} Σ_{k=1}^{n} 2/k
     = Σ_{i=1}^{n-1} O(log n)
     = O(n log n)

### Questions

- Q: Why is Σ_{k=1}^{n} 2/k = O(log n)?
- A: Σ_{k=1}^{n} 2/k = 2 Σ_{k=1}^{n} 1/k ≤ 2(ln n + 1), where the last step follows by an integral bound on the sum (p. 1067 of the text).

### Take Away

- The expected number of comparisons for R-Quicksort is O(n log n).
- It is competitive with mergesort and heapsort.
- The randomized version is better than the deterministic version.

### How Fast Can We Sort?

- Q: What is a lower bound on the run time of any sorting algorithm?
- We know that Ω(n) is a trivial lower bound.
- But all the algorithms we've seen so far are O(n log n) or O(n^2), so is Ω(n log n) a lower bound?

### Comparison Sorts

- Definition: a sorting algorithm is a comparison sort if the sorted order it determines is based only on comparisons between input elements.
- Heapsort, mergesort, quicksort, bubblesort, and insertion sort are all comparison sorts.
- We will show that any comparison sort must take Ω(n log n).

### Comparisons

- Assume we have an input sequence A = (a_1, a_2, ..., a_n).
- In a comparison sort, we only perform tests of the form a_i < a_j, a_i ≤ a_j, a_i = a_j, a_i ≥ a_j, or a_i > a_j to determine the relative order of all elements in A.
- We'll assume that all elements are distinct, so the only comparison we need to make is a_i ≤ a_j; this comparison gives us a yes-or-no answer.

### Decision Tree Model

- A decision tree is a full binary tree that gives the possible sequences of comparisons made for a particular input size.
- Each internal node is labelled with the indices of the two elements to be compared.
- Each leaf node gives a permutation of A.
### Decision Tree Model (continued)

- The execution of the sorting algorithm corresponds to a path from the root node to a leaf node in the tree.
- We take the left child of a node if the comparison is ≤, and the right child if the comparison is >.
- The internal nodes along this path give the comparisons made by the algorithm, and the leaf node gives the output of the sorting algorithm.

### Leaf Nodes

- Any correct sorting algorithm must be able to produce each possible permutation of the input.
- Thus there must be at least n! leaf nodes.
- The length of the longest path from the root node to a leaf in this tree gives the worst-case run time of the algorithm, i.e., the height of the tree gives the worst-case run time.

### Example / In-Class Exercise

- Consider the problem of sorting an array of size two, A = (a_1, a_2); a decision tree for this problem was given in class.
- Give a decision tree for sorting an array of size three, A = (a_1, a_2, a_3).
- What is its height? What is its number of leaf nodes?

### Height of the Decision Tree

- Q: What is the height of a binary tree with at least n! leaf nodes?
- A: If h is the height, we know that 2^h ≥ n!. Taking the log of both sides, we get h ≥ log(n!).
- Q: What is log(n!)?
- A: log(n!) = log n + log(n-1) + ... + log 1 ≥ (n/2) · log(n/2) = Ω(n log n).
- Thus any decision tree for sorting n elements has height Ω(n log n).

### Take Away

- We've just proven that any comparison-based sorting algorithm takes Ω(n log n) time.
- This does not mean that all sorting algorithms take Ω(n log n) time.
- In fact, there are non-comparison-based sorting algorithms which, under certain circumstances, are asymptotically faster.

### Bucket Sort

- Bucket sort assumes that the input is drawn from a uniform distribution over the range [0, 1).
- The basic idea is to divide the interval [0, 1) into n equal-size regions, or buckets; we expect that a small number of elements in A will fall into each bucket.
- To get the output, we sort the numbers in each bucket and output the sorted buckets in order.
### Bucket Sort (pseudocode)

PRE: A is the array to be sorted; all elements A[i] are between 0 and 1 inclusive. POST: returns a list which is the elements of A in sorted order.

```
BucketSort(A):
  n = length(A)
  for i = 0 to n-1:
    B[i] = new List()
  for i = 1 to n:
    insert A[i] at end of list B[floor(n * A[i])]
  for i = 0 to n-1:
    sort list B[i] with insertion sort
  return the concatenated list B[0], B[1], ..., B[n-1]
```

### Analysis

- Claim: if the input numbers are distributed uniformly over the range [0, 1), then bucket sort takes expected time O(n).
- Let T(n) be the run time of bucket sort on a list of size n, and let n_i be the random variable giving the number of elements in bucket B[i].
- Then T(n) = Θ(n) + Σ_{i=0}^{n-1} O(n_i^2), since insertion sort on a list of n_i elements takes O(n_i^2) time.
- Taking the expectation of both sides: E[T(n)] = Θ(n) + Σ_{i=0}^{n-1} E[O(n_i^2)] = Θ(n) + Σ_{i=0}^{n-1} O(E[n_i^2]).
- The second step follows by linearity of expectation; the last step holds since for any constant a and random variable X, E[aX] = aE[X] (see Equation C.21 in the text).
- We claim that E[n_i^2] = 2 - 1/n.
- To prove this, define indicator random variables X_ij = 1 if A[j] falls in bucket i, and 0 otherwise, defined for all 0 ≤ i ≤ n-1 and 1 ≤ j ≤ n. Thus n_i = Σ_{j=1}^{n} X_ij.
- We can now compute E[n_i^2] by expanding the square and regrouping terms:

E[n_i^2] = E[(Σ_{j=1}^{n} X_ij)^2]
         = Σ_{j=1}^{n} E[X_ij^2] + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} E[X_ij · X_ik]

- We can evaluate the two summations separately. X_ij is 1 with probability 1/n and 0 otherwise, so E[X_ij^2] = 1^2 · (1/n) + 0^2 · (1 - 1/n) = 1/n.
- Where k ≠ j, the random variables X_ij and X_ik are independent. For any two independent random variables X and Y, E[XY] = E[X]·E[Y] (see C.3 in the book for a proof of this). Thus E[X_ij · X_ik] = E[X_ij]·E[X_ik] = 1/n^2.
- Substituting these two expected values back into our main equation, we get:

E[n_i^2] = n · (1/n) + n(n-1) · (1/n^2) = 1 + (n-1)/n = 2 - 1/n

- Recall that E[T(n)] = Θ(n) + Σ_{i=0}^{n-1} O(E[n_i^2]). Plugging in E[n_i^2] = 2 - 1/n gives E[T(n)] = Θ(n) + O(n · (2 - 1/n)) = Θ(n).
- Thus the entire bucket sort algorithm runs in expected linear time.
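The pseudocode above translates directly to Python. As a small liberty not in the notes, this sketch uses each bucket's built-in sort in place of an explicit insertion sort (the expected-time analysis is unaffected), and clamps the bucket index to guard against a value of exactly 1.0:

```python
def bucket_sort(a):
    """Bucket sort for values in [0, 1), following the notes' pseudocode."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        # floor(n * x), clamped so x == 1.0 does not overflow the last bucket
        buckets[min(int(n * x), n - 1)].append(x)
    for b in buckets:
        b.sort()  # stand-in for insertion sort on a short list
    out = []
    for b in buckets:
        out.extend(b)
    return out
```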
### Dictionary ADT

A dictionary ADT implements the following operations:

- Insert(x): puts the item x into the dictionary.
- Delete(x): deletes the item x from the dictionary.
- IsIn(x): returns true iff the item x is in the dictionary.

Some further points:

- Frequently we think of the items being stored in the dictionary as keys.
- The keys typically have records associated with them, which are carried around with the key but not used by the ADT implementation.
- Thus we can implement functions like:
  - Insert(k, r): puts the item (k, r) into the dictionary if the key k is not already there; otherwise returns an error.
  - Delete(k): deletes the item with key k from the dictionary.
  - Lookup(k): returns the item (k, r) if k is in the dictionary; otherwise returns null.

### Implementing Dictionaries

- The simplest way to implement a dictionary ADT is with a linked list.
- Let l be a linked list data structure; assume we have the following operations defined on l:
  - head(l): returns a pointer to the head of the list.
  - next(p): given a pointer p into the list, returns a pointer to the next element if such exists, null otherwise.
  - previous(p): given a pointer p into the list, returns a pointer to the previous element if such exists, null otherwise.
  - key(p): given a pointer into the list, returns the key value of that item.
  - record(p): given a pointer into the list, returns the record value of that item.

### At-Home Exercise

Implement a dictionary with a linked list.

- Q1: Write the operation Lookup(k), which returns a pointer to the item with key k if it is in the dictionary, or null otherwise.
- Q2: Write the operation Insert(k, r).
- Q3: Write the operation Delete(k).
- Q4: For a dictionary with n elements, what is the run time of all of these operations for the linked-list data structure?
- Q5: Describe how you would use this dictionary ADT to count the number of occurrences of each word in an online book.

### Dictionaries

- This linked-list implementation of dictionaries is very slow.
- Q: Can we do better?
- A: Yes, with hash tables, AVL trees, etc.
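One possible solution sketch for Q1–Q3 of the at-home exercise is below. The class and attribute names are my own; the notes only specify the abstract list operations. Every operation walks the list, so each takes O(n) worst-case time (Q4).

```python
class Node:
    def __init__(self, key, record):
        self.key, self.record, self.next = key, record, None

class LinkedListDict:
    """Singly-linked-list dictionary: Lookup, Insert, Delete in O(n)."""
    def __init__(self):
        self.head = None

    def lookup(self, k):
        p = self.head
        while p is not None and p.key != k:
            p = p.next
        return p  # None plays the role of null

    def insert(self, k, r):
        if self.lookup(k) is not None:
            raise KeyError(f"key {k!r} already present")
        node = Node(k, r)
        node.next, self.head = self.head, node

    def delete(self, k):
        prev, p = None, self.head
        while p is not None and p.key != k:
            prev, p = p, p.next
        if p is None:
            return                      # key absent: nothing to do
        if prev is None:
            self.head = p.next          # deleting the head
        else:
            prev.next = p.next
```

For Q5, one would insert each word with a count record, doing a lookup first and incrementing the record when the word is already present.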
### Hash Tables

Hash tables implement the Dictionary ADT, namely:

- Insert(x): O(1) expected time, Θ(n) worst case.
- Lookup(x): O(1) expected time, Θ(n) worst case.
- Delete(x): O(1) expected time, Θ(n) worst case.

### Direct Addressing

- Suppose the universe of keys is U = {0, 1, ..., m-1}, where m is not too large.
- Assume no two elements have the same key.
- We use an array T[0..m-1] to store the keys; slot k contains the element with key k.

### Direct-Address Functions

```
DA-Search(T, k): return T[k]
DA-Insert(T, x): T[key(x)] = x
DA-Delete(T, x): T[key(x)] = NIL
```

Each of these operations takes O(1) time.

### Direct Addressing Problem

- If the universe U is large, storing the array T may be impractical.
- Also, much space can be wasted in T if the number of objects stored is small.
- Q: Can we do better? A: Yes, we can trade time for space.

### CS 561 Lecture 18: Today's Outline

- Dynamic Tables

Jared Saia, University of New Mexico

### Pseudocode

```
Table-Insert(T, x):
  if T.size == 0:
    allocate T.table with 1 slot
    T.size = 1
  if T.num == T.size:
    allocate newTable with 2 * T.size slots
    insert all items in T.table into newTable
    T.table = newTable
    T.size = 2 * T.size
  T.table[T.num] = x
  T.num = T.num + 1
```

### Potential Method

- Let's now analyze Table-Insert using the potential method.
- Let num_i be the num value for the i-th call to Table-Insert, and size_i the size value for the i-th call to Table-Insert.
- Then let Φ_i = 2·num_i - size_i.

### In-Class Exercise

Recall that a_i = c_i + Φ_i - Φ_{i-1}.

- Show that this potential function is 0 initially and always nonnegative.
- Compute a_i for the case where Table-Insert does not trigger an expansion.
- Compute a_i for the case where Table-Insert does trigger an expansion (note that num_{i-1} = num_i - 1, size_{i-1} = num_i - 1, and size_i = 2·(num_i - 1)).

We've shown that Table-Insert has O(1) amortized cost.

### Table Delete

- To implement Table-Delete, it is enough to remove (or zero out) the specified item from the table.
- However, it is also desirable to contract the table when the load factor gets too small; storage for the old table can then be freed to the heap.

### Desirable Properties

We want to preserve two properties:

- the load factor of the dynamic table is lower-bounded by some constant;
- the amortized cost of a table operation is bounded above by a constant.

### Naive Strategy

- A natural strategy for expansion and contraction is to double the table size when an item is inserted into a full table, and to halve the table size when a deletion would cause the table to become less than half full.
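The doubling Table-Insert above can be sketched directly in Python. As an addition not in the notes, this sketch also returns the actual cost of each insert (one write, plus one unit per item copied on an expansion), which lets the O(1) amortized claim, specifically total cost ≤ 3n for n inserts, be checked empirically:

```python
class DynamicTable:
    """Doubling dynamic table following Table-Insert in the notes."""
    def __init__(self):
        self.table, self.size, self.num = [], 0, 0

    def insert(self, x):
        cost = 1                              # writing x itself
        if self.size == 0:
            self.table, self.size = [None], 1
        if self.num == self.size:
            new = [None] * (2 * self.size)
            new[:self.num] = self.table       # copy every existing item
            cost += self.num
            self.table, self.size = new, 2 * self.size
        self.table[self.num] = x
        self.num += 1
        return cost
```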
- This strategy guarantees that the load factor of the table never drops below 1/2.
- Unfortunately, this strategy can cause the amortized cost of an operation to be large.

### Analysis of the Naive Strategy

- Assume we perform n operations, where n is a power of 2.
- The first n/2 operations are insertions; at the end of these, T.num = T.size = n/2.
- The remaining n/2 operations are: I, D, D, I, I, D, D, I, I, ..., where I represents an insertion and D represents a deletion.
- Note that the first insertion causes an expansion, the two following deletions cause a contraction, the next two insertions cause an expansion again, etc.
- The cost of each expansion and contraction is Θ(n), and there are Θ(n) of them.
- Thus the total cost of the n operations is Θ(n^2), and so the amortized cost per operation is Θ(n).

### The Problem and the Solution

- The problem: after an expansion, we don't perform enough deletions to pay for a contraction, and vice versa.
- The solution: we allow the load factor to drop below 1/2. In particular, we halve the table size when a deletion causes the table to become less than 1/4 full.
- We can now create a potential function to show that insertion and deletion are fast in an amortized sense.

### Recall: Load Factor

- For a nonempty table T, we define the load factor of T, α(T), to be the number of items stored in the table divided by the size (number of slots) of the table.
- We assign an empty table (one with no items) size 0 and load factor 1.
- Note that the load factor of any table is always between 0 and 1.
- Further, if we can say that the load factor of a table is always at least some constant c, then the unused space in the table is never more than a (1 - c) fraction.

### The Potential

Φ(T) = 2·T.num - T.size      if α(T) ≥ 1/2
Φ(T) = T.size/2 - T.num      if α(T) < 1/2

### Intuition

- Note that when α = 1/2, the potential is 0.
- When the load factor is 1, T.size = T.num, so Φ(T) = T.num and the potential can pay for an expansion.
- When the load factor is 1/4, T.size = 4·T.num, which means Φ(T) = T.num, so the potential can pay for a contraction if an item is deleted.
- Note that this potential is legal, since Φ_0 = 0 and one can prove that Φ_i ≥ 0 for all i.

### Analysis

- Let's now roll up our sleeves and show that the amortized costs of insertions and deletions are small.
- We'll do this by case analysis.
- Let num_i be the number of items in the table after the i-th operation, size_i the size of the table after the i-th operation, and α_i the load factor after the i-th operation.

### Table Insert

- If α_{i-1} ≥ 1/2, the analysis is identical to the analysis done in the in-class exercise: the amortized cost per operation is 3.
- If α_{i-1} < 1/2, the table will not expand as a result of the operation.
- There are two subcases when α_{i-1} < 1/2: (1) α_i < 1/2, and (2) α_i ≥ 1/2.

Case (1), α_i < 1/2. Here size_i = size_{i-1} and num_i = num_{i-1} + 1, so:

a_i = c_i + Φ_i - Φ_{i-1}
    = 1 + (size_i/2 - num_i) - (size_{i-1}/2 - num_{i-1})
    = 1 + (size_i/2 - num_i) - (size_i/2 - (num_i - 1))
    = 0 < 3

Case (2), α_i ≥ 1/2. Then:

a_i = c_i + Φ_i - Φ_{i-1}
    = 1 + (2·num_i - size_i) - (size_{i-1}/2 - num_{i-1})
    = 1 + 2·(num_{i-1} + 1) - size_{i-1} - size_{i-1}/2 + num_{i-1}
    = 3·num_{i-1} - (3/2)·size_{i-1} + 3
    = 3·α_{i-1}·size_{i-1} - (3/2)·size_{i-1} + 3
    < (3/2)·size_{i-1} - (3/2)·size_{i-1} + 3
    = 3

### Take Away

- So we've just shown that in all cases the amortized cost of an insertion is at most 3; we did this by case analysis.
- What remains to be shown is that the amortized cost of deletion is small. We'll also do this by case analysis.

### Deletions

- For deletions, num_i = num_{i-1} - 1.
- We will look at two main cases: (1) α_{i-1} < 1/2, and (2) α_{i-1} ≥ 1/2.
- For the case where α_{i-1} < 1/2, there are two subcases: (1a) the i-th operation does not cause a contraction, and (1b) the i-th operation does cause a contraction.

### Case 1a

- If α_{i-1} < 1/2 and the i-th operation does not cause a contraction, we know size_i = size_{i-1}, and we have:

a_i = c_i + Φ_i - Φ_{i-1}
    = 1 + (size_i/2 - num_i) - (size_{i-1}/2 - num_{i-1})
    = 1 + (size_i/2 - num_i) - (size_i/2 - (num_i + 1))
    = 2
### Case 1b

- In this case α_{i-1} < 1/2 and the i-th operation causes a contraction.
- We know that c_i = num_i + 1 (the deletion plus moving num_i items), and size_i/2 = size_{i-1}/4 = num_{i-1} = num_i + 1. Thus:

a_i = c_i + Φ_i - Φ_{i-1}
    = (num_i + 1) + (size_i/2 - num_i) - (size_{i-1}/2 - num_{i-1})
    = (num_i + 1) + ((num_i + 1) - num_i) - ((2·num_i + 2) - (num_i + 1))
    = 1

### Case 2

- In this case, α_{i-1} ≥ 1/2.
- Proving that the amortized cost is constant for this case is left as an exercise to the diligent student.
- Hint 1: Q: In this case, is it possible for the i-th operation to be a contraction? If so, when can this occur?
- Hint 2: Try a case analysis on α_i.

### Take Away

- Since we've shown that the amortized cost of every operation is at most a constant, we've shown that any sequence of n operations on a dynamic table takes O(n) time.
- Note that in our scheme the load factor never drops below 1/4; this means that we also never have more than 3/4 of the table that is just empty space.

### Disjoint Sets

- A disjoint-set data structure maintains a collection S_1, S_2, ..., S_k of disjoint dynamic sets.
- Each set is identified by a representative, which is a member of that set.
- Let's call the members of the sets objects.

### Operations

We want to support the following operations:

- Make-Set(x): creates a new set whose only member (and representative) is x.
- Union(x, y): unites the sets that contain x and y (call them S_x and S_y) into a new set S_x ∪ S_y. The new set is added to the data structure, while S_x and S_y are deleted. The representative of the new set is any member of the set.
- Find-Set(x): returns a pointer to the representative of the unique set containing x.

### Analysis

- We will analyze this data structure in terms of two parameters: (1) n, the number of Make-Set operations, and (2) m, the total number of Make-Set, Union, and Find-Set operations.
- Since the sets are always disjoint, each Union operation reduces the number of sets by 1. So after n - 1 Union operations, only one set remains. Thus the number of Union operations is at most n - 1.
- Note also that since the Make-Set operations are included in the total number of operations, we know that m ≥ n.
- We will in general assume that the Make-Set operations are the first n performed.

### Application

- Consider a simplified version of Myspace: every person is an object, and every set represents a social clique.
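A compact sketch of the three disjoint-set operations is below. Note an assumption on my part: this sketch uses the common union-by-size and path-compression heuristics, which go beyond the plain ADT described in this excerpt (the notes here only specify the interface).

```python
class DisjointSets:
    """Make-Set / Union / Find-Set over hashable objects."""
    def __init__(self):
        self.parent, self.size = {}, {}

    def make_set(self, x):
        self.parent[x] = x   # x is its own representative
        self.size[x] = 1

    def find_set(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:  # union by size
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
```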
- Whenever a person in the set S_1 forges a link to a person in the set S_2, we want to create a new, larger social clique S_1 ∪ S_2, and delete S_1 and S_2.
- We might also want to find a representative of each set, to make it easy to search through the set.
- For obvious reasons, we want these operations of Union, Make-Set, and Find-Set to be as fast as possible.

### CS 561 Lecture 23: Today's Outline

"The path that can be trodden is not the enduring and unchanging Path. The name that can be named is not the enduring and unchanging Name." — Tao Te Ching

- Single Source Shortest Paths
- Dijkstra's Algorithm
- Bellman-Ford Algorithm

Jared Saia, University of New Mexico

### Shortest Paths Problem

- Another interesting problem for graphs is that of finding shortest paths.
- Assume we are given a weighted, directed graph G = (V, E) with two special vertices: a source s and a target t.
- We want to find the shortest directed path from s to t. In other words, we want to find the path p starting at s and ending at t minimizing the function w(p) = Σ_{e ∈ p} w(e).

### Negative Weights

- We'll actually allow negative weights on edges.
- The presence of a negative cycle might mean that there is no shortest path.
- A shortest path from s to t exists if and only if there is at least one path from s to t, but no path from s to t that touches a negative cycle.
- (In the example given in class, there is no shortest path from s to t.)

### Single Source Shortest Paths (SSSP)

- SSSP is a more general problem: find the shortest path from the source vertex s to every other vertex in the graph.
- The problem is solved by finding a shortest-path tree rooted at the vertex s that contains all the desired shortest paths.
- Note that a shortest-path tree is not a MST (minimum spanning tree).
- We'll now go over some algorithms for SSSP on directed graphs. These algorithms will work for undirected graphs with slight modification; in particular, we must specifically prohibit alternating back and forth across the same undirected negative-weight edge.
- As with graph traversal, all the SSSP algorithms will be special cases of a single generic algorithm.

### SSSP Algorithms

- Each vertex v in the graph will store two values, which describe a tentative shortest path from s to v:
  - dist(v) is the length of the tentative shortest path between s and v;
  - pred(v) is the predecessor of v in this tentative shortest path.
- The predecessor pointers automatically define a tentative shortest-path tree.
- Initially we set dist(s) = 0 and pred(s) = NULL, and for every vertex v ≠ s, dist(v) = ∞ and pred(v) = NULL.

### Relaxation

- We call an edge (u, v) tense if dist(u) + w(u, v) < dist(v).
- If (u, v) is tense, then the tentative shortest path from s to v is incorrect, since the path from s to u followed by (u, v) is shorter.
- Our generic algorithm repeatedly finds a tense edge in the graph and relaxes it:

```
Relax(u, v):
  dist(v) = dist(u) + w(u, v)
  pred(v) = u
```

- If there are no tense edges, our algorithm is finished and we have our desired shortest-path tree.

### Correctness

- The correctness of the relaxation algorithm follows directly from three simple claims.
- The run time of the algorithm will depend on the way that we make choices about which edges to relax.

### Claim 1

- If dist(v) ≠ ∞, then dist(v) is the total weight of the predecessor chain ending at v: s → ... → pred(pred(v)) → pred(v) → v.
- This is easy to prove by induction on the number of edges in the path from s to v (left as an exercise).

### Claim 2

- If the algorithm halts, then dist(v) ≤ w(s ⇝ v) for any path s ⇝ v.
- This is easy to prove by induction on the number of edges in the path s ⇝ v (left as an exercise).

### Claim 3

- The algorithm halts if and only if there is no negative cycle reachable from s.
- The "only if" direction is easy: if there is a reachable negative cycle, then after the first edge in the cycle is relaxed, the cycle always has at least one tense edge.
- The "if" direction follows from the fact that every relaxation step either reduces the number of vertices with dist(v) = ∞ by 1, or reduces the sum of the finite tentative shortest-path lengths by some positive amount.
### Generic SSSP

- We haven't yet said how to detect which edges can be relaxed, or what order to relax them in.
- The following GenericSSSP algorithm answers these questions.
- We will maintain a "bag" of vertices, initially containing just the source vertex s.
- Whenever we take a vertex u out of the bag, we scan all of its outgoing edges, looking for something to relax.
- Whenever we successfully relax an edge (u, v), we put v in the bag.

```
InitSSSP(s):
  dist(s) = 0
  pred(s) = NULL
  for all vertices v != s:
    dist(v) = infinity
    pred(v) = NULL

GenericSSSP(s):
  InitSSSP(s)
  put s in the bag
  while the bag is not empty:
    take u from the bag
    for all edges (u, v):
      if (u, v) is tense:
        Relax(u, v)
        put v in the bag
```

- Just as with graph traversal, using different data structures for the bag gives us different algorithms. Some obvious choices are a stack, a queue, and a heap.
- Unfortunately, if we use a stack, we need to perform Θ(2^|E|) relaxation steps in the worst case (an exercise for the diligent student). The other possibilities are more efficient.

### Dijkstra's Algorithm

- If we implement the bag as a heap, where the key of a vertex v is dist(v), we obtain Dijkstra's algorithm.
- Dijkstra's algorithm does particularly well if the graph has no negative-weight edges. In this case, it's not hard to show (by induction, of course) that the vertices are scanned in increasing order of their shortest-path distance from s.
- It follows that each vertex is scanned at most once, and thus that each edge is relaxed at most once.
- Since the key of each vertex in the heap is its tentative distance from s, the algorithm performs a DecreaseKey operation every time an edge is relaxed. Thus the algorithm performs at most |E| DecreaseKeys.
- Similarly, there are at most |V| Insert and ExtractMin operations.
- Thus, if we store the vertices in a Fibonacci heap, the total running time of Dijkstra's algorithm is O(|E| + |V| log |V|).
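Here is a runnable sketch of GenericSSSP with the bag as a binary heap, i.e. Dijkstra's algorithm. The graph representation (a dict mapping each vertex to a list of `(neighbor, weight)` pairs) is an assumption of this sketch; also, Python's `heapq` has no DecreaseKey, so stale heap entries are skipped instead, a standard workaround:

```python
import heapq

def dijkstra(graph, s):
    """Dijkstra's algorithm; assumes no negative-weight edges."""
    dist = {v: float('inf') for v in graph}
    pred = {v: None for v in graph}
    dist[s] = 0
    bag = [(0, s)]                      # the bag, keyed by dist(v)
    while bag:
        d, u = heapq.heappop(bag)
        if d > dist[u]:
            continue                    # stale entry: u was already improved
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:   # edge (u, v) is tense
                dist[v] = dist[u] + w   # Relax(u, v)
                pred[v] = u
                heapq.heappush(bag, (dist[v], v))
    return dist, pred
```

With a binary heap the running time is O(|E| log |V|), versus the O(|E| + |V| log |V|) quoted above for a Fibonacci heap.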
### Negative Edges

- This analysis assumes that no edge has negative weight.
- The algorithm given here is still correct if there are negative-weight edges, but the worst-case run time could be exponential.
- The algorithm in our textbook gives incorrect results for graphs with negative edges, which the authors make clear.

(Figure: four phases of the algorithm run on a graph with no negative edges. At each phase, the shaded vertices are in the heap and the bold vertex has just been scanned; the bold edges describe the evolving shortest-path tree.)

### Bellman-Ford

- If we replace the bag in GenericSSSP with a queue, we get the Bellman-Ford algorithm.
- Bellman-Ford is efficient even if there are negative edges, and it can be used to quickly detect the presence of negative cycles.
- If there are no negative edges, however, Dijkstra's algorithm is faster than Bellman-Ford.

### Analysis

- The easiest way to analyze this algorithm is to break the execution into phases.
- Before we begin the algorithm, we insert a token into the queue. Whenever we take the token out of the queue, we begin a new phase by just reinserting the token into the queue.
- The 0th phase consists entirely of scanning the source vertex s.
- The algorithm ends when the queue contains only the token.

### Invariant

- A simple inductive argument (left as an exercise) shows the following invariant: at the end of the i-th phase, for each vertex v, dist(v) is less than or equal to the length of the shortest path s ⇝ v consisting of i or fewer edges.

(Figure: four phases of Bellman-Ford run on a directed graph with negative edges. Nodes are taken from the queue in the order s ⋄ a b c ⋄ d f b ⋄ a e d ⋄ d a ⋄ ⋄, where ⋄ is the token. Shaded vertices are in the queue at the end of each phase; the bold edges describe the evolving shortest-path tree.)

### Analysis

- Since a shortest path can only pass through each vertex once, either the algorithm halts before the |V|-th phase, or the graph contains a negative cycle.
- In each phase we scan each vertex at most once, and so we relax each edge at most once.
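The queue-based Bellman-Ford just described, with its phase token, can be sketched as follows. The graph representation (dict of `(neighbor, weight)` adjacency lists) and the use of `None` as the token are assumptions of this sketch; returning `(None, None)` on a reachable negative cycle is also my own convention:

```python
from collections import deque

def bellman_ford(graph, s):
    """GenericSSSP with the bag as a FIFO queue, plus a phase token."""
    dist = {v: float('inf') for v in graph}
    pred = {v: None for v in graph}
    dist[s] = 0
    bag = deque([s, None])              # None is the phase token
    in_bag = {s}
    phase = 0
    while len(bag) > 1:                 # stop when only the token remains
        u = bag.popleft()
        if u is None:                   # token: a new phase begins
            phase += 1
            if phase >= len(graph):     # too many phases: negative cycle
                return None, None
            bag.append(None)
            continue
        in_bag.discard(u)
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:   # edge (u, v) is tense
                dist[v] = dist[u] + w   # Relax(u, v)
                pred[v] = u
                if v not in in_bag:
                    bag.append(v)
                    in_bag.add(v)
    return dist, pred
```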
- Hence the run time of a single phase is O(|E|).
- Thus the overall run time of Bellman-Ford is O(|V|·|E|).

### Book Bellman-Ford

- Now that we understand how the phases of Bellman-Ford work, we can simplify the algorithm.
- Instead of using a queue to perform a partial BFS in each phase, we will just scan through the adjacency list directly and try to relax every edge in the graph.
- This will be much closer to how the textbook presents Bellman-Ford.
- The run time will still be O(|V|·|E|).
- To show correctness, we'll have to show that our earlier invariant holds, which can be proved by induction on i.

```
BookBellmanFord(s):
  InitSSSP(s)
  repeat |V| - 1 times:
    for every edge (u, v) in E:
      if (u, v) is tense:
        Relax(u, v)
  for every edge (u, v) in E:
    if (u, v) is tense:
      return "Negative Cycle"
```

### Take Away

- Dijkstra's algorithm and Bellman-Ford are both variants of the GenericSSSP algorithm for solving SSSP.
- Dijkstra's algorithm uses a Fibonacci heap for the bag, while Bellman-Ford uses a queue.
- Dijkstra's algorithm runs in time O(|E| + |V| log |V|) if there are no negative edges.
- Bellman-Ford runs in time O(|V|·|E|) and can handle negative edges and detect negative cycles.

### All Pairs Shortest Paths

- For the single-source shortest-paths problem, we wanted to find the shortest path from a source vertex s to all the other vertices in the graph.
- We will now generalize this problem further, to that of finding the shortest path from every possible source to every possible destination.
- In particular, for every pair of vertices u and v, we need to compute the following information:
  - dist(u, v) is the length of the shortest path (if any) from u to v;
  - pred(u, v) is the second-to-last vertex (if any) on the shortest path (if any) from u to v.

### Example

- For any vertex v, we have dist(v, v) = 0 and pred(v, v) = NULL.
- If the shortest path from u to v is only one edge long, then dist(u, v) = w(u → v) and pred(u, v) = u.
- If there's no shortest path from u to v, then dist(u, v) = ∞ and pred(u, v) = NULL.

### Lots of Single Sources

- The output of our shortest-path algorithm will be a pair of |V| × |V| arrays encoding all |V|^2 distances and predecessors.
- Many maps contain such a distance matrix: to find the distance from, say, Albuquerque to, say, Ruidoso, you look in the row labeled "Albuquerque" and the column labeled "Ruidoso".
- We'll focus only on computing the distance array. The predecessor array, from which you would compute the actual shortest paths, can be computed with only minor additions to the algorithms presented here.

### ObviousAPSP

- The most obvious solution to APSP is to just run an SSSP algorithm |V| times, once for every possible source vertex.
- Specifically, to fill in the subarray dist(s, ·), we invoke either Dijkstra's or Bellman-Ford starting at the source vertex s.
- We'll call this algorithm ObviousAPSP:

```
ObviousAPSP(V, E, w):
  for every vertex s:
    dist(s, .) = SSSP(V, E, w, s)
```

### Analysis

- The running time of this algorithm depends on which SSSP algorithm we use.
- If we use Bellman-Ford, the overall running time is Θ(|V|^2·|E|) = O(|V|^4).
- If all the edge weights are positive, we can use Dijkstra's instead, which decreases the run time to Θ(|V|·|E| + |V|^2 log |V|) = O(|V|^3).

### Problem

- We'd like to have an algorithm which takes O(|V|^3) time but which can also handle negative edge weights.
- We'll see that a dynamic programming algorithm, the Floyd-Warshall algorithm, will achieve this.
- Note: the book discusses another algorithm, Johnson's algorithm, which is asymptotically better than Floyd-Warshall on sparse graphs. However, we will not be discussing this algorithm in class.

### CS 561 Lecture 5: Today's Outline

- Annihilator Wrap-up
- Loop Invariants
- Binary Heaps

Jared Saia, University of New Mexico

### Limitations

- Our method does not work on T(n) = T(n-1) + 1/n or T(n) = T(n-1) + lg n.
- The problem is that 1/n and lg n do not have annihilators.
- Our tool, as it stands, is limited.
- The key idea for strengthening it is transformations.

### Transformations

- Consider the recurrence giving the run time of mergesort: T(1) = 1, T(n) = 2T(n/2) + kn for some constant k.
- How do we solve this? We have no technique for annihilating terms like kn.
- However, we can transform the recurrence into one with which we can work.
### Transformation

- Let n = 2^i and rewrite T(n): T(2^0) = 1 and T(2^i) = 2T(2^{i-1}) + k·2^i.
- Now define a new sequence t as follows: t(i) = T(2^i). Then t(0) = 1 and t(i) = 2t(i-1) + k·2^i.

### Now Solve

- We've got a new recurrence: t(0) = 1, t(i) = 2t(i-1) + k·2^i.
- We can easily find the annihilator for this recurrence: (L - 2) annihilates the homogeneous part, and (L - 2) annihilates the non-homogeneous part, so (L - 2)(L - 2) annihilates t(i).
- Thus t(i) = (c_1 i + c_2) 2^i.

### Reverse Transformation

- We've got a solution for t(i), and we want to transform this into a solution for T(n).
- Recall that t(i) = T(2^i) and 2^i = n:

t(i) = (c_1 i + c_2) 2^i
T(2^i) = (c_1 i + c_2) 2^i
T(n) = (c_1 lg n + c_2) n = c_1 n lg n + c_2 n = O(n lg n)

### Success!

Let's recap what just happened:

- We could not find the annihilator of T(n), so:
- We did a transformation to a recurrence we could solve, t(i) (we let n = 2^i and t(i) = T(2^i));
- We found the annihilator for t(i), and solved the recurrence for t(i);
- We reverse-transformed the solution for t(i) back to a solution for T(n).

### Another Example

- Consider the recurrence T(n) = 9T(n/3) + kn, where T(1) = 1 and k is some constant.
- Let n = 3^i and rewrite T(n): T(3^0) = 1 and T(3^i) = 9T(3^{i-1}) + k·3^i.
- Now define a sequence t as follows: t(i) = T(3^i). Then t(0) = 1, t(i) = 9t(i-1) + k·3^i.

### Now Solve

- t(0) = 1, t(i) = 9t(i-1) + k·3^i.
- This is annihilated by (L - 9)(L - 3).
- So t(i) is of the form t(i) = c_1 9^i + c_2 3^i.

### Reverse Transformation

- Recall t(i) = T(3^i) and 3^i = n:

t(i) = c_1 9^i + c_2 3^i
T(3^i) = c_1 (3^i)^2 + c_2 3^i
T(n) = c_1 n^2 + c_2 n = O(n^2)

### In-Class Exercise

Consider the recurrence T(n) = 2T(n/4) + kn, where T(1) = 1 and k is some constant.

- Q1: What is the transformed recurrence t(i)? How do we rewrite n and T(n) to get this sequence?
- Q2: What is the annihilator of t(i)? What is the solution for the recurrence t(i)?
- Q3: What is the solution for T(n) (i.e., do the reverse transformation)?

### A Final Example

- It is not always obvious what sort of transformation to do.
- Consider T(n) = 2T(√n) + log n.
- Let n = 2^i and rewrite T(n): T(2^i) = 2T(2^{i/2}) + i.
I O logz Define ti T2i HT 2 Oilogi 2ti2 7 Tn OOgn log log n 39 Correctness of Algorithms 39 Loop Invariants I I I I A useful tool for proving correctness is loop invariants Three The most Important aspect of algorithms is their correctness things must be shown about a loop invariant An algorithm by definition always gives the right answer to the problem A procedure which doesnt always give the right answer is a o Initialization Invariant is true before first iteration of loop heuristic 0 Maintenance If invariant is true before iteration i it is also All things being equal we prefer an algorithm to a heuristic true before Iteratlon 1 1 for any I1 I I I HOW do we prove an algorithm is really correct 0 Termination When the loop terminates the invariant gives a property which can be used to show the algorithm is correct 39 Example Loop Invariant We39ll prove the correctness of a simple algorithm which solves the following interview question Find the middle ofa linked list while only going through the list once The basic idea is to keep two pointers into the list one of the pointers moves twice as fast as the other Call the head of the list the O th elem and the tail of the list the n7 l st element assume that n7 l is an even number 39 Example Algorithm GetMiddle List 1 pSlow pFast 1 while pFastgtnextampamppFastgtnextgtnext pFast pFastgtnextgtnext pSlow pSlowgtnext return pSlow 39 Example Loop Invariant Invariant At the start of the i th iteration of the while loop pSlow points to the i th element in the list and pFast points to the 2i th element Initialization True when i 0 since both pointers are at the head Maintenance if pSlow pFast are at positions 1 and 21 re spectively before i th iteration they will be at positions i l 2i 1 respectively before the i l st iteration Termination When the loop terminates pFast is at ele ment n7 1 Then by the loop invariant pSlow is at element n7 l2 Thus pSlow points to the middle of the list 39 ChaHenge c Figure out how to use a 
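The two-pointer GetMiddle routine above can be written as a runnable sketch (Python; the `Node` class and `build_list` helper are my own additions, not from the notes):

```python
class Node:
    # Minimal singly linked list node (hypothetical helper, not in the notes).
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def build_list(values):
    # Build a linked list from a Python list (helper for testing).
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def get_middle(head):
    # pFast moves two nodes per step, pSlow one node per step.
    # Invariant: when pSlow is at element i, pFast is at element 2i.
    slow = fast = head
    while fast.next and fast.next.next:
        fast = fast.next.next
        slow = slow.next
    return slow
```

With n-1 even, the slow pointer ends at element (n-1)/2, matching the loop invariant's termination argument.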
similar idea to determine if there is a loop in a linked list without marking nodes What is a Heap o A heap data structure is an array that can be viewed as a nearly complete binary tree 0 Each element of the array corresponds to a value stored at some node of the tree 0 The tree is completely filled at all levels except for possibly the last which is filled from left to right hea p size A l An array A that represents a heap has two attributes length A which is the number of elements in the array heap size A which is the number of elems in the heap stored within the array 0 Le only the elements in Alheap size A are elements of the heap 39 Tree Structure Al is the root of the tree For all i l lt i lt heap size A Parent i liZi Left i 21 Right i 2 21 1 If Left i gt heap size A there is no left child ofi If Right i gt heap size A there is no right child ofi If Parent i lt 0 there is no parent ofi Example 9 o o 63ob 39 Max Heap Property Max Heap Property l l o For every node 1 other than the root AParent i Z Ai 0 Parent is always at least as large as its children 0 For every node 1 other than the root AParent i 2 Ai 0 Largest element is at the I OOt A Min heap is organized the opposite way Height of Heap Maintaining Heaps l l c Q How to maintain the heap property 0 A Max Heapify is given an array and an index i Assumes that the binary trees rooted at Lefti and Rightltigt are magtlt heaps but Ai may be smaller than its children 0 Max Heapify ensures that after its call the subtree rooted at i is a Max Heap 0 Height of a node in a heap is the number of edges in the longest simple downward path from the node to a leaf 0 Height of a heap ofn elements is 9log n Why 39 MaX Heapify 0 Main idea of the Max Heapify algorithm is that it percolates down the element that start at A39l to the point where the subtree rooted at i is a magtlt heap c To do this it repeatedly swaps Ai with its largest child until Ai is bigger than both its children 0 For simplicity the algorithm is 
described recursively.

Max-Heapify:

Max-Heapify(A, i)
  l = Left(i); r = Right(i); largest = i
  if l <= heap-size(A) and A[l] > A[i]
    then largest = l
  if r <= heap-size(A) and A[r] > A[largest]
    then largest = r
  if largest != i then
    (a) exchange A[i] and A[largest]
    (b) Max-Heapify(A, largest)

Analysis: Let T(h) be the runtime of Max-Heapify on a subtree of height h. Then T(1) = Theta(1) and T(h) = T(h-1) + Theta(1). The solution to this recurrence is T(h) = Theta(h). Thus if we let T(n) be the runtime of Max-Heapify on a subtree of size n, T(n) = O(log n), since log n is the maximum height of a heap of size n.

Build-Max-Heap: Q: How can we convert an arbitrary array into a max-heap? A: Use Max-Heapify in a bottom-up manner. Note: the elements A[floor(n/2)+1 .. n] are all leaf nodes of the tree, so each is a 1-element heap to begin with.

Build-Max-Heap(A)
  1. heap-size(A) = length(A)
  2. for i = floor(length(A)/2) downto 1
       (a) do Max-Heapify(A, i)

Example: A = 4 2 1 6 7 9 11 5 3 8

Loop Invariant: At the start of each iteration of the for loop, each node i+1, i+2, ..., n is the root of a max-heap.

Correctness: Initialization: i = floor(n/2) prior to the first iteration, but each node floor(n/2)+1, floor(n/2)+2, ..., n is a leaf, and so is the root of a trivial max-heap. Maintenance: first note that if the nodes i+1, ..., n are the roots of max-heaps before the call to Max-Heapify(A, i), then they will be the roots of max-heaps after the call. Further note that the children of node i are numbered higher than i, and thus by the loop invariant are both roots of max-heaps. Thus after the call to Max-Heapify(A, i), node i is the root of a max-heap. Hence when we decrement i in the for loop, the loop invariant is re-established. Termination: at termination i = 0, so each node 1, ..., n is the root of a max-heap; in particular, node 1 is the root of a max-heap.

Time Analysis: There is a naive analysis and a better analysis. For the better analysis, note that an n-element heap has height no more than log n, and there are at most ceil(n / 2^(h+1)) nodes of any height h (to see this, consider the minimum number of nodes in a heap of height h). The time
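A runnable version of Max-Heapify and Build-Max-Heap (a sketch in Python; it uses 0-based indexing, so Left(i) = 2i+1 and Right(i) = 2i+2, unlike the 1-based pseudocode in the notes):

```python
def max_heapify(a, i, heap_size):
    # Percolate a[i] down until the subtree rooted at i is a max-heap.
    # Assumes the subtrees rooted at the children of i are already max-heaps.
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]  # exchange with the larger child
        i = largest

def build_max_heap(a):
    # Nodes len(a)//2 .. len(a)-1 are leaves; heapify the rest bottom-up.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))
```

The iterative loop replaces the recursion of the pseudocode but performs the same swaps.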
required by Max Heapify when called on a node of height h is Oh 0 Thus total time is zfggg ow o Max Heapify takes Olog n time per call 0 There are On calls to Max Heapify 0 Thus the running time is Onlogn 39 AnalySIs 5 co 7 M8 le i ll 0 Eli Vv ii 0 39 AnalySIs The last step follows since for all lacl lt 1 711 2 7 i0 1 95 Can get this equality by recalling that for all M lt 1 i200 17957 and taking the derivative of both sides 39 Hea p Sort Hea p Sort A l Build MaX Heap A 2 for ilength Ai gt ln i 7 a do exchange Al and Ai b heap size A heap size A l c Max Heapify Al 39 AnalySIs o Build MaX Heap takes On and each of the On calls to Max Heapify take Olog n so Heap Sort takes Onlog n o Correctness CS 561 Lecture 8 Jared Saia University of New Mexico 39 Today s Outline 0 Hash Tables 0 Trees Direct Addressing Problem 39 If universe U is large storing the array T may be impractical Also much space can be wasted in T if number of objects stored is small Q Can we do better A Yes we can trade time for space 39 Hash Tables Key Idea An element with key k is stored in slot hk where h is a hash function mapping U into the set 0 m7 1 Main problem Two keys can now hash to the same slot Q How do we resolve this problem Al Try to prevent it by hashing keys to random slots and making the table large enough A2 Chaining A3 Open Addressing 39 Chained Hash 39 Analysis In chaining all elements that hash to the same slot are put in a linked st 0 CH Insert and CH Delete take 01 time if the list is doubly linked and there are no duplicate keys 0 Q How long does CH Search take 0 A It depends In particular depends on the load factor a nm ie average number of elems in a list CHInsert TxInsert X at the head of list Thkeyx CHSearchTksearch for elem with key k in list Thk CHDeleteTxdelete X from the list Thkeyx CH Search Analysis CH Search Analysis I l Worst case analysis everyone hashes to one slot so n For average case make the simple uniform hashing assump tion any given elem is equally likely 
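Putting the pieces together, Heap-Sort can be sketched as follows (Python, 0-based indexing; `sift_down` is a local stand-in for Max-Heapify):

```python
def heap_sort(a):
    # In-place heapsort: build a max-heap, then repeatedly swap the max
    # (at index 0) into its final position and shrink the heap.
    def sift_down(i, size):
        while True:
            l, r, largest = 2 * i + 1, 2 * i + 2, i
            if l < size and a[l] > a[largest]:
                largest = l
            if r < size and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(len(a) // 2 - 1, -1, -1):   # Build-Max-Heap: O(n)
        sift_down(i, len(a))
    for end in range(len(a) - 1, 0, -1):       # n-1 extractions: O(n log n)
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```

Build-Max-Heap contributes O(n) and the extraction loop O(n log n), matching the analysis above.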
to hash into any of the m slots indep of the other elems Let n be a random variable giving the length of the list at the i th slot Then time to do a search for key k is l nhk c Q What is Enhk o A We know that hk is uniformly distributed among 0 m7 1 0 Thus Enhk nm a 39 Hash Functions 0 Want each key to be equally likely to hash to any of the m slots independently of the other keys 0 Key idea is to use the hash function to break up any pat terns that might egtltist in the data 0 We will always assume a key is a natural number can eg easily convert strings to naturaly numbers 39 Division Method 0 k mod m 0 Want m to be a prime number which is not too close to a power of 2 0 Why 39 Multiplication Method hk 2 lm kA mod 1l kA mod 1 means the fractional part of kA Advantage value of m is not critical need not be a prime A 7 l2 works well in practice 39 Open Addressmg All elements are stored in the hash table there are no sepa rate linked lists When we do a search we probe the hash table until we find an empty slot Sequence of probes depends on the key Thus hash function maps from a key to a probe sequence ie a permutation of the numbers 07m7 l 39 Open Addressing 39 Open Addressing o If we use open addressing the hash table can never fill up ie the load factor a can never exceed 1 An advanta e of 0 en addressin is that it avoids ointers o In general for open addressing the hash function depends g p I I p and the overhead of storing lists In each slot of the table on both the key to be Inserted and the probe number I I o This freed up memory can be used to create more slots In 0 Thus for a key k we get the probe sequence I I the table which can reduce the load factor and potentially hltkohk1gthltkme 1 speed up retrieval tIme o A disadvantage is that deletion is difficult If deletions occur in the hash table chaining is usually used 39 OA Insert 39 OA Search UAInsertTk 0 repeat j hki if Tj nil Tj k return j UAInsertTk i 0 repeat j hki if Tj k return j else i until Tjnil or 
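A minimal chained hash table using the division method h(k) = k mod m (a sketch; Python lists stand in for the linked lists, and the class name is my own):

```python
class ChainedHash:
    # Hash table with chaining; h(k) = k mod m (division method).
    def __init__(self, m=13):          # m prime, per the division-method advice
        self.m = m
        self.table = [[] for _ in range(m)]

    def insert(self, k):
        # O(1): append to the chain for slot h(k).
        self.table[k % self.m].append(k)

    def search(self, k):
        # O(1 + alpha) expected, where alpha = n/m is the load factor.
        return k in self.table[k % self.m]

    def delete(self, k):
        # O(chain length); raises ValueError if k is absent (sketch behavior).
        self.table[k % self.m].remove(k)
```

Note that 5 and 18 collide (both hash to 5 mod 13), so they share one chain.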
im else i until im 39 OA Delete 39 OA Delete One solution is to mark the slot by storing in it the value c Deletion from an open address hash table is difficult DELETED c When we delete a key from slot 1 we can39t just mark that Then we modify OA Insert to treat such a slot as if it were slot as empty by storing nil there empty so that something can be stored in it o The problem is that this would make it impossible to find OA Search passes over these special slots while searching some keyk during whose insertion we probed sloti and found Note that if we use this trick search times are no longer it occupied dependent on the load factor a for this reason chaining is more commonly used when keys must be deleted Implementation Implementations l I All positions are taken modulo m and i ranges from 1 to m7 l c To analyze open address hashing we make the assumption of uniform hashing we assume that each key is equally likely 0 Linear Probing Initial probe is to position hk successive to have any of the m permutations of 0 l m7 l as its probes are to positions hk 1 probe sequence 0 Quadratic Probing Initial probes is to position hk succes 0 True uniform hashing is difficult to implement so in practice sive probes are to position hk 511 52 we generally use one of three approgtltimations on the next slide 0 Double Hashing Initial probe is to position hk successive probes are to positions hk ih2k Analysis Hash Tables Wrapup l l Recall that the load factor a is the number of elements stored in the hash table n divided by the total number of slots m In open address hashing we have at most one element per HaSh Tables implement the DiCtiO aI y ADT namelyi slot so a lt 1 We assume uniform hashing ie each probe maps to essen InsertOQ 01 expected time 901 worst case tially a random SlOt in the table 0 Lookupx 01 expected time n worst case We can show that the expected time for insertions is at most 0 Deletex 01 expected time n worst case ll 7 a the expected time for an unsuccessful 
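A sketch of open addressing with linear probing and the DELETED-marker trick described above (Python; all names are my own):

```python
DELETED = object()  # sentinel marking a deleted slot

class OpenAddressHash:
    # Open addressing with linear probing: probe h(k), h(k)+1, ... mod m.
    def __init__(self, m=11):
        self.m = m
        self.table = [None] * m

    def _probe(self, k):
        for i in range(self.m):
            yield (k + i) % self.m        # linear probing sequence

    def insert(self, k):
        # Treat DELETED slots as empty so they can be reused.
        for j in self._probe(k):
            if self.table[j] is None or self.table[j] is DELETED:
                self.table[j] = k
                return j
        raise RuntimeError("hash table overflow")

    def search(self, k):
        for j in self._probe(k):
            if self.table[j] is None:
                return None               # empty slot: k is absent
            if self.table[j] == k:
                return j                  # DELETED slots are passed over
        return None

    def delete(self, k):
        j = self.search(k)
        if j is not None:
            self.table[j] = DELETED
```

In the test, 3, 14, and 25 all hash to slot 3 (mod 11), so search must step over the DELETED slot left by 14, and a later insert reuses it.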
search is ll 7 a and the expected time for a successful search is laln1lea Binary Search Trees Red Black Trees l Red Black trees a kind of binary tree also implement the Dic tionary ADT namely 0 Binary Search Trees are another data structure for imple menting the dictionary ADT o Insertx Olog n time o Lookupx Olog n time o Deletex Olog n time 39 Why BST Q When would you use a Search Tree Al When need a hard guarantee on the worst case run times eg mission critical code A2 When want something more dynamic than a hash table eg don39t want to have to enlarge a hash table when the load factor gets too large A3 Search trees can implement some other important op erations 39 Search Tree Operations Insert Lookup Delete MinimumMaximum PredecessorSuccessor 39 What is a BST 0 It39s a binary tree 0 Each node holds a key and record field and a pointer to left and right children 0 Binary Search Tree Property is maintained 39 Binary Search Tree Property 0 Let as be a node in a binary search tree Ify is a node in the left subtree of ac then keyy keygtlt If y is a node in the right subtree of as then keygtltgllteyy Example BST 39 Inorder Walk 0 BSTs are arranged in such a way that we can print out the elements in sorted order in n time o Inorder Tree Walk does this 39 Inorder Tree Walk InorderTWX if X is not nil InorderTin left X print keyx InorderTin right 30 Exa mple Tree Wa k 39 Analysis Search in BT l l TreeSearchxk if xnil or k keyx return 3 o Correctness if kltkeyX Run time return TreeSearch1eftxk else return TreeSearchright X k AnalySIs 39 In Class ExerCIse 39 What is the loop invariant for Tree Search What is Initialization Maintenance Termination 0 Let h be the height of the tree 0 The run time is Oh o Correctness 39 Today s Outline Listen and Understand That terminator is out there It can t be bargained with it can t be reasoned with It doesn t feel pity remorse or fear And it absolutely will not stop ever until you CS 561 Lecture 3 are dead The Terminator Guess and Check 
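Tree-Search and the inorder walk above can be sketched as (Python; the node class is my own):

```python
class BSTNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_search(x, k):
    # Recursive search; O(h) where h is the height of the tree.
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)

def inorder(x, out):
    # Inorder walk collects keys in sorted order in Theta(n) time.
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
```

The BST property (left subtree keys <= key <= right subtree keys) is exactly what makes both routines correct.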
with Inequalities Solving Recurrences using Recursion Trees Solving Recurrences using the Masters Method Solving Recurrences using Annihilators Jared Saia University of New Mexico 39 Recurrences and Inequalities 39 Inequalities II Goal Prove by induction that for fn fn7 l fn7 2 f1 f2 1i fn S 2 1 2 o Often easier to prove that a recurrence is no more than some 39 Base Glase f1 Il 5 2 i f9 1 5 1 quantity than to prove that it equals something 39 IndUCtlve hypothes39S For a J lt 7 f0 5 2 0 Consider 1 01 2 fn7 1 160172 f1 f2 1 Induct39ve Step Guess that fn 2 n mi 1 fn72 27171 2717 2 2 1 277 39 RecurSIon tree method 39 RecurSIon tree method 0 Each node represents the cost of a single subproblem in a 0 Can use to get a 900d guess WhiCh is then refined and Verified recursive call using substitution method 0 First we sum the costs of the nodes in each level of the tree 0 Best method usually for recurrences where a term like 0 Then we sum the costs of all of the levels Tnc appears on the right hand side of the equality 39 Example 1 39 Example 1 0 Consider the recurrence for the running time of Mergesort Tn 2Tn2 n Tl 01 We can see that each level of the tree sums to n Further the depth of the tree is logn nZd 1 implies that d log Thus there are logn 1 levels each of which sums to n Hence Tn n log n n8 73 2 7i 7 7i 39 Example 2 39 Example 2 0 Let39s solve the recurrence Tn 3Tn4 n2 0 Note For simplicity from now on we39ll assume that Ti 91 for all small constants i This will save us from writing the base cases each time c We can see that the i th level of the tree sums to 316in2 0 Further the depth of the tree is log4n n4d 1 implies that n 2 n Az d Iog4n An4A2 mom 0 SO we can see that Tn Zggn3l6in2 mm mew mew mew rt16W nsw rt16W WW2 WK WWW 7i l l l l i l A 39 Solution 39 Master Theorem Iog4n 0 Divide and conquer algorithms often give us running time Z 316Zn2 recurrences of the form i0 Tn aTnb 9 0 Where a and b are constants and fn is some other function 0 n2 2316V i0 4n 0 The 
so called Master Method gives us a general method 1 g316 for solving such recurrences when fn is a simple polynomial On 39 Master Theorem 0 Unfortunately the Master Theorem doesn39t work for all func tions fn 0 Further many useful recurrences don39t look like Tn 0 However the theorem allows for very fast solution of recur rences when it applies 39 Master Theorem 0 Master Theorem isjust a special case of the use of recursion trees 0 Consider equation Tn aTnb fn c We start by drawing a recursion tree The Recursion Tree 39 The root contains the value fn It has or children each of which contains the value fnb Each of these nodes has or children containing the value row I I In general level 1 contains aZ nodes with values fnbl Hence the sum of the nodes at the i th level is aifnbi 39 Details The tree stops when we get to the base case for the recur rence o We39ll assume Tl f1 91 is the base case 0 Thus the depth of the tree is logbn and there are logbn 1 levels 39 RecurSIon Tree 0 Let Tn be the sum of all values stored in all levels of the tree To fna fnba2 fnb2 ai fltnbigt aL fnbL 0 Where L logbn is the depth of the tree 0 Since f1 91 the last term ofthis summation is 9aL 9alogbn 9nlogba 39 A Log Fact Aside 0 It39s not hard to see that alogb n39ogb alogbn nlogba alogbn alogamlogba logbn logan logba c We get to the last eqn by taking loga of both sides 0 The last eqn is true by our third basic log fact 39 Master Theorem 0 We can now state the Master Theorem 0 We will state it in a way slightly different from the book 0 Note The Master Method is just a short cut for the re cursion tree method It is less powerful than recursion trees 39 Master Method The recurrence Tn aTnb fn can be solved as follows 0 If afnb g Kfn for some constant K lt 1 then Tn fn o If afnb 2 Kfn for some constant K gt 1 then Tn 9nlogba 01f afnb fn then Tn logbn o If fn is a constant factor larger than afnb then the sum is a descending geometric series The sum of any geometric series is a constant 
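One way to sanity-check the Master Method cases is to evaluate the recurrence numerically (a rough sketch of my own; integer division stands in for exact n/b, so the checks use powers of b):

```python
def T(n, a, b, f):
    # Evaluate T(n) = a*T(n // b) + f(n), with T(1) = 1, by direct recursion.
    # Useful for seeing which Master Method case dominates.
    if n <= 1:
        return 1
    return a * T(n // b, a, b, f) + f(n)
```

For T(n) = 2T(n/2) + n (the balanced case), the value at n = 2^k is exactly k*2^k + 2^k, i.e. Theta(n log n); for T(n) = 3T(n/4) + n^2 (root-dominated case), the ratio T(n)/n^2 converges to a constant.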
times its largest term In this case the largest term is the first term fn If fn is a constant factor smaller than afnb then the sum is an ascending geometric series The sum of any ge ometric series is a constant times its largest term In this case this is the last term which by our earlier argument is 9nlogba Finally if afnb fn then each of the L 1 terms in the summation is equal to fn 39 Example Tn T3n4 n o If we write this as Tn aTnb fn then a 2 lb 2 4311 01n 0 Here afnb 3n4 is smaller than fn n by a factor of 43 50 Tn n 39 Example Karatsuba s multiplication algorithm Tn 3Tn2 n o If we write this as Tn aTnb fn then a 3b 2fn n 0 Here afnb 3n2 is bigger than fn n by a factor of 32 so Tn 9n390923 39 Example Mergesort Tn 2Tn2 71 0 If we write this as Tn aTnb fn then a 2b 2fn 2n 0 Here afnb 1 01 50 Tn n log n Example In Class Exercise l o Tn Tn2 n logn Consider the recurrence Tn 4Tn2 nlg n o If we write this as Tn aTnb fn then a 2 lb 2 Q What is fn and afnb 2fn nlogn Q Which of the three cases does the recurrence fall under 0 Here afnb n2logn2 is smaller than fn nlogn by when n is large a constant factor so Tn nlog n Q What is the solution to this recurrence 39 In Class ExerCIse 39 Take Away 0 Recursion tree and Master method are good tools for solving Consider the recurrence Tn 2Tn4 nlgn many recurrences Q What is fn and afnb 0 However these methods are limited they can39t help us get Q Which of the three cases does the recurrence fall under guesses for recurrences like fn fn7 l fn72 when n is large 0 For info on how to solve these other more difficult recur Q What is the solution to this recurrence rences review the notes on annihilators on the class web page 39 Intro to Annihilators 0 Suppose we are given a sequence of numbers A 110111112 gt o This might be a sequence like the Fibonacci numbers 0 Le A ltaoya17a27n T17T27T37gt 39 Annihilator Operators We define three basic operations we can perform on this se quence Multiply the sequence by a constant CA 2 
ltca0ca1ca2 gt Shift the sequence to the left LA 2 lta1a2a3 gt Add two sequences ifA 110111112 gt and B b0b1b2 then AB lta0b07a1b17a2b2gt 39 Annihilator Description We first express our recurrence as a sequence T We use these three operators to annihilate T ie make it all 0 s Key rule can39t multiply by the constant 0 We can then determine the solution to the recurrence from the sequence of operations performed to annihilate T 39 Example Consider the recurrence Tn 2Tn7 l TO l o If we solve for the first few terms of this sequence we can see they are lt20212223 gt 0 Thus this recurrence becomes the sequence Tlt2 212223gt Example 11 39 Distributive Property 39 Let39s annihilate T 2021232 gt Multiplying by a constant c 2 gets 2Tlt220221222223gtlt217222324gt Shifting one place to the left gets LT lt21222324gt Adding the sequence LT and 72T gives LT72T 217212272223723gt lt000gt The annihilator of T is thus L72 The distributive property holds for these three operators Thus can rewrite LTi 2T as L 7 2T The operator L 7 2 annihilates T makes it the sequence of all 039s Thus L7 2 is called the annihilator of T 39 O the Forbidden Annihilator Multiplication by 0 will annihilate any sequence Thus we disallow multiplication by O as an operation In particular we disallow sic O for any 5 as an annihilator Must always have at least one L operator in any annihilator 39 Uniqueness An annihilator annihilates exactly one type of sequence In general the annihilator L7 c annihilates any sequence of the form 1061 If we find the annihilator we can find the type of sequence and thus solve the recurrence We will need to use the base case for the recurrence to solve for the constant 10 Example Example 11 l I What does L7c do to other sequences A 2 mod when d b 5 If we apply operator L 7 3 to sequence T above it fails to annihilate T LClta07a0d7a0d27a0d3wquotgt Llta07 aody 1100127 1100137 39 39 39 gt CW0 aody 1100127 1100137 39 39 39gt L7 3 T LT 73 T aod 0Lon7 mods7 gt 7 ltca0 caod COLon7 
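The three operators are easy to play with directly (a sketch of my own; finite list slices stand in for infinite sequences, so each application of L drops one term):

```python
def shift(seq):
    # The operator L: shift the sequence one place to the left.
    return seq[1:]

def scale(c, seq):
    # Multiply a sequence by the constant c (c must be nonzero).
    return [c * x for x in seq]

def add(s, t):
    # Add two sequences term by term.
    return [x + y for x, y in zip(s, t)]

def L_minus(c, seq):
    # Apply the operator (L - c): (L - c)T = LT - cT.
    return add(shift(seq), scale(-c, seq[:-1]))

# T(n) = 2T(n-1), T(0) = 1, i.e. the sequence <2^0, 2^1, 2^2, ...>
T = [2 ** n for n in range(8)]
```

As claimed, (L - 2) annihilates T, while (L - 3) fails to: it produces the nonzero sequence <-2^n>.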
corods7 gt lt212223gtlt73x2073x2173x22gt lt273x20273x21273x22gt 273T7T aod 7 cao lon 7 caod a0d3 7 00Lon7 gt d 7 cm d 7 mod d 7 Ca0d2 gt d Clta07 aody 1100127 39 39 39gt d 7 CA Uniqueness Lookup Table l l o The last example implies that an annihilator annihilates one type of sequence but does not annihilate other types of sequences 0 Thus Annihilators can help us classify sequences and thereby solve recurrences o The annihilator L7 a annihilates any sequence of the form ltC1angt 39 Example First calculate the annihilator o Recurrence Tn 4Tn7 l TO 2 0 Sequence Tlt224242243gt o Calulate the annihilator LTlt24242243244 4Tlt24242243244 Thus LTe4Tlt000gt And so L74 is the annihilator 39 Example 11 Now use the annihilator to solve the recurrence Look up the annihilator in the Lookup Table It says The annihilator L 7 4 annihilates any sequence of the form 1471 Thus Tn 2 C14 but what is q We know TO 2 so TO 5140 2 and so C1 2 Thus Tn 2 9 4quot 39 In Class ExerCIse Consider the recurrence Tn 3 Tn7 l TO 3 0 Q1 Calculate TOTlT2 and T3 and write out the sequence T 0 Q2 Calculate LT and use it to compute the annihilator of T 0 Q3 Look up this annihilator in the lookup table to get the general solution of the recurrence for Tn 0 Q4 Now use the base case TO 3 to solve for the con stants in the general solution CS 561 Lecture 22 Jared Saia University of New Mexico 39 Generic Traverse Traverses put ni1s in bag while the bag is not empty if v is unmarked take some edge pv from the bag mark v parent v p for each edge vw incident to v put vw into the bag 39 Today s Outline 0 BFS and DFS Wrapup o Shortest Paths 39 DFS and BFS First Search 0 If we implement the bag by using a stack we have Depth First Search 0 If we implement the bag by using a queue we have Breadth quot Analysis 0 Note that if we use adjacency lists for the graph the overhead for the for loop is only a constant per edge no matter how we implement the bag 0 If we implement the bag using either stacks or queues each 
operation on the bag takes constant time 0 Hence the overall runtime is OlVl lEl OlEl DPS VS BPS 0 Note that DFS trees tend to be long and skinny while BFS trees are short and fat 0 In addition the BFS tree contains shortest paths from the start vertex 5 to every other vertex in its connected compo nent here we define the length of a path to be the number of edges in the path Final Note 0 Now assume the edges are weighted o If we implement the bag using a priority queue always extracting the minimum weight edge from the bag then we have a version of Prim39s algorithm 0 Each extraction from the bag now takes OlEl time so the total running time is OlVl lEl log lEl Example A depth first spanning tree and a breadth first spanning tree of one component of the example graph with start vertex 1 39 Searching Disconnected Graphs 39 DFS and BPS If the graph is disconnected then Traverse only visits nodes in the connected component of the start vertex 5 If we want to visit all vertices we can use the following wrapper around Traverse 0 Note that we can do DFS and BFS equally well on undirected and directed graphs 0 If the graph is undirected there are two types of edges in G edges that are in the DPS or BFS tree and edges that are not in this tree 0 If the graph is directed there are several types of edges TraverseA11 for all vertices v if v is unmarked Traverse v DFS In Directed Graphs 39 Acyclic graphs 39 Tree edges are edges that are in the tree itself Back edges are those edges 744 connecting a vertex u to an ancestor v in the DPS tree 0 Useful Fact A directed graph G is acyclic if and only if a Forward edges are nontree edges uv that connect a vertex DFS of G yeilds no back edges u to a descendant in a DFS tree 0 Challenge Try to prove this fact Cross edges are all other edges They go between two ver tices where neither vertex is a descendant of the other Take Away Shortest Paths Problem I Another interesting problem for graphs is that of finding shortest paths 0 BFS and 
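The generic Traverse algorithm can be sketched with a deque serving as the bag (my own rendering; `pop()` gives stack/DFS behavior, `popleft()` gives queue/BFS behavior):

```python
from collections import deque

def traverse(adj, s, use_stack=True):
    # Generic Traverse: the bag holds (parent, vertex) pairs; a vertex is
    # marked by recording its parent. Returns the spanning-tree parent map.
    bag = deque([(None, s)])
    parent = {}
    while bag:
        p, v = bag.pop() if use_stack else bag.popleft()
        if v not in parent:            # v is unmarked
            parent[v] = p              # mark v, record the tree edge (p, v)
            for w in adj[v]:
                bag.append((v, w))
    return parent

adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
```

On this 4-cycle, the BFS tree is short and fat (4's parent is 2, one of the two shortest routes from 1), while the DFS tree is long and skinny.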
DFS are two useful algorithms for exploring graphs Assume we are given a weighted directed graph G 2 ME 0 Each of these algorithms is an instantiation of the Traverse with two speciai verticesi a source S and a target t algorithm BPS uses a queue to how the edges and DFS We want to find the shortest directed path from s to t uses a StaCk In other words we want to find the path p starting at s and 0 Each of these algorithms constructs a spanning tree of all ending at t minimizing the function the nodes which are reachable from the start node 5 who Z we 56p Example 39 Imagine we want to find the fastest way to drive from Albu querqueNM to SeattleWA c We might use a graph whose vertices are cities edges are roads weights are driving times 5 is Albuquerque and t is Seattle 0 The graph is directed since driving times along the same road might be different in different directions eg because of construction speed traps etc 0 Every algorithm known for solving this problem actually solves the following more general single source shortest paths or SSSP problem 0 Find the shortest path from the source vertex 5 to every other vertex in the graph 0 This problem is usually solved by finding a shortest path tree rooted at s that contains all the desired shortest paths 39 Shortest Path Tree It39s not hard to see that if the shortest paths are unique then they form a tree To prove this werrTEE39d39 only observe that the 39s39u bfp39aths of shortest paths are themselves shortest paths If there are multiple shotest paths to the same vertex we can always choose just one of them so that the union of the paths is a tree If there are shortest paths to two vertices u and U which diverge then meet then diverge again we can modify one of the paths so that the two paths diverge once only M Example Ifsgtagtbgtcgtdgtv and saaaxayadaual e both shortest paths then SHaHbHCHdHU is also a shortest path 39 MST vs SPT 39 0 Note that the minimum spanning tre xshorte can be different Orao o For one 
thing, there may be only one MST, but there can be multiple shortest path trees, one for every source vertex.

Example: A minimum spanning tree (left) and a shortest path tree rooted at the topmost vertex (right).

Negative Weights: We'll actually allow negative weights on edges. The presence of a negative cycle might mean that there is no shortest path: a shortest path from s to t exists if and only if there is at least one path from s to t, but no path from s to t that touches a negative cycle. In the following example, there is no shortest path from s to t.

SSSP Algorithms: We'll now go over some algorithms for SSSP on directed graphs. These algorithms will work for undirected graphs with slight modification; in particular, we must specifically prohibit alternating back and forth across the same undirected negative-weight edge. As with graph traversal, all the SSSP algorithms will be special cases of a single generic algorithm.

Each vertex v in the graph will store two values which describe a tentative shortest path from s to v: dist(v) is the length of the tentative shortest path between s and v, and pred(v) is the predecessor of v in this tentative shortest path. The predecessor pointers automatically define a tentative shortest path tree. Initially we set dist(s) = 0 and pred(s) = NULL, and for every vertex v != s, dist(v) = infinity and pred(v) = NULL.

Relaxation: We call an edge (u,v) tense if dist(u) + w(u,v) < dist(v). If (u,v) is tense, then the tentative shortest path from s to v is incorrect, since the path from s to u followed by (u,v) is shorter. Our generic algorithm repeatedly finds a tense edge in the graph and relaxes it. If there are no tense edges, our algorithm is finished and we have our desired shortest path tree.

Relax(u,v)
  dist(v) = dist(u) + w(u,v)
  pred(v) = u

Correctness: The correctness of the relaxation algorithm follows directly from three simple claims. Claim 1: if dist(v) != infinity, then dist(v) is the total weight of the predecessor chain ending at v. The run time of the algorithm will depend on
the way that S A 39 39 39 Hpredwre v Hpre v A U39 we make choices about which edges to relax o This is easy to prove by induction on the number of edges in the path from s to u left as an exercise o If the algorithm halts then distv g ws v for any path SM 1 o This is easy to prove by induction on the number of edges in the path 5 u left as an exercise 0 The algorithm halts if and only if there is no negative cycle reachable from s o The only if39 direction is easy ifthere is a reachable negative cycle then after the first edge in the cycle is relaxed the cycle always has at least one tense edge 0 The if39 direction follows from the fact that every relagtltation step reduces either the number of vertices with distv 00 by l or reduces the sum of the finite shortest path lengths by some positive amount 39 Generic SSSP We haven39t yet said how to detect which edges can be relaxed or what order to relax them in The following Generic SSSP algorithm answers these ques tions We will maintain a bag of vertices initially containing just the source vertex 5 Whenever we take a vertex u out of the bag we scan all of its outgoing edges looking for something to relax Whenever we successfully relagtlt an edge 744 we put 1 in the bag 39 InitSSSP InitSSSP 5H dists O preds NULL for all vertices v s distv infinity predv NULL 39 GenericSSSP 39 Generic SSSP GenericSSSPs InitSSSPs put 5 in the bag while the bag is not empty Just as with graph traversal using different data structures take u from the bag for the bag gives us different algorithms for all edges uv Some obvious choices are a stack a queue and a heap if uv is tense Unfortunately if we use a stack we need to perform 92lEl Relaxuv relagtltation steps in the worst case an exercise for the diligent put v in the bag student The other possibilities are more efficient Diskstra 5 Algorithm 39 Duktra 5 Algorithm 39 If we implement the bag as a heap where the key of a vertex v is distv we obtain Dijkstra39s algorithm Dijkstra39s 
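The heap-based bag just described can be sketched with Python's heapq (my own rendering; since heapq has no DecreaseKey, this version pushes duplicate entries and skips stale ones, a standard workaround rather than the Fibonacci-heap variant analyzed in the notes):

```python
import heapq

def dijkstra(adj, s):
    # adj maps each vertex to a list of (neighbor, weight) pairs.
    # Assumes non-negative edge weights, as in the analysis.
    dist = {s: 0}
    pred = {s: None}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                               # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):  # edge (u,v) is tense
                dist[v] = d + w                    # relax it
                pred[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, pred
```

With a binary heap and duplicate entries, the running time is O((|V| + |E|) log |V|), slightly weaker than the Fibonacci-heap bound.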
algorithm does particularly well if the graph has no negative-weight edges. In this case it's not hard to show (by induction, of course) that the vertices are scanned in increasing order of their shortest-path distance from s. It follows that each vertex is scanned at most once, and thus that each edge is relaxed at most once. Since the key of each vertex in the heap is its tentative distance from s, the algorithm performs a DecreaseKey operation every time an edge is relaxed. Thus the algorithm performs at most |E| DecreaseKeys. Similarly, there are at most |V| Insert and ExtractMin operations. Thus, if we store the vertices in a Fibonacci heap, the total running time of Dijkstra's algorithm is O(|E| + |V| log |V|).

Negative Edges: This analysis assumes that no edge has negative weight. The algorithm given here is still correct if there are negative-weight edges, but the worst-case run time could be exponential. (The algorithm in our textbook gives incorrect results for graphs with negative edges, which they make clear.)

Example: Four phases of the algorithm run on a graph with no negative edges. At each phase, the shaded vertices are in the heap and the bold vertex has just been scanned. The bold edges describe the evolving shortest-path tree.

CS 561 Lecture 12
Jared Saia, University of New Mexico

Today's Outline: Intro to Dynamic Programming; String Alignment

DP Intro: "Those who cannot remember the past are doomed to repeat it." -- George Santayana, The Life of Reason, Book I: Introduction and Reason in Common Sense, 1905

What is Dynamic Programming? Dynamic Programming is basically Divide and Conquer with memoization. The basic trick: don't solve the same problem more than once.

Fibonacci Example: Consider the following procedure for computing the n-th Fibonacci number, Fib(n):

Fib(n)
  if n < 2
    return n
  else
    return Fib(n-1) + Fib(n-2)

Analysis: Q: What is the runtime of Fib? A: Except for the recursive calls, the entire algorithm takes a constant number of steps. If T(n) is the run time of the algorithm on input n, then we can say that T(0) = T(1) = 1 and T(n) = T(n-1) + T(n-2) + 1. Q: How can we solve T(n)
A: We solved this recurrence using annihilators in the last lecture to get T(n) = c1 phi^n + c2 phihat^n + c3, where phi = (1+sqrt(5))/2 and phihat = (1-sqrt(5))/2.

Aside

- If we solve for constants using the base cases, we get:
    T(0) = c1 + c2 + c3 = 1
    T(1) = c1 phi + c2 phihat + c3 = 1
    T(2) = c1 phi^2 + c2 phihat^2 + c3 = 3
- Solving this system of linear equations using Gaussian elimination gives c3 = -1, c1 = 1 + 1/sqrt(5), c2 = 1 - 1/sqrt(5).
- So our final solution is T(n) = (1 + 1/sqrt(5)) phi^n + (1 - 1/sqrt(5)) phihat^n - 1.

The Problem

- The reason Fib is so slow is that it computes the same Fibonacci numbers over and over.
- In general, there are F(k+1) recursive calls to Fib(n-k).
- We can greatly speed up the algorithm by writing down the results of the recursive calls, and looking them up if needed:

  DP-Fib(n):
    if n < 2
      return n
    else
      if F[n] is undefined
        F[n] = DP-Fib(n-1) + DP-Fib(n-2)
      return F[n]

Analysis

- For every value of x between 1 and n, DP-Fib(x) is computed exactly one time.
- Each call does constant work.
- Thus the runtime of DP-Fib(n) is O(n) -- a huge savings.

Take Away

Dynamic Programming is different than Divide and Conquer in the following way:

- Divide and Conquer divides the problem into independent subproblems, solves the subproblems recursively, and then combines solutions to solve the original problem.
- Dynamic Programming is used when the subproblems are not independent, i.e., the subproblems share subsubproblems.
- For these kinds of problems, divide and conquer does more work than necessary.
- Dynamic Programming solves each subproblem once only, and saves the answer in a table for future reference.

The Pattern

- Formulate the problem recursively: write down a formula for the whole problem as a simple combination of answers to smaller subproblems.
- Build solutions to your recurrence from the bottom up: write an algorithm that starts with the base cases of your recurrence and works its way up to the final solution by considering the intermediate subproblems in the correct order.

Note: Dynamic programs store the results of intermediate subproblems. This is frequently, but not always, done with some type of table.

Edit Distance

- The edit distance between two words is the minimum number of letter insertions, letter deletions, and letter substitutions required to transform one word into another.
- For example, the edit distance between FOOD and MONEY is at most four:
    FOOD -> MOOD -> MOND -> MONED -> MONEY

A better way to display this process

- Place the words one above the other in a table.
- Put a gap in the first word for every insertion, and a gap in the second word for every deletion.
- Columns with two different characters correspond to substitutions.
- Then the number of editing steps is just the number of columns that don't contain the same character twice.

    FOO D
    MONEY

It's not too hard to see that we can't do better than four for the edit distance between FOOD and MONEY.

Example II

- Unfortunately, it can be more difficult to compute the edit distance exactly -- for example, for longer strings such as ALGORITHM and ALTRUISTIC.

Key Observation

- If we remove the last column in an optimal alignment, the remaining alignment must also be optimal.
- Easy to prove by contradiction: assume there is some better subalignment of all but the last column. Then we can just paste the last column onto this better subalignment to get a better overall alignment.
- Note: the last column can be either (1) a blank on top aligned with a character on bottom, (2) a character on top aligned with a blank on bottom, or (3) a character on top aligned with a character on bottom.

DP Solution

- To develop a DP algorithm for this problem, we first need to find a recursive definition.
- Assume we have an m-length string A and an n-length string B. Let E(i,j) be the edit distance between the first i characters of A and the first j characters of B. Then what we want to find is E(m,n).

Recursive Definition

- Say we want to compute E(i,j) for some i and j.
- Further, say that the Recursion
Fairy can tell us the solution to E(i',j') for all i' <= i and j' <= j, except for i' = i and j' = j.
- Q: Can we compute E(i,j) efficiently, with help from our fairy friend?

Recursive Definition

There are three possible cases. Let |A[i] != B[j]| be 1 if A[i] and B[j] are different, and 0 if they are the same. Then:

- Insertion: E(i,j) = E(i, j-1) + 1
- Deletion: E(i,j) = E(i-1, j) + 1
- Substitution: if A[i] = B[j], then E(i,j) = E(i-1, j-1); else E(i,j) = E(i-1, j-1) + 1

Summary

  E(i,j) = min( E(i, j-1) + 1,
                E(i-1, j) + 1,
                E(i-1, j-1) + |A[i] != B[j]| )

Base Cases

It's not too hard to see that:

- E(0,j) = j for all j, since the j characters of B must be aligned with blanks.
- Similarly, E(i,0) = i for all i.

Recursive Alg

- We now have enough info to directly create a recursive algorithm.
- The run time of this recursive algorithm would be given by the following recurrence:
    T(m,0) = T(0,n) = O(1)
    T(m,n) = T(m-1,n) + T(m,n-1) + T(m-1,n-1) + O(1)
- The solution T(n,n) grows exponentially in n, which is terribly, terribly slow.

Better Idea

- We can build up an m x n table which contains all values of E(i,j).
- We start by filling in the base cases for this table: the entries in the 0-th row and 0-th column.
- To fill in any other entry, we need to know the values directly above, to the left, and above-and-to-the-left.
- Thus we can fill in the table in the standard way (left to right, top down) to ensure that the entries we need to fill in each cell are always available.

Example Table

- Bold numbers indicate places where characters in the strings are equal.
- Arrows represent predecessors that define each entry: a horizontal arrow is a deletion, vertical is an insertion, and diagonal is a substitution. Bold diagonal arrows are "free" substitutions of a letter for itself.
- Any path of arrows from the top-left to the bottom-right corner gives an optimal alignment. There are three such paths in this example table, so there are three optimal edit sequences.
- (The example table itself, with column string ALGORITHM, did not survive extraction; only its caption is kept here.)

The code

  EditDistance(A[1..m], B[1..n]):
    for i = 1 to m
      Edit[i,0] = i
    for j = 1 to n
      Edit[0,j] = j
    for i = 1 to m
      for j = 1 to n
        if A[i] == B[j]
          Edit[i,j] = min(Edit[i-1,j] + 1, Edit[i,j-1] + 1, Edit[i-1,j-1])
        else
          Edit[i,j] = min(Edit[i-1,j] + 1, Edit[i,j-1] + 1, Edit[i-1,j-1] + 1)
    return Edit[m,n]

Analysis

- Let n be the length of the first string and m the length of the second string.
- Then there are nm entries in the table, and it takes O(1) time to fill each entry.
- This implies that the run time of the algorithm is O(nm).
- Q: Can you find a faster algorithm?

Reconstructing an optimal alignment

- In this code, we do not keep info around to reconstruct the optimal alignment.
- However, it is a simple matter to keep around another array which stores, for each cell, a pointer to the cell that was used to achieve the current cell's minimum edit distance.
- To reconstruct a solution, we then need only follow these pointers from the bottom-right corner up to the top-left corner.

In-Class Exercise

- Create a string alignment table for the two strings "abba" and "bab". Put abba at the top of the table and bab on the left side.
- Q1-5: What is the i-th row of your table?
- Q6: What is the minimum edit distance, and how many alignments achieve it?

Take Away

- To solve the string alignment problem, we did the following: (1) formulated the problem recursively, (2) built a solution to the recurrence from the bottom up.
- Next we'll see how a similar technique can be used to solve the matrix multiplication problem.

CS 561, Lecture 15
Jared Saia, University of New Mexico

Today's Outline

- Greedy Algorithm Intro
- Activity Selection
- Knapsack

Greedy Algorithms

"Greed is Good" - Michael Douglas in Wall Street

- A greedy algorithm always makes the choice that looks best at the moment.
- Greedy algorithms do not always lead to optimal solutions, but for many problems they do.
- In the next week, we will see several problems for which greedy algorithms produce optimal solutions, including activity selection and fractional knapsack.
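The table-filling EditDistance routine above, transcribed into runnable Python (a sketch; indices are shifted to 0-based, but the recurrence and fill order are the same):

```python
def edit_distance(a, b):
    """Bottom-up edit distance between strings a and b."""
    m, n = len(a), len(b)
    # edit[i][j] = edit distance between a[:i] and b[:j]
    edit = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        edit[i][0] = i              # base case: delete all i characters of a
    for j in range(n + 1):
        edit[0][j] = j              # base case: insert all j characters of b
    for i in range(1, m + 1):       # fill left to right, top down
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            edit[i][j] = min(edit[i - 1][j] + 1,        # deletion
                             edit[i][j - 1] + 1,        # insertion
                             edit[i - 1][j - 1] + sub)  # substitution
    return edit[m][n]
```

On the lecture's example, `edit_distance("FOOD", "MONEY")` returns 4, matching the alignment shown earlier.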
When we study graph theory, we will also see that greedy algorithms can work well for computing shortest paths and finding minimum spanning trees.

Activity Selection

- You are given a list of programs to run on a single processor.
- Each program has a start time and a finish time.
- However, the processor can only run one program at any given time, and there is no preemption (i.e., once a program is running, it must be completed).

Another Motivating Problem

- Suppose you are at a film fest; all movies look equally good, and you want to see as many complete movies as possible.
- This problem is exactly the same as the activity selection problem.

Example

- Imagine you are given a set of start and stop times for activities. (The example table of intervals did not survive extraction.)
- There are many ways to optimally schedule these activities.
- Brute force: examine every possible subset of the activities and find the largest subset of non-overlapping activities.
- Q: If there are n activities, how many subsets are there?
- The book also gives a DP solution to the problem (note: code for this algorithm is in Section 16.1).

Greedy Activity Selector

- Sort the activities by their finish times.
- Schedule the first activity in this list.
- Now go through the rest of the sorted list in order, scheduling activities whose start time is after (or the same as) the finish time of the last scheduled activity.

Analysis

- Let n be the total number of activities.
- The algorithm first sorts the activities by finish time, taking O(n log n).
- Then the algorithm visits each activity exactly once, doing a constant amount of work each time. This takes O(n).
- Thus total time is O(n log n).

Optimality

- The big question here is: does the greedy algorithm give us an optimal solution?
- Surprisingly, the answer turns out to be yes.
- We can prove this is true by contradiction.

Proof of Optimality

- Let A be the set of activities selected by the greedy algorithm.
- Consider any non-overlapping set of activities B.
- We will show that |A| >= |B|, by showing that we can replace each activity in B with an activity in A.
- This will show that A has at least as many activities as any other non-overlapping schedule, and thus that A is optimal.

Proof of Optimality (continued)

- Let a_m be the first activity in A that is different than an activity in B. Then
    A = a_1, a_2, ..., a_m, a_{m+1}, ...
    B = a_1, a_2, ..., b_m, b_{m+1}, ...
- But since A was chosen by the greedy algorithm, a_m must have a finish time which is earlier than the finish time of b_m. Thus
    B' = a_1, a_2, ..., a_m, b_{m+1}, ...
  is also a valid schedule.
- Continuing this process, we see that we can replace each activity in B with an activity in A. QED

Recap

- We wanted to show that the schedule A chosen by greedy was optimal.
- To do this, we showed that the number of activities in A was at least as large as the number of activities in any other non-overlapping set of activities.
- To show this, we considered an arbitrary non-overlapping set of activities B, and showed that we could replace each activity in B with an activity in A.

Greedy Pattern

- The problem has a solution that can be given some numerical value. The best (optimal) solution has the highest/lowest value.
- The solution can be broken down into steps. The steps have some order, and at each step there is a choice that makes up the solution.
- The choice is based on what's best at a given moment. We need a criterion that will distinguish one choice from another.
- Finally, we need to prove that the solution we get by making these local choices is indeed optimal.

Activity Selection Pattern

- The value of the solution is the number of non-overlapping activities. The best solution has the highest number.
- The sorting gives the order to the activities. Each step is examining the next activity in order, and deciding whether to include it.
- In each step, the greedy algorithm chooses the activity which extends the length of the schedule as little as possible.

Knapsack Problem

- Those problems for which greedy algorithms can be used are a subset of those problems for which dynamic programming can be used.
- So it's easy to mistakenly generate a dynamic program for a problem for which a greedy algorithm suffices, or to try to use a greedy algorithm when in fact dynamic programming is required.
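The greedy activity selector described above, as a runnable sketch. The interval data in the test is hypothetical (the lecture's example table did not survive extraction):

```python
def select_activities(activities):
    """Greedy activity selection.

    activities: list of (start, finish) pairs.
    Sort by finish time, then take each activity whose start time is
    at or after the finish time of the last activity chosen.
    """
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen
```

The sort dominates the running time, giving the O(n log n) bound from the analysis above.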
The knapsack problem illustrates this difference: the 0-1 knapsack problem requires dynamic programming, whereas for the fractional knapsack problem a greedy algorithm suffices.

0-1 Knapsack

- The problem: a thief robbing a store finds n items; the i-th item is worth v_i dollars and weighs w_i pounds, where w_i and v_i are integers. The thief has a knapsack which can only hold W pounds, for some integer W. The thief's goal is to take as valuable a load as possible. Which items should the thief take?
- This is called the 0-1 knapsack problem because each item is either taken or not taken; the thief cannot take a fractional amount.

Fractional Knapsack

- In this variant of the problem, the thief can take fractions of items rather than the whole item.
- An item in the 0-1 knapsack is like a gold ingot, whereas an item in the fractional knapsack is like gold dust.

We can solve the fractional knapsack problem with a greedy algorithm:

- Compute the value per pound, v_i/w_i, for each item.
- Sort the items by value per pound.
- The thief then follows the greedy strategy of always taking as much as possible of the remaining item which has the highest value per pound.

Analysis

- If there are n items, this greedy algorithm takes O(n log n) time.
- We'll show in the in-class exercise that it returns the correct solution.
- Note, however, that the greedy algorithm does not work on the 0-1 knapsack.

Failure on 0-1 Knapsack

- Say the knapsack holds weight 5, and there are three items. Let item 1 have weight 1 and value 3, item 2 have weight 2 and value 5, and item 3 have weight 3 and value 6.
- Then the values per pound of the items are 3, 2.5, 2 respectively.
- The greedy algorithm will then choose item 1 and item 2, for a total value of 8. However, the optimal solution is to choose items 2 and 3, for a total value of 11.

Optimality of Greedy on Fractional

- Greedy is not optimal on 0-1 knapsack, but it is optimal on fractional knapsack.
- To show this, we can use a proof by contradiction.

Proof setup

- Assume the objects are sorted in order of cost per pound. Let v_i be the value for item i and let w_i be its weight.
- Let x_i be the fraction of object i selected by greedy, and let V be the total value obtained by greedy.
- Consider some arbitrary solution B, and let x'_i be the fraction of object i taken in B, and let V' be the total value obtained by B.
- We want to show that V' <= V, or that V - V' >= 0.
- Let k be the smallest index with x_k < 1.
- Note that for i < k, x_i = 1, and for i > k, x_i = 0.
- You will show that, for all i:
    (x_i - x'_i) (v_i / w_i) >= (x_i - x'_i) (v_k / w_k)

Proof

  V - V' = sum_{i=1}^{n} x_i v_i - sum_{i=1}^{n} x'_i v_i
         = sum_{i=1}^{n} (x_i - x'_i) (v_i / w_i) w_i
         >= sum_{i=1}^{n} (x_i - x'_i) (v_k / w_k) w_i
         = (v_k / w_k) sum_{i=1}^{n} (x_i - x'_i) w_i
         = (v_k / w_k) (W - W')
         >= 0

- Here W is the total weight taken by greedy, and W' is the total weight for the strategy B.
- Note that the last step follows because v_k/w_k is positive and we know that W >= W'.

In-Class Exercise

Consider the inequality (x_i - x'_i)(v_i/w_i) >= (x_i - x'_i)(v_k/w_k).

- Q1: Show this inequality is true for i < k.
- Q2: Show it's true for i = k.
- Q3: Show it's true for i > k.

CS 561, Lecture 20
Jared Saia, University of New Mexico

Today's Outline

- Data Structures for Disjoint Sets (continued)

Disjoint Sets

- A disjoint-set data structure maintains a collection S_1, S_2, ..., S_k of disjoint dynamic sets.
- Each set is identified by a representative, which is a member of that set.

Operations

We want to support the following operations:

- Make-Set(x): creates a new set whose only member (and representative) is x.
- Union(x,y): unites the sets that contain x and y, call them S_x and S_y, into a new set that is S_x U S_y. The new set is added to the data structure, while S_x and S_y are deleted. The representative of the new set is any member of the set.
- Find-Set(x): returns a pointer to the representative of the unique set containing x.
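Returning briefly to the fractional knapsack: the greedy strategy above in runnable form. On the lecture's example (capacity 5; weight/value pairs (1,3), (2,5), (3,6)) it takes items 1 and 2 whole plus 2/3 of item 3, for a total value of 3 + 5 + 4 = 12. This is a sketch; the (value, weight) tuple encoding is an assumption of the example.

```python
def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs.

    Greedily take as much as possible of the remaining item with the
    highest value per pound, fractionally if needed.
    """
    total = 0.0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1],
                                reverse=True):
        if remaining <= 0:
            break
        take = min(weight, remaining)        # whole item, or a fraction
        total += value * (take / weight)
        remaining -= take
    return total
```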
- Let's call the members of the sets objects.

Simple Union

  MakeSet(x):
    parent(x) = x
    size(x) = 1

  SimpleUnion(x,y):
    xRep = FindSet(x)
    yRep = FindSet(y)
    if size(xRep) > size(yRep)
      parent(yRep) = xRep
      size(xRep) = size(xRep) + size(yRep)
    else
      parent(xRep) = yRep
      size(yRep) = size(yRep) + size(xRep)

Analysis

- We showed in last class that the heights of all trees are no more than logarithmic in the number of nodes in the tree.
- Thus, all of these operations take O(log n) time.
- Q: Can we do better?
- A: Yes, we can do much better in an amortized sense.

Shallow Threaded Trees

- One good idea is to just have every object keep a pointer to the leader of its set. In other words, each set is represented by a tree of depth 1.
- Then Make-Set and Find-Set are completely trivial, and they both take O(1) time.
- Q: What about the Union operation?
- To do a union, we need to set all the leader pointers of one set to point to the leader of the other set. To do this, we need a way to visit all the nodes in one of the sets.
- We can do this easily by threading a linked list through each set, starting with the set's leader. The threads of two sets can be merged by the Union algorithm in constant time.

The Code

  MakeSet(x):
    leader(x) = x
    next(x) = NULL

  FindSet(x):
    return leader(x)

  Union(x,y):
    xRep = FindSet(x)
    yRep = FindSet(y)
    z = yRep
    leader(z) = xRep
    while next(z) != NULL
      z = next(z)
      leader(z) = xRep
    next(z) = next(xRep)
    next(xRep) = yRep

Example: Merging two sets stored as threaded trees. Bold arrows point to leaders, lighter arrows form the threads. Shaded nodes have a new leader.

Analysis

- The worst-case time of Union is a constant times the size of the set whose leader pointers are updated.
- So if we merge a one-element set with an n-element set, the run time can be Theta(n).
- In the worst case, it's easy to see that n operations can take Theta(n^2) time for this algorithm.

Problem

- The main problem here is that, in the worst case, we always get unlucky and choose to update the leader pointers of the larger set.
- Instead, let's purposefully choose to update the leader pointers of the smaller set.

The Code

  MakeWeightedSet(x):
    leader(x) = x
    next(x) = NULL
    size(x) = 1

- To do this, we
will need to keep track of the sizes of all the sets.

The Code

  WeightedUnion(x,y):
    xRep = FindSet(x)
    yRep = FindSet(y)
    if size(xRep) > size(yRep)
      Union(xRep, yRep)
      size(xRep) = size(xRep) + size(yRep)
    else
      Union(yRep, xRep)
      size(yRep) = size(xRep) + size(yRep)

Analysis

- The Weighted-Union algorithm still takes Theta(n) time to merge two n-element sets.
- However, in an amortized sense, it is more efficient: intuitively, in order to merge two large sets, we need to perform a large number of cheap Weighted-Unions.
- We will show that a sequence of n Make-Weighted-Set operations and m Weighted-Union operations takes O(m + n log n) time in the worst case.

- Whenever the leader of an object x is changed by a call to Weighted-Union, the size of the set containing x increases by a factor of at least 2.
- Thus, if the leader of x has changed k times, the set containing x has at least 2^k members.
- After the sequence of operations ends, the largest set has at most n members.
- Thus, the leader of any object x has changed at most floor(log n) times.

- Let n be the number of calls to Make-Weighted-Set, and m be the number of calls to Weighted-Union.
- We've shown that each of the objects that are not in singleton sets had at most O(log n) leader changes.
- Thus, the total amount of work done in updating the leader pointers is O(n log n).
- We know that, other than the work needed to update these leader pointers, each call to one of our functions does only constant work.
- Thus, the total amount of work is O(n log n + m), and each Weighted-Union call has amortized cost O(log n).

Analysis (summary)

- Using Simple-Union: Find takes logarithmic worst-case time, and everything else is constant.
- Using Weighted-Union: Union takes logarithmic amortized time, and everything else is constant.
- A third method allows us to get both of these operations in almost constant amortized time.
- Side note: we've just used the aggregate method of amortized analysis.

Path Compression

- We start with the unthreaded tree representation from Simple-Union.
- Key observation: in any Find operation, once we get the leader of an object x, we can speed up future Finds by redirecting x's parent pointer directly to that leader.
- We can also change the parent pointers of all ancestors of x, all the way up to the root. We'll do this using recursion.
- This modification to Find is called path compression.

Example: Path compression during Find(c). Shaded nodes have a new parent.

PC-Find Code

  PCFind(x):
    if x != parent(x)
      parent(x) = PCFind(parent(x))
    return parent(x)

- For ease of analysis, instead of keeping track of the size of each of the trees, we will keep track of the rank.
- Each node will have an associated rank.
- This rank will give an estimate of the log of the number of elements in the set.

Code

  PCMakeSet(x):
    parent(x) = x
    rank(x) = 0

  PCUnion(x,y):
    xRep = PCFind(x)
    yRep = PCFind(y)
    if rank(xRep) > rank(yRep)
      parent(yRep) = xRep
    else
      parent(xRep) = yRep
      if rank(xRep) == rank(yRep)
        rank(yRep) = rank(yRep) + 1

Rank Facts

- If an object x is not the set leader, then the rank of x is strictly less than the rank of its parent.
- For a set X, size(X) >= 2^rank(leader(X)) (can show using induction).
- Since there are n objects, the highest possible rank is O(log n).
- Only set leaders can change their rank.

Rank Facts (continued)

- Can also say that there are at most n/2^r objects with rank r:
  - When the rank of a set leader x changes from r-1 to r, mark all nodes in that set. At least 2^r nodes are marked, and each of these marked nodes will always have rank less than r.
  - There are n nodes total, and any object with rank r marks 2^r of them. Thus there can be at most n/2^r objects of rank r.

Blocks

- We will also partition the objects into several numbered blocks: x is assigned to block number log*(rank(x)).
- Intuitively, log*(n) is the number of times you need to hit the "log" button on your calculator, after entering n, before you get 1.
- In other words, x is in block b if 2^^(b-1) < rank(x) <= 2^^b, where 2^^b is defined on the next slide.

Definition

- 2^^b is the tower function:
    2^^b = 1                if b = 0
    2^^b = 2^(2^^(b-1))     if b > 0

Number of Blocks

- Every object has a rank between 0 and floor(log n).
- So the block numbers range from 0 to log*(floor(log n)).
- Hence there are at most log*(n) blocks.

Number of Objects in Block b

- Since there are at most n/2^r objects with any rank r, the total number of objects in block b is at most

    sum_{r = 2^^(b-1)+1}^{2^^b} n/2^r  <  sum_{r = 2^^(b-1)+1}^{infinity} n/2^r  =  n / 2^(2^^(b-1))  =  n / 2^^b

Theorem

- Theorem: If we use both PC-Find and PC-Union (i.e., path compression and weighted union), the worst-case running time of a sequence of m operations, n of which are MakeSet operations, is O(m log*(n)).
- Each PC-MakeSet and PC-Union operation takes constant time, so we need only show that any sequence of m PC-Find operations requires O(m log*(n)) time in the worst case.
- We will use a kind of accounting method to show this.

Taxation

- The cost of PC-Find(x0) is proportional to the number of nodes on the path from x0 up to its leader.
- Each object x0, x1, x2, ..., xl on the path from x0 to its leader will pay a $1 tax into one of several bank accounts:
  - The leader xl pays into the leader account.
  - The child of the leader, x(l-1), pays into the child account.
  - Any other object xi in a different block from its parent x(i+1) pays into the block account.
  - Any other object xi in the same block as its parent x(i+1) pays into the path account.
- After all the Find operations are done, the total amount of money in these accounts will give us the total running time.

Example: Different nodes on the find path pay into different accounts: B = block, P = path, C = child, L = leader. Horizontal lines are boundaries between blocks; only the nodes on the find path are shown.

Leader, Child, and Block accounts

- During any Find operation, one dollar is paid into the leader account.
- At most one dollar is paid into the child account.
- At most one dollar is paid into the block account for each of the log*(n) blocks.
- Thus, when the sequence of m operations ends, these accounts share a total of at most 2m + m log*(n) dollars.

Path Account

The only remaining difficulty is the Path account.
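As an aside, the PC-MakeSet, PC-Find, and PC-Union routines above fit together as follows in runnable form (a sketch, with objects numbered 0..n-1):

```python
class DisjointSets:
    """Union-find with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))   # PC-MakeSet for objects 0..n-1
        self.rank = [0] * n

    def find(self, x):
        # PC-Find: redirect every node on the path directly to the leader
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        # PC-Union: attach the lower-rank leader under the higher-rank one
        xr, yr = self.find(x), self.find(y)
        if xr == yr:
            return
        if self.rank[xr] > self.rank[yr]:
            self.parent[yr] = xr
        else:
            self.parent[xr] = yr
            if self.rank[xr] == self.rank[yr]:
                self.rank[yr] += 1
```

Per the theorem above, any sequence of m operations on n objects runs in O(m log*(n)) time with this combination.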
- Consider an object xi in block b that pays into the path account.
- This object is not a set leader, so its rank can never change.
- The parent of xi is also not a set leader, so after path compression, xi gets a new parent whose rank is strictly larger than that of its old parent x(i+1).
- Since rank(parent(xi)) is always increasing, the parent of xi must eventually be in a different block than xi, after which xi will never pay into the path account.
- Thus xi pays into the path account at most once for every rank in block b, or less than 2^^b times total.

Path Account

- Since block b contains less than n/2^^b objects, and each of these objects contributes less than 2^^b dollars, the total number of dollars contributed by objects in block b is less than n dollars to the path account.
- There are log*(n) blocks, so the path account receives less than n log*(n) dollars total.
- Thus the total amount of money in all four accounts is less than 2m + m log*(n) + n log*(n) = O(m log*(n)), and this bounds the total running time of the m operations.

Take Away

- We can now say that each call to PC-Find has amortized cost O(log*(n)), which is significantly better than the worst-case cost of O(log n).
- The book shows that PC-Find has amortized cost of O(alpha(n)), where alpha(n) is an even slower-growing function than log*(n).

CS 561, Lecture 17
Jared Saia, University of New Mexico

Today's Outline

- Potential Method
- Dynamic Tables

Potential Method

- The most powerful method (and hardest to use).
- Builds on the idea, from physics, of potential energy.
- Instead of associating taxes with particular operations, represent prepaid work as a potential that can be spent on later operations.
- The potential is a function of the entire data structure.

Potential Function

- Let D_i denote our data structure after i operations.
- Let Phi_i denote the potential of D_i.
- Let c_i denote the cost of the i-th operation (this changes D_{i-1} into D_i).
- Then the amortized cost of the i-th operation, a_i, is defined to be the actual cost plus the change in potential:
    a_i = c_i + Phi_i - Phi_{i-1}

Potential Method

- So the total amortized cost of n operations is the actual cost plus the change in potential:

    sum_{i=1}^{n} a_i = sum_{i=1}^{n} (c_i + Phi_i - Phi_{i-1}) = (sum_{i=1}^{n} c_i) + Phi_n - Phi_0

- Our task is to define a potential function so that:
  1. Phi_0 = 0
  2. Phi_i >= 0 for all i
- If we do this, the total actual cost of any sequence of operations will be less than the total amortized cost:

    sum_{i=1}^{n} c_i = (sum_{i=1}^{n} a_i) - Phi_n <= sum_{i=1}^{n} a_i

Binary Counter Example

- For the binary counter, we can define the potential Phi_i, after the i-th Increment operation, to be the number of bits with value 1.
- Initially all bits are 0, so Phi_0 = 0; further, Phi_i >= 0 for all i > 0, so this is a legal potential function.

Binary Counter

- We can describe both the actual cost of an Increment and the change in potential in terms of the number of bits set to 1 and reset to 0:
    c_i = (bits flipped from 0 to 1) + (bits flipped from 1 to 0)
    Phi_i - Phi_{i-1} = (bits flipped from 0 to 1) - (bits flipped from 1 to 0)
- Thus, the amortized cost of the i-th Increment is:
    a_i = c_i + Phi_i - Phi_{i-1} = 2 x (bits flipped from 0 to 1)

Binary Counter

- Since Increment only changes one bit from a 0 to a 1, the amortized cost of Increment is 2 using this potential function.
- Thus the total cost for n calls to Increment is no more than 2n. (Same as saying that the amortized cost is 2.)

Potential Method Recipe

- Define a potential function for the data structure that is initially zero and always nonnegative. The amortized cost of an operation is its actual cost plus the change in potential.

Binary Counter Example

- For the binary counter, the potential was exactly the total unspent taxes paid using the taxation method. So it gave us the same amortized bound.
- In general, however, there may be no way of interpreting the potential as taxes.

A Good Potential Function

- Different potential functions lead to different amortized time bounds.
- The trick to using the method is to get the best possible potential function.
- A good potential function goes up a little during any cheap/fast operation, and goes down a lot during any expensive/slow operation.
- Unfortunately, there's no general technique for doing this, other than trying lots of possibilities.
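A small experiment matching the binary-counter analysis above: the total number of bit flips over n Increments never exceeds 2n, exactly as the amortized bound of 2 per Increment predicts. This is a sketch (the list-of-bits representation, LSB first, is an assumption of the example):

```python
def increment(bits):
    """Increment a binary counter (list of 0/1 bits, LSB first).

    Returns the number of bits flipped by this Increment.
    """
    flips = 0
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0          # flip a 1 -> 0
        flips += 1
        i += 1
    if i == len(bits):
        bits.append(0)       # grow the counter by one bit
    bits[i] = 1              # flip exactly one 0 -> 1
    flips += 1
    return flips

def total_flips(n):
    """Total actual cost (bit flips) of n Increments from zero."""
    bits = [0]
    return sum(increment(bits) for _ in range(n))
```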
Stack Example

- Consider again a stack with Multipop.
- Define the potential function Phi on the stack to be the number of objects on the stack.
- This potential function is legal, since Phi_0 = 0 and Phi_i >= 0 for all i > 0.
- Let's now compute the costs of the different stack operations on a stack with s items.
- If the i-th operation on the stack is a push operation on a stack containing s objects, then
    Phi_i - Phi_{i-1} = (s+1) - s = 1
- So a_i = c_i + 1 = 1 + 1 = 2.

Multipop

- Let the i-th operation be Multipop(S,k), and let k' = min(k,s) be the number of objects popped off the stack. Then
    Phi_i - Phi_{i-1} = (s - k') - s = -k'
- Further, c_i = k'.
- Thus a_i = k' - k' = 0.
- We can show similarly that the amortized cost of a pop operation is 0.

Wrapup

- The amortized cost of each of these three operations is O(1).
- Thus, the worst-case cost of n operations is O(n).

Dynamic Tables

- Consider the situation where we do not know in advance the number of items that will be stored in a table, but we want constant-time access.
- We might allocate a fixed amount of space for the table, only to find out later that this was not enough space. In this case, we need to copy over all objects stored in the original table into a new, larger table.
- Similarly, if many objects are deleted, we might want to reduce the size of the table.

Dynamic Tables

- The data structure that we want is a Dynamic Table (aka Dynamic Array).
- We can show, using amortized analysis, that the amortized cost of an insertion into (and deletion from) a Dynamic Table is O(1), even though the worst-case cost may be much larger.

Load Factor

- For a nonempty table T, we define the load factor of T, alpha(T), to be the number of items stored in the table divided by the size (number of slots) of the table.
- We assign an empty table (one with no items) size 0, and load factor 1.
- Note that the load factor of any table is always between 0 and 1.
- Further, if we can say that the load factor of a table is always at least some constant c, then the unused space in the table is never more than a (1 - c) fraction of the table.

Table Expansion

- Assume that the table is allocated as an array.
- A table is full when all slots are used, i.e., when the load factor is 1.
- When an insert occurs while the table is full, we need to expand the table. The way we will do this is to allocate an array which is twice the size of the old array, and then copy all the elements of the old array into this new, larger array.
- If only insertions are performed, this ensures that the load factor is always at least 1/2.

Pseudocode

  TableInsert(T,x):
    if T.size == 0
      allocate T.table with 1 slot
      T.size = 1
    if T.num == T.size
      allocate newTable with 2*T.size slots
      insert all items in T.table into newTable
      T.table = newTable
      T.size = 2*T.size
    T.table[T.num] = x
    T.num = T.num + 1

Amortized Analysis

- Note that usually Table-Insert just does an elementary insert into the array.
- However, very occasionally it will do an expansion. We will say that the cost of an expansion is equal to the size of the table before the expansion occurs.
- This is the cost of moving over all the old elements to the larger table.

Aggregate Analysis

- Let c_i be the cost of the i-th call to Table-Insert.
- If i-1 is an exact power of 2, then we'll need to do an expansion, and so c_i = i. Otherwise, c_i = 1.
- The total cost of n Table-Insert operations is thus

    sum_{i=1}^{n} c_i <= n + sum_{j=0}^{floor(log n)} 2^j <= n + 2 x 2^(floor(log n)) <= n + 2n = 3n

Taxation Method

- Every time Table-Insert is called, we tax the operation 3 dollars.
- Intuitively, the item inserted pays for:
  - its insertion,
  - moving itself when the table is eventually expanded, and
  - moving some other item, that has already been moved once, when the table is expanded.

Taxation Method

- Suppose that the size of the table is m right after an expansion. Then the number of items in the table is m/2.
- Each time Table-Insert is called, we tax the operation 3 dollars:
  - One dollar is used immediately to pay for the elementary insert.
  - Another dollar is stored with the item that is inserted.
  - The third dollar is placed as credit on one of the m/2 items already in the table.
- Filling the table again requires m/2 total calls to Table-Insert.
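As an aside, a quick simulation of table doubling matching the aggregate analysis above: charging 1 per elementary insert plus the old size per expansion, the total cost of n inserts never exceeds 3n. Sketch:

```python
def table_insert_costs(n):
    """Simulate n inserts into a doubling table; return the total cost,
    counting 1 per elementary insert plus old-size per expansion."""
    size, num, total = 0, 0, 0
    for _ in range(n):
        if size == 0:
            size = 1                 # allocate the first slot
        if num == size:
            total += size            # copy all items to the new table
            size *= 2                # double the table
        total += 1                   # elementary insert
        num += 1
    return total
```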
Insert 0 Thus by the time the table is full and we do another expan sion each item will have one dollar of credit on it o This dollar of credit can be used to pay for the movement of that item during the expansion 39 Potential Method 39 In Class ExerCIse Recall that 111 ci Di 7 121 Let39s OW analyze Tab39e39lnsert Sing the pOtentia39 methOd 0 Show that this potential function is O initially and always Let numi be the num value for the i th call to Table Insert nonnegative Let sizei be the size value for the i th call to Table Insert Compute W for the case where Table1nsert does not trigger Then let an expansion 4 2 numi 7 sizei 0 Compute ai for the case where Table Insert does trigger an expansion note that numiil numii l sizeiil numii l sizei 2 7 1 39 Outline CS 561 Lecture 10 0 Red Black Trees Chapter 13 Jared Saia University of New Mexico Red Black Properties 39 Example RB Tree 39 A BST is a red black tree if it satisfies the RES Properties Every node is either red or black The root is black Every leaf NIL is black If a node is red then both its children are black For each node all paths from the node to descendant leaves contain the same number of black nodes 39 Black Height o Black height of a node ac bhgtlt is the number of black nodes on any path from but not including ac down to a leaf node 0 Note that the black height of a node is well defined since all paths have the same number of black nodes 0 The black height of an RES Tree is just the black height of the root 39 Key Lemma 0 Lemma A RB Tree with n internal nodes has height at most 2 logn l 0 Proof Sketch l The subtree rooted at the node ac contains at least 217549 7 1 internal nodes 2 For the root r bhr 2 h2 thus n 2 2h2 7 1 Taking logs of both sides we get that h g 2 logn l Maintenance 39 Proof 39 l The subtree rooted at the node ac contains at least 217549 7 1 internal nodes Show by induction on the height of ac 0 BC If the height of ac is 0 then ac is a leaf and subtree rooted at ac does indeed 
- Inductive Hypothesis: For all nodes y of height less than that of x, the subtree rooted at y contains at least 2^{bh(y)} − 1 internal nodes.
- Inductive Step: Consider a node x which is an internal node with two children (all internal nodes have two children). Each child has black height of either bh(x) or bh(x) − 1: the former if it is red, the latter if it is black. Since the height of these children is less than that of x, we can apply the inductive hypothesis to conclude that each child has at least 2^{bh(x)−1} − 1 internal nodes. This implies that the subtree rooted at x has at least (2^{bh(x)−1} − 1) + (2^{bh(x)−1} − 1) + 1 = 2^{bh(x)} − 1 internal nodes. This proves the claim.

### Maintenance

- How do we ensure that the red-black properties are maintained?
- I.e., when we insert a new node, what do we color it? How do we rearrange the new tree so that the red-black properties hold?
- How about for deletions?

### Left-Rotate

- Left-Rotate(x) takes a node x and rotates x with its right child. Right-Rotate is the symmetric operation.
- Both Left-Rotate and Right-Rotate preserve the BST property.
- We'll use Left-Rotate and Right-Rotate in the RB-Insert procedure.

### Binary Search Tree Property

Let x be a node in a binary search tree. If y is a node in the left subtree of x, then key(y) ≤ key(x). If y is a node in the right subtree of x, then key(y) ≥ key(x).

### RB-Insert(T, z)

- Set left(z) and right(z) to NIL.
- Let y be the last node processed during a search for z in T.
- Insert z as the appropriate child of y: a left child if key(z) ≤ key(y), a right child otherwise.
- Color z red.
- Call the procedure RB-Insert-Fixup.

### In-Class Exercise

Show that Left-Rotate(x) maintains the BST property. In other words, show that if the BST property was true for the tree before the Left-Rotate(x) operation, then it's true for the tree after the operation.

- Show that after rotation the BST property holds for the entire subtree rooted at x.
- Show that after rotation the BST property holds for the subtree rooted at y.
- Now argue that after rotation the BST property holds for the entire tree.
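As a sanity check on the rotation exercise, here is a minimal Left-Rotate sketch on plain BST nodes (a hypothetical Node class; parent pointers are omitted for brevity, so the rotation returns the new subtree root instead of splicing it in place the way the book's version does). The inorder key sequence is sorted exactly when the BST property holds, and rotation leaves it unchanged:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def left_rotate(x):
    """Rotate x with its right child y; return y, the new root of this subtree."""
    y = x.right
    x.right = y.left   # y's left subtree holds keys between x and y, so it moves under x
    y.left = x
    return y

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

# Usage: the inorder key sequence is unchanged, so the BST property is preserved.
x = Node(2, Node(1), Node(4, Node(3), Node(5)))
before = inorder(x)
root = left_rotate(x)
after = inorder(root)
print(before, after)   # both are [1, 2, 3, 4, 5]
```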
### RB-Insert-Fixup(T, z)

```
RB-Insert-Fixup(T, z)
  while color(parent(z)) is red
    if z's uncle y is red:                          do case 1
    if z's uncle y is black and z is a right child: do case 2
    if z's uncle y is black and z is a left child:  do case 3
  color(root(T)) = black
```

### Loop Invariant

At the start of each iteration of the while loop:

- Node z is red.
- If parent(z) is the root, then parent(z) is black.
- If there is a violation of the red-black properties, there is at most one violation, and it is a violation of either property 2 or property 4. If there is a violation of property 2, it occurs because z is the root and is red. If there is a violation of property 4, it occurs because both z and parent(z) are red.

### Pseudocode

- Detailed pseudocode for RB-Insert and RB-Insert-Fixup is in the book (Chapter 13.3).
- A detailed proof of correctness for RB-Insert-Fixup is in the same chapter.
- Code for RB deletion is also in Chapter 13.

### Other Balanced BSTs

- We'll now briefly discuss some other balanced BSTs.
- They all implement Insert, Delete, Lookup, Successor, Predecessor, Maximum, and Minimum efficiently.
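Returning to the red-black properties themselves: they lend themselves to a mechanical check. The sketch below (a hypothetical Node class, with `None` playing the role of the black NIL leaves) computes the black height of a subtree, returning `None` if property 4 or 5 fails, mirroring how the key lemma uses bh(x):

```python
class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right

def black_height(t):
    """Return bh(t), or None if a red-black property fails in this subtree."""
    if t is None:                       # NIL leaf: black, with black height 0
        return 0
    if t.color == 'red':                # property 4: a red node has black children
        for child in (t.left, t.right):
            if child is not None and child.color == 'red':
                return None
    lbh, rbh = black_height(t.left), black_height(t.right)
    if lbh is None or rbh is None or lbh != rbh:
        return None                     # property 5: equal black counts on all paths
    return lbh + (1 if t.color == 'black' else 0)

def is_red_black(root):
    # property 2 (black root) plus the recursive checks above
    return root is not None and root.color == 'black' and black_height(root) is not None

# Usage: a small valid tree, then one where a red node has a red child.
ok = Node(2, 'black', Node(1, 'red'), Node(3, 'red'))
bad = Node(2, 'black', Node(1, 'red', left=Node(0, 'red')), Node(3, 'black'))
print(is_red_black(ok), is_red_black(bad))   # True False
```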
