Computer Science II
Computer Science II COP 3503
University of Central Florida
These 58 pages of class notes were uploaded by Luisa Beer on Thursday, October 22, 2015. The notes belong to COP 3503 at the University of Central Florida, taught by Staff in Fall. Since upload, they have received 16 views. For similar materials see /class/227465/cop-3503-university-of-central-florida in Computer Programming at University of Central Florida.
COP 3503 Computer Science II CLASS NOTES

Mergesort

The mergesort sorting algorithm uses the divide-and-conquer strategy, in which the original problem is split into two half-size, recursively solved subproblems. If the overhead at each level of recursion is linear, O(N), then the overall running time of such an algorithm is O(N log N). Mergesort is such an algorithm, and it is commonly employed for external sorting. Mergesort is a recursive, subquadratic algorithm, as follows:

1. If the number of items to sort is 0 or 1, return.
2. Recursively sort the first and second halves separately.
3. Merge the two sorted halves into a single sorted group.

Since this algorithm uses the divide-and-conquer strategy and employs the halving principle, the recursion contributes O(log2 N) levels, and thus we need only show that merging two sorted groups into a single sorted group can be performed in linear time to prove that the running time is O(N log N).

A linear merge algorithm

A linear merge algorithm requires three separate arrays A, B, and C (two input and one output), plus an index counter per array: actr, bctr, and cctr. The index counters are initially set to the first position in each of their arrays, with incrementation as follows:

    if (A[actr] < B[bctr])
        C[cctr++] = A[actr++];
    else
        C[cctr++] = B[bctr++];

Example (linear merge): merging A = [1, 3, 4, 7] with B = [2, 5, 9, 11] produces C = [1, 2, 3, 4, 5, 7, 9, 11]; at each step, the counter of whichever input array supplied the smaller element is advanced, along with cctr. (The original notes trace this merge element by element in a diagram.)

Mergesort is not often used as an internal sorting method. The reason is that the output array represents a linear increase in the memory requirements, and additional work is required to copy between the arrays.

Quicksort

With an average running time of O(N log N), quicksort is the fastest known sorting algorithm. Quicksort has a worst-case running time of O(N^2), which can be made statistically impossible to achieve. Quicksort is a recursive algorithm.
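As a concrete sketch of the algorithm just described (the class and method names here are mine, not from the notes), the linear merge and the full recursive mergesort can be written in Java:

```java
import java.util.Arrays;

public class Mergesort {
    // Recursive mergesort as described in the notes: split in half,
    // sort each half, then merge the two sorted halves.
    public static int[] sort(int[] a) {
        if (a.length <= 1) return a;                    // base case: 0 or 1 items
        int mid = a.length / 2;
        int[] left  = sort(Arrays.copyOfRange(a, 0, mid));
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    // Linear merge with one index counter per array (actr, bctr, cctr).
    static int[] merge(int[] a, int[] b) {
        int[] c = new int[a.length + b.length];
        int actr = 0, bctr = 0, cctr = 0;
        while (actr < a.length && bctr < b.length) {
            if (a[actr] < b[bctr]) c[cctr++] = a[actr++];
            else                   c[cctr++] = b[bctr++];
        }
        while (actr < a.length) c[cctr++] = a[actr++];  // drain the leftovers
        while (bctr < b.length) c[cctr++] = b[bctr++];
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(merge(
            new int[]{1, 3, 4, 7}, new int[]{2, 5, 9, 11})));
        System.out.println(Arrays.toString(sort(new int[]{5, 1, 4, 2, 3})));
    }
}
```

Note that merge allocates the output array C each time, which is exactly the linear memory overhead the notes cite as the reason mergesort is rarely used as an internal sort.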
Quicksort's performance is easy to analyze, yet it has a tricky implementation, since slight variations in the code can make significant differences in the running time. The quicksort algorithm is as follows. Call Quicksort(A):

1. If the number of elements to be sorted is 0 or 1, return.
2. Pick any element v in the array A; this is the pivot element.
3. Partition A - {v} into two disjoint groups: Left = {x in A - {v} : x <= v} and Right = {x in A - {v} : x >= v}.
4. Return the result of Quicksort(Left), followed by v, followed by Quicksort(Right).

Notes:
- The base case includes the possibility that the number of elements is 0, since the recursive calls may generate empty subsets.
- Any element can theoretically be the pivot element, although in reality the pivot element is not randomly chosen.
- The partitioning must be performed in place, so that no additional arrays are required.
- Quicksort outperforms mergesort because the time required to partition the array is less than the time required to merge two arrays.

Analysis

Best case: The partitions are always half the size of the input at each recursive step; thus, by the halving principle, the sorting component requires O(log2 N) levels, and with linear overhead per level we have O(N log N), which is equal to that of mergesort.

Worst case: The partitions are very lopsided, meaning that either |Left| = 0 and |Right| = N-1, or |Left| = N-1 and |Right| = 0, at each recursive step. Suppose that T(N) is the time for quicksort on an array of N elements, Left contains no elements, and Right contains all of the elements except the pivot element (this means the pivot element is always chosen to be the smallest element in the partition). One time unit is required to sort 0 or 1 elements, and N time units are required to partition a set containing N elements. Then, if N > 1, we have

    T(N) = T(N-1) + N

This means that the time required to quicksort N elements equals the time required to recursively sort the N-1 elements in the Right subset plus the time required to partition the N elements. By telescoping the equation above, we have

    T(N)   = T(N-1) + N
    T(N-1) = T(N-2) + (N-1)
    T(N-2) = T(N-3) + (N-2)
    ...
    T(2)   = T(1) + 2

Summing both sides gives T(N) = T(1) + 2 + 3 + ... + N = N(N+1)/2 = O(N^2). Therefore, you never want to select a pivot element that leads to an unbalanced partitioning.

Average case: If each partition is equally likely to contain 0, 1, 2, ..., N-1 elements, then the average running time of the quicksort algorithm is O(N log N). More formally, this is stated as

    T(Left)avg = T(Right)avg = [T(0) + T(1) + T(2) + ... + T(N-1)] / N
    T(N)avg = T(Left)avg + T(Right)avg + N = (2/N)[T(0) + T(1) + T(2) + ... + T(N-1)] + N

With manipulation, you arrive at

    T(N)/(N+1) = T(N-1)/N + 2/(N+1)

Telescoping yields

    T(N)/(N+1) = 2[1 + 1/2 + 1/3 + ... + 1/(N+1)] - 5/2

which is O(log2 N), since the harmonic sum is O(log N). Therefore, multiplying both sides by N+1 gives T(N) = O(N log N).

Picking the pivot

Don't do this randomly, or by picking the first element, or even by picking the larger of the first two elements. A safe way is to set low = the first position and high = the last position and use the element at position (low + high)/2. An even better way is to pick the median of the three values at positions low, middle, and high.

Lower bound on sorting

We have seen that quicksort has a best-case performance of O(N log N). The question becomes: can we do better? The bottom line is that any sorting algorithm that uses only binary element comparisons will require Omega(N log N) time in the worst case. This means that any algorithm that sorts by using binary element comparisons must use at least roughly N log N comparisons for some input sequence. This is also true for the average-case performance.

Consider the problem of sorting the sequence S = (a, b, c), composed of three distinct items; that is, a != b, a != c, and b != c. The figure in the notes ("A Decision Tree for Comparison Sorting") illustrates a possible sorting algorithm in the form of a binary decision tree whose leaves are the six possible orderings: a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, and c<b<a. Each node of the decision tree represents one binary comparison: in each node of the tree, exactly two elements are compared. Since there are two possible outcomes for each comparison, each non-leaf node in the tree is of degree two. Suppose that a < b < c in our example, and consider how the algorithm determines this fact.
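The three-element decision tree can be written out directly as nested comparisons; this sketch (class and method names are mine) sorts three distinct values using at most three comparisons, matching the tree height ceil(log2(3!)) = 3:

```java
public class ThreeSort {
    // Sort three distinct values using a decision tree of binary
    // comparisons: each path from root to leaf makes at most 3
    // comparisons, and each leaf is one of the 3! = 6 permutations.
    static int[] sort(int a, int b, int c) {
        if (a < b) {
            if (a < c) {
                if (b < c) return new int[]{a, b, c};
                else       return new int[]{a, c, b};
            } else         return new int[]{c, a, b};
        } else {
            if (b < c) {
                if (a < c) return new int[]{b, a, c};
                else       return new int[]{b, c, a};
            } else         return new int[]{c, b, a};
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(sort(3, 1, 2)));
        // prints [1, 2, 3]
    }
}
```

Note that a tree of height 2 would have at most 2^2 = 4 leaves, too few for the 6 permutations, which is exactly the counting argument the lower bound generalizes.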
The first comparison compares a and b, which reveals that a < b, since we assumed a < b < c. The second comparison compares a and c, which determines that a < c. At this point it has been determined that a < b and a < c, yet the relative order of b and c has not been determined; therefore, a third comparison is required to determine the relative order of b and c. Notice that the algorithm works correctly in all cases, because every permutation of the sequence S appears as a leaf node in the decision tree. Furthermore, the number of comparisons required in the worst case is equal to the height of the decision tree.

Any sorting algorithm that uses binary element comparisons can be represented by a binary decision tree, and the height of that decision tree determines the worst-case running time of the algorithm. In general, the size and shape of the decision tree depend on the particular sorting algorithm and the number of elements to be sorted. Given an input sequence of n items to be sorted, every binary decision tree that correctly sorts the input sequence must have at least n! leaves, one for each permutation of the input. Therefore, since the maximum number of leaf nodes in a binary tree of height h is 2^h, the height of the binary decision tree is at least ceil(log2(n!)). Therefore,

    log2(n!) = sum_{i=1}^{n} log2(i) >= sum_{i=n/2}^{n} log2(i) >= (n/2) log2(n/2) = Omega(n log n)

Since the height of the decision tree is Omega(n log n), the number of comparisons done by any sorting algorithm that sorts using only binary comparisons is Omega(n log n). Assuming that each comparison can be done in constant time, the running time of any such algorithm is Omega(n log n).

COP 3503 Computer Science II CLASS NOTES

The previous day's notes introduced the special case of the binary tree known as the binary search tree (BST). As mentioned before, general search trees are called M-way search trees and are beyond the scope of this course. A binary search tree is a special case of an M-way search tree, defined as follows.

Binary search tree: A binary
search tree T is a finite set of keys. Either the set is empty, T = {}, or the set consists of a root r and exactly two binary search trees TL and TR, T = {r, TL, TR}, such that the following properties are satisfied:

1. All keys contained in the left subtree TL are less than the key in r.
2. All keys contained in the right subtree TR are greater than the key in r.

Insertion into a BST

In the previous day's notes we saw that insertion into a BST is a straightforward procedure which basically involves searching for the value to be inserted, knowing that it is presently not in the tree. The search will always be unsuccessful and will terminate at a leaf node (actual termination occurs on a null child reference of this leaf node). Once this dead end is found, the insertion is straightforward.

Deletion in a BST

The complexity of the deletion operation depends upon the position in the tree of the node to be deleted. It is a far more difficult operation to delete a node which is the root of two non-null subtrees than it is to delete a leaf node; the complexity of the deletion operation is proportional to the number of children of the node to be deleted.

1. The node is a leaf. This is the trivial case, in which the appropriate reference of the parent is simply set to null. (The notes illustrate this with a diagram: a BST with a leaf node marked for deletion, and the BST that remains after the leaf is deleted.)

2. The node has one child. This case is also not too complicated, but it is more so than case 1. The parent's reference to the node is reset to refer to the deleted node's child. This has the effect of lifting the deleted node's children up by one level in the tree (all great-great-...-grandchildren lose one "great" from their kinship designations). (The notes show a BST with an internal node having only a right subtree marked for deletion; after the deletion, the parent of the deleted node references the deleted node's child.)
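As an illustrative sketch of the insertion just described (the notes give no code at this point; the class and method names are mine), searching down to a null child reference and attaching the new node there looks like this:

```java
public class BST {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    // Insert by searching for the key: the (always unsuccessful)
    // search ends at a null child reference of some leaf, and the
    // new node is attached exactly there.
    void insert(int key) {
        if (root == null) { root = new Node(key); return; }
        Node p = root, prev = null;
        while (p != null) {
            prev = p;
            p = (key < p.key) ? p.left : p.right;
        }
        if (key < prev.key) prev.left = new Node(key);
        else                prev.right = new Node(key);
    }

    // Ordinary BST search, for checking the result of insertions.
    boolean contains(int key) {
        Node p = root;
        while (p != null) {
            if (key == p.key) return true;
            p = (key < p.key) ? p.left : p.right;
        }
        return false;
    }

    public static void main(String[] args) {
        BST t = new BST();
        for (int k : new int[]{5, 2, 8, 1, 3}) t.insert(k);
        System.out.println(t.contains(3) + " " + t.contains(7));
        // prints true false
    }
}
```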
It makes no difference whether the node to be deleted has only a left child or only a right child. The previous example illustrated the case where the only child was a right child; the notes' next diagram illustrates the symmetric case, in which the node marked for deletion (shown in green) has only a left child.

3. The node has two children. This last case of deletion from a BST is the most difficult to handle. There is no one-step operation that can be performed, since the parent's right or left reference cannot refer to both of the node's children at the same time. There are basically two different approaches that can be used to handle this case: deletion via merging and deletion via copying. We will look at both.

Deletion via merging

This technique makes one tree out of the two subtrees of the node to be deleted and then attaches it to that node's parent. The main question is how these two subtrees can be merged. The nature of the BST is that every value in the right subtree is greater than every value in the left subtree, so the best thing to do is find, within the left subtree, the node with the greatest value and make it the parent of the right subtree. Symmetrically, the node with the lowest value can be found in the right subtree and made the parent of the left subtree.

The desired node is the rightmost node of the left subtree. It can be located by traversing this subtree, always taking right references, until a null reference is encountered. This means that the node so located has no right child, so there is no danger of destroying an existing right reference when the right subtree is attached to it. Symmetry would allow us to accomplish exactly the same thing by attaching the left subtree to the leftmost node of the right subtree. (This operation is depicted graphically in the notes' diagram.)

    public void deleteByMerging(int element) {
        IBSTNode tmp, node, p = root, prev = null;
        while (p != null && p.key != element) {  // find the node with key == element
            prev = p;
            if (p.key < element)
                p = p.right;
            else
                p = p.left;
        }
        node = p;
        if (p != null && p.key == element) {
            if (node.right == null)              // node has no right child
                node = node.left;
            else if (node.left == null)          // node has no left child
                node = node.right;
            else {                               // prepare for merging
                tmp = node.left;                 // move left,
                while (tmp.right != null)        // then go right as far as possible
                    tmp = tmp.right;
                tmp.right = node.right;          // link the rightmost node of the left
                                                 // subtree to the right subtree
                node = node.left;
            }
            if (p == root)
                root = node;
            else if (prev.left == p)
                prev.left = node;
            else
                prev.right = node;
        }
        else if (root != null)
            System.out.println("key " + element + " is not in the tree");
        else
            System.out.println("the tree is empty");
    }

The problem with deletion via merging is that it is possible for the height of the tree to increase after a deletion; in some cases the resulting tree may be highly unbalanced, while other cases may decrease the overall height of the tree. (The notes show both situations: one deletion of node 15 yields a resulting tree of greater height, while another yields a tree of decreased height.) It isn't that the algorithm is inefficient, but it certainly is not perfect. There is, however, a need for an algorithm that does not give the resulting tree the chance to increase in height through deletion. The second technique, deletion via copying, provides us with this solution.

Deletion via copying

Deletion via copying was proposed by Thomas Hibbard and Donald Knuth. If the node has two children, the problem can be reduced to one of two simple cases: the node is a leaf, or the node has only one non-empty child. This can be done by replacing the key being deleted with its immediate predecessor (or successor). As we mentioned for deletion via merging, a key's immediate predecessor is the key in the rightmost node of the left subtree; by symmetry, its immediate successor is the key in the leftmost node of the right subtree.
First, the predecessor has to be located. This is again done by moving one step to the left, to the root of the node's left subtree, and then moving as far to the right as possible. Next, the key of the located node replaces the key to be deleted. At this point one of the two simple cases comes into play: if the rightmost node is a leaf, the first case applies; if it has one child, the second case applies. In this way, deletion via copying removes a key k1 by overwriting it with another key k2 and then removing the node that holds k2, whereas deletion via merging consisted of removing a key k1 along with the node that holds it. (Deletion via copying is shown graphically, step by step, in the notes' diagrams.)

Deletion via copying does not increase the height of the tree, but it is not without its own problems, which arise if the algorithm is applied many times along with insertions into the BST. The algorithm is asymmetric: it always deletes the immediate predecessor of the information in node. This can reduce the height of the left subtree (or leave it unchanged), while the right subtree is unaffected by these changes. Therefore, the right subtree of node can grow after subsequent insertions, and if the information in node is again deleted, the height of the right subtree remains the same. After many insertions and deletions, the entire tree becomes right-skewed: unbalanced, favoring the right side.

To prevent this problem from occurring, a simple improvement makes the Hibbard-Knuth algorithm symmetric: the deletions alternate between the left and right subtrees, so that the first deletion deletes the predecessor of the information in a node from the left subtree, and the next
deletion deletes the successor of the information from the right subtree.

    public void deleteByCopying(int element) {
        IBSTNode node, p = root, prev = null;
        while (p != null && p.key != element) {  // find node p with key == element
            prev = p;
            if (p.key < element)
                p = p.right;
            else
                p = p.left;
        }
        node = p;
        if (p != null && p.key == element) {
            if (node.right == null)              // node has no right child
                node = node.left;
            else if (node.left == null)          // node has no left child
                node = node.right;
            else {                               // node has two children
                IBSTNode tmp = node.left;
                IBSTNode previous = node;
                while (tmp.right != null) {      // find the rightmost node of the
                    previous = tmp;              // left subtree (the predecessor)
                    tmp = tmp.right;
                }
                node.key = tmp.key;              // overwrite the key, i.e., the copy
                if (previous == node)            // node's left child has no right subtree
                    previous.left = tmp.left;
                else
                    previous.right = tmp.left;
            }
            if (p == root)
                root = node;
            else if (prev.left == p)
                prev.left = node;
            else
                prev.right = node;
        }
        else if (root != null)
            System.out.println("key " + element + " is not in the tree");
        else
            System.out.println("the tree is empty");
    }

Simulations have shown that, over many insertions and asymmetric deletions, the unimproved version of the algorithm gives an expected internal path length (IPL) of n log^3 n for n nodes; when the improved, symmetric version of the algorithm is used, the expected IPL becomes n log n. Use of the simpler asymmetric version is, however, quite acceptable: the simulations have also shown that even for a BST containing 2,048 nodes, approximately 1.5 million insertions and asymmetric deletions would need to occur before the IPL becomes worse than in a randomly generated tree. Theoretical results for these algorithms are only fragmentary because of the extraordinary complexity of the problem: Jonassen and Knuth analyzed the problem of random insertions and deletions for a tree consisting of only three nodes, which required the use of Bessel functions and bivariate integral equations, and their analysis turned out to rank among the more difficult of all exact analyses of algorithms carried out to date. Hence the reliance on simulation to determine the running times of these
algorithms.

When we first began discussing the tree data structure, we made two points in favor of trees: (1) a tree can easily represent hierarchical relationships, and (2) searching a tree (assuming a BST) is much faster than searching a list. As we have discussed since then, the second point is not always true: it is possible for a tree to become so skewed (unbalanced) that it deteriorates into a list. In order for the second point to be valid, the search tree must be well balanced.

A binary tree is height-balanced, or simply balanced, if the difference in height between the two subtrees of any node in the tree is either zero or one. A tree is said to be perfectly balanced if it is balanced and all of the leaves are found on one or two levels. For example, for a perfectly balanced binary tree consisting of 10,000 nodes, the height of the tree is ceil(log2(10,000 + 1)) = ceil(13.289) = 14. In practical terms, this means that if 10,000 elements are stored in a perfectly balanced tree, then at most 14 nodes need to be checked to locate a specific element, a substantial difference compared to the worst case of 10,000 checks in a list. Therefore, in trees which are to be used primarily for searching, it is worth the effort either to build the tree so that it is balanced or to modify the existing tree so that it becomes balanced.

A number of techniques have been developed to balance binary trees. Some consist of constantly restructuring the tree when arriving elements lead to an unbalanced tree; others consist of reordering the data and then building the tree according to an ordering of the data that ensures the tree is balanced when it is constructed. If the data used to construct a BST arrive in either ascending or descending order, the tree will be skewed to the point of representing a linear list. Thus, if the smallest value in the data set is the first value read, the root of the tree will contain only a right subtree; similarly, if the largest value in the
data set is entered first, the root of the tree will contain only a left subtree.

Before looking at more sophisticated algorithms to balance binary trees, let's examine a very simple technique. As the data arrive, store all of them in an array. Once all the data have arrived, sort the array using an efficient sorting algorithm. Once sorted, the element at the midpoint of the array becomes the root of the BST. The array can now be viewed as consisting of two subarrays, one to the left of the midpoint and one to the right; the middle element of the left subarray becomes the left child of the root node, and the middle element of the right subarray becomes the right child of the root. This process continues, with further subdivision of the original array, until all the elements in the array have been positioned in the BST. A slight modification is to completely generate the left subtree of the root before generating the right subtree; if this is done, a very simple recursive procedure can be used to generate a balanced BST. (The notes trace this algorithm's execution on a sample stream of data and the corresponding sorted array.)

While this algorithm is certainly simple, it has one serious drawback: all the data must be put into an array before the balanced tree can be created, so it will not work when the tree must be in use before all of the data have arrived. However, the data from an unbalanced tree can be entered into an array using an inorder traversal of the tree; the unbalanced tree can then be destroyed and a new one created from the data in the array using the balance algorithm. In this fashion, no sort is required to put the data into order. Still, the approach is basically inefficient, in that an additional array is required, and it commonly must be sorted before the balanced tree can be created.
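A sketch of the very simple recursive balancing procedure described above (the class and method names are mine): the middle of the sorted range becomes the root, and the two halves recursively become its subtrees, generating the left subtree completely before the right.

```java
public class SimpleBalance {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Build a balanced BST from data[first..last], which is assumed
    // to be sorted in ascending order.
    static Node balance(int[] data, int first, int last) {
        if (first > last) return null;
        int middle = (first + last) / 2;
        Node root = new Node(data[middle]);          // midpoint becomes the root
        root.left = balance(data, first, middle - 1);  // left subtree first
        root.right = balance(data, middle + 1, last);
        return root;
    }

    static int height(Node n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6, 7};
        Node root = balance(sorted, 0, sorted.length - 1);
        System.out.println(root.key + " " + height(root));
        // prints 4 3
    }
}
```

Seven sorted keys produce a perfectly balanced tree of height 3 with 4 at the root, which is exactly ceil(log2(7 + 1)).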
One alternative that avoids the sorting is to deconstruct an existing unbalanced tree and reconstruct it, which is quite inefficient except for very small trees, whose unbalanced nature is probably not a hindrance in any case. There are, however, several algorithms which require very little additional storage for intermediate variables and use no sorting procedure. The DSW algorithm, developed by Colin Day and later improved by Quentin Stout and Bette Warren, is a very elegant algorithm which falls into this category.

The basic building block for tree transformations in the DSW algorithm is the rotation. There are two types of rotations, left and right, which are symmetric to one another. The right rotation of a node Ch (for "child") about its parent Par is performed according to the following algorithm:

    rotateRight(Gr, Par, Ch)
        if Par is not the root of the tree    // i.e., if Gr is not null
            grandparent Gr of child Ch becomes Ch's parent, replacing Par;
        right subtree of Ch becomes left subtree of Ch's parent Par;
        node Ch acquires Par as its right child;

The heart of the right rotation is the third step of the algorithm, when Par, the parent of child Ch, becomes the child of Ch, i.e., when the roles of parent and child are interchanged. Notice that this exchange of roles does not affect the principal property of the tree, namely that it is still a BST. The first and second steps of the algorithm are required to ensure that, after the rotation, the BST property is preserved. (The notes show a rotation before and after, plus a second example clarifying this property.)

The basic operation of the DSW algorithm is to convert an arbitrary BST into a linked-list-like structure called a backbone, or vine. This elongated tree is then transformed, in a series of passes, into a perfectly balanced tree by repeatedly rotating every second node of the backbone about its parent. In the first phase, the backbone is created according to the following
algorithm, whose execution on a sample tree is shown in the notes' diagram:

    createBackbone(root)
        tmp = root
        while tmp != null
            if tmp has a left child
                rotate that child about tmp    // the left child becomes tmp's parent
                tmp = the child that just became the parent
            else
                tmp = tmp's right child

Since the rotation requires knowledge of the parent of tmp, an additional reference must be maintained when the algorithm is implemented. In the best case, when the tree is already a backbone, the while loop executes n times and no rotation is performed. In the worst case, when the root does not have a right child, the while loop executes 2n - 1 times and n - 1 rotations are performed, where n is the number of nodes in the tree. Thus the run time of the first phase of the DSW algorithm is O(n).

In the second phase, the backbone is transformed into a tree, but this time one that is perfectly balanced, having leaves only on two adjacent levels. In each pass down the backbone, every second node is rotated about its parent; one such pass decreases the length of the backbone by one-half. Only the first pass may not reach the end of the backbone: it is used to account for the difference between the number n of nodes in the current tree and the number m = 2^floor(lg(n+1)) - 1 of nodes in the closest complete binary tree, so the n - m overflowing nodes are treated separately. The core of the DSW algorithm is:

    createPerfectTree()
        m = 2^floor(lg(n+1)) - 1
        make n - m rotations starting from the top of the backbone
        while m > 1
            m = m / 2
            make m rotations starting from the top of the backbone

(The notes illustrate this part of the algorithm, starting from the backbone generated in the previous example: in each pass, the nodes to be promoted one level by left rotations are drawn as squares, and their parents, about which they are rotated, as circles.) To compute the complexity of the tree-building phase, note that the total number of rotations performed by the while loop equals

    sum_{i=1}^{lg(m+1)-1} ((m+1)/2^i - 1) = (m - 1) - (lg(m+1) - 1) = m - lg(m+1)
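Both DSW phases can be sketched in runnable Java using a pseudo-root above the real root, in the style of Stout and Warren's presentation; the class and method names, and the explicit pseudo-root, are my choices rather than the notes':

```java
public class DSW {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Phase 1: right-rotate every node with a left child, producing
    // a right-leaning backbone (vine) hanging off the pseudo-root.
    static void createBackbone(Node pseudoRoot) {
        Node tail = pseudoRoot;
        Node rest = tail.right;
        while (rest != null) {
            if (rest.left == null) {   // no left child: advance down the vine
                tail = rest;
                rest = rest.right;
            } else {                   // right rotation of rest.left about rest
                Node temp = rest.left;
                rest.left = temp.right;
                temp.right = rest;
                rest = temp;
                tail.right = temp;
            }
        }
    }

    // One compression pass: left-rotate every second node of the backbone.
    static void compress(Node pseudoRoot, int count) {
        Node scanner = pseudoRoot;
        for (int i = 0; i < count; i++) {
            Node child = scanner.right;
            scanner.right = child.right;
            scanner = scanner.right;
            child.right = scanner.left;
            scanner.left = child;
        }
    }

    // Phase 2: turn the backbone into a perfectly balanced tree.
    static Node balance(Node root, int n) {
        Node pseudoRoot = new Node(0);
        pseudoRoot.right = root;
        createBackbone(pseudoRoot);
        int m = Integer.highestOneBit(n + 1) - 1;  // 2^floor(lg(n+1)) - 1
        compress(pseudoRoot, n - m);               // handle the overflow nodes
        while (m > 1) {
            m /= 2;
            compress(pseudoRoot, m);
        }
        return pseudoRoot.right;
    }

    static int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    public static void main(String[] args) {
        // A maximally skewed "tree": 1 -> 2 -> ... -> 10 down the right spine.
        Node root = null, p = null;
        for (int k = 1; k <= 10; k++) {
            Node node = new Node(k);
            if (root == null) root = node; else p.right = node;
            p = node;
        }
        root = balance(root, 10);
        System.out.println(height(root));
        // prints 4, the perfectly balanced height for 10 nodes
    }
}
```

The pseudo-root lets both phases treat the real root like any other right child, which removes the special case the notes mention of tracking tmp's parent.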
Adding the n - m rotations of the first pass, the total number of rotations in the tree-building phase is (n - m) + (m - lg(m+1)) = n - lg(m+1) = n - floor(lg(n+1)); thus the number of rotations is O(n). Because creating the backbone also required at most O(n) rotations, the cost of global rebalancing with the DSW algorithm is optimal in terms of time, because it grows linearly with n, and it requires only a very small, fixed amount of storage.

COP 3503 Computer Science II CLASS NOTES

A Hamiltonian cycle in a graph is a cycle that passes through all the vertices of the graph. Recall that we mentioned this type of cycle when we were examining Euler paths and Euler circuits. A graph is called a Hamiltonian graph if it includes at least one Hamiltonian cycle. There is no formula which characterizes a Hamiltonian graph, as there was with an Euler graph. However, it should be obvious that all complete graphs are Hamiltonian. The question is how to find a Hamiltonian cycle in a graph. The answer lies in the following theorem:

Theorem: If edge(vu) is not in E, the graph G' = (V, E + edge(vu)) is Hamiltonian, and degree(v) + degree(u) >= |V|, then graph G is also Hamiltonian.

Basically, this theorem says that some Hamiltonian graphs allow us to create Hamiltonian graphs by eliminating some of the edges in the graph. The theorem leads directly to an algorithm that first expands the original graph to a graph with more edges, in which finding a Hamiltonian cycle is easy (a complete graph), and then manipulates that Hamiltonian cycle by adding some edges and removing other edges, so that eventually a Hamiltonian cycle is formed that includes only edges belonging to the original graph. An algorithm based on this theorem was developed by Chvatal in 1985. The algorithm is shown below:

    HamiltonianCycle(graph G = (V, E))
        set label of all edges to 0
        k = 1
        GH = G
        while GH contains nonadjacent vertices v, u with degH(v) + degH(u) >= |V|
            GH = GH + edge(vu)
            label edge(vu) with k
            k = k + 1
        if there exists a Hamiltonian cycle C in GH
            while k = max{label(edge(pq)) : edge(pq) in C} > 0
                C = a cycle due to a crossover, with each edge labeled by a number < k

Consider the following example illustrating the Hamiltonian cycle problem.
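The edge-adding loop of the algorithm (repeatedly joining nonadjacent vertices whose degrees sum to at least |V|) can be sketched in Java over an adjacency matrix; the representation and all names here are my own assumptions:

```java
public class HamiltonianClosure {
    // Repeatedly connect nonadjacent vertices u, v with
    // deg(u) + deg(v) >= n, labeling each added edge 1, 2, 3, ...
    // Edges of the original graph keep label 0.  Returns the labels.
    static int[][] close(boolean[][] adj) {
        int n = adj.length;
        int[][] label = new int[n][n];
        int k = 1;
        boolean added = true;
        while (added) {
            added = false;
            for (int u = 0; u < n && !added; u++)
                for (int v = u + 1; v < n && !added; v++)
                    if (!adj[u][v] && degree(adj, u) + degree(adj, v) >= n) {
                        adj[u][v] = adj[v][u] = true;
                        label[u][v] = label[v][u] = k++;
                        added = true;    // rescan, since degrees just changed
                    }
        }
        return label;
    }

    static int degree(boolean[][] adj, int u) {
        int d = 0;
        for (boolean b : adj[u]) if (b) d++;
        return d;
    }

    public static void main(String[] args) {
        // K4 minus one edge: vertices 0 and 3 are nonadjacent with
        // degree 2 each, and 2 + 2 >= 4, so edge(0,3) gets label 1.
        boolean[][] adj = {
            {false, true,  true,  false},
            {true,  false, true,  true },
            {true,  true,  false, true },
            {false, true,  true,  false}
        };
        int[][] label = close(adj);
        System.out.println(adj[0][3] + " " + label[0][3]);
        // prints true 1
    }
}
```

If no pair ever satisfies the degree condition, the loop adds nothing and the graph is returned unchanged, mirroring the case in the notes where vertex B's degree sums never reach |V|.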
The notes' diagrams show the initial graph, on vertices A through F, and the completed graph H, which is constructed as follows. In each iteration, two nonadjacent vertices are connected with an edge if the total number of their neighbors is not less than the number of vertices in the graph. First, look at all the vertices not adjacent to vertex A. For vertex C, degH(A) + degH(C) = 3 + 3 = 6 >= |V| = 6, so edge(AC), labeled with the number 1, is included in H. Next, vertex E is considered, and since the degree of A just increased by 1 when it acquired the new neighbor C, we have degH(A) + degH(E) = 4 + 2 = 6, so edge(AE), labeled 2, is included in H. The next vertex for which new neighbors are attempted is B, which is of degree 2. There are three nonadjacent vertices, D, E, and F, with degrees 2, 2, and 3 respectively; the sum of B's degree and the degree of any of these three vertices does not reach 6, so no edge is included in H. In the next four iterations, new neighbors are attempted for vertices C, D, E, and F. For node C, the only nonadjacent node is D, with degree 3, so we have degH(C) + degH(D) = 4 + 3 = 7 >= |V| = 6, and edge(CD), labeled 3, is included in H. For node D, the only nonadjacent node is B, with degree 2, so we have degH(D) + degH(B) = 4 + 2 = 6 >= |V| = 6, and edge(BD), labeled 4, is included in H. For node E, the nonadjacent nodes are B and F, both with degree 3: degH(E) + degH(B) = 3 + 3 = 6 >= |V| = 6, so edge(EB), labeled 5, is added to H, and then degH(E) + degH(F) = 4 + 3 = 7 >= |V| = 6, so edge(EF), labeled 6, is added to H. Finally, for node F, the only remaining nonadjacent node is B: degH(F) + degH(B) = 8 >= |V| = 6, so edge(FB), labeled 7, is added to H.

In the second phase of the Hamiltonian cycle algorithm, a Hamiltonian cycle in H is found: A, C, E, F, D, B, A. In this cycle, an edge with the highest label is found, edge(EF) with label 6, as shown in the notes' drawing. The vertices in the cycle are then ordered so that the vertices of this highest-labeled edge are on the extreme ends. So
the original Hamiltonian cycle becomes F, D, B, A, C, E. Then, moving left to right in this new sequence of the cycle, we try to find crossover edges by checking edges from the two neighboring vertices to the vertices at the ends of the sequence, so that the edges cross each other. The first possibility is vertices D and B, with edge(BF) and edge(DE), but this pair is rejected because the label of edge(BF) is greater than the largest label of the current cycle, which is 6. After this, the vertices B and A, and the edges connecting them to the ends of the sequence, edge(AF) and edge(BE), are checked; these edges are acceptable (their labels are 0 and 5), so the old cycle F, D, B, A, C, E, F is transformed into a new cycle, F, A, C, E, B, D, F. (The notes illustrate these crossovers in a diagram.)

In this new, modified cycle, edge(BE) has the highest label, with value 5, so the cycle is presented with the vertices of this edge at the extreme ends of the sequence: B, D, F, A, C, E. To find crossover edges, the first pair tried is edge(BF) and edge(DE), but the label of edge(BF) is 7, which is greater than the largest label of the current cycle, so the pair is rejected. The next pair is edge(AB) and edge(EF); once again the pair is unacceptable, since the label on edge(EF) is 6. The next possibility is the pair edge(BC) and edge(AE), which is acceptable, since its highest label is 2. Thus the new cycle B, C, E, A, F, D, B is formed.

Finally, in this latest cycle a pair of crossover edges is found, edge(AB) and edge(DE), that is acceptable, since the labels on both edges are 0. A new cycle is formed which includes only edges with labels of 0, and thus only edges which appeared in the original graph, so the algorithm terminates with the Hamiltonian cycle B, A, F, D, E, C, B (shown in the notes' final diagram).

COP 3503 Computer Science II CLASS NOTES

Consider the three figures, (a) through (c), shown
below. A puzzle for you to solve is to reconstruct these three figures using a pencil and paper, drawing each line exactly once without lifting the pencil from the paper. To make the puzzle even harder, see if you can draw a figure following the rules above but have the pencil finish at the same point at which you started the drawing. Try to do this before you read any further in the notes.

[Figures (a), (b), and (c)]

It turns out that these puzzles have a fairly simple solution. Figure (a) can only be drawn within the specified rules if the starting point is the lower-left or lower-right hand corner, and it is not possible to finish at the starting point. Figure (b) is easily drawn with the finishing point being the same as the starting point (see the last page of the notes for one possible solution). Figure (c) cannot be drawn at all within the specified rules, even though it appears to be the simplest of the drawings.

These puzzles are converted into graph theory problems by assigning a vertex to each intersection. The edges are then assigned in the natural manner. The corresponding graphs are shown below.

[Graphs corresponding to figures (a), (b), and (c)]

Once the puzzle has been converted into the graphs shown above, we need to find a path that visits every edge exactly once. If the extra challenge is to be solved, then a cycle must be found that visits every edge exactly once. This problem was solved in 1736 by the mathematician Euler and is commonly regarded as the beginning of graph theory. The problem is referred to as an Euler path or Euler circuit problem, depending upon the specific problem statement. The Euler path and Euler circuit problems, although slightly different, have the same basic solution, and we will focus only on the Euler circuit problem.

For a given graph to have an Euler circuit, certain properties must hold. Since an Euler circuit must begin and end on the same vertex, such a circuit is possible only if (1) the graph is connected and (2) each vertex in the graph has an even degree. If any vertex were to have an odd degree, then eventually you would reach the point where only one edge into that vertex is unvisited, and taking that edge into the vertex would strand you there. If exactly two vertices have an odd degree, then an Euler path is still possible (since an Euler path is not required to begin and end on the same vertex), provided the path begins on one of the odd-degree vertices and ends on the other. If more than two vertices have an odd degree, then an Euler path is not possible.

Looking at the puzzles from above and applying this knowledge, we see that puzzle (a) has only an Euler path, beginning at either the lower-left or lower-right corner, which are the two vertices with an odd degree; all other vertices in this graph have an even degree of either 2 or 4. Puzzle (c) has neither an Euler circuit nor an Euler path, since there are four vertices in this graph which have an odd degree. Puzzle (b), however, has no vertices of odd degree and thus has an Euler circuit as well as an Euler path.

The conditions described above turn out to be not only necessary but also sufficient for a graph to have an Euler circuit. Thus any connected graph in which all the vertices have even degree must have an Euler circuit. It also turns out that this circuit can be found in linear time, using what is basically a depth-first search. The basic problem that must be overcome by such an algorithm is that only a portion of the graph may have been visited before you return to the original starting vertex: if all the edges coming out of the start vertex have been traversed, part of the graph will still be untraversed. The easiest way to fix this problem is to find the first vertex on the path which has an untraversed edge and perform another depth-first search from that node. This gives another circuit, which can be spliced into the original. The process is continued until all edges have been traversed.

As an example, consider the graph shown below. It should be easy for you to verify that this graph does in fact have an Euler circuit.

[Graph for Euler Circuit Example]

Suppose we start our Euler circuit at vertex 5 and traverse the circuit 5, 4, 10, 5. Then we are stuck, and most of the circuit is still untraversed. This would look like the figure shown below, with the edges that have been traversed removed from the graph.

[Remaining graph after the circuit 5, 4, 10, 5]

We then continue from vertex 4, which still has unexplored edges. A depth-first search might produce the path 4, 1, 3, 7, 4, 11, 10, 7, 9, 3, 4. If this path is spliced into the previous path 5, 4, 10, 5, we arrive at the new path 5, 4, 1, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5. The resulting graph is shown in the next figure.

[Remaining graph after the splice]

Notice that in this last graph all the vertices must have even degree, so we are guaranteed to find a cycle that can be spliced into our current path. This graph might not be connected (in our example it isn't), but that is not important at this stage, because it is our traversal process and path splicing that have produced the unconnected graph; the original graph was connected. The next vertex along the path which still has untraversed edges (again searching from the first vertex in the circuit being constructed) is vertex 3. At this point a possible circuit would be 3, 2, 8, 9, 6, 3. When this is spliced into the current circuit, we have the path 5, 4, 1, 3, 2, 8, 9, 6, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5. The resulting graph is shown below.

[Remaining graph after the second splice]

Finally, the next node along the current path with untraversed edges is node 9, and the algorithm should find the circuit 9, 12, 10, 9. When this final circuit is spliced into the current circuit, we have 5, 4, 1, 3, 2, 8, 9, 12, 10, 9, 6, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5, resulting from the graph shown below.

[Remaining graph: no untraversed edges left]

Since all edges have now been traversed, the circuit that has been spliced together, 5, 4, 1, 3, 2, 8, 9, 12, 10, 9, 6, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5, is an Euler circuit for the original graph.

The implementation issues that concern any algorithm which determines an Euler circuit center mainly on the efficiency of the circuit-splicing operation. To do this efficiently, the circuit being constructed should be maintained as a linked list, so that new subcircuits can easily be added to the middle of an existing circuit, as we did in the example above. To avoid repetitious scanning of the adjacency lists which define the graph, it is best to maintain, for each list, a record of the last edge traversed. When a path is spliced in, the search for a new vertex from which to perform the next depth-first search must begin at the start of the splice point. This guarantees that the total work performed in the vertex-search phase is O(|E|) during the entire lifetime of the algorithm. With the appropriate data structures in place, the running time of an algorithm to determine the Euler circuit will be O(|E| + |V|).

There is another classic problem in graph theory which is closely related to the Euler circuit problem, called the Hamiltonian cycle problem. The Hamiltonian cycle problem requires finding, in an undirected graph, a simple cycle which visits every vertex in the graph. Although this problem seems very similar to the Euler circuit problem, in which every edge must be visited, there is no known algorithm that solves the Hamiltonian cycle problem and is guaranteed to run in polynomial time. The best known algorithms to solve this problem require exponential time for some input graphs.

Consider a related question: to prove that a given graph has a Hamiltonian cycle, you need simply produce one (proof by example). However, to prove that a given graph does not have a Hamiltonian cycle is a much more difficult task. It is so difficult that no one has yet demonstrated a polynomial-time algorithm to solve this problem. The only solutions so far require the enumeration of every cycle in the graph and checking these cycles one by one, an exponential (in terms of the number of vertices) time algorithm.
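Such an enumeration approach can be sketched in code. This is a minimal illustration, not part of the original notes (the class and method names are my own): it fixes a start vertex and tries every permutation of the remaining vertices, so its running time grows like (V-1)!, exponential in the number of vertices.

```java
// Brute-force Hamiltonian cycle check: try every ordering of the vertices.
public class HamiltonianCheck {

    // adj[u][v] == true when there is an edge between u and v
    public static boolean hasHamiltonianCycle(boolean[][] adj) {
        int n = adj.length;
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        return tryPerms(adj, perm, 1);   // fix vertex 0 as the start of the cycle
    }

    // Generate every ordering of perm[from..n-1] and test each resulting cycle.
    private static boolean tryPerms(boolean[][] adj, int[] perm, int from) {
        int n = perm.length;
        if (from == n) {                 // a complete ordering: check all its edges
            for (int i = 0; i < n; i++)
                if (!adj[perm[i]][perm[(i + 1) % n]]) return false;
            return true;
        }
        for (int i = from; i < n; i++) {
            int t = perm[from]; perm[from] = perm[i]; perm[i] = t;  // swap in
            if (tryPerms(adj, perm, from + 1)) return true;
            t = perm[from]; perm[from] = perm[i]; perm[i] = t;      // swap back
        }
        return false;
    }

    public static void main(String[] args) {
        // triangle: 0-1-2-0 is a Hamiltonian cycle
        boolean[][] triangle = {{false,true,true},{true,false,true},{true,true,false}};
        // path 0-1-2: no cycle at all
        boolean[][] path = {{false,true,false},{true,false,true},{false,true,false}};
        System.out.println(hasHamiltonianCycle(triangle)); // true
        System.out.println(hasHamiltonianCycle(path));     // false
    }
}
```

Even at 15 vertices this already examines on the order of 14! (about 87 billion) orderings, which is exactly the exponential blowup described above.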
For those of you who have had Discrete Structures, this means that the non-Hamiltonian-cycle problem is not known to be in NP.

Now we'll consider the problem of finding, for a given undirected graph, the minimum spanning tree (see the earlier set of notes defining graph terminology for the definition of the minimum spanning tree). This problem can also be applied to directed graphs, but that is a more difficult problem, so we will focus on undirected graphs. A minimum spanning tree for a graph G exists iff G is connected. A properly implemented algorithm should be able to detect the connectedness of G and handle it appropriately, but we will assume that G is a connected graph. Consider the two graphs shown below in Figures (a) and (b). Figure (b) represents the minimum spanning tree for the graph of Figure (a). In this particular example the minimum spanning tree happens to be unique; this can happen in certain graphs, but it is unusual, as most graphs have several minimum spanning trees.

Figure (a): A weighted undirected graph
Figure (b): The minimum spanning tree for Figure (a) above

Notice that the number of edges in the minimum spanning tree is |V| - 1. The minimum spanning tree is a tree because it is acyclic, it is spanning because it covers every vertex, and it is minimum for the obvious reason.

Before we go any further, I can hear a few of you asking why this is important beyond a theoretical interest. The answer is this: suppose that you are an electrician and your job is to wire a new house for electricity. How much cable do you order? You want to do the job with the minimum amount of cable to keep your costs down. The problem you need to solve (all other constraints, i.e., county codes, aside) is a minimum spanning tree. Or suppose you want to visit a number of different cities, which you can reach using a variety of different routes, and you would like to use as little gasoline as possible; you need to find the minimum spanning tree for your set of cities. Thus this type of graph-theoretical question does in fact have real practical applications. This is why it's important.

For any spanning tree T, if an edge e that is not in T is added, a cycle is created. The removal of any edge on the cycle reinstates the spanning tree property. The cost of the spanning tree is lowered if e has a lower cost than the edge that was removed. If, as a spanning tree is created, the edge that is added is always the minimum-cost edge that does not create a cycle, then the cost associated with the tree cannot be improved, because any replacement edge would have an associated cost at least as large as that of an edge already included in the spanning tree. This leads us to the simple conclusion that a greedy approach can be applied to create a minimum spanning tree.

There are several different algorithms that construct minimum spanning trees. They differ primarily in the technique employed to select the next edge to be included in the spanning tree. We will examine only one of these algorithms, called Prim's algorithm, named for its initial developer, R. C. Prim, who worked at Bell Labs in 1957 in support of the development of the U.S. telephone system. Prim's technique is to grow the minimum spanning tree (MST) in successive stages. One node is selected as the root, and in each stage an edge, with its associated vertex, is added to the minimum spanning tree being constructed. At any point during the execution of the algorithm there is a set of vertices which have been included in the MST and a set of vertices that are not yet included. At each stage the algorithm finds a new vertex to include in the MST by choosing the edge (u, v) such that the cost of (u, v) is the smallest among all edges where u is in the MST and v is not. Notice, as Prim's technique is illustrated, that the technique used by Prim to find the MST and that used by Dijkstra to find the shortest path are very similar in their operation.

To illustrate Prim's algorithm we'll use the graph of Figure (a) from the previous page. As with Dijkstra's algorithm, Prim's algorithm uses a table to drive the generation of the MST. The initial table for this graph is shown below.

vertex  visited  minimum weight  vertex causing change to min weight
1       F        0               0
2       F        ∞               0
3       F        ∞               0
4       F        ∞               0
5       F        ∞               0
6       F        ∞               0
7       F        ∞               0

Initial table for Prim's algorithm for the graph of Figure (a) above

In the table, the minimum-weight column indicates the weight associated with the minimum-cost edge connecting the vertex in this row to a vertex which has already been visited. We will assume that the algorithm begins constructing the MST starting with vertex 1. Vertex 1 is selected, and then the values for vertices 2, 3, and 4 (the vertices adjacent to vertex 1) are updated. This produces the following table:

vertex  visited  minimum weight  vertex causing change to min weight
1       T        0               0
2       F        2               1
3       F        4               1
4       F        1               1
5       F        ∞               0
6       F        ∞               0
7       F        ∞               0

Table after selection of vertex 1 as the initial vertex

Based upon the greedy approach, the minimum-cost edge chosen next is the edge (1, 4). The table is updated to indicate that vertex 4 has been selected, and the edges adjacent to vertex 4 are examined (but not those back to vertex 1, because it has been marked as visited). Notice in this next table that the minimum weight associated with vertex 2 remains unchanged, since its current weight is 2 and the weight from vertex 4 to vertex 2 is 3, which is larger. All the other entries are updated, however, since lower costs are found through vertex 4. This is reflected in the next table:

vertex  visited  minimum weight  vertex causing change to min weight
1       T        0               0
2       F        2               1
3       F        2               4
4       T        1               1
5       F        7               4
6       F        8               4
7       F        4               4

Table after vertex 4 has been selected

The next vertex chosen is 2 (notice that this choice is arbitrary, breaking the tie with vertex 3, which is also unvisited and has the same cost as vertex 2). Selecting vertex 2 does not cause any changes in the table, since the only vertex it could affect is vertex 5: vertex 2 is adjacent only to 1, 4, and 5, vertices 1 and 4 have already been visited, and the distance to 5 through 2 is 10, which is greater than the current distance to 5, which is 7. So next vertex 3 is selected, and this causes the minimum weight of vertex 6 to be updated to 5, since this is the weight associated with the edge (3, 6), which is less than the current weight associated with vertex 6. This is shown in the next table:

vertex  visited  minimum weight  vertex causing change to min weight
1       T        0               0
2       T        2               1
3       T        2               4
4       T        1               1
5       F        7               4
6       F        5               3
7       F        4               4

Table after vertex 2 and then vertex 3 are visited

Next vertex 7 is selected as the current vertex, causing an update to both vertices 5 and 6, illustrated in the next table:

vertex  visited  minimum weight  vertex causing change to min weight
1       T        0               0
2       T        2               1
3       T        2               4
4       T        1               1
5       F        6               7
6       F        1               7
7       T        4               4

Table after vertex 7 is selected

Next vertex 6 is selected, which causes no updates to the table. Then finally vertex 5 is selected, which also causes no updates. The final table which results from this algorithm is shown below:

vertex  visited  minimum weight  vertex causing change to min weight
1       T        0               0
2       T        2               1
3       T        2               4
4       T        1               1
5       T        6               7
6       T        1               7
7       T        4               4

The edges from G which comprise the MST for G are read directly from the final version of the table: (2, 1) with cost 2, (3, 4) with cost 2, (4, 1) with cost 1, (5, 7) with cost 6, (6, 7) with cost 1, and (7, 4) with cost 4. The total cost associated with the MST is 16.

(Recall that for every edge (u, v) in E of an undirected graph G, the edge (v, u) is also in E.) The running time of Prim's algorithm is O(|V|²) without binary heaps, which is optimal for dense graphs, and O(|E| log |V|) using binary heaps, which is suitable for sparse graphs.

Another very well-known algorithm which generates the MST for a weighted undirected graph G is Kruskal's algorithm. This is a similar greedy strategy, but it continually selects edges in order of smallest weight first and accepts such an edge as part of the MST only if its inclusion does not cause
a cycle. In practice, Kruskal's algorithm is slightly more efficient than Prim's algorithm.

As we mentioned earlier, there are many, many problems to which graph theory can be applied; we have only scratched the surface of the application of the graph data structure. We'll conclude our look at graphs with an application of acyclic graphs which is commonly applied to synchronization problems, called a topological sort. A topological order is an ordering of the vertices of a directed acyclic graph such that if there is a path from vertex a to vertex b, then a appears before b in the topological ordering of the vertices. This can be used to model university course prerequisites, in which case an edge (a, b) indicates that course a must be completed before enrollment in course b is allowed. A topological order of the vertices representing the courses would be any sequence of the vertices that does not violate the prerequisite conditions. A topological sort produces a topological ordering of a directed acyclic graph. An arbitrary DAG may have many topological orderings, and typically we are interested in producing only one of those which are possible. Certain situations, particularly those related to process or transaction synchronization, may require producing a specific topological ordering rather than an arbitrary one.

To illustrate the concept of topological ordering, consider modeling a set of processes (transactions) which are concurrently executing operations against a database. The various processes are simultaneously executing either read or write operations on specific objects (data) within the database. The sequence of operations actually performed on the database must proceed in such a way that no transaction reads obsolete data or overwrites an update made by a process that read the data earlier. Basically, what this really means is that a concurrent schedule of x transactions can be allowed to change the state of the database only if that concurrent schedule is equivalent to some serial schedule of the same x transactions. A serial schedule is one in which each transaction in the schedule runs in isolation, from start to finish, without any transactions executing concurrently.

To clarify why this situation can cause problems, consider the following scenario of only two concurrent processes and their actions on the database:

Time t:   Transaction T1 reads object A (let A = 4)
Time t+1: Transaction T2 reads object A (A = 4)
Time t+2: Transaction T1 writes object A (assume an update to A = 10)
Time t+3: Transaction T2 writes object A (assume an update to A = 20)
Time t+4: Database state is incorrect

At time t+4 the state of the database is inconsistent: the update of A made by T2 was computed from the value of A that T2 read before T1 wrote A, and T2's write then overwrote T1's update. T2 was therefore operating with an obsolete value of A. Database synchronization requires that something be done to prevent this from occurring. There are many ways to do this, but this isn't a database course, so we'll just look at how to recognize that there is a problem, not at what to do about the problem. (You'll see how to handle this in COP 4710.)

In the example above, the two concurrent transactions executed the concurrent schedule T1, T2, T1, T2. Since there are only two transactions in the schedule, there are only two possible serial schedules which might be equivalent to the concurrent schedule: one is T1, T2 and the other is T2, T1. The concurrent schedule above is not equivalent to either of these serial schedules, and thus it is said to be nonserializable and therefore invalid; the system cannot allow the transactions to proceed in this fashion and must rectify the situation.

Suppose that we model a situation such as that described above with a graph, where the vertices of the graph represent the transactions which are concurrently executing. The edges of this graph represent the sequencing of how the transactions have interacted in the concurrent schedule. Notice that we are working strictly with a DAG here; if there happens to be a cycle in the graph, this represents a conflict in the schedule (we'll illustrate this later). Do not worry about how this graph is constructed; in general it will be constructed from information available from the OS. Again, as before, the meaning of an edge (a, b) in this graph is that transaction a precedes transaction b.

[A graph representing a concurrent schedule of 9 transactions]

Since the graph above is a DAG (can you confirm that it is in fact a DAG?), a topological sort of the graph will produce a topological ordering of the transactions in the concurrent schedule which is equivalent to a serial schedule of the transactions. In other words, the nine transactions have interacted with the database in an interleaved, concurrent fashion, and what we are now going to find out is the serial equivalent of the concurrent schedule.

Given a DAG, there must be at least one vertex which has no edges incident upon it, i.e., its in-degree is 0. (Note: if no such node exists, the graph must contain a cycle.) A topological sort begins at such a node; if there is more than one, one is arbitrarily chosen. The topological sort then removes this node from the graph, along with all the edges which emanate from it. This produces a graph which has one less node than the original graph (the edge set is also reduced, but this is not a concern). One property of acyclic graphs that is important for a topological sort to function properly is that the removal of any node, and of the edges which emanate from it, cannot induce a cycle in the resulting graph. Therefore we know that the resulting graph is also a DAG, it must again contain at least one node with in-degree 0, and such a node is selected next. Consider the graph from above and how this process works, as shown in the sequence of diagrams below.
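Before stepping through the diagrams, the repeated remove-an-in-degree-0-node process just described (commonly known as Kahn's algorithm) can be sketched in code. This is a minimal illustration of my own, not code from the notes:

```java
import java.util.*;

// Topological sort by repeatedly removing a node of in-degree 0.
public class TopoSort {

    // adj.get(u) lists the successors of u; returns one topological order,
    // or null if the graph contains a cycle (no in-degree-0 node remains).
    public static List<Integer> topoOrder(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indeg = new int[n];
        for (List<Integer> outs : adj)
            for (int v : outs) indeg[v]++;
        Deque<Integer> ready = new ArrayDeque<>();
        for (int u = 0; u < n; u++)
            if (indeg[u] == 0) ready.add(u);   // arbitrary choice among ties
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.remove();            // "remove" u and its out-edges
            order.add(u);
            for (int v : adj.get(u))
                if (--indeg[v] == 0) ready.add(v);
        }
        return order.size() == n ? order : null;  // null means a cycle exists
    }

    public static void main(String[] args) {
        // small DAG: 0 -> 1 -> 2 and 0 -> 2
        List<List<Integer>> adj = new ArrayList<>();
        adj.add(Arrays.asList(1, 2));
        adj.add(Arrays.asList(2));
        adj.add(new ArrayList<>());
        System.out.println(topoOrder(adj)); // [0, 1, 2]
    }
}
```

When several nodes have in-degree 0 at the same time, the order in which they enter the queue decides which topological ordering is produced, mirroring the arbitrary choices made in the walkthrough that follows.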
Original graph, with the in-degree-0 nodes T4 and T6 in contrasting color.

The topological sort begins by removing an arbitrarily selected one of the two in-degree-0 nodes, along with all edges emanating from it. In this case we will select node T4 for removal. Thus our topological sort has begun, and our topological ordering, which we'll call TS, begins with T4:

TS = T4

The graph which results from this removal is shown below.

[Resulting graph after removing T4; the in-degree-0 nodes are T1, T8, and T6]

For the next step, let's arbitrarily select T6 for removal. This causes T6 to be added to TS, so TS = T4, T6, and the graph changes accordingly. Next we select node T1 for removal, giving TS = T4, T6, T1. Next T8 is selected (the only option at this point), so TS = T4, T6, T1, T8. Next T5 must be selected, causing TS to become T4, T6, T1, T8, T5. Next T9 is arbitrarily selected, so that TS = T4, T6, T1, T8, T5, T9. T2 is the next to be selected and removed, causing TS to become T4, T6, T1, T8, T5, T9, T2. Next T3 is removed, which leaves T7 as the single node to be removed last. So our final ordering is

TS = T4, T6, T1, T8, T5, T9, T2, T3, T7

Thus the topological sort of the graph has produced the topological ordering T4, T6, T1, T8, T5, T9, T2, T3, T7. This is a serial schedule equivalent to the concurrent schedule in which the transactions actually executed. So the database will be left in the state which is equivalent to executing T4 from start to finish, followed by executing T6 from start to finish, and so on. Notice that since we were able to choose arbitrarily among different nodes during the sorting process, the topological ordering we have produced is not unique. You should try to generate some of the other topological orderings that this graph contains.

Solution to puzzle (b) from earlier: the start is on node a, with the edges labeled in the order in which they are drawn, finishing on node a to
complete the Euler circuit.

COP 3503 Computer Science II CLASS NOTES

Mergesort

The mergesort sorting algorithm uses the divide-and-conquer strategy, in which the original problem is split into two half-size, recursively solved problems. If the overhead outside the recursive calls is linear, O(N), then the overall running time of the algorithm is O(N log N). Mergesort is such an algorithm and is commonly employed for external sorting. The mergesort algorithm is a recursive, subquadratic algorithm, as follows:

1. if the number of items to sort is 0 or 1, return
2. recursively sort the first and second halves separately
3. merge the two sorted halves into a single sorted group

Since this algorithm uses the divide-and-conquer strategy and employs the halving principle, we know that there are O(log₂ N) levels of recursion, and thus we need only show that merging two sorted groups into a single sorted group can be performed in linear time to prove the running time is O(N log N).

A linear merge algorithm

A linear merge algorithm requires three separate arrays, A, B, and C (two input and one output), plus an index counter per array: actr, bctr, and cctr. The index counters are initially set to the first position in each of their arrays, with incrementation as follows:

if (A[actr] < B[bctr])
    C[cctr++] = A[actr++];
else
    C[cctr++] = B[bctr++];

(When one input array is exhausted, the remainder of the other is copied into C.)

Example: Linear Merge, with A = 1 3 4 7 and B = 2 5 9 11

1 < 2,  so C = 1                  (advance actr)
3 > 2,  so C = 1 2                (advance bctr)
3 < 5,  so C = 1 2 3              (advance actr)
4 < 5,  so C = 1 2 3 4            (advance actr)
7 > 5,  so C = 1 2 3 4 5          (advance bctr)
7 < 9,  so C = 1 2 3 4 5 7        (advance actr; A is now exhausted)
copy the rest of B, so C = 1 2 3 4 5 7 9 11

Mergesort is not often used as an internal sorting method. The reason is that the output array represents a linear increase in the memory requirements, and additional work is required to copy the components of the arrays.

Quicksort

With an average running time of O(N log N), quicksort is the fastest known sorting algorithm. Quicksort has a worst-case running time of O(N²), but this can be made statistically impossible to achieve. Quicksort is a recursive algorithm whose performance is easy to analyze, yet it has a tricky implementation, since slight variations in the code can make significant differences in the running time. The quicksort algorithm is as follows. Call Quicksort(A):

1. if the number of elements to be sorted is 0 or 1, return
2. pick any element v in the array A; this is the pivot element
3. partition A - {v} into two disjoint groups: Left = {x in A - {v} : x ≤ v} and Right = {x in A - {v} : x ≥ v}
4. return the result of Quicksort(Left), followed by v, followed by Quicksort(Right)

Notes:
- the base case includes the possibility that the number of elements is 0, since the recursive calls may generate empty subsets
- any element can theoretically be the pivot element, although in practice the pivot element is not randomly chosen
- the partitioning must be performed in place, so that no additional arrays are required
- quicksort outperforms mergesort because the time required to partition the array is less than the time required to merge two arrays

Best Case: The partitions are always half the size of the input at each recursive step; by the halving principle the recursion depth is O(log₂ N), and with linear work per level we have O(N log N), which is equal to that of mergesort.

Worst Case: The partitions are very lopsided, meaning that either |Left| = 0 and |Right| = N-1, or |Left| = N-1 and |Right| = 0, at each recursive step. Suppose that T(N) is the time for quicksort on an array of N elements, Left contains no elements, and Right contains all of the elements except the pivot element (this means the pivot element is always chosen to be the smallest element in the partition). One time unit is required to sort 0 or 1 elements, and N time units are required to partition a set containing N elements. Then for N > 1 we have

T(N) = T(N-1) + N

This means that the time required to quicksort N elements is equal to the time required to recursively sort the N-1 elements in the Right subset plus the time required to partition the N elements. Telescoping the equation above:

T(N)   = T(N-1) + N
T(N-1) = T(N-2) + (N-1)
T(N-2) = T(N-3) + (N-2)
...
T(2)   = T(1) + 2

Adding these gives T(N) = T(1) + 2 + 3 + 4 + ... + N = N(N+1)/2 = O(N²). Therefore you never want to select a pivot element that leads to an unbalanced partitioning.

Average Case: If each partition size is equally likely, i.e., Left is equally likely to contain 0, 1, 2, ..., N-1 elements, then the average running time of the quicksort algorithm is O(N log N). More formally, this is stated as

T(Left)avg = T(Right)avg = [T(0) + T(1) + T(2) + ... + T(N-1)] / N

T(N)avg = T(Left)avg + T(Right)avg + N = (2/N)[T(0) + T(1) + T(2) + ... + T(N-1)] + N

With some manipulation you arrive at

T(N)/(N+1) = T(N-1)/N + 2/(N+1)

Telescoping yields

T(N)/(N+1) = 2(1 + 1/2 + 1/3 + ... + 1/(N+1)) - 5/2

which is O(log N), since the harmonic sum 1 + 1/2 + ... + 1/(N+1) is O(log N). Multiplying both sides by (N+1) gives T(N) = O(N log N).

Picking the Pivot

Don't do this randomly, or by picking the first element, or even by picking the larger of the first two elements. A safe way is to set low = the first position and high = the last position and use the element at position (low + high)/2. An even better way is to pick the median of the three values at positions low, middle, and high.

We have seen that quicksort has a best-case performance of O(N log N). The question becomes: can we do better? The bottom line is that any sorting algorithm which uses only binary comparisons of elements requires Ω(N log N) time in the worst case. This means that any algorithm that sorts by using binary element comparisons must use at least roughly N log N comparisons for some input sequence. This is also true for the average-case performance.

Consider the problem of sorting the sequence S = {a, b, c}, composed of three distinct items (that is, a ≠ b, a ≠ c, and b ≠ c). The figure below illustrates a possible sorting algorithm in the form of a binary decision tree. Each node of the decision tree represents one binary comparison: in each node of the tree, exactly two elements are compared. Since there are two possible outcomes for each comparison, each non-leaf node in the tree will be of degree two.

[A Decision Tree for Comparison Sorting: the six leaves are the orderings a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, c<b<a]

Suppose that a < b < c in our example, and consider how the algorithm determines this fact. The first comparison compares a and b, which will reveal
that a < b, since we assumed a < b < c. The second comparison compares a and c, which determines that a < c. At this point it has been determined that a < b and a < c, yet the relative order of b and c has not yet been determined. Therefore a third comparison is required to determine the relative order of b and c. Notice that the algorithm works correctly in all cases, because every permutation of the sequence S appears as a leaf node in the decision tree. Furthermore, the number of comparisons required in the worst case is equal to the height of the decision tree.

Any sorting algorithm that uses binary element comparisons can be represented by a binary decision tree, and the height of that decision tree determines the worst-case running time of the algorithm. In general, the size and shape of the decision tree depend on the particular sorting algorithm and the number of elements to be sorted. Given an input sequence of n items to be sorted, every binary decision tree that correctly sorts the input sequence must have at least n! leaves, one for each permutation of the input. Since the maximum number of leaf nodes in a binary tree of height h is 2^h, the height of the binary decision tree is at least ⌈log₂(n!)⌉. Therefore

log₂(n!) = Σ (i = 1 to n) log₂ i ≥ Σ (i = n/2 to n) log₂(n/2) ≥ (n/2) log₂(n/2) = Ω(n log n)

Since the height of the decision tree is Ω(n log n), the number of comparisons done by any sorting algorithm that sorts using only binary comparisons is Ω(n log n). Assuming that each comparison can be done in constant time, the running time of any such algorithm is Ω(n log n).

The O(N²) cost of sorting based upon removing inversions between adjacent elements is too high for large sorts, and the work must be broken down differently to improve efficiency and decrease run times. Several different techniques have been developed which involve the basic divide-and-conquer strategy, some more subtly than others. The diminishing increment sort developed by Donald Shell is one such sort, as are the mergesort and quicksort techniques.
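As a preview of Shell's sort, which is developed in detail next, its single building block is an h-sort: an insertion sort applied to elements spaced h apart. The following minimal sketch is my own example, not from the notes, and for compactness one loop handles all h subsequences at once rather than sorting each subarray separately:

```java
import java.util.Arrays;

// One h-sort pass: gap insertion sort over every h-th element of the array.
public class HSortPass {

    public static void hSort(int[] a, int h) {
        for (int j = h; j < a.length; j++) {
            int tmp = a[j];
            int k = j;
            // shift larger elements that sit h positions apart to the right
            while (k - h >= 0 && tmp < a[k - h]) {
                a[k] = a[k - h];
                k -= h;
            }
            a[k] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 1, 8, 2, 7, 3};
        hSort(a, 3);               // sorts the subsequences {9,2}, {1,7}, {8,3}
        System.out.println(Arrays.toString(a)); // [2, 1, 3, 9, 7, 8]
    }
}
```

A full Shell sort simply runs passes like this for a diminishing sequence of increments, finishing with h = 1, which is a plain insertion sort on an array that is already nearly ordered.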
The basic approach to the Shell sort is given by the following pseudocode algorithm:

    divide data into h subarrays;
    for (i = 1; i <= h; i++)
        sort subarray data_i;
    sort array data;

If h is too small, then the subarrays data_i could be too large and the sort will remain inefficient. On the other hand, if h is too large, then too many small subarrays are created; although each is sorted, the overall order of the original array will remain largely unchanged. If only one partition of the original array is performed, the gain in execution time will be slight. To solve this problem, several different subdivisions must be used, and for every subdivision the same procedure is applied separately. This is shown in the pseudocode algorithm below:

    determine increments h_t > h_(t-1) > ... > h_1 for dividing array data into subarrays;
    for (h = h_t; t >= 1; t--, h = h_t)
        divide data into h subarrays;
        for (i = 1; i <= h; i++)
            sort subarray data_i;
    sort array data;

This is the basic idea behind the Shell sort. The division of the array into several subarrays is done in such a fashion that elements spaced far apart are compared first, then elements closer to each other are compared, and so on, until adjacent elements are compared on the last pass. The original array is subdivided into subarrays by picking every h-th element as part of one subarray. Therefore, there are h subarrays data_1, ..., data_h, and for every i, 1 <= i <= h,

    data_i[j] = data[h*j + i - 1]

For example, if h = 3, the array data is subdivided into three subarrays data_1, data_2, and data_3 so that

    data_1[0] = data[0]    data_1[1] = data[3]    data_1[2] = data[6]    data_1[i] = data[3i]
    data_2[0] = data[1]    data_2[1] = data[4]    data_2[2] = data[7]    data_2[i] = data[3i+1]
    data_3[0] = data[2]    data_3[1] = data[5]    data_3[2] = data[8]    data_3[i] = data[3i+2]

If h = 3, the process of extracting the subarrays and sorting them is called a 3-sort; if h = 5, it is called a 5-sort; and so on.

[Figure: Shell sort example — the 5 subarrays of a 5-sort before and after sorting, then the 3 subarrays of a 3-sort before and after sorting; the final pass (h = 1) is an ordinary insertion sort.]

Generic Java Implementation of Shellsort

    void shellsort(Object[] data) {
        int i, j, k, h, hCnt;
        int[] increments = new int[20];
        Comparable temp;
        // create an appropriate number of increments h
        for (h = 1, i = 0; h < data.length; i++) {
            increments[i] = h;
            h = 3*h + 1;
        }
        // loop on the number of different increments h
        for (i--; i >= 0; i--) {
            h = increments[i];
            // loop on the h subarrays sorted in the ith pass
            for (hCnt = h; hCnt < 2*h; hCnt++) {
                // insertion sort for the subarray containing every hth element of data
                for (j = hCnt; j < data.length; j += h) {
                    temp = (Comparable) data[j];
                    k = j;
                    while (k - h >= 0 && temp.compareTo((Comparable) data[k-h]) < 0) {
                        data[k] = data[k-h];
                        k -= h;
                    }
                    data[k] = temp;
                }
            }
        }
    }

The quicksort algorithm was developed by C. A. R. Hoare and is a recursive divide-and-conquer approach to sorting. It is also the fastest known sort. The original array is divided into two subarrays, the first of which contains elements less than or equal to a chosen key, called the bound or pivot. The second subarray includes elements equal to or greater than the pivot. The two subarrays can be sorted separately, but before this is done the partition process is repeated for both subarrays. As a result, two new pivots are chosen, one for each subarray. Four subarrays are created, because each subarray from the first phase is now divided into two segments. This process of partitioning the subarrays continues until only one-cell arrays remain, which do not require sorting (by default, an array of one element is sorted). By dividing the task of sorting a large array into two simpler tasks, and then further dividing those tasks into even simpler tasks, it turns out that in the process of getting prepared to sort, the data have become sorted. Quicksort is an inherently recursive algorithm, because it is applied to both subarrays of an array at each level of partitioning. A pseudocode version of quicksort is shown below:

    quicksort(array)
        if array.length > 1
            choose pivot;
            // partition array into subarray1 and subarray2
            while there are elements left in array
                include element either in subarray1 (if element <= pivot)
                or in subarray2 (if element >= pivot);
            quicksort(subarray1);
            quicksort(subarray2);
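As a bridge between the pseudocode and the in-place implementation developed below, here is one way the pseudocode could be rendered directly in runnable Java. This allocation-based version is my own illustrative sketch, not the textbook's code; the class name, method name, and the choice of the middle element as pivot are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleQuicksort {
    // Direct rendering of the pseudocode: choose a pivot, split the remaining
    // elements into "< pivot" and ">= pivot" groups, and recurse on each group.
    public static List<Integer> quicksort(List<Integer> array) {
        if (array.size() <= 1)
            return new ArrayList<>(array);              // arrays of 0 or 1 elements are sorted
        List<Integer> rest = new ArrayList<>(array);
        Integer pivot = rest.remove(rest.size() / 2);   // middle element as the pivot
        List<Integer> subarray1 = new ArrayList<>();    // elements less than the pivot
        List<Integer> subarray2 = new ArrayList<>();    // elements greater than or equal to the pivot
        for (Integer el : rest) {
            if (el < pivot) subarray1.add(el);
            else            subarray2.add(el);
        }
        List<Integer> sorted = new ArrayList<>(quicksort(subarray1));
        sorted.add(pivot);                              // the pivot lands in its final position
        sorted.addAll(quicksort(subarray2));
        return sorted;
    }
}
```

Note that elements equal to the pivot all go to the second subarray here; as the notes point out, a real implementation may distribute such elements between the two sides to keep them balanced.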
To partition the array, two operations need to be performed: (1) a pivot has to be found, and (2) the array must be scanned to place the elements into the proper subarrays. Choosing a good pivot is not a trivial task. The main problem is that the two subarrays should contain approximately the same number of elements. A number of different strategies exist for selecting a pivot. One of the simplest consists of choosing the first element of the array. For some situations this technique might suffice; however, since many arrays to be sorted already have many elements in their proper positions, a safer approach is to choose the element in the middle of the array. Selecting the element that is the median of all the elements in the array is the safest approach, but it is in general too costly, so it is approximated with median-of-three or median-of-five techniques.

The task of scanning the array and dividing the elements between the two subarrays is rather vague in the pseudocode algorithm shown above and is in many ways implementation dependent. Notice in particular that the algorithm above does not specify to which subarray an element equal to the pivot belongs. This is deliberate, so that such elements can be used to balance the number of elements in the two subarrays. Shown below is a Java implementation of a quicksort algorithm, followed by an example of how the quicksort algorithm functions.

Generic Java version of quicksort

    void quicksort(Object[] data, int first, int last) {
        int lower = first + 1, upper = last;
        swap(data, first, (first + last) / 2);
        Comparable pivot = (Comparable) data[first];
        while (lower <= upper) {
            while (((Comparable) data[lower]).compareTo(pivot) < 0)
                lower++;
            while (pivot.compareTo((Comparable) data[upper]) < 0)
                upper--;
            if (lower < upper)
                swap(data, lower++, upper--);
            else
                lower++;
        }
        swap(data, upper, first);
        if (first < upper - 1)
            quicksort(data, first, upper - 1);
        if (upper + 1 < last)
            quicksort(data, upper + 1, last);
    }

    void quicksort(Object[] data) {
        if (data.length < 2)
            return;
        // find the largest element and put it at the end of data
        int max = 0;
        for (int i = 1; i < data.length; i++)
            if (((Comparable) data[max]).compareTo((Comparable) data[i]) < 0)
                max = i;
        swap(data, data.length - 1, max);    // largest element now in its final position
        quicksort(data, 0, data.length - 2);
    }
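The median-of-three technique mentioned above can be sketched as a small helper. This is my own illustration (the class name, method name, and signature are assumptions, not code from the notes): it samples the first, middle, and last elements and reports which index holds their median, guarding against the poor pivots that a first-element choice produces on already-sorted input.

```java
public class MedianOfThree {
    // Return the index (first, mid, or last) whose element is the median
    // of the three sampled values data[first], data[mid], data[last].
    public static int medianOfThree(int[] data, int first, int last) {
        int mid = (first + last) / 2;
        int a = data[first], b = data[mid], c = data[last];
        if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return first;
        return last;
    }
}
```

A quicksort using this strategy would swap the element at the returned index into the pivot position before partitioning, instead of unconditionally taking the middle element.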
The next two figures illustrate the operation of the quicksort algorithm on the array 8 5 4 7 6 1 6 3 8 12 10. In this example the pivot is used as a boundary item: it is placed on the borderline between the two subarrays obtained as a result of one call to quicksort. In this fashion the pivot lands in its final position and can be excluded from further processing. To ensure that the pivot is not moved around during partitioning, it is kept in the first position; after the partitioning process is completed, it is moved to its proper location, which is the rightmost position of the first subarray.

Before the partitioning begins, the largest element (the 12) is located and interchanged with the last element of the array. This results in the array 8 5 4 7 6 1 6 3 8 10 12. Since the last element is now in its proper position, it requires no further processing. In the first partitioning, lower = 1 and upper = 9, and the first element of the array (8) is swapped with the pivot (6) in position 4, so the array becomes 6 5 4 7 8 1 6 3 8 10 12 (see part b of the figure). In the first iteration of the outer while loop, the first inner while loop moves lower to position 3, which holds 7, an element greater than the pivot. The second inner while loop moves upper to position 7, which holds 3, an element less than the pivot (see part c). Next, the elements in these two cells are interchanged, producing the array 6 5 4 3 8 1 6 7 8 10 12 (see part d). Then lower is incremented to 4 and upper is decremented to 6 (see part e). This concludes the first iteration of the outer while loop. In its second iteration, neither of the two inner while loops modifies either index, because lower indicates a position occupied by 8, which is greater
than the pivot, and upper indicates a position occupied by 6, which is equal to the pivot. The two numbers are swapped (see part f), and both indices are then updated to 5 (see part g). In the third iteration of the outer while loop, lower is moved to the next position, which contains 8, an element greater than the pivot, and upper stays where it is, because the 1 at its position is smaller than the pivot (see part h). But at that point lower and upper have crossed each other, so no swapping occurs, and after a redundant increment of lower to 7 the outer while loop terminates. At this point upper is the index of the rightmost element of the first subarray (the elements not exceeding the pivot), so the element at this position is swapped with the pivot (see part i). In this fashion the pivot is placed in its final position and is excluded from subsequent processing. The two subarrays processed next are the left subarray, with the elements to the left of the pivot, and the right subarray, with the elements to the right of the pivot (see part j). The partitioning then proceeds on these two subarrays separately, and then on the subarrays of those subarrays, until the subarrays have fewer than two elements. The entire process is summarized in the second of the two figures that follow.

The worst case occurs if in each invocation of quicksort the smallest element of the array is chosen as the pivot. To see this, try quicksort on the array 5 3 1 2 4 6 8. The first pivot will be 1, producing an empty subarray plus the subarray 3 5 2 4 6. The new pivot is 2, which again produces an empty subarray plus the array 5 3 4 6. Thus the algorithm operates on arrays of size n-1, n-2, ..., 2, and the partitions require n-2, n-3, ..., 1 comparisons. This results in a running time of O(N^2). The best case occurs when each pivot divides its array into two subarrays of equal size; the running time is then O(N log N).

[Figure: Partitioning the array 8 5 4 7 6 1 6 3 8 12 10 with quicksort — parts (a) through (i) show the array after the largest-element swap, after the pivot swap, and after each iteration of the partitioning loop described above.]
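The quadratic worst case can also be observed by counting comparisons. The sketch below is my own instrumentation, not the notes' code: it uses a first-element pivot (the poor strategy just discussed) with a Lomuto-style partition rather than the two-index scan above. On already-sorted input every partition is maximally lopsided, so it performs exactly (N-1) + (N-2) + ... + 1 = N(N-1)/2 comparisons.

```java
public class PivotExperiment {
    static long comparisons;   // element comparisons made by the last sort

    // Quicksort that always takes the first element as the pivot,
    // partitioning with a single left-to-right (Lomuto-style) scan.
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[lo];
        int i = lo;                              // a[lo+1..i] holds elements < pivot
        for (int j = lo + 1; j <= hi; j++) {
            comparisons++;
            if (a[j] < pivot) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[lo]; a[lo] = a[i]; a[i] = t;   // move the pivot to its final position
        sort(a, lo, i - 1);
        sort(a, i + 1, hi);
    }

    // Sort a copy of the input and report how many comparisons it took.
    public static long countComparisons(int[] data) {
        comparisons = 0;
        sort(data.clone(), 0, data.length - 1);
        return comparisons;
    }
}
```

For N = 100, sorted input costs 100 * 99 / 2 = 4950 comparisons, while any input whose first pivot splits the array evenly is guaranteed to cost strictly less, illustrating why pivot choice matters.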
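Looping back to the Shellsort at the start of these notes, an int[] rendering that uses the same h = 3h + 1 increment sequence makes the generic Comparable version easy to check. This compact variant is my own sketch (class and method names are assumptions), not the textbook listing:

```java
public class ShellSortDemo {
    // Shellsort over int[] with the h = 3h + 1 increment sequence (1, 4, 13, 40, ...).
    public static void shellsort(int[] data) {
        int h = 1;
        while (h < data.length / 3)
            h = 3 * h + 1;            // largest useful increment for this length
        for (; h >= 1; h /= 3) {
            // h-sort the array: an insertion sort over every h-th element,
            // which handles all h interleaved subarrays in one pass
            for (int j = h; j < data.length; j++) {
                int temp = data[j];
                int k = j;
                while (k >= h && temp < data[k - h]) {
                    data[k] = data[k - h];
                    k -= h;
                }
                data[k] = temp;
            }
        }
    }
}
```

Because the final pass uses h = 1, the last step is an ordinary insertion sort, so the result is always fully sorted; the earlier passes merely ensure that this last pass has little work left to do.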