# Data Str Algrm Anl CSC 517

University of Miami

These 17-page class notes were uploaded by Ms. Nathen O'Keefe on Thursday, September 17, 2015. The notes belong to CSC 517 at the University of Miami, taught by B. Rosenberg in Fall.

Skip Lists: A Probabilistic Alternative to Balanced Trees

Skip lists are a data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforced balancing and, as a result, the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.

William Pugh

Binary trees can be used for representing abstract data types such as dictionaries and ordered lists. They work well when the elements are inserted in a random order. Some sequences of operations, such as inserting the elements in order, produce degenerate data structures that give very poor performance. If it were possible to randomly permute the list of items to be inserted, trees would work well with high probability for any input sequence. In most cases queries must be answered on-line, so randomly permuting the input is impractical. Balanced tree algorithms rearrange the tree as operations are performed to maintain certain balance conditions and assure good performance.

Skip lists are a probabilistic alternative to balanced trees. Skip lists are balanced by consulting a random number generator. Although skip lists have bad worst-case performance, no input sequence consistently produces the worst-case performance (much like quicksort when the pivot element is chosen randomly). It is very unlikely a skip list data structure will be significantly unbalanced (e.g., for a dictionary of more than 250 elements, the chance that a search will take more than 3 times the expected time is less than one in a million). Skip lists have balance properties similar to that of search trees built by random insertions, yet do not require insertions to be random.

Balancing a data structure probabilistically is easier than explicitly maintaining the balance. For many applications, skip lists are a more natural representation than trees, also leading to simpler algorithms. The simplicity of skip list algorithms makes them
easier to implement and provides significant constant factor speed improvements over balanced tree and self-adjusting tree algorithms. Skip lists are also very space efficient. They can easily be configured to require an average of 1 1/3 pointers per element (or even less) and do not require balance or priority information to be stored with each node.

SKIP LISTS

We might need to examine every node of the list when searching a linked list (Figure 1a). If the list is stored in sorted order and every other node of the list also has a pointer to the node two ahead of it in the list (Figure 1b), we have to examine no more than ceil(n/2) + 1 nodes (where n is the length of the list). Also giving every fourth node a pointer four ahead (Figure 1c) requires that no more than ceil(n/4) + 2 nodes be examined. If every (2^i)-th node has a pointer 2^i nodes ahead (Figure 1d), the number of nodes that must be examined can be reduced to ceil(log2 n) while only doubling the number of pointers. This data structure could be used for fast searching, but insertion and deletion would be impractical.

A node that has k forward pointers is called a level k node. If every (2^i)-th node has a pointer 2^i nodes ahead, then levels of nodes are distributed in a simple pattern: 50% are level 1, 25% are level 2, 12.5% are level 3 and so on. What would happen if the levels of nodes were chosen randomly, but in the same proportions (e.g., as in Figure 1e)? A node's i-th forward pointer, instead of pointing 2^(i-1) nodes ahead, points to the next node of level i or higher. Insertions or deletions would require only local modifications; the level of a node, chosen randomly when the node is inserted, need never change. Some arrangements of levels would give poor execution times, but we will see that such arrangements are rare. Because these data structures are linked lists with extra pointers that skip over intermediate nodes, I named them skip lists.

SKIP LIST ALGORITHMS

This section gives algorithms to search for, insert and delete elements in a dictionary or symbol table. The
Search operation returns the contents of the value associated with the desired key, or failure if the key is not present. The Insert operation associates a specified key with a new value (inserting the key if it had not already been present). The Delete operation deletes the specified key. It is easy to support additional operations such as "find the minimum key" or "find the next key".

Each element is represented by a node, the level of which is chosen randomly when the node is inserted, without regard for the number of elements in the data structure. A level i node has i forward pointers, indexed 1 through i. We do not need to store the level of a node in the node. Levels are capped at some appropriate constant MaxLevel. The level of a list is the maximum level currently in the list (or 1 if the list is empty). The header of a list has forward pointers at levels one through MaxLevel. The forward pointers of the header at levels higher than the current maximum level of the list point to NIL.

FIGURE 1 - Linked lists with additional pointers

Initialization

An element NIL is allocated and given a key greater than any legal key. All levels of all skip lists are terminated with NIL. A new list is initialized so that the level of the list is equal to 1 and all forward pointers of the list's header point to NIL.

Search Algorithm

We search for an element by traversing forward pointers that do not overshoot the node containing the element being searched for (Figure 2). When no more progress can be made at the current level of forward pointers, the search moves down to the next level. When we can make no more progress at level 1, we must be immediately in front of the node that contains the desired element (if it is in the list).

Insertion and Deletion Algorithms

To insert or delete a node, we simply search and splice, as shown in Figure 3. Figure 4 gives algorithms for insertion and deletion. A vector update is maintained so that, when the search is complete and we are ready to perform the splice, update[i] contains
a pointer to the rightmost node of level i or higher that is to the left of the location of the insertion/deletion.

If an insertion generates a node with a level greater than

Search(list, searchKey)
    x := list→header
    -- loop invariant: x→key < searchKey
    for i := list→level downto 1 do
        while x→forward[i]→key < searchKey do
            x := x→forward[i]
    -- x→key < searchKey ≤ x→forward[1]→key
    x := x→forward[1]
    if x→key = searchKey then return x→value
    else return failure

FIGURE 2 - Skip list search algorithm

the previous maximum level of the list, we update the maximum level of the list and initialize the appropriate portions of the update vector. After each deletion, we check if we have deleted the maximum element of the list and, if so, decrease the maximum level of the list.

Choosing a Random Level

Initially, we discussed a probability distribution where half of the nodes that have level i pointers also have level i+1 pointers. To get away from magic constants, we say that a fraction p of the nodes with level i pointers also have level i+1 pointers (for our original discussion, p = 1/2). Levels are generated randomly by an algorithm equivalent to the one in Figure 5. Levels are generated without reference to the number of elements in the list.

At what level do we start a search? Defining L(n)

In a skip list of 16 elements generated with p = 1/2, we might happen to have 9 elements of level 1, 3 elements of level 2, 3 elements of level 3 and 1 element of level 14 (this would be very unlikely, but it could happen). How should we handle this? If we use the standard algorithm and start our search at level 14, we will do a lot of useless work.

Where should we start the search? Our analysis suggests that ideally we would start a search at the level L where we expect 1/p nodes. This happens when L = log_{1/p} n. Since we will be referring frequently to this formula, we will use L(n) to denote log_{1/p} n.

There are a number of solutions to the problem of deciding how to handle the case where there is an element with an unusually large level in the list.

Don't worry, be
happy. Simply start a search at the highest level present in the list. As we will see in our analysis, the probability that the maximum level in a list of n elements is significantly larger than L(n) is very small. Starting a search at the maximum level in the list does not add more than a small constant to the expected search time. This is the approach used in the algorithms described in this paper.

FIGURE 3 - Pictorial description of steps involved in performing an insertion (the search path, the pointers update[i]→forward[i], and the list after insertion, with updated pointers in grey)

Use less than you are given. Although an element may contain room for 14 pointers, we don't need to use all 14. We can choose to utilize only L(n) levels. There are a number of ways to implement this, but they all complicate the algorithms and do not noticeably improve performance, so this approach is not recommended.

Fix the dice. If we generate a random level that is more than one greater than the current maximum level in the list, we simply use one plus the current maximum level in the list as the level of the new node. In practice and intuitively, this change seems to work well. However, it totally destroys our ability to analyze the resulting algorithms, since the level of a node is no longer completely random. Programmers should probably feel free to implement this; purists should avoid it.

Determining MaxLevel

Since we can safely cap levels at L(n), we should choose MaxLevel = L(N) (where N is an upper bound on the number of elements in a skip list). If p = 1/2, using MaxLevel = 16 is appropriate for data structures containing up to 2^16 elements.

ANALYSIS OF SKIP LIST ALGORITHMS

The time required to execute the Search, Delete and Insert operations is dominated by the time required to search for the appropriate element. For the Insert and Delete operations, there is an additional cost proportional to the level of the node being inserted or deleted. The time required to find an element is proportional to the length of the search path, which is determined by the
pattern in which elements with different levels appear as we traverse the list.

randomLevel()
    lvl := 1
    -- random() returns a random value in [0, 1)
    while random() < p and lvl < MaxLevel do
        lvl := lvl + 1
    return lvl

FIGURE 5 - Algorithm to calculate a random level

Insert(list, searchKey, newValue)
    local update[1..MaxLevel]
    x := list→header
    for i := list→level downto 1 do
        while x→forward[i]→key < searchKey do
            x := x→forward[i]
        -- x→key < searchKey ≤ x→forward[i]→key
        update[i] := x
    x := x→forward[1]
    if x→key = searchKey then x→value := newValue
    else
        lvl := randomLevel()
        if lvl > list→level then
            for i := list→level + 1 to lvl do
                update[i] := list→header
            list→level := lvl
        x := makeNode(lvl, searchKey, newValue)
        for i := 1 to lvl do
            x→forward[i] := update[i]→forward[i]
            update[i]→forward[i] := x

Delete(list, searchKey)
    local update[1..MaxLevel]
    x := list→header
    for i := list→level downto 1 do
        while x→forward[i]→key < searchKey do
            x := x→forward[i]
        update[i] := x
    x := x→forward[1]
    if x→key = searchKey then
        for i := 1 to list→level do
            if update[i]→forward[i] ≠ x then break
            update[i]→forward[i] := x→forward[i]
        free(x)
        while list→level > 1 and list→header→forward[list→level] = NIL do
            list→level := list→level - 1

FIGURE 4 - Skip list insertion and deletion algorithms

Probabilistic Philosophy

The structure of a skip list is determined only by the number of elements in the skip list and the results of consulting the random number generator. The sequence of operations that produced the current skip list does not matter. We assume an adversarial user does not have access to the levels of nodes; otherwise, he could create situations with worst-case running times by deleting all nodes that were not level 1.

The probabilities of poor running times for successive operations on the same data structure are NOT independent; two successive searches for the same element will both take exactly the same time. More will be said about this later.

Analysis of expected search cost

We analyze the search path backwards, travelling up and to the left. Although the levels of nodes in the list are known and fixed when the search is performed, we act as if the level of a
node is being determined only when it is observed while backtracking the search path.

At any particular point in the climb, we are at a situation similar to situation a in Figure 6: we are at the i-th forward pointer of a node x, and we have no knowledge about the levels of nodes to the left of x or about the level of x, other than that the level of x must be at least i. Assume x is not the header (this is equivalent to assuming the list extends infinitely to the left). If the level of x is equal to i, then we are in situation b. If the level of x is greater than i, then we are in situation c. The probability that we are in situation c is p. Each time we are in situation c, we climb up a level. Let C(k) = the expected cost (i.e., length) of a search path that climbs up k levels in an infinite list:

C(0) = 0
C(k) = (1 - p)(cost in situation b) + p(cost in situation c)

By substituting and simplifying, we get:

C(k) = (1 - p)(1 + C(k)) + p(1 + C(k - 1))
C(k) = 1/p + C(k - 1)
C(k) = k/p

Our assumption that the list is infinite is a pessimistic assumption. When we bump into the header in our backwards climb, we simply climb up it without performing any leftward movements. This gives us an upper bound of L(n)/p on the expected length of the path that climbs from level 1 to level L(n) in a list of n elements.

We use this analysis to go up to level L(n) and use a different analysis technique for the rest of the journey. The number of leftward movements remaining is bounded by the number of elements of level L(n) or higher in the entire list, which has an expected value of 1/p.

We also move upwards from level L(n) to the maximum level in the list. The probability that the maximum level of the list is greater than k is equal to 1 - (1 - p^k)^n, which is at most n p^k. We can calculate that the expected maximum level is at most L(n) + 1/(1 - p). Putting our results together, we find:

Total expected cost to climb out of a list of n elements ≤ L(n)/p + 1/(1 - p)

which is O(log n).

Number of comparisons

Our result is an analysis of the length of the search path. The number of comparisons required is one plus the length
of the search path (a comparison is performed for each position in the search path; the length of the search path is the number of hops between positions in the search path).

Probabilistic Analysis

It is also possible to analyze the probability distribution of search costs. The probabilistic analysis is somewhat more complicated (see box). From the probabilistic analysis, we can calculate an upper bound on the probability that the actual cost of a search exceeds the expected cost by more than a specified ratio. Some results of this analysis are shown in Figure 8.

FIGURE 6 - Possible situations in backwards traversal of the search path (situation a: need to climb k levels from here; situation b: still need to climb k levels from here; situation c: need to climb only k - 1 levels from here)

Choosing p

Table 1 gives the relative times and space requirements for different values of p. Decreasing p also increases the variability of running times. If 1/p is a power of 2, it will be easy to generate a random level from a stream of random bits (it requires an average of (log2 1/p)/(1 - p) random bits to generate a random level). Since some of the constant overheads are related to L(n) (rather than L(n)/p), choosing p = 1/4 (rather than 1/2) slightly improves the constant factors of the speed of the algorithms as well. I suggest that a value of 1/4 be used for p unless the variability of running times is a primary concern, in which case p should be 1/2.

Sequences of operations

The expected total time for a sequence of operations is equal to the sum of the expected times of each of the operations in the sequence. Thus, the expected time for any sequence of m searches in a data structure that contains n elements is O(m log n). However, the pattern of searches affects the probability distribution of the actual time to perform the entire sequence of operations.

If we search for the same item twice in the same data structure, both searches will take exactly the same amount of time. Thus the variance of the total time will be four times the variance of a
single search. If the search times for two elements are independent, the variance of the total time is equal to the sum of the variances of the individual searches. Searching for the same element over and over again maximizes the variance.

ALTERNATIVE DATA STRUCTURES

Balanced trees (e.g., AVL trees [Knu73] [Wir76]) and self-adjusting trees [ST85] can be used for the same problems as skip lists. All three techniques have performance bounds of the same order. A choice among these schemes involves several factors: the difficulty of implementing the algorithms, constant factors, type of bound (amortized, probabilistic or worst-case) and performance on a non-uniform distribution of queries.

Implementation difficulty

For most applications, implementers generally agree skip lists are significantly easier to implement than either balanced tree algorithms or self-adjusting tree algorithms.

TABLE 1 - Relative search speed and space requirements, depending on the value of p

p       Normalized search times        Avg. # of pointers per node
        (i.e., normalized L(n)/p)      (i.e., 1/(1 - p))
1/2     1                              2
1/e     0.94                           1.58
1/4     1                              1.33
1/8     1.33                           1.14
1/16    2                              1.07

Constant factors

Constant factors can make a significant difference in the practical application of an algorithm. This is particularly true for sub-linear algorithms. For example, assume that algorithms A and B both require O(log n) time to process a query, but that B is twice as fast as A: in the time algorithm A takes to process a query on a data set of size n, algorithm B can process a query on a data set of size n^2.

There are two important but qualitatively different contributions to the constant factors of an algorithm. First, the inherent complexity of the algorithm places a lower bound on any implementation. Self-adjusting trees are continuously rearranged as searches are performed; this imposes a significant overhead on any implementation of self-adjusting trees. Skip list algorithms seem to have very low inherent constant-factor overheads: the inner loop of the deletion algorithm for skip lists compiles to just six instructions on the 68020.

Second, if
the algorithm is complex, programmers are deterred from implementing optimizations. For example, balanced tree algorithms are normally described using recursive insert and delete procedures, since that is the most simple and intuitive method of describing the algorithms. A recursive insert or delete procedure incurs a procedure call overhead. By using non-recursive insert and delete procedures, some of this overhead can be eliminated. However, the complexity of non-recursive algorithms for insertion and deletion in a balanced tree is intimidating, and this complexity deters most programmers from eliminating recursion in these routines. Skip list algorithms are already non-recursive, and they are simple enough that programmers are not deterred from performing optimizations.

FIGURE 8 - This graph shows a plot of an upper bound on the probability of a search taking substantially longer than expected. The vertical axis shows the probability that the length of the search path for a search exceeds the average length by more than the ratio on the horizontal axis (curves for p = 1/4, n = 10^5; p = 1/2, n = 4096; p = 1/2, n = 65536). For example, for p = 1/2 and n = 4096, the probability that the search path will be more than three times the expected length is less than one in 200 million. This graph was calculated using our probabilistic upper bound.

TABLE 2 - Timings of implementations of different algorithms

Implementation                               Search Time         Insertion Time      Deletion Time
Skip lists                                   0.051 msec (1.0)    0.065 msec (1.0)    0.059 msec (1.0)
non-recursive AVL trees                      0.046 msec (0.91)   0.10 msec (1.55)    0.085 msec (1.46)
recursive 2-3 trees                          0.054 msec (1.05)   0.21 msec (3.2)     0.21 msec (3.65)
Self-adjusting trees: top-down splaying      0.15 msec (3.0)     0.16 msec (2.5)     0.18 msec (3.1)
Self-adjusting trees: bottom-up splaying     0.49 msec (9.6)     0.51 msec (7.8)     0.53 msec (9.0)

Table 2 compares the performance of implementations of skip lists and four other techniques. All implementations were optimized for efficiency. The AVL tree algorithms were written by James Macropol of Contel and
based on those in [Wir76]. The 2-3 tree algorithms are based on those presented in [AHU83]. Several other existing balanced tree packages were timed and found to be much slower than the results presented here. The self-adjusting tree algorithms are based on those presented in [ST85]. The times in this table reflect the CPU time on a Sun-3/60 to perform an operation in a data structure containing 2^16 elements with integer keys. The values in parentheses show the results relative to the skip list time. The times for insertion and deletion do not include the time for memory management (e.g., in C programs, calls to malloc and free).

Note that skip lists perform more comparisons than other methods (the skip list algorithms presented here require an average of L(n)/p + 1/(1 - p) + 1 comparisons). For tests using real numbers as keys, skip lists were slightly slower than the non-recursive AVL tree algorithms, and search in a skip list was slightly slower than search in a 2-3 tree (insertion and deletion using the skip list algorithms was still faster than using the recursive 2-3 tree algorithms). If comparisons are very expensive, it is possible to change the algorithms so that we never compare the search key against the key of a node more than once during a search. For p = 1/2, this produces an upper bound on the expected number of comparisons of 7/2 + 3/2 log2 n. This modification is discussed in [Pug89b].

Type of performance bound

These three classes of algorithm have different kinds of performance bounds. Balanced trees have worst-case time bounds, self-adjusting trees have amortized time bounds and skip lists have probabilistic time bounds. With self-adjusting trees, an individual operation can take O(n) time, but the time bound always holds over a long sequence of operations. For skip lists, any operation or sequence of operations can take longer than expected, although the probability of any operation taking significantly longer than expected is negligible.

In certain real-time applications, we must be assured that an operation will
complete within a certain time bound. For such applications, self-adjusting trees may be undesirable, since they can take significantly longer on an individual operation than expected (e.g., an individual search can take O(n) time instead of O(log n) time). For real-time systems, skip lists may be usable if an adequate safety margin is provided: the chance that a search in a skip list containing 1000 elements takes more than 5 times the expected time is about 1 in 10^18.

Non-uniform query distribution

Self-adjusting trees have the property that they adjust to non-uniform query distributions. Since skip lists are faster than self-adjusting trees by a significant constant factor when a uniform query distribution is encountered, self-adjusting trees are faster than skip lists only for highly skewed distributions. We could attempt to devise self-adjusting skip lists. However, there seems little practical motivation to tamper with the simplicity and fast performance of skip lists; in an application where highly skewed distributions are expected, either self-adjusting trees or a skip list augmented by a cache may be preferable [Pug90].

ADDITIONAL WORK ON SKIP LISTS

I have described a set of algorithms that allow multiple processors to concurrently update a skip list in shared memory [Pug89a]. These algorithms are much simpler than concurrent balanced tree algorithms. They allow an unlimited number of readers and n busy writers in a skip list of n elements with very little lock contention.

Using skip lists, it is easy to do most all the sorts of operations you might wish to do with a balanced tree, such as use search fingers, merge skip lists and allow ranking operations (e.g., determine the k-th element of a skip list) [Pug89b].

Tom Papadakis, Ian Munro and Patricio Poblette [PMP90] have done an exact analysis of the expected search time in a skip list. The upper bound described in this paper is close to their exact bound; the techniques they needed to use to derive an exact analysis are very complicated and sophisticated.
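The search, insertion and deletion routines of Figures 2 and 4, and the level generator of Figure 5, can be sketched in Python roughly as follows. This is a minimal single-threaded illustration under stated assumptions, not the concurrent algorithms of [Pug89a] nor the C and Pascal libraries mentioned later; the class names and the use of an infinite key in place of the NIL sentinel are my own choices.

```python
import random

P = 0.5          # fraction of level-i nodes that also have level i+1 pointers
MAX_LEVEL = 16   # appropriate for up to 2**16 elements when p = 1/2

class Node:
    """A level-k node: a key, a value, and k forward pointers."""
    def __init__(self, level, key, value):
        self.key = key
        self.value = value
        self.forward = [None] * level  # forward[i] is the (i+1)-th pointer

class SkipList:
    def __init__(self):
        self.level = 1
        # NIL sentinel modelled by an infinite key; header precedes all keys.
        self.nil = Node(MAX_LEVEL, float('inf'), None)
        self.header = Node(MAX_LEVEL, -float('inf'), None)
        for i in range(MAX_LEVEL):
            self.header.forward[i] = self.nil

    def random_level(self):
        # Figure 5: flip a p-biased coin until it fails or MaxLevel is hit.
        lvl = 1
        while random.random() < P and lvl < MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        # Figure 2: follow forward pointers that do not overshoot the key.
        x = self.header
        for i in range(self.level - 1, -1, -1):
            while x.forward[i].key < key:
                x = x.forward[i]
        x = x.forward[0]
        return x.value if x.key == key else None

    def insert(self, key, value):
        # Figure 4: search, recording the rightmost node seen at each level.
        update = [None] * MAX_LEVEL
        x = self.header
        for i in range(self.level - 1, -1, -1):
            while x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        x = x.forward[0]
        if x.key == key:
            x.value = value           # key already present: replace value
        else:
            lvl = self.random_level()
            if lvl > self.level:      # new node exceeds current list level
                for i in range(self.level, lvl):
                    update[i] = self.header
                self.level = lvl
            x = Node(lvl, key, value)
            for i in range(lvl):      # splice the new node in at each level
                x.forward[i] = update[i].forward[i]
                update[i].forward[i] = x

    def delete(self, key):
        update = [None] * MAX_LEVEL
        x = self.header
        for i in range(self.level - 1, -1, -1):
            while x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        x = x.forward[0]
        if x.key == key:
            for i in range(self.level):
                if update[i].forward[i] is not x:
                    break
                update[i].forward[i] = x.forward[i]
            # Lower the list level if the top levels are now empty.
            while self.level > 1 and self.header.forward[self.level - 1] is self.nil:
                self.level -= 1
```

As in the paper, p = 1/2 with MaxLevel = 16 suits data structures of up to 2^16 elements; lowering P to 0.25 trades a little search speed variability for fewer pointers per node.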
Their exact analysis shows that, for p = 1/2 and p = 1/4, the upper bound given in this paper on the expected cost of a search is not more than 2 comparisons more than the exact expected cost.

I have adapted the idea of probabilistic balancing to some other problems arising both in data structures and in incremental computation [PT89]. We can generate the level of a node based on the result of applying a hash function to the element, as opposed to using a random number generator. This results in a scheme where, for any set S, there is a unique data structure that represents S, and with high probability the data structure is approximately balanced. If we combine this idea with an applicative (i.e., persistent) probabilistically balanced data structure and a scheme such as hashed-consing [All78], which allows constant-time structural equality tests of applicative data structures, we get a number of interesting properties, such as constant-time equality tests for the representations of sequences. This scheme also has a number of applications for incremental computation. Since skip lists are somewhat awkward to make applicative, a probabilistically balanced tree scheme is used.

RELATED WORK

James Driscoll pointed out that R. Sprugnoli suggested a method of randomly balancing search trees in 1981 [Spr81]. With Sprugnoli's approach, the state of the data structure is not independent of the sequence of operations which built it. This makes it much harder or impossible to formally analyze his algorithms. Sprugnoli gives empirical evidence that his algorithm has good expected performance, but no theoretical results.

A randomized data structure for ordered sets is described in [BLLSS86]. However, a search using that data structure requires O(n^(1/2)) expected time.

Cecilia Aragon and Raimund Seidel describe a probabilistically balanced search tree scheme [AC89]. They discuss how to adapt their data structure to non-uniform query distributions.

SOURCE CODE AVAILABILITY

Skip list source code libraries for both C and Pascal are available for anonymous ftp from ftp.cs.umd.edu.

CONCLUSIONS

From a theoretical point of view, there is no need for skip lists. Balanced trees can do everything that can be done with skip lists and have good worst-case time bounds (unlike skip lists). However, implementing balanced trees is an exacting task and, as a result, balanced tree algorithms are rarely implemented except as part of a programming assignment in a data structures class.

Skip lists are a simple data structure that can be used in place of balanced trees for most applications. Skip list algorithms are very easy to implement, extend and modify. Skip lists are about as fast as highly optimized balanced tree algorithms and are substantially faster than casually implemented balanced tree algorithms.

ACKNOWLEDGEMENTS

Thanks to the referees for their helpful comments. Special thanks to all those people who supplied enthusiasm and encouragement during the years in which I struggled to get this work published, especially Alan Demers, Tim Teitelbaum and Doug McIlroy. This work was partially supported by an AT&T Bell Labs Fellowship and by NSF grant CCR-8908900.

REFERENCES

[AC89] Aragon, Cecilia and Raimund Seidel, Randomized Search Trees, Proceedings of the 30th Ann. IEEE Symp. on Foundations of Computer Science, pp. 540-545, October 1989.
[AHU83] Aho, A., Hopcroft, J. and Ullman, J., Data Structures and Algorithms, Addison-Wesley Publishing Company, 1983.
[All78] John Allen, Anatomy of LISP, McGraw Hill Book Company, NY, 1978.
[BLLSS86] Bentley, J., F. T. Leighton, M. F. Lepley, D. Stanat and J. Steele, A Randomized Data Structure For Ordered Sets, MIT/LCS Technical Memo 297, May 1986.
[Knu73] Knuth, D., Sorting and Searching: The Art of Computer Programming, Vol. 3, Addison-Wesley Publishing Company, 1973.
[PMP90] Papadakis, Thomas, Ian Munro and Patricio Poblette, Exact Analysis of Expected Search Cost in Skip Lists, Tech Report ????, Dept. of Computer Science, Univ. of Waterloo, January 1990.
[PT89] Pugh, W. and T. Teitelbaum, Incremental Computation via Function
Caching, Proc. of the Sixteenth Conference on the Principles of Programming Languages, 1989.
[Pug89a] Pugh, W., Concurrent Maintenance of Skip Lists, Tech Report CS-TR-2222, Dept. of Computer Science, University of Maryland, College Park, 1989.
[Pug89b] Pugh, W., Whatever You Might Want to Do Using Balanced Trees, You Can Do It Faster and More Simply Using Skip Lists, Tech Report CS-TR-2286, Dept. of Computer Science, University of Maryland, College Park, July 1989.
[Pug90] Pugh, W., Slow Optimally Balanced Search Strategies vs. Cached Fast Uniformly Balanced Search Strategies, to appear in Information Processing Letters.
[Spr81] Sprugnoli, R., Randomly Balanced Binary Trees, Calcolo, V17, 1981, pp. 99-117.
[ST85] Sleator, D. and R. Tarjan, Self-Adjusting Binary Search Trees, Journal of the ACM, Vol. 32, No. 3, July 1985, pp. 652-666.
[Wir76] Wirth, N., Algorithms + Data Structures = Programs, Prentice-Hall, 1976.

PROBABILISTIC ANALYSIS

In addition to analyzing the expected performance of skip lists, we can also analyze the probabilistic performance of skip lists. This will allow us to calculate the probability that an operation takes longer than a specified time. This analysis is based on the same ideas as our analysis of the expected cost, so that analysis should be understood first.

A random variable has a fixed but unpredictable value and a predictable probability distribution and average. If X is a random variable, Prob{ X = t } denotes the probability that X equals t, and Prob{ X > t } denotes the probability that X is greater than t. For example, if X is the number obtained by throwing an unbiased die, Prob{ X > 3 } = 1/2.

It is often preferable to find simple upper bounds on values whose exact value is difficult to calculate. To discuss upper bounds on random variables, we need to define a partial ordering and equality on the probability distributions of non-negative random variables.

Definitions (=_prob and ≤_prob). Let X and Y be non-negative independent random variables (typically, X and Y would denote the time to execute algorithms A_X and A_Y). We define X ≤_prob Y to be true if and only if, for any value t, the probability
that X exceeds t is less than the probability that Y exceeds t. More formally:

X =_prob Y if and only if ∀t, Prob{ X > t } = Prob{ Y > t }, and
X ≤_prob Y if and only if ∀t, Prob{ X > t } ≤ Prob{ Y > t }.

For example, the graph in Figure 7 shows the probability distribution of three random variables X, Y and Z. Since the probability distribution curve for X is completely under the curves for Y and Z, X ≤_prob Y and X ≤_prob Z. Since the probability curves for Y and Z intersect, neither Y ≤_prob Z nor Z ≤_prob Y. Since the expected value of a random variable X is simply the area under the curve Prob{ X > t }, if X ≤_prob Y then the average of X is less than or equal to the average of Y.

We make use of two probability distributions:

Definition (binomial distributions - B(t, p)). Let t be a non-negative integer and p be a probability. The term B(t, p) denotes a random variable equal to the number of successes seen in a series of t independent random trials where the probability of a success in a trial is p. The average and variance of B(t, p) are tp and tp(1 - p), respectively.

Definition (negative binomial distributions - NB(s, p)). Let s be a non-negative integer and p be a probability. The term NB(s, p) denotes a random variable equal to the number of failures seen before the s-th success in a series of random independent trials where the probability of a success in a trial is p. The average and variance of NB(s, p) are s(1 - p)/p and s(1 - p)/p^2, respectively.

FIGURE 7 - Plots of three probability distributions (Prob{ X > t }, Prob{ Y > t } and Prob{ Z > t })

Probabilistic analysis of search cost

The number of leftward movements we need to make before we move up a level (in an infinite list) has a negative binomial distribution: it is the number of failures (situations b) we see before we see the first success (situation c) in a series of independent random trials, where the probability of success is p. Using the probabilistic notation introduced above:

Cost to climb one level in an infinite list =_prob 1 + NB(1, p)

We can sum the costs of climbing each level to get the total cost to climb up to level L(n):

Cost to climb to level L(n)
in an infinite list =prob (L(n) − 1) + NB(L(n) − 1, p).

Our assumption that the list is infinite is a pessimistic assumption:

cost to climb to level L(n) in a list of n elements ≤prob (L(n) − 1) + NB(L(n) − 1, p).

Once we have climbed to level L(n), the number of leftward movements is bounded by the number of elements of level L(n) or greater in a list of n elements. The number of elements of level L(n) or greater in a list of n elements is a random variable of the form B(n, 1/(np)).

Let M be a random variable corresponding to the maximum level in a list of n elements. The probability that the level of a node is greater than k is p^k, so Prob{M > k} = 1 − (1 − p^k)^n < np^k. Since np^k = p^(k − L(n)) and Prob{NB(1, 1 − p) + 1 > i} = p^i, we get a probabilistic upper bound on M: M ≤prob L(n) + NB(1, 1 − p) + 1. Note that the average of L(n) + NB(1, 1 − p) + 1 is L(n) + 1/(1 − p).

This gives a probabilistic upper bound on the cost once we have reached level L(n) of B(n, 1/(np)) + (L(n) + NB(1, 1 − p) + 1) − L(n). Combining our results to get a probabilistic upper bound on the total length of the search path (i.e., the cost of the entire search):

total cost to climb out of a list of n elements
  ≤prob (L(n) − 1) + NB(L(n) − 1, p) + B(n, 1/(np)) + NB(1, 1 − p) + 1.

The expected value of our upper bound is equal to

(L(n) − 1) + (L(n) − 1)(1 − p)/p + 1/p + p/(1 − p) + 1 = L(n)/p + 1/(1 − p),

which is the same as our previously calculated upper bound on the expected cost of a search. The variance of our upper bound is

(L(n) − 1)(1 − p)/p^2 + (1 − 1/(np))/p + p/(1 − p)^2 < (1 − p)L(n)/p^2 + p/(1 − p)^2 + (2p − 1)/p^2.

Figure 8 shows a plot of an upper bound on the probability of an actual search taking substantially longer than average, based on our probabilistic upper bound.

PROOFS FOR ALGORITHMS

Burton J. Rosenberg, University of Miami
Friday, January 19, 2001

Theorem 1. For x ∈ R and a, b ∈ Z with a, b > 0,

⌊⌊x/b⌋/a⌋ = ⌊x/(ab)⌋ and ⌈⌈x/b⌉/a⌉ = ⌈x/(ab)⌉.

PROOF. To motivate the proof, consider the plane R × R with horizontal lines drawn at y = 1, y = a and y = ab, and rays drawn from the origin passing through the integer points on y = 1, that is, (i, 1) for i = ..., −2, −1, 0, 1, 2, .... These rays also pass through some integer points on y = a and y = ab. The set of all x on the line y = ab such that ⌈x/(ab)⌉ = k lies inside the cone described
by rays from the origin passing through ((k − 1)ab, ab) and (kab, ab), including the rightmost ray but excluding the leftmost ray. The intersection of this half-open cone with the line y = 1 gives all x such that ⌈x⌉ = k.

The calculation ⌈⌈x/b⌉/a⌉ follows the ray from the origin passing through (x, ab) to its intersection with the y = a line, at (x/b, a), and then moves right along y = a to the next integer point. Following the ray leaves us within the half-open cone, and moving right to the next integer point does not leave the cone, since the ray through (kab, ab), which is the right-hand boundary of the cone, also passes through the integer point (ka, a). We can follow this picture to state a formal proof:

(k − 1)ab < x ≤ kab for some k ∈ Z
⟺ (k − 1)a < x/b ≤ ka      (use b > 0)
⟺ (k − 1)a < ⌈x/b⌉ ≤ ka     (use a ∈ Z, so (k − 1)a and ka are integers)
⟺ k − 1 < ⌈x/b⌉/a ≤ k      (use a > 0)

hence ⌈⌈x/b⌉/a⌉ = k = ⌈x/(ab)⌉. We did use that a ∈ Z, and an example shows that this is necessary: let a = b be equal non-integers and x = 2, and do the math. However, we did not need that b ∈ Z, hence we have a slightly stronger result. △

Theorem 2. Let a_0, ..., a_d ∈ R and a_d > 0. Then

Σ_{i=0}^{d} a_i x^i = Θ(x^d).

PROOF. For x > 1,

|Σ_{i=0}^{d−1} a_i x^i| ≤ Σ_{i=0}^{d−1} |a_i| x^i ≤ x^{d−1} K,

where we have set K = Σ_{i=0}^{d−1} |a_i|. Hence

a_d x^d − K x^{d−1} ≤ Σ_{i=0}^{d} a_i x^i ≤ a_d x^d + K x^{d−1},

that is,

x^d (a_d − K/x) ≤ Σ_{i=0}^{d} a_i x^i ≤ x^d (a_d + K/x).

Taking n_0 large enough so that K/x ≤ a_d/2 for all x ≥ n_0,

(a_d/2) x^d ≤ Σ_{i=0}^{d} a_i x^i ≤ (3a_d/2) x^d for all x ≥ n_0.

Hence Σ_{i=0}^{d} a_i x^i = Θ(x^d). △

Theorem 3. For all d, ε > 0, log^d n = o(n^ε).

PROOF. We first prove, by induction, the case d ∈ Z.

Basis d = 1: apply L'Hospital's rule to the indeterminate form

lim_{n→∞} log n / n^ε = lim_{n→∞} n^{−1} / (ε n^{ε−1}) = lim_{n→∞} 1/(ε n^ε) = 0.

Hence for any c > 0 there is an n_0 such that for n ≥ n_0, log n / n^ε < c. That is, log n = o(n^ε).

Applying L'Hospital's rule to the case of the general power,

lim_{n→∞} log^d n / n^ε = lim_{n→∞} d log^{d−1} n · n^{−1} / (ε n^{ε−1}) = (d/ε) lim_{n→∞} log^{d−1} n / n^ε.

Hence log^d n = o(n^ε) if log^{d−1} n = o(n^ε), which completes the induction step. By the monotonicity in d of log^d n, the result extends to all d ∈ R, d > 0. △

Theorem 4. For all d > 0 and a > 1, n^d = o(a^n).

PROOF. The proof pattern is the same as in the previous theorem. L'Hospital's rule applied to the case d = 1 gives

lim_{n→∞} n/a^n =
lim_{n→∞} 1/(a^n ln a) = 0.

Again use L'Hospital's rule to prove the induction step, and then extend to all real d > 0 by monotonicity. △

Corollary 1. For all ε > 0, n log n = o(n^{1+ε}).

PROOF. This follows from a general result: if f_1 = O(g_1) and f_2 = o(g_2), then f_1(n) f_2(n) = o(g_1(n) g_2(n)). Apply this by setting f_1(n) = g_1(n) = n, f_2(n) = log n and g_2(n) = n^ε.

To show the general result, let c_1 and n_1 be such that f_1 < c_1 g_1(n) for all n > n_1. This is possible because f_1 = O(g_1). Let c > 0 be chosen and set c_2 = c/c_1. Since f_2 = o(g_2), there exists an n_2 such that f_2 < c_2 g_2(n) for all n > n_2. Multiplying the inequalities (both sides are positive) and letting n_0 be the maximum of n_1 and n_2,

f_1 f_2 < c_1 g_1(n) c_2 g_2(n) = c g_1(n) g_2(n) for all n > n_0. △

Asymptotic Order Notation

Burton Rosenberg
September 10, 2007

Introduction

We carefully develop the notations which measure the asymptotic growth of functions. The intuition is to group functions into function classes based on their growth shape. The most used such grouping is O(f), "big-oh of f", where a function f serves as a model for an entire collection of functions, the set O(f), including all functions growing not faster than the function f.

Definition 1 (Big-Oh). Let f : X → Y be a function.

O(f) = { g : X → Y | ∃ x_0, c > 0 s.t. ∀ x ≥ x_0, c f(x) ≥ g(x) ≥ 0 }

By abuse of notation, g ∈ O(f) is written g = O(f) rather than the more correct g ∈ O(f). The phrase "O(f) = g" has no meaning. In general, the sets X and Y are the sets of real numbers, integers or naturals, according to the situation.

Computer science is interested in functions which give only positive values or, less drastically, functions which eventually assume only positive values. There is some long-term simplicity in adopting this restriction, although short-term it requires additional definitions and care.

Definition 2 (Eventually Non-negative, Positive). A function f : X → Y is eventually non-negative if there exists an x_0 such that for all x ≥ x_0, f(x) ≥ 0. The function is eventually positive if there exists an x_0 such that for all x ≥ x_0, f(x) > 0.

Theorem 1. If f is not eventually non-negative, O(f) is empty. If f is eventually non-negative, then 0 = O(f), and O(f) is therefore non-empty.
Every element of O(f) is eventually non-negative.

The function 0 : X → Y is the zero function, defined as 0(x) = 0 for all x. In general, any constant C can be viewed as a function of constant value C.

There is a related but stronger notion for the classification of functions, "little-oh of f". Whereas a function in O(f) is bounded above by the function f, a function in o(f) is increasingly insignificant relative to the function f.

Definition 3 (Little-Oh). Let f : X → Y be a function.

o(f) = { g : X → Y | ∀ c > 0, ∃ x_0 > 0 s.t. ∀ x ≥ x_0, c f(x) > g(x) ≥ 0 }

Theorem 2. If f is not eventually positive, o(f) is empty. If f is eventually positive, then 0 = o(f), and o(f) is therefore non-empty. Every element of o(f) is eventually non-negative.

Note that while O(0) is not empty (it contains all functions that are eventually zero), o(0) is empty. So the converse of the following theorem is not true.

Theorem 3 (Little-Oh is Stronger than Big-Oh). If g = o(f) then g = O(f).

Proving a function in O(f) requires demonstrating x_0 and c satisfying the definition. For some demonstrations, the function is so strongly a member of O(f) that the subtleties of Big-Oh are a waste of time. The class of functions o(f), little-oh of f, has stronger demands, hence simpler demonstrations. A function in o(f) is also in O(f), so one use of Little-Oh is to provide demonstrations for Big-Oh.

Theorem 4 (Analytic Definition of Little-Oh). Let f : X → Y be eventually positive and let g : X → Y be eventually non-negative. Then g = o(f) if and only if

lim_{x→∞} g(x)/f(x) = 0.

Example 1. Use L'Hospital's rule to show log x / x → 0. This gives log x = o(x), and therefore log x = O(x). In the same way it can be shown that n log n = O(n^{1+ε}) for any real ε > 0.

Example 2. For j > k > 0, x^k = o(x^j), hence x^k = O(x^j).

Order Properties of the Notation

The order notation arranges functions from smaller to larger, reminiscent of the ordering of numbers from smaller to larger. In this analogy, Big-Oh is non-strict inequality: g = O(f) can be interpreted as g ≤ f. Likewise, Little-Oh is strict inequality: g = o(f) is similar to g < f. Given three functions f, g and h, we expect that the ordering is transitive: if h ≤ g and g ≤ f, then h
≤ f. This is indeed true for order notation.

Theorem 5 (Transitivity). If h = O(g) and g = O(f), then h = O(f). If h = o(g) and g = o(f), then h = o(f).

Big-Oh and Little-Oh express non-strict and strict "less than". Other notations express non-strict and strict "greater than".

Definition 4 (Big-Omega). Let f : X → Y be a function.

Ω(f) = { g : X → Y | ∃ x_0, c > 0 s.t. ∀ x ≥ x_0, g(x) ≥ c f(x) ≥ 0 }

Theorem 6. If f is not eventually non-negative, Ω(f) is empty. Every function in Ω(f) is eventually non-negative. Ω(0) is the collection of all eventually non-negative functions.

Definition 5 (Little-Omega). Let f : X → Y be a function.

ω(f) = { g : X → Y | ∀ c > 0, ∃ x_0 > 0 s.t. ∀ x ≥ x_0, g(x) > c f(x) ≥ 0 }

Theorem 7. If f is not eventually non-negative, ω(f) is empty. Every function in ω(f) is eventually positive. ω(0) is the collection of all eventually positive functions.

We claim that these definitions are the appropriate anti-symmetric variants of Big- and Little-Oh.

Theorem 8 (Anti-symmetry). g = O(f) exactly when f = Ω(g); g = o(f) exactly when f = ω(g).

Given the anti-symmetry, theorems concerning Big- and Little-Oh can be quickly re-established for Big- and Little-Omega.

Corollary 9 (Strength). If g = ω(f) then g = Ω(f).

Corollary 10 (Transitivity). If h = Ω(g) and g = Ω(f), then h = Ω(f). If h = ω(g) and g = ω(f), then h = ω(f).

The following theorem illustrates the strictness versus non-strictness of the various notations.

Theorem 11 (Reflexivity). Let f be eventually non-negative. Then f = O(f) and f = Ω(f). Hence the intersection O(f) ∩ Ω(f) is non-empty. In contrast, the intersection of o(f) and ω(f) is always empty. In particular, neither o(f) nor ω(f) contains f.

The intersection of O(f) and Ω(f) produces the class Θ(f), the class of functions sharing the asymptotic growth of f. Just as x ≤ y and y ≤ x implies x = y, g = O(f) and f = O(g) implies g = Θ(f): since f = O(g) can be transformed to g = Ω(f), g is in the intersection.

Definition 6 (Theta). Let f : X → Y be a function.

Θ(f) = { g : X → Y | ∃ x_0, c_1, c_2 > 0 s.t. ∀ x ≥ x_0, c_1 f(x) ≥ g(x) ≥ c_2 f(x) ≥ 0 }

Theorem 12. g = Θ(f) if and only if g = O(f) and f = O(g). The set Θ(f) is the intersection of O(f) and Ω(f).

The sets defined by Theta form equivalence classes. Equivalence classes are defined by a relation which is reflexive, symmetric and transitive.

Corollary 13 (Equivalence Properties). The relation Θ
enjoys the following properties:

1. Reflexive: f = Θ(f) whenever f is eventually non-negative.
2. Symmetry: if g = Θ(f), then f = Θ(g).
3. Transitive: if h = Θ(g) and g = Θ(f), then h = Θ(f).

We close this section with a warning concerning the ordering of functions. Functions have greater complexity than numbers. The ordering of numbers by size leaves little room for variety: given two numbers, one is the smaller, or if not, the numbers are the same. Consider these functions:

f_even(x) = 0 if x is even, 1 otherwise
f_odd(x)  = 0 if x is odd, 1 otherwise

Neither of these functions is Big-Oh the other. By order of growth, these two functions are incommensurable. (But see an exercise in the class text for variant definitions which avoid this.)

Example 3. Show x = O(ax) and ax = O(ax + b) for a > 0. Conclude that a_f x + b_f = Θ(a_g x + b_g) for a_f, a_g > 0.

Often the simplest function of a Θ class lends its name to the class. These functions would be described as Θ(x).

The Order of an Approximation

The order notation can be used to notate a function whose shape is not known exactly, but to an approximation suitable for the purpose. The Harmonic series is the sum of the reciprocals of the first n integers. Since log x is the integral of 1/x, it is not surprising that the Harmonic series should approximate the log:

H_n = Σ_{i=1}^{n} 1/i = ln n + O(1).

This equation tells us that the approximation is perfect with the addition of an unknown O(1) error function. That is, H_n = ln n + e(n) for some e(n) = O(1). In general, the presence of the order notation inside an expression on the right-hand side of an equality indicates that some specific function makes the equality true; however, the most we know, or care to know, about that function is that it is a member of a certain order class.

Efficiency of Algorithms

The efficiency of an algorithm is given by the order of growth of its run time with respect to input size. For instance, numerous simple sorting routines essentially compare all elements pairwise in the course of sorting. For n elements this makes O(n^2) comparisons, and this places a bound on run time.

Suppose two algorithms are composed. The output of algorithm A with run time
f_A(x) is the input to algorithm B with run time f_B(x). What is the efficiency of the combined algorithm? Assuming the first algorithm leaves the size unchanged, it is f_A(x) + f_B(x). This theorem allows for simplification:

Theorem 14. Let g_1 = O(g_2), and let a_1, a_2 be non-negative constants. Then a_1 g_1 + a_2 g_2 = O(g_2).

Suppose we would like to calculate the mode of x numbers, that is, the value which occurs most often. This can be accomplished by sorting the numbers and then sweeping through the sorted numbers, keeping track of the longest run of consecutive equal values. If the sorting is done in time f_A(x) = O(x^2) and the sweep is done in time f_B(x) = O(x), the entire algorithm runs in time f_A(x) + f_B(x) = O(x^2). In this algorithm, sorting is the bottleneck. Improve the sort to O(x log x), e.g. using mergesort, and immediately the entire algorithm improves to O(x log x).

The moral of the story: when an algorithm can be broken down into a sequence of sub-algorithms, the sub-algorithm with the largest run time rules the run time of the combined algorithm.

It should be intuitive that

a_d x^d + a_{d−1} x^{d−1} + ... + a_0 = Θ(x^d) for a_d > 0.

It is actually a bit of a mess to prove, due to the fact that for i < d some or all of the a_i might be negative.

Theorem 15. If f = Θ(f') and g = Θ(g'), then fg = Θ(f'g').

Given this theorem, a fairly neat way about a proof is the factorization

a_d x^d + ... + a_0 = a_d x^d (1 + Σ_{i=0}^{d−1} (a_i/a_d) x^{i−d}),

followed by showing 1 + Σ_{i=0}^{d−1} (a_i/a_d) x^{i−d} = O(1). A little more work gives the matching Ω bound.

Corollary 16. Let f(x) = Σ_{i=0}^{d} a_i x^i with a_d > 0. Then f = Θ(x^d).

Example 4. A quick summary of our results so far, writing "<" for the little-oh ordering:

1 < log x < x < x log x < x^{1.5} < x^2 < x^3 < ...

An algorithm running with any of these run times is considered feasible: it runs in polynomial time, that is, O(x^k) for some k. Algorithms taking strictly more time than polynomial are considered infeasible.

Theorem 17. x^k = o(a^x) for any k and any a > 1.

Example 5. Our hierarchy continues:

x^2 < x^3 < ... < 2^x < e^x < x! < x^x < ...

An algorithm taking more than polynomial time, such as an exponential time algorithm running in O(2^x), is considered infeasible. The reason for this is that small changes in input size
increase computation time greatly. In practice, any useful algorithm will be applied to increasingly complex problems. Due to the growth rate, computation power will not keep up with the demands of problem size. For all practical purposes, then, the problem is not solvable by this algorithm.

Exponential time algorithms sometimes invoke the following very simple strategy: "try all possible combinations of outputs and pick the one that's correct." Here is an exponential time algorithm for sorting a pack of cards. There are 52 cards; they can be rearranged into 52! orders. Try all orderings one by one until the correct one appears. To give the reader an idea of the inefficiency of this method, it wouldn't be much less efficient to sort cards by repeatedly shuffling them, each time looking to see if the cards just happened to fall in order. This should give an intuition of the practical infeasibility of exponential time algorithms.

Exercises

1. Give proofs for all theorems and corollaries given in these notes.
2. Give proofs for the examples given in these notes.
3. Prove or disprove: if g_1 = Θ(g_2) and a_1, a_2 are positive constants, then a_1 g_1 + a_2 g_2 = Θ(g_2).
4. Prove or disprove: if f = Ω(f') and g = Ω(g'), then fg = Ω(f'g').
5. Prove or disprove: if f = O(f) and g = O(g), then f and g are both O(f + g).
6. Give the analogous analytic definition for ω which agrees with the transpose symmetry of o and ω. (Warning: the limit requires that the denominator be eventually positive.)
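The sort-and-sweep mode computation described in the efficiency section can be sketched in a few lines. This is a minimal illustration, not code from the notes; the function name `mode` and the sample input are my own.

```python
def mode(values):
    """Return a most frequent value using the sort-and-sweep method."""
    if not values:
        raise ValueError("mode of empty sequence")
    ordered = sorted(values)          # the O(x log x) sort dominates the run time
    best_value, best_run = ordered[0], 1
    current_run = 1
    # One O(x) sweep: sorting puts equal values next to each other, so the
    # longest run of consecutive equal entries identifies the mode.
    for prev, cur in zip(ordered, ordered[1:]):
        current_run = current_run + 1 if cur == prev else 1
        if current_run > best_run:
            best_value, best_run = cur, current_run
    return best_value

print(mode([3, 1, 2, 3, 2, 3]))   # prints 3
```

The total run time is f_A(x) + f_B(x) = O(x log x) + O(x), which Theorem 14 collapses to O(x log x).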
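The approximation H_n = ln n + O(1) from the order-of-approximation section can also be checked numerically. The sketch below is my own illustration, not from the notes: it computes the error term e(n) = H_n − ln n for a few values of n and shows that it stays bounded.

```python
import math

def harmonic(n):
    """H_n, the sum of the reciprocals of the first n integers."""
    return sum(1.0 / i for i in range(1, n + 1))

# The error H_n - ln n should stay bounded as n grows; in fact it
# decreases toward Euler's constant, roughly 0.5772.
errors = [harmonic(n) - math.log(n) for n in (10, 100, 1000, 10000)]
print(errors)
```

The printed errors all lie between 0.5 and 0.7, consistent with e(n) = O(1).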
