DATABASES COP 4710
Popular in Course
MTED 5318, Teaching and Learning with Techonology in the Mathematics Classroom
verified elite notetaker
verified elite notetaker
verified elite notetaker
verified elite notetaker
Popular in Computer Programming
This 25 page Class Notes was uploaded by Vito Quigley on Thursday September 17, 2015. The Class Notes belongs to COP 4710 at Florida State University taught by Feifei Li in Fall. Since its upload, it has received 80 views. For similar materials see /class/205475/cop-4710-florida-state-university in Computer Programming at Florida State University.
Reviews for DATABASES
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/17/15
TreeStructured Indexes Range Searches o Find all students with gpa gt 3 039 39 If data is in sorted file do binary search to find first such student then scan to find others Cost of binary search can be quite high 0 Simple idea Create an index39 file Level of indirection again w i Page1 Pagez Page3 l lPageN Data File E Can do binary search on smaller index file index entry P K P Tree l 1391 I K2P K 2 000 m Index file may still be quite large But we can apply the idea repeatedly Nonleaf Pages CM dull Overflow gt El l page Primary pages E Leafpages contain data entries B Tree The Most Widely Used Index 0 Insert delete at log F N cost keep tree height balanced F fanout N leaf pages Minimum 50 occupancy except for root Each node contains d lt m lt 2d entries The parameter d is called the orderof the tree 0 Supports equality and rangesearches efficiently d Index Entries Direct search Data Entries quotSequence setquot Example B Tree 0 Search begins at root and key comparisons direct it to a leaf 0 Search for 5 15 all data entries gt 24 Root 13 17 i A 221 I 24I 27129I I E Based on the search for 15 3 we know it is not in the tree A A i I2I3I5I7I I14 I16I I II19120 B Trees in Practice Typical order 100 Typical fillfactor 67 average fanout 133 Typical capacities Height 4 1334 312900700 records Height 3 1333 2352637 records 0 Can often hold top levels in buffer pool Level 1 1 page 8 Kbytes Level 2 133 pages 1 Mbyte Level 3 17689 pages 133 MBytes Index Classification Clustered vs unclustered If order of data records is the same as or close to order of index data entries then called clustered index A file can be clustered on at most one search key Cost of retrieving data records through index varies greatly based on whether index is clustered or not Clustered vs Unclustered Index 0 Suppose that the data records are stored in a Heap file To build clustered index first sort the Heap file with some free space on each block for future inserts Overflow blocks may be needed for inserts Thus order of data recs is close to but not identical to the sort order Data entries Data entries CLUSTERED Index File Data file Data Records Data Records Index entries direct search for data entries Unclustered vs Clustered Indexes What are the tradeoffs Clustered Pros Efficient for range searches May be able to do some types of compression Possible locality benefits related data Clustered Cons Expensive to maintain on the fly or sloppy with reorganization Inserting a Data Entry into a 3 Tree 0 Find correct leaf L 0 Put data entry onto L If L has enough space done Else must grit L into L and a new node L2 Redistribute entries evenly copy up middle key 0 Insert index entry pointing to LZinto parent of L o This can happen recursively To split index node redistribute entries evenly but push up middle key Contrast with leaf splits Splits grow tree root split increases height Tree growth gets Wider or one eve teler at ten Example B Tree Inserting 8 Root 13 17 Example B Tree Inserting 8 A A 19f201221 241271291 33 34 38 39 A A 2I3l I 578 I1416 33 Notice that root was split leading to increase in height 33 In this example we can avoid split by redistributing entries however this is usually not done in practice Inserting 8 into Example B Tree Entry to be inserted in parent node I Note that 5 is copied up and continues to appear in the leaf A 578 lo 0 0 Observe how minimum occupancy is guaranteed in both leaf and index pg splits Note difference between copy up and pushup 23 Entry to be inserted in parent node Note that 17 is pushed up and only appears once in the index Contrast be sure you this with a leaf split understand the reasons for this Deleting a Data Entry from a 8 Tree Start at root find leaf L where entry belongs Remove the entry If L is at least halffull done If L has only d1 entries 0 Try to redistribute borrowing from sibling ac acent node with same parent as L o If redistribution fails merge L and sibling If merge occurred must delete entry pointing to L or sibling from parent of L Merge could propagate to root decreasing height Example Tree including 8 Delete 19 and 20 IHIEIIIII A Deleting 19 is easy Example Tree including 8 Delete 19 and 20 Deleting 19 is easy Deleting 20 is done with redistribution Notice how middle key is copied up And Then Deleting 24 39 St e e39 u 30 II 0 Observe toss of index entry on right A A and pull down Of I22 I 27 I29 I I I33 I34 I38 I39 I index entry below RA 5 13 17 30 ra jrk II14I16 23 578 jraL erL I II22I27I29I II33I34I38I39I Example of Nonleaf Redistribution Tree is shown below during deletion of 24 What could be a possible initial tree 0 In contrast to previous example can redistribute entry from left child of root to right child m m 7 23 578 1416 quot1181 2 121l 221271291 331341381391 After Redistribution Intuitively entries are redistributed by pushing through the splitting entry in the parent node 0 It suffices to redistribute index entry with key 20 we ve redistributed 17 as well for illustration Root A m m m m m 23 578 1416 quot1181 2 21 221271291 33343839 Bulk Loading of a 3 Tree If we have a large collection of records and we want to create a 3 tree on some field doing so by repeatedly inserting records is very slow Also leads to minimal leaf utilization why 0 Bulk Loading can be done much more efficiently Initialization Sort all data entries insert pointer to first leaf page in a new root page Rok Sorted pages of data entries not yet in B tree 34 9l 1 1213 2 22 2331 3536 3841 44 Bulk Loading Contd Root Index entries for leaf pages always entered into rightmost index MM W page just above leaf level When this fills up it splits Split W Em Ea Em E may go up rightmost path to the root Much faster than repeated inserts especially when one considers locking Data entry pages not yet in B tree Data entry pages not yet in B tree Summary of Bulk Loading 0 Option 1 multiple inserts Slow Does not give sequential storage of leaves 0 Option 2 Bulk Loading Has advantages for concurrency control Fewer IOs during build Leaves will be stored sequentially and linked of course Can control fill factorquot on pages A Note on Order 0rderd concept replaced by physical space criterion in practice at least halffull Index pages can typically hold many more entries than leaf pages Summary Treestructured indexes are ideal for range searches also good for equality searches ISAM is a static structure Only leaf pages modified overflow pages needed Overflow chains can degrade performance unless size of data set and data distribution stay constant B tree is a dynamic structure Insertsdeletes leave tree heightbalanced log F N cost High fanout F means depth rarely more than 3 or 4 Almost always better than maintaining a sorted file Summary Contd Typically 67 occupancy on average Bulk loading can be much faster than repeated inserts for creating a 3 tree on a large data set 0 Most widely used index in database management systems because of its versatility One of the most optimized components of a DBMS
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'