Class Note for EECS 647 with Professor Huan at KU 4
Class Note for EECS 647 with Professor Huan at KU 4
Popular in Course
Popular in Department
This 29 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Kansas taught by a professor in Fall. Since its upload, it has received 14 views.
Reviews for Class Note for EECS 647 with Professor Huan at KU 4
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 02/06/15
EECS 647 Introduction to Database Systems Instructor Luke Huan Spring 2009 Today s Topic o How to locate data in a le fast 0 Go with a simple example rst a ISAM Indexed sequence access method 0 An overview of the indexing strategy 0 Treebased indexes o Btree 3312009 Luke Huan Univ of Kansas Basics 0 Given a value locate the records with this value equality search SELECT FROM R WHERE A value SELECT FROM R S WHERE RA SB o Other search criteria eg 0 Range search SELECT FROM R WHERE A gt value 0 Keyword search 3312009 Luke Huan Univ of Kansas TreeStructured Indexes Introduction o Treestructured indexing techniques support both range search and equalin search 0 ISAM 1ndexed equential Access Method 0 static structure early index technology Motivation for Index c Find all students with gpa gt 30 o If data le is sorted do binary search 0 Cost of binary search in a database can be quite high Why a Simple idea Create an indeX le Wm M Index File k1k2 39 P JO 1P1 K2 P2 ooo Kum 4 V J in Page 1 Page 2 Page 3 Page N Data File Can do binary search on smaller index le ISAM o What if an index is still too big 0 Put a another sparse index on top of that ISAM Index Sequential Access Method more or less Example look up 1 9 7 Index blocks 1 00 123 m 139 100 119 12 Data blocks 3312009 Luke Huan Univ of Kansas 6 Updates with ISAM Example insert 1 O 7 Example delete 12 9 00 123 m 139 Index blocks 1 100 10H 119 12 m ver ow block 0 Over ow chains and empty data blocks degrade performance Data blocks 0 Worst case most records go into one long chain 3312009 Luke Huan Univ of Kansas 7 A Note of Caution o ISAM is an oldfashioned idea 0 Btrees are usually better as we ll see 0 But ISAM is a good place to start to understand the idea of indexing o Upshot 0 Don t brag about being an ISAM expert on your resume a Do understand how they work and tradeoffs with B trees 3312009 Luke Huan Univ of Kansas Overview of Indexes o Dense vs sparse indexes 0 Primary vs secondary indexes o Clustered vs unclustered indexes o Treebased indexes vs hashbased indexes 3312009 Luke Huan Univ of Kansas Dense and sparse indexes o Dense one index entry for each search key value a The index entry include page id record id 0 Sparse one index entry for each block 0 Records must be clustered according to the search key a The index entry include only page id 3312009 Luke Huan Univ of Kansas 10 Dense versus sparse indexes 0 Index size a Sparse index is smaller 0 Requirement on records 0 Records must be clustered for sparse index 0 Lookup a Sparse index is smaller and may t in memory a Dense index can directly tell if a record exists 0 Update a Easier for sparse index 3312009 Luke Huan Univ of Kansas 11 Primary and secondary indexes 0 Primary index 9 Created for the primary key of a table 9 Can be sparse if we sort the le according to the primary key 0 Secondary index 0 Usually dense 0 SQL 0 PRIMARY KEY declaration automatically creates a primary index UNIQUE key automatically creates a secondary index 9 Additional secondary index can be created on nonkey attributes CREATE INDEX StudentGPAIndeX ON StudentGPA 3312009 Luke Huan Univ of Kansas 12 Clustered Indexes 0 An index is clustered if the entry order in the index le is almost the same as that in the data le otherwise it is unclustered 3312009 Luke Huan Univ of Kansas 13 Hashbased Indexes 0 Based on hash table o Organized as a set of ltkey valuegt tuples 0 Key is the attribute for indexing 0 Value is ltpage id record idgt l711lt39 kor 1 DLIclltel 1 A721 73 5 DLlclltel 2 A7101 r1 1 0 Lickel 3 A7217 7 2 Rourlel I Iill WLlckel 1 A7218 From Database System Concepts 5th Ed Silberschatz Korth and Sudarshan 3312009 Luke Huan Univ of Kansas 14 Example 0 We may have multiple different indexes per le a Eg Employee EID name age salary a Primary key is EID name is unique age and salary are nonkey attributes a We may or not sort the le by EID with a Btree index on EID primary name secondary and age secondary and a hash index on salary 3312009 Luke Huan Univ of Kansas 15 Recap 0 An treebased index structure built for the primary key of a le that is sorted according to the primary key PRIMARY SPARSE CLUSTERED Index File Data file 3312009 Luke Huan Univ of Kansas 16 Recap 0 An treebased index structure built for nonkey attribute of a le that is sorted according to the primary key SECONDARY DENSE UNCLUSTERED 9 Index File W Data me 3312009 Luke Huan Univ of Kansas Btree o A treebased index le could be primary or secondary could be sparse or dense could be clustered or unclustered o A hierarchy of intervals a Balanced more or less good performance guarantee Sample Btree nodes t0 keys 100 k degree 2 Nonleaf t0 keys to keys to keys to keys lOOklt120 120klt150 150klt180 18039k C 00 Leaf 120 quot to next leaf node in sequence to records with these k values or store records directly in leaves 3312009 Luke Huan Univ of Kansas 19 Btree balancing properties 0 Height constraint all leaves at the same lowest level 0 Fanout constraint all nodes at least half full except root Max Max Min Min pointers kevs active pointers kevs Nonleaf 2d1 2d d1 1 Root 2d1 2d 2 1 Leaf 2d 2d d d 3312009 Luke Huan Univ of Kansas 20 Lookups SELECT FROM R WHERE k 179 SELECT FROM R WHERE k 32 degree 2 I20 I50 I80 Not found IOO lOl llO I20 I30 150 156 17 om mmx I mm 3312009 Luke Huan Univ of Kansas Range query SELECT FROM R WHERE k gt 32 AND C lt 179 DOC L00kup32 o o o o oo 1 om mm mm 2 2 And follow nextleaf pointers 3312009 Luke Huan Univ of Kansas 22 Insertion 0 Insert a record with search key value 32 C C C E Look up Where the inserted key should go 39 O O O O C O O O 0 mm 000quot OOx I mm Lomh 000 And insert it right there 3312009 Luke Huan Univ of Kansas 23 Performance analysis o How many IO s are required for each operation h the height of the tree more or less Plus one or two to manipulate actual records Plus 0h for reorganization should be very rare if f is large Minus one if we cache the root in memory 0 How big is h 3312009 Roughly logfmout N Where N is the number of records Btree properties guarantee that fanout is least f 2 for all nonroot nodes Fanout is typically large in hundreds many keys and pointers can fit into one block A 4level Btree is enough for typical tables Luke Huan Univ of Kansas 24 Btree in practice 0 Complex reorganization for deletion often is not implemented eg Oracle InformiX 0 Leave nodes less than half full and periodically reorganize 0 Most commercial DBMS use Btree instead of hashing based indexes because Btree handles range queries 3312009 Luke Huan Univ of Kansas 25 The Halloween Problem 0 Story from the early days of System R UPDATE Payroll SET salary salary ll WHERE salary gt 100000 0 There is a Btree index on PayroleaZary o The update never stopped Why 0 Solutions 0 Scan index in reverse 0 Before update scan index to create a complete todo list 0 During update maintain a done list 0 Tag every row with transaction statement id 3312009 Luke Huan Univ of Kansas 26 Btree versus ISAM o ISAM is more static Btree is more dynamic o ISAM is more compact at least initially 0 Fewer levels and IO s than Btree o Overtime ISAM may not be balanced 0 Cannot provide guaranteed performance as Btree does 3312009 Luke Huan Univ of Kansas 27 Btree versus Btree o Btree Why not store records or record pointers in nonleaf nodes 0 These records can be accessed with fewer IO s 0 Problems 0 Storing more data in a node decreases fanout and increases h 0 Records in leaves require more IO s to access 0 Vast majority of the records live in leaves 3312009 Luke Huan Univ of Kansas 28 Smnmaw 0 Two types of queries 0 Keysearch o Rangequery o Btree operations 0 Search 0 Insert Sphtch d 0 Delete a Redistribution o Btree sorting 0 Next week with other diskbased sorting algorithms 3312009 Luke Huan Univ of Kansas 29
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'