Database Design & Implementation, CMPSCI 645
This 219-page set of class notes was uploaded by Roman McCullough on Friday, October 30, 2015. The notes belong to CMPSCI 645 at the University of Massachusetts, taught by Staff in Fall.
Evaluation of Relational Operations
CMPSCI 645, Mar 11, 2008. Slides courtesy of R. Ramakrishnan and J. Gehrke.

Relational Operations
We will consider how to implement:
- Selection (σ): selects a subset of rows from a relation.
- Projection (π): deletes unwanted columns from a relation.
- Join (⋈): allows us to combine two relations.
- Set-difference (−): tuples in relation 1, but not in relation 2.
- Union (∪): tuples in relation 1 or in relation 2.
- Aggregation (SUM, MIN, etc.) and GROUP BY.
- Order By: returns tuples in a specified order.
After we cover the operations, we will discuss how to optimize queries formed by composing them.

Outline
- Sorting
- Evaluation of joins
- Evaluation of other operations

Why Sort?
- A classic problem in computer science!
- An important utility in a DBMS:
  - Data requested in sorted order (e.g., ORDER BY): e.g., find students in increasing GPA order.
  - Sorting is useful for eliminating duplicates (e.g., SELECT DISTINCT).
  - The sort-merge join algorithm involves sorting.
  - Sorting is the first step in bulk loading a B+ tree index.
- Problem: sort 1 GB of data with 1 MB of RAM.

2-Way Sort: Requires 3 Buffers
- Pass 0: read a page, sort it, write it; only one buffer page is used.
- Passes 1, 2, etc.: three buffer pages are used (two input, one output).

Two-Way External Merge Sort
- Idea: divide and conquer — sort subfiles and merge them (Pass 0 makes 1-page runs, then 2-page runs, 4-page runs, 8-page runs, ...).
- Each pass, we read and write each page in the file. With N pages in the file, the number of passes is ⌈log2 N⌉ + 1, so the total cost is 2N(⌈log2 N⌉ + 1) I/Os.

General External Merge Sort
- More than 3 buffer pages: how can we utilize them? To sort a file with N pages using B buffer pages:
  - Pass 0: use B buffer pages; produce ⌈N/B⌉ sorted runs of B pages each.
  - Passes 1, 2, etc.: merge B−1 runs at a time.

Cost of External Merge Sort
- Number of passes: 1 + ⌈log_(B−1) ⌈N/B⌉⌉
- Cost = 2N × (number of passes)
- E.g., with 5 buffer pages, to sort a 108-page file:
  - Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each (last run is only 3 pages)
  - Pass 1: ⌈22/4⌉ = 6 sorted runs
of 20 pages each (last run is only 8 pages)
  - Pass 2: 2 sorted runs, 80 pages and 28 pages
  - Pass 3: sorted file of 108 pages

Number of Passes of External Sort
N             | B=3 | B=5 | B=9 | B=17 | B=129 | B=257
100           |  7  |  4  |  3  |  2   |  1    |  1
1,000         | 10  |  5  |  4  |  3   |  2    |  2
10,000        | 13  |  7  |  5  |  4   |  2    |  2
100,000       | 17  |  9  |  6  |  5   |  3    |  3
1,000,000     | 20  | 10  |  7  |  5   |  3    |  3
10,000,000    | 23  | 12  |  8  |  6   |  4    |  3
100,000,000   | 26  | 14  |  9  |  7   |  4    |  4
1,000,000,000 | 30  | 15  | 10  |  8   |  5    |  4

Replacement Sort
- Produces sorted runs that are as long as possible. (Buffers: input, 1 buffer; current set, B−2 buffers; output, 1 buffer.)
- Pick the tuple r in the current set with the smallest value that is ≥ the largest value in the output (e.g., 8 in the example).
- Fill the space in the current set by adding tuples from the input. Write the output buffer out if it is full, extending the current run.
- The current run terminates when every tuple in the current set is smaller than the largest tuple in the output.
- When used in Pass 0 for sorting, this can write out sorted runs of size 2B on average.

Blocked I/O for External Merge Sort
- Longer runs often mean fewer passes!
- Actually, we don't do I/O a page at a time; in fact, we read a block of pages sequentially.
- This suggests we should make each buffer (input/output) a block of pages — but this reduces fan-out during merge passes. In practice, most files are still sorted in 2–3 passes.

Number of Passes of Optimized Sort
N             | B=1,000 | B=5,000 | B=10,000
100           |   1     |   1     |   1
1,000         |   1     |   1     |   1
10,000        |   2     |   2     |   1
100,000       |   3     |   2     |   2
1,000,000     |   3     |   2     |   2
10,000,000    |   4     |   3     |   3
100,000,000   |   5     |   3     |   3
1,000,000,000 |   5     |   4     |   3
(Block size = 32.)

Double Buffering
- To reduce the wait time for an I/O request to complete, we can prefetch into a "shadow block." Potentially more passes; in practice, most files are still sorted in 2–3 passes. (b = block size; B main-memory buffers; k-way merge.)

Sorting Records
- Sorting has become highly competitive; parallel sorting is the name of the game.
- Datamation sort benchmark: sort 1M records of 100 bytes each.
  - In 1985: 15 minutes.
  - World record, 1998: 1.18 seconds, on 16 off-the-shelf PCs, each with 2 Pentium processors and two hard disks, running NT 4.0.
- New benchmarks proposed: Minute Sort (how many records can you sort in 1 minute?) and Dollar Sort
(how many records can you sort for $1.00?).

Using B+ Trees for Sorting
- Scenario: the table to be sorted has a B+ tree index on the sorting column(s).
- Idea: we can retrieve records in order by traversing the leaf pages.
- Is this a good idea? Cases to consider:
  - B+ tree is clustered: good idea!
  - B+ tree is not clustered: could be a very bad idea.

Clustered B+ Tree Used for Sorting
- Cost: traverse from the root to the leftmost leaf, then retrieve all leaf pages (Alternative 1). If Alternative 2 is used for data entries, there is the additional cost of retrieving the data records, but each page is fetched just once.
- Always better than external sorting!

Unclustered B+ Tree Used for Sorting
- Alternative 2 for data entries: each data entry contains the rid of a data record. In general, one I/O per data record!
- Worst case: p·N I/Os (p records per page, N pages in the file).

External Sorting vs. Unclustered Index
N          | Sorting    | p=1        | p=10        | p=100
100        | 200        | 100        | 1,000       | 10,000
1,000      | 2,000      | 1,000      | 10,000      | 100,000
10,000     | 40,000     | 10,000     | 100,000     | 1,000,000
100,000    | 600,000    | 100,000    | 1,000,000   | 10,000,000
1,000,000  | 8,000,000  | 1,000,000  | 10,000,000  | 100,000,000
10,000,000 | 80,000,000 | 10,000,000 | 100,000,000 | 1,000,000,000
(p = number of records per page; B = 1,000 and block size = 32 for sorting; p = 100 is the more realistic value.)

Summary
- External sorting is important; a DBMS may dedicate part of its buffer pool for sorting.
- External merge sort minimizes disk I/O cost:
  - Pass 0 produces sorted runs of size B (buffer pages).
  - Later passes merge runs; the number of runs merged at a time depends on B and the block size.
  - A larger block size means less I/O cost per page, but a smaller number of runs merged at a time.
  - In practice, the number of passes is rarely more than 2 or 3.
- A clustered B+ tree is good for sorting; an unclustered tree is usually very bad.

Outline
- Sorting
- Evaluation of joins
- Evaluation of other operations

Some Common Techniques
- Algorithms for evaluating relational operators use some simple ideas extensively:
  - Indexing: can use WHERE conditions to retrieve a small set of
tuples (selections, joins).
  - Iteration: sometimes it is faster to scan all tuples, even if there is an index. (And sometimes we can scan the data entries in an index instead of the table itself.)
  - Partitioning: by using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.
- Watch for these techniques as we discuss query evaluation!

Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: date, rname: string)
- Reserves: each tuple is 40 bytes long; 100 tuples per page; 1,000 pages.
- Sailors: each tuple is 50 bytes long; 80 tuples per page; 500 pages.

Equality Joins With One Join Column
SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid = S1.sid
- In algebra: R ⋈ S. A common relational operation!
  - R × S is large; R × S followed by a selection is inefficient.
  - Must be carefully optimized.
- Assume M pages in R with pR tuples per page, and N pages in S with pS tuples per page. In our examples, R is Reserves and S is Sailors.
- We will consider more complex join conditions later.
- Cost metric: number of I/Os. We will ignore output costs.

Simple Nested Loops Join
foreach tuple r in R do
  foreach tuple s in S do
    if r.i == s.j then add <r, s> to result
- For each tuple in the outer relation R, we scan the entire inner relation S.
- Cost: M + pR·M·N = 1,000 + 100·1,000·500 = 1,000 + 5·10^7 I/Os. Assuming each I/O takes 10 ms, the join will take about 140 hours!

Page-Oriented Nested Loops Join
- For each page of R, get each page of S, and write out matching pairs of tuples <r, s>, where r is in the R-page and s is in the S-page.
- Cost: M + M·N = 1,000 + 1,000·500 = 501,000 I/Os. Assuming each I/O takes 10 ms, the join will take about 1.4 hours.
- Choose the smaller relation as the outer: if the smaller relation S is the outer, cost = 500 + 500·1,000 = 500,500 I/Os.

Block Nested Loops Join
- Take the smaller relation, say R, as the outer and the other as the inner.
- Use one buffer for scanning the inner S, one buffer for output, and use all remaining buffers to hold a block of the outer R. For
each matching tuple r in the R-block and s in the S-page, add <r, s> to the result; then read the next page of S, until S is finished. Then read the next R-block and scan S again. (A hash table for the block of R, of up to B−2 pages, speeds up matching; one input buffer for S, one output buffer.)

Examples of Block Nested Loops
- Cost: scan of outer + (number of outer blocks) × scan of inner, where the number of outer blocks is ⌈pages of outer / block size⌉. Given available buffer size B, the block size is at most B−2, so the cost is M + N·⌈M/(B−2)⌉.
- With Sailors (S) as the outer and a block of 100 pages of S: the cost of scanning S is 500 I/Os, a total of 5 blocks. Per block of S, we scan Reserves: 5·1,000 I/Os. Total: 500 + 5·1,000 = 5,500 I/Os — a little over 1 minute.

Index Nested Loops Join
foreach tuple r in R do
  foreach tuple s in S where r.i == s.j do
    add <r, s> to result
- If there is an index on the join column of one relation (say S), we can make it the inner and exploit the index.
- Cost: M + pR·M × (cost of finding matching S tuples).
- For each R tuple, the cost of probing the S index is about 1.2 I/Os for a hash index, 2–4 for a B+ tree. The cost of then finding the S tuples (assuming Alternative 2 or 3 for data entries) depends on clustering: clustered index, 1 I/O typical; unclustered, up to 1 I/O per matching S tuple.

Examples of Index Nested Loops
- Hash index (Alt. 2) on sid of Sailors, as inner: scan Reserves (1,000 page I/Os, 100·1,000 tuples). For each Reserves tuple: 1.2 I/Os to get the data entry in the index, plus 1 I/O to get the (exactly one) matching Sailors tuple. Total: 1,000 + 100·1,000·2.2 = 221,000 I/Os.
- Hash index (Alt. 2) on sid of Reserves, as inner: scan Sailors (500 page I/Os, 80·500 tuples). For each Sailors tuple: 1.2 I/Os to find the index page with data entries, plus the cost of retrieving the matching Reserves tuples. Assuming a uniform distribution, 2.5 reservations per sailor (100,000 / 40,000); the cost of retrieving them is 1 or 2.5 I/Os depending on clustering. Total: 500 + 80·500·(1.2 + 1) = 88,500 I/Os up to 500 + 80·500·(1.2 + 2.5) = 148,500 I/Os.

Sort-Merge Join (R ⋈ S)
1. Sort R and S on the join column.
2. Merge them on the join column, and output result tuples.
- Merge: repeat until either R or S is finished:
  - Scanning: advance the scan of R until current R-tuple ≥ current S-tuple; advance the scan of S until
current S-tuple ≥ current R-tuple; do this until current R-tuple = current S-tuple.
  - Matching: now all R tuples with the same value in Ri (current R group) and all S tuples with the same value in Sj (current S group) match; output <r, s> for all pairs of such tuples.
- R is scanned once; each S group is scanned once per matching R tuple. (Multiple scans of an S group are likely to find the needed pages in the buffer.)

Example of Sort-Merge Join
Sailors:                          Reserves:
sid | sname  | rating | age       sid | bid | day      | rname
28  | yuppy  |  9     | 35.0      28  | 103 | 12/4/96  | guppy
31  | lubber |  8     | 55.5      28  | 103 | 11/3/96  | yuppy
44  | guppy  |  5     | 35.0      31  | 101 | 10/10/96 | dustin
58  | rusty  | 10     | 35.0      31  | 102 | 10/12/96 | lubber
                                  31  | 101 | 10/11/96 | lubber
                                  58  | 103 | 11/12/96 | dustin
- Cost: M log M + N log N + cost of merging. The cost of merging could be M·N in the worst case (very unlikely!); M+N is guaranteed in a foreign-key join (why?). As with sorting, log M and log N are small numbers, e.g., 3 or 4.
- With 35, 100, or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes; total join cost: 7,500 I/Os. (BNL cost: 2,500 to 15,000 I/Os.)

Hash Join
- Partitioning: partition both relations using a hash function h; R tuples in partition i will only match S tuples in partition i.
- Probing: read in partition i of R and build a hash table on Ri using h2 (≠ h); scan partition i of S and search for matches. (B main-memory buffers; the hash table for a partition of R takes k < B−1 pages.)

Observations on Hash Join
- The number of partitions is ≤ B−1, and the size of the largest partition must be ≤ B−2 to be held in memory. Assuming uniformly sized partitions, we get M/(B−1) ≤ B−2, i.e., B must be greater than about √M.
- Hash join works if the smaller relation satisfies the above.
- If we build an in-memory hash table to speed up the matching of tuples, a little more memory is needed.
- If the hash function h does not partition uniformly, one or more R partitions may not fit in memory. We can apply the hash-join technique recursively to join such an R-partition with the corresponding S-partition.

Cost
of Hash Join
- Partitioning reads and writes both relations: 2(M+N) I/Os. Probing reads both relations: M+N I/Os. The total is 3(M+N).
- In our running example: a total of 4,500 I/Os using hash join — less than 1 minute, compared to 140 hours with simple NLJ!
- Sort-Merge Join vs. Hash Join:
  - Given a minimum amount of memory, both have a cost of 3(M+N) I/Os. Hash join is superior on this count if the relation sizes differ greatly. (Assuming M < N: what if √M < B < √N?) Also, hash join has been shown to be highly parallelizable.
  - Sort-merge is less sensitive to data skew, and its result is sorted.

General Join Conditions
- Equalities over several attributes (e.g., R.sid = S.sid AND R.rname = S.sname):
  - For Index NL, build an index on <sid, sname> (with S as the inner), or use existing indexes on sid or sname and check the other join condition on the fly.
  - For Sort-Merge and Hash Join, sort/partition on the combination of the two join columns.
- Inequality conditions (e.g., R.rname < S.sname):
  - For Index NL, we need a B+ tree index. Range probes on the inner; the number of matches is likely to be much higher than for equality joins, so a clustered index is much preferred.
  - Hash Join and Sort-Merge Join are not applicable!
  - Block NL is quite likely to be the winner here.

Outline
- Sorting
- Evaluation of joins
- Evaluation of other operations

Using an Index for Selections
- Cost depends on the number of qualifying tuples and on clustering.
  - The cost of finding qualifying data entries is typically small; the cost of retrieving the records could be large without clustering.
  - Consider a selection of the form gpa > 3.0, and assume 10% of tuples qualify (100 pages, 10,000 tuples). With a clustered index, the cost is little more than 100 I/Os; if unclustered, up to 10,000 I/Os!
- Important refinement for unclustered indexes:
  1. Find the qualifying data entries.
  2. Sort the rids of the data records to be retrieved.
  3. Fetch the rids in order.

Two Approaches to General Selections
- First approach: (1) find the most selective access path, retrieve tuples using it, and (2) apply any remaining terms that don't match the index on the fly.
  - Most selective access path: an index or file scan that we estimate will require the
fewest page I/Os. Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect the number of tuples/pages fetched.
  - Consider day < 8/9/94 AND bid = 5 AND sid = 3. A B+ tree index on day can be used; then bid = 5 and sid = 3 must be checked for each retrieved tuple. Alternatively, a hash index on <bid, sid> could be used; day < 8/9/94 must then be checked on the fly.

Intersection of Rids
- Second approach (if we have 2 or more matching indexes that use Alternative 2 or 3 for data entries):
  - Get sets of rids of data records using each matching index.
  - Then intersect these sets of rids.
  - Retrieve the records and apply any remaining terms.
- Consider day < 8/9/94 AND bid = 5 AND sid = 3. If we have a B+ tree index on day and an index on sid, both using Alternative 2, we can retrieve the rids of records satisfying day < 8/9/94 using the first, and the rids of records satisfying sid = 3 using the second; intersect these rids, retrieve the records, and check bid = 5.

The Projection Operation
SELECT DISTINCT R.sid, R.bid FROM Reserves R
- Projection consists of two steps:
  - Remove unwanted attributes (i.e., those not specified in the projection).
  - Eliminate any duplicate tuples that are produced.
- Algorithms (single relation): sorting- and hashing-based, on all remaining attributes.

Projection Based on Sorting
- Modify Pass 0 of external sort to eliminate unwanted fields. Thus, runs of about 2B pages are produced, but the tuples in the runs are smaller than the input tuples. (The size ratio depends on the number and size of the fields that are dropped.)
- Modify the merging passes to eliminate duplicates. Thus, the number of result tuples is smaller than the input. (The difference depends on the number of duplicates.)
- Cost: in Pass 0, read the original relation (size M) and write out the same number of smaller tuples. In the merging passes, fewer tuples are written out in each pass. Using the Reserves example: 1,000 input pages are reduced to 250 in Pass 0 if the size ratio is 0.25.

Projection Based on Hashing
- Partitioning phase: read R using one input buffer. For each tuple, discard the unwanted fields and apply hash
function h1 to choose one of B−1 output buffers. The result is B−1 partitions of tuples with no unwanted fields; two tuples from different partitions are guaranteed to be distinct.
- Duplicate-elimination phase: for each partition, read it and build an in-memory hash table, using hash function h2 (≠ h1) on all fields, while discarding duplicates. If a partition does not fit in memory, we can apply the hash-based projection algorithm recursively to that partition.

Discussion of Projection
- The sort-based approach is the standard: better handling of skew, and the result is sorted.
- If an index on the relation contains all wanted attributes in its search key, we can do an index-only scan: apply the projection techniques to the data entries (much smaller!).
- If an ordered (i.e., tree) index contains all wanted attributes as a prefix of the search key, we can do even better: retrieve the data entries in order (index-only scan), discard unwanted fields, and compare adjacent tuples to check for duplicates.

Set Operations
- Intersection and cross-product are special cases of join. (Intersection: equality on all fields.)
- Union (Distinct) and Except are similar; we'll do union.
- Sorting-based approach to union: sort both relations on the combination of all attributes; scan the sorted relations and merge them, removing duplicates.
- Hash-based approach to union: partition R and S using hash function h. For each S-partition, build an in-memory hash table using h2. Scan the corresponding R-partition; for each tuple, probe the hash table — if the tuple is in the hash table, discard it, otherwise add it to the hash table.

Aggregate Operations (AVG, MIN, etc.)
- Without grouping: in general, requires scanning the relation. Given an index whose search key includes all attributes in the SELECT or WHERE clauses, we can do an index-only scan.
- With grouping (GROUP BY): sort on the group-by attributes, then scan the relation and compute the aggregate for each group. (We can improve upon this by combining sorting and aggregate computation.) There is a similar approach based on hashing on the group-by attributes.
- Given a tree index whose search key includes all attributes in the SELECT, WHERE,
and GROUP BY clauses, we can do an index-only scan; if the group-by attributes form a prefix of the search key, we can retrieve the data entries/tuples in group-by order.

Summary
- A virtue of relational DBMSs: queries are composed of a few basic operators, and the implementation of these operators can be carefully tuned.
- Algorithms for evaluating relational operators use some simple ideas extensively:
  - Indexing: can use WHERE conditions to retrieve a small set of tuples (selections, joins).
  - Iteration: sometimes it is faster to scan all tuples, even if there is an index. (And sometimes we can scan the data entries in an index instead of the table itself.)
  - Partitioning: by using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.

Summary (cont'd)
- There are many implementation techniques for each operator; no universally superior technique exists for most operators.
- We must consider the available alternatives for each operation in a query and choose the best one based on system state (e.g., memory) and statistics (table size, tuples matching value k).
- This is part of the broader task of optimizing a query composed of several operations.

Parallel DBMS
CMPSCI 645. Slide content due to Ramakrishnan, Gehrke, Hellerstein, Gray. Figures taken from DeWitt and Gray, "Parallel Database Systems: The Future of High Performance Database Systems," CACM 1992.

Parallel vs. Distributed DBs
- Parallel database systems: improve performance through parallelizing various operations (loading data, indexing, query evaluation). Data may be distributed, but purely for performance reasons.
- Distributed database systems: data is physically stored across various sites, each of which runs a DBMS and can function independently. Data distribution is determined by local ownership and availability, in addition to performance.

Why Parallel Access To Data?
- At 10 MB/s, it takes about 1.2 days to scan a 1 TB relation; with 1,000-way parallelism, about 1.5 minutes.
- Parallelism: divide a big problem into many smaller ones to be solved in parallel.

DBMS: The Success Story
- DBMSs are
the most (only?) successful application of parallelism. (Teradata, Tandem vs. Thinking Machines, KSR; every major DBMS vendor has some parallel server.)
- Reasons for success:
  - Bulk processing (partition parallelism).
  - Natural pipelining.
  - Inexpensive hardware can do the trick.
  - Users and app-programmers don't need to think in parallel.

Some Terminology
- Speed-up: more resources means proportionally less time for a given amount of data (problem size constant, system grows). Measured as transactions/sec (throughput) vs. degree of parallelism; ideal is linear.
- Scale-up: if resources are increased in proportion to the increase in data size, time is constant (problem size and system both grow). Measured as transaction response time vs. degree of parallelism; ideal is flat.

Enemies of Good Speed-up/Scale-up
- Start-up: if thousands of processes must be started, this can dominate the actual computation time.
- Interference: the slowdown each new process imposes on all others when accessing shared resources.
- Skew: variance in the size of the jobs for each process. The service time for the whole job is the service time of the slowest step of the job.

Architecture Issue: Shared What?
- Shared memory: all processors share a common global memory and access to all disks.
- Shared disk: all processors have private memory, but direct access to all disks.
- Shared nothing: each memory/disk is owned by a processor, which acts as a server for that data.
- Shared-nothing advantages: minimize interference by minimizing shared resources; exploit commodity processors and memory; disk and memory accesses are local; traffic on the interconnection network is minimized.

Different Types of DBMS Parallelism
- Intra-operator parallelism: get all
machines working to compute a given operation (scan, sort, join).
- Inter-operator parallelism: each operator may run concurrently on a different site (exploits pipelining).
- Inter-query parallelism: different queries run on different sites.
- We'll focus on intra-operator parallelism.

Limits of Pipelined Parallelism in a DBMS
- Relational pipelines are usually not very long.
- Some relational operators block (e.g., sorting, aggregation).
- The execution cost of one operator may be much higher than another's (an example of skew).
- As a result, partitioned parallelism is the key to achieving speed-up and scale-up.

Automatic Data Partitioning
- Ways of partitioning a table: round-robin (good to spread load), hash partitioning (good for equijoins), range partitioning (good for equijoins, range queries, group-by).
- Shared disk and shared memory are less sensitive to partitioning; shared nothing benefits from "good" partitioning.

Parallel Query Processing
- Example (figure): two relational scans consume input relations A and B and feed their outputs to a join operator, which in turn produces a data stream C. Split each A-scan output into 3 streams; merge the 3 join input streams at each join node; each join node performs 1/3 of the join; likewise split each B-scan output into 3 streams and merge the 3 input streams at each join node.

Parallel Scans
- Scan in parallel, and merge.
- A selection may not require all sites, for range or hash partitioning.
- Indexes can be built at each partition.

Parallel Hash Join
- In the first phase, partitions get distributed to different sites: a good hash function automatically distributes work evenly!
- Do the second phase at each site.
- Almost always the winner for equi-join.

Dataflow Network for Join
- (Figure: split and merge operators connect the parallel scans of A and B to the parallel join nodes.)
- Good use of split and merge makes it easier to build parallel versions of
sequential join code.

Complex Parallel Query Plans
- Complex queries: inter-operator parallelism, with pipelining between operators. (Note that sort and phase 1 of hash join block the pipeline.)
- Bushy trees.

Parallel Query Optimization Issues
- Cost estimation in a parallel environment.
- Consider bushy plans: a much larger plan space.
- Some parameters are only known at run-time: the number of free processors, the available buffer space.

Sequential vs. Parallel Optimization
- Best serial plan ≠ best parallel plan!
- Trivial counterexample: a table is partitioned, with a local secondary index, at two nodes. A range query covers all of node 1's partition and 1% of node 2's. Node 1 should do a scan of its partition; node 2 should use the secondary index. (SELECT * FROM telephone_book WHERE name < "NoGood")

Parallel DBMS: Summary
- Parallelism is natural to query processing: both pipeline and partition parallelism.
- Shared-nothing vs. shared-memory (shared-disk too, but less standard): shared-memory is easy but costly and doesn't scale up; shared-nothing is cheap and scales well, but is harder to implement.
- Intra-op, inter-op, and inter-query parallelism are all possible.

Parallel DBMS Summary (cont'd)
- Data layout choices are important!
- Most DB operations can be done partition-parallel: sort, sort-merge join, hash join.
- Complex plans allow for pipeline parallelism, but sorts and hashes block the pipeline; partition parallelism is achieved via bushy trees.

Parallel DBMS Summary (cont'd)
- The hardest part of the equation: optimization. Two-phase optimization is simplest, but can be ineffective; more complex schemes are still at the research stage.
- We haven't said anything about transactions and logging: easy in a shared-memory architecture; takes some care in shared-nothing.

Concurrency Control
CMPSCI 645, Apr 3, 2008. Slide content adapted from Ramakrishnan & Gehrke, Zack Ives.

Review: the ACID Properties
- Particularly important: ensuring the ACID properties.
  - Atomicity: each transaction looks atomic to the user.
  - Consistency: each transaction, in isolation, keeps the database in a consistent state (this is the responsibility of the user).
  - Isolation: we should be able to understand what's going on by considering each separate transaction independently.
  - 
Durability: updates stay in the DBMS!

Review: Properties of Schedules
- Serializable schedule: a schedule that is equivalent to some serial execution of the transactions (conflict-serializability, view-serializability).
- Recoverable schedule: if Tj reads data written by Ti, then Ti commits before Tj commits.
- Cascadeless schedule: if Tj reads data written by Ti, then Ti commits before the read operation of Tj.

Today
- Enforcing desirable schedules:
  - Lock-based: Strict 2PL, 2PL; phantoms; index locking.
  - Weak consistency in SQL.

Lock-Based Concurrency Control
- The DBMS must ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
- A lock is associated with some object; it is shared or exclusive.
- A locking protocol: a set of rules to be followed by each transaction to ensure good properties.

Lock Compatibility Matrix
Locks on a data item are granted based on a lock compatibility matrix:

Request mode \ Mode of data item | None | Shared | Exclusive
Shared                           |  Y   |   Y    |    N
Exclusive                        |  Y   |   N    |    N

When a transaction requests a lock, it must wait (block) until the lock is granted.
Example of a transaction performing locking:
T1: lockX(A), R(A), W(A), unlock(A), lockS(B), R(B), unlock(B)

Two-Phase Locking (2PL)
- Two-Phase Locking protocol:
  - Each transaction must obtain an S (shared) lock on an object before reading it, and an X (exclusive) lock on an object before writing it.
  - A transaction cannot request additional locks once it releases any locks: a growing phase, then a shrinking phase.

Strict Two-Phase Locking (Strict 2PL)
- Strict 2PL protocol: as in 2PL (S lock before reading, X lock before writing, growing then shrinking phase), but all X (exclusive) locks acquired by a transaction must be held until commit.
- Example of a schedule not admissible under 2PL: T2: R(A), W(A), R(B), W(B), Commit.

Lock-Based Protocols
- 2PL ensures conflict serializability.
  - Transactions can be ordered by the end of their growing phase (called the lock
point).
  - A 2PL schedule is equivalent to the serial schedule in which transactions are ordered by lock-point order.
- Strict 2PL ensures conflict-serializable and cascadeless schedules: writers hold an X lock until they commit. (Example of a schedule following Strict 2PL: T1: S(A), R(A), ...)

Lock Management
- Lock and unlock requests are handled by the lock manager.
- A lock table entry for an object records: the number of transactions currently holding a lock; the type of lock held (shared or exclusive); a pointer to a queue of lock requests.
- Locking and unlocking have to be atomic operations.
- Lock upgrade: a transaction that holds a shared lock can be upgraded to hold an exclusive lock.

Deadlocks
- Deadlock: a cycle of transactions waiting for locks to be released by each other.
- Deadlocks tend to be rare in practice.
- Two ways of dealing with deadlocks: deadlock prevention and deadlock detection. (Figure: a lock table with granted and queued requests forming a deadlock.) Deadlock must be prevented or avoided.

Deadlock Detection
- Create a waits-for graph: nodes are transactions; there is an edge from Ti to Tj if Ti is waiting for Tj to release a lock. Add the edge when queueing a lock request; remove the edge when granting the request.
- Periodically check for cycles in the waits-for graph.
- (Example with T1, T2, T3, T4: T1: S(A), R(A), S(B), ...)

Deadlock Prevention
- Assign priorities based on timestamps. Assume Ti wants a lock that Tj holds. Two policies are possible:
  - Wait-Die: if Ti has higher priority, Ti waits for Tj; otherwise Ti aborts.
  - Wound-Wait: if Ti has higher priority, Tj aborts; otherwise Ti waits.
- If a transaction restarts, make sure it keeps its original timestamp.

Performance of Locking
- Lock-based schemes resolve conflicting schedules by blocking and aborting. In practice there are few deadlocks and relatively few aborts; most of the penalty comes from blocking.
- To increase throughput: lock the smallest objects possible; reduce the time locks are held; reduce hot spots.

What Should We Lock?
T1: SELECT S.rating, MIN(S.age) FROM Sailors S WHERE S.rating = 8
T2: INSERT INTO Sailors (name, rating, age) VALUES ('Joe', 8, 33)
- Option 1: T1 takes an S lock on Sailors; T2 takes an X lock on
Sailors.
- Option 2: T1 takes S locks on all rows with rating = 8; T2 takes an X lock on Joe's tuple.

Phantom Example
- T1: "Find the oldest sailor for each of the rating levels 1 and 2."
  - T1 locks all pages containing sailor records with rating = 1, and finds the oldest sailor (say, age = 71).
- T2 inserts a new sailor (rating = 1, age = 96).
- T2 deletes the oldest sailor with rating = 2 (say, age = 80) and commits.
- T1 now locks all pages containing sailor records with rating = 2, and finds the oldest (say, age = 63).

The Problem
- T1 implicitly assumes that it has locked the set of all sailor records with rating = 1.
  - The assumption only holds if no sailor records are added while T1 is executing.
  - We need some mechanism to enforce this assumption: index locking and predicate locking.
- The example shows that conflict serializability guarantees serializability only if the set of objects is fixed! Strict 2PL alone will not assure serializability.

The Phantom Problem
- Phantom problem: a transaction retrieves a collection of tuples and sees different results, even though it did not modify the tuples itself.
  - Conceptually, we must lock all possible rows.
  - We can lock the entire table — or, better, use index locking.

Index Locking
- If there is an index on the rating field (using Alternative 2), T1 should lock the index page containing the data entries with rating = 1. If there are no records with rating = 1, T1 must lock the index page where such a data entry would be, if it existed!
- If there is no suitable index, T1 must lock all pages, and lock the file/table to prevent new pages from being added, to ensure that no new records with rating = 1 are added.

Predicate Locking
- Grant a lock on all records that satisfy some logical predicate, e.g., age > 2 × salary.
- Index locking is a special case of predicate locking, for which an index supports efficient implementation of the predicate lock.
- In general, predicate locking has a lot of locking overhead.

Locking in B+ Trees
- How can we efficiently lock a particular leaf node?
- One solution: ignore the tree structure; just lock pages while traversing the tree,
- This has terrible performance: the root node and many higher-level nodes become bottlenecks, because every tree access begins at the root.

Two Useful Observations
- Higher levels of the tree only direct searches for leaf pages.
- For inserts, a node on a path from root to modified leaf must be locked (in X mode, of course) only if a split can propagate up to it from the modified leaf. A similar point holds with respect to deletes.
- We can exploit these observations to design efficient locking protocols that guarantee serializability even though they violate 2PL.

A Simple Tree Locking Algorithm
- Search: start at the root and go down; repeatedly S-lock the child, then unlock the parent.
- Insert/Delete: start at the root and go down, obtaining X locks as needed. Once the child is locked, check if it is safe:
  - if the child is safe, release all locks on ancestors.
- Safe node: a node such that changes will not propagate up beyond this node.
  - Inserts: the node is not full.
  - Deletes: the node is not half-empty.

Example: (1) search for 38*; (2) insert 45*.

Transaction support in SQL
- A transaction is automatically started for SELECT, UPDATE, CREATE, ...
- A transaction ends with COMMIT or ROLLBACK (abort).
- SQL:1999 supports SAVEPOINTs, which are simple nested transactions.

Specifying isolation level
- General rules of thumb w.r.t. isolation:
  - Fully serializable isolation is more expensive than no isolation; we can't do as many things concurrently, or we have to undo them frequently.
  - For performance, we generally want to specify the most relaxed isolation level that's acceptable. Note that we're slightly violating a correctness constraint to get performance!

Specifying isolation level in SQL:
  SET TRANSACTION [READ WRITE | READ ONLY]
    ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ
                      | READ COMMITTED | READ UNCOMMITTED }
(listed from most to least isolation). The default isolation level is SERIALIZABLE: it locks sets of objects and avoids phantoms.
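Stepping back to the simple tree-locking algorithm above: the search protocol ("S-lock the child, then unlock the parent", often called lock crabbing) can be sketched as below. The Node class and its in-memory lock are hypothetical stand-ins for pages and latches; a real B+ tree manages pages through the buffer manager.

```python
# Lock crabbing for B+-tree search: hold at most two locks at a time,
# acquiring the child's lock before releasing the parent's.
import threading

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list => leaf node
        self.lock = threading.RLock()    # stand-in for a shared latch

def search(root, key):
    root.lock.acquire()
    node = root
    while node.children:
        # choose the child subtree that may contain `key`
        i = sum(1 for k in node.keys if key >= k)
        child = node.children[i]
        child.lock.acquire()   # lock the child first...
        node.lock.release()    # ...then unlock the parent (crabbing)
        node = child
    found = key in node.keys   # at a leaf: check the data entries
    node.lock.release()
    return found
```

Because the parent is released as soon as the child is locked, the root is a bottleneck only briefly, unlike naive 2PL over the whole traversal.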
REPEATABLE READ
- T reads only changes made by committed transactions.
- No value read or written by T is changed by another transaction until T completes.
- Phantoms are possible: inserts of qualifying tuples are not avoided.
- Locks only individual objects.

READ COMMITTED
- T reads only changes made by committed transactions.
- No value written by T is changed by another transaction until T completes.
- A value read by T may be modified while T is in progress.
- Phantoms are possible.
- X locks on written objects are held to the end; S locks on read objects are released immediately.

READ UNCOMMITTED
- Greatest exposure to other transactions.
- Dirty reads possible.
- Can't make changes: must be READ ONLY.
- Does not obtain shared locks before reading; thus, no locks are ever requested.

Acceptable Dirty Read
If we are just checking availability of an airline seat, a dirty read might be fine. Why is that?
Reservation transaction:
  EXEC SQL select occupied into :occ from Flights
    where Num = 123 and date = '11/03/99' and seat = '23f';
  if (!occ)
    EXEC SQL update Flights set occupied = true
      where Num = 123 and date = '11/03/99' and seat = '23f';
  else notify user that the seat is unavailable;

Real systems
- IBM DB2, Informix, Microsoft SQL Server, and Sybase all use Strict 2PL or variants.
- Oracle uses multiversion concurrency control (we didn't cover this).
- All deal with deadlocks using a waits-for graph.

Summary
- There are several lock-based concurrency control schemes: Strict 2PL, 2PL. Conflicts between transactions can be detected in the dependency graph.
- The lock manager keeps track of the locks issued. Deadlocks can either be prevented or detected.
- Naive locking strategies may have the phantom problem.

Summary (Cont'd)
- Index locking is common and affects performance significantly.
  - Needed when accessing records via an index.
  - Needed for locking logical sets of records (index locking, predicate locking).
- Tree-structured indexes:
  - Straightforward use of 2PL is very inefficient.
  - In practice, better techniques are now known; they do record-level rather than page-level locking.

Overview of Storage and Indexing
CMPSCI 645, Feb 28, 2008
Slides courtesy of R. Ramakrishnan and Gehrke

DBMS Architecture
- Query Parser, Query Rewriter, Query Optimizer, Query Executor; Lock Manager; File & Access Methods; Log Manager; Buffer Manager; Disk Space Manager.
Data on External Storage
- Disks: can retrieve a random page at fixed cost, but reading several consecutive pages is much cheaper than reading them in random order.
- Tapes: can only read pages in sequence; cheaper than disks, used for archival storage.
- Page: the unit of information read from or written to disk. The size of a page is a DBMS parameter, e.g., 4KB or 8KB.
- Disk space manager:
  - abstraction: a collection of pages;
  - allocate/deallocate a page; read/write a page.
- Page I/O: pages read from disk and pages written to disk are the dominant cost of database operations.

Buffer Management
- Architecture: data is read into memory for processing, and written to disk for persistent storage.
- The buffer manager stages pages between external storage and main memory (the buffer pool).
- The access method layer makes calls to the buffer manager.

Access Methods
- Access methods: routines to manage various disk-based data structures:
  - files of records;
  - various kinds of indexes.
- File of records:
  - an important abstraction of external storage in a DBMS;
  - a record id (rid) is sufficient to physically locate a record.
- Indexes:
  - auxiliary data structures;
  - given values in index search key fields, find the record ids of records with those values.

File organizations & access methods
Many alternatives exist, each ideal for some situations and not so good in others:
- Heap (unordered) files: suitable when typical access is a file scan retrieving all records.
- Sorted files: best if records must be retrieved in some order, or if only a range of records is needed.
- Indexes: data structures to organize records via trees or hashing. Like sorted files, they speed up searches for a subset of records, based on values in certain search key fields. Updates are much faster than in sorted files.

Indexes
- An index on a file speeds up selections on the search key fields for the index. Any subset of the fields of a relation can be the search key for an index on the relation.
The search key is not the same as a key (a minimal set of fields that uniquely identify a record in a relation).
- An index contains a collection of data entries and supports efficient retrieval of all data entries k* with a given key value k.
  - Data entry versus data record: given data entry k*, we can find the record with key k in at most one disk I/O. (Details soon.)

B+ Tree Indexes
- Non-leaf pages contain index entries only, used to direct searches:
  index entry = <P0, K1, P1, K2, P2, ..., Km, Pm>.
- Leaf pages contain data entries, sorted by search key, and are chained (prev & next).

Example B+ Tree
- Leaf level contains data entries in sorted order; note how they are sorted.
- Equality selection: find 28*, 29*. Range selection: find all > 15* and < 30*.
- Insert/delete: find the data entry in a leaf, then change it. The parent sometimes needs to be adjusted, and the change sometimes bubbles up the tree.

Hash-Based Indexes
- Good for equality selections.
- The index is a collection of buckets.
  - Bucket = primary page plus zero or more overflow pages.
  - Buckets contain data entries.
- Hashing function h: h(k) = bucket of data entries with search key value k.
  - No need for "index entries" in this scheme.

Alternatives for Data Entry k* in Index
- In a data entry k* we can store:
  - Alt 1: the data record with key value k;
  - Alt 2: <k, rid of data record with search key value k>;
  - Alt 3: <k, list of rids of data records with search key k>.
- The choice of alternative for data entries is orthogonal to the indexing technique used to locate data entries with a key value k.
  - Indexing techniques: B+ tree index, hash index.
  - Typically, indexes contain auxiliary information that directs searches to the desired data entries.

Alternatives for Data Entries (Cont'd)
- Alternative 1: the index structure is a file organization for data records (instead of a heap file or sorted file). At most one index on a given collection of data records can use Alternative 1.
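A tiny illustration of Alternative 2 from the slides above: the index stores <key, rid> data entries, and the rid is then used to fetch the record from the heap file in one more page access. Everything here (the dict-based "heap file", the (page, slot) rids, the values) is made up for illustration.

```python
# Heap file: rid -> data record, where a rid is a (page, slot) pair
# standing in for a physical address on disk.
heap_file = {
    (1, 0): {"sid": 22, "sname": "dustin", "age": 45.0},
    (1, 1): {"sid": 31, "sname": "lubber", "age": 55.5},
    (2, 0): {"sid": 58, "sname": "rusty",  "age": 35.0},
}

# Unclustered index on sid, Alternative 2: data entry = <key, rid>.
index = {rec["sid"]: rid for rid, rec in heap_file.items()}

def lookup(sid):
    rid = index.get(sid)        # find the data entry k*
    if rid is None:
        return None
    return heap_file[rid]       # one more page fetch to get the record
```

Alternative 3 would map each key to a *list* of rids instead, which is more compact with duplicates but makes data entries variable-sized.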
Otherwise, data records are duplicated, leading to redundant storage and potential inconsistency. If data records are very large, the number of pages containing data entries is high; this implies that the size of the auxiliary information in the index is also large (e.g., in a B+ tree).

Alternatives for Data Entries (Cont'd)
- Alternatives 2 and 3: data entries with search keys and rids are typically much smaller than data records.
  - The index structure used to direct the search, which depends on the size of data entries, is much smaller than with Alternative 1.
  - So these are better than Alternative 1 with large data records, especially if search keys are small.
  - Alternative 3 is more compact than Alternative 2, but it leads to variable-sized data entries, even if search keys are of fixed length. Variable-sized records/data entries are hard to manage.

Index Classification
- Key: primary key, candidate key.
- Primary index vs. secondary index:
  - if the search key contains the primary key, the index is called a primary index;
  - other indexes are called secondary indexes.
- Unique index: the search key contains a candidate key; no two data entries can have the same value.

Index Classification (Cont'd)
- Clustered vs. unclustered: if the order of data records is the same as, or close to, the order of data entries, it is a clustered index.
  - Alternative 1 implies clustered; Alternatives 2 and 3 are clustered only if data records are sorted on the search key field.
  - A file can be clustered on at most one search key.
  - The cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!

Clustered vs. Unclustered Index
- Suppose that Alternative 2 is used for data entries, and that the data records are stored in a heap file.
  - To build a clustered index, first sort the heap file, leaving some free space on each page for future inserts.
  - Overflow pages may be needed for inserts; thus, the order of data records is close to, but not identical to, the sort order.
(Figure: clustered vs. unclustered index; in both, index entries direct the search for data entries, which point to the data records.)
Cost Model for Our Analysis
We ignore CPU costs, for simplicity:
- B: the number of data pages
- R: number of records per page
- D: average time to read or write a disk page
- Measuring the number of page I/Os ignores the gains of prefetching a sequence of pages; thus, even I/O cost is only approximated.
- Average-case analysis, based on several simplistic assumptions. Good enough to show the overall trends!

Comparing File Organizations
- Heap files (random order; insert at eof)
- Sorted files, sorted on <age, sal>
- Clustered B+ tree file, Alternative 1, search key <age, sal>
- Heap file with unclustered B+ tree index on search key <age, sal>
- Heap file with unclustered hash index on search key <age, sal>

Operations to Compare
- Scan: fetch all records from disk
- Equality search
- Range selection
- Insert a record
- Delete a record

Assumptions in Our Analysis
- Heap files: equality selection on key; exactly one match.
- Sorted files: files compacted after deletions.
- Indexes:
  - Alt 2, 3: data entry size = 10% of record size.
  - Hash: no overflow chains; 80% page occupancy, so file size = 1.25 x data size.
  - B+ tree: 67% occupancy (typical), which implies file size = 1.5 x data size; balanced, with fanout F = 133 (typical) at each non-leaf level.

Assumptions (Cont'd)
- Scans:
  - leaf levels of a tree index are chained;
  - index data entries plus the actual file are scanned for unclustered indexes.
- Range searches:
  - we use tree indexes to restrict the set of data records fetched, but ignore hash indexes.

Scan cost for unclustered indexes (one I/O per data entry):
- Unclustered B+ tree: # leaf pages = B x 0.1 / 0.67 = 0.15B, so index I/O = 0.15B * D; # data entries per leaf page = R x 10 x 0.67 = 6.7R, so data I/O = 0.15B x 6.7R x D = B * R * D.
- Unclustered hash index: # data entry pages = B x 0.1 / 0.8 = 0.125B, so index I/O = 0.125B * D; # data entries per hash page = R x 10 x 0.8 = 8R, so data I/O = 0.125B x 8R x D = B * R * D.

Cost of Operations
- Heap file: Scan BD; Equality 0.5 BD; Range BD; Insert 2D; Delete Search + D.
- Sorted file: Scan BD; Equality D log2(B); Range D(log2(B) + # matching pages); Insert Search + BD; Delete Search + BD.
- Clustered tree index: Scan 1.5 BD; Equality D logF(1.5B); Range D(logF(1.5B) + # matching pages); Insert Search + D; Delete Search + D.
- Unclustered tree index: Scan BD(R + 0.15); Equality D(1 + logF(0.15B)); Range D(logF(0.15B) + # matching recs); Insert Search + 2D; Delete Search + 2D.
- Unclustered hash index: Scan BD(R + 0.125); Equality 2D; Range BD; Insert 4D; Delete Search + 2D.
- Several assumptions underlie these rough estimates!

SQL
CMPSCI 645

SQL Overview
- SQL preliminaries
- Query capabilities: SELECT-FROM-WHERE blocks; basic features (ordering, duplicates); set ops (union, intersect, except); aggregation & grouping; nested queries; correlation; null values
- Integrity constraints
- Modify the database; views
- Review in the textbook: Ch. 5

The SQL Query Language
- Structured Query Language, developed by IBM (System R) in the 1970s.
- Need for a standard, since it is used by many vendors.
- Evolving standard: SQL-86, SQL-89 (minor revision), SQL-92 (major revision), SQL:1999 (major extensions), SQL:2003 (minor revisions).

Two parts of SQL
- Data Definition Language (DDL): create/alter/delete tables and their attributes; establish and modify the schema.
- Data Manipulation Language (DML): query and modify the database instance.

Creating Relations in SQL
Creates the Student relation. Observe that the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified:
  CREATE TABLE Student (
    sid CHAR(20),
    name CHAR(20),
    login CHAR(10),
    age INTEGER,
    gpa REAL)
As another example, the Takes table holds information about courses that students take:
  CREATE TABLE Takes (
    sid CHAR(20),
    cid CHAR(20),
    grade CHAR(2))

Data Types in SQL
- Characters: CHAR(20) (fixed length), VARCHAR(40) (variable length)
- Numbers: BIGINT, INT, SMALLINT, TINYINT; REAL, FLOAT (differ in precision); MONEY
- Times and dates: DATE, DATETIME; others

Destroying and Altering Relations
- DROP TABLE Student
  Destroys the relation Student: the schema information and the tuples are deleted.
- ALTER TABLE Student ADD COLUMN firstYear integer
  The schema of Student is altered by adding a new field; every tuple in the current instance is extended with a null value in the new field.

Integrity Constraints (ICs)
- IC: a condition that must be true for any instance of the database.
  - ICs are specified when the schema is defined.
  - ICs are checked when relations are modified.
- A legal instance of a relation is one that satisfies all specified ICs. The DBMS should only allow legal instances.
- If the DBMS checks ICs, stored data is more faithful to real-world meaning. This avoids data entry errors, too!

Key Constraints
- A set of fields is a key for a relation if:
  1. no two distinct tuples can have the same values in all key fields, and
  2. this is not true for any subset of the key.
- If part 2 is false, the fields are a superkey.
- If there is more than one key for a relation, one of the keys is chosen (by the DBA) to be the primary key.
- E.g., sid is a key for Students. (What about name?) The set {sid, gpa} is a superkey.

Student table
  sid    name     login          age  gpa
  50000  Dave     dave@cs        19   3.2
  53666  Jones    jones@cs       18   3.3
  53688  Smith    smith@ee       18   3.2
  53650  Smith    smith@math     19   3.7
  53831  Madayan  madayan@music  11   1.8
  53832  Guldu    guldu@music    12   2.0

Specifying Key Constraints in SQL
  CREATE TABLE Student (
    sid CHAR(20),
    name CHAR(20),
    login CHAR(10),
    age INTEGER,
    gpa REAL,
    UNIQUE (name, age),
    PRIMARY KEY (sid))

Primary and Candidate Keys in SQL
- Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
- "For a given student and course, there is a single grade":
    CREATE TABLE Takes (
      sid CHAR(20),
      cid CHAR(20),
      grade CHAR(2),
      PRIMARY KEY (sid, cid))
  vs. "Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade":
    CREATE TABLE Takes (
      sid CHAR(20),
      cid CHAR(20),
      grade CHAR(2),
      PRIMARY KEY (sid),
      UNIQUE (cid, grade))
- Used carelessly, an IC can prevent the storage of database instances that arise in practice!

Foreign Keys, Referential Integrity
- Foreign key: a set of fields in one relation that is used to 'refer' to a tuple in another relation. It must correspond to the primary key of the second relation; like a 'logical pointer'.
- E.g., sid in Takes(sid: string, cid: string, grade: string) is a foreign key referring to Students.
- If all foreign key constraints are enforced, referential integrity is achieved, i.e., there are no dangling references.
- Can you name a data model without referential integrity? Links in HTML!
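The key and foreign-key constraints above can be exercised directly against a real engine. The sketch below uses SQLite (standing in for any SQL DBMS; note that SQLite only enforces foreign keys when the `foreign_keys` pragma is enabled):

```python
# PRIMARY KEY and FOREIGN KEY enforcement demonstrated with SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific switch
conn.execute("""CREATE TABLE Student (
    sid TEXT PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL)""")
conn.execute("""CREATE TABLE Takes (
    sid TEXT, cid TEXT, grade TEXT,
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES Student)""")

conn.execute("INSERT INTO Student VALUES ('53666','Jones','jones@cs',18,3.3)")
conn.execute("INSERT INTO Takes VALUES ('53666','cs645','A')")  # ok: sid exists

def violates(sql):
    """Return True if executing `sql` raises an integrity violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

Inserting a Takes row with an unknown sid, or a second Student with an existing sid, is rejected by the DBMS rather than silently corrupting the instance.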
Foreign Keys in SQL
Only students listed in the Students relation should be allowed to enroll for courses:
  CREATE TABLE Takes (
    sid CHAR(20),
    cid CHAR(20),
    grade CHAR(2),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES Students)
Example instances:
  STUDENT: (50000, Dave, dave@cs), (53666, Jones, jones@cs), (53688, Smith, smith@ee), (53650, Smith, smith@math)
  Takes: (50000, 445, A), (53666, 435, B), (53688, 483, C)

Enforcing Referential Integrity
- Consider Student and Takes: sid in Takes is a foreign key that references Student.
- What should be done if a Takes tuple with a non-existent student id is inserted? Reject it!
- What should be done if a Student tuple is deleted?
  - Also delete all Takes tuples that refer to it.
  - Disallow deletion of a Student tuple that is referred to.
  - Set sid in Takes tuples that refer to it to a default sid.
  - In SQL, also: set sid in Takes tuples that refer to it to a special value, null, denoting 'unknown' or 'inapplicable'.
- Similar options apply if the primary key of a Student tuple is updated.

Referential Integrity in SQL
SQL supports all 4 options on deletes and updates:
- Default is NO ACTION (the delete/update is rejected).
- CASCADE: also delete all tuples that refer to the deleted tuple.
- SET NULL / SET DEFAULT: sets the foreign key value of the referencing tuple.
  CREATE TABLE Takes (
    sid CHAR(20),
    cid CHAR(20),
    grade CHAR(2),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES Students
      ON DELETE CASCADE
      ON UPDATE SET DEFAULT)

Where do ICs Come From?
- ICs are based upon the semantics of the real-world enterprise that is being described in the database relations.
- We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance. An IC is a statement about all possible instances! From the example, we know name is not a key, but the assertion that sid is a key is given to us.
- Key and foreign key ICs are the most common; more general ICs are supported too.

SQL Overview
- Query capabilities: SELECT-FROM-WHERE blocks; basic features (ordering, duplicates); set operations (union, intersect, except); aggregation & grouping; nested queries; correlation; null values.

Example database
- Sailors(sid, sname, rating, age)
- Boats(bid, bname, color)
- Reserves(sid, bid, day)
The key for each table is indicated by the underlined attributes: sid; bid; (sid, bid, day).
Basic SQL Query
Basic form (plus many, many extensions):
  SELECT [DISTINCT] target-list
  FROM relation-list
  WHERE qualification (conditions)
For example:
  SELECT S.sid, S.sname, S.rating, S.age
  FROM Sailors S
  WHERE S.age > 21
- relation-list: a list of relation names, possibly with a range variable after each name.
- target-list: a list of attributes of relations in relation-list.
- qualification: comparisons (Attr op const, or Attr1 op Attr2, where op is one of <, >, =, <=, >=, <>), combined using AND, OR, and NOT.
- DISTINCT is an optional keyword indicating that the answer should not contain duplicates. The default is that duplicates are not eliminated!

Simple SQL Query
Sailors instance: (22, dustin, 7, 45.0), (31, lubber, 8, 55.5), (58, rusty, 10, 35.0), (71, zorba, 10, 16.0).
- Conditions in the WHERE clause are like selection, e.g., σ(age < 21).

Selection conditions
What goes in the WHERE clause:
- x = y, x < y, x <= y, x <> y, etc.
- For numbers, these have the usual meanings; for CHAR and VARCHAR, lexicographic ordering; for dates and times, what you expect.
- Also, pattern matching on strings: s LIKE p.

The LIKE operator
s LIKE p: pattern matching on strings. p may contain two special symbols:
- _ : any single character
- % : any sequence of characters
Find all students whose name begins and ends with 'b':
  SELECT *
  FROM Students S
  WHERE S.sname LIKE 'b%b'

Simple SQL Query (projection)
- Conditions in the SELECT clause are like projection, e.g., π(sname, age): from the Sailors instance above, we get (dustin, 45.0), (lubber, 55.5), (rusty, 35.0), (zorba, 16.0).
- Note the confusing terminology: conditions in the WHERE clause are like selection; conditions in the SELECT clause are like projection.

Eliminating Duplicates
  SELECT DISTINCT sname
  FROM Sailors
Compare to the same query without DISTINCT: the default behavior does not eliminate duplicates.

Ordering the Results
  SELECT sname, rating, age
  FROM Sailors
  WHERE age > ...
  ORDER BY rating, sname
Ordering is ascending unless you specify the DESC keyword. Ties are broken by the second attribute on the ORDER BY list, etc.
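The basic query features above (WHERE selection, LIKE patterns, DISTINCT, ORDER BY with DESC) can be tried out on the small Sailors instance using SQLite from Python:

```python
# Basic SELECT-FROM-WHERE features run against the Sailors instance.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INT, sname TEXT, rating INT, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?,?,?,?)", [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0), (71, "zorba", 10, 16.0)])

# selection in the WHERE clause
adults = conn.execute("SELECT sname FROM Sailors WHERE age > 21").fetchall()

# pattern matching with LIKE (% = any sequence, _ = any one character)
like = conn.execute(
    "SELECT sname FROM Sailors WHERE sname LIKE '%us%'").fetchall()

# DISTINCT eliminates duplicates; ORDER BY sorts, DESC reverses
ratings = conn.execute("SELECT DISTINCT rating FROM Sailors").fetchall()
ordered = conn.execute(
    "SELECT sname FROM Sailors ORDER BY rating DESC, sname").fetchall()
```

Here `ordered` lists both rating-10 sailors first, with the tie between them broken alphabetically by sname.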
Conceptual Evaluation Strategy
- The semantics of an SQL query are defined in terms of an RA-equivalent conceptual evaluation strategy:
  1. Compute the cross-product of relation-list.
  2. Discard resulting tuples if they fail the qualification.
  3. Delete attributes that are not in target-list.
  4. If DISTINCT is specified, eliminate duplicate rows.
- This is probably the least efficient way to compute a query; an optimizer will find more efficient plans.

Example of Conceptual Evaluation
  SELECT S.sname
  FROM Sailors S, Reserves R
  WHERE S.sid = R.sid AND R.bid = 103
Sailors: (22, dustin, 7, 45.0), (31, lubber, 8, 55.5), (58, rusty, 10, 35.0).
Reserves: (22, 101, 10/10/96), (58, 103, 11/12/96).
The cross product:
  (22, dustin, 7, 45.0, 22, 101, 10/10/96)
  (22, dustin, 7, 45.0, 58, 103, 11/12/96)
  (31, lubber, 8, 55.5, 22, 101, 10/10/96)
  (31, lubber, 8, 55.5, 58, 103, 11/12/96)
  (58, rusty, 10, 35.0, 22, 101, 10/10/96)
  (58, rusty, 10, 35.0, 58, 103, 11/12/96)
Only the last row satisfies the qualification, so the answer is (rusty).

Example
  SELECT sname
  FROM Sailors S, Reserves R, Boats B
  WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
What does this query compute? It finds the names of sailors who have reserved a red boat.

Please write in SQL: find the colors of boats reserved by Lubber.
  SELECT B.color
  FROM Sailors S, Reserves R, Boats B
  WHERE S.sid = R.sid AND R.bid = B.bid AND S.sname = 'Lubber'

Renaming Columns
From the Sailors instance, projecting onto (sname, age) with renamed output columns (e.g., via AS) yields: (dustin, 45.0), (lubber, 55.5), (rusty, 35.0), (zorba, 16.0).

Disambiguating Attributes
Sometimes two relations have the same attribute:
  Person(pname, address, worksfor), Company(cname, address)
Which address?
  SELECT DISTINCT pname, address
  FROM Person, Company
  WHERE worksfor = cname
Use qualified names:
  SELECT DISTINCT Person.pname, Company.address
  FROM Person, Company
  WHERE Person.worksfor = Company.cname

Range Variables in SQL
  Purchase(buyer, seller, store, product)
Find all stores that sold at least one product that was sold at BestBuy:
  SELECT DISTINCT x.store
  FROM Purchase AS x, Purchase AS y
  WHERE x.product = y.product AND y.store = 'BestBuy'

Please write in SQL: self-join on Flights. Find the departure and arrival cities of trips consisting of two direct flights.
  SELECT F1.depart, F2.arrive
  FROM Flights AS F1, Flights AS F2
  WHERE F1.arrive = F2.depart
FLIGHTS (depart, arrive): (NYC, Reno), (NYC, Oakland), (Boston, Tampa), (Oakland, Boston), (Tampa, NYC).

Set operations
- UNION, INTERSECTION, EXCEPT (sometimes called MINUS).
- Recall: the schemas must match for these operations.

UNION example
Find the names of sailors who have reserved a red or a green boat:
  SELECT sname
  FROM Sailors S, Reserves R, Boats B
  WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
  UNION
  SELECT sname
  FROM Sailors S, Reserves R, Boats B
  WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'green'

UNION and Duplicates
- Duplicates are NOT eliminated by default in basic SELECT-FROM-WHERE queries.
- Duplicates ARE eliminated by default for UNION queries.
- To preserve duplicates in UNION, you must use UNION ALL.

UNION example, alternative
Find the names of sailors who have reserved a red or a green boat:
  SELECT DISTINCT sname
  FROM Sailors S, Reserves R, Boats B
  WHERE S.sid = R.sid AND R.bid = B.bid
    AND (B.color = 'red' OR B.color = 'green')

Find the names of sailors who have reserved a red and a green boat:
  SELECT sname
  FROM Sailors S, Reserves R, Boats B
  WHERE S.sid = R.sid AND R.bid = B.bid
    AND B.color = 'red' AND B.color = 'green'
This doesn't work! What does this query return? (Nothing: no single boat tuple has color both 'red' and 'green'.)
The correct version:
  SELECT sname
  FROM Sailors S, Reserves R, Boats B
  WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
  INTERSECT
  SELECT sname
  FROM Sailors S, Reserves R, Boats B
  WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'green'

Aggregation
  SELECT AVG(S.age)
  FROM Sailors S
  WHERE S.rating = 10
SQL supports several aggregation operations.

Aggregation: COUNT
  SELECT COUNT(*)
  FROM Sailors
  WHERE rating > 5
Except for COUNT, all aggregations apply to a single attribute.

Aggregation: COUNT (Cont'd)
COUNT applies to duplicates, unless otherwise stated:
  SELECT COUNT(category)
  FROM Product
  WHERE year > 1995
Better:
  SELECT COUNT(DISTINCT category)
  FROM Product
  WHERE year > 1995

Simple Aggregation
Purchase(product, date, price, quantity)
Example 1: find total sales for the entire database:
  SELECT SUM(price * quantity)
  FROM Purchase
Example 2: find total sales of bagels:
  SELECT SUM(price * quantity)
  FROM Purchase
  WHERE product = 'bagel'

GROUP BY and HAVING Clauses
We often want to apply aggregates to each of a number of groups of rows in a relation. Find the age of the youngest sailor for each rating level:
  SELECT MIN(S.age)
  FROM Sailors S
  WHERE S.rating = i
(for i = 1, 2, ..., 10)

Grouping
  SELECT S.rating, MIN(S.age)
  FROM Sailors S
  GROUP BY S.rating

Queries With GROUP BY and HAVING
  SELECT [DISTINCT] target-list
  FROM relation-list
  WHERE qualification
  GROUP BY grouping-list
  HAVING group-qualification
- The target-list contains (i) attribute names and (ii) terms with aggregate operations (e.g., MIN(S.age)).
- The attribute list (i) must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group.

Conceptual Evaluation
- The cross-product of relation-list is computed; tuples that fail the qualification are discarded; unnecessary fields are deleted; and the remaining tuples are partitioned into groups by the value of the attributes in grouping-list.
- The group-qualification is then applied to eliminate some groups. Expressions in group-qualification must have a single value per group!
- One answer tuple is generated per qualifying group.

Find the age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors:
  SELECT S.rating, MIN(S.age) AS minage
  FROM Sailors S
  WHERE S.age >= 18
  GROUP BY S.rating
  HAVING COUNT(*) > 1
Sailors instance: (22, dustin, 7, 45.0), (29, brutus, 1, 33.0), (31, lubber, 8, 55.5), (32, andy, 8, 25.5), (58, rusty, 10, 35.0), (64, horatio, 7, 35.0), (71, zorba, 10, 16.0), (74, horatio, 9, 35.0), (85, art, 3, 25.5), (95, bob, 3, 63.5), (96, frodo, 3, 25.5).
Answer relation (rating, minage): (3, 25.5), (7, 35.0), (8, 25.5).
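The GROUP BY / HAVING query above can be run verbatim on the slide's Sailors instance; the sketch below uses SQLite and reproduces the answer relation shown:

```python
# The youngest-sailor-per-rating query with GROUP BY and HAVING.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INT, sname TEXT, rating INT, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?,?,?,?)", [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5)])

answer = conn.execute("""
    SELECT rating, MIN(age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
    ORDER BY rating""").fetchall()
# answer == [(3, 25.5), (7, 35.0), (8, 25.5)]
```

Note how zorba (age 16) is removed by WHERE before grouping, and how single-sailor groups (ratings 1, 9, 10) are then removed by HAVING.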
Hash-Based Indexes
UMass Amherst, March 6, 2008
Slides courtesy of R. Ramakrishnan and Gehrke

Introduction
- As for any index, 3 alternatives for data entries k*:
  - the data record with key value k;
  - <k, rid of data record with search key value k>;
  - <k, list of rids of data records with search key k>.
  The choice is orthogonal to the indexing technique.
- Hash-based indexes are best for equality selections; they cannot support range searches.
- Static and dynamic hashing techniques exist, with trade-offs for dynamic data.

Static Hashing
- h(key) mod N maps each key to one of N primary bucket pages, with overflow pages as needed.
- h(k) mod N = the bucket to which the data entry with key k belongs (k1 <> k2 can lead to the same bucket).
- Static: the number of buckets N is fixed.
  - Main pages are allocated sequentially and never deallocated.
  - Overflow pages are used if needed.

Static Hashing (Cont'd)
- The hash function works on the search key field of record r. It must distribute values over the range 0 .. N-1.
  - h(key) = (a * key + b) mod N usually works well; a and b are constants. Lots is known about how to tune h.
- Buckets contain data entries.
- Long overflow chains can develop and degrade performance. Extendible and Linear Hashing are dynamic techniques to fix this problem.

Extendible Hashing
- Situation: a bucket (primary page) becomes full. Why not reorganize the file by doubling the number of buckets?
  - Reading and writing all pages is expensive!
- Idea: use a directory of pointers to buckets; double the number of buckets by (1) doubling the directory and (2) splitting just the bucket that overflowed.
  - The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split. No overflow pages!
  - The trick lies in how the hash function is adjusted.

Example
- The directory is an array of size 4, with global depth D = 2.
- Each bucket has a local depth L, with L <= D.
- To find the bucket for r: compute h(r) and take the last (global depth) bits. If h(r) = 5 = binary 101, take the last 2 bits and go to the bucket pointed to by 01.

Inserts
- If the bucket is full, split it: allocate a new page and redistribute the entries.
- If necessary, double the directory. Whether to split can be decided by comparing the global depth and the local depth of the overflowing bucket:
  - split (doubling the directory) if global depth = local depth; don't otherwise.

Example: insert r with h(r) = 20. With global depth 2, bucket A (holding 4, 12, 32, 16) overflows; since its local depth equals the global depth, the directory is doubled and A is split into A (32, 16) and its split image A2 (4, 12, 20).

Points to Note
- 20 = binary 10100. The last 2 bits (00) tell us r belongs in A or A2; the last 3 bits are needed to tell which.
  - Global depth of the directory: max number of bits needed to tell which bucket an entry belongs to.
  - Local depth of a bucket: number of bits used to determine if an entry belongs to this bucket.
- When does a bucket split cause directory doubling?
  - Before the insert, the local depth of the bucket = the global depth. The insert causes the local depth to become > the global depth; the directory is doubled by copying it over and fixing the pointer to the split image page.
  - Use of least significant bits enables efficient doubling via copying of the directory. (Compare: least significant vs. most significant bits.)

Comments on Extendible Hashing
- If the directory fits in memory, an equality search is answered with one disk access; else two.
  - Example: a 100MB file with 100-byte records and 4K pages has 1,000,000 records (as data entries) and 25,000 directory elements; chances are high that the directory will fit in memory.
  - The directory grows in spurts, and if the distribution of hash values is skewed, the directory can grow large.
  - Entries with the same key value (duplicates) need overflow pages!
- Delete: removal of a data entry from a bucket.
  - If the bucket becomes empty, it can be merged with its split image.
  - If each directory element points to the same bucket as its split image, we can halve the directory.
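The directory doubling and bucket splitting described above can be sketched in a few dozen lines. This toy keeps buckets in memory, uses the key's low-order bits directly as the hash (as in the slides' examples), and makes up the capacity; it is illustrative only, not a disk-based implementation.

```python
# Minimal extendible hashing: a directory of bucket pointers with
# global depth, per-bucket local depth, splits, and directory doubling.

BUCKET_CAPACITY = 4  # data entries per bucket page (made up)

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _bucket(self, key):
        # use the last `global_depth` bits of the key's hash
        return self.directory[key & ((1 << self.global_depth) - 1)]

    def insert(self, key):
        b = self._bucket(key)
        if len(b.entries) < BUCKET_CAPACITY:
            b.entries.append(key)
            return
        # full bucket: double the directory only if local == global depth
        if b.local_depth == self.global_depth:
            self.directory = self.directory + self.directory  # copy over
            self.global_depth += 1
        # split the overflowing bucket into itself and a split image
        b.local_depth += 1
        image = Bucket(b.local_depth)
        high_bit = 1 << (b.local_depth - 1)
        for i, slot in enumerate(self.directory):
            if slot is b and (i & high_bit):
                self.directory[i] = image     # fix split-image pointers
        old, b.entries = b.entries, []
        for k in old + [key]:                 # redistribute via directory
            self.insert(k)

    def search(self, key):
        return key in self._bucket(key).entries
```

Splitting a bucket whose local depth is below the global depth only rewires existing directory slots; the directory itself doubles only when the overflowing bucket's local depth has caught up with the global depth.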
Summary
- Hash-based indexes are best for equality searches; they cannot support range searches.
- Static hashing can lead to long overflow chains.
- Extendible hashing avoids overflow pages by splitting a full bucket when a new data entry is to be added to it. (But duplicates may still require overflow pages.)
  - A directory keeps track of the buckets and doubles periodically. It can get large with skewed data; there is an additional I/O if it does not fit in main memory.

Tree-Structured Indexes
CMPSCI 645, Mar 6, 2008
Slides courtesy of R. Ramakrishnan and Gehrke

Review
- As for any index, 3 alternatives for data entries k*:
  - the data record with key value k;
  - <k, rid of data record with search key value k>;
  - <k, list of rids of data records with search key k>.
- The choice is orthogonal to the indexing technique used to locate data entries k*.
- Tree-structured indexing techniques support both range searches and equality searches.

B+ Tree: The Most Widely Used Index
- Inserts/deletes keep the tree height-balanced: logF(N) cost (F = fanout, N = # leaf pages).
- Minimum 50% occupancy (except for the root): each node contains d <= m <= 2d entries, where d is called the order of the tree.
- Supports equality and range searches efficiently. Index entries direct the search; the leaf level forms the "sequence set".

Example B+ Tree
Root keys: 13, 17, 24, 30. Leaves: (2, 3, 5, 7), (14, 16), (19, 20, 22), (24, 27, 29), (33, 34, 38, 39).
- Search begins at the root, and key comparisons direct it to a leaf.
- Search for 5*, for 15*, and for all data entries >= 24*. Based on the search for 15*, we know it is not in the tree!

B+ Trees in Practice
- Typical order: 200. Typical fill factor: 67%, giving an average fanout of 133.
- Typical capacities: height 4: 133^4 = 312,900,700 records; height 3: 133^3 = 2,352,637 records.
- Can often hold the top levels in the buffer pool: level 1 = 1 page = 8 KB; level 2 = 133 pages = 1 MB; level 3 = 17,689 pages = 133 MB.

Inserting a Data Entry into a B+ Tree
- Find the correct leaf L.
- Put the data entry onto L.
  - If L has enough space, done!
  - Else, we must split L into L and a new node L2:
    - redistribute the entries evenly, copy up the middle key;
    - insert an index entry pointing to L2 into the parent of L.
- This can happen recursively. To split an index node, redistribute entries evenly, but push up the middle key (contrast with leaf splits).
- Splits "grow" the tree; a root split increases the height. Tree growth: it gets wider, or one level taller at the top.

Inserting 8* into the Example B+ Tree
- Observe how minimum occupancy is guaranteed in both leaf and index page splits.
- Note that 5 is copied up into the parent node and continues to appear in the leaf, whereas 17 is pushed up into the new root and does not remain below.
- Note the difference between the copy-up and push-up actions; be sure you understand the reasons for this.

Example B+ Tree After Inserting 8*
Root key: 17; second level: (5, 13) and (24, 30); leaves: (2, 3), (5, 7, 8), (14, 16), (19, 20, 22), (24, 27, 29), (33, 34, 38, 39).
- Notice that the root was split, leading to an increase in height.
- In this example, we could avoid the split by redistributing entries between siblings, but this is not usually done in practice.

Deleting a Data Entry from a B+ Tree
- Start at the root; find the leaf L where the entry belongs.
- Remove the entry.
  - If L is at least half-full, done!
  - If L has only d-1 entries:
    - try to redistribute, borrowing from a sibling (an adjacent node with the same parent as L);
    - if redistribution fails, merge L and the sibling.
- If a merge occurred, we must delete the entry pointing to L or the sibling from the parent of L.
- A merge could propagate to the root, decreasing the height.

Current B+ Tree: delete 19*, then delete 20*.

Example Tree After (Inserting 8*, Then) Deleting 19* and 20*
- Deleting 19* is easy.
- Deleting 20* is done with redistribution. Notice how the middle key is copied up.

... And Then Deleting 24*
- Must merge.
- Observe the 'toss' of the index entry (on the right) and the 'pull down' of the index entry (below); the resulting leaves include (22, 27, 29) and (33, 34, 38, 39).

Example of Non-leaf Redistribution
- The tree is shown below during the deletion of 24*. In contrast to the previous example, we can redistribute an entry from the left child of the root to the right child.
example, we can redistribute an entry from the left child of the root to the right child.
    Root: [22]
    Index nodes: [5 | 13 | 17 | 20] and [30]
    Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]

After Redistribution
* Intuitively, entries are redistributed by "pushing through" the splitting entry in the parent node.
* It suffices to redistribute the index entry with key 20; we have redistributed 17 as well, for illustration.
    Root: [17]
    Index nodes: [5 | 13] and [20 | 22 | 30]
    Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]

Prefix Key Compression
* Important to increase fanout. (Why?)
* Key values in index entries only "direct traffic", so we can often compress them.
  - E.g., if we have adjacent index entries with search key values 'Dannon Yogurt', 'David Smith', and 'Devarakonda Murthy', we can abbreviate 'David Smith' to 'Dav'. (The other keys can be compressed too.)
  - Is this correct? Not quite! What if there is a data entry 'Davey Jones'? Then we can only compress 'David Smith' to 'Davi'.
  - In general, while compressing, we must leave each index entry greater than every key value in any subtree to its left!
* Insert and delete must be suitably modified.
    Figure: compress 'David Smith' to 'Dav' (or 'Davi'), between neighbors 'Daniel Lee' and 'Devarakonda ...'

Bulk Loading of a B+ Tree
* If we have a large collection of records and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow.
* Bulk loading can be done much more efficiently.
* Initialization: sort all data entries, then insert a pointer to the first leaf page into a new root page.
    Root -> sorted pages of data entries, not yet in the B+ tree:
    [3* 4*] [6* 9*] [10* 11*] [12* 13*] [20* 22*] [23* 31*] [35* 36*] [38* 41*] [44*]

Summary of Bulk Loading
* Option 1: multiple inserts.
  - Slow.
  - Does not give sequential storage of leaves.
* Option 2: bulk loading.
  - Has advantages for concurrency control.
  - Fewer I/Os during build.
  - Leaves will be stored sequentially (and linked, of course).
  - Can control the "fill factor" on pages.

A Note on "Order"
* In practice, the order (d) concept is replaced by a physical space criterion ("at least half-full").
  - Index pages can typically hold many more entries than leaf pages.
  - Variable-sized records and search keys mean different
nodes will contain different numbers of entries.
  - Even with fixed-length fields, multiple records with the same search key value (duplicates) can lead to variable-sized data entries (if we use Alternative (3)).

Summary
* Tree-structured indexes are ideal for range searches, and also good for equality searches.
* The B+ tree is a dynamic structure.
  - Inserts and deletes leave the tree height-balanced; log_F N cost.
  - High fanout (F) means the depth is rarely more than 3 or 4.
  - Almost always better than maintaining a sorted file.
  - Typically 67% occupancy on average.
  - If data entries are data records, splits can change rids!

Summary (Cont'd)
* Key compression increases fanout and reduces height.
* Bulk loading can be much faster than repeated inserts for creating a B+ tree on a large data set.
* The B+ tree is the most widely used index in database management systems because of its versatility: one of the most optimized components of a DBMS.
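The search procedure described above (begin at the root; key comparisons direct the search down to a leaf) can be illustrated with a minimal in-memory sketch. This is not the course's implementation: the `Node` class and `search` helper are names invented for this example, and the tree built at the bottom is the example tree from the slides.

```python
# Minimal B+ tree search sketch (illustrative only). Internal nodes hold
# separator keys and child pointers; leaves hold sorted data entries.
# With separator keys [k1..km], child i covers entries k with k_i <= k < k_{i+1}
# (child 0 covers k < k1).
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, entries=None):
        self.keys = keys          # separator keys (internal nodes)
        self.children = children  # list of Node (internal nodes only)
        self.entries = entries    # sorted data entries (leaf nodes only)

    def is_leaf(self):
        return self.children is None

def search(node, key):
    """Descend from the root to the leaf that would hold `key`;
    return True iff the data entry is present."""
    while not node.is_leaf():
        # key comparisons direct the search to the correct child
        node = node.children[bisect_right(node.keys, key)]
    return key in node.entries

# The example tree from the slides: root keys 13, 17, 24, 30
leaves = [Node([], entries=e) for e in
          [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]]
root = Node([13, 17, 24, 30], children=leaves)
```

As on the slide, searching for 5 succeeds, and the search for 15 lands in the leaf [14, 16], which tells us 15 is not in the tree.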
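The "B+ Trees in Practice" figures follow directly from the fanout: a tree of height h with average fanout F reaches roughly F^h leaf-level records. A quick check (note the exact value of 133^4 is 312,900,721; the slide's 312,900,700 is a rounding):

```python
# Reach of a B+ tree with average fanout F: roughly F**h records at height h.
fanout = 133
height3 = fanout ** 3       # records addressable by a height-3 tree
height4 = fanout ** 4       # height-4 tree (the slide rounds this value)
level2_pages = fanout       # pages on level 2 (1 MB at 8 KB per page)
level3_pages = fanout ** 2  # pages on level 3 (~133 MB per the slide)
```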
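Bulk loading as outlined above (sort all data entries once, pack them into leaves left to right at a chosen fill factor, then build index levels bottom-up until a single root remains) can be sketched as follows. The function name, the `capacity` and `fill` parameters, and the page representation are assumptions made for illustration, not the lecture's code.

```python
def bulk_load(entries, capacity=4, fill=0.5):
    """Sketch of B+ tree bulk loading.

    Sorts the data entries, packs them sequentially into leaf pages at the
    requested fill factor, then counts the index levels needed until a
    single root page remains. Returns (leaves, index_height).
    """
    per_leaf = max(1, int(capacity * fill))   # control fill factor on pages
    entries = sorted(entries)                 # initialization: sort once
    # Leaves are written left to right, so they end up stored sequentially
    # (and would be linked together in a real implementation).
    leaves = [entries[i:i + per_leaf] for i in range(0, len(entries), per_leaf)]
    pages, height = len(leaves), 0
    while pages > 1:                          # build index levels bottom-up
        pages = -(-pages // capacity)         # ceil-divide: one parent per `capacity` children
        height += 1
    return leaves, height

# The slide's example entries, two data entries per leaf page
data = [44, 3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41]
leaves, height = bulk_load(data, capacity=4, fill=0.5)
```

With these 17 entries the leaves come out as [3, 4], [6, 9], [10, 11], ..., [44], matching the sorted runs in the slide's figure; two index levels suffice above them.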
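The correctness rule for prefix key compression — each compressed index entry must stay greater than every key value in the subtree to its left — amounts to computing the shortest prefix that still separates the two subtrees. A small sketch (the function name is an assumption; this is not the lecture's code):

```python
def shortest_separator(left_max, key):
    """Shortest prefix of `key` that is still strictly greater than
    `left_max` (the largest key value in the subtree to the left).
    Using the prefix as the index entry keeps searches correct while
    shrinking entries and increasing fanout."""
    for length in range(1, len(key) + 1):
        prefix = key[:length]
        if prefix > left_max:   # Python compares strings lexicographically
            return prefix
    return key  # no shorter separator exists

# The slide's example: 'David Smith' compresses to 'Dav' next to 'Daniel Lee',
# but only to 'Davi' if a data entry 'Davey Jones' sits in the left subtree.
```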