### Create a StudySoup account

#### Be part of our community, it's free to join!

Already have a StudySoup account? Login here

# Database Sys Implement CS 4420

GPA 3.81

### View Full Document

## 10

## 0

## Popular in Course

## Popular in ComputerScienence

This 0 page Class Notes was uploaded by Alayna Veum on Monday November 2, 2015. The Class Notes belongs to CS 4420 at Georgia Institute of Technology - Main Campus taught by Staff in Fall. Since its upload, it has received 10 views. For similar materials see /class/234142/cs-4420-georgia-institute-of-technology-main-campus in ComputerScienence at Georgia Institute of Technology - Main Campus.

## Reviews for Database Sys Implement

### What is Karma?

#### Karma is the currency of StudySoup.

#### You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 11/02/15

College of Compu CS 4420 Database System Implementation Final Review Ling Lin Assncizte Prnfessn Cnuege hr Cnmpmjhg Genrgjz Teeh mmw Final Coverage Chapter 1 Chapter 2 a Section 2 1 to 2 4 Chapter 3 a Section 31to 3 5 Chapter4 Chapter 5 a Section 5 1 to 5 3 5 Chapter 6 a Section 6 1 to s 4 Section 6 s to s 7 Chapter7 a Section 7 1 to 7 5 3 Section 7 7 to 7 7 5 Chapter 834041 9 skim 9 8 9sk1m IDA lEI 51I 6 ID 7 mmw DBZ6 FinalReview Open Book I You can bring only text book notes lectures but no midterm exam reviews I Calculator if you wish I Pens or pencils Dst FinalReview Data Storage Review I Memory Hierarchy o Cache onboardleve12 9 W between cache and processor lMG with lt10 nanoseconds 9 W between cache and memory 100 nanoseconds 0 Memory lOOMG 9 Random access Typical time to access data 10100 nanoseconds 0 Disk and Virtual Memory gt4GB 9 KiloMega Giga Tera Peta I Disk Organization 0 Platter 2 surfaces 0 Tracks Sectors gaps 0 Cylinder disk head disk controller 0 Seek time rotation speed 1pm D1326 FinalReview Focus on Typical Disk Platter Head Actuator Cylinder Track Sector physical Block logical Gap Top View I Cylinder E S ector Track DBZ5 FinalReview Disk Access Time I want block X Time Seek Time Rotational Delay Transfer Time Other DBZ6 FinalReview Seek Time Time to position the head assembly at the proper cylinder Beginning at some value x for a Average Random Seek distance of one cylinder 3 5 N N OI X Z Z SEEKTIME i gt j i1 j1 Tum s j i NN1 1 N Typical S 10 ms gt 40 ms DB26 FinalReview Rotational Dela The time for the disk to rotate so the first of the sectors containing the block reaches the head rotation latency A typical disk rotates completely takes lOms aquot k Block I Want DBZ6 FinalReview Transfer Time t I typical trate l gt 3 NIBsecond I transfer time block size trate D1326 FinalReview Data Storage QA I Question 1 Page 39 Exercise 221 Given that a disk has the following characteristics 0 10 surfaces with 10000 tracks each 0 Each track holds an average of 1000 sectors of 512 bytes 0 The time it takes the head to move n tracks is 1 0001n ms I What is the capacity of the disk 0 The disk has 10 10000 100000 tracks 0 The average track has 1000 512 512000 bytes 0 Thus the capacity is 512 gigabytes I What is the maximum seek time o The maximum seek time occurs when the heads have to move across all the tracks 0 Let n 10000 starting from one Thus 10001n 1 000110000 11 milliseconds DBZ6 FinalReview Data Storage QA Consider a disk with an actual formatted capacity of 8 gigabytes 233 bytes The disk has 16 surfaces and 1024 tracks The disk rotates at 7200 rpm The average seek time is 9 ms The block size is 8K3 a What is the capacity of a single track 8GBl6 1024 2A332A42A10 2Al9 bytestrack b Suppose we are reading a le F that occupies exactly one entire track Assume that a track can only be read starting at a particular position on the track How long does it take to read the entire le sequentially 215 ms lm7200 60000ms 7200 83 ms rotation 9ms 83m 832ms Dst FinalReview Indexing Techniques I Indexing 0 Conventional indexes Dense Vs Sparse o B Trees 0 B trees BTree and Sequential Indexed files I Hashing scheme 0 Static Hashing 0 Dynamic Hashing 9 Extensible Hashing 9 Linear Hashing DBZ6 FinalReview BTrees I Generalizes Multilevel Index 0 Number of levels varies with size of the data le being indexed but often 3 0 Useful for primary secondary indexes l B Tree 0 All nodes have the same format n keys n1 pointers l Btree rules for tree of order n 1 All leaves at same lowest level balanced tree 2 Pointers in leaves point to records except for sequence pointer Dst FinalReview 3 Number of pointerskeys for a Btree of order n Max Max Min Min ptrs keys ptrssdata keys r39ilgrl frlggb 1 n rn12l ln12l 1 nolhq got 1 n Ln12J Ln12J Root n1 n 1 1 Note by Hector GarciarMolina FinalRevieW Dst B Tree Exam I oot Max Max Min Min Ptrs keys Ptrsedata keys Nonleaf nonroot 4 3 2 nolhg oot 4 3 2 2 Root 4 3 1 1 15 BM FinalReview n2 W sequence pointers Index Pointer Data Pointer v l Dst FinalReview BTree Indexing A database system uses a variant of Btrees to maintain sorted tables In this variant the leaves contain the records themselves not pointers to the records A leaf can contain as many records as will fit in a block allowing for a sequence pointer Nonleaf nodes are also one block in size and contain as many keys and appropriate pointers as will fit Assume the following I 1 Blocks are 4096 bytes I Each record is 300 bytes long I 3 A block pointer is 10 bytes I 4 A record pointer is 12 bytes I 5 A key for the index is 8 bytes long I The nodes are 85 occupied For example if a leaf can hold 100 records it will only hold 85 If a nonleaf can hold 100 keys it will only hold 85 F or 85 calculations round to the nearest integer I 7 The indexed file has 1000000 records How many blocks will this index use 17 DBZ6 FinalReview BTree Index1ng Answer 1 How many records a leave node can hold 409610300l3 records If leaves are 85 occupied we can hold 11 records per node 2 How many blocks the B tree leaves can hold 90910 bIOCkS 1000000 11 9090909 3 How many keys an interior node can hold For interior nodes we need to determine the value of n from nl10n8 lt 4096 which gives us n227 DBZ6 FinalRevi ew BTree Indexing Answer 4 How about the height of the index tree Ifleave nodes are 85 occupied they actually use 22785 194 data pointers The leaflevel will require 90910194 4686 469 nodes The level above that will require 3 nodes which are all children of the root Then the root 5 What is the total The total is 909 104693 19 1 383 blocks Dst FinalRevi ew Hashing Tables I Hash function h search key 9 0 B1 I Buckets are blocks numbered 0B 1 I Main Idea 0 If a record with search key K exists then it must be in bucket hK I Advantage o Cuts search down by a factor of B 0 One disk IO of there is only one block per bucket Extract from Note by J Ullman 20 DBZ6 FinalRevi ew EXAMPLE 2 recordsbucket INSERT ha 1 hb 2 hc 1 hd 0 he 1 Ema romNotebyHGM 21 DB26 FinalRevi ew Efficiency of Hash Table Indexes I Efficient is highest if the following condition holds records lt buckets B recordsblock I Static Hash Table B never change I Problem 0 Performance of a hash table may degrade if there are too many records in one bucket I Solution Dynamic Hash Table 0 Methods to allow graceful growth of number of buckets and to maintain the above relationship 0 Extensible Hashing grows B by doubling it each time 0 Linear Hashing grows B by add one more bucket each time DBZ6 FinalReview Dynamic Hashing Framework I Common Feature 0 Hash function h hashes search key to a long k bitstring 0 Uses only some of the bits at any time to determine placement of keys in buckets I Extensible Hashing 0 Pro 9Can handle growing les with less wasted space and with no full reorganizations 0 Con 91ndirection Not bad if directory in memory 9Directory doubles in size Now it ts now it does not Dynamic Hashing Framework I Linear Hashing 0 Pro 9 Can handle growing les with less wasted space and with no full reorganizations 9 No indirection like extensible hashing 0 Con Can still have over ow chains my full D D D D D D D D H Need to move In here Very empty D D Would waste space DBZ5 FinalReview Types of Database Queries I Point Queries Partial Match Queries 0 Specify values of 71 dimensions ngtl I Range Queries 0 Specify ranges for 71 dimensions ngtl I Nearest Neighbor Queries 0 Closest point to a given point in a 7 dimensional space I Location Where am 1 queries 0 given a point nd where eg in which shape it is located DBZ6 FinalReview Query ProcessingOptimization SQL query parse tree T answer i logical query plan I statistics T Pi Improved lqp stimate result size lqp sizes P1c1P2c2t consider physical plans T NolebyHector GarciarMolina 26 Dst FinalReview Important Contents I Query Processing Steps I Logical Plan generation 0 Algebraic Transforms Rules 0 Good Transformation I TwoPhase Query Optimization I Cost Estimation 0 Result size 0 10s Dst FinalReview Estimating cost of query plan 1 Estimating size of results Selection Projection Join 2 Estimating of 105 I Selection Projection Join Sort I Count of disk blocks that must be read or written to execute query plan I Factors considered 0 Relations Contiguous or not 0 Index or not 0 Hash or not I Factors affecting the Ios 0 Join ordering 0 Join Algorithm used 9 iterative sortmerge index hash DBZ6 FinalReview Estimating IO costs lKeep statistics for relation R oTR tuples in R oSR of bytes in each R tuple oBR of blocks to hold all R tuples oVR A distinct values in R for attribute A o BR of blocks containing R tuples o KR max of tuples of R per block 0 M memory blocks available o HTi levels in indexl o Bi of leaf blocks in index i Dst FinalReview Example Question I c Consider the relations RAB with 1000 rows 1000 distinct values for column A and 50 distinct values for column B and the relation SBC with 5000 rows 5000 distinct values for column B and 100 distinct values for column C The estimated size of the join ofR and S denoted by R gt4 S is 5000 0 False 0 Given TR 1000 TS 5000 VRB 50 VSB5000 the size of the natural join of R and S is estimated by 0 TRTS MaxVRB VSB 1000 DBZ6 FinalReview Failure Recovery DB Operations and Transactions Atomic transaction Transaction Failure due to system crashes or media failure Recovery Techniques 0 Logging 0 Logging schemes 9 Undo Log WAL 9Redo Log 9Undo Redo Log 0 Checkpointing Dst FinalReview Example Question Suppose A and B are database elements and their initial values are both 0 After transaction T executed their values are both changed to l The content for log are given below ltSTART TgtltT A0gtltTgtltCOMMIT Tgt What kind of logging was used and what are the values for the two question marks I a Undo and B0 I b Undo and B l I c Redo and B l I d Redo and B0 I e Redo and A 1 Solution A DBZ6 FinalReview Concurrency Control Correctness informally I If we stop running transactions DB left consistent I Each transaction sees a consistent DB I Recovery Problems due to failures only I CC Problems due to data sharing only I CC Recovery Problems due to failures and sharing DB26 FinalReview Concepts Transaction sequence of riX WiX actions Con icting actions r1A W2A W1A W2Alt r1 A ltW2A Schedule represents chronological order in which actions are executed Serial schedule no interleaving of actions or transactions DBZ6 FinalReview De nition S1 S2 are con ict equivalent schedules if Si can be transformed into S2 by a series of swaps on noncon icting actions De nition A schedule is con ict serializable if it is con ict equivalent to some serial schedule Lemma S1 S2 con ict equivalent gt PSiPS2 Theorem PSi acyclic Cgt Si con ict serializable Dst FinalReview Precedence graph PS S is schedule Nodes transactions in S Arcs Ti gt Tj whenever piA qjA are actions in S PiA lts MA at least one of pi qj is a write Exercise 3 T1 ReadA A A100 WriteA ReadB B B100 WriteB Constraint AB FinalReview T2 ReadA A Ax2 WriteA ReadB B Bx2 WriteB DB26 Questionga is this schedule con ict serializable Which serial schedule it is equivalent to T1 T2 FinalReview Answer Yes ReadA A A100 WriteA ReadB B B100 WriteB It is equivalent to serial Schedule WriteA proceeding T2 ReadBB Bgtlt2 WriteB DBZ6 FinalReview Question b is this schedule con ict serializable Which serial schedule it is equivalent to Answer Not seria zable and it is NOT equivalent to T1 I T2 any serial schedule ReadA A lt AlOO WriteA ReadAA lt AXZ WriteA ReadBB lt BXZ WriteGB ReadGB B lt B100 WriteGB Dst FinalReview Locking Schemes I Binary I Shared Exclusive I Extending the SX locking scheme with new lock I Examples 0 Increment Lock 0 Update Lock 0 Intentional Lock IS IX SIX DBZ6 FinalReview Twophase Locking 2PL Protocol To guarantee serializability In a transaction all lock operations SiLock or XiLock precede the first unlock operation No locks can be acquired after the first lock is released We call a transaction satisfying twophase locking protocol if it obeys the above rules Twophase execution Growing phase lock acquisition only no unlock Shrinking phase lock release no more lock Lock point divides the two phases Lock point T Obtain look Release lock N0 of locks Phase 1 Phase 2 BEGIN END Dst FinalReview Deadlock Avoidance Conservative 2 PL Conservative 2 PL static requires a transaction T to 39 predeclare all the read set of items and write set of items and lock all the items it accesses before T begins execution 39 If any of the predeclared items can not be locked T does not lock any item at all Instead T waits and tries again until all the items are available for locking 39 Conservative 2 FL is deadlock free No of loci s f Obtain lock Release lock Phase 1 Phase 2 Transaction BEGIN period of END duration data item use 42 DBZ6 FinalReview Strict 2 PL In addition to the Basic 2 PL and Conservative 2 PL there is another version of 2 PL Strict 2 PL A transaction does not release any of its locks until after it w by committing Strict 2 PL is not deadlockfree unless combined with conservative 2 PL But it helps to overcome the update lost problem If all transactions in a schedule S follow the strict 2PL protocol S is called a strict schedule No of loc s T 1 Obtain look i Release lock BEGIN END Transaction period of duration data item use Dst FinalReview Con ictSerializable Schedule Three Rules 1 2 3 Rule 1 Wellformed transactions Ti iA piA uiA Rule 2 Legal scheduler S iA uiA 4 D no jA Rule 3 Two phase locking 2PL for transactions Ti iA uiA no unlocks no locks Dst Example Question 4 What are the problems of the following two schedules a 1 and 2 have lost updates problems b 81 and 2 both have dirty read problems c 81 and 2 have dirty read and lost update problem respectively d 81 and 2 have dirty read and fuzzy unrepeatable read respectively Solution D T1 T2 WA RA Rollback FinalReview Another 5 The following schedule observes a the basic two phase locking protocol basic 2PL only b the conservative ZPL c the strict 2PL 1 none of above Solution C FinalReview DBZ6 FinalRevi ew I S is recoverable if each transaction commits only after all transactions from which it read have committed I S avoids cascading rollback if each transaction may read only those values written by committed transactions I S is strict if each transaction may read and write only items previously written by committed transactions RC ST ACR 47 DB26 FinalReview Examples I Recoverable W1A W103 W2A r203 c1 02 0 T2 reads items that Tl writes Avoids Cascading Rollback W1A W103 W2A c1 r203 02 0 T2 can only read items modi ed by T1 after Tl commits Strict W1A W103 01 W2A T203 02 0 T2 can only read and write items used by T1 after Tl commits Assume that w2A is done without reading College of Comput g CS 4420 Database System Implementation Failure Recovery Ling Liu A sncizte Prnfess S An Cnllege nf Cnmpmjng Genrgjz Tech Today s Outline I Review oQuery Optimization Architecture lToday s Lecture oFailure Recovery DBIZ Query Optimization I Query Processing Algorithms o Iterative Approach nested loop 0 Sortbased o Hashbased o Indexbased I Query Optimization Architecture 0 Pipelining o Materialization DBIZ Pipelining I Many operations or certain implementations of them allow us to use pipeline 0 To accept one or both arguments in a stream without seeing the entire relation before starting I Materialization 0 When pipelining is not possible then the arguments must be materialized sorted on disk if it is large before beginning the processing DBIZ Pipelining VS Materalization Plan 1 pipelining Plan 2 materlization RENAME H N EENO WENO V 6W RESP quotManager E M W DBIZ Pipelining Summary Project Selection Duplication Removal grouping allow pipelining Intersection does not allow pipelining Nested loop join allows the outer argument to be pipelined but not the inner Sort or hashjoin allows pipelining of either argument but there are problems if both are pipelined o Sortjoin requires sharing of memory for sorting sublists o Hashj oin requires buffers for buckets of both relations in memory Index join allows pipelining of the nonindexed argument DBIZ Nested Loop I Iteration join conceptually for each r 6 R1 do for each s 6 R2 do if rC sC then output rs pair DBIZ Pipelining Summary Project Selection Duplication Removal grouping allow pipelining Intersection does not allow pipelining Nested loop join allows the outer argument to be pipelined but not the inner Sort or hashjoin allows pipelining of either argument but there are problems if both are pipelined o Sortjoin requires sharing of memory for sorting sublists o Hashj oin requires buffers for buckets of both relations in memory Index join allows pipelining of the nonindexed argument DBIZ Query Execution Cost Summary Access cost to secondary storage Search for data block index ReadWrite index and data blocks Storage cost IndeX and data blocks Intermediate les Computation cost Query planning Record search sort merge Actual transactionquery operations Communications cost Data transfer across a network DBIZ Access Plan I Access Plan is a concrete query processing plan which presents a detailed strategy for processing a query I The main cost factors to be considered include o the relational operations to be performed 0 indices to be used 0 the order in which tuples are to be accessed and o the order in which operations are to be performed DBIZ Statlstlcs I The following are kept in the system catalog for optimization purposes 0 File parameters block size 0 Number of tuples in each relation 0 Size oftuples 0 Key fields indices 0 Number of levels in an index 0 Highest key lowest key 0 Number of distinct values may be 0 Others frequency of operations join keys etc I All DBMSs keep the first four many keep all DBIZ Query Optimization Issues I Search Strategies 0 exhaustive search 9 optimal 9 combinatorial complexity in the number of relations 9 costbased o heuristics 9 not optimal 9 group common subexpressions 9 perform selection projection rst 9 replace a join by a series of semijoins 9 reorder operations to reduce intermediate a on Size 9 optimize individual operations DBIZ Query Optimization Issues I Optimization granularity 0 single query at a time 9 can not use common intermediate results 0 multiple queries at a time 9 efficient if many similar queries 9 decision space is much larger DBIZ Query Optimization Issues I Optimization timing 0 static 9 compilation 2 optimize prior to the execution 9 difficult to estimate the size of the intermediate results 2 error propagation 9 can amortize over many executions 0 dynamic 9 run time optimization 9 exact information on the intermediate relation sizes 9 have to reoptimize for multiple executions 0 hybrid 9 compile using a static algorithm 9 if the error in estimate sizes gt threshold reoptimize at run time mm CS 4420 Database System Implementation Midterm 1 Review Ling Lin Assncizte Prnfessnr Cnuege hr chmpmjhg Genrgia Tech mm Midterm 1 Coverage Chapter1 o Chapter 2 eSeetror 21to Seehor 2 4 o Chapter 3 eSeetror 315eehph 3 2 Seehor 3 4 Section 3 5 Chapter4 o Chapter 5 eSeetror 51to Seehor 5 3 5 o Chapter 6 eSeetror s1tp Seehor s 4 Section 6 s Seehor s 7 o chapter7 rSeetrorr 71to Seeuon 7 5 3 Section 7 7 to Section 7 7 5 D1311 Introductj on Data Storage Review I Memory Hierarchy o Cache onboardHevelZ 9 W between cache and processor lMG with lt10 nanoseconds 9 W between cache and memory 100 nanoseconds 0 Memory lOOMG 9 Random access Typical time to access data 10100 nanoseconds 0 Disk and Virtual Memory gt4GB 9 KiloMega Giga Tera Peta I Disk Organization Platter 2 surfaces Tracks Sectors gaps Cylinder disk head disk controller 0 O O 0 Seek time rotation speed rpm DB1 1 Lutroductj an Focus on Typical Diskquot Head Platter Head Actuator iACylmder Cylinder Track 39 Sector physical Block logical Gap Controller Introducti on Top View Lutroduc on Typical Numbers Diameter 1 inch gt 15 inches Cylinders 100 gt 2000 Surfaces 1 CDs gt Frackscyl 2 floppies gt 30 Sector Size 5123 gt 50K Capacity 360 KB old floppy gt 30 GB D1311 Introductj on Disk Access Time 39 block x in memory I want block X Time Seek Time Rotational Delay Transfer Time Other DB1 1 Lutroductj on Seek Time Time to position the head assembly at the proper cylinder Beginning at some value x for a Average Random Seek distance of one cylinder N N MEX Z Z SEEKTIME i gtj Tim K S 3921 1139 1 N NNl Typical S 10 ms gt 40 ms D1311 Introductj on Rotatlonal Del a1 The time for the disk to rotate so the first of the sectors containing the block reaches the head rotation latency A typical disk rotates completely takes lOms aquot f gt Head HereV Block I Want DB1 1 Lutroductj on Transfer Rate t I typical t l gt 3 NIBsecond I transfer time block size t D1311 Introductj on Data Storage QA I Question 1 Page 39 Exercise 221 The Megatron 777 disk has the following characteristics 0 10 surfaces with 10000 tracks each 0 Each track holds an average of 1000 sectors of 512 bytes 0 The time it takes the head to move 71 tracks is 1 0001n ms I What is the capacity of the disk 0 The disk has 10 10000 100000 tracks 0 The average track has 1000 512 512000 bytes 0 Thus the capacity is 512 gigabytes I What is the maximum seek time o The maximum seek time occurs when the heads have to move across all the tracks 0 Let n 9999 or 10000 starting from one Thus 10001n 1 000110000 11 milliseconds DB1 1 Lutroductj on Data Storage QA Consider a disk with an actual formatted capacity of 8 gigabytes 233 bytes The disk has 16 surfaces and 1024 tracks The disk rotates at 7200 rpm The average seek time is 9 ms The block size is 8K3 For each of the following questions state if there is enough information to answer If there is give the answer if not explain what is missing a What is the capacity of a single track 8GB161024 2 332 42 10 2019 bytestrack b Suppose we are reading a file F that occupies exactly one entire track Assume that a track can only be read starting at a particular position on the track How long does it take to read the entire le sequentially 215 ms 605 7200 83 ms rotation 9ms 832 c How long does it take to read a block Not enough information We need to know the percentage of overhead per track D1311 Introductj on BTrees I Generalizes Multilevel Index 0 Number of levels varies with size of the data le being indexed but often 3 0 Useful for primary secondary indexes l B Tree 0 All nodes have the same format n keys n1 pointers l Btree rules for tree of order n 1 All leaves at same lowest level balanced tree 2 Pointers in leaves point to records except for sequence pointer DB1 1 Lutroductj on 3 Number of pointerskeys for Btree Max Max Min Min ptrs keys ptrssdata keys Wiggly 1 n Fn121 Fn1211 nolhq got 1 n Ln12J Ln12J Root n1 n 1 1 Note by Hector GarciarMolina D1311 Introduction Full node min node f H Nonleaf O O O N In no 8 H H H E C Leaf g m Ln 2 1 E 39 E E Note by Hector GarciarMolina 15 BEN Introduction BTree Indexing I Find anyall Violations of a B tree structure inthe following diagram Circle each bad node and give a brief explanation of each error Assume the order of the tree is 4 014 4 keys 5 pointers 77751 3637 4043 4749 5255 16 DBll Introducti on BTree Indexing A database system uses a variant of B trees to maintain sorted tables In this variant the leaves contain the records themselves not pointers to the records A leaf can contain as many records as will fit in a block allowing for a sequence pointer Nonleaf nodes are also one block in size and contain as many keys and appropriate pointers as will fit Assume the following I 1 Blocks are 4096 bytes I Each record is 300 bytes long I 3 A block pointer is 10 bytes I 4 A record pointer is 12 bytes I 5 A key for the index is 8 bytes long I The nodes are 85 occupied For example if a leaf can hold 100 records it will only hold 85 If a nonleaf can hold 100 keys it will only hold 85 For 85 calculations round to the nearest integer I 7 The indexed file has 1000000 records How many blocks will this index use 17 DB1 l Lutroductj 0n BTree lndex1ng Answer 1 How many records a leave node can hold 409610300l3 records If leaves are 85 occupied we can hold 11 records per node 2 How many records the B tree leaves can hold 90910 bIOCkS 14mm m x 11 94mm 3 How many keys an interior node can hold For interior nodes we need to determine the value of n from nl10n8 lt 4096 which gives us n227 D1311 Introductj on BTree Indexing Answer 4 How about the height of the index tree The leaf level will require 469 nodes The level above that will require 3 nodes which are all children of the root Then the root 5 What is the total The total is 909 104693 19 1 383 blocks DB1 1 Lutroductj on Hashing QuestionAnswer Consider hash indexes with the following characteristics 1 A block can hold 50 valuepointer pairs The hash structure does not contain the records themselves only pointers to them 2 The file being indexed has 1000 records 3 The indexed field can contain duplicates Questions a For a linear hash index with an average occupancy utilization of 50 how many blocks will the buckets require in the worstcase b In case a what is the worstcase number of IOs to look up a value including the record itself c For an extensible hash index what is the worstcase largest possible size of the directory If there is no limit state so d For an extensible hash index what is the bestcase smallest possible size of the directory Hashing QuestionAnswer a For a linear hash index with an average occupancy utilization of 50 how many blocks will the buckets require in the worstcase Worst case is when all records have same key value They will placed in one block plus overflow blocks a total of 100050 20 blocks To get 50 utilization the hash table must have 40 blocks 39 empty and one full with 19 oven low chained b In case a what is the worstcase number of IOs to look up a value including the record itself 21 lOs 1 to get to bucket plus 19 to get to last block in the overflow chair plus 1 to get to record Hashing QuestionAnswer c For an extensible hash index What is the worst case largest possible size of the directory Is there a limit There is no limit worst case is again when all records have same key Common Mistakes o trying to calculate a limit based on buckets not realizing that there is no bucket limit gt thus no directory limit D1311 Introduclj on Hashing QuestionAnswer d For an extensible hash index what is the bestcase smallest possible size of the directory In the best case directory has 32 entries 20 blocks are needed to hold data so 16 entries are not enough to point to these 20 blocks Common Mistakes 0 not realizing that the directory must be a power of 2 I giving the size as 5 instead of 25 32 I trying to calculate size with blocks instead of the number of entries Que ProcessingOptimization SQL query a parse tree i l answer 7 l logical query plan 1 I Pi statistics 39 PEI best P1c1P2c2T est i mate costs improved lq r stimaiteiir39eisn ti itz lqp sizes I consider physical plan s 7 P1P2 NotebyHector GarciarMolina D1311 Introductj on Relational Query Optimization I High level Transformationbased Optimization Logical Query Plans 0 Algebraic Transformation or Query rewriting 0 Goal Reduce the search space of logical query plans 0 Mechanisms Heuristics for good transformations 9 Push Selection and Projection closer to the base relations in a SQL query ee 9 Reduce intermediate result size as much as possible Low Level costbased Optimization Executable Query Plans 0 Mechanism 9 Cost estimation using intermediate result size estimation and 10 cost estimation 9 Consider various indexes and their performance characteristics when estimating costs of alternative query p ans 0 Goal Pick the most economical physical query plan for query executron Estimating cost of query plan 1 Estimating size of results Selection Projection loin 2 Estimating of 105 I Count of disk blocks that must be read or written to execute query plan I Factors considered 0 Relations Contiguous or not 0 Index or not 0 Hash or not I Factors affecting the Ios 0 Join ordering 0 Join Algorithm used DBll Estimating IO costs lKeep statistics for relation R oTR tuples in R oSR of bytes in each R tuple oBR of blocks to hold all Rtuples 0VR A distinct values in R for attribute A o BCR of blocks containing R tuples o fR max of tuples ofR per block 0 M memory blocks available o HTi levels in index I o Bi of leaf blocks in index i Introducti on DBll Important Contents Query Processing Steps Logical Plan generation 0 Algebraic Transforms 0 Good Transformation TwoPhase Query Optimization Cost Estimation 0 Result size 0 10s Lutroductj on DB1 1 Introductj on Algebraic Transformations QA I Is this algebraic transformation correct 05E1E2 05E1 05E2 Answer 0 No It is false Pushing the select may remove additional tuples DB1 1 Lutroductj on Algebraic Transformations QA I Assume that we have two relations R and S Let p be a predicate containing only R attributes Let q be a predicate containing only S attributes Let m be a predicate containing only attributes from both R and S Demonstrate how to derive the following relational algebra equivalence rules from simpler rules You may use any rule mentioned in the book or in lecture but state explicitly which rule you are using 1 6R M 8 cmuch gtlt lt5qu 2 opvqR M S ch M SURIgtlt1 cqsn DBll Introducti on Query Algebra QA I Write relational algebra expressions for the following SQL queries a Find books ordered by customer 53 SELECT Title Author FROM Oust Order Book WHERE CustID 53 b Find other books purchased by the same customers that purchased book 1085 SELECT BTitle BAuthor BBookID FROM Book B Order Q Order R WHERE QBookID1085 AND QCustIDRCustrD AND RBookIDBBookID AND BBookID ltgt 1085 DBll Lutroducti on Logical Query Plan Generation I Given a query Find books by Oprah Winfrey bought by customers in California for under 20 nTitleBoole GState CA Author 0prah WinfrenyPricelt20 Cust gtlt1 Order gtlt Book For this relational algebra expressions 1 Show the logical query plan for the given expression drawn as a tree of relational operators with relations at the leaves 2 Transform the logical query plan you have drawn by pushing selections and projections down as far as you can 3 Write the transformed query plan with the selections and projections pushed down as a relational algebra expression DBll Introducti on Cost Estimation Estimate intermediate result size for typical operations 0 Selection 0 Projection 0 Join Estimate 10s for typical operations and implementation algorithms 0 Selection Projection 0 Join 9 iterative sortmerge index hash 0 Sort DBll Lutroducti an Example Question Query processors tend to use heuristic query optimization That is the optimal query plan is not found but heuristics are used to generate a plan that is not too bad In 100 words or less sketch an algorithm for nding the optimal physical query plan for a given logical query plan Are DBMS vendors likely to adopt your algorithm Why or why not

### BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.

### You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

## Why people love StudySoup

#### "There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

#### "I signed up to be an Elite Notetaker with 2 of my sorority sisters this semester. We just posted our notes weekly and were each making over $600 per month. I LOVE StudySoup!"

#### "There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

#### "Their 'Elite Notetakers' are making over $1,200/month in sales by creating high quality content that helps their classmates in a time of need."

### Refund Policy

#### STUDYSOUP CANCELLATION POLICY

All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email support@studysoup.com

#### STUDYSOUP REFUND POLICY

StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here: support@studysoup.com

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to support@studysoup.com

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.