Introduction to Database Systems
Introduction to Database Systems CS 4320
Popular in Course
Popular in ComputerScienence
This 113 page Class Notes was uploaded by Lacey Collier on Saturday September 26, 2015. The Class Notes belongs to CS 4320 at Cornell University taught by Staff in Fall. Since its upload, it has received 13 views. For similar materials see /class/214334/cs-4320-cornell-university in ComputerScienence at Cornell University.
Reviews for Introduction to Database Systems
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/26/15
SQL Queries Constraints Triggers RampG Chapter 5 CS4320 Example Instances ozo We will use these instances of the Sailors and Reserves relations in our examples ozo If the key for the Reserves relation contained only the attributes Sid and bid how would the semantics differ CS4320 SI 82 R1 m Lay 22 101 101096 58 103 111296 sname rating age 22 dustin 7 45 0 31 lubber 8 55 5 58 rusty 10 35 0 sname rating age 28 yuppy 9 35 0 3 1 lubber 8 55 5 44 guppy 5 35 0 5 8 rusty 10 3 5 0 Basic SQL Query SELECT DISTINCT targetlist FROM relationlist WHERE quali cation oz relation list A list of relation names possibly with a rangevariable after each name oz targetlist A list of attributes of relations in relationlist oz Quali cation Comparisons Attr 0 const or Attr1 0p Attr2 where 0 is one of lt gt S 2 75 combined using AND OR and NOT oz DISTINCT is an optional keyword indicating that the answer should not contain duplicates Default is that duplicates are n0t eliminated C84320 3 Conceptual Evaluation Strategy oz Semantics of an SQL query defined in terms of the following conceptual evaluation strategy Compute the crossproduct of relationlist Discard resulting tuples if they fail quali cations Delete attributes that are not in targetlist If DISTINCT is specified eliminate duplicate rows oz This strategy is probably the least efficient way to compute a query An optimizer will find more efficient strategies to compute the same answers CS4320 Example of Conceptual Evaluation SELECT Ssname FROM Sailors S Reserves R WHERE SsidRsid AND Rbid103 Sid sname rating age Sid bid day 22 dustin 7 450 22 101 10 10 96 22 dustin 7 450 58 103 111296 31 lubber 8 555 22 101 10 10 96 31 lubber 8 555 58 103 11 12 96 58 rusty 10 350 22 101 10 10 96 58 rusty 10 350 58 103 11 12 96 CS4320 A Note on Range Variables ozo Really needed only if the same relation appears twice in the FROM clause The previous query can also be written as SELECT Ssr1arne FROM Sailors S Reserves R 13 30051 Style WHERE SsidRsid AND bid103 0WD 0 use range variables OR SELECT snarne always FROM Sailors Reserves WHERE SailorssidReservessid AND bid103 CS4320 Find sailors who ve reserved at least one boat SELECT Ssid FROM Sailors S Reserves R WHERE SsidRsid oz Would adding DISTINCT to this query make a difference ozo What is the effect of replacing Ssid by Ssname in the SELECT clause Would adding DISTINCT to this variant of the query make a difference CS4320 Expressions and Strings SELECT Sage age1Sage5 2Sage AS age2 FROM Sailors S WHERE Ssname LIKE B B oz Illustrates use of arithmetic expressions and string pattern matching Find triples of ages of sailors and two elds de ned by expressions for sailors whose names begin and end with B and contain at least three characters oz AS and are two ways to name fields in result oz LIKE is used for string matching stands for any one character and stands for 0 or more arbitrary characters CS4320 Find sid s of sailors who ve reserved a red g a green boat UNION Can be used to compute the union of any two unioncompatible sets of tuples which are themselves the result of SQL queries If we replace OR by AND in the first version what do we get Also available EXCEPT What do we get if we replace UNION by EXCEPT CS4320 SELECT Ssid FROM Sailors S Boats B Reserves R WHERE SsidRsid AND RbidBbid AND Bcolor red OR Bcolor green SELECT Ssid FROM Sailors S Boats B Reserves R WHERE SsidRsid AND RbidBbid AND Bcolor red UNION SELECT Ssid FROM Sailors S Boats B Reserves R WHERE SsidRsid AND RbidBbid AND Bcolor green Find sid s of sailors who ve reserved a red and a green boat SELECT SSid FROM Sailors 8 Boats B1 Reserves R1 oz INTERSECT Can be used to Boats B2 Reserves R2 Compute the intersection WHERE SSidR1Sid AND R1bidB1bid AND SsidR2sid AND R2bidB2bid of an two union y AND B1c010r red AND B2c010r green compatible sets 0ftup1es ozo Included in the SQL 92 SELECT Ssid Key field standard but some FROM Sailors 8 Boats B Reserves R Systems don39t support it WHERE SsidRsid AND RbidBbid AND Bc010r red oz Contrast symmetry of the INTERSECT UNION and INTERSECT SELECT Ssid queries with how much FROM Sailors 8 Boats B Reserves R the other versions WHERE SSidRSid AND RbidZBbid AND Bc010r green C84320 10 Nested Queries Find names of sailors who ve reserved boat 103 SELECT Ssnarne FROM Sailors S WHERE Ssid IN SELECT Rsid FROM Reserves R WHERE Rbid103 ozo A very powerful feature of SQL a WHERE clause can itself contain an SQL query Actually so can FROM and HAVING clauses ozo To find sailors who ve not reserved 103 use NOT IN oz To understand semantics of nested queries think of a nested looys evaluation For each Sailors tuple check the quali cation by computing the subauery C84320 11 Nested Queries with Correlation Find names ofsczilors who ve reserved boat 103 SELECT Ssnarne FROM Sailors S WHERE EXISTS W FROM Reserves R WHERE Rbid103 AND SsidRsid oz EXISTS is another set comparison operator like IN ozo If UNIQUE is used and is replaced by Rbid finds sailors with at most one reservation for boat 103 UNIQUE checks for duplicate tuples denotes all attributes Why do we have to replace by Rbid ozo Illustrates why in general subquery must be re computed for each Sailors tuple C84320 12 More on SetComparison Operators oz We ve already seen 1N EXISTS and UNIQUE Can also use NOT IN NOT EXISTS and NOT UNIQUE ozo Also available 0 ANY 0p ALL gtltZS ozo Find sailors whose rating is greater than that of some sailor called Horatio SELECT FROM Sailors S WHERE Sratir1g gt ANY SELECT S2ratir1g FROM Sailors SZ WHERE 52sr1ame Horatio C84320 13 Rewriting INT ERSECT Queries Using IN Find sid s of sailors who ve reserved both a red and a green boat SELECT Ssid FROM Sailors S Boats B Reserves R WHERE SsidRsid AND RbidBbid AND Bcolor red AND Ssid IN SELECT S2sid FROM Sailors S2 Boats B2 Reserves R2 WHERE S2sidR2sid AND R2bidB2bid AND B2color green ozo Similarly EXCEPT queries rewritten using NOT IN oz To find names not sid s of Sailors who ve reserved both red and green boats just replace Ssid by Ssnmne in SELECT clause What about INTERSECT query CS4320 14 1 SELECT Ssname FROM Sailors S DZUZSZOTL Zn WHERE NOT EXISTS SELECT Bde FROM Boats B Find sailors who ve reserved all boats EXCEPT SELECT Rb1d oz Let s do 1t the hard FROM Reserves R way without EXCEPT WHERE R3951d53951d 2 SELECT Ssr1arne FROM Sailors S WHERE NOT EXISTS SELECT Bbid FROM Boats B Sailors 5 such that WHERE NOT EXISTS SELECT Rbid FROM Reserves R there is no boat B without WHERE RbidBbld AND RsidSsid a Reserves tuple showing 8 reserved B C84320 15 Aggregate Operators COUNT COUNT DISTINCT A SUM DISTINCT A f AVG DISTINCT A Significant extensmn o M AX A relational algebra MIN A SELECT COUNT K single column FROM sailors 5 SELECT Ssr1arne FROM Sailors S SELECT AVG sage WHERE Sratir1g SELECT MAX52rating FROM Sailors S FROM S 1 52 WHERE Sratir1g10 al ors SELECT COUNT DISTINCT Sratir1g SELECT AVG DISTINCT Sage FROM Sa OrS 5 FROM Sailors S WHERE Ssr1arne Bob WHERE srating10 CS4320 16 Find name and age of the oldest sailors ozo The first query is illegal We ll look into the reason a bit later when we discuss GROUP BY ozo The third query is equivalent to the second query and is allowed in the SQL 92 standard but is not supported in some systems CS4320 SELECT Ssnarne MAX Sage FROM Sailors S SELECT Ssnarne Sage FROM Sailors S WHERE Sage SELECT MAX S2age FROM Sailors S2 SELECT Ssnarne Sage FROM Sailors S WHERE SELECT MAX S2age FROM Sailors S2 Sage 17 Motivation for Grouping oz So far we ve applied aggregate operators to all qualifying tuples Sometimes we want to apply them to each of several groups of tuples oz Consider Find the age of the youngest sailor for each rating level In general we don t know how many rating levels exist and what the rating values for these levels are Suppose we know that rating values go from 1 to 10 we can write 10 queries that look like this SELECT MIN Sage PDT 1 1 2 I 101 FROM Sailors S WHERE Srating i CS4320 18 Queries With GROUP BY and HAVING SELECT DISTINCT targetlist FROM relationlist WHERE quali cation GROUP BY groupinglist HAVING groupqualification oz The targetlist contains i attribute names ii terms with aggregate operations e g MIN Snge The attribute list i must be a subset of groupinglist Intuitively each answer tuple corresponds to a group and these attributes must have a single value per group A group is a set of tuples that have the same value for all attributes in groupinglist C84320 19 Conceptual Evaluation oz The crossproduct of relationlist is computed tuples that fail quali cation are discarded annecessary fields are deleted and the remaining tuples are partitioned into groups by the value of attributes in groupinglist oz The groupquali cation is then applied to eliminate some groups Expressions in groupquali cation must have a single value per group In effect an attribute in groupquali cation that is not an argument of an aggregate op also appears in groupinglist SQL does not exploit primary key semantics here oz One answer tuple is generated per qualifying group C84320 20 Find age of the youngest sailor with age 2 18 for each rating with at least 2 such sailors SELECT Srating MIN Sage AS minage FROM Sailors S WHERE Sage gt 18 GROUP BY Srating HAVING COUNT gt 1 Sailors instance Answer relation CS4320 sname rating age 22 dustin 7 450 29 brutus 1 330 31 lubber 8 555 32 andy 8 255 58 rusty 10 350 64 horatio 7 350 71 zorba 10 160 74 horatio 9 350 85 art 3 255 95 bob 3 635 96 frodo 3 255 21 Find age of the youngest sailor with age 2 18 for each rating with at least 2 such sailors rating age 7 1 8 8 10 7 10 WWWO 450 330 555 255 350 350 160 350 255 635 255 rating age 1 330 255 635 255 450 350 555 255 OOOOUJUJUJ 350 350 CS4320 q rating minage 3 255 7 350 8 255 22 CS4320 oz Studentsid age univname deptname ozo Avg age of students for each dept oz Select univname deptname avgage ozo From student ozo Group by univname deptname 23 Find age of the youngest sailor with age 218 for each rating with at least 2 such sailors and with every sailor under 60 HAVING COUNT gt 1 AND EVERY Sage lt60 rating age 7 450 1 330 8 555 8 255 10 350 7 350 10 160 9 350 3 255 3 635 3 255 CS4320 q rating age 1 330 255 635 255 450 350 555 255 OOOOIUJUJUJ 350 t x O 350 rating minage 7 350 8 255 What is the result of changing EVERY to ANY 24 Find age of the youngest sailor with age 2 18 for each rating with at least 2 sailors between 18 and 60 SELECT Srating MIN Sage sailors msmnce AS minage sname rating age FROM Sailors S 22 dustin 7 450 WHERE Sage gt 18 AND Sage lt 60 29 bmtus 1 330 GROUP BY Slatng 31 lubber 8 555 HAVING COUNT gt 1 32 andy 8 255 58 rusty 10 350 64 horatio 7 350 71 zorba 10 160 Answer relation 74 horatio 9 350 85 art 3 255 95 bob 3 635 96 frodo 3 255 CS4320 For each red boat nd the number of reservations for this boat SELECT Bbid COUNT AS scount FROM Sailors S Boats B Reserves R WHERE SsidRsid AND RbidBbid AND Bcolor red GROUP BY Bbid ozo Grouping over a join of three relations oz What do we get if we remove Beolor red from the WHERE clause and add a HAVING clause with this condition oz What if we drop Sailors and the condition involving Ssid C84320 26 Find age of the youngest sailor with age gt 18 for each rating with at least 2 sailors ofany age SELECT Srating MIN Sage FROM Sailors S WHERE Sage gt 18 GROUP BY Srating HAVING 1 lt SELECT COUNT FROM Sailors S2 WHERE SratingS2rating ozo Shows HAVING clause can also contain a subquery ozo Compare this with the query where we considered only ratings with 2 sailors over 18 ozo What if HAVING clause is replaced by HAVING COUNT gt1 C84320 27 Find those ratings for which the average age is the minimum over all ratings oz Aggregate operations cannot be nested WRONG SELECT Srating FROM Sailors S WHERE Sage SELECT MIN AVG S2age FROM Sailors S2 oz Correct solution in SQL 92 SELECT Temprating Tempavgage FROM SELECT Srating AVG Sage AS avgage FROM Sailors S GROUP BY Srating AS Temp WHERE Tempavgage SELECT MIN Tempavgage FROM Temp C84320 28 Null Values ozo Field values in a tuple are sometimes unknown e g a rating has not been assigned or inapplicable e g no spouse s name SQL provides a special value MU for such situations oz The presence of null complicates many issues E g Special operators needed to check if value is is not null ls rutinggt8 true or false when rating is equal to null What about AND OR and NOT connectives We need a 3valued logic true false and unknown Meaning of constructs must be defined carefully e g WHERE clause eliminates rows that don t evaluate to true New operators in particular outer joins possible needed C84320 29 Integrity Constraints Review ozo An 1C describes conditions that every legal instance of a relational database must satisfy Inserts deletes updates that violate IC s are disallowed Can be used to ensure application semantics e g sid is a key or prevent inconsistencies e g snnme has to be a string age must be lt 200 oz lgges of I C s Domain constraints primary key constraints foreign key constraints general constraints Domain constraints Field values must be of right type Always enforced C84320 30 General Constraints oz Useful when more general ICs than keys are involved oz Can use queries to express constraint oz Constraints can be named CS4320 CREATE TABLE Sailors sid INTEGER snarne CHAR10 rating INTEGER age REAL PRIMARY KEY sid CHECK rating gt 1 AND rating lt 10 CREATE TABLE Reserves snarne CHAR10 bid INTEGER day DATE PRIMARY KEY bidday CONSTRAINT noInterlakeRes CHECK Interlake ltgt SELECT Bbnarne FROM Boats B WHERE Bbidbid 31 Constraints Over Multiple Relations CREATE TABLE Sailors Sid INTEGER Number ofbouts sname CHAR10 plus number of AWkward and rating INTEGER sailors is lt 100 wrong age REAL If Sallortshls PRIMARY KEY sid empty39 e CHECK number of Boats tuples can be anything a ASSERTION is the right solution CREATE ASSERTION smallClub not associated CHECK with either table SELECT COUNT Ssid FROM Sailors S SELECT COUNT Bbid FROM Boats B lt 100 C84320 32 SELECT COUNT Ssid FROM Sailors S SELECT COUNT Bbid FROM Boats B lt 100 Triggers oz Trigger procedure that starts automatically if specified changes occur to the DBMS ozo Three parts Event activates the trigger Condition tests whether the triggers should run Action what happens if the trigger runs C84320 33 Triggers Example SQL1999 CREATE TRIGGER youngSailorUpdate AFTER INSERT ON SAILORS REFERENCING NEW TABLE NewSaiIors FOR EACH STATEMENT INSERT INTO Y0ungSaiIorssid name age rating SELECT Sid name age rating FROM NewSaiIors N WHERE Nage lt 18 CS4320 34 Summary ozo SQL was an important factor in the early acceptance of the relational model more natural than earlier procedural query languages oz Relationally complete in fact significantly more expressive power than relational algebra oz Even queries that can be expressed in RA can often be expressed more naturally in SQL oz Many alternative ways to write a query optimizer should look for most efficient evaluation plan In practice users need to be aware of how queries are optimized and evaluated for best results C84320 35 Summary Corlth ozo NULL for unknown field values brings many complications oz SQL allows specification of rich integrity constraints ozo Triggers respond to changes in the database C84320 36 Crash Recovery RampG Chapter 18 CS4320 The ACID properties 3 A tornicity All actions in the Xact happen or none happen st C onsistency If each Xact is consistent and the DB starts consistent it ends up consistent st I solation Execution of one Xact is isolated from that of other Xacts st D urability If a Xact commits its effects persist oz The Recovery Manager guarantees Atornicity amp Durability CS4320 Motivation ozo Atomicity I Transactions may abort Rollback ozo Durability I What if DBMS stops running Causes ozo Desired Behavior after 1 system restarts crash T1 T1 T2 amp T3 should be T2 durable T3 I T4 amp T5 should be T4 I aborted effects not seen T5 C84320 3 CS4320 Assumptions oz Concurrency control is in effect I Strict 2PL in particular oz Updates are happening quotin place I ie data is overwritten 0n deleted from the disk ozo A simple scheme to guarantee Atomicity 8 Durability Handling the Bu er Pool ozo Force every write to disk I Poor response time No Steal Steal I But provides durability F I I oz Steal bufferpool frames orce Tr39v39al from uncommited Xacts If not poor throughput No Force I If so how can we ensure atomicity C84320 5 More on Steal and Force oz STEAL why enforcing Atomicity is hard I To steal frame F Current page in F say P is written to disk some Xact holds lock on P I What if the Xact with the lock on P aborts I Must remember the old value of P at steal time to support UNDOing the write to page P oz NO FORCE why enforcing Durability is hard I What if system crashes before a modified page is written to disk I Write as little as possible in a convenient place at commit timeto support REDOing modifications CS4320 Basic Idea Logging oz Record REDO and UNDO information for every update in a log I Sequential writes to log put it on a separate disk I Minimal info diff written to log so multiple updates fit in a single log page ozo Qg An ordered list of REDO UNDO actions I Log record contains ltXID pageID offset length old data new datagt I and additional control info which we ll see soon CS4320 WriteAhead Logging WAL ozo The WriteAhead Logging Protocol 6 Must force the log record for an update before the corresponding data page gets to disk Must write all log records for a Xact before commit ozo 1 guarantees Atomicity ozo 2 guarantees Durability ozo Exactly how is logging and recovery done I We ll study the ARIES algorithms CS4320 WALa E M the L08 LSNs pageLSNs ushedLSN ozo Each log record has a unique Log Sequence Number LSN Log records flushed to disk I LSNS always increasmg oz Each data page contains a pageLSN I The LSN 0f the most recent log record for an update to that page oz System keeps track of ushedLSN 7 I rll II I The max LSN ushed so far page Fog tall 1n RAM oz WAL Before a page is written I pageLSN S ushedLSN C84320 9 Log Records Possible log record types LogRecord fields 4 Update prevLSN z Commit XID ozo Abort type pagelD oz End signifies end of update length commit or abort records Offset oz Compensation Log only before39mage Records CLRs afterimage I for UNDO actions C84320 10 Other LogRelated State ozo Transaction Table I One entry per active Xact I Contains XID status running cornrnited aborted and lastLSN ozo Dirty Page Table I One entry per dirty page in buffer pool I Contains recLSN the LSN of the log record which first caused the page to be dirty C84320 11 Normal Execution ofcm Xact oz Series of reads 8 writes followed by commit or abort I We will assume that write is atomic on disk 39 In practice additional details to deal with nonatomic writes ozo Strict 2PL oz STEAL NOFORCE buffer management with WriteAhead Logging C84320 12 Checkpoin ting ozo Periodically the DBMS creates a checkpoint in order to minimize the time taken to recover in the event of a system crash Write to log I begincheckpoint record Indicates when chkpt began I endcheckpoint record Contains current Xact table and dirty page table This is a fuzzy checkpoint I Other Xacts continue to run so these tables accurate only as of the time of the begincheckpoint record I No attempt to force dirty pages to disk effectiveness of checkpoint limited by oldest unwritten change to a dirty page So it s a good idea to periodically ush dirty pages to disk I Store LSN of chkpt record in a safe place master record C84320 13 The Big Picture What s Stored Where LogRecords u Xact Table FZEVLSN Data pages lastLSN each status length pageLSN Dirty Page Table offset recLSN beforeimage master record afterimage flushedLSN C84320 14 Simple Transaction Abort ozo For now consider an explicit abort of a Xact I No crash involved oz We want to play back the log in reverse order UNDOing updates I Get lastLSN of Xact from Xact table I Can follow chain of log records backward via the prevLSN field I Before starting UNDO write an Abort log record I For recovering from crash during UNDO C84320 15 Q Abort cont 90 ext ozo To perform UNDO must have a lock on data I No problem ozo Before restoring old value of a page write a CLR I You continue logging while you UNDO I CLR has one extra field undonextLSN I Points to the next LSN to undo ie the preVLSN of the record we re currently undoing I CLRs never Undone but they might be Redone when repeating history guarantees Atomicity ozo At end of UNDO write an quotendquot log record C84320 16 CS4320 Transaction Commit ozo Write commit record to log ozo All log records up to Xact s lastLSN are ushed I Guarantees that ushedLSN 2 lastLSN I Note that log ushes are sequential synchronous writes to disk I Many log records per log page ozo Commit returns ozo Write end record to log 17 Crash Recovery Big Picture Oldest log rec of Xact active at crash Smallest recLSN in dirty page table after Analysis Last chkpt CRASH l quot CS4320 oz Start from a checkpoint found Via master record ozo Three phases Need to Figure out which Xacts committed since checkpoint which failed Analysis REDO all actions 9 repeat history UNDO effects of failed Xacts 18 Recovery The Analysis Phase oz Reconstruct state at checkpoint I via er1dchec1ltpoir1t record ozo Scan log forward from checkpoint I End record Remove Xact from Xact table I Other records Add Xact to Xact table set 1astLSNLSN change Xact status on commit I Update record If P not in Dirty Page Table I Add P to DPT set its recLSNLSN C84320 19 Recovery The REDO Phase oz We repeat History to reconstruct state at crash I Reapply all updates even of aborted Xactsl redo CLRs ozo Scan forward from log rec containing smallest recLSN in DPT For each CLR or update log rec LSN REDO the action unless I Affected page is not in the Dirty Page Table or I Affected page is in DPT but has recLSN gt LSN or I pageLSN in DB 2 LSN ozo To REDO an action I Reapply logged action I Set pageLSN to LSN No additional logging C84320 20 CS4320 Recovery The UNDO Phase ToUndo l l a lastLSN of a quotloserquot Xact Repeat I Choose largest LSN among TOUr1d0 I If this LSN is a CLR and undonextLSNNULL 0 Write an End record for this Xact I If this LSN is a CLR and undonextLSN NULL 0 Add undonextLSN t0 TOUr1d0 I Else this LSN is an update Undo the update write a CLR add prevLSN t0 TOUr1d0 Until ToUndo is empty 21 Example of Recovery RAM Xact Table lastLSN status Dirty Page Table recLSN flushed LSN ToUndo CS4320 LSN LOG 00 begincheckpoint 05 endchec1ltp0int 10 update T1 writes P5 20 update T2 writes P3 prevLSNs 30 T1 abort 40 CLR Uncgr i rs f io 45 5 T1 End 50 update T3 writes P1 60 update T2 writes P5 gt5lt CRASH RESTART 22 Example Crash During Restart RAM Xact Table lastLSN status Dirty Page Table recLSN flushed LSN ToUndO CS4320 LSN 0005 10 20 30 4045 50 60 70 8085 90 LOG beginchec1ltp0int endchec1ltp0int update T1 writes P5 update T2 writes P3 T1 abort update T3 writes P1 update T2 writes P5 Ilt CRASH RESTART 39 CLR Undo T2 LSN 60 CLR Undo T3 LSN 50 T3 end lt CRASH RESTART CLR Undo T2 LSN 20 T2 end CLR Undo T1 LSN 10 T1 End undonextLSN 23 Additional Crash Issues oz What happens if system crashes during Analysis During REDO oz How do you limit the amount of work in REDO I Flush asynchronously in the background I Watch hot spots oz How do you limit the amount of work in UNDO I Avoid longrunning Xacts C84320 24 CS4320 Summary of LoggingRecovery oz Recovery Manager guarantees Atomicity 8 Durability oz Use WAL to allow STEALNOFORCE w o sacrificing correctness ozo LSNs identify log records linked into backwards chains per transaction Via preVLSN oz pageLSN allows comparison of data page and log records 25 CS4320 Summary Cont oz Checkpointing A quick way to limit the amount of log to scan on recovery ozo Recovery works in 3 phases I Analysis Forward from checkpoint I Redo Forward from oldest recLSN I Undo Backward from end to first LSN of oldest Xact alive at crash oz Upon Undo write CLRs oz Redo repeats history Simplifies the logic 26 Schema Re nement and Normal Forms RampG Chapter 19 CS4320 The Evils of Redundancy oz Redundancy is at the root of several problems associated with relational schemas redundant storage insert delete update anomalies ozo Integrity constraints in particular functional dependencies can be used to identify schemas with such problems and to suggest refinements ozo Main refinement technique decomposition replacing ABCD with say AB and BCD or ACD and ABD oz Decomposition should be used judiciously Is there reason to decompose a relation What problems if any does the decomposition cause CS4320 Functional Dependencies FDs ozo A functional dependency X gt Y holds over relation R if for every allowable instance r of R t1 r tZE r 7Z39Xt1 7Z39X t2 implies 7Z39Yt1 7Z39Y t2 ie given two tuples in r if the X values agree then the Y values must also agree X and Y are sets of attributes ozo An FD is a statement about all allowable relations Must be identified based on semantics of application Given some allowable instance r1 of R we can check if it violates some FD f but we cannot tell if f holds over R oz K is a candidate key for R means that K gt R However K R does not require K to be minimal C84320 3 Example Constraints on Entity Set oz Consider relation obtained from HourlyEmps HourlyEmps name lot rating hrlywages hrsworked oz Notation We will denote this relation schema by listing the attributes SNLRWH This is really the set of attributes SNLRWH Sometimes we will refer to all attributes of a relation by using the relation name eg HourlyEmps for SNLRWH ozo Some FDs on HourlyEmps ssn is the key S gt SNLRWH rating determines hrlywages R gt W CS4320 RW Wa es Example Corlth g 8 10 H II Em 2 5 7 2 Problems due to R W Du y p8 N R Ugdate anomaly Can we changewmjust AttlShOO 8 the lst tuple 0fSNLRWH 231315368 Smiley 22 8 30 War Whatif 131243650 Smethurst 35 5 30 we want to msert an employee and don know Gu1du 5 the hourly wage for his 612674134 Madayan 35 8 4O 39 7 ratmg N L R W H Deletzon anomaly If we delete all employees with 123223666 Att1sh00 48 8 10 40 rating 5 we lose the 231315368 Smiley 22 8 10 3O mormatlon bout e 131243650 Smethurst 35 5 7 30 wage for ratmg 5 434263751 Guldu 35 5 7 32 Will2 smaller tables be better 612674134 Madayan 35 8 10 40 Reasoning About FDs ozo Given some FDs we can usually infer additional FDs ssn did did lot implies ssn lot ozo An FD f is imglied by a set of PBS P if f holds whenever all FDs in F hold F Closure of F is the set of all FDs that are implied by F ozo Armstrong s Axioms X Y Z are sets of attributes Re exioity If X E Y then Y gt X Augmentation If X gt Y then XZ YZ for any Z Tronsitioity If X Y and Y Z then X Z oz These are sound and complete inference rules for FDs CS4320 Reasoning About FDs Contd oz Couple of additional rules that follow from AA Union le gtY and X gt Z then X gt YZ Decomposition IfX gt YZ then X gt Y and X gt Z oz Example Contractscidsidjiddidpidqtyonlne and C is the key C CSJDPQV Project purchases each part using single contract JP gt C Dept purchases at most one part from a supplier SD gt P ozo JP gt C C gt CSJDPQV imply JP gt CSJDPQV ozo SD gt P implies SDJ gt JP ozo SDJ gt JP JP gt CSJDPQV imply SDJ gt CSJDPQV C84320 7 Reasoning About FDs Cortth oz Computing the closure of a set of FDs can be expensive Size of closure is exponential in attrs oz Typically we just want to check if a given FD X gt Y is in the closure of a set of FDs F An efficient check Compute attribute closure of X denoted X wrt F 0 Set of all attributes A such that X gt A is in F 0 There is a linear time algorithm to compute this Check if Y is in X ozo Does F A gt B B gt C C D gt E imply A gt E ie is A gt E in the closure F Jr Equivalently is E in 14 C84320 8 Normal Forms oz Returning to the issue of schema refinement the first question to ask is whether any refinement is needed ozo If a relation is in a certain normal form BCNF 3NF etc it is known that certain kinds of problems are avoided minimized This can be used to help us decide whether decomposing the relation will help ozo Role of FDs in detecting redundancy Consider a relation R with 3 attributes ABC 0 No FDs hold There is no redundancy here 0 Given A gt B Several tuples could have the same A value and if so they ll all have the same B value CS4320 BoyceCodd Normal Form B CNF ozo Reln R with FDs F is in BCNF if for all X gt A in F A E X called a trivial FD or X contains a key for R ozo In other words R is in BCNF if the only nontrivial FDs that hold over R are key constraints No dependency in R that can be predicted using FDs alone If we are shown two tuples that agree upon X Y A the X value we cannot infer the A value in one tuple from the A value in the other X yl If example relation is in BCNF the 2 tuples X yz must be identical since X is a key C84320 10 Third Normal Form BNF ozo Reln R with FDs F is in 3NF if for all X gt A in F A E X called a trivial FD or X contains a key for R or A is part of some key for R oz Minimality of a key is crucial in third condition above ozo If R is in BCNF obviously in 3NF ozo If R is in 3NF some redundancy is possible It is a compromise used when BCNF not achievable e g no good decomp or performance considerations Losslessjoin dependencypreserving decomposition of R into a collection of SNF relations always possible 0 CS432 11 What Does 3NF Achieve ozo If 3NF violated by X gt A one of the following holds X is a subset of some key K 0 We store X A pairs redundantly X is not a proper subset of any key 0 There is a chain of FDs K gt X gt A which means that we cannot associate an X value with a K value unless we also associate an A value with an X value ozo But even if reln is in 3NF these problems could arise eg Reserves SBDC S gt C C gt S is in 3NF but for each reservation of sailor S same S C pair is stored oz Thus 3NF is indeed a compromise relative to BCNF C84320 12 Decomposition of a Relation Scheme oz Suppose that relation R contains attributes A1 An A decomposition of R consists of replacing R by two or more relations such that Each new relation scheme contains a subset of the attributes of R and no attributes that do not appear in R and Every attribute of R appears as an attribute of one of the new relations ozo Intuitively decomposing R means we will store instances of the relation schemes produced by the decomposition instead of instances of R oz E g Can decompose SNLRWH into SNLRH and RW C84320 13 Example Decomposition oz Decompositions should be used only when needed SNLRWH has FDs S gt SNLRWH and R gt W Second FD causes Violation of 3NF W values repeatedly associated with R values Easiest way to fix this is to create a relation RW to store these associations and to remove W from the main schema ie we decompose SNLRWH into SNLRH and RW ozo The information to be stored consists of SNLRWH tuples If we just store the projections of these tuples onto SNLRH and RW are there any potential problems that we should be aware of C84320 14 Problems with Decompositions ozo There are three potential problems to consider I Some queries become more expensive 0 e g How much did sailor Joe earn salary WH 39 Given instances of the decomposed relations we may not be able to reconstruct the corresponding instance of the original relation 0 Fortunately not in the SNLRWH example 39 Checking some dependencies may require joining the instances of the decomposed relations 0 Fortunately not in the SNLRWH example oz Tradeo Must consider these issues vs redundancy C84320 15 Lossless oin Decompositions ozo Decomposition of R into X and Y is losslessz39oin wrt a set of FDs F if for every instance r that satisfies F 7527 gt4 7H0 7 ozo It is always true that r E EXr gtlt1 751139 In general the other direction does not hold If it does the decomposition is losslessjoin oz Definition extended to decomposition into 3 or more relations in a straightforward way oz It is essential that all decompositions used to deal with redundancy be lossless Avoids Problem 2 C84320 16 More on Lossless loin oz The decomposition of R into X and Y is losslessjoin wrt F if and only if the Closure of F contains X W Y gt X or X W Y gt Y ozo In particular the decomposition of R into UV and R V is losslessjoin if U gt V holds over R CS4320 lbt gt Mouton mama lbH Mouton JUINw mama li KNbr K NNNUINw wooooowo 17 Dependency Preserving Decomposition oz Consider CSJDPQV C is key JP gt C and SD gt P BCNF decomposition CSJDQV and SDP Problem Checking JP gt C requires a join oz Dependency preserving decomposition Intuitive If R is decomposed into X Y and Z and we enforce the FDs that hold on X on Y and on Z then all FDs that were given to hold on R must also hold Avoids Problem 3 oz Projection of set of FDs F If R is decomposed into X projection of F onto X denoted Fx is the set of FDs U gt V in F closure of F such that U V are in X C84320 18 Dependency Preserving Decompositions Con td ozo Decomposition of R into X and Y is dependency preserving if FX union FY F ie if we consider only dependencies in the closure F that can be checked in X without considering Y and in Y without considering X these imply all dependencies in F oz Important to consider F not F in this definition ABC A gt B B gt C C gt A decomposed into AB and BC Is this dependency preserving Is C gt A preserved ozo Dependency preserving does not imply lossless join ABC A gt B decomposed into AB and BC ozo And Viceversa Example C84320 19 Decomposition into BCNF ozo Consider relation R with FDs F If X gt Y violates BCNF decompose R into R Y and XY Repeated application of this idea will give us a collection of relations that are in BCNF lossless join decomposition and guaranteed to terminate eg CSJDPQV key C P gt C SD gt P gt S To deal with SD gt P decompose into SDP CSJDQV To deal with gt S decompose CSJDQV into S and CJDQV ozo In general several dependencies may cause violation of BCNF The order in which we deal wit them could lead to very different sets of relations C84320 20 BCNF and Dependency Preservation ozo In general there may not be a dependency preserving decomposition into BCNF eg CSZ CS gt Z Z gt C Can t decompose while preserving lst FD not in BCNF ozo Similarly decomposition of CSJDQV into SDP IS and CJDQV is not dependency preserving wrt the PBS JP gt C SD gt P and gt S However it is a lossless join decomposition In this case adding JPC to the collection of relations gives us a dependency preserving decomposition JPC tuples stored only for checking FD Redundancy C84320 21 Decomposition into 3NF ozo Obviously the algorithm for lossless join decomp into BCNF can be used to obtain a lossless join decomp into 3NF typically can stop earlier oz To ensure dependency preservation one idea If X gt Y is not preserved add relation XY Problem is that XY may violate 3NF e g consider the addition of CJP to preserve JP gt C What if we also have I gt C ozo Refinement Instead of the given set of FDs F use a minimal cover for F C84320 22 Minimal Cover for a Set ofFDs oz Minimal cover G for a set of FDs F Closure of F closure of G Right hand side of each FD in G is a single attribute If we modify G by deleting an FD or by deleting attributes from an FD in G the closure changes ozo Intuitively every FD in G is needed and as small as possible in order to get the same closure as F ozo eg A gt B ABCD gt E EF gt GH ACDF gt EG has the following minimal cover A gt B ACD gt E EF gt G and EF gt H oz MC gt LosslessJoin Dep Pres Decomp book C84320 23 Re ning an ER Diagram ozo 1st dia ram translated m W0rke sSNLDS DepartmentsDMB Lots associated with workers oz Suppose all workers in a dept are assigned the same lot D L ozo Redundancy fixed by m W0r1lters2 SNDS DeptL0tsDL ozo Can finetune this W0rkers2SNDS DepartmentsDMBL C84320 24 Summary of Schema Re nement ozo If a relation is in BCNF it is free of redundancies that can be detected using FDs Thus trying to ensure that all relations are in BCNF is a good heuristic ozo If a relation is not in BCNF we can try to decompose it into a collection of BCNF relations Must consider whether all FDs are preserved If a lossless join dependency preserving decomposition into BCNF is not possible or unsuitable given typical queries should consider decomposition into 3NF Decompositions should be carried out and or reexamined while keeping performance requirements in mind C84320 25 TreeStructured Indexes RampG Chapter 10 CS4320 Introduction oz As for any index 3 alternatives for data entries kquot I Data record with key value k I ltk rid of data record with search key value kgt I ltk list of rids of data records with search key kgt ozo Choice is orthogonal to the indexing technique used to locate data entries kquot oz Treestructured indexing techniques support both range searches and equality searches oz I SAM static structure B tree dynamic adjusts gracefully under inserts and deletes CS4320 Range Searches oz Find all students with gpa gt 30 If data is in sorted file d0 binary search to find first such student then scan to find others Cost of binary search can be quite high oz Simple idea Create an index file w H Page1 l l Pagez l Page3 l l Page N Data File Can do binary search on smaller index le CS4320 index entg P0 K1P1K2P 2 000 Kum ozo Index file may still e quite large But we can apply the idea repeatedly Nonleaf Pages Pages D D u Overflow gt El x page Primary pages Leafpages contain data entries CS4320 Comments on I SAM 3235 Index Pages oz File creation Leaf data pages allocated sequentially sorted by search key then index pages allocated then space for over ow pages Over ow pages oz Index entries ltsearch key value page idgt they direct search for data entries which are in leaf pages to Search Start at root use lltey comparisons to go to leaf Cost 0C log FN F entriesindex pg N leaf pgs oz M Find leaf data entry belongs to and put it there oz m Find and remove from leaf if empty over ow page deallocate Static tree structure insertsdeletes n ect only lenfpnges C84320 5 Example I SAM Tree oz Each node can hold 2 entries no need for next leaf page pointers Why V 7 Root I lll H 10 15 20 27 3337 4046 5155 6397 CS4320 After Inserting 23 48 41 42 Root I lll k Pages Primary V Leaf I10 I15 I I20 27 I I33I37I I40 I46 I I51 I55 I I63 I97 I Pages Pages CS4320 Then Deleting 42 51 97 Root I lll u v V y 1015l20 27333T4045 5563 l n Note that 51 r appears in index levels but not in leaf CS4320 B Tree Most Widely Used Iridex ozo lnsert delete at log F N cost keep tree height bdldrzeed F fanout N leaf pages ozo Minimum 50 occupancy except for root Each node contains 1 lt m lt 2d entries The parameter dis called the order of the tree oz Supports equality and rangesearches efficiently Index Entries Direct search Data Entries quotSequence setquot CS4320 Example B Tree oz Search begins at root and key comparisons direct it to a leaf as in 18AM oz Search for 5 15 all data entries gt 24 Root I13II17 II24II30I A N V quotN KN I2 I3I5I7I I14I16I I I I19I20 22I I I24I27I29I I I33I34I38I39I Based on the search for 15 3 we know it is not in the tree C84320 10 B Trees in Practice oz Typical order 100 Typical fillfactor 67 average fanout 133 ozo Typical capacities Height 4 1334 312900700 records Height 3 1333 2352637 records oz Can often hold top levels in buffer pool Level 1 1 page 8 Kbytes Level 2 133 pages 1 Mbyte Level 3 17689 pages 133 MBytes C84320 11 Inserting a Data Entry into a 3 Tree ozo Find correct leaf L oz Put data entry onto L If L has enough space done Else must split L into L and a new node L2 39 Redistribute entries evenly copy up middle key 39 Insert index entry pointing to L2 into parent of L ozo This can happen recursively T0 split index node redistribute entries evenly but push up middle key Contrast with leaf splits oz Splits grow tree root split increases height Tree growth gets wider or one level taller at top C84320 12 Inserting 8 into Example B Tree Entry to be inserted in parent node 3 Observe hOW J Note that 5 is copied up and continues to appear in the leaf minimum occu am is m P y 3 5 guaranteedin I2 I I I I I I7 I8 I I both leaf and index pg splits 4 NOte difference Entry to be inserted in parent node between copy I g v g 2 ig iafdcm ggtomy up and pushup 39 be sure ou unde th the l5H H H II HMH31 reasons for this C84320 13 Example B Tree After Inserting 8 m m m m N 23 578 1416 1912 22 242729 33343839 1 Notice that root was split leading to increase in height 3 In this example we can avoid split by redistributing entries however this is usually not done in practice C84320 14 Deleting a Data Entry from a 3 Tree oz Start at root find leaf L where entry belongs ozo Remove the entry If L is at least halffull done If L has only d1 entries 0 Try to redistribute borrowing from sibling adjacent node with same parent as L 0 If redistribution fails M L and sibling oz If merge occurred must delete entry pointing to L or sibling from parent of L oz Merge could propagate to root decreasing height C84320 15 Example Tree After Inserting 8 gt5 Then Deleting 19 and 20 m m m m m r 23 578 1416 22124 2729 33343839 ozo Deleting 19 is easy ozo Deleting 20 is done with redistribution Notice how middle key is copied up C84320 16 And Then Deleting 24 Mustmerge H so H H H oz Observe toss of m N Index entry on ght I 22 I 27 I 29 I I I33 I34 I38 I39 I and pull down of index entry below I s H13 quotH sou m m I m m 23 578 llwlwl 222729 33343839 C84320 17 Example ofNonleafRedistribution ozo Tree is shown below during deletion of 24 What could be a possible initial tree oz In contrast to previous example can redistribute entry from left child of root to right child Rock 1 w IE m A m m V m m 23 578 1416 quot1181 2 21 221271291 331341381391 C84320 18 After Redistribution ozo lntuitively entries are redistributed by pushing through the splitting entry in the parent node oz lt suffices to redistribute index entry with key 20 we ve redistributed 17 as well for illustration Root A m m N m N m 23 578 1416 quot1181 2 21l 221271291 33343839 C84320 19 Pre x Key Compression ozo Important to increase fanout Why ozo Key values in index entries only direct traffic can often compress them E g If we have adjacent index entries with search key values Damion Yogurt David Smith and Devarakonda Marthy we can abbreviate David Smith to Dav The other keys can be compressed too 39 Is this correct Not quite What if there is a data entry Davey 011857 Can only compress David Smith to Davi In general while compressing must leave each index entry greater than every key value in any subtree to its left oz lnsert delete must be suitably modified C84320 20 Balk Loading of a 3 Tree ozo If we have a large collection of records and we want to create a B tree on some field doing so by repeatedly inserting records is very slow oz Balk Loading can be done much more efficiently oz Initialization Sort all data entries insert pointer to first leaf page in a new root page Root FD Sorted pages of data entries not yet in B tree 3 4l 9l 1 1213 2 22 2331 3536 384144 CS4320 21 Bulk Loading Corith Rh ozo Index entries for leaf 39 l pages always 6 12 23 35 Data entry pages not yet In B tree entered into right most index page just above leaf level When this fills up it splits Split may go up rightmost path to the root oz Much faster than repealed mserts I 6 H II 12 H H 23H H H 38H II espec1ally when one considers locking CS4320 Data entry pages not yet in B tree Summary of Bulk Loading ozo Option 1 multiple inserts Slow Does not give sequential storage of leaves ozo Option 2 Bulk Loading Has advantages for concurrency control Fewer I Os during build Leaves will be stored sequentially and linked of course Can control fill factor on pages C84320 23 A Note on Order oz Order 1 concept replaced by physical space criterion in practice at least hal ill Index pages can typically hold many more entries than leaf pages Variable sized records and search keys mean differnt nodes will contain different numbers of entries Even with fixed length fields multiple records with the same search key value duplicates can lead to variablesized data entries if we use Alternative C84320 24 Summary oz Treestructured indexes are ideal for range searches also good for equality searches ozo ISAM is a static structure Only leaf pages modified over ow pages needed Over ow chains can degrade performance unless size of data set and data distribution stay constant ozo B tree is a dynamic structure lnserts deletes leave tree heightbalanced log F N cost High fanout means depth rarely more than 3 or 4 Almost always better than maintaining a sorted file C84320 25 Summary Corlth Typically 67 occupancy on average Usually preferable to ISAM modulo locking considerations adjusts to growth gracefully If data entries are data records splits can change rids oz Key compression increases fanout reduces height ozo Bulk loading can be much faster than repeated inserts for creating a B tree on a large data set oz Most widely used index in database management systems because of its versatility One of the most optimized components of a DBMS C84320 26
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'