INFO & DATABASE SYS 1
INFO & DATABASE SYS 1 CIS 4301
Popular in Course
Popular in Comm Sciences and Disorders
This 59 page Class Notes was uploaded by Aliyah Boyer on Friday September 18, 2015. The Class Notes belongs to CIS 4301 at University of Florida taught by Staff in Fall. Since its upload, it has received 7 views. For similar materials see /class/207035/cis-4301-university-of-florida in Comm Sciences and Disorders at University of Florida.
Reviews for INFO & DATABASE SYS 1
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 09/18/15
Schema Re nement and Normal Forms Chapter 19 The Evils 0f Redundancy Redundancy is at the root of several problems associated with relational schemas redundant storage insertdeleteupdate anornahes Integrity constraints in particular functiunul dependencies can be used to identify schemas with such problems and to suggest refinements Main re nement technique decumyusitiun replacing ABCD with say AB and BCD or ACD and ABD Decomposition should be used judiciously ls there reason to decompose a relation What problems if any does the decomposition cause Functional Dependencies FDs A functional dependency X Y holds over relation R if for every allowable instance 139 of tle 7 125 r IIXUI 7rX22 implies IIIl1 7502 i e given two toples in y if the x values agree then the Y values rnost also agree x andY are seLs ot attnhotes An FD is a statement about all allowable relations Must he identi ed based on sernanurs ot application i en some allowable insmnce 71 of R we can check if it violates sorne FD f but we cannot tdl iffholds over R K is a candidate key for R means that Kgt R However KgtR does not require K to he mlmmdl Example Constraints on Entity Set Consider relation obtained from HourlyiEmps HourlyiEmps name lat mimg hrlyiwuges hrsiwmked Nututiun We will denote this relation schema by listing the attributes This is really the set of attributes SNLRWH Sometimes we will refer to all attributes of a relation by using the relation name e g HourlyiEmps for SNLRWH Some FDs on HourlyiEmps SM is the key s SN39LRWH mimg determines hrlyiwuges Rgt W Example Comfy W Q w 5 7 9 Problems due toRgtW Howl Empsz llgdazmmmalh Can S N L R H wechangewmuss 123223666 Attishoo 48 s 40 thelst tupleofsNLRVVl U 231315368 Smiley 22 8 30 Imemmammal Whatlf 131M3650 5mm 35 5 30 We Want 0 insert an employeeandden39tknew 434453751 35 5 32 thehourlywageforhls 512574134 Madayan 35 s 40 rahng7 H s We deleteallemployeeswlth 123423565 M1511 48 rating5welosethe 231315368 Smiley 22 age for rating sl 414263751 Guldu s s ma ab he 131243650 Smethurst 35 5 7 30 W 5 s WillZsmallertables be better 512574134 Madaym 35 Reasoning About FDs C M Given some FDs we can usually infer additional FDs ssh 11111 11111 a lot implies ssh An FDfis imylied by a set of FDs F iff holds whenever all FDs in F hold 17 clasure afF is the set of all FDs that are implied by F Armstrong s Axioms X Y Z are sets of attributes Re exlvlgi If X Q Y then Y gt X Augmmiuimn If Xgt Y then XZ gt YZ for anyZ Trumlilmgi If XAY and Y a 2 then X a z These are suund and Cumplete inference rules for FDs ReasoningAbout FDs C0ntd Couple of additional rules that follow from AA Umlm ImegtY and X Z then X YZ Decampaslilzm IfX gt YZ then X gt Y and X gt Z Example Contractscidsidjidaidpidqtymulue and Cis the key C CSJDPQV Project purchases each part using single contract JP C Dept purchases at most one part from a supplier SD P JP C C CSJDPQV imply 39P CSJDPQV SD P implies SDI JP SDJ JP JP CSJDPQV imply SDJ CSJDPQV 7 Av Reasoning About FDs C0ntd mp Computing the closure of a set of FDs can be expensive Size of closure is exponential in attrs Typically we just want to check if a given FD X gt Y is in the closure of a set of FDs F An efficient check Compute attribute clasme otx denoted X wrtF Set or al attributes A such that x A is in 17 There is a linear time algorithm to compute this Check if Y is in DoesF A B B C C D E irnply A E ie is A E mthe closure F Equivalently is E mA 7 a Normal Forms Returning to the issue of schema re nement the rst question to ask is whether any refinement is needed H a relation is in a certain nurmulfm m BCNF 3NF etc it is known that certain kinds of problems are avoidedminimized This can be used to hel decide whether decomposing the relation will help Role of FDs in detecting redundancy Consider a relationR with 3 attributes ABC I No FDs hold There is no redundancy here Given A B Several tuples could have the same A value and it so they ll all have the same B value Boyce Coda Normal Form BCNF Reln R with FDs F is in BCNF if for all X A in F A e x called a invml FD or Xcontains a key or In other words R is in BCNF if the only nonetrivial FDs that hold over R are key constraints No dependency in R that can he predicted using FDs alone Ifwe are shown two hip1es that agree upon the X value we cannot infer the A value in one tuple from the A value in the other If example relation is inBCN39F the 2 hip1es mustbe identical since x is a key Third Normal Form 3NF g mo Reln R with FDs F is in 3NF if for all X A in F A E x called a mm FD or gtlt contains a key for R or Ais part of some key for R Minimulity of a key is crucial in third condition above H R is in BCNF obviously in 3NF If R is in 3NF some redundancy is possible It is a compromise used when BCNF not achievable eg no good decomp or performance considerations Lasszesse am dependencyrpresewmg decampaslimn ch mic u calleciwnzy BWreluimIsulw sgcssi e isnasenmmmsymneaaamm i at ii What Does 3NF Achieve M If 3NF violated by X A one of the following holds x is a subset of some key K We store x A pairs redundantly x is not a proper subset of any key I There is a chain of FDs K9 X 9 A which means that w c nnot associate anX value with a K value unless we also assodate an Avalue with an X value But even if reln is in 3NF these problems could arise e g Reserves SBDC 5 C C 9 S is in3NF butfor each reservation of sailor S same S C pair is stored Thus 3NF is indeed a compromise relative to BCNF 12 Decomposition ofa Relation Scheme Suppose that relation R contains attributes A1 An A decumyusitiun of R consists of replacing R by two or more relations such that new 1 btnb Hui Git ot R and no attributes tbat do not appear in R and Every attribute ot R appears as an attribute ot one ot the new re a ons Lntuitively decomposing R means we will store instances o the relation schemes produced by the decomposition instead of instances of R Eg Can decompose SNLRWH into SNLRH and RW 13 e 1 Example Decomposition Decompositions should be used only when needed SNLRWH has FDs S m SNLRWH and R W Second FD uu 39 39 393 W quot associated WithR values Easiest Way to fix this is to create a relation RW to store these assoda ons and to remove W from the main schema I i e we decompose SNLRWH into SNLRH and RW The information to be stored consists of SNLRWH tuples If we just store the projections of these tuples onto SNLRH and RW are there any potential problems that we should be aware of Problems with Decompositions Q There are three potential problems to consider Sorne queries become more expensive e g How much did sailor Joe earn salary WH Given instances ot the decomposed relations we may not be able to reconstruct the corresponding insmnce ot the original relation Fortunately not in the SNLRWH example Checking sorne dependencies may require joining the insmnces ot the decomposed relations Fortunately not in the SNLRWH example Trudeu Must consider these issues vs redundancy 15 Decomposition of R into X and Y is lusslessil39uin wrt a set of FDs F if for every instance r that satisfies F 7amp0 M 7510 7 It is always true that r E IFXr gtlt1 fry r In general the other direction does not hold If it does the decomposition is losslessrjoin Lossless join Decompositions Definition extended to decomposition into 3 or more relations in a straightforwar Way It is essential thatall decompositions used to deal with redundancy be lussless Avoids Problem 2 More on Lossless join The decomposition of R into X and Y is losslessjoin wrt F if and only if the closure of F contains In particular the decomposition of R into UV and R e V is losslessejoin if U gt V holds over R Dependency Preserving Decomposition 6amp3 Consider CSIDPQV C is key JP gt C and SD gt P BCNF decomposition CSIDQV and SDP Problem Checking JPgt C requires a join Dependency preserving decomposition Intuitive HR is decomposed into X Y and Z and we enforce the FDs thathold on X on Y and on Z then a FDs that were given to hold oanust also hold W Projection of set of FDs F If R is decomposed into X projection of F onto X denoted FX is the set of FDs U 9 V in F closure ufF such that U V are in X Dependency Preserving Decompositions d M Cont Decomposition of R into X and Y is degendengi eservingif FX union Fy F i e if we consider only dependencies m the Closure F that can be checked in X without considering Y and in Y without considering X these imply all dependendes in F Important to consider F not F in this de nition ABC AgtB BgtC 3 A decomposed into AB and BC Is quot 1 pica ling Is CgtAI Dependency preserving does not imply lossless join A B decomposed into AB and BC And viceeversa Example Decomposition into BCNF mo Consider relation R with FDs F If X Y violates BCNF decompose R into R 7 Y and XY Repeated application of this idea will give us a collection of relations that are m BCNF 1oss1ess join decomposition and guaranteed to terminate e g CSJDPQV key C 39Pgt C SD P I s To deal with SD gt P decompose into SDP CSID V To deal with I s decompose CSJDQV mto Is and CJDQV In general several dependencies may cause violation of BCNF The order in which we deal with them could lead to very different sets of relations BCNF and Dependency Preservation In general there may not be a dependency preserving decomposition into BCNF eg CSZ CS Z 29C Can t decompose while preserving 1stFD not in BCNF Similarly decomposition of CSIDQV into SDP IS and CIDQV is not dependency preserving wrt the FDSIPgt C SD gt P and gt S However itis a1oss1ess join decomposition In this case adding 39PC to the collection of relations gives us a dependency preservmg decomposi 39 J39PC hip1es stored only for checking FD Redundancy 2 Decomposition into 3NF gigs Obviously the algorithm for lossless join decornp into BCN e used to obtain a lossless join decomp into 3NF typically can stop earlier To ensure dependency preservation one idea is not preserved ad r a onXY Problem is thatXY may violate 3N39F e g consider the addition ofCJP to preserve39 JP c What it We also have I 9 C 7 Refinement Instead of the given set of FDs F use a minimal Cuverfm F Minimal Coverfor a Set ofFDs Minimal Cover G for a set of FDs F Closure of F closure of G Right hand side ot each FD in G is a single attribute ltwe modify G by deleting an FD or by deleting attributes from an FD in G the closure changes lntuitively every FD in G is needed and as small as possible in order to get the same closure as F eg A gt B ABCD gt E EF gt GH ACDF gt EG has the following rninirnal cover A B ACD E EF G and EF H MC gt Losslessloin Dep Pres Decomp in book23 Refining an ER Diagram M 1st diagram translated Before w WorkerssNLDS ranentsDMB Lots associated With workers Suppose all workers in a dept are assigned the same D G L Redundancy xed by M WorkersZSNDS DeparhnentsDMBL Summary of Schema Re nement If a relation is in BCNF it is free of redundancies that can be detected using FDs Thus ying to ensure that all relations are in BCNF is a good heuristic If a relation is not in BCNF we can try to decompose it into a collection of BCNF relations Must consider whether all FDs are preserved If a losslessr join dependency preserving decomposition into BCNF is not possible or unsuimble given typical queries should consider decomposition into 3NF Decompositions should be carried out andor reexamined while keeping perfmmmce requirements in Relational Algebra Chapter 4 Part A Relational Query Languages t Que languages Allow manipulation and retrieval of data from a database 9 Relational model supports simple powerful QLs strong formal foundation based on logic Allows for much optimization 9 Query Languages 1 programming languages QLs not erpected to be Turing completequot QLs not intended to be used for compla calculations QLs support easy ef dent access to large data sets Formal Relational Query Languag 9 Two mathematical Query Languages form the basis for real languages eg SQL and for implementation Relational Algebra More operational very useful for representing erecudon plans Relational Calculus Lets users describe what they want rather than how to computei Non operational deela rative Preliminaries E7 9 A query is applied to relation instances and the result of a query is also a relation stan Sehernas of input relations for a query are fixed but query will run regardless of instance The schema for the result of a given query is also fixed Determined by definition of query language constructs z Positional vs namedfield notation Positional notation easier for formal de nitions namedr eld notation more readable Both used in SQL R1 E m Lay Example Instances 22 101 1010 58 103 111296 a Sailors and Reserves relations for our examples 51 sl d 5mm mung age a We ll use positional or named field notation assume that names of fields in query results are inherited from names of fields in query inpu relations 52 Relational Algebra 9 Basic operations Selection 039 Selectsasubset of rows from relation Prgl39eetion 7139 od 1 39 Crossgroduet X Allows us to combine two relations etedi erenee Tuples in reln 1but notin reln2 Union Tuplesinreln1andinreln 2 9 Additional operations lntersectionjoin division renaming Not essentialbut I useful 9 Since each operation returns a relation operations can be composed Algebra is closed Projection 1 Deletes attributes that are not in projection lis e Schema ofresultcontains ecactly elds in the projection list with the same names that they had in the only input relation 6 Projection o erator has to diminate duplicates W hy Note real systems typically don t do duplicate elim39 a unless the user ecplin tly asks for it Why not tion sname rating yuppy 9 lubber 8 guppy 5 rusty 10 717 S2 snamerattng shame lt2 rating age Selection e Selects rows that satisfy selection condition e No duplicates in result Why Schema of result identical to schema of only input relation 6 Result relation can be the in utfor another relational algebra operation Operator eomposi ion shame 7139 mameJaImg 32 ra ting gratinggt 8 32 Sid sname rating age eAllottheseoperationstake 22 dustiquot 7 two input relations which 31 lubber 8 mustbe um oneeomgtztible 58 rusty 10 Samenumberoffields 44 guppy 5 C w p ndin elds 28 Y IPPY 9 have the same type Slvsz What is the schema of result sid sname rating age a rubber s 22 dustln 7 450 58 rusty 10 350 S175 Sl SZ CrossrProduct 9 Each row of S1 is paired with each row of R1 9 Result schema has one field per field of S1 and R1 with field names inherited if possible Con ict Both S1 and R1 have a field called Sid std b39 sname rating age std 1 day 22 dustin 7 450 2 101 101096 22 d m 7 450 5 103 11 1296 31 lubb r 0 555 22 101 101096 31 lubbe 0 555 50 103 111296 50 rusty 10 350 22 101 101096 50 rusty 10 350 50 103 111296 Renaming aerator p Clgt rid 5 sid2 SlgtltR1 joins t ondition loin R CS 08RXS sid sname rating age sid bid day 22 dustin 7 450 58 103 111296 31 lubber 8 555 58 103 111296 S1 7 iS lvidltRlxidR1 9 Result schema same as that of crossproduct 9 Fewer tuples than crossproduct might be able to compute more efficient y 9 Sometimes called a thatHam joins z guiiloin A special case of condition join where the condition 0 contains only equu i es 9 Result schema similar to crossproduct but only one copy of fields for which equality is specified 9 Natural loin Equijoin on all common fields Division 9 Not supported as a primitive operator but useful for expressing queries li e Find sailors who have reserved ull boats t LetA have 2 fields x and y B have only field V AB 3ltxygteA Vltygt e B ieAB contains all x tuples sailmsl such that for m y tuple boat in B then is aux Eu 1 in A Or If the set of y Values boats associated with an a Value sailor in A contains all y values in B the a value is in AB t In eneral x and y can be any lists of fields y is the list of fields in B andeyis the list of fields ofA Examples of Division AB ABZ Expressing AB Using Basic Opera 9 Division is not essential op just a useful shorthand Also true of joins but joins are so Common that systems implement joins specially t Idea For AB compute all x values that are not disqualified by some y value in B a Value is disquali ed ifby attaching y Value from B we obtain an icy tuple that is not in A Disquali ed x Values 7 x 7Z XAgtltBA AB 72XA 7 all disquali ed tuples Find names of sailors who ve reserved bout 1 9 Solution 1 Irmame039bid103 Reservesl Sailors 9 Solution 2 p Templ 039 Reserves bid103 p Temp2 Templ I I Sailors T emp2 sname 9 Solution 3 ll Reserves j Sailors sname 7 bid 103 Find ndmes ofsdilors who ve reserved d red I 9 Information about boat color only available in Boats so need an ex a oin Ir mm B 060 107 dBoats Reserves Sailors t A more efficient solution snam2 sm b1d039010rAway80025 ResJ Sailors A query optimizer cdn rtd this given the first solution 9 Can identify all red or green boats then find sailors who ve reserved one of these boats p Tempboats 039 COIOFZ39red39 vcolor39green39 Boats snameaempboats Re serves Sailors 9 Can also define Tempboats using union How What happens if V is replaced by A in this query 18 Find sailors who ve reserved a red and a green 9 Previous approach won t work Must identify sailors who ve reserved red boats sailors who ve reserved green boats then find the intersection note that Sid is a key for Sailors 0 ath red Boats j Reserves p Tempred nd p T empgreen sidacolor grew Boats Reserves mameTempred Tempgreen Sailors Find the names ofsailors who ve reserved all n 9 Uses division schemas of the input relations to must be carefully chosen p T empszds iidbidResenes bid Boats 7r mame Tempsids Sailors 6 To find sailors who ve reserved all Interlake boats lm39de Imame Interlak Boats Summary 9 The relational model has rigorously defined query languages that are simple and powerful t Relational algebra is more operational useful as internal representation for query evaluation plans 9 Several ways of expressing a given query a query optimizer should choose the mos efficient version Crash Recovery Chapter 18 Review The ACID properties A tomidty All actions in the Xaethappem or none happen C onsistency It each Xact is consistent and the DB starts consistent it ends up consistent I so1ation Execution of one Xact is isolated from that of other Xacts D urahih39ty It a Xact commits its effects persist The Recovery Manager guarantees Atomicity amp Durability Motivation Atomicity Transactions may ahort Rollback Durability What itDBMs stops running Causes Desired Behavior atter h system resta T1 quot 5 39 7 T1 T2 ampT3 should he T2 ah1e T3 I 7 T4 amp T5 should he T4 aborted effects not seen T5 Assumptions Concurrency control is in effect Strict ZPL in particular Updates are happening 1 in place i e data is overwritten on deleted from the disk A simple scheme to guarantee Atomicity amp Durability Handling the Bu er Pool Force every write to disk I Poorres onse time But provides durability Steal bufferepool frames No Steal Steal F rce Trivial from uncommited Xacts Ltnot poor throughput Noam Desired Hso owcanweensure atomidty s More on S teal and Force STEAL why enforcing Atomicity is hard To skulfmmz F Currentpage inF say P is written to disk some Xact holds lock on P What it the Xact With the lock on P aborts7 Must remember the old value cm at steal time to supportUN39DOing the wnte to page P NO FORCE why enforcing Durability is hard What it system rashes before a modi ed page is written to dis Write as little as possible in a convenient place at commit meto suppoit REDOing modi cations every update in a lag updates tin a single 1og page Log record connins Basic Idea Logging KW Record REDO and UNDO information for Sequential writes to log put it on a separate disk Minima 39 o ditt written to log so multiple g An ordered list of REDOUNDO actions ltXID pageID ottset 1ength o1d dam new datagt and additional control into which we ll see soon Corresponding dam page gets to 1 guarantees Atomicity 2 guarantees Durability wen study the ARIES algorithms Write Ahead Logging WAL The WriteeAhead Logging Protocol 0 Must torce the log record tor an update befme the disk Must write a111og records tor a gtltact begins camrm Exactly how is logging and recovery done WAL a the Log LSNs pageLSNs Number LSN LSNs always increasing Each dam gage contains a pageLSN I The LSN of the most re cent lag recmd tor annpdate to that e P38 System keeps track of flushedLSN I The max LSN ushed so far WAL Befare a page is written pageLSN g ushedLSN AM ushedLSN Each log record has a unique Log Sequence Lug moulds in disk ushed Log Records I for UNDO actions Possible log record types LngRecnrd fields Pdam prevLSN Comm XID type About I I I page End signifies end of update length commit or abort records gf et Compensation Log only 6 org Image Records CLRs a er Image Other Log Related State Transaction Table one entry per active Xact Contains x113 status runningcomrnitedaborted and lastLSN Dirty Page Table One entry per dirty page inbu er pool r TQMitheLSanthe I mcaused the page to be dirty Normal Execution of an Xact Series of reads amp writes followed by commit or abort 39 We will assume that write is atomic on dis In practice additional details to deal With nonratomic Writes Strict 2PL STEAL NCLFORCE buffer management with WriteiAhead Logging Checkpointin g Periodically the DBMS creates a checkpoint in order to minimize the li en to recover in the event of a system crash Write beginicheckpointrecord Lndicates whenchkptbegan eckpoint record Comma currentXuci table and dirty page table This is a fuzzy chec orot39 Other Xacts cootmoe to run so these tables accorate only as of the trme of the begrmcheckpomt record No Hof rc drrty p ges to drsk effectiveness of checkpoint hrmt d by oldes oowrrtteo s rt change to a arrty page ea to perroarcauy ush arrty pages to disli Store LSN of chkpt record in a safe place mum record 13 The Big Picture What s Stored Where LogRewrds Xact Table pxrlEVLSN 3933 P3995 lastLSN each status gggem Wth a length DageLSN Dirty Page Table fset recLSN beforer mage master record anemmage ushedLSN Simple Transaction Abort For now consider an explicit abort of a Xact I No crash involved We want to play back the log in reverse order UNDOing update I Get lastLSN of Xact mm Xact table I Can follow chain of log records backward via the prevLSN eld Before 5 write anAbmilog record For recovering from crash during U39NDOl Abort corit x 96quot or we To perform UNDO must have a lock on data No problem Before restoring old value of a page write a CLR You continue logging while you UN39DO I CLR has one extra eld undonextLSN Pomts to the next LSN to undo h e the prevLSN of the record were currently undoing CLRs never Undone but they might be Redone when repeating history guarantees Atomicity At end of UNDO write an end log record Transaction Commit Write commit record to log All log records up to Xact s lastLSN are ushed I Guarantees that ushedLSN 2 lastLSN Note thatlog ushes are sequential synchronous writes to disk Many log records per log page Commit returns Write end record to log Crash Recovery Big Picture oldest In 3 E m quot0m 5 z Start from a checkpoint found xtlve at Clash v1a master record Smalle z Three phases Need to IECLSN In dirty p 39 7 Figure out which Xacts table at committed since checkpoint my which failed Analysis 7 REDO all actions Last 11th 0 repeat history 7 UN39DO effects of failed Xacts CRASH Recovery The Analysis Phase Reconstruct state at checkpoint via endicheckpoint record Scan log forward from checkpoint End record Remove gtltact from gtltact mble Other records Add gtltact to gtltact mble set lastLSNLSN change Xact status 0 Update record It P not in Dirty Page Table I Add P to D P T set its recLSNLSN Recovery The REDO Phase We repeat Histury to reconstruct state at crash Peapply all updates even ot aborted Xactsz redo ems Scan forward from log rec containing srnalle t recLSN in DPT For each CLR or update log rec LSN REDO the action unl s Attected page is not in the Dirty Page Table or Attected page is inD P T but has recLSN gt LSN or pageLSN in DB ZLSN To REDO an action Peapply logged action SetpageLSN to LSN No additional logging Recovery The UNDO Phase ToUndo l l a lastLSN of a loser Xact Repeal Choose largestLSN among ToUndo I H 39s S CLR and undonextLSNNULL I Write an End record for this Xact I chis LSN is a CLR and undonextISN NULL I Add undonextLSN to ToUndo Else this LSN is an update Undo the update rite a CLR add prevLSN to ToUndo Until TnUndn is empty Example of Recovery LSN LOG 00 begtuheekpomt o5 c ckpomt XactTabte 10 update T1 wntesP5 lfrprevtsws aStLSN 20 update T2 wntes F5 status Dmy Page Tame 3 quot T1 ab rec N 40 CLR Undo ushedLSN 45 T1 End 50 update T3 wntes P1 ToUndo so u date T2 wntes F5 gtltCTltA5H RESTART LOG egtuheakpotnt endgheakpomt 10 update T1 wntes F5 20 update T2 wntes P3 undunextLSN XactTabte 30 T1 abort ESL 4045 CLR Undo T1 LsN 10 T1 End My page Tame 50 update T3 wntes F1 recLSN so update T2 wntes F5 u shedLSN ltCRA5H RE TA T 70 CLR Undo T2 LSN 60 8085 CLR Undo T3 SN 50 T3 end XcRAsH RESTART 90 CLR Undo T2 SN 20 T2 end Dlublse angement Systems3edK Kuruknshmnnndy Gehxke ToUndo Additional Crash Issues What happens if system crashes during Analysis During REDO How do you limit the amount of work in 7 Flush asynchronously in the background Wamh quothotspotsquot How do you limit the amount of work in UNDO Avoid longrrunm39ng Xacts Summary ofLoggmgRecovery Recovery Manager guarantees Atornicity amp Durability Use WAL to allow STEALNOFORCE w o sacrificing correctness LSNs identify log records linked into backwards chains per transaction via prevLSN pageLSN allows comparison of data page and log records Summary Cont Checkpointing A quick way to limit the amount of log to scan on recovery Recovery works in 3 phases Analysis Forward from checkpoint I Redo Forward from oldest recLSN I Undo Backward from end to rst LSN of oldest Xact alive at crash Upon Undo write CLRs Redo repeats history Simpli es the logic The Relational Model Chap ter 3 Why Study the Relational Model 9 Most widely used model Vendors IBM lnformix Microsoft Oracle Sybase etc 9 Legacy systems in older models EG lBM s lMS 9 Recent competitor objectoriented model ObjectStore Versant Ontos A synthesis emerging objeetnrelational model lnfurmlx UnlversalSerVer UniSQL 02 Oracle DEZ Relational Database De nitions 9 Relational database a set of relations 9 Relation made up of2 parts Instance a table with rows and columns Rows cardinality fields degree arity type of eeEn column EL Studentsszd string name string logm string age integer gpa real 9 Can think of a relation as aset of rows or taples ie all rows are distinct Example Instance ofStadents Relation 9 Cardinality 3 degree 5 all rows distinct 9 Do all columns in a relation instance have to be distinct Relational Query Languages 9 A major strength of the relational model supports simple powerful queryng of data 9 Queries can be written intuitivel and the DBMS is responsible for efficient evaluation The key precise semantics for relational queries Allows the optimizer to atensivelyresorder operations and still ensure that the answer does not Chan e The SQL Query Language 9 Developed by IBM system R in the 1970s 9 Need for a standard since it is used by many vendors 9 Standards SQLS sQLrs9 minor revision sQLr92 major revision SQLr99 rnajor extensions current standard The SQL Query Language 9 To find all 18 year old students we can write SELECT Yr sld name logln age gpa FROM gmdentss ass scs 18 34 WHERE Sage18 ee 18 32 ITo find just names and logins replace the rst line SELECT Shame Slogin Querying Multiple Relations 9 What does the following query compute SELECT Snam Edd FROM Students S Enrolled E WHERE SsidEsid AND Egrade Aquot 1 en 39 39 39 of Enrolled is this possible if the EMS ensures referentj l integrity we get Creating Relations in SQL 9 Creates the Students CREATE TABLE 5 re ation Observe that the s CHA udents id ID type domain of each fi ld BTECCHARGDJ is specified and enforced by 1 51 CHARODJ the DBMS whenever tuples age INTEGER gpa REAL are added or mod1f1e 9 As another example th CREATE TABLE E 11 d Enrolled table hold CHAMO me information about courses that students take R41 J Cid CHARZD Destroying anal Altering Relations DROP TABLE Students 9 Destroys the relation Students The schema information and the tuples are deleted ALTERTABLE Students ADD COLUMN firstYear integer 9 The schema of Students is altered by adding a new field every tuple in the current instance is extended with a null value in the new field Adding anal Deleting Tuples 9 Can insert a single tuple using INSERT INTO Students sid name login age gpa VALUES 53688 Smith smithee 15 32 9 Can delete all tuples satisfying some condition eg name Smith DELETE FROM Students S WHERE Snarne Smith Powerful variants of these commands are available more later 11 Integrity Constraints ICs 9 1C condition that must be true for any instance of the database eg domain constraints le are speci ed when schema is defined le are checked when relations are modi ed 9 A legal instance of a relation is one that satisfies all specified ICs MS should not allow illegal instances 9 If the DBMS checks ICs stored data is more faithful to realworld meaning Avoids data entry errors tool Primary Key Constraints t A set of fields is a for a relation if 1 No two distinct tuples can have same Values in all key fields and 2 This is not true for any subset of the key Part2 false A superkey If there s gt1 keyfor a relation one of the keys is chosen by DEA to be the primary key 9 Eg sid is a key for Students What about name The set Sid gpa is a superkey Primary and Candidate Keys in S 9 Possibly many candidate keys specified using UNIQUE one of which is chosen as the primary key a quotFor a given student and course CREATE TABLE Enrolled there is a single gradequot vs Sid CHAR20 tudents can take only one 0 CHANEL course and receive a single grade grade CHAIM for thatcourse further no two PRIMARY KEY sidrcid Students in a Course receive the CREATE TABLE Enrolled equot same Brad sid CHAR as Used carelessly an IC can prevent dd CHARGE s abase instances grade CHAIM I that arise in practice PREMARY KEY Sid UNIQUE cid grad it Foreign Keys Referential Integrity 9 Foreign key Set of fields in one relation that is used to refer to a tuple in another relation Must correspond to primary key of the second relation Like a logical pointer t Eg sid is a foreign key referring to Students Enrolledsid string eid string grade string If all foreign key constraints are enforced referential integriilis achieved ie no dangling references Can you name a data model Wo referential integrity Links in HTML Foreign Keys in SQL 9 Only students listed in the Students relation should be allowed to enroll for courses CREATE TABLE Enrolled sid CHAR20 cid CHARzn grade CHAR2 PRHVLARY KEY si ci FOREIGN KEY sid REFERENCES Students Enrolled 51 Cid 53666 Camaticlm grade 53666 Hl orleS Enforcing Referentiul Integrity 9 Consider Students and Enrolled Sid in Enrolled is a foreign key that references Students 9 What should be done if an Enrolled tuple with a nonexistent student id is inserted Reject it 9 What should be done if a Students tuple is deleted Also delete all Enrolled tuples that refer to it Disallow deletion of a Students tuple that is referred to Set sid in Enrolled tuples that refer to it to a default Sid ln SQL also Set sid in Enrolled tuples that refer to it to a n quot ice e Referentiul Integrity in SQL 5 SQL 92 and SQLtl999 CREATE TABLE Enrolled support all 4 options on sjd CHARM deletes and updates dd CHARM Default is NO ACTION grade CHAR2 deleteupdate is rejected PRIMARY KEY siddd v CASCADE also delete FOREIGN KEY sid all mples that refer to REFERENCES Students deleted mple ON DELETE CASCADE NULL SET DEFAULT 1 UM sets toreign key value of referencing tuple Where do ICs Come From 9 ICs are based upon the semantics of the real world enterprise that is being described in the database relations 9 We can check a database instance to see if an IC is violated but we can NEVER an IC is true by looking at an instance An IC is a statement about all possible instances From assertion that sidis a key is given to us 9 Key and foreign key ICs are the most common more generalle supported too example we know name is not a key but the infer that Logical DB Design ER to quot 7 quot 9 Entity sets to tables Employees CREATE TABLE Employees 1 lot INTEGER PRHVLARY KEY ssn Relationship Sets to Tables e In translating a reIationsnip set to a relation attributes of the relation must include Keys for each participating entity set as foreign keys I This set of attributes forms a superkey for th All descriptive attributes CREATE TABLE WorksIn ss HAR1 did INTEGER since DATE PRIMARY KEY ssn did FOREIGN KEY ssn REFERENCES Employees FOREIGNKEY did REFERENCES Departments o the key to a ages Translation to relational model 14771 Mummy Manyelne1 ManyelneMzny CREATE TABLE Manages ssn CHA 11 391 Map relationship to a m table Note that did is M r 5 now ssn REFERENCES Employees Separate tables for dad REFERENCES Departments Employees and rtments EPA CREATE TABLE DepLMgm 5 Sin each dad INTEGER department h dnarne CHAR2a and Departments Review Participation Constraints 9 Does every department have a manager Lt so this is a garticigation constraint the participation of Departments in Manages is said to be total vspart1 al Every did Value in Departments table must appear in a row of the Manages table with a nonrnull SSn Value Participation Constraints in SQL 9 We can capture participation constraints involving one entity set in a binary relationship but little else without resorting to CHECK constraints CREATE TABLE DeptMgr did INTEGER dname CHARZD budget REAL ssn CHAR11 NOT NULL since DA PRHVLARY KEY did ROREIGN KEY ssn REEERENCEs Employees ON DELETE NO ACTIO Review Weak Entities 9 A weak entity can be identified uniquely only by considering the primary key of another own 7 entity Owner entity set and weak entit l onetomanyretationsnip set 1 owner many weak entities ust have total participation in this identifying relationship set E mplnyevs 4 Translating Weak Entity Sets 9 Weak entity set and identifying relationship set are translated into a single table er entity is deteted an owned weak entities must aIso be delet d CREATE TABLE DepPo1icy pname CHARZD age INTEGER cost REAL ssn CHAR11 NOT NULL PRIMARY KEY pname ssn FOREIGN KEY ssn REFERENCES Employees ON DELETE CASCADE 5 As in C or other PLs attributes are inherited e If we declare A ISA B every A entity is also considered to be a B entity e Overlap constraints Can Joe be an HourlyEmps as well as a ContracLEmps entity Alloweddisallowed e Covering constraints Does every Employees entity also have t n o m s r a C n aCLEmps entity Yesno Translating ISA Hierarchies to Relatio 9 General approach 3 relations Employees HourlyEmps and ContraceEmps HourlyiEmps Every employee is recorde in Employees Eor hourly mp H a 39 J J 39 HourlyiEmps hourlyiwages hoarsiworked m must delete HourlyEmps tuple if referenced Employees tuple is deleted Queries involving all employees easy those involving just Hourlyjmps require a join to get some attributes 9 Alternative Just HourlyiErnps and ContractiErnps HourlyiEm s m name lot hourlyiwages hoarsiworked Each employee must be in one of these two subclasses Review Binary vs Ternary R l t39 h39 ea torts 1ps 9 What are the additional constraints in the 2nd diagram Binary vs Ternary Relationships Con TABL P es The key policyid INTEGER constraints allow m t I gimme ssn CHAR11 NOT NULL Policies and PRIMARY IltEYpolicyid Bme dary Wlth FOREIGN IltEY ssn REFERENCES Employees Depende ONDELETE CASCADE e Panici aBon CREATE TABLE Dependents constraints lead to pname CHARZD NOT NULL age constraints 1 I Policyjd INTEGER 5 Whatl f P110 PRIMARY KEY pname policyid 2 WEB 5 W 55 39 FOREIGN IltEY policyid REFERENCES Policies ONDELETE CASCADE ti Views 9 A view is just a relation but we store a de nitibn rather than a set of tuples CREATE VIEW YoungActiVeStudents name grade AS SELECT name E rade FROM Students S Enrolled E WHERE Ssid Reid and Sagelt21 9 Views can be dropped using the DROP How to handle DROP TABLE if there s a View on the table DROP TABLE command has options to let the user specify this 2 Views and Security 9 Views can be used to present necessary information or a summary While hiding details in underlying relations Given YoungStudentsbut not Students or Enrolled we can find students 5 who have are enrolled but not the id s of the courses they are enrolled in Relational Model Summary a A tabular representation of data a Simple and intuitive currently the mostwidely used a Integrity constraints can be specified by the DEA based on application semantics DBMS checks to violations Two important le primary and tossign keys v In addition we always have domain Constraints a Powerful and natural query languages exist 6 Rules to translate ER to relational model Introduction to Database Systems CIS 4301 Lecture Notes 1102006 What is a Database Collection of related data items that are being stored for recordkeeping amp analysis Could be stored on cards in Rolodex file cabinet computer Computerized databases are managed by a Database Management System DBMS Persistent storage Efficient safe storage of large amounts of data Programming interface Highlevel language for specifying operations user wishes to perform on data Transaction management Concurrent access to data provides recovery in light of failure CIS 4301 Spring 2006 Lecture 1 g Importance of DBMS Amount of electronically available data is exploding Cost of storage is continuously dropping Moore s law every 18 months speed of processorcapacity of disk doubles or price goes down by half Value of data as an organizational asset is widely accepted High demand in industry for powerful flexible data management systems to store data efficiently and get the most out of their large complex data sets eg data warehousing data mining Largest databases Federal Express WalMart Kight Ridder Dialog Tables with 1 billion or more rows Approaching 10 s of TB of data gh ink ofP the consequences of storing this much a a CIS 4301 Spring 2006 Lecture 1 g Brief History of Data Management Early DBMSs late 1960 s evolved from file based processing systems Need for supporting concurrent access to the data by many users recovery backup Roots in airline reservation systems SABRE banking systems corporate recordstorage systems Visualize the data much as it was stored Treebased hierarchical model Graphbased network model Cumbersome to use require programming to access data CIS 4301 Spring 2006 Lecture 1 g Advent of Modern DBMS Early 1970 s Ted Codd invented new data model reationa data model and the concept of data abstraction Soon thereafter team of lBM ers invented SQL Structured Query Language Became defacto standard for query languages based on the relational data model Commercial DBMS based on relational model are now widely accepted in industry eg Microsoft Access Oracle 9i Sybase Adaptive Server gt10 billion dollar industry CIS 4301 Spring 2006 Lecture 1 Characteristics of Modern Database Systems Support for concurrent access to data Safeguard data against accidentally loss Maintain integrity of database in light of changes Support for distributed data Control access to data More recently Support for nonstandard data Support for heterogeneous data Support for decisionsupport and analysis CIS 4301 Spring 2006 Lecture 1 Additional Requirements Increase usage and new applications for databases have resulted in additional requirements since early days High availability High reliability High throughput Low response time Extensible CIS 4301 Spring 2006 Lecture 1 Some Recent Trends DBMS are getting smaller and smaller DBMS that can store GB of data can run on PC Databases are getting bigger and bigger Multiple TBs terabyte 1012 bytes not uncommon Databases also able to store images video audio Database stored on secondary storage devices Use of Tertiary Storage in OLTPs Larger capacity disks but much slower response time 1020 msec vs several sec Tape CD etc usually involves robotic conveyance DBMS Supporting Parallel Computing Speedup query processing through parallelism eg read data from many disks However need special algorithms to partition data correctly CIS 4301 Spring 2006 Lecture 1 Types of DBMS Generalpurpose DBMS Multimedia DBMS Geographic information systems GIS Data warehouse DBMS Realtime DBMS Active DBMS CIS 4301 Spring 2006 Lecture 1 Actors i System Analyst Database Designer Application Programmer Project Manager Database Administrator System Administrator End Users Na39ive end users Sophisticated end users CIS 4301 Spring 2006 Lecture 1 10 When NOT to Use a DBMS Initial investment too high Too much overhead Application is simple welldefined not expected to change Stringent realtime requirements use specialized realtime DBMS Multiuser access to data is not required Alternative collection of files managed by access programs CIS 4301 Spring 2006 Lecture 1 11 g Some Terminology Database DB Collection of related data that exists over a long period of time Database Management System DBMS Collection of programs that allows users to create a new database and specify its structure gives users the ability to query and modify the data efficiently keeps the data secure from accidents or unauthorized use controls the access to the data for many users at once Database System DBS The database and DBMS software together make up what is known as the Database System CIS 4301 Spring 2006 Lecture 1 12 g DBMS Languages Data De nition Language DDL Used to define the conceptual and internal schemas Includes constraint definition language CDL for describing conditions that database instances must satisfy Includes storage definition language SDL to influence layout of physical schema some DBMSs Data Manipulation Language DML Used to describe operations on the instances of a database Procedural DML how vs declarativelDML what eg Relational Algebra eg SQL Note SQL includes a DML and a DDL in one Host Language Generalpurpose programming language which lets users embed DML commands data sublanguage into their code CIS 4301 Spring 2006 Lecture 1 13 Architecture of a DBMS Schema Modi cations Queries Modifications Database System Query Processor 3519326 1 Transaction Subsystem Storage Manager 39 Data Data Definition Metadata CIS 4301 Spring 2006 Lecture 1 Component Overview Data storage incl metadata eg names of relations attributes data types etc Often DBMS maintains an index Helps us find data items quickly given part of their value how Storage manager Handles requests from levels above retrieves data from store and returns it in format requested by queries Query processor Processes not only queries but also requests for modifications etc Figures out best way to retrieve data Transaction subsystem Handles concurrent transactions against database Three types of input at top CIS 4301 Spring 2006 Lecture 1 15 Transactional Requirements aintain database consistency Database restrictions stored as integrity constraints Burden of the userprogrammer to assure that transaction preserves all such constraints Guarantee that transaction is executed as a whole or not at all atomCity guarantee eg either deposit whole amount or no money at all Guarantee that no information is lost durability For multiple transactions running concurrently guarantee that transactions do not interfere With each other Isolation guarantee The effect of multiple concurrent transactions on database should be the same as that of a serial execution of the transactions why Atomicity durability and isolation are guaranteed by transaction subsystem More on transactions later in semester CIS 4301 Spring 2006 Lecture 1 16 5 Database Management Systems Chapter 1 Instructor Raghu Ramakrishnan r ghucswiscedu What Is a DBMS t A very large integrated collection of data 9 Models realworld entemrise Entities eg students courses Relationships eg Madonna is taking C9564 t A Database Management System DBMS is a software package designed to store and manage databases Files vs DBMS 9 Application must stage large datasets e main memory and secondary storage eg buffering pageoriente access 32bit addressing etc 9 Special code for different queries 9 Must protect data from inconsistency due to multiple concurrent users 9 Crash recove 9 Security and access control 9 Why Use a DBMS 9 Data independence and efficient access 9 Reduced application development time 9 Data integrity and security 9 Uniform data administration 9 Concurrent access recovery from crashes Why Study Databases 7 t Shift from comyatation to intormation at the low endquot scramble to webspace a mess at the high en quot sdentific applications 9 Datasets increasing in diversity and volume Digital libraries interactive video Human enome project EOS project need for DBMS aploding t DBMS encompasses most of CS 0s languages theory A I multimedia logic Data Models t A data model is a collection of concepts for describing data 9 A schema is a description of a particular collection of data using the a given data model 9 The relational model at data is the most widely used model toda Main concept relation basically a table with rows and co umns Every elation has a schema which describes the lds fie r columns or Levels of Abstraction Many views single ew smbe haw users see the dat 339 V Conceptual schema de nes logical st mcture V Physical schema describes the les and indexes used Example University Database 9 Conceptual schema Studentssid string name string login string ntegergpa39real age Coursesmd string enamerstring creditsrinteger Enrolle s irstring eidrstring gradestring 9 Physical schema Relations stored as unordered files lnda on rst column of Students 9 External Schema View C014rsejnfomdrstringenrolln1entinteger I An Data Independence 9 Applications insulated from how data is structured and stored 9 Logical data indeyendence Protection from changes in logical structure of data 9 Physical data indeyendence Protection from changes in physical structure of data One of the most important bene ts ofusing a DBMS Concurrency Control 9 Concurrent execution of user programs is essential for good DBMS performance Because disk accesses are frequent and relatively slow it is important to keep the cpu humming by o several user programs concurrently t Interleaving actions of different user programs can lead to inconsistency eg check is cleared while account balance is being computed t DBMS ensures such problems don t arise users can pretend they are using a singleuser system Transaction An Execution ofa DB Frog 9 Key concept is transaction which is an atomic sequence of database actions readswrites 9 Each transaction executed completely must leave the DB in a consistent state if D is consistent when the transaction begins T the data aan the DBMS will entorce these constraints Beyond this th mm 39 semantics of the data eg it does not understand how the interest on a bank account is computed us e s g thata transaction run alone preserves consistency is ultimately the user s responsibi ity Scheduling Concurrent Transactions t DBMS ensures that execution of T1 n is equivalent to some serial execution Tl Tn r L a a lock on the object and waits till the DBMS gives it the lock All locks are released at the end of the transaction Strict ZPL locking protocol ldea If an action of Ti say writing X affects Tj which perhaps reads X one of them say Ti will obtain the lock on x first and Tj is forced to wait until Ti completes this ettectively orders the transactions it Tj already has a lock on Y and Ti later requests a lock on Y Deadlock Ti or Tj is aborted and restarted Ensuring Atomicity 9 DBMS ensures atornicity allornothing property even if system crashes in the middle of a Xact 9 Idea Keep a 13g history of all actions carried out th S while executing a set of Before a change is made to the database the corresponding lo entry is forced to a safe location WAL grotowl39 OS support for this is often inadequate After a crash the effects of partially executed transactions are undone using the log Thanks to WAL if log entry a n quot 39 L 139 change was not applied to database The Log 9 The following actions are recorded in the log Ti writes an object the old value and the new value Lug recsrd must gs tn disldmthe changed pagel TiwrnrnitSaborts a log record indicating this action 9 Log records chained together by Xact id so it s easy to undo a specific Xact eg to resolve a deadlock 9 Log is often daplexed and archived on stable storage 9 All log related activities and in fact all CC related activities such as lockunlock dealing with deadlocks etc are handled transparently by the DBMS Databases make these folks happy 9 End users and DBMS vendors 9 DB application programmers 39 Eg smart webmasters 9 Database administrator DBAZ Designs logical physical schemas W Handles security and authorization Data availability crash recovery Database tuning as needs evolve Must understand how a DBMS we rks Structure of u DBMS mus mummy m 1 A typical DBMS has a layered architecture 1 The ti ure does not and Execuuon Relational Operators variations Summary 9 DBMS used to maintain query large datasets 9 Benefits include recovery from system crashes concurrent access quick application development data integrity and security 9 Levels of abstraction give data independence 9 A DBMS typically has a layered architecture 9 DBAs hold responsible jobs and are wellpaid t DBMS RampD is one of the broadest most exciting areas in CS