Introduction to Database Systems
Introduction to Database Systems COMPSCI 186
Popular in Course
Popular in ComputerScienence
This 9 page Class Notes was uploaded by Mr. Hayley Barton on Thursday October 22, 2015. The Class Notes belongs to COMPSCI 186 at University of California - Berkeley taught by Staff in Fall. Since its upload, it has received 27 views. For similar materials see /class/226662/compsci-186-university-of-california-berkeley in ComputerScienence at University of California - Berkeley.
Reviews for Introduction to Database Systems
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 10/22/15
Relational Calculus CS 186 Fall 2005 RampG Chapter 4 We will occasionally use this arrow notation unless there is danger of no confusion Ronald Graham Elemenfs omesey Theory Tuple Relational Calculus o Queghas the form T p 7 p77 denotes a formula in which tuple variable Tappears 0 Answer is the set of all tuples T for which the fonnulap T evaluates to hue Formula is recursively defined ozostart with simple atomc formulas get tuples from relations or make comparisons of values ozobuild bigger and better formulas using the logcal connec 39les Relational Calculus mm A n celaamaLcamhlstCJ Calculus has variabes constants comparison ups Iagica 39 nd quanti es 7 EC Variables range over i e get bound to Lipes Likes e DRC Variables range over daman elements field values Like QueryrByrExample QBE 7 Born TRC and DRC are simple subseE of tirstorder logic We ll focus on m nere Expressions in the calculus are called Iamiuas swer tuple is an assignment of constants to variables that make the formula evaluate to true ETRC Formulas 0 An Atomic fonmlla is one of le following R 6 Re Ra 0p 17 Ra 0p constant 0p is one of 2232 0 A fonmlla can be an atomic formula ti slapWqu where p and q are formulas 3RPR Where variable R is a tuple variable VRpR where variable R is a tuple variable 3 Free and Bound Variables The use of quantifiers Xande in a formula is said to Iu39adXin the formula A variable that is not bound is froe Let us revisit the definition of a query TI 77 There is an important restriction idle variable Tthat appars to the left of 39 must be the onyfree variable in the formula 7 in other words all other tuple variables must be bound using a quanti er Selection and Projection o Flnd all sailors with rating above 7 S l S eSuilors A Smuting gt 7 Modify this quew in answer Find sailors who are older than 18 or have a rating under 9 and are called Bob39 0 Find names and ages of sailors with rating above 7 S l 351 eSuilorsSlmtinggt 7 A smzme Slsmzme A Sage Sluge Note 5 is a tuple variable of 2 elds ie S is a projection of Salols onlv 2 fields are ever mentioned and 5i5 never used to range over anv relations in tne querv iiJoms Find sailors rated gt 7 who ve reserved boat 103 S Se Sailors A Srating gt 7 A ERRe Reserves A Rsid Ssid A Rbid 103 Note the use of to find a tuple in Reserves that joms with39 the Sailors tuple uncle consideration Joins continued S Se Sailors A Srating gt 7 A ERRe Reserves A Rsid Ssid A EBBe Boats A Bbid Rbid A Bcolor red Find sailors rated gt 7 who ve reserved a red boat 0 Observe how the parentheses control the scope of each quantifier s binding 0 ddsmy39 39 quot f lull il39e 1 rom SQL Division makes more sense here Find sailors who ve reserved all boats him use 179 S i Se SailorSA VBe Boats HRE Reserves Ssid Rsid ABMdRM 0 Find all sailors Ssuch thatforall tuples Bin Boats there is a tuple in Reserves showing that sailor Shas Division a trickier example Find sailors who ve reserved all Red boats S Se Sailors A VB 6 Boats Bcolor red gt ERRe Reserves A Ssid Rsid Bb39d Rb39d AlternativelyA 1 1 m S Se Sailors A VB 6 Boats Bcolor red39 v ERRe Reserves A Ssid Rsid A Bbid Rbid a gtb is the same as a vb b T F o If a is true b must rue Ifa is true and b is se the implication a evaluates to false 0 If a is not true we don39t care about b The expression is always true 39 a Unsafe Queries Expressive Power 0 Expressive Po 0 3 syntactically correct calculus queries thathave an infinite number of answers Unsafequenes e39gquot iSl Se Sailors Solution Don39t do that wer Theorem due to Cod even quew that can be exprssed in relational algebra can be expressed as a sal quew in DRC TRC the converse is also ir MW Query language e9 SQL can express every query that is expressible in relational algebracalculus actually SQL is more powerful as we will see E Summary 0 The relational model has rigomusly de ned query languages sImple and powerful o Relational algebra is more operational useful as internal reprsentation for query evaluation plans 0 Relational calculus is nonoperational users de ne queries in terms ofwhat they want not in terms of how to compute it Decara 39ve 0 Several ways of expressing a given q ery a quey op m39zershould choose the most ef cient version 0 Algebra and safe calculus have same explessive power lads to the notion of relational compelena39s Addendum Use of V 0 Vx Px is only true if Px is true for every x in the universe 0 Usually Vx x 6 Boats gt xcolor quotRed 0 2 logical implication a 2 b means that if a is true b must be true abisthesa1neas avb Find sailors who ve reserved all boats S l Se Sailors A VB Be Boats gt 3RRe Reserves A Ssid Rsid A Bbid Rbid 0 Find all sailors Ssuch that for each tuple 8 either it is not a tuple in Boats or there is a tuple in Reserves showing that sailor Shas reserved it S l Se Sailors A VB Be Boats v 3RRe Reserves A Ssid Rsid A Bbid Rbid E reserved all red boats S l Se Sailors A VB Be Boats A Bcolor red gt 3RRe Reserves A Ssid Rsicl A Bbid Rbicl 0 Find all sailors Ssuch that for each tuple 8 either it is not a tuple in Boats or there is a tuple in Reserves showing that sailor 5 has reserved it S l Se Sailors A VB Be Boats v Bcolor red v 3RRe Reserves A Ssid Rsid A Bbid Rbid SQL The Query Language Part 1 CS186 Fall 2005 RampG Chapter 5 Life is just abowl of queries 7Anon not Forrest Gump Relational Query Languages 0 A major strength of the relational model supports simple powerful quelyingof data 0 Two suhlangua es 0 DDL Data De nition Language de ne and modify schema atall 3 levels e Queris can be written intuitively The DBMS is responsible for efficient evaluation The key precise semantics for relational queris Allows the optimizer to reorderchange operations and ensure rst re answer does not change Internal cost model drivs use of indexes and choice 0 accss paths and physical operators EThe SQL Query Language 0 The most widely used relational quew language Current standard is SQL1999 ted yet Introduced ObJectrRelatlonal concepts and lots more 7 Many o w ich were pioneered in Postgres hee at Berkdeyl SQL200x is in draft SQL92 is a basic subset Most systems supporta diu PostgreSQL has some unique aspects XML supportintegration is the next challenge for SQL more on this In a later class a DDL Create Table CREATE TABLE tableiname columniname da ype minim DataTypes PostgreSOL include charactern e fixedelength character string charactervalylngn e variableelength character string smallint integer oigint numeric real double precision date time tirnestarnp seriale unique lD for indexing and cross reference PostgreSOL also allows Ole arrays inheritance rules conformance to the sewage standard is variable so We Won t use these in the prolect Create Table wcolumn constraints CREATE TABLE tableiname I columniname dataitype I DEFAULT defaultiexpr I columnicanstraint I i I I 1 Column Constraints I CONSTRAINT constraintiname OT NULL l NULL l UNIQUE l PRIMARY KEY l CHECK expression REFERENCES reltaple I refcolumn I I ON DELETE action I ON UPDATE action I action is one of NO ACTION CASCADE SET NULL SET DEFAULT expression for column constraint must produce a boolean result and reference the related column s value only E Create Table wtable constraints CREATE TABLE tableiname I columniname dataitype I DEFAULT defaultiexpr I columniconstraint I I I tableiconstraint I I Table Constraints CONSTRAINT constraintiname I IUNIOUE columniname I I I PRIMARY KEY columniname I 1 CHECK expression I FORElCN KEY columniname I I REFERENCES reltaple I refcolumn I I I I ON DELETE action ION UPDATE Here expressions keys etc can include multiple columns Create Table Examples REATE TABLE lms code CHAR5 PRIMARY KEY title VARCHAR40 did DECIMAL3 dateiprod DATE kind VAR CONSTRAINT production UNIQUEdateJrod FOREIGN KEY did REFERENCES distributors ON DELETE NO ACTION CREATE TABLE distributors did DECIMAL3 PRIMARY KEY name VARCHAR40 NSTRAINT con1 CHECK did gt 100 AND name ltgt 39 The SQL DML o Singletable queries are straightfonNard 0 To find all 18 year old students we can write FROM students 5 WHERE Sage18 o To find just names and logins replace the first line SELECT Sname s ogi n 239 Querying Multiple Relations 0 Can Specify a join over two tables as follows SELECT Sname c id FROM students S Enrolled E WHERE SsidEsid AND Egr ade B39 1 18 3 53666 Historyl Note obviously no referential integrity constraints have been used here rsult SELECT DISTINCT targetelist FROM relationelisf BaSIC QUEI Y WHERE qualification relationlist A list of relation names possibly with a rangeraffable after each name bargEtlist A list of attributes of tables in rea 39onlst guai ca 39on Comparisons combined using AN D OR and NOT Comparisons are Atir op const or Attrl op AttrZ where op is one of lt gt DISTINCT Op answer should not contain duplica In SQL SELECT the default is that duplicates are n0teliminated Result is called a multiset Iohalkeyword indicating that the tes Query Semantics o Semantics of an SQL query are defined in terms of the following conceptual evaluation strategy 1 do FROM clause compute crossgroductof tables eg StudenB and Enrolled use Check conditions discard tuples that fail called selection 3 do SELECT clause Delete unwanted fields called projection 4 If DISTINCT specified eliminate duplicate rows 0 Probably the least efficient way to compute a query39 An optimizer will find more efficient strategies to get the same answer E Step 1 Cross Product 39 11sz Sing 83 e m1 Egxzrle 53666 Jones Jonescs 18 34 c aticlUl c 53666 Jones Jonescs 18 eganU B 53666 Jones Jonescs 18 TopologyllZ A 53666 Jones Jonescs 18 rsroryws B 53688 Smith smithee 18 ah 1 c 53688 Smith smithee 18 ga3203 B 53688 Smith smithee 18 TopologyllZ A 53688 Smith smithee 18 HistoryIUS B SELECT Sname FROM students S Enro39l39led E WHERE SsidEsid AND Egr ade B39 Step 2 Discard tuples that fail predicate SELECT Sname Ec139d FROM Students S Enro39l39led WHERE SsidEsid AND Egr ade B39 SELECT Sname Ec39id FROM Students S Enro39l39led E WHERE SsidEsid AND Egrade B39 Reserves 51 Now the Details 22 95 We will use these Smym mm instances of w s e relations in our 22 examples 31 Lubber 8 555 95 Bob 3 635 Qustion If the key 1318 B65 Bouts Lid bname color rea Ion can In only the attributes 101 Ime ake blue 5 bid h 102 Interlake red oul t e 103 Clipper green semantics differ 104 Marine red Example Schemas CREATE TABLE Sa i10l 5 S id INTEGER PRIMARY KEY shame CHAR20rat39ing INTEGERage REAL CREATE TABLE Boats b id INTEGER PRIMARY KEY bname CHAR 20 co39lor CHAR10 PRIMARY KEY s39id bid day FOREIGN KEY b39id REFERENCES Boats ROM Sailors Reserves WHERE Sai1orssidReserVessid 103 Sid shame rating age Sid 22 dustin 7 450 22 22 dustin 7 450 95 31 lubber 8 555 22 31 lubber 8 555 95 95 Bob 3 635 22 95 Bob 3 6315 95 bid 101 1 1 1 1 1 88888 day 101096 111296 101096 111296 101096 111296 E Some Notes on Range Variables 0 Can associate range variablesquot with the tables in the FROM clause saves writing maks queris msier in understand 0 Needed when ambiguity could arise for example ifsame table used multiple tims in same FROM called a selfjoinquot FROM sa39i39lorsReserves WHERE sa39i39lorss39idReservess39id AND b39id103 SELECT shame SELECT S shame rewritten using FROM s 11 s 5 range variabls as WHERE S539dR5 id AND b39id103 395 More Notes required selfjoin example 0 Here s an example where range variables are SELECI39 xsname xage FROM Sai39lor s x Sai39lor s y WHERE xage gt yage ysname yage 0 Note that target list can be replaced by if you don39t want to do a projection me Find sailors who ve reserved at least one boat SELECT Ss id FROM Sai39lor s 5 Reserves R WHERE Ss idRs I 0 Would adding DISTINCT to this query make a difference 0 What is the effect of replacing 55dby Ssname in the SELECT clause Would adding DISTINCT to this variant of the quew make a difference E Expressions 0 Can use arithmetic expressions in SELECT clause plus other operations we ll discuss later 0 Use AS to provide column names SELECT Sage age S AS agel 2Sage AS age2 FR M a39i39lors S WHERE Ssname Dust39in 0 Can also have expressions in WHERE clause SELECT Slsname AS namel SZsname AS name2 FROM Sailors Sl Sailors SZ WHERE 2Slrat39ing SZrat39ing l String operations oSQL also supports some string operations o LIKE is used for string matching SELECT Sage Sage 5 AS agel 2Sage AS age2 FROM Sailors S WHERE Ssname LIKE BJ J39 7 stands for any one character and 39 stands for 0 or more arbitrary characters 39 Find sid39s of sailors who39ve reserved a red g a igreen ca 0 UNION Can be used to compute the union of any bnmmpa 39be seB of tuples which are themselves the result of SQL queries ats BReserves R b39id AND FR WHERE Rb idB Bco39lor red oR Bco39lor green SELECT R5 id F 0M Boats B Reserves R WHERE Rb39idBb39id AND Bco39lor red SELECT R5 id FROM Boats B Reserves R WHERE Rb39idBb39id AND Bco39lor green Find sid39s of sailors who ve reserved a red m a green boat 0 If we simply replace OR by AND in the previous query we get the wrong answer Why 0 Instead could use a selfjoin SELECT Rl5 id FROM Boats Bl Reserves Rl Boats B2 Reserves R2 WHERE Rl5 idR25 id ND Rlb39idBlb39id AND R2b39idBZb39id AND Blco39lor red AND B2co39lor green 3 AND Continued Kegheld book Can be used to SELECT 55139d compute the intersection FROM Sa39l39lors S Boats B 0 any 0 min7 Reserves R rumpaIIbesetsof WHERE 5 51d p39es39 AND Rb39idBb39id Also in text39 EXCEPT AND B39Comm red ometimescalledMINJS INTERSECT Included In theSQL92 SELECT 535 standard butmeny FROM Sa39l39lors S Boats B systems don39tsupport Reserves R em WHERE Ss idRs id s ButpostgreSQL does AND Rb39idBb39id AND Bco39lor green Nested Queries 0 Powerful feature of SQL WHERE clause can itself contain an L query Acmally so can FROM and HAVING clauss N r who39ve mserved bout 103 sname s HERE Ss39id IN SELECT Rs39id F RESEI39VES R WHERE Rb39id103 0 To find sailors who ve notreserved 103 use NOT IN 0 To understand semantits of nested queries think of a nested bogsevaluation Foresclr 537015 zipe check re quaI39 cabbn by compu 39ng re subquey E Nested Queries with Correlation Find names ofsailors who39ve reserved boat 103 SELECT ssname FROM sailors 5 WHERE EXISTS SELECT FROM Reserves R WHERE Rb ld103 AND S5 idR5 ld EXISTS is another set comparison operator like IN Can also specify NOT EXISTS If LNIQUE is used and is replaced by RJIIH finds sailors wit at most one reservation for boat 103 r UNIQLE checks for duplicate tupls in a subquew a More on SetComparison Operators 0 We ve already seen IN EXISTS and UNIQUE Can also use NOT IN NOT EXISTS and NOT UNIQUE 0 Also available up ANY op ALL 0 Find sailors whose rating is greater than that of some sailor called Horatlo FROM sailors 5 WHERE srat39ing gt ANY SELECT 52rat39ing FROM 5 39 ors 52 WHERE sZsname Horat39io mi Mns luple m 1 Think of subquew as a function call that runs a query 39 3 Rewriting INTERSECT Queries Using IN Find sid s of sailors who ve reserved both u red and u green bout Division in SQL SELECT RS ld FROM Boats B Reserves R WHERE Rb ldBb ld AND Bco39lor red AND R5 ld IN SELECT R25 ld Boats 52 Reserves R2 WHERE R2b ldBZb ld AND BZco39lor green 0 Similarly EXCEPT queries rewritten using NOT IN wo Id you change this to nd names not sills of Sailors who ve reserved both red and green ho ts Find sailors who ve reserved all boats SELECT Ssname FROM Sailors S Sailors S such that WHERE NOT EXlSTS SELECT there no bout B iatmt FROM oats B WHERE NOT EXlSTS SELECT 1 Reserves tnple showing S reserved B ROM Reserves R WHERE RbidBbid AND RsidSsid a Basic SQL Queries Summary 0 An advantage of the relational model is its well defined query semantics 0 SQL provides functionality close to that of the basic relational model some differences in duplicate handling null values set ope a or etc 0 Typically many ways to write a query the system is responsible for figuring a fast way to actually execute a query regardless of how it is written 0 Lots more functionality beyond these basic features Will be covered in subsequent lectures C UNT COUNT DISTLNCT A SUM DISTENCT A AVG DISTINCT A MAX A f V Aggregate Operators o Sig cant extension of relational algebra MIN A single column SELECT COUNT FROM Sailors S SELECT AVG Sage FROM Sailors S WHERE Sral1 ng10 SELECT COUNT DISTINCT Srai1 ng ROM Sailors WHERE Ssname Bob COUNT COUNT DISTINCT A SUM DETINCT A AVG DISTINCT A MAX A MN A Aggregate Operators single column SELECT Ssname FROM Sailors S WHERE Srau ng SELECT MAXSZrating FROM Sailors s2 SELECT AVG DISTINCT Sage FROM Sailors S WHERE Srating10 EFind name and age of the oldest sailors o The first query is incorrect 0 Third query equivalent to second query allowed in SQL92 standard but not supported in some systems PostgreSQL seems to run lt SELECT Ssname MAX Sage FROM Sailors S SELECT SsnameSage FROM Sailors S WHERE Sa e SELECT MAX Slage FROM Sailors 82 SELECT SsnameSage FROM Sailors S WHERE SELECT MAX Slage FROM Sailors S2 Sage 39 C GROUP BY and HAVING 0 So far we ve applied aggregate operators to all qualifying tuples metimes we want to apply them to each of several groupsof tupls 0 Consider Find le age af ne youngatsailarfar d1 Ia ngle I In general we don39t know how many rating levels exist and what the rating valus for these levels are Suppose we know that rating values go from 1 to 10 we can write 10 queris that look like this l For i 1 2 10 gage HERE Sraung i