DATABASES COP 4710
Popular in Course
MTED 5318, Teaching and Learning with Techonology in the Mathematics Classroom
verified elite notetaker
verified elite notetaker
verified elite notetaker
verified elite notetaker
Popular in Computer Programming
This 196 page Class Notes was uploaded by Vito Quigley on Thursday September 17, 2015. The Class Notes belongs to COP 4710 at Florida State University taught by Feifei Li in Fall. Since its upload, it has received 221 views. For similar materials see /class/205475/cop-4710-florida-state-university in Computer Programming at Florida State University.
Reviews for DATABASES
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/17/15
The Entity Relationship Model R ampG Chapter 2 Databases Model the Real World 0 Data Model allows us to translate real world things into structures computers can store 0 Many models Relational ER OO Network Hierarchical etc Relational Rows amp Columns Enrolled sid cid grade 53666 CamaticlOl C 53666 Regga6203 B 53650 Topology112 A 53666 History105 B Keys amp Foreign Keys to link Relations gt4 Students sid name login age gpa 53666 Jones jonescs 18 34 53688 Smith smitheecs 18 32 53650 Smith smithmath 19 38 Steps in Database Design Requirements Analysis user needs what must database do Conceptual Design high level descr often done wER model Logical Design translate ER into DBMS data model Schema Refinement consistency normalization Physical Design indexes disk layout Security Design who accesses what and how Conceptual Design 0 What are the entities and relationsnpsin the enterprise 0 What information about these entities and relationships should we store in the database 0 What are the integrity constraints or business rules that hold 0 A database schema in the ER Model can be represented pictorially ER diagrams Can map an ER diagram into a relational schema ER MOCIEI Basic 0 Employees Entity Realworld object distinguishable from other objects An entity is described using a set of attributes EnttKSet A collection of similar entities Eg all employees All entities in an entity set have the same set of attributes Until we consider hierarchies anyway Each entity set has a key undermed Each attribute has a domain ER Model Basics Contd Employees lt Departments Relationship Association among two or more entities Eg Attishoo works in Pharmacy department relationships can have their own attributes Relationshig Set Collection of similar relationships An nary relationship set R relates n entity sets E E7 each relationship in R involves entities e e E 67 6 En ER Model Basics Cont 9 super subor bUdget visor dinate ReportsTo Departments Same entity set can participate in different relationship sets or in different roles in the same set e 6 budget Key Constraints Departments An employee can work in many departments a dept can have many employees In contrast each dept has at most one manager according to the keg constrant Manyto Ho Many 1to1 on Manages Many Participation Constraints 0 Does every employee work in a department 0 If so this is a garticigation constraint the participation of Employees in WorksIn is said to be total vs partial What if every department has an employee working in it Basically means at least one w Means exactly onequot Weak Entities A weak entitycan be identified uniquely only by considering the primary key of another owner entity Owner entity set and weak entity set must participate in a onetomany relationship set one owner many weak entities Weak entity set must have total participation in this idemWing relationship set Dependents Weak entities have only a partial keyquot clashed underline Binary vs Ternary Relationships lt If each policy is owned by just 1 employee Bad design Key constraint on Policies would mean policy can only cover 1 0 dependent m Think through all the constraints in the 2nd diagram Better desi 5 W Binary vs Ternary Relationships Contd Previous example illustrated a case when two binary relationships were better than one ternary relationship 0 An example in the other direction a ternary relation Contracts relates entity sets Parts Departments and Suppliers and has descriptive attribute qty No combination of binary relationships is an adequate substitute Binary vs Ternary Relationships Contd Departments m Departments S cansupply P D needs P and D dealswith S does not imply that D has agreed to buy P from S How do we record qty Summary so far Entities and Entity Set boxes Relationships and Relationship sets diamonds binary nary Key constraints 111M MM arrows on 1 side Participation constraints bold for Total Weak entities require strong entity for key Next a couple more advanced concepts ISA IS a Hierarchies ozoAs in C or other PLs attributes are Inherited ozolf we declare A ISA B every A entity is also considered to be a B entity Overlap constraints Can Simon be an HourlyEmps as well as a ContractEmps entity Alloweddisallowed Covering constraints Does every Employees entity also have to be an HourlyEmps or a ContractEmps entity Yesno Reasons for using ISA To add descriptive attributes specific to a subclass ie not appropriate for all entities in the superclass To identify entities that participate in a particular relationship ie not all superclass entities palticipate Aggregation Used to model a relationship involving a relationsup set Allows us to treat relationship set as an entity set for purposes of participation in other relationships a Departments Aggregation vs ternary relationship oz Monitors is a distinct relationship with a descriptive attribute oz Also can say that each sponsorship is monitored by at most one employee Conceptual Design Using the ER Model ER modeling can get tricky Design choices Should a concept be modeled as an entity or an attribute Should a concept be modeled as an entity or a relationship Identifying relationships Binary or ternary Aggregation Note constraints of the ER Model A lot of data semantics can and should be captured But some constraints cannot be captured in ER diagrams o We ll refine things in our logical relational design Entity vs Attribute Should address be an attribute of Employees or an entity related to Employees Depends upon how we want to use address information and the semantics of the data 0 If we have several addresses per employee address must be an entity since attributes cannot be setvalued If the structure city street etc is important address must be modeled as an entity since attribute values are atomic Entity vs Attribute Cont 0 WorksIn2 does not 0 l allow an employee to Em lo ees Departments work In a department for two or more periods Similar to the problem of wanting to record several addresses for an employee we want to record several values of 8 I the descri tive attributes lt gt for each instance of this relationship W Entity vs Relationship OK as long as a manager gets a separate discretionary budget dbua get for each I dept What if manager s dbua get covers all a gt managed depts can repeat value but such redundancy is problematic Now you try it Try this at home Courses database 0 Courses Students Teachers 0 Courses have ids titles credits 0 Courses have multiple sections that have time rm and exactly one teacher 0 Must track students course schedules and transcripts including grades semester taken etc Must track which classes a professor has taught Database should work over multiple semesters These things get pretty hairy Many ER diagrams cover entire walls 0 A modest example A Cadastral ER Diagram I Legal 83 02mm m mummy Version 2 April 1999 PBSTwIaIp PLTW5IFSKD3M Dllkbt nisstmmpnmnwm MAW FLSS Desclmh ID 1me ID W Legalmea Descrp ol n PL n sclp ui ID W W430i want I Twin Number museum lmex align muss Twlxl ammo mD lmW sum an Towxi pnacmn semnunnxms m museum nan secolmrvsmuvuane alge umber m m auge u sway mam In Twltlln mpim Sam murmur FlI DIAB cadastral showing or recording property boundaries subdivision lines buildings and related details Source US Dept Interior Bureau of Land Management Federal Geographic Data Committee Cadastral Subcommittee http WWW fairvieWindustries com standardmodule caderdhtm u nuance um Rmmonm E BonerFoIItID L almea mm 39V I 9 pm nuance mm mm lognerlni Paw I Parcel Recon louqu ggglm 17 we x Famlmgammasms Rmmmmmn RiGolm pe pm m RE Sollu D22 ram Remm uoImnlsm DnElCllllE Record aoumrin BURT cucuarcum 93ml gamequot magnum Recon Holmale Salon in Rajquot Recon BONGMID am cemalm la kaoe vane Farcemane WWW one I g Dimmmvalle parcel Local ml at Long clam Le 11 Pamltmsacum parcel ID Tiamm IU Tranamn men I upquot Traua lm lD Resemamh 39mm A Page 1 of Summary of Conceptual Design Conceptual design follows requirements analysis Yields a highlevel description of data to be stored ER model popular for conceptual design Constructs are expressive close to the way people think about their applications Note There are many variations on ER model 0 Both graphically and conceptually Basic constructs entities relationships and attributes of entities and relationships Some additional constructs weak entities 5A hierarchies and aggregation Summary of ER Cont Several kinds of integrity constraints key constrants particpation constraints overlapcovering for ISA hierarchies Some foreign key constraints are also implicit in the definition of a relationship set Many other constraints notably functional dependencies cannot be expressed Constraints play an important role in determining the best database design for an enterprise Security Issues in Database Systems Introduction to DB Security Secrecy Users should not be able to see things they are not supposed to Eg A student can t see other students grades Integrity Users should not be able to modify things they are not supposed to Eg Only instructors can assign grades Availability Users should be able to see and modify things they are allowed to Access Controls A security policy specifies who is authorized to do what A security mechanism allows us to enforce a chosen security policy Two main mechanisms at the DBMS level Discretionary access control Mandatory access control Discretionary Access Control 0 Based on the concept of access rights or privileges for objects tables and views and mechanisms for giving users privileges and revoking privileges Creator of a table or a view automatically gets all privileges on it DMBS keeps track of who subsequently gains and loses privileges and ensures that only requests from users who have the necessary privileges at the time the request is issued are allowed GRANT Command l privileges Object users The following privileges can be specified SELECT Can read all columns including those added later via ALTER TABLE command a INSERTcolname Can insert tuples with nonnull or non default values in this column ozoINSERT means same right with respect to all columns a DELETE Can delete tuples REFERENCES colname Can define foreign keys in other tables that refer to this column 0 If a user has a privilege with the GRANT OPTION can pass privilege on to other users with or without passing on the GRANT OPTION 0 Only owner can execute CREATE ALTER and DROP GRANT and REVOKE of Privileges GRANT INSERT SELECT ON Sailors TO Horatio Horatio can query Sailors or insert tuples into it GRANT DELETE ON Sailors TO Yuppy WITH GRANT OPTION Yuppy can delete tuples and also authorize others to do so GRANT UPDATE rating ON Sailors TO Dustin Dustin can update only the rating field of Sailors tuples GRANT SELECT ON ActiveSailors TO Guppy Yuppy This does NOT allow the uppies to query Sailors directly REVOKE When a privilege is revoked from X it is also revoked from all users who got it solelyfrom X GRANTREVOKE on Views 0 If the creator of a view loses the SELECT privilege on an underlying table the view is dropped 0 If the creator of a view loses a privilege held with the grant option on an underlying table she loses the privilege on the view as well so do users who were granted that privilege on the view Views and Security 0 Views can be used to present necessary information or a summary while hiding details in underlying relations Given ActiveSaiIors but not Sailors or Reserves we can find sailors who have a reservation but not the bids of boats that have been reserved 0 Creator of view has a privilege on the view if she has the privilege on all underlying tables Together with GRANTREVOKE commands views are a very powerful access control tool RoleBased Authorization In SQL92 privileges are actually assigned to authorization ids which can denote a single user or a group of users In SQL1999 and in many current systems privileges are assigned to roles Roles can then be granted to users and to other roles Reflects how real organizations work Illustrates how standards often catch up with de factoquot standards embodied in popular systems Security to the Level of a Field 0 Can create a view that only returns one field of one tuple How 0 Then grant access to that view accordingly Allows for arbitralygranularity of control but Clumsy to specify though this can be hidden under a good UI Performance is unacceptable if we need to define fieldgranularity access frequently Too many view creations and lookups InternetOriented Security 0 Key Issues User authentication and trust When DB must be accessed from a secure location password based schemes are usually adequate For access over an external network trust is hard to achieve If someone with Sam s credit card wants to buy from you how can M be sure it is not someone who stole his card How can Sam be sure that the screen for entering his credit card information is indeed yours and not some rogue site spoofing you to steal such information How can he be sure that sensitive information is not sniffed while it is being sent over the network to you Encryption is a technique used to address these issues Encryption Masks data for secure transmission or storage Encryptdata encryption key encrypted data Decryptencrypted data decryption key original data Without decryption key the encrypted data is meaningless gibberish Symmetric Encryption Encryption key decryption key all authorized users know decryption key a wea ness DES used since 1977 has 56 bit key AES has 128bit optionally 192bit or 256 bit key PublicKey Encryption Each user has two keys User s public encryption key Known to all Decryption key Known only to this user Used in RSA scheme Turing Award R SA PublicKey Encryption Let the data be an integer I Choose a large gtgt I integer n p q p q are large say 1024bit distinct prime numbers Encryption Choose a random number 1 lt e lt L that is relatively prime to p1 q1 Encrypted data S I e mod L Decryption key d Chosen so that d e 1 mod p1q1 We can then show that I S d mod L It turns out that the roles of e and d can be reversed so they are simply called the public e L and private keys d L Certifying Servers SSL SET 0 If Amazon distributes their public key Sam39s browser will encrypt his order using it So only Amazon can decipher the order since no one else has Amazon s private key 0 But how can Sam or his browser know that the public key for Amazon is genuine The SSL protocol covers this Amazon contracts with say Verisign to issue a certificate ltVerisign Amazon amazoncom publickeygt This certificate is stored in encrypted form encrypted with Verisign s private key known only to Verisign Verisign s public key is known to all browsers which can therefore decrypt the certificate and obtain Amazon s public key and be confident that it is genuine The browser then generates a temporary session key encodes it using Amazon s public key and sends it to Amazon All subsequent msgs between the browser and Amazon are encoded using symmetric encryption eg DES which is more efficient than publickey encryption o What if Sam doesn39t trust Amazon with his credit card information Secure Electronic Transaction protocol 3way communication between Amazon Sam and a trusted server eg Visa Authenticating Users Amazon can simply use password authentication ie ask Sam to log into his Amazon account Done after SSL is used to establish a session key so that the transmission of the password is secure Amazon is still at risk if Sam s card is stolen and his password is hacked Business risk Digital Signatures Sam encrypts the order using his private key then encrypts the result using Amazon s public key Amazon decrypts the msg with their private key and then decrypts the result using Sam s public key which yields the original order Exploits interchangeability of publicprivate keys for encryptiondecryption Now no one can forge Sam s order and Sam cannot claim that someone else forged the order Public key digital signature schemes 1 Insecure Channel I KeyGen gtSK PK I SK Verm PK 6 gt valid Signm SK gt c Statistical DB Security Statistical DB Contains information about individuals but allows only aggregate queries eg average age rather than Joe s age New problem It may be possible to infer some secret information Eg If I know Joe is the oldest sailor I can ask How many sailors are older than Xquot for different values of X until I get the answer 1 this allows me to infer Joe s age Idea Insist that each query must involve at least N rows for some N Will this work No Why Minimum N is Not Enough By asking How many sailors older than X until the system rejects the query can identify a set of N sailors including Joe that are older than X let X55 at this point 0 Next ask What is the sum of ages of sailors older than X Let result be S1 0 Next ask What is sum of ages of sailors other than Joe who are older than X plus my age Let result be 2 5152 is Joe s age Summary Three main security objectives secrecy integrity availability DB admin is responsible for overall security Designs security policy maintains an audit trail or history of users accesses to DB Two main approaches to DBMS security discretionary and mandatory access control Discretionary control based on notion of privileges Mandatory control based on notion of security classes 0 Statistical DBs try to protect individual data by supporting only aggregate queries but often individual information can be inferred SQL The Quew Language Part 1 RampG Chapter 5 Relational Query Languages A major strength of the relational model supports simple powerful querying of data Two sublanguages DDL Data Definition Language define and modify schema at all 3 levels DML Data Manipulation Language Queries can be written intuitively The DBMS is responsible for efficient evaluation The key precise semantics for relational queries Allows the optimizer to reorderchange operations and ensure that the answer does not Change Internal cost model drives use of indexes and choice of access paths and physical operators The SQL Query Language c The most widely used relational query language Current standard is SQL1999 0 Not fully supported yet Introduced ObjectRelational concepts and lots more Many of which were pioneered in Postgres here at Berkeley SQL200x is in draft SQL92 is a basic subset Most systems support a medium MySQL has some unique aspects 0 as do most systems XML supportintegration is the next challenge for SQL DDL Create Table CREATE TABLE tabename columnname datatype EFAULT de awsiexpr gammacmsrrain rabiewnsrrainr Data Types include charactern fixedlength character string character varyingn variablelength character string smallint integer bigint numeric real double precision date time timestamp serial unique ID for indexing and cross reference Create Table wcolumn constraints CREATE TABLE tabename columnname datatype DEFAULT de auLexpr columnconstraint tebfeEw e rei f Column Constraints CONSTRAINT constraint name NOT NULL NULL UNIQUE PRIMARY KEY CHECK expression REFERENCES re abe refooumn ON DELETE action ON UPDATE action action is one of NO ACTION CASCADE SET NULL SET DEFAULT expression for column constraint must produce a boolean result and reference the related column s value only Create Table wtable constraints CREATE TABLE tabename columnname datatype DEFAULT de auLexpr columnconstraint tableconstraint Table Constraints CONSTRAINT constraint name UNIQUE columnname PRIMARY KEY columnname CHECK expression FOREIGN KEY columnname REFERENCES re abe refcolumn ON DELETE action ON UPDATE action Here expressions keys etc can include multiple columns Create Table Examples CREATE TABLE films code CHAR5 PRIMARY KEY title VARCHAR40 did DECIMAL3 dateprod DATE kind VARCHARlO CONSTRAINT production UNIQUEdateprod FOREIGN KEY did REFERENCES distributors ON DELETE NO ACTION CREATE TABLE distributors did DECIMAL3 PRIMARY KEY name VARCHAR40 CONSTRAINT con1 CHECK did gt 100 AND name ltgt The SQL DML Singletable queries are straightforward To find all 18 year old students we can write SELECT 7quot FROM Students S WHERE Sage18 sid name login age gpa 53666 Jones jonescs 18 34 53688 Smith smithee 18 32 To find just names and logins replace the first line SELECT Sname Slogin SELECT sname Querying Multiple Relations 0 Can specify a join over two tables as follows Ec139d FROM Students S Enrolled E WHERE SsidEsid AND Egrade B39 sid cid grade Sid name 1081 age 8193 53831 CarnatiolOl C 53666 Jones jonescs 18 34 53831 Regga6203 B 53688 Smith srnithee 18 32 53650 Topology112 A 53666 HistorleS B Note obviously no referential integrity result Sname E Ciod constraints have Jones HIStorymS been used here SELECT DISTINCT targetlist FROM relationlist BaSiC Query WHERE qualification relationIN A list of relation names possibly with a rangevariable after each name arget lLst A list of attributes of tables in relationIN quali cation Comparisons combined using AND OR and NOT Comparisons are Attr 0p const or Attr1 0p Attr2 where 00 is one of lt gt9 9 g 2 DISTINCT optional keyword indicating that the answer should not contain duplicates In SQL SELECT the default is that duplicates are n0teliminated Result is called a multiset Query Semantics Semantics of an SQL query are defined in terms of the following conceptual evaluation strategy 1 do FROM clause compute crossgroa uctof tables eg Students and Enrolled 2 do WHERE clause Check conditions discard tuples that fail called selection 3 do SELECT clause Delete unwanted fields called projection 4 If DISTINCT specified eliminate duplicate rows Probably the least efficient way to compute a query An optimizer will find more efficient strategies to get the same answer Step 1 Cross Product Ssid Sname Slogin Sage Sgpa Esid Ecid Egrade 53666 Jones jonescs 18 34 53831 CarnatiolOl C 53666 Jones jonescs 18 34 53832 Reggae203 B 53666 Jones jonescs 18 34 53650 Topology112 A 53666 Jones jonescs 18 34 53666 History105 B 53688 Smith smithee 18 32 53831 CarnatiolOl C 53688 Smith smithee 18 32 5 3831 Reggae203 B 53688 Smith smithee 18 32 53650 Topology112 A 53688 Smith smithee 18 32 53666 History105 B SELECT Sname Ecid FROM Students S Enr011ed E WHERE SsidEsid AND Egrade B39 Step 2 Discard tuples that fail predicate WHERE S S39idE Ssid Sname Slogin Sage Sgpa Esid Ecid Egrade 53666 Jones jonescs 18 34 53831 CamaticlOl C 53666 Jones jonescs 18 34 53832 Regga6203 0 53666 Jones ionescs 18 34 53650 Topologv112 A 666 Jones jonescs 18 34 lt53666 History105 53688 Smith smithee 18 32 53831 Carnat10101 C 53688 Smith smithee 18 32 53831 Regga6203 9 53688 Smith smithee 18 32 53650 Topology112 53688 Smith smithee 18 32 53666 History105 0 SELECT Sname Ecid FROM Students S Enr011ed E sid AND Egrade B39 Step 3 Discard Unwanted Columns Ssid Sname Slogin Sage Sgpa Esid Ecid 53666 Jones jonescs 18 34 53831 CamaticlOl 53666 Jones jonescs 18 34 53832 Regga6203 53666 Jones i0nescs 18 34 53650 Topologv112 663 Jones jonescs 18 34 lt53666History105 53688 Smith smithee 18 32 5 3831 CarnatiolOl 53688 Smith smithee 18 32 53831 Regga6203 53688 Smith smithee 18 32 53650 Topology112 53688 Smith smithee 18 32 53666 History105 SELECT Sname Ecid FROM Students S Enro11ed E WHERE SsidEsid AND Egrade B39 Reserves Now the Details We will use these Sailors instances of relations in our examples Question If the key for the Reserves relation contained only the attributes Sid and bio how would the semantics differ Boats Em 1ch 22 101 95 103 101096 111296 811211116 rating age Dustin Lubber Bob 7 450 8 555 3 635 bn arne color 101 102 103 104 Interlake Interlake Clipper Marine blue red green red Example Schemas CREATE TABLE Sa ors S39id INTEGER PRIMARY KEY sname CHARCZOratTng INTEGERage REAL CREATE TABLE Boats bid INTEGER PRIMARY KEY bname CHAR 20 co1or CHAR10 CREATE TABLE Reserves C sid INTEGER REFERENCES Sai1ors bid INTEGER day DATE PRIMARY KEY sid bid day FOREIGN KEY bid REFERENCES Boats Another Join Query SELECT sname FROM Sailors Reserves WHERE SailorssidReservessid AND bid103 Sid sname rating age Sid bid day 22 dustin 7 450 22 101 101096 22 dustin 7 450 58 103 111296 31 lubber 8 555 22 101 10 10 96 31 lubber 8 55 5 58 103 11 12 96 95 Bob 3 635 22 101 101096 95 Bob 3 635 95 103 11 1296 Some Notes on Range Variables Can associate range variables with the tables in the FROM clause saves writing makes queries easier to understand Needed when ambiguity could arise for example if same table used multiple times in same FROM called a selfjoin SELECT Shame FROM SailorsReserves WHERE SailorssidReservessid AND bid103 Can be SELECT Ssname rewritten using FROM Sailors S Reserves R range variables as WHERE SsidRsid AND bid103 More Notes 0 Here s an example where range variables are requwedSeroH1exanu ex SELECT xsname xage ysname yage FROM Sailors x Sailors y WHERE xage gt yage Note that target list can be replaced by if you don t want to do a projection SELECT FROM Sailors x WHERE xage gt 20 Find sailors who ve reserved at least one boat SELECT SS39id FROM Sailors S Rese WHERE SS39idRS39id rves R Would adding DISTINCT to this query make a difference 0 What is the effect of replacing 5570 by 55name in the SELECT clause Would adding DISTINCT to this variant of make a difference the query Expressions Can use arithmetic expressions in SELECT clause plus other operations we ll discuss later 0 Use AS to provide column names SELECT Sage Sage 5 AS agel 2Sage AS age2 FROM Sailors S WHERE Ssname Dustin Can also have expressions in WHERE clause SELECT Slsname AS namel SZsname AS name2 FROM Sailors Sl Sailors SZ WHERE 2Slrating SZrating 1 String operations oSQL also supports some string operations o LIKE is used for string matching SELECT Sage Sage 5 AS agel 2Sage FROM Saiiors S WHERE ssname LIKE Bb AS age2 stands for any one character and stands for O or more arbitrary characters Find sid s of sailors who ve reserved a red g a green boat 0 UNION Can be used to compute the union of any two unioncompatbe sets of tuples which are themselves the result of SQL queries SELECT RS39id FROM Boats BReserves R WHERE Rb39idBb39id AND Bcoior red 0R Bc010r green Js SELECT Rsid FROM Boats B Reserves R WHERE RbidBbid AND Bcoior red UNION SELECT Rsid FROM Boats B Reserves R WHERE RbidBbid AND Bcoior green Find sid s of sailors who ve reserved a red and a green boat 0 If we simply replace OR by AND in the previous query we get the wrong answer Why 0 Instead could use a selfjoin SELECT R1 S39id FROM Boats Bl Reserves R1 Boats 32 Reserves R2 WHERE R1sidR2sid AND R1bidBlbid AND R2bid32bid AND Blco1or red AND BZco1or green AND Continued Key field INTERSECTdiscussed in v book Can be used to SELECT 51 d compute the intersection FROM Sa39l l ors S Boats B of any two union Res e r39ves R compatib esets 039 WHERE SS39idR s139d tuples AND Rb1dBb1d AND Bcolor red 0 Also in text EXCEPT sometimes called MINUS INTERS ECT Included in the SQL92 SELECT S 539i d standard but many FROM Sailors S Boats B systemsdon39t support Reserves R them WHERE SS39idRS39id AND RbidBbid AND Bcolor green Find sid s of sailors who ve reserved a red but did not reserve a green boat SELECT Ssid FROM Sai1ors S Boats B Reserves R WHERE ssidRsid AND RbidBbid AND Bco1or red EXCEPT SELECT Ssid FROM Sai1ors S Boats B Reserves R WHERE ssidRsid AND RbidBbid AND BCo1or green Nested Queries 0 Powerful feature of SQL WHERE clause can itself contain an SQL query Actually so can FROM and HAVING clauses Names of sailors who ve reserved boat 103 SELECT Ssname FROM Sailors S WHERE ssid IN SELECT Rsid FROM Reserves R WHERE Rbid103 To find sailors who ve notreserved 103 use NOT IN 0 To understand semantics of nested queries think of a nested ooQs evaluation For each Sailors tupe check the quali cation by computing the subquey Nested Queries with Correlation Find names of sailors who ve reserved boat 103 SELECT S Shame FROM Sailors WHERE EXISTS SELECT FROM Reserv R WHERE Rbid103 AND SsidRsid EXISTS is another set comparison operator like IN Can also specify NOT EXISTS If UNIQUE is used and is replaced by Rbid finds sailors with at most one reservation for boat 103 UNIQUE checks for duplicate tuples in a subquery Subquery must be recomputed for each Sailors tuple Think of subquery as a function call that runs a query More on SetComparison Operators We ve already seen IN EXISTS and UNIQUE Can also use NOT IN NOT EXISTS and NOT UNIQUE Also available opANY opALL Find sailors whose rating is greater than that of some sailor called Horatio SELECT 7quot FROM Sailors S WHERE Srating gt ANY SELECT SZrating FROM Sailors SZ WHERE SZsname Horatio Rewriting INTERSECT Queries Using IN Find sid s of sailors who ve reserved both a red and a green boat SELECT Rsid FROM Boats B Reserves R WHERE RbidBbid AND Bcoior red AND Rsid IN SELECT R2sic FROM Boats 52 Reserves R2 WHERE R2bic32bid AND BZcoior green Similarly EXCEPT queries rewritten using NOT IN How would you change this to find names not sids of Sailors who ve reserved both red and green boats Division in SQL Find sailors who ve reserved all boats SELECT SSr1ame FROM Sailors S Sailors S such that WHERE NOT EXISTS SELECT Bbid FROM Boats B WHERE NOT EXISTS SELECT Rbid a Reserves triple showing 8 reserved B FROM ReserVeS R WHERE RbidBbid AND RSidSSid there is no boat B without Basic SQL Queries Summary An advantage of the relational model is its well defined query semantics SQL provides functionality close to that of the basic relational model some differences in duplicate handling null values set operators etc Typically many ways to write a query the system is responsible for figuring a fast way to actually execute a query regardless of how it is written Lots more functionality beyond these basic features Will be covered in subsequent lectures Minimum Cover of F Minimal Cover for a Set of FDs Minimal cover G for a set of FDs F Closure of F closure of G Right hand side of each FD in G is a single attribute If we modify G by deleting an FD or by deleting attributes from an FD in G the closure changes Intuitively every FD in G is needed and as small as possible in order to get the same closure as F eg A gt B ABCD E EF gt GH ACDF gt EG has the following minimal cover A gt B ACD gtE EF gtG and EF gtH MC implies LosslessJoin Dep Pres Decomp Start with MC of F do the decomposition from last slide Functional dependencies Our goal given a set of FD set F find an alternative FD set G that is smaller equivalent Bad news Testing FG F 3 is computationally expensive Good news Canonical Cover algorithm given a set of FD F finds minimal FD set equivalent to F Minimal can39t find another equivalent FD set w fewer FD s Canonical Cover Algorithm 1V Il2 F A 9 BC B 9 CE Fc F A 9 E 0N0 G that is equivalent AC9 H to F and is smaller than Fc D 9 B Determines canonical cover of F Fc A 9 BH B 9 CE D 9 B Another example F A 9 BC B a C A 9 B AB9 C AC 9 D FcA9BD CC B 9C Algorithm Canonical Cover Algorithm Basic Algorithm ALGORITHM CanonicalCover X FD set BEGIN REPEAT UNTIL STABLE 1 Where possible apply UNION rule A s axioms e g A 9BC A CD becomes A BCD 2 remove extraneous attributes from each FD e g AB C A B becomes A B B C ie A is extraneous in AB C Extraneous Attributes 1 Extraneous is RHS eg can we replace A 9 BC with A9C ie Is B extraneous in A 9BC 2 Extraneous in LHS eg can we replace AB 9C with A 9 C ie Is B extraneous in AB9C Simple but expensive test 1 Replace A 9 BC or AB 9 C with A 9 CinF F2 F A 9BC U A 9C 0139 F AB9C U A 9C 2 Test if F2 F if yes then B extraneous Extraneous Attributes A RHS Is B extraneous in A 9BC step 1 F2 F A 9BC U A 9C step 2 F F2 To simplify step 2 observe that F2 Q F Why Have effectively removed A B from F ie not new FD s in F2 When is F F2 Ans When A B in F2 Idea if F2 includes A B and A C then it includes A9 BC Extraneous Attributes B LHS Is B extraneous in A B C step 1 F2 F AB 9C U A 9C step 2 F F2 To simplify step 2 observe that F Q F2 Why A C implies AB C therefore all FD s in F also in F2 ie there may be new FD s in F2 But AB C does not imply A C When is F F2 Ans When A C in F Idea if F includes A C then it will include all the FD s of F2 Extraneous attributes A RHS Given F A9BC B C is C extraneous in A 9BC why or why not Ans yes because A C in A B B9C Proof 1 A9 B 2 B 9C 3 A C transitivity using Armstrong s axioms Extra neous attributes B LHS Given F A9B AB C is B extraneous in AB 9C why or why not Ans yes because A C in F Proof 1 A9 B 2 AB 9C 3 A C using pseudotransitivity on 1 and 2 Actually we have AA C but A A A Canonical Cover Algorithm ALGORITHM CanonicalCover F set of FD s BEGIN REPEAT UNTIL STABLE 1 Where possible apply UNION rule A s axioms 2 Remove all extraneous attributes a Test if B extraneous in A9 BC B extraneous if A B in F A BC U A9C b Test if B extraneous in AB C B extraneous in AB C if A C in F Canonical Cover Algorithm Example determine the canonical cover of F A 9BC B CE A E Iteration 1 a F A BCE B CE b Must check for upto 5 extraneous attributes B extraneous in A BCE No C extraneous in A 9 BCE yes A C in A BE B CE 1 A BE gt 2 A B gt 3 A CE gt 4 A 9 C E extraneous in A BE Canonical Cover Algorithm Example determine the canonical cover of F A 9BC B CE A E Iteration 1 a F A BCE B CE b Must check for upto 5 extraneous attributes B extraneous in A BCE No C extraneous in A 9 BCE Yes E extraneous in A BE 1 A B gt 2 A CE gt A E E extraneous in B CE No C extraneous in B CE No Iteration 2 a FA9BB9CE b Extraneous attributes C extraneous in B 9 CE No E extraneous in B 9CE No DONE Canonical Cover Algorithm Find the canonical cover of F A 9 BC B 9 CE A 9 E AC 9 H D 9 B Ans Fc A9BH B9 CE D9B SQL The Query Language Part 3 R ampG Chapter 5 Sorting the Results of a Query ORDER BY column ASC DESC SELECT Srating Ssname Sage FROM Sailors S Boats B Reserves R WHERE SsidRsid AND RbidBbid AND Bc010r red ORDER BY Srat139ng Ssname Can order by any column in SELECT list including expressions or aggs SELECT Ssid COUNT AS redrescnt FROM Sailors S Boats B Reserves R WHERE SsidRsid AND RbidBbid AND Bc010r red GROUP BY Ssid ORDER BY redrescnt DESC Views repeat from last class CREATE VIEW viewname AS selectstatement Makes development simpler Often used for security Not instantiated makes updates tricky CREATE VIEW Reds AS SELECT Bbid COUNT AS scount FROM Boats B Reserves R WHERE RbidBbid AND Bc010r red GROUP BY Bbid Views Instead of Relations in Queries CREATE VIEW Reds AS SELECT Bbid COUNT AS scount FROM Boats B Reserves R WHERE RbidBbid AND Bc010r red GROUP BY Bbid bid scount 102 1 Reds SELECT bname scount FROM Reds R Boats B WHERE RbidBbid AND scount lt 10 Discretionary Access Control GRANT privileges ON object TO users WITH GRANT OPTION Object can be a Table or a View Privileges can be 0 Select Insert Delete 0 References cols allow to create a foreign key that references the specified columns o All Can later be REVOKEd Users can be single users or groups See Chapter 17 for more details Two more important topics Constraints 0 SQL embedded in other languages Integrity Constraints Review 0 An IC describes conditions that every legal instance of a relation must satisfy Insertsdeletesupdates that violate IC s are disallowed Can be used to ensure application semantics eg sio is a key or prevent inconsistencies eg sname has to be a string age must be lt 200 o T mes ofIC Domain constraints primary key constraints foreign key constraints general constraints Domain constrants Field values must be of right type Always enforced Primay key and foreign key constraints you know them CREATE TABLE Sailors sid INTEGER General Constraints sname CHAR10 rating INTEGER age REAL Useful when PRIMARY KEY sid more general ICs CHECK rating gt 1 Fhan keys are AND rating lt 10 Involved CREATE TABLE Reserves Can use queries sname CHAR10 to 3 03quot bid INTEGER constraint Checked on insert day DATE or update PRIMARY KEY b1dday Constraints can CONSTRAINT noInterlakeRes be named CHECK Interlake ltgt SELECT Bbname FROM Boats B WHERE Bbidbid Constraints Over Multiple Relations CREATE TABLE Sailors Awkward and wrong Only checks sailors Only required to hold if the associated table is nonempty ASSERTION is the right solution not associated with either table Unfortunately not supported in many DBMS Triggers are another solution sid INTEGER sname CHAR10 rating INTEGER age REAL 39PRIMARY KEY sid CHECK Number of bouts plus number of sailors is lt 100 SELECT COUNT Ssid FROM Sailors S SELECT COUNT Bbid FROM Boats B lt 100 CREATE ASSERTION smallClub CHECK SELECT COUNT Ssid FROM Sailors S SELECT COUNT Bbid FROM Boats B lt 100 Writing Applications with SQL 0 SQL is not a general purpose programming language Tailored for data retrieval and manipulation Relatively easy to optimize and parallelize Can t write entire apps in SQL alone Options Make the query language turing complete Avoids the impedance mismatchquot but loses advantages of relational lang simplicity Allow SQL to be embedded in regular programming languages Q What needs to be solved to make the latter approach work Embedded SQL DBMS vendors usually provide host language bindings Eg for C or COBOL Allow SQL statements to be called from within a program Typically you preprocess your programs Preprocessor generates calls to a proprietary DB connectivity library General pattern One call to connectto the right database Iogin etc SQL statements can refer to host variables from the language Typically vendorspecific We won t look at any in detail we ll look at standard stuff Problem SQL relations are multisets no a prior bound on the number of records No such data structure in C SQL supports a mechanism called a cursorto handle this Just to give you a flavor EXEC SQL SELECT Ssname Sage INTO csnamecage FROM Sa11ors S WHERE Ssid csid Cursors Can declare a cursor on a relation or query Can opena cursor Can repeatedly fetcha tuple moving the cursor Special return value when all tuples have been retrieved ORDER BY allows control over the order in which tuples are returned Fields in ORDER BY clause must also appear in SELECT clause Can also modify delete tuple pointed to by a cursor A nonreationaquot way to get a handle to a particular tuple There39s an Embedded SQL syntax for cursors DECLARE ltcursornamegt CURSOR FOR ltseect stmtgt FETCH FROM ltcursornamegt INTO ltvariabe namesgt But we ll use JDBC instead Database APIs alternative to embedding Rather than modify compiler add a library with database calls API special proceduresobjects passes SQL strings from language presents result sets in a languagefriendly way ODBCa CC standard started on Windows JDBCa Java equivalent Most scripting languages have similar things 0 Eg For Perl there is DBI oraPerl other packages Mostly DBMSneutral at least try to hide distinctions across different DBMSs Architecture Application I ODBC driver I Data source I A lookup service maps data source names DSNs to drivers Typically handled by OS 0 Based on the DSN used a driver is linked into the app at runtime o The driver traps calls translates them into DBMSspecific code 0 Database can be across a network ODBC is standard so the same program can be used in theory to access multiple database systems 0 Data source may not even be an SQL database ODBCJDBC Various vendors provide drivers MS bundles a bunch into Windows Vendors like DataDirect and OpenLink sell drivers for multiple OSes 0 Drivers for various data sources Relational DBMSs Oracle DBZ SQL Server Informix etc Desktop DBMSs Access Dbase Paradox FoxPro etc Spreadsheets MS Excel Lotus 1 23 etc Delimited text files CSV TXT etc 0 You can use JDBCODBC clients over many data sources Eg MS Query comes with many versions of MS Office msqry32exe Can write your own Java or C programs against xDBC JDBC Part of Java very easy to use 0 Java comes with a JDBCtoODBC bridge So JDBC code can talk to any ODBC data source Eg look in your Windows Control Panel for ODBC drivers JDBC tutorial online httpdeveoperjavasuncomdeveloperBooksJDBC Tutorial JDBC Basics Connections 0 A Connection is an object representing a login to a database GET CONNECTION Connection con try con DriverManagergetConnection quotjdbczodbczsailorsDBquot userNamepassword catchException e Systemoutprintlne Eventually you close the connection CLOSE CONNECTION try conclose catch Exception e Systemoutprintlne JDBC Basics Statements 0 You need a Statement object for each SQL statement CREATE STATEMENT Statement stmt try stmt concreateStatement catch Exception e Systemoutprintne Soon we ll say stmtexecuteQueryC select quot CreateStatement cursor behavior a Two optional args to createStatement createStatementResultSetltTYPEgt ResultSetltCONCURgt Corresponds to SQL cursor features ltTYPEgt is one of TYPEFORWARDONLY can t move cursor backward TYPESCROLLINSENSlTIVE can move backward but doesn t show results of any updates TYPESCROLLSENSlTIVE can move backward will show updates from this statement ltCONCURgt is one of CONCURREADON LY this statement doesn t allow updates CONCURUPDATABLE this statement allows updates Defaults TYPEFORWARDONLY and CONCURREADONLY JDBC Basics ResultSet A ResultSet object serves as a cursorfor the statement39s results stmtexecuteQuery EXECUTE QUERY ResultSet results try results stmtexecuteQuery quotselect from Sailorsquot catch Exception e Systemoutprintlne Obvious handy methods resultsnext advances cursor to next tuple Returns false when the cursor slides off the table beginning or end scrollable cursors resultsprevious resultsrelativeint resultsabsoluteint resultsfirst resultsast resultsbeforeFirst resultsafterLast ResultSet Metadata Can find out stuff about the ResultSet schema via ResultSetMetaData ResultSetMetaData rsmd resultsgetMetaData int numCols rsmdgetColumnCount int i rowcount 0 get column header info for i1 i lt numCols i if i gt 1 bufappendquotquot bufappendrsmdgetColuanabeli bufappendquotnquot Other ResultSetMetaData methods getCqumnTypei isNuIIabIei etc Getting Values in Current of Cursor getString break it off at 100 rows max while ampamp rowcount lt lOO Loop through each column getting the resultsnext column data and displaying for i1 i lt numCols i if i gt 1 bufappendquotquot bufappendresultsgetStringi bufappendquotnquot rowcount Similarly getFloat getInt etc Updating Current of Cursor Update fields in current of cursor resultnext resultupdatelntquotRatingquot 10 Also updateString updateFloat etc Or can always submit a full SQL UPDATE statement Via executeQuery The original statement must have been CONCURUPDATABLE in either case Cleaning up Neatly try CLOSE RESULT SET resultsclose CLOSE STATEMENT stmtclose CLOSE CONNECTION conclose catch Exception e Systemoutprintlne Putting it Together wo trycatch Connection con DriverManagergetConnectionquotjdbcodbcweblogquotuserNamepas sword Statement stmt concreatestatement ResultSet results stmtexecuteQueryquotselect from Sailorsquot ResultSetMetaData rsmd resultsgetMetaData int numCols rsmdgetColumnCount i StringBuffer buf new StringBuffer while resultsnext ampamp rowcount lt 100 for i1 i lt numCols i if i gt 1 bufappendquotquot bufappendresultsgetStringi bufappendquotnquot resultsclose stmtclose conclose Similar deal for web scripting langs Common scenario today is to have a web client A web form issues a query to the DB Results formatted as HTML Many web scripting languages used jsp asp PHP etc we ll use JSP in our class most of these are similar look a lot like jdbc with HTML mixed in Eg PHPPostgres ltphp conn pgpconnectquotdbhamecowbook userjmh passwordsecretquot if c0hh echo quotAn error occuredhquot exit result pgquery conn quotSELECT FROM Sailorsquot if result echo quotAn error occuredhquot exit pghumrowsresult for iO i lt hum i pgfetchrowresult i for jO j lt countr j echo quotrjamphbspquot echo quotltBRgtquot gt COP4710 Introduction to Database Systems Fall Semester 2009 Prof Fefe Li What Is a Database sttem Database a very large integrated collection of data 0 Models a realworld entergrise Entities eg teams games Relationships eg The Patriots is playing in The Superbowl More recently also includes active components eg business logicquot 0 A Database Management sttem DBMS is a software system designed to store manage and facilitate access to databases Z I Is the WWW a DBMS Fairly sophisticated search available crawler indexes pages on the web Keywordbased search for pages 0 But currently data is mostly unstructured and untyped search only can t modify the data can t get summaries complex combinations of data few guarantees provided for freshness of data consistency across data items fault tolerance Web sites eg ecommerce typically have a DBMS in the background to provide these functions 0 The picture is changing New standards like XML can help data modeling Research groups are working on providing some of this functionality across multiple web sites The WWWDB boundary is blurring Is a File System a DBMS Thought Experiment 1 You and your project partner are editing the same file You both save it at the same time Whose changes survive A Yours B Partner s C Both D Neither E oThought Experiment 2 HOW F10 You Write programs over a You re updating a file subsystem when it The power goes out promises you only quotquot Which of your changes survive A Very very carefully A All B None CA Since last save D Why Study Databases 0 Shift from comgutation to information always true for corporate computing Web made this point for personal computing more and more true for scientific computing 0 Need for DBMS has exploded in the last years Corporate retail swipeclickstreams customer relationship mgmt supply chain mgmt data warehousesquot etc Scientific digital libraries Human Genome project NASA Mission to Planet Earth physical sensors grid physics network DBMS encompasses much of CS in a practical discipline OS languages theory AI multimedia logic Yet traditional focus on realworld apps What s the intellectual content representing information data modeling languages and systems for querying data complex queries with real semantics over massive data sets concurrency control for data manipulation controlling concurrent access ensuring transactional semantics reliable data storage maintain data semantics even if you pull the plug semantics the meaning or relationship of meanings of a sign or set of signs About the course Workload 0 Projects with a real worldquot focus Build a web based ecommerce application with MySQL Apache Tomcat amp JSP SQL JSP Java Other homework assignments o Exams 1 Midterm amp 1 Final 0 Projects to be done in groups of 2 Pick your partners ASAP About the Course Administrivia htt wwwcsfsuedu lifeifei cc 4710 Lecture Tuesday Thursday 335 450pm Love 0301 Prof Office Hours 269 Love M 1011 W 1011 TA Kun Hou houcsfsuedu Email check web page About the Course Administrivia Textbook Ramakrishnan and Gehrke 3rd Edition see course website Grading handin policies etc are in syllabus o Cheating policy zero tolerance We have the technology 0 Team Projects Teams of 2 if one drops the other one finishes it up Peer evaluations Be honest Feedback is important Rest of Today 0 A free tastingquot of things to come in this class data modeling query languages DBMSs Application development and web data management Database security and privacy 0 Today39s lecture is from Chapter 1 in RampG OS Support for Data Management 0 Data can be stored in RAM this is what every programming language offers RAM is fast and random access Isn t this heaven 0 Every OS includes a File System manages les on a magnetic disk allows open reaa seek Close on a file allows protections to be set on a file drawbacks relative to RAM Database Management Systems 0 What more could we want than a file system Simple efficient ad hoc1 queries concurrency control recovery benefits of good data modeling SMOP2 Not really as we ll see this semester in fact the OS often gets in the way 1ad hoc formed or used for speci c or immediate problems or needs 2SMOP Small Matter Of Programming Describing Data Data Models 0 A data model is a collection of concepts for describing data 0 A schema is a description of a particular collection of data using a given data model 0 The relational model of data is the most widely used model today Main concept relation basically a table with rows and columns Every relation has a schema which describes the columns or fields Levels of Abstraction Views describe how users see the data 0 Conceptual schema 7 o defines logical structure V1 W 1 V1 W 2 V1 W 3 Physical schema describes the files and indexes used IConceptual Schemal IPthical Schemal Example University Database 0 Conceptual schema Studentssid string name string 0gin string age integer gparea C0ursescid string enamestring creditsinteger Enroledsidstring Cidstring gradestring 0 Physical schema Relations stored as unordered files Index on first column of Students 0 External Schema View C0urseinf0cidstrngenroimentinteger Concurrency Control Concurrent execution of user programs key to good DBMS performance Disk accesses frequent pretty slow Keep the CPU working on several programs concurrently o Interleaving actions of different programs trouble eg account transfer amp print statement at same time o DBMS ensures such problems don39t arise Usersprogrammers can pretend they are using a singleuser system called Isolation Thank goodness Don t have to program very very ca refully Structure of a DBMS A typical DBMS has a layered architecture The figure does not show the concurrency control and recovery components Each system has its own variations The book shows a somewhat more detailed version These layers must consider concurrency control and recovery Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management Advantages of a DBMS Data independence Efficient data access Data integrity amp security Data administration Concurrent access crash recovery Reduced application development time So why not use them always Expensivecomplicated to set up amp maintain This cost amp complexity must be offset by need Generalpurpose not suited for specialpurpose tasks eg text search Databases make these folks happy DBMS vendors programmers Oracle IBM MS Sybase SUN 0 End users in many fields Business education science 0 DB application programmers Build enterprise applications on top of DBMSs Build web services that run off DBMSs Database administrators DBAs Design logicalphysical schemas Handle security and authorization Data availability crash recovery Database tuning as needs evolve must understand how a DBMS works Summary DBMS used to maintain query large datasets can manipulate data and exploit semantics Other benefits include recovery from system crashes concurrent access quick application development data integrity and security Levels of abstraction provide data independence Key when dappdt ltlt dpatformdt In this course we will explore How to be a sophisticated user of DBMS technology ER to Relational Mapping Logical DB Design ER to Relational o Entity sets to tables 0 Employees ssn name lot 123223666 Attishoo 48 231315368 Smiley 22 131243650 Smethurst 35 CREATE TABLE Employees ssn CHAR11 name CHAR20 lot INTEGER PRIMARY KEY ssn Relationship Sets to Tables 0 In translating a manyto many relationship set to a relation attributes of the relation must include Keys for each participating entity set as foreign keys This set of attributes forms a superkey for the relation All descriptive attributes CREATE TABLE W0rks1n ssn CHAR1 did INTEGER since DATE PRIMARY KEY ssn did FOREIGN IltEY ssn REFERENCES Employees FOREIGN IltEY did REFERENCES Departments ssn did since 123223666 51 1191 123223666 56 3393 231315368 51 2292 Review Key Constraints 0 Each dept has at most one manager according to the kez constraint on Manages Translation to relational model 1to1 1to Many Manyto1 ManytoMany Translating ER Diagrams with Key Constraints Map relationship set to a table Note that did is the key now Separate tables for Employees and Departments Since each department has a unique manager we could instead combine Manages and Departments CREATE TABLE Manages ssn CHAR11 did INTEGER since DATE PRIMARY KEY did FOREIGN KEY ssn REFERENCES Employees FOREIGN KEY did REFERENCES Departments CREATE TABLE DeptMgr did INTEGER dname CHAR20 budget REAL ssn CHAR11 since DATE PRIMARY KEY did FOREIGN KEY ssn REFERENCES Employees Review Participation Constraints 0 Does every department have a manager If so this is a partCbaton constraint the participation of Departments in Manages is said to be fatal vs partial Every d a value in Departments table must appear in a row of the Manages table with a nonnull ssn value w PartICIpatIon Constraints In SQL 0 We can capture participation constraints involving one entity set in a binary relationship but little else without resorting to CHECK constraints CREATE TABLE DeptMgr did INTEGER dname CHARQO budget REAL ssn CHAR11 NOT NULL since DATE PRIMARY KEY did FOREIGN IltEY ssn REFERENCES Employees ON DELETE NO ACTION Review Weak Entities A weak entitycan be identified uniquely only by considering the primary key of another owner entity Owner entity set and weak entity set must participate in a onetomany relationship set 1 owner many weak entities Weak entity set must have total participation in this identifying relationship set Dependents Translating Weak EntIty Sets 0 Weak entity set and identifying relationship set are translated into a single table When the owner entity is deleted all owned weak entities must also be deleted CREATE TABLE DepPolicy pname CHARQO age INTEGER COst REAL ssn CHAR11 NOT NULL PRIMARY KEY pname ssn FOREIGN KEY ssn REFERENCES Employees ON DELETE CASCADE Review ISA Hierarchies ozoAs in C or other PLs attributes are inherited C t t E ozolf we declare A ISA B every A Mi Iml entity is also considered to be a B entity Overlap constraints Can Joe be an HourlyEmps as well as a ContractEmps entity Alloweddisallowed Covering constraints Does every Employees entity also have to be an HourlyEmps or a ContractEmps entity Yesno Translating ISA Hierarchies to Relations 0 General approach 3 relations Employees HourlyEmps and ContractEmps o HourlyEmps Every employee is recorded in Employees For hourly emps extra info recorded in HourlyEmps 70ury wages hours worked m must delete HourlyEmps tuple if referenced Employees tuple is deleted Queries involving all employees easy those involving just HourlyEmps require a join to get some attributes 0 Alternative Just HourlyEmps and ContractEmps HourlyEmps name lot h0ury wages hours worked Each employee must be in one of these two subclasses ER to Relations ER diagram Relational schema eg accountbname acctno bal 1 an lt3 a1 a C1 Ck b1 b R1 gig bl C1 quot39a Ck More on relationships What about lt3 an C1 Ck b1 bm Could have R1 gig bl C1 Ck put b1 as the key for R1 it is also the key for E2b1 bn Usual strategy ignore R1 Add a1 c1 ck to E2 instead ie E2Ql bn a1 c1 ck More Ewan c1ck uubm W E1 a1 an E2Q1 R1 glblclquot39l Ck E1 a1 an b1 c1 ck E2 21 bm Treat as ml or 1m ER to Relational Weak entity sets E1 g an y 3911 I bm im an 31 bm ER to Relational a1 Method 1 E Q1 an 51 g1 b1 bm 2 Q1 C1 quot39I Ck Method 2 1 g1 an b1 bm 2 g1 an c1 ck Q When is method 2 not possible ER to Relational o Aggregation E1 R1 E2 E3 as before The Relational Model Review Why use a DBMS OS provides RAM and disk Review Why use a DBMS OS provides RAM and disk Concurrency Recovery Abstraction Data Independence Query Languages Efficiency for most tasks Security Data Integrity Data Models Q o DBMS models real world 0 Data Model is link between user39s view of the world and bits stored in computer Many madels eXiSt StudentsidStudentssid39 string name 0 We will concentrate on the string login string age integer gparreal Relational Model Why Study the Relational Model 0 Most widely used model Vendors IBM Microsoft Oracle Sybase etc 0 Legacy systemsquot in older models eg IBM s IMS Objectoriented concepts have recently merged in objectrea ona model IBM DBZ Oracle 9i IBM Informix Will touch on this toward the end of the semester Relational Database Definitions Relational database a set of relations 0 Relation made up of 2 parts Instance a table with rows and columns rows cam inaity Schema specifies name of relation plus name and type of each column Eg Studentsidl string name string logn string age integer gpa real elds degree arty 0 Can think of a relation as a setof rows or tuples ie all rows are distinct Example Instance of Students Relation sid name login age gpa 53666 Jones jonescs 18 34 53688 Smith smitheecs 18 32 53650 Smith smithmath 19 38 Cardinality 3 arity 5 all rows distinct 0 Do all values in each column of a relation instance have to be distinct SQL A language for Relational DBs SQL standard language 0 Data Definition Language DDL create modify delete relations specify constraints administer users security etc 0 Data Manipulation Language DML Specify queries to find tuples that satisfy criteria add modify remove tuples SQL Overview CREATE TABLE ltnamegt ltf1e1dgt ltdoma1ngt m INSERT INTO ltnamegt ltf1e1d namesgt VALUES ltfie1d va1uesgt DELETE FROM ltnamegt WHERE ltcond1tiongt UPDATE ltnamegt SET ltf1e1d namegt ltva1uegt WHERE ltcond1tiongt SELECT ltf1e1dsgt FROM ltnamegt WHERE ltcond1tiongt Creating Relations in SQL Creates the Students relation CREATE TABLE Students I sid CHARCZOD NOte the type doma39n Of each name CHARCZO field is specified and enforced by 1091 n CH AR 10 the DBMS age INTEGER whenever tuples are added or gpa FLOAT modified Another example the Enrolled table holds information about CREATE TABLE Enroned courses students take sid CHARCZO cid CHARCZO grade CHAR2 Adding and Deleting Tuples Can insert a single tuple using INSERT INTO Students Sid name login age gpa VALUES 53688 Smith smithee 18 32 Can delete all tuples satisfying some condition eg name Smith DELETE FROM Students S WHERE Sname Smith Powerful variants of these commands are available more later Keys 0 Keys are a way to associate tuples in different relations Keys are one form of integrity constraint IC Enrolled sid cid grade 53666 CarnatiolOl C 53666 Regga6203 B 53650 T0p010gy112 A 53666 History105 B gt4 Students sid name login age gpa 53666 Jones jonescs 18 34 53688 Smith smitheecs 18 32 53650 Smith smithmath 19 38 Primary Keys A set of fields is a sugerkey if No two distinct tuples can have same values in all key fields A set of fields is a keyfor a relation if It is a superkey No subset of the fields is a superkey gt1 key for a relation one of the keys is chosen by DBA to be the prmay key Eg sidis a key for Students What about name The set 503 gpa is a superkey Primary and Candidate Keys in SQL 0 Possibly many candidate kegs specified using UNIQUE one of which is chosen as the primary key CREATE TABLE Enrolled For agIven student and course Sm CHARCZO there IS a Single grade cid CHARCZO VS grade CHAR2 PRIMARY KEY 39 39 Students can take only one 51039 mm course and receive a single grade for that course further no two CREATE TABLE Enrolled 51d CHARCZO students In a course receive the cid CHARCZO same gradequot grade CHARCZ Used carelessly an IC can prevent 5325 the storage of database instances that should arise in practice Foreign Keys A Foreign Key is a field whose values are keys in another relation Enrolled Sid Cid grade Students 53666 CarnaticlOl C 51d name logm age gpa 53666 Reggaezm B 53666 Jones OI1 SCS 18 34 53650 Topologyllz A gtlt53688 Smith smitheecs 18 32 53666 History105 B 53650 Smith smithmath 19 38 Foreign Keys Referential Integrity LieMy Set of fields in one relation that is used to refer to a tuple in another relation Must correspond to primary key of the second relation Like a Iogical pointer Eg sid is a foreign key referring to Students Enrolled5dE string CdE string grade string If all foreign key constraints are enforced referential millis achieved ie no dangling references Foreign Keys in SQL 0 Only students listed in the Students relation should be allowed to enroll for courses CREATE TABLE Enrolled sid CHARCZO cid CHARCZO grade CHARCZ PRIMARY KEY sidcid FOREIGN KEY sid REFERENCES Students Enrolled Sid Cid grade 53666 CamaticlOl C 53666 Regga6203 B 53650 T0p010gy112 A 53666 History105 B gtlt Students sid name login age gpa 53666 Jones jonescs 18 34 53688 Smith smitheecs 18 32 53650 Smith smithmath 19 38 Integrity Constraints ICs IC condition that must be true for anyinstance of the database eg domain constraints ICs are specified when schema is defined ICs are checked when relations are modified A legal instance of a relation is one that satisfies all specified ICs DBMS should not allow illegal instances If the DBMS checks ICs stored data is more faithful to realworld meaning Avoids data entry errors too Where do ICs Come From ICs are based upon the semantics of the realworld that is being described in the database relations 0 We can check a database instance to see if an IC is violated but we can NEVER infer that an IC is true by looking at an instance An IC is a statement about a possible instances From example we know name is not a key but the assertion that sidis a key is given to us Key and foreign key ICs are the most common more general ICs supported too B opologyl 12 A 105 Enforcing Referential Integrity Remember Students and Enrolled sid in Enrolled is a foreign key that references Students What should be done if an Enrolled tuple with a non existent student id is inserted Reject it What should be done if a Students tuple is deleted Also delete all Enrolled tuples that refer to it Disallow deletion of a Students tuple that is referred to Set sid in Enrolled tuples that refer to it to a default 50 In SQL also Set sid in Enrolled tuples that refer to it to a special value null denoting unknown or lnappicabef Similar if primary key of Students tuple is updated Relational Query Languages 0 A major strength of the relational model supports simple powerful querying of data Queries can be written intuitively and the DBMS is responsible for efficient evaluation The key precise semantics for relational queries Allows the optimizer to extensively reorder operations and still ensure that the answer does notchange The SQL Query Language c The most widely used relational query language Current std is SQL99 SQL92 is a basic subset To find all 18 year old students we can write SELECT 7quot FROM Students S WHERE Sage18 To find just names and logins replace the first line Sid name login age 8Pa 53666 53688 Jones Sn l jonescs smithee 18 18 34 32 SELECT Sname Slogin Querying Multiple Relations 0 What does the following query compute SELECT Sname Ecid FROM Students S Enrolled E WHERE SS39idES39id AND Egrade39A39 Given the following instance of sid cid grade Enrolled 53831 CarnaticlOl C 53831 RegganOS B 53650 Topology112 A 53666 HistorleS B we get Sname ECld Smith Topologyl 12 Semantics of a Query 0 A concegtual evaluation method for the previous query 1 do FROM clause compute crossproductof Students and Enrolled 2 do WHERE clause Check conditions discard tuples that fail 3 do SELECT clause Delete unwanted fields 0 Remember this is conceptual Actual evaluation will be much more efficient but must produce the same answers Crossproduct of Students and Enrolled Instances Ssid Sname Slogin Sage Sgpa Esid Ecid Egrade 53666 Jones jonescs 18 34 53 831 Carnaticl 01 C 53666 Jones jonescs 18 34 53 832 Reggae203 B 53666 Jones jonescs 18 34 53650 Topology112 A 53666 Jones jonescs 18 34 53 666 History105 B 53688 Smith smithee 18 32 53 831 Carnaticl 01 C 53688 Smith smithee 18 32 53 831 Reggae203 B 53688 Smith smithee 18 32 53650 Topology112 A 53688 Smith smithee 18 32 53 666 History105 B 5 3 65 0 Smith smithmath 19 3 8 53 83 1 Carnaticl 01 C 53650 Smith smithmath 19 3 8 53 831 Reggae203 B 53650 Smith smithmath 19 3 8 53650 Topologyl 12 A 53650 Smith smithmath 19 3 8 53 666 History105 B Relational Model Summary A tabular representation of data Simple and intuitive currently the most widely used Objectrelational variant gaining ground XML support being added Integrity constraints can be specified by the DBA based on application semantics DBMS checks for violations Two important ICs primary and foreign keys In addition we always have domain constraints Powerful and natural query languages exist Functional Dependencies RampG Chapter 19 Review Database Design 0 Requirements Analysis user needs what must database do 0 Conceptual Design high level descr often done wER model 0 Logical Design translate ER into DBMS data model 0 Schema Re nement consistencynormalization Physical Design indexes disk layout 0 Security Design who accesses what The Evils of Redundancy Redundancyis at the root of several problems associated with relational schemas redundant storage insertdeleteupa ate anomalies Integrity constraints in particular functional dependencies can be used to identify schemas with such problems and to suggest refinements Main refinement technique decomgosition replacing ABCD with say AB and BCD or ACD and ABD Decomposition should be used judiciously Is there reason to decompose a relation What problems if any does the decomposition cause Functional Dependencies FDs A functional dependency X gt Y holds over relation schema R if for every allowable instance rof R implies 7rYtJ 7rYt2 where t and t2 are tupeSX and Yare sets of attributes In other words X gt Y means Given any two tuples in r if the X values are the same then the Y values must also be the same but not vice versa 0 Can read gt as determines FD s Continued An FD is a statement about all allowable relations Must be identified based on semantics of application Given some instance r of R we can check if r violates some FD If but we cannot determine if f holds over R Question How related to keys if K gt all attributes of R then K is a superkey for R does not require K to be minimal FDs are a generalization of keys Example Constraints on Entity Set 0 Consider relation obtained from HourlyEmps HourlyEmps name 0t rating wageper hr hr5per Wk 0 We sometimes denote a relation schema by listing the attributes eg SNLRWH This is really the setof attributes SNLRWH Sometimes we refer to the set of all attributes of a relation by using the relation name eg HourlyEmpsquot for SNLRWH What are some FDs on HourlyEmps 557 is the key S gt SNLRWH rating determines wageper hr R gt W 0tdetermines lot L gt L trivialquot dependnency Problems Due to R gt W S 123223666 231315368 131243650 434263751 612674134 Ugdate anomaly her rating or we get it wrong N L Attishoo 48 Smiley 22 Smethurst 3 5 Guldu 3 5 Madayan 3 5 R 8 8 5 5 8 W 10 10 10 H 40 30 30 32 40 Hour1yEmps Can we modify W in only the lst tuple of SNLRWH Insertion anomaly What if we want to insert an employee and don t know the hourly wage for his or Deletion anomaly If we delete all employees with rating 5 we lose the information about the wage for rating 5 Detecting Reduncancy S 123223666 231315368 131243650 434263751 612674134 N Attishoo Smiley Smethurst Guldu Madayan L 48 22 35 35 35 R W H 10 4O 10 3O 3O 32 10 4O OOUIUIOOOO H0ur1yEmps Q Why was R gt W problematic but S gtW not Decomposing a Relation Redundancy can be removed by chopping the relation into pieces FD s are used to drive this process R gt W is causing the problems so decompose SNLRWH into what relations 8 N L R H 123223666 Attishoo 48 8 40 II R W 231315368 Smiley 22 8 3o 8 10 131243650 Smethurst 35 5 3o 5 7 434263751 Guldu 35 5 32 612674134 Madayan 35 8 4o Wages H0ur1yEmp52 Refining an ER Diagram lst diagram becomes WorkersSNLDSi DepartmentsDMB Lots associated with workers Suppose all workers in a dept are assigned the same lot D gt L Redundancy fixed by After WorkersZSNDSi DeptLotsDL DepartmentsDMB Can finetune this WorkersZSNDSi Empmyees t DepartmentsDMBL Departments Reasoning About FDs Given some FDs we can usually infer additional FDs title gt stud390 star implies title gt studio and title gt star title gt studio and title gt star implies title gt stud390 star title gt stud390 stud390 gt star implies title gt star But tite star gt stud390 does NOT necessarily imply that title gt studio or that star gt stud390 An FD fis imglied bya set of PBS F if f holds whenever all PBS in F hold F closure of F is the set of all FDs that are implied by F includes trivial dependencies Rules of Inference Armstrong s Axioms X Y Z are its of attributes Re exVim If X 2 Y then X gt Y Augmentation If X gt Y then XZ gt YZ for any Z Transitwry If X gt Y and Y gt Z then X gt Z 0 These are sound and complete inference rules for FDs ie using AA you can compute all the FDs in F and only these FDs Some additional rules that follow from AA Union If X gt Y and X gt Z then X gt YZ Decomposition If X gt YZ then X gt Y and X gt Z Example Contracts si ji di pi qtn value and C is the key C gt CSJDPQV Proj purchases each part using single contract JP gt C Dept purchases at most 1 part from a supplier SD gt P Problem Prove that SDJ is a key for Contracts JP gt C C gt CSJDPQV imply JP gt CSJDPQV by transitivity shows that JP is a key SD gt P implies SDJ gt JP by augmentation SDJ gt JP JP gt CSJDPQV imply SDJ gt CSJDPQV by transitivity thus SDJ is a key Q can you now infer that SD CSDPQV ie drop J on both sides No FD inference is not like arithmetic multiplication Attribute Closure Computing the closure of a set of PBS can be expensive Size of closure is exponential in attrs Typically we just want to check if a given FD X gt Yis in the closure of a set of PBS F An efficient check Compute attribute Closure of X denoted X wrt F X Set of all attributes A such that X gt A is in F X X 0 Repeat until no change if there is an fd U gt V in F such that U is in X then add V to X Check if Y is in X Approach can also be used to find the keys of a relation 0 If all attributes of R are in the closure of X then X is a superkey for R c Q How to check if X is a candidate keyquot Attribute Closure example R A B c D E FB CDD EB gtAE gtCIAD B Is B gt E in F B B B BCD B BCDA B BCDAE Yes and B is a key for R too Is D a key for R D D D DE 39 D DEC quotNope Is AD a key for R AD AD AD ABD and B is a key so Yes Is AD a candidate key for R A A D DEC AD not keys so Yes Is ADE a candidate key for R No AD is a key so ADE is a superkey but not a cand key
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'