Database Management INFS 614
Popular in Course
Popular in Science
This 45 page Class Notes was uploaded by Hazle Turcotte on Monday September 28, 2015. The Class Notes belongs to INFS 614 at George Mason University taught by Staff in Fall. Since its upload, it has received 33 views. For similar materials see /class/215087/infs-614-george-mason-university in Science at George Mason University.
Reviews for Database Management
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 09/28/15
Relational Algebra Lecfure 4 INFS614 Fall 08 Relafional Query Languages zo Query languages Allow manipulafion and refrieval of dafa from a dafabase a Relafional model supporfs simple powerful QLs Sfrong formal founda rion based on logic Allows for much opfimizafion a Query Languages programming languages QLs no r expec red To be Turing complefequot QLs no r infended To be used for complex calculations QLs suppor r easy efficient access To large da ra se rs INFS614 Fall 08 Formal Relational Query Languages Two mathematical Query Languages form the basis for real languages eg SQL and for implementation 0 Relational Algebra More operational very useful for representing execution plans 9 Relational Calculus Let users describe what they want rather than how to compute it Nonoperational declarative Understanding Algebra is key to understanding SQL n and query processing IN39F5614Fa1108 3 Algebra Preliminaries ozo A query is applied to relation instances and the result of a query is also a relation instance Schemas of input relations for a query are fixed The schema for the result of a given query is also fixed Determined by definition of query language constructs IN39F5614 Fall 08 4 Example Ins rances oz Sailors and quotReservesquot relations for our examples R1 Q m m 22 101 101096 58 103 111296 IN39F5614 Fall 08 1 sname rating age 22 dustin 7 45 0 31 lubber 8 555 58 rusty 10 3 5 0 52 sname rating age 28 yuppy 9 350 31 lubber 8 555 44 guppy 5 350 58 rusty 10 350 Algebra Opera rions ozo Look wha r we want To ge r from The following Table sname rating age 28 yuppy 9 350 3 1 lubber 8 555 44 guppy 5 350 58 rusty 10 350 IN39F5614 Fall 08 sname rating Pr OJeC l39IOH yuppy 9 D I b h lubber 8 4 e eTes aTTr39i uTes T aT ar e noT in projecTion lisT i0 oz Schema of r esulT conTains exachy The fields in The JI39 projecTion lisT wiTh The same ShameJatmg names ThaT They had in The only one inpuT r elaTion oz ProjecTion oper aTor has To eliminaTe duplicaTes Why NoTe r39eal sysTems Typically don39T do duplicaTe eliminaTion unless The user explicile asks for iT Why noT IN39F5614 Fall 08 7 52 SeleCTlon sname rating age 28 yuppy 9 350 oz SelecTs r ows ThaT 31 lubber 8 555 saTisfy selecTion 44 guppy 5 350 condi on 58 rusty 10 350 oz No duplicaTes in r esulT Why z Schema of r39esulT Gratinggt8S2 idenTical To schema of only one in puT sid sname rating age r39elaTion 28 yuppy 9 350 58 rusty 10 350 IN39F5614 Fall 08 ComposiTion of Opera rions oz Result rela rion can be The input for ano rher rela rional algebra opera rion Operator composition 52 sname rating age 28 yuppy 9 350 31 lubber 8 555 44 guppy 5 350 58 rusty 10 35 0 snamerating0rating gt865 IN39F5614 Fall 08 sname rating yuppy 9 rusty 10 WhaT do we wam To 961 from Two rela rions 51 131 sname rating age 3 1 lojl O96 22 dustin 7 450 58 103 111296 31 lubber 8 555 58 rusty 10 350 Wha r abou r Who reserved boa39l 101 Or Find The name of The sailor who reserved boa r 101 IN39F5614 Fall 08 CrossProduct oz Each row of 51 is paired with each row of R1 55 Result schema has one field per field of 51 and R1 with field names inherited sidl sname rating age sid2 bid day 22 dustin 7 450 22 101 101096 22 dustin 7 450 58 103 111296 31 lubber s 555 22 101 101096 31 lubber s 555 58 103 111296 58 rusty 10 350 22 101 101096 58 rusty 10 350 58 103 111296 H Renaming operator because naming conflict psid gtsid1 S1 x psid gtsid2 R1 IN39F5614 Fall 08 11 Why does this cross product help Query Find the name of the sailor who reserved boat 101 CPpsid gtsid1 S1 x psid gtSid2 R1 Result 7 Snam ltasid1sid2 and bid101CP Note my use of quottemporaryquot relation CP IN39F5614 Fall 08 12 AnoTher example oz Find The name of The sailor having The highesT raTing AllRertingApratmg gtratzngA S2 Result snamewmting ltmtingAS2xAllR WhaT39s in ResulT Does iT answer our query IN39F5614 Fall 08 Union InTersecTion SeT Difference oz All of These operaTions sid sname rating age Take Two ianIT reIaTions 22 dus n 7 45 0 which musT be Ul39liOl39I 31 lubber 8 555 quot quot b39e 58 rusty 10 350 Sccnme number of fields 44 guppy 5 350 orrespondmg flelds 28 yuppy 9 35 0 have The same Type 39 oz WhaT is The schema of SIUSZ resuIT Sid sname rating age 31 lubber 8 555 22 dustin 7 450 58 rusty 10 350 51 52 51052 IN39F5614 Fall 08 Back To our query oz Find The name of The sailor having The highesT raTing AllR ertingApratmg gtratmgA 2 aratingltratingASZXA R Tmp 7 sidsname Result sname nsidsnameS2 Tmp Why noT projecT on sname only for Tmp IN39F5614 Fall 08 RelaTional Algebra Summary oz Basic operaTions SelecTion o SelecTs a subseT of rows from relaTion ProjecTion 7c DeleTes unwanTed columns from relaTion Crossproducf x Allows us To combine Two relaTions SeTdi erence Tuples in reln 1 buT noT in reln 2 Union U Tuples eiTher in reln 1 or in reln 2 or in boTh Rename p Changes names of The aTTribuTes oz Since each operaTion reTurns a relaTion operaTions can be composed Algebra is closedquot oz Use of Temporary relaTions recommended IN39F5614 Fall 08 Natural Join oz NaturalJoin S1 lgtltl R1 sid sname rating age bid day 22 dustin 7 450 101 10 10 96 58 rusty 10 350 103 111296 oz Result schema similar to crossproduct but only one copy of each field which appears in both relations oz Natural Join Cross Product Selection on common fields drop duplicate fields IN39F5614 Fall 08 17 Query revisited using natural join Query Find the name of the sailor who reserved boat 101 Result7r Snameabid101S1 M R1 0r ResultJrSnameS1 lgtlt1 abid101R1 IN39F5614 Fall 08 18 Consider ye r anoTher query ozo Find The sailors sid who reserved all The red boa rs R1 B Q day color 22 101 101096 101 Red 22 103 101196 102 Green 58 102 111296 103 Red IN39F5614 Fall 08 STar r an a r remp r ozo who reserved wha r boa r m SI R1 22 101 sidbid 22 103 58 102 oz All The red boa rs E S2 nbidacolor red 13 101 IN39F5614 Fall 08 Division ozo NoT supporTed as a primiTive operaTor buT useful for expressing queries like Find sailors who have reserved a red boaTs ozo LeT 51 have 2 fields x and y 52 have only field y 5152 ltxgtforall ltygt in S2 there exists ltxygt in 1 l ie 5152 conTains all x Tuples sailors such ThaT for every y Tuple redboaT in 52 There is an xy Tuple in 1 ie x reserved y ozo In general x and y can be any lisTs of fields y is The lisT of fields in 52 and ny is The lisT of fields of 51 IN39F5614 Fall 08 21 Examples of Division AB sno pno 51 pl 51 p2 51 p3 51 p4 52 pl 52 p2 53 p2 54 p2 54 p4 A IN39F5614 Fall 08 22 Find names of sailors who39ve r39eSer39ved boat 103 oz Solution 1 nsname0bid103Reserveslgtlt Sailors ozo Solution 2 Temp 1 abid 1 03 Reserves Temp 2 Templlgtltl Sailors Result 7 Temp 2 sname g Solution 3 Jrsnameabidlo3Reserveslgtltl Sailors IN39F5614 Fall 08 23 Find names of sailors who39ve r eServed a red boat oz Information about boat color only available in Boats so need an extra join 7 snameacolor redBoats lgtltl Reserveslgtltl Sailors oz A more efficient solution 71 snamensid7rbidawlor red Boats lgtlt1 Reslgtlt1 Sailors nA query optimizer can find this given the first solution IN39F5614 Fall 08 Z4 Find sailors who39ve reServed a red or a green boaT zo Can idenTify all red or green boaTs Then find sailors who39ve reserved one of These boaTs T empboats 06 Boats olor 39red39v color 39green39 Result JrsnameTempboats lgtlt1 Reserveslgtlt1 Sailors oz Can also define TempboaTs using union How oz What happens if V is replaced by A in this query IN39F5614 Fall 08 25 Find sailors who39ve reServed a red a green boaT ozo Previous approach won39T work MusT idenTify sailors who39ve reserved red boaTs sailors who39ve reserved green boaTs Then find The inTersecTion noTe ThaT sid is a key for Sailors T B lgtltl R empred nsidac010rred oats eserves T B lgtltl R empgreen sid0colorgreenl oats eserves Result nsnameTempred Tempgreen lgtltl Sailors IN39F5614 Fall 08 Introduction to OLAP Doug Burdick 24 April 2008 OnLine Analytical Processing OnLine Analytical Processing oz Data Warehousing Consolidate data from many sources in one large repository oz OLAP I Complex SQL queries and views I Queries based on spreadsheetstyle operations and multidimensional View of data I Interactive and online queries EXTERNAL DATA SOURCES Data Warehousing t oz Integrated data spanning EXTRACT TRANSFORM longtime periods often LOAD augmented W1th summary REFRESH information oz Several gigabytes to DATA terabytes common Meta ata WAREHOUSE RepOSItory oz Interactive response times expected for complex queries adhoc updates uncommon D AT A MININGQJ I i OLAP Data Model Representation oz Collection of numeric Measures I Eg Sales oz Which depend on a set of Dimensions I E g Product Location and Time Prod Iph Razr Treo Example Prod Time Loc Sales Iphone 4 1 08 Reston 25 Razr 4 1 08 Reston 30 Treo 4 1 08 Reston 8 Iphone 4 2 08 Reston 8 Razr 4 2 08 Reston 20 8 10 10 Treo 4 2 08 Reston 10 30 20 50 Iphone 4 3 08 Reston 15 Razr 4 3 08 Reston 50 25 8 15 Treo 4 3 08 Reston 10 LOC Iphone 4 1 08 Fairfax 37 41 42 43 Time Slice L0cReston shown Dimension Hierarchies oz For each dimension the set of values can be organized in a hierarchy PRODUCT TIME ALL ALL I category yelar quarter brand month date LOCATION ALL country state city Dimension Hierarchies TIME ALL year quarter month date 2008 2007 M 2Q 1Q M2 1Q Apr Mar Feb Ian 43008 4108 Fact Table oz The main relation which relates dimensions to a measure is called the fact table oz Each dimension can have additional attributes and an associated dirnension table oz Fact tables are much larger than dimensional tables Star Schema TIME I timeid I date I monthI quarter I yearI Ipid I timeid I locid I sales ISALES FaCt table PRODUCT LOCATION I pnameI category I price I IE I city I state I country I oz Type of schema is called a star schema oz Computing the join of all these relations is called a star join 10 Example Fact table is compact representation 0 Slice locid1 quot 8 10 10 is shown a S 30 20 50 1 25 8 15 1 2 3 timeid locid E U e t 11 1 1 25 11 2 1 8 11 3 1 15 12 1 1 30 12 2 1 20 12 3 1 50 13 1 1 8 13 2 1 10 13 3 1 10 11 1 2 35 quot 11 TAM 3W mt Wm Lamsmms Tmls mm m Jim wm w mm mm 9 din E TBCfmmec edJn m TBEMEASURES CN LD EWMENLESBBAEE CENSDUWDN UDA VARWEEHEWRWE UW DN NHDUET TBEPWDUUDM TBEREE DN may mash heme ne a as a MEASURES j 5 MW 5 wwwsm J a mNquDAHDN mm D EDNSIJUDAHEIN 55mm 5 B 5mmquot h umsunism nt j Em inmmmnn L Powwow I LFJ Us 5mm K W cm TBLMD am My OLAP Data Model Queries oz In uenced by spreadsheets I Pivot tables in Excel oz A common operation is to aggregate a measure over one 0139 more dimensions I Find total sales I Find total sales for each city or for each state I Find top five products ranked by total sales 13 OLAP Data Model Queries oz Rollup Aggregating at different levels of a dimension hierarchy I E g Given total sales by city we can rollup to get sales by state oz Drilldown The inverse of rollup I E g Given total sales by state can drilldown to get total sales by city I Eg Can also drilldown on different dimension to get total sales by product for each state 14 OLAP Data Model Queries oz Slicing and Dicing Equality and range selections on one or more dimensions oz Pivoting Aggregation on selected WI CA Total dimensions 1995 63 81 144 I EgPivoting on Location and Time 1996 38 107 145 yields this W 1997 75 35 110 15 Comparison with SQL Queries oz The crosstabulation obtained by pivoting can also be computed using a collection of SQL queries SELECT Tyear Lstate SUMSsa1es FROM Sales S Times T Locations L WHERE Sti1neidTtimeid AND SlocidLlocid GROUP BY Tyear Lstate SELECT Tyear SUMSsa1es SELECT Lstate SUMSsa1es FROM Sales 8 Times T FROM Sales S Location L WHERE StimeidTtimeid WHERE StimeidLti1neid GROUP BY Tyear GROUP BY Lstate 16 SELECT Tyear Lstate SUMSsales FROM Sales S Times T Locations L WHERE StimeidTtimeid AND SlocidLlocid GROUP BY Tyear Lstate PRODUCT category brand TIME LOCATION ALL ALL coulntry quarter month I date Clty 17 SELECT Tyear SUMSsales FROM Sales S Times T WHERE StimeidTtimeid GROUP BY Tyear PRODUCT TIME ALL category I quarter brand I month date LOCATION country state city 18 SELECT SUMSsales FROM Sales S There are354 60 queries with this form GROUP BY groupinglist PRODUCT TIME LOCATION ALL country category I quarter I brand I state month I I City date 19 CUBE Operator oz Introduce a new SQL operator that generates all possible aggregations SELECT SUMSsa1es FROM Sales S GROUP BY groupinglist CUBE pid locid timeid BY SUM Sales 60 queries 20 Interactivity requires precomputation a Store the answers to all possible aggregations I Each query is a table store all these tables I Answering query is simply a lookup Problems I CUBE is potentially enormous I What if the fact table changes 21 Store Cube as collection of materialized views OLD Query New Query CREATE VIEW CityYear AS SELECT LCity TYear SUMSa1es AS Sales FROM FactTable F Location L Times T GROUP BY LCity TYear SELECT Lstate Tyear SUMsales FROM FactTable F Location L Times T WHERE F10Cid Llocid AND Ftimeid Ttimeid GROUP BY Lstate Tyear SELECT state year Sales FROM CityYear CY 22 Relationship between OLAP views TIME LOCATION ALL ALL ALL ALL I ALL state year ALL year state year state I quarter Clty ALL city quarter ALL year City quarter state Fact table cfuarter city I 23 Figure out best pieces of CUBE to store oz Usually know query workload oz Idea Materialize Views to answer these queries TIME LOCATION ALL ALL ALL ALL year state quarter City year city quarter state quarter city year state 24 Materialized Views oz A View whose tuples are stored in the database is said to be materialized I Provides fast access like a very highlevel cache oz Need to maintain the View as the underlying tables change I Most Databases do this automatically I Still active research topic 25 Optimizing Cube Operator oz CUBE operator is equivalent to collection of SQL queries with a common structure oz Try to reuse as much work as possible to minimize number of fact table scans oz Exploit properties of aggregation functions I Algebraic I Distributive 26 Exploit Relationship between OLAP views TIME LOCATION ALL ALL I ALL state year ALL year state year state I quarter Clty ALL city quarter ALL year City quarter state Fact table cfuarter city I 27 oz Stupid SQL Aggregation Tricks 28 Querying Sequences in SQL1999 oz Trend analysis is difficult to do in SQL92 I Find the change in monthly sales I Find the top 5 product by total sales I Find the trailing nday moving average of sales I The first two queries can be expressed with difficulty but the third cannot even be expressed in SQL92 if n is a parameter of the query oz The WINDOW clause in SQL21999 allows us to write such queries over a table viewed as a sequence implicitly based on userspecified sort keys 29 The WIND OW Clause SELECT Lstate Tmonth AVGSsales OVER W AS movavg FROM Sales 8 Times T Locations L WHERE StimeidTtimeid AND SlocidLlocid WINDOW W AS PARTITION BY Lstate ORDER BY Tmonth RANGE BETWEEN INTERVAL 1 MONTH PRECEDING AND INTERVAL 1 MONTH FOLLOWING Let the result of the FROM and WHERE clauses be Temp Conceptually Temp is partitioned according to the PARTITION BY clause I Similar to GROUP BY but the answer has one row for each row in a partition not one row per partition Each partition is sorted according to the ORDER BY clause For each row in a partition the WINDOW clause creates a window of nearby preceding or succeeding tuples I Can be valuebased as in example using RANGE I Can be based on number of rows to include in the window using ROWS clause The aggregate function is evaluated for each row in the partition using the correspon ing window I New aggregate functions that are useful with windowin of a row within its partition and its variants DENSERA CUMEDIST guinclude RANK osition K PERCENT ANK 30 Top N Queries oz If you want to find the 10 or so cheapest cars it would be nice if the DB could avoid computing the costs of all cars before sorting to determine the 10 cheapest I Idea Guess at a cost c such that the 10 cheapest all cost less than c and that not too many more cost less Then add the selection costltc and evaluate the query 0 If the guess is right great we avoid computation for cars that cost more than c 0 If the guess is wrong need to reset the selection and recompute the original query 31 Top N Queries SELECT Ppid Ppname Ssa1es FROM Sales S Products P WHERE SpidPpid AND S10Cid1 AND Stimeid3 ORDER BY Ssa1es DESC OPTIMIZE FOR 10 ROWS SELECT Ppid Ppname Ssa1es FROM Sales S Products P WHERE SpidPpid AND S10Cid1 AND Stimeid3 AND Ssa1es gt c ORDER BY Ssa1es DESC z OPTIMIZE FOR construct is not in SQL21999 oz CutOff value c is chosen by Optimizer 32
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'