New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here

Scientific Databases

by: Summer Kreiger

Scientific Databases CSI 710

Summer Kreiger
GPA 3.64


Almost Ready


These notes were just uploaded, and will be ready to view shortly.

Purchase these notes here, or revisit this page.

Either way, we'll remind you when they're ready :)

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

Class Notes
25 ?




Popular in Course

Popular in Computer & Information Science

This 24 page Class Notes was uploaded by Summer Kreiger on Monday September 28, 2015. The Class Notes belongs to CSI 710 at George Mason University taught by Staff in Fall. Since its upload, it has received 38 views. For similar materials see /class/215160/csi-710-george-mason-university in Computer & Information Science at George Mason University.

Similar to CSI 710 at Mason

Popular in Computer & Information Science


Reviews for Scientific Databases


Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 09/28/15
Data Cube A Relational Aggregation Operator Generalizing GroupBy CrossTab and SubTotals by Gray et al Data Mining and Knowledge Discovely 1 2953 1997 IT 864CSI 710 Class Notes Fall 2002 Dr Kerschberg and Dr Jamison Data Cube Concept Cube generalizes the histogram crosstab rollup drilldown and subtotal operations found in report writers Cubes are relations that treat each of the N aggregation attributes as a dimension in N space The aggregate of a particular set of attributes is a point in the Nspace Superaggregates are computed by aggregating the Ncube to lower dimensional spaces CUBE A Relational Aggregate Operator Generalizing Group By from Gray et al Aggregate Sum Group By with total By Color RED L i Cross Tab BLUE Chevy Ford By Color RED 3 WH39TE 1 The Data Cube and BLUE 2 The SubSpace Aggregates By Make E III e P 0 4 men bum 391 I 91993 By Make amp Color By Color i Data Analysis Tasks Formulate a query to extract relevant information Extract the aggregated data from the database into a file Visualize the result to look for patterns Analyze the result and perhaps formulate new queries i Goal of the Cube Represent Ndimensional data cubes using 2D relations Compute the aggregate information within the database Create language extensions to SQL that will support the required operations Relational Representation of NDimensional Data Weather information consists of four dimensions Time Latitude Longitude and Altitude Measured variables are Temperature and Pressure Weather Altitude Temp Pres Time UCT Latitude Longitude m c mb 966l1500 375833N 1224528W 102 21 1009 Many more rows like the ones above and below 10 23 96671500 I 341618N I 270555w I I 1024 SQL 1992 Standard supports aggregation functions Count 0 Sum Min Max and AVE Some SQL systems have Median Standard Deviation variance etc SELECT AVETemp FROM Weather SELECT COUNTDSTNCT Time FROM Weather GroupBy Operation and Aggregates GroupBy operation defines an equivalence relation on a table partitioning the tuples in classes having the same attribute values Each group can then be aggregated TCPD Benchmark Handbook for Database SELECT Time Altitude AVETemp and Transaction FROM Weather Procession Systems 1993 has one 6D GROUP BY BROUP BY Time Altitude and three 3D GROUP Bys Problems with GROUPBY GroupBy cannot directly construct Histograms Rollup Totals Subtotals for Drilldowns and Cross Tabulations Computing Histograms For the Weather relation suppose we want to group time into days and map latitude and logitude to Nation and then Regions The following query would give the daily maximum reported temperature for each na on I SELECT day nation MaxTemp FROM Weather GROUP BY DayTime as day NationLatitude Longitude as nation RollUps and SubTotals for DrillDown mm in Sale m Uv m Made by Yen by cm Tables 3a m 3band4 quot53423 Dyslle m m m cm Wm Wm WM shOW Sales m Roll Up by W w W Model by mi m 5 Year by W us m Color and an 29 Excel Pivot W in m mm m mm by mnycmmmmmmm aquot mm was Table with Sam mm m th Made car Calm Sam by Year Dv Madcl w m m a an m Informatlon cw m an my 3 m g m w RollUps and SubTotals for l DrillDown 7m 4 Au Exce wnl um Nprexeulanmwfnhve 1 W pmmuam mmmed Excel Pivot YearCalm m Table w L W I W m Mme mack Wm mm mm m mm mm Sales 339 m 73 23 22 43 2quot 33 Summary 5mm mo in m nu an 2m 5m ROHUP39 mm Smexsummnry ga 39e 5071a Mm m m um or vy Chew mm an 0 over aquot color mm m mm an ear my Wm m 90 Note the my mm mm x ALL vame m m m smw rymws mlsxmgfnrm m 3mm m mm a tub Mm my Ca wr unu cum m mack us l Cum m Wm HS SQL Queryrfor Chevy Roll Up swam ALL FROM WHERE swam Made WHERE GROUP E l UNION 5mm Made FROM mum GROUP BY Oman SELECT Model FROM ER GROUP 5 ALL ALL SUMKSaJes Sales Model Chevy Sales Madel 7 Chevy Medal Year my SUMLSales Sales Mad 1 Chevy Medal Year Year Owen SDMLSales Sales Medal chevy Madel Year Euler ALL C0101 sumsales Sales Mudsl Chevy Medal can CUBE and Roll Up Operators Aggregate Sum Group By with total By Color Cross Tab Chevy Ford By Color Q Q RED WHITE BLUE By Make E III Sum The Data Cube and The SubSpace Aggregates P 5 0161 090 9399 391 I 91993 By Make Cube Each Attribute is a Dimension Ndimensional Aggregate sum max Fits relational model exactly a1 a2 aNf Superaggregate over N1 Dimensional subcubes ALL a2 aN f a3 ALL a3 aN f a1 a2 ALL f This is the N1 Dimensional crosstab Superaggregate over N2 Dimensional subcubes ALL ALL a3 aN f a1 a2 ALL ALL f The Data Cube Concept B 1994 W 1995 1994 MAKE B 1995 R Ford V Chevy 1994 1995 F YEAR White C 1994 COLOR B 1995 F W C i Subcube Derivation ltMYCgt ltMYgt ltMCgt ltYCgt gtlt ltMgt lt ltCgt gt gY ltkgt Dimension collapse denotes ALL Cube Operator Example S e WEI Group Cube RollUp Algebra The operators have the following properties CUBEROLLUP CUBE ROLLUP GROUP BY ROLLUP SQL extended GROUP BY operator GROUP BY ltaggregation listgt ROLLUP lt aggregation listgt CUBE lt aggregation listgt CUBE creates nested relations that is relations can be values Dimensions Snowflake and Star Queries Time has dimensions week month year and ALL Fact Table ALL Year Quarter Month Week Location has ISeller Buyer Product Qty Price Office I Date I dimension district region geography ALL Seller has dimension unit group division and ALL Unit Group Division ALL Discount ALL District Region Geography ALL Proposed SQL Syntax Proposed syntax select model make year sumunits from carsales where model in chevy ford and year between 1990 and 1994 group by cube model make year having sumunits gt 0 Note Group By operator repeats aggregate list in select list and in group by list Why This Syntax Abstract syntaxquot selecl ltfield lislgt ltoggregole lislgt from ltloble expressiongt where ltseorch condiliongt group by cube drill down ltoggregole lislgt having ltseorch condiliongt Allows functional aggregations eg sales by quarter selecl slore quorler sumunils from sales where nolion Mexico group by drill down slore quorlerdole as quorler and year 1994 User Defined Aggregate Function Generalized For Cubes Aggregates have graduated difficulty Distributive can compute cube from next lower dimension values count min max Algebraic can compute cube from next lower lower scratchpads average Holistic Need base data Median Mode Rank Distributive and Algebraic have simple and efficient algorithm build higher dimensions from core Holistic computation seems to require multiple passes real systems use sampling to estimate them eg sample to find median quartile boundaries Decorations and Abstractions Sometimes want to tag cube with redundant values region regionname sales region name is not a dimension it is a decoration Decorations are functionally dependent on dimensions More interesting some dimensions are aggregations block gtcity gt county gt state gt nation Often these aggregations are not linear are a lattice second gt minute gthour gtday gtweek month gt quarter gt year Xmas Easter gt Thanksgiving Holiday Rather than treat time as 12 dimensions Recognize abstractions as one dimension like decorations Compute efficiently virtual functions Interesting Aggregate Func ons From RedBrick systems Rank in sorted order NTile histograms Running average cumulative functions Windowed running average Percent of total Users want to define their own aggregate func ons statistics domain specific


Buy Material

Are you sure you want to buy this material for

25 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Jim McGreen Ohio University

"Knowing I can count on the Elite Notetaker in my class allows me to focus on what the professor is saying instead of just scribbling notes the whole time and falling behind."

Allison Fischer University of Alabama

"I signed up to be an Elite Notetaker with 2 of my sorority sisters this semester. We just posted our notes weekly and were each making over $600 per month. I LOVE StudySoup!"

Steve Martinelli UC Los Angeles

"There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."


"Their 'Elite Notetakers' are making over $1,200/month in sales by creating high quality content that helps their classmates in a time of need."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.