
Parallel and Distributed Database Systems

by: Luisa Beer

Parallel and Distributed Database Systems, COP 5711
University of Central Florida
Instructor: Kien Hua

About this Document

Class Notes




This 112-page set of class notes was uploaded by Luisa Beer on Thursday, October 22, 2015. The notes belong to COP 5711 at the University of Central Florida, taught by Kien Hua in Fall. Since the upload, they have received 44 views. For similar materials see /class/227454/cop-5711-university-of-central-florida in Computer Programming at the University of Central Florida.

Date Created: 10/22/15
COP 5711: Parallel and Distributed Databases

Instructor: Kien A. Hua
Office: HEC 229
Email: kienhua@cs.ucf.edu

One of the major motivations behind the use of database systems is the desire to integrate the operational data of an enterprise and to provide centralized, thus controlled, access to that data. The technology of computer networks, on the other hand, promotes a mode of work that goes against all centralization efforts. At first glance it might be difficult to understand how these two contrasting approaches can be synthesized to produce a technology that is more powerful and more promising than either one alone. The key to this understanding is the realization that the most important objective of database technology is integration, not centralization. Neither of these terms necessarily implies the other: it is possible to achieve integration without centralization, and that is exactly what distributed database technology attempts to achieve. In this course we will discuss various such integration techniques.

Another topic for this course is parallel database technology. For applications that require a system capable of sustaining trillions of operations per second on very large data sets, parallel processing is the only solution. We will examine techniques available for implementing such systems.

For both topics we will focus on how the systems work (i.e., system internals) rather than on how to use particular commercial systems. We will discuss the following subjects:
- Parallel architectures for DBMSs
- Distributed DBMS architectures
- Data placement strategies
- Distributed database design
- Parallel algorithms
- Distributed query processing
- Parallel DBMS implementation
- Multidatabase systems techniques
- Peer-to-peer systems
- Location-based services

Prerequisite: COP 4710 or working knowledge of DBMSs
Class Notes: available at http://…
Class Time: Monday & Wednesday, 6:00-7:15 pm, HEC 103
Office Hours: Monday and Wednesday, 4:00-5:30 pm
Grading Policy: Test 1 (Parallel DBMSs) 30%, Critical Reviews 20%, Test 2 (Distributed DBMSs) 35%, Project 15%.

Important Dates: The withdrawal deadline is October 16, 2009. Class ends on December 7, 2009. The final exam is December 9, 2009, 4:00-6:50 pm (FIRM). Fall holidays are 9/7 (Labor Day), 11/11 (Veterans Day), and 11/26-28 (Thanksgiving).

Semistructured Data

Semistructured data is data that has some structure, but the structure may be irregular and incomplete and does not necessarily conform to a fixed schema. Examples: the World-Wide Web; integration of data from heterogeneous sources.

Object Exchange Model (OEM)
- Nodes without outgoing edges are called atomic objects; the rest of the nodes are called complex objects.
- Atomic objects have a value of type integer, real, string, etc. Complex objects have the reserved value C.

OEM Definition: An OEM database is a 4-tuple O = (N, A, v, r), where N is a set of object identifiers; A is a set of labeled directed arcs (p, l, c), where p, c are in N and l is a string; v is a function that maps each node n in N to an atomic value or the reserved value C; and r is a distinguished node in N called the root of the database.

Lorel Example 1:
  select Guide.restaurant
  where Guide.restaurant.price < 20
The result of the query is a singleton set containing the restaurant object for "Bangkok Cuisine". Lorel coerces the values to a common type before making the comparisons.

Lorel Example 2:
  select Guide.restaurant
  where Guide.restaurant.address.street = "Green"

Basic Change Operations
- creNode(n, v): creates a new object; the identifier n must be new.
- updNode(n, v): changes the value of object n, where v is an atomic value or the special symbol C. Object n must be either an atomic object or a complex object without subobjects.
- addArc(p, l, c): adds an arc labeled l from object p to object c. The new arc must not already exist.
- remArc(p, l, c): removes the arc (p, l, c).

Valid Change Sequence: We say that a sequence L = u1, u2, …, un of basic change operations is valid for an OEM database O if ui is valid for O_(i-1) for all i = 1, …, n, where O_0 = O and O_i = ui(O_(i-1)).
We use L(O) to denote the OEM database obtained by applying the entire sequence L to O.

Valid Changes: We say that a set U = {u1, u2, …, un} of basic change operations is valid for an OEM database O if: (1) for some ordering L of the changes in U, L is a valid sequence of changes; (2) for any two such valid sequences L and L', L(O) = L'(O); and (3) U does not contain both addArc(p, l, c) and remArc(p, l, c) for any p, l, and c.

OEM History: An OEM history is a sequence H = (T1, U1), …, (Tn, Un), where each Ui is a set of basic change operations and each Ti is a timestamp, for i = 1, …, n, with Ti < T_(i+1) for i = 1, …, n-1. We say H is valid for an OEM database O if, for all i = 1, …, n, Ui is valid for O_(i-1), where O_0 = O and O_i = Ui(O_(i-1)).

OEM History Example: We have the history H = (t1, U1), (t2, U2), (t3, U3), where t1 = 1 Jan 97, t2 = 5 Jan 97, t3 = 8 Jan 97, and
  U1 = {updNode(n4, 20), creNode(n2, C), creNode(n3, "Hakata"), addArc(n1, "restaurant", n2), addArc(n2, "name", n3)}
  U2 = {creNode(n5, "need info"), addArc(n2, "comment", n5)}
  U3 = {remArc(n2, "parking", n6)}

Annotations to the OEM Graph: Annotations are attached to the nodes and arcs to encode the history of basic change operations. There are four types of annotations:
- cre(t): the node was created at time t
- upd(t, ov): the node was updated at time t; ov is the old value
- add(t): the arc was added at time t
- rem(t): the arc was removed at time t

[Figure: an OEM graph with annotations attached to its nodes and arcs]

DOEM Database: The set of all possible node annotations is denoted by node-annot, and the set of all possible arc annotations is denoted by arc-annot. A DOEM database is a triple D = (O, fN, fA), where O = (N, A, v, r) is an OEM database, fN maps each node in N to a finite subset of node-annot, and fA maps each arc in A to a finite subset of arc-annot.
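As a purely illustrative sketch, the basic change operations and their validity constraints above can be modeled in a few lines of Python. The class and method names (OEMDatabase, cre_node, and so on) are assumptions for this sketch, not names from the notes.

```python
# Minimal sketch of an OEM database with the four basic change operations.
C = object()  # the reserved value for complex objects

class OEMDatabase:
    def __init__(self, root):
        self.values = {root: C}   # node id -> atomic value or C
        self.arcs = set()         # labeled directed arcs (parent, label, child)
        self.root = root

    def cre_node(self, n, v):
        assert n not in self.values, "identifier must be new"
        self.values[n] = v

    def upd_node(self, n, v):
        # n must be atomic, or a complex object without subobjects
        assert all(p != n for (p, _, _) in self.arcs)
        self.values[n] = v

    def add_arc(self, p, label, c):
        arc = (p, label, c)
        assert arc not in self.arcs, "arc must not already exist"
        self.arcs.add(arc)

    def rem_arc(self, p, label, c):
        self.arcs.remove((p, label, c))

# Replaying part of the history example, U1 at time t1:
db = OEMDatabase(root="n1")
db.cre_node("n2", C)
db.cre_node("n3", "Hakata")
db.add_arc("n1", "restaurant", "n2")
db.add_arc("n2", "name", "n3")
```

Each operation checks exactly the precondition the notes give (fresh identifier, no duplicate arc, no subobjects on update), so an invalid change sequence raises an error instead of silently corrupting the database.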
DOEM Database Properties: Given a DOEM database D, it is easy to obtain the original snapshot O_0(D), the snapshot at time t, O_t(D), and the current snapshot O_c(D).

Chorel Example 1
QUERY: Find the names of all restaurants whose price ratings were updated on or after January 1st, 1997 to a value greater than 15, together with the time of the update and the new price.
  select N, T, NV
  from  guide.restaurant.price <upd at T to NV>,
        guide.restaurant.name N
  where T >= 1 Jan 97 and NV > 15
Answer: name "Bangkok Cuisine", new value 20, update time 1 Jan 97.

Chorel Example 2
QUERY: Find the names of restaurants to which a "moderate" price subobject was added since January 1st, 1997.
  select N
  from  guide.restaurant R, R.name N
  where R <add at T> price = "moderate" and T >= 1 Jan 97

Syntax of Annotation Expressions
- <Annot at Time> if Annot is in {add, rem, cre}; <upd at Time from OldVal to NewVal> for upd.
- Arc annotation expressions must occur immediately before a label. Example: guide.restaurant <add at T> price.
- Node annotation expressions must occur immediately after a label. Example: guide.restaurant.price <upd at T to NV>.

PARALLEL DATABASE TECHNOLOGY
Kien A. Hua
School of Computer Science, University of Central Florida, Orlando, FL 32816-2362

Topics (bottom-up through the system layers):
- Hardware: parallel architectures
- Storage manager: data placement strategies
- Executor: parallel algorithms
- Query optimization: parallelizing query optimization techniques
- Commercial systems

Outline
1. Parallelism: goals and metrics
2. Hardware architectures for parallel database systems
3. Basic techniques for parallel database system implementation
4. The role of parallel database systems in the client-server computing environment
5. Some commercial parallel database systems
6. Future directions and research problems

Relational Data Model: An attribute is a column (e.g., EMP, NAME, ADDR); a relation is a table; a tuple is a row (e.g., (John Smith, Orlando)).
- A database structure is a collection of tables.
- Each table is organized into rows and columns.
- The persistent objects of an application are captured in these tables.

Relational Operator SCAN
  SELECT NAME FROM EMPLOYEE WHERE ADDR = 'Orlando'
Scanning EMPLOYEE, the tuples (0002, Jane Doe, Orlando) and (0005, John Smith, Orlando) satisfy the predicate, and their NAME values are returned.

Relational Operator JOIN
  SELECT * FROM EMPLOYEE, PROJECT WHERE EMP = ENUM
For example, the EMPLOYEE tuple (0002, Jane Doe, Orlando) matches the PROJECT tuple (0002, Database, Research), producing the EMPPROJ tuple (0002, Jane Doe, Orlando, Database, Research).

Hash-Based Join: Both relations are declustered into buckets by hashing on the join attribute, so only corresponding buckets need to be joined. Example, with hash function "value mod 4":
  0 mod 4 = 0, 4 mod 4 = 0;  1 mod 4 = 1, 5 mod 4 = 1;  2 mod 4 = 2, 6 mod 4 = 2;  3 mod 4 = 3, 7 mod 4 = 3.

Bucket Sizes and I/O Costs: Tuples stream through memory one at a time. A bucket that fits in memory in its entirety needs to be loaded only once; a bucket that does not fit must be loaded several times.

Speedup and Scaleup: The ideal parallel system demonstrates two key properties.
1. Linear speedup:
   Speedup = small_system_elapsed_time / big_system_elapsed_time
   Linear speedup means twice as much hardware can perform the task in half the elapsed time (i.e., speedup = number of processors).
2. Linear scaleup:
   Scaleup = small_system_elapsed_time_on_small_problem / big_system_elapsed_time_on_big_problem
   Linear scaleup means twice as much hardware can perform twice as large a task in the same elapsed time (i.e., scaleup = 1).

Barriers to Parallelism
- Startup: the time needed to start a parallel operation; thread creation/connection overhead may dominate the actual computation time.
- Interference: when accessing shared resources, each new process slows down the others (the hot-spot problem).
- Skew: the response time of a set of parallel processes is the time of the slowest one.

The Challenge
- The ideal database machine has (1) a single infinitely fast processor and (2) an infinitely large memory with infinite bandwidth. Unfortunately, technology is not delivering such machines.
- The challenge is (1) to build an infinitely fast processor out of infinitely many processors of finite speed, and (2) to build an infinitely large memory out of infinitely many storage units of finite speed.

Performance of Hardware Components
- Processor: density increases by 25% per year; speed doubles in three years.
- Memory: density increases by 60% per year; cycle time improves far more slowly over a decade.
- Disk: density increases by 25% per year; cycle time improves far more slowly over a decade.
The database problem: the I/O bottleneck will worsen.

Hardware Architectures: (a) Shared Nothing (SN), (b) Shared Disk (SD), (c) Shared Everything (SE); each is built from processors, memory modules, and disk drives connected by a communication network.

Examples
HARDWARE
- Shared-Everything: IBM mainframes, HP T500, SGI Challenge, Pentium-based SMPs
- Shared-Disk: Intel Paragon, nCUBE/2, Tandem's ServerNet-based machines
- Shared-Nothing: Teradata's DBC, Tandem NonStopSQL, IBM RS/6000 SP
SOFTWARE
- Shared-Everything: Informix 7.2 OnLine Dynamic Server, Oracle 7.3 Parallel Query Option, IBM DB2/MVS
- Shared-Disk: IBM IMS/VS Data Sharing Product, DEC VAX DBMS and Rdb products, Oracle on DEC's VAXcluster and nCUBE computers
- Shared-Nothing: Teradata's DBC, Tandem NonStopSQL, IBM DB2 Parallel Edition

Shared-Everything Systems: [Figure: processors with private caches sharing memory modules, with cache cross-interrogation traffic.]

Hybrid Architecture
- SE clusters are interconnected through a communication network to form an SN structure at the inter-cluster level.
- This approach minimizes the communication overhead associated with the SN structure, and yet each cluster size is kept small, within the limitation of the local memory and I/O bandwidth.
- Examples of this architecture include Sequent computers, the NCR 5100M, and the Bull PowerCluster.
- Some of the DBMSs designed for this structure are the Teradata Database System for the NCR WorldMark 5100 computer, Sybase MPP, and Informix OnLine Extended Parallel Server.

Parallelism in the Relational Data Model
- Pipeline parallelism: if one operator sends its output to another, the two operators can execute in parallel. Example: INSERT INTO C SELECT * FROM A, B WHERE A.X = B.Y.
- Partitioned parallelism: by taking the large relational operators and partitioning their inputs and outputs, it is possible to turn one big job into many concurrent, independent little ones.

Merge and Split Operators
- A merge operator combines several parallel data streams into a single sequential stream.
- A split operator is used to partition or replicate a stream of tuples.
- With the split and merge operators, a web of simple sequential dataflow nodes can be connected to form a parallel execution plan.

Data Partitioning Strategies: Data partitioning is the key to partitioned execution.
- Round-robin: maps the i-th tuple to disk i mod n.
- Hash partitioning: maps each tuple to a disk location based on a hash function.
- Range partitioning: maps contiguous attribute ranges of a relation to various disks.

Comparing Data Partitioning Strategies
- Round-robin. Advantage: simple. Disadvantage: it does not support associative search.
- Hash partitioning. Advantage: associative access to the tuples with a specific attribute value can be directed to a single disk. Disadvantage: it tends to randomize data rather than cluster it.
- Range partitioning. Advantage: it is good for associative search and for clustering data. Disadvantage: it risks execution skew, in which all the execution occurs in one partition.
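The three declustering strategies can be sketched in a few lines of Python. The function names and the toy STUDENT data are illustrative, not from the notes.

```python
import bisect

def round_robin(tuples, n):
    # i-th tuple -> disk i mod n
    return [i % n for i, _ in enumerate(tuples)]

def hash_partition(tuples, n, key=0):
    # disk chosen by a hash function on one attribute
    return [hash(t[key]) % n for t in tuples]

def range_partition(tuples, n, split_points, key=0):
    # contiguous attribute ranges -> disks; split_points holds the n-1 boundaries
    return [bisect.bisect_right(split_points, t[key]) for t in tuples]

students = [("Ann", 1.9), ("Bob", 2.8), ("Cyd", 3.7)]   # (NAME, GPA)
print(round_robin(students, 3))                          # [0, 1, 2]
print(range_partition(students, 3, [2.0, 3.0], key=1))   # [0, 1, 2]
```

Under the range scheme a query such as "GPA better than 2.0" can skip disk 0 entirely, which is exactly the associative-search advantage (and, if most students cluster in one GPA range, the execution-skew risk) described above.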
Horizontal Data Partitioning: Consider STUDENT(SSN, NAME, GPA, MAJOR), e.g. (012345678, Jane Doe, 3.8, Computer Science) and (876543210, John Smith, 2.9, English), partitioned across processing nodes P1-P3 over a communication network.
- Query 1: Retrieve the names of students who have a GPA better than 2.0. Only P2 and P3 can participate.
- Query 2: Retrieve the names of students who major in Anthropology. The whole file must be searched.

Multidimensional Data Partitioning: The grid below declusters a relation on two attributes (Age on one axis; Salary, with breakpoints 20K, 30K, 35K, 45K, 55K, 70K, 90K, on the other) across nine processing nodes 0-8; each row is the previous row circularly shifted left by three:
  0 1 2 3 4 5 6 7 8
  3 4 5 6 7 8 0 1 2
  6 7 8 0 1 2 3 4 5
  1 2 3 4 5 6 7 8 0
  4 5 6 7 8 0 1 2 3
  7 8 0 1 2 3 4 5 6
  2 3 4 5 6 7 8 0 1
  5 6 7 8 0 1 2 3 4
  8 0 1 2 3 4 5 6 7
A 1-attribute query or a 2-attribute range query touches only the relevant blocks of this grid.
Advantages:
- The degree of parallelism is maximized, using as many processing nodes as possible.
- The search space is minimized, searching only relevant data blocks.

Query Types: The shape of the data subspace accessed by a range query is the query shape.
- Square query: the query shape is a square.
- Row query: the query shape is a rectangle containing a number of rows.
- Column query: the query shape is a rectangle containing a number of columns.

Multidimensional Data Allocation Techniques
- Multidimensional Data Placement (MDP), by Hua and Lee, 1990
- Coordinate Modulo Declustering (CMD), by Li, Srivastava, and Rotem, 1992
- Hilbert Curve Allocation Method (HCAM), by Faloutsos and Bhagwat, 1993
- General Multidimensional Data Allocation (GeMDA), by Hua, Lo, and Young, 1994

Optimality
- A data allocation strategy is usage optimal with respect to a query type if the execution of these queries can always use all the PNs available in the system.
- A data allocation strategy is balance optimal with respect to a query type if the execution of these queries always results in a balanced workload for all the PNs involved.
- A data allocation strategy is optimal with respect to a query type if it is both usage optimal and balance optimal with respect to this query type.

Coordinate Modulo Declustering (CMD): With N PNs, grid block (x, y) is assigned to PN (x + y) mod N; each row of the allocation grid is the previous row circularly shifted left by one:
  0 1 2 3 4 5 6 7
  1 2 3 4 5 6 7 0
  2 3 4 5 6 7 0 1
  3 4 5 6 7 0 1 2
  4 5 6 7 0 1 2 3
  5 6 7 0 1 2 3 4
  6 7 0 1 2 3 4 5
  7 0 1 2 3 4 5 6
Advantages: optimal for row and column queries. Disadvantages: poor for square queries.

Hilbert Curve Allocation Method (HCAM):
  0 3 4 5 2 3 4 7
  1 2 7 6 1 0 5 6
  6 5 0 1 6 7 2 1
  7 4 3 2 5 4 3 0
  0 1 6 7 0 1 6 7
  3 2 5 4 3 2 5 4
  4 7 0 3 4 7 0 3
  5 6 1 2 5 6 1 2
Advantages: good for square range queries. Disadvantages: poor for row and column queries.

General Multidimensional Data Allocation (GMDA): N is the number of processing nodes. Regular rows are obtained by a circular left shift of j positions; check rows by a circular left shift of j+1 positions. [Figure: 8x8 GMDA allocation grid with the check rows marked.] Advantages: optimal for row, column, and small square range queries (< j^2).

Handling the 3-Dimensional Case: A cube with N^3 grid blocks can be seen as N 2-dimensional planes stacked up in the third dimension.
Algorithm 3D-GMDA:
1. Compute the mapping for the first rows of all the 2-dimensional planes by considering these rows collectively as forming a plane along the third dimension; apply the 2D-GMDA algorithm to this plane, except that a different shifting distance is used.
2. Apply the 2D-GMDA algorithm to each of the 2-dimensional planes, using the shifting distance j and the first row already computed in step 1.

Handling Higher Dimensions: A grid block (X1, X2, …, Xd) is assigned to PN GeMDA(X1, X2, …, Xd), where
  GeMDA(X1, …, Xd) = (Sum over i = 1..d of Xi * ShftDist_i) mod N,
N is the number of PNs, ShftDist_1 = j, and GCD_i = gcd(ShftDist_i, N).
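The CMD rule above, block (x, y) goes to PN (x + y) mod N, can be checked in a few lines of Python. This is a sketch; the grid size is arbitrary.

```python
# Coordinate Modulo Declustering: grid block (x, y) -> PN (x + y) mod N.
def cmd(x, y, n):
    return (x + y) % n

N = 8
row0 = [cmd(x, 0, N) for x in range(N)]   # a row query touches all 8 PNs once each
col0 = [cmd(0, y, N) for y in range(N)]   # a column query does too
square = sorted(cmd(x, y, N) for x in range(2) for y in range(2))
print(row0)     # [0, 1, 2, 3, 4, 5, 6, 7]
print(square)   # [0, 1, 1, 2]
```

The row and column queries use every PN exactly once (usage and balance optimal), while the 2x2 square query loads PN 1 twice and uses only three PNs, which is exactly CMD's weakness on square queries noted above.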
Optimality Comparison

  Allocation scheme | Row queries | Column queries | Small square queries
  HCAM              | No          | No             | No
  CMD               | Yes         | Yes            | No
  GeMDA             | Yes         | Yes            | Yes

Conventional Parallel Hash-Based Join (GRACE Algorithm): Each PN hashes its local tuples of both relations into buckets and transmits each bucket to the PN responsible for it; each PN then joins its corresponding buckets. [Figure: bucket formation and data transmission across PN1-PN4.]

The Effect of Imbalanced Workloads: [Figure: bucket-size distribution under skew, where a few buckets are orders of magnitude larger than the rest, and the resulting execution-time breakdown for GRACE on 64 processors.]

Partition Tuning: the Largest Processing Time (LPT) First strategy assigns hash buckets to processing nodes largest bucket first, each going to the currently least-loaded node.

Load-Balancing Parallel Hash Join Variants: [Figures: dataflow of the Naive Load Balancing Parallel Hash Join (NBJ), the Tuple Interleaving Parallel Hash Join (TIJ), and the Adaptive Load Balancing Parallel Hash Join (ABJ) across the PNs.]

Simulation Results: [Figures: cost in seconds under bucket skew and under partition skew.]

Sampling-Based Load Balancing (SLB) Join Algorithm
- Sampling phase: as the sampling tuples are loaded into memory, each PN declusters its tuples into a number of in-memory buckets by hashing on the join attribute.
- Partition tuning phase: the coordinator determines the optimal bucket allocation strategy BAS* using the statistical information on the buckets in the sample.
- Split phase: the tuples already in memory are collected to their respective host PNs according to BAS* to form the join buckets. Each PN loads the remaining tuples and redistributes them to the join buckets in accordance with BAS*.
- Join phase: each PN performs the local joins of respectively matching buckets.
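The bucket-wise idea behind these hash-based joins can be sketched as follows. The relations, attribute positions, and function names are illustrative, not from the notes.

```python
from collections import defaultdict

# Sketch of a GRACE-style hash join: both relations are declustered into
# buckets by hashing the join attribute; matching happens only within
# corresponding buckets.
def hash_join(R, S, r_key, s_key, n_buckets=4):
    r_buckets = defaultdict(list)
    s_buckets = defaultdict(list)
    for t in R:
        r_buckets[t[r_key] % n_buckets].append(t)   # e.g. 5 mod 4 = 1
    for t in S:
        s_buckets[t[s_key] % n_buckets].append(t)
    out = []
    for b in range(n_buckets):                      # join bucket by bucket
        for r in r_buckets[b]:
            for s in s_buckets[b]:
                if r[r_key] == s[s_key]:
                    out.append(r + s)
    return out

emp = [(2, "Jane Doe"), (5, "John Smith")]          # (ENUM, NAME)
prj = [(2, "Database"), (7, "Networks")]            # (ENUM, PROJECT)
print(hash_join(emp, prj, 0, 0))                    # [(2, 'Jane Doe', 2, 'Database')]
```

In a real parallel system each bucket pair would be shipped to (and joined on) one PN; a skewed hash column makes some buckets much larger than others, which is precisely the imbalance that LPT-style partition tuning and SLB address.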
nCUBE/2 Results (SLB vs. ABJ vs. GRACE)
- The performance of SLB approaches that of GRACE under very mild skew conditions, and
- it avoids the disastrous performance that GRACE suffers under severe skew conditions.

Pipelining Hash Join Algorithms
- Two-phase approach: build the hash table from the first operand, then probe it with the second. Advantage: requires only one hash table. Disadvantage: pipelining along the outer relation must be suspended during the build phase (i.e., while building the hash table).
- One-phase approach: as a tuple comes in, it is first inserted into its own hash table and then used to probe the part of the hash table of the other operand that has already been constructed. Advantage: pipelining along both operands is possible. Disadvantage: requires larger memory space.

Aggregate Functions
- An SQL aggregate function is a function that operates on groups of tuples. Example:
  SELECT department, COUNT(*) FROM Employee WHERE age > 50 GROUP BY department
- The number of result tuples depends on the selectivity of the GROUP BY attributes (i.e., department).

Parallel Aggregation Dataflow
- Centralized merging: [Figure: the PNs send their local aggregates to a single coordinator.]
- Distributed merging: [Figure: the PNs exchange and merge local aggregates in parallel.]
- Repartitioning example: PN1 holds the groups (3, 15) and (6, 20); PN2 holds (1, 13), (4, 17), (7, 13); PN3 holds (2, 13), (5, 11), (8, 14). With hash function MOD 3, partition 1 receives the groups with hash value 0, partition 2 those with hash value 1, and partition 3 those with hash value 2.

Performance Characteristics
- Centralized Merging. Disadvantage: the merging phase is sequential.
- Distributed Merging. Advantage: the merging step is not a bottleneck. Disadvantage: since a group value is being accumulated on potentially all the PNs, the overall memory requirement can be large.
- Repartitioning. Advantage: it reduces the memory requirement, as each group value is stored in one place only. Disadvantage: it incurs more network traffic. It works well when the number of tuples per group is small.
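The two-phase, centralized-merging flow (each PN aggregates locally, then a coordinator merges) can be sketched as follows; the data and function names are illustrative.

```python
from collections import Counter

# Phase 1: each PN counts its local tuples per group.
def local_aggregate(tuples, group_key):
    return Counter(t[group_key] for t in tuples)

# Phase 2: a coordinator merges the local aggregate values.
def merge(partials):
    total = Counter()
    for p in partials:
        total += p
    return total

pn1 = [("Research", 1), ("Sales", 2)]     # (department, ...) tuples on PN1
pn2 = [("Research", 3)]                   # ... and on PN2
partials = [local_aggregate(p, 0) for p in (pn1, pn2)]
print(merge(partials))                    # Counter({'Research': 2, 'Sales': 1})
```

Phase 1 runs in parallel, but phase 2 is inherently sequential at the coordinator, which is the bottleneck the distributed-merging and repartitioning variants below avoid.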
Conventional Aggregation Algorithms
- Centralized Merging (CM). Phase 1: each PN does aggregation on its local tuples. Phase 2: the local aggregate values are merged at a predetermined central coordinator.
- Distributed Merging (DM). Phase 1: each PN does aggregation on its local tuples. Phase 2: the local aggregate values are hash-partitioned based on the GROUP BY attribute, and the PNs merge these local aggregate values in parallel.
- Repartitioning (Rep). Phase 1: the relation is repartitioned using the GROUP BY attributes. Phase 2: the PNs do aggregation on their local partitions in parallel.
Performance comparison: CM and DM work well when the number of result tuples is small; Rep works better when the number of groups is large.

Adaptive Aggregation Algorithms
- Sampling-based (Samp) approach: the CM algorithm is first applied to a small, page-oriented random sample of the relation. If the number of groups obtained from the sample is small, the DM strategy is used; the Rep algorithm is used otherwise.
- Adaptive DM (A-DM): starts with the DM strategy under the common-case assumption that the number of groups is small. If the algorithm detects that the number of groups is large (i.e., memory full is detected), it switches to the Rep strategy.
- Adaptive Repartitioning (A-Rep): starts with the Rep strategy. It switches to DM if the number of groups is not large enough (i.e., the number of groups is too few given the number of tuples seen).
Performance comparison: in general, A-DM performs the best; however, A-Rep should be used if the number of groups is suspected to be very large.

Implementation Techniques for A-DM
- Global switch: when the first PN detects a memory-full condition, it informs all the PNs to switch to the Rep strategy. Each PN first partitions and sends the so-far accumulated local results to the PNs they hash to, then proceeds to read and repartition the remaining tuples.
  Once the repartitioning phase is complete, the PNs do aggregation on the local partitions in parallel, as in the Rep algorithm.
- Local switch: a PN, upon detecting memory full, stops processing its local tuples. It first partitions and sends the so-far accumulated local results to the PNs they hash to, then proceeds to read and repartition the remaining tuples. During phase one, one set of PNs may be executing the DM algorithm while the others are executing the Rep algorithm; when the latter receive an aggregate value from another PN, they accumulate it into the corresponding local aggregate value. Once all PNs have completed phase 1, the local aggregate values are merged as in the DM algorithm.

[Figures: A-DM global-switch and local-switch dataflow across three PNs, with MOD 3 repartitioning.]

SQL (Structured Query Language): Given the schema EMPLOYEE(ENAME, ENUM, BDATE, ADDR, SALARY), WORKSON(ENO, PNO, HOURS), PROJECT(PNAME, PNUM, DNUM, PLOCATION), consider the SQL query:
  SELECT ENAME
  FROM EMPLOYEE, WORKSON, PROJECT
  WHERE PNAME = 'database' AND PNUM = PNO AND ENO = ENUM AND BDATE > 1965
- SQL is nonprocedural.
- The compiler must generate the execution plan: (1) transform the query from SQL into relational algebra; (2) restructure (optimize) the algebra to improve performance.

Relational Algebra: Consider the relations T1(ENAME, SALARY, ENUM) and T2(ENUM, ADDRESS, BDATE):
  T1: (Andrew, 98000, 005), (Casey, 150000, 003), (James, 120000, 007), (Kathleen, 115000, 001)
  T2: (001, Orlando, 1964), (003, New York, 1966), (005, Los Angeles, 1968), (007, London, 1958)

Select selects rows: sigma_{SALARY >= 120000}(T1) = {(Casey, 150000, 003), (James, 120000, 007)}.

Project selects columns: pi_{ENAME, SALARY}(T1) = {(Andrew, 98000), (Casey, 150000), (James, 120000), (Kathleen, 115000)}.

Cartesian product selects all possible combinations: T1 x T2 contains (Andrew, 98000, 005, 001, Orlando, 1964), (Andrew, 98000, 005, 003, New York, 1966), …, (Kathleen, 115000, 001, 005, Los Angeles, 1968), (Kathleen, 115000, 001, 007, London, 1958).
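These algebra operators can be mimicked in a few lines of Python; this is a sketch in which tuples are modeled as dicts and all names are illustrative.

```python
# Minimal select / project / join over lists of dict-valued tuples.
def select(rel, pred):
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    return [{a: t[a] for a in attrs} for t in rel]

def join(r, s, r_attr, s_attr):
    return [{**x, **y} for x in r for y in s if x[r_attr] == y[s_attr]]

T1 = [{"ENAME": "Casey", "SALARY": 150000, "ENUM": "003"},
      {"ENAME": "Kathleen", "SALARY": 115000, "ENUM": "001"}]
T2 = [{"ENUM": "003", "ADDRESS": "New York"},
      {"ENUM": "001", "ADDRESS": "Orlando"}]

rich = select(T1, lambda t: t["SALARY"] >= 120000)
print(project(rich, ["ENAME"]))            # [{'ENAME': 'Casey'}]
print(len(join(T1, T2, "ENUM", "ENUM")))   # 2
```

Note that join here is just the Cartesian product with the equality predicate applied, which is exactly the algebraic relationship the optimization examples below exploit when replacing a product by a join.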
Join selects some combinations: T1 join T2 = {(Andrew, 98000, 005, Los Angeles, 1968), (Casey, 150000, 003, New York, 1966), (James, 120000, 007, London, 1958), (Kathleen, 115000, 001, Orlando, 1964)}.

Transforming SQL into Algebra: The SQL query above maps to the canonical query tree
  pi_ENAME                                                        (SELECT clause)
  sigma_{PNAME='database' AND PNUM=PNO AND ENO=ENUM AND BDATE>1965}   (WHERE clause)
  (EMPLOYEE x WORKSON) x PROJECT                                  (FROM clause)
This query tree will compute the correct result; however, the performance will be very poor, so it needs optimization.

Optimization Strategies
GOAL: Reduce the sizes of the intermediate results as quickly as possible.
STRATEGY:
1. Move SELECTs and PROJECTs as far down the query tree as possible.
2. Among SELECTs, reorder the tree to perform the one with the lowest selectivity factor first.
3. Among JOINs, reorder the tree to perform the one with the lowest join selectivity first.

Example: Apply SELECTs First. The canonical tree becomes
  pi_ENAME( sigma_{PNUM=PNO}( ( sigma_{ENO=ENUM}( sigma_{BDATE>1965}(EMPLOYEE) x WORKSON ) ) x sigma_{PNAME='database'}(PROJECT) ) )

Example: Replace x by Join. Combining each selection with the Cartesian product beneath it yields joins:
  pi_ENAME( ( sigma_{BDATE>1965}(EMPLOYEE) join_{ENUM=ENO} WORKSON ) join_{PNUM=PNO} sigma_{PNAME='database'}(PROJECT) )

Example: Move PROJECTs Down. Projections are inserted below the joins so that only the needed attributes flow up:
  pi_ENAME( pi_{ENAME,PNO}( pi_{ENAME,ENUM}(sigma_{BDATE>1965}(EMPLOYEE)) join_{ENUM=ENO} pi_{ENO,PNO}(WORKSON) ) join_{PNUM=PNO} pi_{PNUM}(sigma_{PNAME='database'}(PROJECT)) )

Parallelizing Query Optimizer: Relations are fragmented and allocated to multiple processing nodes. Besides the choice of ordering relational operations, the parallelizing optimizer must select the best sites to process data.
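Strategy 1 above, moving SELECTs down, can be illustrated by comparing intermediate-result sizes on toy data; all names and numbers here are illustrative.

```python
# Why pushing SELECTs down helps: "select after product" vs "select before product".
employee = [{"ENUM": i, "BDATE": 1950 + i} for i in range(50)]
workson  = [{"ENO": i, "PNO": i % 5} for i in range(50)]

# Canonical plan: full Cartesian product first, then filter.
product = [(e, w) for e in employee for w in workson]
late = [(e, w) for e, w in product
        if e["BDATE"] > 1990 and e["ENUM"] == w["ENO"]]

# Optimized plan: filter EMPLOYEE first, then the (much smaller) product.
young = [e for e in employee if e["BDATE"] > 1990]
small_product = [(e, w) for e in young for w in workson]
early = [(e, w) for e, w in small_product if e["ENUM"] == w["ENO"]]

assert late == early                       # same answer either way
print(len(product), len(small_product))    # 2500 450
```

Both plans return the same nine result pairs, but the pushed-down selection shrinks the intermediate result from 2500 tuples to 450, which is exactly the "reduce intermediate results as quickly as possible" goal.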
The role of a parallelizing optimizer is to map a query on global relations into a sequence of local operations acting on local relation fragments.

Parallelizing Query Optimization: An SQL query on global relations is first handled by a sequential optimizer (using the global schema), producing an optimized sequential access plan; a parallelizing optimizer (using the fragment schema) then produces an optimized parallel access plan. The parallelizing optimizer
- parallelizes the relational operators, and
- selects the best processing nodes for each parallelized relational operator.

Example: Elimination of Useless Joins. Fragments:
  E1 = sigma_{ENO <= 'E3'}(E),  E2 = sigma_{'E3' < ENO <= 'E6'}(E),  E3 = sigma_{ENO > 'E6'}(E)
  G1 = sigma_{ENO <= 'E3'}(G),  G2 = sigma_{ENO > 'E3'}(G)
Query: SELECT * FROM E, G WHERE E.ENO = G.ENO
1. Sequential query tree: E join_ENO G
2. Data localization: (E1 union E2 union E3) join_ENO (G1 union G2)
3. Distributing the join over the unions: the union of all pairwise fragment joins
4. Eliminating the useless joins leaves: (E1 join G1) union (E2 join G2) union (E3 join G2)

Parallelizing Query Optimization steps:
1. Determine which fragments are involved and transform the global operators into fragment operators.
2. Eliminate useless fragment operators.
3. Find the best ordering of the fragment operators.
4. Select the best processing node for each fragment operator and specify the communication operations.

Prototype at UCF
- A prototype of a shared-nothing system was implemented on a 64-processor nCUBE/2 computer.
- The system was implemented to demonstrate the GeMDA multidimensional data partitioning technique, a dynamic optimization scheme with load-balancing capability, and a competition-based scheduling policy.

System Architecture: [Figure: users create and destroy tables and submit SQL queries; a Query Translator (holding the global schema), Query Executor, Load Utility, and Storage Manager cooperate to produce the results.]

Software Components
- Storage Manager: manages physical disk devices and schedules all I/O activities; provides a sequential scan interface to the query processing facilities.
- Catalog Manager: acts as a central repository of all global and fragment schemas.
- Load Utility: allows the users to populate a relation from an external file; distributes the fragments of a relation across the processing nodes using GeMDA.
- Query Translator: provides an interface for queries; translates an SQL query into a query graph; also caches the global schema information locally.
- Query Executor: performs dynamic query optimization; schedules the execution of the operators in the query graph.
- Operator Routines: each routine implements a primitive database operator. To execute an operator in a query graph, the Query Executor calls the appropriate operator routine to carry out the underlying operation.
- Presentation Manager: provides an interactive interface for the user to create and destroy tables and to query the database; the user can also use this interface to browse query results.

Competition-Based Scheduling: Active queries draw operator servers from a shared server pool; a dispatcher and a pool of coordinators manage the waiting queue. Advantage: fair. Disadvantage: system utilization is not maximized.

Planning-Based Scheduling: A scheduler assigns operator servers (JOIN, SELECT, PROJECT) to queries within a scheduling window over the query queue. Advantage: better system utilization. Disadvantages: less fair, and the scheduler can become a bottleneck.

Hardware Organization: A host computer connects through an interconnection network to IFPs (Interface Processors) and ACPs (Access Processors) on a local area network.
- The Catalog Manager, Query Manager, and Scheduler processes run on the IFPs.
- Operator processes run on the ACPs.

Structure of Operator Processes: An operator process reads a stream of tuples, applies its operation, and sends its output in 8K-byte batches, routed through a split table that maps each hash value to a destination (processor, port):

  Hash value | Destination
  0          | processor 3, port 5
  1          | processor 4, port 6
  2          | processor 5, port 8
  3          | processor 6, port 2
Processor 6, Port 2.
- The output is demultiplexed through the split table.
- When the process detects the end of its input stream, it first closes the output streams and then sends a control message to its coordinator process indicating that it has completed execution.

Example: Operator and Process Structure
[Figure: a query tree with operator C over operators A and B; each operator is assigned to processing nodes (PN1–PN4), and split tables route tuples between the resulting processes.]

Storage Manager
A storage manager provides the primitives for scanning a file via a sequential or index scan.
[Figure: a compiled query passes down through the following layers to the database.]
- Operator methods: contain code for each operator in the database access language.
- Access methods: maintain an active scan table that describes all the scans in progress.
- Storage manager: maps file names to file IDs, manages active files, searches for the page holding a given record, and manages a buffer pool.
- Physical I/O: manages physical disk devices and performs page-level I/O operations.

Transaction Processing
The consistency and reliability aspects of transactions are due to four properties:
1. Atomicity: a transaction is either performed in its entirety or not performed at all.
2. Consistency: a correct execution of the transaction must take the database from one consistent state to another.
3. Isolation: a transaction should not make its updates visible to other transactions until it is committed.
4. Durability: once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failure.

Transaction Manager
A transaction manager comprises a lock manager and a log manager.
- Lock Manager: each local lock manager is responsible for the lock units local to that processing node. The lock managers provide concurrency control.
- Log Manager: each local log manager logs the local database operations. The log managers provide recovery services.

Two-Phase Locking (2PL) Protocol
[Figure: over the transaction's duration from Begin to End, the number of locks held rises during the growing phase, peaks at the lock point, and falls during the shrinking phase.]
- Any schedule generated
by a concurrency-control algorithm that obeys the 2PL protocol is serializable, i.e., the isolation property is guaranteed.
- 2PL is difficult to implement. The lock manager has to know (1) that the transaction has obtained all its locks, and (2) that the transaction no longer needs to access the data item in question, so that the lock can be released.
- Cascading aborts can occur.

Strict Two-Phase Locking Protocol
[Figure: over the transaction's duration from Begin to End, the number of locks held only grows, then drops to zero at End.]
The lock manager releases all the locks together when the transaction terminates (commits or aborts).

Handling Deadlocks
- Detection and Resolution:
  - Abort and restart a transaction if it has waited for a lock for a long time.
  - Detect cycles in the wait-for graph and select a transaction involved in a cycle to abort.
- Prevention: if Ti requires a lock held by Tj:
  - If Ti is older, Ti can wait.
  - If Ti is younger, Ti is aborted and restarted with the same timestamp.

Distributed Deadlock Detection (Chandy '83)
- When a transaction is blocked, it sends a special probe message to the blocking transaction. The message consists of three numbers: the transaction that just blocked, the transaction sending the message, and the transaction to whom it is being sent.
- When the message arrives, the recipient checks to see if it itself is waiting for any transaction. If so, the message is updated, replacing the second field by its own TID and the third one by the TID of the transaction it is waiting for. The message is then sent to the blocking transaction.
- If a message goes all the way around and comes back to the original sender, a deadlock is detected.

Two-Phase Commit Protocol
To ensure the atomicity property, a two-phase commit protocol can be used to coordinate the commit process among subtransactions:
  Phase 1 (voting): the coordinator sends PREPARE; each agent votes READY or ABORT.
  Phase 2: the coordinator sends COMMIT or ABORT; each agent replies ACK.
- Coordinator: the site that originates the transaction.
- Agent: a site that executes a subtransaction on behalf of its coordinator.

Recovery
- An entry is made in
the local log file at a processing node each time one of the following commands is issued by a transaction: begin transaction; write (insert, delete, update); commit transaction; abort transaction.
- Write-ahead log protocol: it is essential that log records be written before the corresponding write to the database. If there is no commit-transaction entry in the log for a particular transaction, then that transaction was still active at the time of failure and must therefore be undone.

Client-Server Technology
- Trend: downsizing of applications to run on PC-, workstation-, and server-based systems. Sales and market surveys indicate that corporations and other business enterprises are purchasing more PCs and workstations as compared to mainframes and minicomputers.
- Client-server architectures: file server, distributed database, database server.

File Server
[Figure: each workstation runs an application, a data manager, and a LAN operating system; workstations request data files over the LAN, and the file server ships entire files back to individual workstations.]
The file server maintains data and ships entire files to each client workstation as requested. Problems:
- The LAN could quickly become a bottleneck.
- The file server is not well equipped to maintain the security or integrity of data, handle concurrent updates by different users, or provide services such as backup and recovery of data.

Distributed Databases
[Figure: each workstation runs an application and a database manager over a LAN operating system; a workstation's request is forwarded over the LAN and the requested data is returned.]
We can define a distributed database as a collection of multiple, logically interrelated databases distributed over a computer network.
- Advantages:
  - Local Autonomy: permits setting and enforcing
local policies regarding the use of the data.
  - Improved Reliability/Availability: system crashes or link failures do not cause total system inoperability.
- Disadvantages:
  - Cost: replication of effort (manpower).
  - Performance: the network could become a bottleneck for query processing.
  - Security: maintaining adequate security over computer networks is difficult.

Database Server
[Figure: each workstation runs an application "front end" over a LAN operating system; high-level requests for specific data go over the LAN to a database server running the database manager "back end", and only the requested data is returned to the workstation.]
A database server controls and maintains stored data for the client workstations.
- Both the client and the database server can be dedicated to the tasks for which they are best suited.
- It provides an opportunity for both horizontal (i.e., more servers) and vertical (i.e., a larger server) scaling of resources to do the job.
- Data can be properly safeguarded against loss or improper access.

Reasons for Popularity of Parallel DBMSs
1. High-performance, low-cost commodity components have recently become available. Microprocessor-based systems are much cheaper than traditional mainframes.
2. Widespread adoption of the relational data model. The relational data model is ideally suited to parallel execution.
3. Terabyte online databases are becoming common as the price of online storage decreases (e.g., the Wal-Mart retail information system as of early 1994 contained more than 4TB of data, and the largest single table was 225GB with 32 billion rows). It is difficult to build mainframes powerful enough to meet the I/O demands of large relational databases.

Commercial Product: Teradata DBC/1012
[Figure: host computers attach through IFPs (Interface Processors) and COPs (Communication Processors, on a local area network) to AMPs (Access Module Processors), each with its own disks.]
- It may have over 1000 processors and many thousands of
disks.
- Each relation is hash-partitioned over a subset of the AMPs.
- Near-linear speedup and scaleup on queries have been demonstrated for systems containing over 100 processors.

Teradata DBC/1012: Distribution of Data
- The fallback copy ensures that the data remains available on other AMPs if an AMP should fail.
- [Figure: rows 1–24 have their primary copies spread across eight DSU/AMPs (rows 1, 9, 17 on DSU/AMP 1; rows 2, 10, 18 on DSU/AMP 2; ...; rows 8, 16, 24 on DSU/AMP 8), with each row's fallback copy placed on some other AMP.] In this example, if AMPs 4 and 7 were to fail simultaneously, however, there would be a loss of data availability.
- Additional data protection can be achieved by clustering the AMPs in groups.
- [Figure: the same eight DSU/AMPs are grouped into Cluster A (AMPs 1–4) and Cluster B (AMPs 5–8), and each row's fallback copy is kept on another AMP within the same cluster.] In this example, if both AMPs 4 and 7 were to fail, all data would still be available.

Commercial Product: Tandem NonStop SQL
[Figure: main processors, each with its own memory, I/O processor, and disks, are connected through the Dynabus and its control units.]
- Tandem systems run the applications on the same processors as the database servers.
- Relations may be range-partitioned across multiple disks.
- It is primarily designed for OLTP. It scales linearly well beyond the largest reported mainframes on the TPC-A benchmarks.
- It is three times cheaper than a comparable mainframe system.

Commercial Product: Informix
- With Informix OnLine 7.0, Informix made a major step forward in its support of the SE environment with the Informix Parallel Data Query (PDQ) as part of its Dynamic Scalable Architecture (DSA).
- DSA/XMP in version 8.0 extends the PDQ functions to
work in loosely coupled parallel environments, including clusters and SN computers.
- The Informix optimizer supports indexed and hashed joins as well as table scans using indexes.

Commercial Product: Oracle
Oracle offers two parallel products: the Parallel Server and the Parallel Query Option (PQO).
- The Parallel Server provides access to a shared Oracle database via multiple Oracle instances. This could be a clustered environment or an SD system such as an nCUBE. It does not parallelize database operations; it is aimed at transaction scale-up by throwing more DBMSs at the problem.
- The PQO part of Oracle 7 is intended to provide full parallel functionality in an SE or SD environment. In an SN environment, it treats the partitions as a shared disk. The degree of parallelism is specified by the DBA at the table level or by the programmer at the query level. The optimizer first finds the best serial plan, based on the standard Oracle optimizer, and considers that plan for parallelization.

Commercial Product: Sybase
- Navigation Server partitions data across disks using hashing, ranges, or schema partitioning.
- If the join attribute is not the partitioning attribute, Navigation Server moves all the join data to a single node.
- Navigation Server currently runs on the AT&T 3600.

Commercial Product: AT&T GIS
- AT&T Global Information Solutions (formerly NCR Corp.) has ported Teradata's parallel database product to the AT&T 3600.
- A Unix version of the Teradata software is now available. Database processes are no longer tied to proprietary hardware but can run on general-purpose Unix computers.

Commercial Product: IBM DB2 Parallel Edition
- In addition to supporting the SE environment, the DB2 Parallel Edition supports the IBM SP2 SN multiprocessor.
- Parallel query execution is determined by a cost-based parallelizing optimizer.
- Utilities are also parallelized, including load, index, backup, restore, and recovery.

Challenges
Facing Parallel Database Technology
- Multimedia applications require very large network I/O bandwidth.
- A parallel parallelizing query optimizer will be able to consider many more execution plans.
- Design parallel programming languages to take advantage of the parallelism inherent in the parallel database system.
- We need new data partitioning techniques and algorithms for the non-relational data objects allowed in SQL3 (e.g., graph structures).
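The fragment-elimination example earlier (E1–E3 joined with G1, G2 on ENO) can be checked mechanically. The sketch below is illustrative only: it models each fragment's ENO predicate as a range (exclusive low, inclusive high], with None marking an unbounded side, and keeps only the fragment joins whose ranges can overlap.

```python
# Useless-JOIN elimination for range-based fragments: the fragment join
# Ei JOIN Gj on ENO is useless when the two selection ranges are disjoint.
# A range (lo, hi) means lo < ENO <= hi; None marks an unbounded side.

E_FRAGS = {"E1": (None, "E3"), "E2": ("E3", "E6"), "E3": ("E6", None)}
G_FRAGS = {"G1": (None, "E3"), "G2": ("E3", None)}

def overlaps(a, b):
    """True when ranges a and b can share some ENO value."""
    lo_a, hi_a = a
    lo_b, hi_b = b
    a_below_b = hi_a is not None and lo_b is not None and hi_a <= lo_b
    b_below_a = hi_b is not None and lo_a is not None and hi_b <= lo_a
    return not (a_below_b or b_below_a)

def useful_joins():
    """The fragment joins that survive data localization."""
    return [(e, g) for e, ra in E_FRAGS.items()
                   for g, rb in G_FRAGS.items() if overlaps(ra, rb)]
```

Running useful_joins() keeps exactly (E1, G1), (E2, G2), and (E3, G2), matching the result of step 4 in the example.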
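The split table under "Structure of Operator Processes" can be sketched as a small routing map. The (processor, port) entries come from the slides; using Python's built-in hash on the tuple key is an illustrative stand-in for the system's hash function.

```python
# Sketch of an operator process's split table: each output tuple is
# demultiplexed to a (processor, port) destination chosen by hashing
# the tuple's partitioning attribute.

SPLIT_TABLE = {            # hash value -> (processor, port), as in the slides
    0: (3, 5),
    1: (4, 6),
    2: (5, 8),
    3: (6, 2),
}

def route(tuple_key):
    """Pick the destination stream for one output tuple."""
    h = hash(tuple_key) % len(SPLIT_TABLE)
    return SPLIT_TABLE[h]

def demultiplex(tuples):
    """Group a stream of (key, payload) tuples into per-destination batches."""
    batches = {dest: [] for dest in SPLIT_TABLE.values()}
    for key, payload in tuples:
        batches[route(key)].append((key, payload))
    return batches
```

Each destination batch would then be shipped as one message (e.g., an 8K-byte batch) to the named processor and port.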
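A minimal sketch of the strict 2PL rule described earlier: locks are acquired as the transaction runs, and the lock manager releases them all together at commit or abort. Exclusive-only locks and a single lock granularity are simplifying assumptions.

```python
# Strict two-phase locking sketch: a transaction acquires locks as it
# runs, and the lock manager releases them all together at termination.

class LockManager:
    def __init__(self):
        self.owner = {}          # data item -> transaction id holding the lock
        self.held = {}           # transaction id -> set of locked items

    def lock(self, tid, item):
        """Grant an exclusive lock, or report that tid must wait."""
        holder = self.owner.get(item)
        if holder is not None and holder != tid:
            return False         # conflict: caller must wait (or abort)
        self.owner[item] = tid
        self.held.setdefault(tid, set()).add(item)
        return True

    def release_all(self, tid):
        """Called exactly once, at commit or abort (the strict-2PL rule)."""
        for item in self.held.pop(tid, set()):
            del self.owner[item]
```

Because no lock is released before termination, no other transaction can read an uncommitted update, so cascading aborts cannot occur.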
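The timestamp-based deadlock prevention rule from "Handling Deadlocks" (an older transaction waits; a younger one is aborted and restarted with its original timestamp) reduces to a single comparison:

```python
# Deadlock prevention by timestamp: on a lock conflict, an older
# requester waits, while a younger requester is aborted and later
# restarted with the SAME timestamp, so it eventually becomes oldest.

def on_conflict(ts_requester, ts_holder):
    """Action for transaction Ti (requester) blocked by Tj (holder)."""
    if ts_requester < ts_holder:   # smaller timestamp = older transaction
        return "wait"
    return "abort-and-restart-with-same-timestamp"
```

Restarting with the same timestamp is what guarantees that a repeatedly aborted transaction cannot starve.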
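The probe scheme of Chandy '83 can be simulated on a wait-for map. Representing each probe as the triple (blocked transaction, sender, receiver) follows the slides; the centralized map is only a stand-in for messages exchanged between sites.

```python
# Probe-based distributed deadlock detection (Chandy '83 sketch).
# waits_for maps each transaction to the transaction it is blocked on.

def detect_deadlock(initiator, waits_for):
    """Send a probe from `initiator` along the wait-for chain; a deadlock
    involving the initiator exists iff the probe comes back to it."""
    blocked = waits_for.get(initiator)
    if blocked is None:
        return False
    probe = (initiator, initiator, blocked)   # (blocked txn, sender, receiver)
    hops = 0
    while hops <= len(waits_for):             # bound the walk
        _, _, recipient = probe
        if recipient == initiator:
            return True                       # probe returned: deadlock
        nxt = waits_for.get(recipient)
        if nxt is None:
            return False                      # recipient is not waiting
        probe = (probe[0], recipient, nxt)    # update fields 2 and 3
        hops += 1
    return False
```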
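The two message rounds of the two-phase commit protocol can be sketched as below; the vote functions stand in for the agents' local READY/ABORT decisions, and failure handling is omitted.

```python
# Two-phase commit sketch: the coordinator collects READY/ABORT votes
# (phase 1) and then broadcasts the unanimous decision (phase 2).

def two_phase_commit(agents):
    """`agents` maps agent name -> vote function returning 'READY' or 'ABORT'.
    Returns (decision, acknowledgements)."""
    # Phase 1: send PREPARE, collect votes.
    votes = {name: vote() for name, vote in agents.items()}
    decision = "COMMIT" if all(v == "READY" for v in votes.values()) else "ABORT"
    # Phase 2: broadcast the decision; each agent acknowledges.
    acks = {name: "ACK" for name in agents}
    return decision, acks
```

A single ABORT vote in phase 1 forces the global decision to ABORT, which is exactly how atomicity across subtransactions is preserved.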
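The undo rule of the write-ahead log protocol — a transaction with no commit entry in the log was still active at the failure and must be undone — can be sketched as a single scan of the local log:

```python
# Recovery sketch from a local log: transactions that have a 'begin'
# entry but no terminating entry were active at the failure and are
# the ones that must be undone.

def transactions_to_undo(log):
    """`log` is a list of (entry, tid) pairs in write order."""
    begun, terminated = set(), set()
    for entry, tid in log:
        if entry == "begin":
            begun.add(tid)
        elif entry in ("commit", "abort"):
            terminated.add(tid)    # already finished: nothing left to undo
    return begun - terminated
```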
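The clustered data distribution described for the Teradata DBC/1012 can be sketched as below. The specific placement rule (primary AMP chosen by row hash; fallback on the next AMP within the same cluster) is an illustrative assumption — the real system uses its own hash maps — but it shows why one AMP failure per cluster never loses data.

```python
# Sketch of primary + fallback placement in AMP clusters: a row's
# fallback copy always lives on a different AMP in the SAME cluster,
# so losing one AMP per cluster never makes a row unavailable.

def place(row_id, clusters):
    """clusters: list of lists of AMP ids. Returns (primary, fallback)."""
    amps = [amp for cluster in clusters for amp in cluster]
    primary = amps[row_id % len(amps)]           # assumed hash: row id mod #AMPs
    for cluster in clusters:
        if primary in cluster:
            i = cluster.index(primary)
            fallback = cluster[(i + 1) % len(cluster)]   # assumed rule
            return primary, fallback
```

With clusters [[1, 2, 3, 4], [5, 6, 7, 8]], no row ever has both copies on AMPs 4 and 7, so those two AMPs can fail together without loss of availability — unlike the unclustered layout in the slides.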

