Data Bases ECS 289F
This 48 page Class Notes was uploaded by Ashleigh Dare on Tuesday September 8, 2015. The Class Notes belongs to ECS 289F at University of California - Davis taught by Bertram Ludaescher in Fall. Since its upload, it has received 62 views. For similar materials see /class/187773/ecs-289f-university-of-california-davis in Engineering Computer Science at University of California - Davis.
Date Created: 09/08/15
Dataflow vs Controlflow

Fuzzy distinction, yet useful for:
- specification (language, model)
- synthesis (scheduling, optimization)
- validation (simulation, formal verification)

Rough classification:
- control: don't know when data arrive (quick reaction); time of arrival often matters more than value
- data: data arrive in regular streams (samples); value matters most

B. Ludaescher, ECS289F-W05, Topics in Scientific Data Management

Dataflow vs Controlflow

Specification, synthesis & validation methods tend to emphasize:
- for control: event/reaction relation; response time (real-time scheduling for deadline satisfaction); priority among events and processes
- for data: functional dependency between input and output; memory/time efficiency (dataflow scheduling for efficient pipelining); all events and processes are equal

Process Networks

Communicating processes with directed flow:
- communication: token stream between two processes
- process: operations on tokens
- host language: process description
- coordination language: network description
[Figure: two processes connected by a channel carrying a token stream]

Kahn process networks (1974)

Special class of process networks:
- stream is FIFO with unbounded capacity
- process: destructive read (consumption) at process start; nondestructive write (production) at process end
- blocking read: process only executed if data available
- nonblocking write
[Figure: EXAMPLE of a FIFO token stream]

Kahn Process Networks: Formalism

- Sequence (a stream): $X = [x_1, x_2, \ldots]$
- Prefix ordering: $[x_1, x_2] \sqsubseteq [x_1, x_2, x_3]$
- Increasing chain of sequences: $\chi = \{X_0, X_1, \ldots\}$ where $X_0 \sqsubseteq X_1 \sqsubseteq \ldots$
- Least upper bound (lub) $\sqcup \chi$: $\sqcup \chi \sqsubseteq Y$ for any $Y$ where $X \sqsubseteq Y$ for all $X \in \chi$
- Continuous process: $F(\sqcup \chi) = \sqcup F(\chi)$

Kahn Process Networks: Formalism

- p-tuple of sequences (ordered set of sequences): $X = \{X_1, X_2, \ldots, X_p\} \in S^p$
- $X \sqsubseteq X'$ if $X_i \sqsubseteq X'_i$ for each $i$
- $\chi = \{X^0, X^1, \ldots\}$
- $F: S^p \rightarrow S^q$, where $S^p$ is the set of p-tuples of sequences
- functional process [Figure: FIFO inputs, process F, FIFO outputs]
- Continuous
process: $F(\sqcup \chi) = \sqcup F(\chi)$

Kahn Process Networks: Monotonicity

- Monotonicity: $X \sqsubseteq X' \Rightarrow F(X) \sqsubseteq F(X')$
- It can be proved that a continuous process is monotonic.
- Given a part of the input sequence, it may be possible to compute part of the output sequence.

Monotonic does not imply continuous. Consider $F: S \rightarrow S$,

  $F(X) = [0]$ if $X$ is a finite sequence, $F(X) = [0, 1]$ otherwise.   (4)

Only two outputs are possible, both finite sequences. To show that this is monotonic, note that if the sequence $X$ is infinite and $X \sqsubseteq X'$, then $X = X'$, so

  $Y = F(X) \sqsubseteq Y' = F(X')$.   (5)

If $X$ is finite, then $Y = F(X) = [0]$, which is a prefix of all possible outputs. To show that it is not continuous, consider the increasing chain

  $\chi = \{X_0, X_1, \ldots\}$, where $X_0 \sqsubseteq X_1 \sqsubseteq \ldots$   (6)

where each $X_i$ has exactly $i$ elements in it. Then $\sqcup \chi$ is infinite, so

  $F(\sqcup \chi) = [0, 1] \neq \sqcup F(\chi) = [0]$.   (7)

Iterative computation of this function is clearly problematic.

A useful property is that a network of monotonic processes itself defines a monotonic process. This property is valid even for process networks with feedback loops, as is formally proven using induction by Panangaden and Shanbhogue [78]. It should not be surprising, given the results so far, that one can formally show that networks of monotonic processes are determinate.

Nonmonotonic Processes

- Canonical nonmonotonic process: fair merge
- Fairness: every nonempty sequence is processed
[Figure: fair merge of two input streams x and y into one interleaved output stream]

Nonmonotonic Processes

In the previous example we have
  $([x_1, x_2], [y_1, y_2, y_3, \ldots]) \sqsubseteq ([x_1, x_2, x_3, \ldots], [y_1, y_2, y_3, \ldots])$
but $[x_1, y_1, x_2, y_2, x_3, y_3, \ldots]$ and $[x_1, y_1, x_2, y_2, y_3, \ldots]$ are incomparable.
- The process is not monotonic (it needs prediction of the future to be really fair).
- The process is not continuous.
- In fact, the process is not even a deterministic function.

Fair Merge

[Figure: inputs $ab$ and $c$ are merged into $abc$; inputs $a$ and $c$ are merged into $ac$ or $ca$]
Input $a$ is a prefix of $ab$ and $c$ is a prefix of $c$, but neither of
the possible outputs $ac$ nor $ca$ is a prefix of $abc$: $ac \not\sqsubseteq abc$ and $ca \not\sqsubseteq abc$.
Fair merge: interleave input streams $X_1$ and $X_2$ to produce output stream $Y$.

Least Fixed Point Semantics

- Let $X$ be the set of all sequences.
- A network is a mapping $F$ from the sequences to the sequences, where $I$ represents the input sequence: $X = F(X, I)$
- The behavior of the network is defined as the unique least fixed point (LFP) of the equation.
- If $F$ is continuous then the least fixed point exists: $\mathrm{LFP} = \mathrm{LUB}(\{F^n(\bot, I) : n \geq 0\})$

Description Logics

- Formerly known as terminological logics
- Idea: logic language for defining concepts in terms of other concepts, interrelating concepts, and constraining the meaning of concepts
- DL definition of "Happy Father" (example from Ian Horrocks, Ulrike Sattler):
  $\mathit{Man} \sqcap \exists \mathit{has\text{-}child}.\mathit{Blue} \sqcap \exists \mathit{has\text{-}child}.\mathit{Green} \sqcap \forall \mathit{has\text{-}child}.(\mathit{Happy} \sqcup \mathit{Rich})$

Example: Domain Knowledge to "glue" SYNAPSE & NCMIR Data

[Figure: domain map fragment with concepts such as Neuron and Spine; remaining labels illegible]

Source Contextualization through Ontology Refinement

In addition to registering ("hanging off") data relative to existing concepts, a source may also refine the mediator's domain map:
- sources can register new concepts at the mediator
- => increase your data usability
[Figure: refined domain map; labels illegible]

"Structured Inheritance Networks" (Brachman 1977)
KL-ONE (Brachman, Schmolze 1985)

Core ideas:
- Building blocks: atomic concepts (unary predicates), [...]; constructors for building complex concepts and roles from simpler ones
- Automated inference for concept subsumption and instance classification (is-a vs. instance-of [...])
- [third item and accompanying figure illegible]

Knowledge Base: DL-Style

Terminological knowledge (TBox):
- Concept definition (naming of concepts): $\mathit{SpinyNeuron} \equiv \mathit{Neuron} \sqcap \exists \mathit{has}.\mathit{Spine}$
- Axiom (constraining of concepts): $\mathit{Neuron} \sqsubseteq \exists \mathit{has}.\mathit{Compartment}$
- => a mediator's "glue knowledge source"
[Assertional part partly illegible]: the concrete instances/individuals of the concepts/classes that your sources export

Example TBox

[Figure: example TBox; legible labels: atomic concepts, base concepts, defined concepts, W, M1, M2]
- Concept definition: $A \equiv C$; axiom: $C \sqsubseteq D$, where $A$ is an atomic concept and $C$, $D$ are complex concept expressions

Example TBox

- Base concepts: Person, Female (occur on the RHS only)
- Defined concepts: [illegible]
- Base interpretation J: interprets base concepts only
- Extension I of J: same domain as J and agrees with J on the base concepts
- T is definitorial if every base interpretation has exactly one extension that is a model of T

Problem/Exercise

[TBox over F, W, M1, M2, M3; formulas illegible]
Let the interpretation $I(\mathit{Person}(x))$ be "x is a person". Similarly $I(\mathit{Female}(x))$ = "x is female".
Question: What do W, M1, etc. mean?

Back to Scientific Workflows

Actor-oriented programming in Kepler:
- Actors to do the "acting"
- Director to factor out the orchestration/control
- "Black Box" to factor out the "flight recording", e.g., by default leave a trace of certain ops in the black box
- "Blue Box" to factor out the Grid resource allocation and scheduling, e.g., schedule an abstract workflow without specifying:
  - on which host a job is run
  - which data transfer protocol is used between hosts
  - how jobs and data are shipped (SHA)

Shipping and Handling Algebra (SHA)

Logical view (plan): Y@C = F@A of X@B
(1) X@B to A; Y@A = F@A(X@A); Y@A to C
(2) F@A => B; Y@B = F@B(X@B);
Y@B to C
(3) X@B to C; F@A => C; Y@C = F@C(X@C)
Physical view: SHA plans (1) to (3)

An (oversimplified) Model of the Grid

- Hosts: {h1, h2, h3, ...}
- Data@Hosts: {d1@hi, d2@hj, ...}
- Functions@Hosts: {f1@hi, f2@hj, ...}
- f: X -> Y, g: Y -> Z
- Given: data/workflow
  - as a functional plan: Y = f(X), Z = g(Y)
  - as a logic plan: f(X, Y) AND g(Y, Z)
- Find: host assignment di -> hi, fj -> hj for all d, f, s.t. d3@h3 = f@h2(d1@h1) is a valid plan

Fair Merge Revisited

- Fairness: don't let any channel (assembly line) starve
- Determinism: the input sequence functionally determines the output sequence

Process Network w/ One Peek (PN+1P)

FND (Fair Non-Deterministic merge):
(1) L1 = X, L2 = Y => del(X), del(Y), out(X, Y)
(2) L1 = X, L2 empty => del(X), out(X)
(3) L1 empty, L2 = Y => del(Y), out(Y)
(4) L1 empty, L2 empty => (no action given)
- FND is non-deterministic since the arrival order (which is outside of the model) determines the output of merge([x1, x2], [y1, y2]).
- FND is fair since even if the sequences have different lengths, they are served on a first-come, first-served basis. In particular, even if one line/channel produces tokens much faster than the other, a bounded buffer is sufficient, provided the merge actor itself works fast enough.

Process Network w/ One Peek (PN+1P)

DUM (Deterministic Unfair Merge):
(1) L1 = X < L2 = Y => del(X), out(X)
(2) L1 = X > L2 = Y => del(Y), out(Y)
(3) L1 = X = L2 = Y => del(X), del(Y), out(X, Y)
Here we assume that the tokens X and Y have a built-in "serial number" (an integer), and we know that each channel produces increasing sequences of serial numbers (with unknown gaps, though). Note that case (3) can be avoided simply by making serial numbers unique.
- DUM is deterministic since for any two input sequences (including knowledge of each token's serial number) the output sequence is determined (increasing serial numbers).
- DUM is unfair since we must request one token each for the comparison of the merge step, which unfairly starves the longer sequence (say L2) while waiting for the never-appearing "post-ultimate"
token of the shorter sequence (say L1). That is, once we've consumed the last token of L1, we keep waiting indefinitely for a token that won't arrive.

Process Network w/ One Peek (PN+1P)

So it seems that in this setting we have a tradeoff between determinism and fairness.
NOTE: In practical settings, one will have a timeout associated with each channel, so that after not having seen a token for a sufficiently long time on some channel, say L1, it is "known" that no token will arrive on L1, thereby allowing the actor to process the tokens from L2.
- If the "timeout assumption" is valid (no token can arrive after the timeout), then this yields a fair and still deterministic merge actor.
- However, if the timeout assumption is violated (which even happens in practice), then the actor becomes nondeterministic again, since the merge actor can prematurely pick the "wrong" token (the one with the higher serial number), assuming nothing will arrive on the presumably dead channel.

Process Network without Peek (PN)

Question/Exercise: What happens to the previous cases if you cannot peek behind the "black wall", or similarly, if (as in the case of Ptolemy-PN) you always get "true" but are put to sleep while you are waiting for the requested token that's not yet (or ever) there?

Course Overview

1. Data Integration and:
   - structured (relational) databases
   - knowledge-based extensions (ontologies)
   - semistructured (XML) databases
2. Scientific Workflows:
   - Dataflow process networks
   - Web service workflows
   - The Kepler system
3. Student projects on 1 and 2
[Figure: course topics: Data Integration; Scientific Data & Workflow; Knowledge Engineering/Representation]

Perfect Recall: Database Systems (165A)

Database Management System (DBMS):
- A database consists of interrelated data, which are stored in files.
- Data can come from commercial or scientific applications and more. E.g., a scientific database might contain information about known biological,
chemical, astronomical entities, experiments, etc.
- A Database Management System is a collection of software [...]; it provides users and applications with an environment that is convenient and efficient to use.

Relational Database Model

Think of a relational DB as a number of tables, each having a particular schema:
  Course(Instructor, Name, Quarter, Department)
- The table/relation name (Course) identifies which table we are talking about.
- The attribute/column name (e.g., Instructor) corresponds to the column header.
- Elements (aka instances or tuples) of a table/relation can be written, e.g., as follows:
  Course("Gertz", "ECS165A", "W2005", "CS")
  Course("Ludaescher", "ECS289F", "W2005", "CS")

Example:
  Instructor | Name    | Quarter | Department
  Gertz      | ECS165A | W 2005  | CS
  Ludaescher | ECS289F | W 2005  | CS

The same in Datalog notation, as a set of facts:
  course("Ludaescher", "ECS289F", "W 2005", "CS").
  course(...).
Hmm, looks like a spreadsheet ... but there are differences. What are they?
[Figure: Data Integration / Mediator System; labels illegible]

Query Languages

- Databases can be queried: we state a question (usually in terms of the given database schema) about the stored data.
- Query languages such as Datalog and SQL (Structured Query Language) are declarative: just say what you're interested in. You do not need to give the details of how to retrieve the data, but can focus on what to retrieve.
- Question: What's the difference between keyword-based search and querying a database? (But watch out: "keyword search in databases".)

DATALOG

- A relational database is given as a set of facts:
  employee(john, 40600, toys).
  employee(mary, 65000, cs).
  dept(cs, mary).
- A Datalog program defines views by means of rules of the form "Head <- Body":
  boss(Emp, Mgr) <- employee(Emp, Salary, DeptNo), dept(DeptNo, Mgr).
  highpaid(Emp) <- employee(Emp, Salary, _), Salary > 50000.
- EDB: extensionally defined relations (facts): employee/3, dept/2
- IDB: intensionally (i.e., rule-) defined relations (views): boss/2, highpaid/1
- A query is a view with a distinguished answer relation:
  ans(Emp, Mgr) <- employee(Emp, Salary, DeptNo), dept(DeptNo, Mgr).
- Notation: lowercase relation names (employee/3, highpaid/1)
and constants (aka data): toys, 50000; UPPERCASE/Capitalized variables: Emp, X; "_" means don't-care values.

DATALOG: Examples of Relational Operations

Relational operations have concise representations. Examples:
  sel(X, Y) <- r(X, Y), X = a.        % SELECT some tuples from r(X, Y)
  proj(X) <- r(X, Y).                 % PROJECT on the first argument
  join(X, Y, Z) <- p(X, Y), q(Y, Z).  % JOIN p(..) and q(..)
  prod(X, Y) <- p(X), q(Y).           % PRODUCT of p(..) and q(..)
  intersect(X) <- p(X), q(X).         % INTERSECTION of p(..) and q(..)
  diff(X) <- p(X), not q(X).          % SET-DIFFERENCE
  union(X) <- p(X).                   % UNION
  union(X) <- q(X).
Rules have a "logical reading", i.e., rules are formulas:
  $\forall X\, (\mathit{diff}(X) \leftarrow p(X) \wedge \neg q(X))$
  $\forall X\, (\mathit{union}(X) \leftarrow p(X) \vee q(X))$

What is a Query?

- A query expression, e.g., in SQL or in Datalog, denotes a query, but we still don't know what a query is.
- A query is a generic mapping $f$ from instances of an input schema (EDB) to instances of an output schema (IDB): $f: \mathit{inst}(\mathrm{EDB}) \rightarrow \mathit{inst}(\mathrm{IDB})$
- Note: Different query expressions can denote the same query (mapping).

What is a Query?

- A query is a generic mapping $f$ from instances of an input schema (EDB) to instances of an output schema (IDB): $f: \mathit{inst}(\mathrm{EDB}) \rightarrow \mathit{inst}(\mathrm{IDB})$
- generic: invariant under renamings $\rho$, i.e., $f(\rho(I)) = \rho(f(I))$ for all database instances $I$ of the schema EDB
- Examples: Consider EDB = {p(X), emp(N, S)}. Which of the following are generic?
  - f_even = {T | the number of x such that p(x) is in DB I is even}
  - f_Jeff = {(N, S) | emp(N, S) in DB I, N = "Jeff"}

Problem

How can one evaluate DATALOG queries? That is, given a database instance (a set of facts), how can one obtain the answer to a given query (rule or set of rules)?

DATALOG: Fixpoint Semantics (Bottom-Up)

[Slide body illegible]

Example: Transitive Closure

[Slide body illegible]
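The bodies of the "Fixpoint Semantics (Bottom-Up)" and "Example: Transitive Closure" slides did not survive extraction, so the following is only a minimal sketch of the general idea, not a reconstruction of the slides. It assumes the usual two-rule program `tc(X, Y) <- e(X, Y).` and `tc(X, Y) <- e(X, Z), tc(Z, Y).`; the relation names `e` and `tc` and the function name `transitive_closure` are illustrative assumptions. Starting from no derived facts, the rules are applied repeatedly until nothing new is derived.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation (assumed rules, not taken from the slides):
         tc(X, Y) <- e(X, Y).
         tc(X, Y) <- e(X, Z), tc(Z, Y).
    """
    edges = set(edges)          # EDB relation e/2
    tc = set()                  # IDB relation tc/2, initially empty
    while True:
        # One application of both rules to the facts derived so far.
        step = edges | {(x, y)
                        for (x, z) in edges
                        for (z2, y) in tc
                        if z == z2}
        if step == tc:          # nothing new was derived: fixpoint reached
            return tc
        tc = step


example_edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(example_edges)
```

Because each round only adds facts and the set of possible pairs is finite, the loop terminates, including on cyclic edge relations.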
DATALOG: Minimal Model Semantics

[Slide body illegible]

Query Languages for Relational Databases

[Slide body illegible; legible fragments mention SQL]

Syntax of First-Order Logic (FO)

- Logical symbols: $\wedge$, $\vee$, $\neg$, $\rightarrow$, $\leftrightarrow$, $\forall$ (for all), $\exists$ (exists)
- Nonlogical symbols: a FO signature $\Sigma$ consists of
  - constant symbols: a, b, c, ...
  - function symbols: f, g, ...
  - predicate (relation) symbols: p, q, r, ...
  - function and predicate symbols have an associated arity; we can write, e.g., p/3, f/2 to denote the ternary predicate p and the function f with two arguments
- First-order variables Vars: x, y, ...
- Formation rules for terms $\mathit{Term}_\Sigma$:
  - constants and variables are terms
  - if $t_1, \ldots, t_k$ are terms and f is a k-ary function symbol, then $f(t_1, \ldots, t_k)$ is a term

Syntax of First-Order Logic (FO)

Formation rules for formulas $\mathit{Fml}_\Sigma$:
- if $t_1, \ldots, t_k$ are terms and p/k is a predicate symbol (of arity k), then $p(t_1, \ldots, t_k)$ is an atomic formula $\mathit{At}_\Sigma$ (short: atom); all variable occurrences in $p(t_1, \ldots, t_k)$ are free
- if F, G are formulas and x is a variable, then the following are formulas:
  - $F \wedge G$, $F \vee G$, $\neg F$, $F \rightarrow G$, $F \leftrightarrow G$
  - $\forall x\, F$ (for all x, F(x) is true)
  - $\exists x\, F$ (there exists x such that F(x) is true)
- the occurrences of a variable x within the scope of a quantifier are called bound occurrences

Examples

  $\forall x\, (\mathit{malePerson}(x) \rightarrow \mathit{person}(x))$
  malePerson(bill)
  child(marriage(bill, hillary), chelsea)
- Variable: x
- Constants (0-ary function symbols): bill/0, hillary/0, chelsea/0
- Function symbols: marriage/2
- Predicate symbols: malePerson/1, person/1, child/2
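The formation rules above can be mirrored by a small data structure. The sketch below is one possible encoding, not part of the slides: the class names `Var`, `Fn`, `Atom`, `Not`, `BinOp`, `Quant` and the function `free_vars` are hypothetical names chosen for illustration. It builds the three example formulas and computes their free variables, following the rule that occurrences within the scope of a quantifier are bound.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:                      # first-order variable, e.g. x
    name: str


@dataclass(frozen=True)
class Fn:                       # f(t1, ..., tk); constants are 0-ary
    symbol: str
    args: Tuple["Term", ...] = ()


Term = Union[Var, Fn]


@dataclass(frozen=True)
class Atom:                     # p(t1, ..., tk)
    pred: str
    args: Tuple[Term, ...]


@dataclass(frozen=True)
class Not:
    body: "Formula"


@dataclass(frozen=True)
class BinOp:                    # op is one of "and", "or", "->", "<->"
    op: str
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Quant:                    # q is "forall" or "exists"
    q: str
    var: Var
    body: "Formula"


Formula = Union[Atom, Not, BinOp, Quant]


def term_vars(t):
    """Variables occurring in a term."""
    if isinstance(t, Var):
        return {t.name}
    return set().union(*(term_vars(a) for a in t.args))


def free_vars(f):
    """Free variables of a formula; quantifiers bind their variable."""
    if isinstance(f, Atom):
        return set().union(*(term_vars(a) for a in f.args))
    if isinstance(f, Not):
        return free_vars(f.body)
    if isinstance(f, BinOp):
        return free_vars(f.left) | free_vars(f.right)
    if isinstance(f, Quant):
        return free_vars(f.body) - {f.var.name}
    raise TypeError(f"not a formula: {f!r}")


x = Var("x")
bill, hillary, chelsea = Fn("bill"), Fn("hillary"), Fn("chelsea")
rule = Quant("forall", x,
             BinOp("->", Atom("malePerson", (x,)), Atom("person", (x,))))
fact = Atom("malePerson", (bill,))
child = Atom("child", (Fn("marriage", (bill, hillary)), chelsea))
```

All three example formulas are sentences (no free variables), whereas the bare atom `malePerson(x)` has `x` free.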
Semantics of Predicate Logic

- Let D be a nonempty domain (aka universe of discourse). A structure is a pair $(I, D)$, with an interpretation $I$ that maps
  - each constant symbol c to an element $I(c) \in D$
  - each predicate symbol p/k to a k-ary relation $I(p) \subseteq D^k$
  - each function symbol f/k to a k-ary function $I(f): D^k \rightarrow D$
- Let $(I, D)$ be a structure and $\beta: \mathit{Vars} \rightarrow D$ a variable assignment. A valuation $\mathit{val}_{I,\beta}$ maps $\mathit{Term}_\Sigma$ to D and $\mathit{Fml}_\Sigma$ to {true, false}:
  - $\mathit{val}(x) = \beta(x)$ for $x \in \mathit{Vars}$
  - $\mathit{val}(f(t_1, \ldots, t_k)) = I(f)(\mathit{val}(t_1), \ldots, \mathit{val}(t_k))$ for $f(t_1, \ldots, t_k) \in \mathit{Term}_\Sigma$
  - $\mathit{val}(p(t_1, \ldots, t_k)) = I(p)(\mathit{val}(t_1), \ldots, \mathit{val}(t_k))$ for $p(t_1, \ldots, t_k) \in \mathit{At}_\Sigma$
  - $\mathit{val}(F \wedge G) = \mathit{val}(F)$ and $\mathit{val}(G)$ for $F, G \in \mathit{Fml}_\Sigma$
  - for $\mathit{Fml}_\Sigma$ over $\vee$, $\neg$, $\rightarrow$, $\leftrightarrow$, $\forall$, $\exists$ in the obvious way

Example

- Formula $F = \forall x\, (\mathit{malePerson}(x) \rightarrow \mathit{person}(x))$
- Domain D = {b, h, c, d, e}
- Let's pick an interpretation $I$: $I(\mathit{bill}) = b$, $I(\mathit{hillary}) = h$, $I(\mathit{chelsea}) = c$, $I(\mathit{person}) = \{b, h, c\}$, $I(\mathit{malePerson}) = \{b\}$
- Under this $I$, the formula F evaluates to true.
- If we choose $I'$ like $I$ but with $I'(\mathit{malePerson}) = \{b, d\}$, then F evaluates to false.
- Thus, $I$ is a model of F, while $I'$ is not: $I \models F$, $I' \not\models F$

FO Semantics (cont'd)

- F entails G (G is a logical consequence of F) if every model of F is also a model of G: $F \models G$
- F is consistent or satisfiable if it has at least one model.
- F is valid or a tautology if every interpretation of F is a model.

Proof Theory

Let $F_1, \ldots, F_k$, G be FO sentences (no free variables). Then the following are equivalent:
1. $F_1, \ldots, F_k \models G$
2. $F_1 \wedge \ldots \wedge F_k \rightarrow G$ is valid
3. $F_1 \wedge \ldots \wedge F_k \wedge \neg G$ is unsatisfiable (inconsistent)

Proof Theory

- A calculus is a formal proof system to establish $F_1, \ldots, F_k \models G$ via formal (syntactic) derivations $F_1, \ldots, F_k \vdash \ldots \vdash G$, where the $\vdash$ denotes allowed proof steps.
- Examples: Hilbert Calculus, Gentzen Calculus, Tableaux Calculus, Natural Deduction, Resolution
- First-order logic is semi-decidable: the set of valid sentences is recursively enumerable, but not recursive (decidable).
- Some inference engines: http://www.semanticweb.org/inference.html
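The malePerson/person example above can be checked mechanically when the domain is finite. The following is a minimal sketch under simplifying assumptions that are not in the slides: formulas are encoded as nested tuples, terms are restricted to variables and constants (no function symbols such as marriage/2), and the names `holds`, `I`, and `I_prime` are hypothetical. Quantifiers are evaluated by trying every domain element as the value of the bound variable.

```python
def holds(formula, interp, domain, env=None):
    """Truth value of a formula in a finite structure (interp, domain).

    env plays the role of the variable assignment beta.
    """
    env = env or {}
    op = formula[0]
    if op == "atom":
        _, pred, args = formula
        # A term is either a variable (looked up in env) or a constant.
        values = tuple(env[a] if a in env else interp["const"][a] for a in args)
        return values in interp["pred"][pred]
    if op == "not":
        return not holds(formula[1], interp, domain, env)
    if op == "and":
        return holds(formula[1], interp, domain, env) and holds(formula[2], interp, domain, env)
    if op == "or":
        return holds(formula[1], interp, domain, env) or holds(formula[2], interp, domain, env)
    if op == "implies":
        return (not holds(formula[1], interp, domain, env)) or holds(formula[2], interp, domain, env)
    if op in ("forall", "exists"):
        _, var, body = formula
        results = (holds(body, interp, domain, {**env, var: d}) for d in domain)
        return all(results) if op == "forall" else any(results)
    raise ValueError(f"unknown connective: {op}")


D = {"b", "h", "c", "d", "e"}
I = {
    "const": {"bill": "b", "hillary": "h", "chelsea": "c"},
    "pred": {"person": {("b",), ("h",), ("c",)}, "malePerson": {("b",)}},
}
# I' agrees with I except that malePerson also contains d.
I_prime = {
    "const": I["const"],
    "pred": {"person": I["pred"]["person"], "malePerson": {("b",), ("d",)}},
}
F = ("forall", "x", ("implies", ("atom", "malePerson", ("x",)),
                     ("atom", "person", ("x",))))
```

Here `holds(F, I, D)` is true and `holds(F, I_prime, D)` is false, because d is a malePerson but not a person under `I_prime`. This brute-force check only works for finite domains; it says nothing about validity in general, which is only semi-decidable.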
Semantic Tableaux Rules

- $\alpha$-rule (for $F = A \wedge B$): from $\alpha$ derive $\alpha_1$ and $\alpha_2$ on the same branch
- $\beta$-rule (for $F = A \vee B$): from $\beta$ branch into $\beta_1$ | $\beta_2$
- $\gamma$-rule (for $F = \forall x\, A(x)$): from $\gamma$ derive $\gamma(t)$, i.e., substitute a $\forall$-variable x with an arbitrary term $t \in \mathit{Term}_\Sigma$
- $\delta$-rule (for $F = \exists x\, A(x)$): from $\delta$ derive $\delta(c)$ with c new, i.e., substitute an $\exists$-variable x with a new constant c
- A branch is closed if it contains complementary formulas.
- A tableau is closed if every branch is closed.

FO Tableaux Calculus

Theorem (Soundness & Completeness of the Tableaux calculus): Let $A_1, \ldots, A_k$ and Th be first-order logic sentences. (Recall: a sentence is a closed formula, i.e., has no free variables.) Then the following are equivalent:
1. $A_1, \ldots, A_k \models \mathit{Th}$
2. $\{A_1, \ldots, A_k, \neg \mathit{Th}\}$ is unsatisfiable (inconsistent)
3. There is a closed tableau for $\{A_1, \ldots, A_k, \neg \mathit{Th}\}$

Example

Given:
- A1: for all x: M(x) -> P(x)
- A2: for all x: P(x) -> exists y: c(x, y) and H(y)
Show:
- Th: for all x: M(x) -> exists y: c(x, y) and H(y)
Proof by contradiction: show that {A1, A2, not Th} is unsatisfiable.

Back to Ontologies: What is a Conceptualization?

- Conceptualization (Genesereth [...]): universe of discourse (domain); relations on/2, above/2, [...]
- Compare (A) and (B):
  - world A: on(a, b), on(b, c), on(d, e), [...]
  - world B: on(a, b), on(c, d), on(d, e), table(b), table(e), [...]
- Two different conceptualizations, or rather two different states of the same conceptualization?

Intensional Structures

- Meaning is not in a single state of affairs (extensional relations) but can be captured by intensional relations.
- An n-ary intensional relation R over domain D is a function $R: W \rightarrow \mathit{Powerset}(D^n)$
  - W = set of possible worlds {w1, w2, w3, ...}; a possible world is one state of affairs or a situation
  - $\mathit{Powerset}(D^n)$ = set of all subsets of $D^n = D \times \ldots \times D$
- So for each $w \in W$ we have R(w), a subset of $D^n$, i.e., with each world we associate the interpretation of R in that world.

Example

- Syntax: signature (vocabulary) $\Sigma$
  - constant symbols: a, b, ...
  - relation symbols: on/2, table/1
- Semantics: domain D = {ablock, bblock, ...}
- Structure (I, D) with
some interpretation I: I(a) = ablock, I(b) = bblock, ...; I(on) = {(ablock, bblock), ...}; I(table) = {...}

How can we capture (some of) the meaning of "on-ness"?

- Many things can be said about on-ness (physics of gravity, pressure, deformation, etc.).
- What is common among all possible states of on/2 over a certain domain D? That is, if we look at all possible worlds W and the values that I(on)(w) can take, what is common among all those states?
- What is always true (in all possible worlds) about on/2 is part of the meaning of on/2:
  - $\Box\, \forall x\, \neg \mathit{on}(x, x)$ (in all possible worlds, x is not on x)
  - $\Box\, \forall x, y\, \neg(\mathit{on}(x, y) \wedge \mathit{on}(y, x))$ (in all possible worlds, no x is on y while y is on x)
- Good enough? What about on(a, b), on(b, c), on(c, a)?
- Even worse: what if someone sees "on" and understands/interprets it as "below"? We only capture some aspects using the above ontological theory.

Where we are so far

- Intro to Data Integration: Datalog, mediators; more to come (your projects): schema matching, simple query rewriting
- Intro to Knowledge Representation & Ontologies: description logic, first-order logic; more to come: FCA, biological pathways & ontologies
- Intro to Scientific Workflows and Kepler; more to come: lectures and your projects
- Today: Linking Scientific Workflows (Process Integration) and Semantics: support for
  - workflow modeling and design
  - data discovery
  - service/actor discovery
  - data <-> actor binding
  - actor <-> actor binding

PIW Workflow

[Figure: Kepler screenshot. The Promoter Identification Workflow (PIW) aims at constructing models of transcription factor binding sites to identify co-regulated genes, starting from microarray data. Legible labels include gene accession numbers, ClustalW, and a results display.]

A Simplified Scientific Workflow Model

A SWF consists of:
- a set A = {a1, a2, ..., an} of actors
  (NOTE: [terminology note illegible])
- each actor a can have a number of named ports p1, ..., pk
- a port is either an input port or an output port
- a set C = {c1, c2, ..., cm} of channels, i.e., directed edges linking output ports to input ports
  (NOTE: more will be said about composite actors and about how these actors are executed: models of computation and directors [note partly illegible])

A Simplified Scientific Workflow Model

We can capture the topological structure of a SWF in different forms; simple ones are as relations channel/4 or even channel/2:
- channel(from_actor, out_port, to_actor, in_port):
  channel(a1, p1, a2, p1), channel(a1, p1, a2, p2)
- channel(from_actor.out_port, to_actor.in_port):
  channel(a1.p1, a2.p1), channel(a1.p1, a2.p2)
  plus a table associating ports with actors

Port Annotations & Types

- Port name (for identification purposes)
- Port I/O type (to say whether tokens flow in or out of the port)
- Structural type (to describe the data/object structure of the port); can be captured via:
  - a programming language data type: array[0..n] of record of {a: int, b: float}
  - an XML Schema
  - a relational schema: employee_record(SSN, Name, DeptNo)
- Storage type (to describe details of the physical representation): int16
- Semantic type (to describe the semantics of the tokens being transferred on channels): a concept name, concept expressions, or a more general constraint

Semantic Types and Constraints

- The goal of the semantic type is to provide a link between the structural type and concept expressions from an ontology.
- In its general form, a semantic type is given as a semantic annotation over two schemas:
  - the conventional (structural) schema S
  - the semantic schema (ontology) O
- A semantic annotation is denoted as a logic formula (constraint) $\alpha$, involving symbols from the structural schema S and the ontology schema O.
- Here we focus on $\alpha$
having the form $\alpha: S \rightarrow O$ ("virtually populate ontology structure O with instances of S").

A Semantic Annotation $\alpha: S \rightarrow O$

  No. | Schema element (structural type S) | Ontology (semantic type O)
  1   | x : biom                           | x : OBSERVATION
  2   | x : biom, yr = y                   | x : TEMPORALCONTEXT, y : YEAR
  3   | x : biom, seas = 'W'               | x : TEMPORALCONTEXT, y : WINTER
  4   | x : biom, seas = 'S'               | x : TEMPORALCONTEXT, y : SPRING
  5   | x : biom, seas = 'F'               | x : TEMPORALCONTEXT, y : FALL
  6   | x : biom, plt = y                  | x : SPATIALCONTEXT, y : PLOT
  7   | x : biom, qd = y                   | x : SPATIALCONTEXT, y : QUADRAT
  8   | x : biom, spp = y                  | x : ITEM, y : SPECIES
  9   | x : biom, bm = y                   | x : PROPERTY, y : BIOMASS (name partly illegible)

Propagating Semantic Annotations

[Figure: an actor with input schema S and output schema S', query annotation q, and semantic annotations $\alpha$ and $\alpha'$ into the ontology O]
Given:
- structural schemas S (input) and S' (output), and an ontology O
- a semantic annotation $\alpha: S \rightarrow O$
- a query annotation $q: S \rightarrow S'$
Problem: compute $\alpha': S' \rightarrow O$

Applications

- WF design time: actor <-> actor connections
- Data binding time: actor <-> data binding
- WF runtime: semantic tagging of automatically derived data products

Example 1 (Ontology Augmentation). [Text heavily garbled; the recoverable gist follows.] The developer of an actor has provided a semantic annotation for the actor's input (explicit typing). At workflow design time, a workflow designer connects the output of another actor to that input. The semantic annotation for the output of the other actor has been derived via propagation, indicating that the output is [...] (i.e., the measurement sum resulting from a corresponding aggregate operation). In order for the connection to be semantically type correct, we must have that [...] is compatible with [...]. The system can choose to rule out the connection, or it can augment the given ontology O [...], e.g., in an interactive mode, asking the user to
determine whether the inferred annotation is correct and can be included in O, or whether there is something wrong with the connection.

Ontology O

[Figure 3: A simplified measurement ontology with ecological concepts; labels illegible]

Scientific Workflow w/ Query Annotations q

[Figure 2: A scientific workflow with query annotations q for computing species richness and productivity [14]; see Figure 4 for semantic annotations $\alpha$. Legible labels include biom(yr, seas, plt, qd, spp, bm), seasonal community, and productivity.]

Annotation Constraint $\alpha: S \rightarrow O$

  $\alpha = \forall \bar{x}\, (\alpha_S(\bar{x}) \wedge \alpha_C(\bar{x}) \rightarrow \exists \bar{y}\, \alpha_O(\bar{x}, \bar{y}))$
- $\alpha_S$ links the variables $\bar{x}$ to schema elements of S
- $\alpha_C$ is a conjunction of comparisons over $\bar{x}$ and constants
- $\alpha_O$ populates the ontology structure O
- Example: X : biom, seas = S, S = w => X : observation, temporalContext = S : WinterSeason

Semantic Annotations $\alpha$

We consider query annotations q and semantic annotations $\alpha$ of the form [formulas illegible]. Here $P_{S'}$ is a logic atom over the output schema S', and [...] is a query over the input schema(s) S. Similarly, in a semantic annotation $\alpha$ we relate instances of a schema S with those of an ontology via subformulas [...]. The idea is as follows: We would like to relate instances of the output schema S' [...] in q that
imply certain instances of $\alpha_S$ in $\alpha$ to be true. For these, we then have established the desired relation between S' and O [...].

More precisely, consider q as a constraint of the form $\varphi_q = P_{S'} \rightarrow Q_1 \wedge \ldots \wedge Q_k$ and $\alpha$ of the form $\varphi_\alpha = A_1 \wedge \ldots \wedge A_n \rightarrow \alpha_O$. Here, $P_{S'}$ is a logic atom over the output schema S'. Similarly, the $Q_i$ and $A_j$ are logic atoms over the input schemas S, [...] first-order formulas. Note that we assume that all $\exists$-quantified variables [...] have been eliminated through "Skolemization", so that q and $\alpha$ can be seen as implicitly $\forall$-quantified formulas with variables $\bar{u}$ and $\bar{x}$, respectively. In particular, we can assume that the variables $\bar{u}$ in $\varphi_q$ are disjoint from the variables $\bar{x}$ in $\varphi_\alpha$.

Observe that q can be written as a conjunction $q_1 \wedge \ldots \wedge q_k$ with $q_i = P_{S'} \rightarrow Q_i$. Now assume that there is a substitution $\sigma$ that unifies some atoms $Q_i$ and $A_m$, i.e., $Q_i^\sigma = A_m^\sigma$. Then we can infer from $q_i = P_{S'} \rightarrow Q_i$ and $\alpha$ a new annotation constraint $\alpha'$ of the form $P_{S'}^\sigma \wedge A_{-m}^\sigma \rightarrow \alpha_O^\sigma$, where $A_{-m}$ is the conjunction $A_1 \wedge \ldots \wedge A_n$ with $A_m$ removed. It is easy to show that the annotation $\alpha'$ is implied by q and $\alpha$. In this way, by successively "resolving away" atoms $A_m$ from $\alpha$ with matching atoms $Q_i$ from q, we can obtain new semantic annotations $\alpha'$ that relate elements of the output schema S' to those in the ontology O.

Example: Biodiversity Workflow

Example [number illegible]. Consider the following query annotation, expressed as a skolemized first-order formula, for the [...] actor in Figure 2:
  (1) [formula illegible]
and semantic annotation 9 in Figure 4, expressed as the first-order formula
  (2) [formula illegible; it annotates the bm attribute of biom as PROPERTY BIOMASS]
We can resolve (1) with (2) using the substitution $\sigma$ = [illegible], which results in the formula
  (3) [formula illegible]
Observe that we now have bm values for the
output selteruubioml semantic lly annotated as Brot tlu ouvlt llrc I ItOl hRH role to corresponding triples We ttlsoger some additional inl39unnntirin which cant he viewed its con ts uverlhe input schema These mlrliriunul constraints can be ignored bringing 3 into our stunrlanl l39orrn for semantic annotations which results in t41rbiom1lyrr seastplt1 qdq spppbmh atirzorrrxn39zh rosinsst We can t39unltcrsimplit39y H by dropping the attribute rrn39zrhlts not used elsewhere in the annotation giving t5 abiomlbml2 a niPRor39ErtTv Btoiuitssl Et Ludaescner ECSZEBFVWDE Tunic in Scientific Data Management Scientific Workflows ActorOriented Modeling amp Design Language Issues Different levels of SWF granularity u Hierarchical View Functional View plumbing level connecting to remote resources movmg Nested Work ows IDatarow View data around launching local or remote applications monitoring amp restarting jobs etc intermediate levels involving some database queries data transformations data analysis amp visualization designlevel conceptual diagrams What languages programming metaphors programming constructs execution models are r smemummi 9i Wu adequate for the different levels 1 5 quotquotquot quotquotquot39 39 I I Subsyitemiieveli Elf Output How to combine different aspects eg dataflow I Subsystemtlevelij controlflow etc System Abstractions Source Workflow based Process Controlling Michael zur Muehlen 2003 Actororiented Design amp Data ow ObjectOriented vs ActorOriented Interfaces Object Oriented ActorDataflow Oriented Object orientation TextTo Speech Ten 1 39mm class name initializeo void tear in marn m d ata 1335 Obiole an 1 methods getSpeechO double limite feee gives lonecidng titer have to AC0 miierfeea k m t m are ca relurn time d I an FGEIP mi rm and Eli gave you agarad t er p541 921 fig mam W mag mmgr W quotchir mm m Source Edward Lee et a httpptolemyeecsberkeleyedu ActorDataflow orientation actor name data state I 1 d 1 parameters 0 n u a a ut ut ata p ports p 
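To make the interface contrast concrete, here is a minimal sketch in Python. It is not Ptolemy II code: the class names, the `fire` method, the use of queues as ports, and the stand-in "speech" samples (character codes as floats) are all assumptions made for illustration. Only the method names initialize and getSpeech come from the slide, and the text argument to `initialize` is added here.

```python
from collections import deque


def fake_speech(text):
    """Stand-in for speech synthesis: one float sample per character (assumption)."""
    return [float(ord(ch)) for ch in text]


class TextToSpeechObject:
    """Object-oriented style: the caller drives control by invoking methods in order."""

    def __init__(self):
        self._text = None

    def initialize(self, text):
        self._text = text

    def get_speech(self):
        if self._text is None:
            raise RuntimeError("initialize() must be called before get_speech()")
        return fake_speech(self._text)


class TextToSpeechActor:
    """Actor style: tokens arrive on an input port and leave on an output port."""

    def __init__(self):
        self.text_in = deque()      # input port (hypothetical name)
        self.speech_out = deque()   # output port (hypothetical name)

    def fire(self):
        # Consume every available input token and produce one output token each.
        while self.text_in:
            self.speech_out.append(fake_speech(self.text_in.popleft()))


obj = TextToSpeechObject()
obj.initialize("hi")
object_result = obj.get_speech()

actor = TextToSpeechActor()
actor.text_in.append("hi")
actor.fire()
actor_result = list(actor.speech_out)
```

In the object version the required call order is a hidden contract of the interface; in the actor version the only contract is what flows in and what flows out.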
Direction of flow is implied by I/O port type.
Source: Edward Lee et al., http://ptolemy.eecs.berkeley.edu

Flow-based Design Patterns
Connection tokens ("power line"): JDBC connection tokens, SRB connection tokens, GSI proxies (certificate tokens).
Exercise: Design a WF that uses a "connection pool".
Generality vs. specialization of actors and data transformation steps.

More Flow-based Design Patterns
Stage-execute-fetch pattern (Grid WFs).
Loops: higher-order functions (map, foldr, ...); cf. Taverna's automatic loop insertion based on data types.
State-changing pipeline. [Diagram illegible in this copy.]

Why Ptolemy II (and thus KEPLER)?
Ptolemy II objective: "The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation."
Process networks with natural support for abstraction, streaming, actor-orientation, and actor and workflow reuse.
Component (actor) interaction semantics is not hard-wired inside components but factored out in a director:
  component interaction semantics is not emergent (i.e., an accident of the particular components working together);
  different directors for different modeling and execution needs can even be combined to some extent;
  better abstraction, modeling, and component reuse.
User orientation: workflow design & execution console (Vergil GUI).
"Application/glue-ware": excellent modeling and design support; run-time support, monitoring, ...; not a middle-/underware (we use someone else's, e.g., Globus, SRB, ...), but middle-/underware is conveniently accessible through actors.
Pragmatics: Ptolemy II is a mature, continuously extended & improved, well-documented open source system; Ptolemy II folks actively participate in KEPLER.

Behavioral Polymorphism in Ptolemy
Polymorphic methods implement the communication semantics of a domain in Ptolemy II. The receiver instance used in communication is supplied by the director, not by the component.
[Figure: Receiver interface with methods including hasRoom(): boolean and hasToken(): boolean.]

Domains and Directors: Semantics for Component Interaction
CI: push/pull component interaction
CSP: concurrent threads with rendezvous
CT: continuous-time modeling
DE: discrete-event systems
DDE: distributed discrete events
FSM: finite state machines
DT: discrete time (cycle driven)
Giotto: synchronous periodic
GR: 2-D and 3-D graphics
PN: process networks
SDF: synchronous dataflow
SR: synchronous/reactive
TM: timed multitasking
Source: Edward Lee et al., http://ptolemy.eecs.berkeley.edu

Polymorphic Actor Components Working Across Data Types and Domains
Actor data polymorphism:
  add numbers (int, float, double, Complex)
  add strings (concatenation)
  add complex types (arrays, records, matrices)
  add user-defined types
Actor behavioral polymorphism:
  In dataflow, add when all connected inputs have data.
  In a time-triggered model, add when the clock ticks.
  In discrete-event, add when any connected input has data, and add in zero time.
  In process networks, execute an infinite loop in a thread that blocks when reading empty inputs.
  In CSP, execute an infinite loop that performs rendezvous.
  In push/pull, ports are push or pull (declared or inferred) and behave accordingly.
  In real-time CORBA, priorities are associated with ports and a dispatcher determines when to add.
Source: Edward Lee et al., http://ptolemy.eecs.berkeley.edu

Component Composition & Interaction

Everything Flows ... But what exactly?
Dataflow:
  Data flows through operations (zoom into your CPU!).
  Activity diagrams: data flows through actions.
  Process networks: data flows between processes.
Controlflow: nodes are controlflow operations that start other operations on a state.
Mixed approaches:
  Statecharts: events trigger state transitions.
  Petri nets: tokens mark control and dataflow.
  Workflow languages: mix control and dataflow.
  ... many others.

Building Applications by Composition
Connect "uses" ports to "provides" ports.
Source: GRIST/SC4DEVO workshop, July 2004, Caltech.

A Closer Look at Dataflow, or: Do you know what's going on under your carpet?
Dataflow: what you see is what you get (almost).
Need a general, systematic way to handle references.

Data/Control-Flow Spectrum (from data flow to message passing and control flow)
"Clean": data tokens flow; almost no other side effects; WYSIWYG (usually).
References flow: special tokens flow; the token reference type may be http-get, ftp-get, hsi put, ...; generic handling is still possible.
Application-specific tokens flow: e.g., current Nimrod job management in Resurgence; invisible contract between components; the director is unaware of what's going on (sounds familiar?).
Specific message passing protocols: e.g., CSP, MPI; for systems of tightly coupled components.

A Scientific Workflow Problem: More Solved (Computer Scientist's view)
Solution based on a declarative, functional dataflow process network (also a data streaming model).
Higher-order constructs such as map(f):
  no control-flow spaghetti
  data-intensive apps
  free concurrent execution
  free type checking
  automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]
Rewritings & optimizations, e.g., map(f o g) = map(f) o map(g).
[Figure labels: map(f)-style iterators; powerful type checking; generic, declarative "programming" constructs; generic data transformation actors; forward-only, abstractable sub-workflow piw(GeneId).]

A Scientific Workflow Problem: Even More Solved (domain & CS coming together)
[Screenshot: workflow using a map over a GenBank web service. Input: NM_001924, NM_020375. Output: the corresponding sequences (beginning CAGTAATATGAC and GGGGACAAAGA in the screenshot).]

More Research
Clean functional semantics facilitates algebraic workflow (program) transformations, e.g., map(f o g) = map(f) o map(g).
Source: Real-Time Signal Processing: Dataflow, Visual, and Functional Programming, John Reekie, University of Technology, Sydney.

Scientific Workflows: Some Findings
Very different granularities: from high-level design to lowest-level plumbing.
More dataflow than (business) control-/workflow: DiscoveryNet, Kepler, SCIRun, Scitegic, Triana, Taverna, ...
Need for "programming extensions": iterations over lists (foreach), filtering, functional composition, generic & higher-order operations (zip, map(f), ...).
Need for abstraction and nested workflows.
Need for data transformations (WS1 to DT to WS2).
Need for rich user interaction & workflow steering: pause / revise / resume; select & branch; e.g., web browser capability at specific steps as part of a coordinated SWF.
Need for high-throughput data transfers and CPU cycles: (Data-)Grid-enabling, streaming.
Need for ... of intermediate products and ...

Scientific Workflows vs. Business Workflows
Business workflows (BPEL4WS*):
  Task orientation: travel reservations, credit approval, BPM, ...
  Tasks, documents, etc. undergo modifications (e.g., a flight reservation goes from reserved to ticketed), but modified WF objects are still identifiable throughout.
  Complex control flow, complex process composition (danger of control-flow/dataflow "spaghetti").
  Dataflow and controlflow are often divorced.
Scientific workflows:
  Dataflow and data transformations.
  Data problems: volume, complexity, heterogeneity.
  Grid aspects: distributed computation, distributed data.
  User interactions / WF steering.
  Data, tool, and analysis integration.
  Dataflow and controlflow are often married (which can be a happy marriage at times).
* Business Process Execution Language for Web Services, in case you wondered.

Outline
Last time: the Cal language for declaratively specifying actors:
  actor A (k) Input1, Input2 ==> Output:
    action [a], [b] ==> [k*(a + b)] end
  end
Today: the Ptolemy expression language, e.g., mapping a function over records with fields GeneID (string), Start, End, QueryStart, and QueryEnd (int), and using merge to add fields such as AccessionNumber.

Ptolemy II Expression Language: can be very useful for data plumbing
[Screenshot: Kepler showing the Promoter Identification Workflow (PIW). Annotation in the workflow: "Promoter identification workflow (PIW) aims at constructing models of transcription factor binding sites to identify co-regulated genes, starting from microarray data. Right click and Configure to modify the gene Accession Numbers (in quotes, separated by commas) to be investigated."]
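The rewriting rule map(f o g) = map(f) o map(g), cited on the slides above as an example of an algebraic workflow optimization, can be checked on a small example. The sketch below is plain Python, not Kepler or Ptolemy code; `map_actor`, `strip_version`, `to_upper`, and the version-suffixed gene identifiers are hypothetical names and data chosen only for illustration.

```python
def compose(f, g):
    """Function composition: (f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))


def map_actor(f):
    """A map(f)-style iterator: applies f to every token of a list."""
    return lambda tokens: [f(t) for t in tokens]


def strip_version(accession):
    """Hypothetical step g: drop a trailing '.N' version suffix."""
    return accession.split(".")[0]


def to_upper(accession):
    """Hypothetical step f: normalize to upper case."""
    return accession.upper()


gene_ids = ["nm_001924.2", "nm_020375.1"]

# Two pipeline stages, each iterating over the whole list: map(f) o map(g).
two_stage = compose(map_actor(to_upper), map_actor(strip_version))(gene_ids)

# One fused stage that iterates once: map(f o g).
fused = map_actor(compose(to_upper, strip_version))(gene_ids)
```

Both pipelines produce the same list, so an optimizer is free to pick the fused form (one pass) or the staged form (which pipelines naturally), provided f and g have no side effects.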
[Screenshot credits, partly legible: Coleman (Lawrence Livermore National Laboratory); SPA project; Altintas.]

PIW Example: ClustalW Results Display
PIW Example: Gene Sequence Processing (Blast)
[Screenshot: PIW sub-workflow built from Expression, SampleDelay, and Blast web service actors. Legible expressions include a file name built from CWD, readFile(filename), a split of the input on a delimiter, and array.length().]

Expression Evaluator: Example Expressions
  >> {a=1, b=2}
  {a=1, b=2}
  >> {a=1, b=2}.get("a")
  1
  >> input
  The ID input is undefined.
  >> sin(pi/2)
  1.0
  >> [1:2:10]
  [1, 3, 5, 7, 9]
  >> sin(pi/10*[1:2:10])
  [0.3090169943749, 0.8090169943749, 1.0, 0.8090169943749, 0.3090169943749]
constants: pi, e, true, i, ...
literals: 2.0, 2, 2+3i, "a string", ...
variables: x
1 == 1.0 (value comparison) gives true; 1.equals(1.0) (also type comparison) gives false
conditional: BooleanExpr ? TrueExpr : FalseExpr

Use of Expressions
in actor parameters (beware of string parameters, though)
in port parameters: a parameter that is also a port; it provides a default value, which can be overridden with the value provided at the port
[Screenshot: adding a PortParameter to a composite actor and renaming it to noiseLevel.]
FIGURE 3.2. A PortParameter is both a port and a parameter. To use it in a composite actor, drag it into the actor, change its name to something meaningful, and set its default value.
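The port-parameter idea (a default value that a token arriving at the port overrides) can be sketched in a few lines of Python. This is an illustration only, not the Ptolemy II PortParameter class: the method names `receive` and `value`, the default of 0.1, and the choice that an override persists until the next token arrives are assumptions.

```python
class PortParameter:
    """A parameter that is also a port: default value, overridable by a token."""

    def __init__(self, name, default):
        self.name = name
        self.default = default
        self._token = None  # no token has arrived at the port yet

    def receive(self, token):
        """A token arriving at the port overrides the default (assumed to persist)."""
        self._token = token

    def value(self):
        """Current value: the last received token, or the default if none arrived."""
        return self.default if self._token is None else self._token


noise_level = PortParameter("noiseLevel", 0.1)  # 0.1 is a made-up default
before = noise_level.value()   # default is used while the port is unconnected
noise_level.receive(0.5)       # an upstream actor sends a token
after = noise_level.value()    # the port value now wins
```

This is why a port parameter is convenient in a composite actor: the actor works out of the box with its default, yet another part of the workflow can steer it at run time.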
[Screenshot: "Edit parameters" dialogs for the Sinewave actor (samplingFrequency, frequency, phase) and for the Ramp actor inside it (init: phase; step: frequency*2*PI/samplingFrequency).]
FIGURE 3.3. Sinewave actor, showing its port parameters and their use at the lower level of the hierarchy.

[Screenshot: "Configure ports" and "Edit parameters" dialogs for the Expression actor.]
FIGURE 3.5. Illustration of the Expression actor. The Expression actor by default has one output and no inputs (a). The first step in using it is to add ports, as shown in (b) and (c), resulting in a new icon as shown in (d). In (c), when you click on Add, you will be prompted for a Name (pick one) and a Class. Leave the Class entry blank and click OK. You then specify an expression using the port names, as shown in (e), resulting in the icon shown in (f).

Composite Data Types: Arrays and Matrices
Arrays are ordered sets of tokens, e.g., {1, 2.3} or {{1, 2}, {3, 4, 5}}.
Element access:
  >> {1.0, 2.3}(1)
  2.3
Matrices are multidimensional arrays, but for some (mostly numeric) types only.
Array and matrix operations are available, e.g., matrix multiplication, multiplication with a scalar, etc.

Composite Data Types: Records
Records are like tuples with named attributes, e.g., {a=1, b="foo"}; note that the type is Integer x String.
Parts are accessed as expected: {a=1, b="foo"}.a() or just {a=1, b="foo"}.a yields 1.
Operators can be applied as well:
  >> {foodCost=40, hotelCost=100} + {foodCost=20, taxiCost=20}
  {foodCost=60}
This works like an intersection.
intersect:
  >> intersect({a=1, c=2}, {a=3, b=4})
  {a=1}
This is really: intersect on the attributes and pick the values from the first record.
merge:
  >> merge({a=1, b=2}, {a=3, c=3})
  {a=1, b=2, c=3}
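The three record operations above can be mimicked with Python dictionaries to make their semantics explicit. This is a sketch that reproduces only the behavior shown in the examples; the function names `record_add`, `intersect`, and `merge` are chosen here, and the real Ptolemy II implementation may differ in details such as type handling.

```python
def record_add(r1, r2):
    """'+' on records: add values field by field, keeping only the common fields."""
    return {k: r1[k] + r2[k] for k in r1 if k in r2}


def intersect(r1, r2):
    """Keep the fields present in both records, with values from the first record."""
    return {k: v for k, v in r1.items() if k in r2}


def merge(r1, r2):
    """Union of the fields; on a conflict the value from the first record wins."""
    merged = dict(r2)
    merged.update(r1)
    return merged


costs = record_add({"foodCost": 40, "hotelCost": 100},
                   {"foodCost": 20, "taxiCost": 20})
common = intersect({"a": 1, "c": 2}, {"a": 3, "b": 4})
combined = merge({"a": 1, "b": 2}, {"a": 3, "c": 3})
```

Note the asymmetry: both intersect and merge resolve conflicts in favor of the first argument, so swapping the arguments can change the result.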