DB Sys Concepts& Design
DB Sys Concepts& Design CS 6400
Popular in Course
verified elite notetaker
Popular in ComputerScienence
This 0 page Class Notes was uploaded by Alayna Veum on Monday November 2, 2015. The Class Notes belongs to CS 6400 at Georgia Institute of Technology - Main Campus taught by Shamkant Navathe in Fall. Since its upload, it has received 7 views. For similar materials see /class/234049/cs-6400-georgia-institute-of-technology-main-campus in ComputerScienence at Georgia Institute of Technology - Main Campus.
Reviews for DB Sys Concepts& Design
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 11/02/15
File Organizations and Indexing NOTES VI File Organizations and Indexing Based on Chapters 56 in Fundamentals of Database Systems by Elmasri and N avathe Ed 3 QUILUJNH Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files 61 Static External Hashing 62 Dynamic and Extendible Hashing Techniques Indexes as Access Paths Types of Singlelevel Indexes 81 Primary Indexes 82 Clustering Indexes 83 Secondary Indexes Multilevel Indexes Using BTrees and BTrees as Dynamic Multilevel Indexes Fundamentals of Database Systems 61 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 1 Disk Storage Devices Preferred secondary storage device for high storage capacity and low cost Data stored as magnetized areas on magnetic disk surfaces A disk pack contains several magnetic disks connected to a rotating spindle Disks are divided into concentric circular tracks on each disk surface Track capacities vary typically from 4 to 50 Kbytes A track is divided into blocks The block size B is xed for each system Typical block sizes range from B512 bytes to B4096 bytes Whole blocks are transferred between disk and main memory for processing A readwrite head moves to the track that contains the block to be transferred Disk rotation moves the block under the readwrite head for reading or writing A physical disk block address consists of a surface number track number within surface and block number within track Reading or writing a disk block is time consuming because of the seek time s and rotational delay latency rd Double buffering can be used to speed up the transfer of contiguous disk blocks Fundamentals of Database Systems Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 51 Fundamentals of Database Systems 63 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 52 Fundamentals of Database Systems 64 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 2 Files of Records A le is a sequence of records where each record is a collection of data values or data items A le descriptor or file header includes information that describes the le such as the field names and their data types and the addresses of the le blocks on disk Records are stored on disk blocks The blocking factor bfr for a le is the average number of le records stored in a disk block A le can have xedlength records or variablelength records File records can be unspanned no record can span two blocks or spanned a record can be stored in more than one block The physical disk blocks that are allocated to hold the records of a le can be contiguous linked or indexed In a le of xedlength records all records have the same format Usually unspanned blocking is used with such les Files of variablelength records require additional information to be stored in each record such as separator characters and field types Usually spanned blocking is used with such les Fundamentals of Database Systems 65 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 3 Operations on Files Typical le operations include OPEN Readies the le for access and associates a pointer that will refer to a current le record at each point in time FIND Searches for the rst le record that satis es a certain condition and makes it the current le record FINDNEXT Searches for the next le record from the current record that satis es a certain condition and makes it the current le record READ Reads the current le record into a program variable INSERT Inserts a new record into the le and makes it the current le record DELETE Removes the current le record from the le usually by marking the record to indicate that it is no longer valid MODIFY Changes the values of some elds of the current le record CLOSE Terminates access to the le REORGANIZE Reorganizes the le records For example the records marked deleted are physically removed from the le or a new organization of the le records is created READORDERED Read the le blocks in order of a speci c eld of the le Fundamentals of Database Systems Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 4 Unordered Files Also called a heap or a pile le New records are inserted at the end of the le To search for a record a linear search through the le records is necessary This requires reading and searching half the le blocks on the average and is hence quite expensive Record insertion is quite ef cient Reading the records in order of a particular eld requires sorting the le records 5 Ordered Files Also called a sequential file File records are kept sorted by the values of an ordering field Insertion is expensive records must be inserted in the correct order It is common to keep a separate unordered overflow or transaction le for new records to improve insertion ef ciency this is periodically merged with the main ordered le A binary search can be used to search for a record on its ordering field value This requires reading and searching log2 of the le blocks on the average an improvement over linear search Reading the records in order of the ordering eld is quite ef cient Fundamentals of Database Systems Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 59 Fundamentals of Database Systems 68 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 6 Hashed Files 61 Static External Hashing The le blocks are divided into M equalsized buckets numbered bucketo bucket1 bucketM1 Typically a bucket corresponds to one or a xed number of disk block One of the le elds is designated to be the hash key of the le The record with hash key value K is stored in bucketi where ihK and h is the hashing function Search is very ef cient on the hash key Collisions occur when a new record hashes to a bucket that is already full An over ow le is kept for storing such records Over ow records that hash to each bucket can be linked together To reduce over ow records a hash le is typically kept 7 O80 full The hash function h should distribute the records uniformly among the buckets otherwise search time will be increased because many over ow records will exist Main disadvantages of static external hashing Fixed number of buckets M is a problem if the number of records in the le grows or shrinks Ordered access on the hash key is quite inef cient requires sorting the records Fundamentals of Database Systems Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 512 Fundamentals of Database Systems 610 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 62 Dynamic and Extendible Hashing Techniques Hashing techniques are adapted to allow the dynamic growth and shrinking of the number of le records These techniques include the following dynamic hashing extendible hashing and linear hashing Both dynamic and extendible hashing use the binary representation of the hash value hK in order to access a directory In dynamic hashing the directory is a binary tree In extendible hashing the directory is an array of size 2d where d is called the global depth The directories can be stored on disk and they expand or shrink dynamically Directory entries point to the disk blocks that contain the stored records An insertion in a disk block that is full causes the block to split into two blocks and the records are redistributed among the two blocks The directory is updated appropriately Dynamic and extendible hashing do not require an over ow area Linear hashing does require an over ow area but does not use a directory Blocks are split in linear order as the le expands Fundamentals of Database Systems Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 513 Fundamentals of Database Systems 612 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 7 Indexes as Access Paths A singlelevel index is an auxiliary le that makes it more ef cient to search for a record in the data le The index is usually speci ed on one eld of the le although it could be speci ed on several elds One form of an index is a le of entries lt eld value pointer to recordgt which is ordered by eld value The index is called an access path on the eld The index le usually occupies considerably less disk blocks than the data le because its entries are much smaller A binary search on the index yields a pointer to the le record Fundamentals of Database Systems Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing Example Given the following data le EMPLOYEENAME SSN ADDRESS JOB SAL Suppose that record size R150 bytes block size B512 bytes r3 0000 records Then we get blocking factor Bfr B div R 512 div 150 3 recordsblock number of le blocks b rBfr 300003 10000 blocks For an index on the SSN eld assume the eld size VSSN9 bytes assume the record pointer size PR7 bytes Then index entry size R1VSSN PR9716 bytes index blocking factor Bfr1 B div R1 512 div 16 32 entriesblock number of index blocks b rBfr1 3000032 938 blocks binary search needs log2b1 log293 8 10 block accesses This is compared to an average linear search cost of b2 300002 15000 block accesses If the le records are ordered the binary search cost would be log2b log230000 15 block accesses Fundamentals of Database Systems 614 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 8 Types of SingleLevel Indexes 21 Primary Index De ned on an ordered data le The data le is ordered on a key field Includes one index entry for each block in the data le the index entry has the key eld value for the first record in the block which is called the block anchor A similar scheme can use the last record in a block 22 Clustering Index De ned on an ordered data le The data le is ordered on a nonkeyfield Includes one index entry for each distinct value of the eld the index entry points to the rst data block that contains records with that eld value 23 Secondary Index De ned on an unordered data le Can be de ned on a key eld or a nonkey eld Includes one entry for each record in the data le hence it is called a dense index Fundamentals of Database Systems 615 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 61 Fundamentals of Database Systems 616 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 62 Fundamentals of Database Systems 617 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 63 Fundamentals of Database Systems 618 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 64 Fundamentals of Database Systems 619 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 65 Fundamentals of Database Systems 620 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT TABLE 62 HAVE TO INPUT IT Fundamentals of Database Systems 621 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 9 MultiLevel Indexes Because a singlelevel index is an ordered le we can create a primary index to the index itself in this case the original index le is called the firstlevel index and the index to the index is called the secondlevel index We can repeat the process creating a third fourth top level until all entries of the top level t in one disk block A multilevel index can be created for any type of rstlevel index primary secondary clustering as long as the rstlevel index consists of more than one disk block Such a multilevel index is a form of search tree however insertion and deletion of new index entries is a severe problem because every level of the index is an ordered file Because of the insertion and deletion problem most multilevel indexes use Btree or Btree data structures which leave space in each tree node disk block to allow for new index entries Fundamentals of Database Systems 622 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 66 Fundamentals of Database Systems 623 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 67 Fundamentals of Database Systems 624 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing 10 Using BTrees and BTrees as Dynamic Multi level Indexes These data structures are variations of search trees that allow ef cient insertion and deletion of new search values In BTree and BTree data structures each node corresponds to a disk block Each node is kept between halffull and completely full An insertion into a node that is not full is quite ef cient if a node is full the insertion causes a split into two nodes Splitting may propagate to other tree levels A deletion is quite ef cient if a node does not become less than half full If a deletion causes a node to become less than half full it must be merged with neighboring nodes Difference between Btree and Btree In a Btree pointers to data records exist at all levels of the tree In a Btree all pointers to data records exists at the leaflevel nodes A Btree can have less levels or higher capacity of search values than the corresponding Btree Fundamentals of Database Systems Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 68 Fundamentals of Database Systems 626 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 69 Fundamentals of Database Systems 627 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 610 Fundamentals of Database Systems 628 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 611 Fundamentals of Database Systems 629 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 612 Fundamentals of Database Systems 630 Ramez Elmasri and Shamkant B Navathe File Organizations and Indexing INSERT FIGURE 613 Fundamentals of Database Systems 631 Ramez Elmasri and Shamkant B Navathe Fundamentals of DATABASE SYSTEMS FOURTH EDITION Chapter 3 Data Modeling Using the EntityRelationship ER Model Chapter Outline 0 Example Database Application COMPANY 0 ER Model Concepts 7 Entities and Attributes 7 Entity Types Value Sets and Key Attributes 7 Relationships and Relationship Types 7 Weak Entity Types 7 Roles and Attributes in Relationship Types 0 ER Diagrams Notation 0 ER Diagram for COMPANY Schema 0 Alternative Notations 7 UML class diagrams others ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Cha ter 373 In 97 Example COMPANY Database 0 Requirements of the Company oversimplified for illustrative purposes 7 The company is organized into DEPARTMENTS Each department has a name number and an employee who manages the department We keep track of the start date of the department manager 7 Each department controls a number of PROJECTS Each project has a name number and is located at a single location ElmasrilNavathe Fundamentals of Database Systems Foulth Edition ghagmjr 374 In L Example COMPANY Database Cont We store each EMPLOYEE s social security number address salary sex and birthdate Each employee works for one department but may work on several projects We keep track of the number of hours per week that an employee currently works on each project We also keep track of the direct supervisor of each employee Each employee may have a number of DEPENDENTs For each dependent we keep track of their name sex birthdate and relationship to employee ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chagter 375 In Hquot i ER Model Concepts 0 Entities and Attributes 7 Entities are specific objects or things in the miniworld that are represented in the database For example the EMPLOYEE John Smith the Research DEPARTMENT the ProductX PROJECT 7 Attributes are properties used to describe an entity For example an EMPLOYEE entity may have a Name SSN Address Sex BirthDate 7 A specific entity will have a value for each of its attributes For example a specific employee entity may have Name39John Smith39 SSN3912345678939 Address 39731 Fondren Houston TX Sex39M39 BirthDate3909IAN55 7 Each attribute has a value set or data type associated with it 7 e g integer string subrange enumerated type ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Qha her 376 In L Mg a I I I Types of Attributes 1 0 Simple 7 Each entity has a single atomic value for the attribute For example SSN or Sex 0 Composite 7 The attribute may be composed of several components For example Address Apt House Street City State ZipCode Country or Name FirstName MiddleName LastName Composition may form a hierarchy Where some components are themselves composite O MultiValued 7 An entity may have multiple values for that attribute For example Color of a CAR or PreviousDegrees of a STUDENT Denoted as Color or PreviousDegrees ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Cha ter 377 In in Types of Attributes 2 O In general composite and multiValued attributes may be nested arbitrarily to any number of levels although this is rare For example PreviousDegrees ofa STUDENT is a composite multiValued attribute denoted by PreviousDegrees College Year Degree Field ElmasrilNavathe Fundamentals of Database Systems Foulth Edition ghagngr 37 In L Entity Types and Key Attributes O Entities with the same basic attributes are grouped or typed into an entity type For example the EMPLOYEE entity type or the PROJECT entity type 0 An attribute of an entity type for which each entity must have a unique value is called a key attribute of the entity type For example SSN of EMPLOYEE O A key attribute may be composite For example VehicleTagNumber is a key of the CAR entity type with components Number State 0 An entity type may have more than one key For example the CAR entity type may have two keys 7 VehicleldentificationNumber popularly called VIN and 7 VehicleTagNumber Number State also known as license jlate number 5 ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 379 In quot394 ENTITY SET corresponding to the ENTITY TYPE CAR CAR RegistrationRegistrationNumber State VehiclelD Make Model Year Color car 1 a ABC 123 TEXAS TK629 Ford Mustang convertible 1999 red black car2 ABC 123 NEW YORK WP9872 Nissan 3oozx 2door 2002 blue car3 VSY 720 TEXAS TD729 Buick LeSabre 4door 2003 White blue ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Cha leg 3710 In L Magi I SUMMARY OF ERDIAGRAM NOTAIJ39EN FORMER SCHEMAS ENTITY TYPE El WEAK ENTITY TYPE ltgt RELATIONSHIP TYPE 0 IDENTIFYING RELATIONSHIP TYPE ATTRIBUTE KEY ATTRIEUTE MULTIVALUED ATTRIBUTE W COMPOSITE ATTRIB UTE Cgt DERIVED ATTRIBUTE n TOTAL PARTICIPATION OF E2 IN R N n CARDINALITY RATIO 1 N EOR E E2111 R mm max STRUCTURAL CONSTRAINT mm m ax ON PARTICIPATION 9 m Elmasn39lNavathe Fundamentals of Database Systems Fourth Edition ChaBteerll Iquot ER DIAGRAM Entity Types are EMPLOYEE DEPARTMENT PROJECT DEPENDENT Elmasn39lNavathe Fundamentals of Database Systems Fourth Edition Chagter 371 Iquot Relationships and Relationship Types 1 O A relationship relates two or more distinct entities with a speci c meaning For example EMPLOYEE John Smith works on the ProductX PROJECT 0r EMPLOYEE Franklin Wong manages the Research DEPARTMENT 0 Relationships of the same type are grouped or typed into a relationship type For example the WORKSiON relationship type in which EMPLOYEEs and PROJECTs participate or the MANAGES relationship type in which EMPLOYEEs and DEPARTMENTs participate O The degree of a relationship type is the number of participating entity types Both MANAGES and WORKSiON are binary relationships ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 3713 In 394 7 Example relationship instances of the WORKSFOR relationship between EMPLOYEE and DEPARTMENT EMPLOYEE WORKSiFOR DEPARTMENT ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chapter 3714 In L Example relationship instances of the WORKSON relationship between EMPLOYEE and PROJECT ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 3715 In quotquot Relationships and Relationship Types 2 More than one relationship type can exist with the same participating entity types For example MANAGES and WORKSiFOR are distinct relationships between EMPLOYEE and DEPARTMENT but with different meanings and different relationship instances ElmasrilNavathe Fundalnentals of Database Systems Fourth Edition Kcllwagtegll In ER DIAGRAM Relationship Types are WORKSFOR MANAGES WORKSON CONTROLS SUPERVISION DEPENDENTSOF ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapterj3717 vquot Weak Entity Types 0 An entity that does not have a key attribute O A weak entity must participate in an identifying relationship type with an owner or identifying entity type 0 Entities are identi ed by the combination of 7 A partial key of the weak entity type 7 The particular entity they are related to in the identifying entity type Example Suppose that a DEPENDENT entity is identified by the dependent s rst name and birhtdate and the specific EMPLOYEE that the dependent is related to DEPENDENT is a weak entity type with EMPLOYEE as its identifying entity type Via the identifying relationship type DEPENDENT70F ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 8 M quot Weak Entity Type is DEPENDENT Identifying Relationship is DEPENDENTSOF ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter3719 iquot Constraints on Relationships 0 Constraints on Relationship Types 7 Also known as ratio constraints 7 Maximum Cardinality O Onetoone 11 I Onetomany l N or Manytoone Nl O Manytomany 7 Minimum Cardinality also called participation constraint 0r existence dependency constraints 0 zero optional participation not existencedependent I one or more mandatory existencedependent ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 3720 iquot Manytoone N1 RELATIONSHIP EMPLOYEE WORKSiFOR DEPARTMENT ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Cha er 3721 In in Manytomany MN RELATIONSHIP ElmasrilNavathe Fundamentals of Database Systems Foulth Edition C1331 leg 372 In L M Relationships and Relationship Types 3 0 We can also have a recursive relationship type 0 Both participations are same entity type in different roles O For example SUPERVISION relationships between EMPLOYEE in role of supervisor or boss and another EMPLOYEE in role of subordinate 0r worker O In following gure rst role participation labeled with l and second role participation labeled with 2 O In ER diagram need to display role names to distinguish participations ElmasriINavathe Fundamentals of Database Systems Fourth Edition Cha ter 3723 In no A RECURSIVE RELATIONSHIP SU PERVISION EE SUPERVISION EMPLOY ElmasriINavathe Fundamentals of Database Systems Foulth Edition Chapter 3724 In Wquot Recursive Relationship Type is SUPERVISION participation role names are shown ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chagter LZS vquot Attributes of Relationship types O A relationship type can have attributes for example HoursPerWeek of WORKSON its value for each relationship instance describes the number of hours per week that an EMPLOYEE works on a PROJECT 3 1 ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chagteri3726 vquot Attribute of a Relationship Type is Hours of WORKSON ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter3727 vquot Structural Constraints one way to express semantics of relationships Structural constraints on 39 quot 39 o Cardinality ratio of a binary relationship 11 1N N 1 or MN SHOWN BY PLACING APPROPRIATE NUMBER ON THE LINK 0 Participation constraint on each participating entity type total called existence dependency or partial SHOWN BY DOUBLE LINING THE LINK NOTE These are easy to specify for Binag Relationship Types ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chagterl3728 vquot Alternative minI max notation for relationship structural constraints 0 Specified on each participation of an entity type E in a relationship type R o Specifies that each entity e in E participates in at least min and at most max relationship instances in R o Defaultno constraint min0 maxn I 0 Must have minSmax min20 max 21 39 o Derived from the knowledge of miniworld constraints 1 Examples 3 o A department has exactly one manager and an employee can manage at most 7 one department Specify 01 for participation of EMPLOYEE in MANAGES Specify 11 for participation of DEPARTMENT in MANAGES 393 0 An employee can work for exactly one department but a department can have any number ofemployees Specify 11 for participation of EMPLOYEE in WORKSiFOR Specify 0 n for 39 39 39on of DEPARTMENT in WORKS FOR ElmasrilNavathe Fundamen of Fourth Edition Cha teieizg In The minmax notation relationship constraints Employee Department Employee Department ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Cha t i 973 In COMPANY ER Schema Diagram using min max notation Alternative ER Notations Elm asriINavaxhe Fundam entals of Database System s Fourth Edition Chzgte 331 Relationships of Higher Degree q Relationship types of degree 2 are called binary 0 Relationship types of degree 3 are called ternary and of a degree n are called n ary m o In general an nary relationship is not equivalent to n binary relationships 0 Higherorder relationships discussed further in Chapter 4 EImasriINavathe Fundamentals of Database Systems Fourth Edition chaEmi ssg Data Modeling Tools A number of popular tools that cover conceptual modeling and mapping into relational schema design Examples ERWin 8 Designer Enterprise Application Suite ER Studio etc POSITIVES serves as documentation of application requirements easy user interface mostly graphics editor support ElmasrilNavathe Fundamentals of Database Systems Fourth Edition ChaBter 3733 In quotquot Problems with Current Modeling Tools 0 DIAGRANHVHNG 7 Poor conceptual meaningful notation 7 To avoid the problem of layout algorithms and aesthetics of diagrams they prefer boxes and lines and do nothing more than represent primaryforeign key relationships among resulting tablesa few exceptions O lVlETHODOLGY 7 lack of builtin methodology support 7 poor tradeoff analysis or userdriven design preferences 7 poor design veri cation and suggestions for 1mprovement ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Cha leg 3734 In L Magi is Some of the Currently Available Automated Database Design Tools COMPANY TOOL Embarcadero ER Studio Database Modding in ER and lDEFlX Technologies DB Artisan Database administration and space and security management Oracle Developer 2000 and Desigler Database modeling application development 2000 Popkin Software System Architect 2001 Data modeling object modeling process modeling structured analysisdesign Platinum Platinum Enterprice Data process and business component modeling Technology Modeling Suite Erwin BPWin Paradigm Plll Persistence Inc Pwertier Mapping from 070 to relational model Rational Rational Rose Modding in U39ML and application generation in c and JAVA Rogue Ware RW Metro Mapping from 070 to relational model Resolution Ltd Xcase Conceptual modeling up to code maintenance sybase Mmquot nag mum Visio Visio Enterprise Data modding design and reengineering Visual Basic and Visual C ElmasnlNavathe Fundamentals of Database Systems Fourth Edition Uglgpteggs ER DIAGRAM FOR A BANK DATABASE mm ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chap 3 36 Emerging Database Technology Research and Applications Shamkant B Navathe Georgia Institute of Technology Atlanta GA shamccgatechedu Talk at Comp Soc of India Chennai India June 23 99 History of Database Management 19651980 Dominance of Hierarchical Model IBM s IMS and Network Based Systems IDS Honeywell IDMS Goodrich gt Cullinet gt CA DMS 1100 Univac gt Unisys TOTALSUPRACincom VAX DBMS 1980 Now Increasing Role ofthe Relational Model IBM s SQLDs DB2 DB22 ORACLE INFORMIXSYBASE ALLBASEHP 19905 Increasing acceptance of 00 and other DBMSs Late 90 s The Object Relational Approach 22 DBMS Product Scene Current Categories a ClientServer Products e Relational DBMSs and theirvariants SQL Engines Oracle DB2 MSSQL server lnformix Sybase Parallel DB Sewers TeradataATampT Oracle lnformix a ObjectOriented Databases Objectstore Versant 02Ardent Gemstone ONTOS POET a Object Relational Databases lnformix Universal Server IBM UDB Oracle Universal Server a Tools Application Prototyping tools Powerbuilder ConceptualLogical Modelling and Mapping tools Erwin S designor Process Modelling BPVWn Workflow Flowmark 3435 Product Scene The 90 s Data Warehousing Most relational vendors Redbrick HP DW market 6 to 8 Bills year OLAP Tools ROLAP MOLAP Extraction Transformation Loading Prism Extract ETI SEARCH ENGINES Emerging Applications a World V de Web The dominant force for the next 10 years a Engineering CADCAMClMCAE a Scientific Applications Weather Forecasting Genome Databases Earth Sciences GIS Environmental Applications at Telecommunications Databases Network management Telemedicine Info Brokering a Packaged Databases Hypermedia Encyclopedias Manuals Part catalogs CDROM Consumables a Multimedia Entertainment Visualization Interactive Virtual Reality based systems a Software as Data Metadata Repositories Microsoft 5 5 Across Domain Applications a Data Mining OLAP and Design Support Knowledge Generation Modeling Management Document EDI Scanning amp Storage and Archiving Complexities Vs Ease of Use a Complex Objects in General Design Objects Biological Objects Documents Physical Systems Information Flows Multiplicity of Data Types Numbers Text Images Audio Video User s Expectations on the Rise Better interfaces More visualization More animation More involvement ofthe user Different paradigms for search querying browsing navigation interactive exploration Application Challenges Multiple Dimensions of Information Structural and Behavioural Data Content Relationship and Links among data Incomplete illdefined illstructured illformed information Missing and erroneous information Going beyond Raw Data to Meaningful Information Extraction Selection Derivation Deduction Exploration and Discovery data mining Realtime response with high volume databases Automatic Control Robotics Standards Document Text a HTML a XML a EDI standards Engineering at PDES Products Data Exchange at STEP ISO Objects a Models ODMG a De nition lDL a Querying ODL 9amp9 Lack of Standards Security a Modeling a Language a Implementation GUI Scripting Languages a CGI a PERL a Java script Object Relational Database Management a SQL3 104nm Evoluti on of Business Applications Q Q Q Q 60 s Batch reports hard to find and analyze information in exible and expensive reprogram every new request 70 s Terminalbased D88 and EIS executive information systems still in ex ble not integrated with desktop tools 80 s Desktop data access and analysis tools query tools spreadsheets GUls easier to use but only access operational databases 90 s Data warehousing with integrated OLAP engines and tools OLTP vs OLAP l39l39P MAP User a Clerk IT Professional a Knowledge worker Functjm Day to day operations a Decision support DB DesJignApplicationoriented E o Subjectoriented Star snow ake Data 9 Current Isolated 9 Historical Consolidated View a Detailed Flat relational 9 Summarizwy Usage 9 Structured Repetitive Mu39tidimenSiona39 UHil DN VDISBhort Simple Ad hoc transaction 9 Complex query Access 9 Readwrite o Read Mostly Operatmndndexhash on prim Key 9 Lots of Scans Reconls iQBSsed 9 Millions Userso Thousands 9 Hundreds Db size 100 MBGB O 100GBTB Data Warehouse a A decision support database that is maintained separately from the organization s operational databases a A data warehouse is a subjectoriented integrated timevarying nonvolatile collection of data that is used primarily in organizational decision making Why Separate Data Warehouse in Performance 7 Op dbs designed amp tuned for known txs amp workloads 7 Complex OLAP queries would degrade perf For op txs 7 Special data organization access amp implementation methods needed for multidimensional views amp queries Q Function Missing data Decision support requires historical data which op dbs do not typically maintain Data consolidation Decision support requires consolidation aggregation summarization of data from many heterogeneous sources op dbs external sources Data quality Different sources typically use inconsistent data representations codes and formats which have to be reconciled Data Warehousing Market Q 2 B market in 1998 Gartner Group Expected to grow to 698 in 1999 Gartner 70 of Fortune 1000 companies use inhouse code for warehousing solutions Forrester Oracle lnformix Sybase IBM and Teradata are dominant players Q Q Already deployed in many industries manufacturing retail financial insurance transportation telecom utilities healthcare Data Warehousing Architecture I In servers Analysis Query Benorting llth nata Marts Approaches to OLAP Servers Q Relational OLAP ROLAP Relational and Specialized Relational DBMSto store and manage warehouse data OLAP middleware to support missing pieces gtgt Optimize for each DBMS backend gtgt Aggregation Navigation Logic gtgt Additional tools and se 39 s Example Microstrategy MetaCube Informix Multidimensional OLAP MOLAP a Arraybased storage structures Direct access to array data structures Example Essbase Hyperion Accumate Kenan Q Relational DBMS as Warehouse Server Schema design Specialized scan indexing and join techniques Handling of aggregate views querying and materialization Supporting query language extensions beyond SQL Complex query processing and optimization Data partitioning and parallelism Populating amp Refreshing the Warehouse a Data extraction a Data cleaning a Data transformation Convert from legacyhost format to warehouse format a Load Sort summarize consolidate compute views check integrity build indexes partition if Refresh Propogate updates from sources to the warehouse Data Cleaning amp Why Data warehouse contains data that is analyzed for business decisions More data and multiple sources could mean more errors in the data and harder to trace such errors Results in incorrect analysis a Detecting data anomalies and rectifying them early has huge payoffs a Important to identify tools that work together well a Long Term Solution Change business practices and data entry tools Repository for metadata Data Cleaning Techniques a Transformation Rules Example translate gender to sex Products Warehouse Manger Prism Extract ETI a Uses domainspeci c knowledge to do scrubbing a Parsing and fuzzy matching Multiple data sources can designate a preferred source Products Integrity Vality Trillum a Discover facts that ag unusual patterns auditing Some dealer has never received a single complaint Products WizRule Load a Issues huge volumes of data to be loaded small time window usually at night when the warehouse can be taken offline When to build indexes and summary tables allow system administrator to monitor status cancel suspend resume load or change load rate restart after failure with no loss of data integrity Techniques batch load utility sort input records on clustering key and use sequential lO build indexes and derived tables sequential loads still too long 100 days for TB use parallelism and incremental techniques Incremental Load Q Full load may still take too long Entire load is a long batch transaction replace old table with new after transaction commits use periodic checkpoints after failure restart from last checkpoint Use incremental loads during refresh to reduce data volume eg RedBrick Table Management Utility Insert only updated tuples now incremental load conflicts with queries break into sequence of shorter transactions every 1000 records every few seconds Q coordinate this sequence of transactions must ensure consistency between base tables and derived tables amp indices Refresh a Issues when to refresh on every update too expensive only necessary if OLAP queries need current data eg upthe rninute stock quotes periodically eg every 24 hours every week or after significant events refresh policy set by administrator based on user needs and traffic possibly different policies for different sources how to refresh Refresh Techniques a Full extract from base tables read entire source table or database expensive may be the only choice for legacy databases or files a Incremental techniques related to work on active dbs detect amp propagate changes on base tables replication servers eg Sybase Oracle IBM Data Propagator snapshots amp triggers Oracle transaction shipping Sjbase ogical correctness computing changes to startables computing changes to derived and summary tables optimization only signi cant changes transactional correctness incremental load 39 Research Issues in data warehousing a Data cleaning focus on data inconsistencies not schema differences data mining techniques Physical Design design of summary tables partitions indexes tradeoffs in use of different indexes Query processing selecting appropriate summary tables dynamic optimization with feedback acid test for query optimization cost estimation use of transformations search strategies partitioning query processing between OLAP server and backend server Q Q 26 Emerging Technologies tied to Data Management amp amp amp amp amp Communicating Databases ubiquitous databases mobile applications mobile data More Intelligence in Data Management reasoning learning along many dimensions model based case based analogy based explanation based knowledge based systems with scaleable rule bases and databases Multimedia Display Visualization Animation Large Scale Software with Reuse Design Methodologies reverse engineering better control of the design process Computer Supported Cooperative Work working at home working in teams 2 Internal DB Functionality Challenges amp amp amp amp Rule processing as an integral function Rule and Transaction Interaction for Integrity control security and authorization active application control derivation and deduction Adaptable Databases Ultimate forms of schema evolution Multiple Data Types and their realtime transport Interoperability Heterogeneity Decision Support coupled with Visualisation and Animation eg SGI Mineset 28mg Trends to Consider Uninitiated untrained users getting access and direct manipulation capability Internet access particularly publicly accessed networks like the Internet open up vast databases for public consumption Standards like ODBC JDBC SOM DCOM OLE for interoperability Functionality Trends More behavioral and dynamic specifications as a part of DB definition More reliance on metadata Migrate Application semantics into a database Use of parallel processing combined with declustering 2942Q9 Future Scenarios for databases Databases that refresh themselves by linking up with multiple sites and systems Databases that migrate with the users and are a part of different federations Databases that adapt to users needs and information request pro le Databases that prompt the users when new and relevant information arrives and deliver information in appropriate form Push Technologyeg Pointcast 304436in Current Research Thrusts amp New data models for new data types and relationships image video sound hypertext Methodologies for designing large scale applications corresponding tools eg reverse engineering Parallel Database processing Data Mining and Knowledge Discovery Browsing Animation Visualization Mediators Secure databases sticky problems incompleteness inconsistency long transactions practical issues databases on the WWW application domain specific research sunf t Open Issues amp amp amp amp Push Vs Pull Technologies How to achieve the right balance Content based Retrieval On all types of data including image text audio video Coping with Information Overload How to answer a query intelligently on the web Human Vs Machine information processing impedance mismatch 324432332 Challenges for the Database Professionals a Learning the application the Jargon the Process Model of the environment complexities typical scenarios rules constraints a Apply DB Techniques to help in application conceptual modeling views transactions specification normalization query optimization a Apply techniques outside DB area to DB management From Al Information Retrieval Software Engineering Ul ma a Future a More variety of applications and data Cax where X D Design EEngineering PPublishing EEducation Scientific Biology Weatherforecasting Space Image data Entertainment Worldwide project teams Electronic Commerce at More variety of users housewives K 1 2 children occasional and nai39ve users a A great need for domain experts who can also understand db modeling and design a More demands on performance scaling to larger databases staying within decent response time MGM Fundamentals of DATABASE SYSTEMS FOURTH EDITION Chapter 11 Relational Database Design Algorithms and Further Dependencies Chapter Outline 0 Designing a Set of Relations 1 Properties of Relational Decompositions 2 Algorithms for Relational Database Schema 3 Multivalued Dependencies and Fourth Normal Form 4 Join Dependencies and Fifth Normal Form 5 Inclusion Dependencies 6 Other Dependencies and Normal Forms ElmasrilNavathe Fundamentals of Database Systems Fourth Edition CIEBter 1173 DESIGNING A SET OF RELATIONS 1 The Approach of Relational Synthesis Bottom up Design O Assumes that all possible functional dependencies are known 0 First constructs a minimal set of FDs 0 Then applies algorithms that construct a target set of 3NF or BCNF relations 0 Additional criteria may be needed to ensure the the set of relations in a relational database are satisfactory see Algorithms 112 and 114 ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chapter 1174 n DESIGNING A SET OF RELATIONS 2 Goals 0 Lossless join property a must algorithm 111 tests for general losslessness O Dependency preservation property algorithms 113 decomposes a relation into BCNF components by sacrificing the dependency preservation 0 Additional normal forms 7 4NF based on multivalued dependencies 7 5NF based onjoin dependencies ElmasrilNavathe Fundamentals of Database Systems Fourth Edition CIEBter 1175 1 Properties of Relational Decompositions 1 Relation Decomposition and Insuf ciency of Normal Forms 0 Universal Relation Schema a relation schema RA1 A27 An that includes all the attributes of the database 0 Universal relation assumption every attribute name is unique 0 Decomposition The process of decomposing the universal relation schema R into a set of relation schemas D R1Rz Rm that will become the relational database schema by using the functional dependencies ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chapter 111 Properties of Relational Decompositions 2 Relation Decomposition and Insuf ciency of Normal Forms cont 0 Attribute preservation condition Each attribute in R will appear in at least one relation schema Ri in the decomposition so that no attributes are lost 0 Another goal of decomposition is to have each individual relation R1 in the decomposition D be in BCNF or 3NF 0 Additional properties of decomposition are needed to prevent from generating spurious tuples ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Clggter 1137 V Properties of Relational Decompositions 3 Dependency Preservation Property of a Decomposition m 0 Given a set of dependencies F on R the projection of F on R denoted by pRlF where R1 is a subset of R is the set of dependencies X gt Y in Ft such that the attributes inX U Y are all contained in R1 Hence the projection of F on each relation schema R1 in the decomposition D is the set of functional dependencies in Ft the closure of F such that all their left and righthandside attributes are in R ElmasrilNavathe Fundamentals of Database Systems Founh Edition Chapter 1178 t Properties of Relational Decompositions 4 Dependency Preservation Property of a Decomposition cont El 0 Dependency Preservation Property a decomposition D R1 R2 Rm ofR is dependencypreserving with respect to F if the union of the projections of F on each inD is equivalent to F that is TERIF U U 751e11 f See examples in Fig 1012a and Fig 1011 Claim 1 It is always possible to find a dependency preserving decomposition D with respect to F such that each relation R1 in D is in 3nf ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Clggter 1179 V Properties of Relational Decompositions 5 Lossless Non additive Join Property of a Decomposition De nition quot1 O Lossless join property a decompositionD R1 R2 Rm of R has the lossless nonadditive join property with respect to a the set of dependencies F on R if for every relation state r of R that satis es F the following holds where is the natural join of all the relations in D nR1r anr r Note The word loss in lossless refers to loss of information not to loss of tuples In fact for loss of information a better term is addition of spurious information ElmasrilNavathe Fundamentals of Database Systems Founh Edition Chapterilelo Properties of Relational Decompositions 6 Lossless Non additive Join Propelty of a Decomposition cont Algorithm 111 Testing for Lossless Join Propertv Input A universal relation R a decompositionD R1 R2 Rm of R and a set F of functional dependencies z 1 Create an initial matrix S with one row 139 for each relationRi in D and one column j for each attribute A in R 2 Set SiJ39bi for all matrix entries each bij is a distinct All symbol associated with indices ij For each row 139 representing relation schema Ri 1 for each column j representing attribute A if relation Ri includes attribute Aj then set SiJ39 aj each aj is a distinct symbol associated with index 139 l ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 1111 my Properties of Relational Decompositions 7 Lossless Non additive Join Propelty of a Decomposition cont Algorithm 111 Testing for Lossless Join Propertv cont 4 Repeat the following loop until a complete loop execution 1 results in no changes to E for each functional dependency X gt Y in F for all rows in S which have the same symbols in the columns corresponding to attributes inX make the symbols in each column that correspond to an attribute in Y be the same in all these rows as follows if any of the rows has an a symbol for the column set the other rows to that same a symbol in the symbol exists for the attribute in any of the rows choose column If no a one of the 1 symbols that appear in one of the rows for the attribute and set the other rows to that same b symbol in the column 1 5 If a row is made up entirely of a symbols then the decomposition has the lossless join property otherwise it does not ElmasrilNavathe Fundamentals of Database Systems Foulth Edition ngp terllela Properties of Relational Decompositions 8 Lossless nonaddmve loin test for new decompositions a Casel Decomposition ofEMFLPROJ l to EMFLPROJl and EMFLLOCS falls test b A decomposition of EMFLPROthal has the losslessloin property al RSSN ENAME PNUMEER PNAME moomom HOURS DW1r R2 MPLOCSENAME PLOCATlDN Mlpnmtqssn PNUMEER nouns FNAME FLOOATION FssN ENAMEPNUMBEFI FNAME PLOCATlON ssNyNuMBErai Hounsi R2 SSN ENAME PNUMEER FNAME FLOCATIDN HOURS PROJECT WOHKSON Elm asriINavathe Fundam entals of Database System s Fourth Edition Chagmr 11713 um emu Properties of Relational Decompositions 8 Lossless nonaddltwe lei two Rat loin test for nary ecomposltlol is 0 Case 2 isswzwsewwsewmenocmowsswwasawoues Decomposition of 55quot em Ween mm PmnoN HOURS PROJECTlan wowsiow satis es t ow mam s asamswmy SSN awe PNUMBER puma FLDCAMN Nouns gt2 3 by an 5 malnxSaHer n mammoqu We Rimesmam a Elm asriINavathe Fundam entals of Database System s Fourth Edition Chzgmz114 um emu quot Properties of Relational Decompositions 9 Testing Binary Decompositions for Lossless Join Property Binary Decomposition decomposition of a relation R into two relations PROPERTY LJ1 lossless join test for binary decompositions A decompositionD R1 R2 of R has the lossless join property with respect to a set of functional dependencies F onR if and only if either The fd R1 0R2 gt R1 R2 is in F or The fdR1riRz gtR2 R1 is in Ft ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 1115 V Haw Properties of Relational Decompositions 10 Successive Lossless Join Decomposition Claim 2 Preservation of nonadditivity in successive decompositions If a decomposition D R1 R2 Rm ofR has the lossless nonadditive join property with respect to a set of functional dependencies F on R and if a decomposition Di Q1 Q2 Qk of Ri has the lossless nonadditive join property with respect to the projection of F on Ri then the decomposition D2 R1 R2 RH Q1 Q2 Qk RM Rm ofR has the non additive join property with respect to F ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chapterilel 2 Algorithms for Relational Database Schema Design 1 Algorithm 112 Relational Synthesis into 3NF with Dependency Preservation Relational Synthesis Algorithm Input A universal relation R and a set of functional dependencies F 39 I on the attributes of R a 1 Find a minimal cover G forF use Algorithm 102 For each lefthandside X of a functional dependency that appears in G create a relation schema in D with attributes X U A1 U A2 U Ak whereX gt A1X gt A2 X gt Ak are the only dependencies in G withX as lefthandside X is the key of this relation 3 Place any remaining attributes that have not been placed in any relation in a single relation schema to ensure the attribute preservation property Claim 3 Every relation schema created byAlgorithm 112 is in 3NF ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 1117 V Haw Algorithms for Relational Database Schema Design 2 Algorithm 113 Relational Decomposition into BCNF with Lossless non additive join property Input A universal relation R and a set of functional dependencies F on the attributes of R a 1 Set D R 2 While there is a relation schema Q in D that is not in BCNF do choose a relation schema Q in D that is not in BCNF find a functional dependench gt Y in Q that Violates BCNF replace Q in D by two relation schemas Q Y and X U Y quot No null values are allowed for the join attributes ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chgpter 1718 Algorithms for Relational Database Schema Design 3 Algorithm 114 Relational Synthesis into 3NF with Dependency Preservation and Lossless Non Additive Join Property Input A universal relation R and a set of functional dependencies F 7 on the attributes of R a 1 Find a minimal cover G forF Use Algorithm 102 2 For each lefthandside X of a functional dependency that appears in G create a relation schema in D with attributes X U A1 U A2 U Ak whereX gt A1 X gt 142X7gtAk are the only dependencies in G withX as lefthandside X is the key of this relation 43 If none of the relation schemas in D contains a key of R then create one more relation schema inD that contains attributes that form a key ofR Use Algorithm 11461 tofmd the key ofR ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 11719 lgtr4m Algorithms for Relational Database Schema Design 4 Algorithm 114a Finding a Key K for R Given a set F of Functional Dependencies illnput A universal relation R and a set of functional dependencies F a on the attributes of R 1 SetK R 2 For each attributeA in K compute K A with respect to F If K A contains all the attributes in R then setK K A g is ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chapterll o Algorithms for Relational Database Schema Design 5 lssues Wltn nullrvaluelolns a Some EMPLOYEE lupies have null forlheloll39l allrbule DNUM EMPLOYEE EImasriINavathe Fundamentals of Database Systems Fourth Edition Chagmr 11 21 mm Algorithms for Relational Database Schema Design 5 lssues With nullrvalueloll39ls b Result Of appMng NATURAL JOlN to the EMPLOYEE and DEPARTMENT relallol39ls 0 Result Ofapplyll lg LEFT OUTER JOlN 0 EMPLOYEE and l l W l E l ml we lml W l mm mm was We mmmm s w WW we we stem 5 mg m 1 mm m mm 4 mm W WW Wm 2mm t Wm SW W W W5 5 am mam em WWW 5 am W semi aw lw mm A We saw M W 0mm l mm W W l W l E 1 ml we luml ml ism ma meme lsssawg l u 5 mm 51445555 WW mm m l 5 am m M w Wm am an s MW am slam m mm aw mama W was malan 5 am A we w l 5 am we l l may We lemma 4 WW aw Mm W m mm mm W mum was we mam WWW m was lemma Elm asriINavathe Fundam entals of Database System s Fourth Edition Chzgmulda um emu quot Algorithms for Relational Database Schema Design 6 The dahgllhgtuple problem aThe relallolT EMPLOYEEJ lhcludes all allrbules of EMPLOYEE from frlgul e ll 2a except DNUM a EMPLOYEEJ ENAME ssm i BDATE ADDRESS 11h J mamas 15650159 731 Fandren stlon g F l 555 l 538Voss stlnn rx Zelaya Anna J 7777 1 955 0719 3321 Castle Spring TX allaoe Jennifer 5 987654321 19mm 20 291 Berry Bellane Narayan amasn K 555554444 195252 15 975 Flre Oak Humble TX E l A 453453553 197257731 5631 Flee Houstme Jamar Ahmad 1 9137987557 15590323 950 alles Houston TX 5019 James E 555665555 1937711710 450 Stone Hous1on39l39lt 551921 Anders 1 999775555 19555425 5530 Brass Bellalre TX Benllez Carlos M 155301479 7554 Beech Houston TX Elm asriINavaxhe Fundam emals of Database System s Fourth Edition Chagmr 11723 um emu Algorithms for Relational Database Schema Design 6 The dahgllhg lupie problem b The relallon EMPLOYEEJ lhcludes DNUM atthbute Wlth l lull values 0 l l E l N Ml w lupies lorwhlch DNUM has null values b EMPLOVEEJ E DNUM 13455759 333445555 lei EMPLOYEE WM 5 123456769 5 5 5555 5 959557777 4 7777 4 957554321 4 537554321 4 5 5 453453453 5 453453453 5 937957957 4 9379579137 A 1 555555555 1 999775555 null 835664444 null ElmasriINavathe Fundamentals ornatabase Systems Fourth Edition Chagm 11 4 mm quot Design 7 Discussion of Normalization Algorithms 0 These algorithms are not deterministic in general Algorithms for Relational Database Schema Problems 0 The database designer must first specify all the relevant a functional dependencies among the database attributes It is not always possible to nd a decomposition into relation schemas that preserves dependencies and allows each relation schema in the decomposition to be in BCNF instead of 3NF as in Algorithm 114 ElmasrilNavathe Fundamentals of Database Systems Fourth Edition ChagterlleZS Desi Algorithms for Relational Database Schema n 8 Table 11 1 Summary of some of t e algorithms discussed above Algori Input Output Properties Remarks ethrn Purpose 111 A decomposition D Boolean result Testing for on See a simpler test in ofR and a set F of r o for additive join Section 11 1 4 for functional I lossless join decomposition binary I I dependencies property decompositions 112 Set of inctional A set of Dependency N dependencies F relations in 3N39F preservation satisfying lossless join property 113 Set of inctional A set of Losslessjoin No guarantee of dependencies F relations in decomposition dependency BCNF preservation 114 Set of inctional Losslessjoin and May not achieve dependencies F relations in 3N39F dependency preserving decomposition 114a Relation schema R Key K ofR To nd a key K The entire relation quot111 f1 59101quot which is a R is alWa s a M de subset of R default superkey de endenciesF ElmasrilNavathe Fouith Edition C ter 11726 3 Multivalued Dependencies and Fourth Normal Form 1 a The EMP relation With two MVDs ENAME gtgt PNAME and ENAME gtgt DNAME b Decomposing the EMP relation into two 4NF relations EMPPROJECTS and EMP DEPENDENTS a b EMPPROJECTS EMPDEPENDENTS ENAME PNAME ENAME I DNAME I Smith X Smith John Smith Y Smith Anna ElmasriNavathe Fundamentals of Database Systems Fourth Edition Chagteri 27 i 3 Multivalued Dependencies and Fourth Normal Form 1 c The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JDR1 R2 R3 cl Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3 0 SUPPLY 1 H3 I PRQJNAME PAHTNAME PROJNAME Boll ijx Nut ProlY Boll Front Nut PrujZ Nail FVDJX ElmasriNavathe Fundamentals of Database Systems Foulth Edition Chagterl 28 Multivalued Dependencies and Fourth Normal Form 2 De nition 0 A multivalued dependency MVD X 7gtgt Y speci ed on relation schema R where X and Y are both subsets of R speci es the following constraint on any relation state r of R If two tuples t1 and t2 exist in r such that t1 X 2X then two tuples t3 and t4 should also exist in r with the following properties where we use Z to denote R 2 X U Y t3iX t4iX ltliX linXi l 3Y l 1Y and l 4Y t3Z 2Z and t4Z t1Z 0 An MVD X igtgt Y in R is called a trivial MVD if a Y is a subset ofX or bXU YR ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 11729 gtrm Multivalued Dependencies and Fourth Normal Fo rm 3 Inference Rules for F quot 39 and 39 39 T I J IRl re exive rule for FDs leg Y thenX 7gt Y IRZ augmentation rule for PBS X7gt Y lXZ 7gt YZ IR3 transitive rule for FDs X7gt Y Y7gtZ l X7gt Z 1R4c0mplementation rule for MVDs X igtgt Y lXigtgt R 7 XU Y IRS augmentation rule for MVDs le igtgt Y and WQZthen WX igtgt YZ IR6 transitive rule for MVDs X igtgt Y Yigtgt Z lXigtgt 22 Y f IR7 replication rule for FD t0 MVD X 7gt Y lXigtgt Y alle coalescence rule for PBS and MVDs le igtgt Y and there exists Wwith the properties that a W n Y is empty b W7gt Z and c Y QZ then X 7gt Z r4 5 ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chaptergl o Multivalued Dependencies and Fourth Normal Form 4 De nition 0 A relation schema R is in 4NF with respect to a set of dependencies F that includes functional dependencies and multivalued dependencies if for every nontrivial multivalued dependency X 7 Y in F2 X is a superkey Note F is the complete set of all dependencies functional or multivalued that will hold in every relation state r ofR that satis es F It is also called the closure of F EImasriiNavathe Fundamentals of Database Systems Fourth Edition chggmzruol Multivalued Dependencies and Fourth Normal Form 5 mm H WANF in i w tuples bTWo corresponding 4NF relations EMFLPROJECTS and EMFLDEPENDENTS ai EMF b EMILPROJECTS PNAME ENAME PNAME Smnh x Juhn 5min X Sm h v Anna 5mquot V 5mm x Anna 5W W Smnn v John BMWquot X 5mm w Jim Em V Brown x Jim BM Z Brown it Jim Erown 2 Jim EMPJEPENDENTS Brown w Juan Eiuwn x Joan DNAME Erwin v Juan Brown 2 Joan 5min Anna Blown w Bab Smilh John 5min x Bob Brown Jlm Brown it Bob Emwn Joan Emwn 2 Bob Ewwn Bob ElmasrilNavarhe Fundamentals of Database Systems Fourth Edition ChgmeLl 2 m E m d Multivalued Dependencies and Fourth Normal Form 6 Lossless Nonadditive Join Decomposition into 4NF Relations 1 PROPERTY LJl The relation schemas R1 and R2 form a lossless nonadditive join decomposition of R with respect to a set F of functional and multivalued dependencies if and only if R1 0 R2 7gtgt R1 R2 or by symmetry if and only if R1 0 R2 7gtgt R2 R1 ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chgpter 1173 may Multivalued Dependencies and Fourth Normal Form 7 Algorithm 115 Relational decomposition into 4NF relations with nonadditive join property 39 In ut A universal relation R and a set of functional and multivalued H P dependencies F a 1 SetD R 2 While there is a relation schema Q in D that is not in 4NF do I choose a relation schema Q in D that is not in 4NF nd a nontrivial MVD X igtgt Y in Q that Violates 4NF replace Q in D by two relation schemas Q Y and X U Y i ElmasrilNavathe Fundamentals of Database Systems Foulth Edition ChgptyerileM 4 Join Dependencies and Fifth Normal Form 1 De nition 0 A join dependency JD denoted by JDR1 R2 Rn speci ed on relation schema R speci es a constraint on the states r of R The constraint states that every legal state r of R should have a nonadditive join decomposition into R1 R2 R that is for every such r we have R10 7171120 711910 r Note an M VD is a special case of a JD where n 2 gt10 A join dependency JDR1 R2 Rn speci ed on relation schema R is a trivial JD if one of the relation schemas Ri in JDR1 R2 Rquot is equal to R ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 11735 M Join Dependencies and Fifth Normal Form 2 W O A relation schema R is in fth normal form 5N F or ii ProjectJoin Normal Form PJNF with respect to a a set F of functional multivalued and join dependencies if for every nontrivial join dependency JDR1 R2 Rn inlm that is implied by F every R1 is a superkey of R ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chapteril Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form CTne relation SUPPLYWitH l lO MVDS is ll l ANF but not ll l 5NF ifit has the JDR1 R2 R3 d Decomposing the relation SUPPLY into the 5NF relations R1 R2 arid R3 ill re quot5 m 1 HI R3 i WWE i i SM WW i i WW i WW Smilh Boll Smith PralX Ball F le 5min N Smnh PMY Nu WOW Bel W w my Walton Mm Walton Prqz Nm FiniZ Adansky Nail Adams PIOJX Nall FIOIX ElmasriINavalhev Fundamentals of Database Systems Fourth Edition Chgpmr 1147 mm Mesa 5 Inclusion Dependencies 1 m 0 An inclusion dependency RX lt SY between two sets of attributesiX of relation schema R and Y of relation schema Si speci es the constraint that at any speci c time when r is a relation state of R and s a relation state of S we must have 7ixrR Q ySS Note The subset relationship does not necessarily have to be a proper subset The sets of attributes on which the inclusion dependency is speci ediX of R and Y of Simust have the same number of attributes In addition the domains for each pair of corresponding attributes should be compatible EImasriINavathe Fundamentals of Database Systems Fourth Edition cha wringis m E i 93 Inclusion Dependencies 2 Objective of Inclusion Dependencies To formalize two types of interrelational constraints which cannot be expressed using FDs or MVDs Referential integrity constraints Class subclass relationships Inclusion dependency inference rules IDIR1 re exivity RX ltRX IDIR2 attribute correspondence If RX lt SY where X A1 A2 An and Y 32 BR and Ai Corresponds to Bi then RAi lt SBi for 1 s z39 s n IDIR3 transitivity IfRXlt SY and SY lt TZ thenRXlt TZ Hill 31 ElmasrilNavathe Fundamentals of Database Systems Fourth Edition Chapter 11739 V Haw 6 Other Dependencies and Normal Forms 1 Template Dependencies 0 Template dependencies provide a technique for representing constraints in relations that typically have no easy and formal de nitions The idea is to specify a templateior exampleithat de nes each constraint or dependency There are two types of templates tuplegenerating templates and constraintgenerating templates A template consists of a number of hypothesis tuples that are meant to show an example of the tuples that may appear in one or more relations The other part of the template is the template conclusion ElmasrilNavathe Fundamentals of Database Systems Foulth Edition Chapterilizio Other Dependencies and Normal Forms 2 Templates for some a R A 39 B 39 C D commontypes of H1 b 6 XlAB dependencies WM 31 b 22 cm a Template for m functional quot 2 V 2 dependency X gt Y bTemPIate forthe D a l A a t c D multivalued a b d XA BI dependency X gtgt Y mamas 1 1 1 I cTemplate forthe 5 1 2 52 quotc inclusion dependency a b d W50quot 1 I 2 I RXlt S al Di 1 d2 c RlAB CD SEiFG C x D Mm a in c 11 bier modusm C d 9 ElmasriNavathe Fundamentals of Database Systems Fourth Edition Chagterall ll Other Dependencies and Normal Forms 3 Templates for the constraint that an employee s salary must be less than the supervisors salary EMPLOYEE NAME SSN SALARY SUPERVISORSSN a b c d hypothesis e d f g conclusion c lt f ElmasriNavathe Fundamentals of Database Systems Fourth Edition Chagter 1142
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'