CIS499Database and Data Warehousing Design
CIS499Database and Data Warehousing Design
Popular in Course
verified elite notetaker
Popular in Department
This 17 page Study Guide was uploaded by tophomework Notetaker on Sunday November 15, 2015. The Study Guide belongs to a course at a university taught by a professor in Fall. Since its upload, it has received 17 views.
Reviews for CIS499Database and Data Warehousing Design
Report this Material
What is Karma?
Karma is the currency of StudySoup.
You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!
Date Created: 11/15/15
Running head: DDWD 1 Database and Data Warehousing Design Tony Howell Jr. CIS499 February 9, 2013 Dr. Richard Burroughs DDWD 2 Database and Data Warehousing Design Reference number: 01 DDWD 3 Document Control Change Record Date Author Version Change Reference 02/9/2013 Tony 1 Original Document Howell Jr. Reviewers Name Position Tony Howell Jr. (CIO) Chief Information Officer DDWD 4 Table of Contents Document Control…………………………………………………………………..3 1. Introduction……………………………………………………………………...5 2. Purpose…………………………………………………………………………..6 2.1. Goal…………………………………………………………………………7 3. Data Schema……………………………………………………………………..9 4. EntityRelationship (ER) Diagram…………………………………………….12 5. Data Flow Diagram……………………………………………………………..13 EntityRelationship (ER) Diagram (Microsoft Visio)………………Appendix (A) Data Flow Diagram (DFD) (Micro Visio).…………………………..Appendix (B) DDWD 5 1. Introduction: PAKIT, Inc. uses today’s state of the art technology to collect data by Web analytics and key operational system technology to provide its customers with the best product. The company is expanding its ability into the cloud computer technology and SoftwareasaService (SaaS). With the growth of the company’s abilities within the next 18 months, the additional two floor construction and laboratory installations or projected. PAKIT, Inc. ten (10) terabyte data warehouse is expected to grow by 20% each year. The data warehousing concept is the company’s best practices which will aid immensely in its requirements. This document details the lowlevel design for the Data Warehouse Management Database component of the PAKIT, Inc. system. Datasets will be stored within a flat file (directory) system on either the PAKIT, a central management system database is required to store the actual data location information as described by the PAKIT, Inc. mission statement. An EntityRelationship (ER) Diagram is provided in appendix (A) that will explain by Bentley, 2007, “depict data in terms of the entities and relationships described by the data.” (p.271). Data Flow Diagram (DFD) provided in appendix (B); described by Bentley, 2007, “depicts the existing and/or proposed processes in a system along with their inputs, outputs, and data. The models show the flow of data between and through processes and show the places where data is stored.” (p.162) The database schema is the physical model or what may be called the blue print of the database developed by PAKIT, Inc. for its analytics, computer cloud and SaaS operations. DDWD 6 2. Purpose: The business requirement of PAKIT, Inc. and subject areas in the fields on data analytics, computer cloud system and SaaS; has posed a requirement to implement the growth of its database warehouse of its ten (10) terabyte data. This document will discuss the business intelligence (BI) needed to accomplish the growth process. Data warehouses aid business decisions by collecting, consolidating, and organizing data for reporting and analysis with tools such as online analytical processing (OLAP) and data mining. As discussed in the project plan the company uses the Geographic Information Systems (GIS). GIS data entities are associated with their geospatial data representations (i.e. geometry and attributes. They may also be associated with media (e.g. pictures, movies, or sound files) and any dynamic editorial event information. With the development of the database warehouse (DW) it will provide greater performance, greater capacity and greater divergence. The methods used to model the data in a data warehouse and data mart structure data to support ad hoc queries, different types of queries, and high volumes of data access. The data in the company’s operational systems is usually highly normalized to support high volumes of updates and data integrity. The data in the data warehouse will be highly denormalized to support high volumes of access. A data warehouse/data mart has very few (if any) updates. The data in a data warehouse will be updated by refreshing the data on a specific time schedule. PAKIT, Inc., will be expanding is abilities to clients in all facets and with this growth the needs of crossfunctional organizational structures must be explored. Then needs which must be explored are described by Lane, 2011, “∙ Exploratory and confirmatory analysis of clinical DDWD 7 trial data and ∙ Reporting on crossstudy data that required integration across preclinical, clinical, and safety functions.” (p. 138) 2.1 Goal: The goal for PAKIT, Inc. regarding the expansion of its database warehouse are basic and are similar in a smaller scale as addressed by Lane, 2011, “To deliver an overarching strategy that will consolidate and harmonize the complex data warehousing, BI, and DI landscape without compromising the underlying information needs. To define future architecture vision and create a longterm road map that will harness the value of information the corporation possesses and leverage the corporation’s investment in the operational systems. To embrace a federated approach that will promote integration of various DWs in a coherent manner in alignment with the functional area road maps and portfolios. The goal was not to pursue a bigbang approach (e.g., creation of a single, monolithic, enterprise wide DW) but rather to adopt an evolutionary approach. To deliver the guidance for the decision makers and governing bodies on how to respond to new data warehousing. BI. And DI project requests, To deliver the key recommendations that, if approved, would lead to concrete actions whereby the aforementioned road map would be executed in the form of creation. Execution. And monitoring of specific projects. Programs. And/or foundational capabilities.” (p.139140) By using these goals as the basis for enhancing the company’s DW, and assists with other goals such as: DDWD 8 1. The data warehouse provides access to corporate or organizational data. – Managers, Supervisors and analysts will be able to link to the data warehouse by use of their personal computers and this connection must be immediate, on demand, and with high performance. 2. The data in the warehouse is consistent. – The information stored in the DW will be consistently available for any analytic, cloud, or SaaS inquire. 3. The data in the warehouse can be combined by every possible measure of the business From the E/R organization of typically Operational Data and a dimensional data model. 4. The data warehouse is not just data, but is also a set of tools to query, analyze, and to present information. Geographic Information Systems (GIS), Strength, Weakness, Opportunity, Threat (SWOT), etc. 5. The data warehouse is where used data is published. – Data is assembled from a variety of information sources in the organization, cleaned up, quality assured, and then released only if it is fit for use. A data quality manager is required assignment at the data warehouse to be responsible for the content and quality of the publication and is identified with the deliverable. 6. The quality of the data in the data warehouse is the driver of business reengineering. – Data quality in the DW can convey the infractions of an organization’s data which can stem the reengineering for efficient BI. DDWD 9 3. Data Schema: The Data Schema of PAKIT, Inc. database warehouse (DW) is two key types of data entities: fact entities, and reference entities. Fact entities are clarified as for each business query, identify the facts about the business query that the user is interested in (i.e., wants to analyze). The following information is gathered: The facts required to support the business queries, The lowest level of detail that must be stored to support the business queries. This item helps identify potential issues pertaining to performance, The data elements that is required, The length of time the detailed information is required. Now these facts are stored on fact tables and a star schema is used to model fact tables. Reference entities are clarified as reference data (i.e., dimensions) provides context. A fact requires at least two dimensions, and probably more, to put the metric into context. Dimensions are used to give meaning to facts by such: customer, date, time, region, geography, product, and account. Identify the reference data essential to label the facts. The following information can be gathered: The dimensions of each fact table, The key of each dimension (e.g., customer identification number is the key of the customer dimension), The data elements required (e.g., the customer dimension may include customer name, address, etc.). Dimensions are shared across subject areas (within this project and other data warehousing projects), therefore, it is important to standardize dimensions across the organization. DDWD 10 The rules and guidelines are express in Bentley, 2007, “The transformation of the logical data model into a physical relational database schema is governed by some fairly generic rules and options. These rules and guidelines are summarized as follows: 1. Each fundamental, associative, and weak entity is implemented as a separate table. Table names may have to be formatted according to the naming rules and size limitations of the DBMS. For example, a logical entity named MEMBER ORDERED PRODUCT might be changed to a physical table named tblMemberOrdProd. The prefix and compression of spaces is consistent with contemporary naming standards and guidelines in modern programming languages. a. The primary key is identified as such and implemented as an index into the table. b. Each secondary key is implemented as its own index into the table. c. An index should be created for any nonkey attributes that were identified as subsetting criteria requirements (Chapter 8). d. Each foreign key will be implemented as such. The inclusion of these foreign keys implements the relationships on the data model and allows tables to be joined in SQL and application programs. e. Attributes will be implemented with fields. These fields correspond to columns in the table. The following technical details must usually be specified for each attribute. (These details may be automatically inferred by the CASE tool from the logical descriptions in the data model.) Field names may have to be shortened and reformatted according to DBMS constraints and internal rules. For example, in the logical data model, most attributes might be DDWD 11 prefaced with the entity name (e.g., MEMBER NAME). In the physical database, we might simply use NAME.” (p. 530) Bentley, 2007, “ i. Data type. Each DBMS supports different data types and terms for those data types. Figure 148 shows different physical data types for a few different database management systems. ii. Size of the field. Different DBMSs express precision of real numbers differently. For example, in SQL Server, a size specification of NUMBER (3,2) supports a range from –9.99 to 9.99. iii. NULL or NOT NULL. Must the field have a value before the record can be committed to storage? Again, different DBMSs may require different reserved words to express this property. By definition, primary keys can never be allowed to have NULL values. iv. Domains. Many database management systems can automatically edit data to ensure that fields contain legal data. This can be a great benefit to ensuring data integrity independent from the application programs. If the programmer makes a mistake, the DBMS catches the mistake. But for DBMSs that support data integrity, the rules must be precisely specified in a language that is understood by the DBMS. v. Default. Many database management systems allow a default value to be automatically set in the event that a user or programmer creates a record containing fields with no values. In some cases, NULL serves as the default. vi. Again, many of the above specifications were documented as part of a complete logical data model. If that data model was developed with a CASE tool, the DDWD 12 CASE tool may be capable of automatically translating the data model into the physical language of the chosen database technology. 2. Supertype/subtype entities present additional options as follows: a. Each supertype and subtype can be implemented with a separate table (all having the same primary key). b. Alternatively, if the subtypes are of similar size and data content, a database administrator may elect to collapse the subtypes into the supertype to create a single table. This presents certain problems for setting defaults and checking domains. In a highend DBMS, these problems can be overcome by embedding the default and domain logic into stored procedures for the table. c. Alternatively, the supertype’s attributes could be duplicated in a table for each subtype. d. Some combination of the above options could be used.” (p. 532) 4. EntityRelationship (ER) Diagram: EntityRelationship Diagram is defined by Bentley, 2007, “A data model utilizing several notations to depict data in terms of the entities and relationships described by that data.” (p.271) The ER) Diagram can clearly communicate complex data relationships in PAKIT, Inc. It can model the company and its operations in terms of data types or data groupings that the company handles and the ways in which they are related. It represent the logical "what" of the organization's use of data but not the physical "how,". It will obtain detailed data requirements to design a database. An EntityRelationship (ER) Diagram (or ER Model) visually depicts an organization's entities, the entities' relationships to each other, and the business rules (i.e., DDWD 13 cardinality and dependency) associated with the relationships. The ER Diagram is the picture used to represent and test the knowledge obtained from Data Modeling. EntityRelationship Diagram DDWD 14 Appendix (A) 5. Data Flow Diagram: DDWD 15 Described by Bentley, 2007, “A data flow diagram (DFD) is a tool that depicts the flow of data through a system and the work or processing performed by that system. Synonyms include bubble chart, transformation graph, and process model. We’ll also introduce a DFD planning tool called decomposition diagrams. Finally, we’ll also study context diagrams, a process like model that actually illustrates a system’s interfaces to the business and outside world, including other information systems.” (p. 317) The symbols are described by Bentley, 2007, “There are only three symbols and one connection: • The rounded rectangles represent processes or work to be done. Notice that they are illustrated in the PROCESS color from your information system framework. • The squares represent external agents—the boundary of the system. Notice that they are illustrated in the INTERFACE color from your information system framework. • The openended boxes represent data stores, sometimes called files or databases. If you have already read Chapter 8, these data stores correspond to all instances of a single entity in a data model. Accordingly, they have been illustrated with the DATA color from your information systems framework. • The arrows represent data flows, or inputs and outputs, to and from the processes.” (p.317) DDWD 16 Data Flow Diagram DDWD 17 Appendix (B) Reference Bentley, W., 2007, “Systems Analysis and Design Methods”, 7 Edition McGraw Hill/Irwin Inc., New York, NY Lane, D., 2011, “The Chief Information Officer’s Body of Knowledge People, Process, and Technology”, John Wiley & Sons, Inc, Hoboken, NJ
Are you sure you want to buy this material for
You're already Subscribed!
Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'