Midterm Exam Review
∙ Components of a DB System
− physical things (ex: computer)
− It contains: operating system software, DBMS, application programs that access with DBMS(plus utility software for backups)
− database designer (who does the design modeling), system and database admins, systems analysist and programmers, and users
− Business rules, the instructions/rules/polices that dictate the design and use of the system.
− collection of facts that are stored in your database
∙ Database systems are created and managed at different levels of complexity − Some maybe small/ some maybe big (ex. for 1 person or whole industry) ∙ Database solutions must be cost-effective, tactic and strategic Don't forget about the age old question of Who did Rineheart traveled to the Rockies with?
∙ Database technology already in use affects selection of a database system
∙ Functions of a DBMS
− It should be transparent to the end user which means, it hides the complexity from the user.
∙ Data dictionary management
− Stores the definitions of data elements and their relationships.
− DBMS looks up data components structures and relationships. It automatically records any structural changes in its data dictionary. Also it provides data abstractions and removes structural and data dependency We also discuss several other topics like How much greater or smaller is the output amplitude compared to the input amplitude?
∙ Data storage management
− Create and manage the data structure that are required for data storage. (ex. create tables, columns)
− Also create data entry forms, reports, screen definitions, query
− Performs fine tuning for you so database perform more efficiently. (ex. create index)
∙ Data transformation and presentation
− Transforms from physically to logical and vis-versa. Transforms data to conform to the required data structures.
∙ Security management
− DBMS creates security system that enforces user’s data security and privacy. ∙ Multiuser access control
− multiple concurrent users
− Takes complex algorithms to maintain the integrity of the data while allowing for concurrent access. Don't forget about the age old question of What is Polyploidy?
∙ Backup and recovery management
− Backups your data - makes copy of your data and restore if it something is lost.
− Backup and data recover to ensure data safety and integrity. Critical to preserve the integrity of the data
∙ Data integrity management
− DBMS force and enforce integrity rules. Minimizes redundancy, max consistency.
− The integrity rules are stored in data dictionary.
− Referential integrity: Enforces data integrity based on the relationships − Entity integrity: Primary key’s has to be different from once that already exist. ∙ Database access languages and APIs
− SQL is default query language nowadays. SQL is supported by majority of DBMS vendors. Query languages are non-procedural.
− API (Application Program Interface): Programs that allow 1 application to communicate with another application.
∙ Database communication interfaces
− DBMS accepts end-user requests. You can go directly in DBMS and create different quarries and reports. Also allows you to connect to 3rd party application We also discuss several other topics like How are there so many different life forms
∙ Different types of database systems (by location and number of users) − You have different vendors for database and you also have different types of databases.
∙ Databases classifications
o Number of users
o Database location(s)
o Expected type and extent of use
∙ Number of users
− Single-user database: only one user at a time. (A.K.A. desktop database) − Multi-user database: multiple users using the database at the same time. − Work group database: a department or one small organization (less than 50 people)
− Enterprise database: multiple department, large companies (over 50 people) If you want to learn more check out
What is Gene?
− This is where the data is actually stored.
− Physical location is where data is stored (ex: in disk or the actual location) − Logical location is how the user sees it.
− You can have a centralized location: all data at a single cite physically at one spot
− Decentralized location: data is distributed physically at different areas but logically viewed as single location.
∙ Type and Extent of Use
− Extent (A.K.A. transactional or productional database): operational database which support the companies day to day operations (ex: payroll, inventory). − Data warehouse and decision support: for tactical or strategic decision making (ex: see what this company produced in long term and see if they should continue or not). All the long term data is stored here. If you want to learn more check out What are the causes of economic growth?
− We can store different data in these databases.
− Unstructured is data that is in its original state.
− Structure is data resulting from formatting. (we usually don’t store data like this)
− Semi-structured is in between both unstructured and structured. Data processed to some extent.
− Ex: some calculated that have total (but we can calculate the total anytime so why store data in this form?) or if we add “$” to money data (but it is not necessary to add it and store that data)
− XML is Extensible Markup Language.
− It represent data elements in textual format.
− XML format supports semi-structured XML data.(ex: you can use customized tag like <title><text>)
∙ Basic concepts about the relational database model
– Entity: Primary key’s has to be different from once that already exist – Attribute: (Property/field). Refers to the initial filing system. Specific characteristic of it.
- A.K.A relations.
- Relational model is based on collection of table/relations
- Each table represents an entity.
- Each table is matrix of rows and columns.
- This model greatly reduced the amount of duplicate data.
– Row: Each row in a table is referred as a tuple
– Column: Each rows and columns intersect and data can be stored in those. – Attribute domain: set of values allowed in an attribute
∙ Types of keys
– Primary: 1 or more attributes that uniquely identifies each row in a table - Natural: key that is formed of attributes that already exist in the real world
- Surrogate: candidate key
- Entity Integrity: each row can be uniquely identified. Each row much have a unique identifier. provided by primary keys
− Foreign: 1 or more attributes that value matches the primary key in another table. 1 or more attributes that reference the entire primary key of another table
- Referential Integrity: if a foreign key contains a value, it must refer to an existing a valid row in a table/relation. Provided by foreign keys
− Super: A key that unique identifies each row. A category of a key. All primary key, but not all superkey is a primary key.
– Candidate: a super key without unnecessary attributes
– Secondary: used for data retrieval purposes. Don’t have to be unique
∙ Union compatibility: types of columns must match
∙ Relational database operators (definitions and union compatibility requirements) – Union: add the rows together into tables. They must be union compactable (they must have the same types of columns)
– Intersection: must be union compactable (still same attribute types even if its named differently). Intersect to see what is in both tables. Yield a result in what exists in both tables
– Difference: rows of table A – rows of table B. result will yield, whats in 1st table but not in 2nd table. Have to be union compactable
∙ Product: multiples tables together to get a Cartesian product. Doesn’t have to be union compactable
∙ Selection: A select retrieves horizontal subset of reference tables. Gives u all the columns but narrow down the rows
– Projection: yield vertical substance of the table
– Joins :
- Natural join: link the tables together by selecting rows by common values in their common attributes
- Equijoin: link the tables based on condition of equality, comparing specific tables
- Theta join: linking tables bases on any comparison operator other than = sign
- Outer join: links tables and retains the matched pairs and any un matched values from 2nd table are left null
– Divide: 2 column table by 1 column table, result will yield any value in the unmatched column that appears all the values in the 1 column table
∙ Data dictionary
− Provides detailed accounting of all tables found within the user/designer created database
− Contains (at least) all the attribute names and characteristics for each table in the system
− Contains metadata: data about data
∙ System catalog
− Contains metadata
− Detailed system data dictionary that describes all objects within the database
∙ Data models
– Conceptual: identifies the highest-level relationships between the different entities. No attributes or primary keys identified.
– Internal: representation of the database as seen by the DBMS
– External: Represent the user’s view of the database
– Physical: lowest level of abstractions. This is how the data is stored. ∙ E-R modeling concepts
- Weak: exist when PK of related entity doesn’t contain PK of parent entity 1. it has to be existence dependency
2. PK that is partially or totally derived from PK of parent entity
▪ Also known as bridge entities
▪ Used to implement M:N relationships
▪ Composed of primary keys of each of the entities to be connected ▪ May also contain additional attributes that play no role in
- Simple: can’t be sub divided. Its an atomic in nature
- Composite: it can be sub divided(ex: persons name – 1st name, middle name, last name)
- single valued: can only have 1 value, even over time it wont change (ex: DOB)
- Cardinality: Expresses minimum and maximum number of entity occurrences associated with one occurrence of related entity (ex: credit hour class: 1,21)
- Connectivity: Describes the relationship classification
▪ Optional: one entity occurrence doesn’t require corresponding
entity occurrence( dependency)
▪ Mandatory: one entity occurrence requires a corresponding entity occurrence
- Existence dependency: A semantic control that indicates that an instance of one entity cannot exist unless an instance of a related entity also exists ∙ Normalization: the process for evaluating and correcting table structures to minimize data redundancy
– Functional dependency: a constraint that describes the relationship between attributes in a relation
– Full functional dependency: full functional dependency occurs when you already meet the requirements for a functional dependency and the set of attributes on the left side of the functional dependency statement cannot be reduced any farther
– Partial dependency: none key attribute can be determined by part of primary key (only applies to composite key)
– Transitive dependency: exist when a non-key attribute can be determined by another non key attribute
∙ Forms of normalization
- Step 1: Eliminate the Repeating Groups
- Eliminate nulls: each repeating group attribute contains an appropriate data value
- Step 2: Identify the Primary Key
- Must uniquely identify attribute value
- New key must be composed
- Step 3: Identify All Dependencies
- Dependencies are depicted with a diagram
- Step 1: Make New Tables to Eliminate Partial Dependencies
- Write each key component on separate line, then write original
(composite) key on last line
- Each component will become key in new table
- Step 2: Reassign Corresponding Dependent Attributes
- Determine attributes that are dependent on other attributes
- At this point, most anomalies have been eliminated
- Step 1: Make New Tables to Eliminate Transitive Dependencies - For every transitive dependency, write its determinant as PK for new table - Determinant: any attribute whose value determines other values within a row
- Step 2: Reassign Corresponding Dependent Attributes
- Identify attributes dependent on each determinant identified in Step 1 - Identify dependency
- Name table to reflect its contents and function
∙ Denormalization – what and why?
− Creation of normalized relations is important database design goal − Processing requirements should also be a goal
− If tables are decomposed to conform to normalization requirements: − Number of database tables expands
∙ SQL: Structural Query Lang. default database lang. its non-procedural lang. – DDL: Data Definition Lang (you can definite the structures)
– DML: data manipulation lang. (add, modify, delete, retrieve data) ∙ SQL Statements
– CREATE DATABASE: Create and reserve where the database will be stored – Physical files: that will create a physical files in a disk
– CREATE TABLE: contain the data inside the database
– ALTER TABLE (add columns, delete columns, add constraints) – DROP TABLE: drop table previously stored in the system
– INSERT: Inserts multiple rows from another table
– UPDATE: Modify data in a table
– DELETE: Deletes a table row
– SELECT: Used to list contents of table
– COMMIT: saving to disk. use after inserts, updates or delete – ROLLBACK: undo
– DISTINCT: used in conjunction with SELECT statement to eliminate all the duplicate records and fetching only unique records.
– ORDER BY: used to sort the data in ascending or descending order
Note: Exam will be closed book, closed notes. Questions will be True/False, Multiple Choice, and Short Answer.