Chapter 5. Data and Knowledge Management
The difficulties of managing data:
1) The amount of data increases exponentially with time. Data are scattered throughout organizations and are collected by many individuals using various methods and devices.
2) Data are generated from multiple sources:
∙ Internal (company documents);
∙ Personal (opinions, experiences);
∙ External (government reports).
Data also come from the Web in the form of clickstream data: the data that visitors and customers produce when they visit a Web site and click on hyperlinks (for example, during a web purchase).
3) New sources of data (blogs, podcasts) are constantly being developed, and the data these technologies generate must be managed. Data also degrade over time (for example, customers change names or addresses).
4) Data are subject to data rot, which refers primarily to problems with the media on which the data are stored.
5) Data are easily jeopardized.
6) Organizations have developed information systems for specific business processes (such as transaction processing).
7) Other factors complicate data management:
∙ Federal regulations;
∙ Companies are drowning in data.
Data governance is an approach to managing information across an entire organization. It involves policies designed to ensure that data are handled in a certain, well-defined fashion.
Master data management – a process that spans all of an organization's business processes and applications. It provides the ability to store, maintain, and exchange master data.
Master data – a set of core data (such as customer, product, and vendor) that span the enterprise information systems. Master data are applied to multiple transactions and are used to categorize, aggregate, and evaluate the transaction data.
Transaction data – data generated and captured by operational systems that describe business activities, or transactions.
Each application required its own data, which were organized in a data file. Data file – a collection of logically related records. This file contains all of the data records the application requires.
Database systems minimize the following problems:
∙ Redundancy: the same data are stored in multiple locations;
∙ Isolation: applications cannot access data associated with other applications;
∙ Inconsistency: various copies of the data do not agree.
Database systems maximize the following:
∙ Security: databases have extremely high security measures (to minimize and deter attacks) to decrease the risk of losing data;
∙ Integrity: data meet certain constraints (for example, no alphabetic characters in an SSN field);
∙ Independence: applications and data are not linked to each other, so all applications can access the same data.
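The integrity point can be made concrete with a small sketch. It assumes Python's built-in sqlite3 module; the employee table, its columns, and the "exactly nine digits" rule are hypothetical choices for illustration, not something these notes specify.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        ssn    TEXT NOT NULL
               CHECK (length(ssn) = 9 AND ssn NOT GLOB '*[^0-9]*')
    )
    """
)

def try_insert(ssn):
    """Return True if the DBMS accepts the SSN, False if the constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee (ssn) VALUES (?)", (ssn,))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("123456789")   # digits only
rejected = try_insert("12345678A")   # contains an alphabetic character
```

Because the rule lives in the database rather than in each application, every application that writes to the table is held to the same constraint.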
Bit (binary digit) – the smallest unit of data a computer can process (a 0 or a 1).
Byte (a group of 8 bits) – represents a single character: a letter, number, or symbol.
Field – a logical grouping of characters into a word or a small group of words.
Record – a logical grouping of related fields (for example, the courses taken and the dates).
Data file (table) – a logical grouping of related records.
Database – a logical grouping of related files.
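The data hierarchy can be sketched with plain Python containers. This is only an illustration: the student fields and values are made up, and a real DBMS stores data very differently.

```python
# A character is stored as one byte, which is a group of 8 bits.
char = "A"
byte_value = char.encode("ascii")[0]   # numeric value of the byte
bits = format(byte_value, "08b")       # the same byte written as 8 bits

# Field -> record -> data file (table) -> database.
record = {"student_id": "1001", "name": "Ana", "course": "MIS 101"}  # related fields
data_file = [                                                        # related records
    record,
    {"student_id": "1002", "name": "Ben", "course": "MIS 101"},
]
database = {"enrollment": data_file}                                 # related files
```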
Database management system (DBMS)
DBMS – a set of programs that provide users with tools to create and manage a database. A DBMS also provides the mechanisms for maintaining the integrity of stored data, managing security and user access, and recovering information if the system fails.
The relational database model – based on the concept of two-dimensional tables. A relational database is generally not one big table (a flat file) that contains all records and attributes; it is usually designed as a number of related tables. Designing an effective database starts with a data model: a diagram that represents the entities in the database and their relationships.
Entity – a person, place, thing, or event about which information is maintained. Instance – an instance of an entity refers to each row in a relational table, which is a unique representation of the entity.
Attribute – each characteristic or quality of a particular entity. Primary key – every record in the database must contain at least one field that uniquely identifies that record so that it can be retrieved, updated, and sorted.
Secondary key – another field that has some identifying information but does not identify the record with complete accuracy (for example, a student's major). Foreign key – a field (or group of fields) in one table that uniquely identifies a row of another table; it is used to establish and enforce a link between two tables.
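The three kinds of keys can be sketched with Python's built-in sqlite3 module. The student and parking_permit tables and their columns are hypothetical, and the index on major is just one way a secondary key might be supported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,   -- primary key: identifies one record
        name       TEXT NOT NULL,
        major      TEXT                   -- secondary key: narrows, not unique
    );
    CREATE TABLE parking_permit (
        permit_id  INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES student(student_id)  -- foreign key
    );
    CREATE INDEX idx_student_major ON student(major);
    INSERT INTO student VALUES (1, 'Ana', 'MIS'), (2, 'Ben', 'MIS');
    INSERT INTO parking_permit VALUES (10, 1);
    """
)

# Primary key lookup returns exactly one record.
by_key = conn.execute("SELECT name FROM student WHERE student_id = 1").fetchall()
# Secondary key lookup can return several records.
by_major = conn.execute(
    "SELECT name FROM student WHERE major = 'MIS' ORDER BY name"
).fetchall()

# A permit that points at a nonexistent student violates the foreign key.
try:
    conn.execute("INSERT INTO parking_permit VALUES (11, 99)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```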
Big Data – a collection of data so large and complex that it is difficult to manage using traditional database management systems. Big Data is about predictions, which come from applying math to huge quantities of data to infer probabilities.
Defining Big Data
1) The technology research firm Gartner: Big Data consists of diverse, high-volume, high-velocity information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.
2) The Big Data Institute: Big Data exhibit variety and include structured, unstructured, and semi-structured data. In addition, Big Data:
∙ Are generated at high velocity with an uncertain pattern;
∙ Do not fit neatly into traditional, structured, relational databases.
By 2015, over 98% of the stored information in the world was digital and less than 2% was nondigital.
Big Data consists of:
1) Traditional enterprise data (Web store transactions);
2) Machine-generated/sensor data (manufacturing data);
3) Social data (customer feedback, comments, social media);
4) Images captured by billions of devices located around the world.
Characteristics of Big Data:
∙ Velocity: the rate at which data flow into an organization is rapidly increasing;
∙ Variety: traditional data formats tend to be structured and relatively well described, and they change slowly (financial market data). Big Data formats, by contrast, change rapidly.
Issues with Big Data
∙ Big Data can come from untrusted sources, both internal and external to the organization (e-mail, call center notes);
∙ Big Data is dirty: it contains inaccurate, incomplete, duplicate, or erroneous data (for example, misspelled words);
∙ Big Data changes, especially in data streams: organizations must be aware that data quality in an analysis can change, or the data themselves can change, because the conditions under which the data are captured can change.
Managing Big Data
Big Data makes it possible to do many things that were previously impossible (for example, preventing disease). To manage it, organizations should:
1) Integrate information silos into a database environment and develop data warehouses for decision making;
2) Take up the business of information management: making sense of their proliferating data.
Many organizations employ NoSQL (not only Structured Query Language) databases to process Big Data. These databases can manipulate structured as well as unstructured data, and inconsistent or missing data.
Putting Data to Use
∙ Making Big Data available – making it available to relevant stakeholders can help organizations gain value (for example, open data in the public sector). It can be used to create new businesses and solve complex problems.
∙ Enabling organizations to conduct experiments – for example, offering different "looks" of a Web page.
∙ Microsegmentation of customers – dividing them into groups that share one or more characteristics.
∙ Creating new business models – for example, using sensors to collect data on vehicle usage and improve driving.
∙ Analyzing more data – organizations do not have to rely as much on sampling.
Big Data in the functional areas of the organization
1) Human resources: Big Data recognizes that people bring different skills to the table and that there is no one-size-fits-all person for any job.
2) Product development: Big Data captures customer preferences and puts that information to work in designing new products.
3) Operations: for example, sensors that capture a truck's speed and location.
4) Marketing: using data to better understand customers and to target marketing efforts more directly.
5) Government operations: for example, recording water levels in rivers to prevent flooding.
Data warehouses and data marts
Data warehouse – a repository of historical data that are organized by subject to support decision makers in the organization.
Data mart – a low-cost, scaled-down version of a data warehouse that is designed for the end-user needs in a strategic business unit (SBU) or an individual department.
Characteristics of data warehouses and data marts:
1. Organized by business dimension or subject (customer, vendor). Business dimension – a data subject such as product, geographic area, or time period that represents the edges of the data cube.
2. Use online analytical processing (OLAP). Organizational databases typically use online transaction processing (OLTP), in which business transactions are processed online as soon as they occur; the objectives are speed and efficiency. Data warehouses and data marts, which are designed to support decision makers, use OLAP, which involves the analysis of accumulated data by end users.
3. Integrated: data are collected from multiple systems and then integrated around subjects.
4. Time variant: warehouses and marts maintain historical data (time is a variable). They store years of data.
5. Nonvolatile: users cannot change or update the data.
6. Multidimensional structure: a common representation is the data cube.
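A data cube can be sketched as a mapping from one value per business dimension to a measure. The sketch below uses only the Python standard library; the sales figures, the three dimensions (product, region, year), and the helper name slice_total are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, year, amount).
sales = [
    ("nut", "east", 2023, 50),
    ("nut", "west", 2023, 60),
    ("bolt", "east", 2023, 40),
    ("nut", "east", 2024, 70),
]

# Each cell of the cube is addressed by one value per business dimension.
cube = defaultdict(int)
for product, region, year, amount in sales:
    cube[(product, region, year)] += amount

def slice_total(product=None, region=None, year=None):
    """Sum the cells that match the given dimension values (None means 'all')."""
    wanted = (product, region, year)
    return sum(
        amount
        for cell, amount in cube.items()
        if all(w is None or w == c for w, c in zip(wanted, cell))
    )

nut_sales = slice_total(product="nut")                 # all regions, all years
east_2023 = slice_total(region="east", year=2023)      # all products
grand_total = slice_total()
```

Fixing one or two dimensions and summing over the rest is the kind of analysis of accumulated data that OLAP tools perform at much larger scale.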
A generic DW environment
1) Source systems that provide data to the warehouse or mart. The "organizational pain" these systems cause is what motivates a firm to develop its BI capabilities;
2) Data-integration technology and processes that prepare the data for use: extract the data, transform them, and then load them into a data mart or warehouse (ETL, or data integration);
3) Different architectures for storing data, such as a central enterprise data warehouse (data are stored in the warehouse, accessed by all users, and represent the single version of the truth);
4) Different tools and applications for the variety of users;
5) Metadata and data-quality and governance processes that ensure that the warehouse or mart meets its purposes. Metadata – data about the data.
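The extract, transform, load sequence in item 2 can be sketched as three small functions. This is a toy illustration under stated assumptions: the two source systems, their field names, and the dim_customer table are hypothetical, and the warehouse is just an in-memory SQLite database.

```python
import sqlite3

# Hypothetical extracts from two source systems with inconsistent formats.
crm_rows = [{"id": "1", "name": " ana ", "country": "us"}]
billing_rows = [
    {"cust": 2, "full_name": "BEN", "ctry": "US"},
    {"cust": 1, "full_name": "Ana", "ctry": "US"},  # same customer as in the CRM
]

def extract():
    """Pull rows from each source into one common shape."""
    for r in crm_rows:
        yield r["id"], r["name"], r["country"]
    for r in billing_rows:
        yield r["cust"], r["full_name"], r["ctry"]

def transform(rows):
    """Standardize types and formats, keeping the first row seen per customer."""
    seen = {}
    for cust_id, name, country in rows:
        key = int(cust_id)
        seen.setdefault(key, (key, name.strip().title(), country.upper()))
    return sorted(seen.values())

def load(rows):
    """Write the cleaned rows into a warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

warehouse = load(transform(extract()))
loaded = warehouse.execute(
    "SELECT * FROM dim_customer ORDER BY customer_id"
).fetchall()
```

After loading, both source systems' customers appear once, in one consistent format, which is the "single version of the truth" idea in item 3.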
Limitations of data warehouses:
∙ Can be very expensive to build and maintain;
∙ Incorporating data from obsolete mainframe systems can be difficult and expensive;
∙ People in one department may be reluctant to share data with other departments.
Knowledge management – a process that helps organizations manipulate important knowledge that comprises part of the organization's memory.
Intellectual capital (knowledge) – information that is contextual, relevant, and useful. It can be utilized to solve a problem.
Explicit knowledge deals with more objective, rational, and technical knowledge. It consists of policies, procedural guides, and reports. It is knowledge that has been codified in a form that can be distributed to others or transformed into a process or a strategy.
Tacit knowledge – the cumulative store of subjective or experiential learning. It consists of an organization's experiences, insights, expertise, and culture. It is imprecise and costly to transfer, highly personal, and difficult to formalize or codify.
Knowledge management systems (KMSs) refer to the use of modern information technologies to systematize, enhance, and expedite intrafirm and interfirm knowledge management. They help an organization make the most productive use of its knowledge.
Benefit: KMSs make best practices (the most effective and efficient ways of doing things) available to a wide range of employees.
The KMS Cycle:
1) Create knowledge;
2) Capture: new knowledge must be identified as valuable and represented in a reasonable way;
3) Refine: knowledge must be placed in context so that it is actionable;
4) Store: knowledge must be stored in a reasonable format;
5) Manage: knowledge must be kept current;
6) Disseminate: knowledge must be made available in a useful format.
Fundamentals of Relational Database Operations
SQL (Structured Query Language) – the most popular query language used for interacting with a database. It allows users to perform complicated searches by using relatively simple statements or key words:
SELECT – to choose the desired attributes;
FROM – to specify the table to be used;
WHERE – to specify the conditions to apply in the query.
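The three key words fit together as in the sketch below, which runs a query through Python's built-in sqlite3 module. The student table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL
    );
    INSERT INTO student VALUES
        (1, 'Ana', 'MIS', 3.8),
        (2, 'Ben', 'Finance', 3.1),
        (3, 'Cy',  'MIS', 2.9);
    """
)

query = """
    SELECT name, gpa                  -- the desired attributes
    FROM student                      -- the table to use
    WHERE major = ? AND gpa >= ?      -- the conditions to apply
    ORDER BY name
"""
# The ? placeholders are filled from the tuple, in order.
honor_roll = conn.execute(query, ("MIS", 3.5)).fetchall()
```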
QBE (query by example) – the user fills out a grid or template (form) to construct a sample or a description of the data desired.
Entity- Relationship Modeling (ER)
ER modeling – uses ER diagrams, which consist of entities, attributes, and relationships; business rules are used to identify them properly. ER diagrams allow designers to communicate with users throughout the organization to ensure that all entities and the relationships among entities are represented.
Business rules – precise descriptions of policies, procedures, or principles in any organization that stores and uses data to generate information.
The data dictionary – provides information on each attribute, such as its name; whether it is a key, part of a key, or a non-key attribute; the type of data expected; and valid values.
Relationships illustrate an association between entities. Degree of a relationship – the number of entities associated with a relationship.
A unary relationship – an association is maintained within a single entity.
A binary relationship – two entities are associated.
A ternary relationship – three entities are associated.
Connectivity – describes the relationship classification.
Cardinality- the maximum number of times an instance of an entity can be associated with an instance in the related entity.
Connectivity and cardinality are established by the business rules of a relationship. The cardinality of a relationship can be:
∙ Mandatory single
∙ Optional single
∙ Mandatory many
∙ Optional many
Entities have attributes, or properties, that describe the entity’s characteristics.
Three types of binary relationships:
One-to-one (1:1) – a single-entity instance of one type is related to a single-entity instance of another type (for example, student-parking permit).
One-to-many (1:M) – represented by the class-professor relationship.
Many-to-many (M:M) – represented by the student-class relationship. A junction (bridge) table is used so that the many-to-many relationship becomes two one-to-many relationships.
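A junction table for the student-class relationship might look like the sketch below, written against Python's built-in sqlite3 module. The table and column names (student, class, enrollment) and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class   (class_id   INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (student, class) pair.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        class_id   INTEGER NOT NULL REFERENCES class(class_id),
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO class VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
    """
)

# One student, many enrollment rows (first one-to-many relationship).
classes_per_student = conn.execute(
    "SELECT student_id, COUNT(*) FROM enrollment "
    "GROUP BY student_id ORDER BY student_id"
).fetchall()
# One class, many enrollment rows (second one-to-many relationship).
students_per_class = conn.execute(
    "SELECT class_id, COUNT(*) FROM enrollment "
    "GROUP BY class_id ORDER BY class_id"
).fetchall()
```

The composite primary key on enrollment prevents the same student from being enrolled in the same class twice.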
Normalization and Joins
Normalization- a method for analyzing and reducing a relational database to its most streamlined form to ensure minimum redundancy, maximum data integrity, and optimal processing performance.
Functional dependencies – a means of expressing that the value of one particular attribute is associated with a specific single value of another attribute.
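A functional dependency can be checked against sample data with a few lines of Python. The helper name holds and the orders data are hypothetical; note that a check like this can only show that a dependency is violated in the sample, not prove that it is a true business rule.

```python
def holds(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

orders = [
    {"order_id": 1, "customer_id": 7, "customer_name": "Ana"},
    {"order_id": 2, "customer_id": 7, "customer_name": "Ana"},
    {"order_id": 3, "customer_id": 8, "customer_name": "Ben"},
]

id_determines_name = holds(orders, "customer_id", "customer_name")
name_determines_order = holds(orders, "customer_name", "order_id")
```

Here customer_name depends on customer_id rather than on the order, which is the kind of finding that leads normalization to move customer attributes into their own table.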
A join operation combines records from two or more tables in a database to obtain information that is located in different tables.
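A join can be sketched with Python's built-in sqlite3 module. The professor and class tables below are hypothetical; the join matches each class to its professor through the shared prof_id field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE professor (prof_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (
        class_id INTEGER PRIMARY KEY,
        title    TEXT,
        prof_id  INTEGER REFERENCES professor(prof_id)
    );
    INSERT INTO professor VALUES (1, 'Lee'), (2, 'Ortiz');
    INSERT INTO class VALUES
        (10, 'Databases', 1), (20, 'Networks', 1), (30, 'Ethics', 2);
    """
)

# Class titles live in one table and professor names in another;
# the join combines them on the matching prof_id values.
rows = conn.execute(
    """
    SELECT class.title, professor.name
    FROM class
    JOIN professor ON class.prof_id = professor.prof_id
    ORDER BY class.title
    """
).fetchall()
```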