Data and Information Lecture Notes
These 2-page class notes were uploaded by Mei Lin on Friday, March 25, 2016. They belong to INFO-I101 (Introduction to Informatics and Computing) at Indiana University, taught by Nina Onesti and Dan Richert in Spring 2016.
Date Created: 03/25/16
Lecture Notes
Monday, March 28, 3:44 PM

• Pros and cons of data collection
  ○ Pros: EX: Kroger's reward card for discounts
  ○ Cons: EX: lots of knowledge is collected about the user --> ads targeted to you specifically
• Where is the data?
  ○ Data can be altered or can be wrong
  ○ Security and privacy issues on data
• Information Hierarchy
  ○ Data
    ▪ Difficult to not be in any database (social media, personal technology, etc.)
    ▪ 3 Types
      □ Unstructured
      □ Semi-structured
      □ Structured
  ○ Information
    ▪ "Processed" data
      □ Organized, selected, analyzed, mined
    ▪ Meaningful to recipients (companies, mostly)
  ○ Knowledge
    ▪ 2 Types
      □ Implicit/Tacit
        ▪ Implied through the data. Almost guessing
      □ Explicit
        ▪ EX: directions on a map
        ▪ Visualization
          ◊ Charts, maps, and Google Charts
• Prior to Databases
  ○ Pre 1970
    ▪ Had to go directly to the source to find out their data
    ▪ Had to gather data manually
  ○ 1970 onwards
    ▪ E.F. Codd -- The Relational Model
      □ Created the database concept where elements are linked together
• KDD (Knowledge Discovery in Databases)
  ○ Average life of a Fortune 500 company used to be ~45 years
  ○ Businesses in Europe and Japan lasted ~12.5 years
  ○ CAUSE: Collected data was misused because the company didn't understand it --> business fails due to mistakes
  ○ Who uses this?
    ▪ Stores like Wal-Mart --> use collected data to find out which products sell best at various times --> stock up on those products so they don't run out
    ▪ Long distance companies
    ▪ Credit card companies --> track customers' purchases to check for fraud
    ▪ Drug manufacturers
    ▪ Sports teams --> use stats to improve the team
  ○ Incentives for KDD
    ▪ Money --> increase profits
    ▪ Retain customers --> build loyalty
    ▪ New markets
    ▪ Product development
    ▪ Forecasting
  ○ Non-Linear KDD Process
    ▪ Problem Statement
    ▪ Get data
    ▪ Clean the data
      □ Null values
        ▪ EX: customer refuses to give phone number/email/zipcode to the company during a shopping transaction
      □ Duplicate data
      □ Known "wrong" data
        ▪ EX: a random zipcode is filled in when the customer doesn't want to give theirs during a transaction
      □ Outliers
        ▪ Data that doesn't make sense
    ▪ Transform the data
      □ Discretize
        ▪ Grouping similar answers
        ▪ EX: age 18-25, 26-35, 36-45, etc.
      □ Change tuple format
        ▪ Tuple: a record or entry in a database
    ▪ Mine the data
      □ 2 examples of data mining algorithms
        ▪ Association rules
          ◊ Used to determine what "things" indicate the presence of other "things"
          ◊ Also known as Market Basket Analysis when applied to purchases
          ◊ EX: people who bought bread usually also bought milk and eggs; people who bought diapers also bought beer in the same transaction
          ◊ Grocery store layouts use this data to separate the items usually bought together, making people walk around the entire store to encourage impulse buys
        ▪ Classification
          ◊ Creates a decision tree from data already in the database
          ◊ New instances are then placed into a category based on their attributes
          ◊ NOTE: Categories/trees have 2 properties
          ◊ Categories must be:
            ▪ Mutually exclusive
              – An instance can only belong to a single category
            ▪ Collectively exhaustive
              – Everything must belong to a category
              – There cannot be an instance that does not belong to any category
    ▪ Knowledge is produced
    ▪ Take action
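
The "Clean the data" and "Transform the data" steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names, the placeholder zipcode list, the 0-120 age range, and the policy of dropping bad rows (rather than repairing them) are all hypothetical and do not come from the lecture.

```python
from typing import Optional

# Hypothetical customer records; every value here is made up for illustration.
records = [
    {"id": 1, "age": 22, "zipcode": "47401"},
    {"id": 2, "age": 31, "zipcode": None},      # null value: customer declined
    {"id": 1, "age": 22, "zipcode": "47401"},   # duplicate of the first record
    {"id": 3, "age": 40, "zipcode": "00000"},   # known "wrong" filler zipcode
    {"id": 4, "age": 230, "zipcode": "47408"},  # outlier: age makes no sense
    {"id": 5, "age": 36, "zipcode": "47405"},
]

# Assumed list of filler values a cashier might type in.
PLACEHOLDER_ZIPCODES = {"00000", "99999"}

# Age groups taken from the example in the notes.
AGE_BINS = [(18, 25), (26, 35), (36, 45)]


def clean(rows):
    """Drop duplicates, nulls, known-wrong values, and outliers."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["id"], row["age"], row["zipcode"])
        if key in seen:
            continue  # duplicate data
        seen.add(key)
        if row["zipcode"] is None:
            continue  # null value
        if row["zipcode"] in PLACEHOLDER_ZIPCODES:
            continue  # known "wrong" data
        if not 0 <= row["age"] <= 120:
            continue  # outlier (assumed plausible age range)
        cleaned.append(row)
    return cleaned


def discretize_age(age: int) -> Optional[str]:
    """Map an exact age to its group label, or None if no group covers it."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


cleaned = clean(records)
age_groups = [discretize_age(row["age"]) for row in cleaned]
```

Only the records with `id` 1 and 5 survive cleaning here, and their ages fall into the "18-25" and "36-45" groups. A real pipeline might fill in or flag bad values instead of discarding the whole record.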
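
To make the association rules idea concrete, here is a minimal sketch that measures how often items appear together in a basket. The transactions are invented for illustration, and "support" and "confidence" are the standard measures used in market basket analysis; the notes themselves do not name them.

```python
# Hypothetical shopping baskets; each set is one transaction.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"diapers", "beer"},
    {"bread", "eggs"},
    {"diapers", "beer", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), baskets) / base


bread_support = support({"bread"}, transactions)
diapers_to_beer = confidence({"diapers"}, {"beer"}, transactions)
bread_to_milk = confidence({"bread"}, {"milk"}, transactions)
```

In this toy data every basket with diapers also contains beer, so the rule "diapers --> beer" has a confidence of 1.0, while "bread --> milk" holds in two of the three bread baskets.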
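
The two category properties from the classification notes can also be shown with a small example. Note the simplification: the tree below is written by hand, whereas the notes describe a tree built from data already in the database. The attributes, thresholds, and category names are hypothetical.

```python
# Hypothetical categories; each new instance must land in exactly one.
CATEGORIES = ("loyal", "frequent", "occasional")


def classify(customer):
    """Walk a tiny hand-written decision tree and return one category.

    Each instance follows exactly one path (mutually exclusive), and
    every path ends in a leaf (collectively exhaustive).
    """
    if customer["purchases_per_month"] >= 4:
        if customer["has_reward_card"]:
            return "loyal"
        return "frequent"
    return "occasional"


def assign(customers):
    """Place each customer id into the bucket for its category."""
    groups = {category: [] for category in CATEGORIES}
    for customer in customers:
        groups[classify(customer)].append(customer["id"])
    return groups


# New instances to classify; the values are made up for illustration.
new_customers = [
    {"id": 1, "purchases_per_month": 6, "has_reward_card": True},
    {"id": 2, "purchases_per_month": 5, "has_reward_card": False},
    {"id": 3, "purchases_per_month": 1, "has_reward_card": True},
]

groups = assign(new_customers)
```

Because `classify` returns a single label for any input, the bucket sizes in `groups` always add up to the number of customers: no one is left out and no one is counted twice.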