Data Mining

by: Nick Rowe

Data Mining CS 57300

Jennifer Neville

Data Mining
CS 57300 / STAT 59800-024
Purdue University
January 13, 2009

Introduction
- What is data mining?
- Why now?
- Data mining process
- Example

What is data mining?
- "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data" (Frawley, Piatetsky-Shapiro, and Matheus, 1992)
- "A new paradigm that focuses on computerized exploration of large amounts of data and on discovery of relevant and interesting patterns within them" (Feldman and Dagan, 1995)

What is data mining?
[Figure: data mining shown together with statistics, visualization, and artificial intelligence]
Also known as: knowledge discovery, exploratory data analysis, applied statistics, machine learning

Why now?
[Figure: plunging disk price]

How much information? (Lyman and Varian, UC Berkeley, 2003)
- 5 exabytes of new information were stored in 2002
- 1 exabyte = 1,000 petabytes = 1 million terabytes = 1 billion gigabytes
- The amount of new information stored has about doubled in the last three years
- Almost 18 exabytes of information flowed through electronic channels in 2002; 98 percent of this total is the information sent and received in telephone calls

Data mining process
[Figure: process diagram with stages labeled Target Data, Preprocessed Data, Data Mining, and Patterns. Adapted from U. Fayyad et al. (1995), "From Knowledge Discovery to Data Mining: An Overview," in Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press]

1. Application setup
   - Acquire relevant domain knowledge
   - Assess user goals
2. Data selection
   - Choose data sources
   - Identify relevant attributes
   - Sample data
3. Data preprocessing
   - Remove noise or outliers
   - Handle missing values
   - Account for time or other changes
4. Data transformation
   - Find useful features
   - Reduce dimensionality
5. Data mining
   - Choose task (e.g., classification, regression, clustering)
   - Choose algorithms for learning and inference
   - Set parameters
   - Apply algorithms to search for patterns of interest
6. Interpretation/evaluation
   - Assess accuracy of model results
   - Interpret model for end users
   - Consolidate knowledge
7. Repeat

Example
[Figure: two groups of trains, labeled "These trains carry toxic chemicals" and "These trains do not carry toxic chemicals", and a new train with the question "Does this train carry toxic chemicals?"]

Example: rule 1
[Figure: the groups "These trains carry toxic chemicals" and "These trains do not carry toxic chemicals"]

Example: rule 2
[Figure: the groups "These trains carry toxic chemicals" and "These trains do not carry toxic chemicals", and the question "Does this train carry toxic chemicals?"]
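The train example can be read as a small classification task: given trains labeled as carrying or not carrying toxic chemicals, search for a rule that separates the two groups, then apply it to a new train. The sketch below is one minimal way to do that in Python. The train drawings did not survive in these notes, so every attribute name (num_cars, has_triangle_load, short_closed_car), every value, and all three candidate rules are hypothetical stand-ins, not the rules shown on the slides.

```python
from typing import Any, Callable, Dict, List, Tuple

Train = Dict[str, Any]
Rule = Callable[[Train], bool]

# Hypothetical labeled examples (attributes and values are invented for
# illustration). The boolean label means "carries toxic chemicals".
labeled_trains: List[Tuple[Train, bool]] = [
    ({"num_cars": 3, "has_triangle_load": True,  "short_closed_car": True},  True),
    ({"num_cars": 4, "has_triangle_load": True,  "short_closed_car": True},  True),
    ({"num_cars": 2, "has_triangle_load": False, "short_closed_car": True},  True),
    ({"num_cars": 2, "has_triangle_load": False, "short_closed_car": False}, False),
    ({"num_cars": 3, "has_triangle_load": True,  "short_closed_car": False}, False),
    ({"num_cars": 2, "has_triangle_load": False, "short_closed_car": False}, False),
]

# Hypothetical candidate rules: each maps a train to a predicted label.
candidate_rules: Dict[str, Rule] = {
    "triangle load => toxic": lambda t: bool(t["has_triangle_load"]),
    "short closed car => toxic": lambda t: bool(t["short_closed_car"]),
    "3+ cars => toxic": lambda t: t["num_cars"] >= 3,
}


def accuracy(rule: Rule, examples: List[Tuple[Train, bool]]) -> float:
    """Fraction of labeled examples on which the rule's prediction matches."""
    correct = sum(1 for train, label in examples if rule(train) == label)
    return correct / len(examples)


# Score every candidate rule on the labeled trains and keep the best one.
scores = {name: accuracy(rule, labeled_trains)
          for name, rule in candidate_rules.items()}
best_rule_name = max(scores, key=lambda name: scores[name])

# Apply the selected rule to a new, unlabeled (hypothetical) train.
new_train: Train = {"num_cars": 2, "has_triangle_load": False,
                    "short_closed_car": True}
prediction = candidate_rules[best_rule_name](new_train)

print(scores)
print(best_rule_name, "->", prediction)
```

Scoring rules only on the trains used to find them says nothing about how they behave on unseen trains, which is one reason the process lists evaluation as its own step and ends with "Repeat".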


