# Scientific Data Mining CDS 401

This 16-page study guide was uploaded by Mr. Casper Russel on Monday, September 28, 2015. It belongs to CDS 401 at George Mason University, taught by Kirk Borne in Fall.

Date Created: 09/28/15

## CDS 401 Final Exam Review Questions

### A. Clustering: Introduction to Unsupervised Learning

1. What is the definition of clustering? What other names are used for it?
2. Know how to examine a data table and recognize clusters, groups, and partitions.
3. Understand why a subject matter expert (SME) is vitally important during unsupervised learning projects.
4. Understand what a Feature Vector is, and recognize that it is used to characterize a data record. Thus, Characterization and Classification are two different things.
5. Understand the differences between Classification and Clustering.
6. What is Class Discovery? Is it Classification or Clustering?
7. Know some of the features of Clustering, such as: the need for similarity and/or distance metrics; the lack of semantic meaning of clusters; the number of clusters is usually unknown in advance; clusters depend on the choice of features (attributes); clusters depend on the clustering algorithm; clusters sometimes depend on the order in which the data records are examined.
8. Be able to recognize examples of Clustering, and be able to distinguish these examples from examples of Classification.

### B. Clustering: Introduction (continued)

1. Understand the meaning and significance of outliers (anomalies, novelties, defects).
2. Understand how cluster results are dynamic as the database gets updated.
3. Know the major quantitative cluster parameters and how to calculate them: centroid, radius, diameter.
4. Know the major distance measures for clusters, how to calculate them, and what they mean: single link, complete link, average link, centroid distance.
5. Understand the difference between partitional and hierarchical clustering.
6. Understand the two different types of hierarchical clustering, and be able to give examples of each.
7. Understand the various algorithms for partitional clustering and how to apply them.
8. Which partitional clustering algorithm is the most widely used? Be able to use it for simple clustering problems.
9. What is an Adjacency Matrix? What are its elements? Why do you need it?
10. What is the difference between K-Means, K-Medoids, and K-Modes? Under what conditions would you use these different clustering methods?
11. What is PAM?
12. What are mixture models, and how are they used?
13. Understand the Jaccard coefficient, why you would need to use it, and be able to use it for simple problems.

### C. Clustering: Evaluation

1. Be sure to understand all of the clustering issues and challenges, which explain why we need to evaluate the quality of the clustering results.
2. How do you know if your clustering results are good enough?
3. What are the Dunn index, Davies-Bouldin index, and C-index? Why are they used?

### D. Classification: Special Topics

1. Be able to cite examples and give a brief explanation of the various special topics mentioned in this class lecture.
2. For example: What is PCA? What is ROC? What is MCMC? What are convex hulls? What is Bagging? What is Boosting? What are Random Forests?
3. What is meant by normalization to zero mean and unit variance?
4. Understand and be able to recognize examples of classification through regression.
5. What is a kernel? What is KDE? Why is it useful?
6. What is linear SVM?
7. What are oblique decision trees? Why are they used? Be able to solve simple problems using them.
8. What is the difference between an oblique decision tree and an axis-parallel decision tree?
9. What is sampling with replacement? In what type of process is it used: Bagging or Boosting? Why is it used?

### E. Association Mining

1. What is association mining? Why is it called rule mining?
2. Be able to explain and to use the special terminology used in association mining.
3. Be able to recognize examples of association mining.
4. What is the most commonly used other name for association mining?
5. Does association mining work primarily on numeric or non-numeric data? What do you do if your database has the wrong type of data and yet you still wish to run an association mining algorithm on the database?
6. What is Link Analysis? What is Affinity Analysis?
7. Understand and be able to calculate Support, Confidence, and Lift (Gain) for a set of association rules.
8. Be able to find association rules from a simple database table.
9. What are the goals and objectives of association mining?
10. What are some of the basic algorithms and techniques used in association mining? What are the main steps in each? What are their advantages and disadvantages?
11. What is Apriori? What is LIP?
12. What is the Combinatorial Theorem? What is the Combinatorial Explosion? Understand that these are two different things: one is the consequence of the other. Be able to calculate examples of these.
13. What are k-itemsets? Be able to find them from a simple list of transactions.
14. What are the 3 main metrics for measuring the quality of discovered association rules?

### F. Mining Conditions for Rapid Intensification of Tropical Cyclones

1. Understand the main science application of association mining in this case.
2. Which scientific measurements were used?
3. What is data discretization? Why was it used?
4. What are some of the scientific and data mining challenges related to this problem?

### G. Time Series Data Mining

1. Discuss three different ways in which time series can be represented, and what each approach is best suited for.
2. Be able to provide example applications where time series data mining could be used. Discuss what approach would be used and what computations would be required.

### H. Spatial Data Mining

1. Be able to list and explain the main differences between spatial data and non-spatial data, and between spatial data mining and non-spatial data mining.
2. What is the difference between spatial metadata and spatial data? Know how to recognize the difference.
3. What are geometric and topological metadata? How are they different?
4. What are some of the special spatial relationships in spatial databases?
5. What are raster data? What are the challenges in using raster data?
6. What is GIS?
7. How are geospatial data special?
8. What is a Thematic Map?
9. Which indexing schemes are used for spatial data? Be able to apply these to simple examples.
10. What kinds of database constructs are used to query spatial databases?
11. Be able to recognize and explain the application of standard data mining algorithms to spatial data mining problems.
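For the questions on cluster parameters (centroid, radius, diameter) and inter-cluster distances (single, complete, average link, centroid distance), here is a minimal sketch in plain Python. It assumes the common textbook definitions: radius as the root-mean-square distance to the centroid and diameter as the root-mean-square distance over all pairs of points. Some texts instead define diameter as the largest pairwise distance, so check the course notes. The function names are hypothetical.

```python
import math
from itertools import combinations

# Points are coordinate tuples; math.dist (Python 3.8+) is the Euclidean distance.

def centroid(points):
    """Component-wise mean of the points in one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def radius(points):
    """Root-mean-square distance from each point to the centroid."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))

def diameter(points):
    """Root-mean-square distance over ordered pairs of distinct points (needs N >= 2)."""
    n = len(points)
    pair_sum = sum(math.dist(p, q) ** 2 for p, q in combinations(points, 2))
    # combinations() visits each unordered pair once; ordered pairs count it twice.
    return math.sqrt(2 * pair_sum / (n * (n - 1)))

# Inter-cluster distance measures between clusters a and b.
def single_link(a, b):
    return min(math.dist(p, q) for p in a for q in b)   # closest pair

def complete_link(a, b):
    return max(math.dist(p, q) for p in a for q in b)   # farthest pair

def average_link(a, b):
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def centroid_distance(a, b):
    return math.dist(centroid(a), centroid(b))

K1 = [(0, 0), (2, 0)]
K2 = [(5, 0), (7, 0)]
print(centroid(K1), radius(K1), diameter(K1))  # (1.0, 0.0) 1.0 2.0
print(single_link(K1, K2), complete_link(K1, K2),
      average_link(K1, K2), centroid_distance(K1, K2))  # 3.0 7.0 5.0 5.0
```

Single link is the most optimistic measure and complete link the most pessimistic. That is why single-link clustering tends to produce long, chained clusters, while complete-link clustering favors compact ones.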
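The question about the most widely used partitional clustering algorithm usually points to K-Means. The review sheet does not name it, so treat that as an assumption. This minimal sketch alternates the assignment step (each point joins its nearest centroid) with the update step (each centroid moves to the mean of its members) until the centroids stop changing. Passing explicit starting centroids and letting an empty cluster keep its old centroid are illustrative choices. Random or k-means++ seeding are common alternatives.

```python
import math

def k_means(points, initial_centroids, max_iter=100):
    """Lloyd-style K-Means on coordinate tuples with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose centroid is nearest.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids unchanged, so the assignment is stable
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1,), (2,), (3,), (10,), (11,), (12,)]
print(k_means(points, initial_centroids=[(1,), (2,)]))
# ([(2.0,), (11.0,)], [[(1,), (2,), (3,)], [(10,), (11,), (12,)]])
```

The deliberately poor start (both centroids in the low group) still converges here. In general, however, K-Means only reaches a local optimum, so its result can depend on the initial centroids.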
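For the Jaccard coefficient question, the sketch below computes Jaccard similarity for sets and for binary attribute vectors. It places Jaccard next to the simple matching coefficient to show why Jaccard is needed: with sparse, asymmetric binary data, the many 0-0 matches make unrelated records look similar. The vectors and item names are made-up examples, and treating two empty sets as identical is an assumed convention.

```python
def jaccard_sets(a, b):
    """|A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)

def jaccard_binary(x, y):
    """f11 / (f01 + f10 + f11): 0-0 matches are ignored."""
    f11 = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    mismatches = sum(1 for u, v in zip(x, y) if u != v)
    denom = f11 + mismatches
    return f11 / denom if denom else 1.0

def simple_matching(x, y):
    """(f11 + f00) / n: 0-0 matches count as agreement."""
    return sum(1 for u, v in zip(x, y) if u == v) / len(x)

x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(simple_matching(x, y), jaccard_binary(x, y))  # 0.7 0.0
print(jaccard_sets({"bread", "milk", "beer"}, {"bread", "milk", "diapers"}))  # 0.5
```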
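For Support, Confidence, and Lift, and for finding k-itemsets from a list of transactions, here is a minimal brute-force sketch on a small hypothetical market-basket table. It enumerates every k-item combination, all `comb(d, k)` of them for `d` distinct items, without Apriori pruning. That count grows combinatorially with `d`, which is the cost Apriori's pruning is designed to reduce. The function names are illustrative.

```python
from itertools import combinations
from math import comb

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Confidence of lhs -> rhs: support(lhs together with rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """Lift of lhs -> rhs: confidence / support(rhs); above 1 suggests positive association."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

def frequent_itemsets(transactions, k, min_support):
    """Brute force: test every one of the comb(d, k) candidate k-itemsets."""
    items = sorted(set().union(*transactions))
    result = {}
    for candidate in combinations(items, k):
        s = support(candidate, transactions)
        if s >= min_support:
            result[candidate] = s
    return result

# Hypothetical transactions.
T = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(support({"diapers", "beer"}, T))       # 0.6
print(confidence({"diapers"}, {"beer"}, T))  # about 0.75 (floating point)
print(lift({"diapers"}, {"beer"}, T))        # about 1.25
print(comb(6, 2), sorted(frequent_itemsets(T, 2, 0.6)))  # 15 candidates, 4 frequent pairs
```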
### I. Temporal Data Mining (continuation of the Time Series Data Mining topic)

1. Be able to list and explain the main differences between temporal data and non-temporal data, and between temporal data mining and non-temporal data mining.
2. Be able to recognize and explain the application of standard data mining algorithms to temporal data mining problems.
3. Be able to identify and classify the different classes of time series behavior.
4. What are some of the special temporal relationships in temporal databases?
5. Know and understand the 5 major motivations and reasons for using temporal data mining.
6. What is the statistical independence assumption? Is it valid or invalid for temporal data?
7. Be able to explain the special temporal data mining concepts.
8. Be able to explain the difference between an event, a time series, a temporal pattern, and an event characterization function.
9. Be able to identify, explain, and work with simple examples of the 3 main types of temporal data mining: Markov Models, Dynamic Time Warping, and Association Mining.
10. What are the 3 main types of Temporal Association Mining? Be able to work with simple examples of each.
11. Which data type is required for the attributes used in Trend Dependency Mining?

### J. Outlier / Anomaly / Novelty / Defect Detection (Chapter 10 of textbook)

1. Know the different names for outlier detection and what they mean to different end-users.
2. Be able to identify and explain examples of outlier detection.
3. What are some of the causes of outliers in data?
4. What is Hawkins's definition of an outlier?
5. What are the 3 main approaches to anomaly detection?
6. How does the existence or absence of labeled data in a database determine the type of approach that is taken in anomaly detection?
7. What are the major issues that need to be addressed in anomaly detection?
8. Be able to explain and to identify a simple example of Statistical-based anomaly detection.
9. Be able to explain and to identify a simple example of Proximity-based anomaly detection.
10. Be able to explain and to identify a simple example of Density-based anomaly detection.
11. Be able to explain and to identify a simple example of Clustering-based anomaly detection.
12. Be able to list some of the strengths and weaknesses of the above approaches.
13. What are the impacts of outliers on clustering?
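For Dynamic Time Warping, here is a minimal dynamic-programming sketch of DTW distance between two numeric sequences. The sequences may have different lengths, which rules out a plain point-by-point Euclidean comparison. The absolute difference as the local cost and the absence of any warping-window constraint are assumptions of this sketch. Squared differences and band constraints are common variants.

```python
def dtw_distance(s, t):
    """Cumulative cost of the cheapest monotone alignment between sequences s and t."""
    n, m = len(s), len(t)
    inf = float("inf")
    # D[i][j] = best cost of aligning the first i values of s with the first j values of t.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])  # local cost (assumed: absolute difference)
            # Extend the cheapest of: stretch t (from above), stretch s (from the left), or step both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0.0: the repeated 1 is absorbed by warping
print(dtw_distance([0, 1, 2], [0, 2]))              # 1.0
```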
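For Statistical-based anomaly detection, and for normalization to zero mean and unit variance, here is a minimal z-score sketch. It assumes roughly normal data and uses the population standard deviation. The readings are hypothetical. The example also shows a small-sample pitfall: with only ten values, no population z-score can exceed 3 in magnitude (the bound is the square root of n - 1), so the common fixed threshold of 3 cannot fire. The usage therefore also tries 2.5.

```python
import statistics

def z_scores(values):
    """Standardize to zero mean and unit variance (needs at least two distinct values)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def statistical_outliers(values, threshold=3.0):
    """Flag values whose |z| exceeds the threshold (assumes roughly normal data)."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]     # hypothetical sensor readings
print(statistical_outliers(readings))                 # [] with the default threshold of 3
print(statistical_outliers(readings, threshold=2.5))  # [50]
```

The outlier also inflates the mean and standard deviation it is measured against (masking). That is one reason robust statistics or proximity- and density-based methods are sometimes preferred.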
## CDS 401 Midterm Exam Review Questions (will also be covered on the Final Exam)

### A. Introduction to Scientific Data Mining

1. What makes scientific data different from other types of data? Be able to list some of the key differences.
2. How does scientific data mining reflect the new trends in scientific research?
3. Be able to explain the paradigm: Data → Information → Knowledge → Understanding.
4. Be able to give a scientific example of the above DIKU paradigm.
5. Understand different ways in which databases enable scientific discovery.
6. Know the different definitions of data mining.
7. How are Machine Learning and Data Mining different? How are they the same?
8. What does this mean: "Data Mining is the Killer App for Scientific Databases"?
9. Be able to explain and compare Descriptive data mining versus Predictive data mining.
10. Be able to explain the differences between these and give examples of each: Supervised Learning, Unsupervised Learning, Semi-supervised Learning.
11. What is outlier detection? What type of Learning is it?
12. What are the 2 key benefits of very large data collections for scientific research?
13. Describe the relationship between databases and data mining.
14. Be able to define and differentiate these categories of data mining: classification, clustering, regression, association analysis, link analysis, self-organizing maps.
15. Be able to identify examples of real-world data mining applications, and which one of the standard data mining techniques listed above is being used in the example.
16. Be able to match a graphic illustration of one of the standard data mining techniques with the name of the technique.
17. What are the 6 main justifications (reasons) for using data mining?
18. Be able to identify some of the different names used for data mining, such as: Connecting the Dots, EDA, IDA, CRM, BI, CBR, IDS, Data Analytics, Discovery Informatics.
19. What is the formal definition of Classification?
20. What is the formal definition of Clustering?
21. Be able to identify Classification algorithms.
22. Be able to identify Clustering algorithms.
23. Be able to explain the Classification process and the different steps in the process.
24. Know the differences and different uses of Training Data and Test Data.
25. What is Overfitting? Know how to detect it, what to do about it, and how to avoid it.
26. What is meant by Class Discovery? Is that a form of classification or clustering? Is it a form of Supervised Learning, Unsupervised Learning, or Semi-supervised Learning?
27. What is meant by Rule Learning? Give some examples of techniques that use this concept.
28. Understand that PCA (Principal Components Analysis) and ICA (Independent Components Analysis) are used to detect and measure correlations in data parameters.

### B. Basic Concepts in Data Mining

1. What are the basic concepts in Data Mining?
2. What are the key steps in the Data Mining process?
3. Which step usually takes the longest? Why?
4. How do the basic concepts relate to the key steps in the Data Mining process?
5. What types of activities are performed during the Data Previewing phase?
6. What types of activities are performed during the Data Preparation phase? Understand what they are, how they are used, why they are used, and how they are different.
7. What is a Feature Vector?
8. What is Feature Selection?
9. Know and understand the different data types that are encountered in databases and in data mining. Be able to identify at least one data mining technique that works specifically on each data type.
10. What is Data Normalization? What is Data Transformation? How are these 2 concepts different? How are they the same?
11. Give examples of data normalization. Be able to apply data normalization to simple tables of numbers.
12. Give examples of data transformation. Be able to apply data transformation to simple tables of numbers.
13. What is meant by distance in data mining?
14. What is meant by similarity in data mining?
15. How do distance and similarity relate to each other?
16. What are distance and similarity metrics? Why are they used? Be able to describe how they are used.
17. Be able to list some of the challenges, issues, and problems with using these metrics.
18. What are the general mathematical requirements that any similarity or distance metric must satisfy?
19. Be able to identify by name some of the more common distance and similarity metrics.
20. Be able to apply some of the more common distance and similarity metrics to simple problems and simple tables of numbers.
21. What is meant by Accuracy in data mining?
22. Be able to describe and to use the Classification Accuracy Matrix (i.e., Classification Error Matrix).
23. Know how to calculate, how to use, and what is meant by these terms: True Positive, False Positive, True Negative, and False Negative.
24. Understand the meaning of, and how to calculate, the different measures of accuracy in data mining: Overall Accuracy, Producer's Accuracy, and User's Accuracy.

### C. Data Mining in Action

1. What is meant by Data Mining in Action? Give examples.
2. What is meant by Mining for Actionable Data?
3. Be able to identify and to explain scientific examples of data mining in action.
4. What is meant by Intelligent Data Understanding?
5. What is meant by Science Goal Monitoring? Give examples.
6. What is Discovery Informatics? Give some science discipline examples.

### D. Supervised Learning: Classification

1. Be able to identify the names of various Classification algorithms used in data mining.
2. Be able to identify specific data mining applications that use Classification algorithms.
3. Know the steps in the classification process.

### E. Classification: Bayes Theorem

1. Know what Bayes Theorem is, what the terms mean, how to calculate the terms, and how to use it for data mining classification problems.
2. What is meant by Inference? What is meant by Deduction? What is meant by Induction?
3. What is the difference between the Hypothetical-deductive framework for science and the Observational-inductive framework?
4. Is data mining inference or deduction? Is KDD inference or deduction?
5. What is statistical inference? Give examples of 2 statistical inference techniques that we have studied in detail.
6. In Bayes theorem, what is the Prior, what is the Posterior, what is the Likelihood, and what is the Evidence? Know how to calculate each one of these.
7. What is the False Positive Paradox? Be able to solve a simple problem of this type.
8. What is a Bayes Inference Engine (BIE)?

### F. Classification: SVM, DT, KNN

1. What do these acronyms mean: SVM, DT, KNN?
2. Know the definition of data mining that refers to the transformation of knowledge into a rule format representation.
3. In SVM, what is the goal? How is this accomplished?
4. What is a kernel function in SVM?
5. What are some of the difficulties in applying the SVM technique?
6. What is meant by cost-sensitive classification?
7. Know how decision theory and cost functions can be used to optimize the results of a data mining classification application.
8. What is meant by Rule Induction in Decision Tree classification applications?
9. What is Information Gain? How is it used? What is its main advantage?
10. Be able to describe the use of Decision Trees for data mining Classification applications.
11. What are some of the best reasons for using Decision Tree classification?
12. Be able to label and to identify the constituent parts of a Decision Tree diagram.
13. Know how to interpret and how to use a decision tree.
14. Know how to construct simple decision trees from simple databases.
15. Know how to calculate classification accuracy from a decision tree application.
16. What is KNN, and how is it used in Classification data mining applications?
17. Know how to calculate and apply KNN for simple databases.
18. How is K selected in a KNN application?
19. What is the relationship between the training data and the model in KNN?
20. How do you estimate the likelihood of the classification in a KNN application?
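For the Classification Accuracy (Error) Matrix questions, here is a minimal sketch that computes Overall, Producer's, and User's Accuracy, plus TP/FP/FN/TN for the two-class case. It assumes rows are predicted classes and columns are true (reference) classes, the usual remote-sensing layout. If the course transposes the matrix, Producer's and User's Accuracy swap roles. The counts are hypothetical.

```python
def accuracy_measures(matrix, labels):
    """Overall, producer's, and user's accuracy from a square error matrix.

    Assumed layout: matrix[i][j] counts records PREDICTED as labels[i]
    whose TRUE (reference) class is labels[j].
    """
    k = len(labels)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(k)) / total
    producers, users = {}, {}
    for i, label in enumerate(labels):
        predicted_total = sum(matrix[i])                 # row total
        reference_total = sum(row[i] for row in matrix)  # column total
        # User's accuracy: of the records labelled `label`, the fraction that truly are.
        users[label] = matrix[i][i] / predicted_total if predicted_total else 0.0
        # Producer's accuracy: of the records truly `label`, the fraction that were found.
        producers[label] = matrix[i][i] / reference_total if reference_total else 0.0
    return overall, producers, users

def binary_counts(matrix):
    """(TP, FP, FN, TN) for a 2x2 matrix in the same layout, positive class first."""
    (tp, fp), (fn, tn) = matrix
    return tp, fp, fn, tn

m = [[40, 10],
     [5, 45]]
overall, producers, users = accuracy_measures(m, ["A", "B"])
print(overall, producers, users)  # 0.85, producer's A = 40/45, user's A = 0.8, ...
print(binary_counts(m))           # (40, 10, 5, 45)
```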
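For the Bayes Theorem and False Positive Paradox questions, here is a minimal sketch that names each term of Bayes' theorem for a positive test result. The prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen to show the paradox: when the condition is rare, most positive results are false, even for an accurate test.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive result) from Bayes' theorem.

    prior               = P(C)           -> the Prior
    sensitivity         = P(+ | C)       -> the Likelihood
    false_positive_rate = P(+ | not C)
    evidence            = P(+) = P(+|C) P(C) + P(+|not C) P(not C)
    result              = P(C | +)       -> the Posterior
    """
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Hypothetical test: 1 in 1,000 prevalence, 99% sensitivity, 5% false-positive rate.
p = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")  # 0.019: fewer than 2% of positives are real
```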
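For the Information Gain question, here is a minimal ID3-style sketch. Information gain is the entropy of the class labels (in bits) minus the weighted entropy that remains after splitting on one attribute. The rows, attribute names, and labels are a made-up toy table: attribute `x` separates the classes perfectly, and attribute `y` tells us nothing.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy of each attribute-value group."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

rows = [{"x": "a", "y": "a"}, {"x": "a", "y": "b"},
        {"x": "b", "y": "a"}, {"x": "b", "y": "b"}]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, "x"))  # 1.0: a perfect split
print(information_gain(rows, labels, "y"))  # 0.0: an uninformative split
```

A decision-tree builder in this style would choose `x` as the root test because it has the highest gain.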
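For the KNN questions, here is a minimal sketch that classifies a query by majority vote among its `k` nearest training records under Euclidean distance. It returns the winning vote fraction as a rough likelihood estimate, one simple option among several. The stored training list itself acts as the model: nothing is fitted in advance. The data and function name are hypothetical, and an odd `k` is used to avoid two-class ties.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """training: list of (feature_tuple, label). Returns (majority label, vote fraction)."""
    # Sort the stored records by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / k

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_classify(train, (2, 2), k=3))  # ('A', 1.0)
print(knn_classify(train, (5, 5), k=3))  # ('B', 1.0)
```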
### G. Classification: Regression

1. What is Regression?
2. What is the difference between linear and nonlinear regression?
3. Give a scientific example of a nonlinear relationship between variables.
4. How is regression used in Classification data mining?
5. Describe how regression is both Descriptive and Predictive.
6. What data type is typically required for a regression algorithm? Compare this with Logistic Regression and Bayes Classification.

### H. Classification: ANN (Artificial Neural Networks)

1. How are ANN and Regression similar?
2. Be able to describe the use of ANN for data mining Classification applications.
3. What are some of the reasons for using ANN?
4. What are some of the problems with using ANN?
5. Be able to label and identify the constituent parts of an ANN diagram.
6. How many training samples are needed in order to train an ANN model?
7. Understand some of the concepts and terminology related to ANN: weights, activation function, firing rule, hidden layers, backpropagation, feed-forward.
8. Know some examples of Activation Functions and their basic mathematical properties.
9. Know how to calculate values of different nodes in an ANN using values of the input nodes.
10. What is an SLP? How do you calculate the Classification output from an SLP?
11. What is Gradient Descent? Know how this concept is related to these 3 applications: ANN backpropagation, testing for overfitting, and Genetic Algorithms.
12. Under what circumstances would multiple hidden layers be used in an ANN?
13. Know some of the advantages and disadvantages of using ANN.
14. Be able to calculate some simple ANN node values, including the output Class, using firing rules, weights, and inputs.

### I. Classification: Markov Models

1. Know the difference between statistical independence and statistical dependence, and when each is applicable.
2. What is the definition of statistical independence?
3. What is Temporal Data Mining? Do we assume statistical independence or statistical dependence in solving these types of problems?
4. What is a Markov Chain? Be able to give examples.
5. What is a Markov Model? Be able to solve simple problems using Markov Models.
6. What is a First-Order Markov Model?
7. What is a Second-Order Markov Model?
8. What is a Hidden Markov Model (HMM)? Be able to give examples.
9. Be able to explain how Markov Modeling is a form of Predictive data mining.
10. What is a State? What are Transition Probabilities? What are Emission Probabilities?
11. How is Markov Modeling similar to Bayesian Analysis?
12. How are Markov Models used for data mining Classification applications?
13. Can you predict the weather with Markov Models? Explain.
14. Can you predict the stock market with Markov Models? Explain.
15. Can you predict Gene Sequences with Markov Models? Explain.
16. Can you do voice recognition with Markov Models? Explain.
17. Can you do pattern recognition with Markov Models? Explain.

### J. Classification: Genetic Algorithms (GAs)
train an ANN model Understand some of the concepts and terminology related to ANN weights activation function ring rule hidden layers backpropagation feedforward Know some examples of Activation Functions and their basic mathematical properties Know how to calculate values of different nodes in an ANN using values of the input nodes What is an SLP How do you calculate the Classi cation output from an SLP What is Gradient Descent Know how this concept is related to these 3 applications ANN backpropagation testing for over tting and Genetic Algorithms Under what circumstances would multiple hidden layers be used in an ANN Know some of the advantages and disadvantages of using ANN 14 Be able to calculate some simple ANN node values including output Class using ring rules weights and inputs QMer N HH 000 HO 39 D ID I LAN 1 Classi cation Markov Models 1 Know the difference between statistical independence and statistical dependence and when each is applicable 2 What is the de nition of statistical independence 3 What is Temporal Data Mining Do we assume statistical independence or statistical dependence in solving these types of problems 4 What is a Markov Chain Be able to give examples 5 What is a Markov Model Be able to solve simple problems using Markov Models 6 What is a FirstOrder Markov Model 7 What is a SecondOrder Markov Model 8 What is a Hidden Markov Model HMM Be able to give examples 9 Be able to explain how Markov Modeling is a form of Predictive data mining 10 What is a State What are Transition Probabilities What are Emission Probabilities 11 How is Markov Modeling similar to Bayesian Analysis How are Markov Models used for data mining Classi cation applications Can you predict the weather with Markov Models Explain Can you predict the stock market with Markov Models Explain Can you predict Gene Sequences with Markov Models Explain Can you do voice recognition with Markov Models Explain 17 Can you do pattern recognition with Markov Models Explain D ID ID 
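Item 28 in the list just before section B notes that PCA and ICA are used to detect and measure correlations between data parameters. The Python below is a minimal sketch that handles the two-parameter case only. It computes the eigenvalues of the covariance matrix in closed form and reports the fraction of total variance explained by the first principal component. A fraction near 1 signals strongly correlated parameters. The function name `pca_2d` and the data are hypothetical. Using population covariance is an assumption, and ICA is not shown.

```python
import math

# Illustrative sketch: pca_2d and the sample data are hypothetical.
def pca_2d(xs, ys):
    """Return the two covariance eigenvalues (largest first) and the fraction
    of total variance explained by the first principal component."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Population covariance matrix [[a, b], [b, c]]
    a = sum((x - mx) ** 2 for x in xs) / n
    c = sum((y - my) ** 2 for y in ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    return (lam1, lam2), lam1 / (lam1 + lam2)

# Strongly correlated parameters: y = 2x exactly
(l1, l2), ratio = pca_2d([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(ratio)    # 1.0: the first component carries all of the variance

# Uncorrelated parameters with equal spread
(u1, u2), ratio_u = pca_2d([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
print(ratio_u)  # 0.5: no single direction dominates
```

The sketch divides by the total variance, so it assumes at least one parameter actually varies.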
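Section B asks you to apply data normalization and data transformation to simple tables of numbers. The sketch below assumes one common reading of the two terms, and the course's exact definitions may differ. Under this reading, normalization rescales a column linearly (min-max to [0, 1], or z-scores). A transformation applies a nonlinear function such as a base-10 logarithm. All function names and values are hypothetical.

```python
import math

# Illustrative helpers; names and sample values are hypothetical.
def min_max_normalize(values):
    """Normalization: rescale a column linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for a constant column")
    return [(v - lo) / (hi - lo) for v in values]

def z_score_normalize(values):
    """Normalization: shift to mean 0 and scale to (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def log10_transform(values):
    """Transformation: a nonlinear change of scale that compresses a wide
    dynamic range (every value must be positive)."""
    return [math.log10(v) for v in values]

column = [2.0, 4.0, 6.0, 8.0, 10.0]
print(min_max_normalize(column))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score_normalize(column))   # symmetric values centered on 0
print(log10_transform([1.0, 10.0, 100.0, 1000.0]))
```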
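Section B's questions on distance and similarity metrics can be made concrete with a few lines of code. This sketch uses hypothetical helper names. It computes Euclidean distance, Manhattan distance, and cosine similarity. It then spot-checks the usual metric requirements (non-negativity, identity, symmetry, triangle inequality) on a small set of points. A finite spot check is only a sanity test, not a proof.

```python
import math
from itertools import product

# Illustrative sketch: helper names and points are hypothetical.
def euclidean(a, b):
    """Straight-line (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def satisfies_metric_axioms(dist, points, tol=1e-12):
    """Spot-check the metric requirements on a finite set of points."""
    for p, q in product(points, repeat=2):
        if dist(p, q) < 0:                          # non-negativity
            return False
        if (dist(p, q) == 0) != (p == q):           # identity of indiscernibles
            return False
        if abs(dist(p, q) - dist(q, p)) > tol:      # symmetry
            return False
    for p, q, r in product(points, repeat=3):
        if dist(p, r) > dist(p, q) + dist(q, r) + tol:  # triangle inequality
            return False
    return True

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))                             # 5.0
print(manhattan(p, q))                             # 7.0
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))   # 0.0
print(satisfies_metric_axioms(euclidean, [p, q, (0.0, 0.0)]))  # True
```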
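Section B also covers the Classification Accuracy (Error) Matrix. The sketch below reads TP, FP, TN, and FN from a two-class matrix and computes Overall, Producer's, and User's Accuracy. It assumes rows hold the true (reference) class and columns hold the predicted class. Some texts, especially in remote sensing, use the transposed layout, so check the convention before reading off row and column totals. The counts and class names are hypothetical.

```python
# Illustrative sketch: function names, class names, and counts are hypothetical.
def binary_counts(matrix):
    """Read (TP, FP, TN, FN) from a 2x2 matrix whose first row/column is the positive class."""
    (tp, fn), (fp, tn) = matrix
    return tp, fp, tn, fn

def accuracy_measures(matrix, labels):
    """matrix[i][j] = number of records whose TRUE class is labels[i]
    and whose PREDICTED class is labels[j] (rows = reference, columns = classifier)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    producers, users = {}, {}
    for k, label in enumerate(labels):
        reference_total = sum(matrix[k])                   # row total
        classified_total = sum(row[k] for row in matrix)   # column total
        producers[label] = matrix[k][k] / reference_total  # 1 - omission error
        users[label] = matrix[k][k] / classified_total     # 1 - commission error
    return correct / total, producers, users

matrix = [[40, 10],   # true "star":   40 predicted star, 10 predicted galaxy
          [5, 45]]    # true "galaxy":  5 predicted star, 45 predicted galaxy
print(binary_counts(matrix))  # (40, 5, 45, 10)
overall, producers, users = accuracy_measures(matrix, ["star", "galaxy"])
print(overall)                # 0.85
print(producers, users)
```

Producer's Accuracy divides by the reference (row) total and equals 1 minus the omission error. User's Accuracy divides by the classified (column) total and equals 1 minus the commission error.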
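Section E asks for the Prior, Likelihood, Evidence, and Posterior in Bayes Theorem, and for a simple False Positive Paradox problem. The sketch below computes P(H | E) = P(E | H) P(H) / P(E). It takes the Evidence P(E) from the law of total probability. The function name and the detector rates are hypothetical, and the numbers were chosen to show the paradox.

```python
# Illustrative sketch: the function name and all rates are hypothetical.
def bayes_posterior(prior, likelihood, false_alarm_rate):
    """Posterior P(H | E) for a binary hypothesis H.

    prior            = P(H)
    likelihood       = P(E | H)
    false_alarm_rate = P(E | not H)
    evidence         = P(E), from the law of total probability
    """
    evidence = likelihood * prior + false_alarm_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Detector flags 99% of true events but also 5% of non-events,
# and true events make up only 0.1% of the data.
p = bayes_posterior(prior=0.001, likelihood=0.99, false_alarm_rate=0.05)
print(round(p, 3))  # 0.019
```

The hypothetical detector catches 99% of true events. Even so, a flagged record is a true event only about 2% of the time, because true events are so rare that false alarms dominate the positives.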
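Section F asks what a kernel function is in SVM. A kernel returns the inner product of two vectors after a (possibly high-dimensional) feature mapping, without computing that mapping explicitly. The code below is a minimal check that assumes the degree-2 polynomial kernel K(x, z) = (x · z)^2 on 2-D inputs. It compares the kernel value with the inner product under the kernel's explicit map φ(x) = (x1^2, √2·x1·x2, x2^2). It does not train an SVM, and the names and inputs are hypothetical.

```python
import math

# Illustrative sketch: helper names and inputs are hypothetical.
def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit degree-2 feature map for a 2-D input vector."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, 0.5)
print(poly2_kernel(x, z))    # 16.0, computed in the original 2-D space
print(dot(phi(x), phi(z)))   # same value (up to rounding), computed in 3-D feature space
```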
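Section F also asks about cost-sensitive classification. One standard decision-theory rule is to choose the decision with the lowest expected cost rather than the most probable class. The expected cost of a decision is the sum over classes of P(class | record) × cost(decision, class). The posteriors, cost table, and names below are hypothetical.

```python
# Illustrative sketch: names, posteriors, and costs are hypothetical.
def min_expected_cost(posteriors, cost):
    """Choose the decision with the lowest expected cost.

    posteriors: {true_class: P(class | record)}
    cost:       {decision: {true_class: cost of making that decision}}
    """
    expected = {
        decision: sum(posteriors[c] * row[c] for c in posteriors)
        for decision, row in cost.items()
    }
    best = min(expected, key=expected.get)
    return best, expected

posteriors = {"defect": 0.1, "ok": 0.9}
cost = {
    "flag":   {"defect": 0.0,  "ok": 1.0},   # false alarm costs 1
    "accept": {"defect": 20.0, "ok": 0.0},   # missed defect costs 20
}
decision, expected = min_expected_cost(posteriors, cost)
print(decision)  # flag
print(expected)
```

Here `flag` wins with an expected cost of 0.9 versus 2.0 for `accept`. This happens even though `ok` is nine times more probable, because a missed defect is assumed to cost 20 times as much as a false alarm.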
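Section F asks what Information Gain is and how to build simple decision trees. The sketch computes entropy in bits and the gain from splitting on an attribute. An ID3-style tree builder picks the attribute with the highest gain at each node. The records, attribute names, and helper names are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch: helper names and records are hypothetical.
def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

rows = [
    {"color": "red",  "size": "big",   "class": "yes"},
    {"color": "red",  "size": "small", "class": "yes"},
    {"color": "blue", "size": "big",   "class": "no"},
    {"color": "blue", "size": "small", "class": "no"},
]
print(information_gain(rows, "color", "class"))  # 1.0
print(information_gain(rows, "size", "class"))   # 0.0
```

Splitting on `color` separates the classes perfectly, for a gain of 1.0 bit. Splitting on `size` says nothing about the class, for a gain of 0.0. So `color` would become the root node.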
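Section F also includes several KNN questions. This sketch classifies a query by majority vote of its k nearest training records under Euclidean distance. It also illustrates two points those questions raise. First, the training data itself serves as the model, since nothing is fitted in advance. Second, the fraction of neighbors that agree is one simple estimate of how likely the classification is. The data and names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# Illustrative sketch: knn_classify and the training data are hypothetical.
def knn_classify(training, query, k=3):
    """Label `query` by majority vote of its k nearest training records.

    training: list of (feature_vector, label) pairs. The training set is the
    model: nothing is fitted in advance, all work happens at query time.
    Returns the winning label and the fraction of the k neighbors that voted
    for it (a rough confidence estimate).
    """
    neighbors = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / k

training = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((6.0, 7.0), "B"), ((7.0, 6.0), "B"),
]
print(knn_classify(training, (2.0, 2.0), k=3))  # ('A', 1.0)
print(knn_classify(training, (4.0, 4.0), k=5))  # ('B', 0.6)
```

An odd k avoids ties in a two-class vote. One common way to choose k is to compare accuracy on held-out test data for several candidate values.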
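Section G asks how regression is both Descriptive and Predictive. The least-squares sketch below fits y = slope·x + intercept. The fitted coefficients describe the trend, and plugging in a new x predicts a value. The last lines show one simple way to handle certain nonlinear relationships. They assume an exponential model y = a·e^(bx), which becomes linear after taking ln(y). This is a linearizing transformation, not general nonlinear regression. The data and the helper name are hypothetical.

```python
import math

# Illustrative sketch: fit_line and all data are hypothetical.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)          # 2.0 1.0  (descriptive: summarizes the trend)
print(slope * 5.0 + intercept)   # 11.0     (predictive: estimate y at a new x)

# Exponential relation y = a * exp(b * x) with a = 1, b = 0.5 is linear in ln(y)
ys_exp = [math.exp(0.5 * x) for x in xs]
b, ln_a = fit_line(xs, [math.log(y) for y in ys_exp])
print(b, math.exp(ln_a))         # approximately 0.5 and 1.0
```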
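Section H asks you to calculate node values, including the output class, from inputs, weights, and firing rules. In the sketch below, each node takes the weighted sum of its inputs plus a bias and passes it through an activation function. The single-layer perceptron uses a step function as its firing rule. The small feedforward network uses the logistic sigmoid. All weights, biases, the threshold of 0, and the 0.5 class cutoff are hypothetical choices.

```python
import math

# Illustrative sketch: helper names, weights, and biases are hypothetical.
def step(net, threshold=0.0):
    """Firing rule: the node outputs 1 if its net input exceeds the threshold, else 0."""
    return 1 if net > threshold else 0

def sigmoid(net):
    """Logistic activation: smooth, differentiable, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def node(inputs, weights, bias, activation):
    """Value of one node: activation(weighted sum of inputs + bias)."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Single-layer perceptron (SLP) whose weights implement logical AND
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, node([x1, x2], [1.0, 1.0], -1.5, step))  # 1 only for (1, 1)

# Feedforward pass: 2 inputs -> 2 hidden nodes -> 1 output node
def forward(inputs, hidden_layer, output_node):
    hidden = [node(inputs, w, b, sigmoid) for w, b in hidden_layer]
    w_out, b_out = output_node
    return node(hidden, w_out, b_out, sigmoid)

hidden_layer = [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)]
output_node = ([1.2, -0.7], 0.05)
score = forward([1.0, 0.0], hidden_layer, output_node)
print(score, "-> class", 1 if score > 0.5 else 0)
```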
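Section H also asks what Gradient Descent is. The sketch fits a single weight w in y = w·x by repeatedly stepping against the gradient of the mean squared error. Backpropagation applies the same kind of update to every weight in a multilayer network, using the chain rule to obtain each gradient. The data, learning rate, step count, and function name are hypothetical. With this particular data, a learning rate above about 0.21 would overshoot and diverge.

```python
# Illustrative sketch: the function name, data, and settings are hypothetical.
def gradient_descent_slope(xs, ys, learning_rate=0.05, steps=200):
    """Fit y = w * x by gradient descent on the mean squared error.

    MSE(w) = (1/n) * sum((w*x - y)^2), so dMSE/dw = (2/n) * sum((w*x - y) * x).
    Each step moves w a small distance downhill, against the gradient.
    """
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= learning_rate * grad
    return w

w = gradient_descent_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(w)  # converges toward 2.0
```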
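Section I asks how to solve simple problems with first-order Markov models. The sketch below stores hypothetical weather transition probabilities P(tomorrow | today). It computes the probability of a specific state sequence. It also propagates a probability distribution forward in time, which is the predictive use of the model. A hidden Markov model would add emission probabilities linking hidden states to observations; that is not shown. The names and probabilities are hypothetical.

```python
# Illustrative sketch: names and transition probabilities are hypothetical.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sequence_probability(states, initial):
    """P(s1, ..., sn) = P(s1) * product of P(s_t | s_{t-1}) (first-order Markov assumption)."""
    p = initial[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= TRANSITIONS[prev][nxt]
    return p

def propagate(dist, days=1):
    """Push a probability distribution over states forward by `days` steps."""
    for _ in range(days):
        dist = {
            nxt: sum(dist[cur] * TRANSITIONS[cur][nxt] for cur in dist)
            for nxt in TRANSITIONS
        }
    return dist

today = {"sunny": 1.0, "rainy": 0.0}
print(propagate(today, days=2))   # approximately {'sunny': 0.72, 'rainy': 0.28}
print(sequence_probability(["sunny", "sunny", "rainy"], today))  # approximately 0.16
```

A second-order model would instead condition each transition on the two previous states.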
### J. Classification: Genetic Algorithms (GAs)
