# 639 Class Note for STAT 59800 with Professor Neville at Purdue

These 20 pages of class notes were uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course taught at Purdue University in Fall. Since their upload, they have received 11 views.


Data Mining (3857300), STAT 59800-024, Purdue University — February 10, 2009

### Predictive modeling: learning

Learning algorithms consist of:

- An evaluation function
- A search technique

### Search

"An agent with several immediate options of unknown value can decide what to do by first examining different possible sequences of actions that lead to states of known value, and then choosing the best sequence." (Russell & Norvig, 2002)

### Problem solving as search

- Search: exploration of possibilities; an abstraction of a general problem-solving method
- Basic ingredients:
  - States: describe the configuration of the environment
  - Actions: activities that move you from one state to another
  - Search algorithm: determines which actions should be tried, in which order, to find a solution

### Cannibals and Missionaries problem

- Three cannibals and three missionaries come to a crocodile-infested river.
- There is a boat on their side that can be used by either one or two persons.
- If the cannibals outnumber the missionaries at any time, the cannibals eat the missionaries.
- How can they use the boat to cross the river so that all missionaries survive?

### Problem formulation

- What is the goal to be achieved?
- What are the actions?
- What is the state space?

### Search algorithms

Generate a search tree by:

- Considering a particular state
- Testing to see if it is the goal state
- If not, expanding the node to generate successor states by applying all possible actions

Search strategies differ in their choice of which state to expand.

### Local search

- Used when the path to the goal state does not matter
- Basic idea: only use the current state; don't save the paths followed
- Why use local search?
  - Low memory requirements (usually constant)
  - Effective: can often find good solutions in extremely large state spaces

### Example: Eight-queens problem

- Place eight queens on a chessboard so that no queen can attack another
- A queen can move in the same row, column, or diagonally
- Heuristic: number of pairs of attacking queens
- How to formulate the search?

### Example: Hill climbing

- Consider
a state-space landscape where location = state and elevation = the evaluation function of the state
- Continually move in the direction of increasing value (if trying to maximize the state evaluation function)

### Evaluation functions

- Guide search inside of algorithms:
  - Select paths in heuristic search
  - Decide when to stop
  - Identify whether to include a model or pattern in the output
- Evaluate results outside of algorithms:
  - Measure the absolute quality of a model or pattern
  - Compare the relative quality of different algorithms or outputs

### Decision tree learning

### Learning predictive models

- Choose a data representation
- Select a knowledge representation (a model)
- Use search to identify possible models: search a space of alternative model structures and/or parameters
- Score possible models with an evaluation function to determine the model with the best fit to the data

### Tree models

- Easy-to-understand knowledge representation
- Can handle mixed variables
- Recursive divide-and-conquer learning method
- Efficient inference

### Tree learning

- Top-down, recursive, divide-and-conquer algorithm:
  - Start with all examples at the root
  - Select the best attribute/feature
  - Partition the examples by the selected attribute
  - Recurse and repeat
- Other issues:
  - When to stop growing
  - Pruning irrelevant parts of the tree

### Tree learning pseudocode

```
DecisionTree(examples, classLabel, attributes):
    features <- {}
    for each attribute:
        for each attribute value:
            create feature f
            features <- features + {f}
    create root node of tree
    growTree(root, examples, features)

growTree(node, examples, features):
    maxScore <- 0
    maxFeature <- null
    for each feature in features:
        calculate score of feature on examples
        if score > maxScore and stopping criteria not met:
            maxFeature <- feature
            maxScore <- score
    if maxFeature is null:
        node.classDist <- distribution of classLabel in examples
        return                        // stop growing
    else:
        node.feature <- maxFeature
        create nodes leftChild and rightChild
        lchildExamples <- examples that pass node.feature test
        rchildExamples <- examples that fail node.feature test
        // recurse on partitioned data
        growTree(leftChild, lchildExamples, features)
        growTree(rightChild, rchildExamples, features)
```
### Tree models

- Most well-known systems:
  - CART (Breiman, Friedman, Olshen, and Stone)
  - ID3 / C4.5 (Quinlan)
- How do they differ?
  - Split evaluation function
  - Stopping criterion
  - Pruning mechanism

### Choosing an attribute/feature

- Idea: a good attribute splits the examples into subsets that are, ideally, all positive or all negative

### Split evaluation

| Income | Buy | No buy |
|--------|-----|--------|
| High   | 2   | 2      |
| Med    | 4   | 2      |
| Low    | 3   | 1      |

### Entropy

- Used to quantify the amount of randomness of a probability distribution
- Definition: the entropy H(X) of a discrete random variable X is defined by

  H(X) = −Σₓ p(x) log₂ p(x)

### Entropy of a random variable

- A completely random binary variable with p(X) = (0.5, 0.5) has entropy H(X) = −0.5 log 0.5 − 0.5 log 0.5 = 0.5 + 0.5 = 1
- A deterministic variable with p(X) = (1, 0) has entropy H(X) = −1 log 1 − 0 log 0 = 0 + 0 = 0
- A biased variable with p(X) = (0.75, 0.25) has H(X) = 0.8113
- The entropy of a probability distribution p expresses the amount of uncertainty that we have about X

### Information gain

- How much does a feature split decrease the entropy?

  Gain(S, A) = Entropy(S) − Σᵥ (|Sᵥ| / |S|) · Entropy(Sᵥ), summing over v ∈ values(A)

- For the table above: Entropy(D) = −(9/14) log(9/14) − (5/14) log(5/14) = 0.9400

### Information gain: example

- Entropy(Income = high) = −(2/4) log(2/4) − (2/4) log(2/4) = 1
- Entropy(Income = med) = −(4/6) log(4/6) − (2/6) log(2/6) = 0.9183
- Entropy(Income = low) = −(3/4) log(3/4) − (1/4) log(1/4) = 0.8113
- Gain(D, Income) = 0.9400 − (4/14)(1) − (6/14)(0.9183) − (4/14)(0.8113) = 0.029

### Example

[Figure: restaurant examples grouped by attribute value]

- What is the information gain for Type ∈ {French, Italian, Thai, Burger}?
- What is the information gain for Patrons ∈ {None, Some, Full}?

### Gini gain

- Similar to information gain
- Uses the Gini index instead of entropy:

  Gini(X) = 1 − Σₓ p(x)²

- Measures the decrease in the Gini index after a split:

  Gain(S, A) = Gini(S) − Σᵥ (|Sᵥ| / |S|) · Gini(Sᵥ), summing over v ∈ values(A)

### Contingency tables

| Income | Buy | No buy | Total |
|--------|-----|--------|-------|
| High   | 2   | 2      | 4     |
| Med    | 4   | 2      | 6     |
| Low    | 3   | 1      | 4     |
| Total  | 9   | 5      | 14    |

### Chi-square test

- Calculates the normalized squared deviation of observed (predicted) values from expected (actual) values:
  X² = Σᵢ (oᵢ − eᵢ)² / eᵢ, with df = (k − 1)(l − 1)

- The sampling distribution is known, given that the cell counts are above minimum thresholds
- Widely used to test independence between counts of categorical data

### Expected values in cells

For a 2×2 table with cell counts a, b, c, d and total N, the expected count in the (attribute = 0, class = 0) cell under independence is:

  eₐ = N · p(attribute = 0) · p(class = 0) = N · ((a + b)/N) · ((a + c)/N) = (a + b)(a + c)/N

### Example calculation

| Income | Observed Buy | Observed No buy | Expected Buy | Expected No buy |
|--------|--------------|-----------------|--------------|-----------------|
| High   | 2            | 2               | 2.57         | 1.43            |
| Med    | 4            | 2               | 3.86         | 2.14            |
| Low    | 3            | 1               | 2.57         | 1.43            |

X² = (2 − 2.57)²/2.57 + (2 − 1.43)²/1.43 + (4 − 3.86)²/3.86 + (2 − 2.14)²/2.14 + (3 − 2.57)²/2.57 + (1 − 1.43)²/1.43 ≈ 0.57

### Tree search algorithm

- Input:
  - Initial state
  - Set of actions
  - Goal test
  - Path cost function
- Output: a solution (a sequence of actions)

### Algorithm comparison

- CART:
  - Evaluation criterion: Gini index
  - Search algorithm: simple-to-complex hill-climbing search
  - Stopping criterion: when leaves are pure
  - Pruning mechanism: cross-validation to select a Gini threshold
- C4.5:
  - Evaluation criterion: information gain
  - Search algorithm: simple-to-complex hill-climbing search
  - Stopping criterion: when leaves are pure
  - Pruning mechanism: reduced-error pruning

### When to stop growing

- Full-growth methods (require pruning later):
  - All samples at a node belong to the same class
  - There are no attributes left for further splits
  - There are no samples left
- Pre-pruning methods:
  - Stop when the split evaluation measure is not significant

### Pruning

- Post-pruning:
  - Use a separate set of examples to evaluate the utility of post-pruning nodes from the tree
- Pre-pruning:
  - Apply a statistical test to decide whether to expand a node
  - Use an explicit measure of complexity to penalize large trees (e.g., Minimum Description Length)

### Search with chi-square

- Split evaluation: maximize the chi-square value X² = Σᵢ (oᵢ − eᵢ)² / eᵢ
- Stopping criterion: stop growing when the chi-square value is not significant for df = (k − 1)(l − 1) (in the example above, df = (3 − 1)(2 − 1) = 2)

[Figure: chi-square probability distributions for several degrees of freedom]

### Movement in the table

[Figure: movement of counts between cells a, b, c, d of the contingency table as the split changes]

- Can be used to make search more efficient

### Other issues

- Why not just use accuracy?
- Are the different roles of
evaluation functions compatible?

### Next class

- Reading: Chapter 8, PDM
- Topic: Learning (cont.)
- Due: Homework 2
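The hill-climbing formulation for the eight-queens problem from the search slides can be sketched in a few lines of Python. This is only an illustrative sketch, not course code: the attacking-pairs heuristic is the one the notes name, but the function names, the column-wise state encoding, and the random start are my own choices.

```python
import random

def attacking_pairs(state):
    """Heuristic from the notes: number of pairs of queens attacking
    each other. state[c] = row of the queen in column c."""
    n = len(state)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if state[a] == state[b] or abs(state[a] - state[b]) == b - a)

def hill_climb(n=8, seed=0):
    """Greedy hill climbing: repeatedly move one queen within its column
    to the best neighboring state; stop at a local optimum."""
    rng = random.Random(seed)
    state = [rng.randrange(n) for _ in range(n)]
    while True:
        best = min((attacking_pairs(state[:c] + [r] + state[c + 1:]), c, r)
                   for c in range(n) for r in range(n) if r != state[c])
        if best[0] >= attacking_pairs(state):
            return state, attacking_pairs(state)  # local optimum (may not be 0)
        state[best[1]] = best[2]

state, h = hill_climb()
print(h)  # heuristic value at the local optimum reached
```

Note that plain hill climbing can get stuck at a local optimum with h > 0, which is why the notes stress that local search trades completeness for low memory use.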
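The entropy values worked on the "Entropy of a random variable" slide (1 for a fair binary variable, 0 for a deterministic one, 0.8113 for a 0.75/0.25 split) can be checked directly from the definition H(X) = −Σ p(x) log₂ p(x); the helper name below is my own.

```python
import math

def entropy(probs):
    """H(X) = -sum_x p(x) log2 p(x); 0 * log(0) terms are taken as 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(round(entropy([0.5, 0.5]), 4))    # completely random -> 1.0
print(round(entropy([1.0, 0.0]), 4))    # deterministic -> 0.0
print(round(entropy([0.75, 0.25]), 4))  # biased -> 0.8113
```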
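The Gain(D, Income) = 0.029 result can likewise be reproduced from the class counts in the split-evaluation table (9 buy / 5 no-buy overall; 2/2, 4/2, and 3/1 for high, medium, and low income). The function names are mine; the formula is the information-gain definition from the slides.

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as raw counts."""
    n = sum(counts)
    return sum(-c / n * math.log2(c / n) for c in counts if c > 0)

def info_gain(parent, splits):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    n = sum(parent)
    return entropy(parent) - sum(sum(s) / n * entropy(s) for s in splits)

parent = [9, 5]                    # buy, no-buy overall
splits = [[2, 2], [4, 2], [3, 1]]  # income = high, med, low
print(round(entropy(parent), 4))            # -> 0.9403
print(round(info_gain(parent, splits), 3))  # -> 0.029
```

(The slides round the parent entropy to 0.9400; the unrounded value is 0.9403.)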
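The Gini-gain slide gives the formulas but no worked numbers, so the values printed below (about 0.459 for the parent Gini index and 0.019 for the gain on the same income split) are my own computation from the table's counts, not figures from the notes.

```python
def gini(counts):
    """Gini(X) = 1 - sum_x p(x)^2, from raw class counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_gain(parent, splits):
    """Gain(S, A) = Gini(S) - sum_v |S_v|/|S| * Gini(S_v)."""
    n = sum(parent)
    return gini(parent) - sum(sum(s) / n * gini(s) for s in splits)

parent = [9, 5]
splits = [[2, 2], [4, 2], [3, 1]]
print(round(gini(parent), 4))               # -> 0.4592
print(round(gini_gain(parent, splits), 4))  # -> 0.0187
```

As on the slides, the attribute with the largest gain would be chosen for the split; Gini gain and information gain usually (though not always) rank attributes similarly.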
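Finally, the chi-square example calculation (expected counts 2.57/1.43, 3.86/2.14, 2.57/1.43 and X² ≈ 0.57 with df = 2) can be verified from the contingency table, using the standard row-total × column-total / N expected counts; the variable names are mine.

```python
observed = [[2, 2], [4, 2], [3, 1]]  # rows: income high/med/low; cols: buy, no buy
n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(row[j] for row in observed) for j in range(2)]

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), df)  # -> 0.57 2
```

A value of 0.57 on 2 degrees of freedom is far below any usual significance threshold, so under the chi-square stopping criterion from the slides this split would not be made.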
