
# 577 Class Note for STAT 59800 with Professor Neville at Purdue

This 18-page set of class notes was uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Purdue University taught in the Fall. Since upload, they have received 16 views.

Data Mining — STAT 59800-024, Purdue University, February 17, 2009

## Naive Bayes classifiers

### Example: home security (Bayesian network)

Priors: P(Burglary) = .001, P(Earthquake) = .002

P(Alarm | Burglary, Earthquake):

| B | E | P(A) |
|---|---|------|
| T | T | .95 |
| T | F | .94 |
| F | T | .29 |
| F | F | .001 |

P(JohnCalls | Alarm): A = T → .90, A = F → .05
P(MaryCalls | Alarm): A = T → .70, A = F → .01

### Semantics

The full joint distribution is defined as the product of the local conditional distributions:

P(X1, ..., Xn) = Π i=1..n P(Xi | Parents(Xi))

Example:

P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) · P(m | a) · P(a | ¬b, ¬e) · P(¬b) · P(¬e)

### Challenge: choosing the Bayes net structure

- How do we determine conditional independence relations?
- Naive Bayes assumption: attributes are conditionally independent given the class label.

### Naive Bayes classifier

P(C | X) = P(X | C) P(C) / P(X) ∝ P(C) Πi P(Xi | C)

### NBC learning

Estimate the prior P(C) and the conditional probability distributions P(Xi | C) independently with MLE, e.g. P(C = yes) = 9/14, P(high | C = yes) = 2/9, P(med | C = yes) = 4/9, P(low | C = yes) = 3/9, etc.

### Learning CPTs from examples

| f(x)  | Low | Medium | High |
|-------|-----|--------|------|
| True  | 10  | 13     | 17   |
| False | 2   | 13     | 0    |

P(X1 = Low | f(x) = True) = 10 / (10 + 13 + 17)
P(X1 = Low | f(x) = False) = 2 / (2 + 13 + 0)

### Zero counts are a problem

- If an attribute value never occurs with a class in the training examples, MLE assigns zero probability to that value.
- How does that affect the posterior P(f(x) = False | x) when x has X1 = High? It equals 0, regardless of the other attributes.
- Why is this a problem? Adjust for zero counts by smoothing the probability estimates.

### Smoothing: Laplace correction

Add a uniform prior (add 1 to every count). Using the table above:

P(X1 = High | f(x) = False) = (0 + 1) / (2 + 13 + 0 + 3) = 1/18

### Is assuming independence a problem?

- Effect on probability estimates: over-counting evidence leads to overly confident estimates.
- Effect on classification: less clear. For a given input x with f(x) = True, naive Bayes will classify correctly iff P(f(x) = True | x) > 0.5.

### Naive Bayes classifier: summary

- Simplifying naive assumption: attributes are conditionally independent given the class.
- Strengths: easy to implement; often performs well even when the assumption is violated; can be learned incrementally.
- Weaknesses: the class-conditional independence assumption produces skewed probability estimates; dependencies among variables cannot be modeled.

### NBC learning components

- Model space
- Search algorithm
- Evaluation function
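The Laplace correction above can be sketched in a few lines. The counts are the ones from the CPT table in the notes; the function name and the `alpha` parameter are my own choices for illustration, not something from the lecture:

```python
# Laplace-smoothed conditional probability estimates for one attribute.
# Rows: class f(x) = True / False; columns: attribute values Low / Medium / High.
counts = {
    True:  {"Low": 10, "Medium": 13, "High": 17},
    False: {"Low": 2,  "Medium": 13, "High": 0},
}

def p_value_given_class(value, cls, counts, alpha=1):
    """P(X1 = value | f(x) = cls) with add-alpha (Laplace) smoothing."""
    row = counts[cls]
    k = len(row)           # number of attribute values (3 here)
    n = sum(row.values())  # total count for this class
    return (row[value] + alpha) / (n + alpha * k)

# Unsmoothed MLE would give P(High | False) = 0/15 = 0;
# smoothing gives (0 + 1) / (15 + 3) = 1/18, as on the slide.
print(p_value_given_class("High", False, counts))  # 1/18 ≈ 0.0556
print(p_value_given_class("Low", True, counts))    # 11/43
```

Setting `alpha=0` recovers the plain MLE estimates, including the problematic zero.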
## Other predictive models

### Nearest neighbor

- Instance-based method: store the instances and delay processing until a new instance must be classified.
- All points are represented in m-dimensional space; nearest neighbors are calculated using Euclidean distance.
- kNN returns the most common value among the k closest training examples.
- How to choose k?

### Nearest neighbor decision boundary

All points in such a cell are labeled by the class of the training point, forming a Voronoi tessellation of the feature space. (Figure: Voronoi tessellation of the feature space around the training points.)

### Nearest neighbor: summary

- Strengths: simple model, easy to implement; very efficient learning — O(1).
- Weaknesses: inefficient inference — O(n) time and space; curse of dimensionality.

### kNN learning components

- Model space
- Search algorithm
- Evaluation function

## Neural networks

- Analogous to biological systems; massive parallelism is computationally efficient.
- First learning algorithm in 1959 (Rosenblatt): the perceptron learning rule — provide target outputs with the inputs for a single neuron, and incrementally update the weights to learn to produce the outputs.

### Neuron

(Figure: input vector x and weight vector w feed a weighted sum, then an activation function, producing output y.)

### Perceptron

Model:

y = sign(Σi wi xi + b)

Learning: for each training example (xi, yi), if yi (Σj wj xij + b) ≤ 0 then w ← w + yi xi.

### Perceptron learning components

- Model space
- Search algorithm
- Evaluation function

## Support vector machines

- Discriminative model.
- General idea: find the best boundary points (the support vectors) and build the classifier on top of them.
- Linear and nonlinear SVMs.

### Choosing hyperplanes: maximize the margin

(Figure: two separating hyperplanes B1 and B2; the margin for B1 is larger than the margin for B2. Source: Introduction to Data Mining, Tan, Steinbach, and Kumar.)

### Linear SVMs

- Search for the hyperplane with the largest margin.
- Margin = d⁺ + d⁻, where d⁺ is the distance from the hyperplane to the closest positive example and d⁻ is the distance to the closest negative example; those closest points are the support vectors.

### Constrained optimization

The separating hyperplane w·x + b = 0 must satisfy:

w·xj + b ≥ +1 for yj = +1
w·xj + b ≤ −1 for yj = −1

or, combined (Eq. 3): yj (w·xj + b) ≥ 1 for all j.

With H1: w·x + b = +1 and H2: w·x + b = −1, the margin is

margin = d⁺ + d⁻ = 2 / ||w||
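The margin formula 2/||w|| can be checked numerically. The weight vector, bias, and points below are made-up values, chosen so that the closest point on each side lies exactly on H1 or H2:

```python
import math

# Hypothetical separating hyperplane w·x + b = 0 in 2-D.
w = (3.0, 4.0)   # ||w|| = 5
b = -2.0

def score(x):
    """Signed score w·x + b for a 2-D point."""
    return w[0] * x[0] + w[1] * x[1] + b

# Support vectors lying on H1 (w·x + b = +1) and H2 (w·x + b = -1).
x_pos = (1.0, 0.0)   # score = 3 - 2 = +1  -> on H1, label +1
x_neg = (0.2, 0.1)   # score = 0.6 + 0.4 - 2 = -1 -> on H2, label -1

norm_w = math.hypot(*w)              # ||w|| = 5
d_plus = abs(score(x_pos)) / norm_w  # distance to hyperplane = 1/5
d_minus = abs(score(x_neg)) / norm_w # distance to hyperplane = 1/5
margin = d_plus + d_minus            # = 2/||w|| = 0.4
print(margin, 2 / norm_w)            # 0.4 0.4
```

The point-to-hyperplane distance |w·x + b| / ||w|| is the standard formula; for points on H1 and H2 the numerator is exactly 1, which is where 2/||w|| comes from.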
### Constrained optimization (continued)

- Can maximize the margin by minimizing ||w||² subject to the constraints of Eq. 3.
- Introduce Lagrange multipliers αi ≥ 0 and minimize.

### Lagrange function

L_P = ½ ||w||² − Σi αi [ yi (w·xi + b) − 1 ]

Minimize L_P with respect to w and b, subject to αi ≥ 0. This is a convex programming problem; substituting the stationarity conditions gives the dual:

L_D = Σi αi − ½ Σi Σj αi αj yi yj (xi·xj)

Dual problem: maximize L_D with respect to the αi, subject to the constraints αi ≥ 0.

### Dual problem

- For a convex problem (no local minima) the dual problem is equivalent to the primal problem, i.e., we can switch between them.
- The dual depends only on inner products between feature vectors — a simpler quadratic programming problem.

### Limitations of linear SVMs

- Linear classifiers cannot deal with nonlinear concepts or noisy data.
- Solutions:
  - Soft margin: e.g., allow mistakes on the training data.
  - A network of simple linear classifiers: e.g., neural networks.
  - Map the data into a richer feature space (e.g., nonlinear features), then use a linear classifier.

### Map to new features

Define a new set of features in which the data are linearly separable, e.g.

Φ(x) = (x1, x2, x1x2, x1², x2²)

### Kernel trick

- Note that the dual problem depends only on Φ(xi)ᵀΦ(xj).
- Move to a (possibly infinite) number of features by replacing Φ(xi)ᵀΦ(xj) with a kernel K(xi, xj).
- Here the kernel K is a function that returns the value of the dot product between its two arguments.
- As long as the kernel is symmetric and positive semi-definite, you can forget about the features.
- Example: the polynomial kernel K(xi, xj) = (xiᵀxj)^d.

### Kernel SVMs

- State-of-the-art classifier with a good kernel.
- Solves the computational problem of working in a high-dimensional feature space.
- Non-parametric classifier: keeps all the data around in the kernel.
- Learning is expensive in the number of examples; approximations are available.

### Next class

- Reading: Chapter 10, PDM
- Topic: evaluation
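The kernel identity behind the kernel trick can be verified directly for the polynomial kernel with d = 2, where (x·z)² equals a dot product under the explicit feature map φ(x) = (x1², x2², √2·x1·x2). The function names here are my own; only the kernel formula comes from the notes:

```python
import math

def poly_kernel(x, z, d=2):
    """Polynomial kernel K(x, z) = (x·z)^d, computed without expanding features."""
    return sum(a * b for a, b in zip(x, z)) ** d

def phi(x):
    """Explicit degree-2 feature map for 2-D inputs, so that (x·z)^2 = phi(x)·phi(z)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

x, z = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(x, z)                           # (1*3 + 2*0.5)^2 = 16
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))  # dot product in feature space
print(lhs, rhs)                                   # 16.0 16.0
```

The kernel evaluates one dot product in the original 2-D space, while the right-hand side needs the 3-D expanded features; for higher d (or kernels like the RBF) the expanded space grows huge or infinite, which is exactly why the trick matters.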


