# 629 Class Note for CHEM C1260 at Purdue

This 14-page set of class notes was uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Purdue University taught by a professor in Fall. Since upload, they have received 85 views.

Date Created: 02/06/15

## Local methods and simple comparisons

Randy Julian, Lilly Research Laboratories

## Remember the bad news

Sometimes local effects need to be taken into account.

## Various approaches

[Diagram: approaches grouped as density-based approaches or the geometric approach, with boxes labeled nonparametric, parametric, and nonparametric.]

## Nonparametric Density Estimates

Parametric forms involve a functional form of the density and fitting the parameters of the function via estimation from the sample. Typically a Gaussian is used:

$$p(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right]$$

Nonparametric methods do not use a parametric functional form of the density. Popular versions:

- Parzen windows, or kernel estimate
- k-nearest-neighbor estimate

## Leading methods for estimating density at a point

Parzen windows:

$$\hat{p}(x) = \frac{k(x)}{N v}$$

k-nearest neighbors:

$$\hat{p}(x) = \frac{k-1}{N v(x)}$$

## Bias and Variance of KNN & Parzen

Fukunaga gives the bias and variance of the Parzen and kNN estimates. To first order, the variances are

$$\mathrm{Var}[\hat{p}(x)] \approx \frac{p(x)}{N v} \ \text{(Parzen)}, \qquad \mathrm{Var}[\hat{p}(x)] \approx \frac{p^2(x)}{k} \ \text{(kNN)}$$

In Parzen windows, k(x) is a random variable and v is held fixed. In kNN, v(x) is a random variable and k is held fixed. In both cases, increasing the number of nearest neighbors k reduces the variance, but it also increases v, giving a coarser estimate of p(x) and increasing the bias.

## Bias-Variance Trade Off

[Figure: prediction error versus model complexity (low to high) for the training sample. Low complexity corresponds to high bias and low variance; high complexity corresponds to low bias and high variance.]

## k = 1 vs. k-nn

[Figure: nearest neighbor versus k-nearest neighbor.]

## Nearest Neighbor Example (k = 15)

[Figure: 15-nearest neighbor classifier.]

## Finding an f(x): Nearest Neighbor

Nearest neighbor tries to find f(x) directly from the training data:

$$\hat{f}(x) = \mathrm{Ave}\left(y_i \mid x_i \in N_k(x)\right)$$

where $N_k(x)$ is the neighborhood containing the k points in the training set T closest to x.

Approximations:

- the expectation is approximated by averaging over sample data;
- conditioning at a point is relaxed to conditioning on some region close to the target point.

This assumes f(x) is well approximated by a locally constant function.

## knn library function in R

k-nearest neighbour classification for a test set from a training set. For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.

Usage:

    knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)

Arguments:

- `train`: matrix or data frame of training set cases.
- `test`: matrix or data frame of test set cases. A vector will be interpreted as a row vector for a single case.
- `cl`: factor of true classifications of the training set.
- `k`: number of neighbours considered.
- `l`: minimum vote for a definite decision, otherwise doubt. More precisely, fewer than k - l dissenting votes are allowed, even if k is increased by ties.
- `prob`: if true, the proportion of the votes for the winning class is returned as attribute `prob`.
- `use.all`: controls handling of ties. If true, all distances equal to the kth largest are included. If false, a random selection of distances equal to the kth is chosen to use exactly k neighbours.

Value: factor of classifications of the test set. Doubt is returned as `NA`.

## k nearest neighbors in R

    library(class)  # knn()
    library(nnet)   # class.ind()

    # Pop: data frame with two predictor columns and the class factor y in column 3
    pr <- as.matrix(Pop[, -3])
    tp <- Pop$y

    # Grid limits are illegible in the source; -5 to 10 is inferred from the plot axes
    xp <- seq(-5, 10, length = 100)
    np <- length(xp)
    yp <- seq(-5, 10, length = 100)
    pt <- expand.grid(x1 = xp, x2 = yp)

    par(mfcol = c(1, 2))
    Z <- knn(pr, pt, tp, k = 1)
    decplot(xp, yp, class.ind(Z), "k=1")

## Functions in R

    # Plot the two classes of Pop and overlay the decision boundary z[, 1] == z[, 2]
    decplot <- function(xp, yp, z, t) {
      plot(Pop[, 1], Pop[, 2], xlim = c(-5, 10), ylim = c(-5, 10),
           type = "n", xlab = "x1", ylab = "x2")
      title(t)
      for (il in 1:2) {
        set <- Pop$y == levels(Pop$y)[il]
        text(Pop[set, 1], Pop[set, 2],
             labels = as.character(Pop$y[set]), col = 2 + il)
      }
      zp <- z[, 1] - z[, 2]
      contour(xp, yp, matrix(zp, np), add = TRUE, levels = 0, drawlabels = FALSE)
    }

[Figure: kNN decision boundaries in two panels (k = 1 and k = 3), plotted over x1 and x2 from -5 to 10.]

## Maximize local effects

[Figure: panels labeled k = 1.]

## Variance in k = 1 and k = 15

[Figure: kNN results for the data sets ex10a.dat and ex10b.dat, marking test patterns and training patterns.]

## Classification Error

[Figure: training and test classification error versus k, the number of nearest neighbors (151 down to 1; equivalently, degrees of freedom N/k from 2 to 200), with the linear model and the Bayes error shown for comparison.]
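To make the comparison of training and test error concrete, here is a minimal sketch using `knn` from the `class` library on synthetic data. The two Gaussian clouds, the helper `make_data`, the seed, and the list of k values are assumptions chosen for illustration; they are not the course data sets.

```r
library(class)  # provides knn()

set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical two-class data: two overlapping 2-D Gaussian clouds
make_data <- function(n_per_class) {
  x <- rbind(
    matrix(rnorm(2 * n_per_class, mean = 0), ncol = 2),
    matrix(rnorm(2 * n_per_class, mean = 2), ncol = 2)
  )
  y <- factor(rep(c("a", "b"), each = n_per_class))
  list(x = x, y = y)
}

train <- make_data(100)
test  <- make_data(100)

# Misclassification rate on the training and test sets for several k
ks <- c(1, 3, 5, 9, 15, 25, 45)
errors <- t(sapply(ks, function(k) {
  c(k = k,
    train = mean(knn(train$x, train$x, train$y, k = k) != train$y),
    test  = mean(knn(train$x, test$x,  train$y, k = k) != test$y))
}))
print(errors)
```

With k = 1 every training point is its own nearest neighbor, so the training error is zero while the test error is generally larger. That gap is the low-bias, high-variance end of the trade-off.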
## Bias on training set

[Figure: bias on the training set, showing the average and standard deviation.]

## Variance: error by k on test sets

[Figure: average error versus k on test sets.]

## Mixture of Gaussians: should converge on Bayes limit

[Figure]

## Diminishing returns

[Figure]

## KNN and Parzen Density Estimates

In the Parzen window (uniform kernel) estimate, the kernel volume is fixed and we count the number of samples falling inside the volume to estimate p(x).

In the k-nearest neighbor estimator, we choose a point x at which we wish to estimate the density and construct the smallest region L(x) that contains k points. Then we estimate the density at x as

$$\hat{p}(x) = \frac{k-1}{N\,V(x)}$$

where N is the total number of points, V(x) is the volume of the minimal n-dimensional region containing k points, and the numerator is k - 1 so that the estimate is roughly unbiased.

If r is the distance from x to the kth nearest neighbor, then we can take V(x) to be the volume inside the n-sphere of radius r:

$$V(x) = \frac{\pi^{n/2}\, r^n}{\Gamma\left(\frac{n}{2} + 1\right)}$$

where $\Gamma$ is the Euler gamma function.

## Next Time

Multiple classes & mixtures

[Figure: panels titled LDA, QDA (predictive), and QDA (debiased), with axes Tetrahydrocortisone and Pregnanetriol.]

## Next Week

- Exam/project starts Monday
- Dongmao Zhang, Ben-Amotz Group: D. Zhang and D. Ben-Amotz, "Enhanced Chemical Classification of Raman Images in the Presence of Strong Fluorescence Interference," Appl. Spectrosc. 54 (2000) 1379-83.

## References

1. Pattern Classification, Duda, Hart, Stork, John Wiley & Sons, 2001.
2. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J., Advanced Drug Delivery Reviews 23, 3-25, 1997.
3. The Elements of Statistical Learning: Data Mining, Inference and Prediction, Hastie, Tibshirani and Friedman, Springer-Verlag, 2001.
4. Statistical Pattern Recognition, K. Fukunaga, 2nd Ed., Academic Press, 1990 (Purdue Electrical Engineering).
5. Modern Applied Statistics with S, W. N. Venables, B. D. Ripley, 4th Ed., Springer-Verlag, 2002.
6. R Documentation, www.r-project.org
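The k-nearest-neighbor density estimate described in these notes, (k - 1) / (N V(x)) with V(x) the n-sphere volume, can be sketched in base R as follows. The function names `sphere_volume` and `knn_density`, the standard normal sample, the seed, and the choice k = 50 are assumptions made for illustration only.

```r
# Volume of an n-dimensional sphere of radius r
sphere_volume <- function(r, n) pi^(n / 2) * r^n / gamma(n / 2 + 1)

# kNN density estimate at a single point x0 from the rows of `samples`
knn_density <- function(x0, samples, k) {
  samples <- as.matrix(samples)
  d <- sqrt(rowSums(sweep(samples, 2, x0)^2))  # Euclidean distances to x0
  r <- sort(d)[k]                              # distance to the kth nearest neighbor
  (k - 1) / (nrow(samples) * sphere_volume(r, ncol(samples)))
}

set.seed(1)                               # arbitrary seed
samples <- matrix(rnorm(2000), ncol = 1)  # 2000 draws from N(0, 1)
est <- knn_density(0, samples, k = 50)

# Compare the estimate at x = 0 with the true standard normal density
print(c(estimate = est, true = dnorm(0)))
```

Rerunning with a larger k should make the estimate less variable from sample to sample, at the cost of averaging over a wider region around x, which is the bias-variance trade-off discussed in the notes.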
