# 651 Class Note for CS 59000 with Professor Qi at Purdue

These 28 pages of class notes were uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Purdue University taught in Fall. Since its upload, the document has received 15 views.


## CS 59000: Statistical Machine Learning, Lecture 8 (Prof. Yuan Qi, Purdue)

### Outline

- Review of regularized regression and Bayesian regression
- Equivalent kernel and model comparison
- Linear classification: discriminant functions, probabilistic generative models, probabilistic discriminative models

### Regularized Least Squares

Adding a regularization term to the error function controls overfitting:

$$E(\mathbf{w}) = E_D(\mathbf{w}) + \lambda E_W(\mathbf{w}),$$

where $\lambda$ is the regularization coefficient that controls the relative importance of the data-dependent error $E_D(\mathbf{w})$ and the regularization term $E_W(\mathbf{w})$. The simplest regularizer is $E_W(\mathbf{w}) = \tfrac{1}{2}\mathbf{w}^T\mathbf{w}$, and the total error function becomes

$$\frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n)\}^2 + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w}.$$

This is called *weight decay* because, in sequential learning, the weight values decay towards zero unless supported by the data.

### More Regularizers

A more general regularized error is

$$\frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n)\}^2 + \frac{\lambda}{2}\sum_{j=1}^{M}|w_j|^q,$$

where $q = 2$ corresponds to the quadratic regularizer and $q = 1$ is known as the *lasso*. Regularization allows complex models to be trained on small data sets without severe overfitting.

### Visualization of Regularized Regression

[Figure: contours of the unregularized error function together with the constraint region of the regularization term $\sum_j |w_j|^q$, for the quadratic ($q = 2$) and lasso ($q = 1$) regularizers.] With $q = 1$ and $\lambda$ sufficiently large, some of the coefficients $w_j$ are driven exactly to zero, giving *sparse* models in which the corresponding basis functions play no role.

### Bayesian Linear Regression

Place a prior probability distribution over the model parameters $\mathbf{w}$, assuming the noise precision $\beta$ is known. Since the likelihood function $p(\mathbf{t} \mid \mathbf{w})$ with Gaussian noise has an exponential form, the conjugate prior is Gaussian,

$$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0),$$

with mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$.

### Posterior Distribution of the Parameters

The posterior is given by the product of the likelihood function and the prior, $p(\mathbf{w} \mid \mathcal{D}) = p(\mathcal{D} \mid \mathbf{w})\,p(\mathbf{w}) / p(\mathcal{D})$. Due to the choice of a conjugate Gaussian prior, the posterior is also Gaussian and can be written directly in the form

$$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N),$$

where

$$\mathbf{m}_N = \mathbf{S}_N(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\boldsymbol{\Phi}^T\mathbf{t}), \qquad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}.$$

### Predictive Distribution

We are usually not interested in the value of $\mathbf{w}$ itself, but in predicting $t$ for new values of $\mathbf{x}$. We therefore evaluate the predictive distribution

$$p(t \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) = \mathcal{N}\bigl(t \mid \mathbf{m}_N^T\boldsymbol{\phi}(\mathbf{x}),\, \sigma_N^2(\mathbf{x})\bigr), \qquad \sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^T\mathbf{S}_N\boldsymbol{\phi}(\mathbf{x}).$$

The first term represents the noise in the data, the second the uncertainty associated with the parameters $\mathbf{w}$. The predictive uncertainty depends on $\mathbf{x}$ and is smallest in the neighborhood of the data points; the level of uncertainty decreases as more data points are observed.

### Examples of the Predictive Distribution

[Figure: predictive distribution, shown as the mean of the Gaussian plus or minus one standard deviation, for data set sizes $N = 1, 2, 4, 25$.]

### Equivalent Kernel

Given $\mathbf{m}_N = \beta\mathbf{S}_N\boldsymbol{\Phi}^T\mathbf{t}$, the predictive mean is

$$y(\mathbf{x}, \mathbf{m}_N) = \mathbf{m}_N^T\boldsymbol{\phi}(\mathbf{x}) = \beta\,\boldsymbol{\phi}(\mathbf{x})^T\mathbf{S}_N\boldsymbol{\Phi}^T\mathbf{t} = \sum_{n=1}^{N}\beta\,\boldsymbol{\phi}(\mathbf{x})^T\mathbf{S}_N\boldsymbol{\phi}(\mathbf{x}_n)\,t_n,$$

which can be written as a weighted sum of the training targets,

$$y(\mathbf{x}, \mathbf{m}_N) = \sum_{n=1}^{N}k(\mathbf{x}, \mathbf{x}_n)\,t_n, \qquad k(\mathbf{x}, \mathbf{x}') = \beta\,\boldsymbol{\phi}(\mathbf{x})^T\mathbf{S}_N\boldsymbol{\phi}(\mathbf{x}').$$

The covariance between two predictions shows that the predictive mean at nearby points will be highly correlated, whereas for more distant pairs of points the correlation will be smaller:

$$\operatorname{cov}[y(\mathbf{x}), y(\mathbf{x}')] = \operatorname{cov}[\boldsymbol{\phi}(\mathbf{x})^T\mathbf{w},\, \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}')] = \boldsymbol{\phi}(\mathbf{x})^T\mathbf{S}_N\boldsymbol{\phi}(\mathbf{x}') = \beta^{-1}k(\mathbf{x}, \mathbf{x}').$$

### Bayesian Model Comparison

Suppose we want to compare models $\mathcal{M}_i$. Given a training set $\mathcal{D}$ we compute

$$p(\mathcal{M}_i \mid \mathcal{D}) \propto p(\mathcal{M}_i)\,p(\mathcal{D} \mid \mathcal{M}_i).$$

The quantity $p(\mathcal{D} \mid \mathcal{M}_i)$ is the *model evidence*, also known as the *marginal likelihood*; the ratio $p(\mathcal{D} \mid \mathcal{M}_i)/p(\mathcal{D} \mid \mathcal{M}_j)$ is a *Bayes factor*.

### Likelihood, Parameter Posterior, and Evidence

The evidence is obtained by integrating out the parameters,

$$p(\mathcal{D} \mid \mathcal{M}_i) = \int p(\mathcal{D} \mid \mathbf{w}, \mathcal{M}_i)\,p(\mathbf{w} \mid \mathcal{M}_i)\,d\mathbf{w},$$

and the parameter posterior is

$$p(\mathbf{w} \mid \mathcal{D}, \mathcal{M}_i) = \frac{p(\mathcal{D} \mid \mathbf{w}, \mathcal{M}_i)\,p(\mathbf{w} \mid \mathcal{M}_i)}{p(\mathcal{D} \mid \mathcal{M}_i)}.$$

### Crude Evidence Approximation

Assume the posterior distribution is sharply peaked around its mode $\mathbf{w}_{\mathrm{MAP}}$ with width $\Delta w_{\mathrm{posterior}}$, and that the prior is flat with width $\Delta w_{\mathrm{prior}}$. Then

$$p(\mathcal{D}) = \int p(\mathcal{D} \mid \mathbf{w})\,p(\mathbf{w})\,d\mathbf{w} \simeq p(\mathcal{D} \mid \mathbf{w}_{\mathrm{MAP}})\,\frac{\Delta w_{\mathrm{posterior}}}{\Delta w_{\mathrm{prior}}}.$$

With $M$ parameters, each assumed to have the same ratio of widths,

$$\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D} \mid \mathbf{w}_{\mathrm{MAP}}) + M\ln\!\left(\frac{\Delta w_{\mathrm{posterior}}}{\Delta w_{\mathrm{prior}}}\right).$$

The evidence thus penalizes over-complex models: maximizing the evidence leads to a natural trade-off between data fitting and model complexity.

### Evidence Approximation and Empirical Bayes

The fully Bayesian predictive distribution also integrates over the hyperparameters:

$$p(t \mid \mathbf{t}) = \iiint p(t \mid \mathbf{w}, \beta)\,p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta)\,p(\alpha, \beta \mid \mathbf{t})\,d\mathbf{w}\,d\alpha\,d\beta.$$

Approximating the evidence by maximizing the marginal likelihood gives

$$p(t \mid \mathbf{t}) \simeq p(t \mid \mathbf{t}, \hat{\alpha}, \hat{\beta}) = \int p(t \mid \mathbf{w}, \hat{\beta})\,p(\mathbf{w} \mid \mathbf{t}, \hat{\alpha}, \hat{\beta})\,d\mathbf{w},$$

where the hyperparameters $\hat{\alpha}, \hat{\beta}$ maximize the evidence $p(\alpha, \beta \mid \mathbf{t})$. This is known as *empirical Bayes* or *type-2 maximum likelihood*.

### Model Evidence and Cross-Validation

[Figure: fitting polynomial regression models of order $M = 0, \dots, 8$; the root-mean-square error on held-out data and the model evidence identify a similar range of good model orders.]

### Linear Classification

Goal: given an input vector $\mathbf{x}$, assign it to one of $K$ disjoint discrete classes based on training data. The input space is divided by
decision boundaries, or decision surfaces. In linear classification, the decision boundaries are linear functions of the input vector $\mathbf{x}$.

### Three Approaches

- **Discriminant functions** directly assign an input vector to a specific class. [Figure: geometry of a linear discriminant, showing the distance from $\mathbf{x}$ to the decision surface.]
- **Probabilistic generative models** model the data-generation process, i.e. the class-conditional densities $p(\mathbf{x} \mid \mathcal{C}_k)$ and priors $p(\mathcal{C}_k)$, and use Bayes' rule: $p(\mathcal{C}_k \mid \mathbf{x}) = p(\mathbf{x} \mid \mathcal{C}_k)\,p(\mathcal{C}_k)/p(\mathbf{x})$.
- **Probabilistic discriminative models** model the posterior class probabilities $p(\mathcal{C}_k \mid \mathbf{x})$ directly.

### Fisher's Linear Discriminant

Find a projection onto a line such that samples from different classes are well separated. [Figure: a 2-D example showing a bad line to project to, where the classes are mixed up, and a good line to project to, where the classes are well separated.]

**Linear projection.** Let the line direction be given by a unit vector $\mathbf{v}$. The scalar $\mathbf{v}^T\mathbf{x}_i$ is the distance of the projection of $\mathbf{x}_i$ from the origin; thus $y_i = \mathbf{v}^T\mathbf{x}_i$ is the projection of $\mathbf{x}_i$ onto a one-dimensional subspace.

**A naive choice of separation measure.** Let $\tilde{\mu}_1$ and $\tilde{\mu}_2$ be the means of the projections of classes 1 and 2, and let $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ be the means of classes 1 and 2. Then $|\tilde{\mu}_1 - \tilde{\mu}_2|$ seems like a good measure of separation.

**Problem with the naive criterion.** The larger $|\tilde{\mu}_1 - \tilde{\mu}_2|$, the better the expected separation, but this ignores the spread of the classes. [Figure: an example in which the vertical axis is a better line to project to for class separability, even though projecting onto the horizontal axis gives a larger $|\tilde{\mu}_1 - \tilde{\mu}_2|$.] The missing factor is the variances: a direction with a large distance between projected means can still mix the classes if the within-class variance along that direction is large.

**Scatter of the data in each class.** Define the scatter of a class with $n$ samples as

$$s^2 = \sum_{i=1}^{n}(y_i - \tilde{\mu})^2.$$

Scatter is just the sample variance multiplied by $n$: it measures the same thing as variance, the spread of the data around the mean, only on a different scale.

**Fisher's solution: normalization by scatter.** Normalize $|\tilde{\mu}_1 - \tilde{\mu}_2|^2$ by the scatter. Let $y_i = \mathbf{v}^T\mathbf{x}_i$, i.e. the $y_i$ are the projected samples. The scatter of the projected samples of class 1 is

$$\tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}}(y_i - \tilde{\mu}_1)^2,$$

and the scatter of the projected samples of class 2 is

$$\tilde{s}_2^2 = \sum_{y_i \in \text{Class 2}}(y_i - \tilde{\mu}_2)^2.$$

**Next class:** the perceptron, probabilistic generative models, and probabilistic discriminative models.
