# Machine Learning CS 545

These 79 pages of class notes were uploaded by Betty Kertzmann on Tuesday, September 22, 2015. The notes are for CS 545 at Colorado State University, taught by Charles Anderson in Fall. Since upload, they have received 42 views. For similar materials see /class/210186/cs-545-colorado-state-university in Computer Science at Colorado State University.


## CS545: Classification with Logistic Regression

Chuck Anderson, Department of Computer Science, Colorado State University, Fall 2009

### Outline

- Logistic Regression
  - Setup
  - Derivation

### Masking

- Recall that a linear model used for classification can result in masking.
- We discussed fixing this by using different shaped membership functions, other than linear.
- Our first approach to this was to use generative models (Gaussian distributions) to model the data from each class.
- Using Bayes' Theorem, we derived QDA and LDA.

*(Figure: linear membership functions for three classes, with indicator targets (0,0,1), (0,1,0), and (1,0,0).)*

- The problem was that the green line for Class 2 was too low. In fact, all lines are too low in the middle of the $x$ range. Maybe we can reduce the masking effect by
  - requiring the function values to be between 0 and 1, and
  - requiring them to sum to 1 for every value of $x$.

### Logistic Regression: Setup

- We can satisfy those two requirements by directly representing $p(C=k \mid x)$ as

$$
\frac{f(x;\theta_k)}{\sum_{m=1}^{K} f(x;\theta_m)},
$$

  where we haven't discussed the form of $f$ yet, but $\theta$ represents the parameters of $f$ that we will tune to fit the training data later.

- This is certainly an expression that is between 0 and 1 for any $x$.
- Now we have $p(C=k \mid x)$ expressed directly, as opposed to the previous generative approach of first modeling $p(x \mid C=k)$ and using Bayes' theorem to get $p(C=k \mid x)$.
- Let's give the above expression another name:

$$
g(x;\theta_k) = p(C=k \mid x) = \frac{f(x;\theta_k)}{\sum_{m=1}^{K} f(x;\theta_m)}
$$

- Now let's deal with our requirement that the sum must equal 1:

$$
1 = \sum_{k=1}^{K} p(C=k \mid x) = \sum_{k=1}^{K} g(x;\theta_k)
$$

- However, this constraint overdetermines the $g(x;\theta_k)$. If $1 = a + b + c$ must be true, then given values for $a$ and $b$, $c$ is already determined, as $c = 1 - a - b$.
- Another way to say this is that we can set $c$ to any value, and values for $a$ and $b$ can still be found that satisfy the above equation.
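The two requirements from the setup above are easy to check numerically. A minimal Python sketch (the notes use R; the three membership values here are made up):

```python
def g(fvals):
    """Normalize positive membership values f(x; theta_k) into
    probabilities p(C=k | x) = f_k / sum_m f_m."""
    total = sum(fvals)
    return [f / total for f in fvals]

# hypothetical membership values for K = 3 classes at some x
probs = g([2.0, 0.5, 1.5])
print(probs)        # each value lies between 0 and 1
print(sum(probs))   # the values sum to 1
```

Whatever positive $f$ we later choose, this normalization guarantees both properties.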
- For example, $1 = \left(a - \tfrac{c}{2}\right) + \left(b - \tfrac{c}{2}\right) + 2c$.
- So let's just set the final $f(x;\theta_K)$ to be 1. Now

$$
g(x;\theta_k) =
\begin{cases}
\dfrac{f(x;\theta_k)}{1 + \sum_{m=1}^{K-1} f(x;\theta_m)}, & k < K \\[2ex]
\dfrac{1}{1 + \sum_{m=1}^{K-1} f(x;\theta_m)}, & k = K
\end{cases}
$$

### Derivation

- Whatever we choose for $f$, we must make a plan for optimizing its parameters $\theta$. How?
- Let's maximize the likelihood of the data. So, what is the likelihood of training data consisting of samples $x_1, x_2, \ldots, x_N$ and class indicator variables

$$
\begin{pmatrix}
t_{1,1} & t_{1,2} & \cdots & t_{1,K} \\
t_{2,1} & t_{2,2} & \cdots & t_{2,K} \\
\vdots & & & \vdots \\
t_{N,1} & t_{N,2} & \cdots & t_{N,K}
\end{pmatrix}
$$

  with every value $t_{n,k}$ being 0 or 1, and each row of this matrix containing a single 1? We can also express $x_1, x_2, \ldots, x_N$ as an $N \times D$ matrix $X$, but we will be using single samples $x_n$ more often in the following.

### Data Likelihood

- The likelihood is just the product of all $p(C = \text{class of } n\text{th sample} \mid x_n)$ values for samples $n = 1, \ldots, N$. A common way to express this product, using those handy indicator variables, is

$$
L(\theta) = \prod_{n=1}^{N} \prod_{k=1}^{K} p(C=k \mid x_n)^{t_{n,k}}
$$

- Why do the indicator variables appear as exponents?
- Say we have three classes ($K = 3$) and training sample $n$ is from Class 2. Then the inner product is

$$
\begin{aligned}
p(C=1 \mid x_n)^{t_{n,1}}\, p(C=2 \mid x_n)^{t_{n,2}}\, p(C=3 \mid x_n)^{t_{n,3}}
&= p(C=1 \mid x_n)^{0}\, p(C=2 \mid x_n)^{1}\, p(C=3 \mid x_n)^{0} \\
&= p(C=2 \mid x_n)
\end{aligned}
$$

  This shows how the indicator variables as exponents select the correct terms to be included in the product.
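The exponent trick is easy to see numerically. A small Python sketch (the probabilities and indicator targets below are made up):

```python
# Hypothetical predicted probabilities p(C=k | x_n) for N=2 samples, K=3 classes
p = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
# Indicator targets: sample 1 is from class 1, sample 2 from class 3
t = [[1, 0, 0],
     [0, 0, 1]]

# L = prod_n prod_k p(C=k|x_n)^{t_nk}: the exponents keep one factor per sample
L = 1.0
for n in range(2):
    for k in range(3):
        L *= p[n][k] ** t[n][k]

print(L)   # 0.7 * 0.6 = 0.42 (up to floating point)
```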
### Maximizing the Data Likelihood

- So, we want to find the $\theta$ that maximizes the data likelihood. How shall we proceed?

$$
L(\theta) = \prod_{n=1}^{N} \prod_{k=1}^{K} p(C=k \mid x_n)^{t_{n,k}}
$$

- Right! Find the derivative with respect to each component of $\theta$, or the gradient with respect to $\theta$. But there is a mess of products in this. So...
- Right! Work with the logarithm, $\log L(\theta)$, which we will call $\ell(\theta)$:

$$
\ell(\theta) = \log L(\theta) = \sum_{n=1}^{N} \sum_{k=1}^{K} t_{n,k} \log p(C=k \mid x_n)
$$

### Gradient Ascent

- Unfortunately, the gradient of $\ell(\theta)$ with respect to $\theta$ is not linear in $\theta$, so we cannot simply set the result equal to zero and solve for $\theta$.
- Instead, we do gradient ascent:
  1. Initialize $\theta$ to some value.
  2. Make a small change to $\theta$ in the direction of the gradient of $\ell$ with respect to $\theta$, or $\nabla_\theta \ell$:
     $$\theta \leftarrow \theta + \alpha \nabla_\theta \ell(\theta)$$
     where $\alpha$ is a constant that affects the step size.
  3. Repeat the above step until $\ell$ seems to be at a maximum.
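The loop above can be sketched on a toy problem. Here gradient ascent maximizes a stand-in concave function $\ell(\theta) = -(\theta - 3)^2$, whose gradient is $-2(\theta - 3)$ and whose maximum is at $\theta = 3$; the choices of $\alpha$ and iteration count are arbitrary, and this Python snippet is only an illustration of the update rule:

```python
# Gradient ascent on l(theta) = -(theta - 3)^2, gradient -2*(theta - 3).
theta = 0.0
alpha = 0.1
for _ in range(100):
    theta = theta + alpha * (-2.0 * (theta - 3.0))
print(theta)   # converges toward 3.0, the maximizer
```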
- Remember that $\theta$ is a matrix of parameters with, let's say, columns $\theta_k$ corresponding to the values required for each $f$, of which there are $K-1$.
- We can work on the update formula and $\nabla_\theta \ell(\theta)$ one column at a time,

$$
\theta_k \leftarrow \theta_k + \alpha \nabla_{\theta_k} \ell(\theta),
$$

  and combine them at the end:

$$
\theta \leftarrow \theta + \alpha \nabla_\theta \ell, \qquad
\nabla_\theta \ell = \left( \nabla_{\theta_1} \ell, \ldots, \nabla_{\theta_{K-1}} \ell \right)
$$

- Remembering that $\frac{\partial \log h(x)}{\partial x} = \frac{1}{h(x)} \frac{\partial h(x)}{\partial x}$ and that $p(C=k \mid x_n) = g(x_n;\theta_k)$,

$$
\ell(\theta) = \sum_{n=1}^{N} \sum_{k=1}^{K} t_{n,k} \log p(C=k \mid x_n)
= \sum_{n=1}^{N} \sum_{k=1}^{K} t_{n,k} \log g(x_n;\theta_k)
$$

$$
\nabla_{\theta_j} \ell = \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{t_{n,k}}{g(x_n;\theta_k)}\, \nabla_{\theta_j}\, g(x_n;\theta_k)
$$

- It would be super nice if $\nabla_{\theta_j}\, g(x_n;\theta_k)$ includes the factor $g(x_n;\theta_k)$, so that it will cancel with the $g(x_n;\theta_k)$ in the denominator.
- We can get this by defining $f(x_n;\theta_k) = e^{\theta_k^T x_n}$, so that

$$
g(x_n;\theta_k) = \frac{e^{\theta_k^T x_n}}{1 + \sum_{m=1}^{K-1} e^{\theta_m^T x_n}}
$$

- Now we can work on $\nabla_{\theta_j}\, g(x_n;\theta_k)$. For $k < K$,

$$
\begin{aligned}
\nabla_{\theta_j}\, g(x_n;\theta_k)
&= \nabla_{\theta_j} \left[ e^{\theta_k^T x_n} \left(1 + \sum_{m=1}^{K-1} e^{\theta_m^T x_n}\right)^{-1} \right] \\
&= \frac{\delta_{jk}\, e^{\theta_k^T x_n}}{1 + \sum_{m=1}^{K-1} e^{\theta_m^T x_n}}\, x_n
 \;-\; \frac{e^{\theta_k^T x_n}\, e^{\theta_j^T x_n}}{\left(1 + \sum_{m=1}^{K-1} e^{\theta_m^T x_n}\right)^2}\, x_n \\
&= g(x_n;\theta_k)\left(\delta_{jk} - g(x_n;\theta_j)\right) x_n,
\end{aligned}
$$

  where $\delta_{jk} = 1$ if $j = k$, and 0 otherwise. (The same final form also holds for $k = K$, whose numerator is the constant 1.)

- Now,

$$
\begin{aligned}
\nabla_{\theta_j} \ell
&= \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{t_{n,k}}{g(x_n;\theta_k)}\, g(x_n;\theta_k)\left(\delta_{jk} - g(x_n;\theta_j)\right) x_n \\
&= \sum_{n=1}^{N} \sum_{k=1}^{K} t_{n,k} \left(\delta_{jk} - g(x_n;\theta_j)\right) x_n \\
&= \sum_{n=1}^{N} \left(t_{n,j} - g(x_n;\theta_j)\right) x_n,
\end{aligned}
$$

  since $\sum_{k=1}^{K} t_{n,k} = 1$,

- which results in this update rule for $\theta_j$:

$$
\theta_j \leftarrow \theta_j + \alpha \sum_{n=1}^{N} \left(t_{n,j} - g(x_n;\theta_j)\right) x_n
$$

- How do we do this in R?
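Before turning to R, the whole derivation can be sketched in plain Python. `g_all` and `train_logistic` are hypothetical names, and the four-sample toy data set is made up; this is an illustration of the update rule, not the course's code:

```python
import math

def g_all(theta, x):
    """Class probabilities with f(x; theta_K) fixed at 1:
    g_k = exp(theta_k . x) / (1 + sum_m exp(theta_m . x)) for k < K,
    and g_K = 1 / (1 + sum_m exp(theta_m . x))."""
    scores = [math.exp(sum(w * xi for w, xi in zip(col, x))) for col in theta]
    denom = 1.0 + sum(scores)
    return [s / denom for s in scores] + [1.0 / denom]

def train_logistic(X, T, K, alpha=0.1, steps=500):
    """Gradient ascent on the log likelihood, one column of theta at a time:
    theta_j <- theta_j + alpha * sum_n (t_nj - g(x_n; theta_j)) x_n."""
    D = len(X[0])
    theta = [[0.0] * D for _ in range(K - 1)]   # K-1 parameter columns
    for _ in range(steps):
        grads = [[0.0] * D for _ in range(K - 1)]
        for x, t in zip(X, T):
            gx = g_all(theta, x)
            for j in range(K - 1):
                for i in range(D):
                    grads[j][i] += (t[j] - gx[j]) * x[i]
        # combine the per-column gradients at the end of the pass
        for j in range(K - 1):
            theta[j] = [w + alpha * d for w, d in zip(theta[j], grads[j])]
    return theta

# Toy problem: K = 2 classes, inputs with a constant-1 first component.
# Class 1 for negative inputs, class 2 for positive inputs.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
T = [[1, 0], [1, 0], [0, 1], [0, 1]]
theta = train_logistic(X, T, K=2)
probs = [g_all(theta, x)[0] for x in X]   # p(C=1 | x_n)
print(probs)   # high for the first two samples, low for the last two
```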
## CS545: Linear Modeling

Chuck Anderson, Department of Computer Science, Colorado State University, Fall 2009

### Outline

- R Tips for Linear Modeling

### Standardizing Inputs

- Standardize attribute values: each has mean zero, unit variance.
- Calculate the mean of each attribute for the training data:

```r
means <- colMeans(Xtrain)
```

- Calculate the standard deviation of each attribute for the training data:

```r
stdevs <- apply(Xtrain, 2, sd)
```

- Subtract the means and divide by the stdevs, column by column:

```r
Xstrain <- (Xtrain - matrix(means, nrow(Xtrain), ncol(Xtrain), byrow=TRUE)) /
           matrix(stdevs, nrow(Xtrain), ncol(Xtrain), byrow=TRUE)
```

- To standardize testing data, use the means and stdevs calculated from the training data:

```r
Xstest <- (Xtest - matrix(means, nrow(Xtest), ncol(Xtest), byrow=TRUE)) /
          matrix(stdevs, nrow(Xtest), ncol(Xtest), byrow=TRUE)
```

- Must keep track of the means and stdevs from the training data. Can do so as variables returned from a standardize function:

```r
standardize <- function(X, means=apply(X, 2, mean), stdevs=apply(X, 2, sd),
                        returnParms=FALSE) {
  N <- nrow(X)
  D <- ncol(X)
  X <- (X - matrix(rep(means, N), N, D, byrow=TRUE)) /
       matrix(rep(stdevs, N), N, D, byrow=TRUE)
  if (returnParms) {
    list(data=X, means=means, stdevs=stdevs)
  } else {
    X
  }
}

# Used like
tp <- standardize(Xtrain, returnParms=TRUE)
Xstrain <- tp$data
Xstest <- standardize(Xtest, tp$means, tp$stdevs)
```
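The same keep-the-training-statistics idea, sketched in Python (the function names and the tiny data matrix are made up for illustration):

```python
def standardize_fit(X):
    """Column means and sample standard deviations from TRAINING data only."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stdevs = [(sum((v - m) ** 2 for v in col) / (n - 1)) ** 0.5
              for col, m in zip(zip(*X), means)]
    return means, stdevs

def standardize_apply(X, means, stdevs):
    """Apply previously computed means/stdevs to any data set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)]
            for row in X]

Xtrain = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, stdevs = standardize_fit(Xtrain)              # training statistics
Xstrain = standardize_apply(Xtrain, means, stdevs)
Xstest = standardize_apply([[2.0, 40.0]], means, stdevs)  # reuses training stats
print(means, stdevs)
```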
- Or, we can "store" the means and stdevs as local variables bound to values inside a newly created function:

```r
makeStandardizeF <- function(X) {
  # X is nSamples x nDimensions
  mu <- colMeans(X)
  sigma <- apply(X, 2, sd)   # should be named "sds"
  function(newX) {
    nr <- nrow(newX)
    nc <- ncol(newX)
    (newX - matrix(mu, nr, nc, byrow=TRUE)) /
      matrix(sigma, nr, nc, byrow=TRUE)
  }
}

# Used like
standardize <- makeStandardizeF(Xtrain)
Xstrain <- standardize(Xtrain)
Xstest <- standardize(Xtest)
```

### What should makeLLS return?

- Certainly we want the weights returned; after all, that is the model. What else?
- The means and stdevs should also be associated with this model. Different models will have different weights and different standardization parameters.
- How would makeLLS return all of this?

```r
return(list(weights=w, standardize=standardize))
```

- Use it like

```r
model <- makeLLS(Xtrain, Ttrain, lambda)
predictions <- useLLS(model, Xtest)
```

- Inside useLLS, how would you use model?
- Say useLLS has arguments named model and X:

```r
X <- model$standardize(X)
predictions <- X %*% model$weights
```
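The bundle-the-model pattern can be sketched in Python as well. Here `make_model` and `use_model` are hypothetical stand-ins for makeLLS/useLLS, and the one-input least-squares "fit" and toy data are made up purely to show the weights-plus-standardizer bundle:

```python
def make_model(X, T):
    """Return a model dict bundling weights with the standardizer that made them."""
    mean = sum(X) / len(X)
    sd = (sum((x - mean) ** 2 for x in X) / (len(X) - 1)) ** 0.5
    standardize = lambda xs: [(x - mean) / sd for x in xs]
    Xs = standardize(X)
    # least-squares slope through the origin in standardized coordinates
    w = sum(x * t for x, t in zip(Xs, T)) / sum(x * x for x in Xs)
    return {"weights": w, "standardize": standardize}

def use_model(model, X):
    """Apply the model's own standardizer before applying its weights."""
    return [model["weights"] * x for x in model["standardize"](X)]

model = make_model([1.0, 2.0, 3.0], [-1.0, 0.0, 1.0])
print(use_model(model, [1.0, 2.0, 3.0]))
```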
### Collecting and Combining Multiple Results

- We often want to repeat a calculation a number of times using different parameter values, like values of $\lambda$ and of the training set fraction. So you might use for loops like

```r
for (trainf in c(0.2, 0.4, 0.6, 0.8, 0.9, 0.98)) {
  for (lambda in seq(0, 1.0, by=0.5)) {
    # do calculation here using trainf and lambda
    # to obtain trainRMSE and testRMSE
  }
}
```

- We could try to keep sums of the RMSEs so we can calculate averages later. But let's use that cheap memory and just collect each result in a new row of a matrix:

```r
# do calculation here using trainf and lambda
# to obtain trainRMSE and testRMSE
results <- rbind(results, c(trainf, lambda, trainRMSE, testRMSE))
```

- Don't forget to initialize results (`results <- NULL`) before you start the for loops.
- Now the matrix has many rows, 200 for each pair of (trainf, lambda) values. How can we calculate the means of those 200 values? Check out `unique`:

```r
> results
     [,1] [,2] [,3] [,4]
[1,]  0.2  0.5  3.2  3.6
[2,]  0.2  1.0  5.3  3.2
[3,]  0.2  0.5  5.0  3.3
 ...
> unique(results[,1:2])
     [,1] [,2]
[1,]  0.2  0.5
[2,]  0.2  1.0
 ...
```

- So we can use `unique` to identify the unique combinations of parameter values in our results matrix.
- Then generate a boolean mask to select the rows for one unique combination:

```r
> uniqueCombos <- unique(results[,1:2])
> oneCombo <- uniqueCombos[1,]
> mask <- apply(results[,1:2], 1, function(ps) all(ps == oneCombo))
> colMeans(results[mask, 3:4])
[1] 4.35 3.45
```
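The unique-plus-mask averaging can be sketched in Python with a dictionary keyed by the parameter combination (the rows below are made up; the columns are trainf, lambda, trainRMSE, testRMSE):

```python
results = [
    (0.2, 0.5, 3.2, 3.6),
    (0.2, 1.0, 5.3, 3.2),
    (0.2, 0.5, 5.0, 3.3),
]

# Accumulate sums and counts per (trainf, lambda) combination
sums = {}
for trainf, lam, train_rmse, test_rmse in results:
    acc = sums.setdefault((trainf, lam), [0.0, 0.0, 0])
    acc[0] += train_rmse
    acc[1] += test_rmse
    acc[2] += 1

# Mean train/test RMSE for each unique combination
means = {k: (s / n, t / n) for k, (s, t, n) in sums.items()}
print(means[(0.2, 0.5)])
```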
## Backing up Files in Unix

- A simple shell script is linked to in the CS545 Schedule web page.
- Copy it into your bin directory and make it executable by `chmod a+x backup`.
- Add your bin directory to your PATH shell variable:

```
export PATH=$PATH:/s/parsons/e/fac/anderson/bin
```

- Then, to back up files, just do

```
> backup *.R
Creating BACKUP directory
Created the following BACKUP files:
  BACKUP/highd.R.20090910114846
  BACKUP/multicollinearity.R.20090910114846
BACKUP now contains 2 files
```

## Collinearity

### Collinearity of Attributes

- What if one attribute is a linear function of another attribute, say pressure $= -2 \times$ temperature? How might this affect their weight values?
- Given any linear model weights, the model will make exactly the same predictions if the weight value for temperature is increased by some amount $a$ and the weight value for pressure is increased by $a/2$: the change contributes $a \cdot t + (a/2)(-2t) = 0$ to every prediction, so the weights are not uniquely determined.
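A quick numeric check of this weight ambiguity, sketched in Python (the temperatures and weight values are made up):

```python
# If pressure = -2 * temperature exactly, adding a to the temperature weight
# and a/2 to the pressure weight changes each prediction by
# a*t + (a/2)*(-2*t) = 0, so many weight settings give identical predictions.
temps = [10.0, 15.0, 20.0]
pressures = [-2.0 * t for t in temps]

def predict(w_temp, w_press):
    return [w_temp * t + w_press * p for t, p in zip(temps, pressures)]

a = 0.7
original = predict(1.0, 3.0)
shifted = predict(1.0 + a, 3.0 + a / 2)
print(max(abs(o - s) for o, s in zip(original, shifted)))   # essentially 0
```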
- How can we create a new attribute that is a linear function of another, and vary how close it is to linearly dependent?
- Add noise to a linear function. Say X has 7 columns:

```r
X <- cbind(X, -2 * X[,7] + rnorm(nrow(X), 0, 0.01))
```

- Doing this for the mpg data and varying the standard deviation $\sigma$ of the noise, we get these weight values:

*(Figure: weight values for the original attribute and its noisy copy, plotted against $\log \sigma$, the degree of independence between attributes 8 and 9.)*

## CS545: Bishop 2.3, 2.5, 3.1

Chuck Anderson, Department of Computer Science, Colorado State University, Fall 2009

### Outline

- Gaussian Distribution
- Generating Samples

### Gaussian Distribution

- 1-dimensional Gaussian:

$$
p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}
$$

```r
mydnorm1 <- function(x, mu=0, sigma=1.0) {
  1/(sqrt(2*pi)*sigma) * exp(-1/(2*sigma^2) * (x - mu)^2)
}
```

- 2-dimensional Gaussian:

$$
p(x \mid \mu, \Sigma) = \frac{1}{2\pi\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}
$$

```r
mydnorm2 <- function(x, mu=c(0, 0), sigma=diag(1, 2)) {
  as.numeric(1/(2*pi*sqrt(det(sigma))) *
    exp(-1/2 * t(x - mu) %*% solve(sigma) %*% (x - mu)))
}
```

- d-dimensional Gaussian:

$$
p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}
$$

```r
mydnorm <- function(x, mu=matrix(0, length(x)), sigma=diag(1, length(x))) {
  d <- length(x)
  as.numeric(1/((2*pi)^(d/2) * sqrt(det(sigma))) *
    exp(-1/2 * t(x - mu) %*% solve(sigma) %*% (x - mu)))
}
```
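As a cross-check on the formulas, here is a Python sketch for the diagonal-covariance case, where the $d$-dimensional density should factor into a product of 1-dimensional densities. `dnorm_diag` is a hypothetical helper specialized to diagonal $\Sigma$ (so no matrix inverse is needed), not a port of mydnorm:

```python
import math

def dnorm1(x, mu=0.0, sigma=1.0):
    """1-d Gaussian density."""
    return 1.0 / (math.sqrt(2 * math.pi) * sigma) * \
        math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def dnorm_diag(x, mu, var):
    """d-dimensional Gaussian density with DIAGONAL covariance,
    given as a list of per-dimension variances `var`."""
    d = len(x)
    det = math.prod(var)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (d / 2) * math.sqrt(det))

# With a diagonal covariance, the joint density factors into 1-d densities
x, mu, var = [0.5, -1.0], [0.0, 0.0], [1.0, 4.0]
joint = dnorm_diag(x, mu, var)
product = dnorm1(0.5, 0.0, 1.0) * dnorm1(-1.0, 0.0, 2.0)
print(abs(joint - product))   # essentially 0
```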
