# Machine Learning CS 545

CSU


This 66-page set of class notes was uploaded by Betty Kertzmann on Tuesday, September 22, 2015. The notes belong to CS 545 at Colorado State University, taught by Charles Anderson in Fall. Since its upload, it has received 45 views. For similar materials see /class/210186/cs-545-colorado-state-university in Computer Science at Colorado State University.


## CS545: Linear Models, Nonlinear Inputs, Probabilistic

Chuck Anderson, Department of Computer Science, Colorado State University, Fall 2009

Outline: Bayesian Regression Derivation; Application to Auto MPG Data; Application to 1-D Data

### Bayesian Regression

- The full Bayesian approach to regression does not solve for a single $w$ value. Instead, an expression for the probability of a model given the data is formulated. Then a sum is taken, over all possible models, of the prediction value for a given model, weighted by the probability of that model.
- So

  $$p(t_n \mid x_n, X, T) = \int p(t_n, \text{model} \mid x_n, X, T)\, d\text{model} = \int p(t_n \mid \text{model}, x_n, X, T)\; p(\text{model} \mid X, T)\, d\text{model}$$

- As before, let's choose the model to be $y(x_n, w) = \phi(x_n)^T w + \epsilon$, where $\epsilon \sim N(0, \beta^{-1})$. With this choice,

  $$p(t_n \mid \text{model}, x_n, X, T) = p(t_n \mid w, \beta, x_n, X, T) = N(t_n \mid \phi(x_n)^T w,\ \beta^{-1})$$

- Bayes' theorem tells us

  $$p(\text{model} \mid X, T) \propto p(T \mid X, \text{model})\; p(\text{model}), \qquad p(w \mid \alpha, \beta, X, T) \propto p(T \mid w, \beta, X)\; p(w \mid \alpha)$$

- We will model the data likelihood function as a Gaussian:

  $$p(T \mid w, \beta, X) = N(T \mid \Phi w,\ \beta^{-1} I) = \prod_{n=1}^N N(t_n \mid \phi(x_n)^T w,\ \beta^{-1})$$

- Again, choose the prior distribution of the weights to be the zero-mean Gaussian $p(w \mid \alpha) = N(w \mid 0, \alpha^{-1} I)$.
- Now the original equation becomes

  $$p(t_n \mid x_n, X, T) = \int N(t_n \mid \phi(x_n)^T w, \beta^{-1}) \prod_{n=1}^N N(t_n \mid \phi(x_n)^T w, \beta^{-1})\; N(w \mid 0, \alpha^{-1} I)\, dw$$

- Let's work on the last two terms in the integral. They involve products of Gaussians, which involve products of exponential terms with base $e$. These products are formed by adding the exponents, so let's focus on the value of the exponent of $e$. The sum we get is

  $$-\frac{1}{2}\left(\beta \sum_{n=1}^N \bigl(t_n - \phi(x_n)^T w\bigr)^2 + w^T \alpha w\right)$$

- Continuing to work with the sum, we get

  $$-\frac{1}{2}\beta \sum_{n=1}^N t_n^2 + \beta \sum_{n=1}^N t_n\, \phi(x_n)^T w - \frac{1}{2}\beta \sum_{n=1}^N \bigl(\phi(x_n)^T w\bigr)^2 - \frac{1}{2} w^T \alpha w$$

  $$= -\frac{\beta}{2}\, T^T T + \beta\, w^T \Phi^T T - \frac{1}{2}\, w^T \bigl(\beta\, \Phi^T\Phi + \alpha I\bigr)\, w$$

- Notice the terms are grouped by ones quadratic and linear in $w$. This allows us to figure out what the Gaussian distribution is. After all, a product of two Gaussians must be another Gaussian.

### Reverse-Engineering a Gaussian

- Look at expression (2.71) in Bishop's book and the text around it:

  $$-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) = -\frac{1}{2}\, x^T \Sigma^{-1} x + x^T \Sigma^{-1}\mu + \text{const}$$

- Comparing with our previous expression

  $$-\frac{1}{2}\, w^T \bigl(\beta\,\Phi^T\Phi + \alpha I\bigr)\, w + \beta\, w^T \Phi^T T - \frac{\beta}{2}\, T^T T$$

  we see that $\Sigma^{-1} = \beta\,\Phi^T\Phi + \alpha I$.
- Unfortunately, with this definition the linear terms do not match. To make them match, we can introduce $\Sigma^{-1}\Sigma = I$ at the right place, so $\beta\, w^T \Phi^T T$ becomes $w^T \Sigma^{-1} \bigl(\beta\, \Sigma\, \Phi^T T\bigr)$,
- making the full exponent

  $$-\frac{1}{2}\, w^T \bigl(\beta\,\Phi^T\Phi + \alpha I\bigr)\, w + w^T \bigl(\beta\,\Phi^T\Phi + \alpha I\bigr)\Bigl(\beta\,\bigl(\beta\,\Phi^T\Phi + \alpha I\bigr)^{-1}\Phi^T T\Bigr) - \frac{\beta}{2}\, T^T T$$

- Now we can identify the mean: $\mu = \beta\, \Sigma\, \Phi^T T$.
- Renaming the mean and covariance matrix of this distribution of $w$ to be

  $$S_w = \bigl(\beta\,\Phi^T\Phi + \alpha I\bigr)^{-1}, \qquad \mu_w = \beta\, S_w\, \Phi^T T,$$

  we have now identified the resulting Gaussian to be $N(w \mid \mu_w, S_w)$. Our book uses the variable names $m_N$ and $S_N$ for $\mu_w$ and $S_w$.
- The initial integral

  $$p(t_n \mid x_n, X, T) = \int N(t_n \mid \phi(x_n)^T w, \beta^{-1}) \prod_{n=1}^N N(t_n \mid \phi(x_n)^T w, \beta^{-1})\; N(w \mid 0, \alpha^{-1} I)\, dw$$

  is now

  $$p(t_n \mid x_n, X, T) = \int N(t_n \mid \phi(x_n)^T w,\ \beta^{-1})\; N(w \mid \mu_w, S_w)\, dw$$

### Sheesh! Another Gaussian Product

- To deal with the integral of the product of these two Gaussians, we will first form the joint distribution of $t_n$ and $w$ and develop an expression for the mean and covariance matrix of the joint distribution. First form a vector containing both $t_n$ and $w$, and call it $z$.
- Since $t_n$ and $w$ are independent, $p(z) = p(t_n)\,p(w)$, and when we multiply these two Gaussians we just add their exponents.
- The exponent of $p(t_n)\,p(w)$ is

  $$-\frac{\beta}{2}\,\bigl(t_n - w^T\phi(x_n)\bigr)^2 - \frac{1}{2}\,(w - \mu_w)^T S_w^{-1} (w - \mu_w)$$

  $$= -\frac{1}{2} \begin{pmatrix} t_n \\ w \end{pmatrix}^{\!T} \begin{pmatrix} \beta & -\beta\,\phi(x_n)^T \\ -\beta\,\phi(x_n) & \phi(x_n)\,\beta\,\phi(x_n)^T + S_w^{-1} \end{pmatrix} \begin{pmatrix} t_n \\ w \end{pmatrix} + \begin{pmatrix} t_n \\ w \end{pmatrix}^{\!T} \begin{pmatrix} 0 \\ S_w^{-1}\mu_w \end{pmatrix} + \text{const}$$

- At this point we have followed the derivation in Section 2.3.3 up to Equation (2.106). Continuing this derivation would result in (meaning: I don't have slides for this) determining that

  $$p(t_n \mid x_n, X, T) = \int N(t_n \mid \phi(x_n)^T w,\ \beta^{-1})\; N(w \mid \mu_w, S_w)\, dw = N(t_n \mid \mu_t, s_t)$$

  where

  $$\mu_t = \phi(x)^T \mu_w, \qquad s_t = \beta^{-1} + \phi(x)^T S_w\, \phi(x)$$

- To use this result, we build our model by first calculating $\mu_w$ and $S_w$ given some training data $X$ and $T$.
- Then for every data sample $x$ we predict the target value $t_{\text{predicted}} = \phi(x)^T \mu_w$, an expression very much like our frequentist solution. In addition, the Bayesian approach gives us the variance $s_t$ of the prediction.

### Application to Auto MPG Data

- Calculate $\mu_w$ and $S_w$ after choosing some hyperparameters:

```r
beta <- 10
alpha <- 10
M <- ncol(X1)
Sw <- solve(beta * t(X1) %*% X1 + alpha * diag(1, M))
mu.w <- beta * Sw %*% t(X1) %*% T
```

- Then for the test data, calculate the predicted MPG distribution parameters $\mu_t = \phi(x)^T\mu_w$ and $s_t = \beta^{-1} + \phi(x)^T S_w\, \phi(x)$:

```r
pred.mu <- Xtest1 %*% mu.w
pred.var <- c()
for (i in 1:nrow(Xtest1)) {
  x <- Xtest1[i, , drop=FALSE]
  pred.var <- c(pred.var, 1/beta + x %*% Sw %*% t(x))
}
for (i in 1:10) cat(pred.mu[i], "+-", sqrt(pred.var[i]), "\n")
```

  The predictions come out as mean plus or minus standard deviation pairs, such as 13.88899 ± 0.2022685, 14.79652 ± 0.2033407, ..., 25.06364 ± 0.2018831, 29.37667 ± 0.2036604.

- We will calculate $s_t$ in matrix form a few slides later.

### Now, Something a Little Easier to Understand

- Let's try fitting a sine curve using radial basis functions. First generate the $x$ values and target values:

```r
Xall <- matrix(runif(50))
nRBFs <- 9
rbfWidth <- 0.05
xrange <- range(Xall)
rbfCenters <- seq(xrange[1], xrange[2], len=nRBFs)
XallRBF <- rbfize1D(Xall, centers=rbfCenters, widths=rbfWidth)
beta <- 25   # assume we know beta for the target distribution
Tall <- matrix(sin(2*pi*Xall)) + rnorm(nrow(Xall), 0, sqrt(1/beta))
```

- Then the test data:

```r
nTest <- 100
Xtest <- matrix(seq(0, 1, len=nTest))
XtestRBF <- rbfize1D(Xtest, centers=rbfCenters, widths=rbfWidth)
```

- What do the RBFs look like?

```r
matplot(Xtest, XtestRBF, type="l")
```

- Now let's perform Bayesian regression to form a model:

  $$S_w = \bigl(\beta\,\Phi^T\Phi + \alpha I\bigr)^{-1}, \quad \mu_w = \beta\, S_w\, \Phi^T T, \quad \mu_t = \phi(x)^T\mu_w, \quad s_t = \beta^{-1} + \phi(x)^T S_w\, \phi(x)$$

  Remember, the last two equations are for a single sample $x$, but we want the R code to handle multiple samples.

```r
makeBayesReg <- function(X1, T1, alpha, beta) {
  X1 <- cbind(1, X1)                 # prepend the constant column
  M <- ncol(X1)
  Sw <- solve(beta * t(X1) %*% X1 + alpha * diag(1, M))
  mu.w <- beta * Sw %*% t(X1) %*% T1
  list(mu.w=mu.w, Sw=Sw, alpha=alpha, beta=beta)
}

useBayesReg <- function(model, X) {
  X1 <- cbind(1, X)
  mu.t <- X1 %*% model$mu.w
  S.t <- 1/model$beta + rowSums((X1 %*% model$Sw) * X1)
  list(mean=mu.t, variance=S.t)
}
```

- Now use these for different sizes of training sets to see how this affects the results. We are generating Figure 3.8 in Bishop's text.

```r
alpha <- 0.1    # the other precision parameter we need
def.par <- par(mar=c(2, 2, 3, 1))
layout(matrix(c(1, 2, 3, 4), 2, byrow=TRUE))
for (nTrain in c(1, 2, 4, 25)) {
  Xtrain <- Xall[1:nTrain, , drop=FALSE]
  XtrainRBF <- XallRBF[1:nTrain, , drop=FALSE]
  Ttrain <- Tall[1:nTrain, , drop=FALSE]
  model <- makeBayesReg(XtrainRBF, Ttrain, alpha, beta)
  predictions <- useBayesReg(model, XtestRBF)
  plotBayesRegression1D(model, Xtrain, Ttrain, Xtest, predictions, nRBFs, rbfWidth)
}
par(def.par)
```

- That plot function is below. It uses `polygon`; read the examples at the end of `?polygon`.

```r
plotBayesRegression1D <- function(model, Xtrain, Ttrain, Xtest, predictions,
                                  nRBFs, rbfWidth) {
  upper <- predictions$mean + sqrt(predictions$variance)
  lower <- predictions$mean - sqrt(predictions$variance)
  # See demo(graphics) to see how to draw a shaded region.
  plot(c(Xtest, rev(Xtest)), c(upper, rev(lower)), type="n",
       xlab="x", ylab="t", ylim=c(-2, 2))
  polygon(c(Xtest, rev(Xtest)), c(upper, rev(lower)), col="pink", border=NA)
  points(Xtrain, Ttrain)
  lines(Xtest, predictions$mean, col="red")
  lines(Xtest, sin(2*pi*Xtest), col="green")
  title(paste("nRBFs", nRBFs, "rbfWidth", rbfWidth,
              "nTrain", length(Xtrain), "beta", model$beta, "alpha", model$alpha))
}
```

- (Figure: four panels for nTrain = 1, 2, 4, and 25, each with nRBFs = 9, rbfWidth = 0.05, beta = 25, alpha = 0.1. The green line is the target curve, the red line is the model output mean, and the shaded band is plus or minus one standard deviation.)

## CS545: Linear Models for Classification

Chuck Anderson, Department of Computer Science, Colorado State University, Fall 2009

Outline: Linear Least Squares for Classification; Indicator Variables; Masking Problem; Example

### Linear Least Squares for Classification

- To classify a sample as being a member of one of three different classes, we could use the integers 1, 2, and 3 as target outputs. (Figure: samples from Classes 1, 2, and 3 with a fitted linear model.)
- A linear function of $x$ seems to match the data fairly well. Why is this not a good idea?
- We must convert the continuous y-axis value to the discrete integers 1, 2, or 3. Without adding more parameters, we are forced to use the general solution of splitting at 1.5 and 2.5.
- Rats! Boundaries are not where we want them.

### Indicator Variables

- To allow flexibility, we need to decouple the modeling of the boundaries. The problem is due to using one value to represent all classes.
- Instead, let's use three values, one for each class. Binary-valued variables are adequate: Class 1 = (1, 0, 0), Class 2 = (0, 1, 0), and Class 3 = (0, 0, 1). Our linear model has three outputs now. How do we interpret the output for a new sample?
- Let the output be $y = (y_1, y_2, y_3)$. Convert these values to a class by picking the maximum value: $\text{class} = \arg\max_i y_i$.
- We can plot the three output components on three separate graphs. What linear function will each one learn? (Figure: indicator variables 1, 2, and 3 plotted separately against $x$.)
- Overlay them to see which one is the maximum for each $x$ value. See any potential problems?

### Masking Problem

- What if the green line is too low? What could cause this? Too few samples from Class 2.
- There may be no values of $x$ for which the second output $y_2$ of our linear model is larger than the other two. Class 2 has become masked by the other classes.
- What other shape of function response would work better for this data? Hold that thought while we try an example.

### Application: Parkinson's Data Set from the UCI ML Archive

- 147 samples from subjects with Parkinson's; 48 samples from healthy subjects.
- Each sample is composed of 21 numerical features extracted from voice recordings,
- from a collaboration with the University of Oxford and the National Center for Voice and Speech in Denver.

### Read and Prepare the Data

```r
data <- read.table("parkinsons.data", header=TRUE, sep=",")
data <- data[, -1]                    # remove the name column
data <- data[sample(nrow(data)), ]    # shuffle the rows
# status is 0 for healthy, 1 for Parkinson's
status <- data$status
# remove the status column from data
data <- data[, -which(colnames(data) == "status")]
dataHealthy <- data[status == 0, ]
dataParks <- data[status == 1, ]
nHealthy <- nrow(dataHealthy)
nParks <- nrow(dataParks)
```

- Look at the data:

```r
data[1:2, ]
```

  (Console output: the first two rows of the data frame, showing the voice features MDVP:Fo(Hz), MDVP:Fhi(Hz), MDVP:Flo(Hz), several jitter and shimmer measures, NHR, HNR, RPDE, DFA, spread1, spread2, and so on.)
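The indicator-variable scheme and the masking problem described above can be illustrated numerically. The notes use R; the following is a minimal sketch in Python with NumPy, and all data and names here are made up for illustration: three well-separated classes along one input, one-hot indicator targets, a least-squares fit, and an argmax readout. With equally sized classes arranged on a line, the middle class's output comes out nearly flat, so it is almost never the maximum, exactly the masking effect the slides describe.

```python
import numpy as np

# Three classes along one input dimension, with class 1 in the middle.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 30),    # class 0
                    rng.normal(5, 1, 30),    # class 1 (the middle class)
                    rng.normal(10, 1, 30)])  # class 2
labels = np.repeat([0, 1, 2], 30)

X1 = np.column_stack([np.ones_like(x), x])   # constant column plus x
T = np.eye(3)[labels]                        # one-hot indicator targets

# Least-squares weights W minimizing ||X1 W - T||^2, one output per class.
W, *_ = np.linalg.lstsq(X1, T, rcond=None)

# Classify by picking the largest of the three linear outputs.
predicted = np.argmax(X1 @ W, axis=1)

# The middle class's output hovers near 1/3 everywhere, so it rarely wins:
# class 1 is masked and overall accuracy drops to roughly 2/3.
print("predicted as class 1:", np.sum(predicted == 1))
print("accuracy:", np.mean(predicted == labels))
```

Repeating the experiment with fewer class-1 samples makes the masking even more severe, which matches the "too few samples from Class 2" discussion above.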

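The Bayesian regression equations derived in the first lecture ($S_w$, $\mu_w$, $\mu_t$, $s_t$) can also be exercised end to end on a noisy sine, as in the 1-D example. This is a hypothetical NumPy translation, not the notes' original R code; the RBF feature function and all constants here are illustrative.

```python
import numpy as np

# Bayesian linear regression on RBF features, following the equations:
#   Sw   = (beta * Phi^T Phi + alpha * I)^(-1)
#   mu_w = beta * Sw * Phi^T * T
#   mu_t = phi(x)^T mu_w
#   s_t  = 1/beta + phi(x)^T Sw phi(x)

rng = np.random.default_rng(1)
beta, alpha = 25.0, 0.1           # noise precision and weight-prior precision

def rbfize(x, centers, width):
    """Gaussian radial basis features with a leading constant column."""
    z = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    return np.column_stack([np.ones(len(x)), z])

# Training data: a noisy sine.
X = rng.uniform(0, 1, 25)
T = np.sin(2 * np.pi * X) + rng.normal(0, np.sqrt(1 / beta), 25)

centers = np.linspace(0, 1, 9)
Phi = rbfize(X, centers, 0.1)

# Posterior over the weights.
Sw = np.linalg.inv(beta * Phi.T @ Phi + alpha * np.eye(Phi.shape[1]))
mu_w = beta * Sw @ Phi.T @ T

# Predictive mean and variance at some test inputs.
Xtest = np.linspace(0, 1, 5)
PhiTest = rbfize(Xtest, centers, 0.1)
mu_t = PhiTest @ mu_w
s_t = 1 / beta + np.sum((PhiTest @ Sw) * PhiTest, axis=1)

print("predictive means:", np.round(mu_t, 3))
print("predictive sds:  ", np.round(np.sqrt(s_t), 3))
```

Note that the predictive variance can never drop below the noise floor $1/\beta$; the $\phi(x)^T S_w \phi(x)$ term only adds uncertainty where the training data are sparse, which is what produces the widening shaded band in the small-nTrain panels of the figure above.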