These 3 pages of class notes were uploaded by Jacinto Carter Sr. on Thursday, October 29, 2015. The notes belong to STA 6973 at the University of Texas at San Antonio, taught by Victor Oliveira in Fall.


Date Created: 10/29/15
HANDOUT 7

Model Selection

This handout illustrates some methods for model selection: AIC, BIC and cross validation.

Cross Validation

Consider again the spatial rainfall data set (the darwin data). We entertain four models for this data set. All the models have a constant mean function and no nugget, but they differ in the semivariogram function:

Model 1: Power exponential model with θ2 = 1 (θ2 is kappa in geoR)
Model 2: Power exponential model with θ2 = 1.5
Model 3: Matern model with θ2 = 1
Model 4: Rational quadratic with θ2 = 2

> library(geoR)

A possible criterion for selecting a model is based on how well each semivariogram model fits the empirical semivariogram, as measured by the weighted residual sum of squares evaluated at the estimated parameter values.

> darwin.wls1 <- variofit(darwin.mom9, cov.model = "exp", ini = c(800, 10),
+     fix.nugget = TRUE, nugget = 0, weights = "cressie")               # Model 1 (power exponential with kappa = 1)
> darwin.wls4 <- variofit(darwin.mom9, cov.model = "powered.exp", ini = c(800, 10),
+     fix.nugget = TRUE, nugget = 0, weights = "cressie", kappa = 1.5)  # Model 2
> darwin.wls5 <- variofit(darwin.mom9, cov.model = "matern", ini = c(100, 5),
+     fix.nugget = TRUE, nugget = 0, weights = "cressie", kappa = 1)    # Model 3
> darwin.wls6 <- variofit(darwin.mom9, cov.model = "cauchy", ini = c(100, 1),
+     fix.nugget = TRUE, nugget = 0, weights = "cressie", kappa = 2)    # Model 4
>
> darwin.wls1$value; darwin.wls4$value; darwin.wls5$value; darwin.wls6$value
[1] 7485468
[1] 4564043
[1] 4658527
[1] 3461156

According to this criterion, Model 4 provides the best fit, since it has the smallest weighted residual sum of squares.

> plot(darwin.mom9, xlab = "distance (km)", ylab = "semivariogram (mm^2)",
+     main = "Several models fitted to the darwin data by WLS")
> lines(darwin.wls1)
> lines(darwin.wls4, lty = 2)
> lines(darwin.wls5, lty = 3)
> lines(darwin.wls6, lty = 4)
> legend(1, 400, c("model", "power exp, theta2 = 1", "power exp, theta2 = 1.5",
+     "matern, theta2 = 1", "rational quadratic, theta2 = 2"), lty = 0:4, bty = "n")
> dev.print(file = "fig7.1.ps")

Figure 1: Several models fitted to the darwin data by WLS (semivariogram in mm^2 against distance in km).
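To make the weighted least squares criterion concrete, the following is a minimal base R sketch of a Cressie-type weighted residual sum of squares for an exponential semivariogram with no nugget. It is not geoR's internal code. The function names gamma_exp and wrss and all the numbers in the empirical semivariogram are hypothetical, chosen only for illustration, and the loss is assumed to take the form sum_j n_j * (gamma_hat_j / gamma(h_j) - 1)^2.

```r
# Exponential semivariogram with no nugget: sigma^2 * (1 - exp(-h / phi))
gamma_exp <- function(h, sigmasq, phi) sigmasq * (1 - exp(-h / phi))

# Cressie-type weighted residual sum of squares (assumed form):
#   sum_j n_j * (gamma_hat_j / gamma(h_j; theta) - 1)^2
wrss <- function(par, h, gamma_hat, npairs) {
  g <- gamma_exp(h, sigmasq = par[1], phi = par[2])
  sum(npairs * (gamma_hat / g - 1)^2)
}

# Hypothetical empirical semivariogram (made-up numbers, not the darwin data)
h         <- c(5, 10, 20, 40, 80)         # bin distances
gamma_hat <- c(310, 520, 700, 790, 810)   # empirical semivariogram values
npairs    <- c(12, 30, 55, 80, 60)        # number of pairs in each bin

# Minimise the loss over (sigma^2, phi), both constrained to be positive
start <- c(800, 10)
fit <- optim(par = start,
             fn = function(p) wrss(p, h, gamma_hat, npairs),
             method = "L-BFGS-B", lower = c(1e-6, 1e-6))

fit$par    # estimated (sigma^2, phi)
fit$value  # minimised weighted residual sum of squares
```

The minimised value plays the same role as the $value component compared across models above: when several semivariogram families are fitted to the same empirical semivariogram with the same weights, the smallest value indicates the closest fit.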
Alternative criteria for selecting models include AIC and BIC. These criteria are more satisfactory than the previous one because they are likelihood based and include a penalty for model complexity, so in general they avoid over-fitting.

> darwin.ml1 <- likfit(darwin.gdat, cov.model = "exp", ini = c(800, 10),
+     fix.nugget = TRUE, nugget = 0, method.lik = "ML")
> darwin.ml2 <- likfit(darwin.gdat, cov.model = "powered.exp", kappa = 1.5, ini = c(800, 10),
+     fix.nugget = TRUE, nugget = 0, method.lik = "ML")
> darwin.ml3 <- likfit(darwin.gdat, cov.model = "matern", ini = c(100, 5), kappa = 1,
+     fix.nugget = TRUE, nugget = 0, method.lik = "ML")
> darwin.ml4 <- likfit(darwin.gdat, cov.model = "cauchy", ini = c(100, 1), kappa = 2,
+     fix.nugget = TRUE, nugget = 0.0000001, method.lik = "ML")
>
> darwin.ml1$AIC; darwin.ml2$AIC; darwin.ml3$AIC; darwin.ml4$AIC
[1] 181.5812
[1] 178.7415
[1] 179.1823
[1] 180.0809

According to AIC, the model that fits the data best is Model 2, which has the smallest value.

> darwin.ml1$BIC; darwin.ml2$BIC; darwin.ml3$BIC; darwin.ml4$BIC
[1] 185.1154
[1] 182.2757
[1] 182.7164
[1] 183.6150

Model 2 is also the best according to the BIC criterion, as it has to be: the competing models all have the same number of parameters, so the AIC and BIC penalties differ by the same constant for every model and the ranking cannot change.

Yet another criterion for selecting models is based on cross validation. This criterion is quite appealing in the current context, where the selected model is used for prediction.

> darwin.cv1 <- xvalid(darwin.gdat, model = darwin.wls1)
> darwin.cv4 <- xvalid(darwin.gdat, model = darwin.wls4)
> darwin.cv5 <- xvalid(darwin.gdat, model = darwin.wls5)
> darwin.cv6 <- xvalid(darwin.gdat, model = darwin.wls6)
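The remark that AIC and BIC must pick the same model when all candidates have the same number of parameters can be checked with a few lines of base R. This is a sketch using the standard definitions AIC = -2 logL + 2k and BIC = -2 logL + k log(n). The log-likelihood values, the parameter count k and the sample size n below are hypothetical and are not taken from the darwin fits.

```r
# Standard definitions: k = number of estimated parameters, n = sample size
aic <- function(loglik, k) -2 * loglik + 2 * k
bic <- function(loglik, k, n) -2 * loglik + k * log(n)

# Hypothetical maximised log-likelihoods for four competing models
# (illustrative numbers only)
loglik <- c(m1 = -87.8, m2 = -86.4, m3 = -86.6, m4 = -87.0)
k <- 3    # assumed: mean, sigma^2 and phi estimated; nugget and kappa fixed
n <- 24   # assumed number of observations

a <- aic(loglik, k)
b <- bic(loglik, k, n)

b - a                  # the same constant, k * (log(n) - 2), for every model
names(which.min(a))    # model selected by AIC
names(which.min(b))    # model selected by BIC, necessarily the same
```

With a common k, both criteria reduce to ranking the models by maximised log-likelihood. They can disagree only when the candidate models have different numbers of parameters.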

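The idea behind leave-one-out cross validation for a spatial model can be sketched in base R as follows: each observation is removed in turn, predicted from the remaining ones by ordinary kriging under a fixed covariance model, and the prediction errors are summarised. This is a hedged illustration rather than geoR's implementation. The data are simulated, and the names cov_exp, loo_errors and mse are hypothetical helpers introduced here.

```r
set.seed(1)

# Exponential covariance with no nugget: sigma^2 * exp(-d / phi)
cov_exp <- function(d, sigmasq, phi) sigmasq * exp(-d / phi)

# Simulated data (not the darwin data): n sites in a 100 x 100 square
n <- 24
coords <- cbind(runif(n, 0, 100), runif(n, 0, 100))
D <- as.matrix(dist(coords))
Sigma <- cov_exp(D, sigmasq = 800, phi = 10)
z <- 50 + drop(t(chol(Sigma)) %*% rnorm(n))

# Leave-one-out errors (observed minus predicted) under ordinary kriging
# with the covariance parameters held fixed
loo_errors <- function(z, D, sigmasq, phi) {
  n <- length(z)
  err <- numeric(n)
  for (i in seq_len(n)) {
    S  <- cov_exp(D[-i, -i], sigmasq, phi)   # covariance among retained sites
    s0 <- cov_exp(D[-i, i], sigmasq, phi)    # covariance with the held-out site
    # Generalised least squares estimate of the constant mean
    mu_hat <- sum(solve(S, z[-i])) / sum(solve(S, rep(1, n - 1)))
    pred <- mu_hat + sum(s0 * solve(S, z[-i] - mu_hat))
    err[i] <- z[i] - pred
  }
  err
}

# Compare two candidate range parameters by mean squared prediction error
mse <- function(e) mean(e^2)
err_a <- loo_errors(z, D, sigmasq = 800, phi = 10)
err_b <- loo_errors(z, D, sigmasq = 800, phi = 1)
c(phi_10 = mse(err_a), phi_1 = mse(err_b))
```

A smaller mean squared prediction error favours the corresponding model for prediction. Other summaries, such as the mean error or errors standardised by the kriging standard deviation, can be compared in the same way.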

