Applied Regression Analysis (STAT 51200)
This 13-page set of class notes was uploaded by Bailey Macejkovic on Saturday, September 19, 2015. The notes belong to STAT 51200 at Purdue University, taught by Chong Gu in Fall.
STAT 512: Multiple Linear Regression (C. Gu, Spring 2009)

Multiple Regression Models: Examples

- Consider multiple predictors X_1, X_2, ..., X_{p-1}. A multiple linear regression model is given by
    Y = β_0 + β_1 X_1 + ... + β_{p-1} X_{p-1} + ε.
- For a single predictor X, a quadratic polynomial regression model is given by
    Y = β_0 + β_1 X + β_2 X² + ε.
- For two predictors X_1 and X_2, a regression model with an interaction term is given by
    Y = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_1 X_2 + ε,
  and a quadratic response surface model is given by
    Y = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_1 X_2 + β_4 X_1² + β_5 X_2² + ε.

Multiple Regression Models: General Form

The examples above are all of the form
    Y = β_0 + β_1 f_1(X) + ... + β_{p-1} f_{p-1}(X) + ε,
where the f_j(X)'s are known functions of the predictors X and the β_j's are to be estimated. A linear model is linear in the unknown coefficients β_j. Generally, a linear regression model to be estimated using observations (Y_i, X_{i,1}, ..., X_{i,p-1}) can be written as
    Y_i = β_0 + β_1 X_{i,1} + ... + β_{p-1} X_{i,p-1} + ε_i,
where X_{i,j} is the ith reading of the jth numerical predictor.

Linear Regression in Matrix Terms

Putting Y_i = β_0 + β_1 X_{i,1} + ... + β_{p-1} X_{i,p-1} + ε_i in matrix terms,

    [Y_1]   [1  X_{1,1}  ...  X_{1,p-1}] [β_0    ]   [ε_1]
    [ ⋮ ] = [⋮     ⋮      ⋱      ⋮     ] [  ⋮    ] + [ ⋮ ]
    [Y_n]   [1  X_{n,1}  ...  X_{n,p-1}] [β_{p-1}]   [ε_n]

or more concisely Y = Xβ + ε, where Y is n×1, X is n×p, β is p×1, and ε is n×1. Under the model assumptions,
    E[ε] = 0,  E[Y] = Xβ,  σ²(Y) = σ²(ε) = σ²I.

LS Estimation, Fitted Values, Residuals

The LS criterion can be written as Q = (Y − Xβ)'(Y − Xβ). Taking derivatives w.r.t. β and setting them to 0, the LS estimates satisfy
    X'X b = X'Y,  or  b = (X'X)^{-1} X'Y.
The fitted values are Ŷ = Xb = X(X'X)^{-1}X'Y = HY, and the residuals are e = Y − Ŷ = (I − H)Y. The variance σ² is estimated by MSE = SSE/(n − p), where SSE = e'e = Y'(I − H)Y.

Analysis of Variance

Decompose the deviation of Y_i from Ȳ,
    Y_i − Ȳ = (Ŷ_i − Ȳ) + (Y_i − Ŷ_i),
where Ŷ_i − Ȳ is systematic and Y_i − Ŷ_i is random. It can be shown that
    Σ_i (Y_i − Ȳ)² = Σ_i (Ŷ_i − Ȳ)² + Σ_i (Y_i − Ŷ_i)²,
i.e., SSTO = SSR + SSE, with degrees of freedom (n−1) = (p−1) + (n−p). The ANOVA table summarizes the related information:

    Source | SS   | df    | MS              | F
    Model  | SSR  | p − 1 | MSR = SSR/(p−1) | F* = MSR/MSE
    Error  | SSE  | n − p | MSE = SSE/(n−p) |
    Total  | SSTO | n − 1 |                 |

F Test for β_1 = ... = β_{p-1} = 0

It can be shown that E[MSR] = σ² when β_1 = ... = β_{p-1} = 0, and E[MSR] > σ² otherwise.
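As a sanity check on the matrix formulas, here is a minimal numerical sketch on simulated data (all variable names and the data-generating numbers are mine, not from the notes):

```python
import numpy as np

# Simulated data: n = 20 observations, p = 3 columns in X (intercept + 2 predictors).
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=0.3, size=n)

# LS estimate b = (X'X)^{-1} X'Y via the normal equations.
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Hat matrix H = X (X'X)^{-1} X', fitted values, residuals.
H = X @ np.linalg.solve(X.T @ X, X.T)
Yhat = H @ Y
e = Y - Yhat

# ANOVA decomposition: SSTO = SSR + SSE, and the overall F statistic.
SSTO = np.sum((Y - Y.mean()) ** 2)
SSE = e @ e
SSR = np.sum((Yhat - Y.mean()) ** 2)
MSE = SSE / (n - p)
MSR = SSR / (p - 1)
F_star = MSR / MSE
```

The identities SSTO = SSR + SSE, H² = H, and X'e = 0 all hold to floating-point precision, which is a quick way to confirm a hand-rolled fit.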
When β_1 = ... = β_{p-1} = 0, one has F* = MSR/MSE ~ F(p−1, n−p). This leads to the F test for
    H_0: β_1 = ... = β_{p-1} = 0  vs  H_a: otherwise,
which rejects H_0 when F* > F(1−α; p−1, n−p).

Inferences Concerning β_j

The same as for SLR, E[b] = β. Here
    σ²(b) = σ²(X'X)^{-1},
where the latter is to be estimated by s²(b) = MSE·(X'X)^{-1}. The inferences concerning β_j are based on
    (b_j − β_j)/s(b_j) ~ t(n−p).
For example, a (1−α)100% CI for β_j is b_j ± t(1−α/2; n−p)·s(b_j).

Example: suppose n = 23, p = 4, and b_1 = −1.21, s(b_1) = .30; b_2 = −.67, s(b_2) = .82; b_3 = −8.61, s(b_3) = 12.24. The joint 95% CIs for β_1, β_2, and β_3 by the Bonferroni method are given by
    β_1 ∈ −1.21 ± 2.625(.30),  β_2 ∈ −.67 ± 2.625(.82),  β_3 ∈ −8.61 ± 2.625(12.24),
where t(1 − .05/(2·3); 23−4) = 2.625.

Joint CIs: Bonferroni Method

To estimate several parameters jointly, one often wants the joint confidence coefficient to be above a certain level, say
    P(β_j ∈ b_j ± B·s(b_j), j = 1, ..., g) ≥ 1 − α.
The Bonferroni method uses B = t(1 − α/(2g); n−p), where g is the number of parameters involved in the estimation.
- The actual confidence coefficient is usually greater than the nominal 1 − α.

Inferences Concerning E[Y_h]

Write X_h = (1, X_{h,1}, ..., X_{h,p-1})', so that
    E[Y_h] = β_0 + β_1 X_{h,1} + ... + β_{p-1} X_{h,p-1} = X_h'β.
To estimate the mean response at X_h, use the unbiased estimate
    Ŷ_h = b_0 + b_1 X_{h,1} + ... + b_{p-1} X_{h,p-1} = X_h'b.
It can be shown that Ŷ_h ~ N(X_h'β, σ²(Ŷ_h)), where σ²(Ŷ_h) = σ²·X_h'(X'X)^{-1}X_h. Estimate σ²(Ŷ_h) by s²(Ŷ_h) = MSE·X_h'(X'X)^{-1}X_h, and as usual the inferences are based on
    (Ŷ_h − E[Y_h])/s(Ŷ_h) ~ t(n−p).

Prediction of New Observation at X_h

To predict a new response at X_h,
    Y_h = β_0 + β_1 X_{h,1} + ... + β_{p-1} X_{h,p-1} + ε = X_h'β + ε,
one has to allow for the variability of ε. With X_h'β estimated by X_h'b and σ² estimated by MSE, a (1−α)100% prediction interval is given by
    Ŷ_h ± t(1−α/2; n−p)·√(MSE + s²(Ŷ_h)),
or more explicitly,
    Ŷ_h ± t(1−α/2; n−p)·√(MSE·(1 + X_h'(X'X)^{-1}X_h)).

Extra Sums of Squares

Consider potential predictors X_1, X_2, and X_3. Denote by SSE(X_1) the SSE from fitting the model
    Y = β_0 + β_1 X_1 + ε,
by SSE(X_1, X_2) the SSE from fitting the model
    Y = β_0 + β_1 X_1 + β_2 X_2 + ε,
and so on.
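The Bonferroni intervals in the numerical example can be reproduced with a few lines of arithmetic. B = 2.625 is the t-table value quoted on the slide; the dictionary layout is my own:

```python
# Joint 95% Bonferroni CIs for the slide's example: n = 23, p = 4, g = 3,
# so B = t(1 - .05/(2*3); 19) = 2.625 (value taken from the slide).
B = 2.625
estimates = {"b1": (-1.21, 0.30), "b2": (-0.67, 0.82), "b3": (-8.61, 12.24)}

# Each interval is b_j +/- B * s(b_j).
intervals = {name: (b - B * s, b + B * s) for name, (b, s) in estimates.items()}
```

For instance, the interval for β_1 works out to −1.21 ± 0.7875, i.e. (−1.9975, −0.4225); note that it excludes 0 while the much wider interval for β_3 does not.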
For whatever model is fitted, SSTO = SSR + SSE. As predictors are added to the model, SSE is being reduced. The amount of SSE reduction achieved through such a process is an extra sum of squares; for example,
    SSR(X_2 | X_1) = SSE(X_1) − SSE(X_1, X_2),
    SSR(X_3 | X_1, X_2) = SSE(X_1, X_2) − SSE(X_1, X_2, X_3).

Type I and Type III SS

When regression models are fitted using the SAS procedure PROC GLM, two types of extra SS are automatically computed.
- The type I SS assume a sequential addition/deletion scenario, so they are sensitive to the ordering of the predictors. They are useful for testing hypotheses involving several parameters, such as H_0: β_2 = β_3 = 0.
- The type III SS drop one predictor while keeping the rest of the predictors in the model. They are useful for testing the necessity of individual terms given everything else.

Tests Based on Extra SS

To test the hypotheses H_0: β_j = 0 vs H_a: β_j ≠ 0, one may calculate t* = b_j/s(b_j); if |t*| > t(1−α/2; n−p), reject H_0. Using the general linear test, one calculates
    F* = [SSE(R) − SSE(F)] / [SSE(F)/(n−p)].
If F* > F(1−α; 1, n−p), reject H_0. Note that SSE(R) − SSE(F) here is simply the type III SS of X_j.
- It can be shown that F* = (t*)².

Example: consider the model Y = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_3 + ε and test the hypotheses H_0: β_2 = β_3 = 0 vs H_a: otherwise. Using the general linear test, one calculates
    F* = [SSR(X_2, X_3 | X_1)/2] / [SSE(X_1, X_2, X_3)/(n−4)].
If F* > F(1−α; 2, n−4), reject H_0. SSR(X_2, X_3 | X_1) is simply the sum of the type I SS of X_2 and X_3 when X_1 is entered first.
- Put suspect terms at the end, and add type I SS from the bottom up.

Testing General Linear Hypotheses

When H_0 is not of the form β_j = ... = β_k = 0, one has to modify the data to perform the test using SAS. Consider the model Y = β_0 + β_1 X_1 + β_2 X_2 + ε and test the hypotheses H_0: β_1 = β_2 vs H_a: β_1 ≠ β_2. Rewrite the model as
    Y = β_0 + β_1(X_1 + X_2) + (β_2 − β_1)X_2 + ε = β_0 + β_1 X_1* + γ X_2 + ε,
where X_1* = X_1 + X_2 and γ = β_2 − β_1.
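A quick numerical illustration of the general linear test, and of the F* = (t*)² identity, on simulated data (the data, names, and coefficient values are mine):

```python
import numpy as np

# General linear test for H0: beta_2 = 0 in Y ~ 1 + X1 + X2 (illustrative data).
rng = np.random.default_rng(1)
n = 30
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 1.0 + 0.8 * X1 + 0.5 * X2 + rng.normal(size=n)

def sse(*cols):
    """SSE from regressing Y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return resid @ resid

p = 3                                  # full model: intercept, X1, X2
sse_full = sse(X1, X2)                 # SSE(X1, X2)
sse_red = sse(X1)                      # SSE(X1): reduced model drops X2
extra_ss = sse_red - sse_full          # SSR(X2 | X1), the type III SS of X2
F_star = (extra_ss / 1) / (sse_full / (n - p))

# F* equals the square of the t statistic for b2 from the full fit.
Xf = np.column_stack([np.ones(n), X1, X2])
b = np.linalg.solve(Xf.T @ Xf, Xf.T @ Y)
mse = sse_full / (n - p)
s_b2 = np.sqrt(mse * np.linalg.inv(Xf.T @ Xf)[2, 2])
t_star = b[2] / s_b2
```

Comparing F_star against an F(1, n−3) quantile gives the same decision as comparing |t_star| against the corresponding t quantile.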
The hypotheses become H_0: γ = 0 vs H_a: γ ≠ 0.

Consider the model Y = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_3 + ε and test the hypotheses H_0: β_1 = 3, β_3 = 5 vs H_a: otherwise. Rewrite the model as
    Y* = β_0 + γ_1 X_1 + β_2 X_2 + γ_3 X_3 + ε,
where Y* = Y − 3X_1 − 5X_3, γ_1 = β_1 − 3, and γ_3 = β_3 − 5. The hypotheses become H_0: γ_1 = γ_3 = 0 vs H_a: otherwise.

Coefficients of Partial Determination/Correlation

The coefficient of determination R² = SSR/SSTO quantifies the proportion of total variation explained by the model. The coefficient of partial determination quantifies the proportion of the residual variation in an existing model that is explained by some extra predictor, e.g.,
    r²_{Y1|2} = SSR(X_1 | X_2)/SSE(X_2),  r²_{Y2|13} = SSR(X_2 | X_1, X_3)/SSE(X_1, X_3).
The coefficient of partial correlation measures the linear association between Y and some X_j after detrending against a set of common variables:
    r_{Y2|13} = r(Y − Ŷ(X_1, X_3), X_2 − X̂_2(X_1, X_3)).

Multicollinearity

Consider the model Y = β_0 + β_1 X_1 + β_2 X_2 + ε. If X_1 and X_2 are uncorrelated, then SSR(X_2 | X_1) ≈ SSR(X_2). If X_1 and X_2 are perfectly correlated, i.e., X_2 = aX_1 + b, then SSR(X_2 | X_1) = 0. Multicollinearity: the same information is carried in more than one predictor. With multicollinearity, the regression coefficients β_j are not as identifiable, so they are not as estimable.
- One can get a good fit and reliable inferences concerning the response within the data range.
- Individual regression coefficients are less estimable or interpretable.
- Some coefficient estimate, say b_1, may vary a lot when other terms are included/excluded; s(b_1) would then be larger in the bigger model.
- One may see r²_{Y1} >> r²_{Y1|2}, or SSR(X_1) >> SSR(X_1 | X_2).

All Subsets Regression

In SAS assignment 5, we ran all subsets regression on the patient satisfaction data of Prob 6.15 and collected R² and R²_a for all 2³ − 1 = 7 models. Given predictors X_1, ..., X_{p-1}, one can look at all 2^{p-1} − 1 possible models. With the same number of predictors, pick the best fitting model; with different numbers of predictors, use model selection criteria.
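The all-subsets search over the 2³ − 1 = 7 models can be sketched in a few lines. The real assignment used SAS; this Python sketch, on simulated stand-in data (my construction), only mirrors the logic, ranking models by adjusted R²:

```python
import numpy as np
from itertools import combinations

# All subsets regression with three candidate predictors, ranking the
# 2^3 - 1 = 7 models by adjusted R^2 (simulated data; X3 is pure noise).
rng = np.random.default_rng(2)
n = 40
preds = {"X1": rng.normal(size=n), "X2": rng.normal(size=n), "X3": rng.normal(size=n)}
Y = 2.0 + 1.5 * preds["X1"] - 1.0 * preds["X2"] + rng.normal(size=n)

ssto = np.sum((Y - Y.mean()) ** 2)
results = {}
for k in range(1, 4):
    for subset in combinations(preds, k):
        X = np.column_stack([np.ones(n)] + [preds[v] for v in subset])
        resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
        sse = resid @ resid
        p = X.shape[1]
        # Adjusted R^2 = 1 - (n-1)/(n-p) * SSE/SSTO
        results[subset] = 1 - (n - 1) / (n - p) * sse / ssto

best = max(results, key=results.get)
```

Unlike plain R², the adjusted version can decrease when a useless predictor (here X3) is added, which is what makes it usable for comparing models of different sizes.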
- R² criterion: stop adding variables when R² saturates.
- R²_a criterion: select a small model with R²_a close to the maximum, where
    R²_a = 1 − [(n−1)/(n−p)]·SSE/SSTO = 1 − MSE/[SSTO/(n−1)].

Model Selection Using C_p or PRESS

In SAS assignment 5, we also collected C_p for all the 7 models.
- C_p criterion: calculate
    C_p = SSE_p/MSE(full) − (n − 2p).
  Select a model with small C_p and C_p ≈ p.
- PRESS criterion: calculate
    PRESS = Σ_i (Y_i − Ŷ_{i(i)})²,
  where Ŷ_{i(i)} is the prediction of Y_i based on the other n−1 data points. Select a small model with PRESS close to the minimum.

Forward/Backward Regression

Forward regression:
1. Start with no variable in the model.
2. With X_1, ..., X_{j-1} in the model, search among the predictors still out and pick the one with the largest SSR(X_k | X_1, ..., X_{j-1}); call this variable X_j.
3. If X_j passes an F test, add it to the model and go back to step 2. If it fails the test, stop.
- The full model for the F test is (X_1, ..., X_{j-1}, X_j).

Backward regression:
1. Start with all p−1 variables in the model.
2. Search among all variables still in the model and pick the one with the smallest type III SS; call this variable X*.
3. If X* fails the F test, drop it from the model and go back to step 2. If it passes the test, stop.
- The full model for the F test keeps shrinking.

Stepwise Regression

1. Start with no variable in the model.
2. Add variables as in forward regression. After each addition, drop insignificant ones as in backward regression.
3. When all that are in can't be dropped, and all that are out can't be added, stop.
- The full model for the F tests is constantly changing.
- The F tests are controlled by significance levels SLE and SLS, usually SLE ≥ SLS.
- A k-variable model selected by stepwise regression may not be the best k-variable model.

There is no "best" model, and don't leave the robot in charge: use the techniques to narrow down the choices, use subject-area knowledge to select, and make sure the selected model is interpretable.
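The forward-selection loop can be sketched as below. This illustrates the algorithm only, not the SAS implementation; the data are simulated, and the F cutoff is a hard-coded stand-in for the tabled F(1−α; 1, n−p) value:

```python
import numpy as np

# Forward selection sketch: at each step add the candidate with the largest
# extra SS, provided it passes an F test (illustrative data and names).
rng = np.random.default_rng(3)
n = 60
cand = {"X1": rng.normal(size=n), "X2": rng.normal(size=n), "X3": rng.normal(size=n)}
Y = 1.0 + 2.0 * cand["X1"] + rng.normal(size=n)   # only X1 matters
F_CUT = 4.0   # stand-in for F(.95; 1, n - p); a table or stats library gives the exact value

def sse(cols):
    X = np.column_stack([np.ones(n)] + [cand[v] for v in cols])
    r = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return r @ r

selected = []
while True:
    out = [v for v in cand if v not in selected]
    if not out:
        break
    # Pick the candidate with the largest SSR(X_k | selected).
    best = max(out, key=lambda v: sse(selected) - sse(selected + [v]))
    sse_full = sse(selected + [best])
    p = len(selected) + 2              # intercept + selected + candidate
    F_star = (sse(selected) - sse_full) / (sse_full / (n - p))
    if F_star > F_CUT:
        selected.append(best)          # passes the F test: keep it and continue
    else:
        break                          # fails: stop
```

Note how the "full model" in each F test grows with the selected set, exactly as the notes describe.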
Partial Regression Plots

Data of Prob 6.15: consider predictors X_1, X_2, and X_3. To assess the net effect of X_2 given X_1, X_3 in the model, one may use the partial regression plot, which plots Y − Ŷ(X_1, X_3) versus X_2 − X̂_2(X_1, X_3). Fitting a SLR to the partial regression plot of X_2, the intercept is 0 and the slope is b_2.
- The effect shown is net of linear dependency on X_1 and X_3.
- These plots may help in spotting problems, but rarely suggest fixes.

Studentized Deleted Residuals

To identify Y-outliers, use the deleted residuals d_i = Y_i − Ŷ_{i(i)}, where Ŷ_{i(i)} is the prediction at X_i based on the other n−1 observations. It can be shown that
    d_i = e_i/(1 − h_ii),  σ²(d_i) = σ²/(1 − h_ii),
where h_ii is the ith diagonal of the hat matrix H = X(X'X)^{-1}X'. Studentizing the d_i's, one has
    t_i = d_i/s(d_i) = e_i / √(MSE_{(i)}(1 − h_ii)),
where MSE_{(i)} is based on the other n−1 observations. When all is well, t_i ~ t(n−p−1). One may compare max_i |t_i| with t(1 − α/(2n); n−p−1) to formally test for outliers, but the t_i's are more often used as informal diagnostics.
- Note that there is no contribution from Y_i in Ŷ_{i(i)}. The delete-one operation is closely related to the jackknife and cross-validation.
- One may calculate t_i from the all-data fit using
    t_i = e_i·√[(n−p−1) / (SSE(1 − h_ii) − e_i²)].

Leverage Values

To identify X-outliers, use the diagonals h_ii of the hat matrix H = X(X'X)^{-1}X', which are known as the leverage values. It can be shown that Σ_i h_ii = p, so h̄ = p/n. As a rule of thumb, observations with h_ii > 2p/n are worth some careful look. (Illustration: in a scatter of 30 simulated points (X_{i1}, X_{i2}), the points with h_ii > 2p/n are highlighted on the edges of the cloud.) To avoid hidden extrapolation, one may check the leverage at the location of interest, X_h'(X'X)^{-1}X_h.

DFFITS, DFBETAS

To assess the influence of case i on Ŷ_i, one calculates
    DFFITS_i = (Ŷ_i − Ŷ_{i(i)}) / √(MSE_{(i)} h_ii).
Remember that σ²(Ŷ_i) = σ² h_ii, since σ²(Ŷ) = σ²H. As a rule of thumb, a case is considered influential on its own prediction if
    |DFFITS_i| > 2√(p/n) (n large), 1 otherwise.
To assess the influence of case i on β̂_j = b_j, one calculates
    DFBETAS_{j(i)} = (b_j − b_{j(i)}) / √(MSE_{(i)} c_jj),
where c_jj is the jth diagonal of (X'X)^{-1}; recall that σ²(b) = σ²(X'X)^{-1}.
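The delete-one diagnostics can indeed be computed from the all-data fit alone; here is a sketch using the algebraic identity above (simulated data, my variable names):

```python
import numpy as np

# Leverage values and studentized deleted residuals from a single fit.
rng = np.random.default_rng(4)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([1.0, 1.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                         # leverage values h_ii
e = Y - H @ Y                          # ordinary residuals
SSE = e @ e

# Studentized deleted residuals via the identity
# t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_ii) - e_i^2)),
# so no delete-one refits are needed.
t = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e ** 2))

# X-outlier rule of thumb: flag cases with h_ii > 2p/n.
flagged = np.where(h > 2 * p / n)[0]
```

The trace identity Σ h_ii = p gives an immediate check that the hat matrix was formed correctly.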
As a rule of thumb, a case is considered influential on the parameter estimate b_j if
    |DFBETAS_{j(i)}| > 2/√n (n large), 1 otherwise.

Cook's Distance

To assess the influence of case i on the overall fit, one calculates
    D_i = (Ŷ − Ŷ_{(i)})'(Ŷ − Ŷ_{(i)}) / (p·MSE) = (b − b_{(i)})'X'X(b − b_{(i)}) / (p·MSE).
Note that σ²(b) = σ²(X'X)^{-1}, so D_i is an aggregated version of the DFBETAS_{j(i)}'s. As a rule of thumb, a D_i in the range of .8 to 1 or above is really influential; in general, a D_i substantially greater than the rest often indicates an influential case.
- If an influential case is of high quality, it carries more information than the rest, yielding more precise estimates if the model is correct.
- If an influential case is of doubtful quality, discard it.
- One does not need to calculate n separate fits to obtain DFFITS_i, DFBETAS_{j(i)}, and Cook's D_i: using some algebraic identities, all can be calculated from the all-data fit.

Variance Inflation Factor

To assess multicollinearity among the predictors, one calculates the variance inflation factors
    VIF_j = 1/(1 − R_j²),
where R_j² is the R² of predicting X_j using the other predictors. With two predictors X_1 and X_2,
    VIF_1 = VIF_2 = 1/(1 − r_12²),
where r_12 = r(X_1, X_2), the correlation between X_1 and X_2. With more than two predictors,
    VIF_j ≥ max_{k≠j} 1/(1 − r_jk²),
where r_jk = r(X_j, X_k). The pairwise correlations r_jk are themselves diagnostics for multicollinearity, but they can miss hidden patterns. As a rule of thumb, max_j VIF_j ≥ 10 indicates a serious multicollinearity problem.
- Note that VIF_j ≥ 1.
- Large VIF_j's only come in groups.
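A VIF computation straight from the definition, on deliberately collinear simulated data (the construction is mine); note how the large VIFs come in a group, as the notes point out:

```python
import numpy as np

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the others.
rng = np.random.default_rng(5)
n = 50
X1 = rng.normal(size=n)
X2 = X1 + 0.1 * rng.normal(size=n)     # nearly collinear with X1
X3 = rng.normal(size=n)                # unrelated to X1, X2
preds = {"X1": X1, "X2": X2, "X3": X3}

def vif(name):
    y = preds[name]
    others = np.column_stack([np.ones(n)] + [v for k, v in preds.items() if k != name])
    resid = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

vifs = {name: vif(name) for name in preds}
```

Here VIF_1 and VIF_2 are both far above the rule-of-thumb cutoff of 10, while VIF_3 stays near 1.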