# 239 Class Note for STAT 51200 with Professor Gu at Purdue

These 13 pages of class notes were uploaded on Friday, February 6, 2015. They belong to a course at Purdue University taught in Fall.

STAT 512: Multiple Linear Regression (C. Gu, Spring 2009)

## Multiple Regression Models: Examples

- Consider multiple predictors $X_1, X_2, \ldots, X_{p-1}$. A multiple linear regression model is given by
  $$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} + \varepsilon.$$
- For a single predictor $X$, a quadratic polynomial regression model is given by $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$.
- For two predictors $X_1$ and $X_2$, a regression model with an interaction term is given by $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$, and a quadratic response surface model is given by $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2 + \varepsilon$.

## Multiple Regression Models: General

The examples above are all of the form
$$Y = \beta_0 + \beta_1 f_1(X) + \cdots + \beta_{p-1} f_{p-1}(X) + \varepsilon,$$
where the $f_j(X)$'s are known functions of the predictors $X$ and the $\beta_j$'s are to be estimated. A linear model is linear in the unknown coefficients $\beta_j$. Generally, a linear regression model to be estimated using observations $(Y_i, X_{i,1}, \ldots, X_{i,p-1})$ can be written as
$$Y_i = \beta_0 + \beta_1 X_{i,1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i,$$
where $X_{i,j}$ is the $i$th reading of the $j$th numerical predictor.

## Linear Regression in Matrix Terms

Putting $Y_i = \beta_0 + \beta_1 X_{i,1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$ in matrix terms,
$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & X_{1,1} & \cdots & X_{1,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n,1} & \cdots & X_{n,p-1} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_{p-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix},$$
or more concisely $Y = X\beta + \varepsilon$, where $Y$ is $n \times 1$, $X$ is $n \times p$, $\beta$ is $p \times 1$, and $\varepsilon$ is $n \times 1$. Under the model assumptions, $E[\varepsilon] = 0$, $E[Y] = X\beta$, and $\sigma^2\{Y\} = \sigma^2\{\varepsilon\} = \sigma^2 I$.

## LS Estimation, Fitted Values, Residuals

The LS criterion can be written as $Q = (Y - X\beta)^T(Y - X\beta)$. Taking derivatives with respect to $\beta$ and setting them to 0, the LS estimates satisfy $X^TXb = X^TY$, or $b = (X^TX)^{-1}X^TY$. The fitted values are
$$\hat{Y} = Xb = X(X^TX)^{-1}X^TY = HY,$$
and the residuals are $e = Y - \hat{Y} = (I - H)Y$. The variance is estimated by $MSE = SSE/(n-p)$, where $SSE = e^Te = Y^T(I - H)Y$.

## Analysis of Variance

Decompose the deviation of $Y_i$ from $\bar{Y}$ as
$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i),$$
where $\hat{Y}_i - \bar{Y}$ is systematic and $Y_i - \hat{Y}_i$ is random. It can be shown that
$$\sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2,$$
i.e., $SSTO\;(n-1) = SSR\;(p-1) + SSE\;(n-p)$, with the degrees of freedom in parentheses. The ANOVA table summarizes the related information:

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Model | SSR | p − 1 | MSR = SSR/(p − 1) | F* = MSR/MSE |
| Error | SSE | n − p | MSE = SSE/(n − p) | |
| Total | SSTO | n − 1 | | |

## F-Test for β₁ = ··· = β_{p−1} = 0

It can be shown that $E[MSR] = \sigma^2$ when $\beta_1 = \cdots = \beta_{p-1} = 0$, and $E[MSR] > \sigma^2$ otherwise.
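The matrix formulas for the LS fit and the ANOVA decomposition can be checked numerically. Below is a minimal numpy sketch, not part of the original notes; the data, seed, and coefficient values are arbitrary assumptions, since only the algebraic identities are being illustrated.

```python
import numpy as np

# Synthetic data (hypothetical values; only the identities below matter)
rng = np.random.default_rng(0)
n, p = 12, 3                                  # p includes the intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)         # b = (X'X)^{-1} X'Y
H = X @ np.linalg.solve(X.T @ X, X.T)         # hat matrix H = X (X'X)^{-1} X'
Yhat = H @ Y                                  # fitted values, Yhat = Xb = HY
e = Y - Yhat                                  # residuals, e = (I - H) Y

SSTO = np.sum((Y - Y.mean())**2)
SSR = np.sum((Yhat - Y.mean())**2)
SSE = e @ e
MSE = SSE / (n - p)
Fstar = (SSR / (p - 1)) / MSE                 # F* = MSR / MSE
```

The decomposition SSTO = SSR + SSE and the orthogonality of the residuals to the columns of $X$ both follow from $H$ being a symmetric projection matrix.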
When $\beta_1 = \cdots = \beta_{p-1} = 0$, one has
$$F^* = \frac{MSR}{MSE} \sim F(p-1,\; n-p).$$
This leads to the F-test for $H_0: \beta_1 = \cdots = \beta_{p-1} = 0$ versus $H_a$: otherwise, which rejects $H_0$ when $F^* > F(1-\alpha;\; p-1,\; n-p)$.

## Inferences Concerning β_j

The same as for SLR: $E[b] = \beta$ and $\sigma^2\{b\} = \sigma^2(X^TX)^{-1}$, where the latter is estimated by $s^2\{b\} = MSE\,(X^TX)^{-1}$. The inferences concerning $\beta_j$ are based on
$$\frac{b_j - \beta_j}{s\{b_j\}} \sim t(n-p).$$
For example, a $(1-\alpha)100\%$ CI for $\beta_j$ is $b_j \pm t(1-\alpha/2;\; n-p)\, s\{b_j\}$.

## Joint CIs: Bonferroni Method

To estimate several parameters jointly, one often wants the joint confidence coefficient to be above a certain level, say $1-\alpha$. The Bonferroni method uses the intervals
$$b_j \pm B\, s\{b_j\}, \qquad B = t(1 - \alpha/2g;\; n-p),$$
where $g$ is the number of parameters involved in the estimation. The actual confidence coefficient is usually greater than the nominal $1-\alpha$. For example, with $n = 23$, $p = 4$, and $g = 3$, the joint 95% CIs for $\beta_1$, $\beta_2$, and $\beta_3$ use $B = t(1 - .05/6;\; 19) = 2.625$, i.e., $\beta_j \in b_j \pm 2.625\, s\{b_j\}$.

## Inferences Concerning E[Y_h] = X_h^T β

To estimate the mean response at $X_h = (1, X_{h,1}, \ldots, X_{h,p-1})^T$,
$$E[Y_h] = \beta_0 + \beta_1 X_{h,1} + \cdots + \beta_{p-1} X_{h,p-1} = X_h^T\beta,$$
use the unbiased estimate
$$\hat{Y}_h = b_0 + b_1 X_{h,1} + \cdots + b_{p-1} X_{h,p-1} = X_h^T b.$$
It can be shown that $\hat{Y}_h \sim N(X_h^T\beta,\; \sigma^2\{\hat{Y}_h\})$, where $\sigma^2\{\hat{Y}_h\} = \sigma^2 X_h^T(X^TX)^{-1}X_h$. Estimate $\sigma^2\{\hat{Y}_h\}$ by $s^2\{\hat{Y}_h\} = MSE\, X_h^T(X^TX)^{-1}X_h$, and as usual the inferences are based on
$$\frac{\hat{Y}_h - E[Y_h]}{s\{\hat{Y}_h\}} \sim t(n-p).$$

## Prediction of New Observation at X_h

To predict a new response at $X_h$,
$$Y_h = \beta_0 + \beta_1 X_{h,1} + \cdots + \beta_{p-1} X_{h,p-1} + \varepsilon = X_h^T\beta + \varepsilon,$$
one has to allow for the variability of $\varepsilon$. With $X_h^T\beta$ estimated by $X_h^T b$ and $\sigma^2$ estimated by $MSE$, a $(1-\alpha)100\%$ prediction interval is given by
$$\hat{Y}_h \pm t(1-\alpha/2;\; n-p)\sqrt{MSE + s^2\{\hat{Y}_h\}},$$
or more explicitly,
$$\hat{Y}_h \pm t(1-\alpha/2;\; n-p)\sqrt{MSE\,\bigl(1 + X_h^T(X^TX)^{-1}X_h\bigr)}.$$

## Extra Sums of Squares

Consider potential predictors $X_1$, $X_2$, and $X_3$. Denote by $SSE(X_1)$ the SSE from fitting the model
$$Y = \beta_0 + \beta_1 X_1 + \varepsilon,$$
by $SSE(X_1, X_2)$ the SSE from fitting the model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon,$$
etc. For whatever model fitted, $SSTO = SSR + SSE$. As predictors are being added to the model, SSE is being reduced. The amount of SSE reduction achieved through such a process is an extra sum of squares, e.g., $SSR(X_2|X_1) = SSE(X_1) - SSE(X_1, X_2)$.
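The standard-error formulas in this section, for $b_j$, for the mean response at $X_h$, and for predicting a new observation, can be sketched numerically. The numpy code below is mine, not the notes'; the data and the point $X_h$ are made-up, and only the relationships among the quantities are checked.

```python
import numpy as np

# Synthetic data (arbitrary assumptions, for illustration only)
rng = np.random.default_rng(6)
n, p = 18, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)

s2_b = MSE * np.diag(XtX_inv)              # s^2{b_j}

xh = np.array([1.0, 0.5, -0.2])            # hypothetical new point X_h
s2_mean = MSE * (xh @ XtX_inv @ xh)        # s^2{Yhat_h}, for the mean response
s2_pred = MSE * (1 + xh @ XtX_inv @ xh)    # variance for predicting a new Y_h

# At an observed point, x'(X'X)^{-1}x is exactly the hat-matrix diagonal
H = X @ XtX_inv @ X.T
```

Multiplying the square roots of `s2_mean` and `s2_pred` by $t(1-\alpha/2;\, n-p)$ gives the half-widths of the confidence and prediction intervals, respectively; the prediction variance exceeds the mean-response variance by exactly $MSE$.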
## Type I and Type III SS

When regression models are fitted using the SAS procedure PROC GLM, two types of extra SS are automatically computed.

The Type I SS assume a sequential addition/deletion scenario, so they are sensitive to the ordering of the predictors. They are useful for testing hypotheses involving several parameters, such as $H_0: \beta_2 = \beta_3 = 0$.

The Type III SS drop one predictor while keeping the rest of the predictors in the model. They are useful for testing the necessity of individual terms given everything else.

## Tests Based on Extra SS

To test the hypotheses $H_0: \beta_j = 0$ versus $H_a: \beta_j \neq 0$, one may calculate
$$t^* = \frac{b_j}{s\{b_j\}}.$$
If $|t^*| > t(1-\alpha/2;\; n-p)$, reject $H_0$. Using the general linear test, one calculates
$$F^* = \frac{SSE(R) - SSE(F)}{1} \Big/ \frac{SSE(F)}{n-p}.$$
If $F^* > F(1-\alpha;\; 1,\; n-p)$, reject $H_0$. Note that $SSE(R) - SSE(F)$ here is simply the Type III SS of $X_j$, and it can be shown that $F^* = (t^*)^2$.

Consider the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$, and test the hypotheses $H_0: \beta_2 = \beta_3 = 0$ versus $H_a$: otherwise. Using the general linear test, one calculates
$$F^* = \frac{SSR(X_2, X_3|X_1)/2}{SSE(X_1, X_2, X_3)/(n-p)}.$$
If $F^* > F(1-\alpha;\; 2,\; n-p)$, reject $H_0$. Here $SSR(X_2, X_3|X_1)$ is simply the sum of the Type I SS of $X_2$ and $X_3$: put suspect terms at the end, and add Type I SS from the bottom up.

## Testing General Linear Hypotheses

When $H_0$ is not of the form $\beta_j = \beta_k = 0$, one has to modify the data to perform the test using SAS.

Consider the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, and test the hypotheses $H_0: \beta_1 = \beta_2$ versus $H_a: \beta_1 \neq \beta_2$. Rewrite the model as
$$Y = \beta_0 + \beta_1 (X_1 + X_2) + \gamma X_2 + \varepsilon, \qquad \gamma = \beta_2 - \beta_1.$$
The hypotheses become $H_0: \gamma = 0$ versus $H_a: \gamma \neq 0$.

Consider the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$, and test the hypotheses $H_0: \beta_1 = 3,\ \beta_3 = 5$ versus $H_a$: otherwise. Rewrite the model as
$$Y' = \beta_0 + \gamma_1 X_1 + \beta_2 X_2 + \gamma_3 X_3 + \varepsilon,$$
where $Y' = Y - 3X_1 - 5X_3$, $\gamma_1 = \beta_1 - 3$, and $\gamma_3 = \beta_3 - 5$. The hypotheses become $H_0: \gamma_1 = \gamma_3 = 0$ versus $H_a$: otherwise.
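The extra-sum-of-squares logic and the general linear test can be sketched without SAS. The numpy code below is an illustration of mine on synthetic data (arbitrary seed and coefficients), not the notes' SAS output; it also checks the slide's claim that for a single term the general linear test satisfies $F^* = (t^*)^2$.

```python
import numpy as np

# Synthetic data (hypothetical; any full-rank design gives the same identities)
rng = np.random.default_rng(1)
n = 20
X1, X2, X3 = rng.normal(size=(3, n))
Y = 2 + 1.5 * X1 + 0.5 * X2 + rng.normal(size=n)

def sse(*cols):
    """SSE from regressing Y on an intercept plus the given predictors."""
    Z = np.column_stack([np.ones(n), *cols])
    r = Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]
    return r @ r

# Extra sum of squares: the SSE reduction from adding X2, X3 after X1
SSR_23_given_1 = sse(X1) - sse(X1, X2, X3)

# General linear test of H0: beta2 = beta3 = 0 (2 constraints, full model p = 4)
p_full = 4
Fstar = (SSR_23_given_1 / 2) / (sse(X1, X2, X3) / (n - p_full))

# For a single term, the test reduces to F* = (t*)^2
Z = np.column_stack([np.ones(n), X1, X2, X3])
b = np.linalg.lstsq(Z, Y, rcond=None)[0]
MSE = sse(X1, X2, X3) / (n - p_full)
sb3 = np.sqrt(MSE * np.linalg.inv(Z.T @ Z)[3, 3])
tstar = b[3] / sb3
F_drop_X3 = (sse(X1, X2) - sse(X1, X2, X3)) / MSE
```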
## Coefficient of Partial Determination/Correlation

The coefficient of determination $R^2$ is the proportion of the total variation explained by the model. The coefficient of partial determination quantifies the proportion of the residual variation in an existing model that is explained by some extra predictor, e.g.,
$$R^2_{Y2|13} = \frac{SSR(X_2|X_1, X_3)}{SSE(X_1, X_3)}.$$
The coefficient of partial correlation measures the linear association between $Y$ and some $X_j$ after detrending against a set of common variables; e.g., $r_{Y2|13}$ is the correlation between $Y - \hat{Y}(X_1, X_3)$ and $X_2 - \hat{X}_2(X_1, X_3)$.

Consider the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$. If $X_1$ and $X_2$ are uncorrelated, then $SSR(X_2|X_1) \approx SSR(X_2)$. If $X_1$ and $X_2$ are perfectly correlated, i.e., $X_2 = aX_1 + b$, then $SSR(X_2|X_1) = 0$.

## Multicollinearity

Multicollinearity: the same information is carried in more than one predictor. With multicollinearity, the regression coefficients $\beta_j$ are not as identifiable, so they are not as estimable. One can still get a good fit and reliable inferences concerning the response within the data range, but individual coefficients are less estimable or interpretable. Some coefficient estimate, say $b_1$, may vary a lot when other terms are included/excluded; $s\{b_1\}$ then should be large in the bigger model. One may see $r^2_{Y1} \gg r^2_{Y1|2}$, or $SSR(X_1) \gg SSR(X_1|X_2)$.

## All Subsets Regression

Given predictors $X_1, \ldots, X_{P-1}$, one can look at all $2^{P-1} - 1$ possible models. With the same number of predictors, pick the best fitting model; with different numbers of predictors, use model selection criteria.

- $R^2$ criterion: stop adding variables when $R^2$ saturates.
- $R^2_a$ criterion: select a small model with $R^2_a$ close to the maximum, where
  $$R^2_a = 1 - \frac{SSE/(n-p)}{SSTO/(n-1)}.$$

In SAS assignment 5, we ran all subsets regression on the patient satisfaction data of Prob. 6.15, and collected $R^2$ and $R^2_a$ for all $2^3 - 1 = 7$ models. *(Slide plot of $R^2$ and $R^2_a$ against the number of variables omitted.)*
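The coefficient of partial determination has an equivalent "residual-on-residual" reading: $R^2_{Y2|1}$ equals the squared correlation between $Y$ detrended against $X_1$ and $X_2$ detrended against $X_1$. A small numpy check of that identity, on synthetic data of my own choosing (not from the notes):

```python
import numpy as np

# Synthetic data; X2 is deliberately correlated with X1
rng = np.random.default_rng(2)
n = 25
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)
Y = 1 + X1 + 0.8 * X2 + rng.normal(size=n)

def resid(y, *cols):
    """Residuals from regressing y on an intercept plus the given columns."""
    Z = np.column_stack([np.ones(n), *cols])
    return y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

SSE_1 = resid(Y, X1) @ resid(Y, X1)            # SSE(X1)
SSE_12 = resid(Y, X1, X2) @ resid(Y, X1, X2)   # SSE(X1, X2)
R2_partial = (SSE_1 - SSE_12) / SSE_1          # SSR(X2|X1) / SSE(X1)

# Partial correlation: correlate the two sets of detrended residuals
r = np.corrcoef(resid(Y, X1), resid(X2, X1))[0, 1]
```

This is the same detrending that the partial regression plot (below) displays graphically.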
## Model Selection Using Cp or PRESS

- $C_p$ criterion: calculate
  $$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{P-1})} - (n - 2p).$$
  Select a model with small $C_p$ and $C_p \approx p$.
- PRESS criterion: calculate
  $$PRESS = \sum_i (Y_i - \hat{Y}_{i(i)})^2,$$
  where $\hat{Y}_{i(i)}$ is the prediction of $Y_i$ based on the other $n-1$ data points. Select a small model with PRESS close to the minimum.

In SAS assignment 5, we ran all subsets regression on the patient satisfaction data of Prob. 6.15, and collected $C_p$ for all the 7 models.

## Forward, Backward, and Stepwise Regression

Forward regression:

1. Start with no variable in the model.
2. With $X_1, \ldots, X_{j-1}$ in the model, search among the $P-j$ variables still out and pick the one with the largest $SSR(X_k|X_1, \ldots, X_{j-1})$. Call the variable $X_j$.
3. If $X_j$ passes an F test, add it to the model and go back to 2. If it fails the test, stop.

The full model for the F test is $(X_1, \ldots, X_{j-1}, X_j)$.

Backward regression:

1. Start with all $P-1$ variables in the model.
2. Search among all variables still in the model and pick the one with the smallest Type III SS. Call the variable $X^*$.
3. If $X^*$ fails the F test, drop it from the model and go back to 2. If it passes the test, stop.

The full model for the F test keeps shrinking.

Stepwise regression:

1. Start with no variable in the model.
2. Add variables as in forward regression. After each addition, drop insignificant ones as in backward regression.
3. When all that are in can't be dropped, and all that are out can't be added, stop.

The full model for the F tests is constantly changing. The F tests are controlled by significance levels SLE and SLS (usually SLE ≥ SLS). A $k$-variable model selected by stepwise regression may not be the best $k$-variable model. There is no "best" model, and don't leave the robot in charge: use the techniques to narrow down choices, use subject-area knowledge to select, and make sure the selected model is interpretable.
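The PRESS criterion looks expensive because of the delete-one predictions, but as the notes later observe for diagnostics, the deleted residual satisfies $d_i = e_i/(1 - h_{ii})$, so PRESS comes from a single fit. A numpy sketch on synthetic data (my own hypothetical values, not the Prob. 6.15 data), verifying the identity for one case by actually refitting:

```python
import numpy as np

# Synthetic data (hypothetical; the identity holds for any full-rank design)
rng = np.random.default_rng(3)
n, p = 15, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = Y - H @ Y

# Deleted residuals d_i = e_i / (1 - h_ii), so PRESS needs no refitting
PRESS = np.sum((e / (1 - h))**2)

# Verify the identity for case 0 by literally refitting without it
mask = np.arange(n) != 0
b0 = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
d0 = Y[0] - X[0] @ b0
```

Since $0 < 1 - h_{ii} \le 1$, every PRESS term is at least as large as the corresponding squared residual, so PRESS ≥ SSE.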
## Partial Regression Plots

Consider predictors $X_1$, $X_2$, and $X_3$. To assess the net effect of $X_2$ given $X_1, X_3$ in the model, one may use the partial regression plot, which plots $Y - \hat{Y}(X_1, X_3)$ versus $X_2 - \hat{X}_2(X_1, X_3)$. Fitting a SLR to the partial regression plot of $X_2$, the intercept is 0 and the slope is $b_2$. The effect shown is net of the linear dependency on $X_1$ and $X_3$. These plots may help in spotting problems, but rarely suggest fixes. *(Slide figure: partial regression plots for the data of Prob. 6.15, omitted.)*

## Studentized Deleted Residuals

To identify $Y$-outliers, use the deleted residuals $d_i = Y_i - \hat{Y}_{i(i)}$, where $\hat{Y}_{i(i)}$ is the prediction at $X_i$ based on the other $n-1$ observations. It can be shown that
$$d_i = \frac{e_i}{1 - h_{ii}}, \qquad \sigma^2\{d_i\} = \frac{\sigma^2}{1 - h_{ii}},$$
where $h_{ii}$ is the $i$th diagonal of the hat matrix $H = X(X^TX)^{-1}X^T$. Studentizing the $d_i$'s, one has
$$t_i = \frac{e_i}{\sqrt{MSE_{(i)}(1 - h_{ii})}},$$
where $MSE_{(i)}$ is based on the other $n-1$ observations. When all is well, $t_i \sim t(n-p-1)$. One may compare $\max_i |t_i|$ with $t(1-\alpha/2n;\; n-p-1)$ to formally test for outliers, but the $t_i$'s are more often used as informal diagnostics. Note that there is no contribution from $Y_i$ in $\hat{Y}_{i(i)}$; the delete-one operation is closely related to the jackknife and cross-validation. One may calculate $t_i$ using
$$t_i = e_i \sqrt{\frac{n-p-1}{SSE(1 - h_{ii}) - e_i^2}}.$$

## Leverage Values

To identify $X$-outliers, use the diagonals of the hat matrix $H = X(X^TX)^{-1}X^T$, which are known as the leverage values. It can be shown that $\sum_i h_{ii} = p$, so $\bar{h} = p/n$. As a rule of thumb, observations with
$$h_{ii} > \frac{2p}{n}$$
are worth some careful look. To avoid extrapolation, one may check $h_{hh} = X_h^T(X^TX)^{-1}X_h$ at the location of interest $X_h$. *(Slide figure: 30 simulated points $(X_{i,1}, X_{i,2})$ with the points having $h_{ii} > 2p/n$ highlighted, omitted.)*

## DFFITS and DFBETAS

To assess the influence of case $i$ on $\hat{Y}_i$, one calculates
$$DFFITS_i = \frac{\hat{Y}_i - \hat{Y}_{i(i)}}{\sqrt{MSE_{(i)}\, h_{ii}}}.$$
To assess the influence of case $i$ on $b_j$, one calculates
$$DFBETAS_{j,i} = \frac{b_j - b_{j(i)}}{\sqrt{MSE_{(i)}\, c_{jj}}},$$
where $c_{jj}$ is the $j$th diagonal of $(X^TX)^{-1}$. Remember that $\sigma^2\{\hat{Y}_i\} = \sigma^2 h_{ii}$ (as $\sigma^2\{\hat{Y}\} = \sigma^2 H$) and $\sigma^2\{b\} = \sigma^2(X^TX)^{-1}$. As a rule of thumb, a case is considered influential on its own prediction if
$$|DFFITS_i| > \begin{cases} 2\sqrt{p/n} & n \text{ large}, \\ 1 & n \text{ small}, \end{cases}$$
and influential on the parameter estimate $b_j$ if
$$|DFBETAS_{j,i}| > \begin{cases} 2/\sqrt{n} & n \text{ large}, \\ 1 & n \text{ small}. \end{cases}$$
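The delete-one diagnostics above can all be computed from the all-data fit. The numpy sketch below (mine, on arbitrary synthetic data) computes leverages, studentized deleted residuals via the closed-form identity, and DFFITS via $DFFITS_i = t_i\sqrt{h_{ii}/(1-h_{ii})}$, then cross-checks one case against a literal delete-one refit.

```python
import numpy as np

# Synthetic data (arbitrary seed and coefficients; only identities are checked)
rng = np.random.default_rng(4)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                          # leverage values; sum(h) = p
Yhat = H @ Y
e = Y - Yhat
SSE = e @ e

# Studentized deleted residuals from the all-data fit (no refits needed)
t = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))

# DFFITS from the all-data fit
dffits = t * np.sqrt(h / (1 - h))

# Cross-check case 0 against a literal delete-one refit
mask = np.arange(n) != 0
b_del = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
e_del = Y[mask] - X[mask] @ b_del
MSE_del = (e_del @ e_del) / (n - 1 - p)
```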
## Cook's Distance

To assess the influence of case $i$ on the overall fit, one calculates
$$D_i = \frac{(\hat{Y} - \hat{Y}_{(i)})^T(\hat{Y} - \hat{Y}_{(i)})}{p\, MSE} = \frac{(b_{(i)} - b)^T X^TX\, (b_{(i)} - b)}{p\, MSE}.$$
Note that $\sigma^2\{b\} = \sigma^2(X^TX)^{-1}$, so $D_i$ is an aggregated version of the $DFBETAS_{j,i}$. As a rule of thumb, a $D_i$ in the range of .8 to 1 or above is really influential; in general, a $D_i$ substantially greater than the rest often indicates an influential case.

- If an influential case is of high quality, it carries more information than the rest, yielding more precise estimates if the model is correct.
- If an influential case is of doubtful quality, discard it.
- One does not need to calculate $n$ separate fits to obtain $DFFITS_i$, $DFBETAS_{j,i}$, and Cook's $D_i$: using some algebraic identities, all can be calculated from the all-data fit.

## Variance Inflation Factor

To assess multicollinearity among the predictors, one calculates the variance inflation factors,
$$VIF_j = \frac{1}{1 - R_j^2},$$
where $R_j^2$ is the $R^2$ of predicting $X_j$ using the other predictors. With two predictors $X_1$ and $X_2$,
$$VIF_1 = VIF_2 = \frac{1}{1 - r_{12}^2},$$
where $r_{12}$ is the correlation between $X_1$ and $X_2$. As a rule of thumb,
$$\max_j VIF_j \ge 10$$
indicates a serious multicollinearity problem. Note that $VIF_j \ge 1$, and large $VIF_j$'s only come in groups. The pairwise correlations $r_{jk}$ are themselves diagnostics for multicollinearity, but they can miss hidden patterns.
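Both Cook's distance and the VIFs can be computed from a single fit, as the notes state. The numpy sketch below (my own, on synthetic data with $X_2$ deliberately built to be collinear with $X_1$) uses the all-data formula $D_i = e_i^2 h_{ii} / (p\,MSE\,(1-h_{ii})^2)$ and checks it against the definition via a delete-one refit, then checks the two-predictor VIF identity.

```python
import numpy as np

# Synthetic data with strong collinearity between X1 and X2 (assumed values)
rng = np.random.default_rng(5)
n, p = 30, 3
X1 = rng.normal(size=n)
X2 = 0.9 * X1 + 0.3 * rng.normal(size=n)
X = np.column_stack([np.ones(n), X1, X2])
Y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = Y - H @ Y
MSE = (e @ e) / (n - p)

# Cook's D for every case, from the all-data fit
D = e**2 * h / (p * MSE * (1 - h)**2)

# Check D_0 against the definition (Yhat - Yhat_(0))'(Yhat - Yhat_(0)) / (p MSE)
mask = np.arange(n) != 0
b_del = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
D0 = np.sum((H @ Y - X @ b_del)**2) / (p * MSE)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the others
def vif(xj, *others):
    Z = np.column_stack([np.ones(n), *others])
    r = xj - Z @ np.linalg.lstsq(Z, xj, rcond=None)[0]
    R2 = 1 - (r @ r) / np.sum((xj - xj.mean())**2)
    return 1 / (1 - R2)

VIF1, VIF2 = vif(X1, X2), vif(X2, X1)
r12 = np.corrcoef(X1, X2)[0, 1]
```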
