STAT 3A03 Applied Regression With SAS
Checking your understanding
Last updated: Nov 13, 2019

Q. 1 Explain why residual plots are used to check the linear regression assumptions on the random errors.

Q. 2 What information could be gained about the assumptions on the random errors $\epsilon_i$ by plotting the residuals versus
a) fitted values,
b) covariates,
c) time?
Give brief examples.

Q. 3 Explain the difference between outliers, high-leverage points, and influential points.

Q. 4 Why do we remove multiple observations and refit the model to check for influence after using plots such as Cook's distance, DFFITS, and DFBETAS?

Q. 5 What are some reasons you might choose to remove an outlier and/or influential point from your analysis?

Q. 6 Consider a categorical variable $x$ with three levels labelled 1, 2, and 3. From class we know we can include this variable as a covariate in a linear regression by using two dummy variables,
$$u_{ij} = \begin{cases} 1 & \text{if the } i\text{th observation falls in category } j, \\ 0 & \text{otherwise,} \end{cases} \qquad j = 1, 2,$$
leading to the linear model
$$y_i = \beta_0 + \beta_1 u_{i1} + \beta_2 u_{i2} + \epsilon_i, \qquad i = 1, \ldots, n.$$
Rather than using dummy variables, why don't we include the categorical variable directly as $x$, i.e., fit
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \ldots, n?$$

Q. 7 When might we choose to use weighted least squares regression?

Q. 8 Why might we choose to perform variable selection?

Q. 9 Why might you consider it unnecessary to plot the residuals against both the fitted values and the covariate in a simple linear regression?

Q. 10 Explain extrapolation and why it is not usually a good idea.
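For Q. 6, a small numerical comparison can make the two models concrete. The sketch below is in Python with NumPy rather than SAS. The nine observations are made up for illustration, and the helper `ols` is hypothetical. The code builds the dummy variables $u_{i1}, u_{i2}$ exactly as defined in Q. 6, with category 3 as the baseline. It then fits both the dummy-variable model and the model that uses the labels 1, 2, 3 directly as $x$.

```python
import numpy as np

# Hypothetical data: a categorical x with levels 1, 2, 3 and a response y.
# The category means (10, 20, 15) are deliberately not equally spaced or monotone.
x = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
y = np.array([9.0, 10.0, 11.0, 19.0, 20.0, 21.0, 14.0, 15.0, 16.0])

def ols(X, y):
    """Hypothetical helper: least-squares coefficients and fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

ones = np.ones_like(y)

# Dummy coding: u_i1 and u_i2 indicate categories 1 and 2; category 3 is the baseline.
u1 = (x == 1).astype(float)
u2 = (x == 2).astype(float)
beta_dummy, fit_dummy = ols(np.column_stack([ones, u1, u2]), y)

# Numeric coding: treat the labels 1, 2, 3 as if they were measured quantities.
beta_num, fit_num = ols(np.column_stack([ones, x.astype(float)]), y)

rss_dummy = np.sum((y - fit_dummy) ** 2)
rss_num = np.sum((y - fit_num) ** 2)

print("dummy coefficients:", np.round(beta_dummy, 3))   # beta0 = mean of category 3
print("numeric coefficients:", np.round(beta_num, 3))
print("RSS dummy:", round(rss_dummy, 3), " RSS numeric:", round(rss_num, 3))
```

With these made-up data, the dummy-variable fit reproduces the three category means (10, 20, 15) exactly. Its estimates are $\hat\beta_0 = 15$, $\hat\beta_1 = -5$ and $\hat\beta_2 = 5$, with a residual sum of squares of 6. The numeric coding instead forces the fitted values to be equally spaced (12.5, 15, 17.5) and gives a residual sum of squares of 118.5.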
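For Q. 3 and Q. 4, it can help to see leverage and Cook's distance computed from their definitions, alongside the "delete and refit" idea behind them. The following is a minimal NumPy sketch on made-up data, not SAS output. It computes the leverages $h_{ii}$ (the diagonal of the hat matrix $H = X(X^\top X)^{-1}X^\top$). It computes Cook's distance from the usual ordinary least squares formula
$$D_i = \frac{e_i^2}{p\,s^2}\,\frac{h_{ii}}{(1-h_{ii})^2}.$$
It then checks that formula against a brute-force version that drops one observation at a time and refits. DFFITS and DFBETAS are related deletion measures and are not computed here.

```python
import numpy as np

# Hypothetical data: nine points near a line plus one far-out x value (x = 20)
# whose y does not follow the trend of the others.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 20], dtype=float)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 18.8, 30.0])

X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
s2 = resid @ resid / (n - p)                # residual mean square from the full fit

# Leverage: diagonal of the hat matrix H = X (X'X)^{-1} X'.
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# Cook's distance from the closed-form expression.
cooks = resid**2 / (p * s2) * h / (1 - h) ** 2

# The same quantity by brute force: drop observation i, refit, and measure
# how far the coefficient vector moves (scaled by X'X and p * s^2).
cooks_refit = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    d = beta - beta_i
    cooks_refit[i] = d @ (X.T @ X) @ d / (p * s2)

for i in range(n):
    print(f"x={x[i]:5.1f}  leverage={h[i]:.3f}  Cook's D={cooks[i]:.3f}")
```

The two Cook's distance calculations agree. In this made-up example, the point at $x = 20$ has by far the largest leverage and the largest Cook's distance. The loop removes only one observation at a time. Removing a set of points and refitting follows the same pattern with a different `keep` mask.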
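For Q. 7, the following is a sketch of what weighted least squares computes. It uses Python with NumPy and simulated data. It assumes, purely for illustration, that the error variance grows with the covariate and is known up to a constant, $\operatorname{Var}(\epsilon_i) \propto x_i^2$, so the weights are taken as $w_i = 1/x_i^2$. The WLS estimate $\hat\beta = (X^\top W X)^{-1} X^\top W y$ is computed directly. It is then recomputed by rescaling each row by $\sqrt{w_i}$ and running ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(3)          # fixed seed so the sketch is reproducible

# Hypothetical setting: error standard deviation grows with x (sd_i = 0.5 * x_i),
# so the constant-variance assumption of ordinary least squares fails.
n = 200
x = rng.uniform(1, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)

X = np.column_stack([np.ones(n), x])

# Weights proportional to 1 / Var(eps_i); assumed known here: Var(eps_i) ∝ x_i^2.
w = 1.0 / x**2

# WLS estimate from the weighted normal equations: (X'WX)^{-1} X'W y.
XtW = X.T * w                            # same as X.T @ diag(w)
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Equivalent route: rescale each row by sqrt(w_i) and run ordinary least squares.
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS:", np.round(beta_ols, 3))
print("WLS:", np.round(beta_wls, 3))
```

Both routes give the same coefficients. In practice the variance function is rarely known exactly and is often suggested by residual plots. In SAS, weights can be supplied to PROC REG through its WEIGHT statement.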