Applied Regression Analysis
Applied Regression Analysis STA 4713
These 7-page class notes were uploaded by Jacinto Carter Sr. on Thursday, October 29, 2015. The notes belong to STA 4713 at the University of Texas at San Antonio, taught by Staff in Fall. Since upload, they have received 21 views.
Date Created: 10/29/15
STAT 4713 APPLIED REGRESSION ANALYSIS
FALL 2003
Dr. Nandini Kannan
Email: nkannan@utsa.edu
Home Page: http://faculty.business.utsa.edu/nkannan

Chapter 2 (Contd.)

Remarks

• The inferences for $\beta_0$ and $\beta_1$ require the $Y_i$'s to be normally distributed. However, the sampling distributions are fairly robust to slight departures from normality.
• The variances of the LSEs are affected by the spacing of the covariates.

Interval Estimation of the Mean Response

In many regression examples we would like to estimate the average response for a specified level of $X$, say $X_h$. The mean response is denoted by $E(Y_h)$. The point estimator is

$$\hat{Y}_h = b_0 + b_1 X_h.$$

Since $\hat{Y}_h$ is a linear combination of $b_0$ and $b_1$, its sampling distribution is normal with mean $E(\hat{Y}_h) = E(Y_h)$ and variance

$$\mathrm{Var}(\hat{Y}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right].$$

From the formula it is clear that the variability is affected by how far $X_h$ is from the average $\bar{X}$. Using the MSE for the unknown $\sigma^2$, we can generate confidence intervals and hypothesis tests for $E(Y_h)$ using the $t$ distribution. The $100(1 - \alpha)\%$ confidence interval for the mean response is

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - 2)\, S \sqrt{\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}},$$

where $S = \sqrt{MSE}$.

Prediction of a New Observation

Suppose we wish to predict the response $Y$ for a given level of $X$, say $X_h$. We assume that the new observation is the result of a new trial, independent of the trials on which the regression is based. Let $Y_{h(new)}$ denote the new observation. The point estimate of $Y_{h(new)}$ is clearly $\hat{Y}_h = b_0 + b_1 X_h$.

In the previous section we obtained the confidence interval for the mean response at $X_h$. The CI on the mean response is inappropriate for the prediction problem because it is an interval estimate on the mean of $Y$ (a parameter), not a probability statement about a future observation from that distribution.

The basic idea of a prediction interval is to choose a range in the distribution of $Y$ and state that the new observation will fall in this range with a certain probability. The prediction limits take into account

• error from the fitted model
• error associated with the future observation

The $100(1 - \alpha)\%$ prediction interval on a new observation $Y_{h(new)}$ is given by

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - 2)\, S \sqrt{1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}.$$

Prediction of the mean of m new observations

The $100(1 - \alpha)\%$ prediction interval on the mean of $m$ new observations, $\bar{Y}_{h(new)}$, is given by

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - 2)\, S \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}.$$

The Analysis of Variance Approach

Partitioning the Total Sum of Squares

The variability in the responses is measured by the Total Sum of Squares

$$SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

This is the inherent variability in the response. The natural question to ask is: how much of this variability can be explained by changes in the values of the covariate?

The fitted model may be viewed as an explanation of the variability in the response. If the fitted values are close to the original responses, the variation in the $\hat{Y}_i$ around $\bar{Y}$ will be close to the SSTO. The Regression Sum of Squares is given by

$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$

and measures the variation explained by the regression. Using simple algebra we can show that the Total Sum of Squares may be partitioned as

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2,$$

$$SSTO = SSR + SSE,$$

where SSE is the Error Sum of Squares, or the unexplained variability.

Partitioning of Degrees of Freedom

The SSTO has $n - 1$ degrees of freedom associated with it. One degree of freedom is lost because the deviations about the mean sum to 0. The SSE term has $n - 2$ degrees of freedom associated with it; 2 df are lost because of the two parameters we estimated. The SSR term has 1 df associated with it. The degrees of freedom are thus additive:

$$SSTO = SSR + SSE$$
$$n - 1 = 1 + (n - 2)$$

We define the Mean Sum of Squares as the Sum of Squares divided by its associated df:

$$MSR = \frac{SSR}{1}, \qquad MSE = \frac{SSE}{n - 2}.$$

The partitioning of the SS and df may be written in the form of an ANOVA Table.

Source of Variation    Sum of Squares    df       Mean Sum of Squares    EMS
Regression             SSR               1        MSR                    $\sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
Error                  SSE               n - 2    MSE                    $\sigma^2$
Total                  SSTO              n - 1

The last column in the ANOVA Table, the Expected Mean Squares, represents the expected value of the mean squares. The MSE is unbiased for $\sigma^2$. The MSR is a biased estimator of $\sigma^2$, with the bias a function of $\beta_1$. When $\beta_1 = 0$, both quantities are unbiased for $\sigma^2$. To test the hypothesis $H_0: \beta_1 = 0$ we may use the test statistic

$$F^* = \frac{MSR}{MSE},$$

which has an $F$ distribution with $(1, n - 2)$ df under $H_0$.

The test given above may be formulated in the more general framework of the General Linear Test approach, which involves 3 steps.

Full Model: We begin with the unrestricted or full model we think is appropriate for the data and obtain the error sum of squares, which we will denote by $SSE_F$.

Reduced Model: We next consider the hypothesis $H_0: \beta_1 = 0$. When $H_0$ is true we have a reduced or restricted model. We fit the reduced model to the data and obtain the error sum of squares $SSE_R$.

Test Statistic: It can be shown that $SSE_F \le SSE_R$.
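The quantities in this chapter can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the data values, the function names (`fit_line`, `anova`, `half_widths`), and the critical value 3.182 (approximately $t(0.975;\, 3)$) are assumptions chosen for illustration. It computes the least-squares fit, the partition SSTO = SSR + SSE, the statistic MSR/MSE, and the half-widths of the three intervals. For the reduced model under $H_0: \beta_1 = 0$, every fitted value is $\bar{Y}$, so the reduced-model error sum of squares equals SSTO.

```python
from math import sqrt


def fit_line(x, y):
    """Return least-squares b0, b1 plus x_bar, y_bar and sum((x - x_bar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1, x_bar, y_bar, sxx


def anova(x, y):
    """Partition SSTO into SSR + SSE and form F = MSR / MSE."""
    n = len(x)
    b0, b1, _, y_bar, _ = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    ssto = sum((yi - y_bar) ** 2 for yi in y)                # n - 1 df
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # 1 df
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # n - 2 df
    msr = ssr / 1
    mse = sse / (n - 2)
    return {
        "SSTO": ssto, "SSR": ssr, "SSE": sse,
        "MSR": msr, "MSE": mse, "F": msr / mse,
        # Reduced model Y = b0 + error fits Y_bar everywhere, so SSE_R = SSTO.
        "SSE_R": ssto,
    }


def half_widths(x, y, x_h, t_crit, m=1):
    """Half-widths of the three intervals at x_h.

    t_crit is the t(1 - alpha/2; n - 2) quantile, supplied by the caller.
    """
    n = len(x)
    _, _, x_bar, _, sxx = fit_line(x, y)
    mse = anova(x, y)["MSE"]
    core = 1 / n + (x_h - x_bar) ** 2 / sxx
    return {
        "mean_response": t_crit * sqrt(mse * core),
        "new_observation": t_crit * sqrt(mse * (1 + core)),
        "mean_of_m_new": t_crit * sqrt(mse * (1 / m + core)),
    }


# Hypothetical data, for illustration only (n = 5, so n - 2 = 3 df).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

table = anova(x, y)
widths = half_widths(x, y, x_h=3, t_crit=3.182, m=2)

print(table)
print(widths)
```

With these data the partition holds up to rounding, and the interval for the mean response is the narrowest of the three, followed by the interval for the mean of $m = 2$ new observations and then the interval for a single new observation. This ordering matches the extra $1/m$ and $1$ terms under the square root.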