These class notes (50 pages) for EC 228, Econometric Methods, in Economics at Boston College (taught by Staff, Fall semester) were uploaded by Jayda Beahan Jr. on Saturday, October 3, 2015.
Wooldridge, Introductory Econometrics, 2d ed. Chapter 2: The simple regression model

Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory variables are considered to generate an outcome variable, or dependent variable. We begin by considering the simple regression model, in which a single explanatory, or independent, variable is involved. We often speak of this as "two-variable" regression, or "Y on X" regression. Algebraically,

y_i = β0 + β1 x_i + u_i   (1)

is the relationship presumed to hold in the population for each observation i. The values of y are expected to lie on a straight line, depending on the corresponding values of x. Their values will differ from those predicted by that line by the amount of the error term, or disturbance, u, which expresses the net effect of all factors other than x on the outcome y; that is, it reflects the assumption of ceteris paribus. We often speak of x as the regressor in this relationship; less commonly, we speak of y as the regressand. The coefficients of the relationship, β0 and β1, are the regression parameters, to be estimated from a sample. They are presumed constant in the population, so that the effect of a one-unit change in x on y is assumed constant for all values of x.

As long as we include an intercept in the relationship, we can always assume that E(u) = 0, since a nonzero mean for u could be absorbed by the intercept term.

The crucial assumption in this regression model involves the relationship between x and u. We consider x a random variable, as is u, and concern ourselves with the conditional distribution of u given x. If that distribution is equivalent to the unconditional distribution of u, then we can conclude that there is no relationship between x and u, which, as we will see, makes the estimation problem much more straightforward. To state this formally, we assume that

E(u | x) = E(u) = 0   (2)

or that the u process has a zero conditional mean. This assumption states that the unobserved factors involved in the regression function
are not related in any systematic manner to the observed factors. For instance, consider a regression of individuals' hourly wage on the number of years of education they have completed. There are, of course, many factors influencing the hourly wage earned beyond the number of years of formal schooling. In working with this regression function, we are assuming that the unobserved factors excluded from the regression we estimate (and thus relegated to the u term) are not systematically related to years of formal schooling. This may not be a tenable assumption; we might consider "innate ability" as such a factor, and it is probably related to success in both the educational process and the workplace. Thus innate ability, which we cannot measure without some proxies, may be positively correlated with the education variable, which would invalidate assumption (2).

The population regression function, given the zero conditional mean assumption, is

E(y | x) = β0 + β1 x   (3)

This allows us to separate y into two parts: the systematic part, related to x, and the unsystematic part, which is related to u. As long as assumption (2) holds, those two components are independent in the statistical sense.

Let us now derive the least squares estimates of the regression parameters. Let {(x_i, y_i): i = 1, ..., n} denote a random sample of size n from the population, where y_i and x_i are presumed to obey the relation (1). Assumption (2) allows us to state that E(u) = 0 and, given that assumption, that Cov(x, u) = 0, where Cov(·) denotes the covariance between the random variables. These assumptions can be written in terms of the regression error:

E(y_i − β0 − β1 x_i) = 0   (4)
E[x_i (y_i − β0 − β1 x_i)] = 0

These two equations place two restrictions on the joint probability distribution of x and u. Since there are two unknown parameters to be estimated, we might look upon these equations to provide solutions for those two parameters. We choose estimators b0 and b1 to solve the sample counterparts of these equations, making use of the principle of the method of moments:

n^{-1} Σ_{i=1}^{n} (y_i − b0 − b1 x_i) = 0   (5)
n^{-1} Σ_{i=1}^{n} x_i (y_i − b0 − b1 x_i) = 0
These are the so-called normal equations of least squares. Why is this method said to be "least squares"? Because, as we shall see, it is equivalent to minimizing the sum of squares of the regression residuals. How do we arrive at the solution? The first normal equation can be seen to be

b0 = ȳ − b1 x̄   (6)

where ȳ and x̄ are the sample averages of those variables. This implies that the regression line passes through the point of means of the sample data. Substituting this solution into the second normal equation, we now have one equation in one unknown, b1:

Σ_{i=1}^{n} x_i (y_i − (ȳ − b1 x̄) − b1 x_i) = 0   (7)

which can be rearranged as

b1 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)² = Cov(x, y) / Var(x)   (8)

where the slope estimate is merely the ratio of the sample covariance of the two variables to the variance of x, which, of course, must be nonzero for the estimates to be computed. This merely implies that not all of the sample values of x can take on the same value; there must be diversity in the observed values of x.

These estimates, b0 and b1, are said to be the ordinary least squares (OLS) estimates of the regression parameters, since they can be derived by solving the least squares problem

min_{b0, b1} S = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (y_i − b0 − b1 x_i)²   (9)

Here we minimize the sum of squared residuals, or differences between the regression line and the values of y, by choosing b0 and b1. If we take the derivatives ∂S/∂b0 and ∂S/∂b1 and set the resulting first order conditions to zero, the two equations that result are exactly the OLS solutions for the estimated parameters shown above. The least squares estimates minimize the sum of squared residuals, in the sense that any other line drawn through the scatter of (x, y) points would yield a larger sum of squared residuals. The OLS estimates provide the unique solution to this problem, and can always be computed if (i) Var(x) > 0 and (ii) n ≥ 2. The estimated OLS regression line is then

ŷ_i = b0 + b1 x_i   (10)

where the "hat" denotes the predicted value of y corresponding to that value of x. This is the sample regression function (SRF), corresponding to the population regression
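As a concrete check on formulas (6) and (8), the slope and intercept can be computed by hand from a small sample. A minimal sketch in Python; the data are made up for illustration:

```python
# Minimal sketch of the OLS formulas (6) and (8): the slope is the ratio of
# the sample covariance of x and y to the sample variation in x, and the
# fitted line passes through the point of means. Data are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# b1 = sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx

# b0 = ybar - b1 * xbar  (the line goes through the point of means)
b0 = ybar - b1 * xbar

print(b0, b1)
```

Note that only sums, means, and squared deviations are needed; this is exactly the method-of-moments solution of the two normal equations (5).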
function, or PRF, (3). The population regression function is fixed, but unknown, in the population; the SRF is a function of the particular sample that we have used to derive it, and a different SRF will be forthcoming from a different sample. The primary interest in these estimates usually involves b1 = ∂y/∂x = Δy/Δx, the amount by which y is predicted to change from a unit change in the level of x. This slope is often of economic interest, whereas the constant term in many regressions is devoid of economic meaning. For instance, a regression of major companies' CEO salaries on the firms' return on equity (a measure of economic performance) yields the regression estimates

ŝ = 963.191 + 18.501 r   (11)

where s is the CEO's annual salary, in thousands of 1990 dollars, and r is average return on equity over the prior three years, in percent. This implies that a one percent increase in ROE over the past three years is worth $18,501 to a CEO, on average. The average annual salary for the 209 CEOs in the sample is $1.28 million, so the increment is about 1.4% of that average salary. The SRF can also be used to predict what a CEO will earn for any level of ROE; points on the estimated regression function are such predictions.

Mechanics of OLS

Some algebraic properties of the OLS regression line:

1. The sum, and average, of the OLS residuals is zero:

Σ_{i=1}^{n} e_i = 0   (12)

which follows from the first normal equation, which specifies that the estimated regression line goes through the point of means (x̄, ȳ), so that the mean residual must be zero.

2. By construction, the sample covariance between the OLS residuals and the regressor is zero:

Cov(e, x) = Σ_{i=1}^{n} x_i e_i = 0   (13)

This is not an assumption, but follows directly from the second normal equation. The estimated coefficients, which give rise to the residuals, are chosen to make it so.

3. Each value of the dependent variable may be written in terms of its prediction and its error, or regression residual: y_i = ŷ_i + e_i, so that OLS decomposes each y_i into two parts: a fitted value and a residual. Property (13) also
implies that Cov(e, ŷ) = 0, since ŷ is a linear transformation of x, and linear transformations have linear effects on covariances. Thus the fitted values and residuals are uncorrelated in the sample.

Taking this property and applying it to the entire sample, we define

SST = Σ_{i=1}^{n} (y_i − ȳ)²
SSE = Σ_{i=1}^{n} (ŷ_i − ȳ)²
SSR = Σ_{i=1}^{n} e_i²

as the Total sum of squares, Explained sum of squares, and Residual sum of squares, respectively. Note that SST expresses the total variation in y around its mean; we do not strive to "explain" its mean, only how it varies about its mean. The second quantity, SSE, expresses the variation of the predicted values of y around the mean value of y (and it is trivial to show that ŷ has the same mean as y). The third quantity, SSR, is the same as the least squares criterion S from (9). (Note that some textbooks interchange the definitions of SSE and SSR, since both "explained" and "error" start with E, and both "regression" and "residual" start with R.) Given these sums of squares, we can generalize the decomposition mentioned above into

SST = SSE + SSR   (14)

or, the total variation in y may be divided into that explained and that unexplained, i.e. left in the residual category. To prove the validity of (14), note that

Σ (y_i − ȳ)² = Σ [(y_i − ŷ_i) + (ŷ_i − ȳ)]²
             = Σ [e_i + (ŷ_i − ȳ)]²
             = Σ e_i² + 2 Σ e_i (ŷ_i − ȳ) + Σ (ŷ_i − ȳ)²
SST = SSR + SSE

given that the middle term in this expression is equal to zero. But this term is the sample covariance of e and ŷ, given a zero mean for e, and by (13) we have established that this is zero.

How good a job does this SRF do? Does the regression function explain a great deal of the variation of y, or not very much? That can now be answered by making use of these sums of squares:

R² = [r(x, y)]² = SSE / SST = 1 − SSR / SST

The R² measure (sometimes termed the coefficient of determination) expresses the percent of variation in y around its mean "explained" by the regression function. It is an r, or simple correlation coefficient, squared, in this case of simple regression on a single variable. Since the correlation between two variables
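The decomposition (14) can be verified numerically: fit a line, form the three sums of squares, and check that they add up. A minimal sketch, using a small made-up sample:

```python
# Numerical check of SST = SSE + SSR, eq. (14), and of
# R^2 = SSE/SST = 1 - SSR/SST. Sample data are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * a for a in x]       # fitted values
e = [b - h for b, h in zip(y, yhat)]  # residuals

sst = sum((b - ybar) ** 2 for b in y)
sse = sum((h - ybar) ** 2 for h in yhat)
ssr = sum(r ** 2 for r in e)

r2 = sse / sst
print(sst, sse + ssr, r2)
```

The middle term of the expansion vanishes because the residuals are uncorrelated with the fitted values, which is why the two printed sums agree.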
ranges between −1 and +1, the squared correlation ranges between 0 and 1. In that sense, R² is like a "batting average." In the case where R² = 0, the model we have built fails to explain any of the variation in the y values around their mean. That is unlikely, but it is certainly possible to have a very low value of R². In the case where R² = 1, all of the points lie on the SRF. That is unlikely when n > 2, but it may be the case that all points lie close to the line, in which case R² will approach 1. We cannot make any statistical judgment based directly on R², or even say that a model with a higher R² and the same dependent variable is necessarily a better model; but, other things equal, a higher R² will be forthcoming from a model that captures more of y's behavior. In cross-sectional analyses, where we are trying to understand the idiosyncrasies of individual behavior, very low R² values are common, and do not necessarily denote a failure to build a useful model.

Important issues in evaluating applied work: how do the quantities we have estimated change when the units of measurement are changed? In the estimated model of CEO salaries, since the y variable was measured in thousands of dollars, the intercept and slope coefficient refer to those units as well. If we measured salaries in dollars, the intercept and slope would be multiplied by 1,000, but nothing else would change. The correlation between y and x is not affected by linear transformations, so we would not alter the R² of this equation by changing its units of measurement. Likewise, if ROE was measured in decimals rather than percent, it would merely change the units of measurement of the slope coefficient. Dividing r by 100 would cause the slope to be multiplied by 100. In the original (11), with r in percent, the slope is 18.501 (thousands of dollars per one unit change in r). If we expressed r in decimal form, the slope would be 1850.1. A change in r from 0.10 to 0.11 (a one percent increase in ROE) would be associated with a change in salary of (0.01)(1850.1) = 18.501
thousand dollars. Again, the correlation between salary and ROE would not be altered. This also applies for a transformation such as F = 32 + (9/5) C; it would not matter whether we viewed temperature in degrees F or degrees C as a causal factor in estimating the demand for heating oil, since the correlation between the dependent variable and temperature would be unchanged by switching from Fahrenheit to Celsius degrees.

Functional form

Simple linear regression would seem to be a workable tool if we have a presumed linear relationship between x and y, but what if theory suggests that the relation should be nonlinear? It turns out that the "linearity" of regression refers to y being expressed as a linear function of x, but neither y nor x need be the "raw data" of our analysis. For instance, regressing y on t (a time trend) would allow us to analyse a linear trend, or constant growth, in the data. What if we expect the data to exhibit exponential growth, as would population, or sums earning compound interest? If the underlying model is

y = A exp(rt)   (15)

then, taking logs,

log y = log A + rt
y* = A* + rt   (16)

so that the single-log transformation may be used to express a constant-growth relationship, in which r is the regression slope coefficient that directly estimates ∂ log y / ∂t. Likewise, the double-log transformation can be used to express a constant-elasticity relationship, such as that of a Cobb-Douglas function:

y = A x^α   (17)
log y = log A + α log x

In this context, the slope coefficient α is an estimate of the elasticity of y with respect to x, given that η_{yx} = ∂ log y / ∂ log x by the definition of elasticity. The original equation is nonlinear, but the transformed equation is a linear function which may be estimated by OLS regression. Likewise, a model in which y is thought to depend on 1/x (the reciprocal model) may be estimated by linear regression by just defining a new variable, z, equal to 1/x (presuming x > 0). That model has an interesting interpretation if you work out its algebra.

Properties of OLS estimators

Now let us consider the properties of the
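The double-log transformation (17) can be illustrated directly: if the data are generated exactly from y = A x^α, regressing log y on log x recovers α as the slope. A minimal sketch; A, α, and the x values are made up:

```python
import math

# Sketch of the constant-elasticity (double-log) transformation, eq. (17):
# data generated exactly from y = A * x**alpha, so the simple regression of
# log y on log x should recover alpha as its slope. A and alpha are made up.
A, alpha = 2.0, 0.75
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [A * xi ** alpha for xi in x]

lx = [math.log(xi) for xi in x]
ly = [math.log(yi) for yi in y]

n = len(lx)
lxbar = sum(lx) / n
lybar = sum(ly) / n
slope = sum((a - lxbar) * (b - lybar) for a, b in zip(lx, ly)) / \
        sum((a - lxbar) ** 2 for a in lx)
print(slope)
```

The intercept of the transformed regression estimates log A, not A itself; this is one reason the transformed and original parameters must be interpreted carefully.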
regression estimators we have derived, considering b0 and b1 as estimators of their respective population quantities. To establish the unbiasedness of these estimators, we must make several assumptions:

Proposition 1 (SLR1): in the population, the dependent variable y is related to the independent variable x and the error u as

y = β0 + β1 x + u   (18)

Proposition 2 (SLR2): we can estimate the population parameters from a random sample of size n, {(x_i, y_i): i = 1, ..., n}.

Proposition 3 (SLR3): the error process has a zero conditional mean:

E(u | x) = 0   (19)

Proposition 4 (SLR4): the independent variable x has a positive variance:

(n − 1)^{-1} Σ_{i=1}^{n} (x_i − x̄)² > 0   (20)

Given these four assumptions, we may proceed, considering the intercept and slope estimators as random variables. For the slope estimator, we may express the estimator in terms of population coefficients and errors:

b1 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)² = Σ_{i=1}^{n} (x_i − x̄) y_i / s_x²   (21)

where we have defined s_x² = Σ (x_i − x̄)² as the total variation in x (not the variance of x). Substituting for y_i, we can write the slope estimator as

b1 = Σ_{i=1}^{n} (x_i − x̄)(β0 + β1 x_i + u_i) / s_x²
   = [β0 Σ (x_i − x̄) + β1 Σ (x_i − x̄) x_i + Σ (x_i − x̄) u_i] / s_x²

We can show that the first term in the numerator is algebraically zero, given that the deviations around the mean sum to zero. The second term can be written as β1 Σ (x_i − x̄)², so that the second term is merely β1 when divided by s_x². Thus this expression can be rewritten as

b1 = β1 + (1 / s_x²) Σ_{i=1}^{n} (x_i − x̄) u_i

showing that any randomness in the estimate of b1 is derived from the errors in the sample, weighted by the deviations of their respective x values. Given the assumed independence of the distributions of x and u implied by (19), this expression implies that E(b1) = β1, or that b1 is an unbiased estimate of β1, given the propositions 1-3 above. The four propositions listed above are all crucial for this result, but the key assumption is the independence of x and u.

We are also concerned about the precision of the OLS estimators. To derive an estimator of the precision, we must add an assumption on the distribution of the error u:

Proposition 5 (SLR5, homoskedasticity): Var(u | x) = Var(u) = σ². This
assumption states that the variance of the error term is constant over the population, and thus within the sample. Given (19), the conditional variance is also the unconditional variance. The errors are considered drawn from a fixed distribution, with a mean of zero and a constant variance of σ². If this assumption is violated, we have the condition of heteroskedasticity, which will often involve the magnitude of the error variance relating to the magnitude of x, or to some other measurable factor.

Given this additional assumption, but no further assumptions on the nature of the distribution of u, we may demonstrate that

Var(b1) = σ² / Σ_{i=1}^{n} (x_i − x̄)² = σ² / s_x²   (22)

so that the precision of our estimate of the slope is dependent upon the overall error variance, and is inversely related to the variation in the x variable. The magnitude of x does not matter, but its variability in the sample does matter. If we are conducting a controlled experiment (quite unlikely in economic analysis), we would want to choose widely spread values of x to generate the most precise estimate of ∂y/∂x.

We can likewise prove that b0 is an unbiased estimator of the population intercept, with sampling variance

Var(b0) = σ² (n^{-1} Σ_{i=1}^{n} x_i²) / Σ_{i=1}^{n} (x_i − x̄)²

so that the precision of the intercept depends, as well, upon the sample size and the magnitude of the x values. These formulas for the sampling variances will be invalid in the presence of heteroskedasticity, that is, when proposition SLR5 is violated.

These formulas are not operational, since they include the unknown parameter σ². To calculate estimates of the variances, we must first replace σ² with a consistent estimate, s², derived from the least squares residuals

e_i = y_i − b0 − b1 x_i,  i = 1, ..., n   (23)

We cannot observe the error u_i for a given observation, but we can generate a consistent estimate of the i-th observation's error with the i-th observation's least squares residual (23). Likewise, a sample quantity corresponding to the population variance σ² can be derived from the residuals:

s² = (n − 2)^{-1} Σ e_i² = SSR / (n − 2)   (25)

where
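The estimate s² = SSR/(n − 2) and the resulting standard error of the slope can be computed in a few lines. A minimal sketch, using a small made-up sample:

```python
import math

# Sketch of the error-variance estimate s^2 = SSR/(n-2), eq. (25), and of
# the estimated standard error of the slope, s / sqrt(sum((x_i - xbar)^2)),
# following eq. (22) with sigma^2 replaced by s^2. Data are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)   # total variation in x
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# residuals e_i = y_i - b0 - b1*x_i, eq. (23)
e = [b - (b0 + b1 * a) for a, b in zip(x, y)]
ssr = sum(r ** 2 for r in e)

s2 = ssr / (n - 2)            # two degrees of freedom lost to b0 and b1
s = math.sqrt(s2)             # standard error of the regression
se_b1 = s / math.sqrt(sxx)    # estimated standard error of the slope
print(s2, se_b1)
```

This se_b1 is the quantity a regression package reports next to the slope coefficient and uses for confidence intervals and t tests.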
the numerator is just the least squares criterion, SSR, divided by the appropriate degrees of freedom. Here, two degrees of freedom are lost, since each residual is calculated by replacing two population coefficients with their sample counterparts. This now makes it possible to generate the estimated variances and, more usefully, the estimated standard error of the regression slope:

s_{b1} = s / s_x

where s is the standard deviation, or standard error, of the disturbance process (that is, s = √s²), and s_x is √(Σ (x_i − x̄)²). It is this estimated standard error that will be displayed on the computer printout when you run a regression, and used to construct confidence intervals and hypothesis tests about the slope coefficient. We can calculate the estimated standard error of the intercept term by the same means.

Regression through the origin

We could also consider a special case of the model above where we impose a constraint that β0 = 0, so that y is taken to be proportional to x. This will often be inappropriate; it is generally more sensible to let the data calculate the appropriate intercept term, and reestimate the model subject to that constraint only if that is a reasonable course of action. Otherwise, the resulting estimate of the slope coefficient will be biased. Unless theory suggests that a strictly proportional relationship is appropriate, the intercept should be included in the model.

Wooldridge, Introductory Econometrics, 2d ed. Chapter 3: Multiple regression analysis: Estimation

In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility that there are additional explanatory factors that have a systematic effect on the dependent variable. The simplest extension is the "three-variable" model, in which a second explanatory variable is added:

y = β0 + β1 x1 + β2 x2 + u   (1)

where each of the slope coefficients is now a partial derivative of y with respect to the x variable which it multiplies: that is, holding x2 fixed, β1 = ∂y/∂x1. This extension also allows us to consider nonlinear
relationships, such as a polynomial in z, where x1 = z and x2 = z². Then the regression is linear in x1 and x2, but nonlinear in z: ∂y/∂z = β1 + 2 β2 z.

The key assumption for this model, analogous to that which we specified for the simple regression model, involves the independence of the error process u and both regressors, or explanatory variables:

E(u | x1, x2) = 0   (2)

This assumption of a zero conditional mean for the error process implies that it does not systematically vary with the x's, nor with any linear combination of the x's; u is independent, in the statistical sense, from the distributions of the x's.

The model may now be generalized to the case of k regressors:

y = β0 + β1 x1 + β2 x2 + ... + βk xk + u   (3)

where the β coefficients have the same interpretation: each is the partial derivative of y with respect to that x, holding all other x's constant (ceteris paribus), and the u term is that nonsystematic part of y not linearly related to any of the x's. The dependent variable y is taken to be linearly related to the x's, which may bear any relation to each other (e.g. polynomials or other transformations) as long as there are no exact linear dependencies among the regressors. That is, no x variable can be an exact linear transformation of another, or the regression estimates cannot be calculated. The independence assumption now becomes

E(u | x1, x2, ..., xk) = 0   (4)

Mechanics and interpretation of OLS

Consider first the three-variable model given above in (1). The estimated OLS equation contains the parameters of interest:

ŷ = b0 + b1 x1 + b2 x2   (5)

and we may define the ordinary least squares criterion in terms of the OLS residuals, calculated from a sample of size n, from this expression:

min S = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (y_i − b0 − b1 x_{i1} − b2 x_{i2})²   (6)

where the minimization of this expression is performed with respect to each of the three parameters, b0, b1, b2. In the case of k regressors, these expressions include terms in bk, and the minimization is performed with respect to the (k + 1) parameters b0, b1, b2, ..., bk. For this to be feasible, n > (k + 1): that is, we must have a sample larger than the
number of parameters to be estimated from that sample. The minimization is carried out by differentiating the scalar S with respect to each of the b's in turn, and setting each resulting first order condition to zero. This gives rise to (k + 1) simultaneous equations in (k + 1) unknowns, the regression parameters, which are known as the least squares normal equations. The normal equations are expressions in the sums of squares and cross products of y and the regressors, including a first "regressor" which is a column of 1's multiplying the constant term. For the three-variable regression model, we can write out the normal equations as

Σ y = n b0 + b1 Σ x1 + b2 Σ x2   (7)
Σ x1 y = b0 Σ x1 + b1 Σ x1² + b2 Σ x1 x2
Σ x2 y = b0 Σ x2 + b1 Σ x1 x2 + b2 Σ x2²

Just as in the two-variable case, the first normal equation can be interpreted as stating that the regression surface (in 3-space) passes through the multivariate point of means (x̄1, x̄2, ȳ). These three equations may be uniquely solved, by normal algebraic techniques or linear algebra, for the estimated least squares parameters. This extends to the case of k regressors and (k + 1) regression parameters.

In each case, the regression coefficients are considered in the ceteris paribus sense: each coefficient measures the partial effect of a unit change in its variable, or regressor, holding all other regressors fixed. If a variable is a component of more than one regressor, as in a polynomial relationship as discussed above, the total effect of a change in that variable is additive.

Fitted values, residuals, and their properties

Just as in simple regression, we may calculate fitted values, or predicted values, after estimating a multiple regression. For observation i, the fitted value is

ŷ_i = b0 + b1 x_{i1} + b2 x_{i2} + ... + bk x_{ik}   (8)

and the residual is the difference between the actual value of y and the fitted value:

e_i = y_i − ŷ_i   (9)

As with simple regression, the sum of the residuals is zero; they have, by construction, zero covariance with each of the x variables, and thus zero covariance with ŷ; and, since the average residual is zero, the regression
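The three normal equations in (7) form a 3x3 linear system that can be solved directly. A minimal sketch in Python; the data are made up, generated exactly from y = 1 + 2 x1 + 3 x2, so the solution should recover those known coefficients:

```python
# Solve the 3x3 least squares normal equations (7) by Gaussian elimination.
# Data are made up, generated exactly as y = 1 + 2*x1 + 3*x2, so the
# solution should recover b0 = 1, b1 = 2, b2 = 3.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]   # not collinear with x1
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]
n = len(y)

# sums of squares and cross products appearing in the normal equations
s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
A = [[n,       sum(x1), sum(x2)],
     [sum(x1), s11,     s12],
     [sum(x2), s12,     s22]]
rhs = [sum(y),
       sum(a * c for a, c in zip(x1, y)),
       sum(b * c for b, c in zip(x2, y))]

def solve3(A, b):
    # Gaussian elimination with partial pivoting on a 3x3 system
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        sol[r] = (M[r][3] - sum(M[r][c] * sol[c]
                                for c in range(r + 1, 3))) / M[r][r]
    return sol

b0, b1, b2 = solve3(A, rhs)
print(b0, b1, b2)
```

With k regressors the same idea applies to the (k + 1)-equation system; in practice one would use a linear algebra routine rather than hand-rolled elimination.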
surface passes through the multivariate point of means (x̄1, x̄2, ..., x̄k, ȳ).

There are two instances where the simple regression of y on x1 will yield the same coefficient as the multiple regression of y on x1 and x2, with respect to x1. In general, the simple regression coefficient will not equal the multiple regression coefficient, since the simple regression ignores the effect of x2 (and considers that it can be viewed as nonsystematic, captured in the error). When will the two coefficients be equal? First, when the coefficient of x2 is truly zero: that is, when x2 really does not belong in the model. Second, when x1 and x2 are uncorrelated in the sample. This is likely to be quite rare in actual data. However, these two cases suggest when the two coefficients will be similar: when x2 is relatively unimportant in explaining y, or when it is very loosely related to x1.

We can define the same three sums of squares (SST, SSE, SSR) as in simple regression, and R² is still the ratio of the explained sum of squares (SSE) to the total sum of squares (SST). It is no longer a simple correlation (e.g. r_{yx1}) squared, but it still has the interpretation of a squared simple correlation coefficient: the correlation between y and ŷ, r_{yŷ}. A very important principle is that R² never decreases when an explanatory variable is added to a regression; no matter how irrelevant that variable may be, the R² of the expanded regression will be no less than that of the original regression. Thus, the regression R² may be arbitrarily increased by adding variables (even unimportant variables), and we should not be impressed by a high value of R² in a model with a long list of explanatory variables.

Just as with simple regression, it is possible to fit a model through the origin, suppressing the constant term. It is important to note that many of the properties we have discussed no longer hold in that case: for instance, the least squares residuals no longer have a zero sample average, and the R² from such an equation can actually be negative: that
is, the equation does worse than the "model" which specifies that ŷ = ȳ for all i. If the population intercept β0 differs from zero, the slope coefficients computed in a regression through the origin will be biased. Therefore, we often will include an intercept, and let the data determine whether it should be zero.

Expected value of the OLS estimators

We now discuss the statistical properties of the OLS estimators of the parameters in the population regression function. The population model is taken to be (3). We assume that we have a random sample of size n on the variables of the model. The multivariate analogue to our assumption about the error process is now

E(u | x1, x2, ..., xk) = 0   (10)

so that we consider the error process to be independent of each of the explanatory variables' distributions. This assumption would not hold if we misspecified the model: for instance, if we ran a simple regression with inc as the explanatory variable, but the population model also contained inc². Since inc and inc² will have a positive correlation, the simple regression's parameter estimates will be biased. This bias will also appear if there is a separate, important factor that should be included in the model; if that factor is correlated with the included regressors, their coefficients will be biased.

In the context of multiple regression, with several independent variables, we must make an additional assumption about their measured values:

Proposition 1: In the sample, none of the independent variables x may be expressed as an exact linear relation of the others (including a vector of 1's).

Every multiple regression that includes a constant term can be considered as having a variable x0 = 1 for all observations. This proposition states that each of the other explanatory variables must have nonzero sample variance: that is, it may not be a constant in the sample. Second, the proposition states that there is no perfect collinearity, or multicollinearity, in the sample. If we could express one x as a linear combination of the other x variables, this assumption
would be violated. If we have perfect collinearity in the regressor matrix, the OLS estimates cannot be computed; mathematically, they do not exist. A trivial example of perfect collinearity would be the inclusion of the same variable twice, measured in different units (or via a linear transformation, such as temperature in degrees F versus C). The key concept: each regressor we add to a multiple regression must contain information at the margin. It must tell us something about y that we do not already know. For instance, if we consider x1 = proportion of football games won, x2 = proportion of games lost, and x3 = proportion of games tied, and we try to use all three as explanatory variables to model alumni donations to the athletics program, we find that there is perfect collinearity, since for every college in the sample the three variables sum to one, by construction. There is no information in, e.g., x3 once we know the other two, so including it in a regression with the other two makes no sense (and renders that regression uncomputable). We can leave any one of the three variables out of the regression; it does not matter which one.

Note that this proposition is not an assumption about the population model; it is an implication of the sample data we have to work with. Note also that this only applies to linear relations among the explanatory variables: a variable and its square, for instance, are not linearly related, so we may include both in a regression to capture a nonlinear relation between y and x.

Given the four assumptions (that of the population model, the random sample, the zero conditional mean of the u process, and the absence of perfect collinearity), we can demonstrate that the OLS estimators of the population parameters are unbiased:

E(b_j) = β_j,  j = 0, ..., k   (11)

What happens if we misspecify the model by including irrelevant explanatory variables: x variables that, unbeknownst to us, are not in the population model? Fortunately, this does not damage the estimates. The regression will still yield unbiased estimates
of all of the coefficients, including unbiased estimates of these variables' coefficients, which are zero in the population. The model may be improved by removing such variables, since including them in the regression consumes degrees of freedom (and reduces the precision of the estimates), but the effect of "overspecifying" the model is rather benign. The same applies to overspecifying a polynomial order; including quadratic and cubic terms when only the quadratic term is needed will be harmless, and you will find that the cubic term's coefficient is far from significant.

However, the opposite case, where we underspecify the model by mistakenly excluding a relevant explanatory variable, is much more serious. Let us formally consider the direction and size of bias in this case. Assume that the population model is

y = β0 + β1 x1 + β2 x2 + u   (12)

but we do not recognize the importance of x2, and mistakenly consider the relationship

y = β0 + β1 x1 + u   (13)

to be fully specified. What are the consequences of estimating the latter relationship? We can show that in this case

E(b1) = β1 + β2 [Σ_{i=1}^{n} (x_{i1} − x̄1) x_{i2} / Σ_{i=1}^{n} (x_{i1} − x̄1)²]   (14)

so that the OLS coefficient b1 will be biased (not equal to its population value of β1, even in an expected sense) in the presence of the second term. That term will be nonzero when β2 is nonzero (which it is, by assumption) and when the fraction is nonzero. But the fraction is merely a simple regression coefficient: the slope in the auxiliary regression of x2 on x1. If the regressors are correlated with one another, that regression coefficient will be nonzero, and its magnitude will be related to the strength of the correlation (and the units of the variables). Say that the auxiliary regression is

x2 = d0 + d1 x1 + v   (15)

with d1 > 0, so that x1 and x2 are positively correlated (e.g. as income and wealth would be in a sample of household data). Then we can write the bias as

E(b1) − β1 = β2 d1   (16)

and its sign and magnitude will depend on both the relation between y and x2, and the interrelation among the explanatory variables. If there is no such
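The bias formula (16) can be seen at work numerically: generate data from a two-regressor population model with correlated regressors, then run the underspecified simple regression of y on x1 alone. A minimal sketch; the coefficients and data are made up:

```python
# Illustration of omitted-variable bias, eq. (14)-(16): data are generated
# (made up) from the true model y = 1 + 2*x1 + 3*x2 with no error, with
# x1 and x2 positively correlated. The short regression of y on x1 alone
# then has slope beta1 + beta2*d1, where d1 is the slope of the auxiliary
# regression of x2 on x1.
beta0, beta1, beta2 = 1.0, 2.0, 3.0
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0]
y = [beta0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

def slope(x, y):
    # simple-regression slope: sample covariance over variation in x
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / \
           sum((a - xb) ** 2 for a in x)

d1 = slope(x1, x2)        # auxiliary regression of x2 on x1
b1_short = slope(x1, y)   # biased slope from the underspecified model
print(d1, b1_short)       # b1_short equals beta1 + beta2*d1
```

Because β2 and d1 are both positive here, the short regression's slope overstates β1: the omitted regressor's effect is loaded onto the included one.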
If there is no such relationship, that is, if x1 and x2 are uncorrelated in the sample, then b1 is unbiased, since in that special case multiple regression reverts to simple regression. In all other cases, though, there will be bias in the estimation of the underspecified model. If the left side of (16) is positive, we say that b1 has an upward bias: the OLS value will be too large. If it were negative, we would speak of a downward bias. If the OLS coefficient is closer to zero than the population coefficient, we would say that it is biased toward zero, or attenuated.

It is more difficult to evaluate the potential bias in a multiple regression, where the population relationship involves k variables and we include, for instance, only k − 1 of them. All of the OLS coefficients in the underspecified model will generally be biased in this circumstance, unless the omitted variable is uncorrelated with each included regressor: a very unlikely outcome. What we can take away as a general rule is the asymmetric nature of specification error: it is far more damaging to exclude a relevant variable than to include an irrelevant variable. When in doubt (and we almost always are in doubt as to the nature of the true relationship), we will always be better off erring on the side of caution, and including variables that we are not certain should be part of the explanation of y.

Variance of the OLS estimators

We first reiterate the assumption of homoskedasticity, in the context of the k-variable regression model:

Var(u | x1, x2, ..., xk) = σ²   (17)

If this assumption is satisfied, then the error variance is identical for all combinations of the explanatory variables. If it is violated, we say that the errors are heteroskedastic, and we must be concerned about our computation of the OLS estimates' variances. The OLS estimates are still unbiased in this case, but our estimates of their variances are not. Given this assumption, plus the four made earlier, we can derive the sampling variances, or precision, of the OLS slope estimators:

Var(b_j) = σ² / [ SST_j (1 − R_j²) ], j = 1, ..., k   (18)

where SST_j is the total
variation in x_j about its mean, and R_j² is the R² from an auxiliary regression of x_j on all the other x variables, including the constant term. We see immediately that this formula applies to simple regression as well, since the formula we derived for the slope estimator in that instance is identical, given that R_j² = 0 in that instance (there are no other x variables).

Given the population error variance σ², what will make a particular OLS slope estimate more precise? Its precision will be increased, i.e., its sampling variance will be smaller, the larger is the variation in the associated x variable. Its precision will be decreased, the larger the share of the variation in x_j that can be explained by the other variables in the regression. In the case of perfect collinearity, R_j² = 1, and the sampling variance goes to infinity. If R_j² is very small, then this variable makes a large marginal contribution to the equation, and we may calculate a relatively more precise estimate of its coefficient. If R_j² is quite large, the precision of the coefficient will be low, since it will be difficult to "partial out" the effect of variable j on y from the effects of the other explanatory variables with which it is highly correlated. However, we must hasten to add that the assumption that there is no perfect collinearity does not preclude R_j² from being close to unity; it only states that it is less than unity. The principle stated above when we discussed collinearity applies here: at the margin, each explanatory variable must add information that we do not already have, in whole or in large part, if that variable is to have a meaningful role in a regression model of y.

This formula for the sampling variance of an OLS coefficient also explains why we might not want to overspecify the model: if we include an irrelevant explanatory variable, the point estimates are unbiased, but their sampling variances will be larger than they would be in the absence of that variable, unless the irrelevant variable is uncorrelated with the relevant explanatory variables.
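The role of R_j² in formula (18) can be made concrete by computing the factor 1/(1 − R_j²), often called the variance inflation factor, directly. This is a sketch with simulated data and invented correlation values; with a single other regressor, R_j² is just the squared sample correlation between the two regressors.

```python
# Sketch of formula (18): the sampling variance of b_j scales with
# 1/(1 - R_j^2), so highly collinear regressors get imprecise estimates.
import numpy as np

def var_factor(x1, x2):
    """1/(1 - R_j^2) for x1; with one other regressor, R_j^2 = corr(x1, x2)^2."""
    r = np.corrcoef(x1, x2)[0, 1]
    return 1.0 / (1.0 - r**2)

rng = np.random.default_rng(1)
n = 10_000
x1 = rng.normal(size=n)

factors = []
for rho in (0.0, 0.9, 0.99):
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    factors.append(var_factor(x1, x2))
print([round(f, 1) for f in factors])   # inflation grows without bound as rho -> 1
```

At rho = 0 the factor is essentially 1 (no inflation); at rho = 0.9 it is roughly 1/(1 − 0.81) ≈ 5.3; at rho = 0.99 it is roughly 50, illustrating why near-collinear regressors produce very imprecise coefficient estimates even though OLS remains computable.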
How do we make (18) operational? As written, it cannot be computed, since it depends on the unknown population parameter σ². Just as in the case of simple regression, we must replace σ² with a consistent estimate:

s² = Σ_{i=1}^{n} û_i² / (n − k − 1) = SSR / (n − k − 1)   (19)

where the numerator is just SSR, and the denominator is the sample size less the number of estimated parameters: the constant and k slopes. In simple regression, we computed s² using a denominator of n − 2: intercept plus slope. Now we must account for the additional slope parameters. This also suggests that we cannot estimate a k-variable regression model without having a sample of size at least k + 1. Indeed, just as two points define a straight line, the degrees of freedom in simple regression will be positive iff n > 2. For multiple regression with k slopes and an intercept, we need n > k + 1. Of course, in practice, we would like to use a much larger sample than this in order to make inferences about the population.

The positive square root of s² is known as the standard error of regression, or SER. (Stata reports s on the regression output, labelled "Root MSE", or root mean squared error.) It is in the same units as the dependent variable, and is the numerator of our estimated standard errors of the OLS coefficients. The magnitude of the SER is often compared to the mean of the dependent variable to gauge the regression's ability to explain the data.

In the presence of heteroskedasticity, where the variance of the error process is not constant over the sample, the estimate of s² presented above will be biased. Likewise, the estimates of coefficients' standard errors will be biased, since they depend on s². If there is reason to worry about heteroskedasticity in a particular sample, we must work with a different approach to compute these measures.

Efficiency of OLS estimators

An important result, which underlies the widespread use of OLS regression, is the Gauss-Markov Theorem, describing the relative efficiency of the OLS estimators. Under the assumptions that we have
made above for multiple regression, and making no further distributional assumptions about the error process, we may show that:

Proposition 2 (Gauss-Markov) Among the class of linear unbiased estimators of the population regression function, OLS provides the best estimators, in terms of minimum sampling variance: OLS estimators are best linear unbiased estimators (BLUE).

This theorem only considers estimators that have these two properties of linearity and unbiasedness. Linearity means that the estimator (the rule for computing the estimates) can be written as a linear function of the data y, essentially as a weighted average of the y values. OLS clearly meets this requirement. Under the assumptions above, OLS estimators are also unbiased. Given those properties, the proof of the Gauss-Markov theorem demonstrates that the OLS estimators have the minimum sampling variance of any possible estimator: that is, they are the best (most precise) that could possibly be calculated. This theorem is not based on the assumption that, for instance, the u process is Normally distributed; only that it is independent of the x variables and homoskedastic.

Wooldridge, Introductory Econometrics, 3d ed.

Chapter 6: Multiple regression analysis: Further issues

What effects will the scale of the X and y variables have upon multiple regression? The coefficients' point estimates are ∂y/∂X_j, so they are in the scale of the data: for instance, dollars of wage per additional year of education. If we were to measure either y or X in different units, the magnitudes of these derivatives would change, but the overall fit of the regression equation would not. Regression is based on correlation, and any linear transformation leaves the correlation between two variables unchanged. The R², for instance, will be unaffected by the scaling of the data. The standard error of a coefficient estimate is in the same units as the point estimate, and both will change by the same factor if the data are scaled. Thus, each
coefficient's t-statistic will have the same value, with the same p-value, irrespective of scaling. The standard error of the regression (termed "Root MSE" by Stata) is in the units of the dependent variable. The ANOVA F, based on R², will be unchanged by scaling, as will be all F-statistics associated with hypothesis tests on the parameters. As an example, consider a regression of babies' birth weight, measured in pounds, on the number of cigarettes per day smoked by their mothers. This regression would have the same explanatory power if we measured birth weight in ounces or kilograms, or, alternatively, if we measured nicotine consumption by the number of packs per day rather than cigarettes per day.

A corollary to this result applies to a dependent variable measured in logarithmic form. Since the slope coefficient in this case is an elasticity or semi-elasticity, a change in the dependent variable's units of measurement does not affect the slope coefficient at all, since log(cy) = log c + log y, but rather just shows up in the intercept term.

Beta coefficients

In economics, we generally report the regression coefficients' point estimates when presenting regression results. Our coefficients often have natural units, and those units are meaningful. In other disciplines, many explanatory variables are indices (measures of self-esteem, or political freedom, etc.), and the associated regression coefficients' units are not well defined. To evaluate the relative importance of a number of explanatory variables, it is common to calculate so-called beta coefficients: standardized regression coefficients, from a regression of y* on X*, where the starred variables have been "z-transformed". This transformation (subtracting the mean and dividing by the sample standard deviation) generates variables with a mean of zero and a standard deviation of one. In a regression of standardized variables, the beta coefficient estimates ∂y*/∂X_j* express the effect of a one standard deviation change in X_j in terms of standard deviations of
y. The explanatory variable with the largest absolute beta coefficient thus has the biggest "bang for the buck" in terms of an effect on y. The intercept in such a regression is zero by construction. You need not perform this standardization in most regression programs to compute beta coefficients; for instance, in Stata, you may just use the beta option, e.g., regress lsalary years gamesyr scndbase, beta, which causes the beta coefficients to be printed (rather than the 95% confidence interval for each coefficient) on the right of the regression output.

Logarithmic functional forms

Many econometric models make use of variables measured in logarithms: sometimes the dependent variable, sometimes both dependent and independent variables. Using the "double-log" transformation of both y and X, we can turn a multiplicative relationship, such as a Cobb-Douglas production function, into a linear relation in the natural logs of output and the factors of production. The estimated coefficients are themselves elasticities: that is, ∂log y/∂log X_j, which have the units of percentage changes. The "single-log" transformation regresses log y on X, measured in natural units (alternatively, some columns of X might be in logs and some columns in levels). If we are interpreting the coefficient on a levels variable, it is ∂log y/∂X_j, or approximately the percentage change in y resulting from a one-unit change in X. We often use this sort of model to estimate an exponential trend, that is, a growth rate, since if the X variable is t, we have ∂log y/∂t, or an estimate of the growth rate of y. The interpretation of regression coefficients as percentage changes depends on the approximation log(1 + x) ≈ x for small x. If x is sizable, and we seek the effect of a discrete change in x, then we must take care with that approximation. The exact percentage change,

%Δy = 100 [exp(b_j ΔX_j) − 1]

will give us a more accurate prediction of the change in y. Why do so many econometric models utilize logs? For one thing, a model with a log dependent variable often more closely satisfies the assumptions we have made for the classical linear model.
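The approximation caveat above can be quantified in a few lines. This sketch uses an invented semi-elasticity coefficient b = 0.08: for a one-unit change in x the approximate percentage change 100·b·Δx and the exact change 100·[exp(b·Δx) − 1] nearly agree, but they diverge badly for larger discrete changes.

```python
# Sketch: approximate vs. exact percentage change in a log-y model.
# The coefficient b = 0.08 and the changes in x are invented for illustration.
import math

b = 0.08                                  # e.g., an 8% semi-elasticity
for dx in (1, 5, 10):
    approx = 100 * b * dx                 # the log(1+x) ~ x approximation
    exact = 100 * (math.exp(b * dx) - 1)  # the exact percentage change
    print(dx, round(approx, 1), round(exact, 1))
```

For dx = 1 the two differ by a fraction of a point, but for dx = 10 the approximation says 80% where the exact change is over 120%, which is why the exact formula should be used for sizable discrete changes.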
Most economic variables are constrained to be positive, and their empirical distributions may be quite non-normal (think of the income distribution). When logs are applied, the distributions are better behaved. Taking logs also reduces the extrema in the data, and curtails the effects of outliers. We often see economic variables measured in dollars in log form, while variables measured in units of time, or interest rates, are often left in levels. Variables which are themselves ratios are often left in that form in empirical work; although they could be expressed in logs, something like an unemployment rate already has a percentage interpretation. We must be careful when discussing ratios to distinguish between a 0.01 change and a one-unit change. If the unemployment rate is measured as a decimal, e.g., 0.05 or 0.06, we might be concerned with the effect of a 0.01 change (a one point increase in unemployment), which will be 1/100 of the regression coefficient's magnitude.

Polynomial functional forms

We often make use of polynomial functional forms, or their simplest form, the quadratic, to represent a relationship that is not likely to be linear. If y is regressed on x and x², it is important to note that we must calculate ∂y/∂x taking account of this form: that is, we cannot consider the effect of changing x while holding x² constant. Thus ∂y/∂x = b1 + 2 b2 x, and the slope in x-y space will depend upon the level of x at which we evaluate the derivative. In many applications, b1 > 0 while b2 < 0, so that while x is increasing, y is increasing at a decreasing rate, or levelling off. Naturally, for sufficiently large x, y will take on smaller values, and in the limit will become negative; but in the range of the data, y will often appear to be a concave function of x. We could also have the opposite sign pattern, b1 < 0 while b2 > 0, which will lead to a U-shaped relation in the x-y plane, with y decreasing, reaching a minimum, and then increasing, somewhat like an average cost curve.
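The marginal-effect formula for the quadratic can be evaluated directly. The sketch below uses invented coefficients b1 = 4, b2 = −0.5 (the concave case described above): the slope ∂y/∂x = b1 + 2·b2·x falls as x rises, crossing zero at the turning point x* = −b1/(2·b2), where the fitted parabola levels off.

```python
# Sketch of the quadratic form y = b0 + b1*x + b2*x^2 with b1 > 0 > b2.
# The marginal effect dy/dx = b1 + 2*b2*x depends on x; coefficients invented.
b0, b1, b2 = 1.0, 4.0, -0.5

def marginal_effect(x):
    return b1 + 2.0 * b2 * x

x_star = -b1 / (2.0 * b2)                          # turning point of the parabola
print(x_star)                                      # 4.0: y levels off here
print(marginal_effect(2.0), marginal_effect(6.0))  # 2.0 -2.0: rising, then falling
```

The same arithmetic shows why one cannot report a single "slope" for a quadratic specification: the effect of x on y must be quoted at particular values of x, often the sample mean.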
Higher-order polynomial terms may also be used, but they are not as commonly found in empirical work.

Interaction terms

An important technique that allows for nonlinearities in an econometric model is the use of interaction terms: the product of explanatory variables. For instance, we might model the house price as a function of bdrms, sqft, and sqft·bdrms, which would make the partial derivatives with respect to each factor depend upon the other. For instance,

∂price/∂bdrms = b_bdrms + b_{sqft·bdrms} sqft

so that the effect of an additional bedroom on the price of the house also depends on the size of the house. Likewise, the effect of additional square footage (e.g., an addition) depends on the number of bedrooms. Since a model with no interaction terms is a special case of this model, we may readily test for the presence of these nonlinearities by examining the significance of the interaction term's estimated coefficient. If it is significant, the interaction term is needed to capture the relationship.

Adjusted R²

In presenting multiple regression, we established that R² cannot decrease when additional explanatory variables are added to the model, even if they have no significant effect on y. A "longer" model will thus always appear to be superior to a "shorter" model, even though the latter is a more parsimonious representation of the relationship. How can we deal with this in comparing alternative models, some of which may have many more explanatory factors than others? We can express the standard R² as

R² = 1 − SSR/SST = 1 − (SSR/n)/(SST/n)

Since all models with the same dependent variable will have the same SST, and SSR cannot increase with additional variables, R² is a non-decreasing function of k. An alternative measure, computed by most econometrics packages, is the so-called "R-bar-squared" or adjusted R²:

R̄² = 1 − [SSR/(n − (k+1))] / [SST/(n − 1)]

where the numerator and denominator of R² are divided by their respective degrees of freedom, just as they are in computing the mean squared measures in the ANOVA F table.
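The adjusted R² formula can be implemented in a few lines. In the sketch below the SSR and SST figures are invented: adding ten junk regressors that shave only a little off SSR nudges R² up, but pushes R̄² down sharply because of the lost degrees of freedom.

```python
# Sketch of R^2 vs. adjusted R^2; SSR/SST values are invented for illustration.
def r2(ssr, sst):
    return 1.0 - ssr / sst

def r2_adj(ssr, sst, n, k):
    # R-bar-squared: 1 - [SSR/(n-(k+1))] / [SST/(n-1)]
    return 1.0 - (ssr / (n - k - 1)) / (sst / (n - 1))

n, sst = 30, 100.0
print(round(r2(40.0, sst), 2), round(r2_adj(40.0, sst, n, k=2), 3))
# Add 10 junk regressors that barely reduce SSR: R^2 creeps up, R-bar falls
print(round(r2(39.0, sst), 2), round(r2_adj(39.0, sst, n, k=12), 3))
```

Here R² rises from 0.60 to 0.61 while R̄² falls from about 0.57 to about 0.33, illustrating the penalty the adjusted measure imposes on uninformative regressors.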
For a given dependent variable, the denominator does not change, but the numerator, which is s², may rise or fall as k is increased. An additional regressor uses one more degree of freedom, so (n − k − 1) declines, and SSR declines as well (or remains unchanged). If SSR declines by a larger percentage than the degrees of freedom, then R̄² rises, and vice versa. Adding a number of regressors with little explanatory power will increase R², but will decrease R̄², which may even become negative. R̄² does not have the interpretation of a squared correlation coefficient, nor of a "batting average" for the model. But it may be used to compare different models of the same dependent variable. Note, however, that we cannot make statistical judgments based on this measure: for instance, we can show that R̄² will rise if we add one variable to the model with a |t| > 1, but a t-statistic of unity is never significant. Thus an increase in R̄² cannot be taken as meaningful (the coefficients must be examined for significance); but conversely, if a longer model has a lower R̄², its usefulness is cast in doubt. R̄² is also useful in that it can be used to compare non-nested models: i.e., two models, neither of which is a proper subset of the other. A "subset F-test" cannot be used to compare these models, since there is no hypothesis under which the one model emerges from restrictions on the other, and vice versa. R̄² may be used to make informal comparisons of non-nested models, as long as they have the same dependent variable. Stata presents R̄² as the "Adj R-squared" on the regression output.

Prediction and residual analysis

The predictions of a multiple regression are simply the evaluation of the regression line for various values of the explanatory variables. We can always calculate ŷ for each observation used in the regression; these are known as in-sample, or ex post, predictions. Since the estimated regression equation is a function, we can evaluate the function for any set of values
X1⁰, X2⁰, ..., Xk⁰, and form the associated point estimate ŷ⁰, which might be termed an out-of-sample, or ex ante, forecast of the regression equation. How reliable are the forecasts of the equation? Since the predicted values are linear combinations of the b values, we can calculate an interval estimate for the predicted value. This is the confidence interval for E(y⁰): that is, the average value that would be predicted by the model for a specific set of X values. This may be calculated after any regression in Stata using the predict command's stdp option; that is, predict stdpred, stdp will save a variable named stdpred containing the standard error of prediction. The 95% confidence interval will then be, for large samples,

(ŷ − 1.96 stdpred, ŷ + 1.96 stdpred)

An illustration of this confidence interval for a simple regression is given here. Note that the confidence intervals are parabolic, with the minimum-width interval at X̄, widening symmetrically as we move farther from X̄. For a multiple regression, the confidence interval will be narrowest at the multivariate point of means of the X's.

[Figure: prediction interval for E(y); Displacement (cu. in.) against Weight (lbs.), with fitted values]

However, if we want a confidence interval for a specific value of y, rather than for the mean of y, we must also take into account the fact that a predicted value of y will contain an error, u. On average, that error is assumed to be zero: that is, E(u) = 0. For a specific value of y, though, there will be an error u_i; we do not know its magnitude, but we have estimated that it is drawn from a distribution with standard error s. Thus the standard error of forecast will include this additional source of uncertainty, and confidence intervals formed for specific values of y will be wider than those associated with predictions of the mean y. This standard error of forecast series can be calculated, after a regression has been estimated, with the predict command, specifying the stdf option. If the variable stdfc is created, the 95%
confidence interval will then be, for large samples,

(ŷ − 1.96 stdfc, ŷ + 1.96 stdfc)

An illustration of this confidence interval for a simple regression is given here, juxtaposed with that shown earlier for the standard error of prediction. As you can see, the added uncertainty associated with a draw from the error distribution makes the prediction interval much wider.

[Figure: prediction intervals for E(y) and for a specific value of y; Displacement (cu. in.) against Weight (lbs.), with fitted values]

Residual analysis

The OLS residuals are often calculated and analyzed after estimating a regression. In a purely technical sense, they may be used to test the validity of the several assumptions that underlie the application of OLS. When plotted, do they appear systematic? Does their dispersion appear to be roughly constant, or is it larger for some X values than others? Evidence of systematic behavior in the magnitude of the OLS residuals, or in their dispersion, would cast doubt on the OLS results. A number of formal tests, as we will discuss, are based on the residuals, and many graphical techniques for examining their randomness (or lack thereof) are available; in Stata, help regression diagnostics discusses many of them.

The residuals are often used to test specific hypotheses about the underlying relationship. For instance, we could fit a regression of the salaries of employees of XYZ Corp on a number of factors which should relate to their salary level: experience, education, specific qualifications, job level, and so on. Say that such a regression was run, and the residuals retrieved. If we now sort the residuals by factors not used to explain salary levels, such as the employee's gender or race, what will we find? Under nondiscrimination laws, there should be no systematic reason for women to be paid more or less than men, or blacks more or less than whites, after we have controlled for these factors. If there are significant differences between the average residual for, e.g., blacks and whites, then we
would have evidence of statistical discrimination. Regression equations have often played an important role in investigating charges of discrimination in the workplace. Likewise, most towns' and cities' assessments of real estate, used to set the tax levy on that property, are performed by regression, in which the explanatory factors include the characteristics of a house and its neighborhood. Since many houses will not have been sold in the recent past, the regression must be run over a sample of houses that have been sold, and out-of-sample predictions used to estimate the appropriate price for a house that has not been sold recently, based on its attributes and on trends in real estate transactions prices in its neighborhood. A mechanical evaluation of the fair market value of the house may be subject to error, but previous methods used, in which knowledgeable individuals attached valuations based on their understanding of the local real estate market, are more subjective.
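The salary-residual check described above can be sketched in a few lines. The data below are simulated (a hypothetical XYZ Corp with salary driven only by experience, so no discrimination is built in): after fitting the regression, residuals are averaged within each group, and both group means come out near zero, as the nondiscrimination benchmark predicts.

```python
# Sketch of residual analysis by group: regress salary on its legitimate
# determinants, then compare mean residuals across a group indicator that
# was NOT used in the regression. All data here are simulated.
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
exper = rng.uniform(0, 30, n)
group = rng.integers(0, 2, n)                        # 0/1 indicator, e.g. by gender
salary = 20.0 + 1.5 * exper + rng.normal(0, 5, n)    # group plays no role here

b1, b0 = np.polyfit(exper, salary, 1)
resid = salary - (b0 + b1 * exper)

# With no discrimination built in, group mean residuals should be near zero;
# a large, significant gap would instead suggest statistical discrimination.
for g in (0, 1):
    print(round(resid[group == g].mean(), 2))
```

In practice the gap between group mean residuals would be tested formally (for instance with a t-test, or by adding the group indicator to the regression) rather than eyeballed.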