This 95-page set of class notes was uploaded by Jayda Beahan Jr. on Saturday, October 3, 2015. The class notes belong to EC 228 at Boston College, taught by Staff in Fall. Since its upload, it has received 44 views. For similar materials see /class/218056/ec-228-boston-college in Economics at Boston College.
Wooldridge, Introductory Econometrics, 3d ed.

Chapter 3: Multiple regression analysis: Estimation

In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility that there are additional explanatory factors that have a systematic effect on the dependent variable. The simplest extension is the three-variable model, in which a second explanatory variable is added:

y = β₀ + β₁x₁ + β₂x₂ + u    (1)

where each of the slope coefficients is now a partial derivative of y with respect to the x variable that it multiplies: that is, holding x₂ fixed, β₁ = ∂y/∂x₁. This extension also allows us to consider nonlinear relationships, such as a polynomial in z, where x₁ = z and x₂ = z². Then the regression is linear in x₁ and x₂, but nonlinear in z: ∂y/∂z = β₁ + 2β₂z.

The key assumption for this model, analogous to that which we specified for the simple regression model, involves the independence of the error process u and both regressors, or explanatory variables:

E(u | x₁, x₂) = 0    (2)

This assumption of a zero conditional mean for the error process implies that it does not systematically vary with the x's, nor with any linear combination of the x's; u is independent, in the statistical sense, from the distributions of the x's.

The model may now be generalized to the case of k regressors:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ + u    (3)

where the β coefficients have the same interpretation: each is the partial derivative of y with respect to that x, holding all other x's constant (ceteris paribus), and u is that nonsystematic part of y not linearly related to any of the x's. The dependent variable y is taken to be linearly related to the x's, which may bear any relation to each other (e.g. polynomials or other transformations) as long as there are no exact linear dependencies among the regressors. That is, no x variable can be an exact linear transformation of another, or the regression estimates cannot be calculated. The independence assumption now becomes:

E(u | x₁, x₂, ..., xₖ) = 0    (4)

Mechanics and interpretation of OLS

Consider
first the three-variable model given above in (1). The estimated OLS equation contains the parameters of interest:

ŷ = b₀ + b₁x₁ + b₂x₂    (5)

and we may define the ordinary least squares criterion in terms of the OLS residuals, calculated from a sample of size n, from this expression:

min S = Σ (yᵢ − b₀ − b₁xᵢ₁ − b₂xᵢ₂)²    (6)

where the minimization of this expression is performed with respect to each of the three parameters b₀, b₁, b₂. In the case of k regressors, these expressions include terms in bₖ, and the minimization is performed with respect to the (k+1) parameters b₀, b₁, b₂, ..., bₖ. For this to be feasible, n > (k+1): that is, we must have a sample larger than the number of parameters to be estimated from that sample.

The minimization is carried out by differentiating the scalar S with respect to each of the b's in turn, and setting the resulting first order condition to zero. This gives rise to (k+1) simultaneous equations in (k+1) unknowns, the regression parameters, which are known as the least squares normal equations. The normal equations are expressions in the sums of squares and cross products of y and the regressors, including a first "regressor" which is a column of 1's multiplying the constant term. For the three-variable regression model, we can write out the normal equations as:

Σy = n·b₀ + b₁Σx₁ + b₂Σx₂
Σx₁y = b₀Σx₁ + b₁Σx₁² + b₂Σx₁x₂    (7)
Σx₂y = b₀Σx₂ + b₁Σx₁x₂ + b₂Σx₂²

Just as in the two-variable case, the first normal equation can be interpreted as stating that the regression surface (in 3-space) passes through the multivariate point of means (x̄₁, x̄₂, ȳ). These three equations may be uniquely solved, by normal algebraic techniques or linear algebra, for the estimated least squares parameters. This extends to the case of k regressors and (k+1) regression parameters. In each case, the regression coefficients are considered in the ceteris paribus sense: each coefficient measures the partial effect of a unit change in its variable (or regressor), holding all other regressors fixed.
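The normal equations above can be solved directly with linear algebra. A minimal sketch in Python using numpy; the simulated data and coefficient values here are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Simulate a three-variable model: y = 1 + 2*x1 - 0.5*x2 + u
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + u

# Build the regressor matrix with a column of ones for the constant term
X = np.column_stack([np.ones(n), x1, x2])

# The least squares normal equations are (X'X) b = X'y;
# solving them yields the OLS estimates b0, b1, b2
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # roughly [1, 2, -0.5]
```

In matrix form, the three normal equations in (7) are exactly the system (X′X)b = X′y, so one call to a linear solver recovers all three parameters at once.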
If a variable is a component of more than one regressor, as in a polynomial relationship as discussed above, the total effect of a change in that variable is additive.

Fitted values, residuals, and their properties

Just as in simple regression, we may calculate fitted values, or predicted values, after estimating a multiple regression. For observation i, the fitted value is

ŷᵢ = b₀ + b₁xᵢ₁ + b₂xᵢ₂ + ... + bₖxᵢₖ    (8)

and the residual is the difference between the actual value of y and the fitted value:

eᵢ = yᵢ − ŷᵢ    (9)

As with simple regression, the sum of the residuals is zero; the residuals have, by construction, zero covariance with each of the x variables, and thus zero covariance with ŷ; and, since the average residual is zero, the regression surface passes through the multivariate point of means (x̄₁, x̄₂, ..., x̄ₖ, ȳ).

There are two instances where the simple regression of y on x₁ will yield the same coefficient as the multiple regression of y on x₁ and x₂, with respect to x₁. In general, the simple regression coefficient will not equal the multiple regression coefficient, since the simple regression ignores the effect of x₂ (and considers that it can be viewed as nonsystematic, captured in the error u). When will the two coefficients be equal? First, when the coefficient of x₂ is truly zero: that is, when x₂ really does not belong in the model. Second, when x₁ and x₂ are uncorrelated in the sample. This is likely to be quite rare in actual data. However, these two cases suggest when the two coefficients will be similar: when x₂ is relatively unimportant in explaining y, or when it is very loosely related to x₁.

We can define the same three sums of squares, SST, SSE, SSR, as in simple regression, and R² is still the ratio of the explained sum of squares (SSE) to the total sum of squares (SST). It is no longer a simple correlation (e.g. r_yx) squared, but it still has the interpretation of a squared simple correlation coefficient: the correlation between y and ŷ, r_yŷ.
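These properties of the residuals, and the interpretation of R² as the squared correlation between y and ŷ, are easy to verify numerically. A sketch, with simulated data as an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 0.5 + 1.5 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

yhat = X @ b          # fitted values
e = y - yhat          # residuals

# Residuals sum to zero and are orthogonal to each regressor (and to yhat)
print(e.sum(), e @ x1, e @ x2, e @ yhat)  # all approximately zero

# R^2 as SSE/SST equals the squared correlation between y and yhat
sst = ((y - y.mean()) ** 2).sum()
sse = ((yhat - y.mean()) ** 2).sum()
r2 = sse / sst
print(np.isclose(r2, np.corrcoef(y, yhat)[0, 1] ** 2))  # True
```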
A very important principle is that R² never decreases when an explanatory variable is added to a regression: no matter how irrelevant that variable may be, the R² of the expanded regression will be no less than that of the original regression. Thus, the regression R² may be arbitrarily increased by adding variables (even unimportant variables), and we should not be impressed by a high value of R² in a model with a long list of explanatory variables.

Just as with simple regression, it is possible to fit a model through the origin, suppressing the constant term. It is important to note that many of the properties we have discussed no longer hold in that case: for instance, the least squares residuals (the eᵢ's) no longer have a zero sample average, and the R² from such an equation can actually be negative, that is, the equation does worse than a "model" which specifies that ŷᵢ = ȳ for all i. If the population intercept β₀ differs from zero, the slope coefficients computed in a regression through the origin will be biased. Therefore, we often will include an intercept, and let the data determine whether it should be zero.

Expected value of the OLS estimators

We now discuss the statistical properties of the OLS estimators of the parameters in the population regression function. The population model is taken to be (3). We assume that we have a random sample of size n on the variables of the model. The multivariate analogue to our assumption about the error process is now:

E(u | x₁, x₂, ..., xₖ) = 0    (10)

so that we consider the error process to be independent of each of the explanatory variables' distributions. This assumption would not hold if we misspecified the model: for instance, if we ran a simple regression with inc as the explanatory variable, but the population model also contained inc². Since inc and inc² will have a positive correlation, the simple regression's parameter estimates will be biased. This bias will also appear if there is a separate, important factor that should be included in the model; if that factor is correlated with the included regressors, their coefficients will be biased.
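The principle stated earlier, that R² never falls when a regressor is added, can be demonstrated by appending a purely irrelevant regressor. A sketch; the data-generating process is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def r_squared(X, y):
    """R^2 from an OLS regression of y on X (X includes the constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

X_small = np.column_stack([np.ones(n), x1])
junk = rng.normal(size=n)                    # irrelevant regressor
X_big = np.column_stack([X_small, junk])

r2_small, r2_big = r_squared(X_small, y), r_squared(X_big, y)
print(r2_big >= r2_small)  # True: R^2 never falls when a regressor is added
```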
In the context of multiple regression, with several independent variables, we must make an additional assumption about their measured values:

Proposition 1. In the sample, none of the independent variables x may be expressed as an exact linear relation of the others (including a vector of 1's).

Every multiple regression that includes a constant term can be considered as having a variable x₀ = 1 for every observation. This proposition states, first, that each of the other explanatory variables must have nonzero sample variance: that is, it may not be a constant in the sample. Second, the proposition states that there is no "perfect collinearity," or multicollinearity, in the sample. If we could express one x as a linear combination of the other x variables, this assumption would be violated. If we have perfect collinearity in the regressor matrix, the OLS estimates cannot be computed; mathematically, they do not exist. A trivial example of perfect collinearity would be the inclusion of the same variable twice, measured in different units or via a linear transformation, such as temperature in degrees F versus degrees C.

The key concept: each regressor we add to a multiple regression must contain information at the margin. It must tell us something about y that we do not already know. For instance, suppose we consider x₁ = proportion of football games won, x₂ = proportion of games lost, and x₃ = proportion of games tied, and we try to use all three as explanatory variables to model alumni donations to the athletics program. We find that there is perfect collinearity, since for every college in the sample the three variables sum to one, by construction. There is no information in, e.g., x₃ once we know the other two, so including it in a regression with the other two makes no sense (and renders that regression uncomputable). We can leave any one of the three variables out of the regression; it does not matter which one. Note that this proposition is not an assumption about the population model; it is an implication of the sample data we have to work with.
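The football example can be checked numerically: with all three proportions plus a constant, the regressor matrix loses full column rank, so the normal equations have no unique solution. A sketch; the simulated shares are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50

# Proportions of games won, lost and tied sum to one by construction
won = rng.uniform(0.2, 0.7, size=n)
tied = rng.uniform(0.0, 0.2, size=n)
lost = 1.0 - won - tied

X = np.column_stack([np.ones(n), won, lost, tied])

# Four columns, but rank only 3: won + lost + tied equals the constant column
rank = np.linalg.matrix_rank(X)
print(rank)  # 3

# X'X is (numerically) singular, so the OLS estimates do not exist
print(np.linalg.cond(X.T @ X) > 1e12)  # True
```

Dropping any one of the three proportion columns restores full rank, which mirrors the statement in the text that it does not matter which variable we leave out.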
Note also that this only applies to linear relations among the explanatory variables: a variable and its square, for instance, are not linearly related, so we may include both in a regression to capture a nonlinear relation between y and x.

Given the four assumptions (that of the population model, the random sample, the zero conditional mean of the u process, and the absence of perfect collinearity), we can demonstrate that the OLS estimators of the population parameters are unbiased:

E(bⱼ) = βⱼ,  j = 0, ..., k    (11)

What happens if we misspecify the model by including irrelevant explanatory variables: x variables that, unbeknownst to us, are not in the population model? Fortunately, this does not damage the estimates. The regression will still yield unbiased estimates of all of the coefficients, including unbiased estimates of these variables' coefficients, which are zero in the population. The model may be improved by removing such variables, since including them in the regression consumes degrees of freedom (and reduces the precision of the estimates); but the effect of "overspecifying" the model is rather benign. The same applies to overspecifying a polynomial order: including quadratic and cubic terms when only the quadratic term is needed will be harmless, and you will find that the cubic term's coefficient is far from significant.

However, the opposite case, where we underspecify the model by mistakenly excluding a relevant explanatory variable, is much more serious. Let us formally consider the direction and size of bias in this case. Assume that the population model is

y = β₀ + β₁x₁ + β₂x₂ + u    (12)

but we do not recognize the importance of x₂, and mistakenly consider the relationship

y = β₀ + β₁x₁ + u    (13)

to be fully specified. What are the consequences of estimating the latter relationship? We can show that in this case

E(b₁) = β₁ + β₂ · [Σ (xᵢ₁ − x̄₁) xᵢ₂] / [Σ (xᵢ₁ − x̄₁)²]    (14)

so that the OLS coefficient b₁ will be biased (not equal to its population value of β₁, even in an expected sense) in the presence of the second term. That term will be nonzero when β₂ is nonzero (which it is, by assumption) and when the
fraction is nonzero. But the fraction is merely a simple regression coefficient: the slope in the auxiliary regression of x₂ on x₁. If the regressors are correlated with one another, that regression coefficient will be nonzero, and its magnitude will be related to the strength of the correlation (and the units of the variables). Say that the auxiliary regression is

x₂ = d₀ + d₁x₁ + v    (15)

with d₁ > 0, so that x₁ and x₂ are positively correlated (e.g. as income and wealth would be in a sample of household data). Then we can write the bias as

E(b₁) − β₁ = β₂d₁    (16)

and its sign and magnitude will depend on both the relation between y and x₂ and the interrelation among the explanatory variables. If there is no such relationship (if x₁ and x₂ are uncorrelated in the sample), then b₁ is unbiased, since in that special case multiple regression reverts to simple regression. In all other cases, though, there will be bias in the estimation of the underspecified model. If the left side of (16) is positive, we say that b₁ has an upward bias: the OLS value will be too large. If it were negative, we would speak of a downward bias. If the OLS coefficient is closer to zero than the population coefficient, we would say that it is biased toward zero, or attenuated.

It is more difficult to evaluate the potential bias in a multiple regression, where the population relationship involves k variables and we include, for instance, (k−1) of them. All of the OLS coefficients in the underspecified model will generally be biased in this circumstance, unless the omitted variable is uncorrelated with each included regressor (a very unlikely outcome). What we can take away as a general rule is the asymmetric nature of specification error: it is far more damaging to exclude a relevant variable than to include an irrelevant variable.
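The bias formula (16) can be checked by simulation: in the short regression of y on x₁ alone, the slope estimate centers on β₁ + β₂d₁ rather than β₁. A sketch; all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta1, beta2 = 1.0, 2.0

# x2 is positively correlated with x1: x2 = d0 + d1*x1 + noise, with d1 = 0.5
x1 = rng.normal(size=n)
d1 = 0.5
x2 = 0.3 + d1 * x1 + rng.normal(size=n)
y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# "Short" regression of y on x1 alone, omitting the relevant x2
X_short = np.column_stack([np.ones(n), x1])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

# The slope is biased upward by roughly beta2 * d1 = 1.0
print(b_short[1])          # close to beta1 + beta2*d1 = 2.0
print(b_short[1] - beta1)  # upward bias of about 1.0
```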
When in doubt (and we almost always are in doubt as to the nature of the true relationship), we will always be better off erring on the side of caution, and including variables that we are not certain should be part of the explanation of y.

Variance of the OLS estimators

We first reiterate the assumption of homoskedasticity, in the context of the k-variable regression model:

Var(u | x₁, x₂, ..., xₖ) = σ²    (17)

If this assumption is satisfied, then the error variance is identical for all combinations of the explanatory variables. If it is violated, we say that the errors are heteroskedastic, and we must be concerned about our computation of the OLS estimates' variances. The OLS estimates are still unbiased in this case, but our estimates of their variances are not.

Given this assumption (plus the four made earlier), we can derive the sampling variances, or precision, of the OLS slope estimators:

Var(bⱼ) = σ² / [SSTⱼ (1 − Rⱼ²)],  j = 1, ..., k    (18)

where SSTⱼ is the total variation in xⱼ about its mean, and Rⱼ² is the R² from an auxiliary regression of xⱼ on all other x variables (including the constant term). We see immediately that this formula applies to simple regression, since the formula we derived for the slope estimator in that instance is identical, given that Rⱼ² = 0 there (there are no other x variables).

Given the population error variance σ², what will make a particular OLS slope estimate more precise? Its precision will be increased (i.e. its sampling variance will be smaller) the larger the variation in the associated x variable. Its precision will be decreased the larger the amount of variation in xⱼ that can be explained by the other variables in the regression. In the case of perfect collinearity, Rⱼ² = 1, and the sampling variance goes to infinity. If Rⱼ² is very small, then this variable makes a large marginal contribution to the equation, and we may calculate a relatively more precise estimate of its coefficient. If Rⱼ² is quite large, the precision of the coefficient will be low, since it will be difficult to "partial out" the effect of variable j on y from the effects of the other explanatory variables with which it is highly correlated. However, we must hasten to add that the assumption that there is no perfect collinearity does not preclude Rⱼ²
from being close to unity; it only states that it is less than unity. The principle stated above, when we discussed collinearity, applies here as well: at the margin, each explanatory variable must add information that we do not already have, in whole or in large part, if that variable is to have a meaningful role in a regression model of y. This formula for the sampling variance of an OLS coefficient also explains why we might not want to overspecify the model: if we include an irrelevant explanatory variable, the point estimates are unbiased, but their sampling variances will be larger than they would be in the absence of that variable (unless the irrelevant variable is uncorrelated with the relevant explanatory variables).

How do we make (18) operational? As written, it cannot be computed, since it depends on the unknown population parameter σ². Just as in the case of simple regression, we must replace σ² with a consistent estimate:

s² = Σ eᵢ² / (n − (k + 1))    (19)

where the numerator is just SSR, and the denominator is the sample size less the number of estimated parameters: the constant and k slopes. In simple regression, we computed s² using a denominator of (n − 2): intercept plus slope. Now, we must account for the additional slope parameters. This also suggests that we cannot estimate a k-variable regression model without having a sample of size greater than (k+1). Indeed, just as two points define a straight line, the degrees of freedom in simple regression will be positive iff n > 2. For multiple regression with k slopes and an intercept, n > (k+1). Of course, in practice, we would like to use a much larger sample than this in order to make inferences about the population.

The positive square root of s² is known as the standard error of regression, or SER. (Stata reports s on the regression output, labelled "Root MSE": root mean squared error.) It is in the same units as the dependent variable, and it is the numerator of our estimated standard errors of the OLS coefficients.
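Formula (18), with σ² replaced by s² from (19), can be verified against the equivalent matrix expression s²(X′X)⁻¹ for the coefficient variances. A sketch; the simulated data are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 200, 2
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)      # correlated regressors
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = (e @ e) / (n - (k + 1))             # s^2 = SSR / (n - k - 1)

# Var(b1) via the partialling-out formula: s^2 / (SST_1 * (1 - R_1^2))
sst1 = ((x1 - x1.mean()) ** 2).sum()
Z = np.column_stack([np.ones(n), x2])    # auxiliary regression of x1 on x2
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
e_aux = x1 - Z @ g
r2_1 = 1.0 - (e_aux @ e_aux) / sst1
var_b1 = s2 / (sst1 * (1.0 - r2_1))

# The same quantity from the matrix formula s^2 * (X'X)^{-1}
var_matrix = s2 * np.linalg.inv(X.T @ X)
print(np.isclose(var_b1, var_matrix[1, 1]))  # True
```

The agreement is exact (up to rounding), since SSTⱼ(1 − Rⱼ²) is precisely the residual sum of squares from the auxiliary regression, whose reciprocal is the corresponding diagonal element of (X′X)⁻¹.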
The magnitude of the SER is often compared to the mean of the dependent variable to gauge the regression's ability to "explain" the data. In the presence of heteroskedasticity, where the variance of the error process is not constant over the sample, the estimate of s² presented above will be biased. Likewise, the estimates of coefficients' standard errors will be biased, since they depend on s². If there is reason to worry about heteroskedasticity in a particular sample, we must work with a different approach to compute these measures.

Efficiency of OLS estimators

An important result, which underlies the widespread use of OLS regression, is the Gauss-Markov theorem, describing the relative efficiency of the OLS estimators. Under the assumptions that we have made above for multiple regression, and making no further distributional assumptions about the error process, we may show that:

Proposition 2 (Gauss-Markov). Among the class of linear, unbiased estimators of the population regression function, OLS provides the best estimators, in terms of minimum sampling variance: OLS estimators are best linear unbiased estimators (BLUE).

This theorem only considers estimators that have these two properties of linearity and unbiasedness. Linearity means that the estimator (the rule for computing the estimates) can be written as a linear function of the data y: essentially, as a weighted average of the y values. OLS clearly meets this requirement. Under the assumptions above, OLS estimators are also unbiased. Given those properties, the proof of the Gauss-Markov theorem demonstrates that the OLS estimators have the minimum sampling variance of any such estimator: that is, they are the "best" (most precise) that could possibly be calculated. This theorem is not based on the assumption that, for instance, the u process is Normally distributed; only that it is independent of the x variables and homoskedastic (that is, that it is i.i.d.).

Wooldridge, Introductory Econometrics, 3d ed.

Chapter 16: Simultaneous equations models

An obvious reason for the endogeneity of explanatory variables in a regression model is simultaneity: that is, one or more of the explanatory variables are jointly determined with the dependent variable. Models of this sort are known as simultaneous equations models (SEMs), and they are widely utilized in both applied microeconomics and macroeconomics. Each equation in a SEM should be a behavioral equation, which describes how one or more economic agents will react to shocks, or shifts, in the exogenous explanatory variables, ceteris paribus. The simultaneously determined variables often have an equilibrium interpretation, and we consider that these variables are only observed when the underlying model is in equilibrium. For instance, a demand curve, relating the quantity demanded to the price of a good (as well as income, the prices of substitute commodities, etc.), conceptually would express that quantity for a range of prices. But the only price-quantity pair that we observe is that resulting from market clearing, where the quantities supplied and demanded were matched and an equilibrium price was struck.

In the context of labor supply, we might relate aggregate hours to the average wage and additional explanatory factors:

hᵢ = β₀ + β₁wᵢ + β₂z₁ᵢ + uᵢ    (1)

where the unit of observation might be the county. This is a structural equation, or behavioral equation, relating labor supply to its causal factors: that is, it reflects the structure of the supply side of the labor market. This equation resembles many that we have considered earlier, and we might wonder why there would be any difficulty in estimating it. But if the data relate to an aggregate, such as the hours worked at the county level in response to the average wage in the county, this equation poses problems that would not arise if, for instance, the unit of observation was the individual, derived from a survey. Although we can assume that the individual is a price- (or wage-) taker, we cannot assume that the average level of wages is exogenous to the labor market in Suffolk County. Rather, we
must consider that it is determined within the market, affected by broader economic conditions. We might consider that the z₁ variable expresses wage levels in other areas, which would, cet. par., have an effect on the supply of labor in Suffolk County: higher wages in Middlesex County would lead, cet. par., to a reduction in labor supply in the Suffolk County labor market.

To complete the model, we must add a specification of labor demand:

hᵢ = γ₀ + γ₁wᵢ + γ₂z₂ᵢ + vᵢ    (2)

where we model the quantity demanded of labor as a function of the average wage and additional factors that might shift the demand curve. Since the demand for labor is a "derived demand," dependent on the cost of other factors of production, we might include some measure of factor cost (e.g. the cost of capital) as this equation's z variable. In this case, we would expect that a higher cost of capital would trigger substitution of labor for capital at every level of the wage, so that γ₂ > 0. Note that the supply equation represents the behavior of workers, in the aggregate, while the demand equation represents the behavior of employers, in the aggregate. In equilibrium, we would equate these two equations, and expect that, at some level of equilibrium labor utilization and average wage, the labor market is equilibrated. These two equations then constitute a simultaneous equations model (SEM) of the labor market.

Neither of these equations may be consistently estimated via OLS, since the wage variable in each equation is correlated with the respective error term. How do we know this? Because these two equations can be solved and rewritten as two reduced form equations in the endogenous variables hᵢ and wᵢ. Each of those variables will depend on the exogenous variables in the entire system, z₁ and z₂, as well as the structural errors uᵢ and vᵢ. In general, any shock to either labor demand or supply will affect both the equilibrium quantity and the price (wage). Even if we rewrote one of these equations to place the wage variable on the left-hand
side, this problem would persist: both endogenous variables in the system are jointly determined by the exogenous variables and the structural shocks. Another implication of this structure is that we must have separate explanatory factors in the two equations. If z₁ = z₂, for instance, we would not be able to solve this system and uniquely identify its structural parameters. There must be factors unique to each structural equation that, for instance, shift the supply curve without shifting the demand curve.

The implication here is that even if we only care about one of these structural equations (for instance, we are tasked with modelling labor supply, and have no interest in working with the demand side of the market), we must be able to specify the other structural equations of the model. We need not estimate them, but we must be able to determine what measures they would contain. For instance, consider estimating the relationship between the murder rate, the number of police, and wealth for a number of cities. We might expect that both of those factors would reduce the murder rate, cet. par.: more police are available to apprehend murderers (and, perhaps, prevent murders), while we might expect that lower-income cities might have greater unrest and crime. But can we reasonably assume that the number of police per capita is exogenous to the murder rate? Probably not, in the sense that cities striving to reduce crime will spend more on police. Thus, we might consider a second structural equation that expresses the number of police per capita as a function of a number of factors. We may have no interest in estimating this equation, which is behavioral, reflecting the behavior of city officials; but if we are to consistently estimate the former equation (the behavioral equation reflecting the behavior of murderers), we will have to specify the second equation as well, and collect data for its explanatory factors.

Simultaneity bias in OLS

What goes wrong if we use OLS to estimate a structural equation
containing endogenous explanatory variables? Consider the structural system:

y₁ = α₁y₂ + β₁z₁ + u₁
y₂ = α₂y₁ + β₂z₂ + u₂    (3)

in which we are interested in estimating the first equation. Assume that the z variables are exogenous, in that each is uncorrelated with each of the error processes. What is the correlation between y₂ and u₁? If we substitute the first equation into the second, we derive

y₂ = α₂(α₁y₂ + β₁z₁ + u₁) + β₂z₂ + u₂
(1 − α₂α₁)y₂ = α₂β₁z₁ + β₂z₂ + α₂u₁ + u₂    (4)

If we assume that α₂α₁ ≠ 1, we can derive the reduced form equation for y₂ as

y₂ = π₂₁z₁ + π₂₂z₂ + v₂    (5)

where the reduced form error term is v₂ = (α₂u₁ + u₂) / (1 − α₂α₁). Thus y₂ depends on u₁, and estimation by OLS of the first equation in (3) will not yield consistent estimates. We can consistently estimate the reduced form equation (5) via OLS, and that, in fact, is an essential part of the strategy of the 2SLS estimator. But the parameters of the structural equation are nonlinear transformations of the reduced form parameters, so being able to estimate the reduced form parameters does not achieve the goal of providing us with point and interval estimates of the structural equation.

In this special case, we can evaluate the simultaneity bias that would result from improperly applying OLS to the original structural equation. The covariance of y₂ and u₁ is equal to the covariance of y₂ and v₂:

Cov(y₂, u₁) = [α₂ / (1 − α₂α₁)] E(u₁²)    (6)

If we have some priors about the signs of the α parameters, we may sign the bias. Generally, it could be either positive or negative: that is, the OLS coefficient estimate could be larger or smaller than the correct estimate, but it will not be equal to the population parameter in an expected sense unless the bracketed expression is zero. Note that this would happen if α₂ = 0: that is, if y₂ were not simultaneously determined with y₁. But in that case we do not have a simultaneous system; the model in that case is said to be a recursive system, which may be consistently estimated with OLS.
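The simultaneity bias in (6), and the reduced-form strategy that cures it, can be illustrated by simulation: OLS on the first equation of (3) is inconsistent for α₁, while replacing y₂ with its reduced-form fitted values (a manual version of 2SLS) recovers it. A sketch; all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000
a1, b1 = 0.5, 1.0      # structural parameters of the first equation
a2, b2 = 0.4, 1.0      # structural parameters of the second equation

z1, z2 = rng.normal(size=(2, n))
u1, u2 = rng.normal(size=(2, n))

# Solve the system for its reduced form to generate the data
y2 = (a2 * b1 * z1 + b2 * z2 + a2 * u1 + u2) / (1.0 - a1 * a2)
y1 = a1 * y2 + b1 * z1 + u1

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# OLS of y1 on y2 and z1: inconsistent, since y2 is correlated with u1
b_ols = ols(np.column_stack([y2, z1]), y1)

# Manual 2SLS: the first stage regresses y2 on the exogenous z1, z2;
# the second stage replaces y2 with its fitted values
Z = np.column_stack([z1, z2])
y2_hat = Z @ ols(Z, y2)
b_2sls = ols(np.column_stack([y2_hat, z1]), y1)

print(b_ols[0])   # biased away from a1 = 0.5
print(b_2sls[0])  # close to a1 = 0.5
```

With α₂ > 0 and α₂α₁ < 1 here, the bracketed expression in (6) is positive, so the OLS estimate of α₁ is biased upward, as the simulation shows.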
Identifying and estimating a structural equation

The tool that we will apply to consistently estimate structural equations such as (3) is one that we have seen before: two-stage least squares (2SLS). The application of 2SLS in a structural system is more straightforward than the general application of instrumental variables estimators, since the specification of the system makes clear which variables are available as instruments. Let us first consider a slightly different two-equation structural system:

q = α₁p + β₁z₁ + u₁
q = α₂p + u₂    (7)

We presume these equations describe the workings of a market, and that the equilibrium condition of market clearing has been imposed. Let q be per capita milk consumption at the county level, p be the average price of a gallon of milk in that county, and z₁ be the price of cattle feed. The first structural equation is thus the supply equation, with α₁ > 0 and β₁ < 0: that is, a higher cost of production will generally reduce the quantity supplied at the same price per gallon. The second equation is the demand equation, where we presume that α₂ < 0, reflecting the slope of the demand curve in the (p, q) plane.

Given a random sample on (p, q, z₁), what can we achieve? The demand equation is said to be identified (in fact, exactly identified), since one instrument is needed, and precisely one is available. z₁ is available because the demand for milk does not depend on the price of cattle feed, so we take advantage of an exclusion restriction that makes z₁ available to identify the demand curve. Intuitively, we can think of variations in z₁ shifting the supply curve up and down, tracing out the demand curve; in doing so, it makes it possible for us to estimate the structural parameters of the demand curve.

What about the supply curve? It also has a problem of simultaneity bias, but it turns out that the supply equation is unidentified. Given the model as we have laid it out, there is no variable available to serve as an instrument for p: that is, we need a variable that affects demand (and shifts the demand curve) but does not directly affect
supply. In this case, no such variable is available, and we cannot apply the instrumental variables technique without an instrument. What if we went back to the drawing board and realized that the price of orange juice should enter the demand equation? (Although it tastes terrible on corn flakes, orange juice might be a healthy substitute for quenching one's thirst.) Then the supply curve would be identified (exactly identified), since we now would have a single instrument that served to shift demand but did not enter the supply relation. What if we also considered the price of beer as an additional demand factor? Then we would have two available instruments (presuming that each is appropriately correlated), and 2SLS would be used to "boil them down" into the single instrument needed. In that case, we would say that the supply curve would be overidentified.

The identification status of each structural equation thus hinges upon exclusion restrictions: our a priori statements that certain variables do not appear in certain structural equations. If they do not appear in a structural equation, they may be used as instruments, to assist in identifying the parameters of that equation. For these variables to successfully identify the parameters, they must have nonzero population parameters in the equation in which they are included. Consider an example:

hours = f₁(log(wage), educ, age, kidslt6, nwifeinc)
log(wage) = f₂(hours, educ, exper, exper²)    (8)

The first equation is a labor supply relation, expressing the number of hours worked by a married woman as a function of her wage, education, age, the number of preschool children (kidslt6), and non-wage income (nwifeinc, including spouse's earnings). The second equation is a labor demand equation, expressing the wage to be paid as a function of hours worked, the employee's education, and a polynomial in her work experience. The exclusion restrictions indicate that the demand for labor does not depend on the worker's age (nor should it), the presence of preschool kids, or other resources available to the
worker. Likewise, we assume that the woman's willingness to participate in the market does not depend on her labor market experience. One instrument is needed to identify each equation: age, kidslt6, and nwifeinc are available to identify the supply equation, while exper and exper² are available to identify the demand equation. This is the order condition for identification, essentially counting instruments and variables to be instrumented; each equation is overidentified. But the order condition is only necessary; the sufficient condition is the rank condition, which essentially states that in the reduced-form equation

log(wage) = g(educ, age, kidslt6, nwifeinc, exper, exper²)     (9)

at least one of the population coefficients on (exper, exper²) must be nonzero. But since we can consistently estimate this equation with OLS, we may generate sample estimates of those coefficients and test the joint null that both coefficients are zero. If that null is rejected, then we satisfy the rank condition for the first equation, and we may proceed to estimate it via 2SLS. The equivalent condition for the demand equation is that at least one of the population coefficients on (age, kidslt6, nwifeinc), in the regression of hours on the system's exogenous variables, is nonzero. If any of those variables are significant in the equivalent reduced-form equation, it may be used as an instrument to estimate the demand equation via 2SLS. The application of two-stage least squares via Stata's ivreg command involves identifying the endogenous explanatory variables, the exogenous variables that are included in each equation, and the instruments that are excluded from each equation. To satisfy the order condition, the list of excluded instruments must be at least as long as the list of endogenous explanatory variables. This logic carries over to structural equation systems with more than two endogenous variables (equations): a structural model may have any number of endogenous variables, each defined by an equation, and we can proceed to evaluate the identification status of each equation in turn, given the appropriate exclusion restrictions. Note that if an equation is unidentified, due to the lack of appropriate instruments, then no econometric technique may be used to estimate its parameters. In that case, we do not have knowledge that would allow us to trace out that equation's slope while we move along it.

Simultaneous equations models with time series

One of the most common applications of 2SLS in applied work is the estimation of structural time series models. For instance, consider a simple macro model:

Ct = α0 + α1(Yt − Tt) + α2 rt + u1t
It = γ0 + γ1 rt + u2t
Yt = Ct + It + Gt     (10)

In this system, aggregate consumption each quarter is determined jointly with disposable income. Even if we assume that taxes are exogenous (and in fact they are responsive to income), the consumption function cannot be consistently estimated via OLS. If the interest rate is taken as exogenous (set, for instance, by monetary policymakers), then the investment equation may be consistently estimated via OLS. The third equation is an identity; it need not be estimated, and holds without error, but its presence makes explicit the simultaneous nature of the model. If r is exogenous, then we need one instrument to estimate the consumption function; government spending will suffice, and consumption will be exactly identified. If r is to be taken as endogenous, we would have to add at least one equation to the model to express how monetary policy reacts to economic conditions. We might also make the investment function more realistic by including dynamics: that investment depends on lagged income, Yt−1, for instance (firms make investment spending plans based on the demand for their product). This would allow Yt−1, a predetermined variable, to be used as an additional instrument in estimation of the consumption function. We may also use lags of exogenous variables (for instance, lagged taxes or government spending) as instruments in this context. Although this only scratches the surface of a broad set of issues
relating to the estimation of structural models with time series data, it should be clear that those models will generally require instrumental variables techniques such as 2SLS for the consistent estimation of their component relationships.

Wooldridge, Introductory Econometrics, 4th ed.
Chapter 7: Multiple regression analysis with qualitative information

Binary (or dummy) variables

We often consider relationships between observed outcomes and qualitative factors: models in which a continuous dependent variable is related to a number of explanatory factors, some of which are quantitative and some of which are qualitative. In econometrics we also consider models of qualitative dependent variables, but we will not explore those models in this course due to time constraints. But we can readily evaluate the use of qualitative information in standard regression models with continuous dependent variables. Qualitative information often arises in terms of some coding, or index, which takes on a number of values; for instance, we may know in which one of the six New England states each of the individuals in our sample resides. The data themselves may be coded with the biliterals MA, RI, ME, etc. How can we use this factor in a regression equation? In the data, state takes on six distinct values. We must create six binary variables, or dummy variables, each of which will refer to one state: that is, that variable will be 1 if the individual comes from that state, and 0 otherwise. We can generate this set of 6 variables easily in Stata with the command tab state, gen(st), which will create 6 new variables in our dataset: st1, st2, ..., st6. Each of these variables is a dummy; that is, they only contain 0 or 1 values. If we add up these variables, we get exactly a vector of 1s, suggesting that we will never want to use all 6 variables in a regression, since by knowing the values of any 5 of them we know the value of the sixth. We may also find the proportions of each state's citizens in our sample very easily: summ st* will give the descriptive statistics of
all 6 variables, and the mean of each st dummy is the sample proportion living in that state. How can we use these dummy variables? Say that we wanted to know whether incomes differed significantly across the 6-state region. What if we regressed income on any five of these st dummies:

income = β0 + β1 st1 + β2 st2 + β3 st3 + β4 st4 + β5 st5 + u     (1)

where I have suppressed the observation subscripts. What are the regression coefficients in this case? β0 is the average income in the 6th state, the dummy for which is excluded from the regression; β1 is the difference between the income in state 1 and the income in state 6; β2 is the difference between the income in state 2 and the income in state 6; and so on. What is the ordinary ANOVA F in this context, the test that all the slopes are equal to zero? Precisely the test of the null hypothesis

H0: μ1 = μ2 = μ3 = μ4 = μ5 = μ6     (2)

versus the alternative that not all six of the state means are the same value. It turns out that we can test this same hypothesis by excluding any one of the dummies and including the remaining five in the regression. The coefficients will differ, but the p-value of the ANOVA F will be identical for any of these regressions. In fact, this regression is an example of classical one-way ANOVA: testing whether a qualitative factor (in this case, state of residence) explains a significant fraction of the variation in income. What if we wanted to generate point and interval estimates of the state means of income? Then it would be most convenient to reformulate (1) by including all 6 dummies and removing the constant term. This is algebraically the same regression. The coefficient on the now-included st6 will be precisely that reported above as β0; the coefficient reported for st1 will be precisely β0 + β1 from the previous model; and so on. But now those coefficients will be reported with confidence intervals around the state means. Those statistics could all be calculated if you only estimated (1), but to do so you would have to use lincom for each coefficient. Running
this alternative form of the model is much more convenient for estimating the state means in point and interval form. But to test the hypothesis (2), it is most convenient to run the original regression, since then the ANOVA F performs the appropriate test with no further ado. What if we fail to reject the ANOVA F null? Then it appears that the qualitative factor state does not explain a significant fraction of the variation in income. Perhaps the relevant classification is between northern, more rural New England states (NEN) and southern, more populated New England states (NES). Given the nature of dummy variables, we may generate these dummies two ways. We can express the Boolean condition in terms of the state variable: gen nen = (state=="VT" | state=="NH" | state=="ME"). This expression, with parens on the right-hand side of the generate statement, evaluates that expression and returns true (1) or false (0). The vertical bar is Stata's OR operator; since every person in the sample lives in one and only one state, we must use OR to phrase the condition that they live in northern New England. But there is another way to generate this nen dummy, given that we have st1...st6 defined for the regression above. Let's say that Vermont, New Hampshire, and Maine have been coded as st6, st4, and st3, respectively. We may just gen nen = st3+st4+st6, since the sum of mutually exclusive and exhaustive dummies must be another dummy. To check: the resulting nen will have a mean equal to the percentage of the sample that live in northern New England, the equivalent nes dummy will have a mean for southern New England residents, and the sum of those two means will of course be 1. We can then run a simplified form of our model as regress inc nen; the ANOVA F statistic for that regression tests the null hypothesis that incomes in northern and southern New England do not differ significantly. Since we have excluded nes, the slope coefficient on nen measures the amount by which northern New England income differs from southern New England income; the mean income for southern New England is the constant term. If we want point and interval estimates for those means, we should regress inc nen nes, nocons.

Regression with continuous and dummy variables

In the above examples we have estimated pure ANOVA models: regression models in which all of the explanatory variables are dummies. In econometric research we often want to combine quantitative and qualitative information, including some regressors that are measurable and others that are dummies. Consider the simplest example: we have data on individuals' wages, years of education, and their gender. We could create two gender dummies, male and female, but we will only need one in the analysis, say female. We create this variable as gen female = (gender=="F"). We can then estimate the model

wage = β0 + β1 educ + β2 female + u     (3)

The constant term in this model now becomes the wage for a male with zero years of education. Male wages are predicted as b0 + b1 educ, while female wages are predicted as b0 + b1 educ + b2. The gender differential is thus b2. How would we test for the existence of statistical discrimination: that, say, females with the same qualifications are paid a lower wage? This would be a one-tailed test of H0: β2 = 0 against the alternative β2 < 0. The t-statistic for b2 will provide us with this hypothesis test. What is this model saying about wage structure? Wages are a linear function of the years of education. If b2 is significantly different from zero, then there are two wage profiles: parallel lines in (educ, wage) space, each with a slope of b1, with their intercepts differing by b2. What if we wanted to expand this model to consider the possibility that wages differ by both gender and race? Say that each worker is classified as race=="white" or race=="black". Then we could gen black = (race=="black") to create the dummy variable and add it to (3). What now is the constant term? The wage for a white male with zero years of education. Is there a significant race differential in wages? If so, the coefficient b3, which measures the difference between white and black wages ceteris paribus, will be significantly different from zero. In (educ, wage) space, the model can be represented as four parallel lines, with each intercept labelled by a combination of gender and race. What if our racial data classified each worker as white, Black, or Asian? Then we would run the regression

wage = β0 + β1 educ + β2 female + β3 Black + β4 Asian + u     (4)

where the constant term still refers to a white male. In this model, b3 measures the difference between Black and white wages, ceteris paribus, while b4 measures the difference between Asian and white wages. Each can be examined for significance. But how can we determine whether the qualitative factor race affects wages? That is a joint test that both β3 = 0 and β4 = 0, and it should be conducted as such. We should not make judgments based on the individual dummies' coefficients, but should rather include both race variables (if the null is rejected) or remove them both (if it is not). When we examine a qualitative factor which may give rise to a number of dummy variables, they should be treated as a group. For instance, we might want to modify (3) to consider the effect of state of residence:

wage = β0 + β1 educ + β2 female + Σ_{j=1}^{5} γj stj + u     (5)

where we include any 5 of the 6 st variables designating the New England states. The test that wage levels differ significantly due to state of residence is the joint test that γj = 0, j = 1, ..., 5. A judgment concerning the relevance of state of residence should be made on the basis of this joint test: an F-test with 5 numerator degrees of freedom. Note that if the dependent variable was measured in log form, the coefficients on dummies would be interpreted as percentage changes: if (5) was respecified to place log(wage) as the dependent variable, the coefficient b1 would measure the percentage return to education (how many percent does the wage change for each additional year of education), while the coefficient b2 would measure the approximate percentage difference in wage levels between females and males, ceteris paribus. The state dummies would
likewise measure the percentage difference in wage levels between that state and the excluded state, number 6. We must be careful, when working with variables that have an ordinal interpretation and are thus coded in numeric form, to treat them as ordinal. For instance, if we model the interest rate corporations must pay to borrow (corprt) as a function of their credit rating, we consider that Moody's and Standard and Poor's assign credit ratings somewhat like grades: AAA, AA, A, BAA, BA, B, C, et cetera. Those could be coded as 1, 2, ..., 7, and just as we can agree that an A grade is better than a B, a triple-A bond rating results in a lower borrowing cost than a double-A rating. But while GPAs are measured on a clear four-point scale, the bond ratings are merely ordinal, or ordered: everyone agrees on the rating scale, but the differential between AA borrowers' rates and A borrowers' rates might be much smaller than that between B and C borrowers' rates (especially the case if C denotes "below investment grade", which will reduce the market for such bonds). Thus, although we might have a numeric index corresponding to AAA...C, we should not assume that ∂corprt/∂index is constant: we should not treat index as a cardinal measure. Clearly, the appropriate way to proceed is to create dummy variables for each rating class and include all but one of those variables in a regression of corprt on bond rating and other relevant factors. For instance, if we leave out the AAA dummy, all of the rating-class dummies' coefficients will then measure the degree to which those borrowers' bonds bear higher rates than those of AAA borrowers. But we could just as well leave out the C rating-class dummy and measure the effects of ratings classes relative to the worst credit's cost of borrowing.

Interactions involving dummy variables

Just as continuous variables may be interacted in regression equations, so can dummy variables. We might, for instance, have one set of dummies indicating the gender of respondents (female) and another set
indicating their marital status (married). We could regress lwage on these two dummies:

lwage = b0 + b1 female + b2 married + u

which gives rise to the following classification of mean wages conditional on the two factors, which is thus a classic two-way ANOVA setup:

                male          female
unmarried       b0            b0 + b1
married         b0 + b2       b0 + b1 + b2

We assume that the two effects, gender and marital status, have independent effects on the dependent variable. Why? Because this joint distribution is modelled as the product of the marginals. What is the difference between male and female wages? b1, irrespective of marital status. What is the difference between unmarried and married wages? b2, irrespective of gender. If we were to relax the assumption that gender and marital status had independent effects on wages, we would want to consider their interaction. Since there are only two categories of each variable, we only need one interaction term, fm, to capture the possible effects. As above, that term could be generated as a Boolean (noting that & is Stata's AND operator): gen fm = (female==1 & married==1), or we could generate it algebraically as gen fm = female*married. In either case, it represents the intersection of the sets. We then add a term b3 fm to the equation, which then appears as an additive constant in the lower-right cell of the table. Now if the coefficient on fm is significantly nonzero, the effect of being female on the wage differs depending on marital status, and vice versa. Are the interaction effects important; that is, does the joint distribution differ from the product of the marginals? That is easily discerned, since if that is so, b3 will be significantly nonzero. Two extensions of this framework come to mind. Sticking with two-way ANOVA, considering two factors' effects, imagine that instead of marital status we consider race ∈ {white, Black, Asian}. To run the model without interactions, we would include two of these dummies in the regression, say Black and Asian; the constant term would be the mean wage of a white male, the excluded
class. What if we wanted to include interactions? Then we would define f·Black and f·Asian (the products of female with each race dummy) and include those two regressors as well. The test for the significance of interactions is now a joint test that these two coefficients are jointly zero. A second extension of the interaction concept is far more important: what if we want to consider a regular regression on quantitative variables, but want to allow for different slopes for different categories of observations? Then we create interaction effects between the dummies that define those categories and the measured variables. For instance,

wage = b0 + b1 female + b2 educ + b3 (female × educ) + u

Here we are in essence estimating two separate regressions in one: a regression for males with an intercept of b0 and a slope of b2, and a regression for females with an intercept of b0 + b1 and a slope of b2 + b3. Why would we want to do this? We could clearly estimate the two separate regressions, but if we did that, we could not conduct any tests: e.g., do males and females have the same intercept? The same slope? If we use interacted dummies, we can run one regression and test all of the special cases of this model which are nested within it: that the slopes are the same, that the intercepts are the same, and the pooled case in which we need not distinguish between males and females. Since each of these special cases merely involves restrictions on this general form, we can run this equation and then just conduct the appropriate tests. If we extended this logic to include race, as defined above, as an additional factor, we would include two of the race dummies, say Black and Asian, and interact each with educ. This would be a model without (gender × race) interactions, where the effects of gender and race are considered to be independent, but it would allow us to estimate different regression lines for each combination of gender and race, and test for the importance of each factor. These interaction methods are often used to test hypotheses about the importance of a qualitative factor:
for instance, in a sample of companies from which we are estimating their profitability, we may want to distinguish between companies in different industries, or companies that underwent a significant merger, or companies that were formed within the last decade, and evaluate whether their expenditures on R&D or advertising have the same effects across those categories. All of the necessary tests involving dummy variables and interacted dummy variables may be easily specified and computed, since models without interacted dummies, or without certain dummies in any form, are merely restricted forms of more general models in which they appear. Thus the standard subset F testing strategy that we have discussed for the testing of joint hypotheses on the coefficient vector may be readily applied in this context. The text describes how a Chow test may be formulated by running the general regression, running a restricted form in which certain constraints are imposed, and performing a computation using their sums of squared errors; this computation is precisely that done with Stata's test command. The advantage of setting up the problem for the test command is that any number of tests (e.g., above, for the importance of gender, or for the importance of race) may be conducted after estimating a single regression; it is not necessary to estimate additional regressions to compute any possible subset F test statistic, which is what the Chow test is doing.

Wooldridge, Introductory Econometrics, 4th ed.
Chapter 2: The simple regression model

Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory variables are considered to generate an outcome variable, or dependent variable. We begin by considering the simple regression model, in which a single explanatory (or independent) variable is involved. We often speak of this as "two-variable regression" or "Y on X regression". Algebraically,

y = β0 + β1 x + u     (1)

is the relationship presumed to hold in the population for each observation i. The values of y are expected to lie on a straight line, depending on the corresponding values of x. Their values will differ from those predicted by that line by the amount of the error term, or disturbance, u, which expresses the net effect of all factors other than x on the outcome y; that is, it reflects the assumption of ceteris paribus. We often speak of x as the regressor in this relationship; less commonly, we speak of y as the regressand. The coefficients of the relationship, β0 and β1, are the regression parameters, to be estimated from a sample. They are presumed constant in the population, so that the effect of a one-unit change in x on y is assumed constant for all values of x. As long as we include an intercept in the relationship, we can always assume that E(u) = 0, since a nonzero mean for u could be absorbed by the intercept term. The crucial assumption in this regression model involves the relationship between x and u. We consider x a random variable, as is u, and concern ourselves with the conditional distribution of u given x. If that distribution is equivalent to the unconditional distribution of u, then we can conclude that there is no relationship between x and u, which, as we will see, makes the estimation problem much more straightforward. To state this formally, we assume that

E(u | x) = E(u) = 0     (2)

or that the u process has a zero conditional mean. This assumption states that the unobserved factors involved in the regression function are not related in any systematic manner to the observed factors. For instance, consider a regression of individuals' hourly wage on the number of years of education they have completed. There are, of course, many factors influencing the hourly wage earned beyond the number of years of formal schooling. In working with this regression function, we are assuming that the unobserved factors excluded from the regression we estimate, and thus relegated to the u term, are not systematically related to years of formal
schooling. This may not be a tenable assumption: we might consider innate ability as such a factor, and it is probably related to success in both the educational process and the workplace. Thus innate ability, which we cannot measure without some proxies, may be positively correlated with the education variable, which would invalidate assumption (2). The population regression function, given the zero conditional mean assumption, is

E(y | x) = β0 + β1 x     (3)

This allows us to separate y into two parts: the systematic part, related to x, and the unsystematic part, which is related to u. As long as assumption (2) holds, those two components are independent in the statistical sense. Let us now derive the least squares estimates of the regression parameters. Let {(xi, yi): i = 1, ..., n} denote a random sample of size n from the population, where yi and xi are presumed to obey the relation (1). The assumption (2) allows us to state that E(u) = 0, and, given that assumption, that Cov(x, u) = 0, where Cov denotes the covariance between the random variables. These assumptions can be written in terms of the regression error:

E(yi − β0 − β1 xi) = 0     (4)
E[xi (yi − β0 − β1 xi)] = 0

These two equations place two restrictions on the joint probability distribution of x and u. Since there are two unknown parameters to be estimated, we might look upon these equations to provide solutions for those two parameters. We choose estimators b0 and b1 to solve the sample counterparts of these equations, making use of the principle of the method of moments:

n⁻¹ Σᵢ (yi − b0 − b1 xi) = 0     (5)
n⁻¹ Σᵢ xi (yi − b0 − b1 xi) = 0

the so-called normal equations of least squares. Why is this method said to be "least squares"? Because, as we shall see, it is equivalent to minimizing the sum of squares of the regression residuals. How do we arrive at the solution? The first normal equation can be solved to give

b0 = ȳ − b1 x̄     (6)

where ȳ and x̄ are the sample averages of those variables. This implies that the regression line passes through the point of means of the sample data. Substituting this solution into the second
normal equation, we now have one equation in one unknown, b1:

0 = Σᵢ xi (yi − (ȳ − b1 x̄) − b1 xi)
0 = Σᵢ xi (yi − ȳ) − b1 Σᵢ xi (xi − x̄)
Σᵢ (xi − x̄)(yi − ȳ) = b1 Σᵢ (xi − x̄)²
b1 = Cov(x, y) / Var(x)     (7)

where the slope estimate is merely the ratio of the sample covariance of the two variables to the variance of x, which must be nonzero for the estimates to be computed. This merely implies that not all of the sample values of x can take on the same value: there must be diversity in the observed values of x. These estimates, b0 and b1, are said to be the ordinary least squares (OLS) estimates of the regression parameters, since they can be derived by solving the least squares problem

min_{b0, b1} S = Σᵢ eᵢ² = Σᵢ (yi − b0 − b1 xi)²     (8)

Here we minimize the sum of squared residuals, or differences between the regression line and the values of y, by choosing b0 and b1. If we take the derivatives ∂S/∂b0 and ∂S/∂b1 and set the resulting first order conditions to zero, the two equations that result are exactly the OLS solutions for the estimated parameters shown above. The least squares estimates minimize the sum of squared residuals, in the sense that any other line drawn through the scatter of (x, y) points would yield a larger sum of squared residuals. The OLS estimates provide the unique solution to this problem, and can always be computed if (i) Var(x) > 0 and (ii) n ≥ 2. The estimated OLS regression line is then

ŷ = b0 + b1 x     (9)

where the "hat" denotes the predicted value of y corresponding to that value of x. This is the sample regression function (SRF), corresponding to the population regression function, or PRF, (3). The population regression function is fixed, but unknown, in the population; the SRF is a function of the particular sample that we have used to derive it, and a different SRF will be forthcoming from a different sample. The primary interest in these estimates usually involves b1 = ∂y/∂x = Δy/Δx, the amount by which y is predicted to change from a unit change in the level of x. This slope is often of economic interest, whereas the constant term in many
regressions is devoid of economic meaning. For instance, a regression of major companies' CEO salaries on the firms' return on equity (a measure of economic performance) yields the regression estimates

salary-hat = 963.191 + 18.501 roe     (10)

where salary is the CEO's annual salary, in thousands of 1990 dollars, and roe is average return on equity over the prior three years, in percent. This implies that a one percent increase in ROE over the past three years is worth $18,501 to a CEO, on average. The average annual salary for the 209 CEOs in the sample is $1.28 million, so the increment is about 1.4% of that average salary. The SRF can also be used to predict what a CEO will earn for any level of ROE; points on the estimated regression function are such predictions.

Mechanics of OLS

Some algebraic properties of the OLS regression line:

1. The sum, and average, of the OLS residuals is zero:

Σᵢ eᵢ = 0     (11)

which follows from the first normal equation, which specifies that the estimated regression line goes through the point of means (x̄, ȳ), so that the mean residual must be zero.

2. By construction, the sample covariance between the OLS residuals and the regressor is zero:

Cov(e, x) = Σᵢ xi eᵢ = 0     (12)

This is not an assumption, but follows directly from the second normal equation. The estimated coefficients, which give rise to the residuals, are chosen to make it so.

3. Each value of the dependent variable may be written in terms of its prediction and its error, or regression residual: yi = ŷi + eᵢ, so that OLS decomposes each yi into two parts: a fitted value and a residual. Property 3 also implies that Cov(e, ŷ) = 0, since ŷ is a linear transformation of x, and linear transformations have linear effects on covariances. Thus, the fitted values and residuals are uncorrelated in the sample. Taking this property and applying it to the entire sample, we define

SST = Σᵢ (yi − ȳ)²
SSE = Σᵢ (ŷi − ȳ)²
SSR = Σᵢ eᵢ²

as the Total sum of squares, Explained sum of squares, and Residual sum of squares, respectively. Note that SST expresses the total variation in y around its mean, and
we do not strive to explain its mean, only how it varies about its mean. The second quantity, SSE, expresses the variation of the predicted values of y around the mean value of y (and it is trivial to show that ŷ has the same mean as y). The third quantity, SSR, is the same as the least squares criterion S from (8). Note that some textbooks interchange the definitions of SSE and SSR, since both "explained" and "error" start with E, and both "regression" and "residual" start with R. Given these sums of squares, we can generalize the decomposition mentioned above into

SST = SSE + SSR     (13)

or, the total variation in y may be divided into that explained and that unexplained, i.e., left in the residual category. To prove the validity of (13), note that

SST = Σᵢ (yi − ȳ)²
    = Σᵢ [(yi − ŷi) + (ŷi − ȳ)]²
    = Σᵢ [eᵢ + (ŷi − ȳ)]²
    = Σᵢ eᵢ² + 2 Σᵢ eᵢ (ŷi − ȳ) + Σᵢ (ŷi − ȳ)²
    = SSR + SSE

given that the middle term in this expression is equal to zero. But this term is the sample covariance of e and ŷ (given a zero mean for e), and by (12) we have established that this is zero. How good a job does this SRF do? Does the regression function explain a great deal of the variation of y, or not very much? That can now be answered by making use of these sums of squares:

R² = SSE/SST = 1 − SSR/SST

The R² measure (sometimes termed the coefficient of determination) expresses the percent of variation in y around its mean "explained" by the regression function. It is an r, or simple correlation coefficient, squared: in this case of simple regression on a single x variable, R² = (r_xy)². Since the correlation between two variables ranges between −1 and +1, the squared correlation ranges between 0 and 1. In that sense, R² is like a batting average. In the case where R² = 0, the model we have built fails to explain any of the variation in the y values around their mean. That is unlikely, but it is certainly possible to have a very low value of R². In the case where R² = 1, all of the points lie on the SRF. That is unlikely when n > 2, but it may be the case that all points lie close to the line, in which case R² will approach 1. We cannot
make any statistical judgment based directly on R², nor even say that a model with a higher R² and the same dependent variable is necessarily a better model; but, other things equal, a higher R² will be forthcoming from a model that captures more of y's behavior. In cross-sectional analyses, where we are trying to understand the idiosyncracies of individual behavior, very low R² values are common, and do not necessarily denote a failure to build a useful model.

Important issues in evaluating applied work: how do the quantities we have estimated change when the units of measurement are changed? In the estimated model of CEO salaries, since the y variable was measured in thousands of dollars, the intercept and slope coefficient refer to those units as well. If we measured salaries in dollars, the intercept and slope would be multiplied by 1000, but nothing else would change. The correlation between y and x is not affected by linear transformations, so we would not alter the R² of this equation by changing its units of measurement. Likewise, if ROE was measured in decimals rather than in percent, it would merely change the units of measurement of the slope coefficient: dividing r by 100 would cause the slope to be multiplied by 100. In the original (10), with r in percent, the slope is 18.501 (thousands of dollars per one-unit change in r). If we expressed r in decimal form, the slope would be 1850.1. A change in r from 0.10 to 0.11 (a one percentage point increase in ROE) would be associated with a change in salary of (0.01)(1850.1) = 18.501 thousand dollars. Again, the correlation between salary and ROE would not be altered. This also applies for a transformation such as F = 32 + (9/5)C; it would not matter whether we viewed temperature in degrees F or degrees C as a causal factor in estimating the demand for heating oil, since the correlation between the dependent variable and temperature would be unchanged by switching from Fahrenheit to Celsius degrees.

Functional form

Simple linear regression would seem to be a workable tool if we have
a presumed linear relationship between $y$ and $x$; but what if theory suggests that the relation should be nonlinear? It turns out that the "linearity" of regression refers to $y$ being expressed as a linear function of $x$, but neither $y$ nor $x$ need be the raw data of our analysis. For instance, regressing $y$ on $t$, a time trend, would allow us to analyze a linear trend, or constant growth, in the data. What if we expect the data to exhibit exponential growth, as would population, or sums earning compound interest? If the underlying model is

$y = A \exp(rt)$  (14)

then taking logarithms of both sides,

$\log y = \log A + rt$  (15)

so that the single-log transformation may be used to express a constant-growth relationship, in which $r$ is the regression slope coefficient that directly estimates $\partial \log y / \partial t$. Likewise, the double-log transformation can be used to express a constant-elasticity relationship, such as that of a Cobb-Douglas function:

$y = A x^{\alpha}$  (16)

$\log y = \log A + \alpha \log x$

In this context, the slope coefficient $\alpha$ is an estimate of the elasticity of $y$ with respect to $x$, given that $\eta_{yx} = \partial \log y / \partial \log x$ by the definition of elasticity. The original equation is nonlinear, but the transformed equation is a linear function which may be estimated by OLS regression.

Likewise, a model in which $y$ is thought to depend on $1/x$ (the reciprocal model) may be estimated by linear regression by just defining a new variable, $z$, equal to $1/x$ (presuming $x > 0$). That model has an interesting interpretation if you work out its algebra.

We often use a polynomial form to allow for nonlinearities in a regression relationship. For instance, rather than including only $x$ as a regressor, we may include $x$ and $x^2$. Although this relationship is linear in the parameters, it implies that $\partial y / \partial x = \beta_1 + 2 \beta_2 x$, so that the effect of $x$ on $y$ now depends on the level of $x$ for that observation, rather than being a constant factor.

Properties of OLS estimators

Now let us consider the properties of the regression estimators we have derived, considering $b_0$ and $b_1$ as estimators of their respective population quantities.
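As a concrete illustration (a sketch with invented data and parameter values, not an example from the text), the slope and intercept estimators can be computed directly from the deviations-from-means formulas:

```python
import numpy as np

# Simulated data: y depends linearly on x plus noise (invented values)
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)            # error with zero conditional mean
y = 2.0 + 0.5 * x + u                   # true beta0 = 2.0, beta1 = 0.5

# Slope: b1 = sum((x_i - xbar) * y_i) / sum((x_i - xbar)^2)
Sx = np.sum((x - x.mean()) ** 2)        # total variation in x
b1 = np.sum((x - x.mean()) * y) / Sx

# Intercept from the first-order condition: the fitted line passes
# through the point of means (xbar, ybar)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)                           # close to the true 2.0 and 0.5
```

With only modest noise, the estimates land near the true population values; `np.polyfit(x, y, 1)` computes the same least squares fit and can serve as a cross-check.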
To establish the unbiasedness of these estimators, we must make several assumptions.

Proposition 1 (SLR1): in the population, the dependent variable $y$ is related to the independent variable $x$ and the error $u$ as

$y = \beta_0 + \beta_1 x + u$  (17)

Proposition 2 (SLR2): we can estimate the population parameters from a random sample of size $n$, $\{(x_i, y_i) : i = 1, \dots, n\}$.

Proposition 3 (SLR3): the error process has a zero conditional mean:

$E(u \mid x) = 0$  (18)

Proposition 4 (SLR4): the independent variable $x$ has a positive variance:

$\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$  (19)

Given these four assumptions, we may proceed, considering the intercept and slope estimators as random variables. For the slope estimator, we may express the estimator in terms of population coefficients and errors:

$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{S_x}$  (20)

where we have defined $S_x$ as the total variation in $x$ (not the variance of $x$). Substituting for $y_i$, we can write the slope estimator as

$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) (\beta_0 + \beta_1 x_i + u_i)}{S_x} = \frac{\beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) + \beta_1 \sum_{i=1}^{n} (x_i - \bar{x}) x_i + \sum_{i=1}^{n} (x_i - \bar{x}) u_i}{S_x}$  (21)

We can show that the first term in the numerator is algebraically zero, given that the deviations around the mean sum to zero. The second term can be written as $\beta_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 = \beta_1 S_x$, so that the second term is merely $\beta_1$ when divided by $S_x$. Thus this expression can be rewritten as

$b_1 = \beta_1 + \frac{1}{S_x} \sum_{i=1}^{n} (x_i - \bar{x}) u_i$

showing that any randomness in the estimates of $b_1$ is derived from the errors in the sample, weighted by the deviations of their respective $x$ values. Given the assumed independence of the distributions of $x$ and $u$ implied by (18), this expression implies that $E(b_1) = \beta_1$: that is, $b_1$ is an unbiased estimate of $\beta_1$, given the propositions above. The four propositions listed above are all crucial for this result, but the key assumption is the independence of $x$ and $u$.

We are also concerned about the precision of the OLS estimators. To derive an estimator of the precision, we must add an assumption on the distribution of the error $u$.

Proposition 5 (SLR5, homoskedasticity): $Var(u \mid x) = Var(u) = \sigma^2$.

This assumption states that the variance of the error term is constant over the
population, and thus within the sample. Given (18), the conditional variance is also the unconditional variance. The errors are considered drawn from a fixed distribution, with a mean of zero and a constant variance of $\sigma^2$. If this assumption is violated, we have the condition of heteroskedasticity, which will often involve the magnitude of the error variance relating to the magnitude of $x$, or to some other measurable factor.

Given this additional assumption, but no further assumptions on the nature of the distribution of $u$, we may demonstrate that

$Var(b_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{S_x}$  (22)

so that the precision of our estimate of the slope is dependent upon the overall error variance, and is inversely related to the variation in the $x$ variable. The magnitude of $x$ does not matter, but its variability in the sample does matter. If we are conducting a controlled experiment (quite unlikely in economic analysis), we would want to choose widely spread values of $x$ to generate the most precise estimate of $\partial y / \partial x$.

We can likewise prove that $b_0$ is an unbiased estimator of the population intercept, with sampling variance

$Var(b_0) = \frac{\sigma^2 \; n^{-1} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2 \; n^{-1} \sum_{i=1}^{n} x_i^2}{S_x}$  (23)

so that the precision of the intercept depends, as well, upon the sample size and the magnitude of the $x$ values. These formulas for the sampling variances will be invalid in the presence of heteroskedasticity, that is, when Proposition SLR5 is violated.

These formulas are not operational, since they include the unknown parameter $\sigma^2$. To calculate estimates of the variances, we must first replace $\sigma^2$ with a consistent estimate, $s^2$, derived from the least squares residuals:

$e_i = y_i - b_0 - b_1 x_i, \quad i = 1, \dots, n$  (24)

We cannot observe the error $u_i$ for a given observation, but we can generate a consistent estimate of the $i$th observation's error with the $i$th observation's least squares residual, $e_i$. Likewise, a sample quantity corresponding to the population variance $\sigma^2$ can be derived from the residuals:

$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 = \frac{SSR}{n-2}$  (25)

where the numerator is just the least squares
criterion, $SSR$, divided by the appropriate degrees of freedom. Here two degrees of freedom are lost, since each residual is calculated by replacing two population coefficients with their sample counterparts. This now makes it possible to generate the estimated variances and, more usefully, the estimated standard error of the regression slope,

$s_{b_1} = \frac{s}{s_x}$

where $s$ is the standard deviation, or standard error, of the disturbance process (that is, $s = \sqrt{s^2}$), and $s_x = \sqrt{S_x}$. It is this estimated standard error that will be displayed on the computer printout when you run a regression, and used to construct confidence intervals and hypothesis tests about the slope coefficient. We can calculate the estimated standard error of the intercept term by the same means.

Regression through the origin

We could also consider a special case of the model above, in which we impose a constraint that $\beta_0 = 0$, so that $y$ is taken to be proportional to $x$. This will often be inappropriate; it is generally more sensible to let the data calculate the appropriate intercept term, and reestimate the model subject to that constraint only if that is a reasonable course of action. Otherwise, the resulting estimate of the slope coefficient will be biased. Unless theory suggests that a strictly proportional relationship is appropriate, the intercept should be included in the model.
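To see the bias that wrongly suppressing the intercept can induce, here is a small simulation sketch (all numbers invented): when the true intercept is nonzero, the through-the-origin slope estimator $\sum x_i y_i / \sum x_i^2$ is pulled away from the true slope.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
beta0, beta1 = 5.0, 1.0                 # true intercept is nonzero

x = rng.uniform(1, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, 1, size=n)

# Usual OLS slope, with an intercept included
Sx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * y) / Sx

# Slope when the intercept is (wrongly) forced to zero
b1_origin = np.sum(x * y) / np.sum(x * x)

print(b1, b1_origin)                    # b1 near 1.0; b1_origin well above it
```

Because both $\beta_0$ and the $x$ values are positive here, the constrained estimator is biased upward; with other configurations the bias can go the other way, but it vanishes only when the true intercept is zero.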
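Returning to the precision results above, both the sampling variance $Var(b_1) = \sigma^2 / S_x$ and the $n-2$ degrees-of-freedom estimate $s^2$ can be checked by simulation; this sketch uses invented parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 0.5, 2.0     # invented population values
n, reps = 30, 20000

x = rng.uniform(0, 10, size=n)          # regressor held fixed across samples
Sx = np.sum((x - x.mean()) ** 2)

slopes = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, size=n)    # homoskedastic errors (SLR5)
    y = beta0 + beta1 * x + u
    slopes[r] = np.sum((x - x.mean()) * y) / Sx

# Empirical sampling variance of b1 versus the theoretical sigma^2 / Sx
print(slopes.var(), sigma**2 / Sx)

# From the last sample: s^2 = SSR / (n - 2), and the slope's standard error
b1 = slopes[-1]
b0 = y.mean() - b1 * x.mean()
e = y - b0 - b1 * x                     # least squares residuals
s2 = np.sum(e**2) / (n - 2)             # consistent estimate of sigma^2
se_b1 = np.sqrt(s2 / Sx)                # estimated standard error of the slope
print(s2, se_b1)
```

The variance of the slope across the many replications matches $\sigma^2 / S_x$ closely, while $s^2$ from a single sample is noisier but centered on $\sigma^2$.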