Econometric Analysis ECON 3161
These class notes were uploaded by Wilfrid Schuster on Monday, November 2, 2015. The notes belong to ECON 3161 at Georgia Institute of Technology - Main Campus, taught by Levent Kutlu in Fall 2015.
1 Chapter 1

1.1 Econometrics

Econometrics is an area of economics that utilizes statistical methods for estimating economic relationships, testing economic theories, and evaluating government or business policies. The biggest obstacle for econometrics is that it has to deal with non-experimental data. Hence, unlike the natural sciences, in econometrics we have to use the data that nature gives us. Although most of the time econometricians borrowed their mathematical tools from statistics, they have also developed additional tools in order to address these area-specific problems.

So how does an econometrician proceed? The first step is constructing an economic model, which consists of equations representing various relationships. For example, Becker (1968) examines criminal behavior. In his model, Becker assumes that individuals choose the time they want to spend on criminal activities. The individuals know that with some probability they might get caught, and when an individual gets caught, he/she might be sentenced. Hence the individual has to decide whether to get involved in a criminal activity or not. The decision of the individual is determined by his/her utility function. Then one can model the time spent on crime via the following relationship:

y = f(x1, x2, x3, x4, x5, x6, x7),   (1)

where y = hours spent in criminal activities, x1 = "wage" for an hour spent in criminal activity, x2 = hourly wage in legal employment, x3 = income other than from crime or employment, x4 = probability of getting caught, x5 = probability of being convicted if caught, x6 = expected sentence if convicted, and x7 = age.

Of course, other factors might affect the behavior of an individual. These factors might be missing from the analysis due to data limitations or for some other reasons. Many times these missing factors can be dealt with by incorporating an econometric error term into the model. This error term captures the effect of the missing factors on the dependent variable y. Hence the econometric version of the above model would be

y = f(x1, x2, x3, x4, x5, x6, x7) + u.   (2)
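The role of the error term can be illustrated with a small simulation, sketched below in Python rather than Stata, purely for illustration. The coefficients and the variable choice are invented (not Becker's): an unobserved factor is folded into the error term u, and because it is independent of the observed regressor, u is centered near zero and nearly uncorrelated with that regressor.

```python
import random
import statistics

random.seed(0)

# Invented coefficients for a linearized "crime" equation (illustration only):
# hours of criminal activity fall as the legal hourly wage x2 rises.
b0, b1 = 2.0, -0.05

n = 10_000
x2 = [random.uniform(5.0, 25.0) for _ in range(n)]   # observed factor: legal wage
# Unobserved factors (ability, risk aversion, ...) are bundled into the error u.
u = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [b0 + b1 * x + e for x, e in zip(x2, u)]

# Because the omitted factors are independent of x2, the error term is
# centered near zero and nearly uncorrelated with the observed regressor.
mean_u = statistics.mean(u)
mx = statistics.mean(x2)
cov = sum((a - mx) * (b - mean_u) for a, b in zip(x2, u)) / n
corr = cov / (statistics.pstdev(x2) * statistics.pstdev(u))
print(abs(mean_u) < 0.1, abs(corr) < 0.05)
```

If, instead, the omitted factor were generated so as to depend on x2, the error term would be correlated with the regressor, which is exactly the problem discussed in the education example later in this chapter.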
where u represents a random statistical error term. Note that we were not specific about the function f. This function depends on the unknown utility function of the individuals. The econometrician might explicitly derive the function f by making some assumptions about the utility function, or the econometrician might be agnostic about how f is derived and directly choose a functional form for f. By far the most common functional form is the linear one. For example, we can model the hours spent in criminal activities by the following relationship:

y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + β6 x6 + β7 x7 + u.   (3)

Although most of the time this linearity assumption does not hold exactly, we can consider this model a first-order approximation to the true functional form. With this model we might want to test whether one of the factors really affects y. For example, we can test whether age (x7) has a significant effect on y, i.e., whether β7 = 0 or not. Generally we use the 5% significance level (95% confidence) for this purpose.

1.2 The Structure of Economic Data

Cross-Sectional Data. A cross-sectional data set consists of a sample of individuals, households, firms, cities, states, countries, or a variety of other units, taken at a given point in time. (Sometimes the data might have observations from slightly different time periods.) For econometric analysis it is essential to assume that the data are obtained by random sampling from the underlying population, so that the sample represents the population. Sometimes it is not easy to obtain a random sample. Some examples:

o If we want to collect data about the wealth of randomly selected people, the richer people might refuse to disclose their wealth. This is an example of sample selection, which is an advanced topic.

o If the population is not large enough, the sampled units may be related to each other and hence not independent. For example, many times neighboring countries are similar in many respects.

Example 1.1. The table below (from the WAGE1.RAW data set in Wooldridge, 4e) is an example of a cross-sectional data set:

obsno   wage    educ  exper  female  married
1       3.10    11    2      1       0
2       3.24    12    22     1       1
3       3.00    11    2      0       0
...
525     11.56   16    5      0       1
526     3.50    14    5      1       0

The definitions of the variables are given as follows: obsno = observation number, wage = wage in dollars per hour, educ = years of education, exper = years of potential labor force experience, female = an indicator for gender, and married = an indicator for marital status. The female and married variables are called dummy variables: they are either 0 or 1 depending on whether a condition holds or not. For example, if the individual is a female, then the female variable takes the value 1, and otherwise it takes the value 0.

Time Series Data. A time series data set consists of observations on a variable or several variables over time. Since past events influence future events, this relationship should be taken into account when analyzing time series data, and the time order of the observations is an important factor. The dependence of time series data makes its analysis more difficult than that of cross-sectional data and requires new tools. Also note that the analysis depends on the frequency of the data: as the frequency of the data increases, the dependence between two consecutive time periods increases.

Example 1.2. Below you can see two fictitious time series data sets. The first data set is annual and the second one is quarterly. The variables are obsno = observation number, avgmin = average minimum wage rate, unemp = unemployment rate, and gdp = gross domestic product.

obsno  year  avgmin  unemp  gdp
1      1960  0.20    15.4   878.7
2      1961  0.21    16.0   925.0
3      1962  0.23    14.8   1015.9
...
37     1996  3.35    18.9   4281.6
38     1997  3.35    16.8   4496.7

obsno  quarter   avgmin  unemp  gdp
1      1960:I    0.20    15.3   878.0
2      1960:II   0.205   15.9   925.1
3      1960:III  0.208   14.7   1010.1
...
63     1975:III  2.35    16.8   3575.6
64     1975:IV   2.36    16.9   3586.7

Pooled Cross-Sectional Data. Sometimes the data at hand have features of both cross sections and time series. For example, we might have two cross-sectional data sets
from two different years, say one from before a deregulation and one from after it. In that case we might pool the data as if it were a single cross-sectional data set, keeping in mind that this data set is a combination of two data sets. This type of data set can be good for examining the effect of deregulations, for example.

Example 1.3. The table below is a fictitious example of a pooled cross-sectional data set. We have two new variables in this table: year, the year of observation, and dereg, a dummy variable for deregulation. By using pooled cross-sectional data we not only double the size of the data set but also get the chance to analyze the effects of the deregulation on the wage rate. Note that the cross-sectional units for 1990 need not be the same as those for 1992.

obsno  year  wage    educ  exper  female  married  dereg
1      1990  3.10    11    2      1       0        0
2      1990  3.24    12    22     1       1        0
3      1990  3.00    11    2      0       0        0
...
525    1990  11.56   16    5      0       1        0
526    1990  3.50    14    5      1       0        0
527    1992  3.14    13    4      0       0        1
528    1992  3.37    12    20     1       1        1
...
1051   1992  3.32    11    6      1       0        1
1052   1992  3.38    10    5      0       1        1

Panel or Longitudinal Data. A panel data set is a mixture of time series and cross-sectional data. The difference from a pooled cross-sectional data set is that the analysis of this type of data set typically involves more time periods and requires attention to the time series aspect of the data. Consider a data set that includes information on the costs of airlines over time. In this data set we not only have many airlines as cross-sectional units but also many time periods. Moreover, for a panel data set we need to follow the same cross-sectional units over time; hence the identities of the units are important here. For a pooled cross-sectional data set we did not store the names of the cross-sectional units, as they were not necessary; that does not mean they never will be, and we might still want to store and use this information. Of course, if the researcher believes that it is acceptable to pool a panel data set, he/she can do so.

Example 1.4. The table below is an example of a panel data set. The variables in
this table are as follows: airline = a number corresponding to each of the airlines, quarter = the quarter of observation, cost = the total cost of the airline, pl = the labor price, pf = the fuel price, and stage = the average stage length. Note that not every airline need have the same number of observations over time. Whenever each cross-sectional unit has the same number of observations over time, we say that our panel data set is balanced; otherwise the panel data set is said to be unbalanced.

obsno  airline  quarter   cost  pl   pf   stage
1      1        1991:I    201   11   31   200
2      1        1991:II   202   21   32   205
3      1        1991:III  200   12   35   221
...
36     1        1999:IV   215   14   39   231
37     2        1991:I    235   11   45   251
38     2        1991:II   231   18   50   261
...
72     2        ...       ...   ...  ...  ...
73     3        1991:I    233   10   38   240
...
720    10       1999:IV   252   19   42   242

1.3 Examples and Issues in Econometrics

Example 1.5. The non-experimental nature of econometric data, at least most of the time, might be an obstacle for the econometrician. For example, assume that the econometrician wants to learn the effect of an additional year of education. Ethically, it is not possible to randomly pick individuals, assign them some desired level of education, and then observe their salaries. This kind of experiment is much easier to design in the natural sciences. Hence the econometrician has to work with whatever data nature provides. The biggest obstacle for the econometrician is that it is the individuals who choose their education levels. Hence more talented people will probably choose to get more education, as it is less costly for them, and thus these individuals will earn more. How do we know whether they earn more because they are more talented or because they got more education? More precisely, how do we know the net effect of education? The net positive effect of education is probably smaller than what we might imagine if we ignore this self-selectivity issue. Moreover, the econometrician cannot observe or know all the factors determining the salary. The good news is that if the omitted factors are independent of the observed factors, we can still use the econometric
techniques to get useful results. Hence, in the analysis of salary we should probably use a measure of talent, such as IQ, so as to get more sensible results.

Example 1.6. Assume an econometrician investigates the relationship between crime rates and the number of police officers. In his study he observes that, most of the time, cities with many police officers tend to have higher crime rates. He gets excited about this surprising result. But how is it possible that a higher number of police officers increases the crime rate? The simple answer is that these two variables are simultaneously determined: a higher crime rate probably increases the number of police officers, and an increase in the number of police officers probably decreases the crime rate. Hence any econometric model ignoring this simultaneity risks becoming irrelevant.

Example 1.7. Consider daily time series data for the S&P 500. Definitely, yesterday's observation is not independent of today's observation. Hence we cannot claim that the relevant error terms for these observations are independent. Time series econometricians have developed methods to address this kind of issue.

Example 1.8. Consider a panel data set for the costs of US airlines. It is not very sensible to assume that all these airlines share similar structures: given similar inputs, some of them will produce more or less than others. This might be due to many reasons, such as unobserved inefficiency levels or unmeasured structural differences. Using a pooled cross-sectional model might not be in our best interest for this kind of problem. Moreover, we might want to learn more about firm-specific differences. Essentially, panel data methods are designed specifically to address these issues.

2 Chapter 2

2.1 Simple Linear Regression Model

As we mentioned earlier, by far the most widely used functional form in econometrics is the linear functional form. In this section we will consider the simplest version of this model. We assume that there are only three variables in our model. The dependent
variable (or the explained variable, the response variable, the predicted variable, the regressand) y; the independent variable (or the explanatory variable, the control variable, the predictor variable, the regressor) x; and the error term (or the disturbance term) u. The simple linear regression model, or bivariate linear regression model, assumes the following relationship:

y = β0 + β1 x + u.   (4)

The variable x is assumed to be observed by the econometrician, but u is not observed. The error term is assumed to be a random variable which captures all the factors other than x that affect y. The parameters β0 and β1 are assumed to be the population parameters that represent the true linear relationship between x and y. If u is independent of x, then a change in x, holding the other factors fixed (Δu = 0), leads to

Δy = β1 Δx,   (5)

or β1 = Δy/Δx.

Example 2.1. Assume that we want to model an individual's wage by a simple linear regression model. A possible model would be

wage = β0 + β1 educ + u,   (6)

where wage is measured in dollars per hour and educ is years of education. The error term u includes labor force experience, work ethic, innate ability, and many other factors that might affect wage. As we remarked earlier, educ might not be independent of u. For now, if we assume that it is independent of u, we can argue that a one-unit increase in educ leads to a β1 increase in wage. If the econometrician is not happy with this property, he can use other functional forms depending on the analysis.

Until now, although we mentioned that u is a random error term, we did not make any statement about the distributional properties of u. Our first assumption about u concerns its conditional expectation: we assume that the expected value of u conditional on x is zero, i.e., E(u|x) = 0. Remember that the density function of u conditional on x is defined as f(u|x) = f(u, x)/f(x). Note that E(u|x) is a function of x, and by our assumption that E(u|x) = 0 we have

E(u) = E[E(u|x)] = ∫∫ u f(u|x) f(x) du dx = ∫∫ u f(u, x) du dx = ∫ u [∫ f(u, x) dx] du = ∫ u f(u) du = 0.

Hence we assume that this conditional expectation function is always equal to zero regardless of the outcome x,
which implies that E(u) = 0. For example, assume that u is the same as innate ability. Then E(abil|educ = 8) denotes the expected ability for a person with 8 years of education, and E(abil|educ = 16) denotes the expected ability for a person with 16 years of education. Hence our assumption that E(u|educ) = 0 implies that, regardless of the education level, the expected ability will be equal to zero. Here, remember our remark that on average individuals with higher ability levels might be getting more education, because it is less costly for them to get more education. If we believe that this is the case, then the conditional expectation of ability would depend on the education level, and our assumption would be violated.

A simple way to understand the conditional expectation is as follows. One can consider x in the conditional expectation as a non-random variable. More precisely, x is the observed value of a random variable X, implying that we know x. Hence anything that works for a usual expectation works for a conditional expectation if we consider x as non-random. For example,

E(y|x) = E(β0 + β1 x + u | x) = β0 + β1 x + E(u|x) = β0 + β1 x,

where the last equality follows because we assume that E(u|x) = 0. Here we abused our notation: E(Y|X = x) would be a more precise notation, since we are taking the expectation of a random variable Y conditional on X = x. From Z = E(Y|X = x) it is trivial that Z is not a random variable but a function of the observed value of X, i.e., of x; in contrast, E(Y|X) is a random variable and we can talk about its expectation, E(Y) = E[E(Y|X)]. In what follows we will be making this abuse of notation repeatedly, i.e., we will use small letters for both the random variable and its observed sample value unless this leads to confusion. Finally, as you can see, for our simple linear regression model E(y|x) is a linear function of x.

2.2 Derivation of OLS Estimates

Let {(yi, xi)}, i = 1, ..., n, be a random sample of size n from the population. Assume that there is a linear relationship between y and x, described by the following equation:

yi = β0 + β1 xi + ui,   (7)

where ui denotes the error term that
includes all the factors affecting yi other than xi. The parameters β0 and β1 are unknown to the econometrician, and the econometrician is interested in estimating these two parameters. As we mentioned in the preceding section, we do not want x and u to be correlated. Hence we assume that Cov(x, u) = 0. Since Cov(x, u) = E(xu) − E(x)E(u) and E(u) = 0, we have E(xu) = 0. Moreover, the expected value of the error term is assumed to be equal to zero. These two assumptions, sometimes called moment conditions, will constitute our estimator. Remember that expectations are related to the population, whereas averages are related to the sample. Since in real life all we have is the sample observations of the random variables, for estimation purposes we use the sample counterparts of the population moment conditions. Hence the estimators of β0 and β1 are given by the solution of the following equation system:

(1/n) Σ_{i=1}^n (yi − β̂0 − β̂1 xi) = 0,   (8)

(1/n) Σ_{i=1}^n xi (yi − β̂0 − β̂1 xi) = 0.   (9)

From Equation (8) we have β̂0 = ȳ − β̂1 x̄, where ȳ and x̄ denote the sample averages. After plugging this solution into Equation (9), we get Σ_{i=1}^n xi (yi − ȳ + β̂1 x̄ − β̂1 xi) = 0. Hence Σ_{i=1}^n xi (yi − ȳ) = β̂1 Σ_{i=1}^n xi (xi − x̄), or, equivalently, Σ_{i=1}^n (xi − x̄)(yi − ȳ) = β̂1 Σ_{i=1}^n (xi − x̄)², implying that

β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²,

given that Σ_{i=1}^n (xi − x̄)² > 0. Note that β̂1 is nothing but the ratio of the sample covariance of x and y to the sample variance of x. So, shortly, β̂1 = sCov(x, y)/sVar(x), where sCov and sVar stand for the sample covariance and the sample variance (most of the time these are denoted s_xy and s_x²). This result is not surprising, as we have β1 = Cov(x, y)/Var(x) for the population if Cov(x, u) = 0 (here we abuse the notation by using x rather than X). To see this, note that y = β0 + β1 x + u. Hence Cov(y, x) = Cov(β0 + β1 x + u, x) = β1 Cov(x, x) + Cov(u, x) = β1 Var(x), so β1 = Cov(y, x)/Var(x), as wanted.

Another way to derive the same estimates is by minimizing the sum of squared errors. Hence the solution of the following problem gives the OLS estimates:

min over (β0, β1) of Σ_{i=1}^n (yi − β0 − β1 xi)².   (10)

The optimal values for β0 and β1 are denoted by β̂0 and β̂1. These values are exactly equal to the values that we derived from the moment conditions. From these estimates we can also predict u by ûi = yi − β̂0 − β̂1 xi.
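The moment-condition formulas just derived can be checked numerically. Below is a minimal Python sketch (illustrative only, not part of the course's Stata workflow; the four data points are made up so that they lie exactly on a line):

```python
# OLS for the simple regression y = b0 + b1*x + u, computed from the sample
# moment conditions: b1_hat = sCov(x, y) / sVar(x), b0_hat = ybar - b1_hat * xbar.
def ols_simple(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx                 # requires sample variation in x (sxx > 0)
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up data lying exactly on y = 1 + 2x, so OLS recovers the line exactly.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = ols_simple(x, y)
print(b0, b1)                      # -> 1.0 2.0

# The residuals u_hat_i = y_i - b0 - b1*x_i satisfy both moment conditions:
# sum(u_hat_i) = 0 and sum(x_i * u_hat_i) = 0 (up to floating-point error).
uhat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
print(round(sum(uhat), 10), round(sum(xi * ui for xi, ui in zip(x, uhat)), 10))
```

Note that the division by sxx is exactly where the requirement Σ(xi − x̄)² > 0 enters: without sample variation in x, the slope is not identified.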
Similarly, we can predict y for given x values; this is done to make out-of-sample inferences for y. The prediction of y is given by ŷ = β̂0 + β̂1 x. This equation gives the OLS regression line. The population counterpart of this is called the population regression line and is given by E(y|x) = β0 + β1 x.

Example 2.2. In this example we try to explain the annual salary of a CEO (salary, in thousands of dollars) by the average return on equity (roe) of the CEO's firm over the last three years and a constant term. If we assume linearity, the relationship is given by

salary = β0 + β1 roe + u.   (11)

The Stata code and output for the data set CEOSAL1 are given below.

* closes all the graphs; be careful about the underscore
graph drop _all
capture drop salaryhat uhat mroe msalaryhat
* summarizes statistics of the given variables, in our case salary and roe
sum salary roe

    Variable |  Obs       Mean   Std. Dev.    Min     Max
      salary |  209    1281.12    1372.345    223   14822
         roe |  209   17.18421    8.518509    0.5    56.3

* regresses salary on roe and a constant; you can use "regress" instead of "reg"
reg salary roe

    Number of obs = 209, F(1, 207) = 2.77, Prob > F = 0.0978
    R-squared = 0.0132, Adj R-squared = 0.0084, Root MSE = 1366.6

      salary |     Coef.   Std. Err.     t    P>|t|    [95% Conf. Interval]
         roe |  18.50119   11.12325    1.66   0.098   -3.428196    40.43057
       _cons |  963.1913   213.2403    4.52   0.000    542.7902    1383.592

* generates a variable consisting of predictions of salary evaluated at the observed roe values
predict salaryhat, xb
* generates a variable consisting of predictions of u evaluated at the observed roe values
predict uhat, resid
* generates a variable equal to the mean of roe for all observations
egen mroe = mean(roe)
* generates the prediction of salary evaluated at the mean of roe;
* note that this is nothing but the mean of salary
gen msalaryhat = _b[_cons] + _b[roe]*mroe
* regresses salary only on roe; you can use "noconstant" instead of "noconst"
reg salary roe, noconst

    Number of obs = 209, F(1, 208) = 151.88, Prob > F = 0.0000
    R-squared = 0.4220, Adj R-squared = 0.4193, Root MSE = 1428.9

      salary |     Coef.   Std. Err.     t    P>|t|    [95% Conf. Interval]
         roe |  63.53796   5.155639   12.32   0.000    53.37395    73.70196

* plots a scatter diagram for salary and roe
graph twoway scatter salary roe, name(plot1)
* if you want to have more plots, just use parentheses as below;
* lfit plots the fitted values for y, i.e., the sample regression line
graph twoway (lfit salary roe) (scatter salary roe), name(plot2)
* lfitci adds confidence intervals to the lfit plots
graph twoway (lfitci salary roe) (scatter salary roe), name(plot3)

[Three figures: a scatter plot of 1990 salary (thousands) against roe; the same scatter with the fitted values (regression line); and the scatter with the fitted values and their 95% confidence interval.]

Warning: when you use the predict command, Stata uses the last regression. Be careful about this.

Now we will talk about the estimates and the tables. We have two regression results: one with the constant term and one without it. These models propose very different coefficients for roe. This suggests that we have to be careful about omitting the constant term; that is why most of the time econometricians include the constant term in their analysis unless economic theory says otherwise. From this point on we will talk about the model with the constant term. As can be seen, the table shows the coefficients, standard errors, t statistics, p-values, and 95% confidence intervals. Assume that the econometrician wants to be 95% confident that roe affects salary. Then he/she can simply look at the confidence interval announced in the table: if the interval does not include 0, then we can conclude with 95% confidence that roe affects salary. Note that the confidence intervals depend on the sample, the estimator, and the model that we are using. In general, the more variables we include in our model, the larger the confidence intervals get; this is the price of including more variables. The p-value gives another valuable piece of information for deciding whether a variable is significant, i.e., whether it
affects the dependent variable or not. The smaller the p-value is, the more we are tempted to reject the null hypothesis that the variable is not significant. In other words, a small p-value implies that the variable has a significant coefficient. Generally we use the 5% significance level (i.e., 95% confidence), and any p-value smaller than 0.05 means that the coefficient of the variable is significant. In our case the p-value of the constant term is 0.000. Hence at any conventional significance level (i.e., 1%, 5%, or 10%) the constant term is significant. This suggests that we should not omit the constant term. In any case, even if the constant term is not significant, it is safer not to omit it from our model unless economic theory states otherwise. By looking at the p-value we can conclude that the coefficient of roe is not significant at the 5% level, but it is significant at the 10% level. Another simple way to understand whether a variable has a significant coefficient is to look at the t-value: for the 5% significance level, any t-value larger than 1.96 (many times we simply use 2) leads to rejection of the null hypothesis that the regressor x has no effect on the regressand y. Hence this table provides a variety of ways of testing the significance of a parameter estimate. Later we will talk more about this table.

2.3 Properties of OLS

Theorem 2.3. Let ŷi = β̂0 + β̂1 xi, so that

yi = ŷi + ûi.   (12)

Then:

(a) Σ_{i=1}^n ûi = 0 and Σ_{i=1}^n xi ûi = 0,   (13)

(b) ȳ = β̂0 + β̂1 x̄,   (14)

(c) the sample average of the fitted values equals the sample average of the yi's, i.e., (1/n) Σ_{i=1}^n ŷi = ȳ.   (15)

Note that (a) follows trivially from the moment-condition derivation of the OLS estimates; of course, we can also derive these conditions from the residual-sum-of-squares minimization interpretation. From (a), (b) can be proved by replacing ûi with yi − β̂0 − β̂1 xi and taking the average of this expression. Finally, for (c) it is enough to see that yi = ŷi + ûi and Σ_{i=1}^n ûi = 0.

Definition 2.4. We define the total sum of squares (SST), the explained sum of squares (SSE), and the residual sum of squares (SSR) as follows:

SST = Σ_{i=1}^n (yi − ȳ)²,   (16)

SSE = Σ_{i=1}^n (ŷi − ȳ)²,   (17)

SSR = Σ_{i=1}^n ûi².   (18)

Theorem 2.5. The relationship between SST, SSE, and SSR is given by

SST = SSE + SSR.   (19)

To prove this, write

SST = Σ_{i=1}^n (ûi + ŷi − ȳ)² = Σ_{i=1}^n ûi² + 2 Σ_{i=1}^n ûi (ŷi − ȳ) + Σ_{i=1}^n (ŷi − ȳ)² = SSR + 2 Σ_{i=1}^n ûi (ŷi − ȳ) + SSE.   (20)

Why is Σ_{i=1}^n ûi (ŷi − ȳ) = 0? Here is the proof:

Σ_{i=1}^n ûi (ŷi − ȳ) = Σ_{i=1}^n ûi (β̂0 + β̂1 xi − ȳ) = (β̂0 − ȳ) Σ_{i=1}^n ûi + β̂1 Σ_{i=1}^n xi ûi = 0,   (21)

where both sums vanish by Theorem 2.3(a).

Definition 2.6. R-squared, or the coefficient of determination, is defined as

R² = SSE/SST = 1 − SSR/SST.   (22)

R² is the ratio of the explained variation to the total variation. Note that 0 ≤ R² ≤ 1 and that R² is equal to the square of the sample correlation coefficient between y and ŷ. The interpretation of R² is as follows: the independent variable x explains 100·R² percent of the variation in the dependent variable y. Hence the higher R² is, the better the model fits. But caution is needed when comparing two models with different dependent variables: whenever the two models at hand have different dependent variables, R² cannot be used to compare these two models in terms of goodness of fit.

2.4 Units of Measurement and Functional Form

In our econometric example we examined the relationship between CEO salary (salary) and return on equity (roe). It is important to note that in our analysis the units of measurement were important: we measured salary in annual thousands of dollars, and roe was measured in percentage points. If we change the units of measurement, then the OLS estimates change as well. Let salarydol be another variable that gives the annual salary of a CEO in dollars rather than thousands of dollars. Then salarydol = 1000·salary gives the relationship between the salary and salarydol variables. Assume that we used salarydol instead of salary. Then, since salary = β0 + β1 roe + u, we can see that salarydol = 1000·salary = 1000β0 + 1000β1·roe + 1000u = b0 + b1·roe + v, where b0 = 1000β0, b1 = 1000β1, and v = 1000u. Hence the true values of the parameters are multiplied by 1000. Similar to the true values, the estimates of these values are multiplied by 1000 as well. This trivially follows from the formulas for the OLS parameter estimates that we derived from the moment conditions. Similarly, if we use roedec = roe/100 as our regressor, then the coefficient estimate on roedec would be b̂1 = 100·β̂1. Finally, by using the formula for the goodness of fit R², one can check that R² does
not depend on the units of measurement. This is essentially because R² is calculated as a ratio of two variation measures, and the units cancel out during the division.

Another question about the regression analysis is: how good is the linearity assumption? Isn't it too restrictive? The simple answer to these questions is that it depends. Many times a simple linear model is fine for estimation purposes. Note that by linearity we do not mean linearity in the functional form of the variables; we rather mean linearity in the parameters. Fortunately,

y = β0 + β1 x + u,  y = β0 + β1 (1/x) + u,  ln y = β0 + β1 x + u,  y = β0 + β1 ln x + u,  y = β0 + β1 x² + u

are all linear models for estimation purposes. For example, letting ỹ = ln y and x̃ = ln x allows us to rewrite the equation ln y = β0 + β1 ln x + u as ỹ = β0 + β1 x̃ + u. Hence, as soon as we can transform y and x in such a way that we get a linear equation in the transformed variables, we are fine. Now consider the seemingly non-linear model y = exp(β0 + β1 x + u). A careful reader will notice that taking the logarithm of both sides of the equation leads us to a linear equation, i.e., ln y = β0 + β1 x + u. In order to get this model in linear form we need to assume a multiplicative error term rather than an additive one, i.e., y = exp(β0 + β1 x)·exp(u). In this example we also see that we do not need to assume an additive error term for seemingly non-linear models; the vital point is to get an additive error term after the linearization. As can be seen, the linearized version of the y = exp(β0 + β1 x)·exp(u) model is ln y = β0 + β1 x + u, which has an additive error term. Whenever we cannot make any obvious transformation of y and x to linearize the model, we can still estimate such genuinely non-linear models, but this (called NLS, Non-linear Least Squares) is more complicated.

Some of these functional forms have particular importance. For example, if we model the relationship between y and x as ln y = β0 + β1 ln x + u, then we can estimate the elasticity of y with respect to x, β1,
by β̂1. Similarly, the model ln y = β0 + β1 x + u allows us to estimate the percentage change in y in response to a unit change in x: a one-unit increase in x changes y by approximately 100·β1 percent.

Example 2.7. We continue our CEO salary example here. This time, rather than using the lin-lin model, we use a log-log model (remember that lin-lin stands for linear y and linear x, and log-log stands for log y and log x). Also, rather than using roe as an explanatory variable, let's use sales (annual sales in millions of dollars). We define the variables in the model as follows: lsalary = ln(salary) and lsales = ln(sales). The Stata output for the model lsalary = β0 + β1 lsales + u is given below.

reg lsalary lsales

    Number of obs = 209, F(1, 207) = 55.30, Prob > F = 0.0000
    R-squared = 0.2108, Adj R-squared = 0.2070, Root MSE = 0.50435

     lsalary |      Coef.   Std. Err.     t    P>|t|    [95% Conf. Interval]
      lsales |  0.2566717   0.0345167   7.44   0.000    0.1886224   0.3247209
       _cons |   4.821997   0.2883396  16.72   0.000    4.253538    5.390455

In contrast to our earlier model, in this model the p-values for both variables are very low, indicating that the coefficient estimates are significant at any conventional significance level. Of course, this implies that 0 does not lie in the confidence intervals and that the t-values are larger than 2 (remember that these are the conditions for the 5% significance test). In the table we see the R² value as well as the adjusted R² value; we will introduce the adjusted R² later. R² = 0.21, which is a relatively low number. This means that the variation in the explanatory variable ln(sales) explains 21% of the variation in the dependent variable ln(salary) (caution: not salary). In the social sciences sometimes even a very small R² might be good; hence it is really not very easy to evaluate the absolute meaning of R². The best way to evaluate R² is to gain some experience in a specific area, compare the R² at hand with earlier observed values, and make our judgments based on this experience. In any case, R² can be very useful even if we are not familiar with the standards of R² for similar
types of studies. Its power comes from the fact that we can compare different models for a specific data set, so that we can pick the best model among the candidates: basically, the model with the highest R² wins in terms of goodness of fit. Of course, goodness of fit is not the only factor determining which model we have to choose, but it is definitely one of the best tools to help pick the best model. As we mentioned earlier, we cannot use R² for comparing models with different dependent variables; e.g., we cannot compare the R²'s of y = β0 + β1 x + u and ln y = β0 + β1 x + u, but we can compare the R²'s of y = β0 + β1 x + u and y = β0 + β1 ln x + u.

2.5 Assumptions and Their Implications for the OLS Estimators

Before beginning this section, recall that all of y, x, and u are random variables in the model y = β0 + β1 x + u. In the OLS estimation we use the sample observations of these random variables. Hence whenever we talk about the population model, our variables are random variables and we can take their expectations; and whenever we talk about the sample, y and x stand for the observed values of the random variables y and x.

A1 (Linearity in Parameters). In the population model, the dependent variable y is related to the independent variable x and the error u as

y = β0 + β1 x + u.   (23)

For the advanced readers: the linearity assumption is important because OLS is an optimization problem. For the linear case, the first-order and second-order conditions of this optimization problem assure that the sum of squared errors has a unique minimum. For non-linear functional forms we cannot assure this; hence a special treatment is necessary in such a case.

A2 (Random Sampling). We have a random sample of size n, {(yi, xi) : i = 1, 2, ..., n}, following the population model in Equation (23). The importance of random sampling, and the solution to non-random sampling, will become clearer later.

A3 (Sample Variation in the Explanatory Variable). The sample outcomes on x, i.e., the xi's, are not all the same. This is a weak assumption. But in practice, even if the variation in x is not zero, too little variation
would cause a problem. This generally happens whenever we do not have enough data to identify any variation.

A4 (Zero Conditional Mean): $E(u|x) = 0$. As mentioned earlier, this assumption implies that $E(u) = 0$. In what follows, $\mathbf{x}$ denotes all the information about the $x$ variable, i.e., $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. An implication of this notation is that $E(u_i|\mathbf{x}) = 0$ implies, by the law of iterated expectations, that $E(u_i) = 0$. Also, whenever A2 holds, $E(u_i|x_i) = 0$ if and only if $E(u_i|\mathbf{x}) = 0$. Hence, practically, we will be using the $E(u_i|\mathbf{x}) = 0$ assumption.

Theorem 2.8: Assumptions A1-A4 imply that the OLS estimates are unbiased, i.e., $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$.

Note that, by (2.4),
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x}) y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum_{i=1}^n (x_i - \bar{x})^2} = \beta_1 + A_x \sum_{i=1}^n d_i u_i,$$
where $d_i = x_i - \bar{x}$ and $A_x = 1/\sum_{i=1}^n d_i^2$. Recall that $\hat\beta_1$ can also be written as $\mathrm{sCov}(x,y)/\mathrm{sVar}(x)$, where sCov and sVar stand for the sample covariance and sample variance (most of the time these are denoted by $s_{xy}$ and $s_x^2$). So $\hat\beta_1$ consists of two components: a fixed (non-random conditional on $\mathbf{x}$) component, $\beta_1$, and a random component conditional on $\mathbf{x}$, $A_x \sum_{i=1}^n d_i u_i$. Conditional on $\mathbf{x}$, the randomness of $\hat\beta_1$ is attributed to the $A_x \sum_{i=1}^n d_i u_i$ component. Let's calculate the conditional expectation of $\hat\beta_1$:
$$E(\hat\beta_1|\mathbf{x}) = \beta_1 + A_x \sum_{i=1}^n d_i E(u_i|\mathbf{x}) = \beta_1,$$
since we assumed that $E(u_i|\mathbf{x}) = 0$. Then, from a property of conditional expectation (the law of iterated expectations), we have $E(\hat\beta_1) = \beta_1$. Now consider the conditional expectation of $\hat\beta_0$:
$$E(\hat\beta_0|\mathbf{x}) = E(\bar{y} - \hat\beta_1 \bar{x}\,|\,\mathbf{x}) = \beta_0 + \beta_1\bar{x} + E(\bar{u}|\mathbf{x}) - E(\hat\beta_1|\mathbf{x})\bar{x} = \beta_0 + \beta_1\bar{x} + 0 - \beta_1\bar{x} = \beta_0.$$
Hence $E(\hat\beta_0) = \beta_0$. Thus we conclude that OLS gives unbiased parameter estimates.

The unbiasedness result depends on A1-A4, and if any one of these assumptions fails, it is not guaranteed. For example, assumption A2 is violated when we have sample selection. Consider a survey which asks individuals their wages, and assume that we want to use this survey for estimating the effect of some factors on salary. If the individuals with higher salaries refuse to provide this information, then there is sample selection. When making the estimation, we have more information than just observing $x$ and $y$: we further know that we do not observe higher levels of $y$. This implies
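The unbiasedness claim in Theorem 2.8 can also be checked numerically. The sketch below is illustrative only and is not from the notes (it uses Python rather than Stata, and the true parameter values are made up): it draws many random samples from a known population model with $E(u|x)=0$ and averages the OLS slope estimates across samples.

```python
import random

random.seed(0)
beta0, beta1 = 2.0, 0.5   # true population parameters (chosen for illustration)
n, reps = 50, 2000

slopes = []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    u = [random.gauss(0, 1) for _ in range(n)]          # E(u|x) = 0 by construction
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slopes.append(sxy / sxx)                            # OLS slope for this sample

avg_slope = sum(slopes) / reps
print(avg_slope)   # averages out close to the true beta1 = 0.5
```

Each individual $\hat\beta_1$ varies from sample to sample, but the average across replications settles near the true slope, which is exactly what unbiasedness says.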
that we have to make assumptions about the selection rule, i.e., about the distribution of $y$ given $\mathbf{x}$ and the fact that we only observe $y$ beyond some threshold $c$. This changes the moment conditions that we have to use for the estimation, which invalidates the usual OLS parameter estimates. For the time being, it is enough to know that when there is sample selection, we have a sample selection bias; don't worry about the underlying math. Later we will relax this assumption.

Example 2.9 (Data set: MEAP93.dta): Let math10 denote the percentage of tenth graders at a high school receiving a passing score on a standardized math exam. We want to examine the effect of the federally funded school lunch program on student performance. We expect the lunch program to have a positive effect, ceteris paribus, on performance. The data include information on Michigan high schools for the 1992-1993 school year. Let lnchprg denote the percentage of students who are eligible for the lunch program. The Stata estimates are given in the table below. Note that the coefficient of lnchprg has a negative sign, which is not in line with our expectations. The reason seems to be that we are omitting some variables which are correlated with lnchprg. For example, the school's quality and resources are contained in $u$, and these variables are likely to be correlated with the regressor. Hence, before jumping to conclusions about our models, we have to think twice.

. use MEAP93.DTA
. reg math10 lnchprg

      Source |       SS       df       MS              Number of obs =     408
-------------+------------------------------           F(  1,   406) =   83.77
       Model |  7665.26597     1  7665.26597           Prob > F      =  0.0000
    Residual |  37151.9145   406  91.5071786           R-squared     =  0.1710
-------------+------------------------------           Adj R-squared =  0.1590
       Total |  44817.1805   407  110.115923           Root MSE      =  9.5659

      math10 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnchprg |  -.3188643   .0348393    -9.15   0.000    -.3873523   -.2503763
       _cons |   32.14271   .9975824    32.22   0.000     30.18164    34.10378

A5 (Homoskedasticity): $\mathrm{Var}(u_i|x_i) = \sigma^2$. Note that $\mathrm{Var}(u_i|x_i) = E(u_i^2|x_i) - [E(u_i|x_i)]^2 = E(u_i^2|x_i)$ and $\mathrm{Var}(u_i) = E(u_i^2) - [E(u_i)]^2 = E(u_i^2)$. Hence this assumption implies that the unconditional variance is the same as the conditional variance. Note that $\mathrm{Var}(u_i) = \sigma^2$ does not imply that $\mathrm{Var}(u_i|x_i) = \sigma^2$. Let's
calculate the conditional variance of the OLS estimator $\hat\beta_1$:
$$\mathrm{Var}(\hat\beta_1|\mathbf{x}) = \mathrm{Var}\Big(\beta_1 + A_x \sum_{i=1}^n d_i u_i \,\Big|\, \mathbf{x}\Big) = A_x^2 \sum_{i=1}^n d_i^2\, \mathrm{Var}(u_i|\mathbf{x}) = A_x^2 \sigma^2 \sum_{i=1}^n d_i^2 = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}. \qquad (2.5)$$
Similarly,
$$\mathrm{Var}(\hat\beta_0|\mathbf{x}) = \frac{\sigma^2\, n^{-1}\sum_{i=1}^n x_i^2}{\sum_{i=1}^n (x_i - \bar{x})^2}. \qquad (2.6)$$
In what follows, unless otherwise stated, we drop the conditioning from the variance for the sake of notational simplicity. But bear in mind that, unless otherwise stated or clear from the context, whenever we say $\mathrm{Var}(\hat\beta_0)$ or $\mathrm{Var}(\hat\beta_1)$ we mean $\mathrm{Var}(\hat\beta_0|\mathbf{x})$ and $\mathrm{Var}(\hat\beta_1|\mathbf{x})$.

The implication of the above formula is that the larger the error variance $\sigma^2$ is, the larger the variance of $\hat\beta_1$ is. This is intuitive, since the larger the variance of $u$ is, the larger the uncertainty about $y$ is. The larger the variation in $x$ is, the smaller the variance of $\hat\beta_1$ is. This is also intuitive, since higher total variation in $x$ implies that we are collecting more and more information about the relationship between $y$ and $x$. Moreover, as the sample size increases, the total variation in $x$ increases, which makes the variance of $\hat\beta_1$ smaller. Note that equations (2.5) and (2.6) are valid only if we have homoskedasticity. Hence, whenever we do not have homoskedasticity (i.e., under heteroskedasticity), the calculated standard errors are invalid; in order to get correct standard errors, we somehow have to incorporate this into our model. Most of the time we are more interested in $\mathrm{Var}(\hat\beta_1)$ than in $\mathrm{Var}(\hat\beta_0)$.

Of course, we do not know $\sigma^2$, and we need to estimate it. First note that $\sigma^2 = E(u^2)$. We do not know $u$, but we know its estimates' observed values, i.e., the $\hat{u}_i$. We can use $\hat{u}_i$ instead of $u_i$ in order to get an estimate for $\mathrm{Var}(\hat\beta_1)$. Hence a candidate would be the sample counterpart of $E(u^2)$, i.e., $n^{-1}\sum_{i=1}^n \hat{u}_i^2$. The problem is that one can show that $E\big(\sum_{i=1}^n \hat{u}_i^2\big) = (n-2)\sigma^2$. This suggests that we can use
$$\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^n \hat{u}_i^2$$
as an unbiased estimator for $\sigma^2$. Here $n - 2$ is called the degrees of freedom, and 2 is the number of $\beta$ parameters. When we were talking about estimating the sample average, this number was equal to 1, as we were interested in only one parameter, i.e., $\beta_0$. After estimating $\sigma^2$, estimation of the variances of the $\beta$ parameters is trivial. One can calculate the
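These two estimation steps, $\hat\sigma^2 = \mathrm{SSR}/(n-2)$ and $\mathrm{se}(\hat\beta_1) = \hat\sigma/\sqrt{\sum_i (x_i-\bar{x})^2}$, can be sketched in a few lines. The data set below is made up purely for illustration (it is not from the notes), and Python stands in for Stata:

```python
import math

# small illustrative sample (made up for this sketch)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)

sigma2_hat = ssr / (n - 2)           # unbiased estimator of sigma^2, df = n - 2
se_b1 = math.sqrt(sigma2_hat / sxx)  # standard error of the slope, from eq. (2.5)
print(b1, se_b1)
```

Note that the residuals sum to zero by construction (the first OLS first-order condition), which is a handy internal check on any hand computation.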
corresponding standard errors by simply taking the square roots of the relevant variance estimates.

2.6 Examples

E1. Let kids denote the number of children ever born to a woman, and let educ denote years of education for the woman. A simple model relating fertility to years of education is
$$kids = \beta_0 + \beta_1 educ + u, \qquad (2.7)$$
where $u$ is the unobserved error.
(a) What kinds of factors are contained in $u$? Are these likely to be correlated with the level of education?
Answer: Income, age, and family background (such as number of siblings) are just a few possibilities. It seems that each of these could be correlated with years of education. Income and education are probably positively correlated; age and education may be negatively correlated, because women in more recent cohorts have, on average, more education; and number of siblings and education are probably negatively correlated.
(b) Will a simple regression analysis uncover the ceteris paribus effect of education on fertility?
Answer: Not if the factors we listed in part (a) are correlated with educ. Because we would like to hold these factors fixed, they are part of the error term. But if $u$ is correlated with educ, then $E(u|educ) \neq 0$, and so SLR.4 fails.

E2. The following table contains the ACT scores and the GPAs for eight college students. GPA is based on a four-point scale and has been rounded to one digit after the decimal.

Student   GPA   ACT
   1      2.8    21
   2      3.4    24
   3      3.0    26
   4      3.5    27
   5      3.6    29
   6      3.0    25
   7      2.7    25
   8      3.7    30

(a) Estimate the relationship between GPA and ACT using OLS; that is, obtain the intercept and slope estimates in the equation
$$\widehat{GPA} = \hat\beta_0 + \hat\beta_1 ACT. \qquad (2.8)$$
Comment on the direction of the relationship. Does the intercept have a useful interpretation here? Explain. How much higher is the GPA predicted to be if the ACT score is increased by 5 points?
Answer: Let $y = GPA$, $x = ACT$, and $n = 8$. Then $\bar{x} = 25.875$, $\bar{y} = 3.2125$, $\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = 5.8125$, and $\sum_{i=1}^n (x_i - \bar{x})^2 = 56.875$. We obtain the slope as $\hat\beta_1 = 5.8125/56.875 = 0.1022$, rounded to four places after the decimal. Then $\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = 3.2125 - 0.1022 \times 25.875 = 0.5681$. So we can write
$$\widehat{GPA} = 0.5681 + 0.1022\,ACT. \qquad (2.9)$$
The
intercept does not have a useful interpretation, because ACT is not close to zero for the population of interest. If ACT is 5 points higher, $\widehat{GPA}$ increases by $0.1022 \times 5 = 0.511$.
(b) Compute the fitted values and residuals for each observation, and verify that the residuals (approximately) sum to zero.

Student   GPA     fitted    residual
   1      2.8     2.7143     0.0857
   2      3.4     3.0209     0.3791
   3      3.0     3.2253    -0.2253
   4      3.5     3.3275     0.1725
   5      3.6     3.5319     0.0681
   6      3.0     3.1231    -0.1231
   7      2.7     3.1231    -0.4231
   8      3.7     3.6341     0.0659

You can verify that the residuals, as reported in the table, sum to $-0.0002$, which is pretty close to zero given the inherent rounding error.
(c) What is the predicted GPA when ACT = 20?
Answer: When ACT = 20, $\widehat{GPA} = 0.5681 + 0.1022 \times 20 \approx 2.61$.
(d) How much of the variation in GPA for these eight students is explained by ACT?
Answer: The sum of squared residuals, $\sum_{i=1}^n \hat{u}_i^2$, is about 0.4347 (rounded to four decimal places), and the total sum of squares, $\sum_{i=1}^n (y_i - \bar{y})^2$, is about 1.0288. So the R-squared from the regression is $R^2 = 1 - SSR/SST = 1 - 0.4347/1.0288 \approx 0.577$. Therefore, about 57.7% of the variation in GPA is explained by ACT in this small sample of students.

E3. Consider the savings function
$$S = \beta_0 + \beta_1 I + u, \qquad u = \sqrt{I}\,e,$$
where $S$ is savings, $I$ is income, and $e$ is a random variable with $E(e) = 0$ and $\mathrm{Var}(e) = \sigma_e^2$. Assume that $e$ is independent of $I$.
(a) Show that $E(u|I) = 0$, so that the key zero conditional mean assumption is satisfied.
Answer: When we condition on $I$ in computing an expectation, $I$ becomes a constant. So $E(u|I) = E(\sqrt{I}\,e\,|\,I) = \sqrt{I}\,E(e|I) = \sqrt{I}\cdot 0 = 0$, because $E(e|I) = E(e) = 0$ by independence.
(b) Show that $\mathrm{Var}(u|I) = \sigma_e^2 I$, so that the homoskedasticity assumption is violated. In particular, the variance increases with $I$.
Answer: Again, when we condition on $I$ in computing a variance, $I$ becomes a constant. So $\mathrm{Var}(u|I) = \mathrm{Var}(\sqrt{I}\,e\,|\,I) = I\,\mathrm{Var}(e|I) = \sigma_e^2 I$, because $\mathrm{Var}(e|I) = \sigma_e^2$.
(c) Provide a discussion that supports the assumption that the variance of savings increases with family income.
Answer: Families with low incomes do not have much discretion about spending; typically, a low-income family must spend on food, clothing, housing, and other necessities. Higher-income people have more discretion, and some might
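The arithmetic in Example E2 can be reproduced in a few lines. This sketch recomputes the slope, intercept, and R-squared from the eight (GPA, ACT) pairs (in Python, for illustration; the course itself uses Stata):

```python
# the eight observations from Example E2
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]
act = [21, 24, 26, 27, 29, 25, 25, 30]

n = len(gpa)
xbar, ybar = sum(act) / n, sum(gpa) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(act, gpa))   # 5.8125
sxx = sum((x - xbar) ** 2 for x in act)                        # 56.875

b1 = sxy / sxx          # slope, about 0.1022
b0 = ybar - b1 * xbar   # intercept, about 0.5681

ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(act, gpa))
sst = sum((y - ybar) ** 2 for y in gpa)
r2 = 1 - ssr / sst      # about 0.577
print(round(b1, 4), round(b0, 4), round(r2, 3))
```

The printed values match the hand computations in the example, confirming that the reported sums $s_{xy} = 5.8125$ and $s_{xx} = 56.875$ are consistent with the data table.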
choose more consumption while others more saving. This discretion suggests wider variability in saving among higher-income families.

3 Chapter 3

3.1 Multiple Regression Analysis

It is obvious that, much of the time, the simple linear regression model is not enough for explaining a given variable. There can be many relevant factors that affect the variable of interest, and omitting them might cause the so-called omitted variable bias. The simplest solution is including these variables in our model and using multiple regression analysis. Essentially, the logic of multiple regression analysis is exactly the same as that of simple regression analysis; hence, almost all of the properties that we mentioned earlier remain valid for the multiple regression analysis. But we will have some additional complications. Let's give an example of multiple regression analysis.

Example 3.1: Let's consider the effect of education (educ) and experience (exper) on wage:
$$wage = \beta_0 + \beta_1 educ + \beta_2 exper + u. \qquad (3.1)$$
Hence wage is determined by two independent variables, educ and exper, and unobserved factors, which are contained in $u$. Here $\beta_1$ measures the ceteris paribus effect of educ on wage. Mathematically, $\beta_1$ is equal to the partial derivative of wage with respect to educ, i.e., $\frac{\partial\, wage}{\partial\, educ} = \beta_1$. So what are the benefits of including the exper variable? First, we can explain the variation in wage better. Second, if educ and exper are correlated and we do not include exper among the explanatory variables (in other words, we leave it in the $u$ term), the OLS estimate for $\beta_1$ would be biased. In this equation we assume that $E(u|educ, exper) = 0$. Of course, we can still have some other factors in $u$ which are correlated with educ and/or exper.

Example 3.2: Another example of multiple regression analysis is
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u. \qquad (3.2)$$
Here we use multiple regression analysis to model $y$ as a non-linear function of the explanatory variable. Note that I do not mean that this is a non-linear regression model; this model is still a linear regression model (linear in the parameters). This can be seen by defining
$z \equiv x^2$. The difference between this model and the former one is that it is not possible to talk about the ceteris paribus effect of $x$ here, because we cannot fix $x^2$ while changing $x$. But we can still talk about the effect of a one-unit change in $x$ on $y$. In derivative terms, this can be approximated by $\frac{\Delta y}{\Delta x} \approx \beta_1 + 2\beta_2 x$.

The general multiple regression model is given by
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u.$$
As in the simple regression model, our key assumption for the multiple regression analysis is
$$E(u|x_1, x_2, \ldots, x_k) = 0. \qquad (3.3)$$
Again, this assumption implies that $E(u) = 0$. Remember that the expected value of $E(u|x_1, x_2, \ldots, x_k)$ is equal to $E(u)$, i.e., $E[E(u|x_1, x_2, \ldots, x_k)] = E(u)$. But we assume that $E(u|x_1, x_2, \ldots, x_k) = 0$; hence $E(u) = E[E(u|x_1, x_2, \ldots, x_k)] = E(0) = 0$. The $E(u|x_1, x_2, \ldots, x_k) = 0$ assumption also implies that $\mathrm{Cov}(u, x_j) = 0$ for $j = 1, 2, \ldots, k$. Hence, whenever we have an omitted variable, i.e., a variable left in the $u$ term that is correlated with $x_j$ for some $j$ (if the omitted factor is correlated with any one of the regressors), we have $\mathrm{Cov}(u, x_j) \neq 0$. This implies that $E(u|x_1, x_2, \ldots, x_k) = 0$ cannot be true, as otherwise we would have $\mathrm{Cov}(u, x_j) = 0$. Basically, from these arguments it should be clear that omitted variable bias is closely related to the violation of the $E(u|x_1, x_2, \ldots, x_k) = 0$ assumption.

Example 3.3: This example shows how to interpret the multiple regression coefficients. Consider the following model:
$$\ln(salary) = \beta_0 + \beta_1 \ln(sales) + \beta_2\, ceoten + \beta_3\, ceoten^2 + u. \qquad (3.4)$$
In this model, $y = \ln(salary)$, $x_1 = \ln(sales)$, $x_2 = ceoten$, and $x_3 = ceoten^2$. But how do we interpret the coefficients? What are the meanings of $\beta_1$, $\beta_2$, and $\beta_3$? We look at the ceteris paribus effects of the explanatory variables. In model (3.4), $\beta_1$ is the elasticity of salary with respect to sales. If $\beta_3 = 0$, then $100\beta_2$ is approximately the ceteris paribus percentage increase in salary when ceoten increases by one year. The $\beta_3 \neq 0$ case is more complicated, and we postpone it for the time being.

3.2 Calculating the OLS Estimates

As in the simple regression case, we can either derive the OLS estimates from the moment conditions or use the minimum sum of squared residuals interpretation. For the latter interpretation, the OLS
estimates are calculated as follows:
$$\min_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \hat\beta_2 x_{2i} - \cdots - \hat\beta_k x_{ki})^2. \qquad (3.5)$$
Hence we try to find $\hat\beta_0, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$ such that $\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \cdots - \hat\beta_k x_{ki})^2$ is minimized. The OLS residuals are then defined as
$$\hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \hat\beta_2 x_{2i} - \cdots - \hat\beta_k x_{ki},$$
so that $\hat{u}_i$ is the predicted value of the error term $u_i$; similarly, $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \cdots + \hat\beta_k x_{ki}$ is the predicted value of the dependent variable $y_i$. The line generated by the equation $\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \cdots + \hat\beta_k x_k$ is called the OLS regression line. As we did earlier, we can decompose the dependent variable into two parts, explained ($\hat{y}_i$) and unexplained, or residual ($\hat{u}_i$); that is, $y_i = \hat{y}_i + \hat{u}_i$.

Returning to the estimates, the first-order conditions for this optimization problem are given by
$$\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \cdots - \hat\beta_k x_{ki}) = 0,$$
$$\sum_{i=1}^n x_{1i}(y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \cdots - \hat\beta_k x_{ki}) = 0,$$
$$\sum_{i=1}^n x_{2i}(y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \cdots - \hat\beta_k x_{ki}) = 0,$$
$$\vdots$$
$$\sum_{i=1}^n x_{ki}(y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \cdots - \hat\beta_k x_{ki}) = 0. \qquad (3.6)$$
In system (3.6) we have $k+1$ linear equations and $k+1$ unknowns. We assume that this system has a unique solution, which we call the OLS estimates. Of course, this is simply an assumption, and if we are not careful in our modeling, it can easily be violated. These first-order conditions coincide with the sample counterparts of the following moment conditions: $E(u) = 0$ and $E(x_j u) = 0$ for $j = 1, 2, \ldots, k$. Here $E(x_j u) = 0$ is derived from the uncorrelatedness of $x_j$ and $u$.

Example 3.4: The variables in GPA1.dta include the college grade point average (colGPA), high school GPA (hsGPA), and achievement test score (ACT) for a sample of 141 students from a large university; both college and high school GPAs are on a four-point scale.

. cd C:\Documents and Settings\Owner\Desktop\Ch2 Stata
. use GPA1
. reg colGPA hsGPA ACT

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  2,   138) =   14.78
       Model |  3.42365506     2  1.71182753           Prob > F      =  0.0000
    Residual |  15.9824444   138  .115814814           R-squared     =  0.1764
-------------+------------------------------           Adj R-squared =  0.1645
       Total |  19.4060994   140  .138614996           Root MSE      =  .34032

      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsGPA |   .4534559   .0958129     4.73   0.000     .2640047    .6429071
         ACT |    .009426   .0107772     0.87   0.383    -.0118838    .0307358
       _cons |   1.286328   .3408221     3.77   0.000      .612419    1.960237

Note that I first told Stata which folder to use with the cd command; then I used the use command to load the data; finally, I used the reg command to run the regression (OLS). From this table we can find the OLS regression line to predict college GPA from high school GPA and achievement test score:
$$\widehat{colGPA} = 1.2863 + 0.4534\,hsGPA + 0.0094\,ACT. \qquad (3.7)$$
In this equation the intercept is 1.2863. This means that whenever $hsGPA = 0$ and $ACT = 0$, our prediction for colGPA, i.e., $\widehat{colGPA}$, would be equal to 1.2863. Of course, for this example the intercept is not very meaningful, as no one has a zero GPA. We can attribute this to the fact that the linear approximation gets worse as we move farther away from the means of the explanatory variables; in the end, this is only an approximation. In any case, many times the intercept does have a meaningful interpretation. As expected, both hsGPA and ACT have positive coefficients, indicating that, holding the other factor fixed, an increase in either variable leads to an increase in $\widehat{colGPA}$ (that is, we predict that colGPA will increase). Note that $\widehat{colGPA}$ is our prediction; in the real sample we might instead observe a decrease in colGPA. Now consider two students, A and B, with the same ACT scores, but where A's hsGPA is one point above B's. In this case we expect (predict) A's colGPA to be approximately 0.4534 higher than B's. This example shows the idea of changing one of the regressors while fixing the others; mathematically, we took the partial derivative of $\widehat{colGPA}$ with respect to hsGPA.

Sometimes we are interested in the effect of a change in more than one variable at the same time. Suppose that we want to predict the change in colGPA in response to a 1-point increase in hsGPA and a 5-point increase in ACT. Then
$$\Delta\widehat{colGPA} = 0.4534\,\Delta hsGPA + 0.0094\,\Delta ACT = 0.4534 \times 1 + 0.0094 \times 5 = 0.5004.$$
Hence we predict colGPA to increase by 0.5004 points.

Suppose we change the specification of our model to
$$\ln(colGPA) = \beta_0 + \beta_1 hsGPA + \beta_2 ACT + u. \qquad (3.8)$$
The corresponding OLS regression results are:

. gen lcolGPA = log(colGPA)
. reg lcolGPA hsGPA ACT
      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  2,   138) =   14.53
       Model |   .35798674     2   .17899337           Prob > F      =  0.0000
    Residual |  1.70020961   138   .01232036           R-squared     =  0.1739
-------------+------------------------------           Adj R-squared =  0.1620
       Total |  2.05819635   140  .014701403           Root MSE      =    .111

     lcolGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsGPA |   .1477234   .0312503     4.73   0.000     .0859322    .2095147
         ACT |   .0027982   .0035151     0.80   0.427    -.0041521    .0097486
       _cons |   .5398716   .1111623     4.86   0.000      .320070    .7596731

Note that with the gen command we first defined a new variable, $\ln(colGPA)$, equal to the natural logarithm of colGPA; then we regressed this variable on hsGPA and ACT with the reg command. In both of the regressions, the coefficient of ACT is not significant at any conventional significance level (1%, 5%, or 10%). We can easily see this by looking at the corresponding p-values in the tables: in both regressions the p-values are more than 5%. Hence we fail to reject the hypothesis that the coefficient of ACT is zero. Remember that failing to reject the null does not imply that the null is true; but throughout this course we will argue as if this were the case. This approach is common in econometric modeling, although it is questionable. Hence we conclude (maybe falsely) that there is no linear relationship between ACT and colGPA (or ln(colGPA)). In any case, when predicting the dependent variable we should still include the insignificant variables; we should not set them equal to zero just because they are not significant. But if the econometrician believes that (maybe because of the insignificant coefficient) ACT is not affecting colGPA or ln(colGPA), he/she can drop this variable from the model and get the following estimates:

. reg colGPA hsGPA

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  1,   139) =   28.85
       Model |  3.33506006     1  3.33506006           Prob > F      =  0.0000
    Residual |  16.0710394   139  .115618988           R-squared     =  0.1719
-------------+------------------------------           Adj R-squared =  0.1659
       Total |  19.4060994   140  .138614996           Root MSE      =  .34003

      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsGPA |   .4824346   .0898258     5.37   0.000     .3048330    .6600362
       _cons |   1.415434   .3069376     4.61   0.000     .8085635    2.022304

Note that while the coefficients of the constant and hsGPA are not the same as the ones from the model with ACT, they are not very different from them. Most of the time, omitting an insignificant variable does not cause too much change in the estimates of the other parameters. In order to see the reason, we need a better understanding of the OLS estimates of the coefficients. Consider the following (true) population model:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i. \qquad (3.9)$$
Now consider two regression models that try to explain this population model:
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i}, \qquad (3.10)$$
$$\tilde{y}_i = \tilde\beta_0 + \tilde\beta_1 x_{1i}. \qquad (3.11)$$
Since $\hat\beta_1$ gives the ceteris paribus effect of $x_1$ on $y$, when calculating it the effect of $x_2$ is partialled out. Hence $\hat\beta_1$ measures the effect of $x_1$ on $y$ that is not due to $x_2$. In order to find this effect, one can regress $x_1$ on the constant and $x_2$ and obtain the residuals $\hat{r}_{i1}$ from this regression; these residuals are free of the effect of $x_2$. Then, as in the simple regression,
$$\hat\beta_1 = \frac{\sum_{i=1}^n \hat{r}_{i1}\, y_i}{\sum_{i=1}^n \hat{r}_{i1}^2}.$$
On the other hand, $\tilde\beta_1 = \frac{\sum_{i=1}^n (x_{1i} - \bar{x}_1)\, y_i}{\sum_{i=1}^n (x_{1i} - \bar{x}_1)^2}$. Note that $x_{1i} - \bar{x}_1$ is nothing but the residual of regressing $x_1$ on the constant; hence, by demeaning $x_1$, we are partialling out the effect of the constant. It can be shown that the relationship between $\tilde\beta_1$ and $\hat\beta_1$ is given by $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$, where $\tilde\delta_1$ is the slope coefficient from the simple linear regression of $x_2$ on the constant and $x_1$. Hence $\tilde\beta_1 = \hat\beta_1$ only if either $\hat\beta_2 = 0$ or $\tilde\delta_1 = 0$. Since the coefficient of ACT was not significant, we failed to reject that $\beta_2 = 0$; because of this, we expected not much change in the estimate of $\beta_1$. We would also not see much change in the estimates if the correlation between hsGPA and ACT were low. Indeed, if the correlation were zero, the estimate of $\beta_1$ would be unaffected even if we omitted ACT. In practice there is always some correlation, and the real question is whether this correlation is statistically significant or not.

It can be seen that omitting the ACT variable decreased $R^2$. This is expected, because omitting ACT decreases the explanatory power of the regressors. Indeed, adding a new regressor always
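The partialling-out formula and the omitted-variable identity $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$ can be verified numerically. The sketch below uses made-up data (Python, for illustration; the data-generating process and variable names are my own, not from the notes): it computes the multiple-regression slopes via residual regressions and then checks the identity.

```python
import random

random.seed(1)
n = 200
# synthetic data: x2 is built to be correlated with x1
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + random.gauss(0, 1) for a in x1]
y  = [1.0 + 2.0 * a + 3.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def simple_slope(x, z):
    """Slope from OLS regression of z on x and a constant."""
    m = len(x)
    xbar, zbar = sum(x) / m, sum(z) / m
    return sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / \
           sum((xi - xbar) ** 2 for xi in x)

def residuals(x, z):
    """Residuals from OLS regression of z on x and a constant."""
    b = simple_slope(x, z)
    a = sum(z) / len(z) - b * sum(x) / len(x)
    return [zi - a - b * xi for xi, zi in zip(x, z)]

# partialling out: beta1_hat and beta2_hat from the two-regressor model
r1 = residuals(x2, x1)   # x1 with the effect of x2 (and the constant) removed
b1_hat = sum(r * yi for r, yi in zip(r1, y)) / sum(r * r for r in r1)
r2 = residuals(x1, x2)
b2_hat = sum(r * yi for r, yi in zip(r2, y)) / sum(r * r for r in r2)

b1_tilde = simple_slope(x1, y)   # short regression, x2 omitted
delta1   = simple_slope(x1, x2)  # slope of x2 on x1

# omitted-variable identity: b1_tilde = b1_hat + b2_hat * delta1
print(b1_tilde, b1_hat + b2_hat * delta1)
```

The two printed numbers agree to floating-point precision, because the identity is exact algebra for OLS, not an approximation; the gap between $\tilde\beta_1$ and the true $\beta_1 = 2$ is the omitted variable bias $\hat\beta_2 \tilde\delta_1$.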
increases $R^2$.

3.3 OLS Assumptions

A1 (Linear in Parameters): The population model can be written as
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u, \qquad (3.12)$$
where $\beta_0, \beta_1, \ldots, \beta_k$ are the unknown parameters of interest and $u$ is an unobservable random error, or disturbance, term.

A2 (Random Sampling): We have a random sample of $n$ observations, $\{(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i): i = 1, 2, \ldots, n\}$.

A3 (No Perfect Collinearity): In the sample (and in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables.

A4 (Zero Conditional Mean): $E(u|x_1, x_2, \ldots, x_k) = 0$.

Theorem 3.5: Under assumptions A1-A4, $E(\hat\beta_j) = \beta_j$ for any $j = 0, 1, 2, \ldots, k$, for any values of the population parameter $\beta_j$. In other words, the OLS estimators of the population parameters are unbiased.

Example 3.6: We provide some examples in order to clarify the meaning of perfect multicollinearity.
(a) Assume that assumption A3 does not hold. What would the consequence of this be? Consider the following model:
$$score = \beta_0 + \beta_1 studyd + \beta_2 studyw + u, \qquad (3.13)$$
where score is the score on an exam, and studyd and studyw are the number of days and weeks studied by students, respectively. Note that we can write $studyd = 7\,studyw$, implying that studyd and studyw contain the same information. It is impossible to identify $\beta_1$ and $\beta_2$ simultaneously; we can only determine the value of one of them given the value of the other. Hence, for identification purposes, we can just omit one of these variables, say studyw, by setting its coefficient equal to zero.
(b) Consider the following population model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u, \qquad (3.14)$$
where $x_1 = x_2 + x_3$. In this example we have a similar situation. To see this, note that $y = \beta_0 + \beta_1(x_2 + x_3) + \beta_2 x_2 + \beta_3 x_3 + u = \beta_0 + (\beta_1 + \beta_2)x_2 + (\beta_1 + \beta_3)x_3 + u$. Hence, again, A3 is violated.
(c) Consider the following population model:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u. \qquad (3.15)$$
This population model does not show perfect multicollinearity, although $x^2$ is a function of $x$, because the relationship between $x$ and $x^2$ is not linear.
(d) Consider the following population model:
$$\ln y = \beta_0 + \beta_1 \ln x + \beta_2 \ln x^2 + u. \qquad (3.16)$$
Although this example seemingly resembles (c), it is very different. Note that $\ln y = \beta_0 + \beta_1$
$\ln x + \beta_2 \ln x^2 + u = \beta_0 + (\beta_1 + 2\beta_2)\ln x + u$, since $\ln x^2 = 2\ln x$. Hence, here we observe perfect multicollinearity, due to the linear relationship between $\ln x$ and $\ln x^2$.
(e) If the sample size is very small, then we will have perfect multicollinearity. More precisely, if $n < k + 1$ (the number of parameters), then we will surely have perfect multicollinearity.

Most of the time, in sample data this kind of problem happens only because of mistakes made by the econometrician; hence, perfect multicollinearity is a sign of the econometrician's mistake. Later we will see that we still have some econometric problems if our sample shows near multicollinearity.

A5 (Homoskedasticity): $\mathrm{Var}(u|x_1, x_2, \ldots, x_k) = \sigma^2$. For the sake of notational simplicity, from this point on we denote the variables $x_1, x_2, \ldots, x_k$ by $\mathbf{x}$.

Theorem 3.7: Under Assumptions A1-A5, the variance of $\hat\beta_j$ conditional on $\mathbf{x}$ is given by
$$\mathrm{Var}(\hat\beta_j|\mathbf{x}) = \frac{\sigma^2}{SST_j(1 - R_j^2)} \quad \text{for } j = 1, 2, \ldots, k,$$
where $SST_j = \sum_{i=1}^n (x_{ji} - \bar{x}_j)^2$ is the total sample variation in $x_j$, and $R_j^2$ is the $R^2$ from regressing $x_j$ on all the other independent variables (including the intercept).

3.4 Including Irrelevant Variables and Excluding Relevant Variables

Let's consider what would happen to the bias of the parameter estimates if we include irrelevant variables. In order to see this, assume that the true model is given by
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \qquad (3.17)$$
with $\beta_2 = 0$. Since the econometrician does not know that $\beta_2 = 0$, he will include $x_2$ in his regression equation and get the following estimate:
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2. \qquad (3.18)$$
From Theorem 3.5 we know that $E(\hat\beta_0) = \beta_0$, $E(\hat\beta_1) = \beta_1$, and $E(\hat\beta_2) = \beta_2 = 0$. Hence our parameter estimates would still be unbiased. Note that the observed value of $\hat\beta_2$ would almost never be exactly equal to zero; but we know that its average across all random samples is equal to zero.

Now assume that $\beta_2 \neq 0$ and the econometrician estimates the following model:
$$y = \beta_0 + \beta_1 x_1 + v, \qquad (3.19)$$
where $v = \beta_2 x_2 + u$. Hence $\mathrm{Cov}(x_1, v) = \beta_2\,\mathrm{Cov}(x_1, x_2) + \mathrm{Cov}(x_1, u) = \beta_2\,\mathrm{Cov}(x_1, x_2) \neq 0$ unless $\mathrm{Cov}(x_1, x_2) = 0$ or $\beta_2 = 0$. But $E(v|x_1) = 0$ implies that $\mathrm{Cov}(x_1, v) = 0$; hence the $E(v|x_1) = 0$ assumption is violated, and we cannot assure the unbiasedness of our estimator.

We know that
adding new variables to our regression model leads to a higher $R^2$. But why don't we include as many variables as we can? The simple answer is that adding new variables increases the variance of the parameter estimates; hence there is a bias-variance trade-off. To see this, consider the following population model and two regressions:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u, \qquad (3.20)$$
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2, \qquad (3.21)$$
$$\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1. \qquad (3.22)$$
Then $\mathrm{Var}(\tilde\beta_1) = \frac{\sigma^2}{SST_1} \leq \frac{\sigma^2}{SST_1(1 - R_1^2)} = \mathrm{Var}(\hat\beta_1)$. Note that $R_1^2 = 0$ for $\tilde\beta_1$, as the $R^2$ would be zero if we regressed a variable on a constant alone. Moreover, $R_1^2$ is non-decreasing as we add more regressors; hence the variance would increase if we added more regressors.

3.5 Multicollinearity

We know that $\mathrm{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j(1-R_j^2)}$. Now we examine this variance.
(a) Effect of $\sigma^2$: A larger error variance in the population model leads to larger variance estimates for $\hat\beta_j$, i.e., higher uncertainty about the parameter estimates. This is because the noise in the population model makes it more difficult to estimate the partial effect of any of the independent regressors. Increasing the sample size will not affect $\sigma^2$. For a given dependent variable, there is only one way to reduce the error variance: adding more explanatory variables.
(b) Effect of $SST_j$: The larger the variation in $x_j$ is, the larger $SST_j$ is, implying a smaller $\mathrm{Var}(\hat\beta_j)$. When we add more observations, $SST_j$ cannot decrease, and generally it increases; hence $\mathrm{Var}(\hat\beta_j)$ decreases as $SST_j$ increases. This is sensible, because the higher the variation in $x_j$ is, the more information we collect about the relationship between $y$ and $x_j$. Using similar values of $x_j$ would not add much information.
(c) Effect of $R_j^2$: $R_j^2$ is the proportion of the variation in $x_j$ that is explained by the other independent variables. Consider the extreme case in which $x_j$ is linearly related to the other regressors: in that case $R_j^2$ would be equal to 1, which would violate Assumption A3 (no perfect collinearity). Similarly, whenever $R_j^2$ is high, the regression method finds it hard to identify the parameters, similar to the perfect multicollinearity case. Hence the variance of the relevant parameter
increases as $R_j^2$ increases, and converges to infinity as $R_j^2$ goes to 1 ($\sigma^2$ and $SST_j$ being held constant). High correlation between two or more independent variables is called multicollinearity. It means that the variance estimates for the parameters are very large, which leads to unstable (but still unbiased) parameter estimates: even a small change in the data, say adding one more observation, can lead to a substantial change in $\hat\beta_j$ if $R_j^2$ is high. This is essentially a small-sample problem, because $\mathrm{Var}(\hat\beta_j)$ decreases as the sample size increases. Hence, if we have enough data, this kind of instability can be prevented; of course, getting additional data might be costly, if not impossible. A possible attempt to solve multicollinearity is omitting one of the variables that are causing it, but many times this would lead to biased parameter estimates. As a rule of thumb, if the parameter estimates are significant, we can safely say that we do not have a multicollinearity problem. Another simple rule of thumb uses the variance inflation factor (VIF), which is obtained directly from the variance formula and is given by $VIF_j = \frac{1}{1 - R_j^2}$. If $VIF_j > 10$, we might suspect that multicollinearity is a problem.

4 Chapter 4

4.1 Sampling Distributions of the OLS Estimators

Although we provided a statistical primer on hypothesis testing and distributions, as well as ways to interpret the Stata outputs for parameter estimates, we did not specifically examine the distributions of the OLS estimates. Up to now we considered only two moments of the OLS parameter estimates' distributions: expectation and variance. While these two moments are enough to describe a normally distributed random variable, they are not enough to describe many other distributions. In this section we will add one more assumption to our multiple regression model.

A6 (Normality): The population error $u$ is independent of the explanatory variables $x_1, x_2, \ldots, x_k$ and is normally distributed with zero mean and variance $\sigma^2$: $u \sim N(0, \sigma^2)$.

Note that Assumption A6 implies
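The $R_j^2$ in the VIF formula is obtained by regressing $x_j$ on the remaining regressors. A sketch for the two-regressor case with made-up data (Python, for illustration; with only one other regressor, $R_1^2$ is simply the squared sample correlation between $x_1$ and $x_2$):

```python
import random

random.seed(2)
n = 500
# made-up regressors with built-in correlation
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.9 * a + random.gauss(0, 0.5) for a in x1]

def corr(a, b):
    """Sample correlation coefficient."""
    m = len(a)
    abar, bbar = sum(a) / m, sum(b) / m
    cov = sum((x - abar) * (y - bbar) for x, y in zip(a, b))
    va = sum((x - abar) ** 2 for x in a)
    vb = sum((y - bbar) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# with a single other regressor, R_1^2 = corr(x1, x2)^2
r2_1 = corr(x1, x2) ** 2
vif_1 = 1 / (1 - r2_1)
print(round(vif_1, 2))   # noticeably above 1: x1 and x2 overlap heavily
```

A VIF near 1 means $x_j$ carries mostly independent information; by the rule of thumb above, values beyond 10 would flag a multicollinearity problem.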
that $E(u|\mathbf{x}) = E(u) = 0$ and $\mathrm{Var}(u|\mathbf{x}) = \mathrm{Var}(u) = \sigma^2$. Independence implies that $E(u|\mathbf{x}) = E(u)$ and $\mathrm{Var}(u|\mathbf{x}) = \mathrm{Var}(u)$. Hence A6 implies A4 and A5. We still would like to include A4 and A5 in our assumption list, so as to remind us that A1-A4 imply unbiasedness of the OLS estimator; A1, A2, A3, and A6 would also imply unbiasedness, but that set of assumptions would be much stronger than A1-A4 for this purpose. We call Assumptions A1-A6 the classical linear model (CLM) assumptions. Under the CLM assumptions, the OLS estimates are the best unbiased estimators; i.e., their variance is the minimum among all unbiased estimators, including the non-linear ones. Remember that under the Gauss-Markov assumptions, the OLS estimators are the best unbiased estimators only among the linear estimators. Under the CLM assumptions,
$$y|\mathbf{x} \sim N(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,\ \sigma^2),$$
where $\mathbf{x}$ is a shorthand for $(x_1, x_2, \ldots, x_k)$.

So how good is this assumption? The central limit theorem (CLT) from statistics argues that if the sample size is large enough, then the error term $u$ is approximately normally distributed. Of course, this happens only if the assumptions of the CLT are satisfied. For example, it is an established finding that the wage, conditional on some relevant explanatory variables, is not normally distributed (probably due to minimum wage restrictions). In such cases, many times we can transform the model so as to get normally distributed error terms; for example, a log transformation of the wage variable is believed to solve the non-normality problem. Luckily, there are tests to determine whether the error terms are normally distributed or not, so many times it is a good idea to use these tests to make sure that the error term is normally distributed. Stata assumes normally distributed error terms when announcing standard errors; hence this is a crucial assumption in the regression analysis.

Theorem 4.1: Under the CLM assumptions A1-A6, conditional on the sample values of the independent variables,
$$\hat\beta_j \sim N\big(\beta_j,\ \mathrm{Var}(\hat\beta_j)\big), \qquad (4.1)$$
where $\mathrm{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j(1-R_j^2)}$. Therefore,
$$\frac{\hat\beta_j - \beta_j}{sd(\hat\beta_j)} \sim N(0, 1). \qquad (4.2)$$
We illustrate the above theorem in the simple linear regression framework.
Remember that, for the simple regression model, we showed that $\hat\beta_1 = \beta_1 + A_x \sum_{i=1}^n d_i u_i$, where $A_x = 1/\sum_{i=1}^n d_i^2$ and $d_i = x_i - \bar{x}$. We know that, conditional on $\mathbf{x}$, the $u_i$'s are normally distributed, and any linear combination of normally distributed random variables is also normally distributed. Hence, conditional on $\mathbf{x}$, $\hat\beta_1 = \beta_1 + A_x \sum_{i=1}^n d_i u_i$ is also normally distributed. Note also that, since the $\hat\beta_j$'s are normally distributed, any linear combination of them is also normally distributed; e.g., $\hat\beta_1 + \hat\beta_2 \sim N\big(\beta_1 + \beta_2,\ \mathrm{Var}(\hat\beta_1 + \hat\beta_2)\big)$.

4.2 Testing Hypotheses about a Single Population Parameter

The following theorem gives a generalization of the t test we mentioned in the appendix part. Earlier, we only tried to test whether the mean of a random variable is equal to a specific value. Now our purpose is testing whether a population parameter $\beta_j$ is equal to a hypothesized value $\bar\beta_j$. Note that we again have a t test here, because of the additional uncertainty introduced by not knowing the $\sigma^2$ term.

Theorem 4.2: Under the CLM assumptions A1-A6,
$$\frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k-1}, \qquad (4.3)$$
where $k+1$ is the number of parameters in the population model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$.

Theorem 4.2 enables us to test the equality of the $\beta_j$ parameters to certain values. The most widely used test is testing whether a parameter is equal to zero or not. The corresponding hypothesis test is given by
$$H_0: \beta_j = 0 \qquad \text{vs.} \qquad H_A: \beta_j \neq 0. \qquad (4.4)$$
In the regression analysis, $\beta_j \neq 0$ means that the particular variable has an effect on the dependent variable. This is generally what the econometrician wants, as many times our purpose is determining the factors affecting a variable and these factors' specific relationships with the variable of interest (the dependent variable). Rejecting the null hypothesis implies such a relationship. For the above hypothesis we use the t statistic (or t ratio):
$$t = \frac{\hat\beta_j}{se(\hat\beta_j)}. \qquad (4.5)$$
Of course, a one-sided test is similar to a two-sided test; for more information about one-sided tests, I direct you to your book or class notes. Here I want to make an important remark. The p-values provided by Stata
are for two-sided tests. But you can easily use the t values reported by Stata to do your one-sided test. Also, the confidence intervals are valid only for two-sided tests.

Example 4.3 Using WAGE1.dta:

. use WAGE1
. reg lwage educ exper tenure

  Number of obs = 526;  F(3, 522) = 80.39;  Prob > F = 0.0000
  R-squared = 0.3160;  Adj R-squared = 0.3121;  Root MSE = .44086
  Model SS = 46.8741776 (df 3, MS 15.6247259);  Residual SS = 101.455574 (df 522, MS .194359337);  Total SS = 148.329751 (df 525, MS .28253286)

     lwage |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
      educ |   .092029   .0073299   12.56   0.000    .0776292    .1064288
     exper |  .0041211   .0017233    2.39   0.017    .0007357    .0075065
    tenure |  .0220672   .0030936    7.13   0.000    .0159897    .0281448
     _cons |  .2843595   .1041904    2.73   0.007    .0796756    .4890435

This Stata output gives the two-sided tests, but we want to derive a one-sided test:

H_0: \beta_{exper} = 0, \quad H_A: \beta_{exper} > 0.

The degrees of freedom, n - k - 1 = 526 - 4 = 522, are given in the df column of the Stata output. Since this is large, we can use the standard normal critical value for the 5% significance level, i.e. 1.645 (note that this is a one-tailed test, hence the critical value is not 1.96). The corresponding t statistic is

t_{exper} = \frac{\hat{\beta}_{exper} - 0}{se(\hat{\beta}_{exper})} = \frac{.0041211}{.0017233} \approx 2.3914.

Since t_{exper} = 2.3914 > 1.645 = t_{critical}, we reject the null hypothesis and conclude that exper is a significant factor for wage. Note here that we used a log-lin model in order to get a normally distributed error term, so as to get valid standard errors. For the time being, do not worry about getting normally distributed error terms; just know that it is necessary for a valid inference analysis.

Although Stata already provides us the two-sided test information, let's test the two-sided version of the above hypothesis:

H_0: \beta_{exper} = 0, \quad H_A: \beta_{exper} \neq 0.

The t_{exper} value is invariant to whether we use a one-sided or two-sided test; hence t_{exper} = 2.3914. This time our critical value changes to 1.96. We still reject the null hypothesis, as you can see. This is in line with the corresponding confidence interval (0 is not in the interval).
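The arithmetic of these t tests is easy to reproduce outside Stata. A minimal Python sketch (the exper coefficient and standard error are copied from the output above; the one-sided p-value uses the standard normal approximation, which is reasonable with 522 degrees of freedom):

```python
import math

def t_stat(coef, se, hypothesized=0.0):
    # t ratio for H0: beta = hypothesized value
    return (coef - hypothesized) / se

def normal_sf(z):
    # P(Z > z) for a standard normal, via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

# exper coefficient and standard error from the Stata output above
t_exper = t_stat(0.0041211, 0.0017233)
print(round(t_exper, 4))             # about 2.39
print(t_exper > 1.645)               # one-sided rejection at the 5% level
print(round(normal_sf(t_exper), 4))  # one-sided p-value
```

Halving a two-sided p-value (or doubling the one-sided tail) recovers Stata's reported 0.017 up to rounding.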
The p-value announced in the Stata output (p = 0.017 < 0.05) leads to the same conclusion.

We can of course test other hypotheses. For example, let's assume that for some reason we want to test whether \beta_{tenure} = 0.02. We cannot see the direct answer to this question in the Stata output. The test we want to make is

H_0: \beta_{tenure} = 0.02, \quad H_A: \beta_{tenure} \neq 0.02.

The corresponding t value is given by

t = \frac{\hat{\beta}_{tenure} - \beta_{tenure}}{se(\hat{\beta}_{tenure})} = \frac{.0220672 - .02}{.0030936} \approx 0.668.

Since |0.668| < 1.96, we fail to reject the null hypothesis at the 5% level.

Homework: Read the p-value section of Chapter 4 from your textbook.

4.3 Confidence Intervals

Consider the population model y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k + u. Using the fact that (\hat{\beta}_j - \beta_j)/se(\hat{\beta}_j) \sim t_{n-k-1}, we can easily derive the corresponding confidence intervals for \beta_j. For example, a 100(1 - \alpha)% confidence interval for \beta_j is given by

\hat{\beta}_j \pm c_{n-k-1, 1-\alpha/2} \, se(\hat{\beta}_j),

where c_{n-k-1, 1-\alpha/2} is the (1 - \alpha/2) percentile of the t_{n-k-1} distribution. Since the confidence interval is the counterpart of a two-tailed hypothesis test, the corresponding critical values should be calculated in the same manner as for a two-tailed test. More precisely, for a 95% confidence interval we must use the critical value at the 97.5th percentile of the t distribution. Of course, if the sample size is large, then we can simply use the critical values of the normal distribution instead.

Example 4.4 Let n = 32 and consider the population model y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u. Let's say that the parameter estimates are \hat{\beta}_0 = 4.05, \hat{\beta}_1 = 0.4 and \hat{\beta}_2 = 1.05, and the corresponding standard errors are se(\hat{\beta}_0) = 1.01, se(\hat{\beta}_1) = 0.21 and se(\hat{\beta}_2) = 0.02. Then the 100(1 - \alpha) = 95% confidence interval for \beta_2 is given as follows:

\hat{\beta}_2 \pm c_{n-k-1, 1-\alpha/2} \, se(\hat{\beta}_2) = 1.05 \pm c_{29, .975}(0.02) = 1.05 \pm 2.045(0.02), \quad i.e. \quad [1.009, 1.091].

Remark: Construction of a confidence interval assumes not only unbiased parameter estimates for the \beta parameters but also correct estimates of the standard errors. We also need a distributional assumption for the error term; in our case we assume that the u term is normally distributed. Hence all of the CLM assumptions are necessary for the confidence interval construction. If any one of the assumptions is violated, then the confidence interval risks being invalid.
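Both the t test against a nonzero null and the confidence interval of Example 4.4 are one-line computations. A minimal Python sketch (the numbers are the ones used above; the critical value 2.045 = t_{29,.975} would normally come from a t table):

```python
def t_stat(coef, se, hypothesized=0.0):
    # t ratio for H0: beta = hypothesized value
    return (coef - hypothesized) / se

def conf_interval(coef, se, crit):
    # 100(1 - alpha)% confidence interval: coef +/- crit * se
    return (coef - crit * se, coef + crit * se)

# H0: beta_tenure = 0.02, using the wage regression estimates
t = t_stat(0.0220672, 0.0030936, hypothesized=0.02)
print(round(t, 3))   # about 0.668; |t| < 1.96, so do not reject

# Example 4.4: 95% CI for beta_2 with 29 degrees of freedom
lo, hi = conf_interval(1.05, 0.02, 2.045)
print(round(lo, 4), round(hi, 4))   # roughly 1.0091 and 1.0909
```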
Homework: Read the relevant part of Chapter 4 in the book. You are assumed to know how to calculate confidence intervals.

4.4 Testing Hypotheses about Linear Combinations of Parameters

In the earlier sections we talked about testing hypotheses about single parameters. It is perfectly valid to apply similar procedures for testing a hypothesis about a linear combination of parameters as well. The key point is to note that any linear combination of normally distributed random variables is also normally distributed. We already know that the \hat{\beta}_j's are normally distributed; hence their linear combinations should be normally distributed as well. Now let's illustrate how we can manage to test a hypothesis involving a linear combination of parameters.

Example 4.5 Consider the following regression model:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u.

Let's say that for some reason we want to test whether \beta_1 = \beta_2. More precisely, let H_0: \beta_1 = \beta_2 and H_A: \beta_1 < \beta_2. We can represent this null hypothesis as follows: H_0: \beta_1 - \beta_2 = 0 and H_A: \beta_1 - \beta_2 < 0. The corresponding t statistic for this hypothesis test is

t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{se(\hat{\beta}_1 - \hat{\beta}_2)}.

Now, if we define \theta = \beta_1 - \beta_2, we get H_0: \theta = 0 and H_A: \theta < 0. As you can see, this new hypothesis is no different from the t tests that we dealt with earlier. The only challenging part is that now we have to calculate the standard error of \hat{\theta}. We already know that, since \hat{\beta}_1 and \hat{\beta}_2 are normally distributed, \hat{\theta} = \hat{\beta}_1 - \hat{\beta}_2 is also normally distributed. The expectation of \hat{\theta} is E(\hat{\beta}_1 - \hat{\beta}_2) = \beta_1 - \beta_2. The variance of \hat{\theta} is

Var(\hat{\theta}) = Var(\hat{\beta}_1 - \hat{\beta}_2) = Var(\hat{\beta}_1) - 2Cov(\hat{\beta}_1, \hat{\beta}_2) + Var(\hat{\beta}_2).

Hence the standard error of \hat{\theta} is equal to \sqrt{Var(\hat{\beta}_1) - 2Cov(\hat{\beta}_1, \hat{\beta}_2) + Var(\hat{\beta}_2)}. We did not give the formula for Cov(\hat{\beta}_1, \hat{\beta}_2). While it is not very hard to calculate, there is an easier way to do the same test without calculating Cov(\hat{\beta}_1, \hat{\beta}_2). Here is the idea: we know that \theta = \beta_1 - \beta_2, hence we can replace \beta_1 with \theta + \beta_2. Then

y = \beta_0 + (\theta + \beta_2)x_1 + \beta_2 x_2 + \beta_3 x_3 + u = \beta_0 + \theta x_1 + \beta_2 (x_1 + x_2) + \beta_3 x_3 + u.

Thus our transformed model becomes y = \beta_0 + \theta x_1 + \beta_2 (x_1 + x_2) + \beta_3 x_3 + u.
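The substitution \beta_1 = \theta + \beta_2 is purely algebraic, so the transformed regression function is identical to the original one at every data point. A small Python check of that identity (the parameter values are made up for illustration):

```python
def original(x1, x2, x3, b0, b1, b2, b3):
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

def transformed(x1, x2, x3, b0, theta, b2, b3):
    # same function written in terms of theta = b1 - b2 and the regressor (x1 + x2)
    return b0 + theta * x1 + b2 * (x1 + x2) + b3 * x3

b0, b1, b2, b3 = 1.0, 0.5, 0.8, -0.3   # hypothetical parameter values
theta = b1 - b2
for x1, x2, x3 in [(1.0, 2.0, 3.0), (0.5, -1.0, 2.0), (4.0, 0.0, 1.0)]:
    assert abs(original(x1, x2, x3, b0, b1, b2, b3)
               - transformed(x1, x2, x3, b0, theta, b2, b3)) < 1e-12
```

Because the two functions coincide, regressing y on x1, (x1 + x2), x3 recovers \theta as the coefficient on x1, with a standard error Stata reports directly.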
From this point on, we simply use the usual hypothesis testing methods on the parameters of this model. In particular, H_0: \theta = 0 and H_A: \theta < 0 can be tested in this new model without any need for the knowledge of Cov(\hat{\beta}_1, \hat{\beta}_2).

An obvious extension of the above test is testing more than one linear restriction at the same time. The most frequent and simplest examples of such tests are exclusion tests. For example, assume that we want to test \beta_2 = 0 and \beta_3 = 0 at the same time. Of course the econometrician can test the significance of these parameters individually, but the joint test is the more proper way to do this if we are interested in omitting these two variables at the same time. The corresponding hypothesis can be described as follows: H_0: \beta_2 = \beta_3 = 0 and H_A: H_0 is not true. Note that this hypothesis test is different from H_0: \beta_2 - \beta_3 = 0: in the former we test \beta_2 = 0 and \beta_3 = 0, whereas in the latter we test \beta_2 = \beta_3. Of course the latter allows, say, \beta_2 = 5 and \beta_3 = 5, but the former does not.

An easy way to test our hypothesis H_0: \beta_2 = \beta_3 = 0 against H_A: H_0 is not true is comparing the SSRs of the restricted and unrestricted models. More specifically, let SSR_{ur} be the SSR from the unrestricted model y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u and SSR_r be the SSR from the restricted model y = \beta_0 + \beta_1 x_1 + u. Then define the F statistic as follows:

F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)},

where q is the number of restrictions (or df_r - df_{ur}) and n - k - 1 is the degrees of freedom of the unrestricted model. In order to implement the hypothesis test we need the distribution of the F statistic. Maybe not surprisingly, F \sim F_{q, n-k-1}. It is clear from the definition of the F statistic that whenever this statistic is larger than a critical value, we reject the null hypothesis.

Note that it is possible to observe individually insignificant variables although, as a group, these variables are jointly significant. This is a sign of multicollinearity. When we look at individual significance, multicollinearity makes the uncertainty about the estimate of each parameter in this group high; but as a group we can determine their joint effect more easily.
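The SSR form of the F statistic is mechanical to compute. A Python sketch (the SSR values, q, and degrees of freedom here are invented for illustration, not taken from any regression in these notes):

```python
def f_stat(ssr_r, ssr_ur, q, df_ur):
    # F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)]
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# hypothetical example: 2 exclusion restrictions, 120 df in the unrestricted model
F = f_stat(110.0, 100.0, 2, 120)
print(round(F, 4))   # (10/2) / (100/120) = 6.0
```

The statistic is then compared against the F_{q, n-k-1} critical value.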
Another way to calculate the F statistic for testing exclusion restrictions is using the R^2's of the restricted and unrestricted models. The F statistic in terms of R^2's is given by

F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)},

where R^2_{ur} and R^2_r are the R^2's from the unrestricted and restricted models, respectively. This follows from the facts that SSR_r = SST(1 - R^2_r) and SSR_{ur} = SST(1 - R^2_{ur}). Note that the R^2 version of the F statistic is valid only for exclusion restrictions, whereas the SSR version is valid in general. Since the general application of the F test is more complicated, we provide an example here.

Example 4.6 Consider the following model:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u.

Assume that the econometrician wants to test \beta_1 = 1, \beta_2 = 0, \beta_3 = 0 and \beta_4 = 0. Note that since this test includes \beta_1 = 1, it is not a pure exclusion test; hence the F statistic derived from R^2's is not valid. After applying the restrictions, the restricted model becomes y = \beta_0 + x_1 + u, or y - x_1 = \beta_0 + u. Hence we have to use the SSR from the regression of y - x_1 on a constant. The formula with R^2's is not valid here because the SSRs come from regression models with different dependent variables, and hence different SSTs.

5 Chapter 5

5.1 Consistency

Until now we considered the finite sample properties of OLS estimators, i.e. we did not require the sample size to be large. When we have a large sample, OLS has some additional nice properties. The purpose of this section is examining these properties. Before mentioning them, we give some of the important tools that will be useful for large sample analysis.

Definition 5.1 Given a sample Y_1, Y_2, ..., Y_n, let \hat{\theta} be an estimator of \theta. We say that \hat{\theta} is a consistent estimator of \theta if for every \epsilon > 0 we have

P(|\hat{\theta} - \theta| > \epsilon) \to 0 \quad as \quad n \to \infty.

If \hat{\theta} is not consistent, it is called inconsistent. When \hat{\theta} is a consistent estimator of \theta, we say that the probability limit of \hat{\theta} is equal to \theta; similar to the limit in calculus, this is denoted plim \hat{\theta} = \theta. Indeed, we can consider the consistency concept a probabilistic version of the limit concept from calculus.
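The R^2 version for exclusion restrictions is just as short. A Python sketch (again with invented R^2 values for illustration):

```python
def f_stat_r2(r2_r, r2_ur, q, df_ur):
    # valid only for exclusion restrictions, where both models share the same SST
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_ur)

# hypothetical: dropping 2 regressors lowers R^2 from 0.40 to 0.35, df_ur = 100
F = f_stat_r2(0.35, 0.40, 2, 100)
print(round(F, 3))   # about 4.167
```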
Intuitively, the consistency of \hat{\theta} means that as the sample size goes to infinity, the distribution of \hat{\theta} becomes more and more concentrated around \theta. Consistency is one of the minimal requirements in econometrics: if an estimator is not consistent, most of the time it is deemed useless. Essentially, inconsistency means that even if we have an infinite amount of data, our estimator is not close to the parameter of interest in a probabilistic sense. Note that, as we will see shortly, an unbiased estimator might be inconsistent, and a consistent estimator can be biased.

For at least one special case it is very easy to show the consistency of an estimator. Assume that \hat{\theta} is an unbiased estimator of \theta and Var(\hat{\theta}) \to 0 as n \to \infty; then it can be shown that \hat{\theta} is a consistent estimator of \theta. Consider \bar{Y} as an estimator of \mu, where the Y_i's are independently distributed with mean \mu. We showed that E(\bar{Y}) = \mu and Var(\bar{Y}) = \sigma^2/n. Hence \bar{Y} is not only an unbiased estimator of \mu, but its variance also converges to 0 as n goes to infinity. This implies that \bar{Y} is a consistent estimator of \mu.

Theorem 5.2 (Law of Large Numbers) For any sequence Y_1, Y_2, ..., Y_n of independently identically distributed random variables with mean \mu, we have plim \bar{Y} = \mu.

Property 5.3 Let f be a continuous function and \hat{\alpha} and \hat{\beta} be two estimators such that plim \hat{\alpha} = \alpha and plim \hat{\beta} = \beta. Then we have:
(a) plim f(\hat{\alpha}) = f(\alpha);
(b) plim(\hat{\alpha} + \hat{\beta}) = \alpha + \beta;
(c) plim(\hat{\alpha}\hat{\beta}) = \alpha\beta;
(d) plim(\hat{\alpha}/\hat{\beta}) = \alpha/\beta, provided \beta \neq 0;
(e) plim \alpha = \alpha, i.e. the plim of a non-random number is equal to the number itself.

For illustrative purposes we show that the OLS estimates of the \beta parameters are consistent. For simplicity, consider the simple regression model y = \beta_0 + \beta_1 x + u. Then

\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x}) y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = \beta_1 + \frac{n^{-1}\sum_{i=1}^n (x_i - \bar{x}) u_i}{n^{-1}\sum_{i=1}^n (x_i - \bar{x})^2},

implying that

plim \hat{\beta}_1 = \beta_1 + \frac{plim \, n^{-1}\sum_{i=1}^n (x_i - \bar{x}) u_i}{plim \, n^{-1}\sum_{i=1}^n (x_i - \bar{x})^2} = \beta_1 + \frac{Cov(x, u)}{Var(x)} = \beta_1 + \frac{0}{Var(x)} = \beta_1,

as was to be shown (the middle equality uses the law of large numbers together with Property 5.3, and Cov(x, u) = E(xu) - E(x)E(u) = 0 under A4'). The proof of the consistency of \hat{\beta}_0 follows from the consistency of \hat{\beta}_1 and the equality \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.
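Consistency of the OLS slope is easy to see in a small Monte Carlo: as n grows, \hat{\beta}_1 settles near the true \beta_1. A Python sketch (the data generating process is invented for illustration and satisfies A1-A4'):

```python
import random

def ols_slope(x, y):
    # simple-regression OLS slope: sum[(xi - xbar)(yi - ybar)] / sum[(xi - xbar)^2]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

random.seed(0)
beta0, beta1 = 1.0, 2.0
for n in (50, 500, 50000):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]
    print(n, round(ols_slope(x, y), 3))   # estimates concentrate around 2.0
```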
Remember that for the simple regression model, \hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^n \hat{u}_i^2 is an unbiased estimator of \sigma^2, while another natural estimator (maybe, at a first look, even more natural), \tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n \hat{u}_i^2, is a biased estimator of \sigma^2. For illustrative purposes we show that \tilde{\sigma}^2 is a consistent estimator of \sigma^2. We know that y = \beta_0 + \beta_1 x + u and y = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{u}. Since \hat{\beta}_0 and \hat{\beta}_1 are consistent estimators of \beta_0 and \beta_1, respectively, \hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i has the same plim as u_i. The proof of this statement is given as follows:

plim \hat{u}_i = plim(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = plim(\beta_0 + \beta_1 x_i + u_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = (\beta_0 - plim \hat{\beta}_0) + (\beta_1 - plim \hat{\beta}_1) x_i + u_i = u_i.

(Here, note that two random variables x and y are said to have the same plim when plim(x - y) = 0.) Then by the law of large numbers we have

plim \tilde{\sigma}^2 = plim \frac{1}{n}\sum_{i=1}^n \hat{u}_i^2 = plim \frac{1}{n}\sum_{i=1}^n u_i^2 = E(u^2) = Var(u) = \sigma^2.

Hence we found a biased but consistent estimator, namely \tilde{\sigma}^2.

Let Y_1, Y_2, ..., Y_n be a random sample with mean \mu. Consider Y_1 as an estimator of \mu. Definitely E(Y_1) = \mu, and thus Y_1 is an unbiased estimator of \mu. But even if the sample size goes to infinity, we do not have, for every \epsilon > 0, P(|Y_1 - \mu| > \epsilon) \to 0. The intuitive reason for this is that Y_1 does not utilize the larger sample size; it really does not matter for this estimator whether the sample size is infinite or not. The plim is a generalized version of the limit and is related to the convergence of functions rather than numbers. So for plim we are talking about convergence of a function, in some sense, to another function (remember that a random variable is nothing but a function). In this case, Y_1 is definitely not converging to the constant function \mu as the sample size goes to infinity.

For unbiasedness we needed Assumptions A1-A4. While A1-A3 are still needed for consistency, A4, i.e. E(u | x_1, ..., x_k) = 0, can be replaced with

A4': E(u) = 0 and Cov(u, x_j) = 0 for j = 1, 2, ..., k.

Note that A4' is a weaker assumption than A4 in the sense that A4 implies A4'. Question: Is s an unbiased estimator of \sigma? Is s a consistent estimator of \sigma?

For the simple linear regression case, the necessity of A4' for consistency can be seen from the following equality: plim \hat{\beta}_1 = \beta_1 if and only if Cov(x, u) =
0. (Here we abstract ourselves from the Var(x) = \infty case.)

5.2 Large Sample Inference

We know that under A6 the parameter estimates of the \beta parameters are normally distributed, because the u's are normally distributed. If the u's are not normally distributed, then the t and F statistics are not valid for an exact inference. Luckily, if the conditions for the central limit theorem (CLT) hold, we can still use t and F statistics to get an approximate inference. Below we remind you of the CLT for your convenience.

Theorem 5.4 (Central Limit Theorem) For any sequence Y_1, Y_2, ... of independently identically distributed random variables with mean \mu and variance \sigma^2, we have

\frac{\sqrt{n}(\bar{Y} - \mu)}{\sigma} \stackrel{a}{\sim} N(0, 1),

where \stackrel{a}{\sim} stands for "asymptotically distributed as".

Theorem 5.5 (Asymptotic Normality of OLS) Under Assumptions A1-A5 we have

\sqrt{n}(\hat{\beta}_j - \beta_j) \stackrel{a}{\sim} N(0, \sigma^2/a_j^2) \quad and \quad \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \stackrel{a}{\sim} N(0, 1),

where a_j^2 = plim(n^{-1}\sum_{i=1}^n \hat{r}_{ij}^2) and the \hat{r}_{ij} are the residuals from regressing x_j on the other independent regressors.

Note that since we are talking about an asymptotic distribution, we can replace the t distribution with the normal distribution, as we assume a large sample size. The smaller the sample size is, the worse the asymptotic approximation is. If u is normally distributed, this is not a problem for a small sample size and we can simply use the t_{n-k-1} distribution to make our inferences. But if the sample size is not large and u is not normally distributed, then our inferences using the t distribution would not be valid. Unfortunately, there is no general rule on how large the sample size should be to assure a good approximation; it really depends on the individual example at hand. But as a rule of thumb, n \geq 30 is generally thought to be OK. For your information, for some examples, especially those that involve non-linear models, this number can be much higher. Note that this theorem requires homoskedasticity of the error terms (Assumption A5). Fortunately, there are ways to deal with heteroskedasticity. Moreover, there are versions of the CLT which enable us to work with random
variables that are not identically distributed. For example, the Lyapunov CLT says that, under a condition called Lyapunov's condition, for a sequence of independent random variables Y_i with E(Y_i) = \mu_i and Var(Y_i) = \sigma_i^2,

\frac{1}{s_n}\sum_{i=1}^n (Y_i - \mu_i) \stackrel{a}{\sim} N(0, 1), \quad where \quad s_n^2 = \sum_{i=1}^n \sigma_i^2.

Now let's illustrate why the estimates of the variances of the parameter estimates get smaller as the sample size goes to infinity. We estimate this variance by

\widehat{Var}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{SST_j(1 - R_j^2)},

where \hat{\sigma}^2 is a consistent estimator of \sigma^2. Earlier we used \hat{\sigma}^2 = \frac{1}{n-k-1}\sum_{i=1}^n \hat{u}_i^2; similar to \tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n \hat{u}_i^2, it is easy to show that plim \hat{\sigma}^2 = \sigma^2. We know that (1 - R_j^2) converges in probability to a number between 0 and 1 as the sample size grows unboundedly. Finally, plim SST_j/n = Var(x_j) = \sigma_{x_j}^2 by the LLN, where \sigma_{x_j}^2 is the population variance of x_j. Hence SST_j \approx n\sigma_{x_j}^2, i.e. SST_j grows approximately at the same rate as n as the sample size grows. By combining these facts, we conclude that \widehat{Var}(\hat{\beta}_j) shrinks to zero at the same rate as 1/n. Hence the larger the sample size, the smaller our consistent estimates of Var(\hat{\beta}_j).

6 Chapter 6

6.1 Additional Remarks on Functional Forms

Example 6.1 Consider the following model:

ln y = \beta_0 + \beta_1 ln x_1 + \beta_2 x_2.

Earlier we learned that 100\hat{\beta}_2 is a good estimate of the percentage change in y in response to a 1 unit change in x_2, if the change is small. Of course, if the implied change in y is large, then this approximation, i.e. %\Delta y \approx 100 \Delta ln y, gets worse. Fortunately, we can calculate the exact value of this change by the following formula:

%\Delta \hat{y} = 100[\exp(\hat{\beta}_2 \Delta x_2) - 1].

In practice, many times we simply use the approximation; as a rule of thumb, a change in y of up to 5% is fine. Assume that \hat{\beta}_2 = .306 and we want to find the exact percentage change in y due to a 1 unit increase and decrease in x_2. Then 100[\exp(.306) - 1] = 35.8 and 100[\exp(-.306) - 1] = -26.4. Hence the corresponding percentage changes are 35.8% and -26.4%. Note that 100\hat{\beta}_2 = 30.6 lies between these two numbers in absolute value; indeed, this is always true. This is one of the reasons why the log-lin model is useful.
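The exact percentage-change formula from Example 6.1 in Python (using the \hat{\beta}_2 = .306 from the text):

```python
import math

def exact_pct_change(b, dx=1.0):
    # exact % change in y for a dx change in x2 when ln(y) = ... + b*x2
    return 100.0 * (math.exp(b * dx) - 1.0)

b2 = 0.306
print(round(exact_pct_change(b2, +1), 1))   # about 35.8
print(round(exact_pct_change(b2, -1), 1))   # about -26.4
print(round(100 * b2, 1))                   # 30.6, the approximation, lies in between
```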
Remark: In real data we sometimes have observations with value 0, which prevents us from taking the logarithm. A possible, but not the wisest, solution for this is replacing y with y + 1, i.e. using ln(y + 1). If the number of 0-valued observations is not large, then the percentage approximation still works fine.

Example 6.2 Consider the following model:

\widehat{wage} = 3 + .3 ex - .006 ex^2,

where wage = hourly wage and ex = years of experience. Remember that the effect of ex on wage is captured by a quadratic functional form. We expect this effect to be positive. But, for example, at ex = 26 we would predict that a unit increase in ex leads to a .3 - 2(.006)(26) = -.012 change in hourly wage. Many times this is not expected. If the number of observations satisfying this negative relationship is not large, this is still an acceptable model; but if this number is large, then that is a signal of a bad model.

Example 6.3 Consider the following example:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u.

The partial effect of x_1 on y can easily be calculated by taking the partial derivative of the above equation: \partial y/\partial x_1 = \beta_1 + \beta_3 x_2. Of course, \beta_1 alone does not show the ceteris paribus effect of x_1 on y in this setting. Now consider the following reparameterization of the above model:

y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 (x_1 - \mu_1)(x_2 - \mu_2) + u,

where \mu_1 and \mu_2 are the population means of x_1 and x_2. Note that

y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 (x_1 x_2 - \mu_2 x_1 - \mu_1 x_2 + \mu_1\mu_2) + u
  = (\alpha_0 + \alpha_3 \mu_1\mu_2) + (\alpha_1 - \alpha_3\mu_2) x_1 + (\alpha_2 - \alpha_3\mu_1) x_2 + \alpha_3 x_1 x_2 + u
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u.

Hence \alpha_1 = \beta_1 + \beta_3 \mu_2 (since \alpha_3 = \beta_3), which is nothing but the ceteris paribus effect of x_1 on y evaluated at x_2 = \mu_2. Of course we do not know \mu_1 and \mu_2; we can rather substitute \bar{x}_1 and \bar{x}_2 for \mu_1 and \mu_2, respectively. Hence, by the help of this regression, we can easily calculate whether x_1 has a significant effect on y or not. Without this transformation we would have to calculate se(\hat{\beta}_1 + \hat{\beta}_3 \bar{x}_2), which requires the knowledge of Cov(\hat{\beta}_1, \hat{\beta}_3).

6.2 Additional Remarks on Goodness of Fit

Let's define the population variances of y and u as \sigma_y^2 and \sigma_u^2, respectively. Then R^2 is an estimate of the population R-squared, given by \rho^2 = 1 - \sigma_u^2/\sigma_y^2. Remember that R^2 = 1 - SSR/SST. Here SSR/n and SST/n estimate \sigma_u^2 and \sigma_y^2, respectively.
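The recentering argument in Example 6.3 can be verified numerically: with \alpha_3 = \beta_3, \alpha_1 = \beta_1 + \beta_3\mu_2, \alpha_2 = \beta_2 + \beta_3\mu_1 and \alpha_0 = \beta_0 - \alpha_3\mu_1\mu_2, the two parameterizations give the same regression function everywhere. A Python check with made-up \beta's and means:

```python
b0, b1, b2, b3 = 1.0, 0.5, -0.2, 0.3   # hypothetical beta parameters
m1, m2 = 2.0, 4.0                      # hypothetical means of x1 and x2

a3 = b3
a1 = b1 + b3 * m2       # ceteris paribus effect of x1 evaluated at x2 = m2
a2 = b2 + b3 * m1
a0 = b0 - a3 * m1 * m2  # so the constant terms match

def original(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def recentered(x1, x2):
    return a0 + a1 * x1 + a2 * x2 + a3 * (x1 - m1) * (x2 - m2)

for x1, x2 in [(0.0, 0.0), (1.0, 3.0), (-2.0, 5.0)]:
    assert abs(original(x1, x2) - recentered(x1, x2)) < 1e-12
```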
But obviously these estimates are biased, though still consistent. Another estimate of \rho^2 would use unbiased estimates of \sigma_u^2 and \sigma_y^2. Starting from this idea, we define the adjusted R^2 as follows:

\bar{R}^2 = 1 - \frac{SSR/(n - k - 1)}{SST/(n - 1)}.

Note that \bar{R}^2 is also a consistent estimate of \rho^2. Unfortunately, similar to R^2, it is not an unbiased estimator, because the ratio of two unbiased estimators is not an unbiased estimator. But many times it is a better measure of goodness of fit. One of the most important reasons for this is that while R^2 monotonically increases as we add more regressors, \bar{R}^2 punishes additional regressors: \bar{R}^2 might decrease as a consequence of an additional variable. This makes it a very good candidate for choosing between models. Unlike R^2, \bar{R}^2 does not necessarily lie in the unit interval [0, 1] and can be negative. An interesting property of \bar{R}^2 is that adding a new variable increases \bar{R}^2 if and only if the corresponding t statistic on the new variable is greater than unity in absolute value. Similarly, adding a group of variables increases \bar{R}^2 if and only if the corresponding F statistic is greater than unity.

Example 6.4 Consider the following competing models:

ln y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u,
ln y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_4 + u.

We want to decide whether to include x_3 or x_4 in our model. One way to test this is using a super model that contains all the variables of both models and testing each model against the super model. The problem with this approach is that we can reject (or fail to reject) both models; this leads to inconclusive outcomes. Hence an F test might not help us. Fortunately, we can use \bar{R}^2 in this case (equivalently R^2, as the number of variables is the same for both models). The model with the higher \bar{R}^2 should be chosen for a better model fit. In econometrics we try to use the simplest model that explains the most; that is why we try to avoid adding many variables to our models. One of the limitations of \bar{R}^2 is that it cannot compare models with distinct dependent variables.
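Using the SSR and SST from the wage regression in Example 4.3 (SSR = 101.455574, SST = 148.329751, n = 526, k = 3), a Python sketch reproduces Stata's R^2 and adjusted R^2:

```python
def r2(ssr, sst):
    return 1.0 - ssr / sst

def adj_r2(ssr, sst, n, k):
    # k regressors, k + 1 estimated parameters
    return 1.0 - (ssr / (n - k - 1)) / (sst / (n - 1))

ssr, sst, n, k = 101.455574, 148.329751, 526, 3
print(round(r2(ssr, sst), 4))            # about 0.3160
print(round(adj_r2(ssr, sst, n, k), 4))  # about 0.3121, as in the Stata output
```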
For example, \bar{R}^2 cannot tell us which of the following models is better in terms of fit:

ln y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u,
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_4 + u.

7 Chapter 7

In econometrics we sometimes need to use qualitative variables in addition to quantitative variables. For example, we might need to describe gender in a numerical way. This is done by introducing a so-called dummy variable, which takes the value 1 for one of the genders (say, for female) and 0 for the other (say, male). Hence, whenever we see a 1 in the data, the corresponding observation belongs to a female.

Example 7.1 Consider the following model:

wage = \beta_0 + \beta_1 edu + \beta_2 female + u,

where wage = hourly wage, edu = years of education, and female = a dummy variable equal to 1 when the gender is female and 0 otherwise. Then

E(wage | edu = e, female = 1) = \beta_0 + \beta_1 e + \beta_2 + E(u | edu = e, female = 1) = \beta_0 + \beta_1 e + \beta_2,
E(wage | edu = e, female = 0) = \beta_0 + \beta_1 e + E(u | edu = e, female = 0) = \beta_0 + \beta_1 e.

Hence

E(wage | edu = e, female = 1) - E(wage | edu = e, female = 0) = (\beta_0 + \beta_1 e + \beta_2) - (\beta_0 + \beta_1 e) = \beta_2.

Therefore the coefficient of the dummy variable measures the wage difference between males and females for a given education level, i.e. when the level of education is the same for both genders. Here the base group (in our example, males) takes the value zero, and the coefficient of the dummy variable (or simply the dummy) gives the difference in the dependent variable between the base group and the other group. The simple difference between these groups is that each group is assumed to have its own intercept: the intercept for the regression line of males is \beta_0 and the corresponding intercept for females is \beta_0 + \beta_2. Note that we did not include two dummy variables, one for females and the other for males. The reason is simple: female + male = 1, implying that there would be perfect multicollinearity if we included both of the dummies in the model. This is one of the most widely encountered mistakes and requires some caution. If we drop the constant term, then we can avoid this multicollinearity.
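The intercept-shift logic of Example 7.1 takes a couple of lines of Python (the \beta values are hypothetical; the point is only that the male/female gap equals \beta_2 at every education level):

```python
def expected_wage(edu, female, b0=1.0, b1=0.08, b2=-0.25):
    # E(wage | edu, female) under the dummy-variable model (hypothetical betas)
    return b0 + b1 * edu + b2 * female

for edu in (8, 12, 16):
    gap = expected_wage(edu, 1) - expected_wage(edu, 0)
    assert abs(gap - (-0.25)) < 1e-12   # the gap is b2 regardless of edu
```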
Hence an alternative way to model wage is

wage = \beta_1 edu + \beta_2 female + \beta_3 male + u.

The interpretation of \beta_2 is NOT the same as in the earlier model, though. Also, there are issues in interpreting R^2 in this setting. Hence, in this course, most of the time we will include the constant term and drop one of the dummies.

Example 7.2 We want to see the effect of ownership of a computer on college GPA. For this purpose we use the data set GPA1.dta. The variables are colGPA = college GPA, hsGPA = high school GPA, ACT = ACT score, and PC = dummy for owning a computer (i.e. PC = 1 if the student owns a computer, PC = 0 otherwise). The following simple model is estimated by Stata:

. reg colGPA PC hsGPA ACT

  Number of obs = 141;  F(3, 137) = 12.83;  Prob > F = 0.0000
  R-squared = 0.2194;  Adj R-squared = 0.2023;  Root MSE = .33253
  Model SS = 4.25741863 (df 3, MS 1.41913954);  Residual SS = 15.1486808 (df 137, MS .110574313);  Total SS = 19.4060994 (df 140, MS .138614996)

    colGPA |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
        PC |  .1573092   .0572875    2.75   0.007    .0440271    .2705913
     hsGPA |  .4472417   .0936475    4.78   0.000    .2620603     .632423
       ACT |   .008659   .0105342    0.82   0.413   -.0121717    .0294897
     _cons |   1.26352   .3331255    3.79   0.000    .6047871    1.922253

As can be seen from the regression, the coefficient of PC is positive and significant. Hence we conclude that ownership of a computer increases college GPA by about .16 points. The other variables are used to control for other factors. Since ACT is not significant, dropping this variable probably would not change the coefficient estimate of PC.

Example 7.3 We want to see the effect of colonial style on house prices. For this purpose we use the data set HPRICE1.dta. The variables are self-explanatory, except colonial, which is the dummy variable for the house being of colonial style. The following simple model is estimated by Stata:

. reg lprice llotsize lsqrft bdrms colonial

  Number of obs = 88;  F(4, 83) = 38.38;  Prob > F = 0.0000
  R-squared = 0.6491;  Adj R-squared = 0.6322;  Root MSE = .18412
  Model SS = 5.20397919 (df 4, MS 1.3009948);  Residual SS = 2.81362433 (df 83, MS .033899088);  Total SS = 8.01760352 (df 87, MS .092156362)

    lprice |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
  llotsize |  .1678189   .0381807    4.40   0.000    .0918791
    .2437587
    lsqrft |  .7071931    .092802    7.62   0.000    .5226138    .8917725
     bdrms |  .0268305   .0287236    0.93   0.353   -.0302995    .0839605
  colonial |  .0537962   .0447732    1.20   0.233    -.035256    .1428483
     _cons | -1.349589    .651041   -2.07   0.041   -2.644483   -.0546947

We can see that the colonial variable does not have a significant coefficient at any conventional significance level. But for the sake of expositional purposes we ignore this fact for the moment. The coefficient of colonial is equal to .054, implying that being of colonial style increases the house price by approximately 5.4%.

Example 7.4 We want to see the effect of gender on wage. For this purpose we use the data set WAGE1.dta. The variables are self-explanatory, except female, which is the dummy variable for being female; lwage is the logarithm of wage, and expersq and tenursq stand for the squares of exper and tenure, respectively. The following simple model is estimated by Stata:

. reg lwage female educ exper expersq tenure tenursq

  Number of obs = 526;  F(6, 519) = 68.18;  Prob > F = 0.0000
  R-squared = 0.4408;  Adj R-squared = 0.4343;  Root MSE = .39978
  Model SS = 65.3791009 (df 6, MS 10.8965168);  Residual SS = 82.9506505 (df 519, MS .159827843);  Total SS = 148.329751 (df 525, MS .28253286)

     lwage |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
    female |  -.296511   .0358055   -8.28   0.000   -.3668524   -.2261696
      educ |  .0801967   .0067573   11.87   0.000    .0669217    .0934716
     exper |  .0294324   .0049752    5.92   0.000    .0196585    .0392063
   expersq | -.0005827   .0001073   -5.43   0.000   -.0007935   -.0003719
    tenure |  .0317139   .0068452    4.63   0.000    .0182663    .0451616
   tenursq | -.0005852   .0002347   -2.49   0.013   -.0010463   -.0001241
     _cons |   .416691   .0989279    4.21   0.000    .2223425    .6110394

The coefficient of female is significant; hence gender has an effect on wage. If we use the usual approximation, we conclude that women earn 29.7% less than men. This is too much for an approximation: remember that, as a rule of thumb, 5% is OK, but 29.7% is just too much. If we use the exact formula, the exact change is 100[\exp(\hat{\beta}_1) - 1] = 100[\exp(-.297) - 1] \approx -25.7%. As can be seen, the difference is not negligible. But, as we mentioned earlier, one can still use the approximation if he/
she does not want to be specific about the base group, because 29.7% is close to the average of the predicted percentage changes in wage for the two baseline scenarios (i.e. female as the base group or male as the base group).

Example 7.5 (Multiple categories) We still consider the above data set, but this time we would like to add one more category to our model: marital status. That means the number of groups is now four: married men, married women, single men, and single women. This means that if we have a constant term in the model, we have to drop one of these categories in order to avoid perfect multicollinearity. As the base group we choose single males and omit the corresponding dummy variable. The variables in the following regression are clear:

. gen marrmale = (1-female)*married
. gen marrfem  = female*married
. gen singfem  = female*(1-married)
. reg lwage marrmale marrfem singfem educ exper expersq tenure tenursq

  Number of obs = 526;  F(8, 517) = 55.25;  Prob > F = 0.0000
  R-squared = 0.4609;  Adj R-squared = 0.4525;  Root MSE = .39329
  Model SS = 68.3617623 (df 8, MS 8.54522029);  Residual SS = 79.9679891 (df 517, MS .154676961);  Total SS = 148.329751 (df 525, MS .28253286)

     lwage |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
  marrmale |  .2126757   .0553572    3.84   0.000     .103923    .3214284
   marrfem | -.1982676   .0578355   -3.43   0.001    -.311889   -.0846462
   singfem | -.1103502   .0557421   -1.98   0.048    -.219859   -.0008414
      educ |  .0789103   .0066945   11.79   0.000    .0657585     .092062
     exper |  .0268006   .0052428    5.11   0.000    .0165007    .0371005
   expersq | -.0005352   .0001104   -4.85   0.000   -.0007522   -.0003183
    tenure |  .0290875               4.30   0.000    .0158031    .0423719
   tenursq | -.0005331   .0002312   -2.31   0.022   -.0009874   -.0000789
     _cons |  .3213781    .100009    3.21   0.001    .1249041    .5178521

All the coefficients are significant at the 5% level. Since the base group is single males, the coefficient estimates on the dummy variables show the proportionate differences in wage relative to the base group (single males). For example, a married man is predicted to earn approximately 21.3% more than a single man. The effect of the base group is represented by the intercept term. Although these estimates are relative to the base group (single males),
we can use the estimates to see the relative differences between other groups as well. For example, married men are predicted to earn 21.3 - (-19.8) = 41.1% more than married women. But if we want to test whether this difference is significant or not, we need to know the correlation between the coefficients of the relevant dummy variables in addition to their standard errors. In such a case we can run another regression which enables us to test for such an effect directly. More specifically, in this regression we need to take one of the married male or married female groups as the base group. The corresponding regression results are:

. gen singmale = (1-female)*(1-married)
. reg lwage marrmale singmale singfem educ exper expersq tenure tenursq

  Number of obs = 526;  F(8, 517) = 55.25;  Prob > F = 0.0000
  R-squared = 0.4609;  Adj R-squared = 0.4525;  Root MSE = .39329
  Model SS = 68.3617623 (df 8, MS 8.54522029);  Residual SS = 79.9679891 (df 517, MS .154676961);  Total SS = 148.329751 (df 525, MS .28253286)

     lwage |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
  marrmale |  .4109433   .0457709    8.98   0.000    .3210234    .5008631
  singmale |  .1982676   .0578355    3.43   0.001    .0846462     .311889
   singfem |  .0879174   .0523481    1.68   0.094   -.0149238    .1907586
      educ |  .0789103   .0066945   11.79   0.000    .0657585     .092062
     exper |  .0268006   .0052428    5.11   0.000    .0165007    .0371005
   expersq | -.0005352   .0001104   -4.85   0.000   -.0007522   -.0003183
    tenure |  .0290875               4.30   0.000    .0158031    .0423719
   tenursq | -.0005331   .0002312   -2.31   0.022   -.0009874   -.0000789
     _cons |  .1231105   .1057937    1.16   0.245   -.0847279    .3309488

From the above table we see that the coefficient of marrmale is significantly different from zero at the 5% significance level for a two-tailed test. This implies that there is, at least statistically, a wage difference between married men and married women. A more proper test in this scenario would be a one-tailed test. Note that for a two-tailed test the p-value is given by P(|T| > |t|), i.e. P(T > |t|) + P(T < -|t|). For a one-tailed test, the p-value is either P(T > t) or P(T < t), depending on the direction of the inequality. In our example, P(T > 8.98) + P(T < -8.98) is equal to the two-tailed p-value. Due to the symmetry of the normal (or t)
distribution, P(T > 8.98) + P(T < -8.98) = 2 P(T > 8.98). So the one-tailed p-value P(T > t) is nothing but .5 × (two-tailed p-value). Similarly, P(T < t) is given by 1 - .5 × (two-tailed p-value).

Example 7.6 (Ordinal variables) Let's suppose that we want to test whether beauty has any effect on wage. In the BEAUTY.dta data set, the beauty of men and women is put into three categories: below average (0), average (1), and above average (2). In order to describe beauty we generated three variables: belavg, a dummy for below-average beauty; abvavg, a dummy for above-average beauty; and beau, a variable taking the value 0 for below-average beauty, 1 for average beauty, and 2 for above-average beauty. We announce two regression results using these variables:

. reg lwage belavg abvavg exper union goodhlth black married bigcity smllcity educ if female==0

  Number of obs = 824;  F(10, 813) = 26.97;  Prob > F = 0.0000
  R-squared = 0.2491;  Adj R-squared = 0.2399;  Root MSE = .47097
  Model SS = 59.8242916 (df 10, MS 5.98242916);  Residual SS = 180.336962 (df 813, MS .221816681);  Total SS = 240.161253 (df 823, MS .291811973)

     lwage |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
    belavg | -.1528579   .0531626   -2.88   0.004     -.25721   -.0485058
    abvavg | -.0057251   .0380387   -0.15   0.880   -.0803907    .0689405
     exper |  .0131521   .0014734    8.93   0.000    .0102599    .0160442
     union |  .1343289   .0365447    3.68   0.000    .0625958    .2060619
  goodhlth |  .0262149   .0702993    0.37   0.709   -.1117747    .1642044
     black |  -.317561   .0760118   -4.18   0.000   -.4667635   -.1683586
   married |   .132143   .0433694    3.05   0.002    .0470139    .2172724
   bigcity |  .2621237   .0469145    5.59   0.000    .1700359    .3542116
  smllcity |  .1264103   .0387992    3.26   0.001    .0502519    .2025688
      educ |  .0579269   .0066963    8.65   0.000    .0447828     .071071
     _cons |  .6000161   .1192465    5.03   0.000    .3659489    .8340833

. gen beau = 1 + abvavg - belavg
. reg lwage beau exper union goodhlth black married bigcity smllcity educ if female==0

  Number of obs = 824;  F(9, 814) = 29.29;  Prob > F = 0.0000
  R-squared = 0.2446;  Adj R-squared = 0.2362;  Root MSE = .47209
  Model SS = 58.7431451 (df 9, MS 6.52701612);  Residual SS = 181.418108 (df 814, MS .222872369);  Total SS = 240.161253 (df 823, MS .291811973)

     lwage |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
      beau |  .0523697   .0275334    1.90
0058 0016752 1064146 exper 0132591 0014761 898 0000 0103616 0161565 Lln39ion 1407697 0365146 386 0000 0690958 2124436 goodl l39ltl l 0224116 0704452 032 0750 1158641 1606873 b39lack 3119683 0761501 410 0000 461442 1624946 marr39ied 1364375 0434288 314 0002 0511919 221683 b39igc39ity 2690746 04692 573 0000 1769761 361173 sm39l39lc39ity 1327799 0387838 342 0001 0566519 2089079 educ 0579481 0067122 863 0000 0447728 0711234 CD39IS 5099939 1198672 425 0000 2747086 7452792 Note that the regression is done conditional on female 0 ie these regression results are for men In the rst table abuavg does not have a signi cant sign and economically the value of the coef cient is small Hence although it has a wrong sing this is negligible and is only inci dental But the coef cient of belavg variable is signi cant at any con ventional levels indeed if by a one tailed test we can see that it is signi cantly negative The corresponding p value is 00042 0002 In the latter table the coe icient of beau is not signi cant for 5 level for a two tailed test But it is signi cant for a one tailed test which is more proper for our example Now we ask ourselves whether there is a constant partial effect between groups That is does going from belavg to avg change the wage the same as going from avg to abuavg If this is the case then the second model would be preferred otherwise the rst 58 one would be preferred We can simply look at the R2 here We see that the rst model has a greater R2 Hence in terms of goodness of t the rst model is preferred We can also statistically test this by the F test For this purpose we need the SSR s from the unrestricted model 1 and restricted model 2 models as well as corresponding degrees of freedoms The F statistic for this test is given by 18142 7180341 F 48688 384F 1813 5 18034813 gt ml l 7 Note that the number of increments is equal to two and we want to test whether these increments are the same or not Hence there is only one restriction From the F test we 
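The SSR-based F statistic above can be verified in a few lines. A sketch using SciPy, with the SSRs read off the two tables (the result differs slightly from the 4.8688 in the text only because the text rounds the SSRs before dividing):

```python
from scipy import stats

# SSRs read off the two Stata tables above
ssr_ur, df_ur = 180.336962, 813   # unrestricted model (belavg and abvavg)
ssr_r = 181.418108                # restricted model (single beau variable)
q = 1                             # one restriction

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
p_value = stats.f.sf(F, q, df_ur)
print(round(F, 3), round(p_value, 4))
```

The p-value confirms rejection at the 5% level, matching the conclusion drawn below.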
From the F test, we conclude that the restricted model is rejected. Hence the F test agrees with the conclusion of the R² comparison. In case they do not agree, we can try examining the models more carefully and try to find more evidence supporting one of the models.

Example 7.7 In this example we will see how we can use dummy variables to allow different slopes for different groups. For example, suppose that we want to study whether the education level has the same effect on wage for men and women. More precisely, we want to examine whether the wage increment for men resulting from an increment in education is the same as that for women, at the same levels of education. To answer this question we have to compare the education slopes of men and women. One way to do this is to add the variable femeduc = educ × female to the model, as in the second model below.

. gen femeduc = female*educ
. reg lwage female educ exper expersq tenure tenursq

  Number of obs = 526   F(6, 519) = 68.18   Prob > F = 0.0000
  R-squared = 0.4408    Adj R-squared = 0.4343   Root MSE = .39978
  SS: Model 65.3791009 (df 6), Residual 82.9506505 (df 519), Total 148.329751 (df 525)

       lwage |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
      female |  -.296511   .0358055  -8.28   0.000   -.3668524   -.2261696
        educ |  .0801967   .0067573  11.87   0.000    .0669217    .0934716
       exper |  .0294324   .0049752   5.92   0.000    .0196585    .0392063
     expersq | -.0005827   .0001073  -5.43   0.000   -.0007935   -.0003719
      tenure |  .0317139   .0068452   4.63   0.000    .0182663    .0451616
     tenursq | -.0005852   .0002347  -2.49   0.013   -.0010463   -.0001241
       _cons |   .416691   .0989279   4.21   0.000    .2223425    .6110394

. reg lwage female educ femeduc exper expersq tenure tenursq

  Number of obs = 526   F(7, 518) = 58.37   Prob > F = 0.0000
  R-squared = 0.4410    Adj R-squared = 0.4334   Root MSE = .4001
  SS: Model 65.4081534 (df 7), Residual 82.921598 (df 518), Total 148.329751 (df 525)

       lwage |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
      female | -.2267886   .1675394  -1.35   0.176   -.5559289    .1023517
        educ |  .0823692   .0084699   9.72   0.000    .0657296    .0990088
     femeduc | -.0055645   .0130618  -0.43   0.670   -.0312252    .0200962
       exper |  .0293366   .0049842   5.89   0.000     .019545    .0391283
     expersq | -.0005804   .0001075  -5.40   0.000   -.0007916   -.0003691
      tenure |  .0318967    .006864   4.65   0.000     .018412    .0453814
     tenursq |   -.00059   .0002352  -2.51   0.012    -.001052     -.000128
       _cons |   .388806   .1186871   3.28   0.001    .1556388    .6219732

In the second model, the coefficient of the femeduc variable is insignificant at any conventional significance level. Moreover, it is economically small. Thus there is no evidence against the hypothesis that the return to education is the same for men and women. In the second regression, the coefficient of female is insignificant. The reason for this seems to be multicollinearity, because femeduc and female are highly correlated. In this regression the coefficient of female measures the wage difference when educ = 0; obviously there are not many observations close to this value. The implication is that we would not get very precise estimates, which is reflected in the standard error estimates. A better model might use the demeaned values of the education variable.

Example 7.8 (Testing difference between groups) Assume that we are interested in testing whether there is any difference between two groups, say men and women. Then we must use a model that allows the intercept and all the slopes to differ across the groups:

cumgpa = β0 + δ0 female + β1 sat + δ1 female × sat + β2 hsperc + δ2 female × hsperc + β3 tothrs + δ3 female × tothrs + u.

The δ parameters reflect the differences between groups. If δ0 = δ1 = δ2 = δ3 = 0, then we conclude that there is no difference between groups. This is definitely a testable hypothesis: all we need to do is calculate the corresponding F statistic.

. gen femsat = female*sat
. gen femhsperc = female*hsperc
. gen femtothrs = female*tothrs
. reg cumgpa female sat femsat hsperc femhsperc tothrs femtothrs

  Number of obs = 732   F(7, 724) = 35.15   Prob > F = 0.0000
  R-squared = 0.2537    Adj R-squared = 0.2464   Root MSE = .85907
  SS: Model 181.589407 (df 7), Residual 534.309148 (df 724), Total 715.898555 (df 731)

      cumgpa |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
      female |  1.113638    .528539   2.11   0.035    .0759859       2.151
         sat |  .0006113    .000235   2.60   0.009    .0001499    .0010727
      femsat |  .0011167      .0005   2.23   0.026    .0001351    .0020984
      hsperc | -.0059675   .0017765  -3.36   0.001   -.0094551   -.0024798
   femhsperc |  .0000508   .0041025   0.01   0.990   -.0080035     .008105
      tothrs |  .0103004   .0010928   9.43   0.000    .0081549    .0124459
   femtothrs |  .0055599   .0020696   2.69   0.007    .0014968     .009623
       _cons |  1.213984   .2648281   4.58   0.000    .6940617    1.733907
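Constructing interaction regressors such as femsat = female × sat and estimating the unrestricted model by OLS can be sketched with synthetic data (the data-generating coefficients below are made up for illustration, not taken from the course data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
female = rng.integers(0, 2, n).astype(float)
sat = rng.normal(1000.0, 100.0, n)
# interaction regressor, as in `gen femsat = female*sat`
femsat = female * sat
# hypothetical DGP: the slope on sat is .0011 higher for women
y = 1.2 + 1.1 * female + 0.0006 * sat + 0.0011 * femsat + rng.normal(0.0, 0.8, n)

X = np.column_stack([np.ones(n), female, sat, femsat])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates of (intercept, female, sat, femsat)
```

The last coefficient estimates the slope difference between the two groups, which is exactly what the δ parameters capture in the model above.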
. reg cumgpa sat hsperc tothrs

  Number of obs = 732   F(3, 728) = 74.72   Prob > F = 0.0000
  R-squared = 0.2354    Adj R-squared = 0.2323   Root MSE = .86711
  SS: Model 168.533658 (df 3), Residual 547.364897 (df 728), Total 715.898555 (df 731)

      cumgpa |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
         sat |  .0009028   .0002079   4.34   0.000    .0004947    .0013109
      hsperc | -.0063791   .0015678  -4.07   0.000   -.0094572   -.0033011
      tothrs |  .0119779   .0009314  12.86   0.000    .0101494    .0138064
       _cons |  .9291105   .2285515   4.07   0.000    .4804118    1.377809

The F statistic is

F = [(R²_ur − R²_r)/4] / [(1 − R²_ur)/724] = [(.2537 − .2354)/4] / [(1 − .2537)/724] ≈ 4.4383 > 2.37 ≈ F(0.05; 4, 724).

Hence the null is rejected.

Example 7.9 (Chow Test) Assume that there are two groups with sample sizes n1 and n2 (n = n1 + n2), and we want to test whether the intercept and all slopes are equal across the groups. Each group is described by the following model:

y = βg0 + βg1 x1 + βg2 x2 + ... + βgk xk + u,   (78)

for g = 1, 2. If we want to test the equality of these groups, then we have k + 1 restrictions. The unrestricted model is

y = β10 + δ0 G + β11 x1 + δ1 G x1 + β12 x2 + δ2 G x2 + ... + β1k xk + δk G xk + u,   (79)

where G is a group dummy that is equal to 0 for group 1 and equal to 1 for group 2. So if δ0 = δ1 = ... = δk = 0, then the two groups have the same coefficients. The unrestricted model has n observations and 2(k + 1) parameters; hence the corresponding degrees of freedom for this model is n − 2(k + 1). A key point in this test is that its SSR can be written as the sum of the SSRs from the two separate groups. More precisely, SSR = SSR1 + SSR2, where SSRg is the sum of squared residuals from the regression model for group g = 1, 2. Also let SSRP be the sum of squared residuals from the pooled (restricted) model. Then the desired F statistic is given by

F = [(SSRP − (SSR1 + SSR2))/(k + 1)] / [(SSR1 + SSR2)/(n − 2(k + 1))].
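The Chow F statistic can be wrapped in a small helper. The SSR inputs below are made-up numbers for illustration, not from the course data:

```python
from scipy import stats

def chow_F(ssr_pooled, ssr1, ssr2, n, k):
    """Chow F statistic for equality of the intercept and all k slopes
    across two groups; df = (k + 1, n - 2*(k + 1))."""
    q = k + 1
    df2 = n - 2 * (k + 1)
    ssr_ur = ssr1 + ssr2
    F = ((ssr_pooled - ssr_ur) / q) / (ssr_ur / df2)
    return F, stats.f.sf(F, q, df2)

# illustrative SSRs: two groups of 60 and 40 observations, k = 2 slopes
F, p = chow_F(ssr_pooled=120.0, ssr1=52.0, ssr2=48.0, n=100, k=2)
print(F, p)
```

Note that the pooled SSR can never be smaller than SSR1 + SSR2, so the numerator is non-negative by construction.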
8 Chapter 8

8.1 Heteroskedasticity-Robust Standard Errors

Remember that when the assumptions A1–A4 are satisfied, the OLS estimates are not only unbiased but also consistent. For these results to hold we did not need homoskedasticity. Similarly, R² and the adjusted R² are valid measures of goodness of fit even under heteroskedasticity. We know that if assumptions A1–A5 hold, then OLS is the best linear unbiased estimator (BLUE), i.e., the parameter estimates have the smallest variance among the class of linear unbiased estimators. Moreover, if A1–A6 hold, then OLS is the best estimator. Unfortunately, under heteroskedasticity these best-variance results do not hold. Also, since the standard error estimates are not valid, statistical tests and confidence interval calculations would not be valid either (e.g., the F test and the t test). Hence we have to find a way to solve the heteroskedasticity problem. One of the solutions is finding so-called heteroskedasticity-robust standard errors.

Now let's see why heteroskedasticity leads to inconsistent standard error estimates. Consider the simple linear regression model

y_i = β0 + β1 x_i + u_i.   (80)

We know that β̂1 = β1 + Σ(x_i − x̄)u_i / Σ(x_i − x̄)². Hence

Var(β̂1) = Σ(x_i − x̄)² σ_i² / [Σ(x_i − x̄)²]²,   (81)

where σ_i² = Var(u_i | x_i). If σ_i² = σ², then Var(β̂1) = σ² / Σ(x_i − x̄)², which is the formula provided earlier. Of course, under heteroskedasticity this equality does not hold, as is shown above. Fortunately, White (1980) showed that we can consistently estimate the standard errors by utilizing the following formula:

Var̂(β̂j) = Σ_i r̂_ij² û_i² / SSRj²,   (82)

where r̂_ij denotes the i-th residual from regressing x_j on all the other regressors, and SSRj is the sum of squared residuals from this regression. The square root of Var̂(β̂j) is called the heteroskedasticity-robust standard error (HRSE), or simply the robust standard error, for β̂j. The good thing about these estimates is that they are robust to the form of heteroskedasticity in large samples. After calculating the HRSEs, the heteroskedasticity-robust t statistic can be calculated as follows:

t = (β̂j − β_j0) / hrse(β̂j).   (83)

Example 8.1 As an example, we estimate the lwage equation with both the usual OLS standard errors and robust standard errors.
The second regression table gives the robust standard errors.

. use WAGE1
. reg lwage educ exper expersq tenure tenursq

  Number of obs = 526   F(5, 520) = 60.26   Prob > F = 0.0000
  R-squared = 0.3669    Adj R-squared = 0.3608   Root MSE = .42497
  SS: Model 54.4184647 (df 5), Residual 93.9112867 (df 520), Total 148.329751 (df 525)

       lwage |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
        educ |  .0845258   .0071614  11.80   0.000     .070457    .0985946
       exper |   .029301   .0052885   5.54   0.000    .0189115    .0396905
     expersq | -.0005918   .0001141  -5.19   0.000   -.0008159   -.0003677
      tenure |  .0371222   .0072432   5.13   0.000    .0228927    .0513517
     tenursq | -.0006156   .0002495  -2.47   0.014   -.0011056   -.0001255
       _cons |  .2015715   .1014697   1.99   0.048    .0022306    .4009124

. reg lwage educ exper expersq tenure tenursq, robust

  Linear regression   Number of obs = 526   F(5, 520) = 54.54
  Prob > F = 0.0000   R-squared = 0.3669    Root MSE = .42497

             |              Robust
       lwage |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
        educ |  .0845258   .0075047  11.26   0.000    .0697826    .0992691
       exper |   .029301   .0052419   5.59   0.000    .0190032    .0395988
     expersq | -.0005918   .0001126  -5.26   0.000   -.0008131   -.0003706
      tenure |  .0371222   .0077558   4.79   0.000    .0218857    .0523588
     tenursq | -.0006156   .0002877  -2.14   0.033   -.0011807   -.0000505
       _cons |  .2015715   .1036779   1.94   0.052   -.0021075    .4052506

As can be seen from the tables, the parameter estimates are the same, but the standard errors, t statistics, p-values, and confidence intervals differ, because all of these are related to the estimate of the variance of the coefficient estimates, Var̂(β̂j). In this example we do not even know whether there is heteroskedasticity or not; we simply computed the robust estimates. Comparing the standard error estimates across the two tables, we do not see a substantial difference. But in other examples it is possible to see a substantial difference, and it is particularly important to correct this misspecification in such cases. The robust standard errors can be either larger or smaller than the usual standard errors; in practice they are usually larger, as in our example.
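White's estimator (82) can equivalently be computed in "sandwich" form, (X'X)⁻¹ X' diag(û²) X (X'X)⁻¹. A sketch on simulated heteroskedastic data (not the WAGE1 data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.0, 10.0, n)
u = rng.normal(0.0, 1.0, n) * x        # heteroskedastic: error sd grows with x
y = 2.0 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
uhat = y - X @ beta

# White (1980) sandwich estimator: (X'X)^-1 X' diag(uhat^2) X (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * uhat[:, None] ** 2)
robust_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# usual OLS standard errors for comparison
sigma2 = uhat @ uhat / (n - 2)
usual_se = np.sqrt(np.diag(sigma2 * XtX_inv))
print(usual_se, robust_se)  # here the robust slope SE should be the larger one
```

With variance increasing in x, the usual formula understates the slope's sampling variance, which is why the robust standard error comes out larger, mirroring the pattern in the tables above.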
8.2 Testing Heteroskedasticity

Although we know that the robust standard errors, and the test statistics calculated from them, are robust to unknown forms of heteroskedasticity, we would still like to use the usual OLS standard errors if we know that the error term is homoskedastic. First, under homoskedasticity the t statistics have an exact t distribution under the CLM assumptions. Second, under heteroskedasticity OLS is not BLUE, and it is possible to obtain better estimates than OLS if the form of heteroskedasticity is known. Hence we now provide a test for heteroskedasticity. Consider the following model:

y = β0 + β1 x1 + β2 x2 + ... + βk xk + u,   (84)

where A1–A4 are assumed. Under the null hypothesis we assume that homoskedasticity holds; hence we try to see how well the data support this assumption. If we cannot reject the null, say at the 5% level, then we may conclude that heteroskedasticity is not a problem. (Remember that we only fail to reject the null.) If heteroskedasticity holds, then Var(u|x) can be a function of the regressors. A simple functional form can be defined as follows:

u² = δ0 + δ1 x1 + δ2 x2 + ... + δk xk + v,   (85)

where v is an error term with mean zero given the xj's. Our null hypothesis is H0: δ1 = δ2 = ... = δk = 0. Of course, we do not know u². Hence we use û² instead and do an F test for the joint significance of the δ's. That is,

û² = δ0 + δ1 x1 + δ2 x2 + ... + δk xk + v.   (86)

The corresponding F statistic is given by

F = (R²_û² / k) / [(1 − R²_û²)/(n − k − 1)].   (87)

Remember that the R² of the restricted model is zero, because it is a regression on a constant term only. That is why we only compute the unrestricted R² for this F test. Another test is the Breusch–Pagan test: LM = n·R²_û² ~ χ²_k. This test assumes that the error term u is normally distributed.

Summary of the Breusch–Pagan test:
1. Estimate model (84) and obtain the residuals û.
2. Get R²_û² from the OLS regression (86).
3. Using either the F or the LM statistic, test the joint significance of the regressors in (86) (excluding the constant term).
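The three steps above can be sketched directly (simulated data; in this simulation the error variance grows with x, so the test should reject):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(1.0, 5.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * x   # variance rises with x

# Step 1: OLS and residuals
X = np.column_stack([np.ones(n), x])
uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: regress squared residuals on the regressors and get R^2
u2 = uhat ** 2
g = X @ np.linalg.lstsq(X, u2, rcond=None)[0]
r2 = 1.0 - ((u2 - g) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

# Step 3: F and LM statistics, as in (87)
k = 1
F = (r2 / k) / ((1.0 - r2) / (n - k - 1))
LM = n * r2
p_F = stats.f.sf(F, k, n - k - 1)
p_LM = stats.chi2.sf(LM, k)
print(F, LM, p_F, p_LM)
```

Both versions of the test use the same auxiliary R², so they typically agree in large samples.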
Example 8.2 For this example we use HPRICE1.dta in order to test for heteroskedasticity in a simple housing price equation. The estimated regression model is given by:

. use HPRICE1
. reg price lotsize sqrft bdrms

  Number of obs = 88   F(3, 84) = 57.46   Prob > F = 0.0000
  R-squared = 0.6724   Adj R-squared = 0.6607   Root MSE = 59.833
  SS: Model 617130.701 (df 3), Residual 300723.805 (df 84), Total 917854.506 (df 87)

       price |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
     lotsize |  .0020677   .0006421   3.22   0.002    .0007908    .0033446
       sqrft |  .1227782   .0132374   9.28   0.000    .0964541    .1491022
       bdrms |  13.85252   9.010145   1.54   0.128   -4.065141    31.77018
       _cons | -21.77031   29.47504  -0.74   0.462   -80.38466    36.84405

In order to test for heteroskedasticity, we predict the error term and regress its square on the regressors:

. predict uhat, residuals
. gen uhat2 = uhat^2
. reg uhat2 lotsize sqrft bdrms

  Number of obs = 88   F(3, 84) = 5.34   Prob > F = 0.0020
  R-squared = 0.1601   Adj R-squared = 0.1301   Root MSE = 6616.6
  SS: Model 701213780 (df 3), Residual 3.6775e+09 (df 84), Total 4.3787e+09 (df 87)

       uhat2 |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
     lotsize |  .2015209   .0710091   2.84   0.006    .0603116    .3427302
       sqrft |  1.691037    1.46385   1.16   0.251   -1.219989    4.602063
       bdrms |   1041.76    996.381   1.05   0.299   -939.6526    3023.173
       _cons | -5522.795   3259.478  -1.69   0.094   -12004.62    959.0348

We can calculate the F statistic for testing H0: δ_lotsize = δ_sqrft = δ_bdrms = 0 (note that HA is trivial here) by

F = (R²_û² / 3) / [(1 − R²_û²)/(88 − 3 − 1)] = (.1601/3) / [(1 − .1601)/84] ≈ 5.34 > F(0.05; 3, 84) ≈ 2.72.

Hence we reject the null hypothesis of homoskedasticity. One can calculate the p-value for this test as .002. Indeed, the corresponding p-value is provided in the Stata output, shown as Prob > F; similarly, the relevant F statistic is given in the output table just below the "Number of obs" row.

Another possible test is the so-called White test. This test is almost the same as the Breusch–Pagan test; the only difference is that in the White test û² is regressed not only on the regressors but also on the squares and cross products of the regressors. For example, for y = β0 + β1x1 + β2x2 + β3x3 + u, the test is done with an F test using the following regression:

û² = α0 + α1x1 + α2x2 + α3x3 + α4x1² + α5x2² + α6x3² + α7x1x2 + α8x1x3 + α9x2x3 + v.   (89)

One of the possible problems with the White test is that it uses a lot
of degrees of freedom. In order to overcome this issue, a similar test can be proposed. Note that

ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂3x3.   (90)

This suggests that we can use the following regression for testing heteroskedasticity:

û² = α0 + α1ŷ + α2ŷ² + v.   (91)

Then one can use the usual F test in order to test for heteroskedasticity. You can install the White test with the following command line: ssc install whitetst, replace.

. ssc install whitetst, replace
  checking whitetst consistency and verifying not already installed...
  installing into c:\ado\plus\...
  installation complete.

. reg price lotsize sqrft bdrms
  (output identical to the price regression shown above)

. whitetst
  White's general test statistic: 33.73166   Chi-sq(9)   P-value = 1.0e-04

In the above output you can see how I installed the whitetst command in Stata. This command gives the relevant output for the White test. In our example, the p-value for the test suggests that we should reject the null hypothesis that the error term is homoskedastic.
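The simplified White test of (91) — regress û² on ŷ and ŷ² — can be sketched as follows (simulated heteroskedastic data; this is an illustration of the test's logic, not the whitetst command itself):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 800
x1 = rng.uniform(1.0, 4.0, n)
x2 = rng.uniform(0.0, 2.0, n)
y = 1.0 + x1 + 0.5 * x2 + rng.normal(0.0, 1.0, n) * x1   # heteroskedastic in x1

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
u2 = (y - yhat) ** 2

# auxiliary regression (91): uhat^2 on yhat and yhat^2
Z = np.column_stack([np.ones(n), yhat, yhat ** 2])
g = Z @ np.linalg.lstsq(Z, u2, rcond=None)[0]
r2 = 1.0 - ((u2 - g) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

F = (r2 / 2) / ((1.0 - r2) / (n - 3))   # only 2 restrictions, regardless of k
p = stats.f.sf(F, 2, n - 3)
print(F, p)
```

Whatever the number of original regressors, this version always spends only 2 degrees of freedom, which is exactly its advantage over (89).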
where h(x) > 0 is some function determining the functional form of the heteroskedasticity.

Example 8.3 Consider the following regression model:

y_i = β0 + β1x_i1 + β2x_i2 + ... + βkx_ik + u_i,   (93)

where x_i = (x_i1, ..., x_ik) and Var(u_i|x_i) = σ²h(x_i) = σ²h_i. In order to solve the heteroskedasticity problem we apply the following transformation to our model:

y_i/√h_i = β0(1/√h_i) + β1(x_i1/√h_i) + β2(x_i2/√h_i) + ... + βk(x_ik/√h_i) + u_i/√h_i,

i.e.,

ỹ_i = β0 x̃_i0 + β1 x̃_i1 + β2 x̃_i2 + ... + βk x̃_ik + ũ_i,   (94)

where x̃_ij = x_ij/√h_i for j = 0, 1, 2, ..., k (with x_i0 = 1). This method of transforming the error term into a homoskedastic error term is called WLS. The reason for this name is that in this method each observation is given a different weight in the minimization problem of OLS. More precisely, observations with higher error variances are given less weight, as they are less reliable. In the minimization problem of OLS, each squared residual is weighted by 1/h_i.

Example 8.4 Assume that assumptions A1–A4 hold. Let's remember our simple savings equation, say

sav_i = β0 + β1 inc_i + u_i,  u_i = √(inc_i) v_i,   (95)

where Var(v_i|x_i) = σ². In this example h(x) = h(inc) = inc. Since income is always positive, h(x) > 0, implying that the variance is guaranteed to be positive. The transformed model is

sav_i/√inc_i = β0(1/√inc_i) + β1 √inc_i + v_i.   (96)

Note that since the original model satisfies A1–A4, the transformed model also satisfies these assumptions. In contrast to the original model, the transformed model satisfies A5 (the homoskedasticity assumption) as well. Finally, if the original model satisfies A6 (i.e., normality of u), then the transformed model also satisfies this assumption. When interpreting this model we have to be careful: we use the transformed model just to get more efficient parameter estimates, but we are still interested in the original model. Hence the interpretation should be done using the original model. Note also that most of the time the R² from the transformed model does not have much meaning, as it shows how much the transformed regressors explain the transformed dependent variable. We are trying to explain the dependent variable, not a transformation of it.
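Example 8.4's transformation can be sketched numerically. Dividing every variable (including the constant) by √inc and running OLS is the same as weighted least squares with weights 1/inc (synthetic data with made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
inc = rng.uniform(1.0, 10.0, n)
# hypothetical savings DGP with Var(u|inc) = sigma^2 * inc
sav = 0.8 + 0.12 * inc + np.sqrt(inc) * rng.normal(0.0, 1.0, n)

# WLS by transformation: divide everything by sqrt(h) = sqrt(inc)
w = 1.0 / np.sqrt(inc)
Xt = np.column_stack([w, inc * w])       # (1/sqrt(inc), sqrt(inc)) as in (96)
beta_wls = np.linalg.lstsq(Xt, sav * w, rcond=None)[0]

# equivalently: minimize sum (1/h_i) * residual_i^2 directly
X = np.column_stack([np.ones(n), inc])
W = 1.0 / inc
beta_w = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (W * sav))
print(beta_wls, beta_w)   # the two estimates coincide
```

Both routes solve the same normal equations X'WX β = X'Wy, which is why the estimates agree to machine precision.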
Example 8.5 For this example we use a data set from the Stata data bank. In order to load this data we use the following command line:

. use http://www.ats.ucla.edu/stat/stata/ado/analysis/hetdata, clear

This is a data set from the http://www.ats.ucla.edu/stat/stata/ado/analysis/wls0.htm website. The OLS estimates and heteroskedasticity tests for our data set are given by:

. reg exp age ownrent income incomesq

  Number of obs = 72   F(4, 67) = 5.39   Prob > F = 0.0008
  R-squared = 0.2436   Adj R-squared = 0.1984   Root MSE = 284.75
  SS: Model 1749357.01 (df 4), Residual 5432562.03 (df 67), Total 7181919.03 (df 71)

         exp |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
         age | -3.081814   5.514717  -0.56   0.578   -14.08923    7.925606
     ownrent |  27.94091   82.92232   0.34   0.737   -137.5727    193.4546
      income |   234.347   80.36595   2.92   0.005    73.93593    394.7581
    incomesq | -14.99684   7.469337  -2.01   0.049    -29.9057    -.0879857
       _cons | -237.1465   199.3517  -1.19   0.238   -635.0541    160.7611

. hettest
  Breusch–Pagan / Cook–Weisberg test for heteroskedasticity
  Ho: Constant variance.  Variables: fitted values of exp
  chi2(1) = 29.23   Prob > chi2 = 0.0000

. whitetst
  White's general test statistic: 14.32896   Chi-sq(12)   P-value = .2802

One of the heuristic ways of seeing this is plotting the residuals as a function of one of the suspected regressors. Here we use the rvpplot and rvfplot commands for this purpose.

[rvpplot and rvfplot output omitted: the spread of the residuals increases with the fitted values, indicating heteroskedasticity]

Note that in our case the White test does not reject the null hypothesis of homoskedasticity. But from the plots above and a version of the Breusch–Pagan (BP) test we can clearly see that the residuals exhibit heteroskedasticity. In this version of the BP test, the squares of the residuals are regressed on the fitted dependent variable and a constant; hence the test involves only 1 degree of freedom. One of the reasons why the White test might have failed to reject homoskedasticity is that it considers the other variables and the cross-product terms as well. For our example, the White test involves a regression with 13 variables as
well as the constant term. Our regression model has a variable that is the square of another variable. Note that the χ² test is done with 12 degrees of freedom rather than 13 (13 is the number of variables excluding the constant). The reason for this is that Stata drops some of the variables whenever the degree of multicollinearity is high. If the variance depends on only one of these variables, the ability of this test to detect that kind of relationship might be limited, especially in a small-sample data set. For this reason we also check whether the variance depends on income alone. The following regression suggests that the variance can be a quadratic function of income. Here inc* tells Stata to include all the variables that start with inc; in our case these variables are income and incomesq.

. reg uhat2 inc*

  Number of obs = 72   F(2, 69) = 3.24   Prob > F = 0.0451
  R-squared = 0.0859   Adj R-squared = 0.0594   Root MSE = 2.7e+05
  SS: Model 4.7731e+11 (df 2), Residual 5.0774e+12 (df 69), Total 5.5547e+12 (df 71)

       uhat2 |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
      income |  170597.6   71938.88   2.37   0.021    27083.49    314111.8
    incomesq | -14156.29   6928.713  -2.04   0.045   -27978.69   -333.8828
       _cons | -303352.7   154970.9  -1.96   0.054   -612511.1    5805.709

The F test (F = 3.24, p-value = .0451) shows that income and incomesq (the square of income) are jointly significant at the 5% level. We also use the BP test to see whether income affects the variance; here only income is assumed to affect the variance. The test results are:

. quietly reg exp age ownrent income incomesq
. hettest income
  Breusch–Pagan / Cook–Weisberg test for heteroskedasticity
  Ho: Constant variance.  Variables: income
  chi2(1) = 14.94   Prob > chi2 = 0.0001

For the time being, let's say that the variance is proportional to income; hence we ignore the incomesq term for the variance. Then consider the OLS results for the normalized model:

. gen exps = exp/sqrt(income)
. gen ages = age/sqrt(income)
. gen ownrents = ownrent/sqrt(income)
. gen incomesqs = incomesq/sqrt(income)
. gen cs = 1/sqrt(income)
. gen incomes = income/sqrt(income)
. reg exps ages ownrents incomes incomesqs cs, noconstant

  Number of obs = 72   F(5, 67) = 15.84   Prob > F = 0.0000
  R-squared = 0.5418   Adj R-squared = 0.5076   Root MSE = 138.42
  SS: Model 1518004.77 (df 5), Residual 1283774.35 (df 67), Total 2801779.11 (df 72)

        exps |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
        ages | -2.935011   4.603331  -0.64   0.526    -12.1233    6.253275
    ownrents |  50.49365   69.87914   0.72   0.472    -88.9857     189.973
     incomes |  202.1694   76.78152   2.63   0.010    48.91284     355.426
   incomesqs | -12.11364    8.27314  -1.46   0.148   -28.62689    4.399621
          cs | -181.8706   165.5191  -1.10   0.276   -512.2481    148.5068

There is an easier way to get the same estimates:

. gen w_inc = 1/income
. reg exp age ownrent income incomesq [aw = w_inc]
  (sum of wgt is 2.4956e+01)

  Number of obs = 72   F(4, 67) = 5.73   Prob > F = 0.0005
  R-squared = 0.2548   Adj R-squared = 0.2103   Root MSE = 235.12
  SS: Model 1266234.81 (df 4), Residual 3703808.21 (df 67), Total 4970043.01 (df 71)

         exp |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
         age | -2.935011   4.603331  -0.64   0.526    -12.1233    6.253276
     ownrent |  50.49364   69.87914   0.72   0.472    -88.9857     189.973
      income |  202.1694   76.78152   2.63   0.010    48.91285     355.426
    incomesq | -12.11364    8.27314  -1.46   0.148   -28.62689     4.39962
       _cons | -181.8706   165.5191  -1.10   0.276   -512.2481    148.5068

In this regression we provide the analytical weights [aw = w_inc] for the minimization problem of the WLS. Remember that for OLS we minimize the sum of squared residuals, Σ û_i². For WLS estimation the residuals are û_i/√h_i; hence we minimize Σ û_i²/h_i. That is why the weights 1/h_i appear in the WLS estimation. Note that the parameter estimates and the variance-related estimates are the same in these two tables, but, for example, the F statistic and R² are different: in the latter table the constant term is included, while in the former table there is no constant term. Did we solve the problem by WLS? When determining this, it is easier to consider the former table and the underlying model in it. For simplicity, let's denote ownrent by ownr, income by inc, and incomesq by inc2. Our regression is

exp_i/√inc_i = β0(1/√inc_i) + β1(age_i/√inc_i) + β2(ownr_i/√inc_i) + β3(inc_i/√inc_i) + β4(inc2_i/√inc_i) + v_i/√inc_i.   (97)
Equivalently,

ỹ_i = β0 x̃_0i + β1 x̃_1i + β2 x̃_2i + β3 x̃_3i + β4 x̃_4i + ṽ_i,   (98)

where

x̃_ji = x_ji/√h(x_i)   (99)

for j = 0, 1, 2, 3, 4 and h(x_i) = inc_i. We need to determine whether ṽ_i is really homoskedastic or not. Hence all we need to do is apply the White test to Equation (97). I would include the constant term when implementing the White test. For simplicity, let's try the simplified version of the White test. For this purpose we can use the fitted option of the whitetst command. This command automatically includes the constant term in the second stage of the simplified White test:

ṽ̂² = α0 + α1 ỹ̂ + α2 ỹ̂² + v.   (100)

Since we ran another regression in between, we rerun our relevant OLS and do the simplified White test in order to check whether the heteroskedasticity correction worked or not. We see that the simplified White test cannot reject homoskedasticity.

. reg exps ages ownrents incomes incomesqs cs, noconstant
  (output identical to the exps regression table shown above)

. whitetst, fitted
  White's special test statistic: 3.138481   Chi-sq(2)   P-value = .2082

The hettest command (which implements the Breusch–Pagan test) does not work when the original model does not include a constant term. We can do this test manually as follows:

. predict uhat, resid
. gen uhat2 = uhat^2
. reg uhat2 ages ownrents incomes cs

  Number of obs = 72   F(4, 67) = 0.97   Prob > F = 0.4289
  R-squared = 0.0548   Adj R-squared = -0.0016   Root MSE = 59575
  SS: Model 1.3796e+10 (df 4), Residual 2.3779e+11 (df 67), Total 2.5159e+11 (df 71)

       uhat2 |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
        ages | -7638.244   19897.63  -0.38   0.702   -47354.08    32077.59
    ownrents | -22222.16   30103.26  -0.74   0.463   -82308.52     37864.2
     incomes |  3644.307   6416.772   0.57   0.572   -9163.624    16452.24
          cs | -254187.7   225046.1  -1.13   0.263   -703381.6    195006.2
       _cons |  221288.6   247721.1   0.89   0.375   -273164.8      715742
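The logic of this check — reweight, refit, then run the simplified White regression (100) on the transformed residuals — can be sketched on simulated data where the variance really is proportional to income (so the test should not reject; the data-generating numbers are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 600
inc = rng.uniform(1.0, 10.0, n)
age = rng.uniform(20.0, 60.0, n)
# hypothetical DGP with Var(u|inc) = sigma^2 * inc, as assumed in the text
y = 50.0 + 2.0 * age + 20.0 * inc + np.sqrt(inc) * rng.normal(0.0, 30.0, n)

# transformed model: divide everything, incl. the constant, by sqrt(inc)
s = 1.0 / np.sqrt(inc)
Xt = np.column_stack([s, age * s, inc * s])
yt = y * s
bt = np.linalg.lstsq(Xt, yt, rcond=None)[0]
vhat = yt - Xt @ bt

# simplified White regression (100) on the transformed model
yhat_t = Xt @ bt
Z = np.column_stack([np.ones(n), yhat_t, yhat_t ** 2])
g = Z @ np.linalg.lstsq(Z, vhat ** 2, rcond=None)[0]
ssr = ((vhat ** 2 - g) ** 2).sum()
sst = ((vhat ** 2 - (vhat ** 2).mean()) ** 2).sum()
r2 = 1.0 - ssr / sst
F = (r2 / 2) / ((1.0 - r2) / (n - 3))
p = stats.f.sf(F, 2, n - 3)
print(F, p)
```

Because the weighting matches the true variance function here, the transformed residuals are homoskedastic and the auxiliary R² stays near zero.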
Note that we had already run our first-stage regression for the BP test; hence we can directly generate û and û². Our null hypothesis for the BP test is H0: δ_ages = δ_ownrents = δ_incomes = δ_cs = 0. In contrast to Stata, we can use the F statistic to do this test. In our case the corresponding F statistic and p-value are F(4, 67) = 0.97 and p-value = .4289. Therefore, again, we cannot reject the homoskedasticity of our model. One can also use graphical tools to see whether there is heteroskedasticity or not. The rvpplot and rvfplot commands will not work in our case, as the regression does not include a constant term; we can use the scatter command instead, for example: scatter uhat yhat.

[scatter plot of the residuals against the linear prediction omitted]

The test statistics are more promising than the visual inspection: apparently we might not have solved the problem. Maybe the functional form that we picked was not the best one. For example, earlier we deduced that income and incomesq are jointly significant. Hence we can try a functional form that depends on these two variables. Consider the following model for the variance:

Var(u|x) = exp(α0 + α1x1 + α2x2 + ... + αkxk).   (101)

The exponential term assures the non-negativity of the variance. Hence u² = exp(α0 + α1x1 + α2x2 + ... + αkxk)·exp(v), where E[exp(v)|x] = 1, or
and the constant and obtain exp 3 Run the WLS with weights This is equivalent to multiplying model 1 variables by If we knew h then GLS gives unbiased parameter estimates The feasible version parameter estimates we do not know h are not unbiased and thus it is not BLUE But these parameter estimates are consistent and are asymptotically ef cient Hence this procedure is valid only for large sample data Example 86 We continue with our earlier example This time we apply feasible WLS The estimation results are reg exp age ownrent 39income 39incomesq source 55 df Ms Number of obs 72 4 67 53 Mode39l 174935701 4 437339252 Prob gt F 00008 Res39idua39l 543256203 67 810830153 R squared 02436 Adj R squared 01984 Tota39l 718191903 71 101153789 out MSE 28475 exp Coef std Err t Pgtt 95 Conf Interva39l age 3081814 5514717 056 0 578 1408923 7925606 ownrent 2794091 8292232 034 0737 1375727 1934546 39income 234347 8036595 292 0005 7393593 3947581 39incomesq 1499684 7469337 201 0049 299057 0879857 CD39IS 2371465 1993517 119 0238 6350541 1607611 Get res39idua39ls pred39ict LIJ39Iat res39id Log transform squares of res39idua39ls gen 39ILIJ39IatZ 39IogLIJ39Iat2 77 reg 39IuJ39IatZ 39ihcome 39ihcomesq source 55 df Ms Number of obs 72 F 2 69 12 07 Mode39l 92145144 2 46072572 Prob gt F 0000 Res39idua39l 263431364 69 381784586 R s uared 0 2591 Adj R squared 02377 Tota39l 355576508 71 500811983 Root MSE 19539 39IuJIatZ Coef std Err t Pgtt 95 Cohf Interva39l 248571 5181758 480 0000 1451977 3519443 39ihcomesq 2448906 0499075 491 0000 3444534 1453278 CD39IS 4137353 1116255 371 0000 1910485 6364222 pred39ict 39IuZJ39Iat Xb Estimate the variance gen varJ39Iat exp39u2J39Iat Get the weights gen w lvarJ39Iat Do the WLS reg exp age ownreht 39ihcome 39ihcomesq aw w sum of wgt 39is 28166e 02 source 55 df Ms Number of obs F 4 67 6969 Mode39l 287257607 4 718144018 Prob gt F 00000 Res39idua39l 690414785 67 103046983 R s uared 08062 Adj R squared 07947 Tota39l 356299086 71 501829698 Root MSE 101 51 exp Coef 
std Err t Pgtt 95 Cohf Interva39l age 1233683 2551197 048 0630 6325894 3858528 ownreht 5094976 5281429 096 0338 54468 1563675 39 1453045 463627 313 0003 5276412 2378448 39ihcomesq 793828 3736716 212 0037 15 968 4797646 CD39IS 1178675 1013862 116 0249 3202352 8450027 9 Chapter 9 In this section we will talk about the consequences of measurement error Assume that we would like to explain 34 and our regression model is 3f 60 61901 t 52902 t t 5199019 t U7 El 0 and Varu ac 102 cannot be observed with full precision The econo metrician observes 34 instead of 34 where 34 34 6 Unfortunately 34 Here 6 is the 78 measurement error of the econometrician which is independently identi cally distributed with mean 0 and variance 0393 Then the econometrician would be using the following model for the estimations y606116225kkue 103 If e is not correlated with the regressors OLS estimates would still be unbiased and consistent But the parameter estimates from such an estimation would have higher standard errors than the one with out measurement error This is simply because 039 0393 gt 0393 and A 2 2 VarQBJlx W Example 91 Consider we want to the effect of job training grants on the ratio of defected item production Let s denote the true value of this variable by Tde f Probably we cannot measure this variable perfectly Let s denote our observation by Tdef Then 7 de Tdef e The corresponding regression model is rdef a0algrantue Probably the rms taking grant would have more incentive to underre port the ratio defected items Therefore 007 7 grant e 0 implying that the parameter estimates for 040 and 041 would be biased and incon sistent Thus if the measurement error is systematics the problem can be more severe relative to the non systematic measurement error case Of course the measurement error can be in the explanatory variables as well For simplicity consider the simple regression model 950 51 f 60611 75 111 60 613 014 615 where at is the true value and x1 xi e is the observed value of 
If Cov(e, x1) = 0, then along with the natural assumptions Cov(u, e) = 0 and Cov(u, x1) = 0, the composite error u − β1 e is uncorrelated with x1. Hence the parameter estimates would be unbiased. As in the earlier case, the error variance is higher relative to the no-measurement-error case; that is, Var(u − β1 e) = σu² + β1² σe² > σu², assuming that β1 ≠ 0.

If instead Cov(e, x1*) = 0, then along with the natural assumptions Cov(u, e) = 0 and Cov(u, x1*) = 0, the composite error u − β1 e is correlated with x1. Hence the parameter estimates would be biased. To see this, note that

0 = Cov(e, x1*) = Cov(e, x1 − e) = Cov(e, x1) − Var(e).

Thus Cov(e, x1) = σe² ≠ 0.

10 Chapters 10-11

10.1 Stationary and Nonstationary Time Series

Definition 10.1. The stochastic process {xt : t = 1, 2, ...} is said to be strictly stationary if for any set of time indices 1 ≤ t1 < t2 < ... < tk and for all integers h ≥ 1, the distribution of (xt1, xt2, ..., xtk) is the same as the distribution of (xt1+h, xt2+h, ..., xtk+h).

Let {xt : t = 1, 2, ...} be a stationary stochastic process. Then for any i and j, the random variables xi and xj have the same distribution. Hence E(xi) = E(xj) and Var(xi) = Var(xj) for stationary stochastic processes.

Definition 10.2. The stochastic process {xt : t = 1, 2, ...} with E(xt²) < ∞ is said to be covariance stationary if E(xt), Var(xt) and the autocovariances Cov(xt, xt+h) do not depend on t.

Note that stationarity (with finite second moments) implies covariance stationarity. Henceforth, whenever we say stationarity we refer to strict stationarity. Now we give a vague definition; this definition has many variations and we only give the general idea.

Definition 10.3. A stationary process {xt} is said to be weakly dependent if and only if the dependence between xt and xt+h goes away as h grows without bound.

Definition 10.4. A stationary process {xt} is said to be asymptotically uncorrelated if and only if Corr(xt, xt+h) → 0 as h → ∞.

Intuitively, asymptotic uncorrelatedness is a form of weak dependence. The importance of weak dependence comes from its close relationship with the LLN and CLT. More specifically, some form of weak dependence along with stationarity is required in order to get the time-series versions of the LLN and CLT.
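The classical errors-in-variables case above (Cov(e, x1*) = 0) implies the well-known attenuation result: the OLS slope converges to β1 · Var(x1*)/(Var(x1*) + Var(e)) rather than β1. A minimal pure-Python sketch, with illustrative values of my own (β1 = 2, both variances equal to 1, so the attenuated slope is 1):

```python
import random
import statistics

random.seed(1)

def ols_slope(x, y):
    """Closed-form OLS slope for a simple regression of y on x."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

n = 50_000
x_star = [random.gauss(0, 1) for _ in range(n)]   # true regressor, variance 1
e = [random.gauss(0, 1) for _ in range(n)]        # measurement error, variance 1
u = [random.gauss(0, 1) for _ in range(n)]
x_obs = [a + b for a, b in zip(x_star, e)]        # observed regressor
y = [2 * a + c for a, c in zip(x_star, u)]        # beta1 = 2

b_true = ols_slope(x_star, y)                     # near 2
b_att = ols_slope(x_obs, y)                       # near 2 * 1/(1+1) = 1
print(b_true, b_att)
```

The slope on the mismeasured regressor is biased toward zero by the factor Var(x1*)/(Var(x1*) + σe²), exactly as the covariance argument above predicts.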
Example 10.5. Consider the following stochastic process:

yt = ρt yt−1 + vt,  t = 1, 2, ...,  (104)

where vt is an independently and identically distributed random variable with mean 0 and variance σv², and vt is independent of y0. Assume that yt is a stationary stochastic process and let Var(yt) = σy² < ∞. Note that since yt−1 is a linear function of vt−1, vt−2, ..., v1 and y0, we have Cov(yt−1, vt) = 0. Then

σy² = Var(yt) = Var(ρt yt−1 + vt) = ρt² Var(yt−1) + Var(vt) = ρt² σy² + σv².

Hence (1 − ρt²) σy² = σv². But σy² must be invariant to t; therefore ρt should be constant, i.e., ρt = ρ. Similarly,

μy = E(yt) = E(ρ yt−1 + vt) = ρ E(yt−1) + E(vt) = ρ μy + 0 = ρ μy.

Thus μy = ρ μy, implying that either ρ = 1 or μy = 0; since the variance equation already requires |ρ| < 1 for stationarity, we conclude that μy = 0. Finally, we find the autocovariance of yt. Let h ≥ 1 and consider the following equation:

yt+h = ρ yt+h−1 + vt+h
     = ρ(ρ yt+h−2 + vt+h−1) + vt+h
     = ρ² yt+h−2 + ρ vt+h−1 + vt+h
     = ...
     = ρ^h yt + ρ^(h−1) vt+1 + ρ^(h−2) vt+2 + ... + vt+h
     = ρ^h yt + Σ_{i=0}^{h−1} ρ^i vt+h−i.  (105)

Noting that E(yt) = 0 and E(yt vt+k) = 0 for k ≥ 1, we get

Cov(yt, yt+h) = E(yt yt+h) − E(yt) E(yt+h)  (106)
             = E[yt (ρ^h yt + Σ_{i=0}^{h−1} ρ^i vt+h−i)]
             = ρ^h E(yt²) + Σ_{i=0}^{h−1} ρ^i E(yt vt+h−i)
             = ρ^h σy².

Hence the autocovariance does not depend on time as well. We conclude that the AR(1) process that we just described is a stationary stochastic process. Moreover, note that

Corr(yt, yt+h) = Cov(yt, yt+h)/√(Var(yt) Var(yt+h)) = ρ^h σy²/σy² = ρ^h.

Therefore the AR(1) process is asymptotically uncorrelated, as ρ^h → 0 as h → ∞.

Example 10.6. A simple example of a weakly dependent stochastic process is a moving average process of order one, or shortly an MA(1) process, which is defined by

yt = vt + ρ vt−1,  t = 1, 2, ...,  (107)

where vt is an iid process with mean 0 and variance σv². Note that the mean and the variance of yt do not depend on time:

E(yt) = E(vt + ρ vt−1) = E(vt) + ρ E(vt−1) = 0 + ρ·0 = 0,  (108)

and

Var(yt) = Var(vt + ρ vt−1) = Var(vt) + ρ² Var(vt−1) = (1 + ρ²) σv².  (109)

Moreover,

Cov(yt, yt+h) = Cov(vt + ρ vt−1, vt+h + ρ vt+h−1)  (110)
             = Cov(vt, vt+h) + ρ Cov(vt, vt+h−1) + ρ Cov(vt−1, vt+h) + ρ² Cov(vt−1, vt+h−1)
             = ρ σv² if h = 1, and 0 if h > 1.

Hence the autocovariance of yt, i.e., Cov(yt, yt+h), does not depend on time.
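The two autocorrelation patterns just derived — geometric decay ρ^h for the AR(1), and a sharp cutoff after lag 1 for the MA(1) — can be verified numerically. A pure-Python sketch with ρ = 0.5 (my own illustrative value):

```python
import random
import statistics

random.seed(2)

def autocorr(series, h):
    """Sample autocorrelation of a series at lag h."""
    m = statistics.fmean(series)
    num = sum((series[t] - m) * (series[t + h] - m) for t in range(len(series) - h))
    den = sum((s - m) ** 2 for s in series)
    return num / den

n, rho = 200_000, 0.5
v = [random.gauss(0, 1) for _ in range(n + 1)]

# AR(1): y_t = rho*y_{t-1} + v_t  ->  Corr(y_t, y_{t+h}) = rho**h
ar = [0.0]
for t in range(1, n):
    ar.append(rho * ar[-1] + v[t])

# MA(1): y_t = v_t + rho*v_{t-1}  ->  Corr = rho/(1+rho^2) at lag 1, zero beyond
ma = [v[t] + rho * v[t - 1] for t in range(1, n)]

print(autocorr(ar, 2))   # about rho**2 = 0.25
print(autocorr(ma, 1))   # about 0.5/1.25 = 0.4
print(autocorr(ma, 2))   # about 0
```

Both processes are asymptotically uncorrelated, but the MA(1) dependence dies out after one lag while the AR(1) dependence only decays geometrically.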
Finally, Var(yt) < ∞, implying that yt is a covariance stationary stochastic process. Moreover, the MA(1) process is asymptotically uncorrelated.

10.2 OLS Assumptions for Time Series

Now we provide our regression assumptions and two theorems for the time series framework. Let xt = (x1t, x2t, ..., xkt) and X = (x1', x2', ..., xT')'. Here X is nothing but a matrix with rows consisting of the xt. Hence X contains all the information about the regressors at all time periods, but xt contains only the information about time t.

TS.A1: The stochastic process {(xt, yt) : t = 1, 2, ..., n} follows the linear model

yt = β0 + β1 x1t + β2 x2t + ... + βk xkt + ut,  (111)

where n is the number of observations and {ut} is a sequence of errors.

TS.A2: No explanatory variable is constant or has a perfect linear relationship with the other explanatory variables.

TS.A3: The explanatory variables are strictly exogenous: E(ut | X) = 0.

Hence TS.A3 says that the conditional expectation of ut on all the information about the explanatory variables is equal to zero.

TS.A1′: The stochastic process {(xt, yt) : t = 1, 2, ..., n} follows the linear model

yt = β0 + β1 x1t + β2 x2t + ... + βk xkt + ut,  (112)

where n is the number of observations, xt = (x1t, x2t, ..., xkt), and {ut} is a sequence of errors. Moreover, (xt, yt) is stationary and weakly dependent; more precisely, the LLN and CLT can be applied to sample means.

TS.A2′: No explanatory variable is constant or has a perfect linear relationship with the other explanatory variables.

TS.A3′: The explanatory variables are contemporaneously exogenous: E(ut | xt) = 0.

Hence TS.A3′ says that the conditional expectation of ut on the information about the explanatory variables at time t is equal to zero. Note that E(ut | X) = 0 implies that E(ut | xt) = 0, because E[E(ut | X) | xt] = E(ut | xt). The intuition here is as follows: we cannot improve our prediction by first conditioning on more information and then predicting using the smaller amount of information. A similar but easier to understand example is E[E(ut | It+1) | It] = E(ut | It), where It and It+1 stand for the information at times t and t+1.
The prediction made using the information at time t should be the same as the prediction, at time t, of one's own prediction at time t+1; it is useless to predict your next period's prediction. Here, of course, the information at time t is smaller than the information at time t+1. As a conclusion, assumption TS.A3 is a stronger assumption than TS.A3′. Unfortunately, in order to get unbiased parameter estimates we need strict exogeneity of the explanatory variables. Hence, without this assumption, we can have consistent but biased parameter estimates. Finally, TS.A3 does not allow correlation between ut and lagged values of the explanatory variables; TS.A3′, on the other hand, does not make such a restriction.

Example 10.7. Consider the following regression model:

yt = β0 + β1 xt + β2 yt−1 + ut.  (113)

In this model yt is determined not only by xt but also by yt−1. Since the set of regressors contains yt−1 for every period, conditioning on all regressors amounts to conditioning on the x's and the lagged y's; but then

E(ut | X, Y) = E(yt − β0 − β1 xt − β2 yt−1 | X, Y) = ut ≠ 0.

Therefore TS.A3 is violated. Another way to see this violation is by evaluating the covariance of ut and yt: strict exogeneity would imply Cov(ut, ys) = 0 for all s, yet Cov(ut, yt) = Var(ut) > 0. But E(ut | xt, yt−1) can still be zero; thus there is nothing contradicting TS.A3′. Hence from this model we can still get consistent parameter estimates.

Example 10.8. Assume that the government uses the past murder information when determining today's police force size. Consider the following regression model:

mratet = β0 + β1 psizet + ut.  (114)

Assume that psizej is uncorrelated with ut for j ≤ t. But since psizet+1 depends on mratet, which is correlated with ut, TS.A3 does not hold. On the other hand, TS.A3′ holds.

Theorem 10.9. Under Assumptions TS.A1-TS.A3, the OLS estimates are unbiased: E(β̂j) = βj.

Theorem 10.10. Under Assumptions TS.A1′-TS.A3′, the OLS estimates are consistent: plim β̂j = βj.

We continue to provide our assumptions.

TS.A4: The error term is homoskedastic: Var(ut | X) = Var(ut) = σ². Hence the variance of the error term does not depend on X.

TS.A5: There is no serial correlation (no autocorrelation): Corr(ut, ut+h | X) = Corr(ut, ut+h) = 0 for any h ≠ 0.
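The consistent-but-biased claim for models with a lagged dependent variable can be illustrated numerically. A pure-Python sketch (parameter values are my own): for y_t = ρ y_{t−1} + v_t, the OLS estimate of ρ is noticeably biased downward in short samples but converges to ρ as the sample grows:

```python
import random
import statistics

random.seed(3)

def ar1_ols(n, rho):
    """OLS slope from regressing y_t on y_{t-1} for a simulated AR(1)."""
    y = [0.0]
    for _ in range(n):
        y.append(rho * y[-1] + random.gauss(0, 1))
    x, z = y[:-1], y[1:]
    xbar, zbar = statistics.fmean(x), statistics.fmean(z)
    num = sum((a - xbar) * (b - zbar) for a, b in zip(x, z))
    den = sum((a - xbar) ** 2 for a in x)
    return num / den

rho = 0.5
small = statistics.fmean(ar1_ols(20, rho) for _ in range(2000))   # mean over short samples
large = ar1_ols(200_000, rho)                                      # one long sample

print(small, large)   # small-sample mean well below 0.5; long-sample estimate near 0.5
```

Contemporaneous exogeneity holds here, so OLS is consistent; strict exogeneity fails, so the T = 20 estimates are biased downward on average.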
When TS.A5 is violated, i.e., Cov(ut, ut+h) ≠ 0, we say that the errors suffer from serial correlation (autocorrelation).

TS.A4′: The error term is homoskedastic: Var(ut | xt) = Var(ut) = σ².

TS.A5′: There is no serial correlation (no autocorrelation): Corr(ut, ut+h | xt, xt+h) = Corr(ut, ut+h) = 0 for any h ≠ 0.

Hence the variance of the error term does not depend on xt.

Example 10.11. Consider the following model:

yt = β0 + β1 xt + ut,  (115)
ut = ρ ut−1 + vt,

where vt is an iid random variable with mean 0 and variance σv². Assume for simplicity that vt and u0 are independent of xt, and that ut is a stationary stochastic process with Var(ut) = σu². Note that

Cov(ut, ut−1) = Cov(ρ ut−1 + vt, ut−1)  (116)
             = ρ Cov(ut−1, ut−1) + Cov(vt, ut−1)
             = ρ σu² + 0 ≠ 0 (for ρ ≠ 0).

Hence this model suffers from serial correlation. Note that if the unconditional covariance is not zero, then the conditional one cannot be zero either.

Theorem 10.12. Under Assumptions TS.A1-TS.A5, the variance of β̂j conditional on the explanatory variables X is given by

Var(β̂j | X) = σ²/(SSTj (1 − Rj²)),  (117)

where SSTj is the total sum of squares of xjt and Rj² is the R² from the regression of xjt on all the other explanatory variables, including the constant term.

Theorem 10.13. Under Assumptions TS.A1-TS.A5, σ̂² = SSR/(n − k − 1) is an unbiased estimator of σ².

Theorem 10.14. Under Assumptions TS.A1-TS.A5, the OLS estimators are BLUE.

Theorem 10.15. Under Assumptions TS.A1′-TS.A5′, the OLS estimators are asymptotically normally distributed. Moreover, under the null hypothesis, the t statistic has a t distribution and the F statistic has an F distribution. The usual constructions of confidence intervals are approximately valid.

TS.A6: The errors are independent of X and are independently and identically distributed as N(0, σ²). Note that TS.A6 implies TS.A3, TS.A4 and TS.A5.

Theorem 10.16. Under Assumptions TS.A1-TS.A6 (the CLM assumptions for time series), the OLS estimates are normally distributed conditional on X. Moreover, under the null hypothesis, the t statistic has a t distribution and the F statistic has an F distribution. The usual constructions of confidence intervals are valid.

10.3 Random Walks

In the preceding parts we argued
that under the weak dependence assumption we can use the usual OLS inference procedures. Unfortunately, if the weak dependence assumption does not hold, we cannot use the LLN and CLT; hence our OLS inferences would be invalid. Earlier we showed that under the |ρ| < 1 assumption, the AR(1) process is asymptotically uncorrelated, which is a form of weak dependence. Whenever ρ = 1, this is not valid anymore. Consider the following model:

yt = yt−1 + vt,  (118)

where vt is an iid random variable with mean 0 and variance σv², and vt is independent of y0. This process is called a random walk. The name comes from the fact that our best prediction for the next period is the same as today's observation; that is, E(yt+h | yt) = yt for any h ≥ 0. The difference between the two values yt+1 and yt is random. Hence the path of a random walk looks totally random. The plot below shows three random walks and three AR(1) processes with vt ~ N(0, 1) and y0 = 0. As can be seen, although the initial values for all random walks are the same, they navigate away from 0 unsystematically; the AR(1) processes, on the other hand, hover around zero.

[Figure: three simulated random walks and three simulated AR(1) processes.]

By repeated substitution,

yt = yt−1 + vt = (yt−2 + vt−1) + vt = ... = y0 + vt + vt−1 + ... + v1.  (119)

Then

E(yt) = E(y0),  (120)

and

Var(yt) = Var(y0) + Σ_{i=1}^{t} Var(vi) = σ_{y0}² + t σv².  (121)

Hence as time grows, the variance of yt grows unboundedly. Thus yt is not a covariance stationary process, hence not a stationary process. Note that if y0 = 0 is a non-random constant, then Var(yt) = t σv².

One of the important properties of random walks is their persistency; that is, our best prediction for the future is today's value. Hence, as we mentioned earlier,

E(yt+h | yt) = E(yt + vt+h + vt+h−1 + ... + vt+1 | yt) = yt for any h ≥ 0,

because E(vt+s | yt) = E(vt+s) = 0: yt contains no information about future error terms, so the conditional expectation is the same as the unconditional one. In contrast to the random walk, the conditional expectation of an AR(1) process is given by

E(yt+h | yt) = E(ρ^h yt + Σ_{i=0}^{h−1} ρ^i vt+h−i | yt) = ρ^h yt for any h ≥ 0.

Thus, provided that |ρ| < 1, today's observation has a diminishing effect on the prediction of the future.
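Equation (121) says the variance of a random walk with y0 = 0 grows linearly: Var(yt) = t σv². A pure-Python Monte Carlo sketch (path counts and horizons are my own choices) simulates many paths and checks the variance at t = 100 and t = 400:

```python
import random
import statistics

random.seed(4)

paths = 5_000
y100, y400 = [], []
for _ in range(paths):
    y = 0.0
    for t in range(1, 401):
        y += random.gauss(0, 1)     # y_t = y_{t-1} + v_t, y_0 = 0, sigma_v^2 = 1
        if t == 100:
            y100.append(y)
    y400.append(y)

v100 = statistics.pvariance(y100)   # about 100 * sigma_v^2 = 100
v400 = statistics.pvariance(y400)   # about 400
print(v100, v400)
```

The cross-path variance roughly quadruples when the horizon quadruples, confirming the unbounded, linear-in-t growth that rules out covariance stationarity.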
The autocorrelation of a random walk is given by (assuming that σ_{y0} = 0)

Corr(yt+h, yt) = Cov(yt+h, yt)/√(Var(yt+h) Var(yt))  (122)
             = Cov(y0 + vt+h + ... + v1, y0 + vt + ... + v1)/√((t+h)σv² · tσv²)
             = tσv²/(σv² √(t(t+h)))
             = √(t/(t+h)).

For any given h, Corr(yt+h, yt) = √(t/(t+h)) → 1 as t grows, so the correlation does not die out. Hence asymptotic uncorrelatedness is not satisfied for a random walk.

Since nonstationary processes do not satisfy many of the properties that enable us to use the usual OLS inferences, we somehow have to deal with this issue. One of the ways is constructing stationary versions of these nonstationary variables.

Definition 10.17. A stochastic process is said to be integrated of order k, or I(k), if and only if after k differencings it becomes stationary. To be more precise, let

Δyt = yt − yt−1,
Δ²yt = Δyt − Δyt−1,
Δ³yt = Δ²yt − Δ²yt−1,
...
Δ^k yt = Δ^(k−1) yt − Δ^(k−1) yt−1.

Then yt is I(k) if and only if Δ^k yt is stationary. If yt is a stationary process, no differencing is necessary; thus a stationary yt is I(0).

Example 10.18. Consider the random walk

yt = yt−1 + vt,  (123)

where vt is an iid random variable with mean 0 and variance σv², and vt is independent of y0. Note that Δyt = yt − yt−1 = vt, which is definitely a stationary process. Therefore yt is I(1).

How do we understand whether a process is I(0) or I(1)? An informal way would be estimating ρ in the AR(1) model. If |ρ| < 1, then the estimate of ρ is consistent yet biased. Unfortunately, we can never know whether |ρ| < 1 or not. But as a rule of thumb, if ρ̂ > .9 (or maybe ρ̂ > .8) — or if 1 lies in a two-standard-error confidence interval; note that if ρ is actually equal to 1 these confidence intervals are not valid, hence these suggestions are risky! — we can conclude that the process is not stationary. Then differencing might be required.

11 Chapter 12

11.1 Introduction to Serial Correlation

In this section we are interested in time series data. In the time series framework, since we observe the same variables over time, it is possible that the error terms are not independent of each other. The consequences of non-independence are: (1) standard errors, tests and confidence intervals will not be valid; and (2) the OLS estimates are not BLUE.
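The I(1)-vs-I(0) diagnosis described above can be sketched in pure Python (simulation settings are my own): the level of a random walk is extremely persistent, while its first difference behaves like a weakly dependent series:

```python
import random
import statistics

random.seed(5)

n = 100_000
y = [0.0]
for _ in range(n):
    y.append(y[-1] + random.gauss(0, 1))           # random walk: I(1)
dy = [y[t] - y[t - 1] for t in range(1, len(y))]   # first difference: here iid, so I(0)

def lag1_corr(s):
    """Sample lag-1 autocorrelation."""
    m = statistics.fmean(s)
    num = sum((s[t] - m) * (s[t + 1] - m) for t in range(len(s) - 1))
    den = sum((a - m) ** 2 for a in s)
    return num / den

print(lag1_corr(y))    # close to 1: the level is highly persistent
print(lag1_corr(dy))   # close to 0: the differenced series is weakly dependent
```

A fitted ρ̂ near 1 on the level, together with a near-zero autocorrelation after differencing once, is exactly the informal signature of an I(1) process.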
These negative results are true even in large samples. One way to model such dependence in the error terms is as follows:

yt = β0 + β1 xt + ut,  (124)
ut = ρ ut−1 + vt,

where the vt are uncorrelated random variables with mean zero, ut and vt+k are uncorrelated at all time periods t, k = 1, 2, ..., T, Var(ut) = σu², Var(vt) = σv², and |ρ| < 1. Note that

Cov(ut, ut−1) = Cov(ρ ut−1 + vt, ut−1) = ρ Cov(ut−1, ut−1) + Cov(vt, ut−1) = ρ σu² + 0 = ρ σu² ≠ 0.

Hence the ut are serially correlated. This type of correlation is called AR(1) serial correlation, or simply AR(1) (AR = Auto-Regressive). Similarly, we can define AR(2) serial correlation as follows:

ut = ρ1 ut−1 + ρ2 ut−2 + vt,  (125)

where the vt are uncorrelated random variables with mean zero, ut and vt+k are uncorrelated at all time periods, Var(ut) = σu², Var(vt) = σv², and ρ1 and ρ2 are subject to some restrictions.

Now let's look at some of the implications of serial correlation more closely. For the simple regression model (124), the parameter estimate of β1 is given by

β̂1 = β1 + Σ_t (xt − x̄) ut / Σ_t (xt − x̄)².  (126)

For simplicity, define xt* = xt − x̄ and A = 1/Σ_t xt*², so that β̂1 = β1 + A Σ_t xt* ut. The conditional variance of β̂1 given x is (for notational simplicity we drop the conditionals)

Var(β̂1) = A² Var(Σ_t xt* ut)  (127)
        = A² [Σ_t xt*² Var(ut) + 2 Σ_{t=1}^{T−1} Σ_{j=1}^{T−t} xt* xt+j* Cov(ut, ut+j)]
        = A² [σu² Σ_t xt*² + 2 σu² Σ_{t=1}^{T−1} Σ_{j=1}^{T−t} ρ^j xt* xt+j*]
        = σu²/Σ_t xt*² + 2 σu² Σ_{t=1}^{T−1} Σ_{j=1}^{T−t} ρ^j xt* xt+j* / (Σ_t xt*²)².

Additional calculations for the above equality are provided below:

ut+j = ρ ut+j−1 + vt+j = ρ(ρ ut+j−2 + vt+j−1) + vt+j = ... = ρ^j ut + ρ^(j−1) vt+1 + ρ^(j−2) vt+2 + ... + ρ vt+j−1 + vt+j.  (128)

Hence

Cov(ut, ut+j) = Cov(ut, ρ^j ut + ρ^(j−1) vt+1 + ... + ρ vt+j−1 + vt+j) = ρ^j Cov(ut, ut) = ρ^j σu².  (129)

Note that when ρ = 0, Var(β̂1) = σu²/Σ_t xt*², which is the familiar variance formula that we used earlier. Most of the time ρ > 0 and Σ xt* xt+j* > 0, implying that ignoring serial correlation leads to underestimation of the variance of β̂1.

Remark: Even if we have serial correlation, provided that the data is stationary, the usual goodness-of-fit measures R² and adjusted R² are valid. Unfortunately, if the variables in our model are not stationary, this is no longer true.
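The underestimation implied by (127) can be quantified in a small pure-Python Monte Carlo (all parameter choices — T = 200, ρ = 0.8, an AR(1) regressor with coefficient 0.8 — are my own illustrative values): compare the true sampling variance of β̂1 with the average of the usual iid-formula variance estimate:

```python
import random
import statistics

random.seed(6)

def one_rep(T=200, rho=0.8, phi=0.8):
    """One draw of beta1_hat and the usual (iid-formula) variance estimate."""
    x, u = [0.0], [0.0]
    for _ in range(T):
        x.append(phi * x[-1] + random.gauss(0, 1))   # autocorrelated regressor
        u.append(rho * u[-1] + random.gauss(0, 1))   # AR(1) errors
    x, u = x[1:], u[1:]
    y = [xi + ui for xi, ui in zip(x, u)]            # true beta1 = 1
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, (ssr / (T - 2)) / sxx                 # usual Var(b1_hat) formula

reps = [one_rep() for _ in range(500)]
true_var = statistics.pvariance([r[0] for r in reps])
usual_var = statistics.fmean(r[1] for r in reps)
print(true_var / usual_var)   # well above 1: the iid formula understates the variance
```

With positively autocorrelated errors and a positively autocorrelated regressor, the reported standard errors are far too small, which is exactly the danger flagged in the text.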
11.2 Serial Correlation with Lagged Explanatory Variables

In this section we consider two scenarios. For the first one, consider the following model:

yt = β0 + β1 yt−1 + ut,  (130)

where E(ut | yt−1) = 0 and yt is a stationary process. Let μy = E(yt) and σy² = Var(yt). Stationarity implies

μy = E(yt) = E(β0 + β1 yt−1 + ut) = β0 + β1 μy + 0 = β0 + β1 μy.  (131)

Hence μy = β0/(1 − β1). Moreover,

σy² = Var(yt) = Var(β0 + β1 yt−1 + ut)  (132)
   = β1² Var(yt−1) + Var(ut) + 2 β1 Cov(ut, yt−1)
   = β1² σy² + σu²,

so σy² = σu²/(1 − β1²). Note that we need |β1| < 1 for stationarity. In this model Assumption TS.A3′ holds; hence the parameter estimates β̂0 and β̂1 are consistent but biased. By E(ut | yt−1) = 0 we know that Cov(ut, yt−1) = 0, but this does not imply that Cov(ut, ut−1) = 0. To be more specific,

Cov(ut, ut−1) = Cov(ut, yt−1 − β0 − β1 yt−2)  (133)
             = Cov(ut, yt−1) − β1 Cov(ut, yt−2)
             = −β1 Cov(ut, yt−2),

which need not be zero. Therefore, in this model we can have serial correlation in the error term and a lagged dependent variable among the regressors, but we still have consistent parameter estimates.

In the following model the lagged explanatory variable leads to inconsistent parameter estimates. The model is given by

yt = β0 + β1 yt−1 + ut,  (134)
ut = ρ ut−1 + vt,

where ut is a stationary AR(1) process with Var(ut) = σu². Note that

Cov(ut, yt−1) = Cov(ρ ut−1 + vt, β0 + β1 yt−2 + ut−1)  (135)
             = ρ β1 Cov(ut−1, yt−2) + ρ Var(ut−1) + β1 Cov(vt, yt−2) + Cov(vt, ut−1)
             = ρ β1 Cov(ut−1, yt−2) + ρ σu² + 0 + 0 ≠ 0 (in general).

Now we show that the above model can be written in another familiar form:

yt = β0 + β1 yt−1 + ρ ut−1 + vt
   = β0 + β1 yt−1 + ρ(yt−1 − β0 − β1 yt−2) + vt
   = β0(1 − ρ) + (β1 + ρ) yt−1 − ρ β1 yt−2 + vt
   = α0 + α1 yt−1 + α2 yt−2 + vt.  (136)

This suggests that many times serial correlation is an indicator of incomplete specification of the model. In our example, the reason for serial correlation was omitting yt−2 from the regression. Hence, if the econometrician includes a lagged dependent variable and assumes serial correlation, he/she should have good reasons for that.

11.3 Testing for Serial Correlation

In this section we describe ways to test for serial correlation. First we consider a test with strictly exogenous regressors, i.e., E(ut | X) = 0. This rules out models with lagged dependent variables.
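The rewriting in (136) can be checked numerically. A pure-Python sketch with my own illustrative values β0 = 0, β1 = 0.5, ρ = 0.4, so that α1 = 0.9 and α2 = −0.2: fitting the correctly specified AR(2) recovers the α's, while the misspecified AR(1) fit converges to the lag-1 autocorrelation 0.75 rather than β1 = 0.5:

```python
import random
import statistics

random.seed(7)

# y_t = 0.5*y_{t-1} + u_t,  u_t = 0.4*u_{t-1} + v_t
# Substituting out u_t:  y_t = 0.9*y_{t-1} - 0.2*y_{t-2} + v_t
n = 200_000
y, u = [0.0, 0.0], 0.0
for _ in range(n):
    u = 0.4 * u + random.gauss(0, 1)
    y.append(0.5 * y[-1] + u)

y0, y1, y2 = y[2:], y[1:-1], y[:-2]          # y_t, y_{t-1}, y_{t-2}

def demean(s):
    m = statistics.fmean(s)
    return [a - m for a in s]

dy0, dy1, dy2 = demean(y0), demean(y1), demean(y2)
s11 = sum(a * a for a in dy1); s22 = sum(a * a for a in dy2)
s12 = sum(a * b for a, b in zip(dy1, dy2))
s1y = sum(a * b for a, b in zip(dy1, dy0)); s2y = sum(a * b for a, b in zip(dy2, dy0))

# Two-regressor OLS via the demeaned normal equations
det = s11 * s22 - s12 * s12
a1 = (s22 * s1y - s12 * s2y) / det           # about 0.9
a2 = (s11 * s2y - s12 * s1y) / det           # about -0.2
b_short = s1y / s11                          # AR(1)-only fit: about 0.75, not 0.5
print(a1, a2, b_short)
```

Including yt−2 removes the serial correlation and restores consistency, illustrating why serial correlation often signals an incomplete specification.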
Consider the following model:

yt = β0 + β1 x1t + β2 x2t + ... + βk xkt + ut,  (137)
ut = ρ ut−1 + vt,

where vt is an iid random variable with mean 0 and variance σv², and vt and u0 are independent of the explanatory variables. The null hypothesis for no serial correlation is

H0: ρ = 0.  (138)

Note that under the null hypothesis ut is weakly dependent. Hence, if we knew ut, we could test this hypothesis by running the following regression:

ut = ρ ut−1 + vt.  (139)

Unfortunately, ut is not observable to the econometrician, but we can observe the residuals ût. In large samples we can instead use the following regression for testing serial correlation:

ût = ρ ût−1 + error.  (140)

It can be shown that, with the help of the strict exogeneity assumption, the distribution of ρ̂ from (140) is asymptotically the same as the one based on (139).

Summary of the test with strict exogeneity:
1. Run the regression yt = β0 + β1 x1t + ... + βk xkt + ut and get the residuals ût for t = 1, 2, ..., T.
2. Run the regression ût = ρ ût−1 + error for t = 2, 3, ..., T and get ρ̂ and the corresponding t statistic t_ρ̂. (You can add a constant to this regression as well; the t statistic would still be valid asymptotically.)
3. Using the t statistic, implement the test, get the confidence interval, p-value, etc. Many times ρ > 0, hence you can pick the alternative H1: ρ > 0.

Now we relax the strict exogeneity assumption. In large samples the following test is valid even if the regressors contain lagged dependent variables. (Of course, the test is still valid if the explanatory variables do not contain lagged dependent variables.) Without providing the details, we simply provide the procedure.

Summary of the test without strict exogeneity:
1. Run the regression yt = β0 + β1 x1t + ... + βk xkt + ut and get the residuals ût for t = 1, 2, ..., T.
2. Run the regression ût = α0 + α1 x1t + α2 x2t + ... + αk xkt + ρ ût−1 + error for t = 2, 3, ..., T and get ρ̂ and the corresponding t statistic t_ρ̂. The explanatory variables here are allowed to be lagged values of the dependent variable. Here you can use heteroskedasticity-robust standard errors and the corresponding robust t statistic.
3. Using the t statistic, implement the test, get the confidence interval, p-value, etc. Many times ρ > 0, hence you can pick the alternative H1: ρ > 0.
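The strict-exogeneity version of the test can be sketched in pure Python (the data-generating values β0 = 1, β1 = 2, ρ = 0.5, T = 2000 are my own): run OLS, then regress the residuals on their own lag and form the t statistic:

```python
import random
import statistics

random.seed(8)

# Model with AR(1) errors: y_t = 1 + 2*x_t + u_t,  u_t = 0.5*u_{t-1} + v_t
T = 2000
x = [random.gauss(0, 1) for _ in range(T)]
u, us = 0.0, []
for _ in range(T):
    u = 0.5 * u + random.gauss(0, 1)
    us.append(u)
y = [1 + 2 * xi + ui for xi, ui in zip(x, us)]

# Step 1: OLS of y on x, keep the residuals.
xbar, ybar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((a - xbar) ** 2 for a in x)
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
res = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Step 2: regress res_t on res_{t-1} (no constant) and form the t statistic.
num = sum(res[t] * res[t - 1] for t in range(1, T))
den = sum(res[t - 1] ** 2 for t in range(1, T))
rho_hat = num / den
v_res = [res[t] - rho_hat * res[t - 1] for t in range(1, T)]
se = (sum(a * a for a in v_res) / (T - 2) / den) ** 0.5
t_stat = rho_hat / se
print(rho_hat, t_stat)   # rho_hat near 0.5; t statistic far above usual critical values
```

With ρ = 0.5 the test rejects H0: ρ = 0 decisively; under the null, ρ̂ would instead be close to zero with an approximately standard normal t statistic.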
Whenever we have strict exogeneity, it is preferable to use the former test (the test with strict exogeneity).

There are other kinds of serial correlation. For example, for quarterly data we might have

ut = ρ ut−4 + vt.  (141)

In this case the procedures described above are still valid with obvious modifications. More precisely, use the regressions ût = ρ ût−4 + error and ût = α0 + α1 x1t + α2 x2t + ... + αk xkt + ρ ût−4 + error instead of ût = ρ ût−1 + error and ût = α0 + α1 x1t + α2 x2t + ... + αk xkt + ρ ût−1 + error, respectively. Similarly, if ut follows an AR(k) process, we need to use an F test in order to test the joint significance of the relevant ρ parameters.

11.4 Remedies for Serial Correlation

In this section we assume that Assumptions TS.A1-TS.A4 hold along with stationarity of the data, and we relax Assumption TS.A5 by considering AR(1) serial correlation. Assume that the explanatory variables are strictly exogenous. For simplicity, we restrict the number of explanatory variables to one. Finally, for notational simplicity, we drop the conditioning on X. The model is given by

yt = β0 + β1 xt + ut,  (142)
ut = ρ ut−1 + vt,

where vt is an iid random variable with mean 0 and variance σv², the variance of ut is σu², and vt and u0 are independent of the explanatory variables. Then, by stationarity, we have

σu² = σv²/(1 − ρ²).  (143)

In order to solve the serial correlation problem, we use the following transformation for t ≥ 2:

yt − ρ yt−1 = β0 + β1 xt + ut − ρ(β0 + β1 xt−1 + ut−1)  (144)
           = β0(1 − ρ) + β1(xt − ρ xt−1) + (ut − ρ ut−1)
           = β0(1 − ρ) + β1(xt − ρ xt−1) + vt,

i.e.,

ỹt = β0 x̃0t + β1 x̃t + et for t ≥ 2,  (145)

where ỹt = yt − ρ yt−1, x̃0t = 1 − ρ, x̃t = xt − ρ xt−1, and et = vt. The OLS estimators from this regression are not BLUE, because we had to omit one observation from the estimation; hence our standard error estimates cannot be the best ones given the data at hand. In order to overcome this problem we can use

y1 = β0 + β1 x1 + u1,  (146)

but Var(u1) = σu² ≠ σv². Hence we have to convert the variance of the error term in equation (146) to σv². This can be done by the following transformation:

(1 − ρ²)^(1/2) y1 = β0 (1 − ρ²)^(1/2) + β1 (1 − ρ²)^(1/2) x1 + (1 − ρ²)^(1/2) u1,  (147)

or

ỹ1 = β0 x̃01 + β1 x̃1 + ũ1,  (148)

where
ỹ1 = (1 − ρ²)^(1/2) y1, x̃01 = (1 − ρ²)^(1/2), x̃1 = (1 − ρ²)^(1/2) x1, and ũ1 = (1 − ρ²)^(1/2) u1. Hence the final regression becomes

ỹt = β0 x̃0t + β1 x̃t + et.  (149)

The reality is that we do not know ρ, and this method is no more than a fiction. Fortunately, we can use these ideas to get feasible estimates in large samples. Below we provide the procedure for the feasible GLS (FGLS) estimator that solves the serial correlation issue.

Procedure for the FGLS:
1. Run OLS for yt = β0 + β1 xt + ut and get the residuals ût.
2. Run OLS for ût = ρ ût−1 + error and get ρ̂. (You will drop the first observation.)
3. Run OLS for ỹt = β0(1 − ρ̂) + β1 x̃t + error, where ỹt = yt − ρ̂ yt−1 and x̃t = xt − ρ̂ xt−1, to estimate β0 and β1. The usual standard errors and test statistics are asymptotically valid.

The extension to the multiple explanatory variable case is trivial.

12 Chapter 15

Example 12.1 (Introduction to IV estimation). Consider the following simple linear regression model:

y = β0 + β1 x + u,  (150)

where Cov(u, x) ≠ 0. We know that the OLS coefficient estimates for this regression model will be biased and inconsistent. There is a way to fix the inconsistency problem (we will still have bias). Assume that there is a variable z such that Cov(x, z) ≠ 0 and Cov(z, u) = 0. This variable is called an instrumental variable (IV) for x, or an instrument for x. Since IVs are not correlated with the error term u, they are exogenous to our model. They can help us get consistent parameter estimates even if x is correlated with u.

It is not always possible to check the Cov(z, u) = 0 assumption; most of the time we can deduce it through economic theory. But we can check whether x and z are correlated or not. A simple test would be running the following regression:

x = α0 + α1 z + v.  (151)

If α̂1 is significant, we can conclude that Cov(x, z) ≠ 0. Now we see how an IV can help us solve the inconsistency problem. Note that

Cov(y, z) = Cov(β0 + β1 x + u, z) = β1 Cov(x, z) + Cov(u, z) = β1 Cov(x, z).  (152)

Therefore,

β1 = Cov(y, z)/Cov(x, z).  (153)

The sample counterpart of this gives us the consistent estimate (consistency follows from the LLN) for β1:

β̂1 = Σi (zi − z̄)(yi − ȳ) / Σi (zi − z̄)(xi − x̄).  (154)

Hence the consistent estimate for β0 is given by

β̂0 = ȳ − β̂1 x̄.  (155)

Note that when z = x, the IV estimator is the same as the OLS estimator.
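The sample IV formula (153)-(155) can be checked with a small pure-Python simulation (the data-generating design — a shared shock w that makes x endogenous, β1 = 2 — is my own illustration):

```python
import random
import statistics

random.seed(10)

n = 50_000
z = [random.gauss(0, 1) for _ in range(n)]      # instrument: relevant, exogenous
w = [random.gauss(0, 1) for _ in range(n)]      # common shock making x endogenous
x = [a + b for a, b in zip(z, w)]
u = [b + random.gauss(0, 1) for b in w]         # Cov(x, u) = Var(w) = 1
y = [2 * a + b for a, b in zip(x, u)]           # beta1 = 2

def cov(a, b):
    """Sample covariance (population normalization)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

b_ols = cov(x, y) / cov(x, x)                   # inconsistent: about 2 + 1/2 = 2.5
b_iv = cov(z, y) / cov(z, x)                    # consistent: about 2
print(b_ols, b_iv)
```

OLS converges to β1 + Cov(x, u)/Var(x) = 2.5, while the IV ratio of sample covariances converges to the true β1 = 2.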
One of the problems with IV estimation is that it is a large sample method; in small samples the bias in the estimates can be substantial.

Example 12.2. Assume that we want to estimate the relationship between education and wage. We know that education is correlated with ability, and suppose we do not have the data for any proxy variable measuring ability (such as IQ). Hence if we simply run the following regression, the parameter estimates will be biased and inconsistent:

ln(wage) = β0 + β1 educ + u,  (156)

because u contains ability, and thus u and educ are correlated. The last digit of phone numbers (phone) probably satisfies Cov(phone, u) = 0. But clearly Cov(phone, educ) = 0, implying that phone is not a valid IV for educ. What about the mother's education level (moteduc)? This variable is probably correlated with educ, but it is also likely correlated with ability; hence Cov(moteduc, u) ≠ 0. Another candidate for an IV might be the number of siblings (siblings). Generally, siblings is associated with lower educ. Hence if siblings is not correlated with u, then it can be an IV. My subjective belief is that siblings can be correlated with u; but even if there is such a correlation, it would probably be smaller than the corresponding correlation for moteduc.

12.0.1 Statistical Inference for the IV Estimator, Weak IVs, and Goodness of Fit

Assume that Var(u | x) = σ² (homoskedasticity). Then it can be shown that the standard error for β̂1,IV is given by

se(β̂1,IV) = √(σ̂²/(SSTx R²_{x,z})),

where σ̂² is the estimator for Var(u) = σ², SSTx is the total sum of squares of the xi, and R²_{x,z} is the R² from the regression of x on z and the constant. The OLS counterpart of this standard error is

se(β̂1,OLS) = √(σ̂²/SSTx).

Note that whenever z = x, R²_{x,z} = 1. But if the correlation between x and z is low, the IV estimator (assuming that OLS is valid) has larger standard errors. This kind of IV is said to be a weak IV. Moreover, even if z and u are only moderately correlated, if z is a weak IV the bias can be large. Hence, whenever OLS is valid, although the IV estimators are still consistent, the OLS estimators are preferred, as they have smaller standard errors.
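The weak-IV warning above can be made concrete with a pure-Python Monte Carlo (instrument strengths, sample size, and the use of a robust spread measure are all my own choices): the same IV estimator is far more dispersed when the instrument is only weakly correlated with x:

```python
import random
import statistics

random.seed(13)

def iv_estimate(n, strength):
    """One IV estimate of beta1 = 2 with first-stage coefficient `strength` on z."""
    z = [random.gauss(0, 1) for _ in range(n)]
    w = [random.gauss(0, 1) for _ in range(n)]                    # endogeneity source
    x = [strength * a + b + random.gauss(0, 1) for a, b in zip(z, w)]
    y = [2 * xi + b + random.gauss(0, 1) for xi, b in zip(x, w)]
    mz, mx, my = (statistics.fmean(s) for s in (z, x, y))
    czy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    czx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return czy / czx

strong = [iv_estimate(500, 1.0) for _ in range(400)]
weak = [iv_estimate(500, 0.1) for _ in range(400)]

# Median absolute deviation from the truth (robust, since weak-IV estimates have fat tails)
spread_strong = statistics.median(abs(b - 2) for b in strong)
spread_weak = statistics.median(abs(b - 2) for b in weak)
print(spread_strong, spread_weak)   # the weak-IV spread is many times larger
```

The typical estimation error is an order of magnitude larger with the weak instrument, matching the standard-error formula: the IV standard error scales with 1/√(R²_{x,z}).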
Of course, another reason for this choice is that the OLS estimator is unbiased in small samples, in contrast to the IV estimator. Finally, most estimation packages compute the R² after the IV estimation. This R² for an IV estimation can be negative, and we cannot use it to calculate the F statistic. Also, IV would have a smaller R² compared to OLS, since OLS minimizes the sum of squared residuals; but of course our sole purpose is not error minimization. Hence R² cannot be used to choose the estimation method (OLS or IV).

12.0.2 IV Estimation for the Multiple Regression Model

In this section we denote the endogenous variables by y's and the IVs by z's. Now consider a simple multiple regression model:

y1 = β0 + β1 y2 + β2 z1 + u,

where Cov(u, y2) ≠ 0 and Cov(u, z1) = 0. This form is called the structural equation, as it involves a causal relationship. Since z1 is already a regressor, we need another exogenous variable as an IV, say z2. Then we have Cov(u, z1) = 0 and Cov(u, z2) = 0. We can use the sample counterparts of these conditions in order to identify our parameters β0, β1 and β2; that is, we will have three equations and three unknowns:

Σ_{i=1}^{n} (y1i − β̂0 − β̂1 y2i − β̂2 z1i) = 0,
Σ_{i=1}^{n} z1i (y1i − β̂0 − β̂1 y2i − β̂2 z1i) = 0,
Σ_{i=1}^{n} z2i (y1i − β̂0 − β̂1 y2i − β̂2 z1i) = 0.

Except through economic reasoning, we cannot test whether z2 is correlated with u or not. But we can test whether it is correlated with y2. In order to do this, we can run the following regression:

y2 = α0 + α1 z1 + α2 z2 + v.

This regression is called the reduced form regression, as it represents y2 in terms of exogenous regressors. If the coefficient of z2 is significant, we can use z2 as an IV.

12.0.3 Two Stage Least Squares

In this section we consider how to deal with the case where the number of IVs is larger than the number of endogenous regressors. For simplicity, assume that there is only one endogenous regressor in the model:

y1 = β0 + β1 y2 + β2 z1 + u,

where Cov(u, y2) ≠ 0, Cov(u, zi) = 0 for i = 1, 2, 3, and Cov(y2, zi) ≠ 0 for i = 2, 3. The conditions Cov(u, zi) = 0 for i = 2, 3 are called exclusion restrictions. The sample counterparts of these conditions, together with the zero-mean condition on u, lead to four equations for three
unknowns, β̂0, β̂1 and β̂2. In general such a system does not have any solution. In order to solve this issue we can use an optimal combination of the exogenous variables. This is done by regressing y2 on all of the exogenous variables:

y2 = α0 + α1 z1 + α2 z2 + α3 z3 + v.

The best IV for y2 is the linear combination of all the exogenous regressors:

y2* = α0 + α1 z1 + α2 z2 + α3 z3.

Obviously y2* is not correlated with u. In order to identify β1, the IV for y2 should not be perfectly correlated with z1; hence we need either α2 or α3 to be significant. This can be tested by the usual F test for H0: α2 = α3 = 0. The idea here is decomposing y2 into two parts: one uncorrelated with the error term (y2*) and one correlated with the error term (v = y2 − y2*). Of course, we are only interested in the uncorrelated part, as it would be a good IV. Finally, since we do not know the α parameters, we cannot know y2*; hence we rather use its estimate ŷ2 as our IV:

ŷ2 = α̂0 + α̂1 z1 + α̂2 z2 + α̂3 z3.

With these ideas we can obtain the IV estimates in two stages. This is called the two stage least squares (2SLS) estimator.

Procedure for 2SLS:
1. Regress y2 on the exogenous regressors and a constant and get ŷ2, e.g., ŷ2 = α̂0 + α̂1 z1 + α̂2 z2 + α̂3 z3.
2. Replace y2 with ŷ2 and run OLS to get the consistent parameter estimates, e.g., run OLS for y1 = β0 + β1 ŷ2 + β2 z1 + error.

Note that y2 = ŷ2 + v̂; hence y1 = β0 + β1 ŷ2 + β2 z1 + (u + β1 v̂). This implies that the standard errors from the second stage will not be valid, because the second stage error involves an additional error term, β1 v̂. The two-step procedure is only a tool to get consistent parameter estimates; the correct standard errors are automatically calculated in many statistical packages. Indeed, these packages calculate the 2SLS estimates in only one stage; the details of the estimation are beyond the scope of this course.

When we have multiple endogenous variables on the right hand side, IV estimation requires that the number of IVs (the excluded exogenous variables) is at least as large as the number of right hand side endogenous variables.
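The two-stage procedure can be sketched in pure Python (the data-generating design — two instruments z1 (included) and z2 (excluded), a shared shock v creating the endogeneity, β = (1, 2, 1) — is my own illustration):

```python
import random
import statistics

random.seed(11)

def ols2(x1, x2, y):
    """OLS of y on a constant, x1 and x2, via the demeaned normal equations."""
    m1, m2, my = (statistics.fmean(s) for s in (x1, x2, y))
    d1 = [a - m1 for a in x1]; d2 = [a - m2 for a in x2]; dy = [a - my for a in y]
    s11 = sum(a * a for a in d1); s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy)); s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

n = 50_000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]                       # endogeneity source
y2 = [a + b + c + random.gauss(0, 1) for a, b, c in zip(z1, z2, v)]
u = [c + random.gauss(0, 1) for c in v]                          # Cov(y2, u) = 1
y1 = [1 + 2 * a + b + e for a, b, e in zip(y2, z1, u)]           # beta1 = 2, beta2 = 1

# Stage 1: regress y2 on the exogenous variables, form the fitted values.
a0, a1, a2 = ols2(z1, z2, y2)
y2_hat = [a0 + a1 * p + a2 * q for p, q in zip(z1, z2)]

# Stage 2: replace y2 by its fitted value.
b0, b1, b2 = ols2(y2_hat, z1, y1)
print(b0, b1, b2)   # about 1, 2, 1
```

The second-stage point estimates are consistent for (β0, β1, β2); as the text notes, the naive second-stage standard errors would not be, which is why packages compute 2SLS in one step.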
This condition, called the order condition, is necessary but not sufficient. The sufficient conditions, called the rank conditions, are beyond the scope of this course. Finally, the F test for 2SLS requires some caution: the R² form of the F test is not valid anymore. This should be clear from the fact that R² can be negative for an IV estimation. Luckily, statistical packages calculate an appropriate F test, and we do not need to worry about the details.

13 Chapter 14

13.1 Panel Data Methods

In this chapter we expand the simple OLS model to allow for unobservable cross-sectional heterogeneities. More precisely, we concentrate on panel data techniques. Some of the advantages of panel data over cross-sectional data are: (1) it controls for individual heterogeneity; (2) the data are more informative, which in turn means less collinearity among the variables; and (3) it can control for effects that cannot be controlled for by cross-sectional or time series data alone.

Let's say that we have data on income (y), years of schooling (x), a race dummy (d1) equal to 1 for blacks, a sex dummy (d2) equal to 1 for males, and a dummy for black males (d3) equal to 1 for black males. If we believe that the effect of educational achievements varies across groups, the relevant regression becomes

y = β0 + β1 d1 + β2 d2 + β3 d3 + β4 x + β5 d1 x + β6 d2 x + β7 d3 x + u.  (157)

If we want to test the effects of race (β1 = β5 = 0), the effect of sex (β2 = β6 = 0), or the interaction of race and sex (β3 = β7 = 0), we can use the F test. We already talked about these when we were talking about dummy variables. If the econometrician believes that both the intercept and the slope coefficients are different, then separate regressions can be carried out for each group. On the other hand, if the different groups do not have any effect (β1 = β2 = β3 = β5 = β6 = β7 = 0), then a pooled regression can be carried out, which improves the efficiency of the estimates.

Note that when dealing with different groups, the econometrician should be careful about heteroskedasticity: it is possible to have different error variances for different groups.
For example, for the production function it is not very unlikely to see larger error variances for the larger firms. Hence, in such cases, the heteroskedasticity should be taken care of.

In what follows we assume that the slope parameters are the same across groups, yet the intercepts are different. Hence the heterogeneity of the firms is captured by the intercept term. More precisely, our model is given by

yit = αi + β1 x1it + β2 x2it + ... + βk xkit + uit,  (158)

where t = 1, 2, ..., T, i = 1, 2, ..., N, and uit is an error term. Note that αi is constant within a group (so constant across time), yet it depends on the group. Here αi, which we call the effects term, represents a different intercept for each of the N cross sections. In the simplest scenario the uit are independent and uit ~ N(0, σu²); that is, the variance of the error term does not depend on the group, there is no serial correlation, and we have homoskedasticity.

One of the obstacles to estimating this model is that if the number of groups is large, the estimation of the αi can be computationally difficult. For example, consider a data set with 8,000 banks: the number of parameters to be estimated is more than 8,000 if we want to estimate the αi. Due to the dummy variables, the R² of such a regression is expected to be high, and we should not get too excited about this. One can use this R² to construct an F statistic for testing the group effect, that is, for testing whether the group dummies are significant or not. Note that for a valid calculation of R² we have to include the constant term in the regression; hence there are N − 1 dummies in the significance test (we exclude the constant term). If the dummies are jointly insignificant, then the model should not include the dummy variables and should only include the constant term. Note that for the slope parameters (i.e., all parameters other than the dummies) it really does not matter which dummy variable is the base dummy.

In the following regressions, lwage is the logarithm of wage, the ddi are dummies for group i, exper is years of experience, and expersq = exper².
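The dummy-variable (fixed effects) estimator of (158) can be computed without estimating thousands of intercepts: demeaning every variable within its group yields the same slope estimates as including a dummy for every group. A pure-Python sketch (group counts, β = 1, and the correlation between αi and x are my own illustrative choices) also shows why this matters — pooled OLS that ignores αi is biased when the effects are correlated with the regressor:

```python
import random
import statistics

random.seed(12)

# Panel with N groups and T periods; the group effect alpha_i is correlated with x.
N, T, beta = 200, 10, 1.0
x, y, gid = [], [], []
for i in range(N):
    alpha = random.gauss(0, 1)
    for _ in range(T):
        xi = alpha + random.gauss(0, 1)        # regressor correlated with the effect
        x.append(xi); gid.append(i)
        y.append(alpha + beta * xi + random.gauss(0, 1))

def slope(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)

b_pooled = slope(x, y)                          # biased upward: ignores alpha_i

# Demean within each group: equivalent to including a dummy for every group.
xm = [statistics.fmean(x[i * T:(i + 1) * T]) for i in range(N)]
ym = [statistics.fmean(y[i * T:(i + 1) * T]) for i in range(N)]
xd = [a - xm[g] for a, g in zip(x, gid)]
yd = [a - ym[g] for a, g in zip(y, gid)]
b_within = slope(xd, yd)
print(b_pooled, b_within)   # pooled well above 1; within estimate close to 1
```

The within transformation wipes out αi entirely, which is exactly the "control for individual heterogeneity" advantage of panel data listed at the start of the chapter.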
expersq = exper^2 is its square. The following regressions illustrate that the coefficients of the non-dummy variables do not depend on the way in which the dummies are treated (of course, the R2's are not the same, and only the one from the regression that includes the constant term is valid).

[Stata output, cleaned up: three regressions of lwage on exper, expersq, and group dummies; 64 observations, 8 groups.]

  (1) reg lwage exper expersq dd1-dd8, noconstant   R-squared = 0.9558
  (2) reg lwage exper expersq dd1-dd7               R-squared = 0.5039  (dd8 is the base group; _cons = 1.029221)
  (3) reg lwage exper expersq dd1-dd6 dd8           R-squared = 0.5039  (dd7 is the base group; _cons = 1.959941)

In all three regressions the non-dummy estimates are identical: the coefficient on exper is -.0320446 (std. err. .0854264, t = -0.38) and the coefficient on expersq is .0059262 (std. err. .0066105, t = 0.90). Only the dummy coefficients change, because they are measured relative to different base groups.

This procedure, where we include dummies in the regression, is called least squares dummy variable (LSDV) regression. As we mentioned, the number of dummies can be arbitrarily large, and many times we are not interested in the dummies themselves. In such cases it is more sensible to perform a fixed effects (or within) estimation. The idea is to transform the model in such a way as to get rid of the ai term. Note that

ybar_i = ai + b1 xbar_1,i + b2 xbar_2,i + ... + bk xbar_k,i + ubar_i    (159)

where ybar_i = (1/T) sum_t yit, xbar_j,i = (1/T) sum_t xj,it for j = 1, 2, ..., k, and ubar_i = (1/T) sum_t uit are the averages of the relevant variables over time. Therefore

yit - ybar_i = b1 (x1,it - xbar_1,i) + b2 (x2,it - xbar_2,i) + ... + bk (xk,it - xbar_k,i) + (uit - ubar_i)    (160)

Let yddot_it = yit - ybar_i, xddot_j,it = xj,it - xbar_j,i for j = 1, 2, ..., k, and uddot_it = uit - ubar_i be the time-demeaned variables. Then

yddot_it = b1 xddot_1,it + b2 xddot_2,it + ... + bk xddot_k,it + uddot_it    (161)

Now that we have gotten rid of the effects term, this transformed model can easily be estimated by OLS. The good thing is that the slope estimates and the standard errors from LSDV and this OLS are the same. This can be seen from the following fixed effects estimation output of Stata. Note that in this output we have a constant term. This constant term is not supposed to be there for the usual fixed effects estimation model that we described; it is nothing but the average of all the effects terms. Also note that the uddot_it's add up to zero when summed over time. This decreases the degrees of freedom by N, so the final degrees of freedom becomes NT - N - k.

The fixed effects estimator, by its nature, shows properties similar to those of OLS.
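The claim that LSDV and the within (time-demeaned) regression give identical slope estimates can be checked directly. The following is a minimal sketch on simulated data (hypothetical numbers, a single regressor for brevity): it runs the dummy-variable regression and the demeaned regression and compares the slope.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 5, 10
alpha = rng.normal(0, 1, N)                    # hypothetical group effects
x = rng.normal(0, 1, (N, T))
y = alpha[:, None] + 0.5 * x + rng.normal(0, 0.2, (N, T))

# LSDV: regress y on x and a full set of N group dummies (no constant).
group = np.repeat(np.arange(N), T)
D = (group[:, None] == np.arange(N)).astype(float)
X_lsdv = np.column_stack([x.ravel(), D])
b_lsdv, *_ = np.linalg.lstsq(X_lsdv, y.ravel(), rcond=None)

# Within estimator: demean y and x over time within each group, then OLS.
y_dm = (y - y.mean(axis=1, keepdims=True)).ravel()
x_dm = (x - x.mean(axis=1, keepdims=True)).ravel()
b_within = (x_dm @ y_dm) / (x_dm @ x_dm)       # OLS slope; no intercept needed

# The two slope estimates coincide (Frisch-Waugh-Lovell):
print(abs(b_lsdv[0] - b_within) < 1e-8)        # prints True
```

The point estimates match exactly; to make the standard errors of the demeaned regression match LSDV as well, one must use NT - N - k degrees of freedom rather than the NT - k that a naive OLS routine would report, exactly as noted above.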
For example, whenever the error term is correlated with the regressors, we get inconsistent parameter estimates. Similarly, heteroskedasticity leads to inconsistent standard error estimates.

Another way to model the group heterogeneity is to assume a random effects term; these models are called random effects models. Consider the following model: