# Applied Econometrics A,RESEC 213

These 121 pages of class notes were uploaded by Jalyn Schaefer II on Thursday, October 22, 2015. The class notes belong to A,RESEC 213 at University of California - Berkeley, taught by Staff in Fall. Since their upload they have received 27 views. For similar materials see /class/226561/a-resec-213-university-of-california-berkeley in Agricultural & Resource Econ at University of California - Berkeley.

Date Created: 10/22/15

## Imbens Lecture Notes 15, ARE213 Fall '04

ARE213 Econometrics, Fall 2004, UC Berkeley, Department of Agricultural and Resource Economics

### Discrete Response Models IV: McFadden's Conditional Logit Model for Gas/Electric Dryer Purchases

McFadden (1982) is interested in analyzing the choice by households to purchase an electric dryer, a gas dryer, or no dryer at all. He uses a conditional logit model. The starting point is an indirect utility function that depends on the operating and capital cost of the device, and on interactions of the indicators for the choices with some individual characteristics. The utility of the electric dryer for household $i$ is

$$U_{i,\text{elec}} = \beta_{0,\text{elec}} + \beta_{1,\text{elec}}\cdot\text{own}_i + \beta_{2,\text{elec}}\cdot\text{persons}_i + \beta_{3,\text{elec}}\cdot\text{gasav}_i + \beta_{\text{oper}}\cdot\text{elec-oper}_i + \beta_{\text{cap}}\cdot\text{elec-cap}_i + \varepsilon_{i,\text{elec}}.$$

The utility of the gas dryer for household $i$ is

$$U_{i,\text{gas}} = \beta_{0,\text{gas}} + \beta_{1,\text{gas}}\cdot\text{own}_i + \beta_{2,\text{gas}}\cdot\text{persons}_i + \beta_{3,\text{gas}}\cdot\text{gasav}_i + \beta_{\text{oper}}\cdot\text{gas-oper}_i + \beta_{\text{cap}}\cdot\text{gas-cap}_i + \varepsilon_{i,\text{gas}}.$$

The utility of no dryer for household $i$ is

$$U_{i,\text{no}} = \beta_{0,\text{no}} + \beta_{1,\text{no}}\cdot\text{own}_i + \beta_{2,\text{no}}\cdot\text{persons}_i + \beta_{3,\text{no}}\cdot\text{gasav}_i + \varepsilon_{i,\text{no}}.$$

The operating and capital cost of no dryer are assumed to be zero by McFadden. (He probably has not done much handwashing.) McFadden assumes that the three disturbances are independent and identically distributed with the extreme value distribution, with cdf

$$F(\varepsilon) = \exp(-\exp(-\varepsilon)).$$

Household $i$ chooses the electric dryer if

$$U_{i,\text{elec}} = \max(U_{i,\text{elec}},\, U_{i,\text{gas}},\, U_{i,\text{no}}),$$

and similarly for the other options. Define the systematic utility

$$\bar U_{i,\text{elec}} = \beta_{0,\text{elec}} + \beta_{1,\text{elec}}\cdot\text{own}_i + \beta_{2,\text{elec}}\cdot\text{persons}_i + \beta_{3,\text{elec}}\cdot\text{gasav}_i + \beta_{\text{oper}}\cdot\text{elec-oper}_i + \beta_{\text{cap}}\cdot\text{elec-cap}_i,$$

and similarly $\bar U_{i,\text{gas}}$ and $\bar U_{i,\text{no}}$. The implication of the model is that the probability of buying an electric dryer is

$$\Pr(\text{elec}) = \frac{\exp(\bar U_{i,\text{elec}})}{\exp(\bar U_{i,\text{elec}}) + \exp(\bar U_{i,\text{gas}}) + \exp(\bar U_{i,\text{no}})},$$

and similarly

$$\Pr(\text{gas}) = \frac{\exp(\bar U_{i,\text{gas}})}{\exp(\bar U_{i,\text{elec}}) + \exp(\bar U_{i,\text{gas}}) + \exp(\bar U_{i,\text{no}})}, \qquad \Pr(\text{no}) = \frac{\exp(\bar U_{i,\text{no}})}{\exp(\bar U_{i,\text{elec}}) + \exp(\bar U_{i,\text{gas}}) + \exp(\bar U_{i,\text{no}})}.$$

From data on the operating and capital costs and the individual characteristics we cannot identify all parameters. Suppose we subtract an individual-specific, choice-invariant $c_i$ from $U_{i,\text{elec}}$, $U_{i,\text{gas}}$ and $U_{i,\text{no}}$. That would not change the ranking, so we cannot
tell that apart from the original model. So, choose

$$c_i = \beta_{0,\text{gas}} + \beta_{1,\text{gas}}\cdot\text{own}_i + \beta_{2,\text{gas}}\cdot\text{persons}_i + \beta_{3,\text{gas}}\cdot\text{gasav}_i.$$

That would amount to fixing, in the original model, $\beta_{0,\text{gas}} = \beta_{1,\text{gas}} = \beta_{2,\text{gas}} = \beta_{3,\text{gas}} = 0$.

Table 1: Conditional logit estimates (McFadden, 1982)

| variable | name in paper | coeff. |
|---|---|---|
| *Choice-specific covariates* | | |
| Operating cost for dryer | CDOPCOST | -0.0144 |
| Capital cost for dryer | CDCPCOST | -0.0160 |
| *Individual-specific, choice-invariant covariates* | | |
| Gas available (electric) | GASAV1 | 1.27 |
| House owner (electric) | OWN1 | 0.60 |
| Number of people in household (electric) | PERSONS1 | 0.075 |
| Intercept (electric) | C1 | 2.10 |
| Gas available (no dryer) | GASAV3 | 1.56 |
| House owner (no dryer) | OWN3 | 1.59 |
| Number of people in household (no dryer) | PERSONS3 | 0.40 |
| Intercept (no dryer) | C3 | 0.02 |

McFadden's estimates are given in Table 1. Even more than in the binary logit and probit models, these coefficients are difficult to interpret. So instead McFadden reports some elasticities. For example, consider the elasticity of the probability of buying an electric dryer with respect to the operating cost of an electric dryer:

$$E_{\text{elec},\text{elec-oper}} = \frac{\partial \Pr(\text{elec})}{\partial\,\text{elec-oper}}\cdot\frac{\text{elec-oper}}{\Pr(\text{elec})}.$$

This elasticity will depend on the values of the covariates. We will evaluate the elasticities at the means of the variables, given in Table 2.

Table 2: Means

| variable | electric | gas | no |
|---|---|---|---|
| CHOICE | 0.447 | 0.235 | 0.318 |
| CDOPCOST | 31.17 | 7.56 | 0 |
| CDCPCOST | 233.20 | 258.80 | 0 |
| GASAV | 0.719 | | |
| OWN | 0.873 | | |
| PERSONS | 3.31 | | |

The derivative of the probability of buying an electric dryer with respect to the operating cost of an electric dryer is

$$\frac{\partial \Pr(\text{elec})}{\partial\,\text{elec-oper}} = \beta_{\text{oper}}\,\frac{\exp(\bar U_{i,\text{elec}})}{\sum_d \exp(\bar U_{i,d})} - \beta_{\text{oper}}\left(\frac{\exp(\bar U_{i,\text{elec}})}{\sum_d \exp(\bar U_{i,d})}\right)^2 = \beta_{\text{oper}}\cdot\Pr(\text{elec})\cdot\big(1 - \Pr(\text{elec})\big).$$

Evaluate this at the parameter estimates in Table 1 and at the means of the covariates given in Table 2, and you get $-0.0036$.
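The derivative and elasticity formulas above are easy to check numerically. A minimal sketch in Python; the utility levels fed to `choice_probs` are purely illustrative (not McFadden's), while the coefficient $-0.0144$, the probability 0.5116, and the mean cost 31.17 are the values used in the notes:

```python
import math

def choice_probs(utilities):
    """Conditional logit probabilities from systematic utilities U-bar."""
    e = [math.exp(u) for u in utilities]
    total = sum(e)
    return [x / total for x in e]

def own_elasticity(p_j, beta, cost):
    """Elasticity of P_j with respect to its own cost: beta * (1 - P_j) * cost."""
    return beta * (1.0 - p_j) * cost

def cross_elasticity(p_j, beta, cost):
    """Elasticity of P_k (k != j) with respect to alternative j's cost:
    -beta * P_j * cost. It does not depend on k: this is the IIA property."""
    return -beta * p_j * cost

# Illustrative utilities (not from the paper); probabilities sum to one.
probs = choice_probs([0.5, 0.2, -0.1])

# With beta_oper = -0.0144, P(elec) = 0.5116, and mean cost 31.17:
own = own_elasticity(0.5116, -0.0144, 31.17)      # about -0.22
cross = cross_elasticity(0.5116, -0.0144, 31.17)  # about 0.23
```

The cross-elasticity function makes the IIA property explicit: the operating cost of the electric dryer moves the gas and no-dryer probabilities by the same proportional amount.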
So, the derivative of the probability of buying an electric dryer with respect to the operating cost of an electric dryer, at the means of the covariates, is equal to $-0.0036$. The probability of buying an electric dryer at the covariate means is 0.5116. The average operating cost for an electric dryer is 31.17. Thus, the elasticity is

$$E_{\text{elec},\text{elec-oper}} = \frac{\partial \Pr(\text{elec})}{\partial\,\text{elec-oper}}\cdot\frac{\text{elec-oper}}{\Pr(\text{elec})} = -0.0036\times\frac{31.17}{0.5116} = -0.22.$$

The elasticity of the probability of buying a gas dryer with respect to the operating cost of an electric dryer is 0.23. The elasticity of the probability of buying no dryer with respect to the operating cost of an electric dryer is also 0.23. This is identical to the elasticity for the gas dryer, by the IIA (independence of irrelevant alternatives) property of the conditional logit model.

Note that we could have considered the average elasticities, obtained by averaging the elasticity evaluated at each set of values of the covariates. However, that information is not available from the paper, and it is likely to be very similar to the elasticities evaluated at the average values of the covariates.

#### References

McFadden, D. (1982), "Qualitative Response Models," in Hildenbrand (ed.), *Advances in Econometrics*, Econometric Society Monographs, Cambridge University Press.

## Imbens Lecture Notes 13, ARE213 Fall '04

ARE213 Econometrics, Fall 2004, UC Berkeley, Department of Agricultural and Resource Economics

### Discrete Response Models II: Ordered Multinomial Response Models

Now we consider discrete response models with more than two possible responses. In this lecture we limit ourselves to the case where the outcomes are ordered. There are two important cases. In the first, there is an underlying continuous variable, but we only observe an indicator for a particular range. An example is earnings data that may come coded in intervals (e.g., 0–10, 10–50, 50 and up). Another example is educational choices, where outcomes may be coded as less than high school, high school, some college, more than college. Such data are referred to as interval-coded data. Typically we model the underlying continuous variable as linear and normal (or logistic) with a set of covariates,

$$Y_i^* = X_i'\beta + \varepsilon_i,$$

with the observed outcome an indicator for the interval: for $j = 0, \ldots, J$, $Y_i = j$
if $\alpha_j \le X_i'\beta + \varepsilon_i < \alpha_{j+1}$, with $\alpha_{J+1} = \infty$, $\alpha_0 = -\infty$, and $\alpha_j < \alpha_{j+1}$. In this case the key assumption is that the boundaries $\alpha_j$ are known a priori, and we are typically interested in the conditional expectation of the latent outcome, $E[Y_i^*|X_i]$, rather than in the distribution of the observed outcome, $\Pr(Y_i = j|X_i)$.

Another possibility arises when the responses are ordered, but there is no clear mapping from the underlying continuous response (which may itself be somewhat vague) to the discrete response. For example, respondents in a survey may be asked about their interest in a particular service, and asked to respond in one of three categories: not interested, somewhat interested, and very interested. In that case we may still wish to model the response through the same latent variable approach,

$$Y_i^* = X_i'\beta + \varepsilon_i, \qquad Y_i = j \;\text{ if }\; \alpha_j \le Y_i^* < \alpha_{j+1},$$

with $\alpha_{J+1} = \infty$, $\alpha_0 = -\infty$, and $\alpha_j < \alpha_{j+1}$. However, in this case we may not wish to impose the boundary values $\alpha_j$ a priori, and prefer to estimate them jointly with the regression parameters. A second difference with the interval-coded case is that now we are typically interested in the distribution of the observed outcome. In both cases the statistical model is the same, although the substantive interpretation differs.

Let us assume for the moment that the $\varepsilon_i$ are normal with mean zero and variance $\sigma^2$. The conditional probability that $Y_i = 0$ is

$$\Pr(Y_i = 0|X_i) = \Pr(X_i'\beta + \varepsilon_i < \alpha_1) = \Pr\left(\frac{\varepsilon_i}{\sigma} < \frac{\alpha_1 - X_i'\beta}{\sigma}\right) = \Phi\left(\frac{\alpha_1 - X_i'\beta}{\sigma}\right).$$

For $0 < j < J$ the probability is

$$\Pr(Y_i = j|X_i) = \Pr(\alpha_j \le X_i'\beta + \varepsilon_i < \alpha_{j+1}) = \Phi\left(\frac{\alpha_{j+1} - X_i'\beta}{\sigma}\right) - \Phi\left(\frac{\alpha_j - X_i'\beta}{\sigma}\right),$$

and for the last probability we have

$$\Pr(Y_i = J|X_i) = \Pr(\alpha_J \le X_i'\beta + \varepsilon_i) = 1 - \Phi\left(\frac{\alpha_J - X_i'\beta}{\sigma}\right).$$

This way we can build up the log likelihood function as a function of $\beta$, $\alpha$, and $\sigma^2$. Denote this by

$$L(\beta, \alpha, \sigma^2) = \sum_{i=1}^N \left[ 1\{Y_i = 0\}\ln\Phi\left(\frac{\alpha_1 - X_i'\beta}{\sigma}\right) + \sum_{j=1}^{J-1} 1\{Y_i = j\}\ln\left(\Phi\left(\frac{\alpha_{j+1} - X_i'\beta}{\sigma}\right) - \Phi\left(\frac{\alpha_j - X_i'\beta}{\sigma}\right)\right) + 1\{Y_i = J\}\ln\left(1 - \Phi\left(\frac{\alpha_J - X_i'\beta}{\sigma}\right)\right)\right].$$

There are a couple of caveats in maximizing the log likelihood function. First, consider the case where the $\alpha_j$ are known. If $J = 1$, so there are two
choices, we cannot identify $\sigma^2$. We are back in the binary response world, where the error variance is not identified, and we typically set it equal to unity. With $J > 1$ we can identify the variance separately, and we do not need to normalize the parameters. With $J$ large we get approximately back to the case where we observe the actual value of the outcome. In fact, in all cases we only observe $Y_i$ discretely, so you could argue that the ordered discrete response model is always appropriate. In practice, if $J$ is more than five or so, people rarely bother using a discrete response model and just go ahead with the standard linear model.

Second, consider the case with $\alpha$ unknown. This is a much more difficult case, both in terms of computation and in terms of interpretation. Now there are a couple of normalizations. We cannot identify the intercept in $\beta$ separately from the location of the boundary points $\alpha_j$, so typically we normalize the intercept to zero. Second, we cannot identify the scale of the boundary points separately from the error variance, so typically we normalize $\sigma^2$ to one.

Another issue for the case with unknown $\alpha$ is what we should report. The $\beta$ parameters are not very useful in their own right, considerably less even than in the binary response model. They tell us whether the corresponding covariate is positively or negatively associated with the latent outcome, but that does not have a real meaning. A positive value for $\beta_k$ also tells us that an increase in $X_{ik}$ is associated with a lower probability of $Y_i = 0$, and an increase in the probability that $Y_i = J$. It does not even tell us what the sign is on the probability that $Y_i = j$ for interior $j$. We can calculate the derivative of the probability with respect to the covariate, but this is different for all choices, and so there is potentially a lot to report. In practice you should look for meaningful things to report.
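Computing the predicted cell probabilities derived above is straightforward. A small sketch using only the standard library; the cutoffs and index value are made up for illustration:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probs(xb, cutoffs, sigma=1.0):
    """Pr(Y = j | X) for j = 0..J in an ordered probit, with boundaries
    alpha_0 = -inf < alpha_1 < ... < alpha_J < alpha_{J+1} = +inf."""
    a = [-math.inf] + sorted(cutoffs) + [math.inf]
    return [norm_cdf((a[j + 1] - xb) / sigma) - norm_cdf((a[j] - xb) / sigma)
            for j in range(len(a) - 1)]

# Three categories with cutoffs -1 and 1, evaluated at X'beta = 0:
# the middle category gets most of the mass, the tails are symmetric.
p = ordered_probs(0.0, [-1.0, 1.0])
```

Shifting `xb` upward drains mass from the lowest category into the highest, which is exactly the sign statement about $\beta_k$ in the text.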
Consider a specific policy and its effect on the predicted probabilities for all the choices, and possibly do this for a couple of policy experiments.

## Imbens Lecture Notes 3, ARE213 Fall '04

ARE213 Econometrics, Fall 2004, UC Berkeley, Department of Agricultural and Resource Economics

### Ordinary Least Squares III: Omitted Variable Bias and Proxy Variables (W 4.3)

#### A. Omitted variable bias

Often we estimate a linear regression function, but we are not completely sure that we have included all the relevant regressors. Here we investigate how omitting a variable affects the coefficients on the regressors we are most interested in. Suppose the true regression function is

$$Y_i = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_K X_{iK} + \beta_Z Z_i + \varepsilon_i.$$

We refer to this as the "long regression." Now suppose we estimate the regression function without $Z_i$:

$$Y_i = \gamma_0 + \gamma_1 X_{i1} + \ldots + \gamma_K X_{iK} + \eta_i,$$

referred to as the "short regression." This regression is largely definitional: the $\gamma$'s are defined as the coefficients of the best linear predictor. In addition, it is useful to consider the artificial regression of the omitted $Z_i$ on a constant and the $X_{ik}$:

$$Z_i = \delta_0 + \delta_1 X_{i1} + \ldots + \delta_K X_{iK} + \nu_i.$$

Again this is definitional: choose $\delta = E[XX']^{-1}E[XZ]$, so that $\nu_i = Z_i - X_i'\delta$ is by definition uncorrelated with $X_i$. If we estimate the short regression and focus on the $k$th regressor, we will estimate $\gamma_k$. We are interested in the relation between $\gamma_k$ and $\beta_k$. To see what this will look like, consider the long regression and substitute in for the omitted $Z_i$:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_K X_{iK} + \beta_Z Z_i + \varepsilon_i$$
$$= \beta_0 + \beta_1 X_{i1} + \ldots + \beta_K X_{iK} + \beta_Z(\delta_0 + \delta_1 X_{i1} + \ldots + \delta_K X_{iK} + \nu_i) + \varepsilon_i$$
$$= (\beta_0 + \beta_Z\delta_0) + (\beta_1 + \beta_Z\delta_1)X_{i1} + \ldots + (\beta_K + \beta_Z\delta_K)X_{iK} + (\beta_Z\nu_i + \varepsilon_i).$$

Since the composite error term $\xi_i = \beta_Z\nu_i + \varepsilon_i$ is uncorrelated with the $X$'s by definition, the regression coefficients in this representation are what you get from the short regression. So

$$\gamma_k = \beta_k + \beta_Z\cdot\delta_k:$$

the omitted variable bias, the difference between the coefficient in the short regression, $\gamma_k$, and the coefficient in the long regression, $\beta_k$, is equal to the product of the coefficient on the omitted variable, $\beta_Z$, and the coefficient on the included regressor $X_{ik}$ in a regression of the omitted variable on all included regressors, $\delta_k$. These equalities also hold exactly for the estimated regression
coefficients: $\hat\gamma_k = \hat\beta_k + \hat\beta_Z\cdot\hat\delta_k$.

The practical relevance of this may seem small. In practice we do not observe the omitted variable, and so we cannot estimate these regression coefficients; if we could, we would not have the bias. Nevertheless, this result is extremely useful. Let us see how it is used in practice. Consider a wage regression of log earnings on education of the type we looked at before:

$$\widehat{\log(\text{earnings})}_i = \underset{(0.0849)}{5.0455} + \underset{(0.0062)}{0.0667}\cdot\text{educ}_i.$$

We may be concerned that we did not control for differences in ability between different individuals. We can think of that as having erroneously omitted ability from this regression. The long regression would thus have been

$$\log(\text{earnings})_i = \beta_0 + \beta_1\cdot\text{educ}_i + \beta_2\cdot\text{ability}_i + \varepsilon_i.$$

Now what can we say about this? It is likely that the coefficient on ability is positive. Also, it seems plausible that the correlation between ability and education is positive. Hence the bias is positive: we over-estimate the returns to education, because high-ability people, who would already have relatively high earnings, also have high levels of education.

Now let us see how this works out when we actually have a measure of ability. Here we take one such measure from the NLSY, namely an IQ score. This is obviously a flawed measure of ability (even if there is such a thing), but it will do for our purpose. First look at the long regression:

$$\widehat{\log(\text{earnings})}_i = \underset{(0.1003)}{4.7050} + \underset{(0.0071)}{0.0443}\cdot\text{educ}_i + \underset{(0.0010)}{0.0063}\cdot\text{iq}_i.$$

The coefficient on education is indeed much smaller than when we did not control for ability. The difference in the coefficients is $0.0667 - 0.0443 = 0.0224$. This should be equal to the product of the coefficient on the omitted variable (0.0063) and the coefficient in a regression of the omitted variable on the included ones. That regression leads to

$$\widehat{\text{iq}}_i = \underset{(2.6229)}{53.6872} + \underset{(0.1922)}{3.5388}\cdot\text{educ}_i,$$

with slope coefficient 3.5388, so that the product is indeed $0.0063\times 3.5388 = 0.0224$.

#### B. Proxy variables

So far we have largely noted the omitted variable bias, and developed methods for at least assessing the sign of the bias.
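Before that, note that the identity $\hat\gamma_k = \hat\beta_k + \hat\beta_Z\cdot\hat\delta_k$ from section A holds exactly in any sample, which is easy to verify on simulated data. A sketch with pure-Python OLS via centered cross-moments; the coefficients 0.5, 0.7, 0.8 are illustrative, not the NLSY values:

```python
import random

random.seed(0)
n = 500
# Z is correlated with X, and both enter the true (long) regression.
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + random.gauss(0, 1) for xi in x]
y = [1.0 + 0.5 * xi + 0.7 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def s(a, b):
    """Centered cross-moment: sum_i (a_i - mean(a)) * (b_i - mean(b))."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

sxx, szz, sxz = s(x, x), s(z, z), s(x, z)
sxy, szy = s(x, y), s(z, y)
det = sxx * szz - sxz ** 2
beta_1 = (szz * sxy - sxz * szy) / det   # long regression: coefficient on X
beta_z = (sxx * szy - sxz * sxy) / det   # long regression: coefficient on Z
gamma_1 = sxy / sxx                      # short regression: coefficient on X
delta_1 = sxz / sxx                      # artificial regression: Z on X

# Omitted variable bias: gamma_1 = beta_1 + beta_z * delta_1, exact in-sample.
bias = gamma_1 - beta_1
```

Here the bias is positive for the same reason as in the returns-to-education example: $\hat\beta_Z > 0$ and $\hat\delta_1 > 0$.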
What else can we do? An additional approach is to use a proxy variable. For this I reasonably closely follow Wooldridge's discussion. The idea is to include another covariate in the regression, and thus eliminate at least part of the bias that stems from omitting the earlier variable. So suppose we would like to run the long regression

$$Y_i = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_K X_{iK} + \beta_Z Z_i + \varepsilon_i.$$

We do not observe $Z_i$, but instead we observe a proxy $W_i$. This proxy variable does not enter into the original regression, and it is uncorrelated with $\varepsilon_i$. It is correlated with the omitted variable $Z_i$, and so we will run the regression

$$Y_i = \gamma_0 + \gamma_1 X_{i1} + \ldots + \gamma_K X_{iK} + \gamma_W W_i + \nu_i.$$

Wooldridge discusses two assumptions that make $W_i$ a perfect proxy variable. The first is that $W_i$ is uncorrelated with $\varepsilon_i$. Hence, if we were to run the regression

$$Y_i = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_K X_{iK} + \beta_Z Z_i + \beta_W W_i + \varepsilon_i,$$

the estimator for $\beta_W$ would converge to zero. Second, partialling out the proxy variable $W_i$, the omitted variable $Z_i$ is uncorrelated with the included variables $X_{i1}, \ldots, X_{iK}$. Formally, if we run the regression

$$Z_i = \delta_0 + \delta_1 X_{i1} + \ldots + \delta_K X_{iK} + \delta_W W_i + \nu_i,$$

the estimators for $\delta_1, \ldots, \delta_K$ converge to zero. Under those assumptions we can directly use the earlier results on omitted variable bias. Think of the regression including all of $X_{i1}, \ldots, X_{iK}, Z_i, W_i$ as the long regression, the regression omitting $Z_i$ but including $X_{i1}, \ldots, X_{iK}, W_i$ as the short regression, and the regression of $Z_i$ on $X_{i1}, \ldots, X_{iK}, W_i$ as the artificial regression. The coefficient on $X_{ik}$ in the short regression is, by the omitted variable bias result, equal to $\gamma_k = \beta_k + \beta_Z\cdot\delta_k = \beta_k$, because $\delta_k$ is zero, and so there is no bias for this coefficient. The coefficient on $W_i$ in the short regression is $\gamma_W = \beta_W + \beta_Z\cdot\delta_W = \beta_Z\cdot\delta_W$, which is biased, but we typically do not care about the coefficient on the proxy variable.

Now let us look at this without assuming that $W_i$ is a perfect proxy variable. There is now a large number of regressions floating around. For expositional reasons we look at the case with a single covariate $X_i$. We are
interested in $\beta_1$ in the regression

$$Y_i = \beta_0 + \beta_1\cdot X_i + \beta_Z\cdot Z_i + \varepsilon_{1i}.$$

We wish to compare the bias resulting from omitting $Z_i$ from this regression and estimating

$$Y_i = \gamma_0 + \gamma_1\cdot X_i + \varepsilon_{2i}, \qquad (1)$$

with that of the regression where we replace $Z_i$ with $W_i$ and estimate

$$Y_i = \tilde\gamma_0 + \tilde\gamma_1\cdot X_i + \tilde\gamma_W\cdot W_i + \varepsilon_{3i}. \qquad (2)$$

We also need to consider the long regression

$$Y_i = \alpha_0 + \alpha_1\cdot X_i + \alpha_Z\cdot Z_i + \alpha_W\cdot W_i + \varepsilon_{4i}, \qquad (3)$$

and the artificial regressions

$$W_i = \kappa_0 + \kappa_X\cdot X_i + \kappa_Z\cdot Z_i + \varepsilon_{5i},$$
$$Z_i = \delta_0 + \delta_X\cdot X_i + \delta_W\cdot W_i + \varepsilon_{6i},$$
$$Z_i = \theta_0 + \theta_X\cdot X_i + \varepsilon_{7i}.$$

First consider the relation between the coefficients of interest and the coefficients from the long regression. Using the omitted variable bias formula, we have

$$\beta_Z = \alpha_Z + \alpha_W\cdot\kappa_Z, \qquad \text{and} \qquad \beta_1 = \alpha_1 + \alpha_W\cdot\kappa_X.$$

Next consider the bias from running (1). Using the omitted variable bias formula, the bias is equal to

$$\gamma_1 - \beta_1 = \beta_Z\cdot\theta_X = (\alpha_Z + \alpha_W\cdot\kappa_Z)\cdot\theta_X. \qquad (4)$$

Now consider the bias from replacing $Z_i$ by $W_i$. Using the omitted variable bias formula for the third time, we have

$$\tilde\gamma_1 = \alpha_1 + \alpha_Z\cdot\delta_X,$$

so that the bias is

$$\tilde\gamma_1 - \beta_1 = -\alpha_W\cdot\kappa_X + \alpha_Z\cdot\delta_X.$$

Now suppose the own effect of the proxy variable is fairly small, $\alpha_W \approx 0$. In that case the bias from omitting $Z_i$ is $\alpha_Z\cdot\theta_X$, and the bias from replacing it with $W_i$ is $\alpha_Z\cdot\delta_X$. The latter is smaller if $|\delta_X| < |\theta_X|$, that is, if controlling for $W_i$ lowers the correlation between $Z_i$ and $X_i$.

Finally, let us see how this plays out with real data. Suppose we are interested in regressing log earnings on education, controlling for IQ. The estimated regression would be

$$\widehat{\log(\text{earnings})}_i = \underset{(0.1003)}{4.7050} + \underset{(0.0071)}{0.0443}\cdot\text{educ}_i + \underset{(0.0010)}{0.0063}\cdot\text{iq}_i.$$

If we did not observe IQ, we could estimate the regression without it:

$$\widehat{\log(\text{earnings})}_i = \underset{(0.0849)}{5.0455} + \underset{(0.0062)}{0.0667}\cdot\text{educ}_i.$$

Alternatively, we could estimate a regression using a test score, KWW (knowledge of the world of work), as a proxy variable. This proxy variable regression is

$$\widehat{\log(\text{earnings})}_i = \underset{(0.0890)}{4.8004} + \underset{(0.0066)}{0.0479}\cdot\text{educ}_i + \underset{(0.0019)}{0.0140}\cdot\text{KWW}_i.$$

Here the proxy variable method seems to work quite well. To understand that better, let us look at the various components of the bias. First, the long regression:

$$\widehat{\log(\text{earnings})}_i = \underset{(0.1002)}{4.5925} + \underset{(0.0072)}{0.0347}\cdot\text{educ}_i + \underset{(0.0011)}{0.0046}\cdot\text{iq}_i + \underset{(0.0019)}{0.0117}\cdot\text{KWW}_i.$$
The three artificial regressions are

$$\widehat{\text{KWW}}_i = \underset{(1.6603)}{9.6469} + \underset{(0.1180)}{0.8285}\cdot\text{educ}_i + \underset{(0.0172)}{0.1475}\cdot\text{iq}_i,$$

$$\widehat{\text{iq}}_i = \underset{(2.7229)}{44.9921} + \underset{(0.2009)}{2.8657}\cdot\text{educ}_i + \underset{(0.0578)}{0.4950}\cdot\text{KWW}_i,$$

and finally

$$\widehat{\text{iq}}_i = \underset{(2.6229)}{53.6872} + \underset{(0.1922)}{3.5388}\cdot\text{educ}_i.$$

So first, decomposing the bias from the omitted variable regression:

$$(\alpha_Z + \alpha_W\cdot\kappa_Z)\cdot\theta_X = 0.0046\times 3.5388 + 0.0117\times 0.1475\times 3.5388 = 0.0224,$$

and the bias from the proxy variable regression is

$$-\alpha_W\cdot\kappa_X + \alpha_Z\cdot\delta_X = -0.0117\times 0.8285 + 0.0046\times 2.8657 = 0.0035.$$

The bias from the proxy regression is one sixth of that of the omitted variable regression. This is despite $\alpha_W$ being fairly large (0.0117), and is due to the offsetting biases in the proxy variable regression.

## Imbens Lecture Notes 12, ARE213 Fall '04

ARE213 Econometrics, Fall 2004, UC Berkeley, Department of Agricultural and Resource Economics

### Limited Dependent Variable Models II: Selection Models (W 17.4.1)

#### 1. The model

In this lecture we study selection models. Typically they consist of two equations: one, the outcome equation, describes the relation between an outcome of interest $Y_i$ and a vector of covariates $X_i$; the second, the selection equation, describes the relation between a binary participation decision $D_i$ and another vector of covariates $Z_i$. There are various forms of these models. Here we consider a specific case, originally studied by Heckman (1979):

$$Y_i = X_i'\beta + \varepsilon_i, \qquad (1)$$
$$D_i = 1\{Z_i'\gamma + \eta_i > 0\}. \qquad (2)$$

The parametric form of the model assumes that

$$\begin{pmatrix}\varepsilon_i\\ \eta_i\end{pmatrix}\Big|\,X_i, Z_i \sim N\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}\sigma_\varepsilon^2 & \rho\sigma_\varepsilon\\ \rho\sigma_\varepsilon & 1\end{pmatrix}\right). \qquad (3)$$

The variance of $\eta_i$ is normalized to one, since we only observe the sign of $Z_i'\gamma + \eta_i$. For a random sample from the population we observe $D_i$, $Z_i$, and $X_i$; only for observations with $D_i = 1$ do we observe $Y_i$. This model is known as the Heckman selection model, the type II tobit model (Amemiya), or the probit selection model (Wooldridge). Variations include the case where $Z_i'\gamma + \eta_i$ itself is observed if $D_i = 1$, so that the selection equation is not probit but tobit. That case is referred to as type III tobit by Amemiya, and as the tobit selection model by Wooldridge. The classic example is a wage
equation, where we only observe the wage if the individual decided to work ($D_i = 1$). Unlike in the tobit case, non-participation does not imply that $Y_i$ is negative. In fact, the tobit model is a special case of this model, with $\eta_i = \varepsilon_i/\sigma_\varepsilon$ and $Z_i'\gamma = X_i'\beta/\sigma_\varepsilon$, and thus $D_i = 1\{Y_i \ge 0\}$. In the wage example we think that those who participate in the labor market get relatively high wages compared to those who decided not to participate. If the selection equation had hours worked, with the actual number of hours observed if hours are positive, we would have the tobit selection model.

Another example is that of people buying life insurance (see Wooldridge). We are interested in the relation between the price people pay for life insurance and their characteristics. However, we only observe the price of life insurance for those who purchase it. We do not know what price people who chose not to purchase life insurance would have paid, had they done so. The selection equation models the decision to purchase life insurance. Here we may be concerned that those who did purchase life insurance, and thus had relatively high values of $\eta_i$, paid different prices from those who did not. Specifically, those who purchase life insurance may be less healthy, and that may imply that they pay relatively high prices for life insurance conditional on covariates, if these do not adequately control for health status.

The first key issue is that the disturbances in the two equations are potentially correlated. If they are known not to be correlated, $\rho = 0$, then conditional on $Z_i$ and $D_i = 1$ it would still be the case that $\varepsilon_i$ is independent of $X_i$, and so we could just do least squares on the complete observations. The second is whether $Z_i$ and $X_i$ are the same. If not, we have exclusion restrictions. In particular, it is important whether we have variables in $Z_i$ that are not in $X_i$, that is, variables that affect participation but not the outcome directly.

#### 2. Maximum likelihood estimation

The likelihood function is tricky to derive.
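Before deriving it, note that the model in equations (1)–(3) is easy to simulate, which is a useful check on any estimation approach. A sketch with illustrative parameter values (not from the lecture), building the correlated errors by hand:

```python
import math
import random

random.seed(1)
beta0, beta1 = 1.0, 0.5        # outcome equation
gamma0, gamma1 = 0.0, 1.0      # selection equation
sigma, rho = 2.0, 0.8          # error scale and correlation

selected_errors = []
n = 20000
n_obs = 0
for _ in range(n):
    x = random.gauss(0, 1)     # here Z_i = X_i (no exclusion restriction)
    u1, u2 = random.gauss(0, 1), random.gauss(0, 1)
    eta = u1
    eps = sigma * (rho * u1 + math.sqrt(1 - rho ** 2) * u2)  # corr(eps, eta) = rho
    y = beta0 + beta1 * x + eps
    if gamma0 + gamma1 * x + eta > 0:   # D_i = 1: Y_i is observed
        n_obs += 1
        selected_errors.append(eps)

# With rho > 0, the observed sample has above-average errors,
# E[eps | D = 1] > 0, which is exactly the selection bias.
mean_eps_selected = sum(selected_errors) / n_obs
```

Running OLS of the observed $Y_i$ on $X_i$ in such a sample recovers biased coefficients, which motivates the likelihood and two-step approaches that follow.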
First consider observations with $D_i = 0$. For these units we only know that $Z_i'\gamma + \eta_i < 0$. Although we possibly observe $X_i$, we know nothing about $Y_i$. Hence the likelihood contribution for these units is just the probability that $D_i = 0$:

$$\Pr(D_i = 0|Z_i) = \Pr(\eta_i < -Z_i'\gamma) = \Phi(-Z_i'\gamma) = 1 - \Phi(Z_i'\gamma).$$

Next consider the observations with $D_i = 1$. Rather than look at the probability that $D_i = 1$ times the conditional density of $Y_i$ given $D_i = 1$, we look at the marginal density of $Y_i$ (normal with mean $X_i'\beta$ and variance $\sigma_\varepsilon^2$) times the conditional probability of $D_i = 1$ given $Y_i$. So the first factor is

$$\frac{1}{\sigma_\varepsilon}\,\phi\!\left(\frac{Y_i - X_i'\beta}{\sigma_\varepsilon}\right).$$

The conditional distribution of $\eta_i$ given $Y_i$ and $X_i$ is normal, with mean $\rho(Y_i - X_i'\beta)/\sigma_\varepsilon$ and variance $1 - \rho^2$. Thus the probability that $D_i = 1$ given $Y_i$, $X_i$, and $Z_i$ is

$$\Pr(\eta_i > -Z_i'\gamma\,|\,Y_i, X_i, Z_i) = \Phi\!\left(\frac{Z_i'\gamma + \rho(Y_i - X_i'\beta)/\sigma_\varepsilon}{\sqrt{1-\rho^2}}\right).$$

Combining all these parts leads to the following log likelihood function:

$$L(\beta, \gamma, \sigma_\varepsilon^2, \rho) = \sum_{i=1}^N (1 - D_i)\ln\big(1 - \Phi(Z_i'\gamma)\big) + D_i\left[\ln\Phi\!\left(\frac{Z_i'\gamma + \rho(Y_i - X_i'\beta)/\sigma_\varepsilon}{\sqrt{1-\rho^2}}\right) + \ln\phi\!\left(\frac{Y_i - X_i'\beta}{\sigma_\varepsilon}\right) - \ln\sigma_\varepsilon\right].$$

Maximizing this is messy. It is possible, but in addition to the difficulty of calculating the derivatives, the computational problem tends to be somewhat badly behaved, so that iterative methods do not always converge to the maximum likelihood estimator.

#### 3. Heckman two-step estimator

Heckman proposed a different estimator. First note that, because of the normality assumption, we have

$$E[\varepsilon_i|\eta_i] = \delta\cdot\eta_i, \qquad (4)$$

where $\delta = \rho\,\sigma_\varepsilon$. Thus we have

$$E[Y_i|X_i, \eta_i] = X_i'\beta + \delta\cdot\eta_i.$$

In addition,

$$E[Y_i|X_i, Z_i, D_i = 1] = E[Y_i|X_i, Z_i, Z_i'\gamma + \eta_i > 0] = X_i'\beta + \delta\cdot E[\eta_i|\eta_i > -Z_i'\gamma] = X_i'\beta + \delta\cdot\lambda(Z_i'\gamma),$$

where $\lambda(a) = \phi(a)/\Phi(a)$ is the inverse Mills ratio. Heckman's idea is the following. First estimate $\gamma$ by probit maximum likelihood; this works well, and is much easier than doing the full selection model by maximum likelihood. Then calculate, for each unit with $D_i = 1$, the inverse
Mills ratio $\hat\lambda_i = \lambda(Z_i'\hat\gamma)$, and regress $Y_i$ on $X_i$ and $\hat\lambda_i$. This is a relatively straightforward way of getting a point estimate for $\beta$. However, there are some disadvantages relative to maximum likelihood. First, the estimator is not necessarily efficient, whereas maximum likelihood is. Second, getting the variance is not easy in general: you have to take account of the fact that $\gamma$ in the inverse Mills ratio is estimated. In one simple case it is not so difficult: if we just want to test whether there is a selection problem, and compute the test statistic under the null of no selection ($\delta = 0$), we do not have to take account of the fact that $\gamma$ is estimated, and we can simply use OLS standard errors.

#### 4. The case without exclusion restrictions

Formally, we do not need exclusion restrictions, and $Z_i$ can be identical to $X_i$. In practice you are likely to get close to perfect collinearity, and will end up with large standard errors. The identification in this case comes purely from the functional form. That is, $\lambda(Z_i'\gamma)$ is a nonlinear function of $X_i$, and the conditional expectation of $Y_i$ given $X_i$ ends up being nonlinear in $X_i$, with the nonlinear part interpreted as selection bias. Typically we are not so sure about the functional form that we would be comfortable just interpreting nonlinearities as evidence for endogeneity of the covariates. With exclusion restrictions (variables in $Z_i$ that are not in $X_i$), the sensitivity is much less of an issue. In that case there is variation in $\lambda(Z_i'\gamma)$ conditional on $X_i$, so the selection bias coefficient is separately identified. In fact, for these cases there are identification results that do not rely on normality. However, just as in instrumental variables settings, exclusion restrictions are often difficult to motivate: why should a variable $Z_i$ affect the decision to participate if it is not related to the outcome of interest? Examples of exclusion restrictions that have been used in the female wage equation are the presence and age of children.
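The correction term $\lambda(\cdot)$ that all of this turns on is simple to compute. A sketch using only the standard library:

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_mills(z):
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z) = E[eta | eta > -z]."""
    return norm_pdf(z) / norm_cdf(z)

# In the second step one regresses Y_i on X_i and inv_mills(Z_i' gamma_hat)
# over the D_i = 1 sample; the coefficient on the ratio estimates delta = rho * sigma.
```

The ratio is positive and decreasing: the lower $Z_i'\gamma$ (the less likely participation), the larger the correction for those who participate anyway.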
However, without sufficient additional control variables these may affect wages directly, through human capital accumulation arguments. Even in this case, identification is controversial. See the paper by Little for a skeptical view from a statistician on these models.

#### References

Amemiya, T. (1985), *Advanced Econometrics*, Harvard University Press, Cambridge, MA.

Heckman, J. (1979), "Sample Selection Bias as a Specification Error," *Econometrica*, Vol. 47, No. 1, 153-162.

Little, R. (1985), "A Note about Models for Selectivity Bias," *Econometrica*, Vol. 53, No. 6, 1469-1474.

## Imbens Lecture Notes 1, ARE213 Fall '03

ARE213 Econometrics, Fall 2003, UC Berkeley, Department of Agricultural and Resource Economics

### Ordinary Least Squares I: Estimation, Inference, and Predicting Outcomes (W 4.2.1)

Let us review the basics of the linear model. We have $N$ units (individuals, firms, or other economic agents), drawn randomly from a large population. On each unit we observe an outcome $Y_i$, and a $K$-dimensional vector of explanatory variables $X_i = (X_{i1}, X_{i2}, \ldots, X_{iK})'$. We are interested in explaining the distribution of $Y_i$ in terms of the explanatory variables $X_i$, using a linear model:

$$Y_i = X_i'\beta + \varepsilon_i. \qquad (1)$$

In matrix notation, $Y = X\beta + \varepsilon$, or, avoiding vector and matrix notation completely,

$$Y_i = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_K X_{iK} + \varepsilon_i.$$

We assume that the residuals $\varepsilon_i$ are independent of the covariates (or regressors) and normally distributed with mean zero and variance $\sigma^2$.

**Assumption 1:** $\varepsilon_i | X_i \sim N(0, \sigma^2)$.

We can weaken this considerably. First, we could relax normality and only assume independence.

**Assumption 2:** $\varepsilon_i \perp X_i$.

We can weaken this assumption further by requiring only mean independence,

**Assumption 3:** $E[\varepsilon_i | X_i] = 0$,

or even further, requiring only zero correlation.

**Assumption 4:** $E[\varepsilon_i\cdot X_i] = 0$.

We will also assume that the observations are drawn randomly from some population. We could also do most of the analysis assuming that the covariates are fixed, but this complicates matters for some results, and it does not help very much. See the discussion on fixed versus random covariates in Wooldridge,
page 9.

**Assumption 5:** The pairs $(X_i, Y_i)$ are independent draws from the same distribution, with the first two moments of $X$ finite.

The ordinary least squares estimator for $\beta$ solves

$$\min_\beta \sum_{i=1}^N (Y_i - X_i'\beta)^2.$$

This leads to

$$\hat\beta = (X'X)^{-1}(X'Y).$$

The exact distribution of the OLS estimator is

$$\hat\beta \sim N\big(\beta,\; \sigma^2(X'X)^{-1}\big).$$

Without the normality of the $\varepsilon_i$ it is difficult to derive the exact distribution of $\hat\beta$. However, under the independence Assumption 2 and a second-moment condition on $\varepsilon$ (variance finite and equal to $\sigma^2$), we can establish asymptotic normality:

$$\sqrt{N}(\hat\beta - \beta) \xrightarrow{d} N\big(0,\; \sigma^2\,E[XX']^{-1}\big).$$

Typically we do not know $\sigma^2$. We can consistently estimate it as

$$\hat\sigma^2 = \frac{1}{N - K - 1}\sum_{i=1}^N (Y_i - \hat\beta'X_i)^2.$$

Dividing by $N - K - 1$ rather than $N$ corrects for the fact that $K + 1$ parameters are estimated before calculating the residuals $\hat\varepsilon_i = Y_i - \hat\beta'X_i$. This correction does not matter in large samples, and in fact the maximum likelihood estimator

$$\hat\sigma^2_{ml} = \frac{1}{N}\sum_{i=1}^N (Y_i - X_i'\hat\beta)^2$$

is a perfectly reasonable alternative. So in practice, whether we have asymptotic normality or not, we will use the following distribution for $\hat\beta$:

$$\hat\beta \sim N(\beta, \hat V), \qquad \hat V = \hat\sigma^2\left(\sum_{i=1}^N X_iX_i'\right)^{-1}.$$

Often we are interested in one particular coefficient. Suppose, for example, we are interested in $\beta_1$. In that case we have

$$\hat\beta_1 \sim N(\beta_1, \hat V_{22}),$$

where $\hat V_{ij}$ is the $(i,j)$ element of the matrix $\hat V$. We can use this to construct confidence intervals for a particular coefficient. For example, a 95% confidence interval for $\beta_1$ would be

$$\left(\hat\beta_1 - 1.96\sqrt{\hat V_{22}},\;\; \hat\beta_1 + 1.96\sqrt{\hat V_{22}}\right).$$

We can also use this to test whether a particular coefficient is equal to some preset number. For example, if we want to test whether $\beta_1$ is equal to 0.1, we construct the t-statistic

$$t = \frac{\hat\beta_1 - 0.1}{\sqrt{\hat V_{22}}},$$

and compare it to a normal distribution.

Let us look at some real data. The following regressions are estimated on data from the National Longitudinal Survey of Youth (NLSY). The data set used here consists of 935 observations on usual weekly earnings, years of education, and experience (calculated as age minus education minus six). We will use these data to look at the returns to education.
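The estimator, $\hat\sigma^2$, a confidence interval, and a t-statistic can be assembled in a few lines. A sketch for the single-regressor case with simulated data (the true slope is set to 0.5; these are not the NLSY estimates):

```python
import math
import random

random.seed(2)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx  # OLS slope
b0 = my - b1 * mx                                              # OLS intercept

resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
s2 = sum(e * e for e in resid) / (n - 2)   # divide by N - K - 1 (here K = 1)
se_b1 = math.sqrt(s2 / sxx)                # standard error of the slope

ci = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)   # 95% confidence interval
t_stat = (b1 - 0.5) / se_b1                   # t-statistic for H0: beta_1 = 0.5
```

The same recipe extends to multiple regressors by replacing the scalar formulas with $(X'X)^{-1}X'Y$ and $\hat V = \hat\sigma^2(X'X)^{-1}$.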
Mincer developed a model that leads to the following relation between log earnings, education, and experience for individual $i$:

$$\log(\text{earnings})_i = \beta_0 + \beta_1\cdot\text{educ}_i + \beta_2\cdot\text{exper}_i + \beta_3\cdot\text{exper}_i^2 + \varepsilon_i.$$

Estimating this on the NLSY data leads to

$$\widehat{\log(\text{earnings})}_i = \underset{(0.222)}{4.016} + \underset{(0.008)}{0.092}\cdot\text{educ}_i + \underset{(0.025)}{0.079}\cdot\text{exper}_i - \underset{(0.001)}{0.002}\cdot\text{exper}_i^2.$$

So, a 95% confidence interval for $\beta_1$ is

$$CI^{0.95}_{\beta_1} = (0.0923 - 1.96\times 0.0076,\; 0.0923 + 1.96\times 0.0076) = (0.0775,\; 0.1071).$$

The t-statistic for testing $\beta_1 = 0.1$ is

$$t = \frac{0.0923 - 0.1}{0.0076} = -1.01,$$

so at the 90% level we do not reject the hypothesis that $\beta_1$ is equal to 0.1.

Now suppose we wish to use these estimates for predicting a more complex change. For example, suppose we want to see what the estimated effect on the log of weekly earnings is of increasing a person's education by one year. Because changing an individual's education also changes their experience (in this case it automatically reduces it by one year), this effect depends on more than just $\beta_1$. To make this specific, let us focus on an individual with twelve years of education (high school) and ten years of experience, so that $\text{exper}^2$ is equal to 100. The expected value of this person's log earnings is

$$\widehat{\log(\text{earnings})}_i = 4.016 + 0.092\times 12 + 0.079\times 10 - 0.002\times 100 = 5.7191.$$

Now change this person's education to 13. Their experience will go down to 9, and $\text{exper}^2$ will go down to 81. Hence the expected log earnings is

$$\widehat{\log(\text{earnings})}_i = 4.016 + 0.092\times 13 + 0.079\times 9 - 0.002\times 81 = 5.7696.$$

Hence the expected gain is the difference between these two predictions, which is equal to 0.051.

Now the question is what the standard error for this prediction is. The general way to answer this is as follows. The vector of estimated coefficients $\hat\beta$ is approximately normal with mean $\beta$ and variance $\hat V$. We are interested in a linear combination of the $\hat\beta$'s, namely $\lambda'\hat\beta$, where $\lambda = (0, 1, -1, -19)'$. Therefore

$$\lambda'\hat\beta \sim N(\lambda'\beta,\; \lambda'\hat V\lambda).$$

In the above example we have the following values for the covariance matrix:

$$\hat V = \begin{pmatrix} 0.0494 & -0.0011 & -0.0047 & 0.0001 \\ -0.0011 & 0.0001 & 0.0000 & 0.0000 \\ -0.0047 & 0.0000 & 0.0006 & 0.0000 \\ 0.0001 & 0.0000 & 0.0000 & 0.0000 \end{pmatrix}.$$

Hence the standard error of $\lambda'\hat\beta$ is 0.0096, and the 95% confidence interval is $(0.0317,\; 0.0693)$.

The second method is very easy in the linear case.
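The first method is generic: any linear combination $\lambda'\hat\beta$ has variance $\lambda'\hat V\lambda$. A sketch using the point estimates and covariance matrix above; note that because the printed $\hat V$ is rounded to four decimals, the resulting standard error only approximates the 0.0096 computed at full precision:

```python
import math

def lincomb(lam, beta, V):
    """Point estimate and standard error of lam' beta, with Var = lam' V lam."""
    k = len(lam)
    est = sum(lam[i] * beta[i] for i in range(k))
    var = sum(lam[i] * V[i][j] * lam[j] for i in range(k) for j in range(k))
    return est, math.sqrt(var)

# One more year of education displaces one year of experience (at exper = 10),
# so the effect is beta_1 - beta_2 - 19 * beta_3.
lam = [0.0, 1.0, -1.0, -19.0]
beta = [4.016, 0.092, 0.079, -0.002]
V = [[0.0494, -0.0011, -0.0047, 0.0001],
     [-0.0011, 0.0001, 0.0000, 0.0000],
     [-0.0047, 0.0000, 0.0006, 0.0000],
     [0.0001, 0.0000, 0.0000, 0.0000]]

est, se = lincomb(lam, beta, V)   # est is 0.051, matching the text
```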
dence interval is 00317 00693 The second method is very easy in the linear case We are interested in an estimator for Imbens Lecture Notes 1 ARE213 Fall 03 6 X To analyze this we reparametrize from 30 30 31 to 73132193933 32 32 33 33 The inverse of the transformation is 30 31 Y32193933 2 33 Hence we can write the regression function as logearningsl o y 7 g 7 19 g educ g exper g exper2 e 30 Y educ g exper educi g exper2 19educ 8 Hence to get an estimate for y we can regress log earnings on a constant education expe rience minus education and experience squared minus 19 times education This leads to the estimated regression function logearr1gsi 4016 0051 educ 0079 exper educi 7 0002 exper 19 educi 0222 0010 0025 0001 Now we obtain the estimate and standard error directly from the regression output Let us also look at a nonlinear version of this Suppose we are interested in the regression of log earnings on education The estimated regression function is logvarr1gsi 50455 00667educ 00849 00062 Imbens Lecture Notes 1 ARE213 Fall 03 7 Now suppose we are interested in the average effect of increasing education by one year for an individual with currently eight years of education7 not on the log of earnings7 but on the level of earnings At x years of education the expected level of earnings is Elearningsleduc x expwo l x 7227 using the fact that if Z N NW 02 then ElexpZ expu 022 Hence the parameter of interest is 9 expwo 619 022 7 expwo 61 8 022 Let us write this more generally as 9 97 where y 6 02 We have an approximate distribution for y We 7 v e M0 V Then by the Delta method7 WW 7 6 N In this case 3g explt o i399022 eXp o i398Uz2 6i 9exp o l902278exp 0 l8022 V expwo 619 122 7 expwo 61 8 122 We estimate this by substituting estimated values for the parameters7 so we get 5g exp E0 31 9 322 7 eXpBO f Bl 28 322 199484 67 9exp o l9amp2278exp 0 l 8amp22 4685779 7 expwo 19amp227 expwo i8amp22 99742 Imbens Lecture Notes 1 ARE213 Fall 03 8 To get the variance for 0 gW we also 
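The linear-combination calculation above can be sketched in a few lines. The coefficient vector and $V$ below are the rounded values reported in the text, so the computed standard error only approximates the 0.0096 obtained from the full-precision covariance matrix:

```python
import numpy as np

# Delta method for a linear combination lambda'beta (rounded values from the notes).
beta_hat = np.array([4.016, 0.092, 0.079, -0.002])   # (const, educ, exper, exper^2)
V = np.array([[ 0.0494, -0.0011, -0.0047,  0.0001],
              [-0.0011,  0.0001,  0.0000,  0.0000],
              [-0.0047,  0.0000,  0.0006,  0.0000],
              [ 0.0001,  0.0000,  0.0000,  0.0000]])

# One more year of education at educ = 12, exper = 10:
# educ +1, exper -1, exper^2 moves from 100 to 81, i.e. by -19.
lam = np.array([0.0, 1.0, -1.0, -19.0])

effect = lam @ beta_hat            # point estimate of the gain in log earnings
se = np.sqrt(lam @ V @ lam)        # standard error of the linear combination
print(effect, se)
```

The point estimate reproduces the 0.051 in the text exactly, since it only uses the reported coefficients.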
To get the variance for $\hat\theta = g(\hat\gamma)$ we also need the full covariance matrix, including the row and column for the parameter $\sigma^2$. Using the fact that the estimator for $\sigma^2$ is independent of the estimators of the other parameters, and that it has asymptotic variance equal to $2\sigma^4$, we have

$$V = \begin{pmatrix} 0.0072 & -0.0005 & 0.0000 \\ -0.0005 & 0.0000 & 0.0000 \\ 0.0000 & 0.0000 & 0.3488 \end{pmatrix}.$$

Hence the standard error for the parameter of interest is 6.0269.

Finally, suppose we are interested in the average effect on the level of earnings of increasing education levels by one year. In terms of the parameters of the linear regression model, the parameter of interest is now

$$\theta = g(\gamma) = \frac{1}{N}\sum_{i=1}^N \Bigl(\exp\bigl(\beta_0 + \beta_1(educ_i + 1) + \sigma^2/2\bigr) - \exp\bigl(\beta_0 + \beta_1 educ_i + \sigma^2/2\bigr)\Bigr).$$

Substituting estimated values for the parameters leads to

$$\hat\theta = \frac{1}{N}\sum_{i=1}^N \Bigl(\exp\bigl(\hat\beta_0 + \hat\beta_1(educ_i + 1) + \hat\sigma^2/2\bigr) - \exp\bigl(\hat\beta_0 + \hat\beta_1 educ_i + \hat\sigma^2/2\bigr)\Bigr) = 29.0527.$$

Even though this is a much messier function, the principle is the same. We already have the covariance matrix, and just need the derivatives of $g(\gamma)$:

$$\frac{\partial g}{\partial \beta_0} = \frac{1}{N}\sum_{i=1}^N \Bigl(\exp\bigl(\beta_0 + \beta_1(educ_i + 1) + \sigma^2/2\bigr) - \exp\bigl(\beta_0 + \beta_1 educ_i + \sigma^2/2\bigr)\Bigr),$$
$$\frac{\partial g}{\partial \beta_1} = \frac{1}{N}\sum_{i=1}^N \Bigl((educ_i + 1)\exp\bigl(\beta_0 + \beta_1(educ_i + 1) + \sigma^2/2\bigr) - educ_i\exp\bigl(\beta_0 + \beta_1 educ_i + \sigma^2/2\bigr)\Bigr),$$
$$\frac{\partial g}{\partial \sigma^2} = \frac{1}{2N}\sum_{i=1}^N \Bigl(\exp\bigl(\beta_0 + \beta_1(educ_i + 1) + \sigma^2/2\bigr) - \exp\bigl(\beta_0 + \beta_1 educ_i + \sigma^2/2\bigr)\Bigr).$$

The standard error in this case is 9.0522.

Imbens, Lecture Notes 9, ARE213, Fall 04
ARE213 Econometrics, Fall 2004
UC Berkeley, Department of Agricultural and Resource Economics

MAXIMUM LIKELIHOOD ESTIMATION IV: CLASSICAL TESTING

After estimating the exponential model for the unemployment durations, Lancaster considers an extension. Consider the hazard function, or escape rate,

$$h(y|x,\theta) = \frac{f(y|x,\theta)}{1 - F(y|x,\theta)} = \lim_{\Delta \downarrow 0} \frac{1}{\Delta}\Pr\bigl(y \le Y < y + \Delta \,\big|\, Y \ge y, X = x\bigr).$$

The hazard function is just another way of characterizing a distribution, like the density function, the distribution function, the survivor function, the moment generating function, or the characteristic function. It is just a particularly convenient and interpretable way of describing a distribution of durations. Given the hazard, you can calculate the distribution function as

$$F(y|x,\theta) = 1 - \exp\left(-\int_0^y h(s|x,\theta)\,ds\right),$$

and hence the density function. The exponential model implies that the hazard function stays constant over the duration of the spell, equal to $\exp(x'\beta)$ in our previous specification.
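The hazard-to-cdf link above can be checked numerically. This is a minimal sketch for the constant-hazard (exponential) case, with an assumed value of $x'\beta$; the numerical integral of the hazard reproduces the exponential cdf $1 - \exp(-y\,e^{x'\beta})$:

```python
import numpy as np

# F(y|x) = 1 - exp( -integral_0^y h(s|x) ds ) for a constant hazard exp(x'beta).
xb = -0.3                          # assumed value of x'beta, for illustration only
y = 2.0
s = np.linspace(0.0, y, 100_001)
h = np.full_like(s, np.exp(xb))    # constant hazard over the spell

dt = s[1] - s[0]                   # trapezoid rule for the integrated hazard
integral = (h.sum() - 0.5 * (h[0] + h[-1])) * dt
F_from_hazard = 1 - np.exp(-integral)
F_exponential = 1 - np.exp(-y * np.exp(xb))
print(F_from_hazard, F_exponential)
```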
To see what this means, take a person and look at their chances of finding a job on the first day of being unemployed. These chances are the same as the chances that this same person would find a job on the fiftieth day, given that he has been unsuccessful in finding work in the first forty-nine days. This may be reasonable, but it might also be something you do not wish to impose from the outset. Lancaster therefore considers an extension allowing the hazard function to either increase, stay constant, or decrease over time. This extension is known as the Weibull distribution:

$$h(y|x,\alpha,\beta) = (\alpha + 1)\, y^{\alpha} \exp(x'\beta).$$

Note that this reduces to the exponential distribution if $\alpha = 0$. The implied density function for the Weibull distribution is

$$f(y|x) = (\alpha + 1)\, y^{\alpha} \exp(x'\beta)\exp\bigl(-y^{\alpha+1}\exp(x'\beta)\bigr).$$

The moments of this distribution are

$$E\bigl[Y^k \,\big|\, X = x\bigr] = \exp\left(-\frac{k\,x'\beta}{\alpha+1}\right)\Gamma\left(\frac{k}{\alpha+1} + 1\right).$$

Note that for the case with $\alpha = 0$ this reduces to the exponential case, with $E[Y^k|X=x] = \exp(-k\,x'\beta)\,\Gamma(1+k)$, and thus with $k = 1$ the mean of the exponential distribution is $E[Y|X=x] = \exp(-x'\beta)$.

The log likelihood function for this model is

$$L(\alpha,\beta) = \sum_{i=1}^N \Bigl(\ln(\alpha+1) + \alpha\ln y_i + x_i'\beta - y_i^{\alpha+1}\exp(x_i'\beta)\Bigr).$$

One can estimate this model using any of the numerical methods described before (Newton-Raphson, Davidon-Fletcher-Powell). The one minor complication is that numerical algorithms have to take account of the restriction that $\alpha > -1$; with $\alpha = -1$ the density is degenerate and all probability mass piles up at $y = 0$.

Table 1 presents the maximum likelihood results for both the exponential and Weibull models. The standard errors are based on the second derivatives evaluated at the maximum likelihood estimates (see discussion below). To test the hypothesis $\alpha = 0$ against the alternative hypothesis $\alpha \ne 0$ there are three classical tests: the likelihood ratio, the Wald, and the score test. We shall consider the three tests in a general context, as well as giving the formulas for the case where we are testing the scale parameter in the Weibull model.

Table 1: EXPONENTIAL AND WEIBULL ESTIMATES (standard errors in parentheses)

              Exponential Model      Weibull Model
scale                --              0.2012  (0.0110)
intercept     -4.5086 (0.1463)      -3.3686  (0.1584)
age            0.0168 (0.0066)       0.0069  (0.0067)
educ           0.1685 (0.0102)       0.1405  (0.0104)
locrat         0.0435 (0.0147)       0.0349  (0.0147)

Suppose we have a model for a random variable $Z$, specifying the density function $f(z|\theta_0,\theta_1)$, where we split the parameter vector $\theta$ into two parts, $\theta = (\theta_0', \theta_1')'$. The dimension of $\theta_0$ is $K_0$, the dimension of $\theta_1$ is $K_1$, and the dimension of $\theta$ is $K = K_0 + K_1$. In our example, $z_i = (y_i, x_i)$, $\theta_0 = \alpha$, $\theta_1 = \beta$, $K_0 = 1$, and $K_1 = 4$ (three regressors, age, education, and locrat, plus an intercept). We are interested in testing the null hypothesis

$$H_0:\ \theta_0 = 0, \qquad \text{against the alternative} \qquad H_1:\ \theta_0 \ne 0.$$

Let $\hat\theta_u = (\hat\theta_{0u}', \hat\theta_{1u}')'$ denote the unrestricted mle's. In our example, $\hat\theta_{0u} = \hat\alpha_u = 0.2012$ and $\hat\theta_{1u} = \hat\beta_u = (-3.3686, 0.0069, 0.1405, 0.0349)'$. Also let $\hat\theta_r = (\hat\theta_{0r}', \hat\theta_{1r}')' = (0, \hat\theta_{1r}')'$ denote the estimates based on the restricted model, that is, based on the restriction $\theta_0 = 0$, so that in our example $\hat\theta_{1r} = \hat\beta_r = (-4.5086, 0.0168, 0.1685, 0.0435)'$. Let $L(\theta_0,\theta_1)$ denote the log likelihood function

$$L(\theta_0,\theta_1) = \sum_{i=1}^N \ln f(z_i|\theta_0,\theta_1) = \sum_{i=1}^N \Bigl(\ln(\alpha+1) + \alpha\ln y_i + x_i'\beta - y_i^{\alpha+1}\exp(x_i'\beta)\Bigr),$$

so that

$$\hat\theta_u = \arg\max_{\theta} L(\theta_0,\theta_1), \qquad \hat\theta_{1r} = \arg\max_{\theta_1} L(0,\theta_1) = \arg\max_{\beta} \sum_{i=1}^N \bigl(x_i'\beta - y_i\exp(x_i'\beta)\bigr).$$

In addition, let $S(z|\theta_0,\theta_1)$ denote the score function

$$S(z|\theta_0,\theta_1) = \frac{\partial \ln f}{\partial \theta}(z|\theta_0,\theta_1) = \begin{pmatrix} \dfrac{1}{\alpha+1} + \ln y - y^{\alpha+1}\ln y\,\exp(x'\beta) \\[4pt] x - x\, y^{\alpha+1}\exp(x'\beta) \end{pmatrix},$$

let $H(z|\theta_0,\theta_1)$ be the Hessian

$$H(z|\theta_0,\theta_1) = \begin{pmatrix} -\dfrac{1}{(\alpha+1)^2} - y^{\alpha+1}(\ln y)^2\exp(x'\beta) & -x'\, y^{\alpha+1}\ln y\,\exp(x'\beta) \\[4pt] -x\, y^{\alpha+1}\ln y\,\exp(x'\beta) & -x x'\, y^{\alpha+1}\exp(x'\beta) \end{pmatrix},$$

and, finally, let $I(\theta_0,\theta_1)$ be the information matrix evaluated at $(\theta_0,\theta_1)$:

$$I(\theta_0,\theta_1) = -E\bigl[H(z|\theta_0,\theta_1)\bigr] = E\bigl[S(z|\theta_0,\theta_1)S(z|\theta_0,\theta_1)'\bigr].$$

We will use various estimates of $I(\theta_0,\theta_1)$, depending on where we evaluate the matrix and how we calculate or approximate the expectation. For the second, there are three choices:

1. Use the average of the second derivatives,
$$\hat I = -\frac{1}{N}\sum_{i=1}^N H(z_i|\hat\theta).$$
choice is very uncommon7 because typically we dont actually specify the full density7 only the conditional density of Y given X Hence we cannot calculate the full expectation7 and if we are calculating only the conditional expectation there do not seem to be a lot of advantages to bothering with that at all For the rst choice the possibilities are to evaluate the estimate at 1 The restricted estimates 1 2 The unrestricted estimates 0 The leading choices include the average ofthe second derivative evaluated at the unrestricted estimates 05735 731586 00282 700834 00252 A 1 N A 731586 1197647 728217 716543 750445 117NZ7Mzi u 00282 728217 02122 701859 00660 i1 700834 716543 701859 05196 700067 00252 750445 00660 700067 10344 or the average of the second derivative evaluated at the restricted estimates 08099 747084 00393 701069 00343 A 1 N A 747084 1296672 728933 714860 751158 127NZ7Mzm9r 00393 728933 02090 701812 00691 21 701069 714860 701812 05119 700126 00343 751158 00691 700126 10389 Imbens Lecture Notes 9 ARE213 Fall 04 6 or estimates based on the rst derivatives at the restricted estimates 07787 754504 00402 700411 00375 754504 1018923 720587 709485 731203 8zi6TSzl6T 00402 720587 01412 701177 00419 1 700411 709485 701177 03241 700233 00375 731203 00419 700233 06575 A 1 13N39 M2 or at the unrestricted estimates 09221 763600 00559 700743 00502 A 1 N 763600 1282672 727230 710994 742049 I4NZSzl0u8zi6u 00559 727230 01911 701652 00554 71 700743 710994 701652 04388 700231 00502 742049 00554 700231 08652 All three classical tests are based on the quadratic approximation to the log likelihood function around the true and therefore maximizing in the limit values 30 The three are rst order equivalent7 meaning that if the null hypothesis is correct7 their difference7 multiplied by xN7 converges to zero in probability First7 if the null hypothesis is true and 93 07 the value of the log likelihood function at 190T 07 91 should not be much smaller than the value of the log 
First, if the null hypothesis is true and $\theta_0^* = 0$, the value of the log likelihood function at $\hat\theta_r = (0, \hat\theta_{1r}')'$ should not be much smaller than the value of the log likelihood function at $(\hat\theta_{0u}', \hat\theta_{1u}')'$. This is the basis of the Likelihood Ratio Test. Formally, the test statistic is

$$LR = 2\bigl(L(\hat\theta_u) - L(\hat\theta_r)\bigr) = 3.00.$$

If the null hypothesis is true, this statistic has, for large $N$, a chi-squared distribution with degrees of freedom equal to the dimension of $\theta_0$ (one in our example).

Second, if the limiting log likelihood function is maximized at $\theta_0 = 0$, the derivative of the log likelihood function with respect to $\theta_0$ at that point should be close to zero. This is the basis of Rao's Score Test, or the Lagrange Multiplier Test. Formally,

$$LM = \left(\frac{1}{\sqrt N}\sum_{i=1}^N S(z_i|0,\hat\theta_{1r})\right)' \hat I^{-1} \left(\frac{1}{\sqrt N}\sum_{i=1}^N S(z_i|0,\hat\theta_{1r})\right).$$

Note that the LM statistic can also be written as

$$LM = \frac{1}{N}\left(\sum_{i=1}^N S(z_i|0,\hat\theta_{1r})\right)' \hat I^{-1} \left(\sum_{i=1}^N S(z_i|0,\hat\theta_{1r})\right),$$

but that this is in general not equal to

$$\frac{1}{N}\sum_{i=1}^N S(z_i|0,\hat\theta_{1r})'\, \hat I^{-1}\, S(z_i|0,\hat\theta_{1r}).$$

Any estimate of the information matrix can be used. Typically researchers do not use $\hat I_1$ or $\hat I_4$, because they would require calculation of the unrestricted estimates, and the key advantage of the Lagrange Multiplier test is that it avoids calculation of the unrestricted estimates. There is some evidence that using the average of the second derivatives, and therefore $\hat I_2$, is to be preferred over calculation of the expectation. If only a conditional density is specified, calculation of the latter is difficult in any case, because calculation of the expectation requires specification of the full density.

One particular form that is popular for the LM test is

$$LM = \iota_N' S (S'S)^{-1} S' \iota_N = 3.234,$$

where $\iota_N$ is the $N$-vector of ones (so that $\iota_N'\iota_N = N$), and $S$ is the $N \times K$ matrix with $i$th row equal to the score vector $S(z_i|\hat\theta_r)'$, so that $\iota_N' S = \sum_i S(z_i|\hat\theta_r)'$. This form has the interpretation of $N$ times the uncentered $R^2$ in a regression of a vector of ones (that is, $\iota_N$) on the scores $S$. The least squares coefficients are $\hat b = (S'S)^{-1}S'\iota_N$, and the uncentered $R^2$ is

$$R^2 = \frac{\hat Y'\hat Y}{\iota_N'\iota_N} = \frac{\iota_N'S(S'S)^{-1}S'S(S'S)^{-1}S'\iota_N}{\iota_N'\iota_N} = \frac{\iota_N'S(S'S)^{-1}S'\iota_N}{N} = 0.0067.$$
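The "N times uncentered R-squared" form of the LM statistic is an algebraic identity that holds for any full-column-rank score matrix. A minimal sketch, using an arbitrary illustrative matrix in place of the actual scores:

```python
import numpy as np

# Identity: iota'S (S'S)^{-1} S'iota  =  N * uncentered R^2 of regressing ones on S.
rng = np.random.default_rng(2)
N, K = 400, 5
S = rng.normal(size=(N, K))        # stand-in for the N x K score matrix

iota = np.ones(N)
b = np.linalg.solve(S.T @ S, S.T @ iota)      # OLS of ones on the scores
yhat = S @ b
LM = iota @ S @ np.linalg.solve(S.T @ S, S.T @ iota)
R2_uncentered = (yhat @ yhat) / (iota @ iota)
print(LM, N * R2_uncentered)
```

The two numbers agree exactly because $\hat Y'\hat Y = \iota'S(S'S)^{-1}S'\iota$, which is the projection argument given above.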
Finally, the restricted and unrestricted estimates of $\theta_0$ should be close together if the null hypothesis is correct, or, in other words, the unrestricted estimate of $\theta_0$ should be close to zero. This is the basis of the Wald Test. Formally, the Wald test is defined as

$$W = N\bigl(\hat\theta_{0u} - \theta_0\bigr)'\bigl[\hat I^{00}\bigr]^{-1}\bigl(\hat\theta_{0u} - \theta_0\bigr) = N\,\hat\theta_{0u}'\bigl[\hat I^{00}\bigr]^{-1}\hat\theta_{0u},$$

where $\hat I^{00}$ is the top left part of the inverse of $\hat I$. Again, any of the estimates of the information matrix can be used. Here typically the averages evaluated at the unrestricted estimates, $\hat I_1$ (with top left element 0.5735) or $\hat I_4$ (with top left element 0.9221), leading to $W = 3.373$ and $W = 2.098$ respectively, are used because of their superior properties if the null hypothesis is false.

Result 1: As $N$ goes to infinity, under the null hypothesis, $LR \xrightarrow{d} \chi^2(\dim \theta_0)$.

Result 2: As $N$ goes to infinity, under the null hypothesis, $LR - LM \xrightarrow{p} 0$, $LR - W \xrightarrow{p} 0$, and $LM - W \xrightarrow{p} 0$.

For formal proofs of these two results see, for example, Engle (1984) or Holly (1987). Here is an informal argument for the case where $\theta_0$ is a scalar and there are no nuisance parameters (no $\theta_1$). Expand the log likelihood around the maximum likelihood estimate:

$$L(\theta) \approx L(\hat\theta) + \frac{\partial L}{\partial \theta}(\hat\theta)(\theta - \hat\theta) + \frac12 \frac{\partial^2 L}{\partial \theta^2}(\hat\theta)(\theta - \hat\theta)^2.$$

The derivative of the log likelihood function at the maximum likelihood estimate is equal to zero, so

$$2\bigl(L(\hat\theta) - L(\theta)\bigr) \approx -\frac{\partial^2 L}{\partial \theta^2}(\hat\theta)\,(\hat\theta - \theta)^2 \approx N\, I(\theta)\,(\hat\theta - \theta)^2.$$

This is clearly very close to the test statistic from the Wald test. Given the limiting distribution $\sqrt N(\hat\theta - \theta) \xrightarrow{d} \mathcal N(0, I(\theta)^{-1})$, the limiting chi-squared distribution follows immediately. To see the link with the score or Lagrange multiplier test, expand the derivative of the log likelihood function around the true value:

$$\frac{\partial L}{\partial \theta}(\hat\theta) \approx \frac{\partial L}{\partial \theta}(\theta) + \frac{\partial^2 L}{\partial \theta^2}(\theta)(\hat\theta - \theta).$$

Evaluating this at $\hat\theta$, so that the first derivative is equal to zero,

$$0 = \frac{\partial L}{\partial \theta}(\theta) + \frac{\partial^2 L}{\partial \theta^2}(\theta)(\hat\theta - \theta),$$

and thus

$$\hat\theta - \theta = -\left(\frac{\partial^2 L}{\partial \theta^2}(\theta)\right)^{-1}\frac{\partial L}{\partial \theta}(\theta).$$

Renormalizing this gives

$$\sqrt N(\hat\theta - \theta) \approx I(\theta)^{-1}\frac{1}{\sqrt N}\sum_{i=1}^N S(z_i|\theta),$$

which by a central limit theorem has a limiting normal distribution with mean zero and variance equal to $I(\theta)^{-1}$. Then, squaring both sides and multiplying by the information matrix gives

$$N(\hat\theta - \theta)^2\, I(\theta) \approx \left(\frac{1}{\sqrt N}\sum_{i=1}^N S(z_i|\theta)\right) I(\theta)^{-1} \left(\frac{1}{\sqrt N}\sum_{i=1}^N S(z_i|\theta)\right).$$
This demonstrates the approximate equality of the Wald and Lagrange multiplier (score) tests.

Next we consider a fourth way of testing hypotheses, in the same context as before. We have a model for a random variable $Z$, specifying the density function $f(z|\theta_0,\theta_1)$, where we split the parameter vector $\theta$ into two parts, $\theta = (\theta_0', \theta_1')'$. We are interested in testing the null hypothesis $H_0: \theta_0 = 0$ against the alternative hypothesis $H_1: \theta_0 \ne 0$. Let $(\hat\theta_{0u}', \hat\theta_{1u}')'$ denote the unrestricted mle's, and $(\hat\theta_{0r}', \hat\theta_{1r}')' = (0', \hat\theta_{1r}')'$ the restricted mle's, that is, conditional on the restriction $\theta_0 = 0$.

The test we consider compares the estimates of the parameter not affected by the restriction, $\hat\theta_{1u}$ and $\hat\theta_{1r}$. If the null hypothesis is true, the two are estimating the same thing ($\theta_1^*$), and should be close to each other. The restricted estimate should be more precise, because it exploits a true restriction. If the null hypothesis is false, it is likely the two estimators are estimating different things, and there is likely to be a larger difference between them. This is the basis of the Hausman-Wu Test.

Consider the unrestricted maximum likelihood estimates $(\hat\theta_{0u}', \hat\theta_{1u}')'$. In large samples,

$$\sqrt N \begin{pmatrix} \hat\theta_{0u} - \theta_0^* \\ \hat\theta_{1u} - \theta_1^* \end{pmatrix} \xrightarrow{d} \mathcal N\left(0,\ \begin{pmatrix} I^{00} & I^{01} \\ I^{10} & I^{11} \end{pmatrix}\right),$$

where we partition the information matrix and its inverse as

$$I = \begin{pmatrix} I_{00} & I_{01} \\ I_{10} & I_{11} \end{pmatrix}, \qquad I^{-1} = \begin{pmatrix} I^{00} & I^{01} \\ I^{10} & I^{11} \end{pmatrix}.$$

The asymptotic variance for $\sqrt N(\hat\theta_{1u} - \theta_1^*)$ is therefore

$$V\bigl(\sqrt N(\hat\theta_{1u} - \theta_1^*)\bigr) = I^{11} = I_{11}^{-1} + I_{11}^{-1} I_{10}\bigl(I_{00} - I_{01} I_{11}^{-1} I_{10}\bigr)^{-1} I_{01} I_{11}^{-1}.$$

(See the appendix to these lecture notes for the partitioned inverse of a matrix.) Note that $I^{11} - I_{11}^{-1}$ is positive semidefinite.

Now consider the restricted ml estimate $\hat\theta_{1r}$. In large samples,

$$\sqrt N(\hat\theta_{1r} - \theta_1^*) \xrightarrow{d} \mathcal N\bigl(0,\ I_{11}^{-1}\bigr).$$

The variance for this estimator is obviously smaller than the variance for the unrestricted estimator. We can try to calculate directly the variance of the difference, $V(\sqrt N(\hat\theta_{1r} - \hat\theta_{1u}))$, by looking at the joint distribution of $\hat\theta_{1r}$ and $\hat\theta_{1u}$, but there is a simpler argument that goes to the heart of the testing procedure. Consider a larger set of estimators that includes both $\hat\theta_{1r}$ and $\hat\theta_{1u}$ as special cases:

$$\hat\theta_1(\lambda) = \hat\theta_{1r} + \lambda\bigl(\hat\theta_{1u} - \hat\theta_{1r}\bigr).$$

The variance of this estimator is

$$V\bigl(\hat\theta_1(\lambda)\bigr) = V(\hat\theta_{1r}) + \lambda^2\, V\bigl(\hat\theta_{1u} - \hat\theta_{1r}\bigr) + 2\lambda\,\mathrm{Cov}\bigl(\hat\theta_{1r},\ \hat\theta_{1u} - \hat\theta_{1r}\bigr).$$

Taking the derivative of this variance with respect to $\lambda$, evaluated at $\lambda = 0$, gives

$$\left.\frac{\partial V}{\partial \lambda}\right|_{\lambda = 0} = 2\,\mathrm{Cov}\bigl(\hat\theta_{1r},\ \hat\theta_{1u} - \hat\theta_{1r}\bigr).$$

Now, and this is crucial, this derivative must be equal to zero, because the estimator with $\lambda = 0$ is efficient: as the mle, its asymptotic variance is equal to the Cramer-Rao bound. Hence the covariance of $\hat\theta_{1r}$ and $\hat\theta_{1u} - \hat\theta_{1r}$ is zero, and thus

$$V\bigl(\hat\theta_{1u} - \hat\theta_{1r}\bigr) = V(\hat\theta_{1u}) - V(\hat\theta_{1r}), \qquad \text{so} \qquad V\bigl(\sqrt N(\hat\theta_{1u} - \hat\theta_{1r})\bigr) = I^{11} - I_{11}^{-1}.$$

Thus the general form of the Hausman-Wu test in a maximum likelihood context is

$$HW = N\bigl(\hat\theta_{1u} - \hat\theta_{1r}\bigr)'\bigl[I^{11} - I_{11}^{-1}\bigr]^{-g}\bigl(\hat\theta_{1u} - \hat\theta_{1r}\bigr) = N\bigl(\hat\theta_{1u} - \hat\theta_{1r}\bigr)'\Bigl[I_{11}^{-1} I_{10}\bigl(I_{00} - I_{01} I_{11}^{-1} I_{10}\bigr)^{-1} I_{01} I_{11}^{-1}\Bigr]^{-g}\bigl(\hat\theta_{1u} - \hat\theta_{1r}\bigr),$$

where the superscript $-g$ denotes the generalized inverse. Under the null hypothesis that $\theta_0 = 0$, this test statistic has a chi-squared distribution with degrees of freedom equal to the rank of $I_{01}$ (note that this may differ from the degrees of freedom for the other tests). The test is therefore obviously not in general going to be first order equivalent to the other tests discussed here, although it can be if the rank of $I_{01}$ is high enough.

The test requires the estimator under the null hypothesis to be efficient (something like a maximum likelihood estimator), but the unrestricted estimator need not be efficient. In general the test can be used when the alternative hypothesis is relatively vague. A note of caution with this test: taking the difference of two variance matrices, even if nominally this difference should be positive semidefinite, can often lead to extremely large test statistics when the difference is close to zero. In case the variance is not positive definite, one can take generalized inverses. As a result, however, the small sample properties of this test are not always attractive.
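The zero-covariance step has an exact finite-sample counterpart in the homoskedastic linear model, where restricted OLS is efficient under the null: $\mathrm{Cov}(\hat\beta_{1r}, \hat\beta_{1u})$ equals $\mathrm{Var}(\hat\beta_{1r})$ exactly, so the covariance with the difference is zero. A deterministic sketch (arbitrary full-rank design matrices, $\sigma^2 = 1$ without loss of generality):

```python
import numpy as np

# Check: Cov(b1r, b1u) = (X1'X1)^{-1} X1'X (X'X)^{-1} [beta1 block] = Var(b1r),
# so Cov(b1r, b1u - b1r) = 0 exactly.
rng = np.random.default_rng(3)
N, K0, K1 = 60, 2, 3
X0 = rng.normal(size=(N, K0))
X1 = rng.normal(size=(N, K1))
X = np.hstack([X0, X1])

V1r = np.linalg.inv(X1.T @ X1)                 # Var(b1r) (sigma^2 = 1)
XtX_inv = np.linalg.inv(X.T @ X)
cov_r_u = (V1r @ X1.T @ X @ XtX_inv)[:, K0:]   # Cov(b1r, b1u), beta1 columns
print(np.max(np.abs(cov_r_u - V1r)))
```

The difference is zero up to floating point, because $X_1'X(X'X)^{-1}$ picks out the corresponding rows of the identity matrix.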
Example: Consider a linear regression model

$$Y_i = X_{0i}'\beta_0 + X_{1i}'\beta_1 + \varepsilon_i,$$

or, in matrix notation, $Y = X_0\beta_0 + X_1\beta_1 + \varepsilon$. We assume $\varepsilon$ is conditionally normal with mean zero and variance $\sigma^2 I_N$. We are interested in testing the null hypothesis $H_0: \beta_0 = 0$ against the alternative $H_a: \beta_0 \ne 0$. We use the following notation:

$$X = (X_0\ \ X_1), \qquad X'X = \begin{pmatrix} X_{00} & X_{01} \\ X_{10} & X_{11} \end{pmatrix}, \qquad (X'X)^{-1} = \begin{pmatrix} X^{00} & X^{01} \\ X^{10} & X^{11} \end{pmatrix}.$$

The restricted estimator is $\hat\beta_{1r} = X_{11}^{-1}X_1'Y$, with distribution

$$\hat\beta_{1r} \sim \mathcal N\bigl(\beta_1,\ \sigma^2 X_{11}^{-1}\bigr).$$

The unrestricted estimator $\hat\beta_{1u}$ is the subvector of $(X'X)^{-1}X'Y$ corresponding to $\beta_1$, with distribution

$$\hat\beta_{1u} \sim \mathcal N\bigl(\beta_1,\ \sigma^2 X^{11}\bigr) = \mathcal N\Bigl(\beta_1,\ \sigma^2\bigl[X_{11}^{-1} + X_{11}^{-1}X_{10}\bigl(X_{00} - X_{01}X_{11}^{-1}X_{10}\bigr)^{-1}X_{01}X_{11}^{-1}\bigr]\Bigr).$$

So, using the formula given above, the variance for the difference is the difference in the variances, and

$$\hat\beta_{1u} - \hat\beta_{1r} \sim \mathcal N\Bigl(0,\ \sigma^2\, X_{11}^{-1}X_{10}\bigl(X_{00} - X_{01}X_{11}^{-1}X_{10}\bigr)^{-1}X_{01}X_{11}^{-1}\Bigr),$$

with the rank of the variance equal to the rank of $X_{01}$ (assuming both $X_{00}$ and $X_{11}$ have full rank). In this case one can also calculate the variance directly, with the same result.

Consider the special case where $X_0$ and $X_1$ are scalars. In that case $\hat\beta_{1r}$ is equal to $\hat\beta_{1u} + \hat\delta_0\hat\beta_{0u}$, where $\hat\delta_0$ is the coefficient on $X_1$ in a regression of $X_0$ on $X_1$. The Hausman test is testing whether $\delta_0\beta_0 = 0$, that is, whether the product of the coefficient on the excluded regressor ($\beta_0$) and the coefficient on the included regressor in a regression of the excluded regressor on the included one ($\delta_0$) is equal to zero. Clearly, if the two regressors are uncorrelated, the test has no power against the alternative that $\beta_0$ differs from zero. $\Box$
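The scalar omitted-variable identity used above holds exactly in any sample, not just in expectation. A minimal sketch with simulated data (the DGP numbers are arbitrary):

```python
import numpy as np

# Check: b1_restricted = b1_unrestricted + delta0 * b0_unrestricted, where
# delta0 is the slope in a regression of the excluded X0 on the included X1.
rng = np.random.default_rng(4)
N = 200
x1 = rng.normal(size=N)
x0 = 0.6 * x1 + rng.normal(size=N)            # correlated regressors
y = 1.5 * x0 - 0.8 * x1 + rng.normal(size=N)

X = np.column_stack([x0, x1])
b0u, b1u = np.linalg.lstsq(X, y, rcond=None)[0]   # long regression
b1r = (x1 @ y) / (x1 @ x1)                        # short regression on x1 only
delta0 = (x1 @ x0) / (x1 @ x1)                    # regression of x0 on x1
print(b1r, b1u + delta0 * b0u)
```

The identity is exact because the long-regression residuals are orthogonal to $x_1$.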
Example: The second example is an instrumental variables model,

$$Y_i = X_i'\beta + \varepsilon_i.$$

We are concerned that $X_i$ is not orthogonal to the disturbance $\varepsilon_i$. For that contingency there is an instrumental variable available, $Z_i$, which we are confident is independent of $\varepsilon_i$. We are considering the null hypothesis that $X$ is exogenous, that is, in this case, independent of the disturbance $\varepsilon_i$. Formally,

$$H_0:\ E[X_i\varepsilon_i] = 0, \qquad \text{against the alternative} \qquad H_1:\ E[X_i\varepsilon_i] \ne 0.$$

One approach is to test $\gamma = 0$ in the regression $Y_i = X_i'\beta + Z_i'\gamma + \varepsilon_i$. Here we use a Hausman test. Estimate the model efficiently under the null hypothesis,

$$\hat\beta_r = (X'X)^{-1}X'Y, \qquad \text{with approximate variance} \qquad V(\hat\beta_r) = \sigma^2(X'X)^{-1}.$$

Now consider the instrumental variables estimator

$$\hat\beta_u = \bigl(X'Z(Z'Z)^{-1}Z'X\bigr)^{-1}X'Z(Z'Z)^{-1}Z'Y,$$

with approximate variance

$$V(\hat\beta_u) = \sigma^2\bigl(X'Z(Z'Z)^{-1}Z'X\bigr)^{-1}.$$

Obviously, for this estimator to be well defined we need the matrix $X'Z(Z'Z)^{-1}Z'X$ to be invertible, which in turn requires that the matrix $Z'X$ has full column rank. The dimension of $X$ is $N \times K$, the dimension of $Z$ is $N \times M$, so the dimension of $Z'X$ is $M \times K$, and so we need at least $M \ge K$ instruments for this to work.

Using the fact that under the null hypothesis $\hat\beta_r$ is efficient (as the maximum likelihood estimator), combined with the fact that the estimator $\hat\beta_u$ is consistent under weaker conditions, the test statistic is the general Hausman-Wu form,

$$\bigl(\hat\beta_u - \hat\beta_r\bigr)'\bigl[V(\hat\beta_u) - V(\hat\beta_r)\bigr]^{-g}\bigl(\hat\beta_u - \hat\beta_r\bigr).$$

Maintaining, as before, the assumption that the matrix $Z'X$ has full column rank, and writing

$$X'X = X'Z(Z'Z)^{-1}Z'X + \eta'\eta,$$

where $\eta$ is the matrix of residuals from the regression of $X$ on $Z$, we can write the factor in the middle (the inverse of the variance of the difference) as

$$\sigma^{-2}\Bigl[\bigl(X'Z(Z'Z)^{-1}Z'X\bigr)^{-1} - \bigl(X'Z(Z'Z)^{-1}Z'X + \eta'\eta\bigr)^{-1}\Bigr]^{-g}.$$

If $\eta'\eta$ is invertible, this reduces to (using formula (2) in the appendix, with $A = X'Z(Z'Z)^{-1}Z'X$ and $B = \eta'\eta$)

$$\sigma^{-2}\Bigl[A^{-1}\bigl(A^{-1} + (\eta'\eta)^{-1}\bigr)^{-1}A^{-1}\Bigr]^{-1},$$

and the degrees of freedom of the test are equal to the number of regressors $K$ (the number of columns in $X$). In general, the degrees of freedom are equal to the rank of $\eta$, which is the same as the rank of the matrix $\eta'\eta$. That is, only regressors not included in the set of instruments are counted in the degrees of freedom. So, for example, if $X = (X_0\ X_1)$ and $Z = (X_0\ Z_1)$, then $\eta$ can be partitioned as $\eta = (\eta_0\ \eta_1)$ with $\eta_0 = 0$. In that case the degrees of freedom are equal to the rank of $\eta_1$. $\Box$

APPENDIX

(1) Consider a matrix

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}.$$

Assuming the matrix is invertible, its inverse is

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} E & F \\ G & H \end{pmatrix},$$

with

$$E = A^{-1} + A^{-1}B\bigl(D - CA^{-1}B\bigr)^{-1}CA^{-1}, \qquad H = \bigl(D - CA^{-1}B\bigr)^{-1} = D^{-1} + D^{-1}C\bigl(A - BD^{-1}C\bigr)^{-1}BD^{-1},$$
$$F = -A^{-1}B\bigl(D - CA^{-1}B\bigr)^{-1}, \qquad G = -\bigl(D - CA^{-1}B\bigr)^{-1}CA^{-1}.$$

(2) Suppose both $A$ and $B$ are invertible matrices, and $A + B$ and $A^{-1} + B^{-1}$ are invertible. Then

$$A^{-1} - (A + B)^{-1} = A^{-1}\bigl(A^{-1} + B^{-1}\bigr)^{-1}A^{-1}.$$
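Appendix formula (2) can be verified numerically for arbitrary symmetric positive definite matrices:

```python
import numpy as np

# Check: A^{-1} - (A+B)^{-1} = A^{-1} (A^{-1} + B^{-1})^{-1} A^{-1}.
rng = np.random.default_rng(5)
M = rng.normal(size=(4, 4)); A = M @ M.T + 4 * np.eye(4)   # SPD, well conditioned
M = rng.normal(size=(4, 4)); B = M @ M.T + 4 * np.eye(4)

inv = np.linalg.inv
lhs = inv(A) - inv(A + B)
rhs = inv(A) @ inv(inv(A) + inv(B)) @ inv(A)
print(np.max(np.abs(lhs - rhs)))
```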
REFERENCES

ENGLE, R. (1984): "Wald, Likelihood Ratio and Lagrange Multiplier Tests in Econometrics," in Griliches and Intriligator, eds., Handbook of Econometrics, Vol. III, Elsevier North Holland.

HAUSMAN, J. (1978): "Specification Tests in Econometrics," Econometrica, Vol. 46, No. 6, 1251-1271.

HOLLY, A. (1987): "Specification Tests: An Overview," in Bewley, ed., Advances in Econometrics, Cambridge University Press, Cambridge.

Imbens, Lecture Notes 19, ARE213, Fall 04
ARE213 Econometrics, Fall 2004
UC Berkeley, Department of Agricultural and Resource Economics

PANEL DATA II: FIXED EFFECTS

In this lecture we consider the same setup, with a linear model

$$Y_{it} = X_{it}'\beta + c_i + \varepsilon_{it},$$

with $c_i$ an unobserved individual-specific, time-invariant component. However, compared to the random effects discussion, we relax the assumption that $c_i$ is independent of the observed covariates $X_{it}$. We continue to maintain the exogeneity assumption on the residuals,

$$E[\varepsilon_{it}|X_{i1},\dots,X_{iT},c_i] = 0,$$

and in fact for inference we make the stronger assumption

$$E[\varepsilon_i\varepsilon_i'|c_i, X_i] = \sigma^2 I_T.$$

We consider a couple of estimators. The first is based on simply adding an $N$-dimensional vector of time-invariant covariates $Z_i$, with its $j$th element for unit $i$ (in every period) equal to $Z_{ij} = 1\{i = j\}$. Then, if we define $c$ to be the vector with typical element $c_i$, we can write

$$Y_{it} = X_{it}'\beta + Z_i'c + \varepsilon_{it}.$$

The first estimator is just the least squares estimator for this regression function:

$$(\hat\beta, \hat c) = \arg\min_{\beta, c}\ \sum_{i=1}^N\sum_{t=1}^T \bigl(Y_{it} - X_{it}'\beta - Z_i'c\bigr)^2.$$

The estimators for both $\beta$ and $c$ are unbiased. However, the estimators for $c$ are not consistent: as we get more and more observations, we do not get more information about $c_i$. To see this, consider the special case without $X_{it}$. In that case $\hat c_i = \sum_t Y_{it}/T = \bar Y_i$, which has variance $\sigma^2/T$, which does not go to zero as $N$ goes to infinity. We call this estimator the dummy variable estimator. A nice feature is that we can also use its associated standard error. Here it is important that we estimate the error variance $\sigma^2$ as

$$\hat\sigma^2 = \frac{1}{NT - N - K}\sum_{i,t}\bigl(Y_{it} - X_{it}'\hat\beta - Z_i'\hat c\bigr)^2,$$

and not just divide by $NT$. The number of parameters ($K$ for $\beta$ and $N$ for $c$) increases with the sample size, so the degrees of freedom correction is important.
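A minimal simulated sketch of the dummy variable estimator (the DGP values are made up for illustration). As a check it also computes $\hat\beta$ from a regression on deviations from unit means, which turns out to be numerically identical, anticipating the equivalence discussed for the within estimator:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, K = 30, 4, 2
X = rng.normal(size=(N * T, K))
c = np.repeat(rng.normal(size=N), T)              # fixed effects, unit-major order
beta = np.array([1.0, -0.5])
Y = X @ beta + c + rng.normal(size=N * T)

# Dummy variable estimator: regress Y on X and N unit indicators.
D = np.kron(np.eye(N), np.ones((T, 1)))
b_dummy = np.linalg.lstsq(np.hstack([X, D]), Y, rcond=None)[0][:K]

# Deviations-from-unit-means (within) regression.
grp = np.repeat(np.arange(N), T)
Xd = X - np.array([X[grp == i].mean(axis=0) for i in range(N)])[grp]
Yd = Y - np.array([Y[grp == i].mean() for i in range(N)])[grp]
b_within = np.linalg.lstsq(Xd, Yd, rcond=None)[0]
print(b_dummy, b_within)
```

The agreement is exact (up to floating point) by the Frisch-Waugh-Lovell theorem.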
Implementing this estimator could be difficult if $N$ is very large: the least squares estimator requires inverting an $(N + K) \times (N + K)$ dimensional matrix. The other estimators employ various techniques to remove $c_i$ from the regression function.

For the second estimator, define the unit averages

$$\bar Y_i = \frac{1}{T}\sum_{t=1}^T Y_{it}, \qquad \bar X_i = \frac{1}{T}\sum_{t=1}^T X_{it},$$

as well as the deviations from the mean,

$$\dot Y_{it} = Y_{it} - \bar Y_i, \qquad \dot X_{it} = X_{it} - \bar X_i.$$

Note that in general we can write the $T \times K$ matrix of deviations from the means, with $t$th row equal to $\dot X_{it}'$, as

$$\dot X_i = A X_i, \qquad \text{where} \qquad A = I_T - \iota_T\iota_T'/T,$$

so that $A$ is idempotent:

$$AA = \bigl(I_T - \iota_T\iota_T'/T\bigr)\bigl(I_T - \iota_T\iota_T'/T\bigr) = I_T - \iota_T\iota_T'/T - \iota_T\iota_T'/T + \iota_T\iota_T'\iota_T\iota_T'/T^2 = I_T - \iota_T\iota_T'/T = A,$$

using $\iota_T'\iota_T = T$. Then the within estimator $\hat\beta_w$ is based on the regression

$$\dot Y_{it} = \dot X_{it}'\beta + \dot\varepsilon_{it}.$$

Because $E[\dot\varepsilon_{it}|X_i] = 0$, this estimator is again unbiased. In fact, some careful linear algebra shows that it is identical to the dummy variable estimator: $\hat\beta_w = \hat\beta_d$. In this case, however, we cannot use the ols variance to get the right standard errors. The residual from this regression is $\dot\varepsilon_{it} = \varepsilon_{it} - \bar\varepsilon_i$. It has the following properties:

$$E[\dot\varepsilon_{it}] = 0, \qquad E[\dot\varepsilon_{it}^2] = E\bigl[(\varepsilon_{it} - \bar\varepsilon_i)^2\bigr] = \sigma^2 - 2\sigma^2/T + \sigma^2/T = \sigma^2(1 - 1/T),$$

and, for $t \ne s$,

$$E[\dot\varepsilon_{it}\dot\varepsilon_{is}] = E\bigl[(\varepsilon_{it} - \bar\varepsilon_i)(\varepsilon_{is} - \bar\varepsilon_i)\bigr] = 0 - \sigma^2/T - \sigma^2/T + \sigma^2/T = -\sigma^2/T.$$

To get the correct variance, consider the difference between the estimator and $\beta$:

$$\hat\beta_w - \beta = \left(\frac{1}{N}\sum_{i=1}^N \dot X_i'\dot X_i\right)^{-1}\frac{1}{N}\sum_{i=1}^N \dot X_i'\dot\varepsilon_i.$$

First, note that $\dot X_i'\dot\varepsilon_i = X_i'A'A\varepsilon_i = X_i'A\varepsilon_i = \dot X_i'\varepsilon_i$. Thus the variance of the within estimator is

$$V(\hat\beta_w) = \sigma^2\left(\sum_{i=1}^N \dot X_i'\dot X_i\right)^{-1}.$$

In order to get the correct variance we just need to estimate $\sigma^2$. Let us look at the residuals from the within regression. The expectation of their square is equal to $\sigma^2(1 - 1/T)$. Hence we can just average their squares, and then multiply by $T/(T-1)$, to get a consistent estimator.

Another estimator is based on differencing. Define

$$\Delta Y_{it} = Y_{it} - Y_{i,t-1}, \qquad \Delta X_{it} = X_{it} - X_{i,t-1}, \qquad \Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}.$$

Then we can write

$$\Delta Y_{it} = \Delta X_{it}'\beta + \Delta\varepsilon_{it}, \qquad \text{for } t = 2,\dots,T.$$

OLS for this regression is consistent. If the original errors $\varepsilon_{it}$ are uncorrelated, we now get a more complicated covariance matrix, with the first-differenced errors correlated.

Fixed effect methods do not easily extend to nonlinear models. There are some exceptions where it is still possible to remove the individual component without distributional or independence assumptions. A famous example is Chamberlain's fixed effect logit model. Consider the case with two periods. Conditional on the individual effect we have

$$\Pr(Y_{it} = 1|X_{it}, c_i) = \frac{\exp(X_{it}'\beta + c_i)}{1 + \exp(X_{it}'\beta + c_i)}.$$

Chamberlain suggests looking at units with $Y_{i1} + Y_{i2} = 1$. Then

$$\Pr(Y_{i1} = 1|X_{i1}, X_{i2}, Y_{i1} + Y_{i2} = 1) = E\bigl[\Pr(Y_{i1} = 1|X_{i1}, X_{i2}, Y_{i1} + Y_{i2} = 1, c_i)\,\big|\,X_{i1}, X_{i2}, Y_{i1} + Y_{i2} = 1\bigr],$$

with

$$\Pr(Y_{i1} = 1|X_{i1}, X_{i2}, Y_{i1} + Y_{i2} = 1, c_i) = \frac{\Pr(Y_{i1} = 1, Y_{i2} = 0|X_{i1}, X_{i2}, c_i)}{\Pr(Y_{i1} = 1, Y_{i2} = 0|X_{i1}, X_{i2}, c_i) + \Pr(Y_{i1} = 0, Y_{i2} = 1|X_{i1}, X_{i2}, c_i)}$$

$$= \frac{\dfrac{\exp(X_{i1}'\beta + c_i)}{1 + \exp(X_{i1}'\beta + c_i)}\cdot\dfrac{1}{1 + \exp(X_{i2}'\beta + c_i)}}{\dfrac{\exp(X_{i1}'\beta + c_i)}{1 + \exp(X_{i1}'\beta + c_i)}\cdot\dfrac{1}{1 + \exp(X_{i2}'\beta + c_i)} + \dfrac{1}{1 + \exp(X_{i1}'\beta + c_i)}\cdot\dfrac{\exp(X_{i2}'\beta + c_i)}{1 + \exp(X_{i2}'\beta + c_i)}}$$

$$= \frac{\exp(X_{i1}'\beta + c_i)}{\exp(X_{i1}'\beta + c_i) + \exp(X_{i2}'\beta + c_i)} = \frac{\exp\bigl((X_{i1} - X_{i2})'\beta\bigr)}{1 + \exp\bigl((X_{i1} - X_{i2})'\beta\bigr)},$$

which is free of $c_i$. This means we can just do a standard logistic regression, for the observations with $Y_{i1} + Y_{i2} = 1$, using the first difference of the regressors. This does not work for the probit model, demonstrating that this is a very special case.

Imbens, Lecture Notes 14, ARE213, Fall 04
ARE213 Econometrics, Fall 2004
UC Berkeley, Department of Agricultural and Resource Economics

DISCRETE RESPONSE MODELS III: MULTINOMIAL, CONDITIONAL, AND NESTED LOGIT MODELS

Here we focus again on models for discrete choice with more than two choices. We assume that the outcome of interest, the choice $Y_i$, takes on non-negative integer values between zero and $J$: $Y_i \in \{0, 1, \dots, J\}$. Unlike the ordered case, there is no particular meaning to the ordering. Examples are travel modes (bus/train/car), employment status (employed/unemployed/out of the labor force), marital status (single/married/divorced/widowed), and many others. We wish to model the distribution of $Y_i$ in terms of covariates. In some cases we will distinguish between covariates $X_i$ that vary by unit (individuals or firms), and covariates that vary by choice (and possibly individual), $X_{ij}$. Examples of the first type include individual characteristics such as age or education. An example of the second type is the cost associated with the choice, for example the cost of commuting by bus, train, or car.
This distinction only arises from the economics, or general scientific substance, of the problem. McFadden developed the interpretation of these models through utility-maximizing choice behavior. In that case we may be willing to put restrictions on the way covariates affect choices: costs of a particular choice affect the utility of that choice, but not the utilities of other choices.

The strategy is to develop a model for the conditional probability of choice $j$ given the covariates. Suppose the model is $\Pr(Y_i = j|X_i) = P_j(X_i; \theta)$. Then the log likelihood function is

$$L(\theta) = \sum_{i=1}^N\sum_{j=0}^J 1\{Y_i = j\}\,\ln P_j(X_i; \theta).$$

I. MULTINOMIAL LOGIT

Suppose we only have individual-specific covariates. Then we can model the response probability as

$$\Pr(Y_i = j|X_i) = \frac{\exp(X_i'\beta_j)}{1 + \sum_{l=1}^J \exp(X_i'\beta_l)}$$

for choices $j = 1, \dots, J$, and

$$\Pr(Y_i = 0|X_i) = \frac{1}{1 + \sum_{l=1}^J \exp(X_i'\beta_l)}$$

for the first choice. This is a direct extension of the binary response logit model. It leads to a very well behaved likelihood function, and is easy to estimate. More interestingly, it can be viewed as a special case of the following conditional logit.

II. CONDITIONAL LOGIT

Suppose all covariates vary by choice (and possibly also by individual, but that is not essential here). Then McFadden proposed the conditional logit model

$$\Pr(Y_i = j|X_{i0}, \dots, X_{iJ}) = \frac{\exp(X_{ij}'\beta)}{\sum_{l=0}^J \exp(X_{il}'\beta)},$$

for $j = 0, \dots, J$. The multinomial logit model can be viewed as a special case of this. Suppose we have a vector of individual characteristics $X_i$ with dimension $K$. Then define for each choice $j$ the vector of covariates $X_{ij}$, of dimension $K \times J$, with all zeros other than the elements $K(j-1)+1$ to $Kj$, which are equal to $X_i$:

$$X_{i0} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad X_{ij} = \begin{pmatrix} 0 \\ \vdots \\ X_i \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad X_{iJ} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ X_i \end{pmatrix}.$$

With $\beta = (\beta_1', \dots, \beta_J')'$, the conditional logit probabilities then coincide with the multinomial logit probabilities.

III. LINK WITH UTILITY MAXIMIZATION

McFadden motivates this model by extending the latent index model to multiple choices. Suppose that the utility for individual $i$ associated with choice $j$ is

$$U_{ij} = X_{ij}'\beta + \varepsilon_{ij}. \tag{1}$$
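The stacking construction above can be checked numerically. A minimal sketch with made-up numbers ($K = 3$, $J = 2$, choice 0 normalized to zero):

```python
import numpy as np

K, J = 3, 2
x = np.array([1.0, 0.5, -1.2])                   # individual covariates X_i
betas = [np.array([0.2, -0.3, 0.1]),             # beta_1, beta_2 (beta_0 = 0)
         np.array([-0.1, 0.4, 0.3])]

# Multinomial logit probabilities.
expu = np.array([1.0] + [np.exp(x @ b) for b in betas])
p_mnl = expu / expu.sum()

# Conditional logit with stacked choice-specific covariates and common beta.
beta = np.concatenate(betas)
Xij = np.zeros((J + 1, K * J))
for j in range(1, J + 1):
    Xij[j, K * (j - 1):K * j] = x                # block j holds X_i, zeros elsewhere
expu2 = np.exp(Xij @ beta)
p_clogit = expu2 / expu2.sum()
print(p_mnl, p_clogit)
```

The two probability vectors are identical, since $X_{ij}'\beta = X_i'\beta_j$ by construction.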
Furthermore, let individual $i$ choose the option that provides the highest level of utility, that is, $Y_i = j$ if

$$U_{ij} > U_{il} \qquad \text{for all } l = 0, \dots, J$$

(ties have probability zero because of the continuity of the distribution of $\varepsilon$). Now suppose that the $\varepsilon_{ij}$ are independent across choices and individuals, and have type I extreme value distributions. Then the choice $Y_i$ follows the conditional logit model. The type I extreme value distribution has cumulative distribution function

$$F(e) = \exp\bigl(-\exp(-e)\bigr),$$

and probability density function

$$f(e) = \exp(-e)\exp\bigl(-\exp(-e)\bigr).$$

This distribution has a unique mode at zero, a mean equal to 0.58, a second moment of 1.98, and a variance of 1.65. See Figure 1 for the probability density function and the comparison with the normal density. Note the asymmetry of the distribution.

Given the extreme value distribution, the probability of choice 0 is

$$\Pr\bigl(U_{i0} > U_{i1}, \dots, U_{i0} > U_{iJ}\bigr) = \Pr\bigl(\varepsilon_{il} < \varepsilon_{i0} + (X_{i0} - X_{il})'\beta,\ l = 1, \dots, J\bigr)$$

$$= \int_{-\infty}^{\infty} f(\varepsilon_{i0})\prod_{l=1}^J F\bigl(\varepsilon_{i0} + (X_{i0} - X_{il})'\beta\bigr)\,d\varepsilon_{i0}$$

$$= \int_{-\infty}^{\infty} \exp(-\varepsilon_{i0})\exp\bigl(-\exp(-\varepsilon_{i0})\bigr)\prod_{l=1}^J \exp\Bigl(-\exp\bigl(-\varepsilon_{i0} - (X_{i0} - X_{il})'\beta\bigr)\Bigr)d\varepsilon_{i0} = \frac{\exp(X_{i0}'\beta)}{\sum_{l=0}^J \exp(X_{il}'\beta)}.$$

To see the last step in this derivation, note that the integrand can be written as $\exp(-e)\exp\bigl(-\lambda\exp(-e)\bigr)$ with

$$\lambda = 1 + \sum_{l=1}^J \exp\bigl((X_{il} - X_{i0})'\beta\bigr) = \sum_{l=0}^J \exp\bigl((X_{il} - X_{i0})'\beta\bigr),$$

and that, by the change of variables $u = \exp(-e)$,

$$\int_{-\infty}^{\infty} \exp(-e)\exp\bigl(-\lambda\exp(-e)\bigr)\,de = \int_0^{\infty} \exp(-\lambda u)\,du = \frac{1}{\lambda},$$

which equals $\exp(X_{i0}'\beta)/\sum_{l=0}^J \exp(X_{il}'\beta)$.

IV. INDEPENDENCE OF IRRELEVANT ALTERNATIVES

The main problem with the conditional logit is the property of independence of irrelevant alternatives (IIA). Consider the conditional probability of choosing $j$ given that you choose either $j$ or $l$:

$$\Pr\bigl(Y_i = j\,\big|\,Y_i \in \{j, l\}\bigr) = \frac{\exp(X_{ij}'\beta)}{\exp(X_{ij}'\beta) + \exp(X_{il}'\beta)}.$$

This probability does not depend on the characteristics of alternatives other than $j$ and $l$. This is sometimes unattractive. McFadden's famous blue bus/red bus example illustrates this. Suppose there are three choices: commuting by car, by red bus, or by blue bus.
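McFadden's result can be checked by simulation: with iid type I extreme value (Gumbel) utility shocks, empirical choice frequencies converge to the conditional logit probabilities. A sketch with assumed systematic utilities:

```python
import numpy as np

rng = np.random.default_rng(7)
v = np.array([0.0, 0.5, -0.3])                   # systematic utilities X_ij'beta
R = 200_000
u = v + rng.gumbel(size=(R, 3))                  # U_j = v_j + eps_j, eps ~ Gumbel
freq = np.bincount(u.argmax(axis=1), minlength=3) / R
p = np.exp(v) / np.exp(v).sum()                  # conditional logit probabilities
print(freq, p)
```

With 200,000 draws the Monte Carlo error is on the order of 0.001, so the frequencies match the logit probabilities to about two decimal places.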
A sensible model would be to think that people have a preference over cars versus buses, but are indifferent between red versus blue buses. That would imply that the conditional probability of commuting by car, given that one commutes by car or red bus, would probably differ from the same conditional probability if there is no blue bus. Presumably, taking away the blue bus choice would lead all the current blue bus users to shift to the red bus, and not to cars.

The solution is to allow in some fashion for correlation between the errors in the latent utility representation. With a choice set that contains multiple versions of essentially the same option, we should allow the latent utilities for these choices to be identical, and so the error terms would have to be perfectly correlated. This can be done in a number of ways. We analyze the first one in the following discussion.

V. NESTED LOGIT

One way to induce correlation between the choices is through nesting them. Suppose the set of choices $\{0, 1, \dots, J\}$ can be partitioned into $S$ sets $B_1, \dots, B_S$, so that $\{0, 1, \dots, J\} = \cup_{s=1}^S B_s$. Let $Z_s$ be set-specific variables. It may be that the set of set-specific variables is just a vector of indicators, with $Z_s$ an $S$-vector of zeros with a one for the $s$th element. Now let the conditional probability of choice $j$ given that $Y_i \in B_s$ be equal to

$$\Pr(Y_i = j|X_i, Y_i \in B_s) = \frac{\exp(\rho_s^{-1}X_{ij}'\beta)}{\sum_{l \in B_s}\exp(\rho_s^{-1}X_{il}'\beta)}.$$

In addition, suppose the probability of set $B_s$ is

$$\Pr(Y_i \in B_s|X_i) = \frac{\exp(Z_s'\alpha)\Bigl(\sum_{l \in B_s}\exp(\rho_s^{-1}X_{il}'\beta)\Bigr)^{\rho_s}}{\sum_{t=1}^S \exp(Z_t'\alpha)\Bigl(\sum_{l \in B_t}\exp(\rho_t^{-1}X_{il}'\beta)\Bigr)^{\rho_t}}.$$

If we fix $\rho_s = 1$ for all $s$, then

$$\Pr(Y_i = j|X_i) = \frac{\exp\bigl(X_{ij}'\beta + Z_{s(j)}'\alpha\bigr)}{\sum_{t=1}^S\sum_{l \in B_t}\exp\bigl(X_{il}'\beta + Z_t'\alpha\bigr)},$$

and we are back in the conditional logit model. The extra coefficient $\rho_s$ implicitly allows for correlation of the errors in (1). The joint distribution function of the $\varepsilon_{ij}$ is

$$F(\varepsilon_{i0}, \dots, \varepsilon_{iJ}) = \exp\left(-\sum_{s=1}^S\Bigl(\sum_{j \in B_s}\exp\bigl(-\rho_s^{-1}\varepsilon_{ij}\bigr)\Bigr)^{\rho_s}\right).$$

Within the sets, the correlation coefficient of the $\varepsilon_{ij}$ is equal to $1 - \rho_s^2$. Between the sets, the $\varepsilon_{ij}$ are independent.

How do you estimate these models? One approach is to construct the log likelihood and directly maximize it. That is complicated, especially since the log likelihood function is not concave, but it is not impossible. An easier alternative is to directly use the nesting structure. Within a nest we have a conditional logit model with coefficients $\beta/\rho_s$. Hence we can directly estimate $\beta/\rho_s$, using the concavity of the conditional logit model. Denote these estimates of $\beta/\rho_s$ by $\hat\pi_s$.
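The nested probabilities just defined can be sketched in a few lines. The check below confirms that they sum to one, and that with $\rho_s = 1$ for all $s$ (and $Z_s'\alpha = 0$) they collapse to the plain conditional logit; all utility values are made up for illustration:

```python
import numpy as np

def nested_probs(v, nests, rho, za):
    # v: systematic utilities X'beta; nests: list of index arrays;
    # rho: dissimilarity parameters; za: Z_s'alpha per nest.
    p = np.zeros(len(v))
    incl = np.array([np.log(np.exp(v[B] / r).sum()) for B, r in zip(nests, rho)])
    top = np.exp(za + rho * incl)
    top = top / top.sum()                      # P(Y in B_s), via inclusive values
    for s, B in enumerate(nests):
        w = np.exp(v[B] / rho[s])
        p[B] = top[s] * w / w.sum()            # P(j | B_s) * P(B_s)
    return p

v = np.array([0.2, -0.1, 0.4, 0.0])
nests = [np.array([0, 1]), np.array([2, 3])]
p_rho1 = nested_probs(v, nests, np.array([1.0, 1.0]), np.zeros(2))
p_clogit = np.exp(v) / np.exp(v).sum()
p_nested = nested_probs(v, nests, np.array([0.5, 0.8]), np.zeros(2))
print(p_rho1, p_clogit, p_nested.sum())
```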
since the log likelihood function is not concave, but it is not impossible. An easier alternative is to directly use the nesting structure. Within a nest we have a conditional logit model with coefficients β/ρ_s. Hence we can directly estimate β/ρ_s, exploiting the concavity of the conditional logit model; denote these estimates by (β/ρ_s)^. Then the probability of a particular set B_s can be used to estimate ρ_s and α through

    Pr(Y_i ∈ B_s | X_i) = exp(Z_s'α)·( Σ_{l∈B_s} exp(X_il'(β/ρ_s)^) )^{ρ_s} / Σ_{t=1}^{S} exp(Z_t'α)·( Σ_{l∈B_t} exp(X_il'(β/ρ_t)^) )^{ρ_t}
                        = exp(Z_s'α + ρ_s·W_s) / Σ_{t=1}^{S} exp(Z_t'α + ρ_t·W_t),

where

    W_s = ln( Σ_{l∈B_s} exp(X_il'(β/ρ_s)^) ),

known as the inclusive values. Hence we have another conditional logit model, which is easily estimable. These two-step estimators are not efficient; the variance-covariance matrix is provided in McFadden (1981). These models can be extended to many layers of nests; see Goldberg (1995) for an example of a complex set of nests. It should be noted that both the order of the nests and the elements of each nest are very important.

REFERENCES

GOLDBERG, P. (1995): "Product Differentiation and Oligopoly in International Markets: The Case of the Automobile Industry," Econometrica, 63, 891-951.

MCFADDEN, D. (1981): "Econometric Models of Probabilistic Choice," in Structural Analysis of Discrete Data with Econometric Applications, Manski and McFadden (eds.), 198-272, MIT Press, Cambridge, MA.

[Figure 1: extreme value distribution (solid) and normal distribution (dashed).]

[Attached excerpt, Goldberg (1995), p. 896.]

[Figure 1: Automobile choice model. Nesting tree: Household → {Buy at least one car; Do not buy a car}; Buy at least one car → {Buy at least one new car; Buy only used car} → {Class 1, Class 2, ..., Class 9} → {Foreign; Domestic} → {Model, ..., Model}.]

As shown in McFadden (1978, 1981), the assumption of the generalized extreme value distribution implies that the conditional choice probabilities at each node s of the tree, as well as the marginal probability of purchasing a car, will be given by multinomial logit formulas that have the following general form:

    P(i_s | j_{s+1}) = exp( I_{i_s}·Λ_{i_s}/Λ_{j_{s+1}} ) / Σ_{k_s ∈ C_{j_{s+1}}} exp( I_{k_s}·Λ_{k_s}/Λ_{j_{s+1}} ),   (1)

where the inclusive values are
    I_{i_s} = log Σ_{k_{s−1} ∈ C_{i_s}} exp( X_{k_{s−1}}'θ_{s−1} + Λ_{k_{s−1}}·I_{k_{s−1}} ).

The subscript i_s denotes a specific alternative within the choice set C_{j_{s+1}}, where j_{s+1} denotes the previous-stage choice on which the current decision is conditioned; similarly, the subscript s−1 refers to one tree node below the current one. X_{i_s} represents a vector of explanatory variables specific to alternative i_s at stage s, and θ_s is the parameter vector specific to stage s, to be estimated. The inclusive value terms I measure the expected aggregate utility of subset i_s, while the coefficients Λ_{i_s}/Λ_{j_{s+1}}, which are estimated along with the parameters θ in the model, reflect the dissimilarity of alternatives belonging to a particular subset. As McFadden (1978) has shown, the nested structure depicted in Figure 1 is consistent with random utility maximization if and only if the coefficients of the inclusive value terms lie within the unit interval. As the dissimilarity coefficients approach 1, the distribution of the error terms tends towards an i.i.d. extreme value distribution, and the choice probabilities are given by the simple multinomial logit model. As the coefficients approach 0, the error terms become perfectly correlated, and consumers choose the alternative with the highest strict utility.

ARE213 Econometrics, Fall 2004. UC Berkeley, Department of Agricultural and Resource Economics. (Imbens, Lecture Notes 5.)

ORDINARY LEAST SQUARES V: MEASUREMENT ERROR (W 4.4)

Here we look at the effect of measurement error on least squares estimates. In addition to classical measurement error, we look at measurement error that is uncorrelated with the reported value. We also look at measurement error in the regressors as well as in the outcome variable. In all cases the key equation is the omitted variable bias formula.

Let X* denote the true value of a variable of interest, and X the recorded value. The measurement error is the difference between the recorded and the true value:

    ε ≡ X − X*.   (1)

The standard Classical Measurement Error (CME) model assumes that the measurement error is
independent of the true value: ε ⊥ X*. Assuming that the measurement error has mean zero, this implies E[ε | X*] = 0. This model is typically defended by reference to physical measurement models, where often passive recording of measurements based on imprecise measuring instruments takes place.

The alternative model, which we refer to as the Optimal Prediction Error (OPE) model, is based on the assumption that the measurement error is independent of the reported value: ε ⊥ X. This implies E[ε | X] = 0. An argument in support of this model views the agent reporting the data as fully aware of the lack of precision of the measuring instrument. Suppose the agent is asked to provide the value of some variable. The agent has no way of ascertaining the true value X* of this variable, but has available a flawed or noisy measure, X̃ = X* + η_X, with the measurement error η_X independent of the true value of the variable, exactly as in the CME model. However, suppose that the agent is aware of the lack of precision of the measurement, and corrects for this by reporting the best estimate of the underlying true value X* based on this measurement. To operationalize this, we interpret "best" in terms of a quadratic loss function, which implies the agent would report the expected value of the true value given the agent's information set. An alternative would be to assume absolute value loss, in which case the agent would report the median of X* given the information set. For most of the illustrative calculations below the mean and median give the same answers because we assume normality. Under this interpretation the error ε = X − X* should have mean zero given the information set of the agent. Since the reported value is clearly in the information set, this implies that the error has mean zero given the reported value.

A crucial ingredient in the OPE model is the information set. It may be that the respondent only has a single unbiased measurement of the underlying true variable.
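The contrast between the two reporting models can be illustrated with a small Monte Carlo sketch. All parameter values below are hypothetical, chosen only for illustration: under classical measurement error the OLS slope is attenuated toward zero, while reporting the conditional expectation E[X* | X̃] (the OPE model for the regressor, analyzed below) leaves the slope unbiased.

```python
import random
import statistics

def ols_slope(x, y):
    # simple OLS slope: sample covariance over sample variance
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

rng = random.Random(7)
n, beta = 100_000, 0.5                 # hypothetical true slope
s2x, s2e = 4.0, 1.0                    # hypothetical Var(X*), Var(eta_X)
xstar = [rng.gauss(0.0, s2x ** 0.5) for _ in range(n)]
y = [beta * xs + rng.gauss(0.0, 1.0) for xs in xstar]       # outcome measured without error
xtilde = [xs + rng.gauss(0.0, s2e ** 0.5) for xs in xstar]  # unbiased noisy measure

# CME: the noisy measure itself is reported; the slope is attenuated,
# with probability limit beta * s2x / (s2x + s2e) = 0.4 here
b_cme = ols_slope(xtilde, y)

# OPE1: the agent reports E[X* | Xtilde], a shrinkage of Xtilde toward the
# population mean (zero here); the slope is unbiased for beta
shrink = s2x / (s2x + s2e)
b_ope1 = ols_slope([shrink * xt for xt in xtilde], y)

# OPE2 (reporting E[X* | Xtilde, Y]) instead overstates beta; its probability
# limit, in the precision notation used later in these notes (Var(V) = 1 here):
h0, h1, h2 = 1 / s2x, 1 / s2e, beta ** 2 / 1.0
b_ope2_plim = beta * (h0 + h1 + h2) / (h1 + h2)   # = 0.6 > beta
```

The simulated slopes land close to their probability limits (0.4 and 0.5); the OPE2 line evaluates the analytic limit rather than simulating the two-measurement posterior mean.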
Alternatively, other variables, which themselves may enter the econometric model of interest, may be used to produce this estimate. For example, Ashenfelter and Krueger (1994) survey twins and ask each sibling to report both their own education and their sibling's education. To the extent that a respondent is not fully aware of their sibling's education level, but has knowledge of related items, such as occupation, it is plausible that such information would be used to infer the education level.

Let us consider a homoskedastic linear regression model for two scalar variables Y* and X*:

    Y* = α + β·X* + V,   (2)

where V ⊥ X*, E[V | X*] = 0, Var(V | X*) = σ²_V, and the parameter of interest, β, is the ratio of the covariance of Y* and X* over the variance of X*. Possibly mismeasured values Y and X are recorded. We consider the implications for least squares estimates of β, based on a random sample from (Y, X), of various properties of the measurement error.

The basic piece of information available to the respondent is assumed to be a pair of noisy measures of the underlying variables:

    X̃ = X* + η_X,   Ỹ = Y* + η_Y.

We assume that the basic measurement errors (η_X, η_Y) are independent of the true value of the regressor X* and of V, but potentially correlated with each other. For expositional reasons we also assume joint normality:

    (X*, V, η_X, η_Y)' ~ N( (μ_X, 0, 0, 0)',
        [ σ²_X    0       0                            0
          0       σ²_V    0                            0
          0       0       σ²_{η_X}                     ρ_{η_Xη_Y}·σ_{η_X}·σ_{η_Y}
          0       0       ρ_{η_Xη_Y}·σ_{η_X}·σ_{η_Y}   σ²_{η_Y} ] ).   (3)

We consider three cases relating the basic measurements X̃ and Ỹ to the reported values X and Y. The first case is the CME model, where the reported value is identical to the unbiased measurement. In addition we consider two versions of the OPE model. The first (OPE1) is where the respondent reports their best estimate based only on the noisy measure of the mismeasured variable itself. The second (OPE2) is where the respondent reports their best estimate based on the noisy measures of both variables. Table 1 summarizes the three models. In each of the three cases we consider the sign of the difference between the
Table 1: Three Models for Measurement Error

    Reporting Model                   X                       Y
    Classical Measurement Error       X_CME = X̃               Y_CME = Ỹ
    Optimal Prediction Error 1        X_OPE1 = E[X* | X̃]      Y_OPE1 = E[Y* | Ỹ]
    Optimal Prediction Error 2        X_OPE2 = E[X* | X̃, Ỹ]   Y_OPE2 = E[Y* | X̃, Ỹ]

probability limit β̃ of the least squares estimator using the noisy measures (Y, X), and the limit β of the least squares estimator using the true values (Y*, X*); call this sign(β̃ − β).

I. MEASUREMENT ERROR IN THE REGRESSOR

First we consider the case with σ²_{η_Y} = 0, where the measurement error is confined to the regressor. Thus Y = Y* under all three models.

First we briefly review the CME case. The reported value is X_CME = X̃ = X* + η_X. Let us use the proxy variable bias argument to derive the bias. We are interested in the regression function

    Y = α + β·X* + V,   (4)

but, using X = X* + η_X as a proxy for X*, we estimate

    Y = δ₀ + δ₁·X + ε.   (5)

To get the bias it is also useful to consider the regression function

    Y = β₀ + β₁·X* + β₂·X + ω.   (6)

In the last equation the population value for the second slope parameter is β₂ = 0, because of the independence between η_X and V, so that ω = V. Running (4) rather than (6), and thus omitting X, leads to no bias in the coefficient on X*: β₁ = β. Running (5) rather than (6), and thus omitting X*, leads to a bias in the coefficient on X:

    δ₁ = β₁·Cov(X*, X)/Var(X) = β·σ²_X/(σ²_X + σ²_{η_X}).   (7)

Hence the least squares estimator underestimates the coefficient in the regression with the true values:

    β̃_CME = Cov(Y, X_CME)/Var(X_CME) = β·σ²_X/(σ²_X + σ²_{η_X}),

which is less than β in absolute value. This is the standard case of classical measurement error leading to a bias towards zero.

Next consider the OPE1 case. The reported value X is linear in the unbiased measurement X̃:

    X_OPE1 = E[X* | X̃] = E[X* | X* + η_X]
XOPE1 7 X is independent of the reported value X the composite error terms 5 is independent of X and there is no bias resulting from the measurement error or 3me 6 Finally consider the case where the respondent adjusts the report to take into account not just the unbiased measurement X but also the accurately measured outcome Y XOPE2 EXXi7 EXlXY This can be interpreted as estimating X based on two noisy measurements X X77X and Y 704 XV with uncorrelated errors 77X and V The resulting reported value is therefore a weighted average of the population mean u and the two unbiased measurements 1U M 1U721X 10 1agx 62ag X 10 1agx 203 XOPE2 m 1U c1U x zUE Y7oz Al HX2 X 3 7 Imbens Lecture Notes 5 ARE213 Fall 04 6 with all M 2 0 and A1 A2 A3 1 We can rewrite this as a linear function of the true value and independent disturbances V X0PE2Ai39ux23X23977x339B Simple but tedious calculations show that the probability limit of the least squares estimator is equal to 1U c103X zUE g 1 OPE2 zUg 1U721X I 10 32Uglz which is greater than in absolute value In this case the least squares estimator over estimates the magnitude of the regression coef cient due to the correlation between the reported value and the disturbance V in the regression which is induced by the use of Y in producing the best estimate of the regressor X H MEASUREMENT ERROR IN THE OUTCOME VARIABLE Now we consider measurement error in the outcome variable and assume the regressor is accurately measured 02X 0 and thus X X X Under the CME assumption we can write the regression model as YYYnyoz XVny By assumption both components of the composite error term V 77y are independent of X so there is no bias and CME Next consider the case where the agent reports Y The unconditional mean of Y is or uX with variance z 0 03 and so the best estimate of Y based on 57 is 132U U 57 1 152a a31agy 162a a31a y39 YOPE1 04 3 MX Imbens Lecture Notes 5 ARE213 Fall 04 7 The slope coef cient in a regression Of Y on X is so the slope coef cient 
in a regression of Y_OPE1 on X is

    β̃_OPE1 = β·(β²σ²_X + σ²_V)/(β²σ²_X + σ²_V + σ²_{η_Y}),

which means β̃_OPE1 is biased towards zero.

Finally, consider the case where the respondent reports the best estimate of Y* given Ỹ and X. Based on X alone, the best estimate of Y* would be α + β·X. Knowledge of both X and Ỹ can be interpreted as knowledge of both α + β·X and Ỹ − α − β·X = η_Y + V. Hence we can write

    Y_OPE2 = E[Y* | X, Ỹ] = α + β·X + E[V | X, η_Y + V] = α + β·X + ( σ²_V/(σ²_V + σ²_{η_Y}) )·(η_Y + V).

Because V and η_Y are independent of X, again there is no bias from regressing Y_OPE2 on X.

III. MEASUREMENT ERROR IN BOTH THE REGRESSOR AND OUTCOME VARIABLE

Here we consider the case where both regressor and outcome are measured with error. In each case the individual reporting the variables has available an unbiased measurement,

    X̃ = X* + η_X,   Ỹ = Y* + η_Y,

with possibly correlated errors:

    (η_X, η_Y)' ~ N( 0, [ σ²_{η_X}               ρ_{η_Xη_Y}·σ_{η_X}·σ_{η_Y}
                          ρ_{η_Xη_Y}·σ_{η_X}·σ_{η_Y}   σ²_{η_Y} ] ).

We look at the bias resulting from the three models considered before: CME, OPE1 and OPE2. In general, with the errors in X̃ and Ỹ (η_X and η_Y, respectively) potentially correlated, the biases from measurement error cannot be signed. If the correlation between the measurement errors is zero, the direction of the bias follows intuitively from the previous calculations. These results, combined with those of Sections I and II, are reported in Table 2. If the correlation between η_X and η_Y is close enough to one, the bias will always be upward, and if it is close enough to negative one, the bias will always be downward.

Table 2: Measurement Error Bias in Slope Coefficient

    Reporting Model:                      CME            OPE1            OPE2
                                          X = X̃          X = E[X*|X̃]     X = E[X*|X̃,Ỹ]
                                          Y = Ỹ          Y = E[Y*|Ỹ]     Y = E[Y*|X̃,Ỹ]
    (σ_{η_X}, σ_{η_Y})
    No Error (0, 0)                       no bias        no bias         no bias
    Error in Regressor Only (>0, 0)       towards zero   no bias         away from zero
    Error in Outcome Only (0, >0)         no bias        towards zero    no bias
    Error in Both, zero corr. (>0, >0)    towards zero   towards zero    away from zero

To see how big these effects can be, we report in the next section some numerical calculations based on numbers relevant for wage regressions.

IV. MEASUREMENT
ERROR IN WAGE REGRESSIONS AND BOUNDS ON THE RETURNS TO EDUCATION

As an example we consider the regression of the logarithm of wages on education, where both may be measured with error, and the interest is in the coefficients from the regression based on the true values. In some cases one can argue that interest should be in the regression on perceived values. For example, if individuals do not know their own income with certainty, one may argue that their estimated income is more relevant for consumption decisions than true income. Here we would argue that in answering a survey an individual may have insufficient incentive to carefully check his or her records, whereas if the value of the variable is needed for making economically meaningful decisions, one might acquire the relevant information.

We calculate some of the moments of wages and education levels from NLSY data. The earnings measure used is the logarithm of the usual weekly wage, and the education measure is years of completed schooling. The estimated regression function based on these data is

    log(wage)^ = 5.16 + 0.061·educ
                (0.09)  (0.006)

The standard deviation of the log wage is σ_Y = 0.43, and the standard deviation of the education level is σ_X = 2.2. To find appropriate numbers for the measurement error variances, we turn to some of the validation studies. For the measurement error in the education level we take our numbers from the Ashenfelter and Krueger (1994) study. Ashenfelter and Krueger asked twins about their own education as well as their twin sibling's level of education. Using those data, they estimate a reliability ratio of approximately 0.90, implying that the variance of the measurement error is approximately ten percent of the variance of education. We therefore use σ_{η_X} = 0.63. For log wages we take our numbers from Bound and Krueger (1991) and Pischke (1995), who analyze the validation study of the PSID. Their numbers suggest a reliability ratio of 0.75, and hence we use σ_{η_Y} = 0.30. Although these validation
studies are obviously different from the NLSY in the way individuals were selected and in the formulation of the questions, and the estimates are all based on the CME assumption, they may be informative about the relative amount of measurement error for the earnings and education measures.

Based on these error variances and the distribution of the observed variables, we calculate the true parameter values β and the percentage bias, (β̃ − β)/β × 100, under different measurement error scenarios. Table 3 summarizes the results.

Table 3: True Returns to Education in the Presence of Measurement Error
(estimated returns to education are equal to 0.061)

    Reporting Model:                              CME             OPE1            OPE2
                                                  X = X̃           X = E[X*|X̃]     X = E[X*|X̃,Ỹ]
                                                  Y = Ỹ           Y = E[Y*|Ỹ]     Y = E[Y*|X̃,Ỹ]
    σ_{η_X}  σ_{η_Y}  ρ_{η_Xη_Y}                  β (bias %)      β (bias %)      β (bias %)

    No Error            0.00  0.00   -            0.061 (0)       0.061 (0)       0.061 (0)
    Error in Regressor  0.63  0.00   -            0.069 (−12)     0.061 (0)       0.055 (11)
    Error in Outcome    0.00  0.30   -            0.061 (0)       0.077 (−21)     0.061 (0)
    Error in Both       0.63  0.30  −0.90         0.108 (−44)     0.109 (−44)     0.076 (−20)
                        0.63  0.30  −0.50         0.090 (−32)     0.095 (−36)     0.068 (−10)
                        0.63  0.30   0.00         0.069 (−12)     0.077 (−21)     0.057 (7)
                        0.63  0.30   0.50         0.047 (30)      0.060 (2)       0.046 (33)
                        0.63  0.30   0.90         0.030 (103)     0.046 (33)      0.036 (69)

The results in the first three rows, with measurement error in at most one variable, reflect the qualitative results in Sections I and II. For example, in the second row, with only measurement error in the regressor, comparing the estimated parameter of 0.061 with the true parameter value of 0.069 implies that the estimated value is biased downward by 12%. The largest bias in these three rows is on the order of 21%. When both variables are measured with error, and with the errors correlated, the bias can get much larger. With zero correlation, the bias for the classical measurement error model is 12%. Allowing the correlation between the measurement errors to go to −0.90, the bias goes to 44%, and with the correlation up to 0.90, the bias goes to 103%. Similarly for the other reporting models the bias goes up considerably, although not quite as much as under the CME model. One conclusion is that the classical measurement error model may
overstate the biases associated with measurement error as well as understate them. A second point is that, although classical measurement error in the dependent variable alone does not lead to bias, if it is correlated with measurement error in the regressors it can affect the results considerably.

REFERENCES

HYSLOP, D., AND G. IMBENS (2001): "Bias from Classical and Other Forms of Measurement Error," Journal of Business and Economic Statistics.

ARE213 Econometrics, Fall 2004. UC Berkeley, Department of Agricultural and Resource Economics. (Imbens, Lecture Notes 1.)

ORDINARY LEAST SQUARES I: ESTIMATION, INFERENCE, AND PREDICTING OUTCOMES (W 4.2.1-4)

Let us review the basics of the linear model. We have N units (individuals, firms, or other economic agents), drawn randomly from a large population. On each unit we observe an outcome Y_i for unit i, and a K-dimensional vector of explanatory variables, X_i = (X_i1, X_i2, ..., X_iK)', where typically the first covariate is a constant, X_i1 = 1 for all i = 1, ..., N. We are interested in explaining the distribution of Y_i in terms of the explanatory variables X_i using a linear model:

    Y_i = X_i'β + ε_i.   (1)

In matrix notation, Y = Xβ + ε, or, avoiding vector and matrix notation completely,

    Y_i = β₁·X_i1 + ... + β_K·X_iK + ε_i = Σ_{k=1}^{K} β_k·X_ik + ε_i.

We assume that the residuals ε_i are independent of the covariates (or regressors) and normally distributed with mean zero and variance σ².

Assumption 1: ε_i | X_i ~ N(0, σ²).

We can weaken this considerably. First, we could relax normality and only assume independence:

Assumption 2: ε_i ⊥ X_i,

combined with the normalization that E[ε_i] = 0. We can weaken this assumption further by requiring only mean independence:

Assumption 3: E[ε_i | X_i] = 0,

or even further, requiring only zero correlation:

Assumption 4: E[ε_i·X_i] = 0.

We will also assume that the observations are drawn randomly from some population. We can also do most of the analysis by assuming that the covariates are fixed, but this complicates matters for some results, and it does
not help very much. (See the discussion on fixed versus random covariates in Wooldridge, page 9.)

Assumption 5: The pairs (X_i, Y_i) are independent draws from some distribution, with the first two moments of X_i finite.

The ordinary least squares estimator for β solves

    min_β Σ_{i=1}^{N} (Y_i − X_i'β)².

This leads to

    β̂ = (X'X)^{-1}·(X'Y).

The exact distribution of the OLS estimator under Assumption 1 is

    β̂ | X ~ N( β, σ²·(X'X)^{-1} ).

Without the normality of the ε_i it is difficult to derive the exact distribution of β̂. However, under the independence Assumption 2 and a second moment condition on ε (variance finite and equal to σ²), we can establish asymptotic normality:

    √N·(β̂ − β) → N( 0, σ²·E[X_iX_i']^{-1} ).

Typically we do not know σ². We can consistently estimate it as

    σ̂² = ( 1/(N − K − 1) )·Σ_{i=1}^{N} (Y_i − X_i'β̂)².

Dividing by N − K − 1 rather than N corrects for the fact that K + 1 parameters are estimated before calculating the residuals Y_i − X_i'β̂. This correction does not matter in large samples, and in fact the maximum likelihood estimator

    σ̂²_ml = (1/N)·Σ_{i=1}^{N} (Y_i − X_i'β̂)²

is a perfectly reasonable alternative. So, in practice, whether we have asymptotic normality or not, we will use the following distribution for β̂:

    β̂ ≈ N( β, V̂/N ),

where

    V = σ²·E[X_iX_i']^{-1}   (2)

and

    V̂ = σ̂²·( (1/N)·Σ_{i=1}^{N} X_iX_i' )^{-1}.   (3)

Often we are interested in one particular coefficient. Suppose, for example, we are interested in β_k. In that case we have

    β̂_k ≈ N( β_k, V̂_kk/N ),

where V̂_kk is the (k, k) element of the matrix V̂. We can use this for constructing confidence intervals for a particular coefficient. For example, a 95% confidence interval for β_k would be

    ( β̂_k − 1.96·√(V̂_kk/N),  β̂_k + 1.96·√(V̂_kk/N) ).

We can also use this to test whether a particular coefficient is equal to some preset number. For example, if we want to test whether β_k is equal to 0.1, we construct the t-statistic

    t = ( β̂_k − 0.1 ) / √(V̂_kk/N),

and compare it to a normal distribution.

Let us look at some real data. The following regressions are estimated on data from the National Longitudinal Survey of Youth (NLSY). The data set used here consists of 935 observations on usual weekly earnings, years of
education, and experience (calculated as age minus education minus six). Table 1 presents some summary statistics for these 935 observations. The particular data set consists of men between 28 and 38 years of age at the time the wages were measured.

Table 1: Summary Statistics, NLS Data

    Variable              Mean    Median   Min    Max     Standard Dev.
    Weekly Wage (in $)    421     388      58     2001    199
    Log Weekly Wage       5.94    5.96     4.06   7.60    0.44
    Years of Education    13.5    12       9      18      2.2
    Age                   33.1    33       28     38      3.1
    Years of Experience   13.6    13       5      23      3.8

We will use these data to look at the returns to education. Mincer developed a model that leads to the following relation between log earnings, education, and experience for individual i:

    log(earnings_i) = β₁ + β₂·educ_i + β₃·exper_i + β₄·exper_i² + ε_i.

Estimating this on the NLSY data leads to

    log(earnings_i)^ = 4.016 + 0.092·educ_i + 0.079·exper_i − 0.002·exper_i²
                      (0.222)  (0.008)        (0.025)         (0.001)

The estimated standard deviation of the residuals is σ̂ = 0.41. In brackets are the standard errors for the parameter estimates, based on the square roots of the diagonal elements of the variance estimate.

Using the estimates and the standard errors, we can construct confidence intervals. For example, a 95% confidence interval for the returns to education, measured by the parameter β₂, is

    ( 0.0923 − 1.96 × 0.008, 0.0923 + 1.96 × 0.008 ) = ( 0.0775, 0.1071 ).

The t-statistic for testing β₂ = 0.1 is

    t = ( 0.0923 − 0.1 ) / 0.008 = −0.96,

so at the 90% level we do not reject the hypothesis that β₂ is equal to 0.1.

Now suppose we wish to use these estimates for predicting a more complex change. For example, suppose we want to see what the estimated effect is on the log of weekly earnings of increasing a person's education by one year. Because changing an individual's education also changes their experience (in this case it automatically reduces it by one year), this effect depends not just on β₂. To make this specific, let us focus on an individual with twelve years of education (high school) and ten years of experience, so that exper² is equal to 100. The expected value of this person's log earnings is

    log(earnings)^ = 4.016 + 0.092 × 12 + 0.079 × 10 − 0.002 × 100 = 5.7191.
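The prediction just computed, and the standard error of a linear combination of coefficients used in what follows, can be sketched numerically. The coefficients below are the rounded point estimates reported above, so the fitted values differ slightly from the 5.7191 in the text (which is based on unrounded estimates); the covariance matrix fed into the quadratic form is a small hypothetical example, not the estimated one.

```python
import math

# rounded point estimates from the estimated Mincer regression above
b = [4.016, 0.092, 0.079, -0.002]      # intercept, educ, exper, exper^2

def fitted_log_earnings(educ, exper):
    return b[0] + b[1] * educ + b[2] * exper + b[3] * exper ** 2

lo = fitted_log_earnings(12, 10)       # about 5.710 (text: 5.7191, unrounded)
hi = fitted_log_earnings(13, 9)        # about 5.761 (text: 5.7696, unrounded)
gain = hi - lo                         # equals beta2 - beta3 - 19*beta4 = 0.051

# standard error of lambda'beta_hat via the quadratic form lambda' V lambda,
# shown with a hypothetical 2x2 covariance matrix for illustration
def lincomb_var(V, lam):
    k = len(lam)
    return sum(lam[i] * V[i][j] * lam[j] for i in range(k) for j in range(k))

V = [[0.0400, 0.0010],                 # hypothetical covariance matrix
     [0.0010, 0.0004]]
se = math.sqrt(lincomb_var(V, [1.0, -2.0]))   # sqrt(0.0376)
```

The same `lincomb_var` quadratic form, applied with λ = (0, 1, −1, −19)' and the full 4×4 estimated covariance matrix, reproduces the standard-error calculation carried out in the text.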
this personls log earnings is logearIE1gsi4016 009212 007910 7 000210057191 Imbens Lecture Notes 1 ARE213 Fall 04 6 Now change this personls education to 13 Their experience will go down to 9 and exper2 will go down to 81 Hence the expected log earnings is logearE1gsi 4016 0092 13 0079 9 7 0002 81 57696 The difference is 02 7 03 7 19 04 00505 Hence the expected gain of an additional year of education7 taking into account the effect on experience and experience squared is the difference between these two predictions7 which is equal to 0051 Now the question is what the standard error for this prediction is The general way to answer this question is as follows The vector of estimated coef cients 0 is approximately normal with mean 0 and variance V We are interested in a linear combination of the 07s7 namely 02 7 03 7 19 04 X07 where A 0171719 Therefore X0 NX0XVA where V is the variance in equation In the above example7 we have the following values for the covariance matrix V 00494 700011 700047 00001 00001 0000 00000 00006 00000 00000 Hence the standard error of X0 is 000967 and the 95 con dence interval for X0 is 0031100693 The second method for getting an estimate and standard error for X0 is very easy in the linear case We are interested in an estimator for X0 To analyze this we reparametrize from 01 01 02 to 70203193904 03 03 04 04 Imbens Lecture Notes 1 ARE213 Fall 04 7 The inverse of the transformation is 31 32 Y s1939 4 3 34 Hence we can write the regression function as logearningsl 01 y 03 19 04 educl 03 experi 04 exper 81 31 Y educl 02 experi educi 04 experl2 19educi 8139 Hence to get an estimate for y we can regress log earnings on a constant7 education7 expe rience minus education and experience squared minus 19 times education This leads to the estimated regression function logltarIE1gsi 4016 0051 educl 0079 experi educi 7 0002 experl2 19 educi 0222 0010 0025 0001 Now we obtain the estimate and standard error directly from the regression output Let us also 
look at a nonlinear version of this. Suppose we are interested in the regression of log earnings on education alone. The estimated regression function is

    log(earnings_i)^ = 5.0455 + 0.0667·educ_i
                      (0.0849)  (0.0062)

The estimate for σ² is σ̂² = 0.1744. Now suppose we are interested in the effect of increasing education by one year for an individual with currently eight years of education, not on the log of earnings, but on the level of earnings. At x years of education the expected level of earnings is

    E[earnings | educ = x] = exp( β₁ + β₂·x + σ²/2 ),

using the fact that if Z ~ N(μ, σ²), then E[exp(Z)] = exp(μ + σ²/2). Hence the parameter of interest is

    θ = exp( β₁ + 9·β₂ + σ²/2 ) − exp( β₁ + 8·β₂ + σ²/2 ).

Getting an estimate for θ is easy: just plug in the estimates for β and σ² to get

    θ̂ = exp( β̂₁ + 9·β̂₂ + σ̂²/2 ) − exp( β̂₁ + 8·β̂₂ + σ̂²/2 ) = 19.9484.

However, we also need a standard error for this estimate. Let us write this more generally as θ = g(γ), where γ = (β', σ²)'. We have an approximate distribution for γ̂:

    √N·(γ̂ − γ) → N( 0, Ω ).

Then, by the Delta method,

    √N·( g(γ̂) − g(γ) ) → N( 0, (∂g/∂γ')·Ω·(∂g/∂γ')' ).

In this case,

    ∂g/∂γ' = ( exp(β₁ + 9β₂ + σ²/2) − exp(β₁ + 8β₂ + σ²/2),
               9·exp(β₁ + 9β₂ + σ²/2) − 8·exp(β₁ + 8β₂ + σ²/2),
               (1/2)·( exp(β₁ + 9β₂ + σ²/2) − exp(β₁ + 8β₂ + σ²/2) ) ).

We estimate this by substituting estimated values for the parameters, so we get

    (∂g/∂γ')^ = ( 19.9484, 468.5779, 9.9742 ).

To get the variance for θ̂ = g(γ̂) we also need the full covariance matrix, including for the parameter σ². Using the fact that, because of the normal distribution, the estimator for σ² is
is7 for each individual we estimate the effect of increasing their education level by one year7 from whatever level it was7 followed by averaging over all individuals In terms of the parameters of the linear regression model7 the parameter of interest is now a much messier function 090 1M2 expwl 02 educl 1 022 7 expwL 02 educl 022 i 1 Substituting estimated values for the parameters leads to N 0 g0 ltexp01 02 educl 1 022 7 exp01 02 educl 92 290527 41 Even though this is a much messier function7 the principle is the same We already have the covariance matrix Q for 0 and 02 we just need the derivatives of the new transformation 97 expwl 02 educl 1 022 7 expwl 02 educl 022 N i Z educl 1 expwl 02educl 1 022 7 educl expwl geduci 022 N391 lex d 1 2271 2 z 2 p 1 2 e ucl U 2exp0102 educlU2 57 The standard error in this case is 28895 Imbens Lecture Notes 9 ARE213 Fall 03 1 ARE213 Econometrics Fall 2003 UC Berkeley Department of Agricultural and Resource Economics MAXIMUM LIKELIHOOD ESTIMATION IV CLASSICAL TESTING W After estimating the exponential model for the unemployment durations Lancaster con siders an extension Consider the hazard function or escape rate M49670 hy 79i1 10PTy S Y ltyAW S YIXVA The hazard function is just another way of characterizing a distribution like the density function the distribution function the survivor function the moment generating function or the characteristic function It is just a particularly convenient and interpretable way of describing a distribution or durations Given the hazard you can calculate the distribution function as y IQ719 1T 8Xplt hsx0dsgt 0 and hence the density function The exponential model implies that the hazard function stays constant over the duration of the spell equal to expx in our previous speci cation To see what this means take a person and look at their chances of nding a job on the rst day of being unemployed These chances are the same as the chances that this same person would nd a job on the fthieth day 
given that he has been unsuccessful in finding work in the first forty-nine days. This may be reasonable, but it might also be something you do not wish to impose from the outset. Lancaster therefore considers an extension, allowing the hazard function to either increase, stay constant, or decrease over time. This extension is known as the Weibull distribution:

    λ(y | x) = (α + 1)·y^α·exp(x'β).

Note that this reduces to the exponential distribution if α = 0. The implied density function for the Weibull distribution is

    f(y | x) = (α + 1)·y^α·exp(x'β)·exp( −y^{α+1}·exp(x'β) ).

The moments of this distribution are

    E[Y^k | X = x] = exp( −k·x'β/(α + 1) )·Γ( k/(α + 1) + 1 ).

Note that for the case with α = 0 this reduces to the exponential case, with E[Y^k | X = x] = exp(−k·x'β)·Γ(1 + k), and thus, with k = 1, the mean of the exponential distribution is E[Y | X = x] = exp(−x'β).

The log likelihood function for this model is

    L(α, β) = Σ_{i=1}^{N} ( ln(α + 1) + α·ln(y_i) + x_i'β − y_i^{α+1}·exp(x_i'β) ).

One can estimate this model using any of the numerical methods described before (Newton-Raphson, Davidon-Fletcher-Powell). The one minor complication is that numerical algorithms have to take account of the restriction that α > −1; with α = −1 the density is degenerate, and all probability mass piles up at y = 0.

Table 1 presents the maximum likelihood results for both the exponential and Weibull models. The standard errors are based on the second derivatives evaluated at the maximum likelihood estimates (see the discussion below).

Table 1: Exponential and Weibull Estimates

    Variable      Exponential Model   Weibull Model
    scale (α)     -                   −0.2012
    intercept     −4.5086             −3.3686
    age            0.0168 (0.0066)     0.0069 (0.0067)
    educ          −0.1685             −0.1405
    locrate        0.0435              0.0349

To test the hypothesis α = 0 against the alternative hypothesis α ≠ 0, there are three classical tests: the likelihood ratio, the Wald, and the score test. We shall consider the three tests in a general context, as well as giving the formulas for the case where we are testing the
function 7 607 where we split the parameter vector 6 into two parts7 6 6661 The dimension of 60 is K0 the dimension of 61 is K1 and the dimension of 6 is K K0 K1 In our example7 zl yiwi 60 oz 61 6 K0 1 and K1 4 three regressors7 age7 education7 and locrate7 and an intercept We are interested in testing the null hypothesis H0 60 07 against the alternative hypothesis H1 60 7E 0 Let 6 60m61u denote the unrestricted mlels In our example7 60 o 702012 and m Bweib 73368600069 70140300349 Also let 061 061 denote the estimates based on the restricted model7 that is7 based on the restriction 60 07 so that in our example Ben 745086 00168 701685 00453 Let L090 01 denote the log Imbens Lecture Notes 9 ARE213 Fall 03 4 likelihood function 1 N L090 61 7 Z In f2il007 61 7 Irma 1 a 111m m 7 m 1 expxl gt 11 H N argmax91L0 91 argmax Z 7 yi expz 171 and N Am an argmax9091Lt90 91 argmaxa Z lt1na 1 or lnyi mm 7 ylla 1 expz gt i1 In addition let 82 90 91 denote the score function alnfzl00 91 71 lny 7lnyya1expx 390 7 a1 82700701 lt 831191fz 00701 xixya1 expx 7 let Hz0061 be the Hessian Hz 90 91 ig algo l 9 2l00701 aila xzwoyen gamma 7mi nmzywexpltwgt 7x 1nyy 1expx 7x 1nyy 1expx 7xx y 1expx and7 nally7 let 10 91 be the information matrix evaluated at 190191 090191 7E HZ 90191 We will use various estimates of 109361 depending on where we evaluate the matrix and how we calculate or approximate the expectation For the second7 there are three choices 1 Use the average of the second derivatives Imbens Lecture Notes 9 ARE213 Fall 03 5 2 Use the average of the outer product of the rst derivatives 1 N i I I 10 7 N 521 9521 0 3 Use the expectation of the rst or second derivatives 10 E lHZy l E152705270 l This choice is very uncommon7 because typically we dont actually specify the full density7 only the conditional density of Y given X Hence we cannot calculate the full expectation7 and if we are calculating only the conditional expectation there do not seem to be a lot of advantages to bothering 
with that at all For the rst choice the possibilities are to evaluate the estimate at 1 The restricted estimates a 2 The unrestricted estimates 0 The leading choices include the average ofthe second derivative evaluated at the unrestricted estimates 05735 731586 00282 700834 00252 A 1 N A 731586 1197647 728217 716543 750445 117NZ7Mzi u 00282 728217 02122 701859 00660 i1 700834 716543 701859 05196 700067 00252 750445 00660 700067 10344 or the average of the second derivative evaluated at the restricted estimates 08099 747084 00393 701069 00343 A 1 N A 747084 1296672 728933 714860 751158 127NZ7Mzm9r 00393 728933 02090 701812 00691 21 701069 714860 701812 05119 700126 00343 751158 00691 700126 10389 Imbens Lecture Notes 9 ARE213 Fall 03 6 or estimates based on the rst derivatives at the restricted estimates 07787 754504 00402 700411 00375 754504 1018923 720587 709485 731203 Sz0z0 00402 720587 01412 701177 00419 1 700411 709485 701177 03241 700233 00375 731203 00419 700233 06575 A 1 13N39 M2 or at the unrestricted estimates 09221 763600 00559 700743 00502 763600 1282672 727230 710994 742049 Sz0u5z0u 00559 727230 01911 701652 00554 1 700743 710994 701652 04388 700231 00502 742049 00554 700231 08652 A 1 I4NI M2 All three classical tests are based on the quadratic approximation to the log likelihood function around the true and therefore maximizing in the limit values 93193 The three are rst order equivalent meaning that if the null hypothesis is correct their difference multiplied by N converges to zero in probability First if the null hypothesis is true and 93 0 the value of the log likelihood function at 190T 07 91 should not be much smaller than the value of the log likelihood function at 0W01u This is the basis of the Likelihood Ratio Test Formally the test statistic is LR 2 M1202 171 L0 irgt 2 723695 7 723845 300 If the null hypothesis is true this statistic has for large N a Chi squared distribution with degrees of freedom equal to the dimension of 90 one in our 
example Second if the limiting log likelihood function is maximized at 90 0 the derivative of the log likelihood function with respect to 90 at that point should be be close to zero This is the basis of Raols Score Test or the Lagrange Multiplier Test Formally N N 1 A 271 A i5182i001TI 39 E SltZi061T i1 Imbens Lecture Notes 9 ARE213 Fall 03 7 Note that the LM statistic is can also be written as 1 LM7 Ni N 50Zi707 1rl jOO 25021397 Ov lT7 1 i1 M2 but that this is in general not equal to N N ZSOzZ my Em my i1 i1 Any estimate of the information matrix can be used Typically researchers do not use 2 because it would require calculation of the unrestricted estimates7 and the key advantage of the Lagrange Multiplier test is that it avoids calculation of the unrestricted estimates There is some evidence that using the average of the second derivatives and therefore 11 is to be preferred over calculation of the expectation ie7 If only a conditional density is speci ed calculation ofthe latter is dif cult in any case because calculation ofthe expectation requires speci cation of the full density One particular form that is popular for the LM test is LNSS S 1S LN LM N LNLN 3234 where LN is the N vector of ones so that LNLN N7 and S is the N gtlt K matrix with the 2th row equal to the score vector 5217 r so that 4V8 5217 07 This form has the interpretation of N times the uncentered R2 in a regression of a vector of ones that is7 LN on the scores 8 The least squares coef cients are B S S 1S LN 700118 and the R2 is e A A W A 71 R2YYJNLNMMOO677 Y Y LNLN LNLN LNLN Imbens Lecture Notes 9 ARE213 Fall 03 8 Finally the restricted and unrestricted estimates of 90 should be close together if the null hypothesis is correct or in other words the unrestricted estimate of 90 should be close to zero This is the basis of the Wald Test Formally the Wald test is de ned as W N o0 fow i001o0 71 Nogi001o0 where foo is the top left part ofthe inverse of Again any ofthe estimates of the information 
matrix can be used Here typically the average evaluated at the unrestricted estimates fl 05735 or Z 09221 leading to W 3373 and W 2098 respectively are used because of their superior properties if the null hypothesis is false Result 1 As N goes to in nity under the null hypothesis LR i X2dimi90 Result 2 As N goes to in nity under the null hypothesis LRiLM o LBJ1130 LMiiVio For formal proofs of these two results see for example Engle 1984 or Holly 1985 Here is an informal argument for the case where 90 is a scalar and there are no nuisance parameters no 91 Expand the log likelihood around the maximum likelihood estimate A 6L A A 1 32L A L09 m L09 6709 lt0 7 56720 o 7 92 Imbens Lecture Notes 9 ARE213 Fall 03 9 The derivative of the log likelihood function at the maximum likelihood function is equal to zero7 so 2 L 7 L0gt a 7 0 m N 7 0 I6 This is clearly very close to the test statistic from the Wald test Given the limiting distri bution W 7 9 Morlm the limiting chi squared distribution follows immediately To see the link with the score or Lagrange multiplier test7 expand the derivative of the log likelihood function around the true value 6L N 6L 62L 6796 679m lt0 e 0 We Evaluating this at 9 so that the rst derivative is equal to zero7 6L A 62L 00 0 T0297 andthus 7 62L 771 6L 7 lt676gt7672lt6gt 766 Renormalizing this gives N ma 7 6 10 1LN Esme Imbens Lecture Notes 9 ARE213 Fall 03 10 which by a central limit theorem has a limiting normal distribution with mean zero and 1 variance equal to I6 Then squaring both sides and multiplying by the information matrix gives A 1 N 2 N 0 7 9310 I61 mam demonstrating the approximate equality of the Wald and Lagrange multiplier or score tests Next we consider a fourth way of testing hypothesis in the same context as before We have a model for a random variable Z specifying the density function 7 607 where we split the parameter vector 9 into two parts 9 0661 We are interested in testing the null hypothesis H0 90 0 against the 
alternative hypothesis H1 90 7 0 Let uo and an denote the unrestricted mlels and ro 0 and rl denote the restricted mlels that is conditional on the restriction 90 0 The test we consider compares the estimates of the parameter not affected by the restric tion 1 and lu If the null hypothesis is true the two are estimating the same thing GI and should be close to each other The restricted estimate should be more precise because it exploits a true restriction If the null hypothesis is false it is likely the two estimators are estimating different things and there is likely to be a larger difference between them This is the basis of the Hausman Wu Test Imbens Lecture Notes 9 ARE213 Fall 03 11 Consider the unrestricted maximum likelihood estimates 1 0 1 In large sam ples 0703 d 100101 Waiwikwolzw v where we partition the the information matrix and its inverse as i 100 101 71 7 100 101 17110 I11 I 7 110 In 39 The asymptotic variance for m m 7 9 is therefore 11 1 1 1 1 1 vw wm i 01 I 113 113 110 100 7 101113 110 101113 See the appendix to these lecture notes for the partioned inverse of a matrix Note that I 7 If is positive semide nite Now consider the restricted ml estimate rl In large samples i A 1 d N0r1 01 N071111 The variance for this estimator is obviously smaller than the variance for the unrestricted estimator We can try to calculate directly the variance of the difference V N011 7 711 by looking at the joint distribution of rl and ul but there is a simpler argument that goes to the heart of the testing procedure Consider a larger set of estimators that includes both and ul as special cases 17A r1 u1 r1 u1 rl The variance of this estimator is V090 W21 V V09 7 91 2Ao 1 u1 7 91 Imbens Lecture Notes 9 ARE213 Fall 03 12 Taking the derivative of this variance with respect to A evaluated at A 0 gives 3V A A A A AO 2 39 C0T17 0M1 7 0T1 Now7 and this is crucial7 this derivative must be equal to zero because the estimator with A 0 is ef cient as the mle its asymptotic 
variance is equal to the Cram r Rao bound Hence the covariance of 9T1 and 9T1 7 9M1 is zero and thus V u1 V r1 6M 7 6T1 V6r1 V6u1 1 and thus or mail 7amp1 M0111 7 In Thus the general form of the Hausman Wu test in a maximum likelihood context is HW lu lr V 1u V 1r79 lu lr 7 N m 7 1 111 7 113817 m 7 91 N39 lu lr 39 1131110 100 101I 1110gt1011 1gt g 39 lu lT7 where the superscript 9 denotes the generalized inverse Under the null hypothesis that 90 07 this test statistic would have a Chi squared distribution with degrees of freedom equal to the rank of 101 note that this may differ from the degrees of freedom for the other tests The test is therefore obviously not in general going to be rst order equivalent to the other tests discussed here7 although it can be if the rank of 101 is high enough Imbens Lecture Notes 9 ARE213 Fall 03 13 The test requires the estimator under the null hypothesis to be ef cient something like a maximum likelihood estimator but the unrestricted estimator need not be ef cient In general the test can be used when the alternative hypothesis is relatively vague A note of caution with this test Taking the difference in two variance matrices even if nominally this difference should be positive semi de nite can often lead to extremely large test statistics when the difference is close to zero In case the variance is not positive de nite one can take generalized inverses As a result however the small sample properties of this test are not always attractive Example Consider a linear regression model Yi X i o Xli i 813 or in matrix notation Y X030 X131 8 We assume 8 is conditionally normal with mean zero and variance 02 We are interested in testing the null hypothesis Ho e 07 against the alternative Ha 60 31 0 We use the following notation X X 7 XOO XOl XX lt X3 Xi 7 XX 1 lt X10 X11 Imbens Lecture Notes 9 ARE213 Fall 03 14 The restricted estimator is BM X lX lY with distribution BM N Nlt 17U2 39 X131 The unrestricted estimator is BM XlngY X11X 1Y with 
distribution BM N N 317 02 X11 N 31702 Xfll Xf11X10X00 X01X1711X1071X01X1711gtgt So using the formula given above the variance for the difference is the difference in the variances and BM 7 Ban N N 07 02 39 Xf11X10X00 X01X1 11X1071X01X1711 v with the rank of the variance equal to the rank of X01 assuming both X00 and X11 have full rank In this case one can also calculate the variance directly with the same result Consider the special case where X0 and X1 are scalars In that case B is equal to BM 630 30 where 30 is the coef cient on X0 in a regression of X0 on X1 The Hausman test is testing whether 60 60 0 that is whether the product of the coef cient of the excluded regressor 60 and the coef cient on the included regressor in a regression of the excluded regressor on the included one 60 is equal to zero Clearly if the two regressors are uncorrelated the test has no power against the alternative that 60 differs from zero D Irnbens Lecture Notes 9 ARE213 Fall 03 15 Example The second example is an instrumental variables model Yi Xi 5i We are concerned that Xi is not orthogonal with the disturbance e For that contingency there is an instrumental variable available ZZ which we are con dent is independent of 8139 We are considering the null hypothesis that X is exogenous that is7 in this case7 independent of the disturbance e Formally H0 EXiei 07 against the alternative H1 EXiei 31 0 One approach is to test 7 0 in the regression Yi Xi YZi 5139 Here we use a Hausman test Estimate the model ef ciently under the null hypothesis 3 X Xgt 1ltX Ygt with approximate variance WBT 0 X XW Now consider the instrumental variables estimator u X X 1X Y7 where X ZZ Z 1Z X Imbens Lecture Notes 9 ARE213 Fall 03 16 with approximate variance VWJ WXXYf Obviously for this estimator to be well de ned we need the matrix X X to be invertible which in turn requires that the matrix Z X has at least full row rank The dimension of X is N gtlt K the dimension of Z is N gtlt M so the dimension of Z X 
is M gtlt K and so we need at least M 2 K instruments for this to work Using the fact that under the null hypothesis 3 is ef cient as the maximum likelihood estimator combined with the fact that the estimator Bu is consistent under weaker conditions the test statistic would be 1 71 71 71 71 71 79 rmezgxm quxm mv lmzam Zm iRXgt ltX X 1X Y 7 X X 1X Ygt Maintaining as before the assumption that the matrix Z X has full row rank and writing XXnmzm4zxm we can write the factor in the middle the inverse of the variance as 9 ltX ZZ Z 1Z X 1 i X ZZ Z 1Z X n nrl lf 77 77 is invertible this reduces to using the formula in the appendix 71 X ZZ Z 1Z X 1ltX ZZ Z 1Z X 1 n nrl X ZZ Z 1Z X 1 and the degrees of freedom of the test are equal to the number of regressors K the number of columns in In general the degrees of freedom are equal to the rank of the 77 which Imbens Lecture Notes 9 ARE213 Fall 03 17 is the same as the rank of the matrix 77 77 That is only regressors not included in the set of instruments are counted in the degrees of freedom So for example ifX X0 X1 and Z X0 Z1 X can be partitioned in X X0 X1 with X0 X0 and thus 77 770 771 with 770 0 In that case the degrees of freedom is equal to the rank of 771 D APPENDIX 1 Consider a matrix A B O D 39 Assuming the matrix is invertible its inverse is A B 1 7 E F O D 7 G H 7 with E A 1 131 7 OA lB 1OA 1 H D 1 D 10A i BD lO 1BD 1 F 7A 1 A 1BD i OA lB 1OA 1BD 1 G D 10A 1 131 7 OA lB 1OA 1 2 Suppose both A and B are invertible matrices and A B and A 1 B 1 are invertible Then A 1 7 A B 1 A 1A 1 B l 1A 1 Imbens Lecture Notes 9 ARE213 Fall 03 18 REFERENCES ENGLE R 1984 Wald Likelihood Ratio and Lagrange Multiplier Tests in Econo metrics in Griliches and lntrilligator eds Handbook of Econometrics Vol 111 Elsevier North Holland HAUSMAN 1978 Speci cation Tests in Econometrics Econometrica Vol 46 No 6 1251 1271 HOLLY A 1987 Speci cation Tests An Overview in Bewley ed Advances in Econometrics Cambridge University Press Cambridge Irnbens 
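The general Hausman-Wu statistic described above, HW = (θ̂_u − θ̂_r)' [V(θ̂_u) − V(θ̂_r)]^g (θ̂_u − θ̂_r), with the generalized inverse handling a possibly singular variance difference, can be sketched as follows. The estimates and variance matrices below are hypothetical numbers chosen only to make the arithmetic visible; they are not values from the notes.

```python
# Sketch of the Hausman-Wu test using a Moore-Penrose generalized inverse.
# All numbers are hypothetical; theta_r plays the role of the efficient
# (restricted, ML) estimator, so V(theta_u - theta_r) = V_u - V_r.
import numpy as np
from scipy.stats import chi2

theta_u = np.array([0.50, 1.20])                  # unrestricted estimate
theta_r = np.array([0.45, 1.25])                  # restricted (efficient) estimate
V_u = np.array([[0.010, 0.002], [0.002, 0.020]])  # Var(theta_u)
V_r = np.array([[0.008, 0.002], [0.002, 0.018]])  # Var(theta_r), smaller

d = theta_u - theta_r
V_diff = V_u - V_r                      # variance of the difference
HW = float(d @ np.linalg.pinv(V_diff) @ d)
df = np.linalg.matrix_rank(V_diff)      # degrees of freedom = rank of V_diff
pval = chi2.sf(HW, df)
```

As the notes caution, when V_u − V_r is nearly singular the generalized inverse can produce extremely large statistics, so the rank (and hence the degrees of freedom) should be checked rather than assumed.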
Lecture Notes 4, ARE213 Fall '04

ARE213 Econometrics, Fall 2004, UC Berkeley, Department of Agricultural and Resource Economics

ORDINARY LEAST SQUARES IV: CLUSTERING AND VARIANCE ESTIMATION (W 6.3.4; Moulton)

When we looked at the standard linear model Y_i = X_i'β + ε_i, or in matrix notation Y = Xβ + ε, we assumed we had independent observations. Often that is not quite true. In general this makes progress difficult, but let us put on some more structure. Suppose that the pairs (Y_i, X_i) are clustered. Let S_i be the index for the cluster, so that with K clusters S_i ∈ {1, ..., K}. Within each cluster the (Y_i, X_i) are correlated, but (Y_i, X_i)'s from different clusters are independent. To do asymptotics we assume that the number of observations per cluster is fixed and the number of clusters increases. Let us initially also assume that the number of observations per cluster is the same for all clusters and equal to M. More generally, the sample size in cluster or group k is M_k. The total sample size is N = Σ_{k=1}^K M_k, or M·K in the special case with a constant group size.

It is useful to introduce some additional notation and give some preliminary results. Let Z be the N × K matrix of group or cluster indicators, with typical element Z_ij = 1{S_i = j}. If ι_N is the N-dimensional vector with all elements equal to one, then Z'ι_N gives a K-vector with the group sizes as elements. With Y an N-dimensional vector, (Z'Z)^{-1}Z'Y is the K-vector of group means. Furthermore, keeping in mind that in general for a matrix X, X(X'X)^{-1}X'Y is the projection of Y on X, we have that Z(Z'Z)^{-1}Z'Y is the N-vector with each element equal to the group mean. Finally, Y − Z(Z'Z)^{-1}Z'Y =
to 02 An alternative way to think about this structure is to think of 8 239 77 W where 77 and nu are independent 77 has variance 1 7 p 02 and V varies only between clusters not within and has variance p 02 The standard ols variance for the least squares estimator for Bols XX71XY7 02 X9071 1 Under the variance structure implied by the model the true variance is 02 X X 1 IL p X ZZ XX X 1 i In 2 Imbens Lecture Notes 4 ARE213 Fall 04 3 where L is the number of regressors in X Kloek 1981 considers a simpli cation where all regressors X are xed within the groups and the group sizes are all equal to M In that case the true variance simpli es to 02 X XW 1 M 1 399 3 Moulton suggests that in cases where there are different group sizes using this correction with the average group size may still give a good approximation In particular even if some of the regressors vary within groups it may still give a good approximation to the standard errors of the regressors that are xed within groups To estimate 02 and rst calculate the residuals from the ols regression ignoring an 9 y clustering Y i XX X 1X Y This is not necessarily ef cient but B X X 1X Y is consistent for since the clus tering only affects the variance Then estimate the variance parameter 02 as e N i L The degree of freedoms subtracted here are the number of regressors in X Since the asymp totics is for N going to in nity and L staying xed this does not matter You could also just divide by N Next we need to estimate p To estimate p we rst subtract from each residual the mean residual within the group using the projection matrix 8quot IN 7 ZZ Z 1Z These residuals are uncorrelated within clusters as well as between clusters Next we estimate the variance of this residual 62 N i K Imbens Lecture Notes 4 ARE213 Fall 04 4 Now it is important to subtract the degrees of freedom K for the K means that were subtracted from the residuals Because K goes to in nity with the sample size this will make an important difference Let us 
consider the expectation of 55 ignoring the difference between 5 and e which will be small in large samples BEE E IN 7 ZZ Z 1Z IN 7 ZZ Z 1Z E IN 7 ZZ Z 1Z ae IN 7 ZZ Z 1Z IN 7 ZZ Z 1Z 02 1 7 p IN p ZZ IN 7 ZZ Z 1Z 02 1 i p IN 7 ZZ Z 1Z The trace of this matrix that is the sum of the diagonal elements of this matrix is equal to 02391 39NK Hence 62 estimates 02 1 7 p so that we can estimate p as Another approach that works well in the Kloek case with constant group sizes and only aggregate regressors is the following First estimate the group means of the outcome 1 M Yk M Yik Then regress 57 X 3 57 This will give the same estimates as ols on the big data set and the correct standard errors More generally diVide the regressors into two parts X lVi Vi where W are the Irnbens Lecture Notes 4 ARE213 Fall 04 5 regressors that vary within the clusters and V1 are those that vary only between clusters7 and let ZZ denote the K vector of cluster dummies Moulton suggests for running the regression K Wi w 6 2 m Then in the second stage run the regression 5k VMX 8167 with the cluster speci c variables Let us see how this works out in practice I took the census data from Angrist and Krueger to estimate a regression with both individual level education and the average of state education levels The idea is to see if education of those around a person affect their earnings as well logearningsi 00 01 educl 02 stateeducl 8139 I estimate the regression using least squares and calculate the standard errors in three ways First the conventional ols standard errors Second the correct standard errors given in 27 and nally the standard errors suggested by Kloek Table 1 ESTIMATES AND STANDARD ERRORS intercept own education average state education estimate 42584 00656 00665 ols 00542 00011 00045 Moulton 01345 00011 00110 Kloek 01105 00022 00091 Note that the Kloek standard errors are pretty good for the average education variable7 but not for the individual level education variable The Moulton 
correct standard errors are very close to OLS for the individual regressor, but very different from OLS for the aggregate regressor.

Next I estimate the coefficient on average education using the Moulton-Kloek suggestion of first estimating state dummies. This leads to

Table 2: SECOND STAGE KLOEK-MOULTON REGRESSION

              intercept    average state education
  estimate     4.3510
  (s.e.)      (0.1606)    (0.0129)

Although this does not give results identical to those of the single-regression approach, the estimates and standard errors are fairly close, and certainly the standard errors are no longer misleading the way they were if you use conventional OLS standard errors.

REFERENCES

KLOEK, T. (1981): "OLS Estimation in a Model where a Microvariable is Explained by Aggregates and Contemporaneous Disturbances are Equicorrelated," Econometrica, Vol. 49, No. 1, 205-207.

MOULTON, B. (1990): "An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units," Review of Economics and Statistics, 334-338.

MOULTON, B., AND W. RANDOLPH (1989): "Alternative Tests of the Error Component Model," Econometrica, Vol. 57, No. 3, 685-693.

Imbens, Lecture Notes 2, ARE213 Fall '04

ARE213 Econometrics, Fall 2004, UC Berkeley, Department of Agricultural and Resource Economics

ORDINARY LEAST SQUARES II: VARIANCE ESTIMATION AND THE BOOTSTRAP (W 4.2.3; 12.8.2)

In the first lecture we considered the standard linear model

Y_i = X_i'β + ε_i.    (1)

We looked at estimating β, and functions of β, under the following assumption.

Assumption 1: ε_i | X_i ~ N(0, σ²).

Assuming also that the observations are drawn randomly from some population, the following distributional result was stated for the least squares estimator:

√N (β̂ − β) →d N( 0, σ² E[X_i X_i']^{-1} ).

In fact, for this result it is sufficient that ε_i is independent of X_i; one does not need normality of the ε_i. We estimated the asymptotic variance as

σ̂² ( (1/N) Σ_{i=1}^N X_i X_i' )^{-1},    where    σ̂² = (1/N) Σ_{i=1}^N ε̂_i².

In this lecture I want to explore alternative ways of estimating the variance, and relate them to alternative
assumptions about the distribution and properties of the residuals First we consider the distribution of B under much weaker assumptions Instead of independence and normality of the e we make the following assumption Imbens Lecture Notes 2 ARE213 Fall 04 2 Assumption 2 Ela X 0 This essentially de nes the true value of to be the best linear predictor s EXX 1EXY Under this assumption and independent sampling we still have normality for the least squares estimator but now with a different variance MB 7 6 N o ElXX lfl EleZXX D ElXXlYl Let the asymptotic variance be denoted by v ElXX lfl El82XX l ElXX lfl This is known as the heteroskedasticity consistent variance or the robust variance To see where this variance comes from write the least squares estimator minus the truth as The variance of the second factor is 1 El 1 N 2 71 N 2 7 2 LltNXz 8igt J i g leiXZXi 7NE8 XX Imbens Lecture Notes 2 ARE213 Fall 04 3 We can estimate the heteroskedasticity consistent consistently as 1 N 1 1 N 1 N 1 A i i i 2 l i V N E N N E where e Y 7 is the estimated residual An alternative method for estimating the variance of least squares estimators and in fact of many other estimators as well is bootstrapping or more generally resampling Consider the following scenario We have a random sample of size N from some dis tribution with cumulative distribution function We are interested in estimating the expected value of X EX through the sample average i The variance ofthe sample average is VX a N EX 7 EX2N How do we estimate the variance There are of course many methods but here we consider a particular approach Suppose we actually knew the cdf In that case we could calculate the variance by replacing all expectations by integrals 11X EX xdFXx 2 0 ltz e uxgt2deltzgt lt3 and thus 1X 1 N Now obviously we do not know the cdf If we did we would not have to estimate the expected value of X in the rst place However we can replace it in these calculations with an estimate based on the empirical 
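The sandwich form E[X_i X_i']^{-1} E[ε_i² X_i X_i'] E[X_i X_i']^{-1} derived above can be sketched numerically as follows. The simulated data-generating process (seed, sample size, coefficients, and the form of the heteroskedasticity) is an illustrative assumption, not part of the notes.

```python
# Sketch: heteroskedasticity-consistent (sandwich) variance estimator
# (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1} on simulated data whose
# error variance grows with |x|, so robust and conventional SEs differ.
import numpy as np

rng = np.random.default_rng(4)          # illustrative seed
N = 1000
x = np.column_stack([np.ones(N), rng.normal(size=N)])
# heteroskedastic errors: sd = 1 + |x|
y = x @ np.array([1.0, 0.5]) + rng.normal(size=N) * (1.0 + np.abs(x[:, 1]))

b = np.linalg.lstsq(x, y, rcond=None)[0]        # OLS estimate
e = y - x @ b                                   # residuals
bread = np.linalg.inv(x.T @ x)
meat = x.T @ (x * (e ** 2)[:, None])            # sum_i e_i^2 x_i x_i'
V_robust = bread @ meat @ bread                 # sandwich variance
se_robust = np.sqrt(np.diag(V_robust))

# conventional (homoskedastic) variance for comparison
s2 = e @ e / N
se_conv = np.sqrt(np.diag(s2 * bread))
```

With this design the robust standard error on the slope exceeds the conventional one, mirroring the levels-regression comparison later in the notes where the two estimators disagree.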
distribution function N Fm 21 xN Imbens Lecture Notes 2 ARE213 Fall 04 4 If we use the empirical distribution function instead of the actual distribution function in expressions 2 and 3 the expected value is N A 1 uX xdFXx NE at x i1 The variance is A A N 5 x 7 X2dFXz x 7 i2dFXx 2a i i2N SEAN 71N i1 Hence we would end up estimating the variance of the sample average as W9 S N 1N2 which is pretty close to the standard estimate of S N Now this calculation is more complicated than it need be In practice we do not need the exact bootstrap variance which is not equal to the exact variance of i anyway But if all we are interested in is an approximation we can make the calculation a lot simpler If we want the distribution of some statistic WX such as the sample average according to the empirical distribution function we can draw from the empirical distribution So consider the discrete distribution with support 1 2 xN and probabilities PTltX 1N for all 239 Draw a random sample from this distribution of size N and calculate the statistic Repeat this many times and calculate the average and sample variance of WX over these random samples That will give us by the law of large numbers the population mean and variance of WX according to the empirical distribution function Let us make this a little more speci c Suppose our sample is 1 0 x2 3 and x3 1 The sample average is 43 We are interested in the variance of this sample average The empirical distribution function is a discrete distribution with probability mass function f 13 Imbens Lecture Notes 2 ARE213 Fall 04 5 for x 0 13 and zero elsewhere One random sample from this distribution could be 0 10 The value of the statistic for this sample is wl 13 The next sample could be 033 with a sample average of LU2 2 After doing this many times say M times we can use the statistics w1w2 wM to approximate the expected value and variance of the empirical distribution functions as and We then use this variance estimator as an estimate of 
the variance of X We can do this is much more complex settings Suppose we are interested in some regression parameters de ned as s EXX 1EXY Given a sample of size N yi il We can resample the pairs y x to get a new sample xljylj v1 where for each I the random variable 739 is an integer between 1 and N with Prlj k 1N and 739 is independent of l for j 31 k Then calculate for each data set the regression estimate A N 71 N m 70 Mimi j1 7391 Imbens Lecture Notes 2 ARE213 Fall 04 6 Given M replications estimate the variance of the original estimate A N 7 N as 1 M A 7 A 7 VW M 2 i 3 31 i 3 11 where 1 M A 3 M I We can do this for many other estimators used in econometrics The key is that the estimator should be relatively easy to calculate because you will have to do this many times to get an estimate of the variance There are also conditions required For example7 if we are interested in the maximum of the support7 bootstrapping is unlikely to work See Efron reading packet7 Efron 19827 Efron and Tibshirani 19937 Davidson and Hinkley 19987 and Hall 1992 for details We now discuss three additional issues First is that of the parametric bootstrap Con sider the regression model Y X6e Instead of bootstrapping the pairs YZ7 Xi7 which is also known as the nonparametric bootstrap7 we can bootstrap the residuals First estimate by least squares Then calculate the resid uals 6i Yi XiB Imbens Lecture Notes 2 ARE213 Fall 04 7 For I 1M resample N residuals by drawing N numbers from the set of integers from 1 to N 7 6 12 N for j 1N Then construct the lth bootstrap sample 1Xj using 571 81 Then proceed as before If the disturbances are really independent of the Xls this works better than the nonparametric bootstrap and in fact can give exact results but if not the nonparametric bootstrap is to be preferred The second issue is an alternative to the bootstrap the jackknife Consider the original example of estimating the population mean The sample average is i and we are interested in its 
variance The jackknife estimate of the variance calculates for each 239 the estimate based on leaving out the 2th observation 7 1 7 96739 1 Given these N estimates of the mean which clearly average out to i the variance of i is estimated as N A 7 7 7 2 W95 Zmi 95 13971 To see why this works consider the difference 1 1 i 7 x39iixz l NN71 N 7 l The expectation of this difference is obviously zero The variance is Wm i mm H i M 7 M2 i 1 2 1 2 2 2 7 7 m N N2N712U N2U U ltNN712N2gt U Imbens Lecture Notes 2 ARE213 Fall 04 8 Averaging this over all observations gives approximately UZN which is the variance for i The nal concept is that of improved variance estimates Instead of calculating the variance this way7 we could bootstrap other statistics such as t statistics Suppose we wish to get a con dence interval for A simple way to do this is to calculate the sample average i the sample variance 82 and estimate the 95 con dence interval as i 7196 gtlt SxNj 196 gtlt SxN A bootstrapping version works as follows Draw for l 17 7M a boostrap sample of size N from the empirical distribution function7 11 339 17 7N7 l 17 M For each bootstrap sample calculate the sample mean7 variance and t statistic N 7 1 1 N E My j1 ti ii fanWW Calculate the 0025 and 0975 sample quantiles from the M t statistics and denote them by 0025 and btovg75 The 95 con dence interval is the set of all values of x such that btovozs lt i i lt btovg75 This can lead to con dence intervals with better coverage properties See Hall 1992 for details Imbens Lecture Notes 2 ARE213 Fall 04 9 Let us look at some variance estimates based on the various methods discussed here We focus on a regression of log weekly earnings on a constant and years of education The estimated regression function with conventional standard errors is is logearr1gsi 50455 00667educi 00349 00062 The estimate for 02 is 01744 The estimated covariance matrix for B is A 7 22 1 7 00072066 400005212 V U 39X X 00000337 With robust standard errors the 
estimated regression function is logearr1gsi 50455 00667educi 00353 00063 The two sets of standard errors are obviously very similar To see why7 let us compare the matrix eZXiXllN to 32 XiXllN The rst is l N 01740 23632 2 I 7 ggiX XiN 329614 7 and the second N A 01744 23491 UZ ZXiXlN 324789 i1 It is not always the case that using robust standard errors makes no difference Let us look at the same regression in levels rather than logs The conventional standard errors are in round brackets and the robust standard errors are in square brackets eaEingsi 195348 4 297745educl 332351 23019 332939 29504 Imbens Lecture Notes 2 ARE213 Fall 04 10 with now a slightly bigger difference about 5 difference in standard errors Let us go back to the model in logs and consider bootstrap standard errors We con sider two versions First7 we bootstrap the residuals7 keeping the covariates the same the parametric bootstrap Second7 we do the nonparametric bootstrap The results for the rst are in round brackets7 for the second in square brackets7 both based on 100000 bootstrap replications logearr1gsi 50455 00667educi 00850 00062 00861 00064 Again the standard errors are very similar to those based on the conventional calculations Let us know see how well the various con dence intervals work in practice I carried out the following experiment I took the census data used by Angrist and Krueger in their returns to schooling paper QJE7 1991 This has observations for 329509 individuals on among other things wages and education I ran a linear regression of log wages on a constant and years of education7 with the following result logearr1gsi 49952 00709educi 000045 00003 Next7 I take this sample of 329509 individuals as the population Repeatedly I draw 5000 random samples with replacement7 although this does not matter at all given the size of the population7 of size n for n 207 n 1007 and n 500 In each case I estimate same linear regression and calculate the standard errors in four different ways 
(i) conventional OLS standard errors, (ii) robust OLS standard errors, (iii) the parametric bootstrap, and (iv) the nonparametric bootstrap. Given both a 90% and a 95% confidence interval for the coefficient on years of education, I check whether the "true value", 0.0709, is in there, and I calculate how often that happens over the 5,000 replications. The results are as follows. In the first row of each part of the table I report the coverage probabilities, and in the second row the t-statistic for the null hypothesis that the actual coverage rate is equal to the nominal one (0.95 or 0.90).

Table 1: ACTUAL VERSUS NOMINAL COVERAGE RATES

            95% confidence interval                    90% confidence interval
          convent  robust  par boot  nonpar boot    convent  robust  par boot  nonpar boot
  n = 20   0.9072  0.8819   0.8898    0.9353         0.8466  0.8173   0.8274    0.8847
           17.2    27.3     24.1      5.9            15.6    24.1     21.2      4.5
  n = 100  0.9155  0.9378   0.9140    0.9437         0.8562  0.8808   0.8523    0.8903
           12.3    4.3      12.8      2.3            11.3    4.9      12.3      2.5
  n = 500  0.9284  0.9502   0.9274    0.9510         0.8693  0.9051   0.8681    0.9060
           6.9     0.1      7.2       0.3            7.1     1.2      7.4       1.4

With 500 observations the robust and nonparametric-bootstrap-based intervals are very accurate, in contrast to the conventional and parametric-bootstrap-based intervals. With smaller sample sizes all intervals deteriorate. The conventional intervals end up being superior to the robust intervals for small sample sizes.

ADDITIONAL REFERENCES

DAVISON, A. C., AND D. V. HINKLEY (1998): Bootstrap Methods and their Application, Cambridge University Press, Cambridge.

EFRON, B. (1982): The Jackknife, the Bootstrap and Other Resampling Plans, SIAM, Philadelphia.

EFRON, B., AND R. TIBSHIRANI (1993): An Introduction to the Bootstrap, Chapman and Hall, New York.

HALL, P. (1992): The Bootstrap and Edgeworth Expansion, Springer-Verlag, New York.

Imbens, Lecture Notes 16, ARE213 Fall '04

ARE213 Econometrics, Fall 2004, UC Berkeley, Department of Agricultural and Resource Economics

DISCRETE RESPONSE MODELS V: RANDOM COEFFICIENT OR MIXED MULTINOMIAL LOGIT
MODELS

Let us first recall some of the properties of the conditional logit. We consider a case with 3 choices: dinner at Chez Panisse, Olivetti's, or McDonalds, Y_i ∈ {C, O, M}. There is only one characteristic of the choice that matters, price. To make the comparisons simpler, let us suppose that the prices for the first two are equal, and much higher than for the other one: P_C = P_O >> P_M. The coefficient on this characteristic in the utility function is β < 0. We leave out the intercepts in the utility functions for simplicity; these would capture taste preferences for the three restaurants. So the utilities for the three choices are

    U_iC = β·P_C + ε_iC,    U_iO = β·P_O + ε_iO,    and    U_iM = β·P_M + ε_iM.

The probability of dinner at Chez Panisse is

    Pr(Y_i = C) = Pr(U_iC = max(U_iC, U_iO, U_iM))
                = exp(β·P_C) / (exp(β·P_C) + exp(β·P_O) + exp(β·P_M)).

It follows from the IIA (independence of irrelevant alternatives) property of the conditional logit that

    Pr(U_iO ≥ max(U_iC, U_iO, U_iM)) / Pr(U_iC ≥ max(U_iC, U_iO, U_iM)) = exp(β·P_O) / exp(β·P_C),

which does not depend on P_M.
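These choice probabilities, and the IIA property, can be checked numerically. The values of β and the prices below are made up for illustration, with P_C = P_O >> P_M as in the text.

```python
import math

# Illustrative (made-up) price coefficient and prices:
# Chez Panisse and Olivetti's equally expensive, McDonalds much cheaper.
beta = -0.05                      # beta < 0: higher price lowers utility
P_C, P_O, P_M = 100.0, 100.0, 10.0

def choice_probs(prices, beta):
    """Conditional logit: exp(beta*P_j) / sum_k exp(beta*P_k)."""
    utils = [math.exp(beta * p) for p in prices]
    total = sum(utils)
    return [u / total for u in utils]

p_C, p_O, p_M = choice_probs([P_C, P_O, P_M], beta)

# IIA: the odds of C versus O depend only on P_C and P_O, so they are
# unchanged when the price of the third alternative changes.
ratio_before = p_C / p_O
p_C2, p_O2, _ = choice_probs([P_C, P_O, 50.0], beta)  # raise McDonalds price
ratio_after = p_C2 / p_O2
```

Because the odds p_C / p_O equal exp(β·P_C) / exp(β·P_O), here exactly 1, changing McDonalds' price moves all three probabilities but leaves that ratio untouched — the substitution pattern that the random coefficient (mixed) logit is designed to relax.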
