Adv Econometric Theory ECON 5350
These 108 pages of class notes were uploaded by Hallie Kuphal on Wednesday, October 28, 2015. The notes belong to ECON 5350 at the University of Wyoming, taught by David Aadland in Fall 2015.
ECON 5340 Class Notes

Chapter 12: Serial Correlation

1 Introduction

In this chapter, we focus on the problem of serial correlation (aka autocorrelation) within the multiple linear regression model. Throughout, we assume that all other classical assumptions are satisfied. Assume the model is

    y_t = x_t'β + ε_t,    (1)

where

                           [ 1        ρ_1      ⋯  ρ_{n−2}  ρ_{n−1} ]
                           [ ρ_1      1        ⋯  ρ_{n−3}  ρ_{n−2} ]
    E[εε'] = σ²Ω = γ_0 ×   [ ⋮                              ⋮      ]    (2)
                           [ ρ_{n−1}  ρ_{n−2}  ⋯  ρ_1      1       ]

Here γ_s = cov(ε_t, ε_{t−s}) is called the autocovariance of the errors and ρ_s = γ_s/γ_0 is called the autocorrelation of the errors. Serial correlation is a common occurrence in time-series data. Consider an example of a macroeconomic consumption function,

    C_t = β_1 + β_2 t + β_3 Y_t + ε_t,

where t = 1950, …, 1985, C is consumption and Y is income. A plot of the OLS residuals is attached.

2 Time-Series Properties of a First-Order Autoregression

Assume that the error terms follow a first-order autoregressive (AR(1)) process,

    ε_t = ρ ε_{t−1} + μ_t,

where μ_t ~ iid(0, σ_μ²), |ρ| < 1, and t = 1, …, T. Rewrite using repeated substitution:

    ε_t = ρ ε_{t−1} + μ_t = ρ² ε_{t−2} + ρ μ_{t−1} + μ_t = ⋯ = ρ^T ε_{t−T} + Σ_{j=0}^{T−1} ρ^j μ_{t−j}.

Letting T → ∞ gives

    ε_t = Σ_{j=0}^{∞} ρ^j μ_{t−j},    (3)

which is called an infinite moving-average (MA(∞)) process. We can now use the MA(∞) representation to calculate several moments of the distribution of ε_t:

- Mean: E[ε_t] = Σ_{j=0}^{∞} ρ^j E[μ_{t−j}] = 0.
- Variance: γ_0 = var(ε_t) = σ_μ² + ρ²σ_μ² + ρ⁴σ_μ² + ⋯ = σ_μ²(1 + ρ² + ρ⁴ + ⋯) = σ_μ²/(1 − ρ²).
- Covariances: γ_s = cov(ε_t, ε_{t−s}) = E[(Σ_j ρ^j μ_{t−j})(Σ_j ρ^j μ_{t−s−j})] = ρ^s σ_μ²/(1 − ρ²) = ρ^s γ_0.

Since the mean does not depend on time, and the covariance between error terms depends only on the distance between them (and not on t), we say that ε_t is a weakly (covariance) stationary process. This information can be substituted into (2) to give

                        [ 1        ρ        ⋯  ρ^{T−2}  ρ^{T−1} ]
                        [ ρ        1        ⋯  ρ^{T−3}  ρ^{T−2} ]
    σ²Ω = σ_μ²/(1−ρ²) × [ ⋮                              ⋮      ]    (4)
                        [ ρ^{T−1}  ρ^{T−2}  ⋯  ρ        1       ]

3 Ordinary Least Squares

We now examine several results related to OLS when autocorrelation is present in the model. It is useful to break these results into two parts: when the model has no lagged dependent variables, and when it does.

3.1 Properties of OLS Estimators

3.1.1 No Lagged Dependent Variables

- b = (X'X)⁻¹X'Y is unbiased and consistent.
- s² is a biased but consistent estimator of σ².
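The stationary moments just derived can be checked by simulation. A minimal sketch (the values of ρ, σ_μ, and T are illustrative choices, not from the notes):

```python
import random

# Simulate the AR(1) error process e_t = rho*e_{t-1} + mu_t with
# mu_t ~ iid N(0, sigma_mu^2), then compare the sample moments with the
# stationary results above: var(e) = sigma_mu^2/(1 - rho^2), corr(e_t, e_{t-1}) = rho.
random.seed(12345)
rho, sigma_mu, T = 0.8, 1.0, 200_000

e, prev = [], 0.0
for _ in range(T):
    prev = rho * prev + random.gauss(0.0, sigma_mu)
    e.append(prev)
e = e[500:]  # drop a burn-in so the start-up value e_0 = 0 washes out

gamma0_hat = sum(x * x for x in e) / len(e)
gamma0 = sigma_mu**2 / (1 - rho**2)          # theoretical variance
rho1_hat = sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(x * x for x in e)

print(gamma0_hat, gamma0, rho1_hat)
```

With a long simulated series, the sample variance and first-order autocorrelation land very close to the theoretical values.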
- s²(X'X)⁻¹ is a biased and inconsistent estimate of var(b) = σ²(X'X)⁻¹(X'ΩX)(X'X)⁻¹.
- Asymptotically, √T(b − β) →d N(0, σ² Q⁻¹Q*Q⁻¹), where plim(X'X/T) = Q and plim(X'ΩX/T) = Q*.

3.1.2 Lagged Dependent Variables

Consider the following simple example:

    y_t = β y_{t−1} + ε_t    (5)
    ε_t = ρ ε_{t−1} + μ_t,    (6)

where μ_t is white noise. For this example, the OLS estimator is

    b = Σ y_t y_{t−1} / Σ y_{t−1}² = β + Σ ε_t y_{t−1} / Σ y_{t−1}².

In the presence of a lagged dependent variable, the OLS estimate of β will be biased and inconsistent. The intuition is that y_{t−1} and ε_t are both directly influenced by ε_{t−1}: y_{t−1} through equation (5) and ε_t through equation (6). The more formal argument, using Slutsky's theorem and the fact that y and ε are covariance stationary, gives

    plim b = β + plim(T⁻¹ Σ ε_t y_{t−1}) / plim(T⁻¹ Σ y_{t−1}²) = β + ρ(1 − β²)/(1 + βρ) = (β + ρ)/(1 + βρ) ≠ β.

See Greene (2003, p. 266) for more details. To summarize, we know the following about OLS in the presence of lagged dependent variables:

- b = (X'X)⁻¹X'Y is biased and inconsistent.
- s² is a biased and inconsistent estimator of σ².
- s²(X'X)⁻¹ is a biased and inconsistent estimate of var(b) = σ²(X'X)⁻¹(X'ΩX)(X'X)⁻¹.

3.2 Autocorrelation-Corrected Standard Errors with OLS

For autocorrelation, Newey and West have developed a consistent estimator of var(b) = σ²(X'X)⁻¹(X'ΩX)(X'X)⁻¹, in the same spirit as White's standard-error correction for heteroscedasticity. The Newey-West estimator is also robust to different forms of autocorrelation; i.e., you do not need to directly specify the autoregressive process for ε. Under the Newey-West approach we calculate

    S* = (1/T) Σ_{t=1}^{T} e_t² x_t x_t' + (1/T) Σ_{j=1}^{L} Σ_{t=j+1}^{T} [1 − j/(L+1)] e_t e_{t−j} (x_t x_{t−j}' + x_{t−j} x_t'),

where L is chosen large enough that the residual correlations are insignificant. Newey and West show that S* converges in probability to (1/T) σ² X'ΩX, where the e_t are the OLS residuals.

4 Testing for Autocorrelation

As in the case of heteroscedasticity, the tests below are based on the OLS residuals. This makes sense, at least asymptotically, because plim b = β.

4.1 Graphical Test

As a first step, it is often useful to graph e_t against t and see whether the residuals appear to be random noise. See the attached graph of the consumption residuals.
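The inconsistency result for the lagged-dependent-variable model can be seen numerically. A minimal sketch, with illustrative values of β and ρ:

```python
import random

# Verify plim b = (beta + rho) / (1 + beta*rho) for the model
# y_t = beta*y_{t-1} + e_t, e_t = rho*e_{t-1} + mu_t (mu_t white noise).
random.seed(7)
beta, rho, T = 0.5, 0.5, 500_000

y, e = 0.0, 0.0
num = den = 0.0
for _ in range(T):
    e = rho * e + random.gauss(0.0, 1.0)
    y_new = beta * y + e
    num += y_new * y          # accumulates sum of y_t * y_{t-1}
    den += y * y              # accumulates sum of y_{t-1}^2
    y = y_new

b = num / den                              # OLS slope
plim_b = (beta + rho) / (1 + beta * rho)   # = 0.8, far from beta = 0.5
print(b, plim_b)
```

The OLS slope settles near (β + ρ)/(1 + βρ), not near β, illustrating the inconsistency.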
4.2 Durbin-Watson Test

This is the most widely used autocorrelation test. It is used to test for first-order autocorrelation, i.e., ε_t = ρ ε_{t−1} + μ_t. The null hypothesis is H0: ρ = 0 and the alternative hypothesis is HA: ρ ≠ 0. Let's begin by rearranging the test statistic:

    d = Σ_{t=2}^{T} (e_t − e_{t−1})² / Σ_{t=1}^{T} e_t²
      = [Σ_{t=2}^{T} e_t² − 2 Σ_{t=2}^{T} e_t e_{t−1} + Σ_{t=2}^{T} e_{t−1}²] / Σ_{t=1}^{T} e_t²
      ≈ 2(1 − r_1) − (e_1² + e_T²)/Σ_{t=1}^{T} e_t²,

where

    r_j = Σ_{t=j+1}^{T} e_t e_{t−j} / Σ_{t=1}^{T} e_t²

is an estimate of the jth-order autocorrelation coefficient, and the term (e_1² + e_T²)/Σ e_t² goes to zero as T → ∞. Therefore, in the limit, d ≈ 2(1 − r_1). If H0 is true, we would expect r_1 ≈ 0 and d ≈ 2. If ρ = 1, we would expect d ≈ 0; if ρ = −1, we would expect d ≈ 4.

A couple of notes:

- The exact distribution of the d statistic depends upon X, and as a result a unique set of critical values does not exist. Durbin and Watson have, however, developed lower and upper bounds for the true but unknown critical values. If the d statistic falls between the lower and upper bounds, no conclusion can be reached.
- Positive autocorrelation (ρ > 0) is much more common than negative autocorrelation, so the test is often one-tailed, with the critical values at the lower end of the distribution.

4.2.1 Durbin's h Test

Not surprisingly, the DW test does not work in the presence of lagged dependent variables, because the OLS estimates of β are biased and inconsistent. Durbin has developed an alternative. The test statistic is

    h = r_1 √(T / (1 − T s_c²)),

where s_c² is the estimated variance of the coefficient on y_{t−1}. The statistic h has an asymptotic standard normal distribution and can be used to test the same H0 as in the DW test.
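The d statistic and its approximation d ≈ 2(1 − r_1) can be sketched directly (the white-noise series below is an illustrative input):

```python
import random

# The Durbin-Watson statistic: d = sum (e_t - e_{t-1})^2 / sum e_t^2.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(x * x for x in e)

random.seed(3)
e_white = [random.gauss(0.0, 1.0) for _ in range(50_000)]   # no autocorrelation

r1 = sum(e_white[t] * e_white[t - 1] for t in range(1, len(e_white))) / \
     sum(x * x for x in e_white)
d = durbin_watson(e_white)
print(d, 2 * (1 - r1))   # both near 2 under H0: rho = 0
```

For serially uncorrelated residuals the statistic sits near 2, and the 2(1 − r_1) approximation is essentially exact for a long series.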
4.3 Lagrange Multiplier Test

The disadvantage of the DW test is that it has an inconclusive region and only works for AR(1) processes. The LM test helps resolve these issues. The hypotheses are H0: no autocorrelation versus HA: AR(p) or MA(p). The first step is to run the regression

    e_t = x_t'γ + φ_1 e_{t−1} + φ_2 e_{t−2} + ⋯ + φ_p e_{t−p} + v_t.

The test statistic is LM = TR², which has an asymptotic chi-square distribution with p degrees of freedom. There is a tradeoff associated with the choice of p: choosing too large a p can cause the test to lose power (i.e., lead to Type II errors), while choosing too small a p may miss higher-order autocorrelation.

4.4 Box-Pierce Q Test

The Box-Pierce Q test is similar to the LM test, but it does not control for X. The test statistic is

    Q = T Σ_{j=1}^{p} r_j²,

which is asymptotically chi-square with p degrees of freedom. A slight variation of the Box-Pierce Q test was suggested by Ljung and Box:

    Q' = T(T + 2) Σ_{j=1}^{p} r_j²/(T − j).

4.5 Gauss Example (cont.)

We now perform the three tests for autocorrelation using the US consumption function example. See Gauss example 12.1 for the results.

5 Generalized Least Squares

5.1 Ω is Known

The efficient estimator for the model in equations (1) and (2) is

    β̂ = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y = (X'P'PX)⁻¹X'P'PY.

We need to calculate the transformation matrix P such that the transformed errors are white noise. Toward that end, define the lag operator L by L^j x_t = x_{t−j}. The appropriate transformation is

    (1 − ρL)y_t = (1 − ρL)x_t'β + (1 − ρL)ε_t,  i.e.,  Y* = X*β + ε*,

where

    Y* = [y_1 − ρy_0, y_2 − ρy_1, …, y_T − ρy_{T−1}]',

with X* and ε* = [ε_1 − ρε_0, …, ε_T − ρε_{T−1}]' defined analogously. The problem is how to transform the first observation, since y_0 and x_0 are not observed. One solution is to treat y_1 and x_1 as starting values and multiply the entire first row by √(1 − ρ²), so that ε_1* = √(1 − ρ²) ε_1 ~ (0, σ_μ²). This implies that the transformation matrix is

        [ √(1−ρ²)   0    0   ⋯   0   0 ]
        [ −ρ        1    0   ⋯   0   0 ]
    P = [ 0        −ρ    1   ⋯   0   0 ]
        [ ⋮                          ⋮ ]
        [ 0         0    0   ⋯  −ρ   1 ]

Technically, this means that the transformed model has no constant, since the transformed constant has a different first value. Also, remember that the transformation matrix P above is only valid for AR(1) autocorrelation processes; higher-order processes involve a more complicated P.
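The AR(1) transformation implied by P can be sketched as a simple function (the data vector below is illustrative):

```python
import math

# Apply the AR(1) GLS transformation row by row: the first observation is
# scaled by sqrt(1 - rho^2); the remaining rows are quasi-differenced.
def pw_transform(v, rho):
    out = [math.sqrt(1.0 - rho ** 2) * v[0]]
    out += [v[t] - rho * v[t - 1] for t in range(1, len(v))]
    return out

y = [2.0, 3.0, 5.0, 4.0]
print(pw_transform(y, 0.5))
# Dropping the first transformed row gives the Cochrane-Orcutt version:
print(pw_transform(y, 0.5)[1:])
```

Applying the same transformation to every column of X and to Y, then running OLS, yields the GLS estimator for a known ρ.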
5.1.1 Maximum Likelihood Estimation

Start by writing the transformed model as

    y_t = ρ y_{t−1} + (x_t − ρ x_{t−1})'β + μ_t.

The likelihood (joint probability) function can then be written as

    L(θ) = f(y_1) f(y_2|y_1) ⋯ f(y_T|y_{T−1}),

where we iteratively use the definition of a conditional distribution, f(y_1, y_2) = f(y_2|y_1)f(y_1). The log-likelihood function is then

    ln L(θ) = ln f(y_1) + Σ_{t=2}^{T} ln f(y_t|y_{t−1}).

Assuming that f is the normal pdf, we get

    ln L(θ) = −(T/2)[ln(2π) + ln σ_μ²] − (1/(2σ_μ²)) Σ_{t=1}^{T} (y_t* − x_t*'β)² + (1/2) ln(1 − ρ²),    (8)

where θ = (β, σ_μ², ρ) and the last term in (8) is included to account for the first observation. Assuming that Ω is known, the ML estimator is the familiar GLS estimator β̂ = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y, with σ̂_μ² = ε̂*'ε̂*/T.

5.2 Ω is Unknown

Next we consider two types of estimators when Ω is unknown: two-step estimators and maximum likelihood estimators.

5.2.1 Two-Step Estimators

Both estimators are asymptotically efficient with one iteration. Further iterations are optional and may help small-sample performance.

1. Prais and Winsten. This is a type of feasible GLS. The estimator works as follows:
   - Step 1: Use r_1 as an estimate of ρ and transform the data according to P.
   - Step 2: Run OLS on the transformed data, i.e., β̂ = (X*'X*)⁻¹X*'Y*.
2. Cochrane-Orcutt. Same as above, but ignore the first observation.

5.2.2 Maximum Likelihood Estimators

1. Brute force. Maximize (8) using one of many nonlinear optimization algorithms, such as Newton-Raphson.
2. Concentrated ML. Sometimes it is possible to concentrate certain parameters out of the likelihood function. Consider a concentrated version of (8) that is only a function of ρ (i.e., β and σ_μ² have been concentrated out):

       ln L_c(ρ) = −(T/2)[ln(2π) + ln σ̂_μ²(ρ) + 1] + (1/2) ln(1 − ρ²),

   where

       β̂(ρ) = (X*'X*)⁻¹X*'Y*    (9)
       σ̂_μ²(ρ) = ε̂*(ρ)'ε̂*(ρ)/T    (10)
       ε̂*(ρ) = Y* − X*β̂(ρ).    (11)

   First solve for ρ̂_ML by maximizing ln L_c(ρ), and then substitute ρ̂_ML into (9) and (10). This will provide the complete maximum likelihood estimates of θ.
3. Hildreth and Lu. Use a grid search to find the estimate of ρ that maximizes (8) given the implied estimates of β and σ_μ².

5.3 Gauss Application (cont.)

We continue with the consumption function example and contrast three different estimators: Prais-Winsten, Cochrane-Orcutt, and maximum likelihood. See Gauss example 12.2 for further details.

[Figure: Gauss output (Nov. 2003) — US consumption function, fitted values and residuals, 1954-1986.]

    Variable    OLS        Cochrane-Orcutt   Prais-Winsten   Maximum Likelihood
    Intercept   1207.05    26533.433         5893.4923       8822.6399
    Trend       0.6289     13.6649           3.0467          4.5586
    MPC         0.8862     0.6739            0.842           0.8137
    rho         —          0.794             0.5237          0.6387
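The Hildreth-Lu grid search can be sketched for a one-regressor model. This is a minimal illustration (simulated data, illustrative parameter values), maximizing the concentrated log-likelihood over a grid of ρ values:

```python
import math, random

# Hildreth-Lu-style grid search for rho in y_t = beta*x_t + e_t with AR(1)
# errors, using Prais-Winsten rows (first row scaled by sqrt(1 - rho^2)) and
# ln Lc(rho) = -(T/2)*ln(sigma2_hat(rho)) + 0.5*ln(1 - rho^2) + const.
random.seed(5)
beta, rho_true, T = 2.0, 0.6, 2_000
x = [random.gauss(0.0, 1.0) for _ in range(T)]
e, y = 0.0, []
for t in range(T):
    e = rho_true * e + random.gauss(0.0, 1.0)
    y.append(beta * x[t] + e)

def conc_loglik(rho):
    c = math.sqrt(1.0 - rho * rho)
    xs = [c * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, T)]
    ys = [c * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, T)]
    b = sum(a * v for a, v in zip(xs, ys)) / sum(a * a for a in xs)   # GLS slope
    ssr = sum((v - b * a) ** 2 for a, v in zip(xs, ys))
    return -0.5 * T * math.log(ssr / T) + 0.5 * math.log(1.0 - rho * rho)

grid = [i / 100 for i in range(-95, 96)]          # rho in (-0.95, 0.95)
rho_hat = max(grid, key=conc_loglik)
print(rho_hat)
```

The grid value maximizing ln L_c(ρ) lands near the true ρ, and substituting it back gives the implied β̂ and σ̂_μ².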
ECON 5340 Class Notes

Chapter 7: Functional Form and Structural Change

1 Introduction

Recall that although OLS is considered a linear estimator, this does not mean that the relationship between Y and X need be linear. In this chapter we introduce several types of nonlinear models and methods of testing for nonlinearity associated with structural change.

2 Dummy Variables

2.1 Comparing Two Means

Consider the regression model

    y_i = α + β x_i + δ d_i + ε_i,  for i = 1, …, n,

where d_i is a dummy (or binary) variable equal to one if the condition is satisfied and zero otherwise. The dummy variable can be used to contrast two means:

    E[y_i | x_i, d_i = 0] = α + β x_i
    E[y_i | x_i, d_i = 1] = α + β x_i + δ.

A simple t test can then be used to test whether the conditional mean of y is different when d = 0 as opposed to d = 1. The null hypothesis for this test would be H0: δ = 0. An example will be presented below.

2.2 Several Categories

Some qualitative variables are naturally grouped into discrete categories (e.g., seasons of the year, religious affiliation, race, etc.). It is also often useful to artificially categorize quantitative variables (e.g., income levels, education, age, etc.). Consider the regression model

    y_i = α + β x_i + δ_1 D_{1i} + δ_2 D_{2i} + δ_3 D_{3i} + ε_i

with a set of four binary variables D_{1i}, D_{2i}, D_{3i}, and D_{4i}. Notice that one of the variables, D_{4i}, is excluded from the model to avoid the dummy-variable trap. Inclusion of D_{4i} would violate the classical assumption that the X matrix is of full rank. This occurs because the four dummy variables sum to a column of ones, which is perfectly collinear with the constant. By excluding D_{4i}, the δ_j coefficients are interpreted as the change in y when moving from the jth category (D_{ji} = 1) relative to the 4th category (D_{4i} = 1).

2.2.1 Example: Testing for Seasonality

Consider two different models that test for seasonality in y:

    y_i = α + β x_i + δ_1 D_{1i} + δ_2 D_{2i} + δ_3 D_{3i} + ε_i    (1)
    y_i = β x_i + θ_1 D_{1i} + θ_2 D_{2i} + θ_3 D_{3i} + θ_4 D_{4i} + ε_i    (2)
Each equation avoids the dummy-variable trap: the first by excluding D_{4i} and the second by excluding the constant term α. Both equations generate the same goodness of fit, but the coefficients are interpreted differently. For equation (1) the test for seasonality is H0: δ_1 = δ_2 = δ_3 = 0, and for equation (2) the test is H0: θ_1 = θ_2 = θ_3 = θ_4. Both of these tests can be performed using the general F test developed in chapter 6, provided the errors are normally distributed. It is instructive to relate the coefficients to one another:

    Category   Relationship between coefficients
    D_1 = 1    α + δ_1 = θ_1, so δ_1 = θ_1 − θ_4
    D_2 = 1    α + δ_2 = θ_2, so δ_2 = θ_2 − θ_4
    D_3 = 1    α + δ_3 = θ_3, so δ_3 = θ_3 − θ_4
    D_4 = 1    α = θ_4

This highlights the fact that the δ coefficients are measured as the effect on y relative to the excluded category.

2.3 Interactive Effects

The dummy variables introduced in the previous two sections are intercept dummies, because they cause parallel shifts in the regression line. Often we want the slope of the regression line to change with some qualitative variable. For example, one might think that each extra inch of height for women is associated with a different change in weight than it is for men (i.e., the slope of the regression of weight on height differs between men and women). This can be incorporated using slope dummies, or interaction terms. Consider the following regression model:

    y_i = α + β x_i + δ_1 D_i + δ_2 x_i D_i + ε_i.    (3)

The slope of the regression line when D = 0 equals β. The slope when D = 1 equals β + δ_2. Therefore, a test of whether the slope differs when D = 0 versus D = 1 can be carried out by a simple t test of the null hypothesis H0: δ_2 = 0. The intercept dummy term δ_1 D_i is included so the two regression lines are free to have different intercepts.
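The intercept-dummy logic of section 2.1 has a clean numerical counterpart: in a regression of y on a constant and a dummy alone, the OLS estimate of δ is exactly the difference between the two group means, which is what the t test of H0: δ = 0 compares. A minimal sketch (parameter values are illustrative):

```python
import random

# Generate y_i = alpha + delta*d_i + e_i and recover delta as the
# difference in group means -- the OLS solution for this design.
random.seed(11)
alpha, delta = 2.0, 1.5
d = [0] * 400 + [1] * 300
y = [alpha + delta * di + random.gauss(0.0, 1.0) for di in d]

n, sd = len(d), sum(d)
sy = sum(y)
sdy = sum(yi for yi, di in zip(y, d) if di == 1)

# For the constant-plus-dummy design, the normal equations reduce to
# delta_hat = mean(y | d = 1) - mean(y | d = 0).
mean1 = sdy / sd
mean0 = (sy - sdy) / (n - sd)
delta_hat = mean1 - mean0
print(delta_hat)
```

The recovered δ̂ sits near the true 1.5, and its t statistic is the familiar two-sample comparison of means.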
2.4 Spline Regressions

Sometimes it is useful to let the slope of a regression line vary with some threshold value of a continuous explanatory variable. This so-called spline regression is basically a regression line with kinks. Consider the regression

    y_i = α + β x_i + ε_i,

with the additional restriction that the regression line is continuous everywhere but has a kink at x_0. The point x_0 is referred to as a knot. Note that the regression line is not differentiable at x_0. We need to introduce a dummy variable

    D_i = 1 if x_i > x_0, and D_i = 0 otherwise.

The new spline regression model is

    y_i = α + β_1 x_i + β_2 x_i D_i + β_3 D_i + ε_i,    (4)

with the additional restriction that the line is continuous at x_i = x_0, i.e.,

    α + β_1 x_0 = α + β_1 x_0 + β_2 x_0 + β_3  ⇒  β_3 = −β_2 x_0.

Substituting the restriction β_3 = −β_2 x_0 back into equation (4) and rearranging gives

    y_i = α + β_1 x_i + β_2 D_i (x_i − x_0) + ε_i.

The slope of the regression line when x_i < x_0 equals β_1. The slope of the regression line when x_i > x_0 equals β_1 + β_2.

2.5 Gauss Examples: Earnings Equation

Consider the following log earnings equation,

    ln(wage_i) = β_1 + β_2 Age_i + β_3 Age_i² + β_4 Grade_i + β_5 Married_i + β_6 Married_i × Grade_i + ε_i,

and a sample of 1000 men taken from the 1988 Current Population Survey. The first program shows how to estimate and graph a regression model with both an intercept and a slope dummy variable (see example 7.1). The second program estimates and graphs a spline regression with knots at Grade_i = 12 and Grade_i = 16; it also tests whether the slopes are equal across the different education levels (see Gauss example 7.2).

3 Nonlinear Functional Forms

Below are some common nonlinear functional forms for regression models. Note that although some regression models may appear to violate classical assumption 1 (i.e., that the model is linear in the coefficients and error term), it is sometimes possible to take a linearizing transformation of the model (e.g., see the double-log form below).

3.1 Double-Log

Consider the following regression model,

    y_i = β_1 [∏_{j=2}^{k} x_{ji}^{β_j}] exp(ε_i),

which initially appears to be nonlinear in the parameters and the error term. However, by taking natural logs of both sides we get

    ln y_i = ln β_1 + β_2 ln x_{2i} + β_3 ln x_{3i} + ⋯ + β_k ln x_{ki} + ε_i,

which is now obviously linear in the (transformed) coefficients and ε. This is called the double-log functional form. The coefficients can be interpreted as elasticities,

    ∂ ln y_i / ∂ ln x_{ki} = β_k,

or the percentage change in y for a one-percent change in x_k, all else equal.
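The restricted spline regressor D_i(x_i − x_0) from section 2.4 is easy to construct, and the continuity restriction is easy to verify. A minimal sketch with an illustrative knot at x_0 = 10:

```python
# Build the spline term D_i*(x_i - x0): zero below the knot,
# (x_i - x0) above it.
def spline_term(x, x0):
    return [(xi - x0) if xi > x0 else 0.0 for xi in x]

x = [4.0, 8.0, 10.0, 12.0, 16.0]
z = spline_term(x, 10.0)
print(z)

# A fitted line y-hat = a + b1*x + b2*z is continuous at the knot, with
# slope b1 below x0 and slope b1 + b2 above it.
a, b1, b2 = 1.0, 0.5, 0.25            # illustrative coefficients
yhat = [a + b1 * xi + b2 * zi for xi, zi in zip(x, z)]
```

Because the extra regressor is zero at and below the knot, the two line segments meet at x_0, exactly as the restriction β_3 = −β_2 x_0 requires.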
3.2 Semi-Log

The semi-log functional form refers to either the dependent variable or some of the independent variables being logged. One example is

    ln y_i = β_1 + β_2 x_{2i} + β_3 x_{3i} + ⋯ + β_k x_{ki} + ε_i.

3.2.1 Notes

1. If, for instance, we introduced a time trend t as an explanatory variable, then ∂ ln y_i/∂t gives the conditional average growth rate of y_i.
2. β_2, for example, can be interpreted as the (approximate) percentage change in y for a unit change in x_2.

3.3 Polynomial

An example of a polynomial functional form is

    y_i = β_1 + β_2 x_{2i} + β_3 x_{2i}² + β_4 x_{2i} x_{3i} + ε_i.

The coefficients do not measure the relevant partial derivatives. For example,

    ∂y_i/∂x_{2i} = β_2 + 2β_3 x_{2i} + β_4 x_{3i},

which needs to be evaluated at some value of x_2 and x_3. High-order polynomial functional forms can provide excellent goodness of fit within the sample; however, the out-of-sample fit is often poor. Rarely does theory call for anything over second-order polynomials.

3.4 Box-Cox Transformations

The Box-Cox transformation introduces a very flexible and general functional form. For example, the Box-Cox transformation of y_i = α + β x_i + ε_i is

    (y_i^{λ1} − 1)/λ1 = α + β (x_i^{λ2} − 1)/λ2 + ε_i.

Unfortunately, the parameters λ1 and λ2 need to be estimated along with α and β. This produces a nonlinear estimation problem; we will discuss this further beginning in chapter 9. Here are some possible outcomes:

    Parameter combination   Model type
    λ1 = λ2 = 1             linear model
    λ1 = 0, λ2 = 1          semi-log model
    λ1 = λ2 = 0             double-log model
    λ1 = 1, λ2 = −1         reciprocal model

4 Structural Change

In this section we are concerned with testing for structural change in the regression model. We start by assuming that we know the location of the possible break point. Later we present a more flexible procedure for identifying possibly unknown break points.

4.1 Tests with Known Break Points

We begin by partitioning the data into two parts. Denote the sample sizes of each part as n_1 and n_2, where n = n_1 + n_2. The regression model is

    [ Y_1 ]   [ X_1   0  ] [ β_1 ]   [ ε_1 ]
    [ Y_2 ] = [ 0    X_2 ] [ β_2 ] + [ ε_2 ],

where β_1 and β_2 are k × 1 vectors. We can estimate this model using least squares:

    b = [ (X_1'X_1)⁻¹X_1'Y_1 ]
        [ (X_2'X_2)⁻¹X_2'Y_2 ].
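The table above rests on the fact that the Box-Cox transform nests the log as λ → 0. A minimal sketch of the transform (the `eps` cutoff is an implementation choice, not from the notes):

```python
import math

# Box-Cox transform (y^lam - 1)/lam; as lam -> 0 this approaches ln(y),
# which is why lam1 = lam2 = 0 yields the double-log model.
def box_cox(y, lam, eps=1e-8):
    if abs(lam) < eps:
        return math.log(y)
    return (y ** lam - 1.0) / lam

y = 3.0
print(box_cox(y, 1.0))      # linear case: y - 1
print(box_cox(y, 0.0))      # log case
print(box_cox(y, 1e-6))     # nearly log
```

At λ = 1 the transform is just a shift of y (the linear model), while small λ reproduces ln(y) to high accuracy, so the listed special cases all fall out of one formula.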
Next, we test for structural change using our standard F test, where the hypotheses are

    H0: Rβ = q (or β_1 = β_2)
    HA: Rβ ≠ q (or β_1 ≠ β_2),

with R = [I_k, −I_k] and q = 0. The standard F statistic is

    F = (Rb − q)'[s² R(X'X)⁻¹R']⁻¹(Rb − q)/k ~ F(k, n − 2k).

4.1.1 Notes

1. This test is often referred to as a Chow test, after Gregory Chow (1960).
2. The test can be performed using the unrestricted and restricted residuals as well. It is also possible to test for a break in only the slopes or only the intercepts.
3. Sometimes the error variances may not be equal on each side of the break. In this case, the appropriate test is of the Wald type,

       W = (b_1 − b_2)'(V_1 + V_2)⁻¹(b_1 − b_2),

   which is asymptotically distributed chi-square with k degrees of freedom. V_1 and V_2 refer to the asymptotic variance-covariance matrices of b_1 and b_2, respectively.

4.2 A Test with Unknown Break Points

The Chow test above requires that the user know the break point with certainty. Often structural change is more subtle and gradual, with an unknown break point. The CUSUM test below is designed for this type of situation. Begin by defining the recursive residuals (t = k+1, …, T) as

    e_t = y_t − x_t'b_{t−1},

where b_{t−1} = (X_{t−1}'X_{t−1})⁻¹X_{t−1}'Y_{t−1} and X_{t−1} is the (t−1) × k matrix of regressors including data from 1, …, t−1. The variance of the recursive residuals is

    var(e_t) = var(y_t − x_t'b_{t−1}) = σ² + σ² x_t'(X_{t−1}'X_{t−1})⁻¹x_t = σ²[1 + x_t'(X_{t−1}'X_{t−1})⁻¹x_t].

Now define

    w_t = e_t / √(1 + x_t'(X_{t−1}'X_{t−1})⁻¹x_t),

which under constant parameters is distributed iid N(0, σ²). The CUSUM test is based on the cumulative sum of the w_t:

    W_t = Σ_{r=k+1}^{t} (w_r/σ̂),

where σ̂² = Σ_{r=k+1}^{T} (w_r − w̄)²/(T − k − 1) and w̄ = Σ_{r=k+1}^{T} w_r/(T − k). The CUSUM statistic can then be graphed over time, with the confidence bands connecting the points (k, ±a√(T − k)) and (T, ±3a√(T − k)). For a 95% (99%) confidence interval, a = 0.948 (a = 1.143).

4.2.1 Notes

1. It is also possible to graph the recursive residuals themselves, or the sum of the squared recursive residuals (the CUSUMSQ test).
2. Other tests are available for finding unknown structural break points, such as the Hansen (1992) test and the Andrews (1993) test.
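A split-sample (Chow-style) F test from section 4.1 can be sketched using the equivalent restricted/unrestricted residual form, here for a one-regressor model with intercept (k = 2); the break point, slopes, and sample sizes are illustrative:

```python
import random

# Chow-style F test: compare the pooled (restricted) SSR against the sum of
# the two split-sample (unrestricted) SSRs.
def ols_ssr(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

random.seed(2)
x = [i / 10 for i in range(200)]
# the slope shifts from 1.0 to 2.0 halfway through the sample
y = [(1.0 if i < 100 else 2.0) * xi + random.gauss(0.0, 1.0)
     for i, xi in enumerate(x)]

k, n = 2, len(x)
ssr_r = ols_ssr(x, y)                                           # one line fit
ssr_u = ols_ssr(x[:100], y[:100]) + ols_ssr(x[100:], y[100:])   # break allowed
F = ((ssr_r - ssr_u) / k) / (ssr_u / (n - 2 * k))
print(F)   # a large F rejects parameter stability
```

Because the pooled fit is a restricted version of the split fit, ssr_r ≥ ssr_u always, and a genuine break drives the F statistic far above the F(k, n − 2k) critical value.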
ECON 5340 Class Notes

Review of Statistical Inference

1 Samples and Sampling Distributions

Definition: We say X_1, …, X_n is a random sample of size n if each X_i is drawn independently from the same pdf f(x_i; θ).

Notes:
1. {X_i} is sometimes said to be an independent and identically distributed (iid) random sample.
2. θ is a vector of parameters (e.g., θ = (μ, σ²)).
3. Three data types: time series, cross-sectional, and panel.

1.1 Descriptive Statistics

Definition: A function of one or more random variables that does not depend on any unknown parameters is a statistic.

1. Measures of central tendency:
   - Mean: X̄ = (1/n) Σ_{i=1}^{n} X_i.
   - Median: Let Y_1, …, Y_n be the reordering of X_1, …, X_n from smallest to largest; Y_i is called the ith order statistic of X_1, …, X_n. The median is the middle order statistic, Y_{(n+1)/2}.
   - Mode: the most frequent X_i.
2. Measures of dispersion: s² = (1/(n−1)) Σ (X_i − X̄)² and σ̂² = (1/n) Σ (X_i − X̄)².
3. Measures of association:
   - Covariance: s_{xy} = (1/(n−1)) Σ (X_i − X̄)(Y_i − Ȳ).
   - Correlation: r_{xy} = s_{xy}/(s_x s_y), where −1 ≤ r_{xy} ≤ 1.

1.2 Sampling Distribution

Definition: A statistic (e.g., Y_1, X̄, or s_{xy}) is a random variable with a distribution called a sampling distribution.

Example: If X_1, …, X_n are a random sample with mean μ and variance σ², then X̄ is a random variable with a sampling distribution that has mean μ and variance σ²/n.

Proof:
1. E[X̄] = E[(1/n) Σ X_i] = (1/n) Σ E[X_i] = μ.
2. var(X̄) = (1/n²) var(Σ X_i) = (1/n²)(n σ²) = σ²/n.

See Gauss example S1 for the sampling distributions of X̄, where X_i ~ N(0, 1) with n = 3, 10, 100.

2 Finite-Sample Estimation

Definition: An estimator is a rule for using the sample data to form either a point (i.e., single-value) or interval (i.e., range-of-values) estimate.

2.1 Estimation Criteria

1. Unbiasedness: An estimator θ̂ is unbiased if E[θ̂] = θ. Examples:
   - X̄ is an unbiased estimator of μ.
   - The statistic Z = X̄ + 1000 if a coin lands heads, Z = X̄ − 1000 if it lands tails, is an unbiased estimator of μ.
2. Efficient unbiasedness: An unbiased estimator θ̂_1 is efficient if there is no other unbiased estimator θ̂_i with var(θ̂_i) < var(θ̂_1), i ≠ 1. Example (continued): var(X̄) = σ²/n, while var(Z) = 0.5 E[(X̄ + 1000 − μ)²] + 0.5 E[(X̄ − 1000 − μ)²] = σ²/n + 1000², so X̄ is preferred.
3. Mean-square error: The mean-square error of θ̂ is MSE(θ̂) = E[(θ̂ − θ)²] = var(θ̂) + [bias(θ̂)]².

Notes:
1. Given some regularity conditions, var(θ̂) will never be smaller than the Cramer-Rao lower bound.
2. A minimum-variance unbiased estimator (MVUE) is an efficient unbiased estimator among all linear and nonlinear estimators.
3. A minimum-variance linear unbiased estimator, sometimes called the best linear unbiased estimator (BLUE), is an efficient estimator among all linear estimators.
4. Attaining the Cramer-Rao lower bound implies efficiency. However, efficiency does not require attaining the Cramer-Rao lower bound.
5. A linear estimator is one that is a linear function of the data.

2.2 s² versus σ̂²

Which is a better estimator?

- Is s² unbiased?

      E[s²] = E[(1/(n−1)) Σ (X_i − X̄)²]
            = (1/(n−1)) E[Σ X_i² − n X̄²]
            = (1/(n−1)) [n(σ² + μ²) − n(σ²/n + μ²)]
            = (1/(n−1)) (n − 1)σ² = σ².

  Yes, s² is an unbiased estimator of σ².
- Is σ̂² unbiased? No: E[σ̂²] = ((n−1)/n)σ² ≠ σ², so σ̂² is not an unbiased estimator of σ². However, the bias clearly shrinks as n grows.
- What is the variance of s²? For a normal sample, var(s²) = MSE(s²) = 2σ⁴/(n − 1).
- What is the variance of σ̂²?

      var(σ̂²) = var(((n−1)/n) s²) = ((n−1)/n)² var(s²) < var(s²).

- Which estimator has a smaller MSE?

      MSE(σ̂²) = var(σ̂²) + [bias(σ̂²)]²
               = ((n−1)/n)² (2σ⁴/(n−1)) + (σ²/n)²
               = [2(n−1)σ⁴ + σ⁴]/n² = (2n − 1)σ⁴/n²
               < 2σ⁴/(n−1) = MSE(s²).

  Therefore, σ̂² has a smaller MSE than s².

3 Large-Sample Distribution Theory

Large-sample distribution theory is important because the small-sample distributions of random variables are often unknown.

3.1 Convergence in Probability

Definition: Let X_n be a random variable whose distribution depends on n. We say X_n converges in probability to c (or plim X_n = c) if lim_{n→∞} Pr(|X_n − c| > ε) = 0 for every ε > 0. If X_n has mean μ_n and variance σ_n² with limits μ_n → c and σ_n² → 0, then X_n converges in mean square to c.

To calculate plim X_n we will use Markov's inequality: Pr(u(X) ≥ δ) ≤ E[u(X)]/δ for all δ > 0 and u(X) ≥ 0. If we let u(X) = (X − μ)² and δ = k²σ², we get

    Pr((X − μ)² ≥ k²σ²) ≤ 1/k²,

which is Chebyshev's inequality.

Notes:
1. θ̂ is a consistent estimator of θ iff plim θ̂ = θ.
2. Convergence in mean square implies convergence in probability; convergence in probability does not imply convergence in mean square.
3. Slutsky's theorem: If g(X_n) is a continuous function not involving n, then plim g(X_n) = g(plim X_n). For example, plim(X̄_n²) = (plim X̄_n)².
4. Jensen's inequality: If g(X_n) is concave in X_n, then g(E[X_n]) ≥ E[g(X_n)].
5. Using Slutsky's theorem, where plim X_n = c and plim Y_n = d:
   (a) plim(X_n + Y_n) = c + d;
   (b) plim(X_n Y_n) = cd;
   (c) plim(X_n/Y_n) = c/d, d ≠ 0.
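The MSE comparison in section 2.2 can be checked directly from the two closed-form (normal-sample) expressions:

```python
# MSE(s^2) = 2*sigma^4/(n-1) and MSE(sigma-hat^2) = (2n-1)*sigma^4/n^2,
# as derived above; the biased estimator wins for every n > 1.
def mse_s2(n, sigma2):
    return 2.0 * sigma2 ** 2 / (n - 1)

def mse_sigma_hat2(n, sigma2):
    return (2.0 * n - 1.0) * sigma2 ** 2 / n ** 2

for n in (2, 5, 25, 100):
    assert mse_sigma_hat2(n, 1.0) < mse_s2(n, 1.0)
print(mse_s2(25, 1.0), mse_sigma_hat2(25, 1.0))
```

Algebraically, (2n − 1)/n² < 2/(n − 1) reduces to −3n + 1 < 0, which holds for all n ≥ 1, so the inequality is not an artifact of particular sample sizes.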
Example 1: Consider the pdf

    Pr(X_n = 0) = 1 − 1/n,  Pr(X_n = n) = 1/n.

Find what, if anything, X_n converges to in probability and in mean square.

Answer: Begin by finding the mean and variance of X_n:

    E[X_n] = 0·(1 − 1/n) + n·(1/n) = 1,
    var(X_n) = E[X_n²] − (E[X_n])² = n²·(1/n) − 1 = n − 1.

- Convergence in mean square: lim μ_n = 1 but lim σ_n² = ∞, so X_n does not converge in mean square.
- Convergence in probability: lim_{n→∞} Pr(X_n = 0) = lim (1 − 1/n) = 1, so lim Pr(|X_n| ≥ ε) = 0 for every ε > 0. Therefore plim X_n = 0.

This example shows that convergence in probability does not imply convergence in mean square.

3.2 Convergence in Distribution

Definition: X_n is said to converge in distribution to X if lim_{n→∞} F_n(x) = F(x) at all continuity points of F(x).

Notes:
1. Convergence in distribution is written X_n →d X.
2. F(x) is the limiting distribution of X_n.
3. The mean and variance of F(x) are called the limiting mean and limiting variance.

Rules: When X_n →d X and plim Y_n = c:
(a) X_n + Y_n →d X + c, X_n Y_n →d cX, and X_n/Y_n →d X/c (c ≠ 0);
(b) if g(·) is a continuous function, g(X_n) →d g(X);
(c) if plim(X_n − Y_n) = 0, then Y_n →d X, provided a limiting distribution for X_n exists.

Example: The pdf of the nth order statistic from the random sample X_1, …, X_n, where f(x; θ) = 1/θ for 0 < x ≤ θ and zero elsewhere, is

    g_n(y_n) = n y_n^{n−1}/θ^n, 0 < y_n ≤ θ,

and zero elsewhere. Find the limiting distribution.

Answer: First we need to find G_n(y):

    G_n(y) = ∫_0^y n z^{n−1}/θ^n dz = (y/θ)^n, 0 < y < θ;  G_n(y) = 1, y ≥ θ.

Now the limiting distribution is

    lim_{n→∞} G_n(y) = 0, 0 < y < θ;  1, y ≥ θ.

Therefore the limiting distribution is a degenerate pdf with all the mass at Y = θ.

3.3 Central Limit Theorem

Question: What is the limiting distribution of X̄_n? Answer: a spike at μ. So consider instead a stabilizing transformation of X̄_n, Y_n = √n(X̄_n − μ).

Definition (central limit theorem): Let X_1, …, X_n denote a random sample from any distribution with finite mean μ and finite variance σ². Then

    √n (X̄_n − μ)/σ →d N(0, 1).

Gauss example S2 shows the CLT in action.
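The CLT can be illustrated with a deliberately skewed parent distribution; a minimal sketch (sample size, replication count, and the exponential parent are illustrative choices):

```python
import random, math

# Standardized means of exponential(1) samples (mu = sigma = 1) should be
# approximately N(0,1): about 95% of draws fall within +/-1.96.
random.seed(8)
n, reps, mu, sigma = 50, 4_000, 1.0, 1.0

inside = 0
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z = math.sqrt(n) * (xbar - mu) / sigma
    if abs(z) <= 1.96:
        inside += 1
print(inside / reps)   # should be near 0.95
```

Even though the exponential is strongly right-skewed, the coverage of the normal ±1.96 band is already close to 95% at n = 50.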
3.4 Asymptotic Distributions

Definition: An asymptotic distribution is used to approximate a true (and possibly unknown) finite-sample distribution.

Notes:
1. The mean and variance of an asymptotic distribution are called the asymptotic mean and asymptotic variance.
2. θ̂ is said to be asymptotically efficient if asy. var(θ̂) is less than or equal to the asymptotic variance of any other consistent estimator.
3. Occasionally you will hear the term asymptotically unbiased: lim_{n→∞} E[θ̂_n] = θ.

Example 1: Consider the random variable √n(X̄_n − μ)/σ →d N(0, 1). We say that X̄_n is asymptotically distributed N(μ, σ²/n).

Example 2: Find the asymptotic distribution of Z_n = n(1 − Y_n), where Y_n is the nth order statistic from a uniform(0, 1) random sample X_1, …, X_n.

Answer: Start by finding the limiting distribution of Y_n:

    G_n(y_n) = y_n^n → 0 for 0 ≤ y_n < 1;  1 at y_n = 1.

Therefore Y_n has a degenerate limiting pdf with all the mass at Y_n = 1. The pdf of Z_n can be found by the change-of-variable technique:

    h_n(z_n) = (1 − z_n/n)^{n−1}, 0 < z_n < n,

and zero elsewhere. The cdf of Z_n is

    H_n(z) = 0, z < 0;
    H_n(z) = ∫_0^z (1 − w/n)^{n−1} dw = 1 − (1 − z/n)^n, 0 ≤ z < n;
    H_n(z) = 1, z ≥ n,

and its limiting distribution is

    lim_{n→∞} H_n(z) = 0, z < 0;  1 − e^{−z}, 0 ≤ z < ∞.

Therefore Z_n is asymptotically exponential(1).

4 Maximum Likelihood Estimation

Example 1: Consider the random sample X_1 = 0.5, X_2 = 2.0, X_3 = 1.0, X_4 = 1.5, X_5 = 7.0 generated from an exponential distribution. What is the maximum likelihood (ML) estimate of θ?

Answer: Begin by forming the likelihood function,

    L(θ) = f(x_1, x_2, x_3, x_4, x_5; θ) = ∏_{i=1}^{5} f(x_i; θ) = ∏_{i=1}^{5} θ e^{−θx_i} = θ⁵ exp(−θ Σ_{i=1}^{5} x_i),

where θ = 1/β is the rate parameter. It is often more convenient to work with the monotonic transformation

    ln L(θ) = 5 ln θ − θ (x_1 + x_2 + x_3 + x_4 + x_5).

The ML estimator θ̂ is the value of θ that maximizes L(θ), or equivalently ln L(θ). Now we calculate

    d ln L(θ)/dθ = 5/θ − Σ x_i = 0  ⇒  θ̂ = 5/Σ x_i = 5/12 ≈ 0.42.

Next we check the second-order condition to ensure that θ̂ is indeed a maximum:

    d² ln L(θ)/dθ² = −5/θ² < 0.

Therefore θ̂ ≈ 0.42 is the maximum likelihood estimate of θ. See Gauss example S3 for further details.

Notes:
1. The information number is I(θ) = −E[∂² ln L(θ)/∂θ²] = E[(∂ ln L(θ)/∂θ)²].
2. The information matrix is I(θ) = −E[∂² ln L(θ)/∂θ∂θ'], where θ = (θ_1, …, θ_k)' is a k × 1 column vector. The Cramer-Rao lower bound, I(θ)⁻¹, is the lowest value the variance of an unbiased estimator θ̂ can attain, given that certain regularity conditions are satisfied.
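The exponential ML estimate from Example 1 is a one-liner, θ̂ = n/Σx_i, computed here on the five observations of the example:

```python
# ML estimate for the exponential rate: theta-hat = n / sum(x).
x = [0.5, 2.0, 1.0, 1.5, 7.0]
theta_hat = len(x) / sum(x)    # 5/12, approximately 0.42
print(round(theta_hat, 2))
```

This matches the first-order condition 5/θ − Σx_i = 0 solved analytically above.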
Example 2: Find the ML estimators for μ and σ² from a normal distribution.

Let X_1, …, X_n be a random sample from N(μ, σ²):

    L(μ, σ²) = ∏_{i=1}^{n} (2πσ²)^{−1/2} exp(−(x_i − μ)²/(2σ²)).

Taking natural logs,

    ln L(μ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^{n} (x_i − μ)².

First take partial derivatives with respect to μ and σ²:

    ∂ ln L/∂μ = (1/σ²) Σ (x_i − μ)
    ∂ ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ (x_i − μ)²
    ∂² ln L/∂μ² = −n/σ²
    ∂² ln L/∂μ∂σ² = −(1/σ⁴) Σ (x_i − μ)
    ∂² ln L/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) Σ (x_i − μ)².

Now set the first derivatives equal to zero and solve for the ML estimators:

    μ̂_ML = (1/n) Σ x_i = X̄,  σ̂²_ML = (1/n) Σ (x_i − X̄)².

Cramer-Rao lower bound (θ = (μ, σ²)): the information matrix is

    I(θ) = [ n/σ²   0        ]
           [ 0      n/(2σ⁴)  ],

and the CRLB is

    I(θ)⁻¹ = [ σ²/n   0       ]
             [ 0      2σ⁴/n   ].

Question: Are X̄, s², and σ̂² efficient estimators?

Answer: Recall that E[X̄] = μ, E[s²] = σ², and E[σ̂²] = ((n−1)/n)σ².

- var(X̄) = σ²/n attains the CRLB, so X̄ is a minimum-variance unbiased estimator.
- var(s²) = 2σ⁴/(n−1) > 2σ⁴/n, so s² does not attain the CRLB; it may or may not be efficient.
- E[σ̂²] ≠ σ², but asy. var(σ̂²) = 2σ⁴/n, so σ̂² is asymptotically efficient.

Properties of ML estimators (under regularity conditions):
1. Consistency: plim θ̂_ML = θ.
2. Asymptotic normality: θ̂_ML is asymptotically N(θ, I(θ)⁻¹).
3. θ̂_ML achieves the CRLB asymptotically and is therefore asymptotically efficient.
4. Invariance: if γ = g(θ), then γ̂_ML = g(θ̂_ML).

Notes: The asymptotic covariance matrix of θ̂_ML is often hard or impossible to estimate. Three possible asymptotically equivalent estimators are:
1. I(θ̂_ML)⁻¹, which is often not feasible.
2. [−∂² ln L(θ̂_ML)/∂θ∂θ']⁻¹, which is sometimes quite complicated.
3. The BHHH estimator, [Σ_{i=1}^{n} (∂ ln f(x_i; θ̂)/∂θ)(∂ ln f(x_i; θ̂)/∂θ)']⁻¹.

5 Method of Moments

Definition: Let X_1, …, X_n be a random sample from f(x; θ_1, …, θ_r). Let M_k = (1/n) Σ X_i^k be the kth sample moment and E[X^k] the kth population (uncentered) moment. The method-of-moments estimator of θ = (θ_1, …, θ_r) is obtained by solving the r equations

    M_k = E[X^k; θ], k = 1, …, r.

Notes:
1. Method of moments may also use E[(X − μ)^k] or other functions γ_k(θ) of the unknown parameters.
2. Method-of-moments estimators are NOT typically efficient.
3. Method-of-moments estimators are typically consistent by virtue of Slutsky's theorem.
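As a quick numerical check of the recipe: for a gamma(α, β) sample, E[X] = αβ and var(X) = αβ², so matching the first two moments gives α̂ = X̄²/σ̂² and β̂ = σ̂²/X̄. A minimal sketch (true parameter values and sample size are illustrative):

```python
import random

# Method-of-moments estimates for gamma(alpha, beta) from the first two
# sample moments: alpha-hat = xbar^2 / s2, beta-hat = s2 / xbar.
random.seed(6)
alpha, beta, n = 2.0, 3.0, 200_000
x = [random.gammavariate(alpha, beta) for _ in range(n)]

xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / n
alpha_hat, beta_hat = xbar ** 2 / s2, s2 / xbar
print(alpha_hat, beta_hat)
```

Consistency (note 3 above) shows up clearly: with a large sample, both estimates land close to the true values, even though they are not efficient.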
Example: Suppose X_1, …, X_n is a random sample from a gamma(α, β) distribution. The likelihood function,

    L(α, β) = [β^{nα} Γ(α)^n]⁻¹ (∏ x_i)^{α−1} exp(−Σ x_i/β),

is difficult to maximize without using numerical methods. Consider instead the following two moments:

    M_1 = X̄ = αβ,  M_2 − M_1² = σ̂² = αβ².

Using these two sample moment equations to solve for α̂ and β̂ gives α̂ = X̄²/σ̂² and β̂ = σ̂²/X̄.

5.1 Variance of the Method-of-Moments Estimator

Let the sample moments be ḡ_k = (1/n) Σ_{i=1}^{n} g_k(X_i) for k = 1, …, K, and let ḡ = (ḡ_1, …, ḡ_K)' have asymptotic covariance matrix V with elements

    V_{jk} = (1/n) E{[g_j(X_i) − γ_j][g_k(X_i) − γ_k]}, for j, k = 1, …, K.

Now let G be the matrix of derivatives,

    G = [ ∂γ_1/∂θ_1  ⋯  ∂γ_1/∂θ_K ]
        [ ⋮                ⋮      ]
        [ ∂γ_K/∂θ_1  ⋯  ∂γ_K/∂θ_K ].

Consider the first-order Taylor approximation of ḡ around γ(θ):

    ḡ − γ(θ) ≈ G(θ̂ − θ)  ⇒  θ̂ − θ ≈ G⁻¹[ḡ − γ(θ)].

Using the CLT, we know

    θ̂ is asymptotically N(θ, G⁻¹V(G⁻¹)').

Example (continued): Let θ = (α, β), g_1 = X̄ with γ_1 = αβ, and g_2 = (1/n)ΣX_i² − X̄² with γ_2 = αβ². Then

    G = [ β    α   ]
        [ β²   2αβ ],

and the estimated asy. var(θ̂) is Ĝ⁻¹V̂(Ĝ⁻¹)'.

6 Interval Estimation

Definition: An interval estimate is found by algebraic manipulation of a pivotal quantity — a quantity based on the point estimate and the parameter — subject to a desired confidence coefficient.

Example 1: Find the 90% interval estimate for μ from a random N(μ, σ²) sample with n = 25 and X̄ = 50, where (i) σ² = 100 is known, and (ii) s² = 100.

Answer:
(i) We know that Z = (X̄ − μ)/(σ/√n) ~ N(0, 1). This implies

    Pr(−z ≤ (X̄ − μ)/(σ/√n) ≤ z) = 0.90
    ⇒ Pr(X̄ − z σ/√n ≤ μ ≤ X̄ + z σ/√n) = 0.90.

From Table 1 in Greene, z = 1.645. Therefore the 90% confidence interval for μ is 50 ± 1.645(10/5) = (46.71, 53.29).

(ii) We know that t = (X̄ − μ)/(s/√n) ~ t(n − 1). From Table 2 in Greene, t = 1.711. Therefore the 90% confidence interval for μ is 50 ± 1.711(2) = (46.58, 53.42).

Example 2: Find the confidence coefficient for Pr(1 ≤ σ² ≤ 2) if s² = 1.5 from a normally distributed random sample with n = 25.

Answer: To solve this problem, we need to recognize that (n − 1)s²/σ² ~ χ²(n − 1) and use Table 3 in Greene. Since (n − 1)s² = 24(1.5) = 36,

    Pr(1 ≤ σ² ≤ 2) = Pr(18 ≤ (n − 1)s²/σ² ≤ 36) = Pr(18 ≤ χ²(24) ≤ 36).
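The known-variance interval from Example 1(i) is a two-line computation:

```python
import math

# 90% interval for mu: xbar +/- z * sigma/sqrt(n), with z = 1.645.
n, xbar, sigma2, z = 25, 50.0, 100.0, 1.645
half = z * math.sqrt(sigma2 / n)          # 1.645 * (10/5) = 3.29
lo, hi = xbar - half, xbar + half
print(round(lo, 2), round(hi, 2))         # (46.71, 53.29), as in the notes
```

Swapping z = 1.645 for the t(24) critical value 1.711 reproduces the slightly wider interval of part (ii).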
$H_1$ hypotheses.
2. Determine the size of the critical region.
3. State the decision rule.
4. Calculate the statistic.
5. Make a decision (i.e., reject or fail to reject the null).
6. Consider possible errors.

7.1 Concepts

1. Type I Error: Reject a true null hypothesis. The probability of a type I error is called the size of the test.
2. Type II Error: Fail to reject a false null hypothesis. One minus the probability of a type II error is called the power of the test.
3. Power Function: The power function yields the probability that the sample point falls in the critical region, given that the true value of $\theta$ is not $\theta_0$.
4. Best Tests: Assuming a simple alternative, $C$ is the best critical region of size $\alpha$ for testing $H_0: \theta = \theta'$ versus $H_1: \theta = \theta''$ if, for every region $A$ such that $\Pr[(X_1,\dots,X_n) \in A] = \alpha$:
(i) $\Pr[(X_1,\dots,X_n) \in C\,|\,H_0] = \alpha$, and
(ii) $\Pr[(X_1,\dots,X_n) \in C\,|\,H_1] \ge \Pr[(X_1,\dots,X_n) \in A\,|\,H_1]$.
5. Uniformly Most Powerful Tests: Assuming a composite alternative, a test is uniformly most powerful if $C$ is the best critical region of size $\alpha$ for testing each simple hypothesis in $H_1$. In other words, its power function is no less than that of any other test of equal size.

7.2 Tests Based on Confidence Intervals

Consider the following test:
1. Reject $H_0: \theta = \theta_0$ if $\theta_0$ falls outside $(\hat{\theta}_L,\ \hat{\theta}_U)$.
2. Fail to reject $H_0: \theta = \theta_0$ if $\theta_0$ falls inside $(\hat{\theta}_L,\ \hat{\theta}_U)$.

Example 1(i) (continued from Section 6): Consider the test $H_0: \mu = 48$ versus $H_1: \mu \ne 48$. The 90% confidence interval is $(46.71,\ 53.29)$. The decision rule is: reject $H_0$ if $\mu_0 \le 46.71$ or $\mu_0 \ge 53.29$; fail to reject $H_0$ if $46.71 < \mu_0 < 53.29$. Therefore, we fail to reject the hypothesis $H_0: \mu = 48$.

7.3 Likelihood Ratio, Wald and Lagrange Multiplier Tests

The likelihood ratio (LR), Wald and Lagrange multiplier (LM) tests are asymptotically equivalent tests that may produce different results in small samples. When no other information exists, you can choose the test that is easiest to compute. See the attached figure for a graphical representation of each test.

7.3.1 Likelihood Ratio Test

Let $(\hat{\theta}_R,\ \hat{L}_R)$ and $(\hat{\theta}_U,\ \hat{L}_U)$ be the restricted and unrestricted estimates and likelihood values,
respectively. Let the null and alternative hypotheses be $H_0: c(\theta) = q$ and $H_1: c(\theta) \ne q$. The likelihood ratio is defined as $\lambda = \hat{L}_R/\hat{L}_U$, where $0 \le \lambda \le 1$. The LR statistic is then

$$LR = -2\ln\lambda \overset{a}{\sim} \chi^2(r)$$

where $r$ is the number of restrictions imposed.

7.3.2 Wald Test

In the LR test, one needs to calculate both $\hat{L}_U$ and $\hat{L}_R$. An advantage of the Wald test is that $\hat{\theta}_R$ does not need to be calculated. The Wald statistic is

$$W = [c(\hat{\theta}_U) - q]'\{\mathrm{var}[c(\hat{\theta}_U) - q]\}^{-1}[c(\hat{\theta}_U) - q] \overset{a}{\sim} \chi^2(r).$$

If $c(\hat{\theta}_U)$ is normally distributed, then $W$ is a quadratic form in a normal vector and is distributed chi-square for all sample sizes.

Notes:
1. Let $C = \partial c(\theta)/\partial\theta'$. Because $c(\cdot)$ is often nonlinear, $\mathrm{var}[c(\hat{\theta}) - q]$ can be approximated by $\hat{C}\,\mathrm{var}(\hat{\theta})\,\hat{C}'$.
2. The power may be low because the alternative does not appear in the computations.
3. The Wald test is not invariant to the form of the restriction (e.g., $H_0: \theta_1/\theta_2 = q$ versus $H_0: \theta_1 = q\theta_2$).
4. The Wald test does not rely on strong distributional assumptions like the LR or LM tests.

7.3.3 Lagrange Multiplier Test

This test is based on the restricted model. Derivation: begin by forming the Lagrangian

$$\ln L^* = \ln L(\theta) + \lambda'[c(\theta) - q].$$

The first-order conditions are

$$\frac{\partial\ln L^*}{\partial\theta} = \frac{\partial\ln L(\theta)}{\partial\theta} + \left[\frac{\partial c(\theta)}{\partial\theta'}\right]'\lambda = 0, \qquad \frac{\partial\ln L^*}{\partial\lambda} = c(\theta) - q = 0.$$

At the restricted estimate $\hat{\theta}_R$, if $H_0: c(\theta) = q$ is correct, the constraint is not binding, $\hat{\lambda} \approx 0$, and therefore $\partial\ln L(\hat{\theta}_R)/\partial\theta \approx 0$. This fact is used as motivation for

$$LM = \left[\frac{\partial\ln L(\hat{\theta}_R)}{\partial\theta}\right]'[I(\hat{\theta}_R)]^{-1}\left[\frac{\partial\ln L(\hat{\theta}_R)}{\partial\theta}\right] \overset{a}{\sim} \chi^2(r).$$

7.3.4 An Example Using the LR, W and LM Tests

Consider an artificial random sample ($n = 100$) from an exponential($\theta$) distribution, $f(x) = \theta\exp(-\theta x)$, so that $E(X) = 1/\theta$. The log-likelihood function is

$$\ln L(\theta) = n\ln\theta - \theta\sum_{i=1}^n x_i.$$

The first-order condition and unrestricted ML estimator are

$$\frac{\partial\ln L(\theta)}{\partial\theta} = \frac{n}{\theta} - \sum_{i=1}^n x_i = 0 \;\Rightarrow\; \hat{\theta}_U = \bar{X}^{-1}.$$

The second-order condition is

$$\frac{\partial^2\ln L(\theta)}{\partial\theta^2} = -\frac{n}{\theta^2} < 0,$$

so $\hat{\theta}_U$ is indeed a maximum. Now consider testing the hypothesis $H_0: \theta = 0.75$ versus $H_1: \theta \ne 0.75$, so that $\hat{\theta}_R = 0.75$.

1. Likelihood Ratio Test. The likelihood values are

$$\hat{L}_U = \hat{\theta}_U^{100}\exp\left(-\hat{\theta}_U\sum_{i=1}^n x_i\right), \qquad \hat{L}_R = \hat{\theta}_R^{100}\exp\left(-\hat{\theta}_R\sum_{i=1}^n x_i\right),$$

and the LR statistic is $LR = -2\ln(\hat{L}_R/\hat{L}_U)$.

2. Wald Test. The Wald statistic is

$$W = \frac{(\hat{\theta}_U - 0.75)^2}{\mathrm{var}(\hat{\theta}_U)}, \qquad \mathrm{var}(\hat{\theta}_U) = [I(\hat{\theta}_U)]^{-1} = \frac{\hat{\theta}_U^2}{n}.$$

3. Lagrange Multiplier Test. The LM statistic is

$$LM = \frac{[\partial\ln L(\hat{\theta}_R)/\partial\theta]^2}{I(\hat{\theta}_R)}$$

where $\partial\ln L(\hat{\theta}_R)/\partial\theta = n/\hat{\theta}_R - \sum_{i=1}^n x_i$ and $I(\hat{\theta}_R) = n/\hat{\theta}_R^2$.
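The three statistics above can be computed directly. Here is a minimal Python sketch using simulated data; the seed, the true rate of 0.8 and the resulting statistics are illustrative assumptions, not the values from the Gauss example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.exponential(scale=1.0 / 0.8, size=n)  # f(x) = theta*exp(-theta*x), true theta = 0.8

S = x.sum()
theta_u = n / S        # unrestricted MLE, 1/Xbar
theta_r = 0.75         # restricted value under H0

def loglik(th):
    # ln L(theta) = n*ln(theta) - theta * sum(x)
    return n * np.log(th) - th * S

# LR: twice the log-likelihood gap between unrestricted and restricted fits
LR = -2.0 * (loglik(theta_r) - loglik(theta_u))

# Wald: squared distance of theta_u from theta_r, scaled by var(theta_u) = theta_u^2/n
W = (theta_u - theta_r) ** 2 / (theta_u ** 2 / n)

# LM: squared score at theta_r, scaled by the information I(theta_r) = n/theta_r^2
score_r = n / theta_r - S
LM = score_r ** 2 / (n / theta_r ** 2)

crit = 3.84  # chi-square(1) critical value at the 5% level
print(LR, W, LM, LR > crit)
```

All three statistics are nonnegative and, by their asymptotic equivalence, will typically be close to one another in a sample of this size.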
Finally, the critical region is defined by the chi-square critical value with $r = 1$ degree of freedom and a 95% confidence level. Using Table 3 in Greene, the critical value is 3.84. Therefore:

- If LR, W or LM is greater than 3.84, we reject the null $H_0: \theta = 0.75$ in favor of the alternative.
- If LR, W or LM is less than or equal to 3.84, we fail to reject the null $H_0: \theta = 0.75$.

See Gauss example S4 for further details.

[Figure: the three approaches to testing the hypothesis. The likelihood ratio test compares the log-likelihood values at the restricted and unrestricted estimates; the Wald test measures the distance of the unrestricted estimate from the restriction; the Lagrange multiplier test measures the slope of the log-likelihood function at the restricted estimate.]

ECON 5340 Class Notes: Review of Probability and Distribution Theory

1 Random Variables

Definition: Let $c$ represent an element of the sample space $C$ of a random experiment, $c \in C$. A random variable is a one-to-one function $X = X(c)$. An outcome of $X$ is denoted $x$.

Example: Single coin toss. $C = \{T, H\}$, with $X(c) = 0$ if $c = T$ and $X(c) = 1$ if $c = H$.

1.1 Probability Distribution Function (pdf)

Two types:
1. Discrete pdf: a function such that $f(x) \ge 0\ \forall x$ and $\sum_x f(x) = 1$.
2. Continuous pdf: a function such that $f(x) \ge 0\ \forall x$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Notes:
1. $\Pr(X = x) = f(x)$ in the discrete case, and $\Pr(X = x) = 0$ in the continuous case.
2. $\Pr(a \le X \le b) = \int_a^b f(x)\,dx$.

1.2 Cumulative Distribution Function (cdf)

Two types:
1. Discrete cdf: a function such that $F(x) = \sum_{X \le x} f(X)$.
2. Continuous cdf: a function such that $F(x) = \int_{-\infty}^x f(t)\,dt$.

Notes:
1. $\Pr(a \le X \le b) = F(b) - F(a) = \int_{-\infty}^b f(t)\,dt - \int_{-\infty}^a f(t)\,dt$, where $b \ge a$.
2. $0 \le F(x) \le 1$.
3. $\lim_{x\to-\infty} F(x) = 0$.
4. $\lim_{x\to\infty} F(x) = 1$.
5. If $x > y$, then $F(x) \ge F(y)$.

2 Mathematical Expectations

Consider the continuous case only.

2.1 Mean

Definition: The mean or expected value of $g(X)$ is given by

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$

Notes:
1. $E(X) = \mu = \int x f(x)\,dx$ is called the mean of $X$, or the first moment of the distribution.
2. $E(\cdot)$ is a linear
operator. Let $g(X) = a + bX$:

$$E[g(X)] = \int (a+bx)f(x)\,dx = a\int f(x)\,dx + b\int x f(x)\,dx = a + bE(X) = E(a) + E(bX).$$

3. Other measures of central tendency: median, mode.

2.2 Variance

Definition: The variance of $g(X)$ is given by

$$\mathrm{Var}[g(X)] = E\{[g(X) - E(g(X))]^2\} = \int [g(x) - E(g(x))]^2 f(x)\,dx.$$

Notes:
1. Let $g(X) = X$. We have

$$\mathrm{Var}(X) = \sigma^2 = \int (x-\mu)^2 f(x)\,dx = \int x^2 f(x)\,dx - 2\mu\int x f(x)\,dx + \mu^2\int f(x)\,dx = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.$$

2. $\mathrm{Var}(\cdot)$ is NOT a linear operator. Let $g(X) = a + bX$:

$$\mathrm{Var}[g(X)] = \int [a+bx - (a+b\mu)]^2 f(x)\,dx = b^2\int (x-\mu)^2 f(x)\,dx = b^2\sigma^2.$$

3. $\sigma$ is called the standard deviation of $X$.

2.3 Other Moments

The measure $E(X^r)$ is called the $r$th moment of the distribution, while $E[(X-\mu)^r]$ is called the $r$th central moment.

r = 1: $E(X-\mu) = 0$
r = 2: $E[(X-\mu)^2] = \sigma^2$, the variance (dispersion)
r = 3: $E[(X-\mu)^3]$, skewness (asymmetry)
r = 4: $E[(X-\mu)^4]$, kurtosis (tail thickness)

Moment Generating Function (MGF): The MGF uniquely determines a pdf when it exists and is given by

$$M(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx.$$

The $r$th moment of a distribution is given by $E(X^r) = \frac{d^r M(t)}{dt^r}\Big|_{t=0}$.

2.4 Chebyshev's Inequality

Definition: Let $X$ be a random variable with $\sigma^2 < \infty$. For any $k > 0$,

$$\Pr(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}.$$

Chebyshev's inequality is used to calculate upper and lower bounds on probabilities for a random variable without having to know its exact distribution.

Example: Let $X \sim U(-\sqrt{3}, \sqrt{3})$, with $f(x) = 1/(2\sqrt{3})$ for $-\sqrt{3} < x < \sqrt{3}$ and zero elsewhere, so that $\mu = 0$ and $\sigma = 1$. If we let $k = 3/2$, we get

Chebyshev: $\Pr(-3/2 \le X \le 3/2) \ge 1 - 1/(3/2)^2 = 5/9 \approx 0.556$.
Exact: $\Pr(-3/2 \le X \le 3/2) = \int_{-3/2}^{3/2} \frac{1}{2\sqrt{3}}\,dx = \frac{3}{2\sqrt{3}} = \frac{\sqrt{3}}{2} \approx 0.866$.

3 Specific Probability Distributions

3.1 Normal pdf

If $X$ has a normal distribution, then

$$f(x) = (2\pi\sigma^2)^{-1/2}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty.$$

In shorthand notation, $X \sim N(\mu,\sigma^2)$.

Notes:
1. The normal pdf is symmetric.
2. $Z = (X-\mu)/\sigma \sim N(0,1)$ is called a standardized random variable, and $N(0,1)$ is called the standard normal distribution.
3. Linear transformations of normal random variables are normal: if $Y = a + bX$ where $X \sim N(\mu,\sigma^2)$, then $Y \sim N(a+b\mu,\ b^2\sigma^2)$.

3.2 Chi-square pdf

If $Z_i$, $i = 1,\dots,n$, are independently distributed $N(0,1)$ random variables, then

$$Y = \sum_{i=1}^n Z_i^2 \sim \chi^2(n)$$

where $E(Y) = n$ and $\mathrm{Var}(Y) = 2n$.

Exercise: Find the MGF for $Y = Z^2$ and use it to derive the mean and variance.

Answer: We begin by calculating the MGF for $Z^2$, valid for $t < 0.5$:

$$M(t) = E(e^{tZ^2}) = \int_{-\infty}^{\infty} e^{tz^2}(2\pi)^{-1/2}e^{-0.5z^2}\,dz
= \int_{-\infty}^{\infty} (2\pi)^{-1/2}\exp[-0.5(1-2t)z^2]\,dz.$$

Now using the method of substitution, let $w = (1-2t)^{1/2}z$, so that $dw = (1-2t)^{1/2}dz$. Making the substitution produces

$$M(t) = (1-2t)^{-1/2}\int_{-\infty}^{\infty} (2\pi)^{-1/2}e^{-0.5w^2}\,dw = (1-2t)^{-1/2}.$$

To calculate the mean, we take the first derivative of $M(t)$ and evaluate at $t = 0$:

$$\frac{dM(t)}{dt}\Big|_{t=0} = (1-2t)^{-3/2}\Big|_{t=0} = 1 = E(Z^2).$$

To calculate the variance, we take the second derivative of $M(t)$, evaluate at $t = 0$, and subtract $[E(Z^2)]^2$:

$$\frac{d^2M(t)}{dt^2}\Big|_{t=0} = 3(1-2t)^{-5/2}\Big|_{t=0} = 3 \;\Rightarrow\; \mathrm{Var}(Z^2) = 3 - 1 = 2.$$

3.3 F pdf

If $X_1$ and $X_2$ are independently distributed $\chi^2(n_1)$ and $\chi^2(n_2)$ random variables, then

$$F = \frac{X_1/n_1}{X_2/n_2} \sim F(n_1,\ n_2).$$

3.4 Student's t pdf

If $Z \sim N(0,1)$ and $X \sim \chi^2(n)$ are independent, then

$$T = \frac{Z}{\sqrt{X/n}} \sim t(n).$$

3.5 Lognormal pdf

If $X \sim N(\mu,\sigma^2)$, then $Y = \exp(X)$ has the distribution

$$f(y) = \frac{1}{y\sigma\sqrt{2\pi}}\exp\left[-0.5\left(\frac{\ln y - \mu}{\sigma}\right)^2\right], \qquad y \ge 0.$$

Sometimes this is written as $Y \sim LN(\mu,\sigma^2)$. The mean and variance of $Y$ are $E(Y) = \exp(\mu + \sigma^2/2)$ and $\mathrm{Var}(Y) = \exp(2\mu+\sigma^2)[\exp(\sigma^2)-1]$.

Notes:
1. If $Y_1 \sim LN(\mu_1,\sigma_1^2)$ and $Y_2 \sim LN(\mu_2,\sigma_2^2)$ are independent random variables, then $Y_1Y_2 \sim LN(\mu_1+\mu_2,\ \sigma_1^2+\sigma_2^2)$.

3.6 Gamma pdf

The gamma distribution is given by

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}\exp(-\beta x), \qquad 0 \le x < \infty.$$

The mean and variance are $E(X) = \alpha/\beta$ and $\mathrm{Var}(X) = \alpha/\beta^2$.

Notes:
1. $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1}\exp(-y)\,dy$ is called the gamma function, $\alpha > 0$.
2. $\Gamma(\alpha) = (\alpha-1)!$ if $\alpha$ is a positive integer.
3. Greene uses a different notation for the two parameters.
4. When $\alpha = 1$, you get the exponential pdf.
5. When $\alpha = n/2$ and $\beta = 1/2$, you get the chi-square pdf.

Example: Gamma distributions are sometimes used to model waiting times. Let $W$ be the waiting time until death for a human, with $W \sim \mathrm{Gamma}(\alpha = 1,\ \beta = 1/80)$, so that the expected waiting time until death is 80 years. Note $W \sim \mathrm{Exponential}(1/80)$. Find $\Pr(W \le 30)$:

$$\Pr(W \le 30) = \int_0^{30} \frac{1}{80}\exp(-w/80)\,dw = \left[-\exp(-w/80)\right]_0^{30} = 1 - \exp(-3/8) \approx 1 - 0.687 = 0.313.$$

3.7 Beta pdf

If $X_1$ and $X_2$ are independently distributed gamma random variables, then $Y_1 = X_1 + X_2$ and $Y_2 = X_1/Y_1$ are independently distributed. The marginal distribution $f_2(y_2)$ of $f(y_1,y_2)$ is called the beta pdf,

$$f(y) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{y}{c}\right)^{\alpha-1}\left(1-\frac{y}{c}\right)^{\beta-1}\frac{1}{c}, \qquad 0 \le y \le c.$$

The mean and variance are $E(Y) = \frac{c\alpha}{\alpha+\beta}$ and $\mathrm{Var}(Y) = \frac{c^2\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.

3.8 Logistic pdf

The logistic distribution is

$$f(x) = \Lambda(x)[1-\Lambda(x)], \qquad -\infty < x < \infty,$$

where $\Lambda(x) = [1+\exp(-x)]^{-1}$. The mean and variance are
$E(X) = 0$ and $\mathrm{Var}(X) = \pi^2/3$. A useful property of the logistic distribution is that the cdf has a closed-form solution, $F(x) = \Lambda(x)$.

3.9 Cauchy pdf

If $X_1$ and $X_2$ are independently distributed $N(0,1)$, then $Y = X_1/X_2$ has the Cauchy pdf

$$f(y) = \frac{1}{\pi(1+y^2)}, \qquad -\infty < y < \infty.$$

The mean and the variance of the Cauchy pdf do not exist because the tails are too thick.

3.10 Binomial pdf

The distribution for $x$ successes in $n$ trials is

$$b(n,\alpha,x) = \binom{n}{x}\alpha^x(1-\alpha)^{n-x}, \qquad x = 0,1,\dots,n,\quad 0 \le \alpha \le 1.$$

The mean and variance of the binomial distribution are $E(X) = n\alpha$ and $\mathrm{Var}(X) = n\alpha(1-\alpha)$. The combinatorial formula for the number of ways to choose $x$ objects from a set of $n$ distinct objects is $\binom{n}{x} = \frac{n!}{x!(n-x)!}$.

3.11 Poisson pdf

The Poisson pdf is often used to model the number of changes in a fixed interval:

$$f(x) = \frac{\exp(-\lambda)\lambda^x}{x!}, \qquad x = 0,1,\dots,\quad \lambda > 0.$$

The mean and variance are $E(X) = \mathrm{Var}(X) = \lambda$.

Example: Let the probability of spina bifida in one child be 1/1000. Let $X$ be the number of spina bifida cases in 3000 births. With stochastic independence and $X \sim \mathrm{Poisson}(\lambda = 3000/1000 = 3)$, find the probability that there are five cases in 3000 births:

$$\Pr(X = 5) = \frac{\exp(-3)\,3^5}{5!} \approx 0.101.$$

4 Distributions of Functions of Random Variables

Let $X_1, X_2, \dots, X_n$ have joint pdf $f(x_1,\dots,x_n)$. What is the distribution of $Y = g(X_1, X_2, \dots, X_n)$? To answer this question, we will use the change-of-variable technique.

Change-of-Variable Technique: Let $X_1$ and $X_2$ have joint pdf $f(x_1,x_2)$. Let $Y_1 = g_1(X_1,X_2)$ and $Y_2 = g_2(X_1,X_2)$ be the transformed random variables. If $A$ is the set where $f > 0$, then let $B$ be the set defined by the one-to-one transformation of $A$ to $B$. Then

$$g(y_1,y_2) = f[h_1(y_1,y_2),\ h_2(y_1,y_2)]\cdot|J|, \qquad (y_1,y_2) \in B,$$

where $x_1 = h_1(y_1,y_2)$, $x_2 = h_2(y_1,y_2)$ and $J$ is the Jacobian determinant of the inverse transformation.

Example: Let $X_1$ and $X_2$ be uniformly distributed on $0 \le X_i \le 1$. The random sample $(X_1, X_2)$ is jointly distributed $f(x_1,x_2) = f_1(x_1)f_2(x_2) = 1$ over $0 \le x_1, x_2 \le 1$ and zero elsewhere. Find the joint distribution of $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$.

Answer: We know that $x_1 = h_1(y_1,y_2) = 0.5(y_1+y_2)$ and $x_2 = h_2(y_1,y_2) = 0.5(y_1-y_2)$. We also know that

$$J = \begin{vmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{vmatrix} = -0.5.$$

Therefore,

$$g(y_1,y_2) = f_1[h_1(y_1,y_2)]\,f_2[h_2(y_1,y_2)]\cdot|J| = 0.5, \qquad (y_1,y_2) \in B,$$

and zero elsewhere.

5 Joint Distributions

5.1 Joint
pdfs and cdfs

A joint pdf for $X_1$ and $X_2$ gives $\Pr(X_1 = x_1,\ X_2 = x_2) = f(x_1,x_2)$. A proper joint pdf has the properties $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1,x_2)\,dx_2\,dx_1 = 1$ and $f(x_1,x_2) \ge 0$ for all $x_1$ and $x_2$. A joint cdf for $X_1$ and $X_2$ is

$$\Pr(X_1 \le x_1,\ X_2 \le x_2) = F(x_1,x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f(t_1,t_2)\,dt_2\,dt_1.$$

5.2 Marginal Distributions

The marginal pdf of $X_1$ is found by integrating over all $X_2$,

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1,x_2)\,dx_2,$$

and likewise for $X_2$.

Example: Let $X_1$ and $X_2$ have joint pdf $f(x_1,x_2) = 2$, $0 < x_1 < x_2 < 1$, and zero elsewhere. Is this a proper pdf?

$$\int_0^1\int_{x_1}^1 2\,dx_2\,dx_1 = \int_0^1 2x_2\Big|_{x_1}^1\,dx_1 = \int_0^1 2(1-x_1)\,dx_1 = \left[2x_1 - x_1^2\right]_0^1 = 2 - 1 = 1.$$

So yes, this is a proper pdf. The marginal distribution for $X_1$ is

$$f_1(x_1) = \int_{x_1}^1 2\,dx_2 = 2x_2\Big|_{x_1}^1 = 2(1-x_1), \qquad 0 < x_1 < 1,$$

and zero elsewhere. The marginal distribution for $X_2$ is

$$f_2(x_2) = \int_0^{x_2} 2\,dx_1 = 2x_1\Big|_0^{x_2} = 2x_2, \qquad 0 < x_2 < 1,$$

and zero elsewhere.

Notes:
1. Two random variables are stochastically independent if and only if $f_1(x_1)f_2(x_2) = f(x_1,x_2)$.
2. In our example, $X_1$ and $X_2$ are not independent because $f_1(x_1)f_2(x_2) = 4x_2 - 4x_1x_2 \ne 2 = f(x_1,x_2)$.
3. Moments (e.g., means and variances) in joint distributions are calculated using marginal densities, e.g., $E(X_1) = \int x_1 f_1(x_1)\,dx_1$.

5.3 Covariance and Correlation

Definition: The covariance between $X$ and $Y$ is

$$\mathrm{Cov}(X,Y) = E[(X-\mu_x)(Y-\mu_y)] = E(XY) - \mu_x\mu_y.$$

Definition: The correlation coefficient between $X$ and $Y$ removes the dependence on the units of measurement:

$$\rho = \mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_x\sigma_y}, \qquad -1 \le \rho \le 1.$$

Notes:
1. If $X$ and $Y$ are independent, then $\mathrm{cov}(X,Y) = 0$:

$$E(XY) - \mu_x\mu_y = \int\int xy\,f(x)f(y)\,dy\,dx - \mu_x\mu_y = \int x f(x)\,dx\int y f(y)\,dy - \mu_x\mu_y = \mu_x\mu_y - \mu_x\mu_y = 0.$$

2. However, $\mathrm{cov}(X,Y) = 0$ does not imply stochastic independence. Consider the following joint distribution table (rows are $x$, columns are $y$):

x \ y:      y = -1    y = 0    y = 1    f_x(x)
x = -1        0         0       1/3      1/3
x = 0         0        1/3       0       1/3
x = 1         0         0       1/3      1/3
f_y(y)        0        1/3      2/3

where $\mu_x = 0$, $\mu_y = 2/3$ and

$$\mathrm{cov}(X,Y) = \sum_x\sum_y (x-\mu_x)(y-\mu_y)f(x,y) = (-1)(1/3)(1/3) + (0)(-2/3)(1/3) + (1)(1/3)(1/3) = 0.$$

However, $X$ and $Y$ are not independent because for $(x,y) = (0,0)$ we have $f_x(0)f_y(0) = 1/9 \ne f(0,0) = 1/3$.

6 Conditional Distributions

Definition: The conditional pdf for $X$ given $Y$ is

$$f(x|y) = \frac{f(x,y)}{f(y)}.$$

Notes:
1. If $X$ and $Y$ are independent, $f(x|y) = f(x)$ and $f(y|x) = f(y)$.
2. The conditional mean is $E(X|Y) = \int x f(x|y)\,dx$.
3. The conditional variance is $\mathrm{Var}(X|Y) = \int (x-\mu_{x|y})^2
f(x|y)\,dx$.

7 Multivariate Distributions

Let $X = (X_1,\dots,X_n)'$ be an $n \times 1$ column vector of random variables. The mean and variance of $X$ are

$$\mu = E(X) = (\mu_1,\dots,\mu_n)'$$

and

$$\Sigma = \mathrm{var}(X) = E[(X-\mu)(X-\mu)'] = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix}.$$

Notes:
1. Let $W = A + BX$. Then $E(W) = A + BE(X)$.
2. The variance of $W$ is

$$\mathrm{Var}(W) = E\{[BX - BE(X)][BX - BE(X)]'\} = B\,E\{[X-E(X)][X-E(X)]'\}\,B' = B\Sigma B'.$$

7.1 Multivariate Normal Distribution

Let $X = (X_1,\dots,X_n)' \sim N(\mu,\Sigma)$. The form of the multivariate normal pdf is

$$f(x) = (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp[-0.5(x-\mu)'\Sigma^{-1}(x-\mu)].$$

See Gauss example P1 for an example of a bivariate normal density function.

7.2 Quadratic Form in a Normal Vector

If $X - \mu$ is a normal vector, then the quadratic form $Q = (X-\mu)'\Sigma^{-1}(X-\mu) \sim \chi^2(n)$.

Proof: The moment generating function of $Q$ is

$$M(t) = E(e^{tQ}) = \int\cdots\int (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp[t(x-\mu)'\Sigma^{-1}(x-\mu) - 0.5(x-\mu)'\Sigma^{-1}(x-\mu)]\,dx$$
$$= \int\cdots\int (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp[-0.5(1-2t)(x-\mu)'\Sigma^{-1}(x-\mu)]\,dx.$$

Next, multiply and divide by $(1-2t)^{-n/2}$:

$$M(t) = (1-2t)^{-n/2}\int\cdots\int (2\pi)^{-n/2}|\Sigma(1-2t)^{-1}|^{-1/2}\exp\{-0.5(x-\mu)'[(1-2t)^{-1}\Sigma]^{-1}(x-\mu)\}\,dx = (1-2t)^{-n/2}, \qquad t < 0.5.$$

The remaining integral is that of a multivariate normal distribution with variance $\Sigma(1-2t)^{-1}$, and so it equals one. $M(t)$ then simplifies to the MGF of a $\chi^2(n)$ random variable.

7.3 A Couple of Important Theorems

1. Let $X \sim N(0,I)$ and $A^2 = A$ (i.e., $A$ is idempotent). Then $X'AX \sim \chi^2(r)$, where $r$ is the rank of $A$.
2. Let $X \sim N(0,I)$. Then $X'AX$ and $X'BX$ are stochastically independent iff $AB = 0$.

ECON 5340 Class Notes: Chapter 11, Heteroscedasticity

1 Introduction

In this chapter, we focus on the problem of heteroscedasticity within the multiple linear regression model. Throughout, we assume that all other classical assumptions are satisfied. Assume the model is

$$Y = X\beta + e \qquad (1)$$

where

$$E(ee') = \sigma^2\Omega = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}. \qquad (2)$$

Heteroscedasticity is a common occurrence in cross-sectional data. It can also occur in time-series data (e.g., AutoRegressive Conditional Heteroscedasticity, ARCH).

2 Ordinary Least Squares

We now examine several results related to OLS when heteroscedasticity is present in the model.

2.1 Summary of Findings

1. $b = (X'X)^{-1}X'Y$ is unbiased and consistent.
2. $\mathrm{var}(b) = \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}$ is the correct formula.
3. $\mathrm{var}(b) = \sigma^2(X'X)^{-1}$ is the
incorrect formula.
4. $\sqrt{n}(b-\beta) \xrightarrow{d} N(0,\ \sigma^2 Q^{-1}Q^*Q^{-1})$, where $\mathrm{plim}\,\frac{1}{n}X'X = Q$ and $\mathrm{plim}\,\frac{1}{n}X'\Omega X = Q^*$.

2.2 Estimating Var(b)

Let's examine the conditions under which $s^2(X'X)^{-1}$ will provide a reasonable estimate of $\mathrm{var}(b) = \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}$.

1. If $\mathrm{plim}\,b = \beta$, then $\mathrm{plim}\,s^2 = \sigma^2$.
2. If $\Omega$ is unrelated to $X$, then the difference between $(X'X)^{-1}$ and $(X'X)^{-1}X'\Omega X(X'X)^{-1}$ will approach zero as $n \to \infty$.

Therefore, if you have a sufficient amount of data and you can be sure the heteroscedasticity is unrelated to $X$, then OLS, although not most efficient, will provide a reliable estimate of $\mathrm{var}(b)$.

2.3 White's Estimator of Var(b)

Most of the time the heteroscedasticity will be related to $X$, and so if we continue to use OLS we need a good estimate of $\mathrm{var}(b) = \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}$. White (1980) suggests that even if we don't know the form of $\Omega$, we can still find a consistent estimate of $\frac{\sigma^2}{n}X'\Omega X$; that is,

$$S_0 = \frac{1}{n}\sum_{i=1}^n \hat{e}_i^2 x_i x_i'$$

will converge in probability to $\frac{\sigma^2}{n}X'\Omega X$, where the $\hat{e}_i$ are the OLS residuals. Therefore, White's asymptotic estimate of $\mathrm{var}(b)$ is

$$\mathrm{est.\,asy.\,var}(b) = n(X'X)^{-1}S_0(X'X)^{-1}.$$

Davidson and MacKinnon have shown that White's estimator can be unreliable in small samples and have suggested appropriate modifications.

2.4 Gauss Example

In this application, we are interested in measuring the degree of technical inefficiency of rice farmers in the Ivory Coast. The data are both cross-sectional ($N = 154$ farmers) and time series ($T = 3$ years). The model is

$$\ln(1/TE) = \alpha + X\beta + Z\gamma + e$$

where $TE$ represents technical efficiency (i.e., the ratio of actual production to the efficient level from a production frontier), $X$ is a set of managerial variables (e.g., years of experience, gender, age, education, etc.) and $Z$ is a set of exogenous variables (i.e., erosion, slope, weed density, pests, region dummies, year dummies, etc.). The main point of the exercise is to see whether technical inefficiency is related to the managerial characteristics of the rice farmers once we have accounted for aspects of the production process outside their control. See Gauss example 11.1 for further details.

3 Testing for Heteroscedasticity

All the tests below are based on the OLS residuals.
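White's estimator from Section 2.3 can be sketched in a few lines. The data-generating process below is a hypothetical one chosen to exhibit heteroscedasticity related to the regressor; the coefficients and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
e = rng.normal(0, 1, size=n) * x                   # error sd proportional to x: heteroscedastic
y = X @ np.array([1.0, 0.5]) + e                   # true beta = (1.0, 0.5)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                              # OLS estimate
ehat = y - X @ b                                   # OLS residuals

# S0 = (1/n) sum_i ehat_i^2 * x_i x_i'
S0 = (X * ehat[:, None] ** 2).T @ X / n

# White: est. asy. var(b) = n (X'X)^{-1} S0 (X'X)^{-1}
V_white = n * XtX_inv @ S0 @ XtX_inv
se_white = np.sqrt(np.diag(V_white))
print(b, se_white)
```

The robust standard errors remain valid here even though the conventional $s^2(X'X)^{-1}$ formula is the incorrect one under this design.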
This makes sense, at least asymptotically, because $b \xrightarrow{p} \beta$, so the OLS residuals consistently estimate the errors.

3.1 Graphical Test

As a first step, it may be useful to graph $\hat{e}_i^2$ or $|\hat{e}_i|$ against any variable suspected of being related to the heteroscedasticity. If you are unsure which variable is responsible, you can plot against $X_i b$, which is simply a weighted sum of all $X_i$.

3.2 White's Test

The advantage of White's test for heteroscedasticity (and similarly White's estimator of $\mathrm{var}(b)$) is that you do not need to know the specific form of $\Omega$. The null hypothesis is $H_0: \sigma_i^2 = \sigma^2\ \forall i$, and the alternative is that the null is false. The motivation for the test is that if the null is true, $s^2(X'X)^{-1}$ and $n(X'X)^{-1}S_0(X'X)^{-1}$ are both consistent estimators of $\mathrm{var}(b)$, while if the null is false the two estimates will diverge. The test procedure is: regress $\hat{e}_i^2$ on $X \otimes X$, where the Kronecker operator indicates all crosses and squares of $X$. The test statistic is

$$W = nR^2 \overset{a}{\sim} \chi^2(P-1)$$

where $P$ is the number of regressors in $X \otimes X$, including the constant. The disadvantage of the test is that, since it is so general, it can easily detect sorts of misspecification other than heteroscedasticity. Also, the test is nonconstructive in the sense that once heteroscedasticity is found, the test does not provide guidance on how to find an optimal estimator.

3.3 Goldfeld-Quandt Test

The Goldfeld-Quandt test addresses the disadvantage of White's test. It is a more powerful test that assumes the sample can be divided into two groups, one with a low error variance and the other with a high error variance. The trick is to find the variable on which to sort the data. The hypotheses are $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \ne \sigma_2^2$. The test procedure is:

1. Order the observations in ascending order according to the size of the error variances.
2. Omit $r$ central observations.
3. Run two separate regressions: the first $n_1 = (n-r)/2$ observations and the last $n_2 = (n-r)/2$ observations.
4. Form the statistic

$$F = \frac{e_1'e_1/(n_1-k)}{e_2'e_2/(n_2-k)} \sim F(n_1-k,\ n_2-k),$$

which requires that $e \sim N(0,\ \sigma^2\Omega)$.

5. Reject or fail to reject the null hypothesis.

3.4 Breusch-Pagan Test

One drawback of the
Goldfeld-Quandt test is that you need to choose only one variable related to the heteroscedasticity; often there are many candidates. The Breusch-Pagan test allows you to choose a vector $z_i$ of variables causing the heteroscedasticity. The hypotheses are

$$H_0: \sigma_i^2 = \sigma^2\ \forall i \qquad H_A: \sigma_i^2 = \sigma^2 f(\alpha' z_i).$$

The test statistic is

$$LM = \tfrac{1}{2}\,g'Z(Z'Z)^{-1}Z'g \overset{a}{\sim} \chi^2(P-1)$$

where $g_i = \hat{e}_i^2/\hat{\sigma}^2 - 1$, $Z = [1,\ z_i']$ and $P$ is the number of columns of $Z$. If $Z = X \otimes X$ from White's test, then the two tests are algebraically equivalent.

3.5 Gauss Example (cont.)

We now perform the three tests for heteroscedasticity using the Ivory Coast rice-farming data. The Goldfeld-Quandt test will not work because, after sorting, the smaller $X$ matrix is not of full rank. White's test will not work either because $X \otimes X$ involves too many variables. See Gauss example 11.2 for the results from the Breusch-Pagan test.

4 Generalized Least Squares

4.1 $\Omega$ is Known

Assume that the variance-covariance matrix of the errors is known apart from the scalar $\sigma^2$ and is given by (2). We learned that the efficient estimator is

$$\hat{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y = (X'P'PX)^{-1}X'P'PY$$

where $P'P = \Omega^{-1}$ and

$$P = \begin{bmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{bmatrix}.$$

GLS can be interpreted as "weighted least squares" because the transformation matrix $P$ weights every observation by the inverse of its error standard deviation. Therefore, observations with the most inherent uncertainty get the smallest weight.

Example: Let the model be $Y_i = \beta X_i + e_i$, where $\sigma_i^2 = \sigma^2 X_i^2$. The GLS estimator is therefore

$$\hat{\beta} = \frac{\sum_i X_iY_i/X_i^2}{\sum_i X_i^2/X_i^2} = \frac{1}{n}\sum_i \frac{Y_i}{X_i},$$

or the average $y/x$ ratio.

4.2 $\Omega$ is Unknown

There are too many $\sigma_i^2$ elements to estimate with a sample size equal to $n$. Therefore, we need to restrict $\sigma_i^2$ so that it is a function of a smaller number of parameters, e.g., $\sigma_i^2 = \sigma^2 X_i^2$ or $\sigma_i^2 = f(\alpha' z_i)$.

4.2.1 Two-Step Estimation

Since $\Omega$ is unknown, we need to estimate it. Let's refer to

$$\hat{\beta}_{FGLS} = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}Y$$

as the feasible GLS estimator. Consider the following two-step procedure for calculating $\hat{\beta}_{FGLS}$:

1. Estimate the regression model $\hat{e}_i^2 = f(\alpha' z_i) + v_i$. Use $\hat{\alpha}$ to obtain the estimates $\hat{\sigma}_i^2 = f(\hat{\alpha}' z_i)$.
2. Calculate $\hat{\beta}_{FGLS}$.

Provided $\hat{\alpha}$ is a
consistent estimate of $\alpha$ in step 1, $\hat{\beta}_{FGLS}$ will be asymptotically efficient at step 2. It may be possible to iterate steps 1 and 2 further, but nothing is gained asymptotically. Sometimes it may be necessary to transform the regression model in step 1, e.g., to take natural logs when $\sigma_i^2 = \exp(\alpha' z_i)$.

4.2.2 Maximum Likelihood Estimation

Write the heteroscedasticity generally as $\sigma_i^2 = \sigma^2 f_i(\alpha)$. The normal log-likelihood function is

$$\ln L(\beta,\sigma^2,\alpha) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2}\sum_{i=1}^n \ln f_i(\alpha) - \frac{1}{2\sigma^2}\sum_{i=1}^n \frac{(y_i - x_i'\beta)^2}{f_i(\alpha)}.$$

The first-order conditions are

$$\frac{\partial\ln L}{\partial\beta} = \frac{1}{\sigma^2}\sum_{i=1}^n \frac{x_i e_i}{f_i(\alpha)} = 0 \qquad (3)$$

$$\frac{\partial\ln L}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n \frac{e_i^2}{f_i(\alpha)} = 0 \qquad (4)$$

$$\frac{\partial\ln L}{\partial\alpha} = -\frac{1}{2}\sum_{i=1}^n \frac{g_i(\alpha)}{f_i(\alpha)} + \frac{1}{2\sigma^2}\sum_{i=1}^n \frac{e_i^2\,g_i(\alpha)}{f_i(\alpha)^2} = 0 \qquad (5)$$

where $g_i(\alpha) = \partial f_i(\alpha)/\partial\alpha$ and $e_i = y_i - x_i'\beta$. Notice that equation (3) gives the normal equation for GLS. Solving equations (3) through (5) jointly for $(\beta,\sigma^2,\alpha)$ will produce the maximum likelihood estimates of the model. This can be accomplished in a couple of different ways:

1. Brute force: use one of the nonlinear optimization algorithms (e.g., Newton-Raphson) to maximize the likelihood function.
2. Oberhofer-Kmenta two-step estimator: start with a consistent estimator of $\alpha$. Use that estimate to obtain estimates of $\beta$ and $\sigma^2$. Iterate back and forth until convergence.

The efficient asymptotic ML variance is given by the negative inverse of the information matrix,

$$\mathrm{asy\,var}(\hat{\theta}_{ML}) = \left\{-E\left[\frac{\partial^2\ln L}{\partial\theta\,\partial\theta'}\right]\right\}^{-1},$$

and is given as equation (11-21) in Greene. If this matrix is not working well in the nonlinear optimization algorithm, or is not invertible, one could simply use the negative inverse Hessian (without expectations) or the outer product of the gradients (OPG).

4.3 Model-Based Tests for Heteroscedasticity

As a final note, rather than use the OLS residuals to test for heteroscedasticity, one could test the null hypothesis $H_0: \alpha = 0$ using one of the classical asymptotic tests. For example, the likelihood ratio test would use

$$LR = -2(\ln L_R - \ln L_U) \overset{a}{\sim} \chi^2(J)$$

where $L_R$ is the likelihood value with homoscedasticity imposed (i.e., $\alpha = 0$) and $L_U$ is the likelihood value allowing for
heteroscedasticity (i.e., $\alpha \ne 0$).

4.4 Gauss Application (cont.)

Using the Ivory Coast rice-farming example, we now calculate feasible GLS and ML estimates of $\beta$ and $\gamma$. The heteroscedasticity is assumed to follow $\sigma_i^2 = \sigma^2\exp(\alpha' z_i)$, where $z_i = (1,\ \mathrm{region1}_i,\ \mathrm{region2}_i)$. See Gauss example 11.3 for further details.

ECON 5340 Class Notes: Chapter 6, Inference and Prediction

1 Introduction

Our primary goal in this chapter is to develop a systematic method for testing restrictions, which allows us to distinguish between nested models. Nested models are such that one model can be written as a special case of the other. For example, if one wished to test $y_i = \beta_1 + \beta_2 x_i + e_i$ versus $y_i = \beta_1 + e_i$, this would be considered a nested hypothesis test because the second model is nested ($\beta_2 = 0$) within the first model. If one wished to test the first model against $y_i = \beta_1 + \beta_2 z_i + e_i$, this would be considered a non-nested test because one model cannot be written as a special case of the other. Non-nested hypothesis tests will be covered later. Next, we proceed with two alternative yet equivalent approaches to testing linear restrictions.

2 Testing Linear Restrictions

2.1 Unrestricted Approach

This is called the "unrestricted approach" because we will only estimate the unrestricted model (i.e., without imposing the restriction in $H_0$). We will represent a set of $J$ linear, testable restrictions on $Y = X\beta + e$ as

$$R\beta = q$$

where $R$ is a $J \times k$ restriction matrix with full row rank and $q$ is a $J \times 1$ vector of constants. Here are some examples:

- $H_0: \beta_1 = 0$: $R = [1\ 0\ \cdots\ 0]_{1\times k}$, $\beta = (\beta_1,\beta_2,\dots,\beta_k)'$, $q = 0$.
- $H_0: \beta_2 + \beta_3 = 1$: $R = [0\ 1\ 1\ 0\ \cdots\ 0]_{1\times k}$, $\beta = (\beta_1,\beta_2,\dots,\beta_k)'$, $q = 1$.
- $H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$:

$$R = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}, \qquad q = 0.$$

2.1.1 Motivating the Test Statistic

Assume that $e \sim N(0,\sigma^2 I)$. What is the sampling distribution of $Rb$?

$$E(Rb) = R\beta$$

$$\mathrm{var}(Rb) = E[(Rb-R\beta)(Rb-R\beta)'] = R\,E[(b-\beta)(b-\beta)']\,R' = R\,\mathrm{var}(b)\,R' = \sigma^2 R(X'X)^{-1}R'$$

$$Rb \sim N[R\beta,\ \sigma^2 R(X'X)^{-1}R'].$$

If $H_0$ is true,

$$m = Rb - q \sim N[0,\ \sigma^2 R(X'X)^{-1}R']$$

and, from Greene, Theorem B.11,

$$(Rb-q)'[\sigma^2 R(X'X)^{-1}R']^{-1}(Rb-q) = m'[\mathrm{var}(m)]^{-1}m \sim \chi^2(J).$$

Replacing $\sigma^2$ with $s^2$ yields an F statistic, the ratio of two independent chi-squared random variables each divided by its degrees of freedom. Therefore, we know that
$$F = \frac{(Rb-q)'[R(X'X)^{-1}R']^{-1}(Rb-q)}{Js^2} \sim F(J,\ n-k). \qquad (1)$$

2.1.2 Examples Continued

- $H_0: \beta_1 = 0$. With $R = [1\ 0\ \cdots\ 0]$ and $q = 0$, $R(X'X)^{-1}R' = (X'X)^{-1}_{11}$ and $Rb - q = b_1$, so

$$F = \frac{b_1^2}{s^2(X'X)^{-1}_{11}} \sim F(1,\ n-k),$$

or, taking square roots,

$$t = \frac{b_1}{s\sqrt{(X'X)^{-1}_{11}}} = \frac{b_1}{se(b_1)} \sim t(n-k).$$

- $H_0: \beta_2 + \beta_3 = 1$:

$$F = \frac{(b_2+b_3-1)^2}{s^2[(X'X)^{-1}_{22} + 2(X'X)^{-1}_{23} + (X'X)^{-1}_{33}]} \sim F(1,\ n-k),$$

or, taking square roots,

$$t = \frac{b_2+b_3-1}{se(b_2+b_3)} \sim t(n-k).$$

2.1.3 A Few Notes

1. It would be simple to test these last two restrictions jointly ($J = 2$):

$$R = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 1 & 0 & \cdots & 0 \end{bmatrix}, \qquad q = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

2. It is possible to calculate joint confidence regions analogous to confidence intervals.
3. Recall, if the errors are not normal but $e \sim (0,\sigma^2 I)$, then the Wald, Lagrange multiplier and likelihood ratio statistics are asymptotically distributed chi-squared and can be used to test linear restrictions.

2.2 Restricted Approach

In the "restricted approach," we estimate the model with the restriction imposed and then compare the change in the goodness-of-fit of the model with and without the restriction imposed. Turn now to the problem of a restricted regression.

2.2.1 Restricted Regression

The problem is to minimize $(Y-X\beta)'(Y-X\beta)$ subject to $R\beta = q$. First form the Lagrangian

$$L^* = (Y-X\beta)'(Y-X\beta) + 2\lambda'(R\beta - q).$$

The first-order conditions are

$$\frac{\partial L^*}{\partial b_*} = -2X'Y + 2X'Xb_* + 2R'\lambda = 0, \qquad \frac{\partial L^*}{\partial\lambda} = 2(Rb_* - q) = 0.$$

Written in matrix form,

$$\begin{bmatrix} X'X & R' \\ R & 0 \end{bmatrix}\begin{bmatrix} b_* \\ \lambda \end{bmatrix} = \begin{bmatrix} X'Y \\ q \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} b_* \\ \lambda \end{bmatrix} = \begin{bmatrix} X'X & R' \\ R & 0 \end{bmatrix}^{-1}\begin{bmatrix} X'Y \\ q \end{bmatrix}. \qquad (2)$$

Solving (2) for $b_*$ and $\lambda$ is straightforward, although tedious, using Greene (A-74). This produces

$$b_* = b - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rb - q). \qquad (3)$$

So, as expected, the restricted and unrestricted estimates of $\beta$ are equal if the restriction is exactly true in the sample data. Otherwise $b_*$ and $b$ will be different (note that because we are dealing with a random sample, this will generally be true even if the restriction holds in the population, i.e., $R\beta = q$).

2.2.2 Test Based on Loss of Fit

Let $e = Y - Xb$ and $e_* = Y - Xb_*$ be the unrestricted and restricted residuals, respectively. We can relate them according to

$$e_* = Y - Xb_* = Y - Xb - X(b_* - b) = e - X(b_* - b).$$

Now taking the inner product of $e_*$ (i.e., the sum of squared restricted residuals) gives

$$e_*'e_* = e'e + (b_*-b)'X'X(b_*-b).$$

Substituting in (3) and simplifying
gives

$$e_*'e_* - e'e = (Rb-q)'[R(X'X)^{-1}R']^{-1}(Rb-q). \qquad (4)$$

Finally, substituting (4) into (1) gives

$$F = \frac{(e_*'e_* - e'e)/J}{e'e/(n-k)} \sim F(J,\ n-k).$$

This shows that the statistic used to test $R\beta = q$ can be interpreted as the relative loss in fit caused by imposing the restriction. If the restriction is true, the loss in fit should be small, the F statistic will be small, and you will fail to reject the null. If the restriction is false, the loss in fit will be large, the F statistic will be large, and you will reject the null. Alternatively, if one divides through by the total sum of squares $y'M^0y$, the F statistic can be written in terms of the unrestricted and restricted $R^2$s:

$$F = \frac{(e_*'e_*/y'M^0y - e'e/y'M^0y)/J}{(e'e/y'M^0y)/(n-k)} = \frac{(R^2 - R_*^2)/J}{(1-R^2)/(n-k)} \sim F(J,\ n-k). \qquad (5)$$

Note that equations (4) and (5) are used to produce alternative interpretations of the hypothesis test. Generally, researchers continue to use equation (1), which only requires calculation of the unrestricted estimator.

2.3 Gauss Example

This example uses cross-sectional data from the 1998 Current Population Survey. The data are for $n = 1000$ males. The regression equation is

$$\ln(\mathrm{wage}_i) = \beta_1 + \beta_2\mathrm{age}_i + \beta_3\mathrm{age}_i^2 + \beta_4\mathrm{grade}_i + \beta_5\mathrm{married}_i + \beta_6\mathrm{union}_i + e_i.$$

We wish to test two hypotheses. The first is whether schooling has an impact on wages (let's hope it does), and the second is whether these five variables jointly explain a significant amount of the variation in wages across the 1000 males.

1. Schooling hypothesis: $H_0: \beta_4 = 0$ versus $H_A: \beta_4 \ne 0$. In our notation, we set $R = [0\ 0\ 0\ 1\ 0\ 0]$ and $q = 0$. At the 5% significance level, the critical F value with 1 degree of freedom in the numerator and 994 degrees of freedom in the denominator is 3.84.
2. Overall significance: $H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$ versus $H_A$: $H_0$ is false. We set $R = [0\ \ I_5]$ and $q = 0$, the latter being a $5 \times 1$ vector. The critical F value with 5 degrees of freedom in the numerator and 994 degrees of freedom in the denominator is 2.21.

See Gauss example 6.1 to calculate the statistics and complete the tests.

3 Testing Nonlinear Restrictions

Testing nonlinear restrictions requires a fairly simple modification of the linear testing procedure.
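Before turning to the nonlinear case, the restricted estimator in equation (3) and the loss-of-fit identity in equation (4) can be checked numerically. The sketch below uses simulated data; the design matrix, coefficients and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.4, 0.6, 0.0]) + rng.normal(size=n)

R = np.array([[0.0, 1.0, 1.0, 0.0]])   # H0: beta2 + beta3 = 1
q = np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                   # unrestricted OLS

# Equation (3): b* = b - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (Rb - q)
b_star = b - XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T) @ (R @ b - q)

e = y - X @ b
e_star = y - X @ b_star

# Equation (4): the loss of fit equals a quadratic form in (Rb - q)
loss = e_star @ e_star - e @ e
quad = (R @ b - q) @ np.linalg.inv(R @ XtX_inv @ R.T) @ (R @ b - q)
print(R @ b_star, loss, quad)
```

By construction $Rb_* = q$ holds exactly, and the loss in fit matches the quadratic form, so the unrestricted and restricted routes to the F statistic agree.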
Consider testing the null hypothesis $H_0: c(\beta) = q$, where $c(\cdot)$ is a (possibly) nonlinear function of $\beta$. The first step is to linearize the restriction using a first-order Taylor series approximation around $\beta$:

$$c(\hat{\beta}) \approx c(\beta) + C(\beta)(\hat{\beta}-\beta), \qquad C(\beta) = \frac{\partial c(\beta)}{\partial\beta'}.$$

The variance of $c(\hat{\beta})$ is then approximately $C(\hat{\beta})\,\mathrm{var}(\hat{\beta})\,C(\hat{\beta})'$. This implies that we can form the Wald-like statistic

$$W = [c(\hat{\beta}) - q]'[\hat{C}\,\mathrm{var}(\hat{\beta})\,\hat{C}']^{-1}[c(\hat{\beta}) - q].$$

4 Prediction

In this section, we will use our regression model to predict values of the dependent variable given observations on the regressors, $X^0$. These observations may be either in-sample or out-of-sample. The predicted value is $\hat{Y}^0 = X^0 b$. The Gauss-Markov theorem implies that $\hat{Y}^0$ is the best linear unbiased estimator of $E(Y^0|X^0) = X^0\beta$. The prediction error, $e^0 = Y^0 - \hat{Y}^0$, has the following variance:

$$\mathrm{var}(e^0) = \mathrm{var}(X^0\beta + \varepsilon^0 - X^0 b) = X^0\,\mathrm{var}(b)\,X^{0\prime} + \sigma^2 I = \sigma^2[I + X^0(X'X)^{-1}X^{0\prime}]. \qquad (6)$$

Equation (6) highlights that there are two sources of uncertainty associated with predicting $Y^0$: the first is associated with the random error term $\varepsilon^0$, and the second is associated with estimating the population parameters. Note also that $X$, as opposed to $X^0$, appears inside equation (6). This occurs because all $n$ observations are used in calculating the least squares estimator $b$, not just the $n_0$ observations in $X^0$.

It is often desirable to present a confidence interval around the predicted value. In this case, the $(1-\lambda) \times 100$ percent confidence interval for $Y^0$ is

$$\hat{Y}^0 \pm t_{\lambda/2}\,se(e^0)$$

where the standard error of $e^0$ is given by the square root of (6), with $\sigma^2$ replaced by $s^2$.

After predictions are made, we often wish to evaluate their accuracy. Two common measures are:

1. Root mean square error: $\mathrm{RMSE} = \sqrt{\frac{1}{n^0}\sum_i (Y_i^0 - \hat{Y}_i^0)^2}$, and
2. Theil's U statistic: $U = \sqrt{\frac{\sum_i (Y_i^0 - \hat{Y}_i^0)^2}{\sum_i (Y_i^0)^2}}$,

where the latter measure removes the units of measurement and hence potential scaling problems.

4.1 Gauss Example

In this example, we will predict the wage of a particular male and calculate the 95% confidence interval for the true wage. The male is 34 years old, has 22 years of schooling, is married and is not a union member (i.e., $X^0 = [1,\ 34,\ 34^2,\ 22,\ 1,\ 0]$). See Gauss example 6.2
for more details.

ECON 5340 Class Notes: Chapter 9, Nonlinear Regression Models and Nonlinear Optimization

1 Introduction

In this chapter, we examine regression models that are nonlinear in the parameters and give a brief overview of methods to estimate such models.

2 Nonlinear Regression Models

The general form of the nonlinear regression model is

$$y_i = h(x_i, \beta, e_i), \qquad (1)$$

which is more commonly written in a form with an additive error term:

$$y_i = h(x_i, \beta) + e_i. \qquad (2)$$

Below are two examples.

1. $h(\cdot) = \beta_1 x_{1i}^{\beta_2} x_{2i}^{\beta_3} e_i$, where $e_i = \exp(\varepsilon_i)$. This is an intrinsically linear model because, by taking natural logarithms, we get a model that is linear in the parameters: $\ln y_i = \ln\beta_1 + \beta_2\ln x_{1i} + \beta_3\ln x_{2i} + \varepsilon_i$. This can be estimated with standard linear procedures such as OLS.

2. $h(\cdot) = \beta_1 x_i^{\beta_2}$, with $y_i = \beta_1 x_i^{\beta_2} + e_i$. Since the error term in (2) is additive, there is no transformation that will produce a linear model. This is an intrinsically nonlinear model (i.e., the relevant first-order conditions are nonlinear in the parameters). Below we consider two methods for estimating such a model: linearizing the underlying regression model, and nonlinear optimization of the objective function.

2.1 Linearized Regression Model and the Gauss-Newton Algorithm

Consider a first-order Taylor series approximation of the regression model around $\beta^0$:

$$y_i \approx h(x_i, \beta^0) + g(x_i, \beta^0)'(\beta - \beta^0) + e_i \qquad (3)$$

where $g(x_i, \beta^0) = [\partial h/\partial\beta_1|_{\beta^0}, \dots, \partial h/\partial\beta_k|_{\beta^0}]'$. Collecting terms and rearranging gives

$$Y^0 = X^0\beta + e^0$$

where $Y^0 \equiv Y - h(X,\beta^0) + X^0\beta^0$ and $X^0 \equiv g(X,\beta^0)$. The matrix $X^0$ is called the pseudoregressor matrix. Note also that $e^0$ will include higher-order approximation errors.

2.1.1 Gauss-Newton Algorithm

Given an initial value for $\beta$, we can estimate $\beta$ with the following iterative LS procedure:

$$b_{t+1} = [X^0(b_t)'X^0(b_t)]^{-1}X^0(b_t)'Y^0(b_t) = b_t + [X^0(b_t)'X^0(b_t)]^{-1}X^0(b_t)'e(b_t) = b_t + W_t\Delta_t g_t.$$

The iterations continue until the difference between $b_{t+1}$ and $b_t$ is sufficiently small, where $W_t = [2X^0(b_t)'X^0(b_t)]^{-1}$, $\Delta_t = 1$ and $g_t = 2X^0(b_t)'e(b_t)$. This is called the Gauss-Newton algorithm. Interpretations for $W$, $\Delta$ and $g$ will be given below. A consistent estimator of $\sigma^2$ is

$$s^2 = \frac{1}{n}\sum_{i=1}^n [y_i - h(x_i, b)]^2.$$

2.1.2 Properties of the NLS
Estimator

Only asymptotic results are available for this estimator. Assuming that the pseudoregressors are well-behaved, i.e., plim (1/n)X⁰′X⁰ = Q⁰, a finite positive definite matrix, we can apply the CLT to show that

b ~asy N(β, (σ²/n)(Q⁰)^{-1}),

where the estimate of (σ²/n)(Q⁰)^{-1} is s²[X⁰′X⁰]^{-1}.

2.1.3 Notes

1. Depending on the initial value, b_0, the Gauss-Newton algorithm can lead to a local (as opposed to global) minimum or head off on a divergent path.
2. The standard R² formula may produce a goodness-of-fit value outside the interval [0, 1].
3. Extensions of the J test are available that allow one to test nonlinear versus linear models.
4. Hypothesis testing is only valid asymptotically.

2.2 Hypothesis Testing

Consider testing the hypothesis H_0: R(β) = q. Below are four tests that are asymptotically equivalent.

2.2.1 Asymptotic F Test

Begin by letting S(b) = [Y − h(X, b)]′[Y − h(X, b)] be the sum of squared residuals evaluated at the unrestricted NLS estimate. Also, let S(b*) be the corresponding measure evaluated at the restricted estimate. The standard F formula gives

F = {[S(b*) − S(b)]/J} / {S(b)/(n − K)}.

Under the null hypothesis, JF ~asy χ²(J).

2.2.2 Wald Test

The nonlinear counterpart to the Wald statistic introduced in Chapter 5 is

W = [R(b) − q]′{C V̂ C′}^{-1}[R(b) − q] →d χ²(J),

where V̂ = σ̂²[X⁰′X⁰]^{-1}, C = ∂R(b)/∂b′ and σ̂² = S(b)/n.

2.2.3 Likelihood Ratio Test

Assume ε ~ N(0, σ²I). The likelihood ratio statistic is

LR = −2[ln L* − ln L] →d χ²(J),

where ln L and ln L* are the unrestricted and restricted log-likelihood values, respectively.

2.2.4 Lagrange Multiplier Test

The LM statistic is based solely on the restricted model. Occasionally, imposing the restriction R(β) = q may change an intrinsically nonlinear model into an intrinsically linear one. The LM statistic is

LM = e*′X⁰*(X⁰*′X⁰*)^{-1}X⁰*′e* / (e*′e*/n) = nR*² →d χ²(J),

where e* = Y − h(X, b*), X⁰* = g(X, b*) and R*² is the coefficient of determination from a regression of e* on X⁰*.

3 Brief Overview of Nonlinear Optimization Techniques

An alternative method for estimating the parameters of equation (2) is to apply nonlinear optimization techniques directly to the
first-order conditions. Consider the NLS problem of minimizing

S(β) = Σ_{i=1}^n [y_i − h(x_i, β)]².

The first-order conditions produce

∂S/∂β = −2 Σ_{i=1}^n [y_i − h(x_i, β)] ∂h(x_i, β)/∂β = 0,  (3)

which is generally nonlinear in the parameters and does not have a nice closed-form, analytical solution. The methods outlined below can be used to solve this set of equations.

3.1 Introduction

Consider the function f(θ) = a + bθ + cθ². The first-order condition for minimization is

df/dθ = b + 2cθ = 0  ⇒  θ = −b/(2c).

This is considered a linear optimization problem, even though the objective function is nonlinear in the parameter. Alternatively, consider the objective function f(θ) = a + bθ² + c ln(θ). The first-order condition for minimization is

df/dθ = 2bθ + c/θ = 0.

This is considered a nonlinear optimization problem. Here is a general outline of how to solve the nonlinear optimization problem. Let θ be the parameter vector, Δ the directional vector and λ the step length.

Procedure:
1. Specify θ_0 and Δ_0.
2. Determine λ_t.
3. Compute θ_{t+1} = θ_t + λ_t Δ_t.
4. Convergence criterion satisfied? Yes: exit. No: update t = t + 1, compute Δ_t and return to step 2.

There are two general types of nonlinear optimization algorithms: those that do not involve derivatives and those that do.

3.2 Derivative-Free Methods

Derivative-free algorithms are used when the number of parameters is small, analytical derivatives are difficult to calculate, or seed values are needed for other algorithms.

1. Grid search. This is a trial-and-error method that is typically not feasible for more than two parameters. It can be a useful means to find starting values for other algorithms.
2. Direct search methods. Using the iterative algorithm θ_{t+1} = θ_t + λΔ, a search is performed in m directions Δ_1, ..., Δ_m; λ is chosen to ensure that G(θ_{t+1}) > G(θ_t).
3. Other methods. The simplex algorithm and simulated annealing are examples of other derivative-free methods.

3.3 Gradient Methods

The goal is to choose a directional vector Δ to go uphill (for a max) and an appropriate step length λ. Too big a step may overshoot the max, and too small a step may be inefficient. See Figures 5.3 and 5.4, attached. With this in mind, consider
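As a concrete illustration of a grid search, the snippet below minimizes the section's second objective, f(θ) = a + bθ² + c ln(θ). The constants a, b, c and the grid are my own choices for the demo, picked so a minimum exists on (0, 5].

```python
import math

# Grid search (derivative-free) for f(theta) = a + b*theta^2 + c*ln(theta).
# The constants below are assumptions chosen so a minimum exists on (0, 5].
a, b, c = 1.0, 2.0, -3.0

def f(theta):
    return a + b * theta**2 + c * math.log(theta)

grid = [0.01 * k for k in range(1, 501)]   # coarse grid over (0, 5]
theta_star = min(grid, key=f)

# The FOC 2*b*theta + c/theta = 0 gives theta = sqrt(-c/(2*b)) here,
# so the grid answer can be checked against the analytical one.
theta_foc = math.sqrt(-c / (2 * b))
```

A grid this coarse is mainly useful for seeding a gradient method, which is exactly the role the notes suggest for it.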
choosing Δ such that the objective function increases, i.e., G(θ_{t+1}) > G(θ_t). The relevant derivative is

dG(θ + λΔ)/dλ = g′Δ,

where g = dG(θ)/dθ. If we let Δ = Wg, where W is a positive definite matrix, then we know that

dG(θ + λΔ)/dλ = g′Wg ≥ 0.

As a result, almost all algorithms take the general form

θ_{t+1} = θ_t + λ_t W_t g_t,

where λ is the step length, W is a weighting matrix and g is the gradient. The Gauss-Newton algorithm above could be written in this general form. Here are examples of some other algorithms.

1. Steepest ascent: W = I. An optimal line search produces λ = −g′g/(g′Hg), where H is the Hessian. Therefore the algorithm is θ_{t+1} = θ_t − [g′g/(g′Hg)]g. This method has the drawbacks that (a) it can be slow to converge, especially on long narrow ridges, and (b) H can be difficult to calculate.

2. Newton's method (aka Newton-Raphson). Newton's method can be motivated by taking a Taylor series approximation of the gradient around θ_0 and setting it equal to zero. This gives g(θ) ≈ g(θ_0) + H(θ_0)(θ − θ_0) = 0. Rearranging produces θ = θ_0 − H(θ_0)^{-1}g(θ_0). Therefore, W = −H^{-1} and λ = 1.
   (a) Very popular and works well in many settings.
   (b) The Hessian can be difficult to calculate, or may not be negative definite if far from the optimum.
   (c) Newton's method will reach the optimum in one step if G(θ) is quadratic.

3. Quadratic hill climbing: W = −[H(θ) − αI]^{-1}, where α > 0 is chosen to ensure that W is positive definite.

4. Davidon-Fletcher-Powell (DFP): W_{t+1} = W_t + E_t, where E_t is a positive definite matrix,

E_t = (δ_t δ_t′)/(δ_t′γ_t) − (W_t γ_t γ_t′ W_t)/(γ_t′ W_t γ_t),

with δ_t = θ_t − θ_{t−1} and γ_t = g_t − g_{t−1}. Notice that no second derivatives (i.e., H(θ)) are required. Choose W_0 = I.

5. Method of scoring: W = −[E H(θ)]^{-1}.

6. BHHH, or outer product of the gradients: W = [Σ_i g_i(θ)g_i(θ)′]^{-1}, an estimate of −[E H(θ)]^{-1}. W is always positive definite and requires only first derivatives.

Notes:
1. Nonlinear optimization with constraints. There are several options, such as forming a Lagrangian function, substituting the constraint directly into the objective, and imposing arbitrary penalties in the objective function.
2. Assessing convergence. The usual choice of convergence criterion is |ΔG| or |Δθ|. Sometimes these methods can be sensitive to the scaling of the
function. Belsley suggests using g′H^{-1}g as the criterion, which removes the units of measurement.
3. The biggest problem in nonlinear optimization is making sure the solution is a global (as opposed to local) optimum. The above methods work well for globally concave (convex) functions.

3.4 Examples of Newton's Method

Here are two numerical examples of Newton's method.

1. Example 1. A sample of data (n = 20) was generated from an intrinsically nonlinear regression model y_i = h(x_i, θ) + ε_i. The objective is to minimize the function S(θ) = Σ_i [y_i − h(x_i, θ)]². See the attached figure and table for how Newton's method performs for three different initial values.

2. Example 2. The objective is to minimize f(θ) = θ³ − 3θ² + 5. The gradient and Hessian are

g(θ) = 3θ² − 6θ,  H(θ) = 6θ − 6.

Substituting these into Newton's algorithm gives

θ_{t+1} = θ_t − θ_t(θ_t − 2)/[2(θ_t − 1)].

Now consider two different starting values, θ_0 = 1.5 and θ_0 = 0.5.

Starting value θ_0 = 1.5: θ_1 = 2.25, θ_2 = 2.025, θ_3 = 2.0003, converging to the local minimum at θ = 2.

Starting value θ_0 = 0.5: θ_1 = −0.25, θ_2 = −0.025, θ_3 = −0.0003, converging to θ = 0, which is a local maximum (H(0) = −6 < 0).

This example highlights the fact that, at least for objective functions that are not globally concave or convex, the choice of starting values is an important aspect of nonlinear optimization.

3.5 Gauss Example

In this example, we estimate the parameters of an intrinsically nonlinear Cobb-Douglas production function

Q_t = β_1 L_t^{β_2} K_t^{β_3} + ε_t

using Gauss-Newton and Newton's method (see Gauss example 9.1) and test for constant returns to scale (see Gauss example 9.2). For Gauss-Newton, the relevant pseudoregressor (gradient) vector is

g(x_t, β)′ = [ L_t^{β_2} K_t^{β_3},  β_1 L_t^{β_2} K_t^{β_3} ln L_t,  β_1 L_t^{β_2} K_t^{β_3} ln K_t ].

For Newton's method, the relevant gradient and Hessian matrices are built from g(β) = −2 Σ_t e_t (∂h_t/∂β).

[Attached: Figure 5.3 (choice of step length), Figure 5.4 (line search) and a table of Newton iterations.]

ECON 5340 Class Notes
Math Checklist

Below are topics that you need to
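The Newton iterations in Example 2 are easy to verify numerically; the few lines below iterate θ_{t+1} = θ_t − g/H for f(θ) = θ³ − 3θ² + 5 from both starting values.

```python
# Newton's method for f(theta) = theta^3 - 3*theta^2 + 5,
# with gradient g = 3*theta^2 - 6*theta and Hessian H = 6*theta - 6.
def newton(theta, steps=25):
    for _ in range(steps):
        g = 3 * theta**2 - 6 * theta
        H = 6 * theta - 6
        theta -= g / H        # theta_{t+1} = theta_t - g/H
    return theta

from_15 = newton(1.5)   # heads to theta = 2 (local minimum, H > 0)
from_05 = newton(0.5)   # heads to theta = 0 (local maximum, H < 0)
```

The same update rule lands on different stationary points depending only on the starting value, which is exactly the caution raised in the text.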
understand in order to be successful in this class. They include topics from matrix algebra, probability and statistics. Please take some time to review these topics during the first few weeks of class.

1 Matrix Algebra

- Matrix multiplication
- Transpose rules for matrices
- Inner and outer products of a vector
- Idempotent matrices
- Matrix rank
- Orthogonality
- Matrix inversion
- Determinants
- Algebra of partitioned matrices
- Kronecker products
- Characteristic equations, roots and vectors
- Trace of a matrix
- Quadratic forms
- Matrix definiteness
- Matrix differentiation

2 Probability

2.1 Finite Samples

- Properties of probability density functions (pdfs)
- Properties of cumulative density functions (cdfs)
- Expectation operator
- Moment generating function and moments of a distribution
- Normal, chi-squared, F and t distributions
- Change of variable technique
- Joint and marginal distributions
- Covariances and correlations
- Independence
- Conditional distributions
- Distributions of quadratic forms

2.2 Large Samples

- Convergence in probability and mean square
- Convergence in distribution
- Consistency
- Slutsky's theorem
- Limiting distributions
- Central Limit Theorem
- Asymptotic distributions

3 Statistics

- Random sample
- Sample moments (i.e., mean, variance, skewness, kurtosis)
- Covariances and correlations
- Sampling distributions
- Unbiasedness
- Efficiency
- Mean-squared error
- Cramer-Rao lower bound
- Classical hypothesis testing
- Type I and II errors
- Size and power of a test
- Confidence intervals

ECON 5340 Class Notes
Chapter 2: The Classical Multiple Linear Regression Model

1 Introduction

The multiple linear regression model can be written as

y_i = β_1 x_{i1} + β_2 x_{i2} + ... + β_k x_{ik} + ε_i,  i = 1, ..., n,  (1)

where y_i is the dependent variable, x_{ij} is the j-th explanatory variable and ε_i is the error term. There are k explanatory variables and n observations. x_{i1} is often set equal to one (i = 1, ..., n), so β_1 is an intercept. The β's are coefficients (or parameters) to be estimated. Using matrices, model (1) can be written more compactly as

Y = Xβ + ε,  (2)

where Y is an n x 1 column vector of dependent
variables, X is an n x k matrix of explanatory variables, β is a k x 1 column vector of parameters and ε is an n x 1 column vector of errors. For example,

Y = [y_1, y_2, ..., y_n]′,  X = [x_11 x_12 ... x_1k; x_21 x_22 ... x_2k; ... ; x_n1 x_n2 ... x_nk],  β = [β_1, β_2, ..., β_k]′,  ε = [ε_1, ε_2, ..., ε_n]′.

2 Data-Generating Assumptions

There are six data-generating assumptions associated with model (1) or (2).

2.1 Linearity

The model must take the form of (1), so that it is linear in the parameters and the error term.
- The model need not be linear in the X's or Y's as such; however, it must be transformable into such a form. For example, after taking natural logs, y_i = exp(β_1 x_i + ε_i) can be transformed into ln y_i = β_1 x_i + ε_i, which is in the form of (1) with an appropriate redefining of y.

2.2 Full Rank

The columns of X need to be linearly independent and there must be at least k observations. In other words, Rank(X) = k.

2.3 Mean-Zero Errors

Conditional on X, the error terms are mean zero.
- In other words, E[ε|X] = 0.
- This implies that E[Y|X] = Xβ; i.e., the regression of Y on X is the conditional mean Xβ.
- Including a constant term will guarantee this assumption holds. Assume y_i = β_0 + β_1 x_i + ε_i has the property E[ε_i] = μ ≠ 0. By redefining the error term ε_i* = ε_i − μ and intercept β_0* = β_0 + μ, the model can be written with mean-zero errors: y_i = β_0* + β_1 x_i + ε_i*.

2.4 Spherical Disturbances

The error terms should display homoscedasticity (i.e., error variances are constant across observations) and no autocorrelation (i.e., errors are uncorrelated across observations).
- Homoscedasticity: Var(ε_i) = σ² for all i = 1, ..., n.
- No autocorrelation: Cov(ε_i, ε_j) = 0 for all i ≠ j.
- Matrix representation: Var(ε) = E[εε′] = σ²I_n, the n x n matrix with Var(ε_i) = σ² on the diagonal and Cov(ε_i, ε_j) = 0 off the diagonal.

2.5 Nonstochastic Regressors

The explanatory variables are fixed in repeated sampling.
- This is often true in scientific experiments.
- This is generally not true in the social sciences. We can relax the assumption, so long as Corr(X, ε) = 0.

2.6 Normality

The error terms will follow a normal distribution. This is supported by the Central Limit Theorem and is necessary for inference. It is not
necessary, however, to show the optimal properties of least squares estimators.

ECON 5340 Class Notes
Chapter 3: Least Squares

1 Introduction

We are interested in estimating the population parameters from the regression equation Y = Xβ + ε. The population values are β, σ² and ε. Their sample counterparts are b, s² and e. The sample counterpart to the error term ε is called the residual e. The two are related according to

Y = Xβ + ε = Xb + e.

2 Least Squares

2.1 The Problem

We want to estimate the parameter β by choosing a fitting criterion that makes the sample regression line as close as possible to the data points. Our criterion is

min_b e′e = (Y − Xb)′(Y − Xb) = Y′Y − b′X′Y − Y′Xb + b′X′Xb.  (1)

The criterion is minimized by choosing b. Taking the vector derivative with respect to b and setting it equal to zero gives

∂e′e/∂b = −2X′Y + 2X′Xb = 0.  (2)

Provided X′X is nonsingular (guaranteed by classical assumption two), we solve to get

b = (X′X)^{-1}X′Y.  (3)

The second-order condition gives

∂²e′e/∂b∂b′ = 2X′X,

which satisfies the condition for a minimum, since X′X is a positive definite matrix if X is of full rank (Greene, A-114).

2.2 Example: Violent Crimes and the Prison Population

The data are taken from www.ojp.usdoj.gov/bjs/cvict.htm for the 50 states and the District of Columbia during the year 1990. Let x = violent crimes per 100,000 people and y = prisoners per 10,000 people. Assume the population regression equation is y_i = β_1 + β_2 x_i + ε_i. The objective is to choose b_1 and b_2 to minimize

e′e = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − b_1 − b_2 x_i)²,

which gives the two first-order conditions

∂e′e/∂b_1 = −2 Σ_i (y_i − b_1 − b_2 x_i) = 0  (4)
∂e′e/∂b_2 = −2 Σ_i x_i(y_i − b_1 − b_2 x_i) = 0.  (5)

Equations (4) and (5) can be rearranged to produce the normal equations

Σ_i y_i = n b_1 + b_2 Σ_i x_i
Σ_i x_i y_i = b_1 Σ_i x_i + b_2 Σ_i x_i².

Finally, solving for b_1 and b_2 gives

b_2 = (Σ_i x_i y_i − n x̄ȳ)/(Σ_i x_i² − n x̄²),  b_1 = ȳ − b_2 x̄.

This is the same answer you get via matrix algebra, b = (b_1, b_2)′ = (X′X)^{-1}X′Y, for appropriately defined X and Y. See Gauss example 3.1 for more details.

2.3 Algebra of Least Squares

Consider the normal equations

X′(Y − Xb) = X′e = 0.  (6)

Three interesting results follow from equation (6), assuming a constant term.

1. The first column of X implies Σ_i e_i = 0.
Positive and negative residuals exactly cancel out.

2. Σ_i e_i = 0 implies that ē = ȳ − x̄′b = 0, which implies ȳ = x̄′b. The regression hyperplane passes through the sample means.

3. Ŷ′e = (Xb)′e = b′X′e = 0. The fitted values are orthogonal to the residuals.

2.4 Partitioned and Partial Regressions

Let a regression have two sets of explanatory variables, X_1 and X_2, such that Y = X_1β_1 + X_2β_2 + ε. The normal equations can be written in partitioned form as

[X_1′X_1  X_1′X_2; X_2′X_1  X_2′X_2] [b_1; b_2] = [X_1′Y; X_2′Y].

Solving for b_2 gives

b_2 = [X_2′(I − X_1(X_1′X_1)^{-1}X_1′)X_2]^{-1} [X_2′(I − X_1(X_1′X_1)^{-1}X_1′)Y] = [X_2′M_1X_2]^{-1}[X_2′M_1Y],

where M_1 = I − X_1(X_1′X_1)^{-1}X_1′ can be interpreted as a residual-maker matrix; i.e., premultiplying any conformable matrix by M_1 will generate the residuals associated with a regression on X_1. Note the following:

- Define Y_1* = M_1Y.
- Define X_2* = M_1X_2.
- M_1 is symmetric and idempotent, i.e., M_1 = M_1′ and M_1 = M_1M_1.

This implies that we can write

b_2 = [X_2′M_1′M_1X_2]^{-1}[X_2′M_1′M_1Y] = [X_2*′X_2*]^{-1}[X_2*′Y_1*].

This is the result that makes multiple regression analysis so powerful for applied economics. We can interpret b_2 as the impact of X_2 on Y while "partialing out" or "netting out" the effect of X_1. The results for b_1 are analogous.

2.5 Goodness of Fit and Analysis of Variance

We will now assess how well the regression model fits the data. Begin by writing the sample regression equation Y = Xb + e in deviation-from-mean form using the matrix

M⁰ = I − (1/n) ii′,

where i is the unit column vector. We can then write

Y − ȳi = M⁰Y = M⁰(Xb + e).  (7)

Premultiplying (7) by itself transposed, and noting that M⁰ is a symmetric and idempotent matrix, gives

(Y − ȳi)′(Y − ȳi) = Y′M⁰Y = b′X′M⁰Xb + e′e,

or SST = SSR + SSE, where the three terms stand for the total, regression and error sums of squares, respectively. A natural measure of goodness of fit is

R² = SSR/SST = 1 − SSE/SST.

A few notes about R²:
- 0 ≤ R² ≤ 1.
- By adding additional explanatory variables, you can never make R² smaller. An alternative measure is the adjusted R², R̄² = 1 − [(n − 1)/(n − k)](1 − R²). This measure adds a penalty for additional explanatory variables.
- Be cautious interpreting R²
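The least squares formula, the residual algebra, and the partialing-out result can all be checked numerically. The data below are simulated purely for the demo (they are not the crime dataset), and the variable names are my own.

```python
import numpy as np

# Verify b = (X'X)^{-1}X'Y, the residual identities, and the
# partialing-out (Frisch-Waugh) result on simulated data (demo only).
rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)       # full-regression estimates
e = y - X @ b                               # residuals

# Residual-maker for X1 = [1, x1], then the partialed-out slope on x2:
# b2 = (x2' M1 x2)^{-1} x2' M1 y.
X1 = X[:, :2]
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
z2 = M1 @ x2                                # x2 with X1 netted out
b2_fwl = (z2 @ (M1 @ y)) / (z2 @ z2)
```

`b2_fwl` matches the last element of `b`; with a constant in X, the residuals sum to zero and are orthogonal to the fitted values, as the algebra of least squares says.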
when no constant is included.
- The value of R² will depend on the type of data; e.g., cross-sectional data tend to produce low R²'s and time series data often produce high R²'s.
- Comparing R²'s requires comparable dependent variables.

ECON 5340 Class Notes
Chapter 4: Finite Sample Properties of the LS Estimator

1 Introduction

We now wish to compare the properties of the LS estimator to other potential estimators. This analysis will be exact and will hold for any sample size.

2 Gauss-Markov Theorem

The Gauss-Markov Theorem states that, provided the classical assumptions hold, the ordinary least squares (OLS) estimator b is the minimum variance estimator among all linear unbiased estimators. Sometimes it is said that the OLS estimator is BLUE (Best Linear Unbiased Estimator).

First, we need to show that b is unbiased. We know

b = (X′X)^{-1}X′Y = (X′X)^{-1}X′(Xβ + ε) = β + (X′X)^{-1}X′ε.

Taking expectations gives

E[b] = β + (X′X)^{-1}X′E[ε] = β

because β and X are not random variables (recall X is assumed to be fixed in repeated sampling). Therefore, b is an unbiased estimator of β.

Second, we need to show that b has the smallest variance among all linear unbiased estimators. Begin by noting that b is a linear estimator because it is linear in Y (or, alternatively, ε). Now consider all other possible linear unbiased estimators b_0 = CY, where C is a fixed k x n matrix. For b_0 to be unbiased, it must be that CX = I, because E[b_0] = E[CXβ + Cε] = CXβ. The variance of b is

Var(b) = E[(b − β)(b − β)′] = E[(X′X)^{-1}X′εε′X(X′X)^{-1}] = σ²(X′X)^{-1}.

The variance of b_0 is

Var(b_0) = E[Cεε′C′] = σ²CC′.

The question is now whether (X′X)^{-1} or CC′ is bigger in a matrix sense. Toward that end, define D ≡ C − (X′X)^{-1}X′, so that DX = 0. Using this, we can write

Var(b_0) = σ²[(X′X)^{-1}X′ + D][X(X′X)^{-1} + D′] = σ²(X′X)^{-1} + σ²DD′ = Var(b) + σ²DD′.

Finally, we note that DD′ is a nonnegative definite matrix (Greene, A-114), so that the variance of b is no larger than the variance of b_0.

A few notes about the Gauss-Markov Theorem:
- Notice that no distributional assumptions were necessary.
- b is a random variable with a sampling distribution and associated sampling
variance Var(b).
- If ε|X ~ N(0, σ²I), then b ~ N(β, σ²(X′X)^{-1}) and b is best among all linear and nonlinear estimators.
- Keep in mind that some biased estimators may have smaller MSEs.
- If b is BLUE for β, then any linear combination of the b's is BLUE for the same linear combination of the β's.
- The Gauss-Markov Theorem holds for stochastic regressors as well, provided the standard assumptions for X and ε are made.

3 Estimating the Variance of the LS Estimator

We know Var(b) = σ²(X′X)^{-1}, but σ² is an unknown parameter. Therefore, in order to find an estimate of Var(b), we need to find a good estimator for σ². Start by defining M = I − X(X′X)^{-1}X′, which is symmetric and idempotent. This matrix can be used to relate the residuals e to the errors ε:

e = MY = M(Xβ + ε) = Mε.

Using this relation, we can then find an unbiased estimator for σ². Begin by finding the expectation of the inner product of e:

E[e′e] = E[ε′M′Mε] = E[ε′Mε] = E[tr(ε′Mε)].

The last equality uses the fact that the trace (tr) of a scalar is simply the scalar. This can be further manipulated to give

E[tr(ε′Mε)] = E[tr(Mεε′)],

using Greene (A-94). Taking expectations through the trace operator then gives

E[tr(Mεε′)] = σ²tr(M) = σ²[tr(I_n) − tr(X(X′X)^{-1}X′)],

which, after using Greene (A-94) again, produces

σ²[n − tr((X′X)^{-1}X′X)] = σ²(n − k).

Therefore, if we define s² = e′e/(n − k), we know it will be an unbiased estimator for σ², and the estimated variance of b is s²(X′X)^{-1}. The square root of the estimated variance of b is often called the standard error of b.

4 Inference in Least Squares Regressions

Now reintroduce classical assumption 6: ε ~ N(0, σ²I). Because b is a linear function of ε, we then know that b ~ N(β, σ²(X′X)^{-1}). In this section, we will take a first pass at testing some simple hypotheses regarding the population regression equation.

4.1 Single Coefficient Tests

Consider two different cases: σ² is known to the econometrician and, more realistically, σ² is not known to the econometrician. We wish to test whether a single element of the vector β is equal to zero or not.

- σ² known. Assume we are testing the null hypothesis H_0: β_k = 0 against the alternative
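The trace result E[e′e] = σ²(n − k) behind s² can be seen in a small Monte Carlo. The design matrix, parameter values and replication count below are my own choices for the demo.

```python
import numpy as np

# Monte Carlo check that E[e'e] = sigma^2 * (n - k), so s^2 = e'e/(n-k)
# is unbiased for sigma^2. Design and parameter values are assumptions.
rng = np.random.default_rng(2)
n, k, sigma2 = 25, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual-maker
# tr(M) = n - k, which is where the degrees-of-freedom correction comes from.

s2_draws = []
for _ in range(20000):
    eps = rng.normal(0.0, np.sqrt(sigma2), n)
    e = M @ eps                                     # residuals e = M*eps
    s2_draws.append(e @ e / (n - k))

mean_s2 = float(np.mean(s2_draws))                  # should be near sigma2
```

Averaged over many draws, `mean_s2` sits close to the true σ² = 4, while dividing by n instead of n − k would be biased downward.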
hypothesis H_A: β_k ≠ 0. We can form the statistic

z_k = (b_k − β_k)/(σ√S^{kk}),

where S^{kk} is the k-th diagonal element of (X′X)^{-1}. Under the null hypothesis, z_k will have a standard normal N(0, 1) distribution.

- σ² unknown. Testing the same hypothesis, we want to replace σ with s. Doing so means z_k no longer has a standard normal distribution. Instead, we form the statistic

t_k = (b_k − β_k)/s_{b_k},  where s_{b_k} = s√S^{kk},

which has a Student's t distribution with n − k degrees of freedom. This is motivated by the fact that the ratio of z_k (standard normal) to the square root of (n − k)s²/σ² (a chi-squared distribution) divided by its degrees of freedom has a Student's t distribution. This also relies on independence between z_k and (n − k)s²/σ².

4.2 Confidence Intervals

Confidence intervals provide a convenient method of presenting the same information as in the hypothesis tests above. Below are the confidence intervals for two population parameters, β_k and σ².

- The (1 − λ) x 100% confidence interval for β_k can be found by noting that

Pr(−t_{λ/2} < (b_k − β_k)/s_{b_k} < t_{λ/2}) = 1 − λ.

Rearranging, the confidence interval for β_k is [b_k − t_{λ/2}s_{b_k}, b_k + t_{λ/2}s_{b_k}].

- The (1 − λ) x 100% confidence interval for σ² can be found by noting that

Pr(χ²_{1−λ/2} < (n − k)s²/σ² < χ²_{λ/2}) = 1 − λ.

Rearranging, the confidence interval for σ² is [(n − k)s²/χ²_{λ/2}, (n − k)s²/χ²_{1−λ/2}].

4.3 Overall Significance

Now assume that we wish to assess the overall significance of the regression model. That is, we want to test whether or not the explanatory variables explain a significant amount of the variation in the dependent variable. The null hypothesis in this case is H_0: β_2 = ... = β_k = 0. The alternative hypothesis is that the null is false. Not surprisingly, we can use the R² to execute the test. The test statistic is

F = [R²/(k − 1)] / [(1 − R²)/(n − k)],

which under the null hypothesis has an F distribution with k − 1 and n − k degrees of freedom in the numerator and denominator, respectively. A couple of notes:
- If the true model is y_i = β_1 + ε_i, then no variation in y_i around its mean of β_1 is explained and R² = 0.
- It is possible for all individual
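In the bivariate case (k = 2) the R²-based overall F statistic reduces to the square of the slope's t statistic, which gives a handy numerical check. The data below are simulated; all settings are my own.

```python
import numpy as np

# With one slope coefficient (k = 2), the overall-significance statistic
# F = [R^2/(k-1)] / [(1-R^2)/(n-k)] equals t^2 for that slope.
rng = np.random.default_rng(3)
n, k = 40, 2
x = rng.normal(size=n)
y = 0.5 + 0.3 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - k)                        # unbiased sigma^2 estimate

Skk = np.linalg.inv(X.T @ X)[1, 1]          # diagonal element of (X'X)^{-1}
t_slope = b[1] / np.sqrt(s2 * Skk)          # t statistic for the slope

R2 = 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))
F = (R2 / (k - 1)) / ((1.0 - R2) / (n - k))
```

The identity F = t² holds exactly here, so either statistic leads to the same conclusion about the slope.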
coefficients to be insignificant while jointly they are significant.

4.4 Example: Prison Population and Violent Crimes (cont.)

In this example, we will use Gauss to perform three different hypothesis tests on the regression equation

y_i = β_1 + β_2 x_{2i} + β_3 x_{3i} + ε_i,

where the new variable x_{3i} is the unemployment rate for the i-th state.

1. Since more violent crimes and higher unemployment should lead to a higher prison population, we test

H_0: β_2 = 0 versus H_A: β_2 > 0
H_0: β_3 = 0 versus H_A: β_3 > 0.

The critical t value with df = n − k = 48 at the 5% significance level is 1.68. The estimated t values are 10.32 and −1.36, respectively. As a result, we reject the first null in favor of a positive relationship between violent crimes and the prison population, and fail to reject the second null of a non-positive relationship between the unemployment rate and the prison population.

2. The 95% confidence interval for β_2 is [0.038, 0.056]. The interval is centered on the point estimate b_2 = 0.047, which implies that each additional violent crime (per 100,000 people) leads to a 0.047 increase in the prison population (per 10,000 people), holding constant the unemployment rate.

3. The overall significance test is H_0: β_2 = β_3 = 0 versus H_A: H_0 is false. The 5% critical F value with k − 1 = 2 degrees of freedom in the numerator and n − k = 48 degrees of freedom in the denominator is 3.21. The estimated F value is 53.46. Therefore, we reject the null that all the slope coefficients are jointly zero. The model does indeed explain a significant amount of the variation in the prison population across states. See Gauss example 4.1 for more details.

5 Data Problems

Data problems such as multicollinearity, missing observations and outliers will be covered later.

ECON 5340 Class Notes
Chapter 8: Specification Analysis, Model Selection and Data Problems

1 Introduction

Most of this chapter is concerned with choosing the correct regression model. That is, how do you choose between competing models, and if you get it wrong, what are the consequences? The last part of these notes deals with several different practical problems that may occur in the
data.

2 Specification Analysis

2.1 Omission of Relevant Variables

Suppose that the "true" regression model is

Y = X_1β_1 + X_2β_2 + ε,  (1)

where X_1 is an n x k_1 matrix and X_2 is an n x k_2 matrix. Now assume that the researcher mistakenly estimates

Y = X_1β_1 + ε.  (2)

The least squares estimate of β_1 is

b_1 = (X_1′X_1)^{-1}X_1′Y = (X_1′X_1)^{-1}X_1′(X_1β_1 + X_2β_2 + ε) = β_1 + (X_1′X_1)^{-1}X_1′X_2β_2 + (X_1′X_1)^{-1}X_1′ε.

Taking expectations then gives

E[b_1] = β_1 + (X_1′X_1)^{-1}X_1′X_2β_2.

This implies that b_1 is a biased estimator of β_1 unless (1) β_2 = 0, which means that equation (2) was the "true" model and X_2 was not really relevant, or (2) X_1 and X_2 are orthogonal. Neither of these is likely to be true, so omitting relevant variables produces biased estimates of the coefficients.

Although b_1 is biased, its variance will not be larger (and is likely to be smaller) than that of the LS estimator for β_1 when X_2 is included; call this estimator b_{1.2}. These two variances are

Var(b_1) = σ²(X_1′X_1)^{-1}
Var(b_{1.2}) = σ²(X_1′M_2X_1)^{-1} = σ²[X_1′X_1 − X_1′X_2(X_2′X_2)^{-1}X_2′X_1]^{-1},

where M_2 is the "residual maker" matrix for X_2. Note, however, that the estimates of Var(b_1) and Var(b_{1.2}) may not reflect this ordering, because s² is a biased estimator of σ² when excluding X_2 from the model.

2.2 Pretest Estimators

At least on a mean-square error basis, it is not clear which estimator is better, b_1 or b_{1.2}. A third and quite popular choice is the so-called pretest estimator, call it b_1*. This estimator is a mix of the previous two. First you estimate model (1) and then perform a statistical test to see if X_2 belongs in the model. If you reject the null (X_2 does matter), then you settle on b_{1.2}; otherwise you choose b_1. Using an F test, we can write

E[b_1*] = Pr(F < F_α)E[b_1] + Pr(F > F_α)E[b_{1.2}] ≠ β_1.

Therefore, b_1* is a biased estimator unless the F test is designed to always reject the null hypothesis (size = 1). The variance of b_1* is nontrivial to calculate. The Gauss example below performs a Monte Carlo experiment to see which of these three estimators performs better on a mean-square error basis.

2.2.1 Gauss Example: MSE Comparison of a Pretest Estimator

For this
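The omitted-variable bias formula E[b_1] = β_1 + (X_1′X_1)^{-1}X_1′X_2β_2 can be illustrated by simulation. All numbers below (sample size, coefficients, the correlation between the regressors) are my own choices, not the notes' experiment.

```python
import numpy as np

# Monte Carlo illustration of omitted-variable bias: regressing y on
# x1 alone when the true model also contains a correlated x2.
rng = np.random.default_rng(4)
n, reps = 200, 5000
beta1, beta2 = 1.0, 2.0

x1 = rng.normal(size=n)
x2 = 0.7 * x1 + 0.5 * rng.normal(size=n)    # x2 correlated with x1

short_slopes = []
for _ in range(reps):
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    short_slopes.append((x1 @ y) / (x1 @ x1))   # short regression omits x2

mean_short = float(np.mean(short_slopes))
# Theoretical expectation for the short-regression slope:
expected = beta1 + beta2 * (x1 @ x2) / (x1 @ x1)
```

With x2 omitted and correlated with x1, the short-regression slope centers on `expected` (well above β_1 = 1), matching the bias formula term by term.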
experiment, we let y_i = β_1 + β_2 x_{2i} + β_3 x_{3i} + ε_i and examine the mean square error of three estimators of β_2: b_2, b_{2.3} and b_2*. For given values of the independent variables, we then draw 2000 different samples, each of size n = 50. See Gauss example 8.1 for more details.

2.3 Inclusion of Irrelevant Variables

Now assume that the "true" regression model is Y = X_1β_1 + ε and the researcher mistakenly estimates Y = X_1β_1 + X_2β_2 + ε. As shown earlier, the estimator for β_1 from the latter model is

b_{1.2} = (X_1′M_2X_1)^{-1}X_1′M_2Y = (X_1′M_2X_1)^{-1}X_1′M_2(X_1β_1 + ε) = β_1 + (X_1′M_2X_1)^{-1}X_1′M_2ε,

which is clearly unbiased. However, as shown above, there is a cost involved with including the unnecessary regressors X_2: the variance of b_{1.2} is inflated relative to the correct estimator b_1.

3 Choosing Between Nonnested Models

Sometimes we want to choose between two models that are not nested. For example, maybe we want to distinguish between model 1, Y = Xβ + ε, and model 2, Y = Zγ + δ. Assuming X and Z each have a variable not included in the other, neither model can be written as a special case of the other, and no simple t or F test can reject one model in favor of the other. One solution to this problem is to artificially nest the two models in the compound model

Y = (1 − α)Xβ + αZγ + ε,

where 0 ≤ α ≤ 1.

3.1 J Test

The J test of Davidson and MacKinnon (1981) is designed to test whether α = 0 (model 1) or α = 1 (model 2). Normally we would just estimate α and run a simple t test. The problem is that α is not identified; it is nothing more than an arbitrary scaling of β and γ. The J test solves this problem using the following procedure:

1. Estimate γ by a LS regression of Y on Z.
2. Estimate β and α by a LS regression of Y on X and Zγ̂.
3. Using α̂, carry out an asymptotic t test of the null hypothesis H_0: α = 0.
4. Reverse the order in (1)-(3) and test the null hypothesis H_0: α = 1.

Unfortunately, in finite samples there are four possible outcomes: reject both nulls, fail to reject both nulls, and reject one or the other of the nulls.

4 Model Selection Criteria

The J test is implicitly designed to distinguish between models
based on goodness-of-fit within the sample. An example of a less sophisticated model-selection criterion would be to see which model produces a greater R². The problem with these approaches is that what works well in-sample may not work so well out-of-sample. In this case, we need a penalty for over-parameterizing a model. Here are some options:

1. Choose explanatory variables to maximize the adjusted R², R̄² = 1 − [(n − 1)/(n − k)](1 − R²), or alternatively minimize s².
2. Choose explanatory variables to minimize the Akaike Information Criterion, AIC = ln(e′e/n) + 2k/n.
3. Choose explanatory variables to minimize the Schwarz Criterion, SC = ln(e′e/n) + k ln(n)/n.

These three criteria will tend to produce increasingly parsimonious models, as the penalty for additional explanatory variables increases.

5 Data Problems

This section is an eclectic collection of practical data problems.

5.1 Multicollinearity

There are two types of multicollinearity (MC): perfect and imperfect. Perfect MC violates the classical assumption that the X matrix is of full rank, in which case OLS cannot be calculated. This section deals with imperfect MC between the explanatory variables, in which case OLS can be calculated.

5.1.1 Properties of the OLS Estimator

Given that imperfect MC does not violate any of the classical assumptions, we know that the Gauss-Markov theorem still holds and b is the best linear unbiased estimator. This is a surprising result to some, but it simply means that, given the multicollinear regressors, there is no better way than OLS to estimate the population parameters. Of course, all else equal, having less multicollinear regressors would produce more reliable estimates (smaller standard errors), but that is not an option.

5.1.2 Detection

The first two procedures to detect MC involve using simple correlations and variance inflation factors (VIFs).

1. Simple correlation coefficients. The easiest method to detect MC is to print out a matrix of simple pairwise correlation coefficients between the explanatory variables and look for values close to one in absolute value, say greater than
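The three criteria in the list above can be computed side by side. The snippet below compares a one-regressor and a two-regressor fit of the same simulated data, where the second regressor is pure noise; the data and design are my own assumptions.

```python
import numpy as np

# Compute AIC and the Schwarz criterion for two nested models; the
# extra regressor in model B is pure noise (data and design assumed).
rng = np.random.default_rng(5)
n = 80
x1 = rng.normal(size=n)
x_noise = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + rng.normal(size=n)

def sse_of(X):
    """Sum of squared residuals from an OLS fit of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return float(e @ e)

def aic(sse, k):
    return np.log(sse / n) + 2.0 * k / n

def schwarz(sse, k):
    return np.log(sse / n) + k * np.log(n) / n

sse_A = sse_of(np.column_stack([np.ones(n), x1]))            # k = 2
sse_B = sse_of(np.column_stack([np.ones(n), x1, x_noise]))   # k = 3
```

Adding the noise regressor always lowers the sum of squared errors a little, but the penalties can still rank the smaller model first; since ln(n) > 2 for n ≥ 8, the Schwarz penalty is the harsher one, which is why it tends to pick the more parsimonious model.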
0.8 in magnitude.

2. Variance inflation factors. The problem with pairwise correlation coefficients is that they can miss more sophisticated forms of multicollinearity that involve multiple explanatory variables. VIFs are calculated according to

VIF(b_j) = 1/(1 − R_j²),

where R_j² is the coefficient of determination for a regression of the j-th explanatory variable on all other explanatory variables. It is interpreted as the amount Var(b_j) is inflated relative to the case of no MC.

Another approach to detecting MC is the diagnostic approach of looking for signs of MC in the OLS results. Under severe MC, OLS properties include:

1. Small changes in the data (e.g., eliminating a single observation or variable) can cause large changes in the b's.
2. High R²'s and low t's.
3. Unexpected signs on the b's (of course, this could also be caused by an inappropriate theory, so be cautious).

5.1.3 Solutions

There are many ways to handle MC, and none of the potential solutions is uniformly the best. Here are some options:

1. Do nothing. Recall that OLS is still BLUE.
2. Transform the data. Taking ratios, linear combinations or first-differences of the explanatory variables can often reduce MC.
3. Drop variables. This is probably the most common solution. Many researchers use economic theory, common sense and initial regression results to choose variables to drop. You need to be very careful, however, not to drop a relevant variable, because doing so will bias all the remaining estimates.
4. Mechanical approaches. Routines such as ridge regression and principal components are options, but they are not widely accepted by the discipline.

5.2 Measurement Error

Many economic variables are measured with error. For example, the consumer price index is calculated from a sample of prices across many metropolitan areas; it tends to miss new goods, often fails to account for improvements in existing goods, and doesn't fully recognize consumers' ability to substitute toward cheaper goods. Survey data are also often measured with error, as respondents misstate their true behavior or
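The VIF formula is a one-liner once you can run the auxiliary regression of one regressor on the others. The nearly collinear design below is simulated for illustration; the coefficients are my own.

```python
import numpy as np

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the
# other explanatory variables. x2 is built to be collinear with x1.
rng = np.random.default_rng(6)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                      # unrelated regressor

def vif(target, others):
    """VIF for `target` given a list of other regressor arrays."""
    X = np.column_stack([np.ones(n)] + others)
    b = np.linalg.solve(X.T @ X, X.T @ target)
    e = target - X @ b
    tss = (target - target.mean()) @ (target - target.mean())
    r2 = 1.0 - (e @ e) / tss
    return 1.0 / (1.0 - r2)

vif_x1 = vif(x1, [x2, x3])   # large: x1 is nearly spanned by x2
vif_x3 = vif(x3, [x1, x2])   # near 1: x3 is unrelated to the others
```

A VIF near 1 means the auxiliary R² is near zero (no inflation); here `vif_x1` is huge while `vif_x3` stays near 1, matching the pairwise-correlation diagnosis.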
characteristics. Let's consider two types of measurement error.

5.2.1 Measurement Error in the Dependent Variable

Assume the true model is

y_i* = β_1 + β_2 x_i + ε_i,

where y_i* represents the true and unobserved value of the dependent variable. The researcher, unfortunately, is endowed with y_i = y_i* + μ_i, a noisy measure of y_i*. Rewriting gives

y_i = β_1 + β_2 x_i + (ε_i + μ_i).

Therefore, as long as μ_i is iid and uncorrelated with x_i, the OLS estimates of the β's will be BLUE.

5.2.2 Measurement Error in the Independent Variables

Now assume the true model is

y_i = β_1 + β_2 x_i* + ε_i,  (4)

where x_i* represents the true and unobserved value of the independent variable. The researcher, unfortunately, is endowed with x_i = x_i* + μ_i, a noisy measure of x_i*. Rewriting (4) gives

y_i = β_1 + β_2 x_i + (ε_i − β_2 μ_i) = β_1 + β_2 x_i + ε_i*.

It is clear that Corr(x_i, ε_i*) ≠ 0, which violates a classical assumption and will result in biased and inconsistent estimates of β_2. In fact,

Cov(x_i, ε_i*) = Cov(x_i* + μ_i, ε_i − β_2 μ_i) = −β_2 σ_μ²,

and the inconsistency in b_2 (measuring the variables in their deviation-from-the-mean form) is given by

plim b_2 = plim (Σ_i x_i y_i / Σ_i x_i²).

Using Slutsky's theorem and Q* = plim (1/n) Σ_i x_i*², we can show that

plim b_2 = β_2 Q*/(Q* + σ_μ²),

so if β_2 > 0, b_2 is biased downward, and if β_2 < 0, it is biased upward. This matches the fact that Corr(x_i, ε_i*) < 0 when β_2 > 0 and Corr(x_i, ε_i*) > 0 when β_2 < 0, which causes b_2 to be biased toward zero. Signing the bias is much more complicated in a multivariate setting. Finally, the typical solution is instrumental variables estimation; that is, find a proxy variable for x_i that is not correlated with the measurement error μ_i.

5.3 Missing Observations

A third practical problem with economic data is missing observations, i.e., "holes" in your dataset. This is a common occurrence in survey data, as people refuse to answer questions. If observations for certain questions are missing, there are several options:

1. Eliminate the entire row (entire observation) from the dataset. There are two problems with this approach. First, missing observations are often not random, so eliminating them will produce a sample that is not representative
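The attenuation result plim b_2 = β_2 Q*/(Q* + σ_μ²) shows up clearly in simulation. All parameter values below are my own choices for the demo.

```python
import numpy as np

# Attenuation bias from measurement error in a regressor:
# plim b2 = beta2 * Q* / (Q* + sigma_mu^2), with Q* = var(x*).
rng = np.random.default_rng(7)
n = 200_000
beta2 = 2.0
q_star, sigma_mu2 = 1.0, 0.5

x_true = rng.normal(0.0, np.sqrt(q_star), n)
x_obs = x_true + rng.normal(0.0, np.sqrt(sigma_mu2), n)  # noisy measure
y = beta2 * x_true + rng.normal(size=n)

# Deviation-from-mean slope estimate using the mismeasured regressor.
xd, yd = x_obs - x_obs.mean(), y - y.mean()
b2 = (xd @ yd) / (xd @ xd)

plim_b2 = beta2 * q_star / (q_star + sigma_mu2)   # = 4/3 here
```

With a large n, the estimate sits near the probability limit 4/3 rather than the true β_2 = 2: the coefficient is pulled toward zero, exactly as the covariance calculation predicts.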
of the population (e.g., maybe old people are reluctant to state their age). Second, this often leaves you with too few remaining observations.

2. Replace the missing value with the sample mean. If the entire row of the X matrix is missing, this is no different than entirely eliminating the observation. Furthermore, if missing values are systematically related to X, the sample mean may not be a representative estimate of the true value of X.

3. Dummy variable approach. Create a new dummy variable for each variable that has missing observations (provided they are missing in different rows) and add the dummies to the X matrix. In this fashion, the researcher is using all the available observations on an explanatory variable in calculating the corresponding coefficient. One downside is that, like 1 and 2 above, it assumes that the observations are missing at random, which is not always the case.

4. Sophisticated interpolation. There are several available routines that allow one to use in-sample and out-of-sample information to make a more sophisticated (than the unconditional mean) guess at the missing value. Little is known about the properties of these estimators, and what is known typically comes from simulation exercises in special contexts.

ECON 5340 Class Notes: Chapter 5, Large Sample Properties of the LS Estimator

1 Introduction

Unlike in the simple linear regression model, in many cases we cannot calculate the exact distribution of our estimators. This is generally true when we relax Classical assumption 6, which we do here. Fortunately, however, we can often calculate approximate distributions that hold when the sample size is large. This is the focus of Chapter 5.

2 Consistency of b

Recall that a consistent estimator has the following property:

lim_{n \to \infty} Pr(|b_k - \beta_k| < \delta) = 1

for any positive \delta. It is said that the probability limit of b_k is \beta_k; that is, plim b_k = \beta_k. Next we are going to establish the consistency of b. Continue to assume that X is nonstochastic and

lim_{n \to \infty} (1/n) X'X = Q

is a positive-definite, finite matrix. This condition
is fairly restrictive (less restrictive assumptions can be used) and guarantees that the explanatory data are "well-behaved" in the sense that their variance does not get too large. Here is a counter-example.

Counterexample. Consider the time-series model y_t = \beta_1 + \beta_2 t + \epsilon_t, where t = 1, ..., n. In this case,

X'X = [ n, \sum_t t ; \sum_t t, \sum_t t^2 ] = [ n, n(n+1)/2 ; n(n+1)/2, n(n+1)(2n+1)/6 ],

so

lim_{n \to \infty} (1/n) X'X = [ 1, \infty ; \infty, \infty ],

which is not a finite matrix, so the condition fails.

To show consistency, rewrite b as b = \beta + (X'X)^{-1} X'\epsilon. Taking the probability limit gives

plim b = \beta + plim[(1/n) X'X]^{-1} plim[(1/n) X'\epsilon] = \beta + Q^{-1} \times 0 = \beta,

where plim[(1/n) X'X]^{-1} = [plim (1/n) X'X]^{-1} via Slutsky's Theorem (Greene, Theorem D.12) and plim (1/n) X'\epsilon = 0 because (1/n) X'\epsilon converges in mean square to zero (Greene, Theorem D.11). As a result, plim b = \beta, or b is a consistent estimator of \beta.

3 Asymptotic Distribution of b

Continue to assume that X is nonstochastic, lim_{n \to \infty} (1/n) X'X = Q, and \epsilon \sim (0, \sigma^2 I). Because b is a consistent estimator of \beta, the limiting distribution of b is degenerate (i.e., a spike at \beta). However, using the Central Limit Theorem, we can take a stabilizing transformation of b to produce a non-degenerate limiting distribution:

\sqrt{n}(b - \beta) \to_d N(0, \sigma^2 Q^{-1}).

This result suggests that in large samples we can approximate the distribution of b as N(\beta, (\sigma^2/n) Q^{-1}). We call this the asymptotic distribution of b, or b \sim_a N(\beta, (\sigma^2/n) Q^{-1}). A few notes:

- If f(b) is continuous, then f(b) \sim_a N(f(\beta), (\sigma^2/n) \Gamma Q^{-1} \Gamma'), where \Gamma = \partial f(\beta)/\partial \beta'.
- If we add that \epsilon \sim N(0, \sigma^2 I), then b has the exact distribution N(\beta, \sigma^2 (X'X)^{-1}).
- Since \sigma^2 and Q are unknown, we can replace them with s^2 and (1/n) X'X, respectively. This produces our standard estimate, Est. Var(b) = s^2 (X'X)^{-1}, which is also a consistent estimate of the unknown asymptotic variance.

4 Asymptotic Behavior of Test Statistics

Earlier we assumed that the errors were normal, i.e., \epsilon \sim N(0, \sigma^2 I), and showed that

t = (b_k - \beta_k)/s_{b_k} \sim t_{n-k}   and   F \sim F_{k-1, n-k}.

If instead we did not impose normality, i.e., \epsilon \sim (0, \sigma^2 I), then we can show

z = (b_k - \beta_k)/s_{b_k} \to_a N(0, 1)   and   (k-1) F \to_a \chi^2(k-1).

5 Instrumental Variables and 2SLS

Covered later.

6 Measurement Error

Covered later.

7 Normally Distributed Disturbances

Now we will reinstate the assumption that \epsilon \sim
N(0, \sigma^2 I).

7.1 Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is an alternative estimation criterion to least squares. The principle of MLE is to select the parameters of the model so as to maximize the likelihood, or probability, that the data were generated by the model. Given the model Y = X\beta + \epsilon and the assumption on the errors above, we can write the joint probability, or likelihood, function as

L = (2\pi\sigma^2)^{-n/2} \exp(-\epsilon'\epsilon/(2\sigma^2)) = (2\pi\sigma^2)^{-n/2} \exp(-(Y - X\beta)'(Y - X\beta)/(2\sigma^2)).

We could maximize this likelihood directly by choosing the unknown parameters \beta and \sigma^2; however, it is often easier to work with the log-likelihood function

ln L = -(n/2) ln(2\pi) - (n/2) ln \sigma^2 - (1/(2\sigma^2))(Y - X\beta)'(Y - X\beta).

Taking derivatives with respect to \beta and \sigma^2 and setting them equal to zero gives

\partial ln L / \partial \beta = (1/\sigma^2) X'(Y - X\beta) = 0
\partial ln L / \partial \sigma^2 = -n/(2\sigma^2) + (1/(2\sigma^4))(Y - X\beta)'(Y - X\beta) = 0.

Solving these equations jointly for \beta and \sigma^2 produces the ML estimates

\hat{\beta}_{ML} = (X'X)^{-1} X'Y = b,   \hat{\sigma}^2_{ML} = e'e/n.

From our earlier LS analysis, we know that \hat{\beta}_{ML} is the best linear unbiased estimator of \beta and \hat{\sigma}^2_{ML} is a biased estimate of \sigma^2 (recall s^2 = e'e/(n-k) is unbiased). All ML estimates, subject to some weak regularity conditions, have the following properties:

- Consistency.
- Asymptotic efficiency (i.e., among all consistent estimates, they have the smallest asymptotic variance).
- Asymptotic normality.
- Invariance (i.e., if \hat{\theta}_{ML} is the ML estimate of \theta, then g(\hat{\theta}_{ML}) is the ML estimate of g(\theta) for continuous g).

One potential drawback of MLE is that it requires the user to know the distribution of the errors, which is often assumed normal. Fortunately, there are tests based on the skewness and kurtosis of the residuals (e.g., the Bera-Jarque test) that allow one to test this assumption.

7.2 Wald, Lagrange Multiplier and Likelihood Ratio Tests

The standard t and F tests require the errors to be normally distributed. If the errors are not normally distributed, however, we can rely on the CLT and the fact that b is asymptotically normal. Assume we wish to test the (possibly nonlinear) null hypothesis H_0: g(\beta) = 0. Below are three tests that will generally give different answers in small samples but are
asymptotically equivalent.

Wald Test:

W = g(b)' [G(b) s^2 (X'X)^{-1} G(b)']^{-1} g(b) \to_a \chi^2(J),

where G(b) = \partial g(b)/\partial b' and J is the number of restrictions in the null.

Likelihood Ratio Test:

LR = -2(ln L^* - ln L) = n(ln e^{*}{}'e^{*} - ln e'e) \to_a \chi^2(J),

where an asterisk indicates the restricted (null hypothesis imposed) likelihood and residuals.

Lagrange Multiplier Test:

LM = n R^2_* \to_a \chi^2(J),

where R^2_* is the coefficient of determination from a regression of e^* on X.
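The attenuation result in Section 5.2.2, plim b_2 = \beta_2 Q^*/(Q^* + \sigma_\mu^2), can be checked with a short simulation. This is an illustrative sketch, not part of the original notes: the sample size, variances, and seed are arbitrary choices, and the probability limit is approximated by a large-n OLS slope.

```python
import numpy as np

# Illustrative check of attenuation bias from measurement error in a regressor.
# Parameter values and seed are arbitrary choices, not from the notes.
rng = np.random.default_rng(0)
n = 200_000
beta1, beta2 = 1.0, 2.0
q_star, sigma_mu2 = 1.0, 1.0          # var(x*) and var(measurement error)

x_star = rng.normal(0.0, np.sqrt(q_star), n)         # true, unobserved regressor
x = x_star + rng.normal(0.0, np.sqrt(sigma_mu2), n)  # observed, noisy regressor
y = beta1 + beta2 * x_star + rng.normal(0.0, 1.0, n)

# OLS slope from regressing y on the mismeasured x (deviation form)
b2 = np.cov(x, y, bias=True)[0, 1] / np.var(x)

# Theoretical probability limit: beta2 * Q* / (Q* + sigma_mu^2) = 2 * 1/2 = 1
plim_b2 = beta2 * q_star / (q_star + sigma_mu2)
print(round(b2, 2), plim_b2)   # b2 is pulled toward zero, well below beta2 = 2
```

With equal signal and noise variances, the slope is cut roughly in half, matching the formula.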
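The claim in Section 7.1 that \hat{\sigma}^2_{ML} = e'e/n is biased downward while s^2 = e'e/(n-k) is unbiased can be seen in a small Monte Carlo. This is a sketch under arbitrary assumptions (n = 20, k = 3, \sigma^2 = 4, and the seed are all illustrative choices).

```python
import numpy as np

# Monte Carlo comparison of the ML variance estimate e'e/n (biased) with
# s^2 = e'e/(n-k) (unbiased). The design below is arbitrary, for illustration.
rng = np.random.default_rng(1)
n, k, sigma2, reps = 20, 3, 4.0, 5000
beta = np.array([1.0, 2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed across reps

ml_draws, s2_draws = [], []
for _ in range(reps):
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS, which is also the MLE of beta
    e = y - X @ b
    ml_draws.append(e @ e / n)                 # ML estimate of sigma^2
    s2_draws.append(e @ e / (n - k))           # unbiased estimate of sigma^2

mean_ml, mean_s2 = float(np.mean(ml_draws)), float(np.mean(s2_draws))
# Theory: E[e'e/n] = sigma^2 (n-k)/n = 4 * 17/20 = 3.4, while E[s^2] = 4
print(round(mean_ml, 1), round(mean_s2, 1))
```

The gap between the two averages shrinks as n grows, which is why the bias of the ML estimate does not threaten its consistency.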
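The three tests in Section 7.2 can be computed side by side for the simple linear null H_0: \beta_2 = 0; in the linear model with the ML variance estimate e'e/n, their sample values satisfy W \ge LR \ge LM even though all three share the same \chi^2(1) limit. The data-generating process and seed below are illustrative only.

```python
import numpy as np

# Wald, likelihood ratio, and Lagrange multiplier statistics for H0: beta2 = 0
# in y = beta1 + beta2*x + eps. Illustrative sketch; uses the ML variance e'e/n.
rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.2 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b                    # unrestricted residuals
e_r = y - y.mean()               # restricted residuals (beta2 = 0 imposed)

# Wald: squared ratio of b2 to its standard error (ML variance estimate)
var_b = (e @ e / n) * np.linalg.inv(X.T @ X)
W = b[1] ** 2 / var_b[1, 1]

# LR: n * (ln e*'e* - ln e'e)
LR = n * np.log((e_r @ e_r) / (e @ e))

# LM: n * R^2 from regressing the restricted residuals on X
g = np.linalg.lstsq(X, e_r, rcond=None)[0]
fitted = X @ g
R2 = (fitted @ fitted) / (e_r @ e_r)   # e_r has mean zero by construction
LM = n * R2

print(round(W, 1), round(LR, 1), round(LM, 1))  # each ~ chi-squared(1) under H0
```

Writing a = e*'e*/e'e, the three statistics reduce to n(a-1), n ln a, and n(1 - 1/a), which makes the ordering W \ge LR \ge LM immediate since a \ge 1.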