ECONOMETRIC THY I ECON 583
These 83 pages of class notes were uploaded by Miss Adeline Weimann on Wednesday, September 9, 2015. The notes are for ECON 583 (Economics) at the University of Washington, taught by Eric Zivot in Fall.
Date Created: 09/09/15
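Example: Stylized consumption function (Campbell and Mankiw, 1990)

$$\Delta c_t = \delta_0 + \delta_1 \Delta y_t + \delta_2 r_t + \varepsilon_t = \delta' z_t + \varepsilon_t, \quad t = 1,\dots,T, \quad L = 3$$

where $c_t$ = the log of real per capita consumption (excluding durables), $y_t$ = the log of real disposable income, $r_t$ = the ex post real interest rate (T-bill rate minus inflation rate), and $z_t = (1, \Delta y_t, r_t)'$.

Note: See Zivot and Wang (2005), Chapter 21, for S-PLUS code to replicate this example.

Assumptions:
- $\{\Delta c_t, \Delta y_t, r_t\}$ are stationary and ergodic.
- $\{\varepsilon_t, I_t\}$ is a stationary and ergodic martingale difference sequence (MDS), where $I_t = \{\Delta c_s, \Delta y_s, r_s\}_{s=1}^{t}$ denotes the observed information set at time $t$.

Endogeneity and Instruments: The variables $\Delta y_t$ and $r_t$ are likely to be contemporaneously correlated with $\varepsilon_t$. Because $\{\varepsilon_t, I_t\}$ is a stationary and ergodic MDS, $E[\varepsilon_t \mid I_{t-1}] = 0$, which implies that any variable in $I_{t-1}$ is a potential instrument. For any variable $x_{t-1} \subset I_{t-1}$, $\{x_{t-1}\varepsilon_t\}$ is an uncorrelated sequence.

Data: Annual data over the period 1960 to 1995, taken from Wooldridge (2002).

Example: Testing the Permanent Income Hypothesis

The pure permanent income hypothesis (PIH), due to Hall (1978), states that $c_t$ is a martingale, so that $\Delta c_t = \varepsilon_t$ is a MDS. Hence the PIH implies the linear restrictions

$$H_0: \delta_1 = \delta_2 = 0,$$

which are of the form $R\delta = r$ with

$$R = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad r = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \mathrm{rank}(R) = 2.$$

If there are temporary income consumers then $\delta_1 > 0$.

The estimates in Table 1 are reported for the model with the coefficients indexed from one:

$$\Delta c_t = \delta_1 + \delta_2 \Delta y_t + \delta_3 r_t + \varepsilon_t,$$
$$x_t = (1, \Delta c_{t-1}, \Delta y_{t-1}, r_{t-1})', \quad E[x_t \varepsilon_t] = 0, \quad E[x_t x_t' \varepsilon_t^2] = S.$$

Table 1: GMM estimates of the consumption function parameters (standard errors in parentheses below the estimates; p-value in parentheses below J)

Estimator          delta_1    delta_2    delta_3    J
2-step             0.007      0.627      0.010      1.578
                   (0.004)    (0.150)    (0.098)    (0.209)
Iterated           0.008      0.591      0.032      1.855
                   (0.004)    (0.144)    (0.095)    (0.173)
CU                 0.008      0.574      0.054      1.747
                   (0.003)    (0.139)    (0.095)    (0.186)
1-step (W = I_4)   0.003      0.801      0.024
                   (0.005)    (0.223)    (0.116)
2SLS               0.008      0.586      0.027      2.018
                   (0.003)    (0.133)    (0.175)    (0.155)

GMM Wald Statistic

This is based on any of the unrestricted GMM estimates (efficient or inefficient). Using the iterated GMM estimate, the Wald statistic is

$$\mathrm{Wald} = n\left(R\hat\delta(\hat S_{iter}^{-1}) - r\right)' \left[R\,\widehat{\mathrm{avar}}\big(\hat\delta(\hat S_{iter}^{-1})\big) R'\right]^{-1} \left(R\hat\delta(\hat S_{iter}^{-1}) - r\right) = 16.99.$$

Since $\mathrm{rank}(R) = 2$, $\mathrm{Wald} \sim \chi^2(2)$. The p-value is 0.0002, so we reject the PIH at any reasonable level of significance.

GMM LR Statistic

This statistic can only be computed using an efficient GMM estimator (2-step,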
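iterated, CU, 2SLS). It is based on the difference between a restricted and an unrestricted J statistic.

The unrestricted model is

$$\Delta c_t = \delta_0 + \delta_1 \Delta y_t + \delta_2 r_t + \varepsilon_t, \quad t = 1,\dots,T.$$

The J statistic from the iterated efficient GMM estimation is

$$J\big(\hat\delta(\hat S_{iter}^{-1}), \hat S_{iter}^{-1}\big) = 1.855.$$

The restricted model imposes $H_0: \delta_1 = \delta_2 = 0$:

$$\Delta c_t = \delta_0 + \varepsilon_t, \quad t = 1,\dots,T.$$

To ensure a positive GMM LR statistic, the restricted model should be estimated using the unrestricted efficient weight matrix $\hat S_{iter}^{-1}$. Some software (e.g., EViews) cannot do this.

The J statistic from the restricted efficient GMM estimation is

$$J\big(\tilde\delta(\hat S_{iter}^{-1}), \hat S_{iter}^{-1}\big) = 18.8505.$$

The GMM LR statistic is then

$$LR_{GMM} = J\big(\tilde\delta(\hat S_{iter}^{-1}), \hat S_{iter}^{-1}\big) - J\big(\hat\delta(\hat S_{iter}^{-1}), \hat S_{iter}^{-1}\big) = 18.8505 - 1.855 \approx 16.99.$$

Note: $LR_{GMM} = \mathrm{Wald}$ since $H_0: \delta_1 = \delta_2 = 0$ is a linear hypothesis.

Example: Testing Endogeneity of $r_t$ in the consumption function

$$\Delta c_t = \delta_0 + \delta_1 \Delta y_t + \delta_2 r_t + \varepsilon_t, \quad t = 1,\dots,T, \qquad x_t = (1, \Delta c_{t-1}, \Delta y_{t-1}, r_{t-1})'.$$

Consider testing

$$H_0: E[r_t \varepsilon_t] = 0 \ (r_t \text{ is exogenous}) \quad \text{vs.} \quad H_1: E[r_t \varepsilon_t] \neq 0 \ (r_t \text{ is endogenous}).$$

Under $H_0$ the full set of instruments is

$$x_t = (1, \Delta c_{t-1}, \Delta y_{t-1}, r_{t-1}, r_t)'$$

and under $H_1$ the valid instruments are

$$x_{1t} = (1, \Delta c_{t-1}, \Delta y_{t-1}, r_{t-1})'.$$

Therefore $x_{2t} = r_t$ is the suspect instrument, with $K = 5$, $K_1 = 4$, and $K - K_1 = K_2 = 1$.

Unrestricted GMM based on $x_t = (1, \Delta c_{t-1}, \Delta y_{t-1}, r_{t-1}, r_t)'$ gives

$$J\big(\hat\delta(\hat S_{Full}^{-1}), \hat S_{Full}^{-1}\big) = 1.9528.$$

Restricted GMM based on $x_{1t} = (1, \Delta c_{t-1}, \Delta y_{t-1}, r_{t-1})'$ and $\hat S_{11,Full}^{-1}$ gives

$$J\big(\tilde\delta(\hat S_{11,Full}^{-1}), \hat S_{11,Full}^{-1}\big) = 1.9346.$$

The C statistic is therefore

$$C = J\big(\hat\delta(\hat S_{Full}^{-1}), \hat S_{Full}^{-1}\big) - J\big(\tilde\delta(\hat S_{11,Full}^{-1}), \hat S_{11,Full}^{-1}\big) = 0.0182 \geq 0.$$

The p-value based on the $\chi^2(1)$ distribution is 0.892, so we do not reject the null that $r_t$ is exogenous in the consumption function.

Example: Testing Exogeneity of $\Delta c_{t-1}$ in the consumption function

$$\Delta c_t = \delta_0 + \delta_1 \Delta y_t + \delta_2 r_t + \varepsilon_t, \quad t = 1,\dots,T, \qquad x_t = (1, \Delta c_{t-1}, \Delta y_{t-1}, r_{t-1})'.$$

Here

$$H_0: E[\Delta c_{t-1}\varepsilon_t] = 0 \ (\Delta c_{t-1} \text{ is exogenous}) \quad \text{vs.} \quad H_1: E[\Delta c_{t-1}\varepsilon_t] \neq 0 \ (\Delta c_{t-1} \text{ is endogenous}).$$

Under $H_0$ the valid instruments are $x_t = (1, \Delta c_{t-1}, \Delta y_{t-1}, r_{t-1})'$, and under $H_1$ the valid instruments are $x_{1t} = (1, \Delta y_{t-1}, r_{t-1})'$, with $K_1 = 3$.

Remark: Under $H_1$, $\delta$ is just identified ($K_1 = L$), so restricted GMM based on $x_{1t} = (1, \Delta y_{t-1}, r_{t-1})'$ and $\hat S_{11,Full}^{-1}$ gives

$$J\big(\tilde\delta(\hat S_{11,Full}^{-1}), \hat S_{11,Full}^{-1}\big) = 0.$$

Therefore the C statistic is

$$C = J\big(\hat\delta(\hat S_{Full}^{-1}), \hat S_{Full}^{-1}\big) - 0,$$

which is identical to the J statistic for the unrestricted model.

Remark: We will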
get exactly the same result if we test

$$H_0: E[\Delta y_{t-1}\varepsilon_t] = 0 \quad \text{vs.} \quad H_1: E[\Delta y_{t-1}\varepsilon_t] \neq 0,$$

or if we test

$$H_0: E[r_{t-1}\varepsilon_t] = 0 \quad \text{vs.} \quad H_1: E[r_{t-1}\varepsilon_t] \neq 0.$$

Example: Testing instrument relevance in the consumption function

$$\Delta c_t = \delta_0 + \delta_1 \Delta y_t + \delta_2 r_t + \varepsilon_t, \quad t = 1,\dots,T, \qquad x_t = (1, \Delta c_{t-1}, \Delta y_{t-1}, r_{t-1})'.$$

There are 2 endogenous variables, so we have 2 reduced form equations:

$$\Delta y_t = \pi_{10} + \pi_{11}\Delta c_{t-1} + \pi_{12}\Delta y_{t-1} + \pi_{13} r_{t-1} + v_{1t}$$
$$r_t = \pi_{20} + \pi_{21}\Delta c_{t-1} + \pi_{22}\Delta y_{t-1} + \pi_{23} r_{t-1} + v_{2t}$$

Instruments are completely irrelevant if

$$\pi_{11} = \pi_{12} = \pi_{13} = 0, \qquad \pi_{21} = \pi_{22} = \pi_{23} = 0.$$

Least squares estimation of the first stage for $\Delta y_t$ gives

Coefficients   Value    Std. Error   t value   Pr(>|t|)
(Intercept)    0.0067   0.0055       1.2323    0.2271
tslag(GC)      1.2345   0.3955       3.1214    0.0039
tslag(GY)      0.5226   0.2781       1.8787    0.0697
tslag(R3)      0.0847   0.1395       0.6069    0.5483

Least squares estimation of the first stage for $r_t$ gives

Coefficients   Value    Std. Error   t value   Pr(>|t|)
(Intercept)    0.0083   0.0044       1.8987    0.0669
tslag(GC)      0.1645   0.3167       0.5192    0.6073
tslag(GY)      0.4290   0.2228       1.9259    0.0633
tslag(R3)      0.8496   0.1117       7.6049    0.0000

Example: Simultaneous Equations Model (Klein's Model I)

Reference: Berndt (1991), Chapter 10, The Practice of Econometrics, Addison-Wesley.

Three-equation macro model with five accounting identities (annual data):
1. consumption function
2. investment function
3. labor demand function

Consumption function:

$$CN_i = \alpha_0 + \alpha_1 (W1_i + W2_i) + \alpha_2 P_i + \alpha_3 P_{i-1} + \varepsilon_{1i}$$

where $W1_i + W2_i$ = private + public wage income and $P_i$ = profit income.

Net investment function:

$$I_i = \beta_0 + \beta_1 P_i + \beta_2 P_{i-1} + \beta_3 K_{i-1} + \varepsilon_{2i}$$

where $K_i$ = capital stock.

Labor demand (wage bill):

$$W1_i = \gamma_0 + \gamma_1 E_i + \gamma_2 E_{i-1} + \gamma_3 (i - 1931) + \varepsilon_{3i}$$

where $E_i$ = private product.

Accounting identities:

Total product:    $Y_i + TX_i = CN_i + I_i + G_i$
Income:           $Y_i = P_i + W_i$
Capital:          $K_i = I_i + K_{i-1}$
Wages:            $W_i = W1_i + W2_i$
Private product:  $E_i = Y_i + TX_i - W2_i$

where $TX_i$ = indirect business taxes and $G_i$ = net government demand.

Remarks:
1. $W1_i$ and $P_i$ are endogenous variables in the consumption function.
2. $P_i$ is endogenous in the investment function.
3. $E_i$ is endogenous in the labor demand function.
4. $\varepsilon_{1i}$, $\varepsilon_{2i}$ and $\varepsilon_{3i}$ may be correlated and heteroskedastic.

Total endogenous variables: $CN_i, I_i, W1_i, Y_i, P_i, K_i, W_i, E_i$.

Total predetermined variables (potential instruments): $1, G_i, W2_i, TX_i, i - 1931, K_{i-1}, P_{i-1}, E_{i-1}$.

3SLS Specification
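in Hayashi Notation

Consumption function:
$$y_{i1} = CN_i, \quad z_{i1} = (1, W_i, P_i, P_{i-1})', \quad \delta_1 = (\alpha_0, \alpha_1, \alpha_2, \alpha_3)', \quad L_1 = 4$$

Investment function:
$$y_{i2} = I_i, \quad z_{i2} = (1, P_i, P_{i-1}, K_{i-1})', \quad \delta_2 = (\beta_0, \beta_1, \beta_2, \beta_3)', \quad L_2 = 4$$

Labor demand function:
$$y_{i3} = W1_i, \quad z_{i3} = (1, E_i, E_{i-1}, i - 1931)', \quad \delta_3 = (\gamma_0, \gamma_1, \gamma_2, \gamma_3)', \quad L_3 = 4$$

Instruments:
$$x_i = x_{i1} = \cdots = x_{iM} \quad (\text{same instruments for each equation})$$
$$\underset{(K \times 1)}{x_i} = (1, G_i, W2_i, TX_i, i - 1931, K_{i-1}, P_{i-1}, E_{i-1})', \quad K = 8$$

Total parameters: $L = L_1 + L_2 + L_3 = 12$
Total moments: $MK = 3 \times 8 = 24$
$MK - L = 12$ overidentifying restrictions

Example: Exchange Rate Regressions

$$s_{i+1,m} = \alpha_m + \beta_m f_{i,m} + \varepsilon_{i+1,m}, \quad i = 1,\dots,n \text{ months}, \quad m = 1,\dots,M \text{ currencies}$$

where $s_{i,m}$ = log of the spot exchange rate between the home currency and currency $m$ in month $i$, and $f_{i,m}$ = log of the price of a forward contract for delivery of currency $m$ in month $i+1$.

Note: $\mathrm{cov}(\varepsilon_{i,m}, \varepsilon_{i,h}) \neq 0$ due to common shocks affecting all currencies.

Forward Rate Unbiasedness Hypothesis (Uncovered Interest Rate Parity)

Under rational expectations and risk neutrality,

$$E[s_{i+1,m} \mid I_i] = f_{i,m}, \quad m = 1,\dots,M \quad \Rightarrow \quad \alpha_m = 0, \ \beta_m = 1 \ \text{and} \ E[\varepsilon_{i+1,m} \mid I_i] = 0.$$

Notes:
1. $\{\varepsilon_{i,m}, I_i\}$ is a MDS as a result of rational expectations.
2. We would like to test all restrictions together for a more powerful test.

Because $s_{i,m}$ and $f_{i,m}$ are nonstationary (they behave like random walks), UIP is typically tested using the so-called differences regression

$$\Delta s_{i+1,m} = \gamma_m + \delta_m (f_{i,m} - s_{i,m}) + \varepsilon_{i+1,m}, \qquad \Delta s_{i+1,m} = s_{i+1,m} - s_{i,m}.$$

Now UIP implies $\gamma_m = 0$ and $\delta_m = 1$ for $m = 1,\dots,M$.

SUR Model Specification in Hayashi Notation

$$y_{im} = \Delta s_{i,m}, \quad z_{im} = (1, f_{i-1,m} - s_{i-1,m})', \quad \boldsymbol{\delta}_m = (\gamma_m, \delta_m)', \quad L_m = 2$$
$$x_i = z_i = \text{union of } (z_{i1}, \dots, z_{iM}) = \text{instruments for each equation}$$
$$\underset{(K \times 1)}{x_i} = (1, f_{i-1,1} - s_{i-1,1}, \dots, f_{i-1,M} - s_{i-1,M})', \quad K = M + 1$$

Total parameters: $L = 2M$
Total moments: $MK = M(M+1)$
$M(M+1) - 2M = M(M-1)$ overidentifying restrictions

Sargan's Specification Test (SUR J statistic)

$$J\big(\hat\delta_{SUR}, \hat S_{SUR}^{-1}\big) = n\,(s_{xy} - S_{xz}\hat\delta_{SUR})'\,\hat S_{SUR}^{-1}\,(s_{xy} - S_{xz}\hat\delta_{SUR}) \sim \chi^2\big(M(M-1)\big)$$

Testing UIP using GMM Wald and LR statistics

$$H_0: \underset{(2 \times 1)}{\boldsymbol{\delta}_m} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \ m = 1,\dots,M \quad \Leftrightarrow \quad H_0: \underset{(2M \times 2M)}{R}\ \underset{(2M \times 1)}{\delta} = \underset{(2M \times 1)}{r}$$

where $R = I_{2M}$ and $r = (0, 1, 0, 1, \dots, 0, 1)'$.

Wald Statistic:

$$\mathrm{Wald}_{GMM} = n\,(R\hat\delta - r)'\left[R\,\widehat{\mathrm{avar}}(\hat\delta)\,R'\right]^{-1}(R\hat\delta - r) \sim \chi^2(2M)$$
$$\hat\delta = \hat\delta(\hat S_{SUR}^{-1}), \qquad \widehat{\mathrm{avar}}(\hat\delta) = \left[n^{-1} Z'(\hat\Sigma_{SUR}^{-1} \otimes I_n) Z\right]^{-1}$$

LR (Difference in J) Statistic:

$$LR_{GMM} = J\big(\delta_0, \hat S_{SUR}^{-1}\big) - J\big(\hat\delta_{SUR}, \hat S_{SUR}^{-1}\big) \sim \chi^2(2M), \qquad \delta_0 = (0, 1, 0, 1, \dots, 0, 1)'$$

Note: Since the null hypothesis is linear in $\delta$, the Wald and LR statistics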
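are numerically equivalent.

Example: Capital Asset Pricing Model (CAPM)

$$R_{im} - r_i^f = \alpha_m + \beta_m (R_{iM} - r_i^f) + \varepsilon_{im}, \quad m = 1,\dots,M \text{ assets}, \quad i = 1,\dots,n \text{ months}$$

with $\mathrm{cov}(\varepsilon_{im}, \varepsilon_{ih}) \neq 0$ and $\varepsilon_{im}$ conditionally heteroskedastic, where $R_{im}$ = return on asset $m$ in month $i$, $R_{iM}$ = return on the market index in month $i$, and $r_i^f$ = return on the risk-free asset in month $i$.

CAPM pricing relationship:

$$E[R_{im}] - r_i^f = \beta_m \big(E[R_{iM}] - r_i^f\big) \quad \Rightarrow \quad \alpha_m = 0 \ \text{for } m = 1,\dots,M.$$

One test of the CAPM has null hypothesis

$$H_0: \alpha_1 = \cdots = \alpha_M = 0 \quad (M \text{ zero restrictions}).$$

SUR Model Specification in Hayashi Notation

$$y_{im} = R_{im} - r_i^f, \quad z_{im} = (1, R_{iM} - r_i^f)' \ (\text{same for all equations}), \quad \delta_m = (\alpha_m, \beta_m)', \quad L_m = 2$$
$$x_i = z_i = \text{union of } (z_{i1}, \dots, z_{iM}) = \text{instruments for each equation}$$
$$\underset{(K \times 1)}{x_i} = (1, R_{iM} - r_i^f)', \quad K = 2$$

Total parameters: $L = 2M$
Total moments: $MK = 2M$
$2M - 2M = 0$ overidentifying restrictions

Sargan's Specification Test (SUR J statistic)

$$J\big(\hat\delta_{SUR}, \hat S_{SUR}^{-1}\big) = n\,(s_{xy} - S_{xz}\hat\delta_{SUR})'\,\hat S_{SUR}^{-1}\,(s_{xy} - S_{xz}\hat\delta_{SUR}) = 0$$

because the system is exactly identified.

Testing the CAPM

$$H_0: \alpha_m = 0, \ m = 1,\dots,M \quad \Leftrightarrow \quad H_0: \underset{(M \times 2M)}{R}\ \underset{(2M \times 1)}{\delta} = \underset{(M \times 1)}{r}$$

where

$$R = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \qquad r = 0.$$

Asymptotic Properties of Nonlinear GMM

Under standard regularity conditions (to be discussed later), it can be shown that

$$\hat\theta(\hat W) \xrightarrow{p} \theta_0, \qquad \sqrt{n}\big(\hat\theta(\hat W) - \theta_0\big) \xrightarrow{d} N\big(0, \mathrm{avar}(\hat\theta(\hat W))\big)$$

where

$$\mathrm{avar}(\hat\theta(\hat W)) = (G'WG)^{-1} G'WSWG\, (G'WG)^{-1}, \qquad G = E\left[\frac{\partial g(w_t, \theta_0)}{\partial \theta'}\right].$$

For efficient GMM set $W = S^{-1}$, and so

$$\mathrm{avar}(\hat\theta(\hat S^{-1})) = (G'S^{-1}G)^{-1}.$$

Remark: Notice that with nonlinear GMM the expression for $\mathrm{avar}(\hat\theta(\hat W))$ is of the same form as in linear GMM, except that $\Sigma_{xz} = E[x_t z_t']$ is replaced by $G = E[\partial g(w_t, \theta_0)/\partial\theta']$.

A consistent estimate of $\mathrm{avar}(\hat\theta(\hat W))$ may be computed using

$$\widehat{\mathrm{avar}}(\hat\theta(\hat W)) = (\hat G'\hat W\hat G)^{-1}\hat G'\hat W\hat S\hat W\hat G\,(\hat G'\hat W\hat G)^{-1}$$

where $\hat S$ is a consistent estimate of $S = \mathrm{avar}(\sqrt{n}\, g_n(\theta_0))$ and

$$\hat G = G_n(\hat\theta(\hat W)) = n^{-1}\sum_{t=1}^{n} \frac{\partial g(w_t, \hat\theta(\hat W))}{\partial\theta'}.$$

For the efficient GMM estimator, $\hat W = \hat S^{-1}$ and

$$\widehat{\mathrm{avar}}(\hat\theta(\hat S^{-1})) = (\hat G'\hat S^{-1}\hat G)^{-1}.$$

If $\{g(w_t, \theta_0)\}$ is an ergodic stationary MDS, then a consistent estimator of $S$ takes the form

$$\hat S_{HC} = n^{-1}\sum_{t=1}^{n} g(w_t, \hat\theta)\, g(w_t, \hat\theta)', \qquad \hat\theta \xrightarrow{p} \theta_0.$$

If $\{g(w_t, \theta_0)\}$ is a mean-zero, serially correlated, ergodic stationary process, then

$$S = LRV = \Gamma_0 + \sum_{j=1}^{\infty}(\Gamma_j + \Gamma_j'), \qquad \Gamma_j = E\big[g(w_t, \theta_0)\, g(w_{t-j}, \theta_0)'\big],$$

and a consistent estimator has the form

$$\hat S_{HAC} = \hat\Gamma_0(\hat\theta) + \sum_{j=1}^{n-1} k\!\left(\frac{j}{q(n)}\right)\big(\hat\Gamma_j(\hat\theta) + \hat\Gamma_j(\hat\theta)'\big), \qquad \hat\Gamma_j(\hat\theta) = \frac{1}{n}\sum_{t=j+1}^{n} g(w_t, \hat\theta)\, g(w_{t-j}, \hat\theta)', \qquad \hat\theta \xrightarrow{p} \theta_0.$$

Proving Consistency and Asymptotic Normality for Nonlinear GMM

Hayashi's Chapter 7 (see also Hall, Chapter 3) discusses the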
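techniques for proving consistency and asymptotic normality for a general class of nonlinear extremum estimators. These estimators have the form

$$\hat\theta = \arg\max_{\theta \in \Theta \subset \mathbb{R}^p} Q_n(\theta), \qquad Q_n(\theta) = \text{objective function}, \quad \Theta = \text{parameter space}.$$

For GMM define

$$Q_n(\theta) = -\frac{1}{2n} J(\theta, \hat W) = -\frac{1}{2}\, g_n(\theta)'\hat W g_n(\theta), \qquad g_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} g(w_t, \theta).$$

If $\hat\theta$ minimizes $J(\theta, \hat W)$ then it maximizes $Q_n(\theta)$.

GMM is one type of extremum estimator. Another type is the so-called M-estimator, where

$$Q_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} m(w_t, \theta), \qquad m(w_t, \theta) = \text{real-valued function}.$$

Two common M-estimators are maximum likelihood (ML) estimators and nonlinear least squares (NLS) estimators.

- The ML estimator is an M-estimator with
$$m(w_t, \theta) = \ln f(y_t \mid x_t; \theta), \qquad Q_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} \ln f(y_t \mid x_t; \theta).$$
- The NLS estimator is an M-estimator with
$$m(w_t, \theta) = -\big(y_t - \varphi(x_t, \theta)\big)^2, \qquad Q_n(\theta) = -\frac{1}{n}\sum_{t=1}^{n} \big(y_t - \varphi(x_t, \theta)\big)^2.$$

Remarks:

1. The first order conditions for an M-estimator are
$$\frac{\partial Q_n(\hat\theta)}{\partial\theta} = \frac{1}{n}\sum_{t=1}^{n} \frac{\partial m(w_t, \hat\theta)}{\partial\theta} = 0.$$
Therefore, M-estimators can be thought of as method of moments estimators based on the population moment $E[\partial m(w_t, \theta_0)/\partial\theta] = 0$.

2. GMM can be thought of as an M-estimator. The first order conditions for GMM are
$$0 = \frac{\partial Q_n(\hat\theta)}{\partial\theta} = -G_n(\hat\theta)'\hat W g_n(\hat\theta) = -G_n(\hat\theta)'\hat W\, \frac{1}{n}\sum_{t=1}^{n} g(w_t, \hat\theta).$$
The population moment is $-G'W E[g(w_t, \theta_0)] = 0$.

Consistency of Extremum Estimators

Basic idea: Let $\hat\theta = \arg\max_{\theta \in \Theta \subset \mathbb{R}^p} Q_n(\theta)$. If $Q_n(\theta) \xrightarrow{p} Q_0(\theta)$ uniformly in $\theta$, and if $\theta_0$ uniquely maximizes $Q_0(\theta)$, then $\hat\theta \xrightarrow{p} \theta_0$.

Technical requirements:
1. Continuity of $Q_n(\theta)$ and $Q_0(\theta)$.
2. Uniform convergence of $Q_n(\theta)$ to $Q_0(\theta)$: $\sup_{\theta \in \Theta} |Q_n(\theta) - Q_0(\theta)| \xrightarrow{p} 0$ as $n \to \infty$.
3. Compact parameter space $\Theta$.
4. $\theta_0$ uniquely maximizes $Q_0(\theta)$.

Note: See Hall, Chapter 3, or Newey and McFadden (1994) for a formal proof based on $\varepsilon$ and $\delta$ arguments.

Application to GMM

Recall

$$Q_n(\theta) = -\frac{1}{2n} J(\theta, \hat W) = -\frac{1}{2}\, g_n(\theta)'\hat W g_n(\theta), \qquad \hat\theta = \arg\max_{\theta \in \Theta} Q_n(\theta).$$

Now:

1. $Q_n(\theta)$ is continuous provided $g_n(\theta)$ is continuous.

2. If $w_t$ is ergodic stationary and $E[g(w_t, \theta)]$ exists, then
$$g_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} g(w_t, \theta) \xrightarrow{p} E[g(w_t, \theta)], \qquad Q_0(\theta) = -\frac{1}{2}\, E[g(w_t, \theta)]'\, W\, E[g(w_t, \theta)].$$

3. If $E[g(w_t, \theta_0)] = 0$ and $E[g(w_t, \theta)] \neq 0$ for any $\theta \neq \theta_0$, then
$$Q_0(\theta_0) = -\frac{1}{2}\, E[g(w_t, \theta_0)]'\, W\, E[g(w_t, \theta_0)] = 0, \qquad Q_0(\theta) < 0 \ \text{for } \theta \neq \theta_0.$$
Therefore, $Q_0(\theta)$ is uniquely maximized at $\theta_0$.

4. Uniform convergence of $Q_n(\theta)$ to $Q_0(\theta)$ requires uniform convergence of $g_n(\theta)$ to $E[g(w_t, \theta)]$. Hayashi states that a sufficient condition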
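for
$$\sup_{\theta \in \Theta} \big\| g_n(\theta) - E[g(w_t, \theta)] \big\| \xrightarrow{p} 0 \ \text{as } n \to \infty$$
is
$$E\Big[\sup_{\theta \in \Theta} \| g(w_t, \theta) \|\Big] < \infty.$$

Asymptotic Normality of GMM

Recall

$$Q_n(\theta) = -\frac{1}{2}\, g_n(\theta)'\hat W g_n(\theta), \qquad \hat\theta = \arg\max_{\theta \in \Theta \subset \mathbb{R}^p} Q_n(\theta).$$

Assume:
1. $\theta_0 \in \Theta$, with $\Theta$ compact.
2. $g(w_t, \theta)$ is $K \times 1$ and continuously differentiable in $\theta$ for any $w_t$.
3. $E[g(w_t, \theta_0)] = 0$ and $E[g(w_t, \theta)] \neq 0$ for $\theta \neq \theta_0$.
4. $G = E[\partial g(w_t, \theta_0)/\partial\theta']$ has full column rank $p$.
5. $\sqrt{n}\, g_n(\theta_0) \xrightarrow{d} N(0, S)$.
6. $E[\sup_{\theta \in \Theta} \| g(w_t, \theta) \|] < \infty$.

Basic trick: Apply an exact Taylor series expansion (TSE), i.e., the Mean Value Theorem, to the nonlinear sample moment $g_n(\hat\theta)$ to get a linear function of $\hat\theta$.

Since $\hat\theta = \arg\max_{\theta \in \Theta \subset \mathbb{R}^p} Q_n(\theta)$, the first order conditions (FOC) are

$$\underset{(p \times 1)}{\frac{\partial Q_n(\hat\theta)}{\partial\theta}} = -G_n(\hat\theta)'\hat W g_n(\hat\theta) = 0, \qquad \underset{(K \times p)}{G_n(\hat\theta)} = \frac{\partial g_n(\hat\theta)}{\partial\theta'} = \left[\frac{\partial g_n(\hat\theta)}{\partial\theta_1}, \dots, \frac{\partial g_n(\hat\theta)}{\partial\theta_p}\right].$$

Now apply the exact TSE to $g_n(\hat\theta)$ about $\theta_0$:

$$g_n(\hat\theta) = g_n(\theta_0) + G_n(\bar\theta)(\hat\theta - \theta_0), \qquad \bar\theta_i = \lambda_i \hat\theta_i + (1 - \lambda_i)\theta_{0,i}, \quad \lambda_i \in [0, 1], \quad i = 1,\dots,p.$$

Substitute the TSE into the FOC:

$$\underset{(p \times 1)}{0} = G_n(\hat\theta)'\hat W\big[g_n(\theta_0) + G_n(\bar\theta)(\hat\theta - \theta_0)\big] = G_n(\hat\theta)'\hat W g_n(\theta_0) + G_n(\hat\theta)'\hat W G_n(\bar\theta)(\hat\theta - \theta_0).$$

Solve for $\hat\theta - \theta_0$:

$$\hat\theta - \theta_0 = -\big[G_n(\hat\theta)'\hat W G_n(\bar\theta)\big]^{-1} G_n(\hat\theta)'\hat W g_n(\theta_0).$$

Multiply both sides by $\sqrt{n}$:

$$\sqrt{n}(\hat\theta - \theta_0) = -\big[G_n(\hat\theta)'\hat W G_n(\bar\theta)\big]^{-1} G_n(\hat\theta)'\hat W \sqrt{n}\, g_n(\theta_0).$$

Asymptotics: Now

$$G_n(\theta_0) = \frac{\partial g_n(\theta_0)}{\partial\theta'} = \frac{1}{n}\sum_{t=1}^{n} \frac{\partial g(w_t, \theta_0)}{\partial\theta'}.$$

By the ergodic theorem,

$$G_n(\theta_0) \xrightarrow{p} E\left[\frac{\partial g(w_t, \theta_0)}{\partial\theta'}\right] = G.$$

Also, we know that $\hat\theta \xrightarrow{p} \theta_0$, which implies $\bar\theta \xrightarrow{p} \theta_0$. If

$$G_n(\theta) \xrightarrow{p} E\left[\frac{\partial g(w_t, \theta)}{\partial\theta'}\right] \ \text{uniformly in } \theta,$$

then $G_n(\hat\theta) \xrightarrow{p} G$ and $G_n(\bar\theta) \xrightarrow{p} G$. A sufficient condition for uniform convergence of $G_n(\theta)$ to $E[\partial g(w_t, \theta)/\partial\theta']$ is

$$E\left[\sup_{\theta \in \Theta} \left\| \frac{\partial g(w_t, \theta)}{\partial\theta'} \right\|\right] < \infty.$$

Finally, by assumption,

$$\sqrt{n}\, g_n(\theta_0) \xrightarrow{d} N(0, S), \qquad \hat W \xrightarrow{p} W.$$

Therefore,

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} -(G'WG)^{-1}G'W \times N(0, S) \equiv N(0, V), \qquad V = (G'WG)^{-1}G'WSWG\,(G'WG)^{-1}.$$

Remark: For efficient GMM, $W = S^{-1}$ and $V = \mathrm{avar}(\hat\theta(\hat S^{-1})) = (G'S^{-1}G)^{-1}$.

Remark: The preceding results imply the following simplifications, provided $\hat\theta$ is close to $\theta_0$:

$$g_n(\hat\theta) = g_n(\theta_0) + G_n(\bar\theta)(\hat\theta - \theta_0) = g_n(\theta_0) + G_n(\theta_0)(\hat\theta - \theta_0) + o_p(1)$$

and

$$\hat\theta - \theta_0 = -\big[G_n(\hat\theta)'\hat W G_n(\bar\theta)\big]^{-1} G_n(\hat\theta)'\hat W g_n(\theta_0) = -\big[G_n(\theta_0)'\hat W G_n(\theta_0)\big]^{-1} G_n(\theta_0)'\hat W g_n(\theta_0) + o_p(1),$$

where $o_p(1)$ represents terms that converge in probability to zero. These simplifications will be useful in the following derivations.

Hypothesis Testing in Nonlinear GMM Models

The main types of hypothesis tests
- overidentification restrictions,
- coefficient restrictions (linear and nonlinear),
- subsets of orthogonality restrictions
work the same way in nonlinear GMM as they do in linear GMM.

Remarks:
1. One should always test the overidentifying restrictions first, before conducting the other tests. If the model specification is rejected, it does not make sense to do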
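the remaining tests.
2. Testing instrument relevance is not straightforward in nonlinear models.

Asymptotic Distribution of Sample Moments and J-statistic

By assumption, the normalized sample moment evaluated at $\theta_0$ is asymptotically normally distributed:

$$\sqrt{n}\, g_n(\theta_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} g(w_t, \theta_0) \xrightarrow{d} N(0, S), \qquad \underset{(K \times K)}{S} = \mathrm{avar}\big(\sqrt{n}\, g_n(\theta_0)\big).$$

As a result, the J statistic evaluated at $\theta_0$ is asymptotically chi-square distributed with $K$ degrees of freedom:

$$J(\theta_0, \hat S^{-1}) = n\, g_n(\theta_0)'\hat S^{-1} g_n(\theta_0) \xrightarrow{d} \chi^2(K),$$

provided $\hat S \xrightarrow{p} S$.

As with linear GMM, the normalized sample moment evaluated at the efficient GMM estimate $\hat\theta = \hat\theta(\hat S^{-1})$ is asymptotically normally distributed:

$$\hat S^{-1/2}\sqrt{n}\, g_n(\hat\theta(\hat S^{-1})) \xrightarrow{d} N(0, I_K - P_F),$$
$$P_F = F(F'F)^{-1}F', \quad F = S^{-1/2}G, \quad G = E\left[\frac{\partial g(w_t, \theta_0)}{\partial\theta'}\right], \quad \mathrm{rank}(I_K - P_F) = K - p.$$

To derive this result, combine the TSE of $g_n(\hat\theta)$ about $\theta_0$ with the FOC for the efficient GMM optimization,

$$g_n(\hat\theta) = g_n(\theta_0) + G_n(\theta_0)(\hat\theta - \theta_0) + o_p(1),$$
$$\hat\theta - \theta_0 = -\big[G_n(\theta_0)'\hat S^{-1}G_n(\theta_0)\big]^{-1} G_n(\theta_0)'\hat S^{-1} g_n(\theta_0) + o_p(1),$$

to give

$$g_n(\hat\theta) = g_n(\theta_0) - G_n(\theta_0)\big[G_n(\theta_0)'\hat S^{-1}G_n(\theta_0)\big]^{-1} G_n(\theta_0)'\hat S^{-1} g_n(\theta_0) + o_p(1).$$

Let $\hat S^{-1} = \hat S^{-1/2\prime}\hat S^{-1/2}$. Then write the normalized sample moment as

$$\hat S^{-1/2}\sqrt{n}\, g_n(\hat\theta) = (I_K - P_{\hat F})\, \hat S^{-1/2}\sqrt{n}\, g_n(\theta_0) + o_p(1),$$

where $\hat F = \hat S^{-1/2}G_n(\theta_0)$ and $P_{\hat F} = \hat F(\hat F'\hat F)^{-1}\hat F'$. Then, by the asymptotic normality of $\sqrt{n}\, g_n(\theta_0)$ and Slutsky's theorem,

$$\hat S^{-1/2}\sqrt{n}\, g_n(\hat\theta) \xrightarrow{d} (I_K - P_F) \times N(0, I_K) \equiv N(0, I_K - P_F),$$

since $I_K - P_F$ is idempotent.

From the previous result it follows that the asymptotic distribution of the J statistic evaluated at the efficient GMM estimate $\hat\theta(\hat S^{-1})$ is chi-square with $K - p$ degrees of freedom:

$$J(\hat\theta(\hat S^{-1}), \hat S^{-1}) = n\, g_n(\hat\theta)'\hat S^{-1} g_n(\hat\theta) = \big(\hat S^{-1/2}\sqrt{n}\, g_n(\hat\theta)\big)'\big(\hat S^{-1/2}\sqrt{n}\, g_n(\hat\theta)\big) \xrightarrow{d} \chi^2(K - p).$$

Therefore, we reject the overidentifying restrictions at the 5% level if

$$J(\hat\theta(\hat S^{-1}), \hat S^{-1}) > \chi^2_{0.95}(K - p).$$

Testing Restrictions on Parameters: The Trilogy of Tests

For ease of exposition, consider testing the simple hypotheses

$$H_0: \underset{(p \times 1)}{\theta} = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0.$$

We consider three types of tests:
- Wald test
- GMM LR test (difference in J statistics)
- GMM LM test (GMM score test)

Let $\hat\theta = \hat\theta(\hat S^{-1})$ denote the efficient GMM estimator. The three test statistics have the form

$$\mathrm{Wald}_{GMM} = n(\hat\theta - \theta_0)'\big[G_n(\hat\theta)'\hat S^{-1}G_n(\hat\theta)\big](\hat\theta - \theta_0)$$
$$LR_{GMM} = J(\theta_0, \hat S^{-1}) - J(\hat\theta, \hat S^{-1})$$
$$LM_{GMM} = n\, g_n(\theta_0)'\hat S^{-1}G_n(\theta_0)\big[G_n(\theta_0)'\hat S^{-1}G_n(\theta_0)\big]^{-1} G_n(\theta_0)'\hat S^{-1} g_n(\theta_0)$$

where the same value of $\hat S^{-1}$ is used for all statistics.

Result: Under $H_0: \theta = \theta_0$,

$$\mathrm{Wald}_{GMM},\ LR_{GMM},\ LM_{GMM} \xrightarrow{d} \chi^2(p).$$

Derivation of the $LM_{GMM}$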
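score statistic

The score statistic is based on the score of the GMM objective function,

$$\frac{\partial Q_n(\theta)}{\partial\theta} = -G_n(\theta)'\hat S^{-1} g_n(\theta).$$

Intuition: if $H_0: \theta = \theta_0$ is true, then

$$\underset{(p \times 1)}{0} \approx \frac{\partial Q_n(\theta_0)}{\partial\theta} = -G_n(\theta_0)'\hat S^{-1} g_n(\theta_0),$$

whereas if $H_0: \theta = \theta_0$ is not true, then

$$\underset{(p \times 1)}{0} \neq \frac{\partial Q_n(\theta_0)}{\partial\theta} = -G_n(\theta_0)'\hat S^{-1} g_n(\theta_0).$$

Assuming $H_0: \theta = \theta_0$ is true,

$$\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} = -G_n(\theta_0)'\hat S^{-1}\sqrt{n}\, g_n(\theta_0) \xrightarrow{d} -G'S^{-1} \times N(0, S) \equiv N(0, G'S^{-1}G).$$

Notice that $\mathrm{avar}\big(\sqrt{n}\,\partial Q_n(\theta_0)/\partial\theta\big) = G'S^{-1}G$. Then the GMM score statistic is defined as

$$LM_{GMM} = n\,\frac{\partial Q_n(\theta_0)}{\partial\theta}'\big[G_n(\theta_0)'\hat S^{-1}G_n(\theta_0)\big]^{-1}\frac{\partial Q_n(\theta_0)}{\partial\theta} = n\, g_n(\theta_0)'\hat S^{-1}G_n(\theta_0)\big[G_n(\theta_0)'\hat S^{-1}G_n(\theta_0)\big]^{-1} G_n(\theta_0)'\hat S^{-1} g_n(\theta_0).$$

Then, by the asymptotic normality of $\sqrt{n}\, g_n(\theta_0)$ and Slutsky's theorem,

$$LM_{GMM} \xrightarrow{d} N(0, G'S^{-1}G)'\,\big[G'S^{-1}G\big]^{-1}\, N(0, G'S^{-1}G) \equiv \chi^2(p).$$

Maximum Likelihood Estimation
Eric Zivot
May 14, 2001
This version: November 20, 2005

1 Maximum Likelihood Estimation

1.1 The Likelihood Function

Let $X_1, \dots, X_n$ be an iid sample with probability density function (pdf) $f(x_i; \theta)$, where $\theta$ is a $(k \times 1)$ vector of parameters that characterize $f(x_i; \theta)$. For example, if $X_i \sim N(\mu, \sigma^2)$ then $f(x_i; \theta) = (2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right)$ and $\theta = (\mu, \sigma^2)'$. The joint density of the sample is, by independence, equal to the product of the marginal densities:

$$f(x_1, \dots, x_n; \theta) = f(x_1; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta).$$

The joint density is an $n$-dimensional function of the data $x_1, \dots, x_n$ given the parameter vector $\theta$. The joint density (see footnote 1) satisfies

$$f(x_1, \dots, x_n; \theta) \geq 0, \qquad \int \cdots \int f(x_1, \dots, x_n; \theta)\, dx_1 \cdots dx_n = 1.$$

The likelihood function is defined as the joint density treated as a function of the parameters $\theta$:

$$L(\theta \mid x_1, \dots, x_n) = f(x_1, \dots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta).$$

Notice that the likelihood function is a $k$-dimensional function of $\theta$ given the data $x_1, \dots, x_n$. It is important to keep in mind that the likelihood function, being a function of $\theta$ and not the data, is not a proper pdf. It is always positive, but

$$\int \cdots \int L(\theta \mid x_1, \dots, x_n)\, d\theta_1 \cdots d\theta_k \neq 1.$$

Footnote 1: If $X_1, \dots, X_n$ are discrete random variables, then $f(x_1, \dots, x_n; \theta) = \Pr(X_1 = x_1, \dots, X_n = x_n)$ for a fixed value of $\theta$.

To simplify notation, let the vector $x = (x_1, \dots, x_n)$ denote the observed sample. Then the joint pdf and likelihood function may be expressed as $f(x; \theta)$ and $L(\theta \mid x)$.

Example 1 (Bernoulli Sampling)

Let $X_i \sim \text{Bernoulli}(\theta)$. That is, $X_i = 1$ with probability $\theta$ and $X_i = 0$ with probability $1 - \theta$, where $0 \leq \theta \leq 1$. The pdf for $X_i$ is

$$f(x_i; \theta) = \theta^{x_i}(1 - \theta)^{1 - x_i}, \quad x_i = 0, 1.$$

Let $X_1, \dots, X_n$ be an iid sample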
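with $X_i \sim \text{Bernoulli}(\theta)$. The joint density/likelihood function is given by

$$f(x; \theta) = L(\theta \mid x) = \prod_{i=1}^{n} \theta^{x_i}(1 - \theta)^{1 - x_i} = \theta^{\sum_{i=1}^{n} x_i}(1 - \theta)^{n - \sum_{i=1}^{n} x_i}.$$

For a given value of $\theta$ and observed sample $x$, $f(x; \theta)$ gives the probability of observing the sample. For example, suppose $n = 5$ and $x = (0, \dots, 0)$. Now some values of $\theta$ are more likely to have generated this sample than others. In particular, it is more likely that $\theta$ is close to zero than to one. To see this, note that the likelihood function for this sample is

$$L(\theta \mid (0, \dots, 0)) = (1 - \theta)^5.$$

This function is illustrated in figure xxx. The likelihood function has a clear maximum at $\theta = 0$. That is, $\theta = 0$ is the value of $\theta$ that makes the observed sample $x = (0, \dots, 0)$ most likely (highest probability).

Similarly, suppose $x = (1, \dots, 1)$. Then the likelihood function is

$$L(\theta \mid (1, \dots, 1)) = \theta^5,$$

which is illustrated in figure xxx. Now the likelihood function has a maximum at $\theta = 1$.

Example 2 (Normal Sampling)

Let $X_1, \dots, X_n$ be an iid sample with $X_i \sim N(\mu, \sigma^2)$. The pdf for $X_i$ is

$$f(x_i; \theta) = (2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right), \quad -\infty < \mu < \infty, \ \sigma^2 > 0, \ -\infty < x < \infty,$$

so that $\theta = (\mu, \sigma^2)'$. The likelihood function is given by

$$L(\theta \mid x) = \prod_{i=1}^{n}(2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right).$$

Figure xxx illustrates the normal likelihood for a representative sample of size $n = 25$. Notice that the likelihood has the same bell shape as a bivariate normal density.

Suppose $\sigma^2 = 1$. Then

$$L(\theta \mid x) = L(\mu \mid x) = (2\pi)^{-n/2}\exp\left(-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2\right).$$

Now

$$\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar x + \bar x - \mu)^2 = \sum_{i=1}^{n}\left[(x_i - \bar x)^2 + 2(x_i - \bar x)(\bar x - \mu) + (\bar x - \mu)^2\right] = \sum_{i=1}^{n}(x_i - \bar x)^2 + n(\bar x - \mu)^2,$$

so that

$$L(\mu \mid x) = (2\pi)^{-n/2}\exp\left(-\frac{1}{2}\left[\sum_{i=1}^{n}(x_i - \bar x)^2 + n(\bar x - \mu)^2\right]\right).$$

Since both $(x_i - \bar x)^2$ and $(\bar x - \mu)^2$ are nonnegative, it is clear that $L(\mu \mid x)$ is maximized at $\mu = \bar x$. This is illustrated in figure xxx.

Example 3 (Linear Regression Model with Normal Errors)

Consider the linear regression

$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \dots, n, \qquad \varepsilon_i \mid x_i \sim \text{iid } N(0, \sigma^2).$$

The pdf of $\varepsilon_i \mid x_i$ is

$$f(\varepsilon_i \mid x_i; \sigma^2) = (2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}\varepsilon_i^2\right).$$

The Jacobian of the transformation from $\varepsilon_i$ to $y_i$ is one, so the pdf of $y_i \mid x_i$ is normal with mean $x_i'\beta$ and variance $\sigma^2$:

$$f(y_i \mid x_i; \theta) = (2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(y_i - x_i'\beta)^2\right),$$

where $\theta = (\beta', \sigma^2)'$. Given an iid sample of $n$ observations, $y$ and $X$, the joint density of the sample is

$$f(y \mid X; \theta) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2\right) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right).$$

The log-likelihood function is then

$$\ln L(\theta \mid y, X) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$

Example 4 (AR(1) model with Normal Errors)

To be completed.

1.2 The Maximum Likelihood Estimator

Suppose we have a random sample from the pdf $f(x_i; \theta)$ and we are interested in estimating $\theta$. The previous examples motivate an estimator as the value of $\theta$ that makes the observed sample most likely. Formally, the maximum likelihood estimator, denoted $\hat\theta_{mle}$, is the value of $\theta$ that maximizes $L(\theta \mid x)$. That is, $\hat\theta_{mle}$ solves

$$\max_\theta L(\theta \mid x).$$

It is often quite difficult to directly maximize $L(\theta \mid x)$. It is usually much easier to maximize the log-likelihood function $\ln L(\theta \mid x)$. Since $\ln(\cdot)$ is a monotonic function, the value of $\theta$ that maximizes $\ln L(\theta \mid x)$ will also maximize $L(\theta \mid x)$. Therefore, we may also define $\hat\theta_{mle}$ as the value of $\theta$ that solves

$$\max_\theta \ln L(\theta \mid x).$$

With random sampling, the log-likelihood has the particularly simple form

$$\ln L(\theta \mid x) = \ln\left(\prod_{i=1}^{n} f(x_i; \theta)\right) = \sum_{i=1}^{n} \ln f(x_i; \theta).$$

Since the MLE is defined as a maximization problem, we would like to know the conditions under which we may determine the MLE using the techniques of calculus. A regular pdf $f(x; \theta)$ provides a sufficient set of such conditions. We say that $f(x; \theta)$ is regular if

1. The support of the random variables $X$, $S_X = \{x : f(x; \theta) > 0\}$, does not depend on $\theta$.
2. $f(x; \theta)$ is at least three times differentiable with respect to $\theta$.
3. The true value of $\theta$ lies in a compact set $\Theta$.

If $f(x; \theta)$ is regular, then we may find the MLE by differentiating $\ln L(\theta \mid x)$ and solving the first order conditions

$$\frac{\partial \ln L(\hat\theta_{mle} \mid x)}{\partial\theta} = 0.$$

Since $\theta$ is $(k \times 1)$, the first order conditions define $k$ potentially nonlinear equations in $k$ unknown values:

$$\frac{\partial \ln L(\hat\theta_{mle} \mid x)}{\partial\theta} = \begin{pmatrix} \dfrac{\partial \ln L(\hat\theta_{mle} \mid x)}{\partial\theta_1} \\ \vdots \\ \dfrac{\partial \ln L(\hat\theta_{mle} \mid x)}{\partial\theta_k} \end{pmatrix}.$$

The vector of derivatives of the log-likelihood function is called the score vector and is denoted

$$S(\theta \mid x) = \frac{\partial \ln L(\theta \mid x)}{\partial\theta}.$$

By definition, the MLE satisfies

$$S(\hat\theta_{mle} \mid x) = 0.$$

Under random sampling, the score for the sample becomes the sum of the scores for each observation $x_i$:

$$S(\theta \mid x) = \sum_{i=1}^{n} \frac{\partial \ln f(x_i; \theta)}{\partial\theta} = \sum_{i=1}^{n} S(\theta \mid x_i),$$

where $S(\theta \mid x_i) = \partial \ln f(x_i; \theta)/\partial\theta$ is the score associated with $x_i$.

Example 5 (Bernoulli example continued)

The log-likelihood function is

$$\ln L(\theta \mid x) = \ln\left(\theta^{\sum_{i=1}^{n} x_i}(1 - \theta)^{n - \sum_{i=1}^{n} x_i}\right) = \sum_{i=1}^{n} x_i \ln(\theta) + \left(n - \sum_{i=1}^{n} x_i\right)\ln(1 - \theta).$$

The score function for the Bernoulli log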
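likelihood is

$$S(\theta \mid x) = \frac{\partial \ln L(\theta \mid x)}{\partial\theta} = \frac{1}{\theta}\sum_{i=1}^{n} x_i - \frac{1}{1 - \theta}\left(n - \sum_{i=1}^{n} x_i\right).$$

The MLE satisfies $S(\hat\theta_{mle} \mid x) = 0$, which after a little algebra produces the MLE

$$\hat\theta_{mle} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Hence, the sample average is the MLE for $\theta$ in the Bernoulli model.

Example 6 (Normal example continued)

Since the normal pdf is regular, we may determine the MLE for $\theta = (\mu, \sigma^2)$ by maximizing the log-likelihood

$$\ln L(\theta \mid x) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$

The sample score is a $(2 \times 1)$ vector given by

$$S(\theta \mid x) = \begin{pmatrix} \dfrac{\partial \ln L(\theta \mid x)}{\partial\mu} \\ \dfrac{\partial \ln L(\theta \mid x)}{\partial\sigma^2} \end{pmatrix},$$

where

$$\frac{\partial \ln L(\theta \mid x)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu), \qquad \frac{\partial \ln L(\theta \mid x)}{\partial\sigma^2} = -\frac{n}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}\sum_{i=1}^{n}(x_i - \mu)^2.$$

Note that the score vector for an observation is

$$S(\theta \mid x_i) = \begin{pmatrix} \dfrac{\partial \ln f(x_i; \theta)}{\partial\mu} \\ \dfrac{\partial \ln f(x_i; \theta)}{\partial\sigma^2} \end{pmatrix} = \begin{pmatrix} (\sigma^2)^{-1}(x_i - \mu) \\ -\frac{1}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}(x_i - \mu)^2 \end{pmatrix},$$

so that $S(\theta \mid x) = \sum_{i=1}^{n} S(\theta \mid x_i)$.

Solving $S(\hat\theta_{mle} \mid x) = 0$ gives the normal equations

$$\frac{\partial \ln L(\hat\theta_{mle} \mid x)}{\partial\mu} = \frac{1}{\hat\sigma^2_{mle}}\sum_{i=1}^{n}(x_i - \hat\mu_{mle}) = 0,$$
$$\frac{\partial \ln L(\hat\theta_{mle} \mid x)}{\partial\sigma^2} = -\frac{n}{2}(\hat\sigma^2_{mle})^{-1} + \frac{1}{2}(\hat\sigma^2_{mle})^{-2}\sum_{i=1}^{n}(x_i - \hat\mu_{mle})^2 = 0.$$

Solving the first equation for $\hat\mu_{mle}$ gives

$$\hat\mu_{mle} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar x.$$

Hence, the sample average is the MLE for $\mu$. Using $\hat\mu_{mle} = \bar x$ and solving the second equation for $\hat\sigma^2_{mle}$ gives

$$\hat\sigma^2_{mle} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2.$$

Notice that $\hat\sigma^2_{mle}$ is not equal to the sample variance.

Example 7 (Linear regression example continued)

The MLE of $\theta$ satisfies $S(\hat\theta_{mle} \mid y, X) = 0$, where $S(\theta \mid y, X) = \partial \ln L(\theta \mid y, X)/\partial\theta$ is the score vector. Now

$$\frac{\partial \ln L(\theta \mid y, X)}{\partial\beta} = -\frac{1}{2\sigma^2}\frac{\partial}{\partial\beta}\left[y'y - 2y'X\beta + \beta'X'X\beta\right] = -(\sigma^2)^{-1}\left[-X'y + X'X\beta\right],$$
$$\frac{\partial \ln L(\theta \mid y, X)}{\partial\sigma^2} = -\frac{n}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}(y - X\beta)'(y - X\beta).$$

Solving $\partial \ln L(\theta \mid y, X)/\partial\beta = 0$ for $\beta$ gives

$$\hat\beta_{mle} = (X'X)^{-1}X'y = \hat\beta_{OLS}.$$

Next, solving $\partial \ln L(\theta \mid y, X)/\partial\sigma^2 = 0$ for $\sigma^2$ gives

$$\hat\sigma^2_{mle} = \frac{1}{n}(y - X\hat\beta_{mle})'(y - X\hat\beta_{mle}) \neq \hat\sigma^2_{OLS} = \frac{1}{n - k}(y - X\hat\beta_{OLS})'(y - X\hat\beta_{OLS}).$$

1.3 Properties of the Score Function

The matrix of second derivatives of the log-likelihood is called the Hessian:

$$H(\theta \mid x) = \frac{\partial^2 \ln L(\theta \mid x)}{\partial\theta\,\partial\theta'} = \begin{pmatrix} \dfrac{\partial^2 \ln L(\theta \mid x)}{\partial\theta_1^2} & \cdots & \dfrac{\partial^2 \ln L(\theta \mid x)}{\partial\theta_1\,\partial\theta_k} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 \ln L(\theta \mid x)}{\partial\theta_k\,\partial\theta_1} & \cdots & \dfrac{\partial^2 \ln L(\theta \mid x)}{\partial\theta_k^2} \end{pmatrix}.$$

The information matrix is defined as minus the expectation of the Hessian:

$$I(\theta \mid x) = -E[H(\theta \mid x)].$$

If we have random sampling, then

$$H(\theta \mid x) = \sum_{i=1}^{n} \frac{\partial^2 \ln f(x_i; \theta)}{\partial\theta\,\partial\theta'} = \sum_{i=1}^{n} H(\theta \mid x_i)$$

and

$$I(\theta \mid x) = -\sum_{i=1}^{n} E[H(\theta \mid x_i)] = -nE[H(\theta \mid x_i)] = n\,I(\theta \mid x_i).$$

The last result says that the sample information matrix is equal to $n$ times the information matrix for an observation.

The following proposition relates some properties of the score function to the information matrix.

Proposition 8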
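Let $f(x_i; \theta)$ be a regular pdf. Then

1. $E[S(\theta \mid x_i)] = \int S(\theta \mid x_i) f(x_i; \theta)\, dx_i = 0$.

2. If $\theta$ is a scalar, then
$$\mathrm{var}(S(\theta \mid x_i)) = E[S(\theta \mid x_i)^2] = \int S(\theta \mid x_i)^2 f(x_i; \theta)\, dx_i = I(\theta \mid x_i);$$
if $\theta$ is a vector, then
$$\mathrm{var}(S(\theta \mid x_i)) = E[S(\theta \mid x_i) S(\theta \mid x_i)'] = \int S(\theta \mid x_i) S(\theta \mid x_i)' f(x_i; \theta)\, dx_i = I(\theta \mid x_i).$$

Proof. For part 1, we have

$$E[S(\theta \mid x_i)] = \int S(\theta \mid x_i) f(x_i; \theta)\, dx_i = \int \frac{\partial \ln f(x_i; \theta)}{\partial\theta} f(x_i; \theta)\, dx_i = \int \frac{1}{f(x_i; \theta)}\frac{\partial f(x_i; \theta)}{\partial\theta} f(x_i; \theta)\, dx_i$$
$$= \int \frac{\partial f(x_i; \theta)}{\partial\theta}\, dx_i = \frac{\partial}{\partial\theta}\int f(x_i; \theta)\, dx_i = \frac{\partial}{\partial\theta}\, 1 = 0.$$

The key part of the proof is the ability to interchange the order of differentiation and integration.

For part 2, consider the scalar case for simplicity. Proceeding as above, we get

$$E[S(\theta \mid x_i)^2] = \int S(\theta \mid x_i)^2 f(x_i; \theta)\, dx_i = \int \left(\frac{\partial \ln f(x_i; \theta)}{\partial\theta}\right)^2 f(x_i; \theta)\, dx_i$$
$$= \int \left(\frac{1}{f(x_i; \theta)}\frac{\partial f(x_i; \theta)}{\partial\theta}\right)^2 f(x_i; \theta)\, dx_i = \int \frac{1}{f(x_i; \theta)}\left(\frac{\partial f(x_i; \theta)}{\partial\theta}\right)^2 dx_i.$$

Next, recall that $I(\theta \mid x_i) = -E[H(\theta \mid x_i)]$ and

$$-E[H(\theta \mid x_i)] = -\int \frac{\partial^2 \ln f(x_i; \theta)}{\partial\theta^2} f(x_i; \theta)\, dx_i.$$

Now, by the chain rule,

$$\frac{\partial^2}{\partial\theta^2}\ln f(x_i; \theta) = \frac{\partial}{\partial\theta}\left(\frac{1}{f(x_i; \theta)}\frac{\partial f(x_i; \theta)}{\partial\theta}\right) = -f(x_i; \theta)^{-2}\left(\frac{\partial f(x_i; \theta)}{\partial\theta}\right)^2 + f(x_i; \theta)^{-1}\frac{\partial^2 f(x_i; \theta)}{\partial\theta^2}.$$

Then

$$-E[H(\theta \mid x_i)] = -\int \left[-f(x_i; \theta)^{-2}\left(\frac{\partial f(x_i; \theta)}{\partial\theta}\right)^2 + f(x_i; \theta)^{-1}\frac{\partial^2 f(x_i; \theta)}{\partial\theta^2}\right] f(x_i; \theta)\, dx_i$$
$$= \int f(x_i; \theta)^{-1}\left(\frac{\partial f(x_i; \theta)}{\partial\theta}\right)^2 dx_i - \int \frac{\partial^2 f(x_i; \theta)}{\partial\theta^2}\, dx_i = E[S(\theta \mid x_i)^2] - \frac{\partial^2}{\partial\theta^2}\int f(x_i; \theta)\, dx_i = E[S(\theta \mid x_i)^2].$$

1.4 Concentrating the Likelihood Function

In many situations, our interest may be only in a few elements of $\theta$. Let $\theta = (\theta_1, \theta_2)$ and suppose $\theta_1$ is the parameter of interest and $\theta_2$ is a nuisance parameter (a parameter not of interest). In this situation, it is often convenient to concentrate out the nuisance parameter $\theta_2$ from the log-likelihood function, leaving a concentrated log-likelihood function that is only a function of the parameter of interest $\theta_1$.

To illustrate, consider the example of iid sampling from a normal distribution. Suppose the parameter of interest is $\mu$ and the nuisance parameter is $\sigma^2$. We wish to concentrate the log-likelihood with respect to $\sigma^2$, leaving a concentrated log-likelihood function for $\mu$. We do this as follows. From the score function for $\sigma^2$ we have the first order condition

$$\frac{\partial \ln L(\theta \mid x)}{\partial\sigma^2} = -\frac{n}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}\sum_{i=1}^{n}(x_i - \mu)^2 = 0.$$

Solving for $\sigma^2$ as a function of $\mu$ gives

$$\sigma^2(\mu) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2.$$

Notice that any value of $\sigma^2(\mu)$ defined this way satisfies the first order condition $\partial \ln L(\theta \mid x)/\partial\sigma^2 = 0$. If we substitute $\sigma^2(\mu)$ for $\sigma^2$ in the log-likelihood function for $\theta$, we get the following concentrated log-likelihood function for $\mu$:

$$\ln L^c(\mu \mid x) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2(\mu)) - \frac{1}{2\sigma^2(\mu)}\sum_{i=1}^{n}(x_i - \mu)^2$$
$$= -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\right) - \frac{1}{2}\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\right)^{-1}\sum_{i=1}^{n}(x_i - \mu)^2$$
$$= -\frac{n}{2}\left(\ln(2\pi) + 1\right) - \frac{n}{2}\ln\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\right).$$

Now we may determine the MLE for $\mu$ by maximizing the concentrated log-likelihood function $\ln L^c(\mu \mid x)$. The first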
order conditions are

$$\frac{\partial \ln L^c(\hat\mu_{mle} \mid x)}{\partial\mu} = \frac{\sum_{i=1}^{n}(x_i - \hat\mu_{mle})}{\frac{1}{n}\sum_{i=1}^{n}(x_i - \hat\mu_{mle})^2} = 0,$$

which is satisfied by $\hat\mu_{mle} = \bar x$, provided not all of the $x_i$ values are identical.

For some models it may not be possible to analytically concentrate the log-likelihood with respect to a subset of parameters. Nonetheless, it is still possible in principle to numerically concentrate the log-likelihood.

1.5 The Precision of the Maximum Likelihood Estimator

The likelihood, log-likelihood and score functions for a typical model are illustrated in figure xxx. The likelihood function is always positive (since it is the joint density of the sample), but the log-likelihood function is typically negative (being the log of a number less than 1). Here the log-likelihood is globally concave and has a unique maximum at $\hat\theta_{mle}$. Consequently, the score function is positive to the left of the maximum, crosses zero at the maximum, and becomes negative to the right of the maximum.

Intuitively, the precision of $\hat\theta_{mle}$ depends on the curvature of the log-likelihood function near $\hat\theta_{mle}$. If the log-likelihood is very curved or steep around $\hat\theta_{mle}$, then $\theta$ will be precisely estimated. In this case, we say that we have a lot of information about $\theta$. On the other hand, if the log-likelihood is not curved, or is flat, near $\hat\theta_{mle}$, then $\theta$ will not be precisely estimated. Accordingly, we say that we do not have much information about $\theta$.

The extreme case of a completely flat likelihood in $\theta$ is illustrated in figure xxx. Here, the sample contains no information about the true value of $\theta$ because every value of $\theta$ produces the same value of the likelihood function. When this happens, we say that $\theta$ is not identified. Formally, $\theta$ is identified if for all $\theta_1 \neq \theta_2$ there exists a sample $x$ for which $L(\theta_1 \mid x) \neq L(\theta_2 \mid x)$.

The curvature of the log-likelihood is measured by its second derivative (Hessian),

$$H(\theta \mid x) = \frac{\partial^2 \ln L(\theta \mid x)}{\partial\theta\,\partial\theta'}.$$

Since the Hessian is negative semi-definite, the information in the sample about $\theta$ may be measured by $-H(\theta \mid x)$. If $\theta$ is a scalar, then $-H(\theta \mid x)$ is a positive number. The expected amount of information in the sample about the parameter $\theta$
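is the information matrix $I(\theta \mid x) = -E[H(\theta \mid x)]$. As we shall see, the information matrix is directly related to the precision of the MLE.

1.5.1 The Cramer-Rao Lower Bound

If we restrict ourselves to the class of unbiased estimators (linear and nonlinear), then we define the best estimator as the one with the smallest variance. With linear estimators, the Gauss-Markov theorem tells us that the ordinary least squares (OLS) estimator is best (BLUE). When we expand the class of estimators to include linear and nonlinear estimators, it turns out that we can establish an absolute lower bound on the variance of any unbiased estimator $\hat\theta$ of $\theta$ under certain conditions. Then, if an unbiased estimator $\hat\theta$ has a variance that is equal to the lower bound, we have found the best unbiased estimator (BUE).

Theorem 9 (Cramer-Rao Inequality)

Let $X_1, \dots, X_n$ be an iid sample with pdf $f(x; \theta)$. Let $\hat\theta$ be an unbiased estimator of $\theta$, i.e., $E[\hat\theta] = \theta$. If $f(x; \theta)$ is regular, then

$$\mathrm{var}(\hat\theta) \geq I(\theta \mid x)^{-1},$$

where $I(\theta \mid x) = -E[H(\theta \mid x)]$ denotes the sample information matrix. Hence, the Cramer-Rao Lower Bound (CRLB) is the inverse of the information matrix. If $\theta$ is a vector, then $\mathrm{var}(\hat\theta) \geq I(\theta \mid x)^{-1}$ means that $\mathrm{var}(\hat\theta) - I(\theta \mid x)^{-1}$ is positive semi-definite.

Example 10 (Bernoulli model continued)

To determine the CRLB, the information matrix must be evaluated. The information matrix may be computed as

$$I(\theta \mid x) = -E[H(\theta \mid x)] \quad \text{or} \quad I(\theta \mid x) = \mathrm{var}(S(\theta \mid x)).$$

Further, due to random sampling,

$$I(\theta \mid x) = n \cdot I(\theta \mid x_i) = n \cdot \mathrm{var}(S(\theta \mid x_i)).$$

Now, using the chain rule, it can be shown that

$$H(\theta \mid x_i) = \frac{d}{d\theta} S(\theta \mid x_i) = \frac{d}{d\theta}\left(\frac{x_i - \theta}{\theta(1 - \theta)}\right) = -\left(\frac{1 + S(\theta \mid x_i) - 2\theta S(\theta \mid x_i)}{\theta(1 - \theta)}\right).$$

The information for an observation is then

$$I(\theta \mid x_i) = -E[H(\theta \mid x_i)] = \frac{1 + E[S(\theta \mid x_i)] - 2\theta E[S(\theta \mid x_i)]}{\theta(1 - \theta)} = \frac{1}{\theta(1 - \theta)},$$

since

$$E[S(\theta \mid x_i)] = \frac{E[x_i] - \theta}{\theta(1 - \theta)} = \frac{\theta - \theta}{\theta(1 - \theta)} = 0.$$

The information for an observation may also be computed as

$$I(\theta \mid x_i) = \mathrm{var}(S(\theta \mid x_i)) = \mathrm{var}\left(\frac{x_i - \theta}{\theta(1 - \theta)}\right) = \frac{\mathrm{var}(x_i)}{\theta^2(1 - \theta)^2} = \frac{\theta(1 - \theta)}{\theta^2(1 - \theta)^2} = \frac{1}{\theta(1 - \theta)}.$$

The information for the sample is then

$$I(\theta \mid x) = n \cdot I(\theta \mid x_i) = \frac{n}{\theta(1 - \theta)}$$

and the CRLB is

$$CRLB = I(\theta \mid x)^{-1} = \frac{\theta(1 - \theta)}{n}.$$

This is the lower bound on the variance of any unbiased estimator of $\theta$.

Consider the MLE for $\theta$, $\hat\theta_{mle} = \bar x$. Now

$$E[\hat\theta_{mle}] = E[\bar x] = \theta, \qquad \mathrm{var}(\hat\theta_{mle}) = \mathrm{var}(\bar x) = \frac{\theta(1 - \theta)}{n}.$$

Notice that the MLE is unbiased and its variance is equal to the CRLB. Therefore, $\hat\theta_{mle}$ is efficient.

Remarks:
- If $\theta = 0$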
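or $\theta = 1$ then $I(\theta \mid x) = \infty$ and $\mathrm{var}(\hat\theta_{mle}) = 0$. (Why?)
- $I(\theta \mid x)$ is smallest when $\theta = 1/2$.
- As $n \to \infty$, $I(\theta \mid x) \to \infty$ so that $\mathrm{var}(\hat\theta_{mle}) \to 0$, which suggests that $\hat\theta_{mle}$ is consistent for $\theta$.

Example 11 (Normal model continued)

The Hessian for an observation is

$$H(\theta \mid x_i) = \frac{\partial^2 \ln f(x_i; \theta)}{\partial\theta\,\partial\theta'} = \frac{\partial S(\theta \mid x_i)}{\partial\theta'} = \begin{pmatrix} \dfrac{\partial^2 \ln f(x_i; \theta)}{\partial\mu^2} & \dfrac{\partial^2 \ln f(x_i; \theta)}{\partial\mu\,\partial\sigma^2} \\ \dfrac{\partial^2 \ln f(x_i; \theta)}{\partial\sigma^2\,\partial\mu} & \dfrac{\partial^2 \ln f(x_i; \theta)}{\partial(\sigma^2)^2} \end{pmatrix}.$$

Now

$$\frac{\partial^2 \ln f(x_i; \theta)}{\partial\mu^2} = -(\sigma^2)^{-1}, \qquad \frac{\partial^2 \ln f(x_i; \theta)}{\partial\mu\,\partial\sigma^2} = \frac{\partial^2 \ln f(x_i; \theta)}{\partial\sigma^2\,\partial\mu} = -(\sigma^2)^{-2}(x_i - \mu),$$
$$\frac{\partial^2 \ln f(x_i; \theta)}{\partial(\sigma^2)^2} = \frac{1}{2}(\sigma^2)^{-2} - (\sigma^2)^{-3}(x_i - \mu)^2,$$

so that

$$I(\theta \mid x_i) = -E[H(\theta \mid x_i)] = \begin{pmatrix} (\sigma^2)^{-1} & (\sigma^2)^{-2}E[x_i - \mu] \\ (\sigma^2)^{-2}E[x_i - \mu] & -\frac{1}{2}(\sigma^2)^{-2} + (\sigma^2)^{-3}E[(x_i - \mu)^2] \end{pmatrix}.$$

Using the results (see footnote 2)

$$E[x_i - \mu] = 0, \qquad E\left[\frac{(x_i - \mu)^2}{\sigma^2}\right] = 1,$$

we then have

$$I(\theta \mid x_i) = \begin{pmatrix} (\sigma^2)^{-1} & 0 \\ 0 & \frac{1}{2}(\sigma^2)^{-2} \end{pmatrix}.$$

The information matrix for the sample is then

$$I(\theta \mid x) = n \cdot I(\theta \mid x_i) = \begin{pmatrix} n(\sigma^2)^{-1} & 0 \\ 0 & \frac{n}{2}(\sigma^2)^{-2} \end{pmatrix}$$

and the CRLB is

$$CRLB = I(\theta \mid x)^{-1} = \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}.$$

Notice that the information matrix and the CRLB are diagonal matrices. The CRLB for an unbiased estimator of $\mu$ is $\sigma^2/n$, and the CRLB for an unbiased estimator of $\sigma^2$ is $2\sigma^4/n$.

Footnote 2: $(x_i - \mu)^2/\sigma^2$ is a chi-square random variable with one degree of freedom. The expected value of a chi-square random variable is equal to its degrees of freedom.

The MLEs for $\mu$ and $\sigma^2$ are

$$\hat\mu_{mle} = \bar x, \qquad \hat\sigma^2_{mle} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat\mu_{mle})^2.$$

Now

$$E[\hat\mu_{mle}] = \mu, \qquad E[\hat\sigma^2_{mle}] = \frac{n - 1}{n}\sigma^2,$$

so that $\hat\mu_{mle}$ is unbiased whereas $\hat\sigma^2_{mle}$ is biased. This illustrates the fact that MLEs are not necessarily unbiased. Furthermore,

$$\mathrm{var}(\hat\mu_{mle}) = \frac{\sigma^2}{n} = CRLB,$$

and so $\hat\mu_{mle}$ is efficient.

The MLE for $\sigma^2$ is biased, and so the CRLB result does not apply. Consider the unbiased estimator of $\sigma^2$,

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar x)^2.$$

Is the variance of $s^2$ equal to the CRLB? To be continued.

Remarks:
- The diagonal elements of $I(\theta \mid x) \to \infty$ as $n \to \infty$.
- $I(\theta \mid x)$ only depends on $\sigma^2$.

Example 12 (Linear regression model continued)

The score vector is given by

$$S(\theta \mid y, X) = \begin{pmatrix} -(\sigma^2)^{-1}\left[-X'y + X'X\beta\right] \\ -\frac{n}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}(y - X\beta)'(y - X\beta) \end{pmatrix} = \begin{pmatrix} (\sigma^2)^{-1}X'\varepsilon \\ -\frac{n}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}\varepsilon'\varepsilon \end{pmatrix},$$

where $\varepsilon = y - X\beta$. Now $E[\varepsilon] = 0$ and $E[\varepsilon'\varepsilon] = n\sigma^2$ (since $\varepsilon'\varepsilon/\sigma^2 \sim \chi^2(n)$), so that

$$E[S(\theta \mid y, X)] = \begin{pmatrix} (\sigma^2)^{-1}X'E[\varepsilon] \\ -\frac{n}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}E[\varepsilon'\varepsilon] \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$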
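As a numerical companion to the linear regression examples (Examples 7 and 12), the following minimal Python sketch fits a one-regressor model with an intercept and checks that the sample score of the normal log-likelihood is zero at the MLE, and that the MLE of the error variance divides the residual sum of squares by n rather than n - k. The data values and the function names `ols_mle` and `score` are illustrative assumptions and are not taken from the notes.

```python
# Minimal sketch (illustrative data and names; not from the notes).
# Model: y_i = b0 + b1 * x_i + e_i with e_i ~ iid N(0, sigma2).


def ols_mle(x, y):
    """Return (b0, b1, sigma2_mle, sigma2_ols) for a simple regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope from the normal equations X'(y - Xb) = 0
    b0 = ybar - b1 * xbar          # intercept
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    k = 2                          # number of regression coefficients
    return b0, b1, rss / n, rss / (n - k)


def score(x, y, b0, b1, sigma2):
    """Score of the normal log-likelihood with respect to (b0, b1, sigma2)."""
    n = len(x)
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s_b0 = sum(resid) / sigma2
    s_b1 = sum(xi * ei for xi, ei in zip(x, resid)) / sigma2
    s_s2 = -n / (2 * sigma2) + sum(e * e for e in resid) / (2 * sigma2 ** 2)
    return s_b0, s_b1, s_s2


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]

b0, b1, s2_mle, s2_ols = ols_mle(x, y)
print("beta_mle = beta_ols:", b0, b1)
print("sigma2_mle:", s2_mle, "sigma2_ols:", s2_ols)
print("score at the MLE:", score(x, y, b0, b1, s2_mle))
```

All three score entries should be zero up to floating-point rounding, while evaluating `score` at `s2_ols` instead of `s2_mle` leaves the last entry nonzero; this is the sample counterpart of the first order conditions, as opposed to the population statement that the expected score is zero at the true parameter.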