
ECONOMETRICS

Bruce E. Hansen
© 2000, 2017
University of Wisconsin, Department of Economics
This Revision: January 5, 2017
Comments Welcome

This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.

Contents

Preface ix
1 Introduction 1
2 Conditional Expectation and Projection 10
3 The Algebra of Least Squares 59
4 Least Squares Regression 88
5 Normal Regression and Maximum Likelihood 114
6 An Introduction to Large Sample Asymptotics 140
7 Asymptotic Theory for Least Squares 168
8 Restricted Estimation 203
9 Hypothesis Testing 219
10 Endogeneity 248
11 Generalized Method of Moments 265
12 Regression Extensions 279
13 The Bootstrap 293
14 NonParametric Regression 307
15 Series Estimation 322
16 Empirical Likelihood 341
17 Univariate Time Series 348
18 Multivariate Time Series 358
19 Limited Dependent Variables 364
20 Panel Data 369
21 Nonparametric Density Estimation 372
A Matrix Algebra 376
B Probability 393
C Numerical Optimization 408

Preface

This book is intended to serve as the textbook for a first-year graduate course in econometrics. Students are assumed to have an understanding of multivariate calculus, probability theory, linear algebra, and mathematical statistics. A prior course in undergraduate econometrics would be helpful, but not required. Two excellent undergraduate textbooks are Wooldridge (2013) and Stock and Watson (2015).

For reference, some of the basic tools of matrix algebra, probability, and statistics are reviewed in the Appendix. For students wishing to deepen their knowledge of matrix algebra in relation to their study of econometrics, I recommend Matrix Algebra by Abadir and Magnus (2005). An excellent introduction to probability and statistics is Statistical Inference by Casella and Berger (2002).
For those wanting a deeper foundation in probability, I recommend Ash (1972) or Billingsley (1995). For more advanced statistical theory, I recommend Lehmann and Casella (1998), van der Vaart (1998), Shao (2003), and Lehmann and Romano (2005).

For further study in econometrics beyond this text, I recommend Davidson (1994) for asymptotic theory, Hamilton (1994) for time-series methods, Wooldridge (2010) for panel data and discrete response models, and Li and Racine (2007) for nonparametrics and semiparametric econometrics. Beyond these texts, the Handbook of Econometrics series provides advanced summaries of contemporary econometric methods and theory.

The end-of-chapter exercises are important parts of the text and are meant to help teach students of econometrics. Answers are not provided, and this is intentional.

I would like to thank Ying-Ying Lee for providing research assistance in preparing some of the empirical examples presented in the text.

As this is a manuscript in progress, some parts are quite incomplete, and there are many topics which I plan to add. In general, the earlier chapters are the most complete while the later chapters need significant work and revision.

Chapter 1

Introduction

1.1 What is Econometrics?

The term "econometrics" is believed to have been crafted by Ragnar Frisch (1895-1973) of Norway, one of the three principal founders of the Econometric Society, first editor of the journal Econometrica, and co-winner of the first Nobel Memorial Prize in Economic Sciences in 1969. It is therefore fitting that we turn to Frisch's own words in the introduction to the first issue of Econometrica to describe the discipline.

A word of explanation regarding the term econometrics may be in order.
Its definition is implied in the statement of the scope of the [Econometric] Society, in Section I of the Constitution, which reads: "The Econometric Society is an international society for the advancement of economic theory in its relation to statistics and mathematics.... Its main object shall be to promote studies that aim at a unification of the theoretical-quantitative and the empirical-quantitative approach to economic problems...." But there are several aspects of the quantitative approach to economics, and no single one of these aspects, taken by itself, should be confounded with econometrics. Thus, econometrics is by no means the same as economic statistics. Nor is it identical with what we call general economic theory, although a considerable portion of this theory has a definitely quantitative character. Nor should econometrics be taken as synonymous with the application of mathematics to economics. Experience has shown that each of these three view-points, that of statistics, economic theory, and mathematics, is a necessary, but not by itself a sufficient, condition for a real understanding of the quantitative relations in modern economic life. It is the unification of all three that is powerful. And it is this unification that constitutes econometrics.

Ragnar Frisch, Econometrica, (1933), 1, pp. 1-2.

This definition remains valid today, although some terms have evolved somewhat in their usage. Today, we would say that econometrics is the unified study of economic models, mathematical statistics, and economic data.

Within the field of econometrics there are sub-divisions and specializations. Econometric theory concerns the development of tools and methods, and the study of the properties of econometric methods. Applied econometrics is a term describing the development of quantitative economic models and the application of econometric methods to these models using economic data.
1.2 The Probability Approach to Econometrics

The unifying methodology of modern econometrics was articulated by Trygve Haavelmo (1911-1999) of Norway, winner of the 1989 Nobel Memorial Prize in Economic Sciences, in his seminal paper "The probability approach in econometrics," Econometrica (1944). Haavelmo argued that quantitative economic models must necessarily be probability models (by which today we would mean stochastic). Deterministic models are blatantly inconsistent with observed economic quantities, and it is incoherent to apply deterministic models to non-deterministic data. Economic models should be explicitly designed to incorporate randomness; stochastic errors should not be simply added to deterministic models to make them random. Once we acknowledge that an economic model is a probability model, it follows naturally that an appropriate way to quantify, estimate, and conduct inferences about the economy is through the powerful theory of mathematical statistics. The appropriate method for a quantitative economic analysis follows from the probabilistic construction of the economic model.

Haavelmo's probability approach was quickly embraced by the economics profession. Today no quantitative work in economics shuns its fundamental vision.

While all economists embrace the probability approach, there has been some evolution in its implementation. The structural approach is the closest to Haavelmo's original idea. A probabilistic economic model is specified, and the quantitative analysis performed under the assumption that the economic model is correctly specified. Researchers often describe this as "taking their model seriously." The structural approach typically leads to likelihood-based analysis, including maximum likelihood and Bayesian estimation.

A criticism of the structural approach is that it is misleading to treat an economic model as correctly specified.
Rather, it is more accurate to view a model as a useful abstraction or approximation. In this case, how should we interpret structural econometric analysis? The quasi-structural approach to inference views a structural economic model as an approximation rather than the truth. This theory has led to the concepts of the pseudo-true value (the parameter value defined by the estimation problem), the quasi-likelihood function, quasi-MLE, and quasi-likelihood inference.

Closely related is the semiparametric approach. A probabilistic economic model is partially specified but some features are left unspecified. This approach typically leads to estimation methods such as least-squares and the Generalized Method of Moments. The semiparametric approach dominates contemporary econometrics, and is the main focus of this textbook.

Another branch of quantitative structural economics is the calibration approach. Similar to the quasi-structural approach, the calibration approach interprets structural models as approximations and hence inherently false. The difference is that the calibrationist literature rejects mathematical statistics (deeming classical theory as inappropriate for approximate models) and instead selects parameters by matching model and data moments using non-statistical ad hoc methods.¹

1.3 Econometric Terms and Notation

In a typical application, an econometrician has a set of repeated measurements on a set of variables. For example, in a labor application the variables could include weekly earnings, educational attainment, age, and other descriptive characteristics. We call this information the data, dataset, or sample.

We use the term observations to refer to the distinct repeated measurements on the variables. An individual observation often corresponds to a specific economic unit, such as a person, household, corporation, firm, organization, country, state, city or other geographical region.
An individual observation could also be a measurement at a point in time, such as quarterly GDP or a daily interest rate.

¹ Ad hoc means "for this purpose": a method designed for a specific problem, and not based on a generalizable principle.

Economists typically denote variables by the italicized roman characters y, x, and/or z. The convention in econometrics is to use the character y to denote the variable to be explained, while the characters x and z are used to denote the conditioning (explaining) variables.

Following mathematical convention, real numbers (elements of the real line R, also called scalars) are written using lower case italics such as y, and vectors (elements of R^k) by lower case bold italics such as x, e.g. the column vector

x = (x_1, x_2, ..., x_k)'

Upper case bold italics such as X are used for matrices.

We denote the number of observations by the natural number n, and subscript the variables by the index i to denote the individual observation, e.g. y_i, x_i and z_i. In some contexts we use indices other than i, such as in time-series applications where the index t is common, and T is used to denote the number of observations. In panel studies we typically use the double index it to refer to individual i at a time period t.

The i-th observation is the set (y_i, x_i, z_i). The sample is the set {(y_i, x_i, z_i) : i = 1, ..., n}.

It is proper mathematical practice to use upper case for random variables and lower case for realizations or specific values. Since we use upper case to denote matrices, the distinction between random variables and their realizations is not rigorously followed in econometric notation. Thus the notation y will in some places refer to a random variable, and in other places a specific realization. This is undesirable, but there is little to be done about it without terrifically complicating the notation. Hopefully there will be no confusion as the use should be evident from the context.

We typically use Greek letters such as β, θ and σ² to denote unknown parameters of an econometric model, and will use boldface, e.g.
β or θ, when these are vector-valued. Estimates are typically denoted by putting a hat "^", tilde "~" or bar "-" over the corresponding letter, e.g. β̂ and β̃ are estimates of β.

The covariance matrix of an econometric estimator will typically be written using the capital boldface V, often with a subscript to denote the estimator, e.g. V_β̂ = var(β̂) as the covariance matrix for β̂. Hopefully without causing confusion, we will use the notation V_β = avar(β̂) to denote the asymptotic covariance matrix of √n(β̂ − β) (the variance of the asymptotic distribution). Estimates will be denoted by appending hats or tildes, e.g. V̂_β is an estimate of V_β.

1.4 Observational Data

A common econometric question is to quantify the impact of one set of variables on another variable. For example, a concern in labor economics is the returns to schooling: the change in earnings induced by increasing a worker's education, holding other variables constant. Another issue of interest is the earnings gap between men and women.

Ideally, we would use experimental data to answer these questions. To measure the returns to schooling, an experiment might randomly divide children into groups, mandate different levels of education to the different groups, and then follow the children's wage path after they mature and enter the labor force. The differences between the groups would be direct measurements of the effects of different levels of education. However, experiments such as this would be widely condemned as immoral! Consequently, in economics non-laboratory experimental data sets are typically narrow in scope.

Instead, most economic data is observational. To continue the above example, through data collection we can record the level of a person's education and their wage. With such data we can measure the joint distribution of these variables, and assess the joint dependence. But from observational data it is difficult to infer causality, as we are not able to manipulate one variable to see the direct effect on the other.
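This confounding problem can be sketched in a short simulation. The data-generating process and all numbers below are hypothetical, chosen only for illustration: an unobserved "ability" factor raises both schooling and wages, while schooling itself has no direct effect on wages in the simulated model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process, for illustration only:
# unobserved ability raises BOTH schooling and log wages, while
# schooling has no direct effect on wages in this model.
ability = rng.normal(size=n)
education = 12.0 + 2.0 * ability + rng.normal(size=n)            # years of schooling
log_wage = 2.5 + 0.5 * ability + rng.normal(scale=0.5, size=n)   # no education term

# The observed correlation is strongly positive even though the
# direct causal effect of education on wages is zero by construction.
corr = np.corrcoef(education, log_wage)[0, 1]
print(f"corr(education, log wage) = {corr:.2f}")  # around 0.63
```

A researcher who observes only the (education, wage) pairs cannot distinguish this mechanism from a direct causal channel: the joint distribution is the same.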
For example, a person's level of education is (at least partially) determined by that person's choices. These factors are likely to be affected by their personal abilities and attitudes towards work. The fact that a person is highly educated suggests a high level of ability, which suggests a high relative wage. This is an alternative explanation for an observed positive correlation between educational levels and wages. High ability individuals do better in school, and therefore choose to attain higher levels of education, and their high ability is the fundamental reason for their high wages. The point is that multiple explanations are consistent with a positive correlation between schooling levels and wages. Knowledge of the joint distribution alone may not be able to distinguish between these explanations.

Most economic data sets are observational, not experimental. This means that all variables must be treated as random and possibly jointly determined.

This discussion means that it is difficult to infer causality from observational data alone. Causal inference requires identification, and this is based on strong assumptions. We will discuss these issues on occasion throughout the text.

1.5 Standard Data Structures

There are five major types of economic data sets: cross-sectional, time-series, panel, clustered, and spatial. They are distinguished by the dependence structure across observations.

Cross-sectional data sets have one observation per individual. Surveys and administrative records are a typical source for cross-sectional data. In typical applications, the individuals surveyed are persons, households, firms or other economic agents. In many contemporary econometric cross-section studies the sample size is quite large. It is conventional to assume that cross-sectional observations are mutually independent. Most of this text is devoted to the study of cross-section data.

Time-series data are indexed by time.
Typical examples include macroeconomic aggregates, prices and interest rates. This type of data is characterized by serial dependence. Most aggregate economic data is only available at a low frequency (annual, quarterly or perhaps monthly) so the sample size is typically much smaller than in cross-section studies. An exception is financial data, where data are available at a high frequency (weekly, daily, hourly, or by transaction), so sample sizes can be quite large.

Panel data combines elements of cross-section and time-series. These data sets consist of a set of individuals (typically persons, households, or corporations) measured repeatedly over time. The common modeling assumption is that the individuals are mutually independent of one another, but a given individual's observations are mutually dependent. In some panel data contexts, the number of time series observations per individual is small while the number of individuals is large. In other panel data contexts (for example when countries or states are taken as the unit of measurement) the number of individuals can be small while the number of time series observations can be moderately large. An important issue in econometric panel data is the treatment of error components.

Clustered samples are increasingly popular in applied economics, and are related to panel data. In clustered sampling, the observations are grouped into "clusters" which are treated as mutually independent, yet allowed to be dependent within the cluster. The major difference with panel data is that clustered sampling typically does not explicitly model error component structures, nor the dependence within clusters, but rather is concerned with inference which is robust to arbitrary forms of within-cluster correlation.

Spatial dependence is another model of interdependence. The observations are treated as mutually dependent according to a spatial measure (for example, geographic proximity).
Unlike clustering, spatial models allow all observations to be mutually dependent, and typically rely on explicit modeling of the dependence relationships. Spatial dependence can also be viewed as a generalization of time series dependence.

Data Structures
• Cross-section
• Time-series
• Panel
• Clustered
• Spatial

As we mentioned above, most of this text will be devoted to cross-sectional data under the assumption of mutually independent observations. By mutual independence we mean that the i-th observation (y_i, x_i, z_i) is independent of the j-th observation (y_j, x_j, z_j) for i ≠ j. (Sometimes the label “independent” is misconstrued. It is a statement about the relationship between observations i and j, not a statement about the relationship between y_i and x_i and/or z_i.) In this case we say that the data are independently distributed. Furthermore, if the data is randomly gathered, it is reasonable to model each observation as a draw from the same probability distribution. In this case we say that the data are identically distributed. If the observations are mutually independent and identically distributed, we say that the observations are independent and identically distributed, iid, or a random sample. For most of this text we will assume that our observations come from a random sample.

Definition 1.5.1 The observations (y_i, x_i, z_i) are a sample from the distribution F if they are identically distributed across i = 1, ..., n with joint distribution F.

Definition 1.5.2 The observations (y_i, x_i, z_i) are a random sample if they are mutually independent and identically distributed (iid) across i = 1, ..., n.

In the random sampling framework, we think of an individual observation (y_i, x_i, z_i) as a realization from a joint probability distribution F(y, x, z), which we can call the population. This “population” is infinitely large. This abstraction can be a source of confusion as it does not correspond to a physical population in the real world.
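To make the iid notion concrete, here is a minimal Python sketch (not from the text; the log-normal “wage” distribution and all parameter values are arbitrary stand-ins) in which every observation is an independent draw from one common distribution:

```python
import random
import statistics

random.seed(42)

# Each observation is an independent draw from the SAME distribution
# (a hypothetical log-normal wage distribution, chosen only for
# illustration) -- this is what "iid" / "random sample" means.
n = 1000
sample = [random.lognormvariate(3.0, 0.6) for _ in range(n)]

# Identically distributed: every draw shares the same population mean,
# so the sample average estimates that one common mean.
print(len(sample))
print(round(statistics.mean(sample), 2))
```

Independence here is a statement about the relationship between different draws, not between the components of a single draw, exactly as in the definition above.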
It is an abstraction since the distribution F is unknown, and the goal of statistical inference is to learn about features of F from the sample. The assumption of random sampling provides the mathematical foundation for treating economic statistics with the tools of mathematical statistics. The random sampling framework was a major intellectual breakthrough of the late 19th century, allowing the application of mathematical statistics to the social sciences. Before this conceptual development, methods from mathematical statistics had not been applied to economic data as the latter was viewed as non-random. The random sampling framework enabled economic samples to be treated as random, a necessary precondition for the application of statistical methods.

1.6 Sources for Economic Data

Fortunately for economists, the internet provides a convenient forum for dissemination of economic data. Many large-scale economic datasets are available without charge from governmental agencies. An excellent starting point is the Resources for Economists Data Links, available at rfe.org, where you can find almost every publicly available economic data set. Some specific data sources of interest include
• Bureau of Labor Statistics
• US Census
• Current Population Survey
• Survey of Income and Program Participation
• Panel Study of Income Dynamics
• Federal Reserve System (Board of Governors and regional banks)
• National Bureau of Economic Research
• U.S. Bureau of Economic Analysis
• CompuStat
• International Financial Statistics

Another good source of data is from authors of published empirical studies. Most journals in economics require authors of published papers to make their datasets generally available. For example, in its instructions for submission, Econometrica states: Econometrica has the policy that all empirical, experimental and simulation results must be replicable.
Therefore, authors of accepted papers must submit data sets, programs, and information on empirical analysis, experiments and simulations that are needed for replication and some limited sensitivity analysis. The American Economic Review states: All data used in analysis must be made available to any researcher for purposes of replication. The Journal of Political Economy states: It is the policy of the Journal of Political Economy to publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication. If you are interested in using the data from a published paper, first check the journal’s website, as many journals archive data and replication programs online. Second, check the website(s) of the paper’s author(s). Most academic economists maintain webpages, and some make available replication files complete with data and programs. If these investigations fail, email the author(s), politely requesting the data. You may need to be persistent. As a matter of professional etiquette, all authors absolutely have the obligation to make their data and programs available. Unfortunately, many fail to do so, and typically for poor reasons. The irony of the situation is that it is typically in the best interests of a scholar to make as much of their work (including all data and programs) freely available, as this only increases the likelihood of their work being cited and having an impact. Keep this in mind as you start your own empirical project. Remember that as part of your end product, you will need (and want) to provide all data and programs to the community of scholars. The greatest form of flattery is to learn that another scholar has read your paper, wants to extend your work, or wants to use your empirical methods. In addition, public openness provides a healthy incentive for transparency and integrity in empirical analysis.
1.7 Econometric Software

Economists use a variety of econometric, statistical, and programming software. Stata (www.stata.com) is a powerful statistical program with a broad set of pre-programmed econometric and statistical tools. It is quite popular among economists, and is continuously being updated with new methods. It is an excellent package for most econometric analysis, but is limited when you want to use new or less-common econometric methods which have not yet been programmed. R (www.r-project.org), GAUSS (www.aptech.com), MATLAB (www.mathworks.com), and OxMetrics (www.oxmetrics.net) are high-level matrix programming languages with a wide variety of built-in statistical functions. Many econometric methods have been programmed in these languages and are available on the web. The advantage of these packages is that you are in complete control of your analysis, and it is easier to program new methods than in Stata. Some disadvantages are that you have to do much of the programming yourself, programming complicated procedures takes significant time, and programming errors are hard to prevent and difficult to detect and eliminate. Of these languages, GAUSS used to be quite popular among econometricians, but currently MATLAB is more popular. A smaller but growing group of econometricians are enthusiastic fans of R, which is unique among these languages in being open-source, user-contributed, and, best of all, completely free! For highly-intensive computational tasks, some economists write their programs in a standard programming language such as Fortran or C. This can lead to major gains in computational speed, at the cost of increased time in programming and debugging. As these different packages have distinct advantages, many empirical economists end up using more than one package. As a student of econometrics, you will learn at least one of these packages, and probably more than one.

1.8 Reading the Manuscript

I have endeavored to use a unified notation and nomenclature.
The development of the material is cumulative, with later chapters building on the earlier ones. Nevertheless, every attempt has been made to make each chapter self-contained, so readers can pick and choose topics according to their interests. To fully understand econometric methods, it is necessary to have a mathematical understanding of its mechanics, and this includes the mathematical proofs of the main results. Consequently, this text is self-contained, with nearly all results proved with full mathematical rigor. The mathematical development and proofs aim at brevity and conciseness (sometimes described as mathematical elegance), but also at pedagogy. To understand a mathematical proof, it is not sufficient to simply read the proof, you need to follow it, and re-create it for yourself. Nevertheless, many readers will not be interested in each mathematical detail, explanation, or proof. This is okay. To use a method it may not be necessary to understand the mathematical details. Accordingly I have placed the more technical mathematical proofs and details in chapter appendices. These appendices and other technical sections are marked with an asterisk (*). These sections can be skipped without any loss in exposition.
1.9 Common Symbols

y          scalar
x          vector
X          matrix
R          real line
R^k        Euclidean space
E(y)       mathematical expectation
var(y)     variance
cov(x, y)  covariance
var(x)     covariance matrix
corr(x, y) correlation
Pr         probability
→          limit
→_p        convergence in probability
→_d        convergence in distribution
plim_{n→∞} probability limit
N(0, 1)    standard normal distribution
N(μ, σ²)   normal distribution with mean μ and variance σ²
χ²_k       chi-square distribution with k degrees of freedom
I_n        n × n identity matrix
tr A       trace
A′         matrix transpose
A⁻¹        matrix inverse
A > 0      positive definite
A ≥ 0      positive semi-definite
‖a‖        Euclidean norm
‖A‖        matrix (Frobenius or spectral) norm
≈          approximate equality
=          definitional equality
∼          is distributed as
log        natural logarithm

Chapter 2 Conditional Expectation and Projection

2.1 Introduction

The most commonly applied econometric tool is least-squares estimation, also known as regression. As we will see, least-squares is a tool to estimate an approximate conditional mean of one variable (the dependent variable) given another set of variables (the regressors, conditioning variables, or covariates). In this chapter we abstract from estimation, and focus on the probabilistic foundation of the conditional expectation model and its projection approximation.

2.2 The Distribution of Wages

Suppose that we are interested in wage rates in the United States. Since wage rates vary across workers, we cannot describe wage rates by a single number. Instead, we can describe wages using a probability distribution. Formally, we view the wage of an individual worker as a random variable wage with the probability distribution

F(u) = Pr(wage ≤ u).

When we say that a person’s wage is random we mean that we do not know their wage before it is measured, and we treat observed wage rates as realizations from the distribution F. Treating unobserved wages as random variables and observed wages as realizations is a powerful mathematical abstraction which allows us to use the tools of mathematical probability.
A useful thought experiment is to imagine dialing a telephone number selected at random, and then asking the person who responds to tell us their wage rate. (Assume for simplicity that all workers have equal access to telephones, and that the person who answers your call will respond honestly.) In this thought experiment, the wage of the person you have called is a single draw from the distribution of wages in the population. By making many such phone calls we can learn the distribution of the entire population. When a distribution function F is differentiable we define the probability density function

f(u) = (d/du) F(u).

The density contains the same information as the distribution function, but the density is typically easier to visually interpret.

[Figure 2.1: Wage Distribution and Density. All full-time U.S. workers]

In Figure 2.1 we display estimates¹ of the probability distribution function (on the left) and density function (on the right) of U.S. wage rates in 2009. We see that the density is peaked around $15, and most of the probability mass appears to lie between $10 and $40. These are ranges for typical wage rates in the U.S. population. Important measures of central tendency are the median and the mean. The median m of a continuous² distribution F is the unique solution to

F(m) = 1/2.

The median U.S. wage ($19.23) is indicated in the left panel of Figure 2.1 by the arrow. The median is a robust³ measure of central tendency, but it is tricky to use for many calculations as it is not a linear operator.
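The defining property F(m) = 1/2 can be checked numerically. A minimal sketch with simulated, hypothetical data (not the CPS sample used in the text): the empirical median leaves about half of the observations below it.

```python
import random
import statistics

random.seed(1)

# Hypothetical right-skewed "wage" sample (log-normal, for illustration only)
wages = [random.lognormvariate(3.0, 0.6) for _ in range(10_000)]

median = statistics.median(wages)

# The median m solves F(m) = 1/2: roughly half the observations lie below it.
share_below = sum(w <= median for w in wages) / len(wages)
print(round(share_below, 2))  # close to 0.50
```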
The expectation or mean of a random variable y with density f is

μ = E(y) = ∫_{−∞}^{∞} u f(u) du.

Here we have used the common and convenient convention of using the single character y to denote a random variable, rather than the more cumbersome label wage. A general definition of the mean is presented in Section 2.30. The mean U.S. wage ($23.90) is indicated in the right panel of Figure 2.1 by the arrow. We sometimes use the notation E y instead of E(y) when the variable whose expectation is being taken is clear from the context. There is no distinction in meaning. The mean is a convenient measure of central tendency because it is a linear operator and arises naturally in many economic models. A disadvantage of the mean is that it is not robust,⁴ especially in the presence of substantial skewness or thick tails, which are both features of the wage distribution, as can be seen easily in the right panel of Figure 2.1. Another way of viewing this is that 64% of workers earn less than the mean wage of $23.90, suggesting that it is incorrect to describe the mean as a “typical” wage rate.

¹ The distribution and density are estimated nonparametrically from the sample of 50,742 full-time non-military wage-earners reported in the March 2009 Current Population Survey. The wage rate is constructed as annual individual wage and salary earnings divided by hours worked.
² If F is not continuous the definition is m = inf{u : F(u) ≥ 1/2}.
³ The median is not sensitive to perturbations in the tails of the distribution.
⁴ The mean is sensitive to perturbations in the tails of the distribution.

[Figure 2.2: Log Wage Density]

In this context it is useful to transform the data by taking the natural logarithm.⁵ Figure 2.2 shows the density of log hourly wages log(wage) for the same population, with its mean 2.95 drawn in with the arrow.
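The non-robustness of the mean can be illustrated with made-up numbers (hypothetical, not from the CPS data): a single extreme observation in the tail moves the mean a great deal, while the median barely changes.

```python
import statistics

# Hypothetical hourly wages, chosen only for illustration
wages = [12, 15, 18, 19, 21, 24, 30, 45]
print(statistics.mean(wages))    # 23.0
print(statistics.median(wages))  # 20.0

# Add one very high earner: the mean jumps, the median barely moves.
wages_with_outlier = wages + [1000]
print(statistics.median(wages_with_outlier))  # 21
```

This is exactly the sense in which the median is "not sensitive to perturbations in the tails" while the mean is.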
The density of log wages is much less skewed and fat-tailed than the density of the level of wages, so its mean E(log(wage)) = 2.95⁶ is a much better (more robust) measure of central tendency of the distribution. For this reason, wage regressions typically use log wages as a dependent variable rather than the level of wages. Another useful way to summarize the probability distribution F(u) is in terms of its quantiles. For any α ∈ (0, 1), the α-th quantile of the continuous⁷ distribution F is the real number q_α which satisfies

F(q_α) = α.

The quantile function q_α, viewed as a function of α, is the inverse of the distribution function F. The most commonly used quantile is the median, that is, q_{0.5} = m. We sometimes refer to quantiles by the percentile representation of α, and in this case they are often called percentiles; e.g., the median is the 50th percentile.

2.3 Conditional Expectation

We saw in Figure 2.2 the density of log wages. Is this distribution the same for all workers, or does the wage distribution vary across subpopulations? To answer this question, we can compare wage distributions for different groups — for example, men and women. The plot on the left in Figure 2.3 displays the densities of log wages for U.S. men and women with their means (3.05 and 2.81) indicated by the arrows. We can see that the two wage densities take similar shapes but the density for men is somewhat shifted to the right with a higher mean.

⁵ Throughout the text, we will use log(y) or log y to denote the natural logarithm of y.
⁶ More precisely, the geometric mean exp(E(log(wage))) = $19.11 is a robust measure of central tendency.
⁷ If F is not continuous the definition is q_α = inf{u : F(u) ≥ α}.
[Figure 2.3: Log Wage Density by Sex and Race. Panel (a): Women and Men. Panel (b): By Sex and Race (white men, white women, black men, black women)]

The values 3.05 and 2.81 are the mean log wages in the subpopulations of men and women workers. They are called the conditional means (or conditional expectations) of log wages given sex. We can write their specific values as

E(log(wage) | sex = man) = 3.05    (2.1)
E(log(wage) | sex = woman) = 2.81    (2.2)

We call these means conditional as they are conditioning on a fixed value of the variable sex. While you might not think of a person’s sex as a random variable, it is random from the viewpoint of econometric analysis. If you randomly select an individual, the sex of the individual is unknown and thus random. (In the population of U.S. workers, the probability that a worker is a woman happens to be 43%.) In observational data, it is most appropriate to view all measurements as random variables, and the means of subpopulations are then conditional means. As the two densities in Figure 2.3 appear similar, a hasty inference might be that there is not a meaningful difference between the wage distributions of men and women. Before jumping to this conclusion let us examine the differences in the distributions of Figure 2.3 more carefully. As we mentioned above, the primary difference between the two densities appears to be their means. This difference equals

E(log(wage) | sex = man) − E(log(wage) | sex = woman) = 3.05 − 2.81 = 0.24    (2.3)

A difference in expected log wages of 0.24 implies an average 24% difference between the wages of men and women, which is quite substantial. (For an explanation of logarithmic and percentage differences see Section 2.4.) Consider further splitting the men and women subpopulations by race, dividing the population into whites, blacks, and other races.
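Computationally, a conditional mean such as (2.1) is just a within-group average. A toy sketch with made-up numbers (not the CPS estimates quoted above):

```python
# Hypothetical (sex, log wage) observations, for illustration only
data = [
    ("man", 3.1), ("man", 2.9), ("man", 3.2),
    ("woman", 2.7), ("woman", 2.9), ("woman", 2.8),
]

def cond_mean(obs, sex):
    """Average of log(wage) within the subpopulation where sex is fixed."""
    vals = [lw for s, lw in obs if s == sex]
    return sum(vals) / len(vals)

# Analogue of the difference (2.3), here with toy numbers
gap = cond_mean(data, "man") - cond_mean(data, "woman")
print(round(gap, 2))  # 0.27
```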
We display the log wage density functions of four of these groups on the right in Figure 2.3. Again we see that the primary difference between the four density functions is their central tendency.

Table 2.1: Mean Log Wages by Sex and Race

         men     women
white    3.07    2.82
black    2.86    2.73
other    3.03    2.86

Focusing on the means of these distributions, Table 2.1 reports the mean log wage for each of the six sub-populations. The entries in Table 2.1 are the conditional means of log(wage) given sex and race. For example,

E(log(wage) | sex = man, race = white) = 3.07

and

E(log(wage) | sex = woman, race = black) = 2.73.

One benefit of focusing on conditional means is that they reduce complicated distributions to a single summary measure, and thereby facilitate comparisons across groups. Because of this simplifying property, conditional means are the primary interest of regression analysis and are a major focus in econometrics. Table 2.1 allows us to easily calculate average wage differences between groups. For example, we can see that the wage gap between men and women continues after disaggregation by race, as the average gap between white men and white women is 25%, and that between black men and black women is 13%. We also can see that there is a race gap, as the average wages of blacks are substantially less than the other race categories. In particular, the average wage gap between white men and black men is 21%, and that between white women and black women is 9%.

2.4 Log Differences*

A useful approximation for the natural logarithm for small x is

log(1 + x) ≈ x.    (2.4)

This can be derived from the infinite series expansion of log(1 + x):

log(1 + x) = x − x²/2 + x³/3 − x⁴/4 + ···
           = x + O(x²).

The symbol O(x²) means that the remainder is bounded by Ax² as x → 0 for some A < ∞. A plot of log(1 + x) and the linear approximation x shows that they are very close for |x| ≤ 0.1, and reasonably close for |x| ≤ 0.2, but the difference increases with |x|.
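The quality of approximation (2.4) is easy to check numerically; a quick sketch:

```python
import math

# The error of log(1 + x) ≈ x is O(x^2): small for |x| <= 0.1,
# still modest for |x| <= 0.2, and growing quickly beyond that.
for x in (0.05, 0.10, 0.20, 0.50):
    approx_error = abs(math.log(1 + x) - x)
    print(x, round(approx_error, 4))
```

Running this shows the error staying below about 0.005 through x = 0.1 but approaching 0.1 by x = 0.5, matching the statement in the text.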
Now, if y* is x% greater than y, then

y* = (1 + x/100) y.

Taking natural logarithms,

log y* = log y + log(1 + x/100)

or

log y* − log y = log(1 + x/100) ≈ x/100

where the approximation is (2.4). This shows that 100 multiplied by the difference in logari