Economic Statistics and Econometrics (ECON 140)
This 116-page set of class notes was uploaded by Dr. Janiya Bernier on Thursday, October 22, 2015. The notes belong to ECON 140 at University of California, Berkeley, taught by Staff in Fall. Since its upload, it has received 12 views. For similar materials see /class/226709/econ-140-university-of-california-berkeley in Economics at University of California, Berkeley.
Introduction to Basic Probability Concepts
Charlie Gibbons
Economics 140
September 3, 2009

Outline
- Probability: basics, joint probabilities, conditional probabilities, independence
- Expectations: definition and properties, conditional expectations
- Dispersion: variance, covariance and correlation

Preliminary definitions

We begin with a random variable $X$. If $X$ takes a countable number of values, it is discrete; otherwise it is continuous. Example: the outcome of a die roll is a discrete random variable, while an individual's income is a continuous random variable.

The simplest and most intuitive way to calculate a probability is the probability mass function (PMF). The PMF is $\Pr(X = x)$ and is calculated for discrete random variables. For rolling a die, we have $\Pr(\text{roll a 1, 2, or 3}) = \frac{1}{2}$.

Define the cumulative distribution function (CDF), $F_X(x)$, as $\Pr(X \le x)$. In the die-rolling example, the CDF gives the probability of rolling a number less than or equal to $x$:
$F_X(3) = \Pr(X \le 3) = \Pr(X = 1) + \Pr(X = 2) + \Pr(X = 3) = \frac{1}{2}.$

The CDF has three important properties:
- $\lim_{x \to -\infty} F_X(x) = 0$: you can't get anything less than $-\infty$;
- $\lim_{x \to \infty} F_X(x) = 1$: everything is less than $\infty$;
- $dF_X(x) \ge 0$: the CDF is non-decreasing.

We saw that the PMF of a discrete random variable is $\Pr(X = x)$; thus the CDF is
$F_X(x) = \sum_{y \le x} \Pr(X = y).$

A continuous random variable has a sample space with an uncountable number of outcomes. Here, the CDF is defined as
$F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy.$

We can define the probability density function (PDF) for a continuous variable as
$f_X(x) = \frac{dF_X(x)}{dx}$
by the Fundamental Theorem of Calculus. By our assumptions on the CDF, the PDF is always non-negative.
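The die example can be checked numerically. A quick Python sketch (not part of the original notes) builds the PMF of a fair die and sums it to get the CDF:

```python
from fractions import Fraction

# A fair six-sided die: the PMF assigns 1/6 to each face.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """F_X(x) = Pr(X <= x): sum the PMF over all outcomes <= x."""
    return sum(p for face, p in pmf.items() if face <= x)

print(cdf(3))  # 1/2, i.e. Pr(roll a 1, 2, or 3)
print(cdf(0))  # 0: the CDF vanishes below the support
print(cdf(6))  # 1: the CDF reaches 1 at the top of the support
```

Using `Fraction` keeps the probabilities exact, so the three CDF properties above can be verified without rounding error.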
Joint Probabilities

Previously we considered the distribution of a lone random variable. Now we will consider the joint distribution of two random variables.

The joint cumulative distribution function (joint CDF), $F_{XY}(x, y)$, of the random variables $X$ and $Y$ is defined by
$F_{XY}(x, y) = \Pr(X \le x \text{ and } Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(s, t)\,dt\,ds.$
As with any CDF, $F_{XY}(x, y)$ must equal 1 as $x$ and $y$ go to infinity.

Consider the roll of two dice and let $X$ and $Y$ be the outcomes on each die. Then the 36 equally likely possibilities are:

        y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
x = 1   (1,1)   (1,2)   (1,3)   (1,4)   (1,5)   (1,6)
x = 2   (2,1)   (2,2)   (2,3)   (2,4)   (2,5)   (2,6)
x = 3   (3,1)   (3,2)   (3,3)   (3,4)   (3,5)   (3,6)
x = 4   (4,1)   (4,2)   (4,3)   (4,4)   (4,5)   (4,6)
x = 5   (5,1)   (5,2)   (5,3)   (5,4)   (5,5)   (5,6)
x = 6   (6,1)   (6,2)   (6,3)   (6,4)   (6,5)   (6,6)

What is $F_{XY}(2, 3)$? Six of the 36 outcomes have $x \le 2$ and $y \le 3$, so $F_{XY}(2, 3) = \frac{6}{36} = \frac{1}{6}$.

The joint probability mass function (joint PMF), $f_{XY}$, is
$f_{XY}(x, y) = \Pr(X = x \text{ and } Y = y).$
What is $f_{XY}(6, 5)$? Exactly one of the 36 outcomes has $X = 6$ and $Y = 5$, so $f_{XY}(6, 5) = \frac{1}{36}$.

For continuous random variables, the joint probability density function (joint PDF), $f_{XY}$, is defined by
$\Pr(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(s, t)\,dt\,ds.$

[Figure: joint CDF of two independent normal random variables]
[Figure: joint PDF of two independent normal random variables]

Let's take the joint CDF and let $y$ go to infinity, i.e., allow any possible value of $Y$. We get
$F_X(x) = F_{XY}(x, \infty),$
where $F_X(x)$ is the marginal cumulative distribution function (marginal CDF) of $X$.

What is $F_X(2)$? Twelve of the 36 outcomes have $x \le 2$ (with any $y$), so $F_X(2) = \frac{12}{36} = \frac{1}{3}$.

The marginal PMF of $X$ is
$f_X(x) = \sum_{y} f_{XY}(x, y).$
The marginal PDF of $X$ is
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy.$
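The two-dice joint distribution is small enough to enumerate. A Python sketch (not part of the original notes) that reproduces the joint CDF, joint PMF, and marginal CDF values above:

```python
from fractions import Fraction
from itertools import product

# Joint PMF of two fair dice: 36 equally likely (x, y) pairs.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def joint_cdf(x, y):
    """F_XY(x, y) = Pr(X <= x and Y <= y)."""
    return sum(p for (s, t), p in joint.items() if s <= x and t <= y)

def marginal_cdf_x(x):
    # Let y go to "infinity" (here, the top of Y's support).
    return joint_cdf(x, 6)

print(joint_cdf(2, 3))    # 1/6: six of the 36 outcomes qualify
print(joint[(6, 5)])      # 1/36: the joint PMF at (6, 5)
print(marginal_cdf_x(2))  # 1/3: the marginal CDF F_X(2)
```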
Note that, while a marginal PDF (PMF) can be found from a joint PDF (PMF), the converse is not true: there are an infinite number of joint PDFs (PMFs) that are consistent with a given marginal PDF.

Conditional Probabilities

Suppose that the value of $X$ in a joint distribution is known; what can we say about the distribution of $Y$ given this knowledge? This is called the conditional distribution of $Y$ given $X = x$. The conditional PDF (PMF) of $Y$ given $X = x$, $f_{Y|X}(y \mid X = x)$, is defined by
$f_{Y|X}(y \mid X = x) = \frac{f_{XY}(x, y)}{f_X(x)}.$
As for any PDF (PMF), over the support of $Y$, the conditional PDF (PMF) must integrate (sum) to 1. It must also be non-negative for all real values. We divide by $f_X(x)$ because we've changed the sample space from all values of $X$ to just $X = x$.

What is $f_{Y|X}(y = 3 \mid X \le 2)$? Of the 12 equally likely outcomes with $X \le 2$, two have $Y = 3$, so
$f_{Y|X}(y = 3 \mid X \le 2) = \frac{2}{12} = \frac{1}{6}.$

Independence

$X$ and $Y$ are independent if and only if
$F_{XY}(x, y) = F_X(x) F_Y(y) \quad \text{and} \quad f_{XY}(x, y) = f_X(x) f_Y(y).$
We also see that $X$ and $Y$ are independent if and only if
$f_{Y|X}(y \mid X = x) = f_Y(y) \quad \forall x.$
This implies that knowing $X$ gives you no additional ability to predict $Y$, an intuitive notion underlying independence.

We showed in the two-dice example that $F_{XY}(2, 3) = \frac{1}{6}$, which is equal to
$F_X(2) \times F_Y(3) = \frac{2}{6} \times \frac{3}{6} = \frac{1}{6}.$
This is because the rolls of the two dice are intuitively independent: the result on one die has no bearing on that of the other.

Imagine instead that $X$ is the outcome on the first die and $Z$ is the sum of the outcomes on the two dice. Then the 36 equally likely values of $z$ are:

        y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
x = 1     2       3       4       5       6       7
x = 2     3       4       5       6       7       8
x = 3     4       5       6       7       8       9
x = 4     5       6       7       8       9      10
x = 5     6       7       8       9      10      11
x = 6     7       8       9      10      11      12

As we would imagine, the result of $X$ influences the value of $Z$, so they shouldn't be independent. Let's prove it. What is $F_{XZ}(2, 5)$?
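The independence check for $X$ and $Z$ can also be run by enumeration. A Python sketch (not part of the original notes, and it gives away the answer to the question just posed):

```python
from fractions import Fraction
from itertools import product

# X = first die, Z = X + Y, where Y is the second die.
p = Fraction(1, 36)
outcomes = [(x, x + y) for x, y in product(range(1, 7), repeat=2)]

def F_XZ(x, z):
    """Joint CDF: Pr(X <= x and Z <= z)."""
    return sum(p for s, t in outcomes if s <= x and t <= z)

def F_X(x):
    return F_XZ(x, 12)   # marginal CDF of X

def F_Z(z):
    return F_XZ(6, z)    # marginal CDF of Z

# Independence would require F_XZ(x, z) = F_X(x) * F_Z(z) for all (x, z).
print(F_XZ(2, 5))        # 7/36
print(F_X(2) * F_Z(5))   # (2/6) * (10/36) = 5/54
print(F_XZ(2, 5) == F_X(2) * F_Z(5))  # False: X and Z are not independent
```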
Counting outcomes in the table with $x \le 2$ and $z \le 5$ (four outcomes with $x = 1$ and three with $x = 2$) gives
$F_{XZ}(2, 5) = \frac{7}{36},$
while
$F_X(2) \times F_Z(5) = \frac{2}{6} \times \frac{10}{36} = \frac{5}{54}.$
These are not equal, so $X$ and $Z$ are not independent.

Expectations of Random Variables

The expectation, $E(X) \equiv \mu$, of a random variable $X$ is simply the average of the possible realizations of $X$, weighted by their probability. For discrete random variables, this can be written as
$E(X) = \sum_{x \in X} \Pr(X = x)\, x = \sum_{x \in X} x\, f(x).$
In our die example,
$E(X) = 1 \times \tfrac{1}{6} + 2 \times \tfrac{1}{6} + 3 \times \tfrac{1}{6} + 4 \times \tfrac{1}{6} + 5 \times \tfrac{1}{6} + 6 \times \tfrac{1}{6} = \tfrac{21}{6} = 3.5.$
For continuous random variables,
$E(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx.$

Expectations of Functions of Random Variables

The definition of expectation can be generalized to functions of random variables. We have the equations
$E[g(X)] = \sum_{x \in X} g(x) f(x) \qquad \text{and} \qquad E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$
for discrete and continuous random variables, respectively.

Properties of Expectations

Expectations are linear operators, i.e.,
$E[a + b\,g(X) + c\,h(X)] = a + b\,E[g(X)] + c\,E[h(X)].$
Note that, in general, $E[g(X)] \ne g(E(X))$.
In our die example,
$E(X^2) = 1^2 \times \tfrac{1}{6} + 2^2 \times \tfrac{1}{6} + \cdots + 6^2 \times \tfrac{1}{6} = \tfrac{91}{6} \approx 15.17 \ne 3.5^2 = 12.25 = [E(X)]^2.$
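Both die-expectation calculations above can be verified exactly. A short Python sketch (not part of the original notes):

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die

EX = sum(x * p for x in faces)      # E(X): weight each outcome by its probability
EX2 = sum(x**2 * p for x in faces)  # E[g(X)] with g(x) = x^2: weight g(x), not x

print(EX)            # 7/2, i.e. 3.5
print(EX2)           # 91/6, about 15.17
print(EX2 == EX**2)  # False: in general E[g(X)] != g(E(X))
```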
Conditional Expectations

The expectation of a random variable $Y$ conditional on (or "given") $X$ is defined analogously to the preceding formulations, but uses the conditional distribution $f_{Y|X}(y \mid X = x)$ rather than the unconditional $f_Y(y)$. Conditional distributions are about changing the population that you are considering.

For discrete random variables,
$E(Y \mid X = x) = \sum_{y \in Y} \Pr(Y = y \mid X = x)\, y = \sum_{y \in Y} y\, f_{Y|X}(y \mid X = x).$
For continuous random variables,
$E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid X = x)\,dy.$

Recall that, for independent random variables $X$ and $Y$,
$f_{Y|X}(y \mid X = x) = f_Y(y) \quad \text{and} \quad f_{XY}(x, y) = f_X(x) f_Y(y).$
Hence,
$E(Y \mid X) = E(Y) \quad \text{and} \quad E(XY) = E(X)E(Y).$

Variance

The variance of a random variable is a measure of its dispersion around its mean. It is defined as the second central moment of $X$:
$\mathrm{Var}(X) = E[(X - \mu)^2].$
Multiplying this out yields
$E[X^2 - 2\mu X + \mu^2] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.$
The standard deviation, $\sigma$, of a random variable is the square root of its variance, i.e., $\sigma = \sqrt{\mathrm{Var}(X)}$.
See that $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.
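For the die, both variance formulas and the affine-transformation rule can be checked exactly. A Python sketch (not part of the original notes):

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)

mu = sum(x * p for x in faces)                       # E(X) = 7/2
var = sum((x - mu) ** 2 * p for x in faces)          # E[(X - mu)^2]
var_shortcut = sum(x**2 * p for x in faces) - mu**2  # E(X^2) - mu^2

print(var)                  # 35/12
print(var == var_shortcut)  # True: the two variance formulas agree

# Var(aX + b) = a^2 Var(X): the shift b does not affect dispersion.
a, b = 3, 10
var_affine = sum((a * x + b - (a * mu + b)) ** 2 * p for x in faces)
print(var_affine == a**2 * var)  # True
```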
Covariance and Correlation

The covariance of random variables $X$ and $Y$ is defined as
$\mathrm{Cov}(X, Y) \equiv \sigma_{XY} = E[(X - E(X))(Y - E(Y))] = E(XY) - \mu_X \mu_Y.$
We have
$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y).$
Note that covariance only measures the linear relationship between two random variables. The covariance between two independent random variables is 0.

The correlation of random variables $X$ and $Y$ is defined as
$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.$
Correlation is a normalized version of covariance: how big is the covariance relative to the variation in $X$ and $Y$? Both will have the same sign.

Regression Inference
Charlie Gibbons
Economics 140
September 23, 2009

Outline
- Unbiasedness
- Confidence intervals
- Hypothesis testing
- Relating these concepts

Unbiasedness

Suppose that we are interested in a univariate regression,
$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i.$
Our estimates of the coefficients are
$\hat\beta_2 = \frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}(X)}, \qquad \hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}.$

These estimates are unbiased, i.e., their expectation is equal to the true values. Let's show it:
$E(\hat\beta_2) = E\left[\frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}\right] = \frac{E[\mathrm{Cov}(X, \beta_1)] + E[\mathrm{Cov}(X, \beta_2 X)] + E[\mathrm{Cov}(X, \varepsilon)]}{\mathrm{Var}(X)} = \frac{\beta_2\,\mathrm{Var}(X)}{\mathrm{Var}(X)} = \beta_2,$
$E(\hat\beta_1) = E(\bar{Y} - \hat\beta_2 \bar{X}) = \beta_1 + \beta_2\bar{X} - \beta_2\bar{X} = \beta_1.$

Does unbiasedness mean that $\hat\beta_2 = \beta_2$? No. Unbiasedness only holds in expectation.

What if you are trying to estimate $\beta_1$ and $\beta_2$ with only three observations, each with different $x$ values? Are your estimates still unbiased? Yes. The distribution of the estimates that you get has a mean equal to the true value, but the variance of the estimate around the true value is very wide and decreases as you add more observations. Note that the variance of $\hat\beta_2$ is $\mathrm{Var}(Y)\,/\,[(N - 2)\,\mathrm{Var}(X)]$, which goes down as $N$ goes up.
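The three-observations claim is easy to illustrate by simulation. The sketch below (not part of the original notes; the true model $y = 1 + 2x + \varepsilon$ and standard Normal errors are assumptions) draws many tiny samples and averages the OLS slope estimates:

```python
import random

# A simulation sketch: check that the OLS slope estimator is unbiased
# even with only N = 3 observations per sample.
random.seed(0)
beta1, beta2 = 1.0, 2.0   # assumed true parameters
xs = [1.0, 2.0, 3.0]      # three fixed, distinct x values

def ols_slope(xs, ys):
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx      # sample Cov(X, Y) / Var(X)

reps = 20000
estimates = [ols_slope(xs, [beta1 + beta2 * x + random.gauss(0, 1) for x in xs])
             for _ in range(reps)]
mean_est = sum(estimates) / reps

print(round(mean_est, 1))  # 2.0: the average estimate recovers the true slope
```

Individual estimates vary a lot (each sample has only three points), but their average sits on top of the true slope, which is what unbiasedness promises.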
Confidence intervals

We'd like to find a range of values $b_2^L$ to $b_2^U$ such that the probability of $\beta_2$ lying in that range is equal to $1 - \alpha$. This is known as a $100(1 - \alpha)\%$ confidence interval. In math, we want
$\Pr(b_2^L < \beta_2 < b_2^U) = 1 - \alpha.$

Let's subtract our estimate $\hat\beta_2$ from all sides:
$\Pr(b_2^L - \hat\beta_2 < \beta_2 - \hat\beta_2 < b_2^U - \hat\beta_2) = 1 - \alpha.$
Now, divide all sides by the standard deviation of $\hat\beta_2$:
$\Pr\left(\frac{b_2^L - \hat\beta_2}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}} < \frac{\beta_2 - \hat\beta_2}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}} < \frac{b_2^U - \hat\beta_2}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}\right) = 1 - \alpha.$
Lastly, multiply everything by $-1$; this changes the direction of the inequalities:
$\Pr\left(\frac{\hat\beta_2 - b_2^U}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}} < \frac{\hat\beta_2 - \beta_2}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}} < \frac{\hat\beta_2 - b_2^L}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}\right) = 1 - \alpha.$

Our estimate $\hat\beta_2$ is a random variable whose mean is $\beta_2$, because it's unbiased. The expression $(\hat\beta_2 - \beta_2)/\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}$ takes a random variable, subtracts its mean, and divides by its standard deviation: a standardized random variable. If we knew the standard deviation of our estimate, this standardized variable would be distributed $N(0, 1)$. Since we estimate the standard deviation, it is distributed $t_{N-2}$, where $N$ is the number of observations and $N - 2$ is called the degrees of freedom.

Now we have
$\Pr\left(\frac{\hat\beta_2 - b_2^U}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}} < t_{N-2} < \frac{\hat\beta_2 - b_2^L}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}\right) = 1 - \alpha.$
This is the probability that our estimate is between two values, which equals 1 minus the probability of being above the high value minus the probability of being below the low value:
$1 - \Pr\left(t_{N-2} \ge \frac{\hat\beta_2 - b_2^L}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}\right) - \Pr\left(t_{N-2} < \frac{\hat\beta_2 - b_2^U}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}\right) = 1 - \alpha.$

Let the probability of being in each tail be the same. Then we have
$1 - 2 \times \Pr\left(t_{N-2} < \frac{\hat\beta_2 - b_2^U}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}\right) = 1 - \alpha
\quad\Longrightarrow\quad
\Pr\left(t_{N-2} < \frac{\hat\beta_2 - b_2^U}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}\right) = \frac{\alpha}{2}.$

Typically $\alpha = 0.05$. To figure out what $(\hat\beta_2 - b_2^U)/\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}$ has to be for this to hold, we give this command to Stata:
di invttail(df, 0.025)
or
di invnormal(0.025)
where df is the degrees of freedom (here $N - 2$) and $0.025 = \alpha/2$. We are looking for the lower-tail boundary, but Stata gives the upper-tail boundary for the $t$. This is fine because the $t$ and Normal distributions are symmetric, and we can just take the negative of this value.

So for us, with $N = 100$ and $\alpha = 0.05$:
$\Pr(t_{N-2} < -1.98) = 0.025
\quad\Longrightarrow\quad
\frac{\hat\beta_2 - b_2^U}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}} = -1.98
\quad\Longrightarrow\quad
b_2^U = \hat\beta_2 + 1.98\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}.$
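For readers without Stata, the Normal version of the critical-value lookup can be reproduced with the Python standard library (an assumption of these notes' editor, not the original material; the stdlib has no $t$ distribution, and the exact $t_{98}$ value from invttail(98, 0.025) is slightly larger, about 1.98):

```python
from statistics import NormalDist

# Sketch of Stata's "di invnormal(0.025)": the lower-tail boundary that
# puts probability 0.025 below it under the standard Normal.
lower = NormalDist().inv_cdf(0.025)  # a negative number

print(round(lower, 2))   # -1.96
print(round(-lower, 2))  # 1.96: negate to get the upper-tail boundary
```

This mirrors the symmetry argument in the text: the lower-tail boundary is just the negative of the upper-tail one.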
Similarly, $\Pr(t_{N-2} > 1.98) = 0.025$ gives
$\frac{\hat\beta_2 - b_2^L}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}} = 1.98
\quad\Longrightarrow\quad
b_2^L = \hat\beta_2 - 1.98\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}.$

So our 95% confidence interval is
$\left[\hat\beta_2 - 1.98\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)},\;\; \hat\beta_2 + 1.98\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}\right].$

1.98 is the critical value for a $t_{N-2}$ distribution with $\frac{\alpha}{2} = 0.025$ and $N = 100$. The critical value $t_{\frac{\alpha}{2}, N-2}$ is defined by
$\Pr\left(t > t_{\frac{\alpha}{2}, N-2}\right) = \Pr\left(t < -t_{\frac{\alpha}{2}, N-2}\right) = \frac{\alpha}{2}.$
These probabilities are equal because the $t$ and Normal distributions are symmetric. Here we care about our estimate being "too high" (the first probability, the upper tail) and being "too low" (the second probability, the lower tail).

We can write the general expression for confidence intervals as
$\left[\hat\beta_2 - t_{\frac{\alpha}{2}, N-2}\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)},\;\; \hat\beta_2 + t_{\frac{\alpha}{2}, N-2}\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}\right].$
This means that, if we repeated our estimation on many samples, then the true parameter would lie in this region 95% of the time. It does not mean that there is a 95% chance that $\beta_2$ is really in this range in this particular case: $\beta_2$ is not random, so it is either in this interval or not; the probability is either exactly 0 or exactly 1.

Hypothesis testing

Suppose that you have a univariate regression and estimate the coefficients and their variances. Now suppose that you have a guess of what the parameters are equal to: a hypothesis. For example,
$H_0: \beta_2 = b \qquad \text{versus} \qquad H_1: \beta_2 \ne b.$
$H_0$ is known as the null hypothesis and $H_1$ is the alternative hypothesis.

Four things could happen:

                    H0 is true     H0 is false
Do not reject H0    Correct        Type II error
Reject H0           Type I error   Correct

We want to be wrong as infrequently as possible. We pick $\alpha = \Pr(\text{Type I error})$, the probability that we reject our null hypothesis even though it is true.

To do hypothesis testing, we:
1. assume that our $H_0$ is true;
2. see how unlikely it is that we would get our estimate $\hat\beta_2$, assuming that $H_0$ is true.

We have to be careful with our notation here. Since our estimate $\hat\beta_2$ is based upon random variables, it is a random variable. Our estimator is unbiased, so its mean is $\beta_2$. Under the null hypothesis, $\beta_2 = b$. Remember that $\beta_2$, and thus $b$, are not random.
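The repeated-sampling interpretation of the confidence interval above can be checked by simulation. This sketch (not part of the original notes; for simplicity it uses a known-variance mean estimate rather than a regression slope, so the assumed critical value is the Normal 1.96) counts how often the interval covers the truth:

```python
import random
from statistics import NormalDist

# Simulation sketch: estimate a known mean mu many times and count how
# often the interval xbar +/- 1.96 * sigma / sqrt(n) covers it.
random.seed(1)
mu, sigma, n, reps = 5.0, 2.0, 50, 4000
crit = NormalDist().inv_cdf(0.975)  # about 1.96

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    half = crit * sigma / n ** 0.5
    covered += (xbar - half <= mu <= xbar + half)

print(round(covered / reps, 2))  # close to 0.95
```

About 95% of the simulated intervals contain $\mu$; any single interval, however, either contains it or does not.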
Let $\tilde{B}_2$ denote the possible values that our estimator could take on. Under the null hypothesis,
$\tilde{B}_2 \sim N\bigl(b,\, \mathrm{Var}(\hat\beta_2)\bigr).$
$\tilde{B}_2$ is a random variable, unlike $b$; our estimate $\hat\beta_2$ is a realization from the distribution of $\tilde{B}_2$.

Let's calculate the probability of getting an estimate further away from $b$ than our realized estimate $\hat\beta_2$. This is the probability of being in the tails of the distribution of the estimator $\tilde{B}_2$:
$\Pr\bigl(\tilde{B}_2 - b > |\hat\beta_2 - b|\bigr) + \Pr\bigl(\tilde{B}_2 - b < -|\hat\beta_2 - b|\bigr).$

Let's divide by the standard deviation of $\tilde{B}_2$, which we estimate using the standard error of $\hat\beta_2$:
$\Pr\left(\frac{\tilde{B}_2 - b}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}} > \frac{|\hat\beta_2 - b|}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}\right) + \Pr\left(\frac{\tilde{B}_2 - b}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}} < -\frac{|\hat\beta_2 - b|}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}\right).$

$(\tilde{B}_2 - b)/\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}$ is a standardized random variable distributed $t_{N-2}$. So now we have a distribution that we can calculate ($t_{N-2}$) and a value for this distribution, $(\hat\beta_2 - b)/\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}$. This value is called a test statistic.

Let
$\hat{t} = \frac{\hat\beta_2 - b}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_2)}}$
be our test statistic. This gives us
$\Pr\bigl(t_{N-2} > |\hat{t}|\bigr) + \Pr\bigl(t_{N-2} < -|\hat{t}|\bigr) = 2 \times \Pr\bigl(t_{N-2} < -|\hat{t}|\bigr) \equiv p,$
the probability of getting a value of $\hat{t}$ or more extreme. The simplification happens due to the symmetry of the $t$ (or Normal) distribution.

To calculate this probability, we can ask Stata using the following command:
di 2*(1 - ttail(df, -abs(t)))
or
di 2*normal(-abs(t))
We need the "1 −" in the $t$ command because Stata's ttail only gives upper-tail probabilities.

We call the probability of being further from the null hypothesis than the estimate, in either direction, the p-value. In the preceding derivation, the p-value is $p$.

When do we reject the null hypothesis? Under the p-value approach, we reject if $p < \alpha$. Under the test-statistic approach, we reject if $|\hat{t}| > t_{\frac{\alpha}{2}, N-2}$.
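The p-value computation can be sketched in Python with hypothetical numbers (an editor's illustration, not from the notes; the stdlib has no $t$ distribution, so the Normal approximation to $t_{N-2}$ is used, which is close for large $N$):

```python
from statistics import NormalDist

# Hypothetical estimate, null value, and standard error.
beta2_hat, b_null, se = 0.62, 0.0, 0.15
t_stat = (beta2_hat - b_null) / se   # the test statistic

# p = 2 * Pr(T < -|t|), using the symmetry of the distribution.
p_value = 2 * NormalDist().cdf(-abs(t_stat))

print(round(t_stat, 2))  # 4.13
print(p_value < 0.05)    # True: reject H0 at the 5% level
```

The test-statistic approach gives the same verdict here: $|4.13| > 1.98$, so $H_0$ is rejected either way.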
Relating these concepts

Confidence intervals and hypothesis tests are very similar. A confidence interval asks: given a tail probability $\alpha$ and the assumption that $\beta_2 = \hat\beta_2$, what critical values produce this tail probability? A hypothesis test asks: given critical values and the assumption that $\beta_2 = b$, what is the probability of being in the tails?

A confidence interval contains all the values for null hypotheses that cannot be rejected at the $\alpha$ level. A hypothesis that is rejected at the $\alpha$ level is outside of the $100(1 - \alpha)\%$ confidence interval, and a hypothesis that cannot be rejected at that level is contained in that confidence interval. Thus, it is said that a hypothesis test is an inverted confidence interval, and vice versa.

Introduction to Univariate Regression
Charlie Gibbons
Economics 140
September 23, 2009

Outline
- Correlation
- Regression: basic examples, least squares

Correlation

Covariance is a measure of linear association between two variables:
$\mathrm{Cov}(X, Y) \equiv \sigma_{XY} = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y).$

Correlation normalizes the covariance by dividing by the standard deviations of $X$ and $Y$:
$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.$
This makes correlation unit-free and bounds it between $-1$ and $1$. Higher absolute values imply a stronger linear relationship.

[Figure: scatterplots of $Y_1$, $Y_2$, and $Y_3$ against $X$, each illustrating a different degree of correlation. All variables have been standardized: they each have mean 0 and standard deviation 1 (we'll come back to this topic later in the course).]

Check yourself: given the correlations on the figure, what are the covariances? (Since the variables are standardized, $\sigma_X = \sigma_Y = 1$, so each covariance equals the corresponding correlation.)

Basic examples

Regression assumes the true relationship
$Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i$
and actually estimates
$Y_i = \hat\beta_1 + \hat\beta_2 X_i + \hat\varepsilon_i.$
The Greek-letter variables are not observed, and the two $\beta$ parameters are to be estimated. $\beta_1$ and $\hat\beta_1$ are the intercept parameters, and $\beta_2$ and $\hat\beta_2$ are the slope (or marginal-effect) parameters.

Another name for regression is ordinary least squares (OLS). It has this name because regression is solved by minimizing the sum of the squares of the residuals. The residual is the difference between the observed value of $Y$ and that predicted using the estimates $\hat\beta_1$ and $\hat\beta_2$:
$\hat\varepsilon_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_1 - \hat\beta_2 X_i.$

Hence, OLS solves
$\min_{b_1, b_2} \sum_{i=1}^{N} (Y_i - b_1 - b_2 X_i)^2.$
The sum of squared residuals (or errors, SSE/SSR) is simply this sum evaluated at the estimated parameter values:
$\sum_i \bigl(Y_i - \hat\beta_1 - \hat\beta_2 X_i\bigr)^2.$

For the univariate case, we get
$\hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}, \qquad \hat\beta_2 = \frac{\sum_i (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \rho_{XY}\,\frac{\sigma_Y}{\sigma_X}.$
Looking at the value of $\hat\beta_1$, note that the regression line goes through the point $(\bar{X}, \bar{Y})$.

Check yourself: returning to our correlation pictures, what are the slopes of the regression lines on those data? (With standardized variables, $\sigma_Y/\sigma_X = 1$, so each slope equals the correlation.)

[Figure: the scatterplots with fitted regression lines and their slopes.]

Least squares

Regression is a mathematical procedure that minimizes the squared differences between the actual and predicted values of $Y$, i.e., the squared vertical distances from the line to the points. Three proposed regression lines are shown in the next figure; which is the true regression?

[Figure: a scatterplot with three candidate lines, Line 1, Line 2, and Line 3.]

Assuming a regression line of the form $Y_i = \hat\beta_1 + \hat\beta_2 X_i + \hat\varepsilon_i$:
- Line 1 is a regression of $Y$ on $X$ that minimizes the sum of the squared vertical distances:
$\min_{b_1, b_2} \sum_i (Y_i - b_1 - b_2 X_i)^2.$
- Line 3 is a regression of $X$ on $Y$ that minimizes the sum of the squared horizontal distances:
$\min_{b_1, b_2} \sum_i \left(X_i - \frac{Y_i - b_1}{b_2}\right)^2.$
- Line 2 is a principal-components line that minimizes the sum of the squared vertical and horizontal distances:
$\min_{b_1, b_2} \sum_i \left((Y_i - b_1 - b_2 X_i)^2 + \left(X_i - \frac{Y_i - b_1}{b_2}\right)^2\right).$
So which is right? It depends. Usually we are trying to predict the value of $Y$ for some given values of $X$. For example: what is a person's expected income if he is an Hispanic high-school graduate? Then a regression of $Y$ (income) on $X$ (sex, race, and education) gives us the closest prediction of his income.
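The closed-form univariate OLS solution from the "Basic examples" section can be sketched on made-up data (an editor's illustration, not from the notes):

```python
# Closed-form univariate OLS: slope = Cov(X, Y) / Var(X), and the
# fitted line passes through the point of means (xbar, ybar).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
cov_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
var_x = sum((x - xbar) ** 2 for x in xs) / n

b2 = cov_xy / var_x    # slope estimate
b1 = ybar - b2 * xbar  # intercept estimate

print(round(b2, 2))  # 1.99
print(round(b1, 2))  # 0.05
print(abs((b1 + b2 * xbar) - ybar) < 1e-9)  # True: the fit passes through the means
```

Swapping the roles of `xs` and `ys` here would produce the "Line 3" fit from the least-squares discussion, which in general has a different slope.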