INTRO TO ECONOMETRIC ECO 4421
These 21 pages of class notes were uploaded by Elmore Funk on Thursday, September 17, 2015. The class notes belong to ECO 4421 at Florida State University, taught by Patrick Mason in Fall. Since upload, they have received 87 views.
Date Created: 09/17/15
1. Create problem set on p-values.
2. Create problem set on confidence intervals.
3. Create problem set on finding slopes.
4. Find regressions that illustrate problems that bedevil regression analysis, that is, issues discussed in chapter 9.

Professor Patrick L. Mason
Lecture Notes: Introduction to Econometrics
Florida State University, Spring 2012
January 5, 2012

N.B. These notes are revised a couple of times per week. There are typos; we'll catch them in class. Mostly the notes exist to remind me of things to cover during lectures.

I. ECONOMETRICS: HOW TO DO IT AND WHAT'S IT ALL ABOUT?

Important terms and concepts
- Definition of data, statistics, econometrics
- Relationship between economic theory and econometric analysis
- Types of data: cross-sectional, time series, panel (longitudinal)
- Experimental (control for random assignment) v. observational (historical, survey, records)

Data is information related to people, places, things, or any object of interest. A statistic is a mathematical measurement or formula based on the characteristics of data. Literally speaking, econometrics is "economic measurement." Specifically, econometrics is concerned with the quantitative measurement and analysis of social, economic, and business data. Econometric analysis may be used for quantitative description, hypothesis testing of economic and social theories, and forecasting future economic and social outcomes.

Theoretical analyses provide causal explanations of the relationships between particular economic variables. Econometric analysis uncovers the quantitative significance of correlations suggested by economic theories.

A. Start with a theory

Consider the relationship between education and income. We use economic theory to tell us three things: (1) whether there is a relationship between education and income, (2) whether the relationship is positive or negative, and (3) whether changes in education cause changes in income or whether changes in income cause changes in education.

Theory: Higher individual education produces higher individual market skill. Increases in individual
market skill produce increases in worker productivity. More productive workers will receive higher pay.

1. Individual Skill = g(Individual Education) (general relationship)
   $\text{skill}_i = a + b \cdot \text{education}_i$ (linear relationship assumption),
   where there are $i = 1, \ldots, n$ individuals and $b > 0$. This says that increases in an individual's level of education produce a higher level of marketable individual skill.

2. Individual Productivity = h(Individual Skill) (general relationship)
   $\text{productivity}_i = c + d \cdot \text{skill}_i$ (linear relationship assumption),
   where there are $i = 1, \ldots, n$ individuals and $d > 0$. This says that an increase in individual marketable skill will lead to a greater value of individual output during a given time period.

3. Individual Earnings = f(Individual Productivity) (general relationship)
   $\text{earnings}_i = e + f \cdot \text{productivity}_i$ (linear relationship assumption),
   where there are $i = 1, \ldots, n$ individuals and $f > 0$. This says that increases in individual productivity lead to higher individual earnings.

Taken together, (1)-(3) imply that Individual Earnings = f(Individual Education):

$$\begin{aligned}
\text{earnings}_i &= e + f \cdot \text{productivity}_i \\
&= e + f(c + d \cdot \text{skill}_i) \\
&= e + fc + fd \cdot \text{skill}_i \\
&= e + fc + fd(a + b \cdot \text{education}_i) \\
&= (e + fc + fda) + fdb \cdot \text{education}_i
\end{aligned}$$

$\text{earnings}_i = \beta_0 + \beta_1 \text{education}_i$, where $\beta_0 = e + fc + fda$, and $\beta_1 = fdb > 0$ is the expected increase in individual earnings associated with an additional year of individual education.

Statistical implication: There is a positive relationship between individual education and individual earnings; we expect that individuals with higher levels of education will have higher individual earnings.

$\text{earnings}_i = \beta_0 + \beta_1 \text{education}_i + \text{unobserved factors}$

(i) Econometric analysis can reveal whether this theory is consistent with the data, that is, whether $\beta_1 > 0$, $\beta_1 < 0$, or $\beta_1 = 0$.
(ii) Econometric analysis can reveal precisely the strength of the relationship between education and individual earnings, that is, whether $\beta_1$ is large or small.
(iii) The theory also predicts an identical relationship between education and income regardless of race, gender, sexual preference, or other noneconomic factors; that is, $\beta_1$ is
identical regardless of race, gender, etc.
(iv) There are unobserved factors that also affect individual earnings.

B. Do a review of the relevant literature

There's nothing new under the sun. Go to Web of Science, EconLit, www.nber.org, or another electronic database and do a literature search by topic, author, key word, etc.

C. Establish statistical model and hypotheses

$\text{earnings}_i = \beta_0 + \beta_1 \text{education}_i + \varepsilon_i$,
where $\varepsilon_i$ represents unobserved influences on individuals' earnings, that is, everything that may affect individuals' earnings other than individual years of education.

Hypothesis 1: $\beta_0 \ge 0$; nobody receives negative labor earnings.
Hypothesis 2: $\beta_1 > 0$; there is a positive relationship between individual education and individual earnings.

D. Select an appropriate data set

- Decennial Census: cross-sectional data
- Current Population Survey: monthly cross-section, 1962-present
- National Longitudinal Survey: panel data
- Panel Study of Income Dynamics: panel data, 1968-present
- General Social Survey: cross-sectional (1972-2006) and longitudinal (2006-present)
- National Longitudinal Study of Adolescent Health (Add Health): longitudinal, 1994-present
- Panel Study of American Religion & Ethnicity (PSARE): longitudinal, 2006-present
- University of Michigan Consumer Sentiment: time series, 1978-present
- Civilian Unemployment Rate: time series, 1948-present

Many of the cross-sectional and longitudinal datasets are available at the University of Michigan's Inter-University Consortium for Political and Social Research, http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp. The time series data are available at the Federal Reserve Bank of St. Louis web site, http://research.stlouisfed.org.

Observational data: time series, cross-sectional, panel or longitudinal.
Experimental data: laboratory experiment; policy experiment (Negative Income Tax, Tennessee STAR, Milwaukee Voucher); natural experiment.

E. Select variables and examine descriptive statistics

F. Perform regression analysis

$\text{earnings}_i = 240 + 24 \cdot \text{education}_i$

How do we interpret this equation? The equation says that, typically, persons with no education can
be expected to earn 240 per week. Typically, also, persons with 8 years of education can expect to earn 432 per week. Persons with 12, 14, and 16 years of education can expect to earn 528, 576, and 624 per week, respectively.

Object of econometric analysis, estimation: How do we determine that $\beta_0 = 240$ and $\beta_1 = 24$?

Object of econometric analysis, hypothesis testing: How do we really know that 240 and 24 are not just random fluctuation? In other words, how do we know that they are really different from 0?

G. Revise theory as needed

Is it possible that Education = f(Income)? Yes. That is a prediction from radical economics. In particular, it may be the case that Young Adult Education = f(Parental Income). What type of data would you need to test this theoretical prediction?

II. ELEMENTARY STATISTICS: A REVIEW

Statistics: descriptive v. inferential

Statistics may be descriptive or inferential. Descriptive statistics are just what the name says: statistics that describe, or give a numerical picture of, a body of data. Inferential statistics allow us to draw inferences, that is, make extremely well educated guesses about what is true for an entire population based on the analysis of a sample. We'll eventually get to inferential statistics, but we'll start with descriptive statistics.

Three (or four, or five) most important descriptive statistics: average, variance and standard deviation, and covariance and correlation.

A. AVERAGE

An average describes a typical observation; it gives a description of a randomly selected member of a population. By randomly selected, we mean that each member of the population has an equal chance of being selected. The mean, mode, and median are different averages. Usually the mean is called "the average," but really all three are averages. In fact, there are other ways of calculating the average, but these three are most common.

(a) The mode is simply the observation that occurs most often.
(b) The median is the observation that is exactly in the middle when we have ordered the data from lowest to highest.
(c) The mean is equal
to the sum of all observations divided by the total number of observations.

We use the symbol $\mu$ to represent the population mean. Why? Habit. $\mu$ is pronounced "mu." It's the Greek letter for m, as in mean. The mean is also referred to as the expected value.

$\mu$ = mean or average of the population

The mean gives a quick summary or picture of the data. The mean tells us what is true for the typical observation. For example, consider the Kamara family of five. Let's assign some ages to the Kamara population: Kwabena, 55; Alice, 50; Ngina, 20; Kamau, 10; Naima, 7. So our population is {55, 50, 20, 10, 7}.

$\mu = \frac{55 + 50 + 20 + 10 + 7}{5} = \frac{142}{5} = 28.4$

Median = 20

The mode is undefined, since every outcome occurs exactly once and no value occurs more often than the others.

In this case the mean, mode, and median provide different answers for the age of the typical person in the Kamara family.

B. Variance and Standard Deviation

The variance or standard deviation tells you whether the data is spread out, that is, whether it's disperse. Or, to speak colloquially, the variance or standard deviation provides a measure of diversity. Often we use the standard deviation as a measure of inequality. So the variance and standard deviation provide additional details on the picture of the data.

We use the symbol $\sigma^2$ to represent the population variance and $\sigma$ to represent the population standard deviation. Why? Habit. $\sigma$ is pronounced "sigma." It's the Greek letter for s, as in standard deviation.

$\sigma^2$ = population or true variance $\equiv$ the mean value of the squared deviations from the mean
$\sigma$ = population or true standard deviation $\equiv$ square root of the variance

Notice that nobody in this population is 28.4 years of age, even though it's the mean. A mean of 28.4 says that if we randomly select a person from the Kamara household, then, averaged over a large number of selections, we expect the person's age to be about 28. That's useful because it implies that some people in the Kamara home are over 28 and some are under 28. But the mean has limited usefulness: note that nobody in the Kamara home is really close to 28 years of age.
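As a quick check on the Kamara family averages, the short Python sketch below recomputes the mean, median, and mode with the standard library's `statistics` module. The variable names are illustrative and not part of the original notes. `multimode` returns every value tied for most frequent, so getting all five ages back is how the script signals that there is no single mode.

```python
from statistics import mean, median, multimode

# Ages of the Kamara family, as assigned in the notes.
ages = {"Kwabena": 55, "Alice": 50, "Ngina": 20, "Kamau": 10, "Naima": 7}
values = list(ages.values())

mu = mean(values)          # sum of all observations / number of observations
mid = median(values)       # middle observation once the data are sorted
modes = multimode(values)  # every value tied for "occurs most often"

# If every value is tied, no single observation occurs most often.
has_unique_mode = len(modes) == 1

print(f"mean   = {mu}")
print(f"median = {mid}")
print(f"modes  = {modes} (unique mode: {has_unique_mode})")
```

The three averages answer the question "what is typical?" differently, which is why the notes go on to ask how far individual ages sit from the mean.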
Statistically speaking, there is a large amount of dispersion about the mean; that is, few persons in the Kamara household are actually close to the age of 28. That's why we also need the variance. The variance helps focus the picture that we've observed with the mean.

Deviations from the mean, i.e., how much does each person differ from the average? (Age of individual minus mean age of family, $\mu$.)

55 - 28.4 = 26.6; Kwabena is 26.6 years above the mean.
50 - 28.4 = 21.6; Alice is 21.6 years above the mean.
20 - 28.4 = -8.4; Ngina is 8.4 years below the mean.
10 - 28.4 = -18.4; Kamau is 18.4 years below the mean.
7 - 28.4 = -21.4; Naima is 21.4 years below the mean.

Nobody is close to the mean. There is a lot of dispersion. The mean does a bad job of telling us what's true for the age of randomly selected individuals from this population. But what is the total amount of dispersion?

Try adding up the deviations from the mean for the five family members. No good: the deviations add up to 0. So we can't tell how much dispersion there is (how out of focus our picture is) just by adding up the deviations from the mean. They will always add up to 0.

Another solution: square the deviations, add them up, and divide by the number of family members.

$$\begin{aligned}
\sigma^2 &= \frac{(55 - 28.4)^2 + (50 - 28.4)^2 + (20 - 28.4)^2 + (10 - 28.4)^2 + (7 - 28.4)^2}{5} \\
&= \frac{(26.6)^2 + (21.6)^2 + (-8.4)^2 + (-18.4)^2 + (-21.4)^2}{5} \\
&= \frac{707.56 + 466.56 + 70.56 + 338.56 + 457.96}{5} = \frac{2041.2}{5} = 408.24
\end{aligned}$$

The variance is 408.24. Because the deviations were squared, its units are years squared, not years of age.

The standard deviation is a measure of the average deviation from the mean. It tells us how clustered together the observations are around the mean.

$\sigma$ = standard deviation = square root of the variance = $\sqrt{408.24} = 20.20$ years of age.

Note: if $\sigma = 0$ or $\sigma^2 = 0$, it means that everybody in the Kamara household is exactly the same age. There's no deviation from the mean. So the higher the variance or standard deviation, the greater the dispersion, or the average squared deviation from the mean.

Coefficient of Variation

Both the variance and the standard deviation depend on the unit of analysis. Suppose we were calculating the standard deviation of annual income instead of
age. The standard deviation of annual income will be higher than the standard deviation of age because annual income is usually measured in 1000s while age is measured in years. How do we know if annual income is more disperse than age? We don't, because the units of analysis are different. So we need a basis of comparison.

Coefficient of variation $\equiv \dfrac{\text{Standard Deviation}}{\text{Mean}} = \dfrac{\sigma}{\mu}$

The coefficient of variation is one way of determining whether there is a large or small amount of dispersion, especially when we are comparing things that are measured in different units.

Alternative definition of coefficient of variation: $\dfrac{\sigma}{\mu} \times 100$

For the Kamara family, the coefficient of variation for age = 20.2 / 28.4 = 0.7113; hence the standard deviation is about 71% the size of the mean.

Suppose the Kamaras also have income. Kwabena earns 250,000 per year, Alice earns 135,000, and Ngina earns 75,000, while Kamau and Naima earn 0 each. The mean annual income is 92,000 and the standard deviation of income is 104,916.63; hence the coefficient of variation for income is 1.14. (N.B. This standard deviation divides the sum of squared deviations by n - 1 = 4, the sample formula. Dividing by n = 5, as we did for age above, gives 93,840.29 and a coefficient of variation of 1.02.)

C. Covariance and correlation

The mean and standard deviation are descriptions of individual categories of data. Sometimes we are interested in describing the relationship between two or more categories of data. Consider, for example, age and income. If income increases as age increases (or vice versa), we say there is a positive covariance or correlation. If income decreases as age increases (or vice versa), we say there is a negative covariance or correlation between age and income. If income does not change as age increases or decreases, then we say there is no relationship between age and income, that is, age and income are independent.

Any two categories of data, X (age) and Y (income), are statistically independent if knowledge of the value of Y provides no information on the value of X, or if knowledge of the value of X provides no information on the value of Y.

$\sigma_{XY} \equiv$ population covariance of X and Y

$\sigma_{XY} > 0$: high values of X are
associated with high values of Y; low values of X are associated with low values of Y. X and Y move up and down together.

$\sigma_{XY} < 0$: high values of X are associated with low values of Y; low values of X are associated with high values of Y. When X goes up (down), Y moves down (up).

$\sigma_{XY} = 0$: no linear relationship between X and Y.

If X and Y are independent, there is no linear or nonlinear relationship between X and Y. Hence, statistical independence implies $\sigma_{XY} = 0$. But the opposite is not true; that is, $\sigma_{XY} = 0$ does not imply statistical independence. It only implies that there is no linear relationship.

Cov(age, income) = $\sigma_{\text{age},\text{income}}$ = mean value of the product of the deviations:

$$\sigma_{\text{age},\text{income}} = \frac{(55 - 28.4)(250000 - 92000) + (50 - 28.4)(135000 - 92000) + (20 - 28.4)(75000 - 92000) + (10 - 28.4)(0 - 92000) + (7 - 28.4)(0 - 92000)}{5} = 1{,}787{,}200$$

Correlation coefficient:

$$\rho_{\text{age},\text{income}} = \frac{\sigma_{\text{age},\text{income}}}{\sigma_{\text{age}}\,\sigma_{\text{income}}} = \frac{1{,}787{,}200}{22.59 \times 104{,}916.63} = 0.7541, \qquad -1 \le \rho_{XY} \le 1$$

| Person | Age | Deviation from mean | Income | Deviation from mean | Product of deviations |
| Kwabena | 55 | 26.6 | 250,000 | 158,000 | 4,202,800 |
| Alice | 50 | 21.6 | 135,000 | 43,000 | 928,800 |
| Ngina | 20 | -8.4 | 75,000 | -17,000 | 142,800 |
| Kamau | 10 | -18.4 | 0 | -92,000 | 1,692,800 |
| Naima | 7 | -21.4 | 0 | -92,000 | 1,968,800 |
| Mean | 28.4 | | 92,000 | | 1,787,200 |
| Standard deviation | 22.59 | | 104,916.63 | | |
| Coefficient of variation | 0.80 | | 1.14 | | |

Correlation coefficient: 0.7541

N.B. The standard deviations 22.59 and 104,916.63 in this table divide by n - 1 = 4, while the covariance divides by n = 5. Using the same divisor throughout gives $\rho = 8{,}936{,}000 / \sqrt{2{,}041.2 \times 44{,}030{,}000{,}000} \approx 0.9426$, so the correlation between age and income in this family is stronger than 0.7541 suggests.

III. PROBABILITY THEORY

A. Introduction

A random variable is a data category that takes on alternative values, each with a probability $\le 1$. A constant is a data category with a fixed value. The mutually exclusive alternative values (results) are called outcomes. An event is a set of one or more outcomes. The probability of an outcome refers to the fraction of times that an outcome occurs in the long run, that is, the fraction of times the outcome occurs when we have observed the random variable many times. The sample space lists all possible outcomes. An experiment is the activity that generates the outcomes for a random variable.

Random variables can be described by probability distributions, which list all possible outcomes and the probability that each outcome will
occur.

Example: the roll of the dice is a random variable.

Example: Let X = student height. The height of each person in the class is a random variable. The outcomes or alternative values include the specific height of each person in the class. Suppose there are 40 students in the class and we observe that 30 students are less than 6 feet tall and 10 students are at least 6 feet tall. If so, we can say that there is a probability of 0.75 (= 30/40), or a 75 percent chance, that a student selected at random will be less than 6 feet tall.

Probability of an outcome: the probability of an outcome is the proportion of times it occurs when an experiment is repeated a large number of times.

Some random variables are discrete and some are continuous. A continuous random variable may take on any value on the real number line. A discrete random variable may take on only a specific number of real values.

Example of a continuous random variable: the temperature tomorrow may be 85.1, 85, 85.3, 91.329, 92, etc.

Another example of a continuous random variable: the annual income of American households may be -10,000.32 (stock market losses), 0.83, 2,000.06, 15,000.78, 16,640, 100,237.92, 100,942,088.51 (Bill Gates' family).

Example of a discrete random variable: coin flip, with HEADS = 1 and TAILS = 0. There are only two possible outcomes.

Another example of a discrete random variable: a pair of dice has only 11 possible outcomes: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

Suppose we roll a pair of dice 360 times. We record the outcome from each roll, that is, we record the results of each experiment. We then notice that the number 9 occurs 41 times. From this experiment, the probability of observing a 9, that is, Probability(Roll of dice = 9) = P(Roll of dice = 9), is 41/360 = 0.1139 or 11.39%.

Generically speaking, $\text{Prob}(X = x_8) = p(x_8)$, where in our example X = Roll of dice, $x_8 = 9$, and $p(x_8) = 0.1139$, where $x_8$ is the eighth possible outcome from rolling a pair of dice.

X = Roll of Dice (The Experiment)

| Possible outcomes | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 |
| Value of outcome | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |

Suppose $x_1, x_2, \ldots, x_n$ are
mutually exclusive and collectively exhaustive outcomes.

Mutually exclusive: for two possible outcomes i and j, i and j cannot occur at the same time.
Collectively exhaustive: $x_1, x_2, \ldots, x_n$ is the list of all possible outcomes.

(i) $\text{Prob}(X = x_1) + \text{Prob}(X = x_2) + \cdots + \text{Prob}(X = x_n) = p(x_1) + p(x_2) + \cdots + p(x_n) = 1$. The sum of the probabilities for all outcomes equals 1.
(ii) $0 \le p(x_i) \le 1$. Each probability must take on a value between 0 and 1.
(iii) $p(\text{not } x_i) = 1 - p(x_i)$. The probability that $X = x_i$ does not occur equals $1 - p(x_i)$.

B. Probability distribution of a discrete random variable

Probability density function: each outcome and the associated probability of observing that outcome.

Cumulative probability distribution: the probability of observing an outcome less than or equal to a specific value, $\text{Prob}(X \le x)$.

Example: Bernoulli distribution

The Bernoulli distribution is a discrete distribution. The experiment has two outcomes; P = probability of success, 1 - P = probability of failure; trials are independent and P is equal for all trials.

Probability distribution:

| Outcome | Probability |
| Heads = 1 | 0.50 |
| Tails = 0 | 0.50 |

For a binary (Bernoulli) distribution:
Population mean $\equiv \mu = p$
Population variance $\equiv \sigma^2 = p(1 - p)$

Application for coin flip: $\mu = 0.50$, $\sigma^2 = 0.50 \times 0.50 = 0.25$, $\sigma = 0.25^{1/2} = 0.50$.

Examples of PDF and CDF

Suppose a gambler has a pair of dice, one blue die and one red die. Each die has six numbers on it, ranging from 1 to 6. Each roll of the dice provides an outcome. The sample space (list of all possible outcomes) is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. There are 36 possible combinations from a roll of a fair pair of dice.

| Outcome | Red Die | Blue Die | Outcome | Red Die | Blue Die |
| 2 | 1 | 1 | 7 | 4 | 3 |
| 3 | 1 | 2 | 7 | 5 | 2 |
| 3 | 2 | 1 | 7 | 6 | 1 |
| 4 | 1 | 3 | 8 | 2 | 6 |
| 4 | 2 | 2 | 8 | 3 | 5 |
| 4 | 3 | 1 | 8 | 4 | 4 |
| 5 | 1 | 4 | 8 | 5 | 3 |
| 5 | 2 | 3 | 8 | 6 | 2 |
| 5 | 3 | 2 | 9 | 3 | 6 |
| 5 | 4 | 1 | 9 | 4 | 5 |
| 6 | 1 | 5 | 9 | 5 | 4 |
| 6 | 2 | 4 | 9 | 6 | 3 |
| 6 | 3 | 3 | 10 | 4 | 6 |
| 6 | 4 | 2 | 10 | 5 | 5 |
| 6 | 5 | 1 | 10 | 6 | 4 |
| 7 | 1 | 6 | 11 | 5 | 6 |
| 7 | 2 | 5 | 11 | 6 | 5 |
| 7 | 3 | 4 | 12 | 6 | 6 |

We can use this information to establish a probability distribution function (PDF) and a cumulative distribution function (CDF).

Experiment: Roll of a fair pair of dice

| X | Outcome value | PDF | CDF |
| x1 | 2 | 1/36 = 0.0278 | 1/36 = 0.0278 |
| x2 | 3 | 2/36 = 0.0556 | 3/36 = 0.0833 |
| x3 | 4 | 3/36 = 0.0833 | 6/36 = 0.1667 |
| x4 | 5 | 4/36 = 0.1111 | 10/36 = 0.2778 |
| x5 | 6 | 5/36 = 0.1389 | 15/36 = 0.4167 |
| x6 | 7 | 6/36 = 0.1667 | 21/36 = 0.5833 |
| x7 | 8 | 5/36 = 0.1389 | 26/36 = 0.7222 |
| x8 | 9 | 4/36 = 0.1111 | 30/36 = 0.8333 |
| x9 | 10 | 3/36 = 0.0833 | 33/36 = 0.9167 |
| x10 | 11 | 2/36 = 0.0556 | 35/36 = 0.9722 |
| x11 | 12 | 1/36 = 0.0278 | 36/36 = 1 |

The final number in the CDF is 1: there is a 100 percent chance of observing some number between 2 and 12 when dice are rolled. When dice are rolled, we must observe some number between 2 and 12 because they are the only possible numbers.

The cumulative frequency in each row adds up the cumulative frequency from the previous row and the individual frequency from the current row.

For example, at the row for $x_6$ we see that the number 7 has a frequency of 0.1667, that is, there is a 16.67 percent chance of observing 7. The individual frequency is also the probability. So, rounding off to two decimal places, there is a probability of 0.17 of observing the number 7 when the dice are rolled.

$\text{Prob}(X = x_6) = \text{Prob}(\text{Rolling } 7) = \text{Prob}(X = 7) = 6/36 = 0.17$

If the dice are rolled 1000 times, then we expect to observe the number 7 about 167 times.

But the CDF column shows that there is a probability of 21/36 = 0.5833 that we will observe a number less than or equal to 7. This means that 58 percent of the time we will observe the number 7 or some lower value. Or, 42 percent of the time we will observe a number strictly higher than 7, specifically 8, 9, 10, 11, or 12.

$\text{Prob}(X \le 7) = 21/36 = 0.5833$
$\text{Prob}(X > 7) = 15/36 = 0.4167$

What's the mode? By definition, it's the number with the highest frequency. So the mode is 7. The PDF column shows that the highest individual frequency is 6/36.

What's the median? By definition, it's the outcome that is exactly in the middle: half of the observations are higher and half of the observations are lower. So $x_6 = 7$ is the median outcome. There is a 15/36 chance of observing outcomes below 7 and there is a 15/36 chance of observing outcomes above 7.

The median is a positional average. So the median is the midpoint of a distribution. It's the value x such that $\Pr(X < x) \le 1/2$ and $\Pr(X > x) \le 1/2$. Here $\Pr(X < x)$ is the probability that the
experiment has an outcome that is less than the midpoint of the distribution, and $\Pr(X > x)$ is the probability that the experiment has an outcome that is greater than the midpoint of the distribution.

What's the mean?

$$\mu = \frac{2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12}{11} = 2\cdot\tfrac{1}{11} + 3\cdot\tfrac{1}{11} + \cdots + 12\cdot\tfrac{1}{11} = \frac{77}{11} = 7$$

Right answer, but wrong procedure.

$$\mu = 2\cdot\tfrac{1}{36} + 3\cdot\tfrac{2}{36} + 4\cdot\tfrac{3}{36} + 5\cdot\tfrac{4}{36} + 6\cdot\tfrac{5}{36} + 7\cdot\tfrac{6}{36} + 8\cdot\tfrac{5}{36} + 9\cdot\tfrac{4}{36} + 10\cdot\tfrac{3}{36} + 11\cdot\tfrac{2}{36} + 12\cdot\tfrac{1}{36} = \frac{252}{36} = 7$$

Right answer and right procedure. Why did the wrong procedure yield the right answer?

Questions. Try these at home. It's safe.

(a) The gambler loses if she rolls 2, 3, or 12 on the initial roll. What's the probability of losing?
(b) The gambler wins if she rolls 7 or 11 on the initial roll. What's the probability of winning?
(c) The gambler assumes a point if she neither wins nor loses on the first roll. So only 4, 5, 6, 8, 9, and 10 are points that may be assumed by the gambler. If the gambler makes the point, that is, rolls the same number again without first rolling 7, she wins the bet. What's the probability the gambler assumes a point? Which two points should the gambler bet the most money on? Which two points should the gambler bet the least money on?

Consider the Seminoles football team. A reasonable person might ask: how many games can we expect the Noles to win this year?

[Table: probability distribution of Seminoles victories; the table body did not survive in this copy.]

Using the cumulative probability distribution:

$\Pr(\text{Victories} \le 9) = 0.95$
$\Pr(\text{Victories} > 9) = 1 - 0.95 = 0.05$
$\Pr(6 < \text{Victories} \le 9) = \Pr(\text{Victories} \le 9) - \Pr(\text{Victories} \le 6) = 0.95 - 0.22 = 0.73$

Okay? Got it? Moving on.

C. Probability distribution of a continuous random variable

The normal distribution is a continuous distribution.

General normal probability distribution function:

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2}$$

Standard normal probability distribution function ($\mu = 0$, $\sigma = 1$):

$$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{X^2}{2}}$$

$f(X)$ is a bell-shaped curve that is symmetric around $\mu$. $e$ is sometimes called Euler's number or Napier's constant. It is the base of natural logarithms. $\pi = \dfrac{\text{circumference of circle}}{\text{diameter of circle}}$. Both $e$ and $\pi$ are irrational numbers: $e \approx 2.718281828459045$ and $\pi \approx 3.14159265358979323846$. Rounding off to 4 decimal places, $e \approx 2.7183$ and $\pi \approx 3.1416$.

[Graph: cumulative standard normal distribution, with the bell-shaped density curve.]

$$\int_{-\infty}^{\infty} f(X)\,dX = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{X^2}{2}}\,dX = 1 \quad \text{(probabilities must sum to 1)}$$

The symbol $\int$ is an S. It represents a summation sign for continuous random variables. A standard normal random variable can take on every conceivable value from negative infinity ($-\infty$) to positive infinity ($+\infty$).

$\text{Prob}(X \le -2.575) = \int_{-\infty}^{-2.575} \frac{1}{\sqrt{2\pi}} e^{-X^2/2}\,dX = 0.0050$ and $\text{Prob}(X > -2.575) = 0.9950$

$\text{Prob}(X \le -1.96) = \int_{-\infty}^{-1.96} \frac{1}{\sqrt{2\pi}} e^{-X^2/2}\,dX = 0.0250$ and $\text{Prob}(X > -1.96) = 0.9750$

$\text{Prob}(X \le -1.645) = \int_{-\infty}^{-1.645} \frac{1}{\sqrt{2\pi}} e^{-X^2/2}\,dX = 0.0500$ and $\text{Prob}(X > -1.645) = 0.9500$

$\text{Prob}(X \le 0) = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}} e^{-X^2/2}\,dX = 0.5000$ and $\text{Prob}(X > 0) = 0.5000$

$\text{Prob}(X \le 1.645) = \int_{-\infty}^{1.645} \frac{1}{\sqrt{2\pi}} e^{-X^2/2}\,dX = 0.9500$ and $\text{Prob}(X > 1.645) = 0.0500$

$\text{Prob}(X \le 1.96) = \int_{-\infty}^{1.96} \frac{1}{\sqrt{2\pi}} e^{-X^2/2}\,dX = 0.9750$ and $\text{Prob}(X > 1.96) = 0.0250$

$\text{Prob}(X \le 2.575) = \int_{-\infty}^{2.575} \frac{1}{\sqrt{2\pi}} e^{-X^2/2}\,dX = 0.9950$ and $\text{Prob}(X > 2.575) = \int_{2.575}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-X^2/2}\,dX = 0.0050$

These probabilities are already calculated. See Table 1. Note the symmetry of the probabilities.

D. Expected values and descriptive statistics

Let Y represent an experiment. It is a random variable, that is, a variable that takes on multiple values. In this case the experiment is rolling the dice. Why do we use Y to represent an
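
The dice distribution from Section III.B and the standard normal probabilities from Section III.C can be reproduced numerically. The sketch below is one way to do it in Python, assuming a fair pair of dice. The names `pdf`, `cdf`, `dice_mean`, and `std_normal_cdf` are illustrative choices that do not appear in the original notes, and the normal CDF is computed from the error function rather than read from Table 1.

```python
import math
from fractions import Fraction
from itertools import product

# PDF of the sum of two fair dice: each of the 36 (red, blue) pairs
# is equally likely, so every pair contributes 1/36 to its sum.
pdf = {}
for red, blue in product(range(1, 7), repeat=2):
    total = red + blue
    pdf[total] = pdf.get(total, Fraction(0)) + Fraction(1, 36)

# CDF: running total of the PDF, from the smallest outcome upward.
cdf = {}
running = Fraction(0)
for x in sorted(pdf):
    running += pdf[x]
    cdf[x] = running

# Mean (expected value): each outcome weighted by its probability.
dice_mean = sum(x * p for x, p in pdf.items())


def std_normal_cdf(z):
    """Prob(X <= z) for a standard normal X, using the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for x in sorted(pdf):
    print(f"{x:>2}  PDF = {float(pdf[x]):.4f}  CDF = {float(cdf[x]):.4f}")
print("mean of dice roll =", dice_mean)

for z in (-2.575, -1.96, -1.645, 0.0, 1.645, 1.96, 2.575):
    print(f"Prob(X <= {z:>6}) = {std_normal_cdf(z):.4f}")
```

Exact fractions are used for the dice so that the CDF ends at exactly 1 with no rounding drift; the normal probabilities are floating-point approximations and agree with the listed values to four decimal places.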