# ECON 4340
This 149-page set of class notes was uploaded by Destinee Auer on Monday, October 12, 2015. The notes belong to ECON 4340 (Economics) at Georgia College & State University, taught by William Farr in the fall.

## Chapter 6: Specification — Choosing the Independent Variables

*Using Econometrics: A Practical Guide*, A. H. Studenmund (Addison-Wesley). Copyright © 2011 Pearson Addison-Wesley. Slides by Niels-Hugo Blunch, Washington and Lee University.

### Specifying an Econometric Equation and Specification Error

- Before any equation can be estimated, it must be completely specified. Specifying an econometric equation consists of three parts: choosing the correct (1) independent variables, (2) functional form, and (3) form of the stochastic error term.
- Again, this is part of the first Classical Assumption from Chapter 4.
- A specification error results when any one of these choices is made incorrectly. This chapter deals with the first choice; the other two are discussed in subsequent chapters.

### Omitted Variables

Two reasons an important explanatory variable might have been left out: (1) we forgot it, or (2) it is not available in the data set we are examining. Either way, this may lead to omitted variable bias, or more generally, specification bias. The reason is that when a variable is not included, it cannot be held constant. Omitting a relevant variable usually makes the entire equation suspect because of the likely bias in the coefficients.

### The Consequences of an Omitted Variable

Suppose the true regression model is

Y_i = β0 + β1·X1i + β2·X2i + ε_i   (6.1)

where ε_i is a classical error term. If X2 is omitted, the estimated equation becomes instead

Y_i = β0 + β1·X1i + ε_i*   (6.2)
where ε_i* = ε_i + β2·X2i   (6.3)

Hence the explanatory variables in the estimated regression (6.2) are not independent of the error term, unless the omitted variable is uncorrelated with all the included variables, which is very unlikely. But this violates Classical Assumption III.

What happens if we estimate Equation 6.2 when Equation 6.1 is the truth? We get bias:

E(β̂1) ≠ β1   (6.4)

The amount of bias is a function of the impact of the omitted variable on the dependent variable, times a function of the correlation between the included and the omitted variable. More formally:

Bias = β_omitted × α1

where α1 is the slope of the (auxiliary) relationship between the omitted and the included variable. So the bias exists unless (1) the true coefficient of the omitted variable equals zero, or (2) the included and omitted variables are uncorrelated.

### Correcting for an Omitted Variable

In theory, the solution to a problem of specification bias seems easy: add the omitted variable to the equation. Unfortunately, that's easier said than done, for a couple of reasons:

1. Omitted variable bias is hard to detect: the amount of bias introduced can be small and not immediately detectable.
2. Even once you have decided that a given equation is suffering from omitted variable bias, how do you decide exactly which variable to include?

What if you have an unexpected result that leads you to believe you have an omitted variable, and two or more theoretically sound explanatory variables as potential candidates for inclusion? How do you choose between them? One possibility is expected-bias analysis. Expected bias: the likely bias that omitting a particular variable would have caused in the estimated coefficient of one of the included variables. Expected bias can be estimated with Equation 6.7:

Expected bias = (expected coefficient of the omitted variable) × (expected correlation between the included and omitted variables)   (6.7)

When do we have a viable candidate? When the sign of the expected bias is the same as the sign of the unexpected result. Similarly, when these signs differ, the variable is extremely unlikely to have caused the unexpected result.

### Irrelevant Variables

This is the case of including a variable in an equation when it does not belong there. It is the opposite of the omitted-variables case, so the impact can be illustrated using the same model. Assume the true regression specification is

Y_i = β0 + β1·X1i + ε_i   (6.10)

but the researcher, for some reason, includes an extra variable:

Y_i = β0 + β1·X1i + β2·X2i + ε_i**   (6.11)

The misspecified equation's error term then becomes

ε_i** = ε_i − β2·X2i   (6.12)

The inclusion of an irrelevant variable will not cause bias, since the true coefficient of the irrelevant variable is zero and the second term drops out of Equation 6.12. However, the inclusion of an irrelevant variable will:

- increase the variance of the estimated coefficients, and this increased variance will tend to decrease the absolute magnitude of their t-scores;
- decrease R̄² (but not R²).

### Four Important Specification Criteria

We can summarize the previous discussion into four criteria to help decide whether a given variable belongs in the equation:

1. Theory: Is the variable's place in the equation unambiguous and theoretically sound?
2. t-test: Is the variable's estimated coefficient significant in the expected direction?
3. R̄²: Does the overall fit of the equation (adjusted for degrees of freedom) improve when the variable is added?
4. Bias: Do other variables' coefficients change significantly when the variable is added to the equation?

If all these conditions hold, the variable belongs in the equation; if none of them hold, it does not belong. The tricky part is the intermediate cases: use sound judgment.

### Specification Searches

Almost any result can be obtained from a given data set by simply specifying different regressions until estimates with the desired properties are obtained. Hence the integrity of all empirical work is open to question. To counter this, three points of "best practices in specification searches" are suggested:

1. Rely on theory rather than statistical fit as much as possible when choosing variables, functional forms, and the like.
2. Minimize the number of equations estimated (except for sensitivity analysis).
3. Reveal, in a footnote or appendix, all alternative specifications estimated.

### Bias Caused by Relying on the t-Test to Choose Variables

Dropping variables solely on the basis of low t-statistics may lead to two different types of errors:

1. An irrelevant explanatory variable may sometimes be included in the equation (i.e., when it does not belong there).
2. A relevant explanatory variable may sometimes be dropped from the equation (i.e., when it does belong).

In the first case there is no bias, but in the second case there is. Hence the estimated coefficients will be biased every time an excluded variable belongs in the equation, and that excluded variable will be left out every time its estimated coefficient is not statistically significantly different from zero. So we will have systematic bias in our equation.

### Sensitivity Analysis

Contrary to the advice of estimating as few equations as possible (and based on theory rather than fit), we sometimes see journal-article authors listing results from five or more specifications. What's going on? In almost every case these authors have employed a technique called sensitivity analysis. This consists of purposely running a number of alternative specifications to determine whether particular results are robust (not statistical flukes) to a change in specification. Why is this useful? Because the true specification isn't known.

### Data Mining (DANGER!)

Data mining involves exploring a data set to try to uncover empirical regularities that can inform economic theory. That is, the role of data mining is the opposite of that of traditional econometrics, which instead tests economic theory on a data set. Be careful, however: a hypothesis developed using data-mining techniques must be tested on a different data set (or in a different context) than the one used to develop the hypothesis. Not doing so would be highly unethical, since the researcher already knows ahead of time what the results will be.

### Key Terms from Chapter 6

Omitted variable; irrelevant variable; specification bias; sequential specification search; specification error; the four specification criteria; expected bias; sensitivity analysis.

## Chapter 2: Ordinary Least Squares

*Using Econometrics: A Practical Guide*, A. H. Studenmund (Addison-Wesley), 2011.

### Estimating Single-Independent-Variable Models with OLS

Recall that the objective of regression analysis is to start from the theoretical equation

Y_i = β0 + β1·X_i + ε_i   (2.1)

and, through the use of data, get to its empirical counterpart

Ŷ_i = β̂0 + β̂1·X_i   (2.2)

Equation 2.1 is purely theoretical, while Equation 2.2 is its empirical counterpart. How do we move from 2.1 to 2.2? One of the most widely used methods is Ordinary Least Squares (OLS). OLS minimizes

Σ_{i=1}^{N} e_i²   (2.3)

the sum of the squared residuals, i.e., the squared vertical distances between the observations and the estimated regression line. We also call this term the Residual Sum of Squares (RSS). Equivalently, OLS minimizes Σ(Y_i − Ŷ_i)².

Why use OLS?

- It is relatively easy to use.
- The goal of minimizing RSS is intuitively and theoretically appealing: we want the estimated regression equation to be as close as possible to the observed data.
- OLS estimates have a number of useful characteristics.
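The single-regressor OLS estimates just described can be sketched in a few lines of NumPy. This is a toy illustration with made-up data (the variable names are my own, not the book's); it uses the standard closed-form slope and intercept formulas and checks the slides' claim that the residuals sum to zero:

```python
import numpy as np

# Hypothetical data: Y = 2 + 3*X + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 3.0 * X + rng.normal(0, 1, size=50)

# OLS for one regressor:
#   beta1_hat = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)
#   beta0_hat = Ybar - beta1_hat * Xbar
x_dev, y_dev = X - X.mean(), Y - Y.mean()
beta1_hat = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
beta0_hat = Y.mean() - beta1_hat * X.mean()

# One useful OLS property noted in the slides: residuals sum to (numerically) zero
residuals = Y - (beta0_hat + beta1_hat * X)
print(beta0_hat, beta1_hat, residuals.sum())
```

With this seed the estimates land close to the true values (2 and 3), and the residual sum is zero up to floating-point error.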
OLS estimates have at least two useful characteristics:

- The sum of the residuals is exactly zero.
- OLS can be shown to be the "best" estimator when certain specific conditions hold (we'll get back to this in Chapter 4).

Note: Ordinary Least Squares (OLS) is an *estimator*; a given β̂ produced by OLS is an *estimate*.

How does OLS work? Recall from 2.3 that OLS minimizes the sum of the squared residuals:

Σe_i² = Σ(Y_i − Ŷ_i)² = Σ(Y_i − β̂0 − β̂1·X_i)²

It can be shown (see Exercise 12) that the coefficients that minimize this sum, for the case of just one independent variable, are

β̂1 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)²
β̂0 = Ȳ − β̂1·X̄

### Estimating Multivariate Regression Models with OLS

In the real world, one explanatory variable is not enough. The general multivariate regression model with K independent variables is

Y_i = β0 + β1·X1i + β2·X2i + … + βK·XKi + ε_i,   i = 1, 2, …, N

The biggest difference from the single-explanatory-variable model is in the interpretation of the slope coefficients. Now a slope coefficient indicates the change in the dependent variable associated with a one-unit increase in that explanatory variable, *holding the other explanatory variables constant*. Omitted (and irrelevant) variables are therefore not held constant. The intercept term β0 is the value of Y when all the Xs and the error term equal zero. Nevertheless, the underlying principle of minimizing the summed squared residuals remains the same.

### Example: Financial Aid Awards at a Liberal Arts College

Dependent variable: FINAIDi = financial aid, measured in dollars of grant awarded to the ith applicant.

Theoretical model:

FINAID = f(PARENT, HSRANK)   (2.9)
FINAIDi = β0 + β1·PARENTi + β2·HSRANKi + ε_i   (2.10)

where:

- PARENTi = the amount, in dollars, that the parents of the ith student are judged able to contribute to college expenses
- HSRANKi = the ith student's GPA rank in high school, measured as a percentage (i.e., between 0 and 100)

Estimating the model using the data in Table 2-2 gives

FINAIDi-hat = 8927 − 0.36·PARENTi + 87.4·HSRANKi   (2.11)

The slope coefficients are interpreted graphically in Figures 2-1 and 2-2.

Figure 2-1 (caption): Financial aid as a function of parents' ability to pay. In Equation 2.11, an increase of one dollar in the parents' ability to pay decreases the financial aid award by $0.36, holding constant high school rank.

Figure 2-2 (caption): Financial aid as a function of high school rank. In Equation 2.11, an increase of one percentage point in high school rank increases the financial aid award by $87.40, holding constant parents' ability to pay.

### Total, Explained, and Residual Sums of Squares

TSS = Σ(Y_i − Ȳ)²   (2.12)
Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σe_i²   (2.13)
TSS = ESS + RSS

This is usually called the decomposition of variance.

Figure 2-3 (caption): Decomposition of the variance in Y. The variation of Y around its mean (Y − Ȳ) can be decomposed into two parts: (1) Ŷ_i − Ȳ, the difference between the estimated value of Y and the mean value of Y; and (2) Y_i − Ŷ_i, the difference between the actual value of Y and the estimated value of Y.

Dividing Equation 2.13 through by TSS gives 1 = ESS/TSS + RSS/TSS.

### Describing the Overall Fit of the Estimated Model

The simplest commonly used measure of overall fit is the coefficient of determination, R²:

R² = ESS/TSS = 1 − RSS/TSS = 1 − Σe_i² / Σ(Y_i − Ȳ)²   (2.14)

Since OLS selects the coefficient estimates that minimize RSS, OLS provides the largest possible R² within the class of linear models.

Figure 2-4 (caption): X and Y are not related; in such a case, R² would be 0.
Figure 2-5 (caption): A set of data for X and Y that can be "explained" quite well with a regression line; R² = .95.
Figure 2-6 (caption): A perfect fit: all the data points are on the regression line, and the resulting R² is 1.

### The Simple Correlation Coefficient, r

This is a measure related to R². r measures the strength and direction of the linear relationship between two variables:

- r = +1: the two variables are perfectly positively correlated
- r = −1: the two variables are perfectly negatively correlated
- r = 0: the two variables are totally uncorrelated

### The Adjusted Coefficient of Determination

A major problem with R² is that it can never decrease when another independent variable is added. An alternative that addresses this issue is the adjusted R², or R̄²:

R̄² = 1 − [Σe_i² / (N − K − 1)] / [Σ(Y_i − Ȳ)² / (N − 1)]   (2.15)

where N − K − 1 = degrees of freedom. So R̄² measures the share of the variation of Y around its mean that is explained by the regression equation, adjusted for degrees of freedom. R̄² can be used to compare the fits of regressions with the same dependent variable and different numbers of independent variables. As a result, most researchers automatically use R̄² instead of R² when evaluating the fit of their estimated regression equations.

### Evaluating the Quality of a Regression Equation

Checkpoints include the following (the numbers roughly correspond to the relevant chapters in the book):

1. Is the equation supported by sound theory?
2. How well does the estimated regression fit the data?
3. Is the data set reasonably large and accurate?
4. Is OLS the best estimator to be used for this equation?
5. How well do the estimated coefficients correspond to the expectations developed by the researcher before the data were collected?
6. Are all the obviously important variables included in the equation?
7. Has the most theoretically logical functional form been used?
8. Does the regression appear to be free of major econometric problems?

Table 2-1 (a/b): The calculation of estimated regression coefficients for the weight–height example (raw data and required intermediate calculations for 20 observations; numeric columns not reproduced here).

Table 2-2 (a–d): Data for the financial aid example (FINAID, PARENT, HSRANK, MALE for 50 applicants; datafile FINAID2; numeric columns not reproduced here).
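The fit measures discussed in this chapter (R² and R̄²) can be computed directly from the decomposition of variance. A minimal sketch in NumPy, using hypothetical data and my own variable names:

```python
import numpy as np

# Hypothetical data: Y depends on two regressors plus noise
rng = np.random.default_rng(1)
N, K = 50, 2
X = rng.normal(size=(N, K))
Y = 5.0 + X @ np.array([1.5, -2.0]) + rng.normal(size=N)

# Design matrix with a constant column for the intercept; fit via least squares
Xmat = np.column_stack([np.ones(N), X])
beta_hat, *_ = np.linalg.lstsq(Xmat, Y, rcond=None)
resid = Y - Xmat @ beta_hat

# Decomposition of variance (Equation 2.13): TSS = ESS + RSS
TSS = ((Y - Y.mean()) ** 2).sum()
RSS = (resid ** 2).sum()
R2 = 1 - RSS / TSS                                   # Equation 2.14
R2_adj = 1 - (RSS / (N - K - 1)) / (TSS / (N - 1))   # Equation 2.15
print(R2, R2_adj)
```

Note that R̄² is always below R² (the degrees-of-freedom penalty), which is why it can decrease when an unhelpful variable is added even though R² cannot.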
### Key Terms from Chapter 2

Ordinary Least Squares (OLS); interpretation of a multivariate regression coefficient; total sum of squares; explained sum of squares; residual sum of squares; coefficient of determination (R²); simple correlation coefficient (r); degrees of freedom; adjusted coefficient of determination (R̄²).

## Chapter 8: Multicollinearity

*Using Econometrics: A Practical Guide*, A. H. Studenmund (Addison-Wesley), 2011.

### Introduction and Overview

The next three chapters deal with violations of the Classical Assumptions and remedies for those violations. This chapter addresses multicollinearity; the next two chapters cover serial correlation and heteroskedasticity. For each of these three problems, we will attempt to answer the following questions:

1. What is the nature of the problem?
2. What are the consequences of the problem?
3. How is the problem diagnosed?
4. What remedies for the problem are available?

### Perfect Multicollinearity

Perfect multicollinearity violates Classical Assumption VI, which specifies that no explanatory variable is a perfect linear function of any other explanatory variable(s). "Perfect" in this context implies that the variation in one explanatory variable can be completely explained by movements in another explanatory variable. A special case is that of a *dominant variable*: an explanatory variable that is definitionally related to the dependent variable. An example would be

X1i = α0 + α1·X2i   (8.1)   (notice: no error term)

where the αs are constants and the Xs are independent variables in

Y_i = β0 + β1·X1i + β2·X2i + ε_i   (8.2)

Figure 8-1 (caption): Perfect multicollinearity. With perfect multicollinearity, an independent variable can be completely explained by the movements of one or more other independent variables. Perfect multicollinearity can usually be avoided by careful screening of the independent variables before a regression is run.

What happens to estimation when there is perfect multicollinearity? OLS is incapable of generating estimates of the regression coefficients, and most OLS computer programs will print an error message in such a situation. Essentially, perfect multicollinearity ruins our ability to estimate the coefficients because the perfectly collinear variables cannot be distinguished from each other: you cannot hold all the other independent variables constant if every time one variable changes, another changes in an identical manner. Solution: one of the collinear variables must be dropped (they are essentially identical anyway).

### Imperfect Multicollinearity

Imperfect multicollinearity occurs when two or more explanatory variables are imperfectly linearly related, as in

X1i = α0 + α1·X2i + u_i   (8.7)

Compare Equation 8.7 to Equation 8.1: notice that 8.7 includes u_i, a stochastic error term.

Figure 8-2 (caption): Imperfect multicollinearity. With imperfect multicollinearity, an independent variable is a strong but not perfect linear function of one or more other independent variables. Imperfect multicollinearity varies in degree from sample to sample.

### The Consequences of Multicollinearity

There are five major consequences:

1. Estimates will remain unbiased.
2. The variances and standard errors of the estimates will increase. It is harder to distinguish the effect of one variable from the effect of another, so we are much more likely to make large errors in estimating the βs than without multicollinearity. As a result, the estimated coefficients, although still unbiased, now come from distributions with much larger variances and therefore larger standard errors (Figure 8-3).

Figure 8-3 (caption): Severe multicollinearity increases the variances of the β̂s. Severe multicollinearity produces a distribution of the β̂s that is centered around the true β but has a much wider variance; the distribution of the β̂s with multicollinearity is much wider than otherwise.

3. The computed t-scores will fall (recalling Equation 5.2, this is a direct consequence of 2 above).
4. Estimates will become very sensitive to changes in specification. The addition or deletion of an explanatory variable, or of a few observations, will often cause major changes in the values of the β̂s when significant multicollinearity exists. For example, if you drop a variable, even one that appears to be statistically insignificant, the coefficients of the remaining variables sometimes change dramatically. This is again because, with multicollinearity, it is much harder to distinguish the effect of one variable from the effect of another.
5. The overall fit of the equation, and the estimation of the coefficients of non-multicollinear variables, will be largely unaffected.

### The Detection of Multicollinearity

First, realize that some multicollinearity exists in every equation: all variables are correlated to some degree, even if completely at random. So it's really a question of how much multicollinearity exists in an equation, rather than whether any exists. Two characteristics help detect the degree of multicollinearity for a given application:

1. High simple correlation coefficients
2. High variance inflation factors (VIFs)

#### High Simple Correlation Coefficients

If a simple correlation coefficient r between any two explanatory variables is high in absolute value, these two Xs are highly correlated and multicollinearity is a potential problem. How high is "high"? Some researchers pick an arbitrary number, such as 0.80. A better answer might be that r is high if it causes unacceptably large variances in the coefficient estimates in which we're interested. Caution in the case of more than two explanatory variables: groups of independent variables acting together may cause multicollinearity without any single simple correlation coefficient being high enough to indicate that multicollinearity is present. As a result, simple correlation coefficients must be considered a sufficient but not necessary test for multicollinearity.

#### High Variance Inflation Factors (VIFs)

The variance inflation factor (VIF) is calculated in two steps:

1. Run an OLS regression that has Xi as a function of all the other explanatory variables in the equation. For i = 1, this equation would be

   X1 = α1 + α2·X2 + α3·X3 + … + αK·XK + v   (8.15)

   where v is a classical stochastic error term.

2. Calculate the variance inflation factor for β̂i:

   VIF(β̂i) = 1 / (1 − R_i²)   (8.16)

   where R_i² is the unadjusted R² from step 1.

From Equation 8.16, the higher the VIF, the more severe the effects of multicollinearity. How high is "high"? While there is no table of formal critical VIF values, a common rule of thumb is that if a given VIF is greater than 5, the multicollinearity is severe. As the number of independent variables increases, it makes sense to raise this threshold slightly. Note that some authors replace the VIF with its reciprocal, 1/VIF = 1 − R_i², called *tolerance* (TOL).
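The two-step VIF calculation can be sketched directly with NumPy. This is a hypothetical illustration with made-up data and my own helper name (`vif`); it regresses one regressor on the others and applies Equation 8.16:

```python
import numpy as np

def vif(X, j):
    """VIF for column j of regressor matrix X (no constant column):
    regress X_j on the other regressors, then return 1 / (1 - R^2)."""
    N = X.shape[0]
    y = X[:, j]
    others = np.column_stack([np.ones(N), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ coef
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return 1.0 / (1.0 - r2)

# Hypothetical data: x2 is a noisy linear function of x1 (imperfect multicollinearity)
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.3, size=200)  # strongly collinear with x1
x3 = rng.normal(size=200)                        # unrelated regressor
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])   # x1 and x2 get large VIFs, x3 does not
```

For real work, statsmodels offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, which implements the same auxiliary-regression idea.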
VIFs VIF is a sufficient not necessary test for multicollinearity 2011 Pearson AddisonWesley All rights reserved 8391 3 Remedies for 17 Multicollinearity Essentially three remedies for multicollinearity 1 Do nothing a Multicollinearity will not necessarily reduce the t scores enough to make them statistically insignificant andor change the estimated coefficients to make them differ from expectations b the deletion of a multicollinear variable that belongs in an equation will cause specification bias 2 Drop a redundant variable a Viable strategy when two variables measure essentially the same thing b Always use theory as the basis for this decision 2011 Pearson AddisonWesley All rights reserved 8391 4 6 Remedies for 1 r Multicollinearity cont 3 Increase the sample size a This is frequently impossible but a useful alternative to be considered if feasible b The idea is that the larger sample normally will reduce the variance of the estimated coefficients diminishing the impact of the multicollinearity 2011 Pearson AddisonWesley All rights reserved 8391 5 Table 81a Table 31 Data for the FishPope Example Year F PF PB N Yd 1946 128 560 501 24402 1606 1947 123 643 713 25268 1513 1948 131 741 810 26076 1567 1949 129 745 762 26718 1547 1950 138 731 803 27766 1646 1951 132 834 910 28635 1657 1952 133 813 902 29408 1678 1953 136 782 842 30425 1726 1954 135 787 837 31648 1714 1955 129 771 771 32575 1795 1956 129 770 745 33574 1839 1957 129 780 828 34564 1844 1958 133 834 922 36024 1831 2011 Pearson AddisonWesley All rights reserved Table 81a 1959 137 849 888 39505 1881 1960 132 850 872 40871 1883 1961 137 869 883 42105 1909 1962 136 905 901 42882 1969 1963 137 903 887 43847 2015 1964 135 882 873 44874 2126 1965 139 908 939 45640 2239 1966 139 967 1026 46246 2335 1967 136 1000 1000 46864 2403 1968 140 1016 1023 47468 2486 1969 142 1072 1114 47873 2534 1970 148 1180 1176 47872 2610 Source Historical Statistics of the 18 Colonial Times to 1970 Washington 00 US Bureau of the 
Census 1975 Datafile FISHB 2011 Pearson AddisonWesley All rights reserved Table 82a Table 82 Data for the SAT Interactive Learning Exercise SAT GPA APMATH APENG AP ESL GEND PREP RACE 1060 374 O 1 1 0 O O 0 740 271 0 0 0 0 0 1 0 1070 392 0 1 1 0 0 1 0 1070 343 0 1 1 O 0 1 O 1330 435 1 1 1 0 0 1 0 1220 302 0 1 1 0 1 1 O 1130 398 1 1 1 1 0 1 O 770 294 0 0 0 O 0 1 O 1050 349 0 1 1 0 0 1 0 1250 387 1 1 1 0 1 1 O 1000 349 0 0 0 O 0 1 0 1010 324 0 1 1 0 0 1 0 1320 422 1 1 1 1 1 0 1 1230 361 1 1 1 1 1 1 1 840 248 1 0 1 1 1 O 1 2011 Pearson AddisonWesley All rights reserved 910 1240 1020 850 1300 1350 1070 1000 1280 590 1060 1050 1220 Table 82b 226 dddo OOO O OOO O h AO OOOOO O OOO OO O OOO 39O OOO O AO L kAOOOOO OOOO X k AO LOOO L L L L LOO L l LO AOO L L A AO L L k LAO L ko dO d OOOOOOOCOO A 2011 Pearson AddisonWesley All rights reserved 819 Table 82c Table 82 continued SAT 1040 1070 900 1430 1290 1070 1100 1030 1070 1170 1300 GPA 273 310 270 373 164 4403 324 342 429 333 361 358 352 294 398 389 APMATH A LOO h LO LOO LO LO Ao APENG 0 44044004044040 AP A O l x kO LO L kO kO LO ESL O OOO AOOOO AOA O GEND 4440004444004404 PREP RACE 1 D O 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 O 0 1 1 1 0 1 0 1 0 1 O O O 2011 Pearson AddisonWesley All rights reserved 820 Table 82d 1410 434 1 1 1 1 0 1 1 1160 343 1 1 1 0 1 1 O 1 170 356 1 1 1 0 O 0 0 1280 411 1 1 1 O 0 1 0 1060 358 1 1 1 1 0 1 0 1250 347 1 1 1 0 1 1 0 1020 292 1 O 1 1 1 1 1 1000 405 O 1 1 1 0 0 1 1090 324 1 1 1 1 1 1 1 1430 438 1 1 1 1 O 0 1 860 262 1 O 1 1 0 0 1 1050 237 O 0 0 O 1 0 0 920 277 0 0 0 0 O 1 O 1100 254 0 0 0 0 1 1 0 1 160 355 1 O 1 1 1 1 1 1360 298 O 1 1 1 O 1 O 970 364 1 1 1 0 O 1 0 Datafile SAT8 2011 Pearson AddisonWesley All rights reserved 8392 1 Table 83a Table 83 Means Standard Deviations and Simple Correlation Coefficients for the SAT Interactive Regression Learning Exercise Means Standard Deviations and Correlations Sample Range 1 65 Variable SAT GPA APMATH APENG AP ESL GEND PREP RACE Mean 1 075538 3362308 0523077 
APENG = 0.553846, AP = 0.676923, ESL = 0.400000, GEND = 0.492308, PREP = 0.738462, RACE = 0.323077
Standard deviations: SAT = 191.3605, GPA = 0.612739, APMATH = 0.503354, APENG = 0.500961, AP = 0.471291, ESL = 0.493710, GEND = 0.503831, PREP = 0.442893, RACE = 0.471291

Table 8.3 (continued): Simple correlation coefficients (as printed on the slide)
APMATH,GPA = 0.497    GPA,SAT = 0.678       APENG,SAT = 0.608
APMATH,SAT = 0.512    APENG,APMATH = 0.444  APENG,GPA = 0.709
AP,SAT = 0.579        AP,GPA = 0.585        AP,APMATH = 0.723
AP,APENG = 0.769      ESL,GPA = 0.071       ESL,SAT = 0.024
ESL,APENG = 0.037     ESL,APMATH = 0.402    GEND,GPA = 0.008
ESL,AP = 0.295        GEND,APENG = 0.044    GEND,SAT = 0.293
GEND,ESL = 0.050      GEND,APMATH = 0.077   PREP,SAT = 0.100
GEND,AP = 0.109       PREP,APMATH = 0.147   PREP,GPA = 0.001
PREP,AP = 0.111       PREP,APENG = 0.029    PREP,GEND = 0.044
PREP,ESL = 0.085      RACE,SAT = 0.085      RACE,GPA = 0.025
RACE,APMATH = 0.330   RACE,APENG = 0.107    RACE,AP = 0.195
RACE,ESL = 0.846      RACE,GEND = 0.022     RACE,PREP = 0.187

Key Terms from Chapter 8
- Perfect multicollinearity
- Severe imperfect multicollinearity
- Dominant variable
- Auxiliary (or secondary) equation
- Variance inflation factor
- Redundant variable

Chapter 7: Specification: Choosing a Functional Form
Using Econometrics: A Practical Guide, by A. H. Studenmund. Copyright 2011 Pearson Addison-Wesley. Slides by Niels-Hugo Blunch, Washington and Lee University.

Choosing a Functional Form
After the independent variables are chosen, the next step is to choose the functional form of the relationship between the dependent variable and each of the independent variables. KISS principle: let theory be your guide, not the data.

The Use and Interpretation of the Constant Term
An estimate of β0 has at least three components:
1. the true β0
2. the constant impact of any specification errors (an omitted variable, for example)
3. the mean of ε for the correctly specified equation (if not equal to zero)
Unfortunately, these components cannot be distinguished from one another, because we can observe only β̂0, the sum of the three components. As a result
of this, we usually do not interpret the constant term. On the other hand, we should not suppress the constant term either, as illustrated by Figure 7.1.

Figure 7.1: The Harmful Effect of Suppressing the Constant Term
[Figure: if the constant (or intercept) term is suppressed, the estimated regression is forced through the origin. Such an effect potentially biases the β̂'s and inflates their t-scores. In this particular example, the true slope is close to zero in the range of the sample, but forcing the regression through the origin makes the slope appear to be significantly positive.]

Alternative Functional Forms
An equation is linear in the variables if plotting the function in terms of X and Y generates a straight line. For example, Equation 7.1:

    Y = β0 + β1X + ε                          (7.1)

is linear in the variables, but Equation 7.2:

    Y = β0 + β1X² + ε                         (7.2)

is not linear in the variables. Similarly, an equation is linear in the coefficients only if the coefficients appear in their simplest form: they are not raised to any powers (other than one), are not multiplied or divided by other coefficients, and do not themselves include some sort of function (like logs or exponents).

For example, Equations 7.1 and 7.2 are linear in the coefficients, while Equation 7.3:

    Y = β0 + X^β1 + ε                         (7.3)

is not linear in the coefficients. In fact, of all possible equations for a single explanatory variable, only functions of the general form

    f(Y) = β0 + β1 f(X)                       (7.4)

are linear in the coefficients β0 and β1.

Linear Form
The linear form is based on the assumption that the slope of the relationship between the independent variable and the dependent variable is constant:

    ΔY/ΔXk = βk        (k = 1, 2, ..., K)

For the linear case, the elasticity of Y with respect to Xk, the percentage
change in the dependent variable caused by a 1-percent increase in the independent variable (holding the other variables in the equation constant), is:

    Elasticity(Y, Xk) = (ΔY/Y) / (ΔXk/Xk) = (ΔY/ΔXk) · (Xk/Y) = βk · (Xk/Y)

What Is a Logarithm?
If e (a constant equal to 2.71828) to the b-th power produces x, then b is the log of x: b is the log of x to the base e if e^b = x. Thus a log (or logarithm) is the exponent to which a given base must be raised in order to produce a specific number. While logs come in more than one variety, we'll use only natural logs (logs to the base e) in this text. The symbol for a natural log is "ln", so ln(x) = b means that (2.71828)^b = x, or more simply, ln(x) = b means that e^b = x. For example, since e² = (2.71828)² = 7.389, we can state that ln(7.389) = 2.

Double-Log Form
Assume the following:

    Yi = β0 · (X1i)^β1 · (X2i)^β2 · e^εi

Taking natural logs yields:

    ln(Yi) = ln(β0) + β1 ln(X1i) + β2 ln(X2i) + εi

Here the natural log of Y is the dependent variable and the natural log of X is the independent variable. In a double-log equation, an individual regression coefficient can be interpreted as an elasticity, because:

    βk = Δln(Y)/Δln(Xk) = (ΔY/Y) / (ΔXk/Xk) = Elasticity(Y, Xk)

Note that the elasticities of the model are constant and the slopes are not. This is in contrast to the linear model, in which the slopes are constant but the elasticities are not.

Figure 7.2: Double-Log Functions
[Figure: depending on the values of the regression coefficients, the double-log functional form can take on a number of shapes. The left panel shows the use of a double-log function to depict a shape useful in describing the economic concept of an isoquant or an indifference curve. The right panel shows various shapes that can be achieved with a double-log function if X2 is held constant (or is not included in the equation).]
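The claim that a double-log coefficient *is* an elasticity is easy to verify numerically. The following sketch is not from the text: the constants b0 and b1 and the evaluation point x are made-up illustration values.

```python
import math

# Hypothetical double-log relationship: ln(Y) = ln(b0) + b1*ln(X),
# i.e. Y = b0 * X**b1. All values below are illustrative only.
b0, b1 = 2.0, 0.8

def y(x):
    return b0 * x ** b1

# Elasticity = (% change in Y) / (% change in X), here from a 1% change in X.
x = 50.0
dx = 0.01 * x
elasticity = ((y(x + dx) - y(x)) / y(x)) / (dx / x)

print(round(elasticity, 2))  # approximately b1 = 0.8
```

Repeating the calculation at a different x gives essentially the same answer, which is the "constant elasticity" property noted above; the slope b0·b1·x^(b1−1), by contrast, changes with x.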
Semilog Form
The semilog functional form is a variant of the double-log equation in which some, but not all, of the variables (dependent and independent) are expressed in terms of their natural logs. The log can be on the right-hand side, as in:

    Yi = β0 + β1 ln(X1i) + β2X2i + εi         (7.7)

Or it can be on the left-hand side, as in:

    ln(Yi) = β0 + β1X1i + β2X2i + εi          (7.9)

Figure 7.3 illustrates these two different cases.

Figure 7.3: Semilog Functions
[Figure: the semilog functional form on the right (ln X) can be used to depict a situation in which the impact of X1 on Y is expected to increase at a decreasing rate as X1 gets bigger (as long as β1 is greater than zero), holding X2 constant. The semilog functional form on the left (ln Y) can be used to depict a situation in which an increase in X1 causes Y to increase at an increasing rate.]

Polynomial Form
Polynomial functional forms express Y as a function of independent variables, some of which are raised to powers other than 1. For example, in a second-degree polynomial (also called a quadratic equation), at least one independent variable is squared:

    Yi = β0 + β1X1i + β2(X1i)² + β3X2i + εi   (7.10)

The slope of Y with respect to X1 in Equation 7.10 is:

    ΔY/ΔX1 = β1 + 2β2X1                       (7.11)

Note that the slope depends on the level of X1.

Figure 7.4: Polynomial Functions
[Figure: quadratic functional forms (polynomials with squared terms) take on U or inverted-U shapes, depending on the values of the coefficients (holding X2 constant). The left panel shows the shape of a quadratic function that could be used to show a typical cost curve; the right panel allows the description of an impact that rises and then falls, like the impact of age on earnings.]

Inverse Form
The
inverse functional form expresses Y as a function of the reciprocal (or inverse) of one or more of the independent variables (in this case, X1):

    Yi = β0 + β1(1/X1i) + β2X2i + εi          (7.13)

so X1 cannot equal zero. This functional form is relevant when the impact of a particular independent variable is expected to approach zero as that independent variable approaches infinity. The slope with respect to X1 is:

    ΔY/ΔX1 = -β1/(X1)²

The slopes for X1 fall into two categories, depending on the sign of β1, as illustrated in Figure 7.5.

Figure 7.5: Inverse Functions
[Figure: inverse (or reciprocal) functional forms allow the impact of an X1 on Y to approach zero as X1 increases in size. The inverse function approaches the same value (the asymptote) from the top or the bottom, depending on the sign of β1.]

Table 7.1: Summary of Alternative Functional Forms

Functional Form   Equation (one X only)              The Meaning of β1
Linear            Yi = β0 + β1Xi + εi                The slope of Y with respect to X
Double-log        lnYi = β0 + β1 lnXi + εi           The elasticity of Y with respect to X
Semilog (lnX)     Yi = β0 + β1 lnXi + εi             The change in Y (in units) related to a 1-percent increase in X
Semilog (lnY)     lnYi = β0 + β1Xi + εi              The percent change in Y related to a one-unit increase in X
Polynomial        Yi = β0 + β1Xi + β2Xi² + εi        Roughly, the slope of Y with respect to X for small X
Inverse           Yi = β0 + β1(1/Xi) + εi            Roughly, the inverse of the slope of Y with respect to X for small X

Lagged Independent Variables
Virtually all the regressions we've studied so far have been "instantaneous" in nature. In other words, they have included independent and dependent variables from the same time period, as in:

    Yt = β0 + β1X1t + β2X2t + εt              (7.15)

Many econometric equations include one or more lagged independent variables, like X1,t-1, where "t-1" indicates that the observation of X1 is from
the time period previous to time period t, as in the following equation:

    Yt = β0 + β1X1,t-1 + β2X2t + εt           (7.16)

Using Dummy Variables
A dummy variable is a variable that takes on the value of 0 or 1, depending on whether a condition for a qualitative attribute (such as gender) is met. These conditions take the general form:

    Yi = β0 + β1Xi + β2Di + εi                (7.18)

    where Di = 1 if the i-th observation meets a particular condition
          Di = 0 otherwise

This is an example of an intercept dummy (as opposed to a slope dummy, which is discussed in Section 7.5). Figure 7.6 illustrates the consequences of including an intercept dummy in a linear regression model.

Figure 7.6: An Intercept Dummy
[Figure: if an intercept dummy (β2Di) is added to an equation, a graph of the equation will have different intercepts for the two qualitative conditions specified by the dummy variable. The difference between the two intercepts is β2. The slopes are constant with respect to the qualitative condition.]

Slope Dummy Variables
Contrary to the intercept dummy, which changes only the intercept (and not the slope), the slope dummy changes both the intercept and the slope. The general form of a slope dummy equation is:

    Yi = β0 + β1Xi + β2Di + β3XiDi + εi       (7.20)

The slope depends on the value of D:
- When D = 0, ΔY/ΔX = β1
- When D = 1, ΔY/ΔX = β1 + β3

Figure 7.7 gives a graphical illustration of how this works.

Figure 7.7: Slope and Intercept Dummies
[Figure: if slope dummy (β3XiDi) and intercept dummy (β2Di) terms are added to an equation, a graph of the equation will have different intercepts and different slopes, depending on the value of the qualitative condition specified by the dummy variable. The difference between the two intercepts is β2, whereas the difference between the two slopes is β3.]
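The intercept-gap and slope-gap interpretation can be made concrete with a small sketch of the slope-dummy equation. The coefficient values below are made up for illustration; they are not estimates from the text.

```python
# Illustrative slope-and-intercept dummy model (hypothetical coefficients):
#   Y = b0 + b1*X + b2*D + b3*X*D
b0, b1, b2, b3 = 10.0, 2.0, 5.0, 1.5

def predict(x, d):
    return b0 + b1 * x + b2 * d + b3 * x * d

# Intercepts (at X = 0) differ by b2; slopes differ by b3.
intercept_gap = predict(0, 1) - predict(0, 0)   # = b2
slope_d0 = predict(1, 0) - predict(0, 0)        # = b1
slope_d1 = predict(1, 1) - predict(0, 1)        # = b1 + b3

print(intercept_gap, slope_d0, slope_d1)  # 5.0 2.0 3.5
```

The gap between the two fitted lines at X = 0 is exactly β2, and the difference between their slopes is exactly β3, matching Figure 7.7.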
Problems with Incorrect Functional Forms
If functional forms are similar and if theory does not specify exactly which form to use, there are at least two reasons why we should avoid using goodness of fit over the sample to determine which equation to use:
1. Fits are difficult to compare if the dependent variable is transformed.
2. An incorrect functional form may provide a reasonable fit within the sample but have the potential to make large forecast errors when used outside the range of the sample.
The first of these is essentially due to the fact that when the dependent variable is transformed, the total sum of squares (TSS) changes as well. The second is essentially due to the fact that using an incorrect functional form amounts to a specification error, similar to the omitted variables bias discussed in Section 6.1. This second case is illustrated in Figure 7.8.

Figure 7.8: Incorrect Functional Forms Outside the Sample Range
[Figure, four panels: (a) double-log, (b) polynomial, (c) semilog, (d) linear. If an incorrect functional form is applied to data outside the range of the sample on which it was estimated, the probability of large mistakes increases. In particular, note how the polynomial functional form can change slope rapidly outside the sample range (panel b), and that even a linear form can cause mistakes if the true functional form is nonlinear (panel d).]

Key Terms from Chapter 7
- Elasticity
- Double-log functional form
- Semilog functional form
- Polynomial functional form
- Inverse functional form
- Slope dummy
- Natural log
- Omitted condition
- Interaction term
- Linear in the variables
- Linear in the coefficients

Chapter 5: Hypothesis Testing
Using Econometrics: A Practical Guide, by A. H. Studenmund. Copyright 2011 Pearson Addison-Wesley. Slides by Niels-Hugo Blunch, Washington and Lee University.

What Is Hypothesis Testing?
Hypothesis testing is used in a variety of settings. The Food and Drug Administration (FDA), for example, tests new products before allowing their sale: if the sample of people exposed to the new product shows some side effect significantly more frequently than would be expected to occur by chance, the FDA is likely to withhold approval of marketing that product. Similarly, economists have been statistically testing various relationships, for example that between consumption and income. Note here that while we cannot prove a given hypothesis (for example, the existence of a given relationship), we often can reject a given hypothesis (again, for example, rejecting the existence of a given relationship).

Classical Null and Alternative Hypotheses
The researcher first states the hypotheses to be tested. Here we distinguish between the null and the alternative hypothesis:
- Null hypothesis (H0): the outcome that the researcher does not expect (almost always includes an equality sign)
- Alternative hypothesis (HA): the outcome the researcher does expect
Example:
    H0: β ≤ 0 (the values you do not expect)
    HA: β > 0 (the values you do expect)

Type I and Type II Errors
Two types of errors are possible in hypothesis testing:
- Type I: rejecting a true null hypothesis
- Type II: not rejecting a false null hypothesis
Example: suppose we have the null and alternative hypotheses H0: β ≤ 0 and HA: β > 0. Even if the true β really is not positive, in any one sample we might still observe an estimate of β that is sufficiently positive to lead to the rejection of the null hypothesis.
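The definition of a Type I Error can be checked with a small simulation. This is an assumption-laden toy sketch, not anything from the text: the null hypothesis is true by construction, so a one-sided test at the 5-percent level should reject (that is, commit a Type I Error) in roughly 5 percent of repeated samples. For simplicity it tests a mean rather than a regression slope and uses the large-sample critical value 1.645 in place of a t table.

```python
import random
import statistics

# Monte Carlo sketch: H0 (mean <= 0) is true by construction, so every
# rejection below is a Type I Error; the rejection rate should be near 5%.
random.seed(42)

def one_sample_t(n=100):
    data = [random.gauss(0, 1) for _ in range(n)]   # true mean is 0
    se = statistics.stdev(data) / n ** 0.5
    return statistics.mean(data) / se

trials = 2000
rejections = sum(one_sample_t() > 1.645 for _ in range(trials))
type_i_rate = rejections / trials
print(type_i_rate)  # close to the 5-percent level of significance
```

Lowering the critical value would raise the Type I Error rate but lower the Type II Error rate (and vice versa), which is the trade-off behind choosing a level of significance.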
This can be illustrated by Figure 5.1.

Figure 5.1: Rejecting a True Null Hypothesis Is a Type I Error
[Figure: distribution of β̂'s centered around 0, with a β̂ that is quite positive. If β = 0 but you observe a β̂ that is very positive, you might reject a true null hypothesis, H0: β ≤ 0, and conclude incorrectly that the alternative hypothesis, HA: β > 0, is true.]

Alternatively, it's possible to obtain an estimate of β that is close enough to zero (or negative) to be considered "not significantly positive." Such a result may lead the researcher to accept the null hypothesis that β ≤ 0 when in truth β > 0. This is a Type II Error: we have failed to reject a false null hypothesis. This can be illustrated by Figure 5.2.

Figure 5.2: Failure to Reject a False Null Hypothesis Is a Type II Error
[Figure: distribution of β̂'s centered around 1, with a β̂ that is negative but close to 0. If β = 1 but you observe a β̂ that is negative but close to zero, you might fail to reject a false null hypothesis, H0: β ≤ 0, and incorrectly ignore the fact that the alternative hypothesis, HA: β > 0, is true.]

Decision Rules of Hypothesis Testing
To test a hypothesis, we calculate a sample statistic that determines when the null hypothesis can be rejected, depending on the magnitude of that sample statistic relative to a preselected critical value, which is found in a statistical table. This procedure is referred to as a decision rule. The decision rule is formulated before the regression estimates are obtained. The range of possible values of the estimates is divided into two regions: an acceptance (really, nonrejection) region and a rejection region. The critical value effectively separates the acceptance (nonrejection) region
from the rejection region when testing a null hypothesis. Graphs of these acceptance and rejection regions are given in Figures 5.3 and 5.4.

Figure 5.3: "Acceptance" and Rejection Regions for a One-Sided Test of β
[Figure: for a one-sided test of H0: β ≤ 0 vs. HA: β > 0, the critical value divides the distribution of β̂ (centered around zero on the assumption that H0 is true) into "acceptance" and rejection regions; the area of the rejection region is the probability of a Type I Error.]

Figure 5.4: "Acceptance" and Rejection Regions for a Two-Sided Test of β
[Figure: for a two-sided test of H0: β = 0 vs. HA: β ≠ 0, we divide the distribution of β̂ into an "acceptance" region and two rejection regions.]

The t-Test
The t-test is the test that econometricians usually use to test hypotheses about individual regression slope coefficients. Tests of more than one coefficient at a time (joint hypotheses) are typically done with the F-test, presented in Section 5.6. The t-test is the appropriate test to use when the stochastic error term is normally distributed and when the variance of that distribution must be estimated. Since these usually are the case, the use of the t-test for hypothesis testing has become standard practice in econometrics.

The t-Statistic
For a typical multiple regression equation,

    Yi = β0 + β1X1i + β2X2i + εi              (5.1)

we can calculate t-values for each of the estimated coefficients. Usually these are calculated only for the slope coefficients, though (see Section 7.1). Specifically, the t-statistic for the k-th coefficient is:

    tk = (β̂k - βH0) / SE(β̂k),    k = 1, 2, ..., K        (5.2)
where βH0 is the border value of β implied by the null hypothesis, and SE(β̂k) is the estimated standard error of β̂k.

The Critical t-Value and the t-Test Decision Rule
To decide whether to reject or not to reject a null hypothesis based on a calculated t-value, we use a critical t-value. A critical t-value is the value that distinguishes the "acceptance" region from the rejection region. The critical t-value, tc, is selected from a t-table (see Statistical Table B-1 in the back of the book) depending on:
- whether the test is one-sided or two-sided,
- the level of Type I Error specified, and
- the degrees of freedom, defined as the number of observations minus the number of coefficients estimated (including the constant), or N - K - 1.

The rule to apply when testing a single regression coefficient ends up being:
    Reject H0 if |tk| > tc and if tk also has the sign implied by HA.
    Do not reject H0 otherwise.

Note that this decision rule works (for both calculated t-values and critical t-values) for one-sided hypotheses around zero or another hypothesized value, S:
    H0: βk ≤ 0    HA: βk > 0
    H0: βk ≥ 0    HA: βk < 0
    H0: βk ≤ S    HA: βk > S
    H0: βk ≥ S    HA: βk < S

as well as for two-sided hypotheses around zero or another hypothesized value, S:
    H0: βk = 0    HA: βk ≠ 0
    H0: βk = S    HA: βk ≠ S

From Statistical Table B-1, the critical t-value for a one-tailed test at a given level of significance is exactly equal to the critical t-value for a two-tailed test at twice the level of significance of the one-tailed test, as also illustrated by Figure 5.5.
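The decision rule above is mechanical enough to write as code. This is a hedged sketch: the helper name and the sign convention are inventions for illustration, with `expected_sign` encoding HA (+1 for HA: βk > 0, -1 for HA: βk < 0, and 0 for a two-sided alternative).

```python
# The t-test decision rule from the text, as a small helper function.
def reject_h0(t_k, t_c, expected_sign=0):
    if abs(t_k) <= t_c:
        return False                          # not significant at this level
    if expected_sign == 0:
        return True                           # two-sided: magnitude decides
    return (t_k > 0) == (expected_sign > 0)   # one-sided: sign must match HA

# Illustrative calls using critical values quoted elsewhere in these slides
# (1.943 for 6 df one-sided; 2.045 for 29 df two-sided; 5-percent level):
print(reject_h0(2.1, 1.943, +1))    # True: significant, right sign
print(reject_h0(5.6, 1.943, -1))    # False: significant but wrong sign
print(reject_h0(2.37, 2.045, 0))    # True: two-sided rejection
```

Note that a large |t| alone is not enough for a one-sided test; the second condition (the sign implied by HA) can overturn an apparently "significant" result.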
Figure 5.5: One-Sided and Two-Sided t-Tests
[Figure: the tc for a one-sided test at a given level of significance is exactly equal to the tc for a two-sided test with twice the level of significance of the one-sided test. For example, tc = 1.699 for a 10-percent two-sided test and for a 5-percent one-sided test with 29 degrees of freedom.]

Choosing a Level of Significance
The level of significance must be chosen before a critical value can be found using Statistical Table B-1. The level of significance indicates the probability of observing an estimated t-value greater than the critical t-value if the null hypothesis were correct. It also measures the amount of Type I Error implied by a particular critical t-value. Which level of significance should be chosen? Five percent is recommended, unless you know something unusual about the relative costs of making Type I and Type II Errors.

Confidence Intervals
A confidence interval is a range that contains the true value of an item a specified percentage of the time. It is calculated using the estimated regression coefficient, the two-sided critical t-value, and the standard error of the estimated coefficient, as follows:

    Confidence interval = β̂ ± tc · SE(β̂)

What's the relationship between confidence intervals and two-sided hypothesis testing? If a hypothesized value falls within the confidence interval, then we cannot reject the null hypothesis.

p-Values
The p-value is an alternative to the t-test. A p-value, or marginal significance level, is the probability of observing a t-score that size or larger (in absolute value) if the null hypothesis were true. Graphically, it's two times the area under the curve of the t-distribution between the absolute value of the actual t-score and infinity. In theory, we could find this by combing through pages and pages of statistical tables, but we don't have to: EViews, Stata, and other statistical software packages automatically give the p-values as part of
the standard output. In light of all this, the p-value decision rule therefore is:
    Reject H0 if the p-value for β̂k < the level of significance and if β̂k has the sign implied by HA.

Examples of t-Tests: One-Sided
The most common use of the one-sided t-test is to determine whether a regression coefficient is significantly different from zero in the direction predicted by theory. This involves four steps:
1. Set up the null and alternative hypotheses.
2. Choose a level of significance (and therefore a critical t-value).
3. Run the regression and obtain an estimated t-value (or t-score).
4. Apply the decision rule by comparing the calculated t-value with the critical t-value in order to reject or not reject the null hypothesis.
Let's look at each step in more detail for a specific example. Consider the following simple model of the aggregate retail sales of new cars:

    Yt = β0 + β1X1t + β2X2t + β3X3t + εt      (5.6)

    where: Y  = sales of new cars
           X1 = real disposable income
           X2 = average retail price of a new car, adjusted by the consumer price index
           X3 = number of sports utility vehicles sold

The four steps for this example are as follows.

Step 1: Set up the null and alternative hypotheses
From Equation 5.6, the one-sided hypotheses are set up as:
    1. H0: β1 ≤ 0    HA: β1 > 0
    2. H0: β2 ≥ 0    HA: β2 < 0
    3. H0: β3 ≥ 0    HA: β3 < 0
Remember that a t-test typically is not run on the estimate of the constant term, β0.

Step 2: Choose a level of significance (and therefore a critical t-value)
Assume that you have considered the various costs involved in making Type I and Type II Errors and have chosen 5 percent as the level of significance. There are 10 observations in the data set, so there are 10 - 3 - 1 = 6 degrees of freedom. At a 5-percent level of significance, the critical t-value, tc, can be found in Statistical Table B-1 to be 1.943.
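Step 2 boils down to one subtraction and one table lookup. Here is a minimal sketch; the dictionary holds only the 5-percent critical values quoted in these slides, and a real application would consult the full Statistical Table B-1 (or a function such as scipy.stats.t.ppf).

```python
# Degrees of freedom for a regression with a constant term: N - K - 1.
def degrees_of_freedom(n_obs, n_slopes):
    return n_obs - n_slopes - 1

# 5-percent critical t-values quoted in these slides (Statistical Table B-1).
T_CRIT_5PCT = {
    ("one-sided", 6): 1.943,
    ("one-sided", 29): 1.699,
    ("two-sided", 29): 2.045,
}

df = degrees_of_freedom(10, 3)               # 10 observations, 3 slopes
t_c = T_CRIT_5PCT[("one-sided", df)]
print(df, t_c)  # 6 1.943
```

The same lookup with 33 observations and 3 slope coefficients gives 29 degrees of freedom, the case used for the two-sided example below.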
Step 3: Run the regression and obtain an estimated t-value
Use the data (annual, from 2000 to 2009) to run the regression on your OLS computer package. Again, most statistical software packages automatically report the t-values. Assume that in this case the t-values were 2.1, 5.6, and 0.1 for β̂1, β̂2, and β̂3, respectively.

Step 4: Apply the t-test decision rule
As stated in Section 5.2, the decision rule for the t-test is to reject H0 if |tk| > tc and if tk also has the sign implied by HA. In this example, this amounts to the following three conditions:
    For β̂1: Reject H0 if |2.1| > 1.943 and if 2.1 is positive.
    For β̂2: Reject H0 if |5.6| > 1.943 and if 5.6 is negative.
    For β̂3: Reject H0 if |0.1| > 1.943 and if 0.1 is negative.
Figure 5.6 illustrates all three of these outcomes.

Figure 5.6: One-Sided t-Tests of the Coefficients of the New Car Sales Model
[Figure, two panels: (a) H0: β1 ≤ 0, HA: β1 > 0, with the rejection region beyond tc = 1.943; (b) H0: β2 ≥ 0, HA: β2 < 0 and H0: β3 ≥ 0, HA: β3 < 0, with the t-scores 0.1 and 5.6 falling in the "acceptance" region. Given the estimates in Equation 5.7 and the critical t-value of 1.943 (for a 5-percent level of significance, one-sided, 6 degrees of freedom), we can reject the null hypothesis for β1 but not for β2 or β3.]

Examples of t-Tests: Two-Sided
The two-sided test is used when the hypotheses should be rejected if estimated coefficients are significantly different from zero, or from a specific nonzero value, in either direction. So there are two cases:
1. two-sided tests of whether an estimated coefficient is significantly different from zero, and
2. two-sided tests of whether an estimated coefficient is significantly different from a specific nonzero value.
Let's take an example to illustrate the
first of these; the second case is merely a generalized version of the first (see the textbook for details). We use the Woody's restaurant example of Chapter 3. In the Woody's restaurant equation of Section 3.2, the impact of the average income of an area on the expected number of Woody's customers in that area is ambiguous: a high-income neighborhood might have more total customers going out to dinner (positive sign), but those customers might decide to eat at a more formal restaurant than Woody's (negative sign). The appropriate two-sided t-test therefore is:

    H0: βI = 0    HA: βI ≠ 0

Figure 5.7: Two-Sided t-Test of the Coefficient of Income in the Woody's Model
[Figure: given the estimates of Equation 5.4 and the critical values of ±2.045 (for a 5-percent level of significance, two-sided, 29 degrees of freedom), the estimated t-value of 2.37 falls in the rejection region, so we can reject the null hypothesis that βI = 0.]

The four steps are the same as in the one-sided case:
1. Set up the null and alternative hypotheses: H0: βk = 0, HA: βk ≠ 0.
2. Choose a level of significance (and therefore a critical t-value). Keep the level of significance at 5 percent, but this now must be distributed between two rejection regions; for 29 degrees of freedom, the correct critical t-value is 2.045 (found in Statistical Table B-1 for 29 degrees of freedom and a 5-percent, two-sided test).
3. Run the regression and obtain an estimated t-value: the t-value remains 2.37, from Equation 5.4.
4. Apply the decision rule. For the two-sided case, this simplifies to: Reject H0 if |2.37| > 2.045, so reject H0.

Limitations of the t-Test
With the t-values being automatically printed out by computer
regression packages, there is reason to caution against potential improper use of the t-test:
1. The t-test does not test theoretical validity. If you regress the consumer price index on rainfall in a time-series regression and find strong statistical significance, does that also mean that the underlying theory is valid? Of course not.
2. The t-test does not test "importance." The fact that one coefficient is more statistically significant than another does not mean that it is also more important in explaining the dependent variable, but merely that we have more evidence about the sign of the coefficient in question.
3. The t-test is not intended for tests of the entire population. From the definition of the t-score, given by Equation 5.2, it is seen that as the sample size approaches the population, the t-score approaches infinity, since the standard error approaches zero (the standard error decreases as N increases).

Key Terms from Chapter 5
- Null hypothesis
- Alternative hypothesis
- Type I Error
- Level of significance
- Two-sided test
- Decision rule
- Critical value
- t-statistic
- Confidence interval
- p-value

Chapter 3: Learning to Use Regression Analysis
Using Econometrics: A Practical Guide, by A. H. Studenmund. Copyright 2011 Pearson Addison-Wesley. Slides by Niels-Hugo Blunch, Washington and Lee University.

Steps in Applied Regression Analysis
The first step is choosing the dependent variable; this step is determined by the purpose of the research (see Chapter 11 for details). After choosing the dependent variable, it's logical to follow this sequence:
1. Review the literature and develop the theoretical model.
2. Specify the model: select the independent variables and the functional form.
3. Hypothesize the expected signs of the coefficients.
4. Collect the data.
5. Inspect and clean the data.
6. Estimate and evaluate the equation.
7. Document the results.

Step 1: Review the Literature and Develop the Theoretical Model
Perhaps counterintuitively, a strong theoretical foundation is the best start for any empirical project. The reason: the main econometric decisions are determined by the underlying theoretical model. Useful starting points: the Journal of Economic Literature or a business-oriented publication of abstracts; an Internet search, including Google Scholar; and EconLit, an electronic bibliography of economics literature (for more details, go to www.EconLit.org).

Step 2: Specify the Model: Independent Variables and Functional Form
After selecting the dependent variable, the specification of a model involves choosing the following components:
1. the independent variables and how they should be measured,
2. the functional (mathematical) form of the variables, and
3. the properties of the stochastic error term.
A mistake in any of the three elements results in a specification error. For example, only theoretically relevant explanatory variables should be included. Even so, researchers frequently have to make choices (also denoted "imposing their priors"). Example: when estimating a demand equation, theory informs us that prices of complements and substitutes of the good in question are important explanatory variables. But which complements, and which substitutes?

Step 3: Hypothesize the Expected Signs of the Coefficients
Once the variables are selected, it's important to hypothesize the expected signs of the regression coefficients. Example: a demand equation for a final consumption good. First, state the demand equation as a general function:

    Qd = f(P, Y, PC, PS) + ε

The signs above the variables indicate the hypothesized sign of the respective regression coefficient in a linear model; for a normal good, standard demand theory implies negative signs for the good's own price (P) and the price of complements (PC), and positive signs for income (Y) and the price of substitutes (PS).

Step 4: Collect the Data; Inspect and Clean the Data
A general rule regarding sample size is "the more observations, the better," as long as the observations are from the same general population. The reason for this goes back to the notion of degrees of freedom, first mentioned in Section 2.4. When there are more degrees of freedom:
- every positive error is likely to be balanced by a negative error (see Figure 3.2), and
- the regression coefficients are estimated with a greater degree of precision.

Figure 3.1: Mathematical Fit of a Line to Two Points
[Figure: if there are only two points in a data set, a straight line can be fitted to those points mathematically, without error, because two points completely determine a straight line.]

Figure 3.2: Statistical Fit of a Line to Three Points
[Figure: if there are three or more points in a data set, then the line must almost always be fitted to the points statistically, using the estimation procedures of Section 1.1.]

Estimate the model using the data in Table 2.2. When inspecting the data, obtain a printout or a plot (graph) of the data. The reason: to look for outliers. An outlier is an observation that lies outside the range of the rest of the observations. Examples: Does a student have a 7.0 GPA on a 4.0 scale? Is consumption negative?

Step 5: Estimate and Evaluate the Equation
Once steps 1-4 have been completed, the estimation part is quick: using EViews or Stata to estimate an OLS regression takes less than a second. The evaluation part is more tricky, however, involving answering the following questions: How
respective regression coefficient in a linear model.

### Step 4: Collect the Data; Inspect and Clean the Data

A general rule regarding sample size is: the more observations the better, as long as the observations are from the same general population. The reason for this goes back to the notion of degrees of freedom, first mentioned in Section 2.4. When there are more degrees of freedom:

- Every positive error is likely to be balanced by a negative error (see Figure 3.2).
- The estimated regression coefficients are estimated with a greater degree of precision.

[Figure 3.1: Mathematical Fit of a Line to Two Points. If there are only two points in a data set, a straight line can be fitted to those points mathematically, without error, because two points completely determine a straight line.]

[Figure 3.2: Statistical Fit of a Line to Three Points. If there are three or more points in a data set, then the line must almost always be fitted to the points statistically, using the estimation procedures of Section 2.1.]

The model is then estimated using the data in Table 2.2. When inspecting the data, obtain a printout or a plot (graph) of the data. Reason: to look for outliers. An outlier is an observation that lies outside the range of the rest of the observations. Examples: Does a student have a 7.0 GPA on a 4.0 scale? Is consumption negative?

### Step 5: Estimate and Evaluate the Equation

Once steps 1–4 have been completed, the estimation part is quick: using EViews or Stata to estimate an OLS regression takes less than a second. The evaluation part is trickier, however, and involves answering the following questions: How
well did the equation fit the data? Were the signs and magnitudes of the estimated coefficients as expected? Afterwards, one may add a sensitivity analysis (see Section 6.4 for details).

### Step 6: Document the Results

A standard format is usually used to present estimated regression results:

    Ŷᵢ = 103.40 + 6.38Xᵢ
                  (0.88)
              t = 7.22
    N = 20    R̄² = .73

The number in parentheses under the estimated coefficient is the estimated standard error of that coefficient, and the t-value is the one used to test the hypothesis that the true value of the coefficient is different from zero (more on this later).

### Case Study: Using Regression Analysis to Pick Restaurant Locations

Background: you have been hired to determine the best location for the next Woody's restaurant, a moderately priced, 24-hour family restaurant chain. Objective: decide on the location using the six basic steps of applied regression analysis discussed earlier.

### Case Study, Step 1: Review the Literature and Develop the Theoretical Model

- Background reading about the restaurant industry
- Talking to various experts within the firm
- All the chain's restaurants are identical and located in suburban, retail, or residential environments, so there is a lack of variation in potential explanatory variables that would help determine location
- The number of customers matters most for the locational decision, so the dependent variable is the number of customers, measured by the number of checks or bills
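The standard documentation format of Step 6 can be reproduced mechanically. The sketch below estimates a single-regressor OLS equation by hand and prints the coefficient, its standard error in parentheses, the t-score, N, and the adjusted R-squared in that layout. The data and variable names here are invented for illustration (they are not the textbook's Table 2.2); only the OLS formulas themselves follow the theory, assuming a model Y = β₀ + β₁X + ε.

```python
import math

def ols_simple(x, y):
    """Estimate y = b0 + b1*x by ordinary least squares and return
    the quantities used in the standard reporting format."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ssr = sum(e ** 2 for e in resid)           # sum of squared residuals
    sst = sum((yi - ybar) ** 2 for yi in y)    # total sum of squares
    s2 = ssr / (n - 2)                         # estimated error variance
    se_b1 = math.sqrt(s2 / sxx)                # standard error of b1
    t_b1 = b1 / se_b1                          # t-score for H0: beta1 = 0
    r2 = 1 - ssr / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)  # adjusted R-squared
    return b0, b1, se_b1, t_b1, r2_adj

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 4.2, 5.9, 8.1, 9.8, 12.2]
b0, b1, se_b1, t_b1, r2_adj = ols_simple(x, y)
print(f"Yhat_i = {b0:.2f} + {b1:.2f} X_i")
print(f"                ({se_b1:.2f})")
print(f"            t = {t_b1:.2f}")
print(f"N = {len(x)}    adj. R^2 = {r2_adj:.2f}")
```

In practice, as noted under Step 5, a package such as EViews or Stata produces all of these numbers; the point of the hand computation is only to show what each entry in the documented format is.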
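The "inspect and clean the data" part of Step 4 can also be sketched in code: a pass over the raw observations that flags values that are impossible given what the variable measures, such as the 7.0 GPA on a 4.0 scale or negative consumption mentioned above. The record layout, variable names, and valid ranges below are assumptions made for illustration, not part of the textbook.

```python
# Hypothetical raw observations; two contain impossible values.
observations = [
    {"id": 1, "gpa": 3.2, "consumption": 1200.0},
    {"id": 2, "gpa": 7.0, "consumption": 950.0},   # GPA above the 4.0 scale
    {"id": 3, "gpa": 2.8, "consumption": -40.0},   # negative consumption
]

def find_outliers(rows):
    """Return (id, reason) pairs for observations outside their valid ranges."""
    flagged = []
    for row in rows:
        if not 0.0 <= row["gpa"] <= 4.0:
            flagged.append((row["id"], "GPA outside the 0.0-4.0 scale"))
        if row["consumption"] < 0:
            flagged.append((row["id"], "negative consumption"))
    return flagged

for obs_id, reason in find_outliers(observations):
    print(f"check observation {obs_id}: {reason}")
```

Flagged observations should be checked against the original source rather than silently dropped, since an "outlier" may be a data-entry error or a genuine, informative observation.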
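The third limitation of the t-test discussed earlier (that it is not intended for near-population samples) can be illustrated numerically. Holding a tiny estimated effect and the error variance fixed, the standard error shrinks roughly like 1/√N, so the t-score grows without bound as N increases. The numbers below are made up for illustration, using the standard single-regressor approximation se(β̂₁) ≈ σ/√(N·Var(X)).

```python
import math

beta_hat = 0.05   # a tiny, economically unimportant effect
sigma = 1.0       # error standard deviation (assumed known here)
var_x = 1.0       # variance of the regressor

for n in [30, 1_000, 1_000_000]:
    # se shrinks like 1/sqrt(n), so the t-score grows like sqrt(n)
    se = sigma / math.sqrt(n * var_x)
    print(f"N = {n:>9}: se = {se:.5f}, t = {beta_hat / se:.2f}")
```

At N = 30 the effect is nowhere near significant, while at N = 1,000,000 the same tiny effect produces an enormous t-score, which is exactly why statistical significance in very large samples says little about importance.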
