Applied Quantitative Analysis for Finance
These 24-page class notes were uploaded by Rose Harvey I on Wednesday, October 28, 2015. They belong to FIN 203 at Wake Forest University, taught by Umit Akinc in Fall.
Date Created: 10/28/15
Critical Values for the Durbin-Watson Statistic, Level of Significance $\alpha = .05$

[Table residue: only fragments of the four-page table of $d_L$ and $d_U$ values survive; the column headings for $k$ and most sample-size labels are lost. Surviving entries, as $d_L$ $d_U$ pairs:]

Row labelled 31: 1.36 1.50; 1.30 1.57; 1.23 1.65; 1.16 1.74; 1.09 1.83
Unlabelled rows (one pair each): 1.11 1.82; 1.13 1.81; 1.15 1.81; 1.16 1.80; 1.18 1.80; 1.19 1.80; 1.21 1.79; 1.22 1.79; 1.23 1.79; 1.29 1.78; 1.34 1.77; 1.38 1.77; 1.41 1.77; 1.44 1.77; 1.46 1.77; 1.49 1.77; 1.51 1.77; 1.52 1.77; 1.54 1.78; 1.56 1.78; 1.57 1.78; 1.66 1.80; 1.72 1.82
Row labelled 11: 0.20 3.01
Row labelled 45: 1.24 1.84; 1.19 1.90; 1.14 1.96; 1.09 2.02; 1.04 2.09

Second Order Regression Models

[...] we are going to introduce some extensions to the simple linear model that allow curvature [...] variables. Therefore, here is a caution before we proceed: [...] model, the smaller the deviations between the actual and the predicted response variables [...]. The reason this occurs is related to the degrees of freedom of the regression. The more [...] used [...] the result is 0 degrees of freedom, making the model completely useless for [...] whether to include a factor as an independent variable, because there is a cost [...]. Recall that the adjusted $R^2$ tries to account for this cost. The principle of parsimony requires [...] should be used, but those that have marginal or no explanatory power should not.

I. Second-Order Models

[...] consumption [...] income [...] independent variable is not best described by a straight line. As economic theory or [...] tapers off: as your income increases you tend to consume more, but the increase in your consumption begins to slow. The linear model $Y = A + BX + \varepsilon$ would yield a poor fit because it
essentially would be forcing a round peg into a square hole. Obviously a better-fitting model would allow a curvature in the response surface. In this case the addition of a square term, as in $Y = A + B_1 X + B_2 X^2 + \varepsilon$, may capture the apparent nonlinearity. We can estimate this model by adding the $X^2$ values as a second independent variable and running it as a multiple regression. The parameter $B_2$ is called the rate of curvature and its significance can be tested in the usual way. That is, test the null hypothesis $H_0: B_2 = 0$ versus $B_2 < 0$ or $B_2 > 0$ using the t statistic. If $B_2$ is significant and $< 0$ then, as in the above example, the curvature is concave: the impact of the independent variable on the dependent variable decreases as $X$ increases. If on the other hand $B_2$ is significant and $> 0$, we have a convex relationship where the rate of change of the dependent variable strengthens as $X$ increases. The same conclusion can be reached mathematically by examining the derivative of $Y$ with respect to $X$, which is $B_1 + 2 B_2 X$. A negative $B_2$ will reduce the derivative as $X$ increases, and vice versa. Notice further that a model in which $B_1$ is zero implies a U-shaped relationship if $B_2 > 0$ (an inverted U if $B_2 < 0$). Finally, even higher-order models can be constructed by including a cubic term, a fourth-power term, etc.

Example: assume $X$ is the household weekly income and $Y$ the weekly consumption.

Y        X
252.71   300
271.81   350
333.73   450
238.08   235
361.16   1020
383.60   880
359.20   567
209.23   230
324.49   470
344.93   905
297.71   468
367.60   750

[Figure: scatter plot of Y (0 to 500) against X, income (0 to 1500)]

The first-order model $Y = A + BX + \varepsilon$ is estimated as:

Regression Statistics
Multiple R          0.856799
R Square            0.734104
Adjusted R Square   0.707515
Standard Error      30.91203
Observations        12

ANOVA
             df   SS         MS         F          Significance F
Regression   1    26381.61   26381.61   27.60872   0.000371
Residual     10   9555.536   955.5536
Total        11   35937.15

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    213.1951       20.81785         10.24098   1.28E-06   166.81007   259.58019
X            0.179006       0.034068         5.2544     0.000371   0.1030984   0.2549145

The coefficient of
determination of .734 indicates a good fit, with 73.4% of the observed differences in consumption being attributed to variations in income. The independent variable, income, is highly significant with a p-value of 0.00037; in other words, we would be able to reject the null hypothesis that $B = 0$ at any level of significance above 0.00037. The standard error is about 31. However, from the graph of the points above it is apparent that the fit may be improved by a second-order model which includes $X^2$ as another independent variable.

Y        X      X^2
252.71   300    90000
271.81   350    122500
333.73   450    202500
238.08   235    55225
361.16   1020   1040400
383.60   880    774400
359.20   567    321489
209.23   230    52900
324.49   470    220900
344.93   905    819025
297.71   468    219024
367.60   750    562500

$Y = A + B_1 X + B_2 X^2 + \varepsilon$ is estimated below:

Regression Statistics
Multiple R          0.968032
R Square            0.937087
Adjusted R Square   0.923106
Standard Error      15.84973
Observations        12

ANOVA
             df   SS         MS         F          Significance F
Regression   2    33676.22   16838.11   67.02694   3.93E-06
Residual     9    2260.927   251.2141
Total        11   35937.15

             Coefficients   Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept    75.98443       27.60975         2.752087   0.0224     13.526787    138.44207
X            0.735177       0.104679         7.023129   6.17E-05   0.4983753    0.9719781
X^2          -0.00045       8.44E-05         -5.38864   0.000439   -0.0006458   -0.0002639

[...] function of income. The coefficient of determination is now about 93.7%, with both $B_1$ and $B_2$ highly significant. The negative $B_2$ indicates that the relationship between income and [...] become [...]. Remember, however, that predictions of the dependent variable for values of the independent variable outside of the range in the sample (here from 230 to 1020) will give misleading results.

II. Interaction Models

[...] with two independent variables [...]. Say $Y$ is compensation (in thousands of dollars) [...] education [...] a fixed value of experience $X_2$. For instance, for $X_2$ [...] the equation becomes [...] pay tends to increase by 500 for every additional year of education [...]. This relationship between $Y$ and $X_1$ for various levels of $X_2$ can be graphed as follows. [Figure not recoverable.] In this [...] suspect [...] possibility as graphed below. [Figure not recoverable.] For a person with little experience ($X_2 = 1$) the rate of increase in pay as
education increases is stronger (the line is steeper) than for a more experienced person ($X_2 = 3$). In a model that allows this type of relationship, $X_1$ and $X_2$ are said to interact. We can model the interaction by including a term $X_1 X_2$ in the model. With this term included the model becomes $Y = A + B_1 X_1 + B_2 X_2 + B_3 X_1 X_2 + \varepsilon$. This model can be estimated and the significance of the interaction term $X_1 X_2$ can be questioned by testing $H_0: B_3 = 0$ versus $B_3 < 0$ or $B_3 > 0$ using Student's t. If $B_3 < 0$ and significant, then the interaction is negative. This means that as one variable increases in magnitude, the effect of the other variable on the dependent variable moderates. This is the case in the above example. If $B_3 > 0$ and significant, the interaction is positive and the two variables reinforce one another. The same conclusion is reached by looking at the derivatives of $Y$ with respect to $X_1$ and $X_2$: $dY/dX_1 = B_1 + B_3 X_2$ and $dY/dX_2 = B_2 + B_3 X_1$. If $B_3 > 0$ and significant, either derivative will be larger as the other variable increases, and vice versa.

Example: $Y$ is pay (in thousands of dollars), $X_1$ education (yrs) and $X_2$ experience (yrs).

Y      X1   X2   X1X2
53     1    3    3
64.2   2    2    4
42.8   1    2    2
66.4   4    1    4
81.5   5    4    20
63.8   2    3    6
66.2   1    5    5
57.2   3    2    6
77.8   6    3    18
97.5   8    6    48
84.3   4    8    32
68.1   3    2    6

The first-order model is $Y = A + B_1 X_1 + B_2 X_2 + \varepsilon$ and is estimated as:

Regression Statistics
Multiple R          0.944482
R Square            0.892047
Adjusted R Square   0.868057
Standard Error      5.393216
Observations        12

ANOVA
             df   SS         MS         F          Significance F
Regression   2    2163.166   1081.583   37.18469   4.46E-05
Residual     9    261.781    29.08678
Total        11   2424.947

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    42.04253       3.542825         11.86695   8.47E-07   34.0281     50.05696
X1           4.859923       0.795379         6.110198   0.000177   3.060651    6.659194
X2           3.021774       0.861268         3.508518   0.006634   1.073451    4.970097

The regression is highly significant (p-value 4.46E-05), the coefficient of determination is better than 89%, and both $X_1$ and $X_2$ are significant. The estimate of the standard deviation of pay is about 5.393. Suspecting significant interaction between education and experience and estimating the
interactive model $Y = A + B_1 X_1 + B_2 X_2 + B_3 X_1 X_2 + \varepsilon$ yields:

Regression Statistics
Multiple R          0.951435
R Square            0.905228
Adjusted R Square   0.869689
Standard Error      5.35976
Observations        12

ANOVA
             df   SS         MS         F          Significance F
Regression   3    2195.13    731.7101   25.47114   0.000191
Residual     8    229.8162   28.72703
Total        11   2424.947

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    34.24439       8.188269         4.182128   0.003071   15.36221    53.12657
X1           7.177251       2.334712         3.074149   0.015252   1.793396    12.56111
X2           5.100153       2.14819          2.374162   0.044953   0.146417    10.05389
X1X2         -0.54759       0.519117         -1.05485   0.322307   -1.74468    0.649496

Is this a better fit? The answer is found by testing the null hypothesis $H_0: B_3 = 0$ versus $B_3 < 0$. We cannot reject the null hypothesis (there is no significant interaction) at even a modest level of significance of $\alpha = 10\%$ (the p-value is .322). Despite the fact that the coefficient of determination improved to .905 and the standard deviation is reduced by a small amount, there is no compelling evidence that the inclusion of the interaction improves the model.

An interesting application of interaction among variables is when one of the variables suspected to interact with another happens to be a qualitative variable such as gender. Suppose in the above example we differentiate between male and female observations by coding a new dummy variable $X_3 = 1$ for males and $0$ for females. We can add an interaction term $B_4 X_1 X_3$ to the model to investigate whether or not the length of education affects pay for males differently than it does for females. In this extended model the derivative of $Y$ with respect to education is $B_1 + B_4 X_3$. If $B_4$ is significant, then we can conclude that the impact of education on pay for males ($X_3 = 1$) is $B_1 + B_4$, while for females ($X_3 = 0$) it is simply $B_1$. Further, if $B_4 > 0$ then education impacts male pay more strongly than it does female pay, and vice versa.

III. General Second-Order Model

Suppose we have two independent variables to use for predicting the value of a dependent variable. A complete second-order model can be formed by including both the squared
variables as well as the interaction term as follows: $Y = A + B_1 X_1 + B_2 X_2 + B_3 X_1 X_2 + B_4 X_1^2 + B_5 X_2^2 + \varepsilon$. One way to test the appropriateness of this complex model compared to the simpler alternative first-order model $Y = A + B_1 X_1 + B_2 X_2 + \varepsilon$ is to use the ordinary t test for the significance of $B_3$, $B_4$ or $B_5$ one at a time. However, this will not always give a reliable diagnosis. To see why not, suppose for a moment that none of $B_3$, $B_4$ and $B_5$ is significant. If we test each of these null hypotheses individually (that $B_i = 0$) at $\alpha = .05$, there will be a 95% chance we'll make the correct decision for $B_3$ (that it is zero), a 95% chance with respect to $B_4$ and a 95% chance with respect to $B_5$. Thus, treating the three tests as independent, the probability of correctly finding all the $B$s insignificant (i.e. zero) will be $.95^3 = .857$, leading to a type I error (the probability of rejecting the null when it is true) of about .143. Obviously, the more additional terms we test, the larger this error will be. To avoid this we need to test the contribution of these second-order terms collectively as

$H_0: B_3 = B_4 = B_5 = 0$
$H_1$: at least one is not zero

Notice how similar this is to the F-test used for the general significance of the entire multiple regression model. As you may guess, the appropriate test statistic for this test follows the F distribution, and the test is called the partial F-test, for we are testing a subset of the parameters and not all of them. Let us refer to the simpler model $Y = A + B_1 X_1 + B_2 X_2 + \varepsilon$ as the reduced model, as opposed to the complete model. For the general case let $g$ denote the number of $B$ parameters in the reduced model (in our case $g = 2$) and $k$ denote the number of $B$ parameters in the complete model (5 here). Let $SSE_R$ and $SSE_C$ be the sums of squared errors for the reduced and the complete models respectively. Then the test statistic for the partial F-test is given by

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/(n - k - 1)}$

with $k - g$ degrees of freedom for the numerator and $n - k - 1$ degrees of freedom for the denominator, where $n$ is the sample size as before. If the computed test statistic exceeds the critical F for the appropriate $\alpha$ with $k - g$ and $n - k - 1$ degrees of freedom, the null is rejected and
the significant contribution of the square terms and the interaction term to the predictive power of the model is acknowledged. To conduct this test in order to choose between the simpler (parsimonious) and the more complex model, we need to estimate both models first and then do the partial F-test to choose between them.

Example: In the previous example of $Y$ = pay, $X_1$ = education and $X_2$ = experience, we can construct the complete model as $Y = A + B_1 X_1 + B_2 X_2 + B_3 X_1 X_2 + B_4 X_1^2 + B_5 X_2^2 + \varepsilon$ and define the model $Y = A + B_1 X_1 + B_2 X_2 + \varepsilon$ as the reduced model. The data to estimate both models is:

Y      X1   X2   X1X2   X1^2   X2^2
53     1    3    3      1      9
64.2   2    2    4      4      4
42.8   1    2    2      1      4
66.4   4    1    4      16     1
81.5   5    4    20     25     16
63.8   2    3    6      4      9
66.2   1    5    5      1      25
57.2   3    2    6      9      4
77.8   6    3    18     36     9
97.5   8    6    48     64     36
84.3   4    8    32     16     64
68.1   3    2    6      9      4

We had already estimated the first-order model $Y = A + B_1 X_1 + B_2 X_2 + \varepsilon$ above. The estimate of $Y = A + B_1 X_1 + B_2 X_2 + B_3 X_1 X_2 + B_4 X_1^2 + B_5 X_2^2 + \varepsilon$ is:

Regression Statistics
Multiple R          0.954852
R Square            0.911742
Adjusted R Square   0.838194
Standard Error      5.972436
Observations        12

ANOVA
             df   SS         MS         F          Significance F
Regression   5    2210.927   442.1853   12.39656   0.004072
Residual     6    214.02     35.67
Total        11   2424.947

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    29.03369       12.49285         2.324025   0.059122   -1.53521    59.60259
X1           8.691706       3.630475         2.394096   0.053725   -0.19175    17.57516
X2           7.108731       4.789042         1.484374   0.188244   -4.60963    18.8271
X1X2         -0.18994       0.823781         -0.23057   0.825313   -2.20565    1.825784
X1^2         -0.37711       0.62401          -0.60433   0.567755   -1.90401    1.149787
X2^2         -0.35318       0.581485         -0.60737   0.565865   -1.77602    1.069665

The coefficient of determination is marginally better for the complete model, .911 versus .892; i.e. the complete model accounts for 91.1% of the variation in pay while the parsimonious model accounts for 89.2%. However, the complete model is not a better model: none of the $B$s appears to be significant. This happens as a result of two factors. First, since the sample size is relatively small, the complete model has very few degrees of freedom (six, as opposed to nine for the reduced model), and second, since the derived variables $X_1 X_2$, $X_1^2$ and $X_2^2$ are
mathematically related to the original variables $X_1$ and $X_2$, second-order models tend to be susceptible to multicollinearity. Although with these observations we can easily see that the parsimonious model is superior, let's do the formal partial F-test to verify that the second-order terms are not contributing significantly to the power of the model in predicting pay based on education and experience.

$SSE_C = 214.02$, $SSE_R = 261.78$, $k = 5$, $g = 2$, $n = 12$
$H_0: B_3 = B_4 = B_5 = 0$
$H_1$: at least one is not zero
$\alpha = .05$

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/(n - k - 1)} = \dfrac{(261.78 - 214.02)/(5 - 2)}{214.02/(12 - 6)} = 0.4463$

The critical F value with 3 and 6 degrees of freedom for $\alpha = .05$ is 4.757055; therefore, as we suspected, we cannot reject the null hypothesis that the interaction and square terms are all insignificant. The use of the partial F-test is not confined to testing the significance of the interaction and/or squared terms; it can be used to choose between any two alternative models in which one model contains all the $B$ parameters of the other model and then some.

LINEAR REGRESSION INTRODUCTION

Decisions in business and other areas are often based on predictions of what might happen in the future. As one's ability to predict the future improves, decisions can be made whose outcomes are more favorable. The most formidable approach in predicting the future is to establish quantitative relationships between what is known and what is to be predicted. Regression and correlation are interrelated statistical techniques that allow decision makers not only to establish a quantitative relationship among such variables but also to measure the strength of the relationship. In regression analysis a mathematical equation is estimated which relates some known quantities to an unknown variable of interest. Examples may include the relationship between advertising expenditure and level of demand, volume of production and material cost, smoking habits and incidence of heart disease. In all of the examples the relationship to be established is of a statistical (stochastic) nature. This means that we do not pretend to
imply that the level of demand depends exclusively and deterministically on the advertising budget; rather, we hypothesize that, among other factors, the advertising budget has some non-trivial effect on the level of demand. Thus knowing the advertising budget does not allow us to predict sales without any error, but simply affords a more accurate prediction than would be possible without that knowledge. This is an important difference from scientific laws, where the knowledge of certain variables allows scientists to make very accurate predictions, as in the case of the speed of a train determining the time to traverse a 100-mile track without error.

Regression and correlation analyses are used to establish a relationship or an association between two or more variables. The known variable(s) is/are called the independent (explanatory) variable(s), while the variable to be predicted is the dependent or response variable. In the example of advertising versus sales volume, the advertising budget is the known independent variable while the sales volume is the dependent (response) variable to be predicted. In regression analysis we can have only one dependent variable but can use more than one independent variable (for instance, price in addition to advertising) to predict this dependent variable. If we have only one independent variable the regression model is called a simple regression model, whereas if we have more than one independent variable we have a multiple regression model. In what follows we will first develop the simple regression model, then extend it to the multiple regression case.

In the context of the simple regression model the nature of the relationship can take many forms: it may be linear or non-linear, concave, convex, or some arbitrary polynomial. In the majority of the applications of the regression method a linear relationship is assumed. Especially in business and other social sciences, non-linear regression models are used with somewhat less frequency. In the simple linear regression
the stochastic relationship between the dependent and the independent variable can be represented by a straight line in the form of $Y = A + BX + \varepsilon$. This is called the true model and is assumed to have been obtained from the entire population. Here $Y$ is the dependent variable, $X$ is the independent variable, the parameters $A$ and $B$ are respectively the intercept and the slope of the regression, while $\varepsilon$ is the error term that represents the influence of all the other unknown factors on the dependent variable. If the value of $B$ is positive we speak of a direct relationship between the variables as they both go in the same direction, e.g. as the independent variable increases so does the dependent variable, and vice versa. On the other hand, if the parameter $B$ has a negative value the relationship is inverse: when the independent variable increases the dependent variable decreases, and vice versa. If $B$ is actually zero then there is no relationship between $X$ and $Y$.

From this point on let's assume $Y$ represents the monthly sales volume and $X$ the advertising budget. Since we assume there are other factors besides the advertising budget affecting the sales volume, even for a fixed value of $X$ a range of $Y$ values is possible, i.e. the relationship is not deterministic. For any fixed value of $X$ the distribution of all $Y$ values is referred to as the conditional distribution of $Y$, denoted by $Y|X$ (read as "Y given X"). For example, $Y|X = 500$ refers to the distribution of sales volumes in all months in which the advertising budget has been 500. In regression analysis we make certain assumptions about the conditional distributions of the dependent variable (the variable which we try to predict):

o Normality: All conditional distributions are normally distributed, e.g. the distribution of sales volumes in all months in which advertising has been or will ever be some fixed level is normal.
o Homoscedasticity: All conditional normal distributions have the same variance $\sigma^2$.
o Linearity: The means of the conditional distributions are linearly related
to the value of the independent variable.
o Deterministic X: Values of the independent variable(s) is/are known without any uncertainty.
o Independent errors: The magnitude of the error in one observation has no influence on the magnitude of the error in other observations.

The third assumption is implicit in the model $Y = A + BX + \varepsilon$: Mean of $Y|X = \mu_{Y|X} = A + BX$, since the mean of $\varepsilon$ is zero. (There is a very large number of other factors affecting $Y$, some positive, some negative; thus their combined effect is zero.)

Utopia: In the example, if we knew (we probably never will) the true model $Y = 649.58 + 1.1422X$ and $\sigma^2 = 356$, we would be able to make somewhat accurate predictions of $Y$ for any given value of $X$. For example, if we wondered about sales volume when advertising is 500 we would calculate $Y = 649.58 + 1.1422(500) \approx 1220$. We would then say the mean (expected value) of sales when $X = 500$ is 1220, period. However, if we wanted to make a prediction of sales volume next month knowing that the advertising budget is set at 500, this is a more difficult question. We now have to deal with the effect of all other factors and can only say that we expect sales to be more or less 1220, dependent on how the other factors ($\varepsilon$) materialize. The magnitude of "more or less" obviously depends on the value of $\sigma^2$.

Reality: We do not know the true model ($A$, $B$, $\sigma^2$) but have a random sample of observations of $Y$ and $X$ values from which we can calculate estimates of $A$, $B$, $\sigma^2$ (don't worry for the time being how the sample data is used to get these estimates). Let us call these estimates $a$, $b$ and $s_e^2$ respectively. Now in predicting $Y$ values we have a more difficult task: in addition to the effect of other factors ($\varepsilon$), we have to consider that our estimates of $A$ (using statistic $a$), $B$ (using statistic $b$) and $\sigma^2$ (using statistic $s_e^2$) may have errors. After all, they came from a random sample.

Least-squares method for estimating the true model

[...] $Y$ values; these are plotted in the following graph. [Figure not recoverable.] Using this method we want to [...] sum of the squared vertical differences between $Y$ and the estimated $Y$, $\hat{Y} = a + bX$, is
minimized. This is illustrated at one of the sample points on the graph. The quantity to be minimized by the choice of $a$ and $b$ is $\sum (Y - \hat{Y})^2 = \sum (Y - a - bX)^2$. Let's call this the sum of squared errors, SSE. If we choose

$b = \dfrac{\sum XY - n \bar{X} \bar{Y}}{\sum X^2 - n \bar{X}^2}$, $a = \bar{Y} - b \bar{X}$, $s_e = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}}$

then SSE is minimized; the numerator under the radical is precisely SSE. These three formulas give then the least-squares estimate of the true model (the true relationship between $Y$ and $X$) as $\hat{Y} = a + bX$.

Sampling Distribution of a and b

Since these estimates (statistics) are obtained from a random sample, their values are not fixed like $A$ and $B$ but are variable, i.e. they are random variables. If we had taken many different random samples and calculated $a$ and $b$ from them as above, we wouldn't always find the same values. Thus $a$ and $b$, like any other statistic calculated from a random sample, can be associated with a probability distribution, which is called the sampling distribution. Also, the standard deviations of the possible $a$'s and $b$'s of the sampling distributions are referred to as the standard errors of these estimators. This discussion concludes our introduction and the estimation of the linear model from a sample; we will continue our discussion with 1) measuring the degree of fit between $X$ and $Y$ and later 2) using the estimated model to make predictions or inferences about the true relationship between $X$ and $Y$ and about $Y$ for any given fixed value of $X$.

MULTIPLE REGRESSION

In simple regression, the smaller the value of the standard error of estimate $s_e$, the better the predictive power of the model. All confidence intervals obtained from the estimated model will be narrower if $s_e$ is smaller. Remember that $s_e$ in fact is an estimate for the common standard deviation of all conditional distributions, namely $\sigma$. Remember also that $\sigma$ measures the scatter around the regression line, or the effect of the factors not considered in the model on the dependent variable. Thus one way to improve the model's predictive power is to reduce $\sigma$ by explicitly considering additional factors as independent variables. In the model $Y = A + BX + \varepsilon$
where the dependent variable $Y$ is the sales and the independent variable $X$ is the advertising expenditures, perhaps another determinant of the sales $Y$ might be the size of the sales force (or the number of salespeople used). If we plotted the estimated line and the actual observations and found that the points above the line are generally associated with high sales force levels and vice versa, this approach (using the number of salespeople as a second independent variable in addition to the advertising budget) becomes a viable way of improving the model fit, i.e. $Y = A + B_1 X_1 + B_2 X_2 + \varepsilon$, where now $X_1$ is the advertising budget and $X_2$ is the size of the sales force. Think of it this way: with only one independent variable $X_1$, the vertical distances of the points from the line are interpreted as unexplained variations or as errors; however, adding the size of the sales force $X_2$ as an additional independent variable will now attribute part of these unexplained differences to a known factor, leaving smaller unexplained differences (errors). And this is the main idea of multiple regression.

[Figure: Sample 1, scatter plot of Sales against Advertising with fitted line y = 1.3681x + 689.65]

In general the true (not observable) multiple regression model with $k$ independent (explanatory) variables has the form $Y = A + B_1 X_1 + B_2 X_2 + \dots + B_k X_k + \varepsilon$, which is estimated from a sample set of observations as $\hat{Y} = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$. Notice that geometrically this represents a hyperplane rather than a line, which can easily be drawn on a two-dimensional space such as your notebook. As before, we refer to the distribution of sales revenue for any fixed $X_1$ (advertising expenditure) and fixed $X_2$ (size of the sales force) as the conditional distribution. The assumptions that we made in the case of simple regression apply in multiple regression as well: 1) each conditional distribution is normal, e.g. the sales revenue in all cases when the advertising has been 500 and 3 salespeople have been used is normally distributed; 2) the variance of the dependent variable does not
depend on the values of the independent variables, e.g. the variance of the conditional distribution with, say, 500 advertising expenditure and 3 salespeople is the same as that of the conditional distribution of sales volume with advertising expenditures of 600 and 2 salespeople. These are the normality and the homoscedasticity assumptions.

The estimation of the regression coefficients $A, B_1, B_2, \dots, B_k$ by the least-squares method is based on the same principle of choosing the estimators $a, b_1, b_2, \dots, b_k$ in such a way that the sum of the squares of the vertical differences between the observed $Y$s in the sample and the estimated $Y$s, i.e. $\sum (Y - \hat{Y})^2$, will be minimum. We will rely on computer packages such as Excel, SAS, SPSS and MINITAB to do the estimation and emphasize the use of the output from such packages.

Example: Suppose it is believed that the 10-year Treasury bond rates can be predicted by the prevailing overnight federal funds rate and the 3-month Treasury bill rate. Thus we have $Y$ = 10-year Treasury bond rate (response variable) and two independent variables, $X_1$ = federal funds rate and $X_2$ = 3-month Treasury bill rate. The true model considered, $Y = A + B_1 X_1 + B_2 X_2 + \varepsilon$, is to be estimated on the basis of a sample of 16 observations obtained between 1980 and 1995 inclusive.

MULTIPLE REGRESSION EXAMPLE

Year   Y       X1      X2
1980   11.43   13.35   11.39
1981   13.92   16.39   14.04
1982   13.01   12.24   10.6
1983   11.1    9.09    8.62
1984   12.46   10.23   9.54
1985   10.62   8.1     7.47
1986   7.67    6.8     5.97
1987   8.39    6.66    5.78
1988   8.85    7.57    6.67
1989   8.49    9.21    8.11
1990   8.55    8.1     7.5
1991   7.86    5.69    5.38
1992   7.01    3.52    3.43
1993   5.87    3.02    3
1994   7.69    4.21    4.25
1995   6.57    5.83    5.49

Here we have $n = 16$ (sample size) and $k = 2$ (number of independent variables). Running a multiple regression using the Excel regression tool we obtain the following results:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.93918668
R Square            0.88207162
Adjusted R Square   0.8639288
Standard Error      0.8965961
Observations        16

ANOVA
             df   SS            MS           F            Significance F
Regression   2    78.16684427   39.0834221   48.6182013   9.23675E-07
Residual     13   10.45049948   0.80388458
Total        15   88.61734375

             Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept    2.89591344     0.818926109      3.53623289   0.00365145   1.12673115    4.66509573
X1           -1.3491821     0.775045719      -1.7407774   0.10532142   -3.02356656   0.32520239
X2           2.37600263     0.937405473      2.53465837   0.02490471   0.35086123    4.40114403

The estimated model is $\hat{Y} = 2.8959 - 1.3492 X_1 + 2.3760 X_2$. The estimated standard deviation of all the conditional distributions is .8966. A brief explanation of the regression output follows.

A. Regression Statistics

R Square ($r^2$) is, as before, the coefficient of determination and measures the proportion of variation in 10-year Treasury rates ($Y$) that is explained by the federal funds rate and the 3-month Treasury bill rate. In other words, it is $1 - SSE/SST = 1 - \sum (Y - \hat{Y})^2 / \sum (Y - \bar{Y})^2$. In this case the regression accounts for about 88% of the variation in the 10-year Treasury bond rate, and about 12% is unaccounted for, due possibly to other factors, perhaps the state of the economy, exchange rates, etc.

Multiple R, as before, is the positive square root of R Square and measures the strength of correlation between the 10-year Treasury bond rate and the combination of the federal funds rate and the 3-month Treasury bill rate.

Adjusted $R^2$ adjusts $R^2$ based on the number of independent variables. Since the more independent variables you have the higher the $R^2$ will be, an adjustment is made to the $R^2$ based on the number of explanatory variables used. The adjustment is given by $1 - \dfrac{n - 1}{n - k - 1}(1 - R^2)$. Notice that the bigger the $k$, the smaller the adjusted $R^2$.

Standard Error, $s_e = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - k - 1}}$, is the estimate of the common standard deviation $\sigma$.

B. ANOVA

o Regression
  o df (degrees of freedom) = $k$ = number of independent variables
  o SSR (sum of squares, regression) = $\sum (\hat{Y} - \bar{Y})^2$
  o MSR (mean squares, regression) = SS/df
o Residual
  o df (degrees of freedom) = $n - 1 - k$
  o SSE (sum of squares, error) = $\sum (Y - \hat{Y})^2$
  o MSE (mean squares, error) = SS/df
o Total
  o df (degrees of freedom) = $n - 1$
  o SST (sum of squares, total) = $\sum (Y - \bar{Y})^2$
  o MST (mean squares, total) = SS/df

Note that SST = SSR + SSE and df total = df regression + df error.

o F-statistic (as in the ANOVA, analysis of variance, portion of the output): the
ratio of MS regression to MS error.
o Significance F: the p-value of the F statistic (explained more fully later).

C. Estimated Model

The last section of the output includes the details of the estimated regression model. The first line is for the estimated intercept and is followed by one line per independent variable, so there will always be $k + 1$ lines in this part of the output. In each line the first figure is the estimated coefficient, in this case $a$, $b_1$ and $b_2$ respectively. The next column gives the standard errors of the estimated parameters, i.e. $s_a$, $s_{b_1}$ and $s_{b_2}$. As before, the t stat is the ratio of the estimated coefficient to its standard error, e.g. $b_1 / s_{b_1}$. The p-value is the probability of observing a value as low or high (extreme) as the one we have, e.g. $b_1 = -1.349$, if indeed $B_1 = 0$. Here that p-value is about .105, i.e. we cannot state at the .05 level of significance that $X_1$ (federal funds rate) is a significant factor in determining the 10-year Treasury bond rate. Finally, the last two columns give the lower and upper limits of the confidence interval for the estimated parameter, i.e. there is a 95% chance that the true $B_2$ is between a low of 0.3509 and a high of 4.4011.

Inferences with the Estimated Model

After a model is estimated from a sample it can be used to make certain general and specific inferences about the population from which the sample observations came. Obviously these inferences will carry a measure of risk (the statistician's curse) in the form of sampling errors.

A. Inference about the estimated model in full

The question here is simply whether the estimated regression equation has any predictive power, i.e. is there any underlying relationship between the dependent variable and one, some or all of the independent variables, allowing better predictions about the response variable with the knowledge of the independent variables? More formally, this question is answered by the following test of hypothesis:

$H_0: B_1 = B_2 = \dots = B_k = 0$
$H_1$: at least one $B_i \neq 0$

Notice that the null hypothesis says none of the independent variables has any bearing on the dependent variable
while the alternative asserts there is at least one that has some impact on the dependent variable. The appropriate test statistic is the F stat in the printout, with k and n − k − 1 degrees of freedom. In the example the computed F stat is 48.6182; the critical F value at the .05 significance level from the F table, with k = 2 and n − k − 1 = 13 degrees of freedom, is 3.81. Since the computed F statistic of 48.6182 far exceeds the critical F value, we reject the null hypothesis. The same conclusion can be reached by simply looking at the p-value of the F stat, which is virtually 0 (9.2367E-07), much smaller than the default level of significance of .05. This simply says that if the null hypothesis had been true (no underlying relationship), an F value as large as 48.6182 would be very unlikely (only a 9.2367E-07 probability), allowing us to conclude that either the Federal Funds Rate or the 3-month Treasury Bill rate, or both, have statistically significant impacts on the 10-year Treasury bond rate.

B. Inferences About the Individual Regression Parameters βi

Here we make inferences about the impact of each of the independent variables individually (captured by βi) on the dependent variable:

H0: βi = 0
H1: βi ≠ 0 (two-tail: we do not care if the relationship is direct or inverse)

The appropriate test statistic is t = (bi − βi)/s_bi with n − k − 1 degrees of freedom. Notice that since the hypothesized value is βi = 0, the observed t = bi/s_bi. If the computed t value is more extreme than the critical value, we reject the null and conclude that there is some real relationship between the i-th independent variable and the dependent variable, and that the nonzero value calculated for bi is not likely to be due to random chance.

In the example, if we wanted to judge whether the Federal Funds Rate was a reliable predictor of the 10-year Treasury Bond Rate one way or the other, i.e., β1 ≠ 0, we would test the hypothesis

H0: β1 = 0
H1: β1 ≠ 0 (two-tail: we want to detect any impact)

The t stat is given in the printout (second row, third value) as −1.7408; the critical t value from the t table
for .05 (.025 on either side for a two-tail test) with n − k − 1 = 13 degrees of freedom is 2.160. Since the computed t value is not extreme enough, we cannot reject the null hypothesis. Thus there is insufficient evidence in the sample to allow us to conclude a real impact of the Federal Funds rate on the 10-year Treasury Bond Rate. The same conclusion can be reached without looking at the critical t value, based on the p-value. In the printout the p-value is .1053, which does not allow us to reject the null at a level of significance of .05. If indeed there were no relationship between the 10-year rate and the Federal Funds rate, the chance of a t value as extreme as −1.7408 is more than 10%, which is not so improbable.

If we knew a priori that a high Federal Funds Rate cannot reasonably be a sign of a high 10-year Treasury Bond Rate (i.e., β1 > 0 is not reasonable), we would then do a one-tail test. The test now would be

H0: β1 = 0
H1: β1 < 0

Notice that the negative sign of the computed b1 does indicate an inverse relationship between the 10-year Treasury bond rate and the Federal Funds Rate, but we want to see if the evidence is strong enough to beat the standard of proof of the .05 level of significance. The critical t value with 13 degrees of freedom is now the one listed under .10 (that is, .05 in a single tail), which is 1.771, and the p-value is half of .1053, or about .053. This obviously is an easier standard of proof, due to the a priori assumption made about β1, yet the sample still does not have enough power (although it is close) to meet this standard either. The t value is still less extreme than the critical t (−1.74078 versus −1.771) and the p-value is still above .05. Based on either of these comparisons, we cannot reject the null hypothesis.

C. Prediction of a Specific Y given X1, X2, ..., Xk

Known values of the independent variables allow one to make a point estimate of the dependent variable by simply plugging the known values of the independent variables into the estimated equation to obtain Ŷ. However, since sampling errors in the estimation of the parameters are present, a more
useful estimate might be in the form of a confidence interval. An approximate confidence interval for Y for a specific instance (a year in this case) can be obtained as

Ŷ − t(α/2) s_e ≤ Y ≤ Ŷ + t(α/2) s_e

where α is 1 minus the confidence level, s_e is the standard error of the estimate, and t(α/2) is the corresponding t value with n − k − 1 degrees of freedom.

In the example, say one is trying to predict the 10-year Treasury Bond Rate with the knowledge that the Federal Funds Rate is 9 and the 3-month Treasury Bill rate is 6.67. What is the best estimate of the 10-year Treasury Bond Rate? The point estimate is

Ŷ = 2.8959 − 1.3492(9) + 2.376(6.67) = 6.6012

An approximate 95% confidence interval can be stated using the t value for α = .05 with 13 degrees of freedom, which is 2.160. Thus the confidence interval extends from a low of 6.6012 − 2.160(0.8965) = 4.6647 to a high of 6.6012 + 2.160(0.8965) = 8.5376. This is an approximate interval because it ignores the sampling errors in the estimation of A, β1 and β2 by a, b1 and b2, respectively. More advanced computer packages have the capability to calculate the exact confidence intervals for various combinations of independent variable levels.

D. The Problem of Multicollinearity

A problem that may afflict a multiple regression analysis is the so-called multicollinearity. This condition exists when some of the independent variables are highly correlated among themselves. In the example, if the Federal Funds rate were very highly correlated with the 3-month Treasury bill rate, it would not contribute any independent additional information to predict the 10-year Treasury bond rate; it would rather be duplicate and superficial information. Multicollinearity does not make the regression totally useless, but it makes the interpretation of the results less straightforward. When multicollinearity exists:

• The regression will still predict okay if the F stat in the ANOVA part is still significant.
• The t stats for the highly correlated variables may turn out to be insignificant even though the regression as a whole is significant.
• The estimated b coefficients may turn out to have
the wrong (unexpected) sign.
• The model can be improved by simply determining the highly correlated variables and dropping one from the regression.
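
The tests and the prediction interval worked through in sections A to C can be re-checked with a few lines of Python. What follows is only a rough sketch, not the output of any particular regression package: the function and variable names are made up for illustration, the critical t values (2.160 and 1.771) are copied from the t table as quoted in these notes, and the p-value formula is a closed form that holds only when the numerator degrees of freedom of the F statistic equal 2, as they do in this example (k = 2).

```python
# Figures are taken from the Treasury-rate example in these notes.
# All names below are illustrative assumptions, not part of any package.

DF_ERROR = 13  # n - k - 1, with k = 2 independent variables


def f_pvalue_two_numerator_df(f_stat, df_error):
    """Upper-tail p-value of an F statistic with (2, df_error) degrees of freedom.

    Closed form that is valid ONLY when the numerator df is exactly 2.
    """
    return (1.0 + 2.0 * f_stat / df_error) ** (-df_error / 2.0)


def reject_two_tail(t_stat, t_crit):
    """Two-tail test: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_crit


def reject_lower_tail(t_stat, t_crit):
    """One-tail test of H1: beta < 0; reject H0 when t is below -t_crit."""
    return t_stat < -t_crit


def predict(intercept, coefs, xs):
    """Point estimate: plug the known X values into the estimated equation."""
    return intercept + sum(b * x for b, x in zip(coefs, xs))


def approx_interval(y_hat, t_crit, s_e):
    """Approximate interval y_hat +/- t * s_e (ignores coefficient sampling error)."""
    half_width = t_crit * s_e
    return y_hat - half_width, y_hat + half_width


# A. Overall F test
p_value = f_pvalue_two_numerator_df(48.6182, DF_ERROR)

# B. Tests on the Federal Funds Rate coefficient (t stat from the printout)
t_b1 = -1.7408
two_tail_reject = reject_two_tail(t_b1, 2.160)      # critical t, .05 two-tail, 13 df
one_tail_reject = reject_lower_tail(t_b1, 1.771)    # critical t, .05 one-tail, 13 df

# C. Point estimate and approximate 95% interval
y_hat = predict(2.8959, [-1.3492, 2.376], [9, 6.67])
low, high = approx_interval(y_hat, 2.160, 0.8965)

print(f"F-test p-value: {p_value:.4e}")
print(f"Reject H0 (two-tail)? {two_tail_reject}")
print(f"Reject H0 (one-tail)? {one_tail_reject}")
print(f"Point estimate: {y_hat:.4f}, interval: {low:.4f} to {high:.4f}")
```

Because the coefficients are typed in with the rounding shown in the notes, the point estimate and interval limits may differ from the printed figures in the fourth decimal place.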