Econometric Theory & Methods
This 78-page set of class notes was uploaded by Sydni D'Amore on Wednesday, October 28, 2015. The notes are for ECN 215 (Economics) at Wake Forest University, taught by Allin Cottrell in the fall.
Regression Analysis: Basic Concepts
Allin Cottrell

The simple linear model

The simple linear model represents the dependent variable, $y_i$, as a linear function of one independent variable, $x_i$, subject to a random disturbance or error, $u_i$:

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

The error term $u_i$ is assumed to have a mean value of zero, a constant variance, and to be uncorrelated with its own past values (i.e., it is "white noise"). The task of estimation is to determine regression coefficients $\hat\beta_0$ and $\hat\beta_1$, estimates of the unknown parameters $\beta_0$ and $\beta_1$ respectively. The estimated equation has the form

$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$$

OLS

The basic technique for determining the coefficients $\hat\beta_0$ and $\hat\beta_1$ is Ordinary Least Squares (OLS): values for the coefficients are chosen to minimize the sum of the squared estimated errors, or residual sum of squares (SSR). The estimated error associated with each pair of data values $(x_i, y_i)$ is defined as

$$\hat u_i = y_i - \hat y_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$$

We use a different symbol for this estimated error, $\hat u_i$, as opposed to the true disturbance or error term, $u_i$. These two coincide only if $\hat\beta_0$ and $\hat\beta_1$ happen to be exact estimates of the regression parameters $\beta_0$ and $\beta_1$. The estimated errors are also known as residuals. The SSR may be written as

$$\mathrm{SSR} = \sum \hat u_i^2 = \sum (y_i - \hat y_i)^2 = \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$

Picturing the residuals

The residual $\hat u_i$ is the vertical distance between the actual value of the dependent variable, $y_i$, and the fitted value, $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$.

Normal equations

Minimization of SSR is a calculus exercise: find the partial derivatives of SSR with respect to both $\hat\beta_0$ and $\hat\beta_1$ and set them equal to zero. This generates two equations (the "normal equations" of least squares) in the two unknowns $\hat\beta_0$ and $\hat\beta_1$. These equations are solved jointly to yield the estimated coefficients.

$$\partial\,\mathrm{SSR}/\partial\hat\beta_0 = -2 \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \quad (1)$$

$$\partial\,\mathrm{SSR}/\partial\hat\beta_1 = -2 \sum x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \quad (2)$$

Equation (1) implies that

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x \quad (3)$$

Equation (2) implies that

$$\sum x_i y_i - \hat\beta_0 \sum x_i - \hat\beta_1 \sum x_i^2 = 0 \quad (4)$$

Now substitute for $\hat\beta_0$ in equation (4), using (3). This yields

$$\sum x_i y_i = (\bar y - \hat\beta_1 \bar x) \sum x_i + \hat\beta_1 \sum x_i^2$$

$$\sum x_i y_i - \bar y \sum x_i = \hat\beta_1 \Big( \sum x_i^2 - \bar x \sum x_i \Big)$$

$$\hat\beta_1 = \frac{\sum x_i y_i - \bar y \sum x_i}{\sum x_i^2 - \bar x \sum x_i} \quad (5)$$

Equations (3) and (5) can now be used to generate the regression coefficients: first use (5) to find $\hat\beta_1$, then use (3) to find $\hat\beta_0$.

Goodness of fit

The OLS technique ensures that we find the values of $\hat\beta_0$ and $\hat\beta_1$ which fit the sample data "best", in the sense of minimizing the sum of squared residuals. There is no guarantee that $\hat\beta_0$ and $\hat\beta_1$ correspond exactly with the unknown parameters $\beta_0$ and $\beta_1$, and no guarantee that the best-fitting line fits the data well at all: maybe the data do not even approximately lie along a straight-line relationship. How do we assess the adequacy of the fitted equation?

First step: find the residuals. For each $x$-value in the sample, compute the fitted or predicted value of $y$, using $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$. Then subtract each fitted value from the corresponding actual, observed value of $y_i$. Squaring and summing these differences gives the SSR.

Example of finding residuals, with $\hat\beta_0 = 52.3509$ and $\hat\beta_1 = 0.1388$:

[table of sample data $x_i$, $y_i$ and fitted values $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$]

Standard error

The magnitude of the SSR depends in part on the number of data points. To allow for this, we can divide through by the degrees of freedom, which is the number of data points minus the number of parameters to be estimated (2, in the case of a simple regression with intercept). Let $n$ denote the number of data points (the sample size); then the degrees of freedom are $\mathrm{df} = n - 2$. The square root of SSR/df is the standard error of the regression, $\hat\sigma$:

$$\hat\sigma = \sqrt{\frac{\mathrm{SSR}}{n - 2}}$$

The standard error gives a first handle on how well the fitted equation fits the sample data. But what is a "big" $\hat\sigma$ and what is a "small" one depends on the context: the regression standard error is sensitive to the units of measurement of the dependent variable.

R-squared

A more standardized statistic, which also gives a measure of the goodness of fit of the estimated equation, is

$$R^2 = 1 - \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\sum (y_i - \hat y_i)^2}{\sum (y_i - \bar y)^2}$$

The SSR can be thought of as the "unexplained" variation in the dependent variable: the variation left over once the predictions of the regression equation are taken into account.
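The whole computation so far, coefficients via equations (3) and (5), then residuals, SSR and the regression standard error, can be sketched in a few lines of Python. The data here are made up purely for illustration.

```python
import math

# OLS "by hand" via the normal-equation solutions (3) and (5), then the
# residuals, SSR and regression standard error. The data are made up.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Equation (5): the slope estimate
b1 = (sum(x * y for x, y in zip(xs, ys)) - ybar * sum(xs)) / (
    sum(x * x for x in xs) - xbar * sum(xs))
# Equation (3): the intercept estimate
b0 = ybar - b1 * xbar

# Residuals, SSR and the standard error of the regression
fitted = [b0 + b1 * x for x in xs]
resid = [y - yhat for y, yhat in zip(ys, fitted)]
ssr = sum(u * u for u in resid)
sigma_hat = math.sqrt(ssr / (n - 2))   # df = n - 2
```

For these five points the slope works out to 1.96 and the intercept to 0.14; any statistics package would report the same numbers, since (3) and (5) are exactly what it solves internally.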
The denominator, $\sum (y_i - \bar y)^2$, is the total sum of squares, or SST; it represents the total variation of the dependent variable around its mean value. $R^2 = 1 - \mathrm{SSR}/\mathrm{SST}$ is therefore 1 minus the proportion of the variation in $y_i$ that is unexplained; that is, it shows the proportion of the variation in $y_i$ that is accounted for by the estimated equation. As such, it must be bounded by 0 and 1:

$$0 \le R^2 \le 1$$

$R^2 = 1$ is a perfect score, obtained only if the data points happen to lie exactly along a straight line. $R^2 = 0$ is a perfectly lousy score, indicating that $x_i$ is absolutely useless as a predictor for $y_i$.

Adjusted R-squared

Adding a variable to a regression equation cannot raise the SSR; in fact it is likely to lower the SSR somewhat, even if the new variable is not very relevant. The adjusted R-squared, $\bar R^2$, attaches a small penalty to adding more variables. If adding a variable raises $\bar R^2$ for a regression, that is a better indication that it has improved the model than if it merely raises the unadjusted $R^2$.

$$\bar R^2 = 1 - \frac{\mathrm{SSR}/(n - k - 1)}{\mathrm{SST}/(n - 1)} = 1 - \frac{n - 1}{n - k - 1}\,(1 - R^2)$$

where $k + 1$ represents the number of parameters being estimated (2 in a simple regression).

To summarize so far: alongside the estimated regression coefficients $\hat\beta_0$ and $\hat\beta_1$, we might also examine
(a) the sum of squared residuals, SSR,
(b) the regression standard error, $\hat\sigma$, and
(c) the $R^2$ value (adjusted or unadjusted),
to judge whether the best-fitting line does in fact fit the data to an adequate degree.

Confidence intervals for coefficients

As we saw, a confidence interval provides a means of quantifying the uncertainty produced by sampling error. Instead of simply stating "I found a sample mean income of $39,000 and that is my best guess at the population mean, although I know it is probably wrong," we can make a statement like "I found a sample mean of $39,000, and there is a 95 percent probability that my estimate is off the true parameter value by no more than $1,200." Confidence intervals for regression coefficients are constructed in a similar manner. Suppose we are interested in the slope coefficient, $\beta_1$, of an estimated equation.
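The goodness-of-fit measures above can be illustrated with a small made-up example (five observations, with the SSR taken from an OLS fit of those data):

```python
# R-squared and adjusted R-squared for a small made-up example:
# five observations, with the SSR taken from an OLS fit of these data.
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
ssr = 0.092                     # sum of squared residuals from the fit
n, k = len(ys), 1               # k = number of slope coefficients

ybar = sum(ys) / n
sst = sum((y - ybar) ** 2 for y in ys)          # total sum of squares
r2 = 1 - ssr / sst
r2_adj = 1 - (n - 1) / (n - k - 1) * (1 - r2)   # penalized version
```

Note that $\bar R^2$ is always below $R^2$ whenever $k \ge 1$, and the gap widens as more regressors are added relative to the sample size.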
Say we came up with $\hat\beta_1 = 90$ using the OLS technique, and we want to quantify our uncertainty over the true slope parameter $\beta_1$ by drawing up a 95 percent confidence interval for it. Provided our sample size is reasonably large, the rule of thumb is the same as before: the 95 percent confidence interval for $\beta_1$ is given by $\hat\beta_1 \pm 2$ standard errors.

Our single best guess at $\beta_1$ (the point estimate) is simply $\hat\beta_1$, since the OLS technique yields unbiased estimates of the parameters. (Actually this is not always true, but we'll postpone consideration of the tricky cases where OLS estimates are biased.) On the same grounds as before, there is a 95 percent chance that our estimate $\hat\beta_1$ will lie within 2 standard errors of its mean value, $\beta_1$. The standard error of $\hat\beta_1$, written $\mathrm{se}(\hat\beta_1)$ and not to be confused with the standard error of the regression, $\hat\sigma$, is given by the formula

$$\mathrm{se}(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{\sum (x_i - \bar x)^2}}$$

- The larger is $\mathrm{se}(\hat\beta_1)$, the wider will be our confidence interval.
- The larger is $\hat\sigma$, the larger will be $\mathrm{se}(\hat\beta_1)$, and so the wider the confidence interval for the true slope. This makes sense: in the case of a poor fit, we have high uncertainty over the true slope parameter.
- A high degree of variation in $x_i$ makes for a smaller $\mathrm{se}(\hat\beta_1)$ and a tighter confidence interval. The more $x_i$ has varied in our sample, the better the chance we have of accurately picking up any relationship that exists between $x$ and $y$.

Confidence interval example

Is there really a positive linear relationship between $x_i$ and $y_i$? Suppose we have obtained $\hat\beta_1 = 90$ and $\mathrm{se}(\hat\beta_1) = 12$. The approximate 95 percent confidence interval for $\beta_1$ is then $90 \pm 2(12) = 90 \pm 24$, or 66 to 114. Thus we can state, with at least 95 percent confidence, that $\beta_1 > 0$ and there is a positive relationship. If instead we had obtained $\mathrm{se}(\hat\beta_1) = 61$, our interval would have been $90 \pm 2(61) = 90 \pm 122$, or $-32$ to 212. In this case the interval straddles zero, and we cannot be confident at the 95 percent level that there exists a positive relationship.
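The rule-of-thumb interval from this example is trivial to compute:

```python
# The rule-of-thumb 95 percent interval, estimate plus or minus two
# standard errors, applied to the numbers from the example.
def ci95(bhat, se):
    return bhat - 2 * se, bhat + 2 * se

lo, hi = ci95(90, 12)     # interval excludes zero
lo2, hi2 = ci95(90, 61)   # interval straddles zero
```

The first call gives (66, 114) and the second (-32, 212), matching the two cases discussed above.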
Notes on Probability
Allin Cottrell
(Last revised: January 2002)

1 Probability: the classical approach

The classical approach to probability, which has roots in the study of games of chance, is based on the assumption that we can identify a class of equiprobable individual outcomes of a random experiment. For example, the outcomes heads and tails are equiprobable when a fair coin is tossed; the outcomes 1, 2, 3, 4, 5 and 6 are all equiprobable when a fair die is rolled. Under these conditions, the probability of any event $A$ is the number of outcomes that correspond to $A$, which we'll write as $n(A)$, divided by the total number of possible outcomes, $n$. In other words, it is the proportion of the total outcomes for which $A$ occurs:

$$0 \le P(A) = \frac{n(A)}{n} \le 1 \quad (1)$$

For example, let $A$ be the event of getting an even number when rolling a fair die. There are three outcomes corresponding to this event, namely 2, 4 and 6, out of a total of six possible outcomes, so the probability is $P(A) = 3/6 = 1/2$.

2 Complementary probabilities

This is simple but important. If the probability of some event $A$ is $P(A)$, then the probability that event $A$ does not occur, written $P(\bar A)$, must be $P(\bar A) = 1 - P(A)$. For example, if the chance of rain for tomorrow is 80 percent, the chance that it doesn't rain tomorrow must be 20 percent. When trying to compute a given probability, it is sometimes much easier to compute the complementary probability first, then subtract from 1 to get the desired answer.

This principle can be justified on the classical approach as follows. Let $n(\bar A)$ denote the number of outcomes that do not correspond to event $A$. Since every outcome either corresponds to event $A$ or does not, we have $n = n(A) + n(\bar A)$, or $n(\bar A) = n - n(A)$. But then, from first principles,

$$P(\bar A) = \frac{n(\bar A)}{n} = \frac{n - n(A)}{n} = 1 - \frac{n(A)}{n} = 1 - P(A)$$

3 Addition rule

The addition rule provides a means of calculating the probability of $A \cup B$ (read "$A$ or $B$"), that is, the probability that either of two events occurs. With equiprobable individual outcomes we have $P(A) = n(A)/n$ and $P(B) = n(B)/n$. As a first approximation, to find $P(A \cup B)$ we need to add together the number of outcomes corresponding to event $A$ and the number corresponding to $B$, then divide by $n$. But if the two events intersect, $n(A) + n(B)$ will overstate the number of outcomes corresponding to $A \cup B$: specifically, the outcomes contained in the intersection of the events will be double-counted (see Figure 1). Therefore we must subtract, once, the number of outcomes contained in the intersection, which we'll write as $n(A \cap B)$. Thus

$$P(A \cup B) = \frac{n(A) + n(B) - n(A \cap B)}{n} = P(A) + P(B) - \frac{n(A \cap B)}{n} \quad (2)$$

[Figure 1: illustrating the addition rule]

Consider the last term on the right in equation (2) above, $n(A \cap B)/n$. This represents the fraction of the total outcomes that correspond to the intersection of $A$ and $B$; in other words, it is the probability that $A$ and $B$ both occur, $P(A \cap B)$. Thus the equation above can be put into its final form:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \quad (3)$$

Example: find the probability of drawing a spade ($A$) or a king ($B$) from a deck of cards. There are 52 cards, 13 spades, 4 kings, and one king of spades ($A \cap B$), so

$$P(A \cup B) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52}$$

4 Multiplication rule

It is obvious that

$$\frac{n(A \cap B)}{n} = \frac{n(A)}{n} \times \frac{n(A \cap B)}{n(A)} \quad (4)$$

(the right-hand side is obtained by multiplying the left-hand side by $n(A)/n(A) = 1$). Let's see what identity (4) is saying. On the left-hand side we have $n(A \cap B)/n = P(A \cap B)$, the probability that $A$ and $B$ both occur. On the right-hand side we have, first, $n(A)/n = P(A)$, which represents the marginal or unconditional probability of $A$. The second term is $n(A \cap B)/n(A)$: this represents the number of outcomes corresponding to $A \cap B$ over the number of outcomes corresponding to $A$. Think about this expression. It is nothing other than the conditional probability $P(B|A)$, which can be read as "the probability of $B$, given $A$". We are taking $A$ as given by putting only the outcomes in $A$ into the denominator; in the numerator are the outcomes in $A$ that are also in $B$. The ratio is then the proportion of the outcomes in $A$ that are also in $B$, or (assuming equiprobable individual outcomes) $P(B|A)$. Thus equation (4) can be rewritten as

$$P(A \cap B) = P(A) \times P(B|A) \quad (5)$$

which is the general form of the multiplication rule for joint probabilities. In the special case where the events $A$ and $B$ are independent, the conditional probability $P(B|A)$ equals the marginal probability $P(B)$ and the rule simplifies to $P(A \cap B) = P(A) \times P(B)$.
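The counting rules above can be checked numerically: the classical probability of an even die roll and its complement, and the addition rule applied to drawing a spade or a king from an enumerated 52-card deck.

```python
# Classical probability by counting equiprobable outcomes.
die = [1, 2, 3, 4, 5, 6]
p_even = len([o for o in die if o % 2 == 0]) / len(die)   # n(A)/n = 3/6
p_odd = 1 - p_even                                        # complement rule

# Addition rule on a 52-card deck: A = spade, B = king.
suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for suit in suits for rank in ranks]

n_spade = sum(1 for r, s in deck if s == "spades")              # n(A) = 13
n_king = sum(1 for r, s in deck if r == "K")                    # n(B) = 4
n_both = sum(1 for r, s in deck if s == "spades" and r == "K")  # n(A and B) = 1
p_union = (n_spade + n_king - n_both) / len(deck)               # 16/52
```

Enumerating the whole sample space like this is exactly the "counting of outcomes" logic of the classical approach, just delegated to the computer.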
5 Conditional and marginal probabilities: a further note

Consider the following table of probabilities relating to drawing a single card from a regular deck (these notes count the ace among the "face cards", giving 16 of them):

               P(event)    P(king | event)
face card       16/52       4/16 = 1/4
non-face card   36/52       0

The probability of drawing a face card (jack, queen, king or ace) is 16/52, and given that a face card has been drawn, the probability of the card being a king is 1/4. The probability of drawing a non-face card is 36/52, and in that case the probability of the card being a king is zero. This is all from classical first principles, given equiprobable individual outcomes.

We can easily see that drawing a king and drawing a face card are not independent events. We can also easily see, in this little example, that the unconditional probability of drawing a king is 4/52, from first principles, with no calculation required. But let's try the exercise of computing the unconditional probability of drawing a king from the given table. I'll use $K$ to denote drawing a king and $F$ to denote drawing a face card. The event of drawing a king can be decomposed thus:

$$K = (F \cap K) \cup (\bar F \cap K)$$

That is, drawing a king can occur, in principle, in either of two ways: in conjunction with drawing a face card, or in conjunction with drawing a non-face card (except that the latter conjunction has probability zero). So let's apply our rules appropriately.

- Via the multiplication rule (5): $P(F \cap K) = P(F) \times P(K|F) = \frac{16}{52} \times \frac{4}{16} = \frac{4}{52}$
- Using the multiplication rule again: $P(\bar F \cap K) = P(\bar F) \times P(K|\bar F) = \frac{36}{52} \times 0 = 0$
- Now we can find $P(K)$ using the addition rule: $P(K) = P\big((F \cap K) \cup (\bar F \cap K)\big) = \frac{4}{52} + 0 = \frac{4}{52}$

This admittedly somewhat artificial exercise sheds some light on why the unconditional probability $P(K)$ is called the marginal probability: it's the number you get on the edge or margin of the sort of table shown above, if you multiply out the conditional probabilities times the probabilities of the events on which we're conditioning, then add up across the events on which we're conditioning. That is,

$$P(A) = \sum_{i=1}^{N} P(A|E_i) \times P(E_i)$$

where $E_1, \ldots, E_N$ represent $N$ mutually exclusive and jointly exhaustive events: one, and only one, of them will occur.
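The marginal-probability calculation for the king, using the notes' convention that the 16 "face cards" include the ace, can be sketched as:

```python
# Marginal probability of a king via the law of total probability,
# with "face card" defined as J, Q, K or A (16 cards), as in the notes.
p_face = 16 / 52
p_king_given_face = 4 / 16
p_nonface = 36 / 52
p_king_given_nonface = 0

# Sum of P(K | E_i) * P(E_i) over the exhaustive events E_i
p_king = p_face * p_king_given_face + p_nonface * p_king_given_nonface
```

The result is 4/52 = 1/13, exactly the number we could have read off the deck directly.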
Here's a slightly more real-world example. Suppose that whether or not it snows makes a difference to class attendance, and suppose we want the marginal probability that all members of the class will be present tomorrow. We could find this by (a) multiplying the probability of snow tomorrow times the conditional probability that everyone is present in the case of snow, and (b) multiplying the probability of no snow tomorrow times the conditional probability that everyone is present in the absence of snow, then (c) adding up across the cases of snow and no snow.

The following point concerning conditional probabilities is particularly important: in general,

$$P(A|B) \ne P(B|A)$$

That is, the probability of $A$ given $B$ is generally not the same as the probability of $B$ given $A$. In the example above, note that the probability of drawing a king, given that a face card is drawn, is $\frac{1}{4}$; the probability of drawing a face card, given that a king is drawn, on the other hand, is 1.

Here's a further example. The police department of a city studies the safety of cyclists at night. They find that 60 percent of cyclists involved in accidents at night are wearing light-colored clothing. Should we conclude that wearing light-colored clothing is dangerous? Why not? Express, in terms of conditional probabilities, the information you would need in order to judge whether or not light-colored clothing is helpful in avoiding accidents.

6 Generalizing from the classical model

We have introduced probability in terms of the classical approach, which involves counting and manipulating the number of equiprobable outcomes corresponding to events of interest. Although it is easiest to justify the addition and multiplication rules in classical terms, these rules generalize: they apply to any probabilities, however derived (e.g. by observation of relative frequencies over time, or by expert judgment, rather than by counting outcomes). They are in the nature of consistency conditions. Thus if I believe, on whatever grounds,
that $P(A) = .60$ and $P(B) = .30$, and that events $A$ and $B$ are independent, then in consistency I am bound to believe that $P(A \cap B) = .60 \times .30 = .18$. If I am inconsistent (do not follow the above rules), there is a particular economic symptom: provided I'm willing to make bets based on my probability judgments, it will be possible to set up a series of bets that I'm sure to lose, on average, over time. This is known as the "Dutch book" argument; it was developed by Frank Ramsey.

7 Introducing probability distributions: discrete random variables

Up till now we've spoken only of the probability of events of one kind or another. In econometrics we're generally more concerned with the probability distributions of variables of interest. To approach this topic we'll first make a distinction between discrete and continuous random variables.

A discrete random variable is one that can take on a finite set of distinct values, depending on the outcome of some random experiment. A simple example would be the number that appears uppermost when a die is rolled. Such a variable is generally the outcome of a counting operation. A continuous random variable can take on any real value within a specified range: perhaps, for example, the weight of a randomly selected individual. Here the variable is generally the outcome of some sort of measurement, rather than counting.

We start with the simpler case of discrete random variables. Here the probability distribution for some random variable $X$ is a mapping from the possible values of $X$ to the probability that $X$ takes on each of those values. The mapping may be represented by a mathematical function or a table. Consider the example shown in Table 1. The first column shows the possible values of $X$, the number appearing uppermost when a die is rolled, and the second shows the probability for each value. If the die is fair, the probability is the same for all $x_i$: this is known as the uniform or rectangular distribution. It is graphed in Figure 2. One key feature of the table is that the sum of the entries in
the probabilities column equals 1.0; this is necessarily true of any discrete probability distribution:

$$\sum_{i=1}^{N} P(X = x_i) = 1$$

The third column in Table 1 shows the product: the value of the variable times the probability that the value occurs. The summation of this column gives the expected value, or mean, of the random variable. You should be familiar, I hope, with three measures of central tendency: mean, median and mode. The median is the middle value when the data are sorted by size; the mode is the value that occurs with greatest frequency (or probability); and the mean is as just described:

$$E(X) \equiv \mu_X = \sum_{i=1}^{N} x_i\, P(X = x_i) \quad (6)$$

Table 1: Probability distribution for rolling a fair die

x_i    P(X = x_i)    x_i P(X = x_i)
1      1/6           1/6
2      1/6           2/6
3      1/6           3/6
4      1/6           4/6
5      1/6           5/6
6      1/6           6/6
       1             21/6 = 3.5 = E(X)

[Figure 2: graph of the probability distribution for one die]

Note that in this example the expected value is not the value that occurs with greatest probability: the probability of getting 3.5 on any single die-roll is zero. Rather, the expected value is the expected average if the random experiment is repeated many times.

Besides the expected value, another key feature of any probability distribution is the degree of dispersion of the distribution around its mean. This is usually measured by the variance of the distribution, or by the square root of the variance, which is called the standard deviation. The mean can be described as the probability-weighted sum of the possible values of the random variable; in the same terms, the variance is the probability-weighted sum of the squared deviations of the possible values of the random variable from its mean, or

$$\mathrm{Var}(X) \equiv \sigma^2 = \sum_{i=1}^{N} \big(x_i - E(X)\big)^2 P(X = x_i) \quad (7)$$

Table 2 shows the calculation of the variance for the die-rolling example: $\mathrm{Var}(X) = 2.917$, so the standard deviation is $\sqrt{2.917} = 1.708$.

Table 2: Variance calculation, rolling a fair die

x_i    P(X = x_i)    x_i - E(X)    (x_i - E(X))^2    (x_i - E(X))^2 P(X = x_i)
1      1/6           -2.5           6.25              1.0417
2      1/6           -1.5           2.25              0.3750
3      1/6           -0.5           0.25              0.0833
4      1/6            0.5           0.25              0.0833
5      1/6            1.5           2.25              0.3750
6      1/6            2.5           6.25              1.0417
       1              0                               2.917 = Var(X)
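The calculations in Tables 1 and 2 can be reproduced exactly with rational arithmetic:

```python
from fractions import Fraction

# Exact reproduction of Tables 1 and 2: the mean and variance of a
# fair die roll, using rational arithmetic to avoid rounding.
values = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in values)               # 21/6 = 3.5
var = sum((x - mean) ** 2 * p for x in values)  # 35/12, about 2.917
sd = float(var) ** 0.5                          # about 1.708
```

The variance comes out as the exact fraction 35/12, which rounds to the 2.917 shown in Table 2.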
Another way of expressing the variance is to say that it is the expected value of the squared deviation from the mean; since "expected value" just means probability-weighted sum (for a discrete random variable), this is the same thing as (7):

$$\mathrm{Var}(X) = E\big[X - E(X)\big]^2 \quad (8)$$

Equation (8) can be manipulated as follows:

$$\mathrm{Var}(X) = E\big[X^2 - 2X\,E(X) + (E(X))^2\big] = E(X^2) - 2E(X)\,E(X) + (E(X))^2$$

so that, finally,

$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 \quad (9)$$

Or in words again: the variance of $X$ is the expectation of the square of $X$, minus the square of the expectation of $X$. This underlines the fact that in general $E(X^2) \ne (E(X))^2$; these two terms are equal if and only if the distribution has a variance of zero, or in other words is degenerate.

In the die-rolling example we know that $(E(X))^2 = 3.5^2 = 12.25$. $E(X^2)$, on the other hand, is $\frac{1}{6} \times 1^2 + \frac{1}{6} \times 2^2 + \cdots + \frac{1}{6} \times 6^2 = 15.167$. The variance then equals $15.167 - 12.25 = 2.917$, as calculated in Table 2.

Two dice

Let's try a slightly more interesting example than the single die roll. Let $X$ represent the average of the two numbers appearing uppermost when two fair dice are rolled. The exercise is to determine the probability distribution of $X$, and to use this to find the mean and variance of $X$. To start, we can set out the sample space, in other words the full set of possible outcomes for the pair of dice:

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

We can then set out the averages ($X$ values) corresponding to these outcomes:

1.0 1.5 2.0 2.5 3.0 3.5
1.5 2.0 2.5 3.0 3.5 4.0
2.0 2.5 3.0 3.5 4.0 4.5
2.5 3.0 3.5 4.0 4.5 5.0
3.0 3.5 4.0 4.5 5.0 5.5
3.5 4.0 4.5 5.0 5.5 6.0

Now over to you: construct a table, similar to the combination of Tables 1 and 2 above, which enables you to show the distribution and calculate the expected value and variance. Using a spreadsheet program is probably a good idea. Note that the distribution in this case is not uniform: for instance, $X = 1.0$ occurs with probability $\frac{1}{36}$, while $X = 1.5$ occurs with probability $\frac{2}{36}$.
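One way to check your spreadsheet for the exercise above is to tabulate the distribution by brute force over the 36 outcomes (exact fractions again, to keep everything rational):

```python
from collections import Counter
from fractions import Fraction

# Distribution of the average of two fair dice: tabulate P(X = x)
# over the 36 equiprobable outcomes, then compute the mean and variance.
counts = Counter(Fraction(d1 + d2, 2)
                 for d1 in range(1, 7) for d2 in range(1, 7))
dist = {x: Fraction(c, 36) for x, c in counts.items()}

mean = sum(x * p for x, p in dist.items())               # 7/2
var = sum((x - mean) ** 2 * p for x, p in dist.items())  # 35/24
```

The mean is again 3.5, while the variance, 35/24, is half the single-die variance of 35/12: averaging two independent rolls cuts the dispersion.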
8 Measures of association: covariance and correlation

The concept of variance for a single variable generalizes to provide the concept of covariance for two variables. The covariance of $X$ and $Y$ is the expected value of the cross-product (the deviation of $X$ from its mean, times the deviation of $Y$ from its mean):

$$\mathrm{Cov}(X, Y) \equiv \sigma_{XY} = E\big[(X - E(X))(Y - E(Y))\big] \quad (10)$$

Given $N$ observations on $X$ and $Y$, the covariance may be calculated as

$$\mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \big(x_i - E(X)\big)\big(y_i - E(Y)\big)$$

Covariance can take on any value: positive, negative or zero. It provides a measure of the linear association between $X$ and $Y$. The logic can be seen if $Y$ is graphed against $X$, with the axes set to intersect at the point $(E(X), E(Y))$, as in Figure 3. The points display an upward-sloping linear association, and this will be expressed in a positive covariance: most of the points lie in quadrants I and III, where the deviations of $X$ and $Y$ from their respective means are of the same sign, and hence the cross-products are positive. If the points were scattered evenly over all four quadrants, then positive and negative cross-products would tend to cancel, and the covariance would be close to zero.

[Figure 3: positive covariance of X and Y]

The correlation coefficient for two variables $X$ and $Y$, written $\rho_{XY}$, is a scaled version of covariance: we divide through by the product of the standard deviations of the two variables,

$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} \quad (11)$$

The resulting measure lies between $-1$ (indicating a perfect downward-sloping linear association) and $1$ (a perfect upward-sloping linear relationship).
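Equations (10) and (11) are easy to apply directly; here is a sketch on two short made-up series:

```python
# Sample covariance and correlation for two short made-up series,
# following equations (10) and (11).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 6.0]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
var_y = sum((y - my) ** 2 for y in ys) / n
corr = cov / (var_x * var_y) ** 0.5   # lies between -1 and 1
```

For these numbers the covariance is positive (1.6) and the correlation is about 0.85, reflecting a fairly strong upward-sloping association.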
9 Distribution concepts for continuous random variables

For a simple example of a continuous random variable, consider a clock face plus spinner. Let the random variable $X$ be the number towards which the spinner points when it comes to rest. What is the probability that $X = 3.000\ldots$? Applying the classical rule, this is one out of an infinite number of equiprobable outcomes, so the probability is zero. And this goes for any outcome that is specified in full precision. Unlike the cases examined earlier, we can't draw up a useful mapping from specific values of the random variable to the probability that those values occur: the "counting of outcomes" approach is not going to work.

Let's try another angle. The spinner must end up pointing somewhere in the range 0 to 12, so we can map from the full circumference of the clock face to a probability measure of 1.0. While we can't usefully count individual outcomes, we can think in terms of fractions of that total measure. For instance, if the spinner is fair there ought to be a probability of $\frac{1}{4}$ that it ends up pointing into the range 0 to 3, a probability of $\frac{1}{6}$ that it points into the range 7 to 9, and so on.

We can work this up into the idea of a cumulative distribution function, or cdf, which can be written as

$$F(x) = P(X \le x) \quad (12)$$

For the spinner example, $F(x) = 0$ for all $x < 0$; it equals 0.25 at $x = 3$, 0.50 at $x = 6$, and so on, yielding the graph shown in Figure 4. A cumulative probability of 1 is reached at $x = 12$.

[Figure 4: cdf for the spinner]

Another essential concept when dealing with continuous random variables is the probability density function, or pdf. This is defined as the derivative of the cdf with respect to $x$, and is usually written as $f(x)$:

$$f(x) = \frac{d}{dx} F(x) \quad (13)$$

For the spinner example the cdf is a straight line, as we saw above, so its derivative (slope) is a constant. By inspection of Figure 4 we can see the constant value is $\frac{1}{12}$; thus the pdf is as shown in Figure 5.

[Figure 5: pdf for the spinner]

The pdf has a height of $\frac{1}{12}$ at $x = 6$. This does not mean that the probability of $X = 6$ is $\frac{1}{12}$; as we know, it's zero. Rather, we use the pdf in this way: we can determine the probability of $X$ falling into any given range by taking the integral of the pdf over that interval, i.e. finding the area under the curve between the specified values:

$$P(x_1 < X < x_2) = \int_{x_1}^{x_2} f(x)\, dx \quad (14)$$

This is illustrated in Figure 5: the probability that $X$ falls into the range 6 to 9 is the area under the pdf between 6 and 9, namely $\frac{1}{4}$.
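The spinner's cdf and pdf are simple enough to write down directly; the "area under the pdf" in equation (14) is then just a difference of cdf values:

```python
# The spinner's cdf and pdf: X is uniform on [0, 12], so the cdf rises
# linearly from 0 to 1 and the pdf is the constant 1/12.
def F(x):
    """Cumulative distribution function, equation (12)."""
    return min(max(x / 12, 0.0), 1.0)

pdf_height = 1 / 12          # slope of the cdf on (0, 12)
p_6_to_9 = F(9) - F(6)       # area under the pdf from 6 to 9
```

The probability of landing between 6 and 9 comes out as 0.25, matching the area calculation in the text.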
10 Central limit theorem and Gaussian distribution

The graph of the spinner's pdf shows that the probability distribution is uniform or rectangular, a continuous counterpart to the single die-roll example discussed above. Such distributions are rarely if ever found in nature. The die and spinner are both examples of human contrivances: devices which constrain random processes to operate in a particularly simple and orderly manner. Two aspects of this are noteworthy. First, each device allows only a single random process to influence the outcome: the toppling of a near-perfect cube, or the rotary motion of the spinner on its bearing. Second, the outcomes are constrained to a specific range, via the integer dot patterns on the die, or the arbitrary 0-to-12 range of the clock face.

Consider, by contrast, the heights or weights of a population of animals or people. There will be numerous random influences on the height of any individual, stemming from both genetic endowment and environment, and there is no fixed range. We can be pretty sure we'll never see an adult human less than 1 foot tall or greater than 12 feet tall, but there's no fixed limit, no guarantee that somebody won't come along tomorrow and beat the current "tallest person" entry in the Guinness Book of World Records.

A proof, or even a precise mathematical statement, of the Central Limit Theorem is beyond the scope of this class, but the general idea is roughly as follows: if a random variable $X$ represents the summation of numerous independent random factors, then regardless of the specific distribution of the individual factors, $X$ will tend to follow the normal or Gaussian distribution, the familiar symmetrical "bell curve" of statistics (Figure 6). The general formula for the normal pdf is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty \quad (15)$$

where $\mu$ denotes the mean of the distribution and $\sigma$ its standard deviation. The standard normal distribution is obtained by setting $\mu = 0$ and $\sigma = 1$; its pdf is then

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty \quad (16)$$

[Figure 6: normal or Gaussian pdf]

The probability of $x$ falling into any given range can be found by integrating the above pdf from the lower to the upper limit of the range. A couple of results to commit to memory are

$$P(\mu - 2\sigma < x < \mu + 2\sigma) \approx 0.95 \quad \text{and} \quad P(\mu - 3\sigma < x < \mu + 3\sigma) \approx 0.997$$

Other values can be looked up in a normal distribution table if they are needed.
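A rough simulation conveys both ideas at once: each draw below is a sum of 12 independent uniforms (so, by the CLT, approximately normal), and the share of draws within two standard deviations of the mean should come out near 0.95. The draw and seed choices here are arbitrary.

```python
import random

# Simulation check of the "95 percent within two sigma" rule: each draw
# is a sum of 12 independent uniforms, hence approximately normal with
# mean 6 and standard deviation 1 by the central limit theorem.
random.seed(1)                     # fixed seed for reproducibility
n_draws, n_terms = 20000, 12
draws = [sum(random.random() for _ in range(n_terms))
         for _ in range(n_draws)]

mu = n_terms * 0.5                 # mean of the sum
sigma = (n_terms / 12) ** 0.5      # sd of the sum; Var of U(0,1) is 1/12
share = sum(mu - 2 * sigma < d < mu + 2 * sigma for d in draws) / n_draws
```

Even though each individual factor is uniform (flat, like the spinner), the sum is bell-shaped, and the two-sigma share lands close to the 0.95 quoted in the text.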
The Gaussian pdf does not rule out extreme values, but it assigns them a very low probability: as you can see from the diagram, there is little chance of a normal random variable being found more than three standard deviations from the mean of the distribution. A compact notation for saying that $x$ is distributed normally with mean $\mu$ and variance $\sigma^2$ is $x \sim N(\mu, \sigma^2)$.

Economics 215, Allin Cottrell

The Error Correction Model

1 Setting up the EC model

We start from a simple proportional long-run equilibrium relationship between two variables:

$$Y_t = K X_t$$

We might think of $Y$ as inventory and $X$ as sales, or $Y$ as consumption and $X$ as income, or whatever. Of course, a fully specified equilibrium model may well include more variables, and the equilibrium relationship need not be one of direct proportionality, but let's keep it simple. The relationship above can be written in log form as

$$y_t = k + x_t \quad (1)$$

where we follow the convention of letting a lower-case letter designate the natural log of the variable represented by the corresponding upper-case letter (so $k = \log K$). Taking logs reduces the multiplicative relationship to an additive one, which is a helpful mathematical simplification.

Now let's write down a general dynamic relationship between $y$ and $x$:

$$y_t = \beta_0 + \beta_1 x_t + \beta_2 x_{t-1} + \alpha_1 y_{t-1} + u_t \quad (2)$$

By including lagged values of both $x$ and $y$, this specification allows for a wide variety of dynamic patterns in the data. We now ask: under what conditions is the generic dynamic equation (2) consistent with the long-run equilibrium relationship (1)? To assess this, we "zero out" the factors that could cause divergence from equilibrium, namely changes in $x$ and the stochastic fluctuations $u_t$. That is, we set $y_t = y^*$ and $x_t = x^*$ for all $t$, and set $u_t = 0$. Thus we get

$$y^* = \beta_0 + \beta_1 x^* + \beta_2 x^* + \alpha_1 y^*$$

$$(1 - \alpha_1)\, y^* = \beta_0 + (\beta_1 + \beta_2)\, x^*$$

$$y^* = \frac{\beta_0}{1 - \alpha_1} + \frac{\beta_1 + \beta_2}{1 - \alpha_1}\, x^*$$

If the above corresponds with equation (1), we have

$$\frac{\beta_0}{1 - \alpha_1} = k \quad \text{and} \quad \frac{\beta_1 + \beta_2}{1 - \alpha_1} = 1$$

Suppose this is the case. The second relationship above means that $\beta_1 + \beta_2 = 1 - \alpha_1$. Let $\gamma$ denote the common
value of these two terms. Then $\beta_2$ can be written as $\gamma - \beta_1$, and $\alpha_1$ can be written as $1 - \gamma$. Therefore equation (2) becomes

$$y_t = \beta_0 + \beta_1 x_t + (\gamma - \beta_1)\, x_{t-1} + (1 - \gamma)\, y_{t-1} + u_t$$

$$y_t = \beta_0 + \beta_1 x_t - \beta_1 x_{t-1} + \gamma x_{t-1} + y_{t-1} - \gamma y_{t-1} + u_t$$

$$y_t - y_{t-1} = \beta_0 + \beta_1 (x_t - x_{t-1}) + \gamma (x_{t-1} - y_{t-1}) + u_t$$

So, finally,

$$\Delta y_t = \beta_0 + \beta_1 \Delta x_t + \gamma (x_{t-1} - y_{t-1}) + u_t \quad (3)$$

where $\Delta x_t \equiv x_t - x_{t-1}$. This is the characteristic "error correction" specification, where the change in one variable is related to the change in another variable, as well as to the gap between the variables in the previous period.

2 Illustration: consumption function

To illustrate, let's take the data on US per capita disposable income ($Y_t$) and consumption expenditure ($C_t$), annual, 1959-1994, from the Ramanathan data files (data3-6). We begin by generating the logs of $C_t$ and $Y_t$, and the changes in the logs of these variables. The gretl script commands are as follows:

logs Ct Yt
ldiff Ct Yt

These commands generate the new variables lCt, lYt (the logs) and ldCt, ldYt (the log-differences, or $\Delta$s). We then create a variable representing the gap between log income and log consumption:

genr gap = lYt - lCt
genr gap1 = gap(-1)

The (-1) calls for the first lag to be used. We now specify a regression to be estimated via OLS, corresponding to equation (3) above:

ols ldCt const ldYt gap1

The results are shown below.

OLS estimates using the 35 observations 1960-1994
Dependent variable: ldCt

Variable    Coefficient    Std. Error    t-statistic    p-value
const       -0.0172635     0.0129678     -1.3313        0.1925
ldYt         0.819320      0.0934824      8.7644        0.0000
gap1         0.214349      0.124736       1.7184        0.0954

Sum of squared residuals = 0.00328885
Standard error of residuals (sigma-hat) = 0.0101379
Adjusted R-squared = 0.698098
F(2, 32) = 40.3096
Durbin-Watson statistic = 1.84671

Note that the model does not seem to suffer from autocorrelation (DW = 1.85). The coefficient on the lagged gap, which corresponds to $\gamma$ in equation (3), does not appear highly significant, but on a one-tailed test it is significant at the 5 percent level. And a one-tailed test is appropriate here: on theoretical grounds we expect a positive coefficient, so we can run $H_0\colon \gamma \le 0$ versus $H_1\colon \gamma > 0$. Why do we expect a positive value for $\gamma$
if the error-correction model is appropriate? Let's go back to equation (3). The idea is that, ceteris paribus, the dependent variable should converge towards its equilibrium level. Now "ceteris paribus" can be taken to mean barring changes in x and other disturbances, u_t. If we set Δx_t and u_t to zero, equation (3) then becomes

Δy_t = β0 + γ(x_{t−1} − y_{t−1})    (4)

But notice that β0/γ is by definition β0/(1 − α1), which in turn corresponds to k in equation (1) (see the figuring on page 1). So (4) is equivalent to

Δy_t = γ(k + x_{t−1} − y_{t−1})

and k + x_{t−1} is nothing other than the equilibrium value of y in period t − 1, according to equation (1). Thus suppose we had k + x_{t−1} − y_{t−1} > 0: this would mean that last period the equilibrium value of y, namely k + x_{t−1}, exceeded its actual value, or actual y fell short of equilibrium. If "error correction" is going on, what should happen? Well, y should increase, i.e. Δy_t > 0, heading towards equilibrium. In other words, the coefficient γ should be positive.¹ It's like this:

change in y = (positive coeff) × (degree to which y fell short of equilibrium last period)

¹If we had specified the gap the other way round, as consumption minus income, we would then have expected a negative value for γ.

Factor of proportionality

Given the above regression results, how do we retrieve an estimate of the factor of proportionality, K, in the original equilibrium relationship? We've just noted that k corresponds to β0/γ. In terms of the consumption estimates reported above, this means the constant divided by the coefficient on the lagged gap, or −0.017/0.214. That will give the estimated value of k, but that's the natural log of what we want, so we have to exponentiate (take the antilog). The best way to do the calculation is to let gretl take care of it, based on its internal representation of the coefficient values:

genr prop = exp(coeff(const)/coeff(gap1))
print prop

prop = 0.922619

That is, according to these estimates, the implied long-run equilibrium has people spending 92 percent of their disposable income. And if consumer spending diverges from
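The exp(constant/γ̂) calculation can be replicated outside gretl. Here is a hedged Python equivalent, using the coefficient values from the regression output reported above:

```python
import math

# k-hat = constant / gap1 coefficient; exponentiate to undo the log.
# Coefficients are taken from the regression output reported above.
b_const = -0.0172635
g_hat = 0.214349
prop = math.exp(b_const / g_hat)
print(prop)  # close to gretl's value of 0.922619
```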
this equilibrium relationship with income, then (due to the positive error-correction coefficient) there will be a tendency for spending to adjust towards the target value.

General comments

The results, then, are sensible on the face of it, but the model is not without problems. Note that both the constant and the disequilibrium adjustment coefficient γ are not estimated with much precision: the standard errors are rather large in relation to the coefficient estimates. The point estimate for γ seems low: it implies that only about 21 percent of last year's disequilibrium is made up in the current year, a sluggish adjustment. A 95 percent confidence interval for this parameter would be about 0.21 ± 2(0.12), or −0.03 to 0.44, which is uncomfortably wide. It may well be that the model is misspecified, perhaps because it omits other relevant variables, or because the relationship between income and consumption has not in fact remained constant over the 36-year span of the data.

Further interpretation

Nonetheless, continuing to take the estimates at face value, consider what we can read from the coefficient on the change in the log of income (ldYt), which corresponds to the β1 of equation (3). The point estimate is 0.819. What is this saying? Notice that it is not our estimate of the equilibrium ratio of consumption to income. Let's go back to equation (3) once more, but this time run the thought experiment of setting the previous period's disequilibrium and u_t to zero. We're now asking: how would consumption behave if there were no adjustment to be made on account of a previous disequilibrium, and no random disturbance? "No previous disequilibrium" means that y_{t−1} = k + x_{t−1}. Then equation (3) becomes

Δy_t = β0 + β1 Δx_t + γ(x_{t−1} − y_{t−1})
Δy_t = β0 + β1 Δx_t + γ[x_{t−1} − (k + x_{t−1})]
Δy_t = β0 + β1 Δx_t − γk
Δy_t = β1 Δx_t    (since β0 = γk)

Equation (1) implies that if equilibrium is to be maintained, Δy_t should equal Δx_t, i.e. the percentage change in consumption should equal the percentage change in income. But we are estimating that the proportional change in consumption is only 82
percent of the proportional change in income, absent any previous disequilibrium to correct. This is not actually a contradiction, but it's saying that when income rises, consumption tends to lag behind, creating a disequilibrium which will in turn call for correction in subsequent periods. In other words, it's saying that changes in income are themselves a source of disequilibrium in this model, and not just the random disturbances represented by u_t.

Error correction versus general dynamic model

One further question. We said above (see page 1) that the error correction model represents a restriction on the general dynamic model given in equation (2). The restriction takes the form of the requirement that β1 + β2 = 1 − α1. This is a linear restriction, and it is testable, so let's test it. We have already estimated the restricted model (i.e. the error correction model) and its SSR is reported above. We need to estimate the unrestricted model, corresponding to equation (2), and find its SSR, then form an F-statistic in the usual way:

F = [(SSRr − SSRu)/num. df] / (SSRu/dfu)

The numerator df = 1, because there is only one restriction. The denominator degrees of freedom, dfu = 31, since there are 35 observations after accounting for the lagged terms and the unrestricted model involves estimating 4 parameters. It turns out that SSRu = 0.00319503 and F(1, 31) = 0.910265, with a p-value of 0.3474. We fail to reject H0: β1 + β2 = 1 − α1, and conclude that the error correction model is an acceptable restriction on the more general equation (2). Here's the complete gretl program to generate the output discussed above:

open data3-6
logs Ct Yt
ldiff Ct Yt
genr gap = lYt - lCt
genr gap1 = gap(-1)
# estimate error correction model
ols ldCt const ldYt gap1
# retrieve factor of proportionality
genr prop = exp(coeff(const)/coeff(gap1))
print prop
genr SSRr = $ess
# now for the unrestricted model
lags lCt lYt
ols lCt const lYt lYt_1 lCt_1
genr SSRu = $ess
genr dfu = $df
genr Ftest = (SSRr - SSRu)*dfu/SSRu
pvalue F 1 dfu Ftest

3 Error correction in the stock market

In view of goings-on in the stock market over
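The F-statistic arithmetic above can be reproduced in a few lines. This is a Python sketch (not part of the gretl session), using the SSR figures quoted in the text:

```python
# F = [(SSRr - SSRu)/num_df] / (SSRu/dfu), with one restriction and dfu = 31.
ssr_r = 0.00328885   # restricted (error-correction) model
ssr_u = 0.00319503   # unrestricted dynamic model
num_df, dfu = 1, 31
F = ((ssr_r - ssr_u) / num_df) / (ssr_u / dfu)
print(F)  # about 0.91, matching the reported F(1, 31) = 0.910265
```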
recent years, it might be interesting to inquire whether the behavior of the market, as represented, say, by the Dow Jones Industrial Average, can be modeled as a case of long-run equilibrium plus "error correction". What might be a plausible model for long-run equilibrium in this case? Well, corporate stocks ultimately derive their value from the fact that they are claims on the profits made by firms. Thus it seems reasonable to suppose that stock prices should reflect the present discounted value of the future expected stream of corporate profit. A simple (perhaps too simple) proxy for this would be the current level of after-tax corporate profits divided by a long-term interest rate, such as the rate on 10-year Treasury bonds. This would be the correct figure if people expected the current level of corporate profits to persist into the indefinite future: the present value of an asset which promises to pay a given sum F each year forever is F/r, where r denotes the appropriate discount rate. My hypothesis, then, is that in the long run the Dow (as an aggregate measure of the value of the stock market) should be proportional to after-tax corporate profits divided by the long-term interest rate. We want to allow the possibility that the Dow is not equal to this long-run equilibrium value at all times, but that if it diverges from this value the error will tend to be corrected over time. The simplest model that captures this is equation (3), reproduced below:

Δy_t = β0 + β1 Δx_t + γ(x_{t−1} − y_{t−1}) + u_t

Here y represents the log of the Dow and x represents the log of the present value of corporate profits, as described above. The analysis was conducted on a quarterly basis, with data obtained from the Federal Reserve Bank of St. Louis (corporate profits), the Federal Reserve Board of Governors (10-year Treasury bond rate) and economagic.com (the closing value of the Dow). The results are shown below, where lddj denotes the log-difference of the Dow Jones average, ldcprof denotes the log-difference of discounted corporate profits, and djgap1
denotes the lagged value of the difference between the log of discounted profit and the log of the Dow.

OLS estimates using the 220 observations 1953:3–2008:2
Dependent variable: lddj

Variable    Coefficient     Std. Error     t-ratio    p-value
const       −0.00549884     0.00689003     −0.7981    0.4257
ldcprof      0.190529       0.0482375       3.9498    0.0001
djgap1       0.0538393      0.0152960       3.5198    0.0005

Sum of squared residuals = 0.746783
Standard error of the regression (σ̂) = 0.0586634
Unadjusted R² = 0.111245
Adjusted R² = 0.103054
F(2, 217) = 13.5809
Durbin–Watson statistic = 1.68157

LM test for autocorrelation up to order 4:
Null hypothesis: no autocorrelation
Test statistic: LMF = 1.43485
with p-value = P(F(4, 213) > 1.43485) = 0.223534

The error-correction parameter (i.e. the coefficient on djgap1) has the expected positive sign. The DW statistic does not suggest that autocorrelation is a problem; neither does the LMF test statistic for autocorrelation up to the fourth order. As in the discussion of the consumption function above, we can retrieve an estimate of the factor of proportionality in the hypothesized long-run equilibrium relationship by dividing the constant by the djgap1 coefficient and exponentiating. This gives a value of 0.9029. Thus we estimate that the equilibrium value of the Dow index is about 90 percent of the present value of the corporate profit stream. The residuals from the regression above are shown in Figure 1. The stock-market crash of 1987 stands out as a large negative residual: we infer that, so far as our model is concerned, this crash was not a case of "error correction" (otherwise the model would have predicted it) but rather an error. Figure 2 shows the actual value of the Dow alongside the equilibrium value implied by the model above, that is, 0.9029 times the present value of the corporate profit stream. Interestingly, the run-up in the Dow over the 1990s appears not as a movement above equilibrium, but rather as a "catching up" with an equilibrium value that exceeded the actual value for much of the period. I'm not sure I'd want to place too much stress on this
result without further testing of the adequacy of the model.

[Figure 1: Residuals from Dow error-correction regression]

[Figure 2: Actual and estimated equilibrium values of the Dow]

Notes on Sampling and Hypothesis Testing

Allin Cottrell

1 Population and sample

In statistics, a population is an entire set of objects or units of observation of one sort or another, while a sample is a subset (usually a proper subset) of a population, selected for particular study, usually because it is impractical to study the whole population. The numerical characteristics of a population are called parameters. Generally, the values of the parameters of interest remain unknown to the researcher; we calculate the corresponding numerical characteristics of the sample (known as statistics) and use these to estimate, or make inferences about, the unknown parameter values. A standard notation is often used to keep straight the distinction between population and sample. The table below sets out some commonly used symbols.

             size    mean    variance    proportion
Population   N       μ       σ²          π
Sample       n       x̄       s²          p

Note that it's common to use a Greek letter to denote a parameter and the corresponding Roman letter to denote the associated statistic.

2 Properties of estimators: sample mean

Consider, for example, the sample mean,

x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ

If we want to use this statistic to make inferences regarding the population mean, μ, we need to know something about the probability distribution of x̄. The distribution of a sample statistic is known as a sampling distribution. Two of its characteristics are of particular interest: the mean or expected value, and the variance or standard deviation. What can we say about E(x̄), or μ_x̄, the mean of the sampling distribution of x̄? First let's be sure we understand what it means. It is the
expected value of x̄. The thought experiment is as follows: we sample repeatedly from the given population, each time recording the sample mean, and take the average of those sample means. It's unlikely that any given sample will yield a value of x̄ that precisely equals μ, the mean of the population from which we're drawing. Due to random sampling error, some samples will give a sample mean that exceeds the population mean, and some will give an x̄ that falls short of μ. But if our sampling procedure is unbiased, then deviations of x̄ from μ in the upward and downward directions should be equally likely; on average, they should cancel out. In that case

E(x̄) = μ_x̄ = μ    (1)

or: the sample mean is an unbiased estimator of the population mean. So far so good. But we'd also like to know how widely dispersed the sample mean values are likely to be around their expected value. This is known as the issue of the efficiency of an estimator. It is a comparative concept: one estimator is more efficient than another if its values are more tightly clustered around its expected value. (Last revised 2002-01-29.) Consider this alternative estimator for the population mean: instead of x̄, just take the average of the largest and smallest values in the sample. This too should be an unbiased estimator of μ, but it is likely to be more widely spread out, or in other words less efficient, than x̄ (unless of course the sample size is 2, in which case they amount to the same thing). The degree of dispersion of an estimator is generally measured by the standard deviation of its probability distribution (sampling distribution). This goes under the name "standard error".

2.1 Standard error of x̄

What might the standard error of x̄ look like? In other words, what factors are going to influence the degree of dispersion of the sample mean around the population mean? Without giving a formal derivation, it's possible to understand intuitively the formula

σ_x̄ = σ/√n    (2)

The left-hand term is read as "sigma sub x-bar": the sigma tells us we're dealing with a standard deviation,
and the subscript x̄ indicates this is the standard deviation of the distribution of x̄, or in other words the standard error of x̄. On the right-hand side, in the numerator, we find the standard deviation, σ, of the population from which the samples are drawn. The more widely dispersed are the population values around their mean, the greater the scope for sampling error, i.e. drawing by chance an unrepresentative sample whose mean differs substantially from μ. In the denominator is the square root of the sample size, n. It makes sense that if our samples are larger, this reduces the probability of getting unrepresentative results, and hence narrows the dispersion of x̄. The fact that it is √n rather than n that enters the formula indicates that an increase in sample size is subject to diminishing returns in terms of increasing the precision of the estimator. For example, increasing the sample size by a factor of four will reduce the standard error of x̄, but only by a factor of two.

3 Other statistics

We have illustrated so far with the sample mean as an example estimator, but you shouldn't get the idea that it's the only one. For example, suppose we're interested in the proportion of some population that has a certain characteristic (e.g. an intention to vote for the Democratic candidate). The population proportion is often written as π. The corresponding sample statistic is the proportion of the sample having the characteristic in question, p. The sample proportion is an unbiased estimator of the population proportion,

E(p) = π    (3)

and its standard error is given by

σ_p = √[π(1 − π)/n]    (4)

Or we might be particularly interested in the variance, σ², of a certain population. Since the population variance is given by

σ² = (1/N) Σᵢ₌₁ᴺ (xᵢ − μ)²

it would seem that the obvious estimator is the statistic

(1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)²

But actually it turns out this estimator is biased. The bias is corrected in the formula for the sample variance,

s² = [1/(n − 1)] Σᵢ₌₁ⁿ (xᵢ − x̄)²    (5)

and the corresponding standard deviation is s = √[Σᵢ(xᵢ − x̄)²/(n − 1)], with a bias-correction factor of n/(n − 1) relative to the naive divide-by-n formula.

4 The shape of sampling
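The bias of the divide-by-n variance estimator is easy to see in a small Monte Carlo experiment. This Python sketch is my own illustration, not part of the notes; the population is standard normal, so the true σ² = 1:

```python
import random

# Monte Carlo check: dividing by n underestimates the population
# variance, while dividing by (n - 1) is (near enough) unbiased.
random.seed(42)
n, reps = 5, 20000

biased, corrected = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]  # population variance 1
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased += ss / n
    corrected += ss / (n - 1)

biased /= reps
corrected /= reps
print(biased, corrected)  # biased averages near (n-1)/n = 0.8, corrected near 1
```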
distributions

Besides knowing the expected value and the standard error of a given statistic, in order to work with that statistic for the purpose of statistical inference we need to know its shape. In the case of the sample mean, the Central Limit Theorem entitles us to the assumption that the sampling distribution is Gaussian (even if the population from which the samples are drawn does not follow a Gaussian distribution) provided we are dealing with a large enough sample. For a statistician, "large enough" generally means 30 or greater, as a rough rule of thumb, although the approximation to a Gaussian sampling distribution may be quite good even with smaller samples. Here's a rather striking illustration of the point. Consider once again the distribution of X, the number appearing uppermost when a fair die is rolled. We know that this distribution is not close to Gaussian: it's rectangular. But recall what the distribution looked like for the average of the two face values when two dice are rolled: it was triangular. What happens if we crank up the number of dice further? The triangle turns into a bell shape, and if we compute the distribution of the mean face value when rolling five dice, it already looks quite close to the Gaussian (see Figure 1).

[Figure 1: Distribution of mean face value, 5 dice]

We can think of the graph in Figure 1 as representing the sampling distribution of x̄ for samples with n = 5 from a population with μ = 3.5 and a rectangular distribution. Although the parent distribution is rectangular, the sampling distribution is a fair approximation to the Gaussian. Not all sampling distributions are Gaussian. We mentioned earlier the use of the sample variance as an estimator of the population variance. In this case the ratio (n − 1)s²/σ² follows a skewed distribution known as χ², with n − 1 degrees of freedom (see below). Nonetheless, if the sample size is large, the χ² distribution converges towards the normal.

5 Probability statements,
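The five-dice example can be replayed by simulation. This is a Python sketch of my own (not from the notes): the grand mean of many five-dice averages sits close to μ = 3.5, and the bulk of the mass lies within two standard errors of it, as the near-Gaussian shape suggests:

```python
import random

# Simulate the five-dice example: the mean face value of five dice,
# over many rolls, clusters around mu = 3.5 even though a single die
# has a rectangular distribution.
random.seed(1)
reps = 50000
means = [sum(random.randint(1, 6) for _ in range(5)) / 5 for _ in range(reps)]

grand_mean = sum(means) / reps
# theoretical standard error: sigma/sqrt(n), with sigma^2 = 35/12 per die
se = (35 / 12 / 5) ** 0.5
share_within_2se = sum(1 for m in means if abs(m - 3.5) <= 2 * se) / reps
print(grand_mean, share_within_2se)
```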
confidence intervals

If we know the mean, standard error and shape of the distribution of a given sample statistic, we can then make definite probability statements about the statistic. For example, suppose we know that μ = 100 and σ = 12 for a certain population, and we draw a sample with n = 36 from that population. The standard error of x̄ is σ/√n = 12/6 = 2, and a sample size of 36 is large enough to justify the assumption of a Gaussian sampling distribution. We know that the range μ ± 2σ encloses the central 95 percent of a normal distribution, so we can state

P(96 < x̄ < 104) ≈ .95

That is, there's a 95 percent probability that the sample mean lies within 4 units (2 standard errors) of the population mean, 100. "That's all very well," you may say, "but if we already knew the population mean and standard deviation, then why were we bothering to draw a sample?" Well, let's try relaxing the assumptions regarding our knowledge of the population and see if we can still get something useful. First, suppose we don't know the value of μ. We can still say

P(μ − 4 < x̄ < μ + 4) ≈ .95

That is, with probability .95 the sample mean will be drawn from within 4 units of the (unknown) population mean. So suppose we go ahead and draw the sample, and calculate a sample mean of 97. If there's a probability of .95 that our x̄ came from within 4 units of μ, we can turn that around: we're entitled to be 95 percent confident that μ lies between 93 and 101. That is, we can draw up a 95 percent confidence interval for μ as x̄ ± 2σ_x̄. There's a further problem, though. If we don't know the value of μ, then presumably we don't know σ either. So how can we compute the standard error of x̄? We can't, but we can estimate it. Our best estimate of the population standard deviation will be s, the standard deviation calculated from our sample. The estimated standard error of x̄ is then

σ̂_x̄ = s/√n    (6)

The "hat" or caret over a parameter indicates an estimated value. We can now reformulate our 95 percent confidence interval for μ: x̄ ± 2σ̂_x̄. But is this still valid, when we've had to
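The statement P(96 < x̄ < 104) ≈ .95 can be checked against the exact normal CDF. A Python sketch of mine, expressing the CDF via math.erf (the exact central probability for ±2 standard errors is about 0.9545, which rounds to the 95 percent used in the text):

```python
import math

# P(96 < xbar < 104) when xbar ~ N(100, 2^2).
def norm_cdf(x, mu, sigma):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

se = 12 / math.sqrt(36)  # sigma / sqrt(n) = 2
p = norm_cdf(104, 100, se) - norm_cdf(96, 100, se)
print(round(p, 4))  # 0.9545
```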
replace σ with an estimate? Given a sample of size 36, it's close enough. Strictly speaking, the substitution of s for the unknown σ alters the shape of the sampling distribution: instead of being Gaussian, it now follows the t distribution, which looks very much like the Gaussian except that it's a bit "fatter" in the tails.

5.1 The Gaussian and t distributions

Unlike the Gaussian, the t distribution is not fully characterized by its mean and standard deviation: there is an additional factor, namely the degrees of freedom (df). For the issue in question here (estimating a population mean) the df term is the sample size minus 1, or 35 in the current example. At low degrees of freedom the t distribution is noticeably more dispersed than the Gaussian, for the same mean and standard deviation, which means that a 95 percent confidence interval would have to be wider, reflecting greater uncertainty. But as the degrees of freedom increase, the t distribution converges towards the Gaussian: by the time we've reached 30 degrees of freedom the two are almost indistinguishable. For the normal distribution, the values that enclose the central 95 percent are μ − 1.960σ and μ + 1.960σ; for the t distribution with df = 30 the corresponding values are μ − 2.042σ and μ + 2.042σ. Both are well approximated by the rule of thumb μ ± 2σ.

5.2 Further examples

There's nothing sacred about 95 percent confidence. The following information regarding the Gaussian distribution enables you to construct a 99 percent confidence interval:

P(μ − 2.58σ < x < μ + 2.58σ) ≈ 0.99

Thus the 99 percent interval is x̄ ± 2.58σ_x̄. If we want greater confidence that our interval straddles the unknown parameter value (99 percent versus 95 percent), then our interval must be wider (±2.58 standard errors versus ±2 standard errors). Here's an example using a different statistic. An opinion polling agency questions a sample of 1200 people to assess the degree of support for candidate X. In the sample, the proportion p indicating support for X is 56 percent, or 0.56. Our single best guess
at the population proportion, π, is then 0.56, but we can quantify our uncertainty over this figure. The standard error of p is √[π(1 − π)/n]. The value of π is unknown, but we can substitute p or, if we want to be conservative (i.e. ensure that we're not underestimating the width of the confidence interval), we can put π = 0.5, which maximizes the value of π(1 − π). On the latter procedure, the estimated standard error is √(0.25/1200) = 0.0144. The large sample justifies the Gaussian assumption for the sampling distribution, so our 95 percent confidence interval is 0.56 ± 2 × 0.0144 = 0.56 ± 0.0289. This is the basis for the statement "accurate to within plus or minus 3 percent" that you often see attached to opinion poll results.

5.3 Generalizing the idea

The procedure outlined in this section is of very general application, so let me try to construct a more general statement of the principle. To avoid tying the exposition to any particular parameter, I'll use θ to denote a generic parameter. The first step is to find an estimator (preferably an unbiased one) for θ, that is, a suitable statistic that we can calculate from sample data to yield an estimate θ̂ of the parameter of interest; this value, our single best guess at θ, is called a point estimate. We now set a confidence level for our interval estimate; this is denoted generically by 1 − α; thus, for instance, the 95 percent confidence level corresponds to α = 0.05. If the sampling distribution of θ̂ is symmetrical, we can express the interval estimate as

θ̂ ± maximum error for (1 − α) confidence

The magnitude of the maximum error can be resolved into so many standard errors of such and such a size: the number of standard errors depends on the chosen confidence level (and also, possibly, on the degrees of freedom), while the size of the standard error, σ_θ̂, depends on the nature of the parameter being estimated and the sample size. Suppose the sampling distribution of θ̂ can be assumed to be Gaussian, which is often (but not always) the case. The following notation is useful:

z = (x − μ)/σ

This "standard
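The polling interval can be computed directly. A Python sketch of my own, following the conservative π = 0.5 procedure described in the text:

```python
import math

# Conservative standard error with pi = 0.5, then the 95 percent
# interval p +/- 2 standard errors, as in the polling example.
n, p = 1200, 0.56
se = math.sqrt(0.5 * 0.5 / n)
lo, hi = p - 2 * se, p + 2 * se
print(round(se, 4), round(lo, 4), round(hi, 4))  # 0.0144 0.5311 0.5889
```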
normal score" or z-score expresses the value of a variable in terms of its distance from the mean, measured in standard deviations. Thus if μ = 1000 and σ = 50, then the value x = 850 has a z-score of −3.0: it lies 3 standard deviations below the mean. We can subscript z to indicate the proportion of the standard normal distribution that lies to its right. For instance, since the normal distribution is symmetrical, z.5 = 0. It follows from points made earlier that z.025 = 1.96 and z.005 = 2.58. A picture may help to make this obvious.

[Figure: standard normal density, with z.975 = −1.96 and z.025 = 1.96 marking off the central 95 percent]

Where the distribution of θ̂ is Gaussian, therefore, we can write the (1 − α) confidence interval for θ as

θ̂ ± z_(α/2) σ_θ̂

This is about as far as we can go in general terms; the specific formula for σ_θ̂ depends on the parameter. Let me emphasize the last point, since people often seem to get it wrong. The standard error formula σ/√n may be the first one you encounter, but it is not universal: it applies only when we're using the sample mean to estimate a population mean. In general, each statistic has its own specific standard error. When a statistically savvy person encounters a new statistic, a common question would be, "What's its standard error?" (Warning: it's not always possible to give an explicit formula in answer to this question, although it is for most of the statistics we'll come across in this course; in some cases standard errors have to be derived via computer simulations.)

6 The logic of hypothesis testing

The interval estimation discussed above is a noncommittal sort of statistical inference: we draw a sample, calculate a sample statistic, and use this to provide a point estimate of some parameter of interest, along with a confidence interval. Often in econometrics we're interested in a more pointed sort of inference: we'd like to know whether or not some claim is consistent with the data. In other words, we want to test hypotheses. There's a well-known and mostly apt analogy between the setup of a hypothesis test and a court of law. The defendant on trial in the statistical
court is the null hypothesis, some definite claim regarding a parameter of interest. Just as the defendant is presumed innocent until proved guilty, the null hypothesis is assumed true (at least for the sake of argument) until the evidence goes against it. The formal decision taken at the conclusion of a hypothesis test is either to reject the null hypothesis (cf. find the defendant guilty) or to fail to reject that hypothesis (cf. not guilty). The "fail to reject" locution may seem cumbersome (why not just say "accept"?) but there's a reason for it: failing to reject a null hypothesis does not amount to proving that it's true. Here the law court analogy falters, since a defendant who is found not guilty is entitled to claim innocence. The statistical decision is "reject" or "fail to reject". Meanwhile, the null hypothesis (often written H0) is in fact either true or false. We can set up a matrix of possibilities:

                          H0 is in fact:
Decision           True                False
Reject             Type I error        Correct decision
Fail to reject     Correct decision    Type II error

Rejecting a true null hypothesis goes under the name of Type I error; this is like a guilty verdict for a defendant who is really innocent. Failing to reject a false null hypothesis is called Type II error; this corresponds to a guilty defendant being found not guilty. Since the hypothesis testing procedure is probabilistic, there is always some chance that one or other of these errors occurs. The probability of Type I error is labeled α and the probability of Type II error is labeled β. The quantity 1 − β has a name of its own: it is the power of a test. If β is the probability that a false null hypothesis will not be rejected, then 1 − β is the probability that a false hypothesis will indeed be rejected. It thus represents the power of a test to discriminate, to unmask false hypotheses, so to speak. Obviously, we would like both α and β to be as small as possible. Unfortunately, there's a tradeoff. This is easily seen in the law court case. If we want to minimize the chance of innocent parties
being found guilty, we can tighten up on regulations concerning police procedures, rules of evidence and so on. That's all very well, but inevitably it raises the chances that the courts will fail to secure guilty verdicts for some guilty parties (e.g. some people will "get off on technicalities"). The same issue arises in hypothesis testing, but in even more pointed form. We get to choose, in advance, the value of α, the probability of Type I error. This is also known as the significance level of the test. (And yes, it's closely related to the α of confidence intervals, as we'll see before long.) While we want to choose a small value of α, we're constrained by the fact that shrinking α is bound to crank up β, eroding the power of the test.

6.1 Choosing the significance level

How do we get to choose α? Here's a first approximation. The calculations that compose a hypothesis test are condensed in a key number, namely a conditional probability: the probability of observing the given sample data, on the assumption that the null hypothesis is true. If this probability (called the p-value) is small, we can place one of two interpretations on the situation: either (a) the null hypothesis is true and the sample we drew is an improbable, unrepresentative one, or (b) the null hypothesis is false, and the sample is not such an odd one. The smaller the p-value, the less comfortable we are with alternative (a). To reach a conclusion, we must specify the limit of our "comfort zone", or in other words a p-value below which we'll reject H0. Say we use a cutoff of .01: we'll reject the null hypothesis if the p-value for the test is ≤ .01. Suppose the null hypothesis is in fact true. What then is the probability of our rejecting it? It's the probability of getting a p-value less than or equal to .01, which is, by definition, .01. In selecting our cutoff, we selected α, the probability of Type I error. If you're thinking about this, there should be several questions in your mind at this point. But before developing the theoretical points further, it may be
useful to fix ideas by giving an example of a hypothesis test.

6.2 Example of hypothesis test

Suppose a maker of RAM chips claims an access time of 60 nanoseconds (ns) for the chips. The manufacture of computer memory is in part a probabilistic process: there's no way the maker can guarantee that each chip meets the 60 ns spec. The claim must be that the average response time is 60 ns (and the variance is not too large). Quality control has the job of checking that the production process is maintaining acceptable access speed; to that end, they test a sample of chips each day. Today's sample information is that, with 100 chips tested, the mean access time is 63 ns with a standard deviation of 2 ns. Is this an acceptable result? To put the question into the hypothesis testing framework, the first task is to formulate the hypotheses. Hypotheses, plural: we need both a null hypothesis and an alternative hypothesis (H1) to run against H0. One possibility would be to set H0: μ = 60 against H1: μ ≠ 60. That would be a symmetrical setup, giving rise to a two-tailed test. But presumably we don't mind if the memory chips are faster than advertised; we have a problem only if they're slower. That suggests an asymmetrical setup: H0: μ ≤ 60 (the production process is OK) versus H1: μ > 60 (the process has a problem). We then need to select a significance level, or α value, for the test. Let's go with .05. The next step is to compute the p-value and compare it with the chosen α. The p-value, once again, is the probability of the observed sample data, on the assumption that H0 is true. The observed sample data will be summarized in a relevant statistic; since this test concerns a population mean, the relevant statistic is the sample mean. The p-value can be written as

P(x̄ ≥ 63 | μ = 60)

when n = 100 and s = 2. That is, if the population mean were really 60 (or less, as stated by H0), how probable is it that we would draw a sample of size 100 with the observed mean of 63 or greater (and a standard deviation of 2)? Note the force of the "63 or greater". With
a continuous variable, the probability of drawing a sample with a mean of exactly 63 is effectively zero, regardless of the truth or falsity of the null hypothesis. We're really asking: what are the chances of drawing a sample "like this, or worse" (from the standpoint of the null hypothesis)? We can assign a probability by using the sampling distribution concepts we discussed earlier. The sample mean, 63, was drawn from a particular distribution, namely the sampling distribution of x̄. If the null hypothesis is true, E(x̄) is no greater than 60. The estimated standard error of x̄ is s/√n = 2/10 = 0.2. With n = 100, we can take the sampling distribution to be normal. We use this information to formulate a test statistic: a statistic whose probability, on the assumption that H0 is true, we can determine by reference to the standard tables. In this case (Gaussian sampling distribution) the test statistic is the z-score introduced in section 5.3 above. In general terms, z equals value minus mean, divided by standard deviation. Here the "mean" in question is the mean of the sampling distribution of x̄, namely the population mean according to the null hypothesis, or μ_H0, while the relevant standard deviation is the standard error of x̄. The z-score formula is therefore

z = (x̄ − μ_H0)/(s/√n) = (63 − 60)/0.2 = 15

The p-value therefore equals the probability of drawing from a normal distribution a value that is 15 standard deviations above the mean. That is effectively zero; it's far too small to be noted on any standard statistical tables. At any rate, it's much smaller than .05, so the decision must be to reject the null hypothesis: we are driven to the alternative, that the mean access time exceeds 60 ns and the production process has a problem.

6.3 Variations on the example

Suppose the test were as described above, except that the sample was of size 10 instead of 100. How would that alter the situation? Given the small sample, and the fact that the population standard deviation, σ, is unknown, we could not justify the assumption of a Gaussian sampling
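The z-score arithmetic for the RAM-chip test is a one-liner; here is a Python sketch of my own using the sample figures from the example:

```python
import math

# The RAM-chip test statistic: z = (xbar - mu_H0) / (s / sqrt(n)).
xbar, mu0, s, n = 63, 60, 2, 100
z = (xbar - mu0) / (s / math.sqrt(n))
print(z)  # z = 15, far beyond any conventional critical value
```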
Rather, we'd have to use the t distribution with df = 9. The estimated standard error is s_x̄ = 2/√10 ≈ 0.632, and the test statistic is

t(9) = (63 − 60)/0.632 ≈ 4.74

The p-value for this statistic is 0.0005297, a lot larger than for z = 15, but still considerably smaller than the chosen significance level of 5 percent, so we still reject the null hypothesis.[1] Note that in general the test statistic can be written as

test = (θ̂ − θ_H₀)/se(θ̂)

That is: sample statistic, minus the value stated in the null hypothesis (which, by assumption, equals E(θ̂)), divided by the estimated standard error of θ̂. The distribution to which "test" must be referred, in order to obtain the p-value, depends on the situation.

Here's another variation. We chose an asymmetrical test setup above. What difference would it make if we went with the symmetrical version, H₀: μ = 60 versus H₁: μ ≠ 60? This is the issue of one-tailed versus two-tailed tests. We have to think about what sort of values of the test statistic should count against the null hypothesis. In the asymmetrical case, only values of x̄ greater than 60 counted against H₀. A sample mean of, say, 57 would be quite consistent with μ ≤ 60; it is not even prima facie evidence against the null. Therefore the critical region of the sampling distribution (the region containing values that would cause us to reject the null) lies strictly in the upper tail. But if the null hypothesis were μ = 60, then values of x̄ both substantially below and substantially above 60 would count against it. The critical region would be divided into two portions, one in each tail of the sampling distribution. The practical consequence is that we'd have to double the p-value found above before comparing it to α. The sample mean was 63, and the p-value was defined as the probability of drawing a sample "like this or worse" from the standpoint of H₀. In the symmetrical case, "like this or worse" means "with a sample mean this far away from the hypothesized population mean, or farther, in either direction." So the p-value is P(x̄ ≥ 63 ∪ x̄ ≤ 57), which is double the value we found previously.
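The doubling can be seen numerically. A small stdlib-only sketch (the helper names are mine), using the normal case where the tail probability has a closed form via the complementary error function:

```python
import math

def p_one_tailed(z):
    # Upper-tail probability of the standard normal: P(Z > z).
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_two_tailed(z):
    # Under H0: mu = mu0, both tails count against the null,
    # so the p-value is simply doubled.
    return 2 * p_one_tailed(abs(z))

print(round(p_one_tailed(2.0), 4))  # 0.0228
print(round(p_two_tailed(2.0), 4))  # 0.0455
```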
As it happens, the p-values found above were so small that a doubling would not alter the result, namely rejection of H₀.

7 Hypothesis tests and p-values: further discussion

Let E denote the sample evidence and H denote the null hypothesis that is "on trial." The p-value can then be expressed as P(E|H). This may seem an awkward formulation. Wouldn't it be better if we calculated the conditional probability the other way round, P(H|E)? Instead of working with the probability of obtaining a sample like the one we in fact obtained, assuming the null hypothesis to be true, why can't we think in terms of the probability that the null hypothesis is true, given the sample evidence we obtained? This would arguably be more natural and comprehensible. To see what would be involved in the alternative approach, let's remind ourselves of the multiplication rule for probabilities, which we wrote as

P(A ∩ B) = P(A) × P(B|A)

[1] I determined the p-value using the econometric software package gretl; I'll explain how to do this in class.

Swapping the positions of A and B, we can equally well write

P(B ∩ A) = P(B) × P(A|B)

Taking these two equations together, we can infer that P(A) × P(B|A) = P(B) × P(A|B), or

P(A|B) = P(A) × P(B|A) / P(B)

The above equation is known as Bayes' rule, after the Rev. Thomas Bayes. It provides a means of converting from a conditional probability one way round to the inverse conditional probability. Substituting E for "evidence" and H for "null hypothesis," we get

P(H|E) = P(H) × P(E|H) / P(E)

We know how to find the p-value, P(E|H). To obtain the probability we're now canvassing as an alternative, P(H|E), we have to supply in addition P(H) and P(E). P(H) is the marginal probability of the null hypothesis and P(E) is the marginal probability of the sample evidence. Where are these going to come from?

7.1 Bayesian statistics

There is an approach to statistics that offers a route to supplying these probabilities and computing P(H|E): it is known as the Bayesian approach, and it differs from the standard "sampling theory" doctrine.
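Bayes' rule as derived above can be checked on a concrete case. A sketch (my own example, not from the notes) using a single die roll, with A = "even" and B = "1 or 2," using exact fractions:

```python
from fractions import Fraction as F

# A = "even" = {2, 4, 6}; B = "1 or 2" = {1, 2}; A n B = {2}.
p_a = F(3, 6)
p_b = F(2, 6)
p_b_given_a = F(1, 3)                  # of {2, 4, 6}, only 2 lies in B

p_a_given_b = p_a * p_b_given_a / p_b  # Bayes' rule
print(p_a_given_b)                     # 1/2: of {1, 2}, only 2 is even
```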
On the standard view, talking of P(H) is problematic. The null hypothesis is in fact either true or false; it's not a probabilistic matter. Given a random sampling procedure, though, we can talk of a probability distribution for the sample statistic, and it's on this basis that we determine the p-value. Bayesians dispute this: they conceive probabilities in terms of degree of justified belief in propositions. Thus it's quite acceptable to talk of a P(H) that differs from 0 or 1: yes, the hypothesis is in fact true or false, but we don't know which, and what matters is the degree of confidence we're justified in reposing in the hypothesis; this can be represented as a probability. For a Bayesian, the P(H) that appears on the right-hand side of Bayes' rule is conceived as a prior probability: it's the degree of belief we have in H before seeing the evidence. The conditional probability on the left is the posterior probability, the modified probability after seeing the sample. The rule provides an algorithm for modifying our probability judgments in the light of evidence. One difficulty with the Bayesian approach is obtaining the prior probability. For instance, in the example above it's not obvious how we should assign a probability to μ ≤ 60 in advance of seeing any sample data. There are techniques, however, for formulating "ignorance priors": prior probabilities that correctly reflect an initial state of ignorance regarding the parameter values. To illustrate the idea, let me vary the example above. Suppose the chip maker packages up RAM into boxes of one thousand modules, with a speed specification of either 60 ns or 70 ns. We're faced with a box whose label has come off: which sort does it contain? Suppose we set H₀: μ = 60 against H₁: μ = 70. If the 60 ns and 70 ns boxes are produced in equal numbers, a suitable ignorance prior would be a P(H₀) of 0.50 for the hypothesis that the mystery box contains 60 ns chips. We sample 9 of the chips and find a sample mean access time of 64 ns with a standard deviation of 3 ns. What then is the
posterior probability of the hypothesis μ = 60? The standard test statistic is

t(8) = (64 − 60)/(3/√9) = 4.0

which has a two-tailed p-value of 0.004. At this point we have the prior, P(H₀) = 0.50, and the p-value, P(E|H₀) = 0.004. What about the marginal probability of the evidence, P(E)? We have to decompose this as follows:

P(E) = P(E|H₀)P(H₀) + P(E|H₁)P(H₁)

which means we have another calculation to perform, P(E|H₁). This is similar to the p-value calculation for H₀: we want the two-tailed p-value for

t(8) = (64 − 70)/(3/√9) = −6.0

which is 0.0003234. So

P(H₀|E) = P(H₀) × P(E|H₀) / P(E) = (0.5 × 0.004) / (0.004 × 0.5 + 0.0003234 × 0.5) ≈ 0.925

Based on the evidence, if the only two possibilities are that the sample chips came from a batch with a mean of 60 ns or a batch with a mean of 70 ns, we can be fairly confident (92.5 percent) that they came from a 60 ns batch. Note that this seemed unlikely "on the face of it" (small p-value), but the probability of the evidence conditional on the alternative, μ = 70, was much smaller still, so the posterior probability of H₀ came out quite high. In this example P(E|H₀) = 0.004, yet P(H₀|E) = 0.925. The Bayesian take on statistics is interesting and has quite a lot to recommend it, but in this course we'll concentrate on the standard sampling-theory approach. Thus you'll have to get used to thinking in terms of those "awkward" p-values. Besides, as you've just seen, while the Bayesian approach does yield a value for the probability of the hypothesis conditional on the evidence, it is not really a simplification: in fact it generally involves calculating the regular p-value and more. We need a prior probability for H₀ and the marginal probability of the sample, which are not required for the standard calculation. If you'd like to read more about Bayesian statistics, here are two recommendations: Data Analysis: A Bayesian Tutorial, by D. S. Sivia (Oxford: Clarendon Press, 1996), and the fascinating work by E. T. Jaynes, Probability Theory: The Logic of Science, online at http://bayes.wustl.edu/etj/prob.html
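The mystery-box posterior above takes only a few lines to reproduce (a sketch; the p-values are taken from the text, not recomputed):

```python
# Prior P(H0) = 0.5, P(E|H0) = 0.004, P(E|H1) = 0.0003234.
prior = 0.5
p_e_h0 = 0.004
p_e_h1 = 0.0003234

p_e = p_e_h0 * prior + p_e_h1 * (1 - prior)  # marginal probability of E
posterior = prior * p_e_h0 / p_e             # Bayes' rule
print(round(posterior, 3))                   # 0.925
```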
8 Relationship between confidence interval and hypothesis test

We noted above that the symbol α is used both for the significance level of a hypothesis test (the probability of Type I error) and in denoting the confidence level (1 − α) for interval estimation. This is not coincidental. There is an equivalence between a two-tailed hypothesis test at significance level α and an interval estimate using confidence level 1 − α. Suppose μ is unknown and a sample of size 64 yields x̄ = 50, s = 10. The 95 percent confidence interval for μ is then

50 ± 1.96 × 10/√64 = 50 ± 2.45

Now suppose we want to test H₀: μ = 55, using the 5 percent significance level. No additional calculation is needed. The value 55 lies outside of the 95 percent confidence interval, so we can immediately conclude that H₀ is rejected. In a two-tailed test at the 5 percent significance level, we fail to reject H₀ if and only if x̄ falls within the central 95 percent of the sampling distribution, conditional on H₀. Since 55 exceeds 50 by more than the maximum error, 2.45, we can see that, conversely, the central 95 percent of a sampling distribution centered on 55 will not include 50, so x̄ = 50 must lead to rejection of the null. Significance level and confidence level are complementary.

Key Probability Distributions in Econometrics

The normal or Gaussian distribution is a symmetrical bell curve. It is found everywhere, and the Central Limit Theorem tells us why: whenever a large number of independently distributed random variables are added together, the sum tends to the normal distribution, even if the distributions of the individual random variables are far from normal. The formula for the normal pdf (density function) is

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞

where μ denotes the mean of the distribution and σ its standard deviation. The probability of x falling into any given range can be found by integrating the above pdf from the lower to the upper limit of the range. A couple of results to commit to memory are P(μ − 2σ < x < μ + 2σ) ≈ 0.95 and P(μ − 3σ < x < μ + 3σ) ≈ 0.997. A compact notation for saying that x is distributed normally with mean μ and variance σ² is x ~ N(μ, σ²).
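The two "commit to memory" results can be verified with the standard library alone. A sketch (the helper normal_cdf is mine): the normal cdf is expressible through math.erf, so no tables are needed.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Normal cumulative distribution function via the error function.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# The results hold for any mu and sigma; try mu = 100, sigma = 15.
mu, sigma = 100.0, 15.0
p2 = normal_cdf(mu + 2 * sigma, mu, sigma) - normal_cdf(mu - 2 * sigma, mu, sigma)
p3 = normal_cdf(mu + 3 * sigma, mu, sigma) - normal_cdf(mu - 3 * sigma, mu, sigma)
print(round(p2, 3), round(p3, 3))  # 0.954 0.997
```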
The χ² distribution represents the distribution of a sum of squares of normal random variables. χ² is indexed by a "degrees of freedom" (df) term, which corresponds to the number of normal variables whose squares compose the sum. It is bounded by zero at the low end and skewed to the right. In case you were wondering, the pdf for the chi-square distribution with m degrees of freedom, χ²(m), is

f(x) = x^(m/2 − 1) e^(−x/2) / (2^(m/2) Γ(m/2)),  x > 0

where Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx, α > 0, is Euler's gamma function. This distribution naturally arises in econometrics when we are considering variances, since the numerator of the variance is the sum of squared deviations about the mean.

The t distribution represents the distribution of the ratio of a normal variate to the square root of an independent χ². It is indexed by a df term equal to the df of the χ² in the denominator. In an econometric context, the df is the sample size minus the number of parameters being estimated. At low degrees of freedom the t distribution is thicker in the tails than the normal distribution, but when df is large it becomes indistinguishable from the normal.

The F distribution arises as the ratio of two independent chi-squares. If U ~ χ²(m) and V ~ χ²(n), then F(m, n) = (U/m)/(V/n) follows the F distribution with pdf

f(x) = [Γ((m+n)/2) / (Γ(m/2)Γ(n/2))] (m/n)^(m/2) x^(m/2 − 1) (1 + (m/n)x)^(−(m+n)/2),  x > 0

This distribution has a similar shape (a skewed bell) to the χ². It is indexed by two df terms, one for the numerator and one for the denominator. Note the connections among these distributions, with the normal being the "parent" of all the others: aggregation of independent random variables → normal; sum of squares of normal variables → χ²; ratio of normal to square root of chi-square → t; ratio of two chi-squares → F.

Regression Analysis: Basic Concepts
Allin Cottrell

1 The simple linear model

This model represents the dependent variable, yᵢ, as a linear function of one independent variable, xᵢ, subject to a random "disturbance" or "error," uᵢ:

yᵢ = β₀ + β₁xᵢ + uᵢ

The error term uᵢ is assumed to have a mean value of zero,
a constant variance, and to be uncorrelated with itself across observations: E(uᵢuⱼ) = 0, i ≠ j. We may summarize these conditions by saying that uᵢ is "white noise." The task of estimation is to determine regression coefficients β̂₀ and β̂₁, estimates of the unknown parameters β₀ and β₁ respectively. The estimated equation will have the form

ŷᵢ = β̂₀ + β̂₁xᵢ

We define the estimated error, or residual, associated with each pair of data values as

ûᵢ = yᵢ − ŷᵢ = yᵢ − β̂₀ − β̂₁xᵢ

In a scatter diagram of y against x, this is the vertical distance between the observed value yᵢ and the fitted value ŷᵢ, as shown in Figure 1. (Figure 1: the regression residual, the vertical gap between the data point at xᵢ and the fitted line ŷ = β̂₀ + β̂₁x.) Note that we are using a different symbol for this estimated error, ûᵢ, as opposed to the "true" disturbance or error term defined above, uᵢ. These two will coincide only if β̂₀ and β̂₁ happen to be exact estimates of the regression parameters β₀ and β₁.

The basic technique for determining the coefficients β̂₀ and β̂₁ is Ordinary Least Squares (OLS): values for β̂₀ and β̂₁ are chosen so as to minimize the sum of the squared residuals (SSR). The SSR may be written as

SSR = Σûᵢ² = Σ(yᵢ − ŷᵢ)² = Σ(yᵢ − β̂₀ − β̂₁xᵢ)²

It should be understood throughout that Σ denotes the summation Σᵢ₌₁ⁿ, where n denotes the number of observations in the sample. (Last revised: 2003-02-03.)

The minimization of SSR is a calculus exercise: we need to find the partial derivatives of SSR with respect to both β̂₀ and β̂₁ and set them equal to zero. This generates two equations (the "normal equations" of least squares) in the two unknowns, β̂₀ and β̂₁. These equations are then solved jointly to yield the estimated coefficients. We start out from

∂SSR/∂β̂₀ = −2Σ(yᵢ − β̂₀ − β̂₁xᵢ) = 0   (1)
∂SSR/∂β̂₁ = −2Σxᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0   (2)

Equation (1) implies that

Σyᵢ − nβ̂₀ − β̂₁Σxᵢ = 0  ⇒  β̂₀ = ȳ − β̂₁x̄   (3)

while equation (2) implies that

Σxᵢyᵢ − β̂₀Σxᵢ − β̂₁Σxᵢ² = 0   (4)

We can now substitute for β̂₀ in equation (4), using (3). This yields

Σxᵢyᵢ − (ȳ − β̂₁x̄)Σxᵢ − β̂₁Σxᵢ² = 0
Σxᵢyᵢ − ȳΣxᵢ − β̂₁(Σxᵢ² − x̄Σxᵢ) = 0
β̂₁ = (Σxᵢyᵢ − ȳΣxᵢ) / (Σxᵢ² − x̄Σxᵢ)   (5)

Equations (3) and (5) can now be used to generate the regression coefficients: first use (5) to find β̂₁, then use (3) to find β̂₀.
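The closed-form solution in (5) and (3) translates directly into code. A small sketch (my own toy check, not from the notes), verified on points that lie exactly on a line:

```python
def ols(x, y):
    # Simple-regression OLS via the normal equations:
    # b1 = (sum(x*y) - ybar*sum(x)) / (sum(x^2) - xbar*sum(x)); b0 = ybar - b1*xbar.
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b1 = (sum(xi * yi for xi, yi in zip(x, y)) - ybar * sum(x)) / \
         (sum(xi * xi for xi in x) - xbar * sum(x))
    b0 = ybar - b1 * xbar
    return b0, b1

# Points on the exact line y = 2 + 3x are recovered perfectly.
b0, b1 = ols([0, 1, 2, 3], [2, 5, 8, 11])
print(round(b0, 6), round(b1, 6))  # 2.0 3.0
```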
2 Goodness of fit

The OLS technique ensures that we find the values of β̂₀ and β̂₁ which fit the sample data best, in the specific sense of minimizing the sum of squared residuals. There is no guarantee, however, that β̂₀ and β̂₁ correspond exactly with the unknown parameters β₀ and β₁. Neither, in fact, is there any guarantee that the "best-fitting" line fits the data well: maybe the data do not even approximately lie along a straight-line relationship. So how do we assess the adequacy of the fitted equation?

First step: find the residuals. For each x-value in the sample, compute the fitted value or predicted value of y, using ŷᵢ = β̂₀ + β̂₁xᵢ. Then subtract each fitted value from the corresponding actual, observed value of yᵢ. Squaring and summing these differences gives the SSR, as shown in Table 1. Now obviously the magnitude of the SSR will depend, in part, on the number of data points in the sample: other things equal, the more data points, the bigger the sum of squared residuals. To allow for this we can divide through by the degrees of freedom, which is the number of data points minus the number of parameters to be estimated (2, in the case of a simple regression with an intercept term). Let n denote the number of data points (or "sample size"); then the degrees of freedom df = n − 2. The square root of the resulting expression is called the estimated standard error of the regression, s:

s = √(SSR/(n − 2))

Table 1: Example of finding residuals, given β̂₀ = 5.23509 and β̂₁ = 1.388

  data xᵢ   data yᵢ   fitted ŷᵢ   ûᵢ = yᵢ − ŷᵢ     ûᵢ²
  10.65     19.99     20.01       −0.02          0.0004
  12.54     22.80     22.63        0.17          0.0289
  13.00     23.50     23.27        0.23          0.0529
  15.77     28.50     27.12        1.38          1.9044
  16.00     23.90     27.44       −3.54         12.5316
  17.50     29.30     29.52       −0.22          0.0484
  18.00     28.50     30.21       −1.71          2.9241
  18.70     36.50     31.18        5.32         28.3024
  19.35     29.50     32.08       −2.58          6.6564
  19.48     29.00     32.26       −3.26         10.6276
  22.54     38.50     36.51        1.99          3.9601
  26.00     50.50     41.31        9.19         84.4561
  28.00     42.50     44.09       −1.59          2.5281
  30.00     41.50     46.86       −5.36         28.7296
  sums                             0.00        182.736 = SSR

The standard error gives us a first handle on how well the fitted equation fits the sample data. But what is a "big" s and what is a "small" one depends on the context.
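Table 1 can be reproduced in a few lines. A sketch: the data values below are my reading of the table (the decimal points did not survive extraction, so treat the digits as reconstructed), with β̂₀ = 5.23509 and β̂₁ = 1.388 as given.

```python
import math

b0, b1 = 5.23509, 1.388
x = [10.65, 12.54, 13.00, 15.77, 16.00, 17.50, 18.00,
     18.70, 19.35, 19.48, 22.54, 26.00, 28.00, 30.00]
y = [19.99, 22.80, 23.50, 28.50, 23.90, 29.30, 28.50,
     36.50, 29.50, 29.00, 38.50, 50.50, 42.50, 41.50]

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # u_i = y_i - fitted_i
ssr = sum(u * u for u in resid)                        # sum of squared residuals
s = math.sqrt(ssr / (len(x) - 2))                      # standard error of the regression
print(round(ssr, 1), round(s, 2))  # 182.7 3.9 (matches the table's 182.736 up to rounding)
```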
The standard error is sensitive to the units of measurement of the dependent variable. A more standardized statistic, which also gives a measure of the goodness of fit of the estimated equation, is R². This statistic is calculated as follows:

R² = 1 − SSR/Σ(yᵢ − ȳ)² = 1 − SSR/SST

Note that SSR can be thought of as the "unexplained" variation in the dependent variable: the variation left over once the predictions of the regression equation are taken into account. The expression Σ(yᵢ − ȳ)², on the other hand, represents the total variation (total sum of squares, or SST) of the dependent variable around its mean value. So R² can be written as 1 minus the proportion of the variation in yᵢ that is unexplained; in other words, it shows the proportion of the variation in yᵢ that is accounted for by the estimated equation. As such, it must be bounded by 0 and 1: 0 ≤ R² ≤ 1. R² = 1 is a "perfect score," obtained only if the data points happen to lie exactly along a straight line; R² = 0 is a perfectly lousy score, indicating that xᵢ is absolutely useless as a predictor for yᵢ.

When you add an additional variable to a regression equation, there is no way it can raise the SSR, and in fact it's likely to lower the SSR somewhat, even if the added variable is not very relevant. And lowering the SSR means raising the R² value. One might therefore be tempted to add too many extraneous variables to a regression, if one were focused on achieving the maximum R². An alternative calculation, the "adjusted R-squared" or R̄², attaches a small penalty to adding more variables. Thus if adding an additional variable raises the R̄² for a regression, that's a better indication that it has improved the model than if it merely raises the plain, unadjusted R². The formula is

R̄² = 1 − (SSR/(n − k − 1)) / (SST/(n − 1)) = 1 − ((n − 1)/(n − k − 1))(1 − R²)

where k + 1 represents the number of parameters being estimated (2, in a simple regression).

To summarize so far: alongside the estimated regression coefficients β̂₀ and β̂₁, we should also examine the sum of squared residuals (SSR), the regression standard error (s), and the R² value (adjusted or unadjusted), in order to judge whether the best-fitting line does in fact fit the data to an adequate degree.
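Both statistics are one-liners once SSR and SST are in hand. A sketch (my own helpers), checked on the two boundary cases: a perfect fit gives R² = 1, and "predicting" the mean gives R² = 0.

```python
def r_squared(y, fitted):
    # R^2 = 1 - SSR/SST.
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ssr / sst

def adjusted_r_squared(r2, n, k):
    # k = number of regressors apart from the intercept (k + 1 parameters).
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))                      # 1.0: perfect fit
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))   # 0.0: no better than the mean
```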
3 Confidence intervals for regression coefficients

As stated above, even if the OLS math is performed correctly, there is no guarantee that the coefficients β̂₀ and β̂₁ thus obtained correspond exactly with the underlying parameters β₀ and β₁. Actually, such an exact correspondence is highly unlikely. The statistical issue here is a very general one: estimation is inevitably subject to sampling error. As we have seen, a confidence interval provides a means of quantifying the uncertainty produced by sampling error. Instead of simply stating "I found a sample mean income of $39,000 and that is my best guess at the population mean, although I know it is probably wrong," we can make a statement like "I found a sample mean of $39,000, and there is a 95 percent probability that my estimate is off the true parameter value by no more than $1200."

Confidence intervals for regression coefficients can be constructed in a similar manner. Suppose we come up with a slope estimate of β̂₁ = 9.0 using the OLS technique, and we want to quantify our uncertainty over the true slope parameter β₁ by drawing up a 95 percent confidence interval for this parameter. Provided our sample size is reasonably large, the rule of thumb is the same as before: the 95 percent confidence interval for β₁ is given by β̂₁ ± 2 standard errors. Our single best guess at β₁ (point estimate) is simply β̂₁, since the OLS technique yields unbiased estimates of the parameters (actually this is not always true, but we'll postpone consideration of tricky cases where OLS estimates are biased). And on exactly the same grounds as before, there is a 95 percent chance that our estimate β̂₁ will lie within 2 standard errors of its mean value, β₁. But how do we find the standard error of β̂₁? I shall not derive this rigorously, but give the formula along with an intuitive explanation. The standard error of β̂₁, written as se(β̂₁), and
not to be confused with the standard error of the regression, s, is given by the formula

se(β̂₁) = √(s² / Σ(xᵢ − x̄)²)

i.e. it is the square root of: the square of the regression standard error, divided by the total variation of the independent variable xᵢ around its mean. What are the various components of the calculation doing? First, note the general point that the larger is se(β̂₁), the wider will be the confidence interval, for any specified confidence level. Now, according to the formula, the larger is s, the larger will be se(β̂₁), and hence the wider the confidence interval for the true slope. This makes sense: s provides a measure of the degree of fit of the estimated equation, as discussed above. If the equation fits the data badly (large s), it stands to reason that we should have a relatively high degree of uncertainty over the true slope parameter. Secondly, the formula tells us that, other things equal, a high degree of variation of x makes for a smaller se(β̂₁), and so a tighter confidence interval. Why should this be? The more x has varied in our data sample, the better the chance we have of picking up any relationship that exists between x and y. Take an extreme case (and this is rather obvious): suppose that x happens not to have varied at all in our sample, i.e. Σ(xᵢ − x̄)² = 0. In that case we have no chance of detecting any influence of x on y. And the more the independent variable has moved, the more any influence it may have on the dependent variable should stand out against the "background noise" ûᵢ.

4 Example of confidence interval for the slope parameter

One example. Suppose we're interested in whether a positive linear relationship exists between xᵢ and yᵢ. We've obtained β̂₁ = 9.0 and se(β̂₁) = 1.2. The approximate 95 percent confidence interval for β₁ is then 9.0 ± 2(1.2) = 9.0 ± 2.4 = 6.6 to 11.4. This tells us that we can state, with at least 95 percent confidence, that β₁ > 0: there is a positive relationship. On the other hand, if we had obtained se(β̂₁) = 6.1, our interval would have been 9.0 ± 2(6.1) = 9.0 ± 12.2 = −3.2 to 21.2. In this case the interval straddles zero, and we cannot be confident, at the 95 percent level, that there exists a positive relationship.
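The formula for se(β̂₁) and the two-standard-error rule of thumb can be sketched together (helper names are mine; the interval values come from the example above):

```python
import math

def slope_se(s, x):
    # se(b1) = sqrt(s^2 / sum((x_i - xbar)^2))
    xbar = sum(x) / len(x)
    return math.sqrt(s ** 2 / sum((xi - xbar) ** 2 for xi in x))

def rough_ci(b1, se):
    # Large-sample rule of thumb: point estimate +/- 2 standard errors.
    return round(b1 - 2 * se, 1), round(b1 + 2 * se, 1)

print(rough_ci(9.0, 1.2))  # (6.6, 11.4): entirely positive
print(rough_ci(9.0, 6.1))  # (-3.2, 21.2): straddles zero
```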
Notes on Probability
Allin Cottrell

The classical approach

The probability of any event A is the number of outcomes that correspond to A, n(A), divided by the total number of equiprobable outcomes, n: the proportion of the total outcomes for which A occurs.

P(A) = n(A)/n,  0 ≤ P(A) ≤ 1

Example: let A be the event of getting an even number when rolling a fair die. Three outcomes correspond to this event, namely 2, 4 and 6, out of a total of six possible outcomes, so P(A) = 3/6 = 1/2.

Complementary probabilities

If the probability of some event A is P(A), then the probability that event A does not occur, P(Ā), must be P(Ā) = 1 − P(A). Example: if the chance of rain for tomorrow is 80 percent, the chance that it doesn't rain tomorrow must be 20 percent. When trying to compute a given probability, it is sometimes much easier to compute the complementary probability first, then subtract from 1 to get the desired answer.

Addition rule

A means of calculating the probability of A ∪ B, the probability that either of two events occurs. With equiprobable outcomes, P(A) = n(A)/n and P(B) = n(B)/n. First approximation: P(A ∪ B) = n(A)/n + n(B)/n. Problem: n(A) + n(B) may overstate the number of outcomes corresponding to A ∪ B; we must subtract the number of outcomes contained in the intersection A ∩ B, namely n(AB). Thus the full version of the addition rule is

P(A ∪ B) = (n(A) + n(B) − n(AB))/n = P(A) + P(B) − P(A ∩ B)

Multiplication rule

Clearly,

n(AB)/n = (n(A)/n) × (n(AB)/n(A))

since the RHS is just the LHS multiplied by n(A)/n(A) = 1. Here n(AB)/n = P(A ∩ B) is the probability that A and B both occur; n(A)/n = P(A) represents the marginal (unconditional) probability of A; and n(AB)/n(A) represents the number of outcomes in A ∩ B over the number of outcomes in A, or the probability of B given A. The general form of the multiplication rule for joint probabilities is therefore

P(A ∩ B) = P(A) × P(B|A)

Special case: A and B are independent. Then P(B|A) equals the marginal probability, P(B), and the rule simplifies to P(A ∩ B) = P(A) × P(B).
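Both rules can be checked by brute-force counting on a single die. A sketch (my own events, not from the notes): A = "even," B = "greater than 4," with exact fractions so no rounding intrudes.

```python
from fractions import Fraction as F

outcomes = range(1, 7)
A = {o for o in outcomes if o % 2 == 0}  # {2, 4, 6}
B = {o for o in outcomes if o > 4}       # {5, 6}
n = 6

def prob(s):
    return F(len(s), n)

# Addition rule: P(A u B) = P(A) + P(B) - P(A n B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# Multiplication rule: P(A n B) = P(A) * P(B|A)
p_b_given_a = F(len(A & B), len(A))
assert prob(A & B) == prob(A) * p_b_given_a
print(prob(A | B))  # 2/3: the outcomes {2, 4, 5, 6}
```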
Exercises

The probability of snow tomorrow is 20%, and the probability of all members of ECN 215 being present in class is 8%, let us say. What is the probability of both these events occurring?

A researcher is experimenting with several regression equations. Unknown to him, all of his formulations are in fact worthless, but nonetheless there is a 5 percent chance that each regression will, by the luck of the draw, appear to come up with significant results. Call such an event a "success." If the researcher tries 10 equations, what is the probability that he has exactly one success? What is the probability of at least one success?

Marginal probabilities

P(A) = Σᵢ₌₁ᴺ P(A|Eᵢ) × P(Eᵢ)

where E₁, ..., E_N represent N mutually exclusive and jointly exhaustive events. Example: with E₁ = snow and E₂ = no snow, the products P(A|Eᵢ) × P(Eᵢ) here work out to 0.12 and 0.72, so the marginal probability is P(A) = 0.12 + 0.72 = 0.84.

Conditional probabilities

In general P(A|B) ≠ P(B|A): the probability of A given B is not the same as the probability of B given A. Example: the police department of a certain city finds that 60 percent of cyclists involved in accidents at night are wearing light-colored clothing. How can we express this in terms of conditional probability? Should we conclude that wearing light-colored clothing is dangerous?

Discrete random variables

The probability distribution for a random variable X is a mapping from the possible values of X to the probability that X takes on each of those values. Uniform distribution (one die):

  xᵢ            1    2    3    4    5    6
  P(X = xᵢ)    1/6  1/6  1/6  1/6  1/6  1/6
  xᵢP(X = xᵢ)  1/6  2/6  3/6  4/6  5/6  6/6

The mean, or expected value, is the probability-weighted sum of the possible values of the random variable:

E(X) = μ = Σᵢ₌₁ᴺ xᵢ P(X = xᵢ)

For one die, μ = 21/6 = 3.5.

Variance

The variance is the probability-weighted sum of the squared deviations of the possible values of the random variable from its mean, or the expected value of the squared deviation from the mean:

Var(X) = E[(X − μ)²] = Σᵢ₌₁ᴺ (xᵢ − μ)² P(X = xᵢ)

Expanding the square:

E(X − μ)² = E(X² − 2Xμ + μ²) = E(X²) − 2μE(X) + μ² = E(X²) − 2μ² + μ² = E(X²) − μ² = E(X²) − [E(X)]²

Note that in general E(X²) ≠ [E(X)]².

Example: variance for one die.

  xᵢ   P(X = xᵢ)   xᵢ − μ   (xᵢ − μ)²   (xᵢ − μ)² P(X = xᵢ)
  1    1/6         −2.5      6.25        1.0417
  2    1/6         −1.5      2.25        0.3750
  3    1/6         −0.5      0.25        0.0833
  4    1/6          0.5      0.25        0.0833
  5    1/6          1.5      2.25        0.3750
  6    1/6          2.5      6.25        1.0417
  sums              0                    2.917 = Var(X)

Two dice

The sample space consists of the 36 equiprobable pairs (1,1), (2,1), ..., (6,6). The sample mean of each pair, x̄ = (d₁ + d₂)/2, takes the values 1.0, 1.5, 2.0, ..., 6.0, with distribution:

  x̄ᵢ           1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
  P(x̄ = x̄ᵢ)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

and E(x̄) = 3.5. The deviations from the mean run from −2.5 to 2.5; the probability-weighted squared deviations are 0.17, 0.22, 0.19, 0.11, 0.03, 0.00, 0.03, 0.11, 0.19, 0.22, 0.17, summing to Var(x̄) ≈ 1.46, half the one-die variance.

(Figures: histograms of the distribution of the sample mean for two, three, four and five dice; as the number of dice grows, the distribution becomes increasingly bell-shaped.)

Measures of Association

The covariance of X and Y is the expected value of the cross product: the deviation of X from its mean times the deviation of Y from its mean.

Cov(X, Y) = σ_XY = E[(X − E(X))(Y − E(Y))]

It measures the linear association between X and Y. (Figure: the four quadrants around the point (E(X), E(Y)); cross products are positive in quadrants I and III, negative in II and IV.)

The correlation coefficient for two variables X and Y is a scaled version of covariance: divide through by the product of the standard deviations of the two variables.

ρ_XY = Cov(X, Y) / √(Var(X)Var(Y))

Note that −1 ≤ ρ ≤ 1.
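The one-die mean and variance, and the variance identity above, can be checked with exact fractions (a sketch of the tables, not part of the original notes):

```python
from fractions import Fraction as F

values = range(1, 7)
p = F(1, 6)  # uniform distribution for one fair die

mu = sum(x * p for x in values)                     # E(X)
var = sum((x - mu) ** 2 * p for x in values)        # E[(X - mu)^2]
var_alt = sum(x * x * p for x in values) - mu ** 2  # E(X^2) - mu^2

print(mu, var, var == var_alt)      # 7/2 35/12 True  (35/12 ~ 2.917)
print(round(float(var / 2), 2))     # 1.46: variance of the mean of two dice
```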
Continuous random variables

Let the random variable X = the number towards which the spinner points when it comes to rest (a dial marked from 0 to 12). To find probabilities, think in terms of fractions of the total measure:

P(0 < X < 3) = 3/12 = 1/4
P(7 < X < 9) = 2/12 = 1/6

Cumulative density function, or cdf:

F(x) = P(X < x)

the probability that the random variable X has a value less than some specified value x. For the spinner example, F(x) = x/12.

Probability density function, or pdf:

f(x) = F′(x)

the derivative of the cdf with respect to x. We determine the probability of X falling into any given range by taking the integral of the pdf over that interval; for the spinner, f(x) = 1/12 over the whole dial. In general,

P(x₁ < X < x₂) = ∫ from x₁ to x₂ of f(x) dx

Gaussian distribution

Central Limit Theorem: if a random variable X represents the summation of numerous independent random factors, then, regardless of the specific distribution of the individual factors, X will tend to follow the normal or Gaussian distribution.
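The spinner probabilities are simple enough to encode directly (a sketch; the function names are mine, and the uniform cdf F(x) = x/12 follows from the "fraction of the total measure" idea):

```python
from fractions import Fraction as F

def spinner_prob(a, b, total=12):
    """P(a < X < b) for the uniform spinner, 0 <= a <= b <= total."""
    return F(b - a, total)

def spinner_cdf(x, total=12):
    """F(x) = P(X < x) = x/total."""
    return F(x, total)

print(spinner_prob(0, 3))  # 1/4
print(spinner_prob(7, 9))  # 1/6
print(spinner_cdf(6))      # 1/2
```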
(Figure: the standard normal density, μ = 0, σ = 1.) The general formula for the normal pdf is

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞

The standard normal distribution is obtained by setting μ = 0 and σ = 1; its pdf is

f(x) = (1/√(2π)) e^(−x²/2),  −∞ < x < ∞

Commit to memory:

P(μ − 2σ < x < μ + 2σ) ≈ 0.95
P(μ − 3σ < x < μ + 3σ) ≈ 0.997

A compact notation for saying that x is distributed normally with mean μ and variance σ² is x ~ N(μ, σ²).

Contents: the classical approach; discrete random variables; addition rule; multiplication rule; marginal probabilities; conditional probabilities; Gaussian distribution; measures of association; continuous random variables.