This 269-page set of class notes was uploaded by Ms. Ari Lesch on Saturday, September 26, 2015. The notes belong to ECON 671 (Econometrics I) at Iowa State University, taught by Staff in Fall. Since its upload, it has received 37 views. For similar materials see /class/214442/econ-671-iowa-state-university in Economics at Iowa State University.
MULTIVARIATE PROBABILITY DISTRIBUTIONS

1. PRELIMINARIES

1.1. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined on this sample space. We will assign an indicator random variable to the result of tossing the coin: if it comes up heads we assign it the value one, and if it comes up tails we assign it the value zero. Consider the following random variables:

X1: the number of dots appearing on the die;
X2: the sum of the number of dots on the die and the indicator for the coin;
X3: the value of the indicator for tossing the coin;
X4: the product of the number of dots on the die and the indicator for the coin.

There are twelve sample points associated with this experiment:

E1 = (1, H)   E2 = (2, H)   E3 = (3, H)   E4 = (4, H)   E5 = (5, H)   E6 = (6, H)
E7 = (1, T)   E8 = (2, T)   E9 = (3, T)   E10 = (4, T)  E11 = (5, T)  E12 = (6, T)

Random variable X1 has six possible outcomes, each with probability 1/6. Random variable X3 has two possible outcomes, each with probability 1/2. Consider the values of X2 for each of the sample points. The possible outcomes and the probabilities for X2 are as follows.

TABLE 1. Probability of X2

Value of Random Variable   1     2    3    4    5    6    7
Probability               1/12  1/6  1/6  1/6  1/6  1/6  1/12

Date: August 9, 2004.

1.2. Bivariate Random Variables. Now consider the intersection of X1 = 3 and X2 = 3. We call this intersection a bivariate random variable. For a general bivariate case we write this as P(X1 = x1, X2 = x2). We can write the probability distribution in the form of a table as follows for the above example.

TABLE 2. Joint Probability of X1 and X2

         X2 = 1    2     3     4     5     6     7
X1 = 1    1/12   1/12   0     0     0     0     0
X1 = 2     0     1/12  1/12   0     0     0     0
X1 = 3     0      0    1/12  1/12   0     0     0
X1 = 4     0      0     0    1/12  1/12   0     0
X1 = 5     0      0     0     0    1/12  1/12   0
X1 = 6     0      0     0     0     0    1/12  1/12

For the example, P(X1 = 3, X2 = 3) = 1/12, which is the probability of sample point E9.
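The twelve-point sample space above can be enumerated exactly with rational arithmetic, reproducing Tables 1 and 2. This is a quick sketch; the helper names (`space`, `joint_pmf`) are mine, not from the notes.

```python
from fractions import Fraction
from itertools import product

# The 12 equally likely sample points: (die face, coin indicator),
# with the indicator equal to 1 for heads and 0 for tails.
space = list(product(range(1, 7), [0, 1]))

def joint_pmf():
    """Joint pmf p(x1, x2) with X1 = die face, X2 = die face + coin indicator."""
    p = {}
    for die, coin in space:
        x1, x2 = die, die + coin
        p[(x1, x2)] = p.get((x1, x2), Fraction(0)) + Fraction(1, 12)
    return p

p = joint_pmf()

# Marginal pmf of X2 (Table 1): 1/12 at the endpoints 1 and 7, 1/6 in between.
p2 = {x2: sum(v for (a, b), v in p.items() if b == x2) for x2 in range(1, 8)}
```

Each nonzero cell of Table 2 comes out as 1/12, and the column sums reproduce Table 1.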
2. PROBABILITY DISTRIBUTIONS FOR DISCRETE MULTIVARIATE RANDOM VARIABLES

2.1. Definition. If X1 and X2 are discrete random variables, the function given by p(x1, x2) = P(X1 = x1, X2 = x2), for each pair of values (x1, x2) within the range of X1 and X2, is called the joint (or bivariate) probability distribution for X1 and X2. Specifically we write

p(x1, x2) = P(X1 = x1, X2 = x2),   −∞ < x1 < ∞, −∞ < x2 < ∞.    (1)

In the single-variable case, the probability function for a discrete random variable X assigns non-zero probabilities to a countable number of distinct values of X in such a way that the sum of the probabilities is equal to 1. Similarly, in the bivariate case, the joint probability function p(x1, x2) assigns non-zero probabilities to only a countable number of pairs of values (x1, x2). Further, the non-zero probabilities must sum to 1.

2.2. Properties of the Joint Probability or Density Function.

Theorem 1. If X1 and X2 are discrete random variables with joint probability function p(x1, x2), then
(i) p(x1, x2) ≥ 0 for all (x1, x2);
(ii) Σ_{(x1, x2)} p(x1, x2) = 1, where the sum is over all values (x1, x2) that are assigned non-zero probabilities.

Once the joint probability function has been determined for discrete random variables X1 and X2, calculating joint probabilities involving X1 and X2 is straightforward.

2.3. Example 1. Roll a red die and a green die. Let

X1 = number of dots on the red die,
X2 = number of dots on the green die.

There are 36 points in the sample space.

TABLE 3. Possible Outcomes of Rolling a Red Die and a Green Die (first number in each pair is the number on the red die)

Red\Green   1    2    3    4    5    6
   1       11   12   13   14   15   16
   2       21   22   23   24   25   26
   3       31   32   33   34   35   36
   4       41   42   43   44   45   46
   5       51   52   53   54   55   56
   6       61   62   63   64   65   66

The probability of (1, 1) is 1/36. The probability of (6, 3) is also 1/36. Now consider P(2 ≤ X1 ≤ 3, 1 ≤ X2 ≤ 2). This is given as

P(2 ≤ X1 ≤ 3, 1 ≤ X2 ≤ 2) = p(2, 1) + p(2, 2) + p(3, 1) + p(3, 2) = 4/36 = 1/9.

2.4. Example 2. Consider the example of tossing a coin and rolling a die from section 1. Now consider P(2 ≤ X1 ≤ 4, 3 ≤ X2 ≤ 5). This is given as

P(2 ≤ X1 ≤ 4, 3 ≤ X2 ≤ 5) = p(2, 3) + p(2, 4) + p(2, 5) + p(3, 3) + p(3, 4) + p(3, 5) + p(4, 3) + p(4, 4) + p(4, 5)
                           = 1/12 + 0 + 0 + 1/12 + 1/12 + 0 + 0 + 1/12 + 1/12
                           = 5/12.
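Both probabilities above can be checked by brute-force enumeration of the two sample spaces (a sketch; the variable names are mine):

```python
from fractions import Fraction
from itertools import product

# Example 1: two fair dice. P(2 <= X1 <= 3, 1 <= X2 <= 2) = 4/36 = 1/9.
p_dice = sum(Fraction(1, 36)
             for r, g in product(range(1, 7), repeat=2)
             if 2 <= r <= 3 and 1 <= g <= 2)

# Example 2: coin-and-die experiment with X1 = die face, X2 = face + indicator.
p_coin_die = sum(Fraction(1, 12)
                 for d, c in product(range(1, 7), [0, 1])
                 if 2 <= d <= 4 and 3 <= d + c <= 5)
```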
2.5. Example 3. Two caplets are selected at random from a bottle containing three aspirin, two sedative, and four cold caplets. If X and Y are, respectively, the numbers of aspirin and sedative caplets included among the two caplets drawn from the bottle, find the probabilities associated with all possible pairs of values of X and Y.

The possible pairs are (0, 0), (0, 1), (1, 0), (1, 1), (0, 2), and (2, 0). To find the probability associated with (1, 0), for example, observe that we are concerned with the event of getting one of the three aspirin caplets, none of the two sedative caplets, and hence one of the four cold caplets. The number of ways in which this can be done is

C(3,1)·C(2,0)·C(4,1) = 12,

and the total number of ways in which two of the nine caplets can be selected is

C(9,2) = 36.

Since those possibilities are all equally likely, by virtue of the assumption that the selection is random, it follows that the probability associated with (1, 0) is 12/36 = 1/3. Similarly, the probability associated with (1, 1) is

C(3,1)·C(2,1)·C(4,0)/36 = 6/36 = 1/6,

and continuing this way we obtain the values shown in the following table.

TABLE 4. Joint Probability of Drawing Aspirin (X) and Sedative (Y) Caplets

         x = 0   x = 1   x = 2
y = 0    6/36   12/36    3/36
y = 1    8/36    6/36     0
y = 2    1/36     0       0

We can also represent this joint probability distribution as a formula:

p(x, y) = C(3,x)·C(2,y)·C(4, 2−x−y) / C(9,2),   x = 0, 1, 2;  y = 0, 1, 2;  0 ≤ x + y ≤ 2.

3. DISTRIBUTION FUNCTIONS FOR DISCRETE MULTIVARIATE RANDOM VARIABLES

3.1. Definition of the Distribution Function. If X1 and X2 are discrete random variables, the function given by

F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2) = Σ_{u1 ≤ x1} Σ_{u2 ≤ x2} p(u1, u2),   −∞ < x1 < ∞, −∞ < x2 < ∞,    (2)

where p(u1, u2) is the value of the joint probability function of X1 and X2 at (u1, u2), is called the joint distribution function, or the joint cumulative distribution, of X1 and X2.

3.2. Examples.

3.2.1. Example 1. Consider the experiment of tossing a red and a green die, where X1 is the number on the red die and X2 is the number on the green die. Now find F(2, 3) = P(X1 ≤ 2, X2 ≤ 3). This is given by summing as in the definition, equation (2):

F(2, 3) = P(X1 ≤ 2, X2 ≤ 3) = Σ_{u1 ≤ 2} Σ_{u2 ≤ 3} p(u1, u2)
        = p(1, 1) + p(1, 2) + p(1, 3) + p(2, 1) + p(2, 2) + p(2, 3)
        = 6/36 = 1/6.
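The caplet pmf of section 2.5 and the CDF definition (2) can both be checked with exact rational arithmetic (a sketch; `p_caplets` and `F_caplets` are my names):

```python
from fractions import Fraction
from math import comb

def p_caplets(x, y):
    """Joint pmf: choose x of 3 aspirin, y of 2 sedative, and 2-x-y of 4 cold
    caplets, out of C(9,2) equally likely two-caplet draws."""
    if x < 0 or y < 0 or x + y > 2:
        return Fraction(0)
    return Fraction(comb(3, x) * comb(2, y) * comb(4, 2 - x - y), comb(9, 2))

def F_caplets(a, b):
    """Joint CDF from definition (2): sum of p over u1 <= a, u2 <= b."""
    return sum(p_caplets(x, y) for x in range(a + 1) for y in range(b + 1))
```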
3.2.2. Example 2. Consider Example 3 from Section 2. The joint probability distribution is given in Table 4, which is repeated here for convenience.

TABLE 4 (repeated). Joint Probability of Drawing Aspirin (X) and Sedative (Y) Caplets

         x = 0   x = 1   x = 2
y = 0    6/36   12/36    3/36
y = 1    8/36    6/36     0
y = 2    1/36     0       0

The joint probability distribution is p(x, y) = C(3,x)·C(2,y)·C(4, 2−x−y)/C(9,2) for 0 ≤ x + y ≤ 2. For this problem, find F(1, 2) = P(X ≤ 1, Y ≤ 2). This is given by

F(1, 2) = P(X ≤ 1, Y ≤ 2) = Σ_{u1 ≤ 1} Σ_{u2 ≤ 2} p(u1, u2)
        = p(0, 0) + p(0, 1) + p(0, 2) + p(1, 0) + p(1, 1) + p(1, 2)
        = 6/36 + 8/36 + 1/36 + 12/36 + 6/36 + 0
        = 33/36.

4. PROBABILITY DISTRIBUTIONS FOR CONTINUOUS BIVARIATE RANDOM VARIABLES

4.1. Definition of a Joint Probability Density Function. A bivariate function with values f(x1, x2), defined over the x1x2-plane, is called a joint probability density function of the continuous random variables X1 and X2 if and only if

P((X1, X2) ∈ A) = ∬_A f(x1, x2) dx1 dx2   for any region A in the x1x2-plane.    (3)

4.2. Properties of the Joint Probability or Density Function in the Continuous Case.

Theorem 2. A bivariate function can serve as a joint probability density function of a pair of continuous random variables X1 and X2 if its values f(x1, x2) satisfy the conditions
(i) f(x1, x2) ≥ 0 for −∞ < x1 < ∞, −∞ < x2 < ∞;
(ii) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2) dx1 dx2 = 1.

4.3. Example of a Joint Probability Density Function. Given the joint probability density function

f(x1, x2) = 6x1²x2  for 0 < x1 < 1, 0 < x2 < 1;  0 elsewhere,

of the two random variables X1 and X2, find P((X1, X2) ∈ A), where A is the region {(x1, x2) : 0 < x1 < 3/4, 1/3 < x2 < 2}.

We find the probability by integrating over the relevant region. Because the density is zero for x2 ≥ 1, the effective upper limit on x2 is 1:

P(0 < X1 < 3/4, 1/3 < X2 < 2) = ∫_{1/3}^{2} ∫_{0}^{3/4} f(x1, x2) dx1 dx2 = ∫_{1/3}^{1} ∫_{0}^{3/4} 6x1²x2 dx1 dx2.

Integrate the inner integral first:

∫_{0}^{3/4} 6x1²x2 dx1 = 2x1³x2 |_{0}^{3/4} = (54/64)x2 = (27/32)x2.

Now integrate the remaining integral:

P(0 < X1 < 3/4, 1/3 < X2 < 2) = ∫_{1/3}^{1} (27/32)x2 dx2 = (27/64)x2² |_{1/3}^{1} = (27/64)(1 − 1/9) = 3/8.
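The worked integral can be sketched numerically with a composite midpoint rule (the function names `f_43` and `prob_43` are mine):

```python
# Numerical check of P(0 < X1 < 3/4, 1/3 < X2 < 2) for f(x1, x2) = 6*x1^2*x2
# supported on the open unit square.
def f_43(x1, x2):
    return 6.0 * x1 * x1 * x2 if 0.0 < x1 < 1.0 and 0.0 < x2 < 1.0 else 0.0

def prob_43(a1, b1, a2, b2, n=400):
    """Composite midpoint rule over the rectangle [a1, b1] x [a2, b2]."""
    h1, h2 = (b1 - a1) / n, (b2 - a2) / n
    total = 0.0
    for i in range(n):
        x1 = a1 + (i + 0.5) * h1
        for j in range(n):
            total += f_43(x1, a2 + (j + 0.5) * h2)
    return total * h1 * h2

p_est = prob_43(0.0, 0.75, 1.0 / 3.0, 2.0)  # should be close to 3/8
```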
This probability is the volume under the surface f(x1, x2) = 6x1²x2 and above the rectangular set {(x1, x2) : 0 < x1 < 3/4, 1/3 < x2 < 1} in the x1x2-plane.

4.4. Definition of a Joint Distribution Function. If X1 and X2 are continuous random variables, the function given by

F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2) = ∫_{−∞}^{x2} ∫_{−∞}^{x1} f(u1, u2) du1 du2,   −∞ < x1 < ∞, −∞ < x2 < ∞,    (4)

where f(u1, u2) is the value of the joint probability density function of X1 and X2 at (u1, u2), is called the joint distribution function, or the joint cumulative distribution, of X1 and X2. If the joint distribution function is continuous everywhere and partially differentiable with respect to x1 and x2 for all but a finite set of values, then

f(x1, x2) = ∂²F(x1, x2)/(∂x1 ∂x2)    (5)

wherever these partial derivatives exist.

4.5. Properties of the Joint Distribution Function.

Theorem 3. If X1 and X2 are random variables with joint distribution function F(x1, x2), then
(i) F(−∞, −∞) = F(−∞, x2) = F(x1, −∞) = 0;
(ii) F(∞, ∞) = 1;
(iii) if a < b and c < d, then F(a, c) ≤ F(b, d);
(iv) if a > x1 and b > x2, then F(a, b) − F(a, x2) − F(x1, b) + F(x1, x2) ≥ 0.

Part (iv) follows because

F(a, b) − F(a, x2) − F(x1, b) + F(x1, x2) = P(x1 < X1 ≤ a, x2 < X2 ≤ b) ≥ 0.

Note also that

F(∞, ∞) ≡ lim_{x1→∞} lim_{x2→∞} F(x1, x2) = 1

implies that the joint density function f(u1, u2) must be such that the integral of f over all values of (u1, u2) is 1.

4.6. Examples of a Joint Distribution Function and Density Functions.

4.6.1. Deriving a Distribution Function from a Joint Density Function. Consider a joint density function for X1 and X2 given by

f(x1, x2) = x1 + x2  for 0 < x1 < 1, 0 < x2 < 1;  0 elsewhere.

This has a positive value in the square bounded by the horizontal and vertical axes and the vertical and horizontal lines at one; it is zero elsewhere. We will therefore need to find the value of the distribution function for five different regions: the second, third, and fourth quadrants; the square defined by the vertical and horizontal lines at one; the area between the vertical axis and a vertical line at one and above a horizontal line at one in the first quadrant; the area between the horizontal axis and a horizontal line at one and to the right of a
vertical line at one in the first quadrant; and the area in the first quadrant not previously mentioned.

[Figure: the x1x2-plane divided into the five regions just described, with the unit square in the first quadrant.]

We find the distribution function by integrating the joint density function. If either x1 < 0 or x2 < 0, it follows that F(x1, x2) = 0. For 0 < x1 < 1 and 0 < x2 < 1 we get

F(x1, x2) = ∫_{0}^{x2} ∫_{0}^{x1} (s + t) ds dt = (1/2)x1x2(x1 + x2);

for x1 > 1 and 0 < x2 < 1 we get

F(x1, x2) = ∫_{0}^{x2} ∫_{0}^{1} (s + t) ds dt = (1/2)x2(x2 + 1);

for 0 < x1 < 1 and x2 > 1 we get

F(x1, x2) = ∫_{0}^{1} ∫_{0}^{x1} (s + t) ds dt = (1/2)x1(x1 + 1);

and for x1 > 1 and x2 > 1 we get

F(x1, x2) = ∫_{0}^{1} ∫_{0}^{1} (s + t) ds dt = 1.

Because the joint distribution function is everywhere continuous, the boundaries between any two of these regions can be included in either one, and we can write

F(x1, x2) = 0                        for x1 ≤ 0 or x2 ≤ 0,
            (1/2)x1x2(x1 + x2)       for 0 < x1 < 1, 0 < x2 < 1,
            (1/2)x2(x2 + 1)          for x1 ≥ 1, 0 < x2 < 1,
            (1/2)x1(x1 + 1)          for 0 < x1 < 1, x2 ≥ 1,
            1                        for x1 ≥ 1, x2 ≥ 1.

4.7. Deriving a Joint Density Function from a Distribution Function. Consider two random variables X1 and X2 whose joint distribution function is given by

F(x1, x2) = (1 − e^{−x1})(1 − e^{−x2})  for x1 > 0 and x2 > 0;  0 elsewhere.

Partial differentiation yields

∂²F(x1, x2)/(∂x1 ∂x2) = e^{−(x1 + x2)}

for x1 > 0 and x2 > 0, and 0 elsewhere, so we find that the joint probability density of X1 and X2 is given by

f(x1, x2) = e^{−(x1 + x2)}  for x1 > 0 and x2 > 0;  0 elsewhere.
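Equation (5) can be checked numerically for the distribution function derived in section 4.6.1: a finite-difference mixed partial of F should recover the density x1 + x2 inside the unit square. This is a sketch; the helper names are mine. Clamping the arguments to [0, 1] reproduces the piecewise formula above in one line.

```python
# CDF of section 4.6.1; clamping to [0,1] implements all five regions at once.
def F_unit(x1, x2):
    a = min(max(x1, 0.0), 1.0)
    b = min(max(x2, 0.0), 1.0)
    return 0.5 * a * b * (a + b)

def mixed_partial(x1, x2, h=1e-4):
    """Central finite-difference estimate of d^2 F / (dx1 dx2), cf. equation (5)."""
    return (F_unit(x1 + h, x2 + h) - F_unit(x1 + h, x2 - h)
            - F_unit(x1 - h, x2 + h) + F_unit(x1 - h, x2 - h)) / (4 * h * h)
```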
5. MULTIVARIATE DISTRIBUTIONS FOR CONTINUOUS RANDOM VARIABLES

5.1. Joint Density of Several Random Variables. The vector (X1, X2, ..., Xk) is said to be a k-dimensional continuous random variable if there exists a function f(x1, x2, ..., xk) ≥ 0 such that

F(x1, x2, ..., xk) = ∫_{−∞}^{xk} ··· ∫_{−∞}^{x1} f(u1, u2, ..., uk) du1 ··· duk    (6)

for all (x1, x2, ..., xk), where

F(x1, x2, ..., xk) = P(X1 ≤ x1, X2 ≤ x2, ..., Xk ≤ xk).

The function f is defined to be a joint probability density function. It has the following properties:

f(x1, x2, ..., xk) ≥ 0,
∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x1, x2, ..., xk) dx1 ··· dxk = 1.    (7)

In order to make clear the variables over which f is defined, it is sometimes written

f(x1, x2, ..., xk) = f_{X1 X2 ... Xk}(x1, x2, ..., xk).    (8)

6. MARGINAL DISTRIBUTIONS

6.1. Example Problem. Consider the example of tossing a coin and rolling a die from section 1. The probability of any particular pair (x1, x2) is given in Table 5.

TABLE 5. Joint and Marginal Probabilities of X1 and X2

         X2 = 1    2     3     4     5     6     7   | Total
X1 = 1    1/12   1/12   0     0     0     0     0   |  1/6
X1 = 2     0     1/12  1/12   0     0     0     0   |  1/6
X1 = 3     0      0    1/12  1/12   0     0     0   |  1/6
X1 = 4     0      0     0    1/12  1/12   0     0   |  1/6
X1 = 5     0      0     0     0    1/12  1/12   0   |  1/6
X1 = 6     0      0     0     0     0    1/12  1/12 |  1/6
Total     1/12   1/6   1/6   1/6   1/6   1/6   1/12 |   1

Notice that we have summed the columns and the rows and placed these sums at the bottom and right-hand side of the table. The sum in the first column is the probability that X2 = 1. The sum in the sixth row is the probability that X1 = 6. Specifically, the column totals are the probabilities that X2 will take on the values 1, 2, 3, ..., 7. They are the values

p2(x2) = Σ_{x1=1}^{6} p(x1, x2)   for x2 = 1, 2, 3, ..., 7.

In the same way, the row totals are the probabilities that X1 will take on the values in its space. Because these numbers are computed in the margin of the table, they are called marginal probabilities.

6.2. Marginal Distributions for Discrete Random Variables. If X1 and X2 are discrete random variables and p(x1, x2) is the value of their joint probability distribution at (x1, x2), the function given by

g(x1) = Σ_{x2} p(x1, x2)    (9)

for each x1 within the range of X1 is called the marginal distribution of X1. Correspondingly, the function given by

h(x2) = Σ_{x1} p(x1, x2)    (10)

for each x2 within the range of X2 is called the marginal distribution of X2.

6.3. Marginal Distributions for Continuous Random Variables. If X and Y are jointly continuous random variables, then the functions fX(x) and fY(y) are called the marginal probability density functions. The subscripts remind us that fX is defined for the random variable X. Intuitively, the marginal density is the density that results when we ignore any information about the random outcome Y. The marginal densities are obtained by integration of the joint density:

fX(x) = ∫_{−∞}^{∞} f_{XY}(x, y) dy,    (11)
fY(y) = ∫_{−∞}^{∞} f_{XY}(x, y) dx.    (12)

In a similar fashion, for a k-dimensional random variable (X1, ..., Xk),

f_{X1}(x1) = ∫ ··· ∫ f(x1, x2, ..., xk) dx2 dx3 ··· dxk,
f_{X2}(x2) = ∫ ··· ∫ f(x1, x2, ..., xk) dx1 dx3 ··· dxk.
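The row and column sums of Table 5, i.e. equations (9) and (10), can be computed exactly (a sketch; `joint`, `g`, `h` are my names):

```python
from fractions import Fraction

# Joint pmf of the coin-and-die example: p = 1/12 when x2 = x1 or x2 = x1 + 1.
joint = {(x1, x1 + c): Fraction(1, 12) for x1 in range(1, 7) for c in (0, 1)}

# Row sums give the marginal of X1 (eq. (9)); column sums give that of X2 (eq. (10)).
g = {x1: sum(v for (a, b), v in joint.items() if a == x1) for x1 in range(1, 7)}
h = {x2: sum(v for (a, b), v in joint.items() if b == x2) for x2 in range(1, 8)}
```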
6.4. Example 1. Let the joint density of two random variables X1 and X2 be given by

f(x1, x2) = 2x2 e^{−x1}  for x1 ≥ 0, 0 ≤ x2 ≤ 1;  0 otherwise.

What are the marginal densities of x1 and x2? First find the marginal density for x1:

f1(x1) = ∫_{0}^{1} 2x2 e^{−x1} dx2 = x2² e^{−x1} |_{0}^{1} = e^{−x1} − 0 = e^{−x1}.

Now find the marginal density for x2:

f2(x2) = ∫_{0}^{∞} 2x2 e^{−x1} dx1 = −2x2 e^{−x1} |_{0}^{∞} = 0 − (−2x2 e^{0}) = 2x2.

6.5. Example 2. Let the joint density of two random variables X and Y be given by

f(x, y) = (x + 4y)/6  for 0 < x < 2, 0 < y < 1;  0 otherwise.

What are the marginal densities of x and y? First find the marginal density for x:

fX(x) = ∫_{0}^{1} (x + 4y)/6 dy = (xy/6 + y²/3) |_{0}^{1} = (x + 2)/6.

Now find the marginal density for y:

fY(y) = ∫_{0}^{2} (x + 4y)/6 dx = (x²/12 + 2xy/3) |_{0}^{2} = 1/3 + 4y/3 = (1 + 4y)/3.

7. CONDITIONAL DISTRIBUTIONS

7.1. Conditional Probability Functions for Discrete Distributions. We have previously shown that the conditional probability of A given B can be obtained by dividing the probability of the intersection by the probability of B; specifically,

P(A | B) = P(A ∩ B)/P(B).    (13)

Now consider two random variables X and Y. We can write the probability that X = x given that Y = y as

P(X = x | Y = y) = p(x, y)/h(y),   provided h(y) ≠ 0,    (14)

where p(x, y) is the value of the joint probability distribution of X and Y at (x, y), and h(y) is the value of the marginal distribution of Y at y. We can then define a conditional distribution of X given Y = y as follows: if p(x, y) is the value of the joint probability distribution of the discrete random variables X and Y at (x, y), and h(y) is the value of the marginal distribution of Y at y, then the function given by

p(x | y) = p(x, y)/h(y),   h(y) ≠ 0,    (15)

for each x within the range of X, is called the conditional distribution of X given Y = y.
7.2. Example for a Discrete Distribution. Consider the example of tossing a coin and rolling a die from section 1, where x1 is the value on the die and x2 is the sum of the number on the die and an indicator that is one if the coin is a head and zero otherwise. The joint and marginal probabilities are given in Table 5. Consider the probability that X1 = 3 given that X2 = 4. We compute this as follows:

p(3 | 4) = p(3, 4)/h(4) = (1/12)/(1/6) = 1/2.

We can then make a table for the conditional probability function for X1 given X2.

TABLE 6. Probability Function for X1 given X2

         X2 = 1    2     3     4     5     6     7
X1 = 1     1     1/2    0     0     0     0     0
X1 = 2     0     1/2   1/2    0     0     0     0
X1 = 3     0      0    1/2   1/2    0     0     0
X1 = 4     0      0     0    1/2   1/2    0     0
X1 = 5     0      0     0     0    1/2   1/2    0
X1 = 6     0      0     0     0     0    1/2    1

We can do the same for X2 given X1.

TABLE 7. Probability Function for X2 given X1

         X2 = 1    2     3     4     5     6     7
X1 = 1    1/2    1/2    0     0     0     0     0
X1 = 2     0     1/2   1/2    0     0     0     0
X1 = 3     0      0    1/2   1/2    0     0     0
X1 = 4     0      0     0    1/2   1/2    0     0
X1 = 5     0      0     0     0    1/2   1/2    0
X1 = 6     0      0     0     0     0    1/2   1/2
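Tables 6 and 7 follow mechanically from definition (15); a quick sketch (helper names mine):

```python
from fractions import Fraction

# Conditional pmf p(x1 | x2) = p(x1, x2) / h(x2) for the coin-and-die example.
joint = {(x1, x1 + c): Fraction(1, 12) for x1 in range(1, 7) for c in (0, 1)}
h = {x2: sum(v for (a, b), v in joint.items() if b == x2) for x2 in range(1, 8)}

def cond(x1, x2):
    return joint.get((x1, x2), Fraction(0)) / h[x2]
```

Each column of Table 6 sums to one, as a conditional distribution must.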
7.3. Conditional Distribution Functions for Continuous Distributions.

7.3.1. Discussion. In the continuous case, the idea of a conditional distribution takes on a slightly different meaning than in the discrete case. If X1 and X2 are both continuous, P(X1 = x1 | X2 = x2) is not defined, because the probability of any one point is identically zero. It makes sense, however, to define a conditional distribution function, i.e., P(X1 ≤ x1 | X2 = x2), because the value of X2 is known when we compute the probability that X1 is less than some specific value.

7.3.2. Definition of a Conditional Distribution Function. If X1 and X2 are jointly continuous random variables with joint density function f(x1, x2), then the conditional distribution function of X1 given X2 = x2 is

F(x1 | x2) = P(X1 ≤ x1 | X2 = x2).

We can obtain the unconditional distribution function by integrating the conditional one over x2:

F(x1) = ∫_{−∞}^{∞} F(x1 | x2) f_{X2}(x2) dx2.    (17)

We can also find the probability that X1 is less than x1 in the usual fashion as

F(x1) = ∫_{−∞}^{x1} f_{X1}(t1) dt1.    (18)

But the marginal density inside the integral is obtained by integrating the joint density over the range of x2; specifically,

f_{X1}(t1) = ∫_{−∞}^{∞} f_{X1X2}(t1, x2) dx2.    (19)

This implies then that

F(x1) = ∫_{−∞}^{∞} ∫_{−∞}^{x1} f_{X1X2}(t1, x2) dt1 dx2.    (20)

Now compare the integrand in equation (20) with that in equation (17) to conclude that

F(x1 | x2) f_{X2}(x2) = ∫_{−∞}^{x1} f_{X1X2}(t1, x2) dt1
⟹ F(x1 | x2) = ∫_{−∞}^{x1} [f_{X1X2}(t1, x2)/f_{X2}(x2)] dt1.    (21)

We call the integrand in the second line of (21) the conditional density function of X1 given X2 = x2, and denote it by f(x1 | x2) or f_{X1|X2}(x1 | x2). Specifically: let X1 and X2 be jointly continuous random variables with joint probability density f_{X1X2}(x1, x2) and marginal densities f_{X1}(x1) and f_{X2}(x2), respectively. For any x2 such that f_{X2}(x2) > 0, the conditional probability density function of X1 given X2 = x2 is defined to be

f(x1 | x2) = f_{X1X2}(x1, x2)/f_{X2}(x2),    (22)

and similarly

f(x2 | x1) = f_{X1X2}(x1, x2)/f_{X1}(x1).    (23)

7.4. Example. Let the joint density of two random variables X and Y be given by

f(x, y) = (x + 4y)/6  for 0 < x < 2, 0 < y < 1;  0 otherwise.

The marginal density of x is fX(x) = (x + 2)/6, while the marginal density of y is fY(y) = (1 + 4y)/3. Now find the conditional distribution of x given y. This is given by

f(x | y) = f(x, y)/fY(y) = [(x + 4y)/6] / [(1 + 4y)/3] = (x + 4y)/(2 + 8y)

for 0 < x < 2 and 0 < y < 1. Now find the probability that X ≤ 1 given that Y = 1/2. First determine the density function when y = 1/2:

f(x | 1/2) = (x + 2)/6.

Then

P(X ≤ 1 | Y = 1/2) = ∫_{0}^{1} (x + 2)/6 dx = (x²/12 + x/3) |_{0}^{1} = 1/12 + 4/12 = 5/12.
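The section 7.4 computation can be sketched numerically. The conditioning value y = 1/2 is an assumption here (it is garbled in the source), and the helper names are mine:

```python
# Conditional density and probability for f(x, y) = (x + 4y)/6 on (0,2) x (0,1).
def f_joint(x, y):
    return (x + 4.0 * y) / 6.0 if 0.0 < x < 2.0 and 0.0 < y < 1.0 else 0.0

def f_cond(x, y):
    """f(x | y) = f(x, y) / fY(y), with fY(y) = (1 + 4y)/3 from section 6.5."""
    return f_joint(x, y) / ((1.0 + 4.0 * y) / 3.0)

def p_le(x_max, y, n=2000):
    """P(X <= x_max | Y = y) by the midpoint rule (exact here: linear integrand)."""
    h = x_max / n
    return sum(f_cond((i + 0.5) * h, y) for i in range(n)) * h
```

A conditional density must itself integrate to one over x, which `p_le(2.0, y)` confirms.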
8. INDEPENDENT RANDOM VARIABLES

8.1. Discussion. We have previously shown that two events A and B are independent if the probability of their intersection is the product of their individual probabilities, i.e.,

P(A ∩ B) = P(A) P(B).    (24)

In terms of random variables X and Y, consistency with this definition would imply that

P(a ≤ X ≤ b, c ≤ Y ≤ d) = P(a ≤ X ≤ b) P(c ≤ Y ≤ d).    (25)

That is, if X and Y are independent, the joint probability can be written as the product of the marginal probabilities. We then have the following definition. Let X have distribution function FX(x), let Y have distribution function FY(y), and let X and Y have joint distribution function F(x, y). Then X and Y are said to be independent if and only if

F(x, y) = FX(x) FY(y)    (26)

for every pair of real numbers (x, y). If X and Y are not independent, they are said to be dependent.

8.2. Independence Defined in Terms of Density Functions.

8.2.1. Discrete Random Variables. If X and Y are discrete random variables with joint probability density function p(x, y) and marginal density functions pX(x) and pY(y), respectively, then X and Y are independent if and only if

p(x, y) = pX(x) pY(y)    (27)

for all pairs of real numbers (x, y).

8.2.2. Continuous Bivariate Random Variables. If X and Y are continuous random variables with joint probability density function f(x, y) and marginal density functions fX(x) and fY(y), respectively, then X and Y are independent if and only if

f(x, y) = fX(x) fY(y)    (28)

for all pairs of real numbers (x, y).

8.3. Continuous Multivariate Random Variables. In a more general context, the variables X1, X2, ..., Xk are independent if and only if

f_{X1 X2 ... Xk}(x1, x2, ..., xk) = Π_{i=1}^{k} f_{Xi}(xi) = f_{X1}(x1) f_{X2}(x2) ··· f_{Xk}(xk).    (29)

In other words, random variables are independent if the joint density is equal to the product of the marginal densities.

8.4. Examples.

8.4.1. Example 1: Rolling a Die and Tossing a Coin. Consider the previous example where we rolled a die and tossed a coin: X1 is the number on the die, and X2 is the number on the die plus the value of the indicator on the coin (H = 1). The joint and marginal probabilities are given in Table 5. For independence we need p(x1, x2) = p1(x1) p2(x2) for all values of (x1, x2); to show that the variables are not independent, we need only exhibit one pair for which p(x1, x2) ≠ p1(x1) p2(x2). Consider p(1, 2) = 1/12. If we multiply the marginal probabilities we obtain

p1(1) p2(2) = (1/6)(1/6) = 1/36 ≠ 1/12,

so X1 and X2 are not independent.

8.4.2. Example 2: A Continuous Multiplicative Joint Density. Let the joint density of two random variables X1 and X2 be given by

f(x1, x2) = 2x2 e^{−x1}  for x1 ≥ 0, 0 ≤ x2 ≤ 1;  0 otherwise.

The marginal density for x1 is given by

f1(x1) = ∫_{0}^{1} 2x2 e^{−x1} dx2 = x2² e^{−x1} |_{0}^{1} = e^{−x1},

and the marginal density for x2 is given by

f2(x2) = ∫_{0}^{∞} 2x2 e^{−x1} dx1 = −2x2 e^{−x1} |_{0}^{∞} = 2x2.

It is clear the joint density is the product of the marginal densities, 2x2 e^{−x1} = (e^{−x1})(2x2), so X1 and X2 are independent.
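Both checks in sections 8.4.1 and 8.4.2 can be sketched in code (helper names mine; the marginal over x2 is computed by quadrature rather than in closed form):

```python
from fractions import Fraction
from math import exp

# Discrete check (8.4.1): one violating cell is enough to show dependence.
joint = {(x1, x1 + c): Fraction(1, 12) for x1 in range(1, 7) for c in (0, 1)}
p1 = {x1: sum(v for (a, b), v in joint.items() if a == x1) for x1 in range(1, 7)}
p2 = {x2: sum(v for (a, b), v in joint.items() if b == x2) for x2 in range(1, 8)}
dependent = joint[(1, 2)] != p1[1] * p2[2]      # 1/12 vs 1/36

# Continuous check (8.4.2): f factors as f1(x1) * f2(x2) with f2(x2) = 2*x2.
def f_ind(x1, x2):
    return 2.0 * x2 * exp(-x1) if x1 >= 0.0 and 0.0 <= x2 <= 1.0 else 0.0

def f1(x1, n=1000):
    h = 1.0 / n                                  # integrate out x2 over [0, 1]
    return sum(f_ind(x1, (i + 0.5) * h) for i in range(n)) * h
```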
8.4.3. Example 3. Let the joint density of two random variables X and Y be given by

f(x, y) = −3x² log y / [2(1 + log 2 − 2 log 4)]  for 0 ≤ x ≤ 1, 2 ≤ y ≤ 4;  0 otherwise.

(The constant 2(1 + log 2 − 2 log 4) is negative, so the density is positive on its support.) First find the marginal density for x:

fX(x) = ∫_{2}^{4} −3x² log y / [2(1 + log 2 − 2 log 4)] dy
      = −3x² (y log y − y) |_{2}^{4} / [2(1 + log 2 − 2 log 4)]
      = −3x² [(4 log 4 − 4) − (2 log 2 − 2)] / [2(1 + log 2 − 2 log 4)]
      = 3x² (2 + 2 log 2 − 4 log 4) / [2(1 + log 2 − 2 log 4)]
      = 3x².

Now find the marginal density for y:

fY(y) = ∫_{0}^{1} −3x² log y / [2(1 + log 2 − 2 log 4)] dx
      = −x³ log y / [2(1 + log 2 − 2 log 4)] |_{0}^{1}
      = −log y / [2(1 + log 2 − 2 log 4)].

It is clear the joint density is the product of the marginal densities, so X and Y are independent.

8.4.4. Example 4. Let the joint density of two random variables X and Y be given by

f(x, y) = (3/5)x² + (3/10)y  for 0 ≤ x ≤ 1, 0 ≤ y ≤ 2;  0 otherwise.

First find the marginal density for x:

fX(x) = ∫_{0}^{2} [(3/5)x² + (3/10)y] dy = [(3/5)x²y + (3/20)y²] |_{0}^{2} = (6/5)x² + 12/20 = (6/5)x² + 3/5.

Now find the marginal density for y:

fY(y) = ∫_{0}^{1} [(3/5)x² + (3/10)y] dx = [(1/5)x³ + (3/10)xy] |_{0}^{1} = 1/5 + (3/10)y.

The product of the marginal densities is not the joint density, so X and Y are not independent.

8.4.5. Example 5. Let the joint density of two random variables X and Y be given by

f(x, y) = 2e^{−(x+y)}  for 0 ≤ x ≤ y < ∞;  0 otherwise.

Find the marginal density of X:

fX(x) = ∫_{x}^{∞} 2e^{−(x+y)} dy = −2e^{−(x+y)} |_{y=x}^{∞} = 2e^{−2x}.

The marginal density of Y is obtained as follows:

fY(y) = ∫_{0}^{y} 2e^{−(x+y)} dx = −2e^{−(x+y)} |_{x=0}^{y} = −2e^{−2y} + 2e^{−y} = 2e^{−y}(1 − e^{−y}).

We can show that this is a proper density function by integrating it over the range of x and y:

∫_{0}^{∞} ∫_{x}^{∞} 2e^{−(x+y)} dy dx = ∫_{0}^{∞} 2e^{−2x} dx = −e^{−2x} |_{0}^{∞} = 1,

or in the other order,

∫_{0}^{∞} ∫_{0}^{y} 2e^{−(x+y)} dx dy = ∫_{0}^{∞} (2e^{−y} − 2e^{−2y}) dy = (−2e^{−y} + e^{−2y}) |_{0}^{∞} = 0 − (−2 + 1) = 1.

Here the support 0 ≤ x ≤ y is not rectangular, and f(x, y) ≠ fX(x) fY(y), so X and Y are not independent.

8.5. Separation of a Joint Density Function.

8.5.1. Theorem 4.

Theorem 4. Let X1 and X2 have a joint density function f(x1, x2) that is positive if and only if a ≤ x1 ≤ b and c ≤ x2 ≤ d, for constants a, b, c, and d, and f(x1, x2) = 0 otherwise. Then X1 and X2 are independent random variables if and only if

f(x1, x2) = g(x1) h(x2),

where g(x1) is a nonnegative function of x1 alone and h(x2)
is a nanenegativefunctian of 952 alone Thus if we can separate the joint density into two multiplicative terms one depending on 1 alone and one on 2 alone we know the random variables are independent without showing that these functions are actually the marginal densities 852 Example Let the joint density of two random variables x and y be given by 32 ng0y1 0 otherwise Wm 24 We can write fm7 y as gmhy where m 0 g m 0 otherwise 8 0 lt y lt 1 h y 0 otherwise These functions are not density functions because they do not integrate to one 12 1 12 1 d A 2 l A 1 A a a 2 0 8 7g 1 1 8dy8y087 1 0 The marginal densities as defined below do sum to one 8m 0 g m g fXltgt T 0 otherwise 1 0 S y S 1 0 otherwise fyy 9 EXPECTED VALUE OF A FUNCTION OF RANDOM VARIABLES 91 De nition 911 Discrete Case Let X X17 X27 7 Xk be a k dimensional discrete random vari able with probability function pm17 27 7 Let g7 7 7 be a function of the k random variables X17 X27 7 Xk Then the expected value ofgX17 X27 7 Xk is EgX17 X27 7Xk Z Z ZZgm17 7 mkpx17 277 mk 30 mk mkil 2 26 MULTIVARIATE PROBABILITY DISTRIBUTIONS 912 Continuous Case Let X X17 X27 7 Xk be a k dimensional random variable with density fm17 27 7 Letg7 7 7 be a function of the k random variables X17 X27 7 Xk Then the expected value ofgX17 X27 7 Xk is El9X17 X2 7 Xkl 99017 7 kfX1Xk17 7 90kd1 dmk wk M71 m2 1 9m177mkfX1mXkx177mkdm1dmk if the integral is defined Similarly if gX is a bounded real function on the interval a7 1 then b b EltgltXgtgt gltzgtdFltzgt ng 32gt where the integral is in the sense of Lebesque and can be loosely interpreted asf dm Consider as an example 9z17 7 zk mi Then EgX177XkEEX7Ejo fem00 mifx177mkdx1dxk 33 3 96739in 195739 700 because integration over all the other variables gives the marginal density of mi 92 Example Let the joint density of two random variables 1 and 2 be given by 2287w1 1 2 0 0g 2 1 901952 7 0 otherwise The marginal density for 1 is given by 1 f1901 290287m1 1902 o l3 e7m1 7 0 17 e The marginal 
density for x2 is given by

f2(x2) = ∫_{0}^{∞} 2x2 e^{−x1} dx1 = −2x2 e^{−x1} |_{0}^{∞} = 2x2.

We can find the expected value of X1 by integrating the joint density or the marginal density. First with the joint density:

E[X1] = ∫_{0}^{1} ∫_{0}^{∞} 2x1x2 e^{−x1} dx1 dx2.

Consider the inside integral first. We will need a u dv substitution to evaluate it. Let

u = 2x1x2,  dv = e^{−x1} dx1  ⟹  du = 2x2 dx1,  v = −e^{−x1}.

Then

∫_{0}^{∞} 2x1x2 e^{−x1} dx1 = −2x1x2 e^{−x1} |_{0}^{∞} + ∫_{0}^{∞} 2x2 e^{−x1} dx1 = 0 + (−2x2 e^{−x1}) |_{0}^{∞} = 2x2.

Now integrate with respect to x2:

E[X1] = ∫_{0}^{1} 2x2 dx2 = x2² |_{0}^{1} = 1.

Now find it using the marginal density of x1:

E[X1] = ∫_{0}^{∞} x1 e^{−x1} dx1.

With u = x1 and dv = e^{−x1} dx1, so that du = dx1 and v = −e^{−x1},

E[X1] = −x1 e^{−x1} |_{0}^{∞} + ∫_{0}^{∞} e^{−x1} dx1 = 0 + (−e^{−x1}) |_{0}^{∞} = 1.

We can likewise show that the expected value of X2 is

E[X2] = ∫_{0}^{1} x2 · 2x2 dx2 = (2/3)x2³ |_{0}^{1} = 2/3.

Now consider E[X1X2]. We can obtain it as

E[X1X2] = ∫_{0}^{1} ∫_{0}^{∞} 2x1x2² e^{−x1} dx1 dx2.

Consider the inside integral first, with u = 2x1x2² and dv = e^{−x1} dx1:

∫_{0}^{∞} 2x1x2² e^{−x1} dx1 = −2x1x2² e^{−x1} |_{0}^{∞} + ∫_{0}^{∞} 2x2² e^{−x1} dx1 = 2x2².

Now integrate with respect to x2:

E[X1X2] = ∫_{0}^{1} 2x2² dx2 = (2/3)x2³ |_{0}^{1} = 2/3.
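A quadrature sketch of the three expectations just computed, E[X1] = 1, E[X2] = 2/3, and E[X1X2] = 2/3 (the function `expect` and its truncation cutoff are mine):

```python
from math import exp

def expect(g, n=600, cut=30.0):
    """Midpoint-rule approximation of E[g(X1, X2)] for f = 2*x2*exp(-x1).
    The unbounded x1 axis is truncated at `cut`, where exp(-x1) is negligible."""
    h1, h2 = cut / n, 1.0 / n
    total = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h1
        w = 2.0 * exp(-x1)
        for j in range(n):
            x2 = (j + 0.5) * h2
            total += g(x1, x2) * w * x2
    return total * h1 * h2
```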
9.3. Properties of Expectation.

9.3.1. Constants.

Theorem 5. Let c be a constant. Then

E[c] = ∫_{x} ∫_{y} c f(x, y) dy dx = c ∫_{x} ∫_{y} f(x, y) dy dx = c.    (34)

9.3.2. Theorem.

Theorem 6. Let g(X1, X2) be a function of the random variables X1 and X2, and let a be a constant. Then

E[a g(X1, X2)] = ∫_{x1} ∫_{x2} a g(x1, x2) f(x1, x2) dx2 dx1 = a ∫_{x1} ∫_{x2} g(x1, x2) f(x1, x2) dx2 dx1 = a E[g(X1, X2)].    (35)

9.3.3. Theorem.

Theorem 7. Let X and Y denote two random variables defined on the same probability space, and let f(x, y) be their joint density. Then

E[aX + bY] = ∫_{y} ∫_{x} (ax + by) f(x, y) dx dy
           = a ∫_{y} ∫_{x} x f(x, y) dx dy + b ∫_{y} ∫_{x} y f(x, y) dx dy
           = a E[X] + b E[Y].    (36)

In matrix notation, with a = (a1, a2)′ and X = (X1, X2)′, we can write this as

E[a′X] = E[a1X1 + a2X2] = a1E[X1] + a2E[X2] = a′E[X].    (37)

9.3.4. Theorem.

Theorem 8. Let X and Y denote two random variables defined on the same probability space, and let g1(X, Y), g2(X, Y), ..., gk(X, Y) be functions of (X, Y). Then

E[g1(X, Y) + g2(X, Y) + ··· + gk(X, Y)] = E[g1(X, Y)] + E[g2(X, Y)] + ··· + E[gk(X, Y)].    (38)

9.3.5. Independence.

Theorem 9. Let X1 and X2 be independent random variables, and let g(X1) and h(X2) be functions of X1 and X2, respectively. Then

E[g(X1) h(X2)] = E[g(X1)] E[h(X2)],    (39)

provided that the expectations exist.

Proof. Let f(x1, x2) be the joint density of X1 and X2. The product g(X1)h(X2) is a function of X1 and X2. Therefore, using independence (f(x1, x2) = f_{X1}(x1) f_{X2}(x2)), we have

E[g(X1)h(X2)] = ∫_{x1} ∫_{x2} g(x1) h(x2) f(x1, x2) dx2 dx1
              = ∫_{x1} ∫_{x2} g(x1) h(x2) f_{X1}(x1) f_{X2}(x2) dx2 dx1
              = ∫_{x1} g(x1) f_{X1}(x1) [∫_{x2} h(x2) f_{X2}(x2) dx2] dx1
              = E[h(X2)] ∫_{x1} g(x1) f_{X1}(x1) dx1
              = E[h(X2)] E[g(X1)].    (40)

10. VARIANCE, COVARIANCE AND CORRELATION

10.1. Variance of a Single Random Variable. The variance of a random variable X with mean μ is given by

var(X) = σ² = E[(X − E[X])²] = E[X²] − (E[X])²
       = ∫_{−∞}^{∞} (x − μ)² f(x) dx = ∫_{−∞}^{∞} x² f(x) dx − [∫_{−∞}^{∞} x f(x) dx]².    (41)

The variance is a measure of the dispersion of the random variable about the mean.

10.2. Covariance.

10.2.1. Definition. Let X and Y be any two random variables defined in the same probability space. The covariance of X and Y, denoted cov(X, Y) or σ_{X,Y}, is defined as

cov(X, Y) = E[(X − μX)(Y − μY)]
          = E[XY] − E[μX Y] − E[X μY] + E[μX μY]
          = E[XY] − E[X] E[Y]
          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy − ∫_{−∞}^{∞} x fX(x) dx · ∫_{−∞}^{∞} y fY(y) dy.    (42)

The covariance measures the interaction between two random variables, but its numerical value is not independent of the units of measurement of X and Y. Positive values of the covariance indicate that X tends to increase when Y increases; negative values indicate that X tends to decrease as Y increases.

10.2.2. Examples.

(i) Let the joint density of two random variables X1 and X2 be given by

f(x1, x2) = 2x2 e^{−x1}  for x1 ≥ 0, 0 ≤ x2 ≤ 1;  0 otherwise.

We showed in Example 9.2 that E[X1] = 1, E[X2] = 2/3, and E[X1X2] = 2/3. The covariance is then given by

cov(X1, X2) = E[X1X2] − E[X1]E[X2] = 2/3 − (1)(2/3) = 0.

(ii) Let the joint density of two random variables X1 and X2 be given by

f(x1, x2) = x1/6  for 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 3;  0 otherwise.
First compute the expected value of X1X2 as follows:

E[X1X2] = ∫_{0}^{3} ∫_{0}^{2} x1x2 (x1/6) dx1 dx2 = ∫_{0}^{3} (x1³/18) x2 |_{0}^{2} dx2 = ∫_{0}^{3} (4/9)x2 dx2 = (2/9)x2² |_{0}^{3} = 2.

Then compute the expected value of X1 as follows:

E[X1] = ∫_{0}^{3} ∫_{0}^{2} x1 (x1/6) dx1 dx2 = ∫_{0}^{3} (x1³/18) |_{0}^{2} dx2 = ∫_{0}^{3} (4/9) dx2 = 4/3.

Then compute the expected value of X2 as follows:

E[X2] = ∫_{0}^{3} ∫_{0}^{2} x2 (x1/6) dx1 dx2 = ∫_{0}^{3} x2 (x1²/12) |_{0}^{2} dx2 = ∫_{0}^{3} (x2/3) dx2 = x2²/6 |_{0}^{3} = 3/2.

The covariance is then given by

cov(X1, X2) = E[X1X2] − E[X1]E[X2] = 2 − (4/3)(3/2) = 2 − 2 = 0.

(iii) Let the joint density of two random variables X1 and X2 be given by

f(x1, x2) = (3/8)x1  for 0 ≤ x2 ≤ x1 ≤ 2;  0 otherwise.

First compute the expected value of X1X2 as follows:

E[X1X2] = ∫_{0}^{2} ∫_{x2}^{2} x1x2 (3/8)x1 dx1 dx2 = ∫_{0}^{2} (x1³x2/8) |_{x2}^{2} dx2 = ∫_{0}^{2} (x2 − x2⁴/8) dx2
        = (x2²/2 − x2⁵/40) |_{0}^{2} = 2 − 32/40 = 6/5.

Then compute the expected value of X1 as follows:

E[X1] = ∫_{0}^{2} ∫_{0}^{x1} x1 (3/8)x1 dx2 dx1 = ∫_{0}^{2} (3/8)x1³ dx1 = (3/32)x1⁴ |_{0}^{2} = 3/2.

Then compute the expected value of X2 as follows:

E[X2] = ∫_{0}^{2} ∫_{x2}^{2} x2 (3/8)x1 dx1 dx2 = ∫_{0}^{2} x2 (3/16)(4 − x2²) dx2 = ∫_{0}^{2} (3x2/4 − 3x2³/16) dx2
      = (3x2²/8 − 3x2⁴/64) |_{0}^{2} = 12/8 − 48/64 = 3/2 − 3/4 = 3/4.

The covariance is then given by

cov(X1, X2) = E[X1X2] − E[X1]E[X2] = 6/5 − (3/2)(3/4) = 6/5 − 9/8 = 48/40 − 45/40 = 3/40.
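Example (iii) can be sketched numerically with a midpoint rule over the triangular support (the density as reconstructed above; `f_iii` and `moment` are my names):

```python
# Midpoint-rule moments for f(x1, x2) = (3/8)*x1 on the triangle 0 <= x2 <= x1 <= 2.
def f_iii(x1, x2):
    return 0.375 * x1 if 0.0 <= x2 <= x1 <= 2.0 else 0.0

def moment(g, n=600):
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h
        for j in range(n):
            x2 = (j + 0.5) * h
            total += g(x1, x2) * f_iii(x1, x2)
    return total * h * h

e_xy = moment(lambda a, b: a * b)   # worked out above: 6/5
e_x = moment(lambda a, b: a)        # 3/2
e_y = moment(lambda a, b: b)        # 3/4
cov = e_xy - e_x * e_y              # 6/5 - 9/8 = 3/40
```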
Proof. We know from equation (42) that

cov(X, Y) = E[XY] − E[X]E[Y]   (45)

We also know from equation (39) that if X and Y are independent, then

E[g(X)h(Y)] = E[g(X)] E[h(Y)]   (46)

Let g(X) = X and h(Y) = Y to obtain

E[XY] = E[X] E[Y]   (47)

Substituting into equation (45), we obtain

cov(X, Y) = E[X]E[Y] − E[X]E[Y] = 0   (48)

The converse of Theorem 10 is not true; i.e., cov(X, Y) = 0 does not imply that X and Y are independent.

10.4.2. Example. Consider the following discrete probability distribution:

                  x1
   x2         −1     0     1   |  p2(x2)
   −1        1/16  3/16  1/16  |   5/16
    0        3/16    0   3/16  |   6/16
    1        1/16  3/16  1/16  |   5/16
   p1(x1)    5/16  6/16  5/16  |    1

These random variables are not independent, because the joint probabilities are not the product of the marginal probabilities. For example,

p(−1, −1) = 1/16 ≠ p1(−1) p2(−1) = (5/16)(5/16) = 25/256

Now compute the covariance between X1 and X2. First find E[X1] as follows:

E[X1] = (−1)(5/16) + (0)(6/16) + (1)(5/16) = 0

Similarly for the expected value of X2:

E[X2] = (−1)(5/16) + (0)(6/16) + (1)(5/16) = 0

Now compute E[X1X2] as follows (only the four corner cells, where both values are nonzero, contribute):

E[X1X2] = (−1)(−1)(1/16) + (−1)(1)(1/16) + (1)(−1)(1/16) + (1)(1)(1/16) = 0

The covariance is then

cov(X1, X2) = E[X1X2] − E[X1]E[X2] = 0 − (0)(0) = 0

In this case the covariance is zero, but the variables are not independent.

10.5. Sum of Variances.

var(a1X1 + a2X2) = a1² var(X1) + a2² var(X2) + 2 a1 a2 cov(X1, X2)
                 = a1² σ1² + 2 a1 a2 σ12 + a2² σ2²   (49)

10.6. The Expected Value and Variance of Linear Functions of Random Variables.

10.6.1. Theorem.

Theorem 11. Let Y1, Y2, ..., Yn and X1, X2, ..., Xm be random variables with E[Yi] = μi and E[Xj] = ξj. Define

U1 = Σ_{i=1}^n ai Yi  and  U2 = Σ_{j=1}^m bj Xj   (50)

for constants a1, a2, ..., an and b1, b2, ..., bm. Then the following three results hold:

i. E[U1] = Σ_{i=1}^n ai μi
ii. var(U1) = Σ_{i=1}^n ai² var(Yi) + 2 ΣΣ_{i<j} ai aj cov(Yi, Yj), where the double sum is over all pairs (i, j) with i < j
iii. cov(U1, U2) = Σ_{i=1}^n Σ_{j=1}^m ai bj cov(Yi, Xj)

Proof. i. We want to show that E[U1] = Σ_{i=1}^n ai μi. Write out E[U1] as follows:

E[U1] = E[ Σ_{i=1}^n ai Yi ] = Σ_{i=1}^n ai E[Yi] = Σ_{i=1}^n ai μi   (51)

using Theorems 6–8 as appropriate.
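The discrete example above (zero covariance without independence) can be checked mechanically. A minimal sketch using Python's exact fractions, with the table values as given in the example:

```python
# Check of example 10.4.2: the covariance is zero, yet X1 and X2 are dependent.
from fractions import Fraction as F

support = [-1, 0, 1]
p = {(-1, -1): F(1, 16), (0, -1): F(3, 16), (1, -1): F(1, 16),
     (-1,  0): F(3, 16), (0,  0): F(0),     (1,  0): F(3, 16),
     (-1,  1): F(1, 16), (0,  1): F(3, 16), (1,  1): F(1, 16)}

p1 = {a: sum(p[(a, b)] for b in support) for a in support}  # marginal of X1
p2 = {b: sum(p[(a, b)] for a in support) for b in support}  # marginal of X2

E1 = sum(a * p1[a] for a in support)
E2 = sum(b * p2[b] for b in support)
E12 = sum(a * b * p[(a, b)] for a in support for b in support)
cov = E12 - E1 * E2

# independence would require every joint probability to factor
independent = all(p[(a, b)] == p1[a] * p2[b] for a in support for b in support)
print(cov, independent)  # 0 False
```

The check makes the point of the example concrete: p(−1, −1) = 1/16 while the product of the marginals is 25/256, so independence fails even though the covariance vanishes.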
ii. Write out var(U1) as follows:

var(U1) = E[(U1 − E[U1])²]
        = E[( Σ_{i=1}^n ai (Yi − μi) )²]
        = E[ Σ_{i=1}^n ai² (Yi − μi)² + ΣΣ_{i≠j} ai aj (Yi − μi)(Yj − μj) ]
        = Σ_{i=1}^n ai² E[(Yi − μi)²] + ΣΣ_{i≠j} ai aj E[(Yi − μi)(Yj − μj)]   (52)

By the definitions of variance and covariance, we have

var(U1) = Σ_{i=1}^n ai² var(Yi) + ΣΣ_{i≠j} ai aj cov(Yi, Yj)   (53)

Because cov(Yi, Yj) = cov(Yj, Yi), we can write

var(U1) = Σ_{i=1}^n ai² var(Yi) + 2 ΣΣ_{i<j} ai aj cov(Yi, Yj)   (54)

Similar steps can be used to obtain iii.

iii. We have

cov(U1, U2) = E[(U1 − E[U1])(U2 − E[U2])]
            = E[( Σ_i ai (Yi − μi) )( Σ_j bj (Xj − ξj) )]
            = E[ Σ_i Σ_j ai bj (Yi − μi)(Xj − ξj) ]
            = Σ_i Σ_j ai bj E[(Yi − μi)(Xj − ξj)]
            = Σ_i Σ_j ai bj cov(Yi, Xj)   (55)

11. CONDITIONAL EXPECTATIONS

11.1. Definition. If X1 and X2 are any two random variables, the conditional expectation of g(X1), given that X2 = x2, is defined to be

E[g(X1) | X2 = x2] = ∫_{−∞}^{∞} g(x1) f(x1 | x2) dx1   (56)

if X1 and X2 are jointly continuous, and

E[g(X1) | X2 = x2] = Σ_{x1} g(x1) p(x1 | x2)   (57)

if X1 and X2 are jointly discrete.

11.2. Example. Let the joint density of two random variables X and Y be given by

f(x, y) = 2,  x ≥ 0, y ≥ 0, x + y ≤ 1
        = 0,  otherwise

We can find the marginal density of Y by integrating the joint density with respect to x, as follows:

fY(y) = ∫ f(x, y) dx = ∫_0^{1−y} 2 dx = 2(1 − y),  0 ≤ y ≤ 1

We find the conditional density of X given that Y = y by forming the ratio

f_{X|Y}(x | y) = f(x, y)/fY(y) = 2/[2(1 − y)] = 1/(1 − y),  0 ≤ x ≤ 1 − y

We then form the expected value by multiplying the conditional density by x and integrating over x:

E[X | Y = y] = ∫_0^{1−y} x/(1 − y) dx = (1/(1 − y)) ((1 − y)²/2) = (1 − y)/2

We can find the unconditional expected value of X by multiplying the marginal density of Y by this conditional expectation and integrating over y, as follows:

E[X] = E_Y[E[X | Y]] = ∫_0^1 ((1 − y)/2)(2(1 − y)) dy = ∫_0^1 (1 − y)² dy = [−(1 − y)³/3]_0^1 = 1/3

We can show this directly by multiplying the joint density by x and integrating over x and y:

E[X] = ∫_0^1 ∫_0^{1−y} 2x dx dy = ∫_0^1 (1 − y)² dy = 1/3

The fact that we can find the expected value of X using the conditional distribution of X given Y is due to the following theorem.

11.3. Theorem.

Theorem 12. Let X and Y denote random variables. Then

E[X] = E_Y[ E_{X|Y}[X | Y] ]   (58)

The inner expectation is with respect to the conditional distribution of X given Y, and the outer expectation is with respect to the
distribution of Y.

Proof. Suppose that X and Y are jointly continuous with joint density f(x, y) and marginal densities fX(x) and fY(y), respectively. Then

E[X] = ∫∫ x f(x, y) dx dy
     = ∫∫ x f_{X|Y}(x | y) fY(y) dx dy
     = ∫ [ ∫ x f_{X|Y}(x | y) dx ] fY(y) dy   (59)
     = ∫ E[X | Y = y] fY(y) dy
     = E_Y[ E_{X|Y}[X | Y] ]

The proof is similar for the discrete case.

11.4. Conditional Variance.

11.4.1. Definition. Just as we can compute a conditional expected value, we can compute a conditional variance. The idea is that the variance of the random variable X may be different for different values of Y. We define the conditional variance as follows:

var(X | Y = y) = E[ (X − E[X | Y = y])² | Y = y ]
              = E[X² | Y = y] − (E[X | Y = y])²   (60)

We can write the variance of X as a function of the expected value of the conditional variance. This is sometimes useful for specific problems.

11.4.2. Theorem.

Theorem 13. Let X and Y denote random variables. Then

var(X) = E[var(X | Y)] + var(E[X | Y])   (61)

Proof. First note the following three definitions:

var(X | Y) = E[X² | Y] − (E[X | Y])²   (62a)
E[var(X | Y)] = E[ E[X² | Y] ] − E[ (E[X | Y])² ]   (62b)
var(E[X | Y]) = E[ (E[X | Y])² ] − ( E[ E[X | Y] ] )²   (62c)

The variance of X is given by

var(X) = E[X²] − (E[X])²   (63)

We can find the expected value of a variable by taking the expected value of the conditional expectation, as in Theorem 12. For this problem we can write E[X²] as the expected value of the conditional expectation of X² given Y. Specifically,

E[X²] = E_Y[ E_{X|Y}[X² | Y] ]   (64)

and

(E[X])² = ( E_Y[ E_{X|Y}[X | Y] ] )²   (65)

Write (63), substituting in (64) and (65), as follows:

var(X) = E[X²] − (E[X])² = E_Y[ E_{X|Y}[X² | Y] ] − ( E_Y[ E_{X|Y}[X | Y] ] )²   (66)

Now subtract and add E[ (E[X | Y])² ] on the right-hand side of equation (66), as follows:

var(X) = E_Y[ E_{X|Y}[X² | Y] ] − E[ (E[X | Y])² ]
       + E[ (E[X | Y])² ] − ( E_Y[ E_{X|Y}[X | Y] ] )²   (67)

Now notice that the first two terms in equation (67) are the same as the right-hand side of equation (62b), which is E[var(X | Y)]. Then notice that the last two terms in equation (67) are the same as the right-hand side of equation (62c), which is var(E[X | Y]). We can then write var(X)
as

var(X) = E_Y[ E_{X|Y}[X² | Y] ] − ( E_Y[ E_{X|Y}[X | Y] ] )²
       = E[var(X | Y)] + var(E[X | Y])   (68)

11.4.3. Example. Let the joint density of two random variables X and Y be given by

f(x, y) = (1/4)(2x + y),  0 ≤ x ≤ 1, 0 ≤ y ≤ 2
        = 0,              otherwise

We can find the marginal density of X by integrating the joint density with respect to y, as follows:

fX(x) = ∫_0^2 (1/4)(2x + y) dy = (1/4)[2xy + y²/2]_0^2 = (1/4)(4x + 2) = x + 1/2,  0 ≤ x ≤ 1   (69)

We can find the marginal density of Y by integrating the joint density with respect to x, as follows:

fY(y) = ∫_0^1 (1/4)(2x + y) dx = (1/4)[x² + xy]_0^1 = (1 + y)/4,  0 ≤ y ≤ 2   (70)

We find the expected value of X by multiplying the marginal density by x and integrating over x:

E[X] = ∫_0^1 x (x + 1/2) dx = [x³/3 + x²/4]_0^1 = 1/3 + 1/4 = 7/12   (71)

Similarly,

E[X²] = ∫_0^1 x² (x + 1/2) dx = [x⁴/4 + x³/6]_0^1 = 1/4 + 1/6 = 5/12   (72)

The variance of X is then given by

var(X) = E[X²] − (E[X])² = 5/12 − (7/12)² = 60/144 − 49/144 = 11/144   (73)

We find the conditional density of X given that Y = y by forming the ratio

f_{X|Y}(x | y) = f(x, y)/fY(y) = [(1/4)(2x + y)] / [(1 + y)/4] = (2x + y)/(1 + y),  0 ≤ x ≤ 1   (74)

We then form the expected value of X given Y by multiplying the conditional density by x and integrating over x:

E[X | Y = y] = ∫_0^1 x (2x + y)/(1 + y) dx = (1/(1 + y))[2x³/3 + x²y/2]_0^1 = (2/3 + y/2)/(1 + y) = (4 + 3y)/(6(1 + y))   (75)
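The marginal and conditional computations of this example can be checked symbolically. A short sketch with sympy, using the density given above:

```python
# Check of example 11.4.3: marginals, E[X], var(X), and E[X|Y=y]
# for f(x, y) = (2x + y)/4 on 0 <= x <= 1, 0 <= y <= 2.
import sympy as sp

x, y = sp.symbols("x y", nonnegative=True)
f = (2 * x + y) / 4

fX = sp.integrate(f, (y, 0, 2))            # marginal of X: x + 1/2
fY = sp.integrate(f, (x, 0, 1))            # marginal of Y: (1 + y)/4
EX = sp.integrate(x * fX, (x, 0, 1))       # 7/12
EX2 = sp.integrate(x**2 * fX, (x, 0, 1))   # 5/12
varX = EX2 - EX**2                         # 11/144

E_cond = sp.integrate(x * f / fY, (x, 0, 1))  # (4 + 3y)/(6(1 + y))
print(varX, sp.simplify(E_cond))
```

Each printed quantity matches the corresponding display in the text (equations (71)–(75)).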
We can find the unconditional expected value of X by multiplying the marginal density of Y by this conditional expectation and integrating over y, as follows:

E[X] = E_Y[E[X | Y]] = ∫_0^2 [(4 + 3y)/(6(1 + y))] [(1 + y)/4] dy
     = (1/24) ∫_0^2 (4 + 3y) dy
     = (1/24) [4y + (3/2)y²]_0^2
     = 14/24 = 7/12   (76)

We find the conditional variance by finding the expected value of X² given Y and then subtracting the square of E[X | Y]:

E[X² | Y = y] = ∫_0^1 x² (2x + y)/(1 + y) dx = (1/(1 + y))[x⁴/2 + x³y/3]_0^1 = (1/2 + y/3)/(1 + y) = (3 + 2y)/(6(1 + y))   (77)

Now square E[X | Y]:

( E[X | Y = y] )² = (4 + 3y)² / (36(1 + y)²)   (78)

Now subtract equation (78) from equation (77):

var(X | Y = y) = (3 + 2y)/(6(1 + y)) − (4 + 3y)²/(36(1 + y)²)
             = [6(3 + 2y)(1 + y) − (4 + 3y)²] / (36(1 + y)²)
             = (12y² + 30y + 18 − 16 − 24y − 9y²) / (36(1 + y)²)
             = (3y² + 6y + 2) / (36(1 + y)²)   (79)

For example, if y = 1 we obtain

var(X | Y = 1) = (3 + 6 + 2)/(36 · 4) = 11/144   (80)

To find the expected value of this variance, we multiply the expression in equation (79) by the marginal density of Y and integrate over the range of Y:

E[var(X | Y)] = ∫_0^2 [(3y² + 6y + 2)/(36(1 + y)²)] [(1 + y)/4] dy = (1/144) ∫_0^2 (3y² + 6y + 2)/(1 + y) dy   (81)

Consider first the indefinite integral

∫ (3y² + 6y + 2)/(1 + y) dy   (82)

This integral would be easier to solve if the (1 + y) in the denominator could be eliminated. This would be the case if it could be factored out of the numerator. One way to do this is to carry out the specified division: dividing 3y² + 6y + 2 by 1 + y gives quotient 3y + 3 with remainder −1, so

(3y² + 6y + 2)/(1 + y) = 3y + 3 − 1/(1 + y)   (83)

Now substitute equation (83) into equation (82) as follows:

∫ (3y² + 6y + 2)/(1 + y) dy = ∫ [3y + 3 − 1/(1 + y)] dy = (3/2)y² + 3y − log(1 + y)   (84)

Now compute the expected value of the variance as

E[var(X | Y)] = (1/144) [ (3/2)y² + 3y − log(1 + y) ]_0^2 = (1/144)(6 + 6 − log 3) = (12 − log 3)/144   (85)

To compute the variance of E[X | Y], we need to find E_Y[(E[X | Y])²] and then subtract (E_Y[E[X | Y]])². First find the second term. The expected value of X given Y comes from equation (75):

E[X | Y] = (4 + 3y)/(6(1 + y))   (86)

We found the expected value of E[X | Y] in equation (76). We repeat the derivation here by multiplying E[X | Y] by the marginal density of Y and integrating over the range of Y:

E_Y[E[X | Y]] = ∫_0^2 [(4 + 3y)/(6(1 + y))] [(1 + y)/4] dy = (1/24)[4y + (3/2)y²]_0^2 = 7/12   (87)

Now find the first term:

E_Y[(E[X | Y])²] = ∫_0^2 [(4 + 3y)²/(36(1 + y)²)] [(1 + y)/4] dy = (1/144) ∫_0^2 (9y² + 24y + 16)/(1 + y) dy   (88)

Now find the indefinite integral by first simplifying the integrand using long division: dividing 9y² + 24y + 16 by 1 + y gives quotient 9y + 15 with remainder 1   (89), so

(9y² + 24y + 16)/(1 + y) = 9y + 15 + 1/(1 + y)   (90)

Now substitute equation (90) into equation (88) as follows:

E_Y[(E[X | Y])²] = (1/144) ∫_0^2 [9y + 15 + 1/(1 + y)] dy
                = (1/144) [ (9/2)y² + 15y + log(1 + y) ]_0^2
                = (1/144)(18 + 30 + log 3)
                = (48 + log 3)/144   (91)

The variance is obtained by subtracting the square of (87) from (91):

var(E[X | Y]) = E_Y[(E[X | Y])²] − ( E_Y[E[X | Y]] )²
            = (48 + log 3)/144 − (7/12)²
            = (48 + log 3)/144 − 49/144
            = (log 3 − 1)/144   (92)

We can show that the sum of (85) and (92) is equal to var(X), as in Theorem 13:

var(X) = E[var(X | Y)] + var(E[X | Y]) = (12 − log 3)/144 + (log 3 − 1)/144 = 11/144   (93)

which is the same as in equation (73).
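The full decomposition of Theorem 13 for this example, including the cancelling log 3 terms, can likewise be verified symbolically; a sympy sketch:

```python
# Check of Theorem 13 on example 11.4.3:
# var(X) = E[var(X|Y)] + var(E[X|Y]), with the log(3) terms cancelling.
import sympy as sp

x, y = sp.symbols("x y", nonnegative=True)
f = (2 * x + y) / 4
fY = (1 + y) / 4
f_cond = f / fY                                     # (2x + y)/(1 + y)

E_cond = sp.integrate(x * f_cond, (x, 0, 1))        # (4 + 3y)/(6(1 + y))
E2_cond = sp.integrate(x**2 * f_cond, (x, 0, 1))    # (3 + 2y)/(6(1 + y))
var_cond = sp.simplify(E2_cond - E_cond**2)         # (3y^2 + 6y + 2)/(36(1 + y)^2)

E_var = sp.simplify(sp.integrate(var_cond * fY, (y, 0, 2)))   # (12 - log 3)/144
EE = sp.integrate(E_cond * fY, (y, 0, 2))                     # 7/12
EE2 = sp.integrate(E_cond**2 * fY, (y, 0, 2))                 # (48 + log 3)/144
var_E = sp.simplify(EE2 - EE**2)                              # (log 3 - 1)/144

# the two pieces sum back to var(X) = 11/144
assert sp.simplify(E_var + var_E - sp.Rational(11, 144)) == 0
print(E_var, var_E)
```

The assertion reproduces equation (93): the expected conditional variance and the variance of the conditional mean add up to the unconditional variance.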
12. CAUCHY–SCHWARZ INEQUALITY

12.1. Statement of Inequality. For any functions g(x) and h(x) and cumulative distribution function F(x), the following holds:

[ ∫ g(x) h(x) dF(x) ]² ≤ [ ∫ g(x)² dF(x) ] [ ∫ h(x)² dF(x) ]   (94)

where x is a vector random variable.

12.2. Proof. Form a linear combination of g(x) and h(x), square it, and then integrate, as follows:

∫ [t g(x) + h(x)]² dF(x) ≥ 0   (95)

The inequality holds because of the square and because dF ≥ 0. Now expand the integrand in (95) to obtain

t² ∫ g(x)² dF(x) + 2t ∫ g(x)h(x) dF(x) + ∫ h(x)² dF(x) ≥ 0   (96)

This is a quadratic in t which holds for all t. Now define t as follows:

t = − [ ∫ g(x)h(x) dF(x) ] / [ ∫ g(x)² dF(x) ]   (97)

and substitute in (96). We obtain

[∫ gh dF]²/∫ g² dF − 2[∫ gh dF]²/∫ g² dF + ∫ h² dF ≥ 0
⇒ ∫ h² dF ≥ [∫ gh dF]² / ∫ g² dF
⇒ [∫ gh dF]² ≤ [∫ g² dF][∫ h² dF]   (98)

12.3. Corollary 1. Consider two random variables X1 and X2 and the expectation of their product. Using (98), we obtain

[E(X1X2)]² ≤ E(X1²) E(X2²),  i.e.,  |E(X1X2)| ≤ √E(X1²) √E(X2²)   (99)

12.4. Corollary 2.

[cov(X1, X2)]² ≤ var(X1) var(X2)   (100)

Proof. Apply (98) to the centered random variables g(X) = X1 − μ1 and h(X) = X2 − μ2, where μi = E[Xi].

REFERENCES
[1] Amemiya, T. Advanced Econometrics. Cambridge: Harvard University Press, 1985.
[2] Bickel, P.J., and K.A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, Vol. 1, 2nd edition. Upper Saddle River, NJ: Prentice Hall, 2001.
[3] Billingsley, P. Probability and Measure, 3rd edition. New York: Wiley, 1995.
[4] Casella, G., and R.L. Berger. Statistical Inference. Pacific Grove, CA: Duxbury, 2002.
[5] Cramer, H. Mathematical Methods of Statistics. Princeton: Princeton University Press, 1946.
[6] Goldberger, A.S. Econometric Theory. New York: Wiley, 1964.
[7] Lindgren, B.W. Statistical Theory, 3rd edition. New York: Macmillan Publishing Company, 1976.
[8] Rao, C.R. Linear Statistical Inference and its Applications, 2nd edition. New York: Wiley, 1973.

SOME THEOREMS ON QUADRATIC FORMS AND NORMAL VARIABLES

1. THE MULTIVARIATE NORMAL DISTRIBUTION

The n × 1 vector of random variables y is said to be distributed as a multivariate normal with mean vector μ and
variance–covariance matrix Σ, denoted y ~ N(μ, Σ), if the density of y is given by

f(y) = (2π)^{−n/2} |Σ|^{−1/2} exp[ −(1/2)(y − μ)'Σ⁻¹(y − μ) ]   (1)

Consider the special case where n = 1, y = y1, μ = μ1, Σ = σ². Then

f(y1) = (2πσ²)^{−1/2} exp[ −(y1 − μ1)²/(2σ²) ]   (2)

which is just the normal density for a single random variable.

2. THEOREMS ON QUADRATIC FORMS IN NORMAL VARIABLES

2.1. Quadratic Form Theorem 1.

Theorem 1. If y ~ N(μy, Σy), then z = Ay ~ N(μz = Aμy, Σz = AΣyA'), where A is a matrix of constants.

2.1.1. Proof.

E[Ay] = A E[y] = Aμy
var(Ay) = E[(Ay − Aμy)(Ay − Aμy)'] = E[A(y − μy)(y − μy)'A'] = A E[(y − μy)(y − μy)'] A' = AΣyA'   (3)

Date: July 12, 2004.

2.1.2. Example. Let Y1, ..., Yn denote a random sample drawn from N(μ, σ²). Then

y = (Y1, ..., Yn)' ~ N(μι, σ²I)   (4)

where ι is an n × 1 vector of ones. Now Theorem 1 implies that the sample mean

Ȳ = (1/n)ι'y = (1/n)(Y1 + ... + Yn)   (5)

is normally distributed, with

E[Ȳ] = (1/n)ι'(μι) = μ  and  var(Ȳ) = (1/n)ι'(σ²I)ι(1/n) = σ²/n

2.2. Quadratic Form Theorem 2.

Theorem 2. Let the n × 1 vector y ~ N(0, I). Then y'y ~ χ²(n).

Proof. Consider that each yi is an independent standard normal variable. Write out y'y in summation notation as

y'y = Σ_{i=1}^n yi²   (6)

which is the sum of squares of n standard normal variables.

2.3. Quadratic Form Theorem 3.

Theorem 3. If y ~ N(0, σ²I) and M is a symmetric idempotent matrix of rank m, then

y'My/σ² ~ χ²(tr M)   (7)

Proof. Since M is symmetric, it can be diagonalized with an orthogonal matrix Q. This means that

Q'MQ = Λ = diag(λ1, λ2, ..., λn)   (8)

Furthermore, since M is idempotent, all these roots are either zero or one. Thus we can choose Q so that Λ will look like

Λ = [ I 0 ; 0 0 ]   (9)

The dimension of the identity block will be equal to the rank of M, since the number of non-zero roots is the rank of the matrix. Since the sum of the roots is equal to the trace, the dimension is also equal to the trace of M. Now let v = Q'y and compute the moments of v:

E[v] = Q'E[y] = 0
var(v) = Q'(σ²I)Q = σ²Q'Q = σ²I   (10)

since Q is orthogonal. Thus v ~ N(0, σ²I). Now consider the distribution of y'My using the transformation v. Since Q is orthogonal, its inverse is equal to its transpose. This means that y = Qv. Now write the quadratic form as follows:
y'My/σ² = v'(Q'MQ)v/σ² = v'Λv/σ² = Σ_{i=1}^{tr M} (vi/σ)²   (11)

This is the sum of squares of tr M standard normal variables, and so is a χ² variable with tr M degrees of freedom.

Corollary. If the n × 1 vector y ~ N(0, I) and the n × n matrix A is idempotent and of rank m, then y'Ay ~ χ²(m).

2.4. Quadratic Form Theorem 4.

Theorem 4. If y ~ N(0, σ²I), M is a symmetric idempotent matrix of order n, and L is a k × n matrix, then Ly and y'My are independently distributed if LM = 0.

Proof. Define the matrix Q as before, so that

Q'MQ = Λ = [ I 0 ; 0 0 ]   (12)

Let r denote the dimension of the identity block, which is equal to the rank of M; thus r = tr M. Let v = Q'y and partition v as follows:

v = [ v1 ; v2 ],  v1 = (v1, ..., vr)',  v2 = (v_{r+1}, ..., vn)'   (13)

The number of elements of v1 is r, while v2 contains n − r elements. Clearly v1 and v2 are independent of each other, since they are independent normals. What we will show now is that y'My depends only on v1 and Ly depends only on v2. Given that the vi are independent, y'My and Ly will then be independent. First use Theorem 3 to note that

y'My = v'(Q'MQ)v = v' [ I 0 ; 0 0 ] v = v1'v1   (14)

Now consider the product of L and Q, which we denote C. Partition C as (C1, C2); C1 has k rows and r columns, C2 has k rows and n − r columns. Now consider the following product:

C(Q'MQ) = LQ(Q'MQ), since C = LQ
        = LMQ = 0, since LM = 0 by assumption   (15)
Now consider the product of C and the matrix Q'MQ:

C(Q'MQ) = (C1, C2) [ I 0 ; 0 0 ] = (C1, 0) = 0   (16)

This of course implies that C1 = 0. This then implies that

LQ = C = (0, C2)   (17)

Now consider Ly. It can be written as

Ly = LQQ'y, since Q is orthogonal
   = Cv, by definition of C and v   (18)
   = C2v2, since C1 = 0

Now note that Ly depends only on v2, and y'My depends only on v1. But since v1 and v2 are independent, so are Ly and y'My.

2.5. Quadratic Form Theorem 5.

Theorem 5. Let the n × 1 vector y ~ N(0, I), let A be an n × n idempotent matrix of rank m, let B be an n × n idempotent matrix of rank s, and suppose BA = 0. Then y'Ay and y'By are independently distributed χ² variables.

Proof. By Theorem 3, both quadratic forms are distributed as chi-square variables. We need only demonstrate their independence. Define the matrix Q as before, so that

Q'AQ = Λ   (19)

Let v = Q'y and partition v as

v = [ v1 ; v2 ],  v1 = (v1, ..., vr)',  v2 = (v_{r+1}, ..., vn)'   (20)

Now form the quadratic form y'Ay and note that

y'Ay = v'(Q'AQ)v = v'Λv = v1'v1   (21)

Now define G = Q'BQ. Since B is only considered as part of a quadratic form, we may take it to be symmetric, and thus G is also symmetric. Now form the product GΛ = Q'BQQ'AQ. Since Q is orthogonal, its transpose is equal to its inverse, and we can write GΛ = Q'BAQ = 0, since BA = 0 by assumption. Now write out this identity in partitioned form as

GΛ = [ G1 G2 ; G2' G3 ] [ I 0 ; 0 0 ] = [ G1 0 ; G2' 0 ] = [ 0 0 ; 0 0 ]   (22)

where G1 is r × r, G2 is r × (n − r), and G3 is (n − r) × (n − r). This means that G1 = 0 and G2 = 0. This means that G is given by

G = [ 0 0 ; 0 G3 ]   (23)

Given this information, write the quadratic form in B as

y'By = y'QQ'BQQ'y = v'Gv = (v1', v2') [ 0 0 ; 0 G3 ] [ v1 ; v2 ] = v2'G3v2   (24)

It is now obvious that y'Ay can be written in terms of the first r elements of v, while y'By can be written in terms of the last n − r elements of v. Since the v's are independent, the result follows.

2.6. Quadratic Form Theorem 6 (Craig's Theorem).

Theorem 6. If y ~ N(μ, Ω), where Ω is positive definite, then q1 = y'Ay and q2 = y'By are independently distributed iff AΩB = 0.

Proof of sufficiency. This is just a generalization of Theorem 5. Since Ω is a covariance matrix of full rank, it is positive definite and can be factored as Ω = TT'. Therefore the condition AΩB = 0 can be written ATT'B = 0. Now premultiply this expression by T' and postmultiply by T to obtain T'ATT'BT = 0. Now define C = T'AT and K = T'BT, and note that if AΩB = 0, then

CK = T'ATT'BT = T'(AΩB)T = T'0T = 0   (25)

Consequently, due to the symmetry of C and K, we also have

0 = 0' = (CK)' = K'C' = KC   (26)

Thus CK = 0 and KC = 0, so KC = CK. A simultaneous diagonalization theorem in matrix algebra ([9], Theorem 4.15, p. 155) says that if CK = KC, then
there exists an orthogonal matrix Q such that

Q'CQ = [ D1 0 ; 0 0 ],  Q'KQ = [ 0 0 ; 0 D2 ]   (27)

where D1 is an n1 × n1 diagonal matrix and D2 is an (n − n1) × (n − n1) diagonal matrix. Now define v = Q'T⁻¹y. It is then distributed as a normal variable with expected value and variance given by

E[v] = Q'T⁻¹μ
var(v) = Q'T⁻¹Ω(T⁻¹)'Q = Q'T⁻¹TT'(T⁻¹)'Q = Q'Q = I   (28)

Thus the vector v is a vector of independent standard-variance normal variables. Now consider q1 = y'Ay in terms of v. First note that y = TQv and that y' = v'Q'T'. Now write out y'Ay as follows:

q1 = y'Ay = v'Q'T'ATQv = v'Q'CQv = v1'D1v1   (29)

Similarly, we can define q2 = y'By in terms of v as

q2 = y'By = v'Q'T'BTQv = v'Q'KQv = v2'D2v2   (30)

Thus q1 = y'Ay is defined in terms of the first n1 elements of v, and q2 = y'By is defined in terms of the last n − n1 elements of v, and so they are independent. The proof of necessity is difficult and has a long history [2, 3].

2.7. Quadratic Form Theorem 7.

Theorem 7. If y is an n × 1 random variable and y ~ N(μ, Σ), then

(y − μ)'Σ⁻¹(y − μ) ~ χ²(n)

Proof. Let w = (y − μ)'Σ⁻¹(y − μ). If we can show that w = z'z, where z is distributed as N(0, I), then the proof is complete. Start by diagonalizing Σ with an orthogonal matrix Q. Since Σ is positive definite, all the elements of the diagonal matrix Λ will be positive:

Q'ΣQ = Λ = diag(λ1, λ2, ..., λn)   (31)

Λ^{−1/2} = diag(1/√λ1, 1/√λ2, ..., 1/√λn)   (32)

Now let the matrix H = QΛ^{−1/2}Q'. Obviously H is symmetric. Furthermore,

HH = QΛ^{−1/2}Q'QΛ^{−1/2}Q' = QΛ⁻¹Q' = Σ⁻¹   (33)

The last equality follows from the definition Σ = QΛQ', after taking the inverse of both sides and remembering that the inverse of an orthogonal matrix is equal to its transpose. Furthermore, it is obvious that

HΣH = QΛ^{−1/2}Q'(QΛQ')QΛ^{−1/2}Q' = QΛ^{−1/2}ΛΛ^{−1/2}Q' = QQ' = I   (34)

Now let ε = y − μ, so that ε ~ N(0, Σ). Now consider the distribution of z = Hε. It is a standard normal, since

E[z] = H E[ε] = 0
var(z) = H var(ε) H' = HΣH = I   (35)

Now write w as w = ε'Σ⁻¹ε and see that it is equal to z'z, as follows:

w = ε'Σ⁻¹ε = ε'HHε = (Hε)'(Hε) = z'z   (36)

2.8. Quadratic Form Theorem 8.

Theorem 8. Let y ~ N(0, I). Let M be a non-random
idempotent matrix of dimension n × n, rank(M) = r ≤ n. Let A be a non-random matrix such that AM = 0. Let t1 = My and let t2 = Ay. Then t1 and t2 are independent random vectors.

Proof. Since M is symmetric and idempotent, it can be diagonalized using an orthonormal matrix Q, as before:

Q'MQ = Λ = [ I_{r×r} 0_{r×(n−r)} ; 0_{(n−r)×r} 0_{(n−r)×(n−r)} ]   (37)

Further note that, since Q is orthogonal, M = QΛQ'. Now partition Q as Q = (Q1, Q2), where Q1 is n × r. Now use the fact that Q is orthonormal to obtain the following identities:

QQ' = Q1Q1' + Q2Q2' = In
Q1'Q1 = Ir,  Q2'Q2 = I_{n−r},  Q1'Q2 = 0   (38)
Q'Q = [ Q1'Q1 Q1'Q2 ; Q2'Q1 Q2'Q2 ] = [ Ir 0 ; 0 I_{n−r} ]

Now multiply Λ by Q to obtain

QΛ = (Q1, Q2) [ I 0 ; 0 0 ] = (Q1, 0)   (39)

Now compute M as

M = QΛQ' = (Q1, 0) [ Q1' ; Q2' ] = Q1Q1'   (40)

Now let z1 = Q1'y and let z2 = Q2'y. Note that z = (z1', z2')' = Q'y is a standard normal, since E[z] = 0 and var(z) = Q'Q = I. Furthermore, z1 and z2 are independent. Now consider t1 = My. Rewrite this using (40) as

t1 = My = Q1Q1'y = Q1z1

Thus t1 depends only on z1. Now let the matrix N = I − M = Q2Q2', from (38) and (40). Now notice that

AN = A(I − M) = A − AM = A, since AM = 0

Now consider t2 = Ay. Replace A with AN to obtain

t2 = Ay = ANy = AQ2Q2'y = AQ2z2   (41)

Now t1 depends only on z1, and t2 depends only on z2, and since the z's are independent, the t's are also independent.

REFERENCES
[1] Cramer, H. Mathematical Methods of Statistics. Princeton: Princeton University Press, 1946.
[2] Driscoll, M.F., and W.R. Gundberg, Jr. "A History of the Development of Craig's Theorem." The American Statistician 40 (1986): 65–69.
[3] Driscoll, M.F., and B. Krasnicka. "An Accessible Proof of Craig's Theorem in the General Case." The American Statistician 49 (1995): 59–61.
[4] Goldberger, A.S. Econometric Theory. New York: Wiley, 1964.
[5] Goldberger, A.S. A Course in Econometrics. Cambridge: Harvard University Press, 1991.
[6] Hocking, R.R. The Analysis of Linear Models. Monterey: Brooks/Cole, 1986.
[7] Hocking, R.R. Methods and Applications of Linear Models. New York: Wiley, 1996.
[8] Rao, C.R. Linear Statistical Inference and its Applications, 2nd edition. New York: Wiley, 1973.
[9] Schott, J.R. Matrix
Analysis for Statistics. New York: Wiley, 1997.

Statistical Inference in the Classical Linear Regression Model
1 September 2004

A. Introduction
In this section we will summarize the properties of estimators in the classical linear regression model previously developed, make additional distributional assumptions, and develop further properties associated with the added assumptions. Before presenting the results, it will be useful to summarize the structure of the model and some of the algebraic and statistical results presented elsewhere.

B. Statement of the classical linear regression model
The classical linear regression model can be written in a variety of forms. Using summation notation, we write it as

yt = β1 + β2 xt2 + β3 xt3 + ... + βk xtk + εt  ∀t   (1)   linear model
E[εt | xt2, ..., xtk] = 0  ∀t   (2)   zero mean
Var[εt | xt2, ..., xtk] = σ²  ∀t   (3)   homoskedasticity
E[εt εs] = 0  ∀ t ≠ s   (4)   no autocorrelation
xtj is a known constant (the x's are nonstochastic)   (5a)
no xj is a linear combination of the other x's   (5b)
εt ~ N(0, σ²)   (6)   normality

We can also write it in matrix notation as follows:

I.   y = Xβ + ε
II.  E[ε | X] = 0
III. E[εε'] = σ²I
IV.  X is a nonstochastic matrix of rank k
V.   ε ~ N(0, σ²I)

The ordinary least squares estimator of β in the model is given by

β̂ = (X'X)⁻¹X'y = (X'X)⁻¹X'(Xβ + ε) = β + (X'X)⁻¹X'ε   (2)

The fitted value of y and the estimated vector of residuals e in the model are defined by

ŷ = Xβ̂,  e = y − ŷ = y − Xβ̂   (3)
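The matrix formulas above can be exercised numerically. A minimal sketch with made-up data (numpy only; the names X, y, b_hat are illustrative), computing the OLS estimator, fitted values, residuals, and confirming the normal equations X'e = 0:

```python
# Minimal numerical sketch of the OLS formulas, with made-up data.
import numpy as np

rng = np.random.default_rng(42)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # constant + 2 regressors
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y        # OLS estimator (X'X)^{-1} X'y
y_hat = X @ b_hat                # fitted values
e = y - y_hat                    # residuals
s2 = e @ e / (n - k)             # unbiased estimate of sigma^2 (see below)

# Normal equations: X'e = 0, i.e. residuals are orthogonal to the regressors.
print(np.allclose(X.T @ e, 0))  # True
```

Orthogonality of e to every column of X (including the constant, so the residuals sum to zero) is the algebraic core of the residual-maker matrix M_X discussed next.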
transforms a vector or matrix to deviations from the mean AIJ 0o 111 o o 1111 39 a 001 11 1 7 11 1 quot 1 139 1 r1 r1 39 39 1 1 a A is symmetric and idempotent b1 21 i J 2 J39AJ c1 An MX MX first column ofX is a column of ones Proof First write An in a different fashion noting that the vector of ones we called j is the same as the first column of the X matrix in a regression with a constant termi 1 391391 it I II 391l39 A n n It 391391 1 II I n 00 111 01111 s 001 11 1 8 100 1 010 1 l111 001 1 100 1 1391 010 1 1 111 11 1 001 1 1 I x1x1 x1391x139 Now consider the product of An and MX AnMX I 39 x1x1x11X1MX MX 1x1lx1 1x1IMX 9 From previous results MXX Onxk which implies that X MX 0km This then implies that x 39M 0quot 1 Given that this product is a row of zeroes it is clear that the entire second term vanishesi This then implies A MX MX 10 D1 Some results on traces of matrices The trace ofa square matrix is the sum of the diagonal elements and is denoted tr A or tr We will state Without proof some properties of the trace operator a trace In n h trkA k trA c1 trace A B trace A trace P1 T P 39u trAB trBA if both AB and BA are defined trace ABC 2 trace CAB trace BCA The results in part e hold as along as the matrices involved are conformable though the products may be different dimensions We will also use Theorem 17 from the lecture on characteristic roots and vectors A proof of this theorem is given there Theorem 17 Let A be a square symmetric idempotent matrix of order n and rank r Then the trace of A is equal to the rank of A ie trA rA Some theorems on quadratic forms and normal variables stated Without proof 1 N 9 b S 53 4 9 Quadratic Form Theorem 1 Ify Nltpy 2y then 2 Cy N N p CH 2 C By C39 Where C is a matrix of constants Quadratic Form Theorem 2 Let the nxl vector yNO I then y39y X2n Quadratic Form Theorem 3 If yNO02I and M is a symmetric idempotent matrix of rank m then 10 7 x2trM 11 Corollary If the nx1 vector yNOI and the nxn matrix A is idempotent and of rank m 
then y'Ay ~ χ²(m).

4. Quadratic Form Theorem 4. If y ~ N(0, σ²I), M is a symmetric idempotent matrix of order n, and L is a k × n matrix, then Ly and y'My are independently distributed if LM = 0.
5. Quadratic Form Theorem 5. Let the n × 1 vector y ~ N(0, I), let A be an n × n idempotent matrix of rank m, let B be an n × n idempotent matrix of rank s, and suppose BA = 0. Then y'Ay and y'By are independently distributed χ² variables.
6. Quadratic Form Theorem 6 (Craig's Theorem). If y ~ N(μ, Ω), where Ω is positive definite, then q1 = y'Ay and q2 = y'By are independently distributed iff AΩB = 0.
7. Quadratic Form Theorem 7. If y is an n × 1 random variable and y ~ N(μ, Σ), then (y − μ)'Σ⁻¹(y − μ) ~ χ²(n).
8. Quadratic Form Theorem 8. Let y ~ N(0, I). Let M be a nonrandom idempotent matrix of dimension n × n, rank(M) = r ≤ n. Let A be a nonrandom matrix such that AM = 0. Let t1 = My and let t2 = Ay. Then t1 and t2 are independent random vectors.

F. Some finite sample properties of the ordinary least squares estimator in the classical linear regression model can be derived without specific assumptions about the exact distribution of the error term.

1. Unbiasedness of β̂. Given the properties of the model, we can show that β̂ is unbiased as follows, if X is a nonstochastic matrix of full rank:

β̂ = (X'X)⁻¹X'y = (X'X)⁻¹X'(Xβ + ε) = β + (X'X)⁻¹X'ε
E[β̂] = β + E[(X'X)⁻¹X'ε] = β + (X'X)⁻¹X'E[ε] = β   (12)

2. Variance of y. We know that yt depends on the constants xt and β, and on the stochastic error εt. We write this as

yt = xt'β + εt,  t = 1, ..., n   (13)

This implies that

Var(yt) = Var(εt) = σ²,  t = 1, ..., n   (14)

Furthermore, E[εt εs] = 0 for t ≠ s, i.e., the covariance between yt and ys is zero, implying that

Var(y) = Var(ε) = σ²I   (15)

3. Variance of β̂. We can determine the variance of β̂ by writing it out and then using the information we have on the variance of y:

β̂ = (X'X)⁻¹X'y = Cy
Var(β̂) = (X'X)⁻¹X' Var(y) X(X'X)⁻¹ = (X'X)⁻¹X'(σ²I)X(X'X)⁻¹ = σ²(X'X)⁻¹X'X(X'X)⁻¹ = σ²(X'X)⁻¹ = σ²CC'   (16)

4. β̂ is the best linear unbiased estimator of β. We can show that β̂ is the best linear unbiased estimator of β by showing that any other linear unbiased estimator has a variance which is larger than the variance of β̂ by a positive semidefinite matrix. The least squares estimator is given by

β̂ = (X'X)⁻¹X'y = Cy   (17)
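The comparison this argument develops can be previewed numerically: any linear unbiased estimator Gy with GX = I can be built as C plus a term that annihilates X, and its variance exceeds σ²(X'X)⁻¹ by a positive semidefinite matrix. A sketch with made-up data (W is an arbitrary, hypothetical weighting matrix used only to generate such a G):

```python
# Numerical sketch of the Gauss-Markov comparison: var(Gy) - var(OLS) is psd.
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
C = np.linalg.inv(X.T @ X) @ X.T        # OLS weights, CX = I
M = np.eye(n) - X @ C                   # residual maker, MX = 0

W = rng.normal(size=(k, n))             # arbitrary (hypothetical) matrix
D = W @ M                               # DX = WMX = 0
G = C + D                               # GX = CX + DX = I, so Gy is unbiased

sigma2 = 1.0
var_ols = sigma2 * np.linalg.inv(X.T @ X)
var_alt = sigma2 * (G @ G.T)
diff = var_alt - var_ols                # equals sigma2 * D D' (psd)

eigs = np.linalg.eigvalsh(diff)
print(np.allclose(G @ X, np.eye(k)), eigs.min() >= -1e-10)  # True True
```

The cross terms CD' vanish because X'M = 0, which is exactly the step used in the algebraic derivation that follows.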
which is larger that the variance of G by a positive definite matrix The least squares estimator is given y B X39XV X39J Cy 17 Consider another linear unbiased estimator given by 3 G Linearity is imposed by the linear form of We can determine the restrictions on G for B to be unbiased by writing it out as S follows J X9 6 3 G Em EGJ EGXB Ge 18 6138 EGXD GXI i unhiaud The variance of p is similar to the variance of a Varm ozGG39 19 Now let D G C G XIX 1X so that G D Ci Now rewrite the variance of 3 as Vari OZGG39 uZDcDc39 azDD CD DC CC 20 02DD XIO1XIDI DXXIX1 XIO1XIXXIlt1 02DD XIQ1XIDI DXXIX1 X90711 Now substitute in equation 20 for D G X39X 1X39 and D39 GX 1k and X G 1k VarB 02DD X XquotX D DXX Xquot X Xquot azDD ozltX39XgtquotX39 39 XltX39X 02G X XquotX XX Xquot 02000quot ozDD 02X Xquot 02X X391 02X X391 02X Xquot 02X X391 OZDD39 02X X391 ozDD Vardi G 39 XX39X 391 noting that 21 The variance of 3 is thus the variance of a plus a matrix that can be shown to be positive definite Unbiasedness of s2 Given the properties of the model we can show that s2 is an unbiased estimator of 02 First write e39e as a function of e e MXj MXXB e MXB Jinn MX 0 22 e e 839MX39MX8 eMXc Now take the expected value ofe39e use the property of the trace operator that tr ABC 2 tr BCA and then simplify 9 e Me E e e E e Me nk Izk Etrc Mc nk Mss nk trME8839 23 nl MOZI nk oztrM nk E We find the trace of M using the properties on sums products and iden ty matrices trMx WU 39 XX39X391X39 I 39 XXX391X39 ma trX XX X391 24 I Ii n h Covariance of a and e Given the properties of the model we can show that the covariance of B and e is zero First write bothB and e as functions ofe from equations 2 and 5 J X13 8 B 5 ltX XquotX39e 25 e MXy Remember thatB has an expected valued of Bhecause it is unbiased We can show that e has an expected valued of zero as follows X3 e MX I XX39X391X e MX MXXB I5 0 MXs MXX 0 26 MXe Ee EMX8 MXEe 0 We then have t3 Edi B B XXf X39s 2 eEce 7 Now compute the covariance directly 
Cov(β̂, e) = E[(β̂ − β)(e − 0)']
          = E[(X'X)⁻¹X'εε'M_X]
          = (X'X)⁻¹X'E[εε']M_X   (M_X symmetric)
          = (X'X)⁻¹X'(σ²I)M_X
          = σ²(X'X)⁻¹X'M_X
          = σ²(X'X)⁻¹X'[I − X(X'X)⁻¹X']
          = σ²[(X'X)⁻¹X' − (X'X)⁻¹X'X(X'X)⁻¹X']
          = σ²[(X'X)⁻¹X' − (X'X)⁻¹X'] = 0   (28)

G. Distribution of β̂ given normally distributed errors

1. Introduction. Now make the assumption that εt ~ N(0, σ²), or ε ~ N(0, Σ = σ²I). Given that

y = Xβ + ε,  X is a nonstochastic matrix of rank k,  ε ~ N(0, σ²I)   (29)

then y is also distributed normally, because we are simply adding a constant vector to the random vector ε; the error vector ε is not transformed in forming y. Given E[ε] = 0, E[y] = Xβ and Var(y) = σ²I, we then have

y ~ N(Xβ, Σ = σ²I)   (30)

2. Exact distribution of β̂. We can write β̂ as a linear function of the normal random variable y, from equation (2), as follows:

β̂ = (X'X)⁻¹X'y = Cy,  C = (X'X)⁻¹X'   (31)

We can find its distribution by applying Quadratic Form Theorem 1. From this theorem, E[β̂] = C E[y] and Var(β̂) = C Var(y) C'. Substituting, we obtain

E[β̂] = (X'X)⁻¹X'Xβ = β
Var(β̂) = (X'X)⁻¹X' Var(y) X(X'X)⁻¹ = (X'X)⁻¹X'(σ²I)X(X'X)⁻¹ = σ²(X'X)⁻¹   (32)

Therefore we have

β̂ ~ N(β, σ²(X'X)⁻¹)   (33)

We can also show this by viewing β̂ directly as a function of ε and then applying the theorem:

β̂ = β + (X'X)⁻¹X'ε = β + Cε,  ε ~ N(0, σ²I)
β̂ ~ N(β + C·0, C(σ²I)C') = N(β, σ²(X'X)⁻¹)   (34)

3. Distribution of s². Consider the quantity

(n − k)s²/σ²   (35)

This can be written

(n − k)s²/σ² = (n − k)e'e/[(n − k)σ²] = e'e/σ² = ε'M_X ε/σ² = (ε/σ)'M_X(ε/σ)   (36)

The random vector ε/σ is a standard normal with mean zero and variance I. The matrix M_X is symmetric and idempotent. By Theorem 3 on quadratic forms, this ratio is distributed as a χ² variable with n − k degrees of freedom; that is,

(n − k)s²/σ² ~ χ²(n − k)   (37)

where we found the trace of M_X in equation (24). Given that (n − k)s²/σ² ~ χ²(n − k), we can use the properties of chi-squared random variables to find the variance of s². First remember that the variance of a χ² variable is equal to twice its degrees of freedom, i.e., Var(χ²(v)) = 2v. Now rearrange equation (37) as follows:

Var[(n − k)s²/σ²] = 2(n − k)
[(n − k)²/σ⁴] Var(s²) = 2(n − k)   (38)
Var(s²) = 2σ⁴/(n − k)

H. Sampling distribution of β̂

1. Sample variance of β̂. We showed in equation (34) that β̂ ~ N(β, σ²(X'X)⁻¹). We can
write the variance of β̂ as

Var(β̂) = E[(β̂ − β)(β̂ − β)'] = σ²(X'X)⁻¹   (40)

We can estimate this using s² as an estimate of σ²:

Est. Var(β̂) = s²(X'X)⁻¹   (41)

Note that the individual variances of the coefficients are equal to an element of (X'X)⁻¹, say the iith, times s². Using s^{ij} for the ijth element of (X'X)⁻¹ is a sometimes confusing notation, but seems to be standard.

2. Distribution of (β̂ − β)/σ. First consider the moments of β̂. From equation (2), write β̂ as a function of ε:

β̂ = (X'X)⁻¹X'y = β + (X'X)⁻¹X'ε   (42)

As usual, define C = (X'X)⁻¹X' and write (42) as

β̂ = β + Cε  ⟹  β̂ − β = Cε   (43)

Now compute the mean and variance of β̂ − β:

E[β̂ − β] = E[Cε] = C E[ε] = 0
Var(β̂ − β) = C Var(ε) C' = (X'X)⁻¹X'(σ²I)X(X'X)⁻¹ = σ²(X'X)⁻¹   (44)

We noted previously that

ε/σ ~ N(0, I)   (45)

Now consider the moments of (β̂ − β)/σ. This can be written in a manner similar to (44), using the matrix C, as follows:

(β̂ − β)/σ = C(ε/σ)
E[(β̂ − β)/σ] = C E[ε/σ] = 0
Var[(β̂ − β)/σ] = C Var(ε/σ) C' = (X'X)⁻¹X' I X(X'X)⁻¹ = (X'X)⁻¹   (46)

Given that ε is distributed normally, this implies that

(β̂ − β)/σ ~ N(0, (X'X)⁻¹)   (47)

Now consider a single element of (β̂ − β)/σ:

(β̂i − βi)/σ ~ N(0, s^{ii})   (48)

To create a N(0, 1) variable we divide the left-hand side by √s^{ii}, the square root of the appropriate element on the diagonal of (X'X)⁻¹. Doing so, we obtain

(β̂i − βi)/(σ√s^{ii}) ~ N(0, 1)   (49)

3. Distribution of (n − k)s²s^{ii}/(σ²s^{ii}). We start by recalling the discussion of the distribution of s² from equation (37):

(n − k)s²/σ² ~ χ²(n − k)   (50)

Now multiply the numerator and denominator of this expression by s^{ii}, as follows:

(n − k)s²s^{ii}/(σ²s^{ii}) ~ χ²(n − k)   (51)

Given that the numerator and denominator are multiplied by the same thing, the distribution does not change.

4. Distribution of (β̂i − βi)/(s√s^{ii}). We start by dividing the expression in equation (51) by (n − k) and then taking its square root, as follows:

√[ (n − k)s²s^{ii}/(σ²s^{ii}) / (n − k) ] = √( s²s^{ii}/(σ²s^{ii}) ) = (s√s^{ii})/(σ√s^{ii})   (52)

Now form the ratio

[ (β̂i − βi)/(σ√s^{ii}) ] / [ (s√s^{ii})/(σ√s^{ii}) ] = (β̂i − βi)/(s√s^{ii})   (53)

Equation (53) is the ratio of a N(0, 1) variable (from equation (49)) and the square root of a chi-squared random variable divided by its degrees of freedom (from equation (52)). If we can show that these two
variables are independent then the expression in equation 53 is distributed as a t random variable with n k degrees of freedomi Given that multiplying the numerator and denominator of 50 by the constant sii to obtain 51 and the denominator of 48 by will not affect independence we will show independence of the terms in 53 by showing independence of 48 and 50 These two equations are both functions of the same standard normal variable We can show that the are inde endent as follows First write 6 as a function of as in e uation 47 Y P o q 8 C 54 122 2 Then write as a function of as in equation 36 a 0 8 I 0 8 0 M 2 quot f x 55 0 Jr 16 E We showed that has a mean of zero and a variance of 1 in equation 45 Now consider Quadratic CI Form Theorem 4 which we repeat here for convenience Quadratic Form Theorem 4 If yNO02I M is a symmetric idempotent matrix of order n and L is a kxn matrix then Ly and y My are independently distributed if LM 0 Apply the theorem withE in the place ofy MX is the place ofM and C in the place of L If CMX O 0 then the numerator and denominator of equation 53 are independent We can show this as follows CMX X XquotX I XX XquotX XIX 1XI XIX 1XIXXIX 1XI X gtltquotX X X391X 0 56 What we have then shown is that Bi pi I9 9 Bi 1 x2X39X 12 B p 57 Hull m a y m 0 Hypotheses of the form HD 3 3m can he tested using the result 0 tn 1e 58 In independence of 9 and e under normality First use equation 43 and equation 26 to writea and e as functions ofe as follows a p X39Xgt391X39e 5 Ca a p c e MX MXX5 c MK 59 Now consider application of Theorem 8 on quadratic forms Given possible confusion with the variable y in the theorem as stated earlier and the y in our model we restate the theorem with u replacing y as follows Quadratic Form Theorem 8 Let u NO Let M be a nonrandom idempotent matrix of dimension nxn rankMr g n Let A be a nonrandom matrix such that AM O Let t1 Mu and let t2 Aui Then t1 and t2 are independent random vectors We will let MX replace M and C replace 
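The t statistic of equations 53 and 58 can be sketched numerically. The following is a minimal illustration with simulated data (the variable names and the simulated design are ours, not from the notes):

```python
import numpy as np

# Sketch of the t statistic in equations 53 and 58:
# t_i = (b_i - beta_i0) / (s * sqrt(s^ii)), where s^ii is the i-th
# diagonal element of (X'X)^{-1}. Simulated, illustrative data.
rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                 # OLS: (X'X)^{-1} X'y
e = y - X @ b                         # residuals
s2 = e @ e / (n - k)                  # s^2 = SSE / (n - k)
se = np.sqrt(s2 * np.diag(XtX_inv))   # s * sqrt(s^ii)
t_stats = b / se                      # t statistics for H0: beta_i = 0
```

The residuals are orthogonal to the columns of X by the normal equations, which is what makes s² and β̂ independent under normality.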
When we apply the theorem, let u = (1/σ)ε, so that ε = σu. Clearly u ~ N(0, I). Now rewrite the expressions in equation 59, replacing ε with σu, as follows:

  β̂ − β = Cσu,  e = M_Xσu.  (60)

Now define the new variables z₁ and z₂ as

  z₁ = M_Xu,  z₂ = Cu,  so that e = σz₁ and β̂ − β = σz₂.  (61)

The theorem states that if CM_X = 0, then z₁ and z₂ are independent, and so are e = σz₁ and β̂ − β = σz₂. We have shown previously that CM_X = 0, as follows:

  CM_X = (X′X)⁻¹X′(I − X(X′X)⁻¹X′)
       = (X′X)⁻¹X′ − (X′X)⁻¹X′X(X′X)⁻¹X′
       = (X′X)⁻¹X′ − (X′X)⁻¹X′ = 0.  (62)

So the estimate of β is independent of the estimated residual vector in the regression equation.

Distribution of certain quadratic forms representing sums of squares. We will consider the statistical distribution of the following quadratic forms:

  SST = Σₜ (yₜ − ȳ)²,  SSE = Σₜ (yₜ − ŷₜ)² = e′e,  SSR = Σₜ (ŷₜ − ȳ)².  (63)

We will be able to show that they are chi-squared variables and thus useful in performing statistical tests. It will be useful to write SST in terms of the deviation matrix A_n. When the n×n matrix A_n premultiplies any n-vector y, the resulting n-vector is each element of y minus the mean of the y's. Specifically,

  A_n = I_n − (1/n)J_n,  where J_n is the n×n matrix of ones, so that A_n y = (y₁ − ȳ, …, y_n − ȳ)′.  (64)

Clearly, then,

  (A_n y)′(A_n y) = Σₜ (yₜ − ȳ)².  (65)

Given that A_n is symmetric and idempotent, we can also write

  SST = y′A_n y = Σₜ (yₜ − ȳ)².  (66)

(a) Distribution of SSE = (n−k)s². Given that s² = SSE/(n−k), we already know that SSE/σ² is a chi-squared variable. The demonstration is direct, given that ε/σ is a N(0, I) variable. First write SSE as e′e, and remember that we can write e as a function of ε from equation 22:

  e = M_X y = M_X(Xβ + ε) = M_Xε  ⇒  SSE = e′e = ε′M_X′M_Xε = ε′M_Xε.  (67)

Consider now the following expression and its distribution:

  SSE/σ² = e′e/σ² = ε′M_Xε/σ².  (68)

By appropriately rearranging equation 68 we can invoke Quadratic Form Theorem 3 as before. Divide each element of ε in 68 by σ to obtain a standard normal variable and then rewrite as follows:

  ε/σ ~ N(0, I)  ⇒  SSE/σ² = (ε/σ)′M_X(ε/σ) ~ χ²(tr M_X) = χ²(n−k).  (69)

(b) Distribution of SSR. Write SSR as the difference between SST and SSE:

  SSR = SST − SSE = y′A_n y − e′e = y′A_n y − ε′M_Xε.  (70)

Because SSR measures the sum of squares due to the inclusion of the slope coefficients β₂, β₃, …, β_k, we need to consider the model with this fact explicitly represented:

  y = Xβ + ε = x₁β₁ + X₂β₂ + ε,  (71)

where x₁ is the column of ones and X₂ contains the remaining k−1 regressors. Now we need to consider the properties of A_n in relation to this rewritten model. Note that A_n multiplied by a column of constants will yield the zero vector, because the mean of the column will equal each element of the column. This is specifically true for x₁ in equation 71:

  A_n x₁ = 0  ⇒  A_n y = A_n x₁β₁ + A_n X₂β₂ + A_nε = A_n X₂β₂ + A_nε.  (72)

This then implies that we can obtain β̂₂ by a regression of deviations of variables from their column means. It also means that we can write the vector of deviations of each element of y from its mean as

  A_n y = A_n X₂β₂ + A_nε.  (73)

Now construct SSR using this information:

  SSR = y′A_n y − ε′M_Xε.  (74)

Now substitute from equation 71 for y in equation 74:

  SSR = (x₁β₁ + X₂β₂ + ε)′A_n(x₁β₁ + X₂β₂ + ε) − ε′M_Xε.  (75)

The terms containing x₁′A_n will be zero from equation 72. We can also reverse the order in terms as they are conformable, given that we are computing a scalar, so we have

  SSR = β₂′X₂′A_nX₂β₂ + 2β₂′X₂′A_nε + ε′A_nε − ε′M_Xε.
  If β₂ = 0:  SSR = ε′(A_n − M_X)ε.  (76)

Now we want to find the distribution of the ratio

  SSR/σ² = ε′(A_n − M_X)ε/σ².  (77)

We know from equation 45 that ε/σ is a N(0, I) variable. If we apply Quadratic Form Theorem 3, we then obtain

  SSR/σ² = (ε/σ)′(A_n − M_X)(ε/σ) ~ χ²(tr(A_n − M_X)),  (78)

provided A_n − M_X is symmetric and idempotent. Clearly A_n − M_X is symmetric, given that A_n and M_X are both symmetric. To check whether it is idempotent, write it out:

  (A_n − M_X)(A_n − M_X) = A_nA_n − A_nM_X − M_XA_n + M_XM_X = A_n − 2A_nM_X + M_X.  (79)

Then remember from equation 10 that

  A_nM_X = M_X.  (80)

So we have

  (A_n − M_X)(A_n − M_X) = A_n − 2M_X + M_X = A_n − M_X.  (81)

We need only to determine the trace of A_n − M_X. The trace of the sum of matrices is equal to the sum of the traces:

  tr A_n = tr(I_n − (1/n)J_n) = n − 1.  (82)

Now find the trace of M_X:

  tr M_X = tr I_n − tr(X(X′X)⁻¹X′) = n − tr((X′X)⁻¹X′X) = n − tr I_k = n − k,  (83)

using the invariance of the trace to cyclic changes in the order of multiplication. Combining the information from equations 82 and 83, we obtain

  tr(A_n − M_X) = (n − 1) − (n − k) = k − 1.  (84)

To summarize:

  SSR/σ² = (ε/σ)′(A_n − M_X)(ε/σ) ~ χ²(k−1),  if all slope coefficients β₂, …, β_k are zero.  (85)

(c) Distribution of SST. We showed in equation 66 that SST can be written as

  SST = Σₜ (yₜ − ȳ)² = y′A_n y.  (86)

As discussed earlier in the section on probability distributions, the sum of two independent χ² variables is also χ², with degrees of freedom equal to the sum of the degrees of freedom of the variables in the sum. If SSE and SSR are independent, then SST/σ² will be distributed as a χ² variable with (n−k) + (k−1) = n−1 degrees of freedom. The question is whether SSR and SSE are independent. To show independence, we will use Quadratic Form Theorem 5, which we repeat here for convenience.

Quadratic Form Theorem 5. Let the n×1 vector y ~ N(0, I), let A be an n×n idempotent matrix of rank m, let B be an n×n idempotent matrix of rank s, and suppose BA = 0. Then y′Ay and y′By are independently distributed χ² variables.

To show independence, we must show that the product of the matrices in the two quadratic forms,

  SSE/σ² = (ε/σ)′M_X(ε/σ) ~ χ²(n−k)  and  SSR/σ² = (ε/σ)′(A_n − M_X)(ε/σ) ~ χ²(k−1),  (87)

is zero. So we must show that the product of the matrices M_X and A_n − M_X is zero:

  (A_n − M_X)M_X = A_nM_X − M_XM_X = M_X − M_X = 0,  (88)

using equation 10 and the idempotency of M_X. Therefore

  SST/σ² ~ χ²(n−1),  adding the degrees of freedom: (n−k) + (k−1) = n−1.  (89)

Tests for significance of the regression. Suppose we want to test the following hypothesis:

  H₀: β₂ = β₃ = ⋯ = β_k = 0.

This hypothesis tests for the statistical significance of overall explanatory power, i.e., yₜ = β₁ + εₜ (all non-intercept coefficients equal to zero). The best way to test this is by using information on the sums of squares due to the regression, the error, and the total.
Recall that the total sum of squares can be partitioned as

  Σₜ (yₜ − ȳ)² = Σₜ (yₜ − ŷₜ)² + Σₜ (ŷₜ − ȳ)²,  i.e.,  SST = SSE + SSR.  (90)

Dividing both sides of the equation by σ² yields quadratic forms which have chi-square distributions, as above. From the section on probability distributions, we know that the ratio of two independent chi-square variables, each divided by its degrees of freedom, is an F random variable. This result provides the basis for using

  F = [SSR/(k−1)] / [SSE/(n−k)] ~ F(k−1, n−k)  (91)

to test the hypothesis that β₂ = β₃ = ⋯ = β_k = 0. Also note that

  SSR/SSE = (SSR/SST) / (SSE/SST) = R² / (1 − R²),  (92)

hence the F statistic can be rewritten as

  F = [R²/(k−1)] / [(1−R²)/(n−k)] ~ F(k−1, n−k).  (93)

If the computed F statistic is larger than the tabled value, then we reject the hypothesis that β₂ = ⋯ = β_k = 0 (all slope coefficients are zero).

Tests of a single linear restriction on β.

(1) Idea. Sometimes we want to test a hypothesis regarding a linear combination of the β's in the classical linear regression model. Such a hypothesis can be written

  H₀: δ′β = γ,  where δ is k×1.  (94)

For example, to test that β₂ = β₃ in a model with 4 regressors (X is n×4), we might define δ and γ as follows:

  δ′ = (0, 1, −1, 0),  γ = 0,  so that δ′β = β₂ − β₃ = 0.  (95)

(2) Distribution of δ′β̂. We have previously (equation 40) shown that the variance of β̂ is given by

  Var(β̂) = σ²(X′X)⁻¹.  (96)

This then implies that the variance of δ′β̂ is given by

  Var(δ′β̂) = σ²δ′(X′X)⁻¹δ.  (97)

So if the null hypothesis that δ′β = γ is true, then δ′β̂ is distributed normally with mean γ and variance σ²δ′(X′X)⁻¹δ, i.e.,

  δ′β̂ ~ N(δ′β, σ²δ′(X′X)⁻¹δ).  (98)

This then implies that

  (δ′β̂ − δ′β) / (σ√(δ′(X′X)⁻¹δ)) ~ N(0, 1).  (99)

(3) Estimating the variance of δ′β̂. The variance of δ′β̂ is σ²δ′(X′X)⁻¹δ. We can estimate this variance as

  s²δ′(X′X)⁻¹δ.  (100)

(4) Distribution of the t ratio for δ′β̂. From equation 37 we know that

  (n−k)s²/σ² ~ χ²(n−k).  (101)

If we multiply the numerator and denominator of equation 101 by δ′(X′X)⁻¹δ, we obtain

  [(n−k)s²δ′(X′X)⁻¹δ] / [σ²δ′(X′X)⁻¹δ] ~ χ²(n−k).  (102)

Now consider the ratio of the statistic in equation 99 to the square root of the statistic in equation 102 divided by n−k.
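The equivalence between the sums-of-squares form of the overall F statistic (equation 91) and its R² form (equation 93) can be checked numerically. A minimal sketch with simulated data (names and design are illustrative, not from the notes):

```python
import numpy as np

# Numerical check of equations 91-93: the overall F statistic computed
# from SSR and SSE equals the one computed from R^2.
rng = np.random.default_rng(42)
n, k = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3, 0.8]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS coefficients
e = y - X @ b                           # residuals
SSE = e @ e
SST = np.sum((y - y.mean()) ** 2)
SSR = SST - SSE                         # partition of equation 90
R2 = SSR / SST

F_sums = (SSR / (k - 1)) / (SSE / (n - k))    # equation 91
F_R2 = (R2 / (k - 1)) / ((1 - R2) / (n - k))  # equation 93
```

The two computations agree up to floating-point rounding, since equation 92 is an algebraic identity when the model contains an intercept.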
Writing this out and simplifying, we obtain

  [ (δ′β̂ − δ′β)/(σ√(δ′(X′X)⁻¹δ)) ] ÷ √{ [(n−k)s²δ′(X′X)⁻¹δ / (σ²δ′(X′X)⁻¹δ)] / (n−k) }
    = (δ′β̂ − δ′β) / (s√(δ′(X′X)⁻¹δ)).  (103)

Equation 103 is the ratio of a N(0,1) variable (from equation 99) and the square root of a chi-squared random variable divided by its degrees of freedom (from equation 102). If the numerator and denominator in equation 103 are independent, then the statistic is distributed as a t random variable with n−k degrees of freedom. But we showed previously (equations 60–62) that under normality β̂ and e are independent. Given that s² is only a function of e, the numerator and denominator must be independent. We can also show this in a manner similar to that used to show that (β̂ᵢ − βᵢ)/(s√s^ii) is distributed as a t random variable in equations 57 and 58. First write δ′β̂ − δ′β as a function of ε as follows:

  β̂ = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε
  ⇒ δ′β̂ = δ′β + δ′(X′X)⁻¹X′ε
  ⇒ δ′β̂ − δ′β = δ′(X′X)⁻¹X′ε = δ′Cε.  (104)

We can then apply Quadratic Form Theorem 8 as previously, given that the denominator in equation 103 is just a function of ε′M_Xε as before. Therefore

  (δ′β̂ − γ) / (s√(δ′(X′X)⁻¹δ)) ~ t(n−k)  under H₀: δ′β = γ.  (105)

Such a test involves running a regression and constructing the estimated values of δ′β̂ and the variance of δ′β̂ from β̂ and s²(X′X)⁻¹.

Tests of several linear restrictions on β.

(1) Idea. Consider a set of m linear constraints on the coefficients, denoted by

  Rβ = r,  where R is m×k and r is m×1.  (106)

To test this, we need to discover how far Rβ̂ is from r. To understand the intuition, define a new variable d = r − Rβ̂. This variable should be close to zero if the hypothesis is true. Note that d is normally distributed, since it is a linear function of the normal variable β̂. Its mean and variance are given as

  E(d) = E(r − Rβ̂) = r − Rβ = 0  if the restrictions are true
  Var(d) = Var(r − Rβ̂) = R Var(β̂) R′ = σ²R(X′X)⁻¹R′.  (107)

(2) A possible test statistic for testing H₀: d = 0. Consider Quadratic Form Theorem 7, which is as follows:

Quadratic Form Theorem 7. If y is an n×1 random variable and y ~ N(μ, Σ), then (y − μ)′Σ⁻¹(y − μ) ~ χ²(n).

A possible test statistic is

  (d − 0)′[Var(d)]⁻¹(d − 0) = d′[σ²R(X′X)⁻¹R′]⁻¹d = d′[R(X′X)⁻¹R′]⁻¹d / σ² ~ χ²(m).  (108)

The problem with this is that σ² is not known.

(3) A more useful test statistic. Consider the following test statistic:

  F = { (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂) / m } / { SSE/(n−k) }.  (109)

We can show that it is distributed as an F by showing that the numerator and denominator are independent chi-square variables, each divided by its degrees of freedom. First consider the numerator. We will show that we can write it as ε′Qε, where Q is symmetric and idempotent. First write r − Rβ̂ in the following useful manner, using r = Rβ under the null hypothesis:

  r − Rβ̂ = Rβ − Rβ̂ = −R(β̂ − β).  (110)

Now write Rβ̂ as a function of ε:

  β̂ = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε
  ⇒ Rβ̂ = Rβ + R(X′X)⁻¹X′ε
  ⇒ r − Rβ̂ = −R(X′X)⁻¹X′ε.  (111)

Then write out the numerator of equation 109 as follows:

  (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂) = ε′X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′ε = ε′Qε,
  Q = X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′.  (112)

Notice by inspection that the matrix Q is symmetric. It is idempotent, as can be seen by writing it out as follows:

  QQ = X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′
     = X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹[R(X′X)⁻¹R′][R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′
     = X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′ = Q.  (113)

Now find the trace of Q:

  tr Q = tr( X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′ )
       = tr( [R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′X(X′X)⁻¹R′ )
       = tr( [R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹R′ ) = tr I_m = m.  (114)

Now remember from equation 45 that

  ε ~ N(0, σ²I)  ⇒  ε/σ ~ N(0, I).  (115)

We then have, from Quadratic Form Theorem 3,

  ε′Qε/σ² = (ε/σ)′Q(ε/σ) ~ χ²(m).  (116)

Now consider the denominator. We can show that it is distributed as χ²(n−k) using Quadratic Form Theorem 3 as follows:

  SSE/σ² = e′e/σ² = ε′M_Xε/σ² = (ε/σ)′M_X(ε/σ) ~ χ²(n−k).  (117)

The last step follows because M_X is symmetric and idempotent with trace n−k. Independence follows from Quadratic Form Theorem 5, because M_XQ = 0:

  M_XQ = (I − X(X′X)⁻¹X′) X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′
       = X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′ − X(X′X)⁻¹X′X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′
       = X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′ − X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′ = 0.  (118)
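The F statistic of equation 109 can be computed directly, and for a single restriction (m = 1) it reproduces the square of the t statistic of equation 105. A minimal sketch with simulated data (the restriction and variable names are illustrative):

```python
import numpy as np

# Sketch of the F statistic in equation 109 for H0: R beta = r, here the
# single restriction beta_2 - beta_3 = 0 (so m = 1), with a check that
# F equals the square of the t statistic of equation 105.
rng = np.random.default_rng(1)
n, k = 80, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.7, 0.7, -0.2]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - k)

R = np.array([[0.0, 1.0, -1.0, 0.0]])   # delta' written as a 1 x k matrix
r = np.zeros(1)
d = r - R @ b
F = (d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d / R.shape[0]) / s2

delta = R[0]
t = (delta @ b - 0.0) / np.sqrt(s2 * (delta @ XtX_inv @ delta))
```

With m = 1 the Wald-form F and the squared t ratio are algebraically identical, which is the F(1, n−k) = t(n−k)² relationship noted in the text.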
Alternatively, we can simply remember that M_XX = 0 from previously and note that the leading term in Q is X. The test statistic is then distributed as an F with m and n−k degrees of freedom. We reject the null hypothesis that the set of linear restrictions holds if the computed value of the statistic is larger than the tabled value. A random variable distributed as F(1, n−k) is the square of a random variable distributed as t(n−k), so when there is a single linear restriction on β (m = 1), a t test based on equation 105 and an F test based on equation 109 give the same result.

To summarize: β̂ and s² are unbiased, consistent, minimum variance among all unbiased estimators, and normally distributed; σ²(X′X)⁻¹ can be shown to be the Cramér–Rao matrix, that is, the minimum-variance bound.

CHARACTERISTIC ROOTS AND VECTORS

1.5. Complex conjugate of a complex number. For each complex number z = x + iy, the number z* = x − iy is called the complex conjugate of z. The product of a complex number and its conjugate is a real number. In particular, if z = x + iy, then

  zz* = (x + iy)(x − iy) = x² − ixy + ixy − i²y² = x² + y².  (9)

Sometimes we will use the notation z̄ to represent the complex conjugate of a complex number, so z̄ = x − iy. We can then write

  zz̄ = (x + iy)(x − iy) = x² + y².  (10)

1.6. Graphical representation of a complex number. Consider representing a complex number in a two-dimensional graph, with the vertical axis representing the imaginary part. In this framework, the modulus of the complex number is the distance from the origin to the point. This is seen clearly in Figure 1.

[FIGURE 1. Graphical representation of the complex number z = x₁ + ix₂: the point (x₁, x₂) plotted against the real axis, with modulus √(x₁² + x₂²) measured as the distance from the origin. Figure omitted.]

1.7. Polar form of a complex number. We can represent a complex number by its angle and distance from the origin. Consider a complex number z₁ = x₁ + iy₁. Now consider the angle θ₁ which the ray from the origin to the point z₁ makes with the x-axis. Let the modulus of z₁ be denoted by r₁. Then cos θ₁ = x₁/r₁ and sin θ₁ = y₁/r₁. This then implies that

  z₁ = x₁ + iy₁ = r₁cos θ₁ + ir₁sin θ₁ = r₁(cos θ₁ + i sin θ₁).  (11)
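The polar form of equation 11 and the conjugate product of equation 10 can be checked with the Python standard library (the particular number 3 + 4i is our illustrative choice):

```python
import cmath
import math

# Check of equation 11: z = x + iy = r(cos theta + i sin theta), where
# r = |z| is the modulus and theta the angle with the real axis, and of
# equation 10: z * conj(z) = x^2 + y^2.
z = complex(3.0, 4.0)
r, theta = cmath.polar(z)                              # polar coordinates of z
back = r * (math.cos(theta) + 1j * math.sin(theta))    # reassemble via equation 11
conj_product = z * z.conjugate()                       # x^2 + y^2 as a (real) complex number
```

Round-tripping through the polar form recovers the original number, and the conjugate product is the squared modulus.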
Figure 2 shows how a complex number is represented in polar coordinates.

[FIGURE 2. Graphical representation of the complex number z₁ = r₁(cos θ₁ + i sin θ₁): r₁ is the distance from the origin and θ₁ the angle with the real axis. Figure omitted.]

1.8. Complex exponentials. The exponential eˣ is a real number. We want to define e^z when z is a complex number, in such a way that the principal properties of the real exponential function will be preserved. The main properties of eˣ, for x real, are the law of exponents, e^(x₁)e^(x₂) = e^(x₁+x₂), and the equation e⁰ = 1. If we want the law of exponents to hold for complex numbers, then it must be that

  e^z = e^(x+iy) = eˣe^(iy).  (12)

We already know the meaning of eˣ. We therefore need to define what we mean by e^(iy). Specifically, we define e^(iy) in equation 13.

Definition 1.

  e^(iy) = cos y + i sin y.  (13)

With this in mind, we can then define e^z = e^(x+iy) as follows.

Definition 2.

  e^z = eˣe^(iy) = eˣ(cos y + i sin y).  (14)

Obviously, if x = 0, so that z is a pure imaginary number, this yields

  e^(iy) = cos y + i sin y.  (15)

It is easy to show that e⁰ = 1. If z is real, then y = 0, and equation 14 becomes

  e^z = eˣe^(i·0) = eˣ(cos 0 + i sin 0) = eˣ.  (16)

To show that eˣe^(iy) = e^(x+iy), or e^(z₁)e^(z₂) = e^(z₁+z₂), we will need to remember some trigonometric formulas.

Theorem 1.

  sin(φ + θ) = sin φ cos θ + cos φ sin θ
  sin(φ − θ) = sin φ cos θ − cos φ sin θ
  cos(φ + θ) = cos φ cos θ − sin φ sin θ
  cos(φ − θ) = cos φ cos θ + sin φ sin θ

RESTRICTED LEAST SQUARES, HYPOTHESIS TESTING, AND PREDICTION IN THE CLASSICAL LINEAR REGRESSION MODEL

A. Introduction and assumptions. The classical linear regression model can be written as

  y = Xβ + ε  (1)

or

  yₜ = xₜ′β + εₜ,  t = 1, …, n,  (2)

where xₜ′ is the t-th row of the matrix X, or simply as

  yₜ = xₜβ + εₜ,  t = 1, …, n,  (3)

where it is implicit that xₜ is a row vector containing the regressors for the t-th time period. The classical assumptions on the model can be summarized as

  I.   y = Xβ + ε
  II.  E(ε | X) = 0
  III. E(εε′ | X) = σ²Iₙ
  IV.  X is a nonstochastic matrix of rank k
  V.   ε ~ N(0, σ²Iₙ)  (4)

Assumption V as written implies II and III.
With normally distributed disturbances, the joint density, and therefore likelihood function, of y is

  L(β, σ²) = (2πσ²)^(−n/2) exp[ −(y − Xβ)′(y − Xβ) / (2σ²) ].  (5)

The natural log of the likelihood function is given by

  ℓ = ln L = −(n/2)ln 2π − (n/2)ln σ² − (y − Xβ)′(y − Xβ)/(2σ²)
    = −(n/2)ln 2π − (n/2)ln σ² − (y′y − 2β′X′y + β′X′Xβ)/(2σ²).  (6)

Maximum likelihood estimators are obtained by setting the derivatives of ℓ equal to zero and solving the resulting k+1 equations for the k β's and σ². These first-order conditions for the ML estimators are

  ∂ℓ/∂β = (1/(2σ²))(2X′y − 2X′Xβ) = 0
  ∂ℓ/∂σ² = −n/(2σ²) + (1/(2σ⁴))(y − Xβ)′(y − Xβ) = 0.  (7)

Solving, we obtain

  β̂ = (X′X)⁻¹X′y,  σ̂² = (1/n)(y − Xβ̂)′(y − Xβ̂) = (1/n)Σₜ (yₜ − xₜ′β̂)².  (8)

The ordinary least squares estimator is obtained by minimizing the sum of squared errors, which is defined by

  SSE(β) = Σₜ (yₜ − xₜ′β)² = e′e = (y − Xβ)′(y − Xβ) = y′y − 2β′X′y + β′X′Xβ.  (9)

The necessary condition for SSE(β) to be a minimum is that

  ∂SSE(β)/∂β = −2X′y + 2X′Xβ = 0.  (10)

This gives the normal equations, which can then be solved to obtain the least squares estimator:

  X′Xβ̂ = X′y  ⇒  β̂ = (X′X)⁻¹X′y.  (11)

The maximum likelihood estimator of β is the same as the least squares estimator.

B. Restricted least squares.

1. Linear restrictions on β. Consider a set of m linear constraints on the coefficients, denoted by

  Rβ = r,  where R is m×k and r is m×1.  (12)

Restricted least squares estimation (or restricted maximum likelihood estimation) consists of minimizing the objective function in 9 (or maximizing the objective function in 6) subject to the constraint in 12.

2. Constrained maximum likelihood estimates. Given that there is no constraint on σ², we can differentiate equation 6 with respect to σ² to get an estimator of σ² as a function of the restricted estimator of β. Doing so, we obtain

  σ̃² = (1/n)(y − Xβ̃)′(y − Xβ̃),  (13)

where β̃ is the constrained maximum likelihood estimator. Now substitute this estimator for σ² back into the log-likelihood (equation 6) and simplify to obtain the concentrated log-likelihood

  ln L = −(n/2)ln 2π − (n/2)ln[ (1/n)(y − Xβ)′(y − Xβ) ] − n/2.  (14)

Note that the concentrated likelihood function, as opposed to the concentrated log-likelihood function, is given by

  L = (2π)^(−n/2) [ (1/n)(y − Xβ)′(y − Xβ) ]^(−n/2) e^(−n/2).  (15)

The maximization problem defining the restricted estimator can then be stated as

  max_β  −(n/2)ln 2π − (n/2)ln[ (1/n)(y − Xβ)′(y − Xβ) ] − n/2  subject to  Rβ = r.  (16)

Clearly, we maximize this likelihood function by minimizing the sum of squared errors (y − Xβ)′(y − Xβ). To maximize this subject to the constraint, we form the Lagrangian function, where λ is an m×1 vector of Lagrangian multipliers:

  ψ = y′y − 2β′X′y + β′X′Xβ + λ′(r − Rβ).  (17)

Differentiation with respect to β and λ yields the conditions

  −2X′y + 2X′Xβ̃ − R′λ = 0
  r − Rβ̃ = 0.  (18)

Now multiply the first equation in 18 by R(X′X)⁻¹ to obtain

  −2R(X′X)⁻¹X′y + 2Rβ̃ − R(X′X)⁻¹R′λ = 0.  (19)

Now solve this equation for λ, substituting β̂ = (X′X)⁻¹X′y as appropriate:

  R(X′X)⁻¹R′λ = 2Rβ̃ − 2Rβ̂  ⇒  λ = 2[R(X′X)⁻¹R′]⁻¹(Rβ̃ − Rβ̂) = 2[R(X′X)⁻¹R′]⁻¹(r − Rβ̂).  (20)

The last step follows because Rβ̃ = r. Now substitute this back into the first equation in 18 to obtain

  −2X′y + 2X′Xβ̃ − 2R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂) = 0
  ⇒ X′Xβ̃ = X′y + R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)
  ⇒ β̃ = (X′X)⁻¹X′y + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)
  ⇒ β̃ = β̂ + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂).  (21)

With normally distributed errors in the model, the maximum likelihood and least squares estimates of the constrained model are the same. We can rearrange 21 in the following useful fashion:

  β̃ − β̂ = (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)
  ⇒ X′X(β̃ − β̂) = R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂).  (22)

Now multiply both sides of 22 by (β̃ − β̂)′ = (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹ to obtain

  (β̃ − β̂)′X′X(β̃ − β̂) = (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)
                        = (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂).  (23)

We can rearrange equation 21 in another useful fashion by multiplying both sides by X and then subtracting both sides from y. Doing so, we obtain

  y − Xβ̃ = y − Xβ̂ − X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)
  ⇒ u = e − X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂),  (24)

where u is the estimated residual vector from the constrained regression.
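The restricted least squares estimator of equation 21 is straightforward to compute. A minimal sketch with simulated data (the restriction β₂ + β₃ = 1 and all names are illustrative):

```python
import numpy as np

# Sketch of the restricted least squares estimator of equation 21:
# beta_tilde = beta_hat + (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (r - R beta_hat).
rng = np.random.default_rng(7)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 0.6, 0.4]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                  # unrestricted OLS estimate
R = np.array([[0.0, 1.0, 1.0]])        # restriction: beta_2 + beta_3 = 1
r = np.array([1.0])
A = np.linalg.inv(R @ XtX_inv @ R.T)
b_tilde = b + XtX_inv @ R.T @ A @ (r - R @ b)   # equation 21
```

By construction β̃ satisfies the restriction exactly, and its residual sum of squares can be no smaller than the unrestricted one.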
Consider also u′u, which is the sum of squared errors from the constrained regression:

  u′u = [e − X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)]′[e − X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)]
      = e′e − 2(r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′e
        + (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂),  (25)

where e is the estimated residual vector from the unconstrained model. Now remember that in ordinary least squares X′e = 0, as can be seen by rewriting equation 10 as follows:

  −2X′y + 2X′Xβ̂ = 0  ⇒  X′(y − Xβ̂) = X′e = 0.  (26)

Using this information in equation 25, the cross-product term vanishes and the final term collapses, so we obtain

  u′u = e′e + (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂).  (27)

Thus the difference in the sums of squared errors in the constrained and unconstrained models can be written as a quadratic form in the difference between Rβ̂ and r, where β̂ = (X′X)⁻¹X′y is the unconstrained ordinary least squares estimate.

Equation 21 can be rearranged in yet another fashion that will be useful in finding the variance of the constrained estimator. First write the ordinary least squares estimator as a function of β and ε as follows:

  β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + ε) = β + (X′X)⁻¹X′ε.  (28)

Then substitute this expression for β̂ in equation 21 as follows:

  β̃ = β + (X′X)⁻¹X′ε + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ − R(X′X)⁻¹X′ε)
    = β + (X′X)⁻¹X′ε − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′ε
      + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ).  (29)

Now define the matrix M_c as follows:

  M_c = I − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R.  (30)

If the constraint r = Rβ is true, the last term in 29 is zero, and we can then write β̃ − β as

  β̃ − β = [I − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R](X′X)⁻¹X′ε = M_c(X′X)⁻¹X′ε = M_c(β̂ − β).  (31)

3. Statistical properties of the restricted least squares estimates.

(a) Expected value of β̃. From equation 21,

  E(β̃) = E(β̂) + E{ (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂) }
        = β + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ)
        = β,  if the constraint is true.  (32)

(b) Variance of β̃. Using equation 31,

  Var(β̃) = E[(β̃ − β)(β̃ − β)′] = E[M_c(X′X)⁻¹X′εε′X(X′X)⁻¹M_c′]
          = M_c(X′X)⁻¹X′ E(εε′) X(X′X)⁻¹M_c′
          = σ²M_c(X′X)⁻¹X′X(X′X)⁻¹M_c′
          = σ²M_c(X′X)⁻¹M_c′.  (33)

The matrix M_c is not symmetric, but it is idempotent, as can be seen by multiplying it by itself:

  M_cM_c = [I − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R][I − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R]
         = I − 2(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R
           + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹[R(X′X)⁻¹R′][R(X′X)⁻¹R′]⁻¹R
         = I − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R = M_c.  (34)

Now consider the expression for the variance of β̃. We can write it out and simplify to obtain

  Var(β̃) = σ²M_c(X′X)⁻¹M_c′
          = σ²[I − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R](X′X)⁻¹[I − R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹]
          = σ²[(X′X)⁻¹ − 2(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹
               + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹]
          = σ²[(X′X)⁻¹ − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹]
          = σ²M_c(X′X)⁻¹.  (35)

We can also write this in another useful form:

  Var(β̃) = σ²(X′X)⁻¹ − σ²(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹.  (36)

The variance of the restricted least squares estimator is thus the variance of the ordinary least squares estimator minus a positive semidefinite matrix, implying that the restricted least squares estimator has a lower variance than the OLS estimator.

4. Testing the restrictions on the model using estimated residuals. We showed previously (equation 109 in the section on statistical inference) that

  F = { (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂) / m } / { e′e/(n−k) } ~ F(m, n−k).  (37)

Consider the numerator in equation 37. Using equation 27, it can be written in terms of the residuals from the restricted and unrestricted models:

  u′u − e′e = (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)
  ⇒ F = { (u′u − e′e)/m } / { e′e/(n−k) } ~ F(m, n−k).  (38)
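The identity in equation 27, which makes the residual-based form of the F statistic in equation 38 possible, can be verified numerically. A minimal sketch with simulated data (the restriction β₂ = 0 and all names are illustrative):

```python
import numpy as np

# Numerical check of equation 27:
# u'u = e'e + (r - R b)'[R (X'X)^{-1} R']^{-1}(r - R b).
rng = np.random.default_rng(3)
n, k = 70, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, -0.4, 0.9]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b                            # unrestricted residuals

R = np.array([[0.0, 1.0, 0.0]])          # restriction: beta_2 = 0
r = np.zeros(1)
A = np.linalg.inv(R @ XtX_inv @ R.T)
b_tilde = b + XtX_inv @ R.T @ A @ (r - R @ b)
u = y - X @ b_tilde                      # restricted residuals

quad = (r - R @ b) @ A @ (r - R @ b)     # quadratic form in equation 27
```

The increase in the sum of squared residuals from imposing the restriction equals the quadratic form exactly, up to rounding.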
Denoting the sum of squared residuals from a particular model by SSE(·), we obtain

  F = { [SSE(β̃) − SSE(β̂)]/m } / { SSE(β̂)/(n−k) } ~ F(m, n−k).  (39)

Rather than performing the hypothesis test by inverting the matrix R(X′X)⁻¹R′ and then pre- and post-multiplying by (r − Rβ̂), we simply run two different regressions and compute the F statistic from the constrained and unconstrained residuals.

The form of the test statistic in equation 39 is referred to as a test based on the change in the objective function. A number of tests fall in this category. The idea is to compare the sums of squares in the constrained and unconstrained models. If the restriction causes SSE to be significantly larger than otherwise, this is evidence that the data do not satisfy the restriction, and we reject the hypothesis that the restriction holds. The general procedure for such tests is to run two regressions as follows:

1. Estimate the regression model without imposing any constraints on the vector β. Let the associated sum of squared errors and degrees of freedom be denoted by SSE(β̂) and n−k, respectively.

2. Estimate the same regression model where β is constrained as specified by the hypothesis. Let the associated sum of squared errors and degrees of freedom be denoted by SSE(β̃) and n−k_R, respectively.

3. Perform the test using the following statistic:

  F = { [SSE(β̃) − SSE(β̂)]/m } / { SSE(β̂)/(n−k) } ~ F(m, n−k),  (40)

where m = (n−k_R) − (n−k) is the number of independent restrictions imposed on β by the hypothesis. For example, if the hypothesis were H₀: β₂ = β₃ and β₄ = 0, then the numerator degrees of freedom would equal 2. If the hypothesis is valid, then SSE(β̃) and SSE(β̂) should not be significantly different from each other. Thus we reject the constraint if the F value is large. Two useful references on this type of test are Chow and Fisher.

5. Testing the restrictions on the model using a likelihood ratio (LR) test.

(a) Idea and definition. The likelihood ratio (LR) test is a common method of statistical inference in classical statistics. The LR test statistic reflects the compatibility between a sample of data and the null hypothesis through a comparison of the constrained and unconstrained likelihood functions. It is based on determining whether there has been a significant reduction in the value of the likelihood (or log-likelihood) as a result of imposing the null-hypothesis restrictions on the parameters in the estimation process. Formally, let the random sample X₁, X₂, …, Xₙ have the joint probability density function f(X₁, X₂, …, Xₙ; θ) and the associated likelihood function L(θ; X₁, X₂, …, Xₙ). The generalized likelihood ratio (GLR) is defined as

  λ(x₁, x₂, …, xₙ) = sup_{θ∈H₀} L(θ; x₁, …, xₙ) / sup_{θ∈H₀∪H₁} L(θ; x₁, …, xₙ),  (41)

where sup denotes the supremum of the function over the set of parameters satisfying the null hypothesis H₀, or over the set of parameters that would satisfy either the null or the alternative hypothesis. A generalized likelihood ratio test for testing H₀ against the alternative H₁ is given by the following test rule: reject H₀ if and only if

  λ(x₁, x₂, …, xₙ) ≤ c,  (42)

where c is the critical value from a yet-to-be-determined distribution. For an α-level test, the constant c is chosen to satisfy

  sup_{θ∈H₀} π(θ) = sup_{θ∈H₀} P[λ(x₁, x₂, …, xₙ) ≤ c] = α,  (43)

where π(θ) is the power function of the statistical test. We use the term generalized likelihood ratio, as compared to likelihood ratio, to indicate that the two likelihood functions in the ratio are optimized with respect to θ over the two different domains. The literature often refers to this as a likelihood ratio test without the modifier "generalized," and we will often follow that convention.

(b) Likelihood ratio test for the classical normal linear regression model. Consider the null hypothesis Rβ = r in the classical normal linear regression model. The likelihood function evaluated at the restricted least squares estimates, from equation 15, is

  L(β̃, σ̃²) = (2π)^(−n/2) [ (1/n)(y − Xβ̃)′(y − Xβ̃) ]^(−n/2) e^(−n/2).  (44)

In an analogous manner, we can write the likelihood function evaluated at the OLS estimates as

  L(β̂, σ̂²) = (2π)^(−n/2) [ (1/n)(y − Xβ̂)′(y − Xβ̂) ]^(−n/2) e^(−n/2).  (45)

The generalized likelihood ratio statistic is then

  λ = L(β̃, σ̃²)/L(β̂, σ̂²) = [ (y − Xβ̂)′(y − Xβ̂) / (y − Xβ̃)′(y − Xβ̃) ]^(n/2).  (46)
We reject the null hypothesis for small values of λ, i.e., for small values of the right-hand side of equation 46. The difficulty is that we do not know the distribution of the right-hand side of equation 46. Note that we can write it in terms of estimated residuals as

  λ^(2/n) = (y − Xβ̂)′(y − Xβ̂) / (y − Xβ̃)′(y − Xβ̃) = e′e/u′u.  (47)

This can then be written as

  λ^(2/n) = SSE(β̂)/SSE(β̃).  (48)

So we reject the null hypothesis that Rβ = r if

  λ^(−2/n) = SSE(β̃)/SSE(β̂) ≥ c*,  (49)

for an appropriately chosen constant c*. Now subtract SSE(β̂)/SSE(β̂) = 1 from both sides of equation 49 and simplify:

  [SSE(β̃) − SSE(β̂)] / SSE(β̂) ≥ c* − 1.  (50)

Now multiply by (n−k)/m to obtain

  { [SSE(β̃) − SSE(β̂)]/m } / { SSE(β̂)/(n−k) } ≥ (c* − 1)(n−k)/m ≡ c**.  (51)

We reject the null hypothesis if the value in equation 51 is greater than the critical value c**. The question is then finding the distribution of the test statistic in equation 51. We can write the test statistic as

  { [SSE(β̃) − SSE(β̂)]/m } / { SSE(β̂)/(n−k) }.  (52)

We have already shown that the numerator in equation 52 is the same as the numerator in equations 37–39. Therefore this statistic is equivalent to those statistics and is distributed as an F. Specifically,

  { [SSE(β̃) − SSE(β̂)]/m } / { SSE(β̂)/(n−k) } ~ F(m, n−k).  (53)

Therefore the likelihood ratio test and the F test for a set of linear restrictions Rβ = r in the classical normal linear regression model are equivalent.

(c) Asymptotic distribution of the likelihood ratio statistic. We show in the section on nonlinear estimation that

  −2 ln λ = −2 ln[ sup_{θ∈H₀} L(θ; x₁, …, xₙ) / sup_{θ∈H₀∪H₁} L(θ; x₁, …, xₙ) ]
          = −2(ln L̃ − ln L̂)  →  χ²(m),  (54)

where ln L̃ is the value of the log-likelihood function for the model estimated subject to the null hypothesis, ln L̂ is the value of the log-likelihood function for the unconstrained model, and there are m restrictions on the parameters in the form Rβ = r.

6. Some examples.

(a) Two linear constraints. Consider the unconstrained model

  yₜ = β₁ + β₂xₜ₂ + β₃xₜ₃ + β₄xₜ₄ + εₜ  (55)

with the usual assumptions. Consider the null hypothesis H₀: β₂ = β₃ = 0, which consists of two restrictions on the coefficients.
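Such a joint hypothesis can be checked by comparing restricted and unrestricted sums of squared errors, as in equation 53. A minimal sketch with simulated data (the design and names are illustrative, chosen so that the null is true):

```python
import numpy as np

# Sketch of the change-in-SSE F test of equation 53 for
# H0: beta_2 = beta_3 = 0 in y = b1 + b2*x2 + b3*x3 + b4*x4 + e.
rng = np.random.default_rng(11)
n = 120
x2, x3, x4 = rng.normal(size=(3, n))
y = 1.0 + 0.0 * x2 + 0.0 * x3 + 0.8 * x4 + rng.normal(size=n)

def sse(Z, y):
    """Sum of squared OLS residuals from regressing y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return resid @ resid

ones = np.ones(n)
SSE_U = sse(np.column_stack([ones, x2, x3, x4]), y)   # unrestricted, df = n - 4
SSE_R = sse(np.column_stack([ones, x4]), y)           # restricted, df = n - 2
m = 2
F = ((SSE_R - SSE_U) / m) / (SSE_U / (n - 4))
```

Dropping regressors can only raise the sum of squared residuals, so the statistic is nonnegative by construction; it would be compared against an F(2, n−4) critical value.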
We can test this hypothesis by running two regressions and then forming the test statistic.

1. Estimate

  yₜ = β₁ + β₂xₜ₂ + β₃xₜ₃ + β₄xₜ₄ + εₜ  (56)

and obtain SSE(β̂) = (n−4)s², where s² is the estimated variance from the unrestricted model.

2. Estimate

  yₜ = β₁ + β₄xₜ₄ + εₜ  (57)

and obtain SSE(β̃) = (n−2)s̃², where s̃² is the estimated variance from the restricted model.

3. Construct the test statistic

  F = { [SSE(β̃) − SSE(β̂)]/2 } / { SSE(β̂)/(n−4) } ~ F(2, n−4).  (58)

(b) Equality of coefficients in two separate regressions (possibly different time periods). We sometimes want to test the equality of the full set of regression coefficients in two regressions. Consider the two models below:

  y₁ = X₁β₁ + ε₁  (n₁ observations)
  y₂ = X₂β₂ + ε₂  (n₂ observations).  (59)

We may want to test the hypothesis H₀: β₁ = β₂, where there are k coefficients and thus k restrictions. Rewrite the model in stacked form as

  [y₁]   [X₁  0 ] [β₁]   [ε₁]
  [y₂] = [0   X₂] [β₂] + [ε₂]  (60)

and estimate as usual to obtain SSE(β̂) (no restriction). Note that the degrees of freedom are n − 2k = n₁ + n₂ − 2k. Then impose the restriction (hypothesis) that β₁ = β₂ by writing the model in equation 60 as

  [y₁]   [X₁]       [ε₁]
  [y₂] = [X₂] β  +  [ε₂].  (61)

Estimate equation 61 using least squares to obtain the constrained sum of squared errors SSE(β̃), with degrees of freedom n₁ + n₂ − k. Then construct the test statistic

  F = { [SSE(β̃) − SSE(β̂)]/k } / { SSE(β̂)/(n₁ + n₂ − 2k) } ~ F(k, n₁ + n₂ − 2k).  (62)

C. Forecasting.

1. Introduction. Let

  y = Xβ + ε  (63)

denote the true relationship between the variable y and the vector of variables x, with β estimated from the original sample by

  β̂ = (X′X)⁻¹X′y.  (64)

The forecast of y is given by

  ŷ = Xβ̂.  (65)

There are at least four factors which contribute to forecast error.

(a) Incorrect functional form. [Figure omitted: "Incorrect Functional Form," comparing a fitted line with the population regression line.]

(b) The existence of the random disturbance. Even if the appropriate functional form were known with certainty and the parameters β were known, the forecast would differ from the realized value of y because of the presence of ε. The forecast error is then

  ŷ − y = Xβ − (Xβ + ε) = −ε.  (66)

Confidence intervals for y would be obtained from

  P( ŷ − t_{α/2}σ < y < ŷ + t_{α/2}σ ) = 1 − α.  (67)

We can see this graphically. [Figure omitted: "Confidence Intervals for Regression Line."]

(c) Uncertainty about the estimated coefficients.
Now consider n0 further observations that were not included in the original sample of n observations. Let X0 denote the n0 observations on the regressors and y0 the observations on y. The relevant model for the out-of-sample forecast is

y0 = X0β + ε0 (68)

where E(ε0) = 0, E(ε0ε0′) = σ²I, and ε0 is independent of ε from the sample period. Now consider a forecast for y0 given by

ŷ0 = X0β̂ (69)

where β̂ is the OLS estimator obtained using the initial n observations. Then let v0 be the set of forecast errors defined by

v0 = y0 − ŷ0 = y0 − X0β̂ = y0 − X0(X′X)⁻¹X′y. (70)

We can show that v0 has an expectation of zero as follows:

E(v0) = E(y0 − X0β̂) = X0β − X0β = 0. (71)

Now consider the variance of the forecast error. We can derive it as

E(v0v0′) = E[(y0 − X0β̂)(y0 − X0β̂)′]
= E[(X0β + ε0 − X0β̂)(X0β + ε0 − X0β̂)′]
= E[(X0(β − β̂) + ε0)(X0(β − β̂) + ε0)′]
= E[X0(β − β̂)(β − β̂)′X0′] + E[ε0(β − β̂)′X0′] + E[X0(β − β̂)ε0′] + E[ε0ε0′]. (72)

Now note that ε0 and β̂ − β are independent, by the independence of ε0 and ε and the independence of ε and β̂ − β, which we proved earlier in the section on statistical inference (equations 59-62). As a result, the middle two terms in 72 have an expectation of zero. We then obtain

E(v0v0′) = X0 E[(β − β̂)(β − β̂)′] X0′ + E(ε0ε0′)
= X0σ²(X′X)⁻¹X0′ + σ²I
= σ²(I + X0(X′X)⁻¹X0′). (73)

This indicates the sampling variability is composed of two parts: that due to the equation error ε0 and that due to the error in estimating the unknown parameters. We should note that ŷ0 = X0β̂ can be viewed as an estimator of E(y0) and as a predictor of y0. In other words, ŷ0 = X0β̂ is the best predictor we have both of the regression line and of an individual y0. The least squares estimator of E(y0) is X0β̂, which has expected value X0β. Now consider the covariance matrix for the random vector X0β̂ − X0β. To simplify the derivation, write it as X0(β̂ − β). Then we can compute this covariance as

E[X0(β̂ − β)(β̂ − β)′X0′] = X0 E[(β̂ − β)(β̂ − β)′] X0′ = X0σ²(X′X)⁻¹X0′ = σ²X0(X′X)⁻¹X0′. (74)

This is less than the covariance in equation 73 by the variance of the equation error, σ²I. Now consider the case where n0 = 1 and we are forecasting a given y0 for a given x0, where x0 is a row vector.
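The decomposition in equation 73 can be checked by simulation. The sketch below is an added example with made-up regressors, not part of the original notes: X and x0 are held fixed, y is redrawn many times, and the empirical variance of the forecast error is compared with σ²(1 + x0′(X′X)⁻¹x0).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma = 30, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, -0.5, 2.0])
x0 = np.array([1.0, 0.3, -0.7])           # a single out-of-sample point
XtX_inv = np.linalg.inv(X.T @ X)

# Theoretical forecast-error variance, equation 73 with n0 = 1:
var_theory = sigma**2 * (1.0 + x0 @ XtX_inv @ x0)

# Monte Carlo: redraw the disturbances, re-estimate beta, forecast y0.
reps = 20000
errors = np.empty(reps)
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    b = XtX_inv @ X.T @ y                  # OLS estimate
    y0 = x0 @ beta + sigma * rng.normal()  # realized out-of-sample y
    errors[r] = y0 - x0 @ b                # forecast error v0

print(errors.mean(), errors.var(), var_theory)
```

The mean forecast error should be near zero (equation 71) and the empirical variance should be close to the theoretical value, with the gap shrinking as the number of replications grows.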
Then the predicted value of y0 for a given value of x0 is given by

ŷ0 = x0′β̂. (75)

The prediction error is given by

v0 = y0 − ŷ0. (76)

The variance of the prediction error is given by

Var(y0 − ŷ0) = σ²(x0′(X′X)⁻¹x0 + 1). (77)

The variance of the estimator of E(y0 | x0) is

Var(ŷ0) = σ² x0′(X′X)⁻¹x0. (78)

Based on these variances we can construct confidence intervals for y0 and E(y0 | x0), where we estimate σ² with s². The confidence interval for E(y0 | x0) is

P(x0′β̂ − t(α/2)·sŷ < E(y0 | x0) < x0′β̂ + t(α/2)·sŷ) = 1 − α (79)

where sŷ is the square root of the estimated variance in equation 78. The confidence interval for y0 is given by

P(x0′β̂ − t(α/2)·sf < y0 < x0′β̂ + t(α/2)·sf) = 1 − α (80)

where sf is the square root of the estimated variance in equation 77. Graphically, the confidence bounds for predicting an individual y0 are wider than those for predicting the expected value of y. [Figure: Confidence Intervals for Prediction; the band for an individual y0 lies outside the band for E(y | x).]

Uncertainty about x0: in many situations the value of the independent variables also needs to be predicted, which adds further forecast error. A simple check of predictive accuracy is to hold part of the sample out of the estimation:
1. Compute β̂ using observations 1 to n.
2. Compute ŷ = x′β̂ using the observations on the x's for the held-out observations.
3. Compare these predicted values of y with the actual ones from the sample for the held-out observations.

Literature Cited

Chow, G. C. "Tests of Equality Between Subsets of Coefficients in Two Linear Regressions." Econometrica 28(1960):591-605.

Fisher, F. M. "Tests of Equality Between Sets of Coefficients in Two Linear Regressions: An Expository Note." Econometrica 38(1970):361-66.

Footnote 2. That equation 38 is distributed as an F random variable can also be seen by remembering that an F is the ratio of two chi-square random variables, each divided by its degrees of freedom. We have previously shown (equation 177 in the section on Statistical Inference) that SSE(β̂)/σ² is distributed as a χ²(n − k) random variable. Note that the vector r − Rβ̂ is distributed normally with a mean of zero. Its variance is given by

Var(r − Rβ̂) = R Var(β̂) R′ = R σ²(X′X)⁻¹ R′ = σ² R(X′X)⁻¹R′. (81)
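Equations 75-80 can be illustrated with a small numeric sketch. This is an added example with made-up data, not part of the original notes; for brevity it uses 1.96 in place of the exact t(n − k) critical value, which is an approximation for moderate samples.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (n - k)                      # estimate of sigma^2
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 0.5])                 # point at which to predict
y0_hat = x0 @ b

var_mean = s2 * (x0 @ XtX_inv @ x0)       # eq. 78: variance for E(y0 | x0)
var_pred = s2 * (x0 @ XtX_inv @ x0 + 1)   # eq. 77: variance for y0 itself

t = 1.96  # approximate critical value; use the exact t(n - k) quantile in practice
ci_mean = (y0_hat - t * np.sqrt(var_mean), y0_hat + t * np.sqrt(var_mean))
ci_pred = (y0_hat - t * np.sqrt(var_pred), y0_hat + t * np.sqrt(var_pred))
print(ci_mean, ci_pred)
```

The prediction interval (equation 80) always contains the confidence interval for the regression line (equation 79), since the two variances differ by the positive term s².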
GEOMETRY OF MATRICES

1. SPACES OF VECTORS

1.1. Definition of Rⁿ. The space Rⁿ consists of all column vectors with n components. The components are real numbers.

1.2. Representation of Vectors in Rⁿ.

1.2.1. R². The space R² is represented by the usual (x1, x2) plane. The two components of the vector give the x1 and x2 coordinates of a point, and the vector is a directed line segment that goes out from (0, 0). [Figure 1: Vector in R².]

1.2.2. R³. We can think of this geometrically as the x1 and x2 axes being laid out on a table top, with the x3 axis being perpendicular to this horizontal surface, as in figure 2. [Figure 2: Vector in R³.]

1.3. Properties of a Vector Space. Date: August 20, 2004.

1.3.1. Subspaces. A subspace of a vector space is a set of vectors, including 0, that satisfies two requirements. If a and b are vectors in the subspace and c is any scalar, then
1. a + b is in the subspace;
2. ca is in the subspace.

1.3.2. Geometric representation of scalar multiplication. A scalar multiple of a vector a is another vector, say u, whose coordinates are the scalar multiples of a's coordinates: u = ca. Consider the example below:

a = (2, 5), c = 3, u = ca = (6, 15).

Geometrically, a scalar multiple of a vector a is a segment of the line that passes through 0 and a and continues forever in both directions. For example, if we multiply the vector (5, 3) by 1.5, we obtain (7.5, 4.5), as shown in figure 3. [Figure 3: Scalar Multiple of a Vector.] If we multiply a vector by a scalar with a negative sign, the direction changes, as in figure 4.

1.3.3. Geometric representation of addition of vectors. The sum of two vectors a and b is a third vector whose coordinates are the sums of the corresponding coordinates of a and b. Consider the example below.
parallelogram having a and b as sides Another way to say this is to move a copy of the vector a parallel to a so that its tail rests at the head of vector b and let the geometrical vector connecting the origin and the head of this shifted a be the vector ab Yet another way to think of this is to move in the direction and distance defined by a from the tip ofb Consider figure 5 where altgt ltgt cabltgt The copy of a extends from the tip of b to the point 76 The parallelogram is formed by ex tending a copy of b from the tip of a which also ends at the point 76 The line segment from the origin to this point which is the diagonal of the parallelogram is the vector c 134 Subspaces A subspace of a vector space is a set of vectors including 0 that satisfies two requirements If a and b are vectors in the subspace and c is any scalar then 1 a b is in the subspace 2 ca is in the subspace 135 Subspaces and linear combinations A subspace containing the vectors a and b must contain all linear combinations of a and b GEOMETRY OF MATRICES 5 FIGURE 5 Vector Addition X2 6 7 63 5 T I d III 7 vquot I 4 xquot I Copy of a E c III 3 2 3 a 533 2 b 1 X1 1 2 3 4 5 6 7 136 Column space ofu matrix The column space of the m X n matrix A consists of all linear com binations of the columns of A The combinations can be written as Ax The column space of A is a subspace of Rm because each of the columns of A has In rows The system of equations Ax b is solvable if and only if b is in the column space of A What this means is that the system is solvable if there is some way to write b as a linear combination of the columns of A Consider the following example where A is a 3x2 matrix x is a 2x1 vector and b is a 3x1 vector 10 b A 4 3 zlt11gt b 4 2 3 3 b3 The matrix A has two columns Linear combinations of these columns will lie on a plane in R3 Any vectors b which lie on this plane can be written as linear combinations of the columns of A Other vectors in R3 cannot be written in this fashion For 
example consider the vector b 1108 This can be written as follows 6 GEOMETRY OF MATRICES where the vector x 1 2 The vector b 3159 can also be written as a linear combination of the columns of A Specifi cally where x 31 To find the coefficients x that allow us to write a vector b as a linear combination of the columns of A we can perform row reduction on the augmented matrix A b For this example we obtain 1 0 3 A 4 3 15 7 2 3 9 Multiply the first row by 4 to yield 4 0 12 and subtract it from the second row 4 3 15 4 0 12 0 3 3 This will give a new matrix on which to operate 103 A1033 239 Multiply the first row by 2 2 0 6 and subtract from the third row 2 3 9 2 0 6 0 3 3 This will give Now divide the second row by 3 Now multiply the second row by 3 and subtract from the third row This will give GEOMETRY OF MATRICES 7 103 42011 10 033 033 0 33 0 33 0 00 N 103 43011 11 000 At this point we have an identity matrix in the upper left and know that x1 3 and x 1 The third equation implies that 0x1 0X2 0 So the vector b combination of the columns of A 3 15 9 can be written as linear Now consider the vector b 2 10 10 Write out the augmented matrix system and perform row operations Multiply the first row by 4 and subtract from the second row This will give Multiply the first row by 2 1 0 2 A 4 3 10 12 2 3 10 4 0 3 4 3 10 4 0 3 0 3 2 N 1 0 2 A1 0 3 2 13 2 3 10 GEOMETRY OF MATRICES 1 Consider the matrix A Multiply the first row by 3 3 6 and subtract it from the second row 3 6 3 6 0 0 This will give the new matrix 1 2 A2 0 0 This implies that x can be anything The convention is to set it equal to one If x 1 then we have 11 21 0 11 7 2 ltgt lt2gt lt2gt The null space of the matrix A is the vector 712 All vectors of this form where the first elemement is 2 times the second element will make the equation Ax 0 true 2 Consider the matrix A Multiply the first row by 3 3 6 and subtract it from the second row 3 6 3 4 0 2 This will give the new matrix 1 2 A2 0 2 This implies that x must be 
zero If x 0 then we have x1 20 0 11 0 W i lt0gt i lt3 12 GEOMETRY OF MATRICES The null space of the matrix A is the vector The only vector x that will make the equation Ax 0 is x 0 3 Consider the matrix 1 2 3 4 A 2 4 8 10 3 6 ll 14 Multiply the first row by 2 2 4 6 8 and subtract it from the second row Now multiply the first row by 3 3 6 9 12 and subtract it from the third row 0 0 2 2 0 0 2 2 CEOMETRY OF MATRICES 13 If we write the system using A4 we obtain 11 12341 0 0022 20 0000 13 0 4 The variables X2 and X4 can be any values The convention is to set them equal to zero and one First write the system with X2 1 and and X4 0 This will give 11 1234 1 0 0022 0 0000 if 0 This implies that X3 must be zero If X3 0 then we have 1 2 11 7 2 72 The vector 1 1 is in the nullspace of the matrix A as can be seen by writing out 0 the system as follows 1 2 3 4 0 72 2 l 4 0 8 0 10 0 3 6 11 14 0 Now write the system with X2 0 and and X4 1 This will give 11 1234 0 0 0022 0 0000 If 0 This implies that 2X3 2 must be zero This implies that X3 1 Making this substitution we have A OOH com omw owl V HLOE l A 000 V GEOMETRY OF MATRICES 7 1 is in the nullspace of the matrix A as can be seen by writing out the E 0 71 1 system as follows 1 2 3 4 1 2 0 4 1 8 1 10gt 3 6 11 14 We can then write a general x as follows The vector 72 71 7212 i 14 X 7 1 0 7 12 I 0 I4 71 714 0 1 I4 4 Consider the matrix 1 2 3 1 2 4 6 10 A 3 6 9 11 71 72 73 7 Multiply the first row by 2 2 4 6 2 and subtract it from the second row This will give the new matrix 1 2 3 1 0 0 0 8 A2 3 6 9 11 71 72 73 7 Now multiply the first row by 3 3 6 9 3 and subtract it from the third row This will give the new matrix GEOMETRY OF MATRICES 1 2 3 1 0 0 0 8 A3 0 0 0 8 71 72 73 7 1 8 8 8 This will give the new matrix 123 000 A5000 OOOH 0008 Now multiply the second row negative one and subtract it from the fourth row This will give the new matrix A5 OOOH DOOM OOOCAD OOOOH If we write the system using 1215 we obtain GEOMETRY OF MATRICES 1 2 
3 1 11 0 0 0 0 8 12 0 0 0 0 0 13 0 0 0 0 0 I4 0 gt11212313I4 0 814 0 This implies that X4 0 The system has three vartiables and only one equation Assume that x and X3 are free variables Set x 1 and X3 0 to obtain 11 212 0 0 11 2 0 11 7 2 72 So the vector 5 is in the nullspace of the matrix A as can be seen by writing out the 0 system as follows 1 2 3 1 0 2 4 6 10 0 72 3 1 6 0 9 0 11 0 71 72 73 7 0 Now set x 0 and X3 1 to obtain 11212313 0 7 1 3 0 11 7 3 73 So the vector i is in the nullspace of the matrix A as can be seen by writing out the 0 system as follows 1 2 3 1 0 2 4 6 10 0 3 3 0 6 1 9 0 11 0 71 72 73 7 0 We can then write a general x as follows 72 73 7212 7 313 x 7 1 0 12 I 0 13 1 13 0 0 0 14 Basis vectors GEOMETRY OF MATRICES 19 144 Linear independence A set of vectors is linearly independent if and only if the only solution to a1a1aga2akak0 40 a1a2ak0 41 We can also write this in matrix form The columns of the matrix A are independent if the only solution to the equation Ax 0 is x 145 Spanning vectors The set of all linear combinations of a set of vectors is the vector space spanned by those vectors By this we mean all vectors in this space can be written as a linear combination of this particular set of vectors Consider for example the following vectors in R3 1 0 1 a1 0 a2 1 a3 2 42 0 0 0 The vector as can be written as a1 2212 But vectors in R3 having a nonzero third element cannot be written as a combination of a1 and a2 These three vectors do not form a basis for R3 They span the space that is made up of the floor of R3 Similarly the two vectors from equation 4 make up a basis for a space defined as the plane passing through those two vectors 146 Rowspace ofa matrix The rowspace of the mgtlt n matrix A consists of all linear combinations of the rows of A The combinations can be written as x A or A x depending on whether one considers the resulting vectors to be rows or columns The row space of A is a subspace of R because each of the rows of A has n 
columns The row space of a matrix is the subspace of R spanned by the rows 147 Linear independence anal the basisfor a vector space A basis for a vector space with k dimensions is any set of k linearly independent vectors in that space 148 Formal relationship between a vector space anal its basis vectors A basis for a vector space is a sequence of vectors that has two properties simultaneously 1 The vectors are linearly independent 2 The vectors span the space There will be one and only one way to write any vector in a given vector space as a linear combination of a set of basis vectors There are an infinite number of basis vectors for a given space but only one way to write any given vector as a linear combination of a particular basis 149 Bases anal Invertible Matrices The vectors 51 52 i l 15 are a basis for R exactly when they are the columns of an n X n invertible matrix Therefore R has infinitely many bases one associated with every different invertible matrix 1410 Pivots anal Bases When reducing an m X n matrix A to rowechelon form the pivot columns form a basis for the column space of the matrix A The pivot rows form a basis for the row space of the matrix A 1411 Dimension ofa vector space The dimension of avector space is the number of vectors in every basis For example the dimension of R is 2 while the dimension of the vector space consisting of points on a particular plane is R3 is also 2 20 GEOMETRY OF MATRICES 1412 Dimension of a subspace ofa vector space The dimension of a subspace space Sn of an n dimensional vector space Vn is the maximum number of linearly independent vectors in the sub space 1413 Rank ofa Matrix 1 The number of nonzero rows in the row echelon form of an m x n matrix A produced by elementary operations on A is called the rank of A 2 The rank of an m X n matrix A is the number of pivot columns in the row echelon form of The column rank of an m X n matrix A is the maximum number of linearly independent columns in A The row rank of an 
m x n matrix A is the maximum number of linearly independent rows in A 5 The column rank of an m x n matrix A is equal to the row rank of the m x n matrix A This common number is called the rank of A 6 An n x n matrix A with rank n is said to be of full rank 3 5 1414 Dimension and Rank The dimension of the column space of an m X n matrix A equals the rank of A which also equals the dimension of the row space of A The number ofindependent columns of A equals the number of independent rows of A As stated earlier the r columns containing pivots in the row echelon form of the matrix A form a basis for the column space of A 1415 Vector spaces and matrices An m x n matrix A with full column rank ie the rank of the matrix is equal to the number of columns has all the following properties 1 The n columns are independent 2 The only solution to Ax 0 is x 0 3 The rank of the matrix dimension of the column space n 4 The columns are a basis for the column space 1416 Determinants Minors and Rank Theorem 1 The rank of an m X n matrix A is k ifand only ifeverj minor in A of order k 1 vanishes while there is at least one minor of order k which does not vanish Proposition 1 Consider an m x n matrix A 1 det A 0 if every minor of order n 1 vanishes 2 If every minor of order n equals zero then the same holds for the minors of higher order 3 restatement of theorem The largest among the orders of the nonzero minors generated by a matrix is the rank of the matrix 1417 Nullitj of an m x n Matrix A The nullspace of an m x n Matrix A is made up of vectors in R and is a subspace of R The dimension of the nullspace of A is called the nullity of A It is the the maximum number of linearly independent vectors in the nullspace 1418 Dimension ofRow Space and Null Space ofan m x n matrix A Consider an m x n Matrix A The dimension ofthe nullspace the maximum number of linearly independent vectors in the nullspace plus the rank of A is equal to n Specifically Theorem 2 rankA nullityA n 43 GEOMETRY OF 
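Theorem 2 (rank(A) + nullity(A) = n) can be checked numerically. The sketch below is an added example, not from the notes; it reuses the 3 × 4 matrix from example 3 of the null-space discussion above and computes the rank and the nullity via the singular value decomposition.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 8.0, 10.0],
              [3.0, 6.0, 11.0, 14.0]])
m, n = A.shape

rank = np.linalg.matrix_rank(A)

# Nullity: number of (near-)zero singular values; the corresponding
# right singular vectors form an orthonormal basis for the null space.
_, s, Vt = np.linalg.svd(A)
tol = max(m, n) * np.finfo(float).eps * s[0]
null_basis = Vt[int((s > tol).sum()):].T   # columns span the null space
nullity = null_basis.shape[1]

print(rank, nullity)                 # rank + nullity equals n
print(np.abs(A @ null_basis).max())  # near zero: basis vectors satisfy Ax = 0
```

For this matrix the rank is 2 and the nullity is 2, matching the two free variables found by row reduction in the example.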
MATRICES 21 Proof Let the vectors 51 52 i i i k be a basis for the nullspace of the m X n matrix A This is a subset of R with n or less elements These vectors are of course linearly independent There exist vectors 51 5k i i Tin in R such that 51 52 i i 5k 516 i i i 5 form abasis for R We can prove that A EH1 k2 En Aiku A5k2 m 1457 is a basis for the column space of A The vectors A51 A52 i i i A5 span the column space of A because 51 52 i i 16516 i i i 5 form a basis for R and thus certainly span R Because AEj 0 forj g k we see that A516 A516 i i T A5 also span the column space of A Specifically we obtain 1451 52 51c 5k1 En 0 0 0 A5k1 A5k2 A571 We need to show that these vectors are independent Suppose that they are not independent Then there exist scalars c such that Z c A 5 0 44 ik1 We can rewrite 44 as follows A lt i 55 0 45 ik1 This imples that the vector n 5 2 Ci 5139 46 zk1 is in the nullspace of A Because 51 52 i i 5k form a basis for the nullspace of A there must be scalars 71 72 i i i k suc that k 5 2 ms 47 i1 If we substract one expression for 5 from the other we obtain k n 527151quot Z szi0 48 i1 jk1 and because 51 52 i i i 7516751644 i i 5 are linearly independent we must have 17 72 u7k6k16k2 Cn0 49 7 If r is the rank of A the fact that A k1 A51 i i T A5 form a basis for the column space of A tells us that r nk Because k is the nullity of A then we have TankA nullityA n n nikk 50 D 2 PROJECTIONS 21 Orthogonal vectors 22 CEOMETRY OF MATRICES 211 De nition of orthogonal vectors Two vectors a and b are said to be orthogonal if their inner product is zero that is if a b 0 Geometrically two vectors are orthogonal if they are perpendic ular to each other 212 Example in R2 Consider the two unit vectors 5 m a b 1 0 m lt1 o lt0 lt1 0 Consider two different vectors that are also orthogonal 1 72 a H 7 b l 2 1 GD a b 1 1 22 1 72 1 2 0 We can represent this second case graphically as in figure 6 FIGURE 6 Orthogonal Vectors in R2 X2 22 L1 214 Another example in R3 
Consider the three vectors a, b, and c. Here a and c are orthogonal, but a and b are obviously not. We can represent this three-dimensional case graphically in figure 9. Note that the line segment d is parallel to c and perpendicular to a, just as c is orthogonal to a. [Figure 9: Orthogonal Vectors in R³.] (Example 2.1.3, two orthogonal vectors in R³, is represented in figure 7 or, alternatively, in figure 8. [Figure 7: Orthogonal Vectors in R³.] [Figure 8: Orthogonal Vectors in R³.])

2.1.5. Another example in R³. Consider the four vectors a, b, e, and p below, where p is a scalar multiple of a. We can see that e is orthogonal to p and to a. This example is represented in figure 10. [Figure 10: Orthogonal Vectors in R³.]

2.2. Projection onto a line.

2.2.1. Idea and definition of a projection. Consider a vector b = (b1, b2, ..., bn) in Rⁿ. Also consider a line through the origin in Rⁿ that passes through the point a = (a1, a2, ..., an). To project the vector b onto the line through a, we find the point on that line that is closest to b. We find this point, call it p, by finding the line connecting b to the line through a that is perpendicular to the line through a. This point p, which is on the line through a, will be some multiple of a, i.e., p = ca, where c is a scalar. Consider figure 11. The idea is to find the point along the line through a that is closest to the tip of the vector b; in other words, to find the vector b − p that is perpendicular to a. [Figure 11: Projecting b onto the Line through a.] Given that p can be written as a scalar multiple of a, this also means that b − ca is perpendicular, or orthogonal, to a. This implies that a′(b − ca) = 0, or a′b − c(a′a) = 0. Given that a′b and a′a are scalars, we can solve this for the scalar c as in equation 57.
a′b − c(a′a) = 0, so that

c = a′b / a′a. (57)

Definition 1. The projection of the vector b onto the line through a is the vector p = ca = (a′b / a′a) a.

2.2.2. An Example. Project the vector b = (1, 1, 1) onto the line through a = (1, 2, 0). First find the scalar needed to construct the projection p. For this case,

c = a′b / a′a = (1 + 2 + 0) / (1 + 4 + 0) = 3/5,

so that p = (3/5)(1, 2, 0) = (3/5, 6/5, 0). [Figure 12: Projection onto a Line.]

2.3. Projection onto a subspace.

2.3.1. Idea and definition. Consider a vector b = (b1, b2, ..., bm) in Rᵐ. Then consider a set of n m×1 vectors a1, a2, ..., an. The goal is to find the combination c1a1 + c2a2 + ... + cnan that is closest to the vector b. This is called the projection of the vector b onto the n-dimensional subspace spanned by the a's. If we treat the vectors as the columns of a matrix A, then we are looking for the vector in the column space of A that is closest to the vector b, i.e., the point c such that b and Ac are closest. We find the point in this subspace such that the error vector b − Ac is perpendicular to the subspace. This point is some linear combination of the vectors a1, a2, ..., an. Consider figure 13, where we plot the vectors a1 = (1, 1, 1) and a2 = (0, 1, 2). The subspace spanned by these two vectors is a plane in R³. The plane is shown in figure 14. [Figures 13 and 14: The Plane Spanned by a1 and a2.] Now consider a third vector given by b = (6, 0, 0). We represent the three vectors in figure 15. [Figure 15: A Vector in R³ not on the Plane.] This vector does not lie in the space spanned by a1 and a2. The idea is to find the point in the subspace spanned by a1 and a2 that is closest to the point (6, 0, 0). The error vector e = b − Ac will be perpendicular to the subspace. In figure 16 the vector in the space spanned by a1 and a2 that is closest to the point b is drawn; the diagram also shows the vector b − a1.

2.3.2. Formula for the projection p and the projection matrix P. The error vector e = b − Ac will be orthogonal to each of the vectors in the column space of A, i.e., it will be orthogonal to a1, a2, ..., an. We can write this as n separate conditions as follows:

a1′(b − Ac) = 0
a2′(b − Ac) = 0 (66)
...
an′(b − Ac) = 0

We can also write this as a matrix equation in the
following manner. [Figure 16: Projection onto a Subspace.]

A′(b − Ac) = 0, i.e., A′b − A′Ac = 0, (67)

so that A′Ac = A′b. Equation 67 can then be solved for the coefficients c by finding the inverse of A′A. This is given by

A′Ac = A′b, c = (A′A)⁻¹A′b. (68)

The projection p = Ac is then

p = Ac = A(A′A)⁻¹A′b. (69)

The projection matrix that produces p = Pb is

P = A(A′A)⁻¹A′. (70)

Given any m × n matrix A with linearly independent columns and any vector b in Rᵐ, we can find the projection of b onto the column space of A by premultiplying the vector b by this projection matrix. The vector b is split into its projection p = Pb, which lies in the column space of A, and an error e = b − Pb orthogonal to that space.

A FEW SPECIAL DISTRIBUTIONS AND THEIR PROPERTIES
Econ 675, Iowa State University, November 12, 2006. Justin L. Tobias (ISU), Distributional Catalog.

Special Distributions and Their Associated Properties:
1. Uniform Distribution
2. Gamma Distribution
3. Inverse Gamma Distribution
4. Multivariate Normal Distribution (Marginals and Conditionals)
5. Multivariate Student-t Distribution (Mean and Variance; Marginals and Conditionals of the Student-t)
6. The Wishart Distribution
7. The Binomial Distribution
8. The Poisson Distribution
9. The Multinomial Distribution
10. The Dirichlet and Beta Distributions
11. The Pareto Distribution

The Uniform Distribution. A continuous random variable Y has a Uniform distribution over the interval (a, b), denoted Y ~ U(a, b), if its pdf is given by

fU(y|a, b) = 1/(b − a) if a ≤ y ≤ b; 0 otherwise,

where −∞ < a < b < ∞. If Y ~ U(a, b), then E(Y) = (a + b)/2 and Var(Y) = (b − a)²/12.

The Gamma Distribution. A continuous random variable Y has a Gamma distribution with mean μ > 0 and degrees of freedom ν > 0, denoted Y ~ γ(μ, ν), if its pdf is

fγ(y|μ, ν) = cγ⁻¹ y^((ν−2)/2) exp(−yν/(2μ)) if 0 < y < ∞; 0 otherwise,

where the integrating constant is given by cγ = (2μ/ν)^(ν/2) Γ(ν/2). It is also common to parameterize the Gamma in terms of a ≡ ν/2 and b ≡ 2μ/ν, in which case we denote the distribution as Y ~ G(a, b). The associated density function is denoted by fG(y|a, b), where

fG(y|a, b) = cG⁻¹ y^(a−1) exp(−y/b) if 0 < y < ∞; 0
otherwise, where cG = Γ(a) bᵃ.

Mean and Variance of the Gamma Distribution. If Y ~ G(a, b), then E(Y) = ab and Var(Y) = ab². If Y ~ γ(μ, ν), then E(Y) = μ and Var(Y) = 2μ²/ν. Notes: distributions related to the Gamma include the chi-square distribution, which is a Gamma distribution with μ = ν; it is denoted by Y ~ χ²(ν). The exponential distribution is a Gamma distribution with ν = 2.

The Inverse Gamma Distribution. We denote the inverted Gamma density as Y ~ IG(a, b). Though different parameterizations exist, particularly for how b enters the density, we utilize the following form here:

Y ~ IG(a, b): p(y) = [Γ(a) bᵃ]⁻¹ y^(−(a+1)) exp(−1/(yb)), y > 0.

The mean of this inverse Gamma is E(Y) = [b(a − 1)]⁻¹ for a > 1, and the variance is Var(Y) = [b²(a − 1)²(a − 2)]⁻¹ for a > 2.

The Multivariate Normal Distribution. A continuous k-dimensional random vector Y = (Y1, ..., Yk)′ has a Normal distribution with mean μ (a k-vector) and variance Σ (a k × k positive definite matrix), denoted Y ~ N(μ, Σ), if its pdf is given by

φ(y|μ, Σ) = (2π)^(−k/2) |Σ|^(−1/2) exp(−(1/2)(y − μ)′Σ⁻¹(y − μ)).

The cumulative distribution function of the multivariate Normal evaluated at the point y* is denoted by Φ(y*|μ, Σ) or, if μ = 0 and Σ = I, by Φ(y*). Note: the special case where k = 1, μ = 0, and Σ = 1 is referred to as the standard Normal distribution.

Marginals and Conditionals of the Multivariate Normal. Suppose the k-vector Y ~ N(μ, Σ) is partitioned as

Y = (Y(1)′, Y(2)′)′,

where Y(i) is a ki-vector for i = 1, 2 with k1 + k2 = k, and μ and Σ have been partitioned conformably as

μ = (μ(1)′, μ(2)′)′ and Σ = [Σ11 Σ12; Σ21 Σ22].
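Given this partition, the standard conditional-moment formulas, μ1|2 = μ(1) + Σ12Σ22⁻¹(y2 − μ(2)) and Σ1|2 = Σ11 − Σ12Σ22⁻¹Σ21, can be computed directly. The sketch below is an added illustration with made-up numbers for a 3-dimensional Normal split as k1 = 2, k2 = 1.

```python
import numpy as np

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
k1 = 2  # Y(1) = (Y1, Y2)', Y(2) = Y3

mu1, mu2 = mu[:k1], mu[k1:]
S11, S12 = Sigma[:k1, :k1], Sigma[:k1, k1:]
S21, S22 = Sigma[k1:, :k1], Sigma[k1:, k1:]

y2 = np.array([2.5])  # hypothetical conditioning value for Y3
S22_inv = np.linalg.inv(S22)

mu_cond = mu1 + S12 @ S22_inv @ (y2 - mu2)       # conditional mean
Sigma_cond = S11 - S12 @ S22_inv @ S21           # conditional covariance

print(mu_cond)
print(Sigma_cond)  # note: does not depend on the conditioning value y2
```

The conditional covariance is the Schur complement of Σ22 in Σ, so it stays symmetric and positive definite whenever Σ is.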
Marginals and Conditionals, Continued. Then the following results hold:
1. The marginal distribution of Y(i) is N(μ(i), Σii) for i = 1, 2.
2. The conditional distribution of Y(1) given Y(2) = y2 is N(μ1|2, Σ1|2), where μ1|2 = μ(1) + Σ12 Σ22⁻¹ (y2 − μ(2)) and Σ1|2 = Σ11 − Σ12 Σ22⁻¹ Σ21.

The Multivariate Student-t. A continuous k-dimensional random vector Y = (Y1, ..., Yk)′ has a t distribution with mean μ (a k-vector), scale matrix Σ (a k × k positive definite matrix), and ν (a positive scalar referred to as a degrees-of-freedom parameter), denoted Y ~ t(μ, Σ, ν), if its pdf is given by

ft(y|μ, Σ, ν) = ct⁻¹ |Σ|^(−1/2) [ν + (y − μ)′Σ⁻¹(y − μ)]^(−(ν+k)/2),

where ct = π^(k/2) Γ(ν/2) / [Γ((ν + k)/2) ν^(ν/2)]. Notes: the special case where k = 1, μ = 0, and Σ = 1 is referred to as the Student-t distribution with ν degrees of freedom. Tables providing percentiles of the Student-t are available in most econometrics and statistics textbooks. The case where ν = 1 is referred to as the Cauchy distribution.

Mean and Variance of the Student-t. If Y ~ t(μ, Σ, ν), then E(Y) = μ if ν > 1 and Var(Y) = [ν/(ν − 2)]Σ if ν > 2. Notes: the mean and variance only exist if ν > 1 and ν > 2, respectively. This implies, for instance, that the mean of the Cauchy does not exist even though it is a valid pdf; hence its median and other quantiles are used instead. Σ is not exactly the same as the variance matrix and hence is given another name, the scale matrix.

Marginals and Conditionals of the Student-t. Suppose the k-vector Y ~ t(μ, Σ, ν) is partitioned as in our description of the multivariate Normal, as are μ and Σ. Then the following results hold:
1. The marginal distribution of Y(i) is t(μ(i), Σii, ν) for i = 1, 2.
2. The conditional distribution of Y(1) given Y(2) = y2 is t(μ1|2, h1|2 Σ1|2, ν + k2), where

μ1|2 = μ(1) + Σ12 Σ22⁻¹ (y2 − μ(2)),
Σ1|2 = Σ11 − Σ12 Σ22⁻¹ Σ21, and
h1|2 = (ν + k2)⁻¹ [ν + (y2 − μ(2))′ Σ22⁻¹ (y2 − μ(2))].

The Wishart Distribution. Let H be an N × N positive definite symmetric random matrix, A be a fixed non
random N × N positive definite matrix, and ν > 0 a scalar degrees-of-freedom parameter. Then H has a Wishart distribution, denoted H ~ W(A, ν), if its pdf is given by

fW(H|A, ν) = cW⁻¹ |A|^(−ν/2) |H|^((ν−N−1)/2) exp(−(1/2) tr(A⁻¹H)),

where

cW = 2^(νN/2) π^(N(N−1)/4) ∏ᵢ₌₁ᴺ Γ((ν + 1 − i)/2).

Note: if N = 1, then the Wishart reduces to a Gamma distribution, i.e., fW(H|A, ν) = fγ(H|νA, ν) if N = 1.

Some Moments of the Wishart Distribution. If H ~ W(A, ν), then

E(Hij) = νAij, Var(Hij) = ν(Aij² + AiiAjj), i, j = 1, ..., N,

and

cov(Hij, Hkm) = ν(AikAjm + AimAjk), i, j, k, m = 1, ..., N,

where the subscripts i, j, k, m refer to elements of the matrices.

The Binomial Distribution. A discrete random variable Y has a Binomial distribution with parameters T and p, denoted Y ~ B(T, p), if its probability function is given by

fB(y|T, p) = [T! / (y!(T − y)!)] pʸ (1 − p)^(T−y) if y = 0, 1, ..., T; 0 otherwise,

where 0 ≤ p ≤ 1 and T is a positive integer. The Bernoulli distribution is a special case of the Binomial when T = 1. If Y ~ B(T, p), then E(Y) = Tp and Var(Y) = Tp(1 − p). Note: this distribution is used in cases where an experiment, the outcome of which is either success or failure, is repeated independently T times. The probability of success in an experiment is p. The distribution of the random variable Y, which counts the number of successes, is B(T, p).

The Poisson Distribution. A discrete random variable Y has a Poisson distribution with parameter λ, denoted Y ~ Po(λ), if its probability function is given by

fPo(y|λ) = λʸ exp(−λ) / y! if y = 0, 1, 2, ...; 0 otherwise,

where λ is a positive real number. If Y ~ Po(λ), then E(Y) = λ and Var(Y) = λ.

The Multinomial Distribution. A discrete N-dimensional random vector Y = (Y1, ..., YN)′ has a Multinomial distribution with parameters T and p, denoted Y ~ M(T, p), if its probability density function is given by
wherepp1pN0gpiglfori1N Zilpiland T isa positive integer Justin L Tobias ISU Distributional Catalog November 12 2UUG 17 Zn The Dirichlet and Beta Distributions The Dirichlet and Beta Distributions Let Y Y1 YN be a vector of continuous random variables with the property that Y1 YN 1 Then Y has a Dirichlet distribution denoted Y N Da if its pdf is given by r a 112 r M where a a1aN a gt 0 for i 1N and a 04 The Beta distribution denoted by Y N a1a2 is the Dirichlet distribution for the case N 2 Its pdf is denoted by f3 Yla1a2 Note In the case N 2 the restriction Y1 Y2 1 can be used to remove one of the random variables Thus the Beta distribution is a univariate distribution N 0471 Hy v i1 foYla Justin L Tobias ISU Distributional Catalog November 12 2UUG 13 2U The Dirichlet and Beta Distributions Moments of the Dirichlet Distribution Suppose Y N D a where a and a are as given as in the previous page Then for ij 1N o EM 0 varY a 7 a Y aaa 22a1 o covY7 732 and Justin L Tobias ISU Distributional Catalog November 12 2UUG The Pareto Distribution The Pareto Distribution A continuous random variable Y has a Pareto distribution if its pdf is given by if y 2 v otherwise AA fPa yl YvA 3 Justin L Tobias ISU Distributional Catalog November 12 2UUG 2U 2K a momnonsop RANDOM vemus Prue Consider the digram m gure 1 FIGURE 1 y P0015 an Increasing function y x 134a 134b x A n be seen from 1n gure 1 each poxnt on the y am maps mto a poxnt on the x 3x15 that 15X must take 0 a Va ue between a and b when 1 takes on a Va ue between a and Therefore Pm lt Y lt bPCIgt 1ult X lt quotb 407 18 e401 mew What we would lxke to do ts replace x m the second 1me thhy and D Xa and quotb thh a and b To d 50 we need to make a change ofvanable Con tdet 0 make a u substitution J example lfu hgtlt then du h gtlt dx So le 40 then 1 de M M 031 Then we can wme TRANR QRMATKQNS OF RANDOM VAMABLE S FIGURE 2 y ac 15 a decreasmg funcuon b a CD Kb CD Ka x PultYltbPtDquotbltXltltIgtquotu 25 V oz mm 407 7 a 7 N W 75 fx w 
In either case the integrand gives the density of Y:

  f_Y(y) = f_X(Φ⁻¹(y)) · [dΦ⁻¹(y)/dy],

because dΦ⁻¹(y)/dy > 0 when the function y = Φ(x) is increasing, and

  f_Y(y) = −f_X(Φ⁻¹(y)) · [dΦ⁻¹(y)/dy]

when it is decreasing; in both cases f_Y(y) = f_X(Φ⁻¹(y)) · |dΦ⁻¹(y)/dy|.

FIGURE 3. The Two Density Functions. [Only the caption and the axis label "Value of Density Function" are recoverable here.]

4. METHOD OF TRANSFORMATIONS: MULTIPLE VARIABLES

4.1. General definition of a transformation. Let Φ be any function from R^k to R^m, k, m ≥ 1, such that Φ⁻¹(A) = {x ∈ R^k : Φ(x) ∈ A} ∈ B^k for every A ∈ B^m, where B^m is the smallest σ-field having all the open rectangles in R^m as members. If we write y = Φ(x), the function Φ defines a mapping from the sample space Ξ of the random variable X to a sample space Ψ of the random variable Y. Specifically,

  Φ : Ξ → Ψ  (39)

and

  Φ⁻¹(A) = {x ∈ Ξ : Φ(x) ∈ A}.  (40)

4.2. Transformations involving multiple functions of multiple random variables.

Theorem 4. Let f_{X1,X2}(x1, x2) be the value of the joint probability density of the continuous random variables X1 and X2 at (x1, x2). If the functions given by y1 = u1(x1, x2) and y2 = u2(x1, x2) are partially differentiable with respect to x1 and x2 and represent a one-to-one transformation for all values within the range of X1 and X2 for which f_{X1,X2}(x1, x2) ≠ 0, then, for these values of x1 and x2, the equations y1 = u1(x1, x2) and y2 = u2(x1, x2) can be uniquely solved for x1 and x2 to give x1 = w1(y1, y2) and x2 = w2(y1, y2), and, for corresponding values of y1 and y2, the joint probability density of Y1 = u1(X1, X2) and Y2 = u2(X1, X2) is given by

  f_{Y1,Y2}(y1, y2) = f_{X1,X2}(w1(y1, y2), w2(y1, y2)) · |J|,  (41)

where J is the Jacobian of the transformation, defined as the determinant

  J = det[ ∂x1/∂y1  ∂x1/∂y2 ; ∂x2/∂y1  ∂x2/∂y2 ].

Applying the theorem to the example (whose setup appears on an earlier page), for corresponding values of y1 and y2,

  f_{Y1,Y2}(y1, y2) = f_{X1,X2}(w1(y1, y2), w2(y1, y2)) · |J|.

Considering all possible values of y1 and y2, we obtain

  f_{Y1,Y2}(y1, y2) = y1 e^{−y1}  for y1 ≥ 0, 0 < y2 < 1,  and 0 elsewhere.

We can then find the marginal density of Y2 by integrating over y1 as follows:

  f_{Y2}(y2) = ∫_0^∞ f_{Y1,Y2}(y1, y2) dy1 = ∫_0^∞ y1 e^{−y1} dy1.

We make a u-v substitution (integration by parts) to integrate, where u, v, du and dv are defined as

  u = y1,  v = −e^{−y1},  du = dy1,  dv = e^{−y1} dy1.

This then implies, for all y2 such that 0 < y2 < 1:
  f_{Y2}(y2) = ∫_0^∞ y1 e^{−y1} dy1
     = [−y1 e^{−y1}]_0^∞ + ∫_0^∞ e^{−y1} dy1   (46)
     = (0 − 0) + [−e^{−y1}]_0^∞   (47)
     = −(0 − 1)   (48)
     = 1.   (49)

A graph of the joint densities and the marginal density follows. The joint density of (X1, X2) is shown in Figure 4; the joint density of (Y1, Y2) is contained in Figure 5; and the marginal density of Y2 is shown graphically in Figure 6. [Only the captions, Figure 4: Joint Density of X1 and X2; Figure 5: Joint Density of Y1 and Y2; Figure 6: Marginal Density of Y2, are recoverable here.]

CHARACTERISTIC ROOTS AND VECTORS

1.2. Determinantal equation used in solving the characteristic root problem. Now consider the singularity condition in more detail:

  (λI − A)x = 0  ⇒  |λI − A| = 0,

  |λI − A| = det[ λ−a11  −a12  ⋯  −a1n ; −a21  λ−a22  ⋯  −a2n ; ⋮ ; −an1  −an2  ⋯  λ−ann ] = 0.  (5)

This equation is a polynomial in λ, since the formula for the determinant is a sum containing n! terms, each of which is a product of n elements, one element from each row and column of A. The fundamental polynomial is given as

  |λI − A| = λⁿ + b_{n−1}λ^{n−1} + b_{n−2}λ^{n−2} + ⋯ + b1λ + b0.  (6)

This is obvious since each row of |λI − A| contributes one and only one power of λ as the determinant is expanded. Only when the permutation is such that the column included for each row is the same one will each term contain λ, giving λⁿ; this term comes from the product of the diagonal elements. Other permutations give lesser powers, from products of the off-diagonal elements (which do not contain λ) with other members of the matrix. The fact that b0 comes from all the terms not involving λ implies that it is equal to |−A|.

Consider a 2×2 example:

  |λI − A| = det[ λ−a11  −a12 ; −a21  λ−a22 ]
     = (λ − a11)(λ − a22) − a12 a21
     = λ² − a11λ − a22λ + a11a22 − a12a21
     = λ² − (a11 + a22)λ + (a11a22 − a12a21)
     = λ² + b1λ + b0,  (7)

with b1 = −(a11 + a22) and b0 = a11a22 − a12a21 = |−A|.
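The 2×2 characteristic polynomial above can be verified numerically; numpy's `poly` returns the coefficients of det(λI − A). The example matrix is an arbitrary illustrative choice.

```python
import numpy as np

# Check the 2x2 characteristic polynomial: its coefficients are
# 1, -(a11 + a22), and a11*a22 - a12*a21.  The matrix is illustrative.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

coeffs = np.poly(A)   # coefficients of det(lambda*I - A), highest power first
expected = np.array([1.0, -(A[0, 0] + A[1, 1]),
                     A[0, 0]*A[1, 1] - A[0, 1]*A[1, 0]])
print(np.allclose(coeffs, expected))  # True
```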
Consider also a 3×3 example, where we find the determinant using the expansion of the first row:

  |λI − A| = det[ λ−a11  −a12  −a13 ; −a21  λ−a22  −a23 ; −a31  −a32  λ−a33 ]
   = (λ−a11) det[ λ−a22  −a23 ; −a32  λ−a33 ] + a12 det[ −a21  −a23 ; −a31  λ−a33 ] − a13 det[ −a21  λ−a22 ; −a31  −a32 ].  (8)

Now expand each of the three determinants in equation 8. We start with the first term:

  (λ−a11)[(λ−a22)(λ−a33) − a23a32]
   = (λ−a11)(λ² − λa33 − λa22 + a22a33 − a23a32)
   = λ³ − λ²a33 − λ²a22 + λ(a22a33 − a23a32) − λ²a11 + λa11a33 + λa11a22 − a11(a22a33 − a23a32)
   = λ³ − λ²(a11 + a22 + a33) + λ(a11a33 + a11a22 + a22a33 − a23a32) − a11a22a33 + a11a23a32.  (9)

Now the second term:

  a12[−a21(λ − a33) − a23a31] = −λ a12a21 + a12a21a33 − a12a23a31.  (10)

Now the third term:

  −a13[a21a32 + a31(λ − a22)] = −a13a21a32 − λ a13a31 + a13a22a31.  (11)

Now combine the three expressions to obtain

  |λI − A| = λ³ − λ²(a11 + a22 + a33)
     + λ(a11a33 + a11a22 + a22a33 − a23a32 − a12a21 − a13a31)
     − a11a22a33 + a11a23a32 + a12a21a33 − a12a23a31 − a13a21a32 + a13a22a31.  (12)

The first term will be λ³; the others give polynomials in λ² and λ. Note that the constant term is the negative of the determinant of A.

1.3. Fundamental theorem of algebra.

1.3.1. Statement of the Fundamental Theorem of Algebra.

Theorem 1. Any polynomial p(x) of degree at least 1 with complex coefficients has at least one zero z (i.e., z is a root of the equation p(x) = 0) among the complex numbers. Further, if z is a zero of p(x), then (x − z) divides p(x), and p(x) = (x − z)q(x), where q(x) is a polynomial with complex coefficients whose degree is 1 smaller than that of p.

What this says is that we can write the polynomial as a product of (x − z) and a term which is a polynomial of one less degree than p(x).
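Equation 12 can be checked numerically: the λ² coefficient is −tr(A), the λ coefficient is the sum of the 2×2 principal minors, and the constant term is −|A|. The matrix below is an arbitrary illustrative choice.

```python
import numpy as np

# Check equation (12): for a 3x3 matrix the characteristic polynomial is
# t^3 - tr(A) t^2 + (sum of 2x2 principal minors) t - det(A).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

m2 = (A[0, 0]*A[1, 1] - A[0, 1]*A[1, 0]) \
   + (A[0, 0]*A[2, 2] - A[0, 2]*A[2, 0]) \
   + (A[1, 1]*A[2, 2] - A[1, 2]*A[2, 1])
expected = np.array([1.0, -np.trace(A), m2, -np.linalg.det(A)])
print(np.allclose(np.poly(A), expected))  # True
```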
For example, if p(x) is a polynomial of degree 4, then we can write it as

  p(x) = (x − x1) · (polynomial with power no greater than 3),  (13)

where x1 is a root of the equation p(x) = 0. Given that q(x) is a polynomial, if it is of degree greater than one then we can write it in a similar fashion as

  q(x) = (x − x2) · (polynomial with power no greater than 2),  (14)

where x2 is a root of the equation q(x) = 0. This then implies that p(x) can be written as

  p(x) = (x − x1)(x − x2) · (polynomial with power no greater than 2)  (15)

or, continuing,

  p(x) = (x − x1)(x − x2)(x − x3) · (polynomial with power no greater than 1),  (16)

  p(x) = (x − x1)(x − x2)(x − x3)(x − x4) · (term not containing x).  (17)

If we set this equation equal to zero, it implies that

  p(x) = (x − x1)(x − x2)(x − x3)(x − x4) = 0.  (18)

1.3.2. Example of the Fundamental Theorem of Algebra. Consider the equation

  p(t) = t³ − 6t² + 11t − 6.  (19)

This has roots t1 = 1, t2 = 2, t3 = 3. Consider then that we can write the equation as (t − 1)q(t) as follows:

  p(t) = (t − 1)(t² − 5t + 6) = t³ − 5t² + 6t − t² + 5t − 6 = t³ − 6t² + 11t − 6.  (20)

Now carry this one step further with q(t) = t² − 5t + 6:

  q(t) = t² − 5t + 6 = (t − 2)(t − 3)  ⇒  p(t) = (t − t1)(t − t2)(t − t3).  (21)

1.3.3. Theorem that follows from the Fundamental Theorem of Algebra.

Theorem 2. A polynomial of degree n ≥ 1 with complex coefficients has, counting multiplicities, exactly n zeroes among the complex numbers.

The multiplicity of a root z of p(x) = 0 is the largest integer k for which (x − z)^k divides p(x), that is, the number of times z occurs as a root of p(x) = 0. If z has a multiplicity of 2, then it is counted twice toward the number n of roots of p(x) = 0.

For a fourth-degree polynomial with roots λ1, …, λ4,

  f(λ) = (λ − λ1)(λ − λ2)(λ − λ3)(λ − λ4)
   = λ⁴ − λ³(λ1 + λ2 + λ3 + λ4)
   + λ²(λ1λ2 + λ1λ3 + λ1λ4 + λ2λ3 + λ2λ4 + λ3λ4)
   − λ(λ1λ2λ3 + λ1λ2λ4 + λ1λ3λ4 + λ2λ3λ4)
   + λ1λ2λ3λ4 = 0.

We can write this in an alternative way as follows:

  f(λ) = λ⁴ − λ³ Σ_i λi + λ² Σ_{i<j} λiλj − λ Σ_{i<j<k} λiλjλk + (−1)⁴ ∏_{i=1}^4 λi = 0.  (30)

Similarly, for polynomials of degree 2 and 3 we find

  f(λ) = λ² − λ Σ_{i=1}^2 λi + (−1)² ∏_{i=1}^2 λi = 0,  (31a)

  f(λ) = λ³ − λ² Σ_{i=1}^3 λi + λ Σ_{i<j} λiλj + (−1)³ ∏_{i=1}^3 λi = 0.  (31b)
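The example of section 1.3.2 can be confirmed numerically: the cubic's roots are 1, 2 and 3, and expanding the factored form recovers the original coefficients.

```python
import numpy as np

# Confirm the example: p(t) = t^3 - 6t^2 + 11t - 6 has roots 1, 2, 3,
# and expanding (t-1)(t-2)(t-3) recovers the coefficients.
roots = np.roots([1, -6, 11, -6])
print(np.allclose(np.sort(roots), [1.0, 2.0, 3.0]))      # True
print(np.allclose(np.poly([1, 2, 3]), [1, -6, 11, -6]))  # True
```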
For a polynomial of degree 5 we find

  f(λ) = λ⁵ − λ⁴ Σ_i λi + λ³ Σ_{i<j} λiλj − λ² Σ_{i<j<k} λiλjλk + λ Σ_{i<j<k<l} λiλjλkλl + (−1)⁵ ∏_{i=1}^5 λi = 0.  (31c)

Now consider the general case

  f(λ) = (λ − λ1)(λ − λ2) ⋯ (λ − λn) = 0.  (32)

Each product contributing to the power λ^k in the expansion is the result of k times choosing λ and n − k times choosing one of the λi. For example, with the first term of a third-degree polynomial we choose λ three times and do not choose any of the λi. For the second term we choose λ twice and each of the λi once. When we choose λ once, we choose the λi in pairs. The term not involving λ arises when we choose all of the λi; this is of course λ1λ2λ3. Choosing λ k times and the λi n − k times is a problem in combinatorics; the answer is given by the binomial formula. Specifically, there are (n choose n−k) products with the power λ^k. In a cubic: when k = 3 there are no terms involving the λi; when k = 2 there will be three terms involving the individual λi; when k = 1 there will be three terms involving the product of two of the λi; and when k = 0 there will be one term involving the product of three of the λi, as can be seen in equation 33:

  p(λ) = (λ − λ1)(λ − λ2)(λ − λ3)
   = [λ² − λ(λ1 + λ2) + λ1λ2](λ − λ3)
   = λ³ − λ²(λ1 + λ2) + λ λ1λ2 − λ²λ3 + λ(λ1λ3 + λ2λ3) − λ1λ2λ3
   = λ³ − λ²(λ1 + λ2 + λ3) + λ(λ1λ2 + λ1λ3 + λ2λ3) − λ1λ2λ3.  (33)

The coefficient of λ^k will be the sum of the appropriate products, multiplied by −1 if an odd number of the λi have been chosen. The general case follows:

  f(λ) = (λ − λ1)(λ − λ2) ⋯ (λ − λn)
   = λⁿ − λ^{n−1} Σ_{i=1}^n λi + λ^{n−2} Σ_{i<j} λiλj − λ^{n−3} Σ_{i<j<k} λiλjλk + ⋯ + (−1)ⁿ ∏_{i=1}^n λi = 0.  (34)

The term not containing λ is always (−1)ⁿ times the product of the roots λ1, λ2, λ3, … of the polynomial, and the term containing λ^{n−1} is always −Σ_{i=1}^n λi.

1.3.7. Implications of the factoring result for the fundamental determinantal equation 6. The fundamental equations for computing characteristic roots are

  (λI − A)x = 0,  (35a)
  |λI − A| = 0,  (35b)
  |λI − A| = λⁿ + b_{n−1}λ^{n−1} + b_{n−2}λ^{n−2} + ⋯ + b1λ + b0 = 0.  (35c)

Equation 35c is just a polynomial in λ. If we solve the equation we will obtain n roots, some perhaps repeated. Once we find these roots we can also write equation 35c in factored form using equation 34. It is useful to write the two forms together. The first is just equation 35c:

  |λI − A| = λⁿ + b_{n−1}λ^{n−1} + b_{n−2}λ^{n−2} + ⋯ + b1λ + b0 = 0.  (36a)
The second is the factored form from equation 34:

  |λI − A| = λⁿ − λ^{n−1} Σ_{i=1}^n λi + λ^{n−2} Σ_{i<j} λiλj − ⋯ + (−1)ⁿ ∏_{i=1}^n λi = 0.  (36b)

We can then find the coefficients of the various powers of λ by comparing the two equations. For example, b_{n−1} = −Σ_{i=1}^n λi and b0 = (−1)ⁿ ∏_{i=1}^n λi.

1.3.8. Implications of theorems 1 and 2. The n roots of a polynomial equation need not all be different, but if a root is counted the number of times equal to its multiplicity, there are n roots of the equation. Thus there are n roots of the characteristic equation, since it is an nth-degree polynomial. These roots may all be different, or some may be the same. The theorem also implies that there cannot be more than n distinct values of λ for which |λI − A| = 0. For values of λ different from the roots of f(λ), solutions of the characteristic equation require x = 0. If we set λ equal to one of the roots of the equation, say λi, then |λiI − A| = 0. For example, consider a matrix A as follows:

  A = [ 4  2 ; 2  4 ].  (37)

This implies that

  |λI − A| = det[ λ−4  −2 ; −2  λ−4 ].  (38)

Taking the determinant we obtain

  |λI − A| = (λ − 4)(λ − 4) − 4 = λ² − 8λ + 12 = (λ − 6)(λ − 2) = 0  ⇒  λ1 = 6, λ2 = 2.  (39)

Now write out (λI − A) from equation 38 with λ = 2 as follows:

  2I − A = [ 2−4  −2 ; −2  2−4 ] = [ −2  −2 ; −2  −2 ],  (40)

which is singular, as expected.

1.4. A numerical example of computing characteristic roots (eigenvalues). Consider a simple example matrix. [The example itself appears on a page not reproduced here.]

2. SIMILAR MATRICES

2.1. Definition of similar matrices.

Theorem 3. Let A be a square matrix of order n and let Q be an invertible matrix of order n. Then

  B = Q⁻¹AQ  (48)

has the same characteristic roots as A, and if x is a characteristic vector of A, then Q⁻¹x is a characteristic vector of B. The matrices A and B are said to be similar.

Proof. Let (λ, x) be a characteristic root and vector pair for the matrix A. Then

  Ax = λx  ⇒  Q⁻¹Ax = λQ⁻¹x.

Also,

  Q⁻¹A = Q⁻¹AQQ⁻¹,  (49)

so

  (Q⁻¹AQ)(Q⁻¹x) = Q⁻¹Ax = λ(Q⁻¹x).

This then implies that λ is a characteristic root of Q⁻¹AQ with characteristic vector Q⁻¹x. ∎

2.2. Matrices similar to diagonal matrices.

2.2.1. Diagonalizable matrix. A matrix A is said to be diagonalizable if it is similar to a diagonal matrix.
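The example of equation 37 and Theorem 3 can both be checked numerically; Q below is an arbitrary invertible matrix chosen for illustration.

```python
import numpy as np

# A = [[4,2],[2,4]] has characteristic roots 6 and 2, and the similar
# matrix B = Q^{-1} A Q has the same roots.
A = np.array([[4.0, 2.0],
              [2.0, 4.0]])
print(np.allclose(np.sort(np.linalg.eigvals(A)), [2.0, 6.0]))   # True

Q = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # invertible: det = 1
B = np.linalg.inv(Q) @ A @ Q
lams_B = np.linalg.eigvals(B)
print(np.allclose(np.sort(lams_B.real), [2.0, 6.0]))            # True
```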
This means that there exists a matrix P such that P⁻¹AP = D, where D is a diagonal matrix.

2.2.2. Theorem on diagonalizable matrices.

Theorem 4. The n×n matrix A is diagonalizable iff there is a set of n linearly independent vectors, each of which is a characteristic vector of A.

Proof. First show that if we have n linearly independent characteristic vectors, A is diagonalizable. Take the n linearly independent characteristic vectors of A, denoted x1, x2, …, xn, with their characteristic roots λ1, …, λn, and form a nonsingular matrix P with them as columns:

  P = [ x1  x2  ⋯  xn ].  (50)

Let Λ be a diagonal matrix with the characteristic roots of A on the diagonal. Now calculate P⁻¹AP as follows:

  P⁻¹AP = P⁻¹[ Ax1  Ax2  ⋯  Axn ]
     = P⁻¹[ λ1x1  λ2x2  ⋯  λnxn ]
     = P⁻¹[ x1  x2  ⋯  xn ]Λ
     = P⁻¹PΛ = Λ.  (51)

Now suppose that A is diagonalizable, so that P⁻¹AP = D. Then, multiplying both sides by P, we obtain AP = PD. This means that A times the ith column of P is the ith diagonal entry of D times the ith column of P; that is, the ith column of P is a characteristic vector of A associated with the ith diagonal entry of D. Thus D is equal to Λ. Since we assume that P is nonsingular, the columns (characteristic vectors) are independent. ∎

3. INDEPENDENCE OF CHARACTERISTIC VECTORS

Theorem 5. Suppose λ1, …, λk are characteristic roots of an n×n matrix A, no two of which are the same. Let xi be the characteristic vector associated with λi. Then the set {x1, x2, …, xk} is linearly independent.

Proof. Suppose x1, …, xk are linearly dependent. Then there exists a nontrivial linear combination such that

  a1x1 + a2x2 + a3x3 + ⋯ + akxk = 0.  (52)

As an example, the sum of the first two vectors might be zero; in this case a1 = 1, a2 = 1, and a3 = ⋯ = ak = 0. Consider the linear combination of the x's that has the fewest nonzero coefficients. Renumber the vectors such that these are the first r, and call the coefficients αi. That is,

  α1x1 + α2x2 + ⋯ + αr xr = 0  for r ≤ k, and α_{r+1} = ⋯ = αk = 0.  (53)

Notice that r > 1 because all xi ≠ 0.
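As a numerical aside, Theorem 4 can be sketched with the example matrix of equation 37: form P from characteristic vectors and check that P⁻¹AP is diagonal with the roots on the diagonal.

```python
import numpy as np

# Sketch of Theorem 4: columns of P are characteristic vectors of A,
# and P^{-1} A P is the diagonal matrix of characteristic roots.
A = np.array([[4.0, 2.0],
              [2.0, 4.0]])
lams, P = np.linalg.eig(A)

D = np.linalg.inv(P) @ A @ P
print(np.allclose(D, np.diag(lams)))  # True
```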
Now multiply the matrix equation in 53 by A:

  A(α1x1 + α2x2 + ⋯ + αr xr) = α1Ax1 + α2Ax2 + ⋯ + αr Axr
    = α1λ1x1 + α2λ2x2 + ⋯ + αrλr xr = 0.  (54)

Now multiply the result in 53 by λr and subtract it from the last expression in 54. First multiply 53 by λr:

  λr(α1x1 + α2x2 + ⋯ + αr xr) = α1λr x1 + α2λr x2 + ⋯ + αrλr xr = 0.  (55)

Now subtract:

  α1λ1x1 + α2λ2x2 + ⋯ + α_{r−1}λ_{r−1}x_{r−1} + αrλr xr
  − [α1λr x1 + α2λr x2 + ⋯ + α_{r−1}λr x_{r−1} + αrλr xr] = 0

  ⇒  α1(λ1 − λr)x1 + α2(λ2 − λr)x2 + ⋯ + α_{r−1}(λ_{r−1} − λr)x_{r−1} = 0.  (56)

This dependence relation has fewer coefficients than 53, thus contradicting the assumption that 53 contains the minimal number. Thus the vectors x1, …, xk must be independent. ∎

4. DISTINCT CHARACTERISTIC ROOTS AND DIAGONALIZABILITY

Theorem 6. If an n×n matrix A has n distinct characteristic roots, then it is diagonalizable.

Proof. Let the distinct characteristic roots of A be given by λ1, …, λn. Since they are all different, {x1, …, xn} is a linearly independent set by the last theorem. Then A is diagonalizable by Theorem 4. ∎

5. DETERMINANTS, TRACES, AND CHARACTERISTIC ROOTS

5.1. Theorem on determinants.

Theorem 7. Let A be a square matrix of order n. Let λ1, …, λn be its characteristic roots. Then

  |A| = ∏_{i=1}^n λi.  (57)

Proof. Consider the n roots of the equation defining the characteristic roots. Consider the general equation for the roots of an equation, as in equation 6, the example in equation 12, or equation 34, and the specific case here:

  |λI − A| = (λ − λ1)(λ − λ2) ⋯ (λ − λn) = 0
  ⇒  λⁿ − λ^{n−1} Σ_{i=1}^n λi + λ^{n−2} Σ_{i<j} λiλj − λ^{n−3} Σ_{i<j<k} λiλjλk + ⋯ + (−1)ⁿ ∏_{i=1}^n λi = 0.  (58)

The first term is λⁿ, and the last term contains only the λi. This last term is given by

  ∏_{i=1}^n (0 − λi) = (−1)ⁿ ∏_{i=1}^n λi.  (59)

Now compare equation 6 and equation 58:

  (6)  |λI − A| = λⁿ + b_{n−1}λ^{n−1} + b_{n−2}λ^{n−2} + ⋯ + b0 = 0,  (60a)
  (58) |λI − A| = λⁿ − λ^{n−1} Σ_{i=1}^n λi + λ^{n−2} Σ_{i<j} λiλj − ⋯ + (−1)ⁿ ∏_{i=1}^n λi = 0.  (60b)

Notice that the last expression in 58 must be equivalent to b0 in equation 6, because it is the only term not containing λ. Specifically,

  b0 = (−1)ⁿ ∏_{i=1}^n λi.  (61)
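Theorem 7 can be checked numerically on an arbitrary example matrix; complex roots come in conjugate pairs, so their product is real.

```python
import numpy as np

# Check Theorem 7: the determinant of an arbitrary example matrix
# equals the product of its characteristic roots.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))

lams = np.linalg.eigvals(A)
print(np.allclose(np.prod(lams).real, np.linalg.det(A)))  # True
```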
Now set λ = 0 in 60a. This will give

  |0·I − A| = b0  ⇒  |−A| = b0  ⇒  (−1)ⁿ|A| = b0.  (62)

Therefore

  b0 = (−1)ⁿ ∏_{i=1}^n λi = (−1)ⁿ |A|  ⇒  ∏_{i=1}^n λi = |A|.  (63)

∎

5.2. Theorem on traces.

Theorem 8. Let A be a square matrix of order n. Let λ1, …, λn be its characteristic roots. Then

  tr A = Σ_{i=1}^n λi.  (64)

To help visualize the proof, consider the following case where n = 4:

  |λI − A| = det[ λ−a11  −a12  −a13  −a14 ; −a21  λ−a22  −a23  −a24 ; −a31  −a32  λ−a33  −a34 ; −a41  −a42  −a43  λ−a44 ]
   = (λ−a11) det[ λ−a22  −a23  −a24 ; −a32  λ−a33  −a34 ; −a42  −a43  λ−a44 ]
   + a12 det[ −a21  −a23  −a24 ; −a31  λ−a33  −a34 ; −a41  −a43  λ−a44 ]
   − a13 det[ −a21  λ−a22  −a24 ; −a31  −a32  −a34 ; −a41  −a42  λ−a44 ]
   + a14 det[ −a21  λ−a22  −a23 ; −a31  −a32  λ−a33 ; −a41  −a42  −a43 ].  (65)

Also consider expanding the determinant

  det[ λ−a22  −a23  −a24 ; −a32  λ−a33  −a34 ; −a42  −a43  λ−a44 ]  (66)

by its first row:

  (λ−a22) det[ λ−a33  −a34 ; −a43  λ−a44 ] + a23 det[ −a32  −a34 ; −a42  λ−a44 ] − a24 det[ −a32  λ−a33 ; −a42  −a43 ].  (67)

Now proceed with the proof.

Proof. Expanding the determinantal equation 5 by the first row, we obtain

  |λI − A| = f(λ) = (λ − a11)C11 + Σ_{j=2}^n (−a1j)C1j.  (68)

Here C1j is the cofactor of a1j in the matrix λI − A. Note that there are only n − 2 elements involving aii − λ in the matrices C12, C13, …, C1n, so those cofactors contribute terms of degree at most n − 2 in λ. The powers λⁿ and λ^{n−1} therefore come only from ∏_{i=1}^n (λ − aii), whose λ^{n−1} coefficient is −Σ_{i=1}^n aii; comparing with the factored form 58, where the λ^{n−1} coefficient is −Σ_{i=1}^n λi, gives tr A = Σ_{i=1}^n λi. ∎

7. ORTHOGONAL MATRICES

7.1. Orthogonal and orthonormal vectors and matrices.

Definition 1. The n×1 vectors a and b are orthogonal if

  a′b = 0.  (80)

Definition 2. The n×1 vectors a and b are orthonormal if

  a′b = 0,  a′a = 1,  b′b = 1.  (81)

Definition 3. The square matrix Q (n×n) is orthogonal if its columns are orthonormal. Specifically, if

  Q = [ q1  q2  q3  ⋯  qn ],  then  qi′qi = 1 and qi′qj = 0 for i ≠ j.  (82)

A consequence of the definition is that

  Q′Q = I.  (83)

7.2. Nonsingularity property of orthogonal matrices.

Theorem 10. An orthogonal matrix is nonsingular.

Proof. If the columns of the matrix are independent, then it is nonsingular. Consider the matrix Q given by

  Q = [ q1  q2  q3  ⋯  qn ].  (84)

If the columns are dependent, then

  a1q1 + a2q2 + a3q3 + ⋯ + anqn = 0,  with ai ≠ 0 for at least one i.  (85)

Now premultiply the dependence equation by the transpose of one of the columns of Q, say qj′.
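As a numerical aside, Theorem 8 can likewise be checked on an arbitrary example matrix.

```python
import numpy as np

# Check Theorem 8: the trace of an arbitrary example matrix
# equals the sum of its characteristic roots.
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))

lams = np.linalg.eigvals(A)
print(np.allclose(lams.sum().real, np.trace(A)))  # True
```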
for at least one1 85 q 43 43 42 Now premultiply the dependence equation by the transpose of one of the columns of Q say qj 18 CHARACTERISTIC ROOTS AND VECTORS 4i 4 v v 42 v v 42 4114i 4 4 3 4214 4 4 3 1 2 q q 7 n 7 86 11 4 anqi q gill 07aiy 0f0ratleast0nez 42 By orthogonality all the terms involvingj and i y are zero This then implies 4 1 v v 4 4114 4 4 0 87 4 But because the columns are orthonormal q j qj 1 which implies that aj 0 Becausej is arbi trary this implies that all the aj are zero Thus the columns are independent and the matrix has an inverse D 73 Tranposes and inverses of orthogonal matrices Theorem 11 IfQ is an orthogonal matrix then Ql Q l Proof By the definition of an orthogonal matrix Q Q I Now postmultiply the identity by Q l It exists by Theorem 10 Q Q I Q QQ 1 IQ 1 88 a Q Q 1 E 74 Determinants and characteristic roots of orthogonal matrices Theorem 12 IfQ is an orthogonal matrix of order n then a lQl 1or lQl 71 b IfAi is a Characteristic root of Q then M i1 i171 Proof To show note that Q Q I Then 1 11 l lQ QllQ HQl 89 1Q12 because lQl lQ l To show the second part note that Q Q l Also note that the characteristic roots of Q are the same as the roots of Q By theorem 9 the characteristic roots of Q 1 are 1Ai This implies then that the roots of Q are the same as Q This means then that M 1Ai This can be true iff Ai i1 20 CHARACTERISTIC ROOTS AND VECTORS 85 Complex conjugate of a complex number For each complex number 2 x iy the number 2 x iy is called the complex conjugate of z The product of a complex number and its conjugate is a real number In particular if z x iy then 22 z yz 7y 12 y2 71y yr 12 y2 0 12 y2 98 Sometimes we will use the notation 2 to represent the complex conjugate of a complex number So 2 x iy We can then write 25 17yrry12 92 71y yr 12 y270 12 f 99 86 Graphical representation of a complex number Consider representing a complex number in a two dimensional graph with the vertical axis representing the imaginary part In this framework 
In this framework the modulus of the complex number is the distance from the origin to the point. This is seen clearly in Figure 1. [Figure 1: Graphical Representation of a Complex Number; the point x = x1 + ix2 is plotted with modulus sqrt(x1² + x2²), the horizontal axis being the real axis.]

8.7. Polar form of a complex number. We can represent a complex number by its angle and distance from the origin. Consider a complex number z1 = x1 + iy1. Now consider the angle θ1 which the ray from the origin to the point z1 makes with the x axis. Let the modulus of z1 be denoted by r1. Then cos θ1 = x1/r1 and sin θ1 = y1/r1. This then implies that

  z1 = x1 + iy1 = r1 cos θ1 + i r1 sin θ1 = r1(cos θ1 + i sin θ1).  (100)

Figure 2 shows how a complex number is represented in polar coordinates. [Figure 2: Graphical Representation of a Complex Number, z1 = r1(cos θ1 + i sin θ1).]

8.8. Complex exponentials. The exponential e^x is a real number. We want to define e^z when z is a complex number in such a way that the principal properties of the real exponential function will be preserved. The main properties of e^x for x real are the law of exponents, e^{x1} e^{x2} = e^{x1 + x2}, and the equation e⁰ = 1. If we want the law of exponents to hold for complex numbers, then it must be that

  e^z = e^{x + iy} = e^x e^{iy}.  (101)

We already know the meaning of e^x. We therefore need to define what we mean by e^{iy}. Specifically, we define e^{iy} in equation 102.

Definition 4.

  e^{iy} = cos y + i sin y.  (102)

With this in mind we can then define e^z = e^{x+iy} as follows.

Definition 5.

  e^z = e^x e^{iy} = e^x (cos y + i sin y).  (103)

Obviously, if x = 0, so that z is a pure imaginary number, this yields

  e^{iy} = cos y + i sin y.  (104)

It is easy to show that e⁰ = 1. If z is real, then y = 0, and equation 103 becomes

  e^z = e^x e^{i0} = e^x (cos 0 + i sin 0) = e^x.  (105)

So e⁰ obviously is equal to 1. To show that e^{z1} e^{z2} = e^{z1 + z2}, we will need to remember some trigonometric formulas.

Theorem 13.

  sin(φ + θ) = sin φ cos θ + cos φ sin θ,  (106a)
  sin(φ − θ) = sin φ cos θ − cos φ sin θ,  (106b)
  cos(φ + θ) = cos φ cos θ − sin φ sin θ,  (106c)
  cos(φ − θ) = cos φ cos θ + sin φ sin θ.  (106d)

[Intervening pages are not reproduced here; the excerpt resumes mid-proof of the result that a real symmetric matrix S has real characteristic roots. Write Sz = λz with z = x + iy and λ = λ1 + iλ2, and premultiply by the conjugate z̄′ = (x − iy)′ to obtain equation 115.]

  (x − iy)′S(x + iy) = (λ1 + iλ2)(x − iy)′(x + iy).  (115)

We can simplify 115 as follows:

  x′Sx + i x′Sy − i y′Sx + y′Sy = (λ1 + iλ2)(x′x + y′y).

Because x′Sy is a scalar and S is symmetric, x′Sy = y′Sx, so the imaginary terms on the left cancel:

  x′Sx + y′Sy = (λ1 + iλ2)(x′x + y′y).  (116)

Further note that

  x′x + y′y > 0  (117)

is a real number, since z ≠ 0. The left-hand side of 116 is a real number. Given that x′x + y′y is positive and real, this implies that λ2 = 0. This then implies that λ = λ1 + iλ2 = λ1 and is real. ∎

Theorem 16. If S is a symmetric matrix whose elements are real, corresponding to any characteristic root there exist characteristic vectors that are real.

Proof. The matrix S is symmetric, so λ = λ1 is real by the previous theorem, and we can write equation 114 as follows:

  S(x + iy) = λ1(x + iy)  ⇒  Sx = λ1x and Sy = λ1y.  (118)

Thus z = x + iy will be a characteristic vector of S corresponding to λ = λ1 as long as x and y satisfy

  Sx = λ1x,  Sy = λ1y,  (119)

and both are not zero (so that z ≠ 0). Then construct a real characteristic vector by choosing x ≠ 0 such that Sx = λ1x and y = 0. ∎
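The result above, that a real symmetric matrix has real characteristic roots, can be checked numerically on an arbitrary symmetrized example matrix.

```python
import numpy as np

# Check that a real symmetric matrix has real characteristic roots;
# the example matrix is arbitrary, symmetrized by construction.
rng = np.random.default_rng(4)
B = rng.normal(size=(5, 5))
S = B + B.T                         # symmetric by construction

lams = np.linalg.eigvals(S)
print(np.allclose(lams.imag, 0.0))  # True
```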
symmetric and real Therefore xjAzi ziA zj IllA11 The left hand side of 122 is therfore zero This then implies 0 A7A39 zjzi Z J 123 1111 0 Aifhj 93 An Aside on the Row Space Column Space and the Null Space of a Square Matrix 931 Definition of R The space R consists of all column vectors with n components The com ponents are real numbers 932 Sahspaces A subspace of a vector space is a set of vectors including 0 that satisfies two requirements If a and b are vectors in the subspace and c is any scalar then 1 a b is in the subspace 2 ca is in the subspace 933 Sahspaces and linear combinations A subspace containing the vectors a and b must contain all linear combinations of a and b CHARACTERISTIC ROOTS AND VECTORS 25 934 Column space ofa matrix The column space of the mxn matrix A consists of all linear combi nations of the columns of A The combinations can be written as Ax The column space of A is a subspace of Rm because each of the columns of A has In rows The system of equations Ax b is solvable if and only if b is in the column space of A What this means is that the system is solvable if there is some way to write b as a linear combination of the columns of A 935 Nallspace ofa matrix The nullspace of an mgtltn matrix A consists of all solutions to Ax 0 These vectors x are in R The elements of x are the multipliers of the columns of A whose weighted sum gives the zero vector The nullspace containing the solutions x is denoted by NA The null space is a subspace of R while the column space is a subspace of Rm For many matrices the only solution to Ax 0 is x 0 If n gt m the nullspace will always contain vectors other than x 0 936 Basis vectors A set of vectors in a vector space is a basis for that vector space if any vector in the vector space can be written as a linear combination of them The minimum number of vectors needed to form a basis for Rk is k For example in R2 we need two vectors of length two to form a basis One vector would only allow for other points in 
One vector would only allow for other points in R² that lie along the line through that vector.

9.3.7. Linear dependence. A set of vectors is linearly dependent if any one of the vectors in the set can be written as a linear combination of the others. The largest number of linearly independent vectors we can have in R^k is k.

9.3.8. Linear independence. A set of vectors a1, …, ak is linearly independent if and only if the only solution to

  α1a1 + α2a2 + ⋯ + αkak = 0  (124)

is

  α1 = α2 = ⋯ = αk = 0.  (125)

We can also write this in matrix form: the columns of the matrix A are independent if the only solution to the equation Ax = 0 is x = 0.

9.3.9. Spanning vectors. The set of all linear combinations of a set of vectors is the vector space spanned by those vectors. By this we mean that all vectors in this space can be written as a linear combination of this particular set of vectors.

9.3.10. Rowspace of a matrix. The rowspace of the m×n matrix A consists of all linear combinations of the rows of A. The combinations can be written as x′A or A′x, depending on whether one considers the resulting vectors to be rows or columns. The row space of A is a subspace of Rⁿ because each of the rows of A has n columns. The row space of a matrix is the subspace of Rⁿ spanned by the rows.

9.3.11. Linear independence and the basis for a vector space. A basis for a vector space with k dimensions is any set of k linearly independent vectors in that space.

9.3.12. Formal relationship between a vector space and its basis vectors. A basis for a vector space is a sequence of vectors that has two properties simultaneously: (1) the vectors are linearly independent; (2) the vectors span the space. There will be one and only one way to write any vector in a given vector space as a linear combination of a set of basis vectors. There are infinitely many bases for a given space, but only one way to write any given vector as a linear combination of a particular basis.

9.3.13. Bases and invertible matrices. The vectors e1, e2, …, en are a basis for Rⁿ exactly when they are the columns of an n×n invertible matrix.
Therefore Rⁿ has infinitely many bases, one associated with every different invertible matrix.

9.3.14. Pivots and bases. When reducing an m×n matrix A to row-echelon form, the pivot columns form a basis for the column space of the matrix A, and the pivot rows form a basis for the row space of the matrix A.

9.3.15. Dimension of a vector space. The dimension of a vector space is the number of vectors in every basis. For example, the dimension of R² is 2, while the dimension of the vector space consisting of points on a particular plane in R³ is also 2.

9.3.16. Dimension of a subspace of a vector space. The dimension of a subspace Sn of an n-dimensional vector space Vn is the maximum number of linearly independent vectors in the subspace.

9.3.17. Rank of a matrix.
1. The number of nonzero rows in the row-echelon form of an m×n matrix A produced by elementary operations on A is called the rank of A.
2. The rank of an m×n matrix A is the number of pivot columns in the row-echelon form of A.
3. The column rank of an m×n matrix A is the maximum number of linearly independent columns in A.
4. The row rank of an m×n matrix A is the maximum number of linearly independent rows in A.
5. The column rank of an m×n matrix A is equal to the row rank of the m×n matrix A. This common number is called the rank of A.
6. An n×n matrix A with rank n is said to be of full rank.

9.3.18. Dimension and rank. The dimension of the column space of an m×n matrix A equals the rank of A, which also equals the dimension of the row space of A. The number of independent columns of A equals the number of independent rows of A. As stated earlier, the r columns containing pivots in the row-echelon form of the matrix A form a basis for the column space of A.

9.3.19. Vector spaces and matrices. An m×n matrix A with full column rank (i.e., the rank of the matrix is equal to the number of columns) has all the following properties:
1. The n columns are independent.
2. The only solution to Ax = 0 is x = 0.
3. The rank of the matrix = the dimension of the column space = n.
4. The columns are a basis for the column space.

9.3.20. Determinants, minors, and rank.

Theorem 18. The rank of an m×n matrix A is k if and only if every minor in A of order k + 1 vanishes, while there is at least one minor of order k which does not vanish.

Proposition 1. Consider an m×n matrix A.
1. det A = 0 if every minor of order n − 1 vanishes.
2. If every minor of order n equals zero, then the same holds for the minors of higher order.
3. (Restatement of the theorem.) The largest among the orders of the nonzero minors generated by a matrix is the rank of the matrix.

9.3.21. Nullity of an m×n matrix A. The nullspace of an m×n matrix A is made up of vectors in Rⁿ and is a subspace of Rⁿ. The dimension of the nullspace of A is called the nullity of A. It is the maximum number of linearly independent vectors in the nullspace.

9.3.22. Dimension of row space and null space of an m×n matrix A. Consider an m×n matrix A. The dimension of the nullspace (the maximum number of linearly independent vectors in the nullspace) plus the rank of A is equal to n. Specifically:

Theorem 19.

  rank(A) + nullity(A) = n.  (126)

9.4. Gram-Schmidt orthogonalization and orthonormal vectors.

Theorem 20. Let ei, i = 1, 2, …, n, be a set of n linearly independent n-element vectors. They can be transformed into a set of orthonormal vectors.

Proof. Transform them into an orthogonal set and divide each vector by the square root of its length. Start by defining the orthogonal vectors as

  y1 = e1,
  y2 = a12 e1 + e2,
  y3 = a13 e1 + a23 e2 + e3,  (127)
  ⋮
  yn = a1n e1 + a2n e2 + ⋯ + a_{n−1,n} e_{n−1} + en.

The aij must be chosen in a way such that the y vectors are orthogonal:

  yi′yj = 0,  i = 1, 2, …, j − 1.  (128)

Note that since yi depends only on the ei, the following condition is equivalent:

  ei′yj = 0,  i = 1, 2, …, j − 1.  (129)

Now define matrices containing the above information:

  Xj = [ e1  e2  e3  ⋯  e_{j−1} ],  aj = (a1j, a2j, …, a_{j−1,j})′.  (130)

The matrix X2 = [e1], X3 = [e1 e2], X4 = [e1 e2 e3], and so forth. Now rewrite the system of equations defining the aij:

  y1 = e1,
  yj = Xj aj + ej.  (131)
Written out,

  yj = a1j e1 + a2j e2 + ⋯ + a_{j−1,j} e_{j−1} + ej.

We can write the condition that yi and yj be orthogonal for all i and j as follows:

  ei′yj = 0, i = 1, 2, …, j − 1  ⇔  Xj′yj = 0,  (132)

and, substituting from 131,

  Xj′Xj aj + Xj′ej = 0.

[The remainder of the Gram-Schmidt construction, and the start of the diagonalization argument for symmetric matrices, appear on pages not reproduced here. The excerpt resumes with Q1 = [q1 u2 ⋯ un] an orthogonal matrix whose first column q1 is a characteristic vector of the symmetric matrix A with root λj.]

  AQ1 = A[ q1  u2  u3  ⋯  un ] = [ Aq1  Au2  Au3  ⋯  Aun ] = [ λj q1  Au2  ⋯  Aun ].  (138)

Now premultiply AQ1 by Q1′:

  Q1′AQ1 = [ q1′Aq1  q1′Au2  ⋯  q1′Aun ; u2′Aq1  u2′Au2  ⋯  u2′Aun ; ⋮ ; un′Aq1  un′Au2  ⋯  un′Aun ].  (139)

The (1,1) element is q1′Aq1 = λj q1′q1 = λj, and the last n−1 elements of the first column are zero by the orthogonality of q1 and the vectors ui (since Aq1 = λj q1). Similarly, because A is symmetric and q1′Aui is a scalar, the first row is zero except for the first element, because

  q1′Aui = ui′A′q1 = ui′Aq1 = λj ui′q1 = 0.  (140)

Now rewrite Q1′AQ1 as

  Q1′AQ1 = [ λj  0′ ; 0  α1 ] = A1,  (141)

where α1 is an (n−1)×(n−1) symmetric matrix. Because Q1 is an orthogonal matrix, its inverse is equal to its transpose, so that A and A1 are similar matrices (see Theorem 3) and have the same characteristic roots. With the same characteristic roots, one of which is here denoted by λj, we obtain

  |λIn − A| = |λIn − A1| = 0.  (142)

Now consider cases where kj ≥ 2, that is, the root λj is not distinct. Write the equation |λIn − A1| = 0 in the following form:

  |λIn − A1| = det[ λ−λj  0′ ; 0  λI_{n−1} − α1 ] = 0.  (143)

We can compute this determinant by a cofactor expansion of the first row. We obtain

  |λIn − A1| = (λ − λj) |λI_{n−1} − α1| = 0.  (144)

Because the multiplicity of λj is ≥ 2, at least two roots of A equal λj, and the factor (λ − λj) in 144 accounts for only one of them; λj must therefore also be a root of the second factor, i.e.

  |λj I_{n−1} − α1| = 0.  (145)

If this determinant is zero, then all minors of order n−1 of the matrix

  λj In − A1 = [ 0  0′ ; 0  λj I_{n−1} − α1 ]  (146)

will vanish. This means that the nullity of λj In − A1 is at least two. This is true because the nullity of the submatrix λj I_{n−1} − α1 is at least one, and it is a submatrix of λj In − A1, which has a row and column of zeroes and thus already has a nullity of one. Because rank(λj In − A) = rank(λj In − A1), the nullity of λj In − A is ≥ 2.
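The reduction step above can be sketched numerically: take a unit characteristic vector q1 of a symmetric example matrix, complete it to an orthonormal basis (here via QR, an implementation convenience), and check the block form of equation 141.

```python
import numpy as np

# Numerical sketch: Q1 = [q1 u2 u3] orthogonal with q1 a characteristic
# vector of symmetric A, so Q1' A Q1 has lambda_j in the (1,1) position
# and zeros in the rest of the first row and column.
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])
lams, V = np.linalg.eigh(A)
q1, lam = V[:, 0], lams[0]

rng = np.random.default_rng(5)
M = np.column_stack([q1, rng.normal(size=(3, 2))])  # full rank with prob. 1
Q1, _ = np.linalg.qr(M)                             # first column is +/- q1

A1 = Q1.T @ A @ Q1
print(np.isclose(A1[0, 0], lam))    # True: (1,1) entry is lambda_j
print(np.allclose(A1[0, 1:], 0.0))  # True: rest of first row is zero
print(np.allclose(A1[1:, 0], 0.0))  # True: rest of first column is zero
```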
another characteristic vector q2 that is linearly independent of q1 and is also orthogonal to ql If the multiplicity of Aj 2 we are finished Otherwise define the matrix Q2 as ML 7 A1 146 Q2 L11 q2 u3 un 147 Now let A2 be defined as Aj 0 0 0 0 0 0 Q2 AQ2 0 uSAu2 uSAu3 ug Au um Au2 um Aug I I um 148 Aj 0 0 0 Aj 0 0 0 12 where 12 in an n2xn2 symmetric matrix Because Q2 is an orthogonal matrix its inverse is equal to its transpose so that A and A2 are similar matrices and have the same characteristic roots that is lAIn7AllAIn7A2l0 149 Now consider cases where In 2 2 that is the root Aj is not distinctand has multiplicity Z 3 Write the equation l ALL 7 A2 0 l in the following form A 0 0 M 0 0 lAIn7A2l 0 A 0 7 0 A 0 0 0 AIn2 0 0 12 150 A 7 Aj 0 0 0 A 7 Aj 0 0 0 AIn2 7 12 If we expand this by the first row and then again by the first row we obtain M1 7 Al 7 A7Aj2 lAIn2 7112 7 0 151 32 CHARACTERISTIC ROOTS AND VECTORS Because the multiplicity of M Z 3 at least 3 elements of 1 will be equal to M If a characteristic root that solves equation 151 is not equal to M then 1 ALF 7 a 1 must equal zero ieif 1 y Aj lAjIn2 7 12 0 152 If this determinant is zero then all minors of the matrix 0 0 0 M1 7 A2 7 0 0 0 153 of order n2 will vanish This means that the rank of matrix is less than n2 and so its nullity is greater than or equal to 3 This means that we can find another characteristic vector q3 that is linearly independent of L11 and q2 and is also orthogonal to ql If the multiplicity of M 3 we are finished Otherwise proceed as before Continuing in this way we can find kj orthonormal vectors Now we need to show that we cannot choose more than kj such vectors After choosing if we would have A 7 M 1k 0 lAIn 7 Al 7 lAIn 7 A16 1 7 0 ALP 7 0 154 If we expand this we obtain Mn 7 Al 7 A 7 Am Mack 7 Oak 1 7 0 155 It is evident that ALL 7 Al 7 A 7 Am Mack 7 Oak 1 7 0 156 implies M17146 7 0le y 0 157 If this determinant were zero the multiplicity of M would exceed kj Thus rank VI 7 A n 7 kj 
158 159 nullity MI 7 A kj 160 And this implies that the vectors we have chosen form a basis for the null space of M A and any additional vectors would be linearly dependent D Corollary 1 Let A be a matrix as in the above theorem The multiplicity of the root M is equal to the nullity of MI A Theorem 22 Let S be a symmetric matrix of order n Then the characteristic vectors ofS can be chosen to he an orthonormal set ie there exists a Q such that Q SQ A where Q is orthonormal 161 CHARACTERISTIC ROOTS AND VECTORS 33 Proof Let the distinct characteristic roots of S be Ajj 1 2 s g n The multiplicity of M is given by kj and 21 kj n 162 By the corollary above the multiplicity of A1 kj is equal to the nullity of M S By theorem 21 there exist kj orthonormal characteristic vectors corresponding to M By theorem 5 the characteris tic vectors corresponding to distinct roots are linearly independent Hence the matrix Q 41 42 q 163 where the first In columns are the characteristic vectors corresponding to A1 the next k2 columns correspond to A2 and so on is an orthogonal matrix Now define the following matrix A A dianglIkl A211 A100000000 0 A1 0 0 0 0 0 0 0 7 0 0 0 A1 0 0 0 0 0 164 0 0 0 0 A2 0 0 0 0 0 0 0 0 0 A2 0 0 0 0 0 0 0 0 0 0 0 A2 Notethat SQ QA 165 because A is the matrix of characteristic roots associated with the characteristic vectors Q Now premultiply both sides of 165 by Q 1 Q to obtain Q SQ A 166 D Consider the implication of theorem 22 Given a matrix 2 we can convert it to a diagonal matrix by premultiplying and postmultiplying by an orthonormal matrix Q The matrix 2 is U11 U12 017 U21 U22 I I I U27 2 167 an as 39 a And the matrix product is 311 0 0 0 322 0 Q EQ 168 34 CHARACTERISTIC ROOTS AND VECTORS 10 lDEMPOTENT MATRICES AND CHARACTERISTIC ROOTS De nition 6 A matrix A is idempotent if it is square and AA A Theorem 23 Let A be a square idempotent matrix of order n Then its characteristic roots are either zero or one Proof Consider the equation defining 
characteristic roots and multiply it by A as follows AI AI AAI AAz A21 169 gt A1 A21 because A is idempotent Now multiply both sides of equation 169 by 1 AI A21 1 AI A21 z 1 IA A21 1 170 A 3 A 1 or A 0 D Theorem 24 Let A be a square symmetric idempotent matrix of order n anal rank r Then the trace ofA is equal to the rank ofA ie trA rA To prove this theorem we need a lemma Lemma 1 The rank ofa symmetric matrix S is equal to the number ofnonezero characteristic roots Proof of lemma 1 Because S is symmetric it can be diagonalized using an orthogonal matrix Q ie Q SQ A 171 Because Q is orthogonal it is nonsingular We can rearrange equation 171 to obtain the follow ing expression for S using the fact that Q 1 Q and Q T1 Q Q SQ A QQ SQ QA premultiply by Q Q 1 172 QQ SQQ QAQ postmultiply by Q Q l 5QAQl QQ I Given that S QAQ the rank of S is equal to the rank of QAQ The multiplication of a matrix by a nonsingular matrix does not affect its rank so that the rank of S is equal to the rank of A But the only nonzero elements in the diagonal matrix A are the nonzero characteristic roots Thus the rank of the matrix A is equal to the number of nonzero characteristic roots This result actually holds for all diagonalizable matrices not just those that are symmetric D CHARACTERISTIC ROOTS AND VECTORS 35 Proof oftheorem 24 Consider now an idempotent matrix A Use the orthogonal transformation to diagonalize it as follows A1 0 0 0 0 A2 0 0 Q AQA 173 0 0 0 An All the elements of A will be zero or one because A is idempotent For example the matrix might look like this 10000 01000 Q AQA 00100 174 00000 00000 Now the number of nonzero roots is just the sum of the ones Furthermore the sum of the characteristic roots of a matrix is equal to the trace Thus the trace of A is equal to its rank D 11 SIMULTANEOUS DIAGONALIZATION OF MATRICES Theorem 25 Suppose that A anal B are m x m symmetric matrices Then there exists an orthogonal matrix P such that P AP anal P BP are both diagonal ifanal only 
ifA anal B commute that is ifanal only ifAB BA Proof First suppose that such an orthogonal matrix does exist that is there is an orthogonal matrix P such as P AP A1 and P BP A2 where A1 and A2 are diagonal matrices Given that P is orthogonal by Theorem 11 we have P P 1 and P 1 P We can then write A in terms of P and A and B in terms of P and A as follows A1 P AP A1 P 1 P A P 1A1 P 1 A 175 PAlPl A forAand A2 P BP e P 1A2P 1 B 176 PAQPl B for B Because A1 and A2 are diagonal matrices we have A1 A2 write AB as A2 A1 Using this fact we can 36 CHARACTERISTIC ROOTS AND VECTORS AB PAlP PAgPl PAlAgP PAgAlP 177 PAQPPAIP BA and hence A and B do commute Conversely now assuming that AB BA we need to show that such an orthogonal matrix P does exist Let M1 in m be the distinct values of the characteristic roots of A having multiplicities T1 W Th respectively Because A is symmetric there exists an orthogonal matrix Q satisfying Q AQ Ai dia9ltM11T17M21T27 7MhI h M111 0 0 0 0 2179 0 39 39 39 0 178 0 0 MIN 0 0 0 0 mm If the multiplicity of each root is one then we can write M1 0 0 0 0 M2 0 0 Q AQ A1 179 0 0 0 M Performing this same transformation on B and partitioning the resulting matrix in the same way that Q AQ has been partitioned we obtain 011 012 39 39 39 01h 021 022 39 39 39 02h 0 Q BQ 180 Chl Cm Chh where Cij is r x rj For example C11 is square and will have dimension equal to the multiplicity of the first characteristic root of A C12 will have the same number of rows as the multiplicity of the first characteristic root of A and number of columns equal to the multiplicity of the second characteristic root of A Given that AB BA we can show that AlC CA1 To do this we write AlC substitute for A1 and C simplify make the substitution and reconstitute A10 Q AQ Q BQ dEfinition Q AQQBQ TEgTOup Q ABQ QQ I Q BAQ AB BA Q BQQ AQ QQ 1 CA1 definition 181 CHARACTERISTIC ROOTS AND VECTORS 37 Equating the ijth submatrix of MC to the ijth submatrix of CA1 yields the identity M Cij M Cij 
Because M 9 M ifi y j we must have Cij 0 if i y j that is C is not a densely populated matrix but rather is C diagC11 022 Chh 011 0 0 0 0 CEQ 0 0 182 0 0 0 Chh Now because C is symmetric so also is Ci for each i and thus we can find an r X T orthogonal matrix X by theorem 22 satisfying XampA am where A is diagonal Let P QX where X is the block diagonal matrix X diagX17 X2 i i iiXh that is X1 0 0 0 0 amp 00 X 184 0 0 0 Xh Now write out P P simplify and substitute for X as follows FPXQQX XX M 0 00 A 0 00 0 amp 00 0 amp 00 s s s s s s s s s 185 0 0m n 0 0m m diag Xi X17 X X27 i i WXLXh diag I1 In i 71 1m Given that P Pis an identity matrix P is orthogonal Finally the matrix A diagA1 A2 i i i Ah is diagonal and PAP X Q AQX X Q AQX XAlX diagXiX i i i7XLdiagM11T1 21 i i i7MhIThdiag X17X27 i i i 7Xh dia9M1XiX17 M2 X X27 7 MhXhXh dia9M11T17 M21 7M7 Mm Al aw 38 CHARACTERISTIC ROOTS AND VECTORS P BP X Q BQX X AQX diagXi X o XLdiag0117 Cap ChhdiagX1 X27 7 Xh dia9X 011X17X C22X27uwXLCthh 187 dia9A117 Amp Ahh A This then completes the proof CHARACTERISTIC ROOTS AND VECTORS 39 REFERENCES 1 Dhrymes R Mathematics for Econometrics 7 3 Edition New York SpringerVerlag 2000 2 GantInacher ER We Meory of Matrices Vol 1 New York Chelsea Publishing Company 1977 3 Hadley G Linear Algebra Reading Addison Wesley Publishing Company 1961 4 Horn RA and CR Johnson Matrix Analysis Cambridge Cambridge University Press 1985 5 Schott I R Matrix Analysis for Statistics New York Wiley 1997 6 Searle S R Matrix Algebra Useful for Statistics New York Wiley 1982 7 Sydsaeter K Topics in MathematicalAnalysis for Economists London Academic Press 1981 Introduction 671 Helle BUnZeI Fall 2008 Today s program 0 We ll talk about the organization of the course focus of the course 9 I39ll give an introduction to what econometrics is and what will be the Frisch 1933 In oduces Econ metrics There are several aspects of the quantitative approach to economics and no single one of these aspects taken by itself should be 
confounded with econometrics Thus econometrics is by no means the same as economic statistics Nor is it identical with what we call general economic theory although a considerable portion of this theory has a definitely quantitative character Nor should econometrics be taken as synonomous sic with the application of mathematics to economics Experience has shown that each of these three viewpoints that of statistics economic theory and mathematics is a necessary but not by itself a sufficient condition for a real understanding of the quantitative relations in modern economic life It is the unification of all three that is powerful And it is this unification that constitutes econometrics Introduction Notes to Frisch39s introduction in Econometrica 0 Econometrics is a unification of 9 Statistics 3 Mathematics 0 Economic Theory 0 Why is it notjust Statistics 9 Lots of statistics is used but a Statistic is mostly aimed at the natural sciences In general the issues that interest econometricians are different Such as a Serial correlation 0 Seriously flawed data in ways that are different than those in other sciences 0 Lack of ability to do controlled experiments Introduction Notes to Frisch39s introduction in Econometrica 0 Why is it notjust Mathematics 3 Again mathematics is heavily used especially in theoretical econometrics but we use it for specific purposes Those purposes are defined by the economic theory and the statistics we want to use a For example we39d use math to show certain theoretical properties of an economic theory so we can verify that it is possible to apply certain statistical methods 0 Why is it notjust Economic Theory 0 Econometrics is build on economic theory We always start of with an economic model but this does not tell us how to put it to the data All economic theory models have to be converted to econometric models 9 Example Labor supply is continuous in models but in the data it is usually a 01 choice Hence the invention of methods like 
Limited Dependent Variables II o In general the situation facing an econometrician is one where we have a specific economic model Features of economics models a Very precisely defined relationships between specific variables 9 Typically nothing is stochastic in the model a The model does not take into account various random acts of life Examples are a Snowstorm slows down production 0 A flu epidemic reduces the available work force o The manager breaks his leg 0 This all implies that when we take the very precise relationship of the theoretical model to the data we need to add something stochastic to all account for the factors the model couldn t possibly take into account a This does not mean that it is OK to leave out input prices Ii Introduction 0 Introducing a stochastic element changes the model from something that makes exact predictions The firm will definitely produce 2 billion bags of taco chips at a price of 50 cents per bag to one that makes probabilistic predictions There is a good chance that the firm will produce an average of 2 billion bags of taco chips at an average price of 50 cents per bag 0 What is the difference a If the firm produces 2 billion and 1 bags of taco chips the deterministic model is wrong 0 This however would seem pretty close by the standards of the stochastic model 0 Deterministic models require only that you nd 1 counter example to reject them whereas to reject a stochastic model you need to see that model does not describe facts most of the time II o A major difference between econometrics and statistics performed in other natural sciences is that we are not able to perform controlled experiments a What is a controlled experiment in the context of economics 0 As such we are left with whatever data we can observe and must try to get as much information from it as we can Here theory is extremely important The economic theory tells us how to organize the data and what to look for Example 9 Productivity we can get from a simple 
production function Solow residual 0 Suppose you want the question How do interest rates affect the economy answered How to even measure the interest rate There are so many out there a For examining personal saving the model tells us that what matters is how much the individual gets in return a In a model of the behavior of banks it would be the interbank fund rate a In a model of housing demand it would be the mortgage rates 0 So economic theory tells us which data to use to answer which questions o It tells us which variables are the eFFect and which are the cause something we still have a hard time making out just from the data o The examine the model and even the data we do have may be imprecisely problem is that we very rarely have exactly the data we need to measured 0 Some frequently encountered data problems The data may be badly measured or vaguely defined The interest rate Some variables cannot be measured Expectations effort are examples We may not have an explicit functional form defining the relationship between the variables The the possibilities are endless The assumptions made on the stochastic properties of the model may not be met by the data in which case the methods of estimation and inference may be wrong The economic model may not include all relevant variables Introduction 0 These are the issues of econometrics in general 50 what is the difference between applied and theoretical econometrics Roughly 9 Theoretical econometricians develop methods 0 Applied econometricians use the methods 0 The lines are very blurred though People who do theory must analyze what happens if methods are incorrectly applied They must know what the issues facing applied econometricians are People who do applied work will frequently run into specific problems that have not been dealt with in that form before and therefore be forced to modify existing methods or even develop new ones 0 Conclusion One cannot do good theory without an extensive knowledge of the 
applications One cannot do applied work without having a firm grasp of the theory RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 7 for all i 1 2 In Here p 9r is the probability that the experiment results in avalue for the function f of the initial random variable of 9139 Using the definition of expected value in equation we obtain E9Xl yipgii 18 Now substitute in to obtain E19001 9i 1090 z 2 9139 2 101139 i1 V17 3 917gz 19 Z Z 9139 101139 11 V1 3 917g 2 911 1011 j1 D 19 Properties of mathematical expectation 191 Constants Theorem 2 Let X be a discrete random variable with probability function p06 and C be a constant Then 135 C Proof Consider the function gX c Then by theorem 1 E161 E 2 6101 6 2101 20 ac But by property 14b we have 2 101 1 ac and hence E c c 1 CI 21 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 13 FIGURE 1 Frequency Function for Tossing a Die le 251 Alternative de nition of continuous random variable In section 232 we defined a random vari able to be continuous if FX is a continuous function of x We also say that a random variable X is continuous if there exists a function f such that Fxltzgt f M du lt46 for every real number x The integral in equation 46 is a Riemann integral evaluated from 00 to a real number x 252 Definition of u probability density frequency function pdf The probability density function fx 1 of a continuous random variable X is the function that satisfies FX fX du 47 253 Properties of continuous density functions Z 0 VI 48a dz 17 48b Analogous to equation 42 we can write in the continuous case PX6A A dz 49 14 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS where the integral is interpreted in the sense of Lebesgue Theorem 6 For a densityfunction x de ned over the set of ull real numbers thefollowing holds 12 Pa g X g b fX dz 50 for any real constunts u and 7 with u 3 7 a Also note that for a continuous random variable X the following are equivalent Pa X bPa XltbPaltX bPaltXltb 51 Note that we can obtain the various 
probabilities by integrating the area under the density func tion as seen in figure 2 FIGURE 2 Area under the Density Function as Probability fX 254 Example 1 ofu continuous densityfunction Consider the following function 16 6 3 0T1 fltzgt 0 f gt 0 52 elsewhETe First we must find the value of k that makes this a valid density function Given the condition in equation 48b we must have that L fzdzAmkeismdzl 53 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS FIGURE 3 Graph of Density Function 1 e fx This is represented by the area between the lines in figure 4 We can also find the distribution function in this case Fz te tdt 61 Make the u dv substitution as before to obtain Fz 7te t 3 7 01 7e tdt ite tig eitig 7e lt717tgt3 e 71 7 z 7 670717 0 e 71 7 1 1 17 1 z The distribution function is shown in figure 5 Now consider the probability that 1 g X g 2 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 17 FIGURE 4 P1 g X g 2 fx 03 025 02 015 01 005 P1 g X g 2 F2 7 F1 17 e21 271 e 1l 1 7261 7 362 63 7073575 7 0406 032975 We can see this as the difference in the values of Fx at 1 and at 2 in figure 6 256 Example 3 ofu continuous densityfunction Consider the normal density function given by 71 67 2 64 1 z z 7 a 7 f M r W 02 where M and a are parameters of the function The shape and location of the density function depends on the parameters M and a In figure 7 the diagram the density is drawn for M 0 and a 1 and a 2 257 Example 4 ofu continuous densityfunction Consider a random variable with density function given by 1 WW 0 S I S 1 65 0 otherwise 18 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS FIGURE 5 Graph of Distribution Function of Density Function 1 e fx 1 08 06 04 02 X 1 2 3 4 5 6 7 where p is greater than 1 For example if p 0 then fx 1 if p 1 then fx 2x and so on The density function with p 2 is shown in figure 8 The distribution function with p 2 is shown in figure 9 26 Expected value 261 Expectation of a single random variable Let X be a random variable with density fx 
The expected value of the random variable denoted EX is defined to be 66 Ex 6 X 1px X is discrete provided the sum or integral is defined The expected value is kind of a weighted average It is also sometimes referred to as the population mean of the random variable and denoted MX EX I fI dI if X is continuous 262 Expectation of afanction of a single random variable Let X be a random variable with density fX The expected value of a function g of the random variable denoted EgX is defined to be EltgltXgtgt gltzgt f was lt67 if the integral is defined The expectation of a random variable can also be defined using the RiemannStieltjes integral where F is a monotonically increasing function of X Specifically RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 19 FIGURE 6 P1 g X g 2 using the Distribution Function W 08 06 04 02 EX Mpg zdF 68 27 Properties of expectation 271 Constants EM 3 afzdz Ea 69 a 272 Constants multiplied by a random variable Eaq 3 am E a Do I 70 EaEX 20 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS FIGURE 7 Normal Density Function fx 4 2 2 4 273 Constants multiplied by ufunction ofu random variable ElagXl 2 a gltzgt fltzgtdz E a 00 91 E a El9Xl 71 274 Sums of expected values Let X be a continuous random variable with density function fx and let gl X7 92 X7 gg X7 gk X be k functions of X Also let Cl 52 53 ck be k constants Then E 0191X 0292X 611c gkX E E 0191X E 0292X E CkgkX 72 28 Example 1 Consider the density function f1plzp 0 g z s 1 73 0 otherwise where p is greater than 1 We can compute the EX as follows RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 21 FIGURE 8 Density Function 17 l I fx 3 25 2 15 1 05 X 02 04 06 08 1 EX 1 zp 01de 0 1 Ip1p 1dz 74 0 Ip2P1 1 0 2 0 7 10 1 7 10 2 29 Example 2 Consider the exponential distribution which has density function 1 fzXeA0 z oogt0 75 We can compute the EX as follows 22 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS FIGURE 9 Density Function p 1 I Fx 1 08 06 04 02 X 02 04 06 08 1 EX f0 1 e A dz 71677 18 f0 
erz u 717dudzv7Ae dverzgt 0 0 am 76 210 Variance 2101 De nition of variance The variance of a single random variable X with mean M is given by VmX E a2 E E X 7 E X2 7 2 E X 7 M 77 2 z 7 M2fIdI We can write this in a different fashion by expanding the last term in equation 77 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 25 ML EXT 71735ng when X is continuous The rth moment about the origin is only defined if EXT exists A moment about the origin is sometimes called a raw moment Note that Ml EX MX the mean of the distribution of X or simply the mean of X The rth moment is sometimes written as function of 6 where 6 is a vector of parameters that characterize the distribution of X 85 312 Central moments The rth moment about the mean of a random variable X denoted by MT is the expected value of X 7 MXV symbolically M 7 EltX 7 Mm 7 Z x 7 warms 86 for r 0 1 2 when X is discrete and Mr 7 ElX 7 MXVl 7 z 7 WWW dz when X is continuous The rth moment about the mean is only defined if EX 7 MXV exists The rth moment about the mean of a random variable X is sometimes called the rth central moment of X The rth central moment of X about a is defined as EX7aT If a MX we have the rth central moment of X about MX Note that M1 EX 7 MX 0 and M2 EX 7 MX2 VarX Also note that all odd moments of X around its mean are zero for symmetrical distributions provided such moments exist lt87 313 Alternative formula for the variance Theorem 7 r 7 M2 7 r 88 Proof VmX E a E E X 7 Eco EEX Mx EEX2 7 ZMXX M34 E X2 7 ZMXE X M2 89 ElX2l 7 2 r 7E X2 7 r 7 M2 7 M 32 Moment generating functions RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 43 REFERENCES 1 AInerniya T Advanced Econometrics Cambridge Harvard University Press 1985 2 Bickel RI and KA Doksum Mathematical Statistics Basic Ideas and Selected Topics Vol 1 2 01 Edition Upper Saddle River N Prentice Hall 2001 3 Billingsley R Probability and Measure 3rd edition New York Wiley 1995 4 Casella G And KL Berger Statistical Inference Paci c Grove CA Duxburv 
2002 5 Crarner H MathematicalMethods of Statistics Princeton Princeton University Press 1946 6 Goldberger AS Econometric Theory New York Wiley 1964 7 Lindgren BW Statistical Theory 4th edition Boca Raton FL Chapman amp HallCRC 1993 8 Rao CR Linear Statistical Inference and its Applications 2nd edition New York Wiley 1973 9 TheiL H Principles of Econometrics New York Wiley 1971 Mincer Model Helle BUnZeI Angust 26 2008 It is important to realize that there is a long way from economic model to econometric model Mincer 1974 Theory of investment in human capital used to examine income distribution Relationship between schooling earnings and post school investments in human capital First we ll consider only schooling Later we ll add on postschool Investment The Mincer Model Assumptions and definitions 0 Assumptions of the economic model 0 An individual with S years of schooling has earnings which do not depend on age A 40 yr old new graduate will get the same as a 18 yr old new graduate 9 PV of lifetime incomes are the same across individual regardless of schooling if no post school investments are ma e a The number of years spent at work are independent of the number of years of schooling 0 Definition ES t is the earnings at time t of a person with 5 years of schooling The Mincer Model 9 PV of earnings of an individual who enters the labor market after 5 years of schooling Vs 5R 55 te tdt 0 Under assumption 1 ES t does not depend on t and we can therefore write just E S and take it outside the integral VS 5R ESe dt EsR 87nd 55 le 7 e rsl 55 airs 8 h The Mincer Model Assumption 2 however states that V 5 should not depend on S and therefore VS V Assumption 3 states that for some T R S T such that everyone works the same number of years T is then the number of years in the work force Using these two pieces of information Vs v 55 eirs 7 eirseirT r ltgt rV ES e45 7 e rse rT But again since this number should be the same regardless of the years of schooling The Mincer Model 
rV 50 175quot 5ser57er5er7 i 5se 517e e 50 5ser5ltgt In 55 In 50 r5 0 The assumptions that we have made lead to a loglinear relationship between earnings and schooling The Mincer Model 0 Recall that we wanted to look at human capital investments and the effect on income distribution Note that we haven t concerned ourselves with postschool investment in human capital as yet Loglinear Relationship Earnings Schooling Il The Mincer Model 0 Note that even if the years of schooling is distributed symmetricallyevenlyuniformly the distribution of income is going to be very skewed 0 To elaborate Assume uniform dist of schooling Assume each line is 10000 Do one person on each level of schooling 35 people in all Income 0 10000 10 20000 20 30000 30 40000 40 50000 50 60000 Frequency 15 8 4 3 2 2 The Mincer Model a VERY heavy at the bottom If our assumptions are relatively OK even uniform human capital investments can lead to a skewed income distribution 0 While this is a linear equation we were interested in the relationship between experience schooling and earnings The Mincer Model Post school investment This was all without post school investment Make another assumption 4 The return to postschool investment is a constant p 0 Well assume that a worker devotes a fraction k of his time to investment in human capital and a fraction 17 k to actual work 0 This all implies that growth in earnings is determined by 355 t 7 k t E S t at p lt gt lt y gt 0 Do numerical example k l All the time in school p 01 10 return the earnings go up by 10 The Mincer Model 0 Solving this differential equation gives us 1 355 1 ES t at 7 Pkm a BIniiSj PW a 3nl5t Pkt nESt pm El 5 3 RCquot The Mincer Model 0 For the speci c solution we know that In as t CpAtkudu 0 We need one point to nd the value of C but we know from before that when there is no postschool investment in human capital In ES In E0 r o If we insert ku 0 into the differential equation we get In ES t C implying that C In E0 r5 0 This gives 
the solution lnESt InE0r5p0tkudu The Mincer Model 0 Why do we start at 07 a The growth does not start till after school is over 0 Finally we need to assume something about the frequency of investment in human capital Mincer assumed that a Note that this definition is consistent with 1 running from the time we finish schoo The Mincer Model oThen nESt nE0rSptkudu U 2 t nE0rSpk ui h t2 lnE0rSpkt7pk 0 This gives a relationship between potential earnings and schooling El 5 5 RCquot The Mincer Model 0 Now note that since we only work part of the time YSt liktEStltgt In YSt nEStnlikt 0 This is where the economic theory is done After this we need to make the model linear so that we can do econometrics 0 We still have this investment term in there and it is not quite linear n1ikt In yew L 1 punom uogsuedxe JOB o IQPOW 493 V 9LLL The Mincer Model 0 Note The error is highest when ikquot 15 is big t 0 O In total we would get InYst nEStn1ikt t2 1 2 IE 5 k7kiik77 k n rPtP2T 2 Jr o iltkgt2tltgt12 InE0 ikt wV r5k7f if pkt gt22 t2 m 1 The Mincer Model 0 This can be written as a regression model In Y30315l32t33t2 o What do we get from this We get an explanation for how earnings depend on schooling and experience 0 Can we sign any of the coefficients The Mincer Model Conclusion 0 Strategy 0 Use the economic model a Make assumptions that simplify and eventually lead us to something we can put to the data 0 In spite of many omitted issues and simplifying assumption this model does explain a lot of the variation in earnings 0 General discussion If we had just done the linear regression to start with Why isn t that a great idea a Assumptions Simplifications 3 Which assumptions would you relax o What information doesjus t running the regression give you a Simple correlation 3 Were you to create a model which features need to be included II Illustration of CLT Poisson Sampling Justin L Tobias1 1Iowa State University Department of Economics Septem ber 23 2007 c As argued in class a central 
limit theorem is a powerful tool for approximating sampling distributions in finite samples. In some cases the finite-sample behavior of an estimator is difficult to ascertain, or the model is not rich enough to even allow for that possibility, and thus a large-sample approximation can be used for testing and inference purposes. Here we provide a simple example of the CLT approximation in a case where the finite-sample sampling distribution can be obtained analytically, and we then compare the exact results to those based on the CLT.

Consider the random variable

Y_T = X_1 + X_2 + ... + X_T,

where X_t are iid Poisson(lambda), t = 1, 2, ..., T. That is,

Pr(X_t = x_t) = lambda^{x_t} exp(-lambda) / x_t!,   x_t = 0, 1, 2, ...,  for all t.

Using the moment generating function approach, as shown in class, we know that Y_T ~ Poisson(T lambda). Thus

Pr(Y_T = c) = (T lambda)^c exp(-T lambda) / c!,   c = 0, 1, 2, ...

Now consider the distribution of the sample average, which we define as

Wbar_T = Y_T / T = (1/T) sum_{t=1}^{T} X_t.

Note that Wbar_T can take values in the set {0, 1/T, 2/T, ...}, so that the support of f_{Wbar_T}(wbar_T) gets finer and finer as T increases.

In addition, note that

Pr(Y_T = c) = Pr(T Wbar_T = c) = Pr(Wbar_T = c/T) = (T lambda)^c exp(-T lambda) / c!,   c = 0, 1, 2, ...

We can compute these probabilities for every c to give the exact sampling distribution of the sample average for various values of T.

Note also that E(X) = Var(X) = lambda (the known moments of the Poisson), so that the Lindeberg-Levy CLT gives

sqrt(T) (Wbar_T - lambda) / sqrt(lambda)  ->_d  N(0, 1).

To compare the normal approximation to the exact finite-sample result, let

z_T = sqrt(T) (wbar_T - lambda) / sqrt(lambda).

We note that

Pr(Y_T = c) = Pr(Wbar_T = c/T) = Pr(Z_T = sqrt(T) (c/T - lambda) / sqrt(lambda)),

which we plot over c in {0, 1, 2, ...}. We then compare this exact finite-sample result with the large-sample standard normal approximation. We set lambda = 1 to fix ideas.

[Figure: exact sampling distribution (bars) versus the standard normal density, T = 1]

Note that when T = 1 and lambda = 1, wbar_1 is in {0, 1, 2, ...}, so z_1 = wbar_1 - 1 is in {-1, 0, 1, 2, ...}, as shown on the figure.

[Figure: exact sampling distribution versus the standard normal density, T = 2]

Note that when T = 2 and lambda = 1, wbar_2 is in {0, 1/2, 1, 3/2, 2, ...}, so that the range of possible values for z_2 = sqrt(2)(wbar_2 - 1) is {-sqrt(2), -1/sqrt(2), 0, 1/sqrt(2), ...}. The support of z_T continues to spread out and fill in as T grows; for T = 4, for example, z_4 = 2(wbar_4 - 1) is in {-2, -3/2, -1, -1/2, 0, 1/2, ...}.

[Figures: exact sampling distribution versus the standard normal density, T = 4 and larger T]

Note that the shape of the bar graph is approaching the shape of the normal density.
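This convergence can be checked numerically. The sketch below is a minimal illustration using only the Python standard library (the helper names `poisson_pmf`, `exact_cdf_at`, and `normal_cdf` are my own, not from the slides): it computes the exact probability Pr(Z_T <= 0) from the Poisson(T*lambda) pmf and compares it with the standard normal value Phi(0) = 0.5 as T grows.

```python
import math

def poisson_pmf(c, mean):
    """Pr(Y = c) for Y ~ Poisson(mean), computed in log space for stability."""
    return math.exp(c * math.log(mean) - mean - math.lgamma(c + 1))

def exact_cdf_at(z, T, lam=1.0):
    """Exact Pr(Z_T <= z), where Z_T = sqrt(T)*(Wbar_T - lam)/sqrt(lam)
    and Y_T = T*Wbar_T ~ Poisson(T*lam).  Z_T <= z is equivalent to
    Y_T <= T*lam + z*sqrt(T*lam), so we sum the Poisson pmf up to that point."""
    c_max = int(math.floor(T * lam + z * math.sqrt(T * lam)))
    return sum(poisson_pmf(c, T * lam) for c in range(c_max + 1))

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    # Exact Pr(Z_T <= 0) versus the CLT approximation Phi(0) = 0.5
    for T in (1, 2, 4, 25, 100):
        exact = exact_cdf_at(0.0, T)
        print(f"T={T:4d}  exact={exact:.4f}  normal approx={normal_cdf(0.0):.4f}")
```

Because the exact distribution is discrete, Pr(Z_T <= 0) includes the point mass at zero and therefore approaches 0.5 from above: with lambda = 1 the exact values fall from about 0.74 at T = 1 toward 0.5 as T increases, mirroring the bar graphs approaching the normal density.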