LINEAR STATISTICAL MODELS (STAT 714)
This 151 page Class Notes was uploaded by Shane Marks on Monday October 26, 2015. The Class Notes belongs to STAT 714 at University of South Carolina - Columbia taught by J. Tebbs in Fall. Since its upload, it has received 8 views. For similar materials see /class/229664/stat-714-university-of-south-carolina-columbia in Statistics at University of South Carolina - Columbia.
Date Created: 10/26/15
STAT 714: LINEAR STATISTICAL MODELS
Fall 2008
Lecture Notes

Joshua M. Tebbs
Department of Statistics
The University of South Carolina

Contents

1 Introduction: Linear Algebra Review and Random Vectors
  1.1 Gauss-Markov (GM) models
  1.2 Linear models that are not GM
  1.3 Linear algebra review
    1.3.1 Basic definitions
    1.3.2 Inverse
    1.3.3 Linear independence and rank
    1.3.4 Orthogonality
    1.3.5 Vector spaces
    1.3.6 Matrix subspaces
    1.3.7 Generalized inverses
    1.3.8 Projection matrices
    1.3.9 Trace and determinant functions
    1.3.10 Eigenvalues and eigenvectors
    1.3.11 Quadratic forms, definiteness, and factorizations
  1.4 Random vectors
    1.4.1 Means and variances
    1.4.2 Linear transformations
    1.4.3 Variance-covariance matrices
    1.4.4 Application: Linear prediction
2 Linear Least Squares Problem
  2.1 Least squares estimation
  2.2 Geometric considerations
  2.3 Reparameterization
  2.4 Gram-Schmidt orthonormalization
3 Estimability and Least Squares Estimators
  3.1 Introduction
  3.2 Estimability
    3.2.1 One-way ANOVA
    3.2.2 Two-way crossed ANOVA with no interaction
    3.2.3 Two-way crossed ANOVA with interaction
  3.3 Reparameterization
  3.4 Unique least squares solutions via linear constraints
4 The Gauss-Markov Model
  4.1 Introduction
  4.2 The Gauss-Markov Theorem
  4.3 Estimation of $\sigma^2$ in the GM model
  4.4 The geometry of linear model misspecification
    4.4.1 Underfitting
    4.4.2 Overfitting
  4.5 The Aitken model and generalized least squares
5 Distributional Theory
  5.1 Introduction
  5.2 Univariate normal distribution
  5.3 Multivariate normal distribution
  5.4 Moment generating functions
  5.5 Properties of the multivariate normal distribution
    5.5.1 Linear transformations
    5.5.2 Less than full rank normal distributions
    5.5.3 Independence results
    5.5.4 Conditional distributions
  5.6 Noncentral $\chi^2$ distribution
  5.7 Noncentral $F$ distribution
  5.8 Distributions of quadratic forms
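  5.9 Independence of quadratic forms
6 Statistical Inference
  6.1 Introduction
  6.2 Estimation
  6.3 Testing models
  6.4 Testing linear parametric functions
  6.5 Testing models or testing linear parametric functions
  6.6 Likelihood ratio tests
    6.6.1 Constrained estimation
    6.6.2 Testing procedure
  6.7 Confidence intervals and multiple comparisons
    6.7.1 Single intervals
    6.7.2 Multiple intervals

1 Introduction: Linear Algebra Review and Random Vectors

Complementary reading from Monahan: Chapter 1 and Appendix A.

INTRODUCTION: This course is about linear models. Linear models are models that are linear in their parameters. The general form of a linear model is given by
\[
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e},
\]
where $\mathbf{y}$ is an $N \times 1$ vector of observed responses, $\mathbf{X}$ is an $N \times p$ (design) matrix of fixed constants, $\mathbf{b}$ is a $p \times 1$ vector of fixed but unknown parameters, and $\mathbf{e}$ is an $N \times 1$ vector of (unobserved) random errors. The model is called a linear model because the mean of the response vector $\mathbf{y}$ is linear in the unknown parameter $\mathbf{b}$.

SCOPE OF APPLICATION: Several models commonly used in statistics are examples of the general linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$. These include simple and multiple linear regression models and analysis of variance (ANOVA) models. Regression models generally refer to those for which $\mathbf{X}$ is full rank, while ANOVA models refer to those for which $\mathbf{X}$ consists of zeros and ones. Other models of this form include analysis of covariance models, some time series models, and others.

Model I: Least squares model: $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$. This model makes no assumptions on $\mathbf{e}$. The parameter space is $\Theta = \{\mathbf{b} : \mathbf{b} \in \mathbb{R}^p\}$.

Model II: Gauss-Markov model: $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $E(\mathbf{e}) = \mathbf{0}$ and $\operatorname{cov}(\mathbf{e}) = \sigma^2\mathbf{I}$. The parameter space is $\Theta = \{(\mathbf{b}, \sigma^2) : (\mathbf{b}, \sigma^2) \in \mathbb{R}^p \times \mathbb{R}^+\}$.

Model III: Aitken model: $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $E(\mathbf{e}) = \mathbf{0}$ and $\operatorname{cov}(\mathbf{e}) = \sigma^2\mathbf{V}$, $\mathbf{V}$ known. The parameter space is $\Theta = \{(\mathbf{b}, \sigma^2) : (\mathbf{b}, \sigma^2) \in \mathbb{R}^p \times \mathbb{R}^+\}$.

Model IV: General linear mixed model: $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $E(\mathbf{e}) = \mathbf{0}$ and $\operatorname{cov}(\mathbf{e}) = \boldsymbol{\Sigma} \equiv \boldsymbol{\Sigma}(\boldsymbol{\theta})$. The parameter space is $\Theta = \{(\mathbf{b}, \boldsymbol{\theta}) : (\mathbf{b}, \boldsymbol{\theta}) \in \mathbb{R}^p \times \Omega\}$, where $\Omega$ is the set of all values of $\boldsymbol{\theta}$ for which $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ is positive definite.

NOTE: In all of these cases, we assume that $\mathbf{b} \in \mathbb{R}^p$. In some instances, we may want to constrain $\mathbf{b}$ to be in a subset of $\mathbb{R}^p$; e.g., $\{\mathbf{b} : \mathbf{K}'\mathbf{b} = \mathbf{m}\}$ or $\{\mathbf{b} : \mathbf{K}'\mathbf{b} \geq \mathbf{0}\}$.

BIG PICTURE: We will address the following important problems in this course:
1. Estimation of $\mathbf{b}$ and $\boldsymbol{\lambda}'\mathbf{b}$, a linear function of $\mathbf{b}$.
2. Confidence intervals for $\boldsymbol{\lambda}'\mathbf{b}$.
3. Hypothesis tests of the form $H_0 : \mathbf{K}'\mathbf{b} = \mathbf{m}$.
4. Prediction of future values of $\mathbf{y}$.

1.1 Gauss-Markov (GM) models

GAUSS-MARKOV MODEL: Consider the linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $E(\mathbf{e}) = \mathbf{0}$ and $\operatorname{cov}(\mathbf{e}) = \sigma^2\mathbf{I}$. We now give several examples.

Example 1.1. One-sample problem. Suppose that $y_1, y_2, \dots, y_N$ is an iid sample with mean $\mu$ and variance $\sigma^2 > 0$. If the $e_i$'s are iid with mean $E(e_i) = 0$ and common variance $\sigma^2$, we can write $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where
\[
\mathbf{y}_{N \times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad
\mathbf{X}_{N \times 1} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \quad
\mathbf{b}_{1 \times 1} = \mu, \quad
\mathbf{e}_{N \times 1} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}. \quad \square
\]

Example 1.2. Simple linear regression. Consider the model where a response variable $y$ is linearly related to an independent variable $x$ via
\[
y_i = \beta_0 + \beta_1 x_i + e_i,
\]
for $i = 1, 2, \dots, N$, where the $e_i$ are uncorrelated random variables with mean 0 and common variance $\sigma^2 > 0$. If $x_1, x_2, \dots, x_N$ are fixed constants, measured without error, then this model is a special GM model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$ with
\[
\mathbf{y}_{N \times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad
\mathbf{X}_{N \times 2} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \quad
\mathbf{b}_{2 \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad
\mathbf{e}_{N \times 1} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}. \quad \square
\]

Example 1.3. Multiple linear regression. Suppose that a response variable $y$ is linearly related to several independent (predictor) variables, say, $x_1, x_2, \dots, x_k$ via
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i,
\]
for $i = 1, 2, \dots, N$, where the $e_i$ are uncorrelated random variables with mean 0 and common variance $\sigma^2 > 0$. If the independent variables are fixed constants, measured without error, then this model is a special GM model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where
\[
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad
\mathbf{X}_{N \times p} = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{N1} & x_{N2} & \cdots & x_{Nk}
\end{pmatrix}, \quad
\mathbf{b}_{p \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
\mathbf{e} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix},
\]
and $p = k + 1$. Note that the simple linear regression model is a special case of the multiple linear regression model with $k = 1$. $\square$

Example 1.4. One-way ANOVA. Consider an experiment that is performed to compare $a \geq 2$ treatments. For the $i$th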
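treatment level, suppose that $n_i$ experimental units are selected at random and assigned to the $i$th treatment. Consider the model
\[
y_{ij} = \mu + \alpha_i + e_{ij},
\]
for $i = 1, 2, \dots, a$ and $j = 1, 2, \dots, n_i$, where the random errors $e_{ij}$ are uncorrelated random variables with zero mean and common variance $\sigma^2 > 0$. If the $a$ treatment effects $\alpha_1, \alpha_2, \dots, \alpha_a$ are best regarded as fixed constants, then this model is a special case of the GM model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$. To see this, note that with $N = \sum_{i=1}^a n_i$,
\[
\mathbf{y}_{N \times 1} = \begin{pmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{a n_a} \end{pmatrix}, \quad
\mathbf{X}_{N \times p} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{1}_{n_a} & \mathbf{0}_{n_a} & \mathbf{0}_{n_a} & \cdots & \mathbf{1}_{n_a}
\end{pmatrix}, \quad
\mathbf{b}_{p \times 1} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_a \end{pmatrix},
\]
where $p = a + 1$ and $\mathbf{e}_{N \times 1} = (e_{11}, e_{12}, \dots, e_{a n_a})'$, and where $\mathbf{1}_{n_i}$ is an $n_i \times 1$ column vector of ones and $\mathbf{0}_{n_i}$ is an $n_i \times 1$ column vector of zeros. Note that if $a = 2$, then this data structure is equivalent to the standard two-sample setup. $\square$

NOTE: In Example 1.4, note that the first column of $\mathbf{X}$ is the sum of the last $a$ columns; i.e., there is a linear dependence in the columns of $\mathbf{X}$. From results in linear algebra, we know that $\mathbf{X}$ is not of full column rank. In fact, the rank of $\mathbf{X}$ is $r = a$, one less than the number of columns $p = a + 1$. This is a common characteristic of ANOVA models; namely, their $\mathbf{X}$ matrices are not of full column rank. On the other hand, (linear) regression models are models of the form $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is of full column rank; see Examples 1.2 and 1.3.

Example 1.5. Two-way nested ANOVA. Consider an experiment with two factors, where one of the factors, say, Factor B, is nested within Factor A. In other words, every level of B appears with exactly one level of Factor A. A statistical model for this situation is given by
\[
y_{ijk} = \mu + \alpha_i + \beta_{ij} + e_{ijk},
\]
for $i = 1, 2, \dots, a$, $j = 1, 2, \dots, b$, and $k = 1, 2, \dots, n_{ij}$. In this model, $\mu$ denotes the overall mean, $\alpha_i$ represents the effect due to the $i$th level of A, and $\beta_{ij}$ represents the effect of the $j$th level of B, nested within the $i$th level of A. If all parameters are fixed, and the random errors $e_{ijk}$ are uncorrelated random variables with zero mean and constant unknown variance $\sigma^2 > 0$, then this is a special GM model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$. For example, with $a = 3$, $b = 2$,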
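and $n_{ij} = 4$, we have $\mathbf{y} = (y_{111}, y_{112}, y_{113}, y_{114}, y_{121}, \dots, y_{324})'$,
\[
\mathbf{X} = \begin{pmatrix}
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_{11} \\ \beta_{12} \\ \beta_{21} \\ \beta_{22} \\ \beta_{31} \\ \beta_{32} \end{pmatrix},
\]
and $\mathbf{e} = (e_{111}, e_{112}, \dots, e_{324})'$. The rows of $\mathbf{X}$ follow the same ordering as the entries of $\mathbf{y}$. The $\mathbf{X}$ matrix is not of full column rank. The rank of $\mathbf{X}$ is $r = 6$ and there are $p = 10$ columns. $\square$

Example 1.6. Two-way crossed ANOVA with interaction. Consider an experiment with two factors (A and B), where Factor A has $a$ levels and Factor B has $b$ levels. In general, we say that factors A and B are crossed if every level of A occurs in combination with every level of B. Consider the two-factor (crossed) ANOVA model given by
\[
y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk},
\]
for $i = 1, 2, \dots, a$, $j = 1, 2, \dots, b$, and $k = 1, 2, \dots, n_{ij}$, where the random errors $e_{ijk}$ are uncorrelated random variables with zero mean and constant unknown variance $\sigma^2 > 0$. If all the parameters are fixed, this is a special GM model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$. For example, with $a = 3$, $b = 2$, and $n_{ij} = 3$, we have $\mathbf{y} = (y_{111}, y_{112}, y_{113}, y_{121}, \dots, y_{323})'$,
\[
\mathbf{X} = \begin{pmatrix}
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \\ \gamma_{11} \\ \gamma_{12} \\ \gamma_{21} \\ \gamma_{22} \\ \gamma_{31} \\ \gamma_{32} \end{pmatrix},
\]
and $\mathbf{e} = (e_{111}, e_{112}, \dots, e_{323})'$. The $\mathbf{X}$ matrix is not of full column rank. The rank of $\mathbf{X}$ is $r = 6$ and there are $p = 12$ columns. $\square$

Example 1.7. Two-way crossed ANOVA without interaction. Consider an experiment with two factors (A and B), where Factor A has $a$ levels and Factor B has $b$ levels. The two-way crossed model without interaction is given by
\[
y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk},
\]
for $i = 1, 2, \dots, a$, $j = 1, 2, \dots, b$, and $k = 1, 2, \dots, n_{ij}$, where the random errors $e_{ijk}$ are uncorrelated random variables with zero mean and common variance $\sigma^2 > 0$. Note that the no-interaction model is a special case of the interaction model in Example 1.6 when $H_0 : \gamma_{11} = \gamma_{12} = \cdots = \gamma_{32} = 0$ is true. That is, the no-interaction model is a reduced version of the interaction model. With $a = 3$, $b = 2$, and $n_{ij} = 3$ as before, we have $\mathbf{y} = (y_{111}, y_{112}, y_{113}, y_{121}, \dots, y_{323})'$,
\[
\mathbf{X} = \begin{pmatrix}
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1
\end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \end{pmatrix},
\]
and $\mathbf{e} = (e_{111}, e_{112}, \dots, e_{323})'$. The $\mathbf{X}$ matrix is not of full column rank. The rank of $\mathbf{X}$ is $r = 4$ and there are $p = 6$ columns. Also note that the design matrix for the no-interaction model is the same as the design matrix for the interaction model, except that the last 6 columns are removed (these columns pertain to the 6 interaction terms). $\square$

Example 1.8. Analysis of covariance. Consider an experiment to compare $a \geq 2$ treatments after adjusting for the effects of a covariate $x$. A model for the analysis of covariance is given by
\[
y_{ij} = \mu + \alpha_i + \beta_i x_{ij} + e_{ij},
\]
for $i = 1, 2, \dots, a$, $j = 1, 2, \dots, n_i$, where the random errors $e_{ij}$ are uncorrelated random variables with zero mean and common variance $\sigma^2 > 0$. In this model, $\mu$ represents the overall mean, $\alpha_i$ represents the fixed effect of receiving the $i$th treatment, disregarding the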
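covariates, and $\beta_i$ denotes the slope of the line that relates $y$ to $x$ for the $i$th treatment. Note that this model allows the treatment slopes to be different. The $x_{ij}$'s are assumed to be fixed values measured without error.

NOTE: The analysis of covariance (ANCOVA) model is a special GM model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$. For example, with $a = 3$ and $n_1 = n_2 = n_3 = 3$, we have
\[
\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{31} \\ y_{32} \\ y_{33} \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix}
1 & 1 & 0 & 0 & x_{11} & 0 & 0 \\
1 & 1 & 0 & 0 & x_{12} & 0 & 0 \\
1 & 1 & 0 & 0 & x_{13} & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & x_{21} & 0 \\
1 & 0 & 1 & 0 & 0 & x_{22} & 0 \\
1 & 0 & 1 & 0 & 0 & x_{23} & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & x_{31} \\
1 & 0 & 0 & 1 & 0 & 0 & x_{32} \\
1 & 0 & 0 & 1 & 0 & 0 & x_{33}
\end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \quad
\mathbf{e} = \begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{31} \\ e_{32} \\ e_{33} \end{pmatrix}.
\]
The $\mathbf{X}$ matrix is not of full column rank. If there are no linear dependencies among the last 3 columns, the rank of $\mathbf{X}$ is $r = 6$ and there are $p = 7$ columns.

REDUCED MODEL: Consider the ANCOVA model in Example 1.8, which allows for unequal slopes. If $\beta_1 = \beta_2 = \cdots = \beta_a$; that is, all slopes are equal, then the ANCOVA model reduces to
\[
y_{ij} = \mu + \alpha_i + \beta x_{ij} + e_{ij}.
\]
That is, the common-slopes ANCOVA model is a reduced version of the model that allows for different slopes. Assuming the same error structure, this reduced ANCOVA model is also a special GM model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$. With $a = 3$ and $n_1 = n_2 = n_3 = 3$, as before, we have
\[
\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{31} \\ y_{32} \\ y_{33} \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix}
1 & 1 & 0 & 0 & x_{11} \\
1 & 1 & 0 & 0 & x_{12} \\
1 & 1 & 0 & 0 & x_{13} \\
1 & 0 & 1 & 0 & x_{21} \\
1 & 0 & 1 & 0 & x_{22} \\
1 & 0 & 1 & 0 & x_{23} \\
1 & 0 & 0 & 1 & x_{31} \\
1 & 0 & 0 & 1 & x_{32} \\
1 & 0 & 0 & 1 & x_{33}
\end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta \end{pmatrix}, \quad
\mathbf{e} = \begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{31} \\ e_{32} \\ e_{33} \end{pmatrix}.
\]
The rank of $\mathbf{X}$ is $r = 4$ and there are $p = 5$ columns. $\square$

1.2 Linear models that are not GM

GOAL: We now provide examples of linear models of the form $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$ that are not GM models.

TERMINOLOGY: A factor of classification is said to be random if it has an infinitely large number of levels and the levels included in the experiment can be viewed as a random sample from the population of possible levels.

Example 1.9. One-way random effects ANOVA. Consider the model
\[
y_{ij} = \mu + \alpha_i + e_{ij},
\]
for $i = 1, 2, \dots, a$ and $j = 1, 2, \dots, n_i$, where the treatment effects $\alpha_1, \alpha_2, \dots, \alpha_a$ are best regarded as random; e.g., the $a$ levels of the factor of interest are drawn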
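from a large population of possible levels. For concreteness, let $a = 4$ and $n_i = 3$. The model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$ looks like
\[
\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ \vdots \\ y_{43} \end{pmatrix}
= \mathbf{1}_{12}\,\mu
+ \underbrace{\begin{pmatrix}
\mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\
\mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\
\mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 \\
\mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix}}_{\mathbf{Z}_1\mathbf{e}_1}
+ \underbrace{\begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ \vdots \\ e_{43} \end{pmatrix}}_{\mathbf{e}_2}
= \mathbf{X}\mathbf{b} + \mathbf{Z}_1\mathbf{e}_1 + \mathbf{e}_2,
\]
where we identify $\mathbf{X} = \mathbf{1}_{12}$, $\mathbf{b} = \mu$, and $\mathbf{e} = \mathbf{Z}_1\mathbf{e}_1 + \mathbf{e}_2$. This is not a GM model because $\operatorname{cov}(\mathbf{e}) \neq \sigma^2\mathbf{I}$. $\square$

Example 1.10. Two-factor mixed model. Consider an experiment with two factors (A and B), where Factor A is fixed and has $a$ levels and Factor B is random with $b$ levels. A statistical model for this situation is given by
\[
y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk},
\]
for $i = 1, 2, \dots, a$, $j = 1, 2, \dots, b$, and $k = 1, 2, \dots, n_{ij}$. Here, the $\alpha_i$'s are best regarded as fixed constants and the $\beta_j$'s are best regarded as random variables. This model assumes no interaction.

APPLICATION: In a randomized block experiment, $b$ blocks may have been selected randomly from a large collection of available blocks. If the goal is to make a statement about the large population of blocks (and not those $b$ blocks in the experiment), then blocks may be considered as a random factor. The treatment effects $\alpha_1, \alpha_2, \dots, \alpha_a$ are regarded as fixed constants if the $a$ treatments are the only ones of interest.

NOTE: For concreteness, suppose that $a = 2$, $b = 4$, and $n_{ij} = 1$. We can write the model above as
\[
\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{14} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{24} \end{pmatrix}
= \underbrace{\begin{pmatrix}
\mathbf{1}_4 & \mathbf{1}_4 & \mathbf{0}_4 \\
\mathbf{1}_4 & \mathbf{0}_4 & \mathbf{1}_4
\end{pmatrix}
\begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{pmatrix}}_{\mathbf{X}\mathbf{b}}
+ \underbrace{\begin{pmatrix} \mathbf{I}_4 \\ \mathbf{I}_4 \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}}_{\mathbf{Z}_1\mathbf{e}_1}
+ \underbrace{\begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{14} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{24} \end{pmatrix}}_{\mathbf{e}_2}.
\]
This is not a GM model because $\operatorname{cov}(\mathbf{e}) \neq \sigma^2\mathbf{I}$, where $\mathbf{e} = \mathbf{Z}_1\mathbf{e}_1 + \mathbf{e}_2$. $\square$

GENERAL FORM: A linear mixed model can be expressed generally as
\[
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_1\mathbf{e}_1 + \mathbf{Z}_2\mathbf{e}_2 + \cdots + \mathbf{Z}_k\mathbf{e}_k,
\]
where $\mathbf{Z}_1, \mathbf{Z}_2, \dots, \mathbf{Z}_k$ are known matrices (typically $\mathbf{Z}_k = \mathbf{I}$) and $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_k$ are uncorrelated random vectors with uncorrelated components.

NOTE: In Example 1.10, if the $\alpha_i$'s are best regarded as random as well, then we have
\[
\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{14} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{24} \end{pmatrix}
= \mathbf{1}_8\,\mu
+ \underbrace{\begin{pmatrix}
\mathbf{1}_4 & \mathbf{0}_4 \\
\mathbf{0}_4 & \mathbf{1}_4
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}}_{\mathbf{Z}_1\mathbf{e}_1}
+ \underbrace{\begin{pmatrix} \mathbf{I}_4 \\ \mathbf{I}_4 \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}}_{\mathbf{Z}_2\mathbf{e}_2}
+ \underbrace{\begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{14} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{24} \end{pmatrix}}_{\mathbf{e}_3}.
\]
This model is also known as a random effects model or variance component model.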
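ILLUSTRATION: The claim that the mixed model of Example 1.10 (fixed treatments, random blocks, $a = 2$, $b = 4$, $n_{ij} = 1$) is not a GM model can be checked numerically. The following is a minimal sketch and not part of the original notes. It assumes that the random block effects have common variance `var_block`, that the errors have common variance `var_error`, and that the two are uncorrelated, so that the covariance of $\mathbf{Z}_1\mathbf{e}_1 + \mathbf{e}_2$ is `var_block` times $\mathbf{Z}_1\mathbf{Z}_1'$ plus `var_error` times $\mathbf{I}$. The numerical values and all function names are hypothetical.

```python
# Sketch: covariance of e = Z1 e1 + e2 in the two-factor mixed model
# (a = 2 treatments, b = 4 random blocks, one observation per cell).
# var_block and var_error are hypothetical values chosen for illustration.

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two conformable matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mixed_model_cov(Z1, var_block, var_error):
    """Return var_block * Z1 Z1' + var_error * I, assuming
    cov(e1) = var_block * I, cov(e2) = var_error * I, and e1 uncorrelated with e2."""
    n = len(Z1)
    ZZt = matmul(Z1, transpose(Z1))
    return [[var_block * ZZt[i][j] + (var_error if i == j else 0)
             for j in range(n)] for i in range(n)]

def is_scalar_multiple_of_identity(V):
    """True when V = c * I for some scalar c, the GM covariance structure."""
    n = len(V)
    off_diag_zero = all(V[i][j] == 0 for i in range(n) for j in range(n) if i != j)
    equal_diag = all(V[i][i] == V[0][0] for i in range(n))
    return off_diag_zero and equal_diag

Z1 = identity(4) + identity(4)   # 8 x 4 matrix: two stacked copies of I_4
V = mixed_model_cov(Z1, var_block=2, var_error=1)

print(V[0][0])   # variance of y_11: var_block + var_error
print(V[0][4])   # covariance of y_11 and y_21 (same block): var_block
print(V[0][1])   # covariance of y_11 and y_12 (different blocks): zero
print(is_scalar_multiple_of_identity(V))
```

Observations that share a block have nonzero covariance whenever `var_block` is positive, so the covariance matrix is not a scalar multiple of the identity. Setting `var_block` to zero recovers the GM structure.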
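Example 1.11. Time series models. When measurements are taken on the same experimental unit over time, the GM model may not be appropriate. This is true because observations on the same subject are likely correlated. A linear model of the form $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $E(\mathbf{e}) = \mathbf{0}$ and $\operatorname{cov}(\mathbf{e}) = \sigma^2\mathbf{V}$, $\mathbf{V}$ known, may be more appropriate. The general form of $\mathbf{V}$ is chosen to model the correlation of the observed responses. $\square$

Example 1.12. Random coefficient models. Suppose that $t$ measurements are taken (over time) on $n$ individuals and consider the model
\[
y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\beta}_i + e_{ij},
\]
for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, t$; that is, the different $p \times 1$ regression parameters $\boldsymbol{\beta}_i$ are subject-specific. If the individuals are considered to be a random sample, e.g., if $\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \dots, \boldsymbol{\beta}_n$ are iid random vectors with mean $\boldsymbol{\beta}$ and covariance matrix $\boldsymbol{\Sigma}_{\beta\beta}$, we can write this model as
\[
y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\beta}_i + e_{ij}
= \underbrace{\mathbf{x}_{ij}'\boldsymbol{\beta}}_{\text{fixed}}
+ \underbrace{\mathbf{x}_{ij}'(\boldsymbol{\beta}_i - \boldsymbol{\beta}) + e_{ij}}_{\text{random}}.
\]
If the $\boldsymbol{\beta}_i$'s are independent of the $e_{ij}$'s, note that
\[
\operatorname{var}(y_{ij}) = \mathbf{x}_{ij}'\boldsymbol{\Sigma}_{\beta\beta}\mathbf{x}_{ij} + \sigma^2 \neq \sigma^2,
\]
so that this is not a GM model. $\square$

Example 1.13. Measurement error models. Consider the statistical model
\[
y_i = \beta_0 + \beta_1 X_i + \epsilon_i,
\]
where the $\epsilon_i$ are iid $\mathcal{N}(0, \sigma^2_\epsilon)$, and the $X_i$'s are not observed exactly, but are measured with non-negligible error so that
\[
W_i = X_i + U_i,
\]
where the $U_i$ are iid $\mathcal{N}(0, \sigma^2_U)$. Here,

Observed data: $(y_i, W_i)$
Not observed: $(X_i, \epsilon_i, U_i)$
Unknown parameters: $(\beta_0, \beta_1, \sigma^2_\epsilon, \sigma^2_U)$.

The model above can be rewritten as
\[
y_i = \beta_0 + \beta_1(W_i - U_i) + \epsilon_i
= \beta_0 + \beta_1 W_i + \underbrace{(\epsilon_i - \beta_1 U_i)}_{=\,\epsilon_i^*}.
\]
Because the $W_i$'s are not fixed in advance, we would need $E(\epsilon_i^* \mid W_i) = 0$ for this to be a linear model. However, note that
\[
E(\epsilon_i^* \mid W_i) = E(\epsilon_i - \beta_1 U_i \mid X_i + U_i)
= E(\epsilon_i \mid X_i + U_i) - \beta_1 E(U_i \mid X_i + U_i).
\]
The first term is zero if $\epsilon_i$ is independent of both $X_i$ and $U_i$. The second term generally is not zero (unless $\beta_1 = 0$, of course) because $U_i$ and $X_i + U_i$ are correlated. Thus, this is not a GM model. $\square$

1.3 Linear algebra review

1.3.1 Basic definitions

TERMINOLOGY: A matrix $\mathbf{A}$ is a rectangular array of elements; e.g.,
\[
\mathbf{A} = \begin{pmatrix} 3 & 5 & 4 \\ 1 & 2 & 8 \end{pmatrix}.
\]
The $(i, j)$th element of $\mathbf{A}$ is denoted by $a_{ij}$. The dimensions of $\mathbf{A}$ are $m$ (the number of rows) by $n$ (the number of columns). If $m = n$, $\mathbf{A}$ is square. If we want to emphasize the dimension of $\mathbf{A}$, we can write $\mathbf{A}_{m \times n}$. In this course, we restrict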
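attention to real matrices; i.e., matrices whose elements are real numbers.

TERMINOLOGY: A vector is a matrix consisting of one column or one row. A column vector is denoted by $\mathbf{a}_{n \times 1}$. A row vector is denoted by $\mathbf{a}_{1 \times n}$. By convention, we assume a vector is a column vector, unless otherwise noted; that is,
\[
\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \qquad
\mathbf{a}' = (a_1 \;\; a_2 \;\; \cdots \;\; a_n).
\]

TERMINOLOGY: If $\mathbf{A} = (a_{ij})$ is an $m \times n$ matrix, the transpose of $\mathbf{A}$, denoted by $\mathbf{A}'$ or $\mathbf{A}^T$, is the $n \times m$ matrix $(a_{ji})$. If $\mathbf{A}' = \mathbf{A}$, we say $\mathbf{A}$ is symmetric.

Result M1.
(a) $(\mathbf{A}')' = \mathbf{A}$.
(b) For any matrix $\mathbf{A}$, $\mathbf{A}'\mathbf{A}$ and $\mathbf{A}\mathbf{A}'$ are symmetric.
(c) $\mathbf{A} = \mathbf{0}$ iff $\mathbf{A}'\mathbf{A} = \mathbf{0}$.
(d) $(\mathbf{A}\mathbf{B})' = \mathbf{B}'\mathbf{A}'$.
(e) $(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$.

TERMINOLOGY: The $n \times n$ identity matrix $\mathbf{I}$ is given by
\[
\mathbf{I} = \mathbf{I}_n = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}_{n \times n};
\]
that is, $a_{ij} = 1$ for $i = j$ and $a_{ij} = 0$ when $i \neq j$. The $n \times n$ matrix of ones $\mathbf{J}$ is given by
\[
\mathbf{J} = \mathbf{J}_n = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
1 & 1 & \cdots & 1 \\
\vdots & \vdots & \ddots & \vdots \\
1 & 1 & \cdots & 1
\end{pmatrix}_{n \times n};
\]
that is, $a_{ij} = 1$ for all $i$ and $j$. Note that $\mathbf{J} = \mathbf{1}\mathbf{1}'$, where $\mathbf{1} = \mathbf{1}_n$ is an $n \times 1$ column vector of ones. The $n \times n$ matrix where $a_{ij} = 0$ for all $i$ and $j$ is called the null matrix, or the zero matrix, and is denoted by $\mathbf{0}$.

1.3.2 Inverse

TERMINOLOGY: If $\mathbf{A}$ is an $n \times n$ matrix, and there exists a matrix $\mathbf{C}$ such that
\[
\mathbf{A}\mathbf{C} = \mathbf{C}\mathbf{A} = \mathbf{I},
\]
then $\mathbf{A}$ is nonsingular and $\mathbf{C}$ is called the inverse of $\mathbf{A}$, henceforth denoted by $\mathbf{A}^{-1}$. If $\mathbf{A}$ is nonsingular, $\mathbf{A}^{-1}$ is unique. If $\mathbf{A}$ is a square matrix and is not nonsingular, $\mathbf{A}$ is singular.

SPECIAL CASE: The inverse of the $2 \times 2$ matrix
\[
\mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\quad \text{is given by} \quad
\mathbf{A}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]

SPECIAL CASE: The inverse of the $n \times n$ diagonal matrix
\[
\mathbf{A} = \begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{pmatrix}
\quad \text{is given by} \quad
\mathbf{A}^{-1} = \begin{pmatrix}
a_{11}^{-1} & 0 & \cdots & 0 \\
0 & a_{22}^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}^{-1}
\end{pmatrix}.
\]

Result M2.
(a) $\mathbf{A}$ is nonsingular iff $|\mathbf{A}| \neq 0$.
(b) If $\mathbf{A}$ and $\mathbf{B}$ are nonsingular matrices, $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.
(c) If $\mathbf{A}$ is nonsingular, then $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$.

1.3.3 Linear independence and rank

TERMINOLOGY: The $m \times 1$ vectors $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are said to be linearly dependent if and only if there exist scalars $c_1, c_2, \dots, c_n$ such that
\[
\sum_{i=1}^n c_i \mathbf{a}_i = \mathbf{0}
\]
and at least one of the $c_i$'s is not zero; that is, it is possible to express at least one vector as a nontrivial linear combination of the others. If
\[
\sum_{i=1}^n c_i \mathbf{a}_i = \mathbf{0} \implies c_1 = c_2 = \cdots = c_n = 0,
\]
then $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are linearly independent. If $m < n$, then $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ must be linearly dependent.

NOTE: If $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ denote the columns of an $m \times n$ matrix $\mathbf{A}$; i.e., $\mathbf{A} = (\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \cdots \;\; \mathbf{a}_n)$, then the columns of $\mathbf{A}$ are linearly independent if and only if $\mathbf{A}\mathbf{c} = \mathbf{0} \implies \mathbf{c} = \mathbf{0}$, where $\mathbf{c} = (c_1, c_2, \dots, c_n)'$. Thus, if you can find at least one nonzero $\mathbf{c}$ such that $\mathbf{A}\mathbf{c} = \mathbf{0}$, the columns of $\mathbf{A}$ are linearly dependent.

TERMINOLOGY: The rank of a matrix $\mathbf{A}$ is defined as
\[
r(\mathbf{A}) = \text{number of linearly independent columns of } \mathbf{A}
= \text{number of linearly independent rows of } \mathbf{A}.
\]
The number of linearly independent rows of any matrix is always equal to the number of linearly independent columns.

TERMINOLOGY: If $\mathbf{A}$ is $n \times p$, then $r(\mathbf{A}) \leq \min\{n, p\}$.
• If $r(\mathbf{A}) = \min\{n, p\}$, then $\mathbf{A}$ is said to be of full rank.
• If $r(\mathbf{A}) = n$, we say that $\mathbf{A}$ is of full row rank.
• If $r(\mathbf{A}) = p$, we say that $\mathbf{A}$ is of full column rank.
• If $r(\mathbf{A}) < \min\{n, p\}$, we say that $\mathbf{A}$ is less than full rank or rank deficient.
Since the maximum possible rank of an $n \times p$ matrix is the minimum of $n$ and $p$, for any rectangular (i.e., non-square) matrix, either the rows or columns (or both) must be linearly dependent.

Result M3.
(a) For any matrix $\mathbf{A}$, $r(\mathbf{A}') = r(\mathbf{A})$.
(b) For any matrix $\mathbf{A}$, $r(\mathbf{A}'\mathbf{A}) = r(\mathbf{A})$.
(c) For conformable matrices, $r(\mathbf{A}\mathbf{B}) \leq r(\mathbf{A})$ and $r(\mathbf{A}\mathbf{B}) \leq r(\mathbf{B})$.
(d) If $\mathbf{B}$ is nonsingular, then $r(\mathbf{A}\mathbf{B}) = r(\mathbf{A})$.
(e) For any $n \times n$ matrix $\mathbf{A}$, $r(\mathbf{A}) = n \iff \mathbf{A}^{-1}$ exists $\iff |\mathbf{A}| \neq 0$.
(f) For any matrix $\mathbf{A}$ with $n$ rows and vector $\mathbf{b}_{n \times 1}$, $r(\mathbf{A} \;\, \mathbf{b}) \geq r(\mathbf{A})$; i.e., the inclusion of a column vector cannot decrease the rank of a matrix.

ALTERNATE DEFINITION: An $m \times n$ matrix $\mathbf{A}$ has rank $r$ if the dimension of the largest possible nonsingular submatrix of $\mathbf{A}$ is $r \times r$.

APPLICATION TO LINEAR MODELS: Consider our general linear model
\[
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e},
\]
where $\mathbf{y}$ is an $N \times 1$ vector of observed responses, $\mathbf{X}$ is an $N \times p$ matrix of fixed constants, $\mathbf{b}$ is a $p \times 1$ vector of fixed but unknown parameters, and $\mathbf{e}$ is an $N \times 1$ vector of (unobserved) random errors with zero mean. If $\mathbf{X}$ is $N \times p$, then $\mathbf{X}'\mathbf{X}$ is $p \times p$. Furthermore, if $r(\mathbf{X}) = p$ (i.e., it is full column rank), then $r(\mathbf{X}'\mathbf{X}) = p$. Thus, we know that $(\mathbf{X}'\mathbf{X})^{-1}$ exists. On the other hand, if $\mathbf{X}$ has rank $r < p$, then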
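$r(\mathbf{X}'\mathbf{X}) < p$ and $(\mathbf{X}'\mathbf{X})^{-1}$ does not exist. Consider the normal equations (which will be motivated later):
\[
\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y}.
\]
We see that left multiplication by $(\mathbf{X}'\mathbf{X})^{-1}$ produces the solution
\[
\widehat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.
\]
This is the unique solution to the normal equations (since inverses are unique). Note that if $r(\mathbf{X}) = r < p$, then a unique solution to the normal equations does not exist.

1.3.4 Orthogonality

TERMINOLOGY: We say that two vectors $\mathbf{a}$ and $\mathbf{b}$ are orthogonal, and write $\mathbf{a} \perp \mathbf{b}$, if their inner product is zero; i.e.,
\[
\mathbf{a}'\mathbf{b} = 0.
\]
Vectors $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are mutually orthogonal if and only if $\mathbf{a}_i'\mathbf{a}_j = 0$ for all $i \neq j$. If $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are mutually orthogonal and nonzero, then they are also linearly independent (verify!). The converse is not necessarily true.

TERMINOLOGY: Suppose that $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are orthogonal. If $\mathbf{a}_i'\mathbf{a}_i = 1$, for all $i = 1, 2, \dots, n$, we say that $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are orthonormal.

TERMINOLOGY: Suppose that $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are orthogonal. Then
\[
\mathbf{c}_i = \mathbf{a}_i / \|\mathbf{a}_i\|,
\]
where $\|\mathbf{a}_i\| = (\mathbf{a}_i'\mathbf{a}_i)^{1/2}$, $i = 1, 2, \dots, n$, are orthonormal. The quantity $\|\mathbf{a}_i\|$ is the length of $\mathbf{a}_i$. If $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are the columns of $\mathbf{A}$, then $\mathbf{A}'\mathbf{A}$ is diagonal; similarly, if $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of $\mathbf{C}$, then $\mathbf{C}'\mathbf{C} = \mathbf{I}$.

TERMINOLOGY: Let $\mathbf{A}$ be an $n \times n$ (square) matrix. We say that $\mathbf{A}$ is orthogonal if $\mathbf{A}'\mathbf{A} = \mathbf{I} = \mathbf{A}\mathbf{A}'$, or equivalently, if $\mathbf{A}' = \mathbf{A}^{-1}$.

1.3.5 Vector spaces

TERMINOLOGY: Let $\mathcal{V} \subseteq \mathbb{R}^n$ be a set of $n \times 1$ vectors. We call $\mathcal{V}$ a vector space if
(i) $\mathbf{x}_1 \in \mathcal{V}, \mathbf{x}_2 \in \mathcal{V} \implies \mathbf{x}_1 + \mathbf{x}_2 \in \mathcal{V}$, and
(ii) $\mathbf{x} \in \mathcal{V} \implies c\mathbf{x} \in \mathcal{V}$ for $c \in \mathbb{R}$.
That is, $\mathcal{V}$ is closed under addition and scalar multiplication.

TERMINOLOGY: A set of $n \times 1$ vectors $\mathcal{S} \subseteq \mathbb{R}^n$ is a subspace of $\mathcal{V}$ if $\mathcal{S}$ is a vector space and $\mathcal{S} \subseteq \mathcal{V}$; i.e., if $\mathbf{x} \in \mathcal{S} \implies \mathbf{x} \in \mathcal{V}$.

TERMINOLOGY: We say that subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ are orthogonal, and write $\mathcal{S}_1 \perp \mathcal{S}_2$, if $\mathbf{x}_1'\mathbf{x}_2 = 0$, for all $\mathbf{x}_1 \in \mathcal{S}_1$ and for all $\mathbf{x}_2 \in \mathcal{S}_2$.

Example 1.14. Suppose that $\mathcal{V} = \mathbb{R}^3$. Then, $\mathcal{V}$ is a vector space.
Proof. Suppose $\mathbf{x}_1 \in \mathcal{V}$ and $\mathbf{x}_2 \in \mathcal{V}$. Then, $\mathbf{x}_1 + \mathbf{x}_2 \in \mathcal{V}$ and $c\mathbf{x}_1 \in \mathcal{V}$ for all $c \in \mathbb{R}$. $\square$

Example 1.15. Suppose that $\mathcal{V} = \mathbb{R}^3$. The subspace consisting of the $z$-axis is
\[
\mathcal{S}_1 = \left\{ \begin{pmatrix} 0 \\ 0 \\ z \end{pmatrix} : \text{for } z \in \mathbb{R} \right\}.
\]
The subspace consisting of the $x$-$y$ plane is
\[
\mathcal{S}_2 = \left\{ \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} : \text{for } x, y \in \mathbb{R} \right\}.
\]
It is easy to see that $\mathcal{S}_1$ and $\mathcal{S}_2$ are orthogonal. That $\mathcal{S}_1$ is a subspace is argued as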
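follows. Clearly, $\mathcal{S}_1 \subseteq \mathcal{V}$. Now, suppose that $\mathbf{x}_1 \in \mathcal{S}_1$ and $\mathbf{x}_2 \in \mathcal{S}_1$; i.e.,
\[
\mathbf{x}_1 = \begin{pmatrix} 0 \\ 0 \\ z_1 \end{pmatrix}
\quad \text{and} \quad
\mathbf{x}_2 = \begin{pmatrix} 0 \\ 0 \\ z_2 \end{pmatrix},
\]
for $z_1, z_2 \in \mathbb{R}$. Then,
\[
\mathbf{x}_1 + \mathbf{x}_2 = \begin{pmatrix} 0 \\ 0 \\ z_1 + z_2 \end{pmatrix} \in \mathcal{S}_1
\quad \text{and} \quad
c\mathbf{x}_1 = \begin{pmatrix} 0 \\ 0 \\ cz_1 \end{pmatrix} \in \mathcal{S}_1,
\]
for all $c \in \mathbb{R}$. Thus, $\mathcal{S}_1$ is a subspace. That $\mathcal{S}_2$ is a subspace follows similarly. $\square$

TERMINOLOGY: Suppose that $\mathcal{V}$ is a vector space and that $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n \in \mathcal{V}$. The set of all linear combinations of $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$; i.e.,
\[
\mathcal{S} = \left\{ \mathbf{x} \in \mathcal{V} : \mathbf{x} = \sum_{i=1}^n c_i \mathbf{x}_i \right\}
\]
is a subspace of $\mathcal{V}$. We say that $\mathcal{S}$ is generated by $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$. In other words, $\mathcal{S}$ is the space spanned by $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$, written $\mathcal{S} = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$.

Example 1.16. Suppose that $\mathcal{V} = \mathbb{R}^3$ and let
\[
\mathbf{x}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\quad \text{and} \quad
\mathbf{x}_2 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.
\]
For $c_1, c_2 \in \mathbb{R}$, the linear combination $c_1\mathbf{x}_1 + c_2\mathbf{x}_2 = (c_1 + c_2, c_1, c_1)'$. Thus, the space spanned by $\mathbf{x}_1$ and $\mathbf{x}_2$ is the subspace $\mathcal{S}$ which consists of all the vectors in $\mathbb{R}^3$ of the form $(a, b, b)'$, for $a, b \in \mathbb{R}$. $\square$

TERMINOLOGY: Suppose that $\mathcal{S}$ is a subspace of $\mathcal{V}$. If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ is a linearly independent spanning set for $\mathcal{S}$, we call $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ a basis for $\mathcal{S}$. In general, a basis is not unique. However, the number of vectors in the basis, called the dimension of $\mathcal{S}$, written $\dim(\mathcal{S})$, is unique.

TERMINOLOGY: The subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ are orthogonal complements in $\mathbb{R}^m$ if and only if $\mathcal{S}_1 \subseteq \mathbb{R}^m$, $\mathcal{S}_2 \subseteq \mathbb{R}^m$, $\mathcal{S}_1$ and $\mathcal{S}_2$ are orthogonal, $\mathcal{S}_1 \cap \mathcal{S}_2 = \{\mathbf{0}\}$, $\dim(\mathcal{S}_1) = r$, and $\dim(\mathcal{S}_2) = m - r$.

Result M4. Let $\mathcal{S}_1$ and $\mathcal{S}_2$ be orthogonal complements in $\mathbb{R}^m$. Then, any vector $\mathbf{y} \in \mathbb{R}^m$ can be uniquely decomposed as
\[
\mathbf{y} = \mathbf{y}_1 + \mathbf{y}_2,
\]
where $\mathbf{y}_1 \in \mathcal{S}_1$ and $\mathbf{y}_2 \in \mathcal{S}_2$.
Proof. Suppose that the decomposition is not possible; that is, suppose that $\mathbf{y}$ is linearly independent of basis vectors in both $\mathcal{S}_1$ and $\mathcal{S}_2$. However, this would give $m + 1$ linearly independent vectors in $\mathbb{R}^m$, which is not possible. Thus, the decomposition must be possible. To establish uniqueness, suppose that $\mathbf{y} = \mathbf{y}_1 + \mathbf{y}_2$ and $\mathbf{y} = \mathbf{y}_1^* + \mathbf{y}_2^*$, where $\mathbf{y}_1, \mathbf{y}_1^* \in \mathcal{S}_1$ and $\mathbf{y}_2, \mathbf{y}_2^* \in \mathcal{S}_2$. Then, $\mathbf{y}_1 - \mathbf{y}_1^* = \mathbf{y}_2^* - \mathbf{y}_2$. But, $\mathbf{y}_1 - \mathbf{y}_1^* \in \mathcal{S}_1$ and $\mathbf{y}_2^* - \mathbf{y}_2 \in \mathcal{S}_2$. Thus, both $\mathbf{y}_1 - \mathbf{y}_1^*$ and $\mathbf{y}_2^* - \mathbf{y}_2$ must be the $\mathbf{0}$ vector. $\square$

NOTE: In the last result, note that we can write
\[
\|\mathbf{y}\|^2 = \mathbf{y}'\mathbf{y} = (\mathbf{y}_1 + \mathbf{y}_2)'(\mathbf{y}_1 + \mathbf{y}_2)
= \mathbf{y}_1'\mathbf{y}_1 + 2\mathbf{y}_1'\mathbf{y}_2 + \mathbf{y}_2'\mathbf{y}_2
= \|\mathbf{y}_1\|^2 + \|\mathbf{y}_2\|^2.
\]
This is simply the Pythagorean Theorem. The cross product term is zero since $\mathbf{y}_1$ and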
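$\mathbf{y}_2$ are orthogonal.

1.3.6 Matrix subspaces

TERMINOLOGY: For the matrix
\[
\mathbf{A}_{m \times n} = (\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \cdots \;\; \mathbf{a}_n),
\]
where $\mathbf{a}_j$ is $m \times 1$, the column space of $\mathbf{A}$,
\[
\mathcal{C}(\mathbf{A}) = \left\{ \mathbf{x} \in \mathbb{R}^m : \mathbf{x} = \sum_{j=1}^n c_j \mathbf{a}_j; \; c_j \in \mathbb{R} \right\}
= \left\{ \mathbf{x} \in \mathbb{R}^m : \mathbf{x} = \mathbf{A}\mathbf{c}; \; \mathbf{c} \in \mathbb{R}^n \right\},
\]
is the set of all $m \times 1$ vectors spanned by the columns of $\mathbf{A}$; that is, $\mathcal{C}(\mathbf{A})$ is the set of all vectors that can be written as a linear combination of the columns of $\mathbf{A}$. The dimension of $\mathcal{C}(\mathbf{A})$ is the column rank of $\mathbf{A}$.

TERMINOLOGY: Let
\[
\mathbf{A}_{m \times n} = \begin{pmatrix} \mathbf{b}_1' \\ \mathbf{b}_2' \\ \vdots \\ \mathbf{b}_m' \end{pmatrix},
\]
where $\mathbf{b}_i$ is $n \times 1$. Denote
\[
\mathcal{R}(\mathbf{A}) = \left\{ \mathbf{x} \in \mathbb{R}^n : \mathbf{x} = \sum_{i=1}^m d_i \mathbf{b}_i; \; d_i \in \mathbb{R} \right\}
= \left\{ \mathbf{x} \in \mathbb{R}^n : \mathbf{x}' = \mathbf{d}'\mathbf{A}; \; \mathbf{d} \in \mathbb{R}^m \right\}.
\]
We call $\mathcal{R}(\mathbf{A})$ the row space of $\mathbf{A}$. It is the set of all $n \times 1$ vectors spanned by the rows of $\mathbf{A}$; that is, the set of all vectors that can be written as a linear combination of the rows of $\mathbf{A}$. The dimension of $\mathcal{R}(\mathbf{A})$ is the row rank of $\mathbf{A}$.

TERMINOLOGY: The set $\mathcal{N}(\mathbf{A}) = \{\mathbf{x} : \mathbf{A}\mathbf{x} = \mathbf{0}\}$ is called the null space of $\mathbf{A}$. The dimension of $\mathcal{N}(\mathbf{A})$ is called the nullity of $\mathbf{A}$.

Result M5.
(a) $\mathcal{C}(\mathbf{B}) \subseteq \mathcal{C}(\mathbf{A})$ iff $\mathbf{B} = \mathbf{A}\mathbf{C}$ for some matrix $\mathbf{C}$.
(b) $\mathcal{R}(\mathbf{B}) \subseteq \mathcal{R}(\mathbf{A})$ iff $\mathbf{B} = \mathbf{D}\mathbf{A}$ for some matrix $\mathbf{D}$.
(c) $\mathcal{C}(\mathbf{A})$, $\mathcal{R}(\mathbf{A})$, and $\mathcal{N}(\mathbf{A})$ are all vector spaces.
(d) $\mathcal{R}(\mathbf{A}') = \mathcal{C}(\mathbf{A})$ and $\mathcal{C}(\mathbf{A}') = \mathcal{R}(\mathbf{A})$.
(e) $\mathcal{C}(\mathbf{A}'\mathbf{A}) = \mathcal{C}(\mathbf{A}')$ and $\mathcal{R}(\mathbf{A}'\mathbf{A}) = \mathcal{R}(\mathbf{A})$.
(f) For any $\mathbf{A}$ and $\mathbf{B}$, $\mathcal{C}(\mathbf{A}\mathbf{B}) \subseteq \mathcal{C}(\mathbf{A})$. If $\mathbf{B}$ is nonsingular, then $\mathcal{C}(\mathbf{A}\mathbf{B}) = \mathcal{C}(\mathbf{A})$.

Result M6. If $\mathbf{A}$ has full column rank, then $\mathcal{N}(\mathbf{A}) = \{\mathbf{0}\}$.
Proof. Suppose that $\mathbf{A}$ has full column rank. Then, the columns of $\mathbf{A}$ are linearly independent and the only solution to $\mathbf{A}\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$. $\square$

Example 1.17. Define
\[
\mathbf{A} = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & 3 \\ 1 & 0 & 3 \end{pmatrix}
\quad \text{and} \quad
\mathbf{c} = \begin{pmatrix} 3 \\ -1 \\ -1 \end{pmatrix}.
\]
The column space of $\mathbf{A}$ is the set of all linear combinations of the columns of $\mathbf{A}$; i.e.,
\[
c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + c_3\mathbf{a}_3 = \begin{pmatrix} c_1 + c_2 + 2c_3 \\ c_1 + 3c_3 \\ c_1 + 3c_3 \end{pmatrix}.
\]
Thus, the column space $\mathcal{C}(\mathbf{A})$ is the set of all $3 \times 1$ vectors of the form $(a, b, b)'$, where $a, b \in \mathbb{R}$. Any two vectors of $\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ span this space. In addition, any two of $\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ are linearly independent, and hence form a basis for $\mathcal{C}(\mathbf{A})$. The set $\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ is not linearly independent since $\mathbf{A}\mathbf{c} = \mathbf{0}$. The dimension of $\mathcal{C}(\mathbf{A})$; i.e., the rank of $\mathbf{A}$, is $r = 2$. The dimension of the null space is 1, and $\mathbf{c}$ forms a basis for this space. $\square$

Example 1.18. Define
\[
\mathbf{A} = \begin{pmatrix} 1 & 1 & -3 \\ 1 & 2 & -1 \\ 1 & 3 & 1 \\ 1 & 4 & 3 \end{pmatrix}
\quad \text{and} \quad
\mathbf{c} = \begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix}.
\]
$\mathbf{A}$ has two linearly independent columns since $5\mathbf{a}_1 - 2\mathbf{a}_2 + \mathbf{a}_3 = \mathbf{0}$. Thus, $\operatorname{rank}(\mathbf{A}) = r = 2$; i.e.,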
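the dimension of $\mathcal{C}(\mathbf{A})$ is 2. The vector $\mathbf{c}$ forms a basis for $\mathcal{N}(\mathbf{A})$; it has dimension 1. The sum of the dimensions of the two spaces is 3, the number of columns in $\mathbf{A}$. $\square$

Result M7. For an $m \times n$ matrix $\mathbf{A}$ with rank $r \leq n$, the dimension of $\mathcal{N}(\mathbf{A})$ is $n - r$. That is, $\dim\{\mathcal{C}(\mathbf{A})\} + \dim\{\mathcal{N}(\mathbf{A})\} = n$.

Result M8. For an $m \times n$ matrix $\mathbf{A}$, $\mathcal{N}(\mathbf{A}')$ and $\mathcal{C}(\mathbf{A})$ are orthogonal complements in $\mathbb{R}^m$.
Proof. Both $\mathcal{N}(\mathbf{A}')$ and $\mathcal{C}(\mathbf{A})$ are vector spaces with vectors in $\mathbb{R}^m$. From the last result, we know that $\dim\{\mathcal{C}(\mathbf{A})\} = \operatorname{rank}(\mathbf{A}) = r$, say, and $\dim\{\mathcal{N}(\mathbf{A}')\} = m - r$, since $r = \operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}')$. Now we need to show $\mathcal{N}(\mathbf{A}') \cap \mathcal{C}(\mathbf{A}) = \{\mathbf{0}\}$. Suppose $\mathbf{x}$ is in both spaces. If $\mathbf{x} \in \mathcal{C}(\mathbf{A})$, then $\mathbf{x} = \mathbf{A}\mathbf{c}$ for some $\mathbf{c}$. If $\mathbf{x} \in \mathcal{N}(\mathbf{A}')$, then $\mathbf{A}'\mathbf{x} = \mathbf{0}$. Thus,
\[
\mathbf{A}'\mathbf{x} = \mathbf{A}'\mathbf{A}\mathbf{c} = \mathbf{0}
\implies \mathbf{c}'\mathbf{A}'\mathbf{A}\mathbf{c} = 0
\implies (\mathbf{A}\mathbf{c})'\mathbf{A}\mathbf{c} = 0
\implies \mathbf{A}\mathbf{c} = \mathbf{x} = \mathbf{0}.
\]
To finish the proof, we need to show that $\mathcal{N}(\mathbf{A}')$ and $\mathcal{C}(\mathbf{A})$ are orthogonal spaces. Suppose that $\mathbf{x}_1 \in \mathcal{C}(\mathbf{A})$ and $\mathbf{x}_2 \in \mathcal{N}(\mathbf{A}')$. It suffices to show that $\mathbf{x}_1'\mathbf{x}_2 = 0$. But, note that $\mathbf{x}_1 \in \mathcal{C}(\mathbf{A}) \implies \mathbf{x}_1 = \mathbf{A}\mathbf{c}$, for some $\mathbf{c}$. Also, $\mathbf{x}_2 \in \mathcal{N}(\mathbf{A}') \implies \mathbf{A}'\mathbf{x}_2 = \mathbf{0}$. Since $\mathbf{x}_1'\mathbf{x}_2 = (\mathbf{A}\mathbf{c})'\mathbf{x}_2 = \mathbf{c}'\mathbf{A}'\mathbf{x}_2 = \mathbf{c}'\mathbf{0} = 0$, the result follows. $\square$

1.3.7 Generalized inverses

REVIEW: Consider the system of equations $\mathbf{A}\mathbf{x} = \mathbf{c}$. If $\mathbf{A}$ is square and nonsingular, then there is a unique solution to the system and it is $\mathbf{x} = \mathbf{A}^{-1}\mathbf{c}$. If $\mathbf{A}$ is not nonsingular, then the system can have either no solution or infinitely many solutions.

TERMINOLOGY: The linear system $\mathbf{A}\mathbf{x} = \mathbf{c}$ is consistent if there exists an $\mathbf{x}^*$ such that $\mathbf{A}\mathbf{x}^* = \mathbf{c}$; that is, if $\mathbf{c} \in \mathcal{C}(\mathbf{A})$.

REMARK: We will show that
• for every $m \times n$ matrix $\mathbf{A}$, there exists an $n \times m$ matrix $\mathbf{G}$ such that $\mathbf{A}\mathbf{G}\mathbf{A} = \mathbf{A}$.
• for a consistent system $\mathbf{A}\mathbf{x} = \mathbf{c}$, if $\mathbf{A}\mathbf{G}\mathbf{A} = \mathbf{A}$, then $\mathbf{x}^* = \mathbf{G}\mathbf{c}$ is a solution.

Result M9. Suppose that $\mathbf{A}\mathbf{x} = \mathbf{c}$ is consistent. If $\mathbf{G}$ is a matrix such that $\mathbf{A}\mathbf{G}\mathbf{A} = \mathbf{A}$, then $\mathbf{x}^* = \mathbf{G}\mathbf{c}$ is a solution to $\mathbf{A}\mathbf{x} = \mathbf{c}$.
Proof. Because $\mathbf{A}\mathbf{x} = \mathbf{c}$ is consistent, there exists an $\mathbf{x}^*$ such that $\mathbf{A}\mathbf{x}^* = \mathbf{c}$. Note that
\[
\mathbf{A}\mathbf{G}\mathbf{c} = \mathbf{A}\mathbf{G}\mathbf{A}\mathbf{x}^* = \mathbf{A}\mathbf{x}^* = \mathbf{c}.
\]
Thus, $\mathbf{G}\mathbf{c}$ is a solution. $\square$

TERMINOLOGY: A matrix $\mathbf{G}$ that satisfies $\mathbf{A}\mathbf{G}\mathbf{A} = \mathbf{A}$ is called a generalized inverse of $\mathbf{A}$ and is denoted by $\mathbf{A}^-$. That is,
\[
\mathbf{A}\mathbf{G}\mathbf{A} = \mathbf{A} \implies \mathbf{A}\mathbf{A}^-\mathbf{A} = \mathbf{A}.
\]
If $\mathbf{A}$ is square and nonsingular, then the generalized inverse of $\mathbf{A}$ is $\mathbf{A}^{-1}$ since $\mathbf{A}\mathbf{A}^-\mathbf{A} = \mathbf{A}\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}$.

NOTES:
• Every matrix $\mathbf{A}$, regardless of its dimension, has a generalized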
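inverse.
• Generalized inverses are not unique unless $\mathbf{A}$ is nonsingular.
• If $\mathbf{A}$ is $m \times n$, then $\mathbf{A}^-$ is $n \times m$.
• A generalized inverse of $\mathbf{A}$, $\mathbf{A}$ symmetric, is not necessarily symmetric. However, a symmetric generalized inverse can always be found. We will thus assume that the generalized inverse of a symmetric matrix is symmetric.
• If $\mathbf{G}$ is a generalized inverse of $\mathbf{A}$, then $\mathbf{G}'$ is a generalized inverse of $\mathbf{A}'$.

NOTATION: Monahan uses $\mathbf{A}^g$ to denote a generalized inverse, but I will use $\mathbf{A}^-$.

Example 1.19. Consider the matrices
\[
\mathbf{A} = \begin{pmatrix} 4 & 1 & 2 \\ 1 & 1 & 5 \\ 3 & 1 & 3 \end{pmatrix}
\quad \text{and} \quad
\mathbf{G} = \begin{pmatrix} 1/3 & -1/3 & 0 \\ -1/3 & 4/3 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]
Note that $r(\mathbf{A}) = 2$ because $-\mathbf{a}_1 + 6\mathbf{a}_2 - \mathbf{a}_3 = \mathbf{0}$. Thus $\mathbf{A}^{-1}$ does not exist. However, it is easy to show that $\mathbf{A}\mathbf{G}\mathbf{A} = \mathbf{A}$, showing that $\mathbf{G}$ is a generalized inverse of $\mathbf{A}$. $\square$

Result M10. Let $\mathbf{A}$ be an $m \times n$ matrix with $r(\mathbf{A}) = r$. If $\mathbf{A}$ can be partitioned as follows
\[
\mathbf{A} = \begin{pmatrix} \mathbf{C} & \mathbf{D} \\ \mathbf{E} & \mathbf{F} \end{pmatrix},
\]
where $r(\mathbf{A}) = r(\mathbf{C}) = r$ and $\mathbf{C}_{r \times r}$ is nonsingular, then
\[
\mathbf{G} = \begin{pmatrix} \mathbf{C}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}
\]
is a generalized inverse of $\mathbf{A}$. This result essentially shows that every matrix has a generalized inverse (see Results A.10 and A.11, Monahan). Also, it gives a method to compute it.

COMPUTATION: This is an algorithm for finding a generalized inverse $\mathbf{A}^-$ for $\mathbf{A}$, any $m \times n$ matrix of rank $r$.
1. Find any $r \times r$ nonsingular submatrix $\mathbf{C}$. It is not necessary that the elements of $\mathbf{C}$ occupy adjacent rows and columns in $\mathbf{A}$.
2. Find $\mathbf{C}^{-1}$ and $(\mathbf{C}^{-1})'$.
3. Replace the elements of $\mathbf{C}$ by the elements of $(\mathbf{C}^{-1})'$.
4. Replace all other elements of $\mathbf{A}$ by zeros.
5. Transpose the resulting matrix.

Result M11. Let $\mathbf{A}_{m \times n}$, $\mathbf{x}_{n \times 1}$, $\mathbf{c}_{m \times 1}$, and $\mathbf{I}_{n \times n}$ be matrices, and suppose that $\mathbf{A}\mathbf{x} = \mathbf{c}$ is consistent. Then, $\mathbf{x}^*$ is a solution to $\mathbf{A}\mathbf{x} = \mathbf{c}$ if and only if
\[
\mathbf{x}^* = \mathbf{A}^-\mathbf{c} + (\mathbf{I} - \mathbf{A}^-\mathbf{A})\mathbf{z},
\]
for some $\mathbf{z} \in \mathbb{R}^n$. Thus, we can generate all solutions by just knowing one of them; i.e., by knowing $\mathbf{A}^-\mathbf{c}$.
Proof. ($\Longleftarrow$) We know that $\mathbf{x}^* = \mathbf{A}^-\mathbf{c}$ is a solution (Result M9). Suppose that $\mathbf{x}^* = \mathbf{A}^-\mathbf{c} + (\mathbf{I} - \mathbf{A}^-\mathbf{A})\mathbf{z}$, for some $\mathbf{z} \in \mathbb{R}^n$. Thus,
\[
\mathbf{A}\mathbf{x}^* = \mathbf{A}\mathbf{A}^-\mathbf{c} + (\mathbf{A} - \mathbf{A}\mathbf{A}^-\mathbf{A})\mathbf{z} = \mathbf{A}\mathbf{A}^-\mathbf{c} = \mathbf{c};
\]
that is, $\mathbf{x}^* = \mathbf{A}^-\mathbf{c} + (\mathbf{I} - \mathbf{A}^-\mathbf{A})\mathbf{z}$ solves $\mathbf{A}\mathbf{x} = \mathbf{c}$. ($\Longrightarrow$) Conversely, suppose that $\mathbf{x}^*$ is a solution to $\mathbf{A}\mathbf{x} = \mathbf{c}$. Now,
\[
\mathbf{x}^* = \mathbf{A}^-\mathbf{c} + \mathbf{x}^* - \mathbf{A}^-\mathbf{c}
= \mathbf{A}^-\mathbf{c} + \mathbf{x}^* - \mathbf{A}^-\mathbf{A}\mathbf{x}^*
= \mathbf{A}^-\mathbf{c} + (\mathbf{I} - \mathbf{A}^-\mathbf{A})\mathbf{x}^*.
\]
Thus, $\mathbf{x}^* = \mathbf{A}^-\mathbf{c} + (\mathbf{I} - \mathbf{A}^-\mathbf{A})\mathbf{z}$, where $\mathbf{z} = \mathbf{x}^*$. Note that if $\mathbf{A}$ is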
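nonsingular, $A^- = A^{-1}$ and $x^* = A^{-1}c + (I - A^{-1}A)z = A^{-1}c$; i.e., there is just one solution. □

NOTE: Consider the general form of the solution to $Ax = c$ (which is assumed to be consistent); i.e., $x^* = A^- c + (I - A^- A)z$. We call $A^- c$ a particular solution. The term $(I - A^- A)z$ is the general solution to the homogeneous equations $Ax = 0$, producing vectors in $\mathcal{N}(A)$.

APPLICATION: Consider the general linear model
\[
y = Xb + e,
\]
where $y$ is an $N \times 1$ vector of observed responses, $X$ is an $N \times p$ matrix of rank $r < p$, $b$ is a $p \times 1$ vector of fixed but unknown parameters, and $e$ is an $N \times 1$ vector of (unobserved) random errors with zero mean. The normal equations are given by
\[
X'Xb = X'y.
\]
The normal equations are consistent (see below). Thus, the general form of the least squares estimator is given by
\[
\hat{b} = (X'X)^- X'y + [I - (X'X)^- X'X]z,
\]
where $z \in \mathbb{R}^p$. Of course, if $r(X) = p$, then $(X'X)^{-1}$ exists, and the unique solution becomes $\hat{b} = (X'X)^{-1}X'y$.

PROPOSITION: The normal equations $X'Xb = X'y$ are consistent.
Proof. First, we will state and prove the following lemma.

LEMMA: For any matrix $X$, and for any matrices $A$ and $B$,
\[
X'XA = X'XB \iff XA = XB.
\]
Proof. The necessity part ($\Longleftarrow$) is obvious. For the sufficiency part ($\Longrightarrow$), note that
\[
\begin{aligned}
X'XA = X'XB &\Longrightarrow X'XA - X'XB = 0 \\
&\Longrightarrow (A - B)'(X'XA - X'XB) = 0 \\
&\Longrightarrow (A - B)'X'(XA - XB) = 0 \\
&\Longrightarrow (A'X' - B'X')(XA - XB) = 0 \\
&\Longrightarrow (XA - XB)'(XA - XB) = 0.
\end{aligned}
\]
This can only be true if $XA - XB = 0$ (Result M1). Thus, the lemma is proven.

Now, let $(X'X)^-$ denote a generalized inverse of $X'X$ so that $X'X(X'X)^- X'X = X'X$. Taking $A = (X'X)^- X'X$ and $B = I$ in the lemma, we have
\[
X'X(X'X)^- X'X = X'X \Longrightarrow X(X'X)^- X'X = X.
\]
Transposing, and using our convention that $(X'X)^-$ is symmetric, gives $X'X(X'X)^- X' = X'$, so that
\[
X'X(X'X)^- X'y = X'y.
\]
This implies that $\hat{b} = (X'X)^- X'y$ is a solution to the normal equations. Hence, the normal equations are consistent. □

Example 1.20. Consider the one-way fixed effects ANOVA model
\[
y_{ij} = \mu + \alpha_i + \epsilon_{ij},
\]
for $i = 1, 2$ and $j = 1, 2, \ldots, n_i$, where $n_1 = 2$ and $n_2 = 3$. It is easy to show that
\[
X'X = \begin{pmatrix} 5 & 2 & 3 \\ 2 & 2 & 0 \\ 3 & 0 & 3 \end{pmatrix}.
\]
One generalized inverse of $X'X$ is
\[
(X'X)^-_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/3 \end{pmatrix},
\]
and a solution to the normal equations (based on this generalized inverse) is
\[
\hat{b}_1 = (X'X)^-_1 X'y
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/3 \end{pmatrix}
\begin{pmatrix} y_{11} + y_{12} + y_{21} + y_{22} + y_{23} \\ y_{11} + y_{12} \\ y_{21} + y_{22} + y_{23} \end{pmatrix}
= \begin{pmatrix} 0 \\ \frac{1}{2}(y_{11} + y_{12}) \\ \frac{1}{3}(y_{21} + y_{22} + y_{23}) \end{pmatrix}
= \begin{pmatrix} 0 \\ \bar{y}_{1+} \\ \bar{y}_{2+} \end{pmatrix},
\]
where $\bar{y}_{i+}$ denotes the sample mean of the observations in group $i$. Another generalized inverse of $X'X$ is
\[
(X'X)^-_2 = \begin{pmatrix} 1/3 & -1/3 & 0 \\ -1/3 & 5/6 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]
and a solution to the normal equations (based on this generalized inverse) is
\[
\hat{b}_2 = (X'X)^-_2 X'y
= \begin{pmatrix} 1/3 & -1/3 & 0 \\ -1/3 & 5/6 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} y_{11} + y_{12} + y_{21} + y_{22} + y_{23} \\ y_{11} + y_{12} \\ y_{21} + y_{22} + y_{23} \end{pmatrix}
= \begin{pmatrix} \frac{1}{3}(y_{21} + y_{22} + y_{23}) \\ \frac{1}{2}(y_{11} + y_{12}) - \frac{1}{3}(y_{21} + y_{22} + y_{23}) \\ 0 \end{pmatrix}
= \begin{pmatrix} \bar{y}_{2+} \\ \bar{y}_{1+} - \bar{y}_{2+} \\ 0 \end{pmatrix}.
\]
The general solution is given by
\[
\hat{b} = (X'X)^-_1 X'y + [I - (X'X)^-_1 X'X]z
= \begin{pmatrix} 0 \\ \bar{y}_{1+} \\ \bar{y}_{2+} \end{pmatrix}
+ \begin{pmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix}
= \begin{pmatrix} z_1 \\ \bar{y}_{1+} - z_1 \\ \bar{y}_{2+} - z_1 \end{pmatrix},
\]
where $z = (z_1, z_2, z_3)' \in \mathbb{R}^3$. Furthermore, we see that the first particular solution corresponds to $z_1 = 0$ while the second corresponds to $z_1 = \bar{y}_{2+}$. □

1.3.8 Projection matrices

TERMINOLOGY: A square matrix $P$ is idempotent if $P^2 = P$.

TERMINOLOGY: A square matrix $P$ is a projection matrix onto the vector space $S$ if and only if
1. $P$ is idempotent
2. $Px \in S$, for any $x$
3. $z \in S \Longrightarrow Pz = z$ (projection).

Result M12. The matrix $P = AA^-$ projects onto $\mathcal{C}(A)$.
Proof. Clearly, $AA^-$ is a square matrix. Note that $AA^- AA^- = AA^-$. Note that $AA^- x = A(A^- x) \in \mathcal{C}(A)$. Finally, if $z \in \mathcal{C}(A)$, then $z = Ax$, for some $x$. Thus, $AA^- z = AA^- Ax = Ax = z$. □

NOTE: In general, projection matrices are not unique. However, if we add the requirement that $Pz = 0$, for any $z \perp S$, then $P$ is called a perpendicular projection matrix, which is unique. These matrices are important in linear models.

Result M13. The matrix $I - A^- A$ projects onto $\mathcal{N}(A)$.
Proof. Clearly, $I - A^- A$ is a square matrix. Note that
\[
(I - A^- A)(I - A^- A) = I - 2A^- A + A^- A = I - A^- A.
\]
For any $x$, note that $(I - A^- A)x \in \mathcal{N}(A)$ because $A(I - A^- A)x = 0$. Finally, if $z \in \mathcal{N}(A)$, then $Az = 0$. Thus, $(I - A^- A)z = z - A^- Az = z$. □

Example 1.21. Consider the linear subspace of $\mathbb{R}^2$ defined by
\[
S = \left\{ z : z = \begin{pmatrix} 2a \\ a \end{pmatrix}, \text{ for } a \in \mathbb{R} \right\}
\]
and take
\[
P = \begin{pmatrix} 0.8 & 0.4 \\ 0.4 & 0.2 \end{pmatrix}.
\]
EXERCISE: Show that $P$ is a projection matrix onto $S$. Show that $I - P$ is a projection matrix onto $S^\perp$, the orthogonal complement of $S$.

Result M14. The matrix $M$ is a perpendicular projection matrix onto $\mathcal{C}(M)$ if and only if $M$ is symmetric and idempotent.
Proof. ($\Longrightarrow$) Suppose that $M$ is a perpendicular projection matrix onto $\mathcal{C}(M)$ and write $v = v_1 + v_2$, where $v_1 \in \mathcal{C}(M)$ and $v_2 \perp \mathcal{C}(M)$. Also, let $w = w_1 + w_2$, where $w_1 \in \mathcal{C}(M)$ and $w_2 \perp \mathcal{C}(M)$. Since $(I - M)v = (I - M)v_2$ and $Mw = Mw_1 = w_1$, we get
\[
w'M'(I - M)v = w_1'M'(I - M)v_2 = w_1'v_2 = 0.
\]
This is true for any $v$ and $w$, so it must be true that $M'(I - M) = 0 \Longrightarrow M' = M'M$. Since $M'M$ is symmetric, so is $M'$, and this, in turn, implies that $M = M^2$. ($\Longleftarrow$) Now, suppose that $M$ is symmetric and idempotent. If $M = M^2$ and $v \in \mathcal{C}(M)$, then since $v = Mb$, for some $b$, we have that $Mv = MMb = Mb = v$ (this establishes that $M$ is a projection matrix). To establish perpendicularity, note that if $M' = M$ and $w \perp \mathcal{C}(M)$, then $Mw = M'w = 0$, because the columns of $M$ are in $\mathcal{C}(M)$. □

Result M15. If $M$ is a perpendicular projection matrix onto $\mathcal{C}(X)$, then $\mathcal{C}(M) = \mathcal{C}(X)$.
Proof. We need to show that $\mathcal{C}(M) \subseteq \mathcal{C}(X)$ and $\mathcal{C}(X) \subseteq \mathcal{C}(M)$. Suppose that $v \in \mathcal{C}(M)$. Then $v = Mb$, for some $b$. Now, write $b = b_1 + b_2$, where $b_1 \in \mathcal{C}(X)$ and $b_2 \perp \mathcal{C}(X)$. Thus, $v = Mb = M(b_1 + b_2) = Mb_1 + Mb_2 = b_1 \in \mathcal{C}(X)$. Thus, $\mathcal{C}(M) \subseteq \mathcal{C}(X)$. Now suppose that $v \in \mathcal{C}(X)$. Since $M$ is a perpendicular projection matrix onto $\mathcal{C}(X)$, we know that $v = Mv = M(v_1 + v_2)$, where $v_1 \in \mathcal{C}(X)$ and $v_2 \perp \mathcal{C}(X)$. But $M(v_1 + v_2) = Mv_1$, showing that $v \in \mathcal{C}(M)$. Thus, $\mathcal{C}(X) \subseteq \mathcal{C}(M)$ and the result follows. □

Result M16. Perpendicular projection matrices are unique.
Proof. Suppose that $M_1$ and $M_2$ are both perpendicular projection matrices onto any arbitrary subspace $S \subseteq \mathbb{R}^n$. Let $v \in \mathbb{R}^n$ and write $v = v_1 + v_2$, where $v_1 \in S$ and $v_2 \perp S$. Since $v$ is arbitrary and $M_1 v = v_1 = M_2 v$, we have $M_1 = M_2$. □

Result M17. If $M$ is the perpendicular projection matrix onto $\mathcal{C}(X)$, then $I - M$ is the perpendicular projection matrix onto $\mathcal{N}(X')$.
Sketch of Proof. $I - M$ is symmetric and idempotent so that $I - M$ is the perpendicular projection matrix onto $\mathcal{C}(I - M)$. Show that $\mathcal{C}(I - M) = \mathcal{N}(X')$ and use Result M16. □

1.3.9 Trace and determinant functions

TERMINOLOGY: The sum of the diagonal elements of a square matrix $A$ is called the trace of $A$, written $\operatorname{tr}(A)$. That is, for $A_{n \times n} = (a_{ij})$,
\[
\operatorname{tr}(A) = \sum_{i=1}^n a_{ii}.
\]

Result M18.
1. $\operatorname{tr}(A \pm B) = \operatorname{tr}(A) \pm \operatorname{tr}(B)$
2. $\operatorname{tr}(cA) = c\,\operatorname{tr}(A)$
3. $\operatorname{tr}(A') = \operatorname{tr}(A)$
4. $\operatorname{tr}(AB) = \operatorname{tr}(BA)$
5. $\operatorname{tr}(A'A) = \sum_{i=1}^n \sum_{j=1}^n a_{ij}^2$.

TERMINOLOGY: The determinant of a square matrix $A$ is a real number denoted by $|A|$ or $\det(A)$.

Result M19.
1. $|A'| = |A|$
2. $|AB| = |BA|$
3. $|A^{-1}| = |A|^{-1}$
4. $|A| = 0$ iff $A$ is singular
5. For any $n \times n$ upper (lower) triangular matrix, $|A| = \prod_{i=1}^n a_{ii}$.

REVIEW: The table below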
summarizes equivalent conditions for the existence of an inverse matrix $A^{-1}$ (where $A$ has dimension $n \times n$).

$A^{-1}$ exists                              $A^{-1}$ does not exist
$A$ is nonsingular                           $A$ is singular
$|A| \neq 0$                                 $|A| = 0$
$A$ has full rank                            $A$ has less than full rank
$r(A) = n$                                   $r(A) < n$
$A$ has LIN rows (columns)                   $A$ does not have LIN rows (columns)
$Ax = 0$ has one solution, $x = 0$           $Ax = 0$ has many solutions

1.3.10 Eigenvalues and eigenvectors

EIGENVALUES: Suppose that $A$ is a square matrix and consider the equations $Au = \lambda u$. Note that
\[
Au = \lambda u \iff Au - \lambda u = (A - \lambda I)u = 0.
\]
If $u \neq 0$, then $A - \lambda I$ must be singular. Thus, the values of $\lambda$ which satisfy $Au = \lambda u$ are those values where
\[
|A - \lambda I| = 0.
\]
This is called the characteristic equation of $A$. If $A$ is $n \times n$, then the characteristic equation is a polynomial (in $\lambda$) of degree $n$. The roots of this polynomial, say, $\lambda_1, \lambda_2, \ldots, \lambda_n$, are the eigenvalues of $A$ (some of these may be zero or even complex). If $A$ is a symmetric matrix, then $\lambda_1, \lambda_2, \ldots, \lambda_n$ must be real (Searle, pp 290).

EIGENVECTORS: If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are eigenvalues for $A$, then vectors $u_i$ satisfying
\[
Au_i = \lambda_i u_i,
\]
for $i = 1, 2, \ldots, n$, are called eigenvectors. Note that
\[
Au_i = \lambda_i u_i \Longrightarrow Au_i - \lambda_i u_i = (A - \lambda_i I)u_i = 0.
\]
From our discussion on systems of equations and consistency, we know a general solution for $u_i$ is given by $u_i = [I - (A - \lambda_i I)^-(A - \lambda_i I)]z$, for $z \in \mathbb{R}^n$.

Result M20. If $\lambda_i$ and $\lambda_j$ are eigenvalues of a symmetric matrix $A$, and if $\lambda_i \neq \lambda_j$, then the corresponding eigenvectors, $u_i$ and $u_j$, are orthogonal.
Proof. We know that $Au_i = \lambda_i u_i$ and $Au_j = \lambda_j u_j$. The key is to recognize that
\[
\lambda_i u_i'u_j = u_i'Au_j = \lambda_j u_i'u_j,
\]
which can only happen if $\lambda_i = \lambda_j$ or if $u_i'u_j = 0$. But $\lambda_i \neq \lambda_j$ by assumption. □

PUNCHLINE: For a symmetric matrix $A$, eigenvectors associated with distinct eigenvalues are orthogonal (we've just proven this) and, hence, are linearly independent.

MULTIPLICITY: If the symmetric matrix $A$ has an eigenvalue $\lambda_k$ of multiplicity $m_k$, then we can find $m_k$ orthogonal eigenvectors of $A$ which correspond to $\lambda_k$ (Searle, pp 291). This leads to the following result (cf. Christensen, pp 402):

Result M21. If $A$ is a symmetric matrix, then there exists a basis for $\mathcal{C}(A)$ consisting of
eigenvectors of nonzero eigenvalues. If $\lambda$ is a nonzero eigenvalue of multiplicity $m$, then the basis will contain $m$ eigenvectors for $\lambda$. Furthermore, $\mathcal{N}(A)$ consists of the eigenvectors associated with $\lambda = 0$ (along with $\mathbf{0}$).

SPECTRAL DECOMPOSITION: Suppose that $A_{n \times n}$ is symmetric with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. The spectral decomposition of $A$ is given by $A = QDQ'$, where
• $Q$ is orthogonal; i.e., $QQ' = Q'Q = I$,
• $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, a diagonal matrix consisting of the eigenvalues of $A$; note that $r(D) = r(A)$, because $Q$ is orthogonal, and
• the columns of $Q$ are orthonormal eigenvectors of $A$.

Result M22. If $A$ is an $n \times n$ symmetric matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then
1. $|A| = \prod_{i=1}^n \lambda_i$
2. $\operatorname{tr}(A) = \sum_{i=1}^n \lambda_i$.
NOTE: These facts are also true for a general $n \times n$ matrix $A$.
Proof (in the symmetric case). Write $A$ in its spectral decomposition $A = QDQ'$. By Result M19, $|A| = |QDQ'| = |DQ'Q| = |D| = \prod_{i=1}^n \lambda_i$. By Result M18, $\operatorname{tr}(A) = \operatorname{tr}(QDQ') = \operatorname{tr}(DQ'Q) = \operatorname{tr}(D) = \sum_{i=1}^n \lambda_i$. □

Result M23. Suppose that $A$ is symmetric. The rank of $A$ equals the number of nonzero eigenvalues of $A$.
Proof. Write $A$ in its spectral decomposition $A = QDQ'$. Because $r(D) = r(A)$ and because the only nonzero elements in $D$ are the nonzero eigenvalues, the rank of $D$ must be the number of nonzero eigenvalues of $A$. □

Result M24. The eigenvalues of an idempotent matrix $A$ are equal to 0 or 1.
Proof. If $\lambda$ is an eigenvalue of $A$, then $Au = \lambda u$ for some $u \neq 0$. Note that $A^2u = AAu = A\lambda u = \lambda Au = \lambda^2 u$. This shows that $\lambda^2$ is an eigenvalue of $A^2 = A$. Thus, we have $Au = \lambda u$ and $Au = \lambda^2 u$, which implies that $\lambda = 0$ or $\lambda = 1$. □

Result M25. If the $n \times n$ matrix $A$ is idempotent, then $r(A) = \operatorname{tr}(A)$.
Proof. From the last result, we know that the eigenvalues of $A$ are equal to 0 or 1. Let $v_1, v_2, \ldots, v_r$ be a basis for $\mathcal{C}(A)$. Denote by $S$ the subspace of all eigenvectors associated with $\lambda = 1$. Suppose $v \in S$. Then, because $Av = v \in \mathcal{C}(A)$, $v$ can be written as a linear combination of $v_1, v_2, \ldots, v_r$. This means that any basis for $\mathcal{C}(A)$ is also a basis for $S$. Furthermore, $\mathcal{N}(A)$ consists of eigenvectors associated with $\lambda = 0$ (because $Av = 0v = 0$). Thus,
\[
n = \dim(\mathbb{R}^n) = \dim[\mathcal{C}(A)] + \dim[\mathcal{N}(A)] = r + \dim[\mathcal{N}(A)],
\]
showing
that $\dim[\mathcal{N}(A)] = n - r$. Since $A$ has $n$ eigenvalues, all are accounted for: $\lambda = 1$ (with multiplicity $r$) and $\lambda = 0$ (with multiplicity $n - r$). Now $\operatorname{tr}(A) = \sum_i \lambda_i = r$, the multiplicity of $\lambda = 1$. But $r(A) = \dim[\mathcal{C}(A)] = r$ as well. □

1.3.11 Quadratic forms, definiteness, and factorizations

TERMINOLOGY: Suppose that $x$ is an $n \times 1$ vector. A quadratic form is a function $f : \mathbb{R}^n \to \mathbb{R}$ of the form
\[
f(x) = \sum_{i=1}^n \sum_{j=1}^n a_{ij}x_i x_j = x'Ax.
\]
The matrix $A$ is called the matrix of the quadratic form.

Result M26. If $x'Ax$ is any quadratic form, there exists a symmetric matrix $B$ such that $x'Ax = x'Bx$.
Proof. Note that $x'A'x = (x'Ax)' = x'Ax$, since a quadratic form is a scalar. Thus,
\[
x'Ax = \tfrac{1}{2}x'Ax + \tfrac{1}{2}x'A'x = x'\left(\tfrac{1}{2}A + \tfrac{1}{2}A'\right)x = x'Bx,
\]
where $B = \tfrac{1}{2}A + \tfrac{1}{2}A'$. It is easy to show that $B$ is symmetric. □

UPSHOT: In working with quadratic forms, we can, without loss of generality, assume that the matrix of the quadratic form is symmetric.

TERMINOLOGY: The quadratic form $x'Ax$ is said to be
• nonnegative definite (nnd) if $x'Ax \geq 0$, for all $x \in \mathbb{R}^n$.
• positive definite (pd) if $x'Ax > 0$, for all $x \neq 0$.
• positive semidefinite (psd) if $x'Ax$ is nnd but not pd.

TERMINOLOGY: A symmetric $n \times n$ matrix $A$ is said to be nnd, pd, or psd if the quadratic form $x'Ax$ is nnd, pd, or psd, respectively.

Result M27. Let $A$ be a symmetric matrix. Then
1. $A$ pd $\Longrightarrow |A| > 0$
2. $A$ nnd $\Longrightarrow |A| \geq 0$.

Result M28. Let $A$ be a symmetric matrix. Then
1. $A$ pd $\iff$ all eigenvalues of $A$ are positive
2. $A$ nnd $\iff$ all eigenvalues of $A$ are nonnegative.

Result M29. A pd matrix is nonsingular. A psd matrix is singular. The converses are not true.

Result M30. Let $A$ be an $m \times n$ matrix of rank $r$. Then $A'A$ is nnd with rank $r$. Furthermore, $A'A$ is pd if $r = n$ and is psd if $r < n$.
Proof. Let $x$ be an $n \times 1$ vector. Then $x'(A'A)x = (Ax)'Ax \geq 0$, showing that $A'A$ is nnd. Also, $r(A'A) = r(A) = r$, by Result M3. If $r = n$, then the columns of $A$ are linearly independent, and the only solution to $Ax = 0$ is $x = 0$. This shows that $A'A$ is pd. If $r < n$, then the columns of $A$ are linearly dependent; i.e., there exists an $x \neq 0$ such that $Ax = 0$. Thus, $A'A$ is nnd but not pd, so it must be psd. □
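A small numerical check may make the pd/psd distinction in Result M30 concrete. The sketch below is only an illustration in plain Python (the notes themselves prescribe no software); the helper names `matvec` and `quad_form_gram` are hypothetical and introduced here for convenience. It evaluates $x'(A'A)x = (Ax)'(Ax)$ for the rank-2 matrix $A$ of Example 1.19 and for the full column rank matrix obtained by dropping its third column. A single positive value does not prove positive definiteness; it merely shows the behavior the proof describes.

```python
def matvec(A, x):
    """Return the product A x for a matrix A (list of rows) and a vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def quad_form_gram(A, x):
    """Return x'(A'A)x, computed as (Ax)'(Ax), the squared length of Ax."""
    Ax = matvec(A, x)
    return sum(v * v for v in Ax)


# Matrix A from Example 1.19: 3 x 3 with rank 2, since -a1 + 6*a2 - a3 = 0.
A = [[4, 1, 2],
     [1, 1, 5],
     [3, 1, 3]]

# x_null is a nonzero vector in N(A), so the quadratic form vanishes there:
# A'A is nnd but not pd, i.e. psd.
x_null = [-1, 6, -1]
q_null = quad_form_gram(A, x_null)

# For x = (1, 0, 0)' the form equals the squared length of the first column.
q_e1 = quad_form_gram(A, [1, 0, 0])

# Dropping the third column leaves a 3 x 2 matrix with r = n = 2, so the
# only solution of A_full x = 0 is x = 0 and the form is positive elsewhere.
A_full = [row[:2] for row in A]
q_full = quad_form_gram(A_full, [-1, 6])

print(q_null)  # 0
print(q_e1)    # 26
print(q_full)  # 38
```

The zero at `x_null` is exactly the linear dependence $-a_1 + 6a_2 - a_3 = 0$ noted in Example 1.19, which is why $A'A$ is psd rather than pd for that matrix.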
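CHOLESKY FACTORIZATION: A square matrix $A$ is pd iff there exists a nonsingular lower triangular matrix $L$ such that $A = LL'$. Monahan proves this result (see pp 258), provides an algorithm on how to find $L$, and includes an example.

SYMMETRIC SQUARE ROOT DECOMPOSITION: Suppose that $A$ is symmetric and pd. Writing $A$ in its spectral decomposition, we have $A = QDQ'$. Because $A$ is pd, $\lambda_1, \lambda_2, \ldots, \lambda_n$, the eigenvalues of $A$, are positive (Result M28). If we define $A^{1/2} = QD^{1/2}Q'$, where $D^{1/2} = \operatorname{diag}(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n})$, then $A^{1/2}$ is symmetric and
\[
A^{1/2}A^{1/2} = QD^{1/2}Q'QD^{1/2}Q' = QD^{1/2}ID^{1/2}Q' = QDQ' = A.
\]
The matrix $A^{1/2}$ is called the symmetric square root of $A$.

1.4 Random vectors

1.4.1 Means and variances

TERMINOLOGY: Suppose that $y_1, y_2, \ldots, y_n$ are random variables. We call
\[
y = (y_1, y_2, \ldots, y_n)'
\]
a random vector. The joint pdf of $y$ is denoted by $f_y(y)$.

MEAN AND VARIANCE: Suppose that $E(y_i) = \mu_i$, $\operatorname{var}(y_i) = \sigma_i^2$, for $i = 1, 2, \ldots, n$, and $\operatorname{cov}(y_i, y_j) = \sigma_{ij}$, for $i \neq j$. The mean of $y$ is
\[
\boldsymbol{\mu} = E(y) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}.
\]
The variance of $y$ is the $n \times n$ matrix
\[
\Sigma = \operatorname{cov}(y) = \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{pmatrix}.
\]
NOTE: $\Sigma$ is also called the variance-covariance matrix of $y$, because it contains the variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$ on the diagonal and the $\binom{n}{2}$ covariance terms $\operatorname{cov}(y_i, y_j)$, for $i < j$, as the elements strictly above the diagonal. Since $\operatorname{cov}(y_i, y_j) = \operatorname{cov}(y_j, y_i)$, it follows that $\Sigma$ is symmetric.

Example 1.22. Suppose that $y_1, y_2, \ldots, y_n$ is an iid sample with mean $E(y_i) = \mu$ and variance $\operatorname{var}(y_i) = \sigma^2$, and let $y = (y_1, y_2, \ldots, y_n)'$. Then $\boldsymbol{\mu} = E(y) = \mu\mathbf{1}$ and $\Sigma = \operatorname{cov}(y) = \sigma^2 I$. □

Example 1.23. Consider the GM linear model $y = Xb + e$. In this model, the random errors $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are uncorrelated random variables with zero mean and constant variance $\sigma^2$. We have $E(e) = \mathbf{0}_{n \times 1}$ and $\operatorname{cov}(e) = \sigma^2 I$. □

TERMINOLOGY: Suppose that $z_{11}, z_{12}, \ldots, z_{np}$ are random variables. We call
\[
Z_{n \times p} = \begin{pmatrix}
z_{11} & z_{12} & \cdots & z_{1p} \\
z_{21} & z_{22} & \cdots & z_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
z_{n1} & z_{n2} & \cdots & z_{np}
\end{pmatrix}
\]
a random matrix. The mean of $Z$ is
\[
E(Z) = \begin{pmatrix}
E(z_{11}) & E(z_{12}) & \cdots & E(z_{1p}) \\
E(z_{21}) & E(z_{22}) & \cdots & E(z_{2p}) \\
\vdots & \vdots & \ddots & \vdots \\
E(z_{n1}) & E(z_{n2}) & \cdots & E(z_{np})
\end{pmatrix}_{n \times p}.
\]

Result RV31. Suppose that $y$ is a random vector with mean $\boldsymbol{\mu}$. Then,
\[
\Sigma = \operatorname{cov}(y) = E[(y - \boldsymbol{\mu})(y - \boldsymbol{\mu})'] = E(yy') - \boldsymbol{\mu}\boldsymbol{\mu}'.
\]
Proof. That $\operatorname{cov}(y) = E[(y - \boldsymbol{\mu})(y - \boldsymbol{\mu})']$ is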
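obvious. Showing this equals $E(yy') - \boldsymbol{\mu}\boldsymbol{\mu}'$ is a simple algebraic argument. □

COVARIANCE: Suppose that $y_{p \times 1}$ and $x_{q \times 1}$ are random vectors with means $\boldsymbol{\mu}_Y$ and $\boldsymbol{\mu}_X$, respectively. The covariance between $y$ and $x$ is the $p \times q$ matrix defined by
\[
\operatorname{cov}(y, x) = E\{(y - \boldsymbol{\mu}_Y)(x - \boldsymbol{\mu}_X)'\} = (\sigma_{ij})_{p \times q},
\]
where
\[
\sigma_{ij} = E[\{y_i - E(y_i)\}\{x_j - E(x_j)\}] = \operatorname{cov}(y_i, x_j).
\]

Result RV32.
1. $\operatorname{cov}(y, y) = \Sigma = \operatorname{cov}(y)$
2. $\operatorname{cov}(y, x) = [\operatorname{cov}(x, y)]'$.

DEFINITION: Random vectors $y_{p \times 1}$ and $x_{q \times 1}$ are uncorrelated if $\operatorname{cov}(y, x) = \mathbf{0}_{p \times q}$.

Result RV33. If $\operatorname{cov}(y, x) = \mathbf{0}$, then $\operatorname{cov}(y, a + Bx) = \mathbf{0}$, for all nonrandom conformable $a$ and $B$. That is, $y$ is uncorrelated with any linear function of $x$.
Proof. Exercise. □

TERMINOLOGY: Suppose that $\operatorname{var}(y_i) = \sigma_i^2$, for $i = 1, 2, \ldots, n$, and $\operatorname{cov}(y_i, y_j) = \sigma_{ij}$, for $i \neq j$. The correlation matrix of $y$ is the $n \times n$ matrix
\[
R = (\rho_{ij}) = \begin{pmatrix}
1 & \rho_{12} & \cdots & \rho_{1n} \\
\rho_{21} & 1 & \cdots & \rho_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{n1} & \rho_{n2} & \cdots & 1
\end{pmatrix},
\]
where
\[
\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j},
\]
for $i, j = 1, 2, \ldots, n$.

1.4.2 Linear transformations

TERMINOLOGY: Suppose that $y_1, y_2, \ldots, y_n$ are random variables and that $a_1, a_2, \ldots, a_n$ are constants. Define $a = (a_1, a_2, \ldots, a_n)'$ and $y = (y_1, y_2, \ldots, y_n)'$. The random variable
\[
a'y = \sum_{i=1}^n a_i y_i
\]
is called a linear combination of $y_1, y_2, \ldots, y_n$.

Result RV34. If $a = (a_1, a_2, \ldots, a_n)'$ is a vector of constants and $y = (y_1, y_2, \ldots, y_n)'$ is a random vector with mean $\boldsymbol{\mu} = E(y)$, then $E(a'y) = a'\boldsymbol{\mu}$.
Proof. The quantity $a'y$ is a scalar so $E(a'y)$ is also a scalar. Note that
\[
E(a'y) = E\left(\sum_{i=1}^n a_i y_i\right) = \sum_{i=1}^n a_i E(y_i) = \sum_{i=1}^n a_i \mu_i = a'\boldsymbol{\mu}. \quad □
\]

Result RV35. Suppose that $y = (y_1, y_2, \ldots, y_n)'$ is a random vector with mean $\boldsymbol{\mu} = E(y)$, let $Z$ be a random matrix, and let $A$ and $B$ ($a$ and $b$) be nonrandom conformable matrices (vectors). Then,
1. $E(Ay) = A\boldsymbol{\mu}$
2. $E(a'Zb) = a'E(Z)b$
3. $E(AZB) = AE(Z)B$.

Result RV36. If $a = (a_1, a_2, \ldots, a_n)'$ is a vector of constants and $y = (y_1, y_2, \ldots, y_n)'$ is a random vector with mean $\boldsymbol{\mu} = E(y)$ and covariance matrix $\Sigma = \operatorname{cov}(y)$, then $\operatorname{var}(a'y) = a'\Sigma a$.
Proof. The quantity $a'y$ is a scalar random variable, and its variance is given by
\[
\operatorname{var}(a'y) = E[\{a'y - E(a'y)\}^2] = E[\{a'(y - \boldsymbol{\mu})\}^2] = E[\{a'(y - \boldsymbol{\mu})\}\{a'(y - \boldsymbol{\mu})\}].
\]
But note that $a'(y - \boldsymbol{\mu})$ is a scalar, and hence equals $(y - \boldsymbol{\mu})'a$. Using this fact, we can rewrite the last expectation to get
\[
E[\{a'(y - \boldsymbol{\mu})\}\{(y - \boldsymbol{\mu})'a\}] = a'E\{(y - \boldsymbol{\mu})(y - \boldsymbol{\mu})'\}a = a'\Sigma a. \quad □
\]

Result RV37. Suppose that $y = (y_1, y_2, \ldots, y_n)'$ is a random vector with covariance matrix $\Sigma = \operatorname{cov}(y)$, and let $a$ and $b$ be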
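conformable vectors of constants. Then $\operatorname{cov}(a'y, b'y) = a'\Sigma b$.
Proof. Exercise. □

Result RV38. Suppose that $y = (y_1, y_2, \ldots, y_n)'$ is a random vector with mean $\boldsymbol{\mu} = E(y)$ and covariance matrix $\Sigma = \operatorname{cov}(y)$. Let $b$, $A$, and $B$ denote nonrandom conformable vectors/matrices. Then
1. $E(Ay + b) = A\boldsymbol{\mu} + b$
2. $\operatorname{cov}(Ay + b) = A\Sigma A'$
3. $\operatorname{cov}(Ay, By) = A\Sigma B'$.

Example 1.24. Suppose that $y = (y_1, y_2, y_3)'$ has mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$ given by
\[
\boldsymbol{\mu} = \begin{pmatrix} 4 \\ 6 \\ 10 \end{pmatrix}
\quad \text{and} \quad
\Sigma = \begin{pmatrix} 8 & 5 & 0 \\ 5 & 12 & 4 \\ 0 & 4 & 9 \end{pmatrix}.
\]
(a) Find the mean and variance of $z = y_1 - y_2 + y_3$.
SOLUTION. Letting $a' = (1, -1, 1)$, we have that $z = a'y$. Thus, the mean of $z$ is
\[
E(z) = a'\boldsymbol{\mu} = \begin{pmatrix} 1 & -1 & 1 \end{pmatrix}\begin{pmatrix} 4 \\ 6 \\ 10 \end{pmatrix} = 8
\]
and the variance is
\[
\operatorname{var}(z) = a'\Sigma a = \begin{pmatrix} 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 8 & 5 & 0 \\ 5 & 12 & 4 \\ 0 & 4 & 9 \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} = 11.
\]
(b) Let
\[
A = \begin{pmatrix} 3 & 5 & 4 \\ 1 & 2 & 8 \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\]
Find $E(Ay + b)$ and $\operatorname{cov}(Ay + b)$.
SOLUTION.
\[
E(Ay + b) = A\boldsymbol{\mu} + b = \begin{pmatrix} 3 & 5 & 4 \\ 1 & 2 & 8 \end{pmatrix}\begin{pmatrix} 4 \\ 6 \\ 10 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 83 \\ 98 \end{pmatrix},
\]
\[
\operatorname{cov}(Ay + b) = A\Sigma A' = \begin{pmatrix} 3 & 5 & 4 \\ 1 & 2 & 8 \end{pmatrix}
\begin{pmatrix} 8 & 5 & 0 \\ 5 & 12 & 4 \\ 0 & 4 & 9 \end{pmatrix}
\begin{pmatrix} 3 & 1 \\ 5 & 2 \\ 4 & 8 \end{pmatrix}
= \begin{pmatrix} 826 & 679 \\ 679 & 780 \end{pmatrix}.
\]

1.4.3 Variance-covariance matrices

Result RV39. A variance-covariance matrix $\Sigma = \operatorname{cov}(y)$ is nonnegative definite.
Proof. Suppose that $y_{n \times 1}$ has variance-covariance matrix $\Sigma$. We need to show that $a'\Sigma a \geq 0$, for all $a \in \mathbb{R}^n$. Consider $x = a'y$, where $a$ is a conformable vector of constants. Then, $x$ is scalar and $\operatorname{var}(x) \geq 0$. But $\operatorname{var}(x) = \operatorname{var}(a'y) = a'\Sigma a$. Since $a$ is arbitrary, the result follows. □

Result RV40. Suppose that $y = (y_1, y_2, \ldots, y_n)'$ is a random vector with mean $\boldsymbol{\mu} = E(y)$ and covariance matrix $V$. Then,
\[
P\{(y - \boldsymbol{\mu}) \in \mathcal{C}(V)\} = 1.
\]
Proof. Without loss, take $\boldsymbol{\mu} = \mathbf{0}$, and let $M_V$ be the perpendicular projection matrix onto $\mathcal{C}(V)$. We know that $y = M_V y + (I - M_V)y$ and that
\[
E\{(I - M_V)y\} = (I - M_V)E(y) = \mathbf{0},
\]
since $\boldsymbol{\mu} = E(y) = \mathbf{0}$. Also,
\[
\operatorname{cov}\{(I - M_V)y\} = (I - M_V)V(I - M_V)' = (V - M_V V)(I - M_V)' = \mathbf{0},
\]
since $M_V V = V$. Thus, we have shown that $P\{(I - M_V)y = \mathbf{0}\} = 1$, which implies that $P(y = M_V y) = 1$. Since $M_V y \in \mathcal{C}(V)$, we are done. □

IMPLICATION: Result RV40 says that there exists a subset $\mathcal{C}(V) \subseteq \mathbb{R}^n$ that contains $y - \boldsymbol{\mu}$ with probability one (i.e., almost surely). If $V$ is positive semidefinite (psd), then $V$ is singular and the distribution of $y - \boldsymbol{\mu}$ is concentrated in a subspace of $\mathbb{R}^n$, where the subspace has dimension $r = r(V)$, $r < n$. In this situation, the pdf of $y$ may not exist.

Result RV41. Suppose that $x$, $y$, and $z$ are $n \times 1$ random vectors and that $x = y + z$. Then
1. $E(x) = E(y) + E(z)$
2. $\operatorname{cov}(x) = \operatorname{cov}(y) + \operatorname{cov}(z) + \operatorname{cov}(y, z) + \operatorname{cov}(z, y)$
3. if $y$ and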
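$z$ are uncorrelated, then $\operatorname{cov}(x) = \operatorname{cov}(y) + \operatorname{cov}(z)$.

1.4.4 Application: Linear prediction

CONVENTION: If $A_1$ and $A_2$ are $n \times n$ matrices, we write $A_1 \geq_{nnd} A_2$ if $A_1 - A_2$ is nonnegative definite (nnd) and $A_1 \geq_{pd} A_2$ if $A_1 - A_2$ is positive definite (pd).

SETTING ONE: Suppose that $y$ is a scalar random variable and that we would like to predict its value using $c \in \mathbb{R}$. Define the mean squared error of prediction to be $Q(c) = E\{(y - c)^2\}$. Note that
\[
Q(c) = E\{(y - c)^2\} = \operatorname{var}(y) + (\mu - c)^2,
\]
where $\mu = E(y)$. Thus, to minimize $Q(c)$, we take $c = \mu$.

SETTING TWO: Suppose that $y_{n \times 1}$ is a random vector and that we would like to predict its value using $c \in \mathbb{R}^n$. The mean squared error of prediction is
\[
Q(c) = E\{(y - c)(y - c)'\} = \operatorname{cov}(y) + (\boldsymbol{\mu} - c)(\boldsymbol{\mu} - c)' \geq_{nnd} \operatorname{cov}(y),
\]
since $(\boldsymbol{\mu} - c)(\boldsymbol{\mu} - c)' \geq_{nnd} \mathbf{0}$. Thus, $Q(c) \geq_{nnd} \operatorname{cov}(y)$, with equality when $\boldsymbol{\mu} = c$.

SETTING THREE: Suppose that $y_{n \times 1}$ is a random vector and that we would like to predict its value using the random vector $x_{k \times 1}$. Suppose further that $E(y) = \boldsymbol{\mu}_Y$, $E(x) = \boldsymbol{\mu}_X$, $\operatorname{cov}(y) = \Sigma_Y$, $\operatorname{cov}(x) = \Sigma_X$, $\operatorname{cov}(y, x) = \Sigma_{YX}$, and that $\Sigma_X$ is nonsingular. Define
\[
w = \boldsymbol{\mu}_Y + \Sigma_{YX}\Sigma_X^{-1}(x - \boldsymbol{\mu}_X)
\]
and $z = y - w$. It follows that $E(z) = \mathbf{0}$ and $\operatorname{cov}(z, x) = \mathbf{0}$ (verify!). Consider predicting $y_{n \times 1}$ by $a + Bx$, for some nonrandom $a_{n \times 1}$ and $B_{n \times k}$, and let
\[
Q(a, B) = E[\{y - (a + Bx)\}\{y - (a + Bx)\}'].
\]
Clearly, $Q(a, B) \geq_{nnd} \mathbf{0}$. Now write $Q(a, B)$ as
\[
Q(a, B) = E[\{y - w + w - (a + Bx)\}\{y - w + w - (a + Bx)\}'],
\]
where $w$ is defined as previously. Note that
\[
Q(a, B) = E[\{z + (w - a - Bx)\}\{z + (w - a - Bx)\}'].
\]
Now, since $w$ is a linear function of $x$, so is $w - a - Bx$. Also, since $z$ is uncorrelated with $x$, $z$ is also uncorrelated with $w - a - Bx$. Because $E(z) = \mathbf{0}$, the cross product terms vanish, and our last expression for $Q(a, B)$ becomes
\[
Q(a, B) = E(zz') + E\{(w - a - Bx)(w - a - Bx)'\} \geq_{nnd} E(zz') = \operatorname{cov}(z),
\]
since $E\{(w - a - Bx)(w - a - Bx)'\} \geq_{nnd} \mathbf{0}$. Thus, $Q(a, B) \geq_{nnd} \operatorname{cov}(z)$, with equality when $E\{(w - a - Bx)(w - a - Bx)'\} = \mathbf{0}$. If we choose $a$ and $B$ so that $w - a - Bx = \mathbf{0}$, then $Q(a, B)$ will be at a minimum. Thus, because $w = \boldsymbol{\mu}_Y + \Sigma_{YX}\Sigma_X^{-1}(x - \boldsymbol{\mu}_X)$, we want
\[
a = \boldsymbol{\mu}_Y - \Sigma_{YX}\Sigma_X^{-1}\boldsymbol{\mu}_X, \qquad B = \Sigma_{YX}\Sigma_X^{-1}.
\]
This shows that if we want to predict $y_{n \times 1}$ from $x_{k \times 1}$, a good linear predictor is $w = \boldsymbol{\mu}_Y + \Sigma_{YX}\Sigma_X^{-1}(x - \boldsymbol{\mu}_X)$. Also, the error of prediction $z = y - w$ is uncorrelated with $x$.

2 The Linear Least Squares Problem

Complementary reading from Monahan: Chapter 2.

INTRODUCTION: Consider the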
linear model $y = Xb + e$, where $E(e) = \mathbf{0}$. Suppose that we want to estimate $E(y) = Xb$. Note that $E(y) = Xb$ when $E(e) = \mathbf{0}$; in what follows, the assumption $\operatorname{cov}(e) = \sigma^2 I$ is not needed. If $E(y) = Xb$, then since $b$ is unknown, all we really know is that $E(y) \in \mathcal{C}(X)$. To estimate $E(y)$, it seems natural to take the vector in $\mathcal{C}(X)$ that is closest to $y$.

2.1 Least squares estimation

DEFINITION: An estimate $\hat{b}$ is a least squares estimate of $b$ if $X\hat{b}$ is the vector in $\mathcal{C}(X)$ that is closest to $y$. In other words, $\hat{b}$ is a least squares estimate of $b$ iff
\[
(y - X\hat{b})'(y - X\hat{b}) = \min_b \, (y - Xb)'(y - Xb).
\]

LEAST SQUARES: Let $b = (b_1, b_2, \ldots, b_p)'$ and define the error sum of squares
\[
Q(b) = (y - Xb)'(y - Xb),
\]
the squared distance from $y$ to $Xb$. $Q(b)$ is a quadratic function of $b$. From calculus, we know that points where $Q(b)$ is minimized satisfy
\[
\frac{\partial Q(b)}{\partial b} = \mathbf{0}, \quad \text{or, in other words,} \quad
\begin{pmatrix} \partial Q(b)/\partial b_1 \\ \partial Q(b)/\partial b_2 \\ \vdots \\ \partial Q(b)/\partial b_p \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\]
This problem can be tackled either algebraically or geometrically.

Result LS1. Let $a$ and $b$ be $p \times 1$ vectors and $A$ be a $p \times p$ matrix of constants. Then
\[
\frac{\partial a'b}{\partial b} = a \quad \text{and} \quad \frac{\partial b'Ab}{\partial b} = (A + A')b.
\]
Proof. See Monahan, pp 14. □

NOTE: In Result LS1, note that
\[
\frac{\partial b'Ab}{\partial b} = 2Ab
\]
if $A$ is symmetric.

NORMAL EQUATIONS: Simple calculations show that
\[
Q(b) = (y - Xb)'(y - Xb) = y'y - 2y'Xb + b'X'Xb.
\]
Using Result LS1, we have
\[
\frac{\partial Q(b)}{\partial b} = -2X'y + 2X'Xb,
\]
because $X'X$ is symmetric. Setting this expression equal to $\mathbf{0}$ and rearranging gives
\[
X'Xb = X'y.
\]
These are known as the normal equations. If $X'X$ is nonsingular, then the unique least squares estimate of $b$ is
\[
\hat{b} = (X'X)^{-1}X'y.
\]
However, when $X'X$ is singular, which can happen in certain parameterizations of ANOVA-like models (see Chapter 1), there can be multiple solutions to the normal equations. Having already proved algebraically that the normal equations are consistent, we know that the general form of the least squares estimate is
\[
\hat{b} = (X'X)^- X'y + [I - (X'X)^- X'X]z,
\]
where $z \in \mathbb{R}^p$.

2.2 Geometric considerations

CONSISTENCY: Recall that a linear system $Ax = c$ is consistent if there exists an $x^*$ such that $Ax^* = c$; that is, if $c \in \mathcal{C}(A)$. Applying this definition, we know that the
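normal equations are consistent if $X'y \in \mathcal{C}(X'X)$. Clearly, $X'y \in \mathcal{C}(X')$. Thus, we'll be able to establish consistency (geometrically) if we can show that $\mathcal{C}(X'X) = \mathcal{C}(X')$.

Result LS2. $\mathcal{N}(X'X) = \mathcal{N}(X)$.
Proof. Suppose that $w \in \mathcal{N}(X)$. Then $Xw = 0$ and $X'Xw = 0$ so that $w \in \mathcal{N}(X'X)$. Suppose that $w \in \mathcal{N}(X'X)$. Then $X'Xw = 0$ and $w'X'Xw = 0$. Thus, $\|Xw\|^2 = 0$, which implies that $Xw = 0$; i.e., $w \in \mathcal{N}(X)$. □

Result LS3. Suppose that $S_1$ and $T_1$ are orthogonal complements, as well as $S_2$ and $T_2$. If $S_1 \subseteq S_2$, then $T_2 \subseteq T_1$.
Proof. See Monahan, pp 244. □

CONSISTENCY: We use the previous two results to show that $\mathcal{C}(X'X) = \mathcal{C}(X')$. Take $S_1 = \mathcal{N}(X'X)$, $T_1 = \mathcal{C}(X'X)$, $S_2 = \mathcal{N}(X)$, and $T_2 = \mathcal{C}(X')$. We know that $S_1$ and $T_1$ ($S_2$ and $T_2$) are orthogonal complements. Because $\mathcal{N}(X'X) \subseteq \mathcal{N}(X)$, the last lemma says that $\mathcal{C}(X') \subseteq \mathcal{C}(X'X)$. But $\mathcal{C}(X'X) \subseteq \mathcal{C}(X')$ trivially, so we're done. Note additionally that
\[
\mathcal{C}(X'X) = \mathcal{C}(X') \Longrightarrow r(X'X) = r(X') = r(X). \quad □
\]

NOTE: We now state and prove a result that characterizes all solutions to the normal equations.

Result LS4. $Q(b) = (y - Xb)'(y - Xb)$ is minimized at $\hat{b}$ if and only if $\hat{b}$ is a solution to the normal equations.
Proof. ($\Longleftarrow$) Suppose that $\hat{b}$ is a solution to the normal equations. Then,
\[
\begin{aligned}
Q(b) &= (y - Xb)'(y - Xb) \\
&= (y - X\hat{b} + X\hat{b} - Xb)'(y - X\hat{b} + X\hat{b} - Xb) \\
&= (y - X\hat{b})'(y - X\hat{b}) + (X\hat{b} - Xb)'(X\hat{b} - Xb),
\end{aligned}
\]
since the cross product term $2(X\hat{b} - Xb)'(y - X\hat{b}) = 0$; verify this using the fact that $\hat{b}$ solves the normal equations. Thus, we have shown that $Q(b) = Q(\hat{b}) + z'z$, where $z = X\hat{b} - Xb$. Therefore, $Q(b) \geq Q(\hat{b})$ for all $b$ and, hence, $\hat{b}$ minimizes $Q(b)$. ($\Longrightarrow$) Now, suppose that $\tilde{b}$ minimizes $Q(b)$. We already know that $Q(\tilde{b}) \geq Q(\hat{b})$, where $\hat{b} = (X'X)^- X'y$, from the first part, but also $Q(\tilde{b}) \leq Q(\hat{b})$ because $\tilde{b}$ minimizes $Q(b)$. Thus, $Q(\tilde{b}) = Q(\hat{b})$. But because $Q(\tilde{b}) = Q(\hat{b}) + z'z$, where $z = X\hat{b} - X\tilde{b}$, it must be true that $z = X\hat{b} - X\tilde{b} = 0$; that is, $X\hat{b} = X\tilde{b}$. Thus,
\[
X'X\tilde{b} = X'X\hat{b} = X'y,
\]
since $\hat{b}$ is a solution to the normal equations. This shows that $\tilde{b}$ is also a solution to the normal equations. □

REMARK: In proving the last result, we have discovered a very important fact; namely, if $\hat{b}$ and $\tilde{b}$ solve the normal equations, then $X\hat{b} = X\tilde{b}$. In other words, $X\hat{b}$ is invariant to the choice of $\hat{b}$.

The following result ties least squares estimation to the notion of a perpendicular projection matrix. It also produces a general formula for the matrix.

Result LS5. An estimate $\hat{b}$ is a least squares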
estimate if and only if $X\hat{b} = My$, where $M$ is the perpendicular projection matrix onto $\mathcal{C}(X)$.
Proof. We will show that
\[
(y - Xb)'(y - Xb) = (y - My)'(y - My) + (My - Xb)'(My - Xb).
\]
Both terms on the right hand side are nonnegative, and the first term does not involve $b$. Thus, $(y - Xb)'(y - Xb)$ is minimized by minimizing $(My - Xb)'(My - Xb)$, the squared distance between $My$ and $Xb$. This distance is zero if and only if $My = Xb$, which proves the result. Now to show the above equation:
\[
\begin{aligned}
(y - Xb)'(y - Xb) &= (y - My + My - Xb)'(y - My + My - Xb) \\
&= (y - My)'(y - My) + \underbrace{(y - My)'(My - Xb)}_{(*)} \\
&\quad + \underbrace{(My - Xb)'(y - My)}_{(**)} + (My - Xb)'(My - Xb).
\end{aligned}
\]
It suffices to show that $(*)$ and $(**)$ are zero. To show that $(*)$ is zero, note that
\[
(y - My)'(My - Xb) = y'(I - M)(My - Xb) = [(I - M)y]'(My - Xb) = 0,
\]
because $(I - M)y \in \mathcal{N}(X')$ and $My - Xb \in \mathcal{C}(X)$. Similarly, $(**) = 0$ as well. □

Result LS6. The perpendicular projection matrix onto $\mathcal{C}(X)$ is given by
\[
M = X(X'X)^- X'.
\]
Proof. We know that $\hat{b} = (X'X)^- X'y$ is a solution to the normal equations, so it is a least squares estimate. But, by Result LS5, we know $X\hat{b} = My$ for every $y$. Because perpendicular projection matrices are unique, $M = X(X'X)^- X'$ as claimed. □

NOTATION: Monahan uses $P_X$ to denote the perpendicular projection matrix onto $\mathcal{C}(X)$. We will henceforth do the same; that is,
\[
P_X = X(X'X)^- X'.
\]

PROPERTIES: Let $P_X$ denote the perpendicular projection matrix onto $\mathcal{C}(X)$. Then
(a) $P_X$ is idempotent
(b) $P_X$ projects onto $\mathcal{C}(X)$
(c) $P_X$ is invariant to the choice of $(X'X)^-$
(d) $P_X$ is symmetric
(e) $P_X$ is unique.
You'll note that we have already proven (a), (b), (d), and (e); see Results M14-16. Part (c) must be true; otherwise, part (e) would not hold. However, we can prove (c) more rigorously.

Result LS7. If $(X'X)^-_1$ and $(X'X)^-_2$ are generalized inverses of $X'X$, then
1. $X(X'X)^-_1 X'X = X(X'X)^-_2 X'X = X$
2. $X(X'X)^-_1 X' = X(X'X)^-_2 X'$.
Proof. For $v \in \mathbb{R}^N$, let $v = v_1 + v_2$, where $v_1 \in \mathcal{C}(X)$ and $v_2 \perp \mathcal{C}(X)$. Since $v_1 \in \mathcal{C}(X)$, we know that $v_1 = Xd$, for some vector $d$. Then,
\[
v'X(X'X)^-_1 X'X = v_1'X(X'X)^-_1 X'X = d'X'X(X'X)^-_1 X'X = d'X'X = v'X,
\]
since $v_2 \perp \mathcal{C}(X)$. Since $v$ and $(X'X)^-_1$ were arbitrary, we have shown the first part. To show the second part, note that
\[
X(X'X)^-_1 X'v = X(X'X)^-_1 X'Xd = X(X'X)^-_2 X'Xd = X(X'X)^-_2 X'v.
\]
Since $v$ is arbitrary, the second part follows as well. □
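Result LS7 can be checked numerically on the one-way ANOVA layout of Example 1.20. The following is a minimal sketch in plain Python with exact rational arithmetic, offered as an illustration rather than as part of the original notes; the helper names `matmul` and `transpose` are hypothetical, and the design matrix `X` is written out here under the assumption that its columns are the intercept and the two group indicators (which reproduces the $X'X$ quoted in that example).

```python
from fractions import Fraction as F


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


# Assumed design matrix for Example 1.20 (n1 = 2, n2 = 3):
# intercept column followed by the two group indicator columns.
X = [[1, 1, 0],
     [1, 1, 0],
     [1, 0, 1],
     [1, 0, 1],
     [1, 0, 1]]
Xt = transpose(X)
XtX = matmul(Xt, X)  # [[5, 2, 3], [2, 2, 0], [3, 0, 3]]

# The two generalized inverses of X'X quoted in Example 1.20.
G1 = [[F(0), F(0), F(0)],
      [F(0), F(1, 2), F(0)],
      [F(0), F(0), F(1, 3)]]
G2 = [[F(1, 3), F(-1, 3), F(0)],
      [F(-1, 3), F(5, 6), F(0)],
      [F(0), F(0), F(0)]]

# Perpendicular projection matrix computed from each generalized inverse.
P1 = matmul(matmul(X, G1), Xt)
P2 = matmul(matmul(X, G2), Xt)

print(P1 == P2)                        # True: invariance to the choice of (X'X)^-
print(matmul(P1, X) == X)              # True: part 1 of Result LS7
print(matmul(P1, P1) == P1)            # True: idempotent
print(P1 == transpose(P1))             # True: symmetric
print(sum(P1[i][i] for i in range(5))) # 2, the rank of X (Result M25)
```

Exact fractions are used so that the equality checks are not blurred by floating point rounding; with floating point arithmetic one would compare entries up to a tolerance instead.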
Result LS8. Suppose that $X$ is an $N \times p$ matrix with rank $r \leq p$, and let $P_X$ be the perpendicular projection matrix onto $\mathcal{C}(X)$. Then $r(P_X) = r(X) = r$ and $r(I - P_X) = N - r$.
Proof. Note that $P_X$ is $N \times N$. We know that $\mathcal{C}(P_X) = \mathcal{C}(X)$, so the first part is obvious. To show the second part, recall that $I - P_X$ is the perpendicular projection matrix onto $\mathcal{N}(X')$, so it is idempotent. Thus,
\[
r(I - P_X) = \operatorname{tr}(I - P_X) = \operatorname{tr}(I) - \operatorname{tr}(P_X) = N - r(P_X) = N - r,
\]
because the trace operator is linear and because $P_X$ is idempotent as well. □

SUMMARY: Consider the linear model $y = Xb + e$, where $E(e) = \mathbf{0}$; in what follows, the $\operatorname{cov}(e) = \sigma^2 I$ assumption is not needed. We have shown that a least squares estimate of $b$ is given by
\[
\hat{b} = (X'X)^- X'y.
\]
This solution is not unique (unless $X'X$ is nonsingular). However,
\[
P_X y = X\hat{b} \equiv \hat{y}
\]
is unique. We call $\hat{y}$ the vector of fitted values. Geometrically, $\hat{y}$ is the point in $\mathcal{C}(X)$ that is closest to $y$. Now, recall that $I - P_X$ is the perpendicular projection matrix onto $\mathcal{N}(X')$. Note that
\[
(I - P_X)y = y - P_X y = y - \hat{y} \equiv \hat{e}.
\]
We call $\hat{e}$ the vector of residuals. Note that $\hat{e} \in \mathcal{N}(X')$. Because $\mathcal{C}(X)$ and $\mathcal{N}(X')$ are orthogonal complements, we know that $y$ can be uniquely decomposed as
\[
y = \hat{y} + \hat{e}.
\]
We also know that $\hat{y}$ and $\hat{e}$ are orthogonal vectors. Finally, note that
\[
\begin{aligned}
y'y = y'Iy &= y'(P_X + I - P_X)y \\
&= y'P_X y + y'(I - P_X)y \\
&= y'P_X P_X y + y'(I - P_X)(I - P_X)y \\
&= \hat{y}'\hat{y} + \hat{e}'\hat{e},
\end{aligned}
\]
since $P_X$ and $I - P_X$ are both symmetric and idempotent; i.e., they are both perpendicular projection matrices (but onto orthogonal spaces). This orthogonal decomposition of $y'y$ is often given in a tabular display called an analysis of variance (ANOVA) table.

ANOVA TABLE: Suppose that $y$ is $N \times 1$, $X$ is $N \times p$ with rank $r \leq p$, $b$ is $p \times 1$, and $e$ is $N \times 1$. An ANOVA table looks like

Source      df          SS
Model       $r$         $\hat{y}'\hat{y} = y'P_X y$
Residual    $N - r$     $\hat{e}'\hat{e} = y'(I - P_X)y$
Total       $N$         $y'y = y'Iy$

It is interesting to note that the sum of squares column, abbreviated "SS," catalogues 3 quadratic forms, $y'P_X y$, $y'(I - P_X)y$, and $y'Iy$. In turn, the degrees of freedom column, abbreviated "df," catalogues the ranks of the associated quadratic form matrices; i.e.,
\[
r(P_X) = r, \qquad r(I - P_X) = N - r, \qquad r(I) = N.
\]
The quantity $y'P_X y$ is called the (uncorrected) model sum of squares,
$y'(I - P_X)y$ is called the residual sum of squares, and $y'y$ is called the (uncorrected) total sum of squares.

VISUALIZATION: One can think about the geometry of least squares estimation in three dimensions (i.e., when $N = 3$). Consider your kitchen table and take one corner of the table to be the origin. Take $\mathcal{C}(X)$ as the two-dimensional subspace determined by the surface of the table, and let $y$ be any vector originating at the origin; i.e., any point in $\mathbb{R}^3$. The linear model says that $E(y) = Xb$, which just says that $E(y)$ is somewhere on the table. The least squares estimate $\hat{y} = X\hat{b} = P_X y$ is the perpendicular projection of $y$ onto the surface of the table. The residual vector $\hat{e} = (I - P_X)y$ is the vector starting at the origin, perpendicular to the surface of the table, that reaches the same height as $y$. Another way to think of the residual vector is to first connect $y$ and $P_X y$ with a line segment (that is perpendicular to the surface of the table). Then, shift the line segment along the surface (keeping it perpendicular) until the line segment has one end at the origin. The residual vector $\hat{e}$ is the perpendicular projection of $y$ onto $\mathcal{C}(I - P_X) = \mathcal{N}(X')$; that is, the projection onto the orthogonal complement of the table surface. The orthogonal complement $\mathcal{C}(I - P_X)$ is the one-dimensional space in the vertical direction that goes through the origin. Once you have these vectors in place, sums of squares arise from using the Pythagorean Theorem. □

A SIMPLE PPM: Suppose that $y_1, y_2, \ldots, y_N$ are iid with mean $E(y_i) = \mu$ and variance $\operatorname{var}(y_i) = \sigma^2$. As we saw in Chapter 1, we can write this in terms of the linear model $y = Xb + e$, where
\[
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad
X = \mathbf{1} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \quad
b = \mu, \quad
e = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_N \end{pmatrix}.
\]
The perpendicular projection matrix onto $\mathcal{C}(X)$ is given by
\[
P_{\mathbf{1}} = \mathbf{1}(\mathbf{1}'\mathbf{1})^-\mathbf{1}' = N^{-1}\mathbf{1}\mathbf{1}' = N^{-1}J,
\]
where $J$ is the $N \times N$ matrix of ones. $P_{\mathbf{1}}$ projects $y$ onto the space
\[
\mathcal{C}(P_{\mathbf{1}}) = \{z \in \mathbb{R}^N : z = (a, a, \ldots, a)';\ a \in \mathbb{R}\};
\]
in fact,
\[
P_{\mathbf{1}}y = N^{-1}Jy = \bar{y}\mathbf{1},
\]
where $\bar{y} = N^{-1}\sum_{i=1}^N y_i$. Note that $r(P_{\mathbf{1}}) = 1$. The perpendicular projection matrix $I - P_{\mathbf{1}}$ projects $y$ onto
\[
\mathcal{C}(I - P_{\mathbf{1}}) = \left\{z \in \mathbb{R}^N : z = (a_1, a_2, \ldots, a_N)';\ a_i \in \mathbb{R},\ \sum_{i=1}^N a_i = 0\right\}.
\]
In particular,
\[
(I - P_{\mathbf{1}})y = y - P_{\mathbf{1}}y = y - \bar{y}\mathbf{1} = \begin{pmatrix} y_1 - \bar{y} \\ y_2 - \bar{y} \\ \vdots \\ y_N - \bar{y} \end{pmatrix},
\]
the vector which contains the deviations from the mean. Note that $r(I - P_{\mathbf{1}}) = N - 1$.

REMARK: The matrix $P_{\mathbf{1}}$ plays an important role in linear models. Here is why. Most linear models, when written out in non-matrix notation, contain an intercept term. For example, in simple linear regression,
\[
y_i = \beta_0 + \beta_1 x_i + \epsilon_i,
\]
or in ANOVA-type models, like
\[
y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk},
\]
the intercept terms are $\beta_0$ and $\mu$, respectively. In the corresponding design matrices, the first column of $X$ is $\mathbf{1}$. If we discard the "other" terms like $\beta_1 x_i$ and $\alpha_i + \beta_j + \gamma_{ij}$ in the models above, then we have a reduced model of the form $y_i = \mu + \epsilon_i$; that is, a model that relates $y_i$ to its overall mean, or in matrix notation $y = \mathbf{1}\mu + e$. The perpendicular projection matrix onto $\mathcal{C}(\mathbf{1})$ is $P_{\mathbf{1}}$ and
\[
y'P_{\mathbf{1}}y = y'P_{\mathbf{1}}P_{\mathbf{1}}y = (P_{\mathbf{1}}y)'(P_{\mathbf{1}}y) = N\bar{y}^2.
\]
This is the model sum of squares for the model $y_i = \mu + \epsilon_i$; that is, $y'P_{\mathbf{1}}y$ is the sum of squares that arises from fitting the mean. Now, consider a general linear model of the form $y = Xb + e$, where $E(e) = \mathbf{0}$, and suppose that the first column of $X$ is $\mathbf{1}$. In general, we know that
\[
y'y = y'Iy = y'P_X y + y'(I - P_X)y.
\]
Subtracting $y'P_{\mathbf{1}}y$ from both sides, we get
\[
y'(I - P_{\mathbf{1}})y = y'(P_X - P_{\mathbf{1}})y + y'(I - P_X)y.
\]
The quantity $y'(I - P_{\mathbf{1}})y$ is called the corrected total sum of squares and $y'(P_X - P_{\mathbf{1}})y$ is called the corrected model sum of squares. The term "corrected" is understood to mean that we have removed the effects of "fitting the mean." This is important because this is the sum of squares breakdown that is commonly used; i.e.,

Source               df          SS
Model (Corrected)    $r - 1$     $y'(P_X - P_{\mathbf{1}})y$
Residual             $N - r$     $y'(I - P_X)y$
Total (Corrected)    $N - 1$     $y'(I - P_{\mathbf{1}})y$

In ANOVA models, the corrected model sum of squares $y'(P_X - P_{\mathbf{1}})y$ is often broken down into smaller components which correspond to different parts; e.g., orthogonal contrasts, main effects, interaction terms, etc. Finally, the degrees of freedom are the corresponding ranks of $P_X - P_{\mathbf{1}}$, $I - P_X$, and $I - P_{\mathbf{1}}$ (verify!).

NOTE: In the general linear model $y = Xb + e$, the residual vector from the least squares fit $\hat{e} = (I - P_X)y \in \mathcal{N}(X')$, so $\hat{e}'X = \mathbf{0}$; that is, the residuals in a least squares fit are
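orthogonal to the columns of $X$, since the columns of $X$ are in $\mathcal{C}(X)$. As a special case, note that if $\mathbf{1} \in \mathcal{C}(X)$, which is true of all linear models with an intercept term, then
\[
\hat{e}'\mathbf{1} = \sum_{i=1}^N \hat{e}_i = 0;
\]
that is, the sum of the residuals from a least squares fit is zero. This is not necessarily true of models for which $\mathbf{1} \notin \mathcal{C}(X)$.

Result LS9. If $\mathcal{C}(W) \subset \mathcal{C}(X)$, then $P_X - P_W$ is the perpendicular projection matrix onto $\mathcal{C}[(I - P_W)X]$.
Proof. It suffices to show that (a) $P_X - P_W$ is symmetric and idempotent and that (b) $\mathcal{C}(P_X - P_W) = \mathcal{C}[(I - P_W)X]$. First note that $P_X P_W = P_W$ because the columns of $P_W$ are in $\mathcal{C}(W) \subset \mathcal{C}(X)$. By symmetry, $P_W P_X = P_W$. Now,
\[
(P_X - P_W)(P_X - P_W) = P_X^2 - P_X P_W - P_W P_X + P_W^2 = P_X - P_W - P_W + P_W = P_X - P_W.
\]
Thus, $P_X - P_W$ is idempotent. Also, $(P_X - P_W)' = P_X' - P_W' = P_X - P_W$, so $P_X - P_W$ is symmetric. Thus, $P_X - P_W$ is a perpendicular projection matrix onto $\mathcal{C}(P_X - P_W)$. Suppose that $v \in \mathcal{C}(P_X - P_W)$; i.e., $v = (P_X - P_W)d$, for some $d$. Write $d = d_1 + d_2$, where $d_1 \in \mathcal{C}(X)$ and $d_2 \in \mathcal{N}(X')$; that is, $d_1 = Xa$, for some $a$, and $X'd_2 = 0$. Then,
\[
\begin{aligned}
v = (P_X - P_W)(d_1 + d_2) &= (P_X - P_W)(Xa + d_2) \\
&= P_X Xa + P_X d_2 - P_W Xa - P_W d_2 \\
&= Xa + 0 - P_W Xa - 0 \\
&= (I - P_W)Xa \in \mathcal{C}[(I - P_W)X].
\end{aligned}
\]
Thus, $\mathcal{C}(P_X - P_W) \subseteq \mathcal{C}[(I - P_W)X]$. Now, suppose that $w \in \mathcal{C}[(I - P_W)X]$. Then $w = (I - P_W)Xc$, for some $c$. Thus,
\[
w = Xc - P_W Xc = P_X Xc - P_W Xc = (P_X - P_W)Xc \in \mathcal{C}(P_X - P_W).
\]
This shows that $\mathcal{C}[(I - P_W)X] \subseteq \mathcal{C}(P_X - P_W)$. □

TERMINOLOGY: Suppose that $V$ is a vector space and that $S$ is a subspace of $V$; i.e., $S \subset V$. The subspace
\[
S^\perp_V = \{z \in V : z \perp S\}
\]
is called the orthogonal complement of $S$ with respect to $V$. If $V = \mathbb{R}^n$, then $S^\perp_V = S^\perp$ is simply referred to as the orthogonal complement of $S$.

Result LS10. If $\mathcal{C}(P_W) \subset \mathcal{C}(P_X)$, then $\mathcal{C}(P_X - P_W)$ is the orthogonal complement of $\mathcal{C}(P_W)$ with respect to $\mathcal{C}(P_X)$; that is,
\[
\mathcal{C}(P_X - P_W) = \mathcal{C}(P_W)^\perp_{\mathcal{C}(P_X)}.
\]
Proof. $\mathcal{C}(P_X - P_W) \perp \mathcal{C}(P_W)$ because $(P_X - P_W)P_W = P_X P_W - P_W^2 = P_W - P_W = 0$. Because $\mathcal{C}(P_X - P_W) \subset \mathcal{C}(P_X)$, $\mathcal{C}(P_X - P_W)$ is contained in the orthogonal complement of $\mathcal{C}(P_W)$ with respect to $\mathcal{C}(P_X)$. Now suppose that $v \in \mathcal{C}(P_X)$ and $v \perp \mathcal{C}(P_W)$. Then,
\[
v = P_X v = (P_X - P_W)v + P_W v = (P_X - P_W)v \in \mathcal{C}(P_X - P_W),
\]
showing that the orthogonal complement of $\mathcal{C}(P_W)$ with respect to $\mathcal{C}(P_X)$ is contained in $\mathcal{C}(P_X - P_W)$. □

REMARK: The preceding two results are important for hypothesis testing in linear models. Consider the linear models
\[
y = Xb + e \quad \text{and} \quad y = Wc + e,
\]
where $\mathcal{C}(W) \subset \mathcal{C}(X)$.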
As we will learn later, the condition $\mathcal{C}(W) \subset \mathcal{C}(X)$ implies that the model $y = Wc + e$ is a reduced model when compared to $y = Xb + e$, sometimes called the full model. If $E(e) = \mathbf{0}$, then, if the full model is the true model,
$$E(P_X y) = P_X E(y) = P_X Xb = Xb \in \mathcal{C}(X).$$
Similarly, if the reduced model is true, $E(P_W y) = Wc \in \mathcal{C}(W)$. Note that if the reduced model $y = Wc + e$ is true, then the full model $y = Xb + e$ is also true since $\mathcal{C}(W) \subset \mathcal{C}(X)$. Thus, if the reduced model is true, $P_X y$ and $P_W y$ are attempting to estimate the same thing and their difference $(P_X - P_W)y$ should thus be small. On the other hand, if the reduced model is not true, then $P_X y$ and $P_W y$ are estimating different things and one would expect $(P_X - P_W)y$ to be large. The question about whether or not to "accept" the reduced model as plausible thus hinges on deciding whether or not $(P_X - P_W)y$, the perpendicular projection of $y$ onto $\mathcal{C}(P_X - P_W) = \mathcal{C}(P_W)^{\perp}_{\mathcal{C}(P_X)}$, is large or small.

2.3 Reparameterization

REMARK: For estimation in the general linear model $y = Xb + e$, where $E(e) = \mathbf{0}$, we can only learn about $b$ through $Xb \in \mathcal{C}(X)$. Thus, the crucial item needed is $P_X$, the perpendicular projection matrix onto $\mathcal{C}(X)$. For convenience, we call $\mathcal{C}(X)$ the estimation space. We call $\mathcal{N}(X')$ the error space. $I - P_X$ is the perpendicular projection matrix onto the error space.

IMPORTANT: In a profound sense, any two linear models with the same estimation space are the same model, and the models are said to be reparameterizations of each other. Any two such models will give the same predicted values, the same residuals, the same ANOVA table, etc. In particular, suppose that we have two linear models for a vector of observations $y$:
$$y = Xb + e \quad \text{and} \quad y = Wc + e.$$
If $\mathcal{C}(X) = \mathcal{C}(W)$, then $P_X$ does not depend on which of $X$ or $W$ is used; it depends only on $\mathcal{C}(X) = \mathcal{C}(W)$. As we will find out, the least squares estimate of $E(y)$ is
$$\hat{y} = P_X y = X\hat{b} = W\hat{c}.$$

IMPLICATION: Basically, the $b$ parameters in the model $y = Xb + e$, where $E(e) = \mathbf{0}$, are either a convenience or a nuisance, depending on what we are trying to do. In fact, reparameterizations are most commonly used to exploit the clarity of one model and the computational ease of the
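other. The essence of the model is that $E(y) \in \mathcal{C}(X)$. As long as we do not change $\mathcal{C}(X)$, we can change $X$ itself to suit our convenience.

Example 2.1. Recall the simple linear regression model from Chapter 1 given by
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i,$$
for $i = 1, 2, \ldots, N$. Although not critical for this discussion, we will assume that $\epsilon_1, \epsilon_2, \ldots, \epsilon_N$ are uncorrelated random variables with mean 0 and common variance $\sigma^2 > 0$. Recall that, in matrix notation,
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \quad b = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad e = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_N \end{pmatrix}.$$
As long as $(x_1, x_2, \ldots, x_N)'$ is not a multiple of $\mathbf{1}_N$ and at least one $x_i \neq 0$, then $r(X) = 2$ and $(X'X)^{-1}$ exists. Straightforward calculations show that
$$X'X = \begin{pmatrix} N & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}, \quad (X'X)^{-1} = \begin{pmatrix} \frac{1}{N} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} & -\frac{\bar{x}}{\sum_i (x_i - \bar{x})^2} \\ -\frac{\bar{x}}{\sum_i (x_i - \bar{x})^2} & \frac{1}{\sum_i (x_i - \bar{x})^2} \end{pmatrix}, \quad \text{and} \quad X'y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}.$$
Thus, the unique least squares estimator is given by
$$\hat{b} = (X'X)^{-1}X'y = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \begin{pmatrix} \bar{y} - \hat{\beta}_1 \bar{x} \\ \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \end{pmatrix}.$$
For the simple linear regression model, it can be shown (verify!) that the perpendicular projection matrix $P_X$ is given by
$$P_X = X(X'X)^{-1}X' = \begin{pmatrix}
\frac{1}{N} + \frac{(x_1 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} & \frac{1}{N} + \frac{(x_1 - \bar{x})(x_2 - \bar{x})}{\sum_i (x_i - \bar{x})^2} & \cdots & \frac{1}{N} + \frac{(x_1 - \bar{x})(x_N - \bar{x})}{\sum_i (x_i - \bar{x})^2} \\
\frac{1}{N} + \frac{(x_1 - \bar{x})(x_2 - \bar{x})}{\sum_i (x_i - \bar{x})^2} & \frac{1}{N} + \frac{(x_2 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} & \cdots & \frac{1}{N} + \frac{(x_2 - \bar{x})(x_N - \bar{x})}{\sum_i (x_i - \bar{x})^2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{1}{N} + \frac{(x_1 - \bar{x})(x_N - \bar{x})}{\sum_i (x_i - \bar{x})^2} & \frac{1}{N} + \frac{(x_2 - \bar{x})(x_N - \bar{x})}{\sum_i (x_i - \bar{x})^2} & \cdots & \frac{1}{N} + \frac{(x_N - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}
\end{pmatrix}.$$
A reparameterization of the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ is
$$y_i = \gamma_0 + \gamma_1 (x_i - \bar{x}) + \epsilon_i,$$
or $y = Wc + e$, where
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad W = \begin{pmatrix} 1 & x_1 - \bar{x} \\ 1 & x_2 - \bar{x} \\ \vdots & \vdots \\ 1 & x_N - \bar{x} \end{pmatrix}, \quad c = \begin{pmatrix} \gamma_0 \\ \gamma_1 \end{pmatrix}, \quad e = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_N \end{pmatrix}.$$
With
$$U = \begin{pmatrix} 1 & -\bar{x} \\ 0 & 1 \end{pmatrix},$$
we have $W = XU$ and $X = WU^{-1}$ (verify!), so that $\mathcal{C}(X) = \mathcal{C}(W)$. Moreover, $E(y) = Xb = Wc = XUc$. Taking $P' = (X'X)^{-1}X'$ leads to $b = P'Xb = P'XUc = Uc$; i.e.,
$$\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} \gamma_0 - \gamma_1 \bar{x} \\ \gamma_1 \end{pmatrix}.$$
To find the least squares estimator for $c$ in the reparameterized model, observe that
$$W'W = \begin{pmatrix} N & 0 \\ 0 & \sum_i (x_i - \bar{x})^2 \end{pmatrix} \quad \text{and} \quad (W'W)^{-1} = \begin{pmatrix} \frac{1}{N} & 0 \\ 0 & \frac{1}{\sum_i (x_i - \bar{x})^2} \end{pmatrix}.$$
Note that $(W'W)^{-1}$ is diagonal; this is one of the benefits of working with this parameterization. The least squares estimator of $c$ is given by
$$\hat{c} = (W'W)^{-1}W'y = \begin{pmatrix} \hat{\gamma}_0 \\ \hat{\gamma}_1 \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \end{pmatrix},$$
which is different from $\hat{b}$. However, it can be shown (verify!) that the perpendicular projection matrix onto $\mathcal{C}(W)$ is
$$P_W = W(W'W)^{-1}W' = \begin{pmatrix}
\frac{1}{N} + \frac{(x_1 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} & \frac{1}{N} + \frac{(x_1 - \bar{x})(x_2 - \bar{x})}{\sum_i (x_i - \bar{x})^2} & \cdots & \frac{1}{N} + \frac{(x_1 - \bar{x})(x_N - \bar{x})}{\sum_i (x_i - \bar{x})^2} \\
\frac{1}{N} + \frac{(x_1 - \bar{x})(x_2 - \bar{x})}{\sum_i (x_i - \bar{x})^2} & \frac{1}{N} + \frac{(x_2 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} & \cdots & \frac{1}{N} + \frac{(x_2 - \bar{x})(x_N - \bar{x})}{\sum_i (x_i - \bar{x})^2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{1}{N} + \frac{(x_1 - \bar{x})(x_N - \bar{x})}{\sum_i (x_i - \bar{x})^2} & \frac{1}{N} + \frac{(x_2 - \bar{x})(x_N - \bar{x})}{\sum_i (x_i - \bar{x})^2} & \cdots & \frac{1}{N} + \frac{(x_N - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}
\end{pmatrix},$$
which is the same as $P_X$. Thus, the fitted values will be the same; i.e., $\hat{y} = P_X y = X\hat{b} = W\hat{c} = P_W y$, so the analysis will be the same under both parameterizations. $\square$

2.4 Gram-Schmidt orthonormalization

MOTIVATION: Orthogonality is an important concept in linear models. One instance where orthogonality is useful is in the analysis of variance.

Example 2.2. Consider the one-way ANOVA model
$$y_{ij} = \mu + \alpha_i + \epsilon_{ij},$$
for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n_i$, so that the design matrix is
$$X_{N \times p} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{1}_{n_a} & \mathbf{0}_{n_a} & \mathbf{0}_{n_a} & \cdots & \mathbf{1}_{n_a}
\end{pmatrix},$$
where $p = a + 1$ and $N = \sum_i n_i$. Straightforward calculations (verify!) show that the perpendicular projection matrix onto $\mathcal{C}(X)$ is given by the $N \times N$ matrix
$$P_X = \text{Blk Diag}(n_i^{-1} J_{n_i \times n_i}),$$
where $J_{n_i \times n_i}$ is the $n_i \times n_i$ matrix of ones and "Blk Diag" stands for "block diagonal." For example, if $a = 3$, $n_1 = n_2 = 2$ and $n_3 = 3$, then $N = 7$ and
$$P_X = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 1/2 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 1/2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/3 & 1/3 & 1/3 \\
0 & 0 & 0 & 0 & 1/3 & 1/3 & 1/3 \\
0 & 0 & 0 & 0 & 1/3 & 1/3 & 1/3
\end{pmatrix}_{7 \times 7}.$$
We have seen that $P_{\mathbf{1}} = N^{-1} J_{N \times N}$ is the perpendicular projection matrix responsible for removing the effects of the intercept term; in this model, the intercept term is $\mu$. We have also seen that $P_X - P_{\mathbf{1}}$ is the perpendicular projection matrix which projects $y$ onto the orthogonal complement of $\mathcal{C}(\mathbf{1})$ with respect to $\mathcal{C}(X)$, a subspace of dimension $r(P_X - P_{\mathbf{1}}) = a - 1$. The quantity $y'(P_X - P_{\mathbf{1}})y$ is the corrected model sum of squares for the $a$ treatments. I'll leave it to you to verify that
$$y'(P_X - P_{\mathbf{1}})y = \sum_{i=1}^a n_i (\bar{y}_{i+} - \bar{y}_{++})^2.$$
The quantity $y'(P_X - P_{\mathbf{1}})y$ is useful. An uninteresting use of this quantity involves testing the hypothesis that all the $\alpha$'s are equal; i.e., testing $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a$. Informally, this is done by comparing the size of $y'(P_X - P_{\mathbf{1}})y$ to the size of $y'(I - P_X)y$, the residual sum of squares, while adjusting for the ranks of $P_X - P_{\mathbf{1}}$ and $I - P_X$; i.e., $a - 1$ and $N - a$. It is more useful and more interesting to break this quantity up into smaller pieces and test more refined hypotheses that correspond to the pieces. One way to do this is to break up $y'(P_X - P_{\mathbf{1}})y$ into $a - 1$ components:
$$y'(P_X - P_{\mathbf{1}})y = y'M_1 y + y'M_2 y + \cdots + y'M_{a-1} y,$$
where $M_i M_j = 0$, for all $i \neq j$, and $M_1, M_2, \ldots, M_{a-1}$ are perpendicular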
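projection matrices onto $a - 1$ orthogonal subspaces of $\mathcal{C}(P_X - P_{\mathbf{1}})$. The sums of squares $y'M_i y$, $i = 1, 2, \ldots, a - 1$, each have 1 degree of freedom and can be used to test orthogonal contrasts. Breaking up $\mathcal{C}(P_X - P_{\mathbf{1}})$ in this fashion can be done using the Gram-Schmidt orthonormalization procedure. Furthermore, with additional assumptions that we will introduce later, the quadratic forms $y'M_i y$, $i = 1, 2, \ldots, a - 1$, will be shown to be independent random variables.

GRAM-SCHMIDT: The Gram-Schmidt procedure is a method for orthonormalizing a set of basis vectors. Let $\mathcal{V}$ be a vector space with basis $\{u_1, u_2, \ldots, u_r\}$. For $s = 1, 2, \ldots, r$, define inductively
$$v_1 = \frac{u_1}{\|u_1\|}, \qquad w_s = u_s - \sum_{i=1}^{s-1} (u_s'v_i)v_i, \qquad v_s = \frac{w_s}{\|w_s\|}.$$
Then, $\{v_1, v_2, \ldots, v_r\}$ is an orthonormal basis for $\mathcal{V}$, where $v_s \in \text{span}\{u_1, u_2, \ldots, u_s\}$.

Example 2.3. Consider the vector space $\mathcal{V} = \mathbb{R}^3$ and the vectors $u_1 = (1, 1, 1)'$, $u_2 = (0, 1, 1)'$, and $u_3 = (0, 0, 1)'$. Write $U$ as
$$U = (u_1 \ u_2 \ u_3) = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}$$
and note that $\{u_1, u_2, u_3\}$ is a basis for $\mathcal{V}$ since $U$ is full rank and any vector in $\mathbb{R}^3$ can be written as a linear combination of $u_1$, $u_2$, and $u_3$. We now use Gram-Schmidt to orthonormalize the basis.

Step 1: $\|u_1\| = \sqrt{3}$ and
$$v_1 = \frac{u_1}{\|u_1\|} = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$

Step 2:
$$w_2 = u_2 - (u_2'v_1)v_1 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} - \frac{2}{\sqrt{3}} \cdot \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -2/3 \\ 1/3 \\ 1/3 \end{pmatrix}, \qquad \|w_2\| = \frac{\sqrt{6}}{3} \ \Longrightarrow \ v_2 = \frac{w_2}{\|w_2\|} = \frac{1}{\sqrt{6}} \begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix}.$$

Step 3:
$$w_3 = u_3 - (u_3'v_1)v_1 - (u_3'v_2)v_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} - \frac{1}{3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \frac{1}{6} \begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ -1/2 \\ 1/2 \end{pmatrix}, \qquad \|w_3\| = \frac{1}{\sqrt{2}} \ \Longrightarrow \ v_3 = \frac{w_3}{\|w_3\|} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}.$$

Thus, an orthonormal basis for $\mathcal{V} = \mathbb{R}^3$ is $\{v_1, v_2, v_3\}$, where
$$v_1 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad v_2 = \frac{1}{\sqrt{6}} \begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix}, \quad \text{and} \quad v_3 = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}. \ \square$$

REMARK: The following important result merges the ideas of a perpendicular projection matrix and an orthonormal basis.

Result LS11. Let $\{o_1, o_2, \ldots, o_r\}$ be an orthonormal basis for $\mathcal{C}(X)$ and $O = (o_1 \ o_2 \ \cdots \ o_r)$. Then $OO' = \sum_{i=1}^r o_i o_i'$ is the perpendicular projection matrix onto $\mathcal{C}(X)$.

Proof. $OO'$ is clearly symmetric. $OO'$ is idempotent too, since $OO'OO' = O I_r O' = OO'$. It suffices to show that $\mathcal{C}(OO') = \mathcal{C}(X)$. Clearly, $\mathcal{C}(OO') \subseteq \mathcal{C}(O) = \mathcal{C}(X)$, because $\{o_1, o_2, \ldots, o_r\}$ is an orthonormal basis for $\mathcal{C}(X)$. Suppose that $v \in \mathcal{C}(O)$. Then $v = Ob$, for some $b$. Thus, $v = Ob = OO'Ob \in \mathcal{C}(OO')$, which shows that $\mathcal{C}(X) \subseteq \mathcal{C}(OO')$. $\square$

EXERCISE: For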
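the matrix $U$ in Example 2.3, show that, with $O = (v_1 \ v_2 \ v_3)$ denoting the matrix of orthonormalized basis vectors,
$$OO' = \begin{pmatrix}
\frac{1}{\sqrt{3}} & -\frac{2}{\sqrt{6}} & 0 \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}}
\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\
-\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\
0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{pmatrix} = U(U'U)^{-1}U' = P_U,$$
illustrating the use of Result LS11.

DISCUSSION: Let $\mathcal{V}$ be a vector space with basis $\{u_1, u_2, \ldots, u_r\}$. The Gram-Schmidt procedure orthonormalizes the basis to produce $\{v_1, v_2, \ldots, v_r\}$. Define
$$V_{s-1} = \sum_{i=1}^{s-1} v_i v_i'.$$
Applying Result LS11, we know that $V_{s-1}$ is the perpendicular projection matrix onto $\mathcal{C}(u_1, u_2, \ldots, u_{s-1})$. Thus, for $s = 2, 3, \ldots, r$,
$$w_s = (I - V_{s-1})u_s$$
is the perpendicular projection of $u_s$ onto the orthogonal complement of $\mathcal{C}(u_1, u_2, \ldots, u_{s-1})$; in other words, $u_s$ is projected onto a space orthogonal to the space spanned by the $u$'s that preceded it. We start with $\mathcal{C}(u_1)$. Then, $u_2$ is projected onto $\mathcal{C}(u_1)^{\perp}$, $u_3$ is projected onto $\mathcal{C}(u_1, u_2)^{\perp}$, $u_4$ is projected onto $\mathcal{C}(u_1, u_2, u_3)^{\perp}$, ..., and $u_r$ is projected onto $\mathcal{C}(u_1, u_2, \ldots, u_{r-1})^{\perp}$. In this sense, the Gram-Schmidt procedure allows us to "break up" any vector space $\mathcal{V}$ into $r$ subspaces, each one orthogonal to the $r - 1$ others. But, note that
$$w_s = (I - V_{s-1})u_s = u_s - \sum_{i=1}^{s-1} (u_s'v_i)v_i.$$
Finally, $v_s$ is just $w_s$ normalized.

DISCUSSION: We now return to Example 2.2 and take $M_* = P_X - P_{\mathbf{1}}$. We now break up $\mathcal{C}(M_*)$ into $a - 1$ orthogonal subspaces. Take an orthonormal basis for $\mathcal{C}(M_*)$, say, $\{o_1, o_2, \ldots, o_{a-1}\}$. Note that, using Gram-Schmidt, $o_1$ can be any normalized vector in $\mathcal{C}(M_*)$, $o_2$ can be any normalized vector in $\mathcal{C}(M_*)$ orthogonal to $o_1$, and so on. Set $O = (o_1 \ o_2 \ \cdots \ o_{a-1})$. From Result LS11, we have
$$M_* = OO' = \sum_{i=1}^{a-1} o_i o_i'.$$
Take $M_i = o_i o_i'$. Then, $M_i$ is a perpendicular projection matrix in its own right and $M_i M_j = 0$, for $i \neq j$, because of orthogonality. Finally, note that
$$y'M_* y = y'(M_1 + M_2 + \cdots + M_{a-1})y = y'M_1 y + y'M_2 y + \cdots + y'M_{a-1} y.$$
This demonstrates that the corrected model sum of squares $y'M_* y$ can be written as the sum of $a - 1$ pieces, as claimed in Example 2.2. Finally, note that there are many orthonormal bases at one's disposal; i.e., $y'M_* y$ can be broken up in many different ways.

REMARK: Monahan (Section 2.4) uses Gram-Schmidt to create an orthogonal reparameterization in a full rank regression model $y = Xb + e$. The primary goal is to come up with a matrix $U$ having orthogonal columns with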
$\mathcal{C}(U) = \mathcal{C}(X)$.

GRAM-SCHMIDT: Suppose that the $N \times p$ design matrix $X$ is denoted by $X = (x_1 \ x_2 \ \cdots \ x_p)$.

Step 1: Set $u_1 = x_1$.

Step 2: Regress $x_2$ on $u_1$; that is, fit the model $x_{i2} = \beta_{12} u_{i1} + \epsilon_i$, for $i = 1, 2, \ldots, N$. Compute the least squares estimate of $\beta_{12}$, given by $\hat{\beta}_{12} = (u_1'u_1)^{-1}u_1'x_2$. Compute the residual vector $x_2 - \hat{\beta}_{12}u_1$. Call this $u_2$.

Step 3: Regress $x_3$ on $u_1$ and $u_2$; that is, fit the model $x_{i3} = \beta_{13} u_{i1} + \beta_{23} u_{i2} + \epsilon_i$, for $i = 1, 2, \ldots, N$, or, in matrix notation, $x_3 = U_2 b_3 + e$, where $U_2 = (u_1 \ u_2)$ and $b_3 = (\beta_{13}, \beta_{23})'$. Compute the least squares estimate of $b_3$, given by $\hat{b}_3 = (U_2'U_2)^{-1}U_2'x_3$. Compute the residual vector $x_3 - U_2\hat{b}_3$. Call this $u_3$.

Step 4: Regress $x_4$ on $u_1$, $u_2$, and $u_3$; that is, fit the model $x_{i4} = \beta_{14} u_{i1} + \beta_{24} u_{i2} + \beta_{34} u_{i3} + \epsilon_i$, for $i = 1, 2, \ldots, N$, or, in matrix notation, $x_4 = U_3 b_4 + e$, where $U_3 = (u_1 \ u_2 \ u_3)$ and $b_4 = (\beta_{14}, \beta_{24}, \beta_{34})'$. Compute the least squares estimate of $b_4$, given by $\hat{b}_4 = (U_3'U_3)^{-1}U_3'x_4$. Compute the residual vector $x_4 - U_3\hat{b}_4$. Call this $u_4$.

Step 5: Continue until the last residual vector produced is $u_p$ and define $U = (u_1 \ u_2 \ \cdots \ u_p)$.

NOTE: At each step of the algorithm,
• the matrix $(U_j'U_j)^{-1}$ is easy to compute since the columns of $U_j$ are orthogonal and, hence, $U_j'U_j$ is diagonal.
• $u_j$ is the projection of $x_j$ onto $\mathcal{C}(u_1, u_2, \ldots, u_{j-1})^{\perp}$. Thus, the columns of the completed matrix $U$ will be orthogonal; that is, $U'U = D$, where $D = \text{diag}(u_1'u_1, u_2'u_2, \ldots, u_p'u_p)$. A normalized version of $U$ is obtained by normalizing each column.

NOTE: Monahan defines the $p \times p$ upper triangular matrix $S$ as follows:
$$S = \begin{pmatrix}
1 & \hat{\beta}_{12} & \hat{\beta}_{13} & \cdots & \hat{\beta}_{1p} \\
0 & 1 & \hat{\beta}_{23} & \cdots & \hat{\beta}_{2p} \\
0 & 0 & 1 & \cdots & \hat{\beta}_{3p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \hat{\beta}_{p-1,p} \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix},$$
so that $X = US \Longrightarrow \mathcal{C}(X) \subseteq \mathcal{C}(U)$. But, also $U = XS^{-1}$, so $\mathcal{C}(U) \subseteq \mathcal{C}(X)$. Thus, $\mathcal{C}(X) = \mathcal{C}(U)$, as desired. Note that normalizing the columns of $U$ won't change $\mathcal{C}(U)$.

REMARK: Creating an orthogonal reparameterization of a full rank design matrix confers computational advantages in working with linear models; see also Example 2.1.

3 Estimability and Least Squares Estimators

Complementary reading from Monahan: Chapter 3 (omit Section 3.9 for now).

3.1 Introduction

REMARK: Estimability is one of the most important concepts in linear
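models. Consider our general linear model $y = Xb + e$, where $E(e) = \mathbf{0}$. In the discussion which follows, the assumption $\text{cov}(e) = \sigma^2 I$ is not needed. Suppose that $X$ is $N \times p$ with rank $r \leq p$. If $r = p$ (as in regression models), then estimability concerns vanish, as $b$ can be estimated uniquely by $\hat{b} = (X'X)^{-1}X'y$. If $r < p$ (a common characteristic of ANOVA models), then $b$ cannot be estimated uniquely. However, even if $b$ is not estimable, certain functions of $b$ may be estimable.

3.2 Estimability

DEFINITIONS:
1. An estimator $t(y)$ is said to be unbiased for $\lambda'b$ iff $E\{t(y)\} = \lambda'b$, for all $b$.
2. An estimator $t(y)$ is said to be a linear estimator in $y$ iff $t(y) = c + a'y$, for $c \in \mathbb{R}$ and $a = (a_1, a_2, \ldots, a_N)'$, $a_i \in \mathbb{R}$.
3. A function $\lambda'b$ is said to be (linearly) estimable iff there exists a linear unbiased estimator for it. Otherwise, $\lambda'b$ is said to be nonestimable.

Result E1. Under the model assumptions $y = Xb + e$, where $E(e) = \mathbf{0}$, a linear function $\lambda'b$ is estimable iff there exists a vector $a$ such that $\lambda' = a'X$; that is, $\lambda' \in \mathcal{R}(X)$.

Proof. ($\Longleftarrow$) Suppose that there exists a vector $a$ such that $\lambda' = a'X$. Then, $E(a'y) = a'Xb = \lambda'b$, for all $b$. Therefore, $a'y$ is a linear unbiased estimator of $\lambda'b$ and hence $\lambda'b$ is estimable. ($\Longrightarrow$) Suppose that $\lambda'b$ is estimable. Then, there exists an estimator $c + a'y$ that is unbiased for it; that is, $E(c + a'y) = \lambda'b$, for all $b$. Note that $E(c + a'y) = c + a'Xb$, so $\lambda'b = c + a'Xb$, for all $b$. Taking $b = \mathbf{0}$ shows that $c = 0$. Successively taking $b$ to be $e_1, e_2, \ldots, e_p$, the standard unit vectors, convinces us that $\lambda' = a'X$. $\square$

Example 3.1. Consider the one-way fixed effects ANOVA model
$$y_{ij} = \mu + \alpha_i + \epsilon_{ij},$$
for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n_i$, where $E(\epsilon_{ij}) = 0$. Take $a = 3$ and $n_i = 2$ so that
$$y = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \\ y_{31} \\ y_{32} \end{pmatrix}, \quad X = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix}, \quad \text{and} \quad b = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix}.$$
Note that $r(X) = 3$, so $X$ is not of full rank; i.e., $b$ is not uniquely estimable. Consider the following parametric functions $\lambda'b$:
$$\begin{array}{llcc}
\text{Parameter} & \lambda' & \lambda' \in \mathcal{R}(X)? & \text{Estimable?} \\ \hline
\lambda_1'b = \mu & \lambda_1' = (1, 0, 0, 0) & \text{no} & \text{no} \\
\lambda_2'b = \alpha_1 & \lambda_2' = (0, 1, 0, 0) & \text{no} & \text{no} \\
\lambda_3'b = \mu + \alpha_1 & \lambda_3' = (1, 1, 0, 0) & \text{yes} & \text{yes} \\
\lambda_4'b = \alpha_1 - \alpha_2 & \lambda_4' = (0, 1, -1, 0) & \text{yes} & \text{yes} \\
\lambda_5'b = \alpha_1 - (\alpha_2 + \alpha_3)/2 & \lambda_5' = (0, 1, -1/2, -1/2) & \text{yes} & \text{yes}
\end{array}$$
Because $\lambda_3'b = \mu + \alpha_1$, $\lambda_4'b = \alpha_1 - \alpha_2$, and $\lambda_5'b = \alpha_1 - (\alpha_2 + \alpha_3)/2$ are (linearly) estimable, there must exist linear unbiased estimators for them. Note that
$$E(\bar{y}_{1+}) = E\left(\frac{y_{11} + y_{12}}{2}\right) = \frac{1}{2}(\mu + \alpha_1) + \frac{1}{2}(\mu + \alpha_1) = \mu + \alpha_1 = \lambda_3'b$$
and that $\bar{y}_{1+} = c + a'y$, where $c = 0$ and $a' = (1/2, 1/2, 0, 0, 0, 0)$. Also,
$$E(\bar{y}_{1+} - \bar{y}_{2+}) = (\mu + \alpha_1) - (\mu + \alpha_2) = \alpha_1 - \alpha_2 = \lambda_4'b$$
and $\bar{y}_{1+} - \bar{y}_{2+} = c + a'y$, where $c = 0$ and $a' = (1/2, 1/2, -1/2, -1/2, 0, 0)$. Finally,
$$E\left\{\bar{y}_{1+} - \frac{\bar{y}_{2+} + \bar{y}_{3+}}{2}\right\} = (\mu + \alpha_1) - \frac{1}{2}\{(\mu + \alpha_2) + (\mu + \alpha_3)\} = \alpha_1 - \frac{1}{2}(\alpha_2 + \alpha_3) = \lambda_5'b.$$
Note that
$$\bar{y}_{1+} - \frac{\bar{y}_{2+} + \bar{y}_{3+}}{2} = c + a'y,$$
where $c = 0$ and $a' = (1/2, 1/2, -1/4, -1/4, -1/4, -1/4)$. $\square$

REMARKS:
1. The elements of the vector $Xb$ are estimable.
2. If $\lambda_1'b, \lambda_2'b, \ldots, \lambda_k'b$ are estimable, then any linear combination of them; i.e., $\sum_{i=1}^k d_i \lambda_i'b$, where $d_i \in \mathbb{R}$, is also estimable.
3. If $X$ is $N \times p$ and $r(X) = p$, then $\mathcal{R}(X) = \mathbb{R}^p$ and $\lambda'b$ is estimable for all $\lambda$.

DEFINITION: Linear functions $\lambda_1'b, \lambda_2'b, \ldots, \lambda_k'b$ are said to be linearly independent if $\lambda_1, \lambda_2, \ldots, \lambda_k$ comprise a set of linearly independent vectors; i.e., $\Lambda = (\lambda_1 \ \lambda_2 \ \cdots \ \lambda_k)$ has rank $k$.

Result E2. Under the model assumptions $y = Xb + e$, where $E(e) = \mathbf{0}$, we can always find $r = r(X)$ linearly independent estimable functions. Moreover, no collection of estimable functions can contain more than $r$ linearly independent estimable functions.

Proof. Let $\gamma_i'$ denote the $i$th row of $X$, for $i = 1, 2, \ldots, N$. Clearly, $\gamma_1'b, \gamma_2'b, \ldots, \gamma_N'b$ are estimable. Because $r(X) = r$, we can select $r$ linearly independent rows of $X$; the corresponding $r$ functions $\gamma_i'b$ are linearly independent. Now, let $\Lambda'b = (\lambda_1'b, \lambda_2'b, \ldots, \lambda_k'b)'$ be any collection of estimable functions. Then, $\lambda_i' \in \mathcal{R}(X)$, for $i = 1, 2, \ldots, k$, and hence there exists a matrix $A$ such that $\Lambda' = A'X$. Therefore, $r(\Lambda') = r(A'X) \leq r(X) = r$. Hence, there can be at most $r$ linearly independent estimable functions. $\square$

DEFINITION: A least squares estimator of an estimable function $\lambda'b$ is $\lambda'\hat{b}$, where $\hat{b} = (X'X)^{-}X'y$ is any solution to the normal equations.

Result E3. Under the model assumptions $y = Xb + e$, where $E(e) = \mathbf{0}$, if $\lambda'b$ is estimable, then $\lambda'\hat{b} = \lambda'\tilde{b}$ for any two solutions $\hat{b}$ and $\tilde{b}$ to the normal equations.

Proof. Suppose that $\lambda'b$ is estimable. Then $\lambda' = a'X$, for some $a$. From Result LS5,
$$\lambda'\hat{b} = a'X\hat{b} = a'P_X y \quad \text{and} \quad \lambda'\tilde{b} = a'X\tilde{b} = a'P_X y.$$
This proves the result. $\square$

Alternate proof. If $\hat{b}$ and $\tilde{b}$ both solve the normal equations, then $X'X(\hat{b} - \tilde{b}) = \mathbf{0}$; that is, $\hat{b} - \tilde{b} \in \mathcal{N}(X'X) = \mathcal{N}(X)$. If $\lambda'b$ is estimable, then $\lambda' \in \mathcal{R}(X) \iff \lambda \in \mathcal{C}(X') \iff \lambda \perp \mathcal{N}(X)$. Thus, $\lambda'(\hat{b} - \tilde{b}) = 0$; i.e., $\lambda'\hat{b} = \lambda'\tilde{b}$. $\square$

IMPLICATION: Least squares estimators of (linearly) estimable functions are invariant to the choice of generalized inverse used to solve the normal equations.

Example 3.2. In Example 3.1, we considered the one-way fixed effects ANOVA model $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$, for $i = 1, 2, 3$ and $j = 1, 2$. For this model, it is easy to show that
$$X'X = \begin{pmatrix} 6 & 2 & 2 & 2 \\ 2 & 2 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 2 & 0 & 0 & 2 \end{pmatrix}$$
and $r(X'X) = 3$. Here are two generalized inverses of $X'X$:
$$(X'X)^{-}_1 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1/2 & 0 & 0 \\ 0 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1/2 \end{pmatrix}, \qquad (X'X)^{-}_2 = \begin{pmatrix} 1/2 & -1/2 & -1/2 & 0 \\ -1/2 & 1 & 1/2 & 0 \\ -1/2 & 1/2 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
Note that
$$X'y = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \\ y_{31} \\ y_{32} \end{pmatrix} = \begin{pmatrix} y_{11} + y_{12} + y_{21} + y_{22} + y_{31} + y_{32} \\ y_{11} + y_{12} \\ y_{21} + y_{22} \\ y_{31} + y_{32} \end{pmatrix}.$$
Two least squares solutions (verify!) are thus
$$\hat{b} = (X'X)^{-}_1 X'y = \begin{pmatrix} 0 \\ \bar{y}_{1+} \\ \bar{y}_{2+} \\ \bar{y}_{3+} \end{pmatrix} \quad \text{and} \quad \tilde{b} = (X'X)^{-}_2 X'y = \begin{pmatrix} \bar{y}_{3+} \\ \bar{y}_{1+} - \bar{y}_{3+} \\ \bar{y}_{2+} - \bar{y}_{3+} \\ 0 \end{pmatrix}.$$
Recall our estimable functions from Example 3.1:
$$\begin{array}{llcc}
\text{Parameter} & \lambda' & \lambda' \in \mathcal{R}(X)? & \text{Estimable?} \\ \hline
\lambda_3'b = \mu + \alpha_1 & \lambda_3' = (1, 1, 0, 0) & \text{yes} & \text{yes} \\
\lambda_4'b = \alpha_1 - \alpha_2 & \lambda_4' = (0, 1, -1, 0) & \text{yes} & \text{yes} \\
\lambda_5'b = \alpha_1 - (\alpha_2 + \alpha_3)/2 & \lambda_5' = (0, 1, -1/2, -1/2) & \text{yes} & \text{yes}
\end{array}$$
Note that
• for $\lambda_3'b = \mu + \alpha_1$, the (unique) least squares estimator is $\lambda_3'\hat{b} = \lambda_3'\tilde{b} = \bar{y}_{1+}$.
• for $\lambda_4'b = \alpha_1 - \alpha_2$, the (unique) least squares estimator is $\lambda_4'\hat{b} = \lambda_4'\tilde{b} = \bar{y}_{1+} - \bar{y}_{2+}$.
• for $\lambda_5'b = \alpha_1 - (\alpha_2 + \alpha_3)/2$, the (unique) least squares estimator is $\lambda_5'\hat{b} = \lambda_5'\tilde{b} = \bar{y}_{1+} - \frac{1}{2}(\bar{y}_{2+} + \bar{y}_{3+})$.

Finally, note that these three estimable functions are linearly independent since
$$\Lambda = (\lambda_3 \ \lambda_4 \ \lambda_5) = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & -1 & -1/2 \\ 0 & 0 & -1/2 \end{pmatrix}$$
has rank $r(\Lambda) = 3$. Of course, more estimable functions $\lambda_i'b$ can be found, but we can find no more linearly independent estimable functions because $r(X) = 3$. $\square$

Result E4. Under the model assumptions $y = Xb + e$, where $E(e) = \mathbf{0}$, the least squares estimator $\lambda'\hat{b}$ of an estimable function $\lambda'b$ is a linear unbiased estimator of $\lambda'b$.

Proof. Suppose that $\hat{b}$ solves the normal equations. We know (by definition) that $\lambda'\hat{b}$ is the least squares estimator of $\lambda'b$. Note that
$$\lambda'\hat{b} = \lambda'\{(X'X)^{-}X'y + [I - (X'X)^{-}X'X]z\} = \lambda'(X'X)^{-}X'y + \lambda'[I - (X'X)^{-}X'X]z.$$
Also, $\lambda'b$ is estimable by assumption, so $\lambda' \in \mathcal{R}(X) \iff \lambda \in \mathcal{C}(X') \iff \lambda \perp \mathcal{N}(X)$. Result M13 says that $[I - (X'X)^{-}X'X]z \in \mathcal{N}(X'X) = \mathcal{N}(X)$, so $\lambda'[I - (X'X)^{-}X'X]z = 0$. Thus, $\lambda'\hat{b} = \lambda'(X'X)^{-}X'y$, which is a linear estimator in $y$. We now show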
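that $\lambda'\hat{b}$ is unbiased. Because $\lambda'b$ is estimable, $\lambda' \in \mathcal{R}(X) \Longrightarrow \lambda' = a'X$, for some $a$. Thus,
$$E(\lambda'\hat{b}) = E\{\lambda'(X'X)^{-}X'y\} = \lambda'(X'X)^{-}X'E(y) = \lambda'(X'X)^{-}X'Xb = a'X(X'X)^{-}X'Xb = a'P_X Xb = a'Xb = \lambda'b. \ \square$$

ESTIMABILITY: Consider the linear model $y = Xb + e$, where $E(e) = \mathbf{0}$. From the definition, we know that $\lambda'b$ is estimable iff there exists a linear unbiased estimator for it, so if we can find a linear estimator $c + a'y$ whose expectation equals $\lambda'b$, for all $b$, then $\lambda'b$ is estimable. From Result E1, we know that $\lambda'b$ is estimable iff $\lambda' \in \mathcal{R}(X)$. Thus, if $\lambda'$ can be expressed as a linear combination of the rows of $X$, then $\lambda'b$ is estimable.

IMPORTANT: Here is a commonly used method of finding necessary and sufficient conditions for estimability. Suppose that $X$ is $N \times p$ with rank $r < p$. We know that $\lambda'b$ is estimable iff $\lambda' \in \mathcal{R}(X)$.
• Typically, when we find the rank of $X$, we find $r$ linearly independent columns of $X$ and express the remaining $s = p - r$ columns as linear combinations of the $r$ linearly independent columns of $X$. Suppose that $c_1, c_2, \ldots, c_s$ satisfy $Xc_i = \mathbf{0}$, for $i = 1, 2, \ldots, s$; that is, $c_i \in \mathcal{N}(X)$, for $i = 1, 2, \ldots, s$. If $\{c_1, c_2, \ldots, c_s\}$ forms a basis for $\mathcal{N}(X)$; i.e., $c_1, c_2, \ldots, c_s$ are linearly independent, then
$$\lambda'c_1 = 0, \quad \lambda'c_2 = 0, \quad \ldots, \quad \lambda'c_s = 0$$
are necessary and sufficient conditions for $\lambda'b$ to be estimable.

REMARK: There are two spaces of interest here: $\mathcal{C}(X') = \mathcal{R}(X)$ and $\mathcal{N}(X)$. If $X$ is $N \times p$ with rank $r < p$, then $\dim\{\mathcal{C}(X')\} = r$ and $\dim\{\mathcal{N}(X)\} = s = p - r$. If $c_1, c_2, \ldots, c_s$ are linearly independent, then $\{c_1, c_2, \ldots, c_s\}$ must be a basis for $\mathcal{N}(X)$. But,
$$\lambda'b \text{ estimable} \iff \lambda' \in \mathcal{R}(X) \iff \lambda \in \mathcal{C}(X') \iff \lambda \text{ is orthogonal to every vector in } \mathcal{N}(X) \iff \lambda \text{ is orthogonal to } c_1, c_2, \ldots, c_s \iff \lambda'c_i = 0, \ i = 1, 2, \ldots, s.$$
Therefore, $\lambda'b$ is estimable iff $\lambda'c_i = 0$, for $i = 1, 2, \ldots, s$, where $c_1, c_2, \ldots, c_s$ are $s$ linearly independent vectors such that $Xc_i = \mathbf{0}$.

TERMINOLOGY: A set of linear functions $\{\lambda_1'b, \lambda_2'b, \ldots, \lambda_k'b\}$ is said to be jointly nonestimable if the only linear combination of $\lambda_1'b, \lambda_2'b, \ldots, \lambda_k'b$ that is estimable is the trivial one; i.e., $\equiv 0$.

3.2.1 One-way ANOVA

GENERAL CASE: Consider the one-way fixed effects ANOVA model $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$, for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n_i$, where $E(\epsilon_{ij}) = 0$. In matrix form, $X$ and $b$ are
$$X_{N \times p} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{1}_{n_a} & \mathbf{0}_{n_a} & \mathbf{0}_{n_a} & \cdots & \mathbf{1}_{n_a}
\end{pmatrix} \quad \text{and} \quad b_{p \times 1} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_a \end{pmatrix},$$
where $p = a + 1$ and $N = \sum_i n_i$. Note that the last $a$ columns of $X$ are linearly independent and the first column is the sum of the last $a$ columns. Hence, $r(X) = r = a$ and $s = p - r = 1$. With $c_1 = (1, -\mathbf{1}_a')'$, note that $Xc_1 = \mathbf{0}$, so that $\{c_1\}$ forms a basis for $\mathcal{N}(X)$. Thus, the necessary and sufficient condition for $\lambda'b = \lambda_0 \mu + \sum_{i=1}^a \lambda_i \alpha_i$ to be estimable is
$$\lambda'c_1 = 0 \Longrightarrow \lambda_0 = \sum_{i=1}^a \lambda_i.$$
Here are some examples of estimable functions:
1. $\mu + \alpha_i$
2. $\alpha_i - \alpha_k$
3. any contrast in the $\alpha$'s; i.e., $\sum_{i=1}^a \lambda_i \alpha_i$, where $\sum_{i=1}^a \lambda_i = 0$.

Here are some examples of nonestimable functions:
1. $\mu$
2. $\alpha_i$
3. $\sum_{i=1}^a n_i \alpha_i$.

There is only $s = 1$ jointly nonestimable function. Later we will learn that jointly nonestimable functions can be used to "force" particular solutions to the normal equations.

The following are examples of sets of linearly independent estimable functions (verify!):
1. $\{\mu + \alpha_1, \mu + \alpha_2, \ldots, \mu + \alpha_a\}$
2. $\{\mu + \alpha_1, \alpha_1 - \alpha_2, \ldots, \alpha_1 - \alpha_a\}$.

LEAST SQUARES ESTIMATES: We now wish to calculate the least squares estimates of estimable functions. Note that $X'X$ and one generalized inverse of $X'X$ are given by
$$X'X = \begin{pmatrix}
N & n_1 & n_2 & \cdots & n_a \\
n_1 & n_1 & 0 & \cdots & 0 \\
n_2 & 0 & n_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
n_a & 0 & 0 & \cdots & n_a
\end{pmatrix} \quad \text{and} \quad (X'X)^{-} = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
0 & 1/n_1 & 0 & \cdots & 0 \\
0 & 0 & 1/n_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1/n_a
\end{pmatrix}.$$
For this generalized inverse, the least squares estimate is
$$\hat{b} = (X'X)^{-}X'y = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
0 & 1/n_1 & 0 & \cdots & 0 \\
0 & 0 & 1/n_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1/n_a
\end{pmatrix} \begin{pmatrix} \sum_i \sum_j y_{ij} \\ \sum_j y_{1j} \\ \sum_j y_{2j} \\ \vdots \\ \sum_j y_{aj} \end{pmatrix} = \begin{pmatrix} 0 \\ \bar{y}_{1+} \\ \bar{y}_{2+} \\ \vdots \\ \bar{y}_{a+} \end{pmatrix}.$$

REMARK: We know that this solution is not unique; had we used a different generalized inverse above, we would have gotten a different least squares estimate of $b$. However, least squares estimates of estimable functions $\lambda'b$ are invariant to the choice of generalized inverse, so our choice of $(X'X)^{-}$ above is as good as any other. From this solution, we have the unique least squares estimates:
$$\begin{array}{ll}
\text{Estimable function, } \lambda'b & \text{Least squares estimate, } \lambda'\hat{b} \\ \hline
\mu + \alpha_i & \bar{y}_{i+} \\
\alpha_i - \alpha_k & \bar{y}_{i+} - \bar{y}_{k+} \\
\sum_{i=1}^a \lambda_i \alpha_i, \text{ with } \sum_{i=1}^a \lambda_i = 0 & \sum_{i=1}^a \lambda_i \bar{y}_{i+}
\end{array}$$

RECALL: For the one-way fixed effects ANOVA model,
$$P_X = \text{Blk Diag}(n_i^{-1} J_{n_i \times n_i}) = \begin{pmatrix}
n_1^{-1} J_{n_1 \times n_1} & 0 & \cdots & 0 \\
0 & n_2^{-1} J_{n_2 \times n_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & n_a^{-1} J_{n_a \times n_a}
\end{pmatrix}_{N \times N}.$$
The perpendicular projection of $y$ onto $\mathcal{C}(X)$ is
$$P_X y = \begin{pmatrix} \bar{y}_{1+} \mathbf{1}_{n_1} \\ \bar{y}_{2+} \mathbf{1}_{n_2} \\ \vdots \\ \bar{y}_{a+} \mathbf{1}_{n_a} \end{pmatrix}_{N \times 1}.$$

3.2.2 Two-way crossed ANOVA with no interaction

GENERAL CASE: Consider the two-way fixed effects (crossed) ANOVA model
$$y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk},$$
for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, b$, and $k = 1, 2, \ldots, n_{ij}$, where $E(\epsilon_{ijk}) = 0$. For ease of presentation, we take $n_{ij} = 1$, so there is no need for a $k$ subscript. The model is written as $y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$. In matrix form, $X$ and $b$ are
$$X_{N \times p} = \begin{pmatrix}
\mathbf{1}_b & \mathbf{1}_b & \mathbf{0}_b & \cdots & \mathbf{0}_b & I_b \\
\mathbf{1}_b & \mathbf{0}_b & \mathbf{1}_b & \cdots & \mathbf{0}_b & I_b \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\mathbf{1}_b & \mathbf{0}_b & \mathbf{0}_b & \cdots & \mathbf{1}_b & I_b
\end{pmatrix} \quad \text{and} \quad b_{p \times 1} = \begin{pmatrix} \mu \\ \alpha_1 \\ \vdots \\ \alpha_a \\ \beta_1 \\ \vdots \\ \beta_b \end{pmatrix},$$
where $p = a + b + 1$ and $N = ab$. Note that the first column is the sum of the last $b$ columns. The 2nd column is the sum of the last $b$ columns minus the sum of columns 3 through $a + 1$. Thus, we have $s = 2$ linear dependencies, so that $r(X) = a + b - 1$. The dimension of $\mathcal{N}(X)$ is $s = 2$. Taking
$$c_1 = \begin{pmatrix} 1 \\ -\mathbf{1}_a \\ \mathbf{0}_b \end{pmatrix} \quad \text{and} \quad c_2 = \begin{pmatrix} 1 \\ \mathbf{0}_a \\ -\mathbf{1}_b \end{pmatrix}$$
produces $Xc_1 = Xc_2 = \mathbf{0}$. Since $c_1$ and $c_2$ are linearly independent, $\{c_1, c_2\}$ is a basis for $\mathcal{N}(X)$. Thus, necessary and sufficient conditions for $\lambda'b$ to be estimable are
$$\lambda'c_1 = 0 \Longrightarrow \lambda_0 = \sum_{i=1}^a \lambda_i, \qquad \lambda'c_2 = 0 \Longrightarrow \lambda_0 = \sum_{j=1}^b \lambda_{a+j}.$$
Here are some examples of estimable functions:
1. $\mu + \alpha_i + \beta_j$
2. $\alpha_i - \alpha_k$
3. $\beta_j - \beta_k$
4. any contrast in the $\alpha$'s; i.e., $\sum_{i=1}^a \lambda_i \alpha_i$, where $\sum_{i=1}^a \lambda_i = 0$
5. any contrast in the $\beta$'s; i.e., $\sum_{j=1}^b \lambda_{a+j} \beta_j$, where $\sum_{j=1}^b \lambda_{a+j} = 0$.

Here are some examples of nonestimable functions:
1. $\mu$
2. $\alpha_i$
3. $\beta_j$
4. $\sum_{i=1}^a \alpha_i$
5. $\sum_{j=1}^b \beta_j$.

There are $s = 2$ jointly nonestimable functions. Examples of sets of jointly nonestimable functions are
1. $\{\mu, \alpha_1\}$
2. $\{\sum_i \alpha_i, \sum_j \beta_j\}$.

A set of linearly independent estimable functions (verify!) is
1. $\{\mu + \alpha_1 + \beta_1, \alpha_1 - \alpha_2, \ldots, \alpha_1 - \alpha_a, \beta_1 - \beta_2, \ldots, \beta_1 - \beta_b\}$.

LEAST SQUARES ESTIMATES: When replication occurs; i.e., when $n_{ij} > 1$, for all $i$ and $j$, our estimability findings are unchanged. Replication does not change $\mathcal{R}(X)$. We obtain the following least squares estimates (the closed forms shown hold for balanced data, where every $n_{ij}$ equals a common $n$):
$$\begin{array}{ll}
\text{Estimable function, } \lambda'b & \text{Least squares estimate, } \lambda'\hat{b} \\ \hline
\mu + \alpha_i + \beta_j & \bar{y}_{i++} + \bar{y}_{+j+} - \bar{y}_{+++} \\
\alpha_i - \alpha_k & \bar{y}_{i++} - \bar{y}_{k++} \\
\beta_j - \beta_l & \bar{y}_{+j+} - \bar{y}_{+l+} \\
\sum_{i=1}^a c_i \alpha_i, \text{ with } \sum_{i=1}^a c_i = 0 & \sum_{i=1}^a c_i \bar{y}_{i++} \\
\sum_{j=1}^b d_j \beta_j, \text{ with } \sum_{j=1}^b d_j = 0 & \sum_{j=1}^b d_j \bar{y}_{+j+}
\end{array}$$
These formulae are still technically correct when $n_{ij} = 1$. When some $n_{ij} = 0$; i.e., there are missing cells, estimability may be affected; see Monahan, pp. 46-48.

3.2.3 Two-way crossed ANOVA with interaction

GENERAL CASE: Consider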
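the two-way fixed effects (crossed) ANOVA model
$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk},$$
for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, b$, and $k = 1, 2, \ldots, n_{ij}$, where $E(\epsilon_{ijk}) = 0$.

SPECIAL CASE: With $a = 3$, $b = 2$, and $n_{ij} = n = 2$, $X$ and $b$ are
$$X = \begin{pmatrix}
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \\ \gamma_{11} \\ \gamma_{12} \\ \gamma_{21} \\ \gamma_{22} \\ \gamma_{31} \\ \gamma_{32} \end{pmatrix}.$$
There are $p = 12$ parameters. The last six columns of $X$ are linearly independent, and the other columns can be written as linear combinations of the last six columns, so $r(X) = 6$ and $s = p - r = 6$. To determine which functions $\lambda'b$ are estimable, we need to find a basis for $\mathcal{N}(X)$. One basis $\{c_1, c_2, \ldots, c_6\}$ is given by the columns of
$$(c_1 \ c_2 \ c_3 \ c_4 \ c_5 \ c_6) = \begin{pmatrix}
-1 & -1 & 0 & 0 & 0 & -1 \\
1 & 0 & -1 & 0 & 0 & 1 \\
1 & 0 & 0 & -1 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & -1 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$
Functions $\lambda'b$ must satisfy $\lambda'c_i = 0$, for each $i = 1, 2, \ldots, 6$, to be estimable. It should be obvious that neither the main effect terms nor the interaction terms; i.e., $\alpha_i$, $\beta_j$, $\gamma_{ij}$, are estimable on their own. However, the six "cell means" terms $\mu + \alpha_i + \beta_j + \gamma_{ij}$ are estimable; these are not that interesting though. Interaction makes the analysis more difficult. No longer are contrasts in the $\alpha$'s or $\beta$'s estimable.

3.3 Reparameterization

SETTING: Consider the Gauss-Markov model

Model GM: $y = Xb + e$, where $E(e) = \mathbf{0}$, $\text{cov}(e) = \sigma^2 I$.

Assume that $X$ is $N \times p$ with rank $r \leq p$. Suppose that $W$ is an $N \times t$ matrix such that $\mathcal{C}(W) = \mathcal{C}(X)$. Then, we know that there exist matrices $T_{p \times t}$ and $S_{p \times t}$ such that $W = XT$ and $X = WS'$. Note that $Xb = WS'b = Wc$, where $c = S'b$. The model

Model GM-R: $y = Wc + e$, where $E(e) = \mathbf{0}$, $\text{cov}(e) = \sigma^2 I$,

is called a reparameterization of Model GM.

REMARK: Since $Xb = WS'b = Wc = XTc$, we might suspect that the estimation of an estimable function $\lambda'b$ under Model GM should be essentially the same as the estimation of $\lambda'Tc$ under Model GM-R, and that estimation of an estimable function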
$q'c$ under Model GM-R should be essentially the same as estimation of $q'S'b$ under Model GM. The upshot of the following results is that, in determining a least squares estimate of an estimable function $\lambda'b$, we can work with either Model GM or Model GM-R. The actual nature of these conjectured relationships is now made precise.

Result E5. Consider Models GM and GM-R with $\mathcal{C}(W) = \mathcal{C}(X)$.
1. $P_W = P_X$.
2. If $\hat{c}$ is any solution to the normal equations $W'Wc = W'y$ associated with Model GM-R, then $\hat{b} = T\hat{c}$ is a solution to the normal equations $X'Xb = X'y$ associated with Model GM.
3. If $\lambda'b$ is estimable under Model GM and if $\hat{c}$ is any solution to the normal equations $W'Wc = W'y$ associated with Model GM-R, then $\lambda'T\hat{c}$ is the least squares estimate of $\lambda'b$.
4. If $q'c$ is estimable under Model GM-R; i.e., if $q' \in \mathcal{R}(W)$, then $q'S'b$ is estimable under Model GM and its least squares estimate is given by $q'\hat{c}$, where $\hat{c}$ is any solution to the normal equations $W'Wc = W'y$.

Proof.
1. This is obvious.
2. Note that
$$X'XT\hat{c} = X'W\hat{c} = X'P_W y = X'P_X y = X'y.$$
Hence, $T\hat{c}$ is a solution to the normal equations $X'Xb = X'y$.
3. This follows from (2), since the least squares estimate is invariant to the choice of the solution to the normal equations.
4. If $q' \in \mathcal{R}(W)$, then $q' = a'W$, for some $a$. Then, $q'S' = a'WS' = a'X \in \mathcal{R}(X)$, so that $q'S'b$ is estimable under Model GM. From (3), we know the least squares estimate of $q'S'b$ is $q'S'T\hat{c}$. But,
$$q'S'T\hat{c} = a'WS'T\hat{c} = a'XT\hat{c} = a'W\hat{c} = q'\hat{c}. \ \square$$

WARNING: The converse to (4) is not true; that is, $q'S'b$ being estimable under Model GM doesn't necessarily imply that $q'c$ is estimable under Model GM-R. See Monahan, pp. 52.

FULL RANK REPARAMETERIZATION: Because $\mathcal{C}(W) = \mathcal{C}(X)$ and $r(X) = r$, $W_{N \times t}$ must have at least $r$ columns. If $W$ has exactly $r$ columns; i.e., if $t = r$, then the reparameterization of Model GM is called a full rank reparameterization. If, in addition, $W'W$ is diagonal, the reparameterization of Model GM is called an orthogonal reparameterization; see, e.g., Example 2.1 (notes).

NOTE: A full rank reparameterization always exists; just delete the
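columns of $X$ that are linearly dependent on the others. In a full rank reparameterization, $(W'W)^{-1}$ exists, so the normal equations $W'Wc = W'y$ have a unique solution; i.e., $\hat{c} = (W'W)^{-1}W'y$.

DISCUSSION: There are two (opposing) points of view concerning the utility of full rank reparameterizations.
• Some authors argue that, since making inferences about $q'c$ under the full rank reparameterized model (Model GM-R) is equivalent to making inferences about $q'S'b$ in the possibly less-than-full rank original model (Model GM), the inclusion of the possibility that the design matrix has less than full column rank causes a needless complication in linear model theory.
• An opposing argument is that, since the computations required to deal with the reparameterized model are essentially the same as those required to handle the original model, we might as well allow for less-than-full rank models in the first place.
• I tend to favor the latter point of view; to me, there is no reason not to include less-than-full rank models as long as you know what you can and cannot estimate. But, at the same time, I don't really think that individual parameters in linear models are all that meaningful; rather, I am convinced that the models themselves are of key importance. Individual parameters are just part of the model and, on their own, do not provide enough information about that which is under investigation.

Example 3.3. Consider the one-way fixed effects ANOVA model
$$y_{ij} = \mu + \alpha_i + \epsilon_{ij},$$
for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n_i$, where the $\epsilon_{ij}$ are uncorrelated random variables with zero mean and variance $\sigma^2 > 0$. In matrix form, $X$ and $b$ are
$$X_{N \times p} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{1}_{n_a} & \mathbf{0}_{n_a} & \mathbf{0}_{n_a} & \cdots & \mathbf{1}_{n_a}
\end{pmatrix} \quad \text{and} \quad b_{p \times 1} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_a \end{pmatrix},$$
where $p = a + 1$ and $N = \sum_i n_i$. This is not a full rank model since the first column is the sum of the last $a$ columns; i.e., $r(X) = a$.

Reparameterization 1: Deleting the first column, we have
$$W_{N \times t} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0}_{n_a} & \mathbf{0}_{n_a} & \cdots & \mathbf{1}_{n_a}
\end{pmatrix} \quad \text{and} \quad c_{t \times 1} = \begin{pmatrix} \mu + \alpha_1 \\ \mu + \alpha_2 \\ \vdots \\ \mu + \alpha_a \end{pmatrix} \equiv \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_a \end{pmatrix},$$
where $t = a$ and $\mu_i = \mu + \alpha_i$. This is called the cell-means model. This is a full rank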
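reparameterization with $\mathcal{C}(W) = \mathcal{C}(X)$. The least squares estimate of $c$ is
$$\hat{c} = (W'W)^{-1}W'y = \begin{pmatrix} \bar{y}_{1+} \\ \bar{y}_{2+} \\ \vdots \\ \bar{y}_{a+} \end{pmatrix}.$$
EXERCISE: What are the matrices $T$ and $S$ associated with this reparameterization?

Reparameterization 2: Deleting the last column, we have
$$W_{N \times t} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{1}_{n_{a-1}} & \mathbf{0}_{n_{a-1}} & \mathbf{0}_{n_{a-1}} & \cdots & \mathbf{1}_{n_{a-1}} \\
\mathbf{1}_{n_a} & \mathbf{0}_{n_a} & \mathbf{0}_{n_a} & \cdots & \mathbf{0}_{n_a}
\end{pmatrix} \quad \text{and} \quad c_{t \times 1} = \begin{pmatrix} \mu + \alpha_a \\ \alpha_1 - \alpha_a \\ \alpha_2 - \alpha_a \\ \vdots \\ \alpha_{a-1} - \alpha_a \end{pmatrix},$$
where $t = a$. This is called the cell-reference model (what SAS uses by default). This is a full rank reparameterization with $\mathcal{C}(W) = \mathcal{C}(X)$. The least squares estimate of $c$ is
$$\hat{c} = (W'W)^{-1}W'y = \begin{pmatrix} \bar{y}_{a+} \\ \bar{y}_{1+} - \bar{y}_{a+} \\ \bar{y}_{2+} - \bar{y}_{a+} \\ \vdots \\ \bar{y}_{(a-1)+} - \bar{y}_{a+} \end{pmatrix}.$$
EXERCISE: What are the matrices $T$ and $S$ associated with this reparameterization?

Reparameterization 3: Another reparameterization of Model GM is
$$W_{N \times t} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{1}_{n_{a-1}} & \mathbf{0}_{n_{a-1}} & \mathbf{0}_{n_{a-1}} & \cdots & \mathbf{1}_{n_{a-1}} \\
\mathbf{1}_{n_a} & -\mathbf{1}_{n_a} & -\mathbf{1}_{n_a} & \cdots & -\mathbf{1}_{n_a}
\end{pmatrix} \quad \text{and} \quad c_{t \times 1} = \begin{pmatrix} \mu + \bar{\alpha} \\ \alpha_1 - \bar{\alpha} \\ \alpha_2 - \bar{\alpha} \\ \vdots \\ \alpha_{a-1} - \bar{\alpha} \end{pmatrix},$$
where $t = a$ and $\bar{\alpha} = a^{-1}\sum_i \alpha_i$. This is called the deviations from the mean model. This is a full rank reparameterization with $\mathcal{C}(W) = \mathcal{C}(X)$.

EXERCISE: What are the matrices $T$ and $S$ associated with this reparameterization?

Example 3.4. Two part multiple linear regression model. Consider the linear model $y = Xb + e$, where $E(e) = \mathbf{0}$ and $\text{cov}(e) = \sigma^2 I$. Suppose that $X$ is full rank. Write $X = (X_1 \ X_2)$ and $b = (b_1', b_2')'$ so that the model can be written as

Model GM: $y = X_1 b_1 + X_2 b_2 + e$.

Set $W_1 = X_1$ and $W_2 = (I - P_{X_1})X_2$, where $P_{X_1} = X_1(X_1'X_1)^{-1}X_1'$ is the perpendicular projection matrix onto $\mathcal{C}(X_1)$. A reparameterized version of Model GM is

Model GM-R: $y = W_1 c_1 + W_2 c_2 + e$,

where $W = (W_1 \ W_2)$ and
$$c = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} b_1 + (X_1'X_1)^{-1}X_1'X_2 b_2 \\ b_2 \end{pmatrix}.$$
With this reparameterization, note that (verify!)
$$W'W = \begin{pmatrix} X_1'X_1 & 0 \\ 0 & X_2'(I - P_{X_1})X_2 \end{pmatrix} \quad \text{and} \quad W'y = \begin{pmatrix} X_1'y \\ X_2'(I - P_{X_1})y \end{pmatrix},$$
so that
$$\hat{c} = (W'W)^{-1}W'y = \begin{pmatrix} (X_1'X_1)^{-1}X_1'y \\ \{X_2'(I - P_{X_1})X_2\}^{-1}X_2'(I - P_{X_1})y \end{pmatrix} \equiv \begin{pmatrix} \hat{c}_1 \\ \hat{c}_2 \end{pmatrix},$$
say. The importance of this reparameterization is that $W_2$ can be thought of as the "residual" from regressing (each column of) $X_2$ on $X_1$. A further calculation shows that
$$\hat{b} = (X'X)^{-1}X'y = \begin{pmatrix} \hat{b}_1 \\ \hat{b}_2 \end{pmatrix} = \begin{pmatrix} \hat{c}_1 - (X_1'X_1)^{-1}X_1'X_2 \hat{b}_2 \\ \hat{c}_2 \end{pmatrix},$$
where note that $(X_1'X_1)^{-1}X_1'X_2$ is the estimate obtained from "regressing" $X_2$ on $X_1$. Furthermore, the estimate $\hat{b}_2$ can be thought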
of as the estimate obtained from regressing $y$ on $W_2 = (I - P_{X_1})X_2$.

EXERCISE: What are the matrices $T$ and $S$ associated with this reparameterization?

APPLICATION: Consider the two part full rank regression model $y = X_1 b_1 + X_2 b_2 + e$, where $E(e) = \mathbf{0}$ and $\text{cov}(e) = \sigma^2 I$. Suppose that $X_2 = x_2$ is an $N \times 1$ vector and that $b_2$ is a scalar. Conceptualize two different models:

Reduced model: $y = X_1 b_1 + e$
Full model: $y = X_1 b_1 + b_2 x_2 + e$.

We use the term "reduced model" since $\mathcal{C}(X_1) \subset \mathcal{C}(X_1, x_2)$. Consider the full model $y = X_1 b_1 + b_2 x_2 + e$ and premultiply by $I - P_{X_1}$ to obtain
$$(I - P_{X_1})y = (I - P_{X_1})X_1 b_1 + b_2 (I - P_{X_1})x_2 + (I - P_{X_1})e = b_2 (I - P_{X_1})x_2 + e^*,$$
where $e^* = (I - P_{X_1})e$. Now, note that
$$(I - P_{X_1})y = y - P_{X_1}y \equiv \hat{e}_{y|X_1},$$
say, are the residuals from regressing $y$ on $X_1$. Similarly, $(I - P_{X_1})x_2 \equiv \hat{e}_{x_2|X_1}$ are the residuals from regressing $x_2$ on $X_1$. We thus have the following induced linear model
$$\hat{e}_{y|X_1} = b_2 \hat{e}_{x_2|X_1} + e^*,$$
where $E(e^*) = \mathbf{0}$. The plot of $\hat{e}_{y|X_1}$ versus $\hat{e}_{x_2|X_1}$ is called an added-variable plot (or partial regression plot). It displays the relationship between $y$ and $x_2$, after adjusting for the effects of $X_1$ being in the model.
• If a linear trend exists in this plot, this suggests that $x_2$ enters into the (full) model linearly. This plot can also be useful for detecting outliers and high leverage points.
• On the down side, added-variable plots only look at one predictor at a time, so one cannot assess multicollinearity; that is, if the predictor $x_2$ is "close" to $\mathcal{C}(X_1)$, this may not be detected in the plot.
• Finally, note that the slope of the least squares regression line for the added-variable plot is
$$\hat{b}_2 = [\{(I - P_{X_1})x_2\}'(I - P_{X_1})x_2]^{-1}\{(I - P_{X_1})x_2\}'(I - P_{X_1})y = \{x_2'(I - P_{X_1})x_2\}^{-1}x_2'(I - P_{X_1})y.$$
This is equal to the least squares estimate of $b_2$ in the full model.

3.4 Unique least squares solutions via linear constraints

REVIEW: Consider our general linear model $y = Xb + e$, where $E(e) = \mathbf{0}$, and $X$ is an $N \times p$ matrix with rank $r$. The normal equations are $X'Xb = X'y$.
• If $r = p$, then a unique least squares solution exists; i.e., $\hat{b} = (X'X)^{-1}X'y$.
• If $r < p$, then a least squares solution is $\hat{b} = (X'X)^{-}X'y$. This solution is not unique; its value depends on which
generalized inverse $(X'X)^{-}$ is used.

Example 3.5. Consider the one-way fixed effects ANOVA model $y_{ij} = \mu + \alpha_i + e_{ij}$, for $i = 1, 2, ..., a$ and $j = 1, 2, ..., n_i$, where $E(e_{ij}) = 0$. The normal equations are

$$X'Xb = \begin{pmatrix} N & n_1 & n_2 & \cdots & n_a \\ n_1 & n_1 & 0 & \cdots & 0 \\ n_2 & 0 & n_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n_a & 0 & 0 & \cdots & n_a \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_a \end{pmatrix} = \begin{pmatrix} \sum_i \sum_j y_{ij} \\ \sum_j y_{1j} \\ \sum_j y_{2j} \\ \vdots \\ \sum_j y_{aj} \end{pmatrix} = X'y,$$

or, written another way,

$$N\mu + \sum_{i=1}^a n_i \alpha_i = y_{++}$$
$$n_i \mu + n_i \alpha_i = y_{i+}, \quad i = 1, 2, ..., a,$$

where $y_{i+} = \sum_j y_{ij}$, for $i = 1, 2, ..., a$, and $y_{++} = \sum_i \sum_j y_{ij}$. This set of equations has no unique solution. However, note that

• if we set $\mu = 0$, then we get the solution $\hat{\mu} = 0$ and $\hat{\alpha}_i = \bar{y}_{i+}$, for $i = 1, 2, ..., a$.
• if we set $\sum_{i=1}^a n_i \alpha_i = 0$, then we get the solution $\hat{\mu} = \bar{y}_{++}$ and $\hat{\alpha}_i = \bar{y}_{i+} - \bar{y}_{++}$, for $i = 1, 2, ..., a$.
• if we set another nonestimable function equal to 0, we'll get a different solution to the normal equations.

REMARK: Equations like $\mu = 0$ and $\sum_{i=1}^a n_i \alpha_i = 0$, which are used to "force" a particular solution to the normal equations, are called side conditions. Different side conditions produce different least squares solutions. We know that in the one-way ANOVA model, the parameters $\mu$ and $\alpha_i$, for $i = 1, 2, ..., a$, are not estimable (individually). Imposing side conditions does not change this. In fact, we are doing nothing more than creating an arbitrary solution to a mathematical problem that (in my opinion) isn't relevant. The good news is that estimable functions $\lambda'b$ have least squares estimates that do not depend on which side condition was used. Estimable functions are the only functions we should ever be concerned with.

REMARK: We have seen similar results for the two-way crossed ANOVA model. In general, what and how many conditions should we use to "force" a particular solution to the normal equations? Mathematically, we are interested in imposing additional linear restrictions of the form $Cb = 0$, where the matrix $C$ does not depend on $y$.

TERMINOLOGY: We say that the system of equations $Ax = q$ is compatible if

$$c'A = 0 \implies c'q = 0;$$

i.e., $c \in N(A') \implies c'q = 0$.

Result E6. The system $Ax = q$ is consistent if and only if it is compatible.
Proof. If $Ax = q$ is consistent, then $Ax^* = q$, for some $x^*$. Hence, for any $c$ such that $c'A = 0$, we have $c'q = c'Ax^* = 0$,
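so $Ax = q$ is compatible. If $Ax = q$ is compatible, then for any $c \in N(A') = C(I - P_A)$, we have $0 = c'q = q'c = q'(I - P_A)z$, for all $z$. Successively taking $z$ to be the columns of the identity matrix (the standard unit vectors), we have

$$q'(I - P_A) = 0 \implies (I - P_A)q = 0 \implies q = A(A'A)^{-}A'q \implies q = Ax^*,$$

where $x^* = (A'A)^{-}A'q$. Thus, $Ax = q$ is consistent. □

AUGMENTED NORMAL EQUATIONS: We consider adjoining the set of equations $Cb = 0$ to the normal equations; that is, we consider the new set of equations

$$\begin{pmatrix} X'X \\ C \end{pmatrix} b = \begin{pmatrix} X'y \\ 0 \end{pmatrix}.$$

These are called the augmented normal equations. When we add $C$, we want these equations to be consistent for all $y$. We now would like to find a sufficient condition for consistency. Suppose that $w \in R(X'X) \cap R(C)$. Note that

$$w \in R(X'X) \implies w = X'Xv_1, \text{ for some } v_1$$
$$w \in R(C) \implies w = -C'v_2, \text{ for some } v_2.$$

Thus,

$$0 = w - w = X'Xv_1 + C'v_2 \implies 0 = v_1'X'X + v_2'C \implies 0 = (v_1' \ v_2') \begin{pmatrix} X'X \\ C \end{pmatrix} = v' \begin{pmatrix} X'X \\ C \end{pmatrix},$$

where $v' = (v_1' \ v_2')$. We want $C$ chosen so that

$$\begin{pmatrix} X'X \\ C \end{pmatrix} b = \begin{pmatrix} X'y \\ 0 \end{pmatrix}$$

is consistent, or, equivalently from Result E6, is compatible. Compatibility occurs when

$$v' \begin{pmatrix} X'X \\ C \end{pmatrix} = 0 \implies v' \begin{pmatrix} X'y \\ 0 \end{pmatrix} = 0.$$

Thus, we need $v_1'X'y = 0$, for all $y$. Successively taking $y = e_i$, the standard unit vector, for $i = 1, 2, ..., N$, convinces us that

$$v_1'X' = 0 \iff Xv_1 = 0 \iff X'Xv_1 = 0 \implies w = 0.$$

Thus, the augmented normal equations are consistent when $R(X'X) \cap R(C) = \{0\}$. Since $R(X'X) = R(X)$, a sufficient condition for consistency is $R(X) \cap R(C) = \{0\}$.

Now, consider the parametric function $\lambda'Cb$, for some $\lambda$. We know that $\lambda'Cb$ is estimable if and only if $\lambda'C \in R(X)$. However, clearly $\lambda'C \in R(C)$. Thus, $\lambda'Cb$ is estimable if and only if $\lambda'Cb = 0$ for all $b$ (i.e., $\lambda'C = 0$). In other words, writing

$$C = \begin{pmatrix} c_1' \\ c_2' \\ \vdots \\ c_s' \end{pmatrix},$$

the set of functions $\{c_1'b, c_2'b, ..., c_s'b\}$ is jointly nonestimable. So, we can set a collection of jointly nonestimable functions equal to zero and augment the normal equations so that they remain consistent. We get a unique solution if

$$r \begin{pmatrix} X'X \\ C \end{pmatrix} = p.$$

Because $R(X'X) \cap R(C) = \{0\}$,

$$p = r \begin{pmatrix} X'X \\ C \end{pmatrix} = r(X'X) + r(C) = r + r(C),$$

showing that we need $r(C) = s = p - r$.

SUMMARY: To augment the normal equations, we can find a set of $s$ jointly nonestimable functions $\{c_1'b, c_2'b, ..., c_s'b\}$ with $r(C) = s$, where $C$ has rows $c_1', c_2', ..., c_s'$. Then,

$$\begin{pmatrix} X'X \\ C \end{pmatrix} b = \begin{pmatrix} X'y \\ 0 \end{pmatrix}$$

is consistent and has a unique solution.

Example 3.5 (continued). Consider the one-way fixed effects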
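ANOVA model $y_{ij} = \mu + \alpha_i + e_{ij}$, for $i = 1, 2, ..., a$ and $j = 1, 2, ..., n_i$, where $E(e_{ij}) = 0$. The normal equations are

$$X'Xb = \begin{pmatrix} N & n_1 & n_2 & \cdots & n_a \\ n_1 & n_1 & 0 & \cdots & 0 \\ n_2 & 0 & n_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n_a & 0 & 0 & \cdots & n_a \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_a \end{pmatrix} = \begin{pmatrix} \sum_i \sum_j y_{ij} \\ \sum_j y_{1j} \\ \sum_j y_{2j} \\ \vdots \\ \sum_j y_{aj} \end{pmatrix} = X'y.$$

We know that $r(X) = r = a < p$ (this system cannot be solved uniquely) and that $s = p - r = (a + 1) - a = 1$. Thus, to augment the normal equations, we need to find $s = 1$ (jointly) nonestimable function. Take $c_1' = (1, 0, 0, ..., 0)$, which produces $c_1'b = \mu$. For this choice of $c_1$, the augmented normal equations are

$$\begin{pmatrix} X'X \\ c_1' \end{pmatrix} b = \begin{pmatrix} N & n_1 & n_2 & \cdots & n_a \\ n_1 & n_1 & 0 & \cdots & 0 \\ n_2 & 0 & n_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n_a & 0 & 0 & \cdots & n_a \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_a \end{pmatrix} = \begin{pmatrix} \sum_i \sum_j y_{ij} \\ \sum_j y_{1j} \\ \sum_j y_{2j} \\ \vdots \\ \sum_j y_{aj} \\ 0 \end{pmatrix} = \begin{pmatrix} X'y \\ 0 \end{pmatrix}.$$

Solving this (now full rank) system produces the unique solution

$$\hat{\mu} = 0, \quad \hat{\alpha}_i = \bar{y}_{i+}, \quad i = 1, 2, ..., a.$$

You'll note that this choice of $c_1$ used to augment the normal equations corresponds to specifying the side condition $\mu = 0$. □

EXERCISE: Redo this example using (a) the side condition $\sum_{i=1}^a n_i \alpha_i = 0$ and (b) using another side condition.

Example 3.6. Consider the two-way fixed effects (crossed) ANOVA model

$$y_{ij} = \mu + \alpha_i + \beta_j + e_{ij},$$

for $i = 1, 2, ..., a$ and $j = 1, 2, ..., b$, where $E(e_{ij}) = 0$. For purposes of illustration, let's take $a = b = 3$, so that $N = ab = 9$ and $p = a + b + 1 = 7$. In matrix form, $X$ and $b$ are

$$X_{9 \times 7} = \begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad b_{7 \times 1} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}.$$

We see that $r(X) = r = 5$ so that $s = p - r = 7 - 5 = 2$. The normal equations are

$$X'Xb = \begin{pmatrix} 9 & 3 & 3 & 3 & 3 & 3 & 3 \\ 3 & 3 & 0 & 0 & 1 & 1 & 1 \\ 3 & 0 & 3 & 0 & 1 & 1 & 1 \\ 3 & 0 & 0 & 3 & 1 & 1 & 1 \\ 3 & 1 & 1 & 1 & 3 & 0 & 0 \\ 3 & 1 & 1 & 1 & 0 & 3 & 0 \\ 3 & 1 & 1 & 1 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} \sum_i \sum_j y_{ij} \\ \sum_j y_{1j} \\ \sum_j y_{2j} \\ \sum_j y_{3j} \\ \sum_i y_{i1} \\ \sum_i y_{i2} \\ \sum_i y_{i3} \end{pmatrix} = X'y.$$

This system does not have a unique solution. To augment the normal equations, we will need a set of $s = 2$ linearly independent jointly nonestimable functions. From Section 3.2.2, one example of such a set is $\{\sum_i \alpha_i, \sum_j \beta_j\}$. For this choice, our matrix $C$ is

$$C = \begin{pmatrix} c_1' \\ c_2' \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix}.$$

Thus, the augmented normal equations become

$$\begin{pmatrix} X'X \\ C \end{pmatrix} b = \begin{pmatrix} 9 & 3 & 3 & 3 & 3 & 3 & 3 \\ 3 & 3 & 0 & 0 & 1 & 1 & 1 \\ 3 & 0 & 3 & 0 & 1 & 1 & 1 \\ 3 & 0 & 0 & 3 & 1 & 1 & 1 \\ 3 & 1 & 1 & 1 & 3 & 0 & 0 \\ 3 & 1 & 1 & 1 & 0 & 3 & 0 \\ 3 & 1 & 1 & 1 & 0 & 0 & 3 \\ 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} \sum_i \sum_j y_{ij} \\ \sum_j y_{1j} \\ \sum_j y_{2j} \\ \sum_j y_{3j} \\ \sum_i y_{i1} \\ \sum_i y_{i2} \\ \sum_i y_{i3} \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} X'y \\ 0 \end{pmatrix}.$$

Verify that the solution to this system is

$$\hat{\mu} = \bar{y}_{++}, \quad \hat{\alpha}_i = \bar{y}_{i+} - \bar{y}_{++}, \ i = 1, 2, 3, \quad \hat{\beta}_j = \bar{y}_{+j} - \bar{y}_{++}, \ j = 1, 2, 3.$$

QUESTION: In general, can we give a mathematical form for the particular solution? Note that we are now solving

$$\begin{pmatrix} X'X \\ C \end{pmatrix} b = \begin{pmatrix} X'y \\ 0 \end{pmatrix},$$

which is equivalent to

$$\begin{pmatrix} X'X \\ C'C \end{pmatrix} b = \begin{pmatrix} X'y \\ 0 \end{pmatrix},$$

since $Cb = 0$ iff $C'Cb = 0$. Thus, any solution to this system must also satisfy

$$(X'X + C'C)b = X'y.$$

But,

$$r(X'X + C'C) = r\left\{ (X' \ C') \begin{pmatrix} X \\ C \end{pmatrix} \right\} = r \begin{pmatrix} X \\ C \end{pmatrix} = p;$$

that is, $X'X + C'C$ is nonsingular. Hence, the unique solution to the augmented normal equations must be

$$\hat{b} = (X'X + C'C)^{-1}X'y.$$

4 The Gauss-Markov Model

Complementary reading from Monahan: Chapter 4 (omit Section 4.7 for now).

4.1 Introduction

REVIEW: Consider the general linear model $y = Xb + e$, where $E(e) = 0$.

• A linear estimator $t(y) = c + a'y$ is unbiased for $\lambda'b$ if and only if $E\{t(y)\} = \lambda'b$, for all $b$. We have seen this implies that $c = 0$ and $\lambda \in R(X)$; i.e., $\lambda'b$ is estimable.
• When $\lambda'b$ is estimable, it is possible to find several estimators that are unbiased for $\lambda'b$. For example, in the one-way (fixed effects) ANOVA model $y_{ij} = \mu + \alpha_i + e_{ij}$, with $E(e_{ij}) = 0$, each of $y_{11}$, $y_{12}$, and $\bar{y}_{1+}$ is an unbiased estimator of $\mu + \alpha_1$ (and there are many others, too).
• If $\lambda'b$ is estimable, then the (ordinary) least squares estimator $\lambda'\hat{b}$, where $\hat{b}$ is any solution to the normal equations $X'Xb = X'y$, is unbiased for $\lambda'b$. Recall that $\lambda'\hat{b} = \lambda'(X'X)^{-}X'y = a'P_X y$, where $\lambda' = a'X$ and $P_X$ is the perpendicular projection matrix onto $C(X)$. Thus,

$$E(\lambda'\hat{b}) = E(a'P_X y) = a'P_X E(y) = a'P_X Xb = a'Xb = \lambda'b.$$

GOAL: Among all linear unbiased estimators for $\lambda'b$, we want to find the "best" linear unbiased estimator in the sense that it has the smallest variance. We will show that the least squares estimator $\lambda'\hat{b}$ is the best linear unbiased estimator (BLUE) of $\lambda'b$, provided that $\lambda'b$ is estimable and $\mathrm{cov}(e) = \sigma^2 I$.

4.2 The Gauss-Markov Theorem

Result GM1. Consider the Gauss-Markov model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 I$. Suppose that $\lambda'b$ is estimable and let $\hat{b}$ denote any solution to the normal equations $X'Xb = X'y$. The (ordinary) least squares estimator $\lambda'\hat{b}$ is the best linear unbiased estimator (BLUE) of $\lambda'b$; that is, the variance of $\lambda'\hat{b}$ is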
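uniformly less than that of any other linear unbiased estimator of $\lambda'b$.
Proof. Suppose that $\hat{\theta} = c + a'y$ is another linear unbiased estimator of $\lambda'b$. From Result E1, we know that $c = 0$ and that $\lambda' = a'X$. Thus, $\hat{\theta} = a'y$, where $\lambda' = a'X$. Now, write $\hat{\theta} = \lambda'\hat{b} + (\hat{\theta} - \lambda'\hat{b})$. Note that

$$\mathrm{var}(\hat{\theta}) = \mathrm{var}[\lambda'\hat{b} + (\hat{\theta} - \lambda'\hat{b})] = \mathrm{var}(\lambda'\hat{b}) + \underbrace{\mathrm{var}(\hat{\theta} - \lambda'\hat{b})}_{\geq 0} + 2\,\mathrm{cov}(\lambda'\hat{b}, \hat{\theta} - \lambda'\hat{b}).$$

We now show that $\mathrm{cov}(\lambda'\hat{b}, \hat{\theta} - \lambda'\hat{b}) = 0$. Recalling that $\lambda'\hat{b} = a'P_X y$, we have

$$\mathrm{cov}(\lambda'\hat{b}, \hat{\theta} - \lambda'\hat{b}) = \mathrm{cov}(a'P_X y, a'y - a'P_X y) = \mathrm{cov}[a'P_X y, a'(I - P_X)y] = a'P_X \mathrm{cov}(y, y)[a'(I - P_X)]' = \sigma^2 a'P_X(I - P_X)a = 0,$$

since $P_X(I - P_X) = 0$. Thus, $\mathrm{var}(\hat{\theta}) \geq \mathrm{var}(\lambda'\hat{b})$, showing that $\lambda'\hat{b}$ has variance no larger than that of $\hat{\theta}$. Equality results when $\mathrm{var}(\hat{\theta} - \lambda'\hat{b}) = 0$. However, if $\mathrm{var}(\hat{\theta} - \lambda'\hat{b}) = 0$, then because $E(\hat{\theta} - \lambda'\hat{b}) = 0$ as well, $\hat{\theta} - \lambda'\hat{b}$ is a degenerate random variable at 0; i.e., $\mathrm{pr}(\hat{\theta} = \lambda'\hat{b}) = 1$. This establishes uniqueness. □

MULTIVARIATE CASE: Suppose that we wish to estimate simultaneously $k$ estimable linear functions

$$\Lambda'b = \begin{pmatrix} \lambda_1'b \\ \lambda_2'b \\ \vdots \\ \lambda_k'b \end{pmatrix},$$

where $\lambda_i' = a_i'X$, for some $a_i$; $i = 1, 2, ..., k$. We say that $\Lambda'b$ is estimable if and only if $\lambda_i'b$, $i = 1, 2, ..., k$, are estimable. Put another way, $\Lambda'b$ is estimable if and only if $\Lambda' = A'X$, for some matrix $A$.

Result GM2. Consider the Gauss-Markov model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 I$. Suppose that $\Lambda'b$ is any $k$-dimensional estimable vector and that $c + A'y$ is any vector of linear unbiased estimators of the elements of $\Lambda'b$. Let $\hat{b}$ denote any solution to the normal equations. Then, the matrix $\mathrm{cov}(c + A'y) - \mathrm{cov}(\Lambda'\hat{b})$ is nonnegative definite.
Proof. It suffices to show that $x'[\mathrm{cov}(c + A'y) - \mathrm{cov}(\Lambda'\hat{b})]x \geq 0$, for all $x$. Note that

$$x'[\mathrm{cov}(c + A'y) - \mathrm{cov}(\Lambda'\hat{b})]x = x'\mathrm{cov}(c + A'y)x - x'\mathrm{cov}(\Lambda'\hat{b})x = \mathrm{var}(x'c + x'A'y) - \mathrm{var}(x'\Lambda'\hat{b}).$$

For any $x$, $x'\Lambda'b$ (a scalar parametric function) is estimable, and $x'c + x'A'y$ is a linear unbiased estimator of $x'\Lambda'b$. The least squares estimator of $x'\Lambda'b$ is $x'\Lambda'\hat{b}$. Thus, by Result GM1, $\mathrm{var}(x'c + x'A'y) - \mathrm{var}(x'\Lambda'\hat{b}) \geq 0$. □

OBSERVATION: Consider the Gauss-Markov model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 I$. Note that if $X$ is of full rank, then $X'X$ is nonsingular and every linear combination $\lambda'b$ is estimable. The (ordinary) least squares estimator of $b$ in this case is $\hat{b} = (X'X)^{-1}X'y$. It is unbiased and

$$\mathrm{cov}(\hat{b}) = \mathrm{cov}[(X'X)^{-1}X'y] = (X'X)^{-1}X'\mathrm{cov}(y)[(X'X)^{-1}X']' = (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$$

Example 4.1. Recall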
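the simple linear regression model

$$y_i = \beta_0 + \beta_1 x_i + e_i,$$

for $i = 1, 2, ..., N$, where $e_1, e_2, ..., e_N$ are uncorrelated random variables with mean 0 and common variance $\sigma^2 > 0$ (these are the Gauss-Markov assumptions). Recall that, in matrix notation,

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \quad b = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}.$$

In Example 2.1 (notes), we saw that the least squares estimator of $b$ is

$$\hat{b} = (X'X)^{-1}X'y = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \begin{pmatrix} \bar{y} - \hat{\beta}_1 \bar{x} \\ \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \end{pmatrix}.$$

Also,

$$\mathrm{cov}(\hat{b}) = \sigma^2 (X'X)^{-1} = \sigma^2 \begin{pmatrix} \dfrac{1}{N} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} & -\dfrac{\bar{x}}{\sum_i (x_i - \bar{x})^2} \\ -\dfrac{\bar{x}}{\sum_i (x_i - \bar{x})^2} & \dfrac{1}{\sum_i (x_i - \bar{x})^2} \end{pmatrix}.$$

4.3 Estimation of $\sigma^2$ in the GM model

REVIEW: Consider the Gauss-Markov model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 I$. We have seen the best linear unbiased estimator (BLUE) for any estimable function $\lambda'b$ is $\lambda'\hat{b}$, where $\hat{b}$ is any solution to the normal equations. Clearly, $E(y) = Xb$ is estimable and the BLUE of $E(y) = Xb$ is

$$X\hat{b} = X(X'X)^{-}X'y = P_X y = \hat{y},$$

the perpendicular projection of $y$ onto $C(X)$; that is, the fitted values from the least squares fit. The residuals are given by

$$\hat{e} = y - \hat{y} = y - P_X y = (I - P_X)y,$$

the perpendicular projection of $y$ onto $N(X')$. Recall that the residual sum of squares is

$$Q(\hat{b}) = \hat{e}'\hat{e} = y'(I - P_X)y.$$

We now turn our attention to estimating $\sigma^2$.

Result GM3. Suppose that $z$ is a random vector with mean $E(z) = \mu$ and covariance matrix $\mathrm{cov}(z) = \Sigma$. Let $A$ be nonrandom. Then

$$E(z'Az) = \mu'A\mu + \mathrm{tr}(A\Sigma).$$

Proof. Note that $z'Az$ is a scalar random variable; hence, $z'Az = \mathrm{tr}(z'Az)$. Also, recall that expectation $E(\cdot)$ and $\mathrm{tr}(\cdot)$ are linear operators. Finally, recall that $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ for any conformable $A$ and $B$. Now,

$$E(z'Az) = E[\mathrm{tr}(z'Az)] = E[\mathrm{tr}(Azz')] = \mathrm{tr}[A E(zz')] = \mathrm{tr}[A(\Sigma + \mu\mu')] = \mathrm{tr}(A\Sigma) + \mathrm{tr}(A\mu\mu') = \mathrm{tr}(A\Sigma) + \mathrm{tr}(\mu'A\mu) = \mathrm{tr}(A\Sigma) + \mu'A\mu. \ □$$

REMARK: In general, the formula for $\mathrm{var}(z'Az)$ is far more difficult; see Section 4.9 in Monahan. Considerable simplification results for $\mathrm{var}(z'Az)$ when $z$ follows a multivariate normal distribution.

APPLICATION: We now find an unbiased estimator of $\sigma^2$ under the GM model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 I$. Suppose that $y$ is $N \times 1$ and $X$ is $N \times p$ with rank $r \leq p$. Note that $E(y) = Xb$. Applying Result GM3 directly with $A = I - P_X$, we have

$$E[y'(I - P_X)y] = \underbrace{(Xb)'(I - P_X)Xb}_{= 0} + \mathrm{tr}[(I - P_X)\sigma^2 I] = \sigma^2[\mathrm{tr}(I) - \mathrm{tr}(P_X)] = \sigma^2[N - r(P_X)] = \sigma^2(N - r).$$

Thus, $\hat{\sigma}^2 = (N - r)^{-1}y'(I - P_X)y$ is an unbiased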
estimator of $\sigma^2$ in the GM model. In non-matrix notation,

$$\hat{\sigma}^2 = (N - r)^{-1} \sum_{i=1}^N (y_i - \hat{y}_i)^2,$$

where $\hat{y}_i$ is the least squares fitted value of $y_i$.

ANOVA: Consider the Gauss-Markov model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 I$. Suppose that $y$ is $N \times 1$ and $X$ is $N \times p$ with rank $r \leq p$. Recall from Chapter 2 the basic form of an ANOVA table:

Source      df       SS                        MS                    F
Model       $r$      SSR $= y'P_X y$           MSR $=$ SSR$/r$       $F =$ MSR/MSE
Residual    $N - r$  SSE $= y'(I - P_X)y$      MSE $=$ SSE$/(N - r)$
Total       $N$      SST $= y'y$

NOTES:

• Notice that the degrees of freedom associated with each SS is the rank of an appropriate perpendicular projection matrix; that is, $r(P_X) = r$ and $r(I - P_X) = N - r$.
• Note that

$$\mathrm{cov}(\hat{y}, \hat{e}) = \mathrm{cov}[P_X y, (I - P_X)y] = P_X \sigma^2 I (I - P_X) = 0.$$

That is, the least squares fitted values are uncorrelated with the residuals.
• We have just shown that $E(\mathrm{MSE}) = \sigma^2$. Note that

$$E(\mathrm{SSR}) = E(y'P_X y) = (Xb)'P_X Xb + \mathrm{tr}(P_X \sigma^2 I) = (Xb)'Xb + \sigma^2 r(P_X) = (Xb)'Xb + r\sigma^2.$$

Thus,

$$E(\mathrm{MSR}) = r^{-1}E(\mathrm{SSR}) = \sigma^2 + r^{-1}(Xb)'Xb.$$

• Consider the $F$ statistic defined by $F = \mathrm{MSR}/\mathrm{MSE}$. Note that if $E(y) = Xb = 0$; i.e., the predictor variables in $X$ do not add anything to the model, then MSR and MSE are both unbiased estimators of $\sigma^2$ and $F$ should be close to 1. Large values of $(Xb)'Xb$ produce, on average, large values of $F$.

4.4 The geometry of linear model misspecification

MISSPECIFICATION: Consider the Gauss-Markov model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 I$. Suppose that $y$ is $N \times 1$ and $X$ is $N \times p$ with rank $r \leq p$. No statistical model is ever truly "correct," and the GM model is no exception. We now investigate two instances of model misspecification: underfitting and overfitting.

4.4.1 Underfitting

UNDERFITTING: Suppose that, in truth, the correct model for $y$ is

$$y = Xb + W\delta + e,$$

where the vector $\eta = W\delta$ includes the variables and coefficients missing from $Xb$. Rarely will anyone ever tell us the true $W$. However, if the analyst uses $y = Xb + e$ instead to describe the data, she may be missing important variables that are in $W$; that is, the analyst is underfitting the true model. We now examine the effect of underfitting on the least squares estimates of estimable
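functions and the error variance $\sigma^2$.

CONSEQUENCES: Suppose that $\lambda'b$ is estimable under $y = Xb + e$; i.e., $\lambda' = a'X$, for some vector $a$. The least squares estimator of $\lambda'b$ is given by $\lambda'\hat{b} = \lambda'(X'X)^{-}X'y$. If $W\delta = 0$, then $E(\lambda'\hat{b}) = \lambda'b$. If $W\delta \neq 0$, then, under the correct model,

$$E(\lambda'\hat{b}) = E[\lambda'(X'X)^{-}X'y] = \lambda'(X'X)^{-}X'E(y) = \lambda'(X'X)^{-}X'(Xb + W\delta)$$
$$= a'X(X'X)^{-}X'Xb + a'X(X'X)^{-}X'W\delta = a'P_X Xb + a'P_X W\delta = \lambda'b + a'P_X W\delta,$$

showing that $\lambda'\hat{b}$ is no longer unbiased, in general. The amount of the bias depends on where $W\delta$ is located. If $W\delta$ is orthogonal to $C(X)$, then $P_X W\delta = 0$ and the estimation of $\lambda'b$ with $\lambda'\hat{b}$ is unaffected. Otherwise, $P_X W\delta \neq 0$ and the estimate of $\lambda'b$ is biased.

Now, let's turn to the estimation of $\sigma^2$. Under the correct model,

$$E[y'(I - P_X)y] = (Xb + W\delta)'(I - P_X)(Xb + W\delta) + \mathrm{tr}[(I - P_X)\sigma^2 I] = (W\delta)'(I - P_X)W\delta + \sigma^2(N - r),$$

where $r = r(X)$. Thus,

$$E(\mathrm{MSE}) = \sigma^2 + (N - r)^{-1}(W\delta)'(I - P_X)W\delta;$$

that is, $\hat{\sigma}^2 = \mathrm{MSE}$ is unbiased if and only if $W\delta \in C(X)$.

4.4.2 Overfitting

OVERFITTING: Suppose that, in truth, the correct model for $y$ is

$$y = X_1 b_1 + e,$$

but, instead, we fit

$$y = X_1 b_1 + X_2 b_2 + e;$$

that is, the extra covariables in $X_2$ are not needed; i.e., $b_2 = 0$. Set $X = [X_1 \ X_2]$ and suppose that $X$ and $X_1$ have full column rank (i.e., a regression setting). The least squares estimator of $b_1$ under the true model is $\tilde{b}_1 = (X_1'X_1)^{-1}X_1'y$. We know that

$$E(\tilde{b}_1) = b_1, \quad \mathrm{cov}(\tilde{b}_1) = \sigma^2 (X_1'X_1)^{-1}.$$

On the other hand, the normal equations associated with the larger (unnecessarily large) model are

$$X'Xb = X'y \iff \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix},$$

and the least squares estimator of $b$ is

$$\hat{b} = \begin{pmatrix} \hat{b}_1 \\ \hat{b}_2 \end{pmatrix} = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1} \begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}.$$

The least squares estimator $\hat{b}$ is still unbiased for the larger model; i.e., $E(\hat{b}_1) = b_1$ and $E(\hat{b}_2) = 0$. Thus, we assess the impact of overfitting by looking at $\mathrm{cov}(\hat{b}_1)$. Under the larger model,

$$\mathrm{cov}(\hat{b}_1) = \sigma^2 (X_1'X_1)^{-1} + \sigma^2 (X_1'X_1)^{-1}X_1'X_2[X_2'(I - P_{X_1})X_2]^{-1}X_2'X_1(X_1'X_1)^{-1};$$

see the results of Exercise A.72 in Monahan. Thus,

$$\mathrm{cov}(\hat{b}_1) - \mathrm{cov}(\tilde{b}_1) = \sigma^2 (X_1'X_1)^{-1}X_1'X_2[X_2'(I - P_{X_1})X_2]^{-1}X_2'X_1(X_1'X_1)^{-1}.$$

• Note that if the columns of $X_2$ are each orthogonal to $C(X_1)$, then $X_1'X_2 = 0$ and $\mathrm{cov}(\hat{b}_1) - \mathrm{cov}(\tilde{b}_1) = 0$. Furthermore,

$$X'X = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix} = \begin{pmatrix} X_1'X_1 & 0 \\ 0 & X_2'X_2 \end{pmatrix};$$

i.e., $X'X$ is block diagonal, and $\hat{b}_1 = (X_1'X_1)^{-1}X_1'y = \tilde{b}_1$.
• Now, if the columns of $X_2$ are not all orthogonal to $C(X_1)$,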
then $\mathrm{cov}(\hat{b}_1) \neq \sigma^2 (X_1'X_1)^{-1}$. Furthermore, as $X_2$ gets "closer" to $C(X_1)$, then $X_2'(I - P_{X_1})X_2$ gets "smaller." This makes $[X_2'(I - P_{X_1})X_2]^{-1}$ "larger." This makes $\mathrm{cov}(\hat{b}_1)$ "larger."
• Multicollinearity occurs when $X_2$ is "close" to $C(X_1)$. Severe multicollinearity can greatly inflate the variances of the least squares estimates. In turn, this can have a deleterious effect on inference (e.g., confidence intervals too wide, hypothesis tests with no power, predicted values with little precision, etc.). Various diagnostic measures exist to assess multicollinearity (e.g., VIFs, condition numbers, etc.); see the discussion in Monahan, pp. 80-82.

4.5 The Aitken model and generalized least squares

AITKEN MODEL: The model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 V$, $V$ known, is called the Aitken model. It is more flexible than the Gauss-Markov (GM) model, because the analyst can incorporate correlation among the observed responses. The GM model is a special case with $V = I$; that is, responses are assumed to be uncorrelated.

NOTES: In practice, $V$ is rarely known; $V$ must be estimated. More on this later. Also, we will assume that $V$ is positive definite (pd), and, hence, nonsingular, for reasons that will soon be obvious. Generalizations are possible; see Christensen (Chapter 10).

RECALL: Because $V$ is symmetric, we can write $V$ in its Spectral Decomposition; i.e., $V = QDQ'$, where $Q$ is orthogonal and $D$ is the diagonal matrix consisting of $\lambda_1, \lambda_2, ..., \lambda_N$, the eigenvalues of $V$. Because $V$ is pd, we know that $\lambda_i > 0$, for each $i = 1, 2, ..., N$. The symmetric square root of $V$ is

$$V^{1/2} = QD^{1/2}Q',$$

where $D^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \sqrt{\lambda_2}, ..., \sqrt{\lambda_N})$. Note that $V^{1/2}V^{1/2} = V$ and that $V^{-1} = V^{-1/2}V^{-1/2}$, where $V^{-1/2} = QD^{-1/2}Q'$ and $D^{-1/2} = \mathrm{diag}(1/\sqrt{\lambda_1}, 1/\sqrt{\lambda_2}, ..., 1/\sqrt{\lambda_N})$.

TRANSFORMATION: Consider the Aitken model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 V$, $V$ known. Premultiplying by $V^{-1/2}$, we get the "transformed" model

$$V^{-1/2}y = V^{-1/2}Xb + V^{-1/2}e.$$

Now, set $y^* = V^{-1/2}y$, $U = V^{-1/2}X$, and $e^* = V^{-1/2}e$, so that the transformed model can be written as $y^* = Ub + e^*$. It is easy to show that $y^* = Ub + e^*$ is now a GM model. To see this, note that

$$E(e^*) = V^{-1/2}E(e) = V^{-1/2}0 = 0$$

and

$$\mathrm{cov}(e^*) = V^{-1/2}\mathrm{cov}(e)V^{-1/2} = V^{-1/2}\sigma^2 V V^{-1/2} = \sigma^2 I.$$

Note also that $R(X) = R(U)$, because $V^{-1/2}$
is nonsingular. This means that $\lambda'b$ is estimable in the Aitken model if and only if $\lambda'b$ is estimable in the transformed GM model. The covariance structure on $e$ does not affect estimability.

AITKEN EQUATIONS: In the transformed model, the normal equations are

$$U'Ub = U'y^*.$$

Note that

$$U'Ub = U'y^* \iff (V^{-1/2}X)'V^{-1/2}Xb = (V^{-1/2}X)'V^{-1/2}y \iff X'V^{-1}Xb = X'V^{-1}y$$

in the Aitken model. The equations $X'V^{-1}Xb = X'V^{-1}y$ are called the Aitken equations. These should be compared with the normal equations $X'Xb = X'y$ in the GM model. In general, we will denote by $\hat{b}_{GLS}$ and $\hat{b}_{OLS}$ the solutions to the Aitken and normal equations, respectively. "GLS" stands for generalized least squares. "OLS" stands for ordinary least squares.

GENERALIZED LEAST SQUARES: Any solution $\hat{b}_{GLS}$ to the Aitken equations is called a generalized least squares (GLS) estimator of $b$. It is not necessarily unique (unless $X$ is full rank). The solution $\hat{b}_{GLS}$ minimizes

$$Q^*(b) = (y^* - Ub)'(y^* - Ub) = (y - Xb)'V^{-1}(y - Xb).$$

When $X$ is full rank, the unique GLS estimator is

$$\hat{b}_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}y.$$

When $X$ is not full rank, a GLS estimator is

$$\hat{b}_{GLS} = (X'V^{-1}X)^{-}X'V^{-1}y.$$

NOTE: When $V$ is diagonal; i.e., $V = \mathrm{diag}(v_1, v_2, ..., v_N)$, then

$$Q^*(b) = (y - Xb)'V^{-1}(y - Xb) = \sum_{i=1}^N w_i (y_i - x_i'b)^2,$$

where $w_i = 1/v_i$ and $x_i'$ is the $i$th row of $X$. In this situation, $\hat{b}_{GLS}$ is called a weighted least squares estimator.

Result GM4. Consider the Aitken model $y = Xb + e$, where $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 V$, $V$ known. If $\lambda'b$ is estimable, then $\lambda'\hat{b}_{GLS}$ is the BLUE for $\lambda'b$.
Proof. Applying the GM Theorem to the transformed model $y^* = Ub + e^*$, the GLS estimator $\lambda'\hat{b}_{GLS}$ is the BLUE of $\lambda'b$ among all linear unbiased estimators involving $y^* = V^{-1/2}y$. However, any linear combination of $y$ can be obtained from $y^*$ because $V^{-1/2}$ is invertible. Thus, $\lambda'\hat{b}_{GLS}$ is the BLUE. □

REMARK: If $X$ is full rank, then estimability concerns vanish (as in the GM model) and $\hat{b}_{GLS}$ is unique. In this case, straightforward calculations show that $E(\hat{b}_{GLS}) = b$ and

$$\mathrm{cov}(\hat{b}_{GLS}) = \sigma^2 (X'V^{-1}X)^{-1}.$$

Example 4.2. Heteroscedastic regression through the origin. Consider the regression model $y_i = \beta x_i + e_i$, for $i = 1, 2, ..., N$, where $E(e_i) = 0$, $\mathrm{var}(e_i) = \sigma^2 g^2(x_i)$, for some real
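function $g(\cdot)$, and $\mathrm{cov}(e_i, e_j) = 0$, for $i \neq j$. For this model,

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix}, \quad \text{and} \quad V = \begin{pmatrix} g^2(x_1) & 0 & \cdots & 0 \\ 0 & g^2(x_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g^2(x_N) \end{pmatrix}.$$

The OLS estimator is given by

$$\hat{\beta}_{OLS} = (X'X)^{-1}X'y = \frac{\sum_{i=1}^N x_i y_i}{\sum_{i=1}^N x_i^2}.$$

The GLS estimator is given by

$$\hat{\beta}_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}y = \frac{\sum_{i=1}^N x_i y_i / g^2(x_i)}{\sum_{i=1}^N x_i^2 / g^2(x_i)}.$$

Which one is better? Both of these estimators are unbiased, so we turn to the variances. Straightforward calculations show that

$$\mathrm{var}(\hat{\beta}_{OLS}) = \frac{\sigma^2 \sum_{i=1}^N x_i^2 g^2(x_i)}{\left(\sum_{i=1}^N x_i^2\right)^2} \quad \text{and} \quad \mathrm{var}(\hat{\beta}_{GLS}) = \frac{\sigma^2}{\sum_{i=1}^N x_i^2 / g^2(x_i)}.$$

We are thus left to compare

$$\frac{\sum_{i=1}^N x_i^2 g^2(x_i)}{\left(\sum_{i=1}^N x_i^2\right)^2} \quad \text{with} \quad \frac{1}{\sum_{i=1}^N x_i^2 / g^2(x_i)}.$$

Write $x_i^2 = u_i v_i$, where $u_i = x_i g(x_i)$ and $v_i = x_i / g(x_i)$. Applying a version of the Cauchy-Schwarz inequality, we get

$$\left(\sum_{i=1}^N x_i^2\right)^2 = \left(\sum_{i=1}^N u_i v_i\right)^2 \leq \sum_{i=1}^N u_i^2 \sum_{i=1}^N v_i^2 = \sum_{i=1}^N x_i^2 g^2(x_i) \sum_{i=1}^N \frac{x_i^2}{g^2(x_i)}.$$

Thus,

$$\frac{1}{\sum_{i=1}^N x_i^2 / g^2(x_i)} \leq \frac{\sum_{i=1}^N x_i^2 g^2(x_i)}{\left(\sum_{i=1}^N x_i^2\right)^2} \implies \mathrm{var}(\hat{\beta}_{GLS}) \leq \mathrm{var}(\hat{\beta}_{OLS}).$$

This result should not be surprising; after all, we know that $\hat{\beta}_{GLS}$ is BLUE. □

Result GM5. An estimate $\hat{b}$ is a generalized least squares estimate if and only if $X\hat{b} = Ay$, where $A = X(X'V^{-1}X)^{-}X'V^{-1}$.
Proof. The GLS estimate; i.e., the OLS estimate in the transformed model $y^* = Ub + e^*$, where $y^* = V^{-1/2}y$, $U = V^{-1/2}X$, and $e^* = V^{-1/2}e$, satisfies

$$V^{-1/2}X[(V^{-1/2}X)'V^{-1/2}X]^{-}(V^{-1/2}X)'V^{-1/2}y = V^{-1/2}X\hat{b},$$

by Result LS5. Multiplying through by $V^{1/2}$ and simplifying gives the result. □

Result GM6. $A = X(X'V^{-1}X)^{-}X'V^{-1}$ is a projection matrix onto $C(X)$.
Proof. We need to show that (a) $A$ is idempotent, (b) $Aw \in C(X)$, for any $w$, and (c) $Az = z$, for all $z \in C(X)$. The perpendicular projection matrix onto $C(V^{-1/2}X)$ is

$$V^{-1/2}X[(V^{-1/2}X)'V^{-1/2}X]^{-}(V^{-1/2}X)',$$

which implies that

$$V^{-1/2}X[(V^{-1/2}X)'V^{-1/2}X]^{-}(V^{-1/2}X)'V^{-1/2}X = V^{-1/2}X.$$

This can also be written as $V^{-1/2}AX = V^{-1/2}X$. Premultiplying by $V^{1/2}$ gives $AX = X$. Thus,

$$AA = AX(X'V^{-1}X)^{-}X'V^{-1} = X(X'V^{-1}X)^{-}X'V^{-1} = A,$$

showing that $A$ is idempotent. To show (b), note $Aw = X(X'V^{-1}X)^{-}X'V^{-1}w \in C(X)$. To show (c), it suffices to show $C(A) = C(X)$. But, $A = X(X'V^{-1}X)^{-}X'V^{-1}$ implies that $C(A) \subset C(X)$ and $AX = X$ implies that $C(X) \subset C(A)$. □

Result GM7. In the Aitken model, if $C(VX) \subset C(X)$, then the GLS and OLS estimates will be equal; i.e., OLS estimates will be BLUE in the Aitken model.
Proof. The proof proceeds by showing that $A = X(X'V^{-1}X)^{-}X'V^{-1}$ is the perpendicular projection matrix onto $C(X)$ when $C(VX) \subset C(X)$. We already know that $A$ is a projection matrix onto $C(X)$. Thus, all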
we have to show is that if $w \perp C(X)$, then $Aw = 0$. If $V$ is nonsingular, then $r(VX) = r(X)$. The only way this and $C(VX) \subset C(X)$ holds is if $C(VX) = C(X)$, in which case $VXB_1 = X$ and $VX = XB_2$, for some matrices $B_1$ and $B_2$. Multiplying through by $V^{-1}$ gives $XB_1 = V^{-1}X$ and $X = V^{-1}XB_2$. Thus, $C(V^{-1}X) = C(X)$ and $C(V^{-1}X)^{\perp} = C(X)^{\perp}$. If $w \perp C(X)$, then $w \perp C(V^{-1}X)$; i.e., $w \in N(X'V^{-1})$. Since $Aw = X(X'V^{-1}X)^{-}X'V^{-1}w = 0$, we are done. □

5 Distributional Theory

Complementary reading from Monahan: Chapter 5.

5.1 Introduction

REMARK: Consider the Gauss-Markov linear model $y = Xb + e$, where $y$ is $N \times 1$, $X$ is $N \times p$ with rank $r \leq p$, $b$ is $p \times 1$, and $e$ is $N \times 1$ with $E(e) = 0$ and $\mathrm{cov}(e) = \sigma^2 I$. In addition to the first two moment assumptions, it is common to assume that $e$ follows a multivariate normal distribution. This additional assumption allows us to formally pursue various questions dealing with inference. In addition to the multivariate normal distribution, we will also examine noncentral distributions and quadratic forms.

5.2 Univariate normal distribution

RECALL: If $z \sim N(0, 1)$, then the probability density function (pdf) of $z$ is given by

$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} I(z \in \mathbb{R}).$$

The $N(\mu, \sigma^2)$ family is a location-scale family generated by the standard density $f_Z(z)$.

TERMINOLOGY: The collection of pdfs

$$\mathcal{LS}(f) = \left\{ f_X(\cdot \mid \mu, \sigma) : f_X(x \mid \mu, \sigma) = \frac{1}{\sigma} f_Z\left(\frac{x - \mu}{\sigma}\right); \ \mu \in \mathbb{R}, \ \sigma > 0 \right\}$$

is a location-scale family generated by $f_Z(z)$; see Casella and Berger, Chapter 3. That is, if $z \sim f_Z(z)$, then

$$x = \sigma z + \mu \sim f_X(x \mid \mu, \sigma) = \frac{1}{\sigma} f_Z\left(\frac{x - \mu}{\sigma}\right).$$

APPLICATION: With the standard normal density $f_Z(z)$, it is easy to see that

$$f_X(x \mid \mu, \sigma) = \frac{1}{\sigma} f_Z\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{1}{2\sigma^2}(x - \mu)^2} I(x \in \mathbb{R}).$$

That is, any normal random variable $x \sim N(\mu, \sigma^2)$ may be obtained by transforming $z \sim N(0, 1)$ via $x = \sigma z + \mu$.

5.3 Multivariate normal distribution

STARTING POINT: Suppose that $z_1, z_2, ..., z_p$ are iid standard normal random variables. The joint pdf of $z = (z_1, z_2, ..., z_p)'$ is given by

$$f_Z(z) = \prod_{i=1}^p f_Z(z_i) = \left(\frac{1}{\sqrt{2\pi}}\right)^p e^{-\sum_{i=1}^p z_i^2/2} \prod_{i=1}^p I(z_i \in \mathbb{R}) = (2\pi)^{-p/2} \exp(-z'z/2) I(z \in \mathbb{R}^p).$$

If $z$ has pdf $f_Z(z)$, we say that $z$ has a standard multivariate normal distribution; i.e., a multivariate normal distribution with mean $0_{p \times 1}$ and covariance matrix
$I_p$. We write $z \sim N_p(0, I)$.

MULTIVARIATE NORMAL DISTRIBUTION: Suppose that $z \sim N_p(0, I)$. Suppose that $V$ is symmetric and positive definite (and, hence, nonsingular) and let $V^{1/2}$ be the symmetric square root of $V$ (see Chapter 1 notes). Define the transformation

$$y = V^{1/2}z + \mu,$$

where $y$ and $\mu$ are both $p \times 1$. Note that

$$E(y) = E(V^{1/2}z + \mu) = \mu,$$

since $E(z) = 0$, and

$$\mathrm{cov}(y) = \mathrm{cov}(V^{1/2}z + \mu) = V^{1/2}\mathrm{cov}(z)V^{1/2} = V,$$

since $\mathrm{cov}(z) = I$. The transformation $y = g(z) = V^{1/2}z + \mu$ is linear in $z$ (hence, one-to-one) and the pdf of $y$ can be found using a transformation. The inverse transformation is $z = g^{-1}(y) = V^{-1/2}(y - \mu)$. The Jacobian of the inverse transformation is

$$\left| \frac{\partial g^{-1}(y)}{\partial y} \right| = |V^{-1/2}|,$$

where $|A|$ denotes the determinant of $A$. The matrix $V^{-1/2}$ is pd; thus, its determinant is always positive (see Result M27). Thus, for $y \in \mathbb{R}^p$,

$$f_Y(y) = f_Z\{g^{-1}(y)\}|V^{-1/2}| = |V|^{-1/2} f_Z\{V^{-1/2}(y - \mu)\}$$
$$= (2\pi)^{-p/2}|V|^{-1/2} \exp[-\{V^{-1/2}(y - \mu)\}'V^{-1/2}(y - \mu)/2] = (2\pi)^{-p/2}|V|^{-1/2} \exp\{-(y - \mu)'V^{-1}(y - \mu)/2\}.$$

If $y \sim f_Y(y)$, we say that $y$ has a multivariate normal distribution with mean $\mu$ and covariance matrix $V$. We write $y \sim N_p(\mu, V)$.

IMPORTANT: In the preceding derivation, we needed $V$ to be pd (hence, nonsingular). If $V$ is singular, then the distribution of $y$ is concentrated in a subspace of $\mathbb{R}^p$, with dimension $r(V)$. In this situation, the density function of $y$ does not exist.

5.4 Moment generating functions

REVIEW: Suppose that $x$ is a random variable with cumulative distribution function $F_X(t) = P(x \leq t)$. If $E(e^{tx}) < \infty$ for all $|t| < \delta$, $\exists \delta > 0$, then

$$M_X(t) = E(e^{tx}) = \int_{\mathbb{R}} e^{tx} dF_X(x)$$

is defined for $t$ in some open neighborhood about zero. The function $M_X(t)$ is called the moment generating function (mgf) of $x$.

Result MVN1.
1. If $M_X(t)$ exists, then $E(|x|^j) < \infty$, for all $j \geq 1$; that is, the moment generating function characterizes an infinite set of moments.
2. $M_X(0) = 1$.
3. The $j$th moment of $x$ is given by
$$E(x^j) = \left. \frac{\partial^j M_X(t)}{\partial t^j} \right|_{t=0}.$$
4. Uniqueness. If $x_1 \sim M_{X_1}(t)$, $x_2 \sim M_{X_2}(t)$, and $M_{X_1}(t) = M_{X_2}(t)$ for all $t$ in an open neighborhood about zero, then $F_{X_1}(x) = F_{X_2}(x)$ for all $x$.
5. If $x_1, x_2, ..., x_n$ are independent random variables with mgfs $M_{X_i}(t)$, $i = 1, 2, ..., n$, and $y = a_0 + \sum_{i=1}^n a_i x_i$, then
$$M_Y(t) = e^{a_0 t} \prod_{i=1}^n M_{X_i}(a_i t).$$

Result MVN2.
1.
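If $x \sim N(\mu, \sigma^2)$, then the mgf of $x$ is given by $M_X(t) = \exp(\mu t + \sigma^2 t^2/2)$, for all $t \in \mathbb{R}$.
2. If $x \sim N(\mu, \sigma^2)$, then $y = a + bx \sim N(a + b\mu, b^2\sigma^2)$.
3. If $x \sim N(\mu, \sigma^2)$, then $z = \frac{1}{\sigma}(x - \mu) \sim N(0, 1)$.

TERMINOLOGY: Define the random vector $x = (x_1, x_2, ..., x_p)'$ and let $t = (t_1, t_2, ..., t_p)'$. The moment generating function for $x$ is given by

$$M_x(t) = E\{\exp(t'x)\} = \int_{\mathbb{R}^p} \exp(t'x) dF_x(x),$$

provided that $E\{\exp(t'x)\} < \infty$, for all $\|t\| < \delta$, $\exists \delta > 0$.

Result MVN3.
1. If $M_x(t)$ exists, then $M_{x_i}(t_i) = M_x(t_i^*)$, where $t_i^* = (0, ..., 0, t_i, 0, ..., 0)'$. This implies that $E(|x_i|^j) < \infty$, for all $j \geq 1$.
2. The expected value of $x$ is
$$E(x) = \left. \frac{\partial M_x(t)}{\partial t} \right|_{t=0}.$$
3. The $p \times p$ second moment matrix
$$E(xx') = \left. \frac{\partial^2 M_x(t)}{\partial t \partial t'} \right|_{t=0}.$$
Thus,
$$E(x_r x_s) = \left. \frac{\partial^2 M_x(t)}{\partial t_r \partial t_s} \right|_{t=0}.$$
4. Uniqueness. If $x_1$ and $x_2$ are random vectors with $M_{x_1}(t) = M_{x_2}(t)$ for all $t$ in an open neighborhood about zero, then $F_{x_1}(x) = F_{x_2}(x)$ for all $x$.
5. If $x_1, x_2, ..., x_n$ are independent random vectors, and
$$y = a_0 + \sum_{i=1}^n A_i x_i,$$
for conformable $a_0$ and $A_i$; $i = 1, 2, ..., n$, then
$$M_y(t) = \exp(a_0't) \prod_{i=1}^n M_{x_i}(A_i't).$$
6. Let $x = (x_1', x_2', ..., x_m')'$ and suppose that $M_x(t)$ exists. Let $M_{x_i}(t_i)$ denote the mgf of $x_i$. Then, $x_1, x_2, ..., x_m$ are independent if and only if
$$M_x(t) = \prod_{i=1}^m M_{x_i}(t_i)$$
for all $t = (t_1', t_2', ..., t_m')'$ in an open neighborhood about zero.

Result MVN4. If $y \sim N_p(\mu, V)$, then $M_y(t) = \exp(t'\mu + t'Vt/2)$.
Proof. Start with $z_1, z_2, ..., z_p$ iid $N(0, 1)$. Use Result MVN3.6 to find the mgf of $z = (z_1, z_2, ..., z_p)'$. Then use Result MVN3.5 to find $M_y(t)$. □

5.5 Properties of the multivariate normal distribution

5.5.1 Linear transformations

Result MVN5. Let $y \sim N_p(\mu, V)$. Let $a$ be $p \times 1$, $b$ be $k \times 1$, and $A$ be $k \times p$. Then
1. $x = a'y \sim N(a'\mu, a'Va)$.
2. $x = Ay + b \sim N_k(A\mu + b, AVA')$.

EXERCISE: If $y \sim N_p(\mu, V)$, where $V$ is positive definite, show that $z = V^{-1/2}(y - \mu) \sim N_p(0, I)$. Hint: Use Result MVN5.2.

Result MVN6. If $y \sim N_p(\mu, V)$, then any $r \times 1$ subvector of $y$ has an $r$-variate normal distribution with the same means, variances, and covariances as the original $N_p(\mu, V)$ distribution.
Proof. Partition $y = (y_1', y_2')'$, where $y_1$ is $r \times 1$. Partition $\mu = (\mu_1', \mu_2')'$ and

$$V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$$

accordingly. Define $A = (I \ 0)$, where $0$ is $r \times (p - r)$. $y_1 = Ay$ is a linear function of $y$ and is hence normally distributed. Since $A\mu = \mu_1$ and $AVA' = V_{11}$, we are done. □

COROLLARY: If $y \sim N_p(\mu, V)$, then $y_i \sim N(\mu_i, \sigma_i^2)$, for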
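$i = 1, 2, ..., p$.

WARNING: Joint normality implies marginal normality. That is, if $y_1$ and $y_2$ are jointly normal, then they are marginally normal. However, if $y_1$ and $y_2$ are marginally normal, this does not necessarily mean that they are jointly normal.

APPLICATION: Consider our general linear model $y = Xb + e$, where $e \sim N_N(0, \sigma^2 I)$. Note that $E(y) = Xb$ and that $V = \mathrm{cov}(y) = \sigma^2 I$. Furthermore, because $y$ is a linear combination of $e$, it is also normally distributed; i.e., $y \sim N_N(Xb, \sigma^2 I)$. With $P_X = X(X'X)^{-}X'$, we know that $\hat{y} = P_X y$ and $\hat{e} = (I - P_X)y$. Now,

$$E(\hat{y}) = E(P_X y) = P_X E(y) = P_X Xb = Xb$$

and

$$\mathrm{cov}(\hat{y}) = \mathrm{cov}(P_X y) = P_X \mathrm{cov}(y) P_X' = \sigma^2 P_X I P_X = \sigma^2 P_X,$$

since $P_X$ is symmetric and idempotent. Also, $\hat{y} = P_X y$ is a linear combination of $y$, so it also has a normal distribution. Putting everything together, we have found that

$$\hat{y} \sim N_N(Xb, \sigma^2 P_X).$$

EXERCISE: Show that $\hat{e} \sim N_N\{0, \sigma^2(I - P_X)\}$.

5.5.2 Less-than-full-rank normal distributions

TERMINOLOGY: The random vector $y_{p \times 1} \sim N_p(\mu, V)$ is said to have a $p$-variate normal distribution with rank $k$ if $y$ has the same distribution as $\mu_{p \times 1} + \Gamma'_{p \times k} z_{k \times 1}$, where $\Gamma'\Gamma = V$, $r(V) = k < p$, and $z \sim N_k(0, I)$.

Example 5.1. Suppose that $k = 1$, $z_1 \sim N(0, 1)$, and $y = (y_1, y_2)'$, where

$$y = \begin{pmatrix} 0 \\ 0 \end{pmatrix} + \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix} z_1 = \begin{pmatrix} \gamma_1 z_1 \\ \gamma_2 z_1 \end{pmatrix},$$

where $\Gamma = (\gamma_1 \ \gamma_2)$ and $r(\Gamma) = 1$. Since $r(\Gamma) = 1$, this means that at least one of $\gamma_1$ and $\gamma_2$ is not equal to zero. Without loss, take $\gamma_1 \neq 0$, in which case

$$y_2 = \frac{\gamma_2}{\gamma_1} y_1.$$

Note that $E(y) = 0 = (0, 0)'$ and

$$\mathrm{cov}(y) = E(yy') = E\begin{pmatrix} \gamma_1^2 z_1^2 & \gamma_1\gamma_2 z_1^2 \\ \gamma_1\gamma_2 z_1^2 & \gamma_2^2 z_1^2 \end{pmatrix} = \begin{pmatrix} \gamma_1^2 & \gamma_1\gamma_2 \\ \gamma_1\gamma_2 & \gamma_2^2 \end{pmatrix} = \Gamma'\Gamma = V.$$

Note that $|V| = 0$. Thus, $y_{2 \times 1}$ is a random vector with all of its probability mass located in the linear subspace $\{(y_1, y_2) : y_2 = \gamma_2 y_1/\gamma_1\}$. Since $r(V) = 1 < 2$, $y$ does not have a density function. □

5.5.3 Independence results

Result MVN7. Suppose that $y \sim N(\mu, V)$, where

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{pmatrix}, \quad \text{and} \quad V = \begin{pmatrix} V_{11} & V_{12} & \cdots & V_{1m} \\ V_{21} & V_{22} & \cdots & V_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ V_{m1} & V_{m2} & \cdots & V_{mm} \end{pmatrix}.$$

Then, $y_1, y_2, ..., y_m$ are jointly independent if and only if $V_{ij} = 0$, for all $i \neq j$.
Proof. Sufficiency ($\Longrightarrow$): Suppose $y_1, y_2, ..., y_m$ are jointly independent. For all $i \neq j$,

$$V_{ij} = E\{(y_i - \mu_i)(y_j - \mu_j)'\} = E(y_i - \mu_i)E\{(y_j - \mu_j)'\} = 0.$$

Necessity ($\Longleftarrow$): Suppose that $V_{ij} = 0$ for all $i \neq j$, and let $t = (t_1', t_2', ..., t_m')'$. Note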
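that

$$t'Vt = \sum_{i=1}^m t_i'V_{ii}t_i \quad \text{and} \quad t'\mu = \sum_{i=1}^m t_i'\mu_i.$$

Thus,

$$M_y(t) = \exp(t'\mu + t'Vt/2) = \exp\left(\sum_{i=1}^m t_i'\mu_i + \frac{1}{2}\sum_{i=1}^m t_i'V_{ii}t_i\right) = \prod_{i=1}^m \exp(t_i'\mu_i + t_i'V_{ii}t_i/2) = \prod_{i=1}^m M_{y_i}(t_i). \ □$$

Result MVN8. Suppose that $x \sim N(\mu, \Sigma)$, and let $y_1 = a_1 + B_1 x$ and $y_2 = a_2 + B_2 x$, for nonrandom conformable $a_i$ and $B_i$; $i = 1, 2$. Then, $y_1$ and $y_2$ are independent if and only if $B_1 \Sigma B_2' = 0$.
Proof. Write $y = (y_1', y_2')'$ as

$$y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} + \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} x = a + Bx.$$

Thus, $y$ is a linear combination of $x$; hence, $y$ follows a multivariate normal distribution (i.e., $y_1$ and $y_2$ are jointly normal). Also, $\mathrm{cov}(y_1, y_2) = \mathrm{cov}(B_1 x, B_2 x) = B_1 \Sigma B_2'$. Now simply apply Result MVN7. □

REMARK: If $x_1 \sim N(\mu_1, \Sigma_1)$, $x_2 \sim N(\mu_2, \Sigma_2)$, and $\mathrm{cov}(x_1, x_2) = 0$, this does not necessarily mean that $x_1$ and $x_2$ are independent. We need $x = (x_1', x_2')'$ to be jointly normal.

APPLICATION: Consider the general linear model $y = Xb + e$, where $e \sim N_N(0, \sigma^2 I)$. We have already seen that $y \sim N_N(Xb, \sigma^2 I)$. Also, note that with $P_X = X(X'X)^{-}X'$,

$$\begin{pmatrix} \hat{y} \\ \hat{e} \end{pmatrix} = \begin{pmatrix} P_X \\ I - P_X \end{pmatrix} y,$$

a linear combination of $y$. Thus, $\hat{y}$ and $\hat{e}$ are jointly normal. By the last result, we know that $\hat{y}$ and $\hat{e}$ are independent since

$$\mathrm{cov}(\hat{y}, \hat{e}) = P_X \sigma^2 I (I - P_X)' = 0.$$

That is, the fitted values and residuals from the least squares fit are independent. This explains why residual plots (i.e., plots of residuals versus fitted values) that exhibit random patterns support the validity of the model. If a residual plot exhibits a nonrandom pattern, this would suggest that our model assumptions are not appropriate.

5.5.4 Conditional distributions

RECALL: Suppose that $(y, x)' \sim N_2(\mu, \Sigma)$, where

$$\mu = \begin{pmatrix} \mu_Y \\ \mu_X \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_Y^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_X^2 \end{pmatrix},$$

and $\rho = \mathrm{corr}(x, y)$. The conditional distribution of $y$, given $x$, is also normally distributed; more precisely,

$$y \mid x \sim N\{\mu_Y + \rho(\sigma_Y/\sigma_X)(x - \mu_X), \ \sigma_Y^2(1 - \rho^2)\}.$$

It is important to see that the conditional mean $E(y \mid x)$ is a linear function of $x$. Note also that the conditional variance $\mathrm{var}(y \mid x)$ is free of $x$.

EXTENSION: We wish to extend the previous result to random vectors. In particular, suppose that $y$ and $x$ are jointly multivariate normal with $\Sigma_{XY} \neq 0$. That is, suppose

$$\begin{pmatrix} y \\ x \end{pmatrix} \sim N\left\{ \begin{pmatrix} \mu_Y \\ \mu_X \end{pmatrix}, \begin{pmatrix} \Sigma_Y & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_X \end{pmatrix} \right\},$$

and assume that $\Sigma_X$ is nonsingular. The conditional distribution of $y$ given $x$ is

$$y \mid x \sim N(\mu_{Y|X}, \Sigma_{Y|X}),$$

where

$$\mu_{Y|X} = \mu_Y + \Sigma_{YX}\Sigma_X^{-1}(x - \mu_X)$$

and

$$\Sigma_{Y|X} = \Sigma_Y - \Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}.$$

Again, the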
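conditional mean $\mu_{Y|X}$ is a linear function of $x$ and the conditional covariance matrix $\Sigma_{Y|X}$ is free of $x$.

5.6 Noncentral $\chi^2$ distribution

RECALL: Suppose that $u \sim \chi^2_n$; that is, $u$ has a (central) $\chi^2$ distribution with $n > 0$ degrees of freedom. The pdf of $u$ is given by

$$f_U(u \mid n) = \frac{1}{\Gamma(n/2) 2^{n/2}} u^{n/2 - 1} e^{-u/2} I(u > 0).$$

The $\chi^2_n$ family of distributions is a gamma($\alpha, \beta$) subfamily with shape parameter $\alpha = n/2$ and scale parameter $\beta = 2$. Note that $E(u) = n$, $\mathrm{var}(u) = 2n$, and $M_U(t) = (1 - 2t)^{-n/2}$, for $t < 1/2$.

RECALL: If $z_1, z_2, ..., z_n$ are iid $N(0, 1)$, then $u_1 = z_1^2 \sim \chi^2_1$ and

$$z'z = \sum_{i=1}^n z_i^2 \sim \chi^2_n.$$

Proof. Exercise.

TERMINOLOGY: A univariate random variable $v$ is said to have a noncentral $\chi^2$ distribution with degrees of freedom $n > 0$ and noncentrality parameter $\lambda > 0$ if it has the pdf

$$f_V(v \mid n, \lambda) = \sum_{j=0}^{\infty} \left(\frac{e^{-\lambda}\lambda^j}{j!}\right) \underbrace{\frac{1}{\Gamma\left(\frac{n + 2j}{2}\right) 2^{(n + 2j)/2}} v^{\frac{n + 2j}{2} - 1} e^{-v/2} I(v > 0)}_{f_U(v \mid n + 2j)}.$$

We write $v \sim \chi^2_n(\lambda)$. When $\lambda = 0$, the $\chi^2_n(\lambda)$ distribution reduces to the central $\chi^2_n$ distribution. In the $\chi^2_n(\lambda)$ pdf, notice that $e^{-\lambda}\lambda^j/j!$ is the $j$th term of a Poisson pmf with parameter $\lambda > 0$.

Result MVN9. If $v \mid w \sim \chi^2_{n + 2w}$ and $w \sim \text{Poisson}(\lambda)$, then $v \sim \chi^2_n(\lambda)$.
Proof. Note that

$$f_V(v) = \sum_{j=0}^{\infty} f_{V,W}(v, j) = \sum_{j=0}^{\infty} f_W(j) f_{V|W}(v \mid j) = \sum_{j=0}^{\infty} \left(\frac{e^{-\lambda}\lambda^j}{j!}\right) \frac{1}{\Gamma\left(\frac{n + 2j}{2}\right) 2^{(n + 2j)/2}} v^{\frac{n + 2j}{2} - 1} e^{-v/2} I(v > 0). \ □$$

Result MVN10. If $v \sim \chi^2_n(\lambda)$, then

$$M_V(t) = (1 - 2t)^{-n/2} \exp\left(\frac{2t\lambda}{1 - 2t}\right),$$

for $t < 1/2$.
Proof. The mgf of $v$, by definition, is $M_V(t) = E(e^{tv})$. Using an iterated expectation, we can write, for $t < 1/2$,

$$M_V(t) = E(e^{tv}) = E\{E(e^{tv} \mid w)\},$$

where $w \sim \text{Poisson}(\lambda)$. Note that $E(e^{tv} \mid w)$ is the conditional mgf of $v$, given $w$. We know that $v \mid w \sim \chi^2_{n + 2w}$; thus, $E(e^{tv} \mid w) = (1 - 2t)^{-(n + 2w)/2}$ and

$$E\{E(e^{tv} \mid w)\} = \sum_{j=0}^{\infty} (1 - 2t)^{-(n + 2j)/2} \left(\frac{e^{-\lambda}\lambda^j}{j!}\right) = e^{-\lambda}(1 - 2t)^{-n/2} \sum_{j=0}^{\infty} \frac{\lambda^j (1 - 2t)^{-j}}{j!}$$
$$= e^{-\lambda}(1 - 2t)^{-n/2} \sum_{j=0}^{\infty} \frac{1}{j!}\left(\frac{\lambda}{1 - 2t}\right)^j = e^{-\lambda}(1 - 2t)^{-n/2} \exp\left(\frac{\lambda}{1 - 2t}\right) = (1 - 2t)^{-n/2} \exp\left(\frac{2t\lambda}{1 - 2t}\right). \ □$$

MEAN AND VARIANCE: If $v \sim \chi^2_n(\lambda)$, then

$$E(v) = n + 2\lambda \quad \text{and} \quad \mathrm{var}(v) = 2n + 8\lambda.$$

Result MVN11. If $y \sim N(\mu, 1)$, then $u = y^2 \sim \chi^2_1(\lambda)$, where $\lambda = \mu^2/2$.
Outline of the proof. The proof proceeds by finding the mgf of $u$ and showing that it equals

$$M_U(t) = (1 - 2t)^{-n/2} \exp\left(\frac{2t\lambda}{1 - 2t}\right),$$

with $n = 1$ and $\lambda = \mu^2/2$. Note that

$$M_U(t) = E(e^{tu}) = E(e^{ty^2}) = \int_{\mathbb{R}} e^{ty^2} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(y - \mu)^2} dy.$$

Now, combine the exponents in the integrand, square out the $(y - \mu)^2$ term,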
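combine like terms, complete the square, and collapse the expression to $(1 - 2t)^{-1/2}\exp\{\mu^2 t/(1 - 2t)\}$ times some normal density that is integrated over $\mathcal{R}$. $\Box$

Result MVN12. If $u_1, u_2, \ldots, u_m$ are independent random variables, where $u_i \sim \chi^2_{n_i}(\lambda_i)$, $i = 1, 2, \ldots, m$, then $u = \sum_i u_i \sim \chi^2_n(\lambda)$, where $n = \sum_i n_i$ and $\lambda = \sum_i \lambda_i$.

Result MVN13. Suppose that $v \sim \chi^2_n(\lambda)$. For fixed $n$ and $c > 0$, the quantity $P_\lambda(v > c)$ is a strictly increasing function of $\lambda$.
Proof. See Monahan, pp. 106-108. $\Box$

IMPLICATION: If $v_1 \sim \chi^2_n(\lambda_1)$ and $v_2 \sim \chi^2_n(\lambda_2)$, where $\lambda_2 > \lambda_1$, then $\text{pr}(v_2 > c) > \text{pr}(v_1 > c)$. That is, $v_2$ is (strictly) stochastically greater than $v_1$, written $v_2 >_{st} v_1$. Note that
$$v_2 >_{st} v_1 \iff F_{V_2}(v) < F_{V_1}(v) \iff S_{V_2}(v) > S_{V_1}(v),$$
for all $v$, where $F_{V_i}(\cdot)$ denotes the cdf of $v_i$ and $S_{V_i}(\cdot) = 1 - F_{V_i}(\cdot)$ denotes the survivor function of $v_i$.

5.7 Noncentral F distribution

RECALL: A univariate random variable $w$ is said to have a (central) $F$ distribution with degrees of freedom $n_1 > 0$ and $n_2 > 0$ if it has the pdf
$$f_W(w \mid n_1, n_2) = \frac{\Gamma\left(\frac{n_1 + n_2}{2}\right)\left(\frac{n_1}{n_2}\right)^{n_1/2} w^{(n_1 - 2)/2}}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)\left(1 + \frac{n_1 w}{n_2}\right)^{(n_1 + n_2)/2}}\, I(w > 0).$$
We write $w \sim F_{n_1, n_2}$. The moment generating function for the $F$ distribution does not exist in closed form.

RECALL: If $u_1$ and $u_2$ are independent central $\chi^2$ random variables with degrees of freedom $n_1$ and $n_2$, respectively, then
$$w = \frac{u_1/n_1}{u_2/n_2} \sim F_{n_1, n_2}.$$

TERMINOLOGY: A univariate random variable $w$ is said to have a noncentral $F$ distribution with degrees of freedom $n_1 > 0$ and $n_2 > 0$ and noncentrality parameter $\lambda > 0$ if it has the pdf
$$f_W(w \mid n_1, n_2, \lambda) = \sum_{j=0}^{\infty}\left(\frac{e^{-\lambda}\lambda^j}{j!}\right)\frac{\Gamma\left(\frac{n_1 + 2j + n_2}{2}\right)\left(\frac{n_1}{n_2}\right)^{(n_1 + 2j)/2} w^{(n_1 + 2j - 2)/2}}{\Gamma\left(\frac{n_1 + 2j}{2}\right)\Gamma\left(\frac{n_2}{2}\right)\left(1 + \frac{n_1 w}{n_2}\right)^{(n_1 + 2j + n_2)/2}}\, I(w > 0).$$
We write $w \sim F_{n_1, n_2}(\lambda)$. When $\lambda = 0$, only the $j = 0$ term remains and the noncentral $F$ distribution reduces to the central $F$ distribution.

MEAN AND VARIANCE: If $w \sim F_{n_1, n_2}(\lambda)$, then
$$E(w) = \frac{n_2}{n_2 - 2}\left(1 + \frac{2\lambda}{n_1}\right)$$
and
$$\text{var}(w) = \frac{2n_2^2}{n_1^2(n_2 - 2)}\left\{\frac{(n_1 + 2\lambda)^2}{(n_2 - 2)(n_2 - 4)} + \frac{n_1 + 4\lambda}{n_2 - 4}\right\}.$$
$E(w)$ exists only when $n_2 > 2$ and $\text{var}(w)$ exists only when $n_2 > 4$. The moment generating function for the noncentral $F$ distribution does not exist in closed form.

Result MVN14. If $u_1$ and $u_2$ are independent random variables with $u_1 \sim \chi^2_{n_1}(\lambda)$ and $u_2 \sim \chi^2_{n_2}$, then
$$w = \frac{u_1/n_1}{u_2/n_2} \sim F_{n_1, n_2}(\lambda).$$
Proof. See Searle, pp. 51-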
52.

Result MVN15. Suppose that $w \sim F_{n_1, n_2}(\lambda)$. For fixed $n_1$, $n_2$, and $c > 0$, the quantity $P_\lambda(w > c)$ is a strictly increasing function of $\lambda$. That is, if $w_1 \sim F_{n_1, n_2}(\lambda_1)$ and $w_2 \sim F_{n_1, n_2}(\lambda_2)$, where $\lambda_2 > \lambda_1$, then $\text{pr}(w_2 > c) > \text{pr}(w_1 > c)$; i.e., $w_2 >_{st} w_1$.

REMARK: The fact that the noncentral $F$ distribution tends to be larger than the central $F$ distribution is the basis for many of the tests used in linear models. Typically, test statistics are used that have a central $F$ distribution if the null hypothesis is true and a noncentral $F$ distribution if the null hypothesis is not true. Since the noncentral $F$ distribution tends to be larger, large values of the test statistic are consistent with the alternative hypothesis. Thus, the form of an appropriate rejection region is to reject $H_0$ for large values of the test statistic. The power of these $F$ tests is simply a function of the noncentrality parameter $\lambda$. Given a value of $\lambda$, there is little difficulty in finding the power of an $F$ test: the power is the probability of the rejection region (defined under $H_0$) computed under the noncentral $F$ distribution. Noncentral $F$ distributions are available in most software packages.

5.8 Distributions of quadratic forms

GOAL: We would like to find the distribution of $\mathbf{y}'\mathbf{A}\mathbf{y}$, where $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \mathbf{V})$. We will obtain this distribution by taking steps. Result MVN16 is a very small step. Result MVN17 is a large step, and Result MVN18 is the finish line. There is no harm in assuming that $\mathbf{A}$ is symmetric.

Result MVN16. Suppose that $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \mathbf{I})$ and define
$$w = \mathbf{y}'\mathbf{y} = \mathbf{y}'\mathbf{I}\mathbf{y} = \sum_{i=1}^p y_i^2.$$
Result MVN11 says $y_i^2 \sim \chi^2_1(\mu_i^2/2)$, for $i = 1, 2, \ldots, p$. Thus, from Result MVN12,
$$\mathbf{y}'\mathbf{y} = \sum_{i=1}^p y_i^2 \sim \chi^2_p(\boldsymbol{\mu}'\boldsymbol{\mu}/2).$$

LEMMA: The $p \times p$ symmetric matrix $\mathbf{A}$ is idempotent of rank $s$ if and only if there exists a $p \times s$ matrix $\mathbf{P}_1$ such that (a) $\mathbf{A} = \mathbf{P}_1\mathbf{P}_1'$ and (b) $\mathbf{P}_1'\mathbf{P}_1 = \mathbf{I}_s$.
Proof. ($\Longleftarrow$) Suppose that $\mathbf{A} = \mathbf{P}_1\mathbf{P}_1'$ and $\mathbf{P}_1'\mathbf{P}_1 = \mathbf{I}_s$. Clearly, $\mathbf{A}$ is symmetric. Also,
$$\mathbf{A}^2 = \mathbf{P}_1\mathbf{P}_1'\mathbf{P}_1\mathbf{P}_1' = \mathbf{P}_1\mathbf{P}_1' = \mathbf{A}.$$
Note also that $r(\mathbf{A}) = \text{tr}(\mathbf{A}) = \text{tr}(\mathbf{P}_1\mathbf{P}_1') = \text{tr}(\mathbf{P}_1'\mathbf{P}_1) = \text{tr}(\mathbf{I}_s) = s$. Now, to go the other way ($\Longrightarrow$), suppose that $\mathbf{A}$ is a symmetric, idempotent matrix of rank $s$. The spectral decomposition
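of $\mathbf{A}$ is given by $\mathbf{A} = \mathbf{Q}\mathbf{D}\mathbf{Q}'$, where $\mathbf{D} = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\mathbf{Q}$ is orthogonal. Since $\mathbf{A}$ is idempotent, we know that $s$ of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ are equal to 1 and the other $p - s$ eigenvalues are equal to 0. Thus, we can write
$$\mathbf{A} = \mathbf{Q}\mathbf{D}\mathbf{Q}' = \begin{pmatrix} \mathbf{P}_1 & \mathbf{P}_2 \end{pmatrix}\begin{pmatrix} \mathbf{I}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\begin{pmatrix} \mathbf{P}_1' \\ \mathbf{P}_2' \end{pmatrix} = \mathbf{P}_1\mathbf{P}_1'.$$
Thus, we have shown that (a) holds. To show that (b) holds, note that because $\mathbf{Q}$ is orthogonal,
$$\mathbf{I}_p = \mathbf{Q}'\mathbf{Q} = \begin{pmatrix} \mathbf{P}_1' \\ \mathbf{P}_2' \end{pmatrix}\begin{pmatrix} \mathbf{P}_1 & \mathbf{P}_2 \end{pmatrix} = \begin{pmatrix} \mathbf{P}_1'\mathbf{P}_1 & \mathbf{P}_1'\mathbf{P}_2 \\ \mathbf{P}_2'\mathbf{P}_1 & \mathbf{P}_2'\mathbf{P}_2 \end{pmatrix}.$$
It is easy to convince yourself that $\mathbf{P}_1'\mathbf{P}_1$ is an identity matrix. Its dimension is $s \times s$ because $\text{tr}(\mathbf{P}_1'\mathbf{P}_1) = \text{tr}(\mathbf{P}_1\mathbf{P}_1') = \text{tr}(\mathbf{A}) = r(\mathbf{A})$, which equals $s$ by assumption. $\Box$

Result MVN17. Suppose that $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \mathbf{I})$. If $\mathbf{A}$ is idempotent of rank $s$, then $\mathbf{y}'\mathbf{A}\mathbf{y} \sim \chi^2_s(\lambda)$, where $\lambda = \frac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$. Conversely, if $\mathbf{y}'\mathbf{A}\mathbf{y} \sim \chi^2_s(\lambda)$, then $\mathbf{A}$ is idempotent of rank $s$ and $\lambda = \frac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$.
Proof. We prove only the sufficiency. Suppose that $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \mathbf{I})$ and that $\mathbf{A}$ is idempotent of rank $s$. By the last lemma, we know that $\mathbf{A} = \mathbf{P}_1\mathbf{P}_1'$, where $\mathbf{P}_1$ is $p \times s$, and $\mathbf{P}_1'\mathbf{P}_1 = \mathbf{I}_s$. Thus,
$$\mathbf{y}'\mathbf{A}\mathbf{y} = \mathbf{y}'\mathbf{P}_1\mathbf{P}_1'\mathbf{y} = \mathbf{x}'\mathbf{x},$$
where $\mathbf{x} = \mathbf{P}_1'\mathbf{y}$. Since $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \mathbf{I})$, and since $\mathbf{x} = \mathbf{P}_1'\mathbf{y}$ is a linear combination of $\mathbf{y}$, we know that
$$\mathbf{x} \sim N_s(\mathbf{P}_1'\boldsymbol{\mu}, \mathbf{P}_1'\mathbf{I}\mathbf{P}_1) \sim N_s(\mathbf{P}_1'\boldsymbol{\mu}, \mathbf{I}_s).$$
Result MVN16 says that $\mathbf{y}'\mathbf{A}\mathbf{y} = \mathbf{x}'\mathbf{x} \sim \chi^2_s\{(\mathbf{P}_1'\boldsymbol{\mu})'(\mathbf{P}_1'\boldsymbol{\mu})/2\}$. But,
$$\lambda \equiv \tfrac{1}{2}(\mathbf{P}_1'\boldsymbol{\mu})'\mathbf{P}_1'\boldsymbol{\mu} = \tfrac{1}{2}\boldsymbol{\mu}'\mathbf{P}_1\mathbf{P}_1'\boldsymbol{\mu} = \tfrac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}. \quad \Box$$

Result MVN18. Suppose that $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \mathbf{V})$, where $r(\mathbf{V}) = p$. If $\mathbf{A}\mathbf{V}$ is idempotent of rank $s$, then $\mathbf{y}'\mathbf{A}\mathbf{y} \sim \chi^2_s(\lambda)$, where $\lambda = \frac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$. Conversely, if $\mathbf{y}'\mathbf{A}\mathbf{y} \sim \chi^2_s(\lambda)$, then $\mathbf{A}\mathbf{V}$ is idempotent of rank $s$ and $\lambda = \frac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$.
Proof. We again prove only the sufficiency. Since $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \mathbf{V})$, and since $\mathbf{x} = \mathbf{V}^{-1/2}\mathbf{y}$ is a linear combination of $\mathbf{y}$, we know that
$$\mathbf{x} \sim N_p(\mathbf{V}^{-1/2}\boldsymbol{\mu}, \mathbf{V}^{-1/2}\mathbf{V}\mathbf{V}^{-1/2}) \sim N_p(\mathbf{V}^{-1/2}\boldsymbol{\mu}, \mathbf{I}_p).$$
Now,
$$\mathbf{y}'\mathbf{A}\mathbf{y} = \mathbf{x}'\mathbf{V}^{1/2}\mathbf{A}\mathbf{V}^{1/2}\mathbf{x} = \mathbf{x}'\mathbf{B}\mathbf{x},$$
where $\mathbf{B} = \mathbf{V}^{1/2}\mathbf{A}\mathbf{V}^{1/2}$. Recall that $\mathbf{V}^{1/2}$ is the symmetric square root of $\mathbf{V}$. From Result MVN17, we know that $\mathbf{y}'\mathbf{A}\mathbf{y} = \mathbf{x}'\mathbf{B}\mathbf{x} \sim \chi^2_s(\lambda)$ if $\mathbf{B}$ is idempotent of rank $s$. However, note that
$$r(\mathbf{B}) = r(\mathbf{V}^{1/2}\mathbf{A}\mathbf{V}^{1/2}) = r(\mathbf{A}) = r(\mathbf{A}\mathbf{V}) = s,$$
since $\mathbf{A}\mathbf{V}$ has rank $s$ (by assumption) and $\mathbf{V}$ and $\mathbf{V}^{1/2}$ are both nonsingular. Also, $\mathbf{A}\mathbf{V}$ is idempotent by assumption so that
$$\mathbf{A}\mathbf{V} = \mathbf{A}\mathbf{V}\mathbf{A}\mathbf{V} \implies \mathbf{A} = \mathbf{A}\mathbf{V}\mathbf{A} \implies \mathbf{V}^{1/2}\mathbf{A}\mathbf{V}^{1/2} = \mathbf{V}^{1/2}\mathbf{A}\mathbf{V}^{1/2}\mathbf{V}^{1/2}\mathbf{A}\mathbf{V}^{1/2} \implies \mathbf{B} = \mathbf{B}\mathbf{B}.$$
Thus, $\mathbf{B}$ is idempotent of rank $s$. This implies that $\mathbf{y}'\mathbf{A}\mathbf{y} = \mathbf{x}'\mathbf{B}\mathbf{x} \sim \chi^2_s(\lambda)$. Noting that
$$\lambda = \tfrac{1}{2}(\mathbf{V}^{-1/2}\boldsymbol{\mu})'\mathbf{B}(\mathbf{V}^{-1/2}\boldsymbol{\mu}) = \tfrac{1}{2}\boldsymbol{\mu}'\mathbf{V}^{-1/2}\mathbf{V}^{1/2}\mathbf{A}\mathbf{V}^{1/2}\mathbf{V}^{-1/2}\boldsymbol{\mu} = \tfrac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$$
completes the argument. $\Box$

Example 5.2. Suppose that $\mathbf{y} = (y_1, y_2, \ldots, y_N)' \sim N_N(\mu\mathbf{1}, \sigma^2\mathbf{I})$, so that $\boldsymbol{\mu} = \mu\mathbf{1}$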
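and $\mathbf{V} = \sigma^2\mathbf{I}$, where $\mathbf{1}$ is $N \times 1$ and $\mathbf{I}$ is $N \times N$. The statistic
$$(N - 1)s^2 = \sum_{i=1}^N (y_i - \bar{y})^2 = \mathbf{y}'(\mathbf{I} - N^{-1}\mathbf{J})\mathbf{y} = \mathbf{y}'\mathbf{A}\mathbf{y},$$
where $\mathbf{A} = \mathbf{I} - N^{-1}\mathbf{J}$. Thus, $(N - 1)s^2/\sigma^2 = \mathbf{y}'\mathbf{B}\mathbf{y}$, where $\mathbf{B} = \sigma^{-2}\mathbf{A} = \sigma^{-2}(\mathbf{I} - N^{-1}\mathbf{J})$. Note that
$$\mathbf{B}\mathbf{V} = \sigma^{-2}(\mathbf{I} - N^{-1}\mathbf{J})\sigma^2\mathbf{I} = \mathbf{I} - N^{-1}\mathbf{J} = \mathbf{A},$$
which is idempotent with rank
$$r(\mathbf{B}\mathbf{V}) = \text{tr}(\mathbf{B}\mathbf{V}) = \text{tr}(\mathbf{A}) = \text{tr}(\mathbf{I}) - N^{-1}\text{tr}(\mathbf{J}) = N - N^{-1}N = N - 1.$$
Result MVN18 says that $(N - 1)s^2/\sigma^2 = \mathbf{y}'\mathbf{B}\mathbf{y} \sim \chi^2_{N-1}(\lambda)$, where $\lambda = \frac{1}{2}\boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu}$. However,
$$\lambda = \tfrac{1}{2}\boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu} = \tfrac{1}{2}(\mu\mathbf{1})'\sigma^{-2}(\mathbf{I} - N^{-1}\mathbf{J})(\mu\mathbf{1}) = 0,$$
since $\mu\mathbf{1} \in \mathcal{C}(\mathbf{1})$ and $\mathbf{I} - N^{-1}\mathbf{J}$ is the ppm onto $\mathcal{C}(\mathbf{1})^\perp$. Thus, $(N - 1)s^2/\sigma^2 = \mathbf{y}'\mathbf{B}\mathbf{y} \sim \chi^2_{N-1}$, a central $\chi^2$ distribution with $N - 1$ degrees of freedom. $\Box$

Example 5.3. Consider the general linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$. Let $\mathbf{P}_X = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'$ denote the perpendicular projection matrix onto $\mathcal{C}(\mathbf{X})$. We know that $\mathbf{y} \sim N_N(\mathbf{X}\mathbf{b}, \sigma^2\mathbf{I})$. Consider the (uncorrected) partitioning of the sums of squares given by
$$\mathbf{y}'\mathbf{y} = \mathbf{y}'\mathbf{P}_X\mathbf{y} + \mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}.$$

• We first consider the residual sum of squares $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}$. Dividing this quantity by $\sigma^2$, we get
$$\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2 = \mathbf{y}'\{\sigma^{-2}(\mathbf{I} - \mathbf{P}_X)\}\mathbf{y} = \mathbf{y}'\mathbf{A}\mathbf{y},$$
where $\mathbf{A} = \sigma^{-2}(\mathbf{I} - \mathbf{P}_X)$. With $\mathbf{V} = \sigma^2\mathbf{I}$, note that
$$\mathbf{A}\mathbf{V} = \sigma^{-2}(\mathbf{I} - \mathbf{P}_X)\sigma^2\mathbf{I} = \mathbf{I} - \mathbf{P}_X,$$
an idempotent matrix with rank
$$r(\mathbf{I} - \mathbf{P}_X) = \text{tr}(\mathbf{I} - \mathbf{P}_X) = \text{tr}(\mathbf{I}) - \text{tr}(\mathbf{P}_X) = \text{tr}(\mathbf{I}) - r(\mathbf{P}_X) = N - r,$$
since $r(\mathbf{P}_X) = r(\mathbf{X}) = r$, by assumption. Result MVN18 says that
$$\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2 = \mathbf{y}'\mathbf{A}\mathbf{y} \sim \chi^2_{N-r}(\lambda),$$
where $\lambda = \frac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$. However,
$$\lambda = \tfrac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu} = \tfrac{1}{2}(\mathbf{X}\mathbf{b})'\sigma^{-2}(\mathbf{I} - \mathbf{P}_X)\mathbf{X}\mathbf{b} = 0,$$
because $\mathbf{X}\mathbf{b} \in \mathcal{C}(\mathbf{X})$ and $\mathbf{I} - \mathbf{P}_X$ projects onto the orthogonal complement $\mathcal{N}(\mathbf{X}')$. Thus, we have shown that $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2 \sim \chi^2_{N-r}$, a central $\chi^2$ distribution with $N - r$ degrees of freedom.

• Now, we turn our attention to the (uncorrected) model sum of squares $\mathbf{y}'\mathbf{P}_X\mathbf{y}$. Dividing this quantity by $\sigma^2$, we get
$$\mathbf{y}'\mathbf{P}_X\mathbf{y}/\sigma^2 = \mathbf{y}'(\sigma^{-2}\mathbf{P}_X)\mathbf{y} = \mathbf{y}'\mathbf{B}\mathbf{y},$$
where $\mathbf{B} = \sigma^{-2}\mathbf{P}_X$. With $\mathbf{V} = \sigma^2\mathbf{I}$, note that
$$\mathbf{B}\mathbf{V} = \sigma^{-2}\mathbf{P}_X\sigma^2\mathbf{I} = \mathbf{P}_X,$$
an idempotent matrix with rank $r(\mathbf{P}_X) = r(\mathbf{X}) = r$. Result MVN18 says that
$$\mathbf{y}'\mathbf{P}_X\mathbf{y}/\sigma^2 = \mathbf{y}'\mathbf{B}\mathbf{y} \sim \chi^2_r(\lambda),$$
where $\lambda = \frac{1}{2}\boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu}$. Note that
$$\lambda = \tfrac{1}{2}\boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu} = \tfrac{1}{2}(\mathbf{X}\mathbf{b})'\sigma^{-2}\mathbf{P}_X\mathbf{X}\mathbf{b} = (\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2.$$
That is, $\mathbf{y}'\mathbf{P}_X\mathbf{y}/\sigma^2$ has a noncentral $\chi^2$ distribution with $r$ degrees of freedom and noncentrality parameter $\lambda = (\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2$.

In the last calculation, note that $\lambda = (\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2 = 0$ iff $\mathbf{X}\mathbf{b} = \mathbf{0}$. In this case, both quadratic forms $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2$ and $\mathbf{y}'\mathbf{P}_X\mathbf{y}/\sigma^2$ have central $\chi^2$ distributions. $\Box$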
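The two distributional claims in Example 5.3 can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original notes. It assumes a hypothetical two-group one-way layout (so $r = 2$ and $\mathbf{P}_X\mathbf{y}$ simply replaces each observation by its group mean), and the group means, $\sigma$, sample sizes, seed, and the function name `simulate_example_53` are all arbitrary illustrative choices. It compares the simulated means of the two scaled quadratic forms with the theoretical values $N - r$ and $r + 2\lambda$, where $\lambda = (\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2$.

```python
import random


def simulate_example_53(n_reps=20000, seed=714):
    """Monte Carlo check of Example 5.3 for a two-group one-way layout.

    All numerical settings below are illustrative assumptions.
    """
    rng = random.Random(seed)
    sigma = 1.5
    group_means = [1.5, 0.5]      # the values that E(y) = Xb takes in each group
    n_per_group = 6
    N = n_per_group * len(group_means)
    r = len(group_means)          # rank of X for this layout

    # Noncentrality parameter: lambda = (Xb)'Xb / (2 sigma^2)
    lam = sum(n_per_group * m ** 2 for m in group_means) / (2 * sigma ** 2)

    model_total = 0.0
    resid_total = 0.0
    for _ in range(n_reps):
        fit = 0.0
        res = 0.0
        for m in group_means:
            y = [rng.gauss(m, sigma) for _ in range(n_per_group)]
            ybar = sum(y) / n_per_group
            fit += n_per_group * ybar ** 2              # part of y'P_X y
            res += sum((yi - ybar) ** 2 for yi in y)    # part of y'(I - P_X)y
        model_total += fit / sigma ** 2
        resid_total += res / sigma ** 2

    return {
        "N": N,
        "r": r,
        "lambda": lam,
        "model_mean": model_total / n_reps,   # theory: r + 2*lambda
        "resid_mean": resid_total / n_reps,   # theory: N - r
    }


if __name__ == "__main__":
    out = simulate_example_53()
    print("simulated mean of y'P_X y / sigma^2      :", out["model_mean"])
    print("theoretical mean r + 2*lambda            :", out["r"] + 2 * out["lambda"])
    print("simulated mean of y'(I - P_X)y / sigma^2 :", out["resid_mean"])
    print("theoretical mean N - r                   :", out["N"] - out["r"])
```

Matching first moments is only a sanity check, not a proof of the distributional results. Also note that some software packages parameterize the noncentral $\chi^2$ by $2\lambda$ rather than by the $\lambda$ used in these notes, so the convention should be checked before comparing against library quantiles.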
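5.9 Independence of quadratic forms

GOALS: In this subsection, we consider two problems. With $\mathbf{y} \sim N(\boldsymbol{\mu}, \mathbf{V})$, we would like to establish sufficient conditions for (a) $\mathbf{y}'\mathbf{A}\mathbf{y}$ and $\mathbf{B}\mathbf{y}$ to be independent, and (b) $\mathbf{y}'\mathbf{A}\mathbf{y}$ and $\mathbf{y}'\mathbf{B}\mathbf{y}$ to be independent.

Result MVN19. Suppose that $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \mathbf{V})$. If $\mathbf{B}\mathbf{V}\mathbf{A} = \mathbf{0}$, then $\mathbf{y}'\mathbf{A}\mathbf{y}$ and $\mathbf{B}\mathbf{y}$ are independent.
Proof. We may assume that $\mathbf{A}$ is symmetric. Write $\mathbf{A} = \mathbf{Q}\mathbf{D}\mathbf{Q}'$, where $\mathbf{D} = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\mathbf{Q}$ is orthogonal. From Result M23, we know that $s \leq p$ of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ are nonzero, where $s = r(\mathbf{A})$. We can thus write
$$\mathbf{A} = \mathbf{Q}\mathbf{D}\mathbf{Q}' = \begin{pmatrix} \mathbf{P}_1 & \mathbf{P}_2 \end{pmatrix}\begin{pmatrix} \mathbf{D}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\begin{pmatrix} \mathbf{P}_1' \\ \mathbf{P}_2' \end{pmatrix} = \mathbf{P}_1\mathbf{D}_1\mathbf{P}_1',$$
where $\mathbf{D}_1 = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_s)$. Thus,
$$\mathbf{y}'\mathbf{A}\mathbf{y} = \mathbf{y}'\mathbf{P}_1\mathbf{D}_1\mathbf{P}_1'\mathbf{y} = \mathbf{x}'\mathbf{D}_1\mathbf{x},$$
where $\mathbf{x} = \mathbf{P}_1'\mathbf{y}$. Notice that
$$\begin{pmatrix} \mathbf{B}\mathbf{y} \\ \mathbf{x} \end{pmatrix} = \begin{pmatrix} \mathbf{B} \\ \mathbf{P}_1' \end{pmatrix}\mathbf{y} \sim N\left\{\begin{pmatrix} \mathbf{B}\boldsymbol{\mu} \\ \mathbf{P}_1'\boldsymbol{\mu} \end{pmatrix}, \begin{pmatrix} \mathbf{B}\mathbf{V}\mathbf{B}' & \mathbf{B}\mathbf{V}\mathbf{P}_1 \\ \mathbf{P}_1'\mathbf{V}\mathbf{B}' & \mathbf{P}_1'\mathbf{V}\mathbf{P}_1 \end{pmatrix}\right\}.$$
Suppose that $\mathbf{B}\mathbf{V}\mathbf{A} = \mathbf{0}$. Then $\mathbf{0} = \mathbf{B}\mathbf{V}\mathbf{A} = \mathbf{B}\mathbf{V}\mathbf{P}_1\mathbf{D}_1\mathbf{P}_1'$, and post-multiplying by $\mathbf{P}_1$ gives $\mathbf{0} = \mathbf{B}\mathbf{V}\mathbf{P}_1\mathbf{D}_1\mathbf{P}_1'\mathbf{P}_1$. But, because $\mathbf{Q}$ is orthogonal,
$$\mathbf{I}_p = \mathbf{Q}'\mathbf{Q} = \begin{pmatrix} \mathbf{P}_1' \\ \mathbf{P}_2' \end{pmatrix}\begin{pmatrix} \mathbf{P}_1 & \mathbf{P}_2 \end{pmatrix} = \begin{pmatrix} \mathbf{P}_1'\mathbf{P}_1 & \mathbf{P}_1'\mathbf{P}_2 \\ \mathbf{P}_2'\mathbf{P}_1 & \mathbf{P}_2'\mathbf{P}_2 \end{pmatrix}.$$
This implies that $\mathbf{P}_1'\mathbf{P}_1 = \mathbf{I}_s$, and, thus,
$$\mathbf{0} = \mathbf{B}\mathbf{V}\mathbf{P}_1\mathbf{D}_1\mathbf{P}_1'\mathbf{P}_1 = \mathbf{B}\mathbf{V}\mathbf{P}_1\mathbf{D}_1 = \mathbf{B}\mathbf{V}\mathbf{P}_1\mathbf{D}_1\mathbf{D}_1^{-1} = \mathbf{B}\mathbf{V}\mathbf{P}_1.$$
We have shown that $\text{cov}(\mathbf{B}\mathbf{y}, \mathbf{x}) = \mathbf{0}$; because $\mathbf{B}\mathbf{y}$ and $\mathbf{x}$ are jointly normal, $\mathbf{x}$ and $\mathbf{B}\mathbf{y}$ are independent. But $\mathbf{y}'\mathbf{A}\mathbf{y} = \mathbf{x}'\mathbf{D}_1\mathbf{x}$, a function of $\mathbf{x}$. Thus, $\mathbf{y}'\mathbf{A}\mathbf{y}$ and $\mathbf{B}\mathbf{y}$ are independent as well. $\Box$

Example 5.4. Suppose that $\mathbf{y} = (y_1, y_2, \ldots, y_N)' \sim N_N(\mu\mathbf{1}, \sigma^2\mathbf{I})$, where $\mathbf{1}$ is $N \times 1$ and $\mathbf{I}$ is $N \times N$, so that $\boldsymbol{\mu} = \mu\mathbf{1}$ and $\mathbf{V} = \sigma^2\mathbf{I}$. Recall that
$$(N - 1)s^2 = \mathbf{y}'(\mathbf{I} - N^{-1}\mathbf{J})\mathbf{y} = \mathbf{y}'\mathbf{A}\mathbf{y},$$
where $\mathbf{A} = \mathbf{I} - N^{-1}\mathbf{J}$. Also,
$$\bar{y} = N^{-1}\mathbf{1}'\mathbf{y} = \mathbf{B}\mathbf{y},$$
where $\mathbf{B} = N^{-1}\mathbf{1}'$. These two statistics are independent because
$$\mathbf{B}\mathbf{V}\mathbf{A} = N^{-1}\mathbf{1}'\sigma^2\mathbf{I}(\mathbf{I} - N^{-1}\mathbf{J}) = \sigma^2 N^{-1}\mathbf{1}'(\mathbf{I} - N^{-1}\mathbf{J}) = \mathbf{0},$$
because $\mathbf{I} - N^{-1}\mathbf{J}$ is the ppm onto $\mathcal{C}(\mathbf{1})^\perp$. Since functions of independent statistics are also independent, $\bar{y}$ and $s^2$ are also independent. $\Box$

Result MVN20. Suppose that $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \mathbf{V})$. If $\mathbf{B}\mathbf{V}\mathbf{A} = \mathbf{0}$, then $\mathbf{y}'\mathbf{A}\mathbf{y}$ and $\mathbf{y}'\mathbf{B}\mathbf{y}$ are independent.
Proof. Write $\mathbf{A}$ and $\mathbf{B}$ in their spectral decompositions; that is, write
$$\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}' = \begin{pmatrix} \mathbf{P}_1 & \mathbf{P}_2 \end{pmatrix}\begin{pmatrix} \mathbf{D}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\begin{pmatrix} \mathbf{P}_1' \\ \mathbf{P}_2' \end{pmatrix} = \mathbf{P}_1\mathbf{D}_1\mathbf{P}_1',$$
where $\mathbf{D}_1 = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_s)$ and $s = r(\mathbf{A})$. Similarly, write
$$\mathbf{B} = \mathbf{Q}\mathbf{R}\mathbf{Q}' = \begin{pmatrix} \mathbf{Q}_1 & \mathbf{Q}_2 \end{pmatrix}\begin{pmatrix} \mathbf{R}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\begin{pmatrix} \mathbf{Q}_1' \\ \mathbf{Q}_2' \end{pmatrix} = \mathbf{Q}_1\mathbf{R}_1\mathbf{Q}_1',$$
where $\mathbf{R}_1 = \text{diag}(\gamma_1, \gamma_2, \ldots, \gamma_t)$ and $t = r(\mathbf{B})$. Since $\mathbf{P}$ and $\mathbf{Q}$ are orthogonal, this implies that $\mathbf{P}_1'\mathbf{P}_1 = \mathbf{I}_s$ and $\mathbf{Q}_1'\mathbf{Q}_1 = \mathbf{I}_t$. Suppose that $\mathbf{B}\mathbf{V}\mathbf{A} = \mathbf{0}$. Because $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{V}$ are symmetric, transposing gives $\mathbf{A}\mathbf{V}\mathbf{B} = \mathbf{0}$, so that
$$\begin{aligned}
\mathbf{0} = \mathbf{A}\mathbf{V}\mathbf{B} = \mathbf{P}_1\mathbf{D}_1\mathbf{P}_1'\mathbf{V}\mathbf{Q}_1\mathbf{R}_1\mathbf{Q}_1'
&\implies \mathbf{0} = \mathbf{P}_1'\mathbf{P}_1\mathbf{D}_1\mathbf{P}_1'\mathbf{V}\mathbf{Q}_1\mathbf{R}_1\mathbf{Q}_1'\mathbf{Q}_1 = \mathbf{D}_1\mathbf{P}_1'\mathbf{V}\mathbf{Q}_1\mathbf{R}_1 \\
&\implies \mathbf{0} = \mathbf{D}_1^{-1}\mathbf{D}_1\mathbf{P}_1'\mathbf{V}\mathbf{Q}_1\mathbf{R}_1\mathbf{R}_1^{-1} = \mathbf{P}_1'\mathbf{V}\mathbf{Q}_1 = \text{cov}(\mathbf{P}_1'\mathbf{y}, \mathbf{Q}_1'\mathbf{y}).
\end{aligned}$$
Now,
$$\begin{pmatrix} \mathbf{P}_1' \\ \mathbf{Q}_1' \end{pmatrix}\mathbf{y} \sim N\left\{\begin{pmatrix} \mathbf{P}_1'\boldsymbol{\mu} \\ \mathbf{Q}_1'\boldsymbol{\mu} \end{pmatrix}, \begin{pmatrix} \mathbf{P}_1'\mathbf{V}\mathbf{P}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}_1'\mathbf{V}\mathbf{Q}_1 \end{pmatrix}\right\}.$$
That is, $\mathbf{P}_1'\mathbf{y}$ and $\mathbf{Q}_1'\mathbf{y}$ are jointly normal and uncorrelated;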
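thus, they are independent. So are $\mathbf{y}'\mathbf{P}_1\mathbf{D}_1\mathbf{P}_1'\mathbf{y}$ and $\mathbf{y}'\mathbf{Q}_1\mathbf{R}_1\mathbf{Q}_1'\mathbf{y}$. But $\mathbf{A} = \mathbf{P}_1\mathbf{D}_1\mathbf{P}_1'$ and $\mathbf{B} = \mathbf{Q}_1\mathbf{R}_1\mathbf{Q}_1'$. $\Box$

Example 5.5. Suppose that $\mathbf{y} = (y_1, y_2, \ldots, y_N)' \sim N_N(\mu\mathbf{1}, \sigma^2\mathbf{I})$, where $\mathbf{1}$ is $N \times 1$ and $\mathbf{I}$ is $N \times N$, so that $\boldsymbol{\mu} = \mu\mathbf{1}$ and $\mathbf{V} = \sigma^2\mathbf{I}$. Consider the uncorrected total sum of squares
$$\mathbf{y}'\mathbf{y} = \mathbf{y}'N^{-1}\mathbf{J}\mathbf{y} + \mathbf{y}'(\mathbf{I} - N^{-1}\mathbf{J})\mathbf{y}; \quad \text{that is,} \quad \sum_{i=1}^N y_i^2 = N\bar{y}^2 + \sum_{i=1}^N (y_i - \bar{y})^2.$$
Note that we have expressed $\mathbf{y}'\mathbf{y}$ as the sum of two quadratic forms $\mathbf{y}'\mathbf{A}\mathbf{y}$ and $\mathbf{y}'\mathbf{B}\mathbf{y}$, where $\mathbf{A} = N^{-1}\mathbf{J}$ and $\mathbf{B} = \mathbf{I} - N^{-1}\mathbf{J}$. To see that they are independent, note that
$$\mathbf{B}\mathbf{V}\mathbf{A} = (\mathbf{I} - N^{-1}\mathbf{J})\sigma^2\mathbf{I}N^{-1}\mathbf{J} = N^{-1}\sigma^2(\mathbf{I} - N^{-1}\mathbf{J})\mathbf{J} = \mathbf{0}.$$
This is true because $\mathbf{I} - N^{-1}\mathbf{J}$ is the ppm onto $\mathcal{C}(\mathbf{1})^\perp$ and each column of $\mathbf{J}$ is in $\mathcal{C}(\mathbf{1})$.

Example 5.6. Consider the general linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$. Let $\mathbf{P}_X = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'$ denote the perpendicular projection matrix onto $\mathcal{C}(\mathbf{X})$. We know that $\mathbf{y} \sim N_N(\mathbf{X}\mathbf{b}, \sigma^2\mathbf{I})$. In Example 5.3, we showed that
$$\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2 \sim \chi^2_{N-r}$$
and that
$$\mathbf{y}'\mathbf{P}_X\mathbf{y}/\sigma^2 \sim \chi^2_r(\lambda),$$
where $\lambda = (\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2$. Note that
$$\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2 = \mathbf{y}'\{\sigma^{-2}(\mathbf{I} - \mathbf{P}_X)\}\mathbf{y} = \mathbf{y}'\mathbf{A}\mathbf{y},$$
where $\mathbf{A} = \sigma^{-2}(\mathbf{I} - \mathbf{P}_X)$. Also,
$$\mathbf{y}'\mathbf{P}_X\mathbf{y}/\sigma^2 = \mathbf{y}'(\sigma^{-2}\mathbf{P}_X)\mathbf{y} = \mathbf{y}'\mathbf{B}\mathbf{y},$$
where $\mathbf{B} = \sigma^{-2}\mathbf{P}_X$. Applying Result MVN20, we have
$$\mathbf{B}\mathbf{V}\mathbf{A} = \sigma^{-2}\mathbf{P}_X\sigma^2\mathbf{I}\sigma^{-2}(\mathbf{I} - \mathbf{P}_X) = \mathbf{0};$$
that is, $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2$ and $\mathbf{y}'\mathbf{P}_X\mathbf{y}/\sigma^2$ are independent quadratic forms. Thus, the statistic
$$F = \frac{\mathbf{y}'\mathbf{P}_X\mathbf{y}/r}{\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(N - r)} = \frac{\sigma^{-2}\mathbf{y}'\mathbf{P}_X\mathbf{y}/r}{\sigma^{-2}\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(N - r)} \sim F_{r, N-r}\{(\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2\},$$
a noncentral $F$ distribution with degrees of freedom $r$ (numerator) and $N - r$ (denominator) and noncentrality parameter $\lambda = (\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2$.

REMARK: Note that if $\mathbf{X}\mathbf{b} = \mathbf{0}$, then $F \sim F_{r, N-r}$ since the noncentrality parameter $\lambda$ vanishes. On the other hand, as the length of $\mathbf{X}\mathbf{b}$ gets larger, so does $\lambda$. This shifts the noncentral $F_{r, N-r}\{(\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2\}$ distribution to the right, because the noncentral $F$ distribution is stochastically increasing in its noncentrality parameter. This should convince you that large values of $F$ are consistent with large values of $\|\mathbf{X}\mathbf{b}\|$.

6 Statistical Inference

Complementary reading from Monahan: Chapter 6 (omit Section 6.7).

6.1 Introduction

INTRODUCTION: Consider the general linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$. Note that this is the usual Gauss-Markov model with the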
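additional assumption of normality. With this additional assumption, we can rigorously pursue questions that deal with statistical inference. We start by examining topics in minimum variance unbiased and maximum likelihood estimation.

6.2 Estimation

SUFFICIENCY: Under the assumptions stated above, we know that $\mathbf{y} \sim N_N(\mathbf{X}\mathbf{b}, \sigma^2\mathbf{I})$. Set $\boldsymbol{\theta} = (\mathbf{b}', \sigma^2)'$. The pdf of $\mathbf{y}$, for all $\mathbf{y} \in \mathcal{R}^N$, is given by
$$\begin{aligned}
f_{\mathbf{Y}}(\mathbf{y} \mid \boldsymbol{\theta}) &= (2\pi)^{-N/2}(\sigma^2)^{-N/2}\exp\{-(\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})/2\sigma^2\} \\
&= (2\pi)^{-N/2}(\sigma^2)^{-N/2}\exp\{-\mathbf{y}'\mathbf{y}/2\sigma^2 + \mathbf{y}'\mathbf{X}\mathbf{b}/\sigma^2 - (\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2\} \\
&= (2\pi)^{-N/2}(\sigma^2)^{-N/2}\exp\{-(\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2\}\exp\{-\mathbf{y}'\mathbf{y}/2\sigma^2 + \mathbf{b}'\mathbf{X}'\mathbf{y}/\sigma^2\} \\
&= h(\mathbf{y})c(\boldsymbol{\theta})\exp\{w_1(\boldsymbol{\theta})t_1(\mathbf{y}) + \mathbf{w}_2(\boldsymbol{\theta})'\mathbf{t}_2(\mathbf{y})\},
\end{aligned}$$
where $h(\mathbf{y}) = (2\pi)^{-N/2}I(\mathbf{y} \in \mathcal{R}^N)$, $c(\boldsymbol{\theta}) = (\sigma^2)^{-N/2}\exp\{-(\mathbf{X}\mathbf{b})'\mathbf{X}\mathbf{b}/2\sigma^2\}$, and
$$w_1(\boldsymbol{\theta}) = -1/2\sigma^2, \quad t_1(\mathbf{y}) = \mathbf{y}'\mathbf{y}, \quad \mathbf{w}_2(\boldsymbol{\theta}) = \mathbf{b}/\sigma^2, \quad \mathbf{t}_2(\mathbf{y}) = \mathbf{X}'\mathbf{y};$$
that is, $\mathbf{y}$ has pdf in the exponential family (see Casella and Berger, Chapter 3). Thus, we know that $\mathbf{T}(\mathbf{y}) = (\mathbf{y}'\mathbf{y}, \mathbf{X}'\mathbf{y})$ is a complete sufficient statistic for $\boldsymbol{\theta}$. We also know that minimum variance unbiased estimators (MVUEs) of functions of $\boldsymbol{\theta}$ are unbiased functions of $\mathbf{T}(\mathbf{y})$.

Result SI1. Consider the general linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$. The MVUE for an estimable function $\boldsymbol{\lambda}'\mathbf{b}$ is given by $\boldsymbol{\lambda}'\hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ is any solution to the normal equations. The MVUE for $\sigma^2$ is
$$\text{MSE} = (N - r)^{-1}\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y},$$
where $\mathbf{P}_X$ is the perpendicular projection matrix onto $\mathcal{C}(\mathbf{X})$.
Proof. Both $\boldsymbol{\lambda}'\hat{\mathbf{b}}$ and MSE are unbiased estimators of $\boldsymbol{\lambda}'\mathbf{b}$ and $\sigma^2$, respectively (see Result E4 and Section 4.3). These estimators are also functions of $\mathbf{T}(\mathbf{y}) = (\mathbf{y}'\mathbf{y}, \mathbf{X}'\mathbf{y})$, the complete sufficient statistic. Thus, each estimator is the MVUE for its expected value. $\Box$

MAXIMUM LIKELIHOOD: Consider the general linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$. The likelihood function for $\boldsymbol{\theta} = (\mathbf{b}', \sigma^2)'$ is
$$L(\boldsymbol{\theta} \mid \mathbf{y}) = L(\mathbf{b}, \sigma^2 \mid \mathbf{y}) = (2\pi)^{-N/2}(\sigma^2)^{-N/2}\exp\{-(\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})/2\sigma^2\}.$$
Maximum likelihood estimates of $\mathbf{b}$ and $\sigma^2$ can be obtained by maximizing
$$\log L(\mathbf{b}, \sigma^2 \mid \mathbf{y}) = -\frac{N}{2}\log(2\pi) - \frac{N}{2}\log\sigma^2 - (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})/2\sigma^2$$
with respect to $\mathbf{b}$ and $\sigma^2$. For every value of $\sigma^2$, the loglikelihood is maximized by taking $\mathbf{b}$ to minimize $Q(\mathbf{b}) = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$; that is, the least squares estimate of $\mathbf{b}$, $\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}$, is also an MLE. To estimate $\sigma^2$, we can substitute $(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}})'(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}}) = \mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}$ for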
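$Q(\mathbf{b})$ and differentiate with respect to $\sigma^2$ to get
$$\hat{\sigma}^2_{MLE} = N^{-1}\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}.$$

NOTE: Observe that
$$E(\hat{\sigma}^2_{MLE}) = N^{-1}E\{\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}\} = \left(\frac{N - r}{N}\right)\sigma^2.$$
That is, the MLE for $\sigma^2$ is biased. The MLE is rarely used in practice; MSE is the standard estimator for $\sigma^2$.

INVARIANCE: Under the normal GM model, the MLE for an estimable function $\boldsymbol{\lambda}'\mathbf{b}$ is $\boldsymbol{\lambda}'\hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ is any solution to the normal equations. Recall that the estimate $\boldsymbol{\lambda}'\hat{\mathbf{b}}$ is unique even if $\hat{\mathbf{b}}$ is not.

6.3 Testing models

REMARK: We now provide a general discussion on testing reduced versus full models within a GM model framework. Assuming normality will enable us to derive the sampling distribution of the resulting test statistic.

PROBLEM: Consider the linear model
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e},$$
where $r(\mathbf{X}) = r \leq p$, $E(\mathbf{e}) = \mathbf{0}$, and $\text{cov}(\mathbf{e}) = \sigma^2\mathbf{I}$. Note that these are our usual GM model assumptions. For the purposes of this discussion, we assume that this model (the full model) is a "correct" model for the data. Now, consider the linear model
$$\mathbf{y} = \mathbf{W}\mathbf{c} + \mathbf{e},$$
where $E(\mathbf{e}) = \mathbf{0}$, $\text{cov}(\mathbf{e}) = \sigma^2\mathbf{I}$, and $\mathcal{C}(\mathbf{W}) \subset \mathcal{C}(\mathbf{X})$. We call this a reduced model because the estimation space is smaller than in the full model. Our goal is to test whether or not the reduced model is also correct.

REMARKS:
• If the reduced model is also correct, there is no reason not to use it. Smaller models are easier to interpret and fewer degrees of freedom are spent in estimating $\sigma^2$. Thus, there are practical and statistical advantages to using the reduced model if it is also correct.
• Hypothesis testing in linear models essentially reduces to putting a constraint on the estimation space $\mathcal{C}(\mathbf{X})$ in the full model. If $\mathcal{C}(\mathbf{W}) = \mathcal{C}(\mathbf{X})$, then the $\mathbf{W}\mathbf{c}$ model is a reparameterization of the $\mathbf{X}\mathbf{b}$ model and there is nothing to test.

RECALL: Let $\mathbf{P}_W$ and $\mathbf{P}_X$ denote the perpendicular projection matrices onto $\mathcal{C}(\mathbf{W})$ and $\mathcal{C}(\mathbf{X})$, respectively. Because $\mathcal{C}(\mathbf{W}) \subset \mathcal{C}(\mathbf{X})$, we know (Result LS10) that $\mathbf{P}_X - \mathbf{P}_W$ is the ppm onto
$$\mathcal{C}(\mathbf{P}_X - \mathbf{P}_W) = \mathcal{C}(\mathbf{W})^\perp_{\mathcal{C}(\mathbf{X})},$$
the orthogonal complement of $\mathcal{C}(\mathbf{W})$ with respect to $\mathcal{C}(\mathbf{X})$.

DISCUSSION: In a general reduced-versus-full model test, we start with the premise that the full model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$ is correct so that $E(\mathbf{y}) = \mathbf{X}\mathbf{b} \in \mathcal{C}(\mathbf{X})$. If the reduced model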
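is also correct, then $E(\mathbf{y}) = \mathbf{W}\mathbf{c} \in \mathcal{C}(\mathbf{W}) \subset \mathcal{C}(\mathbf{X})$. Geometrically, performing a reduced-versus-full model test requires the analyst to decide whether $E(\mathbf{y})$ is in $\mathcal{C}(\mathbf{W})$ or elsewhere in $\mathcal{C}(\mathbf{X})$. Under the full model, our estimate for $E(\mathbf{y}) = \mathbf{X}\mathbf{b}$ is $\mathbf{P}_X\mathbf{y}$. Under the reduced model, our estimate for $E(\mathbf{y}) = \mathbf{W}\mathbf{c}$ is $\mathbf{P}_W\mathbf{y}$.

• If the reduced model is correct, $\mathbf{P}_X\mathbf{y}$ and $\mathbf{P}_W\mathbf{y}$ are estimates of the same quantity. This means that $\mathbf{P}_X\mathbf{y} - \mathbf{P}_W\mathbf{y} = (\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}$ should be reasonably small.
• If the reduced model is not correct, $\mathbf{P}_X\mathbf{y}$ and $\mathbf{P}_W\mathbf{y}$ are estimating different things. This means that $\mathbf{P}_X\mathbf{y} - \mathbf{P}_W\mathbf{y} = (\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}$ is likely not small.
• The decision about the reduced model hinges on deciding whether $(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}$ is large or small. Note that $(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}$ is the perpendicular projection of $\mathbf{y}$ onto $\mathcal{C}(\mathbf{W})^\perp_{\mathcal{C}(\mathbf{X})}$.

An obvious measure of the size of $(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}$ is its squared length, that is,
$$\{(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}\}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y} = \mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}.$$
However, the length of $(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}$ is also related to the sizes of $\mathcal{C}(\mathbf{X})$ and $\mathcal{C}(\mathbf{W})$. We thus adjust for these sizes by using
$$\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r(\mathbf{P}_X - \mathbf{P}_W).$$
We now compute the expectation of this quantity when the reduced model is and is not true. For notational simplicity, set $r^* = r(\mathbf{P}_X - \mathbf{P}_W)$. When the reduced model is true, then
$$\begin{aligned}
E\{\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r^*\} &= \frac{1}{r^*}\left\{(\mathbf{W}\mathbf{c})'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{W}\mathbf{c} + \text{tr}[(\mathbf{P}_X - \mathbf{P}_W)\sigma^2\mathbf{I}]\right\} \\
&= \frac{1}{r^*}\left\{\sigma^2\,\text{tr}(\mathbf{P}_X - \mathbf{P}_W)\right\} = \frac{1}{r^*}(r^*\sigma^2) = \sigma^2.
\end{aligned}$$
This argument holds because $(\mathbf{P}_X - \mathbf{P}_W)\mathbf{W}\mathbf{c} = \mathbf{0}$ and $\text{tr}(\mathbf{P}_X - \mathbf{P}_W) = r(\mathbf{P}_X - \mathbf{P}_W) = r^*$. Thus, if the reduced model is true, $\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r^*$ is an unbiased estimator of $\sigma^2$.

When the reduced model is not true, then
$$\begin{aligned}
E\{\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r^*\} &= \frac{1}{r^*}\left\{(\mathbf{X}\mathbf{b})'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{X}\mathbf{b} + \text{tr}[(\mathbf{P}_X - \mathbf{P}_W)\sigma^2\mathbf{I}]\right\} \\
&= \frac{1}{r^*}\left\{(\mathbf{X}\mathbf{b})'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{X}\mathbf{b} + r^*\sigma^2\right\} \\
&= \sigma^2 + (\mathbf{X}\mathbf{b})'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{X}\mathbf{b}/r^*.
\end{aligned}$$
Thus, if the reduced model is not true, $\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r^*$ is estimating something larger than $\sigma^2$. Of course, $\sigma^2$ is unknown, so it must be estimated. Because the full model is assumed to be correct,
$$\text{MSE} = (N - r)^{-1}\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y},$$
the MSE from the full model, is an unbiased estimator of $\sigma^2$.

TEST STATISTIC: To test the reduced model versus the full model, we use
$$F = \frac{\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r^*}{\text{MSE}}.$$
Using only our GM model assumptions (i.e., not necessarily assuming normality), we can surmise the following:

• When the reduced model is true, the numerator and denominator of $F$ are both unbiased estimators of $\sigma^2$, so $F$ should be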
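close to 1.
• When the reduced model is not true, the numerator in $F$ is estimating something larger than $\sigma^2$, so $F$ should be larger than 1. Thus, values of $F$ much larger than 1 are not consistent with the reduced model being correct.
• Values of $F$ much smaller than 1 may mean something drastically different; see Christensen (2003).

OBSERVATIONS: In the numerator of $F$, note that
$$\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y} = \mathbf{y}'\mathbf{P}_X\mathbf{y} - \mathbf{y}'\mathbf{P}_W\mathbf{y} = \mathbf{y}'(\mathbf{P}_X - \mathbf{P}_{\mathbf{1}})\mathbf{y} - \mathbf{y}'(\mathbf{P}_W - \mathbf{P}_{\mathbf{1}})\mathbf{y},$$
where $\mathbf{P}_{\mathbf{1}}$ is the ppm onto $\mathcal{C}(\mathbf{1})$. This is the difference in the regression (model) sum of squares, corrected or uncorrected, from fitting the two models. Also, the term
$$r^* = r(\mathbf{P}_X - \mathbf{P}_W) = \text{tr}(\mathbf{P}_X - \mathbf{P}_W) = \text{tr}(\mathbf{P}_X) - \text{tr}(\mathbf{P}_W) = r(\mathbf{P}_X) - r(\mathbf{P}_W) = r - r_0,$$
say, where $r_0 = r(\mathbf{P}_W) = r(\mathbf{W})$. Thus, $r^* = r - r_0$ is the difference in the ranks of the $\mathbf{X}$ and $\mathbf{W}$ matrices. This also equals the difference in the model degrees of freedom from the two ANOVA tables.

REMARK: You will note that we have formulated a perfectly sensible strategy for testing reduced versus full models while avoiding the question, "What is the distribution of $F$?" Our entire argument is based on first and second moment assumptions, that is, $E(\mathbf{e}) = \mathbf{0}$ and $\text{cov}(\mathbf{e}) = \sigma^2\mathbf{I}$, the GM assumptions. We now address the distributional question.

DISTRIBUTION OF F: To derive the sampling distribution of
$$F = \frac{\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r^*}{\text{MSE}},$$
we require that $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$, from which it follows that $\mathbf{y} \sim N_N(\mathbf{X}\mathbf{b}, \sigma^2\mathbf{I})$. First, we handle the denominator $\text{MSE} = \mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(N - r)$. In Example 5.3, we showed that
$$\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2 \sim \chi^2_{N-r}.$$
This distributional result holds regardless of whether or not the reduced model is true. Remember, it is assumed that the full model is correct. Now, we turn our attention to the numerator. Take $\mathbf{A} = \sigma^{-2}(\mathbf{P}_X - \mathbf{P}_W)$ and consider the quadratic form
$$\mathbf{y}'\mathbf{A}\mathbf{y} = \mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/\sigma^2.$$
With $\mathbf{V} = \sigma^2\mathbf{I}$, the matrix
$$\mathbf{A}\mathbf{V} = \sigma^{-2}(\mathbf{P}_X - \mathbf{P}_W)\sigma^2\mathbf{I} = \mathbf{P}_X - \mathbf{P}_W$$
is idempotent with rank $r(\mathbf{P}_X - \mathbf{P}_W) = r^*$. From Result MVN18, we know that $\mathbf{y}'\mathbf{A}\mathbf{y} \sim \chi^2_{r^*}(\lambda)$, where
$$\lambda = \tfrac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu} = \frac{1}{2\sigma^2}(\mathbf{X}\mathbf{b})'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{X}\mathbf{b}.$$
Now, we make the following observations:

• If the reduced model is true and $\mathbf{X}\mathbf{b} \in \mathcal{C}(\mathbf{W})$, then $(\mathbf{P}_X - \mathbf{P}_W)\mathbf{X}\mathbf{b} = \mathbf{0}$ because $\mathbf{P}_X - \mathbf{P}_W$ projects onto $\mathcal{C}(\mathbf{W})^\perp_{\mathcal{C}(\mathbf{X})}$. This means that $\lambda = 0$ and $\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/\sigma^2 \sim \chi^2_{r^*}$, a central $\chi^2$ distribution.
• If the reduced model is not true and $\mathbf{X}\mathbf{b} \notin \mathcal{C}(\mathbf{W})$, then $(\mathbf{P}_X - \mathbf{P}_W)\mathbf{X}\mathbf{b} \neq \mathbf{0}$ and $\lambda > 0$. That is, $\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/\sigma^2 \sim \chi^2_{r^*}(\lambda)$, a noncentral $\chi^2$ distribution with noncentrality parameter $\lambda$.
• Because $\mathcal{N}(\mathbf{X}')$ and $\mathcal{C}(\mathbf{W})^\perp_{\mathcal{C}(\mathbf{X})}$ are orthogonal subspaces,
$$(\mathbf{P}_X - \mathbf{P}_W)\sigma^2\mathbf{I}(\mathbf{I} - \mathbf{P}_X) = \sigma^2(\mathbf{P}_X - \mathbf{P}_W)(\mathbf{I} - \mathbf{P}_X) = \mathbf{0}.$$
Thus, regardless of whether or not the reduced model is true, Result MVN20 says that the quadratic forms $\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}$ and $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}$ are independent.

CONCLUSION: Putting this all together, we have
$$F = \frac{\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r^*}{\text{MSE}} = \frac{\sigma^{-2}\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r^*}{\sigma^{-2}\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(N - r)} \sim F_{r^*, N-r}(\lambda),$$
where
$$\lambda = \frac{1}{2\sigma^2}(\mathbf{X}\mathbf{b})'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{X}\mathbf{b}.$$
If the reduced model is correct, that is, if $\mathbf{X}\mathbf{b} \in \mathcal{C}(\mathbf{W})$, then $\lambda = 0$ and $F \sim F_{r^*, N-r}$. Otherwise, $F \sim F_{r^*, N-r}(\lambda)$ with $\lambda > 0$. Note also that if the reduced model is true,
$$E(F) = \frac{N - r}{N - r - 2} \approx 1,$$
if $N - r$ is large. This reaffirms our assertion that values of $F$ close to 1 are consistent with the reduced model being true. Because the noncentral $F$ family is stochastically increasing in $\lambda$, larger values of $F$ are consistent with the reduced model not being true. Thus, to perform a level $\alpha$ test, reject the reduced model if $F > F_{r^*, N-r, \alpha}$, the upper $\alpha$ quantile of an $F_{r^*, N-r}$ distribution.

Example 6.1. Consider the simple linear regression model $y_i = \beta_0 + \beta_1(x_i - \bar{x}) + \epsilon_i$, for $i = 1, 2, \ldots, N$, where $\epsilon_i \sim$ iid $N(0, \sigma^2)$. In matrix notation, we have
$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_1 - \bar{x} \\ 1 & x_2 - \bar{x} \\ \vdots & \vdots \\ 1 & x_N - \bar{x} \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad \mathbf{e} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_N \end{pmatrix},$$
where $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$. Suppose that we would like to test whether or not the reduced model
$$y_i = \beta_0 + \epsilon_i,$$
for $i = 1, 2, \ldots, N$, also holds. In matrix notation, the reduced model can be expressed as $\mathbf{y} = \mathbf{W}\mathbf{c} + \mathbf{e}$, with
$$\mathbf{W} = \mathbf{1} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \quad \mathbf{c} = \beta_0,$$
where $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$ and $\mathbf{1}$ is an $N \times 1$ vector of ones. In this problem, $r_0 = 1$, $r = 2$, and $r^* = r - r_0 = 1$. When the reduced model is correct,
$$F = \frac{\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}/r^*}{\text{MSE}} \sim F_{1, N-2},$$
where MSE is the mean squared error from the full model. When the reduced model is not true, $F \sim F_{1, N-2}(\lambda)$, where
$$\lambda = \frac{1}{2\sigma^2}(\mathbf{X}\mathbf{b})'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{X}\mathbf{b} = \beta_1^2\sum_{i=1}^N (x_i - \bar{x})^2/2\sigma^2.$$

EXERCISE: (a) Verify that the above expression for the noncentrality parameter $\lambda$ is correct. (b) Suppose that $N$ is even and the values of $x_i$ can be selected anywhere in the interval from $d_1$ to $d_2$.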
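How should we choose the $x$ values to maximize the power of a level $\alpha$ test?

6.4 Testing linear parametric functions

PROBLEM: Consider our usual Gauss-Markov linear model with normal errors, i.e., $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$. We consider the problem of testing
$$H_0: \mathbf{K}'\mathbf{b} = \mathbf{m} \quad \text{versus} \quad H_1: \mathbf{K}'\mathbf{b} \neq \mathbf{m},$$
where $\mathbf{K}$ is a $p \times s$ matrix with $r(\mathbf{K}) = s$ and $\mathbf{m}$ is $s \times 1$.

TERMINOLOGY: The general linear hypothesis $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$ is said to be testable iff $\mathbf{K}$ has full column rank and each component of $\mathbf{K}'\mathbf{b}$ is estimable. Otherwise, $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$ is said to be nontestable.

Example 6.2. Consider the multiple linear regression model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \epsilon_i,$$
for $i = 1, 2, \ldots, N$, where $\epsilon_i \sim$ iid $N(0, \sigma^2)$. Express each of these hypotheses in the form $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$:
1. $H_0: \beta_1 = 0$
2. $H_0: \beta_3 = \beta_4 = 0$
3. $H_0: \beta_1 + \beta_3 = 1,\ \beta_2 - \beta_4 = 1$
4. $H_0: \beta_2 = \beta_3 = \beta_4$

Example 6.3. Consider the one-way analysis of variance model $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$, for $i = 1, 2, 3, 4$ and $j = 1, 2, \ldots, n_i$, where $\epsilon_{ij} \sim$ iid $N(0, \sigma^2)$. Express each of these hypotheses in the form $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$:
1. $H_0: \mu + \alpha_1 = 5,\ \alpha_3 - \alpha_4 = 1$
2. $H_0: \alpha_1 - \alpha_2 = \alpha_3 - \alpha_4$
3. $H_0$: a single linear restriction on $\alpha_1, \alpha_2, \alpha_3, \alpha_4$

TEST STATISTIC: Our goal is to derive the form of the test statistic for $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$, for $\mathbf{K}'\mathbf{b}$ estimable, in the GM model with normal errors. We start by noting that the BLUE of $\mathbf{K}'\mathbf{b}$ is $\mathbf{K}'\hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ is a least squares estimator of $\mathbf{b}$. Also,
$$\mathbf{K}'\hat{\mathbf{b}} = \mathbf{K}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y},$$
a linear function of $\mathbf{y}$, so $\mathbf{K}'\hat{\mathbf{b}}$ follows an $s$-variate normal distribution with mean $\mathbf{K}'\mathbf{b}$ and covariance matrix
$$\text{cov}(\mathbf{K}'\hat{\mathbf{b}}) = \mathbf{K}'\text{cov}(\hat{\mathbf{b}})\mathbf{K} = \sigma^2\mathbf{K}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{K} = \sigma^2\mathbf{K}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{K} = \sigma^2\mathbf{H},$$
where $\mathbf{H} = \mathbf{K}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{K}$.

LEMMA: If $\mathbf{K}'\mathbf{b}$ is estimable (and $\mathbf{K}$ has full column rank $s$, as assumed above), then $\mathbf{H}$ is nonsingular.
Proof. See Monahan, pp. 130. $\Box$

This lemma is important because it convinces us that the distribution of $\mathbf{K}'\hat{\mathbf{b}}$ is not less than full rank. We have shown that $\mathbf{K}'\hat{\mathbf{b}} \sim N_s(\mathbf{K}'\mathbf{b}, \sigma^2\mathbf{H})$, so $\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m} \sim N_s(\mathbf{K}'\mathbf{b} - \mathbf{m}, \sigma^2\mathbf{H})$. Now, consider the quadratic form $(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})'(\sigma^2\mathbf{H})^{-1}(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})$. We have that $(\sigma^2\mathbf{H})^{-1}\sigma^2\mathbf{H} = \mathbf{I}_s$, an idempotent matrix with rank $s$, so by Result MVN18,
$$(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})'(\sigma^2\mathbf{H})^{-1}(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m}) \sim \chi^2_s(\lambda),$$
where the noncentrality parameter
$$\lambda = \tfrac{1}{2}(\mathbf{K}'\mathbf{b} - \mathbf{m})'(\sigma^2\mathbf{H})^{-1}(\mathbf{K}'\mathbf{b} - \mathbf{m}) = \frac{1}{2\sigma^2}(\mathbf{K}'\mathbf{b} - \mathbf{m})'\mathbf{H}^{-1}(\mathbf{K}'\mathbf{b} - \mathbf{m}).$$
We have already shown that $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2 \sim \chi^2_{N-r}$. Also, $(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})'(\sigma^2\mathbf{H})^{-1}(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})$ and $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2$ are independent (verify!). Thus, the ratio
$$F = \frac{(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})'\mathbf{H}^{-1}(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})/s}{\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(N - r)} = \frac{(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})'(\sigma^2\mathbf{H})^{-1}(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})/s}{\sigma^{-2}\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(N - r)} \sim F_{s, N-r}(\lambda),$$
where $\lambda = \frac{1}{2\sigma^2}(\mathbf{K}'\mathbf{b} - \mathbf{m})'\mathbf{H}^{-1}(\mathbf{K}'\mathbf{b} - \mathbf{m})$.

OBSERVATION: Note that if $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$ is true, then the noncentrality parameter $\lambda = 0$ and $F \sim F_{s, N-r}$. Thus, an $\alpha$ level test rejects $H_0$ if $F > F_{s, N-r, \alpha}$, where $F_{s, N-r, \alpha}$ is the upper $\alpha$ quantile of the $F_{s, N-r}$ distribution.

INVARIANCE PROPERTY: Monahan discusses (pp. 133-135) how the $F$ test for $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$ is invariant to linear transformations. For example, the $F$ statistic for testing $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$ is the same as the $F$ statistic for testing $H_0: 2\mathbf{K}'\mathbf{b} = 2\mathbf{m}$.

SCALAR CASE: We now consider the special case of testing $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$ when $r(\mathbf{K}) = 1$, that is, $\mathbf{K}'\mathbf{b}$ is a scalar estimable parametric function. This function (and hypothesis) is perhaps more appropriately written as $H_0: \mathbf{k}'\mathbf{b} = m$ to emphasize that $\mathbf{k}$ is a $p \times 1$ vector and $m$ is a scalar. Often, $\mathbf{k}$ is chosen in a way so that $m = 0$ (e.g., testing contrasts in an ANOVA model, etc.). The hypotheses in Example 6.2 (1) and Example 6.3 (3) are of this form. Testing a scalar hypothesis is a mere special case of the general test we have just derived. However, additional flexibility results in the scalar case; in particular, we can test for one-sided alternatives like $H_1: \mathbf{k}'\mathbf{b} > m$ or $H_1: \mathbf{k}'\mathbf{b} < m$. We first discuss one more noncentral distribution.

NONCENTRAL t DISTRIBUTION: Suppose that $z \sim N(\mu, 1)$ and $v \sim \chi^2_k$. If $z$ and $v$ are independent, then
$$t = \frac{z}{\sqrt{v/k}}$$
follows a noncentral $t$ distribution with $k$ degrees of freedom and noncentrality parameter $\mu$. We write $t \sim t_k(\mu)$. If $\mu = 0$, the $t_k(\mu)$ distribution reduces to a central $t$ distribution with $k$ degrees of freedom. Note that if $t \sim t_k(\mu)$, then $t^2 \sim F_{1, k}(\mu^2/2)$.

EXERCISE: Derive the pdf of a noncentral $t$ distribution and find the mean and variance.

TEST STATISTIC: Consider our usual Gauss-Markov linear model with normal errors, i.e., $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$. Suppose that our goal is to test $H_0: \mathbf{k}'\mathbf{b} = m$ versus a one- or two-sided alternative. The BLUE for $\mathbf{k}'\mathbf{b}$ is $\mathbf{k}'\hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ is a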
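least squares estimator. Straightforward calculations show that $\mathbf{k}'\hat{\mathbf{b}} \sim N\{\mathbf{k}'\mathbf{b}, \sigma^2\mathbf{k}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{k}\}$. Standardizing, we get
$$z = \frac{\mathbf{k}'\hat{\mathbf{b}} - \mathbf{k}'\mathbf{b}}{\sqrt{\sigma^2\mathbf{k}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{k}}} \sim N(0, 1).$$
We know that $v = \mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2 \sim \chi^2_{N-r}$ and that $z$ and $v$ are independent (verify!). Thus,
$$t = \frac{(\mathbf{k}'\hat{\mathbf{b}} - \mathbf{k}'\mathbf{b})/\sqrt{\sigma^2\mathbf{k}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{k}}}{\sqrt{\sigma^{-2}\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(N - r)}} = \frac{\mathbf{k}'\hat{\mathbf{b}} - \mathbf{k}'\mathbf{b}}{\sqrt{\hat{\sigma}^2\mathbf{k}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{k}}} \sim t_{N-r},$$
where $\hat{\sigma}^2 = \text{MSE}$. Thus, under $H_0: \mathbf{k}'\mathbf{b} = m$, the statistic
$$t = \frac{\mathbf{k}'\hat{\mathbf{b}} - m}{\sqrt{\hat{\sigma}^2\mathbf{k}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{k}}} \sim t_{N-r},$$
so an appropriate level $\alpha$ decision rule is to compare $t$ to the $t_{N-r}$ distribution. When $H_0$ is not true, then $t \sim t_{N-r}(\mu)$, where
$$\mu = \frac{\mathbf{k}'\mathbf{b} - m}{\sqrt{\sigma^2\mathbf{k}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{k}}}.$$

6.5 Testing models or testing linear parametric functions

SUMMARY: Under our GM linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$, we have presented $F$ statistics to test (a) a reduced model versus a full model in Section 6.3 and (b) a hypothesis defined by $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$, for $\mathbf{K}'\mathbf{b}$ estimable, in Section 6.4. In fact, testing models and testing linear parametric functions are essentially the same thing, as we now demonstrate. For simplicity, we take $\mathbf{m} = \mathbf{0}$, although the following argument can be generalized.

DISCUSSION: Consider the general linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{b} \in \mathcal{R}^p$. Consider a hypothesis about the linear parametric function $\mathbf{K}'\mathbf{b} = \mathbf{0}$. The null hypothesis can be written as
$$H_0: \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e} \ \text{ and } \ \mathbf{K}'\mathbf{b} = \mathbf{0}.$$
We now find a reduced model that corresponds to this hypothesis. Observe that $\mathbf{K}'\mathbf{b} = \mathbf{0}$ holds iff $\mathbf{b} \perp \mathcal{C}(\mathbf{K})$. To identify the reduced model, pick a matrix $\mathbf{U}$ such that $\mathcal{C}(\mathbf{U}) = \mathcal{C}(\mathbf{K})^\perp$. Then,
$$\mathbf{K}'\mathbf{b} = \mathbf{0} \iff \mathbf{b} \perp \mathcal{C}(\mathbf{K}) \iff \mathbf{b} \in \mathcal{C}(\mathbf{U}) \iff \mathbf{b} = \mathbf{U}\mathbf{c},$$
for some vector $\mathbf{c}$. Substituting $\mathbf{b} = \mathbf{U}\mathbf{c}$ into the linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$ gives the reduced model $\mathbf{y} = \mathbf{X}\mathbf{U}\mathbf{c} + \mathbf{e}$, or letting $\mathbf{W} = \mathbf{X}\mathbf{U}$, our hypothesis above can be written
$$H_0: \mathbf{y} = \mathbf{W}\mathbf{c} + \mathbf{e},$$
where $\mathcal{C}(\mathbf{W}) \subset \mathcal{C}(\mathbf{X})$.

OBSERVATION: When $\mathbf{K}'\mathbf{b}$ is estimable, that is, so that $\mathbf{K}' = \mathbf{D}'\mathbf{X}$ for some $N \times s$ matrix $\mathbf{D}$, we can find the perpendicular projection matrix for testing $H_0: \mathbf{K}'\mathbf{b} = \mathbf{0}$ in terms of $\mathbf{D}$ and $\mathbf{P}_X$. From Section 6.3, recall that the numerator sum of squares to test the reduced model $\mathbf{y} = \mathbf{W}\mathbf{c} + \mathbf{e}$ versus the full model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$ is $\mathbf{y}'(\mathbf{P}_X - \mathbf{P}_W)\mathbf{y}$, where $\mathbf{P}_X - \mathbf{P}_W$ is the ppm onto $\mathcal{C}(\mathbf{P}_X - \mathbf{P}_W) = \mathcal{C}(\mathbf{W})^\perp_{\mathcal{C}(\mathbf{X})}$. For testing an estimable parametric function $\mathbf{K}'\mathbf{b} = \mathbf{0}$, we now show that the ppm onto $\mathcal{C}(\mathbf{P}_X\mathbf{D})$ is also the ppm onto $\mathcal{C}(\mathbf{W})^\perp_{\mathcal{C}(\mathbf{X})}$, i.e., that $\mathcal{C}(\mathbf{P}_X\mathbf{D}) = \mathcal{C}(\mathbf{W})^\perp_{\mathcal{C}(\mathbf{X})}$.

PROPOSITION: $\mathcal{C}(\mathbf{P}_X - \mathbf{P}_W) = \mathcal{C}(\mathbf{W})^\perp_{\mathcal{C}(\mathbf{X})} = \mathcal{C}(\mathbf{X}\mathbf{U})^\perp_{\mathcal{C}(\mathbf{X})} = \mathcal{C}(\mathbf{P}_X\mathbf{D})$.
Proof. We showed $\mathcal{C}(\mathbf{P}_X - \mathbf{P}_W) = \mathcal{C}(\mathbf{W})^\perp_{\mathcal{C}(\mathbf{X})}$ in Chapter 2, and $\mathbf{W} = \mathbf{X}\mathbf{U}$, so the second equality is obvious. Suppose that $\mathbf{v} \in \mathcal{C}(\mathbf{X}\mathbf{U})^\perp_{\mathcal{C}(\mathbf{X})}$. Then, $\mathbf{v}'\mathbf{X}\mathbf{U} = \mathbf{0}$, so that $\mathbf{X}'\mathbf{v} \perp \mathcal{C}(\mathbf{U})$. Because $\mathcal{C}(\mathbf{U}) = \mathcal{C}(\mathbf{K})^\perp$, we know that $\mathbf{X}'\mathbf{v} \in \mathcal{C}(\mathbf{K}) = \mathcal{C}(\mathbf{X}'\mathbf{D})$, since $\mathbf{K}' = \mathbf{D}'\mathbf{X}$. Thus,
$$\mathbf{v} = \mathbf{P}_X\mathbf{v} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{v} \in \mathcal{C}[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{D}] = \mathcal{C}(\mathbf{P}_X\mathbf{D}).$$
Conversely, suppose that $\mathbf{v} \in \mathcal{C}(\mathbf{P}_X\mathbf{D})$. Clearly, $\mathbf{v} \in \mathcal{C}(\mathbf{X})$. Also, $\mathbf{v} = \mathbf{P}_X\mathbf{D}\mathbf{d}$, for some $\mathbf{d}$, and
$$\mathbf{v}'\mathbf{X}\mathbf{U} = \mathbf{d}'\mathbf{D}'\mathbf{P}_X\mathbf{X}\mathbf{U} = \mathbf{d}'\mathbf{D}'\mathbf{X}\mathbf{U} = \mathbf{d}'\mathbf{K}'\mathbf{U} = \mathbf{0},$$
because $\mathcal{C}(\mathbf{U}) = \mathcal{C}(\mathbf{K})^\perp$. Thus, $\mathbf{v} \in \mathcal{C}(\mathbf{X}\mathbf{U})^\perp_{\mathcal{C}(\mathbf{X})}$. $\Box$

IMPLICATION: It follows immediately that the numerator sum of squares for testing the reduced model $\mathbf{y} = \mathbf{W}\mathbf{c} + \mathbf{e}$ versus the full model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$ is $\mathbf{y}'\mathbf{M}_{P_XD}\mathbf{y}$, where
$$\mathbf{M}_{P_XD} = \mathbf{P}_X\mathbf{D}[(\mathbf{P}_X\mathbf{D})'(\mathbf{P}_X\mathbf{D})]^{-}(\mathbf{P}_X\mathbf{D})' = \mathbf{P}_X\mathbf{D}(\mathbf{D}'\mathbf{P}_X\mathbf{D})^{-}\mathbf{D}'\mathbf{P}_X$$
is the ppm onto $\mathcal{C}(\mathbf{P}_X\mathbf{D})$. If $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$, the resulting test statistic is
$$F = \frac{\mathbf{y}'\mathbf{M}_{P_XD}\mathbf{y}/r(\mathbf{M}_{P_XD})}{\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/r(\mathbf{I} - \mathbf{P}_X)} \sim F_{r(\mathbf{M}_{P_XD}),\, r(\mathbf{I} - \mathbf{P}_X)}(\lambda)$$
(verify), where the noncentrality parameter
$$\lambda = \frac{1}{2\sigma^2}(\mathbf{X}\mathbf{b})'\mathbf{M}_{P_XD}\mathbf{X}\mathbf{b}.$$

GOAL: Our goal now is to show that the $F$ statistic above is the same $F$ statistic we derived in Section 6.4 with $\mathbf{m} = \mathbf{0}$, that is,
$$F = \frac{(\mathbf{K}'\hat{\mathbf{b}})'\mathbf{H}^{-1}\mathbf{K}'\hat{\mathbf{b}}/s}{\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(N - r)}.$$
Recall that this statistic was derived for the hypothesis involving the linear parametric function $H_0: \mathbf{K}'\mathbf{b} = \mathbf{0}$. First, we show that $r(\mathbf{M}_{P_XD}) = s$, where, recall, $s = r(\mathbf{K})$. To do this, it suffices to show that $r(\mathbf{K}) = r(\mathbf{P}_X\mathbf{D})$. Because $\mathbf{K}'\mathbf{b}$ is estimable, we know that $\mathbf{K}' = \mathbf{D}'\mathbf{X}$, for some $\mathbf{D}$. Writing $\mathbf{K} = \mathbf{X}'\mathbf{D}$, we see that for any vector $\mathbf{a}$,
$$\mathbf{X}'\mathbf{D}\mathbf{a} = \mathbf{0} \iff \mathbf{D}\mathbf{a} \perp \mathcal{C}(\mathbf{X}),$$
which occurs iff $\mathbf{P}_X\mathbf{D}\mathbf{a} = \mathbf{0}$. Note that the equivalence of $\mathbf{X}'\mathbf{D}\mathbf{a} = \mathbf{0}$ and $\mathbf{P}_X\mathbf{D}\mathbf{a} = \mathbf{0}$ implies that
$$\mathcal{N}(\mathbf{X}'\mathbf{D}) = \mathcal{N}(\mathbf{P}_X\mathbf{D}) \iff \mathcal{C}(\mathbf{D}'\mathbf{X})^\perp = \mathcal{C}(\mathbf{D}'\mathbf{P}_X)^\perp \iff \mathcal{C}(\mathbf{D}'\mathbf{X}) = \mathcal{C}(\mathbf{D}'\mathbf{P}_X),$$
so that $r(\mathbf{D}'\mathbf{X}) = r(\mathbf{D}'\mathbf{P}_X)$. But $\mathbf{K}' = \mathbf{D}'\mathbf{X}$, so $r(\mathbf{K}) = r(\mathbf{D}'\mathbf{X}) = r(\mathbf{D}'\mathbf{P}_X) = r(\mathbf{P}_X\mathbf{D})$. Now, consider the quadratic form $\mathbf{y}'\mathbf{M}_{P_XD}\mathbf{y}$, and let $\hat{\mathbf{b}}$ denote a least squares estimator of $\mathbf{b}$. Result LS5 says that $\mathbf{X}\hat{\mathbf{b}} = \mathbf{P}_X\mathbf{y}$, so that $\mathbf{K}'\hat{\mathbf{b}} = \mathbf{D}'\mathbf{X}\hat{\mathbf{b}} = \mathbf{D}'\mathbf{P}_X\mathbf{y}$. Substitution gives
$$\begin{aligned}
\mathbf{y}'\mathbf{M}_{P_XD}\mathbf{y} &= \mathbf{y}'\mathbf{P}_X\mathbf{D}(\mathbf{D}'\mathbf{P}_X\mathbf{D})^{-}\mathbf{D}'\mathbf{P}_X\mathbf{y} \\
&= (\mathbf{K}'\hat{\mathbf{b}})'[\mathbf{D}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{D}]^{-}\mathbf{K}'\hat{\mathbf{b}} \\
&= (\mathbf{K}'\hat{\mathbf{b}})'[\mathbf{K}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{K}]^{-}\mathbf{K}'\hat{\mathbf{b}}.
\end{aligned}$$
Recalling that $\mathbf{H} = \mathbf{K}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{K}$ and that $\mathbf{H}$ is nonsingular (when $\mathbf{K}'\mathbf{b}$ is estimable) should convince you that the numerator sums of squares in
$$F = \frac{\mathbf{y}'\mathbf{M}_{P_XD}\mathbf{y}/r(\mathbf{M}_{P_XD})}{\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/r(\mathbf{I} - \mathbf{P}_X)} \quad \text{and} \quad F = \frac{(\mathbf{K}'\hat{\mathbf{b}})'\mathbf{H}^{-1}\mathbf{K}'\hat{\mathbf{b}}/s}{\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(N - r)}$$
are equal. We already showed that $r(\mathbf{M}_{P_XD}) = s$, and because $r(\mathbf{I} - \mathbf{P}_X) = N - r$, we are done. $\Box$

6.6 Likelihood ratio tests

6.6.1 Constrained estimation

DISCUSSION: In the linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $E(\mathbf{e}) = \mathbf{0}$ (note the minimal assumptions we are making), we have allowed the $p \times 1$ parameter vector $\mathbf{b}$ to take on any value in $\mathcal{R}^p$; that is, we have made no restrictions on the parameters in $\mathbf{b}$. We now consider the case where $\mathbf{b}$ is restricted to the subset of $\mathcal{R}^p$ consisting of values of $\mathbf{b}$ that satisfy $\mathbf{P}'\mathbf{b} = \boldsymbol{\delta}$, where $\mathbf{P}$ is a $p \times q$ matrix and $\boldsymbol{\delta}$ is $q \times 1$. To avoid technical difficulties, we will assume that the system $\mathbf{P}'\mathbf{b} = \boldsymbol{\delta}$ is consistent, i.e., $\boldsymbol{\delta} \in \mathcal{C}(\mathbf{P}')$. Otherwise, the set $\{\mathbf{b} \in \mathcal{R}^p : \mathbf{P}'\mathbf{b} = \boldsymbol{\delta}\}$ could be empty.

PROBLEM: In the linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $E(\mathbf{e}) = \mathbf{0}$, we would like to minimize
$$Q(\mathbf{b}) = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$$
subject to the constraint that $\mathbf{P}'\mathbf{b} = \boldsymbol{\delta}$. Essentially, this requires us to find the minimum value of $Q(\mathbf{b})$ over the set $\{\mathbf{b} \in \mathcal{R}^p : \mathbf{P}'\mathbf{b} = \boldsymbol{\delta}\}$, which is an affine set (a linear subspace only when $\boldsymbol{\delta} = \mathbf{0}$). This is a restricted minimization problem and standard Lagrangian methods apply; see Appendix B in Monahan for more information on Lagrange multipliers. The Lagrangian $a(\mathbf{b}, \boldsymbol{\theta})$ is a function of $\mathbf{b}$ and the Lagrange multipliers in $\boldsymbol{\theta}$ and can be written as
$$a(\mathbf{b}, \boldsymbol{\theta}) = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b}) + 2\boldsymbol{\theta}'(\mathbf{P}'\mathbf{b} - \boldsymbol{\delta}).$$
Taking partial derivatives, we have
$$\frac{\partial a(\mathbf{b}, \boldsymbol{\theta})}{\partial \mathbf{b}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} + 2\mathbf{P}\boldsymbol{\theta}, \qquad \frac{\partial a(\mathbf{b}, \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 2(\mathbf{P}'\mathbf{b} - \boldsymbol{\delta}).$$
Setting these equal to zero leads to the restricted normal equations (RNEs), that is,
$$\begin{pmatrix} \mathbf{X}'\mathbf{X} & \mathbf{P} \\ \mathbf{P}' & \mathbf{0} \end{pmatrix}\begin{pmatrix} \mathbf{b} \\ \boldsymbol{\theta} \end{pmatrix} = \begin{pmatrix} \mathbf{X}'\mathbf{y} \\ \boldsymbol{\delta} \end{pmatrix}.$$
Denote by $\hat{\mathbf{b}}_H$ and $\hat{\boldsymbol{\theta}}_H$ the solutions to the RNEs, respectively. The solution $\hat{\mathbf{b}}_H$ is called a restricted least squares estimator.

FACTS:
• The restricted normal equations are consistent; see Result 3.8, Monahan (pp. 62-63).
• A solution $\hat{\mathbf{b}}_H$ minimizes $Q(\mathbf{b})$ over the set $\mathcal{T} \equiv \{\mathbf{b} \in \mathcal{R}^p : \mathbf{P}'\mathbf{b} = \boldsymbol{\delta}\}$; see Result 3.9, Monahan (pp. 63).
• The function $\boldsymbol{\lambda}'\mathbf{b}$ is estimable in the restricted model
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}, \quad E(\mathbf{e}) = \mathbf{0}, \quad \mathbf{P}'\mathbf{b} = \boldsymbol{\delta},$$
iff $\boldsymbol{\lambda}' = \mathbf{a}'\mathbf{X} + \mathbf{d}'\mathbf{P}'$, for some $\mathbf{a}$ and $\mathbf{d}$; that is, $\boldsymbol{\lambda}'$ lies in the row space of the stacked matrix $\begin{pmatrix} \mathbf{X} \\ \mathbf{P}' \end{pmatrix}$. See Result 3.7, Monahan (pp. 60).
• If $\boldsymbol{\lambda}'\mathbf{b}$ is estimable in the unrestricted model (i.e., the model without the linear restriction), then $\boldsymbol{\lambda}'\mathbf{b}$ is estimable in the restricted model. The converse is not true.
• Under the GM model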
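assumptions, if $\boldsymbol{\lambda}'\mathbf{b}$ is estimable in the restricted model, then $\boldsymbol{\lambda}'\hat{\mathbf{b}}_H$ is the BLUE of $\boldsymbol{\lambda}'\mathbf{b}$ in the restricted model. See Result 4.5, Monahan (pp. 89-90).

6.6.2 Testing procedure

SETTING: Consider the GM linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \leq p$ and $\mathbf{e} \sim N_N(\mathbf{0}, \sigma^2\mathbf{I})$. We are interested in deriving the likelihood ratio test (LRT) for
$$H_0: \mathbf{K}'\mathbf{b} = \mathbf{m} \quad \text{versus} \quad H_1: \mathbf{K}'\mathbf{b} \neq \mathbf{m},$$
where $\mathbf{K}$ is a $p \times s$ matrix with $r(\mathbf{K}) = s$ and $\mathbf{m}$ is $s \times 1$. We assume that $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$ is testable, that is, each component of $\mathbf{K}'\mathbf{b}$ is estimable.

REMARK: The logic behind a LRT is simple. One compares the maximized likelihood under $H_0$ to the maximized likelihood over the entire parameter space. If the former is small when compared to the latter, then there is little evidence for $H_0$.

DERIVATION: Under our model assumptions, we know that $\mathbf{y} \sim N_N(\mathbf{X}\mathbf{b}, \sigma^2\mathbf{I})$. The likelihood function for $\boldsymbol{\theta} = (\mathbf{b}', \sigma^2)'$ is
$$L(\boldsymbol{\theta} \mid \mathbf{y}) = L(\mathbf{b}, \sigma^2 \mid \mathbf{y}) = (2\pi\sigma^2)^{-N/2}\exp\{-Q(\mathbf{b})/2\sigma^2\},$$
where $Q(\mathbf{b}) = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$. The unrestricted parameter space is
$$\Theta = \{\boldsymbol{\theta} : \mathbf{b} \in \mathcal{R}^p, \ \sigma^2 \in \mathcal{R}^+\}.$$
The restricted parameter space, that is, the parameter space under $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$, is
$$\Theta_0 = \{\boldsymbol{\theta} : \mathbf{b} \in \mathcal{R}^p, \ \mathbf{K}'\mathbf{b} = \mathbf{m}, \ \sigma^2 \in \mathcal{R}^+\}.$$
The likelihood ratio statistic is given by
$$\lambda \equiv \lambda(\mathbf{y}) = \frac{\sup_{\Theta_0} L(\boldsymbol{\theta} \mid \mathbf{y})}{\sup_{\Theta} L(\boldsymbol{\theta} \mid \mathbf{y})}.$$
The null hypothesis $H_0$ is rejected when $\lambda(\mathbf{y})$ is small. Thus, to perform a level $\alpha$ test, reject $H_0$ when $\lambda < c$, where $c \in (0, 1)$ is chosen to satisfy $P_{H_0}\{\lambda(\mathbf{y}) \leq c\} = \alpha$. We have seen (Section 6.2) that the unrestricted MLEs of $\mathbf{b}$ and $\sigma^2$ are
$$\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y} \quad \text{and} \quad \hat{\sigma}^2 = \frac{Q(\hat{\mathbf{b}})}{N}.$$
Similarly, maximizing $L(\boldsymbol{\theta} \mid \mathbf{y})$ over $\Theta_0$ produces the solutions $\hat{\mathbf{b}}_H$ and $\tilde{\sigma}^2 = Q(\hat{\mathbf{b}}_H)/N$, where $\hat{\mathbf{b}}_H$ is any solution to
$$\begin{pmatrix} \mathbf{X}'\mathbf{X} & \mathbf{K} \\ \mathbf{K}' & \mathbf{0} \end{pmatrix}\begin{pmatrix} \mathbf{b} \\ \boldsymbol{\theta} \end{pmatrix} = \begin{pmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{m} \end{pmatrix},$$
the RNEs. Algebra shows that
$$\lambda = \frac{L(\hat{\mathbf{b}}_H, \tilde{\sigma}^2 \mid \mathbf{y})}{L(\hat{\mathbf{b}}, \hat{\sigma}^2 \mid \mathbf{y})} = \left(\frac{\hat{\sigma}^2}{\tilde{\sigma}^2}\right)^{N/2} = \left\{\frac{Q(\hat{\mathbf{b}})}{Q(\hat{\mathbf{b}}_H)}\right\}^{N/2}.$$
More algebra shows that
$$\left\{\frac{Q(\hat{\mathbf{b}})}{Q(\hat{\mathbf{b}}_H)}\right\}^{N/2} < c \iff \frac{\{Q(\hat{\mathbf{b}}_H) - Q(\hat{\mathbf{b}})\}/s}{Q(\hat{\mathbf{b}})/(N - r)} > c^*,$$
where $s = r(\mathbf{K})$ and $c^* = s^{-1}(N - r)(c^{-2/N} - 1)$. Furthermore, Monahan's Theorem 6.1 (pp. 139-140) shows that when $\mathbf{K}'\mathbf{b}$ is estimable,
$$Q(\hat{\mathbf{b}}_H) - Q(\hat{\mathbf{b}}) = (\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})'\mathbf{H}^{-1}(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m}),$$
where $\mathbf{H} = \mathbf{K}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{K}$. Applying this result, and noting that $Q(\hat{\mathbf{b}})/(N - r) = \text{MSE}$, we see that
$$\frac{\{Q(\hat{\mathbf{b}}_H) - Q(\hat{\mathbf{b}})\}/s}{Q(\hat{\mathbf{b}})/(N - r)} > c^* \iff F = \frac{(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})'\mathbf{H}^{-1}(\mathbf{K}'\hat{\mathbf{b}} - \mathbf{m})/s}{\text{MSE}} > c^*.$$
The LRT specifies that we reject $H_0$ when $F$ is large. Choosing $c^* = F_{s, N-r, \alpha}$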
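provides a level $\alpha$ test. The novelty here is that under the GM model with normal errors, the LRT for $H_0: \mathbf{K}'\mathbf{b} = \mathbf{m}$ is the same test as that in Section 6.4.

6.7 Confidence intervals and multiple comparisons

6.7.1 Single intervals

PROBLEM: Consider the GM linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \le p$ and $\mathbf{e} \sim \mathcal{N}_N(\mathbf{0}, \sigma^2\mathbf{I})$. Suppose that $\boldsymbol{\lambda}'\mathbf{b}$ is estimable; that is, $\boldsymbol{\lambda}' = \mathbf{a}'\mathbf{X}$ for some vector $\mathbf{a}$. We would like to write a $100(1 - \alpha)$ percent confidence interval for $\boldsymbol{\lambda}'\mathbf{b}$.

CONFIDENCE INTERVAL: We start with the obvious point estimator $\boldsymbol{\lambda}'\widehat{\mathbf{b}}$, the least squares estimator (and MLE) of $\boldsymbol{\lambda}'\mathbf{b}$. Under our model assumptions, we know that
$$\boldsymbol{\lambda}'\widehat{\mathbf{b}} \sim \mathcal{N}\{\boldsymbol{\lambda}'\mathbf{b},\ \sigma^2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}\}$$
and, hence,
$$Z = \frac{\boldsymbol{\lambda}'\widehat{\mathbf{b}} - \boldsymbol{\lambda}'\mathbf{b}}{\sqrt{\sigma^2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}}} \sim \mathcal{N}(0, 1).$$
If $\sigma^2$ was known, our work would be done as $Z$ is a pivot. More likely, this is not the case, so we must estimate it. An obvious point estimator for $\sigma^2$ is $\widehat{\sigma}^2 = \text{MSE}$, where
$$\text{MSE} = (N - r)^{-1}\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}.$$
We consider the quantity
$$t = \frac{\boldsymbol{\lambda}'\widehat{\mathbf{b}} - \boldsymbol{\lambda}'\mathbf{b}}{\sqrt{\widehat{\sigma}^2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}}}$$
and subsequently show that $t \sim t_{N-r}$. Note that
$$t = \frac{(\boldsymbol{\lambda}'\widehat{\mathbf{b}} - \boldsymbol{\lambda}'\mathbf{b}) \big/ \sqrt{\sigma^2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}}}{\sqrt{(N - r)^{-1}\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2}} \sim \frac{\text{``}\mathcal{N}(0, 1)\text{''}}{\sqrt{\text{``}\chi^2_{N-r}\text{''}/(N - r)}}.$$
To verify that $t \sim t_{N-r}$, it remains only to show that $Z$ and $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/\sigma^2$ are independent, or equivalently, that $\boldsymbol{\lambda}'\widehat{\mathbf{b}}$ and $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}$ are, since $\boldsymbol{\lambda}'\widehat{\mathbf{b}}$ is a function of $Z$ and since $\sigma^2$ is not random. Note that
$$\boldsymbol{\lambda}'\widehat{\mathbf{b}} = \mathbf{a}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y} = \mathbf{a}'\mathbf{P}_X\mathbf{y},$$
a linear function of $\mathbf{y}$. Using Result MVN19, $\boldsymbol{\lambda}'\widehat{\mathbf{b}}$ and $\mathbf{y}'(\mathbf{I} - \mathbf{P}_X)\mathbf{y}$ are independent since $\mathbf{a}'\mathbf{P}_X(\sigma^2\mathbf{I})(\mathbf{I} - \mathbf{P}_X) = \mathbf{0}$. Thus, $t \sim t_{N-r}$; i.e., $t$ is a pivot, so that
$$\text{pr}\left(-t_{N-r,\alpha/2} < \frac{\boldsymbol{\lambda}'\widehat{\mathbf{b}} - \boldsymbol{\lambda}'\mathbf{b}}{\sqrt{\widehat{\sigma}^2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}}} < t_{N-r,\alpha/2}\right) = 1 - \alpha.$$
Algebra shows that this probability statement is the same as
$$\text{pr}\left(\boldsymbol{\lambda}'\widehat{\mathbf{b}} - t_{N-r,\alpha/2}\sqrt{\widehat{\sigma}^2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}} < \boldsymbol{\lambda}'\mathbf{b} < \boldsymbol{\lambda}'\widehat{\mathbf{b}} + t_{N-r,\alpha/2}\sqrt{\widehat{\sigma}^2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}}\right) = 1 - \alpha,$$
showing that
$$\boldsymbol{\lambda}'\widehat{\mathbf{b}} \pm t_{N-r,\alpha/2}\sqrt{\widehat{\sigma}^2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}}$$
is a $100(1 - \alpha)$ percent confidence interval for $\boldsymbol{\lambda}'\mathbf{b}$.

Example 6.4. Recall the simple linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i,$$
for $i = 1, 2, \ldots, N$, where $\epsilon_1, \epsilon_2, \ldots, \epsilon_N$ are iid $\mathcal{N}(0, \sigma^2)$. Recall also that the least squares estimator of $\mathbf{b} = (\beta_0, \beta_1)'$ is
$$\widehat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \begin{pmatrix} \widehat{\beta}_0 \\ \widehat{\beta}_1 \end{pmatrix} = \begin{pmatrix} \bar{y} - \widehat{\beta}_1\bar{x} \\ \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \end{pmatrix}$$
and that the covariance matrix of $\widehat{\mathbf{b}}$ is
$$\text{cov}(\widehat{\mathbf{b}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2 \begin{pmatrix} \dfrac{1}{N} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} & -\dfrac{\bar{x}}{\sum_i (x_i - \bar{x})^2} \\ -\dfrac{\bar{x}}{\sum_i (x_i - \bar{x})^2} & \dfrac{1}{\sum_i (x_i - \bar{x})^2} \end{pmatrix}.$$
We now consider the problem of writing a $100(1 - \alpha)$ percent confidence interval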
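for $E(y \mid x_0) = \beta_0 + \beta_1 x_0$, the mean response of $y$ when $x = x_0$. Note that $E(y \mid x_0) = \beta_0 + \beta_1 x_0 = \boldsymbol{\lambda}'\mathbf{b}$, where $\boldsymbol{\lambda}' = (1 \ \ x_0)$. Also, $\boldsymbol{\lambda}'\mathbf{b}$ is estimable because this is a regression model, so our previous work applies. The least squares estimator (and MLE) of $E(y \mid x_0)$ is
$$\boldsymbol{\lambda}'\widehat{\mathbf{b}} = \widehat{\beta}_0 + \widehat{\beta}_1 x_0.$$
Straightforward algebra (verify!) shows that
$$\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-1}\boldsymbol{\lambda} = (1 \ \ x_0) \begin{pmatrix} \dfrac{1}{N} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} & -\dfrac{\bar{x}}{\sum_i (x_i - \bar{x})^2} \\ -\dfrac{\bar{x}}{\sum_i (x_i - \bar{x})^2} & \dfrac{1}{\sum_i (x_i - \bar{x})^2} \end{pmatrix} \begin{pmatrix} 1 \\ x_0 \end{pmatrix} = \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}.$$
Thus, a $100(1 - \alpha)$ percent confidence interval for $E(y \mid x_0)$ is
$$(\widehat{\beta}_0 + \widehat{\beta}_1 x_0) \pm t_{N-2,\alpha/2}\sqrt{\widehat{\sigma}^2\left\{\frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right\}},$$
where $\widehat{\sigma}^2 = \text{MSE}$. □

EXERCISE: In the one-way fixed effects ANOVA model $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$, for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n_i$, derive a formula for a $100(1 - \alpha)$ percent confidence interval for a contrast in the $\alpha_i$'s. Assume a GM normal model.

6.7.2 Multiple intervals

PROBLEM: Consider the GM linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{X}$ is $N \times p$ with rank $r \le p$ and $\mathbf{e} \sim \mathcal{N}_N(\mathbf{0}, \sigma^2\mathbf{I})$. We now consider the problem of writing simultaneous confidence intervals for the $k$ estimable functions $\boldsymbol{\lambda}_1'\mathbf{b}, \boldsymbol{\lambda}_2'\mathbf{b}, \ldots, \boldsymbol{\lambda}_k'\mathbf{b}$. Let the $p \times k$ matrix $\boldsymbol{\Lambda} = (\boldsymbol{\lambda}_1 \ \boldsymbol{\lambda}_2 \ \cdots \ \boldsymbol{\lambda}_k)$ so that
$$\boldsymbol{\tau} = \boldsymbol{\Lambda}'\mathbf{b} = (\boldsymbol{\lambda}_1'\mathbf{b}, \boldsymbol{\lambda}_2'\mathbf{b}, \ldots, \boldsymbol{\lambda}_k'\mathbf{b})'.$$
Because $\boldsymbol{\tau} = \boldsymbol{\Lambda}'\mathbf{b}$ is estimable, it follows that
$$\widehat{\boldsymbol{\tau}} = \boldsymbol{\Lambda}'\widehat{\mathbf{b}} \sim \mathcal{N}_k(\boldsymbol{\Lambda}'\mathbf{b}, \sigma^2\mathbf{H}),$$
where $\mathbf{H} = \boldsymbol{\Lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\Lambda}$. Furthermore, because $\boldsymbol{\lambda}_1'\widehat{\mathbf{b}}, \boldsymbol{\lambda}_2'\widehat{\mathbf{b}}, \ldots, \boldsymbol{\lambda}_k'\widehat{\mathbf{b}}$ are jointly normal, we have that
$$\boldsymbol{\lambda}_j'\widehat{\mathbf{b}} \sim \mathcal{N}(\boldsymbol{\lambda}_j'\mathbf{b}, \sigma^2 h_{jj}),$$
where $h_{jj}$ is the $j$th diagonal element of $\mathbf{H}$. Using our previous results, we know that
$$\boldsymbol{\lambda}_j'\widehat{\mathbf{b}} \pm t_{N-r,\alpha/2}\sqrt{\widehat{\sigma}^2 h_{jj}}$$
is a $100(1 - \alpha)$ percent confidence interval for $\boldsymbol{\lambda}_j'\mathbf{b}$; that is,
$$\text{pr}\left(\boldsymbol{\lambda}_j'\widehat{\mathbf{b}} - t_{N-r,\alpha/2}\sqrt{\widehat{\sigma}^2 h_{jj}} < \boldsymbol{\lambda}_j'\mathbf{b} < \boldsymbol{\lambda}_j'\widehat{\mathbf{b}} + t_{N-r,\alpha/2}\sqrt{\widehat{\sigma}^2 h_{jj}}\right) = 1 - \alpha.$$
This statement is true for a single interval.

SIMULTANEOUS COVERAGE: To investigate the simultaneous coverage probability of the set of intervals
$$\left\{\boldsymbol{\lambda}_j'\widehat{\mathbf{b}} \pm t_{N-r,\alpha/2}\sqrt{\widehat{\sigma}^2 h_{jj}},\ j = 1, 2, \ldots, k\right\},$$
let $E_j$ denote the event that interval $j$ contains $\boldsymbol{\lambda}_j'\mathbf{b}$; that is, $\text{pr}(E_j) = 1 - \alpha$, for $j = 1, 2, \ldots, k$. The probability that each of the $k$ intervals includes its targeted value $\boldsymbol{\lambda}_j'\mathbf{b}$ is
$$\text{pr}\left(\bigcap_{j=1}^k E_j\right) = 1 - \text{pr}\left(\bigcup_{j=1}^k \overline{E}_j\right),$$
by DeMorgan's Law. In turn, Boole's Inequality says that
$$\text{pr}\left(\bigcup_{j=1}^k \overline{E}_j\right) \le \sum_{j=1}^k \text{pr}(\overline{E}_j) = k\alpha.$$
Thus, the probability that each interval contains its intended target is
$$\text{pr}\left(\bigcap_{j=1}^k E_j\right) \ge 1 - k\alpha.$$
This lower bound $1 - k\alpha$ can be quite a bit lower than $1 - \alpha$. That is, the simultaneous coverage probability of the set of intervals $\{\boldsymbol{\lambda}_j'\widehat{\mathbf{b}} \pm t_{N-r,\alpha/2}\sqrt{\widehat{\sigma}^2 h_{jj}},\ j = 1, 2, \ldots, k\}$ can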
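be much lower than the single interval coverage probability.

GOAL: We would like the set of intervals
$$\left\{\boldsymbol{\lambda}_j'\widehat{\mathbf{b}} \pm d\sqrt{\widehat{\sigma}^2 h_{jj}},\ j = 1, 2, \ldots, k\right\}$$
to have a simultaneous coverage probability of at least $1 - \alpha$. Here, $d$ represents a probability point that guarantees the desired simultaneous coverage. Because taking $d = t_{N-r,\alpha/2}$ does not guarantee this minimum, the obvious solution is to take $d$ to be larger.

BONFERRONI: From the Boole's Inequality argument above, it is clear that if one takes $d = t_{N-r,\alpha/2k}$, so that each interval has individual level $\alpha/k$, then
$$\text{pr}\left(\bigcap_{j=1}^k E_j\right) \ge 1 - k(\alpha/k) = 1 - \alpha.$$
Thus, $100(1 - \alpha)$ percent simultaneous confidence intervals for $\boldsymbol{\lambda}_1'\mathbf{b}, \boldsymbol{\lambda}_2'\mathbf{b}, \ldots, \boldsymbol{\lambda}_k'\mathbf{b}$ are
$$\boldsymbol{\lambda}_j'\widehat{\mathbf{b}} \pm t_{N-r,\alpha/2k}\sqrt{\widehat{\sigma}^2 h_{jj}},$$
for $j = 1, 2, \ldots, k$.

SCHEFFÉ: The approach of Scheffé is quite different. The idea behind Scheffé's approach is to consider an arbitrary linear combination of $\boldsymbol{\tau} = \boldsymbol{\Lambda}'\mathbf{b}$, say, $\mathbf{u}'\boldsymbol{\tau} = \mathbf{u}'\boldsymbol{\Lambda}'\mathbf{b}$, and construct a confidence interval
$$C(\mathbf{u}, d) = \left(\mathbf{u}'\widehat{\boldsymbol{\tau}} - d\sqrt{\widehat{\sigma}^2\mathbf{u}'\mathbf{H}\mathbf{u}},\ \mathbf{u}'\widehat{\boldsymbol{\tau}} + d\sqrt{\widehat{\sigma}^2\mathbf{u}'\mathbf{H}\mathbf{u}}\right),$$
where $d$ is chosen so that
$$\text{pr}\{\mathbf{u}'\boldsymbol{\tau} \in C(\mathbf{u}, d), \text{ for all } \mathbf{u}\} = 1 - \alpha.$$
Since $d$ is chosen in this way, one guarantees the necessary simultaneous coverage probability for all possible linear combinations of $\boldsymbol{\tau} = \boldsymbol{\Lambda}'\mathbf{b}$ (an infinite number of combinations). Clearly, the desired simultaneous coverage is then conferred for the $k$ functions of interest $\tau_j = \boldsymbol{\lambda}_j'\mathbf{b}$, $j = 1, 2, \ldots, k$; these functions result from taking $\mathbf{u}$ to be the standard unit vectors. The argument in Monahan (pp 144) shows that
$$d = (k F_{k, N-r, \alpha})^{1/2}.$$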
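ILLUSTRATION: The Boole's Inequality bound and the Bonferroni adjustment can be checked by a small simulation. The sketch below is not from the notes and makes two simplifying assumptions: the $k$ estimators are independent, and $\sigma^2$ is known, so each interval uses a normal quantile in place of $t_{N-r,\alpha/2}$. The function name `simultaneous_coverage` and all numerical settings are hypothetical choices made for illustration only.

```python
import random
from statistics import NormalDist


def simultaneous_coverage(k, alpha_each, reps=20000, seed=714):
    """Estimate the probability that all k intervals cover their targets.

    Simplifying assumptions (not from the notes): the k estimators are
    independent N(theta_j, 1) with known variance, so interval j is
    estimate_j +/- z, with z the upper alpha_each/2 normal quantile.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha_each / 2)
    hits = 0
    for _ in range(reps):
        # Every target is taken to be 0, so interval j covers its
        # target exactly when |estimate_j| < z.
        if all(abs(rng.gauss(0.0, 1.0)) < z for _ in range(k)):
            hits += 1
    return hits / reps


k, alpha = 5, 0.05
naive = simultaneous_coverage(k, alpha)            # each interval at level alpha
bonferroni = simultaneous_coverage(k, alpha / k)   # each interval at level alpha/k

print(f"Boole lower bound 1 - k*alpha: {1 - k * alpha:.3f}")
print(f"unadjusted simultaneous coverage: {naive:.3f}")
print(f"Bonferroni simultaneous coverage: {bonferroni:.3f}")
```

Under the independence assumption, the exact simultaneous coverage is $(1 - \alpha)^k$ for the unadjusted intervals (about 0.774 when $k = 5$ and $\alpha = 0.05$) and $(1 - \alpha/k)^k$ for the Bonferroni intervals (about 0.951), so the simulated values should fall near these. With correlated estimators the coverage would differ, but the bounds $1 - k\alpha$ and $1 - \alpha$ derived above would still hold.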