Applied Longitudinal Data Analysis (ST 732)
These 257-page class notes belong to ST 732 at North Carolina State University, taught by Marie Davidian in Fall. They were uploaded by Jordane Kemmer on Thursday, October 15, 2015.
Date Created: 10/15/15
CHAPTER 11   ST 732, M. DAVIDIAN

11 Generalized linear models for nonnormal response

11.1 Introduction

So far in our study of "regression-type" models for longitudinal data, we have focused on situations where

• The response is continuous and reasonably assumed to be normally distributed.

• The model relating mean response to time and possibly other covariates is linear in the parameters that characterize the relationship. For example, regardless of how we modeled covariance (by direct modeling or by introducing random effects), we had models for the mean response of a data vector of the form
$$E(Y_i) = X_i \beta,$$
i.e., for the observation at time $t_{ij}$ on unit $i$, a mean such as $E(Y_{ij}) = \beta_0 + \beta_1 t_{ij}$.

Under these conditions, we were led to methods based on the assumption that
$$Y_i \sim N(X_i \beta, \Sigma_i);$$
the form of the matrix $\Sigma_i$ is dictated by what one assumes about the nature of variation. To fit the model, we used the methods of maximum likelihood and restricted maximum likelihood under the assumption that the data vectors are distributed as multivariate normal. Thus, the fitting method was based on the normality assumption.

As we noted at the beginning of the course, the assumption of normality is not always relevant for some data. This issue is not confined to longitudinal data analysis; it is an issue even in ordinary regression modeling. If the response is in the form of small counts, or is in fact binary (yes/no), it is clear that the assumption of normality would be quite unreasonable. Thus, the modeling and methods we have discussed so far, including the classical techniques, would be inappropriate for these situations.

One possibility is to analyze the data on a transformed scale on which they appear to be more nearly normal; e.g., count data may be transformed via a square-root or other transformation and then represented by linear models on this scale. This is somewhat unsatisfactory, however, as the model no longer pertains directly to the original scale of measurement, which is usually of greatest interest. Moreover,
it tries to force a model framework and distributional assumption that may not be best for the data.

In the late 1970s and early 1980s, in the context of ordinary regression modeling, a new perspective emerged in the statistical literature that generated much interest and evolved into a new standard for analysis in these situations. For data like counts and binary outcomes, as well as for continuous data for which the normal distribution is not a good probability model, there are alternative probability models that might be better representations of the way in which the response takes on values. The idea was to use these more appropriate probability models as the basis for developing new regression models and methods, rather than to try to make things fit into the usual (and inappropriate) normal-based methods. Then, in the mid-1980s, these techniques were extended to allow application to longitudinal data; this topic is still a focus of current statistical research.

In this chapter, we will gain the necessary background for understanding longitudinal data methods for nonnormal response. To do this, we will step away from the longitudinal data problem in this chapter and consider just the ordinary regression situation, where responses are scalar and independent. Armed with an appreciation of regression methods for nonnormal response, we will then be able to see how these might be extended to the harder problem of longitudinal data. As we will see, this extension turns out not to be quite as straightforward as it was in the normal case.

Thus, in this chapter, we will consider the following problem as a prelude to our treatment of nonnormal longitudinal data:

• As in multiple regression, suppose we have responses $Y_1, \ldots, Y_n$, each taken at a setting of $k$ covariates $x_{j1}, \ldots, x_{jk}$, $j = 1, \ldots, n$. The $Y_j$ values are mutually independent.

• The goal is to develop a statistical model that represents the response as a function of the covariates, as in usual linear regression. However, the nature of the response is such that the normal
probability model is not appropriate.

We might think of the data as arising either as

• $n$ observations on a single unit in a longitudinal data situation, where we focus on this individual unit only, so that the only relevant variation is within the unit. If observations are taken far enough apart in time, they might be viewed as independent.

• $n$ scalar observations, each taken on a different unit; thus, the independence assumption is natural. Here, $j$ indexes observations and units (recall the oxygen intake example in Section 3.4).

Either way of thinking is valid. The important point is that we wish to fit a regression model to data that do not seem to be normally distributed. As we will see, the data type might impose additional considerations about the form of the regression model. We use the subscript $j$ in this chapter to index the observations; we could have equally well used the subscript $i$.

The class of regression models we will consider for this situation is known in the literature as generalized linear models (not to be confused with the name of the SAS procedure GLM, standing for General Linear Model). Our treatment here is not comprehensive; for everything you ever wanted to know and more about generalized linear models, see the book by McCullagh and Nelder (1989).

11.2 Probability models for nonnormal data

Before we discuss regression modeling of nonnormal data, we review a few probability models that are ideally suited to representation of these data. We will focus on three models in particular; a more extensive catalogue of models may be found in McCullagh and Nelder (1989):

• The Poisson probability distribution as a model for count data (discrete).

• The Bernoulli probability distribution as a model for binary data (discrete); this may be extended to model data in the form of proportions.

• The gamma probability distribution as a model for continuous but nonnormal data with constant coefficient of variation.

We will see
that all of these probability models are members of a special class of probability models. This class also includes the normal distribution with constant variance (the basis for classical linear regression methods for normal data); thus, generalized linear models will be seen to be an extension of ordinary linear regression models.

COUNT DATA: THE POISSON DISTRIBUTION

Suppose we have a response $Y$ that is in the form of a count: $Y$ records the number of times an event of interest is observed. Recall the epileptic seizure data discussed at the beginning of the course; here, $Y$ was the number of seizures suffered by a particular patient in a two-week period.

When the response is a count, it should be clear that the possible values of the response must be nonnegative integers; more precisely, $Y$ may take on the values 0, 1, 2, 3, .... In principle, any nonnegative integer value is possible; there is no upper bound on how large a count may be. Realistically, if the thing being counted happens infrequently, large counts may be so unlikely as to almost never be seen.

The Poisson probability distribution describes the probabilities with which a random variable $Y$ that describes counts takes on values in the range 0, 1, 2, 3, .... More precisely, the probability density function describes the probability that $Y$ takes on the value $y$:
$$f(y) = P(Y = y) = \frac{\mu^y e^{-\mu}}{y!}, \quad y = 0, 1, 2, \ldots, \quad \mu > 0. \qquad (11.1)$$

• It may be shown that the mean (expectation) of $Y$ is $\mu$; i.e., $E(Y) = \mu$. Note that $\mu$ is positive, which makes sense: the average across all possible values of counts should be positive.

• Furthermore, it may be shown that the variance of $Y$ is also equal to $\mu$; i.e., $\mathrm{var}(Y) = \mu$. Thus, the variance of $Y$ is nonconstant: if $Y_1$ and $Y_2$ are both Poisson random variables, the only way that they can have the same variance is if they have the same mean.

• This has implications for regression. If $Y_1$ and $Y_2$ correspond to counts taken at different settings of the covariates, and thus at possibly different mean values, it is inappropriate to assume that they have the same variance. Recall that a standard
assumption of ordinary regression under normality is that of constant variance regardless of mean value; this assumption is clearly not sensible for count data.

Figure 1 shows the probability histogram for the case of a Poisson distribution with $\mu = 4$. Because the random variable in question is discrete, the histogram is not smooth; rather, the blocks represent the probabilities of each value on the horizontal axis by area.

[Figure 1: Poisson probabilities with mean 4. Horizontal axis: count, 0 to 20.]

Some features:

• Probabilities of seeing counts larger than 12 are virtually negligible, although, in principle, counts may take on any nonnegative value.

• Clearly, if $\mu$ were larger, the values for which probabilities would become negligible would get larger and larger.

• For smallish counts, where the mean is small (e.g., $\mu = 4$), the shape of the probability histogram is asymmetric. Thus, discreteness aside, the normal distribution would be a lousy approximation to this shape. For larger and larger $\mu$, it may be seen that the shape gets more and more symmetric. Thus, when counts are very large, it is common to approximate the Poisson probability distribution by a normal distribution.

EXAMPLE: HORSEKICK DATA

As an example of a situation where the response is a small count, we consider a world-famous data set. These data may be found on page 227 of Hand et al. (1994). Data were collected and maintained over the 20 years 1875 to 1894, inclusive, on the numbers of Prussian militiamen killed by being kicked by a horse in each of 10 separate corps of militiamen. For example, the data for the first 6 years are as follows:

            Corps
Year    1   2   3   4   5   6   7   8   9  10
1875    0   0   0   0   1   1   0   0   1   0
1876    0   0   1   0   0   0   0   0   1   1
1877    0   0   0   0   1   0   0   1   2   0
1878    2   1   1   0   0   0   0   1   1   0
1879    0   1   1   2   0   1   0   0   1   0
1880    2   1   1   1   0   0   2   1   3   0

Thus, for example, in 1877, 2 militiamen were killed by kicks from a horse in the 9th corps. Note that, technically, counts may not be any
number: there is an "upper bound," the total number of men in the corps. But this number is so huge relative to the size of the counts that, for all practical purposes, it is "infinite."

Clearly, the numbers of men killed (counts) in each year-corps combination are small; thus, the normal distribution is a bad approximation to the true Poisson distribution.

It was of interest to determine from these data whether differences in the numbers of men kicked could be attributed to systematic effects of year or corps. That is, were members of certain corps more susceptible to horse-kick deaths than others? Were certain years particularly bad for horse-kick deaths?

• If the data were normal, a natural approach to this question would be to postulate a regression model that allows mean response to depend on the particular corps and year.

• Specifically, if we were to define 19 dummy variables for year and 9 for corps, we might write a linear model for the mean of the $j$th observation in the data set ($n = 200$ total) as
$$E(Y_j) = \beta_0 + \beta_1 x_{j1} + \cdots + \beta_{19} x_{j19} + \beta_{20} z_{j1} + \cdots + \beta_{28} z_{j9}, \qquad (11.2)$$
where
$x_{jk} = 1$ if observation $j$ is from year $k$ (1875, ..., 1893), and $x_{jk} = 0$ otherwise;
$z_{jk} = 1$ if observation $j$ is from corps $k$ ($k = 1, \ldots, 9$), and $z_{jk} = 0$ otherwise.
With these definitions, note that $\beta_0$ corresponds to what happens for year 1894 with corps 10. The remaining parameters describe the change from this due to changing year or corps.

• Note that, aside from the normality issue, letting (11.2) represent the mean of observation $Y_j$ has a problem. Recall that counts must be nonnegative by definition. However, with this model, it is possible to end up with an estimated mean that is negative; the restriction is not enforced. This seems quite possible here: many of the observations are 0, so it would not be surprising to end up estimating some means as negative. More on this later.

BINARY DATA: THE BERNOULLI DISTRIBUTION

Suppose we have a response $y$ that takes on either the value 0 or 1, depending on whether an event of interest occurs or not. Recall the child respiratory
data at the beginning of the course; here, $y$ was 0 or 1 according to whether a child did not or did "wheeze."

Here, the response can take on only two possible values. Clearly, the normal distribution should not even be considered as a model.

The Bernoulli probability distribution describes the probabilities with which a random variable $Y$ that characterizes whether an event occurs or not takes on its two possible values (0, 1). The probability density function is given by
$$f(1) = P(Y = 1) = \mu, \quad f(0) = P(Y = 0) = 1 - \mu$$
for $0 \le \mu \le 1$. The extremes $\mu = 0, 1$ are not particularly interesting, so we will consider $0 < \mu < 1$. This may be summarized succinctly as
$$f(y) = P(Y = y) = \mu^y (1 - \mu)^{1 - y}, \quad 0 < \mu < 1, \quad y = 0, 1. \qquad (11.3)$$

• It may be shown that the mean of $Y$ is $\mu$. Note that $\mu$ is also the probability of seeing the event of interest ($y = 1$). As a probability, it must be between 0 and 1, so the mean of $Y$ must be between 0 and 1 as well.

• Furthermore, it may be shown that the variance of $Y$ is equal to $\mu(1 - \mu)$; i.e., $\mathrm{var}(Y) = \mu(1 - \mu)$. As with the Poisson distribution, the variance of $Y$ is nonconstant: if $Y_1$ and $Y_2$ are both Bernoulli random variables, the only way that they can have the same variance is if they have the same mean.

• This has implications for regression. If $Y_1$ and $Y_2$ correspond to binary responses taken at different settings of the covariates, and thus at possibly different mean values, it is inappropriate to assume that they have the same variance. Thus, again, the usual assumption of constant variance is clearly not sensible when modeling binary data.

EXAMPLE: MYOCARDIAL INFARCTION DATA

The response is often binary in medical studies. Here, we consider an example in which 200 women participated in a study to investigate risk factors associated with myocardial infarction (heart attack). On each woman, the following information was observed:

• Whether the woman used oral contraceptives in the past year (1 if yes, 0 if no)

• Age in years

• Whether the woman currently smokes more than 1 pack of cigarettes per day (1 if yes, 0 if no)

• Whether the
woman has suffered a myocardial infarction: the response, $y = 0$ if no, $y = 1$ if yes.

The data for the first 10 women are given below:

Woman   Contracep.   Age   Smoke   MI
  1         1         33     1      0
  2         0         32     0      0
  3         1         37     0      1
  4         0         36     0      0
  5         1         50     1      1
  6         1         40     0      0
  7         0         35     0      0
  8         1         33     0      0
  9         1         33     0      0
 10         0         31     0      0

The objective of this study was to determine whether any of the covariates, or potential risk factors (oral contraceptive use, age, smoking), were associated with the chance of having a heart attack. For example, was there evidence to suggest that smoking more than one pack of cigarettes a day raises the probability of having a heart attack?

• If the data were normal, a natural approach to this question would be to postulate a regression model that allows mean response (which is equal to the probability of having a heart attack, as this is a binary response) to depend on age, smoking status, and contraceptive use.

• Define for the $j$th woman
$x_{j1} = 1$ if oral contraceptive use, $= 0$ otherwise;
$x_{j2} =$ age in years;
$x_{j3} = 1$ if she smokes more than one pack/day, $= 0$ otherwise.
Then we would be tempted to model the mean (probability of heart attack) as a linear model, writing the mean for the $j$th observation as
$$E(Y_j) = \beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2} + \beta_3 x_{j3}.$$

• Using a linear function of the covariates like this to represent the mean (probability of heart attack) has an immediate problem. Because the mean is a probability, it must be between 0 and 1. There is nothing to guarantee that the estimates of means we would end up with after fitting this model in the usual way would honor this restriction. Thus, we could end up with negative estimates of probabilities, or estimated probabilities greater than one. More on this later.

CONTINUOUS DATA WITH CONSTANT COEFFICIENT OF VARIATION: THE GAMMA DISTRIBUTION

As we have already remarked, just because the response is continuous does not mean that the normal distribution is a sensible probability model.

• For example, most biological responses take on only positive values. The normal distribution in principle
assigns positive probability to all values on the real line, negative and positive.

• Furthermore, the normal distribution says that values to the left and right of its mean are equally likely to be seen, by virtue of the symmetry inherent in the form of the probability density. This may not be realistic for biological and other kinds of data. A common phenomenon is to see "unusually large" values of the response with more frequency than "unusually small" values. For example, if the response is annual income, the distribution of incomes is mostly in a limited range; however, every so often, a chairman of the board, athlete, or entertainer may command an enormous income. For this situation, a distribution that says small and large values of the response are equally likely is not suitable.

Other probability models are available for continuous response that better represent these features. Several such models are possible; we consider one of these.

The gamma probability distribution describes the probabilities with which a random variable $Y$ takes on values, where $Y$ can only be positive. More precisely, the probability density function for value $y$ is given by
$$f(y) = \frac{1}{y\,\Gamma(1/\sigma^2)} \left( \frac{y}{\sigma^2 \mu} \right)^{1/\sigma^2} \exp\left( -\frac{y}{\sigma^2 \mu} \right), \quad \mu, \sigma^2 > 0, \quad y > 0. \qquad (11.4)$$

• In (11.4), $\Gamma(\cdot)$ is the so-called Gamma function. In general, this function of a positive argument must be evaluated numerically on a computer. If the argument is a positive integer $k$, however, then it turns out that
$$\Gamma(k) = (k - 1)! = (k - 1)(k - 2) \cdots (2)(1).$$

• It may be shown that the mean of $Y$ is $\mu$; i.e., $E(Y) = \mu$. Note that $\mu$ must be positive, which makes sense.

• It may also be shown that the variance of $Y$ is $\mathrm{var}(Y) = \sigma^2 \mu^2$. That is, the variance of $Y$ is nonconstant; it depends on the value of $\mu$. Thus, if $Y_1$ and $Y_2$ are both gamma random variables, then the only way that they can have the same variance is if they have the same mean $\mu$ and the same value of the parameter $\sigma^2$. Thus, for regression, if $Y_1$ and $Y_2$ correspond to responses taken at different covariate settings, it is inappropriate to take them to have the
same variance. Thus, as above, the assumption of constant variance is not appropriate for a response that is well represented by the gamma probability model.

• In fact, note that the symbol $\sigma^2$ is being used here in a different way from how we have used it in the past, to represent a variance. Here, it turns out that $\sigma$ (not squared) has the interpretation as the coefficient of variation (CV), defined for any random variable $Y$ as
$$\mathrm{CV} = \frac{\{\mathrm{var}(Y)\}^{1/2}}{E(Y)};$$
that is, CV is the ratio of the standard deviation of the response to its mean, or "noise to signal." This ratio may be expressed as a proportion or a percentage; in either case, CV characterizes the "quality" of the data by quantifying how large the "noise" is relative to the size of the thing being measured.

• Small CV ("high quality") is usually considered to be $\mathrm{CV} \le 0.30$. Large CV ("low quality") is larger.

• Note that for the gamma distribution,
$$\mathrm{CV} = \frac{(\sigma^2 \mu^2)^{1/2}}{\mu} = \sigma,$$
so that, regardless of the value of $\mu$, the ratio of noise to signal is the same. Thus, rather than having constant variance, the gamma distribution imposes constant coefficient of variation. This is often a realistic model for biological, income, and other data taking on positive values.

Figure 2 shows gamma probability density functions for $\mu = 1$ and progressively smaller choices of $\sigma^2$, corresponding to progressively smaller CV.

• As $\sigma^2$ becomes smaller, the shape of the curve begins to look more symmetric. Thus, if CV is small ("high quality" data), the gamma probability distribution looks very much like a normal distribution.

• On the other hand, when $\sigma^2$ is relatively large, so that CV is large ("low quality" data), the shape is skewed. For example, with $\sigma^2 = 0.5$, corresponding to $\mathrm{CV} = 0.707$ (noise that is about 70% of the magnitude of the signal; upper left panel of Figure 2), the shape of the gamma density does not resemble that of the normal at all.

[Figure 2: Gamma probability density functions. Recoverable panel titles: $\mu = 1$, $\sigma^2 = 0.5$; $\mu = 1$, $\sigma^2 = 0.2$.]

EXAMPLE: CLOTTING
TIME DATA

In the development of clotting agents, it is common to perform in vitro studies of time to clotting. The following data are reported in McCullagh and Nelder (1989, Section 8.4.2) and are taken from such a study. Here, samples of normal human plasma were diluted to one of 9 different percentage concentrations with prothrombin-free plasma; the higher the dilution, the more the interference with the blood's ability to clot, because the blood's natural clotting capability has been weakened. For each sample, clotting was induced by introducing thromboplastin, a clotting agent, and the time until clotting occurred was recorded (in seconds). Five samples were measured at each of the 9 percentage concentrations, and the clotting times were averaged; thus, the response is mean clotting time over the 5 samples. The response is plotted against percentage concentration (on the log scale) in the upper left panel of Figure 3. We will discuss the other panels of the figure shortly.

It is well recognized that this type of response, which is by its nature always positive, does not exhibit the same variability at all levels. Rather, large responses tend to be more variable than small ones, and a constant coefficient of variation model is often a suitable model for this nonconstant variation.

[Figure 3: Clotting times (seconds) for normal plasma diluted to 9 different concentrations with prothrombin-free plasma. In the lower right panel, the solid line is the loglinear fit; the dashed line is the reciprocal (inverse) fit. Panels: Original scale (upper left), Reciprocal scale (upper right), Log scale (lower left), Original scale with fits (lower right). Horizontal axis in all panels: log(percentage conc. of plasma).]

From the plot, it is clear that a straight-line model for mean response as a function of
log(percentage concentration) would be inappropriate. A quadratic model seems better, but, because such models eventually curve "back up," this might not be a good model, either.

In the upper right and lower left panels, the reciprocals ($1/y$) and logarithms ($\log y$) of the response, respectively, are plotted against log(percentage concentration). These appear to be roughly like straight lines, the former more so than the latter. We will return to the implications of these two plots for choosing a model for mean response shortly. Note, of course, that a sensible model for mean response would be one that honors the positivity restriction for the response.

Also noticeable from the plot is that the data are of "high quality": the pattern of change in the response with log(percentage concentration) is very clear and smooth, with very little "noise." This would suggest that, if the data really are well represented by the gamma probability distribution, then the coefficient of variation is "small." From the plot, it is very difficult to see any evidence that the variance really is nonconstant as the response changes; this is because the variation is so small that it is hard to pick up by eye.

We will return to these data shortly.

SUMMARY

The Poisson, Bernoulli, and gamma distributions are three different probability distributions that are well suited to modeling data in the form of counts, binary response, and positive continuous response where constant coefficient of variation is more likely than constant variance, respectively. As mentioned above, still other probability distributions for other situations are available; discussion of these is beyond our scope here, but the implications are similar to the cases we have covered.

We now turn to regression modeling in the context of problems where these probability distributions are appropriate.

11.3 Generalized linear models

THE CLASSICAL LINEAR REGRESSION MODEL

The classical linear regression model for scalar response
$Y_j$ and $k$ covariates $x_{j1}, \ldots, x_{jk}$ is usually written as
$$Y_j = \beta_0 + \beta_1 x_{j1} + \cdots + \beta_k x_{jk} + \epsilon_j$$
or, defining $x_j = (1, x_{j1}, \ldots, x_{jk})'$, where $x_j$ is $(p \times 1)$, $p = k + 1$,
$$Y_j = x_j'\beta + \epsilon_j, \quad \beta = (\beta_0, \ldots, \beta_k)'. \qquad (11.5)$$
The $Y_j$ are assumed to be independent across $j$. When the response is continuous, it is often assumed that the $\epsilon_j$ are independent $N(0, \sigma^2)$, so that
$$Y_j \sim N(x_j'\beta, \sigma^2).$$
That is, the classical, normal-based regression model may be summarized as:

(i) Mean: $E(Y_j) = x_j'\beta$.

(ii) Probability distribution: The $Y_j$ follow a normal distribution for all $j$ and are independent.

(iii) Variance: $\mathrm{var}(Y_j) = \sigma^2$ (constant regardless of the setting of $x_j$).

As we have discussed through our examples, this approach has several deficiencies as a model for count, binary, or some positive continuous data:

• The normal distribution may not be a good probability model.

• Variance may not be constant across the range of the response.

• Because the response (and its mean) are restricted to be positive, a model that does not build this in may be inappropriate. In (11.5), there is nothing that says that estimates of the mean response must be positive everywhere; it could very well be that the estimated value of $\beta$ produces negative mean estimates for some covariate settings, even if this is not possible for the problem at hand.

Models appropriate for the situations we have been discussing would have to address these issues.

GENERALIZATION

For responses that are not well represented by a normal distribution, it is not customary to write models in the form of (11.5) above, with an additive deviation. This is because, for distributions like the Poisson, Bernoulli, or gamma, there is no analogue to the fact that if $\epsilon$ is normally distributed with mean 0 and variance $\sigma^2$, then $Y = \mu + \epsilon$ is also normal with mean $\mu$ and variance $\sigma^2$.

It is thus standard to express regression models as we did in (i), (ii), and (iii) above: in terms of (i) an assumed model for the mean, (ii) an assumption about probability distribution, and (iii) an assumption about variance. As we have noted, for the Poisson, Bernoulli, and gamma distributions, the form of the
distribution dictates the assumption about variance.

We now show how this modeling is done for the three situations on which we have focused. We will then highlight the common features. Because these models are more complex than usual linear regression models, special fitting techniques are required; these will be discussed in Section 11.4.

COUNT DATA

For data in the form of counts, we have noted that a sensible probability model is the Poisson distribution. This model dictates that variance is equal to the mean; moreover, any sensible representation of the mean ought to be such that the mean is forced to be positive.

(i) Mean. For regression modeling, we wish to represent the mean for $Y_j$ as a function of the covariates $x_j$. However, this representation should ensure that the mean can only be positive. A model that would accomplish this is
$$E(Y_j) = \exp(\beta_0 + \beta_1 x_{j1} + \cdots + \beta_k x_{jk}) = \exp(x_j'\beta). \qquad (11.6)$$
In (11.6), the positivity requirement is enforced by writing the mean as the exponential of the linear function of $\beta$, $x_j'\beta$. Note that the model implies
$$\log E(Y_j) = \beta_0 + \beta_1 x_{j1} + \cdots + \beta_k x_{jk} = x_j'\beta;$$
i.e., the logarithm of the mean response is being modeled as a linear function of covariates and regression parameters. As a result, a model like (11.6) is often called a loglinear model.

Loglinear modeling is a standard technique for data in the form of counts, especially when the counts are small. When the counts are small, it is quite possible that using a linear model $E(Y_j) = x_j'\beta$ instead would lead to an estimated value for $\beta$ that allows estimates of the mean to be negative for some covariate settings. This is less of a worry when the counts are very large. Consequently, loglinear modeling is most often employed for small-count data.

It is important to note that a loglinear model for the mean response is not the only possibility for count data. However, it is the most common.

(ii) Probability distribution. The $Y_j$ are assumed to arise at each setting $x_j$ from a Poisson distribution with mean as in (11.6) and are assumed to be independent.

(iii) Variance. Under the
Poisson assumption and the mean model (11.6), we have that the variance of $Y_j$ is given by
$$\mathrm{var}(Y_j) = E(Y_j) = \exp(x_j'\beta). \qquad (11.7)$$

BINARY DATA

For binary data, the relevant probability model is the Bernoulli distribution. Here, the mean is also equal to the probability of seeing the event of interest; thus, the mean should be restricted to lie between 0 and 1. In addition, the model dictates that the variance of a response is a particular function of the mean.

(i) Mean. For regression modeling, we wish to represent the mean for $Y_j$ as a function of the covariates $x_j$, with the important restriction that this function always be between 0 and 1. A model that accomplishes this is
$$E(Y_j) = \frac{\exp(x_j'\beta)}{1 + \exp(x_j'\beta)}. \qquad (11.8)$$
Note that, regardless of the value of the linear combination $x_j'\beta$, this function must always be less than 1. Similarly, the function must always be greater than 0. (Convince yourself!)

It is an algebraic exercise to show that (try it!)
$$\log\left\{ \frac{E(Y_j)}{1 - E(Y_j)} \right\} = x_j'\beta. \qquad (11.9)$$
The function of $E(Y_j)$ on the left-hand side of (11.9) is called the logit function. Recall that here $E(Y_j)$ is equal to the probability of seeing the event of interest. Thus, the function
$$\frac{E(Y_j)}{1 - E(Y_j)}$$
is the ratio of the probability of seeing the event of interest to the probability of not seeing it. For this reason, this ratio is often called the odds. Thus, the model (11.8) may be thought of as modeling the log odds as a linear combination of the covariates and regression parameters.

Model (11.8) is not the only model appropriate for representing the mean of a Bernoulli random variable; any function taking values only between 0 and 1 would do. Other such models are the probit and complementary log-log functions (see McCullagh and Nelder 1989, page 31). However, (11.8) is by far the most popular, and the model is usually referred to as the logistic regression model (for binary data).

(ii) Probability distribution. The $Y_j$ are assumed to arise at each setting $x_j$ from a Bernoulli distribution with mean as in (11.8) and are assumed to be independent.

(iii) Variance. For binary data, if the mean is
represented by (11.8), then we must have that the variance of $Y_j$ is given by
$$\mathrm{var}(Y_j) = E(Y_j)\{1 - E(Y_j)\} = \frac{\exp(x_j'\beta)}{1 + \exp(x_j'\beta)} \left\{ 1 - \frac{\exp(x_j'\beta)}{1 + \exp(x_j'\beta)} \right\}. \qquad (11.10)$$

CONTINUOUS POSITIVE DATA WITH CONSTANT COEFFICIENT OF VARIATION

For these data, there are a number of relevant probability models; we have discussed the gamma distribution. Here, the mean must be positive, and the variance must have the constant CV form.

(i) Mean. For regression modeling, we wish to represent the mean for $Y_j$ as a function of the covariates $x_j$. If the size of the responses is not too large, then using a linear model $E(Y_j) = x_j'\beta$ could be dangerous; thus, it is preferred to use a model that enforces positivity. One common model is the loglinear model (11.6), which is also commonly used for count data. Both types of data share the requirement of positivity, so this is not surprising.

When the size of the response is larger, it is often the case that the positivity requirement is not a big concern: even if a linear model is used to represent the data, because the responses are all so big, estimated means will still all be positive for covariate settings like those of the original data. This opens up the possibility of other models for the mean.

With a single covariate ($k = 1$), linear models are seldom used; here, the linear model would be a straight line. This is because, for phenomena where constant coefficient of variation occurs, the relationship between response and covariate seldom looks like a straight line; rather, it tends to look more like that in the upper left panel of Figure 3.

Note that in the lower left panel of Figure 3, once the response is placed on the log scale, the relationship looks much more like a straight line. This suggests that a model like
$$\log E(Y_j) = \beta_0 + \beta_1 x_j,$$
where $x_j$ = log percent concentration, might be reasonable; that is, the log of the mean response is a straight line in $x_j$. This is exactly the loglinear model (11.6) in the special case $k = 1$, of course.

However, note that in the upper right panel, once
the response is inverted by taking the reciprocal (so plotting $1/Y_j$ on the vertical axis), the relationship looks even more like a straight line. This observation indicates that a model like

$$\frac{1}{E(Y_j)} = \beta_0 + \beta_1 x_j$$

might be appropriate. More generally, for $k$ covariates, this suggests the model

$$E(Y_j) = \frac{1}{x_j'\beta}. \qquad (11.11)$$

This model does not preserve the positivity requirement; however, for situations where this is not really a concern, the inverse or reciprocal model (11.11) often gives a better representation than does a plain linear model for $E(Y_j)$, as was the case for the clotting time data.

(ii) Probability distribution. The $Y_j$ are assumed to arise at each setting $x_j$ from a gamma distribution with mean as in (11.6), (11.11), or some other model deemed appropriate. The $Y_j$ are also assumed to be independent.

(iii) Variance. Under the gamma assumption, the variance of $Y_j$ is proportional to the square of the mean response; i.e., constant coefficient of variation. Thus, if the mean is represented by (11.6), then we must have that the variance of $Y_j$ is given by

$$\mathrm{var}(Y_j) = \sigma^2 \{E(Y_j)\}^2 = \sigma^2 \{\exp(x_j'\beta)\}^2. \qquad (11.12)$$

If the mean is represented by (11.11), then we must have that

$$\mathrm{var}(Y_j) = \sigma^2 \{E(Y_j)\}^2 = \sigma^2 \left(\frac{1}{x_j'\beta}\right)^2. \qquad (11.13)$$

IN GENERAL: All of the regression models we have discussed share the features that

• Appropriate models for mean response are of the form

$$E(Y_j) = f(x_j'\beta), \qquad (11.14)$$

where $f(x_j'\beta)$ is a suitable function of a linear combination of the covariates $x_j$ and regression parameter $\beta$.

• The variance of $Y_j$ may be represented as a function of the form

$$\mathrm{var}(Y_j) = \phi V\{E(Y_j)\} = \phi V\{f(x_j'\beta)\}, \qquad (11.15)$$

where $V$ is a function of the mean response and $\phi$ is a constant usually assumed to be the same for all $j$. For the Poisson and Bernoulli cases, $\phi = 1$; for the gamma case, $\phi = \sigma^2$.

SCALED EXPONENTIAL FAMILY: It turns out that these regression models share even more. It was long ago recognized that certain probability distributions all fall into a general class. For distributions in this class, if the mean is equal to $\mu$, then the variance must be a specific function $\phi V(\mu)$ of $\mu$. Distributions in this class include

• The normal
distribution with mean $\mu$, variance $\sigma^2$ (not related to $\mu$ in any way, so a function of $\mu$ that is the same for all $\mu$).
• The Poisson distribution with mean $\mu$, variance $\mu$.
• The gamma distribution with mean $\mu$, variance $\sigma^2\mu^2$.
• The Bernoulli distribution with mean $\mu$, variance $\mu(1 - \mu)$.

The class includes other distributions we have not discussed as well. This class of distributions is known as the scaled exponential family. As we will discuss in section 11.4, because these distributions share so much, fitting regression models under them may be accomplished by the same method.

GENERALIZED LINEAR MODELS: We are now in a position to state all of this more formally. A generalized linear model is a regression model for response $Y_j$ with the following features:

• The mean of $Y_j$ is assumed to be of the form (11.14),

$$E(Y_j) = f(x_j'\beta).$$

It is customary to express this a bit differently, however. The function $f$ is almost always chosen to be monotone; that is, it is a strictly increasing or decreasing function of $x_j'\beta$. This means that there is a unique function $g$, say, called the inverse function of $f$, such that we may re-express the model (11.14) in the form

$$g\{E(Y_j)\} = x_j'\beta.$$

For example, for binary data, we considered the logistic function (11.8); i.e.,

$$E(Y_j) = \frac{\exp(x_j'\beta)}{1 + \exp(x_j'\beta)}.$$

This may be rewritten in the form (11.9),

$$\log\left\{\frac{E(Y_j)}{1 - E(Y_j)}\right\} = g\{E(Y_j)\} = x_j'\beta.$$

The function $g$ is called the link function, because it "links" the mean and the covariates. The linear combination of covariates and regression parameters $x_j'\beta$ is called the linear predictor. Certain choices of $f$, and hence of link function $g$, are popular for different kinds of data, as we have noted.

• The probability distribution governing $Y_j$ is assumed to be one of those from the scaled exponential family class.

• The variance of $Y_j$ is thus assumed to be of the form dictated by the distribution:

$$\mathrm{var}(Y_j) = \phi V\{E(Y_j)\},$$

where the function $V$ depends on the distribution and $\phi$ might be equal to a known constant. The function $V$ is referred to as the variance function for obvious reasons. The parameter $\phi$ is often called the
dispersion parameter because it has to do with variance. It may be known, as for the Poisson or Bernoulli distributions, or unknown and estimated, which is the case for the gamma.

The models we have discussed for count, binary, and positive continuous data are thus all generalized linear models. In fact, the classical linear regression model assuming normality with constant variance is also a generalized linear model.

11.4 Maximum likelihood and iteratively reweighted least squares

The class of generalized linear models may be thought of as extending the usual classical linear model to handle special features of different kinds of data. The extension introduces some complications, however. In particular,

• The model for mean response need no longer be a linear model.
• The variance is allowed to depend on the mean; thus, the variance depends on the regression parameter $\beta$.

The result of these more complex features is that it is no longer quite so straightforward to estimate $\beta$ (and $\phi$, if required). To appreciate this, we first review the method of least squares for the normal, linear, constant variance model.

LINEAR MODEL AND MAXIMUM LIKELIHOOD: For the linear model with constant variance $\sigma^2$ and normality, the usual method of least squares involves minimizing in $\beta$ the distance criterion

$$\sum_{j=1}^n (y_j - x_j'\beta)^2, \qquad (11.16)$$

where $y_1, \ldots, y_n$ are observed data. This approach has another motivation: the estimator of $\beta$ obtained in this way is the maximum likelihood estimator. In particular, write the observed data as $y = (y_1, \ldots, y_n)'$. Because the $Y_j$ are assumed independent, the joint density of all the data (that is, the joint density of $Y$) is just the product of the $n$ individual normal densities:

$$f(y) = \prod_{j=1}^n (2\pi)^{-1/2}\sigma^{-1}\exp\{-(y_j - x_j'\beta)^2/(2\sigma^2)\}.$$

It is easy to see that the only place that $\beta$ appears is in the exponent; thus, if we wish to maximize the likelihood $f(y)$, we must maximize the exponent. Note that the smaller $(y_j - x_j'\beta)^2$ gets, the larger the exponent gets (because of the negative sign). Thus, to maximize the likelihood, we wish to
minimize (11.16), which corresponds exactly to the method of least squares.

• Thus, obtaining the least squares estimator in a linear regression model under the normality and constant variance assumptions is the same as finding the maximum likelihood estimator.
• In this case, minimizing (11.16) may be done analytically; that is, we can write down an explicit expression for the estimator (as a function of the random vector $Y$):

$$\hat\beta = (X'X)^{-1}X'Y,$$

where $X$ is the usual design matrix.
• This follows from calculus: the minimizing value of (11.16) is found by setting the first derivative of the equation to 0 and solving for $\beta$. That is, the least squares (ML) estimator solves the set of $p$ equations

$$\sum_{j=1}^n (Y_j - x_j'\beta)\,x_j = 0. \qquad (11.17)$$

• Note that the estimator and the equation it solves are linear functions of the data $Y_j$.

GENERALIZED LINEAR MODELS AND MAXIMUM LIKELIHOOD: A natural approach to estimating $\beta$ in all generalized linear models is thus to appeal to the principle of maximum likelihood. It is beyond the scope of our discussion to give a detailed treatment of this. We simply remark that it turns out that, fortuitously, the form of the joint density of random variables $Y_1, \ldots, Y_n$ that arise from any of the distributions in the scaled exponential family class has the same general form. Thus, it turns out that the ML estimator for $\beta$ in any generalized linear model solves a set of $p$ equations of the same general form:

$$\sum_{j=1}^n \frac{1}{V\{f(x_j'\beta)\}}\{Y_j - f(x_j'\beta)\}\,f'(x_j'\beta)\,x_j = 0, \qquad (11.18)$$

where $f'(u) = \frac{d}{du}f(u)$, the derivative of $f$ with respect to its argument $u$.

The equation (11.18) and the equation for the linear, normal, constant variance model (11.17) share the feature that they are both linear functions of the data $Y_j$ and are equations we would like to solve in order to obtain the maximum likelihood estimator for $\beta$. Thus, they are very similar in spirit. However, they differ in several ways:

• Each deviation $\{Y_j - f(x_j'\beta)\}$ in (11.18) is weighted in accordance with its variance (the scale parameter $\phi$ is a constant). Of course, so is each deviation in
(11.17); however, in that case, the variance is constant for all $j$. Recall that weighting in accordance with variance is a sensible principle, so it is satisfying to see that, despite the difference in probability distributions, this principle is still followed. Here, the variance function depends on $\beta$, so now the weighting depends on $\beta$. Thus, $\beta$ appears in this equation in a very complicated way.
• Moreover, $\beta$ also appears in the function $f$, which can be quite complicated; the function $f$ is certainly not a linear function of $\beta$.

The result of these differences is that, while it is possible to solve (11.17) explicitly, it is not possible to do the same for (11.18). Rather, the solution to (11.18) must be found using a numerical algorithm. The numerical algorithm is straightforward and works well in practice, so this is not an enormous drawback.

ITERATIVELY REWEIGHTED LEAST SQUARES: It turns out that there is a standard algorithm that is applicable for solving equations of the form (11.18); discussion of the details is beyond our scope. The basic idea is (operating on the observed data):

• Given a starting value, or guess, for $\beta$, $\beta^{(0)}$ say, evaluate the weights at $\beta^{(0)}$: $1/V\{f(x_j'\beta^{(0)})\}$.
• Pretending the weights are fixed constants not depending on $\beta$, solve equation (11.18). This still requires a numerical technique, but may be accomplished by something that is approximately like solving (11.17). This gives a new guess for $\beta$, $\beta^{(1)}$ say.
• Evaluate the weights at $\beta^{(1)}$ and repeat. Continue updating until two successive $\beta$ values are the same (in practice, until they agree to within a small numerical tolerance).

The repeated updating of the weights, along with the approximation to solve an equation like (11.17), gives this procedure its name: iteratively reweighted least squares, often abbreviated as IRWLS or IWLS.

Luckily, there are standard ways to find the starting value based on the data and knowledge of the assumed probability distribution. Thus, the user usually need not be concerned with this; software typically generates this value automatically.

SAMPLING DISTRIBUTION: It should come as no surprise
that the sampling distribution of the estimator $\hat\beta$ solving (11.18) cannot be derived in closed form. Rather, it is necessary to resort to large sample theory approximation. Here, "large sample" refers to the sample size $n$ (number of independent observations). This is sensible: each $Y_j$ is typically from a different unit.

We now state the large sample result. For $n$ large, the IRWLS/ML estimator satisfies

$$\hat\beta \;\dot{\sim}\; N\{\beta,\ \phi(\Delta'V^{-1}\Delta)^{-1}\}. \qquad (11.19)$$

Here,

• $\Delta$ is an $(n \times p)$ matrix whose $(j, s)$ element ($j = 1, \ldots, n$; $s = 1, \ldots, p$) is the derivative of $f(x_j'\beta)$ with respect to the $s$th element of $\beta$.
• $V$ is the $(n \times n)$ diagonal matrix with diagonal elements $V\{f(x_j'\beta)\}$.

A little thought about the form of $\Delta$ and $V$ reveals that both depend on $\beta$. However, $\beta$ is unknown and has been estimated. In addition, if $\phi$ is not dictated to be equal to a specific constant (e.g., $\phi = 1$ if $Y_j$ are Poisson or Bernoulli, but $\phi$ is unknown if $Y_j$ is gamma), then it too must be estimated. In this situation, the standard estimator for $\phi$ is

$$\hat\phi = (n - p)^{-1}\sum_{j=1}^n \frac{\{Y_j - f(x_j'\hat\beta)\}^2}{V\{f(x_j'\hat\beta)\}}.$$

In the context of fitting generalized linear models, this is often referred to as the Pearson chi-square (divided by its degrees of freedom). Other methods are also available; we use this method for illustration in the examples of section 11.6.

Thus, it is customary to approximate (11.19) by replacing $\beta$ and $\phi$ by estimates wherever they appear. Standard errors for the elements of $\hat\beta$ are then found as the square roots of the diagonal elements of the matrix

$$\hat V_\beta = \hat\phi(\hat\Delta'\hat V^{-1}\hat\Delta)^{-1},$$

where the "hats" mean that $\beta$ and $\phi$ are replaced by estimates. We use the same notation, $\hat V_\beta$, as in previous chapters to denote the estimated covariance matrix; the definition of $\hat V_\beta$ should be clear from the context.

HYPOTHESIS TESTS: It is common to use Wald testing procedures to test hypotheses about $\beta$. Specifically, for null hypotheses of the form $H_0: L\beta = h$, we may approximate the sampling distribution of the estimate $L\hat\beta$ by

$$L\hat\beta \;\dot{\sim}\; N(L\beta,\ L\hat V_\beta L').$$

Construction of test statistics and confidence intervals is then carried out in a fashion identical to that
discussed in previous chapters. For example, if $L$ is a row vector, then one may form the "z-statistic"

$$z = \frac{L\hat\beta - h}{SE(L\hat\beta)}.$$

More generally, the Wald $\chi^2$ test statistic would be

$$(L\hat\beta - h)'(L\hat V_\beta L')^{-1}(L\hat\beta - h)$$

(of course, equal to $z^2$ in the case $L$ has a single row).

REMARK: Note that all of this looks very similar to what is done in classical linear regression under the assumption of constant variance and normality. The obvious difference is that the results are now just large sample approximations rather than exact, but the form and spirit are the same.

11.5 Discussion

Generalized linear models may be regarded as an extension of classical linear regression when the usual assumptions of normality and constant variance do not apply. Because of the additional considerations imposed by the nature of the data, sensible models for mean response may no longer be linear functions of covariates and regression parameters directly. Rather, the mean response is modeled as a function (nonlinear) of a linear combination of covariates and regression parameters (the linear predictor). Although the models and fitting methods become more complicated as a result, the spirit is the same.

11.6 Implementation with SAS

We illustrate how to carry out fitting of generalized linear models for the three examples discussed in this section:

1. The horsekick data
2. The myocardial infarction data
3. The clotting times data

As our main objective is to gain some familiarity with these models in order to appreciate their extension to the case of longitudinal data from $m$ units, we do not perform detailed, comprehensive analyses involving many questions of scientific interest. Rather, we focus mainly on how to specify models using SAS PROC GENMOD and how to interpret the output. In the next chapter, we will use PROC GENMOD with the REPEATED statement to fit longitudinal data.

EXAMPLE 1 - HORSEKICK DATA: Recall that it was reasonable to model these data using the Poisson distribution
assumption. Define $Y_j$ to be the $j$th observation of number of horsekick deaths suffered, corresponding to a particular corps and year, denoted by dummy variables

    x_jk = 1 if observation j is from year k = 1875, ..., 1893
         = 0 otherwise
    z_jk = 1 if observation j is from corps k = 1, ..., 9
         = 0 otherwise

We thus consider the loglinear model

$$E(Y_j) = \exp(\beta_0 + \beta_1 x_{j1} + \cdots + \beta_{19} x_{j19} + \beta_{20} z_{j1} + \cdots + \beta_{28} z_{j9}) \qquad (11.20)$$

for the mean response, where $x_{j1}, \ldots, x_{j19}$ are the dummy variables for the years 1875, ..., 1893 in order. This model represents the mean number of horsekick deaths as an exponential function; for example, for $j$ corresponding to 1894 and corps 10,

$$E(Y_j) = \exp(\beta_0);$$

for $j$ corresponding to 1875 and corps 1,

$$E(Y_j) = \exp(\beta_0 + \beta_1 + \beta_{20}).$$

An obvious question of interest would be to determine whether some of the regression parameters are different from 0, indicating that the particular year or corps to which they correspond differs from the final year and corps (1894, corps 10). This may be addressed by inspecting the Wald test statistics corresponding to each element of $\beta$.

To address the issue of how specific years compared, averaged across corps, one would be interested in whether the appropriate differences in elements of $\beta$ were equal to zero. For example, if we were interested in whether 1875 and 1880 were different, we would be interested in the difference $\beta_1 - \beta_6$.

PROGRAM:

    /******************************************************************
      CHAPTER 11, EXAMPLE 1

      Fit a loglinear regression model to the horsekick data.
      (Poisson assumption)
    ******************************************************************/

    options ls=80 ps=59 nodate; run;

    /******************************************************************
      The data look like (first 6 records)

      1875 0 0 0 0 1 1 0 0 1 0
      1876 0 0 1 0 0 0 0 0 1 1
      1877 0 0 0 0 1 0 0 1 2 0
      1878 2 1 1 0 0 0 0 1 1 0
      1879 0 1 1 2 0 1 0 0 1 0
      1880 2 1 1 1 0 0 2 1 3 0

      column 1       year
      columns 2-11   number of fatal horsekicks suffered by corps 1-10
    ******************************************************************/

    data kicks; infile 'kicks.dat';
      input year c1-c10;
    run;

    /******************************************************************
      Reconfigure the data so that a single number of kicks for a
      particular year/corps combination appears on a separate line.
    ******************************************************************/

    data kicks2; set kicks;
      array c{10} c1-c10;
      do corps=1 to 10;
        kicks=c{corps};
        output;
      end;
      drop c1-c10;
    run;

    proc print data=kicks2; run;

    /******************************************************************
      Fit the loglinear regression model using PROC GENMOD.  Here,
      the dispersion parameter phi=1, so is not estimated.  We let SAS
      form the dummy variables through use of the CLASS statement.
      This results in the mean response being parameterized as in
      equation (11.20).

      The DIST=POISSON option in the model statement specifies
      that the Poisson probability distribution assumption, with its
      requirement that mean = variance, be used.  The LINK=LOG option
      asks for the loglinear model.  Other LINK= choices are available.

      We also use a CONTRAST statement to investigate whether there is
      evidence that 1875 differed from 1880 in terms of numbers of
      horsekick deaths.  The WALD option asks that the usual large
      sample chi-square test statistic be used as the basis for the test.
    ******************************************************************/

    proc genmod data=kicks2;
      class year corps;
      model kicks = year corps / dist = poisson link = log;
      contrast '1875-1880' year 1 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 / wald;
    run;

OUTPUT: Following the output, we comment on a few aspects of the output.

    [Pages 1-4 of the output: PROC PRINT listing of the 200 records of
     kicks2, with columns Obs, year, corps, kicks.]

                             The SAS System                            5

                          The GENMOD Procedure

                           Model Information

                  Data Set               WORK.KICKS2
                  Distribution               Poisson
                  Link Function                  Log
                  Dependent Variable           kicks

                  Number of Observations Read     200
                  Number of Observations Used     200

                        Class Level Information

    Class     Levels   Values
    year          20   1875 1876 1877 1878 1879 1880 1881 1882 1883 1884
                       1885 1886 1887 1888 1889 1890 1891 1892 1893 1894
    corps         10   1 2 3 4 5 6 7 8 9 10

                         Parameter Information

              Parameter     Effect        year     corps
              Prm1          Intercept
              Prm2          year          1875
              Prm3          year          1876
              Prm4          year          1877
              Prm5          year          1878
              Prm6          year          1879
              Prm7          year          1880
              Prm8          year          1881
              Prm9          year          1882
              Prm10         year          1883
              Prm11         year          1884
              Prm12         year          1885
              Prm13         year          1886
              Prm14         year          1887
              Prm15         year          1888
              Prm16         year          1889
              Prm17         year          1890
              Prm18         year          1891
              Prm19         year          1892
              Prm20         year          1893
              Prm21         year          1894
              Prm22         corps                  1
              Prm23         corps                  2
              Prm24         corps                  3
              Prm25         corps                  4
              Prm26         corps                  5
              Prm27         corps                  6
              Prm28         corps                  7
              Prm29         corps                  8
              Prm30         corps                  9

                             The SAS System                            6

                          The GENMOD Procedure

                         Parameter Information

              Parameter     Effect        year     corps
              Prm31         corps                  10

                 Criteria For Assessing Goodness Of Fit

        Criterion                 DF          Value       Value/DF
        Deviance                 171       171.6395         1.0037
        Scaled Deviance          171       171.6395         1.0037
        Pearson Chi-Square       171       160.6793         0.9396
        Scaled Pearson X2        171       160.6793         0.9396
        Log Likelihood                    -161.8886

    Algorithm converged.

                     Analysis Of Parameter Estimates

                                 Standard    Wald 95% Confidence     Chi-
    Parameter       DF  Estimate    Error          Limits          Square  Pr > ChiSq
    Intercept        1   -2.0314   0.7854   -3.5707   -0.4921        6.69      0.0097
    year      1875   1    0.4055   0.9129   -1.3837    2.1947        0.20      0.6569
    year      1876   1    0.4055   0.9129   -1.3837    2.1947        0.20      0.6569
    year      1877   1    0.6931   0.8660   -1.0042    2.3905        0.64      0.4235
    year      1878   1    1.0986   0.8165   -0.5017    2.6989        1.81      0.1785
    year      1879   1    1.0986   0.8165   -0.5017    2.6989        1.81      0.1785
    year      1880   1    1.7047   0.7687    0.1981    3.2114        4.92      0.0266
    year      1881   1    0.9163   0.8367   -0.7235    2.5561        1.20      0.2734
    year      1882   1    1.5041   0.7817   -0.0281    3.0363        3.70      0.0544
    year      1883   1    1.0986   0.8165   -0.5017    2.6989        1.81      0.1785
    year      1884   1    1.0986   0.8165   -0.5017    2.6989        1.81      0.1785
    year      1885   1    0.4055   0.9129   -1.3837    2.1947        0.20      0.6569
    year      1886   1    1.0986   0.8165   -0.5017    2.6989        1.81      0.1785
    year      1887   1    1.5041   0.7817   -0.0281    3.0363        3.70      0.0544
    year      1888   1    0.4055   0.9129   -1.3837    2.1947        0.20      0.6569
    year      1889   1    1.3863   0.7906   -0.1632    2.9358        3.07      0.0795
    year      1890   1    1.7918   0.7638    0.2948    3.2887        5.50      0.0190
    year      1891   1    1.5041   0.7817   -0.0281    3.0363        3.70      0.0544
    year      1892   1    1.2528   0.8018   -0.3187    2.8242        2.44      0.1182
    year      1893   1    0.6931   0.8660   -1.0042    2.3905        0.64      0.4235
    year      1894   0    0.0000   0.0000    0.0000    0.0000         .         .
    corps     1      1    0.4055   0.4564   -0.4891    1.3001        0.79      0.3744
    corps     2      1    0.4055   0.4564   -0.4891    1.3001        0.79      0.3744
    corps     3      1    0.0000   0.5000   -0.9800    0.9800        0.00      1.0000
    corps     4      1    0.3185   0.4647   -0.5923    1.2292        0.47      0.4931
    corps     5      1    0.4055   0.4564   -0.4891    1.3001        0.79      0.3744
    corps     6      1   -0.1335   0.5175   -1.1479    0.8808        0.07      0.7964
    corps     7      1    0.4855   0.4494   -0.3952    1.3662        1.17      0.2799
    corps     8      1    0.6286   0.4378   -0.2295    1.4867        2.06      0.1510

                             The SAS System                            7

                          The GENMOD Procedure

                     Analysis Of Parameter Estimates

                                 Standard    Wald 95% Confidence     Chi-
    Parameter       DF  Estimate    Error          Limits          Square  Pr > ChiSq
    corps     9      1    1.0986   0.4082    0.2985    1.8988        7.24      0.0071
    corps     10     0    0.0000   0.0000    0.0000    0.0000         .         .
    Scale            0    1.0000   0.0000    1.0000    1.0000
    NOTE: The scale parameter was held fixed.

                            Contrast Results

                                     Chi-
        Contrast          DF       Square    Pr > ChiSq    Type
        1875-1880          1         3.98        0.0461    Wald

INTERPRETATION:

• Pages 1-4 of the output show the reconfigured data set.

• The results of running PROC GENMOD appear on pages 5-7 of the output. On page 6, the results of the fit by IRWLS/ML are displayed. The table Analysis Of Parameter Estimates contains the estimates of the parameters $\beta_0, \ldots, \beta_{28}$, along with their estimated standard errors (square roots of the diagonal elements of $\hat V_\beta$). The column Chi-Square gives the value of the Wald test statistic for testing whether the parameter in that row is equal to zero. The row SCALE corresponds to $\phi$; here, for the Poisson distribution, $\phi = 1$, so nothing is estimated. This is noted at the bottom of page 6 (The scale parameter was held fixed).

• Page 7 shows the result of the contrast statement to address the null hypothesis that there was no difference in mean horsekick deaths in 1875 and 1880 (see the program). The Wald test statistic is 3.98 with an associated p-value of 0.046, suggesting that there is some evidence to support a difference. Note that if $\beta_1$ and $\beta_6$ are different, then the mean responses for 1875 and 1880 must be different for any corps. However, note that the difference $\beta_1 - \beta_6$ does not correspond to the actual difference in mean response. Inspection of the estimates of $\beta_1$ and $\beta_6$ on page 6 shows $\hat\beta_1 = 0.4055$ and $\hat\beta_6 = 1.7047$. This suggests that the mean response for 1880, which depends on $\exp(\beta_6)$, is larger than that for 1875, which depends on $\exp(\beta_1)$.

EXAMPLE 2 - MYOCARDIAL INFARCTION DATA: Here, the response (whether or not a woman has suffered a myocardial infarction) is binary, so we wish to fit a generalized linear model assuming the Bernoulli distribution. The mean function must honor the restriction of being between 0 and 1; here, we fit the logistic regression model, using the logit link. Recall that we defined

    x_j1 = 1 if oral contraceptive use
         = 0 otherwise
    x_j2 = age in years
    x_j3 = 1 if smoke more than one pack/day
         = 0 otherwise
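With the covariates coded this way, a logistic mean of the form (11.8) can be evaluated directly for any covariate setting. The following Python sketch is only an illustration of that calculation: the function names `logistic_mean` and `odds` and the coefficient values are hypothetical, chosen for the example, and are not estimates from the myocardial infarction data.

```python
import math

def logistic_mean(beta, oral, age, smoke):
    """Evaluate exp(eta) / (1 + exp(eta)), where eta is the linear predictor.

    beta = (b0, b1, b2, b3) holds illustrative coefficients, not fitted values.
    """
    b0, b1, b2, b3 = beta
    eta = b0 + b1 * oral + b2 * age + b3 * smoke  # linear predictor
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(p):
    """Odds of the event: P(event) / P(no event)."""
    return p / (1.0 - p)

# Hypothetical coefficients, for illustration only.
beta = (-4.0, 0.5, 0.05, 1.0)

# Same woman (contraceptive user, age 40) without and with smoking.
p_nonsmoker = logistic_mean(beta, oral=1, age=40, smoke=0)
p_smoker = logistic_mean(beta, oral=1, age=40, smoke=1)

# Taking the log odds recovers the linear predictor, as in (11.9).
log_odds_nonsmoker = math.log(odds(p_nonsmoker))

print(p_nonsmoker, p_smoker, log_odds_nonsmoker)
```

Whatever values are supplied, the result lies strictly between 0 and 1, and taking the log odds returns the linear predictor, which is the sense in which the logit "links" the mean to the covariates.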
Thus, we model the mean response, equivalently, the probability of suffering a heart attack, as

$$E(Y_j) = \frac{\exp(\beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2} + \beta_3 x_{j3})}{1 + \exp(\beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2} + \beta_3 x_{j3})}. \qquad (11.21)$$

Interest focuses on whether or not $\beta_1$, $\beta_2$, and $\beta_3$, corresponding to the association of oral contraceptive use, age, and smoking, respectively, with probability of myocardial infarction, are different from zero. If $\beta_1$ is different from zero, for example, the interpretation is that oral contraceptive use does change the probability of suffering a heart attack. We say more about this shortly.

PROGRAM:

    /******************************************************************
      CHAPTER 11, EXAMPLE 2

      Fit a logistic regression model to the myocardial infarction data.
    ******************************************************************/

    options ls=80 ps=59 nodate; run;

    /******************************************************************
      The columns of the data set are

      column 1   subject id
      column 2   oral contraceptive indicator (0=no, 1=yes)
      column 3   age (years)
      column 4   smoking indicator (0=no, 1=yes)
      column 5   binary response -- whether MI has been suffered
                 (0=no, 1=yes)
    ******************************************************************/

    data mi; infile 'infarc.dat';
      input id oral age smoke mi;
    run;

    /******************************************************************
      Fit the logistic regression model using PROC GENMOD.
      We do not use a CLASS statement here, as the covariates are
      either continuous (AGE) or already in "dummy" form (ORAL, SMOKE).
      The model statement with the LINK=LOGIT option results in the
      logistic regression model in equation (11.21).  The DIST=BINOMIAL
      option specifies the Bernoulli distribution, which is the simplest
      case of a binomial distribution.

      In versions 7 and higher of SAS, PROC GENMOD will model by default
      the probability that the response y=0 rather than the conventional
      y=1.  To make PROC GENMOD model probability y=1, as is standard,
      one must include the DESCENDING option in the PROC GENMOD
      statement.  In earlier versions of SAS, the probability y=1 is
      modeled by default, as would be expected.

      If the user is unsure which probability is being modeled, one can
      check the log file.  In later versions of SAS, an explicit
      statement about what is being modeled will appear.  PROC GENMOD
      output should also contain a statement about what is being modeled.
    ******************************************************************/

    proc genmod data=mi descending;
      model mi = oral age smoke / dist = binomial link = logit;
    run;

OUTPUT: Following the output, we comment on a few aspects of the output.

                             The SAS System                            1

                          The GENMOD Procedure

                           Model Information

                  Distribution               Binomial
                  Link Function                 Logit
                  Dependent Variable               mi

                  Number of Observations Read     200
                  Number of Observations Used     200
                  Number of Events                 43
                  Number of Trials                200

                            Response Profile

                     Ordered               Total
                       Value     mi    Frequency
                           1     1            43
                           2     0           157

    PROC GENMOD is modeling the probability that mi='1'.

                         Parameter Information

                        Parameter     Effect
                        Prm1          Intercept
                        Prm2          oral
                        Prm3          age
                        Prm4          smoke

                 Criteria For Assessing Goodness Of Fit

        Criterion                 DF          Value       Value/DF
        Deviance                 196       150.3748         0.7672
        Scaled Deviance          196       150.3748         0.7672
        Pearson Chi-Square       196       177.5430         0.9058
        Scaled Pearson X2        196       177.5430         0.9058
        Log Likelihood                     -75.1874

    Algorithm converged.

                             The SAS System                            2

                          The GENMOD Procedure

                     Analysis Of Parameter Estimates

                             Standard    Wald 95% Confidence      Chi-
    Parameter   DF  Estimate    Error          Limits           Square  Pr > ChiSq
    Intercept    1   -9.1140   1.7571  -12.5579   -5.6702        26.90      <.0001
    oral         1    1.9799   0.4697    1.0593    2.9005        17.77      <.0001
    age          1    0.1626   0.0445    0.0753    0.2498        13.32      0.0003
    smoke        1    1.8122   0.4294    0.9706    2.6538        17.81      <.0001
    Scale        0    1.0000   0.0000    1.0000    1.0000

    NOTE: The scale parameter was held fixed.

                       Contrast Estimate Results

                                        Standard                                Chi-
    Label                     Estimate     Error  Alpha  Confidence Limits    Square
    smk log odds ratio          1.8122    0.4294   0.05   0.9706    2.6538     17.81
    Exp(smk log odds ratio)     6.1241    2.6297   0.05   2.6396   14.2084

                       Contrast Estimate Results

    Label                     Pr > ChiSq
    smk log odds ratio            <.0001
    Exp(smk log odds ratio)

INTERPRETATION:

• From the output, the Wald test statistics in the Chi-Square column of the table Analysis Of Parameter Estimates of whether $\beta_1 = 0$, $\beta_2 = 0$, and $\beta_3 = 0$ are all large, with very small p-values. This suggests that there is strong evidence that oral contraceptive use, age, and smoking affect the probability of having a heart attack.

• In each case, note that the estimate is positive. The logistic function $\exp(u)/\{1 + \exp(u)\}$ is an increasing
function of $u$. Note that, because the estimated values of $\beta_1$, $\beta_2$, and $\beta_3$ are positive, if $x_{j1}$ changes from 0 (no contraceptives) to 1 (contraceptives), the linear predictor

$$\beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2} + \beta_3 x_{j3}$$

evaluated at the estimates increases, and the same is true if age $x_{j2}$ increases or if $x_{j3}$ changes from 0 (no smoking) to 1 (smoking). Thus, the fit indicates that the probability of having a heart attack increases if one uses oral contraceptives or smokes, and increases as women age.

• In fact, we can say more. According to this model, the odds of having a heart attack, given a woman has particular settings of contraceptive use, age, and smoking $(x_{j1}, x_{j2}, x_{j3})$, which is the ratio of the probability of having a heart attack to not having one, is, from (11.9),

$$\exp(\beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2} + \beta_3 x_{j3}).$$

A common quantity of interest is the so-called odds ratio. For example, we may be interested in comparing the odds of having a heart attack if a randomly chosen woman smokes ($x_{j3} = 1$) to those if she does not ($x_{j3} = 0$). The ratio of the odds under smoking to those under not smoking, for any settings of age or contraceptive use, is thus

$$\frac{\exp(\beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2} + \beta_3)}{\exp(\beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2})} = \exp(\beta_3).$$

Thus, $\exp(\beta_3)$ is a multiplicative factor that measures by how much the odds of having a heart attack change if we move from not smoking to smoking. If $\beta_3 > 0$, this multiplicative factor is $> 1$, meaning that the odds go up; if $\beta_3$ is negative, the factor is $< 1$, and the odds go down. $\beta_3$ itself is referred to as the log odds ratio for obvious reasons.

Here, we estimate the log odds ratio for smoking as 1.81 and the odds ratio as $\exp(\hat\beta_3) = \exp(1.8122) = 6.12$; the odds increase about 6-fold if a woman smokes. Note that, ideally, we would like a standard error to attach to this estimated odds ratio.

One can actually get PROC GENMOD to print out a log odds ratio and odds ratio and associated standard errors in an estimate statement with the exp option by choosing $L$ appropriately. Here, to get the log odds ratio, which is just $\beta_3$, we take $L = (0, 0, 0, 1)$. The estimate
statement would be

    estimate "smk log odds ratio" int 0 oral 0 age 0 smoke 1 / exp;

(try adding this to the program and see what happens; see the program on the class web site for the results).

• An interesting aside: Logistic regression is a standard technique in public health studies. Chances are, when you read in the newspaper that a certain behavior increases the risk of developing a disease, the analysis that was performed to arrive at that conclusion was like this one.

EXAMPLE 3 - CLOTTING TIME DATA: These data are positive and continuous with possible constant coefficient of variation. Thus, we consider the gamma probability model. Letting $Y_j$ be the clotting time at log percentage concentration $x_j$, we consider two models for the mean response:

• Loglinear: $E(Y_j) = \exp(\beta_0 + \beta_1 x_j)$
• Reciprocal (inverse): $E(Y_j) = 1/(\beta_0 + \beta_1 x_j)$

Note that although in both models $\beta_1$ has to do with how the changing percentage concentration affects the mean response, this happens in different ways in each model, so the parameters have different interpretations; it is thus not interesting to compare their values for the different models.

Here, because of the gamma assumption, the dispersion parameter $\phi$ is not equal to a fixed, known constant. It is thus estimated from the data. Note that PROC GENMOD does not print out the estimate of $\phi$; rather, it prints out $1/\hat\phi$.

We also show how to obtain results of the fit in a table that may be output to a SAS data set using the ods statement, which is relevant in versions 7 and higher of SAS. Earlier versions use the make statement.

PROGRAM:

    /******************************************************************
      CHAPTER 11, EXAMPLE 3

      Fitting loglinear and reciprocal models to the clotting data
      (gamma assumption)
    ******************************************************************/

    options ls=80 ps=59 nodate; run;

    /******************************************************************
      The data look like

        5 118
       10  58
       15  42
       20  35
       30  27
       40  25
       60  21
       80  19
      100  18

      column 1   percentage concentration plasma
      column 2   clotting time (seconds)
    ******************************************************************/

    data clots; infile 'clot.dat';
      input u y;
      x=log(u);    /* covariate is log percentage concentration */
    run;

    /******************************************************************
      Fit the loglinear regression model using PROC GENMOD.
      The DIST=GAMMA option specifies the gamma distribution
      assumption.  We then fit two models: the loglinear model in the
      first call to PROC GENMOD, obtained with the LINK=LOG option,
      and the reciprocal (inverse) model, obtained with the
      LINK=POWER(-1) option (this option asks that the linear
      predictor be raised to the power in parentheses as the model
      for the mean response).

      Here, the dispersion parameter phi is unknown so must be
      estimated.  This may be done a number of ways; here, we use
      the PSCALE option in the MODEL statement to ask that phi be
      estimated by the Pearson chi-square divided by its degrees of
      freedom.  Actually, for the gamma distribution, what SAS calls
      the SCALE parameter is the reciprocal of this quantity, so we
      must remember to invert the result.

      Also use the OBSTATS option in the MODEL statement to output a
      table of statistics such as predicted values (estimates of the
      mean response) and residuals (response - estimated mean).
      We show how to output these to a data set for further work with
      them.  The ODS statement works with version 7 and higher of SAS.
      Note that the obstats option causes the output of GENMOD to
      contain these statistics; printing the output data set simply
      repeats these values.
    ******************************************************************/

    proc genmod data=clots;
      model y = x / dist = gamma link = log obstats pscale;
      ods output obstats=outlog;
    run;

    proc print data=outlog; run;

    /******************************************************************
      Fit the inverse (reciprocal) regression model using PROC GENMOD.
      Phi is again calculated by the Pearson chi-square/dof.
    ******************************************************************/

    proc genmod data=clots;
      model y = x / dist = gamma link = power(-1) obstats pscale;
    run;

OUTPUT: Following the output, we comment on a few aspects of the output.

                             The SAS System                            1

                          The GENMOD Procedure

                           Model Information

                  Data Set               WORK.CLOTS
                  Distribution                Gamma
                  Link Function                 Log
                  Dependent Variable              y

                  Number of Observations Read       9
                  Number of Observations Used       9

                 Criteria For Assessing Goodness Of Fit

        Criterion                 DF          Value       Value/DF
        Deviance                   7         0.1626         0.0232
        Scaled Deviance            7         6.6768         0.9538
        Pearson Chi-Square         7         0.1705         0.0244
        Scaled Pearson X2          7         7.0000         1.0000
        Log Likelihood                     -26.4276

    Algorithm converged.

                     Analysis Of Parameter Estimates

                             Standard    Wald 95% Confidence      Chi-
    Parameter   DF  Estimate    Error          Limits           Square  Pr > ChiSq
    Intercept    1    5.5032   0.1799    5.1506    5.8559       935.63      <.0001
    x            1   -0.6019   0.0520   -0.7039   -0.4999       133.80      <.0001
    Scale        0   41.0604   0.0000
410604 410604 NOTE The Gamma scale parameter was estimated by DOFPearson s ChiSquare Lagrange Multiplier Statistics Parameter ChiSquare Pr gt ChiSq Scale 03069 05796 Observation Statistics Observation y x Pred Xbeta Std Hesngt Lower Upper Resraw Reschi StResdev StReschi Reslik 1 118 16094379 93175154 45344811 01026374 52000165 76196496 11393712 24824846 0266432 02458801 21728608 23544798 22608074 The SAS System 2 The GENMOD Procedure Observation Statistics Observation y x Pred Xbeta Std Hesngt Lower Upper Resraw Reschi StResdev StReschi Reslik 2 58 23025851 6139102 41172636 00738424 38792341 53119026 70951174 339102 0055236 0056288 03 as M M 1 o 00 o 01 o M as 000 o J 03 m 00 M m 00 1 m M o 1 00607149 35855825 09638 0126753 0132544 0 8844 0918591 4 35 29957323 40449166 3700046 00545252 35528863 36349431 45011297 5449166 0134716 0141291 0 7 8 092 0 096160 5 27 34011974 31689627 34559894 0052237 34984 28605721 353823g 4689627 0147986 0155989 6 25 36888795 26651048 3 2828285 00556359 38516653 23897747 29721562 1651048 0061951 0063278 04 509 0425393 0433342 7 21 40943446 20879585 30387719 00661298 41297168 18341382 23769042 01204152 00057671 00057561 0409427 00410213 0 040 8 19 4 3820266 175597 8 8 00762872 44428066 65611 155121094 20 391766 1 44032 00820182 00798774 PUAE 461 CHAPTER 11 ST 732 M DAVIDIAN 9 18 46051702 15352785 27312969 00851497 48140231 12992945 18141231 26472147 01724257 01634065 12715487 13417313 12945556 The SAS System 3 Obs Observation y x Pred Xbeta 1 118 16094379 93175154 45344811 2 58 23025851 6139102 41172636 3 42 2 7080502 48096382 3873207 L 4 35 2 9957323 404491 6 3 70 046 5 27 3 4011974 31689627 3 4559894 6 25 3 6888795 26651048 3 2828285 7 21 40943446 20879585 3 0387719 8 19 43820266 17559778 2865611 9 9 18 46051702 15352785 27312969 Obs Std Hesswgt Lower Upper Resraw 01026374 52000165 76196496 11393712 24824846 00738424 38792341 53119026 70951174 3 9102 00607149 35855825 42700382 54174268 6096382 L 00545252 35528863 36349431 45011297 5 
449166 0052237 34984 28605721 35106001 4689627 00556359 38516653 23897747 29721562 1651048 00661298 41297168 18341382 23769042 01204152 00762872 44428066 15121094 20391766 14402218 9 00851497 48140231 12992945 18141231 26472147 Obs Reschi Resdev Stresdev Streschi Reslik 0266432 02458801 21728608 23544798 22608074 0055236 0056288 0413325 0405606 0411497 0 126753 0 13254 48 8844 0918591 L 0 134716 0 141291 0 967048 92205 0 961605 0 147986 0 155989 1 060815 1 006389 1 054851 0 061951 0 063 0 434509 0 425393 0 4 33 2 00057671 00057561 09427 00410213 0409576 00820182 00798774 05932165 06091154 05973195 9 01724257 01634065 12715487 13417313 12945556 4 The SAS System The GENMOD Procedure Model Information WORKCLOTS Distribution amma ink F ction Power1 Dependent Variable y Number of Observations Read 8 Number of Observations Used Criteria For Assessing Goodness Of Fit Criterion DF Value ValueDP Deviance 7 00167 00024 Scaled Deviance 7 68395 09771 Pearso ChiSq are 7 00171 00024 Scal d Pearson X2 7 70000 10000 161504 e Log Likelihood Algorithm converged Analysis Of Parameter Estimates Standard Wald 95 Chi Parameter DF Estimate Error Confidence Limits Square Pr gt ChiSq Intercept 1 00166 00009 00184 00147 31853 0001 x 00153 00004 00145 00162 136715 0001 Scale 0 4088247 00000 4088247 4088247 NOTE The Gamma scale parameter was estimated by DOFPearson s ChiSquare Lagrange Multiplier Statistics Parameter ChiSquare Pr gt ChiSq Scale 02600 06101 Observation Statistics Observation y x Pred Xbeta Std Hesngt Lower Upper Resraw Reschi esdev StResdev StReschi Reslik PU E 462 CHAPTER 11 ST 732 M DAVIDIAN 1 118 16094379 12285904 00081394 00003814 61709405 11252367 13528505 4859041 003955 0040083 2535827 2502059 250553 The SAS System 5 The GENMGD Procedure Observation Statistics Observation y x Pred Xbeta Std Hesngt Lower Upper Resraw Reschi StResdev StReschi Reslik 2 58 0003353 11598527 23025851 53263889 00187744 5 0889179 00864112 0 0 00004121 65435276 38754832 41343065 19928686 00498128 
0049009 039 0 0 m wgt M M l o oo o 01 o M wgt o o o l H m H o o M wgt lt0 lt0 01 01 0004948 47267468 0293319 00290499 pp A 01 M O 0 U1 A M A A agt O O M 03 A I O O M O agt O 0 U1 01 N x u n c H pi lt0 n to m lt3 n U1 4 4 lt0 lt3 lt3 no 01 n no I 06 0006317 32202628 2712331 29076102 10 5779 0037974 0038466 39 00007367 2549476 24103101 25906332 00277938 0001113 00011126 00008909 19099429 20828244 22462064 0614323 0028422 0028696 0001003 15917377 1899499 20528126 0731822 0037088 0037557 390 17 2 00010911 13966578 17780391 19 243791 0 48317 0026141 0026372 0 5839 39 m u m H m m o H m H H m m o m m o m m p m o m m m p u m p o m m m H m m 0 gt gtUgt lt0 m 0 mm F um HQ p m m m m m m o m m m o o o o o o m H H o m o m m o u m H 0 01 DP m m m 0 H I H 03 01 O H I H I A H q 0 O 01 p H O A A INTERPRETATION 0 Pages 172 of the output show the results of tting the loglinear model The estimates of g and 61 and their estimated standard errors are given in the table Analysis of Parameter Estimates The SCALE parameter estimate corresponds to an estimate of lq thus7 the estimate of 45 itself is 1410604 002435 Recall that the coe icient of variation 7 is de ned as 72 q thus7 the estimated coe icient of variation under the loglinear t is 015606 The table Observation Statistics on pages 1 and 2 lists a number of results based on the t Of particular interest is the column FRED7 which gives the estimates of the mean response at each wj value the column Y contains the actual data values for comparison These numbers are repeated on page 37 which shows the result of the call to proc print to print the data set created by the ods statement This illustrates how it is possible to output such results so that further manipulation may be undertaken 0 Pages 475 contain the same information for the reciprocal link t Here7 the estimate of 45 is 14088247 00024467 so that the estimated coe icient of variation 7 is 004946 0 Note that the estimates of CV do not agree well at all between the two ts The 
reason can be appreciated when one inspects the lower right panel of Figure 3. Here, the estimated mean response for each fit is superimposed on the actual data; the solid line represents the fit of the loglinear model, and the dashed line is the fit of the reciprocal model. Note that this second model appears to provide a much better fit to the data. The calculation of $\hat{\phi}$, and hence of $\hat{\sigma}$, is based on squared deviations of the observed responses from the fitted means. Because the loglinear model fits poorly, these deviations are large, leading to an estimate of CV that is misleadingly large. The reciprocal model, which fits the data very well, leads to a much smaller estimate because the deviations of the fit from the observed responses are much smaller.

Based on the visual evidence, the fit of the reciprocal model is preferred for describing the relationship between percentage concentration of plasma and clotting time.

13 Advanced topics

13.1 Introduction

In this chapter, we conclude with brief overviews of several advanced topics. Each of these topics could realistically be the subject of an entire course.

13.2 Generalized linear mixed models

The models considered in Chapter 12 were of the population-averaged type; that is, the focus was on explicit modeling of the mean of a data vector $Y_i$. Of course, the elements of $E(Y_i)$, the $E(Y_{ij})$, represent the mean response at a particular time $t_{ij}$ and possibly setting of covariates; i.e., the average over all possible values of $Y_{ij}$ we might see under those conditions, the average being over all members of the population. The models used to represent $E(Y_{ij})$ as a function of $t_{ij}$ and other covariates were of the generalized linear type, so were no longer linear functions of the parameter $\beta$ characterizing mean response.

In Section 12.5, we discussed briefly the alternative strategy of subject-specific models for nonnormal data. Here, the idea is to model individual trajectories, where the mean at time $t_{ij}$ over all observations we might see for a specific individual is represented again by a
generalized linear model, but where the parameters are in turn allowed to depend on random effects.

A general representation of such a model is as follows; recall that the conditional expectation of $Y_i$ given a vector of random effects $b_i$ unique to individual $i$ may be thought of as the mean response for a particular individual. We have for an element of $Y_i$ that, for a suitable function $f$,

$$E(Y_{ij} \mid b_i) = f(x_{ij}'\beta_i), \qquad (13.1)$$

where the subject-specific parameter $\beta_i$ may be represented as before; e.g., in the most general case,

$$\beta_i = A_i\beta + B_i b_i. \qquad (13.2)$$

Here, then, $\beta$ is the parameter that describes the typical value of $\beta_i$ across all individuals with covariate matrix $A_i$ (e.g., all individuals in a particular treatment group); $b_i$ is a random effect assumed to come from a distribution with mean 0, almost always taken to be the multivariate normal distribution, so that $b_i \sim N(0, D)$.

It is further assumed that, at the level of the individual, the data in $Y_i$ follow one of the distributions in the scaled exponential family, such as the binomial, Poisson, or gamma. It is common to assume that observations on a given individual are taken far enough apart in time that there is no correlation introduced by the way the data are collected within an individual; in fact, the observations on a particular individual $i$, $Y_{ij}$, $j = 1, \ldots, n_i$, are assumed to be independent at the level of the individual.

The variance of an observation at the level of the individual will thus depend on the mean of an observation at the individual level. Thus, we think of the variance associated with observations within a particular individual as being conditional on that individual's random effects, because the mean is conditional on them. That is, we think of the variance within an individual as

$$\mathrm{var}(Y_{ij} \mid b_i) = \phi\, V\{ f(x_{ij}'\beta_i) \},$$

where $\phi$ may or may not be known depending on the nature of the data and the appropriate distribution; for example, if the $Y_{ij}$ are counts, then it follows that

$$\mathrm{var}(Y_{ij} \mid b_i) = f(x_{ij}'\beta_i).$$

The model defined in (13.1)
and (13.2) with the stated properties is referred to in the statistical literature as a generalized linear mixed model, for obvious reasons. It is an alternative model to the population-averaged models in Chapter 12. Just as in the linear case, it may be more advantageous or natural to think of individual trajectories rather than the average response over the population; this model allows thinking this way.

However, as discussed in Section 12.5, it is not the case that this model and a population-averaged model constructed using the same function $f$ lead to the same model for $E(Y_{ij})$, as was fortuitously true in the case of a linear model. Thus, whether one adopts a population-averaged or subject-specific approach will lead to different implied models for the mean response for the population. Technically, this is because under the population-averaged model we would take

$$E(Y_{ij}) = f(x_{ij}'\beta),$$

while under the subject-specific approach we would take

$$E(Y_{ij} \mid b_i) = f(x_{ij}'\beta_i),$$

which implies, upon averaging over the population, that

$$E(Y_{ij}) = E\{ f(x_{ij}'\beta_i) \}.$$

Plugging in (13.2) for $\beta_i$, we see that under the subject-specific approach the implied model for the mean over the population is

$$E(Y_{ij}) = E[ f\{ x_{ij}'(A_i\beta + B_i b_i) \} ].$$

It is a mathematical fact that, because $f$ is not a linear function of $b_i$, taking this expectation is an operation that is likely to be impossible to do in closed form. It follows that it is generally not possible that

$$f(x_{ij}'\beta) = E[ f\{ x_{ij}'(A_i\beta + B_i b_i) \} ];$$

that is, the two types of model for mean response implied by each strategy are almost certainly not the same.

This has caused some debate about which strategy is more appropriate. For linear models, the debate is not as strong, because the mean response model turns out to be the same, the only difference being how one models the covariance. Here, instead, what is implied about the most prominent aspect, the mean over the population, is not the same. The debate has not been resolved and still rages in the statistical literature. In real applications, the following is typically true:

• For
studies in public health, education, and so on, where the main goal of data analysis is to make proclamations about the population, the usual strategy has been to use population-averaged models. The rationale is that interest focuses on what happens on the average in a population, so why not just model that directly? For example, if a government health agency wishes to understand whether maternal smoking affects child respiratory health for the purposes of making public policy statements, it wants to make statements about what happens on the average in the whole population. For the purposes of making general policy, there is no real interest in individual children and their respiratory trajectories. Thus, the thinking is: why complicate matters by assuming a subject-specific model when there is no interest in individual subjects?

• On the other hand, in the context of a clinical trial, there may be interest in individual patients and understanding how they evolve over time. For example, in the epileptic seizure study in Chapter 12, researchers may think that the process of how epileptic seizures occur over time is something that happens within a subject, and they may wish to characterize that for individual subjects. As a result, it is more common to see generalized linear mixed models used in this kind of setting.

INFERENCE: One major complication in fitting generalized linear mixed models is that it is no longer straightforward to write down the implied likelihood of a data vector. The actual form of this likelihood is quite complicated and involves an integral with respect to the elements of $b_i$. Rather than write down this mess, we note what the problem is by considering again something that is related to the full likelihood of a data vector: the mean vector. Here, each element of the mean vector is

$$E(Y_{ij}) = E[ f\{ x_{ij}'(A_i\beta + B_i b_i) \} ],$$

which is a calculation that we have already noted is generally not possible to do in closed form. This suggests that trying to
derive the whole likelihood function in closed form would be equally difficult, which it is. The result is that the function we would like to use as the basis of estimation and testing is not even something we can write down.

A variety of approaches to dealing with this problem, by way of approximations that might allow something "close to" the true likelihood function to be written down, have been proposed. Discussion of these methods is beyond our scope; see the references in Diggle, Heagerty, Liang, and Zeger (2002) for an introduction to the statistical literature. One of these approximate approaches is implemented in a macro provided by SAS, glimmix. The procedure proc nlmixed fits these models directly. A new procedure, proc glimmix, is being developed.

It is important that the user fully understand the basis of these approximate approaches before attempting to fit such models; the interpretation and fitting can be very difficult!

13.3 Nonlinear mixed effects models

A more complicated version of generalized linear mixed models is possible. In many applications, a suitable model for individual trajectories is dictated by theoretical concerns. Recall, for example, the soybean growth data introduced in Chapter 1; the plot is reproduced here as Figure 1. A common model for the process of growth is the so-called logistic growth function; this function is of a similar form as the logistic regression model discussed previously, but the interpretation is different. If one assumes that the rate of change of the growth value ("size" or weight, for example) of the organism (here, plants in a soybean plot), relative to the size of the organism at any time, declines in a linear fashion with increasing growth, it may be shown that the growth value at any particular time $t$ may be represented by a function of the form

$$f(t) = \frac{\beta_1}{1 + \beta_2 \exp(-\beta_3 t)}, \qquad (13.3)$$

where $\beta_1, \beta_2, \beta_3 > 0$.

Figure 1: Average leaf weight/plant profiles for 8 plots planted with Forrest and 8 plots planted with PI #416937 in 1989.

[Figure 1 panels: Genotype F and Genotype P; vertical axis: leaf weight/plant; horizontal axis: days after planting.]

Here, the value $\beta_1$ corresponds to the asymptote of growth; that is, the value that growth seems to level out at as time grows large. The parameter $\beta_3$ is sometimes called a growth rate parameter, because it characterizes how the growth increases as a function of time by decreasing the denominator of (13.3). A scientist may have specific interest in these features.

It is natural in a setting like this to think that each soybean plot evolves over time according to a growth process unique to that plot. If the model (13.3) is a reasonable way to represent the process a particular plot might undergo, then it is natural to think of representing the situation of several such plots by allowing each plot to have its own logistic growth model, with its own parameters that characterize how large it ultimately gets and its growth rate. More formally, if $Y_{ij}$ is the measurement on the growth value at time $t_{ij}$ for the $i$th plot, we might think of the mean at the individual plot level as being represented by (13.3) with plot-specific values for $\beta_1, \beta_2, \beta_3$; that is,

$$E(Y_{ij} \mid b_i) = \frac{\beta_{1i}}{1 + \beta_{2i} \exp(-\beta_{3i} t_{ij})}, \qquad \beta_i = (\beta_{1i}, \beta_{2i}, \beta_{3i})' = A_i\beta + B_i b_i, \qquad (13.4)$$

where $b_i$ are random effects and $A_i$ and $B_i$ are suitable matrices allowing covariate information (e.g., genotype) and other considerations to be represented.

This seems like a natural way to think, and it is indeed the way scientists feel comfortable thinking when trying to formally represent the data. Of course, the model (13.4), and more general versions of it (e.g., other functions $f$), is a subject-specific model. Thus, for many applications in the biological sciences, there is a theoretical basis for preferring the subject-specific modeling approach.

This model looks very similar to the general form of a generalized linear mixed model, with one important exception. The function $f$ in (13.3) is not a function of a single argument in which $t_{ij}$ and the parameters enter the model only through a linear predictor. Rather,
the way time and parameters enter this model is more complicated. The result is that we have a model one might think of as being even "more nonlinear." Indeed, it is the case in the biological and physical sciences that theoretical models derived from scientific principles are typically nonlinear in this more complicated way.

INFERENCE: The same issues that make model fitting difficult in the generalized linear mixed model case apply here as well; it is not generally possible to write down the likelihood of a data vector in closed form. Again, approximations are often used. A full account of these models in biological and physical applications may be found in Davidian and Giltinan (1995). There is a SAS macro, nlinmix, that implements approximate methods to accomplish this fitting; however, as above, it should only be used by those who have a full understanding of the model framework and the approximations used.

13.4 Issues associated with missing data

As we have mentioned, a common issue with longitudinal data, particularly when the units are humans, is that data may be missing. That is, although we may intend to collect data according to some experimental plan in which all units are seen at the same $n$ times, it is quite often the case that things do not end up this way. The obvious consequence is that the resulting data may not be balanced as was originally intended. However, the fact that the data are not balanced is the least of the problems; all of the modern methods we have discussed can handle this issue with ease. The real problems are more insidious, and were not in fact truly appreciated until quite recently.

As we have discussed, data may be missing for different reasons:

1. Mistakes, screw-ups, etc.; for example, a sample is dropped or contaminated, so that a measurement may not be taken.

2. Issues related to the thing being studied (more in a moment).

Missingness of the first type is mainly an annoyance, unless it happens a lot. Missingness of the second
type can be a problem; previously in the course, we have noted that if missingness happens in this way, then intuition suggests that the very fact that data are missing may carry information about the issues under study. The fear is that if we treat the missingness as if it has no information, by simply attributing the fact that data vectors are of different lengths to chance, and this is not really true, the inferences we draw may be misleading. We are now more formal about this.

TERMINOLOGY: In the literature on missing data, a certain terminology has been developed to characterize different ways missingness happens. This terminology seems somewhat arcane, but it is in widespread use. A statistical reference book that introduces this terminology is Little and Rubin (2002); the recent and current statistical literature always has papers about missing data, too. In reading further about the consequences of missing data, it is useful to be familiar with this terminology.

MISSING COMPLETELY AT RANDOM: In the first type of example, where, say, a sample is dropped and ruined, the fact that the associated observation is thus missing has nothing to do with what is being studied. If the sample is from a patient in a study to compare two treatments, the fact that it was dropped has nothing to do with the treatments and their effect, but rather, most likely, with the clumsiness of the person handling the sample. In the event that missingness is in no way related to the issues under study, it is referred to as occurring completely at random, or MCAR.

The consequence of MCAR is simply that we get less data than we'd hoped. Thus, concerns about sample size may be an issue; we may not have the power to detect differences that we'd hoped. If a lot of observations are missing, obviously, power will be much less than we had bargained for, and the ability of a study to detect a desired difference or estimate a particular quantity with a desired degree of precision will be compromised. If the problem isn't too bad,
then power may not be too seriously affected.

However, we don't have to worry about the inferences being misleading. Luckily, because the reason for the missingness has nothing to do with the issues under study, we can assume that the observation and the individual it came from are similar to all the others in the study, so that what's left is legitimately viewed as a fair representation of the response of interest in the population of interest. What's left might just be smaller than we hoped.

MISSING AT RANDOM: In the second type of example, we may have a situation where a patient is a participant in a longitudinal study to evaluate a blood pressure medication. The patient's blood pressure at the outset may have been very high, which is why he was recruited into the study. The study plan dictates that the patient be randomized to receive one of two study treatments and return monthly to the hospital to have his blood pressure recorded. For ethical reasons, however, a patient may be withdrawn from the study; e.g.,

• In many such studies, the study plan dictates that if a patient's measured blood pressure on any visit goes above a certain "danger" level, the patient must be removed from the study and have his treatment options decided based solely on his condition, rather than continue on his randomized treatment, which in some cases may be a placebo. This protects patients in the event they are assigned to a medication that does not work for them.

• The patient's personal physician may review the measurements taken over his previous monthly visits and make a judgment that the patient would be better off being removed from the study treatment. This, of course, would mean that the patient would be removed from the study.

In each of these cases, the patient will have data that are missing after a certain point, because he is no longer a participant. The reason the data will be missing in this way is a direct result of observation of his previous response values.
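The withdrawal rule just described can be made concrete with a small simulation. The sketch below is only an illustration under made-up assumptions: the function name `simulate_dropout`, the danger level of 170, the mean of 155, and the two standard deviations are all hypothetical and do not come from any study in these notes. Each simulated patient has his own true mean pressure, each monthly reading adds within-patient variation, and a patient contributes no further readings after the first visit at which his reading exceeds the danger level.

```python
import random
import statistics


def simulate_dropout(n_patients=5000, n_visits=4, danger=170.0, seed=732):
    """Simulate monthly readings with withdrawal after a reading above `danger`.

    Returns (full, observed). full[j] holds the visit-j readings of every
    patient (what the study plan intended to collect); observed[j] holds only
    the readings actually taken before withdrawal.
    """
    rng = random.Random(seed)
    full = [[] for _ in range(n_visits)]
    observed = [[] for _ in range(n_visits)]
    for _ in range(n_patients):
        # Among-patient variation: each patient's own true mean (hypothetical values).
        true_mean = rng.gauss(155.0, 10.0)
        in_study = True
        for j in range(n_visits):
            # Within-patient fluctuation and measurement error (hypothetical value).
            y = true_mean + rng.gauss(0.0, 8.0)
            full[j].append(y)
            if in_study:
                observed[j].append(y)
                # Withdrawal depends only on a response value already seen.
                if y > danger:
                    in_study = False
    return full, observed


full, observed = simulate_dropout()
for j in range(len(full)):
    print(f"visit {j + 1}: n observed = {len(observed[j]):5d}, "
          f"mean of all intended = {statistics.mean(full[j]):6.1f}, "
          f"mean of observed = {statistics.mean(observed[j]):6.1f}")
```

In this sketch everyone is seen at the first visit, but at later visits the simple average of the readings actually observed falls below the average over all patients, because patients whose readings ran high are no longer there to be measured.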
Formally, in the event that missingness results because of the values of responses and other variables already seen for a unit, the missingness is said to be at random, abbreviated MAR.

• The reason for this name is that missingness still happens as the result of observation of random quantities (the responses observed so far), but it is no longer necessarily just an annoyance. Because observations on any given patient are subject to within-patient variation, it could be that the patient registered above the danger level just by chance, due to measurement error, and in reality his "true" blood pressure is really not high enough to remove him from the study.

• On the other hand, his blood pressure may have registered above the danger level because his "true" pressure really is high.

We have to be concerned that the latter situation is true; if this is the case, then we fear that the data we end up seeing are not truly representative of the population: data values from patients who may have registered high at some point, whether by chance or not, are not seen.

It turns out that, as long as one uses maximum likelihood methods and the assumptions underlying them are correct, estimation of quantities of interest will not be compromised. However, implementation of such methods becomes more complicated, and specialized techniques may be necessary. Thus, some acknowledgment of the problem is required.

In the case of GEE methods, things are worse; because these methods are not based on a likelihood, it is possible that the estimates themselves will be unreliable; in particular, they can end up being biased. Thus, if MAR is suspected, the user must be aware that the usual analyses may be flawed. Fancy methods to correct the problem are becoming more popular; these are beyond our scope here.

NONIGNORABLE NONRESPONSE: A more profound case of the second type of missingness is as follows. We discussed earlier in the course the case of patients in a study to evaluate AIDS treatments. Suppose
patients are to come to the clinic at scheduled intervals, and measurements of viral load, a measure of roughly "how much" HIV virus is in the system, are to be made. Patients with high viral load tend to be sicker than those with low viral load. Viral load is thus likely to be seen increasing over time for patients who are sicker. Moreover, the faster the rate of increase, the more rapidly patients seem to deteriorate.

Suppose that a particular patient fails to come in for his scheduled clinic visits because his disease has progressed to the point where he is too sick to come to the clinic ever again. If we think in terms of the patient's individual trajectory of viral load, a patient who is too sick to come in probably also has a viral load trajectory that is increasing, and may be increasing more quickly than those for other patients who have not become so sick. Thus, if we think formally of a random coefficient model to describe viral load as a function of time, e.g.,

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij},$$

say, then it may be that the fact that a patient is too sick to come in is reflected in the fact that his individual slope $\beta_{1i}$ is large and positive.

Now, if the treatment is supposed to be targeting the disease, obviously the fact that this patient is too sick to return, yielding missing data, is caught up with the treatment. If we think of the random coefficient model, the fact that data for this patient end up being missing is a consequence of the fact that his slope $\beta_{1i}$, which is supposedly influenced by the treatment, is too large and positive. The patient has missing data not just because of data already seen but, in a sense, because of his underlying characteristics, represented through his slope, that will carry him through the rest of time, even beyond the current time. Thus, missingness in this example is even more profound than missingness that results from values of data already seen; here, missingness is related to all data, observed or not, that we might see for this
patient, because those data would all be the consequence of the patient's very steep slope.

This kind of missingness, which is caused by an underlying phenomenon that cannot be observed and operates throughout time, is known as nonignorable nonresponse, or NINR. Unlike the MAR situation, as the name indicates, if missingness happens this way, then a patient has missing data not just by chance but because of an underlying characteristic of that patient that may be influenced by the treatment. Thus, we will have a completely unrealistic picture of the population of individuals from the available data, because we will only have incomplete information from part of it. The result can be that estimates of quantities of interest, like the difference in typical slope between two treatments, can be flawed (biased), because information from people who are the sickest is underrepresented.

Correcting the problem can be difficult if not impossible, because the missingness is a consequence of something we cannot see. If NINR is suspected, it may not be possible to obtain reliable inferences without making assumptions about things like random effects that cannot be observed. This is a serious drawback, and one that is not always appreciated.

A full treatment of the consequences of missing data and how to handle the issues in the longitudinal context would fill an entire course. The foregoing discussion is meant simply to highlight some of the basic issues. The book by Verbeke and Molenberghs (2000) devotes considerable attention to issues associated with missing data in the particular context of the linear mixed effects model. The book by Fitzmaurice, Laird, and Ware (2004) also offers more extensive introductory discussion.

5 Univariate repeated measures analysis of variance

5.1 Introduction

As we will see as we progress, there are a number of approaches for representing longitudinal data in terms of a statistical model. Associated with these approaches are appropriate methods of
analysis that focus on questions that are of interest in the context of longitudinal data. As noted previously, one way to make distinctions among these models and methods has to do with what they assume about the covariance structure of a data vector from a unit. Another has to do with what is assumed about the form of the mean of an observation, and thus the mean vector for a data vector.

We begin our investigation of the different models and methods by considering a particular statistical model for representing longitudinal data. This model is really only applicable in the case where the data are balanced; that is, where the measurements on each unit occur at the same $n$ times for all units, with no departures from these times or missing values for any units. Thus, each individual has associated with it an $n$-dimensional random vector, whose $j$th element corresponds to the response at the $j$th common time point.

Although, as we will observe, the model may be put into the general form discussed in Chapters 3 and 4, where we think of the data in terms of vectors for each individual and the means and covariances of these vectors, it is motivated by considering a model for each individual observation separately. Because of this motivation, the model and the associated method of analysis are referred to as univariate repeated measures analysis of variance.

• This model imposes a very specific assumption about the covariances of the data vectors, one that may often not be fulfilled for longitudinal data.

• Thus, because the method exploits this possibly incorrect assumption, there is the potential for erroneous inferences in the case that the assumption made is not relevant for the data at hand.

• The model also provides a simplistic representation for the mean of a data vector that does not exploit the fact that each vector represents what might appear to be a systematic trajectory that appears to be a function of time (recall the examples in Chapter 1 and the sample mean vectors for the dental data in the
last chapter).

• However, because of its simplicity and connection to familiar analysis of variance techniques, the model and method are quite popular, and are often adopted by default, sometimes without proper attention to the validity of the assumptions.

We will first describe the model in the way it is usually represented, which will involve slightly different notation from that we have discussed. This notation is conventional in this setting, so we begin by using it. We will then make the connection between this representation and the way we have discussed thinking about longitudinal data, as vectors.

5.2 Basic situation and statistical model

Recall Examples 1 and 2 in Chapter 1:

• In Example 1, the dental study, 27 children, 16 boys and 11 girls, were observed at each of ages 8, 10, 12, and 14 years. At each time, the response, a measurement of the distance from the center of the pituitary to the pterygomaxillary fissure, was made. Objectives were to learn whether there is a difference between boys and girls with respect to this measure and its change over time.

• In Example 2, the diet study, 15 guinea pigs were randomized to receive zero, low, or high dose of a vitamin E diet supplement. Body weight was measured at each of several time points (weeks 1, 3, 4, 5, 6, and 7) for each pig. Objectives were to determine whether there is a difference among pigs treated with different doses of the supplement with respect to body weight and its change over time.

Recall from Figures 1 and 2 of Chapter 1 that each child or guinea pig exhibited a profile over time (age or weeks) that appeared to increase with time; Figure 1 of Chapter 1 is reproduced in Figure 1 here for convenience. In these examples, the response of interest is continuous (distance, body weight).

Figure 1: Orthodontic distance measurements for 27 children over ages 8, 10, 12, 14. The plotting symbols are 0's for girls, 1's for boys.

[Figure 1 panel title: Dental Study Data; horizontal axis: age (years).]

STANDARD SETUP: These situations typify the usual setup of a standard one-way longitudinal or repeated measurement study.

• Units are randomized to one of $q \geq 1$ treatment groups. In the literature, these are often referred to as the between-units factors or groups. This is an abuse of grammar if the number of groups is greater than 2; among-units would be better. In the dental study, $q = 2$ (boys and girls), where randomly selecting boys from the population of all boys, and similarly for girls, is akin to randomization of units. In the diet study, we think of $q = 3$ dose groups.

• The response of interest is measured on each of $n$ occasions or under each of $n$ conditions. Although in a longitudinal study this is usually "time," it may also be something else. For example, suppose men were randomized into two groups, regular and modified diet. The repeated responses might be maximum heart rate measurements after separate occasions of 10, 20, 30, 45, and 60 minutes walking on a treadmill. As is customary, we will refer to the repeated measurement factor as time, with the understanding that it might apply equally well to things other than strictly chronological "time." It is often also referred to in the literature as the within-units factor. In the dental study, this is age ($n = 4$); in the diet study, weeks ($n = 6$).

• For simplicity, we will consider in detail the case where there is a single factor making up the groups (e.g., gender, dose); however, it is straightforward to extend the development to the case where the groups are determined by a factorial design; e.g., if in the diet study there had been $q = 6$ groups, determined by the factorial arrangement of 3 doses and 2 genders.

SOURCES OF VARIATION: As discussed in Chapter 4, the model recognizes two possible sources of variation that may make observations on units in the same group taken at the same time differ:

• There is random variation in the population of units due to, for example, biological variation. For example, if we think of the population of
all possible guinea pigs, if they were all given the low dose, they would produce different responses at week 1 simply because guinea pigs vary biologically and are not all identical. We may thus identify random variation among individuals (units).

• There is also random variation due to within-unit fluctuations and measurement error, as discussed in Chapter 4. We may thus identify random variation within individuals (units).

It is important that any statistical model take these two sources of variation into appropriate account. Clearly, these sources will play a role in determining the nature of the covariance matrix of a data vector; we will see this in a moment for the particular model we now discuss.

MODEL To state the model in the usual way, we will use notation different from that we have discussed so far. We will then show how the model in the standard notation may also be represented as we have discussed. Define the random variable

\[ Y_{h\ell j} = \text{observation on unit } h \text{ in the } \ell\text{th group at time } j. \]

• $h = 1, \ldots, r_\ell$, where $r_\ell$ denotes the number of units in group $\ell$. Thus, in this notation, $h$ indexes units within a particular group.

• $\ell = 1, \ldots, q$ indexes groups.

• $j = 1, \ldots, n$ indexes the levels of time.

• Thus, the total number of units involved is $m = \sum_{\ell=1}^{q} r_\ell$. Each is observed at $n$ time points.

The model for $Y_{h\ell j}$ is given by

\[ Y_{h\ell j} = \mu + \tau_\ell + b_{h\ell} + \gamma_j + (\tau\gamma)_{\ell j} + e_{h\ell j} \qquad (5.1) \]

• $\mu$ is an "overall mean."

• $\tau_\ell$ is the deviation from the overall mean associated with being in group $\ell$.

• $\gamma_j$ is the deviation associated with time $j$.

• $(\tau\gamma)_{\ell j}$ is an additional deviation associated with group $\ell$ and time $j$; $(\tau\gamma)_{\ell j}$ is the interaction effect for group $\ell$, time $j$.

• $b_{h\ell}$ is a random effect with $E(b_{h\ell}) = 0$ representing the deviation caused by the fact that $Y_{h\ell j}$ is measured on the $h$th particular unit in the $\ell$th group. That is, responses vary because of random variation among units. If we think of the population of all possible units were they to receive the treatment of group $\ell$, we may think of each unit as having its own deviation simply because it differs
biologically from other units. Formally, we may think of this population as being represented by a probability distribution of all possible $b_{h\ell}$ values, one per unit in the population. $b_{h\ell}$ thus characterizes the source of random variation due to among-unit causes. The term random effect is customary to describe a model component that addresses among-unit variation.

• $e_{h\ell j}$ is a random deviation with $E(e_{h\ell j}) = 0$ representing the deviation caused by the aggregate effect of within-unit fluctuations and measurement error (within-unit sources of variation). That is, responses also vary because of variation within units. Recalling the model in Chapter 4, if we think of the population of all possible combinations of fluctuations and measurement errors that might happen, we may represent this population by a probability distribution of all possible $e_{h\ell j}$ values. The term "random error" is usually used to describe this model component, but, as we have remarked previously, we prefer random deviation, as this effect may be due to more than just measurement error.

REMARKS

• Model (5.1) has exactly the same form as the statistical model for observations arising from an experiment conducted according to a split plot design. Thus, as we will see, the analysis is identical; however, the interpretation and further analyses are different.

• Note that the actual values of the times of measurement (e.g. ages 8, 10, 12, 14 in the dental study) do not appear explicitly in the model. Rather, a separate deviation parameter $\gamma_j$ and an interaction parameter $(\tau\gamma)_{\ell j}$ are associated with each time. Thus, the model takes no explicit account of where the times of observation are chronologically; e.g. are they equally spaced?

MEAN MODEL The model (5.1) represents how we believe systematic factors like time and treatment (group) and random variation due to various sources may affect the way a response turns out. To exhibit this more clearly, it is instructive to re-express the model as

\[ Y_{h\ell j} = \underbrace{\mu + \tau_\ell + \gamma_j + (\tau\gamma)_{\ell j}}_{\mu_{\ell j}} + \underbrace{b_{h\ell} + e_{h\ell j}}_{\epsilon_{h\ell j}} = \mu_{\ell j} + \epsilon_{h\ell j} \qquad (5.2) \]

• Because $b_{h\ell}$ and $e_{h\ell j}$
have mean 0, we have of course

\[ E(Y_{h\ell j}) = \mu_{\ell j} = \mu + \tau_\ell + \gamma_j + (\tau\gamma)_{\ell j}. \]

Thus, $\mu_{\ell j} = \mu + \tau_\ell + \gamma_j + (\tau\gamma)_{\ell j}$ represents the mean for a unit in the $\ell$th group at the $j$th observation time. This mean is the sum of deviations from an overall mean caused by a fixed systematic effect on the mean due to group $\ell$ that happens at all time points ($\tau_\ell$), a fixed systematic effect on the mean that happens regardless of group at time $j$ ($\gamma_j$), and an additional fixed systematic effect on the mean that occurs for group $\ell$ at time $j$ ($(\tau\gamma)_{\ell j}$).

• $\epsilon_{h\ell j} = b_{h\ell} + e_{h\ell j}$ is the sum of random deviations that cause $Y_{h\ell j}$ to differ from the mean at time $j$ for the $h$th unit in group $\ell$. $\epsilon_{h\ell j}$ summarizes all sources of random variation.

• Note that $b_{h\ell}$ does not have a subscript $j$. Thus, the deviation that "places" the $h$th unit in group $\ell$ in the population of all such units relative to the mean response is the same for all time points. This represents an assumption: if a unit is "high" at time $j$ relative to the group mean at $j$, it is "high" by the same amount at all other times. This may or may not be reasonable. For example, recall Figure 1 in Chapter 4, reproduced here as Figure 2.

This assumption might be reasonable for the upper two units in panel (b), as the inherent trends for these units are roughly parallel to the trajectory of means over time. But the lower unit's trend is far below the mean at early times and rises to be above it at later times; for this unit, the deviation from the mean is not the same at all times. As we will see shortly, violation of this assumption may not be critical as long as the overall pattern of variance and correlation implied by this model is similar to that in the data.

Figure 2: (a) Hypothetical longitudinal data from $m = 3$ units at $n = 9$ time points. (b) Conceptual representation of sources of variation. [Panels (a) and (b); axes: response versus time.]

NORMALITY AND VARIANCE ASSUMPTIONS For continuous responses like those in the example, it is often realistic to consider the normal distribution as a model for the way in which the various sources of variation affect
the response. If $Y_{h\ell j}$ is continuous, we would expect the deviations due to biological variation (among units) and within-unit sources that affect how $Y_{h\ell j}$ turns out also to be continuous. Thus, rather than assuming that $Y_{h\ell j}$ is normally distributed directly, it is customary to assume that each random component arises from a normal distribution. Specifically, the standard assumptions, which also incorporate assumptions about variance, are

• $b_{h\ell} \sim N(0, \sigma_b^2)$ and are all independent. This says that the distribution of deviations in the population of units is centered about 0 (some are negative, some positive), with variation characterized by the variance component $\sigma_b^2$. The fact that this normal distribution is identical for all $\ell = 1, \ldots, q$ reflects an assumption that units vary similarly among themselves in all $q$ populations. The independence assumption represents the reasonable view that the response one unit in the population gives at any time is completely unrelated to that given by another unit.

• $e_{h\ell j} \sim N(0, \sigma_e^2)$ and are all independent. This says that the distribution of deviations due to within-unit causes is centered about 0 (some negative, some positive), with variation characterized by the (common) variance component $\sigma_e^2$. That this distribution is the same for all $\ell = 1, \ldots, q$ and $j = 1, \ldots, n$ again is an assumption. The variance $\sigma_e^2$ represents the "aggregate" variance of the combined fluctuation and measurement error processes, and is assumed to be constant over time and group. Thus, the model assumes that the combined effect of within-unit sources of variation is the same at any time in all groups. E.g. the magnitude of within-unit fluctuations is similar across groups and does not change with time, and the variability associated with errors in measurement is the same regardless of the size of the thing being measured.

The independence assumption is something we must think about carefully. It is customary to assume that the error in measurement introduced by, say, an imperfect scale at one time point is
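not related to the error in measurement that occurs at a later time point; i.e. measurement errors occur "haphazardly." Thus, if $e_{h\ell j}$ represents mostly measurement error, the independence assumption seems reasonable. However, fluctuations within a unit may well be correlated, as discussed in the last chapter. Thus, if the time points are close enough together so that correlations are not negligible, this may not be reasonable (recall our discussion of observations close in time tending to be "large" or "small" together).

• The $b_{h\ell}$ and $e_{h\ell j}$ are assumed to all be mutually independent. This represents the view that deviations due to within-unit sources are of similar magnitude regardless of the magnitudes of the deviations $b_{h\ell}$ associated with the units on which the observations are made. This is often reasonable; however, as we will see later in the course, there are certain situations where it may not be reasonable.

With these assumptions it will follow that the $Y_{h\ell j}$s are normally distributed, as we will now demonstrate.

VECTOR REPRESENTATION AND COVARIANCE MATRIX Now consider the data on a particular unit. With this notation, the subscripts $h$ and $\ell$ identify a particular unit as the $h$th unit in the $\ell$th group. For this unit, we may summarize the observations at the $n$ times in a vector and write

\[ \begin{pmatrix} Y_{h\ell 1} \\ Y_{h\ell 2} \\ \vdots \\ Y_{h\ell n} \end{pmatrix} = \begin{pmatrix} \mu + \tau_\ell + \gamma_1 + (\tau\gamma)_{\ell 1} \\ \mu + \tau_\ell + \gamma_2 + (\tau\gamma)_{\ell 2} \\ \vdots \\ \mu + \tau_\ell + \gamma_n + (\tau\gamma)_{\ell n} \end{pmatrix} + \begin{pmatrix} b_{h\ell} \\ b_{h\ell} \\ \vdots \\ b_{h\ell} \end{pmatrix} + \begin{pmatrix} e_{h\ell 1} \\ e_{h\ell 2} \\ \vdots \\ e_{h\ell n} \end{pmatrix} \qquad (5.3) \]

\[ \boldsymbol{Y}_{h\ell} = \boldsymbol{\mu}_\ell + \boldsymbol{1} b_{h\ell} + \boldsymbol{e}_{h\ell}, \]

where $\boldsymbol{1}$ is a $(n \times 1)$ vector of 1s, or more succinctly,

\[ \begin{pmatrix} Y_{h\ell 1} \\ Y_{h\ell 2} \\ \vdots \\ Y_{h\ell n} \end{pmatrix} = \begin{pmatrix} \mu_{\ell 1} \\ \mu_{\ell 2} \\ \vdots \\ \mu_{\ell n} \end{pmatrix} + \begin{pmatrix} \epsilon_{h\ell 1} \\ \epsilon_{h\ell 2} \\ \vdots \\ \epsilon_{h\ell n} \end{pmatrix} \qquad (5.4) \]

\[ \boldsymbol{Y}_{h\ell} = \boldsymbol{\mu}_\ell + \boldsymbol{\epsilon}_{h\ell}, \]

so, for the data vector from the $h$th unit in group $\ell$, $E(\boldsymbol{Y}_{h\ell}) = \boldsymbol{\mu}_\ell$.

We see that the model implies a very specific representation of a data vector. Note that for all units from the same group ($\ell$), $\boldsymbol{\mu}_\ell$ is the same.

We will now see that the model implies something very specific about how observations within and across units covary and about the structure of the mean of a data vector.

• Because $b_{h\ell}$ and $e_{h\ell j}$ are independent, we have

\[ \mathrm{var}(Y_{h\ell j}) = \mathrm{var}(b_{h\ell}) + \mathrm{var}(e_{h\ell j}) + 2\,\mathrm{cov}(b_{h\ell}, e_{h\ell j}) = \sigma_b^2 + \sigma_e^2 + 0 = \sigma_b^2 + \sigma_e^2. \]

• Furthermore, because each random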
component $b_{h\ell}$ and $e_{h\ell j}$ is normally distributed, each $Y_{h\ell j}$ is normally distributed.

• In fact, the $Y_{h\ell j}$ values making up the vector $\boldsymbol{Y}_{h\ell}$ are jointly normally distributed. Thus, a data vector $\boldsymbol{Y}_{h\ell}$ under the assumptions of this model has a multivariate ($n$-dimensional) normal distribution with mean vector $\boldsymbol{\mu}_\ell$. We now turn to the form of the covariance matrix of $\boldsymbol{Y}_{h\ell}$.

FACT First, we note the following result. If $b$ and $e$ are two random variables with means $\mu_b$ and $\mu_e$, then $\mathrm{cov}(b, e) = 0$ implies that $E(be) = E(b)E(e) = \mu_b \mu_e$. This is shown as follows:

\[ \mathrm{cov}(b, e) = E\{(b - \mu_b)(e - \mu_e)\} = E(be) - E(b)\mu_e - \mu_b E(e) + \mu_b \mu_e = E(be) - \mu_b \mu_e. \]

Thus, $\mathrm{cov}(b, e) = 0 = E(be) - \mu_b \mu_e$, and the result follows.

• We know that if $b$ and $e$ are jointly normally distributed and independent, then $\mathrm{cov}(b, e) = 0$.

• Thus, $b$ and $e$ independent and normal implies $E(be) = \mu_b \mu_e$. If furthermore $b$ and $e$ have means 0, i.e. $E(b) = 0$, $E(e) = 0$, then in fact $E(be) = 0$.

We now use this result to examine the covariances.

• First, let $Y_{h\ell j}$ and $Y_{h'\ell' j'}$ be two observations taken from different units ($h$ and $h'$) from different groups ($\ell$ and $\ell'$) at different times ($j$ and $j'$). Then

\[ \mathrm{cov}(Y_{h\ell j}, Y_{h'\ell' j'}) = E\{(Y_{h\ell j} - \mu_{\ell j})(Y_{h'\ell' j'} - \mu_{\ell' j'})\} = E\{(b_{h\ell} + e_{h\ell j})(b_{h'\ell'} + e_{h'\ell' j'})\} \]
\[ = E(b_{h\ell} b_{h'\ell'}) + E(e_{h\ell j} b_{h'\ell'}) + E(b_{h\ell} e_{h'\ell' j'}) + E(e_{h\ell j} e_{h'\ell' j'}) \qquad (5.5) \]

Note that, since all the random components are assumed to be mutually independent with 0 means, by the above result, each term in (5.5) is equal to 0! Thus, (5.5) implies that two responses from different units in different groups at different times are not correlated.

• In fact, the same argument goes through if $\ell = \ell'$, i.e. the observations are from two different units in the same group, and/or $j = j'$, i.e. the observations are from two different units at the same time. That is (try it!),

\[ \mathrm{cov}(Y_{h\ell j}, Y_{h'\ell j'}) = 0, \quad \mathrm{cov}(Y_{h\ell j}, Y_{h'\ell' j}) = 0, \quad \mathrm{cov}(Y_{h\ell j}, Y_{h'\ell j}) = 0. \]

• Thus, we may conclude that the model (5.1) automatically implies that any two observations from different units have 0 covariance. Furthermore, because these observations are all (jointly) normally distributed, this implies that any two observations from different units are independent! Thus, two vectors $\boldsymbol{Y}_{h\ell}$ and $\boldsymbol{Y}_{h'\ell'}$ from different units, where $\ell \ne \ell'$ or $\ell = \ell'$, are independent
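under this model. Recall that at the end of Chapter 3, we noted that it seems reasonable to assume that data vectors from different units are indeed independent; this model automatically induces this assumption.

• Now consider 2 observations on the same unit, say the $h$th unit in group $\ell$, $Y_{h\ell j}$ and $Y_{h\ell j'}$. We have

\[ \mathrm{cov}(Y_{h\ell j}, Y_{h\ell j'}) = E\{(Y_{h\ell j} - \mu_{\ell j})(Y_{h\ell j'} - \mu_{\ell j'})\} = E\{(b_{h\ell} + e_{h\ell j})(b_{h\ell} + e_{h\ell j'})\} \]
\[ = E(b_{h\ell} b_{h\ell}) + E(e_{h\ell j} b_{h\ell}) + E(b_{h\ell} e_{h\ell j'}) + E(e_{h\ell j} e_{h\ell j'}) = \sigma_b^2 + 0 + 0 + 0 = \sigma_b^2. \qquad (5.6) \]

This follows because all of the random variables in the last three terms are mutually independent according to the assumptions and

\[ E(b_{h\ell} b_{h\ell}) = E(b_{h\ell} - 0)^2 = \mathrm{var}(b_{h\ell}) = \sigma_b^2 \]

by the assumptions.

COVARIANCE MATRIX Summarizing this information in the form of a covariance matrix, we see that

\[ \mathrm{var}(\boldsymbol{Y}_{h\ell}) = \begin{pmatrix} \sigma_b^2 + \sigma_e^2 & \sigma_b^2 & \cdots & \sigma_b^2 \\ \sigma_b^2 & \sigma_b^2 + \sigma_e^2 & \cdots & \sigma_b^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_b^2 & \sigma_b^2 & \cdots & \sigma_b^2 + \sigma_e^2 \end{pmatrix} \qquad (5.7) \]

• Actually, we could have obtained this matrix more directly by using matrix operations applied to the matrix form of (5.3). Specifically, because $b_{h\ell}$ and the elements of $\boldsymbol{e}_{h\ell}$ are independent and normal, $\boldsymbol{1} b_{h\ell}$ and $\boldsymbol{e}_{h\ell}$ are independent, multivariate normal random vectors, so that

\[ \mathrm{var}(\boldsymbol{Y}_{h\ell}) = \mathrm{var}(\boldsymbol{1} b_{h\ell}) + \mathrm{var}(\boldsymbol{e}_{h\ell}) = \boldsymbol{1}\,\mathrm{var}(b_{h\ell})\,\boldsymbol{1}' + \mathrm{var}(\boldsymbol{e}_{h\ell}). \qquad (5.8) \]

Now $\mathrm{var}(b_{h\ell}) = \sigma_b^2$. Furthermore (try it),

\[ \boldsymbol{1}\boldsymbol{1}' = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix} = \boldsymbol{J}_n \quad \text{and} \quad \mathrm{var}(\boldsymbol{e}_{h\ell}) = \sigma_e^2 \boldsymbol{I}_n; \]

applying these to (5.8) gives

\[ \mathrm{var}(\boldsymbol{Y}_{h\ell}) = \sigma_b^2 \boldsymbol{J}_n + \sigma_e^2 \boldsymbol{I}_n = \boldsymbol{\Sigma}. \qquad (5.9) \]

It is straightforward to observe by writing out (5.9) in detail that it is just a compact way, in matrix notation, to state (5.7).

• It is customary to use $\boldsymbol{J}$ to denote a square matrix of all 1s, where we add the subscript when we wish to emphasize the dimension.

• We thus see that we may summarize the assumptions of model (5.1) in matrix form: The $m$ data vectors $\boldsymbol{Y}_{h\ell}$, $h = 1, \ldots, r_\ell$, $\ell = 1, \ldots, q$, are all independent and multivariate normal with

\[ \boldsymbol{Y}_{h\ell} \sim N_n(\boldsymbol{\mu}_\ell, \boldsymbol{\Sigma}), \]

where $\boldsymbol{\Sigma}$ is given in (5.9).

COMPOUND SYMMETRY We thus see from $\boldsymbol{\Sigma}$ given in (5.7) and (5.9) that this model assumes that the covariance matrix of a random data vector has the compound symmetry or exchangeable correlation structure (see Chapter 4).

• Note that the off-diagonal elements of this matrix (the covariances among elements of $\boldsymbol{Y}_{h\ell}$) are equal to $\sigma_b^2$. Thus, if we compute the correlations, they are all the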
same and equal to (verify)

\[ \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}. \]

This is called the intraclass correlation in some contexts.

• As we noted earlier, this model says that no matter how far apart or near in time two elements of $\boldsymbol{Y}_{h\ell}$ were taken, the degree of association between them is the same. Hence, with respect to association, they are essentially interchangeable (or exchangeable).

• Moreover, the association is positive; i.e. because both $\sigma_b^2$ and $\sigma_e^2$ are variances, both are positive. Thus, the correlation, which depends on these two positive quantities, must also be positive.

• The diagonal elements of $\boldsymbol{\Sigma}$ are also all the same, implying that the variance of each element of $\boldsymbol{Y}_{h\ell}$ is the same.

• This covariance structure is a special case of something called a Type H covariance structure. More on this later.

• As we have noted previously, the compound symmetric structure may be a rather restrictive assumption for longitudinal data, as it tends to emphasize among-unit sources of variation. If the within-unit source of correlation (due to fluctuations) is non-negligible, this may be a poor representation. Thus, assuming the model (5.1) implies this fairly restrictive assumption on the nature of variation within a data vector.

• The implied covariance matrix (5.7) is the same for all units, regardless of group.

As we mentioned earlier, using model (5.1) as the basis for analyzing longitudinal data is quite common but may be inappropriate. We now see why: the model implies a restrictive and possibly unrealistic assumption about correlation among observations on the same unit over time.

ALTERNATIVE NOTATION We may in fact write the model in our previous notation. Note that $h$ indexes units within groups, and $\ell$ indexes groups, for a total of $m = \sum_{\ell=1}^{q} r_\ell$ units. We could thus reindex units by a single index, $i = 1, \ldots, m$, where the value of $i$ for any given unit is determined by its unique values of $h$ and $\ell$. We could reindex $b_{h\ell}$ and $\boldsymbol{e}_{h\ell}$ in the same way. Thus, let $\boldsymbol{Y}_i$, $i = 1, \ldots, m$, i.e. $\boldsymbol{Y}_i = (Y_{i1}, \ldots, Y_{in})'$, denote the vectors $\boldsymbol{Y}_{h\ell}$, $h = 1, \ldots, r_\ell$, $\ell = 1, \ldots, q$, reindexed, and similarly write $b_i$ and $\boldsymbol{e}_i$.
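To make the reindexing and the common covariance structure concrete, here is a minimal NumPy sketch. The group sizes, the number of time points, and the variance components are hypothetical values chosen only for illustration; the ordering of units (all of group 1 first, then group 2, and so on) is one possible convention, not one prescribed by the notes.

```python
import numpy as np

# Hypothetical design: q = 3 groups with r_1 = 3, r_2 = 2, r_3 = 4 units, n = 4 times.
group_sizes = [3, 2, 4]
n = 4

# Reindex units (h, l) by a single index i = 1, ..., m (group 1 first, then group 2, ...).
units = [(h, l) for l, r in enumerate(group_sizes, start=1) for h in range(1, r + 1)]
index_of = {hl: i for i, hl in enumerate(units, start=1)}
m = len(units)

# Hypothetical variance components sigma_b^2 and sigma_e^2.
sigma2_b, sigma2_e = 4.0, 1.0

# Sigma = sigma_b^2 J_n + sigma_e^2 I_n as in (5.9); the same matrix for every unit i.
Sigma = sigma2_b * np.ones((n, n)) + sigma2_e * np.eye(n)

# Convert the covariance matrix to a correlation matrix.
sd = np.sqrt(np.diag(Sigma))
Corr = Sigma / np.outer(sd, sd)

# Intraclass correlation sigma_b^2 / (sigma_b^2 + sigma_e^2).
icc = sigma2_b / (sigma2_b + sigma2_e)

print("m =", m)
print("unit (h=1, l=2) has single index i =", index_of[(1, 2)])
print("off-diagonal correlations all equal icc:",
      np.allclose(Corr[~np.eye(n, dtype=bool)], icc))
```

Every off-diagonal entry of `Corr` equals the intraclass correlation regardless of how far apart the two time points are, which is the exchangeability property described above.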
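To express the model with this indexing, the information on group membership must somehow be incorporated separately, as it is no longer explicit from the indexing. To do this, it is common to write the model as follows. Let $\boldsymbol{M}$ denote the matrix of all means $\mu_{\ell j}$ implied by the model (5.1), i.e.

\[ \boldsymbol{M} = \begin{pmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \vdots & \vdots & & \vdots \\ \mu_{q1} & \mu_{q2} & \cdots & \mu_{qn} \end{pmatrix}. \qquad (5.10) \]

The $\ell$th row of the matrix $\boldsymbol{M}$ in (5.10) is thus the transpose of the mean vector $\boldsymbol{\mu}_\ell$ $(n \times 1)$, i.e. $\boldsymbol{\mu}_\ell' = (\mu_{\ell 1}, \ldots, \mu_{\ell n})$.

Also, using the new indexing system, let, for $\ell = 1, \ldots, q$,

\[ a_{i\ell} = \begin{cases} 1 & \text{if unit } i \text{ is from group } \ell, \\ 0 & \text{otherwise.} \end{cases} \]

Thus, the $a_{i\ell}$ record the information on group membership. Now let $\boldsymbol{a}_i$ be the vector $(q \times 1)$ of $a_{i\ell}$ values corresponding to the $i$th unit, i.e.

\[ \boldsymbol{a}_i' = (a_{i1}, a_{i2}, \ldots, a_{iq}); \]

because any unit may only belong to one group, $\boldsymbol{a}_i$ will be a vector of all 0s except for a 1 in the position corresponding to $i$'s group. For example, if there are $q = 3$ groups and $n = 4$ times, then

\[ \boldsymbol{M} = \begin{pmatrix} \mu_{11} & \mu_{12} & \mu_{13} & \mu_{14} \\ \mu_{21} & \mu_{22} & \mu_{23} & \mu_{24} \\ \mu_{31} & \mu_{32} & \mu_{33} & \mu_{34} \end{pmatrix}, \]

and if the $i$th unit is from group 2, then $\boldsymbol{a}_i' = (0, 1, 0)$, so that (verify)

\[ \boldsymbol{a}_i' \boldsymbol{M} = (\mu_{21}, \mu_{22}, \mu_{23}, \mu_{24}) = \boldsymbol{\mu}_i', \]

say, the mean vector for the $i$th unit. The particular elements of $\boldsymbol{\mu}_i$ are determined by the group membership of unit $i$, and are the same for all units in the same group.

Using these definitions, it is straightforward (try it) to verify that we may rewrite the model in (5.3) and (5.4) as

\[ \boldsymbol{Y}_i' = \boldsymbol{a}_i' \boldsymbol{M} + \boldsymbol{1}' b_i + \boldsymbol{e}_i', \quad i = 1, \ldots, m, \]
\[ \boldsymbol{Y}_i' = \boldsymbol{a}_i' \boldsymbol{M} + \boldsymbol{\epsilon}_i', \quad i = 1, \ldots, m. \qquad (5.11) \]

This is one standard way of writing the model when indexing units is done with a single subscript ($i$ in this case). In particular, this way of writing the model is used in the documentation for SAS PROC GLM. The convention is to put the model "on its side," which can be confusing.

Another way of writing the model that is more familiar and more germane to our later development is as follows. Let $\boldsymbol{\beta}$ be the vector of all parameters in the model (5.1) for all groups and times; i.e. all of $\mu$, the $\tau_\ell$, $\gamma_j$, and $(\tau\gamma)_{\ell j}$, $\ell = 1, \ldots, q$, $j = 1, \ldots, n$. For example, with $q = 2$ groups and $n = 3$ time points,

\[ \boldsymbol{\beta} = \big(\mu, \tau_1, \tau_2, \gamma_1, \gamma_2, \gamma_3, (\tau\gamma)_{11}, (\tau\gamma)_{12}, (\tau\gamma)_{13}, (\tau\gamma)_{21}, (\tau\gamma)_{22}, (\tau\gamma)_{23}\big)'. \]

Now $E(\boldsymbol{Y}_i) = \boldsymbol{\mu}_i$. If, for example, $i$ is in group 2, then

\[ \boldsymbol{\mu}_i = \begin{pmatrix} \mu_{21} \\ \mu_{22} \\ \mu_{23} \end{pmatrix} = \begin{pmatrix} \mu + \tau_2 + \gamma_1 + (\tau\gamma)_{21} \\ \mu + \tau_2 + \gamma_2 + (\tau\gamma)_{22} \\ \mu + \tau_2 + \gamma_3 + (\tau\gamma)_{23} \end{pmatrix}. \]

Note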
that if we define

\[ \boldsymbol{X}_i = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \]

then (verify) we can write $\boldsymbol{\mu}_i = \boldsymbol{X}_i \boldsymbol{\beta}$. Thus, in general, we see that if we define $\boldsymbol{\beta}$ and $\boldsymbol{X}_i$ appropriately, we can write the model as

\[ \boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{1} b_i + \boldsymbol{e}_i \quad \text{or} \quad \boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{\epsilon}_i, \quad i = 1, \ldots, m. \]

$\boldsymbol{X}_i$ would be the appropriate matrix of 0s and 1s, and would be the same for each $i$ in the same group.

PARAMETERIZATION Just as with any model of this type, we note that representing the means $\mu_{\ell j}$ in terms of parameters $\mu$, $\tau_\ell$, $\gamma_j$, and $(\tau\gamma)_{\ell j}$ leads to a model that is overparameterized. That is, while we do have enough information to figure out how the means $\mu_{\ell j}$ differ, we do not have enough information to figure out how they break down into all of these components. For example, if we had 2 treatment groups, we can't tell where all of $\mu$, $\tau_1$, and $\tau_2$ ought to be just from the information at hand. To see what we mean, suppose we knew that $\mu + \tau_1 = 20$ and $\mu + \tau_2 = 10$. Then one way this could happen is if

\[ \mu = 15, \quad \tau_1 = 5, \quad \tau_2 = -5; \]

another way is

\[ \mu = 12, \quad \tau_1 = 8, \quad \tau_2 = -2; \]

in fact, we could write zillions of more ways. Equivalently, this issue may also be seen by realizing that the matrix $\boldsymbol{X}_i$ is not of full rank.

Thus, the point is that, although this type of representation of a mean $\mu_{\ell j}$, used in the context of analysis of variance, is convenient for helping us think about effects of different factors as deviations from an "overall" mean, we can't identify all of these components. In order to identify them, it is customary to impose constraints that make the representation unique by forcing only one of the possible zillions of ways to hold:

\[ \sum_{\ell=1}^{q} \tau_\ell = 0, \quad \sum_{j=1}^{n} \gamma_j = 0, \quad \sum_{\ell=1}^{q} (\tau\gamma)_{\ell j} = 0 = \sum_{j=1}^{n} (\tau\gamma)_{\ell j} \ \text{ for all } j, \ell. \]

Imposing these constraints is equivalent to redefining the vector of parameters $\boldsymbol{\beta}$ and the matrices $\boldsymbol{X}_i$ so that $\boldsymbol{X}_i$ will always be a full rank matrix for all $i$.

REGRESSION INTERPRETATION The interesting feature of this representation is that it looks like we have a set of $m$ regression models, indexed by $i$, each with its own design matrix $\boldsymbol{X}_i$ and deviations $\boldsymbol{\epsilon}_i$. We will see later that more flexible models for repeated
measurements are also of this form; thus, writing (5.1) this way will allow us to compare different models and methods directly.

Regardless of how we write the model, it is important to remember that a key assumption of the model is that all data vectors are multivariate normal with the same covariance matrix having a very specific form; i.e. with this indexing, we have

\[ \boldsymbol{Y}_i \sim N_n(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}), \quad \boldsymbol{\Sigma} = \sigma_b^2 \boldsymbol{J}_n + \sigma_e^2 \boldsymbol{I}_n. \]

5.3 Questions of interest and statistical hypotheses

We now focus on how questions of scientific interest may be addressed in the context of such a model for longitudinal data. Recall that we may write the model as in (5.11), i.e.

\[ \boldsymbol{Y}_i' = \boldsymbol{a}_i' \boldsymbol{M} + \boldsymbol{\epsilon}_i', \quad i = 1, \ldots, m, \qquad (5.12) \]

where

\[ \boldsymbol{M} = \begin{pmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \vdots & \vdots & & \vdots \\ \mu_{q1} & \mu_{q2} & \cdots & \mu_{qn} \end{pmatrix} \]

and

\[ \mu_{\ell j} = \mu + \tau_\ell + \gamma_j + (\tau\gamma)_{\ell j}. \qquad (5.13) \]

The constraints on $\tau_\ell$, $\gamma_j$, and $(\tau\gamma)_{\ell j}$ are assumed to hold.

The model (5.12) is sometimes written succinctly as

\[ \mathcal{Y} = \boldsymbol{A}\boldsymbol{M} + \boldsymbol{\epsilon}, \qquad (5.14) \]

where $\mathcal{Y}$ is the $(m \times n)$ matrix with $i$th row $\boldsymbol{Y}_i'$ and similarly for $\boldsymbol{\epsilon}$, and $\boldsymbol{A}$ is the $(m \times q)$ matrix with $i$th row $\boldsymbol{a}_i'$. We will not make direct use of this way of writing the model; we point it out as it is the way the model is often written in texts on general multivariate models. It is also the way the model is referred to in the documentation for PROC GLM in the SAS software package.

GROUP BY TIME INTERACTION As we have noted, a common objective in the analysis of longitudinal data is to assess whether the way in which the response changes over time is different across treatment groups. This is usually phrased in terms of means. For example, in the dental study, is the profile of distance over time different on average for boys and girls? That is, is the pattern of change in mean response different for different groups?

This is best illustrated by picture. For the case of $q = 2$ groups and $n = 3$ time points, Figure 3 shows two possible scenarios. In each panel, the lines represent the mean responses $\mu_{\ell j}$ for each group. In both panels, the mean response is higher for group 2 than for group 1 at all time points, and the pattern of change in mean response seems to follow a straight line. However, in the
left panel, the rate of change of the mean response over time is the same for both groups; i.e. the time profiles are parallel. In the right panel, the rate of change is faster for group 2; thus, the profiles are not parallel.

Figure 3: Group by time interaction. Plotting symbol indicates group number. [Two panels; axes: mean response versus time.]

In the model, each point in the figure is represented by the form (5.13),

\[ \mu_{\ell j} = \mu + \tau_\ell + \gamma_j + (\tau\gamma)_{\ell j}. \]

Here, the terms $(\tau\gamma)_{\ell j}$ represent the special amounts by which the mean for group $\ell$ at time $j$ may differ from the overall mean. The difference in mean between groups 1 and 2 at any specific time $j$ is, under the model,

\[ \mu_{1j} - \mu_{2j} = (\tau_1 - \tau_2) + \{(\tau\gamma)_{1j} - (\tau\gamma)_{2j}\}. \]

Thus, the terms $(\tau\gamma)_{\ell j}$ allow for the possibility that the difference between groups may be different at different times, as in the right panel of Figure 3: the amount $\{(\tau\gamma)_{1j} - (\tau\gamma)_{2j}\}$ is specific to the particular time $j$.

Now, if the $(\tau\gamma)_{\ell j}$ were all the same, the difference would reduce to

\[ \mu_{1j} - \mu_{2j} = \tau_1 - \tau_2, \]

as the second piece would be equal to zero. Here, the difference in mean response between groups is the same at all time points and equal to $\tau_1 - \tau_2$ (which does not depend on $j$). This is the situation of the left panel of Figure 3.

Under the constraints

\[ \sum_{\ell=1}^{q} (\tau\gamma)_{\ell j} = 0 = \sum_{j=1}^{n} (\tau\gamma)_{\ell j} \ \text{ for all } \ell, j, \]

if the $(\tau\gamma)_{\ell j}$ are all the same for all $\ell, j$, then it must be that $(\tau\gamma)_{\ell j} = 0$ for all $\ell, j$. Thus, if we wished to discern between a situation like that in the left panel (parallel profiles) and that in the right panel (lack of parallelism), addressing the issue of a common rate of change over time, we could state the null hypothesis as

\[ H_0: \text{all } (\tau\gamma)_{\ell j} = 0. \]

There are $qn$ total parameters $(\tau\gamma)_{\ell j}$; however, if the constraints above hold, then having $(q-1)(n-1)$ of the $(\tau\gamma)_{\ell j}$ equal to 0 automatically requires the remaining ones to be zero as well. Thus, the hypothesis is really one about the behavior of $(q-1)(n-1)$ parameters, hence there are $(q-1)(n-1)$ degrees of freedom associated with this hypothesis.

GENERAL FORM OF HYPOTHESES It turns out that, with the model expressed in the form (5.12), it is possible to express $H_0$ and other hypotheses of scientific interest in a unified way. This unified expression is not necessary to appreciate the hypotheses of interest; however, it is used in many texts on the subject and in the documentation for PROC GLM in SAS, so we digress for a moment to describe it.

Specifically, noting that $\boldsymbol{M}$ is the matrix whose rows are the mean vectors for the different treatment groups, it is possible to write formal statistical hypotheses as linear functions of the elements of $\boldsymbol{M}$. Let

• $\boldsymbol{C}$ be a $(c \times q)$ matrix with $c \le q$ of full rank.

• $\boldsymbol{U}$ be a $(n \times u)$ matrix with $u \le n$ of full rank.

Then it turns out that the null hypothesis corresponding to questions of scientific interest may be written in the form

\[ H_0: \boldsymbol{C}\boldsymbol{M}\boldsymbol{U} = \boldsymbol{0}. \]

Depending on the choice of the matrices $\boldsymbol{C}$ and $\boldsymbol{U}$, the linear function $\boldsymbol{C}\boldsymbol{M}\boldsymbol{U}$ of the elements of $\boldsymbol{M}$ (the individual means for different groups at different time points) may be made to address these different questions.

We now exhibit this for $H_0$ for the group by time interaction. For definiteness, consider the situation where there are $q = 2$ groups and $n = 3$ time points. Consider

\[ \boldsymbol{C} = (1, -1), \]

so that $c = 1 = q - 1$. Then note that

\[ \boldsymbol{C}\boldsymbol{M} = (1, -1) \begin{pmatrix} \mu_{11} & \mu_{12} & \mu_{13} \\ \mu_{21} & \mu_{22} & \mu_{23} \end{pmatrix} = (\mu_{11} - \mu_{21},\ \mu_{12} - \mu_{22},\ \mu_{13} - \mu_{23}) \]
\[ = \big(\tau_1 - \tau_2 + (\tau\gamma)_{11} - (\tau\gamma)_{21},\ \tau_1 - \tau_2 + (\tau\gamma)_{12} - (\tau\gamma)_{22},\ \tau_1 - \tau_2 + (\tau\gamma)_{13} - (\tau\gamma)_{23}\big). \]

Thus, this $\boldsymbol{C}$ matrix has the effect of taking differences among groups. Now let

\[ \boldsymbol{U} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{pmatrix}, \]

so that $u = 2 = n - 1$. It is straightforward (try it) to show that

\[ \boldsymbol{C}\boldsymbol{M}\boldsymbol{U} = (\mu_{11} - \mu_{21} - \mu_{12} + \mu_{22},\ \mu_{12} - \mu_{22} - \mu_{13} + \mu_{23}) \]
\[ = \big((\tau\gamma)_{11} - (\tau\gamma)_{21} - (\tau\gamma)_{12} + (\tau\gamma)_{22},\ (\tau\gamma)_{12} - (\tau\gamma)_{22} - (\tau\gamma)_{13} + (\tau\gamma)_{23}\big). \]

It is an exercise in algebra to verify that, under the constraints, if each of these elements equals zero, then $H_0$ follows.

In the jargon associated with repeated measurements, the test for group by time interaction is sometimes called the test for parallelism. Later, we will discuss some further hypotheses involving different choices of $\boldsymbol{U}$ that allow one to
investigate different aspects of the change in mean response over time and how it differs across groups.

Generally, in the analysis of longitudinal data from different groups, testing the group by time interaction is of primary interest, as it addresses whether the change in mean response differs across groups.

It is important to recognize that parallelism does not necessarily mean that the mean response over time is restricted to look like a straight line in each group. In Figure 4, the left panel exhibits parallelism; the right panel does not.

Figure 4: Group by time interaction. Plotting symbol indicates group number. [Two panels; axes: mean response versus time.]

MAIN EFFECT OF GROUPS Clearly, if profiles are parallel, then the obvious question is whether they are in fact coincident; that is, whether, at each time point, the mean response is in fact the same.

A little thought shows that, if the profiles are parallel and furthermore coincident, then the average of the mean responses over time will be the same for each group. Asking the question of whether the average of the mean responses over time is the same for each group if the profiles are not parallel may or may not be interesting or relevant.

• For example, if the true state of affairs were that depicted in the right panels of Figures 3 and 4, whether the average of mean responses over time is different for the two groups might be interesting, as it would be reflecting the fact that the mean response for group 2 is larger at all times.

• On the other hand, consider the left panel of Figure 5. If this were the true state of affairs, a test of this issue would be meaningless; the change of mean response over time is in the opposite direction for the two groups; thus, how it averages out over time is of little importance. Because the phenomenon of interest does indeed happen over time, the average of what it does over time may be something that cannot be achieved: we can't make
time stand still!

• Similarly, if the issue under study is something like growth, the average over time of the response may have little meaning; instead, one may be interested in, for example, how different the mean response is at the end of the time period of study. For example, in the right panel of Figure 5, the mean response over time increases for each group at different rates, but has the same average over time. Clearly, the group with the faster rate will have a larger mean response at the end of the time period.

Figure 5: Group by time interaction. Plotting symbol indicates group number. [Two panels; axes: mean response versus time.]

Generally, then, whether the average of the mean response is the same across groups in a longitudinal study is of most interest in the case where the mean profiles over time are approximately parallel. For definiteness, consider the case of $q = 2$ groups and $n = 3$ time points. We are interested in whether the average of mean responses over time is the same in each group. For group $\ell$, this average is, with $n = 3$,

\[ n^{-1}(\mu_{\ell 1} + \mu_{\ell 2} + \mu_{\ell 3}) = \mu + \tau_\ell + n^{-1}(\gamma_1 + \gamma_2 + \gamma_3) + n^{-1}\{(\tau\gamma)_{\ell 1} + (\tau\gamma)_{\ell 2} + (\tau\gamma)_{\ell 3}\}. \]

Taking the difference of the averages between $\ell = 1$ and $\ell = 2$, some algebra yields (verify)

\[ \tau_1 - \tau_2 + n^{-1} \sum_{j=1}^{n} (\tau\gamma)_{1j} - n^{-1} \sum_{j=1}^{n} (\tau\gamma)_{2j}. \]

Note, however, that the constraints we impose so that the model is of full rank dictate that $\sum_{j=1}^{n} (\tau\gamma)_{\ell j} = 0$ for each $\ell$; thus, the two sums in this expression are 0 by assumption, so that we are left with $\tau_1 - \tau_2$.

Thus, the hypothesis may be expressed as

\[ H_0: \tau_1 - \tau_2 = 0. \]

Furthermore, under the constraint $\sum_{\ell=1}^{q} \tau_\ell = 0$, if the $\tau_\ell$ are equal as in $H_0$, then they must satisfy $\tau_\ell = 0$ for each $\ell$. Thus, the hypothesis may be rewritten as

\[ H_0: \tau_1 = \tau_2 = 0. \]

For general $q$ and $n$, the reasoning is the same; we have

\[ H_0: \tau_1 = \cdots = \tau_q = 0. \]

The appropriate null hypothesis that addresses this issue may also be stated in the general form $H_0: \boldsymbol{C}\boldsymbol{M}\boldsymbol{U} = \boldsymbol{0}$ for suitable choices of $\boldsymbol{C}$ and $\boldsymbol{U}$. The form of $\boldsymbol{U}$ in particular shows the interpretation as that of averaging over time.
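The algebra above can be checked numerically. The following NumPy sketch uses hypothetical parameter values for $q = 2$ and $n = 3$, chosen so that the sum-to-zero constraints hold, and takes $\boldsymbol{C} = (1, -1)$ with $\boldsymbol{U}$ a column with every entry $1/n$ as one choice of matrices that differences the groups and averages over time. None of the numbers come from the dental or diet studies.

```python
import numpy as np

# Hypothetical parameter values for q = 2 groups and n = 3 times,
# chosen so that the sum-to-zero constraints hold.
mu = 10.0
tau = np.array([1.5, -1.5])             # sums to 0 over groups
gamma = np.array([-2.0, 0.5, 1.5])      # sums to 0 over times
tg = np.array([[0.4, -0.1, -0.3],
               [-0.4, 0.1, 0.3]])       # (tau gamma)_lj; rows and columns each sum to 0

# Matrix of means M with (l, j) element mu + tau_l + gamma_j + (tau gamma)_lj, as in (5.13).
M = mu + tau[:, None] + gamma[None, :] + tg

n = M.shape[1]
C = np.array([[1.0, -1.0]])             # difference between the two groups
U_avg = np.full((n, 1), 1.0 / n)        # averages the means over time

# C M U is 1 x 1 here: the difference between the time-averaged group means.
group_contrast = (C @ M @ U_avg).item()

print("C M U         =", group_contrast)
print("tau_1 - tau_2 =", tau[0] - tau[1])
```

Under the constraints the interaction terms average to zero within each group, so the contrast agrees with $\tau_1 - \tau_2$ up to floating-point rounding.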
Continuing to take $q = 2$ and $n = 3$, let

\[ \boldsymbol{C} = (1, -1), \]

so that $c = 1 = q - 1$. Then note that

\[ \boldsymbol{C}\boldsymbol{M} = (1, -1) \begin{pmatrix} \mu_{11} & \mu_{12} & \mu_{13} \\ \mu_{21} & \mu_{22} & \mu_{23} \end{pmatrix} = (\mu_{11} - \mu_{21},\ \mu_{12} - \mu_{22},\ \mu_{13} - \mu_{23}) \]
\[ = \big(\tau_1 - \tau_2 + (\tau\gamma)_{11} - (\tau\gamma)_{21},\ \tau_1 - \tau_2 + (\tau\gamma)_{12} - (\tau\gamma)_{22},\ \tau_1 - \tau_2 + (\tau\gamma)_{13} - (\tau\gamma)_{23}\big). \]

Now let ($n = 3$ here)

\[ \boldsymbol{U} = \begin{pmatrix} 1/n \\ 1/n \\ 1/n \end{pmatrix}. \]

It is straightforward to see that, with $n = 3$,

\[ \boldsymbol{C}\boldsymbol{M}\boldsymbol{U} = \tau_1 - \tau_2 + n^{-1} \sum_{j=1}^{n} (\tau\gamma)_{1j} - n^{-1} \sum_{j=1}^{n} (\tau\gamma)_{2j}. \]

That is, this choice of $\boldsymbol{U}$ dictates an averaging operation across time. Imposing the constraints as above, we thus see that we may express $H_0$ in the form $H_0: \boldsymbol{C}\boldsymbol{M}\boldsymbol{U} = \boldsymbol{0}$ with these choices of $\boldsymbol{C}$ and $\boldsymbol{U}$.

For general $q$ and $n$, one may specify appropriate choices of $\boldsymbol{C}$ and $\boldsymbol{U}$, where the latter is a column vector of 1's implying the averaging operation across time, and arrive at the general hypothesis $H_0: \tau_1 = \cdots = \tau_q = 0$.

MAIN EFFECT OF TIME Another question of interest may be whether the mean response is in fact constant over time. If the profiles are parallel, then this is like asking whether the mean response averaged across groups is the same at each time. If the profiles are not parallel, then this may or may not be interesting. For example, note that in the left panel of Figure 5, the average of mean responses for groups 1 and 2 is the same at each time point. However, the mean response is certainly not constant across time for either group. If the groups represent things like genders, then what happens on average is something that can never be achieved.

Consider again the special case of $q = 2$ and $n = 3$. The average of mean responses across groups for time $j$ is

\[ q^{-1} \sum_{\ell=1}^{q} \mu_{\ell j} = \mu + \gamma_j + q^{-1} \sum_{\ell=1}^{q} \tau_\ell + q^{-1} \sum_{\ell=1}^{q} (\tau\gamma)_{\ell j} = \mu + \gamma_j, \]

using the constraints $\sum_{\ell=1}^{q} \tau_\ell = 0$ and $\sum_{\ell=1}^{q} (\tau\gamma)_{\ell j} = 0$. Thus, having all these averages be the same at each time is equivalent to

\[ H_0: \gamma_1 = \gamma_2 = \gamma_3. \]

Under the constraint $\sum_{j=1}^{n} \gamma_j = 0$, then, we have $H_0: \gamma_1 = \gamma_2 = \gamma_3 = 0$. For general $q$ and $n$, the hypothesis is of the form

\[ H_0: \gamma_1 = \cdots = \gamma_n = 0. \]

We may also state this hypothesis in the form $H_0: \boldsymbol{C}\boldsymbol{M}\boldsymbol{U} = \boldsymbol{0}$. In the special case $q = 2$, $n = 3$, taking

\[ \boldsymbol{U} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{pmatrix}, \quad \boldsymbol{C} = (1/2,\ 1/2) \]

gives

\[ \boldsymbol{M}\boldsymbol{U} = \begin{pmatrix} \mu_{11} & \mu_{12} & \mu_{13} \\ \mu_{21} & \mu_{22} & \mu_{23} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} \mu_{11} - \mu_{12} & \mu_{12} - \mu_{13} \\ \mu_{21} - \mu_{22} & \mu_{22} - \mu_{23} \end{pmatrix} \]
\[ = \begin{pmatrix} \gamma_1 - \gamma_2 + (\tau\gamma)_{11} - (\tau\gamma)_{12} & \gamma_2 - \gamma_3 + (\tau\gamma)_{12} - (\tau\gamma)_{13} \\ \gamma_1 - \gamma_2 + (\tau\gamma)_{21} - (\tau\gamma)_{22} & \gamma_2 - \gamma_3 + (\tau\gamma)_{22} - (\tau\gamma)_{23} \end{pmatrix}, \]

from whence it is straightforward to derive, imposing the constraints, that (verify)

\[ \boldsymbol{C}\boldsymbol{M}\boldsymbol{U} = (\gamma_1 - \gamma_2,\ \gamma_2 - \gamma_3). \]

Setting this equal to zero gives $H_0: \gamma_1 = \gamma_2 = \gamma_3$. For general $q$ and $n$, we may choose the matrices $\boldsymbol{C}$ and $\boldsymbol{U}$ in a similar fashion. Note that this type of $\boldsymbol{C}$ matrix averages across groups.

OBSERVATION These are, of course, exactly the hypotheses that one tests for a split plot experiment, where, here, time plays the role of the split plot factor and group is the whole plot factor. What is different lies in the interpretation; because time has a natural ordering (longitudinal), what is interesting may be different; as noted above, of primary interest is whether the change in mean response is different over the levels of time. We will see more on this shortly.

5.4 Analysis of variance

Given the fact that the statistical model and hypotheses in this setup are identical to those of a split plot experiment, it should come as no surprise that the analysis performed is identical. That is, under the assumption that the model (5.1) is correct and that the observations are normally distributed, it is possible to show that the usual F ratios one would construct under the usual principles of analysis of variance provide the basis for valid tests of the hypotheses above. We write out the analysis of variance table here using the original notation with three subscripts, i.e. $Y_{h\ell j}$ represents the measurement at the $j$th time on the $h$th unit in the $\ell$th group.

Define

• $\bar{Y}_{h\ell\cdot} = n^{-1} \sum_{j=1}^{n} Y_{h\ell j}$, the sample average over time for the $h$th unit in the $\ell$th group (over all observations on this unit)

• $\bar{Y}_{\cdot\ell j} = r_\ell^{-1} \sum_{h=1}^{r_\ell} Y_{h\ell j}$, the sample average at time $j$ in group $\ell$ over all units

• $\bar{Y}_{\cdot\ell\cdot} = (r_\ell n)^{-1} \sum_{h=1}^{r_\ell} \sum_{j=1}^{n} Y_{h\ell j}$, the sample average of all observations in group $\ell$

• $\bar{Y}_{\cdot\cdot j} = m^{-1} \sum_{\ell=1}^{q} \sum_{h=1}^{r_\ell} Y_{h\ell j}$, the sample average of all observations at the $j$th time

• $\bar{Y}_{\cdots}$ = the average of all $mn$ observations.

Let

\[ SS_G = \sum_{\ell=1}^{q} n r_\ell (\bar{Y}_{\cdot\ell\cdot} - \bar{Y}_{\cdots})^2, \quad SS_{Tot,U} = n \sum_{\ell=1}^{q} \sum_{h=1}^{r_\ell} (\bar{Y}_{h\ell\cdot} - \bar{Y}_{\cdots})^2, \]
\[ SS_T = m \sum_{j=1}^{n} (\bar{Y}_{\cdot\cdot j} - \bar{Y}_{\cdots})^2, \quad SS_{GT} = \sum_{j=1}^{n} \sum_{\ell=1}^{q} r_\ell (\bar{Y}_{\cdot\ell j} - \bar{Y}_{\cdots})^2 - SS_T - SS_G, \]
\[ SS_{Tot,all} = \sum_{\ell=1}^{q} \sum_{h=1}^{r_\ell} \sum_{j=1}^{n} (Y_{h\ell j} - \bar{Y}_{\cdots})^2. \]

Then the following analysis of variance table is usually constructed.

Source               SS                      DF              MS          F
Among Groups         $SS_G$                  $q-1$           $MS_G$      $F_G = MS_G/MS_{EU}$
Among-unit Error     $SS_{Tot,U} - SS_G$     $m-q$           $MS_{EU}$
Time                 $SS_T$                  $n-1$           $MS_T$      $F_T = MS_T/MS_E$
Group × Time         $SS_{GT}$               $(q-1)(n-1)$    $MS_{GT}$   $F_{GT} = MS_{GT}/MS_E$
Within-unit Error    $SS_E$                  $(m-q)(n-1)$    $MS_E$
Total                $SS_{Tot,all}$          $nm-1$

where $SS_E = SS_{Tot,all} - SS_{GT} - SS_T - SS_{Tot,U}$.

"ERROR" Keep in mind that, although it is traditional to use the term "error" in analysis of variance, the among-unit error term includes variation due to among-unit (biological) variation and the within-unit error term includes variation due to both fluctuations and measurement error.

F RATIOS It may be shown that, as long as the model is correct and the observations are normally distributed, the F ratios in the above table do indeed have sampling distributions that are F distributions under the null hypotheses discussed above. It is instructive to state this another way. If we think of the data in terms of vectors, then this is equivalent to saying that we require that

\[ \boldsymbol{Y}_i \sim N_n(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}), \quad \boldsymbol{\Sigma} = \sigma_b^2 \boldsymbol{J}_n + \sigma_e^2 \boldsymbol{I}_n. \qquad (5.15) \]

That is, as long as the data vectors are multivariate normal and exhibit the compound symmetry covariance structure, then the F ratios above, which may be seen to be based on calculations on individual observations, do indeed have sampling distributions that are F with the obvious degrees of freedom.

EXPECTED MEAN SQUARES In fact, under (5.15), it is possible to derive the expectations of the mean squares in the table. That is, we find the average, over all data sets we might have ended up with, of the MSs that are used to construct the F ratios by applying the expectation operator to each expression (which is a function of the data). The calculations are messy (one place where they are done is in section 3.3 of Crowder and Hand, 1990), so we do not show them here. The following summarizes the expected mean squares under (5.15).

Source               MS          Expected mean square
Among Groups         $MS_G$      $\sigma_e^2 + n\sigma_b^2 + n \sum_{\ell=1}^{q} r_\ell \tau_\ell^2 / (q-1)$
Among-unit Error     $MS_{EU}$   $\sigma_e^2 + n\sigma_b^2$
Time                 $MS_T$      $\sigma_e^2 + m \sum_{j=1}^{n} \gamma_j^2 / (n-1)$
Group × Time         $MS_{GT}$   $\sigma_e^2 + \sum_{\ell=1}^{q} r_\ell \sum_{j=1}^{n} (\tau\gamma)_{\ell j}^2 / \{(q-1)(n-1)\}$
Within-unit Error    $MS_E$      $\sigma_e^2$

It is critical to recognize that these calculations are only valid if the model is correct, i.e. if (5.15) holds.

Inspection of the expected mean squares shows informally that we expect the F ratios in the analysis of variance table to test the appropriate issues. For example, we would expect $F_{GT}$ to be large if the $(\tau\gamma)_{\ell j}$ were not all zero. Note that $F_G$ uses the appropriate denominator; intuitively, because we base our assessment on averages across all units and time points, we would wish to compare the mean square for groups against an "error term" that takes into account all sources of variation among observations we have on the units: both that attributable to the fact that units vary in the population ($\sigma_b^2$) and that attributable to the fact that individual observations vary within units ($\sigma_e^2$). The other two tests are on features that occur within units; thus, the denominator takes account of the relevant source of variation, that within units ($\sigma_e^2$).

We thus have the following test procedures.

• Test of the Group by Time interaction (parallelism).

\[ H_0: (\tau\gamma)_{\ell j} = 0 \text{ for all } j, \ell \quad \text{vs.} \quad H_1: \text{at least one } (\tau\gamma)_{\ell j} \ne 0. \]

A valid test rejects $H_0$ at level of significance $\alpha$ if

\[ F_{GT} > \mathcal{F}_{(q-1)(n-1),\,(n-1)(m-q),\,\alpha} \]

or, equivalently, if the probability is less than $\alpha$ that one would see a value of the test statistic as large or larger than $F_{GT}$ if $H_0$ were true (that is, the p-value is less than $\alpha$).

• Test of Main effect of Time (constancy).

\[ H_0: \gamma_j = 0 \text{ for all } j \quad \text{vs.} \quad H_1: \text{at least one } \gamma_j \ne 0. \]

A valid test rejects $H_0$ at level $\alpha$ if

\[ F_T > \mathcal{F}_{n-1,\,(n-1)(m-q),\,\alpha} \]

or, equivalently, if the probability is less than $\alpha$ that one would see a value of the test statistic as large or larger than $F_T$ if $H_0$ were true.

• Test of Main effect of Group (coincidence).

\[ H_0: \tau_\ell = 0 \text{ for all } \ell \quad \text{vs.} \quad H_1: \text{at least one } \tau_\ell \ne 0. \]

A valid test rejects $H_0$ at level of significance $\alpha$ if $F_G > \mathcal{F}_{q-1,\,m-q,\,\alpha}$ or, equivalently,
if the probability is less than $\alpha$ that one would see a value of the test statistic as large or larger than $F_G$ if $H_0$ were true.

In the above, $F_{a,b,\alpha}$ is the critical value corresponding to $\alpha$ for an F distribution with $a$ numerator and $b$ denominator degrees of freedom.

In section 5.8 we show how one may use SAS PROC GLM to perform these calculations.

5.5 Violation of covariance matrix assumption

In the previous section, we emphasized that the procedures based on the analysis of variance are only valid if the assumption of compound symmetry holds for the covariance matrix of a data vector. In reality, these procedures are still valid under slightly more general conditions. However, the important issue remains that the covariance matrix must be of a special form; if it is not, the tests above will be invalid and may lead to erroneous conclusions. That is, the F ratios $F_T$ and $F_{GT}$ will no longer have exactly an F distribution.

An $(n \times n)$ matrix $\Sigma$ is said to be of Type H if it may be written in the form

$$\Sigma = \begin{pmatrix} \lambda+2\alpha_1 & \alpha_1+\alpha_2 & \cdots & \alpha_1+\alpha_n \\ \alpha_2+\alpha_1 & \lambda+2\alpha_2 & \cdots & \alpha_2+\alpha_n \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_n+\alpha_1 & \alpha_n+\alpha_2 & \cdots & \lambda+2\alpha_n \end{pmatrix}. \tag{5.16}$$

It is straightforward to convince yourself that a matrix that exhibits compound symmetry is of Type H (take $\alpha_1 = \cdots = \alpha_n$).

It is possible to show, although we will not pursue this here, that, as long as the data vectors $Y_{h\ell}$ are multivariate normal with common covariance matrix $\Sigma$ that is of the form (5.16), the F tests discussed above will be valid. Thus, because (5.16) includes the compound symmetry assumption as a special case, these F tests will be valid if model (5.1) holds (along with normality).

• If the covariance matrix $\Sigma$ is not of Type H, but these F tests are conducted nonetheless, they will be too liberal; that is, they will tend to reject the null hypothesis more often than they should.
• Thus, one possible consequence of using the analysis of variance procedures when they are not appropriate is to conclude that group by time interactions exist when they really don't.

TEST OF SPHERICITY: It is thus of interest to be able to test
whether the true covariance structure of data vectors in a repeated measurement context is indeed of Type H. One such test is known as Mauchly's test for sphericity. The form and derivation of this test are beyond the scope of our discussion here; a description of the test is given by Vonesh and Chinchilli (1997, p. 85), for example. This test provides a test statistic for testing the null hypothesis

$$H_0: \Sigma \text{ is of Type H},$$

where $\Sigma$ is the true covariance matrix of a data vector.

The test statistic, which we do not give here, has approximately a $\chi^2$ (chi-square) distribution when the number of units $m$ on test is large, with degrees of freedom equal to $(n-2)(n+1)/2$. Thus, the test is performed at level of significance $\alpha$ by comparing the value of the test statistic to the $\chi^2_\alpha$ critical value with $(n-2)(n+1)/2$ degrees of freedom. SAS PROC GLM may be instructed to compute this test when repeated measurement data are being analyzed; this is shown in section 5.8.

The test has some limitations.

• It is not very powerful when the numbers of units in each group is not large.
• It can be misleading if the data vectors really do not have a multivariate normal distribution.

These limitations are one of the reasons we do not discuss the test in more detail; it may be of limited practical value.

In section 5.7, we will discuss one approach to handling the problem of what to do if the null hypothesis is rejected or if one is otherwise dubious about the assumption of Type H covariance.

5.6 Specialized within-unit hypotheses and tests

The hypotheses of group by time interaction (parallelism) and main effect of time have to do with questions about what happens over time; as time is a within-unit factor, these tests are often referred to as focusing on within-unit issues. These hypotheses address these issues in an overall sense; for example, the group by time interaction hypothesis asks whether the pattern of mean response over time is different for different groups.

Often, it is of interest to carry
out a more detailed study of specific aspects of how the mean response behaves over time, as we now describe. We first review the following definition.

CONTRASTS: Formally, if $c$ is a $(n \times 1)$ vector and $\mu$ is a $(n \times 1)$ vector of means, then the linear combination $c'\mu = \mu'c$ is called a contrast if $c$ is such that its elements sum to zero. Contrasts are of interest in the sense that hypotheses about differences of means can be expressed in terms of them. In particular, if $c'\mu = 0$, there is no difference.

For example, consider $q = 2$ and $n = 3$. The contrasts

$$\mu_{11}-\mu_{12} \quad\text{and}\quad \mu_{21}-\mu_{22} \tag{5.17}$$

compare the mean response at the first and second time points for each of the 2 groups; similarly, the contrasts

$$\mu_{12}-\mu_{13} \quad\text{and}\quad \mu_{22}-\mu_{23} \tag{5.18}$$

compare the mean response at the second and third time points for each group. Thus, these contrasts address the issue of how the mean differs from one time to the next in each group. Recalling

$$\mu_1' = (\mu_{11}, \mu_{12}, \mu_{13}), \qquad \mu_2' = (\mu_{21}, \mu_{22}, \mu_{23}),$$

we see that the contrasts in (5.17) result from postmultiplying these mean vectors for each group by the vector $c = (1, -1, 0)'$.

Specialized questions of interest pertaining to how the mean differs from one time to the next may then be stated.

• We may be interested in whether the way in which the mean differs from, say, time 1 to time 2 is different for different groups. This is clearly part of the overall group by time interaction, focusing particularly on what happens between times 1 and 2. For our two groups, we would thus be interested in the difference of the contrasts in (5.17). We may equally well wish to know whether the way in which the mean differs from time 2 to time 3 is different across groups; this is of course also a part of the group by time interaction and is represented formally by the difference of the contrasts in (5.18).
• We may be interested in whether there is a difference in mean from, say, time 1 to time 2, averaged across groups. This is clearly part of the main effect of time and would be formally represented by averaging the contrasts in (5.17). For times 2 and 3, we
would be interested in the average of the contrasts in (5.18).

Specifying these specific contrasts and then considering their differences among groups or averages across groups is a way of "picking apart" how the overall group by time effect and main effect of time occur and can thus provide additional insight on how and whether things change over time.

It turns out that we may express such contrasts succinctly through the representation $CMU$; indeed, this is the way in which such specialized hypotheses are presented in the documentation for PROC GLM in SAS. To obtain the contrasts in (5.17) and (5.18) in the case $q = 2$ and $n = 3$, consider the $n \times (n-1)$ matrix

$$U = \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{pmatrix}.$$

Then note that

$$MU = \begin{pmatrix} \mu_{11} & \mu_{12} & \mu_{13} \\ \mu_{21} & \mu_{22} & \mu_{23} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} \mu_{11}-\mu_{12} & \mu_{12}-\mu_{13} \\ \mu_{21}-\mu_{22} & \mu_{22}-\mu_{23} \end{pmatrix}. \tag{5.19}$$

Each element of the resulting matrix is one of the above contrasts. This choice of the contrast matrix $U$ thus summarizes contrasts that have to do with differences in means from one time to the next. Each column represents a different possible contrast of this type.

Note that the same matrix $U$ would be applicable for larger $q$; the important point is that it has $n-1$ columns, each of which applies one of the $n-1$ possible comparisons of a mean at a particular time to the mean at the subsequent time. For general $n$, the matrix would have the form

$$U = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ -1 & 1 & \cdots & 0 \\ 0 & -1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & 1 \\ 0 & 0 & \cdots & -1 \end{pmatrix}, \tag{5.20}$$

with $n$ rows and $n-1$ columns. Postmultiplication of $M$ by the general form of contrast matrix $U$ in (5.20) is often called the profile transformation of within-unit means.

Other contrasts may be of interest. Instead of asking what happens from one time to the next, we may focus on how the mean at each time differs from what happens over all subsequent times. This may help us to understand at what point in time things seem to change, if they do.

For example, taking $q = 2$ and $n = 4$, consider the contrast

$$\mu_{11} - (\mu_{12}+\mu_{13}+\mu_{14})/3.$$

This contrast compares, for group 1, the mean at time 1 to the average of the means at all other times.
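As a numerical aside, the two kinds of contrast met so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the SAS analysis in these notes: the mean matrix M below holds made-up values for q = 2 groups and n = 4 times, and the helper names profile_matrix and matmul are hypothetical.

```python
from fractions import Fraction


def profile_matrix(n):
    """Return the n x (n-1) profile contrast matrix as a list of rows."""
    U = [[0] * (n - 1) for _ in range(n)]
    for k in range(n - 1):
        U[k][k] = 1        # coefficient of the mean at time k+1
        U[k + 1][k] = -1   # coefficient of the mean at time k+2
    return U


def matmul(A, B):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Hypothetical mean matrix M (rows = groups, columns = times); values are made up.
M = [[20, 22, 23, 26],
     [21, 22, 22, 23]]

# Row l of MU holds the successive differences of the means for group l.
MU = matmul(M, profile_matrix(4))
print(MU)  # [[-2, -1, -3], [-1, 0, -1]]

# Contrast of the mean at time 1 against the average of the later means.
c1 = [Fraction(1), Fraction(-1, 3), Fraction(-1, 3), Fraction(-1, 3)]
first_vs_rest = [sum(c * mu for c, mu in zip(c1, row)) for row in M]
print(first_vs_rest)
```

Exact fractions are used for the second contrast so that the check that its coefficients sum to zero does not depend on floating-point rounding.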
Similarly,

$$\mu_{12} - (\mu_{13}+\mu_{14})/2$$

compares, for group 1, the mean at time 2 to the average of those at subsequent times. The final contrast of this type for group 1 is

$$\mu_{13} - \mu_{14},$$

which compares what happens at time 3 to the "average" of what comes next, which is the single mean at time 4. We may similarly specify such contrasts for the other group.

We may express all such contrasts by a different contrast matrix $U$. In particular, let

$$U = \begin{pmatrix} 1 & 0 & 0 \\ -1/3 & 1 & 0 \\ -1/3 & -1/2 & 1 \\ -1/3 & -1/2 & -1 \end{pmatrix}. \tag{5.21}$$

Then, if $q = 2$ (verify),

$$MU = \begin{pmatrix} \mu_{11}-\mu_{12}/3-\mu_{13}/3-\mu_{14}/3 & \mu_{12}-\mu_{13}/2-\mu_{14}/2 & \mu_{13}-\mu_{14} \\ \mu_{21}-\mu_{22}/3-\mu_{23}/3-\mu_{24}/3 & \mu_{22}-\mu_{23}/2-\mu_{24}/2 & \mu_{23}-\mu_{24} \end{pmatrix},$$

which expresses all such contrasts; the first row gives the ones for group 1 listed above.

For general $n$, the $n \times (n-1)$ matrix whose columns define contrasts of this type is the so-called Helmert transformation matrix, of the form

$$U = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ -1/(n-1) & 1 & 0 & \cdots & 0 \\ -1/(n-1) & -1/(n-2) & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1/(n-1) & -1/(n-2) & -1/(n-3) & \cdots & 1 \\ -1/(n-1) & -1/(n-2) & -1/(n-3) & \cdots & -1 \end{pmatrix}. \tag{5.22}$$

Postmultiplication of $M$ by a matrix of the form (5.22) results in contrasts representing comparisons of each mean against the average of means at all subsequent times.

It is straightforward to verify (try it) that, with $n = 3$ and $q = 2$, this transformation would lead to

$$MU = \begin{pmatrix} \mu_{11}-\mu_{12}/2-\mu_{13}/2 & \mu_{12}-\mu_{13} \\ \mu_{21}-\mu_{22}/2-\mu_{23}/2 & \mu_{22}-\mu_{23} \end{pmatrix}. \tag{5.23}$$

How do we use all of this?

OVERALL TESTS: We have already seen the use of the $CMU$ representation for the overall tests of group by time interaction and main effect of time. The contrast matrices $U$ underlying both (5.19) (profile) and (5.23) (Helmert) contain sets of $n-1$ contrasts that "pick apart" all possible differences in means over time in different ways. Thus, intuitively, we would expect that either one of them would lead us to the overall tests for group by time interaction and main effect of time, given the right $C$ matrix (one that takes differences over groups or one that averages over groups, respectively).

This is indeed the case. It may be shown that premultiplication of either (5.19) or (5.23) by the same matrix $C$ will lead to the same overall hypotheses in terms of the model components $\gamma_j$ and
$(\tau\gamma)_{\ell j}$. For example, we already saw that premultiplying (5.19) by $C = (1/2, 1/2)$ gives, with the constraints on $(\tau\gamma)_{\ell j}$,

$$CMU = (\gamma_1-\gamma_2,\ \gamma_2-\gamma_3) = 0.$$

It may be shown that premultiplying (5.23) by the same matrix $C$ yields (try it)

$$CMU = (\gamma_1 - 0.5\gamma_2 - 0.5\gamma_3,\ \gamma_2-\gamma_3) = 0.$$

It is straightforward to verify that these both imply the same thing, namely, that we are testing $\gamma_1 = \gamma_2 = \gamma_3$.

OVERALL TESTS: This shows the general phenomenon that the choice of the matrix of contrasts $U$ is not important for dictating the general tests of Time main effects and Group by Time interaction. As long as the matrix is such that it yields differences of mean responses at different times, it will give the same form of the overall hypotheses. The choice of $U$ matrix is important when we are interested in "picking apart" these overall effects, as above.

We now return to how we might represent hypotheses for and conduct tests of issues like those laid out earlier in this section for a given contrast matrix $U$ of interest. Premultiplication of $U$ by $M$ will yield the $q \times (n-1)$ matrix $MU$ whose $\ell$th row contains whatever contrasts are of interest (dictated by the columns of $U$) for group $\ell$.

• If we premultiply $MU$ by the $(q-1) \times q$ matrix

$$C = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}$$

(we considered earlier the special case where $q = 2$), then for each contrast defined in $U$, the result is to consider how that contrast differs across groups. The contrast considers a specific part of the way that mean response differs among the times, so is a component of the Group by Time interaction (how the difference in mean across groups is different at different times).

• If we premultiply by $C = (1/q, 1/q, \ldots, 1/q)$, each of the $n-1$ elements of the resulting $1 \times (n-1)$ matrix corresponds to the average of each of these contrasts over groups, which all together constitute the Time main effect. If we consider one of these elements on its own (for the Helmert transformation, say), we see that it represents the contrast of mean response at time $j$ to average mean response at all times after $j$, averaged across groups. If that contrast
were equal to zero, it would say that, averaged across groups, the mean response at time $j$ is equal to the average of subsequent mean responses.

As we noted earlier, we may wish to look at each of these separately to explore particular aspects of how the mean response over time behaves. That is, we may wish to consider separate hypothesis tests addressing these issues.

SEPARATE TESTS: Carrying out separate hypothesis tests for each contrast in $U$ may be accomplished operationally as follows. Consider the $k$th column of $U$, $c_k$, $k = 1, \ldots, n-1$.

• Apply the function dictated by that column of $U$ to each unit's data vector. That is, for each vector $Y_{h\ell}$, the operation implied is

$$Y_{h\ell}'c_k = c_k'Y_{h\ell}.$$

This "distills" down the repeated measurements on each unit to a single number representing the value of the contrast for that unit. If each unit's data vector has the same covariance matrix $\Sigma$, then each of these "distilled" data values has the same variance across all units (see below).

• Perform analyses on the resulting "data"; e.g., to test whether the contrast differs across groups, one may conduct a usual one-way analysis of variance on these "data." To test whether the contrast is zero averaged across groups, test whether the overall mean of the "data" is equal to zero using a standard t test (or, equivalently, the F test based on the square of the t statistic).

• These tests will be valid regardless of whether compound symmetry holds; all that matters is that $\Sigma$, whatever it is, is the same for all units. The variance of a distilled data value $c_k'Y_{h\ell}$ for the $h$th unit in group $\ell$ is

$$\mathrm{var}(c_k'Y_{h\ell}) = c_k'\Sigma c_k.$$

This is a constant for all $h$ and $\ell$ as long as $\Sigma$ is the same. Thus, the usual assumption of constant variance that is necessary for a one-way analysis of variance is fulfilled for the "data" corresponding to each contrast.

ORTHOGONAL CONTRASTS: In some instances, note that the contrasts making up one of these transformation matrices have an additional property. Specifically, if $c_1$ and $c_2$ are any two columns for
the matrix, then if $c_1'c_2 = 0$, i.e. the sum of the products of corresponding elements of the two columns is zero, the vectors $c_1$ and $c_2$ are said to be orthogonal. The contrasts corresponding to these vectors are said to be orthogonal contrasts.

• The contrasts making up the profile transformation are not orthogonal (verify).
• The contrasts making up the Helmert transformation are orthogonal (verify).

The advantage of having a transformation whose contrasts are orthogonal is as follows.

NORMALIZED ORTHOGONAL CONTRASTS: For a set of orthogonal contrasts, the separate tests for each have a nice property not possessed by sets of nonorthogonal contrasts. As intuition might suggest, if contrasts are indeed orthogonal, they ought to partition the total Group by Time interaction and Within-Unit Error sums of squares into $n-1$ distinct or "nonoverlapping" components. This means that the outcome of one of the tests may be viewed without regard to the outcome of the others.

It turns out that if one works with a properly normalized version of a $U$ matrix whose columns are orthogonal, then this property can be seen very clearly. In particular, the sums of squares for group in each separate ANOVA for each contrast add up to the sum of squares $SS_{GT}$. Similarly, the error sums of squares add up to $SS_E$.

To appreciate this, consider the Helmert matrix in (5.21),

$$U = \begin{pmatrix} 1 & 0 & 0 \\ -1/3 & 1 & 0 \\ -1/3 & -1/2 & 1 \\ -1/3 & -1/2 & -1 \end{pmatrix}.$$

Each column corresponds to a different function to be applied to the data vectors for each unit; i.e., the $k$th column describes the $k$th contrast function $c_k'Y_{h\ell}$ of a data vector. Now the constants that make up each $c_k$ are different for each $k$; thus, the values of $c_k'Y_{h\ell}$ for each $k$ are on different scales of measurement. They are not comparable across all $n-1$ contrasts, and thus the sums of squares from each individual ANOVA are not comparable, because they each work with "data" on different scales.

It is possible to modify each contrast without affecting the orthogonality condition or the issue addressed
by each contrast so that the resulting "data" are scaled similarly. Note that the sums of the squared elements of each column are different; i.e., the sums of squares of the first, second, and third columns are

$$1^2 + (-1/3)^2 + (-1/3)^2 + (-1/3)^2 = 4/3, \qquad 3/2, \qquad\text{and}\qquad 2,$$

respectively. This illustrates that the contrasts are indeed not scaled similarly and suggests the modification.

• Multiply each contrast by an appropriate constant so that the sum of the squared elements is equal to 1.
• In our example, note that if we multiply the first column by $\sqrt{3/4}$, the second by $\sqrt{2/3}$, and the third by $\sqrt{1/2}$, then it may be verified that the sum of squares of the modified elements is equal to 1 in each case; e.g., for the first column,

$$\left(\sqrt{3/4}\right)^2 \left\{ 1^2 + (-1/3)^2 + (-1/3)^2 + (-1/3)^2 \right\} = (3/4)(4/3) = 1.$$

Note that multiplying each contrast by a constant does not change the spirit of the hypothesis tests to which it corresponds; e.g., for the first column, testing

$$H_0: \mu_{11} - \mu_{12}/3 - \mu_{13}/3 - \mu_{14}/3 = 0$$

is the same as testing

$$H_0: \sqrt{3/4}\,\mu_{11} - \sqrt{3/4}\,\mu_{12}/3 - \sqrt{3/4}\,\mu_{13}/3 - \sqrt{3/4}\,\mu_{14}/3 = 0.$$

When all contrasts in an orthogonal transformation are scaled similarly in this way, then they are said to be orthonormal.

• The resulting "data" corresponding to the modified versions of the contrasts will be on the same scale. It then is the case that the sums of squares for each individual ANOVA do indeed add up.
• Although this is a pleasing property, it is not necessary to use the normalized version of contrasts to obtain the correct test statistics for each contrast. Even if a set of $n-1$ orthogonal contrasts is not normalized in this way, the same test statistics will result. Although each separate ANOVA is on a different scale, so that the sums of squares for group and error in each will not add up to $SS_{GT}$ and $SS_E$, the F ratios formed will be the same, because the scaling factor will "cancel out" from the numerator and denominator of the F ratio and give the same statistic. The orthonormal version of the transformation is often thought of simply because it leads to the nice additive property.

If contrasts are not orthogonal,
the interpretation of the separate tests is more difficult, because the separate tests no longer are "nonoverlapping." The overall sum of squares for Group by Time is no longer partitioned as above. Thus, how one test comes out is related to how another one comes out.

ORTHOGONAL POLYNOMIAL CONTRASTS: As we saw in the examples in Chapter 1, a common feature of longitudinal data is that each unit appears to exhibit a "smooth" time trajectory. In some cases, like the dental study, this appears to be a straight line. In other cases, like the soybean growth study (Example 3), the trajectories seem to "curve." Thus, if we were to consider the trajectory of a single unit, it might be reasonable to think of it as a linear, quadratic, cubic, or, in general, polynomial function of time. Later in the course, we will be much more explicit about this view. Figure 6 shows such trajectories.

Figure 6: Polynomial trajectories: linear (solid), quadratic (dots), cubic (dashes). Axes: mean response versus time.

In this situation, it would be advantageous to be able to consider behavior of the mean response over time (averaged across and among groups) in a way that acknowledges this kind of pattern. For example, in the dental study, we might like to ask

• Averaged across genders, is there a linear (straight line) trend over time? Is there a quadratic trend?
• Does this linear or quadratic trend differ across genders?

There is a particular type of contrast that focuses on this issue, whose coefficients are referred to as orthogonal polynomial coefficients.

If we have data at $n$ time points on each unit, then, in principle, it would be possible to fit up to a $(n-1)$ degree polynomial in time. Thus, for such a situation, it is possible to define $n-1$ orthogonal polynomial contrasts, each measuring the strength of the linear, quadratic, cubic, and so on contribution to the $(n-1)$ degree polynomial. This is possible both for time points that are equally spaced over time and unequally spaced. The details of how these contrasts are defined
are beyond our scope here. For equally spaced times, the coefficients of the $n-1$ orthogonal polynomials are available in tables in many statistics texts (e.g. Steel, Torrie, and Dickey, 1997, p. 390); for unequally spaced time points, the computations depend on the time points themselves.

Statistical software such as SAS PROC GLM offers computation of orthogonal polynomial contrasts, so that the user may focus on interpretation rather than nasty computation. As an example, the following $U$ matrix has columns corresponding to the $n-1$ orthogonal polynomial contrasts (in the order linear, quadratic, cubic) in the case $n = 4$:

$$U = \begin{pmatrix} -3 & 1 & -1 \\ -1 & -1 & 3 \\ 1 & -1 & -3 \\ 3 & 1 & 1 \end{pmatrix}.$$

With the appropriate set of orthogonal polynomial contrasts, one may proceed as above to conduct hypothesis tests addressing the strength of the linear, quadratic, and so on components of the profile over time. The orthogonal polynomial transformation may also be normalized as discussed above.

5.7 Adjusted tests

We now return to the issue discussed in section 5.5. Suppose that we have reason to doubt that $\Sigma$ is of Type H. This may be because we do not believe that the limitations of the test for sphericity discussed in section 5.5 are too serious, and we have rejected the null hypothesis when performing this test. Alternatively, this may be because we question the assumption of Type H covariance to begin with as being unrealistic (more in a moment). In any event, we do not feel comfortable assuming that $\Sigma$ is of Type H (thus, certainly does not exhibit compound symmetry, as stated by the model). Thus, the usual F tests for Time and Group by Time are invalid.

Several suggestions are available for "adjusting" the usual F tests. Define

$$\epsilon = \frac{\mathrm{tr}^2(U'\Sigma U)}{(n-1)\,\mathrm{tr}(U'\Sigma U\,U'\Sigma U)},$$

where $U$ is any $n \times (n-1)$ (so $u = n-1$) matrix whose columns are normalized orthogonal contrasts. It may be shown that the constant $\epsilon$ defined in this way must satisfy

$$1/(n-1) \le \epsilon \le 1$$

and that $\epsilon = 1$ if and only if $\Sigma$ is of Type H.

Because the usual F tests are too liberal (see above) if $\Sigma$ is not of
Type H, one suggestion is as follows. Rather than compare the F ratios to the usual critical values with $a$ and $b$ numerator and denominator degrees of freedom, say, compare them to F critical values with $\epsilon a$ and $\epsilon b$ numerator and denominator degrees of freedom instead. This will make the degrees of freedom smaller than usual. A quick look at a table of F critical values shows that, as the numerator and denominator degrees of freedom get smaller, the critical value gets larger. Thus, the effect of this "adjustment" would be to compare F ratios to larger critical values, making it harder to reject the null hypothesis and thus making the test less liberal.

• Of course, $\epsilon$ is not known, because it depends on the unknown $\Sigma$ matrix.
• Several approaches are based on estimating $\Sigma$ (to be discussed in the next chapter of the course) and then using the result to form an estimate for $\epsilon$.
• This may be done in different ways; two such approaches are known as the Greenhouse-Geisser and Huynh-Feldt adjustments. Each estimates $\epsilon$ in a different way; the Huynh-Feldt estimate is such that the adjustment to the degrees of freedom is not as severe as that of the Greenhouse-Geisser adjustment. These adjustments are available in most software for analyzing repeated measurements; e.g., SAS PROC GLM computes the adjustments automatically, as we will see in the examples in section 5.8. They are, however, approximate.
• The general utility of these adjustments is unclear, however. That is, it is not necessarily the case that making the adjustments in a real situation where the numbers of units are small will indeed lead to valid tests.

SUMMARY: The spirit of the methods discussed above may be summarized as follows. One adopts a statistical model that makes a very specific assumption about associations among observations on the same unit (compound symmetry). If this assumption is correct, then familiar analysis of variance methods are available. It is possible to test whether it is correct;
however the testing procedures available are not too reliable In the event that one doubts the compound symmetry assumption approximate methods are available to still allow adjusted versions of the methods to be used However these adjustments are not necessarily reliable either This suggests that rather then try to force the issue of compound symmetry a better approach might be to start back at the beginning with a more realistic statistical model In later chapters we will discuss other methods for analyzing longitudinal data that do not rely on the assumption of compound symmetry or more generally Type H covariance We will also see that it is possible to adopt much more general representations for the form of the mean of a data vector 58 Implementation with SAS We consider two examples 1 The dental study data Here 1 2 and n 4 with the time factor being the age of the children and equally spaced time points at 8 10 12 and 14 years of age 2 the guinea pig diet data Here 1 3 and n 6 with the time factor being weeks and unequally spaced time points at 1 3 4 5 6 and 7 weeks PAGE 145 CHAPTER 5 ST 732 M DAVIDIAN In each case7 we use SAS PRDC GLM to carry out the computations These examples thus serve to illustrate how this SAS procedure may be used to conduct univariate repeated measures analysis of variance Each program carries out construction of the analysis of variance table in two ways 0 Using the same speci cation that would be used for the analysis of a split plot experiment 0 Using the special REPEATED statement in PRDC GLM This statement and its associated options allow the user to request various specialized analyses like those involving contrasts discussed in the last section A full description of the features available may be found in the SAS documentation for PRDC GLM PAGE 146 CHAPTER 5 ST 732 M DAVIDIAN EXAMPLE 1 7 DENTAL STUDY DATA The data are read in from the le dentaldat PROGRAM CHAPTER 5 EXAMPLE 1 Analysis of the dental study data by re eated measures 
analysis of variance using PRU GLM the repeated measurement factor is age time there is one quottreatmentquot factor gender options ls80 ps59 nodate run The data set looks like 1 1 21 0 2 1 10 20 0 3 1 12 215 0 4 1 14 23 0 5 2 8 21 column 1 observation number column 2 child id number column 3 age column 4 response distance column 5 gender indicator Ogirl 1boy The second data step changes the ages from 8 10 12 14 o 4 so that SAS can count them when it creates a t different data set later data dent1 infile dentaldat input obsno child age distance gender data dent1 set dent1 if age8 then age1 if age12 then age3 if age14 then age4 drop obsno Di Create an altern ative data set with the data record for each child on a single line proc sort datadent1 by gender child data dent2keepage1age4 gender array aa4 age1age4 do a e1 to 4 set ent by gender child aaagedistance if lastchild then return D i run proc print Find the means of each genderage combination and plot mean vs r age for each gende proc sort datadent1 by gender age run proc means datadent1 y gender age s ance output outmdent meanmdist run PAGE 147 CHAPTER 5 ST 732 M DAVIDIAN proc plot datamdent plot mdistagegender run Construct the analysis of variance using PRGC GLM via a quotsplit plotquot specification This requires that the data be represented in the form they are given in data set dentl Note that the F ratio that PRGC GLM prints out automatically for the ender effect averaged across age will use t e MSE in tEe denominator This is not the correct F ratio for testing this effect The RANDOM statement asks SAS to compute the ex ected mean squares for eac source of variation The TE option asks SAS to compute the test for the gender effect averaged across age treating the childgender effect as random giving the c rrect F ratio Other Fratios are correct In older versions of SAS that do not recognize this option this test cou be obtained by removin t e TEST option from the RANDOM statement and adding t e statement 
test hgender e childgender to the call to PRGC GLM proc glm datadent1 c s age gender child model distance gender childgender age agegender random childgender tes run Now carry out the same analysis using the REPEATED statement in PRGC GLM This requires that the data be represented in the form of data set dent2 The option NGUNI su presses individual analyses of variance for t e data at eac age value from being printed The PRINTE option asks for the test of sphericity to be performed The MGM option means quotno multivariatequot which means just do the univariate repeated measures ana sis under the assumption that the exchangable compound symmetry model is correct proc glm datadent2 e d r model agel age2 age3 age4 gender nouni repeated age printe nom This call to PRGC GLM redoes the basic analysis of the last However in the REPEATED statement a different contrast of the arameters is specified the PGLYNGMIAL transformation The evels of quotagequot are equally spaced an t e va ues are specified The transformation produced is orthogonal polynomials for polynomial trends linear quadratic cubic tests corresponding to the contrasts in each column of t matrix The SUMMARY option asks that PRGC GLM print out the resu tonf e The NGU option asks that printing of the univariate analysis of variance be suppressed we already did it in the previous PRGC GLM call THE PRINTM option prints out the U matrix corresponding to the orthogonal polynomial contrasts SAS cal s this matrix M and actuallly prints out its transponse our U For the orthogonal polynomial transformation SAS uses the normalized version of the matrix us the SSs from the individual ANGVAs for each column will add up to the Gender by Age interaction SS and similarly for the withinunit error SS proc glm datadent2 class gend r PAGE 148 CHAPTER 5 ST 732 M DAVIDIAN model age1 age2 age3 age4 g ender ouni repeated age 4 8 10 12 14 polynomial summary nou nom printm run For comparison we do the same analysis as above but use the 
Helmert matrix instead. SAS does NOT use the normalized version of the Helmert transformation matrix. Thus, the SSs from the individual ANOVAs do not add up to the Gender by Age interaction SS (similarly for within-unit error). However, the F ratios are correct.

    proc glm data=dent2;
      class gender;
      model age1 age2 age3 age4 = gender / nouni;
      repeated age 4 (8 10 12 14) helmert / summary nou nom printm;
    run;

Here, we manually perform the same analysis, but using the NORMALIZED version of the Helmert transformation matrix. We get each individual test separately using the PROC GLM MANOVA statement.

    proc glm data=dent2;
      model age1 age2 age3 age4 = gender / nouni;
      manova h=gender m=0.866025404*age1 - 0.288675135*age2 - 0.288675135*age3 - 0.288675135*age4;
      manova h=gender m=0.816496581*age2 - 0.40824829*age3 - 0.40824829*age4;
      manova h=gender m=0.707106781*age3 - 0.707106781*age4;
    run;

To compare, we apply the contrasts (normalized version) to each child's data. We thus get a single value for each child corresponding to each contrast. These are in the variables AGE1P - AGE3P. We then use PROC GLM to perform each separate ANOVA. It may be verified from the output that the separate sums of squares add up to the corresponding SS in the univariate analysis; for gender, 3.4260 + 8.7332 + 1.8333 = 13.9925, the Gender by Age interaction SS.

    data dent3; set dent2;
      age1p = sqrt(0.75)*(age1 - age2/3 - age3/3 - age4/3);
      age2p = sqrt(2/3)*(age2 - age3/2 - age4/2);
      age3p = sqrt(1/2)*(age3 - age4);
    run;

    proc glm data=dent3;
      class gender;
      model age1p age2p age3p = gender;
    run;

OUTPUT: One important note: it is important to always inspect the result of the Test for Sphericity using Mauchly's Criterion applied to Orthogonal Components. The test must be performed using an orthogonal, normalized transformation matrix. If the selected transformation (e.g. helmert) is not orthogonal and normalized, SAS will both do the test anyway, which is not appropriate, and do it using an orthogonal, normalized transformation, which is appropriate.

    Obs    age1    age2    age3    age4    gender
      1    21.0    20.0    21.5    23.0       0
      2    21.0    21.5    24.0    25.5       0
      3    20.5    24.0    24.5    26.0       0
      4    23.5    24.5    25.0    26.5       0
      5    21.5    23.0    22.5    23.5       0
      6    20.0    21.0    21.0    22.5       0
      7    21.5    22.5    23.0    25.0       0
      8    23.0    23.0    23.5    24.0       0
      9    20.0    21.0    22.0    21.5       0
     10    16.5    19.0    19.0    19.5       0
     11    24.5    25.0    28.0    28.0       0
     12    26.0    25.0    29.0    31.0       1
     13    21.5    22.5    23.0    26.5       1
     14    23.0    22.5    24.0    27.5       1
     15    25.5    27.5    26.5    27.0       1
     16    20.0    23.5    22.5    26.0       1
     17    24.5    25.5    27.0    28.5       1
     18    22.0    22.0    24.5    26.5       1
     19    24.0    21.5    24.5    25.5       1
     20    23.0    20.5    31.0    26.0       1
     21    27.5    28.0    31.0    31.5       1
     22    23.0    23.0    23.5    25.0       1
     23    21.5    23.5    24.0    28.0       1
     24    17.0    24.5    26.0    29.5       1
     25    22.5    25.5    25.5    26.0       1
     26    23.0    24.5    26.0    30.0       1
     27    22.0    21.5    23.5    25.0       1

The MEANS Procedure
Analysis Variable: distance (one table per gender/age combination, collected here)

    gender  age    N          Mean       Std Dev       Minimum       Maximum
       0     1    11    21.1818182     2.1245320    16.5000000    24.5000000
       0     2    11    22.2272727     1.9021519    19.0000000    25.0000000
       0     3    11    23.0909091     2.3645103    19.0000000    28.0000000
       0     4    11    24.0909091     2.4373980    19.5000000    28.0000000
       1     1    16    22.8750000     2.4528895    17.0000000    27.5000000
       1     2    16    23.8125000     2.1360009    20.5000000    28.0000000
       1     3    16    25.7187500     2.6518468    22.5000000    31.0000000
       1     4    16    27.4687500     2.0854156    25.0000000    31.5000000

[Plot of mdist*age; symbol is value of gender. Vertical axis: mdist (21 to 28); horizontal axis: age (1 to 4).]

The GLM Procedure
Class Level Information

    Class     Levels    Values
    age            4    1 2 3 4
    gender         2    0 1
    child         27    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

    Number of observations    108

The GLM Procedure
Dependent Variable: distance

    Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model              32       769.5642887     24.0488840      12.18    <.0001
    Error              75       148.1278409      1.9750379
    Corrected Total   107       917.6921296

    R-Square    Coeff Var    Root MSE    distance Mean
    0.838587     5.850026    1.405360         24.02315

    Source           DF      Type I SS     Mean Square    F Value    Pr > F
    gender            1    140.4648569     140.4648569      71.12    <.0001
    child(gender)    25    377.9147727      15.1165909       7.65    <.0001
    age               3    237.1921296      79.0640432      40.03    <.0001
    age*gender        3     13.9925295       4.6641765       2.36    0.0781

    Source           DF    Type III SS     Mean Square    F Value    Pr > F
    gender            1    140.4648569     140.4648569      71.12    <.0001
    child(gender)    25    377.9147727      15.1165909       7.65    <.0001
    age               3    209.4369739      69.8123246      35.35    <.0001
    age*gender        3     13.9925295       4.6641765       2.36    0.0781

The GLM Procedure

    Source           Type III Expected Mean Square
    gender           Var(Error) + 4 Var(child(gender)) + Q(gender,age*gender)
    child(gender)    Var(Error) + 4 Var(child(gender))
    age              Var(Error) + Q(age,age*gender)
    age*gender       Var(Error) + Q(age*gender)

The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: distance

    Source                      DF    Type III SS    Mean Square    F Value    Pr > F
  * gender                       1     140.464857     140.464857       9.29    0.0054
    Error: MS(child(gender))    25     377.914773      15.116591
  * This test assumes one or more other fixed effects are zero.

    Source                      DF    Type III SS    Mean Square    F Value    Pr > F
    child(gender)               25     377.914773      15.116591       7.65    <.0001
  * age                          3     209.436974      69.812325      35.35    <.0001
    age*gender                   3      13.992529       4.664176       2.36    0.0781
    Error: MS(Error)            75     148.127841       1.975038
  * This test assumes one or more other fixed effects are zero.

The GLM Procedure
Class Level Information

    Class     Levels    Values
    gender         2    0 1

    Number of observations    27

The GLM Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information

    Dependent Variable    age1    age2    age3    age4
    Level of age             1       2       3       4

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 25

                age1        age2        age3        age4
    age1    1.000000    0.570699    0.661320    0.521583
                          0.0023      0.0002      0.0063
    age2    0.570699    1.000000    0.563167    0.726216
              0.0023                  0.0027      <.0001
    age3    0.661320    0.563167    1.000000    0.728098
              0.0002      0.0027                  <.0001
    age4    0.521583    0.726216    0.728098    1.000000
              0.0063      <.0001      <.0001

E = Error SSCP Matrix
age_N represents the contrast between the nth level of age and the last

               age_1      age_2      age_3
    age_1    124.518     41.879     51.375
    age_2     41.879     63.405     11.625
    age_3     51.375     11.625     79.500

Partial Correlation Coefficients from the Error SSCP Matrix of the Variables
Defined by the Specified Transformation / Prob > |r|
DF = 25

                age_1       age_2       age_3
    age_1    1.000000    0.471326    0.516359
                           0.0151      0.0069
    age_2    0.471326    1.000000    0.163738
               0.0151                  0.4241
    age_3    0.516359    0.163738    1.000000
               0.0069      0.4241

The GLM Procedure
Repeated Measures Analysis of Variance
Sphericity Tests

                                    Mauchly's
    Variables                DF     Criterion    Chi-Square    Pr > ChiSq
    Transformed Variates      5     0.4998695     16.449181        0.0057
    Orthogonal Components     5     0.7353334     7.2929515        0.1997

The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

    Source    DF    Type III SS     Mean Square    F Value    Pr > F
    gender     1    140.4648569     140.4648569       9.29    0.0054
    Error     25    377.9147727      15.1165909

The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                                          Adj Pr > F
    Source        DF    Type III SS    Mean Square    F Value    Pr > F    G - G     H - F
    age            3    209.4369739     69.8123246      35.35    <.0001    <.0001    <.0001
    age*gender     3     13.9925295      4.6641765       2.36    0.0781    0.0878    0.0781
    Error(age)    75    148.1278409      1.9750379

    Greenhouse-Geisser Epsilon    0.8672
    Huynh-Feldt Epsilon           1.0156

The GLM Procedure
Class Level Information

    Class     Levels    Values
    gender         2    0 1

    Number of observations    27

The GLM Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information

    Dependent Variable    age1    age2    age3    age4
    Level of age             8      10      12      14

age_N represents the nth degree polynomial contrast for age

M Matrix Describing Transformed Variables

                     age1            age2            age3            age4
    age_1    -.6708203932    -.2236067977    0.2236067977    0.6708203932
    age_2    0.5000000000    -.5000000000    -.5000000000    0.5000000000
    age_3    -.2236067977    0.6708203932    -.6708203932    0.2236067977

The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

    Source    DF    Type III SS     Mean Square    F Value    Pr > F
    gender     1    140.4648569     140.4648569       9.29    0.0054
    Error     25    377.9147727      15.1165909

The GLM Procedure
Repeated Measures Analysis of Variance
Analysis of Variance of Contrast Variables

age_N represents the nth degree polynomial
contrast for age

Contrast Variable: age_1

    Source    DF    Type III SS     Mean Square    F Value    Pr > F
    Mean       1    208.2660038     208.2660038      88.00    <.0001
    gender     1     12.1141519      12.1141519       5.12    0.0326
    Error     25     59.1673295       2.3666932

Contrast Variable: age_2

    Source    DF    Type III SS     Mean Square    F Value    Pr > F
    Mean       1     0.95880682      0.95880682       0.92    0.3465
    gender     1     1.19954756      1.19954756       1.15    0.2935
    Error     25    26.04119318      1.04164773

Contrast Variable: age_3

    Source    DF    Type III SS     Mean Square    F Value    Pr > F
    Mean       1     0.21216330      0.21216330       0.08    0.7739
    gender     1     0.67882997      0.67882997       0.27    0.6081
    Error     25    62.91931818      2.51677273

The GLM Procedure
Class Level Information

    Class     Levels    Values
    gender         2    0 1

    Number of observations    27

The GLM Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information

    Dependent Variable    age1    age2    age3    age4
    Level of age             8      10      12      14

age_N represents the contrast between the nth level of age and the mean of subsequent levels

M Matrix Describing Transformed Variables

                     age1            age2            age3            age4
    age_1     1.000000000    -0.333333333    -0.333333333    -0.333333333
    age_2     0.000000000     1.000000000    -0.500000000    -0.500000000
    age_3     0.000000000     0.000000000     1.000000000    -1.000000000

The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

    Source    DF    Type III SS     Mean Square    F Value    Pr > F
    gender     1    140.4648569     140.4648569       9.29    0.0054
    Error     25    377.9147727      15.1165909

The GLM Procedure
Repeated Measures Analysis of Variance
Analysis of Variance of Contrast Variables

age_N represents the contrast between the nth level of age and the mean of subsequent levels

Contrast Variable: age_1

    Source    DF    Type III SS     Mean Square    F Value    Pr > F
    Mean       1    146.8395997     146.8395997      45.43    <.0001
    gender     1      4.5679948       4.5679948       1.41    0.2457
    Error     25     80.8106061       3.2324242

Contrast Variable: age_2

    Source    DF    Type III SS     Mean Square    F Value    Pr > F
    Mean       1    111.9886890     111.9886890      39.07    <.0001
    gender     1     13.0998001      13.0998001       4.57    0.0425
    Error     25     71.6548295       2.8661932

Contrast Variable: age_3

    Source    DF    Type III SS     Mean Square    F Value    Pr > F
    Mean       1    49.29629630     49.29629630      15.50    0.0006
    gender     1     3.66666667      3.66666667       1.15    0.2932
    Error     25    79.50000000      3.18000000

The GLM Procedure

    Number of observations    27

The GLM Procedure
Multivariate Analysis of Variance

M Matrix Describing Transformed Variables

                     age1            age2            age3            age4
    MVAR1     0.866025404    -0.288675135    -0.288675135    -0.288675135

The GLM Procedure
Multivariate Analysis of Variance

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for gender
E = Error SSCP Matrix
Variables have been transformed by the M Matrix

    Characteristic               Characteristic Vector  V'EV=1
              Root    Percent         MVAR1
        0.05652717     100.00    0.12845032

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall gender Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for gender
E = Error SSCP Matrix
S=1    M=-0.5    N=11.5

    Statistic                      Value    F Value    Num DF    Den DF    Pr > F
    Wilks' Lambda             0.94649719       1.41         1        25    0.2457
    Pillai's Trace            0.05350281       1.41         1        25    0.2457
    Hotelling-Lawley Trace    0.05652717       1.41         1        25    0.2457
    Roy's Greatest Root       0.05652717       1.41         1        25    0.2457

The GLM Procedure
Multivariate Analysis of Variance

M Matrix Describing Transformed Variables

                     age1            age2            age3            age4
    MVAR1               0     0.816496581     -0.40824829     -0.40824829

The GLM Procedure
Multivariate Analysis of Variance

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for gender
E = Error SSCP Matrix
Variables have been transformed by the M Matrix

    Characteristic               Characteristic Vector  V'EV=1
              Root    Percent         MVAR1
        0.18281810     100.00    0.14468480

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall gender Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for gender
E = Error SSCP Matrix
S=1    M=-0.5    N=11.5

    Statistic                      Value    F Value    Num DF    Den DF    Pr > F
    Wilks' Lambda             0.84543853       4.57         1        25    0.0425
    Pillai's Trace            0.15456147       4.57         1        25    0.0425
    Hotelling-Lawley Trace    0.18281810       4.57         1        25    0.0425
    Roy's Greatest Root       0.18281810       4.57         1        25    0.0425

The GLM Procedure
Multivariate
Analysis of Variance

M Matrix Describing Transformed Variables

                     age1            age2            age3            age4
    MVAR1               0               0     0.707106781    -0.707106781

The GLM Procedure
Multivariate Analysis of Variance

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for gender
E = Error SSCP Matrix
Variables have been transformed by the M Matrix

    Characteristic               Characteristic Vector  V'EV=1
              Root    Percent         MVAR1
        0.04612159     100.00    0.15861032

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall gender Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for gender
E = Error SSCP Matrix
S=1    M=-0.5    N=11.5

    Statistic                      Value    F Value    Num DF    Den DF
    Wilks' Lambda             0.95591182       1.15         1        25
    Pillai's Trace            0.04408818       1.15         1        25
    Hotelling-Lawley Trace    0.04612159       1.15         1        25
    Roy's Greatest Root       0.04612159       1.15         1        25

The GLM Procedure
Class Level Information

    Class     Levels    Values
    gender         2    0 1

    Number of observations    27

The GLM Procedure
Dependent Variable: age1p

    Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model               1        3.42599607     3.42599607       1.41    0.2457
    Error              25       60.60795455     2.42431818
    Corrected Total    26       64.03395062

    R-Square    Coeff Var    Root MSE    age1p Mean
    0.053503    -73.36496    1.557022     -2.122297

    Source    DF      Type I SS    Mean Square    F Value    Pr > F
    gender     1     3.42599607     3.42599607       1.41    0.2457

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    gender     1     3.42599607     3.42599607       1.41    0.2457

The GLM Procedure
Dependent Variable: age2p

    Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model               1        8.73320006     8.73320006       4.57    0.0425
    Error              25       47.76988636     1.91079545
    Corrected Total    26       56.50308642

    R-Square    Coeff Var    Root MSE    age2p Mean
    0.154561    -76.82446    1.382315     -1.799317

    Source    DF      Type I SS    Mean Square    F Value    Pr > F
    gender     1     8.73320006     8.73320006       4.57    0.0425

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    gender     1     8.73320006     8.73320006       4.57    0.0425

The GLM Procedure
Dependent Variable: age3p

    Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model               1        1.83333333     1.83333333       1.15    0.2932
    Error              25       39.75000000     1.59000000
    Corrected Total    26       41.58333333

    R-Square    Coeff Var    Root MSE    age3p Mean
    0.044088    -123.4561    1.260952     -1.021376

    Source    DF      Type I SS    Mean Square    F Value    Pr > F
    gender     1     1.83333333     1.83333333       1.15    0.2932

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    gender     1     1.83333333     1.83333333       1.15    0.2932

EXAMPLE 2 - GUINEA PIG DIET DATA: The data are read in from the file diet.dat.

PROGRAM:

CHAPTER 5, EXAMPLE 2

Analysis of the vitamin E data by univariate repeated measures analysis of variance using PROC GLM
- the repeated measurement factor is week (time)
- there is one "treatment" factor, dose

    options ls=80 ps=59 nodate; run;

The data set looks like

     1  455  460  510  504  436  466  1
     .
     .
     .
    15  472  498  540  524  532  583  3

column 1: pig number
columns 2-7: body weights at weeks 1, 3, 4, 5, 6, 7
column 8: dose group (1 = zero, 2 = low, 3 = high dose)

    data pigs1; infile 'diet.dat';
      input pig week1 week3 week4 week5 week6 week7 dose;
    run;

Create a data set with one data record per pig/week; repeated measures data are often recorded in this form. Create a new variable "weight" containing the body weight at time "week."

The second data step fixes up the "week" values, as the weeks of observation were not equally spaced but rather have the values 1, 3, 4, 5, 6, 7.

    data pigs2; set pigs1;
      array wt(6) week1 week3 week4 week5 week6 week7;
      do week = 1 to 6;
        weight = wt(week);
        output;
      end;
      drop week1 week3-week7;
    run;

    data pigs2; set pigs2;
      if week > 1 then week = week + 1;
    run;

    proc print; run;

Find the means of each dose-week combination and plot mean vs. week for each dose.

    proc sort data=pigs2; by dose week; run;
    proc means data=pigs2; by dose week;
      var weight;
      output out=mpigs mean=mweight;
    run;

    proc plot data=mpigs; plot mweight*week=dose; run;

First, construct the analysis of variance using PROC GLM via a "split plot" specification. This requires that the data be represented in the form they are given in data set pigs2.

Note that the F ratio that PROC GLM prints out automatically
for the dose effect (averaged across week) will use the MSE in the denominator. This is not the correct F ratio for testing this effect.

The RANDOM statement asks SAS to compute the expected mean squares for each source of variation. The TEST option asks SAS to compute the test for the dose effect (averaged across week), treating the pig(dose) effect as random, giving the correct F ratio. Other F ratios are correct.

In older versions of SAS that do not recognize this option, this test could be obtained by removing the TEST option from the RANDOM statement and adding the statement

    test h=dose e=pig(dose);

to the call to PROC GLM.

    proc glm data=pigs2;
      class week dose pig;
      model weight = dose pig(dose) week week*dose;
      random pig(dose) / test;
    run;

Now carry out the same analysis using the REPEATED statement in PROC GLM. This requires that the data be represented in the form of data set pigs1.

The option NOUNI suppresses individual analyses of variance at each week value from being printed.

The PRINTE option asks for the test of sphericity to be performed.

The NOM option means "no multivariate," which means univariate tests under the assumption that the compound symmetry model is correct.

    proc glm data=pigs1;
      class dose;
      model week1 week3 week4 week5 week6 week7 = dose / nouni;
      repeated week / printe nom;
    run;

These calls to PROC GLM redo the basic analysis of the last. However, in the REPEATED statement, different contrasts of the parameters are specified.

The SUMMARY option asks that PROC GLM print out the results of tests corresponding to the contrasts in each column of the U matrix.

The NOU option asks that printing of the univariate analysis of variance be suppressed (we already did it in the previous PROC GLM call).

The PRINTM option prints out the U matrix corresponding to the contrasts being used. SAS calls this matrix M and actually prints out its transpose (our U').

    proc glm data=pigs1;
      class dose;
      model week1 week3 week4 week5 week6 week7 = dose / nouni;
      repeated week 6 (1 3 4 5 6 7) polynomial / summary printm nom;
    run;

    proc glm data=pigs1;
      class dose;
      model week1 week3 week4 week5 week6 week7 = dose / nouni;
      repeated week 6 (1 3 4 5 6 7) profile / summary printm nom;
    run;

    proc glm data=pigs1;
      class dose;
      model week1 week3 week4 week5 week6 week7 = dose / nouni;
      repeated week 6 helmert / summary printm nom;
    run;

OUTPUT: The same warning about the test for sphericity applies here.

[PROC PRINT listing of data set pigs2 (columns Obs, pig, dose, week, weight; 90 observations); the listing is not legible in the source.]

The MEANS Procedure
Analysis Variable: weight (one table per dose/week combination, collected here)

    dose  week    N           Mean       Std Dev        Minimum        Maximum
      1     1     5    466.4000000    16.7272233    445.0000000    485.0000000
      1     3     5    519.4000000    40.6423425    460.0000000    565.0000000
      1     4     5    568.8000000    39.5878769    510.0000000    610.0000000
      1     5     5    561.6000000    42.8404015    504.0000000    597.0000000
      1     6     5    546.6000000    66.8789952    436.0000000    611.0000000
      1     7     5    572.0000000    61.8182821    466.0000000    619.0000000
      2     1     5    494.4000000    31.9108132    440.0000000    520.0000000
      2     3     5    551.0000000    41.8927201    480.0000000    590.0000000
      2     4     5    574.2000000    27.9946423    536.0000000    610.0000000
      2     5     5    567.0000000    62.0604544    484.0000000    637.0000000
      2     6     5    603.0000000    53.3057220    552.0000000    671.0000000
      2     7     5    644.0000000    57.5499783    569.0000000    702.0000000
      3     1     5    497.8000000    28.6740301    472.0000000    545.0000000
      3     3     5    534.6000000    29.7623924    498.0000000    565.0000000
      3     4     5    579.8000000    29.9532970    540.0000000    622.0000000
      3     5     5    571.8000000    39.2390112    524.0000000    622.0000000
      3     6     5    588.2000000    43.7058349    532.0000000    633.0000000
      3     7     5    623.2000000    35.3723056    583.0000000    670.0000000

[Plot of mweight*week; symbol is value of dose. Vertical axis: mweight (460 to 660); horizontal axis: week (1 to 7).]

The GLM Procedure
Class Level Information

    Class    Levels    Values
    week          6    1 3 4 5 6 7
    dose          3    1 2 3
    pig          15    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

    Number of observations    90

The GLM Procedure
Dependent Variable: weight

    Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model              29       276299.5000      9527.5690      17.56    <.0001
    Error              60        32552.6000       542.5433
    Corrected Total    89       308852.1000

    R-Square    Coeff Var    Root MSE    weight Mean
    0.894601     4.166081    23.29256       559.1000

    Source       DF      Type I SS    Mean Square    F Value    Pr > F
    dose          2     18548.0667      9274.0333      17.09    <.0001
    pig(dose)    12    105434.2000      8786.1833      16.19    <.0001
    week          5    142554.5000     28510.9000      52.55    <.0001
    week*dose    10      9762.7333       976.2733       1.80    0.0801

    Source       DF    Type III SS    Mean Square    F Value    Pr > F
    dose          2     18548.0667      9274.0333      17.09    <.0001
    pig(dose)    12    105434.2000      8786.1833      16.19    <.0001
    week          5    142554.5000     28510.9000      52.55    <.0001
    week*dose    10      9762.7333       976.2733       1.80    0.0801

The GLM Procedure

    Source       Type III Expected Mean Square
    dose         Var(Error) + 6 Var(pig(dose)) + Q(dose,week*dose)
    pig(dose)    Var(Error) + 6 Var(pig(dose))
    week         Var(Error) + Q(week,week*dose)
    week*dose    Var(Error) + Q(week*dose)

The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: weight

    Source                  DF    Type III SS    Mean Square    F Value    Pr > F
  * dose                     2          18548    9274.033333       1.06    0.3782
    Error: MS(pig(dose))    12         105434    8786.183333
  * This test assumes one or more other fixed effects are zero.

    Source                  DF    Type III SS    Mean Square    F Value    Pr > F
    pig(dose)               12         105434    8786.183333      16.19    <.0001
  * week                     5         142555          28511      52.55    <.0001
    week*dose               10    9762.733333     976.273333       1.80    0.0801
    Error: MS(Error)        60          32553     542.543333
  * This test assumes one or more other fixed effects are zero.

The GLM Procedure
Class Level Information

    Class    Levels    Values
    dose          3    1 2 3

    Number of observations    15

The GLM Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information

    Dependent Variable    week1    week3    week4    week5    week6    week7
    Level of week             1        2        3        4        5        6

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 12

                week1       week3       week4       week5       week6       week7
    week1    1.000000    0.707584    0.459151    0.543739    0.492366    0.502098
                           0.0068      0.1145      0.0548      0.0874      0.0804
    week3    0.707584    1.000000    0.889996    0.874228    0.676753    0.834899
               0.0068                  <.0001      <.0001      0.0111      0.0004
    week4    0.459151    0.889996    1.000000    0.881217    0.789575    0.847786
               0.1145      <.0001                  <.0001      0.0013      0.0003
    week5    0.543739    0.874228    0.881217    1.000000    0.803051    0.919350
               0.0548      <.0001      <.0001                  0.0009      <.0001
    week6    0.492366    0.676753    0.789575    0.803051    1.000000    0.895603
               0.0874      0.0111      0.0013      0.0009                  <.0001
    week7    0.502098    0.834899    0.847786    0.919350    0.895603    1.000000
               0.0804      0.0004      0.0003      <.0001      <.0001

E = Error SSCP Matrix
week_N represents the contrast between the nth level of week and the last

               week_1     week_2     week_3     week_4     week_5
    week_1    25083.6    13574.0    12193.2     4959.0     2274.8
    week_2    13574.0    10638.4     9099.2     4354.6      968.2
    week_3    12193.2     9099.2    11136.8     4293.8     1623.6
    week_4     4959.0     4354.6     4293.8     5194.4      365.8
    week_5     2274.8      968.2     1623.6      365.8     7425.2

The GLM Procedure
Repeated Measures Analysis of Variance
Partial Correlation Coefficients from the Error SSCP Matrix of
the Variables Defined by the Specified Transformation / Prob > |r|
DF = 12

                week_1      week_2      week_3      week_4      week_5
    week_1    1.000000    0.830950    0.729529    0.434442    0.166684
                            0.0004      0.0047      0.1380      0.5863
    week_2    0.830950    1.000000    0.835959    0.585791    0.108936
                0.0004                  0.0004      0.0354      0.7231
    week_3    0.729529    0.835959    1.000000    0.564539    0.178544
                0.0047      0.0004                  0.0444      0.5595
    week_4    0.434442    0.585791    0.564539    1.000000    0.058901
                0.1380      0.0354      0.0444                  0.8484
    week_5    0.166684    0.108936    0.178544    0.058901    1.000000
                0.5863      0.7231      0.5595      0.8484

Sphericity Tests

                                    Mauchly's
    Variables                DF     Criterion    Chi-Square    Pr > ChiSq
    Transformed Variates     14     0.0160527     41.731963        0.0001
    Orthogonal Components    14     0.0544835     29.389556        0.0093

The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    dose       2     18548.0667      9274.0333       1.06    0.3782
    Error     12    105434.2000      8786.1833

The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                                           Adj Pr > F
    Source         DF    Type III SS    Mean Square    F Value    Pr > F    G - G     H - F
    week            5    142554.5000     28510.9000      52.55    <.0001    <.0001    <.0001
    week*dose      10      9762.7333       976.2733       1.80    0.0801    0.1457    0.1103
    Error(week)    60     32552.6000       542.5433

    Greenhouse-Geisser Epsilon    0.4856
    Huynh-Feldt Epsilon           0.7191

The GLM Procedure
Class Level Information

    Class    Levels    Values
    dose          3    1 2 3

    Number of observations    15

The GLM Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information

    Dependent Variable    week1    week3    week4    week5    week6    week7
    Level of week             1        3        4        5        6        7

week_N represents the nth degree polynomial contrast for week

M Matrix Describing Transformed Variables

                     week1           week3           week4           week5           week6           week7
    week_1    -.6900655593    -.2760262237    -.0690065559    0.1380131119    0.3450327797    0.5520524475
    week_2    0.5455447256    -.3273268354    -.4364357805    -.3273268354    0.0000000000    0.5455447256
    week_3    -.2331262021    0.6061281254    0.0932504808    -.4196271637    -.4662524041    0.4196271637
    week_4    0.0703659384    -.4817360398    0.5196253913    0.2760509891    -.6062296232    0.2219233442
    week_5    -.0149872662    0.2248089935    -.5994906493    0.6744269805    -.3596943896    0.0749363312

The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    dose       2     18548.0667      9274.0333       1.06    0.3782
    Error     12    105434.2000      8786.1833

The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                                           Adj Pr > F
    Source         DF    Type III SS    Mean Square    F Value    Pr > F    G - G     H - F
    week            5    142554.5000     28510.9000      52.55    <.0001    <.0001    <.0001
    week*dose      10      9762.7333       976.2733       1.80    0.0801    0.1457    0.1103
    Error(week)    60     32552.6000       542.5433

    Greenhouse-Geisser Epsilon    0.4856
    Huynh-Feldt Epsilon           0.7191

The GLM Procedure
Repeated Measures Analysis of Variance
Analysis of Variance of Contrast Variables

week_N represents the nth degree polynomial contrast for week

Contrast Variable: week_1

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    131764.8029    131764.8029      87.35    <.0001
    dose       2      2495.2133      1247.6067       0.83    0.4608
    Error     12     18100.8743      1508.4062

Contrast Variable: week_2

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    2011.479365    2011.479365       6.67    0.0240
    dose       2    4489.677778    2244.838889       7.45    0.0079
    Error     12    3617.509524     301.459127

Contrast Variable: week_3

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    2862.193623    2862.193623       9.19    0.0104
    dose       2     694.109855     347.054928       1.11    0.3597
    Error     12    3736.192174     311.349348

Contrast Variable: week_4

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    3954.881058    3954.881058      17.28    0.0013
    dose       2    1878.363604     939.181802       4.10    0.0439
    Error     12    2746.984214     228.915351

Contrast Variable: week_5

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    1961.143097    1961.143097       5.41    0.0384
    dose       2     205.368763     102.684382       0.28    0.7583
    Error     12    4351.039802     362.586650

The GLM Procedure
Class Level Information

    Class    Levels    Values
    dose          3    1 2 3

    Number of observations    15

The GLM Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information

    Dependent Variable    week1    week3    week4    week5    week6    week7
    Level of week             1        3        4        5        6        7

week_N represents the nth successive difference in week

M Matrix Describing Transformed Variables

                     week1           week3           week4           week5           week6           week7
    week_1     1.000000000    -1.000000000     0.000000000     0.000000000     0.000000000     0.000000000
    week_2     0.000000000     1.000000000    -1.000000000     0.000000000     0.000000000     0.000000000
    week_3     0.000000000     0.000000000     1.000000000    -1.000000000     0.000000000     0.000000000
    week_4     0.000000000     0.000000000     0.000000000     1.000000000    -1.000000000     0.000000000
    week_5     0.000000000     0.000000000     0.000000000     0.000000000     1.000000000    -1.000000000

The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    dose       2     18548.0667      9274.0333       1.06    0.3782
    Error     12    105434.2000      8786.1833

The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                                           Adj Pr > F
    Source         DF    Type III SS    Mean Square    F Value    Pr > F    G - G     H - F
    week            5    142554.5000     28510.9000      52.55    <.0001    <.0001    <.0001
    week*dose      10      9762.7333       976.2733       1.80    0.0801    0.1457    0.1103
    Error(week)    60     32552.6000       542.5433

    Greenhouse-Geisser Epsilon    0.4856
    Huynh-Feldt Epsilon           0.7191

The GLM Procedure
Repeated Measures Analysis of Variance
Analysis of Variance of Contrast Variables

week_N represents the nth successive difference in week

Contrast Variable: week_1

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    35721.60000    35721.60000      50.00    <.0001
    dose       2     1112.40000      556.20000       0.78    0.4810
    Error     12     8574.00000      714.50000

Contrast Variable: week_2

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    23128.06667    23128.06667      77.59    <.0001
    dose       2     1980.13333      990.06667       3.32    0.0711
    Error     12     3576.80000      298.06667

Contrast Variable: week_3

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1     836.266667     836.266667       1.30    0.2772
    dose       2       2.133333       1.066667       0.00    0.9983
    Error     12    7743.600000     645.300000

Contrast Variable: week_4

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1     2331.26667     2331.26667       2.10    0.1734
    dose       2     6618.53333     3309.26667       2.97    0.0893
    Error     12    13351.20000     1112.60000

Contrast Variable: week_5

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    17136.60000    17136.60000      27.69    0.0002
    dose       2      619.20000      309.60000       0.50    0.6184
    Error     12     7425.20000      618.76667

The GLM Procedure
Class Level Information

    Class    Levels    Values
    dose          3    1 2 3

    Number of observations    15

The GLM Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information

    Dependent Variable    week1    week3    week4    week5    week6    week7
    Level of week             1        2        3        4        5        6

week_N represents the contrast between the nth level of week and the mean of subsequent levels

M Matrix Describing Transformed Variables

                     week1           week3           week4           week5           week6           week7
    week_1     1.000000000    -0.200000000    -0.200000000    -0.200000000    -0.200000000    -0.200000000
    week_2     0.000000000     1.000000000    -0.250000000    -0.250000000    -0.250000000    -0.250000000
    week_3     0.000000000     0.000000000     1.000000000    -0.333333333    -0.333333333    -0.333333333
    week_4     0.000000000     0.000000000     0.000000000     1.000000000    -0.500000000    -0.500000000
    week_5     0.000000000     0.000000000     0.000000000     0.000000000     1.000000000    -1.000000000

The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    dose       2     18548.0667      9274.0333       1.06    0.3782
    Error     12    105434.2000      8786.1833

The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                                           Adj Pr > F
    Source         DF    Type III SS    Mean Square    F Value    Pr > F    G - G     H - F
    week            5    142554.5000     28510.9000      52.55    <.0001    <.0001    <.0001
    week*dose      10      9762.7333       976.2733       1.80    0.0801    0.1457    0.1103
    Error(week)    60     32552.6000       542.5433

    Greenhouse-Geisser Epsilon    0.4856
    Huynh-Feldt Epsilon           0.7191

The GLM Procedure
Repeated Measures Analysis of Variance
Analysis of Variance of Contrast Variables

week_N represents the contrast between the nth level of week and the mean of subsequent levels

Contrast Variable: week_1

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    114791.2560    114791.2560      93.69    <.0001
    dose       2       343.6960       171.8480       0.14    0.8705
    Error     12     14701.9680      1225.1640

Contrast Variable: week_2

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    35065.83750    35065.83750      64.01    <.0001
    dose       2      481.90000      240.95000       0.44    0.6541
    Error     12     6574.32500      547.86042

Contrast Variable: week_3

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    2200.185185    2200.185185       3.10    0.1037
    dose       2    3888.059259    1944.029630       2.74    0.1046
    Error     12    8512.755556     709.396296

Contrast Variable: week_4

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    12936.01667    12936.01667      20.93    0.0006
    dose       2     8797.73333     4398.86667       7.12    0.0092
    Error     12     7416.50000      618.04167

Contrast Variable: week_5

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    Mean       1    17136.60000    17136.60000      27.69    0.0002
    dose       2      619.20000      309.60000       0.50    0.6184
    Error     12     7425.20000      618.76667

CHAPTER 3    ST 732, M. DAVIDIAN

3 Random vectors and multivariate normal distribution

As we saw in Chapter 1, a natural way to think about repeated measurement data is as a series of random vectors, one vector corresponding to each unit. Because the way in which these vectors of measurements turn out is governed by probability, we need to discuss extensions of usual univariate probability distributions for (scalar) random variables to multivariate probability distributions governing random vectors.

3.1 Preliminaries

First, it is wise to review the important concepts of random variable and probability distribution and how we use these to model individual observations.

RANDOM VARIABLE: We may think of a random variable Y as a characteristic whose values may vary. The way it takes on values is
described by a probability distribution.

CONVENTION, REPEATED: It is customary to use upper case letters, e.g. $Y$, to denote a generic random variable and lower case letters, e.g. $y$, to denote a particular value that the random variable may take on or that may be observed (data).

EXAMPLE: Suppose we are interested in the characteristic "body weight of rats" in the population of all possible rats of a certain age, gender, and type. We might let

\[ Y = \text{body weight of a (randomly chosen) rat} \]

from this population. $Y$ is a random variable.

We may conceptualize that body weights of rats are distributed in this population, in the sense that some values are more common (i.e. more rats have them) than others. If we randomly select a rat from the population, then the chance it has a certain body weight will be governed by this distribution of weights in the population. Formally, values that $Y$ may take on are distributed in the population according to an associated probability distribution that describes how likely the values are in the population.

In a moment, we will consider more carefully why rat weights we might see vary. First, we recall the following.

POPULATION MEAN AND VARIANCE: Recall that the mean and variance of a probability distribution summarize notions of "center" and "spread" or "variability" of all possible values. Consider a random variable $Y$ with an associated probability distribution.

The population mean may be thought of as the average of all possible values that $Y$ could take on, so the average of all possible values across the entire distribution. Note that some values occur more frequently (are more likely) than others, so this average reflects this. We write

\[ E(Y) \qquad (3.1) \]

to denote this average, the population mean. The expectation operator $E$ denotes that the "averaging" operation over all possible values of its argument is to be carried out. Formally, the average may be thought of as a "weighted" average, where each possible value is represented in accordance with the probability with which it
occurs in the population. The symbol $\mu$ is often used.

The population mean may be thought of as a way of describing the "center" of the distribution of all possible values. The population mean is also referred to as the expected value or expectation of $Y$.

Recall that if we have a random sample of observations on a random variable $Y$, say $Y_1, \ldots, Y_n$, then the sample mean is just the average of these:

\[ \bar{Y} = n^{-1} \sum_{j=1}^{n} Y_j. \]

For example, if $Y$ = rat weight, and we were to obtain a random sample of $n = 50$ rats and weigh each, then $\bar{Y}$ represents the average we would obtain.

• The sample mean is a natural estimator for the population mean of the probability distribution from which the random sample was drawn.

The population variance may be thought of as measuring the spread of all possible values that may be observed, based on the squared deviations of each value from the "center" of the distribution of all possible values. More formally, variance is based on averaging squared deviations across the population, which is represented using the expectation operator, and is given by

\[ \text{var}(Y) = E\{(Y - \mu)^2\}, \quad \mu = E(Y). \qquad (3.2) \]

(3.2) shows the interpretation of variance as an average of squared deviations from the mean across the population, taking into account that some values are more likely (occur with higher probability) than others.

• The use of squared deviations takes into account the magnitude of the distance from the "center" but not its direction, so it attempts to measure only "spread" (in either direction).

The symbol $\sigma^2$ is often used generically to represent population variance. Figure 1 shows two normal distributions with the same mean but different variances $\sigma_1^2 < \sigma_2^2$, illustrating how variance describes the "spread" of possible values.

Figure 1: Normal distributions with mean $\mu$ but different variances.

Variance is on the scale of the response, squared. A measure of spread that is on the same scale as the response is the population standard deviation, defined as $\sqrt{\text{var}(Y)}$. The symbol $\sigma$ is often used.

Recall that for a random sample as above, the
sample variance is (almost) the average of the squared deviations of each observation $Y_j$ from the sample mean $\bar{Y}$:

\[ S^2 = (n-1)^{-1} \sum_{j=1}^{n} (Y_j - \bar{Y})^2. \]

• The sample variance is used as an estimator for population variance. Division by $(n-1)$ rather than $n$ is used so that the estimator is unbiased, i.e. estimates the true population variance well even if the sample size $n$ is small.

• The sample standard deviation is just the square root of the sample variance, often represented by the symbol $S$.

GENERAL FACTS: If $b$ is a fixed scalar and $Y$ is a random variable, then

• $E(bY) = bE(Y) = b\mu$; i.e. all values in the average are just multiplied by $b$. Also, $E(Y + b) = E(Y) + b$; adding a constant to each value in the population will just shift the average by this same amount.

• $\text{var}(bY) = E\{(bY - b\mu)^2\} = b^2 \text{var}(Y)$; i.e. all values in the average are just multiplied by $b^2$. Also, $\text{var}(Y + b) = \text{var}(Y)$; adding a constant to each value in the population does not affect how they vary about the mean (which is also shifted by this amount).

SOURCES OF VARIATION: We now consider why the values of a characteristic that we might observe vary. Consider again the rat weight example.

• Biological variation. It is well known that biological entities are different; although living things of the same type tend to be similar in their characteristics, they are not exactly the same (except perhaps in the case of genetically identical clones). Thus, even if we focus on rats of the same strain, age, and gender, we expect variation in the possible weights of such rats that we might observe due to inherent, natural biological variation.

Let $Y$ represent the weight of a randomly chosen rat, with probability distribution having mean $\mu$. If all rats were biologically identical, then the population variance of $Y$ would be equal to 0, and we would expect all rats to have exactly weight $\mu$. Of course, because rat weights vary as a consequence of biological factors, the variance is $> 0$, and thus the weight of a randomly chosen rat is not equal to $\mu$ but rather deviates from $\mu$ by some
positive or negative amount. From this view, we might think of $Y$ as being represented by

\[ Y = \mu + b, \qquad (3.3) \]

where $b$ is a random variable with population mean $E(b) = 0$ and variance $\text{var}(b) = \sigma_b^2$, say.

Here, $Y$ is "decomposed" into its mean value (a systematic component) and a random deviation $b$ that represents by how much a rat weight might deviate from the mean rat weight due to inherent biological factors.

(3.3) is a simple statistical model that emphasizes that we believe rat weights we might see vary because of biological phenomena. Note that (3.3) implies that $E(Y) = \mu$ and $\text{var}(Y) = \sigma_b^2$.

• Measurement error. We have discussed rat weight as though, once we have a rat in hand, we may know its weight exactly. However, a scale usually must be used. Ideally, a scale should register the true weight of an item each time it is weighed, but, because such devices are imperfect, measurements on the same item may vary time after time. The amount by which the measurement differs from the truth may be thought of as an error, i.e. a deviation up or down from the true value that could be observed with a "perfect" device. A "fair" or unbiased device does not systematically register high or low most of the time; rather, the errors may go in either direction with no pattern.

Thus, if we only have an unbiased scale on which to weigh rats, a rat weight we might observe reflects not only the true weight of the rat, which varies across rats, but also the error in taking the measurement. We might think of a random variable $e$, say, that represents the error that might contaminate a measurement of rat weight, taking on possible values in a hypothetical "population" of all such errors the scale might commit.

We still believe rat weights vary due to biological variation, but what we see is also subject to measurement error. It thus makes sense to revise our thinking of what $Y$ represents, and think of

\[ Y = \text{measured weight of a randomly chosen rat}. \]

The population of all possible values $Y$ could take on is all possible values of rat weight we
might measure; i.e. all values consisting of a true weight of a rat from the population of all rats contaminated by a measurement error from the population of all possible such errors.

With this thinking, it is natural to represent $Y$ as

\[ Y = \mu + b + e = \mu + \epsilon, \qquad (3.4) \]

where $b$ is as in (3.3), and $e$ is the deviation due to measurement error, with $E(e) = 0$ and $\text{var}(e) = \sigma_e^2$, representing an unbiased but imprecise scale.

In (3.4), $\epsilon = b + e$ represents the aggregate deviation due to the effects of both biological variation and measurement error. Here, $E(\epsilon) = 0$ and, taking $b$ and $e$ to be uncorrelated, $\text{var}(\epsilon) = \sigma^2 = \sigma_b^2 + \sigma_e^2$, so that $E(Y) = \mu$ and $\text{var}(Y) = \sigma^2$ according to the model (3.4). Here, $\sigma^2$ reflects the "spread" of measured rat weights and depends on both the spread in true rat weights and the spread in errors that could be committed in measuring them.

There are still further sources of variation that we could consider; we defer discussion to later in the course. For now, the important message is that, in considering statistical models, it is critical to be aware of different sources of variation that cause observations to vary. This is especially important with longitudinal data, as we will see.

We now consider these concepts in the context of a familiar statistical model.

SIMPLE LINEAR REGRESSION: Consider the simple linear regression model. At each fixed value $x_1, \ldots, x_n$, we observe a corresponding random variable $Y_j$, $j = 1, \ldots, n$. For example, suppose that the $x_j$ are doses of a drug. For each $x_j$, a rat is randomly chosen and given this dose. The associated response for the $j$th rat (given dose $x_j$) may be represented by $Y_j$.

The simple linear regression model as usually stated is

\[ Y_j = \beta_0 + \beta_1 x_j + \epsilon_j, \]

where $\epsilon_j$ is a random variable with mean 0 and variance $\sigma^2$; that is, $E(\epsilon_j) = 0$, $\text{var}(\epsilon_j) = \sigma^2$. Thus, $E(Y_j) = \beta_0 + \beta_1 x_j$ and $\text{var}(Y_j) = \sigma^2$.

This model says that, ideally, at each $x_j$, the response of interest $Y_j$ should be exactly equal to the fixed value $\beta_0 + \beta_1 x_j$, the mean of $Y_j$. However, because of factors like (i) biological variation and (ii) measurement error, the values we might see at $x_j$ vary. In the model, $\epsilon_j$ represents the deviation from $\beta_0 + \beta_1 x_j$ that might
occur because of the aggregate effect of these sources of variation.

If $Y_j$ is a continuous random variable, it is often the case that the normal distribution is a reasonable probability model for the population of $\epsilon_j$ values; that is,

\[ \epsilon_j \sim N(0, \sigma^2). \]

This says that the total effect of all sources of variation is to create deviations from the mean of $Y_j$ that may be equally likely in either direction, as dictated by the symmetric normal probability distribution.

Under this assumption, we have that the population of observations we might see at a particular $x_j$ is also normal and centered at $\beta_0 + \beta_1 x_j$; i.e.

\[ Y_j \sim N(\beta_0 + \beta_1 x_j, \sigma^2). \]

• This model says that the chance of seeing $Y_j$ values above or below the mean $\beta_0 + \beta_1 x_j$ is the same (symmetry).

• This is an especially good model when the predominant source of variation (represented by the $\epsilon_j$) is due to a measuring device.

• It may or may not be such a good model when the predominant source of variation is due to biological phenomena (more later in the course).

The model thus says that, at each $x_j$, there is a population of possible $Y_j$ values we might see, with mean $\beta_0 + \beta_1 x_j$ and variance $\sigma^2$. We can represent this pictorially by considering Figure 2.

Figure 2: Simple linear regression.

"ERROR": An unfortunate convention in the literature is that the $\epsilon_j$ are referred to as errors, which causes some people to believe that they represent solely deviation due to measurement error. We prefer the term deviation to emphasize that $Y_j$ values may deviate from $\beta_0 + \beta_1 x_j$ due to the combined effects of several sources (including, but not limited to, measurement error).

INDEPENDENCE: An important assumption for simple linear regression, and indeed more general problems, is that the random variables $Y_j$, or equivalently the $\epsilon_j$, are independent.

Statistical independence is a formal statistical concept with an important practical interpretation. In particular, in our simple linear regression model, this says that the way in which $Y_j$ at $x_j$ takes on its values is completely unrelated to the way in
which $Y_{j'}$, observed at another position $x_{j'}$, takes on its values. This is certainly a reasonable assumption in many situations.

• In our example, where the $x_j$ are doses of a drug, each given to a different rat, there is no reason to believe that responses from different rats should be related in any way. Thus, the way in which $Y_j$ values turn out at different $x_j$ would be totally unrelated.

The consequence of independence is that we may think of data on an observation-by-observation basis; because the behavior of each observation is unrelated to that of others, we may talk about each one in its own right, without reference to the others.

Although this way of thinking may be relevant for regression problems where the data were collected according to a scheme like that in the example above, as we will see, it may not be relevant for longitudinal data.

3.2 Random vectors

As we have already mentioned, when several observations are taken on the same unit, it will be convenient, and in fact necessary, to talk about them together. We thus must extend our way of thinking about random variables and probability distributions.

RANDOM VECTOR: A random vector is a vector whose elements are random variables. Let

\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \]

be a $(n \times 1)$ random vector.

• Each element of $\mathbf{Y}$, $Y_j$, $j = 1, \ldots, n$, is a random variable with its own mean, variance, and probability distribution; e.g.

\[ E(Y_j) = \mu_j, \quad \text{var}(Y_j) = E\{(Y_j - \mu_j)^2\} = \sigma_j^2. \]

We might furthermore have that $Y_j$ is normally distributed; i.e. $Y_j \sim N(\mu_j, \sigma_j^2)$.

• Thus, if we talk about a particular element of $\mathbf{Y}$ in its own right, we may speak in terms of its particular probability distribution, mean, and variance.

Probability distributions for single random variables are often referred to as univariate, because they refer only to how one (scalar) random variable takes on its values.

JOINT VARIATION: However, if we think of the elements of $\mathbf{Y}$ together, we must consider the fact that they come together in a group, so that there might be relationships among them. Specifically, if we
think of $\mathbf{Y}$ as containing possible observations on the same unit at times indexed by $j$, there is reason to expect that the value observed at one time and that observed at another time may turn out the way they do in a "common" fashion. For example,

• If $\mathbf{Y}$ consists of the heights of a pine seedling measured on each of $n$ consecutive days, we might expect a "large" value one day to be followed by a "large" value the next day.

• If $\mathbf{Y}$ consists of the lengths of baby rats in a litter of size $n$ from a particular mother, we might expect all the babies in a litter to be "large" or "small" relative to babies from other litters.

This suggests that if observations can be naturally thought to arise together, then they may not be legitimately viewed as independent, but rather related somehow.

• In particular, they may be thought to vary together, or covary.

• This suggests that we need to think of how they take on values jointly.

JOINT PROBABILITY DISTRIBUTION: Just as we think of a probability distribution for a random variable as describing the frequency with which the variable may take on values, we may think of a joint probability distribution that describes the frequency with which an entire set of random variables takes on values together. Such a distribution is referred to as multivariate for obvious reasons. We will consider the specific case of the multivariate normal distribution shortly.

We may thus think of any two random variables in $\mathbf{Y}$, $Y_j$ and $Y_k$, say, as having a joint probability distribution that describes how they take on values together.

COVARIANCE: A measure of how two random variables vary together is the covariance. Formally, suppose $Y_j$ and $Y_k$ are two random variables that vary together. Each of them has its own probability distribution, with means $\mu_j$ and $\mu_k$, respectively, which is relevant when we think of them separately. They also have a joint probability distribution, which is relevant when we think of them together. Then we define the covariance between $Y_j$ and $Y_k$ as

\[ \text{cov}(Y_j, Y_k) = E\{(Y_j - \mu_j)(Y_k - \mu_k)\}. \qquad (3.5) \]

Here, the
expectation operator denotes average over all possible pairs of values $Y_j$ and $Y_k$ may take on together, according to their joint probability distribution.

Inspection of (3.5) shows:

• Covariance is defined as the average, across all possible values that $Y_j$ and $Y_k$ may take on jointly, of the product of the deviations of $Y_j$ and $Y_k$ from their respective means.

• Thus, note that if "large" values ("larger" than their means) of $Y_j$ and $Y_k$ tend to happen together (and thus "small" values of $Y_j$ and $Y_k$ tend to happen together), then the two deviations $(Y_j - \mu_j)$ and $(Y_k - \mu_k)$ will tend to be positive together and negative together, so that the product

\[ (Y_j - \mu_j)(Y_k - \mu_k) \qquad (3.6) \]

will tend to be positive for most of the pairs of values in the population. Thus, the average in (3.5) will likely be positive.

• Conversely, if "large" values of $Y_j$ tend to happen coincidently with "small" values of $Y_k$ and vice versa, then the deviation $(Y_j - \mu_j)$ will tend to be positive when $(Y_k - \mu_k)$ tends to be negative, and vice versa. Thus, the product (3.6) will tend to be negative for most of the pairs of values in the population, and the average in (3.5) will likely be negative.

• Moreover, if in truth $Y_j$ and $Y_k$ are unrelated, so that "large" $Y_j$ are likely to happen with "small" $Y_k$ and "large" $Y_k$ and vice versa, then we would expect the deviations $(Y_j - \mu_j)$ and $(Y_k - \mu_k)$ to be positive and negative in no real systematic way. Thus, (3.6) may be negative or positive with no special tendency, and the average in (3.5) would likely be zero.

Thus, the quantity of covariance defined in (3.5) makes intuitive sense as a measure of how "associated" values of $Y_j$ are with values of $Y_k$.

• In the last bullet above, $Y_j$ and $Y_k$ are unrelated, and we argued that $\text{cov}(Y_j, Y_k) = 0$. In fact, formally, if $Y_j$ and $Y_k$ are statistically independent, then it follows that $\text{cov}(Y_j, Y_k) = 0$.

• Note that $\text{cov}(Y_j, Y_k) = \text{cov}(Y_k, Y_j)$.

• Fact: the covariance of a random variable $Y_j$ and itself is

\[ \text{cov}(Y_j, Y_j) = E\{(Y_j - \mu_j)(Y_j - \mu_j)\} = E\{(Y_j - \mu_j)^2\} = \text{var}(Y_j) = \sigma_j^2. \]

• Fact: If we have two random variables $Y_j$ and $Y_k$, then

\[ \text{var}(Y_j + Y_k) = \text{var}(Y_j) + \text{var}(Y_k) + 2\,\text{cov}(Y_j, Y_k). \]

That is, the variance of the
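population consisting of all possible values of the sum $Y_j + Y_k$ is the sum of the variances for each population, adjusted by how "associated" the two values are. Note that if $Y_j$ and $Y_k$ are independent, $\text{var}(Y_j + Y_k) = \text{var}(Y_j) + \text{var}(Y_k)$.

We now see how all of this information is summarized.

EXPECTATION OF A RANDOM VECTOR: For an entire $n$-dimensional random vector $\mathbf{Y}$, we summarize the means for each element in a vector

\[ \boldsymbol{\mu} = \begin{pmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_n) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}. \]

We define the expected value or mean of $\mathbf{Y}$ as

\[ E(\mathbf{Y}) = \boldsymbol{\mu}; \]

the expectation operation is applied to each element in the vector $\mathbf{Y}$, yielding the vector $\boldsymbol{\mu}$ of means.

RANDOM MATRIX: A random matrix is simply a matrix whose elements are random variables; we will see a specific example of importance to us in a moment. Formally, if $\mathcal{Y}$ is a $(r \times c)$ matrix with $(j,k)$ element $Y_{jk}$, each a random variable, then each element has an expectation, $E(Y_{jk}) = \mu_{jk}$, say. Then the expected value or mean of $\mathcal{Y}$ is defined as the corresponding matrix of means; i.e.

\[ E(\mathcal{Y}) = \begin{pmatrix} E(Y_{11}) & E(Y_{12}) & \cdots & E(Y_{1c}) \\ \vdots & \vdots & & \vdots \\ E(Y_{r1}) & E(Y_{r2}) & \cdots & E(Y_{rc}) \end{pmatrix}. \]

COVARIANCE MATRIX: We now see how this concept is used to summarize information on covariance among the elements of a random vector. Note that

\[ (\mathbf{Y} - \boldsymbol{\mu})(\mathbf{Y} - \boldsymbol{\mu})' = \begin{pmatrix} (Y_1 - \mu_1)^2 & (Y_1 - \mu_1)(Y_2 - \mu_2) & \cdots & (Y_1 - \mu_1)(Y_n - \mu_n) \\ (Y_2 - \mu_2)(Y_1 - \mu_1) & (Y_2 - \mu_2)^2 & \cdots & (Y_2 - \mu_2)(Y_n - \mu_n) \\ \vdots & \vdots & & \vdots \\ (Y_n - \mu_n)(Y_1 - \mu_1) & (Y_n - \mu_n)(Y_2 - \mu_2) & \cdots & (Y_n - \mu_n)^2 \end{pmatrix}, \]

which is a random matrix. Note then that

\[ E\{(\mathbf{Y} - \boldsymbol{\mu})(\mathbf{Y} - \boldsymbol{\mu})'\} = \begin{pmatrix} E(Y_1 - \mu_1)^2 & E(Y_1 - \mu_1)(Y_2 - \mu_2) & \cdots & E(Y_1 - \mu_1)(Y_n - \mu_n) \\ E(Y_2 - \mu_2)(Y_1 - \mu_1) & E(Y_2 - \mu_2)^2 & \cdots & E(Y_2 - \mu_2)(Y_n - \mu_n) \\ \vdots & \vdots & & \vdots \\ E(Y_n - \mu_n)(Y_1 - \mu_1) & E(Y_n - \mu_n)(Y_2 - \mu_2) & \cdots & E(Y_n - \mu_n)^2 \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{pmatrix} = \boldsymbol{\Sigma}, \]

say, where for $j, k = 1, \ldots, n$, $\text{var}(Y_j) = \sigma_j^2$ and we define $\text{cov}(Y_j, Y_k) = \sigma_{jk}$.

The matrix $\boldsymbol{\Sigma}$ is called the covariance matrix or variance-covariance matrix of $\mathbf{Y}$.

• Note that $\sigma_{jk} = \sigma_{kj}$, so that $\boldsymbol{\Sigma}$ is a symmetric, square matrix.

• We will write succinctly $\text{var}(\mathbf{Y}) = \boldsymbol{\Sigma}$ to state that the random vector $\mathbf{Y}$ has covariance matrix $\boldsymbol{\Sigma}$.

JOINT PROBABILITY DISTRIBUTION: It follows that, if we consider the joint probability distribution describing how the entire set of elements of $\mathbf{Y}$ take on values together, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the features of this distribution characterizing "center" and "spread and association."

• $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are referred to as the population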
mean and population covariance matrix for the population of data vectors represented by the joint probability distribution.

• The symbols $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are often used generically to represent population mean and covariance, as above.

CORRELATION: It is informative to separate the information on "spread" contained in the variances $\sigma_j^2$ from that describing "association." Thus, we define a particular measure of association that takes into account the fact that different elements of $\mathbf{Y}$ may vary differently on their own.

The population correlation coefficient between $Y_j$ and $Y_k$ is defined as

\[ \rho_{jk} = \frac{\sigma_{jk}}{\sigma_j \sigma_k}. \]

Of course, $\sigma_j = +\sqrt{\sigma_j^2}$ is the population standard deviation of $Y_j$, on the same scale of measurement as $Y_j$, and similarly for $Y_k$.

• $\rho_{jk}$ scales the information on association in the covariance in accordance with the magnitude of variation in each random variable, creating a "unitless" measure. Thus, it allows one to think of the associations among variables measured on different scales.

• $\rho_{jk} = \rho_{kj}$.

• Note that if $\sigma_{jk} = \sigma_j \sigma_k$, then $\rho_{jk} = 1$. Intuitively, if this is true, it says that the way $Y_j$ and $Y_k$ vary separately is identical to how they vary together, so that if we know one, we know the other. Thus, a correlation of 1 indicates that the two random variables are "perfectly positively associated." Similarly, if $\sigma_{jk} = -\sigma_j \sigma_k$, then $\rho_{jk} = -1$, and by the same reasoning they are "perfectly negatively associated."

• Clearly, $\rho_{jj} = 1$, so a random variable is perfectly positively correlated with itself.

• It may be shown that correlations must satisfy $-1 \le \rho_{jk} \le 1$.

• If $\sigma_{jk} = 0$, then $\rho_{jk} = 0$, so if $Y_j$ and $Y_k$ are independent, then they have 0 correlation.

CORRELATION MATRIX: It is customary to summarize the information on correlations in a matrix. The correlation matrix $\boldsymbol{P}$ is defined as

\[ \boldsymbol{P} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{pmatrix}. \]

For now, we use the symbol $\boldsymbol{P}$ to denote the correlation matrix of a random vector.

ALTERNATIVE REPRESENTATION OF COVARIANCE MATRIX: Note that knowledge of the variances $\sigma_1^2, \ldots, \sigma_n^2$ and the
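correlation matrix $\boldsymbol{P}$ is equivalent to knowledge of $\boldsymbol{\Sigma}$, and vice versa. It is often easier to think of associations among random variables on the unitless correlation scale than in terms of covariance; thus, it is often convenient to write the covariance matrix another way that presents the correlations explicitly.

Define the "standard deviation" matrix

\[ \boldsymbol{T}^{1/2} = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{pmatrix}. \]

The "1/2" reminds us that this is a diagonal matrix with the square roots of the variances on the diagonal. Then it may be verified that (try it)

\[ \boldsymbol{T}^{1/2} \boldsymbol{P} \boldsymbol{T}^{1/2} = \boldsymbol{\Sigma}. \qquad (3.7) \]

The representation (3.7) will prove convenient when we wish to discuss associations implied by models for longitudinal data in terms of correlations. Moreover, it is useful to appreciate (3.7), as it allows calculations involving $\boldsymbol{\Sigma}$ that we will see later to be implemented easily on a computer.

GENERAL FACTS: As we will see later, we will often be interested in linear combinations of the elements of a random vector $\mathbf{Y}$; that is, functions of the form

\[ c_1 Y_1 + \cdots + c_n Y_n, \]

which may be written succinctly as $\mathbf{c}'\mathbf{Y}$, where $\mathbf{c}$ is the column vector

\[ \mathbf{c} = (c_1, \ldots, c_n)'. \]

Note that $\mathbf{c}'\mathbf{Y}$ is a scalar quantity. It is possible, using facts on the multiplication of random variables by scalars (see above) and the definitions of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, to show that

\[ E(\mathbf{c}'\mathbf{Y}) = \mathbf{c}'\boldsymbol{\mu}, \quad \text{var}(\mathbf{c}'\mathbf{Y}) = \mathbf{c}'\boldsymbol{\Sigma}\mathbf{c}. \]

(Try to verify these.)

More generally, if we have a set of $q$ such linear combinations defined by vectors $\mathbf{c}_1, \ldots, \mathbf{c}_q$, we may summarize them all in a matrix whose rows are the $\mathbf{c}_k'$; i.e.

\[ \mathbf{C} = \begin{pmatrix} \mathbf{c}_1' \\ \vdots \\ \mathbf{c}_q' \end{pmatrix}. \]

Then $\mathbf{C}\mathbf{Y}$ is a $(q \times 1)$ random vector. For example, if we consider the simple linear model in matrix notation, we noted earlier that, if $\mathbf{Y}$ is the random vector consisting of the observations, then the least squares estimator of $\boldsymbol{\beta}$ is given by

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \]

which is such a linear combination. It may be shown using the above that

\[ E(\mathbf{C}\mathbf{Y}) = \mathbf{C}\boldsymbol{\mu}, \quad \text{var}(\mathbf{C}\mathbf{Y}) = \mathbf{C}\boldsymbol{\Sigma}\mathbf{C}'. \]

Finally, the results above may be generalized. If $\mathbf{a}$ is a $(q \times 1)$ vector, then

• $E(\mathbf{C}\mathbf{Y} + \mathbf{a}) = \mathbf{C}\boldsymbol{\mu} + \mathbf{a}$.

• $\text{var}(\mathbf{C}\mathbf{Y} + \mathbf{a}) = \mathbf{C}\boldsymbol{\Sigma}\mathbf{C}'$.

• We will make extensive use of this result.

• It is important to recognize that there is nothing mysterious about these results; they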
merely represent a streamlined way of summarizing information on operations performed on all elements of a random vector succinctly. For example, the first result on $E(\mathbf{C}\mathbf{Y} + \mathbf{a})$ just summarizes what the expected value of several different combinations of the elements of $\mathbf{Y}$ is, where each is shifted by a constant (the corresponding element in $\mathbf{a}$). Operationally, the results follow from applying the above definitions and matrix operations.

3.3 The multivariate normal distribution

A fundamental theme in much of statistical methodology is that the normal probability distribution is a reasonable model for the population of possible values taken on by many random variables of interest. In particular, the normal distribution is often (but not always) a good approximation to the true probability distribution for a random variable $Y$ when the random variable is continuous. Later in the course, we will discuss other probability distributions that are better approximations when the random variable of interest is continuous or discrete.

If we have a random vector $\mathbf{Y}$ with elements that are continuous random variables, then it is natural to consider the normal distribution as a probability model for each element $Y_j$. However, as we have discussed, we are likely to be concerned about associations among the elements of $\mathbf{Y}$. Thus, it does not suffice to describe each of the elements $Y_j$ separately; rather, we seek a probability model that describes their joint behavior. As we have noted, such probability distributions are called multivariate for obvious reasons.

The multivariate normal distribution is the extension of the normal distribution of a single random variable to a random vector composed of elements that are each normally distributed. Through its form, it naturally takes into account correlation among the elements of $\mathbf{Y}$; moreover, it gives a basis for a way of thinking about an extension of "least squares" that is relevant when observations are not independent but rather are correlated.

NORMAL
PROBABILITY DENSITY: Recall that, for a random variable $Y$, the normal distribution has probability density function

\[ f(y) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\{-(y - \mu)^2/(2\sigma^2)\}. \qquad (3.8) \]

This function has the shape shown in Figure 3. The shape will vary in terms of "center" and "spread" according to the values of the population mean $\mu$ and variance $\sigma^2$ (e.g. recall Figure 1).

Figure 3: Normal density function with mean $\mu$.

Several features are evident from the form of (3.8):

• The form of the function is determined by $\mu$ and $\sigma^2$. Thus, if we know the population mean and variance of a random variable $Y$, and we know it is normally distributed, we know everything about the probabilities associated with values of $Y$, because we then know the function (3.8) completely.

• The form of (3.8) depends critically on the term

\[ \frac{(y - \mu)^2}{\sigma^2} = (y - \mu)(\sigma^2)^{-1}(y - \mu). \qquad (3.9) \]

Note that this term depends on the squared deviation $(y - \mu)^2$.

• The deviation is standardized by the standard deviation $\sigma$, which has the same units as $y$, so that it is put on a unitless basis. This standardized deviation has the interpretation of a distance measure: it measures how far $y$ is from $\mu$, and then puts the result on a unitless basis relative to the "spread" about $\mu$ expected.

• Thus, the normal distribution and methods such as least squares, which depends on minimizing a sum of squared deviations, have an intimate connection. We will use this connection to motivate the interpretation of the form of the multivariate normal distribution informally now. Later in the course, we will be more formal about this connection.

SIMPLE LINEAR REGRESSION: For now, to appreciate this form and its extension, consider the method of least squares for fitting a simple linear regression. (The same considerations apply to multiple linear regression, which will be discussed later in this chapter.) As before, at each fixed value $x_1, \ldots, x_n$, there is a corresponding random variable $Y_j$, $j = 1, \ldots, n$, which is assumed to arise from

\[ Y_j = \beta_0 + \beta_1 x_j + \epsilon_j, \quad \boldsymbol{\beta} = (\beta_0, \beta_1)'. \]

The further assumption is that the $Y_j$ are each normally distributed with means $\mu_j = \beta_0 + \beta_1 x_j$ and
variance $\sigma^2$.

• Thus, each $Y_j \sim N(\mu_j, \sigma^2)$, so that they have different means but the same variance.

• Furthermore, the $Y_j$ are assumed to be independent.

The method of least squares is to minimize in $\boldsymbol{\beta}$ the sum of squared deviations $\sum_{j=1}^{n} (Y_j - \mu_j)^2$, which is the same as minimizing

\[ \sum_{j=1}^{n} (Y_j - \mu_j)^2/\sigma^2, \qquad (3.10) \]

as $\sigma^2$ is just a constant. Pictorially, realizations of such deviations are shown in Figure 4.

Figure 4: Deviations from the mean in simple linear regression.

IMPORTANT POINTS:

• Each deviation gets "equal weight" in (3.10); all are weighted by the same constant, $1/\sigma^2$. This makes sense: if each $Y_j$ has the same variance, then each is subject to the same magnitude of variation, so the information on the population at $x_j$ provided by $Y_j$ is of equal quality. Thus, information from all $Y_j$ is treated as equally valuable in determining $\boldsymbol{\beta}$.

• The deviations corresponding to each observation are summed, so that each contributes to (3.10) in its own right, unrelated to the contributions of any others.

• (3.10) is like an overall distance measure of $Y_j$ values from their means $\mu_j$ (put on a unitless basis relative to the "spread" expected for any $Y_j$).

MULTIVARIATE NORMAL PROBABILITY DENSITY: The joint probability distribution that is the extension of (3.8) to a $(n \times 1)$ random vector $\mathbf{Y}$, each of whose components are normally distributed (but possibly associated), is given by

\[ f(\mathbf{y}) = \frac{1}{(2\pi)^{n/2}} |\boldsymbol{\Sigma}|^{-1/2} \exp\{-(\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu})/2\}. \qquad (3.11) \]

• (3.11) describes the probabilities with which the random vector $\mathbf{Y}$ takes on values jointly in its $n$ elements.

• The form of (3.11) is determined by $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. Thus, as in the univariate case, if we know the mean vector and covariance matrix of a random vector $\mathbf{Y}$, and we know each of its elements are normally distributed, then we know everything about the joint probabilities associated with values $\mathbf{y}$ of $\mathbf{Y}$.

• By analogy to (3.9), the form of $f(\mathbf{y})$ depends critically on the term

\[ (\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}). \qquad (3.12) \]

Note that this is a quadratic form, so it is a scalar function of the elements of $(\mathbf{y} - \boldsymbol{\mu})$ and $\boldsymbol{\Sigma}^{-1}$. Specifically, if we refer to the
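elements of $\boldsymbol{\Sigma}^{-1}$ as $\sigma^{jk}$, i.e.

\[ \boldsymbol{\Sigma}^{-1} = \begin{pmatrix} \sigma^{11} & \cdots & \sigma^{1n} \\ \vdots & & \vdots \\ \sigma^{n1} & \cdots & \sigma^{nn} \end{pmatrix}, \]

then we may write

\[ (\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}) = \sum_{j=1}^{n} \sum_{k=1}^{n} \sigma^{jk}(y_j - \mu_j)(y_k - \mu_k). \qquad (3.13) \]

Of course, the elements $\sigma^{jk}$ will be complicated functions of the elements $\sigma_j^2$, $\sigma_{jk}$ of $\boldsymbol{\Sigma}$, i.e. the variances of the $Y_j$ and the covariances among them.

• This term thus depends on not only the squared deviations $(y_j - \mu_j)^2$ for each element in $\mathbf{y}$ (which arise in the double sum when $j = k$), but also on the crossproducts $(y_j - \mu_j)(y_k - \mu_k)$. Each contribution of these squares and crossproducts is being "standardized" somehow by values $\sigma^{jk}$ that involve the variances and covariances.

• Thus, although it is quite complicated, one gets the suspicion that (3.13) has an interpretation, albeit more complex, as a distance measure, just as in the univariate case.

BIVARIATE NORMAL DISTRIBUTION: To gain insight into this suspicion, and to get a better understanding of the multivariate distribution, it is instructive to consider the special case $n = 2$, the simplest example of a multivariate normal distribution (hence the name bivariate). Here,

\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad \boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}. \]

Using the inversion formula for a $(2 \times 2)$ matrix given in Chapter 2,

\[ \boldsymbol{\Sigma}^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2} \begin{pmatrix} \sigma_2^2 & -\sigma_{12} \\ -\sigma_{12} & \sigma_1^2 \end{pmatrix}. \]

We also have that the correlation between $Y_1$ and $Y_2$ is given by

\[ \rho_{12} = \frac{\sigma_{12}}{\sigma_1 \sigma_2}. \]

Using these results, it is an algebraic exercise to show that (try it)

\[ (\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}) = \frac{1}{1 - \rho_{12}^2} \left\{ \frac{(y_1 - \mu_1)^2}{\sigma_1^2} + \frac{(y_2 - \mu_2)^2}{\sigma_2^2} - 2\rho_{12} \frac{(y_1 - \mu_1)}{\sigma_1} \frac{(y_2 - \mu_2)}{\sigma_2} \right\}. \qquad (3.14) \]

Compare this expression to the general one (3.13). Inspection of (3.14) shows that the quadratic form involves two components:

• The sum of standardized squared deviations

\[ \frac{(y_1 - \mu_1)^2}{\sigma_1^2} + \frac{(y_2 - \mu_2)^2}{\sigma_2^2}. \]

This sum alone is in the spirit of the sum of squared deviations in least squares, with the difference that each deviation is now weighted in accordance with its variance. This makes sense: because the variances of $Y_1$ and $Y_2$ differ, information on the population of $Y_1$ values is of "different quality" than that on the population of $Y_2$ values. If variance is "large," the quality of information is poorer; thus,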
the larger the variance, the smaller the "weight," so that information of "higher quality" receives more weight in the overall measure. Indeed, then, this is like a "distance measure," where each contribution receives an appropriate weight.

• In addition, there is an "extra" term that makes (3.14) have a different form than just a sum of weighted squared deviations:

\[ -2\rho_{12} \frac{(y_1 - \mu_1)}{\sigma_1} \frac{(y_2 - \mu_2)}{\sigma_2}. \]

This term depends on the crossproduct, where each deviation is again weighted in accordance with its variance. This term modifies the "distance measure" in a way that is connected with the association between $Y_1$ and $Y_2$ through their crossproduct and their correlation $\rho_{12}$. Note that the larger this correlation in magnitude (either positive or negative), the more we modify the usual sum of squared deviations.

• Note that the entire quadratic form also involves the multiplicative factor $1/(1 - \rho_{12}^2)$, which is greater than 1 if $|\rho_{12}| > 0$. This factor scales the overall distance measure in accordance with the magnitude of the association.

INTERPRETATION: Based on the above observations, we have the following practical interpretation of (3.14):

• (3.14) is an overall measure of distance of the value $\mathbf{y}$ of $\mathbf{Y}$ from its mean $\boldsymbol{\mu}$. It contains the usual distance measure, a sum of appropriately weighted squared deviations.

• However, if $Y_1$ and $Y_2$ are positively correlated, $\rho_{12} > 0$, it is likely that the crossproduct $(Y_1 - \mu_1)(Y_2 - \mu_2)$ is positive. The measure of distance is thus reduced (we subtract off a positive quantity). This makes sense: if $Y_1$ and $Y_2$ are positively correlated, knowing one tells us a lot about the other. Thus, we won't have to "travel as far" to get from $Y_1$ to $\mu_1$ and $Y_2$ to $\mu_2$.

• Similarly, if $Y_1$ and $Y_2$ are negatively correlated, $\rho_{12} < 0$, it is likely that the crossproduct $(Y_1 - \mu_1)(Y_2 - \mu_2)$ is negative. The measure of distance is again reduced (we subtract off a positive quantity). Again, if $Y_1$ and $Y_2$ are negatively correlated, knowing one still tells us a lot about the other (in the opposite direction).

• Note that
if $\rho_{12} = 0$, which says that there is no association between values taken on by $Y_1$ and $Y_2$, then the usual distance measure is not modified: there is "nothing to be gained" in traveling from $Y_1$ to $\mu_1$ by knowing $Y_2$, and vice versa.

This interpretation may be more greatly appreciated by examining pictures of the bivariate normal density for different values of the correlation $\rho_{12}$. Note that the density is now an entire surface in 3 dimensions rather than just a curve in the plane, because account is taken of all possible pairs of values of $Y_1$ and $Y_2$. Figure 5 shows the bivariate density function with $\mu_1 = 40$, $\mu_2 = 40$, $\sigma_1^2 = 5$, $\sigma_2^2 = 5$ for $\rho_{12} = 0.8$ and $\rho_{12} = 0.0$.

Figure 5: Bivariate normal distributions with different correlations ($\rho_{12} = 0.8$ and $\rho_{12} = 0.0$).

• The two panels in each row are the surface and a "bird's-eye view" for the 2 $\rho_{12}$ values.

• For $\rho_{12} = 0.8$, a case of strong positive correlation, note that the picture is "tilted" at a 45 degree angle and is quite narrow. This reflects the implication of positive correlation: values of $Y_1$ and $Y_2$ are highly associated. Thus, the "overall distance" of a pair $(Y_1, Y_2)$ from the "center" $\mu$ is constrained by this association.

• For $\rho_{12} = 0$, $Y_1$ and $Y_2$ are not at all associated. Note now that the picture is not "tilted": for a given value of $Y_1$, $Y_2$ can be "anything" within the relevant range of values for each. The "overall distance" of a pair $(Y_1, Y_2)$ from the "center" $\mu$ is not constrained by anything.

INDEPENDENCE: Note that if $Y_1$ and $Y_2$ are independent, then $\rho_{12} = 0$. In this case, the second term in (3.14) disappears, and the entire quadratic form reduces to
$$\frac{(y_1-\mu_1)^2}{\sigma_1^2} + \frac{(y_2-\mu_2)^2}{\sigma_2^2}.$$
This is just the usual sum of weighted squared deviations.

EXTENSION: As you can imagine, these same concepts carry over to higher dimensions $n > 2$ in an analogous fashion; although the mechanics are more difficult, the ideas and implications are the same. In general, the quadratic form $(y-\mu)'\Sigma^{-1}(y-\mu)$ is a distance measure taking into account associations
among the elements of $Y$, $Y_1, \ldots, Y_n$, in the sense described above. When the $Y_j$ are all mutually independent, the quadratic form will reduce to a weighted sum of squared deviations, as observed in particular for the bivariate case. It is actually possible to see this directly. If the $Y_j$ are independent, then all the correlations $\rho_{jk} = 0$, as are the covariances $\sigma_{jk}$, and it follows that $\Sigma$ is a diagonal matrix. Thus, if
$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}, \quad \text{then} \quad \Sigma^{-1} = \begin{pmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n^2 \end{pmatrix},$$
so that (verify)
$$(y-\mu)'\Sigma^{-1}(y-\mu) = \sum_{j=1}^{n} \frac{(y_j-\mu_j)^2}{\sigma_j^2}.$$
Note also that, as $\Sigma$ is diagonal, we have $|\Sigma| = \sigma_1^2\sigma_2^2\cdots\sigma_n^2$. Thus, $f(y)$ becomes
$$f(y) = \frac{1}{(2\pi)^{1/2}\sigma_1}\exp\left\{-\frac{(y_1-\mu_1)^2}{2\sigma_1^2}\right\} \cdots \frac{1}{(2\pi)^{1/2}\sigma_n}\exp\left\{-\frac{(y_n-\mu_n)^2}{2\sigma_n^2}\right\}; \qquad (3.15)$$
$f(y)$ reduces to the product of individual normal densities. This is a defining characteristic of statistical independence; thus, we see that if $Y_1, \ldots, Y_n$ are jointly normally distributed and uncorrelated, they are independent. Of course, this independence assumption forms the basis for the usual method of least squares.

SIMPLE LINEAR REGRESSION, CONTINUED: We now apply the above concepts to the extension of usual least squares. We have seen that estimation of $\beta$ is based on minimizing an appropriate distance measure. For classical least squares under the assumptions of (i) constant variance and (ii) independence, the distance measure to be minimized is a sum of squared deviations, where each receives the same weight.

• Consider relaxation of (i); i.e. suppose we believe that $Y_1, \ldots, Y_n$ are each normally distributed and uncorrelated (which implies independent or "totally unrelated"), but that var$(Y_j)$ is not the same at each $x_j$. This situation is represented pictorially in Figure 6.

Figure 6: Simple linear regression with nonconstant variance.

Under these conditions, we believe that the joint probability density of $Y$ is given by (3.15), so we would want to obtain the estimator for $\beta$ that minimizes the overall distance measure associated with this density, the one that takes into account the fact that there are different variances, and hence different "quality" of information, at each $x_j$; i.e. the weighted sum of squared deviations
$$\sum_{j=1}^{n} \frac{(Y_j-\mu_j)^2}{\sigma_j^2}.$$
Estimation of $\beta$ in linear regression based on minimization of this distance measure is often called weighted least squares, for obvious reasons.

Note that, to actually carry this out in practice, we would need to know the values of each $\sigma_j^2$, which is unnecessary when all the $\sigma_j^2$ are the same. We will take up this issue later.

• Consider relaxation of both (i) and (ii); we believe that $Y_1, \ldots, Y_n$ are each normally distributed but correlated, with possibly different variances at each $x_j$. In this case, we believe that $Y$ follows a general multivariate normal distribution. Thus, we would want to base estimation of $\beta$ on the overall distance measure associated with this probability density, which takes both these features into account; i.e. we would minimize the quadratic form
$$(Y-\mu)'\Sigma^{-1}(Y-\mu).$$
Estimation of $\beta$ in linear regression based on such a general distance measure is also sometimes called weighted least squares, where it is understood that the "weighting" also involves information on correlations (through terms involving crossproducts).

Again, to carry this out in practice, we would need to know the entire matrix $\Sigma$; more later.

NOTATION: In general, we will use the following notation. If $Y$ is an $(n \times 1)$ random vector with a multivariate normal distribution, with mean vector $\mu$ and covariance matrix $\Sigma$, we will write this as
$$Y \sim N_n(\mu, \Sigma).$$

• The subscript $n$ reminds us that the distribution is $n$-variate.

• We may at times omit the subscript in places where the dimension is obvious.

PROPERTIES:

• If $Y \sim N_n(\mu, \Sigma)$, then if we have a linear combination of $Y$, $CY$, where $C$ is $(q \times n)$, then $CY \sim N_q(C\mu, C\Sigma C')$.

• If also $Z \sim N_n(\tau, \Gamma)$ and is independent of $Y$, then $Z + Y \sim N_n(\mu + \tau, \Sigma + \Gamma)$ (as long as $\Sigma$ and $\Gamma$ are nonsingular).

• We will use these two facts alone and together.

3.4 Multiple linear regression

So far, we have illustrated the usefulness of matrix notation and some key points in the context of the problem of simple linear regression, which we have referred to informally throughout
our discussion. Now that we have discussed the multivariate normal distribution, it is worthwhile to review formally the usual multiple linear regression model, of which the simple linear regression model is a special case, and to summarize in one place what we have discussed from the broader perspective we have developed in terms of this model. This will prove useful later, when we consider more complex models for longitudinal data.

SITUATION: The situation of the general multiple linear regression model is as follows.

• We have responses $Y_1, \ldots, Y_n$, the $j$th of which is to be taken at a setting of $k$ covariates (also called predictors or independent variables) $x_{j1}, x_{j2}, \ldots, x_{jk}$.

• For example, an experiment may be conducted involving $n$ men. Each man spends 30 minutes walking on a treadmill, and at the end of this period, $Y$ = his oxygen intake rate (ml/kg/min) is measured. Also recorded are $x_1$ = age (years), $x_2$ = weight (kg), $x_3$ = heart rate while resting (beats/min), and $x_4$ = oxygen rate while resting (ml/kg/min). Thus, for the $j$th man, we have response $Y_j$ = oxygen intake rate after 30 min and his covariate values $x_{j1}, \ldots, x_{j4}$.

The objective is to develop a statistical model that represents oxygen intake rate after 30 minutes on the treadmill as a function of the covariates. One possible use for the model may be to get a sense of how oxygen rates after 30 minutes might be for men with certain baseline characteristics (age, weight, resting physiology) in order to develop guidelines for an exercise program.

• A standard model under such conditions is to assume that each covariate affects the response in a linear fashion. Specifically, if there are $k$ covariates ($k = 4$ above), then we assume
$$Y_j = \beta_0 + \beta_1 x_{j1} + \cdots + \beta_k x_{jk} + \epsilon_j, \quad \mu_j = \beta_0 + \beta_1 x_{j1} + \cdots + \beta_k x_{jk}. \qquad (3.16)$$
Here, $\epsilon_j$ is a random deviation with mean 0 and variance $\sigma^2$ that characterizes how the observations on $Y_j$ deviate from the mean value $\mu_j$ due to the aggregate effects of relevant sources of variation.

• More formally, under this model, we believe that there is a population of all possible values that
could be seen for, in the case of our example, men with the particular covariate values $x_{j1}, \ldots, x_{jk}$. This population is thought to have mean $\mu_j$ given above. $\epsilon_j$ reflects how such an observation might deviate from this mean.

• The model itself has a particular interpretation. It says that if the value of one of the covariates, $x_k$ say, is increased by one unit while the other covariates are held fixed, then the value of the mean changes by the amount $\beta_k$.

• The usual assumption is that, at any setting of the covariates, the population of possible $Y_j$ values is well represented by a normal distribution with mean $\mu_j$ and variance $\sigma^2$. Note that the variance $\sigma^2$ is the same regardless of the covariate setting. More formally, we may state this as
$$\epsilon_j \sim N(0, \sigma^2) \quad \text{or equivalently} \quad Y_j \sim N(\mu_j, \sigma^2).$$

• Furthermore, it is usually assumed that the $Y_j$ are independent. This would certainly make sense in our example: we would expect that if the men were completely unrelated (chosen at random from the population of all men of interest), then there should be no reason to expect that the response observed for any one man would have anything to do with that observed for another.

• The model is usually represented in matrix terms: letting the row vector $x_j' = (1, x_{j1}, \ldots, x_{jk})$, the model is written
$$Y_j = x_j'\beta + \epsilon_j, \quad Y = X\beta + \epsilon,$$
with $Y = (Y_1, \ldots, Y_n)'$, $\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$,
$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} \ (p \times 1),$$
where $p = k + 1$ is the dimension of $\beta$, so that the $(n \times p)$ design matrix $X$ has rows $x_j'$.

• Thus, thinking of the data as the random vector $Y$, we may summarize the assumptions of normality, independence, and constant variance succinctly. We may think of $Y$ $(n \times 1)$ as having a multivariate normal distribution with mean $X\beta$. Because the elements of $Y$ are assumed independent, all covariances among the $Y_j$ are 0, and the covariance matrix of $Y$ is diagonal. Moreover, with constant variance $\sigma^2$, the variance is the same for each $Y_j$. Thus, the covariance matrix is given by
$$\begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I,$$
where $I$ is an $(n \times n)$ identity matrix. We thus may write
$$Y \sim N_n(X\beta, \sigma^2 I).$$

• Note that the simple linear regression model is a special case of this
with $k = 1$. The only real difference is in the complexity of the assumed model for the mean of the population of $Y_j$ values; for the general multiple linear regression model, this depends on $k$ covariates. The simple linear regression case is instructive because we are able to depict things graphically with ease; for example, we may plot the relationship in a simple $x$-$y$ plane. For the general model, this is not possible, but in principle the issues are the same.

LEAST SQUARES ESTIMATION: The goal of an analysis of data of this form under the assumption of the multiple linear regression model (3.16) is to estimate the regression parameter $\beta$ using the data in order to characterize the relationship. Under the usual assumptions discussed above, i.e.

• $Y_j$ (and equivalently $\epsilon_j$) are normally distributed with variance $\sigma^2$ for all $j$

• $Y_j$ (and equivalently $\epsilon_j$) are independent

the usual estimator for $\beta$ is found by minimizing the sum of squared deviations
$$\sum_{j=1}^{n} (Y_j - \beta_0 - x_{j1}\beta_1 - \cdots - x_{jk}\beta_k)^2.$$
In matrix terms, the sum of squared deviations may be written
$$(Y - X\beta)'(Y - X\beta). \qquad (3.17)$$
In these terms, the sum of squared deviations may be seen to be just a quadratic form.

• Note that we may write these equivalently as
$$\sum_{j=1}^{n} (Y_j - \beta_0 - x_{j1}\beta_1 - \cdots - x_{jk}\beta_k)^2/\sigma^2, \quad (Y - X\beta)'I(Y - X\beta)/\sigma^2;$$
because $\sigma^2$ does not involve $\beta$, we may equally well talk about minimizing these quantities. Of course, as we have previously discussed, this shows that all observations are getting "equal weight" in determining $\beta$, which is sensible if we believe that the populations of all values of $Y$ at any covariate setting are equally variable (same $\sigma^2$). We now see that we are minimizing the distance measure associated with a multivariate normal distribution where all of the $Y_j$ are mutually independent with the same variance (all covariances/correlations = 0).

• Minimizing (3.17) means that we are trying to find the value of $\beta$ that minimizes the distance between responses and the means; by doing so, we are attributing as much of the overall differences among the $Y_j$ that we have seen
to the fact that they arise from different settings of $x_j$, and as little as possible to random variation.

• Because the quadratic form (3.17) is just a scalar function of the $p$ elements of $\beta$, it is possible to use calculus to determine the values of these $p$ elements that minimize the quadratic form. Formally, one would take the derivatives of (3.17) with respect to each of $\beta_0, \beta_1, \ldots, \beta_k$ and set these $p$ expressions equal to zero. These $p$ expressions represent a system of equations that may be solved to obtain the solution, the estimator $\hat{\beta}$.

• The set of $p$ simultaneous equations that arise from taking derivatives of (3.17), expressed in matrix notation, is
$$-2X'Y + 2X'X\beta = 0 \quad \text{or} \quad X'Y = X'X\beta.$$
We wish to solve for $\beta$. Note that $X'X$ is a square matrix $(p \times p)$ and $X'Y$ is a $(p \times 1)$ vector. Recall that in Chapter 2 we saw how to solve a set of simultaneous equations like this; thus, we may invoke that procedure to solve
$$X'Y = X'X\beta$$
as long as the inverse of $X'X$ exists. From Chapter 2, we know that $X'X$ will be of full rank (rank = number of rows and columns = $p$) if $X$ has rank $p$. We also know from Chapter 2 that if a square matrix is of full rank, it is nonsingular, so its inverse exists. Thus, assuming $X$ is of full rank, we have that $(X'X)^{-1}$ exists, and we may premultiply both sides by $(X'X)^{-1}$ to obtain
$$(X'X)^{-1}X'Y = (X'X)^{-1}X'X\beta = \beta.$$
Thus, the least squares estimator for $\beta$ is given by
$$\hat{\beta} = (X'X)^{-1}X'Y. \qquad (3.18)$$

• Computation for general $p$ is not feasible by hand, of course; particularly nasty is the inversion of the matrix $X'X$. Software for multiple regression analysis includes routines for inverting a matrix of any dimension; thus, estimation of $\beta$ by least squares for a general multiple linear regression model is best carried out in this fashion.

ESTIMATION OF $\sigma^2$: It is often of interest to estimate $\sigma^2$, the assumed common variance. The usual estimator is
$$\hat{\sigma}^2 = (n-p)^{-1}\sum_{j=1}^{n} (Y_j - x_j'\hat{\beta})^2 = (n-p)^{-1}(Y - X\hat{\beta})'(Y - X\hat{\beta}).$$

• This makes intuitive sense. Each squared deviation $(Y_j - x_j'\beta)^2$ contains information about the
"spread" of values of $Y_j$ at $x_j$. As we assume that this spread is the same for all $x_j$, a natural approach to estimating its magnitude, represented by the variance $\sigma^2$, would be to pool this information across all $n$ deviations. Because we don't know $\beta$, we replace it by the estimator $\hat{\beta}$.

• We will see a more formal rationale later.

SAMPLING DISTRIBUTION: When we estimate a parameter (like $\beta$ or $\sigma^2$) that describes a population by an estimator (like $\hat{\beta}$ or $\hat{\sigma}^2$), the estimator is some function of the responses, $Y$ here. Thus, the quality of the estimator, i.e. how reliable it is, depends on the variation inherent in the responses and how much data on the responses we have.

• If we consider every possible set of data of size $n$ we might have ended up with, each one of these would give rise to a value of the estimator. We may think, then, of the population of all possible values of the estimator we might have ended up with.

• We would hope that the mean of this population would be equal to the true value of the parameter we are trying to estimate. This property is called unbiasedness.

• We would also hope that the variability in this population isn't too large. If the values vary a lot across all possible data sets, then the estimator is not very reliable. Indeed, we ended up with a particular data set, which yielded a particular estimate; however, had we ended up with another data set, we might have ended up with quite a different estimate.

• If, on the other hand, these values vary little across all possible data sets, then the estimator is reliable. Had we ended up with another set of data, we would have ended up with an estimate that is quite similar to the one we have.

Thus, it is of interest to characterize the population of all possible values of an estimator. Because the estimator depends on the response, the properties of this population will depend on those of $Y$. More formally, we may think of the probability distribution of the estimator, describing how it takes on all its possible values. This probability distribution will
be connected with that of the $Y$.

A probability distribution that characterizes the population of all possible values of an estimator is called a sampling distribution.

To understand the nature of the sampling distribution of $\hat{\beta}$, we thus consider the probability distribution of
$$\hat{\beta} = (X'X)^{-1}X'Y, \qquad (3.19)$$
which is a linear combination of the elements of $Y$. We may thus apply earlier facts to derive mathematically the sampling distribution.

• We may determine the mean of this distribution by applying the expectation operator to the expression (3.19); this represents averaging across all possible values of the expression (which follow from all possible values of $Y$). Now $Y \sim N_n(X\beta, \sigma^2 I)$ under the usual assumptions; thus, $E(Y) = X\beta$. Thus, using the results in section 3.2,
$$E(\hat{\beta}) = E\{(X'X)^{-1}X'Y\} = (X'X)^{-1}X'E(Y) = (X'X)^{-1}X'X\beta = \beta,$$
showing that $\hat{\beta}$ under our assumptions is an unbiased estimator of $\beta$.

• We may also determine the variance of this distribution. Formally, this would mean applying the expectation operator to
$$\{(X'X)^{-1}X'Y - \beta\}\{(X'X)^{-1}X'Y - \beta\}';$$
i.e. finding the covariance matrix of (3.19). Rather than doing this directly, it is simpler to exploit the results in section 3.2, which yield
$$\text{var}\{(X'X)^{-1}X'Y\} = (X'X)^{-1}X'\,\text{var}(Y)\,\{(X'X)^{-1}X'\}' = (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$
Note that the variability of the population of all possible values of $\hat{\beta}$ depends directly on $\sigma^2$, the variation in the response. It also depends on $n$, the sample size, because $X$ is of dimension $(n \times p)$.

• In fact, we may say more: because under our assumptions $Y$ has a multivariate normal distribution, it follows that the probability distribution of all possible values of $\hat{\beta}$ is multivariate normal with this mean and covariance matrix; i.e.
$$\hat{\beta} \sim N_p\{\beta, \sigma^2(X'X)^{-1}\}.$$

This result is used to obtain estimated standard errors for the components of $\hat{\beta}$; i.e. estimates of the standard deviation of the sampling distributions of each component of $\hat{\beta}$.

• In practice, $\sigma^2$ is unknown; thus, it is replaced with the estimate $\hat{\sigma}^2$.

• The estimated standard error of the $k$th element of $\hat{\beta}$ is then the square root
of the $k$th diagonal element of $\hat{\sigma}^2(X'X)^{-1}$.

It is also possible to derive a sampling distribution for $\hat{\sigma}^2$. For now, we will note that it is possible to show that $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$. That is, it may be shown that
$$E\{(n-p)^{-1}(Y - X\hat{\beta})'(Y - X\hat{\beta})\} = \sigma^2.$$
This may be shown by the following steps:

• First, it may be demonstrated that (try it!)
$$(Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'Y - Y'X\hat{\beta} - \hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} = Y'\{I - X(X'X)^{-1}X'\}Y.$$
We have just expressed the original quadratic form in a different way, which is still a quadratic form.

• Fact: It may be shown that if $Y$ is any random vector with mean $\mu$ and covariance matrix $\Sigma$, then for any square matrix $A$,
$$E(Y'AY) = \text{tr}(A\Sigma) + \mu'A\mu.$$
Applying this to our problem, we have $\mu = X\beta$, $\Sigma = \sigma^2 I$, and $A = I - X(X'X)^{-1}X'$. Thus, using results in Chapter 2,
$$E\{(Y - X\hat{\beta})'(Y - X\hat{\beta})\} = \text{tr}[\{I - X(X'X)^{-1}X'\}\sigma^2 I] + \beta'X'\{I - X(X'X)^{-1}X'\}X\beta$$
$$= \sigma^2\,\text{tr}\{I - X(X'X)^{-1}X'\} + \beta'X'\{I - X(X'X)^{-1}X'\}X\beta.$$
Thus, to find $E\{(Y - X\hat{\beta})'(Y - X\hat{\beta})\}$, we must evaluate each term.

• We also have: if $X$ is any $(n \times p)$ matrix of full rank, writing $I_q$ to emphasize the dimension of the identity matrix of dimension $q$, then
$$\text{tr}\{X(X'X)^{-1}X'\} = \text{tr}\{(X'X)^{-1}X'X\} = \text{tr}(I_p) = p,$$
so that
$$\text{tr}\{I_n - X(X'X)^{-1}X'\} = \text{tr}(I_n) - \text{tr}(I_p) = n - p.$$
Furthermore,
$$\{I - X(X'X)^{-1}X'\}X = X - X(X'X)^{-1}X'X = X - X = 0.$$
Applying these to the above expression, we obtain
$$E\{(Y - X\hat{\beta})'(Y - X\hat{\beta})\} = \sigma^2(n-p) + 0 = \sigma^2(n-p).$$
Thus, we have $E\{(n-p)^{-1}(Y - X\hat{\beta})'(Y - X\hat{\beta})\} = \sigma^2$, as desired.

EXTENSION: The discussion above focused on the usual multiple linear regression model, where it is assumed that
$$Y \sim N_n(X\beta, \sigma^2 I).$$
In some situations, although it may be reasonable to think that the population of possible values of $Y_j$ at $x_j$ might be normally distributed, the assumptions of constant variance and independence may not be realistic.

• For example, recall the treadmill example, where $Y_j$ was oxygen intake rate after 30 minutes on the treadmill for man $j$ with covariates (age, weight, baseline characteristics) $x_j$. Now each $Y_j$ was measured on a different man, so the assumption of independence among the $Y_j$ seems realistic.

• However, the assumption of constant variance may be suspect. Young men in their 20s will all tend to be relatively fit, simply by
virtue of their age, so we might expect their rates of oxygen intake to not vary too much. Older men in their 50s and beyond, on the other hand, might be quite variable in their fitness: some may have exercised regularly, while others may be quite sedentary. Thus, we might expect oxygen intake rates for older men to be more variable than for younger men. More formally, we might expect the distributions of possible values of $Y_j$ at different settings of $x_j$ to exhibit different variances as the ages of men differ.

• Recall the pine seedling example. Suppose the seedling is planted and its height is measured on each of $n$ consecutive days. Here, $Y_j$ would be the height measured at time $x_j$, say, where $x_j$ is the time measured in days from planting. We might model the mean of $Y_j$ as a function of $x_j$, e.g.
$$Y_j = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \epsilon_j,$$
a quadratic function of time. After $n$ days, we have the vector $Y$. As discussed earlier, however, it may not be realistic to think that the elements of $Y$ are all mutually independent. In fact, we do not expect the height to follow the "smooth" quadratic trend; rather, it "fluctuates" about it; e.g. the seedling may undergo "growth spurts" or "dormant periods" along the way. Thus, we would expect to see a "large" value of $Y_j$ on one day followed by a "large" value the next day. Thus, the elements of $Y$ covary (are correlated).

In these situations, we still wish to consider a multiple linear regression model; however, the standard assumptions do not apply. More formally, we may still believe that each $Y_j$ follows a normal distribution, so that $Y$ is multivariate normal, but the assumption that
$$\text{var}(Y) = \sigma^2 I$$
for some constant $\sigma^2$ is no longer relevant. Rather, we think that
$$\text{var}(Y) = \Sigma$$
for some covariance matrix $\Sigma$ that summarizes the variances of each $Y_j$ and the covariances thought to exist among them. Under these conditions, we would rather assume
$$Y \sim N_n(X\beta, \Sigma).$$
Clearly, the usual method of least squares, discussed above, is inappropriate for estimating $\beta$; it minimizes an inappropriate distance criterion.

WEIGHTED LEAST SQUARES:
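Before the derivation, a small numerical sketch may help fix ideas. It is not part of the original notes: it assumes Python with NumPy, and the function names `ols` and `wls`, the simulated design, and the particular covariance matrix `Sigma` (variances that grow with $x_j$, correlation that decays with the lag between observations) are all hypothetical choices made purely for illustration. The sketch computes the ordinary least squares estimator (3.18) and the estimator that minimizes the quadratic form $(Y - X\beta)'\Sigma^{-1}(Y - X\beta)$ when $\Sigma$ is treated as known.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: solve the normal equations X'X b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def wls(X, y, Sigma):
    """Weighted least squares with a known covariance matrix Sigma.

    Returns the estimate (X' S^-1 X)^-1 X' S^-1 y and its covariance
    matrix (X' S^-1 X)^-1, where S stands for Sigma.
    """
    Si_X = np.linalg.solve(Sigma, X)   # Sigma^{-1} X without forming the inverse
    Si_y = np.linalg.solve(Sigma, y)   # Sigma^{-1} y
    A = X.T @ Si_X                     # X' Sigma^{-1} X
    beta_hat = np.linalg.solve(A, X.T @ Si_y)
    cov_beta = np.linalg.inv(A)
    return beta_hat, cov_beta

# Simulated illustration (all numbers below are assumptions, not data from the notes).
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta_true = np.array([1.0, 2.0])

sd = 0.5 + 0.3 * x                            # standard deviation grows with x
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
R = 0.5 ** lags                               # correlation decays with lag
Sigma = np.outer(sd, sd) * R                  # covariance matrix of Y

y = rng.multivariate_normal(X @ beta_true, Sigma)

b_ols = ols(X, y)
b_wls, cov_wls = wls(X, y, Sigma)
print("OLS estimate:", b_ols)
print("WLS estimate:", b_wls)
print("WLS standard errors:", np.sqrt(np.diag(cov_wls)))
```

When `Sigma` is a multiple of the identity matrix, `wls` returns the same estimate as `ols`, which mirrors the fact that the constant variance, independent case is a special case of the general one. Linear systems are solved with `np.linalg.solve` rather than by forming $\Sigma^{-1}$ explicitly, a common numerical choice.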
The appropriate distance criterion is
$$(Y - X\beta)'\Sigma^{-1}(Y - X\beta). \qquad (3.20)$$
Ideally, we would rather estimate $\beta$ by minimizing (3.20), because it takes appropriate account of the possibly different variances and the covariances among elements of $Y$.

• In the constant variance/independence situation, recall that $\sigma^2$, the assumed common variance, is not involved in estimation of $\beta$.

• In addition, if $\sigma^2$ is unknown, as is usually the case in practice, we saw that an intuitively appealing, unbiased estimator $\hat{\sigma}^2$ may be derived, which is based on "pooling" information on the common $\sigma^2$.

• Here, however, with possibly different variances for different $Y_j$, and different covariances among different pairs $(Y_j, Y_k)$, things seem much more difficult. As we will see momentarily, estimation of $\beta$ by minimizing (3.20) will now involve $\Sigma$, which further complicates matters.

• We will delay discussion of the issue of how to estimate $\Sigma$ in the event that it is unknown until we talk about longitudinal data from several individuals later. For now, assume that $\Sigma$ is known, which is clearly unrealistic in practice, to gain insight into the principle of minimizing (3.20).

• Analogous to the simpler case of constant variance/independence, to determine the value $\hat{\beta}$ that minimizes (3.20), one may use calculus to derive a set of $p$ simultaneous equations to solve, which turn out to be
$$-2X'\Sigma^{-1}Y + 2X'\Sigma^{-1}X\beta = 0,$$
which leads to the solution
$$\hat{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y. \qquad (3.21)$$
$\hat{\beta}$ in (3.21) is often called the weighted least squares estimator.

• Note that $\hat{\beta}$ is still a linear function of the elements of $Y$.

• Thus, it is straightforward to derive its sampling distribution. $\hat{\beta}$ is unbiased, as
$$E(\hat{\beta}) = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}X\beta = \beta,$$
$$\text{var}(\hat{\beta}) = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\Sigma\Sigma^{-1}X(X'\Sigma^{-1}X)^{-1} = (X'\Sigma^{-1}X)^{-1}.$$

• Furthermore, because $Y$ is multivariate normal, we have
$$\hat{\beta} \sim N_p\{\beta, (X'\Sigma^{-1}X)^{-1}\}.$$

• Thus, if we knew $\Sigma$, we would be able to construct standard errors for elements of $\hat{\beta}$, etc.

The notion of weighted least squares will play a major role in our subsequent development of methods for longitudinal data. We will revisit it and tackle the issue of
how to estimate $\Sigma$ later.

CHAPTER 1 ST 732 M DAVIDIAN

1 Introduction and Motivation

1.1 Purpose of this course

OBJECTIVE: The goal of this course is to provide an overview of statistical models and methods that are useful in the analysis of longitudinal data; that is, data in the form of repeated measurements on the same unit (human, plant, plot, sample, etc.) over time.

Data are routinely collected in this fashion in a broad range of applications, including agriculture and the life sciences, medical and public health research, and physical science and engineering. For example:

• In agriculture, a measure of growth may be taken on the same plot weekly over the growing season. Plots are assigned to different treatments at the start of the season.

• In a medical study, a measure of viral load (roughly, the amount of HIV virus present in the body) may be taken at monthly intervals on patients with HIV infection. Patients are assigned to take different treatments at the start of the study.

Note that a defining characteristic of these examples is that the same response is measured repeatedly on each unit; i.e. viral load is measured again and again on the same subject. This particular type of data structure will be the focus of this course.

The scientific questions of interest often involve not only the usual kinds of questions, such as how the mean response differs across treatments, but also how the change in mean response over time differs and other issues concerning the relationship between response and time. Thus, it is necessary to represent the situation in terms of a statistical model that acknowledges the way in which the data were collected in order to address these questions. Complementing the models, specialized methods of analysis are required.

In this course, we will study ways to model these data, and we will explore both classical and more recent approaches to analyzing them. Interest in the best ways to represent and interpret longitudinal data has grown tremendously in recent years, and a
number of new, powerful statistical techniques have been developed. We will discuss these techniques in some detail.

TERMINOLOGY: Although the term longitudinal naturally suggests that data are collected over time, the models and methods we will discuss are more broadly applicable to any kind of repeated measurement data. That is, although repeated measurement most often takes place over time, this is not the only way that measurements may be taken repeatedly on the same unit. For example:

• The units may be human subjects. For each subject, reduction in diastolic blood pressure is measured on several occasions, each occasion involving administration of a different dose of an anti-hypertensive medication. Thus, the subject is measured repeatedly over dose.

• The units may be trees in a forest. For each tree, measurements of the diameter of the tree are made at several different points along the trunk of the tree. Thus, the tree is measured repeatedly over positions along the trunk.

• The units may be pregnant female rats. Each rat gives birth to a litter of pups, and the birthweight of each pup is recorded. Thus, the rat is measured repeatedly over each of her pups.

The third example is a bit different from the other two in that there is no natural order to the repeated measurements.

Thus, the methods will apply more broadly than the strict definition of the term longitudinal data indicates: the term will mean, to us, data in the form of repeated measurements that may well be over time, but may also be over some other set of conditions. Because time is most often the condition of measurement, however, many of our examples will indeed involve repeated measurement over time.

We will use the term response to denote the measurement of interest. Because units are often human or animal subjects, we use the terms unit, individual, and subject interchangeably.

1.2 Examples

To put things into firmer perspective, we consider several real datasets from a variety of applications. These
will not only provide us with concrete examples of longitudinal data situations, but will also serve to illustrate the range of ways that data may be collected and the types of measurements that may be of interest.

EXAMPLE 1: The orthodontic study data of Potthoff and Roy (1964). A study was conducted involving 27 children, 16 boys and 11 girls. On each child, the distance (mm) from the center of the pituitary to the pterygomaxillary fissure was measured at ages 8, 10, 12, and 14 years. In Figure 1, the distance measurements are plotted against age for each child. The plotting symbols denote girls (0) and boys (1), and the trajectory for each child is connected by a solid line so that individual child patterns may be seen.

Figure 1: Orthodontic distance measurements (mm) for 27 children over ages 8, 10, 12, 14. The plotting symbols are 0's for girls, 1's for boys.

[Plot titled "Dental Study Data": distance against age (years).]

Plots like Figure 1 are often called spaghetti plots, for obvious reasons.

The objectives of the study were to

• Determine whether distances over time are larger for boys than for girls

• Determine whether the rate of change of distance over time is similar for boys and girls.

Several features are notable from the plot of the data:

• It appears that each child has his/her own trajectory of distance as a function of age. For any given child, the trajectory looks roughly like a straight line, with some fluctuations. But from child to child, features of the trajectory (e.g. its steepness) vary. Thus, the trajectories are all of similar form, but vary in their specific characteristics among children. Note the one unusual boy whose pattern fluctuates more profoundly than those of the other children and the one girl who is much "lower" than the others.

• The overall trend is for the distance measurement to increase with age. The trajectories for some children exhibit strict increase with age, while others show some
intermittent decreases, but still with an overall increasing trend across the entire 6 year period.

• The distance trajectories for boys seem, for the most part, to be "higher" than those for girls: most of the boy profiles involve larger distance measurements than those for girls. However, this is not uniformly true; some girls have larger distance measurements than boys at some of the ages.

• Although boys seem to have larger distance measurements, the rate of change of the measurements with increasing age seems similar. More precisely, the slope of the increasing (approximate straight-line) relationship with age seems roughly similar for boys and girls. However, for any individual boy or girl, the rate of change (slope) may be steeper or shallower than the evident "typical" rate of change.

To address the questions of interest, it is clear that some formal way of representing the fact that each child has an individual-specific trajectory is needed. Within such a representation, a formal way of stating the questions is required.

EXAMPLE 2: Vitamin E diet supplement and growth of guinea pigs. The following data are reported by Crowder and Hand (1990, p. 27). The study concerned the effect of a vitamin E diet supplement on the growth of guinea pigs. 15 guinea pigs were all given a growth-inhibiting substance at the beginning of week 1 of the study (time 0, prior to the first measurement), and body weight was measured at the ends of weeks 1, 3, and 4. At the beginning of week 5, the pigs were randomized into 3 groups of 5, and vitamin E therapy was started. One group received zero dose of vitamin E, another received a low dose, and the third received a high dose. The body weight (g) of each guinea pig was measured at the end of weeks 5, 6, and 7. In Figure 2, the data for the three dose groups are plotted on three separate graphs; the plotting symbol is the ID number (1-15) for each guinea pig. The plotting is similar to that for the dental data.

Figure 2: Growth of guinea pigs receiving different doses of vitamin E diet supplement. Pigs 1-5 received zero dose, pigs 6-10 received low dose, pigs 11-15 received high dose.

[Four panels: "Zero dose", "Low dose", "High dose", and "Sample averages" (zero, low, and high dose); each shows body weight (g) against weeks.]

The primary objective of the study was to

• Determine whether the growth patterns differed among the three groups.

As with the dental data, several features are evident:

• For the most part, the trajectories for individual guinea pigs seem to increase overall over the study period (although note pig 1 in the zero dose group). Different guinea pigs in the same dose group have different trajectories, some of which look like a straight line and others of which seem to have a "dip" at the beginning of week 5, the time at which vitamin E was added in the low and high dose groups.

• The trajectories for the zero dose group seem somewhat "lower" than those in the other dose groups.

• It is unclear whether the rate of change in body weight on average is similar or different across dose groups. In fact, it is not clear that the pattern for either individual pigs or "on average" is a straight line, so the rate of change may not be constant. Because vitamin E therapy was not administered until the beginning of week 5, we might expect two "phases," before and after vitamin E, making things more complicated.

Again, some formal framework for representing this situation and addressing the primary research question is required.

EXAMPLE 3: Growth of two different soybean genotypes. This study was conducted by Colleen Hudak, a former student in the Department of Crop Science at North Carolina State University, and is reported in Davidian and Giltinan (1995, p. 7). The goal was to compare the growth patterns of two soybean genotypes, a commercial variety, Forrest (F), and an
experimental strain, Plant Introduction 416937 (P). Data were collected in each of three consecutive years, 1988-1990. In each year, 8 plots were planted with F and 8 with P. Over the course of the growing season, each plot was sampled at approximately weekly intervals. At each sampling time, 6 plants were randomly selected from each plot, leaves from these plants were mixed together and weighed, and an average leaf weight per plant (g) was calculated. In Figure 3, the data from the 8 F plots and 8 P plots for 1989 are depicted.

The primary objective of the study was

• To compare the growth characteristics of the two genotypes.

Figure 3: Average leaf weight/plant profiles for 8 plots planted with Forrest and 8 plots planted with PI 416937 in 1989.

[Figure 3 panels: Genotype F, Genotype P. Vertical axis: leaf weight/plant (g). Horizontal axis: days after planting.]

From the figure, several features are notable:

• If we focus on the trajectory of a particular plot, we see that typically the growth begins slowly, with not much change over the first 3-4 observation times. Then growth begins increasing at a faster rate in the middle of the season. Toward the end of the season, growth appears to begin "leveling off." This makes sense: soybean plants may only grow so large, so their leaf weight cannot increase without bound forever.

• Overall, then, the trajectory for any one plot does not appear to have the rough form of a straight line, as in the previous two examples, with an apparent constant rate of change over the observation period. Rather, the form of the trajectory seems more complicated, with almost an S-type shape. It is thus clear that trying to characterize differences in growth characteristics will involve more than simply comparing rate of change over the season.

In fact, the investigators realized that the growth pattern would not be as simple as an apparent straight line. They knew that growth would tend to "level off" toward the end of the season; thus, a more precise statement of their primary
objective was

• To compare the apparent "limiting" average leaf weight/plant between the 2 genotypes.

• To compare the way in which growth accelerates during the middle of the growing season.

• To compare the apparent initial average leaf weight/plant.

From Figure 3, it seems that average leaf weight/plant achieves higher limiting growth for genotype P relative to genotype F. That is, the "leveling off" seems to begin at lower values of the response for genotype F. The two genotypes seem to start off at roughly the same value. It is difficult to make a simple statement about the relative rates of growth from the figure. Naturally, the investigators would like to be able to be more formal about these observations.

As it so happened, weather patterns differed considerably over the three years of the experiment: in 1988, conditions were unusually dry; in 1989, they were unusually wet; and conditions in 1990 were relatively normal. Thus, comparison of growth patterns across the different weather patterns, as well as how the weather patterns affected the comparison of growth characteristics between genotypes, was also of interest.

SO FAR: In the three examples we have considered, the measurement of interest is continuous in nature. That is,

• Distance (mm) from the center of the pituitary to the pterygomaxillary fissure

• Body weight (g)

• Average leaf weight/plant (g)

all may, in principle, take on any possible value in a particular range. How precisely we observe the value of the response is limited only by the precision of the measuring device we use.

In some situations, the response of interest is not continuous; rather, it is discrete in nature. That is, the values that we may observe differ by fixed amounts. For definiteness, we consider 2 additional examples.

EXAMPLE 4: Epileptic seizures and chemotherapy. A common situation is where the measurements are in the form of counts. A response in the form of a count is by nature discrete: counts usually take only nonnegative integer values 0, 1, 2,
3, and so on.

The following data were first reported by Thall and Vail (1990). A clinical trial was conducted in which 59 people with epilepsy suffering from simple or partial seizures were assigned at random to receive either the anti-epileptic drug progabide (subjects 29-59) or an inert substance, a placebo (subjects 1-28), in addition to a standard chemotherapy regimen all were taking. Because each individual might be prone to different rates of experiencing seizures, the investigators first tried to get a sense of this by recording the number of seizures suffered by each subject over the 8-week period prior to the start of administration of the assigned treatment. It is common in such studies to record such baseline measurements, so that the effect of treatment for each subject may be measured relative to how that subject behaved before treatment.

Following the commencement of treatment, the number of seizures for each subject was counted for each of four consecutive two-week periods. The age of each subject at the start of the study was also recorded, as it was suspected that the age of the subject might be associated with the effect of the treatment somehow.

The data for the first 5 subjects in each treatment group are summarized in Table 1.

Table 1: Seizure counts for 5 subjects assigned to placebo (0) and 5 subjects assigned to progabide (1).

                  Period
Subject     1     2     3     4    Trt   Baseline   Age
      1     5     3     3     3      0         11    31
      2     3     5     3     3      0         11    30
      3     2     4     0     5      0          6    25
      4     4     4     1     4      0          8    36
      5     7    18     9    21      0         66    22
     29    11    14     9     8      1         76    18
     30     8     7     9     4      1         38    32
     31     0     4     3     0      1         19    20
     32     3     6     1     3      1         10    30
     33     2     6     7     4      1         19    18

The primary objective of the study was to

• Determine whether progabide reduces the rate of seizures in subjects like those in the trial.

Here, we have repeated measurements (counts) on each subject over four consecutive observation periods. Obviously, we would like to compare somehow the baseline seizure counts to post-treatment counts, where the latter are observed repeatedly over time
following initiation of treatment. Clearly, an appropriate analysis would make the best use of this feature of the data in addressing the main objective.

Moreover, note that some of the counts are quite small; in fact, for some subjects, 0 seizures (none) were experienced in some periods. For example, subject 31 in the treatment group experienced only 0, 3, or 4 seizures over the 4 observation periods. Clearly, pretending that the response is continuous would be a lousy approximation to the true nature of the data. Thus, it seems that methods suitable for handling continuous data problems like the first three examples here would not be appropriate for data like these.

To get around this problem, a common approach to handling data in the form of counts is to transform them to some other scale. The motivation is to make them seem more normally distributed with constant variance, and the square root transformation is often used in the hope of accomplishing this. The desired result is that methods that are usually used to analyze continuous measurements may then be applied.

However, the drawback of this approach is that one is no longer working with the data on the original scale of measurement, numbers of seizures in this case. The statistical models being assumed by this approach describe "square root number of seizures," which is neither particularly interesting nor intuitive. Recently, new statistical methods have been developed to allow analysis of discrete repeated measurements like counts on the original scale of measurement.

EXAMPLE 5: Maternal smoking and child respiratory health. Another common discrete data situation is where the response is binary; that is, the response may take on only two possible values, which usually correspond to things like

• success or failure of a treatment to elicit a desired response

• presence or absence of some condition

Clearly, it would be foolish to even try to pretend such data are approximately continuous.

The following data come from a very
large public health study called the Six Cities Study, which was undertaken in six small American cities to investigate a variety of public health issues. The full situation is reported in Lipsitz, Laird, and Harrington (1992). The current study was focused on the association between maternal smoking and child respiratory health. Each of 300 children was examined once a year at ages 9-12. The response of interest was "wheezing" status, a measure of the child's respiratory health, which was coded as either no (0) or yes (1), where "yes" corresponds to respiratory problems. Also recorded at each examination was a code to indicate the mother's current level of smoking (0 = none, 1 = moderate, 2 = heavy).

The data for the first 5 subjects are summarized in Table 2.

Table 2: Data for 5 children in the Six Cities study. Missing data are denoted by a

Subject  City      Smoking at ages 9, 10, 11, 12, then wheezing at ages 9, 10, 11, 12
1        Portage   2 2 1 1 1 0 0 0
2        Kingston  0 0 0 0 0 0 0 0
3        Portage   1 0 0 0 0 0
4        Portage   1 1 1 1 0 0
5        Kingston  1 1 2 0 0 1

The objective of an analysis of these data was to

• Determine how the typical "wheezing" response pattern changes with age.

• Determine whether there is an association between maternal smoking severity and child respiratory status (as measured by "wheezing").

Note that it would be pretty pointless to plot the responses as a function of age as we did in the continuous data cases: here, the only responses are 0 or 1. Inspection of individual subject data does suggest that there is something going on here; for example, note that subject 5 did not exhibit positive wheezing status until his/her mother's smoking increased in severity.

This highlights the fact that this situation is complex: over time (measured here by age of the child), an important characteristic, maternal smoking, changes. Contrast this with the previous situations, where a main focus is to compare groups whose membership stays constant over time.

Thus, we have repeated measurements, where, to further
complicate matters, the measurements are binary. As with the count data, one might first think about trying to summarize and transform the data to somehow allow methods for continuous data to be used; however, this would clearly be inappropriate. As we will see later in the course, methods for dealing with repeated binary responses and scientific questions like those above have been developed.

Another feature of these data is the fact that some measurements are missing for some subjects. Specifically, although the intention was to collect data for each of the four ages, this information is not available for some children and their mothers at some ages; for example, subject 3 has both the mother's smoking status and wheezing indicator missing at age 12. This pattern would suggest that the mother may have failed to appear with the child for this intended examination.

A final note: In the other examples, units (children, guinea pigs, plots, patients) were assigned to treatments; thus, these may be regarded as controlled experiments, where the investigator has some control over how the factors of interest are "applied" to the units (through randomization). In contrast, in this study, the investigators did not decide which children would have mothers who smoke; instead, they could only observe smoking behavior of the mothers and wheezing status of their children. That is, this is an example of an observational study. Because it may be impossible or unethical to randomize subjects to potentially hazardous circumstances, studies of issues in public health and the social sciences are often observational.

As in many observational studies, an additional difficulty is the fact that the thing of interest, in this case maternal smoking, also changes with the response over time. This leads to complicated issues of interpretation in statistical modeling that are a matter of some debate. We will discuss these issues in our subsequent development.

SUMMARY: These five examples illustrate the broad range of applications where data in the
form of repeated measurements may arise. The response of interest may be continuous or discrete. The questions of interest may be focused on very specific features of the trajectories, e.g. "limiting growth," or may involve vague questions about the form of the "typical" trajectory.

1.3 Statistical models for longitudinal data

In this course, we will discuss a number of approaches for modeling data like those in the examples and describe different statistical methods for addressing questions of scientific interest within the context of these models.

STATISTICAL MODELS: A statistical model is a formal representation of the way in which data are thought to arise, and the features of the model dictate how questions of interest may be stated unambiguously and how the data should be manipulated and interpreted to address the questions. Different models embody different assumptions about how the data arise; thus, the extent to which valid conclusions may be drawn from a particular model rests on how relevant its assumptions are to the situation at hand.

Thus, to appreciate the basis for techniques for data analysis and use them appropriately, one must refer to and understand the associated statistical models. This connection is especially critical in the context of longitudinal data, as we will see.

Formally, a statistical model uses probability distributions to describe the mechanism believed to generate the data. That is, responses are represented by random variables whose probability distributions are used to describe the chances that a response takes on different values. How responses arise may involve many factors; thus, how one builds a statistical model and decides which probability distributions are relevant requires careful consideration of the features of the situation.

RANDOM VECTORS: In order to

• elucidate the assumptions made under different models and methods and make distinctions among them

• describe the models and methods easily

it is convenient to
think of all responses collected on the same unit over time or other set of conditions together, so that complex relationships among them may be summarized.

Consider the random variable

$$Y_{ij} = \text{the $j$th measurement taken on unit $i$}.$$

To fix ideas, consider the dental study data in Figure 1. Each child was measured 4 times, at ages 8, 10, 12, and 14 years. Thus, we let $j = 1, \ldots, 4$; $j$ is indexing the number of times a child is measured. To summarize the information on when these times occur, we might further define

$$t_{ij} = \text{the time at which the $j$th measurement on unit $i$ was taken}.$$

Here, $t_{i1} = 8$, $t_{i2} = 10$, and so on for all children in the study. Thus, if we ignore gender of the children for the moment, the responses for the $i$th child, where $i$ ranges from 1 to 27, are $Y_{i1}, \ldots, Y_{i4}$, taken at times $t_{i1}, \ldots, t_{i4}$. In fact, we may summarize the measurements for the $i$th child even more succinctly: define the $(4 \times 1)$ random vector

$$\boldsymbol{Y}_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \\ Y_{i3} \\ Y_{i4} \end{pmatrix}.$$

The components are random variables representing the responses that might be observed for child $i$ at each time point. Later, we will expand this notation to include ways of representing additional information, such as gender in this example.

The important message is that it is possible to represent the responses for the $i$th child in a very streamlined and convenient way for the purposes of talking about them all together. Each child $i$ has its own vector of responses $\boldsymbol{Y}_i$. It often makes sense to think of the data not just as individual responses $Y_{ij}$, some from one child, some from another according to the indices, but rather as vectors corresponding to children, the units: each unit has associated with it an entire vector of responses.

It is worth noting that this way of summarizing information is not always used; in particular, some of the classical methods for analyzing repeated measurements that we will discuss are usually not cast in these terms. However, as we will see, using this unified way of representing the data will allow us to appreciate differences among approaches.

This
discussion demonstrates that it will be convenient to use matrix notation to summarize longitudinal data. This is indeed the case in the literature, particularly when discussing some of the newer methods. Thus, we will need to review elements of matrix algebra that will be useful in describing the models and methods that we will use.

PROBABILITY DISTRIBUTIONS: Statistical models rely on probability distributions to describe the way in which the random variables involved in the model take on their values. That is, probability distributions are used to describe the chances of seeing particular values of the response of interest.

This same reasoning will of course be true for repeated measurements. In fact, acknowledging that it makes sense to think of the responses for each unit in terms of a random vector, it will be necessary to consider probability models for entire vectors of several responses thought of together, coming from the same unit.

NORMAL DISTRIBUTION: For continuous data, recall that the most common model for single observations is the normal or Gaussian distribution. That is, if $Y$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then the probabilities with which $Y$ takes on different values $y$ are described by the probability density function

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(y - \mu)^2}{2\sigma^2} \right\}.$$

This function is depicted graphically in Figure 4. Recall that the area under the curve between two values represents the probability of the random variable $Y$ taking on a value in that range.

Figure 4: Normal density function with mean $\mu$.

The assumption that data may be thought of as ending up the way they did according to the probabilities dictated by a normal distribution is a fundamental one in much of statistical methodology. For example, classical analysis of variance methods rely on the relevance of this assumption for conclusions (i.e., inferences based on F ratios) to be valid. Classical methods for linear regression modeling also are usually motivated based on this assumption.
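The "area under the curve" reading of the normal density can be checked numerically. The following is a minimal sketch in Python (the notes themselves use SAS); the function names and the values of the mean and variance are hypothetical, chosen only for illustration and not taken from any data set in these notes.

```python
import math


def normal_pdf(y, mu, sigma2):
    """Normal density with mean mu and variance sigma2, evaluated at y."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def normal_cdf(y, mu, sigma2):
    """P(Y <= y) for a normal random variable, via the error function."""
    return 0.5 * (1.0 + math.erf((y - mu) / math.sqrt(2.0 * sigma2)))


def prob_between(a, b, mu, sigma2):
    """Area under the density between a and b, i.e. P(a <= Y <= b)."""
    return normal_cdf(b, mu, sigma2) - normal_cdf(a, mu, sigma2)


def area_by_midpoint_rule(a, b, mu, sigma2, n=10000):
    """Crude numerical integration of the density, to check the 'area' reading."""
    h = (b - a) / n
    return h * sum(normal_pdf(a + (k + 0.5) * h, mu, sigma2) for k in range(n))


# Hypothetical mean and variance, for illustration only.
mu, sigma2 = 24.0, 4.0
sigma = math.sqrt(sigma2)

print(normal_pdf(mu, mu, sigma2))                                 # height of the curve at the mean
print(prob_between(mu - sigma, mu + sigma, mu, sigma2))           # about 0.683
print(area_by_midpoint_rule(mu - sigma, mu + sigma, mu, sigma2))  # agrees with the line above
```

The last two printed values agree closely, which is just the statement that the probability of falling in a range equals the area under the density over that range.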
When the response is continuous, the assumption of normality is often a reasonable one.

MULTIVARIATE NORMAL DISTRIBUTION: When we have data in the form of repeated measurements, we have already noted that it is convenient to think of the data from a particular unit $i$ as a vector of individual responses, one vector from each unit. We will be much more formal later; for now, consider that these vectors may be thought of as unrelated across individuals: how the measurements for one child turn out over time has nothing to do with how they turn out for another child.

However, if we focus on a particular child, the measurements on that child will definitely be related to one another. For example, in Figure 1, the boy with the "highest" profile starts out "high" at age 8 and continues to be "high" over the entire period. Thus, we would like some way of not only characterizing the probabilities with which a child has a certain response at a certain age, but of characterizing how responses on the same child are related.

When the response is continuous and the assumption of normality seems reasonable, we will thus need to discuss the extension of the idea of the normal distribution from a model just for probabilities associated with a single random variable representing a response at one time to a model of the joint probabilities for several responses together in a random vector. This of course includes how the responses are related.

The multivariate normal distribution is the extended probability model for this situation. Because many popular methods for the analysis of longitudinal data are based on the assumption of normally distributed responses, we will discuss the multivariate normal distribution and its properties in some detail.

NORMAL, CONTINUOUS RESPONSE: Armed with our understanding of matrix notation and algebra and the multivariate normal distribution, we will study methods for the analysis of continuous longitudinal data in the first part of the course that
are appropriate when the multivariate normal distribution is a reasonable probability model.

DISCRETE RESPONSE: Of course, the normal distribution is appropriate when the response of interest is continuous, so although the assumption of normality may be suitable in this case, it may not be when the data are in the form of small counts, as in the seizure example. This assumption is certainly not reasonable for binary data.

As discussed above, a common approach has been to try to transform data to make them "approximately normal" on the transformed scale; however, this has some disadvantages.

In the early 1980s, there began an explosion of research into ways to analyze discrete responses that did not require data transformation to induce approximate normality. These methods were based on more realistic probability models: the Poisson distribution as a model for count data and the Bernoulli (binomial) distribution as a model for binary data.

For regression-type problems, where a single response is measured on each unit, the usual classical linear regression methods were extended to allow the assumption that these distributions, rather than the normal distribution, are sensible probability models for the data. The term generalized linear models is used to refer to the models and techniques used.

Starting in the late 1980s, generalized linear model methods were extended to the situation of repeated measurement data, allowing one to think in terms of random vectors of responses, each element of which may be thought of as Poisson or Bernoulli distributed.

We will study these probability distributions, generalized linear models, and their extension to longitudinal data.

NONNORMAL, CONTINUOUS RESPONSE: In fact, although the normal distribution is by far the most popular probability model for continuous data, it is not always a sensible choice. As can be seen from Figure 4, the normal probability density function is symmetric, saying that probabilities of seeing responses
smaller or larger than the mean are the same. This may not always be reasonable. As we will discuss later in the course, other probability models are available in this situation. It turns out that methods in the same spirit as those used for discrete response may be used to model and analyze such data.

1.4 Outline of the course

Given the considerations of the previous section, the course will offer coverage of two main areas. First, methods for the analysis of continuous repeated measurements that are reasonably thought of as normally distributed will be discussed. Later, methods for the analysis of repeated measurements that are not reasonably thought of as normally distributed, such as discrete responses, are covered. The course may be thought of as coming in roughly five parts:

I. Preliminaries

• Introduction
• Review of matrix algebra
• Random vectors, multivariate distributions as models for repeated measurements, multivariate normal distribution, review of linear regression
• Introduction to modeling longitudinal data

II. Classical methods

• Classical methods for analyzing normally distributed, balanced repeated measurements: univariate analysis of variance approaches
• Classical methods for analyzing normally distributed, balanced repeated measurements: multivariate analysis of variance approaches
• Discussion of classical methods: drawbacks and limitations

III. Methods for unbalanced, normally distributed data

• General linear models for longitudinal data, models for correlation
• Random coefficient models for continuous, normally distributed repeated measurements
• Linear mixed models for continuous, normally distributed repeated measurements

IV. Methods for unbalanced, nonnormally distributed data

• Probability models for discrete and nonnormal continuous response, generalized linear models
• Models for discrete and nonnormal continuous repeated measurements: generalized estimating equations

V. Advanced topics

• Generalized linear mixed models for discrete and nonnormal continuous repeated measurements
• More general nonlinear mixed models for all kinds of repeated measurements
• Issues associated with missing data

Throughout, we will devote considerable time to the use of standard statistical software to implement the methods. In particular, we will focus on the use of the SAS (Statistical Analysis System) software. Some familiarity with SAS, such as how to read data from a file, how to perform simple data manipulations, and basic use of simple procedures such as PROC GLM, is assumed.

The examples in subsequent chapters are implemented using Version 8.2 of SAS on a SunOS operating system. Features of the output and required programming statements may be somewhat different when older versions of SAS are used, as some of the procedures have been modified. In addition, slight numerical differences arise when the same programs are run on other platforms. The user should consult the documentation for his/her version of SAS for possible differences.

Plots in the figures are made with R and S-PLUS. Making similar plots with SAS is not demonstrated in these notes, as it is assumed the user will wish to use his/her own favorite plotting software.

It is important to stress that there are numerous approaches to the modeling and analysis of longitudinal data, and there is no strictly "right" or "wrong" way. It is true, however, that some approaches are more flexible than others, imposing fewer restrictions on the nature of the data and allowing questions of scientific interest to be addressed more directly. We will note how various approaches compare as we proceed.

Throughout, we adopt a standard convention. We often use upper case letters, e.g. $Y$ and $\boldsymbol{Y}$, to denote random variables and vectors, most often those corresponding to the response of interest. We use lower case letters, e.g. $y$ and $\boldsymbol{y}$, when we wish to refer to actual data values, i.e. realizations of the random variable or vector.

10 Linear mixed effects models for multivariate normal data

10.1 Introduction

Random coefficient models, where we develop an overall statistical model by thinking first about individual trajectories in a "subject-specific" fashion, are a special case of a more general model framework based on the same perspective.

This model framework, known popularly as the linear mixed effects model, is still based on thinking about individual behavior first, of course. However, the possibilities for how this is represented, and how the variation in the population is represented, are broadened. The result is a very flexible and rich set of models for characterizing repeated measurement data.

The broader possibilities that are encompassed are best illustrated by examples. In the next section, we consider several examples that highlight some of these possibilities. We then note that all of the examples, as well as the random coefficient model as described in the last chapter, may be written in a unified way. Moreover, the same inferential techniques of maximum likelihood and restricted maximum likelihood are also applicable.

As mentioned in our discussion of random coefficient models, one advantage is that the model naturally represents individual trajectories in a formal way, so that questions of interest about individual behavior may be considered. In this chapter, we will show, in the context of the general linear mixed effects model framework, how "estimation" of individual trajectories may be carried out.

10.2 Examples

RANDOM COEFFICIENT MODEL: To set the stage, recall the random coefficient model, where each unit is assumed to have its own "inherent" straight-line trajectory with its own intercept and slope $\beta_{0i}$ and $\beta_{1i}$, i.e. $\boldsymbol{\beta}_i = (\beta_{0i}, \beta_{1i})'$. If, furthermore, units are from, say, $q \geq 1$ groups, then the population model would be

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \boldsymbol{\beta}_i = \boldsymbol{A}_i \boldsymbol{\beta} + \boldsymbol{b}_i, \qquad \boldsymbol{b}_i \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{D}),$$

$$\boldsymbol{\beta}_i = \begin{pmatrix} \beta_{0i} \\ \beta_{1i} \end{pmatrix}, \qquad \boldsymbol{b}_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix},$$

and $\boldsymbol{A}_i$ is the appropriate matrix of 0s and 1s that "picks off" the intercept and slope for the group to which $i$ belongs. If
there is only $q = 1$ group, then $\boldsymbol{A}_i = \boldsymbol{I}_2$ for all $i$ and $\boldsymbol{\beta} = (\beta_0, \beta_1)'$.

• Implicit in the statement of this model is that both intercepts and slopes exhibit nonnegligible variation among units in the populations of interest. This belief is represented by the $(2 \times 1)$ random effect $\boldsymbol{b}_i$: the intercept and slope for different units vary about the mean intercept and slope according to $\boldsymbol{b}_i$.

MAGNITUDES OF AMONG-UNIT VARIATION: For simplicity, consider first a situation with a single group, so that all $\beta_{0i}$ and $\beta_{1i}$ in the random coefficient model are assumed to vary about a common mean intercept and slope. Consider Figure 1, which depicts longitudinal data for 10 hypothetical units.

Figure 1: Longitudinal data where variation in slope may be negligible.

[Figure 1: response profiles over time for the 10 hypothetical units.]

Note that, although the profiles clearly begin at different responses at time 0, the rate of change (slope) of each profile over time seems very similar across units (keeping in mind that there is variation within units making the profiles not look perfectly like straight lines). The upshot is that the intercepts of the individual "true" straight lines definitely appear to vary across units; however, the slopes do not seem to vary much at all.

• One possibility, though impossible to tell from just a graph, is that the "true" underlying slopes are identical for all units in the population. When the units are biological entities and the response something like growth, this seems practically implausible. However, in some applications, like engineering, where the units may have been manufactured to change over time in an identical fashion, this may not be so far-fetched.

• A more reasonable explanation may be that, relative to how the intercepts vary across units, the variation among the slopes is much less, making them appear to vary hardly at all. It may be that the rate of change over time for this population is quite similar, but not exactly identical, for all units.

If we had reason to believe the first possibility, we might want to
consider a model that reflects the fact that slopes are virtually identical across units explicitly. The following "second-stage" model would accomplish this:

$$\beta_{0i} = \beta_0 + b_{0i}, \qquad \beta_{1i} = \beta_1. \tag{10.1}$$

In (10.1), note that the individual-specific slope $\beta_{1i}$ has no random effect associated with it. This reflects formally the belief that the $\beta_{1i}$ do not vary in the population of units.

• Thus, under this population model, while the intercepts are random, with an associated random effect and thus varying in the population, the slopes are all equal to the fixed value $\beta_1$ and do not vary at all across units.

• Thus, there is only a single, scalar random effect $b_{0i}$. Consideration of a covariance matrix for the population, $\boldsymbol{D}$, reduces to consideration of just a single variance, that of $b_{0i}$.

If we believed that the second possibility were likely, we might still want to consider model (10.1). If we considered the usual random coefficient model with

$$\beta_{0i} = \beta_0 + b_{0i}, \qquad \beta_{1i} = \beta_1 + b_{1i},$$

then for the matrix $\boldsymbol{D}$, $D_{11}$ represents the variance of $b_{0i}$ (among intercepts) and $D_{22}$ that of $b_{1i}$ (among slopes). If $D_{11}$ is nonnegligible relative to the mean intercept, then this suggests that intercepts vary perceptibly. If, on the other hand, $D_{22}$ is virtually negligible relative to the size of the mean slope, then this suggests that variation in slopes is almost undetectable.

• It is a fact of life that, when this is the case, the numerical algorithms used to implement fitting of the model (e.g. by ML or REML) may experience serious difficulties. The algorithm simply cannot "pin down" $D_{22}$, and this makes it also have a hard time pinning down the covariance $D_{12}$.

Thus, in situations where this is true, it may be a reasonable approximation to the truth to say that, for all practical purposes, the variation among slopes $\beta_{1i}$ is negligible. Although we don't necessarily believe that the slopes don't vary at all, saying their variance is negligible is an approximation that is probably close enough to the truth to accept for practical purposes. This assumption will allow
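implementation of the model to be feasible.

In either case, we are faced with a situation that does not quite fit into the random coefficient framework. The individual-specific parameters no longer have all elements varying. How may we represent this?

This is most easily seen by "brute force." We have

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \beta_{0i} = \beta_0 + b_{0i}, \quad \beta_{1i} = \beta_1. \tag{10.2}$$

Plugging the representations for $\beta_{0i}$ and $\beta_{1i}$ into the first-stage model, we obtain

$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + b_{0i} + e_{ij}. \tag{10.3}$$

If we think of the implication of (10.3) for the entire vector $\boldsymbol{Y}_i$, it is straightforward to see that we may write this succinctly as

$$\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{1} b_{0i} + \boldsymbol{e}_i,$$

where, as usual, $\boldsymbol{1}$ is a $(n_i \times 1)$ vector of 1s, and $\boldsymbol{X}_i$ is the design matrix for individual $i$,

$$\boldsymbol{X}_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}.$$

Note that if we let $\boldsymbol{Z}_i = \boldsymbol{1}$ and $\boldsymbol{b}_i = b_{0i}$ $(1 \times 1)$, we may write this in the form

$$\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{e}_i \tag{10.4}$$

as before. This looks identical to the general representation we used in the last chapter, except that the definitions of $\boldsymbol{X}_i$ and $\boldsymbol{Z}_i$ we used in the single group case are now different. Other than this, the model has exactly the same form once we've defined $\boldsymbol{X}_i$ and $\boldsymbol{Z}_i$ appropriately.

Alternatively, we can do the same calculation with more fancy footwork. We will illustrate this in a way that allows immediate extension to the case of more than one group; to this end, it is convenient to use a different symbol to represent the "design matrix" for individual $i$ (we called it $\boldsymbol{X}_i$ above). Thus, write

$$\boldsymbol{C}_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}.$$

Furthermore, note that we may write (10.2) as follows (verify):

$$\boldsymbol{\beta}_i = \boldsymbol{A}_i \boldsymbol{\beta} + \boldsymbol{B}_i \boldsymbol{b}_i, \qquad \boldsymbol{b}_i = b_{0i} \ (1 \times 1), \tag{10.5}$$

where $\boldsymbol{A}_i$ is an identity matrix and

$$\boldsymbol{B}_i = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad (2 \times 1).$$

With these representations, if we think of the model that says each child has his/her own straight-line regression model with child-specific regression parameter $\boldsymbol{\beta}_i$, i.e. $\boldsymbol{Y}_i = \boldsymbol{C}_i \boldsymbol{\beta}_i + \boldsymbol{e}_i$, plugging (10.5) into this expression gives

$$\boldsymbol{Y}_i = \boldsymbol{C}_i \boldsymbol{A}_i \boldsymbol{\beta} + \boldsymbol{C}_i \boldsymbol{B}_i \boldsymbol{b}_i + \boldsymbol{e}_i. \tag{10.6}$$

It is straightforward to verify (try it) that $\boldsymbol{C}_i \boldsymbol{B}_i = \boldsymbol{1}$. With a single group, $\boldsymbol{A}_i$ is an identity matrix, so, furthermore, $\boldsymbol{C}_i \boldsymbol{A}_i = \boldsymbol{C}_i$ in this case. If we rename $\boldsymbol{C}_i \boldsymbol{A}_i = \boldsymbol{C}_i = \boldsymbol{X}_i$, then, writing $\boldsymbol{Z}_i = \boldsymbol{1}$, we have the model (10.4) above, with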
these definitions of $X_i$ and $Z_i$.

This argument extends immediately to the case of more than one group. In this situation, the $A_i$ for each individual $i$ are appropriate $(k \times p)$ matrices of 0's and 1's rather than identity matrices, and $\beta$ must be defined appropriately as well. For the dental data, $k = 2$ and $p = 4$, and we define

$$\beta = (\beta_{0,G}, \beta_{1,G}, \beta_{0,B}, \beta_{1,B})'.$$

However, the same manipulations apply; the only difference is that in this case $X_i = C_i A_i$ is now the appropriate $(n_i \times p)$ matrix for the group to which individual $i$ belongs. E.g., in the dental study, for boys we have

$$X_i = C_i A_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 & t_{i1} \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 1 & t_{in_i} \end{pmatrix},$$

and similarly for girls. It is straightforward to verify that, with these definitions, the model implied for an observation $Y_{ij}$ is

$$Y_{ij} = \begin{cases} \beta_{0,G} + \beta_{1,G} t_{ij} + b_{0i} + e_{ij} & \text{for girls,} \\ \beta_{0,B} + \beta_{1,B} t_{ij} + b_{0i} + e_{ij} & \text{for boys.} \end{cases}$$

Thus, by the above, we are able to write down a model that says that all boys have slope $\beta_{1,B}$ and girls $\beta_{1,G}$, with intercepts that vary about the respective mean intercepts $\beta_{0,G}$ and $\beta_{0,B}$.

RESULT: This is, of course, the same representation we considered in the last chapter. The difference between the models here and the random coefficient model is that the matrix $Z_i$, which dictates how the random effects enter the model, and the $b_i$ themselves are allowed to be defined differently to accommodate the belief that the slopes $\beta_{1i}$ do not vary across individuals.

We thus see that it is possible to consider a more general form of the random coefficient model and write it in the same form as we did previously, i.e. in terms of matrices $X_i$ and $Z_i$. The definition of these matrices depends on the features we wish to represent. That is, the random coefficient model of Chapter 9 is a special case of a more general model where the $X_i$ and $Z_i$ matrices may be defined in other ways.

To gain a further understanding of this, consider another possibility.

OTHER COVARIATES: In some instances, the question of interest may in fact involve the possible association between the values of measured covariates and rate of change of a
response over time. We now see that it is possible to write models appropriate for this situation in the form (10.4) for suitable choices of $X_i$ and $Z_i$.

An example arises in understanding the progression of disease in HIV-infected patients assigned to follow a certain therapeutic regimen. HIV attacks the immune system, so HIV-infected subjects often have compromised immune system characteristics. A standard measure of immune status is CD4 count, where lower counts indicate poorer status. Now, a standard measure of how well a patient is doing is viral load, roughly the amount of virus present in the body, and it is routine to follow viral load over time to monitor a patient's well-being. HIV scientists may be interested in whether the nature of viral load progression is different depending on a subject's immune system at the time of initiation of therapy.

To develop a formal model to address this issue, suppose initially there is only one group.

• Let $Y_{ij}$ be the viral load measurement taken on subject $i$ at time $t_{ij}$ (usually measured in units of log copy number) following start of therapy at time 0, and suppose that, for any given subject, the trajectory of viral load measurements over time appears to be a straight line, with subject-specific intercept and slope; i.e.

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \beta_i = (\beta_{0i}, \beta_{1i})'.$$

• In addition, suppose that at time 0 (baseline) for all subjects, a CD4 count measurement is available; denote this measurement as $a_i$ for the $i$th subject.

• In terms of the individual model, then, the question of interest is whether the magnitude and direction of individual rates of change, i.e. slopes $\beta_{1i}$, are associated with the value of $a_i$. We may state such an association formally as

$$\beta_{1i} = \beta_2 + \beta_3 a_i + b_{1i}.$$

For illustration, suppose that we do not believe that the intercepts, which represent viral load at time 0, are associated with CD4 count (this is actually unlikely, but we assume it here for purposes of developing a simple model). We may state this as

$$\beta_{0i} = \beta_1 + b_{0i}.$$

We may write this succinctly as

$$\beta_i = A_i\beta + b_i, \qquad A_i = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & a_i \end{pmatrix}, \quad \beta = (\beta_1, \beta_2, \beta_3)', \quad b_i = (b_{0i}, b_{1i})'.$$

• Note that this model allows the possibility that both intercepts and slopes vary in the population of subjects. However, it states that the fact that slopes vary across individuals may in part be associated with their baseline CD4 counts.

• The question of interest in the context of this model is about the value of $\beta_3$; if $\beta_3 = 0$, then this says that there is no association between baseline CD4 and subsequent rate of change of viral load while on this therapy.

• The model for $\beta_i$ itself has the flavor of a "regression model." Here, $a_i$ is a covariate in this model.

It is straightforward to see that this model may be put into the form of (10.4). Plugging in the form of $\beta_i$ into the individual model, we see that

$$Y_{ij} = \beta_1 + \beta_2 t_{ij} + \beta_3 a_i t_{ij} + b_{0i} + b_{1i} t_{ij} + e_{ij}, \qquad j = 1, \ldots, n_i.$$

It may be verified that this may be written succinctly as $Y_i = X_i\beta + Z_i b_i + e_i$, where

$$X_i = \begin{pmatrix} 1 & t_{i1} & a_i t_{i1} \\ \vdots & \vdots & \vdots \\ 1 & t_{in_i} & a_i t_{in_i} \end{pmatrix}, \qquad Z_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}.$$

Alternatively, using a matrix argument, note that we may write

$$\beta_i = A_i\beta + B_i b_i, \qquad B_i = I_2,$$

and $A_i$ as above. Writing the first-stage individual model as $Y_i = C_i\beta_i + e_i$ and plugging in for $\beta_i$, we obtain

$$Y_i = (C_i A_i)\beta + (C_i B_i) b_i + e_i = X_i\beta + Z_i b_i + e_i, \tag{10.7}$$

where

$$X_i = C_i A_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & a_i \end{pmatrix} = \begin{pmatrix} 1 & t_{i1} & a_i t_{i1} \\ \vdots & \vdots & \vdots \\ 1 & t_{in_i} & a_i t_{in_i} \end{pmatrix}.$$

It is straightforward to see that this model could be extended to allow

• More than one group, by suitable redefinition of $\beta$ and $A_i$; e.g. with two treatment groups, we could write

$$\beta_{0i} = \begin{cases} \beta_1 + b_{0i} & \text{for treatment 1,} \\ \beta_4 + b_{0i} & \text{for treatment 2,} \end{cases} \qquad \beta_{1i} = \begin{cases} \beta_2 + \beta_3 a_i + b_{1i} & \text{for treatment 1,} \\ \beta_5 + \beta_6 a_i + b_{1i} & \text{for treatment 2,} \end{cases}$$

and define $\beta = (\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6)'$ and $b_i = (b_{0i}, b_{1i})'$. The matrices $A_i$ would be $(2 \times 6)$; for example, for subject $i$ in treatment 1,

$$A_i = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & a_i & 0 & 0 & 0 \end{pmatrix}.$$

Then $\beta_i = A_i\beta + B_i b_i$ with $A_i$ and $\beta$ as above and $B_i = I_2$.

• Some parameters not to vary in the population, as above. As a hypothetical example, suppose we wanted a model that expresses the belief that variation among slopes is entirely attributable to CD4 count and that none of the variation in slopes is random, while variation in intercepts is random.
This sounds biologically questionable, but we consider it for illustration. With 2 groups, this could be expressed as

$$\beta_{0i} = \begin{cases} \beta_1 + b_{0i} & \text{for treatment 1,} \\ \beta_4 + b_{0i} & \text{for treatment 2,} \end{cases} \qquad \beta_{1i} = \begin{cases} \beta_2 + \beta_3 a_i & \text{for treatment 1,} \\ \beta_5 + \beta_6 a_i & \text{for treatment 2.} \end{cases}$$

We could again write this as $\beta_i = A_i\beta + B_i b_i$ with $A_i$ and $\beta$ as above, but with $b_i = b_{0i}$ and $B_i = (1, 0)'$.

By plugging these representations into the first-stage model as in (10.7), we arrive at a model of the form

$$Y_i = X_i\beta + Z_i b_i + e_i, \tag{10.8}$$

where the matrices $X_i$ and $Z_i$ are determined by the particular definitions of $A_i$, $B_i$, and $C_i$.

RESULT: It should be clear that it is possible to represent even fancier specifications in this way. E.g., we could also incorporate association of the intercepts with $a_i$, and we may have more than one covariate in the second-stage population model. We consider an example at the end of this chapter. Once we write down the model in the form $\beta_i = A_i\beta + B_i b_i$ for appropriately defined matrices $A_i$ and $B_i$ reflecting the features of interest, we may write a model of the form (10.8), where the definitions of $X_i$ and $Z_i$ are dictated by the form of the first- and second-stage models.

THE SIMPLEST MODEL: It is in fact the case that the general model $Y_i = X_i\beta + Z_i b_i + e_i$ includes as special cases many simple models for repeated measurements.

A particularly simple model is as follows. Suppose there is only one group, and for each unit we have repeated measurements $Y_{ij}$. However, suppose that these measurements are not necessarily over time; e.g. the $m$ units are mother rats, and for the $i$th mother, $Y_{ij}$ represent birthweights of her $n_i$ pups. In the absence of further information, a very simple model for this situation is

$$Y_{ij} = \mu + b_i + e_{ij}, \qquad j = 1, \ldots, n_i. \tag{10.9}$$

The model says that the population of all possible pup weights is centered about $\mu$ and allows for the possibility of 2 sources of variation: among mother rats, through $b_i$ (some mothers have larger pups than others), and within mother rats, through $e_{ij}$ (pups born to a given mother are not all identical, and weights may be measured with error). If we define $X_i = 1_{n_i}$, $Z_i = 1_{n_i}$, $\beta = \mu$, and $b_i = b_i$, then it is
straightforward to see that we may write (10.9) in the form of (10.8). It is straightforward to extend this simple model to allow different treatment groups, with mean $\mu_\ell = \mu + \tau_\ell$ for the $\ell$th group, by redefining $\beta$ and $X_i$ (try it).

In fact, the univariate ANOVA model of Chapter 5 can also be written in this form. Recall that in Chapter 5 (see page 119) we wrote this model in the form

$$Y_i = X_i\beta + 1 b_i + e_i.$$

Thus, we see this is again a special case of the general model; as above, $Z_i = 1$, $b_i = b_i$, with the particular forms of $X_i$ and $\beta$ on page 119.

SUMMARY: It should be clear from these examples that it is possible to consider a wide variety of subject-specific models of the form $Y_i = X_i\beta + Z_i b_i + e_i$ by suitably defining $X_i$, $Z_i$, and $b_i$. This model in its general form is known as the linear mixed effects model.

10.3 General linear mixed effects model

For convenience, we summarize the form of the linear mixed effects model here.

THE MODEL: With $Y_i$ a $(n_i \times 1)$ vector of responses for the $i$th unit, $i = 1, \ldots, m$,

$$Y_i = X_i\beta + Z_i b_i + e_i, \tag{10.10}$$

where

• $X_i$ is a $(n_i \times p)$ "design matrix" that characterizes the systematic part of the response, e.g. depending on covariates and time.

• $\beta$ is a $(p \times 1)$ vector of parameters, usually referred to as fixed effects, that complete the characterization of the systematic part of the response.

• $Z_i$ is a $(n_i \times k)$ "design matrix" that characterizes random variation in the response attributable to among-unit sources.

• $b_i$ is a $(k \times 1)$ vector of random effects that completes the characterization of among-unit variation. Note that $k$ and $p$ need not be equal.

• $e_i$ is a $(n_i \times 1)$ vector of within-unit deviations characterizing variation due to sources like within-unit fluctuations and measurement error.

ASSUMPTIONS ON RANDOM VARIATION: The model components $b_i$ $(k \times 1)$ and $e_i$ $(n_i \times 1)$ characterize the two sources of variation, among and within units. The usual assumptions are

• $e_i \sim N_{n_i}(0, R_i)$. Here, $R_i$ is a $(n_i \times n_i)$ covariance matrix that characterizes variance and correlation due to within-unit sources; see the discussion in the last chapter. The most common choice is the
model that says variance is the same at all time points for all units and that measurements are sufficiently far apart in time that correlation, if any, is negligible; i.e. $R_i = \sigma^2 I_{n_i}$. As discussed in the previous chapter, other models for $R_i$ are also possible.

• $b_i \sim N_k(0, D)$. Here, $D$ is a $(k \times k)$ covariance matrix that characterizes variation due to among-unit sources, assumed the same for all units. The dimension of $D$ corresponds to the number of among-unit random effects in the model. It is possible to allow $D$ to have a particular form or to be unstructured. It is also possible to have different $D$ matrices for different groups, as we discussed in the last chapter. In our discussion here, we will present things under the assumption of a common $D$ for all units, regardless of group or anything else. This may often be a reasonable assumption unless there is strong evidence that different conditions have a nonnegligible effect on variation as well as mean. Much of what we discuss in the sequel can be extended to more complex models, e.g. with different $D$ matrices and fancier $R_i$ matrices.

• With these assumptions, we have

$$E(Y_i) = X_i\beta, \qquad \mathrm{var}(Y_i) = Z_i D Z_i' + R_i = \Sigma_i, \qquad Y_i \sim N_{n_i}(X_i\beta, \Sigma_i). \tag{10.11}$$

That is, the model with the above assumptions on $e_i$ and $b_i$ implies that the $Y_i$ are multivariate normal random vectors of dimension $n_i$ with a particular form of covariance matrix. The form of $\Sigma_i$ implied by the model has two distinct components, the first having to do with variation solely from among-unit sources and the second having to do with variation solely from within-unit sources.

SUBJECT-SPECIFIC MODEL: Although the forms of $X_i$, $Z_i$, and $b_i$ are allowed more possibilities here than in the random coefficient model, the spirit of the model is the same. If we think about the general form of the model, it is clear that the model is a subject-specific one. In particular, if we examine the form of the model $Y_i = X_i\beta + Z_i b_i + e_i$:

• If we "zero in" on unit $i$ and consider this unit alone and in its own right, regardless of other units,
the model has the form of a regression model for the data $Y_i$. The "mean" part of this regression model is

$$X_i\beta + Z_i b_i = (X_i \;\; Z_i) \begin{pmatrix} \beta \\ b_i \end{pmatrix}.$$

The vector $e_i$ characterizes random variation associated with within-unit sources. This way of writing this part of the model highlights the fact that individual unit behavior is being characterized by some combination of $\beta$, which describes the mean for the population, and $b_i$, which describes how this particular unit deviates from the population mean. Thus, the model may be thought of as subject-specific, as it incorporates the behavior of the individual unit.

We will focus on individual behavior shortly; in particular, we will be more formal about the notion of the unit's "own mean."

10.4 Inference on regression and covariance parameters

As in the previous chapter, once we note that the model implies (10.11), the methods of maximum likelihood and restricted maximum likelihood may be used to estimate the parameters that characterize the mean or "systematic" part of the model, $\beta$, and those that characterize the variation or "random" part of the model, the distinct parameters that make up $R_i$ and $D$. Thus, the methods and considerations discussed in the previous two chapters apply exactly as described.

The generalized least squares estimator for $\beta$ and its large-sample approximate sampling distribution will have the same form, with $X_i$ and $\Sigma_i$ as defined in the model. Computation of estimated standard errors and of Wald and likelihood ratio tests is as before. The subject-specific versus population-averaged interpretations of the model both apply.

When the data are balanced, in the sense that the times of observation are all the same and the matrices $Z_i$ are the same for all units, then, when $R_i = \sigma^2 I$, the GLS and OLS estimators yield the same numerical value. As before, however, the estimated approximate covariance matrices of the estimators will be different; that based on the OLS analysis will be incorrect, because it will not take proper account of the nature of
variation for the data vectors $Y_i$. Recall that the OLS estimator just assumes that all the $Y_{ij}$ are independent, so that $\Sigma_i = \sigma^2 I$ for all $i$. The estimated covariance matrix $\hat{V}_\beta$ for $\hat\beta$, which does take variation into account, requires estimates of the components of $R_i$ and $D$.

Because we have already discussed these issues in detail in earlier chapters, we do not need to do so again here. See section 9.3 and chapter 8 for more.

10.5 Best linear unbiased prediction

In chapter 9, we mentioned that an objective of analysis is sometimes to characterize individual behavior. As we mentioned above, the linear mixed effects model, which contains the random coefficient model as a special case, is a subject-specific model in the sense that an individual's "regression model" is characterized as having "mean" $X_i\beta + Z_i b_i$.

• Thus, if we want to characterize individual behavior in this model, we'd like to "estimate" both $\beta$ and $b_i$. We could then form estimates of things like $\beta_i$ (where applicable) and "estimates" of the mean of a single response at certain times and covariate settings for a particular individual.

We already know how to estimate $\beta$. However, how do we "estimate" $b_i$? We have been putting the word estimate in quotes because, technically, $b_i$ is not a fixed constant like $\beta$; rather, it is a random effect that varies across units. Thus, when we seek to "estimate" $b_i$, we seek to characterize a random, not a fixed, quantity, because the units were randomly chosen from the population.

In situations where interest focuses on characterizing a random quantity, it is customary to use different terminology in order to preserve the notion that we are interested in something that varies. Thus, "estimation" of a random quantity is often called prediction, to emphasize the fact that we are trying to get our hands on something that is not fixed and immutable, but something whose value arises in a random fashion through, for example, the fact that units are randomly selected from the population.

Thus, in order to characterize individual unit
behavior, we wish to develop a method for prediction of the $b_i$.

NOT THE MEAN: In ordinary regression analysis, a prediction problem arises when one wishes to get a sense of future values of the response that might be observed; that is, it is desired to predict future $Y$ values that might be observed at certain covariate settings on the basis of the data at hand.

• In this case, the "best guess" for the value of $Y$ at a certain covariate value $x_0$ is the mean of $Y$ values that might be seen at $x_0$, $x_0'\beta$, say.

• As the mean is not known, because $\beta$ is not known, the approach is to use as the prediction the estimated mean $x_0'\hat\beta$, where $\hat\beta$ is the estimate of $\beta$.

By analogy, one's first thought for prediction of $b_i$ would be to use the mean of the population of $b_i$. However,

• An assumption of the model is that $b_i \sim N_k(0, D)$, so that $E(b_i) = 0$ for all $i$.

• Thus, following this logic, we would use $0$ as the prediction for $b_i$ for any unit. This would lead to the same "estimate" for individual-specific quantities like $\beta_i$ in a random coefficient model for all units. But the whole point is that individuals are different; thus, this tactic does not seem sensible, as it gives the same result regardless of individual.

Thus, simply using the mean of the population of random effects $b_i$ will not provide a useful result. Something that preserves the "individuality" of the $b_i$ is needed instead.

Another thing to note is that this approach does not at all take advantage of the fact that we have some additional information available: the data. Under the model, we have

$$Y_i = X_i\beta + Z_i b_i + e_i;$$

that is, the data $Y_i$ and the underlying random effects $b_i$ are related. This suggests that there must be information about $b_i$ in $Y_i$ that we could exploit. In particular, is there some sensible function of the data $Y_i$ that could be used as a predictor for $b_i$? Of course, this function would also be random, as it is a function of the random data $Y_i$.

CONDITIONAL EXPECTATION: To make the discussion a little easier, we will assume for the moment that $b_i$ is
a scalar, i.e. $k = 1$. The same reasoning goes through for $k > 1$. Call this scalar random effect $b_i$.

For our predictor, we'd like something that is "close" to $b_i$. If we let $c(Y_i)$ be the function of the data we will use as the predictor, then one possibility would be to say we'd like to choose $c(Y_i)$ so that the "distance" between $c(Y_i)$ and $b_i$, which we can measure as

$$\{b_i - c(Y_i)\}^2,$$

is small. This makes sense: we'd like to use as a predictor something that resembles $b_i$ in some sense.

As both $Y_i$ and $b_i$ are random, and hence vary in the population, we'd like the distance to be small considered over all possible values they might take on. Thus, it seems reasonable to consider the expectation of this distance, averaging it over all possible values; i.e.

$$E\{b_i - c(Y_i)\}^2. \tag{10.12}$$

How small is small? A natural way to think is that we'd like the function $c(Y_i)$ we use to be the function that makes (10.12) as small as possible; that is, the function $c(Y_i)$ we'd like to choose is the one that minimizes $E\{b_i - c(Y_i)\}^2$ across all possible functions we might choose.

The particular function $c(Y_i)$ that minimizes this expected distance is called the conditional expectation of $b_i$ given $Y_i$. The usual notation is to write the conditional expectation as

$$E(b_i \mid Y_i). \tag{10.13}$$

• The conditional expectation is itself a random quantity; it is a function of the random vector $Y_i$. Thus, do not be confused into thinking it is a fixed quantity because of the notation; the "E" is being used in a different way.

• This definition may be extended to the case where $b_i$ is a vector.

CONDITIONAL EXPECTATION AND MULTIVARIATE NORMALITY: It turns out that when $Y_i$ and $b_i$ are both normally distributed, it is possible to find an explicit expression for the conditional expectation. We first discuss this in detail in a special case, the simplest form of the linear mixed model given in equation (10.9), where $b_i$ is a scalar:

$$Y_{ij} = \mu + b_i + e_{ij},$$

with $Y_i = (Y_{i1}, \ldots, Y_{in_i})'$, $e_i = (e_{i1}, \ldots, e_{in_i})'$, $b_i \sim N(0, D)$, and $e_i \sim N_{n_i}(0, \sigma^2 I_{n_i})$. It of course follows that $Y_{ij} \sim N(\mu, D + \sigma^2)$ (verify).

It may be shown that under this model

$$E(b_i \mid Y_i) = \frac{n_i D}{n_i D + \sigma^2} (\bar{Y}_i - \mu), \tag{10.14}$$

where $\bar{Y}_i$ is the mean of the $n_i$ $Y_{ij}$ values in $Y_i$.

• Note that we might equally well write $E(b_i \mid \bar{Y}_i)$; all the information about $b_i$ is summarized in the individual unit mean $\bar{Y}_i$. This says that to find the function of the data $Y_i$ that is "closest" to $b_i$ in the sense of minimizing (10.12), all we need to know is the sample mean of the data on unit $i$; this is sufficient. This makes sense: if $b_i$ is large (positive), then we'd expect this to lead to a $\bar{Y}_i$ that is large (larger than the mean $\mu$), and similarly, if $b_i$ is small (negative), we'd expect this to lead to a $\bar{Y}_i$ that is small (smaller than the mean $\mu$).

• Note further that (10.14) is a linear function of the elements of $Y_i$ (through $\bar{Y}_i$).

• In addition, note that the expression (10.14) we'd like to use as our predictor depends on $\mu$, $D$, and $\sigma^2$, which are all unknown, but which we can estimate.

• Finally, note that if we were to know $\mu$, $D$, and $\sigma^2$, and we take the expectation of the predictor, that is, averaging the value of the predictor across all possible values of the elements of $Y_i$, we get

$$E\{E(b_i \mid Y_i)\} = \frac{n_i D}{n_i D + \sigma^2} \{E(\bar{Y}_i) - \mu\} = \frac{n_i D}{n_i D + \sigma^2} (\mu - \mu) = 0,$$

because $E(\bar{Y}_i) = \mu$. That is, the average of the predictor across all possible values of the data is 0, which is exactly equal to the expectation of $b_i$, the thing we are trying to predict. This seems like a good property; if we were trying to estimate a fixed quantity, we would call this property unbiasedness.

BEST LINEAR UNBIASED PREDICTOR: All of these observations are reflected in the name that is often given to the predictor for $b_i$ that results from thinking about (10.14). Here is the way the thinking goes.

In practice, to actually calculate the value of the conditional expectation for $b_i$, we would need to know $\mu$, $D$, and $\sigma^2$, but these are unknown. It is thus natural to think of substituting estimates for them.

• As we have considered before, first think of the "ideal" situation in which we were lucky enough to know the elements of $\omega$, which in this case is made up of $D$ and $\sigma^2$. Our model may be written as $Y_i = 1_{n_i}\mu + 1_{n_i} b_i + e_i$, so
that $X_i = Z_i = 1_{n_i}$, with $\mu$ thus playing the role of $\beta$, and

$$\Sigma_i = 1_{n_i} D 1_{n_i}' + \sigma^2 I_{n_i} = D J_{n_i} + \sigma^2 I_{n_i}$$

(compound symmetry) for all $i$, because $1_{n_i} 1_{n_i}' = J_{n_i}$ (verify).

• If $\omega$ is known, then $\Sigma_i$ is known, and in this case the maximum likelihood estimator for $\mu$ is the weighted least squares estimator (see equation (8.17)), which in our case ($X_i = 1_{n_i}$) is

$$\hat\mu = \left( \sum_{i=1}^m 1_{n_i}' \Sigma_i^{-1} 1_{n_i} \right)^{-1} \sum_{i=1}^m 1_{n_i}' \Sigma_i^{-1} Y_i,$$

which may be shown to lead to the result that

$$\hat\mu = \frac{\sum_{i=1}^m (D + \sigma^2/n_i)^{-1} \bar{Y}_i}{\sum_{i=1}^m (D + \sigma^2/n_i)^{-1}}. \tag{10.15}$$

Try it; you will need to use the matrix fact that

$$(\sigma^2 I_{n_i} + D J_{n_i})^{-1} = \frac{1}{\sigma^2} \left( I_{n_i} - \frac{D}{\sigma^2 + n_i D} J_{n_i} \right)$$

in your calculation. Note that $\hat\mu$ is a linear function of the data $Y_i$ (through the $\bar{Y}_i$).

• Thus, under these ideal conditions, to calculate the predictor for practical use, we would substitute $\hat\mu$ for $\mu$ in the conditional expectation to arrive at

$$\frac{n_i D}{n_i D + \sigma^2} (\bar{Y}_i - \hat\mu). \tag{10.16}$$

Note that (10.16) is still a linear function of the data (through $\bar{Y}_i$ and $\hat\mu$).

• It may be shown that the prediction error of (10.16), that is, its difference from $b_i$, has smaller variance than that of any other unbiased predictor for $b_i$ that is a linear function of the data. That is, the predictor (10.16) is the least variable, in this sense, among all linear unbiased predictors we might have chosen. Thus, it is "best" in the sense that it exhibits the least variability, so is most reliable as a predictor.

• The predictor (10.16) under these ideal conditions is also unbiased in the same sense described above; if we find its expectation, it is still equal to 0, even with $\hat\mu$ substituted for $\mu$ (try it).

• As a result, the predictor (10.16) is referred to as the Best Linear Unbiased Predictor for $b_i$. The popular acronym is BLUP.

• Now, of course, in real life, the elements of $\omega$ are not known; rather, they are estimated. Thus, instead of the ideal WLS estimator (10.15), we must use the generalized least squares estimator for $\mu$, which has the same form as the WLS estimator but depends on $\hat\Sigma_i$, which is $\Sigma_i$ with the ML or REML estimates $\hat{D}$ and $\hat\sigma^2$ plugged in. Moreover, these estimates must be plugged into the rest of the form of the predictor. Thus, in
practice, one uses as the predictor

$$\hat{b}_i = \frac{n_i \hat{D}}{n_i \hat{D} + \hat\sigma^2} (\bar{Y}_i - \hat\mu), \tag{10.17}$$

where $\hat\mu$ is the GLS estimator

$$\hat\mu = \frac{\sum_{i=1}^m (\hat{D} + \hat\sigma^2/n_i)^{-1} \bar{Y}_i}{\sum_{i=1}^m (\hat{D} + \hat\sigma^2/n_i)^{-1}}.$$

The symbol $\hat{b}_i$ is used to denote this predictor.

• Because we have plugged in these estimates, the properties of unbiasedness and smallest variance no longer hold exactly. However, it is hoped that they hold at least approximately. Thus, the predictor (10.17) used in practice is usually also referred to as "BLUP," although this is not precisely true anymore. Another common term is empirical Bayes estimator for $b_i$, which comes from another interpretation of the BLUP we will not discuss here.

ESTIMATION OF INDIVIDUAL "MEAN": Recall our earlier observation for the general model that, if we "zero in" on a particular individual, we may think of them as having their own regression model with individual-specific mean $X_i\beta + Z_i b_i$. In our simple model here, this mean is $1_{n_i}\mu + 1_{n_i} b_i$, which implies that the mean for the $j$th observation is

$$\mu_i = \mu + b_i$$

for all $j = 1, \ldots, n_i$. An important goal of predicting $b_i$ is to allow us to characterize the individual-specific mean for each unit.

• We may in fact formalize this. We have been saying that $\mu_i = \mu + b_i$ is the "mean" for individual $i$. Technically, $\mu_i$ is the conditional expectation of each element of $Y_i$, the data for unit $i$, given $b_i$. That is, $\mu_i$ is the function of $b_i$ that is closest to $Y_{ij}$. For the $j$th observation, this is written $\mu_i = E(Y_{ij} \mid b_i)$. Heuristically, we may thus think of $\mu_i$ as the mean of $Y_{ij}$ were we lucky enough to know $b_i$.

We'd like to "predict" not just $b_i$, but $\mu_i$.

• It turns out that the conditional expectation of $\mu_i$ given the data $Y_i$ is simply $\mu_i$ evaluated at the conditional expectation of $b_i$ given $Y_i$; that is,

$$E(\mu_i \mid Y_i) = \mu + E(b_i \mid Y_i).$$

• Thus, it follows that the best linear unbiased predictor of $\mu_i$ in the "ideal" case where $\omega$ is known is given by

$$\mathrm{BLUP}(\mu_i) = \hat\mu + \frac{n_i D}{n_i D + \sigma^2} (\bar{Y}_i - \hat\mu). \tag{10.18}$$

Here, we have replaced $\mu$ by the WLS estimate.

• For practical use, we would replace $\hat\mu$ by the GLS estimate and $D$ and $\sigma^2$ by the ML or REML estimates in (10.18). This predictor of $\mu_i$ is
also commonly referred to as the BLUP or empirical Bayes estimator for $\mu_i$.

BLUP AS A WEIGHTED AVERAGE: Consider again the "ideal" situation where $\omega$ is known, for simplicity. It is possible by some simple algebra to write the BLUP for $\mu_i$ (10.18) in the alternative form

$$\mathrm{BLUP}(\mu_i) = \frac{D}{D + \sigma^2/n_i} \bar{Y}_i + \frac{\sigma^2/n_i}{D + \sigma^2/n_i} \hat\mu, \tag{10.19}$$

where $\hat\mu$ is the WLS estimator.

• Inspection of (10.19) reveals that the BLUP has an interesting interpretation as a weighted average between $\bar{Y}_i$ and $\hat\mu$. In particular, note that $\bar{Y}_i$ may be regarded as the "best guess" for $\mu_i$ based on the data for unit $i$ only. In contrast, $\hat\mu$ is the "best guess" for the overall mean of observations averaged across all units in the population.

• Recall that $D$ measures variation among units, while $\sigma^2$ measures variation within units. Furthermore, $n_i$ describes the amount of information available about a particular unit. Thus, $\sigma^2/n_i$ measures the "quality" of our knowledge about unit $i$, taking into account both variation due to within-unit sources and how many measurements we have.

• If $D$ is large, then units vary quite a bit, so that even if we know a lot about the population of units, this doesn't help us too much for knowing about a particular unit. If $D$ is small, then units are pretty similar, so knowing a lot about the population of units helps us quite a bit for knowing about a particular unit.

• Thus, if $D$ is large relative to $\sigma^2/n_i$, the information we have about unit $i$ from unit $i$'s data is more reliable than that from the population. In this case, note from (10.19) that $D/(D + \sigma^2/n_i)$ will be close to 1, while $(\sigma^2/n_i)/(D + \sigma^2/n_i)$ will be close to 0. Thus, $\mathrm{BLUP}(\mu_i) \approx \bar{Y}_i$. This makes sense: the information we have about $\mu_i$ in $\bar{Y}_i$ is better than that we have about the unit through the estimated population mean $\hat\mu$.

• On the other hand, if $D$ is small relative to $\sigma^2/n_i$, the information we have about unit $i$ from the population is better than that from unit $i$'s data. If $n_i$ were very small, so we have limited data on $i$ to begin with, this may very well be the case. Here, the situation is
reversed: $\mathrm{BLUP}(\mu_i) \approx \hat\mu$. This also makes sense: the information we have about $\mu_i$ in $\bar{Y}_i$ is not very good, so we rely on the information about the population more heavily.

These results show that the BLUP for $\mu_i$ is a compromise between information from individual $i$ alone and information about the whole population through all $m$ units' data. This compromise weights these 2 sources of information in proportion to their quality. When neither term, $D$ or $\sigma^2/n_i$, dominates, the BLUP is a combination of both sources.

Thus, by using BLUP to characterize individual unit means or other features, it is popular to say that one "borrows strength across units," supplementing the information from unit $i$ alone by information about the whole population from which $i$ is assumed to arise.

IN GENERAL: The implications of the above discussion carry over to the case of the general linear mixed effects model

$$Y_i = X_i\beta + Z_i b_i + e_i,$$

where $\omega$ is composed of the distinct elements of $D$ and $R_i$. Specifically,

• It may be shown that the conditional expectation of $b_i$ given the data $Y_i$ is

$$E(b_i \mid Y_i) = D Z_i' \Sigma_i^{-1} (Y_i - X_i\beta).$$

• In the "ideal" case where $\omega$ is known and $\hat\beta$ is the WLS estimator,

$$D Z_i' \Sigma_i^{-1} (Y_i - X_i\hat\beta) \tag{10.20}$$

is the best linear unbiased predictor (BLUP) for $b_i$.

• In the realistic case where $\omega$ is not known, one forms the "approximate" BLUP for $b_i$ as

$$\hat{b}_i = \hat{D} Z_i' \hat\Sigma_i^{-1} (Y_i - X_i\hat\beta), \tag{10.21}$$

where $\hat\Sigma_i$ is, as usual, $\Sigma_i$ with the estimator for $\omega$ substituted. This predictor is also often referred to as the BLUP for $b_i$ or the empirical Bayes estimator for $b_i$.

• The "mean" for individual $i$ is the conditional expectation

$$E(Y_i \mid b_i) = X_i\beta + Z_i b_i.$$

The BLUP for $X_i\beta + Z_i b_i$ is found by substituting (10.20) into this expression; i.e.

$$X_i\hat\beta + Z_i D Z_i' \Sigma_i^{-1} (Y_i - X_i\hat\beta), \tag{10.22}$$

where $\hat\beta$ is the WLS estimator.

As in the simple model, the predictor (10.22) has the interpretation that it may be rewritten in the form of a weighted average combining information from individual $i$ only and information from the population. Thus, the same implications given above apply in the general model: the BLUP for $X_i\beta + Z_i b_i$ may be viewed as borrowing
strength across individuals to get the "best" prediction for individual $i$.

In practice, the "approximate" BLUP for $X_i\beta + Z_i b_i$ is found by substituting $\hat{b}_i$; i.e.

$$X_i\hat\beta + Z_i\hat{b}_i = X_i\hat\beta + Z_i \hat{D} Z_i' \hat\Sigma_i^{-1} (Y_i - X_i\hat\beta) = (I_{n_i} - Z_i \hat{D} Z_i' \hat\Sigma_i^{-1}) X_i\hat\beta + Z_i \hat{D} Z_i' \hat\Sigma_i^{-1} Y_i,$$

where now $\hat\beta$ is the GLS estimator. This predictor is also referred to as the BLUP or empirical Bayes estimator of the individual-specific "mean" $X_i\beta + Z_i b_i$.

IN PRACTICE: If one is interested in characterizing individual trajectories, it is standard to use the BLUPs for this purpose.

• One specific case is that of a random coefficient model, where

$$Y_i = C_i\beta_i + e_i, \qquad \beta_i = A_i\beta + b_i.$$

For example, if the stage-one model is a straight line, so that $\beta_i = (\beta_{0i}, \beta_{1i})'$ are the unit-specific intercepts and slopes, then it is often of interest to characterize $\beta_{0i}$ and $\beta_{1i}$.

• This may be done by finding the BLUP with $X_i = C_i A_i$ and $Z_i = C_i$ and then obtaining

$$\hat\beta_i = A_i\hat\beta + \hat{b}_i,$$

where $\hat\beta$ is the GLS estimator. The elements of $\hat\beta_i$ are thus "estimates" of unit $i$'s specific intercept and slope.

• These estimates are often preferred over just carrying out individual regression fits to each unit's data separately, because they "borrow strength" across individuals by taking advantage of the belief that the linear mixed effects model holds.

10.6 Testing whether a component is random

We have noted that one manifestation of the linear mixed effects model is to think of the usual random coefficient model, in which every unit has its own intercept, slope, etc., but then to consider the possibility that the slopes, for example, do not vary across units. That is, we would think of slopes as being fixed rather than random.

For definiteness, consider a situation with one group. Suppose that we consider a straight-line model for each subject. The full random coefficient model with random intercept and slope is

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \beta_{0i} = \beta_0 + b_{0i}, \quad \beta_{1i} = \beta_1 + b_{1i},$$

$$b_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}, \qquad \mathrm{var}(b_i) = D = \begin{pmatrix} D_{11} & D_{12} \\ D_{12} & D_{22} \end{pmatrix}.$$

If slopes do not vary across units, then we have the reduced model with slopes not random, given by

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \beta_{0i} = \beta_0 + b_{0i}, \quad \beta_{1i} = \beta_1,$$

$$b_i = b_{0i}, \qquad \mathrm{var}(b_i) = D_{11}.$$

For definiteness, assume in each model that $\mathrm{var}(e_i) = R_i = \sigma^2 I_{n_i}$. These two
models lead to the same specification for the mean of a data vector, $E(Y_i) = X_i\beta$, with $E(Y_{ij}) = \beta_0 + \beta_1 t_{ij}$. However, they involve different overall covariance models $\Sigma_i = Z_i D Z_i' + \sigma^2 I_{n_i}$. In particular, the full model $\Sigma_i$ has the usual form, with $Z_i$ having rows $(1, t_{ij})$, which we do not multiply out here.

In contrast, under the reduced model, $D = D_{11}$ and $Z_i = 1_{n_i}$, so that $Z_i D Z_i' = D_{11} J_{n_i}$, so that

$$\Sigma_i = \begin{pmatrix} D_{11} + \sigma^2 & D_{11} & \cdots & D_{11} \\ D_{11} & D_{11} + \sigma^2 & \cdots & D_{11} \\ \vdots & \vdots & \ddots & \vdots \\ D_{11} & D_{11} & \cdots & D_{11} + \sigma^2 \end{pmatrix},$$

which is a simple compound symmetric assumption.

Thus, to address the issue of which model is more suitable, one might use techniques such as information criteria to informally choose between these models.

Alternatively, noting that we have nested models, it is natural to consider conducting a formal hypothesis test using the likelihood ratio test. However, there is a difficulty with this that makes the usual approach of comparing the likelihood ratio test statistic to the $\chi^2$ distribution inappropriate, a fact that is often not appreciated by practitioners. The reasons are rather technical; here, we give an intuitive description of what the issue is.

• Here, $\mathrm{var}(b_i)$ is a $(2 \times 2)$ matrix for the full model, involving two variances and a covariance; $\mathrm{var}(b_i)$ is a scalar variance for the reduced model. Thus, although the models are indeed nested, going from the full to the reduced model requires that the variance $D_{22} = 0$. Moreover, there is no longer the need to worry about the covariance $D_{12}$ between intercepts and slopes, because all slopes are the same. Thus, the difference in models is rather complicated, so that the null hypothesis corresponding to the reduced model is complicated. So it is clear that this problem seems "nonstandard" relative to the other uses of the likelihood ratio test we have seen.

• A major source of the difficulty is that this null hypothesis involves asking whether $D_{22}$ in the full model is equal to 0. $D_{22}$ is a variance, so it cannot take on just any value; specifically, a variance must be $\geq 0$ by definition. Indeed, the value 0 is on the "edge" or boundary of possible values
for $D_{22}$. Asking whether $D_{22} = 0$ corresponds to asking whether $D_{22}$ takes its value on the boundary of the parameter space, i.e., the set of possible values for $D_{22}$.

Contrast this with other situations where we have considered nested models, e.g., if the issue is whether the $k$th component of $\beta$ is equal to 0, say. As $\beta_k$ values can be anything, the parameter space is unrestricted, and thus $\beta_k = 0$ is not on a boundary.

• The theory that underlies the use of the likelihood ratio test breaks down when the null hypothesis involves a boundary in this way. That is, as $m \to \infty$, the likelihood ratio test statistic does not have a $\chi^2$ distribution anymore. Thus, if one computes the likelihood ratio statistic and compares it to the critical value from the $\chi^2_2$ sampling distribution (two degrees of freedom, corresponding to $D_{22} = 0$ and $D_{12} = 0$), it turns out that the test will tend to not reject the null as often as it should, leading the analyst to end up using models that are too simple.

• It is possible to show that, instead, the correct sampling distribution is something called a mixture of a $\chi^2_1$ distribution and a $\chi^2_2$ distribution. A random variable with this distribution takes its value like a $\chi^2_1$ random variable 50% of the time and like a $\chi^2_2$ random variable 50% of the time. The level-$\alpha$ critical value $c$ for the mixture thus satisfies $0.5\,P(\chi^2_1 > c) + 0.5\,P(\chi^2_2 > c) = \alpha$. A table of critical values for such $\chi^2$ mixtures is given, for instance, in Appendix C of Fitzmaurice, Laird, and Ware (2004). For a test at level $\alpha = 0.05$, $\chi^2_{2,0.95} = 5.99$, while the corresponding critical value for the mixture is 5.14. This shows that comparing to the $\chi^2_2$ sampling distribution will not reject the null hypothesis as often as it should.

• It is important to realize that SAS PROC MIXED does not have an automatic way to carry out such tests. So the analyst cannot simply expect the software to know that this is an issue.

• This same issue arises more generally. For example, if we are entertaining a quadratic model

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + \beta_{2i} t_{ij}^2 + e_{ij}, \qquad \beta_i = \beta + b_i, \qquad D \ (3 \times 3),$$

with $b_i = (b_{0i}, b_{1i}, b_{2i})'$, and wonder whether we can do away with the random quadratic term altogether, the same problem occurs. Here, the relevant mixture
can be very complicated. In such complicated situations, Fitzmaurice, Laird, and Ware (2004) recommend, as an approximate, ad hoc way to conduct the test at level $\alpha = 0.05$, to calculate the likelihood ratio test statistic and compare it to the usual $\chi^2$ critical value one would use if one did not know this was a problem, but for $\alpha = 0.1$ instead.

For more on this topic, see Verbeke and Molenberghs (2000, section 6.3.4) and Fitzmaurice, Laird, and Ware (2004, sections 7.5 and 8.5).

10.7 Time-dependent covariates

In our development so far, we have restricted attention to covariates that do not change over time, for example, treatment group, gender, age, CD4 count at baseline, and so on. Our interest has been focused on features like whether the way things change over time is different for different groups or is associated with baseline age, CD4, etc.

In some settings, information may be collected that changes over time, and questions of interest may focus on the relationship between the response and this information. As we now discuss, this can lead to some important conceptual issues.

To fix ideas, consider a longitudinal study to investigate the relationship between a measure of respiratory health and smoking behavior. Suppose that at time $t_{ij}$ following subject $i$'s entry into the study, $Y_{ij}$, a measure of respiratory health status, is recorded along with $Z_{ij}$, a measure of $i$'s current smoking behavior. Note that, of necessity, such a study must be observational; it would be unethical to assign subjects to different patterns of smoking.

• Note that we use uppercase $Z_{ij}$ to refer to smoking at time $t_{ij}$. This is to emphasize the fact that smoking behavior is a characteristic that may vary within and among subjects, both at any time and over time, in a way that we may only observe. That is, $Z_{ij}$ should be viewed as a random variable. In this situation, $Z_{ij}$ is something that we may not view as "under control" over time, in contrast to things like treatment group and gender.

• Contrast this with a study in
which the goal is to investigate the relationship between respiratory health status and exercise. Suppose that each subject is assigned to follow a predetermined exercise plan such that, at time $t_{ij}$, subject $i$ engages in exercise intensity $z_{ij}$. Here, although exercise intensity changes over time, its values are fixed in advance in this study in a way that has nothing to do with how the subjects' respiratory health status turns out. Thus, we use lowercase $z_{ij}$ to emphasize that the exercise intensities are not something we can only observe but are under the control of the investigators.

Returning to the first study, it is clear that there may be complicated interrelationships between respiratory status and smoking behavior. For example, a subject may decide at some time point to modify his future smoking behavior as a result of his respiratory status; e.g., a subject experiencing poor respiratory health at time $j$ may decide to cut back on smoking at time $j+1$. In contrast, a subject whose respiratory health is not compromised may continue to smoke in the same way. Here, current smoking behavior and respiratory status impact future smoking behavior, and, of course, smoking behavior impacts future respiratory health.

This suggests that even stating the question of interest can be difficult. What do we mean by "the relationship between smoking behavior and respiratory health"? Precise description of what is meant by this is often sidestepped by investigators. Instead, they may plow ahead and write down a statistical model. As we now discuss, this can lead to difficult or erroneous interpretations.

• In particular, a common approach is to specify a model relating $Y_{ij}$ and $Z_{ij}$. For example, one might adopt a population-averaged model assuming a straight-line relationship,

$$Y_{ij} = \beta_0 + \beta_1 Z_{ij} + e_{ij},$$

with some assumptions on the $e_{ij}$. Alternatively, a random coefficient model

$$Y_{ij} = \beta_{0i} + \beta_{1i} Z_{ij} + e_{ij}$$

might be specified, with second-stage model $\beta_{0i} = \beta_0 + b_{0i}$, $\beta_{1i} = \beta_1 + b_{1i}$. It should be clear that this second model
can be written in the form $Y_i = X_i\beta + Z_i b_i + e_i$.

• The type of model is not the issue; both models imply that the mean of $Y_{ij}$ is of the form $\beta_0 + \beta_1 Z_{ij}$. In fact, we must be careful how we interpret this. Because the $Z_{ij}$ are random variables that change with $j$, we can really only talk about this mean in the context of the $Z_{ij}$.

• As we have discussed, $Y_{ij}$ may be related to past, present, and future smoking behaviors; however, this model seems to specify that respiratory health at time $j$ is related only to smoking behavior at time $j$.

• To be fancier about this, as discussed in Section 10.5, what we are really writing is a model that describes the conditional expectation of $Y_{ij}$ given knowledge of $Z_{i1}, \ldots, Z_{in_i}$. In the models above, we are implicitly assuming that only $Z_{ij}$ is associated with $Y_{ij}$, in that knowing $Z_{ik}$, $k \neq j$, does not give us any more information about respiratory status at time $t_{ij}$. In symbols,

$$E(Y_{ij} \mid Z_{i1}, \ldots, Z_{in_i}) = E(Y_{ij} \mid Z_{ij}). \qquad (10.24)$$

If (10.24) does not hold, then it should be clear that we could end up drawing conclusions about the relationship that may be misleading.

In fact, yet another issue arises. In many controlled studies, where units may be randomized to different treatments, the goal is to claim that the use of a certain treatment relative to another causes a more favorable mean response or a more favorable rate of change of mean response over time.

• It is widely accepted that such a causal interpretation is possible under these circumstances because the assignment of the treatment was in no way related to how the response might turn out (it was assigned at random). Here, the association between treatment and response may be given a causal interpretation.

• On the other hand, suppose we measure smoking behavior and respiratory status at just a single time point. Here, if there is an association between smoking and respiratory status, we cannot claim that the smoking caused the respiratory status; there may be other factors, e.g., heredity, past smoking behavior, environmental factors, etc., that are related both to how a
person might be smoking when we see him and how his respiratory health might turn out. These are referred to as confounding factors.

• To take this into account, it is common to consider a statistical model that includes confounding factors. If all such relevant factors are available, it may be possible to adjust for them in a regression model so that causal interpretations can be made.

However, in the longitudinal context, the problems are compounded. The study may be carried out because the investigators would like to claim that, say, higher levels of smoking cause poorer respiratory health over time somehow.

• Even if we write out a model that accurately describes the relationship or association between $Y_{ij}$ and $Z_{i1}, \ldots, Z_{in_i}$, or even if (10.24) is true, we still cannot draw such a conclusion in general. All the model does is describe the association; that smoking actually causes health status does not necessarily follow, because of potential confounding.

• We would therefore need to adjust for confounding factors. However, the complicated interrelationships between the $Y_{ij}$ and $Z_{ij}$ over time make this extremely difficult, if not impossible.

We do not pursue this issue further, as it is quite complex, but it should be clear that simply testing hypotheses about components of $\beta$ in a simple model like those above will not address causal questions in general.

This discussion is meant to convince the reader that models for longitudinal data that involve time-dependent variables as covariates can be very difficult to specify and interpret. The analyst should be aware of this and approach such situations with caution.

Some references related to this discussion are Pepe and Anderson (1994), Fitzmaurice, Laird, and Ware (2004, Section 15.3), and Robins, Greenland, and Hu (1999).

10.8 Discussion

The general linear mixed effects model, with its broad possibilities for modeling longitudinal data, has become immensely popular as a framework for the analysis of these data. Although the
basic model has been considered in the statistical literature since the 1970s, it was not until a paper by Laird and Ware (1982) appeared in Biometrics describing the model that it commanded widespread attention; this article explained the model with more of an eye toward practical application than technical detail. As a result, although the authors did not invent the model, it is sometimes referred to as the "Laird-Ware model" in the statistical and subject-matter literature.

MAIN FEATURES:

• The model allows the analyst to incorporate additional covariate information, allows the possibility that some effects do not vary in the population, and includes as special cases many simpler, popular models, such as the random coefficient model.

• The model explicitly acknowledges both among- and within-unit variation, allowing the analyst to think about and characterize each source separately.

• Because the model is subject-specific in this sense, it allows the analyst to characterize individual behavior through the use of best linear unbiased prediction.

10.9 Implementation with SAS

We consider two examples:

1. The dental study data. Here, we use these data to illustrate how to fit a model with slopes fixed rather than random and show how to obtain the BLUPs of the $b_i$ and $\beta_i$.

2. Data from a strength training study. We use these data to show how to fit and interpret general linear mixed effects models with additional covariates.

EXAMPLE 1: DENTAL STUDY DATA

• We fit two versions of the random coefficient model assuming a straight-line relationship for each child:

(i) The model with both intercepts and slopes random, i.e.,

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \beta_i = \begin{pmatrix} \beta_{0,G} \\ \beta_{1,G} \end{pmatrix} + b_i \ \text{(girls)}, \qquad \beta_i = \begin{pmatrix} \beta_{0,B} \\ \beta_{1,B} \end{pmatrix} + b_i \ \text{(boys)}.$$

This is the same model fitted in section 9.7. Here, also assume that $\text{var}(b_i) = D$ for both genders and that $R_i = \sigma^2_G I$ (girls), $R_i = \sigma^2_B I$ (boys).

(ii) The model with intercepts random but slopes considered as fixed in the populations of boys and girls, i.e.,

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \beta_i = \begin{pmatrix} \beta_{0,G} + b_{0i} \\ \beta_{1,G} \end{pmatrix} \ \text{(girls)}, \qquad \beta_i = \begin{pmatrix} \beta_{0,B} + b_{0i} \\ \beta_{1,B} \end{pmatrix} \ \text{(boys)}.$$

We also assume, as in (i), that
$\text{var}(b_i) = D$ for both genders and that $R_i = \sigma^2_G I$ (girls), $R_i = \sigma^2_B I$ (boys).

• Thus, model (i) is the usual random coefficient model with random intercepts and slopes, while (ii) is the modification with slopes all taken to be the same for all boys and for all girls.

Note that we may also write these models using the representation

$$\beta_i = A_i\beta + B_i b_i, \qquad \beta = (\beta_{0,G}, \beta_{1,G}, \beta_{0,B}, \beta_{1,B})',$$

where

(i) For model (i), $A_i$ is the usual matrix of 0's and 1's that "picks off" the correct elements of $\beta$ depending on whether $i$ is a boy or girl, $B_i = I_2$, and $b_i = (b_{0i}, b_{1i})'$.

(ii) For model (ii), $A_i$ is the usual matrix of 0's and 1's that "picks off" the correct elements of $\beta$ depending on whether $i$ is a boy or girl, but now $B_i = (1, 0)'$ and $b_i = b_{0i}$.

Of course, each model may be written in the general form $Y_i = X_i\beta + Z_i b_i + e_i$.

• For each model, we show how to get PROC MIXED to produce and print out various subject-specific quantities. In particular, we show how to use the outpred= option of the model statement to obtain the BLUPs at each time of observation for each child, i.e., the values of $X_i\hat{\beta} + Z_i\hat{b}_i$. We also show how to obtain the values of the BLUPs of the $b_i$, $\hat{b}_i$, by using the solution option of the random statement. Finally, we exhibit how to obtain output data sets containing the estimates of $\beta$ and BLUPs of $b_i$ and how to manipulate these to obtain the BLUPs of the intercepts and slopes $\hat{\beta}_i$ for each individual.

PROGRAM:

    /******************************************************************

      CHAPTER 10, EXAMPLE 1

      Illustration of fitting both a full random coefficient model
      (as in Chapter 9) and a modified random coefficient model with
      intercepts random and slopes fixed for the dental data using
      PROC MIXED, obtaining BLUPs of the random effects and of the
      random intercepts and slopes (where applicable) for both models.

      In the modified model, the slopes are taken to be the SAME for
      all children within each gender.  This assumption is probably
      not true, but is made for illustrative purposes to show how
      such a model may be specified in PROC MIXED.

      For both models, we take D to be common to both genders and take
      Ri = sigma^2_G*I for girls and Ri = sigma^2_B*I for boys
      using the REPEATED statement.

      We use the RANDOM statement to specify how random effects enter
      the model AND to ask for the BLUPs of the bi to be printed in
      each case.  We also use an option in the MODEL statement to ask
      for the BLUPs of the individual means at each time point for
      each child.

    ******************************************************************/

    options ls=80 ps=59 nodate; run;

    /* Read in the data set -- see Example 1 of Chapter 4 */

    data dent1; infile 'dental.dat';
      input obsno child age distance gender;
    run;

    /******************************************************************

      Use PROC MIXED to fit the two linear mixed effects models.
      For all of the fits, we use usual normal ML rather than REML
      (the default).  We call PROC MIXED twice to fit each model,
      for reasons described below.

      In all cases, we use the usual parameterization for the mean
      model.  We use the syntax for versions 7 and higher of SAS for
      outputting calculations to data sets from PROC MIXED.

      In the first call to PROC MIXED:

      We use the OUTPRED=dataset option in the MODEL statement.  This
      requests that the (approximate) Best Linear Unbiased Predictors
      for the individual means at each time point in the data set for
      each child be put in dataset, along with the original data for
      comparison.  These may be printed with a print statement.

      The SOLUTION option in the RANDOM statement requests that the
      (approximate) Best Linear Unbiased Predictors for the random
      effects bi be printed for each child.

      In the second call to PROC MIXED, we use ODS (Output Delivery
      System) statements to produce data sets containing the fixed
      effects estimates and the BLUPs for the random effects.  The
      first ODS call with "listing exclude" suppresses printing of
      the fixed and random effects.

      To fit the full random coefficient model (i), we must specify
      that both intercept and slope are random in the RANDOM
      statement.  To fit (ii), where slopes are taken to be constant
      across all children in a gender, we specify only that intercept
      is random in the RANDOM statement.

    ******************************************************************/

    /* MODEL (i): full random coefficient model */

    /* Call to PROC MIXED to get the printed results */

    title "FULL RANDOM COEFFICIENT MODEL WITH BOTH";
    title2 "INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER";
    proc mixed method=ml data=dent1;
      class gender child;
      model distance = gender gender*age / noint solution outpred=pdata;
      random intercept age / type=un subject=child solution;
      repeated / group=gender subject=child;
    run;

    proc print data=pdata; run;

    /* The output data sets FIXED1 and RAND1 that we ask PROC MIXED
       to create in the ODS statements contain the estimated fixed
       effects (betahats) and random effects (the BLUPs of the bi),
       respectively.  We now combine these into a single data set in
       order to compute the BLUPs of the individual betai.  This is
       accomplished by manipulating the output data sets and then
       merging them. */

    /* Call to PROC MIXED to produce the output data sets */

    proc mixed method=ml data=dent1;
      class gender child;
      model distance = gender gender*age / noint solution;
      random intercept age / type=un subject=child solution;
      repeated / group=gender subject=child;
      ods listing exclude SolutionF;
      ods output SolutionF=fixed1;
      ods listing exclude SolutionR;
      ods output SolutionR=rand1;
    run;

    data fixed1; set fixed1;
      keep gender effect estimate;
    run;

    title3 "FIXED EFFECTS OUTPUT DATA SET";
    proc print data=fixed1; run;

    proc sort data=fixed1; by gender; run;

    /* One record per gender holding the fixed intercept and slope */
    data fixed12; set fixed1; by gender;
      retain fixint fixslope;
      if effect='gender' then fixint=estimate;
      if effect='age*gender' then fixslope=estimate;
      if last.gender then do;
        output; fixint=.; fixslope=.;
      end;
      drop effect estimate;
    run;

    title3 "RECONFIGURED FIXED EFFECTS DATA SET";
    proc print data=fixed12; run;

    /* Children 1-11 are girls (gender=0); 12-27 are boys (gender=1) */
    data rand1; set rand1;
      gender=1; if child < 12 then gender=0;
      keep child gender effect estimate;
    run;

    title3 "RANDOM EFFECTS OUTPUT DATA SET";
    proc print data=rand1; run;

    proc sort data=rand1; by child; run;

    /* One record per child holding the BLUPs of b0i and b1i */
    data rand12; set rand1; by child;
      retain ranint ranslope;
      if effect='Intercept' then ranint=estimate;
      if effect='age' then ranslope=estimate;
      if last.child then do;
        output; ranint=.; ranslope=.;
      end;
      drop effect estimate;
    run;

    proc sort data=rand12; by gender child; run;

    title3 "RECONFIGURED RANDOM EFFECTS DATA SET";
    proc print data=rand12; run;

    data both1; merge fixed12 rand12; by gender;
      beta0i=fixint+ranint;
      beta1i=fixslope+ranslope;
    run;

    title3 "RANDOM INTERCEPTS AND SLOPES";
    proc print data=both1; run;

    /* MODEL (ii): common slope within each gender */

    /* Call to PROC MIXED to get the printed results.  To save space,
       we do not print the predicted values. */

    title "MODIFIED RANDOM COEFFICIENT MODEL WITH";
    title2 "INTERCEPTS RANDOM, SLOPES FIXED";
    proc mixed method=ml data=dent1;
      class gender child;
      model distance = gender gender*age / noint solution;
      random intercept / type=un subject=child solution;
      repeated / group=gender subject=child;
    run;

    /* Call to PROC MIXED to get the output data sets */

    proc mixed method=ml data=dent1;
      class gender child;
      model distance = gender gender*age / noint solution;
      random intercept / type=un subject=child solution;
      repeated / group=gender subject=child;
      ods listing exclude SolutionF;
      ods output SolutionF=fixed2;
      ods listing exclude SolutionR;
      ods output SolutionR=rand2;
    run;

    data fixed2; set fixed2;
      keep gender effect estimate;
    run;

    title3 "FIXED EFFECTS OUTPUT DATA SET";
    proc print data=fixed2; run;

    proc sort data=fixed2; by gender; run;

    data fixed22; set fixed2; by gender;
      retain fixint fixslope;
      if effect='gender' then fixint=estimate;
      if effect='age*gender' then fixslope=estimate;
      if last.gender then do;
        output; fixint=.; fixslope=.;
      end;
      drop effect estimate;
    run;

    title3 "RECONFIGURED FIXED EFFECTS DATA SET";
    proc print data=fixed22; run;

    data rand2; set rand2;
      gender=1; if child < 12 then gender=0;
      keep child gender effect estimate;
    run;

    title3 "RANDOM EFFECTS OUTPUT DATA SET";
    proc print data=rand2; run;

    proc sort data=rand2; by child; run;

    data rand22; set rand2; by child;
      retain ranint;
      if effect='Intercept' then ranint=estimate;
      if last.child then do;
        output; ranint=.;
      end;
      drop effect estimate;
    run;

    proc sort data=rand22; by gender child; run;

    title3 "RECONFIGURED RANDOM EFFECTS DATA SET";
    proc print data=rand22; run;

    data both2; merge fixed22 rand22; by gender;
      beta0i=fixint+ranint;
      beta1i=fixslope;
    run;

    title3 "RANDOM INTERCEPTS AND FIXED SLOPES";
    proc print data=both2; run;

OUTPUT: Following the output, we comment on a few aspects of the output.

    FULL RANDOM COEFFICIENT MODEL WITH BOTH                          1
    INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER

    The Mixed Procedure

    Model Information
    Data Set                     WORK.DENT1
    Dependent Variable           distance
    Covariance Structures        Unstructured, Variance Components
    Subject Effects              child, child
    Group Effect                 gender
    Estimation Method            ML
    Residual Variance Method     None
    Fixed Effects SE Method      Model-Based
    Degrees of Freedom Method    Containment

    Class Level Information
    Class     Levels    Values
    gender         2    0 1
    child         27    1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 Dimensions Covariance Parameters 5 o umns in 4 Columns in Z Per Subject 2 Subjects 27 Max Obs Per Subject 4 Number of Observations Number of Observations Read 108 Number of Observations ed 108 Number of Observations Not Used 0 Iteration History Iteration Evaluations 2 Log Like Criterion 0 1 47824175986 1 2 41892503842 116632499 2 1 416 18869903 1 23326209 3 1 407 89638533 0 01954268 4 2 406 88264563 0 00645800 5 1 406 10632159 0 00056866 6 1 40604318997 000000764 7 1 40604238894 000000000 FULL RANDOM COEFFICIENT MODEL WITH BOTH 2 INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER The Mixed Procedure Convergence criteria met Covariance Parameter Estimates Cov Parm Subject Group Estimate UN11 child 31978 UN21 child 0110 UN22 child 0 01976 Residual child gender 0 444 Residual child gender 1 26294 Fit Statistics 2 Log Likelihood 4060 A ma er 39s bet 424 0 AICC smaller is better 4259 BIC smaller is better 4357 Null Model Likelihood Ratio Test DF ChiSquare Pr gt ChiSq 4 7220 0001 Solution for Fixed Effects Standard PUAE 398 CHAPTERlO ST 732 M DAVIDIAN Effect gender agegender agegender Effect Intercept Intercept Intercept age Intercept Intercept Effect age Intercept Intercept age Intercept Intercept Intercept age Intercept Intercept Intercept age Intercept Intercept age Intercept Intercept Intercept age Intercept Intercept age Intercept Intercept Intercept age Intercept Intercept age Intercept Intercept age gender Estimate Error DF t Value 0 173727 07386 54 2352 1 163406 11114 54 1470 0 04795 006180 54 776 1 07844 009722 54 807 Solution for Random Effects Std Err child Estimate Pred DF t Value 1 0485 11744 54 041 1 006820 01017 54 067 2 1192 1 1744 54 102 2 1420 0 1017 54 40 3 08535 1 1744 54 073 3 01773 0 1017 54 174 4 1702 1 1744 54 145 4 004017 01017 54 040 5 0913 11744 54 078 FULL RANDOM COEFFICIENT MODEL WITH BOTH INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER The Mixed Procedure Solution for Random Effects Std Err child 
Estimate Pred DF t Value 008680 1017 4 06740 1744 4 7 0 07292 1017 4 7 7 0 05461 1744 4 7 0 03641 1017 4 1935 1744 4 01149 1017 4 02190 1744 4 01151 1017 4 2997 1744 0 09085 1017 19249 1744 01530 1017 13469 4342 0 08788 1232 08676 4342 0 04068 1232 03575 4342 0 02176 1232 59 4342 0 02772 1232 1581 4342 0 04153 1232 08972 4342 7 0 02260 1232 06889 4342 L 0 02853 1232 01443 4342 0 07348 1232 0127 4342 002544 1232 2534 4342 01088 1232 02261 4342 0085 232 063 1 4342 L 0006510 0 1232 170 1 4342 11 9 0 1232 0238 1 4342 0 03166 0 1232 01180 1 4342 006104 0 1232 08223 14342 7 7 007545 01232 FULL RANDOM COEFFICIENT MODEL WITH BOTH INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER The Mixed Procedure Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr gt F ender 2 54 38472 0001 agegender 2 54 6266 0001 PU E 399 CHAPTERlO MUD mmpmMHoom mmpmMH mpmm mmpmMHoom mmpmMH oumwo mpmm DD D htr39n HHHHHHHHH mm HHH HHH HHH HHH HHH HHH HHH HHH HHH HHH HHH mm 10 14 8 HH HHH HHH Mompmompmo 14 FULL RANDOM COEFFICIENT MODEL WITH BOTH INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER madmanlun mmmm Hmmumm mmHvHHmmdm mwva HmSOl FULL RANDOM COEFFICIENT MODEL WITH BOTH INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER madmanlun HHHHHHHHHHHHHHH Hmmumm ooooooooooooooo mmHvHHmmdm D F 54 mmmmmmmmmmmmm ppppppppppppp 54 mwva ooooooooooooooo oooooooooooooo mmmmmmmmmmmmmm 05 27 7 so mmmm DPCAJF WD u MMMMMMMMM mmmmmmw m 1 ST 732 M DAVIDIAN PAGE4M CHAPTERlO 00000000llll03030303 oumu o 91 000 mm b b b b H H b H 0000000000000 lOSU39thJMb O DOONOSU39I 108 1253 81030 5007 1 6110 73529 1368 1 1 0967 77585 5412 1 5824 91676 7444 7 6936 81030 0690 1 3075 73529 8334 1 9215 77585 3660 1 5354 91676 6974 6984 81030 0739 1 2101 73529 7359 1 1 17218 77585 1663 1 2335 91676 13955 1 8835 81030 2589 1 3053 73529 8311 1 1 7270 77585 1716 1 1488 91676 13108 6918 81030 0673 1 1 3114 73529 8373 1 9311 77585 3756 7 1 5507 91676 7127 9 0207 81030 3961 7 1 78070 73529 3328 1 5933 77585 0378 1 3796 91676 5416 7067 81030 0822 1 1048 
73529 6306 1 1 15029 77585 9474 1 9009 91676 10629 0303 81030 4058 1 6121 73529 1379 FULL RANDOM COEFFICIENT MODEL WITH BOTH INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER S t d d i E s r c t g r A L U h a n P P l o p i a n d r r E w p l g c e e e D e e d e e r d d F a r r 23 12 240 1 25 1939 077585 54 005 236384 267494 2 14 280 1 26 757 091676 54 005 249377 286136 8 170 1 21 8 62 081030 54 005 202017 23 4508 24 10 245 1 23 6228 073529 54 005 221486 25 0970 24 12 260 1 25 4194 077585 54 005 238639 26 9749 24 14 295 1 272160 091676 54 005 253780 29 0540 25 8 225 1 226011 081030 54 005 209765 24 2256 25 10 255 1 241065 073529 54 005 226323 25 5807 5 12 255 1 25 6119 077585 54 005 240565 27 1674 25 14 260 1 27 1174 091676 54 005 252794 28 9554 8 230 1 232220 081030 54 005 215974 24 8465 26 10 245 1 249128 073529 54 005 234386 26 3870 26 12 260 1 266036 077585 54 005 250482 28 1591 26 14 300 1 282945 091676 54 005 264565 30 1325 8 220 1 211898 81030 54 005 19 52 228143 27 10 215 1 226076 073529 54 005 211334 24 0818 27 12 235 1 240255 077585 54 005 224700 255809 27 14 250 1 254433 091676 54 005 236053 272813 FULL RANDOM COEFFICIENT MODEL WITH BOTH INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER The Mixed Procedure Model Information Data Set WORKDENT1 Dependent Variable distance C 39 Variance Com onents Subject Effects chi d child Group Ef ct stimation Me Residual Varian Method None Fixed Effects SE Method ModelBased Degrees of Freedom Method Containment Class Level Information Class Levels Values gender 2 0 1 child 27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 03 01 to to I m 1 m as I 1 1 0 as I 1 00 01 03 I 1 0 1 m 00 0 lllll I I OMOOgtUgtH I I I I I I I 0gt 0gt 000gt 0gt k 7498 3887 I ST 732 M DAVIDIAN PUAE 401 CHAPTER 10 ST 732 M DAVIDIAN 24 25 26 27 Dimensions Covariance Parameters 5 o umns in 4 Columns in Z Per Subject 2 Subjects 27 Max Obs Per Subject 4 Number of Observations Number of Observations Read 108 Number of Observations Used 108 Number of 
Observations Not Used 0 Iteration History Iteration Evaluations 2 Log Like Criterion 0 1 47824175986 1 2 41892503842 116632499 2 1 41618869903 123326209 3 1 407 89638533 001954268 4 2 406 88264563 000645800 5 1 406 10632159 000056866 6 1 04318997 000000764 7 1 40604238894 000000000 FULL RANDOM COEFFICIENT MODEL WITH BOTH 9 INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER The Mixed Procedure Convergence criteria met Covariance Parameter Estimates Cov Parm Subject Group Estimate UN11 child 3 1978 UN21 child 0110 UN22 child 0 01976 Residual child gender 0 444 Residual child gender 1 26294 Fit Statistics 2 Log Likelihood 4060 AIC smaller is better 4240 AICC smaller is better 4259 BIC smaller is better 4357 Null Model Likelihood Ratio Test DF ChiSquare Pr gt ChiSq 4 7220 0001 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr gt F gender 2 54 38472 0001 agegender 2 54 6266 0001 FULL RANDOM COEFFICIENT MODEL WITH BOTH 10 INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER FIXED EFFECTS OUTPUT DATA SET Obs Effect gender Estimate 1 gender 0 173727 2 gender 1 163406 3 agegender 0 04795 4 agegender 1 07844 FULL RANDOM COEFFICIENT MODEL WITH BOTH 11 INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER RECONFIGURED FIXED EFFECTS DATA SET Obs gender fixint fixslope PUAE 402 CHAPTERlO ST 732 M DAVIDIAN 1 0 173727 047955 2 1 163406 078437 FULL RANDOM COEFFICIENT MODEL WITH BOTH INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER RANDOM EFFECTS OUTPUT DATA SET Obs Effect child Estimate gender 1 Intercept 04853 2 006820 3 Intercept 11922 4 age 01420 5 Intercept 08535 6 age 01773 7 Intercept 17024 8 age 004017 9 Intercept 09136 0 age 008680 1 Intercept 06740 2 007292 3 Intercept 005461 4 age 7 003641 5 Intercept 19350 6 age 01149 7 Intercept 02190 age 01151 Intercept 29974 009085 Intercept 9249 01530 Intercept 3469 L age 008788 Intercept 08676 age 004068 Intercept 035 age 0 02176 Intercept 15946 g 002772 Intercept 11581 0 04153 Intercept 08972 L age 7 002260 Intercept 06889 age 002853 Intercept 
01443 0 07348 Intercept 01273 age 002544 Intercept 5349 age 01088 Intercept 022 L age 008535 Intercept 06374 0006510 Intercept 17008 1139 9 Intercept 02387 age 0 03166 Intercept 01180 age 006104 Intercept 08223 FULL RANDOM COEFFICIENT MODEL WITH BOTH INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER RANDOM EFFECTS OUTPUT DATA SET Obs Effect child Estimate gender 54 age 27 007545 1 FULL RANDOM COEFFICIENT MODEL WITH BO TH INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER RECONFIGURED RANDOM EFFECTS DATA SET Obs child gender ranint ranslope 1 1 0 048526 006820 2 2 0 119224 014198 3 3 0 085346 017726 4 4 0 170243 004017 5 5 0 91363 008680 6 6 0 067403 007292 7 7 0 005461 003641 8 8 0 193498 011486 9 9 0 021898 0 11515 10 10 0 299738 0 09085 1 0 1 92494 0 15297 1 12 1 134688 008788 13 13 1 086755 004068 PU E 403 CHAPTER 10 ST 732 M DAVIDIAN 14 14 1 035750 002176 15 15 1 159462 002772 16 16 1 115811 004153 17 17 1 89718 02260 18 18 1 068894 002853 19 19 1 014433 0 07348 20 20 1 012730 0 02544 21 21 1 253489 10877 22 22 1 022609 0 08535 23 23 1 063735 00651 24 24 1 170079 11392 25 25 1 023870 0 03166 26 26 1 011799 06104 27 27 1 082229 007545 FULL RANDOM COEFFICIENT MODEL WITH BOTH 15 INTERCEPTS AND SLOPES RANDOM FOR EACH GENDER RANDOM INTERCEPTS AND SLOPES Obs gender fixint fixslope child ranint ranslope beta0i beta1i 73727 47955 48526 06820 8875 41135 73727 47955 19224 14198 1805 62152 7 3727 47955 85346 17726 5193 65681 L 7 3727 47955 L 70243 04017 0752 1971 7 3727 47955 91363 08680 2864 39274 7 3727 47955 67403 07292 6987 0662 7 7 3727 47955 7 05461 03641 181 51595 8 7 3727 47955 8 93498 11486 9 3077 36469 9 7 3727 47955 9 21898 11515 7 1537 36440 0 7 3727 47955 0 99738 09085 L 3753 38869 1 7 3727 47955 1 92494 15297 9 2977 63251 2 3406 78437 2 34688 08788 7 6875 87225 3 3406 78437 3 86755 04068 4731 74369 4 3406 78437 4 35750 02176 831 76262 5 3406 78437 5 59462 02772 7 9352 75665 3406 78437 15811 04153 825 74285 3406 78437 89718 02260 7 2378 80697 3406 78437 68894 02853 517 
5584 3406 78437 14433 07348 1963 71090 3406 78437 12730 02544 2133 80981 3406 78437 53489 10877 8755 89315 3406 78437 22609 08535 1145 69903 3406 78437 63735 00651 7033 79088 L 3406 78437 L 70079 11392 6398 89830 3406 78437 23870 03166 5793 75272 3406 78437 11799 06104 4586 84542 3406 78437 82229 07545 5183 70893 MODIFIED RANDOM COEFFICIENT MODEL WITH 16 INTERCEPTS RANDOM SLOPES FIXED The Mixed Procedure Model Information Data Set WORKDENT1 Dependent Variable distance C 39 Variance Com onents Subject Effects chi d child r c an od None Fixed Effects SE Method ModelBased Degrees of Freedom Method Containment Class Level Information Class Levels Values gender 2 0 1 child 27 1 2 3 4 5 6 Z 8 9 10 11 12 13 14 15 16 17 8 19 20 21 22 23 24 25 26 27 Dimensions Covariance Parameters o umns in Columns in Z Per Subject Subjects Max Obs Per Subject N pqum Number of Observations Number of Observations Read 108 Number of Observations Used 108 Number of Observations Not Used 0 PUAE 404 CHAPTERlO ST 732 M DAVIDIAN Iteration History Iteration Evaluations 2 Log Like Criterion 0 47824175986 1 2 41127740673 001732264 2 1 40974920841 0 00328703 3 1 40936512908 0 00011752 4 1 40935237809 000000026 5 1 40935235096 0 00000000 MODIFIED RANDOM COEFFICIENT MODEL WITH INTERCEPTS RANDOM SLOPES FIXED The Mixed Procedure Convergence criteria met Covariance Parameter Estimates Cov Parm Subject Group Estimate 1 chil 1405 Residual child gender 0 05920 Residual child gender 1 27286 Fit Statistics 2 Log Likelihood 4094 AIC smaller is better 4234 AICC smaller is better 4245 BIC smaller is better 4324 Null Model Likelihood Ratio Test DF ChiSquare Pr gt ChiSq 2 6889 0001 Solution for Fixed Effects Standard Effect gender Estimate DF t Value Pr gt t gender 0 173727 07903 79 2198 0001 gender 1 163406 11272 79 1450 0001 agegender 0 04795 005187 79 924 0001 agegender 1 07844 009234 79 849 0001 Solution for Random Effects Std Err Effect child Estimate Pred DF t Value Pr gt t Intercept 1 12154 06434 79 1 89 
00626 Intercept 2 03364 06434 79 52 06025 Intercept 3 10527 0 6434 79 164 01058 Intercept 4 127 0 6434 79 31 00014 Intercept 5 002170 0 6434 79 0 03 09732 Intercept 6 1454 0 6434 79 2 26 00266 Intercept 7 03364 0 6434 79 52 06025 Intercept 8 6945 0 6434 79 08 02837 Intercept 9 14542 0 6434 79 2 26 00266 Intercept 10 39611 06434 79 6 16 lt 0001 Intercept 11 5595 06434 79 53 lt 0001 MODIFIED RANDOM COEFFICIENT MODEL WITH INTERCEPTS RANDOM SLOPES FIXED The Mixed Procedure Solution for Random Effects Std Err Effect child Estimate Pred DF t Value Pr gt t Intercept 12 22849 08495 79 69 00087 Intercept 13 13093 0 8495 79 1 54 01272 Intercept 14 05905 0 8495 79 0 70 04890 Intercept 15 3607 0 8495 79 60 01132 Intercept 16 16174 0 8495 79 1 90 00606 Intercept 17 1553 0 8495 79 36 01777 Intercept 18 10013 0 8495 79 1 18 02421 Intercept 19 08986 0 8495 79 1 06 02934 Intercept 20 01284 0 8495 79 15 08803 Intercept 21 37227 08495 79 438 lt 0001 Intercept 22 11040 0 8495 79 1 30 01975 Intercept 23 05905 0 8495 79 0 70 04890 Intercept 24 590 08495 79 0 70 04890 Intercept 25 007702 08495 79 0 09 09280 PUAE 405 CHAPTERlO Intercept Intercept Data Set Dependent Variable Subject Effects Gr etho Degrees of Freedom Method Iteration Class ender hild 26 07445 27 08495 16174 08495 ST 732 M DAVIDIAN 79 088 79 03835 190 00606 Type 3 Tests of Fixed Effects De Num n Effect DF DF ender 2 79 agegender 2 79 MODIFIED RANDOM COEFFICIENT MODEL WITH INTERCEPTS RANDOM SLOPES FIXED The Mixed Procedure Model Information distan ect F Value 34669 7881 Pr gt F 0001 0001 WORKDENT1 ce Variance Com onents chi d child ender od None d ModelBased Containment Class Level Information Levels Values 2 27 14 15 16 17 8 24 25 26 27 Dimensions Covariance Parameters o umns in Columns in Z Per Subject Subjects Max Obs Per Subject Number of Observations Number of Observations Read Number of Observations Used Number of Observations Not Used Iteration History Evaluations thwlOH HHFHJMH p O O O p O M O I p H MODIFIED RANDOM 
COEFFICIENT MODEL WITH INTERCEPTS RANDOM SLOPES FIXED The Mixed Procedure Convergence criteria met 0 1 1 2 3 4 5 6 7 8 9 1 19 2 Log Like 47824175986 10 11 12 13 20 21 22 23 N pqum 108 108 Criterion Covariance Parameter Estimates Cov Parm UN11 Residual Residual Subject Group child child gender 0 child gender 1 Fit Statistics 2 Log Likelihood Estimate 409 4 AIC smaller is better 4234 AICC smaller is better 4245 BIC smaller is better 4324 Null Model Likelihood Ratio Test DF ChiSquare Pr gt ChiSq 2 6889 0001 PUACE 406 CHAPTERlO Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value gender 2 79 34669 agegender 2 79 7881 mprHo ooo MODIFIED RANDOM COEFFICIENT MODEL WITH INTERCEPTS RANDOM SLOPES FIXED FIXED EFFECTS OUTPUT DATA SET Obs Effect gender Estimate 1 gender 0 173727 2 gender 1 163406 3 agegender 0 04795 4 agegender 1 07844 MODIFIED RANDOM COEFFICIENT MODEL WITH INTERCEPTS RANDOM SLOPES FIXED RECONFIGURED FIXED EFFECTS DATA SET Obs gender fixint fixslope 1 0 173727 047955 2 1 163406 078438 MODIFIED RANDOM COEFFICIENT MODEL WITH INTERCEPTS RANDOM SLOPES FIXED RANDOM EFFECTS OUTPUT DATA SET Effect child Estimate Interce t 12154 Interce t 03364 Interce t 10527 Interce t L 21270 Interce t 002170 Interce t 14542 Interce t 7 03364 Interce t 8 06945 Interce t 9 14542 Interce t 0 39611 Interce t 1 35595 Interce t 2 22849 Interce t 3 13093 Interce t 4 05905 Interce t 5 13607 Interce t 16174 Interce t 11553 Interce t 10013 Interce t 08986 Interce t 01284 Interce t 37227 Interce t 11040 Interce t 05905 Interce t L 0590 Interce t 007702 Interce t 0744 Interce t 1 6174 MODIFIED RANDOM COEFFICIENT MODEL WITH INTERCEPTS RANDOM SLOPES FIXED RECONFIGURED RANDOM EFFECTS DATA SET Obs child gender ranint 1 1 0 121545 2 2 0 033642 3 3 0 1 05266 4 4 0 12703 5 5 0 0 02170 6 6 0 1 45420 7 7 0 33642 8 8 0 69454 9 9 0 1 45420 10 10 0 3 96105 11 1 0 55952 12 1 1 2 28494 13 1 1 1 30935 14 14 1 0 59049 15 15 1 36069 16 16 1 1 61743 17 17 1 15531 18 18 1 1 00127 19 19 1 0 89857 20 20 1 
0.12837
21    21    1    3.72265

[Remainder of this listing, and the final listing of the merged data set with variables gender, fixint, fixslope, child, ranint, beta0i and beta1i (titled RANDOM INTERCEPTS AND FIXED SLOPES), not recoverable.]

INTERPRETATION

• The fit of Model (i) is identical to that in section 9.7, using the same assumption on the forms of $D$ and $R_i$. The results appear on pages 1–5 of the output. Also, on pages 2–3, the BLUPs of the elements of $b_i$ are printed for each child, as requested by the solution option of the random statement.

• On pages 5–7 of the output, the data set created by outpred is printed. This data set contains the values of $X_i\hat{\beta} + Z_i\hat{b}_i$ for each observation in the data set, in order of appearance, in the column Pred. Also printed are the contents of the original data set. Thus we see that for child 1, with observations (21.0, 20.0, 21.5, 23.0) at ages (8, 10, 12, 14), the BLUPs of this child's trajectory at these times are (20.178, 21.001, 21.824, 22.646).

• Pages 8–9 are a repeat of the results arising from the second call to proc mixed. Note that the solutions for fixed and random effects are not printed, as a result of the first and third ods statements.

• Page 10 results from printing the data set containing the estimates of $\beta$ created by the ods output SolutionF=fixed1 statement. SolutionF is a key word recognized by PROC MIXED as identifying this data set; the PROC MIXED documentation describes many more possibilities of results that may be output to SAS data sets. The statements following the proc print that prints these results reconfigure the data set so that it appears in the form on page 11. This is necessary in order to merge the estimates of $\beta$ with the BLUPs for the $b_i$ in subsequent data steps.

• On pages 12–13 are the results of printing the data set containing the BLUPs of the $b_i$ for each child, created by the ods output SolutionR=rand1 statement. SolutionR is the key word
identifying this data set. Note that for each child there is a separate row in the file for the intercept BLUP and the slope BLUP, $\hat{b}_{0i}$ and $\hat{b}_{1i}$. In the code, the data step following the printing of this data set produces a reconfigured data set suitable for merging with the one containing the estimates of $\beta$. This data set is given on page 14. The two variables ranint and ranslope contain the BLUPs for $b_{0i}$ and $b_{1i}$, respectively.

• Finally, page 15 shows the result of printing the data set obtained by merging the two data sets above. The variables beta0i and beta1i are the BLUPs for the intercept and slope components of $\beta_i$ for each child.

• Pages 16–18 show the output of the fit of Model (ii), in which slopes are taken not to vary. For brevity, the predicted values using outpred are not requested. The results printed on pages 19–20 arise from the second call to proc mixed; those on pages 21–25 are the consequence of the same manipulations of output data sets obtained from ods statements within PROC MIXED as for Model (i), described above. Note that on page 25 the BLUPs of $\beta_{0i}$, the child-specific intercepts, vary, while those of $\beta_{1i}$, the child-specific slopes, do not: the slope is the same for all girls and the same for all boys. This, of course, is a result of the model assumption.

• Finally, note that regardless of the assumption about how random effects enter the model, the estimates of $\beta$ are identical for Models (i) and (ii). This is a consequence of the fact that these data are balanced, as previously noted.

EXAMPLE 2: WEIGHTLIFTING STUDY IN YOUNG MEN. Physical fitness researchers were interested in whether following a new program, including both a regimen of exercise and a special diet, would lead young men with an interest in weight lifting to be able to bench press greater amounts of weight, and to do so more quickly, than if they were to follow only the exercise part of the program. Thus, they had a particular interest in the effects of the diet portion of the program. To investigate, the
researchers recruited 100 young men in high school, college, and beyond, with either existing interest and experience with weight lifting or interest in becoming involved in weight lifting. It is well known that the amount of weight a man can bench press may be associated with his body weight, previous weight lifting experience, and age. Thus, the researchers recorded these baseline characteristics for each man:

    Age                                 mean (sd) = 22.0 (2.7), min = 16, max = 32
    Weight                              mean (sd) = 180.4 (24.8), min = 119.7, max = 227.6
    Previous weightlifting experience   27
    Bench press (lbs)                   mean (sd) = 163.7 (13.2)

The men were randomized at the beginning of the study to 2 groups (50 men per group):

• Follow the exercise part of the program only
• Follow both the exercise and diet parts of the program

The amount of weight each man was capable of bench pressing at entry into the study was recorded for all men (day 0). Subsequently, the men were allowed to come to the gym at which the study was conducted according to their own schedules, as would be the case in practice; most came at least 4 times per week. Periodically, members of the research staff would record the amount (lbs) each man was able to bench press (the response). Because each man's schedule was different, due to class or work obligations, the times at which this was recorded varied across men. Most men were followed for about 9–10 months.

A spaghetti plot of the data is given in Figure 2. Here, time is measured in days since entry into the study. Note that in each group the weight trajectories appear to be roughly like straight lines, with variation about the line within each man.

Figure 2: Weights bench pressed (lbs) over time for (a) men in the no-diet group and (b) men in the diet group.
[Panels: (a) No diet, (b) Diet; horizontal axis: time (days); vertical axis: weight (lbs).]

On the basis of these data, the researchers would like to investigate the following specific issues:

1. Is there evidence that the typical rate of change
in the amount such men are able to bench press is different depending on whether they followed the diet or not?

2. In fact, does it matter, in regard to the rate of change, whether they had previous experience with weight lifting?

To investigate, we consider the following statistical models. The most general model is as follows. For the $i$th man, the individual trajectory follows a straight line; i.e., the $j$th weight bench pressed for man $i$, $Y_{ij}$, measured at day $t_{ij}$ after his entry into the study, $j = 1, \ldots, n_i$, is given by

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}.$$

Clearly, the amount a man can bench press cannot increase without bound forever; eventually a man would reach his maximum possible strength, and the amount he could bench press would likely "level off." Over the period of this study, however, it seems that most if not all men have not shown such leveling off. Thus, a straight line may be a reasonable representation of the trajectories in this time frame; at later times, this model may not be appropriate at all.

Let $w_i$ be man $i$'s body weight (lb) at baseline, let $a_i$ be his baseline age, and let $p_i = 1$ if the man had prior weightlifting experience before the start of the study and $p_i = 0$ if not. Let $d_i$ be an indicator of whether man $i$ was randomized to follow the program with ($d_i = 1$) or without ($d_i = 0$) the diet component.

The simplest population model that could be considered would follow the study design exactly. Because the men were randomized to receive the diet or not, we would expect the mean weight bench pressed at time 0 to be the same regardless of whether a man was assigned to the diet or no-diet group. That is, the mean of the intercepts $\beta_{0i}$ would not be expected to differ between the two groups. The mean of the slopes $\beta_{1i}$, which characterize rate of change as constant over the period of the study, may well be different. Under these conditions, the population model is

$$\beta_{0i} = \beta_0 + b_{0i}, \qquad \beta_{1i} = \beta_1 + \beta_{11} d_i + b_{1i},$$

where here we have used the "difference parameterization" for the slopes, so that $\beta_1$ represents the
typical rate of change for men who do not follow the diet, and $\beta_{11}$ represents the amount by which the rate of change differs from this with the diet. The first, overall question of whether the mean rate of change is different depending on whether the diet is followed may be addressed by asking whether $\beta_{11} = 0$. In the following program, this is Model (i).

More detailed and exploratory analyses may be carried out. It is suspected that the men's baseline characteristics may help to explain some of the variation among the men at time 0. We may modify Model (i) to take this into account by allowing the mean intercept to depend on baseline weight, age, and experience:

$$\beta_{0i} = \beta_0 + \beta_{01} w_i + \beta_{02} a_i + \beta_{03} p_i + b_{0i}.$$

The hope in fitting this model, which "adjusts" for baseline characteristics, is that if some of the variation in the data at baseline can be explained by systematic features, it may lead to more precise estimation and testing for the rate of change. Model (i) with this modification is given in the program as Model (ii).

The model might be further modified to allow an exploratory analysis of whether previous experience plays a role in how men's ability to bench press changes over the time period of the study. The following model takes into account baseline characteristics as in Model (ii), but also allows, in the model for man-specific slopes, not only the possibility that the mean rate of change in weight bench pressed may differ according to whether a man followed the diet or not, but also that this difference is differential depending on whether the man has previous weight lifting experience:

$$\beta_{0i} = \beta_0 + \beta_{01} w_i + \beta_{02} a_i + \beta_{03} p_i + b_{0i}, \qquad \beta_{1i} = \beta_1 + \beta_{11} d_i + \beta_{12} p_i + \beta_{13} d_i p_i + b_{1i}.$$

In the program, this is Model (iii).

A final model is considered in the program, Model (iv), which does not allow the mean rate of change to depend on either diet or previous experience:

$$\beta_{1i} = \beta_1 + b_{1i};$$

this model may be used with Model (ii) to get a likelihood ratio test of whether the mean rate of change is different depending on whether the diet is followed, taking into
account the baseline covariates.

The following SAS program uses PROC MIXED to fit these models to the data. It is assumed that

• With $b_i = (b_{0i}, b_{1i})'$, $\mathrm{var}(b_i) = D$, the same for both groups (diet or not).
• With $e_i = (e_{i1}, \ldots, e_{in_i})'$, $\mathrm{var}(e_i) = \sigma^2 I_{n_i}$, with $\sigma^2$ the same for both groups.

Ideally, these assumptions should be evaluated for relevance and modified if necessary; we do not do this here, but encourage the reader to do so with the data on the class web site.

PROGRAM:

/*******************************************************************

  CHAPTER 10, EXAMPLE 2

  Illustration of fitting a linear mixed effects model derived
  from a random coefficient model where the mean slope in each
  group depends on a continuous covariate.

  The model for each man is assumed to be a straight line.
  The intercepts are taken to depend on baseline covariates.
  The slopes are taken to depend on baseline covariates
  differentially by group (diet or not).

  We take D to be common for both groups and take Ri to be
  common to both groups, of the form Ri = sigma^2*I.

*******************************************************************/

options ls=80 ps=59 nodate; run;

/* Read in the data set */

data pdat; infile 'press.dat';
  input id time press weight age prev diet;
run;

/* Use PROC MIXED to fit the linear mixed effects model (i);        */
/* we use normal ML rather than REML to get likelihood ratio tests  */

title "MODEL (i)";
proc mixed method=ml data=pdat;
  class id;
  model press = time time*diet / solution;
  random intercept time / type=un subject=id;
  estimate "slp w/diet" time 1 time*diet 1;
run;

/* Model (ii), which includes "adjustments" for baseline covariates; */
/* normal ML rather than REML to get likelihood ratio tests          */

title "MODEL (ii)";
proc mixed method=ml data=pdat;
  class id;
  model press = weight prev age time time*diet / solution;
  random intercept time / type=un subject=id;
  estimate "slp w/diet" time 1 time*diet 1;
run;

/* Model (iii) includes this adjustment plus the possibility that   */
/* rate of change depends on both diet and previous experience.     */
/* We include estimate statements to estimate each slope and        */
/* contrast statements to make some comparisons                     */

title "MODEL (iii)";
proc mixed method=ml data=pdat;
  class id;
  model press = weight prev age time time*diet time*prev time*diet*prev
        / solution;
  random intercept time / type=un subject=id;
  estimate "slp diet no prev" time 1 time*diet 1;
  estimate "slp no diet prev" time 1 time*prev 1;
  estimate "slp diet prev" time 1 time*prev 1 time*diet 1 time*diet*prev 1;
  contrast "overall slp diff" time*diet 1, time*prev 1, time*diet*prev 1 / chisq;
  contrast "prev effect" time*prev 1, time*diet*prev 1 / chisq;
  contrast "diet effect" time*diet 1, time*diet*prev 1 / chisq;
run;

/* Model (iv): "reduced" model with no diet or previous weightlifting effect */

title "MODEL (iv)";
proc mixed method=ml data=pdat;
  class id;
  model press = weight prev age time / solution;
  random intercept time / type=un subject=id;
run;

OUTPUT: Following the output, we comment on a few aspects of the output.

MODEL i
The Mixed Procedure

Model Information
    Data Set                     WORK.PDAT
    Dependent Variable           press
    Covariance Structure         Unstructured
    Subject Effect               id
    Estimation Method            ML
    Residual Variance Method     Profile
    Fixed Effects SE Method      Model-Based
    Degrees of Freedom Method    Containment

Class Level Information
    Class    Levels    Values
    id       100       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                       21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
                       41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
                       61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
                       81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

Dimensions
    Covariance Parameters          4
    Columns in X                   3
    Columns in Z Per Subject       2
    Subjects                     100
    Max Obs Per Subject           12

Number of Observations
    Number of Observations Read        839
    Number of Observations Used        839
    Number of Observations Not Used      0

Iteration History
    Iteration    Evaluations    -2 Log Like        Criterion
        0            1          7787.64461022
    [Intermediate rows garbled; one surviving criterion value: 0.03057689.]

MODEL i
The Mixed Procedure

Iteration History (continued; iteration numbers garbled)
    Evaluations    -2 Log Like        Criterion
        1          5420.90966177      0.00001661
        1          5420.87642307      0.00000004
        1          5420.87634256      0.00000000

Convergence criteria met.

Covariance Parameter Estimates
    Cov Parm    Subject    Estimate
    UN(1,1)     id         164.79
    UN(2,1)     id         0.6063
    UN(2,2)     id         0.01228
    Residual               13.7306

Fit Statistics
    -2 Log Likelihood            5420.9
    AIC smaller is better        5434.9
    AICC smaller is better       5435.0
    BIC smaller is better        5453.1

Null Model Likelihood Ratio Test
    DF    Chi-Square    Pr > ChiSq
     3      2366.77        <.0001

Solution for Fixed Effects
                              Standard
    Effect       Estimate     Error      DF    t Value    Pr > |t|
    Intercept    163.89       1.3056     99    125.53     <.0001
    time         0.2020       0.01523    98     13.27     <.0001
    time*diet    0.1665       0.02060   639      8.08     <.0001

Type 3 Tests of Fixed Effects
                 Num    Den
    Effect       DF     DF     F Value    Pr > F
    time          1     98     175.97     <.0001
    time*diet     1    639      65.35     <.0001

MODEL i
The Mixed Procedure

Estimates
                               Standard
    Label         Estimate     Error      DF    t Value    Pr > |t|
    slp w/diet    0.3685       0.01520   639    24.24      <.0001

MODEL ii
The Mixed Procedure

Model Information
    Data Set                     WORK.PDAT
    Dependent Variable           press
    Covariance Structure         Unstructured
    Subject Effect               id
    Estimation Method            ML
    Residual Variance Method     Profile
    Fixed Effects SE Method      Model-Based
    Degrees of Freedom Method    Containment

Class Level Information
    Class    Levels    Values
    id       100       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                       21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
                       41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
                       61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
                       81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

Dimensions
    Covariance Parameters          4
    Columns in X                   6
    Columns in Z Per Subject       2
    Subjects                     100
    Max Obs Per Subject           12

Number of Observations
    Number of Observations Read        839
    Number of Observations Used        839
    Number of Observations Not Used      0

Iteration History
    Iteration    Evaluations    -2 Log Like        Criterion
        0            1          7377.92880597
        1            2          5414.72631658      0.00700491
        2            1          5397.79499881      0.00207735
        3            1          5392.99291567      0.00033764
        4            1          5392.26713310      0.00001407
        5            1          5392.23925291      0.00000003

MODEL ii
The Mixed Procedure

Iteration History
    Iteration    Evaluations    -2 Log Like        Criterion
        6            1          5392.23919542      0.00000000

Convergence criteria met.

Covariance Parameter Estimates
    Cov Parm    Subject    Estimate
    UN(1,1)     id         104.54
    UN(2,1)     id         0.1806
    UN(2,2)     id         0.01227
    Residual               13.7285

Fit Statistics
    -2 Log Likelihood            5392.2
    AIC smaller is better        5412.2
    AICC smaller is better       5412.5
    BIC smaller is
better        5438.3

Null Model Likelihood Ratio Test
    DF    Chi-Square    Pr > ChiSq
     3      1985.69        <.0001

Solution for Fixed Effects
                              Standard
    Effect       Estimate     Error       DF    t Value    Pr > |t|
    Intercept    130.86       12.3075     96    10.63      <.0001
    weight       0.06093      0.04260    639     1.43      0.1531
    prev         15.06        2.349      639     6.41      <.0001
    age          0.8181       0.3876     639     2.11      0.0352
    time         0.2014       0.01578     98    12.76      <.0001
    time*diet    0.1674       0.02221    639     7.54      <.0001

Type 3 Tests of Fixed Effects
                 Num    Den
    Effect       DF     DF     F Value    Pr > F
    weight        1    639       2.05     0.1531
    prev          1    639      41.13     <.0001

MODEL ii                                                        6
The Mixed Procedure

Type 3 Tests of Fixed Effects
                 Num    Den
    Effect       DF     DF     F Value    Pr > F
    age           1    639       4.45     0.0352
    time          1     98     162.94     <.0001
    time*diet     1    639      56.79     <.0001

Estimates
                               Standard
    Label         Estimate     Error      DF    t Value    Pr > |t|
    slp w/diet    0.3688       0.01576   639    23.40      <.0001

MODEL iii                                                       7
The Mixed Procedure

Model Information
    Data Set                     WORK.PDAT
    Dependent Variable           press
    Covariance Structure         Unstructured
    Subject Effect               id
    Estimation Method            ML
    Residual Variance Method     Profile
    Fixed Effects SE Method      Model-Based
    Degrees of Freedom Method    Containment

Class Level Information
    Class    Levels    Values
    id       100       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                       21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
                       41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
                       61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
                       81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

Dimensions
    Covariance Parameters          4
    Columns in X                   8
    Columns in Z Per Subject       2
    Subjects                     100
    Max Obs Per Subject           12

Number of Observations
    Number of Observations Read        839
    Number of Observations Used        839
    Number of Observations Not Used      0

Iteration History
    Iteration    Evaluations    -2 Log Like        Criterion
        0            1          7270.05573644
        1            2          5342.30391536      0.00013213
        2            1          5342.03719070      0.00000140
        3            1          5342.03451402      0.00000000

MODEL iii                                                       8
The Mixed Procedure

Convergence criteria met.

Covariance Parameter Estimates
    Cov Parm    Subject    Estimate
    UN(1,1)     id         103.90
    UN(2,1)     id         0.1075
    UN(2,2)     id         0.007303
    Residual               13.7266

Fit Statistics
    -2 Log Likelihood            5342.0
    AIC smaller is better        5366.0
    AICC smaller is better       5366.4
    BIC smaller is better        5397.3

Null Model Likelihood Ratio
Test
    DF    Chi-Square    Pr > ChiSq
     3      1928.02        <.0001

Solution for Fixed Effects
                                   Standard
    Effect            Estimate     Error       DF    t Value    Pr > |t|
    Intercept         130.83       12.3290     96    10.61      <.0001
    weight            0.06032      0.04267    639     1.41      0.1580
    prev              16.89        2.3608     639     7.16      <.0001
    age               0.8011       0.3883     639     2.06      0.0395
    time              0.1715       0.01428     96    12.00      <.0001
    time*diet         0.1444       0.02027    639     7.12      <.0001
    prev*time         0.1154       0.02805    639     4.11      <.0001
    prev*time*diet    0.07575      0.03915    639     1.93      0.0534

Type 3 Tests of Fixed Effects
                      Num    Den
    Effect            DF     DF     F Value    Pr > F
    weight             1    639       2.00     0.1580
    prev               1    639      51.20     <.0001
    age                1    639       4.26     0.0395
    time               1     96     144.11     <.0001
    time*diet          1    639      50.76     <.0001
    prev*time          1    639      16.92     <.0001
    prev*time*diet     1    639       3.74     0.0534

MODEL iii                                                       9
The Mixed Procedure

Estimates
                                     Standard
    Label               Estimate     Error      DF    t Value    Pr > |t|
    slp diet no prev    0.3158       0.01443   639    21.89      <.0001
    slp no diet prev    0.2869       0.02415   639    11.88      <.0001
    slp diet prev       0.5070       0.02329   639    21.77      <.0001

Contrasts
                        Num    Den
    Label               DF     DF     Chi-Square    F Value    Pr > ChiSq    Pr > F
    overall slp diff     3    639       158.73       52.91      <.0001       <.0001
    prev effect          2    639        65.40       32.70      <.0001       <.0001
    diet effect          2    639        93.96       46.98      <.0001       <.0001

MODEL iv
The Mixed Procedure

Model Information
    Data Set                     WORK.PDAT
    Dependent Variable           press
    Covariance Structure         Unstructured
    Subject Effect               id
    Estimation Method            ML
    Residual Variance Method     Profile
    Fixed Effects SE Method      Model-Based
    Degrees of Freedom Method    Containment

Class Level Information
    Class    Levels    Values
    id       100       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                       21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
                       41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
                       61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
                       81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

Dimensions
    Covariance Parameters          4
    Columns in X                   5
    Columns in Z Per Subject       2
    Subjects                     100
    Max Obs Per Subject           12

Number of Observations
    Number of Observations Read        839
    Number of Observations Used        839
    Number of Observations Not Used      0

Iteration History
    Iteration    Evaluations    -2 Log Like        Criterion
        0            1          7681.55258304
    [Iterations 1 to 5 garbled; surviving fragments: a criterion of 0.01095523 and a -2 Log Like value of 5437.17181826.]

MODEL iv
The Mixed Procedure

Iteration History
    Iteration    Evaluations    -2 Log Like        Criterion
        6            1          5437.16382593      0.00000000

Convergence criteria met.

Covariance Parameter Estimates
    Cov Parm    Subject
    Estimate
    UN(1,1)     id         104.01
    UN(2,1)     id         0.1711
    UN(2,2)     id         0.01930
    Residual               13.7321

Fit Statistics
    -2 Log Likelihood            5437.2
    AIC smaller is better        5455.2
    AICC smaller is better       5455.4
    BIC smaller is better        5478.6

Null Model Likelihood Ratio Test
    DF    Chi-Square    Pr > ChiSq
     3      2244.39        <.0001

Solution for Fixed Effects
                              Standard
    Effect       Estimate     Error       DF    t Value    Pr > |t|
    Intercept    130.96       12.3232     96    10.63      <.0001
    weight       0.06097      0.04265    639     1.43      0.1533
    prev         15.7659      2.3516     639     6.70      <.0001
    age          0.8044       0.3881     639     2.07      0.0386
    time         0.2851       0.01399     99    20.39      <.0001

Type 3 Tests of Fixed Effects
                 Num    Den
    Effect       DF     DF     F Value    Pr > F
    weight        1    639       2.04     0.1533
    prev          1    639      44.95     <.0001
    age           1    639       4.29     0.0386

MODEL iv                                                       12
The Mixed Procedure

Type 3 Tests of Fixed Effects
                 Num    Den
    Effect       DF     DF     F Value    Pr > F
    time          1     99     415.58     <.0001

INTERPRETATION

• From the output for the fits of Models (i) and (ii) on pages 2 and 5, the difference in rate of change for using the diet versus not is estimated as about $\hat{\beta}_{11} = 0.17$ lbs/day (standard error 0.02); the estimate is almost identical whether or not adjustment for baseline characteristics is included. The p-value below 0.0001 for the Wald test indicates that the evidence is very strong that the diet has a positive effect on the rate of change. From the solution table and the estimate statement in each case, the estimated slopes are $\hat{\beta}_1 = 0.20$ (0.015) lbs/day with no diet and $\hat{\beta}_1 + \hat{\beta}_{11} = 0.37$ (0.016) lbs/day with the diet.

We can obtain the likelihood ratio statistic in the case of baseline adjustment from the output of Models (ii) and (iv). The observed statistic is $5437.2 - 5392.2 = 45.0$. The statistic has a $\chi^2_1$ distribution, for which the critical value for a 0.05 level test is $\chi^2_{1,0.95} = 3.84$. Thus, it is clear that the evidence is very strong that the diet makes a difference.

• Turning to the exploratory analyses, consider the output for Model (iii) on pages 7–10. Here

CHAPTER 8    ST 732, M. DAVIDIAN

8 General linear models for longitudinal data

8.1 Introduction

We have seen that the classical methods of univariate and multivariate repeated measures analysis of variance may be thought of as being based on
a statistical model for a data vector from the $i$th individual, $i = 1, \ldots, m$. So far, we have written this model in different ways. Following convention, we wrote the model as

$$Y_i' = a_i' M + \epsilon_i',$$

where $M$ is the $(q \times n)$ matrix

$$M = \begin{pmatrix} \mu_{11} & \cdots & \mu_{1n} \\ \vdots & & \vdots \\ \mu_{q1} & \cdots & \mu_{qn} \end{pmatrix},$$

and the individual means $\mu_{\ell j}$ are for the $\ell$th group at the $j$th time. We could equally well write this model as

$$Y_i = \mu_\ell + \epsilon_i$$

for unit $i$ coming from the $\ell$th population, $\ell = 1, \ldots, q$. Regardless of how we write the model, we note that it represents $Y_i$ as having two components:

• a systematic component, which describes the mean response over time depending on group membership. The individual elements of $\mu_\ell$, the $\mu_{\ell j}$ for the $\ell$th group at the $j$th time, are further represented in terms of an overall mean and deviations as

$$\mu_{\ell j} = \mu + \tau_\ell + \gamma_j + (\tau\gamma)_{\ell j},$$

along with constraints $\sum_{\ell=1}^{q} \tau_\ell = 0$, etc., in order to give a unique representation. As noted in the last chapter, this representation

(i) Requires that the length of each data vector $Y_i$ be the same, $n$.
(ii) Does not explicitly incorporate the actual times of measurement or other information.

• an overall random deviation $\epsilon_i$, which describes how observations within a data vector vary about the mean and covary among each other. Both the univariate and multivariate ANOVA models assume that $\mathrm{var}(\epsilon_i) = \Sigma$ is the same $(n \times n)$ matrix for all data vectors. Furthermore,

(i) $\Sigma$ is assumed to have the compound symmetry structure in the univariate model. This came from the assumption that each element of $\epsilon_i$ is actually the sum of two random terms, i.e. $\epsilon_{ij} = b_i + e_{ij}$, where the random effect $b_i$ has to do with variation among units and $e_{ij}$ has to do with variation within units.
(ii) $\Sigma$ is assumed to have no particular structure in the multivariate model.

We also noted in Chapter 5 that this model could be written in an alternative way. Specifically, we defined $\beta$ as the column vector containing all of $\mu$, $\tau_\ell$, $\gamma_j$, $(\tau\gamma)_{\ell j}$ stacked, and $X_i$ to be a matrix of 0's and 1's with $n$ rows that "picks off" the appropriate elements of $\beta$ for each element of $Y_i$. We wrote the model in the alternative form
$$Y_i = X_i\beta + \epsilon_i, \qquad (8.1)$$

where again $\epsilon_i$ is the overall deviation vector with $\mathrm{var}(\epsilon_i) = \Sigma$. Note that both the univariate and multivariate ANOVA models could be written in this way; what would distinguish them would again be the assumption on $\Sigma$. This model, along with the usual constraints, has the flavor of a "regression" model for the $i$th unit.

Regardless of how we write the model, it says that for a unit in group $\ell$,

$$Y_{ij} = \mu + \tau_\ell + \gamma_j + (\tau\gamma)_{\ell j} + \epsilon_{ij}, \qquad (8.2)$$

so that the mean of $Y_{ij}$ is taken to have this specific form.

As we will now discuss, a representation like (8.1) offers a convenient framework for thinking about more general models for longitudinal data. In this chapter, we will discuss such a model, writing it in the form (8.1). We will see that we will be able to address several of the issues raised in the last chapter:

• Alternative definitions of $X_i$ and $\beta$ will allow for unbalanced data and explicit incorporation of time and other covariates.
• Refined consideration of the form of $\mathrm{var}(\epsilon_i)$ will allow more realistic and general assumptions about covariance, including the possibility of different covariance matrices for different groups.

8.2 Simplest case: one group, balanced data

To fix ideas, we first consider a very simple special case of the longitudinal data situation, focusing mainly on the issue of allowing the model to contain explicit information on the times of observation on each individual. For this purpose, we will continue to assume that the data are balanced. Formally, consider the following situation:

• Suppose $Y_i$, $i = 1, \ldots, m$, are all $(n \times 1)$, where the $j$th element $Y_{ij}$ is observed at time $t_j$. Here, the times $t_1, \ldots, t_n$ are the same for all units.
• Suppose that there is only one group, so that all units are thought to behave similarly. The mean vector is thus simply (no group subscript necessary)

$$\mu = (\mu_1, \ldots, \mu_n)'.$$

We observed in the dental study that the sample means for girls and for boys seem to follow an approximate smooth, straight-line trajectory. Figure 1 illustrates; the figure shows the sample means at each
time (age) and an estimated straight line (to be discussed later) for the data for each group (gender).

Figure 1: Dental data. Sample means at each time across children compared with straight line fits.
[Panels: Girls, Boys; vertical axis: distance (mm).]

The sample means suggest that the true means $\mu_j$ at each time point may very well fall on a straight line. This observation suggests that we may be able to refine our view about the means. Rather than thinking of the mean vector as simply a set of $n$ unrelated means $\mu_j$, we might think of these means as satisfying

$$\mu_j = \beta_0 + \beta_1 t_j;$$

that is, the means fall on the line with intercept $\beta_0$ and slope $\beta_1$. This suggests replacing (8.2) by

$$Y_{ij} = \beta_0 + \beta_1 t_j + \epsilon_{ij}. \qquad (8.3)$$

Model (8.3) says that, at the $j$th time $t_j$, the $Y_{ij}$ values we might see have mean $\beta_0 + \beta_1 t_j$ and vary about it according to the overall deviations $\epsilon_{ij}$.

• In contrast to (8.2), this model represents the mean as explicitly depending on the time of measurement $t_j$. (With just one group, $\ell$ and hence $\tau_\ell$ would be the same for all units in that model, and the mean depends on time through $\gamma_j$ and $(\tau\gamma)_{\ell j}$.)
• Instead of requiring $n$ separate parameters $\mu_j$, $j = 1, \ldots, n$, to describe the means at each time, (8.3) requires only two, the intercept and slope. Thus, if we are willing to believe that the true means do indeed fall on a straight line, (8.3) is a more parsimonious representation of the systematic component.
• Under the new model (8.3), we are automatically including the belief that the trajectory of means should be a straight line. Our best guess (estimate) for this trajectory would intuitively be found by estimating the intercept and slope $\beta_0$ and $\beta_1$ (coming up).
• An additional possible advantage is as follows. If we wanted to use these data to learn about, for example, mean distance at age 11 years, the straight line provides us with a natural estimate, while it is not clear what to do with the sample means to get such an estimate (connect the dots?). How would we assess the quality of such an estimate, e.g. provide a standard error?

To summarize, if we really
believe that the mean trajectory follows a straight line, model (8.3) seems more appropriate because it exploits this assumption.

MATRIX REPRESENTATION: The model (8.3) may be written in matrix form. With $Y_i$ as usual the $(n \times 1)$ data vector, defining

$$X = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix},$$

we can write the model as

$$Y_i = X\beta + \epsilon_i. \qquad (8.4)$$

This has the form of model (8.1). Because all units are seen at the same $n$ times, the matrix $X$ is the same for all units.

COVARIANCE MATRIX: The above development offers an alternative way to represent mean response. To complete the model, we also need to make an assumption about the covariance matrix of the random vector $\epsilon_i$. For example, as in the classical models, we could assume that this matrix is the same for all data vectors, i.e. $\mathrm{var}(\epsilon_i) = \Sigma$ for some matrix $\Sigma$. Momentarily, we will address the issue of specification of $\Sigma$ more carefully; for now, as we consider the situation of only a single population, it is natural to take this matrix to be the same for all units.

MULTIVARIATE NORMALITY: Suppose we further assume that the responses $Y_{ij}$ are normally distributed at each time point, so that the $Y_i$ are multivariate normal. Thus, we may summarize the model as

$$Y_i \sim \mathcal{N}_n(X\beta, \Sigma),$$

where $X$ and $\beta$ are as above.

8.3 General case: several groups, unbalanced data, covariates

The modeling strategy for the mean above may be generalized. We consider several possibilities:

• units from more than one group
• different numbers/times of observations for each unit
• other covariates

MORE THAN ONE GROUP: For definiteness, suppose there are $q = 2$ groups, as in the dental study example. From Figure 1, the data support a model that says that, for each group, the means at each age fall on a straight line, but perhaps the straight line is different depending on group (gender). This suggests that if unit $i$ is a girl, we might have

$$Y_{ij} = \beta_{0,G} + \beta_{1,G} t_j + \epsilon_{ij}, \qquad (8.5)$$

where $\beta_{0,G}$ and $\beta_{1,G}$ are the intercept and slope, respectively, describing the means at each time for girls as a function of time.
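As an illustrative sketch of how the straight-line model for girls just given stacks into matrix form, take the dental study ages $t = (8, 10, 12, 14)$, so that $n = 4$. The stacked symbol $\beta_G$ below is introduced only for this example and is an assumption of the sketch, not notation used elsewhere in these notes.

```latex
% Stacking the four equations for a girl (ages 8, 10, 12, 14) into matrix form.
% \beta_G = (\beta_{0,G}, \beta_{1,G})' is notation assumed for this sketch only.
\[
\begin{pmatrix} Y_{i1} \\ Y_{i2} \\ Y_{i3} \\ Y_{i4} \end{pmatrix}
=
\begin{pmatrix} 1 & 8 \\ 1 & 10 \\ 1 & 12 \\ 1 & 14 \end{pmatrix}
\begin{pmatrix} \beta_{0,G} \\ \beta_{1,G} \end{pmatrix}
+
\begin{pmatrix} \epsilon_{i1} \\ \epsilon_{i2} \\ \epsilon_{i3} \\ \epsilon_{i4} \end{pmatrix},
\qquad
X \beta_G =
\begin{pmatrix}
\beta_{0,G} + 8\,\beta_{1,G} \\
\beta_{0,G} + 10\,\beta_{1,G} \\
\beta_{0,G} + 12\,\beta_{1,G} \\
\beta_{0,G} + 14\,\beta_{1,G}
\end{pmatrix}.
\]
```

Each row of the design matrix pairs a 1 (for the intercept) with the age at that measurement, so the $j$th element of the product is the girls' mean at the $j$th age.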
Similarly, if unit $i$ is a boy, we might have

$$Y_{ij} = \beta_{0,B} + \beta_{1,B} t_j + \epsilon_{ij}, \qquad (8.6)$$

where $\beta_{0,B}$ and $\beta_{1,B}$ are the intercept and slope, possibly different from $\beta_{0,G}$ and $\beta_{1,G}$. Defining for the $i$th unit

$$\delta_i = 0 \text{ if unit } i \text{ is a girl}, \qquad \delta_i = 1 \text{ if unit } i \text{ is a boy},$$

note that we can write (8.5) and (8.6) together as

$$Y_{ij} = (1 - \delta_i)\beta_{0,G} + \delta_i \beta_{0,B} + (1 - \delta_i)\beta_{1,G} t_j + \delta_i \beta_{1,B} t_j + \epsilon_{ij}. \qquad (8.7)$$

This may be summarized in matrix form as follows. The full set of intercepts and slopes $\beta_{0,G}$, $\beta_{1,G}$, $\beta_{0,B}$, and $\beta_{1,B}$ characterizes the means under these models for both groups. Define the parameter vector summarizing these:

$$\beta = (\beta_{0,G}, \beta_{1,G}, \beta_{0,B}, \beta_{1,B})'. \qquad (8.8)$$

Then define

$$X_i = \begin{pmatrix} (1 - \delta_i) & (1 - \delta_i) t_1 & \delta_i & \delta_i t_1 \\ \vdots & \vdots & \vdots & \vdots \\ (1 - \delta_i) & (1 - \delta_i) t_n & \delta_i & \delta_i t_n \end{pmatrix}. \qquad (8.9)$$

It is straightforward to see that this is a slick way of noting that, if $i$ is a girl or a boy, respectively, we are defining

$$X_i = \begin{pmatrix} 1 & t_1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & t_n & 0 & 0 \end{pmatrix} \quad \text{or} \quad X_i = \begin{pmatrix} 0 & 0 & 1 & t_1 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 1 & t_n \end{pmatrix},$$

respectively. With these definitions, it is a simple matrix exercise to verify that $X_i\beta$ yields the $(n \times 1)$ vector whose elements are $\beta_{0,G} + \beta_{1,G} t_j$ or $\beta_{0,B} + \beta_{1,B} t_j$, depending on whether $i$ is a girl or a boy. We may thus write the model succinctly as

$$Y_i = X_i\beta + \epsilon_i,$$

where $\beta$ and $X_i$ are defined in (8.8) and (8.9), respectively.

• Note that the matrix $X_i$ is different depending on group membership.
• Note that $X_i$ is not of full rank; a boy does not have information about the mean for girls, and vice versa.
• Note that $\beta$ contains all parameters describing the mean trajectory for both groups.

MULTIVARIATE NORMALITY: With the additional assumption of normality, each $Y_i$ under this model is $n$-variate normal with mean $X_i\beta$, where $X_i$ depends on group membership. With some additional assumption about the covariance matrix, e.g. $\mathrm{var}(\epsilon_i) = \Sigma$ for all $i$, we have

$$Y_i \sim \mathcal{N}_n(X_i\beta, \Sigma).$$

IMBALANCE: It is possible to be even more general. For definiteness, we consider two examples.

ULTRAFILTRATION DATA FOR LOW FLUX DIALYZERS: These data are given in Vonesh and Chinchilli (1997, section 6.6). Low flux dialyzers are used to treat patients with end stage renal disease to remove excess fluid and waste from their blood. In low flux hemodialysis, the ultrafiltration rate (ml/hr) at which
fluid is removed is thought to follow a straight line relationship with the transmembrane pressure (mmHg) applied across the dialyzer membrane. A study was conducted to compare the average ultrafiltration rate (the response) of such dialyzers across three dialysis centers where they are used on patients. A total of $m = 41$ dialyzers (units) were involved. The experiment involved recording the ultrafiltration rate at several transmembrane pressures for each dialyzer.

Figure 2 shows individual dialyzer profiles for the dialyzers in each center. A notable feature of the figure is that the 4 pressures ("time" here) at which each dialyzer was observed are not necessarily the same. Thus, the $i$th dialyzer has its own set of times $t_{ij}$, $j = 1, \ldots, n = 4$. Hence, we cannot calculate sample means, because each dialyzer is seen at potentially different pressures. However, if we envision taking means in each panel of the figure across all time points, it seems reasonable that the means would very likely fall approximately on a straight line.

Figure 2: Dialyzer profiles (ultrafiltration rate vs. transmembrane pressure) for 41 dialyzers in 3 centers. [Three panels, Center 1, Center 2, Center 3; horizontal axis: transmembrane pressure (mmHg); vertical axis: ultrafiltration rate (ml/hr).]

With the modeling strategy we have adopted, this does not really pose any additional difficulty. From the figure, a reasonable model for the $i$th dialyzer is
$$\begin{aligned} Y_{ij} &= \beta_1 + \beta_2 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 1} \\ Y_{ij} &= \beta_3 + \beta_4 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 2} \\ Y_{ij} &= \beta_5 + \beta_6 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 3.} \end{aligned} \tag{8.10}$$
Here, $\beta_1, \beta_3, \beta_5$ are the intercepts and $\beta_2, \beta_4, \beta_6$ are the slopes for the mean straight lines for each center.

Defining $\beta = (\beta_1, \ldots, \beta_6)'$, we can define a separate $(n \times 6)$ $X_i$ matrix for each unit based on its group membership and unique set of times $t_{ij}$; for example, for unit $i$ from the first center,
$$X_i = \begin{pmatrix} 1 & t_{i1} & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & t_{in} & 0 & 0 & 0 & 0 \end{pmatrix}.$$
We may thus again write the
model (8.10) as $Y_i = X_i\beta + \epsilon_i$, where $X_i$ is defined appropriately for each unit and $\beta$ is defined as above.

HIP REPLACEMENT STUDY: These data are adapted from Crowder and Hand (1990, section 5.2). 30 patients underwent hip replacement surgery, 13 males and 17 females. Haematocrit, the ratio of volume of packed red blood cells relative to volume of whole blood, recorded on a percentage basis, was supposed to be measured for each patient at week 0, before the replacement, and then at weeks 1, 2, and 3 after the replacement. The primary interest was to determine whether there are possible differences in mean response following replacement for men and women. Spaghetti plots of the profiles for each patient are shown in the left hand panels of Figure 3. We will discuss the right hand panels later.

Figure 3: Haematocrit trajectories for hip replacement patients. The left hand panels are individual profiles by gender; the right hand panels show a fitted quadratic model for the mean superimposed. [Four panels: males, individual trajectories; males, mean at age 65.52 superimposed; females, individual trajectories; females, mean at age 66.07 superimposed. Vertical axis: haematocrit.]

It may be seen from the figure that a number of both male and female patients are missing the measurement at week 2; in fact, there is one female missing both the pre-replacement measurement and week 2. The reason for this is not given by Crowder and Hand; however, because it is so systematic, happening only at this occasion and for about half of the male and half of the female patients, it suggests that the reason has nothing to do with the patients' health or recovery from the replacement. Perhaps the centrifuge used to obtain haematocrit values went on the blink that week before all patients' values could be obtained. We will assume that the reason for these missing observations has nothing to do with the thing of primary interest, gender; this seems
reasonable in light of the pattern of missingness for week 2.

Thus, we have a situation where the data vectors $Y_i$ are of possibly different lengths for different units. In particular, we now have that $Y_i$ is $(n_i \times 1)$, where $n_i$ is the number of observations on unit $i$. Thus, the total number of observations from all units is
$$N = \sum_{i=1}^m n_i.$$

To determine an appropriate parsimonious representation for the mean of a data vector for each group, we could calculate the sample means at each time point for males and females. We must be a bit careful, however; because of the missingness, the sample means at different times will be of different quality. Nonetheless, it seems clear from the figure that a model that says the means fall on a straight line for either gender would be inappropriate. For almost all patients, the pre-replacement reading is high; then, following replacement, the haematocrit goes down and then slowly rebounds over the next 3 weeks. This suggests that the relationship of the means with time might look more like a quadratic function of time. These observations suggest the following model:
$$\begin{aligned} Y_{ij} &= \beta_1 + \beta_2 t_{ij} + \beta_3 t_{ij}^2 + \epsilon_{ij}, && \text{males} \\ Y_{ij} &= \beta_4 + \beta_5 t_{ij} + \beta_6 t_{ij}^2 + \epsilon_{ij}, && \text{females.} \end{aligned} \tag{8.11}$$
In (8.11), we have allowed for the possibility that the times for each $i$ are not the same, writing $t_{ij}$. For this data set, the times that are potentially available for each individual are the same; however, as we saw in the dialyzer example above, this need not be the case.

To write the model in matrix form, define $\beta = (\beta_1, \ldots, \beta_6)'$. Clearly, the matrix $X_i$ for a given unit will depend on the times of observation for that unit and will have number of rows $n_i$, each row corresponding to one of the $n_i$ elements of $Y_i$. For example, for a male with $n_i$ observations, we have
$$X_i = \begin{pmatrix} 1 & t_{i1} & t_{i1}^2 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & t_{in_i} & t_{in_i}^2 & 0 & 0 & 0 \end{pmatrix}.$$
We may thus summarize the model as
$$Y_i = X_i\beta + \epsilon_i,$$
where $X_i$ is the $(n_i \times 6)$ matrix defined appropriately for individual $i$.

COVARIANCE MATRIX: We have to be a little more careful here. Because now we are dealing with data vectors $Y_i$ of different lengths $n_i$,
note that the corresponding covariance matrices must be of dimension $(n_i \times n_i)$. Thus, it is not possible to assume that the covariance matrix of all data vectors is identical across $i$. For now, we will write
$$\mathrm{var}(\epsilon_i) = \Sigma_i$$
to recognize this issue; the $i$ subscript indicates that, at the very least, the covariance matrix depends on $i$ through its dimension $n_i$.

For example, suppose we believed that the assumption of compound symmetry was reasonable, such that all observations $Y_{ij}$ have the same overall variance, $\sigma^2$ say, and all are equally correlated, no matter where they are taken in time. This would be a valid choice even for a situation where the times are different somehow on different units, either as in the dialyzer example or because of missing observations. As in Chapter 4, to represent this, we would have a second parameter $\rho$. For a data vector of length $n_i$, then, no matter where its $n_i$ observations in time were taken, the matrix $\Sigma_i$ would be the $(n_i \times n_i)$ matrix
$$\Sigma_i = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}.$$
No matter what the dimension or the time points, under this assumption the matrix $\Sigma_i$ would depend on the 2 parameters $\sigma^2$ and $\rho$ for all $i$, and would depend on $i$ only because of the dimension. We will discuss covariance matrices more shortly.

MULTIVARIATE NORMALITY: With the assumption of normality, we can thus write the model succinctly as
$$Y_i \sim \mathcal{N}_{n_i}(X_i\beta, \Sigma_i).$$

ADDITIONAL COVARIATES: We can in fact write even more general models, which allow for the possibility that we may wish to incorporate the effect of other covariates. In reality, this does not represent a further extension of the type of models we have already considered, as group membership is, of course, itself a covariate. Recall that we wrote in (8.9) the $X_i$ matrix in terms of a group membership indicator $\delta_i$; technically, this is just a covariate like any other. The point we emphasize here is that there is nothing preventing us from incorporating several covariates into a model for the mean. These covariates may be indicators of other things or continuous.

HIP REPLACEMENT, CONTINUED: In the hip replacement study, the age of each participant was also recorded, and in fact an objective of the investigators was not only to understand differences in haematocrit response across genders but also to elucidate whether the age of the patient has an effect on response. It turns out that the sample mean age for males was 65.52 years and that for females was 66.07 years. From Figure 3, the patterns look pretty similar for both genders; of course, there is no easy way of discerning from the plot whether age affects the response.

To illustrate inclusion of the age covariate, consider the following modified model, where $a_i$ is the age of the $i$th patient:
$$\begin{aligned} Y_{ij} &= \beta_1 + \beta_2 t_{ij} + \beta_3 t_{ij}^2 + \beta_7 a_i + \epsilon_{ij}, && \text{males} \\ Y_{ij} &= \beta_4 + \beta_5 t_{ij} + \beta_6 t_{ij}^2 + \beta_7 a_i + \epsilon_{ij}, && \text{females.} \end{aligned} \tag{8.12}$$
Model (8.12) says that, regardless of whether a person is male or female, the mean haematocrit response at any time increases by $\beta_7$ for every year increase in age (keep in mind that $\beta_7$ could be negative). One can envision fancier models where this also depends on gender.

It is straightforward to write this in matrix notation as before; with $\beta = (\beta_1, \ldots, \beta_7)'$, we can define appropriate $X_i$ matrices, i.e. for a male of age $a_i$,
$$X_i = \begin{pmatrix} 1 & t_{i1} & t_{i1}^2 & 0 & 0 & 0 & a_i \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & t_{in_i} & t_{in_i}^2 & 0 & 0 & 0 & a_i \end{pmatrix}.$$

PARAMETERIZATION: It is possible to represent models like those above in different ways. For definiteness, consider the dialyzer example. We wrote the model in (8.10) as
$$\begin{aligned} Y_{ij} &= \beta_1 + \beta_2 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 1} \\ Y_{ij} &= \beta_3 + \beta_4 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 2} \\ Y_{ij} &= \beta_5 + \beta_6 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 3.} \end{aligned}$$
It is sometimes more convenient, although entirely equivalent, to write the model in an alternative parameterization. As we have discussed, a question of interest is often to compare the rate of change of the mean response over time (pressure here) among groups. In this situation, we would like to compare the three slopes $\beta_2$, $\beta_4$, and $\beta_6$.

Define
$$\delta_{i1} = \begin{cases} 1 & \text{unit } i \text{ from center 1} \\ 0 & \text{otherwise,} \end{cases} \qquad \delta_{i2} = \begin{cases} 1 & \text{unit } i \text{ from center 2} \\ 0 & \text{otherwise.} \end{cases}$$
Then write the model as
$$Y_{ij} = \beta_1 + \beta_2 \delta_{i1} + \beta_3 \delta_{i2} + \beta_4 t_{ij} + \beta_5 \delta_{i1} t_{ij} + \beta_6 \delta_{i2} t_{ij} + \epsilon_{ij}. \tag{8.13}$$
There are still 6
parameters overall, but the ones in (8.13) have an entirely different interpretation from those in the first model. It is straightforward to observe, by simply plugging in the values of $\delta_{i1}$ and $\delta_{i2}$ for each center, that the following is true:

Center    Intercept              Slope
1         $\beta_1 + \beta_2$    $\beta_4 + \beta_5$
2         $\beta_1 + \beta_3$    $\beta_4 + \beta_6$
3         $\beta_1$              $\beta_4$

Note that $\beta_2$ and $\beta_3$ have the interpretation of the difference in intercept between Centers 1 and 3 and Centers 2 and 3, respectively, and $\beta_1$ is the intercept for Center 3. Similarly, $\beta_5$ and $\beta_6$ have the interpretation of the difference in slope between Centers 1 and 3 and Centers 2 and 3, respectively, and $\beta_4$ is the slope for Center 3.

This parameterization allows us to estimate, as we will talk about shortly, the differences of interest directly. The same type of parameterization is used in ordinary linear regression for similar reasons. It is the default used by SAS PROC GLM and PROC MIXED, which we will use to implement the analyses we will discuss shortly. The different parameterizations yield equivalent models; the only thing that differs is the interpretation of the parameters.

8.4 Models for covariance

In the last section, we noted in gory detail how one may model the mean of each element of a data vector in very flexible and general ways. We did not say much about the assumption on the covariance matrix, except to note that, when the data are unbalanced with possibly different numbers of observations for each $i$, it is not possible to think in terms of an assumption where the covariance matrix is strictly identical for all $i$, at least in terms of its dimension.

We have noted previously that the classical methods make assumptions about the covariance matrix in the balanced case that are either too restrictive or too vague. For the approach we are taking in this chapter, in contrast to the classical models and methods, as we will soon see, there is nothing really stopping us from making other assumptions about the covariance matrix, in the sense that we
will be able to estimate parameters of interest, obtain approximate sampling distributions for the estimators, and carry out tests of hypotheses regardless of the assumption we make.

In Chapter 4, we reviewed a number of covariance structures. Here, we consider using these as possible models for $\mathrm{var}(\epsilon_i) = \Sigma_i$. We will be using SAS PROC MIXED to fit the models in this chapter using the method of maximum likelihood, to be discussed in section 8.5. Thus, it is useful to recall these structures and note how they are accessed in PROC MIXED.

Note that by modeling $\mathrm{var}(\epsilon_i)$ directly, we do not explicitly distinguish between among-unit and within-unit sources of variation. In this strategy, we just consider models for the aggregate of all sources. In the next two chapters, we will discuss a refined version of our regression model for longitudinal data that explicitly acknowledges these sources.

BALANCED CASE: It is easiest to discuss first the case of balanced data. Suppose we have a model
$$Y_i = X_i\beta + \epsilon_i, \qquad \epsilon_i \ (n \times 1).$$
Under these conditions, we may certainly consider the same assumptions on the covariance matrix as in the classical case. That is, assume that the covariance matrix $\mathrm{var}(\epsilon_i)$ is the same for all $i$ and equal to $\Sigma$, where $\Sigma$ has the form of
• Compound symmetry: SAS PROC MIXED uses the designation type=cs to refer to this assumption.
• Completely unstructured: SAS PROC MIXED uses the designation type=un to refer to this assumption.

ALTERNATIVE MODELS: We now recall the other models. Actually, there is nothing stopping us from allowing $\mathrm{var}(\epsilon_i)$ to be different for different groups, e.g. in the dental study, allowing different covariance matrices for each gender. We discuss this further below.

• One-dependent: Recall that it seems reasonable that observations taken more closely together in time might tend to be "more alike" than those taken farther apart. If the observation times are spaced so that the time between 2 nonconsecutive observations is fairly long, we might conjecture that correlation is likely to be
the largest among observations that are adjacent in time, that is, occur at consecutive times. Relative to the magnitude of this correlation, the correlation between observations separated by two time intervals might for all practical purposes be negligible. An example of a one-dependent model embodying this assumption is
$$\Sigma = \mathrm{var}(\epsilon_i) = \sigma^2 \begin{pmatrix} 1 & \rho & 0 & \cdots & 0 \\ \rho & 1 & \rho & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & \rho & 1 & \rho \\ 0 & \cdots & 0 & \rho & 1 \end{pmatrix}.$$
This model would make sense even if the times are not equally spaced (in the dental study, for example, the times 8, 10, 12, 14 are equally spaced). It is possible to extend this to a two-dependent or higher-dependent model, or to heterogeneous variances over time, as discussed in Chapter 4. SAS PROC MIXED uses the designation type=toep(2) (for "Toeplitz" with 2 diagonal bands) to refer to this assumption with the same variance at all times.

With groups, we could believe the one-dependent assumption holds for each group but allow the possibility that the variance $\sigma^2$ and correlation $\rho$ are different in each group. The same holds true for the rest of the models we consider.

• Autoregressive of order 1 (equally spaced in time): This model says that correlation drops off as observations get farther apart from each other in time. The following model really only makes sense if the times of observation are equally spaced. The so-called AR(1) model with homogeneous variance over time is
$$\Sigma = \mathrm{var}(\epsilon_i) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \cdots & \rho & 1 \end{pmatrix}.$$
SAS PROC MIXED uses the designation type=ar(1) to refer to this assumption.

• Markov (unequally spaced in time): The AR(1) model may be generalized to times that are unequally spaced, e.g. (1, 3, 4, 5, 6, 7), as in the guinea pig diet data. The powers of $\rho$ are taken to be the distances in time between the observations. That is, if
$$d_{jk} = |t_{ij} - t_{ik}|, \qquad j, k = 1, \ldots, n,$$
then the model is
$$\Sigma = \mathrm{var}(\epsilon_i) = \sigma^2 \begin{pmatrix} 1 & \rho^{d_{12}} & \cdots & \rho^{d_{1n}} \\ \rho^{d_{21}} & 1 & \cdots & \rho^{d_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{d_{n1}} & \rho^{d_{n2}} & \cdots & 1 \end{pmatrix}.$$
SAS PROC MIXED allows this type of model to be implemented in more than one way, e.g. with the type=sp(pow) designation.

We will consider examples of fitting these structures to several of our examples in section 8.8. The SAS PROC MIXED documentation, as well as the books by Diggle, Heagerty, Liang, and
Zeger (2002) and Vonesh and Chinchilli (1997), discuss other assumptions.

DECIDING AMONG COVARIANCE STRUCTURES: In the balanced case, one may use the techniques discussed in Chapter 4 to gain informal insight into the structure of $\mathrm{var}(\epsilon_i)$. Inspection of sample covariance matrices, scatterplot matrices, autocorrelation functions, and lag plots can aid the analyst in identifying possible reasonable models.

These methods can be modified to take into account the fact that one believes that the mean vectors follow smooth trajectories over time, such as a straight line. For instance, instead of using the sample means for centering in these approaches, one might estimate $\beta$ somehow, e.g. by least squares, treating all the individual responses from all units as if they were independent, even though we know they are probably not. Least squares is clearly not the best way to estimate $\beta$ (recall our discussion in Chapter 3); however, this estimator may be good enough to provide reasonable estimates of the means at each time $t_j$ that take advantage of our willingness to believe they follow a smooth trajectory, and so might be preferred to using sample means at each $j$ on this account. In particular, if $\mu_j = \beta_0 + \beta_1 t_j$, say, for a single group, we would estimate $\mu_j$ by $\hat{\beta}_0 + \hat{\beta}_1 t_j$ and use this in place of the sample mean.

A complete discussion of graphical and other techniques along these lines may be found in Diggle, Heagerty, Liang, and Zeger (2002).

It is also possible to use other methods to deduce which structure might give an appropriate model; we will see this shortly. Later in the course, we will discuss a popular way of thinking about the problem of modeling covariance and a popular way of taking into account the possibility that we might be wrong when adopting a particular covariance model.

UNBALANCED CASE: Suppose first that we are in a situation like that of the hip replacement data, i.e. all times of observation are the same for all units; however, some observations are missing on some units. For definiteness,
suppose, as in the hip data, we have times $(t_1, t_2, t_3, t_4) = (0, 1, 2, 3)$, and suppose we have a unit $i$ for which the observation at time $t_3$ is not available. Thus, the vector $Y_i$ for this unit is of length $n_i = 3$. We could represent this situation notationally in two different ways:

(i) For this unit, write $Y_i = (Y_{i1}, Y_{i2}, Y_{i3})'$ to denote the observations at times $(t_{i1}, t_{i2}, t_{i3}) = (0, 1, 3)$. Thus, in this notation, $j$ indexes the number of observations within the unit, regardless of the actual values of the times. There are 3 times for this unit, so $j = 1, 2, 3$. This is the standard way of representing things generically.

(ii) To think more productively about covariance modeling, consider an alternative. Here, let $j$ index the intended times of observation. This unit is missing time $j = 3$; thus, represent things as
$$Y_i = (Y_{i1}, Y_{i2}, Y_{i4})' \quad \text{at times } (t_1, t_2, t_4) = (0, 1, 3). \tag{8.14}$$

Now consider the models discussed above and the alternative notation. Assume we believe that $\mathrm{var}(Y_{ij}) = \sigma^2$ for all $j$. We thus want a model for
$$\Sigma_i = \mathrm{var}(Y_i) = \begin{pmatrix} \sigma^2 & \mathrm{cov}(Y_{i1}, Y_{i2}) & \mathrm{cov}(Y_{i1}, Y_{i4}) \\ \mathrm{cov}(Y_{i2}, Y_{i1}) & \sigma^2 & \mathrm{cov}(Y_{i2}, Y_{i4}) \\ \mathrm{cov}(Y_{i4}, Y_{i1}) & \mathrm{cov}(Y_{i4}, Y_{i2}) & \sigma^2 \end{pmatrix}.$$

• The compound symmetry assumption would be represented in the same way regardless of the missing value; all it says is that observations any distance apart have the same correlation. Thus, under this assumption, $\Sigma_i$ would be the $(3 \times 3)$ version of this matrix.

• Under an unstructured assumption, this matrix becomes (convince yourself)
$$\Sigma_i = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{14} \\ \sigma_{12} & \sigma_2^2 & \sigma_{24} \\ \sigma_{14} & \sigma_{24} & \sigma_4^2 \end{pmatrix}.$$

• Under the one-dependent model, which says that only observations adjacent in time are correlated, this matrix becomes (convince yourself)
$$\Sigma_i = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & 0 \\ \rho\sigma^2 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{pmatrix}.$$

• Under the AR(1) model, this matrix becomes (convince yourself)
$$\Sigma_i = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^3 \\ \rho & 1 & \rho^2 \\ \rho^3 & \rho^2 & 1 \end{pmatrix}.$$

These examples illustrate the main point: if all observations were intended to be taken at the same times, but some are not available, the covariance matrix must be carefully constructed according to the particular time pattern for each unit, using the convention of the assumed covariance model.

Now consider the situation of
the ultrafiltration data. Here, the actual times of observation are different for each unit. Consider again the above models.

• Here, the unstructured assumptions are difficult to justify. Because each unit was seen at a different set of times, they cannot share the same covariance parameters, so the matrix $\Sigma_i$ must depend on entirely different quantities for each $i$.

• The compound symmetry assumption could still be used, as it does not pay attention to the actual values of the times. Of course, it still suffers from the drawbacks for longitudinal data we have already noted.

• We might still be willing to adopt something like the one-dependent assumption in the same spirit as with compound symmetry, saying that observations that are adjacent in time, regardless of how far apart they might be, are correlated, but those farther apart are not. However, it is possible that the distance in time for adjacent observations for one unit might be longer than the distance for nonconsecutive observations for another unit, making this seem pretty nonsensical.

• The AR(1) assumption is clearly inappropriate by the same type of reasoning.

• The so-called Markov assumption seems more promising in this situation: the correlation among observations within a unit would depend on the time distances between observations within a unit.

Clearly, with different times for different units, modeling covariance is more challenging. In fact, it is even hard to investigate the issue informally, because the information from each unit is different. In the next two chapters of the course, we will talk about another approach to modeling longitudinal data that obviates the need to think quite so hard about all of this.

INDEPENDENCE ASSUMPTION: An alternative to all of the above, in both cases of balanced and unbalanced data, is the assumption that observations within a unit are uncorrelated, which, with the assumption of multivariate normality, implies that they are independent. That is, if we believe that all
observations have constant variance $\mathrm{var}(Y_{ij}) = \sigma^2$, take
$$\Sigma_i = \mathrm{var}(\epsilon_i) = \sigma^2 I_{n_i}.$$

• This assumption seems incredibly unrealistic for longitudinal data. It says that observations on the same unit are no more alike than those compared across units. In a practical sense, it implies that variation among units must be negligible; otherwise, we would expect observations on the same individual to be correlated due to this source.

• It also says that there is no correlation induced by within-unit fluctuations over time. This might be okay if the observations are all taken sufficiently far apart in time from one another; however, it may be unrealistic if they are close in time.

• Occasionally, this model might be sensible, e.g. suppose the units are genetically engineered mice, bred specifically to be as alike as possible. Under such conditions, we might expect that the component of variation due to variation among mice might indeed be so small as to be regarded as negligible. If, furthermore, the observations on a given mouse are all far apart in time, then we would expect no correlation for this reason either. In most situations, however, this assumption represents an obvious model misspecification, i.e. the model almost certainly does not accurately represent the truth.

• However, sometimes this assumption is adopted nonetheless, even though the data analyst is fully aware it is likely to be incorrect. The rationale will be discussed later in the course.

SUMMARY: The important message is that, by thinking about the situation at hand, it is possible to specify models for covariance that represent the main features in terms of a few parameters. Thus, just as we model the systematic component in terms of a regression parameter $\beta$, we may model the random component. With models like those above, this is accomplished through a few covariance parameters (sometimes called variance or covariance components), which are the distinct elements of the covariance matrix or matrices assumed in the model.

8.5 Inference by maximum likelihood

We have devoted considerable discussion to the idea of modeling longitudinal data directly. However, we have not tackled the issue of how to address questions of scientific interest within the context of such a model.

• With a more flexible representation of mean response, we have more latitude for stating such questions, as we have already mentioned.

• For example, consider the dental study. A question of interest has to do with the rate of change of distance over time: is it the same for boys and girls? In the context of the classical ANOVA models discussed earlier, we phrased this question as one of whether or not the mean profiles are parallel, and expressed this in terms of the $(\tau\gamma)_{\ell j}$. Of course, in the context of the model given in (8.5) and (8.6), the assumption of parallelism is still the focus, but it may be stated more clearly, directly in terms of slope parameters, i.e. $H_0: \beta_{1,G} - \beta_{1,B} = 0$.

• Furthermore, we can do more. Because we have an explicit representation of the notion of "rate of change" in these slopes, we can also estimate the slopes for each gender and provide an estimate of the difference. If the evidence in the data is not strong enough to conclude the need for 2 separate slopes, we could estimate a common slope.

• Even more than this is possible. Because we have a representation for the entire trajectory as a function of time, we can estimate the mean distance at any age for a boy or girl.

To carry out these analyses formally, then, we need to develop a framework for estimation in our model and a procedure to do hypothesis testing. The standard approach under the assumption of multivariate normality is to use the method of maximum likelihood.

MAXIMUM LIKELIHOOD: This is a general method, although we state it here specifically for our model. Maximum likelihood inference is the cornerstone of much of statistical methodology.

The basic premise of maximum likelihood is as follows. We would like to estimate the parameters that characterize our
model based on the data we have. One approach would be to use as the estimator a value that "best explains" the data we saw. To formalize this:

• Find the parameter value that maximizes the probability, or likelihood, that the observations we might see for a situation like the one of interest would end up being equal to the data we saw.

• That is, find the value of the parameter that is best supported by the data we saw.

Recall that we have a general model of the form
$$Y_i \sim \mathcal{N}_{n_i}(X_i\beta, \Sigma_i),$$
where $\Sigma_i$ is a $(n_i \times n_i)$ covariance model depending on some parameters.

• The regression parameter $\beta$ characterizes the mean. Suppose it has dimension $p$.

• Denote the parameters that characterize $\Sigma_i$ as $\omega$.

• For example, in the AR(1) model, $\omega = (\sigma^2, \rho)'$.

For us, the data are the collection of data vectors $Y_i$, $i = 1, \ldots, m$, one from each unit. It will prove convenient to summarize all the data together in a single, long vector of length $N$ (recall $N$ is the total number of observations, $\sum_{i=1}^m n_i$), which stacks them on one another:
$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_m \end{pmatrix}.$$

INDEPENDENCE ACROSS UNITS: Recall that we have argued that a reasonable assumption is that the way the data turn out for one unit should be unrelated to how they turn out for another. Formally, this may be represented as the assumption that the $Y_i$, $i = 1, \ldots, m$, are independent.

• This assumption is standard in the context of longitudinal data, and we will adopt it for the rest of the course.

• Recall that this assumption also underlay the univariate and multivariate classical methods.

JOINT DENSITY OF $Y$: We may represent the probability of seeing the data we saw as a function of the values of the parameters $\beta$ and $\omega$ by appealing to our multivariate normal assumption. Specifically, recall that if we believe $Y_i \sim \mathcal{N}_{n_i}(X_i\beta, \Sigma_i)$, then the probability that this data vector takes on the particular value $y_i$ is represented by the joint density function for the multivariate normal (recall Chapter 3). For our model, this is
$$f_i(y_i) = (2\pi)^{-n_i/2} |\Sigma_i|^{-1/2} \exp\{-(y_i - X_i\beta)' \Sigma_i^{-1} (y_i - X_i\beta)/2\}. \tag{8.15}$$
Because the $Y_i$ are independent, the joint
density function for $Y$ is the product of the $m$ individual joint densities (8.15), i.e. letting $f(y)$ be the joint density function for all the data $Y$ (thus representing probabilities of all the data vectors taking on the values in $y$ together),
$$f(y) = \prod_{i=1}^m f_i(y_i) = \prod_{i=1}^m (2\pi)^{-n_i/2} |\Sigma_i|^{-1/2} \exp\{-(y_i - X_i\beta)' \Sigma_i^{-1} (y_i - X_i\beta)/2\}. \tag{8.16}$$

MAXIMUM LIKELIHOOD ESTIMATORS: The method of maximum likelihood for our problem thus boils down to maximizing $f(y)$, evaluated at the data values we saw, in the unknown parameters $\beta$ and $\omega$. The maximizing values will be functions of $y$. These functions applied to the random vector $Y$ yield the so-called maximum likelihood (ML) estimators.

• (8.16) is a complicated function of $\beta$ and $\omega$. Thus, finding the values that maximize it for a given set of data is not something that can be done in closed form in general. Rather, fancy numerical algorithms, the details of which are beyond the scope of this course, are used. These algorithms form the "guts" of software for this purpose, such as SAS PROC MIXED and others.

SPECIAL CASE, $\omega$ KNOWN: We first consider an ideal situation, unlikely to occur in practice. Suppose we were lucky enough to know $\omega$; e.g. if the covariance model were AR(1), this means we know $\sigma^2$ and $\rho$. In this case, all the elements of the matrix $\Sigma_i$ for all $i$ are known. It is then possible to show, using matrix calculus, that the maximizer of $f(y)$ in $\beta$, evaluated at $Y$, is
$$\hat{\beta} = \left( \sum_{i=1}^m X_i' \Sigma_i^{-1} X_i \right)^{-1} \sum_{i=1}^m X_i' \Sigma_i^{-1} Y_i. \tag{8.17}$$

WEIGHTED LEAST SQUARES: Note that this has a similar flavor to the weighted least squares estimator we discussed in Chapter 3. In fact, the estimator $\hat{\beta}$ is usually called the weighted least squares estimator in this context as well.
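To make (8.17) concrete, here is a minimal Python sketch that evaluates the estimator when the covariance parameters are treated as known. It assumes the Markov (power-of-distance) covariance structure, which reduces to AR(1) for equally spaced times. The function names `markov_cov` and `wls_estimate`, the two hypothetical units, and all numerical values are illustrative assumptions, not part of the course software or data.

```python
import numpy as np

def markov_cov(times, sigma2, rho):
    """Covariance sigma2 * rho**|t_j - t_k| for one unit's observation times."""
    t = np.asarray(times, dtype=float)
    return sigma2 * rho ** np.abs(t[:, None] - t[None, :])

def wls_estimate(X_list, Sigma_list, Y_list):
    """Evaluate (8.17): (sum X_i' S_i^{-1} X_i)^{-1} (sum X_i' S_i^{-1} Y_i)."""
    p = X_list[0].shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for X, S, Y in zip(X_list, Sigma_list, Y_list):
        SinvX = np.linalg.solve(S, X)  # S^{-1} X without forming the inverse
        A += X.T @ SinvX               # accumulates X' S^{-1} X
        b += SinvX.T @ Y               # X' S^{-1} Y, using symmetry of S
    return np.linalg.solve(A, b)

# Hypothetical example: two units; the second is missing the week-2 value.
times_list = [np.array([0.0, 1.0, 2.0, 3.0]), np.array([0.0, 1.0, 3.0])]
beta_true = np.array([30.0, -2.0])  # made-up intercept and slope
X_list = [np.column_stack([np.ones_like(t), t]) for t in times_list]
Sigma_list = [markov_cov(t, sigma2=4.0, rho=0.6) for t in times_list]

# Noise-free responses, so the estimator should reproduce beta_true.
Y_list = [X @ beta_true for X in X_list]
beta_hat = wls_estimate(X_list, Sigma_list, Y_list)
print(beta_hat)
```

Because each unit contributes its own $X_i$ and $\Sigma_i$, units with different numbers of observations are handled without any special treatment. In practice $\omega$ is unknown and must be estimated jointly with $\beta$, which this sketch does not attempt.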