Applied Regression Analysis
Applied Regression Analysis STAT 51200
These 25 pages of class notes were uploaded by Bailey Macejkovic on Saturday, September 19, 2015. The notes belong to STAT 51200 at Purdue University, taught by Kristofer Jennings in Fall.
Date Created: 09/19/15
Statistics 512: Marfan Syndrome Case Study

Background

According to the Medline Plus Medical Encyclopedia, "Marfan syndrome is an inheritable disorder of connective tissue (which adds strength to the body's structures) that affects the skeletal system, cardiovascular system, eyes, and skin. The most obvious effects of Marfan syndrome are skeletal defects typically recognized in a tall, lanky person with long limbs and spider-like fingers, chest abnormalities, curvature of the spine, and a particular set of facial features, including a highly arched palate and crowded teeth." Abraham Lincoln was thought to have Marfan syndrome. In 1996, the prevalence of Marfan syndrome was estimated to be 1 in 3,000 to 5,000 people.

A common complication of Marfan syndrome is aortic rupture, a result of the patient's aorta growing too large with respect to the rest of the body. Ultimately, the life expectancy of an untreated Marfan patient is around 40 years. With treatment, the life expectancy approaches population norms.

An important tool for diagnosing Marfan syndrome, considered in several studies in the late 1990s, uses the relationship between aortic root diameter (AoD), as measured by an echocardiograph, and approximate body surface area (BSA). The following analysis is based on data compiled by Dr. Paul Pitlick at the Stanford Department of Pediatrics.

Choosing the Model

An initial plot of the data with a smoother (Figure 1) seems to indicate that while there is some curvature in the data, a linear fit might not be inappropriate:

$\widehat{\text{AoD}} = 14.259 + 12.629\,\text{BSA}, \qquad \hat\sigma = 3.98.$

The fit is significant (F = 131.517, p < 0.0001), and both coefficient estimates are significantly different from zero (p < 0.0001 in both cases). The predictor explains 55.8% (R² = 0.5584) of the variability in the response. However, the residual plot is perhaps not satisfactorily scattered and centered around the zero line (Figure 2), and the residuals are a bit too right-skewed to be from a normal distribution (see the histogram and QQ plot in Figure 3). The tests for
normality, as applied to the residuals, all show a highly significant deviation from a normal distribution.

Previous experience had shown that a log-log transformation (taking the log of both the predictor and the response) is necessary to produce the best fit. Indeed, the Box-Cox procedure in SAS suggests a log transformation of the AoD values, both when the predictor is the original BSA measurement and when the predictor has been log-transformed. The fact that the R² jumps to 0.61 with the log-log transformation also makes this suggestion more attractive.

[Figure 1: Body Surface Area and Aortic Diameter with Smoothing Curve. Scatterplot of aortic root diameter (AoD) against body surface area (BSA).]

The plot of the transformed variables (Figure 4) makes the data look closer to linear. The best-fitting line is

$\widehat{\log(\text{AoD})} = 3.30 + 0.463\,\log(\text{BSA}),$

the estimated error standard deviation is 0.1397, and the R² value is 0.6101. Both parameter estimates are still significantly different from zero. Note that a regression through the origin, with either this or the untransformed data, would probably not lead to better predictions or inference. Is this intuitively satisfactory?

At the same time, there is some suggestion that several of the smallest log(BSA) observations may be inflating the apparent goodness of fit. Indeed, after taking out the three smallest observations, the R² falls to 0.5133. However, the resulting predictor,

$\widehat{\log(\text{AoD})} = 3.30 + 0.441\,\log(\text{BSA}), \qquad \hat\sigma = 0.140,$

is barely different from the original transformed estimator. So while these points in the lower range of log(BSA) may have caused an overly optimistic assessment of the fit, the model was not directly "influenced" by their presence. The inference on the parameters does not change either (p < 0.0001 in every case). This further confirms that this model would probably give us the most accurate predictions and the most valid inferential results.

An analysis of the residuals (Figure 5) shows that there are no obvious
patterns that would indicate deviations from linearity or from the assumption of constant variance. While there is some deviation from normality (see Figure 6), the large sample size (n = 106) ensures reasonably valid inference.

[Figure 2: Body Size and Aortic Diameter Residual Plot. Scatterplot of original model residuals against body surface area.]

Inference

Inference based on the model and parameter estimates should help us find succinct answers to practically important questions. Unfortunately, practical importance can be difficult to divine without further knowledge of the context of the research. For this section, we will try to suggest interesting questions and see how our inferential techniques can be used to provide answers.

Prediction

Figure 7 shows 95% confidence intervals and prediction intervals for the linear model based on the transformed data. The practical significance of these bands is limited. Most clinical studies are interested in the relationship between body surface area and aortic root diameter for diagnosing Marfan syndrome. There is not much interest in predicting aortic diameter (which is hard to measure) from the more easily calculated approximation of body surface area. There is, however, substantial interest in predicting aortic diameter from body surface area together with previous aortic diameter measurements. The methods for these models are beyond the scope of this class.

Still, several aspects of the confidence interval and prediction interval regions are worth noting. First, notice that three data points fall outside the prediction intervals. This should not be too surprising, since we would expect 0.05 × 106 = 5.3 points to fall outside the prediction intervals. The probability that three or fewer points would fall outside the prediction region (assuming the coverage is exactly 95%) is 0.218. This probability suggests that the coverage of the prediction intervals is close to valid.
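As a quick check on the 0.218 figure, here is a minimal Python sketch using only the standard library. It assumes that each of the 106 points falls outside its prediction interval with probability exactly 0.05. It also assumes, as an added simplification, that the points behave independently, so the count outside is Binomial(106, 0.05). The helper name `prob_at_most` is hypothetical.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing the probability mass directly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_points = 106    # observations in the Marfan data
miss_rate = 0.05  # nominal non-coverage of a 95% prediction interval

# Expected count outside the bands, and chance of seeing 3 or fewer outside.
expected_outside = n_points * miss_rate
p_three_or_fewer = prob_at_most(3, n_points, miss_rate)

print(f"expected number outside: {expected_outside:.1f}")
print(f"P(3 or fewer outside): {p_three_or_fewer:.3f}")
```

Summing the mass function directly is adequate here because k is small. A distribution library would be the usual choice for larger problems.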
[Figure 3: Histogram and QQ plot of residuals, original model.]

In addition, we might be interested in how much better our predictions are when we use BSA as a predictor. The R² of 0.6101 suggests that the error variance is 1 − 0.61 = 0.39 times the variance of the Y's. Note that this means the ratio of the standard errors is approximately √0.39 ≈ 0.62. Now suppose we wish to compare the length of the confidence interval for the simple linear model at some value X to the confidence interval length for the model with no predictor. The null-model confidence interval, (3.219, 3.304), covers the linear regression confidence interval at X = X̄, (3.235, 3.289). The ratio of the lengths of these intervals is 0.628, close to what we would expect. However, at the extreme high end of the predictor distribution, where log(BSA) = 0.5247, the linear model confidence interval is 2.321 times as large as the null confidence interval. The difference is that the confidence interval in the linear regression case moves with different X values: the price we pay for using a predictor is that the interval widens as we get away from the central X value.

Ultimately, the prediction intervals give us a sensible notion of where the aortic root diameter values should lie for persons with a given body surface area. For instance, a Marfan patient with a BSA of 1 m² (hence a log(BSA) value of 0) would rarely have an aortic root diameter of 15 mm.¹

Calibration

Of course, we might instead be interested in diagnosing the "sensibility" of an aortic root diameter reading. This would require inverse prediction: calculating which BSA values would likely correspond with a given AoD measurement. The book suggests an inverse prediction interval for a response observation Y_h of the form

$\frac{Y_h - b_0 \pm t(1 - \alpha/2;\, n - 2)\, s_{\text{pred}}}{b_1},$

¹ The prediction of log(AoD) would be 3.2976, with a standard error of $\sqrt{s^2 + (\text{Std Error Mean Predict})^2} \approx 0.1392$. The hypothesized value of log(15) is thus about 4.2 standard units away from the predicted value. The probability that a value so far from the
prediction would appear in these patients is approximately 5 × 10⁻⁵.

where s_pred is the prediction standard error at X_h = X̂_h. For instance, an aortic diameter observation of 30 mm will, with probability 0.95, come from a patient with log(BSA) between −0.3748 and 0.8226, or BSA between 0.6875 and 2.276 m². Note how this inverse prediction interval covers a large portion of the range of the predictor distribution. A geometric approximation to this interval can be found by intersecting the prediction boundaries with the horizontal line at log(AoD) = log(30). The x-coordinates of these intersections are approximately the boundaries of the inverse prediction interval. Try this with the right-hand graph in Figure 7.

[Figure 4: Log Body Surface Area and Log Aortic Diameter with Smoothing Curve. Scatterplot of transformed aortic root diameter against transformed body surface area.]

Comparison with Normal Patients

The primary clinical use for this data is to compare the prediction model for Marfan patients with the same predictions for normal patients. Figure 8 shows the prediction line for Marfan patients with a 95% Working-Hotelling confidence region, along with the expectation line for normal patients,

$E[\log(\text{AoD})] = 3.0189 + 0.4629\,\log(\text{BSA}).$

There is an obvious difference between the lines; nearly any null hypothesis test comparing the two lines would be confidently rejected.

On Joint Confidence Regions

Of course, the Working-Hotelling band is just one example of a confidence region for the entire line. Using the Bonferroni correction gives a joint 95% confidence region for the Marfan parameters: a rectangle with sides (3.2667, 3.329) for β₀ and (0.3801, 0.5450) for β₁. This rectangle does not cover the normal patient parameter values, because the normal-patient intercept, 3.0189, lies well below the β₀ interval.

[Figure 5: Residual Scatterplot with Transformed Variables. Scatterplot of residuals from the final model against log-transformed body surface area.]

If our interest were only in describing patients with a few distinct BSA values, we might get even better
results using the Bonferroni-corrected or Scheffé-corrected mean prediction intervals. The Working-Hotelling cutoff value W for this data is 2.483. This cutoff value is reached (in other words, $B = t(1 - \alpha/(2g);\, n - 2) \geq 2.483$) when g > 3 for the Bonferroni correction, or when g ≥ 2 for the Scheffé correction. So suppose we form a confidence region from more than 3 different Bonferroni-corrected confidence intervals, or from more than one Scheffé-corrected confidence interval. In that case the actual joint coverage level is greater than 95%. In other words, we would do better (i.e., get more power) using the Working-Hotelling bands. They are designed to provide correct coverage for a joint 100(1 − α)% confidence region over any number of different confidence intervals. Even so, they give smaller (i.e., more powerful) confidence regions than the Bonferroni or Scheffé regions. On the other hand, suppose for the sake of argument that we only ever wished to get 95% coverage jointly on 2 different mean values. Then we would get a smaller confidence region using the Bonferroni correction on each component interval.

Power

Notice that the value of β₁ in the normal data (0.4629) is well within the 95% confidence interval (0.3906, 0.5345) for β₁ in the Marfan data. Before concluding that the two slopes are equal, we should consider the power of a test whose null hypothesis is β₁ = 0.4629. Figure 9 shows the value of the power for different alternative values of β₁. Only a clinical expert could decide what values would constitute practically significant differences, but it seems that many small differences would be detected with high probability in this study. It should not be much of a surprise that the power for detecting the difference between the normal and Marfan slopes (a test with a very large p-value) is close to 0.05.

[Figure 6: Histogram and QQ plot of residuals from the final model.]

[Figure 7: Confidence intervals and prediction intervals for transformed data.]

The upshot of the
indistinguishability of the normal and Marfan population slope parameters depends on the ultimate goal of the researcher. If the slope value from the normal population is measured exactly, then one might be inclined to believe that that number should be used for the slope in the Marfan population. However, it's not clear which set of parameter values will give the best prediction in the extreme tails of the data distribution. At the same time, within the range of the data, predictions based on the two slope values would be very close. Of course, the normal-population slope may instead be based on data of a similar sample size with the same error variance. In that case a larger, multivariate model would be needed to address questions of differing slope parameter values.

Overlap and Classification

An important aspect of the clinical usefulness of this data is its discriminatory power. The practitioner wants to know whether these two measurements (BSA and AoD) can be used to discriminate between healthy and Marfan patients. We don't cover problems of classification in this class (remember, the response follows a binomial (0/1) distribution in that case). Still, we can arrive at some preliminary insights about the best classification rule for these data.

[Figure 8: Working-Hotelling confidence region and the line for normal patients. The dotted expectation line for normal patients does not fall within the 95% confidence region for Marfan patients.]

In both populations, the relationship between BSA and AoD is a line, and so are the boundaries of the prediction intervals. All of these lines have nearly the same slope, so we would expect the classification rule to be a line as well. In other words, there is probably some slope-intercept combination (β₀, β₁) with the following property: for a patient with a BSA/AoD reading (X_h, Y_h), if $\log(Y_h) - (\beta_0 + \beta_1 \log(X_h)) > 0$, we can be reasonably certain that the patient has Marfan syndrome. As it is, the intercepts of the two models differ by 0.2788 log mm units. The
standard error of prediction is approximately half this value over the entire range of the log(BSA) values. Suppose we assume the slopes are equal and put a line midway between the lines for the two populations. That midway line sits about one prediction standard error from each population line. The probability of going past one standard deviation in one direction of the t distribution is about 16%, so approximately 16% of patients in each population would be misclassified. Is this a practically useful misclassification rate?

Conclusion

Think about this example carefully. Are there additional questions that our inferential methods could answer? Are there practically important questions that cannot be answered with our methods so far? Are there further practically important conclusions that we could arrive at using the results of this analysis? Finally, think about how changing different values (e.g., the design, the α level, the sample size) would affect the results of this analysis. What would you postulate would be the effects on the classification rule?

[Figure 9: Power Calculation for Null Hypothesis of No Slope Difference. Power curve for detecting deviations of the slope for the Marfan population from the slope for the normal population.]

Statistics 512: Applied Linear Models
Topic 9

Topic Overview

This topic will cover
• Random vs. Fixed Effects
• Using EMS to obtain appropriate tests in a Random or Mixed Effects Model

Chapter 25: One-way Random Effects Design

Fixed Effects vs. Random Effects

Up to this point we have been considering fixed effects models, in which the levels of each factor were fixed in advance of the experiment. In those models we were interested in differences in response among those specific levels. Now we will consider random effects models, in which the factor levels are meant to be representative of a general population of possible levels. We are interested in whether that factor has a significant effect in explaining the response, but only in a general way. For example, we're not interested in a detailed comparison of level 2 vs. level 3, say.

When we have both fixed and random effects, we call it a
mixed effects model. The main SAS procedure we will use is proc mixed, which allows for fixed and random effects, but we can also use proc glm with a random statement. We'll start with a single random effect.

In some situations it is clear from the experiment whether an effect is fixed or random. However, there are also situations in which calling an effect fixed or random depends on your point of view and on your interpretation and understanding. So sometimes it is a personal choice. This should become clearer with some examples.

Data for one-way design
• Y, the response variable
• Factor with levels i = 1 to r
• Y_ij is the jth observation in cell i, j = 1 to n_i
• A balanced design has n_i = n

KNNL Example
• KNNL page 1036 (knnl1036.sas)
• Y is the rating of a job applicant
• Factor A represents five different personnel interviewers (officers), r = 5 levels
• n = 4 different applicants were randomly chosen and interviewed by each interviewer (i.e., 20 applicants). Applicant is not a factor, since no applicant was interviewed more than once.
• The interviewers were selected at random from the pool of interviewers, and the applicants were randomly assigned to interviewers.

Here we are not so interested in the differences between the five interviewers that happened to be picked (i.e., does Joe give higher ratings than Fred? Is there a difference between Ethel and Bob?). Rather, we are interested in quantifying and accounting for the effect of interviewer in general. There are other interviewers in the population (at the company), and we want to make inference about them too.

Another way to say this is that with fixed effects we were primarily interested in the means of the factor levels (and the differences between them). With random effects, we are primarily interested in their variances.

Read and check the data

data interview;
  infile 'H:\System\Desktop\CH24TA01.DAT';
  input rating officer;
proc print data=interview;
run;

Obs    rating    officer
  1        76          1
  2        65          1
  3        85          1
...
 19        80          5
 20        79          5

Plot the data

title1 'Plot of the data';
symbol1 v=circle i=none c=black;
proc gplot data=interview;
  plot rating*officer;
run;

[Plot of the data]

Find and plot the means

proc means data=interview;
  output out=a2 mean=avrate;
  var rating;
  by officer;
title1 'Plot of the means';
symbol1 v=circle i=join c=black;
proc gplot data=a2;
  plot avrate*officer;
run;

[Plot of the means]

Random effects model (cell means)

This model is also called
• ANOVA Model II
• A variance components model

$Y_{ij} = \mu_i + \epsilon_{ij}$
• The μ_i are iid N(μ, σ_A²). NOTE: THIS IS DIFFERENT.
• The ε_ij are iid N(0, σ²).
• μ_i and ε_ij are independent.
• Y_ij ~ N(μ, σ_A² + σ²).

Now the μ_i are random variables with a common mean. The question "are they all the same?" can now be addressed by considering whether the variance of their distribution, σ_A², is zero. Of course, the estimated means will likely be different from each other; the question is whether the difference can be explained by error (σ²) alone.

The text uses the symbol σ_μ² instead of σ_A²; they are the same thing. I prefer the latter notation because it generalizes more easily to more than one factor, and also to the factor effects model.

Two Sources of Variation

Observations with the same i (e.g., the same interviewer) are dependent, and their covariance is σ_A². The components of variance are σ_A² and σ². We want to get an idea of the relative magnitudes of these variance components.

Random factor effects model

Same basic idea as before: μ_i = μ + α_i. The model is

$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$
• α_i ~ N(0, σ_A²)
• ε_ij ~ N(0, σ²)
• Y_ij ~ N(μ, σ_A² + σ²)

The book uses σ_α² instead of σ_A² here. Despite the different notations, σ_α² and σ_μ² are really the same thing: μ_i and α_i differ only by an additive constant (μ), so they have the same variance. That is why in these notes I'm using the same symbol σ_A² to refer to both. With two factors, we will have to distinguish between these.

Parameters

There are two important parameters in these models: σ_A² and σ² (also μ). The cell means μ_i are random variables, not parameters.

We are sometimes interested in estimating σ_A²/(σ_A² + σ²). In some applications it is called the intraclass correlation coefficient.
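To make the estimation concrete, here is a minimal Python sketch of the usual ANOVA (method-of-moments) estimates for a balanced one-way random effects model. They are obtained by equating each mean square to its expectation: E(MSE) = σ² gives σ̂² = MSE, and E(MSA) = σ² + nσ_A² gives σ̂_A² = (MSA − MSE)/n. Plugging these in gives the estimated intraclass correlation. The class and function names are hypothetical. Truncating a negative σ̂_A² at zero is a common convention assumed here, not something stated in these notes. The inputs are the interviewer mean squares reported in the proc glm output for this example (MSA = 394.925, MSE = 73.283, n = 4 applicants per interviewer).

```python
from dataclasses import dataclass


@dataclass
class OneWayRandomEstimates:
    sigma2: float    # error variance, sigma^2
    sigma2_a: float  # between-level variance, sigma_A^2
    icc: float       # intraclass correlation sigma_A^2 / (sigma_A^2 + sigma^2)
    f_stat: float    # F = MSA / MSE for H0: sigma_A^2 = 0


def one_way_random(msa: float, mse: float, n: int) -> OneWayRandomEstimates:
    """ANOVA estimates for a balanced one-way random effects model."""
    sigma2 = mse  # E(MSE) = sigma^2
    # E(MSA) = sigma^2 + n * sigma_A^2; solve for sigma_A^2.
    # Negative estimates are truncated at 0 (an assumed convention).
    sigma2_a = max((msa - mse) / n, 0.0)
    icc = sigma2_a / (sigma2_a + sigma2)
    return OneWayRandomEstimates(sigma2, sigma2_a, icc, msa / mse)


est = one_way_random(msa=394.925, mse=73.283333, n=4)
print(f"sigma_A^2 = {est.sigma2_a:.4f}, sigma^2 = {est.sigma2:.4f}")
print(f"ICC = {est.icc:.4f}, F = {est.f_stat:.2f}")
```

With these inputs the sketch gives σ̂_A² ≈ 80.41, an intraclass correlation of about 0.523, and F ≈ 5.39. This is in line with the proc varcomp and proc mixed results for the interviewer data.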
The intraclass correlation is the correlation between two observations with the same i.

ANOVA Table
• The terms and layout of the ANOVA table are the same as what we used for the fixed effects model.
• The expected mean squares (EMS) are different because of the additional random effects, so we will estimate parameters in a new way.
• Hypotheses being tested are also different.

EMS and parameter estimates

$E(\text{MSE}) = \sigma^2$, as usual. We use MSE to estimate σ².

$E(\text{MSA}) = \sigma^2 + n\sigma_A^2$. Note that this is different from before. From this you can see that we should use (MSA − MSE)/n to estimate σ_A².

Hypotheses
H0: σ_A² = 0
H1: σ_A² ≠ 0

The test statistic is F = MSA/MSE with r − 1 and r(n − 1) degrees of freedom, since the ratio of the expected mean squares, E(MSA)/E(MSE), is 1 when the null hypothesis is true. Reject when F is large, and report the p-value. Note that in the one-factor analysis the test is the same as it was before. This WILL NOT be the case as we add more factors.

SAS Coding and Output

proc glm with a random statement

proc glm data=interview;
  class officer;
  model rating=officer;
  random officer;
run;

                                 Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model               4   1579.700000     394.925000      5.39   0.0068
Error              15   1099.250000      73.283333
Corrected Total    19   2678.950000

Random statement output

Source     Type III Expected Mean Square
officer    Var(Error) + 4 Var(officer)

This is SAS's way of saying E(MSA) = σ² + 4σ_A² (note n = 4 replicates).

proc varcomp

This procedure gets the variance components.

proc varcomp data=interview;
  class officer;
  model rating=officer;
run;

MIVQUE(0) Estimates
Variance Component       rating
Var(officer)           80.41042
Var(Error)             73.28333

Other methods are available for estimation; MIVQUE(0) is the default. SAS is now saying that σ̂² = 73.28333 (notice this is just MSE) and

$\hat\sigma_A^2 = \frac{\text{MSA} - \text{MSE}}{n} = \frac{394.925 - 73.28333}{4} = 80.41042.$

As an alternative to using proc glm with a random statement and proc varcomp, you could instead use proc mixed, which has some options specifically for mixed models.

proc mixed

proc mixed data=interview cl;
  class officer;
  model rating=;
  random officer/vcorr;
run;

The cl option after data=interview asks for the confidence limits. The class statement
lists all the categorical variables, just as in glm.

The model rating= line looks strange. In proc mixed, the model statement lists only the fixed effects. Then the random effects are listed separately in the random statement. In our example there were no fixed effects, so we had no predictors on the model line. We had one random effect, so it went on the random line. This is different from glm, where all the factors (fixed and random) are listed on the model line, and then the random ones are repeated in the random statement.

• Just in case you're not confused enough, proc varcomp assumes all factors are random effects unless they are specified as fixed.

Proc mixed gives a huge amount of output. Here are some pieces of it.

Covariance Parameter Estimates
Cov Parm    Estimate   Alpha     Lower     Upper
officer      80.4104    0.05   24.4572   1498.97
Residual     73.2833    0.05   39.9896    175.54

The estimated intraclass correlation coefficient is

$\frac{\hat\sigma_A^2}{\hat\sigma_A^2 + \hat\sigma^2} = \frac{80.4104}{80.4104 + 73.2833} = 0.5232.$

About half the variance in rating is explained by interviewer.

Output from the vcorr option. This gives the intraclass correlation coefficient.

Row     Col1     Col2     Col3     Col4
1     1.0000   0.5232   0.5232   0.5232
2     0.5232   1.0000   0.5232   0.5232
3     0.5232   0.5232   1.0000   0.5232
4     0.5232   0.5232   0.5232   1.0000

Confidence Intervals
• For μ, the estimate is Ȳ.., and the variance of this estimate under the random effects model becomes $\sigma^2\{\bar Y_{..}\} = \frac{\sigma^2 + n\sigma_A^2}{rn}$, which may be estimated by $s^2\{\bar Y_{..}\} = \frac{\text{MSA}}{rn}$. See page 1038 for the derivation if you like. To get a CI, we use a t critical value with r − 1 degrees of freedom.
• Notice that the variance here involves a combination of the two errors, and we end up using MSA instead of MSE in the estimate (we used MSE in the fixed effects case).
• We may also get point estimates and CIs for σ², σ_A², and the intraclass correlation σ_A²/(σ_A² + σ²). See pages 1040-1047 for details. All of these are available in proc mixed.

Applications
• In the KNNL example, we would like σ_A²/(σ_A² + σ²) to be small, indicating that the variance due to interviewer is small relative to the variance due to applicants.
• In many
other examples, we would like this quantity to be large. One example would be measurement error: if we measure r items n times each, σ² would represent the error inherent to the instrument of measurement.

Two-way Random Effects Model

Data for two-way design
• Y, the response variable
• Factor A with levels i = 1 to a
• Factor B with levels j = 1 to b
• Y_ijk is the kth observation in cell (i, j), k = 1 to n_ij
• For balanced designs, n = n_ij

KNNL Example
• KNNL Problem 25.15, page 1080 (knnl1080.sas)
• Y is fuel efficiency in miles per gallon
• Factor A represents four different drivers, a = 4 levels
• Factor B represents five different cars of the same model, b = 5
• Each driver drove each car twice over the same 40-mile test course (n = 2)

Read and check the data

data efficiency;
  infile 'H:\System\Desktop\CH24PR15.DAT';
  input mpg driver car;
proc print data=efficiency;
run;

Obs     mpg   driver   car
...
  7    28.4        1     4
  8    27.9        1     4
  9    27.1        1     5
 10    26.6        1     5

Prepare the data for a plot, and plot the data

data efficiency;
  set efficiency;
  dc = driver*10 + car;
title1 'Plot of the data';
symbol1 v=circle i=none c=black;
proc gplot data=efficiency;
  plot mpg*dc;
run;

[Plot of the data]

Find and plot the means

proc means data=efficiency;
  output out=effout mean=avmpg;
  var mpg;
  by driver car;
title1 'Plot of the means';
symbol1 v='A' i=join c=black;
symbol2 v='B' i=join c=black;
symbol3 v='C' i=join c=black;
symbol4 v='D' i=join c=black;
symbol5 v='E' i=join c=black;
proc gplot data=effout;
  plot avmpg*driver=car;
run;

[Plot of the means]

Random Effects Model

Random cell means model

$Y_{ijk} = \mu_{ij} + \epsilon_{ijk}$
• μ_ij ~ N(μ, σ_μ²). NOTE: THIS IS DIFFERENT.
• ε_ijk ~ N(0, σ²), as usual
• μ_ij and ε_ijk are independent
• The above imply that Y_ijk ~ N(μ, σ_μ² + σ²)
• Dependence among the Y_ijk can be most easily described by specifying the covariance matrix of the vector (Y_ijk).

Random factor effects model

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$, where
α_i ~ N(0, σ_A²), β_j ~ N(0, σ_B²), (αβ)_ij ~ N(0, σ_AB²), ε_ijk ~ N(0, σ²), and
σ_Y² = σ_A² + σ_B² + σ_AB² + σ².

Now the component σ_μ² from the cell means model can be divided up into three components: A, B, and AB. That is, σ_μ² =
σ_A² + σ_B² + σ_AB².

Parameters
• There are five parameters in this model: μ, σ_A², σ_B², σ_AB², σ².
• The cell means are random variables, not parameters.

ANOVA Table
• The terms and layout of the ANOVA table are the same as what we used for the fixed effects model.
• However, the expected mean squares (EMS) are different.

EMS and parameter estimates

$E(\text{MSA}) = \sigma^2 + bn\sigma_A^2 + n\sigma_{AB}^2$
$E(\text{MSB}) = \sigma^2 + an\sigma_B^2 + n\sigma_{AB}^2$
$E(\text{MSAB}) = \sigma^2 + n\sigma_{AB}^2$
$E(\text{MSE}) = \sigma^2$

Estimates of the variance components can be obtained from these equations or by other methods.

Note the patterns in the EMS (these hold for balanced data):
• They all contain σ².
• MSA also contains all the σ²s that have an A in the subscript (σ_A² and σ_AB²); similarly for the other MS terms.
• The coefficient of each term (except the first) is the product of n and all letters not represented in the subscript. It is also the total number of observations at each fixed level of the factor corresponding to the subscript (e.g., there are nb observations for each level of A).

Hypotheses
H0A: σ_A² = 0; H1A: σ_A² ≠ 0
H0B: σ_B² = 0; H1B: σ_B² ≠ 0
H0AB: σ_AB² = 0; H1AB: σ_AB² ≠ 0

Hypothesis H0A
• H0A: σ_A² = 0; H1A: σ_A² ≠ 0
• E(MSA) = σ² + bnσ_A² + nσ_AB²
• E(MSAB) = σ² + nσ_AB²
• E(MSE) = σ²
• We need to look for the ratio (of expected mean squares) that will be 1 when H0 is true and bigger than 1 when it is false. So this hypothesis will be tested by F = MSA/MSAB, not the usual fixed effects test statistic. The degrees of freedom for the test will be the degrees of freedom associated with those mean squares: a − 1 and (a − 1)(b − 1).
• Notice you can no longer assume that the denominator is MSE!
• Note that the test using MSE is done by SAS, but it is not particularly meaningful (it sort of tests both main effect and interaction at once).

Hypothesis H0B
• H0B: σ_B² = 0; H1B: σ_B² ≠ 0
• E(MSB) = σ² + anσ_B² + nσ_AB²
• E(MSAB) = σ² + nσ_AB²
• E(MSE) = σ²
• So H0B is tested by F = MSB/MSAB with degrees of freedom b − 1 and (a − 1)(b − 1).

Hypothesis H0AB
• H0AB: σ_AB² = 0; H1AB: σ_AB² ≠ 0
• E(MSAB) = σ² + nσ_AB²
• E(MSE) = σ²
• So H0AB is tested by F = MSAB/MSE with degrees of freedom (a − 1)(b − 1) and ab(n − 1).

Run proc glm

proc glm data=efficiency;
  class driver car;
  model mpg=driver car driver*car;
  random driver car driver*car/test;
run;

Model and error
output

                                 Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model              19   377.4447500     19.8655132    113.03   <.0001
Error              20     3.5150000      0.1757500
Corrected Total    39   380.9597500

Factor effects output

Source        DF     Type I SS    Mean Square   F Value   Pr > F
driver         3   280.2847500     93.4282500
car            4    94.7135000     23.6783750
driver*car    12     2.4465000      0.2038750      1.16   0.3715

Only the interaction test is valid here. The test for interaction is MSAB/MSE, but the tests for main effects should be MSA/MSAB and MSB/MSAB, not MSE as is done here. Those correct tests are produced by the /test option on the random statement. In this example, the main effects are still significant when tested correctly, as shown below.

Lesson: just because SAS spits out a p-value doesn't mean it is for a meaningful test!

Random statement output

Source        Type III Expected Mean Square
driver        Var(Error) + 2 Var(driver*car) + 10 Var(driver)
car           Var(Error) + 2 Var(driver*car) + 8 Var(car)
driver*car    Var(Error) + 2 Var(driver*car)

Random/test output

The GLM Procedure
Tests of Hypotheses for Random Model Analysis of Variance
Dependent Variable: mpg

Source    DF   Type III SS   Mean Square   F Value   Pr > F
driver     3    280.284750     93.428250    458.26   <.0001
car        4     94.713500     23.678375    116.14   <.0001
Error     12      2.446500      0.203875
Error: MS(driver*car)

This last line says the denominator of the F tests is MSAB.

Source              DF   Type III SS   Mean Square   F Value   Pr > F
driver*car          12      2.446500      0.203875      1.16   0.3715
Error: MS(Error)    20      3.515000      0.175750

For the interaction term, this is the same test as was done above.

proc varcomp

proc varcomp data=efficiency;
  class driver car;
  model mpg=driver car driver*car;
run;

MIVQUE(0) Estimates
Variance Component         mpg
Var(driver)            9.32244
Var(car)               2.93431
Var(driver*car)        0.01406
Var(Error)             0.17575

Mixed Models

Two-way mixed model

A two-way mixed model has
• One fixed main effect
• One random main effect
• The interaction, which is considered a random effect

Tests
• The fixed main effect is tested with the interaction mean square in the denominator.
• The random main effect is tested by error.
• The interaction is tested by error.
• Notice that these are backwards from what you might intuitively extrapolate from the two-way
random effects and two-way fixed effects models. See Table 25.5 (page 1052) and below for the EMS that justify these statements. Also see Table 25.6 for the tests (page 1053).

Notation for two-way mixed model
• Y, the response variable
• A, the fixed effect (a levels)
• B, the random effect (b levels)
• We'll stick to balanced designs (n).

Factor effects parameterization

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$, where
• μ is the overall mean,
• α_i are fixed but unknown main effects with Σ_i α_i = 0,
• β_j are independent N(0, σ_B²) random main effects,
• (αβ)_ij are random interaction effects.

Randomness is catching, so the interaction between a fixed and a random effect is considered random and has a distribution. However, the interactions are also subject to constraints, kind of like fixed effects: (αβ)_ij ~ N(0, ((a − 1)/a)σ_AB²), subject to the constraint Σ_i (αβ)_ij = 0 for each j. Because of the constraints, interaction terms (αβ)_ij with the same j but different i are negatively correlated, with covariance

$\text{Cov}\big((\alpha\beta)_{ij}, (\alpha\beta)_{i'j}\big) = -\frac{1}{a}\sigma_{AB}^2, \quad i \neq i'.$

Expected Mean Squares

$E(\text{MSA}) = \sigma^2 + bn\frac{\sum_i \alpha_i^2}{a - 1} + n\sigma_{AB}^2$
$E(\text{MSB}) = \sigma^2 + an\sigma_B^2$
$E(\text{MSAB}) = \sigma^2 + n\sigma_{AB}^2$
$E(\text{MSE}) = \sigma^2$

SAS proc glm writes these out for you, but it uses the notation Q(A) to denote the fixed quantity bnΣ_i α_i²/(a − 1). It uses the names Var(Error) = σ², Var(B) = σ_B², and Var(A × B) = σ_AB². It doesn't actually use the names A and B; it uses the variable names.

Looking at these EMS, we can see that different denominators will be needed to test for the various effects:
• H0A: all α_i = 0 is tested by F = MSA/MSAB
• H0B: σ_B² = 0 is tested by F = MSB/MSE
• H0AB: σ_AB² = 0 is tested by F = MSAB/MSE

So, though it seems counterintuitive at first, the fixed effect is tested by the interaction, and the random effect is tested by the error.

Example
• KNNL Problem 25.16 (knnl1080a.sas)
• Y: service time for disk drives
• A: make of drive (fixed, with a = 3 levels)
• B: technician performing service (random, with b = 3 levels). The three technicians for whom we have data are selected at random from a large number of technicians who work at the company.

data service;
  infile 'H:\stat512\datasets\ch19pr16.dat';
  input time tech make k;
  mt = make*10 + tech;
proc print data=service;
run;

proc glm
  data=service;
  class make tech;
  model time=make tech make*tech;
  random tech make*tech/test;
run;

The GLM Procedure
Dependent Variable: time

                                 Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model               8   1268.177778     158.522222      3.05   0.0101
Error              36   1872.400000      52.011111
Corrected Total    44   3140.577778

R-Square   Coeff Var   Root MSE   time Mean
0.403804    12.91936   7.211873    55.82222

Source       DF     Type I SS    Mean Square   F Value   Pr > F
make          2     28.311111      14.155556
tech          2     24.577778      12.288889      0.24   0.7908
make*tech     4   1215.288889     303.822222      5.84   0.0010

We have MSA = 14.16, MSB = 12.29, MSAB = 303.82, and MSE = 52.01.

The GLM Procedure

Source       Type III Expected Mean Square
make         Var(Error) + 5 Var(make*tech) + Q(make)
tech         Var(Error) + 5 Var(make*tech) + 15 Var(tech)
make*tech    Var(Error) + 5 Var(make*tech)

To test the fixed effect make, we must use the interaction:
F_A = MSA/MSAB = 14.16/303.82 = 0.05 with (2, 4) df, p = 0.955.

To test the random effect tech and the interaction, we use error:
F_B = MSB/MSE = 12.29/52.01 = 0.24 with (2, 36) df, p = 0.7908.
F_AB = MSAB/MSE = 303.82/52.01 = 5.84 with (4, 36) df, p = 0.0010.

The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: time

Source                  DF   Type III SS   Mean Square   F Value   Pr > F
make                     2     28.311111     14.155556      0.05   0.9550
tech                     2     24.577778     12.288889
Error: MS(make*tech)     4   1215.288889    303.822222

Source                  DF   Type III SS   Mean Square   F Value   Pr > F
make*tech                4   1215.288889    303.822222      5.84   0.0010
Error: MS(Error)        36   1872.400000     52.011111

Three-way models
• We can have zero, one, two, or three random effects (etc.).
• EMS indicate how to do tests.
• In some cases the situation is complicated and we need approximations (e.g., when all effects are random, use MSAB + MSAC − MSABC as the denominator when testing A).
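The hand calculations above follow a simple recipe: divide by the mean square whose expectation matches the numerator's expectation under H0. The minimal Python sketch below uses only the standard library, and all function names are hypothetical. It reproduces the disk-drive F ratios from the reported mean squares. It then shows the Satterthwaite approximation, a standard technique not spelled out in these notes, for the approximate degrees of freedom of a synthesized denominator such as MSAB + MSAC − MSABC. The three-way mean squares and degrees of freedom in the second part are made-up numbers for illustration only. p-values are omitted because the standard library has no F distribution.

```python
from __future__ import annotations


def mixed_two_way_f(msa: float, msb: float, msab: float, mse: float) -> dict[str, float]:
    """F ratios for a balanced two-way mixed model (A fixed, B random),
    using the denominators given in the notes."""
    return {
        "make (A, fixed): MSA/MSAB": msa / msab,
        "tech (B, random): MSB/MSE": msb / mse,
        "make*tech (AB): MSAB/MSE": msab / mse,
    }


def satterthwaite(terms: list[tuple[float, float, int]]) -> tuple[float, float]:
    """Combine mean squares as sum(c * MS) and return (combination, approx df).

    Each term is (coefficient c, mean square MS, degrees of freedom df);
    approx df = (sum c*MS)^2 / sum((c*MS)^2 / df).
    """
    combo = sum(c * ms for c, ms, _ in terms)
    denom = sum((c * ms) ** 2 / df for c, ms, df in terms)
    return combo, combo**2 / denom


# Disk-drive example: mean squares from the proc glm output above.
ratios = mixed_two_way_f(msa=14.155556, msb=12.288889, msab=303.822222, mse=52.011111)
for label, f in ratios.items():
    print(f"{label} = {f:.2f}")

# Hypothetical all-random three-way layout (made-up numbers, illustration only):
# the denominator for testing A is MSAB + MSAC - MSABC.
denom_ms, approx_df = satterthwaite([(1.0, 40.0, 4), (1.0, 30.0, 6), (-1.0, 10.0, 12)])
print(f"synthesized denominator = {denom_ms:.1f}, approx df = {approx_df:.2f}")
```

With the disk-drive mean squares, the first part reproduces F ≈ 0.05, 0.24, and 5.84. The approximation only makes sense when the synthesized combination is positive, and this sketch does not guard against a negative MSAB + MSAC − MSABC.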