# Experimental Statistics For Biological Sciences II ST 512


This 258 page Class Notes was uploaded by Jordane Kemmer on Thursday October 15, 2015. The Class Notes belongs to ST 512 at North Carolina State University taught by Jason Osborne in Fall. Since its upload, it has received 100 views. For similar materials see /class/223954/st-512-north-carolina-state-university in Statistics at North Carolina State University.


Date Created: 10/15/15

Rough table of contents for notes that accompany the lectures

- Simple linear regression (SLR), heights example
- Simple linear regression
- Correlation
- Probability model for SLR
- Multiple linear regression
- Matrix formulation for MLR
- Variable selection in MLR
- Partial and sequential sums of squares
- ANOVA as regression
- General linear model
- ANCOVA
- Lack of fit
- One-way ANOVA, contrasts
- Multiple comparisons
- Expected mean squares, part I
- Sample size computation
- Orthogonal polynomial contrasts
- Multi-factorial experiments
- 2 x 2 experiments
- More than 2 levels
- More than 2 factors
- Unbalanced example
- Blocking
- Latin squares
- One-way random effects
- Mixed models
- Split-plot experiments

### ST512R Fall 2008 Syllabus

- Instructor: Jason A. Osborne, Patterson Hall, Room 16. Phone: 515-1922
- Email: osborne@stat.ncsu.edu (for administrative matters only; for technical questions, please see me)
- Office hours: F 10:10 am - 12:00 pm (subject to change)
- Course website: www4.stat.ncsu.edu/~osborne/st512r
- Lecture: M, W, F 9:10 am - 10:00 am, HA 320
- Labs: HA G100 (SICL lab), Tues 4:30 pm - 5:45 pm; HA G100 (SICL lab), Thur 1:30 pm - 2:45 pm
- TAs: Arun Krishna, akrishn@ncsu.edu; Yu-Cheng Ku, yku2@ncsu.edu
- Text: *Statistical Research Methods in the Life Sciences* by P.V. Rao, 1998, 2007, Brooks/Cole
- Lecture notes: online or at SirSpeedy (834-8128)
- Computing: The statistical software package SAS will be used extensively. The labs are intended in part to facilitate the learning of this package. SAS is available free of charge for NCSU students.

Course description: ST512R is an applied course that introduces statistical methods based on linear models commonly used in designed experiments with continuous response variables. Examples include multiple linear regression, factorial designs, and split-plot experiments. It is a prerequisite for most advanced courses in statistics.

Topics (with chapters in the text):

- Simple and multiple linear regression (Chapters 10, 11)
- General linear
models, ANCOVA (Chapter 12)
- Factorial experiments (Chapter 13)
- Random and mixed effects models (Chapter 14)
- Blocking: the RCBD and Latin square (Chapter 15)
- Mixed models (Chapter 14)
- Repeated measures and split plots (Chapter 16)

### ST512R Fall 2008 Syllabus (continued): policies of the instructor

Attendance: While not mandatory, regular attendance and oral participation are encouraged. They will not be considered in the grading process.

Graded coursework:

- Homework: 5 assignments, the average of which counts 25% of total grade
- First midterm exam: September 26, 25% of total grade
- Second midterm exam: October 31, 25% of total grade
- Final exam: December 8, 8:00 am - 11:00 am, 25% of total grade
- Students achieving ≥ 90% of the total points will receive at least an A, ≥ 80% of the total points at least a B, and ≥ 70% of the total points at least a C.

Homework: Working together on homework is acceptable, but assignments must be submitted individually. If SAS printouts are included in work, output and page counts must be kept to a minimum, and pertinent elements of the output must be explained clearly.

Exams: Notes may be written on a standard-size sheet of paper (front and back) and used for exams, a single sheet for each exam. Aside from this allowance, exams are closed book. No make-up exams will be allowed. If a valid excuse for missing an exam is provided, the unmissed exams will be reweighted to account for a total of 80% of the grade.

Academic integrity: Academic misconduct, such as cheating on exams, will not be tolerated. Please see the NCSU policy at the link below:

http wwwncsuedupoliciesstudent ervicesnavigationphp studentdiscipline

## ST 512: Exptl Stats for Biol Sciences II (Osborne)

## Week 1: Simple linear regression, height example

Reading: Chapter 10

### The association between height of adults and their parents

[Figure: Scatterplot of heights; vertical axis: offspring height.]

Stigler (*History of Statistics*, pg. 285) gives Galton's famous data on heights of sons (columns, $y$) and average parents' height (rows, $x$), scaled to represent a male height (essentially
sons' heights versus fathers' heights). Taken from Dickey's web page.

| Parent \ Son | 61.7 | 62.2 | 63.2 | 64.2 | 65.2 | 66.2 | 67.2 | 68.2 | 69.2 | 70.2 | 71.2 | 72.2 | 73.2 | 73.7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 73.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 0 |
| 72.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 1 | 2 | 7 | 2 | 4 |
| 71.5 | 0 | 0 | 0 | 0 | 1 | 3 | 4 | 3 | 5 | 10 | 4 | 9 | 2 | 2 |
| 70.5 | 1 | 0 | 1 | 0 | 1 | 1 | 3 | 12 | 18 | 14 | 7 | 4 | 3 | 3 |
| 69.5 | 0 | 0 | 1 | 16 | 4 | 17 | 27 | 20 | 33 | 25 | 20 | 11 | 4 | 5 |
| 68.5 | 1 | 0 | 7 | 11 | 16 | 25 | 31 | 34 | 48 | 21 | 18 | 4 | 3 | 0 |
| 67.5 | 0 | 3 | 5 | 14 | 15 | 36 | 38 | 28 | 38 | 19 | 11 | 4 | 0 | 0 |
| 66.5 | 0 | 3 | 3 | 5 | 2 | 17 | 17 | 14 | 13 | 4 | 0 | 0 | 0 | 0 |
| 65.5 | 1 | 0 | 9 | 5 | 7 | 11 | 11 | 7 | 7 | 5 | 2 | 1 | 0 | 0 |
| 64.5 | 1 | 1 | 4 | 4 | 1 | 5 | 5 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| 64.0 | 1 | 0 | 2 | 4 | 1 | 2 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |

The points in the scatterplot have been jittered to convey the frequencies of heights in the dataset.

Consider a statistical model for these data, randomly sampled from some population of interest. In particular, choose a model which accounts for the apparent linear dependence of the mean height of sons on midparent height $X$. Let $Y_1, \ldots, Y_n$ denote the sons' heights. Given $X_i = x_i$,

$$Y_i = \beta_0 + \beta_1 x_i + E_i \quad \text{for } i = 1, \ldots, n \ (n = 928),$$

where $E_1, \ldots, E_n$ are

- independent,
- identically and
- normally distributed

random variables with mean 0 and error variance $\sigma^2$. Write $E_i \overset{iid}{\sim} N(0, \sigma^2)$. This implies

1. $\mu(x) = E(Y \mid X = x) = \beta_0 + \beta_1 x$
2. $\mathrm{Var}(Y \mid X = x) = \sigma^2$

Three unknown parameters, $\beta_0, \beta_1, \sigma^2$, quantify the whole population of interest.

Question: Suppose we ignore midparent height $x$. Consider estimating the mean $E(Y)$. Propose a model. Propose a method for obtaining a confidence interval for the mean height of the sons in the population from which these data were randomly sampled. Use the summary statistics in the PROC MEANS output to complete this naive analysis.

### Many questions to answer using regression analysis

1. What is the meaning, in words, of $\beta_1$?
2. True/false: (a) $\beta_1$ is a statistic (b) $\beta_1$ is a parameter (c) $\beta_1$ is unknown
3. What is the observed value of $\hat\beta_1$?
4. True/false: (a) $\hat\beta_1$ is a statistic (b) $\hat\beta_1$ is a parameter (c) $\hat\beta_1$ is unknown
5. Is $\hat\beta_1 = \beta_1$?
6. How much does $\hat\beta_1$ vary about $\beta_1$ from sample to sample? (Provide an estimate of the standard error, as well as an expression indicating how it was computed.)
7. What is a region of plausible values for $\beta_1$
suggested by the data?
8. What is the line that best fits these data, using the criterion that smallest sum of squared residuals is best?
9. How much of the observed variation in the heights of sons (the $y$-axis) is explained by this best line?
10. What is the estimated average height of sons whose midparent height is $x = 68$?
11. Is this the true average height in the whole population of sons whose midparent height is $x = 68$?
12. Under the model, what is the true average height of sons with midparent height $x = 68$?
13. What is the estimated standard deviation among the population of sons whose parents have midparent height $x = 68$?
14. Would you call this standard deviation a standard error?
15. What is the estimated standard deviation among the population of sons whose parents have midparent height $x = 72$? Bigger, smaller, or the same as that for $x = 68$? Is your answer obviously supported or refuted by inspection of the scatterplot?
16. What is the estimated standard error of the estimated average for sons with midparent height $x = 68$, $\hat\mu(68) = \hat\beta_0 + 68\hat\beta_1$? Provide an expression for this standard error.
17. Is the estimated standard error of $\hat\mu(72)$ bigger, smaller, or the same as that for $\hat\mu(68)$?
18. Is the observed linear association between son's height and midparent height strong? Report a test statistic.
19. What quantity can you use to describe or characterize the linear association between height and midparent height in the whole population? Is this a parameter or a statistic?
20. Let $Y$ denote the height of a male randomly sampled from this population and $X$ his midparent height. Is it true that the population correlation coefficient $\rho$ satisfies
    $$\rho = E\left[\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\left(\frac{X - \mu_X}{\sigma_X}\right)\right]?$$
    Define $\mu_Y, \sigma_Y, \mu_X, \sigma_X, \rho$. Parameters or statistics?
21. What are plausible values for $\rho$ suggested by the data?
22. Is $E_1, \ldots, E_{928} \overset{iid}{\sim} N(0, \sigma^2)$ a reasonable assumption?

### SAS code for the Galton data

```
options ls=75 nodate;
data Galton;
  array cdata{14};
  if _n_=1 then input cdata1-cdata14 @;   /* first data line: the 14 son heights */
  retain cdata1-cdata14;
  drop cdata1-cdata14 i;
  input parent @;
  do i=1 to 14;
    input count @;
    son=cdata{i};
    output;
  end;
cards;
61.7 62.2 63.2 64.2 65.2 66.2 67.2 68.2 69.2 70.2 71.2 72.2 73.2 73.7
73.0 0 0 0 0 0 0 0 0 0 0 0 1 3 0
72.5 0 0 0 0 0 0 0 1 2 1 2 7 2 4
71.5 0 0 0 0 1 3 4 3 5 10 4 9 2 2
70.5 1 0 1 0 1 1 3 12 18 14 7 4 3 3
69.5 0 0 1 16 4 17 27 20 33 25 20 11 4 5
68.5 1 0 7 11 16 25 31 34 48 21 18 4 3 0
67.5 0 3 5 14 15 36 38 28 38 19 11 4 0 0
66.5 0 3 3 5 2 17 17 14 13 4 0 0 0 0
65.5 1 0 9 5 7 11 11 7 7 5 2 1 0 0
64.5 1 1 4 4 1 5 5 0 2 0 0 0 0 0
64.0 1 0 2 4 1 2 2 1 1 0 0 0 0 0
;
proc print data=Galton(obs=100);
run;

/* expand the frequency table to one row per son */
data big;
  set galton;
  drop j count;
  do j=1 to count; output; end;
proc print data=big(obs=20);
proc means; var son parent;

data questions;  /* these values used for prediction or estimation at x=68, x=72 */
  input parent son;
cards;
68 .
72 .
;
/* the two data lines above were lost in the source; 68 and 72 with son missing are assumed */
data big; set big questions;
run;

proc reg;
  model son=parent/clb;
  output out=out1 residual=r p=yhat ucl=pihigh lcl=pilow
         uclm=cihigh lclm=cilow stdp=stdmean;
data questions; set out1;
  if son=.;   /* assumed subsetting statement; garbled in the source */
proc print;
  title "questions regarding prediction, estimation when x=68, x=72";
run;

/* confidence interval for rho by Fisher's z transformation */
data fisherz;
  n=928;
  r=sqrt(0.2105);
  rratio=(1+r)/(1-r);
  z=probit(0.975);
  expon=probit(0.975)/sqrt(n-3);
  rlow=(rratio*exp(-2*expon)-1)/(rratio*exp(-2*expon)+1);
  rhigh=(rratio*exp(2*expon)-1)/(rratio*exp(2*expon)+1);
run;
proc print; run;

goptions dev=pslepsf colors=(black);
symbol1 i=rl value=dot;
proc gplot; plot son*parent; run; quit;

proc univariate data=out1 normal plot;
  title "residual analysis";
  var r;
run;
```

### SAS output

```
The SAS System
The MEANS Procedure

Variable     N          Mean       Std Dev       Minimum       Maximum
son        928    68.0884698     2.5179414    61.7000000    73.7000000
parent     928    68.3081897     1.7873334    64.0000000    73.0000000

The REG Procedure
Dependent Variable: son

                        Analysis of Variance
                              Sum of         Mean
Source             DF        Squares       Square    F Value    Pr > F
Model               1     1236.93401   1236.93401     246.84    <.0001
Error             926     4640.27261      5.01109
Corrected Total   927     5877.20663

Root MSE            2.23855    R-Square    0.2105
Dependent Mean     68.08847    Adj R-Sq    0.2096
Coeff Var           3.28770

                        Parameter Estimates
                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|
Intercept    1     23.94153     2.81088       8.52      <.0001
parent       1      0.64629     0.04114      15.71      <.0001

Variable    DF    95% Confidence Limits
Intercept    1    18.42510    29.45796
parent       1     0.56556     0.72702

questions regarding prediction, estimation when x=68, x=72

Obs  parent  son     yhat   stdmean    cilow   cihigh    pilow   pihigh  r
  1      68    .  67.8893   0.07457  67.7429  68.0356  63.4936  72.2849  .
  2      72    .  70.4745   0.16871  70.1434  70.8056  66.0688  74.8801  .

Obs    n       r      rratio      z        expon      rlow      rhigh
  1   928   0.45880   2.69551   1.95996   0.064443   0.40645   0.50815
```

### Answers to questions from simple linear regression analysis of Galton's height data

1. Change in average son's height (inches) per one-inch increase in midparent height in the whole population.
2. $\beta_1$ is an unknown parameter.
3. $\hat\beta_1 = 0.65$ son inches per midparent inch (from output).
4. $\hat\beta_1 = 0.65$ is an observed value of a statistic.
5. $\beta_1$ is the slope of the population mean; $\hat\beta_1$ is the slope from the SLR of the observed data. $\hat\beta_1 = \beta_1$ is unlikely.
6. $\widehat{SE}(\hat\beta_1) = 0.04$ (from output).
7. Add and subtract about 2 SE to get $(0.57, 0.73)$.
8. $\hat y = 23.9 + 0.65x$
9. $r^2 = 21\%$
10. $\hat\mu(68) = \hat\beta_0 + 68\hat\beta_1 = 67.9$ (from output also).
11. Not sure, as $\mu(68) = \beta_0 + 68\beta_1$ is unknown.
12. $\mu(68) = \beta_0 + 68\beta_1$
13. $\sqrt{MS[E]} = 2.24$
14. Not a SE.
15. $\sqrt{MS[E]} = 2.24$. Assume homoscedasticity.
16. $\widehat{SE}(\hat\beta_0 + 68\hat\beta_1) = 0.07$. Expressions given by
    $$\widehat{SE}(\hat\mu(68)) = \sqrt{MS[E]\left(\frac{1}{n} + \frac{(68 - \bar x)^2}{\sum (x_i - \bar x)^2}\right)} = \sqrt{MS[E]\,(1, 68)(X'X)^{-1}(1, 68)'}$$
    where $X$ is the $928 \times 2$ design matrix.
17. $\widehat{SE}(\hat\mu(72)) > \widehat{SE}(\hat\mu(68))$
18. $r = \sqrt{0.21} = 0.46$: moderate, positive.
19. $\rho$ is a population correlation coefficient.
20. True. These parameters describe the bivariate population of son and midparent heights.
21. Using the complicated expression in Rao and in the notes, the confidence interval is
    $$\left(\frac{\frac{1+r}{1-r}e^{-2z_{\alpha/2}/\sqrt{n-3}} - 1}{\frac{1+r}{1-r}e^{-2z_{\alpha/2}/\sqrt{n-3}} + 1},\ \frac{\frac{1+r}{1-r}e^{2z_{\alpha/2}/\sqrt{n-3}} - 1}{\frac{1+r}{1-r}e^{2z_{\alpha/2}/\sqrt{n-3}} + 1}\right)$$
    or $0.41 < \rho < 0.51$.
22. Residuals reasonably symmetric, no heavy tails.

## Week 2: Simple linear regression

Reading: Ch. 10

### An example: the association between corn yield and rainfall

Yields $y$ (in bushels/acre) on corn raised in six midwestern states from 1890 to 1927, recorded with rainfall $x$ (inches/yr): $y_1, \ldots, y_{38}$ and $x_1, \ldots, x_{38}$.

| Year | 1890 | 1891 | 1892 | 1893 | 1894 | 1895 | 1896 | 1897 | 1898 | 1899 |
|---|---|---|---|---|---|---|---|---|---|---|
| Yield | 24.5 | 33.7 | 27.9 | 27.5 | 21.7 | 31.9 | 36.8 | 29.9 | 30.2 | 32 |
| Rainfall | 9.6 | 12.9 | 9.9 | 8.7 | 6.8 | 12.5 | 13 | 10.1 | 10.1 | 10.1 |

| Year | 1900 | 1901 | 1902 | 1903 | 1904 | 1905 | 1906 | 1907 | 1908 | 1909 |
|---|---|---|---|---|---|---|---|---|---|---|
| Yield | 34 | 19.4 | 36 | 30.2 | 32.4 | 36.4 | 36.9 | 31.5 | 30.5 | 32.3 |
| Rainfall | 10.8 | 7.8 | 16.2 | 14.1 | 10.6 | 10 | 11.5 | 13.6 | 12.1 | 12 |

| Year | 1910 | 1911 | 1912 | 1913 | 1914 | 1915 | 1916 | 1917 | 1918 | 1919 |
|---|---|---|---|---|---|---|---|---|---|---|
| Yield | 34.9 | 30.1 | 36.9 | 26.8 | 30.5 | 33.3 | 29.7 | 35 | 29.9 | 35.2 |
| Rainfall | 9.3 | 7.7 | 11 | 6.9 | 9.5 | 16.5 | 9.3 | 9.4 | 8.7 | 9.5 |

| Year | 1920 | 1921 | 1922 | 1923 | 1924 | 1925 | 1926 | 1927 |
|---|---|---|---|---|---|---|---|---|
| Yield | 38.3 | 35.2 | 35.5 | 36.7 | 26.8 | 38 | 31.7 | 32.6 |
| Rainfall | 11.6 | 12.1 | 8 | 10.7 | 13.9 | 11.3 | 11.6 | 10.4 |

A scatterplot provides some indication of an association between $x$ and $y$. In particular, yields increase with rainfall.

[Figure: Corn yields from 1890-1927; horizontal axis: $x$, inches of rain.]

Some questions:

- How can we describe the association between yield and rainfall? Does it appear linear?
- How can we measure the strength of the linear association?
- To what degree is the variability in yield described or explained by its association with rainfall?
- How can we use this association to estimate average yield given a certain level of rainfall?
- How can we use this association to predict future yield if we have an idea about what the rainfall will be?

### Correlation

Definition: The sample correlation coefficient $r_{xy}$ of the paired data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is defined by

$$r_{xy} = \frac{\sum (x_i - \bar x)(y_i - \bar y)/(n-1)}{\sqrt{\sum (x_i - \bar x)^2/(n-1)\ \sum (y_i - \bar y)^2/(n-1)}} = \frac{s_{xy}}{s_x s_y}$$

$s_{xy}$ is called the sample covariance of $x$ and $y$:

$$s_{xy} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{n - 1}$$

Some properties of $r_{xy}$:

- $r_{xy}$ is a measure of the linear association between $x$ and $y$ in a dataset.
- Correlation coefficients are always between $-1$ and $1$: $-1 \le r_{xy} \le 1$.
- The closer $r_{xy}$ is to 1, the stronger the positive linear association between $x$ and $y$.
- The closer $r_{xy}$ is to $-1$, the stronger the negative linear association ($y$ tends to be smaller than average when $x$ is bigger than average).
- The bigger $|r_{xy}|$, the stronger the linear association.
- If $|r_{xy}| = 1$, then $x$ and $y$ are said to be perfectly correlated.

Summary statistics for corn yields data:

$$\bar x = 10.8, \quad s_x^2 = 5.13, \quad s_x = 2.27, \qquad \bar y = 31.9, \quad s_y^2 = 19.0, \quad s_y = 4.36$$

$$s_{xy} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{n - 1} = \frac{147.3}{38 - 1} = 3.98$$

Applying the formula for $r_{xy}$, we get

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{3.98}{\sqrt{5.13 \times 19.0}} = \frac{3.98}{9.87} = 0.40$$

### The population correlation coefficient $\rho$

Just as $\bar x$ can be used to say
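things about a population mean $\mu$, $r_{xy}$ can be used for inference about the population correlation coefficient $\rho_{xy}$. This parameter refers to the correlation among $x$ and $y$ in the population from which the sample was drawn:

$$\rho = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right]$$

A test statistic useful for inference about $\rho$ is

$$Z = \sqrt{n - 3}\ \frac{1}{2}\left(\log\frac{1 + R}{1 - R} - \log\frac{1 + \rho}{1 - \rho}\right)$$

Asymptotically, $Z$ has the standard normal $N(0, 1)$ distribution, so that it can be used to derive methods of inference for $\rho$ (testing and confidence intervals).

Under $H_0: \rho = 0$,

$$Z = \frac{1}{2}\sqrt{n - 3}\ \log\frac{1 + R}{1 - R} \ \overset{\cdot}{\sim}\ N(0, 1)$$

A large-sample test of $H_0$ with level $\alpha$ then rejects $H_0$ whenever

$$\frac{1}{2}\sqrt{n - 3}\ \log\frac{1 + R}{1 - R} > z_{\alpha/2} \quad \text{or} \quad \frac{1}{2}\sqrt{n - 3}\ \log\frac{1 + R}{1 - R} < -z_{\alpha/2}$$

where $z_\alpha$ satisfies $\alpha = \Pr(Z > z_\alpha)$ with $Z \sim N(0, 1)$.

An approximate $100(1 - \alpha)\%$ confidence interval for $\rho$ can be obtained by inverting the Fisher transformation

$$\psi = \frac{1}{2}\log\frac{1 + \rho}{1 - \rho}$$

The probability statement

$$1 - \alpha = \Pr\left(-z_{\alpha/2} < \sqrt{n - 3}\left(\frac{1}{2}\log\frac{1 + R}{1 - R} - \psi\right) < z_{\alpha/2}\right)$$

can be rearranged to yield a $100(1 - \alpha)\%$ confidence interval for $\psi$:

$$\frac{1}{2}\log\frac{1 + R}{1 - R} \pm \frac{z_{\alpha/2}}{\sqrt{n - 3}}$$

Note that $\rho$ and $\psi$ are related by

$$\rho = \frac{e^{2\psi} - 1}{e^{2\psi} + 1}$$

Evaluating $\rho$ at the limits for $\psi$ leads to the interval

$$\left(\frac{\frac{1+r}{1-r}e^{-2z_{\alpha/2}/\sqrt{n-3}} - 1}{\frac{1+r}{1-r}e^{-2z_{\alpha/2}/\sqrt{n-3}} + 1},\ \frac{\frac{1+r}{1-r}e^{2z_{\alpha/2}/\sqrt{n-3}} - 1}{\frac{1+r}{1-r}e^{2z_{\alpha/2}/\sqrt{n-3}} + 1}\right)$$

For the corn yields data, $r = 0.4$ and $n = 38$, and a 95% interval is given by $0.09 < \rho < 0.64$. There is a one-to-one correspondence between testing and interval estimation here, so that $H_0: \rho = 0$ would be rejected at $\alpha = 0.05$.

Exercises:

1. Examine the butterfat and temperature data plotted in the figure below. Is there evidence of linear association? The sample correlation coefficient is $r = -0.45$, based on $n = 20$ randomly sampled days. Carry out an appropriate test. Obtain a 95% confidence interval for the population correlation coefficient describing the linear association between butterfat and temperature.
2. Suppose that two variables $X$ and $Y$ have correlation $\rho = 0.6$. What is the probability that a random sample of $n = 30$ observations from this bivariate population will yield a sample correlation coefficient of $r = 0.7$ or higher?

Some example scatterplots ($r_1 = 0.04$ and $r_2 = -0.45$):

[Figure: two example scatterplots. Panel "Resolution run 5k", plotted against age; panel "Butterfat data", plotted against temperature.]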
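The Fisher-transformation interval described in the correlation notes can be checked numerically. The following is a minimal Python sketch, not part of the course's SAS materials; the function names `fisher_z_interval` and `fisher_z_test` are hypothetical, and the inputs are the rounded values quoted in the notes ($r = 0.40$, $n = 38$ for the corn data; $r = \sqrt{0.2105}$, $n = 928$ for the Galton data). It uses the identities $\psi = \tanh^{-1}(\rho)$ and $\rho = \tanh(\psi)$, which are the same as the logarithmic and exponential expressions given there.

```python
import math
from statistics import NormalDist


def fisher_z_interval(r, n, conf=0.95):
    """Approximate confidence interval for rho, inverting Fisher's transformation."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    psi_hat = math.atanh(r)  # (1/2) log((1 + r) / (1 - r))
    half_width = z_crit / math.sqrt(n - 3)
    # tanh(psi) = (exp(2 psi) - 1) / (exp(2 psi) + 1) maps the limits back to rho
    return math.tanh(psi_hat - half_width), math.tanh(psi_hat + half_width)


def fisher_z_test(r, n):
    """Large-sample Z statistic for H0: rho = 0."""
    return math.sqrt(n - 3) * math.atanh(r)


# Corn yields: r = 0.40, n = 38 (values quoted in the notes)
corn_lo, corn_hi = fisher_z_interval(0.40, 38)
corn_z = fisher_z_test(0.40, 38)

# Galton heights: r = sqrt(R-square) = sqrt(0.2105), n = 928
galton_lo, galton_hi = fisher_z_interval(math.sqrt(0.2105), 928)

print(f"corn:   {corn_lo:.2f} < rho < {corn_hi:.2f}, Z = {corn_z:.2f}")
print(f"galton: {galton_lo:.5f} < rho < {galton_hi:.5f}")
```

With these inputs the corn interval rounds to the $0.09 < \rho < 0.64$ reported in the notes, and the Galton limits agree with the `rlow` and `rhigh` values in the SAS output.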
### An exercise

Label the four plots below with the four sample correlation coefficients:

1. $r = 0.3$
2. $r = 0.7$
3. $r = 0.1$
4. $r = 0.6$

[Figure: four scatterplots of $y$ versus $x$.]

### Correlation does not imply causation

Famous examples of spurious correlations:

- A study finds a high positive correlation between coffee drinking and coronary heart disease. Newspaper reports say the fragrant essence of the roasted beans of *Coffea arabica* are a menace to public health.
- In a city, if you were to observe the amount of damage and the number of fire engines for enough recent fires, you would likely see a positive and significant correlation among these variables. Obviously, it would be erroneous to conclude that fire engines cause damage.
- Lurking variable: a third variable that is responsible for a correlation between two others (a.k.a. confounding factor). An example would be to assess the association between, say, the reading skills of children and other measurements taken on them, such as shoe size. There may be a statistically significant association between shoe size and reading skills, but that doesn't imply that one causes the other. Rather, both are positively associated with a third variable, age.
- Among 50 countries examined in a dietary study, high positive correlation among fat intake and cancer (see the figure below). This example is taken from *Statistics* by Freedman, Pisani and Purves.

In
countries where people eat lots of fat, like the United States, rates of breast cancer and colon cancer are high. This correlation is often used to argue that fat in the diet causes cancer. How good is the evidence?

Discussion. If fat in the diet causes cancer, then the points in the diagram should slope up, other things being equal. So the diagram is some evidence for the theory. But the evidence is quite weak, because other things aren't equal. For example, the countries with lots of fat in the diet also have lots of sugar. A plot of colon cancer rates against sugar consumption would look just like figure 8, and nobody thinks that sugar causes colon cancer. As it turns out, fat and sugar are relatively expensive. In rich countries, people can afford to eat fat and sugar rather than starchier grain products. Some aspects of the diet in these countries, or other factors in the life-style, probably do cause certain kinds of cancer and protect against other kinds. So far, epidemiologists can identify only a few of these factors with any real confidence. Fat is not among them.

(p. 152, *Statistics* by Freedman, Pisani, Purves and Adhikari)

[Figure 8: Cancer rates plotted against fat in the diet, for a sample of countries; horizontal axis: fat intake per capita per day (grams).]

### A linear model for regression

Observe $n$ independent pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

A probabilistic model for $Y$ conditional on $X = x$:

$$Y_i = \underbrace{\beta_0 + \beta_1 x_i}_{\text{deterministic component}} + \underbrace{E_i}_{\text{random error component}}$$

where $E_1, \ldots, E_n$ are independent and identically distributed normal random variables with mean 0 and variance $\sigma^2$. Write $E_i \overset{iid}{\sim} N(0, \sigma^2)$. Note that
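this implies

1. $\mu(x) = E(Y \mid X = x) = \beta_0 + \beta_1 x$
2. $\mathrm{Var}(Y \mid X = x) = \sigma^2$

Definitions:

- response or dependent variable $Y$: left side of regression equation
- independent variable or predictor variable $X$: right side
- intercept term $\beta_0 = E(Y \mid X = 0)$: where $\mu(x)$ crosses the $y$-axis
- slope term $\beta_1$: average change in $E(Y)$ per unit increase in $x$
- error variance $\sigma^2$

$\beta_0$, $\beta_1$ and $\sigma^2$ are modelled as fixed, unknown parameters which can be estimated from the data using simple linear regression.

Nonlinear regression: other models for $E(Y \mid X = x)$, such as $\mu(x) = \beta_0 x^{\beta_1}$.

### Fitting a linear model

Choose best values for $\beta_0, \beta_1$: choose $\hat\beta_0$ and $\hat\beta_1$ so that

$$SS[E] = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 = \sum_{i=1}^n (y_i - \hat y_i)^2$$

is minimized. These are least squares (LS) estimates.

Definitions:

- Predicted value of response $Y_i$ given $X = x_i$: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$
- Residual for the $i$th observation: $e_i = y_i - \hat y_i$

Elementary calculus can show that the $\hat\beta_0$ and $\hat\beta_1$ which minimize the sum of squared residuals $SS[E]$ are given by

$$\hat\beta_1 = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = \frac{S_{xy}}{S_{xx}} \ \text{(Rao notation, p. 396)} = \frac{\text{sample covariance}}{\text{sample variance}} = \frac{s_{xy}}{s_x^2}$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

An unbiased estimate of $\sigma^2$ is given by

$$\hat\sigma^2 = \frac{SS[E]}{n - 2} = MS[E]$$

The line satisfying the equation $\hat y = \hat\beta_0 + \hat\beta_1 x$ is called the linear regression of $y$ on $x$. It is also called the least squares line.

For the corn yield data, recall that $\bar x = 10.8$ inches, $\bar y = 31.9$ bushels per acre, $s_x^2 = 5.13$, $s_y^2 = 19.0$, $s_{xy} = 3.98$ and $r_{xy} = 0.40$, so that

$$\hat\beta_1 = \frac{s_{xy}}{s_x^2} = \frac{3.98}{5.13} = 0.776 \ \text{(bushels per acre)/(inches per year)}$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 31.9 - 0.776(10.8) = 23.5 \ \text{bushels per acre}$$

yielding the least squares line $\hat y = 23.5 + 0.776x$. Note that

1. $\sum (y_i - \hat y_i) = 0$
2. $\sum (y_i - \hat y_i)^2$ is minimized.

### The ANOVA table from simple linear regression

Observed variability in the response $Y$ is measured by the total sum of squares $SS[TOT]$ and can be partitioned into independent components: the sum of squares due to regression, $SS[R]$, and the sum of squares due to error, $SS[E]$.

| Source | Sum of squares | df | Mean Square | F Ratio |
|---|---|---|---|---|
| Regression | $SS[R]$ | $1$ | $MS[R]$ | $MS[R]/MS[E]$ |
| Error | $SS[E]$ | $n - 2$ | $MS[E]$ | |
| Total | $SS[TOT]$ | $n - 1$ | | |

The sums of squares are defined by

$$SS[TOT] = \sum (y_i - \bar y)^2$$

$$SS[R] = \sum (\hat y_i - \bar y)^2 = \hat\beta_1^2 \sum (x_i - \bar x)^2$$

$$SS[E] = \sum (y_i - \hat y_i)^2 = SS[TOT] - SS[R]$$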
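The least squares formulas and the sums-of-squares partition can be checked against the corn yield summary statistics. The sketch below is illustrative Python rather than the course's SAS; the helper name `slr_from_summary` is hypothetical, and the inputs are the rounded values quoted in the notes ($n = 38$, $\bar x = 10.8$, $\bar y = 31.9$, $s_x^2 = 5.13$, $s_y^2 = 19.0$, $s_{xy} = 3.98$), so results agree with a fit to the raw data only up to rounding.

```python
def slr_from_summary(n, x_bar, y_bar, s2_x, s2_y, s_xy):
    """Least squares fit and ANOVA quantities for SLR, from summary statistics only."""
    b1 = s_xy / s2_x                 # slope: sample covariance / sample variance of x
    b0 = y_bar - b1 * x_bar          # intercept
    s_xx = (n - 1) * s2_x            # sum of (x_i - x_bar)^2
    ss_tot = (n - 1) * s2_y          # sum of (y_i - y_bar)^2
    ss_reg = b1 ** 2 * s_xx          # SS[R] = b1^2 * sum of (x_i - x_bar)^2
    ss_err = ss_tot - ss_reg         # SS[E] = SS[TOT] - SS[R]
    ms_err = ss_err / (n - 2)        # MS[E], the estimate of sigma^2
    return {
        "b0": b0,
        "b1": b1,
        "ss_tot": ss_tot,
        "ss_reg": ss_reg,
        "ss_err": ss_err,
        "ms_err": ms_err,
        "f_ratio": ss_reg / ms_err,  # MS[R] / MS[E], with MS[R] = SS[R] / 1
        "r_square": ss_reg / ss_tot,
    }


# Rounded summary statistics for the corn yield data, as quoted in the notes
fit = slr_from_summary(n=38, x_bar=10.8, y_bar=31.9, s2_x=5.13, s2_y=19.0, s_xy=3.98)

for name, value in fit.items():
    print(f"{name:>8s} = {value:.3f}")
```

The slope and intercept reproduce $\hat\beta_1 = 0.776$ and $\hat\beta_0 = 23.5$, and `r_square` is close to $r_{xy}^2 = 0.40^2$, which illustrates that r-square equals the squared sample correlation in simple linear regression.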
The F ratio can be used to test for significance of regression, or to test the null hypothesis that the slope parameter $\beta_1$ is zero,

$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \ne 0$$

at level $\alpha$. The critical value for $F$ is the upper $\alpha$ percentile from the $F$ distribution with 1 numerator and $n - 2$ denominator degrees of freedom. This F test is equivalent to a T test based on the statistic we're about to discuss:

$$F = \frac{MS[R]}{MS[E]} = T^2 \quad \text{where} \quad T = \frac{\hat\beta_1}{\widehat{SE}(\hat\beta_1)}$$

The mean square for error $MS[E]$ is an unbiased estimator for $\sigma^2$, the common variance of the response variable $Y$ conditional on an observed independent variable $X$:

$$\sigma^2 = \mathrm{Var}(Y \mid X)$$

Here, "conditional" means "for those elements in the population with independent variable $X$". As such, it can be used to construct confidence intervals for $\beta_0$ and $\beta_1$. It is based on $n - 2$ degrees of freedom.

The ratio of $SS[R]$ to $SS[TOT]$ is called the coefficient of determination, or sometimes simply r-square. It represents the proportion of variation observed in the response variable $y$ which can be explained by its linear association with $x$. In simple linear regression, r-square is in fact equal to $r_{xy}^2$. (But this isn't the case in multiple regression.) It is also equal to the squared correlation between $y_i$ and $\hat y_i$. (This is the case in multiple regression.)

### Confidence intervals for $\beta_0, \beta_1$

Important results for the sampling distribution of $\hat\beta_0, \hat\beta_1$ given $x_1, \ldots, x_n$:

- unbiasedness: $E(\hat\beta_1 \mid x_1, \ldots, x_n) = \beta_1$ and $E(\hat\beta_0 \mid x_1, \ldots, x_n) = \beta_0$
- for normal data, $\hat\beta_0$ and $\hat\beta_1$ are also normally distributed, which leads to

$$\mathrm{Var}(\hat\beta_1 \mid x_1, \ldots, x_n) = \frac{\sigma^2}{\sum (x_i - \bar x)^2}$$

$$\mathrm{Var}(\hat\beta_0 \mid x_1, \ldots, x_n) = \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{\sum (x_i - \bar x)^2}\right)$$

Take square roots and substitute $MS[E]$ for $\sigma^2$ to get estimated standard errors, with $S_{xx} = \sum (x_i - \bar x)^2$:

$$\widehat{SE}(\hat\beta_1) = \sqrt{\frac{MS[E]}{S_{xx}}} \qquad \widehat{SE}(\hat\beta_0) = \sqrt{MS[E]\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)}$$

$100(1 - \alpha)\%$ confidence intervals for $\beta_1$ and $\beta_0$ are given by

$$\hat\beta_1 \pm t(n - 2, \alpha/2)\sqrt{\frac{MS[E]}{S_{xx}}}$$

$$\hat\beta_0 \pm t(n - 2, \alpha/2)\sqrt{MS[E]\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)}$$

Any hypothetical slope, like $H_0: \beta_1 = \text{slope}$, may be tested using the T statistic below, with $df = n - 2$:

$$T = \frac{\hat\beta_1 - \text{slope}}{\widehat{SE}(\hat\beta_1)}$$

### Confidence interval for $E(Y \mid X = x_0)$

The conditional mean $E(Y \mid X = x_0)$ can be estimated by evaluating the regression function $\mu(x_0)$ at the estimates $\hat\beta_0, \hat\beta_1$. The conditional variance of the expression isn't too difficult:

$$\mathrm{Var}(\hat\beta_0 + \hat\beta_1 x_0 \mid X = x_0) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)$$

This yields a confidence interval of the form

$$\hat\beta_0 + \hat\beta_1 x_0 \pm t(n - 2, \alpha/2)\sqrt{MS[E]\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)}$$

Exercise: derive these variances.

### The yield on corn by rainfall example

| Source | Sum of squares | df | Mean Square | F Ratio |
|---|---|---|---|---|
| Regression | 114 | 1 | 114 | 6.95 |
| Error | 591 | 36 | 16.4 | |
| Total | 705 | 37 | | |

The $\alpha = 0.05$ critical value is $F(1, 36, 0.05) = 4.11$. Therefore, there is a significant positive linear association between yield and rainfall.

For 95% CIs, use $t(36, 0.025) = 2.028$. For $\beta_1$, note that $S_{xx} = (n - 1)s_x^2 = 5.13(38 - 1) = 189.8$, so that a c.i. is given by

$$0.776 \pm 2.028\sqrt{\frac{16.4}{189.8}} \quad \text{or} \quad 0.776 \pm 2.028(0.294) \quad \text{or} \quad 0.776 \pm 0.596$$

For $\beta_0$:

$$23.5 \pm 2.028\sqrt{16.4\left(\frac{1}{38} + \frac{10.8^2}{189.8}\right)} \quad \text{or} \quad 23.5 \pm 2.028(3.24) \quad \text{or} \quad 23.5 \pm 6.57$$

[Figure: Corn yields with LS line.]

### Prediction

Often, prediction of the response variable $Y$ for a given value, say $x_0$, of the independent variable is of interest. In order to make statements about future values of $Y$, we need to take into account

- the sampling distribution of $\hat\beta_0$ and $\hat\beta_1$
- the randomness of a future value $Y$.

We've seen that the predicted value of $Y$ based on the linear regression is given by $\hat Y_0 = \hat\beta_0 + \hat\beta_1 x_0$. In order to form a 95% prediction interval, take

$$\hat Y_0 \pm t(n - 2, \alpha/2)\sqrt{MS[E]\left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)}$$

Example: Suppose that one year rainfall is $x_0 = 14$ inches, but that yield $Y_0$ from the six states hasn't been measured. Obtain a 95% prediction interval for $Y_0$ using the model.

$$23.5 + 0.776(14) \pm 2.028\sqrt{16.4\left(1 + \frac{1}{38} + \frac{(14 - 10.8)^2}{189.8}\right)}$$

or $34.4 \pm 8.5$, or $(25.9, 42.9)$.

The 95% prediction interval is $(25.9, 42.9)$. A 95% confidence interval for $E(Y \mid X = 14)$ is given by

$$23.5 + 0.776(14) \pm 2.028\sqrt{16.4\left(\frac{1}{38} + \frac{(14 - 10.8)^2}{189.8}\right)}$$

or $34.4 \pm 2.32$, or $(32.1, 36.7)$. What is the difference?

Exercise (taken from Dickey's notes): An industrial quality control expert takes 200 hourly measurements on an industrial furnace which is under control and finds that a 95% confidence interval for the mean temperature is $(500.35, 531.36)$. As a result, he tells management that the process should be declared out of control whenever hourly measurements fall outside this interval and, of course, is later fired for incompetence. (Why, and what should he have done?)

### A note of caution

Mark Twain, in *Life on the Mississippi*:

In the space of 176 years the Lower Mississippi has shortened itself 242 miles. That is an average of a trifle more than one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in 742 years from now the lower Mississippi will be one mile and three-quarters long, and Cairo, Ill. and New Orleans will have joined their streets together.

It is not safe to extrapolate the results of a linear regression for the purposes of prediction beyond the range of observed independent variables.

### Some data sets

1. Butterfat by temperature in cows

   - $y_j$ = average percent butterfat for 10 cows on date $j$
   - $x_j$ = temperature on date $j$
   - $n = 20$ successive days

   | Date | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
   |---|---|---|---|---|---|---|---|---|---|---|
   | $x_j$ | 64 | 65 | 65 | 64 | 61 | 55 | 39 | 41 | 46 | 59 |
   | $y_j$ | 4.65 | 4.58 | 4.67 | 4.60 | 4.83 | 4.55 | 5.14 | 4.71 | 4.69 | 4.65 |

   | Date | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
   |---|---|---|---|---|---|---|---|---|---|---|
   | $x_j$ | 56 | 56 | 62 | 37 | 37 | 45 | 57 | 58 | 60 | 55 |
   | $y_j$ | 4.36 | 4.82 | 4.65 | 4.66 | 4.95 | 4.60 | 4.68 | 4.65 | 4.6 | 4.46 |

2. Hybrid duck data

   - Mallard and Pintail ducks were crossed, yielding $n = 11$ second-generation males with attributes as given in the table.
   - $y_j$ = Behavioral index
   - $x_j$ = Plumage index
   - A 0 corresponds to a purely mallard phenotype and a 15 corresponds to a purely pintail phenotype.
   - The same scoring is used to quantify duck behavioral traits.

   15487914 157104911 14614 515 713 31011 7

3. Cricket data

   - $y_j$ = Chirps per second
   - $x_j$ = Temperature (°F)
   - Striped ground cricket

   | $y_j$ | 20 | 16 | 19.8 | 18.4 | 17.1 | 15.5 | 14.7 | 17.1 | 15.4 | 16.2 |
   |---|---|---|---|---|---|---|---|---|---|---|
   | $x_j$ | 88.6 | 71.6 | 93.3 | 84.3 | 80.6 | 75.2 | 69.7 | 82 | 69.4 | 83.3 |

   | $y_j$ | 15 | 17.2 | 16 | 17 | 14.4 |
   |---|---|---|---|---|---|
   | $x_j$ | 79.6 | 82.6 | 80.6 | 83.5 | 76.3 |

## Weeks 2-3: Multiple linear regression (MLR): intro, model selection

Reading: Ch. 11

### Multiple linear regression: an example

A random sample of students taking the same exam:

| IQ | Study TIME | GRADE |
|---|---|---|
| 105 | 10 | 75 |
| 110 | 12 | 79 |
| 120 | 6 | 68 |
| 116 | 13 | 85 |
| 122 | 16 | 91 |
| 130 | 8 | 79 |
| 114 | 20 | 98 |
| 102 | 15 | 76 |

Consider a regression model for the GRADE of subject $i$, $Y_i$, in which the mean of $Y_i$ is a linear function of two independent variables, $X_{i1}$ = IQ and $X_{i2}$ = Study TIME, for subjects $i = 1, \ldots, 8$:

$Y = \beta_0 + \beta_1 \text{IQ} + \beta_2 \text{TIME} + \text{error}$, or $Y_i$
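$= \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + E_i$, or

$$\begin{aligned} Y_1 &= \beta_0 + \beta_1 X_{11} + \beta_2 X_{12} + E_1 \\ Y_2 &= \beta_0 + \beta_1 X_{21} + \beta_2 X_{22} + E_2 \\ &\ \ \vdots \\ Y_8 &= \beta_0 + \beta_1 X_{81} + \beta_2 X_{82} + E_8 \end{aligned}$$

[Figure: Grade and study time example from ST 512 notes; plot of STUDY versus IQ, with each point labelled by its grade.]

### A multiple linear regression (MLR) model with $p$ independent variables

Let $p$ independent variables be denoted by $x_1, \ldots, x_p$.

- Observed values of the $p$ independent variables for the $i$th subject from the sample are denoted by $x_{i1}, x_{i2}, \ldots, x_{ip}$.
- The response variable for the $i$th subject is denoted by $Y_i$.
- For $i = 1, \ldots, n$, the MLR model for $Y_i$ is
  $$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + E_i$$
- As in SLR, $E_1, \ldots, E_n \overset{iid}{\sim} N(0, \sigma^2)$.

Least squares estimates of regression parameters minimize $SS[E]$:

$$SS[E] = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$$

$$\hat\sigma^2 = \frac{SS[E]}{n - p - 1}$$

Interpretations of regression parameters:

- $\sigma^2$ is the unknown error variance parameter.
- $\beta_0, \beta_1, \ldots, \beta_p$ are $p + 1$ unknown regression parameters.
- $\beta_0$: average response when $x_1 = x_2 = \cdots = x_p = 0$.
- $\beta_i$ is called a partial slope for $x_i$. It represents the mean change in $y$ per unit increase in $x_i$ with all other independent variables held fixed.

For this example, with $p = 2$ and $n = 8$: $\hat\beta_0 = 0.74$, $\hat\beta_1 = 0.47$, $\hat\beta_2 = 2.1$. What is the uncertainty associated with these parameter estimates?

### Matrix formulation of MLR

Let a $1 \times (p + 1)$ vector for the $p$ observed independent variables for individual $i$ be defined by

$$x_i' = (1, x_{i1}, x_{i2}, x_{i3}, \ldots, x_{ip})$$

The MLR model for $Y_1, \ldots, Y_n$ is given by

$$\begin{aligned} Y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_p x_{1p} + E_1 \\ Y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_p x_{2p} + E_2 \\ &\ \ \vdots \\ Y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_p x_{np} + E_n \end{aligned}$$

This system of $n$ equations can be expressed using matrices, $Y = X\beta + E$, where

- $Y$ denotes a response vector ($n \times 1$)
- $X$ denotes a design matrix ($n \times (p + 1)$)
- $\beta$ denotes a vector of regression parameters ($(p + 1) \times 1$)
- $E$ denotes an error vector ($n \times 1$)

Here, the error vector $E$ is assumed to follow a multivariate normal distribution with variance-covariance matrix $\sigma^2 I_n$. For individual $i$,

$$Y_i = x_i'\beta + E_i$$

Some simplified expressions ($a$ is a known $(p + 1) \times 1$ vector):

$$\hat\beta = (X'X)^{-1}X'Y$$

$$\mathrm{Var}(\hat\beta) = \sigma^2 (X'X)^{-1} = \Sigma$$

$$\widehat{\mathrm{Var}}(\hat\beta) = MS[E]\,(X'X)^{-1} = \hat\Sigma$$

$$\widehat{\mathrm{Var}}(a'\hat\beta) = a'\hat\Sigma a$$

What are the dimensions of each of these quantities?

- Rao calls $(X'X)^{-1}$ the $S$ matrix.
- $\hat\Sigma$ is the estimated variance-covariance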
matrix for the estimate of the regression parameter vector $\beta$.

Some more simplified expressions:

$$\hat Y = X\hat\beta = X(X'X)^{-1}X'Y = HY$$

$$e = Y - \hat Y = Y - X\hat\beta = (I - H)Y$$

- $\hat Y$ is called the vector of fitted or predicted values.
- $H = X(X'X)^{-1}X'$ is called the hat matrix.
- $e$ is the vector of residuals.

For the IQ, Study TIME example, with $p = 2$ independent variables and $n = 8$ observations, consider $X$, $Y$, $X'X$, $(X'X)^{-1}$, $(X'X)^{-1}X'Y$ and $X(X'X)^{-1}X'Y$:

$$X = \begin{pmatrix} 1 & 105 & 10 \\ 1 & 110 & 12 \\ 1 & 120 & 6 \\ 1 & 116 & 13 \\ 1 & 122 & 16 \\ 1 & 130 & 8 \\ 1 & 114 & 20 \\ 1 & 102 & 15 \end{pmatrix} \quad \text{and} \quad X'X = \begin{pmatrix} 8 & 919 & 100 \\ 919 & 106165 & 11400 \\ 100 & 11400 & 1394 \end{pmatrix}$$

$$(X'X)^{-1} = \begin{pmatrix} 28.90 & -0.23 & -0.22 \\ -0.23 & 0.0018 & 0.0011 \\ -0.22 & 0.0011 & 0.0076 \end{pmatrix}$$

$$(X'X)^{-1}X'Y = \begin{pmatrix} 0.74 \\ 0.47 \\ 2.10 \end{pmatrix}$$

$$SS[E] = e'e = (Y - \hat Y)'(Y - \hat Y) = 45.8, \qquad MS[E] = \frac{e'e}{df} = 9.15$$

$$\hat\Sigma = MS[E]\,(X'X)^{-1} = \begin{pmatrix} 264.45 & -2.07 & -2.05 \\ -2.07 & 0.017 & 0.010 \\ -2.05 & 0.010 & 0.070 \end{pmatrix}$$

Some questions (use the matrices above):

1. What is the estimate for $\beta_1$? Interpretation?
2. What is the standard error of $\hat\beta_1$?
3. Is $\beta_1 = 0$ plausible, while controlling for possible linear associations between test score and study time? ($t(0.025, 5) = 2.57$)
4. Estimate the mean grade among the population of ALL students with IQ = 113 who study TIME = 14 hours. Report a standard error and then a 95% confidence interval.

Some answers:

1. $\hat\beta_1 = 0.47$ (second element of $(X'X)^{-1}X'Y$) exam points per IQ point, for students studying the same amount.
2. $\sqrt{0.017} = 0.13$ (square root of the middle element of $\hat\Sigma$).
3. $H_0: \beta_1 = 0$. T statistic: $t = (\hat\beta_1 - 0)/\widehat{SE}(\hat\beta_1)$. The observed value is $t = 0.47/\sqrt{0.017} = 0.47/0.13 = 3.6 > 2.57$, so $\beta_1$ differs significantly from 0.
4. Unknown population mean: $\theta = \beta_0 + \beta_1(113) + \beta_2(14)$. Estimate: $\hat\theta = (1, 113, 14)\hat\beta = 83.6$.
   $$\widehat{\mathrm{Var}}((1, 113, 14)\hat\beta) = (1, 113, 14)\,\hat\Sigma\,(1, 113, 14)' = 1.3, \quad \text{or} \quad SE = \sqrt{1.3} = 1.14$$
   $\hat\theta \pm t(0.025, 5)\,SE$, or $83.6 \pm 2.57(1.14)$, or $(80.7, 86.6)$.

```
DATA GRADES;
  INPUT IQ STUDY GRADE @@;
CARDS;
105 10 75 110 12 79 120 6 68 116 13 85
122 16 91 130 8 79 114 20 98 102 15 76
;
DATA EXTRA;
  INPUT IQ STUDY GRADE;
CARDS;
113 14 .
;
DATA BOTH; SET GRADES EXTRA;
PROC REG;
  MODEL GRADE = IQ STUDY / P CLM XPX INV COVB;
```

```
The SAS System
The REG Procedure

                 Model Crossproducts X'X X'Y Y'Y
Variable      Intercept            IQ         STUDY         GRADE
Intercept             8           919           100           651
IQ                  919        106165         11400         74881
STUDY               100         11400          1394          8399
GRADE               651         74881          8399         53617

            X'X Inverse, Parameter Estimates, and SSE
Variable       Intercept              IQ           STUDY           GRADE
Intercept   28.898526711    -0.226082693    -0.224182192    0.7365546771
IQ          -0.226082693    0.0018460178    0.0011217122     0.473083715
STUDY       -0.224182192    0.0011217122    0.0076260404    2.1034362851
GRADE       0.7365546771     0.473083715    2.1034362851    45.759884688

                        Analysis of Variance
                              Sum of         Mean
Source             DF        Squares       Square    F Value    Pr > F
Model               2      596.11512    298.05756      32.57    0.0014
Error               5       45.75988      9.15198
Corrected Total     7      641.87500

                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|
Intercept    1      0.73655    16.26280       0.05      0.9656
IQ           1      0.47308     0.12998       3.64      0.0149
STUDY        1      2.10344     0.26418       7.96      0.0005

                     Covariance of Estimates
Variable       Intercept              IQ           STUDY
Intercept   264.47864999    -2.069103589    -2.051710248
IQ          -2.069103589     0.016894712     0.010265884
STUDY       -2.051710248     0.010265884    0.0697933458

                        Output Statistics
       Dependent  Predicted     Std Error
Obs     Variable      Value  Mean Predict       95% CL Mean       Residual
  1      75.0000    71.4447        1.9325    66.4770    76.4124     3.5553
  (abbreviated)
  8      76.0000    80.5426        1.9287    75.5847    85.5005    -4.5426
  9            .    83.6431        1.1414    80.7092    86.5771          .
```

### Model Selection

Let $x_1, x_2, x_3$ denote $p = 3$ independent variables. Consider several models:

1. $\mu(x_1, x_2, x_3) = E(Y \mid x_1, x_2, x_3) = \beta_0 + \beta_1 x_1$
2. $\mu(x_1, x_2, x_3) = E(Y \mid x_1, x_2, x_3) = \beta_0 + \beta_2 x_2$
3. $\mu(x_1, x_2, x_3) = E(Y \mid x_1, x_2, x_3) = \beta_0 + \beta_3 x_3$
4. $\mu(x_1, x_2, x_3) = E(Y \mid x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
5. $\mu(x_1, x_2, x_3) = E(Y \mid x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_3 x_3$
6. $\mu(x_1, x_2, x_3) = E(Y \mid x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
7. $\mu(x_1, x_2, x_3) = E(Y \mid x_1, x_2, x_3) = \beta_0 + \beta_2 x_2 + \beta_3 x_3$

"A is nested in B" means model A can be obtained by restricting (e.g. setting to 0) parameter values in model B.

True or false:

- Model 1 nested in Model 4
- Model 1 nested in Model 5
- Model 2 nested in Model 4
- Model 4 nested in Model 1
- Model 3 nested in Model 4
- Model 5 nested in Model 4

A nested in B: A is called reduced, B is called full.

- $p$ = number of regression parameters in full model
- $q$ = number of regression parameters in reduced model
- $p - q$ = number of regression parameters being tested

Recall

$$SS[R] = \sum (\hat y_i - \bar y)^2, \qquad SS[E] = \sum (y_i - \hat y_i)^2, \qquad SS[TOT] = \sum (y_i - \bar y)^2$$

### Model selection concepts

In comparing two models, suppose

- $\beta_1, \ldots, \beta_q$ in reduced model (A)
- $\beta_1, \ldots, \beta_q, \beta_{q+1}, \ldots, \beta_p$ in full model (B).

Comparison of models A and B amounts to
testing

$$H_0: \beta_{q+1} = \beta_{q+2} = \cdots = \beta_p = 0 \quad \text{(model A ok)}$$
$$H_1: \beta_{q+1}, \beta_{q+2}, \ldots, \beta_p \text{ not all } 0 \quad \text{(need model B)}$$

Let

$$F = \frac{(SS[R]_f - SS[R]_r)/(p - q)}{MS[E]_f} = \frac{(SS[E]_r - SS[E]_f)/(p - q)}{MS[E]_f}$$

where $r$ and $f$ abbreviate reduced and full, respectively. The difference in the numerator is called an extra regression sum of squares:

$$R(\beta_{q+1}, \beta_{q+2}, \ldots, \beta_p \mid \beta_0, \beta_1, \beta_2, \ldots, \beta_q)$$

(It is ok to suppress $\beta_0$ in these extra SS terms.)

Theory gives that if $H_0$ holds (model A is appropriate), $F$ behaves according to the F distribution with $p - q$ numerator and $n - p - 1$ denominator degrees of freedom.

Extra SS terms for comparing some of the nested models on the preceding page:

- Model 1 in model 4: $R(\beta_2, \beta_3 \mid \beta_1)$
- Model 2 in model 4: ?
- Model 3 in model 4: ?
- Model 1 in model 5: $R(\beta_3 \mid \beta_1)$
- Model 5 in model 4: ?

An example: How to measure body fat?

For each of $n = 20$ healthy individuals, the following measurements were made: bodyfat percentage $y_i$, triceps skinfold thickness $x_1$, thigh circumference $x_2$, midarm circumference $x_3$.

[Data listing for the 20 individuals; see the abbreviated data step below.]

Summary statistics:

    Symbol   Variable        mean   st. dev.
    y        Body fat        20.2        5.1
    x1       Triceps         25.3        5.0
    x2       Thigh Circ.     51.2        5.2
    x3       Midarm Circ.    27.6        3.6

[Figure: scatterplot matrix of y, x1, x2 and x3.]

Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0

                 y        x1        x2        x3
    y      1.00000   0.84327   0.87809   0.14244
                      <.0001    <.0001    0.5491
    x1     0.84327   1.00000   0.92384   0.45778
            <.0001              <.0001    0.0424
    x2     0.87809   0.92384   1.00000   0.08467
            <.0001    <.0001              0.7227
    x3     0.14244   0.45778   0.08467   1.00000
            0.5491    0.0424    0.7227

Marginal associations between $y$ and $x_1$ and between $y$ and $x_2$ are highly significant, providing evidence of a strong ($r \approx 0.85$) linear association between average bodyfat and triceps skinfold and between average bodyfat and thigh circumference.

Multicollinearity (linear associations among the independent variables) causes problems such as inflated sampling variances for $\hat\beta$.

    data bodyfat;
    input x1 x2 x3 y;
    cards;
    19.5 43.1 29.1 11.9
    24.7 49.8 28.2 22.8
    (data abbreviated)
    22.7 48.2 27.1 14.8
    25.2 51.0 27.5 21.1
    ;
    proc reg data=bodyfat;
    model y=x1 x2 x3;
    model y=x1;
    model y=x2;
    model y=x3;
    model y=x1 x2 x3/xpx i covb corrb;

Yields the following (abbreviated) output:

The SAS System
The REG Procedure
Parameter Estimates

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1            117.08469         99.78240      1.17     0.2578
    x1           1              4.33409          3.01551      1.44     0.1699
    x2           1             -2.85685          2.58202     -1.11     0.2849
    x3           1             -2.18606          1.59550     -1.37     0.1896

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1             -1.49610          3.31923     -0.45     0.6576
    x1           1              0.85719          0.12878      6.66     <.0001

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1            -23.63449          5.65741     -4.18     0.0006
    x2           1              0.85655          0.11002      7.79     <.0001

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1             14.68678          9.09593      1.61     0.1238
    x3           1              0.19943          0.32663      0.61     0.5491

Model Selection examples

In the bodyfat data, consider comparing the simple model that $Y$ depends only on $x_1$ (triceps) and not on $x_2$ (thigh) or $x_3$ (midarm) after accounting for $x_1$, versus the full model that it depends on all three:

- Model A: $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1$
- Model B: $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

or the null hypothesis $H_0: \beta_2 = \beta_3 = 0$ vs $H_1: \beta_2, \beta_3$ not both 0, after accounting for $x_1$.

$$F = \frac{(143.1 - 98.4)/2}{6.15} = \frac{22.4}{6.15} \approx 3.64$$

How many df are associated with this F ratio? Recall $n = 20$. The 95th percentile is $F(0.05, 2, 16) = 3.63$.

Q: Conclusion from this comparison of nested models?

- After accounting for variation in bodyfat explained by triceps, there is still some association between mean bodyfat and at least one of $x_2, x_3$ (thigh, midarm).
- Adding $x_2, x_3$ to the model leads to overfitting, a model too complex for its own good.

To get this F ratio in SAS, try

    proc reg data=bodyfat;
    model y=x1 x2 x3;
    test x2=0, x3=0;
    run;

PROC GLM can replace PROC REG to get the SUMS OF SQUARES for use in model selection, as in the following output:

The GLM Procedure

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               3      396.9846118   132.3282039     21.52   <.0001
    Error              16       98.4048882     6.1503055
    Corrected Total    19      495.3895000

    Source   DF      Type I SS   Mean Square   F Value   Pr > F
    x1        1    352.2697968   352.2697968     57.28   <.0001
    x2        1     33.1689128    33.1689128      5.39   0.0337
    x3        1     11.5459022    11.5459022      1.88   0.1896

    Source   DF    Type III SS   Mean Square   F Value   Pr > F
    x1        1    12.70489278   12.70489278      2.07   0.1699
    x2        1     7.52927788    7.52927788      1.22   0.2849
    x3        1    11.54590217   11.54590217      1.88   0.1896

    Parameter       Estimate   Standard Error   t Value   Pr > |t|
    Intercept    117.0846948      99.78240295      1.17     0.2578
    x1             4.3340920       3.01551136      1.44     0.1699
    x2            -2.8568479       2.58201527     -1.11     0.2849
    x3            -2.1860603       1.59549900     -1.37     0.1896

Note the agreement between p-values from Type III F tests and p-values from t tests from parameter estimates from MLR.

- Type I sums of squares: sequential (order of selection matters)
- Type III sums of squares: partial
- Type II sums of squares: partial (change in SSE due to adding term A to a model with all other terms not "containing" A)

In the output:

$$R(\beta_1 \mid \beta_0) = 352.3, \qquad R(\beta_2 \mid \beta_0, \beta_1) = 33.2, \qquad R(\beta_3 \mid \beta_0, \beta_1, \beta_2) = 11.5$$
$$R(\beta_1 \mid \beta_0, \beta_2, \beta_3) = 12.7, \qquad R(\beta_2 \mid \beta_0, \beta_1, \beta_3) = 7.5$$

Type III test for $\beta_j$: test of partial association between $y$ and $x_j$ after accounting for all other $x_i$.

Type III F-ratios from the bodyfat data for $\beta_1, \beta_2, \beta_3$, respectively:

$$F = \frac{12.7}{6.15} = 2.07, \qquad F = \frac{7.5}{6.15} = 1.22, \qquad F = \frac{11.5}{6.15} = 1.88$$

Partial effects significant? Use $F(0.95, 1, 16) = 4.49$.

Exercise: specify the comparison of nested models that corresponds to each of these F-ratios.

In GLM output, which models are the type I tests comparing?

1. Type I SS for $\beta_1$ from PROC GLM: appropriate for SLR of $y$ on $x_1$.
2. Type I SS for $\beta_2$ from PROC GLM: appropriate for test of association between $y$ and $x_2$ after accounting for $x_1$.
3. Type I test for $\beta_3$ from PROC GLM: same as type III test for $\beta_3$.

In all three of these tests, $MS[E]$ is computed from the full model.

Some model comparison examples:

1. Compare models 1 and 6.
2. Compare models 2 and 6.

For 1, use $R(\beta_2 \mid \beta_0, \beta_1)$ in the F ratio:

$$F = \frac{R(\beta_2 \mid \beta_0, \beta_1)}{MSE(M_6)} = \frac{33.2}{(495.4 - 352.3 - 33.2)/(20 - 2 - 1)} = \frac{33.2}{109.9/17} = 5.1$$

Note that $SS[E]_f = SS[Tot] - SS[R]_f$ and $SS[R]_f = SS[R]_r + R(\beta_2 \mid \beta_0, \beta_1)$.

$F(0.05, 1, 17) = 4.45$, so nested model 1 is rejected in favor of model 6: there is evidence ($p = 0.037$) of association between $y$ and $x_2$ after accounting for dependence on $x_1$.
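The arithmetic behind the comparison of models 1 and 6 can be checked outside SAS. The following is a minimal Python sketch, not part of the notes' SAS workflow: the function name `extra_ss_f` is hypothetical, and the inputs are the sums of squares printed in the PROC GLM output together with the tabled critical value 4.45.

```python
def extra_ss_f(sse_reduced, sse_full, n_tested, df_error_full):
    """F ratio for a reduced model nested in a full model.

    n_tested is p - q, the number of parameters set to 0 under H0;
    df_error_full is the error df of the full model (n - p - 1).
    """
    extra_ss = sse_reduced - sse_full        # extra regression sum of squares
    mse_full = sse_full / df_error_full      # MS[E] from the full model
    return (extra_ss / n_tested) / mse_full


# Values copied from the PROC GLM output for the bodyfat data (n = 20).
ss_tot = 495.3895                 # corrected total sum of squares
r_b1 = 352.2697968                # R(beta1 | beta0), Type I SS for x1
r_b2_given_b1 = 33.1689128        # R(beta2 | beta0, beta1), Type I SS for x2

sse_m1 = ss_tot - r_b1            # SSE for model 1 (x1 only)
sse_m6 = sse_m1 - r_b2_given_b1   # SSE for model 6 (x1 and x2)

f_ratio = extra_ss_f(sse_m1, sse_m6, n_tested=1, df_error_full=20 - 2 - 1)
f_crit = 4.45                     # F(0.05, 1, 17) as quoted in the notes

print(round(f_ratio, 2), f_ratio > f_crit)   # 5.13 True
```

The same function applies to any of the nested comparisons in this section once the two error sums of squares and the degrees of freedom are read off the output.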
To compare models 2 and 6, we need $SS[R]_r = R(\beta_2 \mid \beta_0) = 382.0$, which cannot be gleaned from the preceding output. You could also get it from $r^2_{y x_2} \times SS[Tot]$ or from running something like

    proc reg;
    model y=x1 x2/ss1 ss2;
    run;

The REG Procedure

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               2        385.43871     192.71935     29.80   <.0001
    Error              17        109.95079       6.46769
    Corrected Total    19        495.38950

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|    Type I SS   Type II SS
    Intercept    1            -19.17425          8.36064     -2.29     0.0348   8156.76050     34.01785
    X1           1              0.22235          0.30344      0.73     0.4737    352.26980      3.47289
    X2           1              0.65942          0.29119      2.26     0.0369     33.16891     33.16891

$$F = \frac{R(\beta_1 \mid \beta_0, \beta_2)/\Delta df}{MSE_f} = \frac{(SS[R]_f - SS[R]_r)/1}{6.5} = \frac{352.3 + 33.2 - 382.0}{6.5} = \frac{3.5}{6.5} = 0.5$$

Conclusions:

- $x_2$ gives you a little when you add it to the model with $x_1$
- $x_1$ gives you nothing when you add it to the model with $x_2$
- Take the model with $x_2$. (It has higher $r^2$ too.)

Note that all of these comparisons of nested models are easy to carry out using the TEST statement in PROC REG.

Another example: revisiting test scores and study times

Consider this sequence of analyses:

1. Regress GRADE on IQ.
2. Regress GRADE on IQ and TIME.
3. Regress GRADE on TIME, IQ and TI, where TI = TIME*IQ.

ANOVA (Grade on IQ)

    SOURCE   DF        SS        MS       F   p-value
    IQ        1   15.9393   15.9393   0.153      0.71
    Error     6   625.935    104.32

It appears that IQ has nothing to do with grade, but we did not look at study time. Looking at the multiple regression we get

The REG Procedure
Analysis of Variance

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               2        596.11512     298.05756     32.57   0.0014
    Error               5         45.75988       9.15198
    Corrected Total     7        641.87500

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1              0.73655         16.26280      0.05     0.9656
    IQ           1              0.47308          0.12998      3.64     0.0149
    study        1              2.10344          0.26418      7.96     0.0005

Now the test for dependence on IQ is significant ($p = 0.0149$). Why?

Exercise: Use $(X'X)^{-1}$ below and the relationship $t^2 = F$ to determine the type III sums of squares for IQ and study.

$$(X'X)^{-1} = \begin{pmatrix} 28.8985 & -0.2261 & -0.2242 \\ -0.2261 & 0.0018 & 0.0011 \\ -0.2242 & 0.0011 & 0.0076 \end{pmatrix}$$

The interaction model

The SAS System
The REG Procedure
Analysis of Variance

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               3        610.81033     203.60344     26.22   0.0043
    Error               4         31.06467       7.76617
    Corrected Total     7        641.87500

Parameter Estimates

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Type I SS   Type II SS
    Intercept    1             72.20608         54.07278      1.34     0.2527       52975     13.84832
    IQ           1             -0.13117          0.45530     -0.29     0.7876    15.93930      0.64459
    study        1             -4.11107          4.52430     -0.91     0.4149   580.17582      6.41230
    IQstudy      1              0.05307          0.03858      1.38     0.2410    14.69521     14.69521

Discussion of the interaction model

We call the product $I \times S$ = IQ*STUDY an interaction term. Our model is

$$\hat G = 72.21 - 0.13\,I - 4.11\,S + 0.0531\,I \times S.$$

Now if IQ = 100 we get

$$\hat G = (72.21 - 13.1) + (-4.11 + 5.31)S$$

and if IQ = 120 we get

$$\hat G = (72.21 - 15.7) + (-4.11 + 6.37)S.$$

Thus we expect an extra hour of study to increase the grade by 1.20 points for someone with IQ = 100 and by 2.26 points for someone with IQ = 120 if we use this interaction model. Since the interaction is not significant, we may want to go back to the simpler "main effects" model.

This example is taken from Dickey's ST512 notes.

Some questions about design matrices

Recall three models under consideration for the bodyfat data:

- $M_1$: $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1$
- $M_2$: $\mu(x_1, x_2, x_3) = \beta_0 + \beta_2 x_2$
- $M_6$: $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Q: $MSE(M_6) < MSE(M_1)$ and $MSE(M_6) < MSE(M_2)$, but the partial slopes have larger standard errors in $M_6$. Why?

Design matrices:

$$X_{M_6} = \begin{pmatrix} 1 & 19.5 & 43.1 \\ 1 & 24.7 & 49.8 \\ \vdots & \vdots & \vdots \\ 1 & 22.7 & 48.2 \\ 1 & 25.2 & 51.0 \end{pmatrix}, \qquad X_{M_1} = \begin{pmatrix} 1 & 19.5 \\ 1 & 24.7 \\ \vdots & \vdots \\ 1 & 22.7 \\ 1 & 25.2 \end{pmatrix}$$

and similarly for $X_{M_2}$.

$$(X'X)_{M_6} = \begin{pmatrix} ? & 506.1 & 1023.4 \\ & 13386.3 & 26358.7 \\ & & 52888.0 \end{pmatrix}$$

$$(X'X)_{M_1} = \begin{pmatrix} ? & ? \\ & ? \end{pmatrix}, \qquad (X'X)^{-1}_{M_1} = \begin{pmatrix} 1.39 & -0.053 \\ & 0.002 \end{pmatrix}$$

$$(X'X)_{M_2} = \begin{pmatrix} ? & ? \\ & ? \end{pmatrix}, \qquad (X'X)^{-1}_{M_2} = \begin{pmatrix} 5.08 & -0.098 \\ & 0.0019 \end{pmatrix}$$

$$(X'X)^{-1}_{M_6} = \begin{pmatrix} 10.8 & 0.29 & -0.35 \\ & 0.014 & -0.012 \\ & & 0.013 \end{pmatrix}$$

Q: Why is $Var(\hat\beta_0)$ bigger in $M_2$ than in $M_1$?

Recall the Resolution Run 5k race data:

    Obs   age   sex      race      pace
      1    28   M     16.6833   5.38333
      2    39   M     16.9500   5.46667
      3    41   M     17.1333   5.51667
      4    42   M     17.4000   5.61667
    (abbreviated)
    157    52   F     46.8833   15.1000
    158    10   F     53.6000   17.2667
    159    10   F     53.6167   17.2667
    160    81   M     54.3167   17.5000

Summary statistics ($n = 160$):

    Symbol   Variable   mean   st. dev.   variance
    y        Pace        9.1        2.2        5.0
    x        Age        35.1       14.7      216.5
[Figure: Resolution Run 5k, 1/1/2004. Scatterplot of pace versus age (20 to 80).]

Quadratic model for pace $Y$ as a function of age $x$:

$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + E_i \quad \text{for } i = 1, \ldots, 160$$

where $E_i \overset{iid}{\sim} N(0, \sigma^2)$.

- $\beta = (\beta_0, \beta_1, \beta_2)'$ is a vector of unknown regression parameters
- $\sigma^2$ is the unknown error variance of paces given age $x$

Compare this model with the (previously discarded) SLR model:

$$Y_i = \beta_0 + \beta_1 x_i + E_i \quad \text{for } i = 1, \ldots, 160$$

Q1: Does $\beta_1$ have the same interpretation in both models?

Q2: How can we compare the two models?

A2: Using F-ratios to compare nested models (see output next page):

$$F = \frac{R(\beta_2 \mid \beta_0, \beta_1)}{MS[E]_{full}} = \frac{SS[R]_{full} - SS[R]_{red}}{MS[E]_{full}} = \frac{113.6 - 1.1}{4.3}$$
$$\phantom{F} = \frac{(SS[E]_{red} - SS[E]_{full})/1}{MS[E]_{full}} = \frac{787.0 - 674.4}{4.3} = 26.2$$

with $F(0.05, 1, 157) = 3.90$. Since $26.2 \gg 3.9$, the linear model is implausible when compared to the quadratic model.

    /* age2 defined in data step as age*age */
    PROC REG;
    MODEL pace=age;
    MODEL pace=age age2/ss1;
    /* ss1 generates sequential sums of squares */
    /* only the 2nd model statement really necessary */
    RUN;

The SAS System
The REG Procedure

Model: MODEL1
Analysis of Variance

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               1          1.09650       1.09650      0.22   0.6396
    Error             158        786.99821       4.98100
    Corrected Total   159        788.09472

    Root MSE          2.23182   R-Square    0.0014
    Dependent Mean    9.12063   Adj R-Sq   -0.0049

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1              8.92271          0.45724     19.51     <.0001
    age          1              0.00564          0.01203      0.47     0.6396

Model: MODEL2
Analysis of Variance

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               2        113.64500      56.82250     13.23   <.0001
    Error             157        674.44972       4.29586
    Corrected Total   159        788.09472

    Root MSE          2.07265   R-Square   0.1442
    Dependent Mean    9.12063   Adj R-Sq   0.1333

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Type I SS
    Intercept    1             11.78503          0.70216     16.78     <.0001       13310
    age          1             -0.19699          0.04113     -4.79     <.0001     1.09650
    age2         1              0.00294       0.00057380      5.12     <.0001   112.54850

[Figure: Resolution Run 5k, 1/1/2004. Pace versus AGE (20 to 80).]
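The quadratic-versus-linear comparison can be verified directly from the two ANOVA tables above. This is a small Python sketch added for illustration (the variable names are not from the notes); it shows that the extra sum of squares is the same whether it is computed from the regression or the error sums of squares, and that the resulting F ratio agrees with the squared t statistic printed for age2.

```python
# Sums of squares copied from the MODEL1 (linear) and MODEL2 (quadratic) output.
ssr_linear, sse_linear = 1.09650, 786.99821
ssr_quadratic, sse_quadratic = 113.64500, 674.44972
df_error_quadratic = 157

# Extra sum of squares R(beta2 | beta0, beta1), computed two equivalent ways.
extra_from_ssr = ssr_quadratic - ssr_linear
extra_from_sse = sse_linear - sse_quadratic

mse_quadratic = sse_quadratic / df_error_quadratic
f_ratio = (extra_from_sse / 1) / mse_quadratic   # one parameter tested

t_age2 = 5.12          # printed t value for age2 in MODEL2
f_crit = 3.90          # F(0.05, 1, 157) as quoted in the notes

print(round(extra_from_ssr, 3), round(extra_from_sse, 3))
print(round(f_ratio, 1), round(t_age2 ** 2, 1), f_ratio > f_crit)
```

Both extra sums of squares match the Type I SS for age2 (112.5485), and the F ratio of about 26.2 equals the squared t statistic up to rounding of the printed values.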
Fitted model is

$$\hat\mu(x) = 11.785 - 0.197x + 0.00294x^2$$

or

$$\hat\mu(\text{age}) = 11.785 - 0.197\,\text{age} + 0.00294\,\text{age}^2.$$

Inference for response $Y$ given predictor $x_i$

A photography outfit specializing in portraits of children operates studios in $n = 21$ cities. They're considering expansion into other cities. For city $i$, define

- $x_{i1}$: thousands of people aged $\le 16$
- $x_{i2}$: community disposable income per capita
- $Y_i$: company sales

Given $x_1$ and $x_2$, a MLR model for these data is given by

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + E_i \quad \text{for } i = 1, \ldots, n$$

where errors are assumed iid normal with constant variance $\sigma^2$. For a city with $x_1, x_2$, the model for mean sales is

$$\mu(x_1, x_2) = E(Y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$

[Figure: scatterplot matrix of sales, targetpop and dincome.]

Summary statistics (using the SIMPLE option in the PROC REG statement):

The REG Procedure
Descriptive Statistics

    Variable            Sum        Mean   Uncorrected SS     Variance   Standard Deviation
    Intercept      21.00000     1.00000         21.00000            0                    0
    X1           1302.40000    62.01905            87708    346.71662             18.62033
    X2            360.00000    17.14286       6190.26000      0.94157              0.97035
    y            3820.00000   181.90476           721072   1309.81048             36.19130

Some questions: Consider cities with $x_{01}$ = 90k kids and $x_{02}$ = 18k $ per capita.

- Estimate the mean of the sales function among such cities, along with a standard error and 95% confidence interval.
- Obtain a 95% prediction interval for $y_0$, the sales which would be observed for such an individual city.

SAS generates $\hat\beta$ and $MSE \times (X'X)^{-1}$:

The SAS System

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1            -68.85707         60.01695     -1.15     0.2663
    X1           1              1.45456          0.21178      6.87     <.0001
    X2           1              9.36550          4.06396      2.30     0.0333

Covariance of Estimates

    Variable       Intercept             X1             X2
    Intercept   3602.0346743   8.7459395806   -241.4229923
    X1          8.7459395806   0.0448515096   -0.672442604
    X2          -241.4229923   -0.672442604   16.515755794

Moments of linear combinations of random vectors (Appendix B)

Let $W$ denote a $p \times 1$ random vector with mean $\mu_W$ and covariance matrix $\Sigma_W$. Suppose $a$ is a $p \times 1$ (fixed) vector of coefficients. Then

$$E(a'W) = a'\mu_W, \qquad Var(a'W) = a'\Sigma_W a.$$

See httpwwwstatncsuedupeopledickeyst512crsnotesnotes1htm for review of matrices and
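random vectors.

Inference for the mean response in MLR

Consider all cities with 90,000 youngsters and average disposable income of 18k. To estimate mean sales and report a standard error, take $x_0' = (1, 90, 18)$ and consider $\hat\mu(x_0) = x_0'\hat\beta$:

$$E(x_0'\hat\beta) = x_0'\beta, \qquad Var(x_0'\hat\beta) = x_0'\Sigma x_0.$$

Substitution of $\hat\beta$ and $\hat\Sigma = MSE\,(X'X)^{-1}$ gives the estimates

$$\hat\mu(x_0) = (1, 90, 18)\begin{pmatrix} -68.857 \\ 1.455 \\ 9.366 \end{pmatrix} = 230.6$$

$$\widehat{Var}(\hat\mu(x_0)) = (1, 90, 18)\begin{pmatrix} 3602.035 & 8.746 & -241.423 \\ 8.746 & 0.04485 & -0.67244 \\ -241.423 & -0.67244 & 16.516 \end{pmatrix}\begin{pmatrix} 1 \\ 90 \\ 18 \end{pmatrix} = 20.9$$

$$\widehat{SE}(\hat\mu(x_0)) = \sqrt{20.9} = 4.6$$

which can be obtained using PROC REG and the missing-y trick:

    Obs   targetpop   dincome   y   X0     X1     X2         p    semean   r
      1        90.0      18.0   .    1   90.0   18.0   230.632   4.55677   .

Partial correlations

The partial correlation coefficient for $x_1$ in the MLR

$$E(Y \mid x_1, x_2, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

is defined as the correlation coefficient between the residuals computed from the two regressions below:

$$Y = \beta_0 + \beta_2 x_2 + \cdots + \beta_p x_p + E$$
$$X_1 = \beta_0^* + \beta_2^* x_2 + \cdots + \beta_p^* x_p + E^*$$

Call these sets of residuals $e_{y \cdot 2,3,\ldots,p}$ and $e_{1 \cdot 2,3,\ldots,p}$, respectively. The partial correlation between $y$ and $x_1$ after accounting for the linear association between $y$ and $x_2, x_3, \ldots, x_p$ is defined as

$$r_{y1 \cdot 2,3,\ldots,p} = \text{correlation between } e_{y \cdot 2,3,\ldots,p} \text{ and } e_{1 \cdot 2,3,\ldots,p}.$$

The partial coefficient of determination is $r^2_{y1 \cdot 2,3,\ldots,p}$. Note (also see Figure 11.7) that

$$r^2_{y1 \cdot 2,3,\ldots,p} = \frac{R(\beta_1 \mid \beta_0, \beta_2, \ldots, \beta_p)}{SS[Tot] - R(\beta_2, \beta_3, \ldots, \beta_p \mid \beta_0)}.$$

Bodyfat data: compare models 1, 2 and 6 (ignore $x_3$)

In the listing below, py1 and py2 are the predicted values from models 1 and 2, ey1 and ey2 are the corresponding residuals, e21 is the residual from regressing $x_2$ on $x_1$, and e12 is the residual from regressing $x_1$ on $x_2$.

    Obs     X1     X2      y       py1        ey1        e21       py2       ey2       e12
      1   19.5   43.1   11.9   15.2190   -3.31903   -2.48145   13.2827   -1.3827   1.34939
      2   24.7   49.8   22.8   19.6764    3.12360   -0.78756   19.0215    3.7785   0.60956
      3   30.7   51.9   18.7   24.8195   -6.11952   -4.46384   20.8203   -2.1203   4.74782
      4   29.8   54.3   20.1   24.0481   -3.94805   -1.19739   22.8760   -2.7760   1.72013
      5   19.1   42.2   12.9   14.8762   -1.97616   -2.99637   12.5118    0.3882   1.74728
    (abbreviated)
     20   25.2   51.0   21.1   20.1050    0.99500   -0.06892   20.0494    1.0506   0.04571

The partial correlation coefficient between $y$ and $x_1$ after accounting for $x_2$ is $r_{y1 \cdot 2} = 0.17$, and the partial for $x_2$ after accounting for $x_1$ is $r_{y2 \cdot 1} = 0.48$. The partial coefficients of determination are $r^2_{y1 \cdot 2} = 0.03062$ and $r^2_{y2 \cdot 1} = 0.23176$.

Q: If you had to choose one variable or the other from $x_1$ and $x_2$, which would it be?

Q: Anything wrong with throwing both $x_1$ and $x_2$ in the final model?

Q: Write the coefficients of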
determination in terms of extra sums of squares using $R(\cdot \mid \cdot)$ notation.

Note: partial correlations are obtained in SAS using the PCORR2 option:

    Variable    DF   Squared Partial Corr Type II
    Intercept    1                              .
    X1           1                        0.03062
    X2           1                        0.23176

Partial regression plots

A plot of the residuals from the regression

$$Y = \beta_0 + \beta_2 x_2 + \cdots + \beta_p x_p + E$$

versus the residuals from the regression

$$X_1 = \beta_0^* + \beta_2^* x_2 + \cdots + \beta_p^* x_p + E^*$$

is called a partial regression plot for $x_1$, or a partial leverage plot of $x_1$ in the MLR. They can be generated

- in SAS/INSIGHT by clicking Analyze, then Fit (X Y), then the Output tab, then Plots: Partial Leverage
- using the PARTIAL option in the MODEL statement of PROC REG

[Figure: SAS output, The REG Procedure, "Partial Regression Residual Plot" (two pages).]

Q: What can these plots tell us?

A1: They convey info about linear associations between $y$ and a candidate independent variable $x_1$ after accounting for linear associations between $y$ and the other independent variables $x_2, \ldots, x_p$.

A2: They can convey info about nonlinear associations between $y$ and $x_1$ after accounting for linear associations with other variables.

A3: They can illuminate possible outliers.

Some exercises (hint: use matrix algebra or SAS)

1. Suppose you are a local 32 yr-old male runner. Regarding these data as randomly sampled from the population of all local runners, fit a quadratic regression function and use it to obtain an estimate of the mean 5k pace in your cohort of all 32 yr-old male runners. Report a standard error and 95% confidence interval.
2. Obtain a 95% prediction interval for your time if you are about to run the race.
3. Explain the difference between the two intervals in questions 1 and 2.
4. At what rate is $\mu$ changing with age? Estimate the appropriate function.
5. Estimate $\theta$, the peak age to run a 5k in the fastest time. Is $\theta$ a linear function of regression parameters? Can you obtain an unbiased estimate of the standard error of $\hat\theta$?

Residual diagnostics

Residuals can be plotted against independent variables to check for model
inadequacy.

Residuals can be plotted against predicted values to look for inhomogeneity of variance (heteroscedasticity).

The sorted residuals can be plotted against the normal inverse of the empirical CDF of the residuals in a normal plot to assess the normal distributional assumption. A nonlinear association in such a q-q plot indicates nonnormality.

1. Obtain the observed quantiles by ordering the residuals: $e_{(1)} \le e_{(2)} \le \cdots \le e_{(n)}$.
2. For each $i = 1, \ldots, n$, compute the expected quantile from $q_{(i)} = z\left(\frac{i}{n+1}\right)$, where $z(p)$ denotes the standard normal quantile with cumulative probability $p$.
3. Plot the ordered residuals on the vertical axis versus the ordered theoretical quantiles on the horizontal axis.

As an illustration, we'll obtain the $e_{(i)}$ and the $q_{(i)}$ for the data in a table. To do this, we'll need the ranks of the residuals: the rank of $e_{(i)}$ is $i$. The empirical cumulative probability associated with $e_{(i)}$ is

$$p_i = \frac{\text{Rank of } e_{(i)}}{N + 1}.$$

These can be used to obtain the corresponding theoretical quantiles $q_{(i)}$ via $q_{(i)} = z(p_i)$.

    proc reg simple outest=three;
    model pace=male age age2;
    output out=two p=yhat r=resid;
    run;
    proc rank out=two;
    ranks rankresid;
    var resid;
    run;
    data two; set two;
    ecdf=rankresid/130;
    q=probit(ecdf);
    run;
    proc print; var age pace yhat resid rankresid q; run;
    proc gplot; plot resid*q; run;

The SAS System

    Obs   age      pace      yhat      resid   rankresid
      1    21   5.26667   8.34197   -3.07531           2
      2    33   5.28333   7.88456   -2.60123           8
      3    19   5.40000   8.48871   -3.08871           1
      4    20   5.41667   8.41283   -2.99616           3
      5    20   5.48333   8.41283   -2.92949           5
    ...
     55    32   9.41667   7.89498    1.52169
     74    46   11.7500    8.2074    3.54260
     76    12   13.1667    9.1610    4.00572
    129    71   14.1333   12.4141    1.71922

(The listing also includes observations 38 and 52 and the q column; those entries are illegible here.)

An exercise: Match up letters a, b, c, d with the model violation.

[Figure: four panels (a), (b), (c), (d) of residual plots, shown against x and against predicted values, followed by four normal plots (a), (b), (c), (d) against Quantiles of Standard Normal.]

1. Heteroscedasticity
2. Nonlinearity
3. Nonnormality
4. Model fits

ST 512, Weeks 4-5: The general linear model
Reading: Ch. 12-13

ANOVA revisited

The following data come from a study investigating binding fraction for several antibiotics using $n = 20$ bovine serum samples:

    Antibiotic           Binding Percentage       Sample mean
    Penicillin G       29.6   24.3   28.5   32           28.6
    Tetracyclin        27.3   32.6   30.8   34.8         31.4
    Streptomycin        5.8    6.2   11      8.3          7.8
    Erythromycin       21.6   17.4   18.3   19           19.1
    Chloramphenicol    29.2   32.8   25     24.2         27.8

A completely randomized design (CRD) was used.

Q: Are the population means for these 5 treatments plausibly equal?

Q: Do these sample treatment means differ significantly?

[Figure: boxplots of binding fraction (5 to 35) by antibiotic.]

Modelling the binding fraction expt.

One model parameterizes antibiotic effects as differences from the mean:

$$Y_{ij} = \mu + \tau_i + E_{ij} \quad \text{for } i = 1, \ldots, 5 \text{ and } j = 1, \ldots, 4$$

where $E_{ij}$ are iid $N(0, \sigma^2)$ errors.

Unknown parameters:

- $\mu$: overall population mean (average of the 5 treatment population means)
- $\tau_i$: difference between the population mean for treatment $i$ and $\mu$
- $\sigma^2$: population variance of bf for a given antibiotic

To test $H_0: \tau_1 = \tau_2 = \cdots = \tau_5 = 0$, we just carry out one-way ANOVA:

    Source        df   Sum of squares   Mean Square    F
    Treatments     4             1481           370   41
    Error         15              136             9
    Total         19             1617

Conclusion? Use $F(0.05, 4, 15) = 3.06$.

Parameter estimates: $\hat\tau_1, \hat\tau_2, \hat\tau_3, \hat\tau_4, \hat\tau_5$ = ?; $\hat\mu = 22.935$.

Standard errors of parameter estimates?

Table for balanced one-way ANOVA

$Y_{ij}$ denotes the $j$th observation receiving level $i$ of a treatment factor with $t$ levels, for a total of $N$ observations.

    Source        df      Sum of squares   Mean Square                 F
    Treatments    t - 1   SS[T]            MS[T] = SS[T]/(t - 1)       F = MS[T]/MS[E]
    Error         N - t   SS[E]            MS[E] = SS[E]/(N - t)
    Total         N - 1   SS[Tot]

where

$$SS[T] = \sum_i \sum_j (\bar y_{i+} - \bar y_{++})^2, \qquad SS[E] = \sum_i \sum_j (y_{ij} - \bar y_{i+})^2, \qquad SS[Tot] = \sum_i \sum_j (y_{ij} - \bar y_{++})^2.$$

The linear model $\mu_{ij} = E(Y_{ij}) = \mu + \tau_i$ could be fit using MLR with 5
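indicator variables $x_1, \ldots, x_5$ for the 5 antibiotics. Let

$$x_j = \begin{cases} 1 & \text{if treatment } j \\ 0 & \text{else} \end{cases}$$

The MLR model is

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + E_i, \quad i = 1, \ldots, 20$$

where a design matrix $X$ of 1's and 0's of dimension $(20 \times 6)$ could be specified. Note that $\beta_0 = \mu$ and $\beta_j = \tau_j$.

- problem: $(X'X)^{-1}$ does not exist
- standard errors for parameter estimates can't be obtained
- model is overparameterized (6 parameters, 5 means)

A general linear model

Models which parameterize the effects of classification factors this way are general linear models. One-way ANOVA and linear regression models are general linear models. The linearity pertains to the parameters, not the explanatory variables.

Here, reparameterizing using $5 - 1$ indicator variables leads to a general linear model. Define $x_1, x_2, x_3, x_4$ as before. Then the MLR model is

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + E_i, \quad i = 1, \ldots, 20$$

where $E_i \overset{iid}{\sim} N(0, \sigma^2)$. The $X$ matrix looks like

$$X = \begin{pmatrix} 1_4 & 1_4 & 0_4 & 0_4 & 0_4 \\ 1_4 & 0_4 & 1_4 & 0_4 & 0_4 \\ 1_4 & 0_4 & 0_4 & 1_4 & 0_4 \\ 1_4 & 0_4 & 0_4 & 0_4 & 1_4 \\ 1_4 & 0_4 & 0_4 & 0_4 & 0_4 \end{pmatrix}$$

where $1_4$ and $0_4$ denote columns of four 1's and four 0's, so that $X$ is $20 \times 5$.

Remarks:

- $(X'X)^{-1}$ exists
- continuous covariates (as opposed to indicators) can be added and it is still a general linear model

For the one-way ANOVA,

$$\hat\beta = (X'X)^{-1}X'Y = \begin{pmatrix} 27.8 \\ 0.8 \\ 3.6 \\ -20.0 \\ -8.7 \end{pmatrix}$$

Estimates for the five treatment means are obtained by substitution of $\hat\beta$ into

$$\mu(x_1, x_2, x_3, x_4) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$$

$$\hat\mu(1, 0, 0, 0) = \hat\beta_0 + \hat\beta_1 = 28.6$$
$$\hat\mu(0, 1, 0, 0) = \hat\beta_0 + \hat\beta_2 = 31.4$$
$$\hat\mu(0, 0, 1, 0) = \hat\beta_0 + \hat\beta_3 = 7.8$$
$$\hat\mu(0, 0, 0, 1) = \hat\beta_0 + \hat\beta_4 = 19.1$$
$$\hat\mu(0, 0, 0, 0) = \hat\beta_0 = 27.8$$

(Compare with page 69.)

For standard errors, use

$$\hat\Sigma = MSE\,(X'X)^{-1} = \begin{pmatrix} 2.3 & -2.3 & -2.3 & -2.3 & -2.3 \\ & 4.5 & 2.3 & 2.3 & 2.3 \\ & & 4.5 & 2.3 & 2.3 \\ & & & 4.5 & 2.3 \\ & & & & 4.5 \end{pmatrix}$$

Let $a, b, c, d$ be defined by

$$a' = (1, 1, 0, 0, 0), \quad b' = (1, 0, 1, 0, 0), \quad c' = (1, 0, 0, 1, 0), \quad d' = (1, 0, 0, 0, 1).$$

Then

$$\hat\mu(1, 0, 0, 0) = \hat\beta_0 + \hat\beta_1 = a'\hat\beta, \quad \hat\mu(0, 1, 0, 0) = b'\hat\beta, \quad \hat\mu(0, 0, 1, 0) = c'\hat\beta, \quad \hat\mu(0, 0, 0, 1) = d'\hat\beta, \quad \hat\mu(0, 0, 0, 0) = \hat\beta_0$$

and

$$a'\hat\Sigma a = b'\hat\Sigma b = c'\hat\Sigma c = d'\hat\Sigma d = 2.3 = \widehat{Var}(\hat\beta_0)$$

so the estimated SE for any sample treatment mean is $\sqrt{2.3} = 1.5$. Recall from one-way ANOVA that

$$\widehat{SE}(\bar y_{i+}) = \sqrt{\frac{MS[E]}{n}} = 1.5.$$

A general linear model for 5k times of men AND women
Resolution Run, Jan 1, 2004, Centennial Campus

Quadratic model $\mu(x) = \beta_0 + \beta_1 x + \beta_2 x^2$ was used for the association between mean pace for male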
runners and age $x$. Consider modelling female race times as well. How could the model be extended to incorporate sex differences?

Let $x_1 = x$ and $x_2 = x^2$, and let an indicator variable $x_3$ be defined by

$$x_3 = \begin{cases} 1 & \text{female} \\ 0 & \text{male} \end{cases}$$

Some candidate models:

1. $\mu(x_1, x_2, x_3) = \beta_0$
2. $\mu(x_1, x_2, x_3) = \beta_0 + \beta_3 x_3$
3. $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1$
4. $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
5. $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
6. $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_2 x_3$

[Figure: Resolution Run 5k, 1/1/2004. Scatterplot of pace versus age.]

    data race5k; set race5k;
    sexf=(sex="F");
    age2=age*age;
    agef=age*sexf;
    age2f=age2*sexf;
    run;
    proc reg data=race5k;
    model pace=;
    model pace=sexf;  /* equivalent to two-sample t-test */
    model pace=age age2;
    model pace=sexf age age2;
    model pace=sexf age age2 agef age2f;
    test agef=0, age2f=0;
    run;

The REG Procedure

Model: MODEL1

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               0                0             .         .        .
    Error             159        788.09472       4.95657
    Corrected Total   159        788.09472

    Root MSE   2.22634   R-Square   0.0000

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1              9.12063          0.17601     51.82     <.0001

Model: MODEL2

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               1        170.74137     170.74137     43.70   <.0001
    Error             158        617.35335       3.90730
    Corrected Total   159        788.09472

    Root MSE   1.97669   R-Square   0.2167

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1              8.26614          0.20280     40.76     <.0001
    sexf         1              2.10335          0.31819      6.61     <.0001

For MODEL3 output, see the linear and quadratic fits from the multiple regression notes.

Model: MODEL4

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               2        113.64500      56.82250     13.23   <.0001
    Error             157        674.44972       4.29586
    Corrected Total   159        788.09472

    Root MSE   2.07265   R-Square   0.1442

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1             11.78503          0.70216     16.78     <.0001
    age          1             -0.19699          0.04113     -4.79     <.0001
    age2         1              0.00294       0.00057380      5.12     <.0001

Model: MODEL5

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               3        290.34851      96.78284     30.33   <.0001
    Error             156        497.74621       3.19068
    Corrected Total   159        788.09472

    Root MSE   1.78625   R-Square   0.3684

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1             10.18317          0.64228     15.85     <.0001
    sexf         1              2.19792          0.29535      7.44     <.0001
    age          1             -0.17146          0.03562     -4.81     <.0001
    age2         1              0.00281       0.00049481      5.67     <.0001

Fitted model is

$$\hat\mu(x) = \begin{cases} \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2 = 10.18 - 0.17x + 0.0028x^2 & \text{for men} \\ \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2 + \hat\beta_3 = 10.18 + 2.20 - 0.17x + 0.0028x^2 & \text{for women} \end{cases}$$

Model: MODEL6

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               5        293.52828      58.70566     18.28   <.0001
    Error             154        494.56644       3.21147
    Corrected Total   159        788.09472

    Root MSE   1.79206   R-Square   0.3725

    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1             10.60848          0.88641     11.97     <.0001
    sexf         1              1.25728          1.23237      1.02     0.3092
    age          1             -0.19986          0.04842     -4.13     <.0001
    age2         1              0.00321       0.00064628      4.96     <.0001
    agef         1              0.06882          0.07298      0.94     0.3471
    age2f        1             -0.00103          0.00103     -0.99     0.3217

$$\hat\mu_{men}(x) = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2 + \hat\beta_3(0) + \hat\beta_4(0) + \hat\beta_5(0) = 10.61 - 0.20x + 0.0032x^2$$

$$\hat\mu_{women}(x) = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2 + \hat\beta_3(1) + \hat\beta_4 x + \hat\beta_5 x^2 = (\hat\beta_0 + \hat\beta_3) + (\hat\beta_1 + \hat\beta_4)x + (\hat\beta_2 + \hat\beta_5)x^2$$
$$\phantom{\hat\mu_{women}(x)} = (10.61 + 1.25) + (-0.20 + 0.07)x + (0.0032 - 0.0010)x^2 = 11.86 - 0.13x + 0.0022x^2$$

[Figure: fitted pace curves under Model 5 and Model 6.]

Which model is better? What do we mean by better? Is there a test we can use to compare these models?

Comparison of models 5 and 6:

- reduced: $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
- full: $\mu(x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_2 x_3$

Extra sum of squares:

$$R(\beta_4, \beta_5 \mid \beta_0, \beta_1, \beta_2, \beta_3) = SS[R]_f - SS[R]_r = 293.5 - 290.3 = 3.2$$

The F-ratio:

$$F = \frac{R(\beta_4, \beta_5 \mid \beta_0, \beta_1, \beta_2, \beta_3)/(5 - 3)}{MSE_f} = \frac{3.2/2}{3.21} = \frac{1.6}{3.21} = 0.5$$

The observed F-ratio is not significant on df = 2, 154.

In SAS, you could use

    proc reg;
    model pace=age age2 sexf agef age2f;
    test agef=0, age2f=0;
    run;

to get the following model selection F-ratio in the output:

The REG Procedure
Model: MODEL5
Test 1 Results for Dependent Variable pace

    Source         DF   Mean Square   F Value   Pr > F
    Numerator       2       1.58988      0.50   0.6105
    Denominator   154       3.21147

Which model do we choose at this point?

More stuff

Estimate the peak running age for men and for women. Is it different? Let $\theta_M$ and $\theta_W$ denote peak running ages for men and women, respectively. Using calculus on the model 6 regression,

$$\theta_M = -\frac{\beta_1}{2\beta_2}, \qquad \theta_W = -\frac{\beta_1 + \beta_4}{2(\beta_2 + \beta_5)}.$$

These are nonlinear functions of regression parameters. Note that
acceptance of any model but 6 implies equality of these peak ages.

$$\hat\theta_W = \begin{cases} 30.5 & \text{different intercepts (model 5)} \\ 30.1 & \text{full model (6)} \end{cases} \qquad \hat\theta_M = \begin{cases} 30.5 & \text{different intercepts (model 5)} \\ 31.1 & \text{full model (6)} \end{cases}$$

Things to ponder:

Q: Which of these estimates is better?

Q: Which of these estimates are closest to the true peaks?

Q: What criterion can we use to assess the estimation?

Analysis of covariance (ANCOVA)

Covariates are predictive responses. Associations between covariates $z$ and the main response variable of interest $y$ can be used to reduce unexplained variation $\sigma^2$.

A nutrition example

A nutrition scientist conducted an experiment to evaluate the effects of four vitamin supplements on the weight gain of laboratory animals. The experiment was conducted in a completely randomized design with $N = 20$ animals randomized to $a = 4$ supplement groups, each with sample size $n \equiv 5$. The response variable of interest is weight gain, but calorie intake $z$ was measured concomitantly.

    Diet   y(g)    Diet   y     Diet   y     Diet   y
    1      48      2      65    3      79    4      59
    1      67      2      49    3      52    4      50
    1      78      2      37    3      63    4      59
    1      69      2      75    3      65    4      42
    1      53      2      63    3      67    4      34

    mean   63.0           57.8         65.2         48.8
    s      12.3           14.9          9.7         10.9

Q: Is there evidence of a vitamin supplement effect?

The GLM Procedure
Dependent Variable: y

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               3       797.800000    265.933333      1.82   0.1836
    Error              16      2334.400000    145.900000
    Corrected Total    19      3132.200000

But calorie intake $z$ was measured concomitantly:

    Diet   y    z      Diet   y    z      Diet   y    z      Diet   y    z
    1      48   350    2      65   400    3      79   510    4      59   530
    1      67   440    2      49   450    3      52   410    4      50   520
    1      78   440    2      37   370    3      63   470    4      59   520
    1      69   510    2      75   530    3      65   470    4      42   510
    1      53   470    2      63   420    3      67   480    4      34   430

Q: How and why could these new data be incorporated into the analysis?

A: ANCOVA can be used to reduce unexplained variation. Model: given $z_i$,

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_z z_i + E_i \quad \text{for } i = 1, \ldots, 20$$

where $x_{ij}$ is an indicator variable for subject $i$ receiving vitamin supplement $j$:

$$x_{ij} = \begin{cases} 1 & \text{subject } i \text{ receives supplement } j \\ 0 & \text{else} \end{cases}$$

and errors $E_i \overset{iid}{\sim}$
$N(0, \sigma^2)$.

Exercise: specify the parametric mean weight gain for the first subject in each treatment group, conditional on their caloric intakes.

Proceeding with MLR analysis of this general linear model:

The GLM Procedure
Class Level Information

    Class   Levels   Values
    diet         4   1 2 3 4

Dependent Variable: y

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               4      1951.680373    487.920093      6.20   0.0038
    Error              15      1180.519627     78.701308
    Corrected Total    19      3132.200000

    R-Square   Coeff Var   Root MSE     y Mean
    0.623102    15.11308   8.871376   58.70000

    Source   DF      Type I SS   Mean Square   F Value   Pr > F
    diet      3     797.800000    265.933333      3.38   0.0463
    z         1    1153.880373   1153.880373     14.66   0.0016

    Source   DF    Type III SS   Mean Square   F Value   Pr > F
    diet      3    1537.071659    512.357220      6.51   0.0049
    z         1    1153.880373   1153.880373     14.66   0.0016

To test for a diet effect, $H_0: \beta_1 = \beta_2 = \beta_3 = 0$, use the (type III) F-ratio on 3 and 15 numerator and denominator degrees of freedom. Note that this is a comparison of nested models.

Q: Conclusion?

FYI, this model was fit with the following code:

    proc glm;
    class diet;
    model y=diet z;
    means diet;
    lsmeans diet/stderr;
    run;

NOTE the drop in MSE: it was 145.9 ($\hat\sigma \approx 12.1$) and is now 78.7 ($\hat\sigma \approx 8.9$).

Adjusted and unadjusted means

Recall the sample mean weight gains for the four diets, generated by the means diet statement in proc glm:

The GLM Procedure

    Level of          --------------y--------------   --------------z--------------
    diet        N          Mean         Std Dev            Mean         Std Dev
    1           5    63.0000000      12.2678441      442.000000      58.9067059
    2           5    57.8000000      14.8727940      434.000000      61.0737259
    3           5    65.2000000       9.6540147      468.000000      36.3318042
    4           5    48.8000000      10.8949530      502.000000      40.8656335

These means $\bar y_{i+}$ are computed without taking $z$ into account, so they are called unadjusted means. Unadjusted means do not make any adjustment for the facts that

1. caloric intake may vary by diet (presumably by chance, not because of diet)
2. weight gain depends on caloric intake

Adjusted means are estimated mean weight gains at a common reference value (the sample mean $\bar z$) of the covariate $z$. Here,

$$\bar z = \frac{442 + 434 + 468 + 502}{4} = 461.5.$$

The adjusted means are then just

$$\bar y_{1,a} = \hat\beta_0 + \hat\beta_1 + \hat\beta_z(461.5)$$
$$\bar y_{2,a} = \hat\beta_0 + \hat\beta_2 + \hat\beta_z(461.5)$$
$$\bar y_{3,a} = \hat\beta_0 + \hat\beta_3 + \hat\beta_z(461.5)$$

$\bar y_{4,a} =$
BOBA4615gt Osborne STEZZ 86 To get SAS to report the estimated regression parameter vector 5 use the solution option in the model statement The default pa ranieterization is the one we ve adopted here where 60 is the mean of the last level of the classi cation treatnient factor Standard Parameter Estimate Error t Value Pr gt ltl Intercept 3566310108 B 2241252629 159 01324 diet 1 2429519136 B 619932022 392 00014 diet 2 2044121688 B 635678835 322 00058 diet 3 2212060844 B 580625371 381 00017 diet 4 000000000 B z 016825319 004394140 383 00016 NOTE The X X matrix has been found to be singular and a generalized inverse was used to solve the normal equations Terms whose estimates are followed by the letter B are not uniquely estimable Substitution of 5 into the expressions for adjusted nieans yields gha 357 243 0174615 663 3 357 204 0174615 624 girl 357 221 0174615 641 441 357 0174615 420 Standard errors of QM Consider gm What vector 6 is needed so that 0 3 gm What is the standard error of 0 5quot Osborne ST512 87 To get SAS to produce the adjusted means and estimated standard errors use an lsmeans statement for the factor diet The GLM Procedure Least Squares Means Standard diet y LSMEAN Error Pr gt t 1 662809372 40588750 ltOOO1 2 624269627 41473443 ltOOO1 3 641063543 39776677 ltOOO1 4 419857458 43482563 ltOOO1 Concerns Aside from the usual residual based checks for model adequacy does treatment affect the covariate To check this one could carry out a one way ANOVA treating 2 as a response variable and check for a diet effect on the mean of z The GLM Procedure Dependent Variable 2 Sum of Source DF Squares Mean Square F Value Pr gt F Model diet 3 1409500000 469833333 184 01798 Error 16 4076000000 254750000 Corrected Total 19 5485500000 A No eVidence that treatment affects covariate Q Among the diets which we ve concluded are different what are the differences Look at the means have a guess Q If you are a lab animal and you want to gain weight which diets would you choose 
Q: Why are the standard errors for the adjusted means different?
Q: Which adjusted means require the most adjustment?

[Figure: Vitamin supplement ANCOVA; y (grams) plotted against z (calories)]

Lack of fit of a SLR model (supplementary to textbook)

Hiking example: completely randomized experiment involving alpine meadows in the White Mountains of New Hampshire. $N = 20$ lanes of dimension 0.5m x 1.5m were randomized to 5 trampling treatments.

    i (trt group)    Number of passes x_i    y_ij = Height (cm)
    1                   0                    20.7   15.9   17.8   17.6
    2                  25                    12.9   13.4   12.7    9.0
    3                  75                    11.8   12.6   11.4   12.1
    4                 200                     7.6    9.5    9.9    9.0
    5                 500                     7.8    9.0    8.5    6.7

[Figure: Simple Linear Regression]

Two models for mean plant height:

SLR model: $\mu(x) = \beta_0 + \beta_1 x$
One-factor ANOVA model: $\mu_i = \mu + \tau_i$

    proc reg;
      model y=numpass;
    run;
    proc glm;
      class x;
      model y=numpass x;   * x=numpass in the dataset;
    run;

    The SAS System                                                  1
    The REG Procedure
    Dependent Variable: y  height(cm)

    Analysis of Variance
                                 Sum of         Mean
    Source             DF       Squares       Square    F Value    Pr > F
    Model               1     141.29532    141.29532      19.15    0.0004
    Error              18     132.79418      7.37745
    Corrected Total    19     274.08950

    Root MSE            2.71615    R-Square    0.5155
    Dependent Mean     11.79500    Adj R-Sq    0.4886

                                   Parameter    Standard
    Variable     Label        DF    Estimate       Error    t Value    Pr > |t|
    Intercept    Intercept     1    14.11334     0.80592      17.51      <.0001
    numpass                    1    -0.01449     0.00331      -4.38      0.0004

    The SAS System                                                  2
    The GLM Procedure
    Class Level Information
    Class    Levels    Values
    x             5    1 2 3 4 5

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               4    243.1620000     60.7905000     29.48   <.0001
    Error              15     30.9275000      2.0618333
    Corrected Total    19    274.0895000

    R-Square    Coeff Var    Root MSE      y Mean
    0.887163     12.17387    1.435909    11.79500

    Source     DF      Type I SS    Mean Square   F Value   Pr > F
    numpass     1    141.2953228    141.2953228     68.53   <.0001
    x           3    101.8666772     33.9555591     16.47   <.0001

    Source     DF    Type III SS    Mean Square   F Value   Pr > F
    numpass     0      0.0000000       .              .       .
    x           3    101.8666772     33.9555591     16.47   <.0001

When the $t$ treatments have an interval scale, the SLR model and all polynomials of degree $p \le t - 2$ are nested in the one-factor ANOVA model with $t$ treatment means.

F-ratio for lack of fit

To test for lack of fit of a polynomial (reduced) model of
degree $p$, use the extra sum of squares F-ratio on $t - 1 - p$ and $N - t$ df:

$$F = \frac{SS(\text{lack of fit})/(t - 1 - p)}{MS(\text{pure error})}$$

where $MS(\text{pure error}) = MS[E]_{full}$ and

$$SS(\text{lack of fit}) = SS[Trt] - SS[R]_{poly} = SS[E]_{poly} - SS[E]_{full} = SS[E]_{poly} - SS(\text{pure error}).$$

In a simple linear ($p = 1$) model for the meadows data, $SS(\text{lack of fit}) = 243.162 - 141.295 = 101.867$ on $t - 1 - p = 3$ df and the sum of squares for pure error is $SS[E]_{full} = 30.93$, yielding

$$F = \frac{101.867/3}{30.93/15} = \frac{34.0}{2.06} \approx 16.5,$$

which is highly significant since $F(0.01, 3, 15) = 5.42$. The model is misspecified: the SLR model suffers from lack of fit. Next step: either go with the one-factor ANOVA model or specify some other model, such as quadratic.

Exercises

1. Hiking data: test for lack of fit of the quadratic model (df = 2, 15). Also, let $x^* = \log(x + 1)$. Obtain a plot of $y$ vs $x^*$. Test for lack of fit of a model in which mean plant height is linear in $x^*$.
2. Test for lack of fit of the SLR model for the corn yields and rainfall data. Test for lack of fit of the quadratic model for the same data. Specify $SS(\text{lack of fit})$ and $SS(\text{pure error})$.

ST 512 Weeks 7-8: Completely randomized factorial designs

Reading: Ch 9, 13

• This packet
  Introduction, notation, jargon.
  Terms: factors, levels, treatments, treatment combinations, main effects, interaction effects, crossed factors, nested factors, contrasts, orthogonal contrasts, expected mean squares, multiplicity of comparisons, familywise or experimentwise error rates, power.
  Specific topics:
  * multiple comparisons
  * expected mean squares
  * power computations
• Next packet
  2 x 2 experiments, a x b experiments, three-factor ANOVA, nested vs crossed designs

Comparisons (contrasts) among means

Definition: In the one-way ANOVA layout

$$Y_{ij} = \mu_i + E_{ij}, \quad i = 1, 2, \ldots, t \text{ and } j = 1, 2, \ldots, n_i, \text{ with } E_{ij} \stackrel{iid}{\sim} N(0, \sigma^2),$$

a linear function of the group means of the form

$$\theta = c_1\mu_1 + c_2\mu_2 + \cdots + c_t\mu_t$$

is called a linear combination of the treatment means.

Definition: The $c_i$'s are the coefficients of the linear combination. If

$$c_1 + c_2 + \cdots + c_t = \sum_{i=1}^t c_i = 0,$$

the linear combo is called a contrast.

Definition: Contrasts in which only two of the coefficients are nonzero are called simple or pairwise contrasts.

Definition: Contrasts with more than two nonzero coefficients are called complex
contrasts.

Result: The best estimator for a contrast of interest can be obtained by substituting treatment group sample means $\bar{y}_i$ for treatment population means $\mu_i$ in the contrast $\theta$:

$$\hat\theta = c_1\bar{y}_1 + c_2\bar{y}_2 + \cdots + c_t\bar{y}_t$$

Example: For the binding fraction data, consider the pairwise contrast comparing the penicillin population mean to the Tetracyclin mean:

$$\theta = \mu_1 - \mu_2 = (1)\mu_1 + (-1)\mu_2 + (0)\mu_3 + (0)\mu_4 + (0)\mu_5$$

Using the result, the point estimator of $\theta$ is $\hat\theta = \bar{y}_1 - \bar{y}_2$.

Recall the binding fraction data and ANOVA table:

                       Binding                      Sample    Sample
    Antibiotic         Percentage                   mean      variance
    Penicillin G       29.6  24.3  28.5  32.0       28.6      10.4
    Tetracyclin        27.3  32.6  30.8  34.8       31.4      10.1
    Streptomycin        5.8   6.2  11.0   8.3        7.8       5.7
    Erythromycin       21.6  17.4  18.3  19.0       19.1       3.3
    Chloramphenicol    29.2  32.8  25.0  24.2       27.8      15.9

                         Sum of     Mean
    Source        df     squares    Square    F
    Treatments     4     1481       370       41
    Error         15      136         9.05
    Total         19     1617

Substitution of $\bar{y}_1$ and $\bar{y}_2$ yields $\hat\theta = 28.6 - 31.4 = -2.8$.

Q: How good is this estimate?

Sampling distribution of $\hat\theta$

Q: What is the sampling distribution of $\hat\theta$?
Q: That is, what are $E(\hat\theta)$, $SE(\hat\theta)$ and the shape of the distribution of $\hat\theta$?

$$E(\hat\theta) = \sum c_i\mu_i = \theta, \qquad Var(\hat\theta) = \sigma^2\sum\frac{c_i^2}{n_i}$$

Normality follows because $\hat\theta$ is a linear function of normal data $Y_{ij}$.

Standard error:

$$\widehat{SE}(\hat\theta) = \sqrt{MS[E]\sum\frac{c_i^2}{n_i}}$$

To test $H_0: \theta = \theta_0$ (often 0) versus $H_1: \theta \ne \theta_0$, use a t-test:

$$t = \frac{\text{est} - \text{null}}{SE} = \frac{\hat\theta - \theta_0}{\widehat{SE}(\hat\theta)} \sim t_{N-t} \text{ under } H_0$$

At level $\alpha$, the critical value for this test is $t(N - t, \alpha/2)$. A $100(1 - \alpha)\%$ confidence interval for a contrast $\theta = \sum c_i\mu_i$ is given by

$$\sum c_i\bar{y}_i \pm t(N - t, \alpha/2)\sqrt{MS[E]\sum\frac{c_i^2}{n_i}}$$

Here

$$\widehat{SE}(\hat\theta) = \sqrt{9.05\left(\frac{1^2}{4} + \frac{(-1)^2}{4}\right)} = 2.127$$

so that the t-statistic becomes

$$t = \frac{-2.8}{2.127} = -1.32,$$

which is not in the critical region, so the sample mean binding fractions for Penicillin G and Tetracyclin do not differ significantly. A 95% confidence interval is given by

$$-2.8 \pm 2.13(2.127) \quad \text{or} \quad (-7.3, 1.7)$$

The code below estimates all contrasts involving Pen G:

$$\theta_1 = (1, -1, 0, 0, 0)\mu, \quad \theta_2 = (1, 0, -1, 0, 0)\mu, \quad \theta_3 = (1, 0, 0, -1, 0)\mu, \quad \theta_4 = (1, 0, 0, 0, -1)\mu$$

along with the contrast comparing Pen G with the mean of the other four antibiotics:

$$\theta_5 = (1, -\tfrac14, -\tfrac14, -\tfrac14, -\tfrac14)\mu$$

Here $\mu = (\mu_1, \mu_2, \mu_3, \mu_4, \mu_5)'$.

    proc glm data=one;
      class drug;
      model y=drug/clparm;
      estimate "theta1" drug 1 -1;
      estimate "theta2" drug 1 0 -1;
      estimate "theta3" drug 1 0 0 -1;
      estimate "theta4" drug 1 0 0 0 -1;
      estimate "theta5" drug 4 -1 -1 -1 -1/divisor=4;
    run;

    The GLM Procedure
    Class Level Information
    Class    Levels    Values
    drug          5    1 2 3 4 5

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               4    1480.823000     370.205750     40.88   <.0001
    Error              15     135.822500       9.054833
    Corrected Total    19    1616.645500

    R-Square    Coeff Var    Root MSE      y Mean
    0.915985     13.12023    3.009125    22.93500

    Source    DF    Type III SS    Mean Square   F Value   Pr > F
    drug       4    1480.823000     370.205750     40.88   <.0001

                                   Standard
    Parameter       Estimate          Error    t Value    Pr > |t|
    theta1        -2.7750000     2.12777270      -1.30      0.2118
    theta2        20.7750000     2.12777270       9.76      <.0001
    theta3         9.5250000     2.12777270       4.48      0.0004
    theta4         0.8000000     2.12777270       0.38      0.7122
    theta5         7.0812500     1.68215202       4.21      0.0008

    Parameter     95% Confidence Limits
    theta1        -7.3102402     1.7602402
    theta2        16.2397598    25.3102402
    theta3         4.9897598    14.0602402
    theta4        -3.7352402     5.3352402
    theta5         3.4958278    10.6666722

Orthogonal contrasts

In the same way that SS[Tot] can be partitioned into independent components SS[Trt] and SS[E], the sum of squares for treatments SS[Trt] can be partitioned into $t - 1$ independent components. Let two contrasts $\theta_1$ and $\theta_2$ be given by

$$\theta_1 = c_1\mu_1 + \cdots + c_t\mu_t \quad \text{and} \quad \theta_2 = d_1\mu_1 + \cdots + d_t\mu_t$$

or

$$\theta_1 = \sum_{i=1}^t c_i\mu_i \quad \text{and} \quad \theta_2 = \sum_{i=1}^t d_i\mu_i$$

Definition: The two contrasts $\theta_1$ and $\theta_2$ are mutually orthogonal if the products of their coefficients sum to zero:

$$c_1d_1 + \cdots + c_td_t = \sum_{i=1}^t c_id_i = 0$$

Consider several contrasts, say $k$ of them: $\theta_1, \ldots, \theta_k$. The set is mutually orthogonal if all pairs are mutually orthogonal.

Examples:

    (1, -1, 0, 0, 0) and (0, 0, 1, -1, 0)         orthogonal
    (1, -1/2, -1/2, 0, 0) and (0, 0, 0, 1, -1)    orthogonal
    (1, -1, 0, 0, 0) and (0, 1, -1, 0, 0)         not orthogonal (coefficient products sum to -1)

$\theta_1$ and $\theta_2$ orthogonal $\Rightarrow$ $\hat\theta_1$ and $\hat\theta_2$ are statistically independent.

Sums of squares for contrasts

In the same way SS[Trt] was obtained for a treatment effect, a sum of squares term can be obtained for a contrast:

$$SS(\hat\theta) = \frac{\hat\theta^2}{\sum_{i=1}^t c_i^2/n_i}$$

If $\theta_1, \ldots, \theta_{t-1}$ are $t - 1$ mutually orthogonal contrasts, then

$$SS[Trt] = SS(\hat\theta_1) + SS(\hat\theta_2) + \cdots + SS(\hat\theta_{t-1})$$

There is one df associated with a sum of squares for an individual contrast $\theta_j$, and if $\theta_j = 0$ then it can be shown that $E(SS(\hat\theta_j)) = \sigma^2$. To test $H_0: \theta_j = 0$ versus $H_1: \theta_j \ne 0$, use

$$F = \frac{SS(\hat\theta_j)}{MS[E]}$$

on 1 numerator
degree of freedom and $N - t$ denominator degrees of freedom.

For $\theta_1 = \mu_1 - \mu_2$ in the binding fractions,

$$F = \frac{(-2.8)^2/\left(\frac{1^2}{4} + \frac{(-1)^2}{4}\right)}{MS[E]} = \frac{15.68}{9.05} = 1.73$$

Using $F(0.05, 1, 15) = 4.54$, is the value $\theta_1 = 0$ plausible?

A new dataset: number of contaminants in IV fluids made by $t = 3$ pharmaceutical companies

             Cutter    Abbott    McGaw
             255       105       577
             264       288       515
             342        98       214
             331       275       413
             234       221       401
             217       240       260
    mean     273.8     204.5     396.7

                                      Sum of     Mean
    Source                      df    squares    Square    F
    Treatments (or pharmacies)   2    113646     56823     5.81
    Error                       15    146753      9784
    Total                       17    260400

Consider the following 2 contrasts:

$$\theta_1 = \mu_M - \mu_A \quad \text{and} \quad \theta_2 = \mu_C - \frac{\mu_M + \mu_A}{2}$$

Q: Are these contrasts orthogonal?
Q: Are the estimated contrasts $\hat\theta_1$ and $\hat\theta_2$ independent?

Exercise: Compute $SS(\hat\theta_1)$ and $SS(\hat\theta_2)$. Add 'em up.

    proc format;
      value firmfmt 1="Cutter" 2="Abbott" 3="McGaw";
    run;
    data one;
      infile "pharmfirm.dat";
      input firm con;
      format firm firmfmt.;
    run;
    proc glm order=formatted;
      title "contaminant particles in IV fluids";
      class firm;
      model con=firm;
      contrast 'C - avg of M and A' firm -.5 1 -.5;
      contrast 'McGaw - Abbott'     firm -1 0 1;
      estimate 'C - avg of M and A' firm -.5 1 -.5;
      estimate 'McGaw - Abbott'     firm -1 0 1;
    run;

    contaminant particles in IV fluids                              1
    The GLM Procedure
    Class    Levels    Values
    firm          3    Abbott Cutter McGaw

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               2    113646.3333     56823.1667      5.81   0.0136
    Error              15    146753.6667      9783.5778
    Corrected Total    17    260400.0000

    R-Square    Coeff Var    Root MSE    con Mean
    0.436430     33.91268    98.91197    291.6667

    Contrast              DF    Contrast SS    Mean Square   F Value   Pr > F
    C - avg of M and A     1      2862.2500      2862.2500      0.29   0.5965
    McGaw - Abbott         1    110784.0833    110784.0833     11.32   0.0043

                                            Standard
    Parameter                Estimate          Error    t Value    Pr > |t|
    C - avg of M and A     -26.750000     49.4559849      -0.54      0.5965
    McGaw - Abbott         192.166667     57.1068524       3.37      0.0043

Multiple Comparisons

• Can't go carrying out many, many tests of significance willy-nilly.
• E.g. consider the case with $t = 5$ antibiotic treatments and all simple pairwise contrasts of the form $\theta = \mu_i - \mu_j$:
  $\binom{5}{2} = 10$ tests of significance, each at level $\alpha = 0.05$
• probability of committing at
least one type I error = ?

Definition: When testing $k$ contrasts, the experimentwise (or familywise) error rate is

fwe = Pr(at least one type I error)

Methods for simultaneous inference for multiple contrasts include

• Scheffé
• Bonferroni
• Tukey

A context in which multiplicity is a big issue: microarray experiments, which may involve thousands of genes and tests.

[Figure: microarray data. Data courtesy of Cassi Mybuig]

Bonferroni

Suppose interest lies in exactly $k$ contrasts. The Bonferroni adjustment to $\alpha$, which controls fwe, is $\alpha' = \alpha/k$. Simultaneous 95% confidence intervals for the $k$ contrasts are given by

$$a_1\bar{y}_1 + a_2\bar{y}_2 + \cdots + a_t\bar{y}_t \pm t(\alpha'/2, \nu)\sqrt{MS[E]\sum_j\frac{a_j^2}{n_j}}$$
$$b_1\bar{y}_1 + b_2\bar{y}_2 + \cdots + b_t\bar{y}_t \pm t(\alpha'/2, \nu)\sqrt{MS[E]\sum_j\frac{b_j^2}{n_j}}$$
$$\vdots$$
$$k_1\bar{y}_1 + k_2\bar{y}_2 + \cdots + k_t\bar{y}_t \pm t(\alpha'/2, \nu)\sqrt{MS[E]\sum_j\frac{k_j^2}{n_j}}$$

where $\nu$ denotes df for error. $t(\alpha'/2, \nu)$ might have to be obtained using software.

For the binding fraction example, consider only pairwise comparisons with Penicillin:

$$\theta_1 = \mu_1 - \mu_2, \quad \theta_2 = \mu_1 - \mu_3, \quad \theta_3 = \mu_1 - \mu_4, \quad \theta_4 = \mu_1 - \mu_5$$

We have $k = 4$, $\alpha' = 0.05/k = 0.0125$ and $t(\alpha'/2, 15) = 2.84$. Substitution leads to

$$t(\alpha'/2, 15)\sqrt{MS[E]\left(\frac{1^2}{4} + \frac{(-1)^2}{4}\right)} = 2.84\sqrt{9.05\left(\frac{2}{4}\right)} = 6.0$$

so that simultaneous 95% confidence intervals for $\theta_1, \theta_2, \theta_3, \theta_4$ take the form $\bar{y}_1 - \bar{y}_i \pm 6.0$.

In SAS, an adjustment for $k = 4$ can be achieved with care:

    proc glm data=one;
      title "Bonferroni correction for 4 contrasts";
      class drug;
      model y=drug/clparm alpha=.0125;
      * the coefficients below estimate mu_i - mu_1, the negatives of theta1 to theta4 as defined above;
      estimate "theta1" drug -1 1;
      estimate "theta2" drug -1 0 1;
      estimate "theta3" drug -1 0 0 1;
      estimate "theta4" drug -1 0 0 0 1;
    run;

    Bonferroni correction for 4 contrasts
    The GLM Procedure
    Class Level Information
    Class    Levels    Values
    drug          5    1 2 3 4 5

                                   Standard
    Parameter       Estimate          Error    t Value    Pr > |t|
    theta1         2.7750000     2.12777270       1.30      0.2118
    theta2       -20.7750000     2.12777270      -9.76      <.0001
    theta3        -9.5250000     2.12777270      -4.48      0.0004
    theta4        -0.8000000     2.12777270      -0.38      0.7122

    Parameter     98.75% Confidence Limits
    theta1        -3.2606985      8.8106985
    theta2       -26.8106985    -14.7393015
    theta3       -15.5606985     -3.4893015
    theta4        -6.8356985      5.2356985

(These are actually simultaneous 95% confidence intervals.)
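The Bonferroni critical value $t(\alpha'/2, 15) = 2.84$ and the half-width 6.0 quoted above can be reproduced with a short data step. What follows is only a minimal sketch, not part of the original notes: the dataset name bonf and its variable names are illustrative assumptions, MSE and the group size are copied from the binding fraction ANOVA, and the SAS quantile function tinv is used for the t critical value.

```sas
/* Sketch: Bonferroni critical value and interval half-width        */
/* Dataset and variable names are illustrative, not from the notes. */
data bonf;
  k     = 4;          /* number of contrasts                        */
  alpha = 0.05;       /* desired familywise error rate              */
  dfe   = 15;         /* error df from the binding fraction ANOVA   */
  mse   = 9.054833;   /* error mean square from the same ANOVA      */
  n     = 4;          /* observations per antibiotic                */
  alphaprime = alpha / k;                       /* Bonferroni-adjusted level   */
  tcrit      = tinv(1 - alphaprime/2, dfe);     /* upper alpha'/2 t quantile   */
  halfwidth  = tcrit * sqrt(mse * (1/n + 1/n)); /* for a pairwise contrast     */
run;

proc print data=bonf;
run;
```

The printed tcrit and halfwidth can be compared with the 2.84 and 6.0 used in the text, and with the half-widths of the 98.75% confidence limits in the GLM output.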
Another method: Scheffé

For simultaneous 95% confidence intervals for ALL contrasts, use

$$\sum_j c_j\bar{y}_j \pm \sqrt{(t - 1)F^*\,MS[E]\sum_j\frac{c_j^2}{n_j}}$$

where $F^* = F(\alpha, t - 1, N - t)$. For pairwise comparisons of means $\mu_j$ and $\mu_k$, this yields

$$\bar{y}_j - \bar{y}_k \pm \sqrt{(t - 1)F^*\,MS[E]\left(\frac{1}{n_j} + \frac{1}{n_k}\right)}$$

Using $\alpha = 0.05$, need to specify

• $t$ (from the design)
• $F^*$ (same critical value as for $H_0: \tau_i \equiv 0$)
• $MS[E]$ (from the data)
• $\bar{y}_j$, $\bar{y}_k$ (from the data)

For the binding fraction data,

$$\sqrt{(t - 1)F^*\,MS[E]\left(\frac{1}{n_j} + \frac{1}{n_k}\right)} = \sqrt{(5 - 1)(3.06)(9.05)\left(\frac14 + \frac14\right)} = 7.44$$

If any two sample means differ by more than 7.44, they differ significantly.

For IV fluids, $\bar{y}_A = 204.5$, $\bar{y}_M = 396.67$, $\bar{y}_C = 273.83$ and

$$\sqrt{(t - 1)F^*\,MS[E]\left(\frac{1}{n_j} + \frac{1}{n_k}\right)} = \sqrt{(3 - 1)(3.68)(9784)\left(\frac16 + \frac16\right)} = 154.9$$

Conclusion about pairwise contrasts? Compare with Bonferroni.

Tukey

Tukey's method is better than Scheffé's method when making pairwise comparisons in balanced designs ($n = n_1 = n_2 = \cdots = n_t$). It is conservative, controlling the experimentwise error rate, and has a lower type II error rate in these cases than Scheffé. (It is more powerful.)

For simple contrasts of the form $\theta = \mu_j - \mu_k$, to test $H_0: \theta = 0$ vs $H_1: \theta \ne 0$, reject $H_0$ at level $\alpha$ if

$$|\hat\theta| > q(t, N - t, \alpha)\sqrt{\frac{MS[E]}{n}}$$

where $q(t, N - t, \alpha)$ denotes the $\alpha$-level studentized range for $t$ means and $N - t$ degrees of freedom. These studentized ranges can be found in Table CH of Rao.

For the IV data, $q(3, 15, 0.05) = 3.67$. Tukey's 95% honestly significant difference (HSD) for pairwise comparisons of treatment means in this balanced design is

$$q(3, 15, 0.05)\sqrt{\frac{MS[E]}{n}} = 3.67\sqrt{\frac{9784}{6}} = 148.3$$

    proc glm;
      class firm;
      model con=firm;
      means firm/scheffe tukey;
    run;

    Tukey's Studentized Range (HSD) Test for con
    NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

    Alpha                                      0.05
    Error Degrees of Freedom                     15
    Error Mean Square                      9783.578
    Critical Value of Studentized Range     3.67338
    Minimum Significant Difference           148.33

    Means with the same letter are not significantly different.
    Tukey Grouping      Mean    N    firm
            A         396.67    6    McGaw
       B    A         273.83    6    Cutter
       B              204.50    6    Abbott

    The GLM Procedure
    Scheffe's Test for con
    NOTE: This test controls the Type I experimentwise error rate.

    Alpha                                 0.05
    Error Degrees of Freedom                15
    Error Mean Square                 9783.578
    Critical Value of F                3.68232
    Minimum Significant Difference      154.98

    Means with the same letter are not significantly different.
    Scheffe Grouping    Mean    N    firm
            A         396.67    6    McGaw
       B    A         273.83    6    Cutter
       B              204.50    6    Abbott

Expected mean squares

Definition: The mean square for treatments is given by

$$MS[Trt] = \frac{SS[Trt]}{t - 1} = \frac{1}{t - 1}\sum_{i=1}^t\sum_{j=1}^{n_i}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \frac{1}{t - 1}\sum_{i=1}^t n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$

Q: Why are there $t - 1$ degrees of freedom associated with MS[Trt]?
A: Note that the terms in the double sum above do not depend on $j$, so it is a sum of squares from $t$ independent sample treatment means, leaving $t - 1$ df for assessing variability.

It can be shown that

$$E(MS[Trt]; H_1) = E(SS[Trt]/(t - 1); H_1) = \sigma^2 + \frac{1}{t - 1}\sum_{i=1}^t n_i(\mu_i - \bar\mu)^2$$
$$= \sigma^2 + \frac{n}{t - 1}\sum_{i=1}^t(\mu_i - \bar\mu)^2 \quad \text{(balanced case)}$$
$$= \sigma^2 + n\psi^2$$

where

$$\psi^2 = \frac{1}{t - 1}\sum_{i=1}^t(\mu_i - \bar\mu)^2$$

Note that under $H_0$, $\mu_i \equiv \mu$ and $\psi^2 = 0$, so that

$$E(MS[Trt]; H_0) = E(SS[Trt]/(t - 1); H_0) = \sigma^2$$

Definition: The mean square for error is given by $MS[E] = SS[E]/(N - t)$.

This is just a generalization of the pooled variance $s_p^2$ to the case of more than $t = 2$ groups:

$$MS[E] = \frac{SS[E]}{N - t} = \frac{1}{N - t}\sum_{i=1}^t\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2 = \frac{1}{N - t}\sum_{i=1}^t(n_i - 1)s_i^2$$

so that

$$E(MS[E]) = \frac{1}{N - t}\sum_{i=1}^t(n_i - 1)E(s_i^2) = \frac{1}{N - t}\sum_{i=1}^t(n_i - 1)\sigma^2 = \frac{N - t}{N - t}\sigma^2 = \sigma^2$$

Sample size computations for one-way ANOVA

Now consider the null hypothesis in a balanced experiment using one-way ANOVA to compare $t$ treatment means with $\alpha = 0.05$:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_t = \mu$$

versus the alternative

$$H_1: \mu_i \ne \mu_j \text{ for some } i \ne j$$

Q: Suppose that we intend to use a balanced design. How big does our sample size $n_1 = n_2 = \cdots = n_t = n$ need to be?

Of course, the answer depends on lots of things, namely $\sigma^2$, how many treatment groups $t$ we have, how much of a difference among the means we hope to be able to detect, and with how big a probability. Given $\alpha$, $\mu_1, \ldots, \mu_t$ and $\sigma^2$, we can choose $n$ to ensure a power of at least $1 - \beta$ using the noncentral F distribution.

Recall that the critical region for the statistic $F = MS[T]/MS[E]$ is everything bigger than $F(\alpha, t - 1, N - t) = F^*$. The power of the F-test conducted using $\alpha = 0.05$ to reject $H_0$ under this alternative is given by

$$1 - \beta = \Pr(MS[T]/MS[E] > F^*; H_1 \text{ is true}) \qquad (1)$$

Let $\tau_i = \mu_i - \bar\mu$ for each treatment $i$, so that

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_t = 0$$

When some $H_1$ is true and
the sample size $n$ is used in each group, it can be shown that the F-ratio has the noncentral F distribution with noncentrality parameter

$$\gamma = \sum_{i=1}^t\frac{n\tau_i^2}{\sigma^2} = n\sum_{i=1}^t\left(\frac{\tau_i}{\sigma}\right)^2$$

This is the parameterization for the F distribution used in both SAS and S.

One way to obtain an adequate sample size is trial and error. Software packages can be used to get probabilities of the form (1) for various values of $n$. Russ Lenth's website is also terrific and helpful:

http://www.stat.uiowa.edu/~rlenth/Power

An example: suppose that a balanced completely randomized design (CRD) is to be used to test for a difference in the number of contaminant particles in IV fluid for three pharmaceutical companies. It is believed that the standard deviation on a given observation is about 100 particles for each company. In order to test $H_0: \mu_1 = \mu_2 = \mu_3$ at level $\alpha = 0.05$, how large does the common sample size $n$ need to be?

Q: What alternative to $H_0$ would be meaningful? What is $\sigma$?
A: The alternative $H_1: \mu_1 = \mu_2 = \mu - 30$, $\mu_3 = \mu + 60$ would be meaningful. Assume $\sigma \approx 100$.

Q: What is an acceptable type II error rate, or what kind of power are we looking for?
A: Suppose that $1 - \beta = 0.8$ should be good enough.

To obtain probabilities of the form (1) we need the noncentrality parameter $\gamma$:

$$\gamma = n\sum_{i=1}^3\left(\frac{\tau_i}{\sigma}\right)^2 = n\,\frac{30^2 + 30^2 + 60^2}{100^2} = 0.54n$$

The $\alpha = 0.05$ critical value for $H_0$ is given by $F^* = F(2, 3(n - 1), 0.05)$.

We need the area to the right of $F^*$ for the noncentral F distribution with degrees of freedom 2 and $3(n - 1)$ and noncentrality parameter $\gamma = 0.54n$. The following printout suggests the sufficiency of $n = 19$ for power of $1 - \beta = 0.8$.

    data one;
      do n=3 to 25;
        output;
      end;
    run;
    data one;
      set one;
      t=3;
      nu1=t-1;
      nu2=t*(n-1);
      sumtau2=30**2+30**2+60**2;
      sigma2=10000;
      sigma2u=2/3*var(100,100,190);
      ncp=t*sigma2u**2/sigma2;   * overwritten by the next statement;
      ncp=n*sumtau2/sigma2;
      qf=finv(.95,nu1,nu2);
      pf=probf(qf,nu1,nu2,ncp);
      power=1-pf;
    run;
    proc print; run;

    OBS     N    T    NU1    NU2    SUMTAU2    SIGMA2    SIGMA2U
      1     3    3     2       6      5400      10000      1800
      2     4    3     2       9      5400      10000      1800
      3     5    3     2      12      5400      10000      1800
      4     6    3     2      15      5400      10000      1800
      5     7    3     2      18      5400      10000      1800
      6     8    3     2      21      5400      10000      1800
      7     9    3     2      24      5400      10000      1800
      8    10    3     2      27      5400      10000      1800
      9    11    3     2      30      5400      10000      1800
     10    12    3     2      33      5400      10000      1800
     11    13    3     2      36      5400      10000      1800
     12    14    3     2      39      5400      10000      1800
     13    15    3     2      42      5400      10000      1800
     14    16    3     2      45      5400      10000      1800
     15    17    3     2      48      5400      10000      1800
     16    18    3     2      51      5400      10000      1800
     17    19    3     2      54      5400      10000      1800
     18    20    3     2      57      5400      10000      1800
     19    21    3     2      60      5400      10000      1800
     20    22    3     2      63      5400      10000      1800
     21    23    3     2      66      5400      10000      1800
     22    24    3     2      69      5400      10000      1800
     23    25    3     2      72      5400      10000      1800

    [The remaining printout columns (NCP, QF, PF, POWER) are not legible in this copy.]

Another example: suppose we want to test equal mean binding fractions among antibiotics against the alternative

$$H_1: \mu_P = \mu + 3, \; \mu_T = \mu + 3, \; \mu_S = \mu - 6, \; \mu_E = \mu, \; \mu_C = \mu$$

so that $\tau_1 = 3$, $\tau_2 = 3$, $\tau_3 = -6$ and $\tau_4 = \tau_5 = 0$. Assume $\sigma = 3$ and that we need to use $\alpha = 0.05$. The noncentrality parameter is given by

$$\gamma = n\left[\left(\frac33\right)^2 + \left(\frac33\right)^2 + \left(\frac{-6}{3}\right)^2\right] = 6n$$

The following code should do the trick:

    data one;
      do n=2 to 10;
        output;
      end;
    run;
    data one;
      set one;
      t=5;
      nu1=t-1;
      nu2=t*(n-1);
      sumtau2=3**2+3**2+6**2;
      sigma2=9;
      ncp=n*sumtau2/sigma2;
      qf=finv(.95,nu1,nu2);
      pf=probf(qf,nu1,nu2,ncp);
      power=1-pf;
    run;
    proc print; run;

    OBS     N    T    NU1    NU2    SUMTAU2    SIGMA2    NCP         QF         PF      POWER
      1     2    5     4       5       54         9       12    5.19217    0.59246    0.40754
      2     3    5     4      10       54         9       18    3.47805    0.22465    0.77535
      3     4    5     4      15       54         9       24    3.05557    0.06437    0.93563
      4     5    5     4      20       54         9       30    2.86608    0.01533    0.98467
      5     6    5     4      25       54         9       36    2.75871    0.00319    0.99681
      6     7    5     4      30       54         9       42    2.68963    0.00060    0.99940
      7     8    5     4      35       54         9       48    2.64147    0.00010    0.99990
      8     9    5     4      40       54         9       54    2.60597    0.00002    0.99998
      9    10    5     4      45       54         9       60    2.57874    0.00000    1.00000

Orthogonal polynomial contrasts

Example: a poultry science experiment measures bodyweights of chickens from $a = 4$ diet groups characterized by protein concentration in the diet.

• $Y$: 21-day bodyweights of chickens
• completely randomized design with one factor, protein in diet, with four equally spaced levels
• thanks to P. Plumstead for data
• $n = 18$ pens, $N = 72$

    diet    level of     diet mean    diet std dev    Tukey
    i       protein      ybar_i       s_i             grouping
    1       21.8           993          38
    2       23.5          1003          28
    3       25.2          1022          39
    4       26.9          1050          32

One-way ANOVA table:

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               3     34311.7666     11437.2555      9.57   <.0001
    Error              68     81279.4678      1195.2863
    Corrected Total    71    115591.2344

    R-Square    Coeff Var    Root MSE    AMBW21D Mean
    0.296837     3.399254    34.57291        1017.073

Some omitted exam questions

1. Sketch a plot of mean bodyweight at 21 days against protein content.
2. Consider the following three sample contrasts:
   $\hat\theta_1 = -3\bar{y}_1 - \bar{y}_2 + \bar{y}_3 + 3\bar{y}_4$
   $\hat\theta_2 = \bar{y}_1 - \bar{y}_2 - \bar{y}_3 + \bar{y}_4$
   $\hat\theta_3 = -\bar{y}_1 + 3\bar{y}_2 - 3\bar{y}_3 + \bar{y}_4$
   (a) True/false: These estimated contrasts are orthogonal.
   (b) True/false: If contrast sums of squares are obtained, then $SS(\hat\theta_1) + SS(\hat\theta_2) + SS(\hat\theta_3) = SS[Trt]$.
   (c) Report $\hat\theta_1, \hat\theta_2, \hat\theta_3$.
   (d) Provide an expression for the standard error of $\hat\theta_2$.
   (e) Estimate the standard error of $\hat\theta_2$.
   (f) Report $SS(\hat\theta_1)$.
3. Fitting the SLR model leads to SS[Reg] = 32742 and SS[Tot] = 115591.
   (a) Report the F-ratio for testing for a lack of fit of the linear model.
   (b) The appropriate critical value for a test with level $\alpha = 0.05$ is $F^* = 3.13$. Draw a conclusion about the adequacy of the linear model using $\alpha = 0.05$: is there evidence that the linear model is inadequate?

The contrasts in problem 2 are called orthogonal polynomial contrasts. The table below gives coefficients for orthogonal polynomial contrasts for balanced single-factor experiments with 3, 4 or 5 equally spaced levels.

    Factor    Poly                  Coefficients for
    levels    degree    contrast    ybar1   ybar2   ybar3   ybar4   ybar5    SS
    3         1         theta1       -1       0       1                      R(beta1 | beta0)
              2         theta2        1      -2       1                      R(beta2 | beta0, beta1)
    4         1         theta1       -3      -1       1       3              R(beta1 | beta0)
              2         theta2        1      -1      -1       1              R(beta2 | beta0, beta1)
              3         theta3       -1       3      -3       1              R(beta3 | beta0, beta1, beta2)
    5         1         theta1       -2      -1       0       1       2      ?
              2         theta2        2      -1      -2      -1       2      ?
              3         theta3       -1       2       0      -2       1      ?
              4         theta4        1      -4       6      -4       1      ?

The rightmost column indicates the extra SS in an MLR of the form $\mu(x) = \beta_0 + \beta_1x + \beta_2x^2 + \cdots$

The contrast corresponding to a polynomial of degree $p$ can be used to test for a $p$th-degree association:

• large $|\hat\theta_1|$ indicates linear association between $y$ and $x$
• large $|\hat\theta_2|$ indicates quadratic association between $y$ and $x$
• large $|\hat\theta_3|$ indicates cubic association between $y$ and $x$

This is computationally (and otherwise) easier than fitting polynomial regressions of various degrees.

    proc glm;
      title "protein concentration and chicken weights";
      class cp;
      model ambw21d=cp;
      contrast 'cp linear'    cp -3 -1 1 3;
      contrast 'CP quadratic' cp 1 -1 -1 1;
      contrast 'CP cubic'     cp -1 3 -3 1;
      contrast 'all three'    cp -3 -1 1 3, cp 1 -1 -1 1, cp -1 3 -3 1;
      * 'all three' tests that the 3-vector of contrasts is (0,0,0);
      estimate 'cp linear'    cp -3 -1 1 3;
      estimate 'CP quadratic' cp 1 -1 -1 1;
      estimate 'CP cubic'     cp -1 3 -3 1;
    run;
    proc glm;   * no class statement will fit regression model;
      model ambw21d=cp cp*cp cp*cp*cp;
    run;
    proc glm;
      model ambw21d=cp;
    run;

    protein concentration and chicken weights                       1
    The GLM Procedure
    Class    Levels    Values
    CP            4    21.8 23.5 25.2 26.9

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               3     34311.7666     11437.2555      9.57   <.0001
    Error              68     81279.4678      1195.2863
    Corrected Total    71    115591.2344

    Contrast        DF    Contrast SS    Mean Square   F Value   Pr > F
    cp linear        1    32741.55648    32741.55648     27.39   <.0001
    CP quadratic     1     1568.66674     1568.66674      1.31   0.2560
    CP cubic         1        1.54337        1.54337      0.00   0.9714
    all three        3    34311.76658    11437.25553      9.57   <.0001

                                      Standard
    Parameter          Estimate          Error    t Value    Pr > |t|
    cp linear        190.734127     36.4430498       5.23      <.0001
    CP quadratic      18.670635     16.2978273       1.15      0.2560
    CP cubic           1.309524     36.4430498       0.04      0.9714

    protein concentration and chicken weights                       3
    The GLM Procedure
                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               3     34311.7666     11437.2555      9.57   <.0001
    Error              68     81279.4678      1195.2863
    Corrected Total    71    115591.2344

    Source      DF      Type I SS    Mean Square   F Value   Pr > F
    CP           1    32741.55648    32741.55648     27.39   <.0001
    CP*CP        1     1568.66674     1568.66674      1.31   0.2560
    CP*CP*CP     1        1.54337        1.54337      0.00   0.9714

                                     Standard
    Parameter         Estimate          Error    t Value    Pr > |t|
    Intercept      1060.706771    17690.23708       0.06      0.9524
    CP               11.320308     2192.80559       0.01      0.9959
    CP*CP            -1.630049       90.32122      -0.02      0.9857
    CP*CP*CP          0.044424        1.23628       0.04      0.9714

    protein concentration and chicken weights                       6
    The GLM Procedure
                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               1     32741.5565     32741.5565     27.66   <.0001
    Error              70     82849.6779      1183.5668
    Corrected Total    71    115591.2344

                                     Standard
    Parameter         Estimate          Error    t Value    Pr > |t|
    Intercept      743.8748249    52.10077470      14.28      <.0001
    CP              11.2196545     2.13317368       5.26      <.0001

Note the MSE: 1183.6 for the linear regression versus 1195.3 for the one-factor model. Linear regression on $x$ is preferred to the one-factor model for Plumstead's data. Multiple comparisons among treatment means might be unnecessary.

ST 512 Exptl Stats for Biol. Sciences II, Weeks 9-10: Multi-factor ANOVA

Problems: Ch 13

• 2 x 2 experiments
• a x b experiments
• three-factor ANOVA
• nested vs crossed designs (not described in packet)

An example of a 2 x 2 study

Cholesterol measurements for random samples of $n_i \equiv 7$ people from four populations are given in the table below. The groups (cohorts) are defined as follows:

I: The population of women younger than 50
II: The population of men younger than 50
III: The population of women 50 years or older
IV: The population of men 50 years or older

    Group    Cholesterol level                     avg                 std dev
    I        221  213  202  183  185  197  162     ybar_I   = 194.7    20
    II       271  192  189  209  227  236  142     ybar_II  = 209.4    41
    III      262  193  224  201  161  178  265     ybar_III = 212.0    40
    IV       192  253  248  278  232  267  289     ybar_IV  = 251.3    32

One-way ANOVA

Model:

$$Y_{ij} = \mu_i + E_{ij} = \mu + \tau_i + E_{ij}, \quad i = 1, 2, 3, 4, \; j = 1, 2, \ldots, 7, \quad E_{ij} \stackrel{iid}{\sim} N(0, \sigma^2)$$

Parameters: $\mu, \tau_1, \tau_2, \tau_3, \tau_4, \sigma^2$, with the constraint $\sum\tau_i = 0$.

One-way ANOVA table:

    The GLM Procedure
    Class     Levels    Values
    cohort         4    I II III IV
    Number of observations    28

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               3    12280.85714     4093.61905      3.46   0.0323
    Error              24    28434.57143     1184.77381
    Corrected Total    27    40715.42857

    R-Square    Coeff Var    Root MSE      y Mean
    0.301627     15.87245    34.42054    216.8571

    Source    DF      Type I SS    Mean Square   F Value   Pr > F
    cohort     3    12280.85714     4093.61905      3.46   0.0323

Conclusion so far: the cohort means $\mu_i$ (or $\mu + \tau_i$) are not plausibly equal, using $\alpha = 0.05$.

Some terminology

Definition: A factor in an experiment or study is a variable whose effect on the response is of primary interest. The values that a factor takes in the experiment are called factor levels or treatments.

Definition: In completely randomized designs, experimental units are randomly assigned to factor levels or treatment groups.

Note: The cholesterol study is NOT a completely
randomized design, as randomization of subjects to different levels of AGE and GENDER isn't possible.

Definition: When the same number of units are used for each treatment, the design is balanced.

In the one-way analysis of the cholesterol data, COHORT is the only factor. This factor can be broken down into two factors in a two-way analysis: AGE (factor A) and GENDER (factor B).

Definition: If there are observations at all combinations of all factors, the design is complete; otherwise it is incomplete.

Exercise:

1. Estimate the mean difference in cholesterol between young men and young women.
2. Estimate the mean difference between old men and old women.
3. Estimate the mean difference between men and women.
4. Estimate the mean difference between older and younger folks.
5. Estimate the mean difference between the differences estimated in 1 and 2.
6. Provide standard errors for all of these estimated contrasts.
7. Specify the vectors defining these contrasts. For example, the first contrast of cohort means can be written $\theta_1 = (-1, 1, 0, 0)\mu = \mu_2 - \mu_1$.

Consider the following contrasts of the cohort cholesterol means in the population:

$$\theta_3 = \tfrac12(-1, 1, -1, 1)\mu$$
$$\theta_4 = \tfrac12(-1, -1, 1, 1)\mu$$
$$\theta_5 = (1, -1, -1, 1)\mu$$

Q: Are these contrasts orthogonal?
Q: True/False: $SS(\hat\theta_3) + SS(\hat\theta_4) + SS(\hat\theta_5) = SS[Trt]$.

Another exercise:

1. Compute the sums of squares for the estimated contrasts in 3, 4 and 5 using the exercise just completed and the fact that if $\hat\theta = \sum c_i\bar{y}_i$ then
   $$SS(\hat\theta) = \frac{\hat\theta^2}{\sum c_i^2/n_i}$$
2. Formulate a test of $H_0: \theta_i = 0$ for each of these three contrasts. Obtain the F-ratio for each of these tests.
3. Obtain the $\alpha = 0.05$ critical region for each test. Compare the observed F-ratios to the critical value and draw conclusions about
   (a) an age effect
   (b) a gender effect
   (c) an age x gender interaction

Types of effects

Two-way ANOVA model for the cholesterol measurements:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + E_{ijk}$$

$i = 1, 2 = a$ and $j = 1, 2 = b$ and $k = 1, 2, \ldots, 7 = n$, with $E_{ijk} \stackrel{iid}{\sim} N(0, \sigma^2)$.

Parameter constraints: $\sum_i\alpha_i = \sum_j\beta_j = 0$, $\sum_i(\alpha\beta)_{ij} \equiv 0$ for each $j$ and $\sum_j(\alpha\beta)_{ij} \equiv 0$ for each $i$.

Factor A (AGE) has $a = 2$ levels: $A_1$: younger and $A_2$: older.
Factor B (GENDER) has $b = 2$ levels: $B_1$: female and $B_2$:
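male.

Three kinds of effects in 2 x 2 designs:

1. Simple effects are simple contrasts.
   • $\mu(A_1B) = \mu_{II} - \mu_{I}$: simple effect of gender for young folks
   • $\mu(AB_1) = \mu_{III} - \mu_{I}$: simple effect of age for women
2. Interaction effects are differences of simple effects:
   $$\mu(AB) = \mu(AB_2) - \mu(AB_1) = (\mu_{IV} - \mu_{II}) - (\mu_{III} - \mu_{I})$$
   • difference between simple age effects for men and women
   • difference between simple gender effects for old and young folks
   • interaction effect of AGE and GENDER
3. Main effects are averages or sums of simple effects:
   $$\mu(A) = \tfrac12\left(\mu(AB_1) + \mu(AB_2)\right), \qquad \mu(B) = \tfrac12\left(\mu(A_1B) + \mu(A_2B)\right)$$

Exercise: Classify the contrasts in the last exercise as simple, interaction or main effects.

Partitioning the treatment SS into $t - 1$ orthogonal components

$$12281 = SS[Trt] = SS(\hat\theta_3) + SS(\hat\theta_4) + SS(\hat\theta_5) = 5103 + 6121 + 1056$$

• $(a - 1)(b - 1)$ df for AB interaction
• $a - 1$ df for main effect of A
• $b - 1$ df for main effect of B

F-test for interaction effect

To test for interaction,

$$H_0: (\alpha\beta)_{11} = (\alpha\beta)_{12} = (\alpha\beta)_{21} = (\alpha\beta)_{22} = 0 \quad \text{vs} \quad H_1: (\alpha\beta)_{ij} \ne 0 \text{ for some } i, j$$

use $\hat\theta_5 = \hat\mu(AB)$ and

$$F = \frac{SS(\hat\theta_5)/((a - 1)(b - 1))}{MS[E]}$$

on 1 and $28 - 4 = 24$ numerator and denominator df.

For the cholesterol data, the estimated interaction effect is

$$\hat\theta_5 = \hat\mu(AB) = (251.3 - 209.4) - (212 - 194.7) = 41.9 - 17.3 = 24.6,$$

the associated sum of squares is

$$SS(\hat\theta_5) = \frac{(24.6)^2}{\frac{(-1)^2}{7} + \frac{(-1)^2}{7} + \frac{1^2}{7} + \frac{1^2}{7}} = \frac{(24.6)^2}{4/7} \approx 1056 \quad \text{(using the unrounded estimate 24.57)}$$

and $F = 1056/1185 = 0.9$, which isn't significant at $\alpha = 0.05$ on 1, 24 df.

F-test for main effects

To test for the main effect of A (AGE),

$$H_0: \alpha_1 = \alpha_2 = 0 \quad \text{vs} \quad H_1: \alpha_1 \ne 0 \text{ or } \alpha_2 \ne 0$$

use $\hat\theta_4 = \hat\mu(A)$ and

$$F = \frac{SS(\hat\theta_4)}{MS[E]}$$

on 1, 24 df.

The estimated main effect of AGE is

$$\hat\theta_4 = \tfrac12(251.3 - 209.4 + 212 - 194.7) = \tfrac12(59.2) = 29.6,$$

the associated sum of squares is

$$SS(\hat\theta_4) = \frac{(29.6)^2}{\frac{(1/2)^2}{7} + \frac{(1/2)^2}{7} + \frac{(1/2)^2}{7} + \frac{(1/2)^2}{7}} = \frac{(29.6)^2}{1/7} \approx 6121 \quad \text{(using the unrounded estimate 29.57)}$$

and $F = 6121/1185 = 5.2$. Since $F(0.05, 1, 24) = 4.26$, the AGE effect is significant at $\alpha = 0.05$.

Similarly, for the main effect of B (GENDER),

$$H_0: \beta_1 = \beta_2 = 0 \quad \text{vs} \quad H_1: \beta_1 \ne 0 \text{ or } \beta_2 \ne 0$$

use $\hat\theta_3 = \hat\mu(B)$ on 1 and 24 df:

$$\hat\theta_3 = \tfrac12(209.4 - 194.7 + 251.3 - 212) = \tfrac12(54) = 27$$

$$SS(\hat\theta_3) = \frac{27^2}{\frac{(1/2)^2}{7} + \frac{(1/2)^2}{7} + \frac{(1/2)^2}{7} + \frac{(1/2)^2}{7}} = \frac{27^2}{1/7} = 5103$$

and $F = 5103/1185 = 4.3$. Since $F(0.05, 1, 24) = 4.26$, the GENDER effect is significant at $\alpha = 0.05$.

Confidence intervals for effects

If $\hat\theta = \sum c_i\bar{y}_i$, a $100(1 - \alpha)\%$ confidence interval is given by

$$\hat\theta \pm t(\alpha/2, N - t)\sqrt{MS[E]\sum\frac{c_i^2}{n_i}}$$

For the cholesterol data, with $t(0.025, 24) = 2.06$, we have a 95% confidence interval for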
the AGE × GENDER interaction effect:
  $24.6 \pm 2.06\sqrt{\tfrac{4}{7}(1185)}$ or $24.6 \pm 2.06(26.0)$ or $(-29, 78)$,
a 95% confidence interval for the AGE effect:
  $29.6 \pm 2.06\sqrt{\tfrac{1}{7}(1185)}$ or $29.6 \pm 2.06(13.0)$ or $(2.7, 56.4)$,
and a 95% confidence interval for the GENDER effect:
  $27.0 \pm 2.06\sqrt{\tfrac{1}{7}(1185)}$ or $27.0 \pm 2.06(13.0)$ or $(0.15, 53.9)$.

The term multiplying 2.06 is the estimated standard error of the estimated contrast $\sum c_i \bar y_i$, namely $\sqrt{MSE \sum c_i^2/n_i}$.

SAS code for Cholesterol problem

    data one;
      input cohort $ @;
      do subj=1 to 7;
        input y @;
        if cohort="I" then do; gender="W"; age="y"; end;
        else if cohort="II" then do; gender="M"; age="y"; end;
        else if cohort="III" then do; gender="W"; age="o"; end;
        else if cohort="IV" then do; gender="M"; age="o"; end;
        output;
      end;
      cards;
    I 221 213 202 183 185 197 162
    II 271 192 189 209 227 236 142
    III 262 193 224 201 161 178 265
    IV 192 253 248 278 232 267 289
    ;
    run;
    /* cohort order is I=(W,y) II=(M,y) III=(W,o) IV=(M,o);            */
    /* minus signs were lost in extraction and are restored here so    */
    /* that the estimates agree in sign with the output listed below   */
    proc glm;
      class cohort;
      model y=cohort/clparm;
      contrast "main effect of age   " cohort -1 -1 1 1;
      contrast "main effect of gender" cohort -1 1 -1 1;
      contrast "interaction effect   " cohort -1 1 1 -1;
      estimate "main effect of age   " cohort -1 -1 1 1/divisor=2;
      estimate "main effect of gender" cohort -1 1 -1 1/divisor=2;
      estimate "interaction effect   " cohort -1 1 1 -1;
    run;
    proc glm;
      class gender age;
      model y=age|gender;
    run;

SAS output (abbreviated) for Cholesterol problem

    The SAS System
    The GLM Procedure
    Class Level Information
    Class     Levels   Values
    cohort    4        I II III IV

                               Sum of
    Source             DF      Squares         Mean Square    F Value   Pr > F
    Model               3      12280.85714     4093.61905     3.46      0.0323
    Error              24      28434.57143     1184.77381
    Corrected Total    27      40715.42857

    R-Square    Coeff Var    Root MSE    y Mean
    0.301627    15.87245     34.42054    216.8571

    Contrast                 DF    Contrast SS    Mean Square    F Value   Pr > F
    main effect of age        1    6121.285714    6121.285714    5.17      0.0323
    main effect of gender     1    5103.000000    5103.000000    4.31      0.0488
    interaction effect        1    1056.571429    1056.571429    0.89      0.3544

                                             Standard
    Parameter                 Estimate       Error         t Value   Pr > |t|
    main effect of age         29.5714286    13.0097426     2.27     0.0323
    main effect of gender      27.0000000    13.0097426     2.08     0.0488
    interaction effect        -24.5714286    26.0194851    -0.94     0.3544

    Parameter                 95% Confidence Limits
    main effect of age          2.7206396    56.4222175
    main effect of gender       0.1492111    53.8507889
    interaction effect        -78.2730065    29.1301493

    The GLM Procedure
    Class     Levels   Values
    gender    2        M W
    age       2        o y

                               Sum of
    Source             DF      Squares         Mean Square    F Value   Pr > F
    Model               3      12280.85714     4093.61905     3.46      0.0323
    Error              24      28434.57143     1184.77381
    Corrected Total    27      40715.42857

    R-Square    Coeff Var    Root MSE    y Mean
    0.301627    15.87245     34.42054    216.8571

    Source        DF    Type I SS      Mean Square    F Value   Pr > F
    age            1    6121.285714    6121.285714    5.17      0.0323
    gender         1    5103.000000    5103.000000    4.31      0.0488
    gender*age     1    1056.571429    1056.571429    0.89      0.3544

Exercise
1. Express the effects below in terms of model parameters $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$.
2. Estimate these effects.

                           AGE
    GENDER                 younger (i = 1)   older (i = 2)
    female (j = 1)         194.7             212.0
    male (j = 2)           209.4             251.3

a × b designs

An example: an entomologist records energy expended ($y$) by N = 27 honeybees at a = 3 temperature (A) levels (20, 30, 40 °C) consuming liquids with b = 3 levels of sucrose concentration (B) (20%, 40%, 60%) in a balanced, completely randomized 3 × 3 design.

    Temp   Suc   Sample
    20     20     3.1    3.7    4.7
    20     40     5.5    6.7    7.3
    20     60     7.9    9.2    9.3
    30     20     6.0    6.9    7.5
    30     40    11.5   12.9   13.4
    30     60    17.5   15.8   14.7
    40     20     7.7    8.3    9.5
    40     40    15.7   14.3   15.9
    40     60    19.1   18.0   19.9

    The SAS System
    The GLM Procedure
    Class    Levels   Values
    TEMP     3        20 30 40
    SUC      3        20 40 60

                               Sum of
    Source             DF      Squares        Mean Square    F Value   Pr > F
    Model               8      630.2474074    78.7809259     87.07     <.0001
    Error              18       16.2866667     0.9048148
    Corrected Total    26      646.5340741

    R-Square    Coeff Var    Root MSE    y Mean
    0.974809    8.795505     0.951218    10.81481

    Source       DF    Type I SS      Mean Square    F Value   Pr > F
    TEMP          2    293.1585185    146.5792593    162.00    <.0001
    SUC           2    309.9585185    154.9792593    171.28    <.0001
    TEMP*SUC      4     27.1303704      6.7825926      7.50    0.0010

3 × 3 honeybee example continued

[Figure: interaction plot for the honeybee data]

Unlike the 2 × 2 study, it is not possible to express the interaction between factors A (TEMP) and B (SUCROSE) using a single number.

    Level of   Level of        ---------------y---------------
    TEMP       SUC        N    Mean          SD
    20         20         3     3.8333333    0.80829038
    20         40         3     6.5000000    0.91651514
    20         60         3     8.8000000    0.7810249
    30         20         3     6.8000000    0.75498344
    30         40         3    12.6000000    0.98488578
    30         60         3    16.0000000    1.41067360
    40         20         3     8.5000000    0.91651514
    40         40         3    15.3000000    0.87177979
    40         60         3    19.0000000    0.95393920

The plot above is called an interaction plot. In 2 × 2 designs, Rao distinguishes between "qualitative" and "quantitative" interactions, depending on whether the signs of the two simple effects are the same or different at the two levels of the other factor.

Exercise: Obtain an interaction plot for the cholesterol data. Characterize the observed interaction as qualitative or quantitative.

Partitioning SS[Tot] in a × b design

Two-way ANOVA model:
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + E_{ijk}$, $i = 1, 2, \dots, a$, $j = 1, 2, \dots, b$ and $k = 1, 2, \dots, n$.

Deviations:
  total: $y_{ijk} - \bar y_{+++}$
  due to level $i$ of factor A: $\bar y_{i++} - \bar y_{+++}$
  due to level $j$ of factor B: $\bar y_{+j+} - \bar y_{+++}$
  due to levels $i$ of factor A and $j$ of factor B, after subtracting main effects:
    $\bar y_{ij+} - \bar y_{+++} - (\bar y_{i++} - \bar y_{+++}) - (\bar y_{+j+} - \bar y_{+++}) = \bar y_{ij+} - \bar y_{i++} - \bar y_{+j+} + \bar y_{+++}$

$SS[Tot] = \sum_i \sum_j \sum_k (y_{ijk} - \bar y_{+++})^2$
$SS[A] = \sum_i \sum_j \sum_k (\bar y_{i++} - \bar y_{+++})^2$
$SS[B] = \sum_i \sum_j \sum_k (\bar y_{+j+} - \bar y_{+++})^2$
$SS[AB] = \sum_i \sum_j \sum_k (\bar y_{ij+} - \bar y_{i++} - \bar y_{+j+} + \bar y_{+++})^2$
$SS[E] = \sum_i \sum_j \sum_k (y_{ijk} - \bar y_{ij+})^2$

$y_{ijk} - \bar y_{+++} = (\bar y_{i++} - \bar y_{+++}) + (\bar y_{+j+} - \bar y_{+++}) + (\bar y_{ij+} - \bar y_{i++} - \bar y_{+j+} + \bar y_{+++}) + (y_{ijk} - \bar y_{ij+})$

Square both sides; the cross products vanish. See output, p. 13; note additivity. Also, Type I and III SS are equal.

a × b example continued

The test for an interaction effect is analogous to p. 7:
$H_0: (\alpha\beta)_{ij} \equiv 0$ vs $H_1: (\alpha\beta)_{ij} \ne 0$ for some $i, j$
$F = \dfrac{MS[AB]}{MS[E]}$ on $(a-1)(b-1)$ and $N - ab$ numerator, denominator df.

For the honeybee data,
$SS[AB] = n \sum_{i=1}^{3} \sum_{j=1}^{3} (\bar y_{ij+} - \bar y_{i++} - \bar y_{+j+} + \bar y_{+++})^2 = 27.1$
$F = \dfrac{27.1/4}{0.9} = 7.5$,
which is highly significant ($p = 0.001$) on 4, 18 df.

We could proceed to test for main effects, but we won't.
Q: Why not?
A: Because the effect of one factor depends on the level of the other factor, it doesn't make sense to talk about main effects.

If one insists on main effects, the appropriate F ratios are
$F_A = \dfrac{SS[A]/(a-1)}{MSE}$ on $a-1$, $N-ab$ df
$F_B = \dfrac{SS[B]/(b-1)}{MSE}$ on $b-1$, $N-ab$ df

Another a × b design, no interaction

Yields on 36 tomato crops from a balanced, complete, crossed design with a = 3 varieties (A) at b = 4 planting densities (B).

    Variety   Density (k/hectare)   Sample
    1         10                     7.9    9.2   10.5
    2         10                     8.1    8.6   10.1
    3         10                    15.3   16.1   17.5
    1         20                    11.2   12.8   13.3
    2         20                    11.5   12.7   13.7
    3         20                    16.6   18.5   19.2
    1         30                    12.1   12.6   14.0
    2         30                    13.7   14.4   15.4
    3         30                    18.0   20.8   21.0
    1         40                     9.1   10.8   12.5
    2         40                    11.3   12.5   14.5
    3         40                    17.2   18.4   18.9

ANOVA table

    The SAS System
    The GLM Procedure
    Class   Levels   Values
    a       3        1 2 3
    b       4        10 20 30 40
    Number of observations 36

                               Sum of
    Source             DF      Squares        Mean Square    F Value   Pr > F
    Model              11      422.3155556    38.3923232     24.22     <.0001
    Error              24       38.0400000     1.5850000
    Corrected Total    35      460.3555556

    R-Square    Coeff Var    Root MSE    y Mean
    0.917368    9.064568     1.258968    13.88889

    Source    DF    Type I SS      Mean Square    F Value   Pr > F
    a          2    327.5972222    163.7986111    103.34    <.0001
    b          3     86.6866667     28.8955556     18.23    <.0001
    a*b        6      8.0316667      1.3386111      0.84    0.5484

Analysis of replicated two (or more) factor designs often proceeds according to the following directions:
1. Check for interaction.
2. If no interaction, analyze main effects.
3. If interaction, analyze simple effects.

Since there is no evidence of interaction, we proceed to analyze main effects. The F ratios for factors A and B are each highly significant ($p < 0.0001$).

    Level of        --------------y--------------
    a          N    Mean          Std Dev
    1          12   11.3333333    1.88309867
    2          12   12.2083333    2.34887142
    3          12   18.1250000    1.73369023

    Level of        --------------y--------------
    b          N    Mean          Std Dev
    10         9    11.4777778    3.75458978
    20         9    14.3888889    2.96835158
    30         9    15.7777778    3.36480972
    40         9    13.9111111    3.5325077

A conventional look at main effects is just to make pairwise comparisons among marginal means, after averaging over other factors. Pairwise comparisons of density means using Tukey's procedure with $\alpha = 0.05$ are given below. Use `means b/tukey;` to obtain the output.

    The GLM Procedure
    Tukey's Studentized Range (HSD) Test for y
    NOTE: This test controls the Type I experimentwise error rate, but it
    generally has a higher Type II error rate than REGWQ.

    Alpha                                    0.05
    Error Degrees of Freedom                 24
    Error Mean Square                        1.585
    Critical Value of Studentized Range      3.90126
    Minimum Significant Difference           1.6372

    Means with the same letter are not significantly different.

    Tukey Grouping     Mean       N    b
         A             15.7778    9    30
         A
    B    A             14.3889    9    20
    B
    B                  13.9111    9    40
         C             11.4778    9    10

A three-factor example

In a balanced, complete, crossed design, N = 36 shrimp were randomized to abc = 12 treatment combinations from the factors below:
  A1: Temperature at 25 °C
  A2: Temperature at 35 °C
  B1: Density of shrimp population at 80 shrimp/40 l
  B2: Density of shrimp population at 160 shrimp/40 l
  C1: Salinity at 10 units
  C2: Salinity at 25 units
  C3: Salinity at 40 units

The response variable of interest is weight gain $Y_{ijkl}$ after four weeks.

Three-way ANOVA model:
$Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + E_{ijkl}$
$i = 1, 2$; $j = 1, 2$; $k = 1, 2, 3$; $l = 1, 2, 3$; $E_{ijkl} \overset{iid}{\sim} N(0, \sigma^2)$

Many constraints, such as
$\sum_i n_i \alpha_i = \sum_j n_j \beta_j = \sum_k n_k \gamma_k = 0$,
where $n_i$ denotes the number of observations at the $i$th level of factor A.

    The GLM Procedure
                               Sum of
    Source             DF      Squares         Mean Square    F Value   Pr > F
    Model              11      467636.3333     42512.3939     14.64     <.0001
    Error              24       69690.6667      2903.7778
    Corrected Total    35      537327.0000

    R-Square    Coeff Var    Root MSE    y Mean
    0.870301    19.30270     53.88671    279.1667

    Source    DF    Type I SS       Mean Square     F Value
    a          1     15376.0000      15376.0000      5.30
    b          1     21218.7778      21218.7778      7.31
    a*b        1      8711.1111       8711.1111      3.00
    c          2     96762.5000      48381.2500     16.66
    a*c        2    300855.1667     150427.5833     51.80
    b*c        2       674.3889        337.1944      0.12
    a*b*c      2     24038.3889      12019.1944      4.14

[Figure: interaction plot for shrimp weight gains]

    Level of   Level of
    a          b          N    Std Dev
    25         80         9    185.106051
    25         160        9    128.739077
    35         80         9     85.475305
    35         160        9     57.953525

    Level of   Level of
    a          c          N    Std Dev
    25         10         6     15.109600
    25         25         6    114.206246
    25         40         6     69.987618
    35         10         6     56.450864
    35         25         6     45.375838
    35         40         6     38.096807

    Level of   Level of
    b          c          N    Std Dev
    80         10         6    188.065326
    80         25         6    122.218520
    80         40         6     77.415761
    160        10         6    144.240655
    160        25         6     74.529636
    160        40         6     32.788718

[Table: means and standard deviations of y for each of the 12 combinations of a (25, 35), b (80, 160) and c (10, 25, 40)]

Interpretation of second-order interaction

First-order interaction is between two factors.
Second-order interaction is between three factors.

Consider the AB interaction at each of the three levels $C_1$, $C_2$, $C_3$. To do this,
look at three 2 × 2 tables as follows:

    C1           B
                 B1     B2
    A    A1       70     71
         A2      408    331

    C2           B
                 B1     B2
    A    A1      466    333
         A2      275    312

    C3           B
                 B1     B2
    A    A1      359    252
         A2      243    231

Q: How is the ABC interaction manifested here?
A: We could compute $\hat\mu[ABC_1]$, $\hat\mu[ABC_2]$, $\hat\mu[ABC_3]$ and see if these first-order interactions, with C fixed, are the same. We know they are not, by the $F_{ABC}$ ratio and p-value.

$\hat\mu[ABC_1] = (408 - 70) - (331 - 71) \approx 77$

Exercise: Obtain $\hat\mu[ABC_2]$ and $\hat\mu[ABC_3]$, as well as AB interaction plots for $C_1$, $C_2$ and $C_3$. Interpret the plots.

Getting interaction contrasts using the ESTIMATE statement in GLM

To get SAS to estimate an interaction like $\mu[ABC_1]$, the AB interaction at $C_1$, you must specify the parameters involved. We saw on the last page that

$\hat\mu[ABC_1] = (\bar y_{211} - \bar y_{111}) - (\bar y_{221} - \bar y_{121})$
  = (effect of A at $B_1, C_1$) − (effect of A at $B_2, C_1$)

Using the model to specify parameters, we can write

$E(\bar y_{211}) = \mu + \alpha_2 + \beta_1 + \gamma_1 + (\alpha\beta)_{21} + (\alpha\gamma)_{21} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{211}$
$E(\bar y_{111}) = \mu + \alpha_1 + \beta_1 + \gamma_1 + (\alpha\beta)_{11} + (\alpha\gamma)_{11} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}$
$E(\bar y_{221}) = \mu + \alpha_2 + \beta_2 + \gamma_1 + (\alpha\beta)_{22} + (\alpha\gamma)_{21} + (\beta\gamma)_{21} + (\alpha\beta\gamma)_{221}$
$E(\bar y_{121}) = \mu + \alpha_1 + \beta_2 + \gamma_1 + (\alpha\beta)_{12} + (\alpha\gamma)_{11} + (\beta\gamma)_{21} + (\alpha\beta\gamma)_{121}$

Combine these (with the signs above) to get the contrast we're interested in, $\mu[ABC_1]$. Note that all terms vanish except the second-order parameters and the first-order $\alpha\beta$ parameters:

$\mu[ABC_1] = (\alpha\beta)_{21} - (\alpha\beta)_{11} - (\alpha\beta)_{22} + (\alpha\beta)_{12} + (\alpha\beta\gamma)_{211} - (\alpha\beta\gamma)_{111} - (\alpha\beta\gamma)_{221} + (\alpha\beta\gamma)_{121}$

These can be rearranged so that they agree with the ordering of the treatment combinations employed by the ESTIMATE statement:

$\mu[ABC_1] = -(\alpha\beta)_{11} + (\alpha\beta)_{12} + (\alpha\beta)_{21} - (\alpha\beta)_{22} - (\alpha\beta\gamma)_{111} + (\alpha\beta\gamma)_{121} + (\alpha\beta\gamma)_{211} - (\alpha\beta\gamma)_{221}$

ESTIMATE statement with two two-level factors

If A appears before B in the CLASS statement, SAS uses the following ordering for $\alpha\beta$ terms when specifying contrasts in ESTIMATE or CONTRAST statements:

    A1B1  A1B2  A2B1  A2B2

ESTIMATE statement with two two-level factors and a three-level factor

Similarly, with a `CLASS a b c;` statement, the following order is used for second-order interaction parameters:

    A1B1C1  A1B1C2  A1B1C3  A1B2C1  A1B2C2  A1B2C3
    A2B1C1  A2B1C2  A2B1C3  A2B2C1  A2B2C2  A2B2C3

If $\alpha\beta$ is a vector of AB interaction effects with the default ordering, and likewise for $\alpha\beta\gamma$, then
the contrast $\mu[ABC_1]$ on p. 25 can be written

$\mu[ABC_1] = (-1, 1, 1, -1)\,\alpha\beta + (-1, 0, 0, 1, 0, 0, 1, 0, 0, -1, 0, 0)\,\alpha\beta\gamma$

Similarly for $\mu[ABC_2]$ and $\mu[ABC_3]$:

$\mu[ABC_2] = (-1, 1, 1, -1)\,\alpha\beta + (0, -1, 0, 0, 1, 0, 0, 1, 0, 0, -1, 0)\,\alpha\beta\gamma$
$\mu[ABC_3] = (-1, 1, 1, -1)\,\alpha\beta + (0, 0, -1, 0, 0, 1, 0, 0, 1, 0, 0, -1)\,\alpha\beta\gamma$

    proc glm;
      class a b c;
      model y=a|b|c;
      estimate "theta1 ABC1" a*b -1 1 1 -1 a*b*c -1 0 0 1 0 0 1 0 0 -1 0 0;
      estimate "theta2 ABC2" a*b -1 1 1 -1 a*b*c 0 -1 0 0 1 0 0 1 0 0 -1 0;
      estimate "theta3 ABC3" a*b -1 1 1 -1 a*b*c 0 0 -1 0 0 1 0 0 1 0 0 -1;
      estimate "t1 - av(t2,t3)" a*b*c -2 1 1 2 -1 -1 2 -1 -1 -2 1 1/divisor=2;
      means a|b|c;
    run;

                                        Standard
    Parameter           Estimate        Error         t Value   Pr > |t|
    theta1 ABC1           77.333333     62.2230159     1.24     0.2259
    theta2 ABC2         -169.666667     62.2230159    -2.73     0.0118
    theta3 ABC3          -94.333333     62.2230159    -1.52     0.1426
    t1 - av(t2,t3)       209.333333     76.2073196     2.75     0.0112
    t2 - av(t1,t3)      -161.166667     76.2073196    -2.11     0.0450

The $F_{ABC}$ ratio indicates that the three contrasts estimated above are not plausibly ($p < 0.05$) equal: $\mu[ABC_1]$, $\mu[ABC_2]$ and $\mu[ABC_3]$ differ significantly. Interaction plots from p. 24 suggest the comparison

$\theta = \mu[ABC_1] - \frac{1}{2}(\mu[ABC_2] + \mu[ABC_3])$

Adding up all the coefficients in this combination yields the contrast below to use with an ESTIMATE statement:

$(-1, \tfrac{1}{2}, \tfrac{1}{2}, 1, -\tfrac{1}{2}, -\tfrac{1}{2}, 1, -\tfrac{1}{2}, -\tfrac{1}{2}, -1, \tfrac{1}{2}, \tfrac{1}{2})$

PS: Three-factor interactions are not easily interpreted. Effects can sometimes be made additive through a transformation of the response.

Activity w/ ESTIMATE statement

A linear function of parameters, $\theta$, is estimable if and only if there is a linear combination of the $Y$'s whose expected value is $\theta$.

Exercise: identify the estimable contrasts in each of the ESTIMATE statements in the correspondence below, which pertains to a 3 × 2 study with factors and levels:

    Factor          Levels
    A  additive     acetic, nothing, sorbate
    B  uv           0, 1

To: osborne@stat.ncsu.edu
Subject: non estimatable estimate statements

I am still having trouble with the estimate statements. The only ones that work for the additive*uv interaction are where we contrast the same additive over the uv. Can anything be done about this?

    proc glm;
      class additive uv;
      model ycount=additive uv uv*additive;
      estimate 'acetic uv0 vs acetic uv1'   uv 1 -1 uv*additive 1 -1 0 0 0 0;
      estimate 'acetic uv0 vs nothing uv0'  uv 1 -1 uv*additive 1 0 -1 0 0 0;
      estimate 'acetic uv0 vs nothing uv1'  uv 1 -1 uv*additive 1 0 0 -1 0 0;
      estimate 'acetic uv0 vs sorbate uv0'  uv 1 -1 uv*additive 1 0 0 0 -1 0;
      estimate 'acetic uv0 vs sorbate uv1'  uv 1 -1 uv*additive 1 0 0 0 0 -1;
      estimate 'acetic uv1 vs nothing uv0'  uv 1 -1 uv*additive 0 1 -1 0 0 0;
      estimate 'acetic uv1 vs nothing uv1'  uv 1 -1 uv*additive 0 1 0 -1 0 0;
      estimate 'acetic uv1 vs sorbate uv0'  uv 1 -1 uv*additive 0 1 0 0 -1 0;
      estimate 'acetic uv1 vs sorbate uv1'  uv 1 -1 uv*additive 0 1 0 0 0 -1;
      estimate 'nothing uv0 vs nothing uv1' uv 1 -1 uv*additive 0 0 1 -1 0 0;
      estimate 'nothing uv0 vs sorbate uv0' uv 1 -1 uv*additive 0 0 1 0 -1 0;
      estimate 'nothing uv0 vs sorbate uv1' uv 1 -1 uv*additive 0 0 1 0 0 -1;
      estimate 'nothing uv1 vs sorbate uv0' uv 1 -1 uv*additive 0 0 0 1 -1 0;
      estimate 'nothing uv1 vs sorbate uv1' uv 1 -1 uv*additive 0 0 0 1 0 -1;
      estimate 'sorbate uv0 vs sorbate uv1' uv 1 -1 uv*additive 0 0 0 0 1 -1;
      estimate 'uv0 vs uv1' uv 1 -1;
      estimate 'acetic vs nothing' additive 1 -1;
      estimate 'acetic vs sorbate' additive 1 0 -1;
      estimate 'nothing vs sorbate' additive 0 1 -1;

ST 512 Exptl Stats for Biol. Sciences II, Fall 2003, Dr. Jason A. Osborne

Supplement on design imbalance

Recall the 2 × 2 cholesterol study. Suppose the study is unbalanced and the data are given by

                  Gender
    Age           Male                             Female                              Marginal mean
    young         271, 192, 189, 209, 227, 236     162                                 212.3
    old           289                              262, 193, 224, 201, 161, 178, 265   221.6
    Marginal mean 230.4                            205.8

with cell means $\bar y_{11} = 220.7$, $\bar y_{12} = 162$, $\bar y_{21} = 289$, $\bar y_{22} = 212$.

Consider an additive two-factor ANOVA model:
$Y_{ijk} = \mu + \alpha_i + \beta_j + E_{ijk}$

Exercise: finish the parametric expressions for the expected values below:
$E(\bar Y_{1++}) = \mu + \alpha_1 + \dfrac{6\beta_1 + \beta_2}{7}$
$E(\bar Y_{2++}) =$ ____
$E(\bar Y_{1++}) - E(\bar Y_{2++}) = \alpha_1 - \alpha_2 +$ ____ $\beta_1 +$ ____ $\beta_2$

Marginal sample means are not very useful in this unbalanced study.
Q: How are group population means estimated then?
A: Least squares means.

What would be estimated by the marginal means if the design were balanced?

Parametric expressions for all the population means of interest are given below for the additive
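model.

    Population group    Effect of interest                                       Estimate
    Young folks         $\mu + \alpha_1 + \frac{1}{2}(\beta_1 + \beta_2)$        188.03
    Older folks         $\mu + \alpha_2 + \frac{1}{2}(\beta_1 + \beta_2)$
    Men                 $\mu + \frac{1}{2}(\alpha_1 + \alpha_2) + \beta_1$
    Women               $\mu + \frac{1}{2}(\alpha_1 + \alpha_2) + \beta_2$
    Young men           $\mu + \alpha_1 + \beta_1$
    Older men           $\mu + \alpha_2 + \beta_1$
    Young women         $\mu + \alpha_1 + \beta_2$
    Older women         $\mu + \alpha_2 + \beta_2$

Invoking the command `lsmeans age gender;` will report least squares estimates for the first four means above.

    The GLM Procedure
    Least Squares Means
                              Standard
    gender    y LSMEAN        Error         Pr > |t|
    m         251.525773      16.233482     <.0001
    w         183.597938      15.842256     <.0001

                              Standard
    age       y LSMEAN        Error         Pr > |t|
    jr        188.025773      16.233482     <.0001
    sr        247.097938      15.842256     <.0001

All of these quantities are estimated using linear combinations of the treatment means, of the form

$\hat\theta = c_{11}\bar y_{11} + c_{12}\bar y_{12} + c_{21}\bar y_{21} + c_{22}\bar y_{22}$

The coefficients are chosen so that $E(\hat\theta) = \theta$ and $\mathrm{Var}(\hat\theta) = \sigma^2 \sum_{i,j} c_{ij}^2/n_{ij}$ is minimized.

Example: What are the coefficients for the contrast which estimates the population mean for young folks, $\mu + \alpha_1 + \frac{1}{2}(\beta_1 + \beta_2)$, with minimum variance?

    $c_{11} + c_{12} = 1$             (coeff for $\alpha_1$)
    $c_{21} + c_{22} = 0$             (coeff for $\alpha_2$)
    $c_{11} + c_{21} = \frac{1}{2}$   (coeff for $\beta_1$)
    $c_{12} + c_{22} = \frac{1}{2}$   (coeff for $\beta_2$)

The variance is then proportional to

$\dfrac{c_{11}^2}{6} + \dfrac{c_{12}^2}{1} + \dfrac{c_{21}^2}{1} + \dfrac{c_{22}^2}{7} = \dfrac{c_{11}^2}{6} + (1 - c_{11})^2 + (\tfrac{1}{2} - c_{11})^2 + \dfrac{(c_{11} - \tfrac{1}{2})^2}{7}$,

which is minimized at $c_{11} = \frac{66}{97}$, found by setting the derivative to zero and solving. The least squares mean is then

$\frac{66}{97}\bar y_{11} + \left(1 - \frac{66}{97}\right)\bar y_{12} + \left(\frac{1}{2} - \frac{66}{97}\right)\bar y_{21} + \left(\frac{66}{97} - \frac{1}{2}\right)\bar y_{22} = 188.03$

Similarly, for old folks, the contrast with minimum variance has

$c_{11} = -c_{12} = \frac{18}{97}$ and $c_{21} = \frac{1}{2} - \frac{18}{97}$, $c_{22} = \frac{1}{2} + \frac{18}{97}$,

so that the estimate for the old folks mean is

$247.1 = \frac{18}{97}\bar y_{11} - \frac{18}{97}\bar y_{12} + \left(\frac{1}{2} - \frac{18}{97}\right)\bar y_{21} + \left(\frac{1}{2} + \frac{18}{97}\right)\bar y_{22}$

Exercise: obtain least squares estimators and estimates of the marginal means for men and women, as well as for each age × gender combination.

Q: Is there an age effect? Should we base our conclusion on

$\sum_i \sum_j \sum_k (\bar y_{i++} - \bar y_{+++})^2 = 325.6$?

A: Might not be a good idea if factor B has an effect.

This is the type I SS for Age if Age is the first factor entered into the model, or the so-called unadjusted sum of squares for age.

Alternatively, consider the contrast $\theta_{age} = \alpha_1 - \alpha_2$. We can obtain the coefficients of the LS estimate of this contrast and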
then use them to get $SS(\hat\theta_{age})$, which is the sum of squares for the age effect adjusted for gender, or the type II sum of squares for age.

$\hat\theta_{age} = \hat\alpha_1 - \hat\alpha_2 = c_{11}\bar y_{11} + c_{12}\bar y_{12} + c_{21}\bar y_{21} + c_{22}\bar y_{22}$

where

    $c_{11} + c_{12} = 1$     (coeff for $\alpha_1$)
    $c_{21} + c_{22} = -1$    (coeff for $\alpha_2$)
    $c_{11} + c_{21} = 0$     (coeff for $\beta_1$)
    $c_{12} + c_{22} = 0$     (coeff for $\beta_2$)

$\mathrm{Var}(\hat\theta_{age})$ is minimized when $c_{11} = \frac{48}{97}$, which leads to $\hat\theta_{age} = -59.07$ with

$SS(\hat\theta_{age}) = \dfrac{(-59.07)^2}{\left(\frac{48}{97}\right)^2\left(\frac{1}{6} + \frac{1}{1}\right) + \left(\frac{49}{97}\right)^2\left(\frac{1}{1} + \frac{1}{7}\right)} = 6044$

    I   221 213 202 183 185 197 162
    II  271 192 189 209 227 236 142
    III 262 193 224 201 161 178 265
    IV  192 253 248 278 232 267 289

    options ls=75;
    data one;
      input gender $ age $ @;
      do i=1 to 7;
        input y @@;
        output;
      end;
      /* missing values (.) restored; the output below reports 28 records, 15 usable */
      cards;
    w jr . . . . . . 162
    m jr 271 192 189 209 227 236 .
    w sr 262 193 224 201 161 178 265
    m sr . . . . . . 289
    ;
    run;
    proc glm;
      class age gender;
      model y=age gender/solution;
      lsmeans gender age/stderr;
      estimate "lsmean for young folks" intercept 2 age 2 0 gender 1 1/divisor=2;
      estimate "lsmean for older folks" intercept 2 age 0 2 gender 1 1/divisor=2;
      estimate "lsmean for men"         intercept 2 age 1 1 gender 2 0/divisor=2;
      estimate "lsmean for women"       intercept 2 age 1 1 gender 0 2/divisor=2;
      estimate "lsmean for young men"   intercept 1 age 1 0 gender 1 0;
      estimate "lsmean for young women" intercept 1 age 1 0 gender 0 1;
      estimate "lsmean for old men"     intercept 1 age 0 1 gender 1 0;
      estimate "lsmean for old women"   intercept 1 age 0 1 gender 0 1;
      contrast "age effect" age 1 -1;
      estimate "age effect" age 1 -1;
      contrast "gender effect" gender 1 -1;
      means gender age;
    run;

    The SAS System
    The GLM Procedure
    Class Level Information
    Class     Levels   Values
    age       2        jr sr
    gender    2        m w
    Number of observations 28
    NOTE: Due to missing values, only 15 observations can be used in this analysis.

                               Sum of
    Source             DF      Squares         Mean Square    F Value   Pr > F
    Model               2       8318.06735     4159.03368     3.42      0.0669
    Error              12      14606.86598     1217.23883
    Corrected Total    14      22924.93333

    R-Square    Coeff Var    Root MSE    y Mean
    0.362839    16.05812     34.88895    217.2667

    Source    DF    Type I SS       Mean Square     F Value   Pr > F
    age        1     325.629762      325.629762     0.27      0.6144
    gender     1    7992.437592     7992.437592     6.57      0.0249

    Source    DF    Type III SS     Mean Square     F Value   Pr > F
    age        1    6044.348306     6044.348306     4.97      0.0457
    gender     1    7992.437592     7992.437592     6.57      0.0249

                                       Standard
    Parameter     Estimate             Error          t Value   Pr > |t|
    Intercept     213.1340206 B        12.77243521    16.69     <.0001
    age jr        -59.0721649 B        26.50916484    -2.23     0.0457
    age sr          0.0000000 B        .              .         .
    gender m       67.9278351 B        26.50916484     2.56     0.0249
    gender w        0.0000000 B        .              .         .

    Least Squares Means
                              Standard
    gender    y LSMEAN        Error         Pr > |t|
    m         251.525773      16.233482     <.0001
    w         183.597938      15.842256     <.0001

                              Standard
    age       y LSMEAN        Error         Pr > |t|
    jr        188.025773      16.233482     <.0001
    sr        247.097938      15.842256     <.0001

    Contrast         DF    Contrast SS     Mean Square     F Value   Pr > F
    age effect        1    6044.348306     6044.348306     4.97      0.0457
    gender effect     1    7992.437592     7992.437592     6.57      0.0249

                                              Standard
    Parameter                  Estimate       Error         t Value   Pr > |t|
    lsmean for young folks     188.025773     16.2334818    11.58     <.0001
    lsmean for older folks     247.097938     15.8422561    15.60     <.0001
    lsmean for men             251.525773     16.2334818    15.49     <.0001
    lsmean for women           183.597938     15.8422561    11.59     <.0001
    lsmean for young men       221.989691     13.7197962    16.18     <.0001
    lsmean for young women     154.061856     26.2714097     5.86     <.0001
    lsmean for old men         281.061856     26.2714097    10.70     <.0001
    lsmean for old women       213.134021     12.7724352    16.69     <.0001
    age effect                 -59.072165     26.5091648    -2.23     0.0457

These notes were adapted from one of Dr. Dickey's lectures: httpwwwstatncsuedu st512infodickeycrsnotesrnotesunbalhtm

Block Designs

Reading: Ch. 15.3, 15.4, 15.6

Motivation: sometimes the variability of responses among experimental units is large, making detection of differences among treatment means $\mu_1, \mu_2, \dots, \mu_t$ difficult.

In a randomized block design (RBD):
1. Matched sets of experimental units are formed, each consisting of $t$ units. The goal is reduced variability of response within a block. That is, the units within a block are homogeneous. Variance between blocks is ok.
2. Within each block, the units are randomly assigned to the $t$ treatments. This is restricted randomization, as opposed to a completely randomized design.

RBD, first example

Acrophobia can be treated in several ways:
• Contact desensitization: activity/task demonstrated, then walked through while a therapist is in constant contact with the subject.
• Demonstration participation: therapist talks subject through task without any contact.
• Live modeling: subject simply watches completion of task.

Severity of acrophobia is measured by HAT (Height Avoidance Test) scores. The point of the study is to investigate the effectiveness of the three therapies. The study will measure HAT scores before and after therapy. There is considerable heterogeneity in the degree to which acrophobia afflicts subjects, so N = 15 subjects will be put into blocks according to their original HAT score, and then the subjects within each block will be randomly assigned to the therapies, one subject per therapy.

Let $Y_{ij}$ denote the change in HAT score for the subject in block $j$ assigned to treatment $i$. A bigger score means a bigger reduction in fear.

                     Therapy
    Block j          Contact            Demonstration    Live
                     Desensitization    Participation    Modeling    $\bar y_{+j}$
    1                 8                  2               -2           2.67
    2                11                  1                0           4
    3                 9                 12                6           9
    4                16                 11                2           9.67
    5                24                 19               11          18
    Avg $\bar y_{i+}$   13.6              9                3.4

RBD example

    Source           Sum of Squares    df    Mean Square    F
    A: Therapies     260.9              2    130.5          15.3
    B: Blocks        438                4    109.5          12.8
    Error             68.4              8      8.6
    Total            767.3             14

Data taken from Larsen and Marx (1986).

$SS[Tot] = SS[A] + SS[B] + SS[E]$

$SS[Tot] = \sum_{i=1}^{a}\sum_{j=1}^{b} (y_{ij} - \bar y_{++})^2$
$SS[A] = \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar y_{i+} - \bar y_{++})^2 = b\sum_{i=1}^{a} (\bar y_{i+} - \bar y_{++})^2$
$SS[B] = \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar y_{+j} - \bar y_{++})^2 = a\sum_{j=1}^{b} (\bar y_{+j} - \bar y_{++})^2$
$SS[E] = \sum_{i=1}^{a}\sum_{j=1}^{b} (y_{ij} - \bar y_{i+} - \bar y_{+j} + \bar y_{++})^2$

Note that

$y_{ij} - \bar y_{++} = (\bar y_{i+} - \bar y_{++}) + (\bar y_{+j} - \bar y_{++}) + (y_{ij} - \bar y_{i+} - \bar y_{+j} + \bar y_{++})$
  = therapy effect + block effect + error

F tests in the RBD

A model for the RBD with fixed treatment (therapy) effects is

$Y_{ij} = \mu + \alpha_i + \beta_j + E_{ij}$

where $i = 1, \dots, a$, $j = 1, \dots, b$ and $E_{ij} \overset{iid}{\sim} N(0, \sigma^2)$.

Mean squares are obtained by dividing SS by df:

$MS[A] = \dfrac{SS[A]}{a-1}$, $MS[B] = \dfrac{SS[B]}{b-1}$, $MS[E] = \dfrac{SS[E]}{N-a-b+1}$

The primary hypothesis of interest is for a therapy effect:
$H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$ vs $H_1$: not all equal.
Using level $\alpha$, reject $H_0$ if

$\dfrac{MS[A]}{MS[E]} > F(\alpha, a-1, N-a-b+1)$

The EMS for error is $E(MS[E]) = \sigma^2$, but only under the additivity assumption that there is no block × trt interaction. This assumption is required for inference about treatment effects in the absence of
replication common to block designs.

For the HAT scores, $F_A = MS[A]/MS[E] = 130.5/8.6 = 15.3$, which has $p < 0.01$ on 2, 8 df, providing strong evidence of a therapy effect.

Inference, including MCPs, for CONTRASTS involving fixed effects is the same in the complete RBD as it is for other factorial experiments with fixed effects. E.g. $\sqrt{MS[E]/b}$.

Multiple comparisons among means in the RBD

Simultaneous 95% confidence intervals for contrasts like $c_1\mu_1 + c_2\mu_2 + \cdots + c_a\mu_a$ look like

$c_1\bar y_{1+} + c_2\bar y_{2+} + \cdots + c_a\bar y_{a+} \pm \sqrt{(a-1)F^{*}\,MS[E]\sum_i \dfrac{c_i^2}{b}}$

where $F^{*} = F(0.05, a-1, N-a-b+1)$. For simultaneous pairwise differences these look like

$\bar y_{i+} - \bar y_{j+} \pm \sqrt{(a-1)F^{*}\,MS[E]\,\dfrac{2}{b}}$

where the term after $\pm$ is the minimum significant difference.

For the HAT scores, $\bar y_{1+} = 13.6$, $\bar y_{2+} = 9$, $\bar y_{3+} = 3.4$ and

$\sqrt{(a-1)F^{*}\,MS[E]\left(\tfrac{1}{5} + \tfrac{1}{5}\right)} = \sqrt{2(4.46)(8.6)\tfrac{2}{5}} = 5.5$

with $\bar y_{3+}$ (LM) significantly different from the other two. LM brings about significantly less improvement than the other two therapies.

The minimum significant difference term (Tukey) in the RBD is

$q(\alpha, a, N-a-b+1)\sqrt{\dfrac{MS[E]}{b}}$

For the acrophobia RBD, the term is $4.04 \times \sqrt{8.55/5} = 5.3$.

`means therapy/scheffe tukey;` will get the job done in SAS.

    Tukey's Studentized Range (HSD) Test for variable: DIFF
    NOTE: This test controls the type I experimentwise error rate, but
    generally has a higher type II error rate than REGWQ.

    Alpha= 0.05  df= 8  MSE= 8.55
    Critical Value of Studentized Range= 4.041
    Minimum Significant Difference= 5.2843
    Means with the same letter are not significantly different.

    Tukey Grouping    Mean      N    TREAT
         A            13.600    5    Contact Desensit
         A             9.000    5    Demonstration Pa
         B             3.400    5    Live Modelling

    Scheffe's test for variable: DIFF
    NOTE: This test controls the type I experimentwise error rate but
    generally has a higher type II error rate than REGWF for all
    pairwise comparisons.

    Alpha= 0.05  df= 8  MSE= 8.55
    Critical Value of F= 4.45897
    Minimum Significant Difference= 5.5226

    Scheffe Grouping  Mean      N    TREAT
         A            13.600    5    Contact Desensit
         A
         A             9.000    5    Demonstration Pa
         B             3.400    5    Live Modelling

Another example, blocks are random
(this material to be covered after random effects have been introduced)

A study investigates the efficiency of four
different unit-dose injection systems. For each system, an individual subject (pharmacist or nurse) measures the average time it takes to remove a unit of each system from its outer package, assemble it, and simulate an injection. Data from Larsen and Marx (1986).

Average times (seconds) for implementing systems

    Subject          Standard    Vari-Ject    Unimatic    Tubex    $\bar y_{+j}$
    1                35.6        17.3         24.4        25.0     25.6
    2                31.3        16.4         22.4        26.0     24.0
    3                36.2        18.1         22.8        25.3     25.6
    4                31.1        17.8         21.0        24.0     23.5
    5                39.4        18.8         23.3        24.2     26.4
    6                34.7        17.0         21.8        26.2     24.9
    7                34.1        14.5         23.0        24.0     23.9
    8                36.5        17.9         24.1        20.9     24.9
    9                40.7        16.4         31.3        36.9     31.3
    $\bar y_{i+}$    35.5        17.1         23.8        25.8     25.6

Model:
$Y_{ij} = \mu + \alpha_i + B_j + E_{ij}$ for $i = 1, \dots, 4 = a$ and $j = 1, \dots, 9 = b$

• $\alpha_i$ denote fixed system effects
• $B_j \overset{iid}{\sim} N(0, \sigma_B^2)$ and $E_{ij} \overset{iid}{\sim} N(0, \sigma^2)$ denote random subject (block) and error effects

    data one;
      input subject system time;
      cards;
      /* data lines (subject system time) go here */
    ;
    proc mixed method=type3;
      class system subject;
      model time=system/ddfm=satterth;
      random subject;
      lsmeans system/adj=tukey cl pdiff;
    run;

    The SAS System
    The Mixed Procedure
    Model Information
    Data Set                      WORK.ONE
    Dependent Variable            time
    Covariance Structure          Variance Components
    Estimation Method             Type 3
    Residual Variance Method      Factor
    Fixed Effects SE Method       Model-Based
    Degrees of Freedom Method     Satterthwaite

    Class      Levels   Values
    system     4        1 2 3 4
    subject    9        1 2 3 4 5 6 7 8 9
    Total Observations 36

    Type 3 Analysis of Variance
                       Sum of
    Source      DF     Squares        Mean Square    Expected Mean Square
    system       3     1559.202222    519.734074     Var(Residual) + Q(system)
    subject      8      177.405000     22.175625     Var(Residual) + 4 Var(subject)
    Residual    24      148.472778      6.186366     Var(Residual)

                                   Error
    Source      Error Term         DF     F Value   Pr > F
    system      MS(Residual)       24     84.01     <.0001
    subject     MS(Residual)       24      3.58     0.0072

    Covariance Parameter Estimates
    Cov Parm    Estimate
    subject     3.9973
    Residual    6.1864

    Type 3 Tests of Fixed Effects
              Num    Den
    Effect    DF     DF     F Value   Pr > F
    system    3      24     84.01     <.0001

    Least Squares Means
                                   Standard
    Effect    system   Estimate    Error     DF      t Value   Pr > |t|   Alpha
    system    1        35.5111     1.0637    21.9    33.38     <.0001     0.05
    system    2        17.1333     1.0637    21.9    16.11     <.0001     0.05
    system    3        23.7889     1.0637    21.9    22.36     <.0001     0.05
    system    4        25.8333     1.0637    21.9    24.29     <.0001     0.05

    Least Squares Means
    Effect    system   Lower      Upper
    system    1        33.3044    37.7178
    system    2        14.9266    19.3400
    system    3        21.5822    25.9956
    system    4        23.6266    28.0400

Clearly the injection system effects are highly significant, as is the random block or subject effect, which has an estimated variance component of

$\hat\sigma_B^2 = \dfrac{MS[B] - MS[E]}{4} = \dfrac{22.2 - 6.2}{4} = 4$ squared seconds.

    Differences of Least Squares Means
                                             Standard
    Effect    system   _system   Estimate    Error     DF    t Value   Adjustment
    system    1        2         18.3778     1.1725    24    15.67     Tukey-Kramer
    system    1        3         11.7222     1.1725    24    10.00     Tukey-Kramer
    system    1        4          9.6778     1.1725    24     8.25     Tukey-Kramer
    system    2        3         -6.6556     1.1725    24    -5.68     Tukey-Kramer
    system    2        4         -8.7000     1.1725    24    -7.42     Tukey-Kramer
    system    3        4         -2.0444     1.1725    24    -1.74     Tukey-Kramer

    Differences of Least Squares Means
    Effect    system   _system   Adj P     Alpha   Lower       Upper      Adj Lower   Adj Upper
    system    1        2         <.0001    0.05    15.9579     20.7977    15.1433     21.6122
    system    1        3         <.0001    0.05     9.3023     14.1421     8.4878     14.9567
    system    1        4         <.0001    0.05     7.2579     12.0977     6.4433     12.9122
    system    2        3         <.0001    0.05    -9.0755     -4.2356    -9.8900     -3.4211
    system    2        4         <.0001    0.05   -11.1199     -6.2801   -11.9345
    system    3        4         0.3242    0.05    -4.4644      0.3755    -5.2789

Note the df columns.
• For differences of means, the pesky mean random effect $\bar B$ washes out.
• For means, the pesky mean random effect doesn't wash out, necessitating a Satterthwaite df approximation.

$\bar y_{1+} - \bar y_{2+} = (\mu + \alpha_1 + \bar B + \bar E_{1+}) - (\mu + \alpha_2 + \bar B + \bar E_{2+}) = \alpha_1 - \alpha_2 + \bar E_{1+} - \bar E_{2+}$

Latin squares for experiments with two blocking factors

• Experiment with N = 30 plants, 3 fertilizer trts
• blocks to control for variability in exptl units
• location on bench a second blocking factor

Non-randomized design (number indicates trt):

    1 2 3
    3 1 2
    2 3 1

Fertilizer trts are randomized to row × column (or sunlight × height) combinations by randomly permuting the
1. columns
2. rows

E.g. a random number generator permutes 1 2 3 to get 2 3 1. Placing the columns in this order leads to

    2 3 1
    1 2 3
    3 1 2

Another random permutation of 1 2 3 is 3 2 1. Placing the rows in this order leads to an unreplicated 3 × 3 × 3 design:

    2 3 1          3 1 2
    1 2 3    =>    1 2 3
    3 1 2          2 3 1

Suppose nine rows are
available Three latin squares may be gener ated one below the other The one on top is Closest to sunlight the one on bottom furthest Columns correspond to initial height blocks 312 12 3 2 31 Osborne ST512 168 SAS code to illustrate how such an experiment might go follows To consider an unreplicated 3 X 3 X 3 Latin square ignore squares 2 and 3 The xed effects niodel generated by the code is Yijk M7kP HjE jc Where 0 M 13 71727393 10 1 are trt effects 0 p1 p2 p3 10 1 are sunlight effects 0 H1 Hz Hg 20 2 are initial ht effects 0 Mk jg N0 02 with 02 1 Exercises 1 specify the theoretical mean in each of the nine cells of the un replicated design in square 1 2 specify the marginal means for trt row and column Fixed effect niodel replicated design has extra tern1 Yijkl M 77 9100 51 6k Eiij Where o 61 62 63 30 3 are square effects 0 For each k p10 p200 p3k 30 3 are nested row effects Osborne STEZZ data latinsq array sqware3 303 array slight13 101 in square 1 array slight23 101 in square 2 array slight33 101 in square 3 array iheight3 202 initial height effects array treatment3 101 fertilizer effects input square row col trt growthroundgrowth01 sigma1 try various values of sigma to generate the data if square1 then do growth 10 sqwaresquare slight1row iheightcol treatmenttrt sigmarannor1234 end else if square2 then do growth 10 sqwaresquare slight2row iheightcol treatmenttrt sigmarannor1234 end else do growth 10 sqwaresquare slight3row iheightcol treatmenttrt sigmarannor1234 en heightcol cards H H wMHwMHwMHwMHwMHwMHmMHmMb wMH Mwb HMwwb MMHQJQJMHHQJMb wmwtob Mb w wwwwwwwwwMMMMMMMMMD HHHHHHH mmwMMMHHwawMMMHb wawmmkjb H 169 Osborne STEZZ data one proc glm dataone set latinsq if square1 run class square row col trt model growth row col trt lsmeans row col trt run Source Corrected Total RSquare 0969083 Source row col trt The SAS System The GLM Procedure Class Levels Values square 1 1 row 3 1 2 3 col 3 1 2 3 trt 3 1 2 3 Number of observations 9 Sum of BF Squares Mean Square F 
Value
Model             6    31.55333333    5.25888889    10.45    (Pr > F: 0.0899)
Error             2     1.00666667    0.50333333
Corrected Total   8    32.56000000

Coeff Var    Root MSE    growth Mean
5.415724     0.709460    13.10000

Source   DF   Type I SS      Mean Square    F Value
row      2     9.94666667     4.97333333     9.88
col      2    20.16000000    10.08000000    20.03
trt      2     1.44666667     0.72333333     1.44

row   growth LSMEAN        col   growth LSMEAN        trt   growth LSMEAN
1     14.5000000           1     14.7000000           1     12.5333333
2     12.8333333           2     13.5000000           2     13.3666667
3     11.9666667           3     11.1000000           3     13.4000000

proc glm data=latinsq;
  class square row col trt;
  * model growth = row(square) col trt;
  model growth = square row(square) col trt;
  lsmeans square row(square) col trt;
run;

The SAS System
The GLM Procedure

Source            DF   Sum of Squares   Mean Square    F Value   Pr > F
Model             12   301.0822222      25.0901852     27.49     <.0001
Error             14    12.7777778       0.9126984
Corrected Total   26   313.8600000

Source        DF   Type I SS      Mean Square    F Value
square        2    155.3355556    77.6677778     85.10
row(square)   6     33.5777778     5.5962963      6.13
col           2     96.6200000    48.3100000     52.93
trt           2     15.5488889     7.7744444      8.52

[growth LSMEAN tables for square, row(square), col and trt: values not recoverable from the source]

Exercise: Let $Y_{ij}$ denote the observation in row $i$, column $j$. Let $\bar{y}_k$ denote the treatment mean for level $k$ of the treatment factor. For an unreplicated $a \times a$ latin square, identify these sums of squares:

  $\sum_{i=1}^{a}\sum_{j=1}^{a} (\bar{y}_{i+} - \bar{y}_{++})^2 = SS[\quad]$
  $\sum_{i=1}^{a}\sum_{j=1}^{a} (y_{ij} - \bar{y}_{++})^2 = SS[\quad]$
  $\sum_{i=1}^{a}\sum_{j=1}^{a} (\bar{y}_{+j} - \bar{y}_{++})^2 = SS[\quad]$
  $a\sum_{k=1}^{a} (\bar{y}_{k} - \bar{y}_{++})^2 = SS[\quad]$
  $\sum_{i=1}^{a}\sum_{j=1}^{a} (y_{ij} - \bar{y}_{i+} - \bar{y}_{+j} - \bar{y}_{k} + 2\bar{y}_{++})^2 = SS[\quad]$

Note that $k$ is determined by the $(i, j)$ combination. In the 3 x 3 x 3 scheme used for our plant heights, fertilizer $k = 3$ was assigned to the first row and column, so that in the last sum of squares above, for $i = 1$, $j = 1$, $\bar{y}_k$ is the third fertilizer treatment mean, $\bar{y}_3 = 13.40$.

A 4 x 4 x 4 example taken from Ott and Longnecker

o Blocking factors: plot rows, plot columns
o Treatment factor: Fertilizer, 4 levels (2 factors)
    1 = broadcast, A    2 = broadcast, B    3 = band, A    4 = band, B

Layout of treatments (rows by columns):
  1 3 4 2
  2 1 3 4
  4 2 1 3
  3 4 2 1

data watermelons;
  input row col trt yield;
  cards;
1 1 1 1.75
1 2 3 1.43
1 3 4 1.28
1 4 2 1.66
2 1 2 1.70
2 2 1 1.78
2 3 3 1.40
2 4 4 1.31
3 1 4 1.35
3 2 2 1.73
3 3 1 1.69
3 4 3 1.41
4 1 3 1.45
4 2 4 1.36
4 3 2 1.65
4 4 1 1.73
;
proc glm;
  class row col trt;
  model yield = row col trt;
  estimate "micronutrient effect A-B" trt 1 -1 1 -1 / divisor=2;
  contrast "micronutrient effect A-B" trt 1 -1 1 -1;
  estimate "placement effect" trt 1 1 -1 -1 / divisor=2;
  contrast "placement effect" trt 1 1 -1 -1;
  estimate "placementxnutrient interaction " trt 1 -1 -1 1;
  contrast "placementxnutrient interaction " trt 1 -1 -1 1;
  lsmeans row col trt;
run;

The SAS System
The GLM Procedure

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             9    0.49335000       0.05481667    438.53    <.0001
Error             6    0.00075000       0.00012500
Corrected Total   15   0.49410000

Source   DF   Type I SS     Mean Square   F Value
row      3    0.00085000    0.00028333    2.27
col      3    0.01235000    0.00411667    32.93
trt      3    0.48015000    0.16005000    1280.40   (Pr > F: <.0001)

Contrast           DF   Contrast SS   Mean Square   F Value   Pr > F
microeffect A-B    1    0.02250000    0.02250000    180.00    <.0001
placeeffect        1    0.45562500    0.45562500    3645.00   <.0001
interaction        1    0.00202500    0.00202500    16.20     0.0069

Parameter          Estimate       Standard Error   t Value   Pr > |t|
microeffect A-B    0.07500000     0.00559017       13.42     <.0001
placeeffect        0.33750000     0.00559017       60.37     <.0001
interaction        -0.04500000    0.01118034       -4.02     0.0069

trt   yield LSMEAN
1     1.73750000
2     1.68500000
3     1.42250000
4     1.32500000

[Figure: plot of meanyield*placement; plotting symbol is the value of micronutrient (A, B); horizontal axis: placement (band, broadcast); vertical axis: meanyield]

proc glm data=watermelons;
  class row col micronutrient placement;
  model yield = row col micronutrient|placement;
  lsmeans micronutrient|placement;
run;

The SAS System
The GLM Procedure

Class           Levels   Values
row             4        1 2 3 4
col             4        1 2 3 4
micronutrient   2        A B
placement       2        band broadcast

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             9    0.49335000       0.05481667    438.53    <.0001
Error             6    0.00075000       0.00012500
Corrected Total   15   0.49410000

R-Square   Coeff Var   Root MSE   yield Mean
0.998482   0.724819    0.011180   1.542500

Source                DF   Type I SS     Mean Square   F Value
row                   3    0.00085000    0.00028333    2.27
col                   3    0.01235000    0.00411667    32.93
micronutrient         1    0.02250000    0.02250000    180.00
placement             1    0.45562500    0.45562500    3645.00
micronutri*placement  1    0.00202500    0.00202500    16.20

micronutrient   yield LSMEAN
A               1.58000000
B               1.50500000

placement   yield LSMEAN
band        1.37375000
broadcast   1.71125000

micronutrient   placement   yield LSMEAN
A               band        1.42250000
A               broadcast   1.73750000
B               band        1.32500000
B               broadcast   1.68500000

ST 512 Exptl Stats for Biol Sciences II
Weeks 11-12: Mixed Models for factorial designs
Reading: Ch. 14.1, 14.2, 14.3

o One-way random effects model to study variances
o Mixed effects models
o Subsampling
o Expected mean squares for mixed models

An example using one-way random effects model

o Genetics study w/ beef animals. Measure birthweight $Y$ (lbs).
o $t = 5$ sires, each mated to a separate group of $n = 8$ dams.
o $N = 40$, completely randomized.

Birthweights
Sire #   Level   Sample                                 $\bar{y}_{i+}$   $s_i$
177      1       61  100  56   113  99   103  75   62    83.6            22.6
200      2       75  102  95   103  98   115  98   94    97.5            11.2
201      3       58  60   60   57   57   59   54   100   63.1            15.0
202      4       57  56   67   59   58   121  101  101   77.5            25.9
203      5       59  46   120  115  115  93   105  75    91.0            28.0

Q: Statistical model for these data?
A: One-way fixed effects model?
  $Y_{ij} = \mu + \tau_i + E_{ij}$
where $\tau_i$ denotes the difference between the mean birthweight of the population of offspring from sire $i$ and $\mu$, the mean of the whole population.

The one-way random effects model

  $Y_{ij} = \mu + T_i + E_{ij}$  for $i = 1, 2, \ldots, t$ and $j = 1, \ldots, n$
  ($\mu$ fixed, $T_i$ random, $E_{ij}$ random)

with
o $T_1, T_2, \ldots, T_t \stackrel{iid}{\sim} N(0, \sigma_T^2)$
o $E_{11}, \ldots, E_{tn} \stackrel{iid}{\sim} N(0, \sigma^2)$
o $T_1, T_2, \ldots, T_t$ independent of $E_{11}, \ldots, E_{tn}$

Features
o $T_1, T_2, \ldots$ denote random effects, drawn from some population of interest. That is, $T_1, T_2, \ldots$ is a random sample.
o $\sigma_T^2$ and $\sigma^2$ are called variance components.
o Conceptually different from the one-way fixed effects model.

For the beef animal genetic study, with $t = 5$ and $n = 8$, the random effects $T_1, T_2, \ldots, T_5$ reflect sire-to-sire variability. There is no particular interest in $\tau_1, \tau_2, \ldots, \tau_5$ from the fixed effects model:

  $Y_{ij} = \mu + \tau_i + E_{ij}$  for $i = 1, 2, \ldots, t$ and $j = 1, \ldots, n$
  ($\mu$ fixed, $\tau_i$ fixed, $E_{ij}$ random)

with
o $\tau_1, \tau_2, \ldots, \tau_t$ unknown model parameters
o $E_{11}, \ldots, E_{tn} \stackrel{iid}{\sim} N(0, \sigma^2)$

One-way random effects model, continued

Exercise: Using the random effects model, specify $E(Y_{ij})$ and $Var(Y_{ij})$.
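A numerical check can help with the exercise above. The sketch below simulates the one-way random effects model using only Python's standard library. The parameter values (mu = 82.6, sigma_T^2 = 117, sigma^2 = 464, chosen to resemble the sires example), the seed, and the helper name `simulate_random_effects` are assumptions made for illustration; they are not part of the notes.

```python
import random
import statistics

def simulate_random_effects(t, n, mu, var_t, var_e, seed=512):
    """Draw Y_ij = mu + T_i + E_ij with T_i ~ N(0, var_t) and E_ij ~ N(0, var_e)."""
    rng = random.Random(seed)
    sd_t, sd_e = var_t ** 0.5, var_e ** 0.5
    groups = []
    for _ in range(t):
        effect = rng.gauss(0.0, sd_t)  # one random effect shared by the whole level
        groups.append([mu + effect + rng.gauss(0.0, sd_e) for _ in range(n)])
    return groups

# Many levels (hypothetical t = 20000) so that the empirical moments settle down.
groups = simulate_random_effects(t=20000, n=2, mu=82.6, var_t=117.0, var_e=464.0)
all_y = [y for g in groups for y in g]

emp_mean = statistics.fmean(all_y)      # empirical E(Y_ij)
emp_var = statistics.pvariance(all_y)   # empirical Var(Y_ij)
# Empirical covariance between the two observations that share a level.
emp_cov = statistics.fmean((g[0] - emp_mean) * (g[1] - emp_mean) for g in groups)

print(round(emp_mean, 1), round(emp_var), round(emp_cov))
```

The three printed values can be compared with the answers to the exercise, expressed in terms of the chosen mu, sigma_T^2 and sigma^2.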
o Two components to variability in data: $\sigma^2$, $\sigma_T^2$
o $T_1, T_2, T_3, T_4, T_5$: a random sample of sire effects
o Sire effects is a population in its own right.

Contrast this situation with the binding fractions. Why not model antibiotic effects as random? Why fixed? See Ch. 17 for more discussion.

Model parameters: $\sigma^2$, $\sigma_T^2$, $\mu$.

Sums of squares and mean squares are the same as in one-way fixed effects ANOVA:

  $SS[T] = \sum\sum (\bar{y}_{i+} - \bar{y}_{++})^2$
  $SS[E] = \sum\sum (y_{ij} - \bar{y}_{i+})^2$
  $SS[Tot] = \sum\sum (y_{ij} - \bar{y}_{++})^2$

The ANOVA table is almost the same; it just has a different expected mean squares column:

Source      SS        df      MS      Expected MS
Treatment   SS[T]     t - 1   MS[T]   $\sigma^2 + n\sigma_T^2$
Error       SS[E]     N - t   MS[E]   $\sigma^2$
Total       SS[Tot]   N - 1

Estimating parameters of one-way random effects model

  $\hat{\mu} = \bar{y}_{++}$
  $\hat{\sigma}^2 = MS[E]$
  $\hat{\sigma}_T^2 = \dfrac{MS[T] - MS[E]}{n}$

For the sires data, $\bar{y}_{++} = 82.6$ and

Source   SS      df   MS     Expected MS
Sire     5591    4    1398   $\sigma^2 + 8\sigma_T^2$
Error    16233   35   464    $\sigma^2$
Total    21824   39

  $\hat{\mu} = 82.6$ lbs
  $\hat{\sigma}^2 = 464$ lbs$^2$
  $\hat{\sigma}_T^2 = \dfrac{1398 - 464}{8} = 117$ lbs$^2$

Specific questions pertaining to this study. Consider the birthweight of a randomly sampled calf.
1. What is the estimated variance of such a calf?
2. Estimate how much of this variation is due to the sire effect.
3. Estimate how much of this variation is not due to the sire effect.

General questions
1. Is it possible for an estimated variance component to be negative?
2. How?
3. What do you do in that case?

Other parameters of interest in random effects models

o Coefficient of variation:
  $CV(Y_{ij}) = \dfrac{\sqrt{Var(Y_{ij})}}{|E(Y_{ij})|} = \dfrac{\sqrt{\sigma_T^2 + \sigma^2}}{|\mu|}$
  Note: this is not estimated by "Coeff Var" in PROC GLM output.
o Intraclass correlation coefficient:
  $\rho_I = \dfrac{Cov(Y_{ij}, Y_{ik})}{\sqrt{Var(Y_{ij})\,Var(Y_{ik})}} = \dfrac{\sigma_T^2}{\sigma_T^2 + \sigma^2}$
o Interpretation: the correlation between two responses receiving the same level of the random factor.
o Bigger values of $\rho_I$ correspond to (bigger / smaller) random treatment effects.

For sires,
  $\widehat{CV} = \dfrac{\sqrt{117 + 464}}{82.6} = 0.29$
  $\hat{\rho}_I = \dfrac{117}{117 + 464} = 0.20$

Interpretations:
o The estimated standard deviation of a birthweight (24.1) is 29% of the estimated mean birthweight (82.6).
o The estimated correlation between any two calves with the same sire for a male parent, or the estimated intrasire correlation coefficient, is 0.20.
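The estimates above can be reproduced from the raw birthweights. The following is a minimal sketch in Python (standard library only), assuming the balanced layout of the sires table with t = 5 and n = 8; the variable names are illustrative and not taken from the notes.

```python
from statistics import fmean

# Birthweights (lbs) for the t = 5 sires, n = 8 calves each, as tabulated in the notes.
sires = {
    177: [61, 100, 56, 113, 99, 103, 75, 62],
    200: [75, 102, 95, 103, 98, 115, 98, 94],
    201: [58, 60, 60, 57, 57, 59, 54, 100],
    202: [57, 56, 67, 59, 58, 121, 101, 101],
    203: [59, 46, 120, 115, 115, 93, 105, 75],
}

t = len(sires)   # number of levels of the random factor
n = 8            # observations per level (balanced design assumed)
N = t * n
grand_mean = fmean(y for ys in sires.values() for y in ys)

# Sums of squares and mean squares for the one-way layout.
ss_t = sum(n * (fmean(ys) - grand_mean) ** 2 for ys in sires.values())
ss_e = sum((y - fmean(ys)) ** 2 for ys in sires.values() for y in ys)
ms_t = ss_t / (t - 1)
ms_e = ss_e / (N - t)

# Estimates obtained by equating mean squares to their expectations.
var_e_hat = ms_e
var_t_hat = (ms_t - ms_e) / n
cv_hat = (var_t_hat + var_e_hat) ** 0.5 / grand_mean
icc_hat = var_t_hat / (var_t_hat + var_e_hat)

print(f"mean = {grand_mean:.2f}, MS[T] = {ms_t:.1f}, MS[E] = {ms_e:.1f}")
print(f"var_T = {var_t_hat:.1f}, CV = {cv_hat:.2f}, ICC = {icc_hat:.2f}")
```

The printed mean squares can be compared with the ANOVA table for the sires data, and the last line with the CV and intraclass correlation computed by hand.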
Using PROC GLM for random effects models

data one;
  input sire @;
  do i=1 to 8;
    input bw @;
    output;
  end;
  cards;
177 61 100 56 113 99 103 75 62
200 75 102 95 103 98 115 98 94
201 58 60 60 57 57 59 54 100
202 57 56 67 59 58 121 101 101
203 59 46 120 115 115 93 105 75
;
run;
proc glm;
  class sire;
  model bw=sire;
  random sire;
run;

The GLM Procedure

Class   Levels   Values
sire    5        177 200 201 202 203

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             4    5591.15000       1397.78750    3.01      0.0309
Error             35   16232.75000      463.79286
Corrected Total   39   21823.90000

R-Square   Coeff Var   Root MSE   bw Mean
0.256194   26.08825    21.53585   82.55000

Source   Type III Expected Mean Square
sire     Var(Error) + 8 Var(sire)

Here $\sigma^2$ = Var(Error) and $\sigma_T^2$ = Var(sire).

Testing a variance component

$H_0: \sigma_T^2 = 0$

Recall that $\sigma_T^2 = Var(T_i)$, the variance among the population of treatment effects.

  $F = \dfrac{MS[T]}{MS[E]}$

Reject $H_0$ at level $\alpha$ if $F > F(\alpha, t-1, N-t)$.

For the sires,
  $F = \dfrac{1398}{464} = 3.01 > 2.64 = F(0.05, 4, 35)$
so $H_0$ is rejected at $\alpha = 0.05$. (The p-value is 0.0309.)

Q: Isn't this just like the F-test for one-way ANOVA with fixed effects?
A: Yes.

Interval estimation of some model parameters

A 95% confidence interval for $\mu$ is derived by consideration of $SE(\bar{Y}_{++})$:

  $\bar{Y}_{++} = \dfrac{1}{N}\sum_{i=1}^{t}\sum_{j=1}^{n} Y_{ij} = \dfrac{1}{N}\sum_{i=1}^{t}\sum_{j=1}^{n} (\mu + T_i + E_{ij}) = \mu + \bar{T} + \bar{E}$

where $\bar{T} = (T_1 + \cdots + T_t)/t$ and $\bar{E} = \sum\sum E_{ij}/N$, so that

  $Var(\bar{Y}_{++}) = Var(\bar{T} + \bar{E}) = \dfrac{\sigma_T^2}{t} + \dfrac{\sigma^2}{nt} = \dfrac{1}{nt}(n\sigma_T^2 + \sigma^2) = \dfrac{1}{nt}E(MS[T])$

If the data are normally distributed, then

  $\dfrac{\bar{Y}_{++} - \mu}{\sqrt{MS[T]/(nt)}} \sim t_{t-1}$

and a 95% confidence interval for $\mu$ is given by

  $\bar{Y}_{++} \pm t(0.025, t-1)\sqrt{\dfrac{MS[T]}{nt}}$

Sires data: $\bar{y}_{++} = 82.6$, $MS[T] = 1398$, $nt = 40$. The critical value $t(0.025, 4) = 2.78$ yields the interval $82.6 \pm 2.78(5.91)$ or (66.1, 99.0).

Confidence interval for $\rho_I$

A 95% confidence interval ($\alpha = 0.05$) for $\rho_I$ can be obtained from the expression

  $\dfrac{F_{obs} - F_{\alpha/2}}{F_{obs} + (n-1)F_{\alpha/2}} < \rho_I < \dfrac{F_{obs} - F_{1-\alpha/2}}{F_{obs} + (n-1)F_{1-\alpha/2}}$

where $F_{\gamma} = F(\gamma, t-1, N-t)$ and $F_{obs}$ is the observed F-ratio for treatment effect from the ANOVA table.

For the sires, $F_{obs} = 3.01$, $F_{0.025} = 3.179$ and $F_{0.975} = 0.119$. The formula gives (-0.01, 0.75). Note the asymmetry and the disagreement with the test of $H_0: \sigma_T^2 = 0$.

These formulas are arrived at via some distributional results:

o $\dfrac{(t-1)MS[T]}{\sigma^2 + n\sigma_T^2} \sim \chi^2_{t-1}$
o $\dfrac{(N-t)MS[E]}{\sigma^2} \sim \chi^2_{N-t}$
o $MS[T]$ and $MS[E]$ are independent.
o A ratio of independent $\chi^2$ RVs, each divided by its df, has an F distribution:
  $\dfrac{MS[T]/(\sigma^2 + n\sigma_T^2)}{MS[E]/\sigma^2} \sim F_{t-1,\,N-t}$
  which explains the F test for $H_0: \sigma_T^2 = 0$.
o Rearranging the probability statement
  $1 - \alpha = \Pr\left( F(1-\alpha/2, t-1, N-t) < \dfrac{MS[T]}{MS[E]}\cdot\dfrac{\sigma^2}{\sigma^2 + n\sigma_T^2} < F(\alpha/2, t-1, N-t) \right)$
  so that $\rho_I$ gets left in the middle yields the confidence interval at the top of the page.

Using PROC MIXED for random effects models

proc mixed cl;
  class sire;
  model bw = ;
  random sire;
  estimate "mean" intercept 1 / cl;
run;

The SAS System
The Mixed Procedure

Model Information
Dependent Variable          bw
Covariance Structure        Variance Components
Estimation Method           REML
Residual Variance Method    Profile
Fixed Effects SE Method     Model-Based
Degrees of Freedom Method   Containment

Class   Levels   Values
sire    5        177 200 201 202 203

Covariance Parameter Estimates
Cov Parm   Estimate   Alpha   Lower     Upper
sire       116.75     0.05    29.9707   7051.37
Residual   463.79     0.05    305.11    789.17

Estimates
Label   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower     Upper
mean    82.5500    5.9114           4    13.96     0.0002     0.05    66.1373   98.9627

More interval estimation for variance components

The estimated residual variance component for the sire data was $\hat{\sigma}^2 = MS[E] = 464$ lbs$^2$. A 95% confidence interval for this variance component is given by

  $\dfrac{(40-5)464}{53.2} < \sigma^2 < \dfrac{(40-5)464}{20.6}$
or
  $\dfrac{35}{53.2}\,464 < \sigma^2 < \dfrac{35}{20.6}\,464$
or (305.2, 789.5) lbs$^2$.

This can be derived using the distributional result

  $\dfrac{(N-t)MS[E]}{\sigma^2} \sim \chi^2_{N-t}$

and setting up the probability statement

  $1 - \alpha = \Pr\left( \chi^2_{1-\alpha/2,\,N-t} < \dfrac{(N-t)MS[E]}{\sigma^2} < \chi^2_{\alpha/2,\,N-t} \right)$

Rearranging to get $\sigma^2$ in the middle yields the $100(1-\alpha)\%$ confidence interval for $\sigma^2$:

  $\left( \dfrac{(N-t)MS[E]}{\chi^2_{\alpha/2}},\ \dfrac{(N-t)MS[E]}{\chi^2_{1-\alpha/2}} \right)$

Q: What are the mean and variance of the $\chi^2_{df}$ distribution?

Interval estimation for $\sigma_T^2$

The estimated variance component for the random sire effect was $\hat{\sigma}_T^2 = 117$.

Q: How can we get a 95% confidence interval for $\sigma_T^2$?
A: In a similar fashion, but with the confidence level based on Satterthwaite's approximation to the degrees of freedom of the
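linear combination of MS terms:

  $\left( \dfrac{\widehat{df}\,\hat{\sigma}_T^2}{\chi^2_{\alpha/2,\,\widehat{df}}},\ \dfrac{\widehat{df}\,\hat{\sigma}_T^2}{\chi^2_{1-\alpha/2,\,\widehat{df}}} \right)$  where  $\widehat{df} = \dfrac{(n\hat{\sigma}_T^2)^2}{\dfrac{MS[T]^2}{t-1} + \dfrac{MS[E]^2}{N-t}}$

For the sire data,

  $\widehat{df} = \dfrac{(8 \times 117)^2}{\dfrac{1398^2}{4} + \dfrac{464^2}{35}} = 1.76$

Using the CL option in the MIXED statement will request this confidence interval; it uses this approximation to df and does not round to the nearest integer df:

  $\chi^2_{0.975,\,1.76} = 0.0297$,  $\chi^2_{0.025,\,1.76} = 6.87$

yielding the 95% confidence interval

  $\left( \dfrac{1.76(117)}{6.87},\ \dfrac{1.76(117)}{0.029} \right)$  or  (30, 7051)

Review of one-way random effects ANOVA

The model:

  $Y_{ij} = \mu + T_i + E_{ij}$  for $i = 1, 2, \ldots, t$ and $j = 1, \ldots, n$
  ($\mu$ fixed, $T_i$ random, $E_{ij}$ random)

with $T_1, T_2, \ldots, T_t \stackrel{iid}{\sim} N(0, \sigma_T^2)$ independent of $E_{11}, \ldots, E_{tn} \stackrel{iid}{\sim} N(0, \sigma^2)$.

Remarks:
o $T_1, T_2, \ldots$ randomly drawn from pop'n of treatment effects.
o Only three parameters: $\mu$, $\sigma^2$, $\sigma_T^2$
o Several functions of these parameters are of interest:
  - $CV(Y) = \dfrac{\sqrt{\sigma^2 + \sigma_T^2}}{\mu}$
  - $\rho_I = Corr(Y_{ij}, Y_{ik}) = \dfrac{\sigma_T^2}{\sigma_T^2 + \sigma^2}$
o Two observations from the same treatment group are not independent.

Exercise: match up the formulas for confidence intervals below with their targets: $\mu$, $\rho_I$, $\sigma^2$, $\sigma_T^2$.

  $\bar{y}_{++} \pm t(0.025, t-1)\sqrt{\dfrac{MS[T]}{nt}}$

  $\left( \dfrac{F_{obs} - F_{\alpha/2}}{F_{obs} + (n-1)F_{\alpha/2}},\ \dfrac{F_{obs} - F_{1-\alpha/2}}{F_{obs} + (n-1)F_{1-\alpha/2}} \right)$

  $\left( \dfrac{(N-t)MS[E]}{\chi^2_{\alpha/2}},\ \dfrac{(N-t)MS[E]}{\chi^2_{1-\alpha/2}} \right)$

  $\left( \dfrac{\widehat{df}\,\hat{\sigma}_T^2}{\chi^2_{\alpha/2,\,\widehat{df}}},\ \dfrac{\widehat{df}\,\hat{\sigma}_T^2}{\chi^2_{1-\alpha/2,\,\widehat{df}}} \right)$

Modelling factorial effects: fixed or random? A guide.

                                                          Random   Fixed
Levels
  - selected from conceptually infinite popn of
    collection of levels                                    X
  - finite number of possible levels                                 X
Another expt
  - would use same levels                                            X
  - would involve new levels sampled from same popn         X
Goal
  - estimate varcomps                                       X
  - estimate longrun means                                           X
Inference
  - for these levels used in this expt                               X
  - for the popn of levels                                  X

ST 512 Exptl Stats for Biol Sciences II
Weeks 11-12: Mixed Models for factorial designs (read Ch. 17)

Two-factor designs with factors that are fixed/random and nested/crossed

1. Entomologist records energy expended ($y$) by $N = 27$ honeybees
   o at three TEMPERATURES (20, 30, 40 °C)
   o consuming three levels of SUCROSE (20%, 40%, 60%)

   Temp   Suc   Sample
   20     20    3.1    3.7    4.7
   20     40    5.5    6.7    7.3
   20     60    7.9    9.2    9.3
   30     20    6      6.9    7.5
   30     40    11.5   12.9   13.4
   30     60    17.5   15.8   14.7
   40     20    7.7    8.3    9.5
   40     40    15.7   14.3   15.9
   40     60    19.1   18.0   19.9

2. Experiment to study effect of drug and method of administration on fasting blood sugar in a random sample of $N = 18$ diabetic patients (see Rao exercise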
13.35, dataset blsugar.dat).
   o First factor is drug: brand I tablet, brand II tablet, insulin injection
   o Second factor is type of administration (see table)

   Drug ($i$)                   Type of Administration ($j$)   Mean $\bar{y}_{ij+}$   Variance $s_{ij}^2$
   ($i = 1$) Brand I tablet     ($j = 1$) 30mg x 1             15.7                   6.3
                                ($j = 2$) 15mg x 2             19.7                   9.3
   ($i = 2$) Brand II tablet    ($j = 1$) 20mg x 1             20                     1
                                ($j = 2$) 10mg x 2             17.3                   6.3
   ($i = 3$) Insulin injection  ($j = 1$) before breakfast     28                     4
                                ($j = 2$) before supper        33                     9

3. An experiment is conducted to determine variability among laboratories (interlaboratory differences) in their assessment of bacterial concentration in milk after pasteurization. Milk w/ various degrees of contamination was tested by randomly drawing four samples of milk from a collection of cartons at various stages of spoilage. $Y$ is colony-forming units/$\mu$l. Labs think they're receiving 8 independent samples.

         Sample
   Lab   1      2      3     4
   1     2200   3000   210   270
         2200   2900   200   260
   2     2600   3600   290   360
         2500   3500   240   380
   3     1900   2500   160   230
         2100   2200   200   230
   4     2600   2800   330   350
         4300   1800   340   290
   5     4000   4800   370   500
         3900   4800   340   480
   (Data from Oehlert, 2000)

4. An expt measures Campylobacter counts in $N = 120$ chickens in a processing plant, at four locations, over three days. Means (std) for $n = 10$ chickens sampled at each location are tabulated below:

                Location
   Day          Before Washer   After Washer   After mic. rinse   After chill tank
   1   mean     70070.00        48310.00       12020.00           11790.00
       std      79034.49        34166.80       3807.24            7832.05
   2   mean     75890.00        52020.00       8090.00            8690.00
       std      74551.32        17686.27       4848.01            5526.19
   3   mean     95260.00        33170.00       6200.00            8370.00
       std      ?03176.00       22259.08       5028.81            5720.15
   (Data courtesy of Michael Bashor, General Mills. The leading digit of the day 3 "Before Washer" std is illegible in the source.)
   Transformation?

5. An experiment to assess the variability of a particular acid among plants and among leaves of plants:

   Plant $i$    1                    2                    3                    4
   Leaf $j$     1     2     3       1     2     3       1     2     3       1     2     3
   $k = 1$      11.2  16.5  18.3    14.1  19.0  11.9    15.3  19.5  16.5    7.3   8.9   11.3
   $k = 2$      11.6  16.8  18.7    13.8  18.5  12.4    15.9  20.1  17.2    7.8   9.4   10.9
   $k = 3$      12.0  16.1  19.0    14.2  18.2  12.0    16.0  19.3  16.9    7.0   9.3   10.5
   (Data from Neter, et al. (1996))

6. Study of effect of salinity on barley growth in a controlled medium:

   Salinity    C       C      6b     6b     12b    12b
   Container   1       2      1      2      1      2
   Weights (g) 11.29   7.37   5.64   4.2    4.83   3.28
               11.08   6.55   5.98   3.34   4.77   2.61
               11.1    8.5    5.69   4.21   5.66   2.69

   o Two containers for each level of salinity (treatment factor), for a total of 6 containers.

Six types of two-factor models

Fixed and/or random effects that are either crossed or nested:

  (1) $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + E_{ijk}$   (crossed/fixed)
  (2) $Y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + E_{ijk}$   (nested/fixed)
  (3) $Y_{ijk} = \mu + A_i + B_j + (AB)_{ij} + E_{ijk}$   (crossed/random)
  (4) $Y_{ijk} = \mu + A_i + B_{j(i)} + E_{ijk}$   (nested/random)
  (5) $Y_{ijk} = \mu + \alpha_i + B_j + (\alpha B)_{ij} + E_{ijk}$   (crossed/mixed)
  (6) $Y_{ijk} = \mu + \alpha_i + B_{j(i)} + E_{ijk}$   (nested/mixed)

In the models above,
o GREEK symbols parameterize FIXED, unknown treatment means
o CAPITAL letters represent RANDOM effects
o for Model (1), $\sum_i \alpha_i = \sum_j \beta_j = \sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} \equiv 0$
o for Model (2), $\sum_i \alpha_i = \sum_j \beta_{j(i)} \equiv 0$
o for Model (3), $A_i$, $B_j$, $(AB)_{ij}$ are all independent
o for Model (4), $A_i$, $B_{j(i)}$ are all independent
o for Model (5), $\sum_i \alpha_i = 0$ and $B_j$, $(\alpha B)_{ij}$ are all independent
o for Model (6), $\sum_i \alpha_i = 0$

Recall:
o RANDOM effects are used when it makes sense to think of LEVELS of a factor as a random sample from a population.

Identifying the appropriate model for our 6 examples

1. Energy expended by honeybees
   o First factor:
   o Second factor:
   o Fixed or random?
   o Crossed or nested?
   o Model: $Y_{ijk} = \mu +$ ______ $+ E_{ijk}$
2. Change in fasting blood sugar for diabetics
   o First factor:
   o Second factor:
   o Fixed or random?
   o Crossed or nested?
   o Model: $Y_{ijk} = \mu +$ ______ $+ E_{ijk}$
3. Measuring bacterial concentration in milk
   o First factor:
   o Second factor:
   o Fixed or random?
   o Crossed or nested?
   o Model: $Y_{ijk} = \mu +$ ______ $+ E_{ijk}$
4. Measuring bacteria counts in chickens at processing plant
   o First factor:
   o Second factor:
   o Fixed or random?
   o Crossed or nested?
   o Model: $Y_{ijk} = \mu +$ ______ $+ E_{ijk}$
5. Acids in leaves of plants
   o First factor:
   o Second factor:
   o Fixed or random?
   o Crossed or nested?
   o Model: $Y_{ijk} = \mu +$ ______ $+ E_{ijk}$
6. Barley growth
   o First factor:
   o Second factor:
   o Fixed or random?
   o Crossed or nested?
   o Model: $Y_{ijk} = \mu +$ ______ $+ E_{ijk}$

Tables of expected mean squares (EMS) (see Rao, table 14.1)

When factors A and B are CROSSED, and no sum-to-zero assumptions are made on random effects, expected mean squares associated
with sums of squares are given in the table below:

Source   df            A, B fixed                  A, B random                                A fixed, B random
A        a - 1         $\sigma^2 + nb\psi_A^2$     $\sigma^2 + nb\sigma_A^2 + n\sigma_{AB}^2$   $\sigma^2 + nb\psi_A^2 + n\sigma_{\alpha B}^2$
B        b - 1         $\sigma^2 + na\psi_B^2$     $\sigma^2 + na\sigma_B^2 + n\sigma_{AB}^2$   $\sigma^2 + na\sigma_B^2 + n\sigma_{\alpha B}^2$
AB       (a-1)(b-1)    $\sigma^2 + n\psi_{AB}^2$   $\sigma^2 + n\sigma_{AB}^2$                  $\sigma^2 + n\sigma_{\alpha B}^2$
Error    ab(n - 1)     $\sigma^2$                  $\sigma^2$                                   $\sigma^2$

When factor B is NESTED in factor A, expected mean squares associated with sums of squares are given in the table below:

Source   df          A, B fixed                     A, B random                                   A fixed, B random
A        a - 1       $\sigma^2 + nb\psi_A^2$        $\sigma^2 + nb\sigma_A^2 + n\sigma_{B(A)}^2$    $\sigma^2 + nb\psi_A^2 + n\sigma_{B(A)}^2$
B(A)     a(b - 1)    $\sigma^2 + n\psi_{B(A)}^2$    $\sigma^2 + n\sigma_{B(A)}^2$                   $\sigma^2 + n\sigma_{B(A)}^2$
Error    ab(n - 1)   $\sigma^2$                     $\sigma^2$                                      $\sigma^2$

where the $\psi^2$ and $\sigma^2$ values are defined on the next page.

Help with computing expected mean squares (without sum-to-zero assumptions on random effects):
1. If a factor X with index $i$ is random, then EMS(X) is a linear combo of $\sigma^2$ and the varcomps for all random effects containing index $i$. Coefficients for varcomps are limits of indexes NOT listed (summed over) in the random effects.
2. If a factor X is fixed, treat it like it is random and then just replace the varcomp for X with the effect size.

  $\psi_A^2 = \dfrac{1}{a-1}\sum_{i=1}^{a}\alpha_i^2$   effect size of factor A
  $\psi_B^2 = \dfrac{1}{b-1}\sum_{j=1}^{b}\beta_j^2$   effect size of factor B
  $\psi_{AB}^2 = \dfrac{1}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2$   effect size of interaction
  $\psi_{B(A)}^2 = \dfrac{1}{a(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\beta_{j(i)}^2$   effect size of factor B (nested in A)
  $\sigma_A^2 = Var(A_i)$   variance component for factor A
  $\sigma_B^2 = Var(B_j)$   variance component for factor B
  $\sigma_{AB}^2 = Var((AB)_{ij})$   variance component for interaction
  $\sigma_{B(A)}^2 = Var(B_{j(i)})$   variance component for factor B (nested in A)
  $\sigma^2 = Var(E_{ijk})$   error variance

The term "effect size" is often used in power considerations and sometimes involves division by $\sigma^2$.

Using expected mean squares to analyze data in mixed models

o EMS tables dictate which F-ratios test which effects.
o EMS tables yield estimating equations for variance components.

Milk example: F tests and estimating variance components

1. To test for interaction effect, use $F_{AB} = \dfrac{MS[AB]}{MS[E]}$
2. To test for main effect of A, use $F_A = \dfrac{MS[A]}{MS[AB]}$
3. To test for main effect of B, use $F_B = \dfrac{MS[B]}{MS[AB]}$

Note the departure from fixed effects analysis, where $MS[E]$ is always used in the denominator.

The SAS System
The GLM Procedure
Dependent Variable: ly (log(y))

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             19   56.03510844      2.94921623    191.44    <.0001
sample            3    53.18978788      17.72992929   1150.89   <.0001
lab               4    2.30248803       0.57562201    37.37     <.0001
sample*lab        12   0.54283253       0.04523604    2.94      0.0161
Error             20   0.30810726       0.01540536
Corrected Total   39   56.34321569

The wrong F-ratio and p-value for testing for a random LAB (A) effect:

  $F = \dfrac{MS[A]}{MS[E]} = \dfrac{0.5756}{0.0154} = 37.37$  ($p < 0.0001$)

The correct F-ratio and p-value for testing for a random LAB (A) effect:

  $F = \dfrac{MS[A]}{MS[AB]} = \dfrac{0.5756}{0.0452} = 12.72$  ($p = 0.0003$)

Estimating variance components

The estimated variance components satisfy the following system of equations (A = lab, $a = 5$; B = sample, $b = 4$; $n = 2$):

  $MS[E] = \hat{\sigma}^2$
  $MS[AB] = \hat{\sigma}^2 + n\hat{\sigma}_{AB}^2 = \hat{\sigma}^2 + 2\hat{\sigma}_{AB}^2$
  $MS[A] = \hat{\sigma}^2 + nb\hat{\sigma}_A^2 + n\hat{\sigma}_{AB}^2 = \hat{\sigma}^2 + 8\hat{\sigma}_A^2 + 2\hat{\sigma}_{AB}^2$
  $MS[B] = \hat{\sigma}^2 + na\hat{\sigma}_B^2 + n\hat{\sigma}_{AB}^2 = \hat{\sigma}^2 + 10\hat{\sigma}_B^2 + 2\hat{\sigma}_{AB}^2$

Substitution of $MS[E] = 0.0154$, $MS[AB] = 0.0452$, $MS[A] = 0.5756$, $MS[B] = 17.7299$ into the system of equations yields the estimated variance components:

  $\hat{\sigma}^2 = MS[E] = 0.0154$
  $\hat{\sigma}_{AB}^2 = \dfrac{MS[AB] - MS[E]}{2} = \dfrac{0.0452 - 0.0154}{2} = 0.01492$
  $\hat{\sigma}_A^2 = \dfrac{MS[A] - MS[AB]}{8} = \dfrac{0.5756 - 0.0452}{8} = 0.0663$
  $\hat{\sigma}_B^2 = \dfrac{MS[B] - MS[AB]}{10} = \dfrac{17.7299 - 0.0452}{10} = 1.7685$

data one;
  infile "milk.dat" firstobs=4;
  input sample lab y;
  ly=log(y);
run;
proc glm;
  class lab sample;
  model ly=sample|lab;
  random sample lab sample*lab;
  test h=lab sample e=sample*lab;
  lsmeans sample*lab;
run;

The GLM Procedure
Dependent Variable: ly

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             19   56.03510844      2.94921623    191.44    <.0001
Error             20   0.30810726       0.01540536
Corrected Total   39   56.34321569

R-Square   Coeff Var   Root MSE   ly Mean
0.994532   1.821098    0.124118   6.815577

Source       DF   Type I SS      Mean Square    F Value   Pr > F
sample       3    53.18978788    17.72992929    1150.89   <.0001
lab          4    2.30248803     0.57562201     37.37     <.0001
lab*sample   12   0.54283253     0.04523604     2.94      0.0161

Source       Type III Expected Mean Square
sample       Var(Error) + 2 Var(lab*sample) + 10 Var(sample)
lab          Var(Error) + 2 Var(lab*sample) + 8 Var(lab)
lab*sample   Var(Error) + 2 Var(lab*sample)

Tests of Hypotheses Using the Type III MS for lab*sample as an Error Term
Source   DF   Type III SS    Mean Square    F Value   Pr > F
lab      4    2.30248803     0.57562201     12.72     0.0003
sample   3    53.18978788    17.72992929    391.94    <.0001

proc varcomp;
  class sample lab;
  model ly=sample|lab;
run;

Variance Components Estimation Procedure
Variance Component   ly
Var(sample)          1.76847
Var(lab)             0.06630
Var(sample*lab)      0.01492
Var(Error)           0.01541

Q: At the end of the day, what is the conclusion from the analysis of this crossed random effects experiment?

o There is evidence of variability due to laboratory x sample interaction (interlaboratory effects vary by sample).
o The estimated parameters ($\mu$ and the variance components) of the model
    $Y_{ijk} = \mu + A_i + B_j + (AB)_{ij} + E_{ijk}$
  are
    $\hat{\sigma}^2 = 0.0154$
    $\hat{\sigma}_{AB}^2 = 0.0149$
    $\hat{\sigma}_A^2 = 0.0663$
    $\hat{\sigma}_B^2 = 1.7685$
    $\hat{\mu} = 6.82$ (log scale)
o The standard error of $\bar{Y}_{+++}$ can be derived from
    $\bar{Y}_{+++} = \mu + \bar{A} + \bar{B} + \overline{(AB)} + \bar{E}$
    $Var(\bar{Y}_{+++}) = Var(\bar{A}) + Var(\bar{B}) + Var(\overline{(AB)}) + Var(\bar{E}) = \dfrac{\sigma_A^2}{a} + \dfrac{\sigma_B^2}{b} + \dfrac{\sigma_{AB}^2}{ab} + \dfrac{\sigma^2}{abn}$

Estimation of standard error and approximation of df

The standard error

  $SE(\bar{Y}_{+++}) = \sqrt{\dfrac{\sigma_A^2}{a} + \dfrac{\sigma_B^2}{b} + \dfrac{\sigma_{AB}^2}{ab} + \dfrac{\sigma^2}{abn}}$

can be estimated by substitution of estimated variance components, which leads to (lots of algebra and cancellations)

  $\widehat{SE}(\bar{Y}_{+++}) = \sqrt{\dfrac{1}{nab}\left(MS[A] + MS[B] - MS[AB]\right)}$

For the milk data, we have

  $\widehat{SE}(\bar{Y}_{+++}) = \sqrt{\dfrac{1}{40}(0.58 + 17.73 - 0.05)} = 0.6757$

For a 95% confidence interval, we have a problem: we don't know how many df are associated with a t statistic based on this estimated SE.

ST511 Flashback: unequal variances, independent samples t-test

Example: Suspended particulate matter $Y$ (in micrograms per cubic meter) in homes with smokers ($Y_1$) and without smokers ($Y_2$):

  smokers:      133 128 136 135 131 131 130 131 131 132 147
  no smokers:   106 85 84 95 104 79 72 115 95

Summary statistics:
  $\bar{y}_1 = 133.2$, $s_1^2 = 26.0$, $\bar{y}_2 = 92.8$, $s_2^2 = 195.4$, $n_1 = 11$, $n_2 = 9$

Assumptions:
o $Y_{11}, \ldots, Y_{1n_1}$ and $Y_{21}, \ldots, Y_{2n_2}$ are independent random samples from normal distributions with unknown $\mu_1, \mu_2, \sigma_1, \sigma_2$ and $\sigma_1 \neq \sigma_2$.
o Note the large difference in the sample variances.

  $H_0: \mu_1 - \mu_2 = 0$  vs.  $H_1: \mu_1 - \mu_2 \neq 0$

o Consider the test statistic
    $T = \dfrac{\bar{Y}_1 - \bar{Y}_2 - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$

For small $n_1, n_2$, this quantity does not have the standard normal distribution, nor does the version where the pooled $S_p$ is used in the denominator. An approximate solution is to use the Student t distribution with df approximated by the Satterthwaite approximation:

  $\widehat{df} = \dfrac{(c_1MS_1 + c_2MS_2)^2}{\dfrac{(c_1MS_1)^2}{df_1} + \dfrac{(c_2MS_2)^2}{df_2}}$

where $MS_i = S_i^2$, $c_i = 1/n_i$ and $df_i = n_i - 1$.
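The Satterthwaite formula above is mechanical enough to sketch in a few lines. The Python below (standard library only) applies it to the particulate matter data and, as a second illustration, to the linear combination of mean squares behind the milk standard error. The function name `satterthwaite_df` and the tuple layout `(c_i, MS_i, df_i)` are assumptions made for this sketch, not notation from the notes.

```python
from statistics import fmean, variance

smokers = [133, 128, 136, 135, 131, 131, 130, 131, 131, 132, 147]
no_smokers = [106, 85, 84, 95, 104, 79, 72, 115, 95]

def satterthwaite_df(terms):
    """Approximate df for sum(c_i * MS_i); terms is a list of (c_i, MS_i, df_i)."""
    total = sum(c * ms for c, ms, _ in terms)
    return total ** 2 / sum((c * ms) ** 2 / df for c, ms, df in terms)

# Two-sample case: c_i = 1/n_i, MS_i = s_i^2, df_i = n_i - 1.
terms = [
    (1 / len(smokers), variance(smokers), len(smokers) - 1),
    (1 / len(no_smokers), variance(no_smokers), len(no_smokers) - 1),
]
se = sum(c * ms for c, ms, _ in terms) ** 0.5
df_hat = satterthwaite_df(terms)
t_obs = (fmean(smokers) - fmean(no_smokers)) / se
print(f"SE = {se:.2f}, df = {df_hat:.2f}, t = {t_obs:.2f}")

# Milk data: SE^2 = (MS[A] + MS[B] - MS[AB]) / (nab), with nab = 40.
milk_terms = [
    (1 / 40, 0.5756, 4),     # lab (A)
    (1 / 40, 17.7299, 3),    # sample (B)
    (-1 / 40, 0.0452, 12),   # lab x sample (AB), entering with a negative sign
]
milk_df = satterthwaite_df(milk_terms)
print(f"milk df = {milk_df:.2f}")
```

Negative coefficients are handled by the same expression because each term is squared in the denominator; the approximation is generally regarded as rougher in that case.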
ST511 Flashback, continued

For the air pollution in homes with a smoking occupant data,

  $c_1MS_1 = 26/11 = 2.36$,  $c_2MS_2 = 195.4/9 = 21.71$

and

  $\widehat{df} = \dfrac{(2.36 + 21.71)^2}{\dfrac{2.36^2}{10} + \dfrac{21.71^2}{8}} = 9.74$

The 97.5th percentile of the t distribution with $df = 9.74$ is $t(0.025, 9.74) = 2.236$. A 95% confidence interval for the mean difference between homes with and without a smoking occupant, $\mu_1 - \mu_2$, is given by

  $133.2 - 92.8 \pm 2.236\sqrt{\dfrac{26}{11} + \dfrac{195.4}{9}}$  or  $40.4 \pm 2.236(4.91)$  or  $40.4 \pm 10.97$  or  (29.4, 51.4)

These data would lead to the rejection of $H_0: \mu_1 - \mu_2 = 0$ versus the two-tailed alternative. The observed test statistic is given by

  $t_{obs} = \dfrac{133.2 - 92.8}{\sqrt{\dfrac{26}{11} + \dfrac{195.4}{9}}} = \dfrac{40.4}{4.91} = 8.2$  ($p < 0.0001$)

This problem is also known as the Behrens-Fisher problem.

data one;
  infile "smokers.dat" firstobs=2;
  input y smoke;
  label y="suspended particulate matter";
run;
proc ttest;
  class smoke;
  var y;
run;

The SAS System
The TTEST Procedure

Statistics
                         Lower CL             Upper CL   Lower CL              Upper CL
Variable   smoke     N   Mean       Mean      Mean       Std Dev    Std Dev    Std Dev    Std Err
y          0         9   82.032     92.778    103.52     9.443      13.98      26.783     4.66
y          1         11  129.76     133.18    136.6      3.5603     5.0955     8.9422     1.5363
y          Diff (1-2)    -49.91     -40.4     -30.9      7.6046     10.064     14.883     4.5235

T-Tests
Variable   Method          Variances   DF     t Value   Pr > |t|
y          Pooled          Equal       18     -8.93     <.0001
y          Satterthwaite   Unequal     9.74   -8.23     <.0001

Equality of Variances
Variable   Method     Num DF   Den DF   F Value   Pr > F
y          Folded F   8        10       7.53      0.0045

The two-way random effects model for milk data: Satterthwaite's approximation, cont'd

To approximate the df associated with a t-statistic based on a standard error of the form

  $\sqrt{c_1MS_1 + c_2MS_2 + \cdots + c_kMS_k}$

(a linear combination of mean square terms), use the Satterthwaite approximation:

  $\widehat{df} = \dfrac{(c_1MS_1 + c_2MS_2 + \cdots + c_kMS_k)^2}{\dfrac{(c_1MS_1)^2}{df_1} + \dfrac{(c_2MS_2)^2}{df_2} + \cdots + \dfrac{(c_kMS_k)^2}{df_k}}$

Recall that for the milk data, we have

  $\widehat{SE}(\bar{Y}_{+++}) = \sqrt{\dfrac{1}{nab}\left(MS[A] + MS[B] - MS[AB]\right)} = \sqrt{\dfrac{1}{40}(0.58 + 17.73 - 0.05)} = 0.6757$

The degrees of freedom associated with this linear combination is approximated by

  $\widehat{df} = \dfrac{0.6757^4}{\left(\dfrac{1}{40}\right)^2\left(\dfrac{0.5756^2}{4} + \dfrac{17.73^2}{3} + \dfrac{0.0452^2}{12}\right)} = 3.18$

Using $t(0.025, 3.18) = 3.08$, a 95% confidence interval for the mean $\mu$ among the population of all labs and samples is given by

  $6.82 \pm 3.08(0.6757)$  or  $6.82 \pm 2.08$ (log scale)

data one;
  infile "milk.dat" firstobs=4;
  input sample lab y;
  ly=log(y);
run;
proc mixed cl;
  class sample lab;
  model ly = / s ddfm=satterth cl;
  random sample lab sample*lab;
run;

The SAS System
The Mixed Procedure

Model Information
Dependent Variable          ly
Covariance Structure        Variance Components
Estimation Method           REML
Residual Variance Method    Profile
Fixed Effects SE Method     Model-Based
Degrees of Freedom Method   Satterthwaite

Covariance Parameter Estimates
Cov Parm     Estimate   Alpha   Lower      Upper
sample       1.7685     0.05    0.5664     24.8486
lab          0.06630    0.05    0.02233    0.7260
sample*lab   0.01492    0.05    0.005761   0.09261
Residual     0.01541    0.05    0.009017   0.03213

Solution for Fixed Effects
Effect      Estimate   Standard Error   DF     t Value   Pr > |t|   Alpha   Lower    Upper
Intercept   6.8156     0.6757           3.18   10.09     0.0016     0.05    4.7325   8.8987

[Figure: two panels, "untransformed data" (bacteria versus sample) and "log transformed data" (log(bacteria) versus sample); plotting symbol identifies the lab]

milk.data <- read.table("milk.dat", skip=3, col.names=c("sample","lab","bacteria"))
attach(milk.data)
postscript(file="milkplot1.ps")
par(mfrow=c(2,1))   # A 2x1 template for two plots in a single column
plot(x=sample, y=bacteria, pch=lab)
title("untransformed data")
legend(2.5, 4400, pch=1:5, legend=c("Lab 1","Lab 2","Lab 3","Lab 4","Lab 5"))
plot(x=sample, y=log(bacteria), pch=lab)
title("log transformed data")
dev.off()

[Figure: interaction plots for the milk data, raw counts and mean of log(y) versus lab (1 to 5), one trace per sample (1 to 4)]

A nested design

Experiment to study effect of drug and method of administration on fasting blood sugar in diabetic patients.
o First factor is drug: brand I tablet, brand II tablet, insulin injection
o Second factor is type of administration (see table)

Drug                Type of Administration   Mean $\bar{y}_{ij+}$   Variance $s_{ij}^2$   Mean $\bar{y}_{i++}$
Brand I tablet      30mg x 1                 15.7                   6.3                   17.7
                    15mg x 2                 19.7                   9.3
Brand II tablet     20mg x 1                 20                     1                     18.7
                    10mg x 2                 17.3                   6.3
Insulin injection   before breakfast         28                     4                     30.5
                    before supper            33                     9

(This is exercise 13.35.) Grand mean is $\bar{y}_{+++} = 22.3$.

Definition: Factor B is nested in factor A if there is a new set of levels of factor B for every different level of factor A.

Analysis of variance in nested designs

Consider a two-factor design in which factor B is nested in factor A. Let $Y_{ijk}$ denote the $k$th response at level $j$ of factor B within level $i$ of factor A. A model:

  $Y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + E_{ijk}$  for $i = 1, 2, \ldots, a$;  $j = 1, 2, \ldots, b_i$;  $k = 1, 2, \ldots, n$

$SS[Tot]$ can be broken down into components reflecting variability due to A, B(A) and variability not due to either factor (SS[E]):

  $SS[Tot] = SS[A] + SS[B(A)] + SS[E]$

  $SS[Tot] = \sum\sum\sum (y_{ijk} - \bar{y}_{+++})^2$
  $SS[A] = \sum\sum\sum (\bar{y}_{i++} - \bar{y}_{+++})^2$
  $SS[B(A)] = \sum\sum\sum (\bar{y}_{ij+} - \bar{y}_{i++})^2$
  $SS[E] = \sum\sum\sum (y_{ijk} - \bar{y}_{ij+})^2$

The ANOVA table looks like:

Source   df                    Sum of squares   Mean Square                                   F
A        $a - 1$               SS[A]            $MS[A] = SS[A]/(a-1)$                         $F_A = MS[A]/MS[E]$
B(A)     $\sum_i (b_i - 1)$    SS[B(A)]         $MS[B(A)] = SS[B(A)]/\sum_i (b_i - 1)$        $F_{B(A)} = MS[B(A)]/MS[E]$
Error    $N - \sum_i b_i$      SS[E]            $MS[E] = SS[E]/(N - \sum_i b_i)$
Total    $N - 1$               SS[Tot]

If $b_1 = b_2 = \cdots = b_a = b$, then $\sum_i (b_i - 1) = a(b - 1)$ and $df_E = ab(n - 1)$.

Inference from nested designs

To test $H_0: \alpha_i \equiv 0$, use $F_A$ on $a - 1$ and $df_E$ degrees of freedom.
To test $H_0: \beta_{j(i)} \equiv 0$ for all $i, j$, use $F_{B(A)}$ on $\sum_i (b_i - 1)$ and $df_E$ degrees of freedom.

For the diabetics' blood sugar data, with $\bar{y}_{+++} = 22.3$ and the means tabulated above:

  $SS[A] = 2(3)\left[(17.7 - 22.3)^2 + (18.7 - 22.3)^2 + (30.5 - 22.3)^2\right] = 611.4$
  $SS[B(A)] = 3\left[(15.7 - 17.7)^2 + (19.7 - 17.7)^2 + (20.0 - 18.7)^2 + (17.3 - 18.7)^2 + (28 - 30.5)^2 + (33 - 30.5)^2\right] = 72.2$
  $SS[E] = 72$

Q1: How many df are associated with SS[A]?
Q2: How many df are associated with SS[B(A)]?
Q3: How many df are associated with SS[E]?

data one;
  infile "blsugar.dat" firstobs=2 dlm='09'x;
  input a b rep y;
  drug=a;
  admin=b;
run;
proc glm;
  class a b;
  model y=a b(a);
  output out=two p=p r=r;
  means a b(a) / lsd;
  estimate "effect of B within A1" b(a) 1 -1;
  estimate "effect of B within A2" b(a) 0 0 1 -1;
  estimate "effect of B within A3" b(a) 0 0 0 0 1 -1;
  estimate "A1 mean - A2 mean" a 1 -1;
  estimate "A1 mean - A3 mean" a 1 0 -1;
  estimate "A2 mean - A3 mean" a 0 1 -1;
run;

The GLM Procedure

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             5    683.6111111      136.7222222   22.79     <.0001
Error             12   72.0000000       6.0000000
Corrected Total   17   755.6111111

R-Square   Coeff Var   Root MSE   y Mean
0.904713   10.99522    2.449490   22.27778

Source   DF   Type I SS     Mean Square   F Value   Pr > F
a        2    611.4444444   305.7222222   50.95     <.0001
b(a)     3    72.1666667    24.0555556    4.01      0.0344

Parameter               Estimate       Standard Error   t Value   Pr > |t|
effect of B within A1   -4.0000000     2.00000000       -2.00     0.0687
effect of B within A2   2.6666667      2.00000000       1.33      0.2072
effect of B within A3   -5.0000000     2.00000000       -2.50     0.0279
A1 mean - A2 mean       -1.0000000     1.41421356       -0.71     0.4930
A1 mean - A3 mean       -12.8333333    1.41421356       -9.07     <.0001
A2 mean - A3 mean       -11.8333333    1.41421356       -8.37     <.0001

Conclusions?

o The administration effect (B), nested in the type of drug effect (A), is statistically significant ($p = 0.0344$). This is due mostly to the before breakfast/supper difference, which is estimated to be $\bar{y}_{32+} - \bar{y}_{31+} = 5$ mg/dl, with an estimated standard error of $SE = 2$.
o The effect of type of drug (factor A) is highly significant ($p < 0.0001$). Unadjusted pairwise comparisons indicate that the insulin injections yield greater changes, on average, in blood sugar than either pill, and that the mean changes brought by the pills don't differ significantly.
o The following contrasts may be of interest:
    1. $\theta_1 = \mu_{31} - \frac{1}{4}(\mu_{11} + \mu_{12} + \mu_{21} + \mu_{22})$
    2. $\theta_2 = \mu_{32} - \frac{1}{4}(\mu_{11} + \mu_{12} + \mu_{21} + \mu_{22})$
  Exercise: Estimate them and test their significance ($H_0: \theta_i = 0$).

More two-factor mixed models (Ch. 14)

o Expt measures campylobacter counts in $N = 120$ chickens in a processing plant. Crossed design with two factors:
    * Location (4 levels)
    * Day (3 levels)
  4 x 3 layout, $n = 10$ chickens per combo.

                Location
   Day          Before Washer   After Washer   After mic. rinse   After chill tank
   1   mean     70070.00        48310.00       12020.00           11790.00
       std      79034.49        34166.80       3807.24            7832.05
   2   mean     75890.00        52020.00       8090.00            8690.00
       std      74551.32        17686.27       4848.01            5526.19
   3   mean     95260.00        33170.00       6200.00            8370.00
       std      ?03176.00       22259.08       5028.81            5720.15
   (Data courtesy of Michael Bashor, General Mills. The leading digit of the day 3 "Before Washer" std is illegible in the source.)

o An experiment to assess the variability of a particular acid among plants
and among leaves of plants:

   Plant $i$    1                    2                    3                    4
   Leaf $j$     1     2     3       1     2     3       1     2     3       1     2     3
   $k = 1$      11.2  16.5  18.3    14.1  19.0  11.9    15.3  19.5  16.5    7.3   8.9   11.3
   $k = 2$      11.6  16.8  18.7    13.8  18.5  12.4    15.9  20.1  17.2    7.8   9.4   10.9
   $k = 3$      12.0  16.1  19.0    14.2  18.2  12.0    16.0  19.3  16.9    7.0   9.3   10.5

o Study of effect of salinity on barley growth in a controlled medium:

   Salinity   Container   Weights (g)
   C          1           11.29   11.08   11.1
   C          2           7.37    6.55    8.5
   6b         1           5.64    5.98    5.69
   6b         2           4.2     3.34    4.21
   12b        1           4.83    4.77    5.66
   12b        2           3.28    2.61    2.69
   (Total of 6 containers)

Analysis of Campylobacter counts on chickens data

Residual plots (resid vs. $\hat{y}$) for bacteria counts after fitting two-factor fixed effects models (similar plots for mixed models):

[Figure: two panels titled "residuals versus predicted (log transform)"; residual(log) against predicted(log), plotting symbol identifies location (1 to 4)]

data one; /* Bashor data */
  infile "bashor.dat" firstobs=3;
  input day location y;
  ly=log(y);
run;
proc glm;
  class day location;
  model y ly=location|day;
  output out=two r=residual residuallog p=predicted predictedlog;
run;
symbol1 value=dot color=black;
symbol2 value=square color=black;
symbol3 value=triangle color=black;
symbol4 value=diamond color=black;
axis1 offset=(1,1) label=(height=3);
axis2 offset=(1,1) label=(height=3 angle=90);
legend1 label=(height=2);
proc gplot data=two;
  title "residuals versus predicted";
  plot residuallog*predictedlog=location / haxis=axis1 vaxis=axis2 legend=legend1;
  plot residuallog*predictedlog=location / haxis=axis1 vaxis=axis2 legend=legend1;
run;
proc mixed method=type3 cl;
  class day location;
  model ly=location / ddfm=satterth outp=predz;
  random day day*location;
  lsmeans location / adj=tukey;
run;
proc glm; * the old way of doing things, before PROC MIXED;
  class day location;
  model ly=day|location;
  random day day*location;
  test h=location e=day*location;
  lsmeans location / pdiff; * wrong;
run;
*proc mixed method=type3; * to get ANOVA table with EMS terms;
proc mixed cl; * to get asymmetric confidence intervals;
  class day location;
  model ly=location / ddfm=satterth;
  random day day*location;
  lsmeans location / adj=tukey;
run;

The SAS System
The Mixed Procedure

Model Information
Data Set                    WORK.ONE
Dependent Variable          ly
Covariance Structure        Variance Components
Estimation Method           Type 3
Residual Variance Method    Factor
Fixed Effects SE Method     Model-Based
Degrees of Freedom Method   Satterthwaite

Class Level Information
Class      Levels   Values
day        3        1 2 3
location   4        1 2 3 4

Type 3 Analysis of Variance
Source         DF    Sum of Squares   Mean Square   Expected Mean Square
location       3     97.865388        32.621796     Var(Residual) + 10 Var(day*location) + Q(location)
day            2     2.787355         1.393677      Var(Residual) + 10 Var(day*location) + 40 Var(day)
day*location   6     4.533565         0.755594      Var(Residual) + 10 Var(day*location)
Residual       108   59.254946        0.548657      Var(Residual)

Type 3 Analysis of Variance
Source         Error Term         Error DF   F Value   Pr > F
location       MS(day*location)   6          43.17     0.0002
day            MS(day*location)   6          1.84      0.2375
day*location   MS(Residual)       108        1.38      0.2303

(generated by 2nd run of PROC MIXED)
Cov Parm       Estimate   Alpha   Lower      Upper
day            0.01595    0.05    0.002071   1156981
day*location   0.02069    0.05    0.002844   145734
Residual       0.5487     0.05    0.4274     0.7303

Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
location   3        6        43.17     0.0002

Least Squares Means
Effect     location   Estimate   Standard Error   DF     t Value   Pr > |t|
location   1          10.8870    0.1747           7.33   62.33     <.0001
location   2          10.4953    0.1747           7.33   60.09     <.0001
location   3          8.8745     0.1747           7.33   50.81     <.0001
location   4          8.9394     0.1747           7.33   51.18     <.0001

Differences of Least Squares Means
Effect     location   location   Estimate   Standard Error   DF   t Value   Pr > |t|   Adjustment     Adj P
location   1          2          0.3917     0.2244           6    1.75      0.1316     Tukey-Kramer   0.3801
location   1          3          2.0125     0.2244           6    8.97      0.0001     Tukey-Kramer   0.0004
location   1          4          1.9476     0.2244           6    8.68      0.0001     Tukey-Kramer   0.0005
location   2          3          1.6208     0.2244           6    7.22      0.0004     Tukey-Kramer   0.0015
location   2          4          1.5559     0.2244           6    6.93      0.0004     Tukey-Kramer   0.0018
location   3          4          -0.06488   0.2244           6    -0.29     0.7823     Tukey-Kramer   0.9907

Theory for mixed crossed model used to analyze Campylobacter data. Discussion of MIXED output.

[Figure: predicted values versus location (1 to 4), one trace per day (1, 2, 3)]

Model:
  $Y_{ijk} = \mu + \alpha_i + B_j + (\alpha B)_{ij} + E_{ijk}$
w/ variance components $\sigma_B^2$, $\sigma_{\alpha B}^2$, $\sigma^2$.

Campylobacter analysis, continued

Fixed factor A: location. Random factor B: day.

To test $H_0: \sigma_{\alpha B}^2 = 0$, use

  $F_{AB} = \dfrac{MS[AB]}{MS[E]} = \dfrac{0.76}{0.55} = 1.38$

on $(a-1)(b-1) = 6$ and $ab(n-1) = 108$ df. The p-value is 0.2303, providing no evidence of a random day x location interaction effect. The variance component for this random effect is estimated by

  $\hat{\sigma}_{\alpha B}^2 = \dfrac{MS[AB] - MS[E]}{n} = \dfrac{0.76 - 0.55}{10} = 0.021$

Interpretation: there is no evidence that day-to-day variability varies by location. The estimated variance component is itself very small.

  $\hat{\sigma}^2 = MS[E] = 0.55$
  $\hat{\sigma}_{\alpha B}^2 = \dfrac{MS[AB] - MS[E]}{n} = \dfrac{0.76 - 0.55}{10} = 0.021$
  $\hat{\sigma}_B^2 = \dfrac{MS[B] - MS[AB]}{na} = \dfrac{1.39 - 0.76}{40} = 0.016$

Implied correlation structure

What is the correlation of two observations taken on the same day
o at the same location?
o at different locations?

Recall that $Y_{ijk} = \mu + \alpha_i + B_j + (\alpha B)_{ij} + E_{ijk}$.

  $Cov(Y_{ijk_1}, Y_{ijk_2}) = Cov(B_j, B_j) + Cov((\alpha B)_{ij}, (\alpha B)_{ij}) = \sigma_B^2 + \sigma_{\alpha B}^2$

  $Corr(Y_{ijk_1}, Y_{ijk_2}) = \dfrac{\sigma_B^2 + \sigma_{\alpha B}^2}{\sigma_B^2 + \sigma_{\alpha B}^2 + \sigma^2}$

  $Cov(Y_{1jk_1}, Y_{2jk_2}) = Cov(B_j, B_j) = \sigma_B^2$

  $Corr(Y_{1jk_1}, Y_{2jk_2}) = \dfrac{\sigma_B^2}{\sigma_B^2 + \sigma_{\alpha B}^2 + \sigma^2}$

Estimates of these correlations are

  $\dfrac{0.037}{0.016 + 0.021 + 0.55} = \dfrac{0.037}{0.587} = 0.063$
  $\dfrac{0.016}{0.587} = 0.027$

Which is which? What about the correlation of two observations on different days?

Some analysis of fixed effects

Consider testing for a fixed effect of location. That is, test the hypothesis that average bacteria counts are constant across the locations:

  $H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$

  $F_A = \dfrac{MS[A]}{MS[AB]} = \dfrac{32.6}{0.76} = 43.2$

on $a - 1 = 3$ and $(a-1)(b-1) = 6$ df, which is significant ($p = 0.0002$).

Campylobacter analysis, continued

To estimate a pairwise comparison among location means, such as $\alpha_4 - \alpha_3$, consider

  $\bar{y}_{4++} - \bar{y}_{3++} = 8.940 - 8.875 = 0.065$

Note that

  $Var(\bar{Y}_{4++} - \bar{Y}_{3++}) \neq \sigma^2\left(\dfrac{1}{nb} + \dfrac{1}{nb}\right)$

Why not? What is the SE and how can it be estimated?

  $\hat{\theta} = \bar{Y}_{2++} - \bar{Y}_{1++}$
  $\quad = \left(\alpha_2 + \bar{B} + \overline{(\alpha B)}_{2+} + \bar{E}_{2++}\right) - \left(\alpha_1 + \bar{B} + \overline{(\alpha B)}_{1+} + \bar{E}_{1++}\right)$
  $\quad = \alpha_2 - \alpha_1 + \overline{(\alpha B)}_{2+} - \overline{(\alpha B)}_{1+} + \bar{E}_{2++} - \bar{E}_{1++}$

which has
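variance

$$\mathrm{Var}(\hat\theta) = \mathrm{Var}(\overline{(\alpha B)}_{2\cdot}) + \mathrm{Var}(\overline{(\alpha B)}_{1\cdot}) + \mathrm{Var}(\bar E_{2\cdot\cdot}) + \mathrm{Var}(\bar E_{1\cdot\cdot}) = \frac{2\sigma_{\alpha B}^2}{b} + \frac{2\sigma^2}{nb} = \frac{2}{nb}\left(\sigma^2 + n\sigma_{\alpha B}^2\right)$$

which can be estimated nicely on $(a-1)(b-1) = 6$ df by

$$\widehat{\mathrm{Var}}(\hat\theta) = \frac{2}{nb} MS[AB]$$

For the chickens, where $\bar y_{4\cdot\cdot} - \bar y_{3\cdot\cdot} = 0.06$, the SE is

$$\sqrt{\widehat{\mathrm{Var}}(\hat\theta)} = \sqrt{\frac{2}{30}(0.76)} = 0.22$$

Since $t(0.025, 6) = 2.45$, a 95% c.i. for $\theta$ is given by $0.06 \pm 2.45(0.22)$.

Campylobacter analysis continued

Reporting standard errors for sample means of levels of a fixed factor (like LOCATION means) is a little messier.

$$\bar Y_{i\cdot\cdot} = \mu + \alpha_i + \bar B_{\cdot} + \overline{(\alpha B)}_{i\cdot} + \bar E_{i\cdot\cdot}$$

$$\mathrm{Var}(\bar Y_{i\cdot\cdot}) = \mathrm{Var}(\bar B_{\cdot}) + \mathrm{Var}(\overline{(\alpha B)}_{i\cdot}) + \mathrm{Var}(\bar E_{i\cdot\cdot}) = \frac{\sigma_B^2}{b} + \frac{\sigma_{\alpha B}^2}{b} + \frac{\sigma^2}{nb}$$

estimated by

$$\widehat{\mathrm{Var}}(\bar Y_{i\cdot\cdot}) = \frac{\hat\sigma_B^2}{b} + \frac{\hat\sigma_{\alpha B}^2}{b} + \frac{\hat\sigma^2}{nb}$$

Algebra yields a linear combination of multiple EMS terms:

$$\mathrm{Var}(\bar Y_{i\cdot\cdot}) = \frac{1}{nab}\left((a-1)E(MS[AB]) + E(MS[B])\right)$$

The standard error is estimated easily enough:

$$\widehat{SE}(\bar y_{i\cdot\cdot}) = \sqrt{\frac{1}{nab}\left((a-1)MS[AB] + MS[B]\right)} = \sqrt{\frac{1}{120}\left((4-1)(0.76) + 1.39\right)} = \sqrt{0.03} = 0.175$$

but the df must be approximated using the Satterthwaite approach:

$$\widehat{df} = \frac{0.175^4}{\dfrac{\left(\frac{3(0.76)}{120}\right)^2}{6} + \dfrac{\left(\frac{1.39}{120}\right)^2}{2}} = 7.33$$

with $df_{AB} = 6$ and $df_B = 2$. Since $t(0.025, 7.33) = 2.34$, a 95% c.i. for the population mean of location 1, for example, is $10.9 \pm 2.34(0.175)$.

SAS code to fit two-factor random effects model for plant acid data

Nested or crossed?

    options ls=75 nodate;
    data one;
      infile "plantacid.dat";
      input y plant leaf rep;
    run;
    proc mixed cl method=type3;
    *proc mixed cl;
      class plant leaf;
      model y= /s cl;
      random plant leaf(plant);
    run;
    goptions colors=(black) dev=pslepsf;
    *goptions colors=(black);
    axis1 value=(h=2) offset=(10);
    symbol1 value=dot h=1.5;
    symbol2 value=diamond h=1.5;
    symbol3 value=plus h=1.5;
    proc gplot;
      title "plant acids";
      plot y*plant=leaf/haxis=axis1;
    run;

The Mixed Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| plant | 4 | 1 2 3 4 |
| leaf | 3 | 1 2 3 |

Type 3 Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | Expected Mean Square | Error Term | Error DF | F Value | Pr > F |
|---|---|---|---|---|---|---|---|---|
| plant | 3 | 343.178889 | 114.392963 | `Var(Residual) + 3 Var(leaf(plant)) + 9 Var(plant)` | MS(leaf(plant)) | 8 | 4.88 | 0.0324 |
| leaf(plant) | 8 | 187.453333 | 23.431667 | `Var(Residual) + 3 Var(leaf(plant))` | MS(Residual) | 24 | 185.39 | <.0001 |
| Residual | 24 | 3.033333 | 0.126389 | `Var(Residual)` | | | | |

Covariance Parameter Estimates (from the `method=type3` run; the limits are symmetric about the estimates)

| Cov Parm | Estimate | Alpha | Lower | Upper |
|---|---|---|---|---|
| plant | 10.1068 | 0.05 | -10.3930 | 30.6066 |
| leaf(plant) | 7.7684 | 0.05 | 0.1142 | 15.4227 |
| Residual | 0.1264 | 0.05 | 0.07706 | 0.2446 |

Covariance Parameter Estimates (from the `proc mixed cl` run)

| Cov Parm | Estimate | Alpha | Lower | Upper |
|---|---|---|---|---|
| plant | 10.1068 | 0.05 | 2.6599 | 499.70 |
| leaf(plant) | 7.7684 | 0.05 | 3.5322 | 28.7787 |
| Residual | 0.1264 | 0.05 | 0.07706 | 0.2446 |

Solution for Fixed Effects

| Effect | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Alpha | Lower | Upper |
|---|---|---|---|---|---|---|---|---|
| Intercept | 14.2611 | 1.7826 | 3 | 8.00 | 0.0041 | 0.05 | 8.5882 | 19.9341 |

[Figure: "plant acids". Response y plotted against plant (1 to 4), with a separate plotting symbol for each leaf (1, 2, 3).]

Discussion of MIXED output and analysis of plant acid data

Random nested model:

$$Y_{ijk} = \mu + A_i + B_{j(i)} + E_{ijk}$$

with variance components $\sigma_A^2$, $\sigma_{B(A)}^2$ and $\sigma^2$.

To test for a random effect of the nested factor B (leaf), $H_0: \sigma_{B(A)}^2 = 0$:

$$F = \frac{MS[B(A)]}{MS[E]} = \frac{23.4}{0.126} \approx 185$$

on $(b-1)a = 8$ and $(n-1)ab = 24$ df (p-value < 0.0001).

To test for a random effect of factor A (plant), $H_0: \sigma_A^2 = 0$:

$$F = \frac{MS[A]}{MS[B(A)]} = \frac{114.4}{23.4} = 4.88$$

on $a - 1 = 3$ and $(b-1)a = 8$ df with $p = 0.0324$. (Reminder: watch that denominator MS!)

$$\hat\sigma^2 = MS[E] = 0.13$$

$$\hat\sigma_{B(A)}^2 = \frac{MS[B(A)] - MS[E]}{n} = \frac{23.4 - 0.13}{3} = 7.8$$

$$\hat\sigma_A^2 = \frac{MS[A] - MS[B(A)]}{nb} = \frac{114.4 - 23.4}{9} = 10.1$$

So there is some evidence of both a random plant effect and a random leaf effect nested in plant. The magnitudes of these effects are quantified by the estimated variance components. The statistical significance is addressed by the p-values.

Implied correlation structure for plant acids

What is the correlation of two observations taken from the same plant

- and the same leaf?
- and different leaves?

Recall that $Y_{ijk} = \mu + A_i + B_{j(i)} + E_{ijk}$. For the same plant and the same leaf,

$$\mathrm{Cov}(Y_{ijk_1}, Y_{ijk_2}) = \mathrm{Cov}(A_i, A_i) + \mathrm{Cov}(B_{j(i)}, B_{j(i)}) = \sigma_A^2 + \sigma_{B(A)}^2$$

$$\mathrm{Corr}(Y_{ijk_1}, Y_{ijk_2}) = \frac{\sigma_A^2 + \sigma_{B(A)}^2}{\sigma_A^2 + \sigma_{B(A)}^2 + \sigma^2}$$

For the same plant and different leaves,

$$\mathrm{Cov}(Y_{ij_1k_1}, Y_{ij_2k_2}) = \mathrm{Cov}(A_i, A_i) = \sigma_A^2$$

$$\mathrm{Corr}(Y_{ij_1k_1}, Y_{ij_2k_2}) = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_{B(A)}^2 + \sigma^2}$$

Estimates of these correlations are

$$\frac{10.1 + 7.8}{10.1 + 7.8 + 0.13} = \frac{17.9}{18.0} \approx 0.99 \qquad \frac{10.1}{18.0} \approx 0.56$$

This means that two measurements taken on the same leaf are almost perfectly correlated. Almost all the variation in any measurement can be explained by the leaf and plant effects.

SAS code to analyze data from two-factor Barley growth experiment

Nested or crossed?

    data one;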
      * Barley growth and salinity;
      input salinity $ container @;
      do rep=1 to 3;
        input y @;
        output;
      end;
    cards;
    c 1 11.29 11.08 11.1
    c 2 7.37 6.55 8.5
    6b 1 5.64 5.98 5.69
    6b 2 4.2 3.34 4.21
    12b 1 4.83 4.77 5.66
    12b 2 3.28 2.61 2.69
    ;
    run;
    proc mixed cl method=type3;
    *proc mixed cl;
      class salinity container;
      model y=salinity/s cl ddfm=satterth;
      random container(salinity);
      lsmeans salinity/tdiff pdiff;
    run;
    proc gplot;
      plot y*salinity=container;
    run;

[Figure: "Barley growth by salinity". Response y plotted against salinity, with a separate plotting symbol for each container.]

The SAS System

The Mixed Procedure

| Class | Levels | Values |
|---|---|---|
| salinity | 3 | 12b 6b c |
| container | 2 | 1 2 |

Total Observations: 18

Type 3 Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | Expected Mean Square | Error Term | Error DF | F Value | Pr > F |
|---|---|---|---|---|---|---|---|---|
| salinity | 2 | 98.572211 | 49.286106 | `Var(Residual) + 3 Var(container(salinity)) + Q(salinity)` | MS(container(salinity)) | 3 | 4.49 | 0.1254 |
| container(salinity) | 3 | 32.939750 | 10.979917 | `Var(Residual) + 3 Var(container(salinity))` | MS(Residual) | 12 | 40.26 | <.0001 |
| Residual | 12 | 3.273067 | 0.272756 | `Var(Residual)` | | | | |

Covariance Parameter Estimates (the two container(salinity) rows come from the two PROC MIXED calls; the symmetric limits in the second row belong to the `method=type3` run)

| Cov Parm | Estimate | Alpha | Lower | Upper |
|---|---|---|---|---|
| container(salinity) | 3.5691 | 0.05 | 1.1223 | 55.2528 |
| container(salinity) | 3.5691 | 0.05 | -2.2885 | 9.4266 |
| Residual | 0.2728 | 0.05 | 0.1403 | 0.7432 |

Solution for Fixed Effects

| Effect | salinity | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Alpha | Lower | Upper |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | | 9.3150 | 1.3528 | 3 | 6.89 | 0.0063 | 0.05 | 5.0099 | 13.6201 |
| salinity | 12b | -5.3417 | 1.9131 | 3 | -2.79 | 0.0683 | 0.05 | -11.4300 | 0.7467 |
| salinity | 6b | -4.4717 | 1.9131 | 3 | -2.34 | 0.1015 | 0.05 | -10.5600 | 1.6167 |
| salinity | c | 0 | | | | | | | |

Type 3 Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| salinity | 2 | 3 | 4.49 | 0.1254 |

Least Squares Means

| Effect | salinity | Estimate | Standard Error | DF | Pr > \|t\| |
|---|---|---|---|---|---|
| salinity | 12b | 3.9733 | 1.3528 | 3 | 0.0606 |
| salinity | 6b | 4.8433 | 1.3528 | 3 | 0.0373 |
| salinity | c | 9.3150 | 1.3528 | 3 | 0.0063 |

Differences of Least Squares Means

| Effect | salinity | _salinity | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| salinity | 12b | 6b | -0.8700 | 1.9131 | 3 | -0.45 | 0.6802 |
| salinity | 12b | c | -5.3417 | 1.9131 | 3 | -2.79 | 0.0683 |
| salinity | 6b | c | -4.4717 | 1.9131 | 3 | -2.34 | 0.1015 |

Test for container effect:

$$F = \frac{MS[B(A)]}{MS[E]} = 40.26, \qquad MS[E] = 0.27$$

on $a(b-1) = 3$ and $ab(n-1) = 12$ df. Here $p < 0.0001$.

Test for salinity (treatment) effect:

$$F = \frac{MS[A]}{MS[B(A)]} = \frac{49.3}{10.98} = 4.49$$

on $a - 1 = 2$ and $a(b-1) = 3$ df. Here $p = 0.1254$, which is not significant.

Note that using the wrong error term leads to a different conclusion:

$$F_{wrong} = \frac{MS[A]}{MS[E]} = \frac{49.3}{0.27} = 180.7$$

with $p < 0.0001$ on 2 and 12 df. Is there a treatment effect going on? Why or why not? Which F-ratio is appropriate? Is it a matter of modelling preference?

Estimated variance components:

$$\hat\sigma^2 = MS[E] = 0.27$$

$$\hat\sigma_{B(A)}^2 = \frac{MS[B(A)] - MS[E]}{n} = \frac{10.98 - 0.27}{3} = 3.57$$

Implied correlation structure for two different ($k_1 \neq k_2$) observations from the same container:

$$\mathrm{Corr}(Y_{ijk_1}, Y_{ijk_2}) = \frac{\sigma_{B(A)}^2}{\sigma_{B(A)}^2 + \sigma^2}$$

which is estimated by

$$\frac{3.57}{0.27 + 3.57} = 0.93$$

Inference for fixed effects is clean (without need for Satterthwaite):

$$\bar Y_{i\cdot\cdot} = \mu + \alpha_i + \bar B_{\cdot(i)} + \bar E_{i\cdot\cdot}$$

which has

$$SE(\bar Y_{i\cdot\cdot}) = \sqrt{\frac{\sigma_{B(A)}^2}{b} + \frac{\sigma^2}{nb}}$$

which can be estimated cleanly on $(b-1)a = 3$ df by

$$\widehat{SE}(\bar Y_{i\cdot\cdot}) = \sqrt{\frac{1}{nb} MS[B(A)]} = \sqrt{\frac{1}{6}(11.0)} \approx 1.35$$

and differences are just as easy:

$$\bar Y_{i_1\cdot\cdot} - \bar Y_{i_2\cdot\cdot} = \alpha_{i_1} - \alpha_{i_2} + \bar B_{\cdot(i_1)} - \bar B_{\cdot(i_2)} + \bar E_{i_1\cdot\cdot} - \bar E_{i_2\cdot\cdot}$$

which has

$$SE(\bar Y_{i_1\cdot\cdot} - \bar Y_{i_2\cdot\cdot}) = \sqrt{2\left(\frac{\sigma_{B(A)}^2}{b} + \frac{\sigma^2}{nb}\right)}$$

which can be estimated cleanly on $(b-1)a = 3$ df by

$$\widehat{SE}(\bar Y_{i_1\cdot\cdot} - \bar Y_{i_2\cdot\cdot}) = \sqrt{\frac{2}{nb} MS[B(A)]} = \sqrt{\frac{2}{6}(11.0)} \approx 1.91$$

This is a good place to go back and look at the RBD expt w/ random block effects. (The experiment measured assembly times for IV injection systems.)

ST 512: Exptl Stats for Biol. Sciences II

Weeks 13-14: Split plots, a repeated measures design

Reading: Ch. 16, Repeated measures models

Consider an experiment to study effects of irrigation and aerially sprayed pesticide on yields of different varieties of corn.

Factors:

- A: Pesticide treatment ($a = 3$ levels)
- B: Irrigation treatment ($b = 4$ levels), called treatment (trt)
- Plots: $n = 2$ per level of A (total of $na = 6$ plots)

For the moment, ignore B (or fix the level of B):

| plot | pest | y |
|---|---|---|
| 1 | 1 | 53.4 |
| 2 | 1 | 46.5 |
| 1 | 2 | 54.3 |
| 2 | 2 | 57.2 |
| 1 | 3 | 55.9 |
| 2 | 3 | 57.4 |

One-way ANOVA for A effect and plot effects:

| Source | df |
|---|---|
| A (Pesticide) | $a - 1 = 2$ |
| Error (or plots) | $(n-1)a = 3$ |
| Total | $an - 1 = 5$ |

Split plot design

Levels of factor B are randomly assigned to b
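= 4 subplots within each of the na = 6 plots in a split plot design.

| pest | plot | B1 | B2 | B3 | B4 |
|---|---|---|---|---|---|
| 1 | 1 | 53.4 | 53.8 | 58.2 | 59.5 |
| 1 | 2 | 46.5 | 51.1 | 49.2 | 51.3 |
| 2 | 1 | 54.3 | 56.3 | 60.4 | 64.5 |
| 2 | 2 | 57.2 | 56.9 | 61.6 | 66.8 |
| 3 | 1 | 55.9 | 58.6 | 62.4 | 64.5 |
| 3 | 2 | 57.4 | 60.2 | 57.2 | 62.7 |

Each row corresponds to one of $na = 6$ plots. Each plot is divided into $b = 4$ subplots and levels of factor B are assigned to these at random. The ANOVA table on the preceding page is at the whole plot level. Sources of variation for the split plot level:

| Source | df |
|---|---|
| B (treatments) | $b - 1 = 3$ |
| A × B | $(a-1)(b-1) = 6$ |
| B × plot(A) | $(b-1)(n-1)a = 9$ (aka subplot error) |

- A: a between-plots or between-subjects factor
- B: a within-plots or within-subjects factor
- Plots are subjects in repeated measures terminology, where time is often the within-subjects factor.

Suggestion: draw a picture of the layout.

| Source | df | EMS |
|---|---|---|
| A (Pesticide) | $a - 1 = 2$ | $\sigma^2 + b\sigma_1^2 + bn\psi_A^2$ |
| Plot(A) | $(n-1)a = 3$ | $\sigma^2 + b\sigma_1^2$ |
| B (treatments) | $b - 1 = 3$ | $\sigma^2 + na\psi_B^2$ |
| A × B | $(a-1)(b-1) = 6$ | $\sigma^2 + n\psi_{AB}^2$ |
| B × plot(A) | $(b-1)(n-1)a = 9$ | $\sigma^2$ (subplot error) |
| Total | $abn - 1 = 23$ | |

Here the variance components and size effects pertain to the model for a completely randomized split plot design:

$$Y_{ijk} = \underbrace{\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}}_{\mu_{ij},\ \text{fixed component}} + \underbrace{S_{k(i)} + E_{ijk}}_{\text{random error component}}$$

Here $i = 1, \ldots, a$, $j = 1, \ldots, b$ and $k = 1, \ldots, n_i$, where $n_i$ denotes the number of plots treated with level $i$ of factor A. If $n_i$ is constant, call it $n$.

Random effects and variance components:

$$S_{k(i)} \overset{iid}{\sim} N(0, \sigma_1^2), \qquad E_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$$

Size effects for fixed factors are the same as in prior two-factor models. For our example, $\alpha_i$ denote pesticide effects, $\beta_j$ denote irrigation effects, and $(\alpha\beta)_{ij}$ are interactions.

F-tests for fixed effects are guided by the EMS column above.

For the corn yields data:

| Source | MS | df | EMS | F | p-value |
|---|---|---|---|---|---|
| A: Pesticide | 128.1 | 2 | $\sigma^2 + b\sigma_1^2 + bn\psi_A^2$ | 3.9 | 0.1452 |
| Whole plot error | MS[S(A)] = 32.6 | 3 | $\sigma^2 + b\sigma_1^2$ | 10.1 | 0.0031 |
| B: treatments | 60.2 | 3 | $\sigma^2 + na\psi_B^2$ | 18.7 | 0.0003 |
| A × B | 4.1 | 6 | $\sigma^2 + n\psi_{AB}^2$ | 1.3 | 0.3607 |
| B × plot(A) | MS[E] = 3.2 | 9 | $\sigma^2$ (subplot error) | | |
| Total | | 23 | | | |

- MS[S(A)] denotes the mean square for WHOLE plots (nested in A)
- MS[E] denotes the error or subplot mean square

For pesticide effect, on 2, 3 df: $F = MS[A]/MS[S(A)] = 128.1/32.6$

For irrigation effect, on 3, 9 df: $F = MS[B]/MS[E] = 60.2/3.2$

For pesticide by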
irrigation interaction, on 6, 9 df: $F = MS[AB]/MS[E] = 4.1/3.2$

For random effect of whole plots, on 3, 9 df: $F = MS[S(A)]/MS[E] = 32.6/3.2$

Estimated varcomps: $\hat\sigma^2 = MS[E] = 3.2$ and $\hat\sigma_1^2 = (MS[S(A)] - MS[E])/4 = 7.3$.

Pairwise comparisons

Several kinds of pairwise comparisons of treatment means:

1. Main effects of A: $\bar y_{i_1\cdot\cdot} - \bar y_{i_2\cdot\cdot}$
2. Main effects of B: $\bar y_{\cdot j_1\cdot} - \bar y_{\cdot j_2\cdot}$
3. Simple effects of A: $\bar y_{i_1 j\cdot} - \bar y_{i_2 j\cdot}$
4. Simple effects of B: $\bar y_{i j_1\cdot} - \bar y_{i j_2\cdot}$
5. Interaction effects: $\bar y_{i_1 j_1\cdot} - \bar y_{i_2 j_2\cdot}$

Skipping the algebra, the standard errors for all of these comparisons, save 3 and 5, can be estimated cleanly. That is, with single MS terms and integer df. See Table 16.6 (careful of errata).

| Comparison | Variance | Estimate | df |
|---|---|---|---|
| $\bar Y_{i_1\cdot\cdot} - \bar Y_{i_2\cdot\cdot}$ | $\frac{2}{nb}(\sigma^2 + b\sigma_1^2)$ | $\frac{2}{nb} MS[S(A)]$ | $(n-1)a$ |
| $\bar Y_{\cdot j_1\cdot} - \bar Y_{\cdot j_2\cdot}$ | $\frac{2}{na}\sigma^2$ | $\frac{2}{na} MS[E]$ | $(n-1)(b-1)a$ |
| $\bar Y_{i_1 j\cdot} - \bar Y_{i_2 j\cdot}$ | $\frac{2}{n}(\sigma^2 + \sigma_1^2)$ | messy | |
| $\bar Y_{i j_1\cdot} - \bar Y_{i j_2\cdot}$ | $\frac{2}{n}\sigma^2$ | $\frac{2}{n} MS[E]$ | $(n-1)(b-1)a$ |
| $\bar Y_{i_1 j_1\cdot} - \bar Y_{i_2 j_2\cdot}$ | $\frac{2}{n}(\sigma^2 + \sigma_1^2)$ | messy | |

To analyze data from a CRSPD in SAS, consider using PROC MIXED instead of PROC GLM:

    proc mixed method=type3;
      class field pest trt irr cv;
      model y=trt|pest/ddfm=satterth;
      random field(pest);
      parms /nobound;
      lsmeans trt pest/pdiff;    /* can use adj=bon to adjust for multiplicity */
      *lsmeans trt|pest/pdiff;   /* if there were interaction */
    run;
    /* parms statement can be used to keep SAS from dropping
       random effects w/ negative estimated varcomps */

The SAS System

The Mixed Procedure

Model Information

| | |
|---|---|
| Data Set | WORK.ONE |
| Dependent Variable | y |
| Covariance Structure | Variance Components |
| Estimation Method | Type 3 |
| Residual Variance Method | Factor |
| Fixed Effects SE Method | Model-Based |
| Degrees of Freedom Method | Satterthwaite |

Class Level Information

| Class | Levels | Values |
|---|---|---|
| field | 2 | 1 2 |
| pest | 3 | 1 2 3 |
| irr | 2 | 1 2 |
| cv | 2 | 1 2 |
| trt | 4 | 1 2 3 4 |

Type 3 Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | Expected Mean Square | Error Term | Error DF | F Value | Pr > F |
|---|---|---|---|---|---|---|---|---|
| trt | 3 | 180.697917 | 60.232639 | `Var(Residual) + Q(trt,pest*trt)` | MS(Residual) | 9 | 18.66 | 0.0003 |
| pest | 2 | 256.275833 | 128.137917 | `Var(Residual) + 4 Var(field(pest)) + Q(pest,pest*trt)` | MS(field(pest)) | 3 | 3.93 | 0.1452 |
| `pest*trt` | 6 | 24.490833 | 4.081806 | `Var(Residual) + Q(pest*trt)` | MS(Residual) | 9 | 1.26 | 0.3607 |
| field(pest) | 3 | 97.806250 | 32.602083 | `Var(Residual) + 4 Var(field(pest))` | MS(Residual) | 9 | 10.10 | 0.0031 |
| Residual | 9 | 29.058750 | 3.228750 | `Var(Residual)` | | | | |

Covariance Parameter Estimates

| Cov Parm | Estimate |
|---|---|
| field(pest) | 7.3433 |
| Residual | 3.2287 |

Type 3 Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| trt | 3 | 9 | 18.66 | 0.0003 |
| pest | 2 | 3 | 3.93 | 0.1452 |
| `pest*trt` | 6 | 9 | 1.26 | 0.3607 |

Least Squares Means

| Effect | pest | trt | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| trt | | 1 | 54.1167 | 1.3274 | 4.9 | 40.77 | <.0001 |
| trt | | 2 | 56.1500 | 1.3274 | 4.9 | 42.30 | <.0001 |
| trt | | 3 | 58.1667 | 1.3274 | 4.9 | 43.82 | <.0001 |
| trt | | 4 | 61.5500 | 1.3274 | 4.9 | 46.37 | <.0001 |
| pest | 1 | | 52.8750 | 2.0187 | 3 | 26.19 | 0.0001 |
| pest | 2 | | 59.7500 | 2.0187 | 3 | 29.60 | <.0001 |
| pest | 3 | | 59.8625 | 2.0187 | 3 | 29.65 | <.0001 |

Differences of Least Squares Means

| Effect | Level | vs. level | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| trt | 1 | 2 | -2.0333 | 1.0374 | 9 | -1.96 | 0.0816 |
| trt | 1 | 3 | -4.0500 | 1.0374 | 9 | -3.90 | 0.0036 |
| trt | 1 | 4 | -7.4333 | 1.0374 | 9 | -7.17 | <.0001 |
| trt | 2 | 3 | -2.0167 | 1.0374 | 9 | -1.94 | 0.0838 |
| trt | 2 | 4 | -5.4000 | 1.0374 | 9 | -5.21 | 0.0006 |
| trt | 3 | 4 | -3.3833 | 1.0374 | 9 | -3.26 | 0.0098 |
| pest | 1 | 2 | -6.8750 | 2.8549 | 3 | -2.41 | 0.0952 |
| pest | 1 | 3 | -6.9875 | 2.8549 | 3 | -2.45 | 0.0919 |
| pest | 2 | 3 | -0.1125 | 2.8549 | 3 | -0.04 | 0.9710 |

Corn yield, irrigation, pesticide and cultivars, continued

So the treatment effect (B) is highly significant ($p = 0.0003$). Are there particular comparisons among the four treatments that are of interest? There are, because the real experiment is actually slightly more complicated than previously described. The factor B is really a 2 × 2 combination of irrigation and cultivar:

| B | Irr | CV |
|---|---|---|
| 1 | no | 1 |
| 2 | no | 2 |
| 3 | yes | 1 |
| 4 | yes | 2 |

The 3 df for the within-plot factor B can be broken up into three 1 df components due to main effect of irr, main effect of CV, and interaction. Same with the A × B interaction. The plot below averages over pesticide and field.

[Figure: mean corn yield by irrigation and cultivar, averaged over pesticide and field.]

    proc mixed method=type3;
      class field pest irr cv trt;
      *model y=trt|pest/ddfm=satterth;
      model y=irr|cv|pest/ddfm=satterth;
      random field(pest);
      *parms /nobound;
      *lsmeans trt pest/pdiff adj=tukey;
      lsmeans irr cv/pdiff;
      lsmeans irr*cv;
    run;

The SAS System

The Mixed Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| field | 2 | 1 2 |
| pest | 3 | 1 2 3 |
| irr | 2 | 1 2 |
| cv | 2 | 1 2 |
| trt | 4 | 1 2 3 4 |

Type 3 Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | Expected Mean Square | Error Term | Error DF | F Value | Pr > F |
|---|---|---|---|---|---|---|---|---|
| irr | 1 | 133.953750 | 133.953750 | `Var(Residual) + Q(irr,irr*cv,pest*irr,pest*irr*cv)` | MS(Residual) | 9 | 41.49 | 0.0001 |
| cv | 1 | 44.010417 | 44.010417 | `Var(Residual) + Q(cv,irr*cv,pest*cv,pest*irr*cv)` | MS(Residual) | 9 | 13.63 | 0.0050 |
| `irr*cv` | 1 | 2.733750 | 2.733750 | `Var(Residual) + Q(irr*cv,pest*irr*cv)` | MS(Residual) | 9 | 0.85 | 0.3815 |
| pest | 2 | 256.275833 | 128.137917 | `Var(Residual) + 4 Var(field(pest)) + Q(pest,pest*irr,pest*cv,pest*irr*cv)` | MS(field(pest)) | 3 | 3.93 | 0.1452 |
| `pest*irr` | 2 | 17.747500 | 8.873750 | `Var(Residual) + Q(pest*irr,pest*irr*cv)` | MS(Residual) | 9 | 2.75 | 0.1171 |
| `pest*cv` | 2 | 1.385833 | 0.692917 | `Var(Residual) + Q(pest*cv,pest*irr*cv)` | MS(Residual) | 9 | 0.21 | 0.8109 |
| `pest*irr*cv` | 2 | 5.357500 | 2.678750 | `Var(Residual) + Q(pest*irr*cv)` | MS(Residual) | 9 | 0.83 | 0.4670 |
| field(pest) | 3 | 97.806250 | 32.602083 | `Var(Residual) + 4 Var(field(pest))` | MS(Residual) | 9 | 10.10 | 0.0031 |
| Residual | 9 | 29.058750 | 3.228750 | `Var(Residual)` | | | | |

Covariance Parameter Estimates

| Cov Parm | Estimate |
|---|---|
| field(pest) | 7.3433 |
| Residual | 3.2287 |

Least Squares Means

| Effect | irr | cv | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| irr | 1 | | 55.1333 | 1.2219 | 3.61 | 45.12 | <.0001 |
| irr | 2 | | 59.8583 | 1.2219 | 3.61 | 48.99 | <.0001 |
| cv | | 1 | 56.1417 | 1.2219 | 3.61 | 45.95 | <.0001 |
| cv | | 2 | 58.8500 | 1.2219 | 3.61 | 48.16 | <.0001 |
| `irr*cv` | 1 | 1 | 54.1167 | 1.3274 | 4.9 | 40.77 | <.0001 |
| `irr*cv` | 1 | 2 | 56.1500 | 1.3274 | 4.9 | 42.30 | <.0001 |
| `irr*cv` | 2 | 1 | 58.1667 | 1.3274 | 4.9 | 43.82 | <.0001 |
| `irr*cv` | 2 | 2 | 61.5500 | 1.3274 | 4.9 | 46.37 | <.0001 |

Differences of Least Squares Means

| Effect | Level | vs. level | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| irr | 1 | 2 | -4.7250 | 0.7336 | 9 | -6.44 | 0.0001 |
| cv | 1 | 2 | -2.7083 | 0.7336 | 9 | -3.69 | 0.0050 |

Split plot in blocks (RCBSPD)

In the randomized block split plot design, sets of homogeneous plots are formed and levels of the whole plot factor are assigned to the plots within these sets in a restricted randomization. Assignment of levels of the split plot factor is as in the CRSPD.

In the split plot experiment with pesticide as the whole plot factor and irrigation × CV as the split plot factor, suppose the six plots come from two farms, with three plots in each farm. Suppose that the
three pesticide treatments are randomized to plots within farms. Renumbering plots 1,2,1,2,1,2 as 1,2,3,4,5,6 and supposing plots 2,3,6 come from farm 1 and plots 1,4,5 from farm 2, the data are given as

| Obs | farm | pest | plot | B1 | B2 | B3 | B4 |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 53.4 | 53.8 | 58.2 | 59.5 |
| 2 | 1 | 1 | 2 | 46.5 | 51.1 | 49.2 | 51.3 |
| 3 | 1 | 2 | 3 | 54.3 | 56.3 | 60.4 | 64.5 |
| 4 | 2 | 2 | 4 | 57.2 | 56.9 | 61.6 | 66.8 |
| 5 | 2 | 3 | 5 | 55.9 | 58.6 | 62.4 | 64.5 |
| 6 | 1 | 3 | 6 | 57.4 | 60.2 | 57.2 | 62.7 |

At the whole plot level (ignoring the split plot factor), the df in an ANOVA for pesticide effects are given by

| Source | df |
|---|---|
| A (Pesticide) | 2 |
| Farms | 1 |
| Error | 2 |
| Total | 5 |

so that an F-ratio for the pesticide effect is based on df = 2, 2.

In general, for a RCBSPD with $a$ levels of a whole plot factor A randomized to $r$ blocks, for a total of $ra$ plots, and $b$ levels of a split plot factor B within each plot, the model and ANOVA table are given by

$$Y_{ijk} = \mu + \alpha_i + R_k + \beta_j + (\alpha\beta)_{ij} + (SR)_{ik} + E_{ijk} = \mu_{ij} + R_k + (SR)_{ik} + E_{ijk}$$

where

- $i$ denotes level of A
- $j$ denotes level of B
- $k$ denotes block

$R_k \overset{iid}{\sim} N(0, \sigma_R^2)$ and $(SR)_{ik} \overset{iid}{\sim} N(0, \sigma_1^2)$. All random errors are mutually independent.

| Source | df | EMS |
|---|---|---|
| A | $a - 1$ | $\sigma^2 + b\sigma_1^2 + br\psi_A^2$ |
| Blocks | $r - 1$ | $\sigma^2 + b\sigma_1^2 + ab\sigma_R^2$ |
| Whole plot error (Block × A) | $(r-1)(a-1)$ | $\sigma^2 + b\sigma_1^2$ |
| B | $b - 1$ | $\sigma^2 + ar\psi_B^2$ |
| AB | $(a-1)(b-1)$ | $\sigma^2 + r\psi_{AB}^2$ |
| Error (B × Blocks(A)) | $a(b-1)(r-1)$ | $\sigma^2$ |
| Total | $abr - 1$ | |

    data one;   /* one is original dataset */
      set one;
      if plot in (2,3,6) then farm=1; else farm=2;
    run;
    proc mixed method=type3;
      class farm plot pest irr cv trt;
      model y=pest|trt;
      random farm farm*pest;
    run;

The Mixed Procedure

| Class | Levels | Values |
|---|---|---|
| farm | 2 | 1 2 |
| plot | 6 | 1 2 3 4 5 6 |
| pest | 3 | 1 2 3 |
| trt | 4 | 1 2 3 4 |

Type 3 Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | Expected Mean Square |
|---|---|---|---|---|
| pest | 2 | 256.275833 | 128.137917 | `Var(Residual) + 4 Var(farm*pest) + Q(pest,pest*trt)` |
| trt | 3 | 180.697917 | 60.232639 | `Var(Residual) + Q(trt,pest*trt)` |
| `pest*trt` | 6 | 24.490833 | 4.081806 | `Var(Residual) + Q(pest*trt)` |
| farm | 1 | 59.220417 | 59.220417 | `Var(Residual) + 4 Var(farm*pest) + 12 Var(farm)` |
| `farm*pest` | 2 | 38.585833 | 19.292917 | `Var(Residual) + 4 Var(farm*pest)` |
| Residual | 9 | 29.058750 | 3.228750 | `Var(Residual)` |

| Cov Parm | Estimate |
|---|---|
| farm | 3.3273 |
| `farm*pest` | 4.0160 |
| Residual | 3.2287 |

Type 3 Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| pest | 2 | 2 | 6.64 | 0.1309 |
| trt | 3 | 9 | 18.66 | 0.0003 |
| `pest*trt` | 6 | 9 | 1.26 | 0.3607 |

Split plot in blocks (RCBSPD)

Researchers for an ice cream manufacturer conduct an expt to study the effects of variety (sweetcharlie, camarosa and gaviota) and mixing speed (slow, medium and fast) on ice cream quality. One batch of each variety of strawberries is sampled on Monday over four consecutive weeks. Each batch is divided into three parts, which are randomized to the three mixing speeds, and three quarts of ice cream are produced, stored for one month, then tested for texture quality on a scale from 1 to 100.

Data:

| Obs | week | variety | Low | Medium | High |
|---|---|---|---|---|---|
| 1 | 1 | c | 47 | 49 | 49 |
| 2 | 1 | g | 48 | 51 | 54 |
| 3 | 1 | s | 46 | 49 | 51 |
| 4 | 2 | c | 43 | 46 | 49 |
| 5 | 2 | g | 51 | 55 | 53 |
| 6 | 2 | s | 48 | 49 | 48 |
| 7 | 3 | c | 47 | 51 | 51 |
| 8 | 3 | g | 50 | 53 | 53 |
| 9 | 3 | s | 44 | 50 | 50 |
| 10 | 4 | c | 44 | 49 | 51 |
| 11 | 4 | g | 53 | 53 | 58 |
| 12 | 4 | s | 48 | 51 | 52 |

[Figure: "Ice cream texture quality". TQ plotted against variety (c, g, s), with a separate plotting symbol for each speed (Low, Medium, High).]

Model:

$$Y_{ijk} = \mu_{ij} + B_k + (\alpha B)_{ik} + E_{ijk}$$

where

- $i = 1, 2, 3 = a$ (variety)
- $j = 1, 2, 3 = b$ (speed)
- $k = 1, 2, 3, 4 = r$ (week)

ANOVA sketch:

| Source | df | Expected MS |
|---|---|---|
| Variety | $a - 1 = 2$ | |
| Block | $r - 1 = 3$ | |
| V × B | $(a-1)(r-1) = 6$ | |
| Speed | $b - 1 = 2$ | |
| V × S | $(a-1)(b-1) = 4$ | |
| Error | $(b-1)(r-1)a = 18$ | |
| Total | $abr - 1 = 35$ | |

SAS code for split plot in blocks

    data one;
      infile "strawberryice.dat" firstobs=3;
      input week variety $ speed $ tq;
    run;
    goptions dev=ps;
    axis1 offset=(1 cm,1 cm) label=(height=2 "variety") value=(height=2);
    axis2 offset=(1 cm,1 cm) label=(height=2 "TQ") value=(height=2);
    symbol1 value=dot c=black h=1.2;
    symbol2 value=plus c=black h=1.2;
    symbol3 value=diamond c=black h=1.2;
    proc gplot data=one;
      title "Ice cream texture quality";
      plot tq*variety=speed/haxis=axis1 vaxis=axis2;
      plot tq*speed=variety;
    run; quit;
    proc mixed data=one method=type3;
      class week variety speed;
      model tq=variety|speed;
      random week week*variety;
      lsmeans variety speed/diff;
      lsmeans variety*speed;
    run;

The SAS System

The Mixed Procedure

Model Information

| | |
|---|---|
| Data Set | WORK.ONE |
| Dependent Variable | tq |
| Covariance Structure | Variance Components |
| Estimation Method | Type 3 |
| Residual Variance Method | Factor |
| Fixed Effects SE Method | Model-Based |
| Degrees of Freedom Method | Containment |

| Class | Levels | Values |
|---|---|---|
| week | 4 | 1 2 3 4 |
| variety | 3 | c g s |
| speed | 3 | Low Medium high |

Type 3 Analysis of Variance (cells shown as … did not survive extraction)

| Source | DF | Sum of Squares | Mean Square | Expected Mean Square | Error Term | Error DF | F Value | Pr > F |
|---|---|---|---|---|---|---|---|---|
| variety | 2 | 148.666667 | 74.333333 | `Var(Residual) + 3 Var(week*variety) + Q(variety,variety*speed)` | MS(week*variety) | 6 | 13.47 | 0.0060 |
| speed | 2 | … | … | `Var(Residual) + Q(speed,variety*speed)` | MS(Residual) | 18 | 26.80 | <.0001 |
| `variety*speed` | 4 | … | … | `Var(Residual) + Q(variety*speed)` | MS(Residual) | 18 | 0.26 | 0.9004 |
| week | 3 | … | … | `Var(Residual) + 3 Var(week*variety) + 9 Var(week)` | MS(week*variety) | 6 | 1.16 | 0.3990 |
| `week*variety` | 6 | … | … | `Var(Residual) + 3 Var(week*variety)` | MS(Residual) | 18 | 2.64 | 0.0516 |
| Residual | 18 | 37.666667 | 2.092593 | `Var(Residual)` | | | | |

Covariance Parameter Estimates

| Cov Parm | Estimate |
|---|---|
| week | 0.09877 |
| `week*variety` | 1.1420 |
| Residual | 2.0926 |

Fit Statistics

| | |
|---|---|
| -2 Res Log Likelihood | 118.2 |
| AIC (smaller is better) | 124.2 |
| AICC (smaller is better) | 125.3 |
| BIC (smaller is better) | 122.4 |

Least Squares Means

| Effect | variety | speed | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| variety | c | | 48.0000 | 0.6961 | 6 | 68.95 | <.0001 |
| variety | g | | 52.6667 | 0.6961 | 6 | 75.66 | <.0001 |
| variety | s | | 48.8333 | 0.6961 | 6 | 70.15 | <.0001 |
| speed | | Low | 47.4167 | 0.5424 | 18 | 87.41 | <.0001 |
| speed | | Medium | 50.5000 | 0.5424 | 18 | 93.10 | <.0001 |
| speed | | high | 51.5833 | 0.5424 | 18 | 95.10 | <.0001 |
| `variety*speed` | c | Low | 45.2500 | 0.9129 | 18 | 49.57 | <.0001 |
| `variety*speed` | c | Medium | 48.7500 | 0.9129 | 18 | 53.40 | <.0001 |
| `variety*speed` | c | high | 50.0000 | 0.9129 | 18 | 54.77 | <.0001 |
| `variety*speed` | g | Low | 50.5000 | 0.9129 | 18 | 55.32 | <.0001 |
| `variety*speed` | g | Medium | 53.0000 | 0.9129 | 18 | 58.06 | <.0001 |
| `variety*speed` | g | high | 54.5000 | 0.9129 | 18 | 59.70 | <.0001 |
| `variety*speed` | s | Low | 46.5000 | 0.9129 | 18 | 50.94 | <.0001 |
| `variety*speed` | s | Medium | 49.7500 | 0.9129 | 18 | 54.50 | <.0001 |
| `variety*speed` | s | high | 50.2500 | 0.9129 | 18 | 55.05 | <.0001 |

Differences of Least Squares Means

| Effect | Level | vs. level | t Value |
|---|---|---|---|
| variety | c | g | -4.87 |
| variety | c | s | -0.87 |
| variety | g | s | 4.00 |
| speed | Low | Medium | -5.22 |
| speed | Low | high | -7.06 |
| speed | Medium | high | -1.83 |
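
As a rough check on the ice cream output above, the mean squares that serve as error terms can be rebuilt from the covariance parameter estimates and the expected mean squares, and the F ratios recomputed. The sketch below uses Python rather than SAS purely for the arithmetic; the variable names are made up for illustration, and only the numeric inputs (the three variance component estimates, the variety mean square, and the EMS coefficients 3 and 9) are taken from the output.

```python
# Inputs copied from the PROC MIXED (method=type3) output for the ice cream data.
var_week = 0.09877          # Var(week)
var_week_variety = 1.1420   # Var(week*variety), the whole-plot error component
var_residual = 2.0926       # Var(Residual), equal to the subplot error mean square
ms_variety = 74.333333      # Type 3 mean square for variety

# Rebuild mean squares from the expected mean squares in the output:
#   week*variety: Var(Residual) + 3 Var(week*variety)
#   week:         Var(Residual) + 3 Var(week*variety) + 9 Var(week)
ms_week_variety = var_residual + 3 * var_week_variety
ms_week = ms_week_variety + 9 * var_week

# Whole-plot effects are tested against MS(week*variety);
# week*variety itself is tested against MS(Residual).
f_variety = ms_variety / ms_week_variety
f_week = ms_week / ms_week_variety
f_week_variety = ms_week_variety / var_residual

# For contrast: the ratio obtained if variety were (wrongly) tested
# against the subplot error instead of the whole-plot error.
f_variety_wrong = ms_variety / var_residual

print(f"F variety       = {f_variety:.2f}")
print(f"F week          = {f_week:.2f}")
print(f"F week*variety  = {f_week_variety:.2f}")
print(f"F variety (wrong denominator) = {f_variety_wrong:.2f}")
```

The first three ratios should agree with the F Value column of the Type 3 table (13.47, 1.16 and 2.64) up to rounding of the inputs. The last line illustrates the earlier reminder to watch the denominator: dividing by the subplot error instead of the whole-plot error inflates the variety F ratio considerably.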
