STATISTICAL METHODS II (STAT 516)
These 13 pages of class notes were uploaded by Shane Marks on Monday, October 26, 2015. The notes belong to STAT 516 at the University of South Carolina - Columbia, taught by B. Habing in Fall. Since upload, they have received 56 views.
Displaying Main Effects and Interactions
Supplement to Section 9.5
Brian Habing - University of South Carolina

[The body of this supplement did not survive extraction. It appears to have contained a worked two-factor example (a seaweed-grazing data set involving limpets and small fish), notes that SAS gave estimates for all of the main effects and interactions but that only the L main effect was significant, and a discussion of how to display only the significant means rather than the full table.]

Topics Covered in Chapters 6 and 9-11 (possibly incomplete list)

Chapter 6 and Supplement: One-Way Analysis of Variance
- The one-way ANOVA table and notation, including how to make the table
- The assumptions for the one-way ANOVA
- That the modified Levene's test (a.k.a. the Brown and Forsythe test) can be used to test that the variances are equal
- What hypothesis the basic ANOVA F-test (the omnibus test) tests
- Familywise (or experimentwise) Type I error rate α_T versus comparisonwise Type I error rate α
- Conservative vs. liberal; Holm vs. Bonferroni vs. Fisher
- How the Holm test works
- Making a display from the Holm test on all pairs of treatment levels
- How to construct contrasts
- How to show contrasts are orthogonal (only works when the ANOVA is balanced)
- What the estimates and tests corresponding to contrasts tell us
- Making confidence intervals for contrasts
- When to use each of the following: the basic ANOVA test, the Holm test for each pair of treatments, and contrasts
- Interpreting the SAS output

Topics Not Covered from Chapter 6: Pgs. 238-239 Hartley's F-max test; Pgs. 249-252 Fitting Trends; Pgs. 254-267 Tukey's HSD, Duncan's Multiple Range, Scheffe; Section 6.8 Analysis of Means

Chapter 9 and Supplement: Factorial Experiments
- What Factorial, Fixed Effect, Balanced, and Replications mean
- How to use the ANOVA table and how it fits together (but not all the equations)
- Partitioning the SSB
- The model equation for the two-factor ANOVA table with interactions, and what the terms mean
- The relationship of the 2-by-2 two-factor ANOVA table with interactions to the orthogonal contrasts in a one-way ANOVA (with AI, AII, BI, BII, for example)
- The model for factorial designs with more factors (for example, all the different interaction terms that have to be added)
- What to do when there are no replications, and why this is necessary
- What about the interpretation gets more complicated when there is an interaction
- Interpreting the SAS output

Topics Not Covered from Chapter 9: Section 9.4 Interaction Contrasts; Polynomial Responses; Lack of Fit Test

Chapter 10: Random Effects and Non-Factorial Models
- Random effect vs. fixed effect
- What hypotheses are being tested in a random effects model
- If given the EMS, how they are used to determine which MS go into making the appropriate F
- If given the EMS, how they can be used to estimate the variances
- How to read and use the EMS from PROC GLM
- The purpose of blocking
- That each treatment must appear in each row and column once in a Latin square
- Why we might want to use a Latin square

Topics Not Covered from Chapter 10: Pgs. 469-470 The definition of relative efficiency and the formulas for it; Pgs. 480-484 Factorial Experiments in a Randomized Block Design; Section 10.5 Other Designs

Sections 11.1-11.3: Dummy Variables
- Why the usual formulas don't work for unbalanced or nonfactorial data (see Example 11.2, pg. 514)
- That PROC ANOVA (and the MEANS statement in PROC GLM) only works for balanced factorial data
- That PROC GLM (and the LSMEANS statement) works fine even if the design isn't balanced or factorial
- The basic ideas of coding something as a dummy variable and using regression (pg. 511)

Contrasts and Multiple Comparisons
Supplement for Pages 302-323
Brian Habing - University of South Carolina
Last Updated: July 20, 2001

The F test from the ANOVA table allows us to test the null hypothesis that the population means of all of the groups/treatments are equal. The alternate hypothesis is simply that at least two are not equal. Often this isn't what we want to know. Say we are comparing 20 possible treatments for a disease. The ANOVA F test (sometimes called the omnibus test) could only tell us that at least one of the treatments worked differently than the others. We might, however, want to be able to rank the 20 from best to worst and say which of these differences are significant. We might want to compare all the treatments produced by one company to those of another, or maybe all the treatments based on one idea to those based on another.

An obvious suggestion in each of these cases would be to simply do a large number of t-tests. To rank the 20 from best to worst, we could simply do a separate t-test for each possible comparison (there are 190 of them). To compare the two companies or two ideas, we could simply group all of the observations from the related methods together and use t-tests to see if they differ. One difficulty with this is that α, the probability of a Type I error, may no longer be what we want it to be.

Sidak's Formula

Stepping back from the ANOVA setting for a minute, say we wish to conduct one-sample t-tests on twenty completely independent populations. If we set α = 0.05 for the first test, that means that

    0.05 = α = P(reject H0 for test one | H0 is true for test one).

We could write the same for the other nineteen populations as well. If we are concerned about all
twenty populations though we might be more interested in the probability that we reject a true null hypothesis at all That is 0LT Preject Hg for test one U reject Hg for test two U U reject Hg for test 20 H0 is true for all tests We call this quantity the family wise or experiment wise error rate The 0L for each individual test is called the comparison wise error rate The family or experiment in this case is made up of the twenty individual comparisons Using the rules of probability and the fact that we assumed the tests were independent for this example we can calculate what 0LT would be if we used 1005 for the comparisonwise rate onpreject Ha forl v rejectHu for 2 v v rejectHu for 20 Hu is true for all tests 17 Pfail to reject Hg for 1 n n fail to reject HD for 20 Hg is true for all tests 17 Pfail to reject Hg for 1 Hg is true for all tests Pfail to reject Hg for 2 Hg is true for all tests 11OL1OL 1rot 1r1rot 1 a 17005 1 0952 064 The chance ofmaking at least one error ctT isn t 5 it s nearly 64 r39 39 t t itnktestswe et 06 1 r 1700 When the tests are independent Ifwe know What 1139 We mnt We can solve for the needed X to get 0clrlroc1 k If quot T 39 39 0cof 000256 for each individual comparison Bonferroni s Formula In the case ofANOVA the various tests will o en not be independent wae mnt to conduct the ttests to compare 20 possible medical treatments to each other then clearly the comparison ofl to 2 and 1 to 3 will not be independent they both contain treatment 1 The diagram ueiu 39 quot 39 39 L 4 due is I A 5quot V 17a 17a 141T T is as large as possible a is in between T is as small as possible assume the worst what usually happens assume the best Bunrerruni 7777 Fisher The worst possible case in terms of 0LT would be if the type I errors for the individual tests were mutually exclusive In this case 0LT Preject Hg for l U reject Hg for 2 U U reject Hg for k H0 is true for all tests Preject Hg for l H0 is true for all tests Preject Hg for k H0 is true for all 
tests)
        = α + α + ... + α = kα   (to a maximum of one),

or equivalently α = α_T/k. This is Bonferroni's formula. The best possible case in terms of α_T would be if the Type I errors for the individual tests all overlapped; in this case α_T = α.

So far, then: if we are performing a set of tests that are independent, we can use Sidak's adjustment to figure out what comparisonwise α we should be using. If the tests are not independent, then we have a choice. We could be liberal and reject true null hypotheses too often (use α_T = α), or be conservative and not reject the true null hypotheses as much as we should for our desired α_T (use Bonferroni). In terms of α_T, we would be better off being conservative. The problem with this is that if we do not reject the true null hypotheses enough, we also will not reject the false ones enough. In the case of comparing the means of treatments: if we are liberal (using α_T = α) we will find lots of the differences that are there, but also lots of "differences" that aren't real. If we are conservative, we won't find lots of fake differences, but we will also miss the real ones.

Fisher's LSD

One method for dealing with the fact that using α_T = α is too liberal is called the Fisher Least Significant Difference (LSD) test. The idea is to check whether the means of groups are different only if you reject the omnibus F-test. This makes some obvious sense: if you fail to reject that there are no differences, why would you continue looking? While this helps keep the number of false rejections down, it does have two downsides. The first problem can occur when you fail to reject the overall ANOVA null hypothesis. Because the omnibus test from the ANOVA table is looking at all of the groups at once, it will sometimes miss a difference between just two means; it has to sacrifice power for each individual comparison in order to test them all at once. The second problem can occur when we do reject the overall ANOVA null hypothesis and proceed to do the other comparisons of the group means. The omnibus test may have rejected because of a difference between only two means, but because using α_T = α is liberal, we may find more differences than are really there. Because of these two difficulties, Fisher's LSD can't be highly recommended.

The Holm Test

The Holm test is a method for dealing with the fact that the Bonferroni procedure is too conservative. The main idea comes from noticing that we always used the condition "H0 is true for all tests" instead of the condition that it is true only for the specific test we are doing. The procedure behind the Holm test is to first find the p-values for all of the individual tests we are performing and then rank them from smallest to largest. Compare the smallest to α_T/k. If you fail to reject the null hypothesis at this first step, then you stop here. If you do reject, then compare the next smallest to α_T/(k−1). Again, if you fail to reject the null hypothesis, you stop; if you do reject, continue on and use α_T/(k−2). You do not need to check the omnibus F-test first, thus avoiding the first problem with Fisher's LSD.

For example, say you have five hypotheses you are testing, you want α_T = 0.05, and you observed p-values of 0.011, 0.751, 0.020, 0.030, and 0.001, respectively.

    Test Number   P-value   Compare To           Conclusion
    5             0.001     0.05/5 = 0.010       reject H0 for test 5
    1             0.011     0.05/4 = 0.0125      reject H0 for test 1
    3             0.020     0.05/3 = 0.0166      fail to reject for test 3
    4             0.030     no comparison made   fail to reject for test 4
    2             0.751     no comparison made   fail to reject for test 2

Notice that Bonferroni's test would only have rejected for test 5, while using α_T = α would have rejected for tests 5, 1, 3, and 4. Thus the power of the Holm test is somewhere in between that of the Bonferroni procedure and Fisher's LSD. While it is more powerful than Bonferroni's method (it rejects more false H0's), it still makes sure that α_T is held to the desired level (unlike Fisher's LSD).

Notice that if all the null hypotheses are true, we make an error if we reject any of them.
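As a side note before finishing the argument: the three approaches just described are short enough to sketch in code. The following is an illustrative Python sketch (the notes themselves use SAS, so this is an addition, not part of the original handout). Running it on the five p-values from the example reproduces the table above, and also confirms that Bonferroni alone would reject only test 5 while the unadjusted α_T = α choice would reject tests 1, 3, 4, and 5.

```python
def holm(p_values, alpha_T=0.05):
    """Holm step-down test: compare the sorted p-values to alpha_T/k,
    alpha_T/(k-1), ..., stopping at the first failure to reject."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * k
    for step, i in enumerate(order):
        if p_values[i] <= alpha_T / (k - step):
            reject[i] = True
        else:
            break  # no comparison is made for the remaining p-values
    return reject

def bonferroni(p_values, alpha_T=0.05):
    """Bonferroni: compare every p-value to alpha_T/k."""
    k = len(p_values)
    return [p <= alpha_T / k for p in p_values]

def unadjusted(p_values, alpha_T=0.05):
    """The liberal choice alpha = alpha_T: no adjustment at all."""
    return [p <= alpha_T for p in p_values]

# p-values for tests 1 through 5 from the example above
p = [0.011, 0.751, 0.020, 0.030, 0.001]
print(holm(p))        # [True, False, False, False, True]  (reject tests 1 and 5)
print(bonferroni(p))  # [False, False, False, False, True] (reject test 5 only)
print(unadjusted(p))  # [True, False, True, True, True]    (reject 1, 3, 4, 5)
```

Note how the Holm rejections sit between the other two sets, matching the "power in between" observation in the text.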
The chance that we reject any is the same as the chance that we reject the first, which is α_T/k. We are thus safe for the same reason that Bonferroni's formula works. Now assume that we rejected the first null hypothesis because it was false. There are only k−1 tests left, and so when we go to the second test we can start over as if we were using Bonferroni's formula with k−1 instead of k. And we continue in this way. While this argument is not a proof that the Holm test protects the familywise error rate α_T, it should make the general idea fairly clear.

While there are many other methods for making multiple comparisons (see pages 307-313), the Holm test performs fairly well compared to all of them, controls α_T at the desired level, and is fairly easy to understand. Because of this, it will be the method that we focus on.

Contrasts

In order to perform any of these tests, though, we must be able to tell SAS what we want done. The building blocks for many of the procedures that we will have SAS use are called contrasts. A contrast is simply a linear function of the various treatment group means whose coefficients sum to zero. Consider the example presented in Table 7.4 on pages 296-298. Here we have three groups: 1 = healthy, 2 = nonmelancholic depressed, and 3 = melancholic depressed. Each of these three groups has an associated parameter: μ1 = μhealthy, μ2 = μnonmel.dep., and μ3 = μmel.dep.. Examples of contrasts here would include:

    L1 = 0μ1 + 1μ2 − 1μ3      (written in SAS as 0 1 -1)
    L2 = 1μ1 + 0μ2 − 1μ3      (1 0 -1)
    L3 = 1μ1 − 1μ2 + 0μ3      (1 -1 0)
    L4 = 1μ1 − 0.5μ2 − 0.5μ3  (1 -0.5 -0.5)

Notice that in each case the coefficients sum to zero: 0+1−1 = 0, 1+0−1 = 0, 1−1+0 = 0, and 1−0.5−0.5 = 0.

The theory says that we can estimate a contrast using

    L̂ = Σ aᵢ ȳᵢ   with standard error   σ̂_L̂ = sqrt( MSres · Σ aᵢ²/nᵢ )   (sums over i = 1, ..., k),

where the aᵢ are the coefficients for the contrast, and the estimate L̂ is normally distributed if the ANOVA assumptions are met. Since we have the standard error for L̂, we could make a confidence interval for L or test the null hypothesis that L = 0. The question, though, is why we would want to.

If we look at L1, we simply have the difference of the means of the second and third groups (the nonmelancholic depressed and the melancholic depressed). It thus appears as if the contrast L1 is simply comparing the means of those two groups. If we use the estimate of L1 and its standard error to construct a t-test of the hypothesis L1 = 0, we get

    t = (ȳ2 − ȳ3) / sqrt( MSres/n2 + MSres/n3 ).

This is exactly the two-sample t-test for H0: μ2 − μ3 = 0, except that we are using MSres instead of the pooled variance estimate. There is also an F-test for this contrast that tests exactly the same hypothesis; the F value will always be the square of the t value.

If we return to the other three contrasts: L2 is simply testing whether the nondepressed and melancholic depressed differ, and similarly L3 is simply testing whether the healthy and nonmelancholic depressed differ. The last contrast is somewhat more complicated: it is comparing the mean of the healthy group to the average of the means of the two depressed groups. That is, it is comparing nondepressed to depressed.

Independence, Orthogonal Contrasts, and the Holm-Sidak Test (an aside)

Two contrasts are said to be orthogonal if the dot product of their coefficient vectors is zero. So two contrasts L = Σ aᵢμᵢ and L′ = Σ bᵢμᵢ would be orthogonal if Σ aᵢbᵢ = 0. In the above example, then, L1 and L4 would be orthogonal, but no other pair of these contrasts would be. The reason to care whether two contrasts are orthogonal is that the estimates that go with a set of orthogonal contrasts are independent. (The test statistics will not be, however, as they both contain the MSres in the denominator.) There is a modification of the Holm test that uses Sidak's formula instead of Bonferroni's. However, because the statistics won't be independent, and there really isn't much difference between the values given by Bonferroni's formula and Sidak's formula, we will just use the basic Holm test.

Tying it All Together

When we approach an ANOVA problem, there are three basic types of questions we could have in mind:

1. Are there any differences between any of the group means? Choose α and simply use
the F test from the ANOVA table (the omnibus test).

2. Do the means of some particular groups differ from the means of some other particular groups? Choose α_T and come up with the contrasts you wish to test. Find the p-values for the tests that go with these contrasts, and then use the Holm test procedure to see which are significant.

3. What is the order of the group means, and which are significantly different from each other? Choose α_T. Make all of the contrasts that compare two means to each other, find their p-values, and use the Holm test procedure to see which are significantly different. Then make a simple graph to display the result.

It is important to note that you should decide which one of these questions you want to answer before you look at any of the output. (If for some reason you don't know why you are looking at the data in advance, something called Scheffe's method can be used.) Also, you should pick only one of these three questions; it doesn't make sense to look at more than one of them. Finally, in all cases, remember to check the assumptions.

Example: Hormones and Depression

The following pages contain the code and output for answering each of the questions above for the example on pages 296-298. The write-up assumes that the desired familywise error rate is α_T = 0.05.

Check the assumptions using PROC INSIGHT and the modified Levene's test:

    PROC INSIGHT;
       OPEN tab7p4;
       FIT cort = group;
    RUN;

    PROC GLM DATA=tab7p4 ORDER=DATA;
       CLASS group;
       MODEL cort = group;
       MEANS group / HOVTEST=BF;
    RUN;

[The PROC INSIGHT graphics, a residual vs. predicted plot (R_cort vs. P_cort) and a normal Q-Q plot (R_cort vs. RN_cort), did not survive extraction and are omitted here.]

    The GLM Procedure

    Brown and Forsythe's Test for Homogeneity of cort Variance
    ANOVA of Absolute Deviations from Group Medians

                         Sum of       Mean
    Source      DF      Squares     Square    F Value    Pr > F
    group        2       6.5816     3.2908       0.48    0.6234
    Error       53     365.9        6.9029

From the residual vs. predicted plot, the means for each of the three groups seem to be near zero (they must always be for a one-way ANOVA). However, it is not clear from the residual vs. predicted plot if the variances
of the errors for the three groups are the same. Using the modified Levene test, we fail to reject the hypothesis that the variances are equal (p-value 0.6234). Finally, from the Q-Q plot of the residuals, it appears that the distribution of the errors is approximately normal, with the possible exception of two outliers. Assuming the experimental design satisfies the independence assumption, all four assumptions for the one-way ANOVA are met in this case.
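Connecting back to the Contrasts section: the four contrasts defined there use these same three groups, and their basic properties can be checked in a few lines. The sketch below is illustrative Python, not part of the original handout, and the group means, sample sizes, and MSres in the second half are made-up numbers used only to show the formulas L̂ = Σ aᵢȳᵢ and σ̂_L̂ = sqrt(MSres Σ aᵢ²/nᵢ) in action. It verifies that each coefficient vector sums to zero and that L1 and L4 are the only orthogonal pair, as stated in the text.

```python
import math

# Contrast coefficients from the text: (healthy, nonmelancholic dep., melancholic dep.)
contrasts = {
    "L1": [0, 1, -1],
    "L2": [1, 0, -1],
    "L3": [1, -1, 0],
    "L4": [1, -0.5, -0.5],
}

# Every contrast's coefficients must sum to zero
for a in contrasts.values():
    assert sum(a) == 0

# Orthogonality: the dot product of the two coefficient vectors is zero
def orthogonal(a, b):
    return sum(x * y for x, y in zip(a, b)) == 0

pairs = [(m, n) for m in contrasts for n in contrasts if m < n]
print([pair for pair in pairs if orthogonal(contrasts[pair[0]], contrasts[pair[1]])])
# [('L1', 'L4')] -- the only orthogonal pair, as claimed in the text

# Estimate, standard error, and t statistic for L1 (hypothetical numbers)
ybar = [10.0, 12.0, 15.0]   # made-up group means
n = [19, 19, 18]            # made-up group sizes (need not be equal)
MSres = 6.9                 # made-up mean squared residual
a = contrasts["L1"]
L_hat = sum(ai * yi for ai, yi in zip(a, ybar))                    # 12 - 15 = -3
se = math.sqrt(MSres * sum(ai ** 2 / ni for ai, ni in zip(a, n)))
t = L_hat / se              # compare to a t distribution on the MSres df
print(L_hat, round(se, 3), round(t, 3))
```

For L1 the standard-error formula reduces to sqrt(MSres/n2 + MSres/n3), so t here is exactly the two-sample-style statistic given in the text.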