These 21-page class notes are for STAT 102 (Introductory Business Statistics) at the University of Pennsylvania, Fall semester; uploaded September 28, 2015.
Lecture 15, STAT 102: One-Way Analysis of Variance (ANOVA)
Reading: Ch. 9.1

Topics: comparison of means among I groups; individual t-tests vs. multiple comparisons; two multiple-comparison methods, Bonferroni and Tukey-Kramer.

One-Way Analysis of Variance

- One-way ANOVA is a technique designed to compare the means of two or more groups. It extends the equal-variance two-sample t-test discussed in Lecture 2.
- It uses an F-test to determine whether there are, overall, any significant differences among the means.
- Then, if there are overall differences, it uses special multiple-comparison tests to determine which differences between pairs of means are significant.
- We'll discuss the theory in the context of an example. The example uses data printed in USA Today about five years ago, reporting prior-year returns for a sample of mutual funds.

Stock Returns Example

Look at the 5-yr returns in the USA Today stock-fund data to see whether there are differences in 5-yr returns according to the Type of mutual fund. In these data there are four main Types (aka Broad Objectives), and we will concentrate on these:
  B = Balanced, GI = Growth and Income, G = Growth, GL = Global.

[Figure: "5 yr Return By Broad Objective" -- side-by-side plots of the returns for the four major groups, showing a means diamond and a quantile box plot for each group. The means diamonds are computed from the standard (equal-variance) analysis discussed below.]

There are clearly noticeable differences among the returns. Overall, are they statistically significant? If so, which differences are significant?

Individual Means

Here are the means and standard deviations for each group, and the SE for the mean of each group as computed from the SD of that group:

  Means and Std Deviations
    Level   Number    Mean   Std Dev   SE Mean
    B          6     10.62    2.623     1.071
    G         31     19.26    5.107     0.917
    GI        26     15.05    4.025     0.789
    GL         9      9.844   3.894     1.298

One-Way ANOVA: Theory

Groups are labeled i = 1, ..., I. Observations Y_ij in the i-th group, j = 1, ..., n_i, with n = sum_i n_i observations in all.

Model: Y_ij = mu_i + e_ij, where the e_ij are independent normal with mean 0 and variance sigma^2.
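As a quick check, the SE Mean column in the table above can be recomputed from each group's count and SD, since the SE of a group mean is SD/sqrt(n). A minimal sketch, using the published values:

```python
# Recompute each group's SE of the mean from its n and SD.
# Values are taken from the "Means and Std Deviations" table above.
import math

groups = {  # level: (n, mean, sd)
    "B":  (6,  10.62, 2.623),
    "G":  (31, 19.26, 5.107),
    "GI": (26, 15.05, 4.025),
    "GL": (9,  9.844, 3.894),
}

for level, (n, mean, sd) in groups.items():
    se = sd / math.sqrt(n)  # per-group SE, using that group's own SD
    print(f"{level:>2}: n={n:2d}  mean={mean:6.2f}  SE={se:.3f}")
```

For example, for group B this gives 2.623/sqrt(6) = 1.071, matching the table.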
An alternate form of the model is
    Y_ij = mu + alpha_i + e_ij,
with mu = (1/I) sum_i mu_i and alpha_i = mu_i - mu. This implies that sum_i alpha_i = 0.

The basic test is of
    H0: all the group means are the same, i.e. mu_1 = mu_2 = ... = mu_I,
vs.
    Ha: they're not all the same.

Analysis of Variance: Explanation of Calculations, Formulas & Relation to Regression

- The model has E(Y_ij) = mu_i.
- So we estimate each mu_i by the mean of the corresponding Y_ij, j = 1, ..., n_i, denoted
    Ybar_i = (1/n_i) sum_j Y_ij.
- As with all our previous regression estimators, this is a least-squares estimator; i.e., it minimizes the total error sum of squares
    SSE = sum_i sum_j (Y_ij - Ybar_i)^2 = sum_i (n_i - 1) s_i^2,
  where s_i^2 is the sample variance within group i.
- As in other types of regression settings, this is compared to
    SST = sum_i sum_j (Y_ij - Ybar)^2,
  where Ybar denotes the grand mean.

Test of H0: mu_1 = ... = mu_I

- We calculate the reduction in sum of squares due to the model:
    SSR = SST - SSE.
- And we use
    F = [SSR / (I - 1)] / [SSE / (n - I)]
  to test H0.
- Degrees of freedom: the DF for the model is I - 1. This is because under H0 the value of mu_1 is not restricted, but the remaining mu_2, ..., mu_I are then completely restricted to be this same value. There are I - 1 completely restricted values under H0, and hence I - 1 DF.
- In the alternate form of the model, Y_ij = mu + alpha_i + e_ij with sum_i alpha_i = 0, H0 becomes H0: alpha_1 = ... = alpha_I = 0. As before, there are only I - 1 completely restricted values under H0, and hence I - 1 DF for the model.

Fund Example: Test of H0

JMP tables for the one-way ANOVA:

  Summary of Fit
    RSquare                   0.39
    Root Mean Square Error    4.444   (= s_pooled, or s_e)
    Observations              72

  Analysis of Variance
    Source     DF   Sum of Squares   Mean Square   F Ratio
    Model       3         866.31        288.77       14.62   (= 288.77/19.75)
    Error      68        1343.05         19.75      Prob > F
    C Total    71        2209.36                     <.0001

The F-ratio has 3 and 68 DF and tests H0: all the group means are the same, versus the alternative Ha: they're not all the same. Reject H0 at alpha = 0.05, since the P-value is < .0001.

Multiple Comparison Tests: Intro

- Since we have rejected the null hypothesis that all the means are the same, we would like to go on to investigate the differences between each pair of means.
- A first step could be to examine the estimates of the means and their SEs.
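The ANOVA table above can be reproduced from the group summaries alone, since SSE = sum_i (n_i - 1) s_i^2 and SSR = sum_i n_i (Ybar_i - Ybar)^2, so the raw data are not needed. A sketch, using the published n_i, means, and SDs:

```python
# Rebuild the one-way ANOVA table from group summaries (fund example).
ns    = [6, 31, 26, 9]
means = [10.617, 19.261, 15.050, 9.844]
sds   = [2.623, 5.107, 4.025, 3.894]

n = sum(ns)           # 72 observations
I = len(ns)           # 4 groups
grand = sum(ni * m for ni, m in zip(ns, means)) / n   # grand mean

sse = sum((ni - 1) * s**2 for ni, s in zip(ns, sds))          # error SS
ssr = sum(ni * (m - grand)**2 for ni, m in zip(ns, means))    # model SS

msr = ssr / (I - 1)   # model mean square, 3 DF
mse = sse / (n - I)   # error mean square, 68 DF
F = msr / mse
print(f"SSR={ssr:.1f}  SSE={sse:.1f}  MSE={mse:.2f}  F={F:.2f}")
```

Running this gives SSR about 866.3, SSE about 1343.2, and F about 14.62, matching the JMP output up to rounding of the published summaries.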
  Means for Oneway Anova
    Level   Number    Mean    Std Error   Lower 95%   Upper 95%
    B          6     10.617     1.814        6.996      14.237
    G         31     19.261     0.798       17.669      20.854
    GI        26     15.050     0.872       13.311      16.789
    GL         9      9.844     1.481        6.888      12.801

  (Std Error uses a pooled estimate of the error variance.)

- Note that the SEs in this table are not the same as those in the earlier table of individual group means. Why not?
- If we construct standard t-test 100(1 - alpha)% confidence intervals for each pair, assuming equal variances, we have, for mu_i - mu_j:
    Ybar_i - Ybar_j +/- t_{n-I, 1-alpha/2} * sqrt( MSE * (1/n_i + 1/n_j) ).
- The DF for the t-statistic here is n - I = 68, since this calculation uses sqrt(MSE) = s_e, which has 68 DF.
- For example, for G - B:
    19.26 - 10.62 +/- 2 x 4.44 x sqrt(1/31 + 1/6) = 8.64 +/- 3.96, i.e. 4.68 to 12.60.
- Note that the probability is 1 - alpha that EACH such interval contains the true value of the corresponding mu_i - mu_j.

JMP has several ways of displaying the results of this construction of confidence intervals. Use the Fit Y by X platform, and then go to the arrow command: Compare means > Each pair, Student's t.

Here is one way to display all the results from these CIs. It shows which CIs for mu_i - mu_j contain 0:

  Means Comparisons for each pair using Student's t
    t        Alpha
    1.995    0.05

    Level          Mean
    G     A       19.261
    GI      B     15.050
    B         C   10.617
    GL        C    9.844

  Levels not connected by the same letter are significantly different.

Here is another way. It gives all the confidence intervals, both numerically and graphically (note the blue lines, Lower CL and Upper CL, on the plot):

    Level - Level   Difference   Lower CL   Upper CL   p-Value
    G     - GL         9.42         6.06      12.78     0.0000
    G     - B          8.65         4.69      12.60     0.0000
    GI    - GL         5.21         1.78       8.64     0.0035
    GI    - B          4.43         0.42       8.45     0.0310
    G     - GI         4.21         1.85       6.57     0.0007
    B     - GL         0.77        -3.90       5.45     0.743

  (Each p-value is for an individual test of that difference.)

- If we look at the above intervals, we might be tempted to claim that we are 95% certain that
    6.059 <= mu_G - mu_GL <= 12.775, and
    4.689 <= mu_G - mu_B <= 12.60, and
    1.776 <= mu_GI - mu_GL <= 8.635, and
    0.417 <= mu_GI - mu_B <= 8.449, and
    1.853 <= mu_G - mu_GI <= 6.570, and
    -3.902 <= mu_B - mu_GL <= 5.446.
- This would be associated with a claim that we are 95% certain that the first five of these differences are ALL nonzero.
- Such a claim would be unjustified.
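The individual t-interval worked out above for mu_G - mu_B can be sketched directly; t = 1.995 (68 DF, 95%) and s_e = 4.444 are read off the JMP output, and the rest is the pairwise-CI formula:

```python
# Equal-variance pairwise CI for mu_G - mu_B (fund example).
import math

t_crit, s_e = 1.995, 4.444   # from the JMP output above
n_G, n_B = 31, 6
diff = 19.261 - 10.617

margin = t_crit * s_e * math.sqrt(1 / n_G + 1 / n_B)
lo, hi = diff - margin, diff + margin
print(f"{diff:.2f} +/- {margin:.2f}  ->  ({lo:.2f}, {hi:.2f})")
```

This reproduces the 4.68-to-12.60 interval from the lecture (which rounded t to 2 and s_e to 4.44).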
What is true is that each individual confidence interval has a 95% chance of being right. But there is a much smaller chance that ALL of these confidence intervals are simultaneously right.

- The issue here is the difference between an Individual Coverage Rate and a Familywise Coverage Rate.

Simultaneous Confidence Intervals

- When several confidence intervals are considered simultaneously, they constitute a family of CIs.
- Individual coverage rate: the probability that a given individual confidence interval in the family contains its true value.
- Familywise coverage rate: the probability that every confidence interval in the family contains its true value.

Simultaneous Test Procedures

Every set of simultaneous confidence intervals is associated with a family of simultaneous tests. Thus we have the family of tests
    H0_ij: mu_i = mu_j,  for 1 <= i < j <= I,
and we reject the individual H0_ij whenever the confidence interval for mu_i - mu_j does not contain 0.

- Individual error rate: the probability, for a single test in the family, that the corresponding null hypothesis will be rejected if it is true.
- Familywise error rate: the probability, for the entire family of tests, that at least one true null hypothesis will be rejected.

When planning and carrying out a study such as a one-way ANOVA, the recommended best practice is to use procedures that guarantee the claimed familywise coverage and error rates. The easiest way to attain this is to use Bonferroni confidence intervals and tests; this general method works for one-way ANOVA and for many other statistical settings. For one-way ANOVA, the Tukey-Kramer method gives slightly more powerful tests and slightly shorter confidence intervals.

Bonferroni Method for Tests and CIs

- A general method for doing multiple tests (or confidence intervals, resp.) for any family of k tests (confidence intervals). In the context of one-way ANOVA there are I groups, and hence k = I(I-1)/2 pairwise comparisons.
- Denote the desired familywise error rate by alpha (the desired familywise coverage rate by 1 - alpha).
- Compute individual tests at level alpha* = alpha/k (confidence intervals at individual coverage 1 - alpha/k). This guarantees that the familywise error rate is at most alpha.
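The Bonferroni adjustment just described is a one-line computation. A minimal sketch (the helper name `bonferroni_alpha` is ours, not JMP's):

```python
# Bonferroni adjustment for all pairwise comparisons among I groups:
# k = I(I-1)/2 comparisons, each run at individual level alpha/k.
def bonferroni_alpha(I, alpha=0.05):
    k = I * (I - 1) // 2   # number of pairwise comparisons
    return k, alpha / k

k, a_star = bonferroni_alpha(4)   # the fund example: 4 groups
print(k, round(a_star, 5))        # 6 comparisons, each at ~0.00833
```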
It likewise guarantees that the familywise coverage rate is at least 1 - alpha.

Why Bonferroni Works

In the general case there are k null hypotheses. Label them H0_j, j = 1, ..., k. The probability that an individual Type I error is made on H0_j is
    P(rej H0_j | H0_j true) = P(E_j), say,
where E_j denotes the event of rejecting H0_j given that H0_j is true. Each test is run at level alpha/k, so P(E_j) <= alpha/k. The probability that any error is made in the entire family of tests is then
    P(E_1 u ... u E_k) <= sum_j P(E_j) <= k * (alpha/k) = alpha.    (*)
Thus the familywise error rate is <= alpha, as desired.

NOTE that the inequality in (*) is generally a strict inequality. Hence one should expect the Bonferroni procedure to have a familywise error rate strictly less than the nominal alpha, but it is hard to know how much less. The proof for a family of confidence intervals is similar.

To Use Bonferroni with JMP in a One-Way ANOVA

1. Determine k, via k = I(I-1)/2, where I denotes the number of comparison groups.
2. Choose alpha. Usually alpha = 0.05.
3. Calculate alpha* = alpha/k.
4. Go to the arrow menu inside the Fit Y by X platform. Select Set alpha level > other, and enter the value of alpha*.
5. Then perform the individual Compare means > Each pair, Student's t, as before. This will give the desired confidence intervals, C_ij say, and the corresponding tests of H0_ij can be performed by rejecting whenever 0 is not in C_ij.

Fund Example (cont.): Bonferroni

In the example there are 4 groups to compare, so k = (4 x 3)/2 = 6. For alpha = 0.05 we have alpha* = 0.05/6 = 0.00833. We get the output:

  Comparisons for each pair using Student's t
    t        Alpha
    2.718    0.00833

Note that the critical t-value here is 2.718, compared to the earlier t = 1.995 for alpha = 0.05.

    Level - Level   Difference   Lower CL   Upper CL
    G     - GL         9.417       4.844     13.990
    G     - B          8.645       3.258     14.032
    GI    - GL         5.206       0.534      9.878
    GI    - B          4.433      -1.037      9.903
    G     - GI         4.211       1.000      7.422
    B     - GL         0.772      -5.594      7.138

HERE we can reject only 4 of the null hypotheses, instead of 5 as with the individual t-tests procedure. We also conclude that fund type G is the best, better than all the others.

Tukey-Kramer Method

Note that Bonferroni uses simultaneous CIs of the form, for mu_i - mu_j:
    Ybar_i - Ybar_j +/- t_Bonf * sqrt( MSE * (1/n_i + 1/n_j) ),
where t_Bonf = t_{n-I, 1-alpha/(2k)}. Tukey-Kramer uses CIs of the same form, but with a different, slightly smaller critical value in place of t_Bonf.
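The strict-inequality remark in "Why Bonferroni Works" can be made concrete under an extra assumption not made in the lecture: if the k tests were independent (in the ANOVA they are not, since all pairwise tests share the pooled MSE), the familywise error rate at individual level alpha/k would be exactly 1 - (1 - alpha/k)^k, which is strictly below alpha:

```python
# Illustration of Bonferroni's conservativeness for k INDEPENDENT tests
# (an assumption for illustration only): the exact familywise error rate
# 1 - (1 - alpha/k)^k sits strictly below the nominal alpha.
alpha, k = 0.05, 6
fwe_independent = 1 - (1 - alpha / k) ** k
print(round(fwe_independent, 4))   # just under 0.05
assert fwe_independent < alpha
```

With k = 6 and alpha = 0.05 the gap is small (about 0.049 vs. 0.05), consistent with the note that it is hard to know how much below nominal the true rate falls.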
Thus the Tukey-Kramer interval has the form, for mu_i - mu_j:
    Ybar_i - Ybar_j +/- q*_TK * sqrt( MSE * (1/n_i + 1/n_j) ),
where q*_TK is specially chosen to give a familywise error rate of at most alpha. More precisely, the TK procedure has familywise error rate EXACTLY alpha when all the n_i are the same; otherwise it has error rate AT MOST alpha. This was conjectured by Tukey & Kramer in the '50s and proven in the '70s by Hayter.

JMP performs the TK procedure automatically. Use the command Compare means > All Pairs, Tukey HSD. Be sure the alpha level is set at alpha, and not at alpha*.

Example (cont.): The TK Method

For alpha = 0.05, we make sure the Alpha Level command is at 0.05 and then request the TK output. We get:

  Comparisons for all pairs using Tukey-Kramer HSD
    q*       Alpha
    2.63     0.05

NOTE that this q* = 2.63 is slightly less than the Bonferroni critical value t_Bonf = 2.718; hence the confidence intervals are slightly shorter.

    Level - Level   Difference   Lower CL   Upper CL
    G     - GL         9.417       4.985     13.849
    G     - B          8.645       3.424     13.866
    GI    - GL         5.206       0.679      9.733
    GI    - B          4.433      -0.868      9.734
    G     - GI         4.211       1.099      7.323
    B     - GL         0.772      -5.397      6.941

These CIs are slightly shorter, and hence more precise, than the Bonferroni ones. It turns out that we can still reject only the same 4 hypotheses as with Bonferroni.

Note: the difference between Bonferroni and TK becomes more pronounced as the number of groups grows larger.

Other Issues to Be Addressed in Lecture (optional additional material)

1. How and where to find CIs for the different factor means.
2. How and where to find prediction intervals for future observations on a given factor.
3. Where to find estimates for the parameters mu and alpha_i. (Hint: use Fit Model and the drop-down Expanded Estimates option.)
4. How to validate the model for homoscedasticity and normality.
5. Would it have been preferable to use log(Return) here, rather than Return?
6. Why isn't linearity a validation issue here, as it was in ordinary regression or multiple regression?
7. How do JMP and other standard statistical software packages use indicator variables to produce the least-squares analysis? See Chapter 7 for an introduction to indicator variables.
(We won't need to master this material, because JMP performs these operations automatically.)
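As a rough check of the Bonferroni-vs-Tukey-Kramer comparison above: both procedures' intervals multiply the same SE term, so their relative widths are just the ratio of the critical values (q* = 2.63 and t_Bonf = 2.718 from the JMP output):

```python
# Compare interval half-widths: Tukey-Kramer q* vs Bonferroni t for the
# fund example (4 groups, 68 error DF, alpha = 0.05). Both multiply the
# same s_e * sqrt(1/n_i + 1/n_j) term, so widths differ by their ratio.
q_tk, t_bonf = 2.63, 2.718   # critical values from the JMP output above
ratio = q_tk / t_bonf
print(f"TK intervals are about {100 * (1 - ratio):.1f}% shorter")
```

For this example the TK intervals come out only about 3% shorter; as noted above, the gap widens as the number of groups grows.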