This 12 page Class Notes was uploaded by Orval Funk on Monday September 28, 2015. The Class Notes belongs to STAT102 at University of Pennsylvania taught by Staff in Fall. Since its upload, it has received 12 views. For similar materials see /class/215434/stat102-university-of-pennsylvania in Statistics at University of Pennsylvania.
Date Created: 09/28/15
Statistics 102, Two-way Anova, Spring 2000

Two-Way Analysis of Variance

Administrative Items

Getting help
  See me Monday 3-5:30 or Wednesday from 4-5:30.
  Send an e-mail to stine@wharton.
  Visit the StatLab TAs, particularly for help using the computer.

Plan for this week
  Assignment 4

Review of One-Way Anova

Underlying model (see text, Defn 11.1)
Assume that the data have the form (I groups, with $n_i$ observations in the i-th group)

$$y_{ij} = \mu_i + \epsilon_{ij}, \qquad i = 1, \dots, I, \quad j = 1, \dots, n_i,$$

where
1. observations are independent,
2. observations have constant variance $\sigma^2$, and
3. observations are normally distributed.
The error terms are thus also independent from one observation to another and are normally distributed with mean zero and variance $\sigma^2$, written as $\epsilon_{ij} \sim N(0, \sigma^2)$.

Estimates of unknown parameters
Estimate the group means using the sample means, $\hat\mu_i = \bar y_i$, and estimate the error variance using the within sum of squares,

$$\hat\sigma^2 = \frac{\text{WithinSS}}{n - I} = \frac{\sum_{i,j} (y_{ij} - \bar y_i)^2}{n - I} = \frac{\sum_{i,j} e_{ij}^2}{n - I}.$$

Questions to answer in a one-way anova
(a) Is there a difference among the underlying means $\mu_i$?
(b) Is the group with the highest (lowest) average significantly better (worse)?
(c) Are there significant differences between some mean values?
(d) What should be done if the data are not normal?

Methods for answering these questions
(a) F-test from the Anova table
(b) Hsu's multiple comparisons
(c) Tukey-Kramer multiple comparisons (confirm these by hand)
(d) Outliers? Consider a nonparametric rank-sum comparison.

Example Application for One-Way Anova

Context (from last time)
Claims of better mileage for a brand of gasoline, based on the following experiment. A refiner rents 60 cars of the same model for a day and randomly divides the cars into 4 groups of 15 each. The fuel was drained from the cars and replaced by fresh gasoline of four types (B, E, S, X). The cars were then driven for a day along a standard course that combined interstate and city driving. At the end of the day, the mileage for each was measured and recorded.

Questions
(a) Is there evidence of a difference in mileage among the 4 types of gas?
(b) Is
the formulation obtaining the highest mileage significantly the best?
(c) Are there any other differences in mileage among the 4 brands?
(d) Are the data suited to the standard analysis, or should alternatives be sought that would better accommodate the features of this data?

JMP: How do I build an analysis of variance with one factor?
  Use Fit Y by X with a continuous response Y and a single categorical predictor X.
  The F-ratio in the anova table handles the overall null hypothesis.
  Multiple comparison methods are used for other comparisons.
    Graphically: comparison circles show which means are different.
    Tabular summaries: read the labels to interpret the output.

Answering the questions

Analysis preliminaries
See the notes from the previous class regarding important preliminary steps (e.g., flaws in the data collection process, sources of dependence).

(a) Evidence of a difference among the 4 types of gas?

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob>F
Model       3       244.53          81.51       8.5227   <.0001
Error      56       535.58           9.56
C Total    59       780.11          13.22

The F-ratio indicates that a significant difference exists, but it does not indicate where the difference is. For that, we must go on to multiple comparison methods. Note that the table also includes an estimate of $\sigma^2$, namely

$$\hat\sigma^2 = \frac{\text{WithinSS}}{n - I} = \text{Error Mean Square} = 9.56.$$

(b) High mileage group significantly better than others?

[Figure: Mileage by Refiner (B, E, S, X) with comparison circles; With Best, Hsu's MCB, 0.05]

Mean[i]-Mean[j]-LSD
         E       B       S       X
E     -2.38    0.27    2.15    2.83
B     -5.03   -2.38   -0.50    0.17
S     -6.91   -4.26   -2.38   -1.71
X     -7.59   -4.93   -3.05   -2.38
If a column has any positive values, the mean is significantly less than the max.

Yes. Both graphically and in the table, E is better by a significant margin.

(c) Other differences in mileage among the 4 brands?

[Figure: Mileage by Refiner (B, E, S, X) with comparison circles; With Best, Hsu's MCB, 0.05, and All Pairs, Tukey-Kramer, 0.05]

Abs(Dif)-LSD
         E       B       S       X
E     -2.99   -0.34    1.54    2.22
B     -0.34   -2.99   -1.11   -0.44
S      1.54   -1.11   -2.99   -2.32
X      2.22   -0.44   -2.32   -2.99
Positive values show pairs of means that are significantly
different.

It is useful to see how to calculate the values in the table (see text, Section 11.3). JMP shows the lower endpoints of confidence intervals for the absolute value of the difference in means. The confidence interval is computed as

$$(\text{difference in means}) \pm q_{\alpha,\ \text{number of means},\ \text{error df}} \sqrt{\frac{\hat\sigma^2}{\text{number per group}}},$$

with q from Table 8. For example, the interval for E - B is

$$(\bar y_E - \bar y_B) \pm q_{.05,\,4,\,56} \sqrt{\frac{9.56}{15}} = (24.45 - 21.80) \pm 3.74 \sqrt{\frac{9.56}{15}} = 2.65 \pm 3.74\,(.798) = 2.65 \pm 2.98.$$

Notice that the term after the final $\pm$ is the diagonal value in JMP's table (apart from its sign). The lower endpoint of this interval is -0.33, matching JMP's value of -0.34 up to my rounding. Thus only the differences E - S and E - X are significant in this comparison. This does indeed give a different answer than the previous Hsu's comparison, which showed a significant difference between E and B. WHY?

(d) Use the standard analysis or a nonparametric one?
Start by checking assumptions. There's no evident lack of constant variance, and the data appear close to normal. Were the data contaminated by outliers, consider a nonparametric procedure. Below is JMP's output for the Wilcoxon rank-sum test (also obtained with an option from the Fit Y by X output).

Level   Count   Score Sum   Score Mean   (Mean-Mean0)/Std0
B         15      492        32.8000          0.580
E         15      679.5      45.3000          3.782
S         15      363.5      24.2333         -1.596
X         15      295        19.6667         -2.766

1-Way Test, Chi-Square Approximation
ChiSquare   DF   Prob>ChiSq
 18.7395     3     0.0003

Since the data are close to normal, the two methods agree (compare the chi-square to the Anova table).

Introduction to Two-Way Anova

Two factors in the experiment
We often have more than one factor that is of interest. For example, consider an experiment in web-page design. A company wants to learn about the effect of one-click checkouts. It also has two types of page layouts. One layout is more graphical but loads more slowly, and some consider it complex. The other layout is simpler, with fewer graphics. Rather than run two experiments, the company would like to run one. This choice also has other advantages.

The
questions of interest
  Does the number of checkout clicks matter?
  Does the layout matter?
  And, as a bonus: is there an interaction between the two?
These can, of course, be followed up with multiple comparisons.

Interaction in two-way anova
Interaction occurs when certain combinations of factors have more or less effect than what you would expect based on their overall performance. Learning about interaction is often the most important benefit of a two-way (and fancier) analysis of variance.

Underlying model
Assume that the data have the form

$$y_{ijk} = (\text{cell mean})_{ij} + \epsilon_{ijk}, \qquad i = 1, \dots, I, \quad j = 1, \dots, J, \quad k = 1, \dots, n_{ij},$$

where we represent the cell mean as

  cell mean = overall mean + row effect + column effect + interaction.

Again, the other assumptions are
1. observations are independent,
2. observations have constant variance $\sigma^2$, and
3. observations are normally distributed,
so that the error terms are $\epsilon_{ijk} \sim N(0, \sigma^2)$.

Estimate error variance
Again, estimate the error variance using the error sum of squares,

$$\hat\sigma^2 = \frac{\text{ErrorSS}}{n - IJ} = \frac{\sum_{i,j,k} (y_{ijk} - \bar y_{ij})^2}{n - IJ} = \frac{\sum_{i,j,k} e_{ijk}^2}{n - IJ} = \text{Error MS}.$$

JMP for Two-Way Anova

Fit model command
Use the more complex fit model command. Watch with care the formation of interaction terms. Add a factor twice; then, with both it and the other term highlighted in the JMP dialog, use the cross button to build the interaction term.
Profile plots are constructed by the button near the graphic for the interaction term in the output of the fit model command.

Concatenate function
Use the concatenation of labels (formula) to look at individual cell means. The concatenate function is a character formula.

Example: Two-Way Analysis of Variance

Web page experiment
Customers in a focus group representing the target audience of the web site were randomized into six groups, with 10 in each group. Each group was shown a page with either a high- or low-graphics layout and with either one-, two-, or three-click checkout options. After using the page, customers rated the convenience and esthetics of the design on a 0-100 scale. The design of the
experiment thus has a two-way arrangement.

Layout   One-click   Two-click   Three-click
High        10          10           10
Low         10          10           10

Initial conceptual analysis

Initial graphical analysis
  no severe outliers
  similar variances
  trends in mean values are weak
  neither one-way analysis is significant

[Figure: Rating by Clicks (C1, C2, C3) and Rating by Layout (High, Low)]

Anova table (overall differences in means)
First check for interaction before considering the marginal effects of the row and column factors. In this example, we find a lot of interaction (F = 11.23, with p-value < .0001).

Effect Test
Source          Nparm   DF   Sum of Squares   F Ratio   Prob>F
Clicks            2      2       2582.1         3.70     0.0312
Layout            1      1        653.4         1.87     0.1768
Layout*Clicks     2      2       7837.3        11.23     <.0001

In the presence of so much interaction, interpretation of the marginal effects is potentially misleading. Although differences in the cell means are present, you should not attempt to judge them from the marginal levels of Clicks or Layout.

A plot of the cell means is very useful (a.k.a. a profile plot).

[Figure: Profile Plot, Rating LS Means by Clicks (C1, C2, C3), with one line for each Layout (High, Low)]

The crossing lines in this figure (the points are the cell means) describe the interaction indicated in the anova table. The layout with low graphics is preferred when featured with one-click checkout, whereas users are ambivalent or prefer high graphics with the more elaborate checkouts.

Multiple comparisons: which means are significantly different?
Using JMP, you can simply reduce the problem to a one-way problem and use those tools. Join the labels of Clicks and Layout into a new column using the concatenation function, then use this one column of categories in a one-way anova.

Here's a table of the cell means and SDs. These are the mean values plotted in the profile plot shown above.

Means and Std Deviations
Level      Number   Mean   Std Dev   Std Err Mean
C1 High      10     42.2    14.89       4.71
C1 Low       10     79.6    17.67       5.59
C2 High      10     53.7    24.51       7.75
C2 Low       10     53.4    18.20       5.75
C3 High      10     53.5    14.95       4.73
C3 Low       10     36.2    20.08       6.35

Shown graphically with comparison circles: the one-click option with the low-graphics layout is significantly different from the others using Tukey-Kramer.

[Figure: Rating by Cell (C1 High, C1 Low, C2 High, C2 Low, C3 High, C3 Low) with comparison circles; All Pairs, Tukey-Kramer, 0.05]

Here are details for the Tukey-Kramer figure. What other differences are significant?

Comparisons for all pairs using Tukey-Kramer HSD
q* = 2.95448
Abs(Dif)-LSD   C1 Low   C2 High   C3 High   C2 Low   C1 High   C3 Low
C1 Low         -24.68     1.22      1.42      1.52    12.72     18.72
C2 High          1.22   -24.68    -24.48    -24.38   -13.18     -7.18
C3 High          1.42   -24.48    -24.68    -24.58   -13.38     -7.38
C2 Low           1.52   -24.38    -24.58    -24.68   -13.48     -7.48
C1 High         12.72   -13.18    -13.38    -13.48   -24.68    -18.68
C3 Low          18.72    -7.18     -7.38     -7.48   -18.68    -24.68
Positive values show pairs of means that are significantly different.

Conclusions
  Significant interaction, so that marginal effects cannot be interpreted directly.
  Tukey-Kramer intervals find significant differences among cell means.
  Business implications
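
As a rough check on the Effect Test table for the web page experiment, the sums of squares of a balanced two-way layout can be rebuilt from the cell means, and the error mean square from the cell standard deviations. The sketch below is not JMP's procedure. It assumes the balanced design (10 ratings per cell) and uses the rounded cell summaries printed in these notes, so it agrees with the JMP output only up to rounding. The function name and dictionary keys are made up for illustration.

```python
from statistics import mean

# Cell summaries copied from the Means and Std Deviations table (10 per cell).
N_PER_CELL = 10
CELL_MEAN = {
    ("C1", "High"): 42.2, ("C1", "Low"): 79.6,
    ("C2", "High"): 53.7, ("C2", "Low"): 53.4,
    ("C3", "High"): 53.5, ("C3", "Low"): 36.2,
}
CELL_SD = {
    ("C1", "High"): 14.89, ("C1", "Low"): 17.67,
    ("C2", "High"): 24.51, ("C2", "Low"): 18.20,
    ("C3", "High"): 14.95, ("C3", "Low"): 20.08,
}


def two_way_from_cells(cell_mean, cell_sd, n):
    """Balanced two-way anova pieces from cell means and SDs (hypothetical helper)."""
    clicks = sorted({c for c, _ in cell_mean})
    layouts = sorted({l for _, l in cell_mean})
    grand = mean(cell_mean.values())

    # Marginal means: average the cell means across the other factor.
    click_mean = {c: mean(cell_mean[c, l] for l in layouts) for c in clicks}
    layout_mean = {l: mean(cell_mean[c, l] for c in clicks) for l in layouts}

    ss_clicks = n * len(layouts) * sum((m - grand) ** 2 for m in click_mean.values())
    ss_layout = n * len(clicks) * sum((m - grand) ** 2 for m in layout_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_clicks - ss_layout  # what the main effects leave over

    # Error MS pools the within-cell variances; error df is n - IJ.
    df_error = len(cell_mean) * (n - 1)
    error_ms = sum((n - 1) * s ** 2 for s in cell_sd.values()) / df_error

    df_clicks, df_layout = len(clicks) - 1, len(layouts) - 1
    df_inter = df_clicks * df_layout
    return {
        "ss_clicks": ss_clicks, "ss_layout": ss_layout, "ss_inter": ss_inter,
        "error_ms": error_ms, "df_error": df_error,
        "f_clicks": ss_clicks / df_clicks / error_ms,
        "f_layout": ss_layout / df_layout / error_ms,
        "f_inter": ss_inter / df_inter / error_ms,
    }


result = two_way_from_cells(CELL_MEAN, CELL_SD, N_PER_CELL)
for name, value in result.items():
    print(f"{name:10s} {value:10.2f}")
```

Comparing the printed sums of squares and F ratios with the Effect Test table is a quick way to see that the interaction term is simply the variation among cell means that the two marginal effects leave unexplained.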
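
The Abs(Dif)-LSD entries in the Tukey-Kramer table for the six cells can also be confirmed by hand. This sketch assumes equal cell sizes of 10 and takes JMP's reported multiplier 2.95448 as given. The error mean square of about 348.9 is not printed in these notes; it is backed out here from the Effect Test table (sum of squares divided by DF and by the F ratio), so treat it as an assumption. Variable and function names are illustrative only.

```python
from itertools import combinations
from math import sqrt

# Cell means from the notes, in the order JMP lists them.
CELL_MEAN = {
    "C1 Low": 79.6, "C2 High": 53.7, "C3 High": 53.5,
    "C2 Low": 53.4, "C1 High": 42.2, "C3 Low": 36.2,
}
N_PER_CELL = 10
Q_STAR = 2.95448   # multiplier reported by JMP
ERROR_MS = 348.9   # assumed: backed out as 7837.3 / 2 / 11.23

# With equal cell sizes, every pair shares one least significant difference.
lsd = Q_STAR * sqrt(ERROR_MS * (1 / N_PER_CELL + 1 / N_PER_CELL))


def abs_dif_minus_lsd(a, b):
    """Entry of the Abs(Dif)-LSD table; positive means a significant difference."""
    return abs(CELL_MEAN[a] - CELL_MEAN[b]) - lsd


significant = [
    (a, b) for a, b in combinations(CELL_MEAN, 2) if abs_dif_minus_lsd(a, b) > 0
]

print(f"LSD = {lsd:.2f}")
for a, b in significant:
    print(f"{a} vs {b}: {abs_dif_minus_lsd(a, b):.2f}")
```

Every pair that the sketch flags involves C1 Low, which agrees with the comparison circles: the one-click, low-graphics cell stands apart, and no other pair of cells differs significantly.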
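
Returning to the one-way gasoline example, the arithmetic behind the anova table and the hand-computed Tukey-Kramer interval for E - B can be retraced in a few lines. The sketch uses only numbers quoted in these notes: the sums of squares and degrees of freedom, the two sample means 24.45 and 21.80, and q = 3.74 as read from the text's Table 8. The variable names are my own, and small differences from JMP's output come from rounding.

```python
from math import sqrt

# Anova table entries for the gasoline experiment (4 groups of 15 cars).
ss_model, df_model = 244.53, 3
ss_error, df_error = 535.58, 56
n_per_group = 15

ms_model = ss_model / df_model
ms_error = ss_error / df_error      # estimate of sigma^2
f_ratio = ms_model / ms_error

# Tukey-Kramer interval for E - B with the tabled studentized range value.
q_table = 3.74                      # as quoted in the notes from Table 8
mean_e, mean_b = 24.45, 21.80
difference = mean_e - mean_b
half_width = q_table * sqrt(ms_error / n_per_group)
lower, upper = difference - half_width, difference + half_width

print(f"Error MS = {ms_error:.2f}, F = {f_ratio:.4f}")
print(f"E - B: {difference:.2f} +/- {half_width:.2f} -> ({lower:.2f}, {upper:.2f})")
```

Because the interval for E - B includes zero, Tukey-Kramer does not separate E from B, whereas Hsu's comparison with the best did. That contrast is the one the question WHY? in the notes is asking about.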