Introductory Design & Analysis
This 11-page set of class notes was uploaded by Lilly Rutherford on Saturday, September 12, 2015. The notes belong to STAT 313 at West Virginia University, taught by Gerald Hobbs in Fall.
Date Created: 09/12/15
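Chapter 6 Supplemental Text Material

S6.1 Factor Effect Estimates are Least Squares Estimates

We have given heuristic or intuitive explanations of how the estimates of the factor effects are obtained in the textbook. Also, it has been pointed out that in the regression model representation of the $2^k$ factorial, the regression coefficients are exactly one-half the effect estimates. It is straightforward to show that the model coefficients (and hence the effect estimates) are least squares estimates. Consider a $2^2$ factorial. The regression model is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} + \varepsilon_i$$

The data for the $2^2$ experiment are shown in the following table:

Run i   x_i1   x_i2   x_i1*x_i2   Response
1        -1     -1       +1        (1)
2        +1     -1       -1        a
3        -1     +1       -1        b
4        +1     +1       +1        ab

The least squares estimates of the model parameters are chosen to minimize the sum of the squares of the model errors:

$$L = \sum_{i=1}^{4}\left(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_{12} x_{i1} x_{i2}\right)^2$$

It is straightforward to show that the least squares normal equations are

$$
\begin{aligned}
4\hat\beta_0 + \hat\beta_1\sum_{i=1}^{4} x_{i1} + \hat\beta_2\sum_{i=1}^{4} x_{i2} + \hat\beta_{12}\sum_{i=1}^{4} x_{i1}x_{i2} &= (1) + a + b + ab\\
\hat\beta_0\sum_{i=1}^{4} x_{i1} + \hat\beta_1\sum_{i=1}^{4} x_{i1}^2 + \hat\beta_2\sum_{i=1}^{4} x_{i1}x_{i2} + \hat\beta_{12}\sum_{i=1}^{4} x_{i1}^2 x_{i2} &= -(1) + a - b + ab\\
\hat\beta_0\sum_{i=1}^{4} x_{i2} + \hat\beta_1\sum_{i=1}^{4} x_{i1}x_{i2} + \hat\beta_2\sum_{i=1}^{4} x_{i2}^2 + \hat\beta_{12}\sum_{i=1}^{4} x_{i1}x_{i2}^2 &= -(1) - a + b + ab\\
\hat\beta_0\sum_{i=1}^{4} x_{i1}x_{i2} + \hat\beta_1\sum_{i=1}^{4} x_{i1}^2 x_{i2} + \hat\beta_2\sum_{i=1}^{4} x_{i1}x_{i2}^2 + \hat\beta_{12}\sum_{i=1}^{4} x_{i1}^2 x_{i2}^2 &= (1) - a - b + ab
\end{aligned}
$$

Now, because the design is orthogonal,

$$\sum_{i=1}^{4} x_{i1} = \sum_{i=1}^{4} x_{i2} = \sum_{i=1}^{4} x_{i1}x_{i2} = \sum_{i=1}^{4} x_{i1}^2 x_{i2} = \sum_{i=1}^{4} x_{i1}x_{i2}^2 = 0$$

and, since every $x_{ij} = \pm 1$, $\sum x_{i1}^2 = \sum x_{i2}^2 = \sum x_{i1}^2 x_{i2}^2 = 4$. The normal equations therefore reduce to a very simple form:

$$
\begin{aligned}
4\hat\beta_0 &= (1) + a + b + ab\\
4\hat\beta_1 &= -(1) + a - b + ab\\
4\hat\beta_2 &= -(1) - a + b + ab\\
4\hat\beta_{12} &= (1) - a - b + ab
\end{aligned}
$$

The solution is

$$\hat\beta_0 = \frac{(1) + a + b + ab}{4},\quad \hat\beta_1 = \frac{-(1) + a - b + ab}{4},\quad \hat\beta_2 = \frac{-(1) - a + b + ab}{4},\quad \hat\beta_{12} = \frac{(1) - a - b + ab}{4}$$

These regression model coefficients are exactly one-half the factor effect estimates (with a single replicate, for example, $A = [-(1) + a - b + ab]/2$). Therefore, the effect estimates are least squares estimates. We will show this in a more general manner in Chapter 10.

S6.2 Yates's Method for Calculating Effect Estimates

While we typically use a computer program for the statistical analysis of a $2^k$ design, there is a very simple technique devised by Yates (1937) for estimating the effects and determining the sums of squares in a $2^k$ factorial design. The procedure is occasionally useful for manual calculations, and is best learned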
through the study of a numerical example. Consider the data for the $2^3$ design in Example 6.1. These data have been entered in Table 1 below. The treatment combinations are always written down in standard order, and the column labeled "Response" contains the corresponding observation (or total of all observations) at that treatment combination. The first half of column (1) is obtained by adding the responses in adjacent pairs. The second half of column (1) is obtained by changing the sign of the first entry in each of the pairs in the Response column and adding the adjacent pairs. For example, in column (1) we obtain for the fifth entry $5 = -(-4) + 1$, for the sixth entry $6 = -(-1) + 5$, and so on. Column (2) is obtained from column (1) just as column (1) is obtained from the Response column. Column (3) is obtained from column (2) similarly. In general, for a $2^k$ design we would construct $k$ columns of this type. Column (3) [in general, column (k)] is the contrast for the effect designated at the beginning of the row. To obtain the estimate of the effect, we divide the entries in column (3) by $n2^{k-1}$ (in our example, $n2^{k-1} = (2)2^2 = 8$). Finally, the sums of squares for the effects are obtained by squaring the entries in column (3) and dividing by $n2^k$ (in our example, $n2^k = (2)2^3 = 16$).

Table 1. Yates's Algorithm for the Data in Example 6.1

Treatment                                        Estimate of Effect   Sum of Squares
Combination  Response   (1)   (2)   (3)  Effect  (3)/n2^(k-1)         (3)^2/n2^k
(1)              -4     -3      1    16  I
a                 1      4     15    24  A       3.00                 36.00
b                -1      2     11    18  B       2.25                 20.25
ab                5     13     13     6  AB      0.75                  2.25
c                -1      5      7    14  C       1.75                 12.25
ac                3      6     11     2  AC      0.25                  0.25
bc                2      4      1     4  BC      0.50                  1.00
abc              11      9      5     4  ABC     0.50                  1.00

The estimates of the effects and sums of squares obtained by Yates's algorithm for the data in Example 6.1 are in agreement with the results found there by the usual methods. Note that the entry in column (3) [in general, column (k)] for the row corresponding to (1) is always equal to the grand total of the observations. In spite of its apparent simplicity, it is notoriously easy to make numerical errors in Yates's algorithm, and we should be extremely careful in
executing the procedure. As a partial check on the computations, we may use the fact that the sum of the squares of the elements in the $j$th column is $2^j$ times the sum of the squares of the elements in the Response column. Note, however, that this check cannot detect errors in sign in column $j$, because squaring removes the sign. See Davies (1956), Good (1955, 1958), Kempthorne (1952), and Rayner (1967) for other error-checking techniques.

S6.3 A Note on the Variance of a Contrast

In analyzing $2^k$ factorial designs, we frequently construct a normal probability plot of the factor effect estimates and visually select a tentative model by identifying the effects that appear large. These effect estimates typically lie relatively far from the straight line passing through the remaining plotted effects. This method works nicely when (1) there are not many significant effects, and (2) all effect estimates have the same variance. It turns out that all contrasts computed from a $2^k$ design (and hence all effect estimates) have the same variance even if the individual observations have different variances. This statement can be easily demonstrated. Suppose that we have conducted a $2^k$ design and have independent responses $y_1, y_2, \ldots, y_{2^k}$, and let the variances of the observations be $\sigma_1^2, \sigma_2^2, \ldots, \sigma_{2^k}^2$, respectively. Now each effect estimate is a linear combination of the observations, say

$$\text{Effect} = \frac{1}{2^{k-1}}\sum_{i=1}^{2^k} c_i y_i$$

where the contrast constants $c_i$ are all either $-1$ or $+1$. Therefore, the variance of an effect estimate is

$$V(\text{Effect}) = \frac{1}{(2^{k-1})^2}\sum_{i=1}^{2^k} c_i^2 V(y_i) = \frac{1}{(2^{k-1})^2}\sum_{i=1}^{2^k} \sigma_i^2$$

because $c_i^2 = 1$. Therefore, all contrasts have the same variance. If each observation $y_i$ in the above equations is the total of $n$ replicates at each design point, the result still holds.

S6.4 The Variance of the Predicted Response

Suppose that we have conducted an experiment using a $2^k$ factorial design. We have fit a regression model to the resulting data and are going to use the model to predict the response at locations of interest inside the design space $-1 \le x_i \le +1$, $i = 1, 2, \ldots, k$. What is the variance of the predicted response at the point of
interest, say $\mathbf{x}' = [x_1, x_2, \ldots, x_k]$? Problem 6.32 asks the reader to answer this question, and while the answer is given in the Instructor's Resource CD, we also give the answer here because it is useful information. Assume that the design is balanced and every treatment combination is replicated $n$ times. Since the design is orthogonal, it is easy to find the variance of the predicted response. We consider the case where the experimenters have fit a main effects only model, say

$$\hat y(\mathbf{x}) = \hat\beta_0 + \sum_{i=1}^{k}\hat\beta_i x_i$$

Now recall that the variance of a model regression coefficient is

$$V(\hat\beta) = \frac{\sigma^2}{n2^k} = \frac{\sigma^2}{N}$$

where $N$ is the total number of runs in the design. The variance of the predicted response is

$$
\begin{aligned}
V[\hat y(\mathbf{x})] &= V\left(\hat\beta_0 + \sum_{i=1}^{k}\hat\beta_i x_i\right)\\
&= V(\hat\beta_0) + \sum_{i=1}^{k} x_i^2 V(\hat\beta_i)\\
&= \frac{\sigma^2}{N} + \frac{\sigma^2}{N}\sum_{i=1}^{k} x_i^2\\
&= \frac{\sigma^2}{N}\left(1 + \sum_{i=1}^{k} x_i^2\right)
\end{aligned}
$$

In the above development, we have used the fact that the design is orthogonal, so there are no nonzero covariance terms when the variance operator is applied.

The Design-Expert software program plots contours of the standard deviation of the predicted response, that is, the square root of the above expression. If the design has already been conducted and analyzed, the program replaces $\sigma^2$ with the error mean square, so that the plotted quantity becomes

$$\sqrt{\hat V[\hat y(\mathbf{x})]} = \sqrt{\frac{MS_E}{N}\left(1 + \sum_{i=1}^{k} x_i^2\right)}$$

If the design has been constructed but the experiment has not been performed, then the software plots (on the design evaluation menu) the quantity

$$\sqrt{\frac{V[\hat y(\mathbf{x})]}{\sigma^2}} = \sqrt{\frac{1}{N}\left(1 + \sum_{i=1}^{k} x_i^2\right)}$$

which can be thought of as a standardized standard deviation of prediction. To illustrate, consider a $2^2$ with $n = 3$ replicates, the first example in Section 6.2. The plot of the standardized standard deviation of the predicted response is shown below.

[Figure: Design-Expert plot, "StdErr of Design" contours over X = A, Y = B, design points marked ($2^2$ design, $n = 3$)]

The contours of constant standardized standard deviation of predicted response should be exactly circular, and they should be a maximum within the design region at the points $x_1 = \pm 1$ and $x_2 = \pm 1$. The maximum value is

$$\sqrt{\frac{V[\hat y(x_1 = \pm 1, x_2 = \pm 1)]}{\sigma^2}} = \sqrt{\frac{1}{12}\left(1 + (\pm 1)^2 + (\pm 1)^2\right)} = \sqrt{\frac{3}{12}} = 0.5$$

This is also shown on the graph at the corners of the square. Plots
of the standardized standard deviation of the predicted response can be useful in comparing designs. For example, suppose the experimenter in the above situation is considering adding a fourth replicate to the design. The maximum standardized prediction standard deviation in the region now becomes

$$\sqrt{\frac{V[\hat y(x_1 = \pm 1, x_2 = \pm 1)]}{\sigma^2}} = \sqrt{\frac{1}{16}\left(1 + (\pm 1)^2 + (\pm 1)^2\right)} = \sqrt{\frac{3}{16}} = 0.433$$

The plot of the standardized prediction standard deviation is shown below.

[Figure: Design-Expert plot, "StdErr of Design" contours over X = A, Y = B, design points marked ($2^2$ design, $n = 4$)]

Notice that adding another replicate has reduced the maximum prediction variance from $(0.5)^2 = 0.25$ to $(0.433)^2 \approx 0.1875$. Comparing the two plots reveals that the standardized prediction standard deviation is uniformly lower throughout the design region when an additional replicate is run.

Sometimes we like to compare designs in terms of scaled prediction variance, defined as

$$\frac{N\,V[\hat y(\mathbf{x})]}{\sigma^2}$$

This allows us to evaluate designs that have different numbers of runs. Since adding replicates (or runs) to a design generally makes the prediction variance smaller, the scaled prediction variance allows us to examine the prediction variance on a per-observation basis. Note that for a $2^k$ factorial and the main effects only model we have been considering, the scaled prediction variance is

$$\frac{N\,V[\hat y(\mathbf{x})]}{\sigma^2} = 1 + \sum_{i=1}^{k} x_i^2 = 1 + \rho^2$$

where $\rho$ is the distance of the design point where prediction is required from the center of the design space ($\mathbf{x} = \mathbf{0}$). Notice that the $2^k$ design achieves this scaled prediction variance regardless of the number of replicates. The maximum value that the scaled prediction variance can have over the design region is

$$\max_{\mathbf{x}} \frac{N\,V[\hat y(\mathbf{x})]}{\sigma^2} = 1 + k$$

It can be shown that no other design over this region can achieve a smaller maximum scaled prediction variance, so the $2^k$ design is in some sense an optimal design. We will discuss optimal designs more in Chapter 11.

S6.5 Using Residuals to Identify Dispersion Effects

We illustrated in Example 6.4 (Section 6.5 on unreplicated designs) that plotting the residuals from the regression model versus each of
the design factors was a useful way to check for the possibility of dispersion effects. These are factors that influence the variability of the response but have little effect on the mean. A method for computing a measure of the dispersion effect for each design factor and interaction that can be evaluated on a normal probability plot was also given. However, we noted that these residual analyses are fairly sensitive to correct specification of the location model. That is, if we leave important factors out of the regression model that describes the mean response, then the residual plots may be unreliable.

To illustrate, reconsider Example 6.4, and suppose that we leave out one of the important factors, C = Resin flow. If we use this incorrect model, then the plots of the residuals versus the design factors look rather different than they did with the original, correct model. In particular, the plot of residuals versus factor D = Closing time is shown below.

[Figure: Design-Expert plot, Studentized Residuals vs. Closing time (factor D)]

This plot indicates that factor D has a potential dispersion effect. However, the normal probability plot of the dispersion statistic $F_i^*$ in Figure 6.28 clearly reveals that factor B is the only factor that has an effect on dispersion. Therefore, if you are going to use model residuals to search for dispersion effects, it is really important to select the right model for the location effects.

S6.6 Center Points versus Replication of Factorial Points

In some design problems, an experimenter may have a choice of replicating the corner or "cube" points in a $2^k$ factorial, or placing replicate runs at the design center. For example, suppose our choice is between a $2^2$ with $n = 2$ replicates at each corner of the square, or a single replicate of the $2^2$ with $n_C = 4$ center points. We can compare these designs in terms of prediction variance. Suppose that we plan to fit the first-order or main effects only model

$$\hat y(\mathbf{x}) = \hat\beta_0 + \sum_{i=1}^{2}\hat\beta_i x_i$$

If we use the replicated design,
the scaled prediction variance is (see Section S6.4 above)

$$\frac{N\,V[\hat y(\mathbf{x})]}{\sigma^2} = 1 + \sum_{i=1}^{2} x_i^2 = 1 + \rho^2$$

Now consider the prediction variance when the design with center points is used. This design has $N = 8$ runs, so $V(\hat\beta_0) = \sigma^2/8$, while only the four factorial runs contribute to $\sum x_i^2 = 4$, so $V(\hat\beta_i) = \sigma^2/4$. We have

$$
\begin{aligned}
V[\hat y(\mathbf{x})] &= V\left(\hat\beta_0 + \sum_{i=1}^{2}\hat\beta_i x_i\right)\\
&= V(\hat\beta_0) + \sum_{i=1}^{2} x_i^2 V(\hat\beta_i)\\
&= \frac{\sigma^2}{8} + \frac{\sigma^2}{4}\sum_{i=1}^{2} x_i^2\\
&= \frac{\sigma^2}{8}\left(1 + 2\sum_{i=1}^{2} x_i^2\right)\\
&= \frac{\sigma^2}{8}\left(1 + 2\rho^2\right)
\end{aligned}
$$

Therefore, the scaled prediction variance for the design with center points is

$$\frac{N\,V[\hat y(\mathbf{x})]}{\sigma^2} = 1 + 2\rho^2$$

Clearly, replicating the corners in this example outperforms the strategy of replicating center points, at least in terms of scaled prediction variance. At the corners of the square, the scaled prediction variance for the replicated factorial is

$$\frac{N\,V[\hat y(\mathbf{x})]}{\sigma^2} = 1 + \rho^2 = 1 + 2 = 3$$

while for the factorial design with center points it is

$$\frac{N\,V[\hat y(\mathbf{x})]}{\sigma^2} = 1 + 2\rho^2 = 1 + 2(2) = 5$$

However, prediction variance might not tell the complete story. If we only replicate the corners of the square, we have no way to judge the lack of fit of the model. If the design has center points, we can check for the presence of pure quadratic (second-order) terms, so the design with center points is likely to be preferred if the experimenter is at all uncertain about the order of the model he or she should be using.

S6.7 Testing for Pure Quadratic Curvature using a t-Test

In Section 6.6 of the textbook we discuss the addition of center points to a $2^k$ factorial design. This is a very useful idea, as it allows an estimate of "pure error" to be obtained even though the factorial design points are not replicated, and it permits the experimenter to obtain an assessment of model adequacy with respect to certain second-order terms. Specifically, we present an F-test for the hypotheses

$$
\begin{aligned}
H_0&: \beta_{11} + \beta_{22} + \cdots + \beta_{kk} = 0\\
H_1&: \beta_{11} + \beta_{22} + \cdots + \beta_{kk} \neq 0
\end{aligned}
$$

An equivalent t-statistic can also be employed to test these hypotheses. Some computer software programs report the t-test instead of (or in addition to) the F-test. It is not difficult to develop the t-test and to show that it is equivalent to the F-test. Suppose that the appropriate model for the response is a complete quadratic polynomial and that the experimenter has
conducted an unreplicated full $2^k$ factorial design with $n_F$ design points plus $n_C$ center points. Let $\bar y_F$ and $\bar y_C$ represent the averages of the responses at the factorial and center points, respectively. Also let $\hat\sigma^2$ be the estimate of the variance obtained using the center points. It is easy to show that

$$E(\bar y_F) = \frac{1}{n_F}\left(n_F\beta_0 + n_F\beta_{11} + n_F\beta_{22} + \cdots + n_F\beta_{kk}\right) = \beta_0 + \beta_{11} + \beta_{22} + \cdots + \beta_{kk}$$

and

$$E(\bar y_C) = \frac{1}{n_C}\left(n_C\beta_0\right) = \beta_0$$

Therefore,

$$E(\bar y_F - \bar y_C) = \beta_{11} + \beta_{22} + \cdots + \beta_{kk}$$

and so we see that the difference in averages $\bar y_F - \bar y_C$ is an unbiased estimator of the sum of the pure quadratic model parameters. Now the variance of $\bar y_F - \bar y_C$ is

$$V(\bar y_F - \bar y_C) = \sigma^2\left(\frac{1}{n_F} + \frac{1}{n_C}\right)$$

Consequently, a test of the above hypotheses can be conducted using the statistic

$$t_0 = \frac{\bar y_F - \bar y_C}{\sqrt{\hat\sigma^2\left(\dfrac{1}{n_F} + \dfrac{1}{n_C}\right)}}$$

which, under the null hypothesis, follows a t distribution with $n_C - 1$ degrees of freedom. We would reject the null hypothesis (that is, no pure quadratic curvature) if $|t_0| > t_{\alpha/2,\,n_C - 1}$.

This t-test is equivalent to the F-test given in the book. To see this, square the t-statistic above:

$$t_0^2 = \frac{(\bar y_F - \bar y_C)^2}{\hat\sigma^2\left(\dfrac{1}{n_F} + \dfrac{1}{n_C}\right)} = \frac{n_F n_C(\bar y_F - \bar y_C)^2/(n_F + n_C)}{\hat\sigma^2}$$

This ratio is computationally identical to the F-test presented in the textbook. Furthermore, we know that the square of a t random variable with (say) $v$ degrees of freedom is an F random variable with 1 numerator and $v$ denominator degrees of freedom, so the t-test for "pure quadratic" effects is indeed equivalent to the F-test.

Supplemental References

Good, I. J. (1955). "The Interaction Algorithm and Practical Fourier Analysis." Journal of the Royal Statistical Society, Series B, Vol. 20, pp. 361-372.
Good, I. J. (1958). "Addendum to 'The Interaction Algorithm and Practical Fourier Analysis.'" Journal of the Royal Statistical Society, Series B, Vol. 22, pp. 372-375.
Rayner, A. A. (1967). "The Square Summing Check on the Main Effects and Interactions in a 2^n Experiment as Calculated by Yates' Algorithm." Biometrics, Vol. 23, pp. 571-573.
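The least squares argument of Section S6.1 can be checked numerically. The following is a minimal Python sketch, not the textbook's own method: it forms the normal equations $(\mathbf{X}'\mathbf{X})\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{y}$ for the $2^2$ regression model and solves them by Gauss-Jordan elimination in exact fractions. The response values are hypothetical, chosen only for illustration, and the helper name `solve_exact` is likewise an illustrative assumption.

```python
from fractions import Fraction


def solve_exact(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap a row with a nonzero pivot into position, then normalize it.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


# 2^2 design in standard order: (1), a, b, ab.
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
y = [20, 30, 24, 38]  # hypothetical responses (1), a, b, ab; single replicate

# Model matrix columns: intercept, x1, x2, x1*x2.
X = [[1, x1, x2, x1 * x2] for x1, x2 in runs]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(4)] for i in range(4)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(4)]
beta = solve_exact(XtX, Xty)

# Effect estimates for one replicate: contrast / 2.
one, a, b, ab = y
effects = {
    "A": Fraction(-one + a - b + ab, 2),
    "B": Fraction(-one - a + b + ab, 2),
    "AB": Fraction(one - a - b + ab, 2),
}

print([float(v) for v in beta])  # [28.0, 6.0, 3.0, 1.0]
print({k: float(v) for k, v in effects.items()})  # {'A': 12.0, 'B': 6.0, 'AB': 2.0}
```

Each coefficient after the intercept comes out as exactly half the corresponding effect (6 = 12/2, 3 = 6/2, 1 = 2/2). Exact fractions are used only to avoid floating-point noise in the comparison; for larger problems a library least squares routine would be the usual choice.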
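Yates's procedure from Section S6.2 is mechanical enough to sketch in a few lines of Python. The sketch assumes the responses are supplied in standard order and that their number is a power of 2; the function names `yates` and `effect_labels` are illustrative, not from any library. It uses the Example 6.1 totals from Table 1 ($n = 2$) and applies the $2^j$ sum-of-squares check described in that section.

```python
def yates(responses, n):
    """Yates's algorithm for a 2^k factorial.

    responses: observations, or totals over n replicates, in standard order.
    Returns (columns, estimates, sums_of_squares); columns[0] is the Response
    column and columns[k] holds the contrasts.
    """
    size = len(responses)
    k = size.bit_length() - 1
    if size < 2 or size != 2 ** k:
        raise ValueError("need 2^k responses with k >= 1")
    columns = [list(responses)]
    for _ in range(k):
        prev = columns[-1]
        pairs = list(zip(prev[0::2], prev[1::2]))
        # First half: sums of adjacent pairs. Second half: second minus first.
        columns.append([u + v for u, v in pairs] + [v - u for u, v in pairs])
    contrasts = columns[-1]
    estimates = [c / (n * 2 ** (k - 1)) for c in contrasts]
    sums_of_squares = [c ** 2 / (n * 2 ** k) for c in contrasts]
    return columns, estimates, sums_of_squares


def effect_labels(k):
    """Effect names in standard order: I, A, B, AB, C, AC, BC, ABC, ..."""
    letters = "ABCDEFGHIJ"
    return ["".join(letters[j] for j in range(k) if (i >> j) & 1) or "I"
            for i in range(2 ** k)]


# Example 6.1 totals in standard order (n = 2 replicates), as in Table 1.
response = [-4, 1, -1, 5, -1, 3, 2, 11]
columns, estimates, ss = yates(response, n=2)

# Skip the I row: its contrast is the grand total, not an effect.
for name, contrast, est, s in list(zip(effect_labels(3), columns[-1], estimates, ss))[1:]:
    print(f"{name:<4}{contrast:>4}{est:>7.2f}{s:>8.2f}")

# Partial check: sum of squares of column j equals 2**j times that of the Response column.
base = sum(v * v for v in response)
assert all(sum(v * v for v in col) == 2 ** j * base for j, col in enumerate(columns))
```

Because the final check squares every entry, it would still pass if a sign in some column were flipped, which is the limitation of this check noted in the text.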
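The argument in Section S6.3 uses only $c_i^2 = 1$ and independence of the observations. A minimal Python sketch, assuming made-up unequal run variances, builds every $\pm 1$ contrast column of a $2^3$ design and evaluates $V(\text{Effect}) = \sum c_i^2\sigma_i^2/(2^{k-1})^2$ for each one; the helper names `contrast_columns` and `effect_variance` are hypothetical.

```python
from fractions import Fraction


def contrast_columns(k):
    """Plus/minus one contrast column for every effect of a 2^k design (runs in standard order)."""
    letters = "ABCDEFGHIJ"
    columns = {}
    for mask in range(1, 2 ** k):
        name = "".join(letters[j] for j in range(k) if (mask >> j) & 1)
        column = []
        for run in range(2 ** k):
            sign = 1
            for j in range(k):
                if (mask >> j) & 1:
                    # Factor j is at its high level when bit j of the run index is set.
                    sign *= 1 if (run >> j) & 1 else -1
            column.append(sign)
        columns[name] = column
    return columns


def effect_variance(column, variances, k):
    """V(Effect) = sum(c_i^2 * sigma_i^2) / (2^(k-1))^2, assuming independent observations."""
    return sum(c * c * Fraction(v) for c, v in zip(column, variances)) / (2 ** (k - 1)) ** 2


# Hypothetical, deliberately unequal variances for the 8 runs of a 2^3 design.
sigma2 = [1, 4, 2, 9, 3, 5, 7, 1]
variances = {name: effect_variance(col, sigma2, 3) for name, col in contrast_columns(3).items()}
print({name: str(v) for name, v in variances.items()})  # each effect variance is 32 / 16 = 2
```

Changing the entries of `sigma2` changes the common value but never makes the seven variances differ, because each $c_i^2$ equals 1 regardless of the effect.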
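The closed-form results in Sections S6.4 and S6.6 can be cross-checked with the standard least squares expression $V[\hat y(\mathbf{x})]/\sigma^2 = \mathbf{f}(\mathbf{x})'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x})$, where $\mathbf{f}(\mathbf{x}) = [1, x_1, \ldots, x_k]'$ for the main effects only model. This matrix form is general regression theory rather than something derived in these notes, and the Python helpers `invert` and `prediction_variance` are illustrative assumptions. The sketch evaluates the corner point $(1, 1)$ for the designs discussed above.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def prediction_variance(design, x):
    """V[y_hat(x)] / sigma^2 = f(x)' (X'X)^{-1} f(x) for the main-effects-only model."""
    X = [[1, *point] for point in design]  # intercept plus one column per factor
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    inv = invert(xtx)
    f = [1, *x]
    return sum(f[i] * inv[i][j] * f[j] for i in range(p) for j in range(p))


corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
designs = {
    "2^2, n = 3": corners * 3,
    "2^2, n = 4": corners * 4,
    "2^2, n = 2": corners * 2,
    "2^2 + 4 center points": corners + [(0, 0)] * 4,
}

for label, design in designs.items():
    v = prediction_variance(design, (1, 1))  # a corner of the square
    print(f"{label:<22} std. sd = {float(v) ** 0.5:.3f}  scaled = {len(design) * v}")
```

At the corner the sketch gives standardized standard deviations of 0.500 and 0.433 for three and four replicates, a scaled prediction variance of 3 for every replicated factorial, and 5 for the design with center points, in line with the values derived in the text.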
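The equivalence argued in Section S6.7 can be illustrated numerically. This minimal Python sketch computes $t_0$ and the F ratio $[n_F n_C(\bar y_F - \bar y_C)^2/(n_F + n_C)]/\hat\sigma^2$, taking $\hat\sigma^2$ as the sample variance of the center points. The data are hypothetical and the function name `curvature_test` is an illustrative assumption.

```python
from fractions import Fraction
from math import sqrt


def curvature_test(factorial, center):
    """t and F statistics for H0: beta_11 + ... + beta_kk = 0 (pure quadratic curvature).

    factorial: responses at the n_F factorial points (one replicate each).
    center: responses at the n_C center points, used for the pure-error estimate.
    Returns (t0, F0, degrees_of_freedom).
    """
    n_f, n_c = len(factorial), len(center)
    ybar_f = sum(map(Fraction, factorial)) / n_f
    ybar_c = sum(map(Fraction, center)) / n_c
    # Pure-error variance from the center points, with n_C - 1 degrees of freedom.
    sigma2 = sum((Fraction(y) - ybar_c) ** 2 for y in center) / (n_c - 1)
    diff = ybar_f - ybar_c
    t0 = float(diff) / sqrt(float(sigma2 * (Fraction(1, n_f) + Fraction(1, n_c))))
    # Single-degree-of-freedom sum of squares for pure quadratic curvature, over sigma^2.
    f0 = (n_f * n_c * diff ** 2 / (n_f + n_c)) / sigma2
    return t0, f0, n_c - 1


# Hypothetical data: a 2^2 factorial (n_F = 4) plus n_C = 4 center runs.
t0, f0, df = curvature_test([10, 14, 12, 16], [11, 12, 13, 12])
print(f"t0 = {t0:.4f}, t0^2 = {t0 ** 2:.4f}, F0 = {float(f0):.4f}, df = {df}")
# t0 = 1.7321, t0^2 = 3.0000, F0 = 3.0000, df = 3
```

The sketch stops at the test statistics. The critical value $t_{\alpha/2,\,n_C-1}$ would still come from a table or a statistics library, for example SciPy's `scipy.stats.t.ppf(1 - alpha / 2, df)`.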