Introductory Design & Analysis (STAT 313)
These 44 pages of class notes were uploaded by Lilly Rutherford on Saturday, September 12, 2015. The notes belong to STAT 313 at West Virginia University, taught by Gerald Hobbs in Fall.
Chapter 3 Supplemental Text Material

S3.1 The Definition of Factor Effects

As noted in Sections 3.2 and 3.3, there are two ways to write the model for a single-factor experiment: the means model and the effects model. We will generally use the effects model

    y_{ij} = \mu + \tau_i + \varepsilon_{ij},   i = 1, 2, \ldots, a;  j = 1, 2, \ldots, n

where, for simplicity, we are working with the balanced case (all factor levels or treatments are replicated the same number of times). Recall that in writing this model, the ith factor level mean \mu_i is broken up into two components; that is, \mu_i = \mu + \tau_i, where \tau_i is the ith treatment effect and \mu is an overall mean. We usually define

    \mu = \frac{1}{a}\sum_{i=1}^{a}\mu_i

and this implies that

    \sum_{i=1}^{a}\tau_i = 0.

This is actually an arbitrary definition, and there are other ways to define the overall "mean." For example, we could define

    \mu = \sum_{i=1}^{a} w_i \mu_i   where   \sum_{i=1}^{a} w_i = 1.

This would result in the treatment effects defined such that

    \sum_{i=1}^{a} w_i \tau_i = 0.

Here the overall mean is a weighted average of the individual treatment means. When there is an unequal number of observations in each treatment, the weights w_i could be taken as the fractions of the treatment sample sizes, n_i/N.

S3.2 Expected Mean Squares

In Section 3.3.1 we derived the expected value of the mean square for error in the single-factor analysis of variance. We gave the result for the expected value of the mean square for treatments, but the derivation was omitted. The derivation is straightforward. Consider

    E(MS_{Treatments}) = E\left(\frac{SS_{Treatments}}{a-1}\right).

Now for a balanced design

    SS_{Treatments} = \frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2 - \frac{1}{an}\, y_{\cdot\cdot}^2

and the model is

    y_{ij} = \mu + \tau_i + \varepsilon_{ij},   i = 1, 2, \ldots, a;  j = 1, 2, \ldots, n.

In addition, we will find the following useful:

    E(\varepsilon_{ij}) = E(\varepsilon_{i\cdot}) = E(\varepsilon_{\cdot\cdot}) = 0,   E(\varepsilon_{ij}^2) = \sigma^2,   E(\varepsilon_{i\cdot}^2) = n\sigma^2,   E(\varepsilon_{\cdot\cdot}^2) = an\sigma^2.

Now

    E(SS_{Treatments}) = E\left(\frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2\right) - E\left(\frac{1}{an}\, y_{\cdot\cdot}^2\right).

Consider the first term on the right-hand side of this expression:

    E\left(\frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2\right) = \frac{1}{n}\sum_{i=1}^{a} E\left(n\mu + n\tau_i + \varepsilon_{i\cdot}\right)^2.

Squaring the expression in parentheses and taking expectation results in

    E\left(\frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2\right) = \frac{1}{n}\left[a n^2 \mu^2 + n^2 \sum_{i=1}^{a}\tau_i^2 + a n \sigma^2\right] = an\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + a\sigma^2

because the three cross-product terms are all zero. Now consider the second term on the right-hand side of E(SS_{Treatments}):

    E\left(\frac{1}{an}\, y_{\cdot\cdot}^2\right) = \frac{1}{an}\, E\left(an\mu + n\sum_{i=1}^{a}\tau_i + \varepsilon_{\cdot\cdot}\right)^2 = \frac{1}{an}\, E\left(an\mu + \varepsilon_{\cdot\cdot}\right)^2

since \sum_{i=1}^{a}\tau_i = 0. Upon squaring the term in parentheses and taking expectation, we obtain

    E\left(\frac{1}{an}\, y_{\cdot\cdot}^2\right) = \frac{1}{an}\left[(an\mu)^2 + an\sigma^2\right] = an\mu^2 + \sigma^2

since the expected value of the cross-product is zero. Therefore

    E(SS_{Treatments}) = an\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + a\sigma^2 - an\mu^2 - \sigma^2 = \sigma^2(a-1) + n\sum_{i=1}^{a}\tau_i^2.

Consequently, the expected value of the mean square for treatments is

    E(MS_{Treatments}) = \frac{E(SS_{Treatments})}{a-1} = \sigma^2 + \frac{n\sum_{i=1}^{a}\tau_i^2}{a-1}.

This is the result given in the textbook.

S3.3 Confidence Interval for \sigma^2

In developing the analysis of variance (ANOVA) procedure, we have observed that the error variance \sigma^2 is estimated by the error mean square; that is,

    \hat{\sigma}^2 = \frac{SS_E}{N-a}.

We now give a confidence interval for \sigma^2. Since we have assumed that the observations are normally distributed, the distribution of SS_E/\sigma^2 is \chi^2_{N-a}. Therefore

    P\left(\chi^2_{1-\alpha/2,\,N-a} \le \frac{SS_E}{\sigma^2} \le \chi^2_{\alpha/2,\,N-a}\right) = 1 - \alpha

where \chi^2_{1-\alpha/2,\,N-a} and \chi^2_{\alpha/2,\,N-a} are the lower and upper \alpha/2 percentage points of the \chi^2 distribution with N-a degrees of freedom, respectively. Now if we rearrange the expression inside the probability statement, we obtain

    P\left(\frac{SS_E}{\chi^2_{\alpha/2,\,N-a}} \le \sigma^2 \le \frac{SS_E}{\chi^2_{1-\alpha/2,\,N-a}}\right) = 1 - \alpha.

Therefore, a 100(1-\alpha) percent confidence interval on the error variance \sigma^2 is

    \frac{SS_E}{\chi^2_{\alpha/2,\,N-a}} \le \sigma^2 \le \frac{SS_E}{\chi^2_{1-\alpha/2,\,N-a}}.

This confidence interval expression is also given in Chapter 12 on experiments with random effects. Sometimes an experimenter is interested in an upper bound on the error variance; that is, how large could \sigma^2 reasonably be? This can be useful when there is information about \sigma^2 from a prior experiment and the experimenter is performing calculations to determine sample sizes for a new experiment. An upper 100(1-\alpha) percent confidence limit on \sigma^2 is given by

    \sigma^2 \le \frac{SS_E}{\chi^2_{1-\alpha,\,N-a}}.

If a 100(1-\alpha) percent confidence interval on the standard deviation \sigma is desired instead, then

    \sigma \le \sqrt{\frac{SS_E}{\chi^2_{1-\alpha,\,N-a}}}.

S3.4 Simultaneous Confidence Intervals on Treatment Means

In Section 3.3.3 we discuss finding confidence intervals on a treatment mean and on differences between a pair of means. We also show how to find simultaneous
confidence intervals on a set of treatment means, or on a set of differences between pairs of means, using the Bonferroni approach. Essentially, if there is a set of r confidence statements to be constructed, the Bonferroni method simply replaces \alpha/2 by \alpha/(2r); this produces a set of r confidence intervals for which the overall confidence level is at least 100(1-\alpha) percent.

To see why this works, consider the case r = 2; that is, we have two 100(1-\alpha) percent confidence intervals. Let E_1 denote the event that the first confidence interval is not correct (it does not cover the true mean) and E_2 denote the event that the second confidence interval is incorrect. Now

    P(E_1) = P(E_2) = \alpha.

The probability that either or both intervals is incorrect is

    P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2).

From the probability of complementary events, we can find the probability that both intervals are correct as

    P(\bar{E}_1 \cap \bar{E}_2) = 1 - P(E_1 \cup E_2) = 1 - P(E_1) - P(E_2) + P(E_1 \cap E_2).

Now we know that P(E_1 \cap E_2) \ge 0, so from the last equation above we obtain the Bonferroni inequality

    P(\bar{E}_1 \cap \bar{E}_2) \ge 1 - P(E_1) - P(E_2).

In the context of our example, the left-hand side of this inequality is the probability that both of the two confidence interval statements are correct, and P(E_1) = P(E_2) = \alpha, so

    P(\bar{E}_1 \cap \bar{E}_2) \ge 1 - 2\alpha.

Therefore, if we want the probability that both of the confidence intervals are correct to be at least 1-\alpha, we can assure this by constructing 100(1-\alpha/2) percent individual confidence intervals.

If there are r confidence intervals of interest, we can use mathematical induction to show that

    P(\bar{E}_1 \cap \bar{E}_2 \cap \cdots \cap \bar{E}_r) \ge 1 - \sum_{i=1}^{r} P(E_i) = 1 - r\alpha.

As noted in the text, the Bonferroni method works reasonably well when the number of simultaneous confidence intervals r is not too large. As r becomes larger, the lengths of the individual confidence intervals increase, and they can become so long that the intervals are not very informative. Also, it is not necessary that all individual confidence statements have the same level of confidence. One might select 98 percent for one statement and 92
percent for the other, resulting in two confidence intervals for which the simultaneous confidence level is at least 90 percent.

S3.5 Regression Models for a Quantitative Factor

Regression models are discussed in detail in Chapter 10, but they appear relatively often throughout the book because it is convenient to express the relationship between the response and quantitative design variables in terms of an equation. When there is only a single quantitative design factor, a linear regression model relating the response to the factor is

    y = \beta_0 + \beta_1 x + \varepsilon

where x represents the values of the design factor. In a single-factor experiment there are N observations, and each observation can be expressed in terms of this model as follows:

    y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,   i = 1, 2, \ldots, N.

The method of least squares is used to estimate the unknown parameters (the \beta's) in this model. This involves estimating the parameters so that the sum of the squares of the errors is minimized. The least squares function is

    L = \sum_{i=1}^{N}\varepsilon_i^2 = \sum_{i=1}^{N}\left(y_i - \beta_0 - \beta_1 x_i\right)^2.

To find the least squares estimators, we take the partial derivatives of L with respect to the \beta's and equate them to zero:

    \frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{N}\left(y_i - \beta_0 - \beta_1 x_i\right) = 0

    \frac{\partial L}{\partial \beta_1} = -2\sum_{i=1}^{N}\left(y_i - \beta_0 - \beta_1 x_i\right)x_i = 0.

After simplification, we obtain the least squares normal equations

    N\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{N}x_i = \sum_{i=1}^{N}y_i

    \hat{\beta}_0\sum_{i=1}^{N}x_i + \hat{\beta}_1\sum_{i=1}^{N}x_i^2 = \sum_{i=1}^{N}x_i y_i

where \hat{\beta}_0 and \hat{\beta}_1 are the least squares estimators of the model parameters. So, to fit this particular model to the experimental data by least squares, all we have to do is solve the normal equations. Since there are only two equations in two unknowns, this is fairly easy.

In the textbook we fit two regression models for the response variable etch rate (y) as a function of the RF power (x): the linear regression model shown above, and a quadratic model

    y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon.

The least squares normal equations for the quadratic model are

    N\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{N}x_i + \hat{\beta}_2\sum_{i=1}^{N}x_i^2 = \sum_{i=1}^{N}y_i

    \hat{\beta}_0\sum_{i=1}^{N}x_i + \hat{\beta}_1\sum_{i=1}^{N}x_i^2 + \hat{\beta}_2\sum_{i=1}^{N}x_i^3 = \sum_{i=1}^{N}x_i y_i

    \hat{\beta}_0\sum_{i=1}^{N}x_i^2 + \hat{\beta}_1\sum_{i=1}^{N}x_i^3 + \hat{\beta}_2\sum_{i=1}^{N}x_i^4 = \sum_{i=1}^{N}x_i^2 y_i.

Obviously, as the order of the model increases, there are more unknown parameters to
estimate, and the normal equations become more complicated. In Chapter 10 we use matrix methods to develop the general solution. Most statistics software packages have very good regression model fitting capability.

S3.6 More About Estimable Functions

In Section 3.9.1 we use the least squares approach to estimate the parameters in the single-factor model. Assuming a balanced experimental design, we find the least squares normal equations as Equation 3-48, repeated below:

    an\hat{\mu} + n\hat{\tau}_1 + n\hat{\tau}_2 + \cdots + n\hat{\tau}_a = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}

    n\hat{\mu} + n\hat{\tau}_1 = y_{1\cdot}

    n\hat{\mu} + n\hat{\tau}_2 = y_{2\cdot}

    \vdots

    n\hat{\mu} + n\hat{\tau}_a = y_{a\cdot}

where an = N is the total number of observations. As noted in the textbook, if we add the last a of these normal equations we obtain the first one. That is, the normal equations are not linearly independent, and so they do not have a unique solution. We say that the effects model is an overparameterized model.

One way to resolve this is to add another linearly independent equation to the normal equations. The most common way to do this is to use the equation \sum_{i=1}^{a}\hat{\tau}_i = 0. This is consistent with defining the factor effects as deviations from the overall mean \mu. If we impose this constraint, the solution to the normal equations is

    \hat{\mu} = \bar{y}_{\cdot\cdot}

    \hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot},   i = 1, 2, \ldots, a.

That is, the overall mean is estimated by the average of all an sample observations, while each individual factor effect is estimated by the difference between the sample average for that factor level and the average of all observations.

Another possible choice of constraint is to set the overall mean equal to a constant, say \hat{\mu} = 0. This results in the solution

    \hat{\mu} = 0

    \hat{\tau}_i = \bar{y}_{i\cdot},   i = 1, 2, \ldots, a.

Still a third choice is \hat{\tau}_a = 0. This is the approach used in the SAS software, for example. This choice of constraint produces the solution

    \hat{\mu} = \bar{y}_{a\cdot}

    \hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{a\cdot},   i = 1, 2, \ldots, a-1

    \hat{\tau}_a = 0.

There is an infinite number of possible constraints that could be used to solve the normal equations. Fortunately, as observed in the book, it really doesn't matter. For each of the three solutions above (indeed, for any solution to the normal equations) we have

    \hat{\mu}_i = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i\cdot},   i = 1, 2, \ldots, a.

That is, the least squares estimator of the
mean of the ith factor level will always be the sample average of the observations at that factor level. So even if we cannot obtain unique estimates for the parameters in the effects model, we can obtain unique estimators of functions of these parameters that we are interested in. This is the idea of estimable functions. Any function of the model parameters that can be uniquely estimated regardless of the constraint selected to solve the normal equations is an estimable function.

What functions are estimable? It can be shown that the expected value of any observation is estimable. Now

    E(y_{ij}) = \mu + \tau_i

so, as shown above, the mean of the ith treatment is estimable. Any function that is a linear combination of the left-hand sides of the normal equations is also estimable. For example, subtract the third normal equation from the second, yielding \tau_2 - \tau_1. Consequently, the difference in any two treatment effects is estimable. In general, any contrast in the treatment effects, \sum_{i=1}^{a} c_i \tau_i where \sum_{i=1}^{a} c_i = 0, is estimable. Notice that the individual model parameters \mu, \tau_1, \ldots, \tau_a are not estimable, as there is no linear combination of the normal equations that will produce these parameters separately. However, this is generally not a problem, for as observed previously the estimable functions correspond to functions of the model parameters that are of interest to experimenters. For an excellent and very readable discussion of estimable functions, see Myers, R. H. and Milton, J. S. (1991), A First Course in the Theory of the Linear Model, PWS-Kent, Boston, MA.

S3.7 The Relationship Between Regression and ANOVA

Section 3.9 explored some of the connections between analysis of variance (ANOVA) models and regression models. We showed how least squares methods could be used to estimate the model parameters, and how a regression-based procedure called the general regression significance test can be used to develop the ANOVA test statistic. Every ANOVA model can be written explicitly as an equivalent linear regression
model. We now show how this is done for the single-factor experiment with a = 3 treatments.

The single-factor balanced ANOVA model is

    y_{ij} = \mu + \tau_i + \varepsilon_{ij},   i = 1, 2, 3;  j = 1, 2, \ldots, n.

The equivalent regression model is

    y_{ij} = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \varepsilon_{ij},   i = 1, 2, 3;  j = 1, 2, \ldots, n

where the variables x_{1j} and x_{2j} are defined as follows:

    x_{1j} = 1 if observation j is from treatment 1, and 0 otherwise
    x_{2j} = 1 if observation j is from treatment 2, and 0 otherwise.

The relationships between the parameters in the regression model and the parameters in the ANOVA model are easily determined. For example, if the observations come from treatment 1, then x_{1j} = 1 and x_{2j} = 0, and the regression model is

    y_{1j} = \beta_0 + \beta_1(1) + \beta_2(0) + \varepsilon_{1j} = \beta_0 + \beta_1 + \varepsilon_{1j}.

Since in the ANOVA model these observations are defined by y_{1j} = \mu + \tau_1 + \varepsilon_{1j}, this implies that

    \beta_0 + \beta_1 = \mu_1 = \mu + \tau_1.

Similarly, if the observations are from treatment 2, then

    y_{2j} = \beta_0 + \beta_1(0) + \beta_2(1) + \varepsilon_{2j} = \beta_0 + \beta_2 + \varepsilon_{2j}

and the relationship between the parameters is

    \beta_0 + \beta_2 = \mu_2 = \mu + \tau_2.

Finally, consider observations from treatment 3, for which the regression model is

    y_{3j} = \beta_0 + \beta_1(0) + \beta_2(0) + \varepsilon_{3j} = \beta_0 + \varepsilon_{3j}

and we have

    \beta_0 = \mu_3 = \mu + \tau_3.

Thus, in the regression model formulation of the one-way ANOVA model, the regression coefficients describe comparisons of the first two treatment means with the third treatment mean; that is,

    \beta_0 = \mu_3,   \beta_1 = \mu_1 - \mu_3,   \beta_2 = \mu_2 - \mu_3.

In general, if there are a treatments, the regression model will have a - 1 regressor variables, say

    y_{ij} = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_{a-1} x_{a-1,j} + \varepsilon_{ij},   i = 1, 2, \ldots, a;  j = 1, 2, \ldots, n

where

    x_{ij} = 1 if observation j is from treatment i, and 0 otherwise.

Since these regressor variables only take on the values 0 and 1, they are often called indicator variables. The relationship between the parameters in the ANOVA model and the regression model is

    \beta_0 = \mu_a,   \beta_i = \mu_i - \mu_a,   i = 1, 2, \ldots, a-1.

Therefore the intercept is always the mean of the ath treatment, and the regression coefficient \beta_i estimates the difference between the mean of the ith treatment and the ath treatment.

Now consider testing hypotheses. Suppose that we want to test that all treatment means are equal (the usual null hypothesis). If this null hypothesis is true, then the parameters in the regression model become

    \beta_0 = \mu,   \beta_i = 0,   i = 1, 2, \ldots, a-1.
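The regression formulation can also be checked numerically. The sketch below is not part of the original text; it assumes NumPy is available and uses the plasma etch data quoted later in this section. It fits the indicator-variable regression by least squares and recovers both the coefficient interpretation (intercept = mean of the last treatment, slopes = differences from it) and the ANOVA F statistic:

```python
import numpy as np

# Etch-rate data from Example 3-1 (RF power levels 160, 180, 200, 220 W).
groups = [
    [575, 542, 530, 539, 570],   # 160 W
    [565, 593, 590, 579, 610],   # 180 W
    [600, 651, 610, 637, 629],   # 200 W
    [725, 700, 715, 685, 710],   # 220 W
]
y = np.array([obs for g in groups for obs in g], dtype=float)
a, n = 4, 5
N = a * n

# Design matrix: an intercept column plus a - 1 = 3 indicator columns
# (x_i = 1 when the observation comes from treatment i, 0 otherwise).
X = np.zeros((N, a))
X[:, 0] = 1.0
for i in range(a - 1):
    X[i * n:(i + 1) * n, i + 1] = 1.0

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
ss_e = float(np.sum((y - fitted) ** 2))           # error sum of squares
ss_r = float(np.sum((fitted - y.mean()) ** 2))    # regression (treatment) SS
F = (ss_r / (a - 1)) / (ss_e / (N - a))

# Intercept is the 220 W treatment mean (707); each slope is the difference
# between a treatment mean and the 220 W mean.
print("beta:", np.round(beta, 2))   # intercept 707.0, slopes -155.8, -119.6, -81.6
print("F:", round(F, 2))            # F is approximately 66.8
```

The fitted coefficients and the F ratio agree (up to rounding) with the Minitab regression output reproduced in this section, illustrating that the regression F test and the one-way ANOVA F test are the same test.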
Using the general regression significance test procedure, we could develop a test for this hypothesis. It would be identical to the F-statistic test in the one-way ANOVA. Most regression software packages automatically test the hypothesis that all model regression coefficients (except the intercept) are zero. We will illustrate this using Minitab and the data from the plasma etching experiment in Example 3-1. Recall that in this example the engineer is interested in determining the effect of RF power on etch rate, and he has run a completely randomized experiment with four levels of RF power and five replicates. For convenience, we repeat the data from Table 3-1 here:

    RF Power (W)    Observed etch rate
    160             575  542  530  539  570
    180             565  593  590  579  610
    200             600  651  610  637  629
    220             725  700  715  685  710

The data were converted into the x_{ij} (0, 1) indicator variables as described above. Since there are 4 treatments, there are only 3 of the x's. The coded data used as input to Minitab are as follows:

    x1   x2   x3   Etch rate
    1    0    0    575, 542, 530, 539, 570
    0    1    0    565, 593, 590, 579, 610
    0    0    1    600, 651, 610, 637, 629
    0    0    0    725, 700, 715, 685, 710

The Regression Module in Minitab was run using the above spreadsheet, where x1 through x3 were used as the predictors and the variable etch rate was the response. The output is shown below.

    Regression Analysis: Etch rate versus x1, x2, x3

    The regression equation is
    Etch rate = 707 - 156 x1 - 120 x2 - 81.6 x3

    Predictor      Coef    SE Coef        T        P
    Constant    707.000      8.169    86.54    0.000
    x1          -155.80      11.55   -13.49    0.000
    x2          -119.60      11.55   -10.35    0.000
    x3           -81.60      11.55    -7.06    0.000

    S = 18.2675   R-Sq = 92.6%   R-Sq(adj) = 91.2%

    Analysis of Variance
    Source            DF       SS       MS        F        P
    Regression         3    66871    22290    66.80    0.000
    Residual Error    16     5339      334

Notice that the ANOVA table in this regression output is identical, apart from rounding, to the ANOVA display in Table 3-4. Therefore, testing the hypothesis that the regression coefficients \beta_1 = \beta_2 = \beta_3 = 0 in this regression
model is equivalent to testing the null hypothesis of equal treatment means in the original ANOVA model formulation. Also note that the estimate of the intercept (the "constant" term in the above table) is the mean of the 4th treatment. Furthermore, each regression coefficient is just the difference between one of the treatment means and the 4th treatment mean.

Chapter 12 Supplemental Text Material

S12.1 The Taguchi Approach to Robust Parameter Design

Throughout this book we have emphasized the importance of using designed experiments for product and process improvement. Today, many engineers and scientists are exposed to the principles of statistically designed experiments as part of their formal technical education. However, during the 1960-1980 time period, the principles of experimental design (and statistical methods in general) were not as widely used as they are today.

In the early 1980s, Genichi Taguchi, a Japanese engineer, introduced his approach to using experimental design for

1. Designing products or processes so that they are robust to environmental conditions.
2. Designing/developing products so that they are robust to component variation.
3. Minimizing variation around a target value.

Note that these are essentially the same objectives we discussed in Section 11-7.1.

Taguchi has certainly defined meaningful engineering problems, and the philosophy that he recommends is sound. However, as noted in the textbook, he advocated some novel methods of statistical data analysis and some approaches to the design of experiments that the process of peer review revealed were unnecessarily complicated, inefficient, and sometimes ineffective. In this section we will briefly overview Taguchi's philosophy regarding quality engineering and experimental design. We will present some examples of his approach to parameter design, and we will use these examples to highlight the problems with his technical methods. As we saw in Chapter 12 of the textbook, it is possible to combine his sound engineering concepts
with more efficient and effective experimental design and analysis based on response surface methods.

Taguchi advocates a philosophy of quality engineering that is broadly applicable. He considers three stages in product or process development: system design, parameter design, and tolerance design. In system design, the engineer uses scientific and engineering principles to determine the basic system configuration. For example, if we wish to measure an unknown resistance, we may use our knowledge of electrical circuits to determine that the basic system should be configured as a Wheatstone bridge. If we are designing a process to assemble printed circuit boards, we will determine the need for specific types of axial insertion machines, surface-mount placement machines, flow solder machines, and so forth.

In the parameter design stage, the specific values for the system parameters are determined. This would involve choosing the nominal resistor and power supply values for the Wheatstone bridge, the number and type of component placement machines for the printed circuit board assembly process, and so forth. Usually the objective is to specify these nominal parameter values such that the variability transmitted from uncontrollable (or noise) variables is minimized.

Tolerance design is used to determine the best tolerances for the parameters. For example, in the Wheatstone bridge, tolerance design methods would reveal which components in the design were most sensitive and where the tolerances should be set. If a component does not have much effect on the performance of the circuit, it can be specified with a wide tolerance.

Taguchi recommends that statistical experimental design methods be employed to assist in this process, particularly during parameter design and tolerance design. We will focus on parameter design. Experimental design methods can be used to find a best product or process design, where by "best" we mean a product or process that is robust or insensitive to uncontrollable factors that
will influence the product or process once it is in routine operation. The notion of robust design is not new. Engineers have always tried to design products so that they will work well under uncontrollable conditions. For example, commercial transport aircraft fly about as well in a thunderstorm as they do in clear air. Taguchi deserves recognition for realizing that experimental design can be used as a formal part of the engineering design process to help accomplish this objective.

A key component of Taguchi's philosophy is the reduction of variability. Generally, each product or process performance characteristic will have a target or nominal value. The objective is to reduce the variability around this target value. Taguchi models the departures that may occur from this target value with a loss function. The loss refers to the cost that is incurred by society when the consumer uses a product whose quality characteristics differ from the nominal. The concept of societal loss is a departure from traditional thinking. Taguchi imposes a quadratic loss function of the form

    L(y) = k(y - T)^2

shown in Figure 1 below. Clearly this type of function will penalize even small departures of y from the target T. Again, this is a departure from traditional thinking, which usually attaches penalties only to cases where y is outside of the upper and lower specifications (say y > USL or y < LSL in Figure 1). However, the Taguchi philosophy regarding reduction of variability and the emphasis on minimizing costs is entirely consistent with the continuous improvement philosophy of Deming and Juran.

In summary, Taguchi's philosophy involves three central ideas:

1. Products and processes should be designed so that they are robust to external sources of variability.
2. Experimental design methods are an engineering tool to help accomplish this objective.
3. Operation on-target is more important than conformance to specifications.

[Figure 1: Taguchi's quadratic loss function]

These are sound concepts, and their value should be readily apparent. Furthermore, as we have seen in the textbook, experimental design methods can play a major role in translating these ideas into practice. As we will see, however, his approach to experimental design and data analysis can be improved.

Taguchi's Technical Methods: An Example

We will use the connector pull-off force example described in the textbook to illustrate Taguchi's technical methods. For more information about the problem, refer to the textbook and to the original article by D. M. Byrne and S. Taguchi in Quality Progress, December 1987. Recall that the experiment involves finding a method to assemble an elastomeric connector to a nylon tube that would deliver the required pull-off performance to be suitable for use in an automotive engine application. The specific objective of the experiment is to maximize the pull-off force. Four controllable and three uncontrollable (noise) factors were identified. These factors are defined in the textbook, and repeated for convenience in Table 1 below. We want to find the levels of the controllable factors that result in maximum pull-off force. Notice that although the noise factors are not controllable during routine operations, they can be controlled for the purposes of a test. Each controllable factor is tested at three levels and each noise factor at two levels.

Recall from the discussion in the textbook that in the Taguchi parameter design methodology, one experimental design is selected for the controllable factors and another for the noise factors. Taguchi refers to these designs as orthogonal arrays, and represents the factor levels with integers 1, 2, and 3. In this case the designs selected are just a standard 2^3 factorial and a 3^{4-2} fractional factorial. Taguchi calls these the L8 and L9 orthogonal arrays, respectively.

Table 1: Factors and Levels for the Taguchi Parameter Design Example

    Controllable Factors                            Levels
    A = Interference                                Low, Medium, High
    B = Connector wall thickness                    Thin, Medium, Thick
    C = Insertion depth                             Shallow, Medium, Deep
    D = Percent adhesive in connector pre-dip       Low, Medium, High

    Uncontrollable Factors                          Levels
    E = Conditioning time                           24 h, 120 h
    F = Conditioning temperature                    72 F, 150 F
    G = Conditioning relative humidity              25%, 75%

Table 2: Designs for the Controllable and Uncontrollable Factors

    (a) L9 Orthogonal Array               (b) L8 Orthogonal Array
        for the Controllable Factors          for the Uncontrollable Factors

    Run   A  B  C  D                      Run   E  F  ExF  G  ExG  FxG  e
     1    1  1  1  1                       1    1  1   1   1   1    1   1
     2    1  2  2  2                       2    1  1   1   2   2    2   2
     3    1  3  3  3                       3    1  2   2   1   1    2   2
     4    2  1  2  3                       4    1  2   2   2   2    1   1
     5    2  2  3  1                       5    2  1   2   1   2    1   2
     6    2  3  1  2                       6    2  1   2   2   1    2   1
     7    3  1  3  2                       7    2  2   1   1   2    2   1
     8    3  2  1  3                       8    2  2   1   2   1    1   2
     9    3  3  2  1

The two designs are combined as shown in Table 11-22 in the textbook, repeated for convenience as Table 3 below. Recall that this is called a crossed or product array design, composed of the inner array containing the controllable factors and the outer array containing the noise factors. Literally, each of the 9 runs from the inner array is tested across the 8 runs from the outer array, for a total sample size of 72 runs. The observed pull-off force is reported in Table 3.

Data Analysis and Conclusions

The data from this experiment may now be analyzed. Recall from the discussion in Chapter 11 that Taguchi recommends analyzing the mean response for each run in the inner array (see Table 3), and he also suggests analyzing variation using an appropriately chosen signal-to-noise ratio (SN). These signal-to-noise ratios are derived from the quadratic loss function, and three of them are considered to be "standard" and widely applicable. They are defined as follows:

1. Nominal the best:

    SN_T = 10\log\left(\frac{\bar{y}^2}{S^2}\right)

2. Larger the better:

    SN_L = -10\log\left(\frac{1}{n}\sum_{i=1}^{n}\frac{1}{y_i^2}\right)

3. Smaller the better:

    SN_S = -10\log\left(\frac{1}{n}\sum_{i=1}^{n} y_i^2\right)

Table 3: Parameter Design with Both Inner and Outer Arrays

    Outer array (noise factor levels for the eight outer-array runs):
    E:  1  1  1  1  2  2  2  2
    F:  1  1  2  2  1  1  2  2
    G:  1  2  1  2  1  2  1  2

    Inner Array (L9)        Responses (pull-off force)                    ybar     SN_L
    Run  A  B  C  D
     1   1  1  1  1   15.6   9.5  16.9  19.9  19.6  19.6  20.0  19.1   17.525   24.025
     2   1  2  2  2   15.0  16.2  19.4  19.2  19.7  19.8  24.2  21.9   19.475   25.522
     3   1  3  3  3   16.3  16.7  19.1  15.6  22.6  18.2  23.3  20.4   19.025   25.335
     4   2  1  2  3   18.3  17.4  18.9  18.6  21.0  18.9  23.2  24.7   20.125   25.904
     5   2  2  3  1   19.7  18.6  19.4  25.1  25.6  21.4  27.5  25.3   22.825   26.908
     6   2  3  1  2   16.2  16.3  20.0  19.8  14.7  19.6  22.5  24.7   19.225   25.326
     7   3  1  3  2   16.4  19.1  18.4  23.6  16.8  18.6  24.3  21.6   19.800   25.711
     8   3  2  1  3   14.2  15.6  15.1  16.8  17.8  19.6  23.2  24.2   18.338   24.852
     9   3  3  2  1   16.1  19.9  19.3  17.3  23.1  22.7  22.6  28.6   21.200   26.152

Notice that these SN ratios are expressed on a decibel scale. We would use SN_T if the
objective is to reduce variability around a specific target, SN_L if the system is optimized when the response is as large as possible, and SN_S if the system is optimized when the response is as small as possible. Factor levels that maximize the appropriate SN ratio are optimal.

In this problem we would use SN_L because the objective is to maximize the pull-off force. The last two columns of Table 3 contain the values of the mean and SN_L for each of the nine inner-array runs. Taguchi-oriented practitioners often use the analysis of variance to determine the factors that influence the mean response and the factors that influence the signal-to-noise ratio. They also employ graphs of the marginal means of each factor, such as the ones shown in Figures 2 and 3.

[Figure 2: The effects of the controllable factors on the mean response]
[Figure 3: The effects of the controllable factors on the signal-to-noise ratio]

The usual approach is to examine the graphs and "pick the winner." In this case, factors A and C have larger effects than do B and D. In terms of maximizing SN_L we would select A(Medium), C(Deep), B(Medium), and D(Low); in terms of maximizing the average pull-off force we would choose A(Medium), C(Medium), B(Medium), and D(Low). Notice that there is almost no difference between C(Medium) and C(Deep).

Taguchi advocates claim that using the SN ratio generally eliminates the need to examine specific interactions, but sometimes looking at these interactions improves process understanding. The authors of this study found that the AxG and DxE interactions were large. Analysis of these interactions, shown in Figure 4, suggests that A(Medium) is best: it gives the highest pull-off force regardless of the conditioning time. The authors finally decided to use A(Medium), B(Thin), C(Medium), and D(Low); B(Thin) was much less expensive than B(Medium), and C(Medium) was felt to give slightly less variability than C(Deep). The authors report that good results were obtained from a confirmation test.

[Figure 4: The AxG and DxE interactions]

Critique of Taguchi's Experimental Strategy and Designs

Taguchi's parameter design methodology makes extensive use of orthogonal array designs, two of which (the L8 and the L9) were presented in the foregoing example. These arrays are not new; each is a standard fractional factorial or Plackett-Burman design (the L12, for example, is a Plackett-Burman design). Box, Bisgaard, and Fung (1988) trace the origin of these designs. Because the arrays are so heavily fractionated, main effects can be aliased with two-factor interactions, so if interactions are present the experimenter may not get the correct answer.

Taguchi argues that we do not need to consider two-factor interactions explicitly. He claims that it is possible to eliminate these interactions either by correctly specifying the response and design factors or by using a "sliding setting" approach to choose factor levels. As an example of the latter approach, consider the two factors pressure and temperature. Varying these factors independently will probably produce an interaction. However, if temperature levels are chosen contingent on the pressure levels, then the interaction effect can be minimized. In practice, these two approaches are usually difficult to implement unless we have an unusually high level of process knowledge. The lack of provision for adequately dealing with potential interactions between the controllable process factors is a major weakness of the Taguchi approach to parameter design.

Instead of designing the experiment to investigate potential interactions, Taguchi prefers to use three-level factors to estimate curvature. For example, in the inner and outer array design used by Byrne and Taguchi, all four controllable factors were run at three levels. Let x_1, x_2, x_3, and x_4 represent the controllable factors, and let z_1, z_2, and z_3 represent the three noise factors. Recall that the noise factors were run at two levels in a complete factorial design. The design they used allows us to fit the following model:

    y = \beta_0 + \sum_{i=1}^{4}\beta_i x_i + \sum_{i=1}^{4}\beta_{ii} x_i^2 + \sum_{j=1}^{3}\gamma_j z_j + \sum_{j<k}\sum \gamma_{jk} z_j z_k + \sum_{j=1}^{3}\sum_{i=1}^{4}\delta_{ij} z_j x_i + \varepsilon

Notice that we can fit the linear and quadratic effects of the controllable factors, but not their two-factor interactions (which are aliased with the main effects). We can also fit the linear effects of the noise factors and all the two-factor interactions involving the noise
factors. Finally, we can fit the two-factor interactions involving the controllable factors and the noise factors.

It may be unwise to ignore potential interactions in the controllable factors; indeed, this is a rather odd strategy, since interaction is itself a form of curvature. A much safer strategy is to identify potential effects and interactions that may be important, and then consider curvature only in the important variables if there is evidence that the curvature is important. This will usually lead to fewer experiments, simpler interpretation of the data, and better overall process understanding.

Another criticism of the Taguchi approach to parameter design is that the crossed array structure usually leads to a very large experiment. For example, in the foregoing application, the authors used 72 tests to investigate only seven factors, and they still could not estimate any of the two-factor interactions among the four controllable factors.

There are several alternative experimental designs that would be superior to the inner and outer array method used in this example. Suppose that we run all seven factors at two levels in the combined array design approach discussed in the textbook. Consider the 2^{7-2}_IV fractional factorial design. The alias relationships for this design are shown in the top half of Table 4. Notice that this design requires only 32 runs (as compared to 72). In the bottom half of Table 4, two different possible schemes for assigning process controllable variables and noise variables to the letters A through G are given. The first assignment scheme allows all the interactions between controllable factors and noise factors to be estimated, and it allows main effect estimates to be made that are clear of two-factor interactions. The second assignment scheme allows all the controllable factor main effects and their two-factor interactions to be estimated; it allows all noise factor main effects to be estimated clear of two-factor interactions; and it aliases only three interactions between
controllable factors and noise factors with a two-factor interaction between two noise factors. Both of these arrangements present much cleaner alias relationships than are obtained from the inner and outer array parameter design, which also required over twice as many runs. In general, the crossed array approach is often unnecessary. A better strategy is to use the combined array design discussed in the textbook. This approach will almost always lead to a dramatic reduction in the size of the experiment, and at the same time it will produce information that is more likely to improve process understanding. For more discussion of this approach, see Myers and Montgomery (1995) and Example 11-6 in the textbook. We can also use a combined array design that allows the experimenter to directly model the noise factors as a complete quadratic and to fit all interactions between the controllable factors and the noise factors, as demonstrated in the textbook in Example 11-7.

Table 4. An Alternative Parameter Design
A one-quarter fraction of 7 factors in 32 runs. Resolution IV.
I = ABCDF = ABDEG = CEFG

Aliases (up to three-factor interactions):
C = EFG       AB = CDF = DEG    BC = ADF         CD = ABF    CDE = DFG
E = CFG       AC = BDF          BD = ACF = AEG   CE = FG     CDG = DEF
F = CEG       AD = BCF = BEG    BE = ADG         CF = ABD = EG   ACE = AFG
G = CEF       AE = BDG          BF = ACD         CG = EF     ACG = AEF
              AF = BCD          BG = ADE         DE = ABG    BCE = BFG
              AG = BDE                           DF = ABC    BCG = BEF
                                                 DG = ABE

Factor Assignment Schemes:
1. Controllable factors are assigned to the letters C, E, F, and G. Noise factors are assigned to the letters A, B, and D. All interactions between controllable factors and noise factors can be estimated, and all controllable factor main effects can be estimated clear of two-factor interactions.
2. Controllable factors are assigned to the letters A, B, C, and D. Noise factors are assigned to the letters E, F, and G. All controllable factor main effects and two-factor interactions can be estimated; only the CE, CF, and CG interactions are aliased with interactions of the noise factors.

Another possible issue with the Taguchi inner and outer array design relates to the order in which the runs are performed.
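As an aside, the alias relationships in Table 4 are straightforward to verify numerically by generating the design and comparing contrast columns. Below is a minimal sketch in Python (not part of the original text); the generators F = ABCD and G = ABDE are an assumption consistent with the defining relation I = ABCDF = ABDEG = CEFG shown in the table.

```python
# Numerical check of the alias structure of the 2^(7-2) design in Table 4.
# Assumed generators (consistent with the stated defining relation):
# F = ABCD and G = ABDE.
from itertools import product

# Build the 32-run design: a full 2^5 in A-E, with F and G generated.
runs = []
for a, b, c, d, e in product((-1, 1), repeat=5):
    runs.append(dict(A=a, B=b, C=c, D=d, E=e, F=a*b*c*d, G=a*b*d*e))

def column(word):
    """Return the contrast column for an effect word such as 'CE' or 'FG'."""
    col = []
    for run in runs:
        x = 1
        for letter in word:
            x *= run[letter]
        col.append(x)
    return tuple(col)

# Two effects are aliased exactly when their contrast columns coincide.
assert column('CE') == column('FG')                    # CE = FG
assert column('CF') == column('EG') == column('ABD')   # CF = ABD = EG
assert column('AB') == column('CDF') == column('DEG')  # AB = CDF = DEG
assert column('C') == column('EFG')                    # C aliased with a 3fi
assert column('A') != column('BC')                     # A clear of this 2fi
print('alias relationships verified')
```

The same `column` comparison can be used to confirm any other row of the alias table, which is often quicker than multiplying effect words by hand.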
Now we know that for experimental validity the runs in a designed experiment should be conducted in random order. However, in many crossed array experiments it is possible that the run order wasn't randomized. In some cases it would be more convenient to fix each row in the inner array (that is, set the levels of the controllable factors) and run all outer-array trials. In other cases it might be more convenient to fix each column in the outer array and then run all the inner array trials at that combination of noise factors. Exactly which strategy is pursued probably depends on which group of factors is easiest to change, the controllable factors or the noise factors. If the tests are run in either manner described above, then a split-plot structure has been introduced into the experiment. If this is not accounted for in the analysis, then the results and conclusions can be misleading. There is no evidence that Taguchi advocates used split-plot analysis methods. Furthermore, since Taguchi frequently downplayed the importance of randomization, it is highly likely that many actual inner and outer array experiments were inadvertently conducted as split-plots and perhaps incorrectly analyzed. We introduce the split-plot design in Chapter 13. A good reference on split-plots in robust design problems is Box and Jones (1992). A final aspect of Taguchi's parameter design is the use of linear graphs to assign factors to the columns of the orthogonal array. A set of linear graphs for the L8 design is shown in Figure 5. In these graphs, each number represents a column in the design. A line segment on the graph corresponds to an interaction between the nodes it connects. To assign variables to columns in an orthogonal array, assign the variables to nodes first; then, when the nodes are used up, assign the variables to the line segments. When you assign variables to the nodes, strike out any line segments that correspond to interactions that might be important. The linear graphs in Figure 5 imply
that column 3 in the L8 design contains the interaction between columns 1 and 2, column 5 contains the interaction between columns 1 and 4, and so forth. If we had four factors, we would assign them to columns 1, 2, 4, and 7. This would ensure that each main effect is clear of two-factor interactions. What is not clear is the two-factor interaction aliasing. If the main effects are in columns 1, 2, 4, and 7, then column 3 contains the 1-2 and the 4-7 interactions, column 5 contains the 1-4 and the 2-7 interactions, and column 6 contains the 1-7 and the 2-4 interactions. This is clearly the case, because four variables in eight runs is a resolution IV plan with all pairs of two-factor interactions aliased. In order to understand fully the two-factor interaction aliasing, Taguchi would refer the experiment designer to a supplementary interaction table. Taguchi (1986) gives a collection of linear graphs for each of his recommended orthogonal array designs. These linear graphs seem to have been developed heuristically. Unfortunately, their use can lead to inefficient designs. For examples, see his car engine experiment (Taguchi and Wu, 1980) and his cutting tool experiment (Taguchi, 1986). Both of these are 16-run designs that he sets up as resolution III designs in which main effects are aliased with two-factor interactions. Conventional methods for constructing these designs would have resulted in resolution IV plans in which the main effects are clear of the two-factor interactions. For the experimenter who simply wants to generate a good design, the linear graph approach may not produce the best result. A better approach is to use a simple table that presents the design and its full alias structure, such as in Appendix Table XII. These tables are easy to construct and are routinely displayed by several widely available and inexpensive computer programs.

Figure 5. Linear Graphs for the L8 Design

Taguchi's Data Analysis Methods. Several of Taguchi's data analysis methods are questionable. For
example, he advocates some methods for the analysis of variability and life-testing data that may be inefficient or unnecessary. For a discussion and critique of these methods, refer to Box, Bisgaard and Fung (1988) and the references therein. In this section we focus on three aspects of his recommendations concerning data analysis: the use of marginal means plots to optimize factor settings, the use of signal-to-noise ratios, and some of his uses of the analysis of variance. Consider first the use of marginal means plots and the associated pick-the-winner optimization that was demonstrated previously in the pull-off force problem. To keep the situation simple, suppose that we have two factors, A and B, each at three levels, with the response data shown in Table 5. The marginal means plots are shown in Figure 6. From these graphs we would select A3 and B1 as the optimum combination, assuming that we wish to maximize y. However, direct inspection of Table 5 (or the AB interaction plot in Figure 7) shows that the combination A3 and B2 actually produces the maximum value of y. In general, playing pick-the-winner with marginal averages cannot be guaranteed to find the optimum. The Taguchi advocates recommend that a confirmation experiment be run, although this offers no guarantee either; we might simply be confirming a response that differs dramatically from the optimum. The best way to find a set of optimum conditions is with the use of response surface methods, as discussed and illustrated in Chapter 11 of the textbook. Taguchi also advocates the use of signal-to-noise ratios in a variety of situations. By maximizing the appropriate signal-to-noise ratio, he claims that variability is minimized.

Table 5. Data for the Marginal Means Plots in Figure 6

                        Factor A
                    1      2      3     B Averages
              1    10     10     13       11.00
Factor B      2     8     10     14       10.67
              3     6      9     10        8.33
A Averages       8.00   9.67  12.33

Figure 6. Marginal Means Plots for the Data in Table 5

Figure 7. The AB Interaction Plot for the Data in Table 5

Consider first the signal-to-noise ratio for the target-is-best case,

SN_T = 10 log(ȳ²/S²)

This ratio would be used if we wish to minimize variability around a fixed target value. It has been suggested by Taguchi that it is preferable to work with SN_T instead of the standard deviation because in many cases the process mean and standard deviation are related (as ȳ gets larger, S gets larger, for example). In such cases, he argues, we cannot simply minimize the standard deviation and then bring the mean on target. Taguchi claims he found empirically that the use of the SN_T ratio coupled with a two-stage optimization procedure would lead to a combination of factor levels where the standard deviation is minimized and the mean is on target. The optimization procedure consists of (1) finding the set of controllable factors that affect SN_T, called the control factors, and setting them to levels that maximize SN_T, and then (2)
finding the set of factors that have significant effects on the mean but do not influence the SN_T ratio, called the signal factors, and using these factors to bring the mean on target. Given that this partitioning of factors is possible, SN_T is an example of a performance measure independent of adjustment (PERMIA); see Leon et al. (1987). The signal factors would be the adjustment factors. The motivation behind the signal-to-noise ratio is to uncouple location and dispersion effects. It can be shown that the use of SN_T is equivalent to an analysis of the standard deviation of the logarithm of the original data. Thus, using SN_T implies that a log transformation will always uncouple location and dispersion effects. There is no assurance that this will happen. A much safer approach is to investigate what type of transformation is appropriate. Note that we can write the SN_T ratio as

SN_T = 10 log(ȳ²/S²) = 10 log(ȳ²) - 10 log(S²)

If the mean is fixed at a target value (estimated by ȳ), then maximizing the SN_T ratio is equivalent to minimizing log(S²). Using log(S²) would require fewer calculations, is more intuitively appealing, and would provide a clearer understanding of the factor relationships that influence process variability; in other words, it would provide better process understanding. Furthermore, if we minimize log(S²) directly, we eliminate the risk of obtaining wrong answers from the maximization of SN_T if some of the manipulated factors drive the mean ȳ upward instead of driving S² downward. In general, only if the response variable can be expressed in terms of the model

y = μ(x_d, x_a) ε(x_d)

where x_d is the subset of factors that drive the dispersion effects and x_a is the subset of adjustment factors that do not affect variability, will maximizing SN_T be equivalent to minimizing the standard deviation. Considering the other potential problems surrounding SN_T, it is likely to be safer to work directly with the standard deviation (or its logarithm) as a response variable, as suggested in the textbook. For more discussion, refer to Myers
and Montgomery (1995). The ratios SN_L and SN_S are even more troublesome. These quantities may be completely ineffective in identifying dispersion effects, although they may serve to identify location effects, that is, factors that drive the mean. The reason for this is relatively easy to see. Consider the SN_S (smaller-the-better) ratio,

SN_S = -10 log[(1/n) Σ(i=1 to n) yi²]

The ratio is motivated by the assumption of a quadratic loss function with y nonnegative. The loss function for such a case would be

L = C (1/n) Σ(i=1 to n) yi²

where C is a constant. Now

log L = log C + log[(1/n) Σ(i=1 to n) yi²]

and SN_S = 10 log C - 10 log L, so maximizing SN_S will minimize L. However, it is easy to show that

(1/n) Σ(i=1 to n) yi² = ȳ² + [(n-1)/n] S²

Therefore, the use of SN_S as a response variable confounds location and dispersion effects. The confounding of location and dispersion effects was also observed in the analysis of the SN_L ratio in the pull-off force example: the plots of ȳ and SN_L versus each factor have approximately the same shape, implying that both responses measure location. Furthermore, since the SN_S and SN_L ratios involve y² and 1/y², they will be very sensitive to outliers or values near zero, and they are not invariant to linear transformation of the original response. We strongly recommend that these signal-to-noise ratios not be used. A better approach for isolating location and dispersion effects is to develop separate response surface models for ȳ and log(S²). If no replication is available to estimate variability at each run in the design, methods for analyzing residuals can be used. Another very effective approach is based on the use of the response model, as demonstrated in the textbook and in Myers and Montgomery (1995). Recall that this allows both a response surface for the variance and a response surface for the mean to be obtained from a single model containing both the controllable design factors and the noise variables. Then standard response surface methods can be used to optimize the mean
and variance. Finally, we turn to some of the applications of the analysis of variance recommended by Taguchi. As an example for discussion, consider the experiment reported by Quinlan (1985) at a symposium on Taguchi methods sponsored by the American Supplier Institute. The experiment concerned the quality improvement of speedometer cables. Specifically, the objective was to reduce the shrinkage in the plastic casing material. (Excessive shrinkage causes the cables to be noisy.) The experiment used an L16 orthogonal array (the 2^(15-11) design). The shrinkage values for four samples taken from 3000-foot lengths of the product manufactured at each set of test conditions were measured, and the responses ȳ and SN_S computed. Quinlan, following the Taguchi approach to data analysis, used SN_S as the response variable in an analysis of variance. The error mean square was formed by pooling the mean squares associated with the seven effects that had the smallest absolute magnitude. This resulted in all eight remaining factors having significant effects (in order of magnitude: E, G, K, A, C, F, D, H). The author did note that E and G were the most important. Pooling of mean squares, as in this example, is a procedure that has long been known to produce considerable bias in the ANOVA test results. To illustrate the problem, consider the 15 NID(0, 1) random numbers shown in column 1 of Table 6. The square of each of these numbers, shown in column 2 of the table, is a single-degree-of-freedom mean square corresponding to the observed random number. The seven smallest random numbers are marked with an asterisk in column 1 of Table 6. The corresponding mean squares are pooled to form a mean square for error with seven degrees of freedom. This quantity is

MS_E = 0.5088/7 = 0.0727

Finally, column 3 of Table 6 presents the F ratio formed by dividing each of the eight remaining mean squares by MS_E. Now F(0.05, 1, 7) = 5.59, and this implies that five of the eight effects would be judged significant at the 0.05 level. Recall that since the original data came
from a normal distribution with mean zero, none of the effects is different from zero. Analysis methods such as this virtually guarantee erroneous conclusions. The normal probability plotting of effects avoids this invalid pooling of mean squares and provides a simple, easy to interpret method of analysis. Box (1988) provides an alternate analysis of

Table 6. Pooling of Mean Squares

NID(0,1) Random    Mean Squares with One    F0
Numbers            Degree of Freedom
 0.8607                0.7408             10.19
 0.8820                0.7779             10.70
 0.3608*               0.1302
 0.0227*               0.0005
 0.1903*               0.0362
 0.3071*               0.0943
 1.2075                1.4581             20.06
 0.5641                0.3182              4.38
 0.3936*               0.1549
 0.6940                0.4816              6.63
 0.3028*               0.0917
 0.5832                0.3401              4.68
 0.0324*               0.0010
 1.0202                1.0408             14.32
 0.6347                0.4028              5.54

the Quinlan data that correctly reveals E and G to be important, along with other interesting results not apparent in the original analysis. It is important to note that the Taguchi analysis identified negligible factors as significant. This can have a profound impact on our use of experimental design to enhance process knowledge. Experimental design methods should make gaining process knowledge easier, not harder.

Some Final Remarks

In this section we have directed some major criticisms toward the specific methods of experimental design and data analysis used in the Taguchi approach to parameter design. Remember that these comments have focused on technical issues, and that the broad philosophy recommended by Taguchi is inherently sound. On the other hand, while the Taguchi controversy was in full bloom, many companies reported success with the use of Taguchi's parameter design methods. If the methods are flawed, why do they produce successful results? Taguchi advocates often refute criticism with the remark that "they work." We must remember that the "best guess" and "one-factor-at-a-time" methods will also work, and occasionally they produce good results. This is no reason to claim that they are good methods. Most of the successful applications of Taguchi's technical methods have been in industries where there was no history
of good experimental design practice. Designers and developers were using the best guess and one-factor-at-a-time methods (or other unstructured approaches), and since the Taguchi approach is based on the factorial design concept, it often produced better results than the methods it replaced. In other words, the factorial design is so powerful that, even when it is used inefficiently, it will often work well. As pointed out earlier, the Taguchi approach to parameter design often leads to large, comprehensive experiments, often having 70 or more runs. Many of the successful applications of this approach were in industries characterized by a high-volume, low-cost manufacturing environment. In such situations, large designs may not be a real problem if it is really no more difficult to make 72 runs than to make 16 or 32 runs. On the other hand, in industries characterized by low-volume and/or high-cost manufacturing (such as the aerospace industry, chemical and process industries, electronics and semiconductor manufacturing, and so forth), these methodological inefficiencies can be significant. A final point concerns the learning process. If the Taguchi approach to parameter design works and yields good results, we may still not know what has caused the result, because of the aliasing of critical interactions. In other words, we may have solved a problem (a short-term success), but we may not have gained process knowledge, which could be invaluable in future problems. In summary, we should support Taguchi's philosophy of quality engineering. However, we must rely on simpler, more efficient methods that are easier to learn and apply to carry this philosophy into practice. The response surface modeling framework that we present in the textbook is an ideal approach to process optimization and, as we have demonstrated, it is fully adaptable to the robust parameter design problem.

Supplemental References

Leon, R. V., A. C. Shoemaker and R. N. Kackar (1987). "Performance Measures Independent of Adjustment". Technometrics, Vol. 29, pp. 253-265.

Quinlan, J.
(1985). "Product Improvement by Application of Taguchi Methods". Third Supplier Symposium on Taguchi Methods, American Supplier Institute, Inc., Dearborn, MI.

Box, G. E. P. and S. Jones (1992). "Split-Plot Designs for Robust Product Experimentation". Journal of Applied Statistics, Vol. 19, pp. 3-26.

Chapter 2 Supplemental Text Material

S2-1 Models for the Data and the t-Test

The model presented in the text, equation (2-23), is more properly called a means model. Since the mean is a location parameter, this type of model is also sometimes called a location model. There are other ways to write the model for a t-test. One possibility is

y_ij = μ + τ_i + ε_ij,  i = 1, 2;  j = 1, 2, ..., n_i

where μ is a parameter that is common to all observed responses (an overall mean) and τ_i is a parameter that is unique to the ith factor level. Sometimes we call τ_i the ith treatment effect. This model is usually called the effects model. Since the means model is

y_ij = μ_i + ε_ij,  i = 1, 2;  j = 1, 2, ..., n_i

we see that the ith treatment or factor level mean is μ_i = μ + τ_i; that is, the mean response at factor level i is equal to an overall mean plus the effect of the ith factor. We will use both types of models to represent data from designed experiments. Most of the time we will work with effects models, because it's the traditional way to present much of this material. However, there are situations where the means model is useful, and even more natural.

S2-2 Estimating the Model Parameters

Because models arise naturally in examining data from designed experiments, we frequently need to estimate the model parameters. We often use the method of least squares for parameter estimation. This procedure chooses values for the model parameters that minimize the sum of the squares of the errors, ε_ij. We will illustrate this procedure for the means model. For simplicity, assume that the sample sizes for the two factor levels are equal; that is, n1 = n2 = n. The least squares function that must be minimized is

L = Σ(i=1 to 2) Σ(j=1 to n) ε_ij² = Σ(i=1 to 2) Σ(j=1 to n) (y_ij - μ_i)²

Now ∂L/∂μ1 = -2 Σ(j=1 to n) (y_1j - μ1) and ∂L/∂μ2 = -2 Σ(j=1 to n) (y_2j - μ2), and equating these partial derivatives
to zero yields the least squares normal equations

n μ̂1 = Σ(j=1 to n) y_1j
n μ̂2 = Σ(j=1 to n) y_2j

The solution to these equations gives the least squares estimators of the factor level means. The solution is μ̂1 = ȳ1 and μ̂2 = ȳ2; that is, the sample averages at each factor level are the estimators of the factor level means. This result should be intuitive, as we learn early on in basic statistics courses that the sample average usually provides a reasonable estimate of the population mean. However, as we have just seen, this result can be derived easily from a simple location model using least squares. It also turns out that if we assume that the model errors are normally and independently distributed, the sample averages are the maximum likelihood estimators of the factor level means. That is, if the observations are normally distributed, least squares and maximum likelihood produce exactly the same estimators of the factor level means. Maximum likelihood is a more general method of parameter estimation that usually produces parameter estimates that have excellent statistical properties. We can also apply the method of least squares to the effects model. Assuming equal sample sizes, the least squares function is

L = Σ(i=1 to 2) Σ(j=1 to n) ε_ij² = Σ(i=1 to 2) Σ(j=1 to n) (y_ij - μ - τ_i)²

and the partial derivatives of L with respect to the parameters are

∂L/∂μ = -2 Σ(i=1 to 2) Σ(j=1 to n) (y_ij - μ - τ_i)
∂L/∂τ1 = -2 Σ(j=1 to n) (y_1j - μ - τ1)
∂L/∂τ2 = -2 Σ(j=1 to n) (y_2j - μ - τ2)

Equating these partial derivatives to zero results in the following least squares normal equations:

2n μ̂ + n τ̂1 + n τ̂2 = Σ(i=1 to 2) Σ(j=1 to n) y_ij
n μ̂ + n τ̂1 = Σ(j=1 to n) y_1j
n μ̂ + n τ̂2 = Σ(j=1 to n) y_2j

Notice that if we add the last two of these normal equations we obtain the first one. That is, the normal equations are not linearly independent, and so they do not have a unique solution. This has occurred because the effects model is overparameterized. This situation occurs frequently; that is, the effects model for an experiment will always be an overparameterized model. One way to deal with this problem is to add another linearly independent equation to the normal equations.
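The consequences of the overparameterization, and of the constraint used to resolve it, are easy to see numerically. Here is a small sketch in Python (the data values below are hypothetical, not from the text) comparing two different constraints:

```python
# Solving the effects-model normal equations under two different
# constraints. The observations below are made-up illustration data.
y1 = [16.85, 16.40, 17.21]   # hypothetical data, factor level 1
y2 = [16.62, 16.75, 17.37]   # hypothetical data, factor level 2
n = len(y1)
ybar1, ybar2 = sum(y1) / n, sum(y2) / n
ybar = (sum(y1) + sum(y2)) / (2 * n)   # grand average

# Constraint 1: tau1_hat + tau2_hat = 0
mu_a = ybar
tau1_a, tau2_a = ybar1 - ybar, ybar2 - ybar

# Constraint 2: mu_hat = 0
mu_b = 0.0
tau1_b, tau2_b = ybar1, ybar2

# Both solutions satisfy the normal equations...
assert abs(2*n*mu_a + n*(tau1_a + tau2_a) - (sum(y1) + sum(y2))) < 1e-9
assert abs(2*n*mu_b + n*(tau1_b + tau2_b) - (sum(y1) + sum(y2))) < 1e-9

# ...and both give identical estimates of the estimable factor-level
# means mu_i = mu + tau_i, namely the sample averages ybar1 and ybar2.
assert abs((mu_a + tau1_a) - (mu_b + tau1_b)) < 1e-12
assert abs((mu_a + tau2_a) - (mu_b + tau2_b)) < 1e-12
print('both constraints give the same fitted factor-level means')
```

The individual estimates μ̂, τ̂1, τ̂2 differ between the two solutions, but the sums μ̂ + τ̂_i agree, which is exactly the estimability idea discussed next.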
The most common way to do this is to use the equation τ̂1 + τ̂2 = 0. This is, in a sense, an intuitive choice, as it essentially defines the factor effects as deviations from the overall mean μ. If we impose this constraint, the solution to the normal equations is

μ̂ = ȳ
τ̂_i = ȳ_i - ȳ,  i = 1, 2

That is, the overall mean is estimated by the average of all 2n sample observations, while each individual factor effect is estimated by the difference between the sample average for that factor level and the average of all observations. This is not the only possible choice for a linearly independent constraint for solving the normal equations. Another possibility is to simply set the overall mean equal to a constant, such as μ̂ = 0. This results in the solution

μ̂ = 0
τ̂_i = ȳ_i,  i = 1, 2

There are an infinite number of possible constraints that could be used to solve the normal equations. An obvious question is: which solution should we use? It turns out that it really doesn't matter. For each of the solutions above (indeed, for any solution to the normal equations) we have

μ̂_i = μ̂ + τ̂_i = ȳ_i,  i = 1, 2

That is, the least squares estimator of the mean of the ith factor level will always be the sample average of the observations at that factor level. So even if we cannot obtain unique estimates for the parameters in the effects model, we can obtain unique estimators of a function of these parameters that we are interested in. We say that the mean of the ith factor level is estimable. Any function of the model parameters that can be uniquely estimated, regardless of the constraint selected to solve the normal equations, is called an estimable function. This is discussed in more detail in Chapter 3.

S2-3 A Regression Model Approach to the t-Test

The two-sample t-test can be presented from the viewpoint of a simple linear regression model. This is a very instructive way to think about the t-test, as it fits in nicely with the general notion of a factorial experiment with factors at two levels, such as the golf experiment described in
Chapter 1. This type of experiment is very important in practice and is discussed extensively in subsequent chapters. In the t-test scenario, we have a factor x with two levels, which we can arbitrarily call low and high. We will use x = -1 to denote the low level of this factor and x = +1 to denote the high level of this factor. The figure below is a scatter plot (from Minitab) of the portland cement mortar tension bond strength data in Table 2-1 of Chapter 2.

Figure 2-3.1. Scatter plot of bond strength versus factor level

We will fit a simple linear regression model to these data, say

y_ij = β0 + β1 x_ij + ε_ij

where β0 and β1 are the intercept and slope, respectively, of the regression line, and the regressor or predictor variable takes the values x_1j = -1 and x_2j = +1. The method of least squares can be used to estimate the slope and intercept in this model. Assuming that we have equal sample sizes n for each factor level, the least squares normal equations are

2n β̂0 = Σ(i=1 to 2) Σ(j=1 to n) y_ij
2n β̂1 = Σ(j=1 to n) y_2j - Σ(j=1 to n) y_1j

The solution to these equations is

β̂0 = ȳ
β̂1 = (1/2)(ȳ2 - ȳ1)

Note that the least squares estimator of the intercept is the average of all the observations from both samples, while the estimator of the slope is one-half of the difference between the sample averages at the high and low levels of the factor x. Below is the output from the linear regression procedure in Minitab for the tension bond strength data.

Regression Analysis: Bond Strength versus Factor level

The regression equation is
Bond Strength = 16.9 + 0.139 Factor level

Predictor         Coef    SE Coef       T      P
Constant       16.9030     0.0636  265.93  0.000
Factor level   0.13900    0.06356    2.19  0.042

S = 0.284253   R-Sq = 21.0%   R-Sq(adj) = 16.6%

Analysis of Variance

Source          DF       SS       MS     F      P
Regression       1  0.38642  0.38642  4.78  0.042
Residual Error  18  1.45440  0.08080
Total           19  1.84082

Notice that the estimate of the slope (given in the column labeled Coef and the row labeled Factor level above) is 0.139 = (1/2)(ȳ2 - ȳ1) = (1/2)(17.0420 - 16.7640), and the estimate of the intercept is 16.9030. Furthermore, notice that
the t statistic associated with the slope is equal to 2.19, exactly the same value (apart from sign) that we gave in the Minitab two-sample t-test output in Table 2-2 in the text. Now in simple linear regression, the t-test on the slope is actually testing the hypotheses

H0: β1 = 0
H1: β1 ≠ 0

and this is equivalent to testing H0: μ1 = μ2. It is easy to show that the t-test statistic used for testing that the slope equals zero in simple linear regression is identical to the usual two-sample t-test. Recall that to test the above hypotheses in simple linear regression the t statistic is

t0 = β̂1 / (S_p / √S_xx)

where S_xx = Σ(i=1 to 2) Σ(j=1 to n) (x_ij - x̄)² is the corrected sum of squares of the x's. Now in our specific problem, x̄ = 0, x_1j = -1, and x_2j = +1, so S_xx = 2n. Therefore, since we have already observed that the estimate of σ is just S_p,

t0 = β̂1 / (S_p / √(2n)) = (1/2)(ȳ2 - ȳ1) / (S_p / √(2n)) = (ȳ2 - ȳ1) / (S_p √(2/n))

This is the usual two-sample t-test statistic for the case of equal sample sizes.

S2-4 Constructing Normal Probability Plots

While we usually generate normal probability plots using a computer software program, occasionally we have to construct them by hand. Fortunately, it's relatively easy to do, since specialized normal probability plotting paper is widely available. This is just graph paper with the vertical (or probability) scale arranged so that if we plot the cumulative normal probabilities (j - 0.5)/n on that scale versus the rank-ordered observations y_(j), a graph equivalent to the computer-generated normal probability plot will result. The table below shows the calculations for the unmodified portland cement mortar bond strength data.

 j    y_(j)    (j - 0.5)/10    z_(j)
 1    16.62        0.05        -1.64
 2    16.75        0.15        -1.04
 3    16.87        0.25        -0.67
 4    16.98        0.35        -0.39
 5    17.02        0.45        -0.13
 6    17.08        0.55         0.13
 7    17.12        0.65         0.39
 8    17.27        0.75         0.67
 9    17.34        0.85         1.04
10    17.37        0.95         1.64

Now if we plot the cumulative probabilities from the next-to-last column of this table versus the rank-ordered observations from the second column on normal probability paper, we will produce a graph that is identical to the results for the unmodified mortar formulation that
is shown in Figure 2-11 in the text. A normal probability plot can also be constructed on ordinary graph paper by plotting the standardized normal z scores z_(j) against the ranked observations, where the standardized normal z scores are obtained from

P(Z ≤ z_(j)) = (j - 0.5)/n = Φ(z_(j))

where Φ denotes the standard normal cumulative distribution. For example, if (j - 0.5)/n = 0.05, then Φ(z_(j)) = 0.05 implies that z_(j) = -1.64. The last column of the above table displays the values of the normal z scores. Plotting these values against the ranked observations on ordinary graph paper will produce a normal probability plot equivalent to the unmodified mortar results in Figure 2-11. As noted in the text, many statistics computer packages present the normal probability plot this way.

S2-5 More About Checking Assumptions in the t-Test

We noted in the text that a normal probability plot of the observations was an excellent way to check the normality assumption in the t-test. Instead of plotting the observations, an alternative is to plot the residuals from the statistical model. Recall that the means model is

y_ij = μ_i + ε_ij,  i = 1, 2;  j = 1, 2, ..., n

and that the estimates of the parameters (the factor level means) in this model are the sample averages. Therefore, we could say that the fitted model is

ŷ_ij = ȳ_i,  i = 1, 2 and j = 1, 2, ..., n

That is, an estimate of the ijth observation is just the average of the observations in the ith factor level. The difference between the observed value of the response and the predicted (or fitted) value is called a residual, say

e_ij = y_ij - ȳ_i,  i = 1, 2

The table below computes the values of the residuals from the portland cement mortar tension bond strength data.

 j    y_1j    e_1j = y_1j - 16.76    y_2j    e_2j = y_2j - 17.04
 1    16.85          0.09           16.62        -0.42
 2    16.40         -0.36           16.75        -0.29
 3    17.21          0.45           17.37         0.33
 4    16.35         -0.41           17.12         0.08
 5    16.52         -0.24           16.98        -0.06
 6    17.04          0.28           16.87        -0.17
 7    16.96          0.20           17.34         0.30
 8    17.15          0.39           17.02        -0.02
 9    16.59         -0.17           17.08         0.04
10    16.57         -0.19           17.27         0.23

The figure below is a normal probability plot of these residuals from Minitab.

Figure: Normal Probability Plot of the Residuals (response is Bond
Strength)

As noted in Section S2-3 above, we can compute the t-test statistic using a simple linear regression model approach. Most regression software packages will also compute a table or listing of the residuals from the model. The residuals from the Minitab regression model fit obtained previously are as follows:

       Factor      Bond
Obs     level   Strength       Fit   SE Fit   Residual   St Resid
  1     -1.00    16.8500   16.7640   0.0899     0.0860       0.32
  2     -1.00    16.4000   16.7640   0.0899    -0.3640      -1.35
  3     -1.00    17.2100   16.7640   0.0899     0.4460       1.65
  4     -1.00    16.3500   16.7640   0.0899    -0.4140      -1.54
  5     -1.00    16.5200   16.7640   0.0899    -0.2440      -0.90
  6     -1.00    17.0400   16.7640   0.0899     0.2760       1.02
  7     -1.00    16.9600   16.7640   0.0899     0.1960       0.73
  8     -1.00    17.1500   16.7640   0.0899     0.3860       1.43
  9     -1.00    16.5900   16.7640   0.0899    -0.1740      -0.65
 10     -1.00    16.5700   16.7640   0.0899    -0.1940      -0.72
 11      1.00    16.6200   17.0420   0.0899    -0.4220      -1.56
 12      1.00    16.7500   17.0420   0.0899    -0.2920      -1.08
 13      1.00    17.3700   17.0420   0.0899     0.3280       1.22
 14      1.00    17.1200   17.0420   0.0899     0.0780       0.29
 15      1.00    16.9800   17.0420   0.0899    -0.0620      -0.23
 16      1.00    16.8700   17.0420   0.0899    -0.1720      -0.64
 17      1.00    17.3400   17.0420   0.0899     0.2980       1.11
 18      1.00    17.0200   17.0420   0.0899    -0.0220      -0.08
 19      1.00    17.0800   17.0420   0.0899     0.0380       0.14
 20      1.00    17.2700   17.0420   0.0899     0.2280       0.85

The column labeled Fit contains the averages of the two samples, computed to four decimal places. The residuals in the sixth column of this table are the same (apart from rounding) as we computed manually.

S2-6 Some More Information about the Paired t-Test

The paired t-test examines the difference between two variables and tests whether the mean of those differences differs from zero. In the text we show that the mean of the differences, μ_d, is identical to the difference of the means of two independent samples, μ1 - μ2. However, the variance of the differences is not the same as would be observed if there were two independent samples. Let d̄ be the sample average of the differences. Then

V(d̄) = V(ȳ1) + V(ȳ2) - 2 Cov(ȳ1, ȳ2) = 2σ²(1 - ρ)/n

assuming that both populations have the same variance σ² and that ρ is the
correlation between the two random variables y1 and y2. The quantity S_d²/n estimates the variance of the average difference d̄. In many paired experiments, a strong positive correlation is expected to exist between y1 and y2 because both factor levels have been applied to the same experimental unit. When there is positive correlation within the pairs, the denominator for the paired t-test will be smaller than the denominator for the two-sample (or independent) t-test. If the two-sample test is applied incorrectly to paired samples, the procedure will generally understate the significance of the data. Note also that while for convenience we have assumed that both populations have the same variance, the assumption is really unnecessary; the paired t-test is valid when the variances of the two populations are different.

Chapter 7 Supplemental Text Material

S7-1 The Error Term in a Blocked Design

Just as in any randomized complete block design, when we run a replicated factorial experiment in blocks, we are assuming that there is no interaction between treatments and blocks. In the RCBD with a single design factor (Chapter 4), the error term is actually the interaction between treatments and blocks. This is also the case in a factorial design. To illustrate, consider the ANOVA in Table 7-2 of the textbook. The design is a 2^2 factorial run in three complete blocks. Each block corresponds to a replicate of the experiment. There are six degrees of freedom for error. Two of those degrees of freedom are the interaction between blocks and factor A, two degrees of freedom are the interaction between blocks and factor B, and two degrees of freedom are the interaction between blocks and the AB interaction. In order for the error term here to truly represent random error, we must assume that blocks and the design factors do not interact.

S7-2 The Prediction Equation for a Blocked Design

Consider the prediction equation for the 2^4 factorial in two blocks with ABCD confounded, from Example 7-2. Since blocking does not
impact the effect estimates from this experiment, the equation would be exactly the same as the one obtained from the unblocked design of Example 6.2. This prediction equation is

ŷ = 70.06 + 10.8125 x1 + 4.9375 x3 + 7.3125 x4 − 9.0625 x1 x3 + 8.3125 x1 x4

This equation would be used to predict future observations where we had no knowledge of the block effect. However, in the experiment just completed we know that there is a strong block effect; in fact, the block effect was computed as

block effect = ȳ_block1 − ȳ_block2 = −18.625

That is, the difference in average response between the two blocks is −18.625. We should compensate for this in the prediction equation if we want to obtain the correct fitted values for block 1 and block 2. Defining a separate block effect for each block does this:

block 1 effect = −9.3125 and block 2 effect = +9.3125

These block effects are added to the intercept in the prediction equation for each block. Thus the prediction equations are

ŷ_block1 = 70.06 + block 1 effect + 10.8125 x1 + 4.9375 x3 + 7.3125 x4 − 9.0625 x1 x3 + 8.3125 x1 x4
         = 70.06 − 9.3125 + 10.8125 x1 + 4.9375 x3 + 7.3125 x4 − 9.0625 x1 x3 + 8.3125 x1 x4
         = 60.7475 + 10.8125 x1 + 4.9375 x3 + 7.3125 x4 − 9.0625 x1 x3 + 8.3125 x1 x4

and

ŷ_block2 = 70.06 + block 2 effect + 10.8125 x1 + 4.9375 x3 + 7.3125 x4 − 9.0625 x1 x3 + 8.3125 x1 x4
         = 70.06 + 9.3125 + 10.8125 x1 + 4.9375 x3 + 7.3125 x4 − 9.0625 x1 x3 + 8.3125 x1 x4
         = 79.3725 + 10.8125 x1 + 4.9375 x3 + 7.3125 x4 − 9.0625 x1 x3 + 8.3125 x1 x4

S7-3. Run Order is Important

Blocking is really all about experimental run order. Specifically, we run an experiment in blocks to provide protection against the effects of a known and controllable nuisance factor. However, in many experimental situations it is a good idea to conduct the experiment in blocks even though there is no obvious nuisance factor present. This is particularly important when it takes several time periods (days, shifts, weeks, etc.) to run the experiment.

To illustrate, suppose that we are conducting a single replicate of a 2^4 factorial design. The experiment, in run order, is shown in Table 2. Now suppose that misfortune strikes the experimenter, and after the first eight trials
have been performed it becomes impossible to complete the experiment. Is there any useful experimental design that can be formed from the first eight runs?

Table 2. A 2^4 Factorial Experiment

It turns out that in this case the answer to that question is no. Some analysis can, of course, be performed, but it would basically consist of fitting a regression model to the response data from the first eight trials. Suppose that we fit a regression model containing an intercept term and the four main effects. (When things have gone wrong, it is usually a good idea to focus on simple objectives, making use of the data that are available.) It turns out that in that model we would actually be obtaining estimates of

[Intercept] = Intercept + AB + CD + ABCD
[A] = A + AB + BC + ABC + ACD + BCD
[B] = B + AB + BC + ABC
[C] = C + ABC + ACD + BCD
[D] = D + ABD + ACD + BCD

Now suppose we feel comfortable in ignoring the three-factor and four-factor interaction effects. However, even with these assumptions, our intercept term is clouded, or confused, with two of the two-factor interactions, and the main effects of factors A and B are confused with the other two-factor interactions. In the next chapter we will refer to the phenomenon observed here by its proper name: aliasing of effects. The supplemental notes for Chapter 8 present a general method for deriving the aliases for the factor effects. The Design-Expert software package can also be used to generate the aliases by employing the Design Evaluation feature. Notice that in our example, not completing the experiment as originally planned has really disturbed the interpretation of the results.

Suppose that instead of completely randomizing all 16 runs, the experimenter had set this 2^4 design up in two blocks of eight runs each, selecting in the usual way the ABCD interaction to be confounded with blocks. Now if only the first eight runs can be performed, then it turns out that the estimates of the intercept and main factor effects from these eight runs are

[Intercept] = Intercept
[A] = A + BCD
[B] = B + ACD
[C] = C + ABD
[D] = D + ABC

If we assume that the three-factor
interactions are negligible, then we have reliable estimates of all four main effects from the first eight runs. The reason for this is that each block of this design forms a one-half fraction of the 2^4 factorial, and this fraction allows estimation of the four main effects free of any two-factor interaction aliasing. This specific design (the one-half fraction of the 2^4) will be discussed in considerable detail in Chapter 8.

This illustration points out the importance of thinking carefully about run order, even when the experimenter is not obviously concerned about nuisance variables and blocking. Remember: if something can go wrong when conducting an experiment, it probably will. A prudent experimenter designs his or her experiment with this in mind. Generally, if a 2^k factorial design is constructed in two blocks and one of the blocks is lost, ruined, or never run, the 2^k/2 = 2^(k-1) runs that remain will always form a one-half fraction of the original design. It is almost always possible to learn something useful from such an experiment.

To take this general idea a bit further, suppose that we had originally set up the 16-run 2^4 factorial experiment in four blocks of four runs each. The design that we would obtain using the standard methods from this chapter in the text is shown in Table 3. Now suppose that, for some reason, we can only run the first eight trials of this experiment. It is easy to verify that the first eight trials in Table 3 do not form one of the usual eight-run blocks produced by confounding the ABCD interaction with blocks. Therefore, the first eight runs in Table 3 are not a standard one-half fraction of the 2^4. A logical question is: what can we do with these eight runs? Suppose, as before, that the experimenter elects to concentrate on estimating the four main effects. If we use only the first eight runs from Table 3, it turns out that what we are really estimating is

[Intercept] = Intercept + ACD
[A] = A + CD
[B] = B + ABCD
[C] = C + AD
[D] = D + AC

Once again, even assuming that all
interactions beyond order two are negligible, our main-effect estimates are aliased with two-factor interactions.

Table 3. A 2^4 Factorial Experiment in Four Blocks

If we were able to obtain 12 of the original 16 runs (that is, the first three blocks of Table 3), then we can estimate

[Intercept] = Intercept + 0.333 AB + 0.333 ACD + 0.333 BCD
[A] = A + ABCD
[B] = B + ABCD
[C] = C + ABC
[D] = D + ABD
[AC] = AC + ABD
[AD] = AD + ABC
[BC] = BC + ABD
[BD] = BD + ABC
[CD] = CD − ABCD

If we can ignore three- and four-factor interactions, then we can obtain good estimates of all four main effects and five of the six two-factor interactions. Once again, setting up and running the experiment in blocks has proven to be a good idea, even though no nuisance factor was anticipated. Finally, we note that it is possible to assemble three of the four blocks from Table 3 to obtain a 12-run experiment that is slightly better than the one illustrated above. This would actually be called a 3/4 fraction of the 2^4, an irregular fractional factorial design. These designs are mentioned briefly in the Chapter 8 exercises.
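The claim in S7-3 that each ABCD-confounded block of the 2^4 forms a one-half fraction whose main effects are free of two-factor interaction aliasing can be checked numerically. The following sketch is illustrative only (it is not part of the original text) and assumes NumPy is available:

```python
import itertools
import numpy as np

# Full 2^4 factorial in coded (-1, +1) units: 16 runs, 4 factor columns.
runs = np.array(list(itertools.product([-1, 1], repeat=4)))
A, B, C, D = runs.T
ABCD = A * B * C * D  # the interaction confounded with blocks

# Block 1: the eight runs with ABCD = +1.
block1 = runs[ABCD == 1]
a, b, c, d = block1.T

# Within this block each main-effect column equals the product of the
# other three columns, i.e. A is aliased with BCD, B with ACD, etc.
assert np.array_equal(a, b * c * d)  # [A] = A + BCD
assert np.array_equal(b, a * c * d)  # [B] = B + ACD
assert np.array_equal(c, a * b * d)  # [C] = C + ABD
assert np.array_equal(d, a * b * c)  # [D] = D + ABC

# The four main-effect columns are mutually orthogonal within the block
# (X'X = 8I), so no main effect is aliased with another main effect or
# with a two-factor interaction.
X = block1
assert np.array_equal(X.T @ X, 8 * np.eye(4))
```

Using the mask ABCD == -1 instead gives the other block, which is the complementary half fraction with the same aliasing structure.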