Design of Experiments
Design of Experiments STAT 3140
This 4 page Class Notes was uploaded by Tyshawn Gorczany on Sunday October 25, 2015. The Class Notes belongs to STAT 3140 at University of North Carolina - Charlotte taught by Staff in Fall. Since its upload, it has received 18 views. For similar materials see /class/229027/stat-3140-university-of-north-carolina-charlotte in Statistics at University of North Carolina - Charlotte.
Date Created: 10/25/15
Confidence Intervals and Hypothesis Testing for Population Means and Differences between Population Means

In this course we are interested in determining the differences between population means of dependent variable values associated with different factor levels. We would also like to be able to give a meaningful estimate for these means. Furthermore, we are working under the assumption that the errors in our models are normally distributed with a common variance that has to be estimated from the data. This means that when calculating confidence intervals or performing hypothesis tests, the appropriate test statistic will be a t-statistic with some df (degrees of freedom). Let's review, then, how to compute confidence intervals and how to perform hypothesis tests in this specialized setting.

Confidence Intervals

For a single population mean of a normally distributed random variable with \(\sigma\) unknown

Illustrative Example: Suppose that the price of regular gasoline in a given area in a given week is normally distributed, and we wish to estimate the mean price of a gallon of gasoline in this region for the first week of August. If we randomly select ten (10) gas stations in the area and determine what they are charging for a gallon of regular gas, then the average of those ten prices should give us an idea of what the overall average price of a gallon of regular in the area for the week is. However, we know that if we take another random sample of ten stations and calculate the average price for them, it very possibly might differ from the average determined by our first sample. This means that the average price of gas in the region during the first week of August for random samples of ten (10) stations is actually a random variable and not something fixed, just as the price at an individual station is a random variable (one which we have assumed is normally distributed). At this point we might want to know, just out of curiosity, what the distribution is of the random variable of
average gas prices for ten randomly selected gas stations from our area during the first week of August (quite a mouthful).

Let's put curiosity aside for the moment, back up, and suppose that instead we want to estimate the mean price of regular gas by just picking one station at random and using its price in some way. For example, suppose George's BP charges $1.81 (we're rounding up). Now we're pretty certain that $1.81 is not the true average for the entire region. It could be, but it isn't too likely. Therefore the statement that the mean price \(\mu\) equals $1.81 is almost certainly false. In order to have a reasonable chance of saying anything true, we could soften our statement to say the mean is nearly, or approximately, $1.81. However, this makes our statement somewhat vague, because we haven't pinned down what "approximately" means. Maybe we could say that the mean price is between two numbers, say between $1.71 and $1.91 ($1.81 ± $0.10). But we can't assert this with certainty either, because we originally assumed that our gas prices were normally distributed, and normal distributions assign some nonzero probability to values outside any finite interval. Of course, you might now say that assumption was stupid, because anybody knows that the price certainly is between $0 and, say, $10. But we aren't willing to throw this assumption out; besides, a more precise statement of what we meant by it is that, except for a range of prices of probability so small that they are completely negligible, the random variable of prices is normal. Thus a statement like "the true mean is between $1.71 and $1.91 with some high probability" is something that we might be able to say.

Thus we are willing to say to someone: you tell us (or we pick in advance) a confidence level (90%, 95%, 99%, 99.9%, whatever), and we will tell you how to pick an interval (range of values) such that the true population mean lies within the interval 90%, 95%, 99%, 99.9%, whatever, of the time. For technical reasons we prefer to talk about a confidence level \(c = (1 - \alpha)100\%\), so if \(\alpha = .05\) then \(c = 95\%\). How should we go
about choosing a \(c\) confidence interval? Returning to the gas price problem in particular, how should we choose a \(c\) confidence interval for gas price based on the price at a single gas station? If we don't know any more than we have revealed so far, we can't; but if we have some information concerning the variance of the original distribution, we can use it. Just for discussion purposes, let's suppose that we know the variance completely, i.e., we know the value of \(\sigma^2\) and hence \(\sigma\). Then we know that

\[ P(x - E < \mu < x + E) = P(\mu - E < x < \mu + E) = P\left(-\frac{E}{\sigma} < z < \frac{E}{\sigma}\right), \]

where \(z = \frac{x - \mu}{\sigma}\) is the so-called z-score. Under our assumptions, \(z\) is distributed according to a normal distribution with a mean of 0 and a standard deviation of 1 (the standard normal distribution). So we need to answer the question: given confidence level \(c\), what value of \(E\) is such that \(P\left(-\frac{E}{\sigma} < z < \frac{E}{\sigma}\right) = c\)? Of course, since we know \(\sigma\), this is the same as asking what value of \(z_c = \frac{E}{\sigma}\) is such that \(P(-z_c < z < z_c) = c\). From what we know about the standard normal distribution, we are asking for that value of \(z\) such that \(\frac{\alpha}{2}\) units of area lie above it (or below its negative). Once we have found \(z_c\), we then take \(E = z_c \sigma\), and this would solve our problem.

Thus for the gas prices, suppose that we knew that \(\sigma = .2\) and we want a 95% confidence interval. Then \(z_c = 1.96\) (from tables or from Excel) and \(E = z_c \sigma = 1.96(.2) = .392\), and using the price at George's BP, the interval with end points \(1.81 \pm .39\) is a 95% confidence interval. We cannot be certain that it contains the true mean, but we know that if we generate intervals in this way, then 95% of the time the resulting interval will contain the true mean.

In reality, though, we aren't going to know \(\sigma\), and we are going to have to take a sample bigger than 1 in order to learn anything about it. But taking a sample helps in two ways: first, it gives us a way to estimate \(\sigma\), and second, the distribution of sample means of samples of size \(n\) drawn at random from a population has the same mean as the population and a much tighter (smaller) variance,
which means that using a point from the distribution of sample means of samples of size \(n\) to estimate the mean of that distribution will give us an interval estimate for the original distribution's mean with a much smaller width than we could get if we used a point from the original distribution. Here is the result that justifies these statements: Let \(X\) be a normally distributed random variable with mean \(\mu\) and standard deviation \(\sigma\), and let \(\bar{X}\) be the random variable of means of samples of size \(n\). Then the mean of \(\bar{X}\) is \(\mu\) and the standard deviation of \(\bar{X}\) is \(\frac{\sigma}{\sqrt{n}}\). Thus a confidence interval based on a sample of size 100 will have 1/10th the size of one based on a single value from the original distribution.

Now there is one more complication and then we are done. We don't know \(\sigma\); we have to use the sample standard deviation \(s\) to approximate \(\sigma\). But this introduces more variability, which we need to take into account in calculating the confidence interval width. Our estimate for the standard deviation of the distribution of sample means of samples of size \(n\) is \(s_{\bar{x}} = \frac{s}{\sqrt{n}}\); we refer to \(s_{\bar{x}}\) as the standard error of estimate. We still have

\[ P(\bar{x} - E < \mu < \bar{x} + E) = P(\mu - E < \bar{x} < \mu + E) = P\left(-\frac{E}{s_{\bar{x}}} < t < \frac{E}{s_{\bar{x}}}\right). \]

Here \(t = \frac{\bar{x} - \mu}{s_{\bar{x}}}\), but despite its similarity to \(z\), \(t\) is not distributed like a standard normal; instead it follows a Student's t distribution with degrees of freedom (df) equal to \(n - 1\). Inside this distribution we now seek a critical value \(t_c\) such that \(P(-t_c < t < t_c) = c\), and then we can take \(E = t_c s_{\bar{x}} = t_c \frac{s}{\sqrt{n}}\).

Returning to our gas price example, suppose a random sample of 10 gas stations yields a mean \(\bar{x} = 1.81\) and \(s = .2\); then \(s_{\bar{x}} = \frac{.2}{\sqrt{10}} = .06325\) and \(df = 10 - 1 = 9\). So \(t_c = 2.262\) if \(c = 95\%\), and \(E = t_c s_{\bar{x}} = 2.262(.06325) = .143\), so our 95% confidence interval would have endpoints \(1.81 \pm .143\).

When calculating confidence intervals after an ANOVA, we may use MSW as a pooled estimate for the population variance. The square root of this value may be used for \(s\) in the calculation of a confidence interval for a column mean, and \(\bar{x}\) can be taken to be that column's mean.
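Stepping back to the gas price example for a moment, the two intervals worked out there can be checked numerically. What follows is a minimal Python sketch using only the standard library, not the method used in these notes (which rely on tables or Excel). The function names `t_pdf`, `t_cdf`, `t_crit`, `z_interval` and `t_interval` are illustrative. Finding the t critical value by Simpson's-rule integration of the Student's t density followed by bisection is an assumption made for this sketch.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on [0, |t|] and symmetry about 0."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def t_crit(c, df):
    """Two-sided critical value t_c with P(-t_c < t < t_c) = c, by bisection.

    The bracket [0, 100] is an assumption that is adequate for the
    confidence levels and df used in this example.
    """
    target = 0.5 + c / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def z_interval(x, sigma, c):
    """Interval x +/- z_c * sigma from a single value x, sigma known."""
    z_c = NormalDist().inv_cdf(0.5 + c / 2)
    e = z_c * sigma
    return x - e, x + e


def t_interval(xbar, s, n, c):
    """Interval xbar +/- t_c * s / sqrt(n), sigma estimated by s, df = n - 1."""
    e = t_crit(c, n - 1) * s / math.sqrt(n)
    return xbar - e, xbar + e


if __name__ == "__main__":
    # Single station at 1.81 with sigma = .2 assumed known
    print(z_interval(1.81, 0.2, 0.95))
    # Sample of 10 stations with mean 1.81 and s = .2
    print(t_interval(1.81, 0.2, 10, 0.95))
```

Run as a script, this should reproduce, up to rounding, the half-widths .392 and .143 from the worked example. Note how much narrower the second interval is, even though \(t_c\) is larger than \(z_c\), because the standard error is divided by \(\sqrt{10}\).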
The degrees of freedom for the appropriate t distribution after an ANOVA, however, will be \(RC - C\) (the df of SSW, for data laid out in \(R\) rows and \(C\) columns), which is greatly to our advantage: more degrees of freedom mean a smaller critical value and hence a narrower interval. Let's use these guidelines to calculate confidence intervals for the factor level means (the column means) in the Golfcourse data in Excel file PB2-16. Here is a summary of the steps:

1. Perform an ANOVA.
2. On the result page for the ANOVA, enter alpha in a convenient cell and calculate tcrit using the TINV function with inputs alpha and df (df = df of SSW from the ANOVA report table). Calculate the error term as \(E = t_{crit}\sqrt{MSW/n}\), with \(n\) the number of observations in a column.
3. In the fifth row, choose contiguous blank cells and then
   a. enter the formula (column 1 mean) - E in the first cell;
   b. enter the formula (column 1 mean) + E in the second cell.
4. Fill down for all the rows corresponding to columns in the factor analysis table.

Now let's do hypothesis tests for differences in the column means. For the Golfcourse data in PB2-15