Class Note for STAT 528 at OSU 10
This 13 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Ohio State University taught by a professor in Fall. Since its upload, it has received 22 views.
Date Created: 02/06/15
Stat 528 (Autumn 2008)
Inference for the mean of a population: one-sample t procedures

Reading: Section 7.1

• Inference for the mean of a population
• The t distribution for a normal population
• Small-sample CI for $\mu$ in a normal population
• Robustness of the t procedures
• Testing hypotheses about a single mean (the one-sample t-test)
• Methods for matched pairs
  - The paired t-test
  - The sign test for matched pairs
• The power of the one-sample t-test

Inference for the mean of a population

• So far we have based inference for the population mean on the Z statistic
  $$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
  For large $n$, $Z$ is approximately $N(0, 1)$.
• Problem: in practice we do not know the population standard deviation, $\sigma$.
  - Instead we use the sample standard deviation, $s$, as an estimate for $\sigma$.

The distribution of t for a normal population

• Let $X_1, X_2, \ldots, X_n$ be an SRS from a normal population with population mean $\mu$. Then the standardized variable
  $$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
  has a t distribution with $n - 1$ degrees of freedom (df).
• The impact of estimating $\sigma$ is to add uncertainty about our standardization.
• Smaller $n$ leads to fewer degrees of freedom and less certainty.
• We say that $t$ has a $t_{n-1}$ distribution.
• The quantity $s/\sqrt{n}$ is the estimated standard error for the sample mean.
  - It is denoted SE Mean in MINITAB.

Properties of the t distribution

[Figure: probability density curves of the standard normal distribution and of t distributions with 5, 2, and 1 df; horizontal axis: value, vertical axis: probability density.]

• The density curve is symmetric with mean zero and is bell-shaped like the normal distribution.
• The t distribution has heavier tails than the normal distribution (more spread out about zero).
• As the degrees of freedom increase, the tails become thinner and more of the density is concentrated in the center of the distribution ($t_\infty$ is the standard normal distribution).

A small-sample CI for $\mu$: the normal population case

• For one random sample of normal data, a $C = 100(1 - \alpha)\%$ level confidence interval for $\mu$ is given by
  $$\bar{x} \pm t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}},$$
  where $t_{n-1,\alpha/2}$ is the critical value of the t distribution with $n - 1$ degrees of freedom.
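The tail behaviour listed under "Properties of the t distribution" can be checked numerically, and it is also why the t critical value in the interval above is larger than the corresponding normal one. The following is only a minimal sketch using the standard formula for the t density; the function names `t_density` and `normal_density` are illustrative and do not come from the notes.

```python
import math


def t_density(x: float, df: float) -> float:
    """Density at x of the t distribution with df degrees of freedom."""
    # Work on the log scale so that large df does not overflow the gamma function.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def normal_density(x: float) -> float:
    """Density at x of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    # Tail comparison three units from the center.
    for df in (1, 2, 5, 30, 1000):
        print(f"t with {df:>4} df: density at 3 = {t_density(3.0, df):.5f}")
    print(f"standard normal: density at 3 = {normal_density(3.0):.5f}")
```

Running it prints tail densities that shrink toward the normal value as the degrees of freedom grow, matching the figure's ordering of the 1, 2, and 5 df curves.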
• The critical value $t_{n-1,\alpha/2}$ is tabulated in Table D. Either
  1. look at the bottom of the table for the confidence level $C$ of the two-sided interval, or
  2. look up $\alpha/2$ as the upper-tail probability $p$.
• Recall that the CI for $\mu$ comes from a family of hypothesis tests about $\mu$.

Robustness of the t procedures

• What if the population is not normal? Can we still use the t distribution?
• Practical guidelines from the textbook:
  1. $n < 15$: Use t procedures if the data are close to normal. If the data are clearly non-normal or if outliers are present, do not use the t procedures.
  2. $n \ge 15$: Use t procedures except in the presence of strong skewness or outliers.
  3. Roughly $n \ge 40$: The t procedures are valid even for clearly skewed distributions.
• Use plots of the data to help you decide!

Polymerization example

The article "Measuring and understanding the aging of kraft insulating paper in power transformers" contained the following observations on the degree of polymerization for paper specimens for which viscosity times concentration fell in a certain middle range:

418 421 421 422 425 427 431 434 437 439 446 447 448 453 454 463 465

Plots of the data show that a normality assumption for the data is reasonable. Note that $\bar{x} = 438.29$, $s = 15.14$, $n = 17$.

Form a 95% confidence interval for the true average degree of polymerization (as did the authors of the article). Does the interval suggest that 440 is a plausible value for the true average degree of polymerization? What about 450?

Testing hypotheses about a single mean: the one-sample t-test

• Data: We assume $x_1, x_2, \ldots, x_n$ is a random sample from a normal population with mean $\mu$.
• We state our hypotheses:
  $H_0: \mu = \mu_0$ for some constant value $\mu_0$;
  $H_a: \mu < \mu_0$, $\mu \ne \mu_0$, or $\mu > \mu_0$
  (remember to define what $\mu$ is in words for your problem).
• We calculate the test statistic,
  $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}.$$
• Under $H_0$, the test statistic follows a $t_{n-1}$ distribution.
• Decision: Compare the observed t-statistic to the critical value found in Table D.

Drawing conclusions in the one-sample t-test

• For a test of significance at the level $\alpha$:
  - If the observed t-statistic is in the tail, we reject $H_0$ in favor of $H_a$.
  - If the observed t-statistic is not in the tail, we do not reject $H_0$.
• Alternatives and tails:
  - For a two-tailed alternative, reject if $|t| \ge t_{n-1,\alpha/2}$.
  - For an upper-tailed alternative, reject if $t \ge t_{n-1,\alpha}$.
  - For a lower-tailed alternative, reject if $t \le -t_{n-1,\alpha}$.
• As always, write your conclusions in words.
• It is important to think about the assumptions that you made to carry out the t-test.
  - Remember that some assumptions can be validated using plots of the data.

Example

The one-sample t statistic from a sample of $n = 50$ observations for the two-sided test of $H_0: \mu = 50$ versus $H_a: \mu \ne 50$ has the value $t = 1.65$.

• What are the degrees of freedom for the test statistic $t$?
• Is the value $t = 1.65$ statistically significant at the 10% level? At the 5% level?
• Locate the two critical values $t^*$ from Table D that bracket $t$. What are the right-tail probabilities for these two values? How would you report the P-value for this test?

Matched pairs: revision and analysis

• Suppose we have two treatments.
• In the matched pairs design we try to gain precision in the response by matching pairs of similar individuals.
  - We assign the treatments at random to the two subjects in each pair (each subject receives only one treatment).
• Or an individual serves as his or her own partner.
  - The individual receives both treatments.
• Each pair of subjects (or each individual) forms its own block.
• To analyze the results of this type of experiment, we compare the responses across the pairs (or individuals).
  - We usually take differences and carry out the statistical inference using the paired t-test.

Football example

Two identical footballs, one air-filled and one helium-filled, were used outdoors on a windless day at The Ohio State University's athletic complex. The kicker was a novice punter and was not informed which football contained the helium. Each football was kicked 39 times. The kicker changed footballs after each kick so that his leg would play no favorites if he tired or improved with practice.

Source: Lafferty, M. B. (1993), "OSU scientists get a kick out of sports controversy," The Columbus Dispatch, 21 Nov. 1993, B7.

The data (all distances are in yards)

Trial  Air  Helium    Trial  Air  Helium    Trial  Air  Helium
    1   25      25       14   25      31       27   22      30
    2   23      16       15   34      22       28   31      27
    3   18      25       16   26      29       29   25      33
    4   16      14       17   20      23       30   20      11
    5   35      23       18   22      26       31   27      26
    6   15      29       19   33      35       32   26      32
    7   26      25       20   29      24       33   28      30
    8   24      26       21   31      31       34   32      29
    9   24      22       22   27      34       35   28      30
   10   28      26       23   22      39       36   25      29
   11   25      12       24   29      32       37   31      29
   12   19      28       25   28      14       38   28      30
   13   27      28       26   29      28       39   28      26

A scatterplot

[Figure: scatterplot of Air vs Helium distances.]

The paired t procedure: the setup

• Suppose we have pairs of data values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$; e.g., in our example the pairs of values are the (helium-filled, air-filled) distances for each kick.
• Clearly the $x$ and $y$ values are not independent.
• Instead we calculate the differences $d_i = y_i - x_i$ for each $i = 1, \ldots, n$.
• We assume $d_1, d_2, \ldots, d_n$ is a random sample from a normal population with mean $\mu_d$ and stdev $\sigma_d$.
  - $\mu_d$ is the population mean of the differences between the $x$ and $y$ values.
  - $\sigma_d$ is the population stdev of the differences.

The paired t procedure

• We want to test $H_0: \mu_d = \mu_0$ for some constant value $\mu_0$, against $H_a: \mu_d < \mu_0$, $\mu_d \ne \mu_0$, or $\mu_d > \mu_0$.
• We compute the test statistic,
  $$t = \frac{\bar{d} - \mu_0}{s_d/\sqrt{n}},$$
  where $\bar{d}$ is the sample average of the differences and $s_d$ is the sample stdev of the differences.
• Under $H_0$, the test statistic follows a $t_{n-1}$ distribution.
• We make our decision in the same way that we did for the one-sample t-test:
  - if the observed t-statistic is in the tail, we reject $H_0$;
  - if the observed t-statistic is not in the tail, we do not reject $H_0$.

Identifying the hypotheses

• There is a belief that, on average, a helium-filled ball travels farther than the air-filled ball. State the appropriate $H_0$ and $H_a$. Be sure to identify the parameters appearing in the hypotheses.

Summary figures

[Figure: histogram of the differences (Air - Helium); vertical axis: frequency.]
[Figure: normal Q-Q plot of the differences (Air - Helium).]
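The one-sample and paired t calculations described above can be sketched in a few lines. This is a minimal illustration, not the course's MINITAB workflow: the helper names `one_sample_t` and `t_interval` are made up here, and the critical value 2.120 is an assumed Table D lookup (16 df, upper-tail probability 0.025) that you should confirm against your own copy of the table. The data are the polymerization observations and the football distances from the notes.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (mean, stdev, standard error, t statistic, df) for H0: mu = mu0."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)        # sample stdev (divisor n - 1)
    se = s / math.sqrt(n)             # estimated standard error of the mean
    return mean, s, se, (mean - mu0) / se, n - 1


def t_interval(mean, se, t_crit):
    """Two-sided interval mean +/- t_crit * se; t_crit is supplied by the caller."""
    return mean - t_crit * se, mean + t_crit * se


POLYMER = [418, 421, 421, 422, 425, 427, 431, 434, 437,
           439, 446, 447, 448, 453, 454, 463, 465]
T_CRIT_DF16_95 = 2.120  # assumed Table D value: 16 df, upper-tail p = 0.025

# Football distances in yards, trials 1 to 39.
AIR = [25, 23, 18, 16, 35, 15, 26, 24, 24, 28, 25, 19, 27,
       25, 34, 26, 20, 22, 33, 29, 31, 27, 22, 29, 28, 29,
       22, 31, 25, 20, 27, 26, 28, 32, 28, 25, 31, 28, 28]
HELIUM = [25, 16, 25, 14, 23, 29, 25, 26, 22, 26, 12, 28, 28,
          31, 22, 29, 23, 26, 35, 24, 31, 34, 39, 32, 14, 28,
          30, 27, 33, 11, 26, 32, 30, 29, 30, 29, 29, 30, 26]
DIFFS = [a - h for a, h in zip(AIR, HELIUM)]  # d_i = air - helium

if __name__ == "__main__":
    mean, s, se, t, df = one_sample_t(POLYMER, mu0=440)
    low, high = t_interval(mean, se, T_CRIT_DF16_95)
    print(f"polymerization: mean={mean:.2f} s={s:.2f} df={df}")
    print(f"95% CI: ({low:.2f}, {high:.2f})")

    mean_d, s_d, se_d, t_d, df_d = one_sample_t(DIFFS, mu0=0)
    print(f"football: mean diff={mean_d:.3f} SE={se_d:.2f} t={t_d:.2f} df={df_d}")
```

The paired analysis is just the one-sample procedure applied to the differences, which is why the same helper serves both examples. Deciding whether to reject still requires comparing the t statistic with the appropriate Table D critical value.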
Performing the test

• Carry out a test. Can you reject $H_0$ at the 5% significance level? At the 1% significance level? Write down your conclusion in words.

  Variable      N  N*    Mean  SE Mean  StDev
  Air-Helium   39   0  -0.462     1.10   6.87

  Variable     Minimum     Q1  Median    Q3  Maximum
  Air-Helium    -17.00  -4.00   -1.00  2.00    14.00

• Provide a 90% confidence interval for the mean difference in the distances (air-filled minus helium-filled).

Inference for nonnormal populations

• If the data do not seem to be drawn from a normal population, then the t procedures may not be valid.
• Three possible strategies:
  1. Learn about other probability distributions. For example, there are plenty of skewed distributions (e.g., exponential, gamma, Weibull). Use methods for these distributions instead of the methods for the normal distribution.
  2. Transform your data to make it look as normal as possible (recall the ladder of power transformations). It can be hard to interpret the results when using a transformation.
  3. Use distribution-free tests. These tests do not assume a particular distribution for the population. Often these tests are based on other parameters of the distribution, such as the median rather than the mean. These tests can be less powerful in practice.

The sign test for matched pairs

• Example of a distribution-free test.
• As before, consider pairs of data values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We will test
  $H_0$: population median of differences $= 0$, versus
  $H_a$: population median of differences $\ne 0$.
• Let $d_i = y_i - x_i$, $i = 1, \ldots, n$, be the differences.
• Exclude the differences that are zero. Let $X$ denote the count, out of the remaining $m$ differences, that are positive.
• Then under $H_0$, $X$ is Binomial($m$, 0.5). (If the median is zero, then half the nonzero differences are above zero, and the other half are below zero.)
• If $x$ is the observed $X$ value, then the P-value is $2 \times P(X \le x)$ or $2 \times P(X \ge x)$, using the tail on the side where $x$ falls.

The sign test for matched pairs (cont.)

• For the football example:
  - Out of $n = 39$ differences, $m = 37$ differences are nonzero.
  - Thus under $H_0$, $X$ is Binomial(37, 0.5).
  - Out of the 37, we observe 17 that are above zero.
  - P-value $= 2 \times P(X \le 17) = 2 \times 0.3714 = 0.7428$.
  - No evidence to reject $H_0$.
• See the textbook for the one-sided test.
• Note: If the population of differences is normally or approximately normally distributed, then this test will be less powerful at detecting differences than the paired t-test.

The power of the one-sample t-test

• The power calculation for the one-sample t-test is similar to the power calculation for the Z-test.
• But the math is much harder!
  - Instead we use MINITAB.
• Stat > Power and Sample Size > 1-Sample t.
• Under Options, select the Alternative Hypothesis and Significance Level.
• Then enter any two of the following three items:
  1. Sample sizes
  2. Differences
  3. Power values
• Enter the Standard deviation (the sample stdev in this case) and click OK.

A value for $\sigma$

• There are four main ways to obtain a value for $\sigma$:
  - Literature search: Use historical data from similar studies.
  - Pilot study: Use the results of a pilot study. The estimate of $\sigma$ will often need to be adjusted.
  - Elicit $\sigma$: Two useful methods are the Range/4 method and the Range/6 method.
  - Construct a value for $\sigma$: Some probability models yield a value for $\sigma$; e.g., for a Bernoulli RV, $\sigma = \sqrt{p(1 - p)}$.
• Be conservative: Use several methods and consider a slightly larger value of $\sigma$ than these methods suggest.

An agricultural field trial example

An agricultural field trial compares the yield of two varieties of tomatoes for commercial use. The researchers divide in half each of 10 small plots of land and plant each tomato variety on one half of each plot. After harvest they compare the yields in pounds per plant at each location. The ten differences (Variety A - Variety B) give the following statistics: $\bar{x} = 0.46$ and $s = 0.92$. Is there convincing evidence that Variety A has the higher mean yield?

Let $\mu_d$ denote the population mean of the difference in the yields. We test $H_0: \mu_d = 0$ versus $H_a: \mu_d > 0$. The MINITAB output for the paired t test is:

  One-Sample T

  Test of mu = 0 vs > 0

                                       95% Lower
   N      Mean     StDev   SE Mean       Bound     T      P
  10  0.460000  0.920000  0.290930   -0.073307  1.58  0.074

Agricultural trial (cont.)

The tomato experts who carried out the field trial suspect that the relative lack of significance is due to low power. They would like to detect a mean difference in yields of 0.6 pounds per plant at the 0.05 significance level. Based on the previous study, use 0.92 as an estimate of the population $\sigma$.

• What is the power of the test with $n = 12$ against the alternative of $\mu_d = 0.6$?
• If the sample size is increased to $n = 30$ plots of land, what will be the power against the same alternative?
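The sign-test P-value, the tomato t statistic, and a rough power figure can all be reproduced with the Python standard library. Treat this as a sketch under stated assumptions: the function names are illustrative, and `approx_power_upper` uses the Z-test (known $\sigma$) approximation rather than the t-based calculation MINITAB performs, so it will tend to be somewhat optimistic compared with MINITAB's answer.

```python
import math
from statistics import NormalDist


def sign_test_p_value(n_positive: int, m: int) -> float:
    """Two-sided exact sign-test P-value; X ~ Binomial(m, 0.5) under H0."""
    def pmf(k: int) -> float:
        return math.comb(m, k) / 2 ** m

    lower = sum(pmf(k) for k in range(0, n_positive + 1))   # P(X <= x)
    upper = sum(pmf(k) for k in range(n_positive, m + 1))   # P(X >= x)
    return min(1.0, 2 * min(lower, upper))


def t_from_summary(mean: float, s: float, n: int, mu0: float = 0.0):
    """Return (standard error, t statistic) from summary statistics."""
    se = s / math.sqrt(n)
    return se, (mean - mu0) / se


def approx_power_upper(delta: float, sigma: float, n: int,
                       alpha: float = 0.05) -> float:
    """Normal approximation to the power of an upper-tailed test of H0: mu = 0.

    Assumption: sigma is treated as known, which is NOT the t-based
    calculation; use it only as a rough guide.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_alpha - delta * math.sqrt(n) / sigma)


if __name__ == "__main__":
    # Football example: 17 positive differences out of 37 nonzero ones.
    print(f"sign test P-value: {sign_test_p_value(17, 37):.4f}")

    # Tomato example: summary statistics from the notes.
    se, t = t_from_summary(mean=0.46, s=0.92, n=10)
    print(f"tomato: SE mean = {se:.6f}, t = {t:.2f}")

    for n in (12, 30):
        print(f"approximate power with n = {n}: "
              f"{approx_power_upper(0.6, 0.92, n):.3f}")
```

The first two results can be compared directly with the values in the notes (0.7428, and the SE Mean and T columns of the MINITAB output). The power figures are approximations only and are not a substitute for the MINITAB calculation the notes ask for.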