Class Notes for STAT 30100 with Professor Howell at Purdue

These 19 pages of class notes were uploaded on Friday, February 6, 2015.

Date Created: 02/06/15
Lecture 8, Sections 7.1 & 7.2: Inference for the Mean of a Population

Previously we made the assumption that we know the population standard deviation $\sigma$. We then developed a confidence interval and used tests of significance to gather evidence for or against a hypothesis, all with a known $\sigma$. In normal practice, $\sigma$ is unknown. In this section we must estimate $\sigma$ from the data, even though we are primarily interested in the population mean $\mu$.

Confidence Interval for a Mean

First, the assumptions for inference about a mean:
• Our data are a simple random sample (SRS) of size $n$ from the population.
• Observations from the population have a normal distribution with mean $\mu$ and standard deviation $\sigma$. If the population distribution is not normal, it is enough that the distribution is unimodal and symmetric and that the sample size is large ($n > 15$).
Both $\mu$ and $\sigma$ are unknown parameters.

Because we do not know $\sigma$, we make two changes in our procedure:
1. The standard error $s/\sqrt{n}$ is used in place of $\sigma/\sqrt{n}$.
   Standard Error: When the standard deviation of a statistic is estimated from the data, the result is called the standard error of the statistic. The standard error of the sample mean $\bar{x}$ is
   $$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$
   where $s$ is the sample standard deviation and $n$ is the sample size.
2. We calculate a different test statistic and use a different distribution to calculate our P-value.

The t Distributions
• The t distribution is used when we do not know $\sigma$.
• The t distributions have density curves similar in shape to the standard normal curve, but with more spread.
• The t distributions have more probability in the tails and less in the center than does the standard normal. This is because substituting the estimate $s$ for the fixed parameter $\sigma$ introduces more variation into the statistic.
• As the sample size increases, the t density curve approaches the $N(0, 1)$ curve. Note: This is because $s$ estimates $\sigma$ more accurately as the sample size increases.

Suppose that an SRS of size $n$ is drawn from an $N(\mu, \sigma)$ population. Then the one-sample t statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has the t distribution with $n - 1$ degrees of freedom.

The One-Sample t Confidence Interval
Suppose that an SRS of size $n$ is drawn from a population having unknown mean $\mu$. A level $C$ confidence interval for $\mu$ is
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
where $t^*$ is the value for the $t(n-1)$ density curve with area $C$ between $-t^*$ and $t^*$. This interval is exact when the population distribution is normal and is approximately correct for large $n$ in other cases.

Examples
1. Suppose X = Bob's golf scores are approximately normally distributed with unknown mean and standard deviation. An SRS of $n = 16$ scores is selected, and a sample mean of $\bar{x} = 77$ and a sample standard deviation of $s = 3$ are calculated. Calculate a 90% confidence interval for $\mu$.

2. Example 7.1 in Textbook. In 1996, the U.S. Agency for International Development provided 238,300 metric tons of corn soy blend (CSB) for development programs and emergency relief. CSB is a highly nutritious, low-cost fortified food that can be incorporated into different food preparations worldwide. As part of a study to evaluate appropriate vitamin C levels in this commodity, measurements were taken on samples of CSB produced in a factory. The following data are the amounts of vitamin C, measured in milligrams per 100 grams of blend, for a random sample of size 8 from a production run:

   26  31  23  22  11  22  14  31

   Compute a 95% confidence interval for $\mu$, where $\mu$ is the mean vitamin C content of the CSB.

   By hand: $\bar{x} = 22.5$, $s = 7.191$, $n = 8$. With $df = 7$, $t^* = 2.365$, so the interval is $22.5 \pm 2.365 \cdot \frac{7.191}{\sqrt{8}} = 22.5 \pm 6.01$, that is, (16.49, 28.51).

   Using SPSS: Analyze > Descriptive Statistics > Explore. Move vitaminC to the Dependent List. Click Statistics, select Descriptives, and change (or keep) the 95% confidence interval. Click Continue, followed by OK.

   Descriptives
   Vitamin C                                          Statistic   Std. Error
     Mean                                             22.50       2.542
     95% Confidence Interval for Mean   Lower Bound   16.49
                                        Upper Bound   28.51
     5% Trimmed Mean                                  22.67
     Median                                           22.50
     Variance                                         51.714
     Std. Deviation                                   7.191
     Minimum                                          11
     Maximum                                          31
     Range                                            20
     Interquartile Range                              14
     Skewness                                         -.443       .752
     Kurtosis                                         -.631       1.481

The One-Sample t Test
1. State the null and alternative hypotheses.
2. Find the test statistic. Suppose that an SRS of size $n$ is drawn from a population having unknown mean $\mu$. To test the hypothesis $H_0: \mu = \mu_0$ based on an SRS of size $n$, compute the one-sample t statistic
   $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
3. Calculate the P-value. In terms of a random variable $T$ having the $t(n-1)$ distribution, the P-value for a test of $H_0$ against
   $H_a: \mu > \mu_0$ is $P(T \ge t)$
   $H_a: \mu < \mu_0$ is $P(T \le t)$
   $H_a: \mu \ne \mu_0$ is $2P(T \ge |t|)$
   These P-values are exact if the population distribution is normal and are approximately correct for large $n$ in other cases.
4. State the conclusions in terms of the problem. Choose a significance level such as $\alpha = 0.05$, then compare the P-value to the $\alpha$ level:
   If P-value $\le \alpha$, then reject $H_0$.
   If P-value $> \alpha$, then fail to reject $H_0$.

Examples
1. Experiments on learning in animals sometimes measure how long it takes mice to find their way through a maze. The mean time is 18 seconds for one particular maze. A researcher thinks that a loud noise will improve (decrease) the time it takes a mouse to complete the maze. She measures how long each of 30 mice takes to complete the maze with a noise stimulus. She finds their average time is 16 seconds and their standard deviation is $s = 3$ seconds. Do a hypothesis test of the researcher's assertion with $\alpha = 0.01$.

2. Example 7.2 in Textbook. Suppose that we know that sufficient vitamin C was added to the CSB mixture to produce a mean vitamin C content in the final product of 40 mg/100 g. It is suspected that some of the vitamin is lost or destroyed in the production process. To test this hypothesis, we can conduct a one-sided test to determine whether there is sufficient evidence, at the $\alpha = 0.05$ level, to conclude that the CSB mixture lost vitamin C content.

   By hand: $H_0: \mu = 40$ versus $H_a: \mu < 40$. $t = \frac{22.5 - 40}{7.191/\sqrt{8}} = \frac{-17.5}{2.542} = -6.88$ with $df = 7$. SPSS reports a two-sided P-value (.000); because $t$ falls in the direction of $H_a$, the one-sided P-value is half of that, so P < 0.001 and we reject $H_0$: there is strong evidence that vitamin C is lost in production.

   Using SPSS: Analyze > Compare Means > One-Sample T Test. Move vitaminC into the Test Variable box and type 40 for the Test Value. To change the confidence interval, click Options, change it from 95% to whatever level you want, and click Continue (I did not do this, as I will keep the 95% default). Lastly, click OK.

   One-Sample Statistics
                N   Mean    Std. Deviation   Std. Error Mean
   Vitamin C    8   22.50   7.191            2.542

   One-Sample Test (Test Value = 40)
                                                                     95% Confidence Interval of the Difference
                t        df   Sig. (2-tailed)   Mean Difference      Lower       Upper
   Vitamin C    -6.883   7    .000              -17.500              -23.51      -11.49

Matched Pairs Design
A common design to compare two treatments is the matched pairs design. One type of matched pairs design has 2 subjects who are similar in important aspects matched in pairs, and each treatment is given to one of the subjects in each pair. With only one subject, both treatments are given to that subject in random order. Another type of matched pairs is before-and-after observations on the same subject.

Paired t Procedures
To compare the mean responses to the two treatments in a matched pairs design, apply the one-sample t procedures to the observed differences $d$.

Example: Problem 7.31 is done by hand and using SPSS. The researchers studying vitamin C in CSB in Example 7.1 were also interested in a similar commodity called wheat soy blend (WSB). Both these commodities are mixed with other ingredients and cooked. Loss of vitamin C as a result of this process was another concern of the researchers. One preparation used in Haiti, called gruel, can be made from WSB, salt, sugar, milk, banana, and other optional items to improve the taste. Samples of gruel prepared in Haitian households were collected. The vitamin C content, in milligrams per 100 grams of blend (dry basis), was measured before and after cooking. Here are the results:

   Sample   1    2    3    4    5
   Before   73   79   86   88   78
   After    20   27   29   36   17

Set up appropriate hypotheses and carry out a significance test for these data. It is not possible for cooking to increase the amount of vitamin C.

By hand: Let $d$ = before $-$ after, giving differences 53, 52, 57, 52, 61. $H_0: \mu_d = 0$ versus $H_a: \mu_d > 0$. $\bar{d} = 55.0$, $s_d = 3.937$, and $t = \frac{55.0}{3.937/\sqrt{5}} = 31.24$ with $df = 4$, so P < 0.001: cooking clearly reduces the vitamin C content.

Using SPSS: Analyze > Compare Means > Paired-Samples T Test. Move before and after to the Paired Variables box (whichever variable is listed first will come first in the subtraction). Click OK.

   Paired Samples Statistics
                     Mean    N   Std. Deviation   Std. Error Mean
   Pair 1   Before   80.80   5   6.140            2.746
            After    25.80   5   7.530            3.367

   Paired Samples Test (Paired Differences)
                                                      Std. Error   95% Confidence Interval of the Difference
                             Mean     Std. Deviation  Mean         Lower     Upper     t        df   Sig. (2-tailed)
   Pair 1   Before - After   55.000   3.937           1.761        50.112    59.888    31.238   4    .000

Lecture 9, Sections 7.1 & 7.2

A confidence interval or statistical test is called robust if the confidence level or P-value does not change very much when the assumptions of the procedure are violated. The t procedures are robust against non-normality of the population when there are no outliers, especially when the distribution is roughly symmetric and unimodal.

Robustness and use of the one-sample t and matched pairs t procedures:
• Unless a small sample is used, the assumption that the data come from an SRS is more important than the assumption that the population distribution is normal.
• $n < 15$: Use t procedures only if the data are close to normal with no outliers.
• $n \ge 15$: The t procedures can be used except in the presence of outliers or strong skewness.
• Large $n$ ($n \ge 40$): The t procedures can be used even for clearly skewed distributions.

Comparing Two Means

Two-Sample Problems: situations in which two populations or two treatments are compared based on separate samples. A two-sample problem can arise
• from a randomized comparative experiment that randomly divides the units into two groups and imposes a different treatment on each group, or
• from a comparison of random samples selected separately from different populations.
Note: Do not confuse two-sample designs with matched pairs designs.

Assumptions for Comparing Two Means
• Two independent simple random samples from two distinct populations are compared. The same variable is measured on both samples.
• The sample observations are independent: neither sample has an influence on the other.
• Both populations are approximately normally distributed.
• The means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$ of both populations are unknown.

Typically we want to compare two population means by giving a confidence interval for their difference, $\mu_1 - \mu_2$, or by testing the hypothesis of no difference, $H_0: \mu_1 - \mu_2 = 0$.

The Two-Sample t Confidence Interval
Suppose that an SRS of size $n_1$ is drawn from a normal population with unknown mean $\mu_1$ and that an independent SRS of size $n_2$ is drawn from another normal population with unknown mean $\mu_2$. The confidence interval for $\mu_1 - \mu_2$ given by
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
has confidence level at least $C$ no matter what the population standard deviations may be when $k$ is the smaller of $n_1 - 1$ and $n_2 - 1$; with the degrees of freedom approximated by software, the confidence level is approximately $C$. Here $t^*$ is the value for the $t(k)$ density curve with area $C$ between $-t^*$ and $t^*$. The value of the degrees of freedom $k$ is approximated by software, or we use the smaller of $n_1 - 1$ and $n_2 - 1$.

Two-Sample t Procedure
1. Write the hypotheses in terms of the difference between means:
   $H_0: \mu_1 - \mu_2 = 0$
   $H_a: \mu_1 - \mu_2 > 0$, or $H_a: \mu_1 - \mu_2 < 0$, or $H_a: \mu_1 - \mu_2 \ne 0$
2. Calculate the test statistic. Draw an SRS of size $n_1$ from a normal population with unknown mean $\mu_1$ and an independent SRS of size $n_2$ from another normal population with unknown mean $\mu_2$. To test the hypothesis $H_0: \mu_1 - \mu_2 = 0$, the two-sample t statistic is
   $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
   and we use P-values or critical values for the $t(k)$ distribution, where the degrees of freedom $k$ are either approximated by software or are the smaller of $n_1 - 1$ and $n_2 - 1$.
   Note: The two-sample t statistic does not have a t distribution. The software, however, uses a t distribution to do inference for two-sample problems. This is because the statistic has approximately a t distribution, with degrees of freedom calculated by a complex formula called the Welch approximation.
3. Calculate the P-value. Note: Unless we use software, we can only get a range for the P-value. We use the following formulas:
   $H_a: \mu_1 - \mu_2 > 0$ is $P(T \ge t)$
   $H_a: \mu_1 - \mu_2 < 0$ is $P(T \le t)$
   $H_a: \mu_1 - \mu_2 \ne 0$ is $2P(T \ge |t|)$
   Note: Instead of using the degrees of freedom found by software, you can use the smaller of $n_1 - 1$ and $n_2 - 1$. The resulting procedure is conservative: the true confidence level is at least $C$, and the true P-value is no larger than the one reported.
4. State the conclusions in terms of the problem. Choose a significance level such as $\alpha = 0.05$, then compare the P-value to the $\alpha$ level:
   If P-value $\le \alpha$, then reject $H_0$.
   If P-value $> \alpha$, then fail to reject $H_0$.

Robustness and Use of the Two-Sample t Procedures
The two-sample t procedures are more robust than the one-sample t methods, particularly when the distributions are not symmetric. They are robust in the following circumstances:
• If the two samples are equal in size and the two populations that the samples come from have similar distributions, then the t distribution is accurate for a variety of distributions, even when the sample sizes are as small as $n_1 = n_2 = 5$.
• When the two population distributions are different, larger samples are needed.
• $n_1 + n_2 < 15$: Use two-sample t procedures if the data are close to normal. If the data are clearly non-normal or if outliers are present, do not use t.
• $n_1 + n_2 \ge 15$: The t procedures can be used except in the presence of outliers or strong skewness.
• $n_1 + n_2 \ge 40$: The t procedures can be used even for clearly skewed distributions.

Examples
1. The U.S. Department of Agriculture (USDA) uses many types of surveys to obtain important economic estimates. In one pilot study they estimated wheat prices in July and in September using independent samples. Here is a brief summary from the report:

   Month       n    $\bar{x}$   $s/\sqrt{n}$ (standard error)
   July        90   2.95        0.023
   September   45   3.61        0.029

   (a) Note that the report gave standard errors. Find the standard deviation for each of the samples.
   (b) Use a significance test to examine whether or not the price of wheat was the same in July and September. Be sure to give details and carefully state your conclusion.
   (c) Give a 95% confidence interval for the increase in price between July and September.

2. The Survey of Study Habits and Attitudes (SSHA) is a psychological test designed to measure the motivation, study habits, and attitudes toward learning of college students. These factors, along with ability, are important in explaining success in school. Scores on the SSHA range from 0 to 200. A selective private college gives the SSHA to an SRS of both male and female first-year students. The data for the women are as follows:

   154 109 137 115 152 140 145 178 101 103 126 126 137 165 165 129 200 148

   Here are the scores of the men:

   108 140 114 91 180 115 126 92 169 146 109 132 75 88 113 151 70 115 187 104

   (a) Examine each sample graphically, with special attention to outliers and skewness. Is use of a t procedure acceptable for these data?
   (b) Most studies have found that the mean SSHA score for men is lower than the mean score in a comparable group of women. Test this supposition here. That is, state the hypotheses, carry out the test and obtain a P-value, and give your conclusions.

   Using SPSS: Note: The data need to be typed in using two columns. In the first column, put all the scores. In the second column, define the grouping variable as gender and enter "women" next to the women's scores and "men" next to the men's scores.
   Analyze > Compare Means > Independent-Samples T Test. Move score to the Test Variable box and gender to the Grouping Variable box. Click Define Groups and enter "women" for Group 1 and "men" for Group 2. Click Continue, followed by OK.

   Group Statistics
            group   N    Mean     Std. Deviation   Std. Error Mean
   score    women   18   140.56   26.262           6.190
            men     20   121.25   32.852           7.346

   Independent Samples Test
                                   Levene's Test      t-test for Equality of Means
                                                                                 Mean         Std. Error   95% CI of the Difference
                                   F       Sig.       t       df       Sig. (2-tailed)   Difference   Difference   Lower     Upper
   score   Equal variances         1.030   .317       1.986   36       .055              19.306       9.721        -0.410    39.021
           assumed
           Equal variances                            2.010   35.537   .052              19.306       9.606        -0.185    38.797
           not assumed

   We use the second line (Equal variances not assumed) to get the t test statistic, P-value, etc. Since $H_a: \mu_{women} - \mu_{men} > 0$ is one-sided and $t$ is positive, the one-sided P-value is half of the two-sided Sig.: $0.052/2 = 0.026$.

   (c) Give a 95% confidence interval for the mean difference between the SSHA scores of male and female first-year students at this college.

3. Suppose we wanted to compare how students performed on Test 1 versus Test 2 in STAT 301. Below are data for a random sample of 10 students taking STAT 301, along with the printout of the results from running the matched pairs test.

   [Data table: Subject, Test 1 score, Test 2 score]

   Paired Samples Statistics
                     Mean    N    Std. Deviation   Std. Error Mean
   Pair 1   Test 1   82.80   10   14.382           4.548
            Test 2   80.70   10   14.507           4.588

   Paired Samples Test (Paired Differences)
                                                       Std. Error   95% Confidence Interval of the Difference
                             Mean    Std. Deviation    Mean         Lower     Upper    t       df   Sig. (2-tailed)
   Pair 1   Test 1 - Test 2  2.100   3.573             1.130        -0.456    4.656    1.859   9    .096

   Answer the questions below based on the test.
   (a) Why was a matched pairs test used as opposed to a two-sample t test?
   (b) Is there a difference between Test 1 and Test 2 scores? Write out the hypotheses and give the P-value. Are our results significant at the 5% significance level?
   (c) Suppose one of our friends thought Test 2 was easier and the students generally did better on it. We want to test whether our friend is correct. Write out the hypotheses to test this and give the P-value. Are our results significant at the 5% significance level?

4. An instructor thought the material for the second test was more difficult, and hence the students may have done worse on the second test. She decided to randomly sample 10 Exam 1 tests and 10 Exam 2 tests, with each sample taken from all the tests taken. Below are data for the two random samples and the analysis.

   [Data table: Sample, Test 1 score, Test 2 score]

   Group Statistics
            test   N    Mean    Std. Deviation   Std. Error Mean
   score    1      10   82.80   14.382           4.548
            2      10   80.70   14.507           4.588

   Independent Samples Test
                                   Levene's Test      t-test for Equality of Means
                                                                                Mean         Std. Error   95% CI of the Difference
                                   F       Sig.       t      df       Sig. (2-tailed)   Difference   Difference   Lower      Upper
   score   Equal variances         .015    .905       .325   18       .749              2.100        6.460        -11.472    15.672
           assumed
           Equal variances                            .325   17.999   .749              2.100        6.460        -11.472    15.672
           not assumed

   (a) What procedure was used, and why?
   (b) Write out the hypotheses to test this and give the P-value. Are your results significant at the 5% significance level?
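The one-sample t interval and test from Lecture 8 can be checked numerically. The sketch below is a minimal, standard-library-only Python version, offered under stated assumptions: since Python has no built-in t distribution, the t(df) CDF is approximated by Simpson's-rule integration of the t density and t* is found by bisection. This is a stand-in for a t table, not SPSS's own algorithm. The helper names `t_cdf`, `t_critical`, `one_sample_t_interval` and `one_sample_t_test` are hypothetical.

```python
import math
import statistics


def t_cdf(x, df, steps=4000):
    """P(T <= x) for T ~ t(df), via Simpson's rule on the t density from 0 to x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # a negative step when x < 0 gives the signed integral
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return min(1.0, max(0.0, 0.5 + total * h / 3))


def t_critical(conf, df):
    """t* with area `conf` between -t* and t* under the t(df) curve (bisection)."""
    target = (1 + conf) / 2
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_interval(xbar, s, n, conf):
    """Level-`conf` interval xbar +/- t* s/sqrt(n), with n - 1 degrees of freedom."""
    margin = t_critical(conf, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin


def one_sample_t_test(xbar, s, n, mu0, alternative):
    """Return (t, P-value) for H0: mu = mu0; alternative is 'greater', 'less' or 'two-sided'."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    if alternative == "greater":
        p = 1 - t_cdf(t, n - 1)        # P(T >= t)
    elif alternative == "less":
        p = t_cdf(t, n - 1)            # P(T <= t)
    else:
        p = 2 * t_cdf(-abs(t), n - 1)  # 2 P(T >= |t|)
    return t, p


# CSB vitamin C data from Example 7.1 (mg per 100 g of blend)
vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]
n = len(vitamin_c)
xbar, s = statistics.mean(vitamin_c), statistics.stdev(vitamin_c)
print("95% CI:", one_sample_t_interval(xbar, s, n, 0.95))
print("t, P for Ha: mu < 40:", one_sample_t_test(xbar, s, n, 40, "less"))

# Golf (n = 16, xbar = 77, s = 3, 90%) and maze (n = 30, xbar = 16, s = 3, Ha: mu < 18)
print("Golf 90% CI:", one_sample_t_interval(77, 3, 16, 0.90))
print("Maze t, P:", one_sample_t_test(16, 3, 30, 18, "less"))
```

For the CSB data, this should agree with the SPSS Explore and One-Sample T Test output to rounding: an interval of about (16.49, 28.51) and t of about -6.88. The golf and maze lines simply plug in the summary statistics given in those exercises.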

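Because the paired t procedure is the one-sample t procedure applied to the differences d, the same approach reproduces the Haitian gruel example. This is a sketch with hypothetical function names and the same standard-library approximation of the t distribution (Simpson's rule plus bisection, assumed in place of tables). It takes d = first - second, matching the way SPSS subtracts the first-listed variable, and reports the one-sided P-value for H_a: mu_d > 0, since cooking cannot increase vitamin C.

```python
import math
import statistics


def t_cdf(x, df, steps=4000):
    """P(T <= x) for T ~ t(df), via Simpson's rule on the t density from 0 to x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return min(1.0, max(0.0, 0.5 + total * h / 3))


def t_critical(conf, df):
    """t* with area `conf` between -t* and t* under the t(df) curve (bisection)."""
    target = (1 + conf) / 2
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t(first, second, conf=0.95):
    """One-sample t procedures applied to the differences d = first - second."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    dbar, sd = statistics.mean(d), statistics.stdev(d)
    se = sd / math.sqrt(n)
    t = dbar / se  # test statistic for H0: mu_d = 0
    margin = t_critical(conf, n - 1) * se
    return {
        "mean_diff": dbar,
        "sd_diff": sd,
        "se": se,
        "t": t,
        "df": n - 1,
        "p_greater": 1 - t_cdf(t, n - 1),          # Ha: mu_d > 0
        "p_two_sided": 2 * t_cdf(-abs(t), n - 1),  # Ha: mu_d != 0
        "ci": (dbar - margin, dbar + margin),
    }


# Haitian gruel, vitamin C (mg/100 g, dry basis) before and after cooking
before = [73, 79, 86, 88, 78]
after = [20, 27, 29, 36, 17]
result = paired_t(before, after)
for key, value in result.items():
    print(key, value)

# Test 1 vs Test 2: rebuild t and P from the printout's (rounded) summary row
t_tests = 2.100 / (3.573 / math.sqrt(10))
print("t:", t_tests, "two-sided P:", 2 * t_cdf(-abs(t_tests), 9))
```

The last two lines rebuild the Test 1 / Test 2 printout from its summary row (mean difference 2.100, standard deviation 3.573, n = 10). Because those inputs are rounded, the P-value matches the printed .096 only approximately.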

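The two-sample t procedure in Lecture 9 relies on degrees of freedom that the notes say are "approximated by software" via the Welch approximation. A minimal sketch, assuming the usual Welch-Satterthwaite formula k = (s1^2/n1 + s2^2/n2)^2 / [(s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1)] and the same standard-library approximation of the t distribution as above, is shown below. The function name `welch_t` is hypothetical. For the SSHA data, it should reproduce the "Equal variances not assumed" row of the SPSS output.

```python
import math
import statistics


def t_cdf(x, df, steps=4000):
    """P(T <= x) for T ~ t(df) (df may be fractional), via Simpson's rule from 0 to x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return min(1.0, max(0.0, 0.5 + total * h / 3))


def t_critical(conf, df):
    """t* with area `conf` between -t* and t* under the t(df) curve (bisection)."""
    target = (1 + conf) / 2
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_t(mean1, s1, n1, mean2, s2, n2, conf=0.95):
    """Two-sample t procedures for mu1 - mu2 without assuming equal variances."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    diff = mean1 - mean2
    t = diff / se
    # Welch-Satterthwaite approximation to the degrees of freedom k
    k = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_critical(conf, k) * se
    return {
        "diff": diff,
        "se": se,
        "t": t,
        "df": k,
        "conservative_df": min(n1, n2) - 1,    # smaller of n1 - 1 and n2 - 1
        "p_greater": 1 - t_cdf(t, k),          # Ha: mu1 - mu2 > 0
        "p_two_sided": 2 * t_cdf(-abs(t), k),  # Ha: mu1 - mu2 != 0
        "ci": (diff - margin, diff + margin),
    }


# SSHA scores (women listed first, so Ha: mu_women - mu_men > 0)
women = [154, 109, 137, 115, 152, 140, 145, 178, 101, 103,
         126, 126, 137, 165, 165, 129, 200, 148]
men = [108, 140, 114, 91, 180, 115, 126, 92, 169, 146,
       109, 132, 75, 88, 113, 151, 70, 115, 187, 104]
ssha = welch_t(statistics.mean(women), statistics.stdev(women), len(women),
               statistics.mean(men), statistics.stdev(men), len(men))
print(ssha)

# USDA wheat prices: the report gives standard errors, so s = SE * sqrt(n)
s_july, s_sept = 0.023 * math.sqrt(90), 0.029 * math.sqrt(45)
wheat = welch_t(3.61, s_sept, 45, 2.95, s_july, 90)  # September minus July
print(round(s_july, 3), round(s_sept, 3), wheat["t"], wheat["ci"])
```

The `conservative_df` entry is the "smaller of n1 - 1 and n2 - 1" option from the notes. Passing it to `t_critical` instead of `k` gives the conservative interval.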