209 Class Note for STAT 30100 with Professor Sorola at Purdue

These 14-page class notes were uploaded by an elite notetaker on Friday, February 6, 2015.

Date Created: 02/06/15

Chapter 6: Confidence Intervals and Hypothesis Testing

Why do we even bother analyzing data? We want to draw conclusions from the data. Why can't we just accept our sample mean or sample proportion as the official mean or proportion for the population? Every time we compute the statistics $\bar{x}$ and $\hat{p}$ (sample mean and sample proportion), we get a different answer due to sampling variability.

The two most common types of formal statistical inference:
- Confidence intervals: used when we want to estimate a population parameter.
- Significance tests: used when we want to assess the evidence provided by the data in favor of some claim about the population (a yes/no question about the population).

Confidence intervals allow us to estimate a range of values for the population mean or population proportion. The true mean or proportion for the population exists and is a fixed number, but we just don't know what it is. Using our sample statistic, we can create a "net" to give us an estimate of where to expect the population parameter to be.

[Figure: a confidence interval "net" around the population parameter (an invisible, stationary butterfly), drawn over the density curve of $\bar{x}$.]

We don't know exactly where the butterfly is, but from our sample we have a pretty good estimate of its location.

If we take a single sample, our single confidence interval net may or may not include the population parameter. However, if we take many samples of the same size and create a confidence interval from each sample statistic, then over the long run 95% of our confidence intervals will contain the true population parameter if we are using a 95% confidence level.

[Figure 6.3, Introduction to the Practice of Statistics, Fifth Edition, © 2005 W. H. Freeman and Company]

If you increase the sample size n, you decrease the size of your net, that is, your margin of error.

[Figure 6.5: intervals for n = 320 and n = 1280 on a scale from 14,000 to 26,000. Introduction to the Practice of Statistics, Fifth Edition, © 2005 W. H. Freeman and Company]

If you increase your confidence level C, then you increase the size of your net, that is, your margin of error.

[Figure 6.6: 99% confidence vs. 95% confidence on a scale from 14,000 to 26,000. Introduction to the Practice of Statistics, Fifth Edition, © 2005 W. H. Freeman and Company]

A smaller net is good because it gives you more information: it is a smaller range for where to expect your true population parameter.

Freeman applet: go to the course website > Freeman link > statistical applets > confidence interval.

Confidence intervals look like: estimate ± margin of error.

Confidence Interval for a Population Mean

$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$

where $z^*$ is the value on the standard normal curve with area C between $-z^*$ and $z^*$. Table D at the back of the book also contains more $z^*$ values on the bottom row.

C:     90%     95%     99%
z*:    1.645   1.960   2.576

Remember from Ch. 5 that the mean and standard deviation of a sample mean are

$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Also remember that if X is normally distributed, then $\bar{x}$ will be too, and if n is large, the sample mean will be approximately normally distributed even if X is not normally distributed (Central Limit Theorem).

What if your margin of error is too large? Here are ways to reduce it:
- Increase the sample size (bigger n).
- Use a lower level of confidence (smaller C).
- Reduce σ.

Sample Size n for Desired Margin of Error m

$m = z^* \frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{z^* \sigma}{m}\right)^2$

Always round n up to the next whole number. Note that it is the sample size n that influences the margin of error. The population size has nothing to do with it.

Be careful! You can only use the formula $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ under certain circumstances:
- Data must be an SRS from the population.
- Do not use it if the sampling is anything more complicated than an SRS.
- Data must be collected correctly (no bias). The margin of error covers only random sampling errors; undercoverage and nonresponse are not covered.
- Outliers can have a big effect on the confidence interval. This makes sense because we use the mean and standard deviation to get a CI.
- You must know the standard deviation σ of the population.

Examples

1. A questionnaire on drinking habits was given to a random sample of fraternity members, and each student was asked to report the number of beers he had drunk
in the past month. The sample of 30 students resulted in an average of 22 beers with a standard deviation of 9 beers.
a. Give a 90% confidence interval for the mean number of beers drunk by fraternity members in the past month.
b. Is it true that 90% of the fraternity members each month drink a number of beers that lies in the interval you found in part (a)? Explain your answer.
c. What is the margin of error for the 90% confidence interval?
d. How many students should you sample if you want a margin of error of 1 for a 90% confidence interval?

2. A sample of 12 STAT 301 students yields the following Exam 1 scores:
78 62 99 85 94 53 88 90 86 92 75 92
Assume that the population standard deviation is 10. The sample mean can be calculated using SPSS or a calculator to be 82.83.
Note: Do NOT use any SPSS confidence intervals; they are good only for Chapter 7, not for this type of CI. You must get these z confidence intervals by hand.
a. Find the 90% confidence interval for the mean score μ for STAT 301 students.
b. Find the 95% confidence interval.
c. Find the 99% confidence interval.
d. How do the margins of error in (a), (b), and (c) change as the confidence level increases? Why?

Hypothesis Testing

The 4 steps common to all tests of significance:
1. State the null hypothesis H0 and the alternative hypothesis Ha.
2. Calculate the value of the test statistic.
3. Draw a picture of what Ha looks like and find the P-value.
4. State your conclusion about the data in a sentence, using the P-value and/or comparing the P-value to a significance level for your evidence.

STEP 1: State the null hypothesis H0 and the alternative hypothesis Ha.

To do a significance test, you need 2 hypotheses:
- H0 (null hypothesis): the statement being tested, usually phrased as "no effect" or "no difference."
- Ha (alternative hypothesis): the statement we hope or suspect is true instead of H0.

Hypotheses always refer to some population or model, not to a particular outcome.

Hypotheses can be one-sided or two-sided:
- One-sided hypothesis: covers just part of the range for your
parameter:
$H_0: \mu = 10,\ H_a: \mu > 10$ OR $H_0: \mu = 10,\ H_a: \mu < 10$
- Two-sided hypothesis: covers the whole possible range for your parameter:
$H_0: \mu = 10,\ H_a: \mu \neq 10$

Even though Ha is what we hope or believe to be true, our test gives evidence for or against H0 only. We never prove H0 true. We can only state either that we have enough evidence to reject H0 (which is evidence in favor of Ha, but not proof that Ha is true) or that we don't have enough evidence to reject H0.

Example (Exercise 6.37, p. 418): Each of the following situations requires a significance test about a population mean μ. State the appropriate null hypothesis H0 and alternative hypothesis Ha in each case.
a. Census Bureau data show that the mean household income in the area served by a shopping mall is $72,500 per year. A market research firm questions shoppers at the mall to find out whether the mean household income of mall shoppers is higher than that of the general population.
b. Last year your company's service technicians took an average of 18 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year's data show a different average response time?

STEP 2: Calculate the value of the test statistic.
A test statistic measures the compatibility between H0 and the data. The formula for the test statistic will vary between different types of problems. In problems like those we studied in Chapter 6, the test statistic will be the z-score.

STEP 3: Draw a picture of what Ha looks like and find the P-value.
P-value: the probability, computed assuming that H0 is true, that the test statistic would take a value as extreme as or more extreme than the one actually observed, due to random fluctuation. It is a measure of how unusual your sample results are.
- The smaller the P-value, the stronger the evidence against H0 provided by the data.
- Calculate the P-value by using the sampling distribution of the test statistic (only the normal distribution for Chapter 6).

STEP 4: Compare your P-value to a significance level. State your conclusion about
the data in a sentence.
- Compare the P-value to a significance level α.
- If the P-value ≤ α, we can reject H0.
- If you can reject H0, your results are significant.
- If you do not reject H0, your results are not significant.

To test the hypothesis $H_0: \mu = \mu_0$ based on an SRS of size n from a population with unknown mean μ and known standard deviation σ, compute the test statistic

$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

In terms of a variable Z having the standard normal distribution, the P-value for a test of H0 against

$H_a: \mu > \mu_0$ is $P(Z \ge z)$
$H_a: \mu < \mu_0$ is $P(Z \le z)$
$H_a: \mu \neq \mu_0$ is $2P(Z \ge |z|)$

These P-values are exact if the population is normally distributed and are approximately correct for large n in other cases.

Examples

1. Last year the government made a claim that the average income of the American people was $33,950. However, a sample of 50 people taken recently showed an average income of $34,076, with a population standard deviation of $324. Is the government's estimate too low? Conduct a significance test to see if the true mean is more than the reported average. Use α = 0.01.

2. An environmentalist collects a liter of water from 45 different locations along the banks of a stream. He measures the amount of dissolved oxygen in each specimen. The mean oxygen level is 4.62 mg, with an overall standard deviation of 0.92. A water purifying company claims that the mean level of oxygen in the water is 5 mg. Conduct a hypothesis test with α = 0.001 to determine whether the mean oxygen level is less than 5 mg.

How does α relate to confidence intervals?
If you have a 2-sided test, and if α and the confidence level C add to 100%, you can reject H0 if μ0 (the number you were checking) is not in the confidence interval.

Example: An agroeconomist examines the cellulose content of a variety of alfalfa hay. Suppose that the cellulose content in the population has a standard deviation of 8 mg. A sample of 15 cuttings has a mean cellulose content of 145 mg.
a. A previous study claimed that the mean cellulose content was 140 mg. Perform a hypothesis test to determine if the mean cellulose content is different from 140 mg, using α = 0.05.
b. Find a 95% confidence interval for the mean
cellulose content.
c. Now try the test from part (a) again, using the confidence interval from part (b) to do the hypothesis test. The result should be the same.

Annual Drinking Water Quality Report 2004, Town of Brookston, IN
"I'm pleased to report that our drinking water is safe and meets federal and state requirements."

Test Results
MCL is the maximum contaminant level: the highest level of a contaminant that is allowed in drinking water.

Contaminant            Violation (Y/N)   Level detected   Unit of measurement   MCL
Beta/photon emitters   N                 2.1 ± 3.2        mrem/yr               4
Alpha emitters         N                 0 ± 1.6          pCi/l                 15
Barium                 N                 02 1 6           ppm                   2
Copper                 N                 0.039 to 0.45    ppm                   1.3
Fluoride               N                 0.01             ppm                   4
Sodium                 N                 0.0              ppm                   N/A

One of these violation reports should actually be a "yes" instead of a "no." Which one is it, and why? What hypotheses go along with these confidence intervals?

Note: When I called the Town of Brookston office to ask them about this, the water manager called the state EPA office to get more information. What they told him was that yes, technically I was correct, but that they don't use the confidence intervals that are reported. Apparently these are the FEDERAL EPA rules: they only use the mean. I tried to get the sample size or other information, but I wasn't able to learn anything more.

P-values can be more informative than a reject/do-not-reject decision about H0 based on α. As the P-value gets smaller, the evidence for rejecting H0 gets stronger.

Just because we use α = 0.05 a lot doesn't mean that's the level you have to use; it's just the most common. There's nothing particularly special about that level.

In a large sample, even tiny deviations from the null hypothesis can be statistically significant, even if they are not practically important.

If we fail to reject H0, it may be because H0 is true or because our sample size is insufficient to detect the alternative.

Plot your data and look at your P-value to determine your conclusions. Could outliers be part of the problem?

A confidence interval actually estimates the size of an effect rather than simply asking whether it is too large to reasonably occur by chance alone.

You must have a
well-designed experiment in order for statistical inference to work. Randomization is important.
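The z confidence interval and sample-size formulas from the confidence interval section can be checked with a short Python sketch. It assumes Python 3.8+ and uses the standard-library statistics.NormalDist in place of Table D; the helper names z_star, z_interval, and sample_size are hypothetical, not part of SPSS or the course materials. Because inv_cdf returns unrounded critical values (about 1.6449 rather than 1.645), results may differ from hand calculations in the last decimal place. For the beer example it treats the reported 9 beers as the known σ, which is an assumption the Chapter 6 method requires.

```python
import math
from statistics import NormalDist, mean


def z_star(confidence):
    """Critical value z* with area `confidence` between -z* and z* under N(0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_interval(xbar, sigma, n, confidence):
    """Return (lower, upper, margin) for the interval xbar +/- z* * sigma / sqrt(n)."""
    margin = z_star(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin, margin


def sample_size(sigma, margin, confidence):
    """Smallest n with margin of error <= `margin`: n = (z* sigma / m)^2, rounded up."""
    return math.ceil((z_star(confidence) * sigma / margin) ** 2)


# Example 2: Exam 1 scores, population sigma given as 10
scores = [78, 62, 99, 85, 94, 53, 88, 90, 86, 92, 75, 92]
xbar = mean(scores)
for c in (0.90, 0.95, 0.99):
    lower, upper, margin = z_interval(xbar, sigma=10, n=len(scores), confidence=c)
    print(f"{c:.0%} CI: ({lower:.2f}, {upper:.2f}), margin of error {margin:.2f}")

# Example 1: beer survey, treating 9 beers as the known sigma (assumption)
lower, upper, margin = z_interval(22, sigma=9, n=30, confidence=0.90)
print(f"90% CI for beers: ({lower:.2f}, {upper:.2f}), margin of error {margin:.2f}")
print("n needed for m = 1 at 90%:", sample_size(sigma=9, margin=1, confidence=0.90))
```

The loop prints one interval per confidence level, so the growth of the margin of error with C (the question in part (d) of Example 2) is visible directly.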

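Steps 2 through 4 of the one-sample z test can be sketched the same way, again assuming Python 3.8+ and statistics.NormalDist in place of the normal table. The names z_test and decide are hypothetical. The sketch computes $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, finds the P-value for each form of Ha listed in the z-test box, and compares it to α. The oxygen example treats the reported 0.92 as σ, which is an assumption.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test(xbar, mu0, sigma, n, alternative):
    """One-sample z test of H0: mu = mu0 with known sigma.

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0). Returns (z, p_value).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if alternative == "greater":
        p = 1 - STANDARD_NORMAL.cdf(z)             # P(Z >= z)
    elif alternative == "less":
        p = STANDARD_NORMAL.cdf(z)                 # P(Z <= z)
    elif alternative == "two-sided":
        p = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))  # 2 P(Z >= |z|)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p


def decide(p_value, alpha):
    """Step 4: reject H0 when the P-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# Income example: Ha: mu > 33950, alpha = 0.01
z1, p1 = z_test(34076, 33950, sigma=324, n=50, alternative="greater")
print(f"income: z = {z1:.2f}, P = {p1:.4f} -> {decide(p1, 0.01)}")

# Oxygen example: Ha: mu < 5, alpha = 0.001, treating 0.92 as sigma (assumption)
z2, p2 = z_test(4.62, 5, sigma=0.92, n=45, alternative="less")
print(f"oxygen: z = {z2:.2f}, P = {p2:.4f} -> {decide(p2, 0.001)}")
```

With these numbers the income test rejects H0 at α = 0.01, while the oxygen test, despite a P-value well under 0.05, does not reject H0 at the stricter α = 0.001.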

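The link between a two-sided test at level α and a level C = 1 − α confidence interval can be illustrated with the alfalfa example (σ = 8 mg, n = 15, $\bar{x}$ = 145 mg). This sketch assumes Python 3.8+; two_sided_p and z_interval are hypothetical helpers, and the list of μ0 values is arbitrary, chosen away from the interval endpoints so floating-point rounding cannot affect the comparison.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def two_sided_p(xbar, mu0, sigma, n):
    """P-value 2 P(Z >= |z|) for H0: mu = mu0 versus Ha: mu != mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - Z.cdf(abs(z)))


def z_interval(xbar, sigma, n, confidence):
    """Level-C interval xbar +/- z* * sigma / sqrt(n)."""
    zs = Z.inv_cdf(1 - (1 - confidence) / 2)
    margin = zs * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


xbar, sigma, n, alpha = 145, 8, 15, 0.05
lo, hi = z_interval(xbar, sigma, n, 1 - alpha)
p = two_sided_p(xbar, 140, sigma, n)
print(f"95% CI: ({lo:.2f}, {hi:.2f}); P-value for mu0 = 140: {p:.4f}")

# The test decision and the "is mu0 outside the CI?" check agree for each mu0.
for mu0 in [138, 140, 141, 145, 149, 150]:
    reject_by_test = two_sided_p(xbar, mu0, sigma, n) <= alpha
    reject_by_ci = not (lo <= mu0 <= hi)
    assert reject_by_test == reject_by_ci, mu0
```

The two checks agree because μ0 lies outside the interval exactly when |z| > z*, which is exactly when the two-sided P-value falls below α.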