This 9-page set of class notes was uploaded by an elite notetaker on Friday, February 6, 2015.

Stat 528 (Autumn 2008)
Inference for a single proportion
Reading: Section 8.1

• Review of counts and proportions
  - Means and stdevs of counts/proportions
  - Approximate sampling distributions
• The significance test for a single proportion
• Coin flipping example
• Confidence intervals for a single proportion
  - 1. Inverting a family of hypothesis tests
  - 2. The plug-in method (in textbook)
  - 3. The "p = 1/2, make the standard error big" method
  - 4. An exact method using the binomial distribution

A motivating example

An entomologist samples a field for egg masses of a harmful insect by placing a yard-square frame at random locations and carefully inspecting the ground within the frame. An SRS of 75 locations selected from a county's pastureland found egg masses in 13 locations.

• Compute the proportion of locations which contain egg masses.
• Is the proportion of locations with egg masses at least 0.10?
• Form a 90% confidence interval for the proportion of all possible locations that are infested.

Review: Counts and proportions

• Suppose we ask a random sample of size $n$ a yes (1) / no (0) question. Let the RV $X$ denote the number of "yes" answers.
• The sample proportion is $\hat{p} = X/n$.
• Assume the following:
  1. There is a fixed number $n$ of observations in the sample.
  2. The $n$ observations are all independent.
  3. There are only two outcomes for each observation.
  4. The success probability $p$ is constant for each observation.
• Then $X$ has a Binomial $B(n, p)$ distribution, where $p$ is the true population proportion.

Review: Means and stdevs of counts/proportions

• For the count $X$ we have $\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$.
• For the sample proportion $\hat{p}$ we have $\mu_{\hat{p}} = p$ (thus $\hat{p}$ is an unbiased estimator of $p$) and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
• The standard deviations $\sigma_X$ and $\sigma_{\hat{p}}$ both depend on the population proportion $p$.

Review: Approximate sampling distributions

• IF $np \ge 10$ and $n(1-p) \ge 10$:
  1. The number (count) of successes in the sample, $X$, has approximately a $N(np, \sqrt{np(1-p)})$ distribution.
  2. Also, the sample proportion of successes, $\hat{p}$, is approximately a $N(p, \sqrt{p(1-p)/n})$ RV.
• Problem: $\sigma_X$ and $\sigma_{\hat{p}}$ both depend on the population proportion $p$.
  - We can handle this trivially when conducting a hypothesis test.
  - We need to think carefully about this when forming confidence intervals.

The significance test for a population proportion

• Hypotheses: $H_0: p = p_0$ for some constant $p_0$, versus $H_a: p \ne p_0$, OR $p > p_0$, OR $p < p_0$.
• The test statistic is
  $$ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} $$
• Check that the population size is at least ten times the size of the sample and that $np_0 \ge 10$ and $n(1-p_0) \ge 10$.
• If so, then under $H_0$ the test statistic approximately follows a $N(0, 1)$ distribution.

Test for a population proportion (cont.)

• P-value: Find the appropriate area under the standard normal ($Z$) density curve. As usual:
  - For $H_a: p < p_0$, the P-value is $P(Z \le z)$.
  - For $H_a: p > p_0$, the P-value is $P(Z \ge z)$.
  - For $H_a: p \ne p_0$, the P-value is $2P(Z \ge |z|)$.
  This P-value is approximate, since the distribution of the test statistic is approximately normal.
• For a test of significance at the level $\alpha$:
  - If the P-value $\le \alpha$, we reject $H_0$.
  - If the P-value $> \alpha$, we fail to reject $H_0$.

Coin flipping example

The English mathematician John Kerrich flipped a coin 10000 times and obtained 5067 heads.

• Do the data provide evidence, significant at the 5% level, that the probability that Kerrich's coin comes up heads is not 0.5?

Confidence intervals for a proportion

• Using similar arguments to deriving a confidence interval for the population mean, a $100(1-\alpha)\%$ CI for the population proportion $p$ is found by inverting a family of hypothesis tests. If the standard error did not depend on $p$, we could write
  $$ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} $$
• The standard error (SE) of $\hat{p}$ is $\sqrt{p(1-p)/n}$.
• But we don't have a value for $p$. What can we do?
  - There are many common ways to solve this problem. The text discusses a couple of them.

1. Inverting a family of hypothesis tests

• We use reasoning identical to that used when constructing a confidence interval for the population mean.
  - Begin with a family of hypothesis tests.
  - A member of the family is $H_0: p = p_0$ against $H_A: p \ne p_0$. The tests are all performed at a specified level $\alpha$.
  - Collect together all values of $p_0$ that are not rejected by the hypothesis test.
  - This set of values is the $100(1-\alpha)\%$ confidence interval for $p$.
• The usual test is based on the large-sample approximation of the sampling distribution of $\hat{p}$.
• The points of indecision occur when
  $$ z_{\alpha/2} = \frac{|\hat{p} - p_0|}{\sqrt{p_0(1-p_0)/n}} $$
• Solving for $p_0$, the confidence interval is given by (writing $z$ for $z_{\alpha/2}$)
  $$ \frac{\hat{p} + \dfrac{z^2}{2n} \pm z \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}} $$

2. The plug-in method

• A second approach is based on introducing another level of approximation. The standard error is estimated by plugging $\hat{p}$ in for $p$ to get
  $$ SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$
• Studies show that if we form a $C\%$ CI for $p$ using this SE, then we tend to end up with a true confidence level that is less than $C\%$.
  - We say that we obtain a liberal CI for $p$.
  - For a fixed value of $p$, the intervals tend to become less liberal as $n$ increases.
  - These intervals match the method 1 intervals as $n$ tends to $\infty$.

3. Wilson's estimate: the "plus four" method

• We add two successes and add two failures. Let
  $$ \tilde{p} = \frac{X + 2}{n + 2 + 2} = \frac{X + 2}{n + 4} $$
  and let the estimated standard error be
  $$ SE(\tilde{p}) = \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n + 4}} $$
  - These intervals match the method 1 intervals as $n$ tends to $\infty$.
• This method has shown a resurgence of popularity. To see where it comes from, begin with method 1, plug in 2 for $z_{\alpha/2}$, multiply numerator and denominator by $n$, and discard a term or two while simplifying.

4. The "p = 1/2, make the standard error big" method

• $p(1-p)$ is largest when $p = 1/2$, and so a conservative value for the standard error is
  $$ \sqrt{\frac{(1/2)(1/2)}{n}} = \frac{1}{2\sqrt{n}} $$
• If we calculate a $C\%$ CI for $p$ using this SE, then we end up with a true confidence level that is more than $C\%$.
  - This is a conservative confidence interval for $p$.
  - For a fixed value of $p$, the intervals become less conservative as $n$ increases.
  - These intervals do not match the method 1 intervals as $n$ tends to $\infty$.
• This method is often used when reporting the results of a large survey, e.g., "the results in the poll have a margin of error due to sampling of ±3%".

5. Using the binomial distribution

• There is an exact calculation that uses the fact that $X$ has a Binomial distribution.
• Difficult to calculate by hand.
• This is the method that MINITAB uses as the default in Stat > Basic Statistics > 1 Proportion.
• Note: this "exact" method produces approximate confidence intervals, because it is hard to get a certain confidence level, $C$, with a discrete RV. Thus the intervals tend to be conservative.
• If you check the box "Use test and interval based on the normal distribution" under Options, Minitab will use a normal approximation for the hypothesis test (as in method 1) and will use the plug-in method (method 2) for the confidence interval. Though rare, the two methods may conflict (try a two-tailed test of $H_0: p = 0.5$ with data consisting of 57 successes in 95 trials).

Returning to the entomology example

• Form a 90% confidence interval for the proportion of all possible locations that are infested. Compare the four methods for calculating the confidence interval. Which one would you pick?

Coin flipping example (cont.)

• 5067 heads out of 10000 flips corresponds to a proportion of 0.5067. Using MINITAB, calculate the number of flips that Kerrich would have needed to make to detect a population proportion value of 0.5067 with at least 80% power.

Test for One Proportion

Testing proportion = 0.5 (versus not = 0.5)
Alpha = 0.05

Alternative   Sample   Target
 Proportion     Size    Power   Actual Power
     0.5067    43710      0.8       0.800007
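The coin flipping calculations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `one_prop_z_test` and `sample_size_for_power` are hypothetical, and the sample-size formula is the common normal-approximation one, which is not necessarily the exact routine Minitab runs (although it gives the same sample size here).

```python
from math import sqrt, ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def one_prop_z_test(x, n, p0):
    """Two-sided large-sample z test of H0: p = p0 (hypothetical helper)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

def sample_size_for_power(p0, p1, alpha=0.05, power=0.80):
    """Approximate n for a two-sided test of p0 to detect p1.

    Assumption: the usual normal-approximation formula, which ignores
    the negligible chance of rejecting in the wrong tail.
    """
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    root_n = (z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))) / abs(p1 - p0)
    return ceil(root_n ** 2)

# Kerrich's data: 5067 heads in 10000 flips, H0: p = 0.5
z, p_value = one_prop_z_test(5067, 10000, 0.5)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.3f}")

# Flips needed to detect p = 0.5067 with 80% power at alpha = 0.05
print("flips needed:", sample_size_for_power(0.5, 0.5067))
```

With Kerrich's data the z statistic is 1.34, so the two-sided P-value is well above 0.05 and the test does not reject a fair coin; the sample-size function returns 43710, in line with the Minitab output above.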

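For the entomology example (13 infested locations out of 75, 90% confidence), the large-sample intervals from methods 1 to 4 can be compared side by side. The sketch below is illustrative only: `proportion_intervals` and the dictionary keys are hypothetical names, and the exact binomial method (method 5) is left out.

```python
from math import sqrt
from statistics import NormalDist

def proportion_intervals(x, n, conf=0.90):
    """Return {method: (lower, upper)} for four large-sample intervals.

    Hypothetical helper; the formulas follow methods 1-4 of the notes.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    p_hat = x / n

    # Method 1: invert the family of z tests.
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    inverted = (centre - half, centre + half)

    # Method 2: plug p_hat into the standard error.
    se = sqrt(p_hat * (1 - p_hat) / n)
    plug_in = (p_hat - z * se, p_hat + z * se)

    # Method 3: plus four (add two successes and two failures).
    p_tilde = (x + 2) / (n + 4)
    se_tilde = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    plus_four = (p_tilde - z * se_tilde, p_tilde + z * se_tilde)

    # Method 4: conservative standard error, using p = 1/2.
    se_big = 1 / (2 * sqrt(n))
    conservative = (p_hat - z * se_big, p_hat + z * se_big)

    return {
        "inverted": inverted,
        "plug-in": plug_in,
        "plus four": plus_four,
        "conservative": conservative,
    }

for name, (lower, upper) in proportion_intervals(13, 75, conf=0.90).items():
    print(f"{name:>12}: ({lower:.4f}, {upper:.4f})")
```

The endpoints of the "inverted" interval are exactly the values of $p_0$ at which the z test is undecided at level $\alpha = 0.10$, which is a convenient check on the method 1 formula.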

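The conflict mentioned under method 5 (57 successes in 95 trials, $H_0: p = 0.5$) can be reproduced directly. This is a minimal sketch that assumes a 5% level and a 95% plug-in interval; the variable names are illustrative, and it imitates the normal-based option described in the notes rather than calling Minitab.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# Data from the notes; alpha = 0.05 is an assumption made for this sketch.
x, n, p0, alpha = 57, 95, 0.5, 0.05
p_hat = x / n
z_crit = Z.inv_cdf(1 - alpha / 2)

# Hypothesis test: the standard error uses p0, as in method 1.
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - Z.cdf(abs(z_stat)))
reject = p_value <= alpha

# Confidence interval: the standard error uses p_hat (plug-in, method 2).
se_hat = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)
ci_contains_p0 = ci[0] <= p0 <= ci[1]

print(f"z = {z_stat:.4f}, P-value = {p_value:.4f}, reject H0: {reject}")
print(f"plug-in CI = ({ci[0]:.4f}, {ci[1]:.4f}), contains p0: {ci_contains_p0}")
```

The test fails to reject $H_0$ while the plug-in interval excludes 0.5. The disagreement arises because the test computes the standard error at $p_0$ and the interval computes it at $\hat{p}$.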