# Class Note for STAT 528 at OSU 02


Date Created: 02/06/15

Stat 528, Autumn 2008, Elly Kaizar

## Inference for a single proportion

Reading: Section 8.1

- Review of counts and proportions
  - Means and stdevs of counts and proportions
  - Approximate sampling distributions
- The significance test for a single proportion
- Coin flipping example
- Confidence intervals for a single proportion
  1. Inverting a family of hypothesis tests
  2. The plug-in method (in textbook)
  3. The "$p = 1/2$ (make the standard error big)" method
  4. An exact method using the binomial distribution

## A motivating example

An entomologist samples a field for egg masses of a harmful insect by placing a yard-square frame at random locations and carefully inspecting the ground within the frame. An SRS of 75 locations selected from a county's pastureland found egg masses in 13 locations.

- Compute the proportion of locations which contain egg masses.
- Is the proportion of locations with egg masses at least 0.10?
- Form a 90% confidence interval for the proportion of all possible locations that are infested.

## Review: Counts and proportions

- Suppose we ask a random sample of size $n$ a yes (1) / no (0) question. Let the RV $X$ denote the number of "yes" answers.
- The sample count is $X$.
- The sample proportion is $\hat{p} = X/n$.
- Suppose the following are true:
  1. There is a fixed number $n$ of observations in the sample.
  2. The $n$ observations are all independent.
  3. There are only two outcomes for each observation.
  4. The success probability $p$ is constant for each observation.
- Then $X$ has a Binomial$(n, p)$ distribution, where $p$ is the true population proportion.

## Review: Means and stdevs of counts and proportions

- For the count $X$, we have $\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$.
- For the sample proportion, $\mu_{\hat{p}} = p$, thus $\hat{p}$ is an unbiased estimator of $p$, and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$.
- The standard deviations $\sigma_X$ and $\sigma_{\hat{p}}$ both depend on the population proportion $p$.

## Review: Approximate sampling distributions

- IF $np \ge 10$ (or 5) and $n(1-p) \ge 10$ (or 5):
  1. The number (count) of successes in the sample, $X$, has approximately a $N\left(np, \sqrt{np(1-p)}\right)$ distribution.
  2. Also, the sample proportion of successes, $\hat{p}$, is approximately a $N\left(p, \sqrt{p(1-p)/n}\right)$ RV.
- Problem: $\sigma_X$ and $\sigma_{\hat{p}}$ both depend on the population proportion $p$.
  - We can handle this trivially when conducting a hypothesis test.
  - We need to think carefully about this when forming confidence intervals.

## The significance test for a population proportion

- Hypotheses: $H_0: p = p_0$ for some constant $p_0$, versus $H_a: p \ne p_0$ OR $p > p_0$ OR $p < p_0$.
- A test statistic (sometimes called the score test statistic):

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

- Check that the population size is at least ten times the size of the sample and that $np_0 \ge 10$ and $n(1-p_0) \ge 10$.
- If so, then under $H_0$ the test statistic approximately follows a $N(0, 1)$ distribution.
- Another test statistic (sometimes called the Wald test statistic):

$$z_W = \frac{\hat{p} - p_0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}$$

## Test for a population proportion (cont.)

- P-value: Find the appropriate area under the standard normal ($Z$) density curve. As usual:
  - For $H_a: p < p_0$, the P-value is $P(Z \le z)$.
  - For $H_a: p > p_0$, the P-value is $P(Z \ge z)$.
  - For $H_a: p \ne p_0$, the P-value is $2P(Z \ge |z|)$.
  - This P-value is approximate, since the distribution of the test statistic is only approximately normal.
- For a test of significance at the level $\alpha$:
  - If the P-value $\le \alpha$, we reject $H_0$.
  - If the P-value $> \alpha$, we fail to reject $H_0$.

## Coin flipping example

The English mathematician John Kerrich flipped a coin 10,000 times and obtained 5067 heads.

- Do the data provide evidence, significant at the 5% level, that the probability that Kerrich's coin comes up heads is not 0.5?

## Confidence intervals for a proportion

- Consider a population parameter $\theta$. If an unbiased estimator $\hat{\theta}$ has an approximately normal distribution, an approximate $100(1-\alpha)\%$ CI for $\theta$ is

$$\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$$

- Thus, for a population proportion, an approximate $100(1-\alpha)\%$ CI for $p$ is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

- But we don't have a value for $p$. What can we do? There are many ways to solve this problem. We focus on inverting a family of hypothesis tests.

## Inverting a family of hypothesis tests

We use reasoning identical to that used when constructing a two-sided confidence interval for the population mean.

- Begin with a family of hypothesis tests.
- A member of the family is $H_0: p = p_0$ against $H_a: p \ne p_0$.
  - Different $H_a$ result in one-sided intervals.
  - The tests are all performed at a specified level $\alpha$.
- Collect together all values of $p_0$ that are not rejected by the hypothesis test.
- This set of values is the $100(1-\alpha)\%$ confidence interval for $p$.

## Inverting a family of hypothesis tests (cont.)

- If the test statistic is the score statistic

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

  then, based on the large-sample normal approximation of the sampling distribution of $\hat{p}$, the confidence interval is given by

$$\frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + \dfrac{z_{\alpha/2}^2}{n}}$$

- If the test statistic is the Wald statistic

$$z_W = \frac{\hat{p} - p_0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}$$

  then, based on the large-sample normal approximation of the sampling distribution of $\hat{p}$, the confidence interval is given by

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

  This is also called a plug-in interval.
- If the sample size is large, are these very different?

## About the inverted Wald test

Studies show that if we attempt to get a $100(1-\alpha)\%$ CI for $p$ using the inverted Wald test, then we tend to end up with a CI that has a true confidence level that is less than $100(1-\alpha)\%$.

- We say that we obtain a liberal CI for $p$.
- For a fixed value of $p$, the intervals tend to become less liberal as $n$ increases.

## Approximating the inverted score test interval

- Consider the inverted score test interval

$$\frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + \dfrac{z_{\alpha/2}^2}{n}}$$

- Approximate $z_{\alpha/2}$ with 2 (that is, aim for roughly 95% confidence):

$$\frac{\hat{p} + \dfrac{2}{n} \pm 2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{1}{n^2}}}{1 + \dfrac{4}{n}}$$

- We "add two successes and add two failures" by noting that the center of the interval is

$$\tilde{p} = \frac{\hat{p} + 2/n}{1 + 4/n} = \frac{X + 2}{n + 4}$$

  and further approximating the standard error by

$$SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$$

- This interval matches the score interval as $n$ tends to $\infty$.
- This method has shown a resurgence of popularity.

## Inverting a test based on the binomial distribution

- There is an exact calculation that uses the fact that $X$ has a Binomial distribution.
- It is difficult to calculate by hand.
- This is the method that MINITAB uses as the default in Stat > Basic Statistics > 1 Proportion.
- Note: this "exact" method produces approximate confidence intervals, because it is hard to get a certain confidence level with a discrete RV. Thus the intervals tend to be conservative.
- If you check the box "Use test and interval based on the normal distribution" under Options, Minitab will use the score test for the hypothesis test and the inverted Wald test for the confidence interval. Though rare, the two methods may conflict (try a two-tailed test of $H_0: p = 0.5$ with data consisting of 57 successes in 95 trials).

## Returning to the entomology example

- Form a 90% confidence interval for the proportion of all possible locations that are infested. Compare the four methods for calculating the confidence interval. Which one would you pick?

## Coin flipping example: sample size

- 5067 heads out of 10,000 flips corresponds to a proportion of 0.5067. How many flips would Kerrich have needed to make to detect a population proportion value of 0.5067 with at least 80% power?
- About how many flips would Kerrich have needed to create a confidence interval with a margin of error of 0.01?
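
The entomology comparison can be tried numerically. The following is a minimal sketch, not the course's own code: the function names are made up for illustration, the "exact" interval is assumed to be the Clopper-Pearson construction (each tail of the binomial set to $\alpha/2$), and the "make the standard error big" interval is assumed to mean replacing $\hat{p}(1-\hat{p})$ with its maximum value $1/4$.

```python
from math import comb, sqrt
from statistics import NormalDist


def z_star(conf):
    """Upper alpha/2 critical value of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def wald_interval(x, n, conf):
    """Plug-in (inverted Wald test) interval."""
    p_hat = x / n
    half = z_star(conf) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def score_interval(x, n, conf):
    """Inverted score test interval."""
    z = z_star(conf)
    p_hat = x / n
    center = p_hat + z**2 / (2 * n)
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (center - half) / denom, (center + half) / denom


def conservative_interval(x, n, conf):
    """Assumed reading of the p = 1/2 method: use the largest possible SE."""
    p_hat = x / n
    half = z_star(conf) * sqrt(0.25 / n)
    return p_hat - half, p_hat + half


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_interval(x, n, conf, tol=1e-10):
    """Clopper-Pearson interval found by bisection (an assumption about
    which exact method is meant)."""
    alpha = 1 - conf

    def solve(cdf_at, target):
        # cdf_at(p) is decreasing in p, so bisection finds the crossing.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x) = alpha/2, i.e. P(X <= x - 1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= x) = alpha/2.
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    x, n, conf = 13, 75, 0.90  # egg masses found in 13 of 75 locations
    for name, f in [("Wald", wald_interval), ("score", score_interval),
                    ("conservative", conservative_interval),
                    ("exact", exact_interval)]:
        lo, hi = f(x, n, conf)
        print(f"{name:>12}: ({lo:.4f}, {hi:.4f})")
```

For 13 successes in 75 trials at 90% confidence, the Wald interval comes out at roughly (0.101, 0.245) and the score interval at roughly (0.113, 0.256), so the score interval is shifted toward 1/2. Only the standard library is used so the sketch runs without a statistics package.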

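The Kerrich questions can also be worked through in code. This is a hedged sketch under stated assumptions: the function names are hypothetical, the power calculation uses the usual normal-approximation sample-size formula for a two-sided test at $\alpha = 0.05$ (ignoring the negligible far tail), and the margin-of-error calculation assumes a 95% interval with the worst-case guess $p^* = 0.5$, since the notes do not state a confidence level.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def score_test(x, n, p0):
    """Score z statistic and two-sided P-value for H0: p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value


def n_for_power(p0, p1, alpha=0.05, power=0.80):
    """Approximate n so a two-sided level-alpha score test of H0: p = p0
    has the requested power when the true proportion is p1."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    num = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((num / (p1 - p0)) ** 2)


def n_for_margin(margin, conf=0.95, p_guess=0.5):
    """Approximate n so the plug-in interval has the given margin of error.
    conf and p_guess are assumptions, not values given in the notes."""
    z = Z.inv_cdf(1 - (1 - conf) / 2)
    return ceil((z / margin) ** 2 * p_guess * (1 - p_guess))


if __name__ == "__main__":
    z, p_value = score_test(5067, 10000, 0.5)
    print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
    print("flips for 80% power:", n_for_power(0.5, 0.5067))
    print("flips for margin 0.01:", n_for_margin(0.01))
```

With 5067 heads in 10,000 flips the score statistic is $z = 0.0067 / 0.005 = 1.34$, giving a two-sided P-value of about 0.18, so the data do not reject $H_0: p = 0.5$ at the 5% level. Under the assumptions above, the margin-of-error calculation asks for 9604 flips, while detecting a difference as small as 0.0067 with 80% power takes several times more flips than Kerrich made.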
