Date Created: 02/06/15
Stat 528, Autumn 2008
Comparing two proportions
Reading: Section 8.2

• Comparing two proportions: a motivating example
• Mean and variance of the difference in sample proportions
• Sampling distribution for the difference in sample proportions
• The significance test for a difference in proportions
  - The variance of the difference under $H_0$
• Approximations for the standard error of the difference
• A confidence interval for the difference in proportions

Comparing two proportions: a motivating example

A study was designed to find reasons why patients leave a health maintenance organization (HMO). Patients were classified as to whether or not they had filed a complaint with the HMO. We want to compare the proportion of complainers who leave the HMO with the proportion of those who do not file complaints. In the year of the study, 639 patients filed complaints, and 54 of these patients left the HMO voluntarily. For comparison, the HMO chose an SRS of 743 patients who did not file complaints. Twenty-two of these patients left voluntarily.

• Is there a difference in the two proportions?
• Provide a 95% confidence interval for the difference in the two proportions.

Comparing two proportions

• Let $p_1$ denote the proportion of successes for population 1 and let $p_2$ be the success proportion for population 2.
• Suppose we have an SRS of size $n_1$ from population 1 and an independent SRS of size $n_2$ from population 2. Let $X_1$ denote the number of successes in the sample from population 1 and $\hat{p}_1 = X_1 / n_1$ be the associated sample proportion. Let $X_2$ denote the number of successes in the sample from population 2, with sample proportion $\hat{p}_2 = X_2 / n_2$.
• Let $D = \hat{p}_1 - \hat{p}_2$ be the difference between the two sample proportions.

Mean and variance for the difference in sample proportions

• By the rules for means,

  $$\mu_D = \mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2 .$$

• Thus $D$ is an unbiased estimator for the difference in the population proportions.
• By independence of the samples and the rules for variances,

  $$\sigma_D^2 = \sigma^2_{\hat{p}_1 - \hat{p}_2} = (1)^2 \sigma^2_{\hat{p}_1} + (-1)^2 \sigma^2_{\hat{p}_2} = \sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2} = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2} .$$

Sampling distribution for the difference in sample proportions

• For large $n_1$ and $n_2$, $D$ is approximately normal with mean $p_1 - p_2$ and standard deviation $\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}$:

  $$D \approx N\!\left(p_1 - p_2,\ \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}\right).$$

• If $SE(D)$ did not depend on $p_1$ and $p_2$, a test would be based on

  $$Z = \frac{\hat{p}_1 - \hat{p}_2 - \delta_0}{SE(D)},$$

  where $\delta_0$ is the hypothesized value of $p_1 - p_2$.
• A $100(1 - \alpha)\%$ CI for the difference in the population proportions, $p_1 - p_2$, would be

  $$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\, SE(D).$$

• Since $p_1$ and $p_2$ are unknown in practice, we need to approximate $SE(D)$.

The significance test for a difference in proportions

• Hypotheses: $H_0: p_1 - p_2 = 0$ (equivalently, $p_1 = p_2$) versus $H_a: p_1 - p_2 < 0$ OR $p_1 - p_2 \neq 0$ OR $p_1 - p_2 > 0$.
• The test statistic is

  $$z = \frac{\hat{p}_1 - \hat{p}_2}{SE(D_p)},$$

  where $SE(D_p)$ is the standard error based on the pooled estimate of the common value of $p_1$ and $p_2$, as we now explain.

The variance under $H_0$

• Under $H_0: p_1 = p_2$, let $p = p_1 = p_2$ be the common population parameter.
• Then $X_1$ is a $B(n_1, p)$ RV and $X_2$ is a $B(n_2, p)$ RV.
• These two RVs are independent, and so $X_1 + X_2$ is a $B(n_1 + n_2, p)$ RV.
• An estimate of $p$ is

  $$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2},$$

  and so

  $$SE(D_p) = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} .$$

The test for a difference in proportions (cont.)

• Check:
  1. $n_1 \hat{p}_1 \geq 10$ and $n_1(1 - \hat{p}_1) \geq 10$
  2. $n_2 \hat{p}_2 \geq 10$ and $n_2(1 - \hat{p}_2) \geq 10$
• Then, under $H_0$, the test statistic approximately follows a $N(0, 1)$ distribution.
• P-value: Using Table A, calculate the area under the $Z$ distribution curve.
  - For $H_a: p_1 - p_2 < 0$, the P-value is $P(Z \leq z)$.
  - For $H_a: p_1 - p_2 > 0$, the P-value is $P(Z \geq z)$.
  - For $H_a: p_1 - p_2 \neq 0$, the P-value is $2P(Z \geq |z|)$.

The test for a difference in proportions (cont.)

• The P-value is approximate since the distribution of the test statistic is approximately normal.
• For a test of significance at the level $\alpha$:
  - If the P-value $\leq \alpha$, we reject $H_0$.
  - If the P-value $> \alpha$, we fail to reject $H_0$.

Testing in the HMO example

• We expect a higher proportion of complainers to leave. Do the data support this belief?

From tests to intervals

• Hypothesis tests: The hypothesis test for "no difference" was simplified by the presumption of a common proportion under the null. Other hypothesized differences are more difficult. How would you estimate $p_1$ and $p_2$ under the restriction that $p_2 = p_1 + \delta$ for some specified $\delta$?
• Confidence intervals: Without the base of a solid family of hypothesis tests, the construction of a confidence interval becomes fuzzy.
  - Idea: Since the plug-in method worked for a single proportion, we could try the same for two proportions. This idea motivates the most commonly used confidence interval for the difference between two proportions.
  - Alternatively, we could patch up the interval a little with a parallel to the "plus four" method.

Approximations for $SE(D)$

1. Plug-in method:

   $$SE(D) = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} .$$

2. Wilson's "plus four" estimate: We add one success and one failure to each sample. Let

   $$\tilde{p}_1 = \frac{X_1 + 1}{n_1 + 2} \quad \text{and} \quad \tilde{p}_2 = \frac{X_2 + 1}{n_2 + 2} .$$

   The estimated standard error is

   $$SE(D) = \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}} .$$

The confidence interval

• An approximate $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is given by

  $$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\, SE(D),$$

  where $SE(D)$ is given by the plug-in method.
• Alternatively (rarely used), an approximate $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is given by

  $$\tilde{p}_1 - \tilde{p}_2 \pm z_{\alpha/2}\, SE(D),$$

  where $SE(D)$ is given by the plus-four method.

A confidence interval in the HMO example

• Form a 95% confidence interval for the difference in the two proportions.

Other issues

• Power calculations: The calculations follow the same methodology as the earlier power calculations. With Minitab, use the command sequence Stat > Basic Statistics > 2 Proportions.
• Many studies investigate small proportions. Does a particular prescription increase the risk of death due to cardiovascular disease? In these settings, inference is often about the ratio of the two proportions, $p_2 / p_1$. In this setting, "no difference" corresponds to a ratio of 1.
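
The two HMO questions (the one-sided test of whether complainers leave at a higher rate, and the 95% confidence interval) are left open in the notes. The following is a minimal Python sketch of how the formulas in these notes could be applied to the HMO counts. It is not part of the course materials: the function names pooled_z_test, plug_in_ci and plus_four_ci are hypothetical, the normal areas come from Python's statistics.NormalDist rather than Table A, and population 1 is assumed to be the complainers.

```python
from math import sqrt
from statistics import NormalDist

# HMO data from the notes: population 1 = complainers (assumed labeling),
# population 2 = patients who did not file complaints.
X1, N1 = 54, 639
X2, N2 = 22, 743


def pooled_z_test(x1, n1, x2, n2):
    """z statistic and one-sided P-value for H0: p1 = p2 versus Ha: p1 > p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    p_value = 1 - NormalDist().cdf(z)  # upper-tail area P(Z >= z)
    return z, p_value


def plug_in_ci(x1, n1, x2, n2, level=0.95):
    """Plug-in confidence interval for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


def plus_four_ci(x1, n1, x2, n2, level=0.95):
    """Plus-four interval: add one success and one failure to each sample."""
    p1_t, p2_t = (x1 + 1) / (n1 + 2), (x2 + 1) / (n2 + 2)
    se = sqrt(p1_t * (1 - p1_t) / (n1 + 2) + p2_t * (1 - p2_t) / (n2 + 2))
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_t - p2_t
    return diff - z_star * se, diff + z_star * se


z, p_value = pooled_z_test(X1, N1, X2, N2)
ci_plug = plug_in_ci(X1, N1, X2, N2)
ci_plus4 = plus_four_ci(X1, N1, X2, N2)

print(f"z = {z:.2f}, one-sided P-value = {p_value:.2g}")
print(f"95% plug-in CI:   ({ci_plug[0]:.4f}, {ci_plug[1]:.4f})")
print(f"95% plus-four CI: ({ci_plus4[0]:.4f}, {ci_plus4[1]:.4f})")
```

With these counts the statistic comes out near z = 4.46, so the one-sided P-value is far below any conventional level, and the plug-in interval is roughly (0.030, 0.080). The plus-four interval is nearly the same because both samples are large.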

