Data Analysis II

by: Alison Vandervort

These 3 pages of class notes were uploaded by Alison Vandervort on Monday, September 21, 2015. The notes belong to STAT 529 at Ohio State University, taught by Staff in Fall.

Date Created: 09/21/15
STAT 529

The standard error of the proportion estimate

The computer simulations in Chapter 3 examine the robustness of the pooled t procedures against deviations from the normality and equal variance assumptions. The tabled values in Display 3.4 and Display 3.5 are the percentages of 95% confidence intervals for $\mu_1 - \mu_2$ in 1000 replications that successfully contained a prespecified true value of $\mu_1 - \mu_2$ using the pooled t. Each confidence interval was based on two independent samples from each of the distributions that the authors specified.

If the pooled t procedures were valid, the proportion of 95% confidence intervals containing the true value of $\mu_1 - \mu_2$ would be exactly 95%. If the actual coverage probability of 95% confidence intervals for each scenario can be mathematically obtained, any deviation from the nominal level 95% indicates that the procedure is not valid. As the exact coverage probabilities are difficult to get in general, the authors estimated them by the sample proportions of the confidence intervals successfully containing $\mu_1 - \mu_2$ out of 1000 replicates.

Given a fixed scenario, let $p$ be the coverage probability of pooled t 95% confidence intervals. Then the number of successful intervals $X$ in 1000 replicates follows the binomial distribution $B(n, p)$ with $n = 1{,}000$ and probability of success $p$. By the central limit theorem, the sample proportion $\hat{p} = X/n$ has approximately a normal distribution with mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. So, the standard error of $\hat{p}$ is $\sqrt{\hat{p}(1-\hat{p})/n}$, and an approximate 95% confidence interval for $p$ is given by

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

For example, the estimate 95.5 in Display 3.4 for the strongly skewed distribution and $n_1 = n_2 = 5$ gives

$$0.955 \pm 1.96\sqrt{\frac{0.955(1-0.955)}{1000}} = 0.955 \pm 1.96(0.0066) = (0.942, 0.968)$$

as a 95% confidence interval for $p$.

Note that when $p = 0.95$, the standard deviation of $\hat{p}$ is $\sqrt{0.95(1-0.95)/1{,}000} \approx 0.0069$. Thus, roughly $0.95 \pm 1.96(0.0069) = 0.95 \pm 0.014 = (0.936, 0.964)$ is the range of $\hat{p}$ values for which $H_0: p = 0.95$ would not be rejected at the significance level of 5%. In light of this calculation, we
conclude that the pooled t procedure would not be valid for the long-tailed distribution across all the sample sizes in Display 3.4. For departures from the equal variance assumption in Display 3.5, the success rates for the cases with $\sigma_2/\sigma_1 \neq 1$ and $n_1 \neq n_2$ suggest that the coverage probabilities are significantly different from the nominal level of 95%.

Birthweight

One measure of the overall health of a newborn baby is its birthweight. There are many factors which affect birthweight, including both genetic factors, such as the mother's size or the mother's birthweight, and environmental factors. One environmental factor which is believed to lower birthweight is maternal smoking. The data below present birthweights of a small number of infants, with the mother's smoking status during pregnancy recorded as nonsmoker (someone who has never smoked), former smoker, light smoker, or heavy smoker. Birthweight is recorded in pounds, with the ounces part translated into a decimal.

Non     Former  Light   Heavy
7.5     5.8     5.9     6.2
6.2     7.3     6.2     6.8
6.9     8.2     5.8     5.7
7.4     7.1     4.7     4.9
9.2     7.8     8.3     6.2
8.3             7.2     7.1
7.6             6.2     5.8
                        5.4

The question of primary concern is whether a mother's smoking reduces the mean birthweight of an infant. A graphical investigation produces the boxplots below.

[Figure: boxplots of Birthweight by Maternal Smoking group; 1 = nonsmoker, 2 = former smoker, 3 = light smoker, 4 = heavy smoker]

Descriptive Statistics: Birthweight

Variable     Smoking  N  Mean   SE Mean  StDev
Birthweight  1        7  7.586  0.363    0.962
             2        5  7.240  0.408    0.913
             3        7  6.329  0.431    1.140
             4        8  6.013  0.255    0.720

Two issues to consider when performing an analysis: First, we would like to make use of all of the information in our data. Second, we would like to avoid performing so many analyses on our data that we find significant differences where there are none.
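The descriptive statistics for the four smoking groups can be recomputed from the listed birthweights. The following is a minimal Python sketch, not the software output shown in the notes. It assumes the raw values are read with one decimal place (for example, 7.5 pounds) and grouped by column as Non, Former, Light, Heavy. The names `birthweights`, `summarize`, and `summary` are illustrative only.

```python
import math
import statistics

# Birthweights in pounds by maternal smoking status.
# Assumption: values are taken column by column from the data table.
birthweights = {
    "Non": [7.5, 6.2, 6.9, 7.4, 9.2, 8.3, 7.6],
    "Former": [5.8, 7.3, 8.2, 7.1, 7.8],
    "Light": [5.9, 6.2, 5.8, 4.7, 8.3, 7.2, 6.2],
    "Heavy": [6.2, 6.8, 5.7, 4.9, 6.2, 7.1, 5.8, 5.4],
}


def summarize(values):
    """Return (n, mean, standard error of the mean, sample standard deviation)."""
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample SD, divisor n - 1
    return n, mean, stdev / math.sqrt(n), stdev


summary = {group: summarize(values) for group, values in birthweights.items()}

# Print one row per group: N, Mean, SE Mean, StDev.
for group, (n, mean, se, sd) in summary.items():
    print(f"{group:<7}{n:>3}{mean:8.3f}{se:8.3f}{sd:8.3f}")
```

Under this reading of the data, the group sizes are 7, 5, 7, and 8, and the means, standard errors, and standard deviations agree with the Descriptive Statistics table to within rounding.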

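Returning to the standard error of the proportion estimate, the interval arithmetic for the coverage probability can be checked numerically. The following Python sketch assumes the normal approximation with multiplier 1.96 and n = 1000 replicates, as in the notes. The helper names `proportion_se`, `normal_interval`, and `consistent_with_nominal` are hypothetical and introduced only for illustration.

```python
import math


def proportion_se(p, n):
    """Standard deviation sqrt(p(1 - p)/n) of a sample proportion."""
    return math.sqrt(p * (1.0 - p) / n)


def normal_interval(center, se, z=1.96):
    """Return (center - z*se, center + z*se)."""
    return center - z * se, center + z * se


n = 1000  # number of simulated confidence intervals per scenario

# Estimated coverage 95.5% (strongly skewed distribution, n1 = n2 = 5).
p_hat = 0.955
se_hat = proportion_se(p_hat, n)
ci = normal_interval(p_hat, se_hat)

# Range of sample proportions for which H0: p = 0.95 is not rejected at 5%.
se_null = proportion_se(0.95, n)
accept = normal_interval(0.95, se_null)


def consistent_with_nominal(observed):
    """True if an observed coverage proportion lies inside the range above."""
    return accept[0] <= observed <= accept[1]


print(f"SE at p_hat = 0.955: {se_hat:.4f}")
print(f"95% CI for p: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Non-rejection range under H0: ({accept[0]:.3f}, {accept[1]:.3f})")
```

A tabled percentage can then be screened by passing it as a proportion, for example `consistent_with_nominal(0.955)`. Values that fall outside the range suggest a coverage probability different from the nominal 95%.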
