Elementary Statistics STAT 200
These class notes were uploaded by Hilbert Denesik on Sunday, November 1, 2015. They belong to STAT 200 at Pennsylvania State University, taught by Staff in Fall.
Date Created: 11/01/15
CHAPTER 12
MORE ABOUT CONFIDENCE INTERVALS

How much difference is there between the mean pulse rates of women and men? What is the mean amount that Penn State students study per day? Both of these questions involve the estimation of a population value. The statistical method we use to make estimates is a confidence interval. You may recall that in Chapter 10 we encountered this definition:

A confidence interval is a range of values that is likely to contain the population value.

In this chapter we'll learn how to determine confidence intervals for several different types of research questions. The general format of the intervals will be the same in each scenario. The objective is also the same in each scenario: we use sample information to estimate a population value. The following two definitions will be useful in the discussion of this objective.

- A parameter is a population characteristic. The numerical value of a parameter is not known, so we have to estimate the parameter using sample information.
- A statistic, or estimate, is a characteristic of a sample. A statistic estimates a parameter.

12.1 Examples of Different Situations Involving Estimation

Here are four common research situations that can be analyzed using confidence intervals. We've already covered one of those situations, the estimation of a proportion, but we include it here to emphasize that the confidence intervals presented in this chapter all have the same general structure. For each situation we provide two examples, the notation for the parameter, and the notation for the sample estimate.

1. Estimating the proportion falling into a category of a categorical variable
   Example research questions:
   - What proportion of Penn State students believe there is extraterrestrial life?
   - What proportion of American adults think marijuana should be legalized?
   Notation for population parameter: $p$
   Notation for sample estimate: $\hat{p}$

2. Estimating the mean of a
quantitative variable
   Example research questions:
   - What is the mean time that Penn State students study per day?
   - What is the mean pulse rate of women?
   Notation for population parameter: $\mu$ (pronounced "mu")
   Notation for sample estimate: $\bar{x}$ (the sample mean)

3. Estimating the difference between two populations with regard to the proportion falling into a category of a categorical variable
   Example research questions:
   - How much difference is there between PSU males and PSU females with regard to the proportion who believe there is extraterrestrial life?
   - How much difference is there between men who snore and men who don't snore with regard to the proportion who have heart disease?
   Notation for population parameter: $p_1 - p_2$, where $p_1$ and $p_2$ represent the proportions in populations 1 and 2, respectively
   Notation for sample estimate: $\hat{p}_1 - \hat{p}_2$ (the difference between two sample proportions)

4. Estimating the difference between two populations with regard to the mean of a quantitative variable
   Example research questions:
   - How much difference is there between the mean foot lengths of men and women?
   - How much difference is there between science majors and liberal arts majors with regard to the mean hours spent studying per week?
   Notation for population parameter: $\mu_1 - \mu_2$, where $\mu_1$ and $\mu_2$ represent the means in populations 1 and 2, respectively
   Notation for sample estimate: $\bar{x}_1 - \bar{x}_2$ (the difference between two sample means)

12.2 Approximate 95% Confidence Intervals

It's not very likely that a sample estimate will exactly equal the population parameter. There's nearly always going to be an error due to the act of sampling. The size of this error varies from sample to sample, and because we don't know the population value, we don't know the precise amount of error in any particular estimate. We can, however, determine approximately the average sampling error that may occur over the many possible samples that could be taken from a population.

Standard Error of a Sample Statistic

The standard error of a
sample statistic measures, roughly, the average difference between the statistic and the population parameter. This average is taken over all possible random samples that can be taken from the population.

With a sufficient sample size, it is generally true that for 95% of all random samples from a population, the difference between the sample statistic and the population parameter is less than two standard errors.

The phrase "sufficient sample size" is, of course, vague. Also, the sample size requirements depend upon whether you're estimating a proportion or a mean. We'll provide guidelines within the context of each confidence interval described in this section.

In Chapter 10 we learned that an approximate 95% confidence interval for a proportion is calculated as

   sample proportion ± 2 × standard error

The general format of this interval applies to all four situations described in this chapter.

Approximate 95% Confidence Interval for a Parameter

With a sufficient sample size, it is generally true that a 95% confidence interval for a population parameter is

   sample estimate ± 2 × standard error

For 95% of all random samples from a population, this interval contains the population value.

Calculating the Standard Error

The general format of the approximate 95% confidence interval is the same for all of the situations in this chapter. The formula for calculating the standard error, however, depends upon the situation. Here are the formulas for the standard error in the four scenarios we're considering. We use the notation se(estimate) to represent a standard error. Remember that in all situations the standard error measures, roughly, the average difference between the sample estimate and the population parameter.

1. Standard error of a sample proportion:
   $\text{se}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

2. Standard error of a sample mean:
   $\text{se}(\bar{x}) = \frac{s}{\sqrt{n}}$

3. Standard error of the difference between two proportions:
   $\text{se}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

4. Standard error of the difference between two means:
   $\text{se}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Example 1: Mean Hours
per Day that College Students Watch Television

In one of our class surveys, a question was "In a typical day, about how much time do you spend watching television?" Immediately below is a numerical summary of the responses, in hours. The sample mean was 2.09 hours for the n = 175 students who participated in the survey.

   Variable     N    Mean   Median   TrMean   StDev   SE Mean
              175    2.09    2.000    1.950   1.644     0.124

Based on this information, what is our confidence interval estimate of the mean hours of television per day in the population represented by this sample?

From the Minitab output, we see that the standard error of the mean is 0.124 (look under "SE Mean"). We can verify this value by computing

   $\text{se}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{1.644}{\sqrt{175}} = 0.124$

A somewhat loose interpretation of this standard error is that it measures, roughly, the likely difference between the sample mean and the unknown value of the population mean.

An approximate 95% confidence interval estimate of the population mean is

   $\bar{x} \pm 2 \times \text{se}(\bar{x}) = 2.09 \pm 2 \times 0.124 = 2.09 \pm 0.248$, or 1.842 to 2.338 hours.

This interval is likely to contain the population mean. In a research article, the confidence interval might be described as follows: "We are 95% confident that the mean time that college students spend watching television per day is somewhere between 1.842 and 2.338 hours."

Turn On Your Mind

- What population do you think is represented by the sample of 175 students in Example 1?
- [...] television between 1.842 and 2.338 hours of television per day? If not, what exactly does the interval tell us? Figure 12.1, a boxplot of the individual responses, should help you with this task.

[Figure 12.1: Boxplot of the individual responses. Horizontal axis: Hours Watching TV in Typical Day.]

Example 2: Do Men Lose More Weight by Diet or by Exercise?

Wood et al. (1988), also reported by Iman (1994, p. 258), studied a group of 89 sedentary men for a year. Forty-two men were placed on a diet, while the remaining 47 were put on an exercise routine. The group on a diet lost an average of 7.2 kg, with a standard deviation of 3.7 kg. The men who exercised lost an average of 4.0 kg, with a standard deviation of 3.9 kg (Wood et al., 1988).

The difference between the two means is $\bar{x}_1 - \bar{x}_2 = 7.2 - 4.0 = 3.2$ kg
(about 7 pounds), with the dieters losing more. The standard error of this statistic is

   $\text{se}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{3.7^2}{42} + \frac{3.9^2}{47}} = 0.81$

The standard error measures the accuracy of the statistic $\bar{x}_1 - \bar{x}_2$.

An approximate 95% confidence interval for the difference between population means is

   sample difference ± 2 × standard error = 3.2 ± 2 × 0.81 = 3.2 ± 1.62, or 1.58 to 4.82 kg.

It is likely that the difference between the population mean losses is contained in this interval. Again, we note that the dieters lost more weight.

12.3 General Confidence Intervals for Means

In the previous section, we looked at approximate 95% confidence intervals. The format, if the sample was large, was

   sample estimate ± 2 × standard error

This format is a specific case of a more generally useful format for a confidence interval:

   sample estimate ± multiplier × standard error

In general, the appropriate multiplier depends upon the desired confidence level, the sample size, and the type of statistic that we're dealing with. When we're dealing with means, the multiplier is denoted as $t^*$, and the general format of a confidence interval for a population mean is

   sample mean ± $t^*$ × standard error

The multiplier $t^*$ is determined using a probability distribution called the t-distribution. This is a close cousin of the standard normal distribution. Theoretically, the t-distribution arises when a sample standard deviation is used in place of a population standard deviation in calculating a standardized score for the observed mean. A t-distribution has a bell shape, is centered at 0, and is more spread out than the standard normal curve, in that there is more probability in the extreme areas than there is for the standard normal curve.

A parameter called degrees of freedom, abbreviated as df, is associated with any t-distribution. In most applications, this parameter is a function of the sample size for the problem, but the formula for the degrees of freedom depends on the type of problem. For problems involving inference about a single mean, df = n - 1, where n is the sample size. A
calculus property of the t-distribution is that as the degrees of freedom value increases, the distribution gets closer to the standard normal curve.

All of this may sound confusing, but it's actually simple to determine a 95% confidence interval for a population mean. All we need is a table, statistical software, or the right calculator to find the multiplier for the desired confidence level. Table 12.1 shows multipliers for four different confidence levels.

Example 3

Imagine that we are calculating a 95% confidence interval estimate of a mean based on a sample of n = 9 values. The degrees of freedom are df = 9 - 1 = 8. From Table 12.1, we see that the correct multiplier is 2.31. So the 95% confidence interval will be calculated as

   $\bar{x} \pm 2.31 \times \frac{s}{\sqrt{n}}$

Calculating a Confidence Interval for a Population Mean

1. Determine the sample mean and standard deviation, $\bar{x}$ and $s$.
2. Calculate the standard error of the mean, $\text{se}(\bar{x}) = \frac{s}{\sqrt{n}}$.
3. Calculate df = n - 1 and choose a confidence level. Use Table 12.1 or statistical software to find $t^*$.
4. The interval is $\bar{x} \pm t^* \times \text{se}(\bar{x})$.

Note 1: This procedure is correct for all sample sizes, but it is common practice to use the value 2 as the multiplier when n ≥ 30.
Note 2: In Minitab the calculations are automated. Click Stat > Basic Stats > 1-Sample t.

Table 12.1 t* Multipliers for Confidence Intervals for Means or Difference Between Means

            Confidence Level
   DF     0.90    0.95    0.98    0.99
    1     6.31   12.71   31.82   63.66
    2     2.92    4.30    6.96    9.92
    3     2.35    3.18    4.54    5.84
    4     2.13    2.78    3.75    4.60
    5     2.02    2.57    3.36    4.03
    6     1.94    2.45    3.14    3.71
    7     1.89    2.36    3.00    3.50
    8     1.86    2.31    2.90    3.36
    9     1.83    2.26    2.82    3.25
   10     1.81    2.23    2.76    3.17
   11     1.80    2.20    2.72    3.11
   12     1.78    2.18    2.68    3.05
   13     1.77    2.16    2.65    3.01
   14     1.76    2.14    2.62    2.98
   15     1.75    2.13    2.60    2.95
   16     1.75    2.12    2.58    2.92
   17     1.74    2.11    2.57    2.90
   18     1.73    2.10    2.55    2.88
   19     1.73    2.09    2.54    2.86
   20     1.72    2.09    2.53    2.85
   21     1.72    2.08    2.52    2.83
   22     1.72    2.07    2.51    2.82
   23     1.71    2.07    2.50    2.81
   24     1.71    2.06    2.49    2.80
   25     1.71    2.06    2.49    2.79
   26     1.71    2.06    2.48    2.78
   27     1.70    2.05    2.47    2.77
   28     1.70    2.05    2.47    2.76
   29     1.70    2.05    2.46    2.76
   30     1.70    2.04    2.46    2.75
   40     1.68    2.02    2.42    2.70
   50     1.68    2.01    2.40    2.68
   60     1.67    2.00    2.39    2.66
   70     1.67    1.99    2.38    2.65
   80     1.66    1.99    2.37    2.64
   90     1.66    1.99    2.37    2.63
  100     1.66    1.98    2.36    2.63
 1000     1.65    1.96    2.33    2.58

Example 4

Students in a statistics class at Penn State were asked, "About how many minutes do you typically exercise in a week?" The responses from 16 women in the class were

   60, 240, 0, 360, 450, 200, 100, 70, 240, 0, 60, 360, 180, 300, 0, 270

Here is output from the Minitab procedure for determining a 95% confidence interval for the population mean.

   Variable    N    Mean   StDev   SE Mean         95.0% CI
   Exercise   16   180.6   144.3      36.1   (103.7, 257.5)

The sample mean is 180.6 minutes of exercise per week, and we are 95% confident that the population mean is captured by the interval 103.7 minutes to 257.5 minutes. The standard error is

   $\frac{s}{\sqrt{n}} = \frac{144.3}{\sqrt{16}} = 36.1$

The degrees of freedom are df = n - 1 = 16 - 1 = 15. In Table 12.1, we see that for this problem the multiplier for a 95% confidence interval is 2.13. The 95% confidence interval was calculated as 180.6 ± 2.13 × 36.1.

The Difference between Two Means

Unfortunately, there's a muddy and confusing mathematical story behind the calculation of a confidence interval for the difference between two population means. On the surface, however, the story is easy. An approximate confidence interval for $\mu_1 - \mu_2$ is

   $(\bar{x}_1 - \bar{x}_2) \pm t^* \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

The problem, however, is that it's not exactly mathematically correct to use a t-distribution to determine the multiplier. It is approximately correct to do so, but the approximation involves an extraordinarily ugly formula for the degrees of freedom.

Approximate degrees of freedom used to find t* (confidence interval for $\mu_1 - \mu_2$):

   $\text{df} \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$

The good news is that we'll let Minitab do the work for us, and you won't have to calculate these degrees of freedom by hand. Also, recall that for a sufficiently large sample size, we can approximate a 95% interval by using the value 2 as the multiplier.

Example 5: The Effect of a Stare on Driving Behavior

In a lecture earlier this semester, we discussed an experiment done by social psychologists at the University of California at Berkeley. The researchers either did not stare or did stare at automobile
drivers stopped at a campus stop sign. The researchers then measured how long it took the drivers to drive from the stop sign to a mark on the other side of the intersection. The researchers believed that the average crossing times would be faster for those who experience the stare. The crossing times, in seconds, were:

   No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7
   Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8

Here's Minitab output for a 95% confidence interval for the difference between population means.

   Two sample T for CrossTime
   Group      N   Mean   StDev   SE Mean
   NoStare   14   6.63    1.36      0.36
   Stare     13   5.59   0.822      0.23
   95% CI for mu NoStare - mu Stare: (0.14, 1.93)
   T-Test mu NoStare = mu Stare (vs >): T = 2.41  P = 0.013  DF = 21

For the No Stare data, the sample mean is $\bar{x}_1 = 6.63$ seconds, while for the Stare data $\bar{x}_2 = 5.59$ seconds. The difference between the sample means is 6.63 - 5.59 = 1.04 seconds. The 95% confidence interval for $\mu_1 - \mu_2$ is 0.14 to 1.93 seconds. This interval is likely to contain the true population difference between the means.

We see from the final item of the output that df = 21 for this problem. Minitab calculated this quantity using the approximation formula given above. From Table 12.1, we see that when df = 21, the t* multiplier for a 95% confidence interval is 2.08. So Minitab calculated the confidence interval as

   sample estimate ± multiplier × standard error
   $(\bar{x}_1 - \bar{x}_2) \pm 2.08 \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
   $1.04 \pm 2.08 \times \sqrt{\frac{1.36^2}{14} + \frac{0.822^2}{13}}$

A Special Case: Assuming Equal Variances

If we assume that the two populations have the same standard deviation, there is a clean mathematical solution for a confidence interval for the difference between two population means. With this assumption, the solution involves using a quantity called the pooled variance in place of each of $s_1^2$ and $s_2^2$ in the formula for the standard error of the difference between the sample means. This causes the degrees of freedom for the t* multiplier to be exactly df = $n_1 + n_2 - 2$.

The pooled variance is computed as

   $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

Substituting the pooled
variance for each of $s_1^2$ and $s_2^2$ in the standard error formula produces this formula:

   $\text{se}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

So the confidence interval for $\mu_1 - \mu_2$, the difference between the population means, is

   $(\bar{x}_1 - \bar{x}_2) \pm t^* \times s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

where $t^*$ is found using df = $n_1 + n_2 - 2$.

The good news is that Minitab, or any other statistical software program, will do the calculations for us. In Minitab, check the dialog box item that says "Assume Equal Variances."

Example 5 Continued

Here's the Minitab output that resulted from clicking "Assume Equal Variances."

   Two sample T for CrossTime
   Group      N    Mean   StDev   SE Mean
   NoStare   14    6.63    1.36      0.36
   Stare     13   5.592   0.822      0.23
   95% CI for mu NoStare - mu Stare: (0.14, 1.94)
   T-Test mu NoStare = mu Stare (vs >): T = 2.37  P = 0.013  DF = 25
   Both use Pooled StDev = 1.14

The 95% confidence interval for $\mu_1 - \mu_2$ is 0.14 to 1.94 seconds, an interval that's only slightly different from the interval that we got when we didn't assume equal variances. Note that the degrees of freedom are reported as 25. The calculation is df = $n_1 + n_2 - 2$ = 14 + 13 - 2 = 25. Also note that a pooled standard deviation is reported. This is the square root of the pooled variance described above.

Summary of Formulas for Confidence Intervals

The basic structure of intervals for the parameters in the table below is

   sample statistic ± multiplier × standard error

The following table describes the specific details for each type of parameter we've considered. Remember that for a sufficiently large sample size, the multiplier for a 95% confidence interval is approximately equal to 2.

   Parameter                                        Statistic                   Standard Error                                                                    Multiplier
   One mean, $\mu$                                  $\bar{x}$                   $\frac{s}{\sqrt{n}}$                                                              $t^*$ (see 1)
   Difference between means, $\mu_1 - \mu_2$        $\bar{x}_1 - \bar{x}_2$     $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$                                    $t^*$ (see 2)
   One proportion, $p$                              $\hat{p}$                   $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$                                             $z^*$ (see 3)
   Difference between proportions, $p_1 - p_2$      $\hat{p}_1 - \hat{p}_2$     $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$  $z^*$ (see 3)

   1. Use Table 12.1 with df = n - 1.
   2. Use Table 12.1 with $\text{df} \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$.
   3. The z* multipliers are:

      Conf. Level   0.90   0.95                          0.98   0.99
      z*            1.65   1.96 (usually rounded to 2)   2.33   2.58
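
The interval calculations in Examples 4 and 5 can be checked by hand or with a few lines of code. What follows is a minimal Python sketch under stated assumptions, not the Minitab procedure itself. The function names (mean_ci, welch_df, diff_ci) are hypothetical, the 95% multipliers are copied by hand from Table 12.1, the crossing times are read as seconds with one decimal place (8.3, 5.5, and so on), and the approximate degrees of freedom are rounded down, which is an assumption chosen because it agrees with the DF = 21 shown in the output.

```python
import math
import statistics

# 95% multipliers copied by hand from Table 12.1. They are rounded to two
# decimals, so results may differ slightly from software output.
T_95 = {15: 2.13, 21: 2.08, 25: 2.06}


def mean_ci(values, multiplier):
    """Interval for one mean: sample mean +/- multiplier * s / sqrt(n)."""
    n = len(values)
    center = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return center - multiplier * se, center + multiplier * se


def welch_df(a, b):
    """Approximate degrees of freedom for the unpooled two-sample interval."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))


def diff_ci(a, b, multiplier, pooled=False):
    """Interval for the difference between two means (mean of a minus mean of b)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    if pooled:
        # Pooled variance replaces both sample variances (equal-variance case).
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(v1 / n1 + v2 / n2)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff - multiplier * se, diff + multiplier * se


# Example 4: minutes of exercise per week for 16 women.
exercise = [60, 240, 0, 360, 450, 200, 100, 70, 240, 0, 60, 360, 180, 300, 0, 270]
# Example 5: crossing times in seconds (decimal placement is an assumption).
no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]

ci_exercise = mean_ci(exercise, T_95[len(exercise) - 1])

df_unpooled = math.floor(welch_df(no_stare, stare))  # rounding down is an assumption
ci_unpooled = diff_ci(no_stare, stare, T_95[df_unpooled])

df_pooled = len(no_stare) + len(stare) - 2
ci_pooled = diff_ci(no_stare, stare, T_95[df_pooled], pooled=True)

print("Example 4, one mean:", ci_exercise)
print("Example 5, unpooled: df =", df_unpooled, "interval =", ci_unpooled)
print("Example 5, pooled:   df =", df_pooled, "interval =", ci_pooled)
```

Because the table multipliers are rounded, the endpoints printed by this sketch can differ from the Minitab output in the last reported digit. Looking up the multiplier in a small dictionary keeps the sketch tied to Table 12.1; statistical software would compute the multiplier directly from the t-distribution.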