Note 6 for PUBHLTH 540 at UMass
This 73 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Massachusetts taught by a professor in Fall. Since its upload, it has received 18 views.
Date Created: 02/06/15
PubHlth 540: Estimation

Unit 6: Estimation

Topic
1. Introduction
2. Goals of Estimation
3. Some Notation and Definitions
4. How to Interpret a Confidence Interval
5. Normal: Confidence Interval for $\mu$ ($\sigma^2$ Known)
6. Introduction to the Student's t-Distribution
7. Normal: Confidence Interval for $\mu$ ($\sigma^2$ Unknown)
8. Introduction to the Chi Square Distribution
9. Normal: Confidence Interval for $\sigma^2$
10. Normal: Confidence Interval for $\mu_{DIFFERENCE}$ - Paired Data Setting
11. Normal: Confidence Interval for $[\mu_1 - \mu_2]$ - Two Independent Groups
12. Introduction to the F-Distribution
13. Normal: Confidence Interval for $\sigma_1^2 / \sigma_2^2$
14. Binomial: Estimation of a Proportion $\pi$
15. Binomial: Confidence Interval for $[\pi_1 - \pi_2]$ - Two Independent Groups
Appendix 1 - Derivation of Confidence Interval for $\mu$ - Single Normal with $\sigma^2$ Known
Appendix 2 - Derivation of Confidence Interval for $\sigma^2$ - Single Normal
Appendix 3 - SE of a Binomial Proportion

1. Introduction

Recall our introduction to biostatistics. It is the application of probability models and associated tools to observed phenomena for the purposes of learning about a population and gauging the relative plausibility of alternative explanations.

Description: Information in a sample is used to summarize the sample itself. It is also used to make guesses of the characteristics of the source population.

Hypothesis Testing: Information in a sample is used to gauge the plausibility of competing explanations for the phenomena observed.

Unit 6 is about using information in a sample to make estimates of the characteristics (parameters) of the source population. We already have a feel for the distinction between statistics and parameters.

Sample statistics are estimators of population parameters:
  Sample mean $\bar{X}$ estimates the population mean $\mu$
  Sample variance $S^2$ estimates the population variance $\sigma^2$

What does it mean to say we know $\bar{X}$ from a sample but we don't know the population mean $\mu$? We have a sample of n observations $X_1, \ldots, X_n$ and from these we have
calculated their average $\bar{X}$. Interest is in learning about the population from which these observations came. Any one of a number of possible populations might have been the source of our sample of data. Which population appears most likely to have given rise to the sample mean $\bar{X}$ that we have observed? For simplicity here, suppose there are 3 possibilities.

[Figure: three candidate population distributions, with the observed sample mean marked "This is the $\bar{X}$ we see".]

We cannot ask "What is the correct $\mu$?" Any of these distributions could have been the source population distribution. So we construct an interval of plausible $\mu$. In the following picture, I'm imagining four possibilities instead of three.

[Figure: four candidate population distributions; a confidence interval centered at $\bar{X}$ covers the means of populations II and III.]

We are confident that $\mu$ could be the mean parameter in populations II or III.

2. Goals of Estimation

What we regard as a "good" estimator depends on the criteria we use to define "good". There are actually multiple criteria. Here, we consider one set of criteria.

Conventional Criteria for a Good Estimator
1. In the long run, correct (unbiased).
2. In the short run, in error by as little as possible (minimum variance).

In the Long Run Correct

Tip: Recall the introduction to statistical expectation and the meaning of unbiased. See Unit 4 (Bernoulli & Binomial), pp. 46, and exercise 5 for Unit 3 (Populations and Samples).

"In the long run correct" says: if we imagine replicating the study over and over again, each time calculating a statistic of interest so as to produce the sampling distribution of that statistic, then the mean of that sampling distribution is equal to the target parameter value being estimated.

E.g., consider $S^2$ as an estimate of $\sigma^2$. "In the long run correct" means that the statistical expectation of $S^2$, computed over the sampling distribution of $S^2$, has the value that is equal to its target $\sigma^2$:

$\dfrac{\sum_{\text{all possible samples}} S^2}{\#\ \text{samples in sampling distribution}} = \sigma^2$

Mathematically, this is actually saying that the statistical expectation of $S^2$ is equal to its target
$\sigma^2$. Recall, we write this as $E[S^2] = \sigma^2$. For the mathematically inclined among you: $\int S^2 f_{S^2}(S^2)\, dS^2 = \sigma^2$.

In Error by as Little as Possible

"In error by as little as possible" is about the following: we would like our estimates not to vary wildly from sample to sample; in fact, we'd like them to vary as little as possible. This is the idea of precision. The smallest variability from sample to sample is the idea of minimum variance.

Putting together the two criteria ("long run correct" and "in error by as little as possible")

Suppose we want to identify the minimum variance unbiased estimator of $\mu$ in the setting of data from a normal distribution. Candidate estimators might include the sample mean $\bar{X}$ or the sample median $\tilde{X}$ as estimators of the population mean $\mu$. Which would be a better choice according to the criteria "in the long run correct" and "in the short run in error by as little as possible"?

Step 1: First, identify the unbiased estimators.
Step 2: From among the pool of unbiased estimators, choose the one with minimum variance.

Illustration for data from a normal distribution:
1. The unbiased estimators are the sample mean $\bar{X}$ and the sample median $\tilde{X}$.
2. variance($\bar{X}$) < variance($\tilde{X}$).
Choose the sample mean. It is the minimum variance unbiased estimator.

For a random sample of data from a normal probability distribution, $\bar{X}$ is the minimum variance unbiased estimator of the population mean $\mu$.

Take home message: In this course, we will be using the criteria of minimum variance unbiased. However, other criteria are possible.

3. Some Notation and Definitions

Estimation, Estimator, Estimate
- Estimation is the computation of a statistic from sample data, often yielding a value that is an approximation (guess) of its target, an unknown true population parameter value.
- The statistic itself is called an estimator and can be of two types: point or interval.
- The value or values that the estimator assumes are called estimates.

Point versus Interval Estimators
- An estimator
that represents a "single best guess" is called a point estimator. When the estimate is of the form of a "range of plausible values", it is called an interval estimator. Thus:
  A point estimate is of the form: [ value ]
  An interval estimate is of the form: [ lower limit, upper limit ]

Example: The sample mean $\bar{X}_n$, calculated using data in a sample of size n, is a point estimator of the population mean $\mu$. If $\bar{X}_n = 10$, the value 10 is called a point estimate of the population mean $\mu$.

Sampling Distribution
- It's helpful to recall the idea of a sampling distribution again. One can produce a population of all possible sample means $\bar{X}_n$ by replicating simple random sampling over and over again, each time with the same sample size n, and compiling the resulting collection of sample means $\bar{X}_n$ into a kind of population that we now call a sampling distribution.
- It is also helpful to recall that the sampling distribution of $\bar{X}_n$ plays a fundamental role in the central limit theorem.

Unbiased Estimator
A statistic is said to be an unbiased estimator of the corresponding population parameter if the mean, or expected value, of the statistic, taken over its sampling distribution, is equal to the population parameter value. Intuitively, this is saying that the "long run" average of the statistic is equal to the population parameter value.

[Diagram: estimators divide into point estimators and interval estimators; interval estimators include the confidence interval and others.]

Confidence Interval, Confidence Coefficient
A confidence interval is a particular type of interval estimator. Interval estimates defined as confidence intervals provide not only several point estimates but also a feeling for the precision of the estimates. This is because they are constructed using two ingredients: (1) a point estimate, and (2) the standard error of the point estimate.

Many Confidence Interval Estimators are of a Specific Form:
  lower limit = (point estimate) - (multiple)(standard error)
  upper limit = (point estimate) + (multiple)(standard error)

- The
"multiple" in these expressions is related to the precision of the interval estimate; the multiple has a special name: confidence coefficient.
- A wide interval suggests imprecision of estimation. A narrow confidence interval reflects large sample size or low variability or both.
- Exceptions to this generic structure of a confidence interval are those for a variance parameter and those for a ratio of variance parameters.

Take care when computing and interpreting a confidence interval! Many users of the confidence interval idea produce an interval of estimates but then err in focusing on its midpoint.

4. How to Interpret a Confidence Interval

A confidence interval is a safety net.

Tip: In this section, the focus is on the idea of a confidence interval. Don't worry about the details just yet; we'll come to these later.

Example: Interest is in estimating the average income from wages for a population of 5000 workers, $X_1, \ldots, X_{5000}$. The average income to be estimated is the population mean $\mu$.

Suppose $\mu = 19987$ and suppose we do not know this value. We wish to estimate $\mu$ from a sample of wages.

The population standard deviation $\sigma$ is a population parameter describing the variability among the 5000 individual wages:

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{5000} (X_i - \mu)^2}{5000}} = \sqrt{\dfrac{\sum_{i=1}^{5000} (X_i - 19987)^2}{5000}}$

Reminder: Notice in the calculation of $\sigma$ that (1) squared deviations are computed about the reference value equal to the actual population mean $\mu$, and (2) division is by the actual population size N = 5000; division is not by N - 1. Do you remember why this is the correct calculation? Answer: this is a population parameter value calculation, not a statistic calculated from a sample.

Suppose we know that $\sigma = 12573$.

We will use the standard error to describe a typical departure of a sample mean away from the population mean $\mu$.

We illustrate the meaning of a confidence interval using two samples of different sizes.

Carol wants to estimate $\mu$. She has a
sample of data from interviews of n = 10 workers.
  Data are $X_1, \ldots, X_{10}$
  $\bar{X}_{n=10} = 19887$
  $\sigma = 12573$
  $SE = \sigma/\sqrt{n} = 12573/\sqrt{10} = 3976$

Ed wants to estimate $\mu$. He has a sample of data from interviews of n = 100 workers.
  Data are $X_1, \ldots, X_{100}$
  $\bar{X}_{n=100} = 19813$
  $\sigma = 12573$
  $SE = \sigma/\sqrt{n} = 12573/\sqrt{100} = 1257$

Compare the two SE, one based on n = 10 and the other based on n = 100:
- Notice that the variability of an average of 100 is less than the variability of an average of 10.
- It seems reasonable that we should have more confidence (a smaller safety net) in our sample mean as a guess of the population mean when it is based on 100 observations than when it is based on 10.
- By the same token, we ought to have complete (100%) confidence (no safety net required at all) if we could afford to interview all 5000. This is because we would obtain the correct answer of 19987 every time.

Definition: Confidence Interval (Informal)
A confidence interval is a guess (point estimate) together with a "safety net" (interval) of guesses of a population characteristic. It has 3 components:
1. A point estimate (e.g., the sample mean $\bar{X}$)
2. The standard error of the point estimate (e.g., $SE(\bar{X}) = \sigma/\sqrt{n}$)
3. A confidence coefficient (conf. coeff.)

The safety net (confidence interval) that we construct has lower and upper limits defined:
  Lower limit = (point estimate) - (confidence coefficient)(SE)
  Upper limit = (point estimate) + (confidence coefficient)(SE)

Example: Carol samples n = 10 workers.
  Sample mean $\bar{X}$ = 19887
  Standard error of sample mean, $SE = \sigma/\sqrt{n}$ = 3976 for n = 10
  Confidence coefficient for 95% confidence interval = 1.96
  Lower limit = (point estimate) - (confidence coefficient)(SE) = 19887 - (1.96)(3976) = 12094
  Upper limit = (point estimate) + (confidence coefficient)(SE) = 19887 + (1.96)(3976) = 27680
  Width = 27680 - 12094 = 15586

Example: Ed samples n = 100 workers.
  Sample mean $\bar{X}$ = 19813
  Standard error of sample mean, $SE = \sigma/\sqrt{n}$ = 1257 for n = 100
  Confidence coefficient for 95% confidence interval = 1.96
  Lower limit = (point estimate) - (confidence coefficient)(SE) = 19813 - (1.96)(1257) = 17349
  Upper limit = (point estimate) + (confidence coefficient)(SE) = 19813 + (1.96)(1257) = 22277
  Width = 22277 - 17349 = 4928

             n      Estimate   95% Confidence Interval
  Carol      10     19887      (12094, 27680)   Wide
  Ed         100    19813      (17349, 22277)   Narrow
  Truth      5000   19987      19987            No safety net

Definition: 95% Confidence Interval
If all possible random samples (an infinite number) of a given sample size (e.g., 10 or 100) were obtained, and if each were used to obtain its own confidence interval, then 95% of all such confidence intervals would contain the unknown $\mu$; the remaining 5% would not.

But Carol and Ed Each Have Only ONE Interval. So now what? The definition above doesn't seem to help us. What can we say?
  Carol says: "With 95% confidence, the interval 12094 to 27680 contains the unknown true mean $\mu$."
  Ed says: "With 95% confidence, the interval 17349 to 22277 contains the unknown true mean $\mu$."

Caution on the use of Confidence Intervals:
1. It is incorrect to say "The probability that a given 95% confidence interval contains $\mu$ is 95%." A given interval either contains $\mu$ or it does not.
2. The confidence coefficient (recall, this is the multiplier we attach to the SE) for a 95% confidence interval is the number needed to ensure 95% coverage in the long run (in probability).

A picture helps in getting a feel for the ideas of confidence interval, safety net, and precision.

[Figure: the collection of all possible confidence intervals under each sampling plan, n = 10, n = 100, and n = infinity.]

For each sampling plan (n = 10, or n = 100, or n = infinity), the figure (admittedly not the fanciest of art) gives a feel for the collection of all possible confidence intervals. Notice:
1. Any one confidence interval either contains $\mu$ or it does not. This illustrates that it is incorrect to say "There is a 95% probability that the confidence interval contains $\mu$."
2. For a given sample size (10, or 100, or infinity), the width of all the confidence intervals is the same (here, because $\sigma$ is known).
3. Confidence intervals based on larger sample sizes are more narrow (more precise).
4. When n is equal to the size of the population, $\mu$ is in the interval every time.

Some additional remarks on the interpretation of a confidence interval might be helpful:
- Each sample gives rise to its own point estimate
and confidence interval estimate built around the point estimate. The idea is to construct our intervals so that IF all possible samples of a given sample size (an infinite number!) were drawn from the underlying distribution, and each sample gave rise to its own interval estimate, THEN 95% of all such confidence intervals would include the unknown $\mu$ while 5% would not.
- Another illustration: It is NOT CORRECT to say "The probability that the interval (1.3, 9.5) contains $\mu$ is 0.95." Why? Because either $\mu$ is in (1.3, 9.5) or it is not. For example, if $\mu = 5.3$, then $\mu$ is in (1.3, 9.5) with probability = 1. If $\mu = 10$, then $\mu$ is in (1.3, 9.5) with probability = 0.
- I toss a fair coin, but don't look at the result. The probability of heads is 1/2. I am 50% confident that the result of the toss is heads. In other words, I will guess "heads" with 50% confidence. Either the coin shows heads or it shows tails. I am either right or wrong on this particular toss. In the long run, if I were to do this repeatedly, I should be right about 50% of the time, hence "50% confidence". But for this particular toss, I'm either right or wrong.
- In most experiments or research studies we can't look to see if we are right or wrong, but we define a confidence interval in a way that we know, in the long run, 95% of such intervals will get it right.

5. Normal: Confidence Interval for $\mu$ ($\sigma^2$ Known)

Introduction and where we are going: In this and in subsequent sections, the idea of the confidence interval introduced in the previous section is operationalized. Hopefully you will see that the logic and mechanics of confidence interval construction are very similar across a variety of settings. In this lecture in particular, we consider the setting of data from a normal distribution (or two normal distributions) and the setting of data from a binomial distribution (or two binomial distributions).

We have seen that there are 3 elements to a confidence interval:
1. Point estimate
2. SE of the point estimate
3. Confidence coefficient

Consider the task of computing a confidence interval
estimate of $\mu$ for a population distribution that is normal with $\sigma^2$ known. Available are data from a random sample of size n.
- Presented in this section is instruction in how to construct a confidence interval.
- Presented in Appendix 1 is the statistical theory underlying this methodology. I encourage you strongly to have a look at this, too.

1. The Point Estimate of $\mu$ is the Sample Mean $\bar{X}_n$

Recall that, for a sample of size n, the sample mean is calculated as

$\bar{X}_n = \dfrac{\sum_{i=1}^{n} X_i}{n}$

Features:
1. Under simple random sampling, the sample mean $\bar{X}_n$ is an unbiased estimator of the population mean parameter $\mu$, regardless of the underlying probability distribution.
2. When the underlying probability distribution is normal, the sample mean $\bar{X}_n$ also satisfies the criterion of being minimum variance unbiased (see Section 2).

2. The Standard Error of $\bar{X}_n$ is $\sigma/\sqrt{n}$

The precision of $\bar{X}_n$ as an estimate of the unknown population mean parameter $\mu$ is reflected in its standard error. Recall

$SE(\bar{X}_n) = \sqrt{\text{variance}(\bar{X}_n)} = \dfrac{\sigma}{\sqrt{n}}$

- SE is smaller for smaller $\sigma$ (measurement error).
- SE is smaller for larger n (study design).

3. The Confidence Coefficient

The confidence coefficient for a 95% confidence interval is the number needed to ensure 95% coverage in the long run (in probability). See again the picture in Section 4 to get a feel for this.
- For a 95% confidence interval, this number will be the 97.5th percentile of the Normal(0,1) distribution.
- For a $(1-\alpha)100$% confidence interval, this number will be the $(1-\alpha/2)100$th percentile of the Normal(0,1) distribution.
- Below are some of these values in the setting of constructing a confidence interval estimate of $\mu$ when data are from a Normal distribution with $\sigma^2$ known.

  Confidence Level   Percentile            Confidence Coefficient = Percentile Value from Normal(0,1)
  .50                75th                  0.674
  .75                87.5th                1.15
  .80                90th                  1.282
  .90                95th                  1.645
  .95                97.5th                1.96
  .99                99.5th                2.576
  $(1-\alpha)$       $(1-\alpha/2)100$th

E.g., for a 50% CI, $.50 = (1-\alpha)$ says $\alpha = .50$ and says $(1-\alpha/2) = .75$. Thus, use the 75th percentile of N(0,1) = 0.674.

Example: We are given the weight in
micrograms of drug inside each of 30 capsules, after subtracting the capsule weight. Requested is a 95% confidence interval estimate of $\mu$.

  0.6  0.3  0.1  0.3  0.3  0.2  0.6  1.4  0.1  0.0
  0.4  0.5  0.6  0.7  0.6  0.0  0.0  0.2  1.6  0.2
  1.6  0.0  0.7  0.2  1.4  1.0  0.2  0.6  1.0  0.3

We're told:
- The data are a simple random sample of size n = 30 from a Normal distribution with mean $\mu$ and variance $\sigma^2$.
- The population variance is known and has value $\sigma^2 = 0.25$.
- Remark: In real life we will rarely know $\sigma^2$! Thus, the solution in real life is actually slightly different; it will involve using a new distribution, the Student's t-distribution. Here, however, $\sigma^2$ is considered known so that the ideas can be introduced more simply.

Recall that the basic structure of the required confidence interval is point estimate ± safety net:
  Lower limit = (point estimate) - (multiple)(SE of point estimate)
  Upper limit = (point estimate) + (multiple)(SE of point estimate)

Point Estimate of $\mu$ is the Sample Mean $\bar{X}_{n=30}$:

$\bar{X}_{n=30} = \dfrac{\sum_{i=1}^{30} X_i}{30} = 0.51$

The Standard Error of $\bar{X}_n$ is $\sigma/\sqrt{n}$:

$SE(\bar{X}_{n=30}) = \sqrt{\text{variance}(\bar{X}_{n=30})} = \sqrt{\dfrac{\sigma^2}{n}} = \sqrt{\dfrac{0.25}{30}} = 0.0913$

The Confidence Coefficient: For a 95% confidence interval, this number will be the 97.5th percentile of the Normal(0,1) distribution. See the table of confidence coefficients above and locate that the value is 1.96.

  Desired Confidence Level   Value of Confidence Coefficient
  .95                        1.96

Here, 1.96 is the $(1 - .05/2)100$th = 97.5th percentile of the Normal(0,1) distribution.

Putting this all together:
  Lower limit = (point estimate) - (multiple)(SE of point estimate) = 0.51 - (1.96)(0.0913) = 0.33
  Upper limit = (point estimate) + (multiple)(SE of point estimate) = 0.51 + (1.96)(0.0913) = 0.69

Thus, we have the following general formula for a $(1-\alpha)100$% confidence interval:

$\bar{X}_n \pm \left[(1-\alpha/2)100\text{th percentile of Normal}(0,1)\right] \cdot SE(\bar{X}_n)$

How to Calculate the Proportion of Sample Means in a Given Interval

Use the idea of standardization. This is an exercise in computing a probability and draws on the ideas of Topic 5 (The Normal Distribution). We consider an example. This question is addressed using the transformation formula for obtaining a z-score.

Example: A
sample of size n = 100 from a normal distribution with unknown mean yields a sample mean $\bar{X}_{n=100} = 267.43$. The population variance of the normal distribution is known to be equal to $\sigma^2 = 36764.23$. What proportion of means of samples of size 100 will lie in the interval [200, 300] if it is known that $\mu = 250$?

Solution: The random variable that we need to standardize is $\bar{X}_{n=100}$.
  Mean = 250
  $SE = \sigma/\sqrt{100} = \sqrt{36764.23}/\sqrt{100} = 19.174$

Probability [$200 < \bar{X}_{n=100} < 300$], by the standardization formula, is

$\Pr\left[\dfrac{200 - 250}{19.174} < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \dfrac{300 - 250}{19.174}\right] = \Pr[-2.608 < Z < +2.608]$

$= \Pr[Z < 2.608] - \Pr[Z < -2.608]$
$= \Pr[Z < 2.608] - \Pr[Z > 2.608]$
$= \Pr[Z < 2.608] - (1 - \Pr[Z < 2.608])$
$= 2\Pr[Z < 2.608] - 1$
$= 2(0.9955) - 1$, by looking at the table for z = 2.61 in Rosner, 6th Edition, page 828, column A
$= 0.9910$

6. Introduction to the Student's t-Distribution

In Section 5, we permitted ourselves the luxury of pretending that $\sigma^2$ is known and obtained a confidence interval for $\mu$ of the form

  lower limit = $\bar{X} - z_{(1-\alpha/2)100}\,(\sigma/\sqrt{n})$
  upper limit = $\bar{X} + z_{(1-\alpha/2)100}\,(\sigma/\sqrt{n})$

The required confidence coefficient $z_{(1-\alpha/2)100}$ was obtained as a percentile from the standard normal, N(0,1), distribution (e.g., for a 95% CI we used the 97.5th percentile).

More realistically, however, $\sigma^2$ will not be known. Now what? Reasonably, we might replace $\sigma$ with s. Recall that s is the sample standard deviation and we get it as follows:

$s = \sqrt{s^2}$ where $s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$

So far so good. But there is a problem:

$\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ is NOT distributed Normal(0,1), whereas $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ IS distributed Normal(0,1).

Thus, we have to modify our machinery (specifically, the SE piece of our machinery) to accommodate the unknownness of $\sigma^2$. Fortunately, this is conceptually not difficult.

  Whereas we previously used (when $\sigma^2$ was known)   With $\sigma^2$ unknown, we now use
  z-score                                                   t-score
  Percentile from Normal(0,1)                               Percentile from Student's t

Thus, for the setting of seeking a confidence interval for an unknown mean $\mu$, the confidence interval will be of the following form:

  lower limit = $\bar{X} - t_{DF;\,(1-\alpha/2)100}\,(s/\sqrt{n})$
  upper limit = $\bar{X} + t_{DF;\,(1-\alpha/2)100}\,(s/\sqrt{n})$

There is a variety of random variables (or
transformations of random variables) that are distributed Student's t. A particularly advantageous definition is one that appeals to our understanding of the z-score.

A Definition of a Student's t Random Variable

In the setting of a random sample $X_1, \ldots, X_n$ of independent, identically distributed outcomes of a Normal($\mu$, $\sigma^2$) distribution, where we calculate $\bar{X}$ and $S^2$ in the usual way,

$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$ and $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$,

a Student's t distributed random variable results if we construct a t-score instead of a z-score:

$t\text{-score} = t_{DF=n-1} = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ is distributed Student's t with degrees of freedom = (n-1).

Note: We often use the abbreviation df to refer to degrees of freedom.

The features of the Student's t-Distribution are similar, but not identical, to those of a Normal Distribution:
- Bell shaped
- Symmetric about zero
- Flatter than the Normal(0,1). This means:
  (i) The variability of a t is greater than that of a Z that is Normal(0,1).
  (ii) Thus, there is more area under the tails and less at the center.
  (iii) Because variability is greater, resulting confidence intervals will be wider.

The relatively greater variability of a Student's t distribution, compared to a Normal, should make intuitive sense. We have added uncertainty in our confidence interval because we are using an estimate of the standard error rather than the actual value of the standard error.

Each degree of freedom (df) defines a separate Student's t-distribution. As the degrees of freedom gets larger, the Student's t-distribution looks more and more like the standard normal distribution with mean = 0 and variance = 1.

[Figure: density curves of the Normal(0,1), the Student's t with 25 degrees of freedom, and the Student's t with 5 degrees of freedom.]

How to Use the Student's t-Table in the course text (Rosner, 6th Edition)

Source: Table 5, page 831.
- Each row gives information for a separate t-distribution defined by the degree of freedom. The notation df denotes degrees of freedom.
- The column heading tells you the left tail probability (or
area under the curve); e.g., the column heading .75 separates the lower 75% of the distribution from the upper 25%.
- The body of the table is comprised of values of the Student's t random variable.

The following is a picture explaining the table entry of 3.078 for df = 1 and column heading .90.

[Figure: t-distribution with 1 df; area = .90 to the left of 3.078 and area = .10 to the right.]

Probability [Student's t random variable with df = 1 ≥ 3.078] = .10 and
Probability [Student's t random variable with df = 1 ≤ 3.078] = .90

Now You Try: Illustration of Use of Table 5 in Rosner, 6th Edition

What is the probability that a Student's t random variable with 10 degrees of freedom assumes a value between -1.8125 and +1.8125?

[Figure: t-distribution with 10 df; the area of interest lies between -1.8125 and +1.8125.]

1. Locate the row of the table corresponding to df = 10.
2. Read across the body of Table 5 in Rosner, 6th Edition (page 831), to find the t-score value 1.8125. Because the column heading is .95, we learn $\Pr[t_{DF=10} \geq 1.8125] = .05$.
3. Because of the symmetry of the Student's t distribution about t = 0, we also learn that $\Pr[t_{DF=10} \leq -1.8125] = .05$.

Thus, $\Pr[-1.8125 \leq t_{DF=10} \leq +1.8125] = 1 - .05 - .05 = .90$.

7. Normal: Confidence Interval for $\mu$ when $\sigma^2$ is Unknown

When $\sigma^2$ is not known, the computation of a confidence interval for the mean $\mu$ is not altered much.
- We simply replace the confidence coefficient from the N(0,1) with one from the appropriate Student's t-Distribution (the one with df = n-1).
- We replace the (now unknown) standard error with its estimate. The latter looks nearly identical, except that it utilizes s in place of $\sigma$.
- Recall: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
- Thus:

Confidence Interval for $\mu$ in two settings of a sample from a Normal Distribution

  $\sigma^2$ is KNOWN                                        $\sigma^2$ is NOT known
  $\bar{X} \pm z_{(1-\alpha/2)}\,(\sigma/\sqrt{n})$          $\bar{X} \pm t_{n-1;\,(1-\alpha/2)}\,(s/\sqrt{n})$

Example: A random sample of size n = 20 durations (minutes) of cardiac bypass surgeries has a mean duration of $\bar{X} = 267$ minutes and variance $s^2 = 36700$ minutes$^2$. Assuming the underlying distribution is normal with unknown variance, construct a 90% CI estimate of the unknown true mean $\mu$.

Step 1:
Point Estimate of $\mu$ is the Sample Mean $\bar{X}_{n=20}$:

$\bar{X}_{n=20} = \dfrac{\sum_{i=1}^{20} X_i}{20} = 267$ minutes

Step 2: The Estimated Standard Error of $\bar{X}_n$ is $s/\sqrt{n}$:

$\widehat{SE}(\bar{X}_{n=20}) = \sqrt{\widehat{\text{variance}}(\bar{X}_{n=20})} = \sqrt{\dfrac{s^2}{n}} = \sqrt{\dfrac{36700}{20}} = 42.84$ minutes

Step 3: The Confidence Coefficient. For a 90% confidence interval, this number will be the 95th percentile of the Student's t-Distribution that has degrees of freedom = (n-1) = 19. This value is 1.729.

Putting this all together:
  Lower limit = (point estimate) - (conf. coeff.)(SE of point estimate) = 267 - (1.729)(42.84) = 192.93
  Upper limit = (point estimate) + (conf. coeff.)(SE of point estimate) = 267 + (1.729)(42.84) = 341.07

Thus, a 90% confidence interval for the true mean duration of surgery is (192.9, 341.1) minutes.

8. Introduction to the Chi Square Distribution

Where we are going: We want a confidence interval for $\sigma^2$. Its solution involves percentiles from the chi square distribution.

- Following are some settings where our interest lies in estimation of the variance $\sigma^2$:
  - Standardization of equipment: repeated measurement of a standard should have small variability.
  - Evaluation of technicians: are the results from person i too variable?
  - Comparison of measurement techniques: is a new method more variable than a standard method?
- We have an obvious point estimator of $\sigma^2$. It is $S^2$, which we have shown earlier is an unbiased estimator (see exercise 5 in the assignment for Unit 3, Populations and Samples).
- How do we get a confidence interval? The answer will utilize a new standardized variable, based on the way in which $S^2$ is computed. It is a chi square random variable.

Heuristic Definition of a Chi Square Random Variable

We will be interested in calculating, on the basis of a random sample from a Normal distribution, a confidence interval estimate of the normal distribution variance parameter $\sigma^2$. The required standardizing transformation is given by

$Y = \dfrac{(n-1)S^2}{\sigma^2}$

where $S^2$ is the sample variance. This new random variable Y, which is a function of information in a random sample from a Normal($\mu$, $\sigma^2$) distribution through the calculation of $S^2$, is said to follow a chi
square distribution with (n-1) degrees of freedom:

$Y = \dfrac{(n-1)S^2}{\sigma^2}$ is distributed Chi Square with degrees of freedom = (n-1).

Mathematical Definition: Chi Square Distribution

The above can be stated more formally.
1. If the random variable X follows a normal probability distribution with mean $\mu$ and variance $\sigma^2$, then the random variable V defined as $V = \dfrac{(X - \mu)^2}{\sigma^2}$ follows a chi square distribution with degree of freedom = 1.
2. If each of the random variables $V_1, \ldots, V_k$ is distributed chi square with degree of freedom = 1, and if these are independent, then their sum, defined as $V_1 + \cdots + V_k$, is distributed chi square with degrees of freedom = k.

Now we need to Reconcile the Two Definitions of a Chi Square Distribution

The two definitions (the heuristic one given earlier and the mathematical one here) are consistent because it is possible, with a little algebra, to rewrite $Y = \dfrac{(n-1)S^2}{\sigma^2}$ as the sum of (n-1) independent chi square random variables V, each with degrees of freedom = 1.

NOTE: For this course, it is not necessary to know the probability density function for the chi square distribution.

Features of the Chi Square Distribution

1. When data are a random sample of independent observations from a normal probability distribution, and interest is in the behavior of the random variable defined as the sample variance $S^2$, the assumptions of the chi square probability distribution hold.
2. The first mathematical definition of the chi square distribution says that it is defined as the square of a standard normal random variable.
3. Because the chi square distribution is obtained by the squaring of a random variable, a chi square random variable can assume only nonnegative values. That is, the probability density function has domain [0, ∞) and is not defined for outcome values less than zero. Thus, the chi square distribution is NOT symmetric.

[Figure: a chi square density, defined on nonnegative values only and skewed to the right.]

4. The fact that the chi square distribution is NOT symmetric about zero means that for Y = y
where y > 0, $\Pr[Y > y]$ is NOT EQUAL to $\Pr[Y < -y]$. However, because the total area under a probability distribution is 1, it is still true that $1 = \Pr[Y < y] + \Pr[Y > y]$.
5. The chi square distribution is less skewed as the number of degrees of freedom increases. Following is an illustration of this point.

[Figure: chi square densities with degrees of freedom = 1, 5, and 20. Source: http://www.ds.unifi.it/VL/VL_EN/special/special4.html]

6. Like the degrees of freedom for the Student's t-Distribution, the degrees of freedom associated with a chi square distribution is an index of the extent of independent information available for estimating population parameter values. Thus, the chi square distributions with small associated degrees of freedom are relatively flat, to reflect the imprecision of estimates based on small sample sizes. Similarly, chi square distributions with relatively large degrees of freedom are more concentrated near their expected value.

How to Use the Chi Square Table in the course text (Rosner, 6th Edition)

Source: Table 6, page 832.
- The format of this table is similar to that of Table 5 (Student's t) on page 831 of Rosner.
- Each row gives information for a separate chi square distribution defined by the degree of freedom. The notation d tells you the degrees of freedom.
- The column heading tells you the left tail probability (or area under the curve); e.g., the column heading .005 separates the lower 0.5% of the distribution from the upper 99.5%.
- The body of the table is comprised of values of the chi square random variable.

The following is a picture explaining the table entry of 2.71 for df = 1 and column heading .90.

[Figure: chi square distribution with 1 df; area = .90 to the left of 2.71.]

Probability [Chi Square random variable with df = 1 ≤ 2.71] = .90

Now You Try: Illustration of Table 6 of Rosner, 6th Edition

Suppose the random variable V follows a chi square distribution with 27 degrees of freedom. What is the probability that V assumes a value between 31.53 and 49.64?

The chi square table in your text (Rosner,
6th edition is Table 6 on page 832.

Internet solution using http://www.stat.sc.edu/~west/applets/chisqdemo.html
This applet produces right tail areas under the curve. In the box "degrees of freedom", type in 27. Then:

Area to right of 31.53                  Area to right of 49.64
In the box "area right of" type 31.53   In the box "area right of" type 49.64
Click on compute                        Click on compute
You should see 0.2499                   You should see 0.005

[Figure: two chi square density plots, degrees of freedom = 27, with the right tail shaded. Area right of 31.53 = 0.2499; area right of 49.64 = 0.005.]

Thus,
$\Pr[31.53 < V_{DF=27} < 49.64] = \Pr[V_{DF=27} > 31.53] - \Pr[V_{DF=27} > 49.64] = 0.2499 - 0.005 = 0.245$

9. Confidence Interval for $\sigma^2$

The definition of the chi square distribution gives us what we need to construct a confidence interval estimate of $\sigma^2$ when the data are a simple random sample from a normal probability distribution. The approach here is similar to that for estimating the mean $\mu$.
• Presented here is instruction in how to construct a confidence interval.
• Presented in Appendix 2 is the derivation of the formula that you will be using.

Formula for a $(1-\alpha)100\%$ Confidence Interval for $\sigma^2$
Setting: Normal Distribution

Lower limit = $\dfrac{(n-1)S^2}{\chi^2_{1-\alpha/2}}$

Upper limit = $\dfrac{(n-1)S^2}{\chi^2_{\alpha/2}}$

Example to Illustrate the Calculation

A precision instrument is guaranteed to read accurately to within ± 2 units. A sample of 4 readings on the same object yields 353, 351, 351, and 355. Find a 95% confidence interval estimate of the population variance $\sigma^2$ and also of the population standard deviation $\sigma$.

1. Obtain the point estimate of $\sigma^2$. It is the sample variance $S^2$. To get the sample variance we first need the sample mean:
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = 352.5$ and $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = 3.67$

2. Determine the correct chi square distribution to use. It has df = (4 - 1) = 3.

3. Obtain the correct multipliers. Because the desired confidence level
is 0.95, we set $0.95 = (1 - \alpha)$. Thus $\alpha = 0.05$. For a 95% confidence level, the percentiles we want are
(i) the $(\alpha/2)100$th = 2.5th percentile
(ii) the $(1 - \alpha/2)100$th = 97.5th percentile

Obtain the percentiles of the chi square distribution with degrees of freedom = 3:
(i) $\chi^2_{df=3,\,.025} = 0.2158$
(ii) $\chi^2_{df=3,\,.975} = 9.348$
Note: I used the following URL: http://www.stat.tamu.edu/~west/applets/chisqdemo.html

4. Put it all together to obtain
(i) Lower limit = $\dfrac{(n-1)S^2}{\chi^2_{1-\alpha/2}} = \dfrac{(3)(3.67)}{9.348} = 1.178$
(ii) Upper limit = $\dfrac{(n-1)S^2}{\chi^2_{\alpha/2}} = \dfrac{(3)(3.67)}{0.2158} = 51.02$

Obtain a Confidence Interval for the Population Standard Deviation $\sigma$

Step 1: Obtain a confidence interval for $\sigma^2$: (1.178, 51.02).
Step 2: The associated confidence interval for $\sigma$ is obtained by taking the square root of each of the lower and upper limits.
95% Confidence Interval = $(\sqrt{1.178}, \sqrt{51.02}) = (1.09, 7.14)$
Point estimate = $\sqrt{3.67} = 1.92$

Remarks on the Confidence Interval for $\sigma^2$
• It is NOT symmetric about the point estimate; the "safety net" on each side of the point estimate is of a different length.
• These intervals tend to be wide. Thus, large sample sizes are required to obtain reasonably narrow confidence interval estimates for the variance and standard deviation parameters.

10. Normal Distribution: Confidence Interval for $\mu_{DIFFERENCE}$ (Paired Data Setting)

Introduction to Paired Data
• Paired data arise when each individual (more specifically, each unit of measurement) in a sample is measured twice.
• Paired data are familiar. The two occasions of measurement might be "pre/post", "before/after", right/left, parent/child, etc.
• Here are some examples of paired data:
1) Blood pressure prior to and following treatment.
2) Number of cigarettes smoked per week, measured prior to and following participation in a smoking cessation program.
3) Number of sex partners in the month prior to and in the month following an HIV education campaign.
• Notice in each of these examples that the two occasions of measurement are linked by virtue of the two measurements being made on the same individual.
• One focus in an analysis of paired data is the
comparison of the two outcomes. For continuous data especially, this comparison is often formulated using the difference between the two measurements. Note: We'll see later that when the data are discrete, an analysis of paired data might focus on the ratio (e.g. relative risk) of the two measurements rather than on the difference.

For example:
1) Blood pressure prior to and following treatment. Interest is d = pre - post. Large differences are evidence of blood pressure lowering associated with treatment.
2) Number of cigarettes smoked per week, measured prior to and following participation in a smoking cessation program. Interest is d = pre - post. Large differences are evidence of smoking reduction.
3) Number of sex partners in the month prior to and in the month following an HIV education campaign. Interest is d = pre - post. Large differences are evidence of safer sex behaviors.

In this section we consider paired data that are a simple random sample from a normal distribution, and we seek a confidence interval for $\mu_{DIFFERENCE}$.
• Without worrying (for the present) about the details, consider the following: if two measurements of the same phenomenon (e.g. blood pressure, cigarettes/week, etc.), X and Y, are made on an individual and if each is normally distributed, then their difference is also distributed normal.
• Thus, the setting is our focus on the difference D and the following assumptions:
(1) D = (X - Y) is distributed Normal with
(2) Mean of D = $\mu_{DIFFERENCE}$. Let's write this as $\mu_d$.
(3) Variance of D = $\sigma^2_{DIFFERENCE}$. Let's write this as $\sigma_d^2$.
• Thus, estimation for paired data is a special case of selected methods already presented. Attention is restricted to the single random variable defined as the difference between the two measurements. The methods already presented that we can use here are
(1) Confidence interval for $\mu_d$: Normal distribution, $\sigma_d^2$ unknown
(2) Confidence interval for $\sigma_d^2$: Normal distribution

Example. Source: Anderson TW and Sclove SL, Introductory Statistical Analysis, Boston: Houghton Mifflin,
1974, page 339.

A researcher is interested in assessing the improvement in reading skills upon completion of the second grade (Y) in comparison to those prior to the second grade (X). The comparison is made by calculating the difference d in the scores on a standard reading test. A total of n = 30 children are studied. Following are the data:

ID   PRE(X)   POST(Y)   d = (Y - X)
 1    1.1      1.7        0.6
 2    1.5      1.7        0.2
 3    1.5      1.9        0.4
 4    2.0      2.0        0.0
 5    1.9      3.5        1.6
 6    1.4      2.4        1.0
 7    1.5      1.8        0.3
 8    1.4      2.0        0.6
 9    1.8      2.3        0.5
10    1.7      1.7        0.0
11    1.2      1.2        0.0
12    1.5      1.7        0.2
13    1.6      1.7        0.1
14    1.7      3.1        1.4
15    1.2      1.8        0.6
16    1.5      1.7        0.2
17    1.0      1.7        0.7
18    2.3      2.9        0.6
19    1.3      1.6        0.3
20    1.5      1.6        0.1
21    1.8      2.5        0.7
22    1.4      3.0        1.6
23    1.6      1.8        0.2
24    1.6      2.6        1.0
25    1.1      1.4        0.3
26    1.4      1.4        0.0
27    1.4      2.0        0.6
28    1.5      1.3       -0.2
29    1.7      3.1        1.4
30    1.6      1.9        0.3

Of interest are
1) A 99% confidence interval for $\mu_d$
2) An 80% confidence interval for $\sigma_d^2$

Solution for a 99% Confidence Interval for $\mu_d$

Step 1: The point estimate of $\mu_d$ is the sample mean.
$\bar{d}_{n=30} = \dfrac{\sum_{i=1}^{n} d_i}{n} = 0.51$

Step 2: The estimated standard error of $\bar{d}_{n=30}$ is
$\hat{SE}(\bar{d}_{n=30}) = \sqrt{\dfrac{S_d^2}{n}} = \sqrt{\dfrac{0.2416}{30}} = 0.0897$

Step 3: The confidence coefficient. For a 99% confidence interval, this number will be the 99.5th percentile of the Student's t-distribution that has degrees of freedom = (n - 1) = 29. This value is 2.756.

Step 4: Substitute into the formula for a confidence interval.
Lower limit = (point estimate) - (conf. coeff.)(SE of point estimate) = 0.51 - (2.756)(0.0897) = 0.2627
Upper limit = (point estimate) + (conf. coeff.)(SE of point estimate) = 0.51 + (2.756)(0.0897) = 0.7573

Solution for an 80% Confidence Interval for $\sigma_d^2$

Step 1: Obtain the point estimate of $\sigma_d^2$.
$S_d^2 = \dfrac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1} = 0.2416$

Step 2: Determine the correct chi square distribution to use. It has df = (30 - 1) = 29.

Step 3: Obtain the correct multipliers. Because the desired confidence level is 0.80, set $0.80 = (1 - \alpha)$. Thus $\alpha = 0.20$. For an 80% confidence level, the percentiles we want are
(i) the $(\alpha/2)100$th = 10th percentile
(ii) the $(1 - \alpha/2)100$th = 90th percentile
From either your text or the URL online, use the row for df = 29:
(i) $\chi^2_{df=29,\,.10} = 19.77$
(ii) $\chi^2_{df=29,\,.90} = 39.09$
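The Step 3 table lookups can also be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the helper names chi2_cdf, chi2_ppf and variance_ci are illustrative and do not come from the course text or from Rosner. It evaluates the chi square left tail area with a series for the regularized incomplete gamma function, inverts that by bisection, and then applies the Section 9 formula for a confidence interval for a variance.

```python
import math

def chi2_cdf(x, df):
    """Left tail area Pr[chi-square(df) <= x], computed from the power
    series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, df):
    """Percentile (inverse of chi2_cdf), found by simple bisection."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:   # widen the bracket if needed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def variance_ci(s2, n, conf):
    """(1 - alpha)100% interval for a normal variance:
    (n-1)S^2 / chi2_{1-alpha/2}  to  (n-1)S^2 / chi2_{alpha/2}."""
    alpha = 1.0 - conf
    df = n - 1
    lower = df * s2 / chi2_ppf(1.0 - alpha / 2.0, df)
    upper = df * s2 / chi2_ppf(alpha / 2.0, df)
    return lower, upper

# Reading-score differences: S_d^2 = 0.2416, n = 30, 80% confidence
q10 = chi2_ppf(0.10, 29)
q90 = chi2_ppf(0.90, 29)
lower, upper = variance_ci(0.2416, 30, 0.80)
print(round(q10, 2), round(q90, 2))
print(round(lower, 4), round(upper, 4))
```

Under these assumptions the two percentiles should agree with the tabled 19.77 and 39.09 to two decimals. In practice a statistics package would replace the hand-rolled chi2_ppf; it is written out here only to keep the sketch free of dependencies.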
Step 4: Substitute into the formula for the confidence interval.
(i) Lower limit = $\dfrac{(n-1)S_d^2}{\chi^2_{1-\alpha/2}} = \dfrac{(29)(0.2416)}{39.09} = 0.1792$
(ii) Upper limit = $\dfrac{(n-1)S_d^2}{\chi^2_{\alpha/2}} = \dfrac{(29)(0.2416)}{19.77} = 0.3544$

11. Normal Distribution: Confidence Interval for $[\mu_1 - \mu_2]$ (Two Independent Groups)

Illustration of the Setting of Two Independent Groups

A researcher performs a drug trial involving two independent groups.
• A control group is treated with a placebo while, separately,
• the intervention group is treated with an active agent.
• Interest is in a comparison of the mean control response with the mean intervention response, under the assumption that the responses are independent.
• The tools of confidence interval construction described for paired data are not appropriate.

Interest is in a comparison of the mean response in one group with the mean response in a separate group, under the assumption that the responses are independent. Here are some examples. In every example we are interested in the similarity of the two groups.
1) Is mean blood pressure the same for males and females?
2) Is body mass index (BMI) similar for breast cancer cases versus non-cancer patients?
3) Is length of stay (LOS) for patients in hospital "A" the same as that for similar patients in hospital "B"?

For continuous data, the comparison of two independent groups is often formulated using the difference between the means of the two groups.
• Thus, evidence of similarity of the two groups is reflected in a difference between means that is near zero.
• Focus is on $[\mu_{Group\,1} - \mu_{Group\,2}]$.

Recall again the idea that there are 3 components to a confidence interval:
1) A point estimator
2) The standard error of the point estimator
3) A confidence coefficient

Point Estimator: How do we obtain a point estimate of the difference $[\mu_{Group\,1} - \mu_{Group\,2}]$?
• An obvious point estimator of the difference between population means is the difference between sample means, $[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$.

Standard Error of the Point Estimator: What noise is associated with
this point estimator? We need to know the standard error of $[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$.
• Each $X_i$ in the sample of size $n_1$ from group 1 is Normal($\mu_1$, $\sigma_1^2$).
• Each $X_i$ in the sample of size $n_2$ from group 2 is Normal($\mu_2$, $\sigma_2^2$).
• This is great! We know the sampling distribution of each sample mean:
$\bar{X}_{Group\,1}$ is distributed Normal($\mu_1$, $\sigma_1^2/n_1$)
$\bar{X}_{Group\,2}$ is distributed Normal($\mu_2$, $\sigma_2^2/n_2$)
• Now, without worrying about the details, we have the following tool for the sampling distribution of the difference between two independent sample means, each of which is distributed normal:
$[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$ is also distributed Normal with
Mean = $[\mu_{Group\,1} - \mu_{Group\,2}]$
Variance = $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
• Be careful! The standard error of the difference is NOT the sum of the two separate standard errors. Following is the correct formula. Notice: you must first sum the variances and then take the square root of the sum.
$SE[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}] = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

A General Result that is Sometimes Useful

If random variables X and Y are independent with
$E[X] = \mu_X$ and $Var[X] = \sigma_X^2$
$E[Y] = \mu_Y$ and $Var[Y] = \sigma_Y^2$
then
$E[aX + bY] = a\mu_X + b\mu_Y$
$Var[aX + bY] = a^2\sigma_X^2 + b^2\sigma_Y^2$ and
$Var[aX - bY] = a^2\sigma_X^2 + b^2\sigma_Y^2$
NOTE: This result ALSO says that, when X and Y are independent, the variance of their difference is equal to the variance of their sum. This makes sense if it is recalled that variance is defined using squared deviations, which are always positive.

Three Solutions for the Standard Error of the Point Estimator

Recall that interest is in the standard error of $[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$. Use the solution that is appropriate to your situation.

Solution 1: Use this when $\sigma_1^2$ and $\sigma_2^2$ are both known.
$SE[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}] = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Solution 2: Use when $\sigma_1^2$ and $\sigma_2^2$ are both NOT known but are assumed EQUAL.
$\hat{SE}[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}] = \sqrt{\dfrac{S^2_{pool}}{n_1} + \dfrac{S^2_{pool}}{n_2}}$
$S^2_{pool}$ is a weighted average of the two separate sample variances, with weights equal to the associated degrees of freedom contributions:
$S^2_{pool} = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$

Solution 3: Use when $\sigma_1^2$ and $\sigma_2^2$ are both NOT known and NOT EQUAL.
$\hat{SE}[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}] = \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$

Confidence Coefficient (Multiplier)

There are 3 solutions, depending on the standard error calculation as per above.

Solution 1: Use this when $\sigma_1^2$ and $\sigma_2^2$ are both known.
Use a percentile of the Normal(0,1).

Solution 2: Use when $\sigma_1^2$ and $\sigma_2^2$ are both NOT known but are assumed EQUAL.
Use a percentile of Student's t with degrees of freedom = $(n_1 - 1) + (n_2 - 1)$.

Solution 3: Use when $\sigma_1^2$ and $\sigma_2^2$ are both NOT known and NOT EQUAL.
Use a percentile of Student's t with degrees of freedom = f, where f is given by the Satterthwaite formula:
$f = \dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$
Horrible, isn't it!

Here's a summary table.

Normal Distribution: Confidence Interval for $[\mu_1 - \mu_2]$ (Two Independent Groups)
CI = [point estimate] ± (conf. coeff.) SE[point estimate]

Scenario 1: $\sigma_1^2$ and $\sigma_2^2$ are both known
  Estimate: $\bar{X}_{Group\,1} - \bar{X}_{Group\,2}$
  SE to use: $SE = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
  Confidence coefficient: use percentiles from the Normal(0,1)
  Degrees of freedom: not applicable

Scenario 2: $\sigma_1^2$ and $\sigma_2^2$ are both NOT known but are assumed EQUAL
  Estimate: $\bar{X}_{Group\,1} - \bar{X}_{Group\,2}$
  SE to use: $\hat{SE} = \sqrt{S^2_{pool}/n_1 + S^2_{pool}/n_2}$, where you have already obtained $S^2_{pool} = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$
  Confidence coefficient: use percentiles from Student's t
  Degrees of freedom: $(n_1 - 1) + (n_2 - 1)$

Scenario 3: $\sigma_1^2$ and $\sigma_2^2$ are both NOT known and NOT EQUAL
  Estimate: $\bar{X}_{Group\,1} - \bar{X}_{Group\,2}$
  SE to use: $\hat{SE} = \sqrt{S_1^2/n_1 + S_2^2/n_2}$
  Confidence coefficient: use percentiles from Student's t
  Degrees of freedom: f, from the Satterthwaite formula above

Example

Data are available on the weight gain of weanling rats fed either of two diets. The weight gain in grams was recorded for each rat, and the mean for each group computed:

Diet #1 Group: $n_1$ = 12 rats, $\bar{X}_1$ = 120 grams
Diet #2 Group: $n_2$ = 7 rats, $\bar{X}_2$ = 101 grams

On the basis of a 99% confidence interval, is there a difference in mean weight gain among rats fed on the 2 diets? For illustration purposes, we'll consider all three scenarios, according to what is known or can or cannot be assumed about the separate population variances.

Solution 1: $\sigma_1^2$ and $\sigma_2^2$ are both known, = 400 grams$^2$

Step 1: Point estimate of $[\mu_1 - \mu_2]$
$\bar{X}_{Group\,1} - \bar{X}_{Group\,2} = 19$ g

Step 2: Standard error of the point estimate
$SE[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}] = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{400}{12} + \dfrac{400}{7}} = 9.51$ g

Step 3: The confidence coefficient. Since $\sigma_1^2 = \sigma_2^2 = \sigma^2$ is known, the multiplier is a percentile from the Normal(0,1). For a 99% confidence interval, the required percentile is the 99.5th. This has value 2.575.

Step 4: Substitute into the formula for a confidence interval.
CI = [point estimate] ± (conf. coeff.) SE[point estimate] = 19 ± (2.575)(9.51) = (-5.5 g, 43.5 g)

Solution 2: $\sigma_1^2$ and $\sigma_2^2$ are both NOT known but are assumed EQUAL, and we have $S_1^2 = 457.25$, $S_2^2 = 425.33$

Step 1: Point estimate of $[\mu_1 - \mu_2]$
$\bar{X}_{Group\,1} - \bar{X}_{Group\,2} = 19$ g

Step 2: The estimated standard error of the point estimate is obtained in two steps.
$S^2_{pool} = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \dfrac{(11)(457.25) + (6)(425.33)}{11 + 6} = 445.98$ grams$^2$
$\hat{SE}[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}] = \sqrt{\dfrac{S^2_{pool}}{n_1} + \dfrac{S^2_{pool}}{n_2}} = \sqrt{\dfrac{445.98}{12} + \dfrac{445.98}{7}} = 10.0437$ g

Step 3: The confidence coefficient. Since $\sigma_1^2 = \sigma_2^2 = \sigma^2$ but is UNknown, the multiplier is a percentile from the Student's t with degrees of freedom = (12 - 1) + (7 - 1) = 17. For a 99% confidence interval, the required percentile is the 99.5th. This has value 2.8982.

Step 4: Substitute into the formula for a confidence interval.
CI = [point estimate] ± (conf. coeff.) SE[point estimate] = 19 ± (2.8982)(10.0437) = (-10.1 g, 48.1 g), considerably wider than for scenario 1.

Solution 3: $\sigma_1^2$ and $\sigma_2^2$ are both NOT known and are UNEQUAL, and we have $S_1^2 = 457.25$, $S_2^2 = 425.33$

Step 1: Point estimate of $[\mu_1 - \mu_2]$
$\bar{X}_{Group\,1} - \bar{X}_{Group\,2} = 19$ g

Step 2: The estimated standard error of the point estimate is in just one step now.
$\hat{SE}[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}] = \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}} = \sqrt{\dfrac{457.25}{12} + \dfrac{425.33}{7}} = 9.943$ g

Step 3: The confidence coefficient. With $\sigma_1^2 \ne \sigma_2^2$ and both UNknown, the multiplier is a percentile from the Student's t with degrees of freedom given by that horrible formula:
$f = \dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}} = \dfrac{\left(\dfrac{457.25}{12} + \dfrac{425.33}{7}\right)^2}{\dfrac{(457.25/12)^2}{11} + \dfrac{(425.33/7)^2}{6}} = 13.0793$
Round down so as to obtain an appropriately conservative (wide) interval. So we'll use f = 13. The 99.5th percentile of the Student's t with df = 13 has value 3.0123.

Step 4: Substitute into the formula for a confidence interval.
CI = point estimate ± conf. coeff. × SE
of point estimate = 19 ± (3.0123)(9.943) = (-11.0 g, 49.0 g)
Note: This is the widest of the 3 solutions.

12. Introduction to the F Distribution

A Rationale for Introducing the F Distribution is Interest in Comparing Two Variances
• Unlike the approach used to compare two means in the continuous variable setting (recall, we look at their difference), the comparison of two variances is accomplished by looking at their ratio. Ratio values close to one are evidence of similarity.
• Of interest is a confidence interval estimate of the ratio of two variances in the setting where the data are comprised of two independent samples, each from a separate Normal distribution.

We are often interested in comparing the variances of 2 groups.
• This may be the primary question of interest: "I have a new measurement procedure. Are the results more variable than those obtained using the standard procedure?"
• Comparison of variances is sometimes a preliminary analysis to determine whether or not it is appropriate to compute a pooled variance estimate when the goal is comparing the mean levels of two groups.

Thus, for comparing variances, we use a RATIO rather than a difference.
• Specifically, we look at ratios of variances of the form $s_x^2 / s_y^2$.
• If this ratio is 1, then the variances are the same. If it is far from 1, then the variances differ.
• It is because we wish to make probability statements about ratios of variances, and to compute confidence intervals, that we introduce the F distribution.

A Definition of the F Distribution

Suppose $X_1, \ldots, X_{n_x}$ is a simple random sample from a normal distribution with mean $\mu_X$ and variance $\sigma_X^2$. Suppose further that $Y_1, \ldots, Y_{n_y}$ is a simple random sample from a normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2$. If the two sample variances are calculated in the usual way,
$S_x^2 = \dfrac{\sum_{i=1}^{n_x} (x_i - \bar{x})^2}{n_x - 1}$ and $S_y^2 = \dfrac{\sum_{i=1}^{n_y} (y_i - \bar{y})^2}{n_y - 1}$
then the following is said to be distributed $F_{n_x - 1,\, n_y - 1}$:
$\dfrac{S_x^2 / \sigma_x^2}{S_y^2 / \sigma_y^2}$
with two degrees of freedom specifications:
Numerator degrees of freedom = $n_x - 1$
Denominator degrees of freedom = $n_y - 1$

A Wonderful Result

There is a relationship between the values of percentiles for pairs of F distributions that is defined as follows:
$F_{d_1, d_2;\, \alpha/2} = \dfrac{1}{F_{d_2, d_1;\, (1 - \alpha/2)}}$
Notice that (1) the degrees of freedom are in opposite order, and (2) the solution for a left tail percentile is expressed in terms of a right tail percentile. This is useful when the published table does not list the required percentile value; usually the missing percentiles are the ones in the left tail.

How to Use the F Distribution Table in Rosner 6th Edition
• Percentiles of selected F distributions are provided in Table 9, page 836.
  - Each section of rows defines a different denominator df.
  - Each column defines a different numerator df.
  - The body of the table gives values of selected percentiles of the F distribution.
  - Only the upper tail percentiles (0.90 to 0.999) are provided to you.
• Example: What is the 95th percentile value of an F distribution random variable with numerator degrees of freedom equal to 24 and denominator degrees of freedom equal to 3?
  - Locate on page 836 the column for numerator df = 24 and the row for denominator df = 3.
  - Within this block, find the 95th percentile value: 8.64.
• Example: What is the 5th percentile value of an F distribution random variable with numerator degrees of freedom equal to 8 and denominator degrees of freedom equal to 6?
  - To obtain this percentile value requires using the "wonderful result" just described.
  - Locate on page 836 the column for numerator df = 6 and the row for denominator df = 8.
  - Within this block, find the 95th percentile value. This is $F_{6,8;\,.95} = 3.58$.
  - Thus, $F_{8,6;\,.05} = 1 / F_{6,8;\,.95} = 1/3.58 = 0.2793$.

13. Normal Distribution: Confidence Interval for $\sigma_1^2 / \sigma_2^2$

Recall that the setting is that of two independent groups. Of interest now, however, is a comparison of the two variances.
• We might want to know if the reproducibilities of two laboratory assays are similar.
• More simply, we
might be interested in an exploration of two independent normal population distributions; this would include a comparison of their two variance parameters.
• Sometimes we are interested in a comparison of two variance parameters as a preliminary step in a larger analysis plan, for the reason that some statistical analysis techniques make assumptions about equality of variances.

Formula for a $(1-\alpha)100\%$ Confidence Interval for $\sigma_1^2 / \sigma_2^2$
Setting: Two Independent Normal Distributions

Lower limit = $\left[\dfrac{1}{F_{n_1 - 1;\, n_2 - 1;\, (1 - \alpha/2)}}\right]\left[\dfrac{S_1^2}{S_2^2}\right]$

Upper limit = $\left[\dfrac{1}{F_{n_1 - 1;\, n_2 - 1;\, \alpha/2}}\right]\left[\dfrac{S_1^2}{S_2^2}\right] = \left[F_{n_2 - 1;\, n_1 - 1;\, (1 - \alpha/2)}\right]\left[\dfrac{S_1^2}{S_2^2}\right]$

Notice that the formula for the upper confidence limit incorporates the trick of solving for a left tail percentile of an F distribution in terms of a right tail percentile of another F distribution. Be careful of the degrees of freedom specifications.

Example. Source: Daniel WW, Biostatistics: A Foundation for Analysis in the Health Sciences, Fourth Edition, 1987, page 163.

Reaction time to a stimulus was examined in two independent groups, each a simple random sample from a Normal population distribution. One group, $X_1, \ldots, X_{n_x}$, is comprised of $n_x$ = 21 healthy adults. The other group, $Y_1, \ldots, Y_{n_y}$, includes $n_y$ = 16 Parkinson's disease patients. Interest is in a 95% confidence interval estimate of $\sigma_x^2 / \sigma_y^2$. Following are the relevant sample statistics:
$S_x^2 = \dfrac{\sum_{i=1}^{n_x} (x_i - \bar{x})^2}{n_x - 1} = 1600$ and $S_y^2 = \dfrac{\sum_{i=1}^{n_y} (y_i - \bar{y})^2}{n_y - 1} = 1225$
Numerator degrees of freedom = $n_x - 1$ = 20
Denominator degrees of freedom = $n_y - 1$ = 15

Step 1: Solution for the point estimator
$S_x^2 / S_y^2 = 1600 / 1225 = 1.306$

Step 2: Solution for the confidence coefficient multipliers
$\dfrac{1}{F_{n_1 - 1;\, n_2 - 1;\, (1 - \alpha/2)}} = \dfrac{1}{F_{20;\,15;\,.975}} = \dfrac{1}{2.76}$
$\dfrac{1}{F_{n_1 - 1;\, n_2 - 1;\, \alpha/2}} = F_{n_2 - 1;\, n_1 - 1;\, (1 - \alpha/2)} = F_{15;\,20;\,.975} = 2.57$

Step 3: Solution for the lower and upper confidence interval limit values
Lower limit = $\left[\dfrac{1}{F_{n_1 - 1;\, n_2 - 1;\, (1 - \alpha/2)}}\right]\left[\dfrac{S_1^2}{S_2^2}\right] = \left[\dfrac{1}{2.76}\right][1.306] = 0.47$
Upper limit = $\left[F_{n_2 - 1;\, n_1 - 1;\, (1 - \alpha/2)}\right]\left[\dfrac{S_1^2}{S_2^2}\right] = [2.57][1.306] = 3.36$

14. Binomial Distribution: Estimation of a Proportion $\pi$

Recall: The Binomial Distribution was introduced in Unit 4,
Bernoulli and Binomial Distributions.
• The setting is the results of N independent trials, each of which produces one of two possible outcomes (we called these "event" and "non-event").
• Event/non-event might refer to alive/dead, tumor/remission, success/failure, heads/tails, etc.
• Associated with each trial is the same probability of event occurrence, $\pi$.
• An estimate of the probability of event occurrence $\pi$ is given by the observed proportion of the N trials that yielded event occurrence.
• The binomial distribution is the probability model used to describe the outcome of X = x events among the N independent trials.

With apology: there are a variety of notations for representing an estimate of $\pi$.
• The most clear is $\hat{\pi}$. The caret on the top is an indication that this is a guess.
• Another is p, for proportion. This is awkward because sometimes the notation p is used for the population parameter $\pi$ itself. Therefore, I recommend against using this to represent the estimate of $\pi$.
• Better is the use of $\hat{p}$ because it has the caret on top. This is most likely to be used when the writers of the text you are reading refer to the population parameter $\pi$ as p.
• Another is X/N. This makes sense since you can discern from this that it is referring to an observed proportion.
• Still another is $\bar{X}$. This also makes sense since it is the sum of 0's and 1's divided by N, the number of trials.
Putting these all together: $\hat{\pi} = \hat{p} = X/N = \bar{X}$. Notice I left off the notation p.

In constructing a confidence interval for $\pi$ of a Binomial distribution, just as we did for the mean parameter $\mu$ of a Normal distribution, we need
1) A point estimate
2) The SE of the point estimate
3) A confidence coefficient

Example. Source: Daniel WW, Biostatistics: A Foundation for Analysis in the Health Sciences, Fourth Edition, 1987, page 149.

Interest is in estimating the proportion of individuals who obtain a dental check up twice a year in a certain urban population. Of N = 300 persons identified by simple random sampling and interviewed, X = 123 reported having
had 2 dental check ups in the last year. Construct a 95% confidence interval for $\pi$, the unknown true proportion.

1. The point estimate of $\pi$ is the sample mean.
$\hat{\pi} = \bar{X} = \dfrac{X}{N} = \dfrac{123}{300} = 0.41$

2. The standard error of $\hat{\pi} = \bar{X}$ is estimated using
$\hat{SE}(\hat{\pi}) = \sqrt{\dfrac{\bar{X}(1 - \bar{X})}{N}}$
This formula makes sense for two reasons:
• If X is distributed Binomial(N, $\pi$), then Variance(X) = $N\pi(1 - \pi)$.
• Variance[(constant)X] = (constant)$^2$ Variance(X).
• For the interested, Appendix 3 is the solution for this SE formula.
$\hat{SE}(\hat{\pi}) = \sqrt{\dfrac{(0.41)(0.59)}{300}} = 0.028$

3. The confidence coefficient is a percentile from the Normal(0,1) distribution. This may seem counterintuitive, but it is not. It is not correct that, because the SE has to be estimated, the percentile is Student's t. The correct percentile is one from the Normal(0,1), for reasons having to do with the central limit theorem.
• As we saw before: for a 95% confidence interval, this number will be the 97.5th percentile of the Normal(0,1) distribution.
• And in general: for a $(1-\alpha)100\%$ confidence interval, this number will be the $(1 - \alpha/2)100$th percentile of the Normal(0,1) distribution.
$z_{.975} = 1.96$

4. Putting it all together:
Lower = (point estimate) - (multiple)(SE of estimate) = 0.41 - (1.96)(0.028) = 0.36
Upper = (point estimate) + (multiple)(SE of estimate) = 0.41 + (1.96)(0.028) = 0.46

Confidence Interval for a Proportion $\pi$ (a sample from a Binomial(N, $\pi$) Distribution)

$\hat{\pi} \pm z_{(1 - \alpha/2)}\, \hat{SE}(\hat{\pi})$

where the required calculations are
1) $\bar{X}$ = X/N, the observed proportion of events in the N trials
2) $\hat{\pi} = \bar{X}$
3) $\hat{SE} = \sqrt{\dfrac{\bar{X}(1 - \bar{X})}{N}}$
4) For a small number of trials (N ≤ 30 or so), use $SE = \sqrt{\dfrac{(0.5)(0.5)}{N}}$

Why? For a small number of trials N, say N ≤ 30, it may be desirable to compute a more conservative (wider) confidence interval by using a slightly different SE calculation.
• A closer look at the SE calculation $\hat{SE} = \sqrt{\bar{X}(1 - \bar{X})/N}$ reveals that the largest value it can have is the one for which $\bar{X} = 0.50$ in the SE calculation.

15. Two Independent Binomials: Estimation of the Difference $[\pi_1 - \pi_2]$

We are often interested in comparing proportions from 2 populations:
• Is the incidence of disease A
the same in two populations?
• Patients are treated with either drug D or with placebo. Is the proportion improved the same in both groups?

Suppose that available to us are the results of two independent Binomial random variables:
• X distributed Binomial($N_1$, $\pi_1$)
• Y distributed Binomial($N_2$, $\pi_2$)

We have, therefore, the following:
$\hat{\pi}_1 = \bar{X} = \dfrac{X}{N_1}$ and $\hat{\pi}_2 = \bar{Y} = \dfrac{Y}{N_2}$
$SE(\hat{\pi}_1) = \sqrt{\dfrac{\pi_1(1 - \pi_1)}{N_1}}$ and $SE(\hat{\pi}_2) = \sqrt{\dfrac{\pi_2(1 - \pi_2)}{N_2}}$
We have what we need for developing a confidence interval for the difference $[\pi_1 - \pi_2]$.

Example

In a clinical trial for a new drug to treat hypertension, $N_1$ = 50 patients were randomly assigned to receive the new drug and $N_2$ = 50 patients to receive a placebo. X = 34 of the patients receiving the drug showed improvement, while Y = 15 of those receiving placebo showed improvement. Compute a 95% confidence interval estimate for the difference between the proportions improved.

1. The point estimate of $[\pi_1 - \pi_2]$ is the difference between the sample means.
$\hat{\pi}_1 = \bar{X} = X/N_1 = 34/50 = 0.68$
$\hat{\pi}_2 = \bar{Y} = Y/N_2 = 15/50 = 0.30$
$(\hat{\pi}_1 - \hat{\pi}_2) = 0.68 - 0.30 = 0.38$

2. The standard error of $[\hat{\pi}_1 - \hat{\pi}_2]$ is estimated using
$\hat{SE}(\hat{\pi}_1 - \hat{\pi}_2) = \sqrt{\dfrac{\bar{X}(1 - \bar{X})}{N_1} + \dfrac{\bar{Y}(1 - \bar{Y})}{N_2}}$
This formula is reasonable because both sample sizes are larger than 30.
$\hat{SE}(\hat{\pi}_1 - \hat{\pi}_2) = \sqrt{\dfrac{(0.68)(0.32)}{50} + \dfrac{(0.30)(0.70)}{50}} = 0.0925$

3. The confidence coefficient is again a percentile from the Normal(0,1) distribution.
$z_{.975} = 1.96$

4. Putting it all together:
Lower = (point estimate) - (multiple)(SE of estimate) = 0.38 - (1.96)(0.0925) = 0.20
Upper = (point estimate) + (multiple)(SE of estimate) = 0.38 + (1.96)(0.0925) = 0.56

Confidence Interval for a Difference Between Two Independent Proportions $[\pi_1 - \pi_2]$ (Two Independent Binomial Distributions)

$(\hat{\pi}_1 - \hat{\pi}_2) \pm z_{(1 - \alpha/2)}\, \hat{SE}(\hat{\pi}_1 - \hat{\pi}_2)$

where the required calculations are
1) $\bar{X} = X/N_1$ and $\bar{Y} = Y/N_2$
2) $\hat{\pi}_1 = \bar{X}$ and $\hat{\pi}_2 = \bar{Y}$
3) $\hat{SE} = \sqrt{\dfrac{\bar{X}(1 - \bar{X})}{N_1} + \dfrac{\bar{Y}(1 - \bar{Y})}{N_2}}$
4) For a small number of trials (N ≤ 30 or so) in either group, use $SE = \sqrt{\dfrac{(0.5)(0.5)}{N_1} + \dfrac{(0.5)(0.5)}{N_2}}$

Appendix 1
Derivation of Confidence Interval for $\mu$: Single Normal, $\sigma^2$ Known

The setting is the example in Section 4, Confidence Interval for $\mu$ ($\sigma^2$ known). Recall that we were given the weight in micrograms of
drug inside each of 30 capsules, after subtracting the capsule weight:

0.6  0.2  0.4  0.0  1.6  1.0  0.3  0.6  0.5  0.0
0.0  0.2  0.1  1.4  0.6  0.2  0.7  0.6  0.3  0.1
0.7  1.6  0.2  1.0  0.3  0.0  0.6 -0.2  1.4  0.3

We're told that $\sigma^2 = 0.25$.

Step 1: Obtain a point estimate.
$\bar{X}_{n=30} = 0.51$

Step 2: Obtain the SE of the point estimate by recalling that $SE(\bar{X}) = \sigma / \sqrt{n}$.
$SE(\bar{X}_{n=30}) = \sigma / \sqrt{n} = 0.5 / \sqrt{30} = 0.0913$

Step 3: Select the desired confidence $(1 - \alpha)$. Suppose we want a 95% confidence interval. Then $(1 - \alpha) = 0.95$. This means that $\alpha = 0.05$. The $\alpha = 0.05$ is the probability of error.

Step 4: Using tables for the Normal(0,1) distribution, obtain symmetric values of a standard normal deviate Z, call these $z_{lower}$ and $z_{upper}$, such that
Probability[$z_{lower} \le Z \le z_{upper}$] = 0.95

[Figure: Normal(0,1) density. The central area between $z_{lower}$ and $z_{upper}$ is 0.95; each shaded tail area is (1/2)(0.05) = 0.025.]

See if you can do this using the table in Rosner 6th Edition. Use Column D. On page 827, locate in the body of the table the area value of 0.95. Associated with that area is z = 1.96. Thus $z_{upper} = 1.96$ and $z_{lower} = -1.96$, and we have
Probability[$-1.96 \le Z \le 1.96$] = 0.95

This expression, Probability[$-1.96 \le Z \le 1.96$] = 0.95 in this example and Probability[$z_{lower} \le Z \le z_{upper}$] = $(1 - \alpha)$ more generally, is the origin of the formula for a confidence interval. To arrive at the latter involves insertion of the standardization of $\bar{X}$.

Probability[$-1.96 \le Z \le 1.96$] = 0.95 in this example is actually
Probability[$z_{lower} \le Z \le z_{upper}$] = $(1 - \alpha)$

Note 1: Because the Normal(0,1) distribution is symmetric about the value 0, $z_{lower} = (-1) z_{upper}$. So let's call $z_{upper}$ simply z. This allows us to simplify the above expression with two convenient substitutions, $z_{upper} = z$ and $z_{lower} = (-1)z$:
Probability[$-z \le Z \le z$] = $(1 - \alpha)$

Note 2: Now we'll insert another convenient substitution, $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$:
Probability$\left[-z \le \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le z\right]$ = $(1 - \alpha)$

Note 3: All that remains is to do the algebra necessary to isolate $\mu$:
Probability$\left[\bar{X} - z\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\dfrac{\sigma}{\sqrt{n}}\right]$ = $(1 - \alpha)$

With confidence $(1 - \alpha)100\%$:
$\bar{X} - z\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\dfrac{\sigma}{\sqrt{n}}$
which matches the formula presented in the main text.

Appendix 2
Derivation of Confidence Interval for $\sigma^2$: Single Normal

The setting here is the example in Section 9. A precision instrument is guaranteed to read accurately to within ± 2 units. A sample of 4 readings on the same object yields 353, 351, 351, and 355. Find a 95% confidence interval estimate of the population variance $\sigma^2$.

Step 1: Obtain a point estimate $S^2$ and its associated degrees of freedom.
$S^2 = 3.67$, df = 3

Step 2: Recalling the material from Section 8, define the appropriate chi square random variable.
$Y = \dfrac{(n-1)S^2}{\sigma^2}$ is distributed chi square with degrees of freedom = (n - 1)

Step 3: Select the desired confidence $(1 - \alpha)$. As we want a 95% confidence interval, $(1 - \alpha) = 0.95$.

Step 4: Substitute for $\chi^2$ in the middle of the "area under the curve" calculation for a chi square random variable, as follows:
Probability$\left[\chi^2_{\alpha/2} \le \dfrac{(n-1)S^2}{\sigma^2} \le \chi^2_{1-\alpha/2}\right]$ = $(1 - \alpha)$

Step 5: Do the algebra to obtain an expression that is the confidence interval for $\sigma^2$.
Probability$\left[\chi^2_{\alpha/2} \le \dfrac{(n-1)S^2}{\sigma^2} \le \chi^2_{1-\alpha/2}\right]$ = $(1 - \alpha)$
Probability$\left[\dfrac{1}{\chi^2_{1-\alpha/2}} \le \dfrac{\sigma^2}{(n-1)S^2} \le \dfrac{1}{\chi^2_{\alpha/2}}\right]$ = $(1 - \alpha)$

With confidence $(1 - \alpha)100\%$:
$\dfrac{(n-1)S^2}{\chi^2_{1-\alpha/2}} \le \sigma^2 \le \dfrac{(n-1)S^2}{\chi^2_{\alpha/2}}$
which matches the formula given in Section 9.

Appendix 3
The Standard Error of $\bar{X}$ is estimated using $\hat{SE}(\bar{X}) = \sqrt{\dfrac{\bar{X}(1 - \bar{X})}{N}}$

We take advantage of two statistical results:
• If X is distributed Binomial(N, $\pi$), then Variance(X) = $N\pi(1 - \pi)$.
• Variance[(constant)X] = (constant)$^2$ Variance(X).

Proof:
$SE(\bar{X}) = \sqrt{Variance(\bar{X})} = \sqrt{Variance\left(\dfrac{X}{N}\right)} = \sqrt{\left(\dfrac{1}{N}\right)^2 Variance(X)} = \sqrt{\dfrac{1}{N^2}\, N\pi(1 - \pi)} = \sqrt{\dfrac{\pi(1 - \pi)}{N}}$

The problem now is that $\pi$ is not known. So it is replaced by its estimate $\hat{\pi} = \bar{X}$, giving
$\hat{SE}(\bar{X}) = \sqrt{\dfrac{\bar{X}(1 - \bar{X})}{N}}$
which matches the formula used in Section 14.
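As a numerical companion to Appendix 3, here is a minimal sketch of the Section 14 interval for a proportion, assuming only the Python standard library. The function name proportion_ci and its conservative flag are illustrative choices, not notation from these notes. NormalDist().inv_cdf supplies the Normal(0,1) percentile that the notes read from a table.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95, conservative=False):
    """Normal-theory interval for a binomial proportion.

    x: number of events, n: number of trials.
    conservative=True uses SE = sqrt(0.5 * 0.5 / n), the largest
    possible SE, as suggested in the notes for small n.
    """
    p_hat = x / n                                  # point estimate X/N
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # e.g. about 1.96 for 95%
    spread = 0.25 if conservative else p_hat * (1 - p_hat)
    se = math.sqrt(spread / n)                     # estimated standard error
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Dental check up example: X = 123 events among N = 300 trials
p_hat, se, ci = proportion_ci(123, 300)
print(round(p_hat, 2), round(se, 4))
print(tuple(round(limit, 3) for limit in ci))

# Same data with the conservative SE (always at least as wide)
_, se_cons, ci_cons = proportion_ci(123, 300, conservative=True)
```

With X = 123 and N = 300 this gives a point estimate of 0.41 and an SE near 0.0284. The limits can differ in the second decimal from the hand calculation in Section 14, which rounds the SE to 0.028 and the multiplier to 1.96 before multiplying.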