Intro Biostatistics PUBHLTH 540
This 23-page set of class notes was uploaded by Agustin Bechtelar on Friday, October 30, 2015. The notes belong to PUBHLTH 540 at the University of Massachusetts, taught by Carol Bigelow in the Fall.
Date Created: 10/30/15
PubHlth 540 The Normal Distribution

Unit 5. The Normal Distribution

Topics
1. Introduction
2. Definition of the Normal Distribution
3. The Sample Average is Often Normally Distributed: Introduction to the Central Limit Theorem
4. A Feel for the Normal Distribution
5. The Relevance of the Normal Distribution
6. Calculation of Probabilities for the Normal(0,1)
7. From Normal($\mu$, $\sigma^2$) to Normal(0,1): The Z-Score
8. From Normal(0,1) to Normal($\mu$, $\sigma^2$)

1. Introduction

Much of statistical inference is based on the normal distribution.
- The pattern of occurrence of many phenomena in nature happens to be described well by a normal distribution model.
- Even when the phenomena in a sample distribution are not described well by the normal distribution, the sampling distribution of sample averages obtained by repeated sampling from the parent distribution is often described well by the normal distribution (central limit theory).

You may have noticed in your professional work, especially in reading the literature for your field, that researchers often choose to report the average when they wish to summarize the information in a sample of data.

The normal distribution is appropriate for continuous random variables only.
- Recall that, in theory, a continuous random variable can assume any of an infinite number of values. Therefore, we'll have to refine our definition of a probability model to accommodate the continuous variable setting.
- $\Pr[X = x]$, the calculation of a point probability, is meaningless in the continuous variable setting. In its place we calculate $\Pr[a < X < b]$, the probability of an interval of values of X.
- For the same reason, $\sum_x \Pr[X = x]$ is also without meaning.

Following is the extension of the ideas of a probability distribution for a discrete random variable to the ideas underlying the meaning of a probability distribution for a continuous random variable. The ideas of calculus
(sorry!) help us out.

| | Discrete Random Variable | Continuous Random Variable |
|---|---|---|
| 1st: List of all possible values that exhaust all possibilities | A list, e.g., 1, 2, 3, 4, ..., N | The list becomes a range, e.g., $-\infty$ to $+\infty$, or 0 to $+\infty$ |
| 2nd: Accompanying probabilities of each value | Point probability, $\Pr[X = x]$ | Point probability becomes probability density of X, written $f_X(x)$ |
| Total must be 1 | $\sum_{x = x_{\min}}^{x_{\max}} \Pr[X = x] = 1$ | Unit total becomes unit integral, $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$ |

2. Definition of the Normal Distribution

Definition of the normal probability density function:
- The concept "probability of X = x" is replaced by the probability density function $f_X(x)$ evaluated at X = x.
- A picture of this function, with X = x plotted on the horizontal axis and $f_X(x)$ plotted on the vertical axis, is the familiar bell-shaped Gaussian curve.

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$$

where
- x = a value of X; the range of possible values of X is $-\infty$ to $+\infty$
- exp = e raised to the power in braces, where e is Euler's number, 2.71828... (note: $e = \sum_{i=0}^{\infty} \frac{1}{i!}$)
- $\pi$ = the mathematical constant 3.14... (note: $\pi$ = circumference/diameter for any circle)
- $\mu$ = expected value of X, the "long-run average"
- $\sigma^2$ = variance of X. Recall this is the expected value of $(X - \mu)^2$.

The Standard Normal Distribution is a particular normal distribution: it is the one for which $\mu = 0$ and $\sigma^2 = 1$. It is an especially important tool in the analysis of epidemiological data.
- Tabulations of probabilities for this distribution are available.
- A random variable whose pattern of values is distributed standard normal has the special name z-score or normal deviate.
- By convention, it is usually written as Z rather than X.

$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{z^2}{2}\right\}$$

Introduction to the Z-Score: a tool to compute probabilities of intervals of values for X distributed Normal($\mu$, $\sigma^2$)
- Of interest is a probability calculation for a random variable X that is distributed Normal($\mu$, $\sigma^2$).
- However, tabulated normal probabilities are available only for the normal distribution with $\mu = 0$ and $\sigma^2 = 1$.
- We solve our problem by exploiting an
equivalence argument.
- Standardization expresses the desired calculation for X as an equivalent calculation for Z, where Z is distributed standard normal, Normal(0,1):

$$\Pr[a \le X \le b] = \Pr\left[\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right], \quad \text{where } Z = \frac{X - \mu}{\sigma} \text{ is the z-score.}$$

- Note: the technique of standardization of X involves centering (subtraction of the mean of X, which is $\mu$) followed by rescaling (using the multiplier $1/\sigma$).

Sometimes we might want to know the values of selected percentiles of a Normal($\mu$, $\sigma^2$) distribution. To do this, we work the standardization technique in the other direction. For example, we might want to know the median of a normal distribution of gross income.
- We have only percentile values tabulated for Z distributed Normal(0,1).
- The inverse of standardization relates the percentile for X to that for Z:

$$x_p = \sigma z_p + \mu$$

where $x_p$ and $z_p$ denote the p-th percentiles of X and Z, respectively.

The z-score and its relatives, the t-score, chi-square, and F statistics, are central to the methods of hypothesis testing.

3. The Sample Average is Often Normally Distributed: Introduction to the Central Limit Theorem

Recall that our focus is on the behavior of the average $\bar{X}_n$ of a sample. It is the Central Limit Theorem that gives us what we need.

The Central Limit Theorem
IF
1. we have an independent random sample of n observations $X_1, \ldots, X_n$,
2. the $X_1, \ldots, X_n$ are all from the same distribution, whatever that is, and
3. this distribution has mean $\mu$ and variance $\sigma^2$,
THEN, as $n \to \infty$, the sampling distribution of $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is eventually Normal with mean $\mu$ and variance $\sigma^2/n$.

In words: "In the long run, averages have distributions that are well approximated by the Normal." The sampling distribution of $\bar{X}_n$ upon repeated sampling is eventually Normal($\mu$, $\sigma^2/n$).

Later (Section 7) we'll learn how to compute probabilities of intervals of values for $\bar{X}_n$ distributed Normal($\mu$, $\sigma^2/n$) by using the z-score technique:

$$\Pr[a \le \bar{X}_n \le b] = \Pr\left[\frac{a - \mu}{\sigma/\sqrt{n}} \le Z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right], \quad \text{where } Z = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \text{ is the z-score.}$$

A variety of wordings of the central limit theorem
give a feel for its significance:

1. "... according to a certain theorem in mathematical statistics called the central limit theorem, the probability distribution of the sum of observations from any population corresponds more and more to that of a normal distribution as the number of observations increases; i.e., if the sample size is large enough, the sum of observations from any distribution is approximately normally distributed. Since many of the test statistics and estimating functions which are used in advanced statistical methods can be represented as just such a sum, it follows that their approximate normal distributions can be used to calculate probabilities when nothing more exact is possible."
Matthews DE and Farewell VT. Using and Understanding Medical Statistics, 2nd, revised edition. New York: Karger, 1988, page 93.

2. "With measurement data, many investigations have as their purpose the estimation of averages: the average life of a battery, the average income of plumbers, and so on. Even if the distribution in the original population is far from normal, the distribution of sample averages tends to become normal, under a wide variety of conditions, as the size of the sample increases. This is perhaps the single most important reason for the use of the normal."
Snedecor GW and Cochran WG. Statistical Methods, sixth edition. Ames: The Iowa State University Press, 1967, page 35.

3. "If a random sample of n observations is drawn from some population of any shape, where the mean is a number $\mu$ and the standard deviation is a number $\sigma$, then the theoretical sampling distribution of $\bar{X}_n$, the mean of the random sample, is nearly a normal distribution with a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$ if n, the sample size, is 'large'."
Moses LE. Think and Explain with Statistics. Reading: Addison-Wesley Publishing Company, 1986, page 91.

4. "It should be emphasized that the theorem applies almost regardless of the nature of the parent population, that is, almost
regardless of the distribution from which $X_1, \ldots, X_n$ are a random sample. How large n must be to have a 'good' approximation does depend, however, upon the shape of the parent population."
Anderson TW and Sclove SL. Introductory Statistical Analysis. Boston: Houghton Mifflin Company, 1974, page 295.

4. A Feel for the Normal Distribution

What does the normal distribution look like? It is
1. a smooth curve defined everywhere on the real axis that is
2. bell-shaped, and
3. symmetric about the mean value.
Note: because of symmetry, we know that the mean = median.

[Figure: a general normal distribution, and a normal distribution that has mean and median = 0]

Some Features of the Normal Distribution
1. Look again at the definition of the normal probability density function in Section 2. Notice that it includes only two population parameters, the mean $\mu$ and the variance $\sigma^2$; there are no other population parameters present. This allows us to say that the normal probability density function is completely specified by the mean and variance. This feature is very useful in the calculation of event probabilities, which will be described later.
2. The mean $\mu$ tells you about location.
   - Increase $\mu$: location shifts right.
   - Decrease $\mu$: location shifts left.
   - Shape is unchanged.
3. The variance $\sigma^2$ tells you about the narrowness or flatness of the bell.
   - Increase $\sigma^2$: the bell flattens; extreme values are more likely.
   - Decrease $\sigma^2$: the bell narrows; extreme values are less likely.
   - Location is unchanged.
4. A very useful tool for research data. If you are exploring some data, let's say a sample of data X that is distributed normal with mean $\mu$ and variance $\sigma^2$, then roughly
   i. 68% of the distribution of X lies in an interval of $\pm 1\sigma$ about its mean value $\mu$;
   ii. 95% of the distribution of X lies in an interval of $\pm 1.96\sigma$ about its mean value $\mu$;
   iii. 99% of the distribution of X lies in an interval of $\pm 2.576\sigma$ about its mean value $\mu$.
5. Most often this seat-of-the-pants
rule is applied to the distribution of the sample mean $\bar{X}_n$. Roughly
   i. 68% of the distribution of $\bar{X}_n$ lies in an interval of $\pm 1\sigma/\sqrt{n}$ about its mean value $\mu$;
   ii. 95% of the distribution of $\bar{X}_n$ lies in an interval of $\pm 1.96\sigma/\sqrt{n}$ about its mean value $\mu$;
   iii. 99% of the distribution of $\bar{X}_n$ lies in an interval of $\pm 2.576\sigma/\sqrt{n}$ about its mean value $\mu$.

5. Relevance of the Normal Distribution

What Data Follow the Normal Distribution? There are two kinds of data that follow a normal probability distribution.

First Type: Nature gives us this. Nature includes many continuous phenomena yielding sample data for which the normal probability model is a good description. For example:
- Heights of men
- Weights of women
- Systolic blood pressure of children
- Blood cholesterol in adults aged 20 to 100 years

Second Type: Repeated sampling and the Central Limit Theorem give us this. If we repeat our research study over and over again so as to produce the sampling distribution of the sample mean $\bar{X}_n$, this distribution is well described by a normal distribution model by virtue of the Central Limit Theorem. This second class is particularly useful in research, since often the focus of interest is the behavior (reproducibility and variability) of sample means rather than of individual values. For example:
- Average response among persons randomized to treatment in a clinical trial

6. Calculation of Probabilities for the Normal(0,1)

With respect to studies of any normal distribution, it eventually boils down to knowing how to work with one particular distribution, the Normal(0,1), also called the standard normal or standard Gaussian distribution.

A random variable Z is said to follow the standard normal distribution if it is distributed normal with mean = 0 and variance = 1. Recall again the probability density function for this distribution:

$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{z^2}{2}\right\}$$

Recall also:
- For a continuous random variable, we cannot compute point probabilities such as Probability[Z = z].
- What we
calculate instead are probabilities of intervals of values, such as Probability[$a \le Z \le b$] for some choice of a and b.
- A probability of an interval such as Probability[$Z \le z$] is called a cumulative probability.
- Tables for the Normal(0,1) distribution typically provide values of cumulative probabilities of this form. Sometimes more is provided.

Fortunately, we don't actually have to do these calculations ourselves.
- Such calculations are exercises in calculus and involve the integral of the probability density function.
- Values of Probability[$a \le Z \le b$] can be found by utilizing statistical tables for the Normal(0,1).
- They can also be obtained from the computer (either software or the web).

Some Useful Tips for Using the Statistical Tables for the Normal(0,1)

Tip 1. The symmetry of the Normal(0,1) distribution about its mean value 0 has useful implications for the calculation of probabilities. Recalling the notation "Z" to denote the random variable and "z" to denote a possible actual value, symmetry of the standard normal distribution about 0 means

$$\Pr[Z \le -z] = \Pr[Z \ge +z]$$

Tip 2. Because the integral of a continuous probability distribution (you can think of this as the sum of the probabilities associated with all possible outcomes when the distribution is discrete) must total exactly one,

$$\Pr[Z < z] + \Pr[Z \ge z] = 1, \quad \text{so} \quad \Pr[Z < z] = 1 - \Pr[Z \ge z]$$

Tip 3. The facts of symmetry of the Normal(0,1) distribution about 0 and of its integral equaling one give us

$$\Pr[Z \le -z] = \Pr[Z \ge +z] = 1 - \Pr[Z < +z]$$

Tip 4. Some tables for the normal distribution provide values ONLY of the type $\Pr[Z < z]$.

| If you want to calculate | Then use the table this way |
|---|---|
| $\Pr[Z > a]$ | $\Pr[Z > a] = 1 - \Pr[Z < a]$ |
| $\Pr[Z < -a]$ | $\Pr[Z < -a] = 1 - \Pr[Z < a]$ |
| $\Pr[a < Z < b]$ | $\Pr[a < Z < b] = \Pr[Z < b] - \Pr[Z < a]$ |

Tip 5. Inequalities: $\Pr[Z \le z]$ versus $\Pr[Z < z]$, or $\Pr[Z \ge z]$ versus $\Pr[Z > z]$, can seem confusing at first. The key is to know that because Z is continuous when it is distributed Normal(0,1), point probabilities are meaningless. That means

$$\Pr[Z \ge z] = \Pr[Z > z] \quad \text{and} \quad \Pr[Z \le z] = \Pr[Z < z]$$

Beware!
Be careful not to make this assumption when you are working with discrete variables!

Notation. With apology, it is nevertheless useful to know the notation involved in the calculation of probabilities. Here it is for the setting of probability calculations for the standard normal, Normal(0,1):
- $F_Z(z)$ is called the cumulative distribution function. It is the integral of the probability density function $f_Z(z)$ that was introduced in Section 2.
- Often the letter Z is used to refer to a random variable distributed Normal(0,1).
- Prob[Normal(0,1) variable $\le z$] = Prob[$Z \le z$] = $F_Z(z)$.

Example. If Z is distributed standard normal, what is the probability that Z is at most 1.82?
Solution: .9656

(Suggestion to the 2008 class: have a look at Rosner, 6th Edition, Table 3. Before following the examples here, look at the pictures at the top of page 825 so that you know what probabilities (areas under the curve) are being reported in each column. Carol)

Step 1. Translate the "words" into an event. "Z is at most 1.82" is equivalent to the event ($Z \le 1.82$). The required probability is therefore $\Pr[Z \le 1.82]$.

Step 2. Use the normal probability table to solve for the probability of interest. If we use the table in Rosner, 6th edition, the required table is Table 3, which begins on page 825. Our value z = 1.82 can be found on page 827. Looking at the entry for column A, read $\Pr[Z \le 1.82]$ = .9656.

Example. What is the probability that a standard normal random variable exceeds the value 2.38?
Solution: .0087

Step 1. Translate the "words" into an event. "Z exceeds 2.38" is equivalent to the event (Z > 2.38). The required probability is therefore $\Pr[Z > 2.38]$.

Step 2. Using the table in Rosner, 6th edition, on page 827, the correct column is B: $\Pr[Z > 2.38]$ = .0087.

Alternative Solution Using a URL on the Internet
Visit http://www-stat.stanford.edu/~naras/jsm/FindProbability.html
Type in 2.38 and click on RIGHT. The answer .00865 appears at the bottom.
The shaded area, which is the probability to the right of 2.38, is 0.008656319025.

Example. How likely is it that a standard normal random variable will assume a value in the interval (-2.58, +0.58)?
Solution: .714

Step 1. Translate the "words" into an event. Z assuming a value in the interval (-2.58, +0.58) is equivalent to $\Pr[-2.58 < Z < +0.58]$.

Step 2. If you are using the table in Rosner, 6th edition, re-express the event of interest in a form that uses the event type (Z < z) and use column A, or in a form that uses the event type (Z > z) and use column B. It's the same either way.

$$\Pr[-2.58 < Z < +0.58] = \Pr[Z > -2.58] - \Pr[Z > +0.58]$$

Notice now that $\Pr[Z > -2.58]$ is not available in the table. We can get around this by using Tips 2 and 1, which give us

$$\Pr[Z > -2.58] = 1 - \Pr[Z < -2.58] = 1 - \Pr[Z > +2.58]$$

Thus

$$\Pr[-2.58 < Z < +0.58] = \Pr[Z > -2.58] - \Pr[Z > +0.58] = \{1 - \Pr[Z > +2.58]\} - \Pr[Z > +0.58]$$

Step 3. We now have what we need to use Table 3 of Rosner, 6th edition.
- Locate 2.58 on page 828. Read from column B that $\Pr[Z > 2.58]$ = .0049.
- Locate 0.58 on page 825. Read that $\Pr[Z > 0.58]$ = .2810.

Step 4. Put this information back into the translation obtained in Step 2:

$$\Pr[-2.58 < Z < +0.58] = \{1 - \Pr[Z > +2.58]\} - \Pr[Z > +0.58] = \{1 - .0049\} - .2810 = .7141$$

7. From Normal($\mu$, $\sigma^2$) to the Standardized Normal(0,1): The Z-Score

Two z-score transformations have been seen already. From these, a useful generalization also follows:

| If | then the z-score | is distributed |
|---|---|---|
| 1. X is distributed Normal($\mu$, $\sigma^2$) | $Z = \frac{X - \mu}{\sigma}$ | Normal(0,1) |
| 2. $\bar{X}_n$ is distributed Normal($\mu$, $\sigma^2/n$) | $Z = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ | Normal(0,1) |
| 3. a generic random variable Y is distributed Normal with $\mu = E[Y]$ and $\sigma^2 = \text{Var}(Y)$ | $Z = \frac{Y - E[Y]}{\sqrt{\text{Var}(Y)}}$ | Normal(0,1) |

Note: to appreciate the third row, notice that in row 1 the choice is Y = X, and in row 2 the choice is Y = $\bar{X}_n$.

It is the z-score transformation that allows us to obtain interval probabilities for ANY normal distribution.

Example. The Massachusetts State Lottery averages
on a weekly basis a profit of 10.0 million dollars. The variability, as measured by the variance statistic, is 6.25 (million dollars)². If it is known that the weekly profits are distributed normal, what are the chances that in a given week the profits will be between 8 and 10.5 million dollars?
Solution: .367

Step 1. Translate the "words" into an event. Since we're no longer dealing with a standard normal random variable, it is convenient to use X to denote the random variable defined as the profit earned in a given week. X assuming a value in the interval (8, 10.5) is equivalent to $\Pr[8 < X < 10.5]$.

Step 2. Re-express the event of interest in a form that uses the event type (X > x):

$$\Pr[8 < X < 10.5] = \Pr[X > 8] - \Pr[X > 10.5]$$

Step 3. Before using the transformation formula, solve for $\sqrt{\text{variance}}$, which is the standard deviation of the normal distribution of X. To see that this is what is needed, look at the third row of the chart at the start of this section.
If $\sigma^2 = 6.25$, then $\sigma = \sqrt{6.25} = 2.5$.

Step 4. Apply the z-score transformation formula to the random variable X distributed Normal($\mu = 10$, $\sigma^2 = 6.25$):

$$\Pr[X > 8] - \Pr[X > 10.5] = \Pr\left[\frac{X - \mu}{\sigma} > \frac{8 - 10}{2.5}\right] - \Pr\left[\frac{X - \mu}{\sigma} > \frac{10.5 - 10}{2.5}\right] = \Pr[Z > -0.80] - \Pr[Z > +0.20]$$

for the random variable Z distributed Normal(0,1). This equals $\{1 - \Pr[Z > +0.80]\} - \Pr[Z > +0.20]$.

Step 5. Use the normal probability table to solve for the probabilities of interest. Using the table in Rosner, 6th edition, pages 825-826 reveals
- $\Pr[Z > 0.80]$ = .2119
- $\Pr[Z > 0.20]$ = .4207

Step 6. Put this information back into the translation obtained in Step 2:

$$\Pr[8 < X < 10.5] = \{1 - \Pr[Z > 0.80]\} - \Pr[Z > 0.20] = \{1 - .2119\} - .4207 = .3674$$

8. From Normal(0,1) to Normal($\mu$, $\sigma^2$)

Sometimes we will want to work the z-score transformation backwards.
- We might want to know the values of selected percentiles of a normal distribution (e.g., the median cholesterol value).
- Knowledge of how to work this transformation backwards will be useful in confidence interval construction.

1. If Z is distributed Normal(0,1), then $X = \sigma Z + \mu$ is
distributed Normal($\mu$, $\sigma^2$).
2. If Z is distributed Normal(0,1), then $\bar{X}_n = \frac{\sigma}{\sqrt{n}} Z + \mu$ is distributed Normal($\mu$, $\sigma^2/n$).
3. If Z is distributed Normal(0,1), then a generic $Y = \sqrt{\text{Var}(Y)}\,Z + E[Y]$ is distributed Normal($\mu_Y = E[Y]$, $\sigma_Y^2 = \text{Var}(Y)$).

Example. Suppose it is known that survival time following a diagnosis of mesothelioma is normally distributed with $\mu = 2.3$ years and variance $\sigma^2 = 7.2$ years². What is the 75th percentile, that is, the elapsed time during which 75% of such cases are expected to die?
Solution: 4.11 years

Step 1. The first step is to recognize that a percentile references a left-tail probability. Using the table in Rosner, 6th edition, column A, read that
- Prob[Z < 0.67] = .7486
- Prob[Z < 0.68] = .7517
Thus, approximately (crude interpolation), Prob[Z < 0.675] = .75, so the 75th percentile of a Normal(0,1) is approximately $z_{.75} = 0.675$.

Step 2. Let X represent the random variable for the normal distribution of survival times. This normal distribution has mean $\mu_X = 2.3$ and variance $\sigma_X^2 = 7.2$.

Step 3. Work the z-score transformation formula backwards:

$$x_{.75} = \sigma_X z_{.75} + \mu_X = \sqrt{7.2}\,(0.675) + 2.3 = 4.11 \text{ years}$$

Thus, it is expected that 75% of persons newly diagnosed with mesothelioma will have died within 4.11 years. This is the same as saying that there is an expected 25% chance of surviving beyond 4.11 years.
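The density formula in Section 2 can be checked numerically. The sketch below is a minimal illustration and is not part of the original notes. It assumes Python 3.8 or later, which provides `statistics.NormalDist`, and the helper names `normal_pdf` and `riemann_total` are hypothetical. The sketch evaluates $f_X(x)$ exactly as written, compares the result with the standard library, and approximates the "unit integral" from the table in Section 1 with a midpoint Riemann sum.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """Normal(mu, sigma2) density, written term by term as in Section 2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def riemann_total(mu: float, sigma2: float, width: float = 0.001) -> float:
    """Approximate the integral of the density over mu +/- 10 standard deviations."""
    sigma = math.sqrt(sigma2)
    lo = mu - 10 * sigma
    steps = round(20 * sigma / width)
    # Midpoint rule: add f(x) * width over many small sub-intervals.
    return sum(normal_pdf(lo + (i + 0.5) * width, mu, sigma2) * width for i in range(steps))


if __name__ == "__main__":
    mu, sigma2 = 10.0, 6.25
    print(normal_pdf(mu, mu, sigma2))                 # peak height at x = mu, about 0.1596
    print(NormalDist(mu, math.sqrt(sigma2)).pdf(mu))  # same value from the standard library
    print(riemann_total(mu, sigma2))                  # very close to 1
```

The total comes out very close to 1 because the integral runs from 10 standard deviations below the mean to 10 above, and essentially all of the area lies in that range.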
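The Central Limit Theorem of Section 3 can be seen by simulation. This is a minimal sketch under stated assumptions, none of which come from the notes. The parent population is taken to be exponential with mean 1 and variance 1, a deliberately skewed choice. The sample size n = 30, the 2,000 repetitions, and the seed are arbitrary. `simulate_means` is a hypothetical helper. If the theorem holds, the average of the simulated sample means should sit near $\mu = 1$ and their variance near $\sigma^2/n = 1/30$. Roughly 95% of them should also fall within $\mu \pm 1.96\,\sigma/\sqrt{n}$, as Section 4 says.

```python
from __future__ import annotations

import math
import random
import statistics


def simulate_means(n: int, reps: int, seed: int = 540) -> list[float]:
    """Draw `reps` samples of size n from Exponential(rate=1); return each sample mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


if __name__ == "__main__":
    n, reps = 30, 2000
    mu, sigma = 1.0, 1.0  # Exponential(rate=1) has mean 1 and variance 1
    means = simulate_means(n, reps)
    print(statistics.fmean(means))     # near mu = 1
    print(statistics.variance(means))  # near sigma^2 / n = 0.0333
    half_width = 1.96 * sigma / math.sqrt(n)
    inside = sum(abs(m - mu) <= half_width for m in means) / reps
    print(inside)                      # near 0.95
```

Increasing n should make a histogram of `means` look more bell-shaped. This matches the Anderson and Sclove remark that the quality of the approximation depends on the shape of the parent population.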
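The "roughly 68%, 95%, 99%" figures in Section 4 are areas under the standard normal curve, so they can be reproduced from the cumulative distribution function. This is a short sketch, assuming Python 3.8 or later for `statistics.NormalDist`. `central_area` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def central_area(k: float) -> float:
    """Pr[-k < Z < k]: the share of any normal distribution within mu +/- k*sigma."""
    return Z.cdf(k) - Z.cdf(-k)


if __name__ == "__main__":
    for k in (1.0, 1.96, 2.576):
        # prints 0.6827, 0.9500 and 0.9900 for the three multipliers
        print(f"within +/- {k} sigma: {central_area(k):.4f}")
```

The same multipliers apply to $\bar{X}_n$ with $\sigma/\sqrt{n}$ in place of $\sigma$, so these three numbers also describe the sample-mean version of the rule.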
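The three standard normal examples in Section 6 can also be done without a printed table. This sketch assumes Python 3.8 or later. In `statistics.NormalDist`, the `cdf` method returns the cumulative probability $F_Z(z) = \Pr[Z \le z]$, which plays the role of column A. The function names are hypothetical. The upper tail uses the complement rule of Tip 2, and the interval uses the difference rule of Tip 4. The last line checks the symmetry of Tip 1. To four decimals, the results should agree with the table answers .9656, .0087 and .7141.

```python
from statistics import NormalDist

Z = NormalDist()  # Normal(0, 1)


def prob_below(z: float) -> float:
    """Pr[Z <= z], the quantity in column A of the table."""
    return Z.cdf(z)


def prob_above(z: float) -> float:
    """Pr[Z > z] = 1 - Pr[Z <= z] (Tip 2), the quantity in column B."""
    return 1.0 - Z.cdf(z)


def prob_between(a: float, b: float) -> float:
    """Pr[a < Z < b] = Pr[Z < b] - Pr[Z < a] (Tip 4)."""
    return Z.cdf(b) - Z.cdf(a)


if __name__ == "__main__":
    print(round(prob_below(1.82), 4))           # Z is at most 1.82
    print(round(prob_above(2.38), 4))           # Z exceeds 2.38
    print(round(prob_between(-2.58, 0.58), 4))  # Z in (-2.58, +0.58)
    # Tip 1 (symmetry): Pr[Z <= -z] equals Pr[Z >= z]
    print(abs(prob_below(-1.5) - prob_above(1.5)) < 1e-12)
```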
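In code, the lottery example of Section 7 follows the same recipe. Each endpoint is standardized with $z = (x - \mu)/\sigma$, and then cumulative probabilities are subtracted. This is a sketch assuming Python 3.8 or later. `standardize` and `normal_interval_prob` are hypothetical helper names. The final line asks `NormalDist` for the Normal(10, 6.25) probability directly, without standardizing. Getting the same answer both ways illustrates the equivalence argument behind standardization.

```python
import math
from statistics import NormalDist


def standardize(x: float, mu: float, sigma2: float) -> float:
    """z-score: center by subtracting mu, then rescale by 1/sigma."""
    return (x - mu) / math.sqrt(sigma2)


def normal_interval_prob(a: float, b: float, mu: float, sigma2: float) -> float:
    """Pr[a < X < b] for X ~ Normal(mu, sigma2), computed through the z-score."""
    z = NormalDist()  # Normal(0, 1)
    return z.cdf(standardize(b, mu, sigma2)) - z.cdf(standardize(a, mu, sigma2))


if __name__ == "__main__":
    mu, sigma2 = 10.0, 6.25  # weekly profit: mean 10.0, variance 6.25 (million dollars squared)
    print(standardize(8, mu, sigma2), standardize(10.5, mu, sigma2))  # -0.8 0.2
    print(round(normal_interval_prob(8, 10.5, mu, sigma2), 4))
    direct = NormalDist(mu, math.sqrt(sigma2))
    print(round(direct.cdf(10.5) - direct.cdf(8), 4))  # same answer without standardizing
```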
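Working the z-score backwards (Section 8) means finding $z_p$ with the inverse cumulative distribution function and then computing $x_p = \sigma z_p + \mu$. This is a sketch assuming Python 3.8 or later. There, `NormalDist.inv_cdf` returns the percentile directly, so no table interpolation is needed. `normal_percentile` is a hypothetical helper name. For the mesothelioma example, the exact value is $z_{.75} \approx 0.6745$ rather than the interpolated 0.675. The answer still rounds to 4.11 years.

```python
import math
from statistics import NormalDist


def normal_percentile(p: float, mu: float, sigma2: float) -> float:
    """p-th percentile of Normal(mu, sigma2) via x_p = sigma * z_p + mu."""
    z_p = NormalDist().inv_cdf(p)  # percentile of Normal(0, 1)
    return math.sqrt(sigma2) * z_p + mu


if __name__ == "__main__":
    print(round(NormalDist().inv_cdf(0.75), 4))        # z_.75, about 0.6745
    x75 = normal_percentile(0.75, mu=2.3, sigma2=7.2)  # survival time in years
    print(round(x75, 2))
    # Cross-check: NormalDist can also return the percentile of X directly.
    print(round(NormalDist(2.3, math.sqrt(7.2)).inv_cdf(0.75), 2))
```

Setting p = 0.5 returns $\mu$ itself. This is consistent with the Section 4 remark that the mean equals the median for a normal distribution.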