Note for PUBHLTH 640 at UMass
This 36 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Massachusetts taught by a professor in Fall. Since its upload, it has received 17 views.
Date Created: 02/06/15
PubHlth 640  3. Discrete Distributions

Unit 3. Discrete Distributions

Topics
1. Proportions and Rates in Epidemiological Research
2. Review: Bernoulli Distribution
3. Review: Binomial Distribution
4. Poisson Distribution
5. Hypergeometric Distribution
6. Fisher's Exact Test
7. Discrete Distributions: Themes

1. Proportions and Rates in Epidemiological Research

The concepts of "proportion of" and "rate of" describe different aspects of disease occurrence. This distinction is important to epidemiological research.

A proportion is a relative frequency. It is dimensionless.

Proportion = (# events that actually occurred) / (# events that could have occurred)

Valid range: 0 to 1

E.g., toss a coin 10 times. If we observe 2 heads:
Proportion heads = 2/10 = 20%
o Events that could have occurred = 10 tosses
o Events that occurred = 2 heads

Prevalence measures are examples of proportions.

A rate is a count of event occurrence per unit of time. As such, it is measured relative to an interval of time. It is not dimensionless.

Rate = (# events that actually occurred) / (# time periods experienced)

Valid range: 0 to infinity

E.g., 100 persons are known to have smoked, collectively, for 1000 pack years. If we observe 3 occurrences of lung cancer:
Rate of lung cancer = 3/1000 pack years
o Time periods experienced = 1000 pack years
o Events that occurred = 3

Incidence densities are examples of rates.

Some Commonly Used Proportions and Rates

Proportions describe either existing disease or new disease within a time frame. Rates describe the "force" or "flow" of occurrence of new disease with time.

Prevalence = (# persons with disease at a point in time) / (# persons in the population at a point in time)

Type of measure: Proportion
Denominator (# events that could have occurred): # persons in the population at a point in time
Numerator (# events that actually occurred): # persons having the disease at a point in time
Valid range: 0 to 1
Interpretation: The proportion of the population with
disease at a point in time.
Other names: Point prevalence

Example: In 1988, the New York State Breast and Cervical Cancer Screening Program registry included 16,529 women with a baseline mammogram. Upon review of their medical records, 528 were found to have a history of breast cancer. The prevalence of breast cancer in this 1988 selected cohort was

$$P = \frac{528}{16529} = 0.0319 = 3.19\%$$

These 528 women were excluded from the analyses to determine the factors associated with repeat screening mammogram.

Cumulative incidence = (# disease onsets during interval of time) / (# persons at risk in population at start of interval)

Type of measure: Proportion
Denominator (# events that could have happened): # persons in the population at the start of the time interval who could possibly develop disease. Thus, all are disease free at the start of the interval.
Numerator (# events that actually occurred): # persons developing the disease during the time interval
Valid range: 0 to 1
Interpretation: The proportion of healthy persons who go on to develop disease over a specified time period.
Note: We assume that every person in the population at the start was followed for the entire interval of time.

Example: In this example, the event of interest is not disease occurrence. It is completion of a repeat screening mammogram. There were 9,485 women in the New York State Breast and Cervical Cancer Screening Program registry as of 1988 with a negative mammogram who did not have a history of breast cancer and who provided complete data. Of these, 2,604 obtained a repeat screening mammogram during the 6 year period 1988-1993. The 6-year cumulative incidence of repeat screening mammogram is therefore

$$\frac{2604}{9485} = 0.2745 \approx 27\%$$

Further analysis is focused on the identification of its correlates.

Incidence density = (# disease onsets during interval of time) / (sum of individual lengths of time actually at risk)

Type of measure: Rate
Denominator (# time periods experienced event free): Sum of individual lengths of time during which there was opportunity for event occurrence during a specified interval of time. This sum is called "person time" and is $\sum_{i=1}^{N} (\text{time for } i\text{th person free of event})$. It is also called "person years" or "risk time".
Numerator (# new event occurrences): # disease onsets during a specified interval of time
Valid range: 0 to infinity
Interpretation: The force of disease occurrence per unit of time
Other names: Incidence rate
Note: We must assume that the risk of event remains constant over time. When it doesn't, stratified approaches are required.

Be careful in the reporting of an incidence density. Don't forget the scale of measurement of time.

I = 7 per person year
  = 7/52 = 0.13 per person week
  = 7/365.25 = 0.019 per person day

What to Use

A prevalence estimate is useful when interest is in
o who has disease now versus who does not (a one time camera picture)
o planning services, e.g., delivery of health care

The concept of prevalence is not meaningfully applicable to etiologic studies.
o Susceptibility and duration of disease contribute to prevalence.
o Thus, prevalence = function(susceptibility, incidence, survival).

Etiologic studies of disease occurrence often use the cumulative incidence measure of frequency.
o Recall that we must assume complete follow-up of the entire study cohort.

The cumulative incidence measure of disease frequency is not helpful to us if persons migrate in and out of the study population.
o Individuals no longer have the same opportunity for event recognition.

Etiologic studies of disease occurrence in dynamic populations will then use the incidence density measure of frequency.
o Be careful here, too. Does risk of event change with time? With age?
o If so, calculate person time separately in each of several blocks of time. This is a stratified analysis approach.

2. Review: Bernoulli Distribution

In order to analyze variations in proportions and rates, we'll need a few probability models.
There are 4, and they are interrelated. Two of them were introduced in PubHlth 540, Introductory Biostatistics: Bernoulli and Binomial. Two additional distributions that are often used in analyses of discrete data are the Poisson and hypergeometric distributions.

Recall: A Bernoulli Trial is the Simplest Binomial Random Variable

A simple example is the coin toss. A "50-50 chance of heads" can be recast as a random variable. Let

Z = random variable representing outcome of one toss, with
Z = 1 if "heads"
Z = 0 if "tails"

$\pi$ = Probability [coin lands "heads"]. Thus, $\pi = \Pr[Z = 1]$.

We have what we need to define a probability distribution.

Enumeration of all possible outcomes: 1, 0
o outcomes are mutually exclusive
o outcomes exhaust all possibilities

Associated probabilities of each:
Outcome   Pr[outcome]
1         $\Pr[Z=1] = \pi$
0         $\Pr[Z=0] = (1-\pi)$
Total     1
o each probability is between 0 and 1
o sum of probabilities totals 1

In epidemiology, the Bernoulli might be a model for the description of ONE individual (N=1). This person is in one of two states. He or she is either in a state of
1) "event", with probability $\pi$, or
2) "non event", with probability $(1-\pi)$.

The description of the likelihood of being either in the "event" state or the "non-event" state is given by the Bernoulli distribution.

Bernoulli Distribution

Suppose Z can take on only two values, 1 or 0, and suppose
Probability [Z = 1] = $\pi$
Probability [Z = 0] = $(1-\pi)$

This gives us the following expression for the likelihood of Z = z:

$$\Pr[Z = z] = \pi^{z}(1-\pi)^{1-z} \quad \text{for } z = 0 \text{ or } 1$$

Expected value of Z is $E[Z] = \pi$
Variance of Z is $\mathrm{Var}[Z] = \pi(1-\pi)$

Example: Z is the result of tossing a coin once. If it lands "heads" with probability = .5, then $\pi$ = .5.

Later we'll see that individual Bernoulli distributions are the basis of describing patterns of disease occurrence in a logistic regression analysis.

Mean ($\mu$) and Variance ($\sigma^2$) of a Bernoulli Distribution

Mean of Z = $\mu = \pi$

The mean of Z is represented as E[Z]. $E[Z] = \pi$ because the
following is true:

$$E[Z] = \sum_{\text{all possible } z} z \Pr[Z=z] = 0 \cdot \Pr[Z=0] + 1 \cdot \Pr[Z=1] = 0(1-\pi) + 1(\pi) = \pi$$

Variance of Z = $\sigma^2 = \pi(1-\pi)$

The variance of Z is $\mathrm{Var}[Z] = E[(Z - E[Z])^2]$. $\mathrm{Var}[Z] = \pi(1-\pi)$ because the following is true:

$$\mathrm{Var}[Z] = E[(Z-\pi)^2] = \sum_{\text{all possible } z} (z-\pi)^2 \Pr[Z=z]$$
$$= (0-\pi)^2 \Pr[Z=0] + (1-\pi)^2 \Pr[Z=1]$$
$$= \pi^2(1-\pi) + (1-\pi)^2\pi$$
$$= \pi(1-\pi)[\pi + (1-\pi)]$$
$$= \pi(1-\pi)$$

3. Review: Binomial Distribution

Two ways to appreciate the Binomial distribution are the following:
(1) A Binomial outcome = sum of several Bernoulli trials
(2) General formula

1. The Binomial as the Sum of Several Bernoulli Trials

Consider appreciating one coin toss as an example of one Bernoulli trial.
o Z is distributed Bernoulli($\pi$), with Z = 1 when the event occurs and 0 when it does not.

Toss the coin several times, say N times.
o $Z_1, Z_2, \ldots, Z_N$ are each distributed Bernoulli($\pi$).

If we add up the Z's, we're actually adding up 1's and 0's. The total number of 1's is the number of events of success in N trials. Call this number of events of success a new random variable X.

$$\sum_{i=1}^{N} Z_i = X = \text{\# events in } N \text{ trials}$$

X, the outcome of a Binomial, can be thought of as the net number of successes in a set of independent Bernoulli trials. A simple example is the outcome of several coin tosses (e.g., "how many heads did I get?"). The word choice "net" is deliberate here, to remind ourselves that we're not interested in keeping track of the particular trials that yielded events of success, only the net number of trials that yielded event of success.

E.g.:
o What is the probability that 2 of 6 graduate students are female?
o What is the probability that, of 100 infected persons, 4 will die within a year?

Steps in calculating the probability of $\sum_{i=1}^{N} Z_i = x$

Step 1: Pick just one arrangement of x events in N trials and calculate its probability. The easiest is the arrangement of x events followed by (N-x) non events.

1 1 1 ... 1   (x events)
0 0 0 ... 0   (N-x non events)

$$\Pr[\text{arrangement}] = \Pr[Z_1=1, Z_2=1, \ldots, Z_x=1, Z_{x+1}=0, Z_{x+2}=0, \ldots, Z_N=0]$$
$$= \pi\,\pi \cdots \pi\,(1-\pi)(1-\pi)\cdots(1-\pi) = \pi^{x}(1-\pi)^{N-x}$$

Step 2: Determine the number of arrangements of 1's
and 0's which all yield the same net numbers: x events and (N-x) non events.

$$\text{Number of qualifying arrangements} = \binom{N}{x} = \frac{N!}{x!\,(N-x)!}$$

Step 3: Put it together. Appreciate that the probability of a net number of events is

Probability [N trials yields x events] = (# qualifying arrangements)(Pr[one arrangement])

$$\Pr[X = x] = \Pr\left[\sum_{i=1}^{N} Z_i = x\right] = \binom{N}{x}\pi^{x}(1-\pi)^{N-x}$$

2. The Binomial in General Form

You can alternatively appreciate a binomial random variable outcome as a description of a sample of size N. In this sample, some experience the event while the remainder do not.

Binomial Distribution

Among N trials, where
o The N trials are independent
o Each trial has two possible outcomes: 1 or 0
o Pr[outcome = 1] = $\pi$ for each trial, and therefore Pr[outcome = 0] = $(1-\pi)$
o X = # events of outcome

$$\Pr[X = x] = \binom{N}{x}\pi^{x}(1-\pi)^{N-x} \quad \text{for } x = 0, \ldots, N$$

Expected value is $E\left[X = \sum_{i=1}^{N} Z_i\right] = N\pi$
Variance is $\mathrm{Var}\left[X = \sum_{i=1}^{N} Z_i\right] = N\pi(1-\pi)$

$$\binom{N}{x} = \frac{N!}{x!\,(N-x)!} = \text{\# ways to choose } x \text{ from } N$$

where $N! = N(N-1)(N-2)(N-3)\cdots(4)(3)(2)(1)$ and is called the factorial.

Your Turn

A roulette wheel lands on each of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 with probability = .10. Write down the expression for the calculation of the following.

#1. The probability of "5 or 6" exactly 3 times in 20 spins.
#2. The probability of "digit greater than 6" at most 3 times in 20 spins.

#1. Solution:
"Event" is outcome of either 5 or 6.
Pr[event] = $\pi$ = .20
N = 20
X is distributed Binomial(N=20, $\pi$=.20)

$$\Pr[X=3] = \binom{20}{3}[.20]^{3}[1-.20]^{20-3} = \binom{20}{3}[.20]^{3}[.80]^{17} = .2054$$

#2. Solution:
"Event" is outcome of either 7 or 8 or 9.
Pr[event] = $\pi$ = .30
N = 20
X is distributed Binomial(N=20, $\pi$=.30)

Translation: "At most 3 times" is the same as saying "3 times or 2 times or 1 time or 0 times", which is the same as saying "less than or equal to 3 times".

$$\Pr[X \le 3] = \Pr[X=0] + \Pr[X=1] + \Pr[X=2] + \Pr[X=3] = \sum_{x=0}^{3}\binom{20}{x}[.30]^{x}[.70]^{20-x}$$
$$= \binom{20}{0}[.30]^{0}[.70]^{20} + \binom{20}{1}[.30]^{1}[.70]^{19} + \binom{20}{2}[.30]^{2}[.70]^{18} + \binom{20}{3}[.30]^{3}[.70]^{17} = .10709$$

Review of the Normal Approximation for the Calculation of Binomial Probabilities

Calculations of exact
binomial probabilities become quite tedious as the number of trials and number of events gets large. Fortunately, the central limit theorem allows us to replace these exact calculations with very good approximate calculations. The approximate calculations are actually normal probability calculations.

The following is an example where the calculation of the required binomial probability would be too much to do.

Example: Calculate the chances of between 5 and 28 events (inclusive) in 180 trials with probability of event = .041.

Idea of Solution: Translate the required exact calculation into a very good approximate one using the z-score.

X distributed Binomial(N=180, $\pi$=.041) says that

$$\mu_{BINOMIAL} = N\pi = (180)(.041) = 7.38$$
$$\sigma^2_{BINOMIAL} = N\pi(1-\pi) = (180)(.041)(.959) = 7.08$$
$$\sigma_{BINOMIAL} = \sqrt{\sigma^2_{BINOMIAL}} = 2.66$$

The approximate calculation using the z-score uses $\mu = \mu_{BINOMIAL}$ and $\sigma = \sigma_{BINOMIAL}$.

$$\Pr[5 \le X \le 28] \approx \Pr\left[\frac{5-\mu}{\sigma} \le \text{Normal}(0,1) \le \frac{28-\mu}{\sigma}\right]$$
$$= \Pr\left[\frac{5-7.38}{2.66} \le \text{Normal}(0,1) \le \frac{28-7.38}{2.66}\right]$$
$$= \Pr[-.895 \le \text{Normal}(0,1) \le 7.752]$$
$$\approx \Pr[-.895 \le \text{Normal}(0,1)] \quad \text{because 7.752 is in the extreme right tail}$$
$$= \Pr[\text{Normal}(0,1) \le +.895] \quad \text{because of symmetry of the tails of the normal}$$
$$= .8146$$

Review of the Normal Approximation for the Calculation of Binomial Probabilities (continued)

More generally, we can use a z-score and the normal distribution for the following reasons.
o Binomial probabilities are likelihood calculations for a discrete random variable. Normal distribution probabilities are likelihood calculations for a continuous random variable.
o When substituting for the exact probabilities, we use the Normal distribution that has mean and variance parameter values equal to those of our Binomial distribution.

$$\mu_{normal} = \mu_{binomial} = N\pi$$
$$\sigma^2_{normal} = \sigma^2_{binomial} = N\pi(1-\pi)$$

Desired Binomial Probability Calculation    Normal Approximation w/ Correction
Pr[X = k]                                   Pr[(k - 1/2) < X < (k + 1/2)]
Pr[X >= k]                                  Pr[X > (k - 1/2)]
Pr[X <= k]                                  Pr[X < (k + 1/2)]

4. Poisson Distribution

So far, we have considered
o The description of event occurrence for 1 person (Bernoulli)
o The description of event occurrence
in a sample of N persons (Binomial)

Instead of thinking in terms of persons, think instead about PERSON TIME. A familiar example of the idea of person time is "pack years smoking". E.g., how shall we describe a small number of cancer deaths relative to a large accumulation of person time, such as 3 cancer deaths in 1000 pack years of smoking?

Setting:
o It is no longer the analysis of events among N persons as a proportion.
o Instead, it is an analysis of events over person time as an incidence rate.

The Poisson Distribution can be appreciated as an extension of the Binomial Distribution.
o The concept of N persons -> a large accumulation of person time.
o The likelihood of an event experienced by 1 person -> the likelihood of an event in 1 unit of person time. This will be quite small!

The extension:
o We begin by constructing a binomial likelihood situation. Let
  T = total accumulation of person time (e.g., 1000 pack years)
  n = number of sub-intervals of T (e.g., 1000)
  T/n = length of 1 sub-interval of T (e.g., 1 pack year)
  $\lambda$ = event rate per unit length of person time
o What is our Binomial distribution probability parameter $\pi$?
  $\pi = \lambda(T/n)$, because it is (rate)(length of 1 sub-interval).
o What is our Binomial distribution number of trials?
  n = number of sub-intervals of T.

We'll need 3 assumptions:
1) The rate of events in each sub-interval is less than 1. Rate per sub-interval = Pr[1 event per sub-interval], with $0 < \pi = \lambda(T/n) < 1$.
2) The chances of 2 or more events in a sub-interval is zero.
3) The sub-intervals are mutually independent.

Now we can describe event occurrence over the entire interval of length T with the Binomial. Let X be the count of number of events.

$$\Pr[X = x] = \binom{n}{x}\pi^{x}(1-\pi)^{n-x} \quad \text{for } x = 0, \ldots, n$$
$$= \binom{n}{x}\left[\frac{\lambda T}{n}\right]^{x}\left[1-\frac{\lambda T}{n}\right]^{n-x} \quad \text{because } \pi = \lambda(T/n)$$

Some algebra (if you care to follow along) will get us to the Poisson distribution probability formula. The algebra involves two things:
o Letting n -> infinity in the binomial distribution probability; and
o Recognizing that the
expected number of events over the entire interval of length T is $\lambda T$, because $\lambda$ is the per unit sub-interval rate and T is the number of units (analogy: for rate of heads = .50 and number of coin tosses = 20, the expected number of heads is [.50][20] = 10). This allows us eventually to make the substitution $\lambda T = \mu$.

$$\Pr[X = x] = \binom{n}{x}\left[\frac{\lambda T}{n}\right]^{x}\left[1-\frac{\lambda T}{n}\right]^{n-x} = \binom{n}{x}\left[\frac{\lambda T}{n}\right]^{x}\left[1-\frac{\lambda T}{n}\right]^{n}\left[1-\frac{\lambda T}{n}\right]^{-x}$$

Work with each term on the right hand side one at a time.

1st term:

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!} = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}$$

As n -> infinity, the product of terms in the numerator -> $(n)(n)(n)\cdots(n) = n^{x}$. Thus, as n -> infinity,

$$\binom{n}{x} \to \frac{n^{x}}{x!}$$

2nd term:

$$\left[\frac{\lambda T}{n}\right]^{x} = \left[\frac{\mu}{n}\right]^{x} = \left[\frac{1}{n}\right]^{x}\mu^{x}$$

3rd term: What happens next is a bit of calculus. As n -> infinity,

$$\left[1-\frac{\lambda T}{n}\right]^{n} = \left[1-\frac{\mu}{n}\right]^{n} \to e^{-\mu}$$

where e = constant = 2.718.

4th term: Finally, as n -> infinity, the quotient $\lambda T/n$ is increasingly like $0/n$, so that

$$\left[1-\frac{\lambda T}{n}\right]^{-x} \to [1-0]^{-x} = 1$$

Now put together the product of the 4 terms and what happens as n -> infinity:

$$\Pr[X = x] \to \frac{n^{x}}{x!}\left[\frac{1}{n}\right]^{x}\mu^{x}\,e^{-\mu}\,(1) = \frac{\mu^{x}\exp(-\mu)}{x!}$$

Poisson Distribution

If X is distributed Poisson($\mu$),

$$\Pr[X = x] = \frac{\mu^{x}\exp(-\mu)}{x!} \quad \text{for } x = 0, 1, \ldots, \infty$$

Expected value of X is $E[X] = \mu$
Variance of X is $\mathrm{Var}[X] = \mu$

Example: This example illustrates the correspondence between the Binomial and Poisson likelihoods. If lung cancer occurs at a rate of 2 per 1000 pack years, calculate the probability of exactly 3 cases of lung cancer in 3000 pack years.

Binomial:
o # trials = n
o $\pi$ = Pr[event]
o Expected # events = $n\pi$
o $\Pr[X = x \text{ events}] = \binom{n}{x}\pi^{x}(1-\pi)^{n-x}$

Poisson (what happens as n -> infinity and $\pi$ -> 0):
o Expected # events = $\mu$
o $\Pr[X = x \text{ events}] = \mu^{x}\exp(-\mu)/x!$, where exp = numerical constant = e = 2.718

Solution using the Binomial:
o # trials = n = 3000
o $\pi$ = .002
o $\Pr[\text{3 cases cancer}] = \binom{3000}{3}[.002]^{3}[.998]^{2997} = .0891$

Solution using the Poisson:
o Length of interval, T = 3000
o Rate per unit length, $\lambda$ = .002 per 1 pack year
o $\mu = \lambda T$ = (.002/pack year)(3000 pack years) = 6
o $\Pr[\text{3 cases cancer}] = 6^{3}\exp(-6)/3! = .0892$

An example of the Poisson sampling design is a cross sectional study.

                Disease    Healthy
Exposed         a          b          a+b
Not exposed     c          d          c+d
                a+c        b+d        a+b+c+d

a = # persons who are both exposed and with disease
b = # persons who are both exposed and healthy
c = # persons without exposure, with disease
d = # persons without exposure and healthy

o The counts a, b, c, and d are each free to vary and follow their own distribution.
o a, b, c, and d are 4 independent Poisson random variables.
o Let's denote the means of these $\mu_{11}$, $\mu_{12}$, $\mu_{21}$, and $\mu_{22}$.

$$\text{Likelihood[2x2 table]} = L[a,b,c,d] = \left[\frac{\mu_{11}^{a}\exp(-\mu_{11})}{a!}\right]\left[\frac{\mu_{12}^{b}\exp(-\mu_{12})}{b!}\right]\left[\frac{\mu_{21}^{c}\exp(-\mu_{21})}{c!}\right]\left[\frac{\mu_{22}^{d}\exp(-\mu_{22})}{d!}\right]$$

An association between exposure and disease means the following:

1) Pr[disease given exposure] differs from Pr[disease given no exposure]:
$$\frac{\mu_{11}}{\mu_{11}+\mu_{12}} \ne \frac{\mu_{21}}{\mu_{21}+\mu_{22}}$$

2) Pr[exposure given disease] differs from Pr[exposure given no disease]:
$$\frac{\mu_{11}}{\mu_{11}+\mu_{21}} \ne \frac{\mu_{12}}{\mu_{12}+\mu_{22}}$$

An example of the Binomial sampling design is a cohort study.

                Disease    Healthy
Exposed         a          b          a+b (fixed by design)
Not exposed     c          d          c+d (fixed by design)
                a+c        b+d        a+b+c+d

a = # exposed persons who develop disease
b = # exposed persons who do not develop disease
c = # unexposed persons who develop disease
d = # unexposed persons who do not develop disease

o The counts a and c are each free to vary and follow their own distribution.
o The counts b and d do not vary because of the fixed row totals.
o a and c are 2 independent Binomial random variables.
o Let's denote the event probabilities of these $\pi_1$ and $\pi_2$.

$$\text{Likelihood[2x2 table]} = L[a,c \text{ given } (a+b) \text{ and } (c+d) \text{ are fixed}] = \binom{a+b}{a}\pi_1^{a}(1-\pi_1)^{b}\,\binom{c+d}{c}\pi_2^{c}(1-\pi_2)^{d}$$

An association between exposure and disease means the following:

Pr[disease among exposed] differs from Pr[disease among non-exposed]:
$$\pi_1 \ne \pi_2$$

An example of the Binomial sampling design is a case control study.

                Disease       Healthy
Exposed         a             b             a+b
Not exposed     c             d             c+d
                a+c (fixed)   b+d (fixed)   a+b+c+d

a = # persons with disease whose recall reveals exposure
c = # persons with disease whose recall reveals no exposure
b = # healthy persons whose recall reveals exposure
d = # healthy persons whose recall reveals no exposure

o The counts a and b are each free to vary and follow their own distribution.
o The counts c
and d do not vary because of the fixed column totals.
o a and b are 2 independent Binomial random variables.
o Let's denote the exposure probabilities of these $\theta_1$ and $\theta_2$.

$$\text{Likelihood[2x2 table]} = L[a,b \text{ given } (a+c) \text{ and } (b+d) \text{ are fixed}] = \binom{a+c}{a}\theta_1^{a}(1-\theta_1)^{c}\,\binom{b+d}{b}\theta_2^{b}(1-\theta_2)^{d}$$

An association between exposure and disease means the following:

Pr[exposure among disease] differs from Pr[exposure among healthy]:
$$\theta_1 \ne \theta_2$$

5. Hypergeometric Distribution

Sometimes, a 2x2 table analysis of an exposure-disease relationship has as its only focus the investigation of association. In this setting, it turns out that the most appropriate probability model is the hypergeometric distribution model.

The Hypergeometric Distribution is introduced in two ways:
(1) The Hypergeometric Distribution in Games of Poker
(2) The Hypergeometric Distribution in Analyses of 2x2 Tables

1. The Hypergeometric Distribution in Games of Poker

Example: In a five card hand, what is the probability of getting 2 queens?

o 2 queens is in reality 2 queens and 3 NON queens.
o We know the total number of possible hands is $\binom{52}{5}$.
o Because there are 4 queens in a full deck, the # selections of 2 queens from 4 is $\binom{4}{2}$.
o Similarly, there are (52 - 4) = 48 non queens in a full deck. Thus, the # selections of 3 non queens from 48 is $\binom{48}{3}$.
o Putting these together, the number of hands that have 2 queens and 3 non queens is the product $\binom{4}{2}\binom{48}{3}$.

$$\Pr[\text{2 queens in five card hand}] = \frac{\text{\# five card hands that have 2 queens}}{\text{total \# five card hands}} = \frac{\binom{4}{2}\binom{48}{3}}{\binom{52}{5}} \approx 0.0399$$

2. The Hypergeometric Distribution in Analyses of a 2x2 Table

Example: A biotech company has N=259 pregnant women in its employ. 23 of them work with video display terminals. Of the 259 pregnancies, 4 ended in spontaneous abortion. Assuming a central hypergeometric model, what is the probability that 2 of the 4 spontaneous abortions were among the 23 women who worked with video display terminals?

                          Spontaneous Abortion   Healthy
Video Display Terminal    2                      21         23
Not                       2                      234        236
                          4                      255        259

o In this
example, the number of women who worked with video display terminals (23) is fixed. That is, the row totals are fixed.
o Similarly, the total number of spontaneous abortions (4) is also fixed. That is, the column totals are fixed.
o And the total number of pregnancies (259) is fixed.
o Thus, only one cell count can vary. Having chosen one, all others are known by subtraction.
o In this example, we want to know if spontaneous abortion has occurred disproportionately often among the video display terminal workers. Is it reasonable that 2 of the 4 cases of spontaneous abortion occurred among the relatively small number (23) of women who worked with video display terminals?
o The chance model is the central hypergeometric model just introduced.

Using the approach just learned:

Example: At this biotech company, what is the probability that 2 of the 4 abortions occurred among the 23 who worked with video display terminals?

o 2 abortions among VDT workers is in reality 2 abortions among VDT workers and 2 abortions among NON-VDT workers.
o We know the total number of possible choices of 4 abortions from 259 pregnancies is $\binom{259}{4}$.
o As there are 23 VDT workers, the # selections of 2 from this group is $\binom{23}{2}$.
o Similarly, as there are (259 - 23) = 236 non-VDT workers, the number of selections of 2 from this group is $\binom{236}{2}$.
o Thus, the probability of 2 of the 4 abortions occurring among VDT workers by chance is the hypergeometric probability

$$\frac{\binom{23}{2}\binom{236}{2}}{\binom{259}{4}}$$

3. A Hypergeometric Probability Model for a 2x2 Table of Exposure-Disease Counts

Now we consider the general setting of a 2x2 table analysis of exposure-disease count data.

                Case          Control
Exposed         a             b             a+b (fixed)
Not exposed     c             d             c+d (fixed)
                a+c (fixed)   b+d (fixed)   n = a+b+c+d

Here, the only interesting cell count is a = # persons with both exposure and disease.

Example: In this 2x2 table with fixed total exposed and fixed total number of events, what is the probability of getting "a" persons who are both exposed and cases?

o "a" events
of case among (a+b) exposed is in reality "a" events among (a+b) exposed AND "c" events among (c+d) NOT exposed.
o We know the total number of possible choices of (a+c) cases from (a+b+c+d) total is $\binom{a+b+c+d}{a+c}$.
o As there are (a+b) exposed, the # selections of "a" from this group is $\binom{a+b}{a}$.
o Similarly, as there are (c+d) NOT exposed, the number of selections of "c" from this group is $\binom{c+d}{c}$.
o Thus, the probability of "a" events of CASE occurring among (a+b) EXPOSED by chance is the hypergeometric probability

$$\frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{a+b+c+d}{a+c}}$$

Note: This is actually the central hypergeometric. More on this later.

Nice Result: The central hypergeometric probability calculation for the 2x2 table is the same regardless of arrangement of rows and columns.

Cohort Design:
$$\Pr[\text{"a" cases among "(a+b)" exposed} \mid \text{fixed marginals}] = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{a+b+c+d}{a+c}}$$

Case Control Design:
$$\Pr[\text{"a" exposed among "(a+c)" cases} \mid \text{fixed marginals}] = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{a+b+c+d}{a+b}}$$

6. Fisher's Exact Test of Association in a 2x2 Table

Now we can put this all together. We have a chance model for the probability of obtaining whatever count "a" we get in the upper left cell of the 2x2 table. It is the hypergeometric probability distribution model.

This chance model (the central hypergeometric distribution) is the null hypothesis probability model that is used in Fisher's exact test of no association in a 2x2 table.

The odds ratio OR is a single parameter which describes the exposure-disease association.

Consider again the VDT exposure and occurrence of spontaneous abortion data:

                Disease    Healthy
Exposed         2          21         23
Not exposed     2          234        236
                4          255        259

We're not interested in the row totals. Nor are we interested in the column totals. Thus, neither the Poisson nor the Binomial likelihoods are appropriate models for our particular question.

Rather, our interest is in the number of persons who have both traits: (exposure=yes) and (disease=yes). Is the count of 2 with exposure and disease significantly larger than what might have
been expected if there were NO association between exposure and disease?

We're interested in this likelihood because it is the association, the odds ratio OR, that is of interest, not the row totals nor the column totals. Thus, we will take advantage of the central hypergeometric model.

What is the null hypothesis likelihood of the 2x2 table if the row and column totals are fixed?

Recall the layout of our 2x2 table. With row and column totals fixed, only one cell count can vary. Let this be the count "a".

                Disease    Healthy
Exposed         a          b          a+b
Not exposed     c          d          c+d
                a+c        b+d        a+b+c+d

Null Hypothesis Conditional Probability of 2x2 Table
("Conditional" means: row totals fixed and column totals fixed.)

When the OR = 1, we have the central hypergeometric model just introduced.

$$\Pr[\text{"a" given row and column totals, given OR}=1] = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{a+b+c+d}{a+b}}$$

(Optional, for the interested reader.) When the OR is not equal to 1, the correct probability model for the count "a" is a different hypergeometric distribution; this latter is called a non-central hypergeometric distribution. Interestingly, the probability calculation involves the magnitude of the OR, which is now a number different from 1.

Probability ["a" given row and column totals] when OR is not equal to 1 is the non-central hypergeometric probability

$$\frac{\binom{a+c}{a}\binom{b+d}{b}\,OR^{a}}{\sum_{u=\max(0,\,a-d)}^{\min(a+b,\,a+c)}\binom{a+c}{u}\binom{b+d}{a+b-u}\,OR^{u}}$$

Note: You will not need to work with the non-central hypergeometric distribution in this course.

How to compute the p-value for this hypothesis test

The solution for the p-value will be the sum of the probabilities of each possible value of the count "a" that are as extreme or more extreme relative to a null hypothesis odds ratio OR = 1.

Step 1: If the row and column totals are fixed, what values of the count "a" are possible?
Answer: Because the column total is 4, the only possible values of the count "a" are 0, 1, 2, 3, and 4.

Step 2: What are the probabilities of the five possible 2x2 tables?
Answer: We calculate the hypergeometric
distribution likelihood for each of the 5 tables. While we're at it, we'll calculate the empirical odds ratio (OR) accompanying each possibility.

a=0
                Disease    Healthy
Exposed         0          23         23       Pr(table) = .6875
Not exposed     4          232        236      OR = 0
                4          255        259

a=1
                Disease    Healthy
Exposed         1          22         23       Pr(table) = .2715
Not exposed     3          233        236      OR = 3.6
                4          255        259

a=2 (Note: this is our observed table)
                Disease    Healthy
Exposed         2          21         23       Pr(table) = .0386
Not exposed     2          234        236      OR = 11.1
                4          255        259

a=3
                Disease    Healthy
Exposed         3          20         23       Pr(table) = .0023
Not exposed     1          235        236      OR = 35.36
                4          255        259

a=4
                Disease    Healthy
Exposed         4          19         23       Pr(table) = .0001
Not exposed     0          236        236      OR = infinite
                4          255        259

Step 3: What is the p-value associated with the observed table?
Answer: The observed table has a count of a=2. Recalling our understanding of the ideas of statistical hypothesis testing, the p-value associated with this table is the likelihood of this table or one that is more extreme. The more extreme tables are the ones with higher counts of "a", because these tables correspond to settings where the OR is as large as or larger than the observed value of 11.1.

p-value = Pr[Table with a=2] + Pr[Table with a=3] + Pr[Table with a=4]
        = .0386 + .0023 + .0001
        = .0410

Step 4: Interpretation of Fisher's Exact Test calculations.
Under the null hypothesis of no association between exposure and disease, the chances of obtaining an OR = 11.1 or greater, when the row and column totals are fixed, are approximately 4 in 100.

7. Discrete Distributions: Themes

With Replacement
o Framework: A proportion $\pi$ are event.
o Sampling: Sample of size n with replacement.
o Outcome: # events = x
o Likelihood of outcome: $\binom{n}{x}\pi^{x}(1-\pi)^{n-x}$
o Example: Test of equality of proportion

Without Replacement
o Framework: A fixed population of size N contains a subset of size M that are event.
o Sampling: Sample of size n without replacement.
o Outcome: # events = x
o Likelihood of outcome: $\binom{M}{x}\binom{N-M}{n-x} \big/ \binom{N}{n}$
o Example: Test of independence
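
The Fisher's exact test steps in Section 6 can be checked with a short calculation. The following is a minimal sketch, not part of the original notes: the function names `table_probability` and `fisher_one_sided` are hypothetical, and it assumes Python 3.8 or later for `math.comb`. It evaluates the central hypergeometric probability for each possible count "a" in the VDT example and sums the probabilities for a = 2, 3, 4. Direct calculation may differ from the rounded figures in the notes in the third or fourth decimal place; the p-value still comes out at roughly 4 in 100.

```python
from math import comb


def table_probability(a, row1, row2, col1):
    """Central hypergeometric probability that the upper-left cell equals a,
    given fixed row totals (row1, row2) and fixed first-column total col1."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_one_sided(a_obs, row1, row2, col1):
    """One-sided p-value: sum of the table probabilities for all counts
    a >= a_obs (tables at least as extreme in the direction of larger OR)."""
    a_max = min(row1, col1)
    return sum(table_probability(a, row1, row2, col1)
               for a in range(a_obs, a_max + 1))


# VDT example from the notes (assumed layout): 23 exposed, 236 not exposed,
# 4 spontaneous abortions in total, observed count a = 2.
probs = [table_probability(a, 23, 236, 4) for a in range(5)]
p_value = fisher_one_sided(2, 23, 236, 4)

for a, pr in enumerate(probs):
    print(f"a={a}: Pr[table] = {pr:.4f}")
print(f"one-sided p-value = {p_value:.4f}")
```

The five table probabilities should sum to 1, which is a quick sanity check on the fixed-margins setup. The same `table_probability` function covers the general a, b, c, d layout by passing row totals (a+b) and (c+d) and the column total (a+c).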