Introductory Applied Statistics for the Life Sciences
These class notes (13 pages) were uploaded by Mrs. Triston Collier on Thursday, September 17, 2015. They belong to STAT 371 at University of Wisconsin - Madison, taught by Staff in Fall.
Date Created: 09/17/15
Goodness of fit: 2 classes

    R     W
    78    22

Do these data correspond reasonably to the proportions 3:1?

We could use what we've learned. A couple of lectures ago we discussed several options for testing $p_R = 0.75$:

o Exact p-value
o Normal approximation
o Randomization test

Goodness of fit: 3 classes

    RR    RW    WW
    35    43    22

Do these data correspond reasonably to the proportions 1:2:1?

The $X^2$ test

Back to the first example:

                R     W     total
    observed    78    22    100
    expected    75    25    100

Say $p_R = \Pr(R)$ and $p_W = \Pr(W) = 1 - p_R$.

We want to test $H_0: (p_R, p_W) = (3/4, 1/4)$ versus $H_a: (p_R, p_W) \neq (3/4, 1/4)$.

Consider the statistic

$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(78 - 75)^2}{75} + \frac{(22 - 25)^2}{25} = 0.48$

Null distribution

Observed counts: $n_R, n_W$ with $n_R + n_W = 100$.

Under the null hypothesis, $n_R \sim \text{binomial}(n = 100, p = 3/4)$.

Possible values of $n_R$: 0, 1, 2, ..., 100.

Corresponding probabilities: $\Pr(n_R = k) = \binom{100}{k} (3/4)^k (1/4)^{100-k}$.

Consider the corresponding values of the $X^2$ statistic; these give the null distribution of $X^2$.

Alternatively, use computer simulation to estimate the null distribution.

Even better: for large samples, the null distribution is approximately $\chi^2(\text{df} = 1)$.

[Figure: exact null distribution of $X^2$. 95th percentile = 3.41; observed = 0.48; P = 0.55.]

[Figure: $\chi^2(\text{df} = 1)$ distribution. 95th percentile = 3.84; observed = 0.48; P = 0.49.]

Generalization to more than two groups

If we have k groups, then the $X^2$ statistic is still

$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

If $H_0$ is true and the sample size is large, $X^2 \sim \chi^2(\text{df} = k - 1)$.

[Figure: $\chi^2$ densities for 3 groups (df = 2), 5 groups (df = 4), 7 groups (df = 6), and 9 groups (df = 8).]

Our 3-group example

We observe data like that in the following table, and we want to know: do these data correspond reasonably to the proportions 1:2:1?

                RR    RW    WW
    observed    35    43    22
    expected    25    50    25

$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(35 - 25)^2}{25} + \frac{(43 - 50)^2}{50} + \frac{(22 - 25)^2}{25} = 5.34$

In R, the p-value is

    1 - pchisq(5.34, 2)

which is approximately 6.9%. Or:

    chisq.test(c(35, 43, 22), p = c(0.25, 0.5, 0.25))

Another example

In a dihybrid cross of tomatoes, we expect the ratio of the phenotypes to be 9:3:3:1. In 1611 tomatoes, we observe the numbers 926, 288, 293, 104. Do these numbers support our hypothesis?

    Phenotype            Obs     Exp      (Obs - Exp)^2 / Exp
    Tall, cut-leaf       926     906.2    0.43
    Tall, potato-leaf    288     302.1    0.65
    Dwarf, cut-leaf      293     302.1    0.27
    Dwarf, potato-leaf   104     100.7    0.11
    Sum                  1611             1.47

Results

The $X^2$ statistic is 1.47. Using a $\chi^2(\text{df} = 3)$ distribution, we get a p-value of 0.69. We therefore have no evidence against the hypothesis that the ratio of the phenotypes is 9:3:3:1.

Stepping back

We observe data like that in the following table:

    RR    RW    WW
    35    43    22

We want to know: do these data correspond reasonably to the proportions 1:2:1?

I have neglected to make precise the role of chance in this business.

Multinomial distribution

o Imagine an urn with k types of balls. Let $p_i$ denote the proportion of type i.
o Draw n balls with replacement.
o Outcome: $(n_1, n_2, \ldots, n_k)$ with $\sum_i n_i = n$, where $n_i$ is the number of balls drawn that were of type i.

Examples

o The binomial distribution: the case k = 2.
o Self a heterozygous plant, obtain 50 progeny, and use test crosses to determine the genotypes of each of the progeny.
o Obtain a random sample of 30 people from UW and classify them according to student/faculty/staff.

Multinomial probabilities

$P(X_1 = n_1, \ldots, X_k = n_k) = \frac{n!}{n_1! \times \cdots \times n_k!} \, p_1^{n_1} \cdots p_k^{n_k}$ if $0 \le n_i \le n$ and $\sum_i n_i = n$.

Otherwise, $P(X_1 = n_1, \ldots, X_k = n_k) = 0$.

Example

Let $(p_1, p_2, p_3) = (0.25, 0.50, 0.25)$ and n = 100. Then

$P(X_1 = 35, X_2 = 43, X_3 = 22) = \frac{100!}{35! \, 43! \, 22!} \, 0.25^{35} \, 0.50^{43} \, 0.25^{22} \approx 7.3 \times 10^{-4}$

Rather brutal, numerically speaking. The solution: take logs (and use a computer).

Goodness of fit test

We observe $(n_1, n_2, n_3) \sim \text{multinomial}(n, (p_1, p_2, p_3))$.

We seek to test $H_0: p_1 = 0.25, p_2 = 0.5, p_3 = 0.25$ versus $H_a$: $H_0$ is false.

We need:

(a) A test statistic
(b) The null distribution of the test statistic

Test statistic

$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

Null distribution of test statistic

What values of $X^2$ should we expect if $H_0$ were true?

The null distribution of this statistic may be obtained by:

o Brute-force analytic calculations
o Computer simulations
o Asymptotic approximations

The brute-force method

$\Pr(X^2 \ge g \mid H_0) = \sum_{(n_1, n_2, n_3) \text{ giving } X^2 \ge g} \Pr(n_1, n_2, n_3 \mid H_0)$

This is usually not feasible.

Computer simulation

1. Simulate a table conforming to the null hypothesis, e.g., simulate $(n_1, n_2, n_3) \sim \text{multinomial}(n = 100, (1/4, 1/2, 1/4))$.
2. Calculate your test statistic.
3. Repeat steps 1 and 2 many (e.g., 1000 or 10,000) times.

Estimated critical value = the 95th percentile of the results.

Estimated P-value = the proportion of results $\ge$ the observed value.

In R, use rmultinom(n, size, prob) to do n simulations of a multinomial(size, prob).

Asymptotic approximation

Very mathematically savvy people have shown that, if the sample size n is large, $X^2 \sim \chi^2(k - 1)$.

Example

We observe the following data:

    RR    RW    WW
    35    43    22

We imagine that these are counts $(n_1, n_2, n_3) \sim \text{multinomial}(n = 100, (p_1, p_2, p_3))$.

We seek to test $H_0: p_1 = 1/4, p_2 = 1/2, p_3 = 1/4$.

We calculate $X^2 \approx 5.34$.

Referring to the asymptotic approximation ($\chi^2$ distribution with 2 degrees of freedom), we obtain $P \approx 6.9\%$.

With 10,000 simulations under $H_0$, we obtain $P \approx 7.4\%$.

[Figure: estimated null distribution of the chi-square statistic, with the observed value marked. 95th percentile = 6.00.]

Summary and recommendation

For the $X^2$ test:

o The null distribution is approximately $\chi^2(k - 1)$ if the sample size is large.
o The null distribution can be approximated by simulating data under the null hypothesis.

If the sample size is sufficiently large that the expected count in each cell is $\ge 5$, use the asymptotic approximation without worries. Otherwise, consider using computer simulations.
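
The simulation recipe and the asymptotic approximation can be compared side by side on the 3-group example (35, 43, 22 against 1:2:1). What follows is a minimal sketch, not the course's own code. It relies only on base R functions (rmultinom, pchisq, quantile, dmultinom); the helper chisq_stat, the variable names, and the seed are assumptions introduced here for illustration.

```r
# Observed counts and null proportions from the 3-group example in the notes
observed <- c(RR = 35, RW = 43, WW = 22)
p_null   <- c(0.25, 0.50, 0.25)
n        <- sum(observed)

# Pearson X^2 statistic: sum((observed - expected)^2 / expected)
# (chisq_stat is a hypothetical helper name, not part of base R)
chisq_stat <- function(counts, p) {
  expected <- sum(counts) * p
  sum((counts - expected)^2 / expected)
}

x2_obs <- chisq_stat(observed, p_null)

# Asymptotic p-value: upper tail of the chi-square distribution with k - 1 df
p_asymptotic <- pchisq(x2_obs, df = length(observed) - 1, lower.tail = FALSE)

# Simulation: draw tables under H0 and compute X^2 for each one
set.seed(371)     # arbitrary seed, used only to make the run reproducible
n_sim <- 10000    # number of simulated tables
sim_tables <- rmultinom(n_sim, size = n, prob = p_null)  # one table per column
x2_sim <- apply(sim_tables, 2, chisq_stat, p = p_null)

crit_sim <- quantile(x2_sim, 0.95)   # estimated critical value
p_sim    <- mean(x2_sim >= x2_obs)   # estimated p-value

# Exact probability of the observed table, computed on the log scale
log_prob <- dmultinom(observed, prob = p_null, log = TRUE)

print(c(x2_obs = x2_obs, p_asymptotic = p_asymptotic, p_sim = p_sim))
print(crit_sim)
print(exp(log_prob))
```

The simulated p-value and critical value depend on the random draws, so they change slightly with the seed and with n_sim. The notes report about 7.4% from 10,000 simulations, against about 6.9% from the asymptotic approximation. Working with log = TRUE in dmultinom follows the "take logs" advice for the multinomial probability.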