Introductory Applied Statistics for the Life Sciences (STAT 371)
These 12 pages of class notes were uploaded by Mrs. Triston Collier on Thursday, September 17, 2015. The notes belong to STAT 371 at the University of Wisconsin - Madison, taught by Quoc Tran in Fall.
Date Created: 09/17/15
Quoc Tran, B248D MSC, tran@stat.wisc.edu
Office hours: 2:30-4:30pm Tues. at B248D MSC, http://www.stat.wisc.edu/~tran

STAT 371 Discussion 6, October 24, 2006

1. The following table shows the number of bacteria colonies present in each of several petri dishes after E. coli bacteria were added to the dishes and they were incubated for 24 hours. The "soap" dishes contained a solution prepared from ordinary soap; the "control" dishes contained a solution of sterile water (see the data on the blackboard).

Question: Is there any difference between these two groups?

2. Comparing Two Groups

(a) Standard error of $\bar{y}_1 - \bar{y}_2$:

\[ SE_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

(b) The confidence interval for $\mu_1 - \mu_2$

• If the population standard deviations are known, the confidence interval for $\mu_1 - \mu_2$ is

\[ \bar{y}_1 - \bar{y}_2 \pm z \times \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

where $z$ is selected so that the area between $-z$ and $z$ under the standard normal curve is the desired confidence level.

• If the population standard deviations are unknown, the confidence interval for $\mu_1 - \mu_2$ is

\[ \bar{y}_1 - \bar{y}_2 \pm t \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

Here is the df formula:

\[ df = \frac{(SE_1^2 + SE_2^2)^2}{SE_1^4/(n_1 - 1) + SE_2^4/(n_2 - 1)} \]

where $SE_1 = s_1/\sqrt{n_1}$ and $SE_2 = s_2/\sqrt{n_2}$.

3. Hypothesis Tests

(a) Testing $H_0: \mu_1 - \mu_2 = 0$

• Test statistic:

\[ t_s = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{SE_{\bar{y}_1 - \bar{y}_2}} \]

• Alternative hypotheses and P-values:

    Alternative hypothesis            P-value
    $H_A: \mu_1 - \mu_2 > 0$          $\Pr(T > t_s)$
    $H_A: \mu_1 - \mu_2 < 0$          $\Pr(T < t_s) = \Pr(T > -t_s)$
    $H_A: \mu_1 - \mu_2 \ne 0$        $2\Pr(T > |t_s|)$

(b) Interpreting a P-value

The p-value is a measure of how consistent the data are with the null hypothesis, in consideration of a specific alternative hypothesis. The smaller the p-value, the more inconsistent the data are with the null hypothesis and the stronger the evidence is against the null hypothesis in favor of the alternative.

(c) Comparing $\alpha$ and P-values

• The significance level $\alpha$ and p-values are both areas under $t$ curves, but they are not the same thing.
• The significance level $\alpha$ is a prespecified, arbitrary value that does not depend on the data.
• The p-value depends on the data.
• Reject the null hypothesis when the p-value is less than
the significance level $\alpha$.

(d) Type I and Type II Errors

• Rejecting a true null hypothesis is a Type I error. The probability of a Type I error is $\alpha = \Pr(\text{rejecting } H_0 \mid H_0 \text{ is true})$. You cannot make a Type I error when the null hypothesis is false, or when you do not reject the null hypothesis.
• Not rejecting a false null hypothesis is a Type II error. The probability of a Type II error is $\beta = \Pr(\text{not rejecting } H_0 \mid H_0 \text{ is false})$. You cannot make a Type II error when the null hypothesis is true, or when you reject the null hypothesis.

(e) Relationship between t tests and confidence intervals

We would reject the null hypothesis $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 \ne 0$ at the $\alpha = 0.05$ level of significance if and only if a 95% confidence interval for $\mu_1 - \mu_2$ does not contain 0.

(f) Power

The chance of not making a Type II error when $H_0$ is false, that is, the chance of rejecting $H_0$ when it is false, is called the power of a statistical test.

\[ \text{Power} = 1 - \beta = \Pr(\text{reject } H_0 \mid H_0 \text{ is false}) \]

4. Examples

• Example 1: 7.37 on page 247.
• Example 2: Suppose that a 95% confidence interval for $\mu_1 - \mu_2$ is calculated to be $(-2.4, -0.8)$. If we test $H_0: \mu_1 = \mu_2$ versus $H_A: \mu_1 \ne \mu_2$ using $\alpha = 0.10$, will we reject $H_0$? Why or why not?

STAT 371 Discussion 8, November 14, 2006

1. 2 × 2 tables: Test of Independence

• State hypotheses: $H_0: p_1 = p_2$.
• Calculate a test statistic. We will again use the $\chi^2$ test statistic, but for a 2 × 2 table we have a different formula for finding the expected values for each cell and a different formula for the degrees of freedom:

\[ \text{Expected count in cell } (i, j) = \frac{(\text{sum of row } i) \times (\text{sum of column } j)}{\text{table total}} \]

\[ \text{degrees of freedom} = (\#\text{rows} - 1)(\#\text{columns} - 1) \]

• Compare the test statistic to its null distribution. The $\chi^2$ test statistic follows approximately a $\chi^2$ distribution with 1 degree of freedom under the null hypothesis.
• Compute a p-value. R code to compute the p-value: 1 - pchisq(X2, 1)

2. Confidence Intervals for $p_1 - p_2$

The adjusted proportions are $\tilde{p}_i = \frac{y_i + 1}{n_i + 2}$ for $i = 1, 2$. The
formula for the 95% confidence interval for a difference in population proportions $p_1 - p_2$ then becomes

\[ \tilde{p}_1 - \tilde{p}_2 \pm 1.96 \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}} \]

3. Examples

• Example 1 (10.17 on page 411): Most salamanders of the species P are red-striped, but some individuals are all red. The all-red form is thought to be a mimic of the salamander N, which is toxic to birds. In order to test whether the mimic form actually survives more successfully, 163 striped and 41 red individuals of P were exposed to predation by a natural bird population. After two hours, 65 of the striped and 23 of the red were still alive. Use a chi-square test to assess the evidence that the mimic form survives more successfully. Use a directional alternative and let $\alpha = 0.05$.

• Example 2 (10.35 on page 421): An ecologist studied the spatial distribution of tree species in a wooded area. From a total area of 21 acres, he randomly selected 144 plots, each 38 feet square, and noted the presence or absence of maples and hickories in each plot. The results are shown in the table. Test the null hypothesis that the two species are distributed independently of each other. Use a nondirectional alternative and let $\alpha = 0.01$. In stating your conclusion, indicate whether the data suggest attraction between the species or repulsion. Support your interpretation with estimated conditional probabilities from the data.

• Example 5 (10.59 on page 441): For women who are pregnant with twins, complete bed rest in late pregnancy is commonly prescribed in order to reduce the risk of premature delivery. To test the value of this practice, 212 women with twin pregnancies were randomly allocated to a bed rest group or a control group. The accompanying table shows the incidence of preterm delivery. Let $p_1$ and $p_2$ represent the probabilities of preterm delivery in the two conditions. Construct a 95% confidence interval for $p_1 - p_2$. Does the confidence interval suggest that
bed rest is beneficial?

Stat 371 Discussion 5

1. Sampling distributions

- Central Limit Theorem: For pretty much any distribution you are going to come across, $\bar{Y}_n$ will be approximately normally distributed, with the same mean as each $Y$ and variance $\sigma^2/n$, for large $n$. A rule of thumb is that this holds for $n \ge 30$ if the distribution is not very skewed. If $Y$ is normally distributed in the first place, then we have the same result, but it is exact and holds for any sample size.
- Normal Approximation to the Binomial: If $np$ and $n(1 - p)$ are both 5 or bigger, then a binomial with parameters $n$ and $p$ can be approximated by a normal with mean $np$ and variance $np(1 - p)$. This can be improved using the continuity correction, which can be seen by writing $P(X \le x) = P(X < x + 1) \approx P(X \le x + 0.5)$ for $X$ binomial, where the last probability is computed from the approximating normal.

2. Confidence Interval for $\mu$

- If $X \sim N(\mu, \sigma^2)$, then the $(1 - \alpha)100\%$ CI is

\[ \left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) \]

- If $n$ is large, then the $(1 - \alpha)100\%$ CI is

\[ \left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) \]

or, if $\sigma$ is unknown,

\[ \left( \bar{X} - z_{\alpha/2} \frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2} \frac{S}{\sqrt{n}} \right) \]

3. Confidence Intervals for $p$

An approximate 95% CI for $p$ is

\[ \left( \tilde{p} - 1.96 \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}},\ \tilde{p} + 1.96 \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} \right) \]

where $\tilde{p} = \frac{y + 2}{n + 4}$.

STAT 371 Discussion 5, March 3, 2009

1. Standard Error of the Mean

• We know that the SD of the sampling distribution of the sample mean $\bar{y}$ can be computed by this formula:

\[ \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}} \]

• But if we only observe sample data $y_1, \ldots, y_n$, we do not know the value of the population SD $\sigma$, so we cannot use the formula directly.
• However, we can compute the sample standard deviation $s$, which is an estimate of the population standard deviation $\sigma$.
• The expression

\[ SE_{\bar{y}} = \frac{s}{\sqrt{n}} \]

is called the standard error of the sample mean and is an estimate of the standard deviation of the sampling distribution of the sample mean.

2. Confidence Interval for Population Mean $\mu$

(a) If $\sigma$ is known, the confidence interval for $\mu$ is

\[ \bar{y} \pm z \times \frac{\sigma}{\sqrt{n}} \]

If the desired confidence level is 90%, then let $z = 1.645$, because the area between $-z$ and $z$ under a standard normal curve is 0.9:

\[ \Pr\left( \bar{y} - 1.645 \times \frac{\sigma}{\sqrt{n}} < \mu < \bar{y} + 1.645 \times \frac{\sigma}{\sqrt{n}} \right) = 0.9 \]

If the desired confidence level is 95%, then let $z = 1.96$, because the area between $-z$ and $z$ under a standard normal curve is 0.95:

\[ \Pr\left( \bar{y} - 1.96 \times \frac{\sigma}{\sqrt{n}} < \mu < \bar{y} + 1.96 \times \frac{\sigma}{\sqrt{n}} \right) = 0.95 \]

(b) If $\sigma$ is unknown, we use the sample standard deviation as an alternative, and the confidence interval for $\mu$ is $\bar{y} \pm t \times SE_{\bar{y}}$, that is,

\[ \bar{y} \pm t \times \frac{s}{\sqrt{n}} \]

where $t$ is selected so that the area between $-t$ and $t$ under a $t$ distribution curve with $n - 1$ degrees of freedom is the desired confidence level.

(c) How big should $n$ be?

\[ n = \left( \frac{\text{Guessed SD}}{\text{Desired SE}} \right)^2 \]

(d) Summary of Conditions

In summary, Student's t method of constructing a confidence interval for $\mu$ is appropriate if the following conditions hold.

• Conditions on the design of the study
  (a) It must be reasonable to regard the data as a random sample from a large population.
  (b) The observations in the sample must be independent of each other.
• Conditions on the form of the population distribution
  (a) If $n$ is small, the population distribution must be approximately normal.
  (b) If $n$ is large, the population distribution need not be approximately normal.

Office hours: 3:00-5:00pm Mon. at B248D MSC, http://www.stat.wisc.edu/~tran

3. Confidence Interval for a Population Proportion $p$

(a) Sample proportion: $\hat{p} = y/n$, where $y$ is the number of observations, out of $n$, in the category of interest. If $n$ is small, we use $\tilde{p} = \frac{y + 2}{n + 4}$ instead of $\hat{p}$.

(b) Standard error of $\tilde{p}$:

\[ SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} \]

(c) Confidence interval for $p$: $\tilde{p} \pm z \times SE_{\tilde{p}}$, where $z$ is the standard normal multiplier for the desired confidence level (for example, $z = 1.96$ for 95%).

(d) How big should $n$ be?

\[ n + 4 = \frac{(\text{Guessed } \tilde{p})(1 - \text{Guessed } \tilde{p})}{(\text{Desired SE})^2} \]

If $\tilde{p}$ is hard to guess, use $\tilde{p} = 0.5$.

4. Comparing Two Groups

(a) Standard error of $\bar{y}_1 - \bar{y}_2$:

\[ SE_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

(b) The confidence interval for $\mu_1 - \mu_2$

• If the population standard deviations are known, the confidence interval for $\mu_1 - \mu_2$ is

\[ \bar{y}_1 - \bar{y}_2 \pm z \times \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

where $z$ is selected so that the area between $-z$ and $z$ under the standard normal curve is the desired confidence level.

• If the population standard deviations are unknown, the confidence interval for $\mu_1 - \mu_2$ is

\[ \bar{y}_1 - \bar{y}_2 \pm t \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

with degrees of freedom given by

\[ df = \frac{(SE_1^2 + SE_2^2)^2}{SE_1^4/(n_1 - 1) + SE_2^4/(n_2 - 1)} \]

where
$SE_1 = s_1/\sqrt{n_1}$ and $SE_2 = s_2/\sqrt{n_2}$.

5. Examples

• Prob. 6.10, page 194
• Prob. 6.40, 6.45, page 212
• Prob. 7.7, page 226

STAT 371 Discussion 6, March 10, 2009

1. Hypothesis Tests

Question: Is there any difference between these two groups?

(a) Testing $H_0: \mu_1 - \mu_2 = 0$

• Test statistic:

\[ t_s = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{SE_{\bar{y}_1 - \bar{y}_2}} \]

• Alternative hypotheses and P-values:

    Alternative hypothesis            P-value
    $H_A: \mu_1 - \mu_2 > 0$          $\Pr(T > t_s)$
    $H_A: \mu_1 - \mu_2 < 0$          $\Pr(T < t_s) = \Pr(T > -t_s)$
    $H_A: \mu_1 - \mu_2 \ne 0$        $2\Pr(T > |t_s|)$

(b) Interpreting a P-value

The p-value is a measure of how consistent the data are with the null hypothesis, in consideration of a specific alternative hypothesis. The smaller the p-value, the more inconsistent the data are with the null hypothesis and the stronger the evidence is against the null hypothesis in favor of the alternative.

(c) Comparing $\alpha$ and P-values

• The significance level $\alpha$ and p-values are both areas under $t$ curves, but they are not the same thing.
• The significance level $\alpha$ is a prespecified, arbitrary value that does not depend on the data.
• The p-value depends on the data.
• Reject the null hypothesis when the p-value is less than the significance level $\alpha$.

(d) Type I and Type II Errors

• Rejecting a true null hypothesis is a Type I error. The probability of a Type I error is $\alpha = \Pr(\text{rejecting } H_0 \mid H_0 \text{ is true})$. You cannot make a Type I error when the null hypothesis is false, or when you do not reject the null hypothesis.
• Not rejecting a false null hypothesis is a Type II error. The probability of a Type II error is $\beta = \Pr(\text{not rejecting } H_0 \mid H_0 \text{ is false})$. You cannot make a Type II error when the null hypothesis is true, or when you reject the null hypothesis.

(e) Relationship between t tests and confidence intervals

We would reject the null hypothesis $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 \ne 0$ at the $\alpha = 0.05$ level of significance if and only if a 95% confidence interval for $\mu_1 - \mu_2$ does not contain 0.

(f) Power

The chance of not making a Type II error
when $H_0$ is false, that is, the chance of rejecting $H_0$ when it is false, is called the power of a statistical test.

\[ \text{Power} = 1 - \beta = \Pr(\text{reject } H_0 \mid H_0 \text{ is false}) \]

STAT 371 Discussion 11, May 5, 2009

1. Linear Regression

(a) Simple Linear Regression

• Any line we can use to predict $Y$ from $X$ will have the form $Y = b_0 + b_1 X$, where $b_0$ is the intercept and $b_1$ is the slope.

\[ \text{Slope: } b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad \text{Intercept: } b_0 = \bar{y} - b_1 \bar{x} \]

• $b_0$ and $b_1$ are the estimates of $\beta_0$ and $\beta_1$, respectively, in the following linear model:

\[ \mu_{Y|X} = \beta_0 + \beta_1 X \]

where $\mu_{Y|X}$ is the population mean $Y$ value for a given $X$.
• The value $\hat{y} = b_0 + b_1 x$ is the predicted value of $Y$ if the explanatory variable $X = x$.
• For each data point $(x_i, y_i)$, the residual is the difference between the observed value and the predicted value, $y_i - \hat{y}_i$.
• Simple linear regression identifies the line that minimizes the residual sum of squares:

\[ SS_{resid} = \sum (y_i - \hat{y}_i)^2 \]

• Total sum of squares: $SS_{total} = \sum (y_i - \bar{y})^2$
• Regression sum of squares: $SS_{reg} = \sum (\hat{y}_i - \bar{y})^2$
• Equation: $SS_{total} = SS_{reg} + SS_{resid}$

(b) Residual Standard Deviation

The residual standard deviation, $s_{Y|X}$, is a measure of the typical size of a residual. It tells how far above or below the regression line points tend to be. Its formula is

\[ s_{Y|X} = \sqrt{\frac{SS_{resid}}{n - 2}} \]

Within the framework of the linear model and the random subsampling model, $s_{Y|X}$ is an estimate of $\sigma_{Y|X}$.

(c) Inference

• The standard error of $b_1$ is

\[ SE_{b_1} = \frac{s_{Y|X}}{\sqrt{\sum (x_i - \bar{x})^2}} \]

• A 95% confidence interval for $\beta_1$ is $b_1 \pm t_{0.025} SE_{b_1}$, where $t$ has $n - 2$ degrees of freedom.

(d) Correlation

• The correlation coefficient $r$ is the measure of the strength of the linear relationship between two variables, on a scale from $-1$ to 1:

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]

• The correlation coefficient is $-1$ or 1 only when the data lie perfectly on a line with negative or positive slope, respectively.
• If the correlation coefficient is near 1, this means that the data is
tightly clustered around a line with a positive slope.
• Correlation coefficients near 0 indicate weak linear relationships.
• Alternative formula: the correlation coefficient $r$ is the square root of the coefficient of determination $r^2$, with the sign of the slope of the regression line.
• Coefficient of determination:

\[ r^2 = 1 - \frac{SS_{resid}}{SS_{total}} \]

• Approximate relation of $r$ to $s_{Y|X}$ and $s_Y$:

\[ \frac{s_{Y|X}}{s_Y} \approx \sqrt{1 - r^2} \]

2. Examples

Example 1: 12.6 on page 539.

Example 2: R solution for 12.6. Interpret the output of R.

    > X = c(0, 0, 3, 3, 6, 6, 10, 10, 20, 20, 30, 30)
    > Y = c(33.3, 31.0, 29.8, 27.8, 28.0, 29.0, 25.5, 23.8, 18.3, 15.5, 11.7, 10)
    > fit = lm(Y ~ X)
    > summary(fit)

    Call:
    lm(formula = Y ~ X)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 31.82979    0.55693   57.15 6.53e-14 ***
    X           -0.71201    0.03589  -19.84 2.32e-09 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.295 on 10 degrees of freedom
    Multiple R-Squared: 0.9752,     Adjusted R-squared: 0.9727
    F-statistic: 393.6 on 1 and 10 DF,  p-value: 2.321e-09

    > anova(fit)
    Analysis of Variance Table

    Response: Y
              Df Sum Sq Mean Sq F value    Pr(>F)
    X          1 660.57  660.57  393.64 2.321e-09 ***
    Residuals 10  16.78    1.68

Example 3: 12.23 on page 553. Construct a 95% confidence interval for the slope of the regression line in the previous example.

Quoc Tran, MSC Statistics B248D, tran@stat.wisc.edu
STATISTICS 371 Discussion 1, September 13, 2006
Office: B248D MSC, 1300 University Ave.
Office hours: 2:30-4:30pm Tuesday
My website: http://stat.wisc.edu/~tran

1. Summary

• Summaries of Frequency Distribution
  - A frequency distribution is a list of the observed categories and a count of the number of observations in each.
  - A frequency distribution may be displayed with a table or with a bar chart.
• Summaries of Categorical Variables
  - Categorical variables are variables that are not numbers.
  - They cannot be displayed with dotplots, histograms, or stem-and-leaf diagrams.
• Summaries of Quantitative Variables
  - Quantitative variables from very small samples can be displayed with the dotplot.
  - Histograms are a more general tool for displaying the distribution of quantitative variables.
  - Too few classes is an over-summary of the
data, while too many classes can cloud important features of the data with noise.
• Stem-and-leaf diagrams
  - They are useful for showing the shape of the distribution of small data sets without losing any (or much) information.
• Measures of Center
  - The sample mean is

\[ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]

    where the $y$'s are the observations in the sample and $n$ is the sample size (that is, the number of $y$'s). The mean is sensitive to outlying values.
  - The sample median is the middle value: at least half of the values are larger and at least half are smaller. If the sample size is odd, only one number will be the median, the one in the middle. If the sample size is even, the median is the average of the middle two.
• Quartiles
  - The first quartile, $Q_1$, is the location that separates the smallest quarter of the data from the rest.
  - The third quartile, $Q_3$, is the location that separates the top quarter of the data from the rest.
  - The median is the second quartile.
• Interquartile range (IQR): the difference between the first and third quartiles, $IQR = Q_3 - Q_1$.
• Boxplots
  - In a boxplot, a box extending from the first to the third quartile represents the middle half of the data. The box is divided at the median, and whiskers extend from each end to the maximum and minimum.

Examples

Example 1: 2.10 on page 25.
Example 2: 2.11 on page 25.
Example 3: Suppose we have the following data set: 96797872718767574
  (a) Construct a frequency distribution of these data and display it as a histogram.
  (b) Determine the mean and median of the data, and mark their location on the histogram.
Example 4: 2.32 on page 39.
Example 5: 2.48 on page 49.
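The measures of center and spread summarized in Discussion 1 can be checked quickly in R, the language already used in these notes. The following is only a minimal sketch: the vector y is a made-up sample for illustration, not the data set from Example 3, and the variable names are hypothetical. R's quantile() uses an interpolation rule by default, so its quartiles may differ slightly from the hand method of taking medians of the lower and upper halves of the data.

```r
# Hypothetical sample, for illustration only (not data from the notes)
y <- c(2, 4, 4, 5, 7, 9, 11, 12)
n <- length(y)

# Sample mean: sum of the observations divided by the sample size
y_bar <- sum(y) / n

# Sample median: n is even here, so it is the average of the middle two values
y_med <- median(y)

# First and third quartiles, using R's default interpolation rule
q1 <- unname(quantile(y, 0.25))
q3 <- unname(quantile(y, 0.75))

# Interquartile range: IQR = Q3 - Q1
iqr <- q3 - q1

# Adding one extreme value moves the mean much more than the median
y_out <- c(y, 100)
mean_shift <- mean(y_out) - y_bar
median_shift <- median(y_out) - y_med

print(c(mean = y_bar, median = y_med, Q1 = q1, Q3 = q3, IQR = iqr))
print(c(mean_shift = mean_shift, median_shift = median_shift))
```

The last two lines illustrate the remark that the mean is sensitive to outlying values: a single large observation shifts the mean far more than it shifts the median. A boxplot of the same sample could be drawn with boxplot(y).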