# Introductory Applied Statistics for the Life Sciences STAT 371



This 7 page Class Notes was uploaded by Mrs. Triston Collier on Thursday September 17, 2015. The Class Notes belongs to STAT 371 at University of Wisconsin - Madison taught by Staff in Fall.

Date Created: 09/17/15

## STAT 371 Discussion 2

- TA: Lane Burgette
- Office: 1245F MSC, 1300 University Avenue
- Email: burgette@stat.wisc.edu
- URL: www.stat.wisc.edu/~burgette/371.html (or navigate from stat.wisc.edu)
- Office hours: M 1:15-2:15, T 9:25-10:25

### 1. Probability

#### 1. Interpretation

We use the frequentist interpretation of probability in this class, as opposed to the Bayesian interpretation. If we imagine repeating the experiment a large number of times, the ratio of successes to total trials will approach a limit, which is the probability of our event. If, for example, $P(E) = 0.5$, does this mean that eventually the number of successes will approximately equal half the number of trials?

#### 2. Basic Laws of Probability

- Probabilities are always between zero and one.
- The probabilities of all possible outcomes sum to one.
- Probabilities of the union ("or") of disjoint events add.
- Probabilities of the intersection ("and") of independent events multiply.

#### 3. Examples

- Roll two dice and add them together. What are the associated probabilities? See if you can use the random variable notation.
- You are playing a gambling game where you flip a coin: heads you win a dollar, and tails you lose it. You start off with only one dollar. What is the probability that you will have enough money to play the fourth round? Hint: a tree diagram may be helpful.

### 2. Random Variables, Densities, and Expectations

#### 1. Random Variables

We denote random variables by upper-case letters like $X$ or $Y$, and their realizations by the corresponding lower-case letters. Then we can use the notation $P(X = x)$. For example, in the two-dice example, $P(X \le 3) = 3/36$. Try to pay attention to upper- and lower-case letters in this class.

#### 2. Density Curves

Density curves give us the shape of a continuous distribution. How do we come up with $P(a \le X \le b)$?

#### 3. Expected Values

For discrete random variables, the mean or expected value is defined as

$$\mu = E(X) = \sum_i x_i \, P(X = x_i)$$

and the variance is

$$\sigma^2 = \mathrm{Var}(X) = \sum_i (x_i - \mu)^2 \, P(X = x_i),$$

where the sums go over all possible $x_i$. An equivalent formula is

$$\sigma^2 = \sum_i x_i^2 \, P(X = x_i) - \mu^2.$$

Write out expressions for the mean
and variance of the dice example.

## STAT 371 Discussion 7, October 19, 2002

### 1. Conditions for validity of estimation methods

- Conditions regarding the design of the study:
  - It must be reasonable to regard the data as a random sample from a large population.
  - The observations in the sample must be independent of each other.
- Conditions regarding the form of the population distribution:
  - If $n$ is small, the population distribution must be approximately normal.
  - If $n$ is large, the population distribution need not be approximately normal.

The requirement that the data are a random sample is the most important condition.

### 2. Confidence interval for a population proportion

- The 95% confidence interval for $p$ is $\tilde{p} \pm 1.96\, SE_{\tilde{p}}$, where

  $$\tilde{p} = \frac{y + 2}{n + 4}$$

  and the standard error for $\tilde{p}$ is

  $$SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}.$$

- Planning a study to estimate $p$: if a desired value of SE is specified, and if a rough informed guess of $\tilde{p}$ is available, then the required sample size $n$ can be determined from the following equation:

  $$\text{Desired SE} = \sqrt{\frac{(\text{Guessed } \tilde{p})(1 - \text{Guessed } \tilde{p})}{n + 4}}.$$

### 3. Comparison of two independent samples

- The standard error of $\bar{y}_1 - \bar{y}_2$ is

  $$SE_{(\bar{y}_1 - \bar{y}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{SE_1^2 + SE_2^2},$$

  where $SE_1 = SE_{\bar{y}_1} = s_1/\sqrt{n_1}$ and $SE_2 = SE_{\bar{y}_2} = s_2/\sqrt{n_2}$.

- The $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is constructed as

  $$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\, SE_{(\bar{y}_1 - \bar{y}_2)}.$$

  The degrees of freedom of Student's $t$ distribution are

  $$df = \frac{(SE_1^2 + SE_2^2)^2}{\dfrac{SE_1^4}{n_1 - 1} + \dfrac{SE_2^4}{n_2 - 1}},$$

  where $SE_1 = s_1/\sqrt{n_1}$ and $SE_2 = s_2/\sqrt{n_2}$.

## STAT 371 Discussion 9, April 4, 2006

- TA: Qi Tang
- Office: 1276 MSC
- Email: qitang@stat.wisc.edu

### 1. Analysis of Categorical Data

- The data in each cell must come from randomly selected observations when we use the goodness-of-fit test or the test of independence.
- $\chi^2$ test of goodness of fit
  - State hypotheses. $H_0$: Pr(category 1) $= p_1$, Pr(category 2) $= p_2$, ...; $H_A$: the probabilities are different.
  - Calculate a test statistic:

    $$\chi^2 = \sum_{i \in \text{categories}} \frac{(O_i - E_i)^2}{E_i}, \quad \text{where } E_i = n p_i.$$

    Notice that $\chi^2$ cannot be negative. Larger values indicate a larger discrepancy between what is observed and what is expected.
  - Compare the test statistic to its null distribution. The $\chi^2$ test statistic follows approximately a $\chi^2$ distribution with $\nu$ degrees of freedom under the
null hypothesis, where $\nu$ = number of categories $- 1$.
  - Compute a p-value. R code to compute the p-value: `1 - pchisq(X2, v)`. Here `X2` is the test statistic and `v` is the degrees of freedom.
  - Interpret the results in the context of the problem. A large p-value indicates that the data are consistent with the null hypothesis. We reject the null hypothesis if the p-value is less than or equal to the significance level $\alpha$.
- Test of independence
  - State hypotheses. $H_0$: the row variable and the column variable are independent.
  - Calculate a test statistic. We again use the $\chi^2$ test statistic, but for an $r \times c$ table we have a different formula for finding the expected value for each cell and a different formula for the degrees of freedom:

    $$\text{Expected count in cell } (i, j) = \frac{(\text{sum of row } i) \times (\text{sum of column } j)}{\text{table total}},$$

    $$\text{degrees of freedom} = (\#\text{ of rows} - 1)(\#\text{ of columns} - 1).$$

  - Compare the test statistic to its null distribution. The $\chi^2$ test statistic follows approximately a $\chi^2$ distribution with $\nu$ degrees of freedom under the null hypothesis, where $\nu$ is the degrees of freedom given by the formula above.
  - Compute a p-value. R code to compute the p-value: `1 - pchisq(X2, v)`.

Office hours: M 1:00-2:00 and T 9:30-10:30, 1276 MSC, http://www.stat.wisc.edu/~qitang

- Confidence intervals for $p_1 - p_2$: the adjusted proportions are

  $$\tilde{p}_i = \frac{y_i + 1}{n_i + 2} \quad \text{for } i = 1, 2.$$

  The formula for the 95% confidence interval for a difference in population proportions $p_1 - p_2$ then becomes

  $$(\tilde{p}_1 - \tilde{p}_2) \pm 1.96 \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}}.$$

### 2. Examples

#### Example 1 (10.59 & 10.60, page 441)

For women who are pregnant with twins, complete bed rest in late pregnancy is commonly prescribed in order to reduce the risk of premature delivery. To test the value of this practice, 212 women with twin pregnancies were randomly allocated to a bed rest group or a control group. The accompanying table shows the incidence of preterm delivery.

| | Bed rest | Controls |
|---|---|---|
| No. of preterm deliveries | 32 | 20 |
| No. of women | 105 | 107 |

Let $p_1$ and $p_2$ represent the probabilities of preterm delivery in the two conditions. Construct a 95% confidence interval for $p_1 - p_2$. Does the
confidence interval suggest that bed rest is beneficial?

The numbers of infants with low birthweight (2500 g or less) born to the women are shown in the table.

| | Bed rest | Control |
|---|---|---|
| No. of low-birthweight babies | 76 | 92 |
| Total no. of babies | 210 | 214 |

Let $p_1$ and $p_2$ represent the probabilities of a low-birthweight baby in the two conditions. Explain why the above information is not sufficient to construct a confidence interval for $p_1 - p_2$.

#### Example 2 (10.82, page 457)

A group of mountain climbers participated in a trial to investigate the usefulness of the drug acetazolamide in preventing altitude sickness. The climbers were randomly assigned to receive either drug or placebo during an ascent of Mt. Rainier. The experiment was supposed to be double blind, but the question arose whether some of the climbers might have received clues as to which treatment they were receiving. To investigate this possibility, the climbers were asked to guess which treatment they had received. The results can be cast in the following frequency table, for which $\chi^2 = 5.07$:

| Guess | Treatment received: Drug | Treatment received: Placebo |
|---|---|---|
| Correct | 20 | 12 |
| Incorrect | 11 | 21 |

Alternatively, the same results can be rearranged in the following contingency table, for which $\chi^2 = 0.01$:

| Guess | Treatment received: Drug | Treatment received: Placebo |
|---|---|---|
| Drug | 20 | 21 |
| Placebo | 11 | 12 |

Consider the null hypothesis $H_0$: the blinding was perfect. Carry out the chi-square test of $H_0$ against the alternative that the climbers did receive clues. Let $\alpha = 0.05$.

#### Example 3 (10.96, page 461)

In a study of the effects of smoking cigarettes during pregnancy, researchers examined the placenta from each of 58 women after childbirth. They noted the presence or absence (P or A) of a particular placental abnormality, atrophied villi. In addition, each woman was categorized as a nonsmoker (N), moderate smoker (M), or heavy smoker (H). The following table shows, for each woman, an ID number and the results for smoking (S) and atrophied villi (V).
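| ID | S | V | ID | S | V | ID | S | V | ID | S | V |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | N | A | 16 | H | P | 31 | M | A | 46 | M | A |
| 2 | M | A | 17 | H | P | 32 | M | A | 47 | H | P |
| 3 | N | A | 18 | N | A | 33 | N | A | 48 | H | P |
| 4 | M | A | 19 | M | P | 34 | N | A | 49 | H | A |
| 5 | M | A | 20 | N | P | 35 | N | A | 50 | N | P |
| 6 | M | P | 21 | M | A | 36 | H | P | 51 | N | A |
| 7 | H | P | 22 | H | A | 37 | N | A | 52 | M | P |
| 8 | N | A | 23 | M | P | 38 | H | P | 53 | M | A |
| 9 | N | A | 24 | N | A | 39 | H | P | 54 | H | P |
| 10 | M | P | 25 | N | P | 40 | N | A | 55 | H | A |
| 11 | N | A | 26 | N | A | 41 | M | A | 56 | M | P |
| 12 | N | P | 27 | N | A | 42 | N | A | 57 | H | P |
| 13 | H | P | 28 | M | P | 43 | H | A | 58 | H | P |
| 14 | M | A | 29 | N | A | 44 | M | A | | | |
| 15 | M | P | 30 | N | A | 45 | M | P | | | |

A. Test for a relationship between smoking status and atrophied villi. Use a chi-square test at $\alpha = 0.05$.

B. Prepare a table that shows the total number of women in each smoking category and the number and percentage in each category who had atrophied villi.

C. What pattern appears in the table of part B that is not used by the test of part A?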
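
The test-of-independence recipe from the Analysis of Categorical Data section (expected counts, $\chi^2$ statistic, degrees of freedom, p-value from `pchisq`) can be sketched in R as follows. This is only an illustration, not the handout's own code: the helper name `chisq_independence` and the matrix names are assumptions introduced here. The data are the two tables from Example 2, so the statistics should match the 5.07 and 0.01 quoted there.

```r
# Chi-square test of independence for a two-way table of counts.
# `chisq_independence` is a hypothetical helper name, not a built-in R function.
chisq_independence <- function(observed) {
  # Expected count in cell (i, j) = (sum of row i) * (sum of column j) / table total
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  # Test statistic: sum over all cells of (O - E)^2 / E
  x2 <- sum((observed - expected)^2 / expected)
  # Degrees of freedom = (# of rows - 1) * (# of columns - 1)
  df <- (nrow(observed) - 1) * (ncol(observed) - 1)
  list(statistic = x2, df = df, p.value = 1 - pchisq(x2, df))
}

# Example 2, first table: rows = guess correct / incorrect, columns = drug / placebo
guess_correct <- matrix(c(20, 12,
                          11, 21), nrow = 2, byrow = TRUE)

# Example 2, second table: rows = guessed drug / placebo, columns = drug / placebo
guess_by_treatment <- matrix(c(20, 21,
                               11, 12), nrow = 2, byrow = TRUE)

res1 <- chisq_independence(guess_correct)
res2 <- chisq_independence(guess_by_treatment)

round(res1$statistic, 2)  # 5.07, as quoted for the first table
round(res2$statistic, 2)  # 0.01, as quoted for the second table
res1$p.value              # compare with alpha = 0.05
res2$p.value
```

Base R's `chisq.test(guess_correct, correct = FALSE)` should report the same statistic. The argument `correct = FALSE` switches off the continuity correction that R applies to 2 × 2 tables by default, which the formula in these notes does not use.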

