# Intro Biostatistics PUBHLTH 540

These 26 pages of class notes were uploaded by Agustin Bechtelar on Friday, October 30, 2015. The class notes belong to PUBHLTH 540 at University of Massachusetts, taught by Staff in Fall. Since their upload, they have received 19 views. For similar materials, see /class/232293/pubhlth-540-university-of-massachusetts in Public Health at University of Massachusetts.


Date Created: 10/30/15

PubHlth 540 Introductory Biostatistics

## Unit 6 Estimation: Week 9 Practice Problems

Due Monday, November 24, 2008

The results of IQ tests are known to be normally distributed. Suppose that in 2007 the distribution of IQ test scores for persons aged 18-35 years has a variance $\sigma^2 = 225$. A random sample of 9 persons take the IQ test. The sample mean score is 115.

1. *Before you begin:* This exercise gives you practice calculating a confidence interval for the mean of a Normal distribution in the setting where the variance parameter is known. See lecture notes pp. 17-21.

Calculate the 50%, 75%, 90%, and 95% confidence interval estimates of the unknown population mean IQ score.

2. *Before you begin:* This exercise is asking you to think about and compare two aspects of the concept of a confidence interval: (1) its width, and (2) the level of confidence that we attach to the interval we are reporting. Hint: precision vs. confidence.

What tradeoffs are involved in reporting one interval estimate over another?

3. *Before you begin:* This exercise is a reminder of some of the ideas of sampling distributions. It is asking you to compute a probability for a sample mean. It is actually drawing upon material presented in Unit 5 (The Normal Distribution). Thus, see lecture notes for Unit 5, specifically page 22.

If it is known that the population mean IQ score is $\mu = 105$, what proportion of samples of size 6 will result in sample mean values in the interval 135 to 150?

## Unit 7 Hypothesis Testing: Week 11 Practice Problems

This assignment is optional (read: you all get full credit no matter what).

1. *Before you begin:* This exercise is a two sample, independent groups t-test of equality of means where the variances are unknown but assumed equal. See lecture notes for Unit 7 (Hypothesis Testing) pp. 40-43. See text pp. 304-308.

An independent testing agency was hired prior to the November 2004 election to study whether or not the work output is different for construction workers employed by
the state and receiving prevailing wages versus construction workers in the private sector who are paid rates determined by the free market. A sample of 100 private sector workers reveals an average output of 74.3 parts per hour with a sample standard deviation of 16 parts per hour. A sample of 100 state workers reveals an average output of 69.7 parts per hour with a standard deviation of 18 parts per hour. You may assume equality of variances in your work.

(a) Is there evidence of a difference in productivity at the 0.10 level of significance?

(b) Is there evidence of a difference in productivity at the 0.05 level of significance?

(c) What is the achieved level of significance?

2. *Before you begin:* This exercise is a continuation of the setting in Exercise 1.

For the data in Exercise 1, what level of significance is achieved by the data if the sample means and sample standard deviations are unchanged but the within group sample sizes are

(a) both equal to 10?

(b) both equal to 200?

(c) Comment on the role of sample size in the probability of a type I error.

3. *Before you begin:* This exercise is a paired data setting t-test of equality of means. See lecture notes for Unit 7 (Hypothesis Testing) pp. 36-39. See text pp. 298-302.

Halcion is a sleeping pill that is relatively rapidly metabolized by the body and therefore has fewer hangover effects the next morning compared to other sleeping pills. Opponents of Halcion argue that because this agent is so rapidly metabolized by the body, patients do not sleep as long with this drug as with Dalmane. Data on 10 insomniacs, each of whom took Dalmane on one occasion and Halcion on a second, are collected. The variable measured is number of hours of sleep.

| Patient | Hours of sleep with Dalmane | Hours of sleep with Halcion |
|---|---|---|
| 1 | 4.58 | 3.97 |
| 2 | 5.19 | 4.88 |
| 3 | 3.94 | 4.09 |
| 4 | 6.32 | 5.87 |
| 5 | 7.68 | 6.93 |
| 6 | 3.48 | 4.00 |
| 7 | 5.72 | 5.08 |
| 8 | 7.04 | 6.95 |
| 9 | 5.27 | 4.96 |
| 10 | 5.84 | 5.13 |

Do these data suggest that Halcion is not as effective as Dalmane with respect to number of hours of
sleep? Carry out an appropriate statistical test and interpret your findings.

4. *Before you begin:* This exercise is a confidence interval for the mean of the difference in the paired data setting. See lecture notes for Unit 6 (Estimation) pp. 40-43. See text p. 303.

For the Halcion versus Dalmane data in Exercise 3, construct a 99% confidence interval estimate of the discrepancy in the efficacies of the two drugs. Compare this to the acceptance region that would have been obtained had you constructed a statistical test with type I error prespecified at 0.01.

## Unit 1 Summarizing Data: Week 2 Practice Problems

Due Monday, September 22, 2008

1. *Before you begin:* This exercise gives you practice with the calculation and interpretation of some numerical summaries. See lecture notes pp. 36-47 and/or the text pp. 8-25. Part c gives you some additional practice working with and understanding a box and whisker plot. For this, see lecture notes pp. 25 and/or the text pp. 30-33.

(a) The following are behavioral ratings as measured by the Zang Anxiety Scale (ZAS) for 26 persons with a diagnosis of panic disorder:

53, 51, 46, 45, 40, 35, 59, 51, 45, 60, 35, 45, 38, 53, 43, 31, 36, 40, 41, 41, 38, 69, 41, 46, 38, 36

Compute the mean, median, mode, range, variance, and standard deviation, and the 25th and 75th percentiles.

(b) The following are behavioral ratings as measured by the Zang Anxiety Scale (ZAS) for 21 healthy controls:

26, 26, 25, 25, 25, 28, 26, 26, 25, 34, 30, 31, 28, 26, 34, 25, 25, 25, 28, 25, 25

Compute the mean, median, mode, range, variance, and standard deviation, and the 25th and 75th percentiles.

(c) Construct box and whisker plots using the data from parts "a" and "b". In one or two sentences, compare the two groups.

2. *Before you begin:* This exercise provides additional practice with the frequency and relative frequency table. See lecture notes p. 9 and text pp. 25-27. From there you gain practice working with grouped data and in
particular, the challenge of estimating the values of numerical summaries when you don't have the individual values. See lecture notes page 33 (weighted mean). In applying these ideas, use the midpoint of each age interval. For a bit of extra help in completing this exercise, visit http://www.duncanwil.co.uk/average4.htm

The following table shows the age distribution of cases of a certain disease reported during a year in a particular state.

| Age | Number of cases |
|---|---|
| 5-14 | 5 |
| 15-24 | 10 |
| 25-34 | 20 |
| 35-44 | 22 |
| 45-54 | 13 |
| 55-64 | 5 |
| TOTAL | 75 |

2a. Construct a frequency table with columns for class endpoints, class midpoint, frequency, relative frequency, cumulative frequency, and cumulative relative frequency.

2b. Construct a cumulative relative frequency plot of the data. Use this plot to estimate the 10th, 25th, 50th, and 75th percentiles.

2c. Compute the mean, median, variance, and standard deviation.

3. *Before you begin:* This exercise makes sure that you understand the specifics of a box and whisker plot, apart from clicking on a link. For review, visit the AP Statistics material at stattrek.com.

For women undergoing in vitro fertilization, various therapies are used to stimulate the ovaries. In one study comparing the effectiveness of a new hormone therapy on three groups of women with different types of fertility problems, an outcome of interest is the number of oocytes that ripened. Some summary statistics on the number of ripened oocytes per woman for each of the three groups are reported below.

| Statistic | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| n | 38 | 19 | 21 |
| mean | 13.6 | 6.4 | 8.2 |
| median | 8 | 8 | 7 |
| P25 | 5 | 4 | 5 |
| P75 | 11 | 11 | 12 |
| minimum | 5 | 1 | 4 |
| maximum | 40 | 13 | 14 |

3a. Construct box and whisker plots for the three groups.

3b. In your opinion, which statistics are best for comparing these three groups? Why?

## Unit 2 Introduction to Probability: Week 4 Practice Problems, SOLUTIONS (corrected 10/14/2008)

1. In introductory epidemiology, one of the study designs
that are introduced is the prospective cohort study. In this type of study involving two groups, the investigator enrolls preset and known numbers of participants into each of the two groups that are generically described as exposed and not exposed, and follows them forward to a designated end of the observation period, at which point some outcome is measured.

Consider the following prospective cohort study. A total of 1500 never smoker consenting heart attack survivors aged 60-65 are enrolled as nonexposed. An equal number, 1500, of current smoker heart attack survivors aged 60-65 are enrolled as exposed. All are followed for a full 10 years and the occurrence of death is recorded. Following are the data.

| Exposure status | Dead at 10 years | Alive at 10 years | Total |
|---|---|---|---|
| Current smoker | 40 | 1460 | 1500 |
| Never smoker | 10 | 1490 | 1500 |
| Total | 50 | 2950 | 3000 |

(a) Is it possible to estimate the probability of 10 year survival on the basis of these data?

*Answer:* Yes, but the question is a poor one, as it does not specify time zero, nor among whom. $2950/3000 = 0.9833$.

(b) Is it possible to estimate the relative risk of 10 year mortality that is associated with current cigarette use?

*Answer:* Yes, but this question too is poor. Without a meaningful time zero, the interpretation of the answer is non-existent. $(40/1500)/(10/1500) = 4$.

(c) Is it possible to estimate the probability that a randomly selected person with vital status of Alive at 10 years is a current smoker?

*Answer:* Yes. Even though the study design called for fixed numbers of enrollments of current smokers and never smokers, it is possible to estimate the conditional probability of current smoker for a randomly selected person from among those with vital status at 10 years of Alive.

(d) Using these data, estimate the relative risk of 10 year mortality that is associated with current cigarette use.

*Answer:* $(40/1500)/(10/1500) = 4$.

(e) Using these data, estimate the relative odds of 10 year mortality that is associated with current cigarette use. Note: This question is asking you to compute an odds ratio.

*Answer:* The event of
interest is 10 year mortality. For this event, OR = [Odds among current smokers] / [Odds among never smokers] $= (40/1460)/(10/1490) = 4.082$.

(f) Using these data, estimate the relative odds of a current smoker notation for non-survivors relative to survivors.

*Answer:* Here, the event of interest is current smoker. For this event, OR = [Odds among non-survivors] / [Odds among survivors] $= (40/10)/(1460/1490) = 4.082$.

(g) What do you notice about your answers to e and f?

*Answer:* They are the same. This confirms what is noted in the lecture notes: the OR for the event of disease in a cohort study comparison of exposure groups is equal to the OR for the event of history of exposure in a case control comparison of disease/non-disease groups.

(h) How do your answers to e and f compare to your answer to d?

*Answer:* The OR is bigger than the RR. For rare diseases, the discrepancy becomes negligible.

2. Another study design that is introduced in introductory epidemiology is the retrospective case control study. This is, by definition, a study that compares two groups. Here, the investigator enrolls preset and known numbers of participants into each of the two groups defined by disease status: cases are the enrollees with disease; controls do not have the disease under investigation. Retrospective review of the histories of all study participants is performed to identify the subsets in each of the case and control groups who have a history of the exposure of interest.

Consider the following retrospective case-control study of the association between coffee consumption and tumors of the lower urinary tract. The investigator enrolls 30 consenting cases that are patients with one or more tumors of the lower urinary tract. For comparison purposes, he/she also enrolls 100 consenting controls who have no such tumors. Following are the data.

| History of coffee consumption | Tumors: Yes | Tumors: No | Total |
|---|---|---|---|
| ≥ 5 cups/day | 20 | 44 | 64 |
| < 1 cup/day | 10 | 56 | 66 |
| Total | 30 | 100 | 130 |

(a) Is it possible to estimate the
probability of one or more tumors of the lower urinary tract on the basis of these data?

*Answer:* No, because the case control study design calls for fixed numbers of enrollment into the tumors-yes and tumors-no groups.

(b) Is it possible to estimate the relative risk of one or more tumors of the lower urinary tract that is associated with consumption of 5 or more cups of coffee/day?

*Answer:* No, for a reason that relates to the answer to a. It is not possible to estimate the component probabilities.

(c) Using these data, estimate the relative odds of high coffee consumption (≥ 5 cups/day) among cases relative to controls.

*Answer:* For the event high coffee consumption, OR = [Odds among tumors-yes] / [Odds among tumors-no] $= (20/10)/(44/56) = 2.545$.

(d) Using these data, estimate the relative odds of tumors of the lower urinary tract among high coffee consumers (≥ 5 cups/day) relative to non-coffee drinkers.

*Answer:* For the event tumors-yes, OR = [Odds among high coffee] / [Odds among non-coffee] $= (20/44)/(10/56) = 2.545$.

(e) What do you notice about your answers to c and d?

*Answer:* They are the same.

3. Now consider a fully cross sectional study design, this time with generic counts a, b, c, and d. In this design, the investigator does not do any formal enrollment. Counts are accumulated by observation. CDC surveillance programs are examples.

| History of exposure | Disease: Yes | Disease: No |
|---|---|---|
| Yes | a | b |
| No | c | d |

(a) Using the letters a, b, c, and d, what is the formula for estimating relative odds of the event of exposure for persons with disease compared to that for persons without disease?

*Answer:* $(a/c)/(b/d) = ad/bc$.

(b) Using the letters a, b, c, and d, what is the formula for estimating relative odds of the event of disease for exposed persons compared to that for nonexposed persons?

*Answer:* $(a/b)/(c/d) = ad/bc$.

(c) Using the letters a, b, c, and d, what is the formula for estimating relative risk of the event of disease for exposed persons compared to that for nonexposed persons?

*Answer:* $[a/(a+b)] / [c/(c+d)]$.

(d) What happens to your formula in your
answer to 3c when the counts of disease, a and c, are very, very small? Comment.

*Answer:* As a gets smaller and smaller, $(a+b) \to b$. As c gets smaller and smaller, $(c+d) \to d$. As a result,

$$RR = \frac{a/(a+b)}{c/(c+d)} \to \frac{a/b}{c/d} = \frac{ad}{bc} = OR.$$

So it is sometimes possible to estimate RR from a case control study because

- $OR_{\text{event = disease}} = OR_{\text{event = exposure}}$.
- When the disease is rare, $OR_{\text{event = disease}} \approx RR_{\text{event = disease}}$.
- When the disease is rare, $OR_{\text{event = exposure}} \approx RR_{\text{event = disease}}$.

## Unit 2 Summarizing Data: Week 2 Practice Problems, Solutions

**1A.** A stem and leaf diagram might come in handy. Stems are in the first column; leaves are in the others.

| Stem | Leaves (as entered) | Leaves (sorted) |
|---|---|---|
| 3 | 6 8 8 5 1 8 6 5 | 1 5 5 6 6 8 8 8 |
| 4 | 5 0 1 6 5 1 6 5 3 1 0 | 0 0 1 1 1 3 5 5 5 6 6 |
| 5 | 3 9 1 1 3 | 1 1 3 3 9 |
| 6 | 9 0 | 0 9 |

**Mean:** $\bar{x} = \frac{1}{n}\sum_{i=1}^{26} x_i = \frac{1156}{26} = 44.46$, so $\bar{x} = 44.5$.

**Median:** First solve $(n+1)/2 = (26+1)/2 = 13.5$. The median is the midpoint of the 13th and 14th observations, $(41+43)/2$, so the median is 42.

**Mode:** This sample is tri-modal: 38, 41, 45.

**Range:** Maximum $-$ Minimum $= 69 - 31$, so range = 38.

**Variance:** Let's save ourselves the trouble of a very long brute force formula by using the formula for grouped data. Let j index the unique values. There are 14 unique values.

| j | $x_j$ | $f_j$ | $(x_j - \bar{x})^2$ | $f_j (x_j - \bar{x})^2$ |
|---|---|---|---|---|
| 1 | 31 | 1 | 182.25 | 182.25 |
| 2 | 35 | 2 | 90.25 | 180.50 |
| 3 | 36 | 2 | 72.25 | 144.50 |
| 4 | 38 | 3 | 42.25 | 126.75 |
| 5 | 40 | 2 | 20.25 | 40.50 |
| 6 | 41 | 3 | 12.25 | 36.75 |
| 7 | 43 | 1 | 2.25 | 2.25 |
| 8 | 45 | 3 | 0.25 | 0.75 |
| 9 | 46 | 2 | 2.25 | 4.50 |
| 10 | 51 | 2 | 42.25 | 84.50 |
| 11 | 53 | 2 | 72.25 | 144.50 |
| 12 | 59 | 1 | 210.25 | 210.25 |
| 13 | 60 | 1 | 240.25 | 240.25 |
| 14 | 69 | 1 | 600.25 | 600.25 |
| TOTALS | | 26 | | 1998.50 |

$$S^2 = \frac{\sum_{j=1}^{14} f_j (x_j - \bar{x})^2}{\sum_{j=1}^{14} f_j - 1} = \frac{1998.50}{25}, \text{ so } S^2 = 79.94.$$

**Standard deviation:** $S = \sqrt{S^2}$, so $S = 8.94$.

**25th percentile:** First solve $0.25\,n = (0.25)(26) = 6.5$. So the 25th percentile is the 7th observation: P25 = 38.

**75th percentile:** First solve $0.75\,n = (0.75)(26) = 19.5$. So the 75th percentile is the 20th observation: P75 = 51.

**1B.**

| Stem | Leaves |
|---|---|
| 2 | 5 5 5 5 5 5 5 5 5 |
| 2 | 6 6 6 6 6 |
| 2 | 8 8 8 |
| 3 | 0 1 |
| 3 | 4 4 |

**Mean:** $\bar{x} = \frac{1}{n}\sum_{i=1}^{21} x_i = \frac{568}{21} = 27.04$, so $\bar{x} = 27.0$.

**Median:** Solving $(n+1)/2 = (21+1)/2 = 11$. The median is the 11th observation, so the median is 26.

**Mode:** mode = 25.

**Range:** Maximum
$-$ Minimum $= 34 - 25$, so range = 9.

**Variance:** There are 6 unique values.

| j | $x_j$ | $f_j$ | $(x_j - \bar{x})^2$ | $f_j (x_j - \bar{x})^2$ |
|---|---|---|---|---|
| 1 | 25 | 9 | 4 | 36 |
| 2 | 26 | 5 | 1 | 5 |
| 3 | 28 | 3 | 1 | 3 |
| 4 | 30 | 1 | 9 | 9 |
| 5 | 31 | 1 | 16 | 16 |
| 6 | 34 | 2 | 49 | 98 |
| TOTALS | | 21 | | 167 |

$$S^2 = \frac{\sum_{j=1}^{6} f_j (x_j - \bar{x})^2}{\sum_{j=1}^{6} f_j - 1} = \frac{167}{20}, \text{ so } S^2 = 8.35.$$

**Standard deviation:** $S = \sqrt{S^2} = \sqrt{8.35}$, so $S = 2.89$.

**25th percentile:** Solving $0.25\,n = (0.25)(21) = 5.25$. So the 25th percentile is the 6th observation: P25 = 25. Note: I get this by noticing from the table above that the smallest value, 25, occurs with a frequency of 9 times in the sample.

**75th percentile:** Solving $0.75\,n = (0.75)(21) = 15.75$. So the 75th percentile is the 16th observation: P75 = 28. Note: I get this by noticing in the table that the value 28 occurs with a frequency of 3 times in the sample and comes after the first 9 observations, all equal to 25, and after the next 5 observations, all equal to 26, so that the value of 28 is the 15th, 16th, and 17th observations in the ordered sample.

**1C.** REMINDER: Use the same scale when comparing two groups.

| | Patients | Controls |
|---|---|---|
| Mean | 44.5 | 27.0 |
| Median | 42 | 26 |
| P25 | 38 | 25 |
| P75 | 51 | 28 |
| Interquartile range (IQR) | 13 | 3 |
| P25 $-$ 1.5 IQR | 18.5 | 20.5 |
| P75 $+$ 1.5 IQR | 70.5 | 32.5 |
| Min | 31 | 25 |
| Max | 69 | 34 |

Notes on whiskers:

1. If P25 $-$ 1.5 IQR < minimum of the actual data, use the minimum of the actual data instead.
2. If P75 $+$ 1.5 IQR > maximum of the actual data, use the maximum of the actual data instead.

[Figure: Exercise 1C box and whisker plot of ZAS score; Healthy (n = 21) and Panic Disorder (n = 26)]

**2A.**

| Class endpoints | Class midpoint | Frequency | Relative frequency | Cumulative frequency | Cumulative relative frequency |
|---|---|---|---|---|---|
| 5-14.99 | 10 | 5 | .067 | 5 | .067 |
| 15-24.99 | 20 | 10 | .133 | 15 | .200 |
| 25-34.99 | 30 | 20 | .267 | 35 | .467 |
| 35-44.99 | 40 | 22 | .293 | 57 | .760 |
| 45-54.99 | 50 | 13 | .173 | 70 | .933 |
| 55-64.99 | 60 | 5 | .067 | 75 | 1.000 |
| TOTALS | | 75 | 1.000 | | |

**2B.** A cumulative relative frequency polygon for grouped data is, unfortunately, not straightforward in SAS or Stata or SPSS or Minitab.

Solution using Excel:

Step 1: Enter your x and y points into your worksheet such that x = endpoint of class interval, y =
Cumulative relative frequency for the interval. Note: Be sure to include an (x, y) = (0, 0).

| x = age | y = cumulative relative frequency |
|---|---|
| 0 | 0 |
| 15 | 0.067 |
| 25 | 0.2 |
| 35 | 0.467 |
| 45 | 0.76 |
| 55 | 0.933 |
| 65 | 1 |

Step 2: Use the Chart Wizard in Excel as follows.

1. Highlight the data you want to plot.
2. Click on the Chart Wizard from the upper toolbar.
3. Under "Chart type", select XY (Scatter). Under "Chart sub-type", select the plot with the dots connected. Click Next.
4. You should see the Chart Source Data screen (Step 2 of 4). Click Next.
5. You will then see a Chart Wizard Step 3 of 4 menu that lets you add legends and titles, etc. And, if you like, you can change such things as shading, tick marks, etc.

After some aesthetics on my part, this is what I got:

[Figure: Week 2 Problem 2B, cumulative relative frequency plotted against age]

Estimates are P10 = 17, P25 = 26, P50 = 36, P75 = 44.5.

**2C.**

| Midpoint $x_j$ | Frequency $f_j$ | $x_j f_j$ | $(x_j - \bar{x})$ | $f_j (x_j - \bar{x})^2$ |
|---|---|---|---|---|
| 10 | 5 | 50 | -25.7 | 3302.45 |
| 20 | 10 | 200 | -15.7 | 2464.90 |
| 30 | 20 | 600 | -5.7 | 649.80 |
| 40 | 22 | 880 | 4.3 | 406.78 |
| 50 | 13 | 650 | 14.3 | 2658.37 |
| 60 | 5 | 300 | 24.3 | 2952.45 |
| Total | 75 | 2680 | | 12434.75 |

**Mean:** $\bar{x} = \frac{\sum_{j=1}^{6} f_j x_j}{\sum_{j=1}^{6} f_j} = \frac{2680}{75}$, so $\bar{x} = 35.7$.

**Median:** Note to reader: I've consulted a number of texts on this. There is no single correct answer. With interval data, whatever median you calculate is an approximation. Here is the approach in *Think and Explain with Statistics* (Lincoln E. Moses, page 64).

First solve $(n+1)/2 = (75+1)/2 = 38$, the 38th observation. Examination of the table reveals
that the 38th observation is in the interval 35 to 44.99. Set the following quantities:

- l = lower limit of interval = 35
- u = upper limit of interval = 44.99
- R = cumulative frequency up to the lower limit of interval = 35
- M = observations contained in interval = 22
- N = total observations = 75

An approximate solution for the median is calculated as

$$l + \left[\frac{N/2 - R}{M}\right](u - l) = 35 + \left[\frac{75/2 - 35}{22}\right](44.99 - 35) = 36.135, \text{ or } 37.$$

**Variance:**

$$S^2 = \frac{\sum_{j=1}^{6} f_j (x_j - \bar{x})^2}{\sum_{j=1}^{6} f_j - 1} = \frac{12434.75}{74}, \text{ so } S^2 = 168.04.$$

**Standard deviation:** $S = \sqrt{S^2}$, so $S = 13.0$.

**3A.** Remember to use the same scale.

| | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| median | 8 | 8 | 7 |
| mean | 13.6 | 6.4 | 8.2 |
| left box edge = P25 | 5 | 4 | 5 |
| right box edge = P75 | 11 | 11 | 12 |
| IQR = P75 $-$ P25 | 6 | 7 | 7 |
| P25 $-$ 1.5 IQR | -4 | -6.5 | -5.5 |
| P75 $+$ 1.5 IQR | 20 | 21.5 | 22.5 |
| left whisker | 5 | 1 | 4 |
| right whisker | 20 | 13 | 14 |

**3B.** When data are skewed by extreme values, medians and quartiles give a better feel for the bulk of the data than do means and standard deviations. This example also illustrates that, as sample size increases, the range can only increase. Notice that the extreme value of 40 occurred in the sample with the largest sample size.

## Unit 6 Estimation: Week 10 Practice Problems

Due Monday, December 1, 2008

1. *Before you begin:* This exercise is a straightforward confidence interval calculation for a binomial proportion. See lecture notes pp. 60-64 and/or text pp. 205-206.

An entomologist samples a field for egg masses of a harmful insect by placing a yard square frame at random locations and carefully examining the ground within the frame. A simple random sample of 75 locations selected from a county's pasture land found egg masses in 13 locations. Compute a 95% confidence interval estimate of the proportion of all possible locations that are infested.

2. *Before you begin:* This exercise is NOT a mimicking of the lecture notes. It is asking you to start your thinking from the WIDTH of a confidence interval and then reason your solution from there. Have a look at the lecture notes, page 21. See also text pp.
187-194.

Alzheimer's disease has a poorer prognosis when it is diagnosed at a relatively young age. Suppose we want to estimate the age at which the disease was first diagnosed using a 90% confidence interval. Under the assumption that the distribution of age at diagnosis is normal, if the population variance is $\sigma^2 = 85$, how large a sample size is required if we want a confidence interval that is 10 years wide?

3. *Before you begin:* In this exercise you get practice in combining the ideas of estimation in unit 6 and the z-score methods that you learned in unit 5. In unit 6, see page 46. In unit 5, see page I.

The National Health and Nutrition Examination Survey of 1975-1980 gives the following data on serum cholesterol levels in US males.

| Group | Age (years) | Population mean, $\mu$ | Population standard deviation, $\sigma$ |
|---|---|---|---|
| 1 | 20-24 | 180 | 43 |
| 2 | 25-34 | 199 | 49 |

Suppose the distribution of serum cholesterol is normal in each age group. If you draw simple random samples of size 50 from each of the two groups, what is the probability that the difference between the two sample means (Group 2 mean $-$ Group 1 mean) will be more than 25?

Source: National Center for Health Statistics, R. Fulwood, W. Kalsbeek, B. Rifkind, et al., Total serum cholesterol levels of adults 20-74 years of age: United States, 1976-1980, Vital and Health Statistics, Series II, No. 236, DHHS Pub. No. (PHS) 86-1686, Public Health Service, Washington DC, US Government Printing Office, May 1986. Cited in Daniel, p. 140 (5.4.1). Copyright 1999 by John Wiley & Sons, Inc. By permission of John Wiley.

4. *Before you begin:* The solution to this exercise uses the same approach that is described on page 37 of the unit 6 lecture notes. Also, the example on page 38. Pages in the text are pp. 199-201.

The objectives of a study by Kennedy and Bhambhani (1991) were to use physiological measurements to determine the test-retest reliability of the Baltimore Therapeutic Equipment Work Simulator during three simulated tasks performed at light, medium, and heavy
work intensities, and to examine the criterion validity of these tasks by comparing them to real tasks performed in a controlled laboratory setting. Subjects were 30 healthy men between the ages of 18 and 35. The investigators reported a standard deviation of s = 0.57 for the variable peak oxygen consumption (l/min) during one of the procedures. Assuming normality, compute a 95% confidence interval for the population variance for the oxygen consumption variable.

5. *Before you begin:* This exercise draws from the material presented on pp. 40-44 of the unit 6 notes. Because two measurements are made on each patient, the data in this exercise are paired.

The purpose of an investigation by Alahuhta et al. (1991) was to evaluate the influence of extradural block for elective caesarian section simultaneously on several maternal and fetal hemodynamic variables and to determine if the block modified fetal myocardial function. The study subjects were eight healthy parturients in gestational weeks 38-42 with uncomplicated singleton pregnancies undergoing elective caesarian section under extradural anesthesia. Among the measurements taken were maternal diastolic arterial pressure during two stages of the study. The following are the lowest values of this variable at the two stages. Compute a 95% confidence interval for the difference in diastolic blood pressure between the two stages.

| Patient ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Stage 1 | 70 | 87 | 72 | 70 | 73 | 66 | 63 | 57 |
| Stage 2 | 79 | 87 | 73 | 77 | 80 | 64 | 64 | 60 |

Source: Alahuhta S, Rasanen J, Jouppila R, Jouppila P, Kangas-Saarela T, and Hoomen AI (1991). Uteroplacental and fetal hemodynamics during extradural anesthesia for caesarian section. British Journal of Anesthesia 66, 319-323. Cited in Daniel, p. 248 (7.4.3). Copyright 1999 by John Wiley & Sons, Inc. By permission of John Wiley.

6. *Before you begin:* The ideas and calculations pertaining to confidence intervals for the ratio of two variances are described in the lecture notes for unit 6, pp. 57-59.
SUGGESTION: Use an F distribution calculator on the web. This will spare you from having to use the formula on the bottom of page 55.

A possible environmental determinant of lung function in children is the amount of cigarette smoking in the home. To study this question, two groups of children were studied. Group 1 consisted of 23 nonsmoking children aged 5-9, both of whose parents smoke in the home. Group 2 consisted of 20 nonsmoking children aged 5-9, neither of whose parents smoke. The mean (SD) of FEV1 for Group 1 is 2.1 L (0.7), and for the Group 2 children the mean (SD) of FEV1 is 2.3 L (0.4). Under the assumption of normality, construct a 95% confidence interval for the ratio of the variances of the two groups. What is your conclusion regarding the reasonableness of the assumption of equality of variances?

7. *Before you begin:* See lecture notes for unit 6, pp. 45-52. See also text pp. 308-310.

For the same data in problem 6, and drawing upon your answer to 6 regarding the reasonableness of equality of variances, compute a 95% confidence interval for the true mean difference in FEV1 between 5-9 year old children whose parents smoke and comparable children whose parents do not smoke.

## Unit 2 Introduction to Probability: Week 4 Practice Problems

Due Monday, October 13, 2008

These exercises, as you will see, emphasize introductory probability as it applies to epidemiology. Take your time with these. They are a little more involved conceptually. But hopefully well worth it.

1. *Before you begin:* Parts a-c invite you to pause in your thinking. Parts d-h are straightforward application of the ideas of relative risk and odds ratio. See lecture notes pp. 39-47 and/or the text pp. 63-64.

In introductory epidemiology, one of the study designs that are introduced is the prospective cohort study. In this type of study involving two groups, the investigator enrolls preset and known numbers of participants into each of the two groups that are generically described as exposed
and not exposed, and follows them forward to a designated end of the observation period, at which point some outcome is measured.

Consider the following prospective cohort study. A total of 1500 never smoker consenting heart attack survivors aged 60-65 are enrolled as nonexposed. An equal number, 1500, of current smoker heart attack survivors aged 60-65 are enrolled as exposed. All are followed for a full 10 years and the occurrence of death is recorded. Following are the data.

| Exposure status | Dead at 10 years | Alive at 10 years | Total |
|---|---|---|---|
| Current smoker | 40 | 1460 | 1500 |
| Never smoker | 10 | 1490 | 1500 |
| Total | 50 | 2950 | 3000 |

(a) Is it possible to estimate the probability of 10 year survival on the basis of these data?

(b) Is it possible to estimate the relative risk of 10 year mortality that is associated with current cigarette use?

(c) Is it possible to estimate the probability that a randomly selected person with a vital status of Alive at 10 years is a current smoker?

(d) Using these data, estimate the relative risk of 10 year mortality that is associated with current cigarette use.

(e) Using these data, estimate the relative odds of 10 year mortality that is associated with current cigarette use. Note: This question is asking you to compute an odds ratio.

(f) Using these data, estimate the relative odds of a current smoker notation for non-survivors relative to survivors.

(g) What do you notice about your answers to e and f?

(h) How do your answers to e and f compare to your answer to d?

2. *Before you begin:* This question is really an elaboration of the thinking that was developed in question 1.

Another study design that is introduced in introductory epidemiology is the retrospective case control study. This is, by definition, a study that compares two groups. Here, the investigator enrolls preset and known numbers of participants into each of the two groups defined by disease status: cases are the enrollees with disease; controls do not have the disease under
investigation. Retrospective review of the histories of all study participants is performed to identify the subsets in each of the case and control groups who have a history of the exposure of interest.

Consider the following retrospective case-control study of the association between coffee consumption and tumors of the lower urinary tract. The investigator enrolls 30 consenting cases that are patients with one or more tumors of the lower urinary tract. For comparison purposes, he/she also enrolls 100 consenting controls who have no such tumors. Following are the data.

| History of coffee consumption | Tumors: Yes | Tumors: No | Total |
|---|---|---|---|
| ≥ 5 cups/day | 20 | 44 | 64 |
| < 1 cup/day | 10 | 56 | 66 |
| Total | 30 | 100 | 130 |

(a) Is it possible to estimate the probability of one or more tumors of the lower urinary tract on the basis of these data?

(b) Is it possible to estimate the relative risk of one or more tumors of the lower urinary tract that is associated with consumption of 5 or more cups of coffee/day?

(c) Using these data, estimate the relative odds of high coffee consumption (≥ 5 cups/day) among cases relative to controls.

(d) Using these data, estimate the relative odds of tumors of the lower urinary tract among high coffee consumers (≥ 5 cups/day) relative to non-coffee drinkers.

(e) What do you notice about your answers to c and d?

3. *Before you begin:* This last question is a bit of a weaving of a story. If you follow along step by step, you will end up seeing for yourself a truly marvelous result. It is the result of page 46 of your lecture notes.

Now consider a fully cross sectional study design, this time with generic counts a, b, c, and d. In this design, the investigator does not do any formal enrollment. Counts are accumulated by observation. CDC surveillance programs are examples.

| History of exposure | Disease: Yes | Disease: No |
|---|---|---|
| Yes | a | b |
| No | c | d |

(a) Using the letters a, b, c, and d, what is the formula for estimating relative odds of the event of exposure for persons
with disease compared to that for persons without disease?

b. Using the letters a, b, c, and d, what is the formula for estimating the relative odds of the event of disease for exposed persons compared to that for non-exposed persons?

c. Using the letters a, b, c, and d, what is the formula for estimating the relative risk of the event of disease for exposed persons compared to that for non-exposed persons?

d. What happens to your formula in your answer to 3b when the counts of disease, a and c, are very, very small? Comment.

PubHlth 540 Introductory Biostatistics. Unit 4: Bernoulli and Binomial Distributions. Week 6 Practice Problems. Revised due date: Monday, November 3, 2008.

1. Before you begin: This exercise gives you practice in calculating the number of ways to choose. See notes pp. 11-13 and/or text pp. 90-94. Need more? There are a few resources on the web too, for example at mathforum.org.

Suppose that my 2008 BE540 class that meets in class in Worcester, MA has 10 students.

a. I wish to pair up students to work on homework together. How many pairs of 2 students could I form?

b. Next, I wish to form project groups of size 5. How many groups of 5 students could I form?

2. Before you begin: This exercise is a straightforward application of a binomial probability calculation. See notes pp. 14-18 and/or text pp. 94-95. Just in case, here are two nice resources for the binomial: the Stat Trek binomial tutorial (stattrek.com) and the "AP Statistics Curriculum 2007" wiki page on the binomial distribution.

A die will be rolled six times. What are the chances that, over all six rolls, the die lands neither ace nor deuce exactly 2 times?

3. This is also an application of a binomial probability calculation. Suppose that in the general population there is a 2% chance that a child will be born with a genetic anomaly. What is the probability that no congenital anomaly will be found among four random births?

4. This is a slightly harder application of a binomial
probability calculation. Suppose it is known that for a given couple there is a 25% chance that a child of theirs will have a particular recessive disease. If they have three children, what are the chances that at least one of them will be affected?

5. This exercise is the most involved. Just try it. Suppose a quiz contains 20 true/false questions. You know the correct answer to the first 10 questions. You have no idea of the correct answer to questions 11 through 20 and decide to answer each using the coin toss method. Calculate the probability of obtaining a total quiz score of at least 85%.
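The Week 6 problems above all reduce to two calculations: counting the ways to choose k items from n, and evaluating a binomial probability. A minimal Python sketch of both follows. The function names `binomial_pmf` and `binomial_at_least` are hypothetical helpers, not anything from the course notes, and the parameter choices reflect one reading of each problem: a "success" for the die is any face from 3 to 6, so p = 4/6, and a score of at least 85% on 20 questions is taken to mean at least 17 correct, i.e. at least 7 of the 10 guessed answers.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing the pmf from k through n."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))


# Number of ways to choose (problem 1): unordered selections from 10 students.
pairs_of_two = comb(10, 2)    # 10! / (2! * 8!)
groups_of_five = comb(10, 5)  # 10! / (5! * 5!)

# Problem 2 (assumed reading): "neither ace nor deuce" means a face of 3-6.
p_die = binomial_pmf(2, 6, 4 / 6)

# Problem 3: no anomaly in four births, with a 2% chance per birth.
p_no_anomaly = binomial_pmf(0, 4, 0.02)

# Problem 4: "at least one" is the complement of "none".
p_at_least_one = 1 - binomial_pmf(0, 3, 0.25)

# Problem 5 (assumed reading): 10 answers are certain, so a total of at
# least 17 correct requires at least 7 correct among 10 coin-toss guesses.
p_quiz = binomial_at_least(7, 10, 0.5)

if __name__ == "__main__":
    print("pairs of 2:", pairs_of_two)
    print("groups of 5:", groups_of_five)
    print("die, exactly 2 of 6:", round(p_die, 4))
    print("no anomaly in 4 births:", round(p_no_anomaly, 4))
    print("at least one affected of 3:", round(p_at_least_one, 4))
    print("quiz score at least 85%:", round(p_quiz, 4))
```

The complement trick in problem 4 saves summing three terms: P(at least one) = 1 - P(none). The same idea works in problem 5, but summing the upper tail directly keeps the code closer to how the question is worded.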

