Introduction to Biostatistics

by: Mrs. Triston Collier

Introduction to Biostatistics STAT 541

University of Wisconsin - Madison
About this Document

Date Created: 09/17/15


Linear Regression Review

Recall the study to relate birthweight to the estriol level of pregnant women.

[Figure: scatterplot of birthweight (g/100) against estriol (mg/24 hr)]

Linear Regression - Notation

We have been using capital X or Y to denote a random variable and x or y to denote the values that the respective random variables could assume. In linear regression, we assume that values of X are fixed, not random. The notation above is consistent with this idea. Many books follow this notation. At the same time, many books, including your text, do not. We will follow the notation in your book.

The full linear regression model takes the following form:

$$y = \alpha + \beta x + e,$$

where $e$ is a normally distributed random variable with mean 0 and variance $\sigma^2$. The pairs $(x_i, y_i)$ denote data values. There are 31 $(x_i, y_i)$ pairs shown in the scatterplot.

Linear Regression - Summary of Assumptions

• Values of X are fixed.
• The outcomes of Y are normally distributed, independent random variables with mean $\mu_{y|x}$ and variance $\sigma^2_{y|x}$.
• $\sigma_{y|x}$ is the same for all $x$, that is, $\sigma_{y|x} = \sigma$. This assumption of constant variability across all x values is known as homoscedasticity.
• The relationship between $\mu_{y|x}$ and $x$ is described by the straight line $\mu_{y|x} = \alpha + \beta x$.

How do we get estimates of $\beta$ and $\alpha$?

We would like the line to be as close to the data as possible. The line is given in general by $\alpha + \beta x$. For a fixed $x_i$, the corresponding point on the line would be defined as $\alpha + \beta x_i$. Consider measuring the distance from the data point $(x_i, y_i)$ to the point $(x_i, \alpha + \beta x_i)$ on the line.

The distance from the data point $y_i$ to the value of the line for a given $x_i$ is $y_i - \alpha - \beta x_i$. We could sum up all of the differences. As always, it's a good idea to square the differences so that they don't cancel out. $S$ denotes the sum of squared differences (distances) between data points and the line:

$$S = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2.$$

The least squares line, or estimated regression line, is the line that minimizes $S$. Finding the least squares line means finding the values of $\alpha$ and $\beta$ that minimize $S$.

Once estimates $\hat{\alpha}$ and $\hat{\beta}$ of $\alpha$ and $\beta$ have been computed, the
predicted value of $y$ given $x_i$ is obtained from the estimated regression line

$$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i,$$

where $\hat{y}_i$ is the prediction of the true value of $y$ for observation $i$, $i = 1, \ldots, n$.

With a little calculus you can show that the values minimizing $S$ are

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}.$$

In our example, $n = 31$, $\hat{\beta} = 0.608$ and $\hat{\alpha} = 21.52$. The estimated regression line is $\hat{y} = 21.52 + 0.608x$.

When we estimate $\alpha$ and $\beta$ based on data ($(x_i, y_i)$ pairs), the estimates $\hat{\alpha}$ and $\hat{\beta}$ are called estimated regression coefficients, or just regression coefficients.

The estimated regression line $\hat{y} = 21.52 + 0.608x$ is shown along with the data below.

[Figure: scatterplot of birthweight (g/100) against estriol (mg/24 hr), with the estimated regression line]

Linear regression inference

We would like to be able to use the least-squares regression line $\hat{y} = \hat{\alpha} + \hat{\beta} x$ to make inference about the population regression line $\mu_{y|x} = \alpha + \beta x$. We begin by noting that $\hat{\alpha}$ and $\hat{\beta}$ are point estimates of the population intercept and slope, respectively. As always, if we obtain different data values, the estimates will change.

Suppose we are interested in $\beta$ and want to infer something about $\beta$ using the information in $\hat{\beta}$. In particular, we want to know if there is evidence to reject the null hypothesis $H_0: \beta = 0$.

We also talked about the correlation coefficient $r$. Note that $R^2$ denotes a quantity called the coefficient of determination, and $R^2 = r^2$. The statistic $r$ has the following properties:

• $-1 \le r \le 1$
• $r = 1$ iff $y = \hat{\alpha} + \hat{\beta} x$ for some $\hat{\beta} > 0$
• $r = -1$ iff $y = \hat{\alpha} + \hat{\beta} x$ for some $\hat{\beta} < 0$
• $r$ remains the same under location-scale transforms of $x$ and $y$
• $r$ measures the extent of linear association
• $r$ tends to be close to zero if there is no linear association between $x$ and $y$

If the linear regression model holds and we were to select repeated samples of size $n$ from the underlying population of paired outcomes $(x, y)$ and calculate a least squares line for each set of observations, the estimated values of $\alpha$ and $\beta$ would vary from sample to sample. To test any hypotheses regarding the underlying population regression coefficients, we need to know or estimate the standard error of the estimates; we showed formulas for these
standard errors last time. We conducted a test of $H_0: \beta = 0$ against $H_A: \beta \neq 0$ last time.

Another simple linear regression example

Consider a study that investigates the relationship between head circumference and gestational age for 100 infants. Using techniques from linear regression, a regression line was fit to the data. The line is shown below.

[Figure: head circumference (cm) against gestational age (weeks), with the fitted regression line]

For that data, $\hat{\alpha} = 3.9143$ and $\hat{\beta} = 0.7801$, and so $\hat{y} = 3.9143 + 0.7801x$. In the context of this example, we know that the prediction of the mean head circumference at age 29 weeks is given by $\hat{y} = 3.9143 + 0.7801(29) \approx 26.54$ cm.

Question: How accurate is this prediction?

Answer: It depends on whether we are making predictions for the mean value of all infants that are 29 gestational weeks or for one specific infant at age 29 weeks. The first answer might be useful to a researcher interested in the relationship between head circumference and age at 29 weeks over a large population of infants; the second answer might be useful to an MD interested in assessing the head circumference of a particular infant.

Using the interval formulas given below, we could show that for gestational age fixed at 29 weeks, a 95% confidence interval for the mean value of $y$ is (26.23, 26.85). Similarly, we could show that for gestational age fixed at 29 weeks, a 95% prediction interval for an individual value of $y$ is (23.38, 29.70). Using this interval, we can identify individuals that look unusual.

We would like to construct a confidence interval around the true mean for a fixed value of $x$, $\mu_{y|x}$, and a confidence interval (most commonly called a prediction interval) for a future value (new value) of $y$, again for a fixed $x$. In the intervals below, $\alpha$ denotes the significance level, not the intercept.

A $100(1-\alpha)\%$ confidence interval for the mean value of $y$ for a fixed $x$, $\mu_{y|x}$, is

$$\left(\hat{y} - t_{n-2,\,1-\alpha/2}\,\mathrm{se}(\hat{y}),\;\; \hat{y} + t_{n-2,\,1-\alpha/2}\,\mathrm{se}(\hat{y})\right),$$

where $\hat{y}$ is the predicted mean of the normally distributed outcomes. The standard error of $\hat{y}$ is given by

$$\mathrm{se}(\hat{y}) = s_{y|x}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}, \qquad s_{y|x} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-2}}.$$

A $100(1-\alpha)\%$ confidence interval for a predicted future value of $y$, denoted by $\tilde{y}$, for a given $x$ is

$$\left(\tilde{y} - t_{n-2,\,1-\alpha/2}\,\mathrm{se}(\tilde{y}),\;\; \tilde{y} + t_{n-2,\,1-\alpha/2}\,\mathrm{se}(\tilde{y})\right), \qquad \mathrm{se}(\tilde{y}) = s_{y|x}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}.$$

The standard error of $\tilde{y}$ is always bigger than the standard error of $\hat{y}$, since an extra source of variability is
being considered.

Nenana Ice Classic - Alaska's coolest lottery

See http://www.nenanaakiceclassic.com

[Figure: two panels plotted against Year; one vertical axis is labeled Time]
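
Returning to the regression review, the least-squares estimates and the two kinds of interval can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy data, and the choice to pass the t critical value in by hand are all assumptions made for the example, and the standard errors use the usual textbook formulas for simple linear regression. The toy data are not the estriol or head circumference data.

```python
import math


def least_squares(x, y):
    """Return (a_hat, b_hat) minimizing S = sum (y_i - a - b * x_i)^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def intervals_at(x, y, x0, t_crit):
    """Predicted value at x0, plus the interval for the mean of y at x0
    and the (wider) prediction interval for one new y at x0.

    t_crit is the t quantile with n - 2 degrees of freedom, looked up
    by the caller (hypothetical design choice to avoid extra libraries).
    """
    n = len(x)
    a_hat, b_hat = least_squares(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y0 = a_hat + b_hat * x0
    se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    se_new = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    ci = (y0 - t_crit * se_mean, y0 + t_crit * se_mean)
    pi = (y0 - t_crit * se_new, y0 + t_crit * se_new)
    return y0, ci, pi


# Made-up toy data (n = 5), purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a_hat, b_hat = least_squares(x, y)
T_CRIT = 3.182  # t table value for 3 degrees of freedom, 0.975 quantile
y0, ci, pi = intervals_at(x, y, 3.0, T_CRIT)

print("intercept:", a_hat, "slope:", b_hat)
print("prediction at x = 3:", y0)
print("95% interval for the mean:", ci)
print("95% prediction interval:", pi)
```

The prediction interval always contains the interval for the mean, which reflects the extra source of variability described above.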


