by: Ramona Leannon



About this Document


These 55 pages of class notes were uploaded by Ramona Leannon on Wednesday, September 9, 2015. The notes belong to BIOST 515 at the University of Washington, taught by Staff in Fall. Since its upload, the document has received 15 views.

Date Created: 09/09/15

Lecture 6: Multiple Linear Regression (cont.)
BIOST 515
January 22, 2004

Testing general linear hypotheses

Suppose we are interested in testing linear combinations of the regression coefficients. For example, we might be interested in testing whether two regression coefficients are equal:

$$H_0: \beta_i = \beta_j \quad\text{or, equivalently,}\quad H_0: \beta_i - \beta_j = 0.$$

Such hypotheses can be expressed as $H_0: T\beta = 0$, where $T$ is an $m \times (p+1)$ matrix of constants such that only $r$ of the $m$ equations in $T\beta = 0$ are independent.

For example, consider the model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

and testing the hypothesis $H_0: \beta_1 - \beta_2 = 0$. This hypothesis is equivalent to

$$H_0: \begin{pmatrix} 0 & 1 & -1 & 0 \end{pmatrix}\beta = 0.$$

We may also consider the hypothesis $H_0: \beta_1 - \beta_2 = 0,\ \beta_3 = 0$, which is equivalent to $H_0: T\beta = 0$, where

$$T = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

We can use sums of squares to test general linear hypotheses. The full model is $y = X\beta + \epsilon$, with residual sum of squares

$$SSE(FM) = y'y - \hat\beta' X' y$$

on $n - p - 1$ degrees of freedom. Obtain the reduced model by solving $T\beta = 0$ for $r$ of the regression coefficients in the full model in terms of the remaining $p + 1 - r$ regression coefficients. Substituting these values into the full model yields the reduced model

$$y = Z\gamma + \epsilon,$$

where $Z$ is an $n \times (p + 1 - r)$ matrix and $\gamma$ is a $(p + 1 - r) \times 1$ vector of unknown regression coefficients. The residual sum of squares for the reduced model is

$$SSE(RM) = y'y - \hat\gamma' Z' y$$

on $n - p - 1 + r$ degrees of freedom. $SSE(RM) - SSE(FM)$ is called the sum of squares due to the hypothesis $T\beta = 0$ and has $r$ degrees of freedom. We can test this hypothesis using

$$F_0 = \frac{[SSE(RM) - SSE(FM)]/r}{SSE(FM)/(n - p - 1)} \sim F_{r,\, n-p-1}.$$

CHS smoking example

Recall the example where smoking status was recoded to

$$\text{smoke1} = \begin{cases} 1 & \text{never smoked} \\ 0 & \text{otherwise} \end{cases} \qquad \text{smoke2} = \begin{cases} 1 & \text{former smoker} \\ 0 & \text{otherwise} \end{cases}$$

and we fit the model $\text{BP}_i = \beta_0 + \beta_1 \text{smoke1}_i + \beta_2 \text{smoke2}_i + \epsilon_i$:

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   69.2963      1.6176    42.84    0.0000
smoke1         2.9860      1.7629     1.69    0.0909
smoke2         2.6239      1.8162     1.44    0.1492

We may be interested in testing $H_0: \beta_1 = \beta_2$, which is equivalent to testing $H_0: \begin{pmatrix} 0 & 1 & -1 \end{pmatrix}\beta = 0$. The full model is

$$\text{BP}_i = \beta_0 + \beta_1 \text{smoke1}_i + \beta_2 \text{smoke2}_i + \epsilon_i,$$

and the reduced model is

$$\text{BP}_i = \beta_0 + \beta_1 \text{smoke1}_i + \beta_1 \text{smoke2}_i + \epsilon_i = \beta_0 + \beta_1 (\text{smoke1}_i + \text{smoke2}_i) + \epsilon_i = \gamma_0 + \gamma_1 z_i + \epsilon_i,$$

where $z_i = \text{smoke1}_i + \text{smoke2}_i$. The reduced model is equivalent to the model we fit comparing current smokers with former and never smokers combined.

Full model:
           Df    Sum Sq   Mean Sq  F value  Pr(>F)
smoke1      1    101.65    101.65     0.79  0.3737
smoke2      1    267.61    267.61     2.09  0.1492
Residuals 495  63465.82    128.21

Reduced model:
           Df    Sum Sq   Mean Sq  F value  Pr(>F)
smoker      1    354.93    354.93     2.77  0.0965
Residuals 496  63480.15    127.98

$$F_0 = \frac{(63480.15 - 63465.82)/1}{128.21} = 0.11 < 3.86 = F_{1,495,.95}.$$

Therefore we fail to reject the null hypothesis.

We could also test this hypothesis using the t statistic

$$t_0 = \frac{\hat\beta_1 - \hat\beta_2}{\sqrt{\hat\sigma^2 (C_{11} + C_{22} - 2C_{12})}},$$

where

$$C = (X'X)^{-1} = \begin{pmatrix} 0.0204 & -0.0204 & -0.0204 \\ -0.0204 & 0.0242 & 0.0204 \\ -0.0204 & 0.0204 & 0.0257 \end{pmatrix}.$$

Therefore

$$t_0 = \frac{2.986 - 2.624}{\sqrt{128.21 \times (0.0242 + 0.0257 - 2 \times 0.0204)}} = 0.335 < t_{n-p-1,\,.975},$$

and again we fail to reject $H_0$.

Now consider the model $\text{BP}_i = \beta_0 + \beta_1 \text{smoke1}_i + \beta_2 \text{smoke2}_i + \beta_3 \text{age}_i + \epsilon_i$:

           Df    Sum Sq   Mean Sq  F value  Pr(>F)
smoke1      1    101.65    101.65     0.83  0.3638
smoke2      1    267.61    267.61     2.18  0.1409
AGE         1   2687.39   2687.39    21.84  0.0000
Residuals 494  60778.42    123.03

Suppose we want to test $H_0: \beta_1 = \beta_2,\ \beta_3 = 0$, which is equivalent to

$$H_0: \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\beta = 0.$$

The reduced model is

$$\text{BP}_i = \beta_0 + \beta_1 (\text{smoke1}_i + \text{smoke2}_i) + \epsilon_i = \gamma_0 + \gamma_1 z_i + \epsilon_i,$$

and

$$F_0 = \frac{(63480.15 - 60778.42)/2}{123.03} = 10.98 > F_{2,494,.95} = 3.01.$$

We reject the null hypothesis.

Confidence intervals in multiple linear regression

• Confidence interval for a single coefficient
• Confidence interval for a fitted value
• Simultaneous confidence intervals on multiple coefficients

Confidence interval for a single coefficient

We can construct a confidence interval for $\beta_j$ as follows. Given that

$$\frac{\hat\beta_j - \beta_j}{\sqrt{\hat\sigma^2 C_{jj}}} \sim t_{n-p-1},$$

we can define a $100(1-\alpha)\%$ confidence interval for $\beta_j$ as

$$\hat\beta_j \pm t_{n-p-1,\,1-\alpha/2} \sqrt{\hat\sigma^2 C_{jj}}.$$

Confidence interval for a fitted value

We can construct a confidence interval for the fitted response at a set of predictor values $x_{01}, x_{02}, \ldots, x_{0p}$. Define the vector

$$\mathbf{x}_0 = (1, x_{01}, x_{02}, \ldots, x_{0p})'.$$

The fitted value at this point is $\hat y_0 = \mathbf{x}_0' \hat\beta$, and $\hat y_0$ is an
unbiased estimator of $E(y \mid \mathbf{x}_0)$, with variance

$$\operatorname{var}(\hat y_0) = \sigma^2 \mathbf{x}_0' (X'X)^{-1} \mathbf{x}_0.$$

Therefore the $100(1-\alpha)\%$ confidence interval for the fitted response at $x_{01}, x_{02}, \ldots, x_{0p}$ is

$$\hat y_0 \pm t_{n-p-1,\,1-\alpha/2} \sqrt{\hat\sigma^2 \mathbf{x}_0' (X'X)^{-1} \mathbf{x}_0}.$$

Example from CHS

In the last lecture we fit the model

$$\text{BP}_i = \beta_0 + \beta_1 \text{weight}_i + \beta_2 \text{height}_i + \beta_3 \text{age}_i + \beta_4 \text{gender}_i + \epsilon_i.$$

Let's calculate the confidence interval for the fitted value for the 100th subject, who has covariate vector $\mathbf{x}_0 = (1, 194.8, 159.2, 70.0, 0)'$. The fitted value for BP is 73.67, $\mathbf{x}_0'(X'X)^{-1}\mathbf{x}_0 = 0.007972541$, and the 95% confidence interval is

$$73.67 \pm 1.96 \times 11.11 \times \sqrt{0.007972541} = (71.72,\ 75.62).$$

[Figure: confidence interval for BP, with panels for Height, Weight and Age]

Simultaneous confidence intervals

Sometimes we may be interested in specifying a $100(1-\alpha)\%$ confidence interval or region for the entire set, or a subset, of the coefficients. Since

$$\frac{(\hat\beta - \beta)' X'X (\hat\beta - \beta)}{(p+1)\,MSE} \sim F_{p+1,\, n-p-1},$$

we can define a $100(1-\alpha)\%$ joint confidence region for all the parameters in $\beta$ as

$$\frac{(\hat\beta - \beta)' X'X (\hat\beta - \beta)}{(p+1)\,MSE} \le F_{p+1,\, n-p-1,\, 1-\alpha}.$$

Bonferroni intervals

Another general approach for obtaining simultaneous confidence intervals is

$$\hat\beta_j \pm \Delta\, \widehat{se}(\hat\beta_j), \qquad j = 0, 1, \ldots, p.$$

Using the Bonferroni method we set $\Delta = t_{n-p-1,\,1-\alpha/(2(p+1))}$, leading to the Bonferroni confidence interval

$$\hat\beta_j \pm t_{n-p-1,\,1-\alpha/(2(p+1))}\, \widehat{se}(\hat\beta_j).$$

Bonferroni intervals: CHS example

$$\text{BP}_i = \beta_0 + \beta_1 \text{weight}_i + \beta_2 \text{height}_i + \beta_3 \text{age}_i + \beta_4 \text{gender}_i + \epsilon_i$$

The Bonferroni intervals are $\hat\beta_j \pm t_{493,\,.995}\, \widehat{se}(\hat\beta_j)$:

              Lower    Upper
(Intercept)   49.25   131.64
WEIGHT        -0.01     0.08
HEIGHT        -0.22     0.22
AGE           -0.57    -0.09
GENDER        -3.11     4.78

Hidden extrapolation in multiple regression

[Figure: predictors X1 and X2, illustrating hidden extrapolation]

R^2 and adjusted R^2

As in simple linear regression,

$$R^2 = 1 - \frac{SSE}{SSTO}.$$

In general, $R^2$ increases whenever new terms are added to the model. Therefore, for model comparison we
may prefer to use an $R^2$ that is adjusted for the number of predictors in the model. This is the adjusted $R^2$:

$$R^2_{adj} = 1 - \frac{MSE}{SSTO/(n-1)}.$$

Predictors                     R^2      R^2_adj
weight, height, age, gender    0.0464   0.0386
smoke1, smoke2                 0.0058   0.0018
weight, height                 0.0221   0.0181
smoke1, smoke2, age            0.048    0.042

Lecture 11: Intro to logistic regression
BIOST 515
February 10, 2004

Modeling binary data

Often in medical studies we encounter outcomes that are not continuous but instead fall into 1 of 2 categories. For example:

• Disease status (disease vs. no disease)
• Alive or dead
• Low birth weight
• Improved health status

In these cases we have a binary outcome

$$y_i = \begin{cases} 0 & \text{with probability } 1 - \pi_i \\ 1 & \text{with probability } \pi_i \end{cases}$$

where $E(y_i) = \pi_i$ and $\operatorname{var}(y_i) = \pi_i(1 - \pi_i)$. Usually one of the categories is the outcome of interest, like death or disease. This category is usually coded as 1.

We can use linear regression to model this outcome, but this can present several problems, as we will see. Using the linear model approach, we relate the expected value of $y$ to a predictor $x$ as

$$E(y_i) = \beta_0 + \beta_1 x_i.$$

Just looking at this relationship, we can see a potential problem. What is it?

Over small ranges of the predictor, or when the relationship between the predictor and the outcome is not strong, this may not be troubling.

[Figure: binary outcome plotted against a predictor with a weak association]

However, if the association is strong, potential problems are more evident.

[Figure: binary outcome plotted against a predictor with a strong association]

We could put constraints on the $\beta$'s that would prevent this from happening, but this would be complicated and probably not the best way to address this problem.

The next obvious problem comes from the relationship

$$\operatorname{var}(y_i) = \pi_i(1 - \pi_i) = E(y_i)\,[1 - E(y_i)] = (\beta_0 + \beta_1 x_i)(1 - \beta_0 - \beta_1 x_i).$$

What is this problem? We may be able to do a transformation to fix this problem, but it
would be better to use the information we have about the mean-variance relationship to build a more appropriate regression model.

Review of 2 × 2 tables

                  Disease
Exposure      Yes           No
Yes           $\pi_{11}$    $\pi_{12}$
No            $\pi_{21}$    $\pi_{22}$

where $\pi_{ij} = \Pr(\text{exposure} = i \ \&\ \text{disease} = j)$. Two of the most commonly used summaries of association are the relative risk and the odds ratio.

Relative risk

$$RR = \frac{\Pr(\text{Disease} \mid \text{Exposure})}{\Pr(\text{Disease} \mid \text{No Exposure})} = \frac{\pi_{11}/(\pi_{11} + \pi_{12})}{\pi_{21}/(\pi_{21} + \pi_{22})} = \frac{\pi_{11}(\pi_{21} + \pi_{22})}{\pi_{21}(\pi_{11} + \pi_{12})}$$

Odds ratio

Given exposure, the odds of getting the disease are

$$\frac{\Pr(\text{Disease} \mid \text{Exposure})}{\Pr(\text{No Disease} \mid \text{Exposure})} = \frac{\pi_{11}/(\pi_{11} + \pi_{12})}{\pi_{12}/(\pi_{11} + \pi_{12})} = \frac{\pi_{11}}{\pi_{12}}.$$

The odds ratio can then be expressed as

$$OR = \frac{\text{Odds of Disease} \mid \text{Exposure}}{\text{Odds of Disease} \mid \text{No Exposure}} = \frac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}.$$

Regression models for probability of disease

How do we relate the outcome $y$ to an exposure $x$? Recall the first lecture, when we discussed relating functions of the mean to linear functions of predictors (exposures). We will take that approach to modeling the outcome in this case by modeling

$$g(E[y_i \mid x_i]) = g(\pi_i) = \beta_0 + \beta_1 x_i, \qquad E[y_i \mid x_i] = \pi_i = g^{-1}(\beta_0 + \beta_1 x_i),$$

where $g$ is called a link function. How do we interpret $\pi_i$?

Distribution of y

In this case we know that $y_i$ follows a Bernoulli distribution:

$$\Pr(y_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} = \left[g^{-1}(\beta_0 + \beta_1 x_i)\right]^{y_i}\left[1 - g^{-1}(\beta_0 + \beta_1 x_i)\right]^{1 - y_i}.$$

Relating π to exposure

We will first look at the case where the exposure is dichotomous (exposed/unexposed, $x = 1/0$). One way we may relate $\pi$ to the exposure is through the log link, $g = \log$. This gives the following relationship:

$$\log(\pi_i) = \beta_0 + \beta_1 x_i.$$

When a subject is exposed, $x_i = 1$ and $\pi_i = \pi_{D|E}$, the probability of disease given exposure. In the 2 × 2 table this was $\pi_{11}/(\pi_{11} + \pi_{12})$. Therefore $\log(\pi_{D|E}) = \beta_0 + \beta_1$.

When a subject is unexposed, $x_i = 0$ and $\pi_i = \pi_{D|E^c}$, the probability of disease given no exposure. In the 2 × 2 table this was $\pi_{21}/(\pi_{21} + \pi_{22})$. Therefore $\log(\pi_{D|E^c}) = \beta_0$.

We can then get the relative risk as follows:

$$\log(\pi_{D|E}) - \log(\pi_{D|E^c}) = (\beta_0 + \beta_1) - \beta_0 \;\Rightarrow\; \log\frac{\pi_{D|E}}{\pi_{D|E^c}} = \beta_1 \;\Rightarrow\; RR = \exp(\beta_1).$$

What are some potential drawbacks of this modeling scheme?

Logistic regression

In logistic regression we use the logit link, which is defined as

$$g(\pi) = \operatorname{logit}(\pi) = \log\frac{\pi}{1 - \pi}.$$

This is equivalent to modeling the log odds. We relate $\pi$ to the exposure using

$$\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 x_i.$$

When a subject is exposed, $x_i = 1$ and $\pi_i = \pi_{D|E}$, the probability of disease given exposure. Therefore $\operatorname{logit}(\pi_{D|E}) = \beta_0 + \beta_1$. This is equivalent to the log of the odds of disease given exposure.

When a subject is unexposed, $x_i = 0$ and $\pi_i = \pi_{D|E^c}$, the probability of disease given no exposure. Therefore $\operatorname{logit}(\pi_{D|E^c}) = \beta_0$. This is equivalent to the log of the odds of disease given no exposure.

Calculating the odds ratio

We can calculate the odds ratio as follows:

$$\operatorname{logit}(\pi_{D|E}) - \operatorname{logit}(\pi_{D|E^c}) = (\beta_0 + \beta_1) - \beta_0$$
$$\Rightarrow\; \log\frac{\pi_{D|E}}{1 - \pi_{D|E}} - \log\frac{\pi_{D|E^c}}{1 - \pi_{D|E^c}} = \beta_1$$
$$\Rightarrow\; \frac{\pi_{D|E}/(1 - \pi_{D|E})}{\pi_{D|E^c}/(1 - \pi_{D|E^c})} = \exp(\beta_1) \;\Rightarrow\; OR = \exp(\beta_1).$$

CHS Example

In this example we will look at coronary heart disease (CHD). We code

$$y_i = \begin{cases} 1 & \text{disease} \\ 0 & \text{no disease.} \end{cases}$$

The exposure is male gender. Our observed proportions are:

                  CHD
Exposure      Yes      No
Male          0.098    0.322
Female        0.102    0.478

Pr(Disease) = 0.098 + 0.102 = 0.2
Pr(Disease | Male) = 0.098/(0.098 + 0.322) = 0.233
Pr(Disease | Female) = 0.102/(0.102 + 0.478) = 0.176
Pr(Disease | Male) - Pr(Disease | Female) = 0.233 - 0.176 = 0.057

$$RR = \frac{0.098/(0.098 + 0.322)}{0.102/(0.102 + 0.478)} = 1.32 \qquad OR = \frac{0.098/0.322}{0.102/0.478} = 1.43$$

Because this is a simple 2 × 2 table, our estimates from linear regression and from glm with log and logit links should match these quantities.

Linear regression

lm1 <- lm(CHD ~ GENDER, data = chs)
summary(lm1)

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    0.1759      0.0235     7.49    0.0000
GENDER         0.0575      0.0362     1.59    0.1133

$$\hat\pi_i = 0.1759 + 0.0575 x_i$$

GLM with log link

glm1 <- glm(CHD ~ GENDER, family = binomial(link = "log"), data = chs)
summary(glm1)

             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)   -1.7381      0.1271   -13.67    0.0000
GENDER         0.2828      0.1783     1.59    0.1128

$$\log(\hat\pi_i) = -1.7381 + 0.2828 x_i$$

How do we get the relative risk from this output?

Logistic regression: GLM with logit link
glm2 <- glm(CHD ~ GENDER, family = binomial(link = "logit"), data = chs)
summary(glm2)

             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)   -1.5446      0.1542   -10.01    0.0000
GENDER         0.3551      0.2245     1.58    0.1138

$$\operatorname{logit}(\hat\pi_i) = -1.5446 + 0.3551 x_i$$

How do we get the odds ratio from this output?

Exposures measured on a continuous scale

So far we've only replicated the information we could get from a 2 × 2 table. What if instead we had an exposure that was measured on a continuous scale? Examples:

• Age
• An environmental toxin that is hypothesized to be related to some disease
• Score on an elementary school exam and subsequent enrollment in college

Relative risk regression with continuous predictor

$$\log(\pi_i) = \beta_0 + \beta_1 x_i$$
$$\log \pi(x = c) = \beta_0 + \beta_1 c, \qquad \log \pi(x = c + 1) = \beta_0 + \beta_1 (c + 1)$$
$$\log \pi(x = c + 1) - \log \pi(x = c) = \beta_0 + \beta_1 (c + 1) - (\beta_0 + \beta_1 c) = \beta_1 \;\Rightarrow\; \frac{\pi(x = c + 1)}{\pi(x = c)} = \exp(\beta_1)$$

How do we interpret this?

Logistic regression with continuous predictor

$$\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 x_i$$
$$\operatorname{logit} \pi(x = c) = \beta_0 + \beta_1 c, \qquad \operatorname{logit} \pi(x = c + 1) = \beta_0 + \beta_1 (c + 1)$$
$$\operatorname{logit} \pi(x = c + 1) - \operatorname{logit} \pi(x = c) = \beta_1 \;\Rightarrow\; \frac{\operatorname{odds}\{\pi(x = c + 1)\}}{\operatorname{odds}\{\pi(x = c)\}} = \exp(\beta_1)$$

Example

In this example we will look at age as a predictor of CHD. The regression model is

$$g(E[\text{CHD}_i]) = \beta_0 + \beta_1 \text{age}_i.$$

If we use linear regression, $\text{CHD}_i = \beta_0 + \beta_1 \text{age}_i + \epsilon_i$, the results are:

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   -0.0127      0.2346    -0.05    0.9569
AGE            0.0029      0.0032     0.91    0.3636

Example with relative risk regression: GLM with log link

glm12 <- glm(CHD ~ AGE, family = binomial(link = "log"), data = chs)
summary(glm12)

             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)   -2.6598      1.1230    -2.37    0.0179
AGE            0.0143      0.0152     0.94    0.3457

Example with logistic regression

glm12 <- glm(CHD ~ AGE, family = binomial(link = "logit"), data = chs)
summary(glm12)

             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)   -2.6846      1.4356    -1.87    0.0615
AGE            0.0177      0.0195     0.91    0.3632

Lecture 3: Simple linear regression (cont.)
BIOST 515
January 13, 2004

Breakdown of sums of squares

The simplest regression estimate for $Y_i$ is $\bar Y$ (an intercept-only model). $Y_i - \bar Y$ is the total error and can be broken down further:

total error = residual error + error explained by regression,

$$Y_i - \bar Y = (Y_i - \hat Y_i) + (\hat Y_i - \bar Y).$$
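The same decomposition also holds for the squared terms: for a least-squares fit with an intercept the cross-product term vanishes, so SSTO = SSE + SSR. The Python sketch below checks this numerically. The data and the helper names (`fit_simple_ols`, `sums_of_squares`) are hypothetical and chosen only for illustration.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sums_of_squares(x, y):
    """Return (SSTO, SSE, SSR) for the least-squares fit of y on x."""
    b0, b1 = fit_simple_ols(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    ssto = sum((yi - y_bar) ** 2 for yi in y)              # total: Y_i - Ybar
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual: Y_i - Yhat_i
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # regression: Yhat_i - Ybar
    return ssto, sse, ssr


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ssto, sse, ssr = sums_of_squares(x, y)
print(f"SSTO = {ssto:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}")
```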
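The problems with fitting $E(y_i) = \beta_0 + \beta_1 x_i$ to a binary outcome (Lecture 11) can be seen numerically. This Python sketch fits ordinary least squares to a hypothetical, strongly associated 0/1 outcome and evaluates the implied Bernoulli variance $\hat\pi_i(1 - \hat\pi_i)$ at each fitted value. The data are made up and the helper name is our own.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical binary outcome with a strong association with x.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

b0, b1 = fit_simple_ols(x, y)
fitted = [b0 + b1 * xi for xi in x]          # linear-model "probabilities"
implied_var = [p * (1 - p) for p in fitted]  # Bernoulli variance pi * (1 - pi)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"fitted range: {min(fitted):.3f} to {max(fitted):.3f}")
print(f"implied variance range: {min(implied_var):.3f} to {max(implied_var):.3f}")
```

The fitted values fall below 0 and above 1 at the ends of the range, where the implied variance is negative; in between, the variance changes with $x$, so the constant-variance assumption of linear regression fails as well.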
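The general linear hypothesis test from Lecture 6 needs only the two residual sums of squares, their difference in degrees of freedom and the full-model residual degrees of freedom. This Python sketch plugs in the CHS residual sums of squares from the ANOVA tables. The function name is hypothetical, and the critical values 3.86 and 3.01 are taken from the notes because the Python standard library has no F quantile function.

```python
def general_linear_hypothesis_f(sse_rm, sse_fm, r, df_fm):
    """F0 = [(SSE(RM) - SSE(FM)) / r] / [SSE(FM) / df_fm], df_fm = n - p - 1."""
    ss_hypothesis = sse_rm - sse_fm   # sum of squares due to T beta = 0
    mse_fm = sse_fm / df_fm           # full-model mean squared error
    return (ss_hypothesis / r) / mse_fm


# H0: beta1 = beta2 in BP ~ smoke1 + smoke2 (r = 1, 495 residual df)
f_equal = general_linear_hypothesis_f(63480.15, 63465.82, r=1, df_fm=495)
# H0: beta1 = beta2, beta3 = 0 in BP ~ smoke1 + smoke2 + age (r = 2, 494 residual df)
f_joint = general_linear_hypothesis_f(63480.15, 60778.42, r=2, df_fm=494)

# Critical values quoted in the notes: F_{1,495,.95} = 3.86, F_{2,494,.95} = 3.01
print(f"F0 = {f_equal:.2f}, reject H0: {f_equal > 3.86}")
print(f"F0 = {f_joint:.2f}, reject H0: {f_joint > 3.01}")
```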
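The t-statistic version of the Lecture 6 test of $H_0: \beta_1 = \beta_2$ can be sketched in Python from the two estimates, $\hat\sigma^2 = 128.21$ and the entries of $C = (X'X)^{-1}$ printed in the notes. Because those entries are rounded to four decimals, the result only agrees with the notes to about three digits. The function name is hypothetical.

```python
import math


def t_equal_coefficients(b1, b2, mse, c11, c22, c12):
    """t0 = (b1 - b2) / sqrt(mse * (C11 + C22 - 2 * C12))."""
    var_diff = mse * (c11 + c22 - 2 * c12)  # estimated var(b1_hat - b2_hat)
    return (b1 - b2) / math.sqrt(var_diff)


t0 = t_equal_coefficients(b1=2.9860, b2=2.6239, mse=128.21,
                          c11=0.0242, c22=0.0257, c12=0.0204)
print(f"t0 = {t0:.3f}, t0^2 = {t0 ** 2:.3f}")
```

With a single constraint ($r = 1$), $t_0^2$ equals the sums-of-squares statistic $F_0 \approx 0.11$ up to rounding.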
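The fitted-value confidence interval from the Lecture 6 CHS example can be reproduced in a few lines of Python. As in the notes, 1.96 stands in for $t_{n-p-1,\,.975}$ and $\hat\sigma = 11.11$; with these rounded inputs the bounds come out at about (71.73, 75.61), within rounding of the reported (71.72, 75.62). The function and parameter names are hypothetical.

```python
import math


def fitted_value_ci(y_hat0, quantile, sigma_hat, quad_form):
    """y_hat0 +/- quantile * sigma_hat * sqrt(x0' (X'X)^{-1} x0)."""
    half_width = quantile * sigma_hat * math.sqrt(quad_form)
    return y_hat0 - half_width, y_hat0 + half_width


lower, upper = fitted_value_ci(y_hat0=73.67, quantile=1.96,
                               sigma_hat=11.11, quad_form=0.007972541)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```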
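The Bonferroni adjustment in Lecture 6 only changes the quantile used for each interval. This Python sketch computes the per-interval level $1 - \alpha/(2(p+1))$ for the five CHS coefficients and, as an assumption, uses the standard normal quantile from `statistics.NormalDist` as a large-sample stand-in for $t_{493}$ (the exact $t$ quantile is slightly larger). The function name is our own.

```python
from statistics import NormalDist


def bonferroni_level(alpha, k):
    """Upper-quantile level 1 - alpha / (2k) for k simultaneous two-sided intervals."""
    return 1 - alpha / (2 * k)


alpha = 0.05
k = 5  # p + 1 coefficients: intercept, weight, height, age, gender

level = bonferroni_level(alpha, k)
# Normal approximation to the t quantile (an assumption; df = 493 is large).
z_bonferroni = NormalDist().inv_cdf(level)
z_single = NormalDist().inv_cdf(1 - alpha / 2)

print(f"per-interval level: {level:.3f}")
print(f"multiplier: Bonferroni {z_bonferroni:.3f} vs. single interval {z_single:.3f}")
```

Each Bonferroni interval is therefore roughly 2.58/1.96, or about 1.3 times, as wide as the corresponding single 95% interval.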
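The $R^2$ and adjusted $R^2$ entries for the two smoking models in Lecture 6 can be checked from their ANOVA tables. In this Python sketch, SSTO is the sum of all rows of each ANOVA table and $n = 498$ follows from the 495 residual degrees of freedom with $p = 2$; the function names are our own.

```python
def r_squared(sse, ssto):
    """R^2 = 1 - SSE / SSTO."""
    return 1 - sse / ssto


def adjusted_r_squared(sse, ssto, n, p):
    """R^2_adj = 1 - MSE / (SSTO / (n - 1)), with MSE = SSE / (n - p - 1)."""
    mse = sse / (n - p - 1)
    return 1 - mse / (ssto / (n - 1))


n = 498
# BP ~ smoke1 + smoke2: sequential SS 101.65 and 267.61, SSE 63465.82
ssto_smoke = 101.65 + 267.61 + 63465.82
# BP ~ smoke1 + smoke2 + age: sequential SS 101.65, 267.61, 2687.39, SSE 60778.42
ssto_age = 101.65 + 267.61 + 2687.39 + 60778.42

for label, sse, ssto, p in [("smoke1, smoke2", 63465.82, ssto_smoke, 2),
                            ("smoke1, smoke2, age", 60778.42, ssto_age, 3)]:
    print(f"{label}: R2 = {r_squared(sse, ssto):.4f}, "
          f"R2_adj = {adjusted_r_squared(sse, ssto, n, p):.4f}")
```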
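The Lecture 11 CHS 2 × 2 calculations, and the claim that the log-link and logit-link slopes recover them, can be checked with this Python sketch. The unrounded relative risk is about 1.327 (the 1.32 in the notes comes from dividing the rounded 0.233 by 0.176), and exponentiating the fitted slopes gives the same relative risk and odds ratio to three decimals. Variable names are our own.

```python
import math

# Observed joint proportions from the CHS table (exposure = male gender).
pi11, pi12 = 0.098, 0.322   # male:   CHD yes, CHD no
pi21, pi22 = 0.102, 0.478   # female: CHD yes, CHD no

p_male = pi11 / (pi11 + pi12)      # Pr(Disease | Male)
p_female = pi21 / (pi21 + pi22)    # Pr(Disease | Female)
risk_difference = p_male - p_female
relative_risk = p_male / p_female
odds_ratio = (pi11 * pi22) / (pi12 * pi21)

# Slopes reported in the notes for glm(CHD ~ GENDER) with log and logit links.
rr_from_log_link = math.exp(0.2828)
or_from_logit_link = math.exp(0.3551)

print(f"RD = {risk_difference:.4f}, RR = {relative_risk:.3f}, OR = {odds_ratio:.3f}")
print(f"exp(log-link slope) = {rr_from_log_link:.3f}, "
      f"exp(logit-link slope) = {or_from_logit_link:.3f}")
```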
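For a continuous exposure, the Lecture 11 derivations say that the risk ratio (log link) or odds ratio (logit link) for a one-unit increase does not depend on the starting value $c$. This Python sketch checks that numerically with the age coefficients from the CHS example; the function names are our own, and the fitted curves are only evaluated, not re-estimated.

```python
import math

# Age coefficients from the CHS example.
B0_LOG, B1_LOG = -2.6598, 0.0143        # log link (relative risk regression)
B0_LOGIT, B1_LOGIT = -2.6846, 0.0177    # logit link (logistic regression)


def risk_log_link(age):
    """pi(age) = exp(b0 + b1 * age) under the log link."""
    return math.exp(B0_LOG + B1_LOG * age)


def risk_logit_link(age):
    """pi(age) = 1 / (1 + exp(-(b0 + b1 * age))) under the logit link."""
    return 1 / (1 + math.exp(-(B0_LOGIT + B1_LOGIT * age)))


def odds(p):
    return p / (1 - p)


for c in (65, 75, 85):
    rr = risk_log_link(c + 1) / risk_log_link(c)
    odds_ratio = odds(risk_logit_link(c + 1)) / odds(risk_logit_link(c))
    print(f"c = {c}: RR = {rr:.4f}, OR = {odds_ratio:.4f}")

print(f"exp(b1): log link {math.exp(B1_LOG):.4f}, logit link {math.exp(B1_LOGIT):.4f}")
```

Each additional year of age multiplies the estimated risk by about exp(0.0143), roughly 1.014, under the log link, and the estimated odds by about exp(0.0177), roughly 1.018, under the logit link, whatever the starting age.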

