# SRVL DTA ANLYS EPDM EPI 537

These 269 pages of class notes were uploaded by Garry Marvin on Wednesday, September 9, 2015. The notes belong to EPI 537 at the University of Washington, taught by Norman Breslow in Fall. Since their upload, they have received 32 views. For similar materials see /class/191971/epi-537-university-of-washington in Epidemiology at University of Washington.

Date Created: 09/09/15

BIOST/EPI 537: Survival Data Analysis in Epidemiology. Norman Breslow.

## Fifth Lecture (22 and 24 January 2008): Comparing Survival Functions

- Nonparametric (specialized) tests
  - LogRank test (connection to Mantel-Haenszel)
  - Generalized Wilcoxon tests
  - Stratified versions of above
  - K-sample versions (one-way ANOVA)
  - Trend test (Tarone)
- Tests based on parametric regression models
  - Exponential, Weibull
- Tests based on semi-parametric regression (Cox model)
  - Essentially the LogRank test

### Ex 1: Remission Times of Leukemia Patients

Remission times (months); + denotes a censored observation.

| Group | n | Remission times |
|---|---|---|
| 6-MP | 21 | 6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+ |
| Control | 21 | 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23 |

[Figure: proportions in remission over time, 6-MP vs. control.]

Gehan EA. A Generalized Wilcoxon Test for Comparing Arbitrarily Singly-Censored Samples. Biometrika 52:203-223, 1965. (Cited 2769 times as of 18 Jan 2008.)

### Comparing Survival Functions

- We could compare the survival estimates at a particular time $t_0$.
  - Using the Kaplan-Meier survival estimate and Greenwood's variance formula we can calculate a Z statistic.
  - $H_0: S_1(t_0) = S_2(t_0)$
  - $H_A: S_1(t_0) \neq S_2(t_0)$
  - $Z = \dfrac{\hat S_1(t_0) - \hat S_2(t_0)}{\sqrt{\hat V_1(t_0) + \hat V_2(t_0)}}$
  - Under $H_0$, $Z \sim N(0,1)$.

[Figure: two pairs of survival curves.]

- Where do we put $t_0$? The test result may depend on the choice.
- Q: How can we compare the entire survival curves?
- The LogRank test.

$H_0: S_1(t) = S_2(t)$, or $\lambda_1(t) = \lambda_2(t)$, for all $t$.

$H_A: S_1(t) = [S_2(t)]^c$, or $\lambda_1(t) = c\,\lambda_2(t)$, $c \neq 1$.

[Figure: we will be able to detect curves that separate consistently, but not curves that cross.]

### LogRank Test: Notation

- $t_1 < t_2 < \cdots < t_I$ are the ordered failure times in the pooled sample (both groups combined).
- At each time $t_i$, $i = 1, \ldots, I$, create a 2 x 2 table:

| | Group 1 | Group 2 | Totals |
|---|---|---|---|
| deaths at $t_i$ | $d_{1i}$ | $d_{2i}$ | $d_i$ |
| survivors past $t_i$ | $s_{1i}$ | $s_{2i}$ | $s_i$ |
| at risk at $t_i$ | $n_{1i}$ | $n_{2i}$ | $n_i$ |

- For each time $t_i$ we compare the observed number of failures $d_{1i}$ with what we would expect under $H_0$.

### LogRank Test

Key quantities:

- Observed: $O_i = d_{1i}$
- Expected: $E_i = n_{1i} d_i / n_i$ (in group 1
under $H_0$)
- Variance: $V_i = \dfrac{n_{1i}\, n_{2i}\, d_i\, s_i}{n_i^2 (n_i - 1)}$ (from the hypergeometric distribution under the null hypothesis)
- Test statistic (LogRank): $T_L = \dfrac{\left[\sum_{i=1}^{I} (O_i - E_i)\right]^2}{\sum_{i=1}^{I} V_i} = \dfrac{(O - E)^2}{V}$, the square of the sum of the $(O_i - E_i)$ deviations divided by the sum of the variances $V_i$.

The statistic is based on a weighted comparison of estimated hazards:

$O_i - E_i = d_{1i} - \dfrac{n_{1i} d_i}{n_i} = \dfrac{n_{1i} n_{2i}}{n_i}\left(\dfrac{d_{1i}}{n_{1i}} - \dfrac{d_{2i}}{n_{2i}}\right)$

Q: What impacts the size of the weights?

Properties, under $H_0: S_1(t) \equiv S_2(t)$:

- $T_L \sim \chi^2_1$ for large $I$
- 2-sided p-value: $\Pr(\chi^2_1 > T_L)$
- 1-sided p-value: $\tfrac{1}{2}\Pr(\chi^2_1 > T_L)$ if $O - E$ has the correct sign

### 6-MP Leukemia Example

- There are 17 unique failure times ($I = 17$).
- 2 x 2 table for $t_6 = 6$:

| | 6-MP | Control | Totals |
|---|---|---|---|
| deaths at $t_6$ | 3 | 0 | 3 |
| survivors past $t_6$ | 18 | 12 | 30 |
| at risk at $t_6$ | 21 | 12 | 33 |

$O_6 = 3$, $E_6 = \dfrac{21 \times 3}{33} = 1.91$, $V_6 = \dfrac{21 \times 12 \times 3 \times 30}{33 \times 33 \times 32} = 0.651$

- 2 x 2 table for $t_{16} = 22$:

| | 6-MP | Control | Totals |
|---|---|---|---|
| deaths at $t_{16}$ | 1 | 1 | 2 |
| survivors past $t_{16}$ | 6 | 1 | 7 |
| at risk at $t_{16}$ | 7 | 2 | 9 |

$O_{16} = 1$, $E_{16} = \dfrac{7 \times 2}{9} = 1.56$, $V_{16} = \dfrac{7 \times 2 \times 2 \times 7}{9 \times 9 \times 8} = 0.302$, etc.

`. sts test tx` (failure _d: censor; analysis time _t: time)

Log-rank test for equality of survivor functions:

| tx | Events observed | Events expected |
|---|---|---|
| Control | 21 | 10.75 |
| 6-MP | 9 | 19.25 |
| Total | 30 | 30.00 |

chi2(1) = 16.79, Pr>chi2 = 0.0000

### Comparing Herpes Treatment Groups

- There are 2 main treatments: none, Acyclovir.
  - Q: Is there evidence for a treatment effect?
- There are actually 4 total treatments, since Acyclovir was administered in 3 separate modalities.
  - Q: How to compare the four groups with respect to recurrence?
- There are 3 HSV subtypes.
  - Q: Should we control for disease subtype or conduct analyses for each type?

### Herpes Recurrence by Treatment

[Figure: Kaplan-Meier survival estimates by treatment (tx = Control, tx = Treated); x-axis: analysis time.]

### Herpes Hazard by Treatment

[Figure: smoothed hazard estimates by treatment; x-axis: analysis time.]

### Herpes Treatment by Disease Type

`. tab tx group, chi2`
Treatment by HSV type:

| tx | HSV type I | II | III | Total |
|---|---|---|---|---|
| Control | 32 | 206 | 32 | 270 |
| Treated | 38 | 121 | 27 | 186 |
| Total | 70 | 327 | 59 | 456 |

Pearson chi2(2) = 7.8246, Pr = 0.020

- Possibility that HSV type may confound the treatment effect
  - 54% of Type I treated vs. 37% for Type II and 46% for Type III
  - Slight advantage for treatment

### Recurrence by Tx for HSV Type II

[Figure: Kaplan-Meier survival estimates, Control vs. Treated; x-axis: analysis time.]

[Figure: Kaplan-Meier survival estimates with 95% CI, Control vs. Treated; x-axis: analysis time.]

[Figure: smoothed hazard estimates, Control vs. Treated; x-axis: analysis time.]

### LogRank Test: All HSV Types Combined

`. sts test tx`

Log-rank test for equality of survivor functions:

| tx | Events observed | Events expected |
|---|---|---|
| Control | 221 | 221.09 |
| Treated | 151 | 150.91 |
| Total | 372 | 372.00 |

chi2(1) = 0.00, Pr>chi2 = 0.9924

### LogRank Test: HSV Type II Only

`. sts test tx if group==2`

Log-rank test for equality of survivor functions:

| tx | Events observed | Events expected |
|---|---|---|
| Control | 178 | 185.42 |
| Treated | 103 | 95.58 |
| Total | 281 | 281.00 |

chi2(1) = 0.89, Pr>chi2 = 0.3465

- Need a method to combine results over strata (HSV type).

### Stratified Tests

- To test for differences with adjustment for a confounding factor, stratify the sample on the categorical confounder.
- Suppose the stratification factor has $K$ levels, $k = 1, \ldots, K$.
  - $H_0: \lambda_k(t \mid \text{treated}) = \lambda_k(t \mid \text{control})$, $k = 1, \ldots, K$
  - $H_A: \lambda_k(t \mid \text{treated}) = c\,\lambda_k(t \mid \text{control})$, $k = 1, \ldots, K$, $c \neq 1$
- Stratified version of $T_L$:

$T_L = \dfrac{\left[\sum_{i}(O_{i1} - E_{i1}) + \sum_{i}(O_{i2} - E_{i2}) + \cdots + \sum_{i}(O_{iK} - E_{iK})\right]^2}{\sum_{i} V_{i1} + \sum_{i} V_{i2} + \cdots + \sum_{i} V_{iK}}$

where $O_{ik}$, $E_{ik}$ and $V_{ik}$ are calculated solely from subjects in the $k$th stratum.

- $T_L \sim \chi^2_1$ for large $n$ under $H_0$.

### Compare Herpes Tx Adjusting for HSV Type

`. sts test tx if group==2, strata(group)`

| tx | Events observed | Events expected (*) |
|---|---|---|
| Control | 178 | 185.42 |
| Treated | 103 | 95.58 |
| Total | 281 | 281.00 |

(*) sum over calculations within group

chi2(1) = 0.89, Pr>chi2 = 0.3465

- Identical results to those obtained without option `strata(group)`, as they should be.
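
The stratified statistic can be sketched in a few lines of Python. This is a minimal illustration, not the Stata implementation: the function names `logrank_parts` and `stratified_logrank` are hypothetical, groups are assumed to be coded 1 and 2, and subjects censored at a failure time are assumed to be still at risk at that time. The data are the 6-MP and control remission times from Ex 1, used here as a single stratum.

```python
from collections import defaultdict

def logrank_parts(times, events, groups):
    """Return (sum of O_i - E_i for group 1, sum of V_i) over distinct failure times."""
    o_minus_e = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for u, g in zip(times, groups) if u >= t]
        deaths = [g for u, e, g in zip(times, events, groups) if u == t and e]
        n, n1 = len(at_risk), at_risk.count(1)
        d, d1 = len(deaths), deaths.count(1)
        o_minus_e += d1 - n1 * d / n                      # O_i - E_i
        if n > 1:                                         # hypergeometric variance
            var += n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
    return o_minus_e, var

def stratified_logrank(times, events, groups, strata):
    """Sum O - E and V within each stratum, then form one chi-square (1 df)."""
    cells = defaultdict(lambda: ([], [], []))
    for t, e, g, s in zip(times, events, groups, strata):
        ts, es, gs = cells[s]
        ts.append(t)
        es.append(e)
        gs.append(g)
    num = den = 0.0
    for ts, es, gs in cells.values():
        oe, v = logrank_parts(ts, es, gs)
        num += oe
        den += v
    return num * num / den

# 6-MP group as (time, event) pairs, event = 0 for censored; controls all relapsed.
mp6 = [(6, 1), (6, 1), (6, 1), (6, 0), (7, 1), (9, 0), (10, 1), (10, 0), (11, 0),
       (13, 1), (16, 1), (17, 0), (19, 0), (20, 0), (22, 1), (23, 1), (25, 0),
       (32, 0), (32, 0), (34, 0), (35, 0)]
control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
times = [t for t, _ in mp6] + control
events = [e for _, e in mp6] + [1] * len(control)
groups = [1] * len(mp6) + [2] * len(control)

chi2 = stratified_logrank(times, events, groups, [0] * len(times))
print(round(chi2, 2))  # about 16.79, the chi2(1) reported by sts test for these data
```

With a single stratum the function reduces to the ordinary log-rank test; passing a real stratum label per subject gives the summed version.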
The comparison is restricted to the HSV Type II group.

### Compare Herpes Tx Adjusting for HSV Type (all strata)

`. sts test tx, strata(group)`

Stratified log-rank test for equality of survivor functions:

| tx | Events observed | Events expected (*) |
|---|---|---|
| Control | 221 | 229.14 |
| Treated | 151 | 142.86 |
| Total | 372 | 372.00 |

(*) sum over calculations within group

chi2(1) = 0.79, Pr>chi2 = 0.3754

- Now results are summed over all three strata (HSV types).

### LogRank and Mantel-Haenszel Tests

M-H test: series of independent tables at different levels of a confounder $C$.

- Data at level $C = i$:

| | Exposed ($E$) | Unexposed ($\bar E$) |
|---|---|---|
| Diseased ($D$) | $a_i$ | $b_i$ |
| Non-diseased ($\bar D$) | $c_i$ | $d_i$ |

- The M-H test compares $\Pr(D \mid E, C = i)$ and $\Pr(D \mid \bar E, C = i)$ and is designed (most powerful) for the situation where the odds ratios $\psi_i$ are constant across strata:

$H_0: \psi_i \equiv 1$, $\quad H_A: \psi_i \equiv \psi \neq 1$, where $\psi_i = \dfrac{\Pr(D \mid E, C = i)\,/\,\Pr(\bar D \mid E, C = i)}{\Pr(D \mid \bar E, C = i)\,/\,\Pr(\bar D \mid \bar E, C = i)}$

Logrank test: series of dependent tables at the different observed (distinct) times of death.

- Data at $t_i$:

| | Group 1 | Group 2 | Totals |
|---|---|---|---|
| Deaths at $t_i$ | $d_{1i}$ | $d_{2i}$ | $d_i$ |
| Survivors past $t_i$ | $s_{1i}$ | $s_{2i}$ | $s_i$ |
| At risk at $t_i$ | $n_{1i}$ | $n_{2i}$ | $n_i$ |

- Thus we expect the logrank test to be powerful when odds ratios over infinitesimal time intervals are constant across time, i.e., where

$\psi(t) = \dfrac{\Pr(t \le T < t + \Delta t \mid G = 1, T \ge t)\,/\,[1 - \Pr(t \le T < t + \Delta t \mid G = 1, T \ge t)]}{\Pr(t \le T < t + \Delta t \mid G = 2, T \ge t)\,/\,[1 - \Pr(t \le T < t + \Delta t \mid G = 2, T \ge t)]}$

is the same for all $t$: $\psi(t) \equiv \psi$.

- But as $\Delta t \to 0$, each $1 - P \to 1$ and the ratio of the $P$'s tends to the ratio of the $\lambda$'s, i.e., $\psi(t) \approx \lambda(t \mid \text{Group 1})\,/\,\lambda(t \mid \text{Group 2})$.
- So the logrank test is designed for (most powerful for) the proportional hazards situation, where $RR(t) = \lambda_1(t)/\lambda_2(t) \equiv RR$, constant in time.

### Alternatives to LogRank

Q: What happens when hazards are not proportional?

- Similar to M-H, LogRank is usually OK provided hazard ratios are all consistent, in the sense of all < 1 or all > 1.
- Other tests are available when a priori one expects to see a difference where hazard ratios are not consistent.
- A general family of nonparametric tests has the form $T_w = \dfrac{\left[\sum_{i=1}^{I} w_i (O_i - E_i)\right]^2}{\sum_{i=1}^{I} w_i^2 V_i}$, where the weights $w_i = w(t_i)$ emphasize different time intervals where one expects larger or smaller differences to appear.
- Parametric tests are also available (Exponential, Weibull).
  - But they depend for validity on correct specification of the model as well as equality of survival distributions.

Choice of weights in nonparametric tests:

1. $w_i = 1$: LogRank test (Mantel 1966, Peto$^2$ 1972)
2. $w_i = n_i$: Wilcoxon (Gehan 1965, Breslow 1970) test
3. $w_i = \sqrt{n_i}$: Tarone-Ware (1977) test
4. $w_i = \tilde S(t_i)$: Peto$^2$ (1972) - Prentice (1975) test
5. $w_i = \hat S(t_{i-1})^p\,[1 - \hat S(t_{i-1})]^q$: Harrington-Fleming (1982)

- LogRank (1) emphasizes later differences in hazards.
- Wilcoxon analogs (3) and (4) emphasize earlier differences.
- Tests (2) and (3) depend on the censoring pattern.
  - May be undesirable when censoring is substantial.
- Harrington-Fleming gives a flexible family of tests for increasing or decreasing hazard ratios.
  - But how does one specify $(p, q)$ a priori?

### Example: Alternative Tests with Herpes Data

`. sts test tx if group==2` (same result with the `fh(0 0)` option)

Log-rank test for equality of survivor functions:

| tx | Events observed | Events expected |
|---|---|---|
| Control | 178 | 185.42 |
| Treated | 103 | 95.58 |
| Total | 281 | 281.00 |

chi2(1) = 0.89, Pr>chi2 = 0.3465

`. sts test tx if group==2, wilcoxon`

Wilcoxon (Breslow) test for equality of survivor functions:

| tx | Events observed | Events expected | Sum of ranks |
|---|---|---|---|
| Control | 178 | 185.42 | -2069 |
| Treated | 103 | 95.58 | 2069 |
| Total | 281 | 281.00 | 0 |

chi2(1) = 1.64, Pr>chi2 = 0.2001

- Suggests a slightly higher recurrence rate on treatment, especially during the early period.

`. sts test tx if group==2, p`

Peto-Peto test for equality of survivor functions:

| tx | Events observed | Events expected | Sum of ranks |
|---|---|---|---|
| Control | 178 | 185.42 | -6.2279681 |
| Treated | 103 | 95.58 | 6.2279681 |
| Total | 281 | 281.00 | 0 |

chi2(1) = 1.63, Pr>chi2 = 0.2024

`. sts test tx if group==2, fh(1 0)`

Fleming-Harrington test for equality of survivor functions:

| tx | Events observed | Events expected | Sum of ranks |
|---|---|---|---|
| Control | 178 | 185.42 | -6.3768179 |
| Treated | 103 | 95.58 | 6.3768179 |
| Total | 281 | 281.00 | 0 |

chi2(1) = 1.64, Pr>chi2 = 0.2007

- The Peto-Prentice and Fleming-Harrington (1,0) tests are virtually identical, and here close to the Gehan-Breslow (Wilcoxon) test.
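
To make the role of the weights concrete, here is a minimal Python sketch of the weighted family $T_w$ for the first three weight choices (log-rank, Gehan-Breslow Wilcoxon, Tarone-Ware). The function name `weighted_logrank` and its `weight` argument are hypothetical, groups are assumed to be coded 1 and 2, and the data are the 6-MP remission times from Ex 1 rather than the herpes data, which are not listed in these notes.

```python
import math

def weighted_logrank(times, events, groups, weight="logrank"):
    """[sum w_i (O_i - E_i)]^2 / sum w_i^2 V_i for two groups coded 1 and 2."""
    num = den = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for u, g in zip(times, groups) if u >= t]
        deaths = [g for u, e, g in zip(times, events, groups) if u == t and e]
        n, n1 = len(at_risk), at_risk.count(1)
        d, d1 = len(deaths), deaths.count(1)
        # Weight choices 1-3 from the list above; all depend only on n_i.
        w = {"logrank": 1.0, "wilcoxon": float(n), "tarone-ware": math.sqrt(n)}[weight]
        num += w * (d1 - n1 * d / n)
        if n > 1:
            den += w * w * n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
    return num * num / den

# 6-MP group as (time, event) pairs, event = 0 for censored; controls all relapsed.
mp6 = [(6, 1), (6, 1), (6, 1), (6, 0), (7, 1), (9, 0), (10, 1), (10, 0), (11, 0),
       (13, 1), (16, 1), (17, 0), (19, 0), (20, 0), (22, 1), (23, 1), (25, 0),
       (32, 0), (32, 0), (34, 0), (35, 0)]
control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
times = [t for t, _ in mp6] + control
events = [e for _, e in mp6] + [1] * len(control)
groups = [1] * len(mp6) + [2] * len(control)

results = {w: weighted_logrank(times, events, groups, w)
           for w in ("logrank", "wilcoxon", "tarone-ware")}
for name, value in results.items():
    print(name, round(value, 2))
```

In the 6-MP data the curves separate more as time goes on, so the log-rank statistic comes out larger than the two early-weighted versions; in the herpes example the ordering is reversed because the difference is concentrated early.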
### Example: Alternative Tests with Herpes Data

`. sts test tx if group==2, fh(0 3)`

Fleming-Harrington test for equality of survivor functions:

| tx | Events observed | Events expected | Sum of ranks |
|---|---|---|---|
| Control | 178 | 185.42 | -0.02700719 |
| Treated | 103 | 95.58 | 0.02700719 |
| Total | 281 | 281.00 | 0 |

chi2(1) = 0.00, Pr>chi2 = 0.9889

- Emphasizing later recurrences shows virtually no difference in average hazard rates between treated and control.

`. streg tx if group==2, dist(weibull) nohr`

- Fitting constant-only model: Iteration 3, log likelihood = -577.39953
- Fitting full model: Iteration 3, log likelihood = -576.77726
- Weibull regression, log relative-hazard form
- No. of subjects = 327, Number of obs = 327, No. of failures = 281, Time at risk = 28564
- Log likelihood = -576.77726, Prob > chi2 = 0.2646

| _t | Coef. | Std. Err. | z | P>\|z\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| tx | 0.1391603 | 0.1239317 | 1.12 | 0.261 | -0.1037414, 0.3820621 |
| _cons | -3.509624 | 0.1915101 | -18.33 | 0.000 | -3.884977, -3.134271 |
| /ln_p | -0.2688536 | 0.0473888 | -5.67 | 0.000 | -0.3617339, -0.1759733 |
| p | 0.7642551 | 0.0362171 | | | 0.6964677, 0.8386404 |
| 1/p | 1.308464 | 0.0620065 | | | 1.192406, 1.435817 |

- Results similar to the Wilcoxon variants of the nonparametric tests.

### One-Way ANOVA with Censored Survival Data

- Recall the need to compare four treatments for Herpes.
- Consider comparisons among $K > 2$ groups.
  - $H_0: \lambda_1(t) \equiv \lambda_2(t) \equiv \cdots \equiv \lambda_K(t)$
  - $H_A$: at least one inequality among the $K$ hazards
  - Data at the $i$th observed event time $t_i$ in the pooled sample:

| $t_i$ | Group 1 | 2 | ... | $k$ | ... | $K$ | Total |
|---|---|---|---|---|---|---|---|
| Deaths | $d_{1i}$ | $d_{2i}$ | ... | $d_{ki}$ | ... | $d_{Ki}$ | $d_i$ |
| Survivors | $s_{1i}$ | $s_{2i}$ | ... | $s_{ki}$ | ... | $s_{Ki}$ | $s_i$ |
| Total at risk | $n_{1i}$ | $n_{2i}$ | ... | $n_{ki}$ | ... | $n_{Ki}$ | $n_i$ |

- As with the 2-sample test, we first calculate the differences between the observed number of deaths at time $t_i$ in group $k$ and the expected number under the null hypothesis of no difference in death rates.
- Then we accumulate these differences over the $I$ distinct failure times (2 x $K$ tables) and use a multivariate generalization of $(O - E)^2/V$ to form the test statistic.
- The key quantities for group $k$ are:
  - $O_{ki} = d_{ki}$ (observed)
  - $E_{ki} = n_{ki} d_i / n_i$ (expected)
  - $V_{kki} = \dfrac{n_{ki}(n_i - n_{ki})\, d_i s_i}{n_i^2 (n_i - 1)}$ (variance)
  - $V_{kli} = -\dfrac{n_{ki} n_{li}\, d_i s_i}{n_i^2 (n_i - 1)}$ (covariance with $d_{li}$)

- Accumulate:
  - $O_k - E_k = \sum_{i=1}^{I} (O_{ki} - E_{ki})$
  - $V_{kk} = \sum_{i=1}^{I} V_{kki}$
  - $V_{kl} = \sum_{i=1}^{I} V_{kli}$
- Test statistic:

$T^2 = (O_1 - E_1, \ldots, O_K - E_K)\; V^{-}\; (O_1 - E_1, \ldots, O_K - E_K)^{\mathsf T}, \qquad V = \begin{pmatrix} V_{11} & V_{12} & \cdots & V_{1K} \\ V_{12} & V_{22} & \cdots & V_{2K} \\ \vdots & \vdots & & \vdots \\ V_{1K} & V_{2K} & \cdots & V_{KK} \end{pmatrix}$

- Under the null hypothesis, $T^2 \sim \chi^2_{K-1}$ ($K - 1$ df).
- The negative covariance between $d_{ki}$ and $d_{li}$ comes from conditioning on the marginal totals $(d_i, s_i, n_{1i}, \ldots, n_{Ki})$ of each 2 x $K$ table: lowering $d_{li}$ would mean raising the number of deaths in another group, possibly $k$.
- $V^{-}$ denotes a generalized inverse of the covariance matrix $V$.
- The test statistic may be calculated by dropping the last term $O_K - E_K$ from the $O - E$ vector and the last row and column from $V$, then using the ordinary matrix inverse.
- Note the simply calculated conservative approximation: $\sum_{k=1}^{K} \dfrac{(O_k - E_k)^2}{E_k} \le T^2$.

### Example: CCG803 Study of Childhood ALL

- 268 children with acute lymphoblastic leukemia (ALL)
  - Remission successfully induced with PDN and MTX
  - Randomized to maintenance regimens using 6-MP and MTX, with and without dactinomycin treatment
  - Compare relapse-free survival curves

| Variable | Values/Range |
|---|---|
| ID | 1-300 |
| Institution | 1-20 |
| Age | 0-15 years |
| WBC | White blood count (100's) |
| RX | 0 = No DACT, 1 = DACT |
| Duration | 19998 (duration of F/U, days) |
| Indic | 0 = No relapse (censored), 1 = Relapse (uncensored) |

[Figure: distribution of log WBC.]

`. gen wbcg = 1 + (wbc > 50) + (wbc > 200)`

`. tabulate rx wbcg, col`

| Regimen | WBC group: Low | Med | High | Total |
|---|---|---|---|---|
| No DACT | 36 (41.38%) | 37 (38.95%) | 43 (50.00%) | 116 (43.28%) |
| DACT | 51 (58.62%) | 58 (61.05%) | 43 (50.00%) | 152 (56.72%) |
| Total | 87 | 95 | 86 | 268 |

[Figure: two panels of Kaplan-Meier survival estimates.]

- Chance advantage for DACT: slightly lower WBC.
- WBC appears to have a big effect on relapse rates. Does it?

`. sts test wbcg`
Log-rank test for equality of survivor functions:

| wbcg | Events observed | Events expected |
|---|---|---|
| Low | 43 | 72.63 |
| Med | 66 | 59.92 |
| High | 72 | 48.45 |
| Total | 181 | 181.00 |

chi2(2) = 24.55, Pr>chi2 = 0.0000

`. sts test wbcg, cox`

Cox regression-based test for equality of survival curves:

| wbcg | Events observed | Events expected | Relative hazard |
|---|---|---|---|
| Low | 43 | 72.63 | 0.6336 |
| Med | 66 | 59.92 | 1.1869 |
| High | 72 | 48.45 | 1.6035 |
| Total | 181 | 181.00 | 1.0000 |

LR chi2(2) = 24.95

- LogRank is the score test based on the Cox model.

### Test Rx Effect With and Without Adjustment

`. sts test rx`

Log-rank test for equality of survivor functions:

| rx | Events observed | Events expected |
|---|---|---|
| No DACT | 81 | 68.00 |
| DACT | 100 | 113.00 |
| Total | 181 | 181.00 |

chi2(1) = 4.06, Pr>chi2 = 0.0440

`. sts test rx, strata(wbcg)`

Stratified log-rank test for equality of survivor functions:

| rx | Events observed | Events expected (*) |
|---|---|---|
| No DACT | 81 | 70.04 |
| DACT | 100 | 110.96 |
| Total | 181 | 181.00 |

(*) sum over calculations within wbcg

chi2(1) = 2.91, Pr>chi2 = 0.0882

- Adjustment for the chance WBC advantage of the DACT group raises the p-value.

`. sts test rx, p`

Peto-Peto test for equality of survivor functions:

| rx | Events observed | Events expected | Sum of ranks |
|---|---|---|---|
| No DACT | 81 | 68.00 | 10.087531 |
| DACT | 100 | 113.00 | -10.087531 |
| Total | 181 | 181.00 | 0 |

chi2(1) = 5.11, Pr>chi2 = 0.0237

- Weighting earlier relapses gives a greater Rx effect.
- First evidence that the Rx (or other covariate) effect on the hazard may attenuate with increasing time.

### Test for Trend (Tarone)

- Sometimes the $K$ groups have a natural order based on some scale or dose vector $(x_1, x_2, \ldots, x_K)$.
- Then we are interested in testing whether the hazard functions tend to increase or decrease with dose.
- Hypotheses (generic alternative):
  - $H_0: \lambda_1(t) = \lambda_2(t) = \cdots = \lambda_K(t)$
  - $H_A: \lambda_1(t) \ge \lambda_2(t) \ge \cdots \ge \lambda_K(t)$, or $H_A: \lambda_1(t) \le \lambda_2(t) \le \cdots \le \lambda_K(t)$, with at least one < or >
- More specifically, the test is optimal for testing:
  - $H_0: \lambda_1(t) = \lambda_2(t) = \cdots = \lambda_K(t)$
  - $H_A: c^{x_1}\lambda_1(t) = c^{x_2}\lambda_2(t) = \cdots = c^{x_K}\lambda_K(t)$, $c \neq 1$

Q: Does risk increase or decrease with increasing dose?

- Test statistic (Tarone test for trend): $T_T^2 = \dfrac{\left[\sum_{k=1}^{K} x_k (O_k - E_k)\right]^2}{\sum_{k=1}^{K} x_k^2 V_{kk} + 2\sum_{k<l} x_k x_l V_{kl}}$
- Null distribution: $T_T^2 \sim \chi^2_1$ (single df).
- The numerator is large whenever the $(O - E)$'s have large absolute magnitude for large $x_k$.
- Effectively a test for
regression of $\log \lambda(t)$ on the $x_k$ under the Cox model (coming).

[Figure: three panels of ordered hazard patterns. The Tarone trend test will detect the first two, but not the third.]

### Example: CCG803 Study of Childhood ALL (trend)

`. xi: stcox i.wbcg` (i.wbcg: _Iwbcg_1-3, naturally coded; _Iwbcg_1 omitted)

Cox regression, Breslow method for ties. LR chi2(2) = 24.95, Prob > chi2 = 0.0000

| _t | Haz. Ratio | Std. Err. | z | P>\|z\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| _Iwbcg_2 | 1.873176 | 0.3685644 | 3.19 | 0.001 | 1.273787, 2.75461 |
| _Iwbcg_3 | 2.530743 | 0.4897422 | 4.80 | 0.000 | 1.731913, 3.698028 |

`. stcox wbcg`

LR chi2(1) = 23.88, Prob > chi2 = 0.0000

| _t | Haz. Ratio | Std. Err. | z | P>\|z\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| wbcg | 1.567817 | 0.1456472 | 4.84 | 0.000 | 1.306834, 1.88092 |

- Trend accounts for nearly all the difference between groups.
  - Compare LR values: 23.88 vs. 24.95.
- The Tarone test is the score test for the grouped linear variable.
  - Greater power to detect ordered alternative hypotheses.

### Comparison of Survival Curves: Summary

- Bewildering choice of test statistics, all in STATA
  - LogRank, Peto-Prentice, Fleming-Harrington
- Choice amongst them must be made a priori
  - LogRank is standard, best for a constant hazard ratio
  - Peto-Prentice is good if the hazard ratio decreases in $t$
  - FH(0,3) is OK if the hazard ratio is expected to increase in $t$
- Adjust for grouped confounders by stratification
- Use the trend test if there is a natural order in the groups
- Parametric tests available
  - but not usually recommended, due to possible invalidity and lack of major gains in power

### Binomial Proportions vs. Survival Data

| | Proportions | Survival |
|---|---|---|
| 1. Description | $\hat P$ | $\hat S(t)$, $\hat\lambda(t)$, RR |
| 2. Two-sample hypothesis test | Z test / $\chi^2$ test, Fisher exact | Logrank test, weighted version |
| 3. Stratified two-sample test | Mantel-Haenszel test | Stratified versions |
| 4. K-group heterogeneity test | K-group heterogeneity test (stratified) | K-group heterogeneity test (stratified) |
| 5. K-group trend test | Cochran-Armitage trend test (stratified) | Tarone trend test (stratified) |
| 6. Regression models | Logistic regression | Cox regression |
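
As a companion to the K-sample and trend statistics in this lecture, the following Python sketch accumulates the $O_k - E_k$ vector and covariance matrix $V$ and forms the trend chi-square $[\sum_k x_k (O_k - E_k)]^2 / x^{\mathsf T} V x$. The function name `trend_test` and the three-dose data set are hypothetical (the CCG803 records are not listed in these notes), and every subject's group is assumed to appear in `scores`.

```python
def trend_test(times, events, groups, scores):
    """Trend chi-square (1 df) for ordered groups; scores maps group label -> x_k."""
    labels = sorted(scores)
    K = len(labels)
    oe = [0.0] * K                        # accumulated O_k - E_k
    V = [[0.0] * K for _ in range(K)]     # accumulated covariance matrix
    for t in sorted({t for t, e in zip(times, events) if e}):
        n_k = [sum(1 for u, g in zip(times, groups) if u >= t and g == lab)
               for lab in labels]
        d_k = [sum(1 for u, e, g in zip(times, events, groups)
                   if u == t and e and g == lab) for lab in labels]
        n, d = sum(n_k), sum(d_k)
        for k in range(K):
            oe[k] += d_k[k] - n_k[k] * d / n
        if n > 1:
            c = d * (n - d) / (n * n * (n - 1))
            for k in range(K):
                for l in range(K):
                    if k == l:
                        V[k][l] += c * n_k[k] * (n - n_k[k])   # variance term
                    else:
                        V[k][l] -= c * n_k[k] * n_k[l]         # covariance term
    x = [scores[lab] for lab in labels]
    num = sum(x[k] * oe[k] for k in range(K)) ** 2
    den = sum(x[k] * x[l] * V[k][l] for k in range(K) for l in range(K))
    return num / den

# Hypothetical three-dose example: higher dose, earlier failures.
times = [9, 12, 14, 15, 7, 8, 11, 13, 2, 3, 5, 6]
events = [1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0]
groups = ["low"] * 4 + ["mid"] * 4 + ["high"] * 4
stat = trend_test(times, events, groups, {"low": 1, "mid": 2, "high": 3})
print(round(stat, 3))
```

Two properties are worth noting: with two groups scored 0 and 1 the statistic reduces to the ordinary log-rank chi-square, and adding a constant to all scores leaves it unchanged, because the $O_k - E_k$ sum to zero and each row of $V$ sums to zero.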
## Eighth Lecture (12 February 2008): Cox Regression III

Norman Breslow

### Left Truncation

- Previous examples assumed follow-up of all subjects began at $t = 0$
  - End of primary episode (herpes)
  - Diagnosis or start of treatment (CCGGO4, CCG803)
- Sometimes observation does not begin at $t = 0$ (time zero) for everyone in the study
  - Example 1: Atomic Bombing Survivors Study (B&D II, Appendix IB)
  - Example 2: Welsh Nickel Refiners Study (B&D II, Appendix ID)
  - Example 3: Herpes study with $t = 0$ at beginning of primary episode
- Risk set size may increase as well as decrease over time
- Need modification of the STATA setup to allow $t_{\text{start}}$

### Example 1: Atomic Bombing Survivors Study

[Figure: schematic of design.]

- Time zero is August 1945; time = time since exposure.
- Observation of all subjects began with the 1950 census.
- Subjects who died or moved away before 1950 were not in the sample, so survival times are left truncated at 5 years.

Note how this is different from right censoring:

- With right censoring, subjects with long survival times are in our sample; we just don't get to see exactly what their survival time is.
- With left truncation, subjects with short survival times are not in the sample, and we have no idea how many of them there are.

Implications:

- No way of making inferences about $\lambda(t)$ for $t < 5$ years.
- In a Cox model with $\lambda(t) = \lambda_0(t)\exp(\beta_1 x_1 + \cdots + \beta_K x_K)$, there may be different relationships between $x_1, \ldots, x_K$ and $\lambda(t)$ when $t < 5$ than when $t > 5$
  - but we have no way to detect this.

### Example 2: Welsh Nickel Refiners

- Time = time since beginning of employment
  - Time zero = employee's start date
  - All subjects began employment prior to 1925
- Most subjects entered in 1934, some in 1939, 1944 or 1949, so follow-up times are left truncated
  - but in contrast with Example 1, each subject has their own left truncation time
  - had to have survived from initial employment to beginning of observation
- Q: What
are the implications of this?
- Q: Suppose we ignore it?

[Figure: historical record based on calendar year (1900-1960), and risk set formation on the scale of time since first employment (years).]

- If we ignore left truncation (and treat everyone as on study from $t = 0$), then in the Cox model for $\lambda(t \mid x)$ (hazard for death) the covariates of subject 3, who fails at $t_1$ (see diagram), will be compared to those of all other subjects.
- Q: Is this reasonable?
- Q: Why or why not?

### Example 3: Study of Recurrent Herpes Infection

[Figure: timeline from start of 1st episode through end of 1st episode to next episode. Old time has $t = 0$ at the end of the first episode; new time has $t = 0$ at its start, with $t_{\text{start}}$ equal to the duration of the first episode.]

- Previously considered time from end of first episode to start of next as the basic time variable: entered at $t = 0$.
- Now consider $t$ = time from start of first episode to start of next episode as the basic time variable (analysis time).
  - Event of interest cannot occur until the primary episode is over
  - New time = Duration of first episode + Old time
  - If we calculate the period of risk using staggered entry (left truncation at $t_{\text{start}}$) we avoid "immortal time bias"
  - However, the biological rationale for the new time is not clear

### Dealing with Left Truncation

- At failure time $t_i$, include in the risk set $\mathcal{R}_i$ only subjects who have not yet failed and are under observation
  - i.e., $t_i$ is beyond their left truncation time
- Only this smaller risk set is representative of those at risk of being observed to fail at $t_i$
- With correction for left truncation, risk sets are not necessarily nested and can get bigger as time progresses
- In STATA and other software, specify the entry time $t_{\text{start}}$
- Cannot compensate for selection bias
  - Cohort membership is conditional on survival to $t_{\text{start}}$
  - Statistical inferences apply only to the time period after $t_{\text{start}}$

### Example: South Wales Nickel Refiners (B&D Appendix ID)

- Risk of nasal sinus and lung cancer associated with nickel refining established in the late 1930's
- In 1949, both diseases considered occupational
disease in the UK for men working in a factory where nickel is produced by decomposition of a gaseous compound

Aims of the study:

- Determine whether the risk of carcinoma of the bronchi and nasal sinuses is still present
- Evaluate the effect of age at exposure on cancer risk
- Determine the rate of change in mortality after exposure to the carcinogen ceased

Cohort identified using company paysheets:

- Paysheets examined in April 1934, 39, 44, 49 (later 29)
- Start follow-up when name appeared on second sheet

Follow-up continued through 1981:

- 788/968 had died, only 18 lost to follow-up
- Stop FU at age 85 (misclassification of deaths thereafter)
- Cause of death (nasal sinus or lung cancer, other) from ICD7

Analysis limited to 679 first employed before 1925:

- Nickel carbonyl process moved to Norway about that time
- No deaths from nasal sinus cancer in Welsh workers employed afterwards
- Exposure = years employed in high-risk areas before 1930

Data in Appendix VIII of Breslow & Day (1987):

`. infile ID ICD ExpLev Birth AgeEmp AgeEnter AgeExit using nickelraw.txt`

`. list ID ICD ExpLev Birth AgeEmp AgeEnter AgeExit`

| ID | ICD | ExpLev | Birth | AgeEmp | AgeEnter | AgeExit |
|---|---|---|---|---|---|---|
| 3 | 0 | 5.0 | 1889.0192 | 17.4808 | 45.2273 | 92.9808 |
| 4 | 162 | 5.0 | 1885.9780 | 23.1864 | 48.2684 | 63.2712 |
| 6 | 163 | 10.0 | 1881.2548 | 25.2452 | 52.9917 | 54.1644 |

Examine causes of death:

`. tab ICD`

| ICD | Freq. | Percent | |
|---|---|---|---|
| 0 | 47 | 6.92 | |
| 2 | 5 | 0.74 | |
| 155 | 2 | 0.29 | |
| 156 | 1 | 0.15 | |
| 157 | 2 | 0.29 | |
| 160 | 56 | 8.25 | Nasal sinus cancer |
| 161 | 2 | 0.29 | |
| 162 | 55 | 8.10 | Lung cancer |
| 163 | 82 | 12.08 | Lung cancer |
| 177 | 10 | 1.47 | |

### Risk by Time since 1st Employment

- `. gen TimeExit = min(AgeExit,85) - AgeEmp`
- `. gen TimeEntr = AgeEnter - AgeEmp`
- `. gen Nasal = ICD==160`
- `. gen YFE = Birth + AgeEmp`
- `. twoway scatter TimeEntr YFE`

[Figure: scatter plot of time since first employment at entry (y-axis) against year of first employment, 1905-1925 (x-axis).]

Q: What causes the strange behavior?

### Survival Data in STATA

`stset timevar [if exp] [weight] [, options]`

Options for use with
`stset` in Stata Version 6.0:

- `id(idvar)` specifies the subject-id variable. Needed when there are multiple records per id:
  - Intermittent periods of follow-up
  - Recurrent events
  - Multiple correlated failure times (subject = cluster)
  - Not specifying `id` implies a single record per subject
- `failure(failvar[==numlist])` specifies failures ($\delta = 1$)
  - If only `failure(failvar)` is specified, `failvar` is assumed coded 1 = failure, 0 = censored
- `origin()` determines the origin of the analysis time scale
  - `origin(varname==numlist)` is used only with multiple-record-per-subject data (see the Stata Reference Manual)
  - `origin(time exp)` specifies the origin of analysis time as recorded in `timevar` (default is time 0)
  - `origin(min)` (rarely used) sets the origin to the earliest recorded time - 1
- `scale()` specifies the units of analysis time: t = (time - origin)/scale
  - To change time in days to time in years use `scale(365.25)`
- `enter(varname==numlist | time exp)` specifies the start of the period of risk
- `exit(failure | varname==numlist | time exp)` specifies the end of the period of risk
- `time0(varname)` is now rarely used, since most datasets do not contain this information
- `if()`, `ever()`, `never()`, `after()`, `before()` select relevant records (see Reference Manual)
- `noshow` suppresses the identities of key variables in output

### ST Setup 1 for Nickel Data

`. stset AgeExit, id(ID) failure(ICD==160) origin(time AgeEmp) enter(time AgeEnter) exit(time 85)`

| Setting | Value |
|---|---|
| id | ID |
| failure event | ICD == 160 |
| obs. time interval | (AgeExit[_n-1], AgeExit] |
| enter on or after | time AgeEnter |
| exit on or before | time 85 |
| t for analysis | (time - origin) |
| origin | time AgeEmp |

- 679 total obs., 0 exclusions
- 679 obs. remaining, representing 679 subjects
- 56 failures in single failure-per-subject data
- 15234.08 total analysis time at risk, at risk from t = 0
- earliest observed entry t = 9.3449
- last observed exit t = 67.5192

### ST Setup 2 for Nickel Data

`. recode Nasal 1=0 if AgeExit>85`

`. stset TimeExit, failure(Nasal) enter(TimeEntr)`

- failure event: Nasal != 0 & Nasal < .
- obs. time interval: (0, TimeExit]
- enter on or after: time TimeEntr
- exit on or
before: failure
- 679 total obs., 0 exclusions
- 679 obs. remaining, representing 56 failures in single record/single failure data
- 15234.08 total analysis time at risk, at risk from t = 0
- earliest observed entry t = 9.3449
- last observed exit t = 67.5192

`. label variable TimeExit "Time since First Employment"`

- The two setups are identical.

### Size of Risk Sets for Nickel Workers

`. sts list, at(10(5)55)` (failure _d: Nasal; analysis time _t: TimeExit; enter on or after: time TimeEntr)

| Time | Beg. Total | Fail | Survivor Function | Std. Error | 95% Conf. Int. |
|---|---|---|---|---|---|
| 10 | 26 | 0 | 1.0000 | | |
| 15 | 279 | 0 | 1.0000 | | |
| 20 | 378 | 1 | 0.9965 | 0.0035 | 0.9753, 0.9995 |
| 25 | 494 | 3 | 0.9893 | 0.0054 | 0.9713, 0.9960 |
| 30 | 517 | 16 | 0.9593 | 0.0091 | 0.9372, 0.9738 |
| 35 | 437 | 10 | 0.9397 | 0.0108 | 0.9146, 0.9577 |
| 40 | 340 | 7 | 0.9233 | 0.0123 | 0.8953, 0.9440 |
| 45 | 240 | 5 | 0.9065 | 0.0142 | 0.8744, 0.9307 |
| 50 | 170 | 8 | 0.8710 | 0.0184 | 0.8299, 0.9027 |
| 55 | 97 | 5 | 0.8406 | 0.0223 | 0.7913, 0.8792 |

Note: the survivor function is calculated over the full data and evaluated at the indicated times; it is not calculated from the aggregates shown at left.

### Nasal Sinus Cancer Deaths in Nickel Workers

`. sts graph, na`

[Figure: cumulative hazard of nasal sinus cancer death; x-axis: time since first employment (years).]

`. sts graph, hazard xtitle("Time since first employment (years)")`

[Figure: smoothed hazard estimate; x-axis: time since first employment (years).]

- Fit Cox model: terms for exposure, age and year of first employment (YFE)

- `. gen YFEC = (YFE-1915)/10`
- `. gen YFECSQ = (YFE-1915)^2/100`
- `. gen log...` (two such commands, for `logAFE` and `logExp`)
- `. stcox logAFE YFEC YFECSQ logExp, nohr basehc(H0) basech(CH0)`

failure _d: Nasal; analysis time _t: TimeExit; enter on or after: time TimeEntr

No. of subjects = 679, No. of failures = 56, LR chi2(4) = 85.51, Prob > chi2 = 0.0000, Log likelihood = -284.45116

| _t | Coef. | Std. Err. | z | P>\|z\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| logAFE | 2.223047 | 0.4369308 | 5.09 | 0.000 | 1.366679, 3.079416 |
| YFEC | -0.0945044 | 0.3157334 | -0.30 | 0.765 | -0.7133306, 0.5243218 |
| YFECSQ | -1.255019 | 0.5061985 | -2.48 | 0.013 | -2.24715, -0.2628883 |
| logExp | 0.7660638 | 0.1739724 | 4.40 | 0.000 | 0.4250841, 1.107043 |
(Upper 95% limit for `logExp`: 1.107043.)

- All covariates highly significant

`. stcurve, cumhaz xtitle("Time since first employment (years)")`

[Figure: Cox proportional hazards regression, cumulative hazard; x-axis: time since first employment (years).]

`. stcurve, hazard xtitle("Time since first employment (years)")`

[Figure: Cox proportional hazards regression, smoothed hazard function; x-axis: time since first employment (years).]

- Check to see if the quadratic fit in YFE can be improved

`. fracpoly stcox YFE logAFE logExp, nohr`

- `-> gen double IlogA__1 = logAFE-2.703311075 if e(sample)`
- `-> gen double IlogE__1 = logExp-.8486980027 if e(sample)`
- `-> gen double IYFE__1 = X^3-7041820.961 if e(sample)`
- `-> gen double IYFE__2 = X^3*ln(X)-37010349.45 if e(sample)`

where X = YFE/10

Log likelihood = -284.4528

| _t | Coef. | Std. Err. | z | P>\|z\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| IYFE__1 | 0.0244126 | 0.0098502 | 2.48 | 0.013 | 0.0051065, 0.0437187 |
| IYFE__2 | -0.0043687 | 0.0017627 | -2.48 | 0.013 | -0.0078235, -0.0009139 |
| IlogA__1 | 2.223053 | 0.4369175 | 5.09 | 0.000 | 1.366711, 3.079396 |
| IlogE__1 | 0.766155 | 0.1739636 | 4.40 | 0.000 | 0.4251926, 1.107117 |

Deviance: 575.54. Best powers of YFE among 44 models fit: 3 3.

- No: the log likelihood is almost identical to that of the first model.

`. fracplot`

[Figure: fractional polynomial (3 3), adjusted for covariates; x-axis: year of first employment, 1905-1925.]

### Choice of Time Variable

- Time since first employment (TFE) is more appropriate than age as the time variable for nasal sinus cancer deaths in nickel workers
  - TFE is a surrogate for time since onset of exposure
  - Virtually no cases expected in the absence of exposure
  - Consider duration of time since onset of exposure to the causal agent as the most relevant time scale
  - Usual reasons for regarding age as the key explanatory variable
absent
- However, older age at 1st exposure strongly modified risk
- Often there are multiple time scales to choose from
  - Age (time since birth)
  - Time since onset of exposure
  - Time since diagnosis or treatment
  - Time since study enrollment
- Farewell & Cox (Bioka 1979), "A Note on Multiple Time Scales in Life Testing"
  - "If one time scale is clearly of primary practical importance then others can, if necessary, be introduced into a model as time dependent covariates."
  - "We look for a time scale that accounts for as much of the variation as possible."
- Breslow et al. (JASA 1983), "Multiplicative Models and Cohort Analysis"
  - "A major source of confusion about the proper application of Cox's model to cohort data relates to the choice of appropriate time variable $t$ in the basic model $\lambda(t \mid x)$. One approach is to treat follow-up or time on study as the fundamental time variable, controlling the effects of age and calendar year based on a subject's status at entry into the study. This is sensible in clinical trials. It ensures that risk sets are decreasing in size as $t$ increases."
  - "However, study time is often an inappropriate choice for the time variable in cohort studies for two reasons. First, since death rates from the major diseases of interest rise rapidly with age, age effects should be controlled as precisely as possible. Second, many exposures are measured imperfectly, so that duration of employment and therefore time on study are highly correlated with, and may to some extent serve as surrogate measures for, the cumulative exposures. In such cases, controlling for time on study in the analysis may mask the very effects one is attempting to uncover."
  - "The approach that we suggest is to consider $t$ = age as the underlying time variable and to control for secular trends by time dependent strata."
- Korn, Graubard & Midthune (AJE 1997), "Time-to-event Analysis of Longitudinal Follow-up of a Survey:
Choice of the Time-scale
  - Of 14 papers conducting NHANES follow-up analyses, 12 used time-on-study as the time scale
  - Well-known identifiability problems associated with untangling the effects of age, birth cohort and calendar period (choose two only)
  - Because the greatest flexibility of the PH model is due to not having to specify the form of $\lambda_0(t)$, it is best to use this function to model the variable that is expected to have the largest effect on the hazard, in this case age. In other biomedical applications, $\lambda_0(t)$ would typically be modeled as a function of variables other than age. For example, in a randomized clinical trial or in a natural history study of disease, time since randomization or diagnosis would be used. With follow-up of a healthy population, however, we believe age will be the most appropriate time scale for most outcomes.

Cumulative Risk of Death from Mesothelioma among North American Insulation Workers
Peto J et al. (1982), Br J Cancer 45:124-35
[Figure: cumulative risk vs. time since first exposure (years), with separate line types (dash-dot-dash, solid, dotted) by age at hire]

BIOST/EPI 537: Survival Data Analysis in Epidemiology
Norman Breslow
Thirteenth Lecture: Poisson Regression
11 March 2008

Poisson Distribution
$X \sim P(\mu)$ is a distribution for the number of events occurring over an interval of time.
Assumptions:
1. For a short interval $(t_0, t_1)$:
   $\Pr(1 \text{ event in } (t_0, t_1)) \approx \lambda (t_1 - t_0)$
   $\Pr(0 \text{ events in } (t_0, t_1)) \approx 1 - \lambda (t_1 - t_0)$
   $\Pr(>1 \text{ event in } (t_0, t_1)) \approx 0$
2. $\lambda$ constant across time
3. Occurrence of each event is independent of the others
Connection with exponential distribution:
o Number of recurrent events over a fixed interval of time under constant hazard (exponential) model has Poisson distribution

Poisson Distribution
Example: Calls arrive at switchboard, rate $\lambda$ per unit time.
For an interval of time $(t_0, t_1)$, no longer necessarily small:
   $X$ = number $k$ of events (calls) during $(t_0, t_1)$, $k = 0, 1, 2, \ldots$
   $X \sim P(\mu)$ where $\mu = \lambda (t_1 - t_0)$
   $E(X) = \mathrm{Var}(X) = \mu$
Probability function:
   $\Pr(X = k) = \dfrac{\mu^k e^{-\mu}}{k!}, \quad k = 0, 1, 2, \ldots$

Poisson Distribution
Normal Approximation to Poisson Distribution
o For large $\mu$:
   $F(x) = \Pr(X \le x) = \sum_{k=0}^{x} \dfrac{\mu^k e^{-\mu}}{k!} \approx \Pr\!\left(Z \le \dfrac{x - \mu}{\sigma}\right)$
o See plots: $X \approx N(\mu, \sigma^2 = \mu)$
o MLE: $\hat\mu = \bar{X}$ for sample of $X_1, \ldots, X_n \sim P(\mu)$

Poisson Distribution
Poisson Approximation to Binomial Distribution
o Suppose $Y \sim \mathrm{bin}(n, p)$ for large $n$ and small $p$
o Then $Y \approx P(\mu = np)$
  - $E(Y) = np$
  - $\mathrm{Var}(Y) = np(1 - p) \approx np$ for small $p$
Example:
o $Y$ = number of diagnoses of rare disease in large population

[Figure: Normal approximation to Poisson, mu = 4]

Poisson Regression
Let $i$ refer to cell defined by covariate levels and time interval:

              i = 1                 i = 2                 etc.
  Example:    age 20-30             age 30-40
              height < 60           height < 60
              weight < 100 lbs      weight < 100 lbs
              1st time interval     1st time interval

Data: $d_i$ = number of events, $N_i$ = person-time in $i$th cell
Want to know how $\lambda_i$ depends on
o Time interval: $x_{T2} = 1$ if 2nd time interval, 0 else; $x_{T3} = 1$ if 3rd time interval, 0 else; ...
o Grouped covariate values $x_1, \ldots, x_K$

Poisson Regression
Model:
   $\log \lambda_i = \alpha_1 + \alpha_2 x_{T2} + \cdots + \alpha_q x_{Tq} + \beta_1 x_1 + \cdots + \beta_K x_K$
like Cox model
   $\log \lambda(t) = \log \lambda_0(t) + \beta_1 x_1 + \cdots + \beta_K x_K$
except baseline hazard is piecewise constant

Poisson Regression
For analysis assume
o $d_i \sim \mathrm{Poisson}(\mu_i = N_i \lambda_i)$
o $\log \lambda_i = \alpha_1 + \alpha_2 x_{T2,i} + \cdots + \alpha_q x_{Tq,i} + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki}$
o $\log \mu_i = \log N_i + \log \lambda_i$
o Note that $x_T$ terms also depend on $i$
In Stata use
o exposure(N_i) for rate multiplier, or equivalently
o offset(log N_i) for offset

Poisson Regression
Example: Welsh Refiners. Grouped data: Appendix VII
o $d_i$ = number of deaths in $i$th combination of categories of AFE x YFE x EXPG x TFE
o 4 x 4 x 5 x 5 = 400 possible cells
o 242 have PY contributions
Data:
o 242 $d_i$'s (some can be 0): D in Stata
o 242 $N_i$'s (all > 0): Y in Stata
o Discrete covariate levels for each $i$
o Time (TFE) interval for each $i$

Inference for Poisson Regression
Model:
o $d_i \sim \mathrm{Poisson}(\mu_i = N_i \lambda_i)$
o $\log \lambda_i = \alpha_1 + \alpha_2 x_{T2} + \cdots + \alpha_q x_{Tq} + \beta_1 x_1 + \cdots + \beta_K x_K$
Since Poisson likelihood is same as that of exponential regression for censored survival data under model of constant hazards within
covariate-time intervals, can use ordinary likelihood-based inference
Coefficients: $\alpha_1, \ldots, \alpha_q, \beta_1, \ldots, \beta_K$
o All estimated by maximum likelihood (MLEs)

Inference for Poisson Regression
Confidence Intervals: SEs from inverse information matrix
o $100(1-\alpha)\%$ CI for $\beta_k$: $\left(\hat\beta_k - z_{\alpha/2}\,\mathrm{SE}(\hat\beta_k),\ \hat\beta_k + z_{\alpha/2}\,\mathrm{SE}(\hat\beta_k)\right)$
o $100(1-\alpha)\%$ CI for $e^{\beta_k}$: $\left(e^{\hat\beta_k - z_{\alpha/2}\,\mathrm{SE}(\hat\beta_k)},\ e^{\hat\beta_k + z_{\alpha/2}\,\mathrm{SE}(\hat\beta_k)}\right)$
Tests of $H_0: \beta_{p+1} = \cdots = \beta_K = 0$
o Wald test ($p = K - 1$): $\hat\beta_K / \mathrm{SE}(\hat\beta_K) \sim N(0, 1)$
o Score test: $\sim \chi^2_{K-p}$
o LR test: $\mathrm{Deviance}(H_0) - \mathrm{Deviance}(H_1) \sim \chi^2_{K-p}$
o All "$\sim$" hold for large $n$ (number of events) under $H_0$

Poisson Regression & Nickel Workers
o stset using nasal sinus cancer deaths (ICD 160) as endpoint

  . set mem 10m
  . use nickel
  . stset YrExit, id(ID) failure(ICD==160) enter(time YrEnter)

o Note use of calendar year as time variable for stset, but will not be used for analysis time
o Split records three times: age, year, time since hire

  . stsplit age=Birth, at(10(5)85)
  (3051 observations (episodes) created)
  . stsplit year, at(1931(5)1986)
  (3153 observations (episodes) created)
  . stsplit TFE=YrEmp, at(20(10)50)
  (1392 observations (episodes) created)

Poisson Regression & Nickel Workers
o Could split on TFE (time since first employment) only
  - Splits on age and year needed only for calculation of expected numbers based on standard rates for England & Wales
o Generate discrete fixed covariates
  - Age at first employment (AFE)
  - Calendar year at first employment (YFE)
  - Exposure group (EXPG)

  . gen YFE = (YrEmp>1909.9999)+(YrEmp>1914.9999)+(YrEmp>1919.9999)
  . gen EXPG = (ExpLev>0)+(ExpLev>4)+(ExpLev>8)+(ExpLev>12)
  . gen AFE = (AgeEmp>19.999)+(AgeEmp>27.4999)+(AgeEmp>34.9999)

Poisson Regression & Nickel Workers
o Table nasal sinus cancer deaths and person-years and save in file Nasal

  . strate AFE YFE EXPG TFE, output(Nasal, replace)
  (8275 records included in the analysis)

  AFE  YFE  EXPG  TFE   D         Y       Rate       Lower       Upper
    0    0     1   20   0    1.8330    0.0e+00
    0    0     1   30   0   10.0000    0.0e+00
    0    0     1   40   0    0.8739    0.0e+00
    0    0     2   20   0    7.2073    0.0e+00
    0    0     2   30   0   30.0000    0.0e+00
    0    1     1    0   0    0.9154    0.0e+00
    0    1     1   20   1   80.7985   .0123765   .0017434   .0878615
    0    1     1   30   0   81.6233    0.0e+00
    0    1     1   40   1   46.0958   .0216939   .0030559   .1540068
    0    1     1   50   0   27.1653    0.0e+00

o Compare B&D II, Appendix VII: nearly the same

Poisson Regression & Nickel Workers
[Image of B&D II, Appendix VII: summary data for Welsh nickel workers]

Poisson Regression & Nickel Workers
o Fit exponential regression (constant hazard) model

  . xi: streg i.AFE i.YFE i.EXPG i.TFE, nohr distribution(exponential)

  No. of subjects =   679                Number of obs =   8275
  No. of failures =    56
  Time at risk    = 15234.09473
                                         LR chi2(14)   = 106.83
  Log likelihood  = -107.71935           Prob > chi2   = 0.0000

        _t |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   _IAFE_1 |  1.533027   .7533013    2.04   0.042    .0565838    3.009471
   _IAFE_2 |  2.241806   .7644972    2.93   0.003     .743419    3.740193
   _IAFE_3 |  3.536554   .7893556    4.48   0.000    1.989446    5.083663
   _IYFE_1 |  .8986812   .3821842    2.35   0.019    .1496139    1.647748
   _IYFE_2 |  .9009069    .514299    1.75   0.080   -.1071006    1.908914
   _IYFE_3 | -.1421269   .5240329   -0.27   0.786   -1.169213    .8849588
  _IEXPG_1 |  .8231689   .4026445    2.04   0.041    .0340001    1.612338
  _IEXPG_2 |   1.12435   .4704719    2.39   0.017    .2022422    2.046458
  _IEXPG_3 |  2.214087    .511871    4.33   0.000    1.210838    3.217335
  _IEXPG_4 |  2.806093   .5692079    4.93   0.000    1.690466     3.92172
  _ITFE_20 |  1.606256   1.048757    1.53   0.126   -.4492707    3.661782
  _ITFE_30 |  1.898862   1.057205    1.80   0.072   -.1732207    3.970946
  _ITFE_40 |  2.570065   1.074142    2.39   0.017    .4647861    4.675344
  _ITFE_50 |  3.182741   1.124967    2.83   0.005    .9778462    5.387636
     _cons | -10.71707    1.37474   -7.80   0.000   -13.41151   -8.022626

Poisson Regression & Nickel Workers
o Look at structure of grouped data created by strate

  . clear
  . use Nasal
  . sum

  Variable |  Obs       Mean    Std. Dev.        Min        Max
       YFE |  242    1.38843    1.084572          0          3
      EXPG |  242   1.442149    1.211443          0          4
       AFE |  242   1.380165    1.032711          0          3
       TFE |  242   28.09917     15.5034          0         50
         D |  242    .231405    .5114511          0          2
         Y |  242    62.9508    98.36747   .0384521   677.4666

o D contains number of nasal sinus cancer deaths and Y the person-years observation time in each of 242 cells defined by three discrete covariates and intervals of TFE

Poisson Regression & Nickel Workers
o Now fit Poisson regression model to grouped data

  . xi: poisson D i.AFE i.YFE i.EXPG i.TFE,
  >     exposure(Y)

  Poisson regression                     Number of obs =    242
                                         LR chi2(14)   = 106.83
                                         Prob > chi2   = 0.0000
  Log likelihood = -124.66511            Pseudo R2     = 0.2999

         D |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   _IAFE_1 |  1.533027   .7533013    2.04   0.042    .0565838    3.009471
   _IAFE_2 |  2.241806   .7644972    2.93   0.003     .743419    3.740193
   _IAFE_3 |  3.536554   .7893556    4.48   0.000    1.989446    5.083663
   _IYFE_1 |  .8986812   .3821842    2.35   0.019    .1496139    1.647748
   _IYFE_2 |  .9009069    .514299    1.75   0.080   -.1071006    1.908914
   _IYFE_3 | -.1421269   .5240329   -0.27   0.786   -1.169213    .8849588
  _IEXPG_1 |  .8231689   .4026445    2.04   0.041    .0340001    1.612338
  _IEXPG_2 |   1.12435   .4704719    2.39   0.017    .2022422    2.046458
  _IEXPG_3 |  2.214087    .511871    4.33   0.000    1.210838    3.217335
  _IEXPG_4 |  2.806093   .5692079    4.93   0.000    1.690466     3.92172
  _ITFE_20 |  1.606256   1.048757    1.53   0.126   -.4492707    3.661782
  _ITFE_30 |  1.898862   1.057205    1.80   0.072   -.1732207    3.970946
  _ITFE_40 |  2.570065   1.074142    2.39   0.017    .4647861    4.675344
  _ITFE_50 |  3.182741   1.124967    2.83   0.005    .9778462    5.387636
     _cons | -10.71707    1.37474   -7.80   0.000   -13.41151   -8.022626
         Y | (exposure)

Poisson Regression & Nickel Workers
o Voila! Results identical to those obtained using parametric (exponential) regression
  - Poisson regression uses 242 records of numbers of events and person-years in distinct covariate categories
  - Exponential regression uses 8275 split time records for 679 subjects (same discrete covariates)
o Two likelihoods, as functions of unknown parameters, are identical up to a constant multiplier
  - Hence MLEs and all likelihood inference on parameters identical

Poisson Regression & Calculated Rates
o Fit Poisson model with TFE alone

  . gen PY1000 = Y/1000
  . xi: poisson d i.TFE, exposure(PY1000)

  Poisson regression                     Number of obs =    242
                                         LR chi2(4)    =  15.13
                                         Prob > chi2   = 0.0044
  Log likelihood = -170.50854            Pseudo R2     = 0.0425

         d |     Coef.   Std. Err.       z    P>|z|
  _ITFE_20 |  2.330946   1.025978    2.272   0.023
  _ITFE_30 |  2.318352   1.028992    2.253   0.024
  _ITFE_40 |  2.614771   1.037749    2.520   0.012
  _ITFE_50 |  2.668692   1.080123    2.471   0.013
     _cons | -.9505651          1   -0.951   0.342
    PY1000 | (exposure)

Poisson Regression & Calculated Rates
o Fitted rates per 1000 PY

  Baseline (0-19 yrs TFE):  exp(-.95056)           = 0.3865
  20-29 yr TFE:             exp(-.95056 + 2.33095) = 3.9764
  30-39 yr TFE:             exp(-.95056 + 2.31835) = 3.9267
  40-49 yr TFE:             exp(-.95056 + 2.61477) = 5.2815
  50+ yr TFE:               exp(-.95056 + 2.66869) = 5.5741

o Rates identical to those from elementary calculation

  . strate TFE, per(1000)
  Estimated rates (per 1000) and lower/upper bounds of 95% confidence intervals
  (8275 records included in the analysis)

  TFE    D        Y       Rate      Lower      Upper
    0    1   2.5872   0.386523   0.054447   2.743951
   20   19   4.7782   3.976414   2.536370   6.234056
   30   17   4.3294   3.926653   2.441045   6.316394
   40   13   2.4614   5.281476   3.066723   9.095701
   50    6   1.0779   5.566228   2.500687    1.2e+01

Poisson vs Cox Regression for Nickel Workers
o Compare Poisson with Cox regression using continuous time since first employment as analysis time

  . use nickel
  . stset YrExit, failure(ICD==160) enter(time YrEnter) exit(time 1982)
  >     origin(time YrEmp)

  obs. time interval:  (origin, YrExit]
   enter on or after:  time YrEnter
   exit on or before:  time 1982
      t for analysis:  (time-origin)
              origin:  time YrEmp

         56  failures in single record/single failure data
   15232.58  total analysis time at risk, at risk from t = 0
                         earliest observed entry t = 9.344849
                              last observed exit t = 67.51917

Poisson vs Cox Regression for Nickel Workers

  . xi: stcox i.AFE i.YFE i.EXPG, nohr basehc(haz)

  Log likelihood = -280.59005

        _t |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   _IAFE_1 |  1.480452   .7528414    1.97   0.049    .0049098    2.955994
   _IAFE_2 |  2.208312   .7622113    2.90   0.004    .7144052    3.702219
   _IAFE_3 |   3.63623    .790407    4.60   0.000     2.08706    5.185399
   _IYFE_1 |   1.02979   .3803088    2.71   0.007    .2843985    1.775182
   _IYFE_2 |   1.11191   .5136654    2.16   0.030    .1051446    2.118676
   _IYFE_3 |  .0101656   .5266777    0.02   0.985   -1.022104    1.042435
  _IEXPG_1 |  .8837732   .4042418    2.19   0.029    .0914738    1.676073
  _IEXPG_2 |  1.186168   .4721537    2.51   0.012    .2607636    2.111572
  _IEXPG_3 |  2.300019   .5166391    4.45   0.000    1.287425    3.312613
  _IEXPG_4 |  2.842197   .5721708    4.97   0.000    1.720763    3.963631

o Compare Table 5.10, B&D II (1987)

Poisson vs Cox Regression for Nickel Workers

  . stcurve, haz title("Mortality from Nasal Sinus Cancer")
  >     xtitle("Time from first employment (years)")

[Figure: baseline hazard, "Mortality from Nasal Sinus Cancer"; x-axis: time from first employment (years)]
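
The fitted rates per 1000 person-years quoted above for the TFE-only Poisson model can be checked by hand. The following is a minimal sketch in Python (not Stata, so that it runs without the course datasets), using only the coefficients and the strate counts printed in the slides. The dictionary names are illustrative and not part of the original analysis.

```python
import math

# Coefficients copied from the TFE-only Poisson output (log rate per 1000 PY).
cons = -0.9505651
tfe_coef = {
    "0-19": 0.0,        # baseline category, no indicator
    "20-29": 2.330946,
    "30-39": 2.318352,
    "40-49": 2.614771,
    "50+": 2.668692,
}

# Deaths (D) and person-years in thousands (Y) copied from `strate TFE, per(1000)`.
deaths = {"0-19": 1, "20-29": 19, "30-39": 17, "40-49": 13, "50+": 6}
pyears_1000 = {"0-19": 2.5872, "20-29": 4.7782, "30-39": 4.3294,
               "40-49": 2.4614, "50+": 1.0779}

# Model-based rate: exp(intercept + category coefficient).
fitted = {k: math.exp(cons + b) for k, b in tfe_coef.items()}

# Elementary rate: events divided by person-time.
crude = {k: deaths[k] / pyears_1000[k] for k in deaths}

for k in fitted:
    print(f"{k:>6}  fitted {fitted[k]:.4f}   crude D/Y {crude[k]:.4f}")
```

With a single categorical covariate the Poisson model is saturated in that covariate, so each fitted rate is just D/Y for its category. The two columns agree closely, with small differences reflecting the rounding of Y in the printed table.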
Quick Review, Part II
o Left truncation
  - Specify $t_0$ as well as $t$ and $\delta$
  - Immortal time bias
  - Choice of time variable
o Survival model checking
  - Proportional hazards: Schoenfeld residuals
  - Functional form for X: martingale residuals
  - Outliers: deviance residuals
  - Influence: $\Delta\beta$

Quick Review, Part II
o Model choice depends on goals
  - Confirmatory: a priori hypothesis
  - Exploratory / descriptive: generate new hypothesis
o Time-dependent covariates
  - $\beta(t)$: tvc
  - $X(t)$: split records
  - External vs internal

Quick Review, Part II
o Grouped survival data
  - Discrete covariates
  - Discrete time: splitting records into segments
  - Person-years and numbers of events
  - Standard rates and SMRs
  - Poisson regression

BIOST/EPI 537: Survival Data Analysis in Epidemiology
Norman Breslow
Sixth Lecture: Cox Regression I
29 and 31 January 2008

Cox Regression Model
Introduction
o Cox (1972) model
o Proportional hazards (PH) assumption
o Hazard model
o Log hazard, survival models
o Interpretation of coefficients
o Examples

Cox Regression Model
Estimation
o Regression coefficients
  - Partial likelihood
o Approximations for tied failure times
o Cumulative baseline hazard and hazard curves
o Survival curve
Stratification
o True stratification
o Using covariates

Cox, JRSSB 1972
o D.R. Cox (1972), Regression models and life tables (with discussion), J. Roy. Statist. Soc. Ser. B 34:187-220
o The present paper is largely concerned with the extension of the results of Kaplan and Meier to the comparison of life tables and more generally to the incorporation of regression-like arguments into life-table analysis (p. 187)
Model proposed: $\lambda(t \mid x) = \lambda_0(t) \exp(x^T \beta)$
o ... we shall, however, concentrate on exploring the consequence of allowing $\lambda_0(t)$ to be arbitrary, main interest being in the
regression parameters (p. 190)

Cox, JRSSB 1972
Spawned new field: semi-parametric inference (Wellner)
o Parametric part: regression coefficients $\beta$
o Nonparametric part: cumulative hazard $\Lambda_0(t)$
Conditional likelihood, later called partial likelihood
o Cox, Biometrika 1975
Score test = LogRank test (Peto & Peto, JRSS A 1972)
o Used Gehan's leukemia data (Lect. notes p. 5-2) for illustration

Cox, JRSSB 1972
Discussion: 14 contributions
o Second vote of thanks by Mr. Richard Peto (Oxford U.): I have greatly enjoyed Professor Cox's paper. It seems to me to formulate and to solve the problem of regression of prognosis on other factors perfectly, and it is very pretty.
o #13 Kalbfleisch and Prentice: conditional likelihood not true conditional probability
o #14 Breslow: derived $\hat\beta$ from profile likelihood, yielding $\hat\Lambda_0$
  - Breslow estimator of baseline cumulative hazard
  - Used CCG803 leukemia data (Lect. notes p. 5-47) for illustration

Cox, JRSSB 1972
Impact
o Science Citation Index: 22,632 citations (26 Jan 2008)
  - $\approx$ 700 per year since 2005
o Sir David knighted in 1985 in recognition of scientific contributions
o Received General Motors Kettering Prize in 1990: contributions to clinical cancer research for development of the Proportional Hazards Regression Model

[Photo: Bordeaux, France, 13 September 2001]

Cox Regression Model
Response variable:
o Observed: $(Y_i, \delta_i)$, $i = 1, \ldots, n$
o Of interest: $T$
  - Survival function $S(t)$
  - Hazard function $\lambda(t)$
Observed covariates: $X_1, X_2, \ldots, X_K$
o For $j$th subject observe $(Y_j, \delta_j, X_{1j}, X_{2j}, \ldots, X_{Kj})$
Idea: same as other regression models. Model relates the covariates $X_1, \ldots, X_K$ to distribution ($S(t)$ or $\lambda(t)$) of the response variable $T$.

Cox Regression Model
   $\lambda(t \mid x) = \lambda(t \mid x_1, x_2, \ldots, x_K) = \lambda_0(t) \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K)$
Alternatively:
o $\log \lambda(t \mid x_1, x_2, \ldots, x_K) = \log \lambda_0(t) + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K$, or
o $S(t \mid x_1, x_2, \ldots, x_K) = S_0(t)^{\exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K)}$
Note definition of baseline hazard and survival:
o $\lambda_0(t) = \lambda(t \mid x_1 = x_2 = \cdots = x_K = 0)$
o $S_0(t) = S(t \mid x_1 = x_2 = \cdots = x_K = 0)$

Interpreting Cox Regression Coefficients
Proportional hazards:
   $RR = \dfrac{\lambda(t \mid x_1, x_2, \ldots, x_K)}{\lambda(t \mid x_1 = 0, x_2 = 0, \ldots, x_K = 0)} = \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K)$
RR above is rate ratio of death (failure) comparing subjects with covariate values $(x_1, x_2, \ldots, x_K)$ to subjects with covariate values $(0, 0, \ldots, 0)$
o Also known as relative risk
  - Some epidemiologists prefer to use relative risk to describe the ratio of two probabilities, a summary measure of effect that has little to recommend it (range restricted by baseline probability)

Interpreting Cox Regression Coefficients
In general, under the PH model, $\beta_k$ is the log RR (log hazard ratio) for a unit change in $x_k$, assuming all other covariates remain constant (i.e., the same for the groups compared):
   $\dfrac{\lambda(t \mid x_1, \ldots, x_k + 1, \ldots, x_K)}{\lambda(t \mid x_1, \ldots, x_k, \ldots, x_K)} = \dfrac{\lambda_0(t) \exp(\beta_1 x_1 + \cdots + \beta_k (x_k + 1) + \cdots + \beta_K x_K)}{\lambda_0(t) \exp(\beta_1 x_1 + \cdots + \beta_k x_k + \cdots + \beta_K x_K)} = \exp(\beta_k)$

Interpreting Cox Regression Coefficients
The RR comparing 2 sets of values for the covariates, i.e. RR comparing $(x_1, x_2, \ldots, x_K)$ to $(x_1', x_2', \ldots, x_K')$, is
   $RR(x \text{ vs } x') = \dfrac{\lambda(t \mid x_1, x_2, \ldots, x_K)}{\lambda(t \mid x_1', x_2', \ldots, x_K')} = \exp\left(\beta_1 (x_1 - x_1') + \beta_2 (x_2 - x_2') + \cdots + \beta_K (x_K - x_K')\right)$
which involves the differences $x_k - x_k'$

Interpreting Cox Regression Coefficients
Log transform of continuous covariate usually appropriate
o $RR(x) = \exp(\beta x)$ $\Rightarrow$ too steep dose-response curve
o $RR(x) = \exp(\beta \log x) = x^\beta$: risk increases as power of $x$
  - Suggested by mathematical models of carcinogenesis when $x$ is duration or intensity of exposure
o Effect of doubling dose from $x$ to $2x$ easily determined:
   $RR(2x \text{ vs } x) = e^{\beta[\log(2x) - \log(x)]} = e^{\beta \log 2} = 2^\beta$
In article can state: Each P percent increase in dose was estimated to increase the failure rate by $[(1 + P/100)^\beta - 1] \times 100$ percent. Why?

Cox Model Examples
Example 1. One dichotomous covariate: $x_E = 1$ exposed, $0$ nonexposed
   $\lambda(t) = \lambda_0(t) e^{\beta x_E}$
   $\log \lambda(t) = \log \lambda_0(t) + \beta x_E$

Example 2. Two covariates: dichotomous exposure $x_E$ as above; dichotomous confounder $x_C = 1$ level 2, $0$ level 1
   $\log \lambda(t) = \log \lambda_0(t) + \alpha x_C + \beta x_E$

Example 3. Two dichotomous covariates with interaction
   $\log \lambda(t) = \log \lambda_0(t) + \alpha x_C + \beta x_E + \gamma x_C x_E$

Example 4. One continuous covariate $x_D$
   $\log \lambda(t) = \log \lambda_0(t) + \beta x_D$

Example 5. K-sample heterogeneity (K = 4)
   $\lambda(t) = \lambda_0(t) e^{\beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4}$
   $x_2 = 1$ group 2, $0$ else; $x_3 = 1$ group 3, $0$ else; $x_4 = 1$ group 4, $0$ else
   $\log \lambda(t) = \log \lambda_0(t) + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$

Example 6. K-sample trend (K = 4)
   $\lambda(t) = \lambda_0(t) e^{\beta x_D}$
   $x_D = 0$ group 1, $1$ group 2, $2$ group 3, $3$ group 4
   $\log \lambda(t) = \log \lambda_0(t) + \beta x_D$

Comments on Cox Model Examples
o In each example log hazard functions are parallel: constant difference between each covariate group
  - $\Rightarrow$ constant ratio of hazards in time for each pair of covariate groups
o For any likelihood-based inference there are three possible tests of hypothesis: likelihood ratio, score and Wald
o The score tests of $\beta = 0$ in the Cox model are:
  Example 1: LogRank
  Example 5: K-sample heterogeneity (generalizes LogRank)
  - $H_0: \beta_2 = \beta_3 = \beta_4 = 0$
  Example 6: Tarone's trend test

Partial Likelihood
   $\lambda(t \mid x) = \lambda(t \mid x_1, \ldots, x_K) = \lambda_0(t) \exp(\beta_1 x_1 + \cdots + \beta_K x_K)$
Ordered data (assuming no ties in observed failure times):
o $t_i$ is $i$th ordered failure time
o $x_i = (x_{1i}, x_{2i}, \ldots, x_{Ki})$: covariate vector for subject who fails at $t_i$
o $R_i$: risk set at $t_i$, all subjects with $Y_j \ge t_i$
Partial likelihood (no ties) is
   $PL(\beta) = \prod_{i=1}^{I} \dfrac{\lambda_0(t_i)\, e^{\beta_1 x_{1i} + \cdots + \beta_K x_{Ki}}}{\sum_{j \in R_i} \lambda_0(t_i)\, e^{\beta_1 x_{1j} + \cdots + \beta_K x_{Kj}}}$
where the product is over the $I$ observed failure times $t_i$
o Data at each $t_i$ conditionally independent of past

Partial Likelihood
Estimating Equations
   $\dfrac{\partial \log PL(\beta)}{\partial \beta} = \sum_{i=1}^{I} \left[ x_i - \sum_{j \in R_i} w_{ij} x_j \right] = 0$
where $w_{ij} = \dfrac{e^{x_j \beta}}{\sum_{\ell \in R_i} e^{x_\ell \beta}}$, so that $\sum_{j \in R_i} w_{ij} x_j$ is the weighted (by $RR = e^{x_j \beta}$) sum of $x$'s in the $i$th risk set
o Partial likelihood compares covariates of those who fail at $t_i$ to covariates of those who survive beyond $t_i$
o If a covariate $x_k$ is consistently larger (smaller) for the subject failing at $t_i$ than for the rest of $R_i$, $\hat\beta_k$ will be positive (negative)

Risk Set Construction
Failure times: $t_1 = 1$, $t_2 = 3$, $t_3 = 4$, $t_4 = 6$
Withdrawals: $t = 2$, $t = 7$
[Figure: risk sets over study time]

Partial Likelihood
Cox (1972) Justification:
o No information can be contributed about $\beta$ by time intervals in which no failures occur because the component $\lambda_0(t)$ might conceivably be identically zero in such intervals
o We therefore argue conditionally on the set $\{t_i\}$ of instants at which failures occur
o For the particular failure at time $t_i$, conditional on the risk set $R_i$, the probability
that the failure is on the individual as observed is
   $\dfrac{\exp(\beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki})}{\sum_{j \in R_i} \exp(\beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_K x_{Kj})}$
o This likelihood contribution has the exact same form as a matched logistic regression conditional likelihood

Partial Likelihood
Cox (1972) Justification:
Q: What is the probability that the subject with covariate vector $x_i$ died at $t_i$, given that one person died from those in $R_i$?
Note: $\Pr(T \in [t, t + \Delta t) \mid T \ge t) \approx \lambda(t)\, \Delta t$
Person who died: $\lambda_0(t) \exp(\beta_1 x_{1i} + \cdots + \beta_K x_{Ki})\, \Delta t \equiv P_i$
Generic $j \in R_i$: $\lambda_0(t) \exp(\beta_1 x_{1j} + \cdots + \beta_K x_{Kj})\, \Delta t \equiv P_j$

Partial Likelihood
Cox (1972) Justification:
Probability of "one death, was $i$":
   $P_i \times (1 - P_1) \times (1 - P_2) \times \cdots (\text{skip } i) \cdots \times (1 - P_K)$
Probability of one death:
   $\Pr(\text{One Death}) = \Pr(1 \text{ died, others lived}) + \Pr(2 \text{ died, others lived}) + \cdots + \Pr(k \text{ died, others lived}) = \sum_{j \in R_i} P_j \prod_{k \ne j} (1 - P_k)$
Note: $1 - P_j \approx 1$ for small $\Delta t$

Partial Likelihood
Cox (1972) Justification:
o Now calculate desired quantity (answer query above):
   $\Pr(\text{Observed data} \mid 1 \text{ Death}) = \dfrac{\Pr(\text{Only } i \text{ dies})}{\Pr(1 \text{ Death})} = \dfrac{P_i \prod_{k \ne i} (1 - P_k)}{\sum_{j \in R_i} P_j \prod_{k \ne j} (1 - P_k)} \approx \dfrac{P_i}{\sum_{j \in R_i} P_j}$
   $= \dfrac{\lambda_0(t) \exp(\beta_1 x_{1i} + \cdots + \beta_K x_{Ki})\, \Delta t}{\sum_{j \in R_i} \lambda_0(t) \exp(\beta_1 x_{1j} + \cdots + \beta_K x_{Kj})\, \Delta t} = \dfrac{\exp(\beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki})}{\sum_{j \in R_i} \exp(\beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_K x_{Kj})}$

Partial Likelihood
Comments:
o Model is $\log \lambda(t \mid x_1, \ldots, x_K) = \alpha(t) + \beta_1 x_1 + \cdots + \beta_K x_K$ where $\alpha(t) = \log \lambda_0(t)$, but PL does not depend on $\alpha(t)$
o Using partial likelihood (PL) provides estimates only of $\beta_j$
o Model called semi-parametric (above)
Q: Why not just use standard maximum likelihood as outlined in lecture notes pp. 2-40 to 2-46?
A: Would require choosing a model for baseline hazard, but we don't need and prefer not to do that
o Parameterize only that part of the model of major scientific interest: the log relative risks $\beta_j$

Partial Likelihood and Ties
o If $d_i > 1$ (more than one death at $t_i$), denominator of PL contribution will have large number of terms. For example, if 20 at risk and 3 die, there are "20 choose 3" = 1140 terms
o If failures are recorded in continuous time, ties not an issue
  - Time recorded in days, minutes
  - Modest sample size
  - Only occasional $t_i$ where $1 < d_i \ll n_i$
  - Above approximation adequate
o If failures recorded at small number $I$ of discrete times, then consider methods appropriate for discrete time data, where $\alpha(t) = \log \lambda_0(t)$ estimated as part of parametric model

Partial Likelihood and Ties
o Truly discrete times: use logistic regression
   $\mathrm{logit}\, \lambda(t_i \mid x_1, x_2, \ldots, x_K) = \alpha_i + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K$
  where $\lambda(t_i \mid x_1, x_2, \ldots, x_K)$ is discrete time hazard (Cox, 1972)
o Ties result from grouping of continuous data: use complementary log-log regression
   $\log\left(-\log\left[1 - \lambda(t_i \mid x_1, x_2, \ldots, x_K)\right]\right) = \alpha_i + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K$
o See H&L pp. 268-9

Partial Likelihood and Ties
However, plenty of room between continuous and discrete
o Example: USRDS data, 200,000 subjects
  - 25% annual mortality = 50,000 deaths/year
  - 50,000 deaths / 365 days = 137 deaths/day
Approximation (Breslow-Peto), default in Stata
o Numerator is calculated using:
  - sum of $x_1$ for deaths at $t_i$: $s_{1i} = \sum_{j \in R_i:\ \text{death at } t_i} x_{1j}$
  - sum of $x_2$ for deaths at $t_i$: $s_{2i} = \sum_{j \in R_i:\ \text{death at } t_i} x_{2j}$
  - ...
o Approximation with $d_i$ deaths at $t_i$ is
   $PL_A(\beta) = \prod_{i=1}^{I} \dfrac{\exp(\beta_1 s_{1i} + \beta_2 s_{2i} + \cdots + \beta_K s_{Ki})}{\left[\sum_{j \in R_i} \exp(\beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_K x_{Kj})\right]^{d_i}}$

Partial Likelihood and Ties
Example: Two failures at time $t$ in risk set $R(t)$ of size 4
o Covariates $x_1, x_2, x_3, x_4$ ($x_1$ and $x_2$ for failures)
o Relative risks $e_i = e^{x_i \beta}$
o Exact marginal likelihood contribution (exactm in Stata):
   $\dfrac{e_1 e_2}{(e_1 + e_2 + e_3 + e_4)(e_2 + e_3 + e_4)} + \dfrac{e_1 e_2}{(e_1 + e_2 + e_3 + e_4)(e_1 + e_3 + e_4)}$
o Breslow-Peto approximation (breslow in Stata):
   $\dfrac{e_1 e_2}{(e_1 + e_2 + e_3 + e_4)^2}$
o Efron approximation (efron in Stata):
   $\dfrac{e_1 e_2}{(e_1 + e_2 + e_3 + e_4)\left(\tfrac{1}{2} e_1 + \tfrac{1}{2} e_2 + e_3 + e_4\right)}$
o Formulas for general case in H&L 3.4

Partial Likelihood and Ties
Discrete time model: conditional likelihood
o Failures occur only at prespecified times $\tau_1 < \tau_2 < \cdots < \tau_I$
o Conditional probability of failure at $\tau_i$, given observation past $\tau_{i-1}$, given by linear logistic model $\mathrm{expit}(\alpha_i + x\beta)$
o Exact likelihood same as for conditional logistic regression analysis of matched case-control studies
  - If $d_i$ events among $n_i$ at risk at $t_i$, consider all $\binom{n_i}{d_i}$ possible choices of $d_i$ covariates to go with the events
  - In the example: $\dfrac{e_1 e_2}{e_1 e_2 + e_1 e_3 + e_1 e_4 + e_2 e_3 + e_2 e_4 + e_3 e_4}$
  - Option exactp in Stata
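
The four tie-handling formulas above can be compared numerically. Below is a minimal Python sketch, assuming a single tied failure time and illustrative relative risks that are not data from the lecture. The function name `tie_contributions` and its dictionary keys are hypothetical labels echoing the Stata option names. It only evaluates the formulas and is not Stata's implementation.

```python
from itertools import combinations, permutations
from math import comb, prod


def tie_contributions(e, failures):
    """Likelihood contribution of one tied failure time under four methods.

    e        -- relative risks e_j = exp(x_j * beta) for everyone in the risk set
    failures -- indices (into e) of the subjects who fail at this time
    """
    d = len(failures)
    total = sum(e)                           # sum of e_j over the risk set
    fail_sum = sum(e[j] for j in failures)   # sum of e_j over the tied failures
    numer = prod(e[j] for j in failures)

    # Breslow-Peto: the full risk-set sum is used d times in the denominator.
    breslow = numer / total ** d

    # Efron: remove k/d of the failures' total at the k-th step.
    efron = numer / prod(total - (k / d) * fail_sum for k in range(d))

    # Exact marginal: sum over every possible ordering of the tied failures.
    exact_marginal = 0.0
    for order in permutations(failures):
        remaining, term = total, 1.0
        for j in order:
            term *= e[j] / remaining
            remaining -= e[j]
        exact_marginal += term

    # Exact partial (discrete time): all size-d subsets of the risk set.
    exact_partial = numer / sum(
        prod(e[j] for j in subset) for subset in combinations(range(len(e)), d)
    )
    return {"breslow": breslow, "efron": efron,
            "exactm": exact_marginal, "exactp": exact_partial}


# Illustrative values only: four subjects at risk, the first two fail together.
e = [2.0, 1.5, 1.0, 0.5]
res = tie_contributions(e, failures=(0, 1))
for name, value in res.items():
    print(f"{name:>8}: {value:.6f}")

# Number of subsets the exact partial method must enumerate: 20 at risk, 3 deaths.
n_terms = comb(20, 3)
print("terms for 20 choose 3:", n_terms)
```

The exact marginal value sums over the d! orderings, so it is roughly d! times the Breslow and Efron values. A constant factor of that kind does not involve beta and so does not change the maximizing value. The subset enumeration in the exact partial method is what makes it a computational challenge as the number of ties grows.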
Partial Likelihood and Ties
o Kalbfleisch & Prentice (2002, 4.2.3) summarize options and relative pros/cons
  - Breslow method: simple to implement, justify; some bias if t discrete
  - Efron method: simple to implement, harder to justify; performs well and is default in R/S
  - Exact method: justified if t truly discrete, but a computational challenge
  - Should be minor issue in general and, if not, consideration should be given to one of discrete time methods

Partial Likelihood Ratio Tests
Full Model: $\lambda(t \mid x) = \lambda_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p + \beta_{p+1} x_{p+1} + \cdots + \beta_K x_K)$
Reduced Model: $\lambda(t \mid x) = \lambda_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p)$
To test
o $H_0$: Reduced model ($\beta_{p+1} = \cdots = \beta_K = 0$) vs.
o $H_A$: Full model (no restriction on $\beta_{p+1}, \ldots, \beta_K$)
Use the partial likelihood ratio or related statistics:
   $T_{PLR} = 2 \log PL(\text{Full Model}) - 2 \log PL(\text{Reduced Model})$

Partial Likelihood Ratio Tests
o Under $H_0$ (reduced correct): $T_{PLR} \sim \chi^2_{K-p}$
  - Degrees of freedom $K - p$ equals number of parameters set equal to 0 by null hypothesis (reduced model)
o Applies to situations where two models are nested (reduced a special case of full)
o Can also use Wald and (likelihood) score tests
  - Wald and PLR tests require fitting full model, score does not
  - PLR and score tests invariant to parameter transformation, hence $\chi^2$ approximation to null distribution more accurate with small samples, misbehaved covariate distributions

Fitting the Cox Model
o Obtain estimates of $\beta_1, \beta_2, \ldots, \beta_K$ by maximizing the partial likelihood function $PL(\beta_1, \beta_2, \ldots, \beta_K)$
o $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_K$ are MPLEs
o CIs for $\beta_j$ using $\hat\beta_j \pm z_{1-\alpha/2}\, \mathrm{SE}(\hat\beta_j)$
o CIs for hazard ratio (HR) using $\left(\exp[\hat\beta_j - z_{1-\alpha/2}\, \mathrm{SE}(\hat\beta_j)],\ \exp[\hat\beta_j + z_{1-\alpha/2}\, \mathrm{SE}(\hat\beta_j)]\right)$
o Wald, score and likelihood ratio tests are similar in concept to those used with logistic regression, but based on the log partial likelihood rather than the full likelihood (which also involves $\lambda_0(t)$)

TABLE 2. Main quantities for the test of the null hypothesis for the data of Table 1

     Failure time              Risk population
  Sample 0   Sample 1     No. in     No. in       r        A      Multiplicity
                         sample 0   sample 1                           m
  23         23              6          1         7     0.1429        2
  22         22              7          2         9     0.2222        2
             17             10          3        13     0.2308        1
  16                        11          3        14     0.2143        1
             15             11          4        15     0.2667        1
  13                        12          4        16     0.2500        1
             12, 12         12          6        18     0.3333        2
             11, 11         13          8        21     0.3810        2
  10                        15          8        23     0.3478        1
             8, 8, 8, 8     16         12        28     0.4286        4
  7                         17         12        29     0.4138        1
  6, 6, 6                   21         12        33     0.3636        3
             5, 5           21         14        35     0.4000        2
             4, 4           21         16        37     0.4324        2
             3              21         17        38     0.4474        1
             2, 2           21         19        40     0.4750        2
             1, 1           21         21        42     0.5000        2

  $U(0) = 21 - \sum m A = 10.25$;  $I(0) = \sum \dfrac{m (r - m)}{r - 1} A (1 - A) = 6.2570$

Example: Cox (1972) Analysis of 6MP Data
o 21 treated (9 relapses), 21 control (all relapsed)
o Several sets of tied observations: $d_i = 4$ at $t_i = 8$, $d_i = 3$ at $t_i = 6$, and $d_i = 2$ at $t_i = 4, 5, 11, 12$
o Cox coded treatment variable $x = 0$ for treated, $x = 1$ for placebo
o Hence found positive regression coefficient
o $O - E = 21 - 10.75 = 10.25$ $\Rightarrow$ excess of relapses when $x = 1$ (placebo)
o LogRank test: $10.25^2 / 6.2570 = 16.79$

  . sts test tx, cox
  Logrank test for equality of survivor functions

  tx        |  observed   expected
  treatment |         9      19.25
  placebo   |        21      10.75
  Total     |        30      30.00

            chi2(1) =  16.79
            Pr>chi2 = 0.0000

Example: Cox (1972) Analysis of 6MP Data
o Cox remarks that his score test gives larger value than Gehan-Wilcoxon

  . sts test tx, wilcoxon
  Wilcoxon (Breslow) test for equality of survivor functions

  tx        |  observed   expected   ranks
  treatment |         9      19.25    -271
  placebo   |        21      10.75     271
  Total     |        30      30.00       0

            chi2(1) =  13.46
            Pr>chi2 = 0.0002

Example: Cox (1972) Analysis of 6MP Data
o In view of numerous ties, expect relatively large differences depending on method of handling them

  . stcox tx, nohr
  Cox regression -- Breslow method for ties
  No. of subjects =  42                  Number of obs =     42
  No. of failures =  30
  Time at risk    = 541
                                         LR chi2(1)    =  15.21
                                         Prob > chi2   = 0.0001

        _t |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
        tx |  1.509191   .4095644    3.68   0.000    .7064599    2.311923

  . stcox tx, nohr exactm
  Cox regression -- exact marginal likelihood
                                         LR chi2(1)    =  16.51
                                         Prob > chi2   = 0.0000

        _t |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
        tx |  1.598191   .4216473    3.79   0.000     .771778    2.424605

Example: Cox (1972) Analysis of 6MP Data
o Cox himself used the exact permutation approach

  . stcox tx, nohr exactp
  Cox regression -- exact partial likelihood
                                         LR chi2(1)    =  16.25
                                         Prob > chi2   = 0.0001

        _t |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
        tx |  1.628244   .4331313    3.76   0.000    .7793222    2.477166

o Cox himself found $\hat\beta = 1.65$ (???)
o He also found LR test statistic of 14.9, not 16.25 (???)
o However: The overwhelming significance of the difference is in line with one's qualitative impression of the data

Example: Cox (1972) Analysis of 6MP Data
o Refit without nohr option but with basesurv option to get estimated HR and $\hat{S}_0(t)$
o We'll learn later how $\hat{S}_0(t)$ is calculated
o Remember: baseline here means treatment

  . stcox tx, exactp basesurv(S0)
  Cox regression -- exact partial likelihood
                                         LR chi2(1)    =  16.25
                                         Prob > chi2   = 0.0001

        _t | Haz. Ratio   Std. Err.      z    P>|z|    [95% Conf. Interval]
        tx |    5.09492   2.206769    3.76   0.000    2.179994    11.90747

Example: Cox (1972) Analysis of 6MP Data
o Generate estimated survival function for placebo group as baseline raised to power of HR, in accordance with Cox PH model

  . gen S1 = S0^5.09492
  . label var S0 "Fitted Treatment"
  . label var S1 "Fitted Placebo"

o Compare nonparametric (KM) and semiparametric (Cox) estimates of S

  . sts graph, by(tx) plot1(lpattern(solid) lwidth(medthick))
  >     plot2(lpattern(dash) lwidth(medthick))
  >     addplot(scatter S1 S0 time, sort) title("Compare KM with Fitted Cox")

Example: Cox (1972) Analysis of 6MP Data
[Figure: "Compare KM with Fitted Cox"; x-axis: analysis time; legend: KM curves for treatment and placebo, Fitted Treatment, Fitted Placebo]

[Figure (Cox 1972, FIG. 1): Empirical survivor functions for data of Table 1; survivor function vs. remission time (weeks). Product-limit estimate: sample 0 (6-MP), sample 1 (control). Estimate constrained by proportionality: o sample 0, x sample 1. For clarity, the constrained estimates are indicated by the left ends of the defining horizontal lines.]

BIOST/EPI 537: Survival Data Analysis in Epidemiology
Norman Breslow
Tenth Lecture: Cox Regression V
26 and 28 February 2008

Time-Dependent Covariates
o In general model, covariates $x$ may depend on $t$:
   $\lambda(t \mid x_1(t), \ldots, x_K(t)) = \lambda_0(t) \exp(\beta_1 x_1(t) + \cdots + \beta_K x_K(t))$
   $RR = \dfrac{\lambda(t \mid x_1(t), \ldots, x_K(t))}{\lambda(t \mid x_1'(t), \ldots, x_K'(t))} = \exp\left\{\beta_1 [x_1(t) - x_1'(t)] + \cdots + \beta_K [x_K(t) - x_K'(t)]\right\}$ is no longer
constant in $t$
o Interpretation
  - Model for instantaneous risk of failure given survival to $t$ and the values of covariates up to and including time $t$
  - Both current and past covariate values may be used as predictors
  - Often requires extreme care for proper interpretation

Uses for Time-Dependent Covariates
Time-dependent covariates (TDC) used to model:
o RRs associated with fixed covariates that change over time
  - Interactions of covariates and time
  - Provide tests of proportional hazards (PH) assumption
  - Correct for lack of PH
o Effects of exposures that change over time
  - Pregnancy and survival (from birth) of women with cystic fibrosis
  - Cumulative exposure to beryllium and lung cancer
o Effects of treatments that change over time
  - Transplant and survival (from acceptance into program) of patients with heart disease
o Adjustment variables that change over time

Time-Dependent Regression Coefficients
Motivation: relative risks change with time
o Effect of initial WBC attenuated with time in childhood ALL
Model interaction of covariate $x$ with function $g(t)$ of time:
   $\lambda(t \mid x) = \lambda_0(t) \exp[\beta x + \gamma x g(t)] = \lambda_0(t) \exp[\beta(t)\, x]$
where $\beta(t) = \beta + \gamma g(t)$
Query: Why isn't main effect $g(t)$ included in model (with coefficient, e.g., $\beta_g$)?

Time-Dependent Regression Coefficients
Choices for $g$:
o $g(t) = t - \bar{t}$ where $\bar{t}$ = average or modal value of time
o $g(t) = \log(t)$
o $g(t) = 1[t > t^*]$ where $t^*$ = fixed value of time
  - Jump (or drop) in hazard ratio at $t^*$
Stata provides for such interactions in stcox options:
o tvc(varlist) specifies covariates $x$
o texp(function of t) specifies $g(t)$
  - texp(_t) (default) for $g(t) = t$
  - texp(log(_t)) for $g(t) = \log(t)$
  - texp(_t > 10) for $g(t) = 1[t > 10]$
  - texp(_t - 25) to center time at $\bar{t} = 25$

Time-Dependent Regression Coefficients
o Simplest example:
   $\lambda(t) = \lambda_0(t) \exp(\beta x_E + \gamma x_E t)$, where $x_E = 1$ exposed, $0$ not exposed
   $RR(t) = \dfrac{\lambda(t \mid x_E = 1)}{\lambda(t \mid x_E = 0)} = \exp(\beta + \gamma t)$
So the RR can decrease or increase with the passage of time.
Can test $H_0$: log RR constant vs. $H_1$: log RR increases linearly with time (decreases linearly with time) by testing $H_0: \gamma = 0$ vs.
$H_1: \gamma \ne 0$ in the above model
o See graph next page

Time-Dependent Regression Coefficients
[Figure: graph of RR changing with time]

Example: Herpes Recurrence by Gender

  . set mem 10000                        /* Enlarge memory since we will split records */
  . replace censor=0 if rectime>365      /* Recall two failures at 366 days */
  . stset rectime censor, id(id)         /* Need ID to split data */
  . sts graph, na by(gender) plot1(lp(solid) lw(medthick)) plot2(lp(dash) lw(medthick))

[Figure: Nelson-Aalen cumulative hazard estimates by gender; x-axis: analysis time; legend: gender = male, gender = female]
o Hazards start to part markedly at about 80 days, not before

Example: Herpes Recurrence by Gender

  . stcox age durprime if group==2, strata(gender) basechaz(CH)

  No. of subjects =   327                Number of obs =    327
  No. of failures =   281
  Time at risk    = 28564
                                         LR chi2(2)    =  11.00
  Log likelihood  = -1221.4683           Prob > chi2   = 0.0041

        _t | Haz. Ratio   Std. Err.      z    P>|z|    [95% Conf. Interval]
       age |   .9851836   .0097971   -1.50   0.133    .9661675    1.004574
  durprime |   1.021912   .0079167    2.80   0.005    1.006513    1.037547
                                                     Stratified by gender

  . gen CHM=CH if gender==1
  . gen CHF=CH if gender==2
  . label variable CHM "Male"
  . label variable CHF "Female"
  . twoway (scatter CHM rectime, sort) (scatter CHF rectime, sort),
  >     ytitle(Cumulative Hazard) xtitle(Time to Recurrence from End of Primary Lesion)
  >     title(Baseline Cumulative Hazard by Gender)

o Baseline cumulative hazards after adjustment for age and duration of primary lesion show same general pattern

Example: Herpes Recurrence by Gender
[Figure: "Baseline Cumulative Hazard by Gender"; x-axis: Time to Recurrence from End of Primary Lesion; legend: Male, Female]

Example: Herpes Recurrence by Gender
o Test proportional hazards assumption using stphtest

  . stcox age durprime gender if group==2, nohr basehc(h) sch(r*) sca(s*)

  Log likelihood = -1405.1372

        _t |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
       age | -.0168776   .0098617   -1.71   0.087   -.0362061    .0024509
  durprime |  .0218577   .0077628    2.82   0.005     .006643    .0370725
    gender |  -.434728   .1298573   -3.35   0.001   -.6892437   -.1802124

  . est store model0
    . stphtest, detail

    Test of proportional-hazards assumption
    Time: Time
                |   |rho|     chi2    df   Prob>chi2
            age | 0.05431     0.74     1     0.3896
       durprime | 0.06238     1.25     1     0.2643
         gender | 0.18078     9.49     1     0.0021
    global test |            10.91     3     0.0122

    . stcox age durprime gender if group==2, nohr tvc(gender)

    Log likelihood = -1398.7115

          _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    rh
         age | -.0145577   .0099374   -1.46   0.143   -.0340346   .0049192
    durprime |  .0219166    .007752    2.83   0.005    .0067229   .0371103
      gender | -.0288916   .1742273   -0.17   0.868   -.3703707   .3125876
    t
      gender | -.0069843   .0018404   -3.79   0.000   -.0105915  -.0033771
    Note: second equation contains variables that continuously vary with respect
    to time; variables are interacted with current values of _t.

Query: What is the log HR comparing Female (gender=2) to Male (gender=1) at
- 50 days?
- 100 days?

- Three tests of proportional hazards with gender
    Wald:              $(-3.79)^2 = 14.4$
    Likelihood Ratio:  $2 \times (-1398.71 - (-1405.14)) = 12.9$
    Score:             9.5 (from stphtest)
- All reject the hypothesis of proportionality
- Using stphtest, there is no need to fit a model with TDC

- Now cut each subject's follow-up record at failure times

    . stsplit, at(failures)
    (151 failure times)
    (28550 observations (episodes) created)
    . sum

    Variable |   Obs       Mean    Std. Dev.    Min    Max
          id | 29006   1665.958   1250.672       6   9027
     rectime | 29006   66.53096   69.82288       1    366
      censor |   456   .8157895   .3880815       0      1
       group | 29006   1.831311   .5618941       1      3
    durprime | 29006   18.54706   7.169798       4     66
      gender | 29006   1.686341   .4639877       1      2
         age | 29006   25.78404   6.781993       1     52
       treat | 29006   .7524305   .9853469       0      3
    otherrec | 29006   .0922913   .2894416       0      1
         _st | 29006          1          0       1      1
          _d | 29006   .0128249   .1125206       0      1
          _t | 29006   66.53096   69.82288       1    366
         _t0 | 29006   64.93257   68.70363       0    362

- Each original record was cut an average of 28550/456 = 62.6 times
- Average distance between adjacent failure times = 66.53 - 64.93 = 1.6 days
- _t0 holds $t_{start}$ for each record
- _t holds $t_{exit}$
- _d indicates whether or not the record ends in failure at _t
- Yet the fit to this dataset is identical to that obtained earlier

    . stcox age durprime gender if group==2, nohr

    No. of subjects =  327          Number of obs  =  19023
    No. of failures =  281
    Time at risk    =  28564
                                    LR chi2(3)     =  20.64
    Log likelihood  = -1405.1372    Prob > chi2    = 0.0001

          _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
         age | -.0168776   .0098617   -1.71   0.087   -.0362061   .0024509
    durprime |  .0218577   .0077628    2.82   0.005     .006643   .0370725
      gender |  -.434728   .1298573   -3.35   0.001   -.6892437  -.1802124

- Now model TDCs as functions of t

    * Covariate times time interaction
    . gen gent = gender*_t
    . stcox age durprime gender gent if group==2, nohr

    Log likelihood = -1398.7115

          _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
         age | -.0145577   .0099374   -1.46   0.143   -.0340346   .0049192
    durprime |  .0219166    .007752    2.83   0.005    .0067229   .0371103
      gender | -.0288916   .1742273   -0.17   0.868   -.3703707   .3125876
        gent | -.0069843   .0018404   -3.79   0.000   -.0105915  -.0033771

- Identical to the result obtained earlier using the original dataset and
    stcox age durprime gender if group==2, nohr tvc(gender)
- When TDCs are interactions between fixed covariates and time, use of the options tvc() and texp() is much more efficient

- Using the original dataset (456 records) to fit the model

$$\lambda(t \mid x) = \lambda_0(t) \exp\{\beta_1 \times \text{age} + \beta_2 \times \text{durprime} + \beta_3 \times \text{gender} + \gamma \times (\text{gender} - 1)\, I[t > 80]\}$$

    . stcox age durprime gender if group==2, nohr tvc(gender) texp(_t>80)

          _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    rh
         age | -.0152565   .0099041   -1.54   0.123   -.0346682   .0041552
    durprime |  .0217341   .0077457    2.81   0.005    .0065529   .0369153
      gender | -.1927699   .1491889   -1.29   0.196   -.4851748    .099635
    t
      gender | -1.070645   .2939911   -3.64   0.000   -1.646857  -.4944335
    Note: second equation contains variables that continuously vary with respect
    to time; variables are interacted with current values of _t>80.

Query: What is the log HR comparing Females to Males at
- 50 days?
- 100 days?

- Return to the expanded dataset and fit the same model

    . gen t80=0
    . replace t80=1 if _t>80
    (7916 real changes made)
    * Another interaction
    . gen gen80 = (gender-1)*t80
    . stcox age durprime gender gen80 if group==2, nohr basehc(h)

    Log likelihood = -1398.8067

          _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
         age | -.0152565   .0099041   -1.54   0.123   -.0346682   .0041552
    durprime |  .0217341   .0077457    2.81   0.005    .0065529   .0369153
      gender | -.1927699   .1491889   -1.29   0.196   -.4851748    .099635
       gen80 | -1.070645   .2939911   -3.64   0.000   -1.646857  -.4944335

    . stcurve, hazard

- See graph of baseline hazard next page

[Figure: Cox proportional hazards regression; smoothed hazard function against analysis time]

- Attempt to graph hazards for males and females under the model

    . gen h0 = h*exp(-0.0152565*25+0.0217341*18)
    . drop if censor==0
    (84 observations deleted)
    . drop if _d==0
    (28550 observations deleted)
    . sort _t
    . gen newtime=_t
    . collapse (mean) newtime h0, by(_t)
    . kdensity newtime [aweight=h0], generate(ktime newh0)
    . gen newh1=newh0*exp(-.1927699-1.070645*(ktime>80))
    (101 missing values generated)
    . graph twoway scatter newh0 newh1 ktime
    . label variable newh0 "Male"
    . label variable newh1 "Female"
    . twoway scatter newh0 newh1 ktime,
    >   xtitle("Time Since End of Primary Lesion (Days)")
    >   title("Smoothed Hazard by Gender")

- Graph next page

[Figure: Smoothed Hazard by Gender; x-axis: Time Since End of Primary Lesion (Days); series: Male, Female]

Example: CCG803 Study of Childhood ALL

- Split CCG803 data into two time intervals at 270 days
  - 173 still on study beyond 270 days

    . use ccg803
    . stset dur indic, id(id)
    . stsplit S270, at(270)
    (173 observations created)
    . summarize

    Variable |  Obs       Mean    Std. Dev.       Min        Max
          id |  441   162.0454   95.12899          1        331
        inst |  441   9.603175   6.108778          1         24
         age |  441   4.993197   3.399525          0         15
         wbc |  441   318.6576   842.9884          7       9200
          rx |  441   .5918367   .4920519          0          1
         dur |  441   348.2653   189.7729         30        986
       indic |  268   .6753731   .4691113          0          1
        wbcg |  441   1.941043   .8040623          1          3
      l10wbc |  441   2.016477   .5872344    .845098   3.963788
       agesq |  441   36.46259   46.90807          0        225
          _d |  441   .4104308   .4924706          0          1
          _t |  441   348.2653   189.7729         30        986
         _t0 |  441   105.9184   131.9801          0        270
        S270 |  441   105.9184   131.9801          0        270

    . sort id _t0
    . list id _t0 dur _d S270 if id<10

         | id   _t0   dur   _d   S270 |
      1. |  1     0   200    1      0 |
      2. |  2     0   270    0      0 |
      3. |  2   270   665    1    270 |
      4. |  3     0   270    0      0 |
      5. |  3   270   938    0    270 |
      6. |  5     0   270    0      0 |
      7. |  5   270   399    1    270 |
      8. |  6     0    84    1      0 |
      9. |  7     0   266    1      0 |
     10. |  8     0   270    0      0 |
     11. |  8   270   868    0    270 |
     12. |  9     0   270    0      0 |
     13. |  9   270   847    0    270 |

- Each record split at most once

- Fit separate Cox models
  - first to the Early period (before 270 days)

    . stcox l10wbc age agesq rx if S270==0, nohr

    No. of subjects =  268          Log likelihood = -480.7602
    No. of failures =   93          chi2(4)        =   41.68
    Time at risk    =  62183        Prob > chi2    =  0.0000

       indic |     Coef.   Std. Err.       z    P>|z|
      l10wbc |    .86588   .1716943    5.043   0.000
         age | -.2353322    .107582   -2.187   0.029
       agesq |  .0186309   .0074591    2.498   0.012
          rx | -.4681294   .2102867   -2.226   0.026

- and then to the Late period (after 270 days)

    . stcox l10wbc age agesq rx if S270==270, nohr

    No. of subjects =  173          Log likelihood = -393.2277
    No. of failures =   88          chi2(4)        =   12.02
    Time at risk    =  44692        Prob > chi2    =  0.0172

       indic |     Coef.   Std. Err.       z    P>|z|
      l10wbc |  .5463794     .18686    2.924   0.003
         age | -.1506464   .1252865   -1.202   0.229
       agesq |  .0113945    .008547    1.333   0.182
          rx | -.0751023   .2274752   -0.330   0.741

- What could be simpler?

- Summarize results

                               All       Before     After
                             Relapses   270 days   270 days
    No. pts                     268        268        173
    No. deaths                  181         93         88
    Regression coefficients
      log10 WBC                0.72       0.87       0.55
      Age/10                  -1.89      -2.35      -1.51
      Age^2/100                1.48       1.86       1.14
      AMD (0/1)               -0.22      -0.47      -0.08
    Standard errors
      log10 WBC                0.13       0.17       0.19
      Age/10                   0.82       1.08       1.25
      Age^2/100                0.56       0.75       0.86
      AMD (0/1)                0.15       0.21       0.23
    Log likelihood          -876.95    -480.76    -393.23

    Test of $H_0$: $\chi^2_4 = 2 \times (876.95 - 480.76 - 393.23) = 5.92$ (NS)

- Coefficients appear to differ between the two periods, but strong evidence is lacking

- Continue with TDC for $\log_{10}$ WBC and log time
  - Doing it the hard way, using stsplit

    . stsplit, at(failures)
    (123 failure times)
    (18796 observations (episodes) created)
    . sum

    Variable |   Obs       Mean    Std. Dev.       Min        Max
          id | 19064   160.0467   94.05523          1        331
        inst | 19064   9.519461   6.059545          1         24
         age | 19064   4.970048   3.259581          0         15
         wbc | 19064    240.588   574.4156          7       9200
          rx | 19064   .6105225   .4876446          0          1
         dur | 19064   251.0457    146.711         30        986
       indic |   268   .6753731   .4691113          0          1
        wbcg | 19064   1.885753   .8067533          1          3
      l10wbc | 19064    1.96079   .5576982    .845098   3.963788
       agesq | 19064   35.32569   45.03662          0        225
         _st | 19064          1          0          1          1
          _d | 19064   .0094943   .0969778          0          1
          _t | 19064   251.0457    146.711         30        986
         _t0 | 19064   245.4396   146.0551          0        942

    . tab _t0

         _t0 |   Freq.   Percent      Cum.
           0 |     268      1.41      1.41
          53 |     266      1.40      2.80
          54 |     265      1.39      4.19
          56 |     263      1.38      5.57
          57 |     260      1.36      6.93
          68 |     259      1.36      8.29
         ...
         393 |     129      0.68     82.46
         394 |     128      0.67     83.13
         395 |     127      0.67     83.80
         399 |     124      0.65     84.45
         402 |     123      0.65     85.09
         ...
         836 |      14      0.07     99.95
         918 |       7      0.04     99.99
         942 |       2      0.01    100.00
       Total |   19064    100.00

- Note risk set sizes

    . gen wbclndur = l10wbc*(log(_t)-5.785)
    . stcox l10wbc age agesq rx wbclndur, nohr

    No. of subjects =  268          Number of obs  =  19064
    No. of failures =  181
    Time at risk    =  106875
                                    LR chi2(5)     =  54.47
    Log likelihood  = -873.60287    Prob > chi2    = 0.0000

          _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
      l10wbc |  .5709151   .1405109    4.06   0.000    .2955188   .8463114
         age | -.1998796   .0813436   -2.46   0.014   -.3593101   -.040449
       agesq |    .01547   .0056001    2.76   0.006     .004494   .0264459
          rx | -.2098905   .1517265   -1.38   0.167   -.5072689   .0874879
    wbclndur | -.5056175   .1963819   -2.57   0.010    -.890519   -.120716

- Similarly for the interaction of treatment (AMD) and log duration
- Summarize results on next page

Interaction of AMD and log time

    Model term                      Coeff.   Std. Err.   p-value
    log10 WBC                        0.72      0.13      < 0.001
    Age/10                          -1.90      0.82        0.020
    Age^2/100                        1.49      0.56        0.008
    AMD (0/1)                       -0.14      0.17        0.420
    AMD x [log(t) - 5.785]           0.30      0.25        0.217
    LR test for interaction: $\chi^2 = 1.6$, p = 0.21

Interaction of $\log_{10}$ WBC and log time

    Model term                      Coeff.   Std. Err.   p-value
    log10 WBC                        0.57      0.14      < 0.001
    Age/10                          -2.00      0.81        0.014
    Age^2/100                        1.55      0.56        0.006
    AMD (0/1)                       -0.21      0.15        0.167
    log10 WBC x [log(t) - 5.785]    -0.51      0.20        0.010
    LR test for interaction: $\chi^2 = 6.7$, p = 0.01
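The fitted interaction model implies a log hazard ratio for $\log_{10}$ WBC that changes with follow-up time, $\beta(t) = \hat\beta + \hat\gamma\,[\log(t) - 5.785]$. The following is a minimal Python sketch of that arithmetic, assuming the coefficients from the `l10wbc` and `wbclndur` rows of the output above; the function name `log_hr_l10wbc` and the evaluation times are illustrative choices, not part of the notes.

```python
import math

# Coefficients copied from the stcox output above (rows l10wbc and wbclndur).
BETA_L10WBC = 0.5709151   # log HR per unit log10(WBC) when log(t) = 5.785
GAMMA_LOGT = -0.5056175   # change in that log HR per unit increase in log(t)
CENTER = 5.785            # centering constant used when wbclndur was generated


def log_hr_l10wbc(t_days):
    """Log hazard ratio per unit log10(WBC) at follow-up time t_days (hypothetical helper)."""
    return BETA_L10WBC + GAMMA_LOGT * (math.log(t_days) - CENTER)


if __name__ == "__main__":
    # Illustrative times only; any positive follow-up time can be used.
    for t in (100, 270, 600):
        b = log_hr_l10wbc(t)
        print(f"t = {t:4d} days: log HR = {b:.3f}, HR = {math.exp(b):.2f}")
```

Under these assumptions the effect of WBC is strongest early in follow-up and shrinks later, which agrees with the direction of the separate early-period and late-period fits.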
Example: Estrogen Use and Breast Cancer

- Use of TDC for time-varying exposure
- Thomas et al., JNCI 1982; B&D II, pp. 187-190
- Cohort of women treated for benign breast disease (BBD)
  - Followed from Tx BBD to breast cancer (BC) or last seen
  - Age at start of estrogen (E) recorded for those who used it
- Data omit 274 with unknown estrogen use

                  Breast Cancer
    Estrogen      Yes      No
    Yes            25     522
    No             33     499

    Odds Ratio $= \dfrac{25 \times 499}{33 \times 522} = 0.72$

Issues
- Different subjects observed for different amounts of time
- Women who did not develop BC were followed longer and had a greater chance of exposure to estrogens

[Figure: follow-up timelines by age (30 to 80), showing periods with and without estrogen and the age at BC]

Solutions
- Age-matched case-control study
  - Match cases at age of BC diagnosis to controls of the same age without BC
  - Compare history of prior estrogen use
  - Fails to use all available data
- Cox model with TDC to represent time-varying exposure
  - event = BC incidence
  - time t = age in years (left truncation by age at BBD)
  - $x_E(t) = 1$ if prior exposure to estrogens at age t, $0$ if no prior exposure to estrogens at age t

Time-varying Exposure

Model: $\lambda(t \mid x_E(t)) = \lambda_0(t) \exp\{\beta x_E(t)\}$

[Figure: three panels (Exposed from 40, Exposed from 50, Exposed from 60); x-axis: Age, 30 to 70]

- Risks for women with and without prior exposure are proportional
- Once exposure starts, jump to the hazard curve for the exposed

Estimation by partial likelihood
- Split records at age of exposure
- A woman who uses estrogen from age 53 generates two records:
    1. $t_{start} = 34$, $t_{exit} = 53$, $x_E = 0$, $\delta = 0$
    2. $t_{start} = 53$, $t_{exit} = 62$, $x_E = 1$, $\delta = 1$
- Women who never start estrogen have only one record

[Figure: subjects' follow-up lines against Age, 20 to 80]

- Risk sets compare women with and without prior exposure

- Risk set at age $t_i = 33$
  - Subjects =
  - Exposure =
  - Status $\delta$ =
- Risk set at age $t_i = 39$
  - Subjects =
  - Exposure =
  - Status $\delta$ =
- Risk set at age $t_i = 44$
  - Subjects =
  - Exposure =
  - Status $\delta$ =

Example: Estrogen Use and Breast Cancer (risk sets)

- H1: hormone users at ages $\le t_i$
- H2: hormone users at ages $> t_i$
- no H: not hormone users

    Age     At       Cancer cases          Non-cases
    t_i    risk     H1    H2   no H      H1    H2   no H
     30     148      0     0     1       27    63     57
     37     279      0     0     1       38   130    110
     38     304      1     0     2       40   139    122
     50     610      2     0     1      196   140    271
     51     598      0     0     3      216   115    264
     52     577      2     0     2      226    94    253
     54     520      1     0     1      221    66    231
     58     389      4     0     0      158    35    192
     68     137      1     0     1       29     2    104
     69     121      0     0     3       27     0     91
     76      37      0     0     1        5     0     31

- Compare cumulative hazards via time-varying stratification

[Figure: Cumulative Breast Cancer Incidence against Age (years), 30 to 80; series: Exposed, Non-Exposed]

    Without adjustment:       RR = 1.80, $\chi^2 = 4.41$, p = 0.02
    Adjusted for birth year:  RR = 1.44, $\chi^2 = 1.82$, NS

- RR in the opposite direction from the crude OR that ignored timing

Example: Stanford Heart Transplant Data

- 103 subjects recruited into the program beginning in 1967
- Transplanted when an HLA-matched heart could be found
  - 69 received a transplant after a waiting time
  - 30 died before a suitable donor heart could be found
  - 2 still waiting for transplant at close of study
  - 2 deselected and censored at that time

Q: Does a heart transplant improve survival?
- Time = time since acceptance into the program (days)
- Event = death from any cause

- Two versions of the data available via the class website
  - Crowley and Hu, JASA 1977: heartorg.dta
  - Kalbfleisch and Prentice 2002, Appendix A.IV: heart.dta

     #  VARIABLE   DESCRIPTION            CODE
     1  PATNO      patient number
     2  BMON       month of birth         1-12
     3  BDAY       day of birth           1-31
     4  BYR        year of birth          05-60
     5  ACCMON     month of acceptance    1-12
     6  ACCDAY     day of acceptance      1-31
     7  ACCYR      year of acceptance     67-74
     8  TXMON      month of transplant    1-12 (1 if no transplant)
     9  TXDAY      day of transplant      1-31 (1 if no transplant)
    10  TXYR       year of transplant     67-74 (0 if no transplant)
    11  LASTMON    month last seen        1-12
    12  LASTDAY    day last seen          1-31
    13  LASTYR     year last seen         67-74
    14  STATUS     dead or alive          1 = dead, 0 = alive
    15  PRIOR      prior surgery          1 = yes, 0 = no
    16  MATCH1     # of mismatches        1-4
    17  MATCH2     HLA-A2                 1 = yes, 0 = no
    18  MATCH3     mismatch score         0-3.05
    19  REJECT     rejected heart         1 = yes, 0 = no

103 records from Crowley and Hu 1977, JASA, pp. 27-36

- Prepare K&P (almost) version of the data for STATA

    . infile id bmonth bday byear accmonth accday accyear txmonth txday txyear
    >   lastmonth lastday lastyear status prior match1 match2 match3 reject using heart.raw
    (103 observations read)
    . gen transplant = txyear > 0
    . replace byear = byear + 1900
    . replace accyear = accyear + 1900
    . replace txyear = txyear + 1900
    . replace lastyear = lastyear + 1900
    . gen bdays = mdy(bmonth, bday, byear)
    . gen accdays = mdy(accmonth, accday, accyear)
    . gen txdays = mdy(txmonth, txday, txyear)
    . gen fudays = mdy(lastmonth, lastday, lastyear)
    . gen survtime = fudays - accdays + 1
    . gen waittime = transplant*(txdays - accdays + 1)
    . gen TimAcc = (accdays - mdy(10,1,1967))/365.25
    . replace waittime=4.5 if id==38
    . replace waittime=survtime if !transplant
    . stset survtime, failure(status) id(id)

    . stsplit post, at(.1) after(waittime)
    (69 observations (episodes) created)
    . recode post 0/0.05=0 0.05/1=1
    . list id waittime survtime transplant status _t0 _t _d post

         | id  waittime  survtime  transp~t  status   _t0    _t   _d  post |
      1. |  1     50        50        0        1        0    50    1    0  |
      2. |  2      6         6        0        1        0     6    1    0  |
      3. |  3      1       1.1        1        .        0   1.1    0    0  |
      4. |  3      1        16        1        1      1.1    16    1    1  |
      5. |  4     36      36.1        1        .        0  36.1    0    0  |
      6. |  4     36        39        1        1     36.1    39    1    1  |
      7. |  5     18        18        0        1        0    18    1    0  |
      8. |  6      3         3        0        1        0     3    1    0  |
      9. |  7     51      51.1        1        .        0  51.1    0    0  |
     10. |  7     51       675        1        1     51.1   675    1    1  |
     11. |  8     40        40        0        1        0    40    1    0  |
     12. |  9     85        85        0        1        0    85    1    0  |
     13. | 10     12      12.1        1        .        0  12.1    0    0  |
     14. | 10     12        58        1        1     12.1    58    1    1  |
     15. | 11     26      26.1        1        .        0  26.1    0    0  |

- Define the time-dependent covariate
    $x_T(t) = 1$ if transplant before t, $0$ if no transplant before t
- Consider models of the form

$$\lambda(t \mid x_T(t), z) = \lambda_0(t) \exp\{\beta x_T(t) + \gamma' z + x_T(t)\, \delta' z\}$$

  where z are fixed covariates
  - $z_1$ = age at acceptance - 48
  - $z_2$ = time of acceptance (years since 1 October 1967)
  - $z_3$ = prior surgery (1 = yes, 0 = no) before acceptance

- Naive analysis: ignore waiting time to transplant

    . stcox transplant, nohr

    No. of subjects =  103          Number of obs  =    172
    No. of failures =   75
    Time at risk    =  31954
                                    LR chi2(1)     =  25.73
    Log likelihood  = -285.46262    Prob > chi2    = 0.0000

            _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    transplant | -1.318344    .244019   -5.40   0.000   -1.796612  -.8400751

- Problem 4 on midterm
  - Immortal time bias in transplant group (bias ↑)
  - Selection on early death in non-transplant group (bias ↓)

    . sts graph, na by(transplant) title("Cumulative Death Rate by Transplant Status
    >   (Fixed)") xtitle("Time since acceptance (days)") plot1(lp(solid) lw(medthick))
    >   plot2(lp(dash) lw(medthick))

[Figure: Cumulative Death Rate by Transplant Status (Fixed); x-axis: Time since acceptance (days), 0 to 2000; series: transplant = 0, transplant = 1]

- Same problem: time of transplant ignored

- Use time-dependent transplant status

    . stcox post, nohr

    No. of subjects =  103          Number of obs  =    172
    No. of failures =   75
    Time at risk    =  31954
                                    LR chi2(1)     =   0.18
    Log likelihood  = -298.23775    Prob > chi2    = 0.6751

          _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
        post |  .1256669   .3010765    0.42   0.676   -.4644323   .7157661

- Now absolutely no indication of benefit from transplant

    . sts graph, na by(post) title("Cumulative Death Rate by Transplant Status")
    >   xtitle("Time since acceptance (days)") plot1(lp(solid) lw(medthick))
    >   plot2(lp(dash) lw(medthick))

[Figure: Cumulative Death Rate by Transplant Status; x-axis: Time since acceptance (days); series: post = Pre, post = Post]

- Time-dependent stratification
  - Some transplanted immediately, but risk sets small
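The record splitting done by `stsplit ..., after(waittime)` can be mimicked by hand. Below is a minimal Python sketch, assuming the same convention as the listing above (split 0.1 days after the waiting time, with `post` switching from 0 to 1); the function name `split_at_transplant` and its arguments are hypothetical, and the fallback of returning a single pre-transplant episode when the split point is not before the exit time is an assumption of this sketch, not something taken from the notes.

```python
def split_at_transplant(surv, status, wait=None, offset=0.1):
    """Return a list of (t0, t, d, post) episodes for one subject.

    surv   : time from acceptance to death or censoring (days)
    status : 1 = died, 0 = censored
    wait   : waiting time to transplant, or None if never transplanted
    offset : the split is placed at wait + offset, as in the listing above
    """
    if wait is None or wait + offset >= surv:
        # One record, entirely pre-transplant (assumption of this sketch).
        return [(0.0, surv, status, 0)]
    cut = wait + offset
    return [
        (0.0, cut, 0, 0),        # pre-transplant episode never ends in failure
        (cut, surv, status, 1),  # post-transplant episode carries the outcome
    ]


if __name__ == "__main__":
    # Subjects 1 and 4 from the listing: no transplant; transplant after 36 days.
    print(split_at_transplant(50, 1))
    print(split_at_transplant(39, 1, wait=36))
```

Each episode then enters only the risk sets that fall between its `t0` and `t`, so a transplanted subject counts as untransplanted until the split point.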
- Try some covariates

    . gen age48 = AgeAcc - 48
    . xi: stcox i.post age48, nohr

            _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
      _Ipost_1 | -.0054986   .3120163   -0.02   0.986   -.6170394   .6060422
         age48 |  .0307364   .0145003    2.12   0.034    .0023164   .0591565

    . xi: stcox i.post age48 TimAcc, nohr

            _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
      _Ipost_1 | -.0321918   .3173907   -0.10   0.919   -.6542662   .5898826
         age48 |  .0267991   .0140978    1.90   0.057   -.0008321   .0544303
        TimAcc | -.1783414   .0704397   -2.53   0.011   -.3164006  -.0402822

- Risk of death increases with age at acceptance
- Risk decreases with passing time
  - Healthier patients being admitted
  - Improvements in patient management

    . xi: stcox i.post*age48, nohr

              _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
        _Ipost_1 |  .0744811   .3213228    0.23   0.817   -.5552999   .7042621
           age48 |   .011865   .0183251    0.65   0.517   -.0240516   .0477816
    _IposXage4_1 |   .041295   .0283479    1.46   0.145   -.0142658   .0968558

    . test _Ipost_1 _IposXage
          chi2(2)     =   2.12
          Prob > chi2 = 0.3461

    . xi: stcox i.post*age48 TimAcc i.post*prior, nohr

              _t |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
        _Ipost_1 |  .0771957   .3316176    0.23   0.816   -.5727628   .7271542
           age48 |  .0149866   .0176007    0.85   0.395   -.0195102   .0494834
    _IposXage4_1 |  .0269781   .0271197    0.99   0.320   -.0261756   .0801318
          TimAcc | -.1363152   .0709655   -1.92   0.055    -.275405   .0027746
           prior | -.4191803   .6156507   -0.68   0.496   -1.625833   .7874728
    _IposXprio_1 |  -.298129   .7580001   -0.39   0.694   -1.783782   1.187524

- Suggestion that transplant benefits younger patients (NS)
- Compare K&P Table 6.1

Summary: Time-Dependent Covariates

- Non-PH models can be fit using interactions between fixed covariates and specific functions of time
- Models incorporating covariates that themselves depend on time can be fit
  - The records for subjects in each risk set, e.g. at time $t_i$, need to contain the covariate values $x(t_i)$ at that time
  - Facilitated by splitting each subject's record into tiny pieces between adjacent failure times: stsplit, at(failures)
  - Can also split records after the occurrence of designated events
    - Transplant
    - Treatment with estrogen
    - Use stsplit <newvar>, at(0) after(<varname>)

Adjustment Variables Change with Time

Example: Animal Carcinogenesis Experiment
Scribner et al. (1983), Cancer Research 43: 2034-41
- Initiation: DMBA
- Promotion: BrMBA
- 6 groups: DMBA (yes/no) x BrMBA (10/30/90 nM)
- Biweekly skin painting
  - time = time (weeks) since beginning of promotion
  - event = incidence of first skin carcinoma
  - adjustment = number of papillomas at time t

Question: Do DMBA and BrMBA each influence the risk of carcinoma, adjusted for each other and for the number of papillomas that have occurred?
- Hypothesis: DMBA no, BrMBA yes

Example: Mouse Skin Painting Experiment

[Figure: Total Numbers of Papillomas and Carcinomas by Rx Group]

[Figure: Percent of Mice with Papillomas and Carcinomas by Rx Group]

Define
- $x_{PD}$ = nM of BrMBA
- $x_I = 1$ if DMBA, $0$ if none
- $x_P(t)$ = number of papillomas observed by time t

Model: $\lambda(t \mid x_{PD}, x_I, x_P(t)) = \lambda_0(t) \exp\{\beta_P x_P(t) + \beta_I x_I + \beta_{PD} x_{PD}\}$

Results

    variable           beta    std. err.    HR      95% CI
    Initiator          0.138    0.284      1.148   0.658, 2.002
    Promoter dose      0.016    0.003      1.016   1.009, 1.023
    Num. papillomas    0.135    0.018      1.135   1.106, 1.185

Reduced model

    variable           beta    std. err.    HR      95% CI
    Initiator            -        -          -         -
    Promoter dose      0.015    0.003      1.016   1.009, 1.022
    Num. papillomas    0.139    0.015      1.149   1.115, 1.184

Interpretation
- Given the number of papillomas developed by time t, the initiator has no influence on the incidence of carcinoma
  - Effect of the initiator is mediated through the development of papillomas
- For a given number of papillomas developed by time t, a mouse in the 30 nM BrMBA group has a risk that is $1.016^{30-10} = 1.37$ times as high as a mouse in the 10 nM BrMBA group
- In addition, the relative hazard comparing 90 nM to 30 nM is $RR_{90:30} = 1.016^{90-30} = 2.61$

Time-Dependent Adjustment/Stratification

- In the example, the number of papillomas was adjusted for using a linear term. An alternative would be to create categories based on $x_P(t)$:

    Category    0     1      2      3       4
    Count       0    1-4    5-9   10-14    15+

  Define indicators $x_{Pj}(t) = 1$ if $x_P(t) \in$ category j, $0$ otherwise

Model using indicators (dummy variables):

$$\lambda_0(t) \exp\{\beta_I x_I + \beta_{PD} x_{PD} + \beta_1 x_{P1}(t) + \cdots + \beta_4 x_{P4}(t)\}$$

Interpretation
- Among animals in the same papilloma category at time t who received the same dose $x_{PD}$ of promoter, the RR associated with initiation is $RR = \exp(\beta_I)$
- As $x_P(t)$ increases and the animal enters a new papilloma category, the log hazard jumps up or down but remains parallel to the baseline log hazard
  - See figure next page

[Figure: log hazard against time since start of promotion (weeks), 0 to 80]

Model using true stratification: while $x_P(t) \in$ category j,

$$\lambda_{0j}(t) \exp\{\beta_I x_I + \beta_{PD} x_{PD}\}$$

Interpretation
- Among animals in the same papilloma category at time t who received the same dose $x_{PD}$ of promoter, the RR associated with initiation is

$$RR = \frac{\lambda_{0j}(t) \exp\{\beta_I \cdot 1 + \beta_{PD} x_{PD}\}}{\lambda_{0j}(t) \exp\{\beta_I \cdot 0 + \beta_{PD} x_{PD}\}} = \exp(\beta_I)$$

- As $x_P(t)$ increases and the animal enters a new papilloma category, the log hazard changes to a new curve corresponding to the baseline hazard for the current category of number of papillomas

[Figure: log hazard against time since start of promotion (weeks)]

Summary: Time-Dependent Adjustment

- The example illustrates a number of possible ways that a time-dependent covariate can be used for adjustment
  - Linear adjustment
  - Using categories and indicators (dummy variables)
  - Using categories and true stratification
- To fit these models, the data need to be structured with multiple records per subject, generated according to the times at which the adjustment variable changes

Classifying Time-Dependent Covariates

External time-dependent covariate
- Does not depend on a subject's survival for its value at any time: can obtain $x(t)$ whether or not the subject has died before t
- K&P 6.3 give the condition
    $\Pr\{X(t) \mid X(u), T \ge u\} = \Pr\{X(t) \mid X(u), T = u\}$ for $t > u$
  - fixed (non-time-varying) covariates
  - dose regimen, if controlled externally
  - amount of pollution or radiation in an area
  - number of years since birth, beginning of exposure, etc.

Internal time-dependent covariate
- Can only be measured or defined when the subject is alive
- Often are random variables that may be considered as outcomes for repeated measures or longitudinal data analysis
  - white blood count at time t
  - systolic blood pressure at time t
  - spread of cancer at time t

IMPORTANT WARNING: Internal time-varying covariates are particularly susceptible to being inappropriately controlled. They often lie in the causal pathway about which you want to make inferences.

Example: Clinical trial of the effect of immunotherapy in treatment of metastatic colorectal carcinoma

Q: Adjust for most recent WBC, $x_{WBC}(t)$?
- would make the treatment comparison among subjects with like prognosis at each time
- but immunotherapy might improve prognosis by improving depressed WBCs over time
- adjustment for WBC over time might remove the apparent effect of treatment, since treated and control subjects with the same WBC might have similar prognosis

Example: Industrial cohort study of the effect of smoking on risk of mesothelioma among asbestos-exposed workers

Q: Adjust for chronic cough in the past year?
- would make comparison ...
- but chronic cough ...
- adjustment for chronic cough could ...

Conclusion: TDCs are a powerful tool, but be careful

BIOST/EPI 537 Survival Data Analysis in Epidemiology
Norman Breslow
Ninth Lecture: Cox Regression IV
19 and 21 February 2008

Assessing Model Fit

- What are the key assumptions (desiderata) of the Cox PH model?

Independence of observations
- Often difficult to test unless a priori clustering

Covariates have correct functional form, included in model when needed

Proportionality: regression coefficients $\beta$ do not vary with time
- Additive hazards regression models $\lambda(t \mid x) = \lambda_0(t) + x\beta$ and $\lambda(t \mid x) = \lambda_0(t) + \exp(x\beta)$ available as alternatives

Influence: individual observations do not overly influence estimates of regression coefficients $\beta$

Outliers: individual failure rates are well predicted by the model (no gross departures)

Link: relative risk function has correct functional form
- Additive relative risk function $RR(x) = 1 + \beta x$ available as alternative to the standard exponential

Goodness of fit: failure rates for different regions of covariate x time space are well predicted by the model

Checking for Proportionality

Graphical approaches
- $\log \Lambda(t \mid X) = \log(-\log S(t \mid X))$ plots
  - stphplot in STATA
- Observed and fitted $S(t \mid X)$
  - stcoxkm in STATA

Confirmatory approaches
- Test the PH assumption: stphtest in STATA

Correct the model to accommodate non-proportionality
- Split data at failures, add covariate x time interactions
- Split data into intervals, separate $\beta$'s for each interval
- Both use stsplit and time-dependent covariates

log(-log) Plots

Recall that under the PH assumption

$$S(t \mid X) = S_0(t)^{\exp(X\beta)}$$
$$\log S(t \mid X) = \exp(X\beta) \log S_0(t)$$
$$\log[-\log S(t \mid X)] = X\beta + \log[-\log S_0(t)]$$

This implies that the separation between log(-log) plots should be constant over time (or log time):

$$\log[-\log S(t \mid X = 1)] - \log[-\log S(t \mid X = 0)] = \beta$$

- Indications of lack of parallelism
  - Crossing over in the middle
  - Convergence or divergence
- But remember the variation due to use of KM estimates

Example: CCG803 Study of Childhood ALL

- We did this already (p. 7-52 of notes)

    . sts graph, cumhaz by(wbcg) yscale(log)

[Figure: Nelson-Aalen cumulative hazard estimates (log scale) against analysis time; series: wbcg = 1, wbcg = 2, wbcg = 3]

but it looks better if we use a log scale for time also

    . sts graph, cumhaz by(wbcg) yscale(log) xscale(log)

[Figure: Nelson-Aalen cumulative hazard estimates (log scale) against analysis time (log scale); series: wbcg = 1, wbcg = 2, wbcg = 3]

which is essentially what STATA's stphplot command gives

    . stphplot, by(wbcg)

[Figure: -ln[-ln(Survival Probability)] against ln(analysis time); series: wbcg = 1, wbcg = 2, wbcg = 3]

except it plots the negative of the log cumulative hazard

- Try grouping on linear predictors in the basic model

    . stcox l10wbc age agesq rx, nohr schoenfeld(sch*) scaledsch(sca*)
    . predict score
    . recode score (min/.5=0) (.5/1=1) (1/1.5=2) (1.5/max=3), gen(SG)
    . stphplot, by(SG)

[Figure: -ln[-ln(Survival Probability)] against ln(analysis time), by SG]

- Curves seem convergent rather than parallel
- Learn later about Schoenfeld residuals

Fitted Survival vs Kaplan-Meier Plots

- Fit the semiparametric Cox model with a categorical predictor
  - yields a fitted survival curve for each level
  - constrained to be powers of one another
- Compare with the nonparametric KM estimator for each level
  - no constraints
- STATA allows adjustment of the fitted model for other covariates with both stphplot and stcoxkm
  - but not recommended, since the rationale for comparison with unadjusted KM or cumulative hazard curves is not at all clear

    . stcoxkm, by(wbcg) pred1opts(s(i)) pred2opts(s(i)) pred3opts(s(i))

[Figure: observed (KM) and predicted (Cox) survival probability against analysis time, 0 to 1000; series: Observed and Predicted for wbcg = 1, 2, 3]

    . stcoxkm, by(SG) pred1opts(s(i)) pred2opts(s(i))
    >   pred3opts(s(i)) pred4opts(s(i))

[Figure: observed (KM) and predicted (Cox) survival probability against analysis time, 0 to 1000; series: Observed and Predicted for SG = 0, 1, 2, 3]

- Compare with the envelope plot on p. 7-45 of notes

Issues with stphplot and stcoxkm Analyses

- How parallel is parallel?
- How close is observed to expected?
  - both are subjective evaluations
- How to categorize continuous covariates?
- Use of adjusted vs unadjusted (for other covariates) $\Lambda(t \mid X)$
- Need for test procedures designed to detect specific types of departure from proportional hazards

Schoenfeld Residuals

- Let $t_1 < \cdots < t_i < \cdots < t_I$ denote the unique event times
  - $R_i$ is the risk set, with $n_i$ at risk
  - $D_i$ is the set of $d_i$ failures (usually only one)
- Under the standard Cox model, the probability that any particular member j of $R_i$ fails at $t_i$, given that one does, is

$$p_j(\beta, t_i) = \frac{\exp\{x_j(t_i)\beta\}}{\sum_{\ell \in R_i} \exp\{x_\ell(t_i)\beta\}}$$

  - Note use of time-dependent covariate values (later)

- Think of subjects j being sampled at random from $R_i$ with probabilities $p_j(\beta, t_i)$
  - those with higher RR are more likely to fail
- Define the weighted average of covariate values for a randomly sampled subject (distribution mean) as

$$\bar{x}(\beta, t_i) = \sum_{\ell \in R_i} x_\ell(t_i)\, p_\ell(\beta, t_i)$$

- The corresponding covariance matrix is

$$V(\beta, t_i) = \sum_{\ell \in R_i} [x_\ell(t_i) - \bar{x}(\beta, t_i)]^{\otimes 2}\, p_\ell(\beta, t_i)$$

  where $x^{\otimes 2}$ denotes the outer product of two K-dimensional vectors (K = number of covariates): $\|x^{\otimes 2}\|_{kk'} = x_k x_{k'}$

- The Schoenfeld residual for any subject $j \in D_i$ is the difference between the covariate for that subject and the weighted average of covariates in the risk set, namely

$$x_j(t_i) - \bar{x}(\beta, t_i) \quad [\text{observed} - \text{expected under PH model}]$$

  - Summed over all failures, gives the partial likelihood score equations
  - $\hat\beta$ is determined by comparison of the covariates of subjects who fail with the weighted average of those in the risk sets

- The Schoenfeld residual at $t_i$ is defined as the sum of the Schoenfeld residuals among all those who fail at $t_i$ (here $\delta_{ij} = 1$ if $j \in D_i$, 0 otherwise):

$$r_i(\beta) = \sum_{j \in D_i} [x_j(t_i) - \bar{x}(\beta, t_i)] = \sum_{j \in R_i} [\delta_{ij} - p_j(\beta, t_i)] \times [x_j(t_i) - \bar{x}(\beta, t_i)]$$

- Terms in the sum on the right-hand side contribute to efficient scores
  - components of influence function (later)
- Provided the PH model holds and $\beta$ is the true regression coefficient, the $r_i$ are uncorrelated and have mean zero
- In practice, Schoenfeld residuals are calculated as $\hat{r}_i = r_i(\hat\beta)$, hence $\sum_i \hat{r}_i = 0$ ($\hat\beta$ is the partial likelihood estimate)
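The definition above can be checked on a toy risk set. This is a minimal Python sketch for a single covariate, assuming made-up covariate values and a made-up $\beta$; the function name `schoenfeld_residual` is hypothetical and the code only restates the weighted-average formula, it is not how Stata computes the residuals internally.

```python
import math


def schoenfeld_residual(x_risk, failed_idx, beta):
    """Schoenfeld residual at one failure time for a single covariate.

    x_risk     : covariate values x_j(t_i) for everyone in the risk set R_i
    failed_idx : positions in x_risk of the subjects who fail at t_i (the set D_i)
    beta       : regression coefficient at which the residual is evaluated
    """
    weights = [math.exp(beta * x) for x in x_risk]
    total = sum(weights)
    probs = [w / total for w in weights]               # p_j(beta, t_i)
    xbar = sum(p * x for p, x in zip(probs, x_risk))   # weighted average covariate
    return sum(x_risk[j] - xbar for j in failed_idx)   # observed minus expected


if __name__ == "__main__":
    # Hypothetical risk set of four subjects with a binary covariate;
    # the second subject (x = 1) is the one who fails.
    print(schoenfeld_residual([0, 1, 1, 0], [1], beta=0.0))
    print(schoenfeld_residual([0, 1, 1, 0], [1], beta=math.log(2)))
```

With $\beta = 0$ every subject is equally likely to fail, so the expected covariate is the plain risk-set mean; a positive $\beta$ shifts the expectation toward subjects with larger covariate values and shrinks the residual for a failing subject with $x = 1$.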
BiostEpi 537 916 Schoenfeld Residuals o Variance of B is inverse of 2111 V t o Scaled Schoenfeld residuals are residuals after multiplication by the inverse of corresponding weighted covariance matrix at rm V1Bt r Their average value over failure times may be substituted for some or all of the V t in calculation of scaled residuals for greater statistical stability 0 In practice substitute estimate 8 for B BiostEpi 537 917 Schoenfeld Residuals in STATA 0 Option schoenfeldnewval s gives Schoenfeld residuals 0 Option scaledschnewnarss gives scaled Schoenfeld resid uals D Needed if subsequent tests of proportionality are planned stcox 110wbc age agesq rx nohr schr scas summarize r s Varl Obs r1 181 r2 181 1 I 181 S2 I 181 aoijpiSBY Mean 613e17 981e18 7212221 1888412 Std Dev 5884073 3704611 1672415 1174902 Min 117775 5192175 2608564 3225816 Max 1594189 1012566 5803611 1754258 Schoenfeld Residuals in STATA o Residuals only defined for failures list id indiC dur rX r4 34 if id lt 20 id indiC 6 11 10 1 14 16 7 19 9 OHHOHHHHH aoijpiSBY dur 84 153 168 200 224 224 266 280 847 IX 0 1 0 1 0 1 1 1 0 r4 50477849 47919248 50980409 48736353 52687509 44645841 43227937 34 22504986 19657295 25363792 1586813 27271219 11741623 15639323 Schoenfeld Residuals and Tests Of PH When the 77 r B are plotted against any transform 902 of time tk the smooth curve through the plotted points approximates the manner in which the associated regression coeffients depend on time o If a specific covariate has a time varying coefficient Mt B 790 where 905 is a specified function of time t such as D 90 2 t or 905 ogt then the approximate expectation of the scaled Schoenfeld residual at time ti is Grambsch and Therneau Biometrika 1994 EW Ert V9050 BiostEpi 537 Schoenfeld Residuals and Tests Of PH o This suggests plots of 72 against 902 D Slope of linear regression gives numerator of score statistic for testing HO 7 O proportionality D Lack of trend slope near 0 gt PH assumption OK D Implemented in stata s 
stphtest command
  - Much more efficient than expanding dataset using stsplit and testing for interactions of covariates with functions of time using LR or Wald tests (coming)
- Interpretation of plots of $r_i^*$ against $g(t_i)$
  - If increasing, then failures are occurring more often than expected among subjects with high values of covariate at later follow-up times $\Rightarrow$ hazard ratio increasing over time
  - If decreasing, then failures are occurring more often than expected among subjects with low values of covariate at later follow-up times $\Rightarrow$ hazard ratio decreasing over time

Example: CCG803 Study of Childhood ALL

- Test of PH for model previously fit to CCG803 data

    . stphtest, detail

          Time:  Time
                |      rho    chi2   df   Prob>chi2
    ------------+------------------------------------
    l10wbc      |  0.10270    1.84    1      0.1752
    age         |  0.02896    0.17    1      0.6774
    agesq       |  0.03375    0.22    1      0.6386
    rx          |  0.04633    0.39    1      0.5349
    ------------+------------------------------------
    global test |             2.86    4      0.5815

- No strong evidence for lack of proportionality using $t$ itself
- Plot scaled Schoenfeld residuals for $\log_{10}$ WBC vs time (next page)

    . ksm s1 dur, lowess

[Figure: lowess smoother of scaled Schoenfeld residuals for l10wbc (s1) against dur]

- Try again with log of time

    . stphtest, log detail

          Time:  Log(t)
                |      rho    chi2   df   Prob>chi2
    ------------+------------------------------------
    l10wbc      |  0.17245    5.18    1      0.0228
    age         |  0.07089    1.04    1      0.3084
    agesq       |  0.07226    1.01    1      0.3147
    rx          |  0.07703    1.06    1      0.3021
    ------------+------------------------------------
    global test |             8.64    4      0.0708

- Some evidence for non-proportionality: $\log_{10}$ WBC
- Look at plot of Schoenfeld residuals against log time

    . gen lndur = log(dur)
    . ksm s1 lndur, lowess

[Figure: lowess smoother of scaled Schoenfeld residuals for l10wbc (s1) against lndur]

Summary: Checking the PH Assumption

- $\log(-\log \hat S(t \mid X))$ plots for categorical $X$
- Compare Kaplan-Meier curves for different values of $X$ to fitted survival curves $\hat S_0(t)^{\exp(X\hat\beta)}$
- PH testing based on Schoenfeld residuals
- Plots of scaled Schoenfeld residuals against $t$ or $\log t$ show hazard ratio as a function of time and hint at form of $\beta(t)$, the time
dependent log HR associated with given covariate
- Estimate coefficients of covariate × time interaction terms in Cox model using time dependent covariates (coming)

General Measures of Lack of Fit

| | Logistic Regression | Cox Regression |
|---|---|---|
| Group observations with similar covariate values | Table observed and expected numbers of cases (also used with Cox regression) | Plot observed and expected survival or hazard functions |
| Individual observations | Plot Pearson residuals or Deviance residuals | Plot Martingale residuals or Deviance residuals |
| Summary measures | Compute Deviance, Pearson $X^2$, Deviance + 2p | Compute Deviance, Sum of squared MG residuals, Deviance + 2p |
| Tests | Deviance (grouped data); Tsiatis 1980 (Biometrika) | Schoenfeld 1980, Anderson 1982, Aranda-Ordaz 1993 (all Biometrics) |

Martingale Residuals

- Residuals in ordinary linear regression model $y = x\beta + \epsilon$ have form $y - \hat y = y - x\hat\beta$
  - When plotted against residuals of omitted covariates, show need for their inclusion in model
  - When plotted against included covariates, show need for transformation to different functional form (if curvature)
- No obvious analog $t - \hat t$ for survival data due to censorship of survival times
- Martingale residuals for survival data $(t_i, \delta_i, x_i)$ are more like residuals for binary logistic regression. For model

  $$\lambda(t) = \lambda_0(t)\exp(\beta_1 x_1 + \cdots + \beta_K x_K), \qquad \Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$$

  martingale residuals are

  $$M_i = \delta_i - \hat\Lambda_0(t_i)\exp(\hat\beta_1 x_{1i} + \cdots + \hat\beta_K x_{Ki})$$

  - $M_i$ sum to zero over observations
  - Expectation approximately zero if model holds
  - Highly skewed distribution on interval $(-\infty, 1]$
  - When plotted against covariate $x$, reveal functional form of covariate in Cox model

Deviance Residuals

$$d_i = \operatorname{sign}(M_i)\sqrt{-2\,[M_i + \delta_i \log(\delta_i - M_i)]} = \operatorname{sign}(\delta_i - \hat m_i)\sqrt{2\,[\hat m_i - \delta_i - \delta_i \log \hat m_i]}$$

where $\hat m_i = \hat\Lambda_0(t_i)\exp(\hat\beta_1 x_{1i} + \cdots + \hat\beta_K x_{Ki})$

- Estimate contribution of $i$th observation to the deviance
  - $\sum_i d_i^2 = 2\,[\text{log-likelihood(saturated)} - \text{log-likelihood}(\hat\beta)]$
- DO NOT sum to zero
- Have less skewed distribution than Martingale residuals
- Provide evidence as to the accuracy of the model
in predicting the failure rate of each case (Fleming and Harrington 1991)
- Are useful for detecting outliers
- Deviance + 2p provides analog of Akaike Information Criterion for assessing predictive capacity of model
  - Penalize number of parameters to reduce variability of prediction

Detecting Specific Departures From Model

- Omitted or incorrectly specified covariates
- Incorrectly specified RR function (link)
  - Have already considered one manifestation: failure of proportional hazards assumption
- Incorrectly specified distribution (variance)

| | Logistic Regression | Cox Regression |
|---|---|---|
| Covariates | Plot residuals vs covariates; test addition of transformed covariate | Plot residuals vs covariates; test addition of transformed covariate |
| RR function | Plot residuals vs fitted linear predictor; test departure from hypothesized RR in more general RR model | Plot residuals vs fitted linear predictor; test departure from hypothesized RR in more general RR model |
| Incorrect distribution (variance) | Tests for overdispersion | No obvious analog due to nonparametric component $\lambda_0(t)$ |

Residuals for CCG803 Study

- Fit model without log WBC, calculate martingale residuals

    . use ccg803
    . stcox age agesq rx, nohr mgale(mg)

    No. of subjects =      268        Log likelihood = -892.55656
    No. of failures =      181        chi2(3)        =      16.57
    Time at risk    =   106875        Prob > chi2    =     0.0009

       indic |     Coef.   Std. Err.       z    P>|z|
    ---------+------------------------------------------
         age |  .2691182   .0801622    3.357    0.001
       agesq |  .0201308   .0055019    3.659    0.000
          rx |  .2982786   .1505894    1.981    0.048

- Calculate deviance residuals by survival status, using the formula on the Deviance Residuals slide (page 9-31 of notes)

    . gen dev = sign(mg)*sqrt(-2*(mg + d*log(d-mg)))
    . gen dev0 = dev if d==0
    . gen dev1 = dev if d
    . gen mg0 = mg if d==0
    . gen mg1 = mg if d

- Martingale residuals sum to zero, deviance residuals don't

    . summarize mg dev

    Variable |  Obs       Mean   Std. Dev.      Min      Max
    ---------+------------------------------------------------
          mg |  268   3.53e-17      .8224   -2.528     .992
         dev |  267   .1044454     1.2140   -2.248    2.785

    . twoway scatter mg0 mg1 id [and some other stuff]

Index Plot of
Martingale Residuals
[Figure: martingale residuals (mg0, mg1) plotted against id]

- Values for cases bunched together near 1
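
The deviance residual formula used in the notes, $d_i = \operatorname{sign}(M_i)\sqrt{-2[M_i + \delta_i\log(\delta_i - M_i)]}$, can be checked outside Stata. The following is a minimal Python sketch and not part of the lecture: the function name `deviance_residual` is hypothetical, and treating $0 \cdot \log 0$ as 0 for a censored case with $M_i = 0$ is an assumption made here (the Stata expression would return a missing value in that case).

```python
import math


def deviance_residual(mg, d):
    """Deviance residual from a martingale residual mg and event indicator d.

    d is 1 for a failure and 0 for a censored observation.
    Hypothetical helper for illustration; mirrors
    sign(mg)*sqrt(-2*(mg + d*log(d - mg))).
    """
    if d not in (0, 1):
        raise ValueError("d must be 0 (censored) or 1 (failure)")
    m = d - mg  # estimated cumulative hazard for this subject
    if m < 0 or (d == 1 and m == 0):
        raise ValueError("martingale residual is inconsistent with d")
    # Assumption: the term d*log(d - mg) is taken as 0 when d == 0.
    log_term = math.log(m) if d == 1 else 0.0
    inner = -2.0 * (mg + log_term)
    # Guard against tiny negative values caused by floating-point rounding.
    magnitude = math.sqrt(max(inner, 0.0))
    return math.copysign(magnitude, mg) if mg != 0 else 0.0


if __name__ == "__main__":
    # Censored case with a large estimated cumulative hazard.
    print(round(deviance_residual(-2.528, 0), 3))
    # Failure that occurred early, when little hazard had accumulated.
    print(round(deviance_residual(0.992, 1), 3))
```

For a censored observation the formula reduces to $-\sqrt{2\hat m_i}$, so a martingale residual of -2.528 maps to a deviance residual of about -2.249. Failures with martingale residuals close to 1 are spread out by the logarithm, which is why deviance residuals are less skewed than martingale residuals.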
