SURVIVABLTY&LIFE TESTING STA 635
This 56 page Class Notes was uploaded by Helga Torp Sr. on Friday October 23, 2015. The Class Notes belongs to STA 635 at University of Kentucky taught by Staff in Fall. Since its upload, it has received 39 views. For similar materials see /class/228274/sta-635-university-of-kentucky in Statistics at University of Kentucky.
Date Created: 10/23/15
Summary Notes for Survival Analysis
Instructor: Mei-Cheng Wang
Department of Biostatistics, Johns Hopkins University
Spring 2006

1 Introduction

1.1 Introduction

Definition. A failure time (survival time, lifetime) $T$ is a nonnegative-valued random variable.

For most applications, the value of $T$ is the time from a certain event to a failure event. For example:
(a) in a clinical trial, time from start of treatment to a failure event;
(b) time from birth to death = age at death;
(c) to study an infectious disease, time from onset of infection to onset of disease;
(d) to study a genetic disease, time from birth to onset of a disease = onset age.

1.2 Definitions

Definition. Cumulative distribution function $F(t)$:
$$F(t) = \Pr(T \le t).$$

Definition. Survival function $S(t)$:
$$S(t) = \Pr(T > t) = 1 - \Pr(T \le t).$$

Characteristics of $S(t)$:
(a) $S(t) = 1$ if $t < 0$;
(b) $S(\infty) = \lim_{t \to \infty} S(t) = 0$;
(c) $S(t)$ is non-increasing in $t$.

In general, the survival function $S(t)$ provides useful summary information, such as the median survival time, the $t$-year survival rate, etc.

Definition. Density function $f(t)$.
(a) If $T$ is a discrete random variable, $f(t) = \Pr(T = t)$.
(b) If $T$ is absolutely continuous, the density function is
$$f(t) = \lim_{\Delta t \to 0^+} \frac{\Pr(\text{failure occurring in } [t, t + \Delta t))}{\Delta t} = \text{rate of occurrence of failure at } t.$$
Note that
$$f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}.$$

Definition. Hazard function $\lambda(t)$.
(a) If $T$ is discrete,
$$\lambda(t) = P(T = t \mid T \ge t) = \frac{P(T = t)}{P(T \ge t)}.$$
Note that $\lambda(t) = 0$ if $t$ is not a "mass point" of $T$. If $T$ takes values at the mass points $x_1 < x_2 < x_3 < \cdots$, then for $x_j \le t < x_{j+1}$,
$$S(t) = \frac{P(T \ge x_2)}{P(T \ge x_1)} \cdot \frac{P(T \ge x_3)}{P(T \ge x_2)} \cdots \frac{P(T \ge x_{j+1})}{P(T \ge x_j)} = (1 - \lambda(x_1))(1 - \lambda(x_2)) \cdots (1 - \lambda(x_j)).$$
(b) If $T$ is absolutely continuous,
$$\lambda(t) = \lim_{\Delta t \to 0^+} \frac{\Pr(\text{failure occurring in } [t, t + \Delta t) \mid T \ge t)}{\Delta t} = \text{instantaneous failure rate at } t \text{ given survival up to } t.$$
Here $\lambda(t)\,\Delta t \approx$ the proportion of individuals experiencing failure in $[t, t + \Delta t)$ to those surviving up to $t$.

Examples:
(a) Constant hazard: $\lambda(t) = \lambda_0$.
(b) Increasing hazard: $\lambda(t_2) \ge \lambda(t_1)$ if $t_2 \ge t_1$.
(c) Decreasing hazard: $\lambda(t_2) \le \lambda(t_1)$ if $t_2 \ge t_1$.
(d) U-shape hazard: human mortality for age at death.

Remark. Modeling the hazard function is one way for parametric modeling.

Definition. Cumulative hazard function (chf) $\Lambda(t)$.
(a) If $T$ is discrete, let the $x_j$'s be the mass points;
$$\Lambda(t) = \sum_{x_j \le t} \lambda(x_j).$$
(b) If $T$ is absolutely continuous,
$$\Lambda(t) = \int_0^t \lambda(u)\,du.$$

1.3 Relationship Among Functions

(a) If $T$ is discrete,
$$\lambda(t) = \frac{P(T = t)}{P(T \ge t)} = \frac{f(t)}{S(t^-)}.$$
(b) If $T$ is absolutely continuous, $S(t) = \Pr(T > t) = \Pr(T \ge t)$ and
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(T \in [t, t + \Delta t) \mid T \ge t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{P(T \in [t, t + \Delta t))}{\Delta t \, S(t)} = \frac{1}{S(t)} \lim_{\Delta t \to 0} \frac{P(T \in [t, t + \Delta t))}{\Delta t} = \frac{f(t)}{S(t)}.$$

A well-known relationship among the density, hazard and survival functions is
$$\lambda(t) = \frac{f(t)}{S(t)}.$$
Also,
$$\Lambda(t) = \int_0^t \lambda(u)\,du = \int_0^t \frac{f(u)}{S(u)}\,du = \int_0^t -\frac{d}{du} \log S(u)\,du = -\log S(u)\Big|_0^t = -[\log S(t) - \log S(0)] = -\log S(t).$$
Thus
$$S(t) = e^{-\Lambda(t)} = e^{-\int_0^t \lambda(u)\,du}.$$
We now see that $S$ is determined if and only if $\lambda$ (or $\Lambda$) is determined, and vice versa. When $T$ is a continuous variable we also have
$$\int_0^\infty \lambda(u)\,du = \infty.$$
This formula is implied by $0 = S(\infty) = e^{-\int_0^\infty \lambda(u)\,du}$.

Example. $\lambda(t) = \lambda_0$, a positive constant, is a valid hazard function.
Example. $\lambda(t) = \lambda_0 + \lambda_1 t$ with $\lambda_0, \lambda_1 > 0$ is a valid hazard function.
Example. $\lambda(t) = e^{-\theta t}$, $\theta > 0$, is NOT a valid hazard function, since $\int_0^\infty e^{-\theta u}\,du = 1/\theta < \infty$.

Remark. In applications, if a disease has a cure, that is, we assume $P(T = \infty) > 0$, then it is OK that $\Lambda(\infty) < \infty$. This is allowed since $T$ is not a regular random variable.

1.4 Censoring

Type I censoring. Type I censoring occurs when a failure time $t_i$ exceeds a pre-determined censoring time $c_i$. The censoring time $c_i$ is considered a constant in the study. For example, a clinical treatment study starts at calendar time $a$ and ends at $b$. Patients could enter the study at different calendar times. The failure time is the time from the start of treatment (entry) to a certain event. Assume no loss to follow-up. In this case, $c_i$ is the time from entry to $b$. The actual failure time $t_i$ cannot be observed if $t_i > c_i$.

Type II censoring. This type of censoring is frequently encountered in industrial applications. From $n$ ordered failure times, only the first $r$ ($r \le n$) times are observed; the others are censored. For example, put 100 transistors on test at the same time and stop the experiment when 50 transistors burn out. In this example, $n = 100$ and $r = 50$. Let $t_{(1)}, t_{(2)}, \ldots, t_{(50)}$ be the first 50 failure times. Note that $t_{(50)}$ is an estimate of the median failure time.

Random censoring. This type of censoring will be the main censoring mechanism that we deal with in
this course. It occurs when the censoring time varies from individual to individual and is unknown in advance. For example, in a follow-up study, censoring occurs due to the end of the study, loss to follow-up, or early withdrawals.

Reasons for censoring:
- patients decide to move to another hospital
- patients quit treatment because of side effects of a drug
- failures occur after the end of study
- etc.

Theoretical setting. Suppose $C$ is the censoring variable. Assume $T$ and $C$ are independent (the so-called independent censoring). Define
$$Y = \begin{cases} T & \text{if } T \le C \\ C & \text{if } T > C \end{cases}$$
and the censoring indicator
$$\Delta = \begin{cases} 1 & \text{if data is uncensored, } T \le C \\ 0 & \text{if data is censored, } T > C. \end{cases}$$
Assume $(Y_1, \Delta_1), (Y_2, \Delta_2), \ldots, (Y_n, \Delta_n)$ are iid copies of $(Y, \Delta)$.

Under random censoring, what is the actually observed data? Ideally, we would like to observe the "complete data" $t_1, t_2, \ldots, t_n$. Due to censoring, we only observe "right-censored data" $(y_1, \delta_1), (y_2, \delta_2), \ldots, (y_n, \delta_n)$ and possibly some covariate information.

Example. A set of observed survival data is

$y_i$:       25  18  17  22  27
$\delta_i$:   1   0   1   0   1

The data can also be presented as 25, 18+, 17, 22+, 27.

1.5 Probability Properties

Intuitively, the random variable $Y$ tends to be "shorter" than the failure time of interest, $T$. This is clear upon observing $Y = \min(T, C)$. Under the assumption that $T$ and $C$ are independent, the survival function of $Y$ is
$$S_Y(y) = P(T > y, C > y) = P(T > y)\,P(C > y) = S_T(y)\,S_C(y) \le S_T(y).$$
Thus, as compared with $S_T$, $S_Y$ assigns more probability to smaller values.

Example. Suppose the censoring time is a fixed constant, $C = c_0$, $c_0 > 0$. Then the survival function of $Y$ is $S_Y(y) = S_T(y)$ if $y < c_0$ and $S_Y(y) = 0$ if $y \ge c_0$.

Example. Suppose $T \sim \text{Exp}(\theta)$, $\theta > 0$, and $C \sim \text{Unif}(0, B)$, $B > 0$. Then the survival function of $Y$ is
$$S_Y(y) = \begin{cases} 1 & \text{if } y \le 0 \\ e^{-\theta y}\,\dfrac{B - y}{B} & \text{if } 0 < y < B \\ 0 & \text{if } y \ge B. \end{cases}$$

The hazard function is an important function for various reasons, and the so-called "risk set" plays a key role in exploring the probability structure of the hazard function. The risk set at $t$ is defined as
$$R_t = \{y_j : y_j \ge t,\ j = 1, 2, \ldots, n\}, \quad t > 0.$$

Property. For $t \ge y$, $P(T > t \mid T \ge y) = P(T > t \mid Y \ge y)$.

Proof. For $t \ge y$, since $Y = \min(T, C)$ and $T$ and $C$ are independent,
$$P(T > t \mid Y \ge y) = \frac{P(T > t, C \ge y)}{P(T \ge y, C \ge y)} = \frac{P(T > t)\,P(C \ge y)}{P(T \ge y)\,P(C \ge y)} = P(T > t \mid T \ge y).$$
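The property above can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: it assumes $T \sim \text{Exp}(\theta = 1)$ and $C \sim \text{Unif}(0, B = 3)$ as in the example of this section, and the function name `conditional_survival` and the values $y = 0.5$, $t = 1$ are illustrative choices.

```python
import math
import random

def conditional_survival(theta=1.0, B=3.0, y=0.5, t=1.0, n=200_000, seed=0):
    """Monte Carlo estimates of P(T > t | Y >= y) and P(T > t | T >= y)."""
    rng = random.Random(seed)
    at_risk_obs = surv_obs = 0      # conditioning on Y >= y (observed risk set)
    at_risk_true = surv_true = 0    # conditioning on T >= y (risk population)
    for _ in range(n):
        T = rng.expovariate(theta)  # failure time with rate theta
        C = rng.uniform(0.0, B)     # independent censoring time
        Y = min(T, C)               # observed follow-up time
        if Y >= y:
            at_risk_obs += 1
            surv_obs += T > t
        if T >= y:
            at_risk_true += 1
            surv_true += T > t
    return surv_obs / at_risk_obs, surv_true / at_risk_true

p_obs, p_true = conditional_survival()
# For the exponential distribution, P(T > t | T >= y) = exp(-theta * (t - y)).
exact = math.exp(-1.0 * (1.0 - 0.5))
print(round(p_obs, 3), round(p_true, 3), round(exact, 3))
```

Both Monte Carlo estimates should agree with the exact value up to simulation error, even though the observed risk set $\{Y \ge y\}$ is smaller than the risk population $\{T \ge y\}$.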
Implication of this property: the distribution among observed survivors at $y$ is the same as the distribution in the risk population at $y$. Also, the hazard probability on uncensored $Y$ at $y$ from $R_y$ is the same as the hazard probability of $T$ at $y$:
$$P(Y = y, \Delta = 1 \mid Y \ge y) = P(T = y, C \ge y \mid Y \ge y) = P(T = y \mid T \ge y).$$
The above formula can be equivalently expressed as
$$\lambda_u(y \mid Y \ge y)\,dy = \lambda(y)\,dy,$$
or more directly
$$\lambda_u(y \mid Y \ge y) = \lambda(y),$$
where the subscript "u" represents "uncensored". This formula is the basis for the use of risk sets in many nonparametric and semiparametric models when analyzing survival data.

Left censoring. The failure time $t_i$ could be too small to be observed. For example, consider a study in which interest centers on the time to recurrence of a particular cancer following surgical removal of the primary tumor. A few months after the operation, the patients are examined to determine if the cancer has recurred. Let $T$ = time from operation to the recurrence of cancer. Some of the patients at this time may be found to have a recurrence, and thus the actual time is less than the time from operation to the examination. These cases are said to be left censored.

1.6 Interval Censoring and Truncation

Interval censoring. The failure time $t_i$ falls in an interval $(\ell_i, r_i]$, and we observe only $(\ell_i, r_i]$. For example, let $T$ = time from treatment onset to disease onset. The onset of disease falls in the interval formed by two successive clinical visits. Let
$\ell_i$ = time from the treatment onset to the last visit when the $i$th patient is free of the disease,
$r_i$ = time from the treatment onset to the first visit when the $i$th patient becomes diseased.
The best knowledge we have about the true failure time $T_i = t_i$ is $\ell_i < t_i \le r_i$.

Right truncation. The failure time $t_i$ is too large to be included in the data. A well-known example is the reported AIDS incidences. In this example, $T$ = time from HIV infection to diagnosis of AIDS. An AIDS incidence is reported to a health institution only when AIDS develops. Those cases where AIDS occurs after the closing date of data collection are excluded from the data set.

Left
truncation and right censoring. The presence of left truncation is usually due to the prevalent sampling scheme, that is, drawing samples from a disease-prevalent population. Right censoring is encountered for the usual reasons (loss to follow-up, etc.).

Example. Failure time $T$ = time from the onset (or diagnosis) of breast cancer to death. A prevalent cohort includes a group of women who have developed breast cancer at the time of recruitment. Those with breast cancer who died before the recruiting time are excluded from the study. The study tends to recruit women with longer failure times.

Double truncation. The failure time $t_i$ is included in the data set only if the failure event occurs in a calendar time window. For example, $T$ = onset age of a certain disease, and the data are observed only if the disease occurs in the calendar time window $[a, b]$. If double truncation is adopted as the sampling scheme, those cases in which the disease occurs before $a$ or after $b$ will not be included in the data set.

1.7 Correlated Survival Data

Univariate survival data refer to independent, possibly censored failure times. The statistical analysis for clustered or stratified failure time data is called multivariate survival analysis.

Bivariate failure times. Observe
$$(y_{11}, y_{12}), (y_{21}, y_{22}), \ldots, (y_{n1}, y_{n2})$$
with censoring indicators
$$(\delta_{11}, \delta_{12}), (\delta_{21}, \delta_{22}), \ldots, (\delta_{n1}, \delta_{n2}).$$
• twin data
• eyes data, cf. Cox and Oakes (1984)

Clustered failure times.
$$(y_{11}, y_{12}, \ldots, y_{1m_1}), (y_{21}, y_{22}, \ldots, y_{2m_2}), \ldots, (y_{n1}, y_{n2}, \ldots, y_{nm_n})$$
with censoring indicators
$$(\delta_{11}, \delta_{12}, \ldots, \delta_{1m_1}), (\delta_{21}, \delta_{22}, \ldots, \delta_{2m_2}), \ldots, (\delta_{n1}, \delta_{n2}, \ldots, \delta_{nm_n}).$$
• sibling data
• family data
• clustered animal data (litters)

Recurrent event data. Observe
$$(t_{11}, t_{12}, \ldots, t_{1m_1}, c_1), (t_{21}, t_{22}, \ldots, t_{2m_2}, c_2), \ldots, (t_{n1}, t_{n2}, \ldots, t_{nm_n}, c_n),$$
where $t_{i1} < t_{i2} < \cdots < t_{im_i} < c_i$. Examples include repeated occurrences of hospitalizations or infections.

Statistical methods have been partially developed for the data described above.

1.8 Parametric Models

Parametric models assume knowledge of the survival or density function up to $K$ unknown parameters. In this course, $K = 1$ or 2. Assume the failure
time has the density function $f(t; \theta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$ is the unknown vector of parameters. Clearly, the density and survival functions are completely specified if $\theta$ is known.

Example: Exponential distribution. $T \sim \exp(\theta)$, $\theta > 0$. The exponential distribution with parameter $\theta > 0$ has the density function
$$f(t) = \theta e^{-\theta t} \quad \text{for } t > 0.$$
The survival function is
$$S(t) = \int_t^\infty f(u; \theta)\,du = \int_t^\infty \theta e^{-\theta u}\,du = e^{-\theta t}.$$
The hazard function is
$$\lambda(t) = \frac{f(t)}{S(t)} = \theta,$$
a constant.

Example: Weibull distribution. The Weibull distribution with parameters $\theta > 0$ and $\beta > 0$ assumes the parameterized survival function
$$S(t) = e^{-(\theta t)^\beta} \quad \text{for } t > 0.$$
The density function is
$$f(t) = -\frac{dS(t)}{dt} = \beta\theta(\theta t)^{\beta - 1} e^{-(\theta t)^\beta}.$$
The hazard function is
$$\lambda(t) = \frac{f(t)}{S(t)} = \beta\theta(\theta t)^{\beta - 1}.$$
Note that the hazard function $\lambda(t)$ is constant if $\beta = 1$, increasing in $t$ if $\beta > 1$, and decreasing in $t$ if $\beta < 1$.

Example: Gamma distribution. The gamma distribution with parameters $\lambda > 0$ and $r > 0$ is a continuous distribution with the density function
$$f(t) = \frac{\lambda^r t^{r-1} e^{-\lambda t}}{\Gamma(r)} \quad \text{for } t \ge 0,$$
where $\Gamma(r) = \int_0^\infty z^{r-1} e^{-z}\,dz$. The survival and hazard functions can be derived from the density function. The mean of the gamma distribution is $r/\lambda$ and the variance is $r/\lambda^2$.

Example: Log-logistic distribution. The log-logistic distribution with parameters $\alpha > 0$ and $-\infty < \theta < \infty$ is a continuous distribution and has the hazard function
$$\lambda(t) = \frac{e^\theta \alpha t^{\alpha - 1}}{1 + e^\theta t^\alpha}.$$
The hazard function decreases monotonically if $0 < \alpha \le 1$. The hazard function has a single mode if $\alpha > 1$. The survival function is
$$S(t) = \left[1 + e^\theta t^\alpha\right]^{-1}$$
and the density function is
$$f(t) = \frac{e^\theta \alpha t^{\alpha - 1}}{(1 + e^\theta t^\alpha)^2}.$$
It is called the log-logistic distribution because $\log T$ has a logistic distribution (a symmetric distribution with density function similar to the normal density function).

Example: Log-normal distribution. A random variable $T$ is said to have a lognormal distribution with parameters $-\infty < \mu < \infty$ and $\sigma > 0$ if $\log T \sim N(\mu, \sigma^2)$. The probability density function of $T$ is
$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma t} \exp\left\{-\frac{(\log t - \mu)^2}{2\sigma^2}\right\} \quad \text{for } t > 0,$$
from which the survival and hazard functions can be derived.

The hazard functions for the gamma and lognormal distributions are less
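interpretable as compared with the hazard functions for the Weibull and log-logistic distributions. Thus, the Weibull and log-logistic distributions are more useful for parametric hazard modeling.

1.9 Maximum Likelihood Estimation

Suppose that we are able to observe "complete failure times" $t_1, t_2, \ldots, t_n$. In general, for a parametric model $T \sim f(t; \theta)$, the likelihood function on the basis of independent and identically distributed failure times $t_1, \ldots, t_n$ is
$$L = L(\theta) = \prod_{i=1}^n f(t_i; \theta).$$
The maximum likelihood estimate (mle) is the $\hat\theta$ which maximizes the likelihood function $L(\theta)$.

Now we consider the case when $\theta$ is a real number. Note that
$$\log L(\theta) = \sum_{i=1}^n \log f(t_i; \theta), \qquad U(\theta) = \frac{d}{d\theta} \log L(\theta) = \sum_{i=1}^n \frac{d}{d\theta} \log f(t_i; \theta).$$
The mle satisfies $U(\hat\theta) = 0$. By Taylor's expansion,
$$0 = U(\hat\theta) = U(\theta) + (\hat\theta - \theta)\,U'(\theta) + \text{an ignorable term}.$$
Thus
$$\hat\theta - \theta \approx -\frac{U(\theta)}{U'(\theta)} = \left[-\frac{1}{n} U'(\theta)\right]^{-1} \frac{1}{n} \sum_{i=1}^n \frac{d}{d\theta} \log f(t_i; \theta).$$
By statistical theory (law of large numbers, central limit theorem), when $n$ is large,
$$\hat\theta \overset{\text{approx}}{\sim} \text{Normal}(\theta, I^{-1}(\theta)) \approx N(\theta, I^{-1}(\hat\theta)),$$
where
$$I(\theta) = \text{Fisher information} = E\left[-\frac{d^2}{d\theta^2} \log L(\theta)\right].$$

Example. $T \sim \exp(\theta)$. The density function is $f(t; \theta) = \theta e^{-\theta t} I(t > 0)$, so
$$\log L(\theta) = \sum_{i=1}^n (\log\theta - \theta t_i), \qquad U(\theta) = \frac{d}{d\theta} \log L(\theta) = \frac{n}{\theta} - \sum_{i=1}^n t_i.$$
Thus $\hat\theta = n / \sum_i t_i$ is the mle. Note that the Fisher information is
$$I(\theta) = E\left[-\frac{d^2}{d\theta^2} \log L(\theta)\right] = \frac{n}{\theta^2}.$$
Thus, when $n$ is large,
$$\hat\theta - \theta \overset{\text{approx}}{\sim} N\left(0, \frac{\theta^2}{n}\right), \qquad \hat\theta \overset{\text{approx}}{\sim} N\left(\theta, \frac{\hat\theta^2}{n}\right).$$
Thus
$$\text{Prob}\left(\hat\theta - 1.96\,\frac{\hat\theta}{\sqrt n} < \theta < \hat\theta + 1.96\,\frac{\hat\theta}{\sqrt n}\right) \approx 95\%.$$
An asymptotic 95% confidence interval for $\theta$ is
$$\left(\hat\theta - 1.96\,\frac{\hat\theta}{\sqrt n},\ \hat\theta + 1.96\,\frac{\hat\theta}{\sqrt n}\right).$$

Regression extension. Let $x_i$ be a $1 \times p$ vector of covariates for subject $i$ and $\theta$ a $p \times 1$ vector of parameters. Assume the hazard function is $\lambda(t) = x_i\theta$, so that $T_i$ has the pdf $x_i\theta\, e^{-x_i\theta\, t_i}$. Based on $(x_1, t_1), \ldots, (x_n, t_n)$, the maximum likelihood techniques can still be applied to the likelihood function
$$\prod_{i=1}^n (x_i\theta)\, e^{-x_i\theta\, t_i}.$$

2 One Sample Estimation

2.1 Complete Failure Times: Nonparametric Models

Recall $S(t) = P(T > t)$ = population fraction surviving beyond $t$. The set of complete data $t_1, t_2, \ldots, t_n$ reflects the structure of the population failure times. Thus, we estimate $S(t)$ by the sample fraction surviving beyond $t$:
$$\hat S(t) = \frac{1}{n} \sum_{i=1}^n I(t_i > t).$$
$\hat S(t)$ is also called the empirical survival distribution.

How to derive a confidence interval for $S(t)$?
• Define $B(t) = \sum_{i=1}^n I(t_i > t)$, a binomial variable: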
$$B(t) \sim \text{Binomial}(n, p), \quad p = S(t),$$
$$E[\hat S(t)] = \frac{1}{n}\,np = p = S(t), \qquad \text{Var}(\hat S(t)) = \frac{1}{n^2}\,npq = \frac{S(t)(1 - S(t))}{n}.$$
When $n$ is large,
$$\hat S(t) \overset{\text{approx}}{\sim} \text{Normal}\left(S(t), \frac{S(t)(1 - S(t))}{n}\right).$$
A 95% confidence interval for $S(t)$ is
$$\left(\hat S(t) - 1.96\sqrt{\frac{\hat S(t)(1 - \hat S(t))}{n}},\ \hat S(t) + 1.96\sqrt{\frac{\hat S(t)(1 - \hat S(t))}{n}}\right).$$

Remarks.
• If $n$ is small ($n < 20$), it is more appropriate to find confidence intervals using the binomial distribution tables; see Mood, Graybill and Boes, Chapter 8.
• If $n$ is large ($n \ge 30$), use the normal approximation to derive confidence intervals.
• The normal approximation works better when $0 \ll S(t) \ll 1$, that is, when $S(t)$ is not close to 0 or 1. When $S(t)$ is close to 0 or 1, the Poisson approximation technique is better.

2.2 Right Censored Failure Times: Parametric Models

We consider only random censoring. The observed data could be right censored:
$$(y_1, \delta_1), (y_2, \delta_2), \ldots, (y_n, \delta_n).$$
Note that
$$y_i = \min(t_i, c_i) = \begin{cases} t_i & \text{uncensored case} \\ c_i & \text{censored case} \end{cases} \qquad \delta_i = I(t_i \le c_i) = \begin{cases} 1 & \text{uncensored case} \\ 0 & \text{censored case} \end{cases}$$
where $t_i$ is the failure time and $c_i$ is the censoring time. Assume $T_i$ and $C_i$ are independent. In this case, the censoring process is said to be uninformative (that is, independent censoring).

Let $S(t; \theta) = \Pr(T_i > t)$, $G(c) = \Pr(C_i > c)$, and let $f(t; \theta)$ and $g(c)$ be the corresponding density functions. The likelihood function on the basis of $(y_1, \delta_1), \ldots, (y_n, \delta_n)$ is
$$L = \prod_{i=1}^n \left[f(y_i; \theta)\,G(y_i)\right]^{\delta_i} \left[S(y_i; \theta)\,g(y_i)\right]^{1 - \delta_i},$$
or simply
$$L \propto \prod_{i=1}^n f(y_i; \theta)^{\delta_i}\, S(y_i; \theta)^{1 - \delta_i}. \qquad (*)$$
Note that the validity of $(*)$ relies on the independence between the failure and censoring times. If $T_i$ and $C_i$ are not independent, we then have informative censoring, since the value of $C_i$ could have implications for the value of $T_i$.

2.3 Right Censored Failure Times: Nonparametric Models

Without a parametric assumption on the distribution of $T_i$, how do we estimate the survival function $S(t)$? First consider a simple example.

Example. A prospective study recruited 100 patients in January 1990 and recruited 1000 patients in January 1991. The study ended in January 1992. Survival time $T$ = time from treatment enrollment to death. Suppose 70 patients died in year 1 and 15 patients died in year 2 from the first cohort (recruited in '90), and 750 patients died in
year 1 from the second cohort. Note that $T$ is a discrete failure time, $T = 1, 2, \ldots$; say, $T = 2$ means death during the 2nd year. Assume the two cohorts are sampled from the same target population. When censoring is considered random, note that this assumption implicitly implies uninformative censoring (why?).

How to estimate the 2-year survival rate $S(2)$?

Approach 1: Reduced-sample estimate. Only use information from individuals who have been followed for at least two years. That is, use only group 1 data to derive
$$\hat S(2) = \frac{100 - 70 - 15}{100} = \frac{15}{100}.$$
This estimate is statistically appropriate but inefficient. It is appropriate in the sense that $\hat S(2)$ is very close to $S(2)$ when $n_1$ is large. It is inefficient because only part of the data is used. Here
$$\widehat{\text{var}}(\hat S(2)) = \frac{(0.15)(1 - 0.15)}{100}.$$

Approach 2: Statistically inappropriate approaches.
- Assume the 250 individuals from group 2 died in year 2:
$$\hat S(2) = \frac{15}{1100} = 0.014.$$
- Assume the 250 individuals from group 2 remained alive in year 2:
$$\hat S(2) = \frac{15 + 250}{1100} = 0.241.$$
- Exclude the 250 patients from the analyzed data. Watch out! A common mistake:
$$\hat S(2) = \frac{15}{1100 - 250} = 0.018.$$

Approach 3: A simple case of the Kaplan-Meier estimate. Decompose the survival function into conditional probabilities:
$$S(2) = P(T > 2) = \frac{P(T \ge 2)}{P(T \ge 1)} \cdot \frac{P(T \ge 3)}{P(T \ge 2)} = \Pr(T \ge 2 \mid T \ge 1)\,\Pr(T \ge 3 \mid T \ge 2),$$
$$\widehat{\Pr}(T \ge 2 \mid T \ge 1) = \frac{30 + 250}{1100} = \frac{280}{1100}, \qquad \widehat{\Pr}(T \ge 3 \mid T \ge 2) = \frac{15}{30}.$$
Thus
$$\hat S(2) = \frac{280}{1100} \cdot \frac{15}{30} = 0.127.$$
This estimator is more efficient than the reduced-sample estimate.

Now consider the Kaplan-Meier estimator in its general form.

Kaplan-Meier Estimator

The Kaplan-Meier estimator (1958, JASA) is a nonparametric estimator for the survival function $S$.

Consider now either random censoring or type I censoring. Assume uninformative censoring. That is, assume that $T_i$ is independent of $C_i$ for each $i$. The data are
$$(y_1, \delta_1), (y_2, \delta_2), \ldots, (y_n, \delta_n).$$
Let $y_{(1)} < y_{(2)} < \cdots < y_{(k)}$, $k \le n$, be the distinct, uncensored and ordered failure times.

Example. Data: 3, 2+, 0, 1, 5+, 3, 5. Then $y_{(1)}, y_{(2)}, y_{(3)}, y_{(4)} = 0, 1, 3, 5$.

Suppose $y_{(i-1)} \le t < y_{(i)}$. A principle of nonparametric estimation of $S$ is to assign positive probability to, and only to, uncensored failure times.
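Returning to the two-cohort example earlier in this section, the arithmetic behind Approach 1 and Approach 3 can be reproduced in a few lines. This is only an illustrative sketch; the variable names are hypothetical and the counts are the ones quoted in the example.

```python
# Counts taken from the two-cohort example in the notes.
n1, d1_year1, d1_year2 = 100, 70, 15   # cohort 1: followed for two years
n2, d2_year1 = 1000, 750               # cohort 2: followed for one year only

# Approach 1: reduced-sample estimate (cohort 1 only) and its binomial variance.
reduced = (n1 - d1_year1 - d1_year2) / n1
var_reduced = reduced * (1 - reduced) / n1

# Approach 3: product of conditional survival probabilities.
at_risk_1 = n1 + n2                          # everyone is at risk in year 1
surv_1 = (n1 - d1_year1) + (n2 - d2_year1)   # 30 + 250 survive year 1
at_risk_2 = n1 - d1_year1                    # only cohort 1 is observed in year 2
surv_2 = at_risk_2 - d1_year2                # 15 survive year 2
km = (surv_1 / at_risk_1) * (surv_2 / at_risk_2)

print(f"reduced-sample S(2) = {reduced:.3f}, variance = {var_reduced:.6f}")
print(f"Kaplan-Meier   S(2) = {km:.3f}")
```

The second cohort contributes to the year-1 factor only, which is how the censored patients enter the estimate without any assumption about what happened to them in year 2.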
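Therefore, we try to estimate
$$S(t) = \frac{P(T \ge y_{(2)})}{P(T \ge y_{(1)})} \cdot \frac{P(T \ge y_{(3)})}{P(T \ge y_{(2)})} \cdots \frac{P(T \ge y_{(i)})}{P(T \ge y_{(i-1)})}.$$

How to estimate $S(t)$? Define
$R_{(j)} = \{y_k : y_k \ge y_{(j)}\}$, the risk set at $y_{(j)}$,
$d_{(j)}$ = # of failures at $y_{(j)}$,
$N_{(j)}$ = # of individuals at risk at $y_{(j)}$ = size of $R_{(j)}$.

Example. Using the previous example, 3, 2+, 0, 1, 5+, 3, 5:
$$N_{(1)} = 7, \quad N_{(2)} = 6, \quad N_{(3)} = 4, \quad N_{(4)} = 2,$$
$$d_{(1)} = 1, \quad d_{(2)} = 1, \quad d_{(3)} = 2, \quad d_{(4)} = 1.$$

Now estimate $\dfrac{P(T \ge y_{(j+1)})}{P(T \ge y_{(j)})}$ by
$$\frac{N_{(j)} - d_{(j)}}{N_{(j)}} = 1 - \frac{d_{(j)}}{N_{(j)}}, \quad j = 1, 2, \ldots, i - 1.$$
The Kaplan-Meier estimate is thus
$$\hat S(t) = \left(1 - \frac{d_{(1)}}{N_{(1)}}\right)\left(1 - \frac{d_{(2)}}{N_{(2)}}\right) \cdots \left(1 - \frac{d_{(i-1)}}{N_{(i-1)}}\right) = \prod_{y_{(j)} \le t} \left(1 - \frac{d_{(j)}}{N_{(j)}}\right).$$

Example. For the data 3, 2+, 0, 1, 5+, 3, 5, the estimate steps down at the uncensored times 0, 1, 3, 5: $\hat S(t) = 6/7$ for $0 \le t < 1$, $5/7$ for $1 \le t < 3$, $5/14$ for $3 \le t < 5$, and $5/28$ at $t = 5$.

Remark. In general, if the largest observed time is uncensored, the Kaplan-Meier estimate will reach the value 0 for $t \ge$ the largest observed time; if the largest observed time is censored, the Kaplan-Meier estimate will not go down to 0 and is unreliable for $t >$ the largest $y_i$. In this case we say that $\hat S(t)$ is undetermined for $t >$ the largest uncensored time.

Greenwood's formula

The next question is how to identify the variance of the Kaplan-Meier estimate. The idea is sketched for grouped data. First group the data using the uncensored times $y_{(1)} < y_{(2)} < \cdots < y_{(k)}$. For each risk set $R_{(j)} = \{y_i : y_i \ge y_{(j)}\}$, counting the number of failures is a binomial experiment. Thus
$$d_{(j)} \sim \text{Binomial}(N_{(j)}, \lambda_j),$$
where $\lambda_j$ is the hazard at $y_{(j)}$. Let $p_j = 1 - \lambda_j$. For $y_{(i-1)} \le t < y_{(i)}$,
$$\text{Var}(\log \hat S(t)) = \text{Var}\left(\log \prod_{j=1}^{i-1} \hat p_j\right) = \text{var}(\log \hat p_1 + \log \hat p_2 + \cdots + \log \hat p_{i-1}) = \sum_{j=1}^{i-1} \text{var}(\log \hat p_j).$$
The variances are additive because the risk sets at $y_{(1)}, y_{(2)}, \ldots, y_{(k)}$ are nested: $R_{(1)} \supseteq R_{(2)} \supseteq \cdots$. Thus, by statistical theory, we can treat $\log \hat p_1, \log \hat p_2, \ldots$ as uncorrelated terms.

Use the delta method: for a transformation $\phi$ of an estimate $\hat\theta$ we have
$$\text{var}(\phi(\hat\theta)) \approx [\phi'(\theta)]^2\, \text{var}(\hat\theta).$$
Thus
$$\text{var}(\log \hat p_j) \approx \left(\frac{1}{p_j}\right)^2 \text{var}(\hat p_j) = \frac{1}{p_j^2} \cdot \frac{p_j(1 - p_j)}{N_{(j)}} = \frac{1 - p_j}{p_j N_{(j)}},$$
$$\text{var}(\log \hat S(t)) \approx \sum_{y_{(j)} \le t} \frac{1 - p_j}{p_j N_{(j)}}.$$
Use the delta method again:
$$\text{var}(\hat S(t)) = \text{var}\left(\exp\{\log \hat S(t)\}\right) \approx S(t)^2\, \text{var}(\log \hat S(t)).$$
Plug in $\hat S(t)$, $\hat\lambda_j = d_{(j)}/N_{(j)}$ and $\hat p_j = (N_{(j)} - d_{(j)})/N_{(j)}$. Greenwood's formula for estimating the variance of the Kaplan-Meier estimate is
$$\widehat{\text{var}}(\hat S(t)) = \hat S(t)^2 \sum_{y_{(j)} \le t} \frac{d_{(j)}}{N_{(j)}(N_{(j)} - d_{(j)})}.$$

Property. When $n$ is large, $\hat S(t) \overset{\text{approx}}{\sim} \text{Normal}(S(t), \sigma_t^2)$, where $\sigma_t^2$ can be estimated by Greenwood's formula.

Remark 1. This general property holds also for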
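continuous survival data.

Remark 2. A more formal approach, which allows for theoretical developments for continuous survival data, is through the representation
$$S(t) = e^{-\int_0^t \lambda(v)\,dv} = e^{-\int_0^t \frac{dF_u(v)}{R(v)}},$$
where $F_u(v) = P(Y \le v, \Delta = 1)$ is the (sub)distribution function of uncensored $Y$ and $R(v) = P(Y \ge v)$ is the probability of being at risk at $v$. Let $\hat F_u$ and $\hat R$ be the corresponding empirical estimates. Then
$$\hat S_{KM}(t) \approx e^{-\int_0^t \frac{d\hat F_u(v)}{\hat R(v)}}.$$
Theoretical properties can be developed based on probability theory.

Nonparametric MLE

Kaplan and Meier showed that the K-M estimate is the unique nonparametric mle from the likelihood function
$$L \propto \prod_{i=1}^n f(y_i)^{\delta_i}\, S(y_i)^{1 - \delta_i},$$
where the likelihood maximization is subject to the class of probability distributions which assign probability to, and only to, uncensored failure times. To see that the Kaplan-Meier estimator is the unique mle, write the likelihood in terms of the hazards $\lambda_j$ at $y_{(j)}$:
$$\prod_{i=1}^n f(y_i)^{\delta_i}\, S(y_i)^{1 - \delta_i} = \prod_{i=1}^n \left[\lambda_{(i)} \prod_{y_{(j)} < y_i} (1 - \lambda_j)\right]^{\delta_i} \left[\prod_{y_{(j)} \le y_i} (1 - \lambda_j)\right]^{1 - \delta_i} = \prod_{j=1}^k \lambda_j^{d_{(j)}} (1 - \lambda_j)^{N_{(j)} - d_{(j)}},$$
where $\lambda_{(i)}$ denotes the hazard at the uncensored time $y_i$. Thus, the unique mle of $\lambda_j$ is $d_{(j)}/N_{(j)}$, and the Kaplan-Meier estimate is the unique mle.

Reference: Kaplan & Meier, JASA, 1958.

Remark. K-M used $S(t) = P(T \ge t)$ instead of $S(t) = P(T > t)$ for their MLE parameterization.

Example (Lee, p. 29). Forty-two patients with acute leukemia were randomized into a treatment group and a placebo group to assess the treatment effect in maintaining remission. $T$ = remission time.

• 6-MP (6-mercaptopurine) group ($n_1 = 21$): 6, 6, 6, 7, 10, 13, 16, 22, 23, 6+, 9+, 10+, 11+, 17+, 19+, 20+, 25+, 32+, 32+, 34+, 35+ months.
• Placebo group ($n_2 = 21$): 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23 months.

The empirical survival function from the placebo group is
$$\hat S(0) = 1, \quad \hat S(1) = \frac{19}{21}, \quad \hat S(2) = \frac{17}{21}, \quad \hat S(3) = \frac{16}{21}, \quad \hat S(4) = \frac{14}{21} = 0.67,$$
$$\widehat{\text{Var}}(\hat S(4)) = \frac{(0.67)(0.33)}{21} = (0.103)^2.$$
A 95% confidence interval at $t = 4$ is
$$(0.67 - 1.96 \times 0.103,\ 0.67 + 1.96 \times 0.103) = (0.47, 0.87).$$
Warning: the sample size $n_2 = 21$ may not be large enough for the normal approximation!

For the 6-MP group, use the K-M estimate to derive
$$\hat S(5) = 1, \quad \hat S(6) = 1 - \frac{3}{21}, \quad \hat S(7) = \left(1 - \frac{3}{21}\right)\left(1 - \frac{1}{17}\right),$$
$$\hat S(10) = \left(1 - \frac{3}{21}\right)\left(1 - \frac{1}{17}\right)\left(1 - \frac{1}{15}\right) = 0.753.$$
Apply Greenwood's formula to get
$$\widehat{\text{var}}(\hat S(10)) = 0.753^2 \left[\frac{3}{21 \times 18} + \frac{1}{17 \times 16} + \frac{1}{15 \times 14}\right] = 0.0093.$$
A 95% confidence interval for $S(10)$ is
$$\left(0.753 - 1.96\sqrt{0.0093},\ 0.753 + 1.96\sqrt{0.0093}\right) = (0.564, 0.942).$$
What about $\hat S(11)$ and $\widehat{\text{var}}(\hat S(11))$? They are the same as $\hat S(10)$ and $\widehat{\text{var}}(\hat S(10))$, because no failure occurs at $t = 11$.

Remark 1. The K-M estimate is a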
nonparametric method which can be applied to either discrete or continuous data. For a rigorous development of the statistical theory, see Kalbfleisch and Prentice (1980).

Remark 2. The accuracy of the K-M estimate and Greenwood's formula relies on a large sample of uncensored data. Make sure that you have at least, say, 20 or 30 uncensored failure times in your data set before using these methods.

Remark 3. Greenwood's formula is more appropriate when $0 \ll S(t) \ll 1$. Using Greenwood's formula, the confidence interval limits could be above 1 or below 0. In these cases, we usually replace these limit points by 1 or 0. For example, if a 95% confidence interval is $(0.845, 1.130)$, we will use $(0.845, 1)$ instead.

3 Proportional Hazards Model (PHM)

3.1 The Model

Now we move to regression analysis. Assume covariates are available on each individual:
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}).$$
The PHM assumes
$$\lambda(t; x_i) = \lambda_0(t)\, e^{\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}} = \lambda_0(t)\, e^{\beta x_i},$$
where $x_i$ is a $p \times 1$ vector of covariates and $\beta$ is a $1 \times p$ vector of parameters.

Interpretation of the model:
Hazard at $t$ for given $x_i$ = baseline hazard at $t$ $\times$ risk factor $e^{\beta x_i}$.

Characteristics of the model:
- The PHM is a model on the basis of the hazard function. Note: alternatively, you might be interested in the accelerated failure time model,
$$\log T_i = \beta x_i + \log T_{0i}, \quad T_{0i} \sim S_0 \iff T_i = T_{0i}\, e^{\beta x_i} \iff \text{a standard linear model}.$$
- The baseline hazard $\lambda_0(t)$ is left unspecified (nonparametric); thus the PHM is a semiparametric model, with $\lambda_0$ the nonparametric component and $\beta$ the parametric component.
- In most applications related to public health, the parameter $\beta$ is of primary interest and $\lambda_0(t)$ is of minor interest. However, estimation of $\lambda_0(t)$ is desirable when we wish to predict the hazard for an individual with covariates $x_i$.

3.2 PHM as Lehmann's Alternatives

The PHM can also be expressed as
$$S(t; x_i) = S_0(t)^{e^{\beta x_i}}.$$
Proof.
$$S(t; x_i) = e^{-\int_0^t \lambda(u; x_i)\,du} = e^{-\int_0^t \lambda_0(u)\, e^{\beta x_i}\,du} = \left[e^{-\int_0^t \lambda_0(u)\,du}\right]^{e^{\beta x_i}} = S_0(t)^{e^{\beta x_i}}.$$
We say that a class of distributions of the form $S(t) = S_0(t)^\gamma$, for some positive $\gamma$, is a family of Lehmann's alternatives. Clearly, the PHM implies that the distribution functions form a
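family of Lehmann's alternatives. The PHM is a very flexible model because of its semiparametric feature, but the validity of the model is not automatic and still needs to be confirmed.

Example. A two-sample case:
$$x_i = \begin{cases} 0 & \text{represents treatment A} \\ 1 & \text{represents treatment B.} \end{cases}$$
Under the PHM, $\lambda(t; x_i) = \lambda_0(t)\, e^{\beta x_i}$. That is, $\lambda(t; 0) = \lambda_0(t)$ and $\lambda(t; 1) = \lambda_0(t)\, e^\beta$. Using Lehmann's alternative expression we derive
$$S_1(t) = S_0(t)^{e^\beta}, \qquad \log S_1(t) = e^\beta \log S_0(t) = \text{constant} \times \log S_0(t).$$
For exploratory analysis to examine the validity of the PHM in the two-sample case, we can use the K-M estimates $\hat S_1$ and $\hat S_0$ to see if
$$\frac{\log \hat S_1(t)}{\log \hat S_0(t)} \approx \text{constant}.$$
The PHM is a valid model if this ratio remains constant over time.

3.3 Partial Likelihood Method

Assume independent censoring: conditional on $x_i$, $T_i$ and $C_i$ are independent. Assume the PHM
$$\lambda(t; x_i) = \lambda_0(t)\, e^{\beta_1 x_{i1} + \cdots + \beta_p x_{ip}} = \lambda_0(t)\, e^{\beta x_i}.$$
Data: $(y_1, \delta_1, x_1), \ldots, (y_n, \delta_n, x_n)$, where
$y_i$ = observed follow-up time,
$\delta_i$ = censoring indicator,
$x_i$ = covariates,
$H_i$ = data history up to $y_{(i)}$.

Assume failure times are not tied. The likelihood function is
$$L = \prod_i f(y_i; x_i)^{\delta_i}\, S(y_i; x_i)^{1 - \delta_i} \quad (\text{density function, survival function})$$
$$= \prod_i P(x_{(i)} \mid H_i, y_{(i)})\, P(H_i, y_{(i)}) = \left\{\prod_{\text{uncensored } i} \frac{e^{\beta x_{(i)}}}{\sum_{j \in R_{(i)}} e^{\beta x_j}}\right\} \times \{\text{something ignorable}\},$$
where $R_{(i)}$ = risk set at $y_{(i)}$ and $x_{(i)}$ = covariates corresponding to $y_{(i)}$. The first likelihood is called the partial likelihood. Cox (1972, JRSS-B; 1975, Biometrika) identified the above likelihood structure. Thus the partial likelihood method is also referred to as Cox's method.

The result is great. Why?
• The result is derived under an attractive model. The PHM has nice interpretations in terms of hazards, and it is semiparametric.
• The partial likelihood only involves $\beta$. It does not involve $\lambda_0(t)$, and thus computation of $\hat\beta$ is manageable and inferences can be developed.

How did Cox obtain the idea of partial likelihood? Assume no ties in the uncensored failure times. Let $L_p$ = the partial likelihood. Any "likelihood" must correspond to a probability or density of some kind. Note that
$$P(\text{individual } (i) \text{ fails at } y_{(i)} \mid \text{a failure occurring at } y_{(i)} \text{ and data history before } y_{(i)})$$
$$= P(x_{(i)} \text{ fails at } y_{(i)} \mid \text{a failure occurring at } y_{(i)} \text{ and } R_{(i)}) = \frac{\lambda_0(y_{(i)})\, e^{\beta x_{(i)}}}{\sum_{j \in R_{(i)}} \lambda_0(y_{(i)})\, e^{\beta x_j}} = \frac{e^{\beta x_{(i)}}}{\sum_{j \in R_{(i)}} e^{\beta x_j}}.$$
Thus, the "partial likelihood" is
$$L_p = \prod_{\text{uncensored } i} P(x_{(i)} \text{ fails at } y_{(i)} \mid \text{a failure occurring at } y_{(i)}, R_{(i)}) = \prod_{\text{uncensored } i} \frac{e^{\beta x_{(i)}}}{\sum_{j \in R_{(i)}} e^{\beta x_j}}.$$
Derive the maximum likelihood estimate by maximizing $L_p$ over possible values of $\beta$.

Example. Two-sample case.

No treatment: 7, 9+, 18
Treatment: 12, 19+

$$x_i = \begin{cases} 0 & \text{no treatment} \\ 1 & \text{treatment} \end{cases} \qquad \text{PHM: } \lambda(t; x_i) = \lambda_0(t)\, e^{\beta x_i}.$$
The partial likelihood is
$$L_p = \left[\frac{e^0}{e^0 + e^0 + e^0 + e^\beta + e^\beta}\right] \left[\frac{e^\beta}{e^0 + e^\beta + e^\beta}\right] \left[\frac{e^0}{e^0 + e^\beta}\right] = \left[\frac{1}{3 + 2e^\beta}\right] \left[\frac{e^\beta}{1 + 2e^\beta}\right] \left[\frac{1}{1 + e^\beta}\right].$$
Obtain the mle $\hat\beta$ by maximizing $L_p$.

3.4 Generalization to Time-Dependent Covariates

Sometimes part of the covariates could be time dependent. For example, the time-dependent covariates could be
- age at failure time $t$,
- dosage level at failure time $t$,
- accumulative dosage at failure time $t$,
- treatment status (off or on) at failure time $t$,
or a transformation of the above time-dependent measurements.

Time-dependent covariates for the $i$th individual are
$$x_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{ip}(t)).$$
We shall use the general notation $x_i(t)$ instead of $x_i$ even though some of the covariates are time independent. The PHM is now
$$\lambda(t; x_i(u), u \le t) = \lambda_0(t)\, e^{\beta x_i(t)}.$$
With time-dependent covariates, the previous partial likelihood argument still works, and the partial likelihood becomes
$$L_p = \prod_{\text{uncensored } y_{(i)}} \frac{e^{\beta x_{(i)}(y_{(i)})}}{\sum_{j \in R_{(i)}} e^{\beta x_j(y_{(i)})}}.$$

Example. Suppose $x_i(t) = (x_{i1}, x_{i2}(t), x_{i3}(t))$, where
$x_{i1}$ = 1 for treatment, 0 for no treatment,
$x_{i2}(t)$ = the $i$th individual's age at $t$,
$x_{i3}(t)$ = the $i$th individual's age$^2$ at $t$,
$T$ = time from entry to death.
Note that $x_{i2}(0)$ = baseline age of the $i$th patient. The partial likelihood is
$$L_p = \prod_{\text{uncensored } y_i} \frac{e^{\beta_1 x_{i1} + \beta_2 x_{i2}(y_i) + \beta_3 x_{i3}(y_i)}}{\sum_{j \in R_i} e^{\beta_1 x_{j1} + \beta_2 x_{j2}(y_i) + \beta_3 x_{j3}(y_i)}}.$$
Suppose the observed data are

Treatment:
ID            001   002
age at entry   10    12
$y_i$          12   19+

No treatment:
ID            003   004   005
age at entry    4     0    11
$y_i$           7    9+    18

Time-dependent age:
ID    t = 7   t = 12   t = 18
001     17      22
002     19      24       30
003     11
004      7
005     18      23       29

Time-dependent age$^2$:
ID    t = 7    t = 12    t = 18
001   $17^2$   $22^2$
002   $19^2$   $24^2$    $30^2$
003   $11^2$
004   $7^2$
005   $18^2$   $23^2$    $29^2$

Note: the computer needs the above "covariate process data" for time-dependent covariate analysis.

$$L_p = \left[\frac{e^{\beta_1 \cdot 0 + \beta_2 \cdot 11 + \beta_3 \cdot 11^2}}{e^{\beta_1 + 17\beta_2 + 17^2\beta_3} + e^{\beta_1 + 19\beta_2 + 19^2\beta_3} + e^{11\beta_2 + 11^2\beta_3} + e^{7\beta_2 + 7^2\beta_3} + e^{18\beta_2 + 18^2\beta_3}}\right]$$
$$\times \left[\frac{e^{\beta_1 + 22\beta_2 + 22^2\beta_3}}{e^{\beta_1 + 22\beta_2 + 22^2\beta_3} + e^{\beta_1 + 24\beta_2 + 24^2\beta_3} + e^{23\beta_2 + 23^2\beta_3}}\right] \times \left[\frac{e^{29\beta_2 + 29^2\beta_3}}{e^{\beta_1 + 30\beta_2 + 30^2\beta_3} + e^{29\beta_2 + 29^2\beta_3}}\right].$$

Remark. Using the baseline age $x_{i2}$ or the time-dependent age $x_{i2}(t) = x_{i2} + t$ as a linear term in the proportional hazards model would end up with the same partial likelihood estimate $\hat\beta$, because
$$\lambda_0(t)\, e^{\beta_1 x_{i1} + \beta_2 x_{i2}(t) + \beta_3 x_{i3}(t)} = \lambda_0(t)\, e^{\beta_1 x_{i1} + \beta_2 (x_{i2} + t) + \beta_3 x_{i3}(t)} = \lambda_0^*(t)\, e^{\beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}(t)},$$
where $\lambda_0^*(t) = \lambda_0(t)\, e^{\beta_2 t}$ is also a baseline hazard function.

Example. $T$ = time from onset of treatment to AIDS (definition before Jan 1993); $x_i(t)$ = CD4 count for the $i$th individual at time $t$;
$$\lambda(t; x_i(u), u \le t) = \lambda_0(t)\, e^{\beta x_i(t)}.$$
Relative hazard at $t$:
$$\text{R.H.} = \frac{\lambda(t; x_i(u), u \le t)}{\lambda(t; x_k(u), u \le t)} = \frac{\lambda_0(t)\, e^{\beta x_i(t)}}{\lambda_0(t)\, e^{\beta x_k(t)}} = e^{\beta(x_i(t) - x_k(t))}.$$
If $\beta = -0.01$, $x_i(t) = 250$ and $x_k(t) = 200$, then
$$\text{R.H.} = e^{-0.01 \times (250 - 200)} = e^{-0.5} \approx 0.6065.$$
Note that the R.H. is determined by the covariate information defined, theoretically, at $t$, although in applications we could use an earlier measurement (such as the treatment received one month ago) as the "current" one. So, be smart and flexible when a time-dependent covariate is used in the analysis.

3.5 Tied Survival Data

The partial likelihood methods so far do not handle tied survival data. When we analyze discrete or grouped survival data, the problem of how to analyze such data naturally arises. Consider the following simple PHM: $\lambda(t; x) = \lambda_0(t)\, e^{\beta x}$.

                $y_i$          ID        $x$
No treatment    7, 9+, 18      1, 2, 3    0
Treatment       18, 19+        4, 5       1

Recall that the partial likelihood construction is motivated by
$$P(x_{(i)} \text{ fails at } y_{(i)} \mid \text{a failure occurring at } y_{(i)}, R_{(i)}).$$
Now, at $y_{(2)} = 18$, the probability becomes
$$P(3 \text{ and } 4 \text{ fail at } 18 \mid \text{two failures at } 18, \text{ risk set at } 18 = \{3, 4, 5\})$$
$$= \frac{\lambda_0(18)e^{\beta x_3}\, \lambda_0(18)e^{\beta x_4}}{\lambda_0(18)e^{\beta x_3}\, \lambda_0(18)e^{\beta x_4} + \lambda_0(18)e^{\beta x_3}\, \lambda_0(18)e^{\beta x_5} + \lambda_0(18)e^{\beta x_4}\, \lambda_0(18)e^{\beta x_5}} = \frac{e^{\beta(0+1)}}{e^{\beta(0+1)} + e^{\beta(0+1)} + e^{\beta(1+1)}}.$$
The partial likelihood is
$$L_p = \left[\frac{e^{\beta \cdot 0}}{3e^{\beta \cdot 0} + 2e^{\beta \cdot 1}}\right] \left[\frac{e^{\beta(0+1)}}{2e^{\beta(0+1)} + e^{\beta(1+1)}}\right].$$
For the general data $(y_1, \delta_1, x_1), (y_2, \delta_2, x_2), \ldots, (y_n, \delta_n, x_n)$, the partial likelihood for tied survival data is
$$L_p = \prod_i \frac{e^{\beta \sum_{j \in D_i} x_j}}{\sum_{D_i^* \subseteq R_i} e^{\beta \sum_{j \in D_i^*} x_j}},$$
where $D_i$ is the set of deaths (or failures) occurring at $y_{(i)}$, and $D_i^*$ is a combination of deaths (or failures) from the risk set $R_i$ with the restriction $|D_i^*| = |D_i|$. Computation of the mle from $L_p$ for tied survival data is a big problem.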
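To make the sum over combinations concrete, here is a small sketch that evaluates the tied-data partial likelihood by brute-force enumeration for a scalar covariate. It is an illustration under stated assumptions rather than a recommended algorithm: the function name `tied_partial_likelihood` is hypothetical, and the enumeration cost grows combinatorially with the number of ties, which is exactly the computational difficulty noted above.

```python
import math
from itertools import combinations

def tied_partial_likelihood(times, events, x, beta):
    """Exact partial likelihood for tied data with a scalar covariate.

    times  : observed follow-up times y_i
    events : 1 if uncensored, 0 if censored
    x      : scalar covariate for each subject
    """
    lik = 1.0
    # Loop over the distinct uncensored failure times.
    for y in sorted({t for t, e in zip(times, events) if e == 1}):
        risk = [i for i, t in enumerate(times) if t >= y]             # risk set R_i
        dead = [i for i in risk if times[i] == y and events[i] == 1]  # failures D_i
        num = math.exp(beta * sum(x[i] for i in dead))
        # Sum over every subset of the risk set with the same size as D_i.
        den = sum(math.exp(beta * sum(x[i] for i in combo))
                  for combo in combinations(risk, len(dead)))
        lik *= num / den
    return lik

# Example from the text: no treatment (x = 0): 7, 9+, 18; treatment (x = 1): 18, 19+.
times = [7, 9, 18, 18, 19]
events = [1, 0, 1, 1, 0]
x = [0, 0, 0, 1, 1]

beta = 0.3  # arbitrary value, used only to compare against the closed form
closed_form = (1 / (3 + 2 * math.exp(beta))) * (
    math.exp(beta) / (2 * math.exp(beta) + math.exp(2 * beta)))
print(tied_partial_likelihood(times, events, x, beta), closed_form)
```

The two printed numbers should coincide, since the enumeration reproduces the two factors of $L_p$ written out for this example.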
Statisticians are still developing fast algorithms for calculation.
- If you have heavily tied survival data, check your computing packages to see if they handle such data.
- Some computing packages use Breslow's approach (Breslow, 1972, Biometrics) to handle problems with tied data. The results are reasonably accurate if you have a small proportion of ties. Here Breslow's approach refers to the following: each of a set of tied failure times is sequentially treated as though it occurred just before the others.

3.6 Discrete Survival Data

In the situation that the failure times are truly discrete, we may replace the proportional hazards model by the discrete logistic regression model
$$\frac{\lambda(t_k; x(u), u \le t_k)}{1 - \lambda(t_k; x(u), u \le t_k)} = \frac{\lambda_0(t_k)}{1 - \lambda_0(t_k)}\, e^{\beta x(t_k)},$$
where $t_k$, $k = 1, 2, \ldots, K$, are the discrete points of the failure time $T$. Equivalently, the logistic model can also be expressed as
$$\frac{\lambda(t_k; x(u), u \le t_k)}{1 - \lambda(t_k; x(u), u \le t_k)} = e^{\alpha_k + \beta x(t_k)}, \quad \text{with } e^{\alpha_k} = \frac{\lambda_0(t_k)}{1 - \lambda_0(t_k)}.$$
There are a number of approaches developed to estimate the parameter $\beta$; see Breslow and Day (Volume 1, 1980) for details.

3.7 Estimation of $\lambda_0(t)$

Breslow (1972, JRSS-B) gave a heuristic argument. He assumed $\lambda_0(t)$ to be constant between uncensored survival times. Let $\lambda_{(0)}, \lambda_{(1)}, \lambda_{(2)}, \ldots$ be constants, with
$$\lambda_0(t) = \begin{cases} \lambda_{(0)} & 0 \le t < y_{(1)} \\ \lambda_{(1)} & y_{(1)} \le t < y_{(2)} \\ \ \vdots \end{cases}$$
Say we are interested in $\lambda_{(2)}$. The people in the risk set at $y_{(2)}$ are in $R_{(2)}$. Since we know one person fails at $y_{(2)}$, for given $(y_{(2)}, R_{(2)})$,
$$1 = \sum_{j \in R_{(2)}} P(\text{the } j\text{th individual fails at } y_{(2)} \mid y_{(2)}, R_{(2)}) \approx \sum_{j \in R_{(2)}} (y_{(3)} - y_{(2)})\, \lambda_{(2)}\, e^{\beta x_j} = (y_{(3)} - y_{(2)})\, \lambda_{(2)} \sum_{j \in R_{(2)}} e^{\beta x_j}.$$
Thus, the hazard probability between $y_{(2)}$ and $y_{(3)}$ is
$$(y_{(3)} - y_{(2)})\, \lambda_{(2)} = \frac{1}{\sum_{j \in R_{(2)}} e^{\beta x_j}}.$$
Now use $\hat\beta$, the mle derived from the partial likelihood, to derive
$$(y_{(3)} - y_{(2)})\, \hat\lambda_{(2)} = \frac{1}{\sum_{j \in R_{(2)}} e^{\hat\beta x_j}}.$$
Now you may estimate an individual's hazard probability between $y_{(2)}$ and $y_{(3)}$ by
$$(y_{(3)} - y_{(2)}) \times (\text{hazard with } x_i \text{ in } [y_{(2)}, y_{(3)})) = (y_{(3)} - y_{(2)})\, \hat\lambda_{(2)}\, e^{\hat\beta x_i} = \frac{e^{\hat\beta x_i}}{\sum_{j \in R_{(2)}} e^{\hat\beta x_j}},$$
where $x_i$ is that individual's covariates. Similarly, you can also estimate an individual's hazard probability between $y_{(m)}$ and $y_{(m+1)}$ by
$$\frac{e^{\hat\beta x_i}}{\sum_{j \in R_{(m)}} e^{\hat\beta x_j}}.$$
If you are interested in the "cumulative hazard probability" within $[0, y_{(m+1)})$, you just add up the hazard probabilities:
$$\frac{e^{\hat\beta x_i}}{\sum_{j \in R_{(1)}} e^{\hat\beta x_j}} + \cdots + \frac{e^{\hat\beta x_i}}{\sum_{j \in R_{(m)}} e^{\hat\beta x_j}}.$$
Note: although the estimate of the cumulative hazard probability described above is statistically accurate when the sample size is large, Breslow's estimate of the hazard function can be greatly improved by smoothing techniques.

3.8 Goodness of Fit

Time-independent $x$ (material from Miller's book, p. 168-170)

Suppose we want to check the validity of the proportional hazards model. In the case that $x$ is one-dimensional, one goodness-of-fit approach is to partition the $x$-axis into $K$ intervals, compute a separate Kaplan-Meier estimate for each interval, and then apply the 2-sample goodness-of-fit procedures.

When the time-independent covariate $x$ is multi-dimensional, we consider the following approach. Define
$$\Lambda_i(t) = e^{\beta x_i} \int_0^t \lambda_0(u)\,du.$$
Because $\Lambda_i$ is monotonic in $t$,
$$P(\Lambda_i(T_i) > t) = P(T_i > \Lambda_i^{-1}(t)) = \exp\{-\Lambda_i(\Lambda_i^{-1}(t))\} = e^{-t}.$$
Thus, the random variable $\Lambda_i(T_i)$ follows the Exponential($\theta = 1$) distribution. Further, $(\Lambda_1(y_1), \delta_1), \ldots, (\Lambda_n(y_n), \delta_n)$ form a sample with censoring. Because $\Lambda_i(y_i)$ depends on $\beta$ and $\lambda_0(t)$, substitute the corresponding estimates and define
$$\hat\Lambda_i = \hat\Lambda_i(y_i) = e^{\hat\beta x_i} \int_0^{y_i} \hat\lambda_0(u)\,du.$$
Let $\hat S(t)$ be the Kaplan-Meier estimate based on $(\hat\Lambda_1, \delta_1), \ldots, (\hat\Lambda_n, \delta_n)$. Under the proportional hazards model, $-\log \hat S(t) \approx t$ is a linear function of $t$. To verify the validity of the proportional hazards model, check if
$$t \approx -\log \hat S(t)$$
is approximately satisfied.

Time-dependent $x(t)$

When the covariate $x(t)$ is time dependent, the above techniques no longer work for goodness of fit. There is a large literature regarding how to construct tests to verify the proportional hazards model assumptions. The so-called "Martingale residuals" are used as the fundamental statistics for constructing the tests. For continuous survival data, define a "residual" at $y_{(i)}$ as
$$x_{(i)}(y_{(i)}) - \frac{\sum_{k \in R_{(i)}} x_k(y_{(i)})\, e^{\beta x_k(y_{(i)})}}{\sum_{k \in R_{(i)}} e^{\beta x_k(y_{(i)})}} = x_{(i)}(y_{(i)}) - E[\text{covariate at } y_{(i)} \mid R_{(i)}].$$
Each residual term has 0 expectation. Thus, after replacing $\beta$ by $\hat\beta$, the corresponding residual plot should reflect this specific feature.

4 Two-Sample Testing

Goal of testing: determine if there is a difference between two groups. Some of the "traditional methods" are appropriate for complete failure times but
not applicable to censored data.

4.1 Complete Failure Times

Suppose there is no censoring and the data include $t_1, t_2, \dots, t_n$. We are interested in the $t$-year survival rate, $S(t)$, and observe
\[ \begin{array}{l|cc|c} & D & \bar D & \\ \hline \text{Treatment A} & a & & n_A \\ \text{Treatment B} & & & n_B \\ \hline & m_D & m_{\bar D} & n \end{array} \]
$D$: failing in $t$ years; $\bar D$: surviving beyond $t$ years; $p_A = P(D \mid A)$, $p_B = P(D \mid B)$.

Consider the following way to construct a $\chi^2$ test statistic from this table. Null hypothesis $H_0$: $p_A = p_B$ or, equivalently, $S_A(t) = S_B(t)$. Conditional on $n_A, n_B, m_D, m_{\bar D}$, the count "$a$" follows a hypergeometric distribution under $H_0$ with
\[ E_0(a) = \frac{n_A m_D}{n}, \qquad \mathrm{Var}_0(a) = \frac{n_A n_B m_D m_{\bar D}}{n^2 (n - 1)}. \]
Construct a test statistic
\[ T = \frac{(a - E_0(a))^2}{\mathrm{Var}_0(a)}; \]
when $n$ is large, $T \sim \chi^2_1$.

4.2 A Test for Right Censored Data

Suppose the $t$-year survival rate is of interest, $H_0$: $S_A(t) = S_B(t)$. Data could be censored before $t$. We use the K-M estimate to estimate $S_A(t)$ and $S_B(t)$, and construct a test statistic
\[ T = \frac{\hat S_A(t) - \hat S_B(t)}{\mathrm{SD}[\hat S_A(t) - \hat S_B(t)]} \approx N(0, 1). \]
Here $\mathrm{SD}[\hat S_A(t) - \hat S_B(t)]$ can be estimated by Greenwood's formula:
\[ \mathrm{Var}(\hat S_A(t) - \hat S_B(t)) = \mathrm{Var}(\hat S_A(t)) + \mathrm{Var}(\hat S_B(t)), \qquad \widehat{\mathrm{SD}}[\hat S_A(t) - \hat S_B(t)] = \sqrt{\widehat{\mathrm{Var}}(\hat S_A(t)) + \widehat{\mathrm{Var}}(\hat S_B(t))}, \]
where $\widehat{\mathrm{Var}}$ is derived by Greenwood's formula.

Disadvantage of the test: it only tests the survival difference at a specified time, $t$. It does not test the overall difference of two survival functions. See Pepe and Fleming (1989, Biometrics) for alternative approaches. Is it possible to propose "global" nonparametric tests for assessing difference in survival?

4.3 Logrank Test for Right Censored Data

Ideas:
1. Create a 2 × 2 table at each uncensored failure time.
2. The construction of each 2 × 2 table is based on the corresponding risk set.
3. Combine information from the tables.

The null hypothesis is $H_0$: $\lambda_A(t) = \lambda_B(t)$ (or $S_A(t) = S_B(t)$) for all $t$. Note: "for all $t$" might be replaced by "for observed $t$".

The general concept to construct a test statistic at an uncensored time $y$ is the following. At an uncensored time $y$ ($y = y_i$ for some $i$):
\[ \begin{array}{l|cc|c} & D & \bar D & \\ \hline \text{Treatment A} & d & & n_A \\ \text{Treatment B} & & & n_B \\ \hline & m_D & m_{\bar D} & N \end{array} \]
Here $N$ = # individuals in the risk set at $y$ from pooled data; $d$ = # failures at $y$ from group A; $m_D$ = # failures at $y$ from pooled data; $n_A$ = # individuals in the risk set at $y$
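from group A; $n_B$ = # individuals in the risk set at $y$ from group B; $m_{\bar D} = N - m_D$.

Use the following method to construct the test statistic: conditional on $n_A, n_B, m_D, m_{\bar D}$, the random number $d$ follows a hypergeometric distribution under $H_0$ with probability
\[ \frac{\binom{n_A}{d} \binom{n_B}{m_D - d}}{\binom{N}{m_D}}. \]
Under $H_0$,
\[ E_0(d) = \frac{n_A m_D}{N}, \qquad \mathrm{Var}_0(d) = \frac{n_A n_B m_D m_{\bar D}}{N^2 (N - 1)}, \qquad \frac{\sum_i (d_{(i)} - E_0(d_{(i)}))}{\sqrt{\sum_i \mathrm{Var}_0(d_{(i)})}} \approx N(0, 1) \quad (n \text{ large}). \]
For the calculation at uncensored times $y_{(1)} < y_{(2)} < \dots < y_{(K)}$,
\[ Z = \frac{\sum_{i=1}^K (d_{(i)} - E_0(d_{(i)}))}{\sqrt{\sum_{i=1}^K \dfrac{n_{A(i)} n_{B(i)} m_{D(i)} m_{\bar D(i)}}{N_{(i)}^2 (N_{(i)} - 1)}}} \approx N(0, 1). \]

When do we reject $H_0$? The null hypothesis is $H_0$: $\lambda_A(t) = \lambda_B(t)$ for all $t$. Consider three different kinds of alternatives:

A1. $H_1$: $\lambda_A \ne \lambda_B$ (no prior knowledge)
A2. $H_1$: $\lambda_A < \lambda_B$ (treatment A is better)
A3. $H_1$: $\lambda_A > \lambda_B$ (treatment B is better)

Usually the significance level of a test is set to 0.05.

For A1, use
\[ Z^2 = \frac{\big[\sum_i (d_{(i)} - E_0(d_{(i)}))\big]^2}{\sum_i \mathrm{Var}_0(d_{(i)})} \sim \chi^2_1 \quad (n \text{ large}). \]
Reject $H_0$ when $z^2 > 3.84$ (equivalently $|z| > 1.96$). P-value = probability for values larger than $z^2$.

For A2: when $H_1$ is true, $Z$ is likely to be negative, so reject $H_0$ when $z$ is small, that is, $z < -1.645$. P-value = probability for values smaller than $z$.

For A3: when $H_1$ is true, $Z$ is likely to be positive, so reject $H_0$ when $z$ is large, that is, $z > 1.645$. P-value = probability for values larger than $z$.

Example. Group A: 3, 5, 7, 9+, 18. Group B: 12, 19, 20, 20+, 33+. Uncensored times: 3, 5, 7, 12, 18, 19, 20. $H_0$: $\lambda_A(t) = \lambda_B(t)$. The 2 × 2 tables at the uncensored times give
\[ \begin{array}{cc|ccccc|cc} i & y_{(i)} & d_{(i)} & n_A & n_B & m_D & N & E_0(d_{(i)}) & \mathrm{Var}_0(d_{(i)}) \\ \hline 1 & 3 & 1 & 5 & 5 & 1 & 10 & 0.50 & \frac{5 \times 5 \times 1 \times 9}{10^2 \times 9} = 0.25 \\ 2 & 5 & 1 & 4 & 5 & 1 & 9 & 0.44 & 0.2469 \\ 3 & 7 & 1 & 3 & 5 & 1 & 8 & 0.38 & 0.2344 \\ 4 & 12 & 0 & 1 & 5 & 1 & 6 & 0.17 & 0.1389 \\ 5 & 18 & 1 & 1 & 4 & 1 & 5 & 0.20 & 0.1600 \\ 6 & 19 & 0 & 0 & 4 & 1 & 4 & 0 & 0 \\ 7 & 20 & 0 & 0 & 3 & 1 & 3 & 0 & 0 \end{array} \]
\[ \sum_i (d_{(i)} - E_0(d_{(i)})) = (1 - 0.5) + \dots + (0 - 0) = 2.31, \qquad \sum_i \mathrm{Var}_0(d_{(i)}) = 0.25 + \dots + 0 = 1.030, \qquad z = \frac{2.31}{\sqrt{1.030}} = 2.28. \]
Now, if $H_1$: $\lambda_A \ne \lambda_B$ (two-sided), $z^2 = 2.28^2 = 5.198 > 3.84$, p-value = 0.0226, reject $H_0$. If $H_1$: $\lambda_A > \lambda_B$ (one-sided), $z = 2.28 > 1.645$, p-value = 0.0113, reject $H_0$.

Warning: the sample size might be too small for the validity of the $\chi^2$ approximation.

4.4 Generalization of Log-Rank Test

After constructing a sequence of 2 × 2 tables at uncensored times, we consider the statistic
\[ T = \sum_{\text{uncensored } i} w_{(i)} \big(d_{(i)} - E_0(d_{(i)})\big), \]
where $w_{(i)}$ is the weight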
on the table at $y_{(i)}$. The variance of $T$ is $\sum_i w_{(i)}^2 \mathrm{Var}(d_{(i)})$. Define
\[ Z = \frac{\sum_i w_{(i)} \big(d_{(i)} - E_0(d_{(i)})\big)}{\sqrt{\sum_i w_{(i)}^2 \dfrac{n_{A(i)} n_{B(i)} m_{D(i)} m_{\bar D(i)}}{N_{(i)}^2 (N_{(i)} - 1)}}} \approx N(0, 1) \quad (n \text{ large}). \]
Three cases of interest:
(i) $w_{(i)} = 1$ for all $i$: $T$ = log-rank test
(ii) $w_{(i)} = N_{(i)}$: $T$ = Gehan's test (1965, Biometrika)
(iii) $w_{(i)} = \sqrt{N_{(i)}}$: $T$ = Tarone and Ware test

The tests of (ii) and (iii) are motivated by examining the risk set size and giving weights to tables according to the risk set sizes. In general the log-rank test is more efficient under the proportional hazards model, and (ii) and (iii) are more efficient under other classes of models. Reference: Tarone and Ware (Biometrika, 1977).

For example, if the underlying model is the PHM $\lambda_B(t) = \lambda_A(t) e^{\beta}$, with $H_0$: $\beta = 0$ (that is, $\lambda_A(t) = \lambda_B(t)$) against $H_1$: $\beta \ne 0$, or $H_1$: $\beta > 0$, or $H_1$: $\beta < 0$, the log-rank test is the most powerful test. Another example: if the relative hazard is large at earlier times, then Gehan's test might be more powerful than the log-rank test. When cross-over in hazards occurs, the weighted or unweighted log-rank tests would not be good choices in general.

Gehan's test is closely related to the Wilcoxon test. It can be regarded as a generalization of the Wilcoxon test.

4.5 Wilcoxon Test for Complete Data

Data from treatment A: $t_1, \dots, t_m \sim S_A$; treatment B: $z_1, \dots, z_n \sim S_B$. Here $t_1, \dots, t_m, z_1, \dots, z_n$ are failure times (uncensored). $H_0$: $S_A = S_B$.

The general idea is the following:
- Pool the data from treatments A and B.
- Rank the data.
- Calculate the sum of ranks from treatment A data.
- If the rank sum is large or small, then reject the null hypothesis.

Example ($m = 3$, $n = 2$). A: 3, 7, 2; B: 1, 4. Ordered data: 1, 2, 3, 4, 7. Ranks for 3, 7, 2 are 3, 5, 2. Rank sum is 3 + 5 + 2 = 10. Is 10 large or small? We will discuss it.

Order the pooled data and define $Y_i$ = rank of $t_i$, $i = 1, \dots, m$, and
\[ R = \sum_{i=1}^m Y_i. \]
Under $H_0$: $S_A = S_B$,
\[ E_0[R] = m \left( \frac{\text{1st rank} + \text{last rank}}{2} \right) = \frac{m(m + n + 1)}{2}, \qquad \mathrm{Var}_0(R) = \frac{mn(m + n + 1)}{12} \quad \text{(from permutation theory)}. \]
The test statistic is
\[ W = \frac{R - E_0(R)}{\sqrt{\mathrm{Var}_0(R)}}. \]
When $m, n$ are small: use small-sample tables. Reject $H_0$ when $W$ is far away from 0.
When $m, n$ are large: use the approximation result
\[ W = \frac{R - \frac{m(m + n + 1)}{2}}{\sqrt{\frac{mn(m + n + 1)}{12}}} \approx N(0, 1). \]
Reject $H_0$ when $W$ is very different from 0, that is, $R$ is very large or small.

To use the Wilcoxon
test, the usual underlying models we have in mind are likely to be:
- Location shift model: $f_A(t) = f_B(t + \theta)$
- Stochastic ordering model: $S_A(t) > S_B(t)$ or $S_A(t) < S_B(t)$
- Proportional hazards model: $\lambda_B(t) = \lambda_A(t) e^{\beta}$

4.6 Extension of Wilcoxon Test: Gehan's Test for Right Censored Data

For complete and continuous data, an alternative way to write the rank sum is
\[ R = \frac{1}{2} m(m + n + 1) + \frac{1}{2} U, \]
and $U$ is defined as
\[ U = \sum_{i=1}^m \sum_{j=1}^n U_{ij}, \qquad U_{ij} = \begin{cases} +1 & \text{if } t_i > z_j \\ -1 & \text{if } t_i < z_j \end{cases} \]
The statistic $U$ is also called the Mann-Whitney statistic. Reject $H_0$ if $U$ is away from 0. Gehan (Biometrika, 1965) modified $U_{ij}$ for right censored data.

To see the validity of this expression for $R$, consider the condition when we have the total separation
\[ t_{(1)} < t_{(2)} < \dots < t_{(m)} < z_{(1)} < \dots < z_{(n)}; \]
then $R = \frac{m(m + 1)}{2}$. For every interchange of a consecutive $(t, z)$ pair, $R$ is increased by 1, and the number of interchanges is
\[ \sum_{i=1}^m \sum_{j=1}^n \frac{1}{2} (U_{ij} + 1). \]
Thus
\[ R = \frac{m(m + 1)}{2} + \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^n (U_{ij} + 1) = \frac{m(m + n + 1)}{2} + \frac{U}{2}. \]
Now the data are A sample: $(y_1, \delta_1), \dots, (y_m, \delta_m)$; B sample: $(y'_1, \delta'_1), \dots, (y'_n, \delta'_n)$, where $\delta$ is the censoring indicator. Define
\[ U_{ij} = \begin{cases} +1 & \text{if } t_i > z_j \\ -1 & \text{if } t_i < z_j \\ 0 & \text{either } t_i = z_j \text{ or "don't know"} \end{cases} \]
Note that $t_i$ and $z_j$ may not be observable. The Gehan statistic is
\[ G = \sum_i \sum_j U_{ij} \approx N(0, \sigma^2). \]
Reject $H_0$ if $G$ is large or small.

Example. A: 3, 5, 7, 9+, 18 ($m = 5$); B: 12, 19, 20, 20+, 33+ ($n = 5$).
\[ \begin{array}{ll} i = 1: & \sum_{j=1}^5 U_{1j} = (-1) + (-1) + (-1) + (-1) + (-1) = -5 \\ i = 2: & \sum_{j=1}^5 U_{2j} = -5 \\ i = 3: & \sum_{j=1}^5 U_{3j} = -5 \\ i = 4: & \sum_{j=1}^5 U_{4j} = 0 \\ i = 5: & \sum_{j=1}^5 U_{5j} = 1 + (-1) + (-1) + (-1) + (-1) = -3 \end{array} \]
The Gehan statistic is $G = -5 - 5 - 5 + 0 - 3 = -18$.

To get the p-value, we need to estimate $\sigma^2$. Gehan provided a complicated formula (Biometrika, 1965). For your calculation, just use the "weighted" formula (ii) introduced earlier. Because
\[ G = -\sum_i N_{(i)} \big(d_{(i)} - E_0(d_{(i)})\big), \qquad E_0(d_{(i)}) = \frac{m_{D(i)} n_{A(i)}}{N_{(i)}}, \]
we may derive the variance of the Gehan statistic by the previous formula. To see the equivalence, note that
\[ G = \sum_{y_i \text{ censored}} \sum_{j \in R_i} U_{ij} + \sum_{y_i \text{ uncensored}} \sum_{j \in R_i} U_{ij} = \mathrm{I} + \mathrm{II}. \]
Clearly, I = 0. For II, if the failure at $y_{(i)}$ is from group A, then the score is $-\big(N_{(i)} - n_{A(i)} - (m_{D(i)} - d_{(i)})\big)$, where $m_{D(i)} - d_{(i)}$ is the number of failures at $y_{(i)}$ from B, and the score is $n_{A(i)} - d_{(i)}$ otherwise. Thus the total score evaluated at $y_{(i)}$ is
\[ -\Big[ d_{(i)} \big(N_{(i)} - n_{A(i)} - (m_{D(i)} - d_{(i)})\big) - (m_{D(i)} - d_{(i)})(n_{A(i)} - d_{(i)}) \Big] = -\big[ d_{(i)} N_{(i)} - m_{D(i)} n_{A(i)} \big]. \]
Thus
\[ G = -\sum_i \big[ d_{(i)} N_{(i)} - m_{D(i)} n_{A(i)} \big] = -\sum_i N_{(i)} \left( d_{(i)} - \frac{m_{D(i)} n_{A(i)}}{N_{(i)}} \right) \]
and
\[ \frac{G}{\sqrt{\sum_i N_{(i)}^2 \dfrac{n_{A(i)} n_{B(i)} m_{D(i)} m_{\bar D(i)}}{N_{(i)}^2 (N_{(i)} - 1)}}} \approx N(0, 1) \quad (n \text{ large}). \]

5 Truncation Models

Statistical techniques for truncated data have been integrated into survival analysis in the last two decades. Truncation is a sampling mechanism for observing incomplete data where a random variable is observable only if it falls in a certain region (the untruncated region). When the random variable of interest falls outside the region, the information about the variable is lost and therefore excluded from the data set. Truncated survival data typically arise in observational studies.

5.1 Left-Truncation and Length-Biased Sampling

When studying the natural history of a disease, an incident cohort is defined as a group of subjects whose initial events are randomly sampled from a pre-determined calendar time interval. The subjects are followed for detecting the occurrence of the failure event until loss to follow-up or end of study. The data collected from an incident cohort are the typical right censored data. The observed data include observations $(y, \delta)$'s, where $y = \min(t, c)$, $\delta = I(t \le c)$, and $t$ and $c$ are the failure and censoring times. When the failure times are long, the incident cohort design is inefficient for natural history studies because it usually takes a long follow-up time to observe enough failure events.

In contrast, a prevalent sampling design, which draws samples from a disease-prevalent population, is more focused and thus more practical in real studies. The prevalent sample is formed by subjects whose initial events had occurred but who have not experienced the failure event at the time of recruitment, $\tau$. The prevalent sampling can be described by one of the following two models.

(I) Define $T$ as the time from the disease incidence to the failure event for subjects who became diseased in a calendar time interval $(a, b)$, where $a < 0$. The variable $W$ is the time from the disease incidence to the potential recruitment time. The variable $W$ is called the left truncation time. Under the left truncation sampling, the probability density of the observed
$(w, t)$ is the population probability density of $(w, t)$ given $T \ge W$:
\[ p_S(w, t) = p(w, t \mid T \ge W). \]
Without further complication of censoring, the observations include $(w, t)$'s, where $w \le t$. Let $g$ and $f$ respectively be the marginal density functions of $W$ and $T$. Assume the time to failure, $T$, is independent of when the initiating event occurs; this implies that $T$ and $W$ are independent of each other, forming the non-informative truncation model.

(II) Assume the initial events occur over the calendar time as a nonstationary Poisson process with intensity $\lambda(u)$, $u \in (0, \tau)$, and the distribution of $T$ is independent of $u$, when the initial event occurs. Define the pdf
\[ \lambda_0(u) = \frac{\lambda(u)}{\int_0^{\tau} \lambda(v)\, dv} \]
as the normalized $\lambda(u)$ in $(0, \tau)$. Conditioning on the number of initial events occurring in $(0, \tau)$, the event times $u$'s are order statistics of iid random variables with pdf $\lambda_0$. Pick an event time $U$ randomly from the $u$'s and define $W = \tau - U$; then the pdf of $W$ is $g(w) = \lambda_0(\tau - w)$.

Example. Suppose a random sample of women with breast cancer (bc) are recruited for observation of survival. The failure time $T$ is defined as the time from onset of bc to death and $f$ is the probability density function of $T$. Suppose the time of recruitment $\tau$ is a fixed calendar time. Then $g$ can be interpreted as the rate of occurrence of bc over time.

5.2 Left-Truncation and Length-Biased Sampling

The joint density of the observed $(w, t)$ can then be expressed as
\[ p_S(w, t) = \frac{g(w) f(t) I(t \ge w)}{\int S(u) g(u)\, du}. \]
In the situation that $g$ is uniformly distributed, the observed $t$ follows the length-biased distribution. Length-biased sampling could arise in many epidemiological studies when survival data are collected from a disease population. In the breast cancer (bc) example, assume (i) the rate of occurrence of bc remains constant over time and (ii) the density function of the time from bc to death, $f$, is independent of when bc occurred. Conditions (i) and (ii) together are referred to as the equilibrium condition. The equilibrium condition typically holds for so-called "stable diseases". When the equilibrium condition is satisfied we
observe length-biased failure time, which has the density function
\[ \frac{t f(t)}{\mu}, \]
where $\mu = E[T]$ is the mean failure time. In general, treating length-biased data as the "usual data" would lead to biased analytical results because of the bias of the data. When length-biased data are encountered we should use bias-adjusted methods for analysis (see Wang, 1997, "length bias", Encyclop. of Biostat., and references therein). Although statistical methods can be formulated for length-biased observations, Assumption (i) is required for validating the length-biased model as well as the corresponding methods (Vardi, 1982, Annal. Stat.; Wang, 1996, Biometrika).

Let $I(u)$ represent the disease incidence (occurrence rate) at the calendar time $u$ and $S_u$ the survival function of $T$ for those patients whose disease was initiated at $u$. Then the disease prevalence rate at the calendar time $\tau$ can be obtained as
\[ P(\tau) = \int_{-\infty}^{\tau} S_u(\tau - u)\, I(u)\, du. \]
When the equilibrium condition is satisfied, the incidence rate is a constant $I_0$ and the survival function is independent of $u$, $S_u = S$, and
\[ P(\tau) = I_0 \int_{-\infty}^{\tau} S(\tau - u)\, du = I_0 \int_0^{\infty} S(u)\, du = I_0 \times \mu \]
is independent of $\tau$. Thus, let $P(\tau) = P_0$ and we derive $P_0 = I_0 \times \mu$:

Prevalence = Incidence × duration.

Length-biased data can be viewed as a special case of left truncated data since the conditional density of the observed $t$ given $w$ is
\[ \frac{f(t) I(t \ge w)}{S(w)}, \qquad (3) \]
which corresponds to the density function of a left truncated failure time. By viewing length-biased data as left truncated data, we next consider how to analyze left truncated data in a general setting. It is important to indicate that the validity of the truncated density in (3) depends only on Assumption (ii) and not on Assumption (i).

5.3 Left Truncated Data: Product-Limit Estimator

Suppose $n$ individuals are recruited into a prospective follow-up study by prevalent sampling. Suppose the observed data $(w_1, t_1), \dots, (w_n, t_n)$ are independent and identically distributed observations. Let $t_{(1)} < \dots < t_{(J)}$ be the distinct and ordered values of $t_1, \dots, t_n$. Define

$R_{(j)} = \{i : w_i \le t_{(j)} \le t_i\}$,
$d_{(j)}$ = Number of failures at $t_{(j)}$,
$N_{(j)}$ = Number of
individuals in $R_{(j)}$,
$\lambda_{(j)} = f(t_{(j)}) / S(t_{(j)}^{-})$.

Product-limit estimator. For $t_{(i-1)} \le t < t_{(i)}$, recall
\[ S(t) = P(T \ge t_{(i)}) = \frac{P(T \ge t_{(2)})}{P(T \ge t_{(1)})} \times \frac{P(T \ge t_{(3)})}{P(T \ge t_{(2)})} \times \dots \times \frac{P(T \ge t_{(i)})}{P(T \ge t_{(i-1)})} = (1 - \lambda_{(1)})(1 - \lambda_{(2)}) \cdots (1 - \lambda_{(i-1)}). \]
Estimate $\lambda_{(j)}$ by $d_{(j)} / N_{(j)}$, $j = 1, 2, \dots, i - 1$. The product-limit estimator is
\[ \hat S(t) = \left(1 - \frac{d_{(1)}}{N_{(1)}}\right) \left(1 - \frac{d_{(2)}}{N_{(2)}}\right) \cdots \left(1 - \frac{d_{(i-1)}}{N_{(i-1)}}\right). \]
Now estimate $F(t) = 1 - S(t)$; thus $\hat F(t) = 1 - \hat S(t)$.

Example. Data $(w_i, t_i)$: (4,5), (0,4), (5,7), (1,2), (2,8), (1,5).
\[ \begin{array}{l|ccccc} \text{failure times } t_{(j)} & 2 & 4 & 5 & 7 & 8 \\ \hline d_{(j)} & 1 & 1 & 2 & 1 & 1 \\ N_{(j)} & 4 & 4 & 4 & 2 & 1 \end{array} \]
$R_{(1)} = \{(0,4), (1,2), (2,8), (1,5)\}$, $R_{(2)} = \{(4,5), (0,4), (2,8), (1,5)\}$.

Note: Unlike right censored data, risk sets usually are NOT nested.

Example. Data $(w_i, t_i)$: (4,5), (0,1), (5,7), (1,2), (2,4), (1,5). failure times R1 1727 2747175 R2 4757 5777 33 577 The truncation product limit estimate is thus 51 1 01gt A CT ll H l eolH H l who ll who eolH CI A xi V l l A lt1gtlt1 gtlt1gt0

Note that the applicability of the product-limit estimator requires that the truncation time $w_i$ be observable, and such a requirement might not be met in some applications.

Remarks. For left truncated and right censored data,
- the modified Greenwood's formula still holds for the estimation of the asymptotic variance of the product-limit estimator (just use the revised risk sets);
- the modified partial likelihood method still holds for the estimation of $\beta$ in the proportional hazards model (just use the revised risk sets);
- the modified logrank tests still hold for testing the difference between two groups (just use the revised risk sets).

Essentially, censoring and truncation share some significant similarities in statistical analysis (especially the similarities in the risk set methods). Nevertheless, regardless of the similarities, there still exist significant dissimilarities (i.e., different statistical properties) that are not emphasized in this course. References include Woodroofe (1985, Ann. Statist.), Wang et al. (1986, Ann. Statist.), Tsai et al. (1987, Biometrika), Keiding and Gill (1988, Ann. Statist.), and Wang (1989, 1991, JASA).

5.4 Right Truncation

Suppose that a certain disease can be characterized by an initial event and a failure event. An example is the study of the natural history of Human Immunodeficiency Virus (HIV) and Acquired Immunodeficiency Syndrome
(AIDS), where the HIV infection is the initial event and the AIDS diagnosis is the failure event. Let $X$ denote the calendar time of the initial event and $T$ the time from the initial event to the failure event. Then an observation $(x, t)$ is observed only if $x + t \le \tau$, where $\tau$ is the closing date of data collection. This is an example of right truncation: the failure time $T$ is observed only when $T \le \tau - X$. Let $W = \tau - X$. Then $W$ is called the truncation time.

Product-Limit Estimator

Suppose the observed observations $(w_i, t_i)$, $t_i \le w_i$, $i = 1, \dots, n$, are independent and identically distributed. Let $t_{(1)} < \dots < t_{(J)}$ be the distinct and ordered values of $t_1, \dots, t_n$. A practical constraint in nonparametric estimation is that a nonparametric distribution estimator cannot estimate the distribution function beyond the largest observed time, $t_{(J)}$. Thus what can be estimated is the conditional distribution function $F(t) / F(t_{(J)})$ for $t \le t_{(J)}$. Define

$R_{(j)} = \{i : t_i \le t_{(j)} \le w_i\}$,
$d_{(j)}$ = Number of failures at $t_{(j)}$,
$N_{(j)}$ = Number of individuals in $R_{(j)}$,
$\lambda_{(j)} = f(t_{(j)}) / F(t_{(j)})$.

For $t \le t_{(J)}$ the product-limit estimator is