Empirical Microeconomics ECO 721
These 50 pages of class notes were uploaded by Reba Terry on Sunday, October 25, 2015. The notes belong to ECO 721 at the University of North Carolina at Greensboro, taught by David Ribar in the fall.
Date Created: 10/25/15
Fixed and Random Effects Models

A. Introduction
1. consider a model of the form $y_{it} = x_{it}'\beta + u_{it}$, where $u_{it} = \alpha_i + \varepsilon_{it}$ for $i = 1, \dots, N$ and $t = 1, \dots, T$
   a. let $E(\alpha_i) = E(\varepsilon_{it}) = 0$, $Var(\alpha_i) = \sigma_\alpha^2$, $Var(\varepsilon_{it}) = \sigma_\varepsilon^2$, and $E(\alpha_i \varepsilon_{it}) = 0$
   b. the presence of $\alpha_i$ leads to serial correlation in the $u_{it}$: $E(u_{it} u_{is}) = \sigma_\alpha^2$ for $t \neq s$
   c. thus, failure to account for $\alpha_i$ leads, at a minimum, to incorrect standard errors and inefficient estimation
2. if $\alpha_i$ is correlated with $x_{it}$, failure to account for $\alpha_i$ leads to heterogeneity (omitted variables) bias in the estimate of $\beta$; to see, consider the following illustration
   [Figure: Heterogeneity Bias (vertical axis: y)]

B. Fixed Effects Model
1. least squares dummy variable model
   a. note that in the model above, we could rewrite the $\alpha_i$ terms as coefficients on a set of dummy variables indicating membership in cross-sectional unit i and estimate the model simply by including the appropriate dummy variables
   b. this approach is straightforward; however, for large N it may be impractical to specify so many dummy variables
2. mean-differenced model
   a. let $\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}$
   b. similarly define $\bar{x}_i$ as the vector of unit-i-specific means for the explanatory variables
   c. then the mean-differenced estimator is $\hat\beta_W = W_{xx}^{-1} W_{xy}$, with estimated variance $s^2 W_{xx}^{-1}$, where
      $W_{xx} = \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)'$ and $W_{xy} = \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i)$
   d. the mean square error in this model is $s^2 = (NT - N - K)^{-1} (W_{yy} - W_{xy}' W_{xx}^{-1} W_{xy})$, where $W_{yy} = \sum_{i=1}^{N} \sum_{t=1}^{T} (y_{it} - \bar{y}_i)^2$ and K is the number of columns in $x_{it}$
3. specification test
   a. null hypothesis is $\alpha_1 = \alpha_2 = \dots = \alpha_N$
   b. run OLS and fixed effects versions of the model; the test statistic is
      $F = \frac{(R^2_{FE} - R^2_{OLS}) / (N - 1)}{(1 - R^2_{FE}) / (NT - N - K)} \sim F(N - 1, NT - N - K)$
4. limitations of fixed effects approach
   a. cannot estimate effects of variables which vary across individuals but not over time
   b. blunderbuss approach to controlling for omitted variables: knocks out all cross-section variation in the dependent and independent variables
   c. cannot predict effects in levels outside of sample; prediction in levels requires prediction of the fixed effects
   d. use of fixed effects is inefficient if $\alpha_i$ is uncorrelated with $x_{it}$, i.e., if the appropriate model is random effects
   e. use of fixed effects can exacerbate biases from other types of specification problems, especially measurement error
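The mean-differenced (within) estimator from section B can be illustrated numerically. The sketch below is a minimal pure-Python example on made-up, noise-free data with a single regressor; the data, the true slope of 2, and the helper names `slope` and `within_transform` are assumptions chosen for illustration and are not part of the course materials. The unit effects are built to be correlated with x, so pooled OLS overstates the slope while the within estimator recovers it.

```python
def slope(xs, ys):
    """OLS slope from a one-regressor regression with an intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def within_transform(rows):
    """Subtract each unit's own mean from its observations, then stack."""
    stacked = []
    for values in rows:
        unit_mean = sum(values) / len(values)
        stacked.extend(v - unit_mean for v in values)
    return stacked


# Hypothetical balanced panel: N = 3 units, T = 4 periods, no noise.
beta = 2.0
alphas = [0.0, 10.0, 20.0]                                  # unit effects
x = [[a / 5 + t for t in range(4)] for a in alphas]         # x rises with alpha
y = [[beta * xi + a for xi in row] for row, a in zip(x, alphas)]

pooled = slope([v for row in x for v in row], [v for row in y for v in row])
fe = slope(within_transform(x), within_transform(y))
print(round(pooled, 3), round(fe, 3))
```

In this constructed panel the pooled slope is well above 2 because x and the unit effects move together, while the within slope equals the true value. With real data the within estimate would also carry sampling error.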
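C. Random Effects Model
1. specification of model
   a. consider a slightly respecified version of the model, $y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}$
   b. in addition, assume that the $\alpha_i$ are unobserved random variables which follow a probability distribution known up to some finite set of parameters
   c. also assume $E(\alpha_i) = E(\varepsilon_{it}) = 0$, $Var(\alpha_i) = \sigma_\alpha^2$, $Var(\varepsilon_{it}) = \sigma_\varepsilon^2$, $E(\varepsilon_{it} \varepsilon_{js}) = 0$ for $i \neq j$ or $t \neq s$, $E(\alpha_i \varepsilon_{jt}) = 0$, and $E(\alpha_i \alpha_j) = 0$ for $i \neq j$
   d. can write the covariance matrix for the T observations on unit i as
      $\Omega = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\varepsilon^2 + \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \vdots & & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\varepsilon^2 + \sigma_\alpha^2 \end{bmatrix}$
   e. a generalized least squares procedure is possible if we transform the dependent and independent variables by $\Omega^{-1/2}$, which is proportional to $I - \frac{\theta}{T} \iota\iota'$, where $\iota$ is a T x 1 vector of ones and $\theta = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T \sigma_\alpha^2}}$; i.e., run a regression with $y_{it} - \theta \bar{y}_i$ and $x_{it} - \theta \bar{x}_i$ as the dependent and independent variables
   f. sometimes described as the quasi-differenced estimator
2. feasible generalized least squares (FGLS) estimation
   a. run fixed effects regression to obtain $\hat\sigma_\varepsilon^2 = s_W^2$
   b. use slope coefficients from any consistent regression (e.g., OLS) to form the unit-mean residuals $\bar{e}_i = \bar{y}_i - \bar{x}_i'\hat\beta$ and $s_{**}^2 = \frac{\sum_{i=1}^{N} \bar{e}_i^2}{N - K}$; then $\hat\sigma_\alpha^2 = s_{**}^2 - \hat\sigma_\varepsilon^2 / T$
   c. note that it is possible for this estimator to be negative
3. Breusch and Pagan (1980) specification test
   a. Lagrange multiplier test based on OLS residuals $e_{it}$
   b. test statistic is $LM = \frac{NT}{2(T - 1)} \left[ \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} e_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} e_{it}^2} - 1 \right]^2$

D. Fixed or Random Effects
1. key consideration is the orthogonality of $\alpha_i$
   a. if $\alpha_i$ is uncorrelated with the variables in $x_{it}$, then random effects is the appropriate estimator
   b. if $\alpha_i$ is correlated with the variables in $x_{it}$, then the fixed effects model is appropriate
2. can be examined using a Hausman-Wu test
   a. run both FE and RE models
   b. test statistic is $(\hat\beta_{FE} - \hat\beta_{RE})' [Var(\hat\beta_{FE}) - Var(\hat\beta_{RE})]^{-1} (\hat\beta_{FE} - \hat\beta_{RE})$

E. Two-way Fixed Effects Model
1. specification of model: $y_{it} = x_{it}'\beta + \alpha_i + \gamma_t + \varepsilon_{it}$; here the model includes both individual-specific effects $\alpha_i$ and period-specific effects $\gamma_t$
2. fixed effects estimator
   a. let $\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}$, $\bar{y}_t = N^{-1} \sum_{i=1}^{N} y_{it}$, $\bar{\bar{y}} = (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{it}$, and $y_{it}^* = y_{it} - \bar{y}_i - \bar{y}_t + \bar{\bar{y}}$
   b. similarly construct $x_{it}^*$, regress $y_{it}^*$ on $x_{it}^*$, and adjust for the appropriate degrees of freedom
   c. note that if either N or T is small, it may be easier to run one-way fixed effects and add dummy variables for the other set of effects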
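The quasi-differencing weight in the random effects model can be made concrete with a short calculation. The sketch below is a minimal pure-Python illustration; the function names `theta` and `quasi_demean` and the variance values are hypothetical choices, and a balanced panel is assumed. It shows the two limiting cases: a weight of 0 leaves the data untouched (pooled OLS), and a weight near 1 approaches the mean-differenced (fixed effects) transformation.

```python
import math


def theta(sigma2_eps, sigma2_alpha, periods):
    """Quasi-differencing weight: 1 - sigma_eps / sqrt(sigma_eps^2 + T * sigma_alpha^2)."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + periods * sigma2_alpha))


def quasi_demean(values, weight):
    """Subtract weight times the unit mean from each observation of one unit."""
    unit_mean = sum(values) / len(values)
    return [v - weight * unit_mean for v in values]


no_unit_effect = theta(1.0, 0.0, 5)     # sigma_alpha^2 = 0: no transformation
moderate = theta(1.0, 3.0, 5)           # 1 - sqrt(1 / 16)
large_unit_effect = theta(1.0, 1e6, 5)  # close to full mean differencing

transformed = quasi_demean([2.0, 4.0, 6.0], moderate)
print(no_unit_effect, moderate, round(large_unit_effect, 4), transformed)
```

In an FGLS application the variances passed to `theta` would be the estimates from step 2 of section C rather than assumed values.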
F. Stata code
1. one-way fixed effects models: the model specification is
      xtreg dependent_variable list_of_independent_variables, fe i(index_var)
   where index_var is a variable indicating membership in a group (either i for individual or t for time fixed effects); alternatively, the index could be set earlier in the program using an iis command
2. one-way random effects model: the model specification is
      xtreg dependent_variable list_of_independent_variables, re i(index_var)
3. specification tests
   a. xttest0 command (used after random effects specification) conducts the Breusch-Pagan specification test
   b. hausman command (used after random effects specification) conducts the Hausman specification test
      1) after the fixed effects regression, include the command est store fixed
      2) after the random effects regression, include the command hausman fixed

References
Greene, William. Econometric Analysis, 3rd Edition. Upper Saddle River, NJ: Prentice-Hall, 1997, chapter 14.

Homework
Using the program afdc redo from the class website, run OLS, random effects, and fixed effects regressions of the determinants of the natural log of maximum AFDC benefits (lafdc) using 1982 and later data from the Ribar-Wilhelm RESTAT paper. The explanatory variables for the model should be
   the log of the price of transferring income (lprice)
   the log of the average state income (linc)
   an index of liberal voting sentiments (xada)
   the percent of the state population that is black (xpcbl)
   the percent of the state population that is over age 65 (xage65)
   the percent of the state population that is under age 14 (xpc14un)
   the percent of the state population with a high school education (xpchs)
   the percent of the state population with a college education (xpcco)
Run and interpret the Breusch-Pagan and Hausman specification tests. Modify the program to add dummy variable controls for the years 1983-1992. Re-run each of the models and each of the tests. Re-interpret the results.

ECO 721 PSID DATA SET DOCUMENTATION, SPRING 2009

The class data come from the first 20 waves of the Panel Study of Income
Dynamics (PSID). The PSID is a longitudinal survey that interviewed people annually from 1968 to 1997 and has interviewed them biennially since then. In most years, the survey interviews one person per family and asks him or her about the family's economic, programmatic, and demographic circumstances. More information about the survey is available at http://psidonline.isr.umich.edu and at http://www.uncg.edu/bae/people/ribar/teaching/ECO725/notes/intro PSID.pdf

The class will be using data from the family files of the PSID. The data describe 13,171 families and contain up to 19 years of data for each family. There are a total of 114,221 family-year observations on the file. The file is sorted by INT_ID, a family-specific identifier that was internally created, and the YEAR of the observation. The extract omits data for 1973 because a key analysis variable was missing for that year. The data have also been augmented to include some price index information. The class will use these data in several lab exercises.

The extract contains a number of measures. The data dictionary for the measures is as follows.

Variable    Description
INT_ID      family identifier (1 - 13171)
YEAR        year of observation (1968 - 1987, excluding 1973)
TVAGE       age of the household head in years (15 - 99)
TVCURMAR    indicator for the household head being currently married (0, 1)
TVFEMALE    indicator for the household head being female (0, 1)
TVNKIDS     number of children in the family (0 - 13)
TVWHITE     indicator for the household head being white (0, 1)
TVBLACK     indicator for the household head being black (0, 1)
TVFDSTD     nominal measure of family food needs (Thrifty Food Plan)
TVFHOME     nominal annual expenditures on food consumed at home
            Note: data are measured differently in 1968 than in other years
TVFAWAY     nominal annual expenditures on food away from home
            Note: no data available for 1968 (set to zero)
TVFDSTMP    nominal annual value of food stamps
TVFAMINC    nominal annual total family money income
TVHDEMPS    categorical measure of household head's employment status
               1 - working or on temporary layoff
               2 - looking for work, unemployed
               3 - retired or disabled
               4 - keeping house
               5 - student
               6 - other
TVHDANNE    household head's annual earnings
TVHDHEAR    household head's hourly earnings (missing if head doesn't work)
TVHDEDUC    categorical measure of household head's education
               1 - completed 0-5 years of school
               2 - completed 6-8 years of school
               3 - completed some high school but did not graduate
               4 - completed high school only
               5 - completed high school and obtained some nonacademic training
               6 - completed some college but did not graduate
               7 - obtained a bachelor's degree
               8 - obtained an advanced degree
               9 - not applicable or don't know
TCPI        CPI-U for all goods (1 = 1982-84)
TCPIFH      CPI-U for food consumed at home (1 = 1982-84)
TCPIFA      CPI-U for food purchased away from home (1 = 1982-84)

Some Basics on Lifecycle Consumption
(Notes from Deaton 1992)

A. Introduction
1. Will examine consumption over time
2. A principal property of these models is that they allow people to transfer resources over time through borrowing and saving, which leads to consumption smoothing
3. To introduce these models, we will consider some simple cases
   a. starting with a two-period model with no uncertainty
   b. multiple-period certainty case with additive preferences
   c. multiple-period case with uncertainty

B. Simple two-period model with certainty
1. Consider a person who faces a choice of how much consumption to allocate across two periods
   a. preferences: the person's preferences are defined over period 1 consumption $C_1$ and period 2 consumption $C_2$ by the utility function $U(C_1, C_2)$, which is increasing in both of its arguments
   b. resources: the person has three resources that are received with certainty
      1) an initial endowment of wealth (inheritance) $A_1$ that is received at the start of period 1
      2) income or earnings that are received in each period, $y_1$ and $y_2$
      3) interest on savings (or debt) from period 1 that is received in period two; let the interest rate be $r_2$
   c. two-period budget constraint: consider how the person's budget evolves
      1) to simplify the model,
         let's ignore prices of consumption (equivalent to assuming that consumption is expressed as real expenditures and all other values are expressed in real terms)
      2) in the first period, the person begins with $A_1 + y_1$
      3) savings or debts at the end of the first period are the difference between first-period resources and consumption, so wealth at the end of the first period is $A_2 = A_1 + y_1 - C_1$
      4) assume there are no liquidity constraints; that is, assume there are no restrictions on savings or debts
      5) resources in the second period are $(1 + r_2) A_2 + y_2$
      6) assume that the person cannot leave debts at the end of period 2
      7) with these assumptions, the two-period lifetime budget constraint is
         $C_1 + \frac{C_2}{1 + r_2} = A_1 + y_1 + \frac{y_2}{1 + r_2}$
         a) left side of expression is the present discounted value of lifetime consumption; notice how $(1 + r_2)^{-1}$ serves as a price on period two consumption
         b) right side of expression is the present discounted value of lifetime resources
2. The person's allocation problem is to choose $C_1$ and $C_2$ to maximize utility subject to the lifetime budget constraint
   a. straightforward extension of the standard consumer choice problem; consider the optimal responses $C_1^*$ and $C_2^*$
   b. what happens if there is an increase in one of the income terms, say $y_1$?
      1) increases resources available
      2) will increase $C_1$ and increase $C_2$; extra resources are transferred from period 1 to period 2 through savings (standard income effect)
      3) note that the increases in each period's consumption are necessarily less than the increase in income; consumption is smoother than in the static model
   c. are things any different if there is an increase in $y_2$?
   d. now consider an increase in the interest rate $r_2$
      1) the price $(1 + r_2)^{-1}$ of period two consumption falls
         a) the increase in the interest rate effectively reduces the price of consumption in period two
         b) consumption in period two becomes less expensive relative to consumption in period one
         c) substitution effect
         d) however, because the substitution effect refers to consumption in different periods, we refer to it as an intertemporal substitution effect
         e) from this effect,
            period 2 consumption increases while period 1 consumption decreases
      2) the fall in the effective price also means that more consumption is available generally
         a) standard income effect
         b) leads to increases in consumption in both periods
      3) despite some slightly different terminology, both of these effects are standard effects found in other consumption models
      4) note, however, there is an additional effect
         a) in standard consumer choice models, income is fixed; in this model, however, full income (the complete right-side expression in the lifetime budget constraint) changes with the interest rate
         b) the decrease in $(1 + r_2)^{-1}$ means that the present discounted value of lifetime resources is lower
         c) thus, the income effect from the effective price decrease in consumption is offset by the decrease in the present discounted value of period 2 earnings
         d) Deaton refers to this as the human capital effect

C. Model with multiple periods and intertemporal separability
1. Instead of two periods, now consider a general model with a finite but arbitrary number of periods, T
   a. use t to denote the specific periods
   b. the person's preferences are defined over consumption in each of the periods, $U(C_1, C_2, \dots, C_T)$
   c. to simplify the analysis, assume that the interest rate is a constant r over time
   d. we will maintain all of our other assumptions
      1) interest rate, implicit prices, and preferences are known with certainty
      2) unrestricted borrowing and saving
      3) initial wealth is $A_1$ and terminal wealth is zero
   e. the lifetime budget constraint is then
      $\sum_{t=1}^{T} \frac{C_t}{(1 + r)^{t-1}} = A_1 + \sum_{t=1}^{T} \frac{y_t}{(1 + r)^{t-1}}$
   f. analysis of this model is similar to the two-period model but more complicated
      1) still a generalization of the consumer choice model
      2) income effects are now distributed across multiple periods; in general, income effects in any one period become very small
      3) intertemporal substitution effects are now also distributed across multiple periods
   g. consumption in any one period depends on conditions in ALL periods, e.g., the lifetime pattern of income; for empirical analyses this
      represents a huge data burden
   h. few predictions are possible because of the myriad possibilities for substitutability and complementarity among the goods from the unrestricted preference function
2. To put additional structure on the model, assume that
   a. preferences are strongly additively intertemporally separable such that $U = u_1(C_1) + u_2(C_2) + \dots + u_T(C_T)$
      1) each of these subutility functions has the properties of a regular static utility function, e.g., is increasing in consumption, has an associated indirect subutility function, etc.
      2) specification abstracts from habit formation and other time dependencies
   b. moreover, assume that each of the subutilities can be specified as $u_t(C_t) = (1 + \delta)^{1-t} u(C_t, Z_t)$
      1) $\delta$ is a subjective discount rate
      2) $Z_t$ represents characteristics that shift preferences, such as age, family size, health conditions, etc.
   c. let $\lambda$ be the Lagrange multiplier; we can rewrite the consumer's problem as choosing $C_1, C_2, \dots, C_T$ to maximize
      $\sum_{t=1}^{T} (1 + \delta)^{1-t} u(C_t, Z_t) + \lambda \left[ A_1 + \sum_{t=1}^{T} \frac{y_t - C_t}{(1 + r)^{t-1}} \right]$
   d. the first-order conditions for each period are simply
      $(1 + \delta)^{1-t} u'(C_t, Z_t) = \lambda (1 + r)^{1-t}$
      1) consumption decisions are made to keep the marginal utility of consumption equal to the discounted constant marginal value of wealth
      2) consumption in each period depends ONLY on
         a) characteristics from that period,
         b) the subjective and economic discount rates, and
         c) the constant marginal value of wealth
         changes in characteristics from other periods ALL enter through the marginal value of wealth $\lambda$
      3) refer to these as Frisch or $\lambda$-constant consumption demand functions
      4) this is an incredibly useful simplification
   e. if we suppose further that the subjective and economic discount rates are the same, $\delta = r$, the first-order condition simplifies further to $u'(C_t, Z_t) = \lambda$
      1) conditional on $Z_t$, people in this model behave to keep the marginal utility of consumption constant
      2) if Z were constant, consumption would also be constant across the life cycle regardless of the life-cycle path of income (complete consumption smoothing)
   f. more generally, if we assume that $\delta$ and r are close to each other
      and if we condition on $Z_t$, consumption evolves according to
      $C_{t+1} - C_t \approx -\frac{u'(C_t)}{u''(C_t)} (r - \delta)$
   g. if we further assume that preferences follow a CES specification, $u(C_t) = (1 - \rho)^{-1} C_t^{1-\rho}$ with $\rho > 0$, then percentage changes in consumption follow
      $\frac{C_{t+1} - C_t}{C_t} \approx \rho^{-1} (r - \delta)$

D. Uncertainty
1. The previous models help to illustrate consumption smoothing and life-cycle consumption patterns but rely on the unrealistic assumption that future incomes, interest rates, needs, and the like are known with certainty
   a. doesn't matter whether you solve the model at period 1, t, or T; the solutions are all the same
   b. no updating in the model
2. The standard way to incorporate uncertainty is to replace the lifetime utility function with an expected utility function
   a. maintain the additivity assumption
   b. rewrite the function from the current period t going forward
   c. function can be written $E_t \left[ \sum_{s=t}^{T} u_s(C_s, Z_s) \right]$
      1) here E is the expectations operator
      2) the subscript t indicates conditioning on $I_t$, which represents the information available at period t
3. The person's decision comes down to choosing a level of consumption this period, knowing
   a. that decision will affect subsequent outcomes through the level of wealth $A_{t+1}$ that is passed on
   b. the person will be making subsequent consumption decisions
4. Let $V_t(A_t)$ be the value of expected lifetime utility associated with the optimal choice of savings $S_t$
   a. we can rewrite $V_t(A_t)$ as $\max_{S_t} \{ u_t(A_t + y_t - S_t) + E_t V_{t+1}(A_{t+1}) \}$, where $A_{t+1} = (1 + r) S_t$
   b. for simplicity, we are abstracting from Z and other things that could enter V and u
5. The optimal level of consumption satisfies
   $u_t'(C_t) = (1 + r) E_t V_{t+1}'(A_{t+1}) = (1 + r) E_t u_{t+1}'(C_{t+1})$
6. If we make the additional assumptions that utility is the same over time and the subjective discount rate equals the interest rate, the optimal condition becomes $u'(C_t) = E_t [u'(C_{t+1})]$
7. Once again we get consumption smoothing
   a. this time in expectations
   b. smoothing involves the marginal utilities of consumption
   c. depending on risk preferences, expected consumption might also be smoothed; for some preference specifications this would imply $C_{t+1} = C_t + e_{t+1}$, where $e_{t+1}$ is a random error

References
Deaton, Angus. Understanding Consumption. Oxford: Clarendon Press, 1992.
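The certainty-case results in section C of the lifecycle notes can be checked with a short calculation. The sketch below is a minimal pure-Python illustration; the function name `consumption_path`, the income profile, and the parameter values are hypothetical. It assumes the CES form $u(C) = (1 - \rho)^{-1} C^{1-\rho}$ with no preference shifters, applies the exact first-order condition $C_{t+1} = C_t [(1 + r)/(1 + \delta)]^{1/\rho}$, and scales the path so that it exhausts lifetime resources.

```python
def consumption_path(a1, incomes, r, delta, rho):
    """Certainty-case consumption path under CES utility (illustrative only)."""
    periods = len(incomes)
    growth = ((1 + r) / (1 + delta)) ** (1 / rho)        # gross consumption growth
    discount = [(1 + r) ** -t for t in range(periods)]   # (1 + r)^-(t - 1) for t = 1..T
    resources = a1 + sum(d * y for d, y in zip(discount, incomes))
    first = resources / sum(d * growth ** t for t, d in enumerate(discount))
    return [first * growth ** t for t in range(periods)]


incomes = [10.0, 50.0, 80.0, 20.0]   # hypothetical hump-shaped income profile

# delta = r: consumption is flat even though income is not.
flat = consumption_path(0.0, incomes, r=0.05, delta=0.05, rho=2.0)

# r > delta: consumption grows at roughly (r - delta) / rho per period.
tilted = consumption_path(0.0, incomes, r=0.05, delta=0.03, rho=2.0)
growth_rate = tilted[1] / tilted[0] - 1

print([round(c, 2) for c in flat], round(growth_rate, 4))
```

The computed growth rate is close to $(r - \delta)/\rho = 0.01$, in line with the approximation for percentage changes in consumption.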
Survival Models

A. Introduction
1. Economists are interested in a variety of problems that require them to examine the determinants of the length of time, or the duration, that an individual spends in a particular state; examples of problems include
   a. Spells of unemployment or employment
   b. Spells of welfare receipt
   c. Spells of business operation
   d. Birth timing and spacing
2. Definitions of spells and events
   a. Assume that there are two or possibly more mutually exclusive states or conditions that a person could occupy; for instance, the person could be employed or unemployed
   b. A spell refers to the time that a person spends in one state or condition before transitioning to another
   c. All spells are characterized by start and ending times; we refer to the event of transitioning from the initial state to the other state (the event of leaving or ending a spell) as a spell exit
   d. We refer to the amount of time that the person spends in the spell as the duration
   e. There are several equivalent ways to examine spell behavior; specifically, we can examine the duration of spells, the timing of exits, or other characteristics
3. Duration distribution
   a. Assume that the spell duration for a person is a random variable T, which has a cumulative distribution $F(t) = Prob(T \leq t)$
      1) If T is a continuous random variable, then it also has an associated density function $f(t)$
      2) If T is a discrete random variable, then it has an associated probability function $p(t) = Prob(T = t)$
   b. We are interested in describing the distribution of T
4. Simple models
   a. Assume that we have data on completed spells for a sample of people; by complete, we mean that we know the start and ending times, and hence the Ts, for all the spells
   b. Suppose also that we have information on other time-invariant characteristics of the people, X
   c. We could examine the association of these characteristics with average spell lengths by estimating a regression of the form $E(T \mid X) = \alpha + \beta'X$
   d. We could also estimate other models to examine other parts of
      the distribution
      1) Median (least absolute deviations) regressions to examine conditional medians, or more generally, quantile regressions
      2) Discrete choice models to examine the conditional probabilities of spells lasting past given points
5. Some data issues
   a. Censoring
      1) Information about some spells may be incomplete
      2) When we collect data, some spells may already be in progress, and we might not know the start date; we refer to these as left-censored spells
      3) Also, some spells might not end by the end of our observation window, and people could drop out of our sample before an exit is observed; we refer to these as right-censored spells
      4) Spells can also be doubly censored
      5) In each case, we know that the spell is longer than the duration that we observe
      6) We cannot treat these observations as completed spells; we know that the true spell durations are longer, and treating the censored observations as if they were complete will systematically understate the actual distributions
      7) Dropping the observations is also problematic
         a) Long spells are more likely to be censored than short spells
         b) Dropping censored spells produces a sample that contains a disproportionate number of short spells
         c) Again, spell distributions would tend to be understated
      8) Event history procedures typically address censoring problems
         a) Most address right-censoring
         b) Procedures are also available to address left-censoring; these procedures, however, are more complicated and less frequently used (for an example, see Moffitt and Rendall 1995); usual practice is to either
            i) Drop ongoing (left-censored) spells, or
            ii) Use very restrictive models such as simple Markov models or approximate corrections (see, e.g., Ribar 2005)
   b. Time-varying covariates
      1) Observed characteristics for people may change over the course of their spells; for example, local employment conditions could improve, their health might worsen, etc.
      2) Simple models cannot incorporate time-varying covariates, but most event-history models can
   c. Duration dependence
      1) We often
         want to examine whether people become more or less likely to exit a spell the longer they stay in a spell
      2) If the probability of exit changes with the length of a spell, the spell is said to exhibit duration dependence
      3) Consider standard job search models
         a) Models in completely stationary environments predict that there will be no change in the probability of leaving unemployment over the course of a spell (no duration dependence)
         b) Models with finite time horizons, time-limited unemployment insurance benefits, and borrowing constraints predict that people will become less choosy over time and more likely to leave unemployment (positive duration dependence)
         c) Models with skills depreciation predict that people will become less capable of finding work as their unemployment spells progress (negative duration dependence)
      4) It is difficult to characterize duration dependence using simple models
      5) The problem is that duration dependence is a characteristic of the distribution function $F(t)$
         a) Many standard procedures, such as regression models, only look at one point in the distribution; an analysis of duration dependence requires an examination of multiple points
         b) Event history procedures typically use either convenient functional forms for $F(t)$ or other restrictions that make it easier to examine duration dependence

B. Nonparametric Methods for Describing Discrete-Time Data
1. Assume that the durations of spells are measured in discrete intervals such as hours, days, weeks, months, or years
   a. The random variable describing the spell length is discrete
      1) It takes on the values $1, 2, 3, \dots, t_{Max}$
      2) With associated probabilities $p_1, p_2, p_3, \dots, p_{t_{Max}}$
      3) And with a CDF $F(t) = p_1 + p_2 + \dots + p_t$
   b. Let $n_1$ denote the number of individuals who leave the sample in the first period, $n_2$ denote the number of individuals who leave in the second period, and so on; let N denote the total number of individuals
   c. If there is no censoring (all people leave the sample because of transitions), the probabilities can be estimated from the
      proportions of individuals observed to leave at each period, e.g., $\hat{p}_t = n_t / N$ for $t = 1, \dots, t_{Max}$
2. Kaplan-Meier approach extends this to consider censoring
3. Some additional notation is helpful
   a. Let $S(t) = 1 - F(t) = Prob(T > t)$ denote the survivor function; this is simply the probability that an individual transitions after (survives past) time t; this is useful for summarizing what we know about censored observations
   b. Let $\lambda(t) = p(t) / S(t - 1)$ be the hazard function; this is the probability that someone who survives up to t (survives past t - 1) transitions at exactly t
   c. Distinguish between people who leave in each period because they transition out, $h_t$, and people who leave because they are censored, $m_t$; using our previous notation, $n_t = h_t + m_t$
4. Estimates of the survivor function can be constructed
   $\hat{S}(t) = \frac{1}{N} \sum_{j=t+1}^{t_{Max}} n_j$
5. And estimates of the hazard function can be constructed
   $\hat\lambda(t) = \frac{h_t}{\sum_{j=t}^{t_{Max}} n_j}$
6. Example: consider the following data on transitions to premarital births using data from the NLSY79

   Age   Women at risk   Premarital births   KM hazard
   14    2035            10                  0.0049
   15    2017            -                   0.0144
   16    1962            -                   0.0280
   17    1864            -                   0.0370
   18    1707            -                   0.0387
   19    1500            -                   0.0533
   20    1277            -                   0.0423
   21    1088            -                   0.0294
   22    927             -                   0.0378
   23    767             -                   0.0274
   24    647             -                   0.0247
   25    524             -                   0.0153
   26    426             -                   0.0117
   27    363             7                   0.0193
   28    303             3                   0.0099
   29    259             1                   0.0039

   In this example, censoring occurs because some women stop being respondents in the sample (attrition) and because some women marry before giving birth

   [Figure: Kaplan-Meier Hazard (vertical axis from 0.01 to 0.06)]
7. Kaplan-Meier hazard functions are the primary way of conducting descriptive analyses of spell data
8. A survivor-based approach is available for characterizing continuous-time data

C. Parametric distributions for continuous-time spells
1. The survival function has a similar definition in continuous time: $S(t) = Prob(T \geq t) = 1 - F(t)$
2. The definition of the hazard rate differs
   $\lambda(t) = \lim_{\Delta \to 0} \frac{Prob(t \leq T < t + \Delta \mid T \geq t)}{\Delta} = \lim_{\Delta \to 0} \frac{F(t + \Delta) - F(t)}{\Delta \, S(t)} = \frac{f(t)}{S(t)} = \frac{f(t)}{1 - F(t)}$
3. Survival functions and hazard functions are clearly related to underlying density and distribution functions
4. In working with these functions, it is useful to note the following relationships
   a. Relationship
      between hazard and survival function: $\lambda(t) = -\frac{d \ln S(t)}{dt}$
   b. Relationship between density and hazard function: $f(t) = S(t) \lambda(t)$
   c. Integrated hazard function: $\Lambda(t) = \int_0^t \lambda(s) \, ds$
   d. Other relationships: $S(t) = e^{-\Lambda(t)}$ and $\Lambda(t) = -\ln S(t)$
5. Some common parametric distributions for continuous-time data

   Distribution   Hazard/density function                          Survival function
   Exponential    $\lambda(t) = \lambda$                           $S(t) = e^{-\lambda t}$
   Weibull        $\lambda(t) = \lambda p (\lambda t)^{p-1}$       $\ln S(t) = -(\lambda t)^p$
   Gompertz       $\ln \lambda(t) = \alpha + \beta t$              $\ln S(t) = -\frac{e^{\alpha}}{\beta} (e^{\beta t} - 1)$
   Log normal     $f(t) = \frac{p}{t} \phi(p \ln(\lambda t))$      $S(t) = \Phi(-p \ln(\lambda t))$

6. Maximum likelihood estimation
   a. likelihood function is
      $\ln L(\theta) = \sum_{uncensored} \ln f(t \mid \theta) + \sum_{censored} \ln [1 - F(t \mid \theta)] = \sum_{uncensored} \ln \lambda(t \mid \theta) + \sum_{all} \ln S(t \mid \theta)$
   b. essentially the same likelihood function as the censored regression model; similar data issues

D. Heterogeneity bias
1. So far, we have examined parametric and nonparametric means of describing the time pattern of exits from spells
2. We have assumed that the spells are drawn from a single distribution with a common probability function; we have not allowed for heterogeneity in the hazards for individuals
3. Presence of heterogeneity leads to bias in the estimates of duration dependence
4. Consider the following example
   a. Assume that there are two types of people: one type with exponential hazard $\lambda_1$ (leavers) and another with exponential hazard $\lambda_2$ (stayers), where $\lambda_2 < \lambda_1$
   b. Next, consider what happens to the average hazard as spells progress
   c. As spells progress, the leavers will exit faster than the stayers, and the sample of survivors will become disproportionately composed of stayers
   d. Without controls for heterogeneity, there will appear to be negative duration dependence, even though neither type's hazard changes with duration
   e. The failure to account for heterogeneity generally leads to bias; this is different from standard regression models, where only certain types of heterogeneity lead to bias
   f. The intuition here is that event history models fit the distribution of observed exit times
      1) unless you specify otherwise, the models assume that all of the heterogeneity in exits is associated with duration dependence
      2) the models conflate different sources of heterogeneity
      3) similar to issues that arise in other ML models
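The leavers-and-stayers example in section D can be worked through numerically. The sketch below is a minimal pure-Python illustration; the function name `average_hazard`, the 50/50 population split, and the hazard values 0.5 and 0.1 are hypothetical. Each type has a constant (exponential) hazard, so any decline in the average hazard among survivors comes purely from the changing mix of types.

```python
import math


def average_hazard(t, share_leavers, hazard_leavers, hazard_stayers):
    """Average hazard among survivors at duration t for a two-type population."""
    surviving_leavers = share_leavers * math.exp(-hazard_leavers * t)
    surviving_stayers = (1 - share_leavers) * math.exp(-hazard_stayers * t)
    weighted = hazard_leavers * surviving_leavers + hazard_stayers * surviving_stayers
    return weighted / (surviving_leavers + surviving_stayers)


durations = [0, 3, 6, 9, 12]
hazards = [average_hazard(t, 0.5, 0.5, 0.1) for t in durations]
for t, h in zip(durations, hazards):
    print(t, round(h, 3))
```

The average hazard starts at the population mean of the two hazards and falls toward the stayers' hazard, which a model with no heterogeneity controls would read as negative duration dependence.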
E. Multivariate models accounting for observed heterogeneity
1. General approaches for parametric models: technically, any parameter could be specified to be a function of observed variables; e.g., for the exponential model, could specify $\lambda = e^{\beta'X}$
2. Proportional hazard specifications
   a. A more common way that observed heterogeneity is incorporated is through a proportional hazards assumption
      1) let $\lambda(t, X) = \lambda_0(t) e^{\beta'X}$, where $\lambda_0(t)$ is a baseline hazard and $e^{\beta'X}$ is a proportional shifter
      2) in this case, the observed characteristics shift the entire hazard function up or down
   b. This greatly simplifies the calculation of the hazard but is very restrictive
      1) Restricts all individuals to have hazards with the same shapes
      2) Assumption should be tested
3. Discrete logistic hazard
   a. Consider a hazard evaluated at discrete intervals
   b. Assume that the hazard follows a log-odds specification such that
      $\lambda(t) = \frac{\exp(\beta'X_t)}{1 + \exp(\beta'X_t)}$
   c. Suppose that transition occurs at duration t; construct observations so that
      1) Observations for the first t - 1 periods have outcomes $Y_j = 0$, $j = 1, \dots, t - 1$
      2) Observation for the t-th period has outcome $Y_t = 1$
      3) Put another way, construct t observations for each person with a sequence of $Y_j$ values equal to 0, 0, 0, ..., 0, 1
   d. If the observation is right-censored at t, construct a sequence with all zeros
   e. Each outcome would also be accompanied by measures of the observed characteristics in each time period, $X_j$ for $j = 1, \dots, t$
   f. With these data, you could then run a standard logit model in which the $Y_j$ values are the dependent variable and the $X_j$ are the independent variables
   g. Specification is easy to implement and VERY flexible
      1) Easy to specify a flexible baseline hazard function
      2) Incorporates time-varying covariates
      3) Can be estimated with standard logit software once the data are transformed
      4) Workhorse specification for exploratory data work
   h. Besides the logistic distribution, discrete-time hazards are often fit with the complementary log-log distribution
      1) For this distribution, the hazard
         function is $\lambda(t) = 1 - \exp[-\exp(\beta'X_t)]$
      2) For this specification, you would set up the data in the same way as for the discrete-time logit model
4. Cox partial likelihood model
   a. One general weakness of parametric event history models is that they often restrict the shape of the hazard function; like other distributional misspecifications, this can lead to biased estimates
   b. The Cox partial likelihood approach is a semi-parametric approach that addresses this concern
      1) Combines a nonparametric specification of the baseline hazard with
      2) A proportional hazards assumption
   c. General description of approach when there are complete data
      1) Arrange the observations in the order that the transitions occur, from $t_1$ to $t_n$
      2) At any point in time $t_j$, the risk set for departures will be $R_j$, the observations j through n
      3) Given the sequence up to $t_j$ ($t_1, t_2, \dots, t_j$), that is, the ranking of transition times, the probability that observation j is the one that transitions at $t_j$ out of the risk set $R_j$ is
         $\frac{\lambda_0(t_j) \exp(\beta'X_j)}{\sum_{k \in R_j} \lambda_0(t_j) \exp(\beta'X_k)} = \frac{\exp(\beta'X_j)}{\sum_{k \in R_j} \exp(\beta'X_k)}$
   d. This expression does not depend on the baseline hazard
   e. The expression can be modified to account for right censoring and for discrete exit times (ties in the exit times)
   f. Estimates of $\beta$ are obtained conditional on an arbitrary baseline hazard function; note: uses a partial or conditional likelihood approach rather than a full likelihood approach
   g. Even more flexible versions, such as models with stratified baseline hazards (separate baseline hazards for different groups), are available
5. Stata commands for descriptive and multivariate event history analyses
   a. Start by creating an event history data set
      1) If there is just one record for each failure time (no time-varying covariates), use
            stset time_variable, failure(failure_variable)
         where time_variable gives the time to failure or transition and failure_variable is an indicator for whether a failure occurred (the alternative is that the observation is right censored)
      2) When multiple records are available for each spell (e.g., if there are time-varying covariates),
         add an id(id_variable) option to the stset command and use the time_variable to indicate the end of each interval
      3) Once this is done, Stata estimates models and calculates statistics in the context of this data arrangement (do not need to specify dependent variables)
   b. Kaplan-Meier estimates for descriptive analyses
      1) For survival functions, use the sts command
         - sts OR sts graph will create a graph with the KM survivor function
         - can produce list output by substituting list for graph
         - can perform conditional analyses by adding the by(conditioning_variable) option
      2) Can also produce smoothed hazard estimates by adding the hazard option
         sts, hazard
   c. Estimating parametric models
      1) Use the streg command
         streg list_of_variables, distribution(dist_type)
      2) Where the distributions include
         exp (exponential), weibull (Weibull), gamma (generalized Gamma), gompertz (Gompertz), lnormal (log normal), llog (log-logistic)
      3) Unless you specify otherwise, Stata will output exponentiated coefficients (interpreted as hazard ratios); to get regular untransformed coefficients, use the nohr option
   d. Estimating the Cox partial likelihood model
         stcox list_of_variables
   e. Estimating discrete-time logistic model
      1) Can use the pgmhaz8 module written by S. Jenkins
      2) Program estimates a discrete-time complementary log-log model
      3) Use ssc install pgmhaz8 to install the program
      4) Syntax is pgmhaz8 list_of_variables
      5) Reports results from a standard complementary log-log model and from a model that accounts for gamma-distributed unobserved heterogeneity (frailty)

F. Discrete Outcomes and Interval Outcomes

1. We often have data that are reported in countable units such as days, weeks, or months; an issue arises, however, with whether the transitions actually occur on these types of time scales
   a. Discrete outcomes actually occur at discrete intervals; examples include some types of programmatic outcomes which terminate at the end of a day or month; here the underlying event history process is discrete
   b. Interval measures are used when the exact date of a transition is unknown
      but a range of dates is known; here the underlying transition is continuous, but the measures are discrete
   c. We use different models for these two types of data

2. Discrete outcomes can be modeled using a discrete procedure such as the logistic or complementary log-log procedures described earlier

3. Interval outcomes should be modeled using a modified continuous procedure
   a. Suppose that a transition was known to occur on or after duration $t_j$ but before duration $t_k$ ($t_j < t_k$)
   b. The hazard would be
      $\lambda(t_j \le T < t_k) = \dfrac{F(t_k) - F(t_j)}{S(t_j)}$
   c. The aML software accommodates interval outcomes; Stata does not appear to address this issue (treats ending times as exact exit times)

4. Time-varying explanatory variables
   a. If explanatory measures change values at discrete points in time, we can use a variant of the interval estimator to accommodate them
   b. Essentially, this requires breaking any spell into discrete subperiods corresponding to times when the explanatory variables remain constant
   c. Stata can accommodate this using the multiple-record version of the stset command
   d. This approach does not work if the explanatory measures are continually changing (i.e., are a continuous function of duration); for continually changing measures, you would have to include the duration dependence in the specification of the hazard

G. Unobserved Heterogeneity

1. Suppose that the hazard rate depended on a set of observed variables $X$ and an unobserved variable $u$ such that $\lambda(t \mid u) = \lambda(t; X, \beta, u)$

2. In particular, assume that $u$ is a continuous random variable with a density function $g(u)$

3. If $u$ was observed, we could just include it as an explanatory variable; however, because $u$ is unobserved, we must condition on all possible realizations of $u$; this is equivalent to examining the expected value of $\lambda(t)$, or
      $E[\lambda(t)] = \int \lambda(t \mid u)\, g(u)\, du$

4. For some specifications of $g(u)$ and $\lambda(t \mid u)$, it is possible to derive a closed-form expression for this expected value
   a. The gamma distribution is one such distribution; it is especially convenient to work with
   b. The Stata streg procedure
      will estimate models with this type of unobserved heterogeneity by adding the frailty(gamma) option
   c. Jenkins' pgmhaz8 discrete-time complementary log-log procedure also estimates models with gamma-distributed unobserved heterogeneity

5. If $u$ is normally distributed, a closed-form expression is not possible; however, Gauss-Hermite quadrature can be used to give an accurate approximation
      $\int \lambda(t \mid u)\, g(u)\, du \approx \sum_{j=1}^{M} \lambda(t \mid \alpha_j)\, g(\alpha_j)\, w_j$
   where $\alpha_j$ is an abscissa, $w_j$ is a weight, and $M$ is the number of quadrature points (see earlier notes on numerical methods)
   a. Logit and complementary log-log models with normally-distributed heterogeneity can be estimated in Stata by using the longitudinal/panel procedures for these models (xtlogit and xtcloglog) with the random effects (re) options
   b. Logit and piecewise linear Gompertz models with normally distributed unobserved heterogeneity can also be estimated using the aML software package

6. An alternative is to assume that $u$ follows a discrete distribution with $M$ outcomes ($u_1, u_2, \ldots, u_M$) and $M$ probabilities ($\pi_1, \pi_2, \ldots, \pi_M$); then
      $E[\lambda(t)] = \sum_{j=1}^{M} \lambda(t \mid u_j)\, \pi_j$
   a. We refer to this specification as a finite mixture model (Heckman & Singer 1984)
   b. Some normalization is needed, such as $u_1 = 0$ and $\pi_1 = 1 - \pi_2 - \cdots - \pi_M$
   c. The specification is very flexible and can be viewed as an approximation to an arbitrary distribution
   d. Note the similarity between this expression and the Gauss-Hermite quadrature expression
      1) the Gauss-Hermite quadrature specification is clearly a special case
      2) comparison helps to show how the finite mixture approach generalizes the specification of the distribution

7. Stata software
   a. S. Jenkins has written a routine to estimate a complementary log-log model that incorporates a finite-mixture correction (hshaz)
   b. A more general set of routines for Stata is available through the gllamm package written by S. Rabe-Hesketh (see http://www.gllamm.org)
   c. The aML software package also will estimate models with finite mixture distributions

H. Markov Transition Models for Discrete-Time Processes

1. Notation and data
   a. Consider a discrete-time
      process (exits or transitions observed at discrete intervals)
   b. Let $Y_{i,t}$ be a dummy variable that indicates whether person $i$ occupies state 0 or state 1 at time period $t$
   c. In a Markov model, $Y_{i,t}$ depends on the past realizations $Y_{i,t-1}, Y_{i,t-2}, \ldots$
   d. In a first-order Markov model, $Y_{i,t}$ only depends on the immediate past realization $Y_{i,t-1}$
   e. We can consider the probabilities associated with making different types of transitions
      1) let $P_{jk}(t)$ be the probability of transitioning from state $j$ (= 0, 1) at time $t-1$ to state $k$ (= 0, 1) at time $t$
      2) possible transition probabilities for a first-order Markov model are $P_{01}(t)$, $P_{00}(t)$, $P_{10}(t)$, and $P_{11}(t)$, where $P_{00}(t) = 1 - P_{01}(t)$ and $P_{11}(t) = 1 - P_{10}(t)$
   f. Finally, consider a stationary first-order Markov model in which $P_{01}(t) = P_{01}$ and $P_{10}(t) = P_{10}$ for all $t$

2. It is straightforward to estimate stationary first-order Markov models using standard binary outcome methods
   a. Assume that the transition probabilities are functions of observable characteristics $X$
   b. Those functions could be consistent with
      1) Logistic distributions (logit models)
      2) Normal distributions (probit models)
      3) Extreme-value distributions (complementary log-log models)

3. Similarly, we could assume that the models are stationary conditional on the observed variables, which would allow us to include time-varying observable variables

4. Models are a restricted version of the discrete-time hazard models (models have no duration dependence)

5. Advantages
   a. Models are relatively easy to estimate, as they use standard software
   b. Require less data than standard hazard models (do not need the entire history of the process or the duration of the spell)
   c. Can use these models with left-censored data
   d. Models are commonly used when there are only a few periods of data (can be estimated using only two observations on the outcome variable)

6. Main disadvantage is the strong assumption on duration dependence

7. If the data are left-censored and the unobserved determinants of the outcomes are correlated, you need to account for initial conditions, i.e., selectivity associated with the
   first state that you are able to observe
   a. Formal approach is to calculate the probability of a given initial state given all of the possible earlier transitions; this can be cumbersome
   b. Informal approach is to estimate an approximate model for the initial condition (e.g., simple probit model), allowing for correlations between the unobserved determinants of the initial and subsequent outcomes

I. Repeated events

1. Single spells versus multiple spells
   a. Single spell models are appropriate for describing many types of outcomes, such as mortality, transitions to first marriages, initial onset of drinking or sexual activity, etc.
   b. However, many processes are described by repeated spells or repeated periods at risk for the same individual; examples include
      1) Unemployment or employment spells
      2) Welfare spells
      3) Activity spells (time use episodes)
   c. The primary statistical issue that arises with repeated spells is that they are not likely to be completely independent of one another, leading to concerns about
      1) Appropriate calculation of standard errors
      2) Controls for time-invariant unobserved characteristics

2. Standard errors
   a. At a minimum, we would like the calculation of the standard errors to reflect associations across the spells
   b. Standard errors for repeated spell models should therefore account for clustering (e.g., using the robust or cluster options in Stata)
   c. These corrections do not change the coefficient estimates, just the coefficient variance-covariance estimates

3. If the source of the correlation across spells is a time-invariant unobserved characteristic, we can use random effects corrections as in any other panel model
   a. The correction in the repeated spell case is similar to the general correction for unobserved heterogeneity
   b. The only difference is that the unobserved term $u$ in our earlier models applies to all of the observations for an individual instead of just a single spell
   c. For continuous hazard (streg) models in Stata, common random effects can be incorporated using the
      shared frailty option
      1) In streg, include the frailty option
      2) In addition, include the shared(group_id) option, where group_id is an identifier common to all of the repeated events for a person (or some other group)
   d. For discrete-time hazards with normally distributed random effects in Stata
      1) Use the xtlogit or xtcloglog commands with random effects (the re option)
      2) Specify the i() option to use a person-specific identifier instead of a spell-specific identifier
   e. Repeated discrete and continuous-time event history models with normal and finite mixture random effects can also be estimated in aML

J. Multiple outcomes

1. Description of problem
   a. In our analyses so far, we have assumed that there are only two states that a person can be in, with the person starting in state 0 and transitioning to state 1
   b. What happens, however, when there are several possible states that a person can transition to?
   c. Example from labor economics
      1) Consider someone who is initially unemployed
      2) The person can transition to employment
      3) But the person can also drop out of the labor force

2. Competing risk (multiple latent risk) framework
   a. Consider a person who starts a spell in state 0
   b. Over the course of that spell, assume that she is at risk of transitioning to one of several states
   c. The transition that occurs first is observed
   d. All of the other risks are latent (unobserved)

3. If the exit times are continuous
   a. There is no possibility of multiple transitions occurring at the same instant
   b. The competing risk hazard could be modeled by treating
      1) The transition into the state of interest as an exit event or failure
      2) Transitions into any other state as censoring events
      3) Model is the same as the standard hazard, just with a different definition of censoring

4. If the exit times are discrete
   a. There is some possibility of alternative transitions occurring in the same period
   b. Competing risk should be modeled using some type of unordered multinomial choice specification, such as multinomial logit
   c. If multinomial logit is used, the setup of the data
      is very similar to the standard approach
   d. Suppose that a transition to one of $K$ possible states occurs at duration $t$; construct observations so that
      1) Observations for the first $t-1$ periods have outcomes $Y_j = 0$, $j = 1, 2, \ldots, t-1$
      2) Observation for the $t$-th period has outcome $Y_t = k$, where $k = 1, 2, \ldots, K$ depending on the type of transition
   e. If the observation is right-censored at $t$, construct a sequence with all zeros
   f. Each outcome would also be accompanied by measures of the observed characteristics in each time period, $X_j$ for $j = 1, \ldots, t$
   g. With these data, you could then run a standard multinomial logit model in which the $Y_j$ values are the dependent variables and the $X_j$ are the independent variables

References

Allison, Paul. "Discrete-Time Methods for the Analysis of Event Histories." Sociological Methodology 13 (1982): 61-98.

Greene, William. Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall, 2003. Chapter 22.

Heckman, James J., and Burton Singer. "A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data." Econometrica 52 (March 1984): 271-320.

Lillard, Lee. "Simultaneous equations for hazards: Marriage duration and fertility timing." Journal of Econometrics 56 (1993): 189-217.

Moffitt, Robert A., and Michael S. Rendall. "Cohort Trends in the Lifetime Distribution of Female Family Headship in the United States, 1968-1985." Demography 32:3 (August 1995): 407-24.

Ribar, David C. "Transitions From Welfare and the Employment Prospects of Low-skill Workers." Southern Economic Journal 71:3 (January 2005): 514-33.

Exercises

1. From the extract of time-use (time apart) data from the Three-City Study, available at httpWWW nm o J bae e le ri39 ar teachin FC0721, create an event history spell file for the first spell apart. Restrict the file so that it only contains observations for caregivers and children who have at least one spell apart and focal children who are 11 years of age or younger. The resulting file should have 709 observations. Along with the variables necessary to describe the duration of the spell and the failure/exit
   status, create variables corresponding to the marorcoh, hsgeal, morthnhs, fcagen, black, hispanic, boston, and chicago variables that we have examined in previous programs. Use these data to estimate the following models:
   a. a continuous-time exponential hazard model of the association of marorcoh, hsgeal, morthnhs, fcagen, black, hispanic, boston, and chicago with exits from the first spell apart;
   b. a continuous-time Gompertz hazard model with the same explanatory variables; and
   c. a Cox partial likelihood hazard model with the same explanatory variables.
   Use the estimation results to:
   d. Discuss the assumptions regarding duration dependence in each of these models. What do the results indicate about these assumptions (which specifications can we reject)?
   e. Discuss how the estimated association between marriage/cohabitation status (marorcoh) differs across the models.

2. Download and run the Stata program dishaziwj tuao. The program uses the time-use data from the Three-City Data to estimate four discrete-time logit models of the duration of the first spell apart for the caregivers and focal children. Examine the program and results, and use these to answer the following questions.
   a. What assumptions does each of the models make regarding the shape or pattern of duration dependence? What do the estimation results indicate about the pattern of duration dependence?
   b. How does the estimated association between marriage/cohabitation status (marorcoh) differ across models?
   c. How do the results from these models compare with the results from the continuous-time models?

Extra credit (no help will be given by the instructor)

3. Use the Stata program dishaziwj tuao as a template. For the first spells apart for the focal children aged 11 and younger, estimate the following discrete-time hazard models:
   a. a complementary log-log model with general controls for duration dependence through the 14th hour apart, a single control for the 15th-19th hours apart, and marorcoh, hsgeal, morthnhs, fcagen, black, hispanic, boston, and chicago;
   b. the same model
      with gamma-distributed unobserved heterogeneity; and
   c. the same model with normally-distributed unobserved heterogeneity.
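
As a companion to the discrete-time exercises, the person-period construction described in section E.3 can be sketched in Python (the notes themselves use Stata). This is only a minimal sketch under stated assumptions: the function names, the tuple layout (id, duration, exited), and the four example spells are hypothetical and are not taken from the Three-City data or from the course programs. Each spell of duration t becomes t records with outcomes 0, 0, ..., 0, 1, or all zeros if the spell is right-censored.

```python
from collections import defaultdict


def expand_to_person_periods(spells):
    """Turn one record per spell into one record per period at risk.

    `spells` is a list of (person_id, duration, exited) tuples (a hypothetical
    layout): `duration` is the number of periods observed, and `exited` is True
    if a transition occurred in the last observed period (False means the
    spell is right-censored).
    """
    rows = []
    for person_id, duration, exited in spells:
        for period in range(1, duration + 1):
            # Y_j = 0 in every period except the one in which the exit occurs;
            # right-censored spells therefore get a sequence of all zeros.
            y = 1 if (exited and period == duration) else 0
            rows.append({"id": person_id, "period": period, "y": y})
    return rows


def empirical_hazard(rows):
    """Share of at-risk person-period records that exit in each period."""
    at_risk = defaultdict(int)
    exits = defaultdict(int)
    for row in rows:
        at_risk[row["period"]] += 1
        exits[row["period"]] += row["y"]
    return {t: exits[t] / at_risk[t] for t in sorted(at_risk)}


# Hypothetical example spells: (id, duration, exited)
spells = [(1, 3, True), (2, 2, False), (3, 1, True), (4, 3, False)]
rows = expand_to_person_periods(spells)
hazard = empirical_hazard(rows)
print(len(rows), hazard)
```

The expanded records are what a standard logit or complementary log-log routine would take as input, with `y` as the dependent variable. With no covariates and a separate dummy variable for every period, the fitted logit hazards reproduce the empirical proportions computed by `empirical_hazard`, which makes this a convenient check on the data setup before adding explanatory variables.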