ADV APPL LIN MODELS (BIOST 570)
These 32 pages of class notes were uploaded by Ramona Leannon on September 9, 2015, for BIOST 570 at the University of Washington (Fall, taught by Staff).
More smoothing
Thomas Lumley
BIOST 570   2005-11-7

Lowess

Lowess (Cleveland, JASA 1979) is a local linear regression smoother with two additional refinements:

- The bandwidth is adjusted for each point x0 so that [x0 - h, x0 + h] always contains the same number of observations.
- The estimation is iterative: outliers are downweighted by (1 - (r/s)^2)^2, where s is 6 x median|r|.

lowess was originally designed as a scatterplot smoother for exploratory analysis, not for accurate estimation of E[Y | X = x].

Curse of dimensionality

Multidimensional local regression is easy in principle (define weights based on distance in d dimensions) but difficult in practice.

Consider data on [0,1]^d. A rectangular subset with 10% of the data has side 0.1 with d = 1, side 0.31 with d = 2, and 0.79 with d = 10. High-dimensional space is big.

Achieving a fixed bias and variance with a d-dimensional smoother requires exponentially increasing sample size as d increases, so smoothing is not feasible in more than two or three dimensions.

The curse of dimensionality implies that modelling high-dimensional data always requires some assumptions about parametric form or lack of interactions.

Local likelihood

Local regression can be extended to glms by fitting a generalized linear model at each x0 instead of a linear model. For a parametric generalized linear model, the local glms are maximising a local likelihood

    l(x0) = Σ_{i=1}^n w_i(x0) l_i

The local IWLS algorithm uses weights that are w_i(x0) times the working weights (dμ/dη)²/V(μ).

Again we can use cross-validation, based on leave-one-out estimates of the deviance or of the residual sum of squares from the last iteration of IWLS, to choose h.

Multiple variables

So far we have talked about local linear regression for smoothing a single variable. The same modelling principle can be applied to multivariable regression. For example, in modelling health effects of air pollution we might want a model

    log E[hospital admissions] = α(t) + β × PM10 + γ × temperature

where α(t) is a smooth function of date. Fitting this model by local linear
regression would make β and γ also depend on t. This is undesirable, at least for β. We need a way to have some parameters estimated globally rather than locally.

Multiple variables

The most general algorithm for this is called backfitting (Hastie & Tibshirani, Generalized Additive Models). We can isolate each parameter for estimation by using the partial residuals. Partial residuals for time are

    r_t = α(t) + (Y − μ) dη/dμ

and for the two linear variables are

    r_βγ = β × PM10 + γ × temperature + (Y − μ) dη/dμ

and the working weights are (dμ/dη)²/V(μ). We fit a local linear regression to r_t to update α(t), and a weighted linear regression to r_βγ to update β and γ.

Multiple variables

The backfitting algorithm also allows multiple univariate smooth functions. For example, we might want to replace the linear term in temperature by a local linear smooth term γ(temp). The backfitting algorithm would then have three steps:

- update α(t) by local linear regression of the partial residuals for time
- update β by weighted linear regression of the partial residuals for PM10
- update γ(temp) by local linear regression of the partial residuals for temperature

The backfitting algorithm converges reliably but slowly: the convergence is linear, rather than quadratic as for Newton-Raphson. This means that the tolerance for declaring convergence matters more. A major study of air pollution effects had to halve its estimates of the effect of fine particle pollution after they found that their models had not converged properly.

Example

In the air pollution data, look at the effect of air stagnation:

    E[log PM10] = α(t) + β × stagno

    library(KernSmooth)   ## for local polynomial regression
    loclinear <- function(x, y, h){
        ## locpoly does local polynomial regression at a grid of points
        a <- locpoly(x, y, degree = 1, bandwidth = h)
        ## interpolate to the whole data set
        approx(a$x, a$y, xout = x)$y
    }
    pm <- na.omit(pm)   ## 19 NAs for stagno
    alpha <- with(pm, loclinear(date, logpm10, h = 60))
    for(i in 1:30){
        model <- lm(I(logpm10 - alpha) ~ stagno, data = pm)
        alpha <- loclinear(pm$date, resid(model) + alpha, h = 60)
        alpha <- alpha - mean(alpha)
        print(coef(model))
    }

Example

Successive estimates from the backfitting iterations:

    (Intercept): 0.5473191,
1.71950392, 1.7025888, 1.6995985, 1.6990699, 1.6989764, 1.6989599, 1.6989570
    stagno: 0.0826203, 0.09722629, 0.0998084, 0.1002649, 0.1003456, 0.1003598, 0.1003624, 0.1003628

Example

[Figure: plot of the fitted smooth α(t) against date; vertical axis "alpha", roughly 0.0 to 0.4.]

Inference

Backfitting provides point estimates; we need standard errors as well. If we have a model g(E[Y]) = α(t) + Xβ, the regression parameters β are still linear functions of Y according to the IWLS algorithm, so in principle we can go from β̂ = AY to a sandwich estimator

    var[β̂] = A (Y − μ̂)(Y − μ̂)^T A^T

Inference

The difficulty is that A involves the design matrices for all the local linear regressions done to compute the smooth curves, and so is large and complicated. To be precise,

    β̂ = (X^T W (I − S) X)^{-1} X^T W (I − S) Z

where Z is the working response η + (Y − μ) dη/dμ, W are the working weights (dμ/dη)²/V(μ), and S is the n × n smoothing matrix.

Computing A^T requires computing S explicitly, which will be slow if n is large. For a local linear regression smoother, each row of S is the corresponding row of the hat matrix for the linear regression used to fit that point.

Inference

The S-PLUS function gam uses an approximation to the variance that pretends the smooth function is linear. This works reasonably well as long as the bandwidth is large. It worked very poorly in the air pollution time series models, where α(t) is very far from linear. One simple and valid approach is a bootstrap.

Inference

If the smoother matrix satisfies S^T W = W S, then we can rewrite β̂ as

    β̂ = HZ = (X^T W X − X^T W S X)^{-1} (W X − W S X)^T Z

and so we don't need to compute S itself, only SX, the smoother applied to the design matrix for the parametric part of the model. If there are p predictors in the parametric part of the model, this takes only p smooths (Dominici et al, JASA 2004).

The condition S^T W = W S means that Y_i contributes the same to μ̂_j as Y_j does to μ̂_i. It will be approximately true for local linear regression with a symmetric kernel and fixed bandwidth, but not for lowess, which has a variable bandwidth.

We can then compute the
sandwich estimator for β̂:

    var[β̂] = H (Z − η̂)(Z − η̂)^T H^T

Penalized likelihood

Another approach to smoothing is to modify the objective function being maximized. In the air pollution example we want to fit a model

    Y_t ~ Poisson(exp(α(t) + Xβ))

where α(t) is arbitrary, except that it does not wiggle too much. We can estimate α(t) and β by maximizing

    l_pen(α, β) = Σ_{i=1}^n l_i − λ ∫ α″(t)² dt

The form of the penalty term λ ∫ α″(t)² dt is somewhat arbitrary. It is motivated by the potential energy from bending a thin metal strip to fit the curve. λ controls how flexible the curve is, so high λ corresponds to a large bandwidth.

Penalized likelihood

It turns out (the argument requires calculus of variations) that α̂(t) is a piecewise cubic polynomial with knots at every data point: a cubic spline, but not a regression spline. It is called a smoothing spline.

The gam function in S-PLUS, and in the gam package in R, will fit models with smoothing splines. Also, α̂(t) is a linear function of the working response, and the smoother matrix that maps the working response to α̂ satisfies the condition S^T W = W S needed for feasible standard error calculations for the parametric part of the model. These standard error calculations are not built in to gam in S-PLUS or R. For S-PLUS, see http://www.ihapss.jhsph.edu/software/gam.exact/gam.exact.htm

Comparisons

A regression spline model is simple to fit and simple for inference. It does not fit an arbitrary curve, but will fit most plausible relationships. Local linear smoothers and smoothing splines will fit arbitrary curves given enough data. Estimation is not difficult, but correct standard errors require some additional effort. I tend to use regression splines for modelling, and use more complex smoothers only for exploratory analysis.

Digression

The gam function in the mgcv package in R uses a compromise between regression splines and smoothing splines that may be the best of both worlds, but it is too complicated to explain how it works here.

Example

Compare β̂ × 1000 from different ways of fitting

    log E[hospital admissions] = α(t) + β × PM10 +
γ × temperature

    Without adjustment: 5.0

    Regression splines    df 24   df 32   df 40
    Linear                1.2     0.55    0.30
    Quadratic             0.59    0.64    0.42
    Cubic                 0.48    0.60    0.41
    Smoothing spline      0.89    0.60    0.41

Example

Similar comparison for E[log PM10] = α(t) + β × stagno:

    Unadjusted: 0.102
    Local linear smoother with h = 60: 0.104
    Local linear smoother with h = 90: 0.100

                          df 24   df 32   df 40
    Linear                0.097   0.095   0.094
    Quadratic             0.095   0.095   0.095
    Cubic                 0.095   0.095   0.095
    Smoothing spline      0.098   0.096   0.095

Example

The health effect model is more sensitive to the amount of smoothing, because the effect is smaller, the confounding is stronger, and probably because the true relationship involves multiple lags. In both models the qualitative conclusions are fairly resistant to how the smoothing is done.

Missing data
Thomas Lumley
BIOST 570   2005-11-9

Missing data

Data are often not obtainable for every observation of every variable. Missing variables are obvious in the data set; missing observations may not be obvious, but are just as important. There are tradeoffs: a smaller sample size might allow more complete observation; a less accurately measured variable might be easier to measure. The only really satisfactory solution to missing data is good design, but good analysis can help mitigate the problems. Missing data analysis is more interesting in longitudinal data, covered in 571.

Problems

- Loss of precision due to less data
- Computational difficulties due to holes in the data set
- Bias due to distortion of the data distribution

I will initially pretend that we have the simplest missing data structure, where some variables are observed for everyone and other variables are missing or observed together.

Approaches

There are at least two ways to work with missing data:

- By analogy with deliberately missing data in survey samples, model the probability of being missing and use probability weighting to estimate complete-data summaries.
- Model the distribution of the missing data, and use explicit imputation or maximum likelihood, which does implicit imputation.

Probability weighting

Write R_i for the
indicator that the ith observation is not missing, and π_i = E[R_i]. Recall that:

- If π is correctly specified, probability weighting can give unbiased estimates of any statistic.
- If π depends only on X in a regression model, reweighting is not necessary.

Ignoring missingness

Suppose we have a regression model of Y on X and some additional variables Z, and that some components of (X, Y) are missing. We can analyze the complete cases, the observations where (X, Y) was completely observed. This is the default behaviour of nearly all software.

If R ⊥ (Y, Z) | X, then complete case analysis is valid: the estimating equations for the regression parameters will still be unbiased,

    E[U_i(β₀) | R_i, X_i] = E[U_i(β₀) | X_i] = 0

If R_i depends on Y or Z, then complete case analysis is typically biased. There are exceptions: we saw one in case-control studies.

Estimating π

If missingness of a particular variable (or set of variables) depends on the values of variables that are always observed, then we can estimate π by logistic regression. For example, suppose that we know the age and employment status of everyone in a proposed telephone survey, and that the probability of participating depends on age and employment, but does not depend directly on the opinions being surveyed. We know R_i, age, and employment for each individual, and can fit

    logit E[R] = logit π = a₀ + a₁ × age + a₂ × employment

and estimate π̂.

Estimating π

Estimating π usefully requires either that we know Y for missing individuals, or that we have variables that are predictive of missingness and of Y but that are not in the model we want to fit. Often, if X predicts Y we would want X in the model; probability weighting is useful only when this is not true. There are two major cases where this can happen:

- When Y is missing but an important intermediate variable is available
- When more data are collected than are distributed

Intermediate variables

Suppose we have computerized records of diagnosis for people admitted to a hospital, but that our outcome variable Y is diagnosis based
on complete medical records. We ask for permission to view medical records, but some people refuse. The diagnosis Z based on electronic records is predictive of Y and R, but we do not want to use Z as a predictor variable in the model. Any exposure that affects Y presumably also affects Z, so Z is not a valid adjustment variable. Using Z to estimate π will reduce bias and does not require adjusting for Z.

Limited disclosure

In many national surveys and large cohort studies, the organization that collects the data will distribute a de-identified data set, or (in the case of the Census) only data summaries. Because age/sex/race/income information is known from the Census for fairly small geographic areas, the original data producers can reweight to the correct age/sex/race/income distribution within each geographical area. Other consumers of the data cannot do this reweighting, since they do not know which subjects live in which geographical areas. The data producer gains simplicity of analysis: national surveys are already weighted, so the weighting adds no additional complexity, and allows simple summaries such as means and medians to be valid. Data consumers benefit even more: they could not use the geographical information directly in their analysis.
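The curse-of-dimensionality numbers in the smoothing notes above (a subset holding 10% of uniform data on [0,1]^d has side 0.1 when d = 1, about 0.32 when d = 2, and 0.79 when d = 10) follow from the fact that a cube of volume v has side v^(1/d). A minimal sketch, in Python rather than the course's S-PLUS/R, purely as illustration:

```python
# Side length of a cube containing a fraction v of uniform data on [0,1]^d:
# volume = side^d, so side = v ** (1/d), which approaches 1 as d grows.
def side_length(v, d):
    return v ** (1 / d)

for d in (1, 2, 10):
    print(f"d={d}: side length {side_length(0.1, d):.2f}")
```

This is the whole argument behind "high-dimensional space is big": a neighbourhood containing only 10% of the data in ten dimensions already spans most of each axis.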
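The backfitting loop in the air-pollution example above (fit the linear part, smooth the partial residuals for time to update the trend, recentre, repeat) can be sketched on synthetic data. This is a hypothetical Python stand-in, not the course code: the Gaussian-kernel local linear smoother replaces KernSmooth::locpoly, and all data and variable names are invented for illustration.

```python
import numpy as np

def local_linear(x, y, h):
    """Local linear regression of y on x with a Gaussian kernel,
    evaluated at each observed x (a crude stand-in for locpoly)."""
    fit = np.empty(len(x))
    for j, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)
        X = np.column_stack([np.ones(len(x)), x - x0])
        WX = X * w[:, None]
        fit[j] = np.linalg.solve(X.T @ WX, WX.T @ y)[0]
    return fit

rng = np.random.default_rng(0)
n = 300
t = np.sort(rng.uniform(0, 10, n))                 # "date"
z = rng.binomial(1, 0.5, n)                        # binary covariate, like stagno
y = np.sin(t) + 0.5 * z + rng.normal(0, 0.2, n)    # true beta = 0.5

# Backfitting: alternate a smooth update for alpha(t) and a linear fit for beta.
alpha = local_linear(t, y, h=0.5)
D = np.column_stack([np.ones(n), z])
for _ in range(20):
    coef, *_ = np.linalg.lstsq(D, y - alpha, rcond=None)  # update (intercept, beta)
    alpha = local_linear(t, y - D @ coef, h=0.5)          # smooth the partial residuals
    alpha -= alpha.mean()                                 # centre, as in the R code

print(coef)   # coef[1] should settle near the true beta of 0.5
```

As in the R example, printing the coefficients each iteration would show the slow linear convergence the notes warn about.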
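The probability-weighting idea from the missing-data notes can be illustrated numerically: when missingness of Y depends on an always-observed X, the complete-case mean is biased, but reweighting the complete cases by 1/π recovers E[Y]. A hedged Python sketch with π taken as known (in practice it would be estimated, e.g. by logistic regression as in the notes); the data-generating choices here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)                       # always observed
y = 2.0 + x + rng.normal(size=n)             # E[Y] = 2; Y is sometimes missing

# Missingness depends only on the observed x: pi(x) = P(R = 1 | x),
# so high-x individuals are more likely to be observed.
pi = 1 / (1 + np.exp(-(0.5 + x)))
r = rng.binomial(1, pi)                      # R = 1 means Y observed

complete_case = y[r == 1].mean()             # biased upward: high-x cases over-represented
weights = r / pi                             # zero weight for missing cases
ipw = np.sum(weights * y) / np.sum(weights)  # probability-weighted estimate of E[Y]

print(round(complete_case, 2), round(ipw, 2))
```

In a real data set the missing Y values would be unavailable; the simulation keeps them, but the zero weights for R = 0 mean only the complete cases enter the weighted estimate.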
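The "Estimating π" slide fits logit E[R] = a0 + a1 × age + a2 × employment by logistic regression, and the smoothing notes describe the IWLS working response and working weights. The two fit together: logistic regression is itself IWLS with μ(1 − μ) as the working weight. A small Python sketch (the fitting routine and the age-only design here are hypothetical, for illustration):

```python
import numpy as np

def logistic_iwls(X, r, iters=25):
    """Fit logit E[R] = X a by iteratively reweighted least squares."""
    a = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ a
        mu = 1 / (1 + np.exp(-eta))
        w = mu * (1 - mu)                   # working weights (dmu/deta)^2 / V(mu)
        z = eta + (r - mu) / w              # working response
        WX = X * w[:, None]
        a = np.linalg.solve(X.T @ WX, WX.T @ z)
    return a

rng = np.random.default_rng(2)
n = 50_000
age = rng.uniform(20, 80, n)
X = np.column_stack([np.ones(n), (age - 50) / 10])   # intercept + standardized age
true_a = np.array([0.5, -0.3])                       # response probability falls with age
pi = 1 / (1 + np.exp(-(X @ true_a)))
r = rng.binomial(1, pi)                              # observed response indicators

a_hat = logistic_iwls(X, r)
print(a_hat)   # should be close to (0.5, -0.3)
```

The fitted probabilities π̂ = expit(X â) would then supply the weights 1/π̂ for the probability-weighted analysis.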