# Advanced Biostatistics Seminar 171:290

This 74-page set of class notes was uploaded by Vivien Tillman V on Friday, October 23, 2015. The notes belong to course 171:290 in Biostatistics at the University of Iowa, taught by Joseph Cavanaugh in Fall.


Date Created: 10/23/15

## Lecture IV: Corrected AIC and Modified AIC (AICc and MAIC)

Joe Cavanaugh, Department of Biostatistics, College of Public Health, The University of Iowa. 171:290 Advanced Biostatistics Seminar: Model Selection. February 15, 2005.

### Outline

- Review of AIC and AICc (Lectures II and III)
- Proof of the AICc lemma
- The modified Akaike information criterion (MAIC)
- Discussion

### Review of AIC and AICc

Key constructs:

- True or generating model: $f(y \mid \theta_0)$
- Candidate or approximating model: $f(y \mid \theta_k)$
- Candidate class: $\mathcal{F}(k) = \{ f(y \mid \theta_k) \mid \theta_k \in \Theta(k) \}$
- Fitted model: $f(y \mid \hat\theta_k)$

Kullback discrepancy between $f(y \mid \theta_0)$ and $f(y \mid \hat\theta_k)$, with respect to $f(y \mid \theta_0)$:

$$d(\theta_0, \hat\theta_k) = \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_k)\}\big|_{\theta_k = \hat\theta_k}.$$

Expected Kullback discrepancy:

$$\Delta(\theta_0, k) = \mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \}.$$

Consider writing $\Delta(\theta_0, k)$ as follows:

$$
\Delta(\theta_0, k) = \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\}
+ \big[ \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\} \big] \quad (1)
$$
$$
{} + \big[ \mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \} - \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} \big]. \quad (2)
$$

The derivation of AIC is based on the following lemma:

$$\mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\} = k + o(1),$$
$$\mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \} - \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} = k + o(1). \quad (3)$$

Conditions under which the lemma holds:

- $f(y \mid \theta_0) \in \mathcal{F}(k)$;
- the maximum likelihood vector $\hat\theta_k$ satisfies the conventional large-sample properties of MLEs.

The derivation of AICc is based on the following lemma. We assume that the candidate class $\mathcal{F}(k)$ consists of normal linear regression models and that $f(y \mid \theta_0) \in \mathcal{F}(k)$. We use $p$ to denote the rank of the design matrix, meaning $k = p + 1$.

$$\mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\} = n \ln \frac{n}{2} - n\,\psi\!\left(\frac{n-p}{2}\right),$$
$$\mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \} - \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} = -n \ln \frac{n}{2} + n\,\psi\!\left(\frac{n-p}{2}\right) + \frac{2n(p+1)}{n-p-2}.$$

### Proof of the AICc Lemma

Suppose that the generating model for the data is given by

$$y = X\beta_0 + e, \qquad e \sim N_n(0, \sigma_0^2 I),$$

and that the candidate model postulated for the data is of the form

$$y = X\beta + e, \qquad e \sim N_n(0, \sigma^2 I).$$

Here $y$ is an $n \times 1$ observation vector, $e$ is an $n \times 1$ error vector, $\beta_0$ and $\beta$ are $p \times 1$ parameter vectors, and $X$ is an $n \times p$ design matrix of full column rank.

Let $\theta_0$ and $\theta_k$ respectively denote the $k = p + 1$ dimensional vectors $(\beta_0', \sigma_0^2)'$ and $(\beta', \sigma^2)'$. Assume $\beta_0$ is such that, for some $0 < p_0 \le p$, the last $p - p_0$ components of $\beta_0$ are zero. Thus, the true model is nested within the candidate model. Note that the nesting ensures that $f(y \mid \theta_0) \in \mathcal{F}(k)$.

Let $\hat\beta$ denote the least squares estimator of $\beta$, and let $\hat\sigma^2 = (y - X\hat\beta)'(y - X\hat\beta)/n$. Let $\hat\theta_k = (\hat\beta', \hat\sigma^2)'$ denote the MLE of $\theta_k$.

Preliminary results. Let $\chi_d^2$ be a random variable having a central chi-square distribution with $d$ degrees of freedom. Then

$$\mathrm{E}\!\left[\frac{1}{\chi_d^2}\right] = \frac{1}{d-2}, \qquad \mathrm{E}\!\left[\log \chi_d^2\right] = \ln 2 + \psi\!\left(\frac{d}{2}\right),$$

where $\psi$ is the digamma (or psi) function. $\psi(z)$ cannot be expressed in closed form, yet for $z > 1$ it can be approximated to any degree of accuracy by using the expansion

$$\psi(z) \approx \ln z - \frac{1}{2z} - \frac{1}{12z^2} + \frac{1}{120z^4} - \frac{1}{252z^6} + \cdots.$$

Proof. Let $\mathrm{E}_0$ denote the expectation under $\theta_0$. The log likelihood for the candidate model is given by

$$\ln f(y \mid \theta_k) = -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \sigma^2 - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}.$$

The following relations can be established:

$$\mathrm{E}_0\{-2 \ln f(y \mid \theta_0)\} = n \ln \sigma_0^2 + n + n \ln 2\pi,$$
$$\mathrm{E}_0\{-2 \ln f(y \mid \hat\theta_k)\} = \mathrm{E}_0\{ n \ln \hat\sigma^2 \} + n \ln 2\pi + n,$$
$$d(\theta_0, \hat\theta_k) = n \ln \hat\sigma^2 + n \ln 2\pi + \frac{n\sigma_0^2 + (\hat\beta - \beta_0)'(X'X)(\hat\beta - \beta_0)}{\hat\sigma^2}.$$

To evaluate the expected value of $d(\theta_0, \hat\theta_k)$ under $\theta_0$, note the following:

- $n\hat\sigma^2 / \sigma_0^2$ has a chi-square distribution with $n - p$ degrees of freedom;
- the quadratic form $(\hat\beta - \beta_0)'(\sigma_0^{-2} X'X)(\hat\beta - \beta_0)$ has a chi-square distribution with $p$ degrees of freedom;
- $\hat\sigma^2$ and $\hat\beta$ are independent.

Combining these facts with the preliminary results,

$$
\mathrm{E}_0\{ d(\theta_0, \hat\theta_k) \}
= \mathrm{E}_0\{ n \ln \hat\sigma^2 \} + n \ln 2\pi + n\sigma_0^2\,\mathrm{E}_0\!\left[\frac{1}{\hat\sigma^2}\right] + \mathrm{E}_0\!\left[\frac{(\hat\beta - \beta_0)'(X'X)(\hat\beta - \beta_0)}{\hat\sigma^2}\right]
= \mathrm{E}_0\{ n \ln \hat\sigma^2 \} + n \ln 2\pi + \frac{n(n+p)}{n-p-2}.
$$

For the first bias-adjustment term, we have

$$
\mathrm{E}_0\{-2 \ln f(y \mid \theta_0)\} - \mathrm{E}_0\{-2 \ln f(y \mid \hat\theta_k)\}
= n \ln \sigma_0^2 - \mathrm{E}_0\{ n \ln \hat\sigma^2 \}
= n \ln n - n\,\mathrm{E}_0\!\left[\ln \frac{n\hat\sigma^2}{\sigma_0^2}\right]
= n \ln \frac{n}{2} - n\,\psi\!\left(\frac{n-p}{2}\right).
$$

For the second bias-adjustment term, we have

$$
\mathrm{E}_0\{ d(\theta_0, \hat\theta_k) \} - \mathrm{E}_0\{-2 \ln f(y \mid \theta_0)\}
= \mathrm{E}_0\{ n \ln \hat\sigma^2 \} - n \ln \sigma_0^2 + \frac{n(n+p)}{n-p-2} - n
= -n \ln \frac{n}{2} + n\,\psi\!\left(\frac{n-p}{2}\right) + \frac{2n(p+1)}{n-p-2}.
$$

AICc is obtained by adding the bias-adjustment terms derived in the preceding lemma to the baseline estimator of $\Delta(\theta_0, k)$, namely $-2 \ln f(y \mid \hat\theta_k)$. We have

$$
\mathrm{AICc} = -2 \ln f(y \mid \hat\theta_k) + \left[ n \ln \frac{n}{2} - n\,\psi\!\left(\frac{n-p}{2}\right) \right] + \left[ -n \ln \frac{n}{2} + n\,\psi\!\left(\frac{n-p}{2}\right) + \frac{2n(p+1)}{n-p-2} \right]
= -2 \ln f(y \mid \hat\theta_k) + \frac{2(p+1)n}{n-p-2}.
$$

### The Modified Akaike Information Criterion (MAIC)

- The asymptotic unbiasedness of AIC, and the exact unbiasedness of AICc in the normal linear regression framework, require the assumption that $f(y \mid \theta_0) \in \mathcal{F}(k)$.
- This assumption implies that the candidate model of interest, $f(y \mid \theta_k)$, is either correctly specified or overspecified.
- The modified Akaike information criterion (MAIC) is one of several AIC variants based on a development that relaxes this assumption.
- MAIC was introduced for the framework of normal multivariate linear regression models by Fujikoshi and Satoh (1997). We will introduce MAIC in the framework of normal univariate linear regression models.

Setting:

- Let $\mathcal{F} = \{ \mathcal{F}(k_1), \mathcal{F}(k_2), \ldots, \mathcal{F}(k_L) \}$ represent the candidate family.
- Assume that the largest candidate class in $\mathcal{F}$ is $\mathcal{F}(K)$, i.e., $K = \max\{k_1, k_2, \ldots, k_L\}$.
- MAIC is based on the assumption that $f(y \mid \theta_0) \in \mathcal{F}(K)$.
- Under this assumption, the largest candidate model $f(y \mid \theta_K)$ is either correctly specified or overspecified.
- The candidate model of interest, $f(y \mid \theta_k)$, may be correctly specified, underspecified, or overspecified.

Definition:

- In the regression framework, the model $f(y \mid \theta_K)$ would typically include all covariates under consideration.
- Let $P$ denote the rank of the design matrix for this model, let $\sigma_K^2$ denote the error variance for this model, and let $\hat\sigma_K^2$ denote the maximum likelihood estimator of $\sigma_K^2$.
- MAIC is then defined by augmenting AICc with a stochastic correction that depends on $n$, $p$, $P$, and the variance ratio $\hat\sigma_K^2 / \hat\sigma^2$ (Fujikoshi and Satoh, 1997).

### Discussion

- MAIC can be written as AICc plus an additional stochastic penalization.
- The additional stochastic penalization added to AICc is designed to be approximately zero for correctly specified and overfitted models.
- For underfitted models, this penalization is designed to reduce the bias of AICc.

Bias properties of AIC, AICc, and MAIC for normal linear regression models:

| Fitted model | AIC | AICc | MAIC |
|---|---|---|---|
| Underfitted | $O(1)$ | $O(1)$ | $O(1/n)$ |
| Correctly specified or overfitted | $O(1/n)$ | $0$ | $O(1/n^2)$ |

Does the additional stochastic penalization added to AICc to produce MAIC yield a practical improvement? In simulation results featured in Fujikoshi and Satoh (1997), MAIC marginally outperforms AICc in terms of overfitted selections, yet performs the same as AICc in terms of underfitted selections.

In the next lecture, we will examine the performance of AIC, AICc, and MAIC in a simulation study. We will also introduce a general criterion that provides an asymptotically unbiased estimator for $\Delta(\theta_0, k)$ without requiring $f(y \mid \theta_0) \in \mathcal{F}(k)$.
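Applying the corrected criterion in practice is straightforward: fit by least squares, take $\hat\sigma^2 = \mathrm{RSS}/n$, and add the penalty $2(p+1)n/(n-p-2)$ to $-2\ln f(y \mid \hat\theta_k)$. The following minimal Python sketch (our illustration, not part of the lecture; the toy data, seed, and function names are ours) computes AIC and AICc for a simple linear regression at a small sample size, where the AICc penalty visibly exceeds $2k$:

```python
import math, random

def neg2loglik_simple(y, x):
    """Least-squares fit of y on an intercept plus one covariate x;
    returns -2 * maximized Gaussian log-likelihood (p = 2 regressors)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    sigma2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / n  # MLE
    return n * (math.log(2 * math.pi * sigma2) + 1)

def aic(m2ll, k):
    return m2ll + 2 * k                    # k = p + 1 counts sigma^2 as well

def aicc(m2ll, k, n):
    # 2k * n / (n - k - 1) equals 2(p+1)n / (n - p - 2) when k = p + 1
    return m2ll + 2 * k * n / (n - k - 1)

random.seed(1)
n = 15                                     # small n: AICc penalty >> 2k
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 2) for xi in x]
m2ll = neg2loglik_simple(y, x)
k = 3                                      # intercept, slope, variance
print("AIC  =", round(aic(m2ll, k), 2))
print("AICc =", round(aicc(m2ll, k, n), 2))
print("penalty inflation n/(n-k-1) =", n / (n - k - 1))   # 15/11, about 1.36
```

The inflation factor $n/(n-k-1)$ shrinks to 1 as $n$ grows, so AIC and AICc agree in large samples, as the lemma implies.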
## Lecture III: AIC and Corrected AIC (AICc)

Joe Cavanaugh, Department of Biostatistics, College of Public Health, The University of Iowa. 171:290 Advanced Biostatistics Seminar: Model Selection. February 1, 2005.

### Outline

- Review of AIC (Lecture II)
- Bias investigation of AIC
- The corrected Akaike information criterion (AICc)
- Discussion
- Simulation example

### Review of AIC

Key constructs:

- True or generating model: $f(y \mid \theta_0)$
- Candidate or approximating model: $f(y \mid \theta_k)$
- Candidate class: $\mathcal{F}(k) = \{ f(y \mid \theta_k) \mid \theta_k \in \Theta(k) \}$
- Fitted model: $f(y \mid \hat\theta_k)$

Kullback discrepancy between $f(y \mid \theta_0)$ and $f(y \mid \theta_k)$, with respect to $f(y \mid \theta_0)$:

$$d(\theta_0, \theta_k) = \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_k)\}.$$

Kullback discrepancy between $f(y \mid \theta_0)$ and the fitted model $f(y \mid \hat\theta_k)$:

$$d(\theta_0, \hat\theta_k) = \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_k)\}\big|_{\theta_k = \hat\theta_k}.$$

The Akaike information criterion (AIC):

$$\mathrm{AIC} = -2 \ln f(y \mid \hat\theta_k) + 2k.$$

- The discrepancy $d(\theta_0, \hat\theta_k)$ reflects the separation between the generating model $f(y \mid \theta_0)$ and a fitted model $f(y \mid \hat\theta_k)$.
- Evaluating $d(\theta_0, \hat\theta_k)$ is not possible, since doing so requires knowledge of $\theta_0$.
- Under appropriate conditions, the expected value of AIC asymptotically approaches the expected value of $d(\theta_0, \hat\theta_k)$, say $\Delta(\theta_0, k) = \mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \}$.
- Specifically, one can establish that $\mathrm{E}_{\theta_0}\{\mathrm{AIC}\} + o(1) = \Delta(\theta_0, k)$. AIC can therefore be viewed as an asymptotically unbiased estimator of $\Delta(\theta_0, k)$, the expected Kullback discrepancy.

To justify the asymptotic unbiasedness of AIC, consider writing $\Delta(\theta_0, k)$ as follows:

$$
\Delta(\theta_0, k) = \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\}
+ \big[ \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\} \big] \quad (1)
$$
$$
{} + \big[ \mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \} - \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} \big]. \quad (2)
$$

The lemma underlying AIC asserts that

$$\mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\} = k + o(1),$$
$$\mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \} - \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} = k + o(1).$$

Conditions under which the lemma holds:

- $f(y \mid \theta_0) \in \mathcal{F}(k)$;
- the maximum likelihood vector $\hat\theta_k$ satisfies the conventional large-sample properties of MLEs.

### Bias Investigation of AIC

- AIC provides us with an approximately unbiased estimator of $\Delta(\theta_0, k)$ in settings where $n$ is large and $k$ is comparatively small.
- In settings where $n$ is small and $k$ is comparatively large (e.g., $k \approx n/2$), $2k$ is often much smaller than the bias adjustment, making AIC substantially negatively biased as an estimator of $\Delta(\theta_0, k)$.
- In the framework of normal linear regression, we will investigate the adequacy of the approximation of the bias adjustment by $2k$.
- When the candidate class $\mathcal{F}(k)$ consists of normal linear regression models, under the assumption that $f(y \mid \theta_0) \in \mathcal{F}(k)$, the bias-adjustment terms (1) and (2) can be exactly evaluated for any values of $n$ and $k$. The appropriate formulas will be derived later.

In the normal linear regression setting, the following table lists the exact values of the bias-adjustment terms (1) and (2) for specific values of $n$ and $k$:

| $k$ | $n$ | (1) | (2) |
|---|---|---|---|
| 3 | 320 | 3.01 | 3.06 |
| 5 | 320 | 5.04 | 5.15 |
| 9 | 320 | 9.13 | 9.45 |
| 17 | 320 | 17.46 | 18.56 |
| 3 | 160 | 3.03 | 3.13 |
| 5 | 160 | 5.08 | 5.31 |
| 9 | 160 | 9.26 | 9.94 |
| 17 | 160 | 17.97 | 20.34 |
| 3 | 80 | 3.06 | 3.26 |
| 5 | 80 | 5.16 | 5.65 |
| 9 | 80 | 9.55 | 11.03 |
| 17 | 80 | 19.11 | 24.76 |
| 3 | 40 | 3.11 | 3.55 |
| 5 | 40 | 5.34 | 6.43 |
| 9 | 40 | 10.19 | 13.81 |
| 17 | 40 | 22.12 | 39.70 |
| 3 | 30 | 3.15 | 3.77 |
| 5 | 30 | 5.46 | 7.04 |
| 9 | 30 | 10.69 | 16.31 |
| 17 | 30 | 25.06 | 59.94 |
| 3 | 20 | 3.24 | 4.26 |
| 5 | 20 | 5.74 | 8.55 |
| 9 | 20 | 11.93 | 24.07 |
| 17 | 20 | 37.60 | 302.40 |
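These exact values follow directly from the AICc lemma: with $p = k - 1$, term $(1) = n\ln(n/2) - n\,\psi((n-p)/2)$ and term $(2) = -(1) + 2n(p+1)/(n-p-2)$. The short Python sketch below (our illustration, not part of the notes) implements the digamma expansion quoted in these lectures and recomputes entries such as $(1) = 37.60$ and $(2) = 302.40$ for $k = 17$, $n = 20$:

```python
import math

def digamma(z: float) -> float:
    """Digamma via the recurrence psi(z) = psi(z + 1) - 1/z and the
    asymptotic expansion quoted in the lecture (accurate for z >= 6)."""
    acc = 0.0
    while z < 6.0:
        acc -= 1.0 / z
        z += 1.0
    return acc + (math.log(z) - 1.0 / (2.0 * z) - 1.0 / (12.0 * z**2)
                  + 1.0 / (120.0 * z**4) - 1.0 / (252.0 * z**6))

def bias_terms(n: int, k: int):
    """Exact bias-adjustment terms (1) and (2) for a normal linear
    regression model with k = p + 1 parameters fit to n observations."""
    p = k - 1
    t1 = n * math.log(n / 2.0) - n * digamma((n - p) / 2.0)   # term (1)
    t2 = -t1 + 2.0 * n * (p + 1) / (n - p - 2.0)              # term (2)
    return t1, t2

for n in (320, 20):
    for k in (3, 17):
        t1, t2 = bias_terms(n, k)
        print(f"n={n:3d} k={k:2d}  (1)={t1:7.2f}  (2)={t2:7.2f}  2k={2 * k}")
```

For $k = 17$, $n = 20$, term (2) is dominated by $2n(p+1)/(n-p-2) = 340$, which dwarfs $2k = 34$ and makes the negative bias of AIC plain.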
### The Corrected Akaike Information Criterion (AICc)

- The corrected Akaike information criterion (AICc) was first suggested for normal linear regression by Sugiura (1978). AICc is defined by replacing the penalty term of AIC, $2k$, with the exact expression for the bias adjustment.
- Hurvich and Tsai (1989) demonstrated the small-sample superiority of AICc over AIC, and justified the use of AICc in the frameworks of nonlinear regression models and autoregressive models.
- Since then, AICc has been extended to a number of additional frameworks, including autoregressive moving average models (Hurvich, Shumway, and Tsai, 1990), vector autoregressive models (Hurvich and Tsai, 1993), multivariate linear regression models (Bedrick and Tsai, 1994), and overdispersed generalized linear models (Hurvich and Tsai, 1995).
- In the framework of normal linear regression models, both univariate and multivariate, the penalty term of AICc provides an exact expression for the bias adjustment. In other frameworks for which AICc has been justified, the penalty term provides only an approximation to the bias adjustment, albeit an approximation that is generally more precise than $2k$.
- The advantage of AICc over AIC is that in small-sample applications, AICc estimates the expected discrepancy $\Delta(\theta_0, k)$ with less bias than AIC. The advantage of AIC over AICc is that AIC is more universally applicable: the derivation of AIC is quite general, whereas the derivation of AICc relies upon the form of the candidate class $\mathcal{F}(k)$.

### Derivation of AICc

The derivation of AICc is based on the following lemma, to be established in Lecture IV. We use $p$ to denote the rank of the design matrix, meaning $k = p + 1$, and assume that the candidate class $\mathcal{F}(k)$ consists of normal linear regression models and that $f(y \mid \theta_0) \in \mathcal{F}(k)$:

$$\mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\} = n \ln \frac{n}{2} - n\,\psi\!\left(\frac{n-p}{2}\right),$$
$$\mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \} - \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_0)\} = -n \ln \frac{n}{2} + n\,\psi\!\left(\frac{n-p}{2}\right) + \frac{2n(p+1)}{n-p-2}.$$

The statement of the preceding lemma involves the digamma (or psi) function $\psi(z)$, which arises when evaluating the expectation of the log of a chi-square random variable. $\psi(z)$ cannot be expressed in closed form, yet for $z > 1$ it can be approximated to any degree of accuracy by using the expansion

$$\psi(z) \approx \ln z - \frac{1}{2z} - \frac{1}{12z^2} + \frac{1}{120z^4} - \frac{1}{252z^6} + \cdots.$$

AICc is now obtained by adding the bias-adjustment terms presented in the preceding lemma to the baseline estimator of $\Delta(\theta_0, k)$, namely $-2 \ln f(y \mid \hat\theta_k)$. We have

$$\mathrm{AICc} = -2 \ln f(y \mid \hat\theta_k) + \frac{2(p+1)n}{n-p-2}.$$

### Discussion

- Since $k = p + 1$, the penalty term of AICc can be written as $2(p+1)\,n/(n-p-2) = 2k \cdot n/(n-k-1)$.
- When $n$ is large and $k$ is comparatively small, the penalty term of AICc is approximately $2k$.
- However, in settings where $n$ is small and $p$ is comparatively large, the fraction $n/(n-k-1)$ may be appreciably greater than one, making the penalty term of AICc considerably greater than the penalty term of AIC. In such settings, AICc often dramatically outperforms AIC as a model selection criterion.

### Simulation Example

Simulation setting:

- One thousand samples of size $n = 30$ are generated from a true regression model which has an $n = 30$ by $p_0 = 4$ design matrix, a parameter vector of the form $\beta_0 = (1, -1, 2, -3)'$, and a variance of $\sigma_0^2 = 4$.
- For every sample, candidate models with nested design matrices of ranks $p = 2, 3, \ldots, P = 13$ are fit to the data.
- The first column of every design matrix is a vector of ones.
- The design matrix of rank $p_0 = 4$ is correctly specified.
- The covariates are generated as iid replicates from a uniform (0, 10) distribution.
- We examine the effectiveness of AIC and AICc at selecting $p_0$. We refer to $p$ as the order of the model.

Order selections for AIC and AICc:

| $p$ | AIC | AICc |
|---|---|---|
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 4 | 543 | 843 |
| 5 | 105 | 93 |
| 6 | 54 | 29 |
| 7 | 43 | 16 |
| 8 | 42 | 8 |

Figure: plot of $\Delta(\theta_0, k)$ and the average values of AIC and AICc versus the order $p$. The figure illustrates the negative bias of AIC.

### Upcoming Topics

Topics on the upcoming agenda:

- Proof of the lemma which justifies AICc
- Additional approaches to approximating the bias adjustment for estimation of the expected Kullback discrepancy
- Further simulations to evaluate AIC and its small-sample variants

## Lecture V: The Takeuchi Information Criterion (TIC)

Joe Cavanaugh, Department of Biostatistics, College of Public Health, The University of Iowa. 171:290 Advanced Biostatistics Seminar: Model Selection. February 22, 2005.

### Outline

- Review of AIC
- Framework for TIC
- Justification of TIC
- Discussion
- Simulation study to compare TIC, AIC, MAIC, and AICc

### Review of AIC

Key constructs for AIC:

- True or generating model: $f(y \mid \theta_0)$
- Candidate or approximating model: $f(y \mid \theta_k)$
- Candidate class: $\mathcal{F}(k) = \{ f(y \mid \theta_k) \mid \theta_k \in \Theta(k) \}$
- Fitted model: $f(y \mid \hat\theta_k)$

Kullback discrepancy between $f(y \mid \theta_0)$ and $f(y \mid \hat\theta_k)$, with respect to $f(y \mid \theta_0)$:

$$d(\theta_0, \hat\theta_k) = \mathrm{E}_{\theta_0}\{-2 \ln f(Y \mid \theta_k)\}\big|_{\theta_k = \hat\theta_k}.$$

Expected Kullback discrepancy: $\Delta(\theta_0, k) = \mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \}$.

Expected Fisher information:

$$\mathcal{I}(\theta_k) = \mathrm{E}_{\theta_0}\!\left[ -\frac{\partial^2 \ln f(y \mid \theta_k)}{\partial \theta_k\, \partial \theta_k'} \right].$$

Observed Fisher information:

$$\mathcal{I}(\theta_k, y) = -\frac{\partial^2 \ln f(y \mid \theta_k)}{\partial \theta_k\, \partial \theta_k'}.$$

Representation for $\Delta(\theta_0, k)$ leading to AIC:

$$
\Delta(\theta_0, k) = \mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \}
= \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\}
+ \big[ \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \theta_0)\} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\} \big]
+ \big[ \mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \theta_0)\} \big]
$$
$$
= \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\}
+ \mathrm{E}_{\theta_0}\{ (\hat\theta_k - \theta_0)'\, \mathcal{I}(\hat\theta_k, y)\, (\hat\theta_k - \theta_0) \}
+ \mathrm{E}_{\theta_0}\{ (\hat\theta_k - \theta_0)'\, \mathcal{I}(\theta_0)\, (\hat\theta_k - \theta_0) \} + o(1)
$$
$$
= \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\} + 2k + o(1).
$$

The simplification of the bias adjustment to $2k$ requires two assumptions: $f(y \mid \theta_0) \in \mathcal{F}(k)$, and that the maximum likelihood vector $\hat\theta_k$ satisfies the conventional large-sample properties of MLEs.

### Framework for TIC

Key constructs for TIC:

- True or generating model: $g(y \mid \theta_0)$
- Candidate or approximating model: $f(y \mid \theta_k)$
- Candidate class: $\mathcal{F}(k) = \{ f(y \mid \theta_k) \mid \theta_k \in \Theta(k) \}$
- Fitted model: $f(y \mid \hat\theta_k)$

Kullback–Leibler information between $g(y \mid \theta_0)$ and $f(y \mid \theta_k)$, with respect to $g(y \mid \theta_0)$:

$$I(\theta_0, \theta_k) = \mathrm{E}_{\theta_0}\!\left[ \ln \frac{g(y \mid \theta_0)}{f(y \mid \theta_k)} \right].$$

Pseudo-true parameter:

$$\theta_{k*} = \operatorname*{arg\,min}_{\theta_k \in \Theta(k)} I(\theta_0, \theta_k).$$

- When $g(y \mid \theta_0) \in \mathcal{F}(k)$, $\theta_{k*} = \theta_0$.
- When $g(y \mid \theta_0) \notin \mathcal{F}(k)$, $\theta_{k*}$ can be regarded as follows: of all the models in $\mathcal{F}(k)$, $f(y \mid \theta_{k*})$ provides the best approximation to $g(y \mid \theta_0)$ in the sense of Kullback–Leibler information.
- When $g(y \mid \theta_0) \notin \mathcal{F}(k)$, the MLE $\hat\theta_k$ will converge to the pseudo-true parameter $\theta_{k*}$, and the large-sample variance–covariance matrix of $\hat\theta_k$ is given by $\mathcal{I}(\theta_{k*})^{-1}\, \mathcal{J}(\theta_{k*})\, \mathcal{I}(\theta_{k*})^{-1}$, where

$$\mathcal{J}(\theta_k) = \mathrm{E}_{\theta_0}\!\left[ \frac{\partial \ln f(y \mid \theta_k)}{\partial \theta_k} \cdot \frac{\partial \ln f(y \mid \theta_k)}{\partial \theta_k'} \right].$$

Kullback discrepancy between $g(y \mid \theta_0)$ and $f(y \mid \hat\theta_k)$, with respect to $g(y \mid \theta_0)$:

$$d(\theta_0, \hat\theta_k) = \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \theta_k)\}\big|_{\theta_k = \hat\theta_k},$$

with expected Kullback discrepancy $\Delta(\theta_0, k) = \mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \}$.

### Justification of TIC

Representation for $\Delta(\theta_0, k)$ leading to TIC:

$$
\Delta(\theta_0, k)
= \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\}
+ \big[ \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \theta_{k*})\} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\} \big]
+ \big[ \mathrm{E}_{\theta_0}\{ d(\theta_0, \hat\theta_k) \} - \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \theta_{k*})\} \big]
$$
$$
= \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\}
+ \mathrm{E}_{\theta_0}\{ (\hat\theta_k - \theta_{k*})'\, \mathcal{I}(\hat\theta_k, y)\, (\hat\theta_k - \theta_{k*}) \}
+ \mathrm{E}_{\theta_0}\{ (\hat\theta_k - \theta_{k*})'\, \mathcal{I}(\theta_{k*})\, (\hat\theta_k - \theta_{k*}) \} + o(1)
$$
$$
= \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\}
+ 2\, \mathrm{E}_{\theta_0}\{ (\hat\theta_k - \theta_{k*})'\, \mathcal{I}(\theta_{k*})\, (\hat\theta_k - \theta_{k*}) \} + o(1)
$$
$$
= \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\}
+ 2\, \mathrm{tr}\!\left\{ \mathcal{I}(\theta_{k*})\, \mathrm{E}_{\theta_0}\!\big[ (\hat\theta_k - \theta_{k*})(\hat\theta_k - \theta_{k*})' \big] \right\} + o(1)
$$
$$
= \mathrm{E}_{\theta_0}\{-2 \ln f(y \mid \hat\theta_k)\}
+ 2\, \mathrm{tr}\{ \mathcal{J}(\theta_{k*})\, \mathcal{I}(\theta_{k*})^{-1} \} + o(1).
$$

TIC is defined by Takeuchi (1976) as follows:

$$\mathrm{TIC} = -2 \ln f(y \mid \hat\theta_k) + 2\, \mathrm{tr}\{ \hat{\mathcal{J}}(\hat\theta_k)\, \hat{\mathcal{I}}(\hat\theta_k)^{-1} \},$$

where $\mathrm{tr}\{ \hat{\mathcal{J}}(\hat\theta_k)\, \hat{\mathcal{I}}(\hat\theta_k)^{-1} \}$ is an estimator of $\mathrm{tr}\{ \mathcal{J}(\theta_{k*})\, \mathcal{I}(\theta_{k*})^{-1} \}$.

### Discussion

- Recall the definitions of $\mathcal{J}(\theta_k)$ and $\mathcal{I}(\theta_k)$. In general, both depend on $\theta_0$ and will not be directly accessible; $\mathcal{J}(\theta_k)$ and $\mathcal{I}(\theta_k)$ must be estimated to evaluate the penalty term of TIC.
- In the expressions for $\mathcal{J}(\theta_k)$ and $\mathcal{I}(\theta_k)$, if $\hat\theta_k$ is used to estimate $\theta_0$, the penalty term of TIC will reduce to $2k$.
- The observed Fisher information evaluated at $\hat\theta_k$, $\mathcal{I}(\hat\theta_k, y)$, is often used to estimate $\mathcal{I}(\theta_{k*})$. Similarly, the matrix $[\partial \ln f(y \mid \theta_k)/\partial \theta_k]\,[\partial \ln f(y \mid \theta_k)/\partial \theta_k']$ evaluated at $\hat\theta_k$ is often used to estimate $\mathcal{J}(\theta_{k*})$.
- An alternative approach is built upon the same assumption used in the development of MAIC. Let $\mathcal{F} = \{ \mathcal{F}(k_1), \ldots, \mathcal{F}(k_L) \}$ represent the candidate family, let $\mathcal{F}(K)$ with $K = \max\{k_1, \ldots, k_L\}$ denote the largest candidate class, and assume that $g(y \mid \theta_0) \in \mathcal{F}(K)$. Under this assumption, the largest candidate model $f(y \mid \theta_K)$ is either correctly specified or overspecified, and with $\hat\theta_K$ denoting the MLE of $\theta_K$, $\theta_0$ can be consistently estimated by $\hat\theta_K$.
- In the normal linear regression setting, when $\hat\theta_K$ is used to estimate $\theta_0$ in the approximation of $\mathcal{J}(\theta_k)$ and $\mathcal{I}(\theta_k)$, TIC reduces to a closed form whose penalty depends on $p$, the rank of the design matrix for the candidate model of interest, and on the ratio $\hat\sigma_K^2 / \hat\sigma^2$, where $\hat\sigma_K^2$ denotes the maximum likelihood estimator of the error variance associated with the largest candidate model $f(y \mid \theta_K)$.
- Compare to MAIC: its penalty likewise admits a closed form, combining the AICc penalty $2n(p+1)/(n-p-2)$ with correction terms involving $\hat\sigma_K^2 / \hat\sigma^2$ and $P$, the rank of the design matrix for the largest candidate model.

The following table illustrates the relationships among AIC, AICc, MAIC, and TIC as estimators of $\Delta(\theta_0, k)$:

| | Assumes $g(y \mid \theta_0) \in \mathcal{F}(k)$ | Relaxes the assumption |
|---|---|---|
| Requires large samples | AIC | TIC |
| Designed for smaller samples | AICc | MAIC |

Signal-to-noise considerations:

- In the model selection literature, a signal-to-noise ratio is often defined. Suppose that the response $y$ can be partitioned into two stochastic components: a predictable component $s$, which can be modeled or described, and an unpredictable component $e$, which can be regarded as noise or error. A signal-to-noise ratio (SNR) can then be defined as the ratio of the variance of $s$ to the variance of $e$.
- In the linear regression setting where the covariates are viewed as random, we define SNR as the variance of the linear form in the regressor variables over the variance of the error component. In traditional regression applications, the linear form in the regressors is regarded as deterministic and thereby has a variance of zero. However, in simulation studies this SNR definition is sensible, since the regressors are randomly generated.
- The SNR definition is amenable to a familiar interpretation: if a correctly specified model is fit to data generated under a true model with a signal-to-noise ratio of SNR, the coefficient of determination for the fit will be approximately $\mathrm{SNR}/(1 + \mathrm{SNR})$.
- In simulation studies to evaluate the performance of model selection criteria, two common settings tend to promote a high frequency of underfitted selections: (1) small sample sizes, and (2) low signal-to-noise ratios.
- Selection criteria developed without the assumption $g(y \mid \theta_0) \in \mathcal{F}(k)$ often feature stochastic bias adjustments, i.e., stochastic penalty terms. Critics of such criteria argue that stochastic bias adjustments are highly inaccurate in settings conducive to underfitting. Do the stochastic penalty terms of TIC and MAIC yield a practical improvement over the nonstochastic penalty terms of AIC and AICc?

### Simulation Study

Study outline:

- In each of three simulation sets, one thousand samples of size $n = 40$ are generated from a true regression model which has an $n = 40$ by $p_0 = 5$ design matrix and a parameter vector of the form $\beta_0 = (1, 1, 1, 1, 1)'$.
- For every sample, candidate models with nested design matrices of ranks $p = 2, 3, \ldots, P = 8$ are fit to the data.
- The first column of every design matrix is a vector of ones.
- The design matrix of rank $p_0 = 5$ is correctly specified.
- The covariates are generated as iid replicates from a uniform (0, 10) distribution.
- In the three simulation sets, the error variance $\sigma_0^2$ is chosen to produce signal-to-noise ratios (SNRs) of 0.5, 1.0, and 1.5.
- We examine the effectiveness of AIC, AICc, TIC, and MAIC at selecting $p_0$, the order of the model.

Set I: order selections with SNR = 0.5:

| $p$ | AIC | AICc | TIC | MAIC |
|---|---|---|---|---|
| 2 | 35 | 72 | 36 | 83 |
| 3 | 47 | 91 | 62 | 103 |
| 4 | 125 | 175 | 150 | 200 |
| 5 | 495 | 536 | 568 | 531 |
| 6 | 111 | 71 | 87 | 47 |
| 7 | 95 | 35 | 60 | 26 |
| 8 | 92 | 20 | 37 | 10 |

Set II: order selections with SNR = 1.0.

Set III: order selections with SNR = 1.5:

| $p$ | AIC | AICc | TIC | MAIC |
|---|---|---|---|---|
| 2 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 |
| 4 | 2 | 5 | 4 | 6 |
| 5 | 696 | 846 | 792 | 896 |
| 6 | 147 | 93 | 115 | 71 |
| 7 | 82 | 34 | 50 | 19 |
| 8 | 73 | 22 | 39 | 8 |

Conclusions:

- In settings where the SNR is low or the sample size is small, TIC and MAIC tend to choose underfitted models more frequently than AIC and AICc, respectively.
- For a fixed sample size, as the SNR grows, the underfitting propensities of the criteria are attenuated, and TIC and MAIC tend to outperform AIC and AICc, respectively.
- (Not illustrated in this study.) For a fixed SNR, as the sample size grows, the probability of the criteria choosing an underfitted model converges to zero, and TIC, MAIC, AICc, and AIC all exhibit the same selection properties.
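An order-selection study of this kind can be replicated at small scale with only standard-library tools. The sketch below is our illustration, not part of the lectures: it follows the Lecture III setting ($n = 30$, $p_0 = 4$, coefficients $(1, -1, 2, -3)$, $\sigma_0^2 = 4$, intercept column of ones) but uses 200 replications and candidate ranks up to 8, and compares only AIC and AICc, since TIC and MAIC additionally require the variance ratio from the largest candidate model. Each nested model is fit by least squares via the normal equations:

```python
import math, random

def ols_neg2loglik(y, X):
    """Fit y on the columns of X by least squares (normal equations solved
    by Gaussian elimination) and return -2 * maximized Gaussian log-likelihood."""
    n, p = len(y), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):                                  # elimination with pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for j in range(c, p + 1):
                A[r][j] -= f * A[c][j]
    b = [0.0] * p
    for c in reversed(range(p)):                        # back substitution
        b[c] = (A[c][p] - sum(A[c][j] * b[j] for j in range(c + 1, p))) / A[c][c]
    rss = sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2
              for r, yi in zip(X, y))
    return n * (math.log(2 * math.pi * rss / n) + 1)

def simulate(reps=200, n=30, p0=4, pmax=8, sigma=2.0, seed=7):
    """Count how often AIC and AICc select each design-matrix rank p."""
    rng = random.Random(seed)
    beta0 = [1.0, -1.0, 2.0, -3.0]                      # true coefficients
    wins = {"AIC": [0] * (pmax + 1), "AICc": [0] * (pmax + 1)}
    for _ in range(reps):
        # Columns: intercept followed by pmax - 1 uniform(0, 10) covariates.
        Z = [[1.0] + [rng.uniform(0, 10) for _ in range(pmax - 1)]
             for _ in range(n)]
        y = [sum(b * z for b, z in zip(beta0, row[:p0])) + rng.gauss(0, sigma)
             for row in Z]
        scores = {"AIC": {}, "AICc": {}}
        for p in range(2, pmax + 1):                    # nested candidate models
            m2ll = ols_neg2loglik(y, [row[:p] for row in Z])
            k = p + 1                                   # +1 for the error variance
            scores["AIC"][p] = m2ll + 2 * k
            scores["AICc"][p] = m2ll + 2 * k * n / (n - k - 1)
        for crit in wins:
            wins[crit][min(scores[crit], key=scores[crit].get)] += 1
    return wins

wins = simulate()
for crit, counts in wins.items():
    print(crit, {p: c for p, c in enumerate(counts) if c})
```

With this strong signal, neither criterion underfits; the difference shows up in overfitting, where AICc's inflated penalty $2k \cdot n/(n - k - 1)$ concentrates more selections at the true order $p_0 = 4$ than AIC's $2k$, mirroring the pattern in the lecture's tables.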
