# Statistical Analysis of Time Series STAT 635

This 108 page Class Notes was uploaded by Alison Vandervort on Monday September 21, 2015. The Class Notes belongs to STAT 635 at Ohio State University taught by Staff in Fall. Since its upload, it has received 14 views. For similar materials see /class/209999/stat-635-ohio-state-university in Statistics at Ohio State University.


## Bayesian Optimal Predictive Model Selection

**Ernest Fokoué, Department of Statistics, The Ohio State University (Kettering University), August 2006**

### Outline

1. Introduction to Predictive Model Selection: Model Space and Optimality Criterion; Bayesian Predictive Optimality; Sparse Bayesian Learning
2. Optimal Prediction via Model Space Search: The Median Probability Model; The Prevalence Model
3. Examples, Conclusion and Extensions: Examples and applications; Conclusion and Extensions

## Introduction to Predictive Model Selection

### Ingredients for model specification

- An i.i.d. data set $D = \{(\mathbf{x}_i, y_i),\ i = 1, \ldots, n\}$, with $\mathbf{x}_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$.
- A set of atoms $\mathcal{H} = \{h_1, h_2, \ldots, h_p\}$.
- The atomic expansion
  $$f(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p} \beta_j h_j(\mathbf{x}).$$
- A noisy response variable $Y = f(\mathbf{x}) + \varepsilon$.
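The atomic expansion can be sketched in a few lines. This is an illustrative example only: the polynomial atoms $h_j(x) = x^j$ and the coefficient values are stand-ins, not taken from the slides.

```python
# Illustrative atoms h_j(x) = x**j; any fixed dictionary of functions works.
p = 3
atoms = [lambda x, j=j: x ** j for j in range(1, p + 1)]

def f(x, beta0, beta):
    """Atomic expansion f(x) = beta0 + sum_j beta_j * h_j(x)."""
    return beta0 + sum(b * h(x) for b, h in zip(beta, atoms))

# f(2) = 1 + 0.5*2 + 0.25*4 + 0*8 = 3.0
value = f(2.0, 1.0, [0.5, 0.25, 0.0])
```

A noisy response would then be simulated as `y = f(x, beta0, beta) + eps` with Gaussian noise `eps`.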
### Different types of atoms and expansions

- Traditional linear model, with $\mathbf{x} = (x_1, \ldots, x_p)^{\mathsf T} \in \mathbb{R}^p$:
  $$f(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j.$$
- Polynomial regression, with $x \in \mathbb{R}$:
  $$f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x^j.$$
- Kernel regression, with $\mathbf{x} \in \mathbb{R}^p$:
  $$f(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p} \beta_j K(\mathbf{x}, \mathbf{x}_j).$$

### Data matrix and full model

- The full model can be written $\mathbf{y} = H\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where the data matrix $H$ is
  $$H = \begin{pmatrix} 1 & h_1(\mathbf{x}_1) & h_2(\mathbf{x}_1) & \cdots & h_p(\mathbf{x}_1) \\ 1 & h_1(\mathbf{x}_2) & h_2(\mathbf{x}_2) & \cdots & h_p(\mathbf{x}_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & h_1(\mathbf{x}_n) & h_2(\mathbf{x}_n) & \cdots & h_p(\mathbf{x}_n) \end{pmatrix}.$$
- The other elements are $\mathbf{y} = (y_1, \ldots, y_n)^{\mathsf T}$, $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_p)^{\mathsf T}$, and $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)^{\mathsf T}$.

### Coordinatewise model description

- Define the coordinatewise model index $\mathbf{v} = (v_1, v_2, \ldots, v_p)$, with $v_j \in \{0, 1\}$.
- Consider selecting from among submodels of the form $M_{\mathbf{v}}: \mathbf{y} = H_{\mathbf{v}}\boldsymbol{\beta}_{\mathbf{v}} + \boldsymbol{\varepsilon}$, where
  $$v_j = \begin{cases} 1 & \text{if } h_j \text{ is used by model } M_{\mathbf{v}}, \\ 0 & \text{otherwise.} \end{cases}$$
- The model space $\mathcal{M}$, with $2^p - 1$ models, is defined as
  $$\mathcal{M} = \big\{M_{\mathbf{v}} : \mathbf{v} \in \{0,1\}^p \text{ and } \mathbf{v} \neq (0, \ldots, 0)\big\}.$$

### Optimal predictive model selection

- Optimal predictive selection seeks
  $$M^* = \arg\min_{M_{\mathbf{v}} \in \mathcal{M}} R(M_{\mathbf{v}}),$$
  where the risk function is $R(M_{\mathbf{v}}) = \mathbb{E}\big[L\big(Y_{\text{new}}, \hat{y}^{(\mathbf{v})}_{\text{new}}\big)\big]$.
- The loss function is the squared error loss, $L(y_{\text{new}}, \hat{y}_{\text{new}}) = (y_{\text{new}} - \hat{y}_{\text{new}})^2$.
- The estimated prediction is given by
  $$\hat{y}^{(\mathbf{v})}_{\text{new}} = \hat{\beta}_0 + \sum_{j=1}^{p} v_j\, \hat{\beta}_j\, h_j(\mathbf{x}_{\text{new}}).$$
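As a concrete sketch, the data matrix $H$ and a coordinatewise submodel matrix $H_{\mathbf{v}}$ can be built as below. The polynomial atoms are illustrative stand-ins for any atom set.

```python
import numpy as np

# Build the n x (p+1) data matrix H = [1, h_1(x_i), ..., h_p(x_i)],
# here with illustrative polynomial atoms h_j(x) = x**j.
def data_matrix(x, p):
    return np.column_stack([np.ones_like(x)] + [x ** j for j in range(1, p + 1)])

# A coordinatewise submodel M_v keeps the intercept column plus the atoms with v_j = 1.
def submodel_matrix(H, v):
    keep = [0] + [j + 1 for j, vj in enumerate(v) if vj == 1]
    return H[:, keep]

x = np.array([1.0, 2.0, 3.0])
H = data_matrix(x, p=3)                # shape (3, 4)
Hv = submodel_matrix(H, v=(1, 0, 1))   # intercept + atoms 1 and 3 -> shape (3, 3)
```

For $x_2 = 2$ the full row is $(1, 2, 4, 8)$ and the submodel row keeps $(1, 2, 8)$.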
### Bayesian predictive optimality

- Specify $p(M_{\mathbf{v}})$, the prior over model space.
- Derive the posterior probability of a model, $p(M_{\mathbf{v}} \mid \mathbf{y}) \propto p(\mathbf{y} \mid M_{\mathbf{v}})\, p(M_{\mathbf{v}})$, where the marginal likelihood $p(\mathbf{y} \mid M_{\mathbf{v}})$ is given by
  $$p(\mathbf{y} \mid M_{\mathbf{v}}) = \int p(\mathbf{y} \mid M_{\mathbf{v}}, \theta)\, p(\theta \mid M_{\mathbf{v}})\, d\theta.$$
- Central role of $p(M_{\mathbf{v}} \mid \mathbf{y})$: model selection and prediction are both based on it.

### The highest probability model

Intuition might suggest that the best predictive model is the model with the highest posterior probability,
$$M_{\text{HPM}} = \arg\max_{M_{\mathbf{v}} \in \mathcal{M}} p(M_{\mathbf{v}} \mid \mathbf{y}).$$
Some drawbacks of the highest probability model:

- It is correct only if there are just two models in $\mathcal{M}$.
- It requires considering all the models in $\mathcal{M}$.
- It is not necessarily the best when $|\mathcal{M}| > 2$.
- See Barbieri and Berger (2004) for details.

### Prediction through Bayesian model averaging

It is a known result that, given a list of models, the Bayes model average (BMA) prediction
$$\hat{y}_{\text{new}} = \sum_{k=1}^{2^p - 1} p(M_{\mathbf{v}_k} \mid \mathbf{y})\, \mathbb{E}\big[Y \mid M_{\mathbf{v}_k}, \mathbf{x}_{\text{new}}\big]$$
is optimal. However:

- Model description is lost in the average.
- It is computationally prohibitive.

### Approximating the BMA prediction

- If accurate prediction is the only goal, then one should tolerate the computational burden and the loss of model description, and adopt the BMA prediction.
- However, if there is a need for repeated predictions with a single best predictor, model selection becomes the goal.
- It makes sense that if we really want to select a model, rather than make predictions based on BMA, the selected model should produce predictions as close to the BMA prediction as possible.
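For a small atom set the BMA sum can be enumerated directly. In this sketch the posterior model weights are made-up numbers standing in for $p(M_{\mathbf{v}} \mid \mathbf{y})$, and each submodel's conditional expectation is approximated by its least-squares fit.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
H = np.column_stack([np.ones_like(x), x, x ** 2])   # intercept + 2 atoms
y = 1.0 + 0.5 * x + rng.normal(0, 0.1, size=x.size)

# All 2^p - 1 non-empty model indices for p = 2 atoms.
models = [v for v in itertools.product([0, 1], repeat=2) if any(v)]

def predict(v, x_new):
    """Least-squares prediction from submodel M_v at x_new."""
    keep = [0] + [j + 1 for j, vj in enumerate(v) if vj]
    beta = np.linalg.lstsq(H[:, keep], y, rcond=None)[0]
    h_new = np.array([1.0, x_new, x_new ** 2])[keep]
    return float(h_new @ beta)

# Hypothetical posterior weights p(M_v | y), normalised over the model space.
post = {(1, 0): 0.7, (0, 1): 0.1, (1, 1): 0.2}
y_bma = sum(post[v] * predict(v, 0.5) for v in models)  # the BMA prediction
```

The computational objection on the slide is visible here: the sum has $2^p - 1$ terms, which is prohibitive for realistic $p$.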
### Sparse Bayesian learning

- Likelihood under Gaussian noise with isotropic variance:
  $$p(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}\,\|\mathbf{y} - H\boldsymbol{\beta}\|^2\right).$$
- Sparsity intuitively means: constrain the space of $\boldsymbol{\beta}$ so that many $\beta_j$ are zero.
- Sparsity is naturally achieved via an $\ell_1$ norm on $\boldsymbol{\beta}$, or a double-exponential prior over $\boldsymbol{\beta}$.

### Relevance vector regression (Tipping, 2000)

- Simple conditional Gaussian prior over each $\beta_j$:
  $$p(\beta_j \mid \alpha_j) = \mathcal{N}(\beta_j \mid 0, \alpha_j^{-1}).$$
- Intuition of the relevance vector machine: if $\alpha_j \to \infty$, then $\beta_j$ is sharply peaked at 0; therefore atom $j$ is irrelevant.
- Automatic relevance determination (ARD) hyperprior for sparsity induction: $\alpha_j \sim \text{Gamma}(a, b)$.
- Marginal prior on $\boldsymbol{\beta}$:
  $$p(\boldsymbol{\beta}) = \prod_{j=1}^{p} \int p(\beta_j \mid \alpha_j)\, p(\alpha_j)\, d\alpha_j.$$
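A minimal pure-Python sketch of these two ingredients: the Gaussian log-likelihood under isotropic noise, and the RVM intuition that a large precision $\alpha_j$ pins $\beta_j$ near zero (the prior standard deviation is $\alpha_j^{-1/2}$).

```python
import math

# Log of p(y | beta, sigma^2) for residuals r_i = y_i - (H beta)_i.
def log_likelihood(residuals, sigma2):
    n = len(residuals)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum(r * r for r in residuals) / (2 * sigma2))

# RVM intuition: p(beta_j | alpha_j) = N(0, 1/alpha_j), so the prior
# standard deviation shrinks to 0 as the precision alpha_j grows.
def prior_sd(alpha_j):
    return math.sqrt(1.0 / alpha_j)
```

With `alpha_j = 1e6` the prior standard deviation is $10^{-3}$: the atom is effectively switched off.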
### Marginal prior and sparsity

- Each marginal $p(\beta_j)$ is a Student-$t$ density driven by $a$ and $b$; the marginal prior $p(\boldsymbol{\beta})$ is therefore a product of Student-$t$ densities, and hence a good device for sparsity.
- Sparsity is controlled through the hyperparameters $a$ and $b$.
- Integrating $\boldsymbol{\beta}$ out leaves an $\boldsymbol{\alpha}$-dependent distribution, hence a device for controlling sparsity through type-II maximum likelihood.

### Geometry of the marginal prior

- Marginal prior $p(\boldsymbol{\beta})$ using a Gamma$(a, b)$ hyperprior for $\boldsymbol{\alpha}$.
- Notice the concentration around the axes: sparsity pressure.
- Therefore choose $a$ and $b$ to achieve sparsity.

### Conditional posterior of the weights

- The conditional posterior of the coefficients $\boldsymbol{\beta}$ is
  $$p(\boldsymbol{\beta} \mid \boldsymbol{\alpha}, \sigma^2, \mathbf{y}) = \mathcal{N}(\boldsymbol{\beta} \mid \boldsymbol{\mu}, \Sigma),$$
  where
  $$\Sigma = \big(H^{\mathsf T} S^{-1} H + A\big)^{-1}, \qquad \boldsymbol{\mu} = \Sigma\, H^{\mathsf T} S^{-1} \mathbf{y},$$
  with $S = \sigma^2 I_n$ and $A = \operatorname{diag}(\alpha_1, \alpha_2, \ldots, \alpha_p)$.
- The marginal likelihood $p(\mathbf{y} \mid \boldsymbol{\alpha}, \sigma^2)$ is given by
  $$p(\mathbf{y} \mid \boldsymbol{\alpha}, \sigma^2) = \mathcal{N}\big(\mathbf{y} \mid \mathbf{0},\; S + H A^{-1} H^{\mathsf T}\big),$$
  with $S = \sigma^2 I_n$ and $A = \operatorname{diag}(\alpha_1, \ldots, \alpha_p)$ as before.

### Predictions with the RVM

- Ideally, the Bayes predictive distribution of the response is
  $$p(y_* \mid \mathbf{y}) = \int p(y_* \mid \boldsymbol{\beta}, \sigma^2)\, p(\boldsymbol{\beta} \mid \mathbf{y}, \boldsymbol{\alpha}, \sigma^2)\, p(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{y})\, d\boldsymbol{\beta}\, d\boldsymbol{\alpha}\, d\sigma^2.$$
- Often, though, all we can hope to have is
  $$p(y_* \mid \mathbf{y}, \boldsymbol{\alpha}, \sigma^2) = \int p(y_* \mid \boldsymbol{\beta}, \sigma^2)\, p(\boldsymbol{\beta} \mid \mathbf{y}, \boldsymbol{\alpha}, \sigma^2)\, d\boldsymbol{\beta}.$$
- A serious problem: it is very hard to find analytical expressions for $p(y_* \mid \mathbf{y})$; the intractability comes from the posterior $p(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{y})$.
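The conditional posterior of the weights is just linear algebra once $\boldsymbol{\alpha}$ and $\sigma^2$ are fixed. In this sketch the data are simulated, and near-zero precisions make the prior nearly flat, so $\boldsymbol{\mu}$ lands close to the least-squares fit.

```python
import numpy as np

# Sigma = (H^T S^{-1} H + A)^{-1},  mu = Sigma H^T S^{-1} y,
# with S = sigma^2 I_n and A = diag(alpha), as on the slide.
def weight_posterior(H, y, sigma2, alpha):
    A = np.diag(alpha)
    Sinv = np.eye(len(y)) / sigma2
    Sigma = np.linalg.inv(H.T @ Sinv @ H + A)
    mu = Sigma @ H.T @ Sinv @ y
    return mu, Sigma

# Simulated example: y = 2 + 3x + noise, two columns (intercept, x).
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
H = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + rng.normal(0, 0.05, size=x.size)
mu, Sigma = weight_posterior(H, y, sigma2=0.05 ** 2, alpha=np.array([1e-6, 1e-6]))
```

Raising an entry of `alpha` toward infinity shrinks the corresponding posterior mean toward zero, which is exactly the relevance mechanism described above.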
### Approximate predictive distribution

- Instead of seeking the whole posterior $p(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{y})$, we can concentrate on a crude point estimate
  $$(\hat{\boldsymbol{\alpha}}_{\text{MP}}, \hat{\sigma}^2_{\text{MP}}) = \arg\max_{\boldsymbol{\alpha},\, \sigma^2}\, p(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{y}),$$
  so that $p(y_* \mid \mathbf{y}) \approx p(y_* \mid \hat{\boldsymbol{\alpha}}_{\text{MP}}, \hat{\sigma}^2_{\text{MP}})$, which is Gaussian and therefore tractable.
- Thanks to Gaussianity,
  $$p(y_* \mid \mathbf{y}) \approx \mathcal{N}\big(y_* \mid \hat{y}_*,\, \sigma^2_*\big), \qquad \hat{y}_* = \boldsymbol{\mu}^{\mathsf T} \mathbf{h}(\mathbf{x}_{\text{new}}), \qquad \sigma^2_* = \hat{\sigma}^2_{\text{MP}} + \mathbf{h}(\mathbf{x}_{\text{new}})^{\mathsf T}\, \Sigma\, \mathbf{h}(\mathbf{x}_{\text{new}}).$$

### Analysis of $f(x) = \sin(x)/x$

The sinc function is an interesting example:

- $x \in [-10, 10]$
- Sample size $n = 100$
- Noise variance $\sigma^2 = 0.01$
- Gaussian kernel used

*Figure: the sinc function.*

### RVM solution on the sinc function

- $x \in [-10, 10]$, sample size $n = 100$, 1000 repetitions
- Noise variance $\sigma^2 = 0.01$, Gaussian kernel used
- Estimated noise level: 0.0093
- Number of relevant vectors: 6
- RVM regression test error (RMS): 0.0424558

*Figure: RVM estimate of the sinc function.*

### Strengths and drawbacks of the RVM

Strengths:

- Achieves both sparseness and fast, accurate prediction.
- Works well for multivariate and univariate regression.
- Based directly on the predictive distribution.
- Flexible framework for both regression and classification.
- "Does not require a tuning parameter."

Drawbacks of the RVM framework:

- Relevance is achieved somewhat ad hoc, by manual pruning.
- Does not take model size into account.
- Does not provide a measure of strength of relevance.
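The approximate predictive distribution is a one-line computation given the posterior mean $\boldsymbol{\mu}$, covariance $\Sigma$, and the noise estimate. The numbers below are illustrative placeholders, not the sinc-experiment values.

```python
import numpy as np

# RVM predictive approximation at x_new:
#   y_hat = mu^T h(x_new),  var = sigma2_MP + h(x_new)^T Sigma h(x_new)
def predictive(h_new, mu, Sigma, sigma2):
    y_hat = float(h_new @ mu)
    var = float(sigma2 + h_new @ Sigma @ h_new)
    return y_hat, var

mu = np.array([1.0, 0.5])                        # illustrative posterior mean
Sigma = np.array([[0.01, 0.0], [0.0, 0.04]])     # illustrative posterior covariance
y_hat, var = predictive(np.array([1.0, 2.0]), mu, Sigma, sigma2=0.01)
# y_hat = 1 + 0.5*2 = 2.0;  var = 0.01 + 0.01 + 4*0.04 = 0.18
```

Note that the predictive variance grows with $\mathbf{h}(\mathbf{x}_{\text{new}})^{\mathsf T} \Sigma\, \mathbf{h}(\mathbf{x}_{\text{new}})$: uncertainty about the weights widens the predictive interval away from the data.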
## Optimal Prediction via Model Space Search

### Prior distribution over the model space

- Assuming a priori that all the atoms are equally likely, the prior probability that atom $i$ is relevant (i.e., included in the model) is $\Pr(v_i = 1 \mid \pi) = \pi$.
- Prior probability of atom $i$:
  $$p(v_i \mid \pi) = \pi^{v_i} (1 - \pi)^{1 - v_i}.$$
- Assuming a priori that all the atoms are independent:
  $$p(\mathbf{v} \mid \pi) = \pi^{|\mathbf{v}|} (1 - \pi)^{p - |\mathbf{v}|}.$$

### Prior distribution of the model size

- The model size for a model indexed by $\mathbf{v}$ is defined by
  $$K(\mathbf{v}) = \sum_{i=1}^{p} v_i.$$
- The prior distribution of the model size is binomial, with
  $$\Pr(K = k \mid \pi, p) = \binom{p}{k} \pi^k (1 - \pi)^{p - k}.$$
- The prior mode of the distribution of the model size is $k^* = \lfloor (p + 1)\pi \rfloor$.
- We can see that $\pi$ plays a crucial role.

### The role of $\pi$

- A priori, large values of $\pi$ indicate a belief in large models; by the same token, a small value of $\pi$ indicates a belief in small models.
- Hence $\pi$ provides a device for controlling the degree of sparsity (parsimony).
- A key question naturally arises: how does one go about choosing $\pi$?

### Choosing the prior over model space

- Fix the value of $\pi$ from expert knowledge.
- Estimate $\pi$ using empirical Bayes on the full model.
- Put a prior on $\pi$ and explore its posterior via MCMC.

Completely noninformative prior: by setting $\pi = 1/2$ we express the belief that the prior inclusion probability of each variable is a pure random guess. As a result, $p(M_{\mathbf{v}}) = 2^{-p}$.
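The binomial model-size prior induced by independent inclusions is easy to verify numerically; the values of $p$ and $\pi$ below are arbitrary.

```python
from math import comb

# Prior on model size implied by independent inclusion with probability pi:
#   Pr(K = k) = C(p, k) * pi**k * (1 - pi)**(p - k)
def size_prior(k, p, pi):
    return comb(p, k) * pi ** k * (1 - pi) ** (p - k)

p = 10
total = sum(size_prior(k, p, 0.5) for k in range(p + 1))   # should be 1
# With pi = 1/2 every index v is equally likely: p(M_v) = 1/2^p,
# so the empty model has prior mass (1/2)^p.
empty_mass = size_prior(0, p, 0.5)
```

Changing $\pi$ shifts the prior mode $\lfloor (p+1)\pi \rfloor$ of the size distribution, which is exactly the sparsity-control device described on the slide.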
This is simply a uniform prior over the $2^p$ models that constitute the model space $\mathcal{M}$.

### Posterior probability of inclusion

Let $\mathcal{M} = \{M_{\mathbf{v}_1}, \ldots, M_{\mathbf{v}_{2^p - 1}}\}$ be the model space. Then:

- Identify all models that contain atom $h_j$.
- Compute the posterior probability of each such model.
- Compute the sum of those posterior probabilities.

**Definition.** The posterior inclusion probability for basis element $h_j$ is
$$p_j = \sum_{\mathbf{v}\,:\, v_j = 1} p(M_{\mathbf{v}} \mid \mathbf{y}).$$

### The median probability model

**Definition.** If it exists, the median probability model $M_{\text{med}}$ is defined by
$$v_j^{\text{med}} = \begin{cases} 1 & \text{if } p_j \geq \tfrac{1}{2}, \\ 0 & \text{otherwise.} \end{cases}$$

- The median probability model is made of atoms that appear more than half the time in the set of plausible models.
- The median probability model does not always exist: the fixed "one half" cutoff may not be achieved.
- Our technique uses a more flexible, adaptive cutoff.

### Optimality of the median probability model

Barbieri and Berger (2004) show that the median probability model prediction is the best approximation to the Bayes model average prediction for orthogonal designs and nested designs.

Main drawbacks:

- Due to the fixed cutoff, the MPM does not always exist.
- Most of the search techniques used do not mix well.

It is therefore interesting to provide an extension of the MPM with a more flexible, adaptive cutoff that guarantees existence.
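Posterior inclusion probabilities and the median probability model are direct to compute once the model posterior is enumerated. The posterior weights below are a made-up toy example over $p = 3$ atoms.

```python
# Toy posterior p(M_v | y) over p = 3 atoms (illustrative numbers only);
# models absent from the dict carry zero posterior mass.
posterior = {
    (1, 0, 0): 0.40,
    (1, 1, 0): 0.35,
    (0, 0, 1): 0.15,
    (1, 0, 1): 0.10,
}
p = 3

# Posterior inclusion probability p_j = sum over models with v_j = 1.
incl = [sum(pr for v, pr in posterior.items() if v[j] == 1) for j in range(p)]

# Median probability model: keep atoms with p_j >= 1/2.
mpm = tuple(1 if pj >= 0.5 else 0 for pj in incl)
# incl = [0.85, 0.35, 0.25]  ->  mpm = (1, 0, 0)
```

Note that here the MPM $(1, 0, 0)$ happens to coincide with the highest-posterior model, but in general the two can differ.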
### Elements of the prevalence model

**Definition.** The prevalence model $M_{\text{prev}}$ is defined through the index $\mathbf{v}^{\text{prev}}$ with
$$v_j^{\text{prev}} = \begin{cases} 1 & \text{if } p_j \in \mathcal{P}_{\text{prev}}, \\ 0 & \text{otherwise,} \end{cases}$$
where $\mathcal{P}_{\text{prev}}$ is the set of the $k^*$ largest values of $p_j$, and
$$k^* = \arg\max_{k}\, p(k \mid \mathbf{y}).$$

### Comparison between median and prevalence

- For orthogonal and nested designs, the median probability and prevalence models coincide.
- For nonorthogonal designs, the prevalence model emerges as superior, and often exists where the median fails to find any model at all.
- Analytical proofs of these elements are in progress.

### Why the prevalence model?

The search generates models of the optimal size, and the prevalence model always exists.

- It is intuitively better because it accounts for model size.
- It should have a connection to AIC and BIC, which penalize model size.
- Later details show that the overall technique explores the support of $p(M_{\mathbf{v}} \mid \mathbf{y})$ better.

### Active and dormant elements

- $A = \{a_1, a_2, \ldots, a_k\}$, where $a_i \in \{1, 2, \ldots, p\}$. The elements of $A$ are called the *active* elements because they are the ones selected by the current submodel.
- The complement of $A$ is $D = \{1, \ldots, p\} \setminus A$; it contains the so-called *dormant* elements, since they are unused by the current model: $D = \{d_1, d_2, \ldots, d_{p-k}\}$, where $d_i \in \{1, 2, \ldots, p\}$.

Handling both model uncertainty and model-size uncertainty:

- Step 1 (birth-and-death): $(k^{(t)}, A^{(t)}) \sim p(k, A \mid \theta^{(t)}, \mathbf{y})$
- Step 2 (Gibbs sampling): $\theta^{(t)} \sim p(\theta \mid k^{(t-1)}, A^{(t-1)}, \mathbf{y})$
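The prevalence rule itself is a two-step selection: take the posterior-modal size $k^*$, then keep the $k^*$ atoms with the largest inclusion probabilities. The inclusion probabilities and size posterior below are illustrative numbers.

```python
# Illustrative posterior inclusion probabilities p_j for p = 4 atoms,
# and an illustrative size posterior p(k | y).
incl = [0.85, 0.35, 0.45, 0.10]
size_post = {1: 0.2, 2: 0.5, 3: 0.2, 4: 0.1}

# k* = argmax_k p(k | y)
k_star = max(size_post, key=size_post.get)

# Keep the k* atoms with the largest inclusion probabilities.
top = sorted(range(len(incl)), key=lambda j: incl[j], reverse=True)[:k_star]
v_prev = tuple(1 if j in top else 0 for j in range(len(incl)))
# k_star = 2, atoms 0 and 2 kept  ->  v_prev = (1, 0, 1, 0)
```

Here no inclusion probability other than $p_0$ reaches the fixed $1/2$ cutoff, so the median probability rule would keep only atom 0, while the prevalence rule still returns a size-2 model: the adaptive cutoff at work.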
### Extended prior with uncertainty on $k$ and $\mathbf{v}$

Full prior specification:

- A full prior for all the unknowns, including the model size:
  $$p(k, \boldsymbol{\beta}, \sigma^2, \boldsymbol{\alpha}) = p(k)\, p(\boldsymbol{\beta}, \sigma^2, \boldsymbol{\alpha} \mid k).$$
- A truncated Poisson prior on the model size $k$:
  $$p(k) \propto \frac{e^{-\lambda} \lambda^k}{k!}, \qquad k = 1, \ldots, p.$$
- An even fuller prior distribution, including the model indices:
  $$p(k, A, \boldsymbol{\beta}, \sigma^2, \boldsymbol{\alpha}) = p(k)\, p(A \mid k)\, p(\boldsymbol{\beta}, \sigma^2, \boldsymbol{\alpha} \mid k, A).$$

Rather than using a uniform $p(M_{\mathbf{v}})$, we instead use a locally uniform $p(A \mid k)$, which works better in the presence of collinearity.

### Detailed MCMC scheme for prevalence

Details of the prevalence construction:

1. Initialize $t = 0$ and the model size $k$; draw $A$ as a sample of $k$ elements from $\{1, 2, \ldots, p\}$; initialize the inclusion counts $p_1 = \cdots = p_p = 0$.
2. Repeat:
   - $t \leftarrow t + 1$
   - $A^{(t)} \leftarrow \text{BirthAndDeath}\big(k^{(t-1)}, A^{(t-1)}, \theta^{(t-1)}, \mathbf{y}\big)$
   - For $j = 1, \ldots, p$: if $j \in A^{(t)}$ then $p_j \leftarrow p_j + 1$
   - $k^{(t)} \leftarrow |A^{(t)}|$
   - $\theta^{(t)} \leftarrow \text{GibbsSampling}\big(k^{(t)}, A^{(t)}, \mathbf{y}\big)$
3. Until $t = T$.

### Birth-and-death process for model search

- Initialize time $\leftarrow 0$.
- Repeat:
  - Compute the death rates $\delta_j$ for $j = 1, \ldots, k$, and let $\delta = \sum_j \delta_j$.
  - time $\leftarrow$ time $+ \text{Exponential}\big(1/(1 + \delta)\big)$.
  - Draw birth $\sim \text{Bernoulli}\big(1/(1 + \delta)\big)$.
  - If birth $= 1$ (a dormant element becomes active): draw $j \sim \text{Uniform}\{d_1, \ldots, d_{p-k}\}$; set $k \leftarrow k + 1$, $v_j \leftarrow 1$, $A \leftarrow A \cup \{j\}$, $D \leftarrow D \setminus \{j\}$.
  - Else (an active element dies): draw $i \sim \text{Multinomial}(\delta_1/\delta, \ldots, \delta_k/\delta)$; set $k \leftarrow k - 1$, $v_i \leftarrow 0$, $A \leftarrow A \setminus \{i\}$, $D \leftarrow D \cup \{i\}$.
- Until the allotted run time is reached.

### Birth rate and death rate of the process

- Simulate a continuous-time birth-and-death process in discrete time, using an overall constant birth rate of 1.
- From the local uniformity of $p(A \mid k)$ and the Poissonness of $p(k)$, the death rate of element $i$ simplifies (up to the prior ratio $p(k-1)/p(k)$) to the marginal-likelihood ratio
  $$\delta_i = \frac{p(\mathbf{y} \mid k - 1,\, A \setminus \{i\},\, \sigma^2,\, \boldsymbol{\alpha})}{p(\mathbf{y} \mid k,\, A,\, \sigma^2,\, \boldsymbol{\alpha})}.$$
- From the normality of the likelihood function we get
  $$\delta_i = \frac{\big|\sigma^2 I_n + H_{\mathbf{v} \setminus i}\, A_{\mathbf{v} \setminus i}^{-1}\, H_{\mathbf{v} \setminus i}^{\mathsf T}\big|^{-1/2} \exp\!\big(-\tfrac{1}{2}\, \mathbf{y}^{\mathsf T} \big(\sigma^2 I_n + H_{\mathbf{v} \setminus i}\, A_{\mathbf{v} \setminus i}^{-1}\, H_{\mathbf{v} \setminus i}^{\mathsf T}\big)^{-1} \mathbf{y}\big)}{\big|\sigma^2 I_n + H_{\mathbf{v}}\, A_{\mathbf{v}}^{-1}\, H_{\mathbf{v}}^{\mathsf T}\big|^{-1/2} \exp\!\big(-\tfrac{1}{2}\, \mathbf{y}^{\mathsf T} \big(\sigma^2 I_n + H_{\mathbf{v}}\, A_{\mathbf{v}}^{-1}\, H_{\mathbf{v}}^{\mathsf T}\big)^{-1} \mathbf{y}\big)}.$$
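A single move of the birth-and-death update can be sketched as below. This is a structural sketch only: the death rates $\delta_i$ are supplied as plain numbers, where in the scheme above they would be the marginal-likelihood ratios.

```python
import random

# One birth-and-death move over the active set A (death rates are placeholders).
def birth_death_step(active, dormant, deltas, rng):
    total = sum(deltas[i] for i in active)
    if rng.random() < 1.0 / (1.0 + total):
        # Birth with probability 1/(1 + sum delta): activate a uniform dormant atom.
        j = rng.choice(sorted(dormant))
        active, dormant = active | {j}, dormant - {j}
    else:
        # Death: deactivate an active atom with probability proportional to delta_i.
        items = sorted(active)
        i = rng.choices(items, weights=[deltas[i] for i in items])[0]
        active, dormant = active - {i}, dormant | {i}
    return active, dormant
```

Each move changes the model size by exactly one, so the chain wanders over model sizes while the Gibbs step refreshes $(\boldsymbol{\beta}, \sigma^2, \boldsymbol{\alpha})$ between moves.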
## Examples and Applications

*Figures: one realization of the prevalence search; the distribution of atom importance; the estimated distribution of $p(k \mid \mathbf{y})$.*

### Comparison with other methods

Summary table on the sinc function using the RBF kernel:

| Dataset | PBF | SVR | RVR | |
|---|---|---|---|---|
| Sinc (Gaussian noise) | 0.378 | 0.326 | 0.232 | 452 |
| Sinc (Uniform noise) | 0.215 | 0.187 | 0.153 | 443 |

### Median vs. prevalence on orthogonal bases

The following table is based on the same sinc function, estimated using different orthogonal basis functions via both the median and the prevalence approaches.

| Basis set | Prev. size | Med. size | Prev. error | Med. error |
|---|---|---|---|---|
| Sine | 24 | 24 | 0.220 | 0.220 |
| Cosine | 30 | 30 | 0.203 | 0.203 |
| Legendre | 42 | 40 | 0.197 | 0.196 |
| Chebyshev | 76 | 82 | 0.238 | 0.238 |

The two coincide. The clear message is that prevalence and median coincide when the design is orthogonal, confirming the intuition.

*Figures: one realization of the prevalence search; one realization of the median; the distribution of atom prevalence; the distribution of model size.*

## Conclusion and Extensions

Strengths:

- Achieves both sparseness and fast, accurate prediction.
- Works well for multivariate and univariate regression.
- Performs a full Bayesian treatment rather than an approximate one.
- Provides a search technique with very good mixing.
- Is guaranteed to find a solution.
Weaknesses and extensions:

- The conditioning of the data matrix $H$ is not always good.
- Provide a theoretical proof of the computational insights.
- Consider cases with highly correlated predictor variables.
Model Selection Introduction to Predictive Model S 39 e Space and Optimality criterion vi odel 5 c 39 Optimal F39re icti ayesian Predictive Optilna Ity Examples no ion and Extensions Sparse Bayesian Learning DIFFERENT TYPES OF ATOMS AND EXPANSIONS 0 Traditional linear model with x x1 xpT 6 RP p 00 Bo 2 WV 11 0 Polynomial regression with x e R p X 30 ZBX 11 0 Kernel regression with x e Rp fX 60 Z BKx7 X 1 Model Selection Bayesian Optimal Pred 39 Introduction to Predictive Model Se tion Model Space and Optimality criterion Optimal F39re iction v39 el Spac h Bayesian Predictive Opti n 39 y Examples Com and Extensi S arse Bayesian Learning DATA MATRIX AND FULL MODEL 0 The full model can be written as y H 6 0 Where the data matrix H is defined by 1 h1X1 h2X1 hpX1 H 1 h1X2 h2X2 hpX2 it h1m h23ltn 3 hprxn 0 And the other elements are yy177yniramoBpTse17new Bayesian Optimal Predictive Model Selection Introduction to Predictive Model S 39 e Space and Optimality criterion vi odel 5 c 39 Optimal F39re icti ayesian Predictive Optilna Ity Examples no ion and Extensions Sparse Bayesian Learning COORDINATEWISE MODEL DESCRIPTION 0 Define the coordinatewise model index Vv1V2vp ve01 0 Consider selecting from among submodels of the form Mv 3 y Hv v 6 Where 1 if h is used by model M v O otherWIse o The model space M with 29 7 1 models is defined as M MV1 V6 01P and V73 00 0 Model Selection Bayesian Optimal Pred 39 Introduction to Predictive Model S 39 e Space and Optimality criterion vi odel 5 c 39 Optimal F39re icti ayesian Predictive Optilna Ity Examples no ion and Extensions Sparse Bayesian Learning OPTIMAL PREDICTIVE MODEL SELECTION 0 Optimal predictive selection seeks to select from M M arg A511in HMv where the risk function HMV is HWv EMU 9W AWN o The loss function is the squared error loss ynew7 999W ynew 7 99 quot o The estimated prediction is given by Ky I I View Z 39 hwtxnew 60 1 Model Selection Bayesian Optimal Pred 39 Introduction to Predictive Model Selection Model Spac m 
y iter o l el Spac rch ayesian Predictive Optimality to n and Exiensiot Sparse Bayesian Learning Optimal F39redic ion II J amples OU LINE 0 Introduction to Predictive Model Selection 0 Bayesian Predictive Optimality Bayesian Optimal Predictive Model Selection Introduction to Predictive Model Selection Optimal Prediction via Made 395 ace Search Examples Conclusion and Extensions o Specify pMV the prior in model space 0 Derive posterior probability of a model pley olt pylepMv 0 Where the marginal likelihood pylMV is given by pryle pylMV70p0lMVd0 Central role of pMVly Model selection and prediction are based on pMVly liaram lamnamlmeet Introduction to Predictive Model Selection Optimal Prediction via Mode 395 ace Search p es Conclusion and Extensions 39 39 The intuition might suggest that the best predictive model is the model with the highest posterior probability Mw am 341 Pley Some drawbacks of highest probability model 0 Correct if there are only 2 models in M o Requires considering all the models in M 0 Not necessarily the best when Nil 2 2 0 See Babieri and Berger 2004 for details jamaai limiting Wm Introduction to Predictive Model Selection Model Space and Optimality Criterion on s h i Optimal Prediction via 39pa ayesian Predictive Optima ty Examples Conclusion and Exiensi 2 Sparse Bayesian Learning PREDICTION THROUGH BAYESIAN MODEL AVERAGING Optimality of BMA prediction It is a known result that given a list of models the Bayes Model Average BMA prediction 2971 lAnew Z pMVKlyElYleK7xnewl k1 is optimal 0 Model description is lost in the averages o Computationally prohibitive Bayesian Optimal Predictive Model Selection Introduction to Predictive Model Selection Model Space and Optimality Criterion Optimal Prediction via Mode 5 ace h ayesian Predictive Optima ity mples Conclusion and Exiensi 2 Sparse Bayesian Learning Approximating Bayesian Model Averaging Prediction l Broth prediction apparent and impel nmnmtyzmt t ltmy l l l 39 1 o If accurate prediction is the 
only goal then one should tolerate the computational burden and loss of model description and adopt the BMA prediction 0 However if there is a need to repeated predictions with the best predictor model selection becomes the goal alteration to lElivlth It makes sense that if we really want to select a model rather than make predictions based on BMA the selected model should produce predictions as close to BMA as possible How am is time Bayesian Optimal Predictive Model Selection Introduction to Predictive Model Selection Model Space and Optimality Iter o l el Spac rch ayesian Predictive Optimality to n and Exiensiot Sparse Bayesian Learning Optimal F39redic ion II J amples OU LINE 0 Introduction to Predictive Model Selection 0 Sparse Bayesian Learning Bayesian Optimal Predictive Model Selection Introduction to Predictive Model Selection Optimal Prediction via Made Space Search Examples Conclusion and Extensions 2 0 Likelihood under Gaussian noise with isotropic variance 0 n 1 10043702 moi exp e lyi H llz o Sparsin intuitively means Constrain the space of 6 so that many 3 are zero 0 Sparsity naturally achieved by 61 norm on 6 or double exponential prior over 8 anmm amammam Introduction to Predictive Model Selection Model Space and Optimality Criterion Optimal F39re iction ode Space earc Bayesian Predictive Optima ity alnples on and Extensions Sparse Bayesian Learning RELEVANCE VECTOR REGRESSION TIPPING 2000 Relevance Vector Machine Intui 0 Simple conditional Gaussian prior over each 3 PWlozi quot31 07 0471 o Intuition of relevance vector machine If 04 gt 00 then B sharply peaked at O 0 Therefore atom i is irrelevant Bayesian Optimal Predictive Model Selection Introduction to Predictive Model Selection Model Space and Optimality Criterion Optimal F39re iction ode Space earc Bayesian Predictive Optimality amples 39 on and Extensione Sparse Bayesian Learning RELEVANCE VECTOR REGRESSION TIPPING 2000 0 Automatic relevance determination ARD hyperprior for sparsity 
induction a N gammaa b o Marginal prior on 3 pm p ilaipaidai Bayesian Optimal Predictive Model Selection Introduction to Predictive Model Selection Optimal Prediction via Model Space Seam Examples Conclusion and Extensions 7 emf RVM p Studentitdriven by aand b o The marginal prior p6 is therefore a product of Studentt and therefore a good device for sparsity Sparsity controlled through hyperparameters a and b 0 Integrating 6 out leaves us with an a dependent distribution therefore a device for controlling sparsity through ML 2 133mm amaar lawnmowerat Model Space and Optimality Criterion Bayesian Predictive Optimality clu ii Sparse Bayesian Learning GEOMETRY OF THE MARGINAL PRIOR o Marginal prior pm using Gammaa b hyperprior for a 0 Notice concentration around the axes sparsity pressure 0 Therefore choose a and b to achieve sparsity ERNEST PARFAIT FOKOUE Bayesian Optimal Predictive Model Selection Introduction to Predictive Model Selection Model Space and Optimality Criterion Optimal F39re iction ode Space earc Bayesian Predictive Optima ity alnples on and Extensions Sparse Bayesian Learning CONDITIONAL POSTERIOR OF THE WEIGHTS Conditional densities o The Conditional posterior of coefficients 6 is p la7az7y warm 0 Where 2 HTSH Aquot u EHTSy 0 With Sa2In and Adiaga1a2ap Bayesian Optimal Predictive Model Selection Introduction to Predictive Model Selection Uprimal F39re iciion 39a ch o The marginal likelihood pyla 02 is given by pryiai 02 y 0 8 HA HT 0 Where as before 3 azln and A diagoz1oz2 704p ammat earmm Introduction to Predictive Model Selection Optimal Prediction via Made 395 ace Search Examples Conclusion and Extensions ID TIoNS WITH TH gr o Ideally Bayes predictive distribution of the response pyly PYla702P ly7 annual Uzlyd dad02 o Often though all we hope to have is pyly 102 pylaazp lyaa2d A serious problem Very hard to find analytical expressions for pyly rangam Firrmar timers Introduction to Predictive Model Selection Optimal Prediction via Model Space Seam 
o In our search for an expression of $p(y_* \mid y)$, the intractability comes from
  $p(\alpha, \sigma^2 \mid y) \propto p(y \mid \alpha, \sigma^2)\, p(\alpha)\, p(\sigma^2)$
o Instead of seeking the whole distribution $p(\alpha, \sigma^2 \mid y)$, we can concentrate on a crude estimate $(\hat\alpha_{MP}, \hat\sigma^2_{MP})$, where
  $(\hat\alpha_{MP}, \hat\sigma^2_{MP}) = \arg\max_{\alpha, \sigma^2}\, p(\alpha, \sigma^2 \mid y)$

APPROXIMATE PREDICTIVE DISTRIBUTION

o $p(y_* \mid y) \approx p(y_* \mid \hat\alpha_{MP}, \hat\sigma^2_{MP}) = \int p(y_* \mid \beta, \hat\sigma^2_{MP})\, p(\beta \mid y, \hat\alpha_{MP}, \hat\sigma^2_{MP})\, d\beta$
o which is Gaussian and therefore tractable
o Thanks to Gaussianity, $p(y_* \mid y) \approx \mathcal{N}(y_* \mid \hat{y}_*, \hat\sigma^2_*)$, with
  $\hat{y}_* = \mu^\top h(x_{\mathrm{new}})$ and $\hat\sigma^2_* = \hat\sigma^2_{MP} + h(x_{\mathrm{new}})^\top \Sigma\, h(x_{\mathrm{new}})$

ANALYSIS OF $f(x) = \sin(x)/x$

The sinc function is an interesting example.
o $x \in [-10, 10]$
o Sample size $n = 100$
o Noise variance $\sigma^2 = 0.01$
o Gaussian kernel used

Figure: the sinc function.

RVM SOLUTION ON THE SINC FUNCTION

o $x \in [-10, 10]$, sample size $n = 100$, 1000 repetitions
o Noise variance $\sigma^2 = 0.01$, Gaussian kernel used
o Estimated noise level: 0.0093
o Number of relevant vectors: 6
o RVM regression test error (RMS): 0.0424558

Figure: RVM estimate of the sinc function.

STRENGTHS AND DRAWBACKS OF THE RVM

Strengths:
o Achieves both sparseness and fast, accurate prediction
o Works well for multivariate and univariate regression
o Based directly on the predictive distribution
o Flexible framework for both regression and classification
o "Does not require a tuning parameter"

Drawbacks of the RVM framework:
o Relevance is achieved somehow by manual pruning
o Does not take model size into account
o Does not provide a measure of the strength of relevance
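The plug-in predictive mean and variance are simple to compute once $\mu$ and $\Sigma$ are in hand. A small sketch with toy numbers (the posterior summaries below are invented for illustration, not taken from the slides):

```python
import numpy as np

def rvm_predict(h_new, mu, Sigma, sigma2_mp):
    """Plug-in Gaussian predictive: mean mu' h(x_new) and variance
    sigma2_MP + h(x_new)' Sigma h(x_new)."""
    return mu @ h_new, sigma2_mp + h_new @ Sigma @ h_new

# toy posterior summary over 2 basis atoms
mu = np.array([1.0, 2.0])
Sigma = np.eye(2)
h_new = np.array([1.0, 1.0])
y_star, var_star = rvm_predict(h_new, mu, Sigma, sigma2_mp=0.5)
# mean = 1 + 2 = 3, variance = 0.5 + 1 + 1 = 2.5
```

Note the two variance terms: $\hat\sigma^2_{MP}$ is irreducible noise, while $h^\top \Sigma h$ reflects remaining uncertainty about the weights.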
OUTLINE

2. Optimal Prediction via Model Space Search
o The Median Probability Model

PRIOR DISTRIBUTION OVER THE MODEL SPACE

o Assuming a priori that all the atoms are equally likely, what is the prior probability that atom $i$ is relevant, i.e. included in the model?
  $\Pr(\nu_i = 1 \mid \pi) = \pi$
o Prior probability of atom $i$: $p(\nu_i \mid \pi) = \pi^{\nu_i}(1 - \pi)^{1 - \nu_i}$
o Assuming a priori that all the atoms are independent,
  $p(\nu \mid \pi) = \prod_{i=1}^{p} \pi^{\nu_i}(1 - \pi)^{1 - \nu_i}$

o The model size for the model indexed by $\nu$ is defined by
  $K(\nu) = \sum_{i=1}^{p} \nu_i$
o The prior distribution of the model size is binomial, with
  $\Pr(K = k \mid \pi, p) = \binom{p}{k}\, \pi^{k}(1 - \pi)^{p - k}$
o The prior mode of the distribution of the model size is $k^* = \lfloor (p + 1)\pi \rfloor$
o We can see that $\pi$ plays a crucial role

o The a priori effect of $\pi$: large values of $\pi$ indicate our belief in large models
o By the same token, a small value of $\pi$ indicates our belief in small models
o Hence $\pi$ provides a device for controlling the grade of sparsity (parsimony)
o A key question naturally arises: how does one go about choosing $\pi$?

PRIOR OVER MODEL SPACE

o Fix the value of $\pi$ from expert knowledge
o Estimate $\pi$ using empirical Bayes on the full model
o Put a prior on $\pi$ and explore its posterior via MCMC
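The binomial prior on the model size, and the role of $\pi$, can be illustrated directly. A minimal sketch; $p$ and $\pi$ below are arbitrary illustrative values, not from the slides:

```python
from math import comb

def size_prior(k, p, pi):
    """Binomial prior on model size:
    Pr(K = k | pi, p) = C(p, k) * pi^k * (1 - pi)^(p - k)."""
    return comb(p, k) * pi**k * (1 - pi)**(p - k)

# illustrative values: p = 10 candidate atoms, prior inclusion probability pi = 0.3
p, pi = 10, 0.3
probs = [size_prior(k, p, pi) for k in range(p + 1)]
mode = max(range(p + 1), key=probs.__getitem__)
# prior mode of the model size: floor((p + 1) * pi) = floor(3.3) = 3
```

Small $\pi$ pushes the prior mode toward small models, and with $\pi = 1/2$ every one of the $2^p$ models gets the same prior weight.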
o Completely noninformative $p(M_\nu)$: by setting $\pi = 1/2$ we express the belief that our prior inclusion probability for each variable is a pure random guess. As a result,
  $p(\nu) = \left(\tfrac{1}{2}\right)^{p}$,
  which is simply a uniform prior over the $2^p$ models that constitute the model space $\mathcal{M}$.

POSTERIOR PROBABILITY OF INCLUSION

Let $\mathcal{M} = \{M_\nu : \nu \in \{0, 1\}^p\}$ be the model space.
o Identify all models that contain atom $h_i$
o Compute the posterior probability of each such model
o Compute the sum of those posterior probabilities

Definition: the posterior inclusion probability for basis element $h_i$ is
  $p_i = \sum_{\nu\,:\,\nu_i = 1} p(M_\nu \mid y)$

THE MEDIAN PROBABILITY MODEL

Definition: if it exists, the median probability model $M_{\nu_{\mathrm{med}}}$ is defined by
  $\nu_i^{\mathrm{med}} = 1$ if $p_i \ge 1/2$, and $0$ otherwise.   (1)

o The median probability model keeps the atoms that appear more than half the time in the set of plausible models
o The median probability model does not always exist: the fixed "one half" cutoff may not be achieved
o Our technique uses a more flexible, adaptive cutoff

OPTIMALITY OF THE MEDIAN PROBABILITY MODEL

Barbieri and Berger (2004) show that the median probability model prediction is the best approximation to the Bayes model average prediction for orthogonal designs and nested designs.

Main drawbacks:
o Due to the fixed cutoff, the MPM does not always exist
o Most of the search techniques used do not mix well

It is therefore interesting to provide an extension of the MPM with a more flexible, adaptive cutoff that guarantees existence.

OUTLINE

2. Optimal Prediction via Model Space Search
o The Prevalence Model

ELEMENTS OF THE PREVALENCE MODEL

Definition: the prevalence model $M_{\nu_{\mathrm{prev}}}$ is defined through the index $\nu_{\mathrm{prev}}$ with
  $\nu_i^{\mathrm{prev}} = 1$ if $p_i \in P_{\mathrm{prev}}$, and $0$ otherwise,   (2)
where $P_{\mathrm{prev}}$ is the set of the $\hat{k}$ largest values of $p_i$ and
  $\hat{k} = \arg\max_{k}\, p(k \mid y).$   (3)

COMPARISON BETWEEN MEDIAN AND PREVALENCE

o For orthogonal and nested designs, the median probability and prevalence models coincide
o For nonorthogonal designs, the prevalence model emerges as superior, and often exists where the median fails to find any model at all
o Elements of analytical proofs are in progress

THE PREVALENCE MODEL

We iterate over models of the optimal size.
o The prevalence model always exists
o Intuitively better, because it accounts for model size
o Should have a connection to AIC and BIC, which penalize model size
o Later details show that the overall technique explores the support of $p(M_\nu \mid y)$ better

ACTIVE AND DORMANT ELEMENTS

o $A = \{a_1, a_2, \ldots, a_k\}$, where $a_j \in \{1, 2, \ldots, p\}$. The elements of $A$ are called the active elements, because they are the ones selected by the current submodel.
o The complement of $A$ is $D = I \setminus A$ and contains the so-called dormant elements, since they are unused by the current model: $D = \{d_1, d_2, \ldots, d_{p-k}\}$, where $d_j \in \{1, 2, \ldots, p\}$.

Model uncertainty and model size uncertainty:
o Step 1 (birth-and-death): $(k_t, A_t) \sim p(k, A \mid \theta_{t-1}, y)$
o Step 2 (Gibbs sampling): $\theta_t \sim p(\theta \mid k_t, A_t, y)$
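The median and prevalence constructions can be contrasted on a toy posterior over models. The model list, posterior weights, and model-size posterior below are made up for illustration; the point is that the prevalence model exists even when no inclusion probability reaches the one-half cutoff:

```python
def inclusion_probs(models, post):
    """p_i = sum of p(M_nu | y) over all models M_nu that include atom i."""
    p = len(models[0])
    return [sum(w for nu, w in zip(models, post) if nu[i] == 1)
            for i in range(p)]

def median_model(p_incl):
    """Median probability model: keep atom i iff p_i >= 1/2 (may be empty)."""
    return tuple(int(q >= 0.5) for q in p_incl)

def prevalence_model(p_incl, size_post):
    """Prevalence model: k_hat = argmax_k p(k | y), then keep the k_hat
    atoms with the largest inclusion probabilities (always exists)."""
    k_hat = max(size_post, key=size_post.get)
    top = sorted(range(len(p_incl)), key=p_incl.__getitem__, reverse=True)[:k_hat]
    return tuple(int(i in top) for i in range(len(p_incl)))

# toy posterior over 3 atoms: no atom reaches inclusion probability 1/2
models = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
post   = [0.28,      0.27,      0.25,      0.20]
p_incl = inclusion_probs(models, post)         # approx [0.48, 0.47, 0.25]
nu_med = median_model(p_incl)                  # (0, 0, 0): the median is empty
size_post = {1: 0.8, 2: 0.2}                   # p(k | y); the mode is k_hat = 1
nu_prev = prevalence_model(p_incl, size_post)  # (1, 0, 0)
```

Here the median probability model comes back empty, while the prevalence model still returns the single most prevalent atom.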
EXTENDED PRIOR WITH UNCERTAINTY ON $k$ AND $\nu$

Full prior specification:
o A full prior for all the unknowns, including the model size:
  $p(k, \beta, \sigma^2, \alpha) = p(k)\, p(\beta, \sigma^2, \alpha \mid k)$
o A truncated Poisson prior on the model size:
  $p(k) \propto \lambda^{k} e^{-\lambda} / k!$ for $k = 1, \ldots, p$
o An even fuller prior distribution, including the model indices:
  $p(k, A, \beta, \sigma^2, \alpha) = p(k)\, p(A \mid k)\, p(\beta, \sigma^2, \alpha \mid k, A)$
o Rather than using a uniform $p(M_\nu)$, we instead use a locally uniform $p(A \mid k)$; this works better in the presence of collinearity

DETAILED MCMC SCHEME FOR PREVALENCE

Details of the prevalence construction:
o Initialize $t = 0$ and the model size $k$
o Sample $k$ elements from $\{1, 2, \ldots, p\}$ to form $A^{(0)}$
o Initialize the inclusion counts $p = (0, \ldots, 0)$
o Repeat:
  o $t \leftarrow t + 1$
  o $A^{(t)} \leftarrow \mathrm{BirthAndDeath}(A^{(t-1)}, \theta^{(t-1)}, y)$
  o For $j = 1$ to $p$: if $j \in A^{(t)}$ then $p_j \leftarrow p_j + 1$
  o $p = (p_1, p_2, \ldots, p_p)$
  o $k^{(t)} \leftarrow |A^{(t)}| = \mathrm{length}(A^{(t)})$
  o $\theta^{(t)} \leftarrow \mathrm{GibbsSampling}(\theta^{(t-1)}, A^{(t)}, y)$
o Until $t = T$

BIRTH-AND-DEATH PROCESS FOR MODEL SEARCH

o Initialize $\mathrm{time} \leftarrow 0$
o Repeat:
  o Compute $\delta_j$ for $j = 1, \ldots, k$, and set $\delta = \sum_{j=1}^{k} \delta_j$
  o $\mathrm{time} \leftarrow \mathrm{time} + \mathrm{Exponential}\!\left(\tfrac{1}{\mu + \delta}\right)$
  o $\mathrm{birth} \sim \mathrm{Bernoulli}\!\left(\tfrac{\mu}{\mu + \delta}\right)$
  o If $\mathrm{birth} = 1$:
    o $j \sim \mathrm{Uniform}\{d_1, \ldots, d_{p-k}\}$
    o $k \leftarrow k + 1$, $\nu_j \leftarrow 1$, $A \leftarrow A \cup \{j\}$, $D \leftarrow D \setminus \{j\}$
  o Else:
    o $j \sim \mathrm{Multinomial}(\delta_1/\delta, \ldots, \delta_k/\delta)$
    o $k \leftarrow k - 1$, $\nu_{a_j} \leftarrow 0$, $A \leftarrow A \setminus \{a_j\}$, $D \leftarrow D \cup \{a_j\}$
o Until the accumulated time exceeds the fixed run length

BIRTH RATE AND DEATH RATE OF THE PROCESS

o Simulate a continuous-time birth-and-death process in discrete time, using an overall constant birth rate $\mu$
o From the local uniformity of $p(A \mid k)$ and the Poissonness of $p(k)$, the death rate of element $i$ simplifies to
  $\delta_i = \mu\, \dfrac{p(y \mid k-1, A \setminus \{a_i\}, \sigma^2, \alpha)}{p(y \mid k, A, \sigma^2, \alpha)}$
o From the normality of the likelihood function, we get
  $\delta_i = \mu\, \dfrac{\left|\sigma^2 I_n + H_{\nu_{-i}} A_{-i}^{-1} H_{\nu_{-i}}^\top\right|^{-1/2} \exp\!\left(-\tfrac{1}{2}\, y^\top \left(\sigma^2 I_n + H_{\nu_{-i}} A_{-i}^{-1} H_{\nu_{-i}}^\top\right)^{-1} y\right)}{\left|\sigma^2 I_n + H_{\nu} A^{-1} H_{\nu}^\top\right|^{-1/2} \exp\!\left(-\tfrac{1}{2}\, y^\top \left(\sigma^2 I_n + H_{\nu} A^{-1} H_{\nu}^\top\right)^{-1} y\right)}$
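The death rates driving the birth-and-death search are ratios of Gaussian marginal likelihoods, so they are cheap to sketch. This is an illustration, not the author's implementation: the toy data below (one truly relevant atom out of three) are hypothetical, and the constant birth rate is taken as 1 for simplicity.

```python
import numpy as np

def log_marginal(H, y, alpha, sigma2):
    """log N(y | 0, sigma^2 I_n + H diag(alpha)^{-1} H')."""
    n = len(y)
    C = sigma2 * np.eye(n) + H @ np.diag(1.0 / alpha) @ H.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

def death_rates(H, y, alpha, sigma2, active, mu_rate=1.0):
    """delta_i = mu * p(y | A without a_i) / p(y | A): atoms whose removal
    barely changes the marginal likelihood die fastest."""
    full = log_marginal(H[:, active], y, alpha[active], sigma2)
    delta = []
    for i in range(len(active)):
        keep = active[:i] + active[i + 1:]
        drop = log_marginal(H[:, keep], y, alpha[keep], sigma2)
        delta.append(mu_rate * np.exp(drop - full))
    return np.array(delta)

rng = np.random.default_rng(1)
H = rng.normal(size=(20, 3))          # 3 candidate atoms, 20 observations
y = 2.0 * H[:, 0] + 0.01 * rng.normal(size=20)
delta = death_rates(H, y, np.ones(3), sigma2=0.01, active=[0, 1, 2])
# the truly relevant atom 0 is by far the least likely to die
```

Dropping the relevant atom collapses the marginal likelihood, so its death rate is essentially zero, while the two spurious atoms keep rates of order one and are quickly cycled out.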
OUTLINE

3. Examples, Conclusion and Extensions
o Examples and applications

EXAMPLES AND APPLICATIONS

Figure: one realization of the prevalence model. Figure: distribution of atom importance. Figure: estimated distribution of $p(k \mid y)$.

COMPARISON WITH OTHER METHODS

Summary table on the sinc function, using the RBF kernel.

Table: prevalence vs. median on the sinc function.

| Dataset         | Prev  | SVR   | RVR      |
|-----------------|-------|-------|----------|
| Sinc (Gaussian) | 0.378 | 0.326 | 0.232452 |
| Sinc (Uniform)  | 0.215 | 0.187 | 0.153443 |

The following table is based on the same sinc function, estimated using different orthogonal basis functions, via both the median and the prevalence models.

| Basis set | Prev size | Med size | Prev error | Med error |
|-----------|-----------|----------|------------|-----------|
| Sine      | 24        | 24       | 0.220      | 0.220     |
| Cosine    | 30        | 30       | 0.203      | 0.203     |
| Legendre  | 42        | 40       | 0.197      | 0.196     |
| Chebyshev | 76        | 82       | 0.238      | 0.238     |

The two essentially coincide. The clear message is that the prevalence and median models coincide when the design is orthogonal, a confirmation of the intuition.

Figure: one realization of the prevalence model. Figure: one realization of the median model. Figure: distribution of atom prevalence. Figure: distribution of model size.

OUTLINE

3. Examples, Conclusion and Extensions
o Conclusion and Extensions

CONCLUSION AND EXTENSIONS

Strengths:
o Achieves both sparseness and fast, accurate prediction
o Works well for multivariate and univariate regression
o Performs a full Bayesian treatment rather than an approximation
o Provides a search technique with very good mixing
o Guaranteed to find a solution

Weaknesses and extensions:
o The conditioning of the data matrix $H$ is not always good
o Provide theoretical proofs of the computational insights
o Consider cases with highly correlated predictor variables
