Adv. Topics ISYE 8843
These class notes were uploaded by Maryse Thiel on Monday, November 2, 2015. They belong to ISYE 8843 at Georgia Institute of Technology - Main Campus, taught by Staff in Fall.
1 Hierarchical Bayes and Empirical Bayes (ML II Method)

Hierarchical Bayes and empirical Bayes are related by their goals, but quite different in the methods by which those goals are achieved. The attribute "hierarchical" refers mostly to the modeling strategy, while "empirical" refers to the methodology. Both methods are concerned with specifying the distribution at the prior level: hierarchical Bayes does so via Bayesian inference involving additional degrees of hierarchy (hyperpriors and hyperparameters), while empirical Bayes uses the data more directly.

In expanding Bayesian models and inference to more complex problems, going beyond the simple likelihood/prior/posterior scheme, a hierarchy of models may be needed. The parameters of interest enter the model via their realizations, which are modeled in the same way as the measurements. The common name "parameter population distribution" is indicative of the nature of the approach.

1.1 Hierarchical Bayesian Analysis

Hierarchical Bayesian analysis is a convenient representation of a Bayesian model, in particular of the prior π, via a conditional hierarchy of so-called hyperpriors π₁, ..., π_{n+1}:

  π(θ) = ∫ ··· ∫ π₁(θ|θ₁) π₂(θ₁|θ₂) ··· πₙ(θ_{n−1}|θₙ) π_{n+1}(θₙ) dθ₁ ··· dθₙ.   (1)

Operationally, the model

  X|θ ~ f(x|θ),  θ|θ₁ ~ π₁(θ|θ₁),  θ₁|θ₂ ~ π₂(θ₁|θ₂),  ...,  θₙ ~ π_{n+1}(θₙ)   (2)

is equivalent to the model X|θ ~ f(x|θ), θ ~ π(θ), as far as inference on θ is concerned. Notice that in the hierarchy of data, parameters, and hyperparameters, X ≻ θ ≻ θ₁ ≻ ··· ≻ θₙ, X and the θᵢ are independent given θ; that is, [X|θ, θ₁, ..., θₙ] = [X|θ], where [·|·] denotes a conditional distribution. The joint distribution of (X, θ, θ₁, ..., θₙ), which by definition is

  [X, θ, θ₁, ..., θₙ] = [X|θ, θ₁, ..., θₙ] [θ|θ₁, ..., θₙ] [θ₁|θ₂, ..., θₙ] ··· [θ_{n−1}|θₙ] [θₙ],

can therefore be represented as

  [X, θ, θ₁, ..., θₙ] = [X|θ] [θ|θ₁] [θ₁|θ₂] ··· [θ_{n−1}|θₙ] [θₙ];

thus, to fully specify the model, only the neighboring conditionals [X|θ], [θ|θ₁], [θ₁|θ₂], ..., [θ_{n−1}|θₙ] and the closure distribution [θₙ] are needed.

Why, then, decompose the prior as in (1) and use the model (2)? Here are some of the reasons:
• Modeling requirements may lead to a hierarchy in the prior; for example, Bayesian models in meta-analysis.
• The prior information may be separated into a structural part and a subjective/noninformative part at a higher level of the hierarchy.
• Robustness and objectivity: let the data "talk" about the hyperparameters.
• Calculational issues: utilizing hidden mixtures, mixture priors, missing data, the MCMC format.

Sometimes it is not computationally feasible to carry out the analysis by reducing the sequence of hyperpriors in (1) to a single prior as in (2). Rather, the Bayes rule is obtained by using Fubini's theorem, as a repeated integral with respect to more convenient conditional distributions. Here is the result for the model with conditional prior π₁(θ|θ₁) and a single hyperprior π₂(θ₁). Suppose the hierarchical model is given as X|θ ~ f(x|θ), θ|θ₁ ~ π₁(θ|θ₁), and θ₁ ~ π₂(θ₁); then the posterior distribution can be written as

  π(θ|x) = ∫ π(θ|x, θ₁) π(θ₁|x) dθ₁.

The densities under the integral are

  π(θ|x, θ₁) = f(x|θ) π₁(θ|θ₁) / m₁(x|θ₁)  and  π(θ₁|x) = m₁(x|θ₁) π₂(θ₁) / m(x),

where m₁(x|θ₁) = ∫_Θ f(x|θ) π₁(θ|θ₁) dθ is the conditional marginal likelihood and m(x) = ∫ m₁(x|θ₁) π₂(θ₁) dθ₁ is the marginal. Now, for any function h of the parameter,

  E^{θ|x} h(θ) = E^{θ₁|x} [ E^{θ|θ₁,x} h(θ) ].   (3)

Example. Suppose you poll n people about their favorite presidential candidate, and X of them favor candidate A. The likelihood is X|p ~ Bin(n, p), and the proportion p is the parameter of interest. You believe that the proportion is close to 1/2, but you are not quite sure. An appropriate prior on p is Beta(k, k), k ∈ N (see Figure 1a): it is symmetric about 1/2, but you are reluctant to specify the natural number k. Thus p|k ~ Beta(k, k). Finally, you put a hyperprior on k, π₂(k) ∝ 1/(k(2k−1)); the hyperprior probability mass function is in fact

  π₂(k) = 1/(2 log 2) · 1/(k(2k−1)),  k = 1, 2, ...

What is the Bayes estimator for p if n = 20 and X = 12? The unconditional prior on p is

  π(p) = Σ_{k=1}^∞ [ p^{k−1}(1−p)^{k−1} / B(k, k) ] · 1/(2 log 2 · k(2k−1)) = (1 − |1 − 2p|) / (4 log 2 · p(1−p)).   (4)

This prior, depicted in Figure 1b, is symmetric about 1/2. The posterior does not have a closed form (it can be expressed in terms of special functions), and the Bayes rule has to be found by numerical integration:

  δπ(12) = ∫₀¹ p f(12|p) π(p) dp / ∫₀¹ f(12|p) π(p) dp = 0.581368.

Figure 1: (a) Beta(k, k) densities for k = 1, 2, 3, 5, 10, and 20; (b) the unconditional prior π(p) from (4).
Mathematica programming is particularly convenient in this case, and the integrals have an exact solution for given n and X; in our case the exact result is 288241/495798 ≈ 0.581368.

The above approach to producing the Bayes estimator is direct: the unconditional prior is obtained as a mixture of Betas, i.e., we utilized the transition from the hierarchy in (1) to the single prior in (2). Now we consider the same problem via the hierarchical Bayes calculation discussed previously, i.e., via equation (3). The Bayes rule is

  δπ(x) = E^{k|x} E^{p|k,x} p,

where the expectation E^{k|x} is taken with respect to π₂(k|x) and E^{p|k,x} with respect to π₁(p|k, x). The distribution of p|k, x is again Beta,

  π₁(p|k, x) ∝ f(x|p) π₁(p|k) ∝ p^{x+k−1}(1−p)^{n−x+k−1},

i.e., Beta with parameters x+k and n−x+k, so E^{p|k,x} p = (x+k)/(n+2k). The distribution π₂(k|x) = m₁(x|k) π₂(k)/m(x) does not have a closed form. Here m₁(x|k) = ∫₀¹ f(x|p) π₁(p|k) dp is Beta-Binomial, given by

  m₁(x|k) = C(n, x) · B(x+k, n−x+k)/B(k, k) = C(n, x) · [Γ(2k) Γ(k+x) Γ(k+n−x)] / [Γ(k)² Γ(2k+n)].

Of course, m₁(x|k) can be simplified a bit more. Now, the marginal is m(x) = Σ_{k=1}^∞ m₁(x|k) π₂(k), and it does not have a closed form, although it can be expressed in terms of hypergeometric ₚF_q special functions.¹ The Bayes rule

  δπ(x) = [ Σ_{k=1}^∞ (x+k)/(n+2k) · m₁(x|k) π₂(k) ] / [ Σ_{k=1}^∞ m₁(x|k) π₂(k) ]

is a ratio of two hypergeometric ₚF_q sums with argument z = 1, thus representing an infinite sum of ratios of Gamma functions. This ratio is easily evaluated numerically (see the Mathematica notebook). The result for n = 20 and X = 12 is δ(12) = 288241/495798 ≈ 0.581368. Notice the slight shrinkage toward 1/2, compared with the MLE 12/20 = 0.6.

¹ The hypergeometric function is defined as ₚF_q(a₁, ..., aₚ; b₁, ..., b_q; z) = Σ_{n=0}^∞ [(a₁)ₙ ··· (aₚ)ₙ] / [(b₁)ₙ ··· (b_q)ₙ] · zⁿ/n!, where (a)ₙ = a(a+1)···(a+n−1) is the Pochhammer symbol, with (a)₀ = 1.

You may have noticed that, even in a simple hierarchy as in the previous example, the Bayesian analysis may not be computationally easy. This is true even if we have a perfect conjugate structure at the different levels of the hierarchy, as in normal models. The following theorem is adapted from Berger (1985) and illustrated on the IQ adventures of our old friend Jeremy.

Berger (1985), Section 4.6, pages 180–195, contains an excellent account of hierarchical models with detailed proofs. The following model, in addition to its educational value, can be quite useful as a modeling tool if one believes in normality at the various stages of the hierarchy.

Assume that X = (X₁, ..., Xₚ)′ is a p-dimensional observation with multivariate normal likelihood, X|θ ~ MVNₚ(θ, σ²I). The parameter θ = (θ₁, ..., θₚ)′ is of interest, and the positive scalar σ² is assumed known. The first-stage prior on θ, π₁(θ|μ_π, σ²_π), is multivariate normal MVNₚ(μ_π 1, σ²_π I), where μ_π ∈ R and 1 is the p × 1 vector of ones. The hyperparameter in this case is two-dimensional, λ = (μ_π, σ²_π)′. To complete the model, assume π₂(λ) = π₂₁(μ_π) π₂₂(σ²_π), with π₂₁(μ_π) = N(β, τ²) and an appropriate π₂₂(σ²_π). Of course, β, τ², and any parameters in π₂₂(σ²_π) are assumed known, i.e., the hierarchy stops here. The following theorem gives an explicit (up to a univariate numerical integration) Bayes estimator of θ and its covariance matrix.

Theorem 1.1 The posterior mean is

  δπ(x) = E^{θ|x} θ = E^{σ²_π|x} [ μ(σ²_π, x) ],

where, with x̄ = (1/p) Σᵢ xᵢ,

  μ(σ²_π, x) = x − [σ²/(σ² + σ²_π)] (x − x̄ 1) − [σ²/(σ² + σ²_π + pτ²)] (x̄ − β) 1.   (5)

The posterior covariance matrix is

  V = E^{σ²_π|x} [ (σ² σ²_π/(σ² + σ²_π)) I + (σ⁴ τ² / ((σ² + σ²_π)(σ² + σ²_π + pτ²))) J + (μ(σ²_π, x) − δπ(x))(μ(σ²_π, x) − δπ(x))′ ],

where J = 11′ is the p × p matrix of ones. The distribution of σ²_π|x, needed for the above expectations, satisfies

  π(σ²_π|x) ∝ π₂₂(σ²_π) · (σ² + σ²_π)^{−(p−1)/2} (σ² + σ²_π + pτ²)^{−1/2} · exp{ −s²/(2(σ² + σ²_π)) − p(x̄ − β)²/(2(σ² + σ²_π + pτ²)) },

where s² = Σᵢ (xᵢ − x̄)².

The expectation with respect to σ²_π|x needs to be carried out numerically. It is important that the marginal posterior π₂₂(σ²_π|x) be proper; this is true whenever the closure prior π₂₂(σ²_π) is bounded and p ≥ 3.

Exercise. Suppose that Jeremy, an IQ-concerned fellow, has taken 5 IQ tests in the last 5 years and has obtained the vector of scores X = (102, 112, 96, 109, 98)′. Assume that each measurement Xᵢ ~ N(θᵢ, 80), i = 1, ..., 5, and that the θᵢ represent realizations of a random variable describing Jeremy's true ability. Unlike before, we assume that his true ability randomly changes in time; however, the underlying law from which such time-varying abilities are generated is common. Thus θᵢ ~ N(μ_π, σ²_π), i = 1, ..., 5.
Finally, the model is closed by assuming μ_π ~ N(110, 120) and π₂₂(σ²_π) = 1. Find the Bayes estimator of θ and the covariance matrix of the estimate.

Solution: The estimator of θ can be found coordinatewise. Theorem 1.1 is directly applicable, with σ² = 80, β = 110, τ² = 120, and p = 5. The result (see the MATHEMATICA program jeremy.nb on the web site) is

  θ̂ = (104.645, 110.239, 101.288, 108.561, 102.407)′

and

  V = [ 51.1158   8.1635   5.28252  7.62331  5.64264
         8.1635  62.1226   2.63978  1.46078  4.48102
         5.28252  2.63978  51.6211  3.4326   6.33962
         7.62331  1.46078  3.4326  57.2654   4.8295
         5.64264  4.48102  6.33962  4.8295  50.8602 ].

Figure 2: Marginal posterior π₂₂(σ²_π|x) when π₂₂(σ²_π) = 1 in the IQ example.

Figure 2 shows π₂₂(σ²_π|x) when π₂₂(σ²_π) = 1. Notice that even though the closure prior π₂₂(σ²_π) = 1 is improper, the marginal posterior is a proper density, although quite flat.

1.2 Empirical Bayes (ML II Method)

Empirical Bayes has several formulations. The original formulation assumes that past values of Xᵢ and of the corresponding parameters θᵢ are known to the statistician, who then, on the basis of a current observation X_{n+1}, tries to make inference about the unobserved θ_{n+1}. Of course, the parameters θᵢ are seldom known; however, it may be assumed that the past and current θ's are realizations from the same unknown prior distribution. Empirical Bayes is thus an approach to inference in which the observations are used to select the prior, usually via the marginal distribution; once the prior is specified, the inference proceeds in a standard Bayesian fashion.

The use of the data to estimate the prior, in addition to their subsequent use in the inference, is criticized by subjectivists, who consider the prior information exogenous to the observations. The repeated use of the data is also loaded with perils, since it can underestimate modeling errors: any data set will be complacent with a model that used that same data set to specify some of its features.

An interesting example of empirical Bayes reasoning can be found in von Mises (1943). He considers an inspection problem: each of 16 batches of material is sampled, from every batch 5 items are inspected, and the number X of defective items is recorded. Given the batch proportion of defectives θ, X|θ ~ Bin(5, θ), and θ varies from batch to batch according to a prior distribution taken to be Beta(α, β). The empirical Bayes analysis proceeds in three steps:

(i) Identify the marginal distribution of X, here Beta-Binomial,

  m(x) = C(5, x) B(α + x, β + 5 − x)/B(α, β),  x = 0, 1, ..., 5.

(ii) Estimate α and β in the marginal. The method of moments is most straightforward: the theoretical moments EX = 5α/(α+β) and EX² are matched to the empirical moments based on the historic data, m₁ = (1/16) Σᵢ₌₁¹⁶ xᵢ = 0.875 and m₂ = (1/16) Σᵢ₌₁¹⁶ xᵢ² = 1.625, and the equations are solved with respect to α and β. The solutions are

  α̂ = m₁(5m₁ − m₂) / (5m₂ − 4m₁² − 5m₁) = 3.5,  β̂ = (5 − m₁)(5m₁ − m₂) / (5m₂ − 4m₁² − 5m₁) = 16.5.

(iii) Express P(0.1 ≤ θ ≤ 0.2) in terms of incomplete Beta functions, with the estimated hyperparameters α̂ and β̂ in place of α and β. For example, in MATHEMATICA, Beta[0.2, 3.5, 16.5]/Beta[3.5, 16.5] = 0.658454 evaluates P(θ ≤ 0.2).

Unlike the von Mises example, the original Robbins formulation of empirical Bayes was nonparametric. In the following example, the Bayes rule with respect to an unknown prior is expressed in terms of the marginal distribution. Nonparametric empirical Bayes then uses the historic data to estimate the marginal distribution in a nonparametric fashion; the estimated marginal is plugged into the formal Bayes rule.

Example. Assume that X|θ ~ Poi(θ) and that π(θ) is to be specified. The Bayes rule is

  δπ(x) = ∫ θ (e^{−θ} θˣ/x!) π(θ) dθ / ∫ (e^{−θ} θˣ/x!) π(θ) dθ = (x+1) · [ ∫ (e^{−θ} θ^{x+1}/(x+1)!) π(θ) dθ ] / [ ∫ (e^{−θ} θˣ/x!) π(θ) dθ ] = (x+1) m(x+1) / m(x),

i.e., the Bayes rule depends on the model only via the marginal (prior predictive) distribution m.
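This identity is easy to check numerically. The sketch below assumes, for concreteness only, the convenient prior θ ~ Exp(1) (not part of the example in the text), for which the marginal is Geometric, m(x) = (1/2)^{x+1}, and the posterior mean is (x+1)/2 by Poisson–Gamma conjugacy.

```python
import math

# Check of delta_pi(x) = (x+1) m(x+1)/m(x) for the Poisson model,
# under the assumed test prior theta ~ Exp(1).

def marginal(x, N=100000, upper=40.0):
    """m(x) = integral of Poisson(x | t) * exp(-t) dt, midpoint rule on [0, upper]."""
    h = upper / N
    total = 0.0
    for i in range(N):
        t = (i + 0.5) * h
        log_pois = -t + x * math.log(t) - math.lgamma(x + 1)
        total += math.exp(log_pois - t) * h
    return total

x = 3
m_x, m_x1 = marginal(x), marginal(x + 1)
bayes_rule = (x + 1) * m_x1 / m_x   # should reproduce the posterior mean (x+1)/2 = 2
```

Here m_x should be close to (1/2)⁴ = 0.0625, and bayes_rule close to 2, the exact posterior mean.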
Let Xᵢ|θᵢ ~ Poi(θᵢ), i = 1, ..., n+1, with all the θᵢ having the same prior distribution. We are interested in estimating θ_{n+1} using X_{n+1} and X₁, ..., Xₙ. The estimator X̄ cannot be used, since the underlying θ's are different; the MLE of θ_{n+1} is X_{n+1}. The (nonparametric) empirical Bayes rule is

  δ(x_{n+1}) = (x_{n+1} + 1) m̂(x_{n+1} + 1) / m̂(x_{n+1}),  where  m̂(x) = #{ i ∈ {1, ..., n+1} : xᵢ = x } / (n + 1)

is the relative frequency of the value x among x₁, ..., x_{n+1}. The performance of the estimator can be improved if the estimators of the marginal are smoothed. The MATLAB file eb.m demonstrates the NPEB estimator. A description of this m-file and the simulations in it will be added soon; in the meanwhile, please take a look at the MATLAB file, since it is annotated.

Now consider the same setup, Xᵢ|θᵢ ~ Poi(θᵢ), but this time the prior distribution of the θ's is assumed known up to a hyperparameter. Assume that the θᵢ have an exponential distribution with density π(θ|λ) = λ e^{−λθ}, θ ≥ 0, λ > 0. A Negative Binomial is the marginal distribution for a Poisson likelihood and a Gamma prior; when the shape parameter of the Gamma prior is 1, as for this exponential prior, the Negative Binomial becomes the Geometric distribution (see the Handout):

  m(x|λ) = (λ/(1+λ)) (1/(1+λ))ˣ,  x = 0, 1, 2, ...

The MLE of λ based on x₁, ..., xₙ is λ̂ = 1/x̄, since under the marginal E X = 1/λ. The Bayes estimator of θ is (x+1)/(λ+1), because of the Poisson–Gamma conjugate structure. The parametric empirical Bayes estimator is therefore

  δ_EB(x_{n+1}) = (x_{n+1} + 1)/(λ̂ + 1) = (x_{n+1} + 1) x̄/(1 + x̄),  x̄ = (x₁ + ··· + xₙ)/n.

When n → ∞, x̄ → 1/λ in probability, and the empirical Bayes rule is consistent: δ_EB(x_{n+1}) → (x_{n+1} + 1)/(λ + 1), the true Bayes rule.

Figure 4: Comparison of MLE, NPEB, and PEB. The circle is the true parameter θ.

Exercise. Let X|θ ~ Geom(1 − θ), with f(x|θ) = (1 − θ)θˣ, x = 0, 1, 2, ..., and let the prior π(θ) be unknown. Show that the Bayes estimator of θ is δπ(x) = m(x+1)/m(x). For X|θ ~ NB(n, 1 − θ), with f(x|θ) = C(n+x−1, x)(1−θ)ⁿθˣ, the Bayes rule is δπ(x) = (x+1) m(x+1) / ((n+x) m(x)).

James–Stein Estimator and its EB Justification

Consider the estimation of θ in the model X ~ MVNₚ(θ, I) under squared error loss L(θ, a) = ‖θ − a‖². For p = 1 and 2, the estimator δ(x) = x is admissible and unique minimax, i.e., no estimator has a uniformly better risk. However, for p ≥ 3, X is neither unique minimax nor admissible. A better estimator is

  δ^{JS}(x) = ( 1 − (p−2)/Σᵢ₌₁ᵖ xᵢ² ) x,

known as the James–Stein estimator. The empirical Bayes justification for δ^{JS}(x) is provided next.
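The risk dominance of the James–Stein estimator for p ≥ 3 is easy to see in a small Monte Carlo sketch; the dimension p = 10, the number of replicates, and the test point θ below are arbitrary illustrative choices.

```python
import random

random.seed(0)

# Monte Carlo squared-error risk of delta(x) = x versus the James-Stein
# estimator for X ~ MVN_p(theta, I); theta is an arbitrary test point.
p, reps = 10, 5000
theta = [0.5] * p

risk_mle = risk_js = 0.0
for _ in range(reps):
    x = [t + random.gauss(0.0, 1.0) for t in theta]
    s = sum(xi * xi for xi in x)
    shrink = 1.0 - (p - 2) / s          # James-Stein shrinkage factor
    risk_mle += sum((xi - t) ** 2 for xi, t in zip(x, theta))
    risk_js += sum((shrink * xi - t) ** 2 for xi, t in zip(x, theta))

risk_mle /= reps    # close to p, the risk of delta(x) = x
risk_js /= reps     # strictly smaller, for p >= 3
```

Note that for p = 2 the shrinkage factor is identically 1 and δ^{JS} reduces to X, in line with the admissibility statement above.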
Suppose that θ has the prior distribution θ ~ MVNₚ(0, τ²I), where the hyperparameter τ² is not known and will be estimated, in this case from the sample X itself. The Bayes rule under squared error loss is

  δ_B(x) = ( 1 − 1/(1 + τ²) ) x.

The marginal (prior predictive) distribution of X is MVNₚ(0, (1 + τ²) I). For such X, the random variable ‖X‖²/(1 + τ²) = Σᵢ Xᵢ²/(1 + τ²) has a χ²ₚ distribution, and

  E [ 1/χ²ₚ ] = ∫₀^∞ (1/t) · t^{p/2 − 1} e^{−t/2} / (2^{p/2} Γ(p/2)) dt = Γ(p/2 − 1) 2^{p/2 − 1} / (2^{p/2} Γ(p/2)) = 1/(2(p/2 − 1)) = 1/(p − 2).

Thus

  E [ (p − 2)/Σᵢ Xᵢ² ] = 1/(1 + τ²),

and the method-of-moments estimator of 1/(1 + τ²) is (p − 2)/Σᵢ xᵢ², which yields the empirical Bayes estimator

  δ^{EB}(x) = ( 1 − (p − 2)/Σᵢ₌₁ᵖ xᵢ² ) x,

i.e., exactly the James–Stein estimator.

1.2.1 ML II

We have already seen the spirit of the ML II method in parametric empirical Bayes. The ML II approach was proposed by I. J. Good, a statistician who was on the team that broke the German code in WWII. The idea is to mimic maximum likelihood estimation at the marginal level: select a prior π̂ that maximizes m_π(x), given the data.

Figure 5: I. J. Good, born 1916.

Exercise. Suppose a Bayesian has chosen a prior π₀ but wants to look at all priors close to π₀. One such family of priors close to π₀ is the ε-contamination family

  Γ = { π = (1 − ε)π₀ + εq, q ∈ Q }.

Suppose that Q is the family of all distributions. Determine the empirical Bayes (ML II) choice of prior. Hint: Observe that for π ∈ Γ, the marginal is m_π(x) = (1 − ε) m_{π₀}(x) + ε m_q(x). Also, for any model f(x|θ), a weighted average cannot exceed the mode, i.e.,

  ∫ f(x|θ) q(θ) dθ ≤ f(x|θ̂_MLE),

where θ̂_MLE maximizes f(x|θ); equality is achieved by the point-mass distribution δ_{θ̂_MLE} concentrated at θ̂_MLE. Thus the empirical Bayes choice is

  π̂(θ) = (1 − ε) π₀(θ) + ε δ_{θ̂_MLE}(θ).

2 Exercises

1. Assume that X|θ is exponential E(θ), with density f(x|θ) = (1/θ) e^{−x/θ}, x ≥ 0, and let F be the cdf corresponding to f. Assume a prior π(θ) on θ. Let m(x) = ∫_Θ f(x|θ) π(θ) dθ be the marginal and M(x) = ∫₀ˣ m(t) dt the corresponding cdf.
(a) Show that θ = (1 − F(x|θ))/f(x|θ).
(b) Show that the Bayes estimator of θ with respect to π is δπ(x) = (1 − M(x))/m(x). Hint: You will need to use a version of Fubini's theorem (Tonelli's theorem) and change the order of integration; Tonelli's theorem allows the change when the integrands are nonnegative.
(c) Suppose you observe Xᵢ|θᵢ ~ E(θᵢ), i = 1, ..., n+1. Explain how you would estimate θ_{n+1} in the empirical Bayes fashion, using the result in (b).
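Parts (a) and (b) of Exercise 1 can be verified numerically once a particular prior is fixed; the sketch below assumes, for illustration only, θ ~ Uniform(1, 2) (the exercise itself leaves π unspecified), and compares the direct posterior mean with the formula (1 − M(x))/m(x).

```python
import math

# f(x|theta) = (1/theta) exp(-x/theta): the exponential E(theta) density.
def f(x, th):
    return math.exp(-x / th) / th

# Marginal m(x) = integral over [1, 2] of f(x|theta), midpoint rule
# (theta ~ Uniform(1, 2), so the prior density is 1 on [1, 2]).
def m(x, N=2000):
    return sum(f(x, 1.0 + (i + 0.5) / N) for i in range(N)) / N

x0 = 1.3

# Bayes estimator computed directly as a posterior mean ...
num = sum((1.0 + (i + 0.5) / 2000) * f(x0, 1.0 + (i + 0.5) / 2000)
          for i in range(2000)) / 2000
delta_direct = num / m(x0)

# ... and via part (b): delta(x) = (1 - M(x)) / m(x),
# with M(x0) = integral of m(t) over [0, x0], midpoint rule again.
K = 400
M_x0 = sum(m((j + 0.5) * x0 / K) for j in range(K)) * x0 / K
delta_formula = (1.0 - M_x0) / m(x0)
```

The identity in (a), θ f(x|θ) = 1 − F(x|θ) = e^{−x/θ}, holds pointwise and is checked at an arbitrary point.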
2. Assume X|θ ~ N(θ, 1) and θ|μ, τ² ~ N(μ, τ²).
(a) Find the marginal for X.
(b) What are the moment-matching estimators of μ and τ² if a sample Xᵢ ~ N(θᵢ, 1), i = 1, ..., n, is available? Hint: Find the moments of the marginal, and be careful: the estimator of the variance needs to be nonnegative.
(c) Propose an empirical Bayes estimator of θ based on the considerations in (b).
(d) What modifications in (b) are needed if you use ML II estimators of μ and τ²?

Sol. (moment matching) The marginal is N(μ, 1 + τ²), so μ̂ = X̄, and the estimator of 1 + τ² is s² = (1/(n−1)) Σᵢ (Xᵢ − X̄)²; hence τ̂² = max(0, s² − 1). For the MLE, τ̂² = max(0, (n−1)s²/n − 1).

3. If the data X ~ f(x|θ) cannot be reduced by a sufficient statistic, then a so-called pseudo-Bayes approach is possible. Let T be an estimator of θ for which the distribution g(t|θ) is known, and let π be the adopted prior. Instead of finding the Bayes rule

  δπ(x₁, ..., xₙ) = ∫_Θ θ ∏ᵢ₌₁ⁿ f(xᵢ|θ) π(θ) dθ / ∫_Θ ∏ᵢ₌₁ⁿ f(xᵢ|θ) π(θ) dθ,

one finds the pseudo-Bayes rule

  δ(t) = ∫_Θ θ g(t|θ) π(θ) dθ / ∫_Θ g(t|θ) π(θ) dθ.

Suppose that you have the model Xᵢ ~ N(θ, 1), θ ~ N(μ, τ²).
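The moment-matching empirical Bayes estimator of Exercise 2, parts (b)–(c), can be sketched in a small simulation; the values of μ, τ², and n below are illustrative choices, not part of the exercise.

```python
import random

random.seed(1)

# theta_i ~ N(mu, tau^2), X_i | theta_i ~ N(theta_i, 1)
mu, tau2, n = 5.0, 2.0, 400
thetas = [random.gauss(mu, tau2 ** 0.5) for _ in range(n)]
xs = [random.gauss(t, 1.0) for t in thetas]

# Moment matching, part (b): the marginal is N(mu, 1 + tau^2)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # estimates 1 + tau^2
mu_hat, tau2_hat = xbar, max(0.0, s2 - 1.0)

# Empirical Bayes estimator, part (c): plug the estimates into the
# normal-normal Bayes rule delta(x) = mu + tau^2/(1 + tau^2) (x - mu)
shrink = tau2_hat / (1.0 + tau2_hat)
delta = [mu_hat + shrink * (x - mu_hat) for x in xs]

mse_x = sum((x - t) ** 2 for x, t in zip(xs, thetas)) / n      # near 1
mse_eb = sum((d - t) ** 2 for d, t in zip(delta, thetas)) / n  # near tau^2/(1+tau^2)
```

Shrinking each observation toward the estimated prior mean lowers the average squared error from about 1 (the naive estimator Xᵢ) toward the Bayes risk τ²/(1 + τ²).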