
# Bayesian Inference and Decision Theory (SYST 664)

These 43 pages of class notes were uploaded by Dedric Yost on Monday, September 28, 2015. The notes are for SYST 664 at George Mason University, taught by Kathryn Laskey in the Fall.


## Bayesian Decision Theory and Machine Learning

Kathryn Blackmond Laskey, Department of Systems Engineering and Krasnow Institute, George Mason University. November 21, 1995.

### Decision Theory and Machine Learning

- Decision theory provides a solid theoretical foundation for thinking about problems of action and inference under uncertainty:
  - a framework for formulating the problem
  - a learning theory
  - a framework for making tradeoffs between information and computational/data cost
- Machine learning is a problem of action/inference under uncertainty.
- Direct implementation of decision theory may not be the best approach to practical learning.
- Goal: approximate a decision-theoretically sensible approach within resource constraints. The best way to do this may not be explicitly decision theoretic.

### Decision Theory

- Decision problem:
  - possible actions a ∈ A
  - possible states of the world s ∈ S
  - consequences c(s, a)
  - goal: choose the best action
- Ingredients of a decision-theoretic model:
  - a utility function u(c) expresses preferences for consequences
  - a probability P(s | a) expresses knowledge/uncertainty about states
- The best action maximizes expected utility:

  a_optimal = argmax_a E[u | a] = argmax_a Σ_s u(c(s, a)) P(s | a)

### Bayesian Inference

- Uncertainty about the state of the world is represented by a probability distribution over states.
- Probability is a rational agent's degree of belief about uncertain states of the world.
- Beliefs are updated by conditioning on new information about the world:

  P(s_i | x) / P(s_j | x) = [P(x | s_i) / P(x | s_j)] × [P(s_i) / P(s_j)]

  posterior odds ratio = likelihood ratio × prior odds ratio
- If there are "true" probabilities, any non-dogmatic Bayesian who collects enough information will eventually learn them to within a close approximation.

### A Caricature of a Contrast

- Statistical inference is about using large samples to draw inferences about a small number of parameters of an objective probability distribution:
  - applies to inherently probabilistic phenomena
  - don't use statistics unless you have enough data
  - don't try to estimate too many things at once, or test too many hypotheses at once
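The expected-utility rule on the Decision Theory slide above can be sketched in a few lines of Python. The actions, states, utilities, and probabilities below are hypothetical numbers chosen only to illustrate the argmax computation; they do not come from the notes.

```python
# Sketch of expected-utility maximization: a* = argmax_a sum_s u(c(s,a)) P(s|a).
# The actions, states, utilities, and probabilities are made-up illustrative values.

actions = ["treat", "wait"]
states = ["disease", "healthy"]

# u(c(s, a)): utility of the consequence of action a in state s (hypothetical)
utility = {
    ("disease", "treat"): 80, ("healthy", "treat"): 90,
    ("disease", "wait"): 10, ("healthy", "wait"): 100,
}
# P(s | a): here the state does not depend on the chosen action
prob = {"disease": 0.3, "healthy": 0.7}

def expected_utility(a):
    """E[u | a] = sum over states of u(c(s, a)) * P(s)."""
    return sum(utility[(s, a)] * prob[s] for s in states)

best = max(actions, key=expected_utility)
print(best, expected_utility(best))   # "treat" with expected utility 87.0
```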
- Machine learning is about using small samples to learn the rules characterizing a phenomenon:
  - applies to inherently deterministic phenomena
  - not enough data to use statistics
  - too many parameters/rules to use statistics

### Machine Learning as Bayesian Inference

- The learning problem:
  - given: a training set x of instances from some concept c
  - goal: learn which concept c from a family C produced the training set
  - there may or may not be noise in the training data
- Bayesian inference applied to machine learning:
  - prior distribution P(c) over C
  - likelihood function P(x | c) for the data given the concept (may be deterministic)
  - result of learning: the posterior distribution P(c | x) for the concept given the data

### Graphical Models for Probabilistic Reasoning

- Bayesian networks: a model for causal and/or correlational influences
  - a directed graph encodes dependency relationships
  - local probability distributions encode the strength of relationships
- Markov networks: a model for correlational influences
  - an undirected graph encodes dependency relationships
  - local probability distributions encode the strength of relationships
- Hybrids and extensions: other models that can be viewed as graphical probability models
  - neural networks
  - networks of rules with certainty factors

### Learning for High-Dimensional Parameter Spaces

- In some classes of models there are exact Bayesian methods for computing the posterior distribution: decomposable models with complete data and conjugate prior distributions.
- There are many approximate methods for cases in which exact methods are unavailable:
  - maximum likelihood or maximum a posteriori methods: the EM algorithm, mean field approximation, backpropagation
  - Monte Carlo: Gibbs sampling, Metropolis-Hastings sampling, weighted Monte Carlo

### Structural Uncertainty

- The model can be decomposed as M = (S, θ_S):
  - S: structural assumptions (conditional independence, normality, connections in a neural network, etc.)
  - θ_S: structure-specific parameters (local probability distributions, mean and covariance of a normal distribution, weights in a neural network, etc.)
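The posterior P(c | x) over a concept family, described above, can be sketched with a toy discrete example. The interval concepts, the uniform prior, and the noise-free "size principle" likelihood (each positive example has probability 1/|c| under concept c) are illustrative assumptions, not part of the notes.

```python
# Toy Bayesian concept learning: posterior P(c|x) ∝ P(x|c) P(c).
# Concepts are integer ranges over 1..10 (a hypothetical family); the data are
# noise-free positive examples, so an example has likelihood 1/|c| if it lies
# in the concept and 0 otherwise.

concepts = {
    "1-10": set(range(1, 11)),
    "1-5": set(range(1, 6)),
    "4-6": set(range(4, 7)),
}
prior = {name: 1.0 / len(concepts) for name in concepts}  # uniform prior over C
data = [4, 5]                                             # observed examples

def likelihood(xs, c):
    """P(x | c) for i.i.d. noise-free positive examples."""
    p = 1.0
    for x in xs:
        p *= (1.0 / len(c)) if x in c else 0.0
    return p

unnorm = {name: prior[name] * likelihood(data, c) for name, c in concepts.items()}
z = sum(unnorm.values())
posterior = {name: w / z for name, w in unnorm.items()}
print(posterior)   # the smallest consistent concept "4-6" gets the most mass
```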
- Traditional approach to statistical inference:
  - pick the best S
  - estimate θ_S assuming S is the correct structure
- Problems with the traditional approach:
  - overfitting
  - poor performance off the training set
  - underestimation of variance

### Approaches to Structural Uncertainty

- Adjust significance levels for multiple hypothesis tests.
- Sensitivity analysis to determine dependence of results on structural assumptions.
- Holdout samples / cross-validation.
- Formal Bayesian treatment of structural uncertainty.

### Higher-Order Uncertainty for Structures

- Structural and parameter uncertainty for a concept:

  P(c) = Σ_S P(S) ∫ P(c | θ_S, S) f(θ_S | S) dθ_S

- This sum cannot be computed explicitly. Approximate by searching over a small part of the space: heuristic or Monte Carlo search.

### Learning about Structure

- Use Bayes rule to update the prior distribution. The posterior distribution for structure S given the training sample x is

  P(S | x) ∝ P(S) ∫ P(x | θ_S, S) f(θ_S | S) dθ_S

- A learning algorithm for structure/parameter learning needs:
  - a search heuristic for searching over structures
  - a method for computing posterior probabilities of structures (exact or approximate)
  - a method for approximating posterior out-of-sample predictions

### Some Examples

- Learning Bayesian networks: Cooper & Herskovits; Heckerman; Shachter
- Bayesian learning of neural networks: Neal; MacKay; Hinton
- Bayesian learning of graphical models: York; Madigan; Raftery; Buntine

### Advantages of Model Averaging

- The classical approach breaks down on high-dimensional parameter spaces:
  - significance tests are not valid when many models are considered
  - no good basis for deciding among competing model-choice heuristics
  - no way to account for hidden variability due to model exploration
- The Bayesian approach provides a unified framework for:
  - combining exploration, model choice, and parameter estimation into a single analysis
  - suggesting and evaluating competing heuristic approaches

### More Advantages

- Ease of interpretation:
  - Bayesian: "The probability of a direct link between A and B is greater than 95%."
  - Classical: "The chance of getting a test statistic this extreme, if there is no link between A and B, is less than 5%."
- The theory applies to any problem:
  - discrete or continuous variables
  - deterministic or stochastic phenomena
  - nested or non-nested models
  - i.i.d. or correlated variables
- General-purpose algorithms for computing posterior distributions are becoming available.
- No need to choose an arbitrary null hypothesis: all hypotheses are compared simultaneously against each other.

### Criticisms

- Theory is far ahead of the ability to compute.
  - First figure out what you really want to compute, then try to approximate it.
  - Decision theory provides a unified framework for thinking about what you want to compute.
- Where do the priors come from?
  - Bayesians are explicit about their assumptions; assumptions are often buried in classical methods.
  - When you have knowledge, you should include it in the analysis.
  - NFL ("no free lunch") theorems: there are no assumption-free methods.
- Identifiability: several concepts may be observationally equivalent, not distinguishable even with infinite data.
  - We usually care about good performance off the training set, not about identifying the "correct" concept.
  - The correct model may not be good for the purpose.
- Computational complexity; storage requirements; unintelligibility.
- Many machine learning algorithms include bias.
  - Bias pushes the system toward good models.
  - Priors in a Bayesian analysis act as bias, but priors should be about belief, not utility.

### Decision Theory in Machine Learning

- Goal: acquire a high-utility problem representation.
- Utility includes:
  - utility for the base problem (don't use information-theoretic distance when you care about correct treatment of a patient)
  - computational cost
  - ease of explanation
- The best model may not be explicitly decision theoretic: an approximate solution to the right problem is better than an exact solution to the wrong problem.

### Occam's Razor

- Occam's razor says: prefer simplicity. As a heuristic it has stood the test of time.
- It has been argued that Bayes justifies Occam's razor.
- More precisely: if you put positive prior probability on a sharp null hypothesis, the data are generated by a model near the null model, and the sample size is not too large, then usually the posterior probability of the null hypothesis is larger than its prior probability.

### Occam's Razor (cont.)

- Of course, we don't really believe the null hypothesis. We don't believe the alternative hypothesis either.
- When the predictive consequences of H0 and HA are similar, H0 is robust to plausible departures from H0.
- When HA has many parameters in relation to the amount of data available, we may do much worse by using HA: H0 is robust to likely misspecification of the parameters θ_A of HA.
- But Occam's razor only works if we're willing to abandon simple hypotheses when they conflict with observations.

### Decision Theory and Occam's Razor

- Occam's razor is really about utility, not probability: choose the simplest model that will give you good performance on problems you haven't seen.
- Decision-theoretic justification:
  - The simple model is not correct.
  - Adding more parameters to fit the data is often not the way to make it correct.
  - Too-complex models give a false sense of precision and are difficult to apply.
- Occam's razor is a heuristic for finding high-utility models.

## Bayesian Inference and Decision Theory: Unit 6, Bayesian Hierarchical Models

Department of Systems Engineering and Operations Research, George Mason University. Instructor: Kathryn Blackmond Laskey, Room 321 ST-2, (703) 993-1644. Office hours: Monday 3:00–4:00 PM and Thursday 5:00–6:00 PM, or by appointment. Spring 2007.

### Learning Objectives

- Specify hierarchical models for multi-parameter problems:
  - parameters are viewed as a sample from a common population distribution
  - data can be used to learn about the population distribution of unobservable parameters
- Explain the benefits of Bayesian hierarchical models for complex multi-parameter problems.
- Apply techniques from previous units to inference in hierarchical models.
- Apply techniques from previous units to evaluating structural assumptions in hierarchical models.
### Hierarchical Models

- Address the tension between realism and statistical power: use structural assumptions to achieve statistical power without sacrificing realism.
- Parameters are often related to each other by the structure of the problem. Hierarchical models exploit the relationship: they borrow strength from data used to estimate related parameters.
- Example: estimating babies born.
  - In a simplistic model, the birth rate would be a constant.
  - In a more realistic model, the birth rate would vary by time of day, season, and hospital.
  - A model with a separate rate for each hour of each day for each hospital has an enormous number of parameters, but all these parameters are related to each other. A hierarchical model exploits the relationship to gain statistical power.
- Experience has shown that hierarchical models can flexibly adapt the dimensionality of the model to exploit the information in the data.

### Example: Rat Tumors (Gelman et al., Sections 5.1–5.3)

- Studies are commonly performed on rats to evaluate cancer risks of drugs. We have a set of studies; Y_s is the number of rats, out of the n_s rats in the s-th study, which developed tumors.
- Model for Y_s:
  - tumors occur independently with probability q_s
  - Y_s is drawn from a Binomial(n_s, q_s) distribution
- The tumor probabilities q_s may vary due to differences in rats and experimental conditions. We assume the tumor probabilities are drawn from a common distribution; it is natural to use the Beta distribution as a prior distribution for the unknown tumor probability.
- Hierarchical model:
  - hyperparameters A and B are drawn from a distribution g(a, b)
  - conditional on A and B, the tumor probabilities Q_1, …, Q_n are independent draws from a Beta(A, B) distribution
  - given the tumor probability Q_s, the number of tumors Y_s has a Binomial(n_s, Q_s) distribution
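The generative story of the hierarchical model (hyperparameters → study-level tumor probabilities → tumor counts) can be simulated directly. The values of alpha, beta, and the study sizes below are illustrative choices, not estimates from the notes.

```python
import random

# Generative sketch of the hierarchical rat-tumor model:
#   Q_s ~ Beta(alpha, beta) for each study, then Y_s ~ Binomial(n_s, Q_s).
# alpha, beta, and the study sizes are hypothetical values for illustration.
random.seed(0)
alpha, beta_ = 1.4, 8.6
study_sizes = [20, 19, 10]

studies = []
for n in study_sizes:
    q = random.betavariate(alpha, beta_)            # study-specific tumor probability
    y = sum(random.random() < q for _ in range(n))  # a Binomial(n, q) draw
    studies.append((n, y, q))

for n, y, q in studies:
    print(f"n={n}  tumors={y}  true q={q:.3f}")
```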
### Graphical Representation of Model Structure

[Figure: plate representation and unrolled graph of the model, with hyperparameters at the top, parameters in the middle, and observations at the bottom.]

### An Equivalent 3-Level Hierarchical Model

- X_si is an indicator variable for whether the i-th rat in study s develops a tumor.
- If the X_si are independent and identically distributed for each study s, then Y_s = Σ_i X_si is a sufficient statistic for Q_s.

[Figure: plate representation and unrolled graph of the 3-level model.]

### Nuisance Parameters in the Rat Tumor Model

- We are interested in the tumor probabilities Q_s.
- Conditional on the hyperparameters A and B, the Q_s and Y_s are conjugate pairs: if we knew A and B, we would have an exact posterior distribution for Q_s given Y_s.
- The hyperparameters A and B are nuisance parameters.

### Approaches to Inferring Tumor Probabilities

- Empirical Bayes: compute point estimates â and b̂ from the data; set A = â and B = b̂; calculate the posterior distribution for Q_s given Y_s using conjugate updating.
- Full Bayes: specify a prior distribution g(a, b); compute an approximate posterior distribution for Q_s given Y_s using numerical methods or simulation.
- Comparison, non-hierarchical Bayes: assess a separate prior distribution for each study.

### Empirical Bayes Inference

- We will analyze the first 70 studies.
- The sample mean and sample standard deviation of the observed frequencies Y_1/n_1, …, Y_70/n_70 are 0.136 and 0.103, respectively.
- Fitting a Beta distribution to this mean and variance, we obtain α = 1.4 and β = 8.6. The expected value is 0.14, the median is 0.12, and a 90% credible interval for Q_s is (0.016, 0.316).
- The posterior distribution for a study with y_s tumors in n_s rats is Beta(1.4 + y_s, 8.6 + n_s − y_s).
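The empirical Bayes step described above (fit a Beta distribution to the observed frequencies, then update each study by conjugacy) can be reproduced numerically. A moment-matching fit is assumed here, which is one standard way to turn the quoted mean and standard deviation into Beta parameters:

```python
# Empirical Bayes sketch: fit Beta(alpha, beta) by matching moments to the
# observed tumor frequencies, then do conjugate Beta-binomial updating.
# The sample mean 0.136 and sd 0.103 are the values quoted in the notes.

m, v = 0.136, 0.103 ** 2            # sample mean and variance of Y_s / n_s
total = m * (1 - m) / v - 1         # method of moments: alpha + beta
alpha, beta_ = m * total, (1 - m) * total
print(round(alpha, 2), round(beta_, 2))   # close to the notes' 1.4 and 8.6

def posterior_params(y, n, a=1.4, b=8.6):
    """Conjugate update: Beta(a, b) prior plus y tumors out of n rats."""
    return a + y, b + n - y

# Posterior mean for a study with 0 tumors in 20 rats: 1.4 / 30 ≈ 0.047
a_post, b_post = posterior_params(0, 20)
print(a_post, b_post, round(a_post / (a_post + b_post), 3))
```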
### Empirical Bayes: Example 1 of 3

- There were 7 studies in which 0 out of 20 rats had tumors.
- Empirical Bayes analysis:
  - the prior distribution for Q_s is Beta with parameters α = 1.4 and β = 8.6
  - the posterior distribution for Q_s is Beta with parameters α = 1.4 and β = 28.6
  - the expected value is 0.047, the median is 0.037, and a 90% credible interval for Q_s is (0.005, 0.122)
- Comparison, uniform prior distribution:
  - the prior distribution for Q_s is Beta with parameters α = 1 and β = 1
  - the posterior distribution for Q_s is Beta with parameters α = 1 and β = 21
  - the expected value is 0.045, the median is 0.032, and a 90% credible interval for Q_s is (0.002, 0.133)

[Figure: posterior credible intervals for Q_s given 0 tumors in 20 observations, uniform prior vs. empirical Bayes.]

### Empirical Bayes: Example 2 of 3

- There were 3 studies in which 4 tumors occurred out of 19 rats.
- Empirical Bayes analysis:
  - the prior distribution for Q_s is Beta with parameters α = 1.4 and β = 8.6
  - the posterior distribution for Q_s is Beta with parameters α = 5.4 and β = 23.6
  - the expected value is 0.186, the median is 0.179, and a 90% credible interval for Q_s is (0.083, 0.315)
- Comparison, uniform prior distribution:
  - the prior distribution for Q_s is Beta with parameters α = 1 and β = 1
  - the posterior distribution for Q_s is Beta with parameters α = 5 and β = 16
  - the expected value is 0.238, the median is 0.230, and a 90% credible interval for Q_s is (0.104, 0.401)

[Figure: posterior credible intervals for Q_s given 4 tumors in 19 observations, uniform prior vs. empirical Bayes.]

### Empirical Bayes: Example 3 of 3

- In the smallest study, 1 tumor occurred in 10 rats.
- Empirical Bayes analysis:
  - the prior distribution for Q_s is Beta with parameters α = 1.4 and β = 8.6
  - the posterior distribution for Q_s is Beta with parameters α = 2.4 and β = 17.6
  - the expected value is 0.120,
    the median is 0.107, and a 90% credible interval for Q_s is (0.028, 0.255)
- Comparison, uniform prior distribution:
  - the prior distribution for Q_s is Beta with parameters α = 1 and β = 1
  - the posterior distribution for Q_s is Beta with parameters α = 2 and β = 10
  - the expected value is 0.167, the median is 0.148, and a 90% credible interval for Q_s is (0.033, 0.364)

[Figure: posterior credible intervals for Q_s given 1 tumor in 10 observations, uniform prior vs. empirical Bayes.]

### Example Triplots for the Empirical Bayes Analysis

[Figure: triplots of prior Beta(1.4, 8.6), normalized likelihood (NLK), and posterior for the tumor probability, in three panels: 0 tumors in 20 rats (NLK Beta(1, 21)), 4 tumors in 19 rats (NLK Beta(5, 16)), and 1 tumor in 10 rats (NLK Beta(2, 10)).]

### Summary

- A hierarchical model with the empirical Bayes estimator allows information from each of the studies to influence inferences about the other studies.
- Credible intervals are narrower than in separate analyses with a uniform prior.
- The posterior distribution for each study shrinks toward the other studies; studies with smaller sample sizes borrow strength from studies with larger sample sizes.

### Predicting the 71st Study

- The 71st study has 14 rats; the likelihood function is Binomial(14, Q_71).
- Using the empirical Bayes estimator, the prior distribution is Beta(1.4, 8.6).
- The marginal likelihood for Y_71 is

  f(y | 1.4, 8.6) = C(14, y) · [Γ(10) / (Γ(1.4) Γ(8.6))] · ∫₀¹ q^(1.4+y−1) (1 − q)^(8.6+14−y−1) dq = C(14, y) · B(1.4 + y, 22.6 − y) / B(1.4, 8.6),

  where B(·, ·) is the Beta function.
- Predictive distribution for the 71st study with 14 rats:
  - no tumors: 26.4%
  - 1 tumor: 23.9%
  - 2 tumors: 18.1%
  - 3 tumors: 12.6%
  - 4 tumors: 8.2%
  - 5 or more tumors: 10.9%
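The predictive distribution above follows from the beta-binomial marginal likelihood, which can be evaluated with log-gamma functions:

```python
import math

# Beta-binomial marginal likelihood for the 71st study (n = 14), using the
# empirical Bayes prior Beta(1.4, 8.6):
#   P(Y = y) = C(14, y) * B(1.4 + y, 8.6 + 14 - y) / B(1.4, 8.6)

def log_beta(a, b):
    """log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def predictive(y, n=14, a=1.4, b=8.6):
    """Beta-binomial probability of y tumors out of n rats."""
    return math.comb(n, y) * math.exp(log_beta(a + y, b + n - y) - log_beta(a, b))

probs = [predictive(y) for y in range(15)]
print(round(probs[0], 3))   # P(no tumors), about 0.264 as in the notes
print(round(probs[1], 3))   # P(1 tumor), about 0.239
print(round(sum(probs), 6)) # the distribution sums to 1
```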
### Marginal Likelihood for the 71st Study

[Figure: binomial-Beta marginal likelihood for a sample of size 14, α = 1.4, β = 8.6; bar chart of probability against number of tumors, with the observed value marked.]

- The posterior distribution of Q_71, given the observed value, is Beta(5.4, 18.6) (i.e., 4 tumors observed out of 14).

### Issues with the Empirical Bayes Approach

- We are using the sample twice: once to estimate the hyperparameters, and once to estimate the posterior distributions of the Q_s given the hyperparameters.
- If the hyperparameters A and B are part of the prior distribution, shouldn't they be specified a priori?
- If the hyperparameters A and B are unknown, shouldn't a Bayesian specify a probability distribution for them?
  - Using a point estimate understates uncertainty.
  - What justifies the point estimate we used over some other one?
- There is an asymmetry between how studies done before and after the estimation of A and B are treated:
  - studies 1 through 70 are used twice, once to estimate the hyperparameters and once to infer the distribution of Q_s
  - study 71 is only used once; hyperparameter estimates based on the first 70 studies are used to infer Q_71

### Fully Bayesian Analysis

- Full hierarchical model:
  - the number of rat tumors Y_s out of n_s rats in study s has a Binomial(n_s, Q_s) distribution
  - the binomial parameters Q_s are independent Beta(A, B) random variables
  - conditional on the hyperparameters (A, B), we can compute an exact joint probability mass function for the Y_s
- Assign a prior density function h(a, b) for the hyperparameters (A, B).
- Use numerical approximation to find the posterior density h(a, b | y).
- To draw inferences about the Y_s, we marginalize over (A, B).

### Hyperparameter Prior

- Gelman et al.
  discuss several functional forms for the hyperparameter prior h(a, b):
  - an improper prior distribution results in a non-integrable posterior distribution
  - a uniform prior distribution with a large upper bound on A and B results in a posterior distribution with almost all mass on very large values of A and B, which corresponds to knowing the Q_s with certainty
- We seek a noninformative prior density that is dominated by the likelihood and yields a proper posterior distribution.
- Gelman et al. choose a uniform distribution on U = A/(A+B) and V = (A+B)^(−1/2). This distribution is uniform on the expected tumor probability and the inverse root of the "virtual sample size".

[Figure: plate diagram of the reparameterized model, with hyperparameters above the tumor probabilities Q_1, Q_2, … and the observations Y_1, Y_2, ….]

### Marginalizing Out the Q_s

- Marginalize out the Q_s to obtain the joint marginal distribution for Y = (Y_1, …, Y_70) given A = a and B = b:

  p(y | a, b) = Π_{j=1}^{70} C(n_j, y_j) · [Γ(a + b) Γ(a + y_j) Γ(b + n_j − y_j)] / [Γ(a) Γ(b) Γ(a + b + n_j)]

- Re-express in terms of u = a/(a + b) and v = (a + b)^(−1/2): substituting a = u/v² and b = (1 − u)/v² gives the likelihood given U = u and V = v.

### Posterior Distribution for U and V

- The prior distribution is uniform, so the posterior density is equal to the normalized likelihood:

  h(u, v | y) ∝ Π_{j=1}^{70} C(n_j, y_j) · [Γ(a + b) Γ(a + y_j) Γ(b + n_j − y_j)] / [Γ(a) Γ(b) Γ(a + b + n_j)], with a = u/v², b = (1 − u)/v²

- A plot of the normalized likelihood verifies that the likelihood is informative about (A, B).

[Figure: 3-D surface plot of a discrete approximation to the posterior density function for U and V.]
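The fully Bayesian computation can be sketched as a grid approximation over (u, v). Since the notes' 70 studies are not reproduced here, the code uses a small illustrative data set of three studies, and the grid resolution is likewise an arbitrary choice.

```python
import math

# Grid approximation to the hyperparameter posterior, as in the fully Bayesian
# analysis above. The prior is uniform on u = a/(a+b) and v = (a+b)**-0.5, so
# the posterior over the (u, v) grid is the normalized marginal likelihood.
# The three (tumors, rats) pairs are an illustrative subset, not the real data.

data = [(0, 20), (4, 19), (1, 10)]

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marglik(a, b):
    """log prod_j C(n_j, y_j) * B(a + y_j, b + n_j - y_j) / B(a, b)."""
    return sum(math.log(math.comb(n, y))
               + log_beta(a + y, b + n - y) - log_beta(a, b)
               for y, n in data)

grid = []
for i in range(1, 40):
    for j in range(1, 40):
        u, v = i / 40, j / 40          # u in (0, 1), v in (0, 1)
        total = v ** -2                # a + b, the "virtual sample size"
        a, b = u * total, (1 - u) * total
        grid.append((u, v, log_marglik(a, b)))

# Normalize on the log scale for numerical stability.
mx = max(g[2] for g in grid)
weights = [math.exp(g[2] - mx) for g in grid]
z = sum(weights)
post = [w / z for w in weights]

# Posterior mean of U, the expected tumor probability.
mean_u = sum(g[0] * p for g, p in zip(grid, post))
print(round(mean_u, 3))
```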
