Review Sheet for CMPSCI 689 at UMass(2)
These 53 pages of class notes, for a Fall-semester course at the University of Massachusetts, were uploaded on Friday, February 6, 2015.
CMPSCI 689: Probability Distributions
Sridhar Mahadevan (mahadeva@cs.umass.edu)
University of Massachusetts

Topics
- Review of probability theory
- Discrete and continuous distributions
- Graphical models

Sample Spaces
- A sample space S is a set, finite or countably infinite; e.g., {H, T} is the sample space for tossing a coin, and {1, ..., 6} is the sample space for rolling a die. An event is any subset S_i ⊆ S; e.g., {2, 4, 6} denotes the event of getting an even roll.
- A collection of subsets B is a σ-algebra if:
  - The empty set ∅ ∈ B.
  - If A ∈ B, then A^c ∈ B.
  - If A_1, ..., A_n, ... ∈ B, then ∪_i A_i ∈ B.

Probability Axioms
Given a sample space S and an associated σ-algebra B, we can define a probability function on B as follows:
- P(A) ≥ 0 for every A ∈ B.
- P(S) = 1 (note that S must be in every σ-algebra).
- If A_1, ..., A_n, ... ∈ B is a collection of pairwise disjoint subsets, then P(∪_i A_i) = Σ_i P(A_i).

Conditional Probability and Bayes Rule
Given two events A and B in some sample space S, the conditional probability is

    P(A | B) = P(A ∩ B) / P(B)

Example: let B denote the event of getting an even roll, and let A denote the event of getting a 2. Then P(A | B) = (1/6) / (1/2) = 1/3.
Note that P(A | B) P(B) = P(A ∩ B) = P(B ∩ A) = P(B | A) P(A). This immediately gives us the celebrated Bayes rule:

    P(A | B) = P(B | A) P(A) / P(B)

Marginal and Conditional Independence
Two events are marginally independent if P(A ∩ B) = P(A) P(B); e.g., let A = "rain in Amherst" and B = "Bush wins reelection in 2004". Two events are conditionally independent given a third event if P(A ∩ B | C) = P(A | C) P(B | C). Let A = "small shoe size", B = "no gray hair", C = "age < 30". Note that in this case A and B are not marginally independent.

Random Variables
A random variable X is a function from a sample space S into the real numbers R. We denote the value of the variable by X(s) for elements s ∈ S of the sample space. A random variable X induces a probability function P_X on the range of X:

    P_X(X = x) = P({s_j ∈ S : X(s_j) = x})

Examples of random variables: the number of heads in an experiment where a coin is tossed 100 times; the number of times the word
"student" appeared in an email document; and so on.

Graphical Models
A graph G is a pair (V, E), where V is a set of vertices (or nodes) and E is a set of edges. If the graph is directed, the edges are a set of ordered pairs of vertices; if the graph is undirected, the edges are a set of unordered pairs of vertices. We want to define probability distributions on graphs, and we will do this informally for the moment.

Discrete Directed Graphical Model
[Figure: a four-node directed graphical model over variables C, S, R, W, with conditional probability tables for P(C), P(S | C), P(R | C), and P(W | S, R); the table entries are garbled in this copy.]
The joint distribution over this graph factorizes as

    P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | S, R)

Discrete Undirected Graphical Model
The joint distribution over this graph factorizes as

    P(A, B, C) = (1/Z) ψ(A, B) ψ(B, C)

Here Z is a normalization constant, and ψ(A, B) and ψ(B, C) are arbitrary real-valued potential functions of the form ψ(c) = exp(-E(c)). If the graph is undirected, the probability distribution is called a random field.

Three Canonical Graphical Models
- Diverging: P(B, C | A) = P(B | A) P(C | A)
- Converging: P(B, C) = P(B) P(C)
- Serial: P(C | A, B) = P(C | A)

Analysis of Canonical Models
- Diverging case (A is the parent of both B and C), so B ⊥ C | A:

      P(B, C | A) = P(A, B, C) / P(A) = P(A) P(B | A) P(C | A) / P(A) = P(B | A) P(C | A)

- Serial case (B → A → C), so B ⊥ C | A:

      P(C | A, B) = P(A, B, C) / P(A, B) = P(B) P(A | B) P(C | A) / (P(B) P(A | B)) = P(C | A)

D-separation for Directed Graphs
- A set of nodes X is said to be d-separated from another set of nodes Y given a conditioning set Z if and only if every undirected path from a node in X to a node in Y is blocked, i.e.:
  - The path includes a diverging or serial node V with V ∈ Z, or
  - The path includes a converging node V such that neither V nor any of its descendants is in Z.
- The Markov blanket of a node is the set of all its parents, its children, and the parents of its children. A node is d-separated from all other nodes in a graph given its Markov blanket.

Example of D-separation
[Figure: an example directed graph; the node layout is garbled in this copy.]
- What is the Markov blanket of node D?
- Assume the immediate neighbors of node E are observed. Is E d-separated from B? How about A?
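The diverging-case analysis above is easy to verify numerically on a small directed model. The sketch below is a minimal example: since the table entries in these notes are garbled, it assumes the standard textbook CPT values for the four-node C, S, R, W network, enumerates the joint P(C)P(S|C)P(R|C)P(W|S,R), and checks that S and R are marginally dependent but conditionally independent given the diverging node C:

```python
from itertools import product

# Assumed CPT values (the slide's tables are garbled; these are the usual
# textbook numbers for this four-node network).
P_C = {True: 0.5, False: 0.5}                  # P(C = true)
P_S_given_C = {True: 0.1, False: 0.5}          # P(S = true | C)
P_R_given_C = {True: 0.8, False: 0.2}          # P(R = true | C)
P_W_given_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.9, (False, False): 0.0}  # P(W = true | S, R)

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R) -- the factorization above."""
    p = P_C[c]
    p *= P_S_given_C[c] if s else 1 - P_S_given_C[c]
    p *= P_R_given_C[c] if r else 1 - P_R_given_C[c]
    p *= P_W_given_SR[(s, r)] if w else 1 - P_W_given_SR[(s, r)]
    return p

def prob(**fixed):
    """Marginal probability of an assignment, summing out the other variables."""
    total = 0.0
    for c, s, r, w in product([True, False], repeat=4):
        assign = {'C': c, 'S': s, 'R': r, 'W': w}
        if all(assign[k] == v for k, v in fixed.items()):
            total += joint(c, s, r, w)
    return total

# S and R are d-separated given the diverging node C, so
# P(S, R | C) should equal P(S | C) P(R | C).
lhs = prob(S=True, R=True, C=True) / prob(C=True)
rhs = (prob(S=True, C=True) / prob(C=True)) * (prob(R=True, C=True) / prob(C=True))
print(abs(lhs - rhs) < 1e-9)   # True: conditionally independent given C

# Marginally, S and R are dependent (they share the parent C).
print(abs(prob(S=True, R=True) - prob(S=True) * prob(R=True)) > 1e-3)  # True
```

Exact enumeration like this is exponential in the number of variables, which is exactly what motivates the more efficient inference algorithms discussed below.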
Conditional Independence for Undirected Graphs
- In undirected graphs, conditional independence corresponds directly to graph separability: a set of nodes X is conditionally independent of a set of nodes Y given a conditioning set of nodes Z if and only if every path from a node in X to a node in Y goes through a node in Z.
- Is E conditionally independent of B given its neighbors?

Example Graphical Model
[Figure: an automobile network in which Fuel and SparkPlugs are parents of Starts, and Fuel is the parent of Gauge.]

P(Gauge | Fuel = yes): Gauge = full: 0.39; Gauge = half: 0.60; Gauge = empty: 0.01.
P(Starts = yes | Fuel = yes, SparkPlugs = yes) = 0.99; P(Starts = yes | Fuel = yes, SparkPlugs = no) = 0.01. (The Fuel = no column of this table is garbled in this copy.)
P(Fuel = yes) = 0.98; P(SparkPlugs = yes) = 0.96.

Inference in Automobile Model
Consider computing the probability P(Starts = yes | Fuel = yes). In general we want to compute queries such as P(X | E), where E represents the evidence and X represents the query; both X and E can be a single node or a conjunction of nodes. To answer such queries we always start with the joint distribution and then marginalize it as necessary. To begin with, we can express our query as

    P(Starts = yes | Fuel = yes) = P(Starts = yes, Fuel = yes) / P(Fuel = yes)

We can compute the numerator by marginalizing over all values of the spark plug status SP and the gauge reading G:

    P(Starts = yes, Fuel = yes) = Σ_SP Σ_G P(F = yes) P(SP) P(G | F = yes) P(Starts = yes | F = yes, SP)

Inference by Enumeration in Automobile Model
[Figure: the full enumeration tree over SP ∈ {yes, no} and G ∈ {empty, half, full}; most branch labels are garbled in this copy.]

    P(Starts = yes, F = yes)
      = 0.98 [(0.01 · 0.01 + 0.60 · 0.01 + 0.39 · 0.01) 0.04 + (0.39 · 0.99 + 0.60 · 0.99 + 0.01 · 0.99) 0.96]
      = 0.9318

Eliminating Redundant Steps
Inference by enumeration is an exponential, O(2^n), algorithm. If we examine the steps carefully, we notice there are many redundant or repeated calculations in the sum

    Σ_SP Σ_G P(F = yes) P(SP) P(G | F = yes) P(Starts = yes | F = yes, SP)
In particular, with some thought, we can reorder the computation of the numerator as

    P(F = yes) Σ_SP P(SP) P(Starts = yes | F = yes, SP) Σ_G P(G | F = yes)

We can simplify even further by noting that the sum over G is 1, giving us

    P(F = yes) Σ_SP P(SP) P(Starts = yes | F = yes, SP)

Smarter Inference in Automobile Model
Dramatic savings in computation: from 9 multiplications and 5 additions down to 3 multiplications and 1 addition.

    P(Starts = yes, F = yes) = 0.98 (0.01 · 0.04 + 0.99 · 0.96) = 0.9318

Inference in Graphical Models
Many clever insights have been developed to make inference more efficient:
- Variable elimination: use dynamic programming to cache intermediate terms that are summed over repeatedly.
- Junction tree algorithm: exploits a beautiful correspondence between the computation steps and operations on the underlying graph.
- Stochastic sampling: approximate inference by sampling the network.
Complexity of exact inference:
- Linear time for special types of models, e.g., hidden Markov models and hierarchical HMMs.
- Polynomial time for polytrees (Pearl).
- NP-hard for general DAGs (Cooper).
- Approximate inference remains NP-hard (Dagum and Luby).

Hidden Markov Models
[Figure: a state-transition diagram with hidden states, actions, and observations.]
- Hidden Markov models
- Abstract Markov models (hierarchical HMMs)
- Markov decision processes; partially observable MDPs

Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable X, denoted by F_X(x), is defined as F_X(x) = P(X ≤ x). Example: suppose we keep tossing a coin until we get a head, and let the random variable X be the number of coin tosses. Show that the CDF is

    F_X(x) = 1 - (1 - p)^x

We begin by noting that if the first head occurs on toss x, then P(X = x | p) = (1 - p)^(x-1) p. Next, we observe that the CDF requires us to sum up all the probabilities up to and including x:

    F_X(x) = Σ_{i=1}^{x} (1 - p)^(i-1) p = p (1 - (1 - p)^x) / (1 - (1 - p)) = 1 - (1 - p)^x
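The geometric-series step in the CDF derivation above can be sanity-checked numerically; a minimal sketch:

```python
# Check that summing the geometric PMF P(X = i) = (1-p)^(i-1) p up to x
# reproduces the closed-form CDF F_X(x) = 1 - (1-p)^x derived above.
p = 0.3
for x in range(1, 25):
    cdf_sum = sum((1 - p) ** (i - 1) * p for i in range(1, x + 1))
    closed_form = 1 - (1 - p) ** x
    assert abs(cdf_sum - closed_form) < 1e-12
print("geometric CDF identity holds for x = 1..24")
```

The same check works for any 0 < p < 1, since the derivation only uses the finite geometric series.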
Probability Mass Function for Binary Distributions
For a binary random variable X, the probability mass function is f_X(x) = P(X = x), where x = 0 or x = 1. Consider a binary rv X, and let P(X = 1) = θ and P(X = 0) = 1 - θ.
- A Bernoulli distribution is defined by the following PMF:

      f_X(x | θ) = θ^x (1 - θ)^(1-x)

- A geometric distribution is defined by the following PMF:

      f_X(x | p) = (1 - p)^(x-1) p for x > 0, and 0 for x ≤ 0

- A binomial distribution is defined by the following PMF:

      f_X(x | n, p) = C(n, x) p^x (1 - p)^(n-x) for x ≥ 0, and 0 for x < 0

Multinomial Distribution
A multinomial variable x can be viewed as a vector of M binary variables x = (x_1, ..., x_M)^T, where exactly one x_k = 1 and the remaining variables are 0. We denote the probability that x_i = 1 by θ_i, so that f(x | θ) = Π_i θ_i^(x_i). In any given experiment, let the indicator variable x_i denote the outcome of a sample taking on value i. Then the multinomial pdf of a dataset D of size N is given by

    f(D | θ) = Π_{i=1}^{M} θ_i^(N_i)

where N_i is the number of samples taking on value i. Note the constraints Σ_i θ_i = 1 and Σ_i x_i = 1.

Example: Generative Model of Documents
Consider the problem of classifying a given document d_i into a category c_j. A document is an ordered collection of word events (w_{d_i,1}, ..., w_{d_i,|d_i|}), where each w_t is a word in the vocabulary V = {w_1, w_2, ..., w_{|V|}}. We assume that the document size is marginally independent of the category, that the word probabilities are conditionally independent given the category, and, further, that the conditional word probability is marginally independent of its location in the document:

    P(d_i | c_j; θ) = P(|d_i|) Π_{k=1}^{|d_i|} P(w_{d_i,k} | c_j; θ)

We represent each word-category probability as a multinomial distribution θ_{t|c_j} = P(w_t | c_j), with Σ_{t=1}^{|V|} θ_{t|c_j} = 1; for English this random variable takes on the order of 50,000 values.

Continuous Distributions in Graphical Models
[Figure: example plots of continuous distributions in graphical models; the plots and captions are unrecoverable in this extraction.]
Probability Density Function
For a continuous random variable X, the probability density function f_X(x) is such that

    F_X(x) = ∫_{-∞}^{x} f_X(t) dt for all x

A PDF or PMF satisfies these properties:
- f_X(x) ≥ 0 for all x.
- Σ_x f_X(x) = 1 (discrete) or ∫_{-∞}^{∞} f_X(x) dx = 1 (continuous).

The CDF of the logistic distribution is given by F_X(x) = 1 / (1 + e^(-x)); hence its PDF is given by f_X(x) = e^(-x) / (1 + e^(-x))^2.

[Figure: CDF of the logistic distribution, plotted over (-10, 10).]
[Figure: PDF of the logistic distribution, plotted over (-10, 10).]

Expectation and Variance
The expected value (or mean) of a random variable X is

    E[X] = ∫ x f_X(x) dx (X continuous)    E[X] = Σ_x x P(X = x) (X discrete)

The variance (or average squared deviation from the mean) of a random variable X is

    Var(X) = E[(X - E[X])^2]

Example: let X ~ U(a, b), a uniform rv.

Expectation Properties
- Expectation is a linear operation; this property, and the convexity of functions, underlie Jensen's inequality and the EM algorithm.
- Nested expectation: E[E[X]] = E[X].
- Expected deviation around the mean is 0: E[X - E[X]] = 0.
- Prove the following properties of variance:

      Var(X) = E[X^2] - (E[X])^2
      Var(a_1 X_1 + ... + a_n X_n) = a_1^2 Var(X_1) + ... + a_n^2 Var(X_n) (for independent X_i)

Expectation Minimizes Squared-Error Loss
Theorem: show that expectation minimizes squared-error loss:

    E[(X - b)^2] = E[(X - E[X])^2] + (E[X] - b)^2

Clearly we can only control the second term above, and it is minimized by choosing b = E[X].

Moment Generating Function
The moment generating function of a random variable X is

    M(t) = E[e^(tX)] = ∫ e^(tx) f_X(x) dx

Let us examine the derivatives of M(t) at t = 0:

    M'(t) = d/dt E[e^(tX)] = ∫ x e^(tx) f_X(x) dx

Thus at t = 0 we find that M'(0) = E[X]. Similarly, it is easy to show that M''(0) = E[X^2], giving us the identity

    Var(X) = M''(0) - (M'(0))^2

Joint, Marginal, and Conditional Density Functions
Let (X, Y) be a bivariate random variable. The joint density (mass) function f_{XY}(x, y) can be defined as follows:

    P((X, Y) ∈ A) = ∫∫_A f_{XY}(x, y) dx dy

The marginal density function is defined
as

    f_X(x) = ∫_{-∞}^{∞} f_{XY}(x, y) dy

The conditional density function is defined as

    f_{Y|X}(y | x) = f_{XY}(x, y) / f_X(x)

Covariance
Given two random variables X and Y, the covariance between them is defined as

    σ_{XY} = E[(X - E[X])(Y - E[Y])]

The covariance can also be written as (show this)

    σ_{XY} = E[XY] - E[X] E[Y]

This makes it clear that variance is simply the covariance of a random variable with itself. When the random variable X is a vector, the covariance of X becomes a matrix. The full study of covariance matrices is postponed to the next lecture, since it requires the tools of linear algebra.

Conditional Expectation
The conditional expected value of a random variable Y given another variable X is defined as

    E[Y | X = x] = Σ_y y P(Y = y | X = x)

Note that E[Y | X] is a function of X, not a constant (unlike E[X]). Double conditioning: E[E[Y | X]] = E[Y]:

    E[Y] = Σ_y y Σ_x P(X = x, Y = y) = Σ_x Σ_y y P(Y = y | X = x) P(X = x) = E[E[Y | X]]

Fundamental Result on Variance
Lemma: Var(Y) = E[Var(Y | X)] + Var(E[Y | X]).
Proof: note that

    Var(Y) = E[Y^2] - (E[Y])^2 = E[E[Y^2 | X]] - (E[E[Y | X]])^2

Also note that

    E[Var(Y | X)] = E[E[Y^2 | X] - (E[Y | X])^2] = E[E[Y^2 | X]] - E[(E[Y | X])^2]

If we substitute the above into the expression Var(Y) - E[Var(Y | X)], we get Var(E[Y | X]).

Fundamental Theorem of Regression
Regression means estimating a function h(X) of the input variables X that minimizes the mean-squared-error loss function with respect to the true value Y associated with X.
Theorem: the conditional mean E[Y | X] minimizes the mean-squared-error loss function over all possible functions h(X):

    E[(Y - h(X))^2] ≥ E[(Y - E[Y | X])^2]

Proof: left as a homework exercise (extend the earlier proof).

PDF and CDF of Normal Distribution
[Figure: R plots of the normal density (dnorm) and cumulative distribution (pnorm) over x ∈ (-4, 4).]

Normal Distribution
Consider the following integral:

    I = ∫_{-∞}^{∞} e^(-y^2/2) dy

To evaluate this integral requires the following trick: instead of computing I, we proceed to compute I^2:

    I^2 = ∫_{-∞}^{∞} e^(-x^2/2) dx ∫_{-∞}^{∞} e^(-y^2/2) dy
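Before carrying out the polar-coordinate evaluation, we can check the known value of this integral, I = √(2π), numerically. A sketch using the trapezoid rule on the truncated range [-10, 10] (the integrand is below 2e-22 outside this range, so the truncation error is negligible):

```python
import math

# Trapezoid-rule approximation of I = ∫ exp(-y^2/2) dy over [-10, 10].
a, b, n = -10.0, 10.0, 20000
h = (b - a) / n
total = 0.5 * (math.exp(-a * a / 2) + math.exp(-b * b / 2))  # endpoint terms
total += sum(math.exp(-((a + i * h) ** 2) / 2) for i in range(1, n))
total *= h
print(total)  # close to sqrt(2*pi) ≈ 2.5066
```

Because the integrand and all its derivatives essentially vanish at the truncated endpoints, the trapezoid rule is extremely accurate here; even modest n gives many digits of agreement.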
Normal Distribution (continued)
Let us change to polar coordinates, setting x = r sin θ and y = r cos θ:

    I^2 = ∫_0^{2π} ∫_0^{∞} e^(-r^2 (sin^2 θ + cos^2 θ)/2) r dr dθ
        = ∫_0^{2π} ∫_0^{∞} e^(-r^2/2) r dr dθ
        = 2π ∫_0^{∞} e^(-r^2/2) r dr
        = 2π [-e^(-r^2/2)]_0^{∞} = 2π

We have just derived the PDF of the normal distribution, namely

    ∫_{-∞}^{∞} (1/√(2π)) e^(-y^2/2) dy = 1

General Univariate Normal Distribution
Let us introduce the change of variable y = (x - μ)/σ. Then we can easily show that

    ∫_{-∞}^{∞} (1/√(2πσ^2)) e^(-(x-μ)^2/(2σ^2)) dx = 1

We can also show that the mean of this density f is E[f] = μ and its variance is Var[f] = σ^2.

Multivariate Normal Distribution
[Figure: surface plot of a 2D normal density.]

Sum of Random Variables Converges to Gaussian
Let us try to motivate the normal (Gaussian) distribution through a different route. Suppose we consider sums of random variables S_n = X_1 + ... + X_n, where each variable X_i has a finite mean μ and variance σ^2. Define the standardized sum

    S_n* = (S_n - nμ) / (σ √n)

It follows that E[S_n*] = 0 and Var(S_n*) = 1. Central limit theorem: for large n, S_n* is approximately standard normal. More generally, a sum of normally distributed variables is normal; in the multivariate case, projections of normal variables remain normal.

Gamma Function
We now introduce a key distribution based on the gamma function

    Γ(α) = ∫_0^{∞} y^(α-1) e^(-y) dy

Using integration by parts, we can show that Γ(α) = (α - 1) Γ(α - 1). If α is a positive integer > 1, it follows that Γ(α) = (α - 1)!.

Gamma Distribution
Let us introduce the change of variable y = x/β:

    Γ(α) = ∫_0^{∞} (x/β)^(α-1) e^(-x/β) (1/β) dx, so 1 = ∫_0^{∞} (1/(Γ(α) β^α)) x^(α-1) e^(-x/β) dx

Thus the general form of the gamma distribution can be written as

    f_X(x | α, β) = (1/(Γ(α) β^α)) x^(α-1) e^(-x/β), x > 0

The gamma distribution is important because many other distributions emerge as special cases of it. For example, setting α = 1 and β = 1/λ gives us the well-known exponential distribution f(x) = λ e^(-λx). Also, if we set α = r/2 and β = 2, we get the well-known χ^2(r) distribution:

    f(x) = (1/(Γ(r/2) 2^(r/2))) x^(r/2 - 1) e^(-x/2)

Gamma Distribution (continued)
The mean of the gamma distribution can be computed as

    E[X] = ∫_0^{∞} x (1/(Γ(α) β^α)) x^(α-1) e^(-x/β) dx

Notice that the kernel of the above integral is actually a Gamma(α + 1, β) pdf. We can use this fact to write

    E[X] = (Γ(α+1) β^(α+1)) / (Γ(α) β^α) ∫_0^{∞} (1/(Γ(α+1) β^(α+1))) x^α e^(-x/β) dx
         = (Γ(α+1) β^(α+1)) / (Γ(α) β^α) = αβ
The variance of a gamma-distributed random variable is Var(X) = αβ^2.
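Both moment formulas can be checked numerically against the gamma PDF. A sketch with example parameters α = 3, β = 2 (chosen arbitrarily for illustration):

```python
import math

alpha, beta = 3.0, 2.0  # example shape/scale parameters (arbitrary choice)

def gamma_pdf(x):
    # f(x | alpha, beta) = x^(alpha-1) exp(-x/beta) / (Gamma(alpha) beta^alpha)
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

# Midpoint-rule integration over [0, 100]; the tail beyond 100 is negligible here.
n = 100000
h = 100.0 / n
mean = 0.0
second_moment = 0.0
for i in range(n):
    x = (i + 0.5) * h
    w = gamma_pdf(x) * h
    mean += x * w
    second_moment += x * x * w
variance = second_moment - mean ** 2
print(mean)      # should be close to alpha * beta = 6
print(variance)  # should be close to alpha * beta^2 = 12
```

The same integration also confirms the PDF is normalized (sum the weights `w` alone); swapping in other positive α, β just requires pushing the upper integration limit far enough into the tail.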