Review Sheet for CMPSCI 689 at UMass
Review of EM and Mixture Models; Intro to Hidden Markov Models
Sridhar Mahadevan (mahadeva@cs.umass.edu), University of Massachusetts, CMPSCI 689

The General EM Algorithm

- Initialize parameters $\theta^0$.
- Repeat until convergence:
  1. Expectation step: at the $(j+1)$th step, compute the expected complete-data log-likelihood
     $$Q(\theta, \theta^j) = E\big[l_c(\theta \mid X_{obs}, Z) \mid X_{obs}, \theta^j\big] = \sum_Z P(Z \mid X_{obs}, \theta^j) \log P(X \mid \theta).$$
  2. Maximization step: find a new parameter setting
     $$\theta^{j+1} = \arg\max_\theta Q(\theta, \theta^j).$$
  3. Set $j \leftarrow j+1$ and go to step 1.

Mixture Models: Formal Derivation

We now formally derive the EM algorithm for the general case of mixture models:
$$P(x_{obs} \mid \theta) = \sum_{i=1}^{g} \pi_i \, P(x_{obs} \mid \theta_i),$$
where the mixing parameters $\pi_i$ are unknown except that $\sum_{i=1}^{g} \pi_i = 1$. The component mixture densities are also unknown, but let us assume they are normally distributed with parameters $\theta_i = (\mu_i, \sigma_i)$.

Let us now postulate latent binomial indicator variables that specify which mixture each sample came from. The complete dataset is now $X = (X_{obs}, X_{mis})$, where $X_{mis} = (Z_1^T, Z_2^T, \ldots, Z_n^T)$ and $z_{ij}$ is a binary variable that indicates whether sample $x_j$ comes from mixture $i$.

The E-Step

Now we have to compute the log-likelihood of the complete data:
$$l_c(\theta \mid X) = \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \log \pi_i + \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \log P(x_j \mid \theta_i).$$
The missing data here are the $z_{ij}$. Since the complete log-likelihood above is linear in the missing data, the E-step in EM simply corresponds to replacing each $z_{ij}$ with its expected value conditioned on the observation $x_j$ and the old parameter setting $\theta^k$:
$$E[z_{ij} \mid x_j, \theta^k] = \frac{\pi_i^k \, P(x_j \mid \theta_i^k)}{\sum_{l=1}^{g} \pi_l^k \, P(x_j \mid \theta_l^k)}.$$

The M-Step

The M-step corresponds to finding a new parameter setting $\theta^{k+1}$ that maximizes the complete-data log-likelihood; Jensen's inequality assures us that this is a lower bound on the observed-data log-likelihood.

To compute the maximum likelihood over the complete dataset, we introduce a Lagrange multiplier term $\lambda(\sum_{i=1}^{g} \pi_i - 1)$, so that the gradient maximization now becomes
$$\frac{\partial}{\partial \pi_i} \left[ \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \log \pi_i + \lambda \Big(\sum_{i=1}^{g} \pi_i - 1\Big) \right] = \sum_{j=1}^{n} \frac{z_{ij}}{\pi_i} + \lambda,$$
where we have dropped the term that did not depend on $\pi_i$.
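To make the E-step concrete, here is a minimal sketch (not from the slides; the toy parameters and the helper `normal_pdf` are my own) of replacing each $z_{ij}$ by its posterior expectation for a univariate Gaussian mixture:

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def e_step(xs, pis, mus, sigmas):
    """Replace each indicator z_ij by its expectation:
    E[z_ij | x_j, theta^k] = pi_i N(x_j; mu_i, sigma_i^2) / sum_l pi_l N(x_j; mu_l, sigma_l^2)."""
    resp = []
    for x in xs:
        weighted = [p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas)]
        total = sum(weighted)
        resp.append([w / total for w in weighted])
    return resp  # resp[j][i] = E[z_ij | x_j]

# Two well-separated components: a point near 0 is credited to component 0,
# a point near 5 to component 1.
resp = e_step([0.1, 5.0], pis=[0.5, 0.5], mus=[0.0, 5.0], sigmas=[1.0, 1.0])
```

Note that each row of responsibilities sums to one by construction, which is what makes the M-step's weighted counts behave like fractional sample assignments.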
The M-Step (continued)

Setting the above to 0, we first solve for $\lambda$:
$$\sum_{j=1}^{n} z_{ij} = -\lambda \, \pi_i.$$
We use the identities $\sum_{i=1}^{g} \pi_i = 1$ and $\sum_{i=1}^{g} z_{ij} = 1$ in solving for $\lambda$: summing both sides over $i$ gives $n = -\lambda$. Using this value of $\lambda$ in the original gradient gives the desired value for the mixing parameter:
$$\pi_i^{k+1} = \frac{\sum_{j=1}^{n} z_{ij}}{n}.$$

M-Step: Component Densities

How do we estimate the parameters of the component densities? We now turn to the second term in the log-likelihood equation
$$l_c(\pi, \theta \mid X) = \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \log \pi_i + \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \log P(x_j \mid \theta_i).$$
Expanding the second term only (for univariate Gaussian components), we get
$$\frac{\partial l_c(\pi, \theta \mid X)}{\partial \mu_i} = \frac{\partial}{\partial \mu_i} \sum_{j=1}^{n} z_{ij} \left[ \log \frac{1}{\sqrt{2\pi \sigma_i^2}} - \frac{(x_j - \mu_i)^2}{2\sigma_i^2} \right].$$
Solving for $\mu_i$ gives us
$$\mu_i = \frac{\sum_{j=1}^{n} z_{ij} \, x_j}{\sum_{j=1}^{n} z_{ij}}.$$

EM Algorithm for Mixture of Univariate Gaussians

Expectation step: initialize $\theta^0 = (\pi^0, \mu^0, \sigma^0)$, then compute
$$\hat{z}_{ij} = \frac{\pi_i^k \, P(y_j \mid \mu_i^k, \sigma_i^{2,k})}{\sum_{l=1}^{g} \pi_l^k \, P(y_j \mid \mu_l^k, \sigma_l^{2,k})}.$$
Maximization step:
$$\mu_i^{k+1} = \frac{\sum_{j=1}^{n} \hat{z}_{ij} \, y_j}{\sum_{j=1}^{n} \hat{z}_{ij}}, \qquad \sigma_i^{2,k+1} = \frac{\sum_{j=1}^{n} \hat{z}_{ij} \, (y_j - \mu_i^{k+1})^2}{\sum_{j=1}^{n} \hat{z}_{ij}}, \qquad \pi_i^{k+1} = \frac{\sum_{j=1}^{n} \hat{z}_{ij}}{n}.$$

EM Algorithm for Mixture of Bivariate Gaussians

Expectation step: initialize $\theta^0 = (\pi^0, \mu_i^0, \Sigma_i^0)$, then compute
$$\hat{z}_{ij} = \frac{\pi_i^k \, P(y_j \mid \mu_i^k, \Sigma_i^k)}{\sum_{l=1}^{g} \pi_l^k \, P(y_j \mid \mu_l^k, \Sigma_l^k)}.$$
Maximization step:
$$\mu_i^{k+1} = \frac{\sum_{j=1}^{n} \hat{z}_{ij} \, y_j}{\sum_{j=1}^{n} \hat{z}_{ij}}, \qquad \Sigma_i^{k+1} = \frac{\sum_{j=1}^{n} \hat{z}_{ij} \, (y_j - \mu_i^{k+1})(y_j - \mu_i^{k+1})^T}{\sum_{j=1}^{n} \hat{z}_{ij}}.$$

EM for a Two-Mixture Bivariate Gaussian

Generalize the previous expressions from the univariate to the bivariate case. [Figure: the two component densities and the resulting mixture density, with plots of the log-likelihood and the mixing parameter over EM iterations.]

Hidden Markov Models

HMMs can be viewed as extending mixture models to include state transition dynamics. For HMMs, the IID assumption no longer holds at the level of individual samples; however, we will assume that sequences of observations are IID. Later, in hierarchical HMMs, we will relax the IID assumption at the level of subsequences as well.
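Before moving on to HMMs, the univariate-Gaussian EM updates above can be sketched end to end. This is a minimal illustration, not the course's code; the synthetic data, initial guesses, and the variance floor are my own choices:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_gmm(xs, pis, mus, sigmas, iters=50):
    """EM for a mixture of univariate Gaussians, using the slide updates:
    mu_i <- sum_j z_ij x_j / sum_j z_ij,
    sigma_i^2 <- sum_j z_ij (x_j - mu_i)^2 / sum_j z_ij,
    pi_i <- sum_j z_ij / n."""
    n, g = len(xs), len(pis)
    for _ in range(iters):
        # E-step: posterior responsibilities z_ij
        z = []
        for x in xs:
            w = [pis[i] * normal_pdf(x, mus[i], sigmas[i]) for i in range(g)]
            t = sum(w)
            z.append([wi / t for wi in w])
        # M-step: weighted maximum-likelihood estimates
        for i in range(g):
            ni = sum(z[j][i] for j in range(n))
            mus[i] = sum(z[j][i] * xs[j] for j in range(n)) / ni
            var = sum(z[j][i] * (xs[j] - mus[i]) ** 2 for j in range(n)) / ni
            sigmas[i] = max(math.sqrt(var), 1e-6)  # guard against variance collapse
            pis[i] = ni / n
    return pis, mus, sigmas

# Synthetic data: two clusters centered at 0 and 8.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + [random.gauss(8.0, 1.0) for _ in range(200)]
pis, mus, sigmas = em_gmm(data, [0.5, 0.5], [1.0, 7.0], [2.0, 2.0])
```

With well-separated clusters and a reasonable initialization, the recovered means land near the true centers and the mixing weights near 0.5 each.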
Applications of Hidden Markov Models

- Perception: face recognition, gesture recognition, handwriting, speech recognition
- Robot navigation
- Biology: DNA sequence prediction
- Language analysis: part-of-speech tagging
- Smart rooms, wearable devices

Hidden Markov Models: References

The model was developed by Baum in the sixties; the training method was later found to be an instance of the EM framework (the well-known Baum-Welch algorithm). HMMs are instances of dynamic Bayesian networks (DBNs) (Dean et al., 1989; Murphy, 2002).

- L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE 77, no. 2, pp. 257-285, February 1989.
- Fred Jelinek, Statistical Methods in Speech Recognition, MIT Press, 1997.

HMMs for Speech Recognition (Russell and Norvig, 2003)

[Figure: an analog acoustic signal is sampled and quantized into a digital signal, then grouped into frames with features; a three-state phone HMM for [m] (Onset, Mid, End) with transition probabilities and output probabilities over feature labels C1-C7.]

[Figure: a word model with dialect variation.]

Graphical Model Representation of HMMs

[Figure: the HMM as a chain of hidden states $q_0, q_1, \ldots, q_T$, each emitting an observation $y_0, y_1, \ldots, y_T$.]

Note how conditioning on a state $q_t$ d-separates the observation sequence into three categories: the current observation $y_t$, the past $y_0, \ldots, y_{t-1}$, and the future $y_{t+1}, \ldots, y_T$.

HMM Definition

- A finite set of states $Q$, with $|Q| = M$.
- $q_t$ is a multinomial random variable: $q_t^i = 1$ for some particular value $i$, and 0 otherwise.
- An initial distribution $\pi$ on $Q$, where $\pi_i = P(q_0^i = 1)$.
- An observation model $\eta = P(y_t \mid q_t)$, where the space of observations is discrete or continuous (real-valued). We will solve the discrete multinomial case for simplicity: $\eta_{ij}$ is the probability that state $i$ produces observation $j$. (Using the previous week's notes, you can work out the univariate or multivariate Gaussian observation model case.)
- A transition matrix $a_{ij} = P(q_{t+1}^j = 1 \mid q_t^i = 1)$. Stationarity assumption: the transition probabilities are time-invariant.
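As a small sketch of this definition (the particular numbers in $\pi$, $a$, and $\eta$ are my own toy choices, not from the slides), an HMM is just these three tables plus a generative sampling loop:

```python
import random

random.seed(1)

# A discrete HMM as in the definition: initial distribution pi,
# transition matrix a[i][j] = P(q_{t+1} = j | q_t = i),
# observation matrix eta[i][j] = P(y_t = j | q_t = i).
pi  = [0.6, 0.4]
a   = [[0.7, 0.3],
       [0.4, 0.6]]
eta = [[0.9, 0.1],   # state 0 mostly emits symbol 0
       [0.2, 0.8]]   # state 1 mostly emits symbol 1

def draw(dist):
    """Sample an index from a discrete distribution."""
    r, c = random.random(), 0.0
    for i, p in enumerate(dist):
        c += p
        if r < c:
            return i
    return len(dist) - 1

def sample_hmm(T):
    """Generate (states, observations) of length T from the HMM."""
    q = draw(pi)
    states, obs = [q], [draw(eta[q])]
    for _ in range(T - 1):
        q = draw(a[q])          # stationary transition model
        states.append(q)
        obs.append(draw(eta[q]))
    return states, obs

states, obs = sample_hmm(100)
```

Only `obs` would be visible to a learner; `states` is the hidden sequence that the inference algorithms below try to reason about.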
Markov Properties of HMMs

The future is independent of the past given the present:
$$P(q_{t+1} \mid q_t, q_{t-1}, \ldots, q_0) = P(q_{t+1} \mid q_t),$$
$$P(y_0, \ldots, y_T \mid q_t) = P(y_0, \ldots, y_t \mid q_t) \, P(y_{t+1}, \ldots, y_T \mid q_t).$$
At the level of the observations, however, HMMs are more powerful than any finite-memory device:
$$P(y_{t+1} \mid y_t, y_{t-1}, \ldots, y_{t-k}) \neq P(y_{t+1} \mid y_t, \ldots, y_{t-k+1}).$$

Basic Problems in HMMs

- Likelihood: given an observation sequence $Y = y_0, \ldots, y_T$ and a model $\lambda = (A, \eta, \pi)$, determine the likelihood $P(Y \mid \lambda)$.
- Filtering: given an observation sequence $Y_t = y_0, \ldots, y_t$ and a model $\lambda$, determine the belief state $P(q_t \mid Y_t)$.
- Prediction: given $Y_t$ and a model $\lambda$, determine the posterior over a future state, $P(q_s \mid Y_t)$, $s > t$.
- Smoothing: given $Y_t$ and a model $\lambda$, determine the posterior over a previous state, $P(q_s \mid Y_t)$, $s < t$.
- Most likely explanation: given $Y = y_0, \ldots, y_T$ and a model $\lambda$, find the most likely sequence of states given $Y$, that is, compute $\arg\max_{q_0, \ldots, q_T} P(q_0, \ldots, q_T \mid Y)$.
- Learning: find the model parameters such that $\prod_i P(Y^i \mid \lambda)$ is maximized over multiple independent (IID) sequences $Y^i$.

Paths

[Figure: a four-state corridor example with observations Wall/Opening: $P(\mathrm{Wall} \mid 1)$, $P(\mathrm{Wall} \mid 2)$, $P(\mathrm{Opening} \mid 3)$, $P(\mathrm{Wall} \mid 4)$, and transitions $a_{12}, a_{23}, \ldots$]

Inference in HMMs

The complete data for an HMM is the output sequence $y$ produced along with the hidden state sequence $q$ traversed:
$$P(q, y) = P(q_0) \prod_{t=0}^{T-1} P(q_{t+1} \mid q_t) \prod_{t=0}^{T} P(y_t \mid q_t) = \pi_{q_0} \prod_{t=0}^{T-1} \prod_{i=1}^{M} \prod_{j=1}^{M} a_{ij}^{\,q_t^i q_{t+1}^j} \prod_{t=0}^{T} \prod_{i=1}^{M} \prod_{j=1}^{N} \eta_{ij}^{\,q_t^i y_t^j} = \pi_{q_0} \prod_{t=0}^{T-1} a_{q_t, q_{t+1}} \prod_{t=0}^{T} P(y_t \mid q_t).$$

Maximizing the Observed-Data Likelihood is Difficult

The inference problem is computing the probability of a state sequence, $P(q \mid y) = P(q, y)/P(y)$. To get the probability of the output $y$, we have to sum over all possible state sequences, which is intractable done naively:
$$P(y) = \sum_{q_0} \sum_{q_1} \cdots \sum_{q_T} \pi_{q_0} \prod_{t=0}^{T-1} a_{q_t, q_{t+1}} \prod_{t=0}^{T} P(y_t \mid q_t).$$
It is obvious that we cannot easily maximize the observed-data likelihood by analytically solving
$$\arg\max_\theta \, l(\theta \mid y) = \arg\max_\theta \log P(y \mid \theta).$$
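To see why the naive sum is intractable, here is a brute-force likelihood computation that literally enumerates all $M^{T+1}$ state sequences (a sketch with my own toy parameters; it is only feasible for very short sequences):

```python
from itertools import product

pi  = [0.6, 0.4]
a   = [[0.7, 0.3], [0.4, 0.6]]
eta = [[0.9, 0.1], [0.2, 0.8]]

def likelihood_brute_force(y):
    """P(y) = sum over all state sequences q of
    P(q, y) = pi_{q0} * prod_t a_{q_t, q_{t+1}} * prod_t P(y_t | q_t).
    Enumerates M^(T+1) sequences -- exponential in T."""
    M, total = len(pi), 0.0
    for q in product(range(M), repeat=len(y)):
        p = pi[q[0]] * eta[q[0]][y[0]]
        for t in range(1, len(y)):
            p *= a[q[t - 1]][q[t]] * eta[q[t]][y[t]]
        total += p
    return total

y = [0, 1, 0]
p = likelihood_brute_force(y)
```

A sanity check on the construction: summing this likelihood over every possible observation sequence of a fixed length must give exactly 1. The forward algorithm below computes the same quantity in $O(M^2 T)$ instead.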
Structuring Inference in HMMs

We can condition on a particular state $q_t$ to decompose the inference:
$$\gamma(q_t) = P(q_t \mid y) = \frac{P(y \mid q_t) \, P(q_t)}{P(y)} = \frac{P(y_0, \ldots, y_t \mid q_t) \, P(y_{t+1}, \ldots, y_T \mid q_t) \, P(q_t)}{P(y)} = \frac{P(y_0, \ldots, y_t, q_t) \, P(y_{t+1}, \ldots, y_T \mid q_t)}{P(y)} = \frac{\alpha(q_t) \, \beta(q_t)}{P(y)}.$$
Note that $P(y) = \sum_{q_t} \alpha(q_t) \beta(q_t)$.

Computing the Forward Variable

We can define a recursive procedure for computing the forward variables by exploiting the dynamic programming trick of caching intermediate solutions:
$$\begin{aligned}
\alpha(q_{t+1}) &= P(y_0, \ldots, y_{t+1}, q_{t+1}) \\
&= P(y_0, \ldots, y_{t+1} \mid q_{t+1}) \, P(q_{t+1}) \\
&= P(y_0, \ldots, y_t \mid q_{t+1}) \, P(y_{t+1} \mid q_{t+1}) \, P(q_{t+1}) \\
&= P(y_0, \ldots, y_t, q_{t+1}) \, P(y_{t+1} \mid q_{t+1}) \\
&= \sum_{q_t} P(y_0, \ldots, y_t, q_t, q_{t+1}) \, P(y_{t+1} \mid q_{t+1}) \\
&= \sum_{q_t} P(y_0, \ldots, y_t \mid q_t) \, P(q_{t+1} \mid q_t) \, P(q_t) \, P(y_{t+1} \mid q_{t+1}) \\
&= \sum_{q_t} \alpha(q_t) \, a_{q_t, q_{t+1}} \, P(y_{t+1} \mid q_{t+1}).
\end{aligned}$$

The Forward Algorithm

- Initialization: $\alpha(q_0) = P(y_0, q_0) = \pi_{q_0} P(y_0 \mid q_0)$, where $1 \le q_0 \le M$.
- Induction: $\alpha(q_{t+1}) = \sum_{q_t} \alpha(q_t) \, a_{q_t, q_{t+1}} \, P(y_{t+1} \mid q_{t+1})$, where $0 \le t \le T-1$ and $1 \le q_t, q_{t+1} \le M$.
- Time complexity: each step's $\alpha$ computation takes $O(M^2)$, since we have to iterate over all $M$ values of $q_t$ and $q_{t+1}$. In total, the forward algorithm takes $O(M^2 T)$, since we iterate over each time step $t = 1, \ldots, T$.

Computing the Backward Variable

A recursive calculation of the backward variable can be developed as follows:
$$\begin{aligned}
\beta(q_t) &= P(y_{t+1}, \ldots, y_T \mid q_t) \\
&= \sum_{q_{t+1}} P(y_{t+1}, \ldots, y_T, q_{t+1} \mid q_t) \\
&= \sum_{q_{t+1}} P(y_{t+1}, \ldots, y_T \mid q_{t+1}, q_t) \, P(q_{t+1} \mid q_t) \\
&= \sum_{q_{t+1}} P(y_{t+2}, \ldots, y_T \mid q_{t+1}) \, P(y_{t+1} \mid q_{t+1}) \, P(q_{t+1} \mid q_t) \\
&= \sum_{q_{t+1}} \beta(q_{t+1}) \, P(y_{t+1} \mid q_{t+1}) \, a_{q_t, q_{t+1}}.
\end{aligned}$$

The Backward Algorithm

The $\beta$ variables can also be calculated recursively, except that we proceed backwards.
- Initialization: define $\beta(q_T) = 1$.
- Induction: $\beta(q_t) = \sum_{q_{t+1}} \beta(q_{t+1}) \, P(y_{t+1} \mid q_{t+1}) \, a_{q_t, q_{t+1}}$, where $1 \le t \le T-1$ and $1 \le q_t, q_{t+1} \le M$.
- Time complexity: each step's $\beta$ computation takes $O(M^2)$, since we have to iterate over all $M$ values of $q_t$ and $q_{t+1}$; in total, the backward algorithm takes $O(M^2 T)$.
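The forward and backward recursions above can be sketched directly (a minimal illustration with my own toy parameters; no scaling for numerical underflow, which a real implementation would need for long sequences):

```python
pi  = [0.6, 0.4]
a   = [[0.7, 0.3], [0.4, 0.6]]
eta = [[0.9, 0.1], [0.2, 0.8]]

def forward(y):
    """alpha[t][i] = P(y_0..y_t, q_t = i), computed in O(M^2 T)."""
    M = len(pi)
    alpha = [[pi[i] * eta[i][y[0]] for i in range(M)]]
    for t in range(1, len(y)):
        alpha.append([sum(alpha[-1][j] * a[j][i] for j in range(M)) * eta[i][y[t]]
                      for i in range(M)])
    return alpha

def backward(y):
    """beta[t][i] = P(y_{t+1}..y_T | q_t = i), with beta[T][i] = 1."""
    M, T = len(pi), len(y)
    beta = [[1.0] * M]
    for t in range(T - 2, -1, -1):
        prev = beta[0]  # this is beta at time t+1
        beta.insert(0, [sum(a[i][j] * eta[j][y[t + 1]] * prev[j] for j in range(M))
                        for i in range(M)])
    return beta

y = [0, 1, 0, 0]
alpha, beta = forward(y), backward(y)
# The slide's identity: P(y) = sum_i alpha[t][i] * beta[t][i], for ANY t.
py = sum(alpha[0][i] * beta[0][i] for i in range(len(pi)))
```

The identity $P(y) = \sum_{q_t} \alpha(q_t)\beta(q_t)$ holding at every $t$ is a useful correctness check for an implementation.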
Most Likely State

The most likely state at any time $t$ can be readily calculated from the $\alpha$ and $\beta$ variables. We can define the most likely state as
$$q_t^{ML} = \arg\max_{1 \le i \le M} \gamma(q_t^i),$$
where the variable $\gamma(q_t^i)$ is easily calculated as
$$\gamma(q_t^i) = \frac{\alpha(q_t^i) \, \beta(q_t^i)}{\sum_{j=1}^{M} \alpha(q_t^j) \, \beta(q_t^j)}.$$

Most Likely State Sequence

If we concatenate the most likely state at each time $t$ into a sequence, it might not be connected. Another approach is to compute the most likely sequence given the observation sequence $y$ and model $\theta$. Let us define the probability of the most likely sequence that accounts for all observations up to time $t$ and ends in state $q_t^i$ as
$$\delta_t(i) = \max_{q_0, \ldots, q_{t-1}} P(q_0, \ldots, q_{t-1}, q_t^i, y_0, \ldots, y_t \mid \theta).$$
We can define this probability by induction as
$$\delta_{t+1}(i) = \left[ \max_j \delta_t(j) \, a_{ji} \right] P(y_{t+1} \mid q_{t+1}^i).$$

The Viterbi Algorithm

The Viterbi procedure keeps track of both the most likely state at a given time and the probability associated with it, in two arrays:
- Initialization: $\delta_0(i) = \pi_i \, P(y_0 \mid q_0^i)$ and $\psi_0(i) = 0$.
- Recursion: for $1 \le t \le T$ and $1 \le i \le M$,
$$\delta_t(i) = \left[ \max_{1 \le j \le M} \delta_{t-1}(j) \, a_{ji} \right] P(y_t \mid q_t^i), \qquad \psi_t(i) = \arg\max_{1 \le j \le M} \delta_{t-1}(j) \, a_{ji}.$$
- Termination: $P^* = \max_{1 \le i \le M} \delta_T(i)$ and $q_T^* = \arg\max_{1 \le i \le M} \delta_T(i)$.
- Path computation (backtracking): $q_t^* = \psi_{t+1}(q_{t+1}^*)$, for $t = T-1, \ldots, 0$.

Viterbi for Rainy Days (Russell and Norvig, 2003)

[Figure: the umbrella DBN. Transition model: $P(R_t = t \mid R_{t-1} = t) = 0.7$, $P(R_t = t \mid R_{t-1} = f) = 0.3$. Observation model: $P(U_t = t \mid R_t = t) = 0.9$, $P(U_t = t \mid R_t = f) = 0.2$.]

[Figure: the Viterbi lattice over $Rain_1, \ldots, Rain_5 \in \{true, false\}$ for the umbrella observations $(true, true, false, true, true)$, with the most likely path probabilities $m_{1:1}, \ldots, m_{1:5}$.]

Learning the HMM Parameters

The $\alpha$ and $\beta$ terms allow us to estimate the observation model from sample sequences. To estimate the transition matrix, we introduce a new variable:
$$\begin{aligned}
\xi(q_t, q_{t+1}) &= P(q_t, q_{t+1} \mid y) = \frac{P(y \mid q_t, q_{t+1}) \, P(q_{t+1} \mid q_t) \, P(q_t)}{P(y)} \\
&= \frac{P(y_0, \ldots, y_t \mid q_t) \, P(y_{t+1} \mid q_{t+1}) \, P(y_{t+2}, \ldots, y_T \mid q_{t+1}) \, P(q_{t+1} \mid q_t) \, P(q_t)}{P(y)} \\
&= \frac{\alpha(q_t) \, P(y_{t+1} \mid q_{t+1}) \, \beta(q_{t+1}) \, a_{q_t, q_{t+1}}}{P(y)}.
\end{aligned}$$
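The Viterbi recursions above can be sketched on the rainy-days example, using the transition and sensor probabilities from Russell and Norvig (state 0 = rain, observation 0 = umbrella; a uniform prior over the initial state is my own assumption):

```python
# Umbrella world: state 0 = rain, state 1 = no rain;
# observation 0 = umbrella, 1 = no umbrella.
pi  = [0.5, 0.5]
a   = [[0.7, 0.3], [0.3, 0.7]]   # P(R_t | R_{t-1})
eta = [[0.9, 0.1], [0.2, 0.8]]   # P(U_t | R_t)

def viterbi(y):
    """Most likely state sequence via the delta/psi recursions plus backtracking."""
    M = len(pi)
    delta = [[pi[i] * eta[i][y[0]] for i in range(M)]]
    psi = [[0] * M]
    for t in range(1, len(y)):
        d, p = [], []
        for i in range(M):
            # psi_t(i): best predecessor; delta_t(i): best path probability
            best_j = max(range(M), key=lambda j: delta[-1][j] * a[j][i])
            p.append(best_j)
            d.append(delta[-1][best_j] * a[best_j][i] * eta[i][y[t]])
        delta.append(d)
        psi.append(p)
    # Termination, then path computation by backtracking
    q = [max(range(M), key=lambda i: delta[-1][i])]
    for t in range(len(y) - 1, 0, -1):
        q.insert(0, psi[t][q[0]])
    return q

# Evidence: umbrella on days 1, 2, 4, 5 and no umbrella on day 3
path = viterbi([0, 0, 1, 0, 0])  # -> [0, 0, 1, 0, 0]
```

Note the contrast with the "most likely state at each $t$" approach: Viterbi maximizes over whole sequences, so every consecutive pair in the returned path has nonzero transition probability.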
Maximum Likelihood for HMMs

Generally, we want to find the parameters that maximize the log-likelihood $\log P(y \mid \theta)$, where
$$\log P(y \mid \theta) = \log \sum_{q_0} \sum_{q_1} \cdots \sum_{q_T} \pi_{q_0} \prod_{t=0}^{T-1} a_{q_t, q_{t+1}} \prod_{t=0}^{T} P(y_t \mid q_t).$$
As before, we note that this involves taking the log of a sum over exponentially many terms, and it is quite difficult to optimize directly.

Complete Log-Likelihood for HMMs

To simplify the maximum log-likelihood computation, we postulate a complete dataset $(y, q)$, where
$$\begin{aligned}
l_c(\theta \mid q, y) &= \log \left[ \prod_{i=1}^{M} \pi_i^{q_0^i} \prod_{t=0}^{T-1} \prod_{i=1}^{M} \prod_{j=1}^{M} a_{ij}^{\,q_t^i q_{t+1}^j} \prod_{t=0}^{T} \prod_{i=1}^{M} \prod_{j=1}^{N} \eta_{ij}^{\,q_t^i y_t^j} \right] \\
&= \sum_{i=1}^{M} q_0^i \log \pi_i + \sum_{t=0}^{T-1} \sum_{i=1}^{M} \sum_{j=1}^{M} q_t^i q_{t+1}^j \log a_{ij} + \sum_{t=0}^{T} \sum_{i=1}^{M} \sum_{j=1}^{N} q_t^i y_t^j \log \eta_{ij}.
\end{aligned}$$

M-Step of EM for HMMs

We find the ML parameter estimates using partial derivatives:
$$\frac{\partial}{\partial a_{ij}} \left[ \sum_{t=0}^{T-1} \sum_{i=1}^{M} \sum_{j=1}^{M} q_t^i q_{t+1}^j \log a_{ij} + \sum_{i=1}^{M} \lambda_i \Big( \sum_{j=1}^{M} a_{ij} - 1 \Big) \right] = \sum_{t=0}^{T-1} \frac{q_t^i q_{t+1}^j}{a_{ij}} + \lambda_i = 0,$$
which, exploiting that $\sum_j a_{ij} = 1$, gives
$$\hat{a}_{ij} = \frac{\sum_{t=0}^{T-1} q_t^i q_{t+1}^j}{\sum_{j=1}^{M} \sum_{t=0}^{T-1} q_t^i q_{t+1}^j}.$$
Similar calculations for the other parameters give us
$$\hat{\eta}_{ij} = \frac{\sum_{t=0}^{T} q_t^i y_t^j}{\sum_{t=0}^{T} q_t^i} \quad \text{(exploiting that } \textstyle\sum_j \eta_{ij} = 1\text{)}, \qquad \hat{\pi}_i = q_0^i.$$
These estimates are intuitive, since, for example, $\hat{\eta}_{ij}$ is simply counting the fraction of times observation $j$ occurs when the state is $i$.

E-Step for HMMs

Recall that the E-step replaces the values of the missing data by their expected values conditioned on the observed data and the old parameter values:
$$E[q_t^i \mid y, \theta^k] = P(q_t^i = 1 \mid y, \theta^k) = \gamma_t^i,$$
$$E[q_t^i q_{t+1}^j \mid y, \theta^k] = P(q_t^i = 1, q_{t+1}^j = 1 \mid y, \theta^k) = \xi_t^{ij}.$$

M-Step: Re-estimation of HMM Parameters

Putting the expected-value estimates into the earlier ML formulas, for $N$ IID training sequences, gives us
$$\hat{\eta}_{ij} = \frac{\sum_{n=1}^{N} \sum_{t=0}^{T} \gamma_t^{i,n} \, y_t^{j,n}}{\sum_{n=1}^{N} \sum_{t=0}^{T} \gamma_t^{i,n}}, \qquad \hat{a}_{ij} = \frac{\sum_{n=1}^{N} \sum_{t=0}^{T-1} \xi_t^{ij,n}}{\sum_{n=1}^{N} \sum_{t=0}^{T-1} \gamma_t^{i,n}}, \qquad \hat{\pi}_i = \frac{1}{N} \sum_{n=1}^{N} \gamma_0^{i,n}.$$
These equations can be readily generalized to the case when the observation model is a univariate or multivariate Gaussian instead of a multinomial.

Extensions of HMMs

- Embedded HMMs: each state is viewed as a sequence of HMMs.
- Semi-Markov HMMs: state durations are not exponential but arbitrary.
- Hierarchical HMMs: multi-level tree-structured models, which are a special case of probabilistic context-free grammars.
- Abstract Markov Models: HHMMs with state-mediated transitions.
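Returning to the re-estimation formulas, one Baum-Welch (EM) iteration for a single training sequence can be sketched as follows (a minimal unscaled illustration; the toy parameters and the single-sequence setup are my own assumptions):

```python
# One Baum-Welch iteration for a discrete HMM: compute gamma and xi from
# alpha/beta, then re-estimate pi, a, eta from the expected counts.
pi  = [0.6, 0.4]
a   = [[0.7, 0.3], [0.4, 0.6]]
eta = [[0.9, 0.1], [0.2, 0.8]]

def forward(y, pi, a, eta):
    M = len(pi)
    alpha = [[pi[i] * eta[i][y[0]] for i in range(M)]]
    for t in range(1, len(y)):
        alpha.append([sum(alpha[-1][j] * a[j][i] for j in range(M)) * eta[i][y[t]]
                      for i in range(M)])
    return alpha

def backward(y, a, eta):
    M, T = len(a), len(y)
    beta = [[1.0] * M]
    for t in range(T - 2, -1, -1):
        prev = beta[0]
        beta.insert(0, [sum(a[i][j] * eta[j][y[t + 1]] * prev[j] for j in range(M))
                        for i in range(M)])
    return beta

def baum_welch_step(y, pi, a, eta):
    M, T = len(pi), len(y)
    alpha, beta = forward(y, pi, a, eta), backward(y, a, eta)
    py = sum(alpha[-1][i] for i in range(M))  # likelihood under the CURRENT params
    # E-step: gamma[t][i] = P(q_t=i | y), xi[t][i][j] = P(q_t=i, q_{t+1}=j | y)
    gamma = [[alpha[t][i] * beta[t][i] / py for i in range(M)] for t in range(T)]
    xi = [[[alpha[t][i] * a[i][j] * eta[j][y[t + 1]] * beta[t + 1][j] / py
            for j in range(M)] for i in range(M)] for t in range(T - 1)]
    # M-step: re-estimation formulas from the slides (single sequence, N = 1)
    new_pi = gamma[0][:]
    new_a = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(M)] for i in range(M)]
    K = len(eta[0])
    new_eta = [[sum(gamma[t][i] for t in range(T) if y[t] == k) /
                sum(gamma[t][i] for t in range(T)) for k in range(K)] for i in range(M)]
    return new_pi, new_a, new_eta, py

y = [0, 0, 1, 0, 1, 1, 0, 0]
pi2, a2, eta2, py_old = baum_welch_step(y, pi, a, eta)
_, _, _, py_new = baum_welch_step(y, pi2, a2, eta2)
```

Because Baum-Welch is an instance of EM, the likelihood of the training sequence under the re-estimated parameters is never lower than under the old ones, and each re-estimated row remains a proper distribution.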
Embedded HMMs

[Figure: an embedded HMM; this model can be viewed as a mixture of simple HMM models.]

Hierarchical HMMs

Fine, Singer, and Tishby, "The Hierarchical Hidden Markov Model," Machine Learning, 1998.

[Figure: a hierarchical HMM for corridor navigation, with an observation model for each abstract state over the observations Front, Left, Back, and Right, each taking values Wall or Opening.]

Using Hierarchical HMMs in Robot Navigation

The actual model used extends the hierarchical HMM to include temporally extended actions, like "exit corridor" (a hierarchical POMDP). The observation vectors (Front, Left, Right, Back) take values such as Wall, Opening, Door on Right, and Stripe on Right Wall. See Theocharous, Rohanimanesh, and Mahadevan (ICRA 2001) and Theocharous, Murphy, and Kaelbling (IJCAI 2003) for details.

Foveal Face Recognition using HMMs

Minut, Mahadevan, Henderson, and Dyer, "Face Recognition using Foveal Vision," in Lecture Notes in Computer Science: Biologically Motivated Computer Vision (Seong-Whan Lee, Heinrich H. Bulthoff, and Tomaso Poggio, editors), vol. 1811, pp. 424-433, Springer-Verlag, 2000.

Comparing Sliding Windows with Foveation

A competing approach slides a window of fixed size down the face (Samaria, PhD thesis, Cambridge). [Figure: average recognition rate versus number of HMM states for foveal versus subsampled classifiers.]

Abstract Hidden Markov Model

Bui, Venkatesh, and West (JAIR; IJCAI 2003). [Figure: a two-level abstract HMM.]