# 621 Note for STAT 597A at PSU

These 62-page class notes were uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Pennsylvania State University taught in the Fall. Since the upload, they have received 36 views.


## Hidden Markov Model

*Jia Li, Department of Statistics, The Pennsylvania State University* — http://www.stat.psu.edu/~jiali

### Connection with Mixture Models

- Hidden Markov models have a close connection with mixture models.
- A mixture model generates each sample independently: a hidden component is drawn first, and the observation is then drawn from that component's distribution.
- For sequence or spatial data, the assumption of independent samples is too constrained.
- The statistical dependence among samples may bear critical information.
- Examples: speech signals, genomic sequences.

### Setup

- Suppose we have sequential data $u = (u_1, u_2, \dots, u_t, \dots, u_T)$, $u_t \in \mathbb{R}^d$.
- As in the mixture model, every $u_t$, $t = 1, \dots, T$, is generated by a hidden state $s_t$.
- The underlying states follow a Markov chain: given the present, the future is independent of the past,
$$P(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = P(s_{t+1} \mid s_t).$$
- Transition probabilities: $a_{kl} = P(s_{t+1} = l \mid s_t = k)$, $k, l = 1, \dots, M$, where $M$ is the total number of states.
- Initial probabilities of states: $\pi_k$, with $\sum_{k=1}^{M} \pi_k = 1$ and $\sum_{l=1}^{M} a_{kl} = 1$ for any $k$.
- The probability of a state sequence:
$$P(s_1, s_2, \dots, s_T) = P(s_1)\,P(s_2 \mid s_1)\,P(s_3 \mid s_2) \cdots P(s_T \mid s_{T-1}) = \pi_{s_1} a_{s_1 s_2} a_{s_2 s_3} \cdots a_{s_{T-1} s_T}.$$
- Given the state $s_t$, the observation $u_t$ is independent of the other observations and states.
- For a fixed state, the observation $u_t$ is generated according to a fixed probability law.
- Given state $k$, the probability law of $u$ is specified by $b_k(u)$:
  - Discrete: suppose $u$ takes finitely many possible values; $b_k(u)$ is specified by a pmf (probability mass function).
  - Continuous: most often the Gaussian distribution is assumed,
$$b_k(u) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \exp\left(-\tfrac{1}{2}(u - \mu_k)^{\top} \Sigma_k^{-1} (u - \mu_k)\right).$$
- In summary,
$$P(u, s) = P(s)\,P(u \mid s) = \pi_{s_1} b_{s_1}(u_1)\, a_{s_1 s_2} b_{s_2}(u_2) \cdots a_{s_{T-1} s_T} b_{s_T}(u_T),$$
$$P(u) = \sum_{s} P(s)\,P(u \mid s) \ \ \text{(total probability formula)} \ = \sum_{s} \pi_{s_1} b_{s_1}(u_1) \prod_{t=2}^{T} a_{s_{t-1} s_t} b_{s_t}(u_t).$$

### Example

- Suppose we have a video sequence and would like to automatically decide whether a speaker is in a frame.
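The generative process above (draw a state path from the Markov chain, then draw each observation from that state's distribution) can be sketched as follows. The 2-state Gaussian parameters below are illustrative assumptions, not values from the notes.

```python
# Sketch: sampling (s, u) from a 2-state Gaussian HMM (toy parameters assumed).
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])                 # initial probabilities pi_k
A = np.array([[0.9, 0.1],                 # transition probabilities a_{kl}
              [0.2, 0.8]])
mu = np.array([0.0, 3.0])                 # state-conditional Gaussian means mu_k
sigma = np.array([1.0, 1.0])              # state-conditional standard deviations

def sample_hmm(T):
    """Draw s_1..s_T from the Markov chain, then u_t ~ N(mu_{s_t}, sigma_{s_t}^2)."""
    s = np.empty(T, dtype=int)
    s[0] = rng.choice(2, p=pi)
    for t in range(1, T):
        s[t] = rng.choice(2, p=A[s[t - 1]])   # next state depends only on s_{t-1}
    u = rng.normal(mu[s], sigma[s])           # emission given the state
    return s, u

s, u = sample_hmm(10)
```

Note how the only coupling between time steps is through the state path `s`; given the states, the observations are drawn independently, exactly as in the factorization of $P(u, s)$ above.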
- Two underlying states: with a speaker (state 1) vs. without a speaker (state 2).
- From frame $1$ to $T$, let $s_t$, $t = 1, \dots, T$, denote whether there is a speaker in frame $t$.
- It does not seem appropriate to assume that the $s_t$'s are independent. We may assume the state sequence follows a Markov chain.
- If one frame contains a speaker, it is highly likely that the next frame also contains a speaker, because of the strong frame-to-frame dependence. On the other hand, a frame without a speaker is much more likely to be followed by another frame without a speaker.
- For a computer program, the states are unknown. Only features can be extracted for each frame. The features are the observations, which can be organized into a vector.
- The goal is to figure out the state sequence given the observed sequence of feature vectors.
- We expect the probability distribution of the feature vector to differ according to the state. However, these distributions may overlap, causing classification errors. By using the dependence among states, we may make better guesses of the states than guessing each state separately using only the feature vector of that frame.

### Model Estimation

- Parameters involved:
  - Transition probabilities $a_{kl}$, $k, l = 1, \dots, M$.
  - Initial probabilities $\pi_k$, $k = 1, \dots, M$.
  - For each state $k$: $\mu_k$, $\Sigma_k$.

#### Definitions

- Under a given set of parameters, let $L_k(t)$ be the conditional probability of being in state $k$ at position $t$, given the entire observed sequence $u = (u_1, u_2, \dots, u_T)$:
$$L_k(t) = P(s_t = k \mid u) = \sum_{s:\ s_t = k} P(s \mid u).$$
- Under a given set of parameters, let $H_{kl}(t)$ be the conditional probability of being in state $k$ at position $t$ and in state $l$ at position $t + 1$, i.e., seeing a transition from $k$ to $l$ at $t$, given the entire observed sequence $u$:
$$H_{kl}(t) = P(s_t = k,\, s_{t+1} = l \mid u) = \sum_{s:\ s_t = k,\ s_{t+1} = l} P(s \mid u).$$
- Note that $L_k(t) = \sum_{l=1}^{M} H_{kl}(t)$ and $\sum_{k=1}^{M} L_k(t) = 1$.

#### Maximum likelihood estimation by EM

- E step: under the current set of parameters, compute $L_k(t)$ and $H_{kl}(t)$ for $k, l = 1, \dots, M$, $t = 1, \dots, T$.
- M step: update the parameters,
$$\mu_k = \frac{\sum_{t=1}^{T} L_k(t)\, u_t}{\sum_{t=1}^{T} L_k(t)}, \qquad \Sigma_k = \frac{\sum_{t=1}^{T} L_k(t)\,(u_t - \mu_k)(u_t - \mu_k)^{\top}}{\sum_{t=1}^{T} L_k(t)},$$
$$a_{kl} = \frac{\sum_{t=1}^{T-1} H_{kl}(t)}{\sum_{t=1}^{T-1} L_k(t)}.$$
- Note: the initial probabilities of states $\pi_k$ are often manually determined. We can also estimate them by
$$\pi_k \propto \sum_{t=1}^{T} L_k(t), \quad \sum_{k=1}^{M} \pi_k = 1, \qquad \text{or} \qquad \pi_k = L_k(1).$$

### Comparison with the Mixture Model

- $L_k(t)$ plays the same role as the posterior probability of a component (state) given the observation in a mixture model, i.e.,
$$p_{t,k} = P(s_t = k \mid u_t).$$
- If we view a mixture model as a special hidden Markov model with the underlying state process being iid (a reduced Markov chain), then $p_{t,k}$ is exactly $L_k(t)$.
- The posterior probabilities $p_{t,k}$ in the mixture model can be determined using only the sample $u_t$, because of the independent-sample assumption. In contrast, $L_k(t)$ depends on the entire sequence, because of the underlying Markov process.
- For a mixture model we have
$$\mu_k = \frac{\sum_{t=1}^{T} p_{t,k}\, u_t}{\sum_{t=1}^{T} p_{t,k}}, \qquad \Sigma_k = \frac{\sum_{t=1}^{T} p_{t,k}\,(u_t - \mu_k)(u_t - \mu_k)^{\top}}{\sum_{t=1}^{T} p_{t,k}}.$$

### Derivation from EM

- The incomplete data are $u = \{u_t,\ t = 1, \dots, T\}$. The complete data are $x = \{s_t, u_t,\ t = 1, \dots, T\}$.
- Note $Q(\theta \mid \theta') = E\left[\log f(x \mid \theta) \mid u, \theta'\right]$.
- Let $\mathcal{M} = \{1, 2, \dots, M\}$.
- The function $f(x \mid \theta)$ is
$$f(x \mid \theta) = P(s \mid \theta)\,P(u \mid s, \theta) = \pi_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} \times \prod_{t=1}^{T} P(u_t \mid \mu_{s_t}, \Sigma_{s_t}).$$
- We then have
$$\log f(x \mid \theta) = \log \pi_{s_1} + \sum_{t=2}^{T} \log a_{s_{t-1} s_t} + \sum_{t=1}^{T} \log P(u_t \mid \mu_{s_t}, \Sigma_{s_t}).$$
- Taking the expectation,
$$E\left[\log f(x \mid \theta) \mid u, \theta'\right] = \sum_{s} P(s \mid u, \theta') \left[\log \pi_{s_1} + \sum_{t=2}^{T} \log a_{s_{t-1} s_t} + \sum_{t=1}^{T} \log P(u_t \mid \mu_{s_t}, \Sigma_{s_t})\right]$$
$$= \sum_{k=1}^{M} L_k(1) \log \pi_k + \sum_{t=2}^{T} \sum_{k=1}^{M} \sum_{l=1}^{M} H_{kl}(t-1) \log a_{kl} + \sum_{t=1}^{T} \sum_{k=1}^{M} L_k(t) \log P(u_t \mid \mu_k, \Sigma_k).$$
- Proof of the equality for the second term:
$$\sum_{s} P(s \mid u, \theta') \sum_{t=2}^{T} \log a_{s_{t-1} s_t} = \sum_{k=1}^{M} \sum_{l=1}^{M} \sum_{t=2}^{T} H_{kl}(t-1) \log a_{kl}.$$
Similar proofs apply to the equalities corresponding to the other terms.
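The definitions of $L_k(t)$ above can be checked directly by brute force on a tiny discrete HMM: enumerate all $M^T$ state sequences, accumulate $P(u, s)$, and normalize by $P(u)$. This is only feasible for toy sizes (the forward-backward algorithm, described later, is the practical method); the parameters below are assumptions for illustration.

```python
# Brute-force check of L_k(t) = P(s_t = k | u) for a tiny discrete HMM.
import itertools
import numpy as np

pi = np.array([0.5, 0.5])        # initial probabilities (toy assumption)
A = np.array([[0.7, 0.3],        # transition probabilities a_{kl}
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # b_k(j): emission pmf over J = 2 symbols
              [0.2, 0.8]])
u = [0, 1, 1]                    # observed symbol sequence, T = 3
T, M = len(u), 2

def joint(s):
    """P(u, s) = pi_{s_1} b_{s_1}(u_1) * prod_{t>=2} a_{s_{t-1} s_t} b_{s_t}(u_t)."""
    p = pi[s[0]] * B[s[0], u[0]]
    for t in range(1, T):
        p *= A[s[t - 1], s[t]] * B[s[t], u[t]]
    return p

# P(u) = sum over all M^T state sequences of P(u, s)
Pu = sum(joint(s) for s in itertools.product(range(M), repeat=T))

# Accumulate P(u, s) into L[k, t] for every sequence passing through state k at t
L = np.zeros((M, T))
for s in itertools.product(range(M), repeat=T):
    for t in range(T):
        L[s[t], t] += joint(s)
L /= Pu                          # L[k, t] = P(s_t = k | u)
```

As the notes state, the columns of `L` each sum to one, since $\sum_{k=1}^{M} L_k(t) = 1$ for every $t$.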
- The maximization of the above expectation gives the update formulas in the M step.
- Note that the three terms can be maximized separately: the optimization of $(\mu_k, \Sigma_k)$ is separated from that of $a_{kl}$ and $\pi_k$, and the optimization of $a_{kl}$ can be separated for different $k$.
- The optimization of $\mu_k$ and $\Sigma_k$ is the same as for the mixture model, with $p_{t,k}$ replaced by $L_k(t)$.

### Forward-Backward Algorithm

- The forward-backward algorithm is used to compute $L_k(t)$ and $H_{kl}(t)$ efficiently.
- The amount of computation needed is on the order of $M^2 T$. The memory required is on the order of $MT$.
- Define the forward probability $\alpha_k(t)$ as the joint probability of observing the first $t$ vectors $u_\tau$, $\tau = 1, \dots, t$, and being in state $k$ at time $t$:
$$\alpha_k(t) = P(u_1, u_2, \dots, u_t,\ s_t = k).$$
- This probability can be evaluated by the following recursive formula:
$$\alpha_k(1) = \pi_k\, b_k(u_1), \quad 1 \le k \le M,$$
$$\alpha_k(t+1) = b_k(u_{t+1}) \sum_{l=1}^{M} \alpha_l(t)\, a_{lk}, \quad 1 \le t \le T - 1,\ 1 \le k \le M.$$
- Proof:
$$\begin{aligned}
\alpha_k(t) &= P(u_1, u_2, \dots, u_t,\ s_t = k) \\
&= \sum_{l=1}^{M} P(u_1, \dots, u_t,\ s_t = k,\ s_{t-1} = l) \\
&= \sum_{l=1}^{M} P(u_1, \dots, u_{t-1},\ s_{t-1} = l)\, P(u_t,\ s_t = k \mid s_{t-1} = l,\ u_1, \dots, u_{t-1}) \\
&= \sum_{l=1}^{M} \alpha_l(t-1)\, P(u_t,\ s_t = k \mid s_{t-1} = l) \\
&= \sum_{l=1}^{M} \alpha_l(t-1)\, P(u_t \mid s_t = k)\, P(s_t = k \mid s_{t-1} = l) \\
&= b_k(u_t) \sum_{l=1}^{M} \alpha_l(t-1)\, a_{lk}.
\end{aligned}$$
The fourth equality comes from the fact that, given $s_{t-1}$, the pair $(u_t, s_t)$ is independent of all $s_\tau$, $\tau = 1, \dots, t-2$, and hence of $u_\tau$, $\tau = 1, \dots, t-2$; it is also independent of $u_{t-1}$ since $s_{t-1}$ is given.
- Define the backward probability $\beta_k(t)$ as the conditional probability of observing the vectors after time $t$, $u_\tau$, $\tau = t+1, \dots, T$, given that the state at time $t$ is $k$:
$$\beta_k(t) = P(u_{t+1}, \dots, u_T \mid s_t = k), \quad 1 \le t \le T - 1.$$
Set $\beta_k(T) = 1$ for all $k$.
- As with the forward probability, the backward probability can be evaluated using the following recursion:
$$\beta_k(T) = 1,$$
$$\beta_k(t) = \sum_{l=1}^{M} a_{kl}\, b_l(u_{t+1})\, \beta_l(t+1), \quad 1 \le t \le T - 1.$$
- Proof:
$$\begin{aligned}
\beta_k(t) &= P(u_{t+1}, \dots, u_T \mid s_t = k) \\
&= \sum_{l=1}^{M} P(u_{t+1}, \dots, u_T,\ s_{t+1} = l \mid s_t = k) \\
&= \sum_{l=1}^{M} P(s_{t+1} = l \mid s_t = k)\, P(u_{t+1}, \dots, u_T \mid s_{t+1} = l,\ s_t = k) \\
&= \sum_{l=1}^{M} a_{kl}\, P(u_{t+1}, \dots, u_T \mid s_{t+1} = l) \\
&= \sum_{l=1}^{M} a_{kl}\, P(u_{t+1} \mid s_{t+1} = l)\, P(u_{t+2}, \dots, u_T \mid s_{t+1} = l,\ u_{t+1}) \\
&= \sum_{l=1}^{M} a_{kl}\, b_l(u_{t+1})\, \beta_l(t+1).
\end{aligned}$$
- The probabilities $L_k(t)$ and $H_{kl}(t)$ are solved by
$$L_k(t) = \frac{P(u,\ s_t = k)}{P(u)} = \frac{\alpha_k(t)\, \beta_k(t)}{P(u)},$$
$$H_{kl}(t) = \frac{P(u,\ s_t = k,\ s_{t+1} = l)}{P(u)} = \frac{\alpha_k(t)\, a_{kl}\, b_l(u_{t+1})\, \beta_l(t+1)}{P(u)}.$$
- Proof for $L_k(t)$:
$$\begin{aligned}
P(u,\ s_t = k) &= P(u_1, \dots, u_T,\ s_t = k) \\
&= P(u_1, \dots, u_t,\ s_t = k)\, P(u_{t+1}, \dots, u_T \mid s_t = k,\ u_1, \dots, u_t) \\
&= \alpha_k(t)\, P(u_{t+1}, \dots, u_T \mid s_t = k) \\
&= \alpha_k(t)\, \beta_k(t).
\end{aligned}$$
- Proof for $H_{kl}(t)$:
$$\begin{aligned}
P(u,\ s_t = k,\ s_{t+1} = l) &= P(u_1, \dots, u_t,\ s_t = k)\, P(u_{t+1}, \dots, u_T,\ s_{t+1} = l \mid s_t = k,\ u_1, \dots, u_t) \\
&= \alpha_k(t)\, P(s_{t+1} = l \mid s_t = k)\, P(u_{t+1}, \dots, u_T \mid s_{t+1} = l,\ s_t = k) \\
&= \alpha_k(t)\, a_{kl}\, P(u_{t+1} \mid s_{t+1} = l)\, P(u_{t+2}, \dots, u_T \mid s_{t+1} = l,\ u_{t+1}) \\
&= \alpha_k(t)\, a_{kl}\, b_l(u_{t+1})\, \beta_l(t+1).
\end{aligned}$$
- Note that the amount of computation for $L_k(t)$ and $H_{kl}(t)$, $k, l = 1, \dots, M$, $t = 1, \dots, T$, is on the order of $M^2 T$.
- Note that
$$P(u) = \sum_{k=1}^{M} \alpha_k(t)\, \beta_k(t) \quad \text{for any } t.$$
In particular, if we let $t = T$,
$$P(u) = \sum_{k=1}^{M} \alpha_k(T)\, \beta_k(T) = \sum_{k=1}^{M} \alpha_k(T).$$
- Proof:
$$P(u) = \sum_{k=1}^{M} P(u,\ s_t = k) = \sum_{k=1}^{M} \alpha_k(t)\, \beta_k(t).$$

### The Estimation Algorithm

The estimation algorithm iterates the following steps:

- Compute the forward and backward probabilities $\alpha_k(t)$, $\beta_k(t)$, $k = 1, \dots, M$, $t = 1, \dots, T$, under the current set of parameters:
$$\alpha_k(1) = \pi_k\, b_k(u_1), \qquad \alpha_k(t+1) = b_k(u_{t+1}) \sum_{l=1}^{M} \alpha_l(t)\, a_{lk},$$
$$\beta_k(T) = 1, \qquad \beta_k(t) = \sum_{l=1}^{M} a_{kl}\, b_l(u_{t+1})\, \beta_l(t+1).$$
- Compute $L_k(t)$ and $H_{kl}(t)$ using $\alpha_k(t)$, $\beta_k(t)$. Let $P(u) = \sum_{k=1}^{M} \alpha_k(T)$:
$$L_k(t) = \frac{\alpha_k(t)\, \beta_k(t)}{P(u)}, \qquad H_{kl}(t) = \frac{\alpha_k(t)\, a_{kl}\, b_l(u_{t+1})\, \beta_l(t+1)}{P(u)}.$$
- Update the parameters using $L_k(t)$ and $H_{kl}(t)$:
$$\mu_k = \frac{\sum_{t=1}^{T} L_k(t)\, u_t}{\sum_{t=1}^{T} L_k(t)}, \qquad \Sigma_k = \frac{\sum_{t=1}^{T} L_k(t)\,(u_t - \mu_k)(u_t - \mu_k)^{\top}}{\sum_{t=1}^{T} L_k(t)}, \qquad a_{kl} = \frac{\sum_{t=1}^{T-1} H_{kl}(t)}{\sum_{t=1}^{T-1} L_k(t)}.$$

### Multiple Sequences

- If we estimate an HMM using multiple sequences, the previous estimation algorithm can be extended naturally.
- For brevity, let's assume all the sequences are of length $T$. Denote the $i$th sequence by $u^{(i)} = (u_1^{(i)}, u_2^{(i)}, \dots, u_T^{(i)})$, $i = 1, \dots, N$.
- In each iteration:
  - Compute the forward and backward probabilities $\alpha_k^{(i)}(t)$, $\beta_k^{(i)}(t)$, $k = 1, \dots, M$, $t = 1, \dots, T$, $i = 1, \dots, N$, for each sequence separately, in the same way as previously described.
  - Compute $L_k^{(i)}(t)$ and $H_{kl}^{(i)}(t)$ separately for each sequence, also in the same way as previously described. Let $P(u^{(i)}) = \sum_{k=1}^{M} \alpha_k^{(i)}(T)$:
$$L_k^{(i)}(t) = \frac{\alpha_k^{(i)}(t)\, \beta_k^{(i)}(t)}{P(u^{(i)})}, \qquad H_{kl}^{(i)}(t) = \frac{\alpha_k^{(i)}(t)\, a_{kl}\, b_l(u_{t+1}^{(i)})\, \beta_l^{(i)}(t+1)}{P(u^{(i)})}.$$
  - Update the parameters similarly, pooling over the sequences:
$$\mu_k = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} L_k^{(i)}(t)\, u_t^{(i)}}{\sum_{i=1}^{N} \sum_{t=1}^{T} L_k^{(i)}(t)}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} L_k^{(i)}(t)\,(u_t^{(i)} - \mu_k)(u_t^{(i)} - \mu_k)^{\top}}{\sum_{i=1}^{N} \sum_{t=1}^{T} L_k^{(i)}(t)},$$
$$a_{kl} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T-1} H_{kl}^{(i)}(t)}{\sum_{i=1}^{N} \sum_{t=1}^{T-1} L_k^{(i)}(t)}.$$

### HMM with Discrete Data

- Given a state $k$, the distribution of the data $u$ is discrete, specified by a pmf.
- Assume $u \in \{1, 2, \dots, J\}$. Denote $b_k(j) = q_{k,j}$, $j = 1, \dots, J$.
- Parameters in the HMM: $\pi_k$, $a_{kl}$, and $q_{k,j}$, $k, l = 1, \dots, M$, $j = 1, \dots, J$.
- Model estimation by the following iteration:
  - Compute the forward and backward probabilities $\alpha_k(t)$, $\beta_k(t)$. Note that $b_k(u_t) = q_{k, u_t}$.
  - Compute $L_k(t)$ and $H_{kl}(t)$ using $\alpha_k(t)$, $\beta_k(t)$.
  - Update the parameters as follows:
$$a_{kl} = \frac{\sum_{t=1}^{T-1} H_{kl}(t)}{\sum_{t=1}^{T-1} L_k(t)}, \quad k, l = 1, \dots, M,$$
$$q_{k,j} = \frac{\sum_{t=1}^{T} L_k(t)\, I(u_t = j)}{\sum_{t=1}^{T} L_k(t)}, \quad k = 1, \dots, M,\ j = 1, \dots, J.$$

### Viterbi Algorithm

- In many applications using HMMs, we need to predict the state sequence $s = (s_1, \dots, s_T)$ based on the observed data $u = (u_1, \dots, u_T)$.
- Optimization criterion: find $s^*$ that maximizes $P(s \mid u)$:
$$s^* = \arg\max_{s} P(s \mid u) = \arg\max_{s} \frac{P(s, u)}{P(u)} = \arg\max_{s} P(s, u).$$
This criterion is called the rule of Maximum A Posteriori (MAP).
- The optimal sequence $s^* = (s_1^*, s_2^*, \dots, s_T^*)$ can be found by the Viterbi algorithm. The amount of computation in the Viterbi algorithm is on the order of $M^2 T$. The memory required is on the order of $MT$.
- The Viterbi algorithm maximizes an objective function $G(s)$, where $s = (s_1, \dots, s_T)$, $s_t \in \{1, \dots, M\}$, is a state sequence and $G(s)$ has a special property.
- Brute-force optimization of $G(s)$ involves an exhaustive search over all the $M^T$ possible sequences.
- Property of $G(s)$:
$$G(s) = g_1(s_1) + g_2(s_2, s_1) + g_3(s_3, s_2) + \cdots + g_T(s_T, s_{T-1}).$$
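The forward-backward recursions in the estimation algorithm above can be sketched for a discrete HMM as follows (the toy parameters are assumptions for illustration). Two identities from the notes can then be checked numerically: $\sum_k L_k(t) = 1$ and $P(u) = \sum_k \alpha_k(t)\,\beta_k(t)$ for any $t$.

```python
# Forward-backward sketch for a discrete HMM (toy parameters assumed).
# alpha[k, t] = P(u_1..u_t, s_t = k);  beta[k, t] = P(u_{t+1}..u_T | s_t = k).
import numpy as np

pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # b_k(j): emission pmf over two symbols
              [0.2, 0.8]])
u = [0, 1, 1, 0]
T, M = len(u), 2

alpha = np.zeros((M, T))
beta = np.ones((M, T))           # beta_k(T) = 1 for all k

alpha[:, 0] = pi * B[:, u[0]]                          # alpha_k(1) = pi_k b_k(u_1)
for t in range(1, T):
    alpha[:, t] = B[:, u[t]] * (alpha[:, t - 1] @ A)   # b_k(u_t) * sum_l alpha_l(t-1) a_{lk}
for t in range(T - 2, -1, -1):
    beta[:, t] = A @ (B[:, u[t + 1]] * beta[:, t + 1]) # sum_l a_{kl} b_l(u_{t+1}) beta_l(t+1)

Pu = alpha[:, T - 1].sum()       # P(u) = sum_k alpha_k(T)
L = alpha * beta / Pu            # L_k(t) = alpha_k(t) beta_k(t) / P(u)
```

In practice (long sequences), the recursions are run with per-step scaling or in the log domain to avoid numerical underflow; the plain form above mirrors the formulas in the notes.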
- The key is that the objective function can be written as a sum of merit functions, each depending on one state and its preceding one.
- This is a Markovian kind of property. Suppose in the optimal state sequence $s^*$ the $t$th position is $s_t^* = k$. To maximize $G(s_1, s_2, \dots, s_T)$, we can maximize the following two functions separately:
$$G_t^{-}(k;\ s_1, \dots, s_{t-1}) = g_1(s_1) + g_2(s_2, s_1) + \cdots + g_t(k, s_{t-1}),$$
$$G_t^{+}(k;\ s_{t+1}, \dots, s_T) = g_{t+1}(s_{t+1}, k) + \cdots + g_T(s_T, s_{T-1}).$$
The first function involves only states before $t$, and the second only states after $t$.
- Also note the recursion
$$G_t^{-}(k;\ s_1, \dots, s_{t-1}) = G_{t-1}^{-}(s_{t-1};\ s_1, \dots, s_{t-2}) + g_t(k, s_{t-1}).$$
- Every state sequence $s$ corresponds to a path from $t = 1$ to $t = T$. We put weight $g_t(k, l)$ on the link from state $l$ at $t - 1$ to state $k$ at $t$; at the starting node, we put weight $g_1(k)$ on state $k$. $G(s)$ is the sum of the weights on the links in the path.
- *(Figure omitted: a trellis of $M$ states over $t = 1, \dots, 6$ with the optimal path highlighted.)* Suppose the colored path is the optimal one, and at $t = 3$ this path passes through state 2. Then the sub-path before $t = 3$ should be the best among all paths from $t = 1$ to $t = 3$ that end at state 2, and the sub-path after $t = 3$ should be the best among all paths from $t = 3$ to $t = 6$ that start at state 2.

#### Pseudocode

- At $t = 1$: for each node (state) $k = 1, \dots, M$, record $G_1^{*}(k) = g_1(k)$.
- At $t = 2$: for each node $k = 1, \dots, M$, we only need to record which node is the best preceding one. Suppose node $k$ is linked to node $l_k^{*}$ at $t = 1$; record $l_k^{*}$ and
$$G_2^{*}(k) = \max_{l = 1, \dots, M} \left[G_1^{*}(l) + g_2(k, l)\right] = G_1^{*}(l_k^{*}) + g_2(k, l_k^{*}).$$
- The same procedure is applied successively for $t = 2, 3, \dots, T$. At every node, link it to its best preceding one and set
$$G_t^{*}(k) = \max_{l = 1, \dots, M} \left[G_{t-1}^{*}(l) + g_t(k, l)\right] = G_{t-1}^{*}(l_k^{*}) + g_t(k, l_k^{*}).$$
$G_t^{*}(k)$ is the sum of weights of the best path up to $t$ with the end tied at state $k$, and $l_k^{*}$ is the best preceding state. Record $l_k^{*}$ and $G_t^{*}(k)$.
- At the end, only $M$ paths are formed, each ending with a different state at $t = T$. The objective function for a path ending at node $k$ is $G_T^{*}(k)$. Pick $k^{*}$ that maximizes $G_T^{*}(k)$, then trace the path backwards from the last state $k^{*}$.

### Proof for the Viterbi Algorithm

Notation:

- Let $\tilde{s}(t, k)$ be the sequence $(s_1, \dots, s_{t-1})$ that maximizes $G_t^{-}(k;\ s_1, \dots, s_{t-1})$:
$$\tilde{s}(t, k) = \arg\max_{s_1, \dots, s_{t-1}} G_t^{-}(k;\ s_1, \dots, s_{t-1}),$$
and let $G_t^{-}(k;\ \tilde{s}(t, k)) = \max_{s_1, \dots, s_{t-1}} G_t^{-}(k;\ s_1, \dots, s_{t-1})$.
- Let $\hat{s}(t, k)$ be the sequence $(s_{t+1}, \dots, s_T)$ that maximizes $G_t^{+}(k;\ s_{t+1}, \dots, s_T)$:
$$\hat{s}(t, k) = \arg\max_{s_{t+1}, \dots, s_T} G_t^{+}(k;\ s_{t+1}, \dots, s_T).$$

Key facts for proving the Viterbi algorithm:

- If the optimal state sequence $s^*$ has last state $s_T^* = k$, then the subsequence of $s^*$ from $1$ to $T - 1$ should be $\tilde{s}(T, k)$, and
$$\max_{s} G(s) = G_T^{-}(k;\ \tilde{s}(T, k)).$$
- Since we don't know what $s_T^*$ should be, we compare all the possible states $k = 1, \dots, M$:
$$\max_{s} G(s) = \max_{k} G_T^{-}(k;\ \tilde{s}(T, k)).$$
- $G_t^{-}(k;\ \tilde{s}(t, k))$ and $\tilde{s}(t, k)$ can be obtained recursively for $t = 1, \dots, T$.

Proof for the recursion:

- Suppose $G_{t-1}^{-}(k;\ \tilde{s}(t-1, k))$ and $\tilde{s}(t-1, k)$ for $k = 1, \dots, M$ have been obtained. For any $l = 1, \dots, M$,
$$\begin{aligned}
G_t^{-}(l;\ \tilde{s}(t, l)) &= \max_{s_1, \dots, s_{t-1}} G_t^{-}(l;\ s_1, \dots, s_{t-1}) \\
&= \max_{k} \max_{s_1, \dots, s_{t-2}} G_t^{-}(l;\ s_1, \dots, s_{t-2}, k) \\
&= \max_{k} \max_{s_1, \dots, s_{t-2}} \left[G_{t-1}^{-}(k;\ s_1, \dots, s_{t-2}) + g_t(l, k)\right] \\
&= \max_{k} \left[g_t(l, k) + \max_{s_1, \dots, s_{t-2}} G_{t-1}^{-}(k;\ s_1, \dots, s_{t-2})\right] \\
&= \max_{k} \left[g_t(l, k) + G_{t-1}^{-}(k;\ \tilde{s}(t-1, k))\right].
\end{aligned}$$
- Suppose $k^{*}$ achieves the maximum, that is, $k^{*} = \arg\max_{k} \left[g_t(l, k) + G_{t-1}^{-}(k;\ \tilde{s}(t-1, k))\right]$. Then $\tilde{s}(t, l) = (\tilde{s}(t-1, k^{*}),\ k^{*})$; that is, for $\tilde{s}(t, l)$, the last state is $s_{t-1} = k^{*}$ and the subsequence from position $1$ to $t - 2$ is $\tilde{s}(t-1, k^{*})$.
- The amount of computation involved in deciding $G_t^{-}(l;\ \tilde{s}(t, l))$ and $\tilde{s}(t, l)$ for all $l = 1, \dots, M$ is on the order of $M^2$: for each $l$, we have to exhaust the $M$ possible $k$'s to find $k^{*}$.
- To start the recursion, we have $G_1^{-}(k) = g_1(k)$. Note that at $t = 1$ there is no preceding state.

### Optimal State Sequence for HMM

- We want to find the optimal state sequence
$$s^* = \arg\max_{s} P(s, u) = \arg\max_{s} \log P(s, u).$$
- The objective function is
$$\begin{aligned}
G(s) &= \log P(s, u) = \log\left[\pi_{s_1} b_{s_1}(u_1)\, a_{s_1 s_2} b_{s_2}(u_2) \cdots a_{s_{T-1} s_T} b_{s_T}(u_T)\right] \\
&= \log \pi_{s_1} + \log b_{s_1}(u_1) + \sum_{t=2}^{T}\left[\log a_{s_{t-1} s_t} + \log b_{s_t}(u_t)\right].
\end{aligned}$$
- If we define
$$g_1(s_1) = \log \pi_{s_1} + \log b_{s_1}(u_1), \qquad g_t(s_t, s_{t-1}) = \log a_{s_{t-1} s_t} + \log b_{s_t}(u_t),$$
then $G(s) = g_1(s_1) + \sum_{t=2}^{T} g_t(s_t, s_{t-1})$. Hence the Viterbi algorithm can be applied.

### Viterbi Training

- Viterbi training for an HMM resembles the classification EM estimation of a mixture model: replace the soft classification reflected by $L_k(t)$ and $H_{kl}(t)$ with hard classification.
- In particular:
  - Replace the step of computing forward and backward probabilities by selecting the optimal state sequence $s^*$ under the current parameters, using the Viterbi algorithm.
  - Let $L_k(t) = I(s_t^* = k)$; i.e., $L_k(t)$ equals 1 when the optimal state sequence is in state $k$ at $t$, and zero otherwise. Similarly, let $H_{kl}(t) = I(s_t^* = k)\, I(s_{t+1}^* = l)$.
  - Update the parameters using $L_k(t)$ and $H_{kl}(t)$ and the same formulas.

### Applications

#### Speech recognition

- Goal: identify spoken words from speech signals. Examples: automatic voice recognition systems used by airline companies; automatic stock price reporting.
- Raw data: voice amplitude sampled at discrete time spots (a time sequence).
- Input data: speech feature vectors computed at the sampling times.
- *(Figure omitted: speech waveforms for "State College" and "San Francisco" with extracted feature vectors.)*
- Methodology:
  - Estimate a Hidden Markov Model (HMM) for each word, e.g., "State College", "San Francisco", "Pittsburgh". The training provides a dictionary of models $\mathcal{M}_1, \mathcal{M}_2, \dots, \mathcal{M}_W$.
  - For a new word, find the HMM that yields the maximum likelihood. Denote the sequence of feature vectors extracted from the voice signal by $u = (u_1, \dots, u_T)$. Classify to word $w^{*}$ if $\mathcal{M}_{w^{*}}$ maximizes $P(u \mid \mathcal{M}_w)$.
  - Recall that $P(u) = \sum_{k=1}^{M} \alpha_k(T)$, where the $\alpha_k(T)$ are the forward probabilities at $t = T$, computed using the parameters specified by $\mathcal{M}_w$.
- In the above example, the HMM is used for profiling. Similar ideas have been applied to genomic sequence analysis, e.g., profiling families of protein sequences by HMMs.

#### Supervised learning (image classification)

- Training data: the original images and their manually labeled segmentations.
- Associate each block in the image with a class label. A block is the element of interest for learning.
- At each block, compute a feature vector that is anticipated to reflect the difference between the two classes (man-made vs. natural).
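The Viterbi pseudocode above, combined with the log-domain merit functions $g_1$ and $g_t$ just defined, can be sketched as follows. The toy discrete-HMM parameters are assumptions for illustration; `back` stores the best preceding state $l_k^{*}$ at each step.

```python
# Log-domain Viterbi sketch: G[k, t] is the best log P(s_1..s_t, u_1..u_t)
# over paths ending in state k at time t (toy parameters assumed).
import numpy as np

pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # b_k(j): emission pmf over two symbols
              [0.2, 0.8]])
u = [0, 0, 1, 1]
T, M = len(u), 2

logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
G = np.zeros((M, T))
back = np.zeros((M, T), dtype=int)

G[:, 0] = logpi + logB[:, u[0]]                 # g_1(k) = log pi_k + log b_k(u_1)
for t in range(1, T):
    cand = G[:, t - 1][:, None] + logA          # cand[l, k] = G_{t-1}(l) + log a_{lk}
    back[:, t] = cand.argmax(axis=0)            # best preceding state for each k
    G[:, t] = cand.max(axis=0) + logB[:, u[t]]  # add log b_k(u_t)

s = np.zeros(T, dtype=int)                      # trace the best path backwards
s[-1] = G[:, -1].argmax()
for t in range(T - 2, -1, -1):
    s[t] = back[s[t + 1], t + 1]
```

With these parameters, state 0 favors symbol 0 and state 1 favors symbol 1, so for `u = [0, 0, 1, 1]` the decoded path switches states mid-sequence, which can be verified by brute-force enumeration of all $2^4$ paths.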
- For the purpose of classification, each image is an array of feature vectors whose true classes are known in training.
- If we ignore the spatial dependence among the blocks, an image becomes a collection of independent samples $u_1, u_2, \dots, u_T$. For the training data, we know the true classes $z_1, \dots, z_T$, so any classification algorithm can be applied. Mixture discriminant analysis: model each class by a mixture model.
- What if we want to take spatial dependence into consideration? Use a hidden Markov model. (A 2-D HMM would be even better.)
- Assume each class contains several states. The underlying states follow a Markov chain. We need to scan the image in a certain way, say row by row or zigzag.
- This HMM is an extension of mixture discriminant analysis with spatial dependence taken into consideration.
- Details:
  - Suppose we have $M$ states, each belonging to a certain class. Use $C(k)$ to denote the class that state $k$ belongs to. If a block is in a certain class, it can only exist in one of the states that belong to its class.
  - Train the HMM using the feature vectors $u_1, u_2, \dots, u_T$ and their classes $z_1, z_2, \dots, z_T$. There are some minor modifications to the training algorithm described before, since no class labels were involved there.
  - For a test image, find the optimal sequence of states $(s_1^*, s_2^*, \dots, s_T^*)$ with maximum a posteriori probability (MAP), using the Viterbi algorithm.
  - Map the state sequence into classes: $\hat{z}_t = C(s_t^*)$.

#### Unsupervised learning

- Since a mixture model can be used for clustering, an HMM can be used for the same purpose. The difference lies in the fact that the HMM takes spatial dependence into consideration.
- For a given number of states, fit an HMM to the sequential data.
- Find the optimal state sequence $s^*$ by the Viterbi algorithm.
- Each state represents a cluster.
- Examples: image segmentation, etc.
