Computing For Life Sciences (CS 59000)
These 20 pages of class notes were uploaded by Nick Rowe on Saturday, September 19, 2015. The notes belong to CS 59000 at Purdue University, taught by Yuan Qi in Fall. Since the upload, they have received 19 views. For similar materials, see /class/208074/cs-59000-purdue-university in Computer Science at Purdue University.
CS 59000 Statistical Machine Learning, Lecture 24
Yuan (Alan) Qi, Purdue CS, November 20, 2008

Outline
- Review of K-medoids
- Mixture of Gaussians
- Expectation Maximization (EM)
- Alternative view of EM
- Hidden Markov Models: forward-backward algorithm, EM for learning HMM parameters, Viterbi algorithm
- Linear state space models: Kalman filtering and smoothing

K-medoids Algorithm
Euclidean distance has limitations: it is inappropriate for categorical labels, and cluster means are non-robust to outliers. Instead, use a more general dissimilarity measure $\mathcal{V}(x, x')$ and the distortion measure
$$\tilde{J} = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \mathcal{V}(x_n, \mu_k),$$
which gives the K-medoids algorithm.

Mixture of Gaussians
$$p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
Introduce latent variables $z$ with $p(z_k = 1) = \pi_k$. The marginal distribution is
$$p(x) = \sum_{z} p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k).$$
The conditional probability
$$\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{p(z_k = 1)\, p(x \mid z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1)\, p(x \mid z_j = 1)} = \frac{\pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$
is the responsibility that component $k$ takes for explaining the observation.

Maximum Likelihood
Maximize the log likelihood function
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \Big\}.$$
[Figure: graphical representation of a Gaussian mixture model for a set of N i.i.d. data points $x_n$ with corresponding latent variables $z_n$, $n = 1, \ldots, N$.]

Severe Overfitting by Maximum Likelihood
When a cluster has only one data point, its variance goes to 0.

Maximum Likelihood Conditions (1)
Setting the derivative of $\ln p(X \mid \pi, \mu, \Sigma)$ with respect to $\mu_k$ to zero gives
$$0 = -\sum_{n=1}^{N} \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}\, \Sigma_k^{-1} (x_n - \mu_k),$$
from which
$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk}).$$

Maximum Likelihood Conditions (2)
Setting the derivative of $\ln p(X \mid \pi, \mu, \Sigma)$ with respect to $\Sigma_k$ to zero gives
$$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^{\mathsf T}.$$

Maximum Likelihood Conditions (3)
Form the Lagrange function
$$\ln p(X \mid \pi, \mu, \Sigma) + \lambda \Big( \sum_{k=1}^{K} \pi_k - 1 \Big).$$
Setting its derivative to zero and using the normalization constraint, we obtain
$$\pi_k = \frac{N_k}{N}.$$

Expectation Maximization for Mixtures of Gaussians
Although the previous conditions do not provide closed-form solutions, we can use them to construct iterative updates:
- E step: compute the responsibilities $\gamma(z_{nk})$.
- M step: compute the new means $\mu_k$, covariances $\Sigma_k$, and mixing coefficients $\pi_k$.
Loop over the E and M steps until the log likelihood $\ln p(X \mid \mu, \Sigma, \pi)$ stops increasing.

General EM Algorithm
Given a joint distribution $p(X, Z \mid \theta)$ over observed variables $X$ and latent variables $Z$, governed by parameters $\theta$, the goal is to maximize the likelihood function $p(X \mid \theta)$ with respect to $\theta$.
1. Choose an initial setting for the parameters $\theta^{\text{old}}$.
2. E step: evaluate $p(Z \mid X, \theta^{\text{old}})$.
3. M step: evaluate $\theta^{\text{new}}$ given by
$$\theta^{\text{new}} = \arg\max_{\theta} Q(\theta, \theta^{\text{old}}), \qquad Q(\theta, \theta^{\text{old}}) = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta).$$
4. Check for convergence of either the log likelihood or the parameter values. If the convergence criterion is not satisfied, let $\theta^{\text{old}} \leftarrow \theta^{\text{new}}$ and return to step 2.

EM and Jensen's Inequality
Goal: maximize
$$p(X \mid \theta) = \sum_{Z} p(X, Z \mid \theta).$$
Decompose the log likelihood as
$$\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p),$$
where
$$\mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}, \qquad \mathrm{KL}(q \,\|\, p) = -\sum_{Z} q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)}.$$
From Jensen's inequality we see that $\mathcal{L}(q, \theta)$ is a lower bound of $\ln p(X \mid \theta)$.

Lower Bound
$\mathcal{L}(q, \theta)$ is a functional of the distribution $q$. Since $\mathrm{KL}(q \,\|\, p) \ge 0$ and $\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p)$, $\mathcal{L}(q, \theta)$ is a lower bound of the log likelihood function $\ln p(X \mid \theta)$. This is another way to see the lower bound without using Jensen's inequality.

Lower Bound Perspective of EM
Expectation step: maximize the functional lower bound $\mathcal{L}(q, \theta)$ over the distribution $q$.
Maximization step: maximize the lower bound over the parameters $\theta$.

Illustration of EM Updates
[Figure: the log likelihood $\ln p(X \mid \theta)$ and the successive lower bounds at $\theta^{\text{old}}$ and $\theta^{\text{new}}$.]
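To make the E-step/M-step cycle concrete, here is a minimal NumPy/SciPy sketch of EM for a Gaussian mixture using the update equations above. The function name em_gmm, the random initialization, and the small ridge added to each covariance (a guard against the collapsing-variance problem noted earlier) are illustrative choices, not part of the lecture.

```python
# Minimal EM sketch for a Gaussian mixture, following the updates derived above.
# Names (em_gmm, X, K, n_iters) and the initialization scheme are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(N, K, replace=False)]        # init means at random data points
    Sigma = np.stack([np.eye(D)] * K)              # init covariances at identity
    pi = np.full(K, 1.0 / K)                       # equal mixing coefficients
    log_lik_old = -np.inf

    for _ in range(n_iters):
        # E step: gamma(z_nk) = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)
        dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                                for k in range(K)])          # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M step: re-estimate parameters from the responsibilities
        Nk = gamma.sum(axis=0)                                # N_k = sum_n gamma(z_nk)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N

        # Stop once the log likelihood no longer increases appreciably
        log_lik = np.log(dens.sum(axis=1)).sum()
        if log_lik - log_lik_old < 1e-6:
            break
        log_lik_old = log_lik
    return pi, mu, Sigma, gamma
```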
Sequential Data
[Figure: spectrogram and waveform of the spoken words "Bayes' Theorem": frequency (Hz) and amplitude versus time (sec), with phoneme labels.]
There are temporal dependences between data points.

Markov Models
By the chain rule, a joint distribution can be rewritten as
$$p(x_1, \ldots, x_N) = \prod_{n=1}^{N} p(x_n \mid x_1, \ldots, x_{n-1}).$$
Assuming the conditional independence
$$p(x_n \mid x_1, \ldots, x_{n-1}) = p(x_n \mid x_{n-1}),$$
we have
$$p(x_1, \ldots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1}),$$
which is known as a first-order Markov chain (a small code sketch follows below).

High-Order Markov Chains
The second-order Markov assumption gives
$$p(x_1, \ldots, x_N) = p(x_1)\, p(x_2 \mid x_1) \prod_{n=3}^{N} p(x_n \mid x_{n-1}, x_{n-2}).$$
This can be generalized to higher-order Markov chains, but the number of parameters explodes exponentially with the order.
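As a small illustration of the first-order factorization above, the following sketch evaluates the joint log probability of a discrete state sequence and samples from the chain. The two-state initial distribution and transition matrix are made-up values, not from the notes.

```python
# Minimal sketch of a discrete first-order Markov chain, illustrating
# p(x_1, ..., x_N) = p(x_1) * prod_{n=2}^N p(x_n | x_{n-1}).
# The numbers below are made-up example values.
import numpy as np

init = np.array([0.6, 0.4])              # p(x_1) over states {0, 1}
trans = np.array([[0.7, 0.3],            # trans[i, j] = p(x_n = j | x_{n-1} = i)
                  [0.2, 0.8]])

def log_joint(seq, init, trans):
    """Log of p(x_1) * prod_{n=2}^N p(x_n | x_{n-1}) for a state sequence."""
    logp = np.log(init[seq[0]])
    for prev, cur in zip(seq[:-1], seq[1:]):
        logp += np.log(trans[prev, cur])
    return logp

def sample(N, init, trans, seed=0):
    """Draw a length-N state sequence from the chain."""
    rng = np.random.default_rng(seed)
    seq = [rng.choice(len(init), p=init)]
    for _ in range(N - 1):
        seq.append(rng.choice(trans.shape[1], p=trans[seq[-1]]))
    return seq

print(log_joint([0, 0, 1, 1], init, trans))   # log(0.6 * 0.7 * 0.3 * 0.8)
print(sample(5, init, trans))
```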