
# 620 Class Note for STAT 597C at PSU


These 33 pages of class notes were uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Pennsylvania State University taught by a professor in the Fall. Since their upload, they have received 32 views.


Date Created: 02/06/15

## Mixture Models

Jia Li
Department of Statistics, The Pennsylvania State University
Email: jiali@psu.edu
http://www.stat.psu.edu/~jiali

### Outline: Clustering by Mixture Models

- General background on clustering
- Example method: k-means
- Mixture-model-based clustering
- Model estimation

### Clustering

- A basic tool in data mining / pattern recognition.
- Divide a set of data into groups.
- Samples in one cluster are close, and clusters are far apart.

*(Figure: scatter plot of a data set divided into well-separated clusters.)*

### Motivations

- Discover classes of data in an unsupervised way (unsupervised learning).
- Efficient representation of data: fast retrieval, data complexity reduction.
- Various engineering purposes: tightly linked with pattern recognition.

### Approaches to Clustering

- Represent samples by feature vectors.
- Define a distance measure to assess the closeness between data.
- Closeness can be measured in many ways:
  - Define distance based on various norms.
  - For gene expression levels in a set of microarray data, closeness between genes may be measured by the Euclidean distance between the gene profile vectors, or by correlation.
- Define an objective function to assess the quality of clustering and optimize it (purely computational).
  - Clustering can be performed based merely on pairwise distances; how each sample is represented does not come into the picture.
- Statistical model-based clustering.

### K-means

- Assume there are $M$ clusters with centroids $Z = \{z_1, z_2, \dots, z_M\}$.
- Each training sample is assigned to one of the clusters. Denote the assignment function by $\eta(\cdot)$; $\eta(i) = j$ means the $i$-th training sample is assigned to the $j$-th cluster.
- Goal: minimize the total mean squared error between the training samples and their representative cluster centroids, that is, the trace of the pooled within-cluster covariance matrix:

$$(Z^*, \eta^*) = \arg\min_{Z,\,\eta} \sum_{i=1}^{N} \| x_i - z_{\eta(i)} \|^2 .$$

- Denote the objective function by

$$D(Z, \eta) = \sum_{i=1}^{N} \| x_i - z_{\eta(i)} \|^2 .$$

- Intuition: training samples are tightly clustered around the centroids; hence, the centroids serve as a compact representation for the training data.

### Necessary Conditions

- If $Z$ is fixed, the optimal assignment function $\eta(\cdot)$ should follow the nearest-neighbor rule, that is, $\eta(i) = \arg\min_{j \in \{1, \dots, M\}} \| x_i - z_j \|$.
- If $\eta(\cdot)$ is fixed, the cluster centroid $z_j$ should be the average of all the samples assigned to the $j$-th cluster:

$$z_j = \frac{\sum_{i:\,\eta(i)=j} x_i}{N_j},$$

where $N_j$ is the number of samples assigned to cluster $j$.

### The Algorithm

Based on the necessary conditions, the k-means algorithm alternates between two steps:

- For a fixed set of centroids, optimize $\eta(\cdot)$ by assigning each sample to its closest centroid using the Euclidean distance.
- Update the centroids by computing the average of all the samples assigned to each of them.

The algorithm converges, since after each iteration the objective function is non-increasing. It usually converges fast. Stopping criterion: the ratio between the decrease and the objective function is below a threshold.

### Mixture-Model-Based Clustering

- Each cluster is mathematically represented by a parametric distribution. Examples: Gaussian (continuous), Poisson (discrete).
- The entire data set is modeled by a mixture of these distributions.
- An individual distribution used to model a specific cluster is often referred to as a component distribution.

Suppose there are $K$ components (clusters). Each component is a Gaussian distribution parameterized by $(\mu_k, \Sigma_k)$. Denote the data by $X$, $X \in \mathbb{R}^d$. The density of component $k$ is

$$f_k(x) = \phi(x \mid \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^d \,|\Sigma_k|}} \exp\left( -\frac{(x - \mu_k)^t \,\Sigma_k^{-1} (x - \mu_k)}{2} \right).$$

The prior probability (weight) of component $k$ is $a_k$.
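The two alternating k-means steps described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the notes; the interface (initial centroids passed in explicitly) and the tolerance value are my own assumptions.

```python
import numpy as np

def kmeans(x, z0, n_iter=100, tol=1e-6):
    """Minimal k-means: alternate nearest-centroid assignment and centroid updates.

    x  : (N, d) data matrix.
    z0 : (M, d) initial centroids (e.g., M distinct training samples).
    Returns the final centroids z and the assignment function eta.
    """
    z = np.asarray(z0, dtype=float).copy()
    M = len(z)
    d_old = np.inf
    for _ in range(n_iter):
        # Assignment step: eta(i) = argmin_j ||x_i - z_j||^2.
        dist = ((x[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
        eta = dist.argmin(axis=1)
        # Objective D(Z, eta): total squared error of the current assignment.
        d = dist[np.arange(len(x)), eta].sum()
        # Update step: each centroid becomes the mean of the samples
        # assigned to it (an empty cluster keeps its old centroid).
        z = np.array([x[eta == j].mean(axis=0) if np.any(eta == j) else z[j]
                      for j in range(M)])
        # Stopping criterion: relative decrease of the objective below tol.
        if d_old - d <= tol * max(d, 1e-12):
            break
        d_old = d
    return z, eta
```

Since the objective is non-increasing under both steps, the loop terminates once the relative decrease falls below the threshold, mirroring the stopping criterion stated above.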
The mixture density is

$$f(x) = \sum_{k=1}^{K} a_k f_k(x) = \sum_{k=1}^{K} a_k \,\phi(x \mid \mu_k, \Sigma_k).$$

### Advantages

A mixture model with a high likelihood tends to have the following traits:

- Component distributions have high "peaks" (data in one cluster are tight).
- The mixture model "covers" the data well (dominant patterns in the data are captured by component distributions).

Further advantages:

- Well-studied statistical inference techniques are available.
- Flexibility in choosing the component distributions.
- A density estimate is obtained for each cluster.
- A "soft" classification is available.

*(Figure: a histogram of one-dimensional data overlaid with the density estimated by a mixture model.)*

### EM Algorithm

- The parameters are estimated by the maximum likelihood (ML) criterion using the EM algorithm:
  - A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
- The EM algorithm provides an iterative computation of maximum likelihood estimates when the observed data are incomplete.
- Incompleteness can be conceptual:
  - We need to estimate the distribution of $X$ in sample space $\mathcal{X}$, but we can only observe $X$ indirectly through $Y$ in sample space $\mathcal{Y}$.
  - In many cases there is a mapping $x \rightarrow y(x)$ from $\mathcal{X}$ to $\mathcal{Y}$, and $x$ is only known to lie in a subset of $\mathcal{X}$, denoted by $\mathcal{X}(y)$, which is determined by the equation $y = y(x)$.
  - The distribution of $X$ is parameterized by a family of distributions $f(x \mid \theta)$, with parameters $\theta \in \Omega$, on $\mathcal{X}$. The distribution of $y$, $g(y \mid \theta)$, is

$$g(y \mid \theta) = \int_{\mathcal{X}(y)} f(x \mid \theta)\, dx.$$

- The EM algorithm aims at finding a $\theta$ that maximizes $g(y \mid \theta)$ given an observed $y$.
- Introduce the function

$$Q(\theta' \mid \theta) = E\big(\log f(x \mid \theta') \mid y, \theta\big),$$

that is, the expected value of $\log f(x \mid \theta')$ according to the conditional distribution of $x$ given $y$ and parameter $\theta$. The expectation is assumed to exist for all pairs $(\theta', \theta)$. In particular, it is assumed that $f(x \mid \theta) > 0$ for $\theta \in \Omega$.
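As a concrete check of the component and mixture density formulas, here is a small sketch (not from the notes; the function names are my own) that evaluates $\phi(x \mid \mu_k, \Sigma_k)$ and $f(x) = \sum_k a_k \phi(x \mid \mu_k, \Sigma_k)$ at a single point:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density phi(x | mu, Sigma) at a single point x."""
    d = len(mu)
    diff = x - mu
    # Quadratic form (x - mu)^t Sigma^{-1} (x - mu), via a linear solve.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

def mixture_pdf(x, a, mus, sigmas):
    """Mixture density f(x) = sum_k a_k * phi(x | mu_k, Sigma_k)."""
    return sum(a_k * gaussian_pdf(x, mu_k, s_k)
               for a_k, mu_k, s_k in zip(a, mus, sigmas))
```

The weights `a` must sum to 1 so that the mixture density integrates to 1.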
- EM iteration:
  - E-step: compute $Q(\theta \mid \theta^{(p)})$.
  - M-step: choose $\theta^{(p+1)}$ to be a value of $\theta \in \Omega$ that maximizes $Q(\theta \mid \theta^{(p)})$.

### EM for the Mixture of Normals

- Observed data (incomplete): $x_1, x_2, \dots, x_n$, where $n$ is the sample size. Denote all the samples collectively by $x$.
- Complete data: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, where $y_i$ is the cluster (component) identity of sample $x_i$.
- The collection of parameters, $\theta$, includes $a_k$, $\mu_k$, $\Sigma_k$, $k = 1, 2, \dots, K$.
- The likelihood function is

$$L(x \mid \theta) = \sum_{i=1}^{n} \log\left( \sum_{k=1}^{K} a_k \,\phi(x_i \mid \mu_k, \Sigma_k) \right).$$

- $L(x \mid \theta)$ is the objective function of the EM algorithm (to maximize). The numerical difficulty comes from the sum inside the log.
- The $Q$ function is

$$
\begin{aligned}
Q(\theta' \mid \theta) &= E\left[ \log \prod_{i=1}^{n} a'_{y_i}\,\phi(x_i \mid \mu'_{y_i}, \Sigma'_{y_i}) \,\middle|\, x, \theta \right] \\
&= E\left[ \sum_{i=1}^{n} \left( \log a'_{y_i} + \log \phi(x_i \mid \mu'_{y_i}, \Sigma'_{y_i}) \right) \,\middle|\, x, \theta \right] \\
&= \sum_{i=1}^{n} E\left[ \log a'_{y_i} + \log \phi(x_i \mid \mu'_{y_i}, \Sigma'_{y_i}) \,\middle|\, x_i, \theta \right].
\end{aligned}
$$

  The last equality comes from the fact that the samples are independent.
- Note that when $x_i$ is given, only $y_i$ is random in the complete data $(x_i, y_i)$. Also, $y_i$ only takes a finite number of values, i.e., the cluster identities $1$ to $K$. The distribution of $Y$ given $X = x_i$ is the posterior probability of $Y$ given $X$.
- Denote the posterior probability of $Y = k$, $k = 1, \dots, K$, given $x_i$ by $p_{i,k}$. By the Bayes formula,

$$p_{i,k} \propto a_k \,\phi(x_i \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} p_{i,k} = 1.$$

- Then each summand in $Q(\theta' \mid \theta)$ is

$$E\left[ \log a'_{y_i} + \log \phi(x_i \mid \mu'_{y_i}, \Sigma'_{y_i}) \,\middle|\, x_i, \theta \right] = \sum_{k=1}^{K} p_{i,k} \log a'_k + \sum_{k=1}^{K} p_{i,k} \log \phi(x_i \mid \mu'_k, \Sigma'_k).$$

- Note that we cannot see the direct effect of $\theta$ in the above equation, but the $p_{i,k}$'s are computed using $\theta$, i.e., the current parameters ($\theta'$ includes the updated parameters).
- We then have

$$Q(\theta' \mid \theta) = \sum_{i=1}^{n} \sum_{k=1}^{K} p_{i,k} \log a'_k + \sum_{i=1}^{n} \sum_{k=1}^{K} p_{i,k} \log \phi(x_i \mid \mu'_k, \Sigma'_k).$$

- Note that the prior probabilities $a'_k$ and the parameters of the Gaussian components $(\mu'_k, \Sigma'_k)$ can be optimized separately.
- The $a'_k$'s are subject to $\sum_{k=1}^{K} a'_k = 1$. Basic optimization theory shows that the $a'_k$'s are optimized by

$$a'_k = \frac{\sum_{i=1}^{n} p_{i,k}}{n}.$$
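The Bayes-formula posteriors $p_{i,k}$ can be computed by evaluating $a_k \phi(x_i \mid \mu_k, \Sigma_k)$ for every sample and component and normalizing each row over $k$. A vectorized sketch (illustrative only, not the notes' code):

```python
import numpy as np

def posteriors(x, a, mus, sigmas):
    """E-step weights: p[i, k] proportional to a_k * phi(x_i | mu_k, Sigma_k),
    with each row normalized to sum to 1 (Bayes formula)."""
    n, d = x.shape
    K = len(a)
    p = np.empty((n, K))
    for k in range(K):
        diff = x - mus[k]                                   # (n, d)
        inv = np.linalg.inv(sigmas[k])
        # Quadratic form (x_i - mu_k)^t Sigma_k^{-1} (x_i - mu_k) per row.
        quad = np.einsum('nd,de,ne->n', diff, inv, diff)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigmas[k]))
        p[:, k] = a[k] * np.exp(-0.5 * quad) / norm
    return p / p.sum(axis=1, keepdims=True)
```

Each row of the returned matrix is a probability vector over the $K$ components, i.e., the "soft" classification mentioned earlier.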
- The optimization of $\mu'_k$ and $\Sigma'_k$ is simply a maximum likelihood estimation of the parameters using the samples $x_i$ with weights $p_{i,k}$. Basic optimization techniques also lead to

$$\mu'_k = \frac{\sum_{i=1}^{n} p_{i,k}\, x_i}{\sum_{i=1}^{n} p_{i,k}}, \qquad \Sigma'_k = \frac{\sum_{i=1}^{n} p_{i,k}\, (x_i - \mu'_k)(x_i - \mu'_k)^t}{\sum_{i=1}^{n} p_{i,k}}.$$

- After every iteration, the likelihood function $L$ is guaranteed to increase (though perhaps not strictly).
- We have derived the EM algorithm for a mixture of Gaussians.

### EM Algorithm for the Mixture of Gaussians

Parameters estimated at the $p$-th iteration are marked by a superscript $(p)$.

1. Initialize the parameters.
2. E-step: compute the posterior probabilities for all $i = 1, \dots, n$, $k = 1, \dots, K$:

$$p_{i,k} = \frac{a_k^{(p)} \,\phi(x_i \mid \mu_k^{(p)}, \Sigma_k^{(p)})}{\sum_{k'=1}^{K} a_{k'}^{(p)} \,\phi(x_i \mid \mu_{k'}^{(p)}, \Sigma_{k'}^{(p)})}.$$

3. M-step:

$$a_k^{(p+1)} = \frac{\sum_{i=1}^{n} p_{i,k}}{n}, \qquad \mu_k^{(p+1)} = \frac{\sum_{i=1}^{n} p_{i,k}\, x_i}{\sum_{i=1}^{n} p_{i,k}}, \qquad \Sigma_k^{(p+1)} = \frac{\sum_{i=1}^{n} p_{i,k}\, (x_i - \mu_k^{(p+1)})(x_i - \mu_k^{(p+1)})^t}{\sum_{i=1}^{n} p_{i,k}}.$$

4. Repeat steps 2 and 3 until convergence.

- Comment: for mixtures of other distributions, the EM algorithm is very similar. The E-step involves computing the posterior probabilities; only the particular distribution $\phi$ needs to be changed. The M-step always involves parameter optimization; the formulas differ according to the distributions.

### Computational Issues

- If a different $\Sigma_k$ is allowed for each component, the likelihood function is not bounded, so the global optimum is meaningless. (Don't overdo it!)
- How to initialize? Example:
  - Apply k-means first.
  - Initialize $\mu_k$ and $\Sigma_k$ using all the samples classified to cluster $k$.
  - Initialize $a_k$ by the proportion of data assigned to cluster $k$ by k-means.
- In practice, we may want to reduce model complexity by putting constraints on the parameters, for instance, by assuming equal priors or identical covariance matrices for all the components.
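Steps 1-4 of the algorithm can be sketched end to end in NumPy. This is an illustrative implementation, not the notes' code; for brevity it runs a fixed number of iterations instead of testing convergence, and the caller supplies the initialization (e.g., from k-means as suggested above).

```python
import numpy as np

def em_gmm(x, a, mus, sigmas, n_iter=50):
    """EM for a Gaussian mixture: alternate the E-step (posteriors p[i, k])
    and the M-step (closed-form updates of a_k, mu_k, Sigma_k)."""
    n, d = x.shape
    K = len(a)
    a = np.asarray(a, dtype=float)
    mus = [np.asarray(m, dtype=float) for m in mus]
    sigmas = [np.asarray(s, dtype=float) for s in sigmas]
    for _ in range(n_iter):
        # E-step: p[i, k] proportional to a_k * phi(x_i | mu_k, Sigma_k).
        p = np.empty((n, K))
        for k in range(K):
            diff = x - mus[k]
            quad = np.einsum('nd,de,ne->n', diff, np.linalg.inv(sigmas[k]), diff)
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigmas[k]))
            p[:, k] = a[k] * np.exp(-0.5 * quad) / norm
        p /= p.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood estimates.
        w = p.sum(axis=0)                       # sum_i p[i, k], per component
        a = w / n                               # a_k = (1/n) sum_i p[i, k]
        mus = [(p[:, k] @ x) / w[k] for k in range(K)]
        sigmas = [((p[:, k, None] * (x - mus[k])).T @ (x - mus[k])) / w[k]
                  for k in range(K)]
    return a, mus, sigmas
```

Note the unboundedness caveat above: with unconstrained per-component covariances, a component can collapse onto a single point, so in practice one monitors the covariance determinants or constrains them.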
### Examples

- The heart disease data set is taken from the UCI machine learning database repository.
- There are 297 cases (samples) in the data set, of which 137 have heart disease. Each sample contains 13 quantitative variables, including cholesterol, max heart rate, etc.
- We remove the mean of each variable and normalize it to yield unit variance.
- The data are projected onto the plane spanned by the two most dominant principal component directions.
- A two-component Gaussian mixture is fit.

*(Figure: the heart disease data set and the estimated cluster densities. Left: the scatter plot of the data. Right: the contour plot of the pdf estimated using a single-layer mixture of two normals; the thick lines are the boundaries between the two clusters based on the estimated pdfs of the individual clusters.)*

### Classification Likelihood

- The likelihood

$$L(x \mid \theta) = \sum_{i=1}^{n} \log\left( \sum_{k=1}^{K} a_k \,\phi(x_i \mid \mu_k, \Sigma_k) \right)$$

maximized by the EM algorithm is sometimes called the mixture likelihood.
- Maximization can also be applied to the classification likelihood. Denote the collection of cluster identities of all the samples by $y = (y_1, \dots, y_n)$:

$$\tilde{L}(x, y \mid \theta) = \sum_{i=1}^{n} \log\big( a_{y_i} \,\phi(x_i \mid \mu_{y_i}, \Sigma_{y_i}) \big).$$

- The cluster identities $y_i$, $i = 1, \dots, n$, are treated as parameters together with $\theta$ and are part of the estimation.
- To maximize $\tilde{L}$, the EM algorithm can be modified to yield an ascending algorithm. This modified version is called Classification EM (CEM).

### Classification EM

A classification step is inserted between the E-step and the M-step.

1. Initialize the parameters.
2. E-step: compute the posterior probabilities for all $i = 1, \dots, n$, $k = 1, \dots, K$:

$$p_{i,k} = \frac{a_k^{(p)} \,\phi(x_i \mid \mu_k^{(p)}, \Sigma_k^{(p)})}{\sum_{k'=1}^{K} a_{k'}^{(p)} \,\phi(x_i \mid \mu_{k'}^{(p)}, \Sigma_{k'}^{(p)})}.$$

3. Classification step:

$$y_i^{(p)} = \arg\max_k \, p_{i,k},$$

or, equivalently, let $\hat{p}_{i,k} = 1$ if $k = \arg\max_{k'} p_{i,k'}$, and $0$ otherwise.

4. M-step:

$$a_k^{(p+1)} = \frac{\sum_{i=1}^{n} I(y_i^{(p)} = k)}{n}, \qquad \mu_k^{(p+1)} = \frac{\sum_{i=1}^{n} I(y_i^{(p)} = k)\, x_i}{\sum_{i=1}^{n} I(y_i^{(p)} = k)},$$

$$\Sigma_k^{(p+1)} = \frac{\sum_{i=1}^{n} I(y_i^{(p)} = k)\, (x_i - \mu_k^{(p+1)})(x_i - \mu_k^{(p+1)})^t}{\sum_{i=1}^{n} I(y_i^{(p)} = k)}.$$

5. Repeat steps 2-4 until convergence.
### Comments

- CEM tends to underestimate the variances. It usually converges much faster than EM. For the purpose of clustering, it is generally believed to perform similarly to EM.
- If we assume equal priors $a_k$, and the covariance matrices $\Sigma_k$ are identical and are a scalar matrix, then CEM is exactly k-means. (Exercise.)
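One CEM iteration differs from EM only in the hard classification step between the E-step and the M-step. A sketch (illustrative only; function name is my own, and empty clusters are not handled):

```python
import numpy as np

def cem_step(x, a, mus, sigmas):
    """One CEM iteration: E-step posteriors, hard classification y_i = argmax_k
    p[i, k], then an M-step using the 0/1 indicators instead of soft weights."""
    n, d = x.shape
    K = len(a)
    # E-step (identical to EM).
    p = np.empty((n, K))
    for k in range(K):
        diff = x - mus[k]
        quad = np.einsum('nd,de,ne->n', diff, np.linalg.inv(sigmas[k]), diff)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigmas[k]))
        p[:, k] = a[k] * np.exp(-0.5 * quad) / norm
    p /= p.sum(axis=1, keepdims=True)
    # Classification step: each sample is assigned to its most probable component.
    y = p.argmax(axis=1)
    # M-step with indicator weights I(y_i = k): per-cluster sample statistics.
    a_new, mu_new, sig_new = [], [], []
    for k in range(K):
        xk = x[y == k]
        a_new.append(len(xk) / n)
        mu_new.append(xk.mean(axis=0))
        sig_new.append(((xk - mu_new[k]).T @ (xk - mu_new[k])) / len(xk))
    return np.array(a_new), mu_new, sig_new, y
```

With equal fixed priors and identical scalar covariance matrices, the classification step reduces to nearest-mean assignment and the mean update to a centroid average, which is exactly the k-means iteration noted above.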
