Matrix Computation CSE 275
These 14-page class notes were uploaded by Abel Lueilwitz on Thursday, October 29, 2015. The notes belong to CSE 275 at the University of California, Merced, taught by Ming-Hsuan Yang in Fall. Since their upload they have received 22 views. For similar materials see /class/231720/cse-275-university-of-california-merced in Computer Science and Engineering at the University of California, Merced.
CSE 275 Matrix Computation
Ming-Hsuan Yang
Electrical Engineering and Computer Science
University of California at Merced, Merced, CA 95344
http://faculty.ucmerced.edu/mhyang

Lecture 9

Overview
- Multivariate Gaussian
- Mahalanobis distance
- Probabilistic PCA
- Factor analysis

Multivariate Gaussian distribution
- Assume X = {x_1, \ldots, x_n} can be modeled with a Gaussian distribution
    p(x \mid \mu, C) = \frac{1}{(2\pi)^{m/2} |C|^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu)^\top C^{-1} (x - \mu) \right)
  where \mu is the mean and C is the covariance matrix.
- Assuming independent observations, find \mu and C that maximize the log-likelihood of p(X \mid \mu, C) = \prod_{i=1}^{n} p(x_i \mid \mu, C):
    \ell = \log \prod_{i=1}^{n} p(x_i \mid \mu, C) = -\tfrac{1}{2} \sum_{i=1}^{n} \left[ \log |2\pi C| + (x_i - \mu)^\top C^{-1} (x_i - \mu) \right]
- Maximum likelihood estimates:
  - \hat{\mu} = \tfrac{1}{n} \sum_i x_i (sample mean)
  - \hat{C} = \tfrac{1}{n} \sum_i (x_i - \hat{\mu})(x_i - \hat{\mu})^\top (sample covariance)

Properties of the Gaussian distribution
- The ellipsoid that best represents the distribution of the data points can be estimated from the covariance matrix C.
- Marginal densities, obtained by integrating out some of the variables, are themselves Gaussian.
- Conditional densities, obtained by setting some variables to fixed values, are also Gaussian.
- One can find a linear transformation that diagonalizes C^{-1}, so that the density function factorizes:
    p(x \mid \mu, C) = \prod_{i=1}^{m} p(x_i \mid \mu_i, \sigma_i^2)
- For given values of \mu and C, the Gaussian density function maximizes the entropy.
- Used for linear classifiers, e.g., the Fisher linear discriminant.

Geometric interpretation
- The equidensity contours of a non-singular Gaussian are ellipsoids, i.e., linear transformations of hyperspheres.
- The directions of the principal axes of the ellipsoids are the eigenvectors of the covariance matrix C, and the axis lengths scale with the square roots of the corresponding eigenvalues.
- Let C = U \Sigma U^\top = (U \Sigma^{1/2})(U \Sigma^{1/2})^\top (eigendecomposition), where the columns of U form an orthonormal basis and \Sigma is a diagonal matrix. Then
    x \sim N(\mu, C) \iff x \sim \mu + U \Sigma^{1/2}\, N(0, I) \iff x \sim \mu + U\, N(0, \Sigma)
- The distribution N(\mu, C) is equivalent to N(0, I) scaled by \Sigma^{1/2}, rotated by U, and translated by \mu.
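The scale–rotate–translate view can be checked numerically: draw standard normal samples, map them through x = \mu + U \Sigma^{1/2} z, and verify the ML estimates recover \mu and C. This is an illustrative sketch (the values of mu and C below are made up, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative mean and covariance (not from the notes)
mu = np.array([1.0, -2.0])
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])

# Eigendecomposition C = U diag(s) U^T (eigh returns eigenvalues in ascending order)
s, U = np.linalg.eigh(C)

# x = mu + U Sigma^{1/2} z with z ~ N(0, I): scale, rotate, translate
z = rng.standard_normal((100_000, 2))
X = mu + (z * np.sqrt(s)) @ U.T

# Maximum likelihood estimates: sample mean and sample covariance
mu_hat = X.mean(axis=0)
C_hat = (X - mu_hat).T @ (X - mu_hat) / len(X)
```

With this many samples, mu_hat and C_hat match mu and C to within a few hundredths, as the MLE formulas predict.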
Mahalanobis distance
- The quantity
    d^2(x, \mu) = (x - \mu)^\top C^{-1} (x - \mu)
  is called the Mahalanobis distance from x to \mu, also known as the generalized squared interpoint distance.
- It is the distance of a point x to the center of mass, divided by the width of the ellipsoid in the direction of x.
- Under a linear transformation of the coordinate system, it keeps its quadratic form and remains non-negative.
- If C = I, the Mahalanobis distance reduces to the Euclidean distance.
- If C is diagonal, the resulting distance is the normalized Euclidean distance
    d^2(x, y) = \sum_i \frac{(x_i - y_i)^2}{\sigma_i^2}
  where \sigma_i is the standard deviation of x_i.
- Can be approximated with the eigenvectors of C.
- Used for learning distance metrics.

Generative PCA model
- A subspace is spanned by the orthonormal basis (eigenvectors) computed from the covariance matrix.
- Each observation can be interpreted with a generative model: estimate approximately the probability of generating each observation with a Gaussian distribution p(x \mid \mu, \Sigma).
- There are several ways to approximate p(x \mid \mu, \Sigma), e.g., distance to the subspace, distance within the subspace, and combinations of the two.
- Each observation has a projected latent variable.
- Used in modeling, tracking, recognition, etc.

Factor analysis
- A generative dimensionality reduction algorithm.
- Let x \in R^m and z \in R^d with d < m; x is modeled by z, dubbed the factors:
    x = \Lambda z + \epsilon
- \Lambda is the factor loading matrix.
- z is assumed to be N(0, I) distributed (zero-mean, unit-variance normals).
- The factors z model the correlations between the elements of x.
- \epsilon is a random variable that accounts for noise and is assumed to be N(0, \Psi) distributed, where \Psi is a diagonal matrix; \epsilon accounts for independent noise in each element of x.
- The diagonality of \Psi is a key assumption, constraining the error covariance \Psi for estimation.
- The observed variables x are conditionally independent given the factors z.
- x is N(0, \Lambda \Lambda^\top + \Psi) distributed.

Properties of factor analysis
- Factor analysis: x = \Lambda z + \epsilon.
- The latent variables z explain the correlations between the elements of x.
- \epsilon represents variability unique to a particular x.
- This differs from PCA, which treats covariance and variance identically.
- We want to infer \Lambda and \Psi from x.
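The factor analysis generative model can be exercised directly: sampling x = \Lambda z + \epsilon and checking that the sample covariance approaches \Lambda \Lambda^\top + \Psi. A minimal sketch, with illustrative loadings and noise variances (not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, n = 5, 2, 200_000

# Illustrative factor loadings and diagonal noise covariance (made up for the sketch)
Lam = rng.standard_normal((m, d))
psi = np.array([0.5, 0.2, 0.3, 0.4, 0.1])  # diagonal of Psi

# Generative model: x = Lam z + eps, with z ~ N(0, I) and eps ~ N(0, Psi)
Z = rng.standard_normal((n, d))
eps = rng.standard_normal((n, m)) * np.sqrt(psi)
X = Z @ Lam.T + eps

# The marginal covariance of x should be Lam Lam^T + Psi
C_model = Lam @ Lam.T + np.diag(psi)
C_hat = X.T @ X / n  # the model is zero-mean
```

The off-diagonal entries of C_hat come entirely from Lam Lam^T, reflecting the point above that the factors z carry all the correlation while Psi carries only per-coordinate noise.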
- Suppose \Lambda and \Psi are known. The factors can then be inferred by linear projection:
    E[z \mid x] = B x, \quad B = \Lambda^\top (\Lambda \Lambda^\top + \Psi)^{-1}
  since the data x and the factors z are jointly Gaussian:
    \begin{pmatrix} z \\ x \end{pmatrix} \sim N\left( 0, \begin{pmatrix} I & \Lambda^\top \\ \Lambda & \Lambda \Lambda^\top + \Psi \end{pmatrix} \right)

Properties of factor analysis (cont'd)
- Note that since \Psi is diagonal, the matrix inversion lemma gives
    (\Lambda \Lambda^\top + \Psi)^{-1} = \Psi^{-1} - \Psi^{-1} \Lambda (I + \Lambda^\top \Psi^{-1} \Lambda)^{-1} \Lambda^\top \Psi^{-1}
  so B requires inverting only a d \times d matrix rather than an m \times m one.
- The second moment of the factors:
    E[z z^\top \mid x] = Var(z \mid x) + E[z \mid x] E[z \mid x]^\top = I - B \Lambda + B x x^\top B^\top
- The expectations of the first and second moments provide a measure of uncertainty in the factors, which PCA does not have.
- \Psi and \Lambda can be computed by the EM algorithm.

EM algorithm for factor analysis
- Expectation-Maximization: a useful technique for dealing with missing data.
  - Start with some initial guess of the missing data and evaluate the expected values.
  - Optimize the missing parameters by taking the derivative of the likelihood of the observed and missing data with respect to the parameters.
  - Repeat until the data likelihood no longer changes.
- E-step: given \Lambda and \Psi, for each data point x_i compute
    E[z \mid x_i] = B x_i
    E[z z^\top \mid x_i] = Var(z \mid x_i) + E[z \mid x_i] E[z \mid x_i]^\top = I - B \Lambda + B x_i x_i^\top B^\top
- M-step:
    \Lambda^{new} = \left( \sum_i x_i E[z \mid x_i]^\top \right) \left( \sum_i E[z z^\top \mid x_i] \right)^{-1}
    \Psi^{new} = \tfrac{1}{n} \, diag\left( \sum_i x_i x_i^\top - \Lambda^{new} E[z \mid x_i] x_i^\top \right)
  where the diag operator sets all off-diagonal elements to zero.

FA and PCA
- Factor analysis provides a proper probabilistic model.
- PCA is rotationally invariant; FA is not.
- Given a set of data points, would \Lambda correspond to an orthonormal basis of a PCA subspace? No, in most cases. However, \Lambda corresponds to an orthonormal basis if FA has an isotropic error model, i.e., \Psi = \sigma^2 I.

Probabilistic principal component analysis
- Let x \in R^m and z \in R^d. From factor analysis we have x = \Lambda z + \epsilon, with the isotropic noise model \epsilon \sim N(0, \sigma^2 I).
- The conditional probability of x given z is
    x \mid z \sim N(\Lambda z, \sigma^2 I)
- Since z \sim N(0, I), the marginal distribution of x is x \sim N(0, \Sigma), where \Sigma = \Lambda \Lambda^\top + \sigma^2 I.
- Log-likelihood of the data:
    \ell = -\tfrac{n}{2} \left[ m \ln(2\pi) + \ln |\Sigma| + tr(\Sigma^{-1} S) \right], \quad S = \tfrac{1}{n} \sum_i x_i x_i^\top
- Estimates of \Lambda and \sigma^2 can be obtained by maximizing \ell using the EM algorithm, similar to that for factor analysis.
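The E- and M-step updates above translate almost line for line into NumPy. A minimal sketch, assuming zero-mean data and a fixed iteration count in place of a likelihood-based stopping test (the function name and synthetic sizes are illustrative):

```python
import numpy as np

def fa_em(X, d, n_iter=500):
    """Plain EM for factor analysis, following the E/M updates in the notes.
    X is n x m and assumed zero-mean; returns (Lam, psi) with psi the diagonal of Psi."""
    n, m = X.shape
    init = np.random.default_rng(0)
    Lam = init.standard_normal((m, d)) * 0.1   # small random initial loadings
    psi = np.full(m, 1.0)                      # initial noise variances
    S = X.T @ X / n
    for _ in range(n_iter):
        # E-step: B = Lam^T (Lam Lam^T + Psi)^{-1}
        B = Lam.T @ np.linalg.inv(Lam @ Lam.T + np.diag(psi))
        Ez = X @ B.T                                      # rows are E[z | x_i]
        sum_Ezz = n * (np.eye(d) - B @ Lam) + Ez.T @ Ez   # sum_i E[z z^T | x_i]
        # M-step
        Lam = (X.T @ Ez) @ np.linalg.inv(sum_Ezz)
        psi = np.diag(S - Lam @ (Ez.T @ X) / n)
    return Lam, psi

# Usage: fit data drawn from a known FA model (illustrative sizes)
rng = np.random.default_rng(1)
m, d, n = 6, 2, 5_000
Lam_true = rng.standard_normal((m, d))
psi_true = rng.uniform(0.1, 0.5, size=m)
X = rng.standard_normal((n, d)) @ Lam_true.T + rng.standard_normal((n, m)) * np.sqrt(psi_true)
Lam_hat, psi_hat = fa_em(X, d)
```

A useful check is the model covariance: at convergence Lam_hat @ Lam_hat.T + diag(psi_hat) should closely match the sample covariance, even though Lam_hat itself is only determined up to rotation.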
Probabilistic principal component analysis (cont'd)
- Maximizing the log-likelihood (e.g., with the EM algorithm) gives
    \Lambda_{ML} = U_d (\Sigma_d - \sigma^2 I)^{1/2} R
  - U_d \in R^{m \times d} contains the first d eigenvectors computed from the covariance matrix S.
  - \Sigma_d \in R^{d \times d} is a diagonal matrix corresponding to the first d eigenvalues.
  - R \in R^{d \times d} is an arbitrary orthogonal rotation matrix (note that z has an isotropic Gaussian distribution, so the solution is determined only up to rotation).
  - The noise variance \sigma^2 is the residual variance per dimension:
      \sigma^2_{ML} = \frac{1}{m - d} \sum_{j = d+1}^{m} \lambda_j
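This closed-form ML solution can be sketched with NumPy, taking R = I (the data below is illustrative; note that since \sigma^2 is the mean of the trailing eigenvalues, \Sigma_d - \sigma^2 I is always non-negative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, d, n = 5, 2, 10_000

# Illustrative zero-mean data with a full covariance; S is the sample covariance
X = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))
X -= X.mean(axis=0)
S = X.T @ X / n

# Eigendecomposition of S, reordered to descending eigenvalues
evals, evecs = np.linalg.eigh(S)
evals, evecs = evals[::-1], evecs[:, ::-1]

# Closed-form ML solution with the rotation R taken as the identity
sigma2 = evals[d:].mean()  # residual variance per dimension
Lam_ml = evecs[:, :d] @ np.diag(np.sqrt(evals[:d] - sigma2))
Sigma_model = Lam_ml @ Lam_ml.T + sigma2 * np.eye(m)
```

By construction, Sigma_model keeps the top d eigenvalues of S exactly and replaces each of the remaining m - d by their average sigma2, which is precisely how PPCA compresses the covariance.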