Matrix Computation CSE 275
These 246-page class notes were uploaded by Abel Lueilwitz on Thursday, October 29, 2015. The notes belong to CSE 275 at University of California - Merced, taught by Ming-Hsuan Yang in Fall. Since upload, they have received 20 views. For similar materials see /class/231720/cse-275-university-of-california-merced in Computer Science and Engineering at University of California - Merced.
CSE 275 Matrix Computation
Ming-Hsuan Yang, Electrical Engineering and Computer Science, University of California at Merced, Merced, CA 95344
http://faculty.ucmerced.edu/mhyang

Lecture 8

Overview
- Least squares minimization
- Regression
- Regularization

Overdetermined linear equations
- Consider y = Ax, where A in R^{m x n} is skinny, i.e., m > n.
- One can approximately solve y = Ax and define the residual (error) r = Ax - y.
- Find x = x_ls that minimizes ||r||; x_ls is the least squares solution.
- Geometric interpretation: A x_ls is the point in ran(A) closest to y, i.e., A x_ls is the projection of y onto ran(A), and the residual r = A x_ls - y is perpendicular to ran(A).

Matrix derivatives
- First-order derivative: d(a^T x)/dx = a; d(x^T B x)/dx = (B + B^T) x.
- Second-order derivative: d^2(x^T B x)/(dx dx^T) = B + B^T.
- For symmetric W:
  d/ds (x - As)^T W (x - As) = -2 A^T W (x - As)
  d/ds (x - s)^T W (x - s) = -2 W (x - s)
  d/dx (x - As)^T W (x - As) = 2 W (x - As)

Least squares minimization
- Minimize the squared norm of the residual r = Ax - y:
  ||r||^2 = x^T A^T A x - 2 y^T A x + y^T y
- Set the gradient with respect to x to zero:
  grad_x ||r||^2 = 2 A^T A x - 2 A^T y = 0,  i.e.,  A^T A x = A^T y
- Assuming A^T A is invertible, we have
  x_ls = (A^T A)^{-1} A^T y,  A x_ls = A (A^T A)^{-1} A^T y

Least squares minimization (cont'd)
- x_ls is a linear function of y; x_ls = A^{-1} y if A is square; x_ls solves y = A x_ls if y in ran(A).
- A† = (A^T A)^{-1} A^T is called the pseudo-inverse or Moore-Penrose inverse.
- A† is a left inverse of a full-rank skinny A: A† A = (A^T A)^{-1} A^T A = I.

Orthogonality principle
- The optimal residual r = A x_ls - y = (A (A^T A)^{-1} A^T - I) y is orthogonal to ran(A):
  <r, Az> = y^T (A (A^T A)^{-1} A^T - I)^T A z = 0 for all z in R^n
- Since r = A x_ls - y is orthogonal to A(x - x_ls) for any x, we have
  ||Ax - y||^2 = ||A x_ls - y||^2 + ||A(x - x_ls)||^2
  which means that for x != x_ls, ||Ax - y|| > ||A x_ls - y||.
- Can be further simplified via the QR decomposition.

Least squares minimization and orthogonal projection
- Recall that for a unit vector u in R^n, P_u = u u^T is an orthogonal projection: given x = x_par + x_perp (parallel and perpendicular to u), P_u x = u u^T x_par + u u^T x_perp = x_par.
- Generalize to orthogonal projection onto a subspace spanned by a set of orthonormal basis vectors A = [u_1, ..., u_k]: P_A = A A^T.
- In general we need a normalizing term if the columns of A are not an orthonormal basis:
  P_A = A (A^T A)^{-1} A^T
- Given A = U Sigma V^T, it follows that P_A = U U^T, by least squares minimization.

Least squares estimation
- Numerous applications in inversion, estimation, and reconstruction problems have the form y = Ax + v, where x is what we want to estimate or reconstruct, y is our sensor measurements, and v is unknown noise or measurement error; the i-th row of A characterizes the i-th sensor.
- Least squares estimation: choose the x̂ that minimizes ||A x̂ - y||, i.e., x̂ = (A^T A)^{-1} A^T y.

Best linear unbiased estimator (BLUE)
- Linear estimator with noise: y = Ax + v, with A full rank and skinny. A linear estimator of the form x̂ = By is unbiased if x̂ = x whenever v = 0; this is equivalent to BA = I, i.e., B is a left inverse of A.
- The error of an unbiased linear estimator is x̂ - x = B(Ax + v) - x = Bv.
- It follows that A† = (A^T A)^{-1} A^T is the smallest left inverse of A, in the sense that for any B with BA = I,
  sum_{i,j} B_ij^2 >= sum_{i,j} (A†)_ij^2
  i.e., least squares provides the best linear unbiased estimator (BLUE).

Pseudo-inverse via regularization
- For mu > 0, let x_mu be the unique minimizer of ||Ax - y||^2 + mu ||x||^2. Then
  x_mu = (A^T A + mu I)^{-1} A^T y
  is called the regularized least squares solution for Ax ≈ y.
- Also called Tikhonov (Tychonov) regularization, and ridge regression in statistics.
- Since A^T A + mu I > 0, it is invertible, and we have
  lim_{mu -> 0} x_mu = A† y,  lim_{mu -> 0} (A^T A + mu I)^{-1} A^T = A†

Minimizing a weighted-sum objective
- Consider minimizing a weighted-sum objective
  ||Ax - y||^2 + mu ||Fx - g||^2 = || [A; sqrt(mu) F] x - [y; sqrt(mu) g] ||^2
- Thus, with Z = [A; sqrt(mu) F] and w = [y; sqrt(mu) g], the least squares solution is
  x = (Z^T Z)^{-1} Z^T w = (A^T A + mu F^T F)^{-1} (A^T y + mu F^T g)
- Widely used in function approximation, regression, optimization, image processing, computer vision, control, machine learning, graph theory, etc.

Least squares data fitting
- Given x = (x_1, ..., x_m)^T in R^m and scalar output y, the least squares fit (linear regression) is
  y ≈ f(x) = beta_0 + sum_{j=1}^m x_j beta_j
  where the beta_j are unknown parameters (coefficients).
- For a set of training data (x^(1), y_1), ..., (x^(n), y_n), we want to minimize
  sum_{i=1}^n (y_i - f(x^(i)))^2 = sum_{i=1}^n (y_i - beta_0 - sum_{j=1}^m x_ij beta_j)^2
- In matrix form, denote by X the n x (m+1) matrix with each row an input vector with a 1 in the first position, and beta in R^{m+1}:
  X = [1 x_11 x_12 ... x_1m; 1 x_21 x_22 ... x_2m; ... ; 1 x_n1 x_n2 ... x_nm],  beta = (beta_0, beta_1, ..., beta_m)^T

Least squares data fitting (cont'd)
- In matrix form, minimize
  E(beta) = (y - X beta)^T (y - X beta)
- Taking derivatives with respect to beta:
  dE/dbeta = -2 X^T (y - X beta),  d^2 E/(dbeta dbeta^T) = 2 X^T X

CSE 275 Matrix Computation (Ming-Hsuan Yang, UC Merced)

Lecture 11

Overview
- Gram-Schmidt process
- Gram-Schmidt as triangular orthogonalization

Geometric properties of an orthonormal basis
- Suppose the columns of U = [u_1, ..., u_n] are orthonormal. If w = Ux, then:
  - the mapping w = Ux is isometric: it preserves distances, ||w|| = ||x||;
  - inner products are preserved: if w_1 = U x_1 and w_2 = U x_2, then <w_1, w_2> = <x_1, x_2>;
  - angles are preserved: angle(w_1, w_2) = angle(x_1, x_2).
- Multiplication by U preserves inner products, angles, and distances.
- For any x, x = U U^T x, i.e.,
  x = sum_{i=1}^n (u_i^T x) u_i
  - u_i^T x is called the component of x in the direction of u_i;
  - a = U^T x resolves x into the vector of its u_i components;
  - x = Ua reconstructs x from its u_i components; x = Ua = sum_i a_i u_i is called the expansion of x.

Gram-Schmidt process
- Given independent vectors x_1, ..., x_n in R^m, the Gram-Schmidt process finds orthonormal vectors q_1, ..., q_n such that
  span(x_1, ..., x_r) = span(q_1, ..., q_r)
  Thus q_1, ..., q_r are an orthonormal basis for span(x_1, ..., x_r).
- Idea: first orthogonalize each vector with respect to the previous ones, then normalize the result to unit norm:
  v_1 = x_1,  q_1 = v_1/||v_1||
  v_2 = x_2 - (q_1^T x_2) q_1 (remove the q_1 component from x_2),  q_2 = v_2/||v_2||
  v_3 = x_3 - (q_1^T x_3) q_1 - (q_2^T x_3) q_2 (remove the q_1, q_2 components),  q_3 = v_3/||v_3||
  etc.
- This finds an orthonormal basis incrementally.

Geometric interpretation of the Gram-Schmidt process
- q_1 = x_1/||x_1||; v_2 = x_2 - (q_1^T x_2) q_1 removes the component of x_2 along q_1, and normalizing v_2 gives q_2; and so on.
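The normal-equation solution and the orthogonality principle above can be checked numerically. This is a minimal numpy sketch; the matrix sizes and random seed are illustrative choices, not from the notes.

```python
import numpy as np

# Least squares for an overdetermined system y ≈ Ax:
# x_ls = (A^T A)^{-1} A^T y, and the residual r = A x_ls - y is
# orthogonal to ran(A).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # skinny: m = 8 > n = 3
y = rng.standard_normal(8)

x_ls = np.linalg.solve(A.T @ A, A.T @ y)   # normal equations
r = A @ x_ls - y                           # optimal residual

# Orthogonality principle: A^T r = 0 (r is perpendicular to every column of A)
print(np.allclose(A.T @ r, 0))             # True

# Same answer as the library solver (which uses a more stable SVD route)
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_ls, x_ref))            # True
```

Solving the normal equations directly squares the condition number of A; the QR-based route described later in the notes avoids this.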
- For each i we have
  x_i = (q_1^T x_i) q_1 + (q_2^T x_i) q_2 + ... + (q_{i-1}^T x_i) q_{i-1} + ||v_i|| q_i
      = r_{1i} q_1 + r_{2i} q_2 + ... + r_{ii} q_i

QR decomposition
- In matrix form, A = QR, where A in R^{m x n}, Q in R^{m x n}, R in R^{n x n}:
  [x_1 x_2 ... x_n] = [q_1 q_2 ... q_n] [r_11 r_12 ... r_1n; 0 r_22 ... r_2n; ... ; 0 0 ... r_nn]
- Q^T Q = I, and R is upper triangular and invertible.
- Usually computed using a variation of the Gram-Schmidt process which is less sensitive to numerical errors; can also be computed by Householder transformations or Givens rotations.
- The columns of Q are an orthonormal basis of ran(A); Q^T A = R.

General Gram-Schmidt process
- If x_1, ..., x_n are dependent, we find v_j = 0 for some j, which means x_j is linearly dependent on x_1, ..., x_{j-1}.
- Modified algorithm: when v_j = 0, skip to the next vector x_{j+1} and continue:
  k = 0
  for i = 1, ..., n:
    v = x_i - sum_{j=1}^k (q_j q_j^T) x_i
    if v != 0: k = k + 1; q_k = v/||v||

Properties of the QR decomposition
- Finds an orthonormal basis for ran(A) directly.
- Let A = BC with B in R^{m x p}, C in R^{p x n}, p = rank(A).
- To check whether y in span(x_1, ..., x_n), apply the Gram-Schmidt procedure to x_1, ..., x_n, y.
- If A = Q_1 R_1 with rank(A) = p, the full QR factorization is
  A = [Q_1 Q_2] [R_1; 0]
  where [Q_1 Q_2] is orthogonal, i.e., the columns of Q_2 in R^{m x (m-p)} are orthonormal and orthogonal to Q_1.
- To find Q_2, one can use any matrix Ã = [A F] such that Ã is full rank (e.g., F = I), and then apply the general Gram-Schmidt process: Q_1 consists of the orthonormal vectors obtained from the columns of A, and Q_2 of the orthonormal vectors obtained from the extra columns F.
- ran(Q_1) and ran(Q_2) are complementary subspaces.

Least squares via QR decomposition
- A in R^{m x n}, A = QR with Q^T Q = I and R in R^{n x n}.
- The pseudo-inverse is
  (A^T A)^{-1} A^T = (R^T Q^T Q R)^{-1} R^T Q^T = R^{-1} Q^T
- The least squares solution is x_ls = R^{-1} Q^T y.
- The projection matrix is A (A^T A)^{-1} A^T = Q Q^T.

Least squares via full QR factorization
- Full QR factorization: A = [Q_1 Q_2] [R_1; 0], where [Q_1 Q_2] in R^{m x m} is orthogonal and R_1 in R^{n x n} is upper triangular and invertible.
- Multiplication by an orthogonal matrix preserves norms, so
  ||Ax - y||^2 = || [Q_1 Q_2] [R_1; 0] x - y ||^2
              = || [Q_1 Q_2]^T ([Q_1 Q_2] [R_1; 0] x - y) ||^2
              = || [R_1 x - Q_1^T y; -Q_2^T y] ||^2
              = ||R_1 x - Q_1^T y||^2 + ||Q_2^T y||^2
- This can be minimized by choosing x_ls = R_1^{-1} Q_1^T y.

Least squares via full QR factorization (cont'd)
- Least squares minimization: ||Ax - y||^2 = ||R_1 x - Q_1^T y||^2 + ||Q_2^T y||^2; the optimal solution is x_ls = R_1^{-1} Q_1^T y.
- The residual of the optimal x is A x_ls - y = -Q_2 Q_2^T y.
- Q_1 Q_1^T gives the projection onto ran(A); Q_2 Q_2^T gives the projection onto ran(A)^perp.

Gram-Schmidt as triangular orthogonalization
- Steps:
  v_1 = x_1,  q_1 = v_1/||v_1||
  v_2 = x_2 - (q_1^T x_2) q_1 = (I - q_1 q_1^T) x_2,  q_2 = v_2/||v_2||
  v_3 = x_3 - (q_1^T x_3) q_1 - (q_2^T x_3) q_2 = (I - q_1 q_1^T - q_2 q_2^T) x_3,  q_3 = v_3/||v_3||
  etc.
- In matrix form, [x_1 x_2 ... x_n] = [q_1 q_2 ... q_n] R, with
  x_i = r_{1i} q_1 + r_{2i} q_2 + ... + r_{ii} q_i

Gram-Schmidt as triangular orthogonalization (cont'd)
- At the j-th step we want a unit vector q_j in span(x_1, ..., x_j) that is orthogonal to q_1, ..., q_{j-1}:
  v_j = x_j - (q_1^T x_j) q_1 - ... - (q_{j-1}^T x_j) q_{j-1},  q_j = v_j/r_jj
  q_1 = x_1/||x_1|| = P_1 x_1/||P_1 x_1||
  q_2 = (I - q_1 q_1^T) x_2 / ||(I - q_1 q_1^T) x_2|| = P_2 x_2/||P_2 x_2||
  ...
  q_n = (I - q_1 q_1^T - ... - q_{n-1} q_{n-1}^T) x_n / ||(I - q_1 q_1^T - ... - q_{n-1} q_{n-1}^T) x_n|| = P_n x_n/||P_n x_n||
  with r_ij = q_i^T x_j (i < j) and r_jj = ||x_j - sum_{i=1}^{j-1} r_ij q_i||; we often choose r_jj > 0.

Gram-Schmidt as triangular orthogonalization (cont'd)
- Let A in R^{m x n}, m >= n, be a matrix of full rank with columns x_j, and consider the sequence of formulas
  q_1 = P_1 x_1/||P_1 x_1||,  q_2 = P_2 x_2/||P_2 x_2||,  ...,  q_n = P_n x_n/||P_n x_n||
  where P_j in R^{m x m}, of rank m - (j - 1), projects onto the space orthogonal to q_1, ..., q_{j-1} (P_1 = I when j = 1).
- The projection can be represented explicitly: let Q_{j-1} denote the m x (j-1) matrix containing the first j-1 columns q_1, ..., q_{j-1}; then
  P_j = I - Q_{j-1} Q_{j-1}^T

Modified Gram-Schmidt process
- Recall the rank-one orthogonal projection with q in R^m: P_q = q q^T/(q^T q).
- Its complement is the rank-(m-1) orthogonal projection P_{perp q} = I - q q^T/(q^T q).
- By the definition of P_j,
  P_j = P_{perp q_{j-1}} ... P_{perp q_2} P_{perp q_1}
  where P_1 = I, and thus v_j = P_j x_j = P_{perp q_{j-1}} ... P_{perp q_1} x_j.

Modified Gram-Schmidt process (cont'd)
- Evaluate the following formulas in order:
  v_j^(1) = x_j
  v_j^(2) = P_{perp q_1} v_j^(1) = v_j^(1) - q_1 q_1^T v_j^(1)
  v_j^(3) = P_{perp q_2} v_j^(2) = v_j^(2) - q_2 q_2^T v_j^(2)
  ...
  v_j = v_j^(j) = P_{perp q_{j-1}} v_j^(j-1) = v_j^(j-1) - q_{j-1} q_{j-1}^T v_j^(j-1)
- The projection P_{perp q_i} can be applied to v_j^(i) for each j > i immediately after q_i is known.

Modified Gram-Schmidt algorithm
- Algorithm:
  for i = 1 to n: v_i = x_i
  for i = 1 to n:
    r_ii = ||v_i||;  q_i = v_i/r_ii
    for j = i+1 to n:
      r_ij = q_i^T v_j
      v_j = v_j - r_ij q_i

Gram-Schmidt as triangular orthogonalization (matrix view)
- Since x_1 = r_11 q_1, x_2 = r_12 q_1 + r_22 q_2, ..., x_i = r_1i q_1 + r_2i q_2 + ... + r_ii q_i, the first iteration multiplies the first column x_1 by 1/r_11 and then subtracts r_1j times the result from each of the remaining columns x_j.
- This is equivalent to right multiplication by an upper triangular matrix R_1:
  A R_1 = [q_1 v_2^(2) ... v_n^(2)],  R_1 = [1/r_11 -r_12/r_11 ... -r_1n/r_11; 0 1 ... 0; ... ; 0 0 ... 1]
- At step i, the modified Gram-Schmidt algorithm subtracts r_ij/r_ii times column i of the current A from the columns j > i, and replaces column i by 1/r_ii times itself; this corresponds to multiplication by an upper triangular matrix R_i.
- At the end we have
  A R_1 R_2 ... R_n = Q̂
  with R_1 R_2 ... R_n = R̂^{-1}; hence the name "triangular orthogonalization".

CSE 275 Matrix Computation (Ming-Hsuan Yang, UC Merced)

Lecture 17

Overview
- QR algorithm without shifts
- Simultaneous iteration
- QR algorithm with shifts
- Wilkinson shifts

The QR algorithm
- The QR algorithm, dating to the early 1960s, is one of the jewels of numerical analysis.
- In its simplest form, it can be viewed as a stable procedure for computing QR factorizations of the matrix powers A, A^2, A^3, ...
- Useful for solving eigenvalue problems.
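The modified Gram-Schmidt loop above translates almost line for line into code. A minimal sketch, assuming a full-column-rank input; the test matrix size and seed are illustrative choices.

```python
import numpy as np

def mgs_qr(A):
    """Modified Gram-Schmidt: returns Q with orthonormal columns and
    upper-triangular R with A = Q R. Assumes full column rank."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = A.copy()                  # the v_j live in the columns of Q
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(Q[:, i])
        Q[:, i] /= R[i, i]                 # q_i = v_i / ||v_i||
        for j in range(i + 1, n):          # project q_i out of all
            R[i, j] = Q[:, i] @ Q[:, j]    # later columns immediately
            Q[:, j] -= R[i, j] * Q[:, i]
    return Q, R

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
Q, R = mgs_qr(A)
print(np.allclose(Q.T @ Q, np.eye(4)))   # orthonormal columns
print(np.allclose(Q @ R, A))             # A = QR
print(np.allclose(R, np.triu(R)))        # R upper triangular
```

The key difference from classical Gram-Schmidt is that each q_i is subtracted from all later columns as soon as it is known, which is the numerically more robust ordering the notes describe.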
Pure QR algorithm
- Pure QR algorithm:
  A^(0) = A
  for k = 1, 2, ...:
    Q^(k) R^(k) = A^(k-1)   (QR factorization of A^(k-1))
    A^(k) = R^(k) Q^(k)     (recombine the factors in reverse order)
- Take a QR factorization, multiply the computed factors Q and R together in the reverse order RQ, and repeat.
- Under suitable assumptions, this simple algorithm converges to a Schur form of the matrix: upper triangular if A is arbitrary, diagonal if A is Hermitian.
- Here we assume A is real and symmetric, with real eigenvalues lambda_j and orthonormal eigenvectors q_j, i.e., we are interested in the convergence of the matrices A^(k) to diagonal form.

QR algorithm (cont'd)
- Each step carries out a similarity transformation:
  A^(k) = R^(k) Q^(k) = (Q^(k))^T A^(k-1) Q^(k)
  (triangularize A^(k-1) by forming R^(k) = (Q^(k))^T A^(k-1), then multiply on the right by Q^(k)).
- Like Rayleigh quotient iteration, the QR algorithm for real symmetric matrices converges cubically. However, the algorithm must be modified by introducing shifts at each step.
- The use of shifts is one of three modifications required to bring it closer to a practical algorithm:
  - before starting the iteration, A is reduced to tridiagonal form;
  - instead of A^(k), a shifted matrix A^(k) - mu^(k) I is factored at each step, where mu^(k) is some eigenvalue estimate;
  - whenever possible, and in particular whenever an eigenvalue is found, the problem is deflated by breaking A^(k) into submatrices.

Practical QR algorithm
- Practical QR algorithm:
  (Q^(0))^T A Q^(0) = A^(0)   (A^(0) is a tridiagonalization of A)
  for k = 1, 2, ...:
    pick a shift mu^(k), e.g., mu^(k) = A^(k-1)_mm
    Q^(k) R^(k) = A^(k-1) - mu^(k) I   (QR factorization of the shifted matrix)
    A^(k) = R^(k) Q^(k) + mu^(k) I     (recombine the factors in reverse order)
    if any off-diagonal element A^(k)_{j,j+1} is sufficiently close to zero:
      set A_{j,j+1} = A_{j+1,j} = 0 to obtain [A_1 0; 0 A_2] = A^(k)
      and apply the QR algorithm to A_1 and A_2
- The QR algorithm with well-chosen shifts has been the standard method for computing all eigenvalues of a matrix since the early 1960s.

Unnormalized simultaneous iteration
- Idea: apply power iteration to several vectors at once.
- Suppose we start with a set of n linearly independent vectors v_1^(0), ..., v_n^(0). Then A^k v_1^(0) converges as k -> infinity, under suitable assumptions, to the eigenvector corresponding to the largest eigenvalue of A in absolute value.
- The space <A^k v_1^(0), ..., A^k v_n^(0)> should converge, under suitable assumptions, to the space <q_1, ..., q_n> spanned by the eigenvectors q_1, ..., q_n of A corresponding to the n largest eigenvalues in absolute value.

Unnormalized simultaneous iteration (cont'd)
- In matrix notation, define V^(0) to be the m x n initial matrix
  V^(0) = [v_1^(0) ... v_n^(0)]
  and define V^(k) to be the result after k applications of A:
  V^(k) = A^k V^(0) = [v_1^(k) ... v_n^(k)]
- Extract a well-behaved basis for this space by computing a reduced QR factorization of V^(k):
  Q̂^(k) R̂^(k) = V^(k)
  where Q̂^(k) and R̂^(k) have dimensions m x n and n x n, respectively.
- As k -> infinity, the columns of Q̂^(k) should converge to the eigenvectors ±q_1, ±q_2, ..., ±q_n.

Analysis of simultaneous iteration
- If we expand v_j^(0) and v_j^(k) in the eigenvectors of A, we have
  v_j^(0) = a_1j q_1 + ... + a_mj q_m
  v_j^(k) = lambda_1^k a_1j q_1 + ... + lambda_m^k a_mj q_m
- Simple convergence results will hold if two conditions are satisfied:
  - the leading n+1 eigenvalues are distinct in absolute value:
    |lambda_1| > |lambda_2| > ... > |lambda_n| > |lambda_{n+1}| >= |lambda_{n+2}| >= ... >= |lambda_m|
  - the collection of expansion coefficients a_ij is nonsingular, in the following sense: define Q̂ as the m x n matrix whose columns are the eigenvectors q_1, q_2, ..., q_n; we assume that all the leading principal submatrices of Q̂^T V^(0) are nonsingular.

Simultaneous iteration
- As k -> infinity, the vectors v_1^(k), ..., v_n^(k) all converge to multiples of the same dominant eigenvector q_1 of A.
- Thus, although the space they span, <v_1^(k), ..., v_n^(k)>, converges to something useful, these vectors constitute a highly ill-conditioned basis of that space.
- We need to orthonormalize at each step, rather than once and for all; use a different sequence of matrices Z^(k) rather than V^(k):
  Q̂^(0) in R^{m x n} with orthonormal columns
  for k = 1, 2, ...:
    Z = A Q̂^(k-1)
    Q̂^(k) R̂^(k) = Z   (reduced QR factorization of Z)
- The column spaces of Q̂^(k) and Z^(k) are the same, both being equal to the column space of A^k Q̂^(0).

Simultaneous iteration (cont'd)
- The QR algorithm is equivalent to simultaneous iteration applied to a full set of n = m initial vectors, namely the identity: Q̂^(0) = I.
- Since the matrices are now square, we are dealing with full QR factorizations and can drop the hats on Q̂^(k) and R̂^(k).
- We will write R_bar^(k) and Q_bar^(k) for the accumulated matrices, to distinguish the matrices of simultaneous iteration from those of the QR algorithm.

Simultaneous iteration and the unshifted QR algorithm
- Simultaneous iteration:
  Q_bar^(0) = I                                (1)
  Z = A Q_bar^(k-1)                            (2)
  Z = Q_bar^(k) R^(k)                          (3)
  A^(k) = (Q_bar^(k))^T A Q_bar^(k)            (4)
- Unshifted QR algorithm:
  A^(0) = A                                    (5)
  A^(k-1) = Q^(k) R^(k)                        (6)
  A^(k) = R^(k) Q^(k)                          (7)
  Q_bar^(k) = Q^(1) Q^(2) ... Q^(k)            (8)
- For both algorithms we define one m x m matrix
  R_bar^(k) = R^(k) R^(k-1) ... R^(1)          (9)
- Theorem: the two processes generate identical sequences of matrices R_bar^(k), Q_bar^(k), and A^(k), namely those defined by the QR factorization of the k-th power,
  A^k = Q_bar^(k) R_bar^(k)                    (10)
  together with the projection
  A^(k) = (Q_bar^(k))^T A Q_bar^(k)            (11)

Simultaneous iteration and the QR algorithm: proof
- The case k = 0 is trivial: both simultaneous iteration and the QR algorithm imply A^0 = Q_bar^(0) = R_bar^(0) = I and A^(0) = A, from which the results are immediate.
- For k >= 1, for simultaneous iteration,
  A^k = A A^{k-1} = A Q_bar^(k-1) R_bar^(k-1) = Q_bar^(k) R^(k) R_bar^(k-1) = Q_bar^(k) R_bar^(k)
  using the inductive hypothesis (10) and steps (2)-(3).
- For k >= 1, for the QR algorithm,
  A^k = A A^{k-1} = A Q_bar^(k-1) R_bar^(k-1) = Q_bar^(k-1) A^(k-1) R_bar^(k-1) = Q_bar^(k) R^(k) R_bar^(k-1) = Q_bar^(k) R_bar^(k)
  using (10), (11), and steps (6)-(8).
- Finally,
  A^(k) = (Q^(k))^T A^(k-1) Q^(k) = (Q_bar^(k))^T A Q_bar^(k)
  which is (11).

Convergence of the QR algorithm
- Qualitative understanding of (10) and (11) is the key.
- The first explains why the QR algorithm can be expected to find eigenvectors: it constructs orthonormal bases for the successive powers A^k.
- The second explains why the algorithm finds eigenvalues: it follows from (11) that the diagonal elements of A^(k) are Rayleigh quotients of A corresponding to the columns of Q_bar^(k). As these columns converge to eigenvectors, the Rayleigh quotients converge to the corresponding eigenvalues.
- Meanwhile, (11) implies that the off-diagonal elements of A^(k) correspond to generalized Rayleigh quotients involving approximations of distinct eigenvectors of A on the left and the right. Since these approximations must become orthogonal as they converge to distinct eigenvectors, the off-diagonal elements of A^(k) must converge to zero.
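The pure QR iteration above is short enough to run directly. A minimal sketch; the test matrix, iteration count, and tolerance are illustrative choices, and the eigenvalues of the example are distinct in absolute value, as the convergence conditions require.

```python
import numpy as np

# Pure (unshifted) QR algorithm: A^(k) = R^(k) Q^(k).
# For this symmetric matrix, A^(k) approaches a diagonal matrix
# whose entries are the eigenvalues of A.
def pure_qr(A, iters=200):
    Ak = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q            # similarity transform Q^T A^(k-1) Q
    return Ak

A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 1.0]])
Ak = pure_qr(A)
# Off-diagonals decay; the diagonal converges to the eigenvalues.
print(np.allclose(Ak - np.diag(np.diag(Ak)), 0, atol=1e-6))
print(np.allclose(np.sort(np.diag(Ak)), np.linalg.eigvalsh(A), atol=1e-6))
```

The off-diagonal entries decay like ratios of consecutive eigenvalue magnitudes raised to the k-th power, which is why the shifted variants below are needed in practice.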
QR algorithm with shifts
- What makes QR iteration work in practice is the introduction of the shift A -> A - mu I at each step; there is an implicit connection to Rayleigh quotient iteration.
- The pure QR algorithm is equivalent to simultaneous iteration applied to the identity matrix; in particular, the first column of the result evolves according to power iteration applied to e_1.
- The pure QR algorithm is also equivalent to simultaneous inverse iteration applied to a "flipped" identity matrix P, and the m-th column of the result evolves according to inverse iteration applied to e_m.
- Let Q^(k) be the orthogonal factor at the k-th step of the QR algorithm; the accumulated product of these matrices,
  Q_bar^(k) = prod_{j=1}^k Q^(j) = [q_1^(k) q_2^(k) ... q_m^(k)]
  is the same orthogonal matrix that appears at step k of simultaneous iteration.

QR algorithm with shifts (cont'd)
- Another way to put this is to say that Q_bar^(k) is the orthogonal factor in a QR factorization
  A^k = Q_bar^(k) R_bar^(k)
- If we invert this formula, we calculate
  A^{-k} = (R_bar^(k))^{-1} (Q_bar^(k))^T = Q_bar^(k) (R_bar^(k))^{-T}
  where for the second equality we have used the fact that A^{-1} is symmetric.
- Let P denote the m x m permutation matrix that reverses row or column order.

QR algorithm with shifts (cont'd)
- Since P^2 = I, we have
  A^{-k} P = [Q_bar^(k) P] [P (R_bar^(k))^{-T} P]
  where the first factor of this product, Q_bar^(k) P, is orthogonal, and the second factor, P (R_bar^(k))^{-T} P, is upper triangular (start with the lower triangular matrix (R_bar^(k))^{-T}, flip it top to bottom, then flip it again left to right).
- This can be interpreted as a QR factorization of A^{-k} P. In other words, we are effectively carrying out simultaneous iteration on A^{-1} applied to the initial matrix P, which is to say, simultaneous inverse iteration on A.
- In particular, the first column of Q_bar^(k) P, i.e., the last column of Q_bar^(k), is the result of applying k steps of inverse iteration to the vector e_m.

Connection with shifted inverse iteration
- The QR algorithm is both simultaneous iteration and simultaneous inverse iteration: the symmetry is perfect.
- However, there is a huge difference between power iteration and inverse iteration, as the latter can be accelerated arbitrarily through the use of shifts. The better we can estimate an eigenvalue mu ≈ lambda_j, the more we can accomplish by a step of inverse iteration with the shifted matrix A - mu I.
- This corresponds to shifts in the simultaneous iteration and inverse iteration. Let mu^(k) denote the eigenvalue estimate chosen at the k-th step of the QR algorithm; the relationship between steps k-1 and k of the shifted QR algorithm is
  A^(k-1) - mu^(k) I = Q^(k) R^(k),  A^(k) = R^(k) Q^(k) + mu^(k) I

Connection with shifted inverse iteration (cont'd)
- This implies A^(k) = (Q^(k))^T A^(k-1) Q^(k), and by induction
  A^(k) = (Q_bar^(k))^T A Q_bar^(k)
  which is unchanged from the QR algorithm without shifts.
- However, the equation A^k = Q_bar^(k) R_bar^(k) of the QR algorithm without shifts is no longer true. Instead we have the factorization
  (A - mu^(k) I)(A - mu^(k-1) I) ... (A - mu^(1) I) = Q_bar^(k) R_bar^(k)
  a shifted variation on simultaneous iteration. In other words, Q_bar^(k) = prod_{j=1}^k Q^(j) is an orthogonalization of prod_{j=k}^1 (A - mu^(j) I).
- The first column of Q_bar^(k) is the result of applying shifted power iteration to e_1 using the shifts mu^(j), and the last column is the result of applying k steps of shifted inverse iteration to e_m with the same shifts. If the shifts are good eigenvalue estimates, this last column of Q_bar^(k) converges quickly to an eigenvector.

Connection with Rayleigh quotient iteration
- To estimate the eigenvalue corresponding to the eigenvector approximated by the last column of Q_bar^(k), it is natural to apply the Rayleigh quotient to this last column:
  mu^(k) = (q_m^(k))^T A q_m^(k) / ((q_m^(k))^T q_m^(k)) = (q_m^(k))^T A q_m^(k)
- If this number is chosen as the shift at every step, the eigenvalue and eigenvector estimates mu^(k) and q_m^(k) are identical to those computed by Rayleigh quotient iteration starting with e_m.
- Thus the QR algorithm has cubic convergence, in the sense that q_m^(k) converges cubically to an eigenvector.
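A shifted QR step, as defined above, factors A^(k-1) - mu I = QR and recombines A^(k) = RQ + mu I. This is a minimal sketch using the Rayleigh quotient shift mu = A^(k-1)_mm; the test matrix, iteration count, and tolerances are illustrative choices, and deflation and tridiagonalization are omitted.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 1.0]])   # symmetric tridiagonal example
Ak = A.copy()
for _ in range(10):
    mu = Ak[-1, -1]                          # Rayleigh quotient shift
    Q, R = np.linalg.qr(Ak - mu * np.eye(3))
    Ak = R @ Q + mu * np.eye(3)              # recombine, undo the shift

# The last row/column decouples quickly: the (m, m) entry converges
# to an eigenvalue of A, and the adjacent off-diagonal entry to zero.
print(np.isclose(abs(Ak[2, 1]), 0.0, atol=1e-10))
print(np.any(np.isclose(Ak[2, 2], np.linalg.eigvalsh(A))))
```

In a full implementation, once the off-diagonal entry underflows, the problem is deflated and the iteration continues on the remaining leading submatrix.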
- Notice that the Rayleigh quotient (q_m^(k))^T A q_m^(k) appears as the (m, m) entry of A^(k), so it comes for free:
  A^(k)_mm = e_m^T A^(k) e_m = e_m^T (Q_bar^(k))^T A Q_bar^(k) e_m = (q_m^(k))^T A q_m^(k)
- Thus the equation for mu^(k) above is simply setting mu^(k) = A^(k)_mm, which is known as the Rayleigh quotient shift.

Wilkinson shift
- Although the Rayleigh quotient shift gives cubic convergence, convergence is not guaranteed for all initial conditions. Consider
  A = [0 1; 1 0]
- The unshifted QR algorithm does not converge:
  A = QR = [0 1; 1 0][1 0; 0 1],  A^(1) = RQ = [1 0; 0 1][0 1; 1 0] = A
- The Rayleigh quotient shift mu = A_mm, however, has no effect either, since A_mm = 0.
- The problem arises because of the symmetry of the eigenvalues: one is +1 and the other is -1. We need an eigenvalue estimate that can break the symmetry.

Wilkinson shift (cont'd)
- Let B denote the lower-rightmost 2 x 2 submatrix of A^(k):
  B = [a_{m-1} b_{m-1}; b_{m-1} a_m]
- The Wilkinson shift is defined as that eigenvalue of B that is closer to a_m, where in the case of a tie one of the two eigenvalues of B is chosen arbitrarily.
- A numerically stable formula for the Wilkinson shift is
  mu = a_m - sign(delta) b_{m-1}^2 / (|delta| + sqrt(delta^2 + b_{m-1}^2))
  where delta = (a_{m-1} - a_m)/2; if delta = 0, sign(delta) can be arbitrarily set to +1 or -1.

CSE 275 Matrix Computation (Ming-Hsuan Yang, UC Merced)

Lecture 6

Overview
- Orthogonal projection, distance between subspaces
- Principal component analysis
- Applications

Orthogonal projection
- Let S ⊆ R^n be a subspace. P in R^{n x n} is the orthogonal projection onto S if ran(P) = S, P^2 = P, and P^T = P.
- Example in R^3:
  P = [1 0 0; 0 1 0; 0 0 0],  P (x, y, z)^T = (x, y, 0)^T
- If x in R^n, then Px in S and (I - P)x ⊥ S.
- If P_1 and P_2 are orthogonal projections, then for any z in R^n we have
  ||P_1 z - P_2 z||^2 = (P_1 z)^T (I - P_2) z + (P_2 z)^T (I - P_1) z
- If ran(P_1) = ran(P_2) = S, then the right-hand side of the above equation is zero, i.e., the orthogonal projection onto a subspace is unique.
Orthogonal projection and SVD
- If the columns of V = [v_1, ..., v_k] are an orthonormal basis for a subspace S, then it is easy to show that P = V V^T is the unique orthogonal projection onto S.
- If v in R^n is a unit vector, then P = v v^T is the orthogonal projection onto S = span(v).
- Let A = U Sigma V^T in R^{m x n} with rank(A) = r, and partition U = [U_r Ũ] and V = [V_r Ṽ] after the first r columns. Then
  U_r U_r^T is the projection onto ran(A)
  Ũ Ũ^T is the projection onto null(A^T) = ran(A)^perp
  V_r V_r^T is the projection onto ran(A^T)
  Ṽ Ṽ^T is the projection onto null(A)

Distances between subspaces
- Let S_1 and S_2 be subspaces of R^n with dim(S_1) = dim(S_2). We define the distance between the two spaces by
  dist(S_1, S_2) = ||P_1 - P_2||_2
  where P_i is the orthogonal projection onto S_i.
- The distance between a pair of subspaces can be characterized in terms of the blocks of a certain orthogonal matrix.
- Theorem: suppose
  W = [W_1 W_2] and Z = [Z_1 Z_2]
  are n x n orthogonal matrices, where W_1 and Z_1 are n x k and W_2 and Z_2 are n x (n-k). If S_1 = ran(W_1) and S_2 = ran(Z_1), then
  dist(S_1, S_2) = ||W_1^T Z_2||_2 = ||Z_1^T W_2||_2
  See Golub and Van Loan for the proof.

Distance between subspaces in R^n
- If S_1 and S_2 are subspaces of R^n with the same dimension, then
  0 <= dist(S_1, S_2) <= 1
- The distance is zero if S_1 = S_2, and one if S_1 ∩ S_2^perp != {0}.

Symmetric matrices
- Consider real symmetric matrices, A^T = A. Examples:
  - the Hessian matrix of second-order partial derivatives of a function:
    f(x + Δx) ≈ f(x) + J(x) Δx + (1/2) Δx^T H(x) Δx
    where J is the Jacobian matrix, i.e., the gradient;
  - the covariance matrix of a Gaussian distribution.
- The inverse is also symmetric: (A^{-1})^T = A^{-1}.
- The eigenvector equation for a symmetric matrix is A u_k = lambda_k u_k, which can be written as
  A U = U D,  or  (A - lambda_k I) u_k = 0
  where D is a diagonal matrix whose elements are the eigenvalues lambda_1, ..., lambda_n and U is the matrix whose columns are the eigenvectors u_k.

Eigenvectors of symmetric matrices
- The eigenvalues can be computed from |A - lambda I| = 0.
- One can show that the eigenvectors can be chosen to form an orthonormal basis.
- For a pair of eigenvectors u_j and u_k, it follows that
  u_j^T A u_k = lambda_k u_j^T u_k
  u_k^T A u_j = lambda_j u_k^T u_j
  and thus (lambda_k - lambda_j) u_k^T u_j = 0.
- For lambda_k != lambda_j, the eigenvectors must be orthogonal.
- Note that for any u_k with eigenvalue lambda_k, beta u_k is also an eigenvector for nonzero beta, with the same eigenvalue. This can be used to normalize the eigenvectors to unit norm, so that u_k^T u_j = delta_kj.

Symmetric matrices and diagonalization
- For a symmetric matrix A, AU = UD and U^T U = I, so A can be diagonalized: U^T A U = D.
- For a symmetric matrix A, the SVD is A = U Sigma U^T.
- Recall that U and V are the left and right singular vectors: A A^T U = U Sigma^2 and A^T A V = V Sigma^2. Since A is symmetric, U = V and A = U Sigma U^T.

Principal component analysis
- Arguably the most popular dimensionality reduction algorithm (cf. the curse of dimensionality); widely used in computer vision, machine learning, and pattern recognition.
- Can be derived from several perspectives:
  - minimize reconstruction error (Karhunen-Loeve transform)
  - decorrelation (Hotelling transform)
  - maximize the variance of the projected samples, i.e., preserve as much energy as possible
- Unsupervised learning; a linear transform; second-order statistics.
- Recall from the SVD A = U Sigma V^T that projecting the samples onto the subspace spanned by U can be computed by U^T A = Sigma V^T.

Principal component analysis (PCA)
- Given a set of n data points x^(k) in R^m, we would like to project each x^(k) onto a d-dimensional subspace, z^(k) = (z_1, ..., z_d) in R^d, where d < m, so that
  x = sum_{i=1}^m z_i u_i
  where the vectors u_i satisfy the orthonormality relation u_i^T u_j = delta_ij, in which delta_ij is the Kronecker delta; thus z_i = u_i^T x.
- Now we keep only a subset, d < m, of the basis vectors u_i. The remaining coefficients are replaced by constants b_i, so that each vector x is approximated by
  x̃ = sum_{i=1}^d z_i u_i + sum_{i=d+1}^m b_i u_i

Principal component analysis (cont'd)
- Dimensionality reduction: x has m degrees of freedom, z has d degrees of freedom.
- For each x^(k), the error introduced by the dimensionality reduction is
  x^(k) - x̃^(k) = sum_{i=d+1}^m (z_i^(k) - b_i) u_i
  and we want to find the basis vectors u_i, the coefficients b_i, and the values z_i^(k) with minimum error in the l2 norm.
- For the whole data set, using the orthonormality relation,
  E_d = (1/2) sum_{k=1}^n ||x^(k) - x̃^(k)||^2 = (1/2) sum_{k=1}^n sum_{i=d+1}^m (z_i^(k) - b_i)^2

Principal component analysis (cont'd)
- Take the derivative of E_d with respect to b_i and set it to zero:
  b_i = (1/n) sum_{k=1}^n u_i^T x^(k) = u_i^T x̄,  where x̄ = (1/n) sum_{k=1}^n x^(k)
- Plug this into the sum of squared errors:
  E_d = (1/2) sum_{i=d+1}^m sum_{k=1}^n (u_i^T (x^(k) - x̄))^2 = (n/2) sum_{i=d+1}^m u_i^T C u_i
  where C is the covariance matrix
  C = (1/n) sum_{k=1}^n (x^(k) - x̄)(x^(k) - x̄)^T
- Minimizing E_d with respect to u_i, we get C u_i = lambda_i u_i, i.e., the basis vectors u_i are eigenvectors of the covariance matrix.

Derivation
- Minimizing E_d = (n/2) sum_{i=d+1}^m u_i^T C u_i with respect to the u_i needs some constraints to make the optimization problem well posed; impose orthonormality constraints among the u_i and use Lagrange multipliers:
  Ê_d = (1/2) sum_{i=d+1}^m u_i^T C u_i - (1/2) sum_{i=d+1}^m sum_{j=d+1}^m lambda_ij (u_i^T u_j - delta_ij)
- Recall: for min f(x) subject to g(x) = 0, the Lagrangian is L(x, lambda) = f(x) + lambda g(x). Example: min f(x_1, x_2) = x_1 x_2 subject to g(x_1, x_2) = x_1 + x_2 - 1 = 0.

Derivation (cont'd)
- In matrix form,
  Ê_d = tr(Û^T C Û) - tr[M(Û^T Û - I)]
  where M is a matrix with elements lambda_ij and Û is a matrix whose columns are the u_i.
- Minimizing Ê_d with respect to Û:
  (C + C^T) Û - Û(M + M^T) = 0
- Note that C is symmetric, and M is symmetric since Û^T Û is symmetric. Thus
  C Û = Û M,  Û^T C Û = M
- Clearly one solution is to choose M diagonal, so that the columns of Û are eigenvectors of C and the diagonal elements of M are eigenvalues.

Derivation (cont'd)
- The eigenvector equation for M is
  M Psi = Psi Lambda
  where Lambda is a diagonal matrix of eigenvalues and Psi is the matrix of eigenvectors.
- M is symmetric, so Psi can be chosen to have orthonormal columns: Psi^T Psi = I and Lambda = Psi^T M Psi.
- Putting this together,
  Lambda = Psi^T Û^T C Û Psi = Ũ^T C Ũ,  where Ũ = Û Psi
- So another solution of Û^T C Û = M can be obtained from the particular (eigenvector) solution Ũ by application of an orthogonal transformation given by Psi.

Derivation (cont'd)
- We note that Ê_d is invariant under this orthogonal transformation:
  Ê_d = tr(Û^T C Û) = tr(Psi Ũ^T C Ũ Psi^T) = tr(Ũ^T C Ũ)
- Recall that the matrix 2-norm is invariant under orthogonal transformations.
- Since all of the possible solutions give the same minimum error Ê_d, we can choose whichever is most convenient; we thus choose the solution given by Ũ with unit-norm columns, since its columns are the eigenvectors of C.

Computing principal components from data
- Minimizing E_d with respect to u_i, we get C u_i = lambda_i u_i, i.e., the basis vectors u_i are the eigenvectors of the covariance matrix.
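The eigenvector solution above can be verified numerically: project onto the top d eigenvectors of the covariance matrix and check that the reconstruction error equals (n/2) times the sum of the discarded eigenvalues. A minimal sketch; the data shape, scaling, and seed are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 5)) @ np.diag([3.0, 2.0, 1.0, 0.3, 0.1])  # n x m data

xbar = X.mean(axis=0)
Xc = X - xbar                       # center the data
C = Xc.T @ Xc / len(X)              # covariance matrix, 1/n convention

lam, U = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]       # reorder to descending
lam, U = lam[order], U[:, order]

d = 2
Z = Xc @ U[:, :d]                   # projections z_i = u_i^T (x - xbar)
Xhat = xbar + Z @ U[:, :d].T        # rank-d reconstruction

# Reconstruction error E_d = (n/2) * sum of the discarded eigenvalues
Ed = 0.5 * np.sum((X - Xhat) ** 2)
print(np.allclose(Ed, 0.5 * len(X) * lam[d:].sum()))   # True
```

Replacing the dropped coefficients by b_i = u_i^T xbar is exactly what adding xbar back to the rank-d projection does here.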
Consequently the error of Ed is m 1 Ed E id1 In other words the minimum error is reached by discarding the eigenvectors corresponding to the m 7 d smallest eigenvalues o Retain the eigenvectors corresponding to the largest eigenvalues 18 20 Computing principal components from data 0 Project XU onto these eigenvectors give the components of the transformed vector 20 in the d dimensional space 0 Each two dimensional data point is transformed to a single variable 21 representing the projection of the data point onto the eigenvector ul 0 lnfer the structure or reduce redundancy inherent in high dimensional data 0 Parsimonious representation 0 Linear dimensionality algorithm based on sum of square error criterion 0 Other criteria covariance measure and population entropy 1920 CSE 275 Matrix Computation Ming Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced CA 95344 httpfacutyucmercededumhyang Y Q v T Op t e if 397 quot2 339 ul 3 as 17 we Et 2 quotv xA39 339 h a 1868 an Lecture 24 113 Overview 0 Least norm solutions of underdetermined equations 0 General norm minimization with equality constraints 2i3 Underdetermined linear equations 0 Consider y Ax where A E Rm is fat m lt n ie gt there are more variables than equations gt x is underdetermined ie many choices x lead to the same y 0 Assume A is full rank so each y 6 EU there is a solution set of all solutions has form XlAX y Xp 2l2 6 NW where xp is any particular solution ie Axp y and NA is the null space of A o z characterizes available choices in solution 0 A solution has dimNA n 7 m degrees of freedom 0 Can choose 2 to satisfy other specifications or optimize among solutions 313 Least norm solutions a One particular solution is x ATAAT 1y AAT is invertible since A is full rank 9 x is the solution y Ax that minimizes ie x is solution of optimization problem min llxllg subject to Axy with available x 6 IR 4i3 Least norm solution cont d 9 One particular solution for Ax y is 
x_ln = A^T (A A^T)^{-1} y.
- Suppose Ax = y; then A(x − x_ln) = 0 and
  (x − x_ln)^T x_ln = (x − x_ln)^T A^T (A A^T)^{-1} y = (A(x − x_ln))^T (A A^T)^{-1} y = 0,
  i.e., (x − x_ln) ⊥ x_ln. Hence
  ||x||^2 = ||x_ln + (x − x_ln)||^2 = ||x_ln||^2 + ||x − x_ln||^2 ≥ ||x_ln||^2,
  so x_ln has the smallest norm of any solution.

Geometric interpretation
- Orthogonality condition: x_ln ⊥ N(A).
- Projection interpretation: x_ln is the projection of 0 onto the solution set {x | Ax = y}.
- A† = A^T (A A^T)^{-1} is called the pseudo-inverse of the full-rank fat A.
- A^T (A A^T)^{-1} is a right inverse of A: A A^T (A A^T)^{-1} = I.
- I − A^T (A A^T)^{-1} A gives the projection onto N(A).

Overconstrained and underconstrained linear equations
- Overconstrained (m > n): minimize ||Ax − y||. Here A† = (A^T A)^{-1} A^T is a left inverse of A, and A (A^T A)^{-1} A^T gives the projection onto R(A).
- Underconstrained (m < n): minimize ||x|| subject to Ax = y. Here A† = A^T (A A^T)^{-1} is a right inverse of A, and I − A^T (A A^T)^{-1} A gives the projection onto N(A).

Least norm solution via QR factorization
- Find the QR factorization of A^T, i.e., A^T = QR with
  - Q ∈ R^{n×m}, Q^T Q = I_m;
  - R ∈ R^{m×m} upper triangular and nonsingular.
- Then x_ln = A^T (A A^T)^{-1} y = Q R^{-T} y, and ||x_ln|| = ||R^{-T} y||.

Derivation via Lagrange multipliers
- The least norm solution solves the optimization problem
  minimize x^T x subject to Ax = y.
- Introduce Lagrange multipliers: L(x, λ) = x^T x + λ^T (Ax − y).
- The optimality conditions are ∇_x L = 2x + A^T λ = 0 and ∇_λ L = Ax − y = 0.
- From the first condition, x = −A^T λ / 2.
- Substituting into the second condition gives λ = −2 (A A^T)^{-1} y.
- Hence x = A^T (A A^T)^{-1} y.

Relation to regularized least squares
- Suppose A ∈ R^{m×n} is fat and full rank.
- Define J_1 = ||Ax − y||^2 and J_2 = ||x||^2.
- The least norm solution minimizes J_2 with J_1 = 0.
- The minimizer of the weighted-sum objective J_1 + μ J_2 = ||Ax − y||^2 + μ ||x||^2 is
  x_μ = (A^T A + μI)^{-1} A^T y.
- Fact: x_μ → x_ln as μ → 0, i.e., the regularized solution converges to the least norm solution as μ → 0.
- In matrix terms, as μ → 0, (A^T A + μI)^{-1} A^T → A^T (A A^T)^{-1} for full-rank fat A.

General norm minimization with equality constraints
- Consider the problem
  minimize ||Ax − b|| subject to Cx = d,
  with variable x.
- It includes least squares and least norm problems as special cases.
- It is equivalent to minimizing ||Ax − b||^2 subject to Cx = d.
- The Lagrangian is
  L(x, λ) = ||Ax − b||^2 + λ^T (Cx − d) = x^T A^T A x − 2 b^T A x + b^T b + λ^T C x − λ^T d.
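The least norm formulas above are easy to check numerically. A minimal NumPy sketch (the matrix A and vector y here are arbitrary illustrations, not taken from the notes): it computes x_ln three ways, via the pseudo-inverse formula, via QR of A^T, and as the μ → 0 limit of the regularized solution.

```python
import numpy as np

# Fat, full-rank A (m < n) and a target y -- illustrative values.
A = np.array([[1.0, 2.0, 0.0, -1.0],
              [0.0, 1.0, 1.0,  2.0]])
y = np.array([1.0, 2.0])

# Least norm solution x_ln = A^T (A A^T)^{-1} y.
x_ln = A.T @ np.linalg.solve(A @ A.T, y)

# Same solution via QR of A^T: A^T = Q R, so x_ln = Q R^{-T} y.
Q, R = np.linalg.qr(A.T)
x_qr = Q @ np.linalg.solve(R.T, y)

# Regularized least squares x_mu -> x_ln as mu -> 0.
mu = 1e-9
x_mu = np.linalg.solve(A.T @ A + mu * np.eye(4), A.T @ y)

print(np.allclose(A @ x_ln, y))            # x_ln solves Ax = y
print(np.allclose(x_ln, x_qr))             # QR route agrees
print(np.allclose(x_ln, x_mu, atol=1e-6))  # regularized limit agrees
```

Note that Q R^{-T} y is invariant under the sign ambiguity of the QR factorization, so the two routes agree exactly.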
Norm minimization with equality constraints cont d o Lagrangian is Lx7gt lle7bll2gtTCxid XTATAX 7 bTAx bTb AT Cx 7 ATd o Optimality conditions are VXLATAx7ATb CTA0 VAL Cxid0 0 Put in matrix form ATA CT x 7 Alb C O A 7 d o If the block matrix is invertible we have x 7 ATA CT 1 ATb A 7 C 0 d 1213 CSE 275 Matrix Computation Ming Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced CA 95344 httpfacutyucmercededumhyang Y Q v T Op t e if 397 quot2 339 ul 3 as 17 we Et 2 quotv xA39 339 h a 1868 an Lecture 14 123 Overview 0 Eigenvalue algorithms 9 Schur decomposition a Power iteration a Inverse iteration o Rayleigh quotient iteration 223 Eigenvalue algorithm a Shortcomings of obvious algorithms gt Characteristic polynomial Compute the coefficients of the characteristic polynomial and find the roots an illconditioned problems in general gt Power iteration The sequence x Ax Azx A3x llel lAXll llAZXll llA3Xll converges to an eigenvector corresponding to the largest eigenvalue of A but it is very slow in convergence 0 Best general purpose eigenvalue algorithms are based on a different principle the computation of an eigenvalue revealing factorization of A where the eigenvalues appear as entries of one of the factors gt diagonalization A XX 1 gt unitary diagonalization A QAQH gt unitary triangularization Schur factorization A QTQH 0 Most of these direct algorithms proceed in two phases gt a preliminary reduction from full to structured form gt an iterative process for the final convergence Eigenvalue solvers o It is well known that no formula exists for expressing the roots of an arbitrary polynomial given its coefficients 0 Abel in 1824 proved that non analogue of the quadratic formula can exist for polynomials of degree 5 or more a Any eigenvalue solver must be iterative o The goal is to produce sequences of numbers that converge rapidly towards eigenvalues 423 Schur factorization and diagonalization 0 Most of the general purpose 
eigenvalue algorithms in use today proceed by computing the Schur factorization A = Q T Q^H, i.e., Q^H A Q = T.
- The Schur factorization is computed by transforming A with a sequence of elementary unitary similarity transformations X ↦ Q_j^H X Q_j, so that the product
  Q_j^H ⋯ Q_2^H Q_1^H A Q_1 Q_2 ⋯ Q_j
  converges to an upper triangular matrix T as j → ∞.
- If A is real but not symmetric, then in general it may have complex eigenvalues in conjugate pairs.

Eigenvalue solvers (cont'd)
- If A is Hermitian, then Q_j^H ⋯ Q_1^H A Q_1 ⋯ Q_j is also Hermitian, and thus the limit of the converging sequence is both triangular and Hermitian, hence diagonal.
- This implies that the same algorithms that compute a unitary triangularization of a general matrix also compute a unitary diagonalization of a Hermitian matrix.

Two phases of eigenvalue computation
- In the first phase, a direct method is applied to produce an upper Hessenberg matrix, i.e., a matrix with zeros below the first subdiagonal.
- In the second phase, an iteration is applied to generate a formally infinite sequence of Hessenberg matrices that converges to triangular form.
  (Diagram: full A → upper Hessenberg H → upper triangular T.)

Two phases of eigenvalue computation (cont'd)
- If A is Hermitian, the two-phase approach becomes faster.
- The intermediate matrix is a Hermitian Hessenberg matrix, i.e., tridiagonal.
- The final result is a Hermitian triangular matrix, i.e., diagonal.
  (Diagram: Hermitian A → tridiagonal → diagonal D.)

Reduction to Hessenberg or
tridiagonal form
- To compute the Schur decomposition A = Q T Q^H, we would like to apply unitary similarity transformations to A that introduce zeros below the diagonal.
- One might try Householder transformations: Q_1^H A introduces zeros below the diagonal of the first column, but those zeros are destroyed when Q_1 is applied on the right in the similarity transformation (Q_1^H A) Q_1.

Reduction by Householder transformations
- The right strategy is to introduce zeros selectively.
- Select a Householder reflector Q_1 that leaves the first row unchanged.
- Multiplied on the left of A, it forms linear combinations of rows 2, …, m only, introducing zeros into rows 3, …, m of the first column.
- Multiplied on the right of Q_1^H A, it leaves the first column unchanged and forms linear combinations of columns 2, …, m, so it does not alter the zeros that have been introduced.
- The same idea is repeated to introduce zeros into subsequent columns.
- Repeating this process m − 2 times yields a product in Hessenberg form:
  Q_{m−2}^H ⋯ Q_2^H Q_1^H A Q_1 Q_2 ⋯ Q_{m−2} = H.

Householder reduction to Hessenberg form
- Algorithm:
  for k = 1 to m − 2 do
    x = A_{k+1:m, k}
    v_k = sign(x_1) ||x||_2 e_1 + x
    v_k = v_k / ||v_k||_2
    A_{k+1:m, k:m} = A_{k+1:m, k:m} − 2 v_k (v_k^T A_{k+1:m, k:m})
    A_{1:m, k+1:m} = A_{1:m, k+1:m} − 2 (A_{1:m, k+1:m} v_k) v_k^T
  end for
- If A is Hermitian, the algorithm reduces A to tridiagonal form: Q^H A Q is also Hermitian, and any Hermitian Hessenberg matrix is tridiagonal.

Rayleigh quotient and inverse iteration
- We now examine some classical eigenvalue algorithms, restricting attention to real matrices.
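The Householder reduction to Hessenberg form above can be sketched in a few lines of NumPy (an illustrative, unoptimized version; the test matrix is an arbitrary symmetric example, so the output should in fact be tridiagonal):

```python
import numpy as np

def hessenberg(A):
    """Reduce A to upper Hessenberg form by Householder similarity transforms."""
    H = np.array(A, dtype=float)
    m = H.shape[0]
    for k in range(m - 2):
        x = H[k+1:, k].copy()
        v = x.copy()
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        norm_v = np.linalg.norm(v)
        if norm_v == 0:
            continue  # column already zero below the subdiagonal
        v /= norm_v
        # Left multiplication: acts on rows k+1..m of columns k..m.
        H[k+1:, k:] -= 2.0 * np.outer(v, v @ H[k+1:, k:])
        # Right multiplication: acts on columns k+1..m of all rows.
        H[:, k+1:] -= 2.0 * np.outer(H[:, k+1:] @ v, v)
    return H

A = np.array([[2.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0],
              [1.0, 1.0, 4.0, 1.0],
              [0.0, 1.0, 1.0, 5.0]])
H = hessenberg(A)
# Zeros below the first subdiagonal; similarity preserves the eigenvalues.
print(np.allclose(np.tril(H, -2), 0.0))
print(np.allclose(np.sort(np.linalg.eigvals(H).real),
                  np.sort(np.linalg.eigvalsh(A))))
```

Because each step applies the same orthogonal reflector on both sides, the eigenvalues of H equal those of A, which the last check confirms.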
- The Rayleigh quotient of a vector x ∈ R^m is the scalar
  r(x) = (x^T A x) / (x^T x).
- If x is an eigenvector, then r(x) is the corresponding eigenvalue.
- Given x, what scalar α acts most like an eigenvalue for x, in the sense of minimizing ||Ax − αx||_2? This is an m × 1 least squares problem of the form x α ≈ Ax, where x plays the role of the matrix and α is the unknown; the solution is
  α = (x^T x)^{-1} x^T A x = r(x).
- r(x) is a natural eigenvalue estimate to consider if x is close to, but not necessarily equal to, an eigenvector.

Gradient of the Rayleigh quotient and eigenvectors
- Consider x ∈ R^m as a variable, so that r is a function R^m → R.
- We are interested in the local behavior of r(x) when x is near an eigenvector.
- One way to quantify this is to compute the partial derivatives of r(x) with respect to the coordinates x_j:
  ∂r/∂x_j = [∂(x^T A x)/∂x_j] / (x^T x) − [(x^T A x) ∂(x^T x)/∂x_j] / (x^T x)^2
          = 2(Ax)_j / (x^T x) − 2 (x^T A x) x_j / (x^T x)^2
          = (2 / x^T x) (Ax − r(x) x)_j.
- Collecting these partial derivatives into an m-vector gives the gradient
  ∇r(x) = (2 / x^T x) (Ax − r(x) x).
- At an eigenvector x of A, the gradient of r(x) is the zero vector.
- Conversely, if ∇r(x) = 0 with x ≠ 0, then x is an eigenvector and r(x) is the corresponding eigenvalue.

Geometric perspective
- The eigenvectors of A are the stationary points of the function r(x); the eigenvalues of A are the values of r(x) at these stationary points.
- Since r(x) is independent of the scale of x, these stationary points lie along lines through the origin in R^m.
- If we normalize x to the unit sphere ||x|| = 1, they become isolated points.
- Let q_J be one of the eigenvectors of A; it can be shown that
  r(x) − r(q_J) = O(||x − q_J||²) as x → q_J,
  so the Rayleigh quotient is a quadratically accurate estimate of an eigenvalue.

Power iteration
- Produce a sequence v^(k) that converges to an eigenvector corresponding to the largest eigenvalue of A.
- Algorithm:
  Initialize v^(0) randomly with ||v^(0)|| = 1.
  for k = 1, 2, … do
    w = A v^(k−1)               (apply A)
    v^(k) = w / ||w||           (normalize)
    λ^(k) = (v^(k))^T A v^(k)   (Rayleigh quotient)
  end for
- Writing v^(0) as a linear combination of the orthonormal eigenvectors q_i, we can analyze power iteration.
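The power iteration algorithm above is a few lines of NumPy. The 3-by-3 matrix is the symmetric example used later in these notes, whose largest eigenvalue is approximately 5.2143:

```python
import numpy as np

def power_iteration(A, steps=200, seed=0):
    """Power iteration with Rayleigh quotient eigenvalue estimates."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    lam = v @ A @ v
    for _ in range(steps):
        w = A @ v                  # apply A
        v = w / np.linalg.norm(w)  # normalize
        lam = v @ A @ v            # Rayleigh quotient
    return lam, v

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
lam, v = power_iteration(A)
print(lam)  # ~ 5.2143, the largest eigenvalue of A
```

The convergence factor is |λ_2/λ_1| per step, so with 200 steps the estimate is accurate to machine precision for this well-separated spectrum.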
- Write v^(0) = a_1 q_1 + a_2 q_2 + ⋯ + a_m q_m.
- Since v^(k) is a multiple of A^k v^(0), we have, for some constants c_k,
  v^(k) = c_k A^k v^(0)
        = c_k (a_1 λ_1^k q_1 + a_2 λ_2^k q_2 + ⋯ + a_m λ_m^k q_m)
        = c_k λ_1^k (a_1 q_1 + a_2 (λ_2/λ_1)^k q_2 + ⋯ + a_m (λ_m/λ_1)^k q_m).

Power iteration (cont'd)

Theorem. Suppose |λ_1| > |λ_2| ≥ ⋯ ≥ |λ_m| ≥ 0 and q_1^T v^(0) ≠ 0. Then, after k iterations,
  ||v^(k) − (±q_1)|| = O(|λ_2/λ_1|^k), |λ^(k) − λ_1| = O(|λ_2/λ_1|^{2k}) as k → ∞.
The ± sign means that at each step k, one or the other choice of sign is to be taken, and then the indicated bound holds.

- However, power iteration is of limited use:
  - it can find only the eigenvector corresponding to the largest eigenvalue;
  - the convergence is linear, reducing the error only by a constant factor ≈ |λ_2/λ_1| at each iteration; the quality of this factor depends on the largest eigenvalue being significantly larger than the others.

Inverse iteration
- For any μ ∈ R that is not an eigenvalue of A, the eigenvectors of (A − μI)^{-1} are the same as the eigenvectors of A, and the corresponding eigenvalues are (λ_j − μ)^{-1}, where λ_j are the eigenvalues of A.
- Suppose μ is close to an eigenvalue λ_J of A; then (λ_J − μ)^{-1} may be much larger than (λ_j − μ)^{-1} for all j ≠ J.
- If we apply power iteration to (A − μI)^{-1}, the process converges rapidly to q_J.
- Algorithm:
  Initialize v^(0) randomly with ||v^(0)|| = 1.
  for k = 1, 2, … do
    Solve (A − μI) w = v^(k−1)   (apply (A − μI)^{-1})
    v^(k) = w / ||w||            (normalize)
    λ^(k) = (v^(k))^T A v^(k)    (Rayleigh quotient)
  end for

Inverse iteration (cont'd)
- What if μ is, or is nearly, an eigenvalue of A, so that A − μI is singular? This can be handled numerically.
- Inverse iteration exhibits linear convergence.
- Unlike power iteration, we can choose which eigenvector will be found by supplying an estimate μ of the corresponding eigenvalue.
- If μ is much closer to one eigenvalue of A than to the others, then the largest eigenvalue of (A − μI)^{-1} will be much larger than the rest.

Theorem. Suppose λ_J is the closest eigenvalue to μ and λ_K is the second closest, that is, |μ − λ_J| < |μ − λ_K| ≤ |μ − λ_j| for each j ≠ J. Furthermore, suppose q_J^T v^(0) ≠ 0. Then, after k iterations,
  ||v^(k) − (±q_J)|| = O(|μ − λ_J|^k / |μ − λ_K|^k), |λ^(k) − λ_J| = O(|μ − λ_J|^{2k} / |μ − λ_K|^{2k}) as k → ∞.
The ± sign means that at each step k, one or the other choice of sign is to be taken, and then the indicated bound holds.
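A minimal NumPy sketch of the inverse iteration above, on the same illustrative symmetric matrix; with μ = 1 (closest to the smallest eigenvalue), the iteration converges to the smallest eigenpair:

```python
import numpy as np

def inverse_iteration(A, mu, steps=50, seed=0):
    """Power iteration applied to (A - mu*I)^{-1}."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    v = rng.standard_normal(m)
    v /= np.linalg.norm(m * [1.0] if False else v)  # normalize
    B = A - mu * np.eye(m)
    for _ in range(steps):
        w = np.linalg.solve(B, v)   # apply (A - mu*I)^{-1}
        v = w / np.linalg.norm(w)   # normalize
    return v @ A @ v, v             # Rayleigh quotient, eigenvector estimate

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
lam, v = inverse_iteration(A, mu=1.0)
print(np.allclose(A @ v, lam * v, atol=1e-8))
```

Factoring B once and reusing the factorization in each solve would be the practical implementation; `np.linalg.solve` is used here for brevity.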
Inverse iteration and Rayleigh quotient iteration
- Inverse iteration is
  - one of the most valuable tools of numerical linear algebra;
  - a standard method for calculating one or more eigenvectors of a matrix when the eigenvalues are already known.
- Obtaining an eigenvalue estimate from an eigenvector estimate: the Rayleigh quotient.
- Obtaining an eigenvector estimate from an eigenvalue estimate: inverse iteration.
- Rayleigh quotient iteration combines the two:
  Initialize v^(0) randomly with ||v^(0)|| = 1.
  λ^(0) = (v^(0))^T A v^(0)   (corresponding Rayleigh quotient)
  for k = 1, 2, … do
    Solve (A − λ^(k−1) I) w = v^(k−1)   (apply (A − λ^(k−1) I)^{-1})
    v^(k) = w / ||w||                   (normalize)
    λ^(k) = (v^(k))^T A v^(k)           (Rayleigh quotient)
  end for

Rayleigh quotient iteration

Theorem. Rayleigh quotient iteration converges to an eigenvalue/eigenvector pair for all except a set of measure zero of starting vectors v^(0). When it converges, the convergence is ultimately cubic, in the sense that if λ_J is an eigenvalue of A and v^(0) is sufficiently close to the eigenvector q_J, then
  ||v^(k+1) − (±q_J)|| = O(||v^(k) − (±q_J)||³), |λ^(k+1) − λ_J| = O(|λ^(k) − λ_J|³) as k → ∞.
The ± sign means that at each step k, one or the other choice of sign is to be taken, and then the indicated bound holds.

Example
- Consider the symmetric matrix
  A = [ 2 1 1
        1 3 1
        1 1 4 ]
  and let v^(0) = (1, 1, 1)^T (normalized) be the initial eigenvector estimate.
- When Rayleigh quotient iteration is applied to A, the following values λ^(k) are computed in the first 3 iterations:
  λ^(0) = 5, λ^(1) = 5.2131…, λ^(2) = 5.214319743184….
  The actual value of the eigenvalue corresponding to the eigenvector closest to v^(0) is 5.214319743377….
- After three iterations, Rayleigh quotient iteration has produced a result accurate to 10 digits.
- Three more iterations would increase this figure to about 270 digits, if our machine precision were high enough.
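The example above can be reproduced directly; a NumPy sketch of Rayleigh quotient iteration on the same matrix and starting vector:

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
v = np.ones(3) / np.sqrt(3.0)   # v^(0) = (1,1,1)^T, normalized
lam = v @ A @ v                  # lambda^(0) = 5
for _ in range(3):
    # One step of inverse iteration with the current eigenvalue estimate...
    w = np.linalg.solve(A - lam * np.eye(3), v)
    v = w / np.linalg.norm(w)
    # ...followed by a Rayleigh quotient update.
    lam = v @ A @ v
print(lam)  # ~ 5.214319743..., accurate to about 10 digits after 3 steps
```

The cubic convergence is visible: each iteration roughly triples the number of correct digits.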
CSE 275 Matrix Computation
Ming-Hsuan Yang

Lecture 12

Overview
- QR decomposition by Householder transformations
- QR decomposition by Givens rotations

Householder triangularization
- In the Gram-Schmidt process, a sequence of elementary upper triangular matrices R_k is applied on the right of A, so that
  A R_1 R_2 ⋯ R_n = Q̂
  has orthonormal columns; the product R̂ = (R_1 R_2 ⋯ R_n)^{-1} is upper triangular ("triangular orthogonalization").
- In Householder triangularization, a series of elementary orthogonal matrices Q_k is applied on the left of A, so that
  Q_n ⋯ Q_2 Q_1 A = R
  is upper triangular; the product Q = Q_1^T Q_2^T ⋯ Q_n^T is orthogonal, and therefore A = QR is a QR factorization of A ("orthogonal triangularization").

Geometry of elementary projectors
- For u, x ∈ R^m with ||u|| = 1, the orthogonal projectors onto span{u} and u^⊥ are
  P_u = (u u^T)/(u^T u) and P_{u^⊥} = I − (u u^T)/(u^T u).
- For u ≠ 0, the Householder transformation (the elementary reflector about u^⊥) is
  R = I − 2 (u u^T)/(u^T u),
  or R = I − 2 u u^T when ||u|| = 1; it satisfies R = R^T and R = R^{-1}.

Triangularization by introducing zeros
- The matrix Q_k is chosen to introduce zeros below the diagonal in the k-th column while preserving all the zeros previously introduced.
- Q_k operates on rows k, …, m. At the beginning of step k there is a block of zeros in the first k − 1 columns of these rows; the application of Q_k forms linear combinations of these rows, and the linear combinations of the zero entries remain zero.
- After n steps, all the entries below the diagonal have been eliminated, and Q_n ⋯ Q_2 Q_1 A = R is upper triangular.

Householder transformations
- Each Q_k is chosen to be
  Q_k = [ I 0
          0 F ],
  where I is the (k−1) × (k−1) identity matrix and F is an (m−k+1) × (m−k+1) orthogonal matrix.
- Multiplication by F has to introduce zeros into the k-th column.
- The Householder algorithm chooses F to be a particular matrix called a Householder transformation, or reflector.
- At step k, the entries k, …, m of the k-th column are given by a vector x ∈ R^{m−k+1}.
- To
introduce zeros into the k-th column, the Householder transformation F should map x to a multiple of e_1:
  F x = ± ||x|| e_1 = α e_1.
- The reflector F reflects the space R^{m−k+1} across the hyperplane H orthogonal to u = ||x|| e_1 − x.
- A hyperplane is characterized by such a vector u.

Householder transformation (cont'd)
- Every point x ∈ R^m is mapped to a mirror point across H:
  F x = x − 2 u (u^T x)/(u^T u),
  and hence
  F = I − 2 (u u^T)/(u^T u).
- The sign of α is fixed next.

The better of two Householder reflectors
- There are two candidate Householder reflectors, mapping x to +||x|| e_1 or −||x|| e_1.
- For numerical stability, pick the one that reflects x to the vector ±||x|| e_1 that is not too close to x itself.
- In other words, the better of the two reflectors uses
  u = sign(x_1) ||x|| e_1 + x,
  where x_1 is the first element of x (with sign(x_1) = 1 if x_1 = 0).

Householder QR factorization
- Algorithm:
  for k = 1 to n do
    x = A_{k:m, k}
    u_k = sign(x_1) ||x||_2 e_1 + x
    u_k = u_k / ||u_k||
    A_{k:m, k:n} = A_{k:m, k:n} − 2 u_k (u_k^T A_{k:m, k:n})
  end for
- Upon completion, A has been reduced to upper triangular form, i.e., R in A = QR, with Q^T = Q_n ⋯ Q_2 Q_1, or equivalently Q = Q_1 Q_2 ⋯ Q_n.

QR decomposition with Householder transformations: example
- Compute the QR decomposition of
  A = [ 12 −51   4
         6 167 −68
        −4  24 −41 ]
- We need a reflector for the first column a_1 = (12, 6, −4)^T, mapping it to ||a_1|| e_1 = (14, 0, 0)^T:
  u = ||a_1|| e_1 − a_1 = (2, −6, 4)^T,
  F_1 = I − 2 (u u^T)/(u^T u) = (1/7) [ 6  3 −2
                                        3 −2  6
                                       −2  6  3 ],
  F_1 A = [ 14  21 −14
             0 −49 −14
             0 168 −77 ].
- Next we need to zero out the (3,2) entry, applying the same process to
  A' = [ −49 −14
         168 −77 ].

QR decomposition with Householder (cont'd)
- The same process gives
  Q_2 = [ 1    0     0
          0 −7/25 24/25
          0 24/25  7/25 ].
- Thus we have
  Q = Q_1 Q_2 = [ 6/7 −69/175 58/175
                  3/7 158/175  −6/175
                 −2/7    6/35   33/35 ],
  R = Q_2 Q_1 A = Q^T A = [ 14  21 −14
                             0 175 −70
                             0   0 −35 ].
- The matrix Q is orthogonal and R is upper triangular.

Givens rotations
- A Givens rotation is useful for zeroing out elements selectively:
  G(i, k, θ) = [ 1 ⋯  0 ⋯  0 ⋯ 0
                 ⋮     c   −s    ⋮
                 ⋮     s    c    ⋮
                 0 ⋯  0 ⋯  0 ⋯ 1 ],
  where c = cos θ and s = sin θ appear at the intersections of rows and columns i and k.
- Premultiplication by G(i, k, θ) amounts to a counterclockwise rotation by θ in the (i, k) coordinate plane.
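A NumPy sketch of the Householder QR algorithm above, checked on the worked example (the signs of individual rows of R and columns of Q may differ from the hand computation, since either of the two reflectors is valid at each step):

```python
import numpy as np

def householder_qr(A):
    """QR factorization via Householder reflectors (square or tall A)."""
    m, n = A.shape
    R = np.array(A, dtype=float)
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k].copy()
        u = x.copy()
        u[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        nu = np.linalg.norm(u)
        if nu == 0:
            continue  # column already triangular
        u /= nu
        R[k:, k:] -= 2.0 * np.outer(u, u @ R[k:, k:])   # apply Q_k on the left
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ u, u)     # accumulate Q = Q_1 Q_2 ...
    return Q, R

A = np.array([[12.0, -51.0,   4.0],
              [ 6.0, 167.0, -68.0],
              [-4.0,  24.0, -41.0]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))             # A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))   # Q orthogonal
print(np.allclose(np.tril(R, -1), 0.0))  # R upper triangular
```

Note that R is built in place by applying each reflector to the trailing submatrix only, exactly as in the pseudocode.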
- Its action y = G(i, k, θ) x has components
  y_i = c x_i − s x_k, y_k = s x_i + c x_k, y_j = x_j for j ≠ i, k.
- We can zero out y_k = s x_i + c x_k by setting
  c = x_i / √(x_i² + x_k²), s = −x_k / √(x_i² + x_k²), i.e., θ = −arctan(x_k / x_i).
- A QR decomposition can be computed by a series of Givens rotations: each rotation zeros one element below the diagonal, forming the R matrix, and the product of the rotations forms the orthogonal matrix Q.
- Givens rotations are especially useful for zeroing a few selected off-diagonal elements, e.g., in sparse matrices.
- Example:
  A = [ 12 −51   4
         6 167 −68
        −4  24 −41 ]
- We want to zero out A_31 = −4 by rotating the vector (6, −4) to point along the x-axis, i.e., θ = −arctan(−4/6) = arctan(2/3).

QR factorization with Givens rotations
- With this θ we have the orthogonal Givens rotation
  G_1 = [ 1    0        0
          0  0.83205 −0.55470
          0  0.55470  0.83205 ].
- Premultiplying A by G_1 gives
  G_1 A = [ 12      −51        4
             7.2111  125.6396 −33.8367
             0       112.6041 −71.8337 ].
- Continue to zero out A_21 and A_32 to form a triangular matrix R.
- The orthogonal matrix is Q^T = G_3 G_2 G_1, and G_3 G_2 G_1 A = Q^T A = R gives the QR decomposition.

Gram-Schmidt, Householder, and Givens
- Householder QR is numerically more stable.
- Gram-Schmidt computes an orthonormal basis incrementally.
- Givens rotations are most useful for zeroing a few selected elements.

CSE 275 Matrix Computation
Ming-Hsuan Yang

Lecture 20

Overview
- Steepest descent
- Conjugate gradient
- Preconditioning

Quadratic form
- For a real symmetric matrix A, a quadratic form is simply the scalar function
  f(x) = (1/2) x^T A x − b^T x + c.
- Setting the gradient to zero:
  ∇f(x) = Ax − b = 0, i.e., Ax = b.
- The solution of Ax = b is thus a critical point of f(x).
- If A is positive definite as well, then at an arbitrary point p,
  f(p) = f(x*) + (1/2) (p − x*)^T A (p − x*),
  where x* is the solution; the latter term is positive for all p ≠ x*, so x* is the global minimum of f.

Steepest descent
- Start at an arbitrary point x_0 and slide down toward the bottom of the paraboloid by taking a series of steps x_1, x_2, … until we are satisfied that we are close enough to the solution x*.
- Choose the direction in which f decreases most quickly, i.e., the direction opposite ∇f(x_i):
  −∇f(x_i) = b − A x_i.
- The error e_i = x_i − x* is a vector that indicates how far we are from the solution.
- The residual r_i = b − A x_i indicates how far we are from the correct value of b.
- It is easy to see that r_i = −A e_i: the residual is the error transformed by A into the same space as b.
- More importantly, r_i = −∇f(x_i), so we can think of the residual as the direction of steepest descent.

Steepest descent (cont'd)
- After finding the direction, move to the next point:
  x_i = x_{i−1} + α r_{i−1}.
- How big is the step? A line search is a procedure that chooses α to minimize f along the line.
- From basic calculus, α minimizes f when the directional derivative (d/dα) f(x_i) equals zero:
  (d/dα) f(x_i) = ∇f(x_i)^T (d x_i / dα) = ∇f(x_i)^T r_{i−1} = −r_i^T r_{i−1} = 0.
- To determine α, require r_i^T r_{i−1} = 0:
  (b − A x_i)^T r_{i−1} = 0
  (b − A(x_{i−1} + α r_{i−1}))^T r_{i−1} = 0
  (b − A x_{i−1})^T r_{i−1} = α (A r_{i−1})^T r_{i−1}
  α = (r_{i−1}^T r_{i−1}) / (r_{i−1}^T A r_{i−1}).

Steepest descent (cont'd)
- Putting it all together:
  r_{i−1} = b − A x_{i−1}
  α = (r_{i−1}^T r_{i−1}) / (r_{i−1}^T A r_{i−1})
  x_i = x_{i−1} + α r_{i−1}.
- One matrix-vector product can be saved: multiplying the update equation by −A and adding b on both sides gives
  r_i = r_{i−1} − α A r_{i−1}.
- Steepest descent often finds itself taking steps in the same directions as earlier steps.
- The convergence rate depends on the condition number κ: steepest descent can converge quickly if a fortunate starting point is chosen, but is usually at its worst when κ is large.

Conjugate gradient as a direct method
- We want to solve the linear system Ax = b, where A ∈ R^{m×m} is symmetric positive definite.
- Two nonzero vectors u and v are conjugate with respect to A if u^T A v = 0.
- Since A is symmetric positive definite,
  ⟨u, v⟩_A := ⟨Au, v⟩ = ⟨u, Av⟩ = u^T A v
  defines an inner product.
- Suppose {p_1, …, p_m} is a set of m mutually conjugate directions. Then the p_i form a basis of R^m, and we can expand the unique solution x* of Ax = b in this basis:
  x* = Σ_{i=1}^m α_i p_i.
- The coefficients are given by
  b = A x* = Σ_i α_i A p_i, so p_n^T b = Σ_i α_i p_n^T A p_i = α_n p_n^T A p_n,
  i.e.,
  α_n = (p_n^T b) / (p_n^T A p_n) = ⟨p_n, b⟩ / ||p_n||²_A.
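The steepest descent recurrences above translate directly into NumPy (a minimal sketch; the SPD matrix is an arbitrary illustration, and a small-residual guard is added so the exact line search never divides by zero):

```python
import numpy as np

def steepest_descent(A, b, x0, steps=500):
    """Steepest descent for SPD A: x_i = x_{i-1} + alpha * r_{i-1}."""
    x = np.array(x0, dtype=float)
    r = b - A @ x
    for _ in range(steps):
        if np.linalg.norm(r) < 1e-12:
            break                      # converged; avoid 0/0 in alpha
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)     # exact line search
        x = x + alpha * r
        r = r - alpha * Ar             # r_i = r_{i-1} - alpha * A r_{i-1}
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 1.0])
x = steepest_descent(A, b, x0=[0.0, 0.0])
print(np.allclose(A @ x, b))
```

The residual update reuses the single matrix-vector product A r per step, exactly the saving described above.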
Conjugate gradient as a direct method (cont'd)
- First find a sequence of conjugate directions, then compute the coefficients; this requires only inner products.
- How to find conjugate directions? Gram-Schmidt conjugation: start with m linearly independent vectors u_1, …, u_m, and for each vector subtract the parts that are not A-orthogonal to the previously processed vectors:
  p_n = u_n + Σ_{k<n} β_{nk} p_k, with β_{nk} = −(u_n^T A p_k) / (p_k^T A p_k).
- Problem: Gram-Schmidt conjugation is slow, and we have to store all the vectors we have created.

Conjugate gradient as an iterative method
- If we choose the conjugate vectors p_n carefully, we may not need all of them to obtain a good approximation.
- Also, the direct method does not scale well when m is large.
- Without loss of generality, assume the initial guess x_0 = 0.
- We need a metric that tells us whether we are getting closer to the solution x*:
  f(x) = (1/2) x^T x A x − b^T x is replaced throughout by
  f(x) = (1/2) x^T A x − b^T x, x ∈ R^m, with ∇f(x) = Ax − b.
- This suggests taking the first basis vector p_1 to be the negative gradient of f at x_0, i.e., −(A x_0 − b); since x_0 = 0, p_1 = b.
- The other direction vectors in the basis will be conjugate to the gradient, hence the name conjugate gradient method.

Conjugate gradient as an iterative method (cont'd)
- Let r_n be the residual at the n-th step: r_n = b − A x_n.
- Note that r_n is the negative gradient of f at x = x_n, so the gradient descent method would move in the direction of r_n.
- Here we insist that the directions p_n be conjugate to each other, so we take the direction closest to the gradient r_n under the conjugacy constraint:
  p_{n+1} = r_n − Σ_{k≤n} [(p_k^T A r_n) / (p_k^T A p_k)] p_k.

Conjugate gradients
- The conjugate gradient (CG) iteration is the original Krylov subspace iteration, the most famous of these methods and one of the mainstays of scientific computing.
- Discovered by Hestenes and Stiefel in 1952, it solves symmetric positive definite systems of equations amazingly quickly if the eigenvalues are well distributed.
- Consider the case of the 2-norm in solving a nonsingular system of equations Ax = b with exact solution x* = A^{-1} b.
- Let K_n denote the n-th Krylov subspace generated by b:
  K_n = span⟨b, Ab, …, A^{n−1} b⟩.
- One approach to
minimizing the 2-norm of the residual over the Krylov subspaces is GMRES.

Minimizing the 2-norm of the residual
- In GMRES, at step n, x* is approximated by the vector x_n ∈ K_n that minimizes ||r_n||_2, where r_n = b − A x_n.
- The usual GMRES algorithm does more work than is necessary for minimizing ||r_n||_2.
- When A is symmetric, faster algorithms are available, based on three-term instead of (n+1)-term recurrences at step n.
- One of these goes by the name of conjugate residuals, or MINRES (minimal residuals).

Minimizing the A-norm of the error
- Assume that A is real, symmetric, and positive definite.
- That means the eigenvalues of A are all positive, or equivalently, x^T A x > 0 for every nonzero x ∈ R^m.
- Under this assumption, the function
  ||x||_A = √(x^T A x)
  is a norm on R^m, the A-norm.
- The vector whose A-norm will concern us is e_n = x* − x_n, the error at step n.
- The conjugate gradient iteration is a system of recurrence formulas that
  - generates the unique sequence of iterates x_n ∈ K_n
  - with the property that at step n, ||e_n||_A is minimized.
- This will reveal the use of orthogonality in minimizing ||e_n||_A.

The conjugate gradient iteration
- To solve Ax = b:
  x_0 = 0, r_0 = b, p_0 = r_0
  for n = 1, 2, 3, … do
    α_n = (r_{n−1}^T r_{n−1}) / (p_{n−1}^T A p_{n−1})   (step length)
    x_n = x_{n−1} + α_n p_{n−1}                         (approximate solution)
    r_n = r_{n−1} − α_n A p_{n−1}                       (residual)
    β_n = (r_n^T r_n) / (r_{n−1}^T r_{n−1})             (improvement this step)
    p_n = r_n + β_n p_{n−1}                             (search direction)
  end for
- The iteration is very simple, programmable in a few lines of MATLAB.
- It deals only with m-vectors, not with individual entries of vectors or matrices.
- The only complication is the choice of a convergence criterion.

The conjugate gradient iteration (cont'd)
- At each step, the conjugate gradient iteration involves several vector manipulations and one matrix-vector product, the computation of A p_{n−1}.
- If A is dense and unstructured, the matrix-vector product dominates the operation count: O(m²) flops per step.
- If A is sparse or has other structure that can be exploited, the operation count may be as low as O(m) flops per step.
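The CG iteration above translates almost line for line into NumPy (a minimal sketch with an arbitrary SPD test matrix; in exact arithmetic CG converges in at most m steps):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """CG for SPD A, starting from x_0 = 0."""
    m = len(b)
    x = np.zeros(m)
    r = b.copy()          # r_0 = b - A x_0 with x_0 = 0
    p = r.copy()          # p_0 = r_0
    rs = r @ r
    for _ in range(m):    # at most m steps in exact arithmetic
        Ap = A @ p
        alpha = rs / (p @ Ap)    # step length
        x += alpha * p           # approximate solution
        r -= alpha * Ap          # residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs       # improvement this step
        p = r + beta * p         # next search direction
        rs = rs_new
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))
```

Only one matrix-vector product A p is computed per step; everything else is vector arithmetic, matching the operation count discussed above.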
The conjugate gradient iteration (cont'd)

Theorem. Let the conjugate gradient iteration be applied to a symmetric positive definite matrix problem Ax = b. As long as the iteration has not yet converged (i.e., r_{n−1} ≠ 0), the algorithm proceeds without divisions by zero, and we have the following identities of subspaces:
  K_n = span⟨x_1, x_2, …, x_n⟩ = span⟨p_0, p_1, …, p_{n−1}⟩ = span⟨r_0, r_1, …, r_{n−1}⟩ = span⟨b, Ab, …, A^{n−1} b⟩.  (1)
Moreover, the residuals are orthogonal,
  r_n^T r_j = 0 (j < n),  (2)
and the search directions are A-conjugate,
  p_n^T A p_j = 0 (j < n).  (3)

The conjugate gradient iteration (cont'd)
- Proof, by induction. From the initial guess x_0 = 0 and the formula x_n = x_{n−1} + α_n p_{n−1}, it follows by induction that x_n belongs to span⟨p_0, p_1, …, p_{n−1}⟩.
- From p_n = r_n + β_n p_{n−1}, it follows that this is the same as span⟨r_0, r_1, …, r_{n−1}⟩.
- From r_n = r_{n−1} − α_n A p_{n−1}, it follows that this is the same as span⟨b, Ab, …, A^{n−1} b⟩.
- To prove (2), apply the formula r_n = r_{n−1} − α_n A p_{n−1} and the identity (A p_{n−1})^T = p_{n−1}^T A to compute
  r_n^T r_j = r_{n−1}^T r_j − α_n p_{n−1}^T A r_j.
- If j < n − 1, both terms on the right are zero by induction.
- If j = n − 1, the difference on the right is zero provided α_n = (r_{n−1}^T r_{n−1}) / (p_{n−1}^T A r_{n−1}).
- Note this is the same as the line α_n = (r_{n−1}^T r_{n−1}) / (p_{n−1}^T A p_{n−1}) in the algorithm: since p_{n−1} and r_{n−1} differ by β_{n−1} p_{n−2}, the effect of this replacement is to change the denominator by β_{n−1} p_{n−2}^T A p_{n−1}, which is zero by induction.

CSE 275 Matrix Computation
Ming-Hsuan Yang

Lecture 5

Overview
- Matrix properties via the singular value decomposition (SVD)
- Geometric interpretation of the SVD
- Applications

Reading
- Chapter 3 of Matrix Computations by Gene Golub and Charles Van Loan.

Matrix multiplication
- Recall the outer product x y^T.
- Let A = [a_1 ⋯ a_r] ∈ R^{m×r} and B ∈ R^{r×q} with rows b_i^T. Matrix multiplication can be written as a sum of outer products:
  AB = Σ_{i=1}^r a_i b_i^T.

SVD expansion
- We can decompose A in terms of singular values and vectors:
  A = Σ_{i=1}^r σ_i u_i v_i^T = Σ_{i=1}^r σ_i (u_i ⊗ v_i),
  where ⊗ denotes the outer product.
- The matrix 2-norm and
Frobenius norm have properties connected to the SVD:
  ||A||_F = √(σ_1² + ⋯ + σ_p²), p = min(m, n), and ||A||_2 = σ_1 = max_{x≠0} ||Ax||_2 / ||x||_2.
- These are closely related to eigenvalues, eigenvectors, and principal component analysis.

Matrix properties via the SVD

Theorem. The rank of A is r, the number of nonzero singular values.
Proof. The rank of a diagonal matrix equals the number of its nonzero entries, and in the SVD A = UΣV^T, where U and V are of full rank. Thus rank(A) = rank(Σ) = r. ∎

Theorem. ||A||_2 = σ_1 and ||A||_F = √(σ_1² + ⋯ + σ_r²).
Proof. Since U and V are orthogonal, ||A||_2 = ||UΣV^T||_2 = ||Σ||_2. By definition, ||Σ||_2 = max_{||x||=1} ||Σx||_2 = max_i σ_i = σ_1. Likewise ||A||_F = ||Σ||_F, and by definition ||Σ||_F = √(σ_1² + ⋯ + σ_r²). ∎

Matrix properties via the SVD (cont'd)

Theorem. The nonzero singular values of A are the square roots of the nonzero eigenvalues of A A^T or A^T A (these have the same nonzero eigenvalues).
Proof. From the definition,
  A A^T = (UΣV^T)(UΣV^T)^T = UΣV^T VΣU^T = U diag(σ_1², …, σ_r², 0, …, 0) U^T. ∎

Theorem. For A ∈ R^{m×m}, |det(A)| = Π_{i=1}^m σ_i.
Proof. |det(A)| = |det(UΣV^T)| = |det(U)| |det(Σ)| |det(V^T)| = |det(Σ)| = Π_{i=1}^m σ_i. ∎

Matrix properties via the SVD (cont'd)

Theorem. A is the sum of r rank-one matrices: A = Σ_{i=1}^r σ_i u_i v_i^T.

Theorem (Eckart-Young, 1936). Let A = UΣV^T, and for any ν with 0 ≤ ν ≤ r let
  A_ν = Σ_{i=1}^ν σ_i u_i v_i^T = U diag(σ_1, …, σ_ν, 0, …, 0) V^T.
Then
  ||A − A_ν||_2 = min_{B: rank(B) ≤ ν} ||A − B||_2 = σ_{ν+1}.
Proof (of the value σ_{ν+1}). U^T (A − A_ν) V = diag(0, …, 0, σ_{ν+1}, …, σ_p); consequently ||A − A_ν||_2 = σ_{ν+1}.

Proof (of optimality). Suppose there is some B with rank(B) ≤ ν such that ||A − B||_2 < ||A − A_ν||_2 = σ_{ν+1}. Then there exists an (n − ν)-dimensional subspace W ⊆ R^n such that w ∈ W ⇒ Bw = 0. For any such w,
  ||Aw||_2 = ||(A − B)w||_2 ≤ ||A − B||_2 ||w||_2 < σ_{ν+1} ||w||_2.
Thus W is an (n − ν)-dimensional subspace on which ||Aw|| < σ_{ν+1} ||w||. But there is a (ν + 1)-dimensional subspace on which ||Aw|| ≥ σ_{ν+1} ||w||, namely the space spanned by the first ν + 1 right singular vectors of A. Since the sum of the dimensions of these two spaces exceeds n, there must be a nonzero vector lying in both, and this is a contradiction. ∎
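The norm identities and the Eckart-Young theorem can be checked numerically; a NumPy sketch on a random matrix (sizes and the rank ν are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# ||A||_2 = sigma_1 and ||A||_F = sqrt(sum of sigma_i^2).
print(np.isclose(np.linalg.norm(A, 2), s[0]))
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))

# Best rank-nu approximation: keep the nu largest singular triplets.
nu = 2
A_nu = (U[:, :nu] * s[:nu]) @ Vt[:nu, :]
# Eckart-Young: the 2-norm error equals the first discarded singular value.
print(np.isclose(np.linalg.norm(A - A_nu, 2), s[nu]))
```

Broadcasting `U[:, :nu] * s[:nu]` scales each retained left singular vector by its singular value, so the product is exactly Σ_{i≤ν} σ_i u_i v_i^T.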
CSE 275 Matrix Computation
Ming-Hsuan Yang
Electrical Engineering and Computer Science
University of California at Merced, Merced, CA 95344
http://faculty.ucmerced.edu/mhyang

Lecture 22

Overview
- Workshop on Algorithms for Modern Massive Data Sets (MMDS):
  - explores algorithms for modeling and analyzing massive, high-dimensional, and nonlinearly structured data;
  - brings together computer scientists, computational and applied mathematicians, and practitioners.
- Tools: numerical linear algebra, kernel methods, multilinear algebra, randomized algorithms, optimization, differential geometry, geometry and topology, etc.
- Organized by Gene Golub et al. in 2006 and 2008; sponsored by NSF, Stanford, and Yahoo.
- Slides available at mmds.stanford.edu.

Topics
- Low-rank matrix approximation: theory and sampling algorithms
- Nearest neighbor algorithms and approximation
- Spectral graph theory and applications
- Non-negative matrix factorization
- Kernel methods
- Algebraic topology and analysis of high-dimensional data
- Higher-order statistics: tensors and approximations

Matrix factorization and applications
- Treat each data point as a vector:
  - 2D image: a 1D vector of intensity values;
  - 3D model: a 1D vector of 3D coordinates;
  - document: a 1D vector of term frequencies.
- Massive data sets, high-dimensional data: find a low-rank approximation using an eigendecomposition (see O'Leary's slides).

Low-rank matrix approximation
- The SVD is great, but computationally expensive for large-scale problems.
- Efficient randomized algorithms give low-rank approximations with guaranteed error bounds.
- CX algorithm (Drineas and Kannan, FOCS '01): randomly pick k columns, A_{m×n} ≈ C_{m×k} X_{k×n}.
- CUR algorithm (Drineas and Kannan, SODA '03): randomly pick c columns and r rows, A_{m×n} ≈ C_{m×c} U_{c×r} R_{r×n}.
- Element-wise sampling (Achlioptas and McSherry, STOC '01): A_{m×n} ≈ S_{m×n} with
  S_ij = A_ij / p_ij with probability p_ij, and S_ij = 0 with probability 1 − p_ij.
- See Kannan's slides and Drineas's slides.

Approximating matrix multiplication
- Given an m-by-n matrix A and an n-by-p matrix B,
  AB = Σ_{i=1}^n A^{(i)} B_{(i)} ∈ R^{m×p},
  where A^{(i)} is the i-th column of A and B_{(i)} is the i-th row of B.
- Each term is a rank-one matrix.
- Random sampling algorithm:
  - fix a set of probabilities p_i, i = 1, …, n, summing to 1;
  - for t = 1 up to s, set j_t = i with probability Pr[j_t = i] = p_i (pick s terms of the sum, with replacement, w.r.t. the p_i);
  - approximate AB by the sum of the s terms, after scaling:
    AB ≈ Σ_{t=1}^s (1 / (s p_{j_t})) A^{(j_t)} B_{(j_t)}.

Approximating matrix multiplication (cont'd)
- In matrix notation, A_{m×n} B_{n×p} ≈ C_{m×s} R_{s×p}.
- Create C and R in s i.i.d. trials with replacement: for t = 1 up to s, pick a column A^{(j_t)} and a row B_{(j_t)} with probability
  Pr[j_t = i] = ||A^{(i)}||_2 ||B_{(i)}||_2 / Σ_{j=1}^n ||A^{(j)}||_2 ||B_{(j)}||_2.
- Include A^{(j_t)} / (s p_{j_t})^{1/2} as a column of C and B_{(j_t)} / (s p_{j_t})^{1/2} as a row of R.

Approximating matrix multiplication (cont'd)
- The input matrices may be given in sparse, unordered representation, e.g., their nonzero entries presented as triples (i, j, A_ij) in any order.
- The expectation of CR, element-wise, is AB; the nonuniform sampling minimizes the variance of this estimator.
- The sampling algorithm is easy to implement in two passes.
- If the matrices are dense, the algorithm runs in O(smp) time instead of O(nmp) time, and requires only O(sm + sp) memory.
- It does not tamper with the sparsity of the matrices.
- For the above algorithm,
  E[||AB − CR||_F] ≤ (1/√s) ||A||_F ||B||_F,
  and with probability at least 1 − δ,
  ||AB − CR||_F ≤ O(√(log(1/δ)) / √s) ||A||_F ||B||_F.

Special case B = A^T
- If B = A^T, the sampling probabilities are
  Pr[picking i] = ||A^{(i)}||_2² / ||A||_F²,
  R = C^T, and the error bound is
  E[||A A^T − C C^T||_F] ≤ (1/√s) ||A||_F².
- There is an improved spectral norm bound for this special case:
  ||A A^T − C C^T||_2 ≤ 4 √(log s / s) ||A||_F ||A||_2.
- The sampling procedure is slightly different: s columns/rows are kept in expectation, i.e., column i is picked with probability
  Pr[picking i] = min(1, s ||A^{(i)}||_2² / ||A||_F²).

Approximating the SVD in O(m + n) time
- The complexity of computing the SVD of an m-by-n matrix A is O(min(mn², m²n)), e.g., using the Golub-Kahan algorithm.
- The top few singular vectors/values can be approximated faster using Lanczos/Arnoldi methods.
- Let A_k be the rank-k approximation of A.
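The column/row sampling scheme above can be sketched in NumPy (a minimal illustration; the matrix sizes, the sample count s, and the error threshold are arbitrary choices, with the bound E||AB − CR||_F ≤ ||A||_F ||B||_F / √s suggesting a relative error around 1/√s):

```python
import numpy as np

def sampled_product(A, B, s, seed=0):
    """Approximate A @ B by s sampled, rescaled rank-one terms."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Variance-minimizing probabilities p_i ~ ||A^(i)|| * ||B_(i)||.
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p /= p.sum()
    idx = rng.choice(n, size=s, replace=True, p=p)
    approx = np.zeros((A.shape[0], B.shape[1]))
    for t in idx:
        # Each sampled outer product is rescaled by 1/(s * p_t),
        # which makes the estimator unbiased.
        approx += np.outer(A[:, t], B[t, :]) / (s * p[t])
    return approx

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 200))
B = rng.standard_normal((200, 20))
est = sampled_product(A, B, s=2000)
err = np.linalg.norm(A @ B - est) / (np.linalg.norm(A) * np.linalg.norm(B))
print(err < 0.2)  # relative Frobenius error well below the threshold
```

In practice the sampled columns and rows would be stored as the factors C and R rather than accumulated into a dense product, which is where the O(sm + sp) memory figure comes from.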
rank k approximation of A 9 AR is a matrix of rank k such that HA 7 AR rank k matrices 0 Approximate SVD in linear time Om n gt sample c columns from A and rescale to form the m x c matrix C gt compute the m x k matrix Hk of the top k left singular vectors C 2 is minimized over all 0 Structural theorem For any probabilities and number of columns llAi HkHE A F iiAeAk Eh NEHAAT e ccTiiF o Algorithmic theorem If p HAMME and c 4n2k62 then llA HkaTA 3F S HA 7 Ak 5 6HAM 10 21 Example of randomized SVD 0 Compute the top k left singular vectors of matrix C and restore them in the 512 by k matrix Hk Original matrix A After sampling columns C Ll lir u ill U Potential problems with SVD 9 Structure in the data is not respected by mathematical operations on the data gt Reification maximum variance directions gt Interpretability what does a linear combination of 6000 genes mean gt Sparsity destroyed by orthogonalization gt Nonnegativity is a convex but not linear algebraic notion 0 Does there exist better low rank matrix approximation gt better structure properties for certain applications gt better at respecting relevant structure gt better for lnterpretability and informing intuition 12 21 CX matrix decomposition a Goal Find Aan z CngXan so that A 7 CX small in some norm 0 One way to approach this is min HA 7 CXlF HA 7 CCTAMF XEIREX where Cl is the pseudoinverse of C 9 SVD of A Ak UkaVkT where Ak is of m x n Uk of m x k X of kxkand VkT okan o Subspace sampling Vk is an orthogonal matrix containing the top k left singular vectors of A o The columns of Vk are orthonormal vectors but the rows of Vk denoted by Vk are not orthonormal vectors a Subspace sampling in OVDkA time Vi 1 2 77 quot7 Pi llleill k 13 21 Relative error CX a Relative error CX decomposition gt compute the probabilities p gt for each i 127 l l l n pick the ith column ofA with probability min1cp let C be the matrix containing the sampled columns V Theorem For any k let Ak be the best rank k approximation to A In 
OVDkA we can compute p such that ifc Ok log kez then With probability at least 17 6 minXdchnllAiCXllF llAiCClAllF 1 ElllA 7 AkllF lA ll 14 21 CU R decomposition 0 0 Goal Find Aan z mXCchRXn so that HA 7 CURll is small in some norm Why After making two passes over A one can compute provany C U and R and store them sketch instead of A of Om n vs Omn SVD of Amxr VPTXn Exact computation of the SVD takes Ominmn27 mzn and the top k leftright singular vectorsvalues can be computed from LanczosArnoldi methods nprpo where ranA p Rank k approximation Ak UkakakT where kak is a diagonal matrix with top k singular values of A Note that the columns of Uk are linear combinations of all columns of A and the rows of VkT are linear combinations of all rows of A 15 21 The CU R decomposition 0 Sampling columns for C use CX algorithm to sample columns of A gt C has 5 columns in expectation C UC C v m x E m x p p x p p x E V U5 is the orthogonal matrix containing the left singular vectors of C and p is the rank 0 Let Uci denote the i th row of U 0 Sampling rows for R gt subspace sampling in 0c2m time with probability q UC 39 2 qi ll Ill2 p Y Vi12m gt R has 7 rows in expectation 0 Compute U gt Let W be the intersection of C and R gt Let U be a rescaled pseudoinverse of W 16 21 The CUR decomposition cont d 0 Put together A z CUR r A x C 0 W JD R mxn me 7x2 Fxn where D is a diagonal rescaling matrix and U DWTD Theorem Given C in 062m time one can compute q such that HA7 CURHF 1 We CCTAmF holds With probability at least 17 6 if r Oc log 682 rows 17 21 Relative error CUR Theorem For any k it takes OVDkA time to construct C U and R such that llA CURllF S 1 ElllA 7 UkaVlelF 1 ElllA 7 AkllF holds With probability at least 1 7 6 by picking Oklog klog1662 columns and Oklog2klog1666 rows Where OVDkA is the time to compute the top k top leftright singular values 6 Applications Genomic microarray data timeresolved fMRI data sequence and mutational data hyperspectral color cancer data a For small k 
in OVDkA time we can construct C U and R St HA 7 CURllF 1 l 0001llA 7 Akllp by typically at most k l 5 columns and at most k 5 rows 18 21 Element wise sampling 0 Main idea to approximate matrix A keep a few elements of the matrix instead of sampling rows or columns and zero out the remaining elements compute a rank k approximation to this sparse matrix using Lanczos methods o Create the matrix 5 from A such that s AUp0 with probability po 0 0 gt V with probability 17 po o It can be shown that HA 7 Sllg is bounded and the singular values of 5 and A are close 0 Under additional assumptions the top k left singular vectors of 5 span a subspace that is close to the subspace spanned by the top k left singular vectors of A 19 21 Element wise sampling cont d o Approximating singular values fast gt zero out a large number of elements of A and scale the remaining ones appropriately gt compute the singular values of the resulting sparse matrix using iterative methods 2 gt good choice for pg E where 5 denotes the expected number of elements that we seek to keep in S gt Note each element is kept or discarded independently of the others 0 Similar ideas that has been used to gt explain the success of Latent Semantic Indexing gt design recommendation systems gt speed up kernel computation 20 21 CSE 275 Matrix Computation Ming Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced CA 95344 httpfacutyucmercededumhyang M39 e i a TYo quot 399 L E 11 g g a E r 3 I gjt 2 quot Q s a v39 in I868 Lecture 16 117 Oveniew 0 Condition 0 Perturbation 0 Stability Conditioning of least squares problems rb x yAxPb raMA 0 Assume A is full rank and consider 2 norm for analysis Given A E Cm of full rank m 2 n b E Cm Find x E C such that Hb Axll is minimized o The solution x and the corresponding y Ax that is closest to b in ranA are given by x Alb y Pb where Al AHA1AH E C gtltm is the pseudoinverse of A and P AAl E Eme is the orthogonal projector onto ranA 
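As a concrete check of these formulas, here is a minimal pure-Python sketch (the 3×2 matrix and right-hand side are made up for illustration) that computes x = (A^T A)^{-1} A^T b via the normal equations and verifies the orthogonality principle: the residual r = Ax − b satisfies A^T r = 0, i.e., r is orthogonal to ran(A).

```python
# Least squares via the normal equations, on a hypothetical 3x2 example.
# x = (A^T A)^{-1} A^T b; the residual r = Ax - b must satisfy A^T r = 0.

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def inv2(M):
    # closed-form inverse of a 2x2 matrix
    a, b = M[0]; c, d = M[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def lstsq(A, b):
    At = transpose(A)
    AtA = matmul(At, A)              # A^T A (2x2 here)
    Atb = matvec(At, b)              # A^T b
    return matvec(inv2(AtA), Atb)    # x = (A^T A)^{-1} A^T b

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # skinny: m = 3 > n = 2
b = [1.0, 0.0, 2.0]
x = lstsq(A, b)                            # x = [0.5, 0.5] for this data
r = [yi - bi for yi, bi in zip(matvec(A, x), b)]   # residual Ax - b
g = matvec(transpose(A), r)                        # ~ (0, 0): r is orthogonal to ran(A)
```

Note that forming A^T A squares the condition number of A, which is exactly the issue the conditioning analysis below quantifies; in practice a QR or SVD based solver is preferred for ill-conditioned problems.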
Conditioning of least squares problems (cont'd)
- Recall that for a rectangular matrix A, κ(A) = ||A|| ||A^+|| = σ_1 / σ_n.
- Another measure of the closeness of the fit is the angle
  θ = cos^{-1}( ||y|| / ||b|| )

Conditioning of least squares problems (cont'd)
- A third is a measure of how much ||y|| falls short of its maximum possible value, given ||A|| and ||x||:
  η = ||A|| ||x|| / ||y|| = ||A|| ||x|| / ||Ax||
- These parameters lie in the ranges
  1 ≤ κ(A) < ∞, 0 ≤ θ ≤ π/2, 1 ≤ η ≤ κ(A)

Conditioning of least squares problems (cont'd)
Theorem. Let b ∈ C^m and A ∈ C^{m×n} be full rank. The least squares problem has the following 2-norm relative condition numbers, describing the sensitivities of y and x to perturbations in b and A:

           |         b          |           A
  ---------+--------------------+---------------------------
     y     |     1 / cos θ      |     κ(A) / cos θ
     x     |  κ(A) / (η cos θ)  |  κ(A) + κ(A)^2 tan θ / η

The results in the first row are exact, being attained for certain perturbations δb, and the results in the second row are upper bounds.

Conditioning of least squares problems (cont'd)
- Let A = U Σ V^H, where Σ is an m×n diagonal matrix.
- Since perturbations are measured in the 2-norm, their sizes are unaffected by a unitary change of basis, so the perturbation behavior of A is the same as that of Σ.
- Without loss of generality we can deal with Σ directly; in the following analysis we assume A = Σ and write
  A = [A_1; 0],
  where A_1 is n×n and diagonal, and the rest of A is zero.

Conditioning of least squares problems (cont'd)
- The orthogonal projection of b onto ran(A): write b = [b_1; b_2], where b_1 contains the first n entries of b; then the projection y = Pb is y = [b_1; 0].
- To find the corresponding x, we can write Ax = y as
  [A_1; 0] x = [b_1; 0],
  which implies x = A_1^{-1} b_1.
- It follows that the orthogonal projector and pseudoinverse are the block 2×2 and 1×2 matrices
  P = [[I, 0], [0, 0]], A^+ = [A_1^{-1}  0]

Sensitivity of y to perturbations in b
- The relationship between b and y is linear: y = Pb.
- The Jacobian of this mapping is P itself, with ||P|| = 1.
- The condition number of y with respect to perturbations in b is
  κ_{b→y} = ||P|| / ( ||y|| / ||b|| ) = 1 / cos θ
- The condition number is realized (i.e., the supremum is attained) for perturbations δb with ||P δb|| = ||δb||, which occurs when δb is zero except in the first n entries.

Sensitivity of x to perturbations in b
- The relationship between b and x is linear: x = A^+ b, with Jacobian A^+.
- The condition number of x with respect to perturbations in b is
  κ_{b→x} = ||A^+|| / ( ||x|| / ||b|| ) = ||A^+|| ||A|| · ( ||y|| / (||A|| ||x||) ) · ( ||b|| / ||y|| ) = κ(A) / (η cos θ)
- The condition number is realized by perturbations δb satisfying ||A^+ δb|| = ||A^+|| ||δb|| = ||δb|| / σ_n, which occurs when δb is zero except in the n-th entry (or perhaps also in other entries, if A has more than one singular value equal to σ_n).

Tilting the range of A
- The analysis of perturbations in A is a nonlinear problem.
- Observe that perturbations in A affect the least squares problem in two ways: they distort the mapping of C^n onto ran(A), and they alter ran(A) itself. Consider the slight changes in ran(A) as small tiltings of this space.
- What is the maximum angle of tilt δα that can be imparted by a small perturbation δA?
- The image under A of the unit n-sphere is a hyperellipse that lies flat in ran(A).
- To change ran(A) as efficiently as possible, we grasp a point p = Av on the hyperellipse (with ||v|| = 1) and nudge it in a direction δp orthogonal to ran(A).
- A matrix perturbation that achieves this most efficiently is δA = δp v^H, which gives δA v = δp with ||δA|| = ||δp||.

Tilting the range of A (cont'd)
- To obtain the maximum tilt with a given ||δp||, we should take p to be as close to the origin as possible.
- That is, p = σ_n u_n, where σ_n is the smallest singular value of A and u_n is the corresponding left singular vector.
- With A = [A_1; 0] as before, p is equal to the last column of A, v is the n-vector (0, ..., 0, 1), and δA is a perturbation of the entries of A below the diagonal in this column.
- The perturbation tilts ran(A) by the angle δα given by tan(δα) = ||δp|| / σ_n.
- Since ||δp|| = ||δA|| and δα ≤ tan(δα), we have
  δα ≤ ||δA|| / σ_n = κ(A) ||δA|| / ||A||,
  with equality attained for choices δA of the kind described above.

Sensitivity of y to perturbations in A
- y is the orthogonal projection of b onto ran(A); it is determined by b and ran(A).
- Study the effect on y of tilting ran(A) by some angle δα.
- We can look at this from the geometric perspective: fix b and watch y vary as ran(A) is tilted.
- No matter how ran(A) is tilted, the vector y ∈ ran(A) must always be orthogonal to y − b. That is, the line b-y must lie at right angles to the line 0-y.
- In other words, as ran(A) is adjusted, y moves along the sphere of radius ||b||/2 centered at the point b/2.

Sensitivity of y to perturbations in A (cont'd)
- Tilting ran(A) in the plane 0-b-y by an angle δα changes the angle 2θ at the central point b/2 by 2δα.
- The corresponding perturbation δy is the base of an isosceles triangle with central angle 2δα and edge length ||b||/2; thus ||δy|| = ||b|| sin(δα).
- For arbitrary perturbations by an angle δα, we have
  ||δy|| ≤ ||b|| sin(δα) ≤ ||b|| δα
- Using the previous results ||b|| = ||y|| / cos θ and δα ≤ κ(A) ||δA|| / ||A||, we have
  ||δy|| / ||y|| ≤ ( κ(A) / cos θ ) ( ||δA|| / ||A|| ),
  i.e., ( ||δy|| / ||y|| ) / ( ||δA|| / ||A|| ) ≤ κ(A) / cos θ

Sensitivity of x to perturbations in A
- A perturbation δA can be split into two parts: one part δA_1 in the first n rows and another part δA_2 in the remaining m − n rows:
  δA = [δA_1; δA_2] = [δA_1; 0] + [0; δA_2]
- A perturbation δA_1 changes the mapping of A in its range, but not ran(A) itself or y.
- It perturbs A_1 by δA_1 in x = A_1^{-1} b_1 without changing b_1, and the condition number is
  ( ||δx|| / ||x|| ) / ( ||δA_1|| / ||A|| ) ≤ κ(A)
- A perturbation δA_2 tilts ran(A) without changing the mapping of A within this space.
- This corresponds to perturbing b_1 in x = A_1^{-1} b_1 without changing A_1, and the condition number is
  ( ||δx|| / ||x|| ) / ( ||δb_1|| / ||b_1|| ) ≤ ||A_1^{-1}|| ||b_1|| / ||x|| = κ(A) / η

Sensitivity of x to perturbations in A (cont'd)
- Need to relate δb_1 and δA_2.
- The vector b_1 is y expressed in the coordinates of ran(A).
- Thus the only changes in y that are realized as changes in b_1 are those that lie parallel to ran(A); orthogonal changes have no effect.
- If ran(A) is tilted by an angle δα in the plane 0-b-y, the resulting perturbation δy lies not parallel to ran(A), but at an angle π/2 − θ.
- Thus the changes in b_1 satisfy ||δb_1|| = sin θ ||δy||, and ||δb_1|| ≤ ||b|| δα sin θ.
- Since ||b_1|| = ||b|| cos θ, we have ||δb_1|| / ||b_1|| ≤ δα tan θ; thus
  ||δx|| / ||x|| ≤ ( ||δb_1|| / ||b_1|| ) ( κ(A) / η ) ≤ ( κ(A)^2 tan θ / η ) ( ||δA|| / ||A|| )
- Adding the contributions of δA_1 and δA_2 gives the bound
  ( ||δx|| / ||x|| ) / ( ||δA|| / ||A|| ) ≤ κ(A) + κ(A)^2 tan θ / η

CSE 275 Matrix Computation
Ming-Hsuan Yang
Electrical Engineering and Computer Science
University of California at Merced, Merced, CA 95344
http://faculty.ucmerced.edu/mhyang

Lecture 15

Overview
- Condition
- Perturbation
- Stability

Conditioning and condition numbers
- Conditioning pertains to the perturbation behavior of a mathematical problem.
- Stability pertains to the perturbation behavior of an algorithm used to solve the problem on a computer.
- We view a problem as a function f: X → Y from a normed vector space X of data to a normed vector space Y of solutions.
- f is usually nonlinear (even in linear algebra), but most of the time it is at least continuous.
- A well-conditioned problem: all small perturbations of x lead to only small changes in f(x).
- An ill-conditioned problem: some small perturbation of x leads to a large change in f(x).

Absolute condition number
- Let δx denote a small perturbation of x, and write δf = f(x + δx) − f(x).
- Absolute condition number:
  κ̂ = lim_{δ→0} sup_{||δx|| ≤ δ} ||δf|| / ||δx||
- For most problems it can be interpreted as a supremum over all infinitesimal perturbations δx, and we can write simply
  κ̂ = sup_{δx} ||δf|| / ||δx||
- If f is differentiable, we can evaluate the condition number by means of the derivative of f.
- Let J(x) be the matrix whose (i, j) entry is the partial derivative ∂f_i/∂x_j evaluated at x, known as the Jacobian of f at x.
- In the limit ||δx|| → 0, the absolute condition number becomes κ̂ = ||J(x)||, where || · || represents the norm of J(x).

Relative condition number
- Relative condition number:
  κ = lim_{δ→0} sup_{||δx|| ≤ δ} ( ||δf|| / ||f(x)|| ) / ( ||δx|| / ||x|| )
- Again, assuming δx and δf are infinitesimal,
  κ = sup_{δx} ( ||δf|| / ||f(x)|| ) / ( ||δx|| / ||x|| )
- If f is differentiable, we can express this in terms of the Jacobian:
  κ = ||J(x)|| / ( ||f(x)|| / ||x|| )
- The relative condition number is more important in numerical analysis.
- A problem is well-conditioned if κ is small (e.g., 1, 10, 10^2) and ill-conditioned if κ is large (e.g., 10^6, 10^16).

Computing eigenvalues of a non-symmetric matrix
- Consider the two matrices
  A = [[1, 1000], [0, 1]], B = [[1, 1000], [0.001, 1]],
  whose eigenvalues are {1, 1} and {0, 2}, respectively: perturbing a single entry by 0.001 changes the eigenvalues dramatically.
- On the other hand, if a matrix A is symmetric (more generally, if it is normal), then its eigenvalues are well-conditioned.
- It can be shown that if λ and λ + δλ are corresponding eigenvalues of A and A + δA, then |δλ| ≤ ||δA||_2, with equality if δA is a multiple of the identity.
- In that case κ̂ = 1 if perturbations are measured in the 2-norm, and the relative condition number is κ = ||A||_2 / |λ|.

Condition of matrix-vector multiplication
- Consider A ∈ C^{m×n}; determine a condition number corresponding to perturbations of x:
  κ = sup_{δx} ( ||A δx|| / ||Ax|| ) / ( ||δx|| / ||x|| ) = sup_{δx} ( ||A δx|| / ||δx|| ) · ( ||x|| / ||Ax|| ),
  that is,
  κ = ||A|| ||x|| / ||Ax||,
  which is an exact formula for κ, dependent on both A and x.
- If A happens to be square and nonsingular, we can use the fact that ||x|| / ||Ax|| ≤ ||A^{-1}|| to loosen the bound:
  κ ≤ ||A|| ||A^{-1}||, i.e., κ = α ||A|| ||A^{-1}|| with α = ||x|| / ( ||A^{-1}|| ||Ax|| ) ≤ 1.

Condition of matrix-vector multiplication (cont'd)
- For certain choices of x we have α = 1, and consequently κ = ||A|| ||A^{-1}||.
- If we use the 2-norm, this will occur when x is a multiple of a minimal right singular vector of A.
- If A ∈ C^{m×n} with m ≥ n has full rank, the above equations hold with A^{-1} replaced by the pseudoinverse A^+ = (A^H A)^{-1} A^H ∈ C^{n×m}.
- Similarly, given A, consider computing x = A^{-1} b from input b.

Condition of matrix-vector multiplication (cont'd)
Theorem. Let A ∈ C^{m×m} be nonsingular and consider the equation Ax = b. The problem of computing b, given x, has condition number
  κ = ||A|| ||x|| / ||b|| ≤ ||A|| ||A^{-1}||
with respect to perturbations of x. The problem of computing x, given b, has condition number
  κ = ||A^{-1}|| ||b|| / ||x|| ≤ ||A|| ||A^{-1}||
with respect to perturbations of b. If we use the 2-norm, equality holds in the first equation if x is a multiple of a right singular vector of A corresponding to the minimal singular value σ_m; likewise, equality holds in the second equation if b is a multiple of a left singular vector of A corresponding to the maximal singular value σ_1.

Condition number of a matrix
- The product ||A|| ||A^{-1}|| is the condition number of A (relative to the norm || · ||), denoted
  κ(A) = ||A|| ||A^{-1}||
  The condition number is attached to a matrix, not a problem.
- If κ(A) is small, A is said to be well-conditioned; if κ(A) is large, A is ill-conditioned.
- If A is singular, it is customary to write κ(A) = ∞.
- In the 2-norm, ||A|| = σ_1 and ||A^{-1}|| = 1/σ_m, and thus
  κ(A) = σ_1 / σ_m,
  which is the formula for computing 2-norm condition numbers of matrices.
- The ratio σ_1 / σ_m can be interpreted as the eccentricity of the hyperellipse that is the image of the unit sphere of C^m under A.

Condition number of a matrix (cont'd)
- For a rectangular matrix A ∈ C^{m×n} of full rank, m ≥ n, the condition number is defined in terms of the pseudoinverse:
  κ(A) = ||A|| ||A^+||
  Since A^+ is motivated by least squares problems, this definition is most useful with the 2-norm, where we have κ(A) = σ_1 / σ_n.

Condition number of a system of equations
- Let b be fixed and consider the behavior of the problem A ↦ x = A^{-1} b when A is perturbed by an infinitesimal δA; then (A + δA)(x + δx) = b.
- Using Ax = b and dropping the term δA δx, we obtain δA x + A δx = 0, that is, δx = −A^{-1} δA x.
- It implies ||δx|| ≤ ||A^{-1}|| ||δA|| ||x||, or equivalently
  ( ||δx|| / ||x|| ) / ( ||δA|| / ||A|| ) ≤ ||A^{-1}|| ||A|| = κ(A)
- The equality in this bound will hold when δA is such that ||A^{-1} δA x|| = ||A^{-1}|| ||δA|| ||x||, and it can be shown that for any A and b and norm, such perturbations δA exist.

CSE 275 Matrix Computation
Ming-Hsuan Yang
Electrical Engineering and Computer Science
University of California at Merced, Merced, CA 95344
http://faculty.ucmerced.edu/mhyang

Lecture 7

Overview
- Principal component analysis
- Applications

Karhunen-Loève transform
- Transform data into a new set of variables, the principal components (PCs),
  - which are uncorrelated and ordered,
  - so that the first few retain most of the variation.
- Consider the first PC, u_1^T x:
  u_1 = argmax_{||u||=1} var(u^T x) = argmax_{||u||=1} u^T C u
- Solve the constrained optimization with a Lagrange multiplier:
  u^T C u − λ(u^T u − 1)
- Take the derivative with respect to u:
  Cu − λu = 0

Karhunen-Loève transform (cont'd)
- To maximize var(u^T x) = u^T C u = λ u^T u = λ, u_1 is the eigenvector corresponding to the largest eigenvalue of C.
- In general, the k-th PC of x is u_k^T x, and var(u_k^T x) = λ_k, where λ_k is the k-th largest eigenvalue.
- The second PC, u_2^T x, maximizes u_2^T C u_2 subject to being uncorrelated with u_1^T x, i.e., cov(u_1^T x, u_2^T x) = 0:
  cov(u_1^T x, u_2^T x) = u_1^T C u_2 = u_2^T C u_1 = λ_1 u_2^T u_1 = λ_1 u_1^T u_2
- Solve the constrained optimization problem with one of these constraints:
  u_2^T C u_2 − λ(u_2^T u_2 − 1) − φ u_2^T u_1,
  where λ, φ are Lagrange multipliers.

Karhunen-Loève transform (cont'd)
- Take the derivative with respect to u_2:
  C u_2 − λ u_2 − φ u_1 = 0,
  and multiply on the left by u_1^T:
  u_1^T C u_2 − λ u_1^T u_2 − φ u_1^T u_1 = 0
- Consequently φ = 0 and C u_2 = λ u_2.
- Assuming that C does not have repeated eigenvalues, λ has to be the second largest eigenvalue to satisfy all the constraints.

Karhunen-Loève transform and SVD
- Let X = U Σ V^T; the projected data onto the subspace spanned by the first d singular vectors is Y = U_d^T X = Σ_d V_d^T.
- The matrix U of singular vectors of X is equivalently the matrix U of eigenvectors of the covariance matrix C ∝ XX^T = U Σ^2 U^T.
- The eigenvectors with the largest eigenvalues correspond to the dimensions that have the strongest correlation in the data set.
- The Rayleigh quotient:
  ρ(u) = u^T A u / u^T u
- The eigenvectors u_i are the critical points of the Rayleigh quotient, and their eigenvalues λ_i are the stationary values of ρ(u).

Rayleigh quotient and eigenvectors
- To find the critical points of the function ρ(x) = x^T A x s.t. ||x||_2 = 1:
- The constrained optimization problem:
  L(x, λ) = x^T A x − λ(x^T x − 1),
  where λ is a Lagrange multiplier.
- Take the derivative with respect to x:
  2Ax − 2λx = 0, so Ax = λx
- Thus
  ρ(x) = x^T A x / x^T x = λ x^T x / x^T x = λ
- The eigenvectors x_1, ..., x_n of A are critical points of the Rayleigh quotient, and their corresponding eigenvalues λ_1, ..., λ_n are the stationary values of ρ(x).

Derivation using covariance matrix
- Let the transform matrix be P such that Y = P^T X, with the constraints that C_Y is a diagonal matrix and P^{-1} = P^T:
  C_Y = E[YY^T] = P^T E[XX^T] P = P^T C_X P
- Therefore P C_Y = P P^T C_X P = C_X P.
- Note P = [p_1, ..., p_d] and C_Y = diag(λ_1, ..., λ_d):
  [λ_1 p_1, λ_2 p_2, ..., λ_d p_d] = [C_X p_1, C_X p_2, ..., C_X p_d],
  i.e., λ_i p_i = C_X p_i, and p_i is an eigenvector of the covariance matrix C_X of X.

SVD and PCA
- Recall
  x = Σ_{i=1}^n z_i u_i, z_i = u_i^T x, and u_i^T u_j = δ_ij
- Center the data points: X = [x_1 − x̄, ..., x_n − x̄].
- Covariance matrix: C = (1/n) XX^T.
- The singular value decomposition allows us to write X as
  X = U Σ V^T, U = [u_1, ..., u_n]

SVD and PCA (cont'd)
- C = (1/n) XX^T = (1/n) (U Σ V^T)(U Σ V^T)^T = (1/n) U Σ V^T V Σ U^T = (1/n) U Σ^2 U^T
- Therefore
  C u_i = (σ_i^2 / n) u_i
- So the columns of U are eigenvectors, and the eigenvalues are just σ_i^2 / n.

Properties and limitations of PCA
- Theoretically the optimal scheme in terms of the l2 norm.
- Involves only rotation and scaling.
- Unsupervised learning; unique solution.
- Assumptions:
  - data can be modeled linearly
  - data can be modeled with mean and covariance, i.e., a Gaussian distribution
  - the large variances have important dynamics
  - l2 norm
- Extensions: nonlinear PCA, mixture of PCA, probabilistic PCA, mixture of probabilistic PCA, factor analysis, mixture of factor analyzers, sparse PCA, independent component analysis, Fisher linear discriminant, etc.

Eigenface (Turk and Pentland 1991)
- Collect a set of face images.
- Normalize for contrast, scale, and orientation.
- Apply PCA to compute the first c eigenvectors (dubbed "Eigenfaces") that best account for the data variance, i.e., facial structure.
- Compute the distance between the projected points for face recognition or detection.

Appearance manifolds (Murase and Nayar 95)
- The image variation of an object under different pose or illumination is assumed to lie on a manifold.
- For each object, collect images under different poses.
- Construct a universal eigenspace from all the images.
- For the set of images of the same object, find the smoothly varying manifold in eigenspace, i.e., the parametric eigenspace.
- The manifolds of two objects may intersect; the intersection corresponds to poses of the two objects for which their images are very similar in appearance.

CSE 275 Matrix Computation
Ming-Hsuan Yang
Electrical Engineering and Computer Science
University of California at Merced, Merced, CA 95344
http://faculty.ucmerced.edu/mhyang

Lecture 1

Course information
- CSE 275 Matrix Computation
- Meeting time: Tuesday/Thursday, 3pm-4:15pm
- Office hours: Tuesday, 4:30pm-5:30pm
- Office: Science and Engineering Building 258
- Lab session: Science and Engineering 138, Wednesday, 1:30pm-4:20pm
- Lecture notes and schedule: http://faculty.ucmerced.edu/mhyang/course/cse275
- Email: mhyang@ucmerced.edu

What is this course about?
- Cover algorithms and techniques for matrix computation and analysis.
- Focus on computational algorithms rather than rigorous mathematical proofs or numerical techniques.
- Analyze data represented as vectors and matrices.
- Demonstrate their use in vision and learning applications.
- Solving Ax = b, A ∈ R^{m×n}, x ∈ R^n, b ∈ R^m:
  [a_1 ... a_n] [x_1; ...; x_n] = b, i.e., b = Σ_j x_j a_j

Examples
- Given a set of images/documents, how do you find a compact representation?
- Given a set of input/output pairs, how do you learn their mapping function?
- Given an image sequence, how do you model the underlying dynamics (temporal correlation)?
- Given a linear system (a set of linear constraints), how do you solve it?
- Given a set of data, how do you determine a metric to compute their similarities?
- Given a set of data, how do you visualize their relationships?
- Given a set of data, how do you cluster them?
- Given a large data set, how do you develop efficient algorithms to analyze the data?

Topics
- Fundamentals: introduction, vector space, vector and matrix norms, orthogonalization, covariance and Gram matrices, multivariate Gaussian.
- Matrix decomposition: LU decomposition, QR decomposition, Cholesky decomposition, Schur decomposition, eigendecomposition, singular value decomposition, factorization.
- System of equations: least squares, linear programming, linear dynamical systems, stochastic matrices, random walks.
- Statistical models: principal component analysis, factor analysis.
- Eigenvalue problems: Lanczos method, power method.
- Matrix inverse: Sherman-Morrison-Woodbury formula, Kailath variant, pseudoinverse, approximation.

Topics (cont'd)
- Sequential update: matrix update and downdate, subspace update.
- Matrix approximation: sparse matrix approximation, low-rank approximation, Nyström method.
- Emerging topics: non-negative matrix factorization, compressive sensing, randomized algorithms, large-scale matrix computation.
- Applications.

Textbook
- No textbook: all the lectures and papers are available on the course website.
- References:
  - Matrix Computations by Gene Golub and Charles Van Loan
  - Matrix Algebra From a Statistician's Perspective by David Harville
  - Matrix Analysis and Applied Linear Algebra by Carl Meyer
  - The Matrix Cookbook by Kaare Petersen and Michael Pedersen

Prerequisites
- Basic knowledge in linear algebra.
- Basic knowledge in probability and statistics.
- Proficiency in some programming language: MATLAB and others, e.g., C.

Requirements
- Literature review and critique: read conference/journal papers and submit critiques weekly.
- Term project: work on a term project individually or in two-member groups.
- Oral presentations: make one presentation on project overview and progress, as well as one final presentation.
- Final project report: submit one project report and code with demos.
- Formats and details regarding all the above-mentioned items will be available on the course website.

Grading
- 20% critiques
- 10% homework (e.g., derivations)
- 10% midterm presentation
- 20% midterm project report
- 10% final presentation
- 30% term project
- You will get an excellent grade if you work hard.

What you could expect from this course and me
- Algorithms and techniques in matrix computation and their applications.
  - I will discuss the pros/cons of algorithms and demonstrate their merits with their applications to vision and learning problems.
- Advice on the term project.
  - Everyone needs to discuss their project proposal with me.
  - You are encouraged to contact me for questions or suggestions regarding your term project.

What I expect from you
- Ask questions.
- Send me the typos/mistakes in the lecture notes (I am only human).
- Think critically about papers.
- Discuss project ideas with me.
- Work hard on the term project.

Term project
- Do not simply re-implement others' work.
  - Although it is a good way to learn a subject, it is not sufficient to simply reproduce the results of others.
- Think critically and surpass the state of the art.
  - What is the fun of just repeating others' work? Push the envelope.
- Numerous ways to improve existing algorithms:
  - robustness, efficiency, accuracy
  - simpler approaches with better results
- Publish your research findings.
- Make the code and data sets available.
- Suggestions:
  - discuss your ideas with me and other classmates
  - work in a small group (at most 2 people)
  - read papers and write critiques weekly
  - tinker with your ideas with experiments
  - re-search

Some suggested projects
- Human detection: how to locate humans and identify body parts as well as pose, in 2D or 3D, using what features, from single or multiple views.
- Human tracking: how to track human body parts in 2D or 3D, from single or multiple views.
- Visual tracking: how to track articulated objects (e.g., arms, fingers); how to handle occlusions; how to adapt models; how to recover from failure; how to combine detectors with trackers.
- Texture synthesis: how to synthesize videos with multiple motion textures (e.g., fire and water); how to use the learned dynamics to classify visual scenes.

Some suggested projects (cont'd)
- How to learn a compact representation for object detection.
- Image features: how to find reliable features for application X; how to extend existing methods into spatio-temporal domains.
- 3D reconstruction: how to infer 3D structure from moving objects under lighting variations.
- Sky is the limit.

CSE 275 Matrix Computation
Ming-Hsuan Yang
Electrical Engineering and Computer Science
University of California at Merced, Merced, CA 95344
http://faculty.ucmerced.edu/mhyang

Lecture 10

Overview
- Gaussian elimination
- LU decomposition
- Solving linear systems
- Cholesky decomposition

Quadratic form
- A function f: R^n → R has quadratic form
  f(x) = x^T A x = Σ_{i=1}^n Σ_{j=1}^n A_ij x_i x_j
- Often assume A is symmetric: x^T A x = x^T ((A + A^T)/2) x, where (A + A^T)/2 is called the symmetric part of A.
- Examples:
  ||Bx||^2 = x^T B^T B x
  (x − μ)^T C^{-1} (x − μ)
  f(x, y) = a x^2 + b x y + c y^2 = [x y] [[a, b/2], [b/2, c]] [x; y]
- Uniqueness: if x^T A x = x^T B x for all x ∈ R^n and A = A^T, B = B^T, then A = B.
- {x | f(x) = a} is called a quadratic surface; {x | f(x) ≤ a} is called a quadratic region.

Positive definite
- Recall a matrix A ∈ R^{n×n} is positive definite if x^T A x > 0 for all nonzero x ∈ R^n.
- Consider a 2-by-2 symmetric case: if A = [[a_11, a_12], [a_21, a_22]] is positive definite, then
  x = (1, 0)^T: x^T A x = a_11 > 0
  x = (0, 1)^T: x^T A x = a_22 > 0
  x = (1, 1)^T: x^T A x = a_11 + 2 a_12 + a_22 > 0
  x = (1, −1)^T: x^T A x = a_11 − 2 a_12 + a_22 > 0
- The last two inequalities imply |a_12| ≤ (a_11 + a_22)/2, so the largest entry in A is on the diagonal, and it is positive.
- A symmetric positive definite matrix has a "weighty" diagonal.

Matrix decomposition
- LU decomposition: A = LU; applicable to square A.
- Cholesky decomposition: A = U^T U, where U is upper triangular with positive diagonal entries; applicable to square symmetric positive definite A.
- QR decomposition: A = QR, where Q is an m-by-m orthogonal matrix and R is an m-by-n upper triangular matrix; applicable to m-by-n A.
- Eigendecomposition: A = Q D Q^{-1}, where D is a diagonal matrix formed from the eigenvalues of A and the columns of Q are the corresponding eigenvectors of A; applicable to square A.
- Schur decomposition: A = Q T Q^T, where Q is an orthogonal matrix and T is a block upper triangular matrix; applicable to square A.
- Singular value decomposition: A = U Σ V^T, where Σ is a non-negative diagonal matrix of singular values, the columns of U are eigenvectors of AA^T, and the columns of V are eigenvectors of A^T A.

Gaussian elimination
- For the linear system
  3x_1 + 5x_2 = 9
  6x_1 + 7x_2 = 4
- Multiply the first equation by 2 and subtract it from the second equation; we get
  3x_1 + 5x_2 = 9
  −3x_2 = −14
  which is the n = 2 Gaussian elimination.
- In general form, we want to factorize A into lower and upper triangular matrices, A = LU:
  [[3, 5], [6, 7]] = [[1, 0], [2, 1]] [[3, 5], [0, −3]]
- The solution of Ax = b is found by a two-step triangular solve process:
  Ly = b, Ux = y; Ax = LUx = Ly = b, y = L^{-1} b, x = U^{-1} y

Gauss transformation
- Need a zeroing process for Gaussian elimination; e.g., for n = 2, if x_1 ≠ 0 and τ = x_2 / x_1:
  [[1, 0], [−τ, 1]] [x_1; x_2] = [x_1; 0]
- More generally, for x ∈ R^n with x_k ≠ 0, let
  τ^T = (0, ..., 0, τ_{k+1}, ..., τ_n), τ_i = x_i / x_k, i = k+1, ..., n,
  where x_k is the pivot, and define M_k = I − τ e_k^T; then
  M_k x = (x_1, ..., x_k, 0, ..., 0)^T

Gauss transformation (cont'd)
- M_k = I − τ e_k^T is a Gauss transformation.
- The first k components of τ ∈ R^n are zero.
- The Gauss transformation is unit lower triangular.
- The vector τ is called the Gauss vector, and the components τ_i, i = k+1, ..., n, are called the multipliers.
- Assume A ∈ R^{n×n}: Gauss transformations M_1, ..., M_{n−1} can usually be found such that M_{n−1} ··· M_2 M_1 A = U is upper triangular; e.g.,
  A = [[1, 4, 7], [2, 5, 8], [3, 6, 10]], M_1 = I − τ e_1^T = [[1, 0, 0], [−2, 1, 0], [−3, 0, 1]]

Gauss transformation (cont'd)
- Upper triangularizing:
  M_1 A = [[1, 4, 7], [0, −3, −6], [0, −6, −11]]
  likewise
  M_2 = [[1, 0, 0], [0, 1, 0], [0, −2, 1]], M_2 M_1 A = [[1, 4, 7], [0, −3, −6], [0, 0, 1]]
- From this we have a matrix A^{(k)} = M_k ··· M_1 A that is upper triangular in columns 1 to k.
- The multipliers in M_k are based on A^{(k−1)}(k+1:n, k). In particular, we need A^{(k−1)}_{kk} ≠ 0 to proceed.
- The entry A^{(k−1)}_{kk} must be checked to avoid a zero divide. These quantities are referred to as the pivots, and their relative magnitude turns out to be critically important.

LU factorization
- With Gauss transforms M_1, ..., M_{n−1} such that M_{n−1} ··· M_1 A = U is upper triangular: it is easy to verify that if M_k = I − τ^(k) e_k^T, then its inverse is M_k^{-1} = I + τ^(k) e_k^T.
- More importantly, A = LU, where
  L = M_1^{-1} ··· M_{n−1}^{-1}, U = M_{n−1} ··· M_1 A
  can be uniquely factorized.
- It is clear that L is a unit lower triangular matrix, as each M_k^{-1} is unit lower triangular.
- Used in solving n-by-n linear systems with back substitution:
  Ax = LUx = Ly = b, y = L^{-1} b, x = U^{-1} y

Solving linear systems
- Given
  A = [[1, 4, 7], [2, 5, 8], [3, 6, 10]],
  L = M_1^{-1} M_2^{-1} = [[1, 0, 0], [2, 1, 0], [3, 2, 1]], U = M_2 M_1 A = [[1, 4, 7], [0, −3, −6], [0, 0, 1]]
- If b = (1, 1, 1)^T, then y = (1, −1, 0)^T solves Ly = b, and x = (−1/3, 1/3, 0)^T solves Ux = y.
- Note L is lower triangular with unit diagonal.

Pivoting
- Consider the LU factorization of
  A = [[0.0001, 1], [1, 1]] = [[1, 0], [10000, 1]] [[0.0001, 1], [0, −9999]] = LU,
  an LU with relatively small pivots (and large multipliers).
- We can get around this by interchanging rows with a permutation matrix
  P = [[0, 1], [1, 0]]
  then
  PA = [[1, 1], [0.0001, 1]] = [[1, 0], [0.0001, 1]] [[1, 1], [0, 0.9999]]
  and the triangular factors are composed of acceptably small entries.

Permutation matrix
- A permutation matrix is an identity matrix with its rows reordered, e.g.,
  P = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
- P is orthogonal and P^{-1} = P^T.
- PA is the row-permuted version of A, and AP is the column-permuted version of A.

Partial pivoting
- To get the smallest possible multipliers, we need a_11 to be the largest entry (in magnitude) in the first column:
  A = [[3, 17, 10], [2, 4, −2], [6, 18, −12]], E_1 = [[0, 0, 1], [0, 1, 0], [1, 0, 0]], E_1 A = [[6, 18, −12], [2, 4, −2], [3, 17, 10]]
  and the Gauss transformation
  M_1 = [[1, 0, 0], [−1/3, 1, 0], [−1/2, 0, 1]], M_1 E_1 A = [[6, 18, −12], [0, −2, 2], [0, 8, 16]]
- To get the smallest possible multiplier in M_2, we need to swap rows 2 and 3.

Partial pivoting (cont'd)
- E_2 = [[1, 0, 0], [0, 0, 1], [0, 1, 0]] and M_2 = [[1, 0, 0], [0, 1, 0], [0, 1/4, 1]]; then
  M_2 E_2 M_1 E_1 A = [[6, 18, −12], [0, 8, 16], [0, 0, 6]]
- This particular row-interchange strategy is called partial pivoting:
  M_{n−1} E_{n−1} ··· M_1 E_1 A = U, i.e., PA = LU, where P = E_{n−1} ··· E_1,
  also known as LU decomposition with partial pivoting, or the LUP decomposition.
- Solve PAx = LUx = Pb, i.e., first solve Ly = Pb and then Ux = y.
- See also complete pivoting and numerical stability issues.

LDU factorization
- Asymmetry in the LU factorization: only the lower factor has 1's on its diagonal.
- This can be remedied by factoring out the diagonal entries of U:
  U = [[u_11, u_12, ..., u_1n], [0, u_22, ...], ..., [0, ..., u_nn]]
    = diag(u_11, ..., u_nn) · [[1, u_12/u_11, ...], [0, 1, ...], ..., [0, ..., 1]]
- A = LU can be scaled so that L and U are unit triangular, i.e., all the diagonal entries of L and U are one, so that A = LDU, where D is a diagonal matrix.
- When A is symmetric: A = L D L^T.
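The elimination, pivoting, and triangular-solve steps above can be sketched in a few lines of pure Python. This is a minimal illustration (not the course's code): it computes PA = LU with partial pivoting, then solves Ly = Pb by forward substitution and Ux = y by back substitution, checked against the worked example from the notes, A = [[1,4,7],[2,5,8],[3,6,10]] and b = (1,1,1).

```python
# LU factorization with partial pivoting (PA = LU), then a two-step
# triangular solve: Ly = Pb (forward), Ux = y (back substitution).

def lu_pp(A):
    n = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))                 # row permutation encoding P
    for k in range(n):
        # partial pivoting: bring the largest |entry| in column k to row k
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        U[k], U[p] = U[p], U[k]
        L[k], L[p] = L[p], L[k]           # carry already-computed multipliers along
        perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = 1.0
        for i in range(k + 1, n):         # eliminate below the pivot
            m = U[i][k] / U[k][k]         # multiplier (Gauss vector entry)
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U, perm

def solve(A, b):
    n = len(A)
    L, U, perm = lu_pp(A)
    pb = [b[i] for i in perm]             # Pb
    y = [0.0] * n
    for i in range(n):                    # forward substitution: Ly = Pb
        y[i] = pb[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):          # back substitution: Ux = y
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[1.0, 4.0, 7.0], [2.0, 5.0, 8.0], [3.0, 6.0, 10.0]]
b = [1.0, 1.0, 1.0]
x = solve(A, b)   # the notes give x = (-1/3, 1/3, 0) for this system
```

Because pivot rows are swapped as elimination proceeds, the multipliers stay at most 1 in magnitude, which is exactly the motivation for partial pivoting in the 0.0001-pivot example above.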